vllm.v1.attention.backends.mla.prefill
Modules:
| Name | Description |
|---|---|
| base | Abstract base class for MLA prefill backends. |
| flash_attn | FlashAttention backend for MLA prefill. |
| flashinfer | FlashInfer backend for MLA prefill. |
| registry | Registry for MLA prefill backends. |
| selector | Selector for MLA prefill backends. |
| tokenspeed_mla | TokenSpeed CuTe DSL backend for MLA prefill. |
| trtllm_ragged | TRT-LLM Ragged backend for MLA prefill. |
MLAPrefillBackend
Bases: ABC
Abstract base class for MLA prefill backends.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
prepare_metadata
prepare_metadata(
    prefill_metadata: MLACommonPrefillMetadata,
) -> None
Prepare backend-specific metadata before the forward pass.
Called by the metadata builder after constructing the prefill metadata.
Source code in vllm/v1/attention/backends/mla/prefill/base.py
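The hook contract above (the builder constructs the prefill metadata first, then hands it to the backend, which can only act by mutating it in place since the method returns None) can be illustrated with a self-contained sketch. PrefillMetadata, PrefillBackendBase, RaggedPrefillBackend, and build_prefill_metadata are hypothetical stand-ins, not vLLM classes, and the query_lens and max_query_len entries are invented for illustration; they are not fields of MLACommonPrefillMetadata.

```python
from abc import ABC
from dataclasses import dataclass, field


@dataclass
class PrefillMetadata:
    """Hypothetical stand-in for MLACommonPrefillMetadata."""

    query_start_loc: list[int]  # cumulative query offsets, e.g. [0, 3, 7]
    extras: dict = field(default_factory=dict)  # filled by prepare_metadata()


class PrefillBackendBase(ABC):
    """Hypothetical stand-in for MLAPrefillBackend."""

    def prepare_metadata(self, prefill_metadata: PrefillMetadata) -> None:
        # Default hook in this sketch: no extra preparation needed.
        return None


class RaggedPrefillBackend(PrefillBackendBase):
    """Hypothetical backend that precomputes per-request query lengths."""

    def prepare_metadata(self, prefill_metadata: PrefillMetadata) -> None:
        starts = prefill_metadata.query_start_loc
        # Cumulative offsets [0, 3, 7] -> per-request lengths [3, 4].
        lens = [end - begin for begin, end in zip(starts, starts[1:])]
        prefill_metadata.extras["query_lens"] = lens
        prefill_metadata.extras["max_query_len"] = max(lens, default=0)


def build_prefill_metadata(backend, query_start_loc):
    """Hypothetical builder: construct metadata, then let the backend prepare it."""
    metadata = PrefillMetadata(query_start_loc=list(query_start_loc))
    backend.prepare_metadata(metadata)  # returns None; mutates metadata in place
    return metadata


meta = build_prefill_metadata(RaggedPrefillBackend(), [0, 3, 7])
print(meta.extras)  # {'query_lens': [3, 4], 'max_query_len': 4}
```

Keeping the work in a hook lets the builder stay backend-agnostic while each backend precomputes only what its kernels need.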
MLAPrefillBackendEnum
Bases: Enum
Enumeration of all supported MLA prefill backends.
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
clear_override
get_class
get_class() -> type[MLAPrefillBackend]
Get the backend class (respects overrides).
Returns:
| Type | Description |
|---|---|
| type[MLAPrefillBackend] | The backend class. |
Raises:
| Type | Description |
|---|---|
| ImportError | If the backend class cannot be imported. |
| ValueError | If MLAPrefillBackendEnum.CUSTOM is used without first being registered. |
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_path
get_path() -> str
Get the class path for this backend (respects overrides).
Returns:
| Type | Description |
|---|---|
| str | The fully qualified class path string. |
Raises:
| Type | Description |
|---|---|
| ValueError | If MLAPrefillBackendEnum.CUSTOM is used without first being registered. |
Source code in vllm/v1/attention/backends/mla/prefill/registry.py
get_mla_prefill_backend
get_mla_prefill_backend(
    vllm_config: VllmConfig,
) -> type[MLAPrefillBackend]
Select the MLA prefill backend based on configuration and device.
This function first checks for an explicit user preference set via mla_prefill_backend in AttentionConfig; if none is set, it falls back to automatic, priority-based selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vllm_config | VllmConfig | The vLLM configuration. | required |
Returns:
| Type | Description |
|---|---|
| type[MLAPrefillBackend] | The selected prefill backend class. |
Source code in vllm/v1/attention/backends/mla/prefill/selector.py
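The two-stage selection described above can be sketched as follows. AttentionPrefs, PRIORITY, select_prefill_backend, and the is_supported callback are hypothetical; the backend names are taken from the module table, but the priority order shown is an assumption rather than vLLM's actual order, and the real function may handle an unsupported explicit choice differently than by raising.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttentionPrefs:
    """Hypothetical stand-in for the relevant part of AttentionConfig."""

    mla_prefill_backend: Optional[str] = None


# Hypothetical priority order (highest first); NOT vLLM's actual order.
PRIORITY = ["trtllm_ragged", "flashinfer", "flash_attn"]


def select_prefill_backend(
    prefs: AttentionPrefs, is_supported: Callable[[str], bool]
) -> str:
    """Hypothetical two-stage selection: explicit preference, then priority."""
    # Stage 1: an explicit user preference wins if this device supports it.
    # (Raising on an unsupported choice is an assumption of this sketch.)
    choice = prefs.mla_prefill_backend
    if choice is not None:
        if not is_supported(choice):
            raise ValueError(f"{choice!r} is not supported on this device")
        return choice
    # Stage 2: fall back to the first supported backend in priority order.
    for name in PRIORITY:
        if is_supported(name):
            return name
    raise RuntimeError("no supported MLA prefill backend found")


# Usage: a device that supports only two of the three backends.
supported = {"flashinfer", "flash_attn"}.__contains__
print(select_prefill_backend(AttentionPrefs(), supported))  # flashinfer
print(select_prefill_backend(AttentionPrefs("flash_attn"), supported))  # flash_attn
```

Passing the support check in as a callback keeps the ordering logic testable without a GPU.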
register_mla_prefill_backend
register_mla_prefill_backend(
    backend: MLAPrefillBackendEnum,
    class_path: str | None = None,
) -> Callable[[type], type]
Register or override an MLA prefill backend implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backend | MLAPrefillBackendEnum | The MLAPrefillBackendEnum member to register. | required |
| class_path | str \| None | Optional fully qualified class path. If omitted and the function is used as a decorator, the path is auto-generated from the decorated class. | None |
Returns:
| Type | Description |
|---|---|
| Callable[[type], type] | A decorator if class_path is None; otherwise a no-op callable. |
Examples:
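```python
# Assumes these names are re-exported by the package documented on this page.
from vllm.v1.attention.backends.mla.prefill import (
    MLAPrefillBackend,
    MLAPrefillBackendEnum,
    register_mla_prefill_backend,
)

# Override an existing MLA prefill backend
@register_mla_prefill_backend(MLAPrefillBackendEnum.FLASH_ATTN)
class MyCustomFlashAttn(MLAPrefillBackend):
    ...

# Register a custom third-party MLA prefill backend
@register_mla_prefill_backend(MLAPrefillBackendEnum.CUSTOM)
class MyCustomPrefillBackend(MLAPrefillBackend):
    ...

# Direct registration
register_mla_prefill_backend(
    MLAPrefillBackendEnum.CUSTOM, "my.module.MyCustomPrefillBackend"
)
```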