vllm.model_executor.models.config
Gemma4Config
Bases: VerifyAndUpdateConfig
Source code in vllm/model_executor/models/config.py
verify_and_update_config staticmethod
verify_and_update_config(vllm_config: VllmConfig) -> None
Configure attention for heterogeneous head dimensions.
Gemma4 uses different head dimensions for sliding-window layers (head_dim) and full-attention layers (global_head_dim). The default FA3 backend on Hopper cannot handle head_dim > 256. Without intervention, this results in mixed backend selection across layers and in numerical divergence.
When FA4 is available, we force it for ALL layers. This gives a uniform kernel path and avoids the penalty of mixing FA3 and FA4. When FA4 is not available, we fall back to Triton.
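The selection rule above can be sketched as two small functions: per-layer selection without the override, and the forced uniform choice. This is a hypothetical illustration, not vLLM's implementation. The function names, the backend strings, and the assumption that over-limit layers would otherwise land on FA4 are all assumptions. Only the 256 limit and the FA4-else-Triton order come from the docstring.

```python
from typing import Dict, List

FA3_MAX_HEAD_DIM = 256  # FA3 head_dim limit on Hopper, per the docstring


def per_layer_choice(head_dims: List[int]) -> Dict[int, str]:
    """Hypothetical per-layer selection WITHOUT the override.

    Layers above the FA3 limit land on a different backend (assumed FA4 here),
    so the model runs on a mix of kernels.
    """
    return {
        i: "FA3" if d <= FA3_MAX_HEAD_DIM else "FA4"
        for i, d in enumerate(head_dims)
    }


def forced_uniform_choice(head_dims: List[int], fa4_available: bool) -> Dict[int, str]:
    """Hypothetical sketch of the override: one backend for ALL layers."""
    backend = "FA4" if fa4_available else "TRITON"
    return {i: backend for i in range(len(head_dims))}


# Illustrative head dims (not Gemma4's real values): two sliding-window
# layers followed by one full-attention layer with a larger head_dim.
layers = [256, 256, 512]
print(per_layer_choice(layers))              # {0: 'FA3', 1: 'FA3', 2: 'FA4'}
print(forced_uniform_choice(layers, True))   # {0: 'FA4', 1: 'FA4', 2: 'FA4'}
print(forced_uniform_choice(layers, False))  # {0: 'TRITON', 1: 'TRITON', 2: 'TRITON'}
```

The point of the override is visible in the output. Per-layer selection produces two kernel families, while the forced choice always yields a single backend regardless of each layer's head_dim.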
HybridAttentionMambaModelConfig
Bases: VerifyAndUpdateConfig
verify_and_update_config classmethod
verify_and_update_config(vllm_config: VllmConfig) -> None
Perform early validation and setup for hybrid attention/mamba models.
Block size alignment with mamba page sizes is handled later by Platform.update_block_size_for_backend(), which runs after model layers are constructed and the attention backend is known.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vllm_config | VllmConfig | vLLM Config | required |
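The alignment step handled later by Platform.update_block_size_for_backend() has to satisfy some simple arithmetic. This sketch rests on three hypothetical assumptions:

- an attention KV page grows linearly with block size;
- block sizes must be a multiple of a kernel alignment;
- one attention page must hold at least one mamba state page.

The function and parameter names are illustrative only and are not the signature of Platform.update_block_size_for_backend().

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division for positive integers."""
    return -(-a // b)


def aligned_attn_block_size(
    mamba_page_bytes: int, attn_bytes_per_token: int, kernel_alignment: int
) -> int:
    """Smallest block size that is a multiple of kernel_alignment and whose
    attention page (block_size * attn_bytes_per_token) is at least one
    mamba state page. Hypothetical sketch, not vLLM's code."""
    return kernel_alignment * cdiv(
        mamba_page_bytes, kernel_alignment * attn_bytes_per_token
    )


# Illustrative numbers only.
block = aligned_attn_block_size(1_000_000, 4096, 16)
print(block, block * 4096)  # 256 1048576
```

Dividing by `kernel_alignment * attn_bytes_per_token` in a single ceiling division yields the smallest aligned block size directly, with no search loop.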
LlamaNemotronVLConfig
Bases: VerifyAndUpdateConfig
Config handler for LlamaNemotronVL embedding models.
MambaModelConfig
Bases: VerifyAndUpdateConfig
verify_and_update_config classmethod
verify_and_update_config(vllm_config: VllmConfig) -> None
Enable the FULL_AND_PIECEWISE CUDA graph mode by default (required for good performance of mamba layers in V1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vllm_config | VllmConfig | vLLM Config | required |
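Here is a minimal sketch of "enable by default". It assumes the default applies only when the user has not chosen a CUDA graph mode. `CudaGraphMode`, `CompilationSettings`, and `apply_mamba_cudagraph_default` are hypothetical stand-ins, not vLLM's classes or functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CudaGraphMode(Enum):
    # Hypothetical stand-in for vLLM's CUDA graph mode setting.
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL_AND_PIECEWISE = "full_and_piecewise"


@dataclass
class CompilationSettings:
    # None means "the user did not choose a mode".
    cudagraph_mode: Optional[CudaGraphMode] = None


def apply_mamba_cudagraph_default(settings: CompilationSettings) -> None:
    """Fill in FULL_AND_PIECEWISE only when no mode was chosen (in place)."""
    if settings.cudagraph_mode is None:
        settings.cudagraph_mode = CudaGraphMode.FULL_AND_PIECEWISE


s = CompilationSettings()
apply_mamba_cudagraph_default(s)
print(s.cudagraph_mode)  # CudaGraphMode.FULL_AND_PIECEWISE
```

Mutating the settings object and returning None mirrors the `-> None` signature of verify_and_update_config, which updates the config it is given.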
NemotronHForCausalLMConfig
Bases: VerifyAndUpdateConfig
DEFAULT_MAMBA_SSM_CACHE_DTYPE class-attribute instance-attribute
Defaults to float32, the only dtype known to have no accuracy issues.
update_mamba_ssm_cache_dtype classmethod
update_mamba_ssm_cache_dtype(
*,
cache_config: CacheConfig,
hf_config: PretrainedConfig,
) -> None
For NemotronH models, when mamba_ssm_cache_dtype is 'auto' (or not explicitly set), set it to the value specified in the HF config. If the HF config does not specify one, use float32.
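The resolution order described above can be sketched as a pure function. The HF config lookup is reduced to a plain argument because the docstring does not name the HF config attribute. `resolve_nemotron_h_ssm_dtype` is a hypothetical helper, and treating None as "not explicitly set" is an assumption.

```python
from typing import Optional

DEFAULT_MAMBA_SSM_CACHE_DTYPE = "float32"  # mirrors the documented default


def resolve_nemotron_h_ssm_dtype(
    user_value: Optional[str], hf_config_value: Optional[str]
) -> str:
    """Hypothetical sketch: user choice > HF config value > float32."""
    # An explicit, non-"auto" user choice is left untouched.
    if user_value is not None and user_value != "auto":
        return user_value
    # Otherwise prefer the HF config's value, then the float32 default.
    if hf_config_value is not None:
        return hf_config_value
    return DEFAULT_MAMBA_SSM_CACHE_DTYPE


print(resolve_nemotron_h_ssm_dtype("auto", None))      # float32
print(resolve_nemotron_h_ssm_dtype(None, "bfloat16"))  # bfloat16
```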
Qwen3_5ForConditionalGenerationConfig
Bases: VerifyAndUpdateConfig
verify_and_update_config staticmethod
verify_and_update_config(vllm_config: VllmConfig) -> None
For Qwen3.5 models, when mamba_ssm_cache_dtype is 'auto' (or not explicitly set), set it to the value of the HF config's mamba_ssm_dtype field. If the user explicitly overrides it to a different value, emit a warning.
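Here is a hedged sketch of this behavior, based on two assumptions. First, an explicit override is still honored after the warning. Second, the HF config always carries mamba_ssm_dtype. `resolve_qwen3_5_ssm_dtype` is a hypothetical helper, and the dtype strings in the usage lines are illustrative rather than Qwen3.5's actual values.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_qwen3_5_ssm_dtype(user_value: Optional[str], hf_mamba_ssm_dtype: str) -> str:
    """Hypothetical sketch of the 'auto' resolution plus override warning."""
    # "auto" or unset: take the HF config's mamba_ssm_dtype.
    if user_value is None or user_value == "auto":
        return hf_mamba_ssm_dtype
    # Explicit override that disagrees with the checkpoint: keep it, but warn.
    if user_value != hf_mamba_ssm_dtype:
        logger.warning(
            "mamba_ssm_cache_dtype=%s differs from the HF config's mamba_ssm_dtype=%s",
            user_value,
            hf_mamba_ssm_dtype,
        )
    return user_value


logging.basicConfig(level=logging.WARNING)
print(resolve_qwen3_5_ssm_dtype("auto", "float32"))      # float32
print(resolve_qwen3_5_ssm_dtype("bfloat16", "float32"))  # bfloat16, after a logged warning
```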