vllm.models.deepseek_v4.nvidia.ops

NVIDIA-only (cutedsl/cutlass) kernels for DeepSeek V4.

These modules import cutlass/cutedsl at module top level, so they must not be imported on non-CUDA platforms. Callers should gate on `vllm.utils.import_utils.has_cutedsl()` before importing from here.

This `__init__` deliberately imports nothing. Re-exporting the cutedsl modules here would eagerly import cutlass, and with it initialize the CUDA driver, for anyone who imports `vllm.models.deepseek_v4`. That breaks subprocesses forked afterwards, because a CUDA context initialized in the parent cannot be used in a forked child. Import the leaf modules directly under a `has_cutedsl()`/`is_cuda()` gate.
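The gating pattern described above can be sketched as a small helper that defers the import until a capability check passes. This is an illustrative sketch only: `import_if_available` is a hypothetical helper, not part of vLLM, and the commented usage assumes `has_cutedsl` is importable from `vllm.utils.import_utils` as stated above.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def import_if_available(
    module_name: str, gate: Callable[[], bool]
) -> Optional[ModuleType]:
    """Import `module_name` only if `gate()` returns True.

    Hypothetical helper for illustration; not a vLLM API. When the gate
    fails, nothing is imported, so no top-level cutlass/cutedsl import runs.
    """
    if not gate():
        return None
    return importlib.import_module(module_name)


# Assumed usage inside vLLM (not executed here):
#
#   from vllm.utils.import_utils import has_cutedsl
#   ops = import_if_available(
#       "vllm.models.deepseek_v4.nvidia.ops.prepare_megamoe", has_cutedsl
#   )
#   if ops is None:
#       ...  # fall back to a non-CUDA code path
```

Because the gate is evaluated before `importlib.import_module` is called, a failing gate never triggers the leaf module's top-level imports, which is the property the package relies on.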

Modules:

Name	Description
`fused_indexer_q_cutedsl`
`prepare_megamoe`	Triton input-staging kernel for DeepSeek V4 MegaMoE.
`sparse_attn_compress_cutedsl`	CuTe DSL sparse-attention compressor for DeepSeek V4.