
vllm.models.deepseek_v4.nvidia.ops

NVIDIA-only (cutedsl/cutlass) kernels for DeepSeek V4.

These modules import cutlass/cutedsl at module top level, so they must not be imported on non-CUDA platforms. Callers should gate on vllm.utils.import_utils.has_cutedsl() before importing from here.

This __init__ deliberately imports nothing. Re-exporting the cutedsl modules here would eagerly import cutlass, which initializes the CUDA driver, for anyone who imports vllm.models.deepseek_v4, and that breaks forked subprocesses. Import the leaf modules directly under a has_cutedsl()/is_cuda() gate.

Modules:

Name                            Description
fused_indexer_q_cutedsl
prepare_megamoe                 Triton input-staging kernel for DeepSeek V4 MegaMoE.
sparse_attn_compress_cutedsl    CuTe DSL sparse-attention compressor for DeepSeek V4.