vllm.model_executor.kernels.linear.nvfp4.flashinfer
FlashInferB12xNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's b12x CuTe DSL warp-level MMA kernel (SM120+).
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
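The "(SM120+)" note indicates that the b12x kernel targets GPUs with compute capability 12.0 or newer. A minimal sketch of such a capability gate follows. The helper names `sm_version` and `meets_sm_requirement` are hypothetical and are not part of vLLM or FlashInfer; the sketch assumes the capability is given as a `(major, minor)` tuple, which is the shape returned by `torch.cuda.get_device_capability()`.

```python
# Hypothetical helpers for illustration only; not vLLM's or FlashInfer's API.

def sm_version(capability: tuple[int, int]) -> int:
    """Convert a (major, minor) compute capability into an SM number, e.g. (12, 0) -> 120."""
    major, minor = capability
    return major * 10 + minor


def meets_sm_requirement(capability: tuple[int, int], minimum_sm: int = 120) -> bool:
    """Return True if the device's SM number is at least `minimum_sm`."""
    return sm_version(capability) >= minimum_sm
```

With PyTorch installed and a CUDA device present, the tuple could be obtained with `torch.cuda.get_device_capability()` and passed in directly. How vLLM itself decides which kernel to use is not shown on this page.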
FlashInferCudnnNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's cuDNN wrapper.
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
FlashInferCutlassNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's CUTLASS wrapper.
Source code in vllm/model_executor/kernels/linear/nvfp4/flashinfer.py
FlashInferTrtllmNvFp4LinearKernel
Bases: NvFp4LinearKernel
NVFP4 GEMM via FlashInfer's TensorRT-LLM wrapper.