vllm.v1.worker.encoder_cudagraph_defs
Data transfer objects for encoder CUDA graph management.
EncoderCudaGraphCaptureInputs dataclass
Everything needed for one CUDA graph capture.
Returned by prepare_encoder_cudagraph_capture_inputs().
Source code in vllm/v1/worker/encoder_cudagraph_defs.py
EncoderCudaGraphConfig dataclass
Configuration for encoder CUDA graph management.
Provided by the model at init time via get_encoder_cudagraph_config(). Values are fixed for the lifetime of the manager.
buffer_keys instance-attribute
Keys for the tensor buffers recorded into the CUDA graph. Before replay, the manager zeros these buffers and then slice-copies the new data into them.
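The zero-then-slice-copy step can be illustrated with a minimal sketch. It is not the manager's actual code: plain Python lists stand in for fixed-size GPU tensors, and the function name `refill_buffers` and the key `"cu_seqlens"` are hypothetical, chosen only for illustration.

```python
# Hypothetical stand-in: plain lists play the role of fixed-size GPU buffers.
def refill_buffers(buffers, new_values):
    """Zero every buffer, then slice-copy the new data into its prefix."""
    for key, buf in buffers.items():
        data = new_values[key]
        if len(data) > len(buf):
            raise ValueError(f"{key}: {len(data)} values exceed capacity {len(buf)}")
        # Write in place only: a captured CUDA graph reads from fixed
        # addresses, so the buffer object itself must never be replaced.
        buf[:] = [0] * len(buf)
        buf[: len(data)] = data


# "cu_seqlens" is an illustrative key, not taken from the vLLM source.
buffers = {"cu_seqlens": [0] * 6}
alias = buffers["cu_seqlens"]  # what a captured graph would keep pointing at

refill_buffers(buffers, {"cu_seqlens": [0, 4, 8, 12]})
refill_buffers(buffers, {"cu_seqlens": [0, 5]})  # smaller batch on the next replay
print(alias)
```

Zeroing first matters when a replay carries less data than the previous one: without it, the tail of the buffer would still hold values from the earlier, larger batch.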
max_frames_per_video class-attribute instance-attribute
max_frames_per_video: int = 1
Maximum number of frames per video. Only relevant when "video" is in modalities. Image-only models can use the default of 1.
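As a rough illustration of how an image-only configuration differs from a video-capable one, the sketch below uses a hypothetical mirror class, `SketchEncoderConfig`. The field types and the use of `frozen=True` are assumptions made here to model "fixed for the lifetime of the manager"; the real `EncoderCudaGraphConfig` may be declared differently and may carry other fields.

```python
from dataclasses import dataclass, FrozenInstanceError


# Hypothetical mirror of the config; field types are assumptions.
@dataclass(frozen=True)
class SketchEncoderConfig:
    modalities: tuple[str, ...]
    buffer_keys: tuple[str, ...]
    max_frames_per_video: int = 1  # the default suits image-only models


# Image-only model: the default of 1 is left untouched.
image_only = SketchEncoderConfig(
    modalities=("image",),
    buffer_keys=("cu_seqlens",),  # illustrative key name
)

# Video-capable model: the frame limit is stated explicitly.
with_video = SketchEncoderConfig(
    modalities=("image", "video"),
    buffer_keys=("cu_seqlens",),
    max_frames_per_video=8,  # illustrative value
)

# In this sketch, changing a value after construction is rejected.
try:
    with_video.max_frames_per_video = 16
    mutated = True
except FrozenInstanceError:
    mutated = False
```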
EncoderCudaGraphReplayBuffers dataclass
New buffer values for graph replay, computed by the model from actual batch inputs.
Returned by prepare_encoder_cudagraph_replay_buffers(). Keys match EncoderCudaGraphConfig.buffer_keys.
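The key-matching requirement could be checked along the following lines. This is a sketch under stated assumptions: `check_replay_keys` is a hypothetical helper, not part of vLLM, and the replay values are modeled as a plain dict rather than the real dataclass.

```python
def check_replay_keys(buffer_keys, replay_values):
    """Raise KeyError unless replay_values has exactly the configured keys."""
    expected = set(buffer_keys)
    provided = set(replay_values)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")


# Illustrative key names, not taken from the vLLM source.
configured = ["cu_seqlens", "pos_ids"]

# Matching keys pass silently.
check_replay_keys(configured, {"cu_seqlens": [0, 4], "pos_ids": [0, 1, 2, 3]})

# A missing key is reported before any buffer would be touched.
try:
    check_replay_keys(configured, {"cu_seqlens": [0, 4]})
    error_message = None
except KeyError as exc:
    error_message = str(exc)
```

Checking both directions catches a model that omits a buffer as well as one that supplies a buffer the captured graph never reads.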
EncoderItemSpec dataclass
Description of a single encoder input item.
Returned by get_encoder_cudagraph_item_specs() to describe each image or video in a batch, so that the manager does not need to understand model-specific input formats.