vllm.utils.numa_utils ¶
NUMA binding utilities for vLLM worker processes.
Adapted in part from SGLang's NUMA helper implementation: https://github.com/sgl-project/sglang/blob/ba6d54d0f08f82f42b8224908ae2459a496b31b3/python/sglang/srt/utils/numa_utils.py
_PctSku ¶
_can_set_mempolicy ¶
_can_set_mempolicy() -> bool
Check whether the current process can use NUMA memory policy syscalls.
Source code in vllm/utils/numa_utils.py
_get_cpu_binding ¶
Return the CPU list a process should be pinned to (or None).
_get_enginecore_numa_nodes ¶
Return the sorted, unique NUMA nodes of the EngineCore's DP shard.
_get_gpu_index ¶
Compute the physical GPU index used for NUMA lookup.
_get_numactl_enginecore_args ¶
_get_numactl_enginecore_args(
parallel_config,
local_rank: int,
dp_local_rank: int | None = None,
) -> str
Compute the numactl args for an EngineCore subprocess.
--numa-bind-cpus is deliberately ignored here. The user provides a per-worker CPU list, and binding EngineCore to any single entry of that list would shrink its cpus_allowed below the strict superset that the workers' --physcpubind launches require. We fall back to --cpunodebind=<shard nodes> instead, which is always a safe superset. PCT auto-detection still applies when the user did not pass --numa-bind-cpus, because its priority-core union across the shard nodes is also a safe superset by construction.
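The fallback order described above can be sketched roughly as follows. This is an illustration only, not the module's code: `enginecore_numactl_args` and its parameters are hypothetical names, and using `--physcpubind` for the PCT case is an assumption based on the formatting hint in `_maybe_get_pct_cpu_binding`.

```python
def enginecore_numactl_args(shard_nodes, pct_cpus=None):
    """Hypothetical helper: format numactl args for an EngineCore process.

    shard_nodes: NUMA nodes of the EngineCore's DP shard (any order, may repeat).
    pct_cpus:    PCT priority cores across those nodes, or None when PCT
                 auto-detection did not trigger.
    """
    if pct_cpus:
        # Assumption: the priority-core union is passed as a comma-joined CPU list.
        cpus = ",".join(str(c) for c in sorted(set(pct_cpus)))
        return f"--physcpubind={cpus}"
    # Default: bind to every node of the shard, a superset of any worker's CPUs.
    nodes = ",".join(str(n) for n in sorted(set(shard_nodes)))
    return f"--cpunodebind={nodes}"
```

Note that the user's per-worker CPU list never enters this function, which mirrors the rule that --numa-bind-cpus is ignored for EngineCore.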
_get_numactl_executable ¶
Return the fixed wrapper executable used to launch numactl.
_get_numactl_worker_args ¶
_get_numactl_worker_args(
parallel_config,
local_rank: int,
dp_local_rank: int | None = None,
) -> str
Compute the numactl args for a single TP/PP worker subprocess.
_is_auto_numa_available ¶
_is_auto_numa_available() -> bool
Check whether automatic GPU-to-NUMA detection should be attempted.
_maybe_get_pct_cpu_binding ¶
Return the union of PCT priority cores across numa_nodes (or None).
PCT (Priority Core Turbo) lets a subset of cores boost above the rest; we want workers and the EngineCore on those cores. The Linux kernel does not expose PCT membership without root, so we use the empirical heuristic documented above _PCT_CAPABLE_SKUS: priority cores within each NUMA node satisfy cpu_id % stride in (0, 1) intersected with the node's cpulist, where stride is the SKU's logical CPUs per priority group (16 on 64-core SKUs, 18 on 72-core SKUs). Only triggers on the SKUs in _PCT_CAPABLE_SKUS with the expected CPPC highest_perf signal; on any other host it returns None and the caller falls back to the default NUMA-node bind.
Returns the sorted CPU ids as a list[int]; the caller is expected to format them for the chosen tool (e.g. comma-joined for numactl --physcpubind).
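A minimal sketch of the stride heuristic described above, assuming each node's cpulist has already been parsed into integers. The function name `priority_cores` and its arguments are hypothetical and are not part of vLLM.

```python
def priority_cores(node_cpulists, stride):
    """Hypothetical helper: union of PCT priority cores across NUMA nodes.

    node_cpulists: one iterable of logical CPU ids per NUMA node.
    stride:        logical CPUs per priority group (16 or 18 per the text above).
    """
    cores = set()
    for cpus in node_cpulists:
        # Keep only CPUs of this node whose id falls on the first two
        # slots of each stride-sized group.
        cores.update(c for c in cpus if c % stride in (0, 1))
    return sorted(cores)


# Example: two 32-CPU nodes with a stride of 16.
cpus = priority_cores([range(0, 32), range(32, 64)], stride=16)
physcpubind_arg = ",".join(str(c) for c in cpus)
```

With these inputs `cpus` is `[0, 1, 16, 17, 32, 33, 48, 49]`, and `physcpubind_arg` is the comma-joined form a caller could pass to numactl --physcpubind.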
_pct_sku_config cached ¶
_pct_sku_config() -> _PctSku | None
Detect a PCT-capable Granite Rapids Xeon with PCT enabled.
See the comment block above _PCT_CAPABLE_SKUS for the full context (why we hard-code SKUs, why we read CPPC highest_perf, etc.).
Returns the matching _PctSku config when both gates hold:
* /proc/cpuinfo model name contains an SKU listed in _PCT_CAPABLE_SKUS.
* /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf matches that SKU's expected highest_perf.
Otherwise returns None and the caller falls back to the default NUMA-node bind.
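The two gates can be illustrated with the sketch below. It takes the file contents as strings so it runs anywhere. The `PctSku` fields, the `EXAMPLE_SKUS` table, the SKU name, and the highest_perf value are placeholders invented for illustration; the real `_PctSku` and `_PCT_CAPABLE_SKUS` may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PctSku:
    # Hypothetical fields; the real _PctSku may differ.
    stride: int
    highest_perf: int


# Placeholder table: the SKU name and highest_perf value are not real data.
EXAMPLE_SKUS = {"EXAMPLE-64C": PctSku(stride=16, highest_perf=100)}


def detect_pct_sku(cpuinfo_text, highest_perf_text, skus=EXAMPLE_SKUS):
    """Return the matching PctSku when both gates hold, else None."""
    # Gate 1: the model name in /proc/cpuinfo contains a listed SKU.
    model = ""
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            model = line.partition(":")[2].strip()
            break
    for name, sku in skus.items():
        if name in model:
            # Gate 2: CPPC highest_perf matches the value expected for that SKU.
            try:
                perf = int(highest_perf_text.strip())
            except ValueError:
                return None
            return sku if perf == sku.highest_perf else None
    return None
```

On a real host the two arguments would come from /proc/cpuinfo and /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf, with an unreadable file treated as a failed gate.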
_pct_sku_from_cpuinfo ¶
_pct_sku_from_cpuinfo() -> _PctSku | None
Return the _PctSku config for this host's SKU, or None.
Reads /proc/cpuinfo's model name and looks the SKU up in _PCT_CAPABLE_SKUS. Returns None when the host is not a known PCT-capable Granite Rapids Xeon (or when /proc/cpuinfo is unreadable).
configure_subprocess ¶
configure_subprocess(
vllm_config: VllmConfig,
local_rank: int,
dp_local_rank: int | None = None,
process_kind: str = "worker",
)
Temporarily replace the multiprocessing executable with a numactl wrapper.
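One way to picture the "temporarily replace" behaviour is a context manager around the standard library's `multiprocessing.set_executable`. This is a sketch under assumptions, not the actual implementation: `temporary_mp_executable` and the wrapper path are hypothetical, and the override only affects child processes started with the spawn or forkserver start methods.

```python
import contextlib
import multiprocessing
import multiprocessing.spawn
import os


@contextlib.contextmanager
def temporary_mp_executable(wrapper_path):
    """Hypothetical helper: swap the executable used to launch child processes."""
    original = multiprocessing.spawn.get_executable()
    multiprocessing.set_executable(wrapper_path)
    try:
        # Processes started inside this block are launched through wrapper_path.
        yield
    finally:
        # Always restore the interpreter path, even if process creation fails.
        multiprocessing.set_executable(original)
```

A caller would start the worker or EngineCore process inside the `with` block, so that only that one launch goes through the wrapper and later launches use the normal interpreter again.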
get_auto_numa_nodes cached ¶
Auto-detect NUMA nodes for all visible GPUs.