Steps

Install KEDA in your Kubernetes cluster and install the NVIDIA DCGM Exporter DaemonSet to expose GPU utilization metrics to Prometheus
Create a KEDA ScaledObject with a Prometheus trigger that sets the Prometheus server address and a DCGM metric query (e.g., DCGM_FI_DEV_GPU_UTIL averaged across GPUs)
Set the trigger's threshold to the target GPU utilization percentage above which KEDA should scale out (e.g., 70), and set minReplicaCount and maxReplicaCount to bound scaling
Set minReplicaCount to 0 if you want scale-to-zero during idle periods; KEDA scales the workload back up from zero once the metric exceeds the trigger's activationThreshold
Deploy your inference workload as a Kubernetes Deployment or StatefulSet that the ScaledObject targets; confirm each pod requests GPU resources (e.g., an nvidia.com/gpu limit) so the scheduler places pods on GPU nodes
Verify autoscaling behavior by generating inference load, then check KEDA events with kubectl describe scaledobject and watch replica counts with kubectl get hpa and kubectl get pods
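
The ScaledObject described in the steps above can be sketched as follows. This is a minimal illustration rather than a verified manifest: the object name, the Deployment name, the Prometheus address, and the activation threshold of 5 are all assumptions made for the example, and the query is the simple average mentioned in the steps. The script builds the manifest as a dict and prints it as JSON, which kubectl accepts in place of YAML.

```python
import json


def build_scaled_object(name, target, server_address, query,
                        threshold, activation_threshold,
                        min_replicas, max_replicas):
    """Return a KEDA ScaledObject manifest (Prometheus trigger) as a dict."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # Workload to scale; KEDA assumes a Deployment when kind is omitted.
            "scaleTargetRef": {"name": target},
            "minReplicaCount": min_replicas,   # 0 enables scale-to-zero
            "maxReplicaCount": max_replicas,
            "triggers": [
                {
                    "type": "prometheus",
                    # Trigger metadata values are passed as strings.
                    "metadata": {
                        "serverAddress": server_address,
                        "query": query,
                        "threshold": str(threshold),
                        "activationThreshold": str(activation_threshold),
                    },
                }
            ],
        },
    }


manifest = build_scaled_object(
    name="gpu-inference-scaler",   # hypothetical ScaledObject name
    target="gpu-inference",        # hypothetical Deployment name
    server_address="http://prometheus.monitoring.svc:9090",  # assumed address
    query="avg(DCGM_FI_DEV_GPU_UTIL)",
    threshold=70,                  # target GPU utilization percentage
    activation_threshold=5,        # assumed value for waking from zero replicas
    min_replicas=0,
    max_replicas=4,
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file and passing it to kubectl apply -f is one way to try it; adjust the names and the Prometheus address to match your cluster first.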

Known gotchas

KEDA is built with CGO_ENABLED=0 and cannot read GPU metrics via NVML directly; all GPU telemetry must flow through an external exporter such as DCGM Exporter, so do not attempt to use NVML-based metrics natively in KEDA
Scale-to-zero with GPU pods incurs longer cold start times than CPU pods because GPU driver initialization and model loading add significant startup overhead; set activation thresholds conservatively
DCGM metrics report utilization per GPU, not per pod; if multiple inference pods share a node, an aggregated query may trigger scaling when only a single pod is saturated, so the signal does not necessarily reflect load across all pods
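
The per-GPU aggregation gotcha above can be illustrated with plain arithmetic. The sketch below uses made-up utilization samples (not real measurements) and compares two common aggregations against a threshold of 70: an average can hide one saturated GPU, while a maximum reacts to it even though the other GPUs are mostly idle. Neither choice is presented here as the correct one; which fits depends on how pods map to GPUs in your cluster.

```python
from statistics import mean

# Hypothetical per-GPU utilization samples (percent): one busy GPU, two idle.
gpu_util = [95.0, 20.0, 15.0]
threshold = 70.0


def aggregate(samples):
    """Return the average and maximum of per-GPU utilization samples."""
    return {"avg": mean(samples), "max": max(samples)}


agg = aggregate(gpu_util)

# Compare each aggregation with the scaling threshold.
avg_triggers = agg["avg"] > threshold
max_triggers = agg["max"] > threshold

if __name__ == "__main__":
    print(f"avg={agg['avg']:.1f} triggers scaling: {avg_triggers}")
    print(f"max={agg['max']:.1f} triggers scaling: {max_triggers}")
```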

keda.sh · 5 steps · unrated

Configure KEDA to autoscale GPU inference pods on Kubernetes using NVIDIA DCGM Exporter metrics
