Steps

Deploy the NVIDIA DCGM Exporter DaemonSet to expose GPU metrics (DCGM_FI_DEV_GPU_UTIL) as Prometheus metrics on each GPU node
Bridge the DCGM metrics to the autoscaler: either install the Prometheus Adapter to serve them through the Kubernetes custom metrics API for an HPA, or use KEDA's prometheus scaler to query Prometheus directly from a trigger
Define a ScaledObject targeting the inference Deployment with a prometheus trigger pointing to the DCGM GPU utilization metric query
Set minReplicaCount, maxReplicaCount, and a target GPU utilization threshold (e.g., 70%) so KEDA adds replicas when measured utilization rises above the target
Add the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: 'false' to the Deployment's pod template (not the Deployment's own metadata) so the Cluster Autoscaler does not evict GPU pods prematurely during node scale-down
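
The ScaledObject and annotation steps above can be sketched as manifests built in Python and emitted as JSON, which kubectl accepts alongside YAML. This is a minimal sketch under stated assumptions, not a verified configuration: the Deployment name, namespace, and Prometheus address are hypothetical placeholders, and the unfiltered avg() query assumes every GPU that Prometheus scrapes belongs to this workload. The trigger sets metricType to Value so the averaged utilization is compared with the threshold as a ratio; with the default AverageValue type the query result would additionally be divided across replicas, which does not suit an already-averaged percentage.

```python
import json

# Hypothetical names: replace with the values from your own cluster.
DEPLOYMENT = "gpu-inference"
NAMESPACE = "inference"
PROMETHEUS_URL = "http://prometheus-server.monitoring.svc:9090"


def build_scaled_object(target_util=70, min_replicas=1, max_replicas=8):
    """Return a KEDA ScaledObject manifest as a plain dict."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{DEPLOYMENT}-gpu", "namespace": NAMESPACE},
        "spec": {
            "scaleTargetRef": {"name": DEPLOYMENT},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "pollingInterval": 30,   # seconds between Prometheus queries
            "cooldownPeriod": 300,   # seconds before scaling back to zero
            "triggers": [
                {
                    "type": "prometheus",
                    # Compare the averaged value with the threshold directly.
                    "metricType": "Value",
                    "metadata": {
                        "serverAddress": PROMETHEUS_URL,
                        # Assumption: all scraped GPUs serve this Deployment.
                        "query": "avg(DCGM_FI_DEV_GPU_UTIL)",
                        # KEDA trigger metadata values are strings.
                        "threshold": str(target_util),
                    },
                }
            ],
        },
    }


def build_eviction_patch():
    """Return a merge patch that annotates the pod template, not the Deployment."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_scaled_object(), indent=2))
```

Saved under a hypothetical file name such as scaled_object.py, the output can be piped to kubectl apply -f -. The dict from build_eviction_patch() can be serialized the same way and passed to kubectl patch with --type merge.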

Known gotchas

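GPU node scale-down is slow: the Cluster Autoscaler removes a node only after it has stayed unneeded for a delay (10 minutes by default), and GPU nodes are expensive to keep idle; tune KEDA's cooldownPeriod with that delay in mind, noting that cooldownPeriod only governs the final scale-down to zero replicas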
DCGM_FI_DEV_GPU_UTIL reports per-GPU utilization, not per-pod; in a multi-tenant cluster where pods from several workloads share a node, filter or aggregate the metric by the pod and namespace labels that DCGM Exporter attaches to each sample, so that other tenants' GPUs do not drive your scaling decisions
KEDA's prometheus scaler requires the Prometheus server to be reachable from the KEDA operator pod; network policy misconfiguration is a common cause of scaler failures that manifest as replicas stuck at minReplicaCount
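
Two of the gotchas above lend themselves to a small helper: scoping the utilization query to one workload's pods, and checking that Prometheus answers from inside the cluster. The following is a rough sketch, assuming the DCGM Exporter samples carry namespace and pod labels under those exact names (depending on the scrape configuration they may appear as exported_namespace and exported_pod instead). The function names and example arguments are hypothetical; /api/v1/query is the standard Prometheus instant-query endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def gpu_util_query(namespace, pod_regex):
    """Build a PromQL query averaging GPU utilization over one workload's pods."""
    # Assumption: label names are "namespace" and "pod"; adjust if relabeled.
    return (
        "avg(DCGM_FI_DEV_GPU_UTIL{"
        f'namespace="{namespace}", pod=~"{pod_regex}"'
        "})"
    )


def instant_query_url(server, query):
    """Build the Prometheus instant-query URL for the given PromQL expression."""
    return f"{server.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def prometheus_reachable(server, query, timeout=5):
    """Return True if Prometheus answers the query with status 'success'."""
    try:
        with urlopen(instant_query_url(server, query), timeout=timeout) as resp:
            return json.load(resp).get("status") == "success"
    except (OSError, ValueError):
        # Covers connection errors, timeouts, HTTP errors, and invalid JSON.
        return False
```

Running prometheus_reachable() from a pod in the same namespace as the KEDA operator, and therefore subject to the same network policies, helps distinguish a blocked connection from a query that simply returns no data. The string returned by gpu_util_query() can be used as the query value of the prometheus trigger.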

keda.sh · 6 steps · unrated

Autoscale GPU inference pods with Kubernetes HPA using DCGM Exporter metrics

docs.nvidia.com/datacenter/cloud-native · 5 steps · unrated

Create a KEDA ScaledObject to autoscale a GPU inference Deployment based on a custom metrics trigger

keda.sh · 5 steps · unrated

Configure GPU node autoscaling on Kubernetes with KEDA and DCGM GPU utilization metrics
