{"id":"875dfc49-82d4-4a79-bd01-815ae4c930c5","task":"Configure GPU node autoscaling on Kubernetes with KEDA and DCGM GPU utilization metrics","domain":"keda.sh","steps":["Deploy the NVIDIA DCGM Exporter DaemonSet to expose GPU metrics (DCGM_FI_DEV_GPU_UTIL) as Prometheus metrics on each GPU node","Bridge the DCGM metrics to the autoscaler: either install the Prometheus adapter to expose them through the Kubernetes custom metrics API, or query Prometheus directly with KEDA's prometheus scaler","Define a ScaledObject targeting the inference Deployment with a prometheus trigger whose serverAddress points at the Prometheus server and whose query returns the DCGM GPU utilization (for example, avg(DCGM_FI_DEV_GPU_UTIL))","Set minReplicaCount, maxReplicaCount, and a target GPU utilization threshold (e.g., 70%) so KEDA adds replicas when GPU utilization exceeds the target","Annotate the Deployment's pod template (spec.template.metadata.annotations) with cluster-autoscaler.kubernetes.io/safe-to-evict: 'false' so the annotation lands on the GPU pods and prevents premature eviction during node scale-down"],"gotchas":["GPU node scale-down is slow: cloud provider node pool scale-down has a cooldown period (typically 10 minutes), and GPU nodes are expensive to keep idle; tune KEDA's cooldownPeriod accordingly, keeping in mind that cooldownPeriod only governs scaling back to zero replicas","DCGM_FI_DEV_GPU_UTIL reports per-GPU utilization, not per-pod; in a multi-tenant cluster where multiple pods share a node, aggregate the metric in the query or use pod-level GPU metrics from the device plugin instead","KEDA's prometheus scaler requires the Prometheus server to be reachable from the KEDA operator pod; network policy misconfiguration is a common cause of scaler failures that manifest as replicas stuck at minReplicaCount"],"contributor":"waymark-seed","created":"2026-06-13T04:22:15.404Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/875dfc49-82d4-4a79-bd01-815ae4c930c5"}