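Ensure your KServe installation runs in Serverless (Knative-based) deployment mode, which the canary rollout strategy requires; you can request it per InferenceService with the serving.kserve.io/deploymentMode: Serverless annotation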
Deploy the initial InferenceService with the production model version and verify that it receives 100% of traffic (kubectl get isvc <name> reports the traffic percentage in its LATEST column)
Apply an updated InferenceService manifest that specifies the new model version and sets spec.predictor.canaryTrafficPercent to the share of traffic the new version should receive (e.g., 20)
KServe automatically tracks the last good revision (the last one rolled out with 100% of traffic) and splits incoming requests between it and the new revision according to canaryTrafficPercent
Monitor latency and error rates for both the production and canary revisions, using Prometheus or your existing observability stack to compare them side by side
If the canary performs well, promote it by removing the canaryTrafficPercent field and reapplying the manifest, which routes 100% of traffic to the new version; if it performs poorly, set canaryTrafficPercent to 0 to send all traffic back to the last good revision
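The steps above can be sketched in Python as a manifest builder for each stage (initial deploy, canary, promotion, rollback) plus a simple promote/rollback decision based on error counts. This is a minimal sketch, assuming you apply manifests as JSON (for example by piping the output to kubectl apply -f -); the model name, storage URIs, and the thresholds in decide_canary are hypothetical placeholders, not KServe defaults. The manifest fields follow the KServe v1beta1 InferenceService schema, where canaryTrafficPercent sits under spec.predictor.

```python
import json
from typing import Optional

API_VERSION = "serving.kserve.io/v1beta1"


def inference_service(name: str, storage_uri: str, model_format: str = "sklearn",
                      canary_percent: Optional[int] = None) -> dict:
    """Build an InferenceService manifest as a plain dict.

    canary_percent=None omits canaryTrafficPercent (initial deploy or promotion);
    an integer sets the share of traffic sent to the new revision
    (0 sends all traffic back to the last good revision).
    """
    predictor: dict = {
        "model": {
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        }
    }
    if canary_percent is not None:
        if not 0 <= canary_percent <= 100:
            raise ValueError("canaryTrafficPercent must be between 0 and 100")
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": API_VERSION,
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # Canary rollout only works in Serverless (Knative-based) mode.
            "annotations": {"serving.kserve.io/deploymentMode": "Serverless"},
        },
        "spec": {"predictor": predictor},
    }


def decide_canary(prod_errors: int, prod_total: int,
                  canary_errors: int, canary_total: int,
                  min_requests: int = 500, max_ratio: float = 1.5,
                  abs_tolerance: float = 0.001) -> str:
    """Return "wait", "promote", or "rollback" from error counts.

    All thresholds are hypothetical; tune them to your own SLOs.
    """
    if canary_total < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    prod_rate = prod_errors / prod_total if prod_total else 0.0
    canary_rate = canary_errors / canary_total
    # Allow the canary a relative margin over production plus a small absolute one.
    allowed = prod_rate * max_ratio + abs_tolerance
    return "rollback" if canary_rate > allowed else "promote"


if __name__ == "__main__":
    name = "my-model"                       # hypothetical model name
    v1 = "gs://example-bucket/models/v1"    # hypothetical storage URIs
    v2 = "gs://example-bucket/models/v2"
    stages = {
        "initial": inference_service(name, v1),
        "canary": inference_service(name, v2, canary_percent=20),
        "promote": inference_service(name, v2),
        "rollback": inference_service(name, v2, canary_percent=0),
    }
    decision = decide_canary(prod_errors=10, prod_total=10_000,
                             canary_errors=3, canary_total=2_000)
    # Keep the canary manifest in place while waiting for more traffic.
    next_stage = "canary" if decision == "wait" else decision
    print(json.dumps(stages[next_stage], indent=2))
```

The rollback manifest keeps the new storageUri on purpose: rolling back only sets canaryTrafficPercent to 0, so all traffic returns to the last good revision while the new spec stays in the resource.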
Known gotchas
Canary rollout is supported only in Serverless (Knative-based) deployment mode; in raw deployment mode, canaryTrafficPercent has no effect and all traffic goes to the new version immediately
Setting canaryTrafficPercent to 0 does not delete the canary revision; it remains in the cluster, consuming minimal resources, until the InferenceService is updated or deleted
The traffic split is enforced by Knative's revision routing, not by KServe itself; misconfigured Knative networking can make the actual split diverge from canaryTrafficPercent without surfacing an obvious error
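Because a broken split raises no obvious error, one way to catch it is to compare observed per-revision request counts with the configured canaryTrafficPercent. The sketch below is a hypothetical check, assuming each request goes to the canary independently with probability canaryTrafficPercent/100 (a binomial model) and that your metrics stack can give you request counts per revision; the function name and the z-score threshold of 4 are illustrative choices, not part of KServe or Knative.

```python
import math


def split_looks_wrong(canary_requests: int, total_requests: int,
                      canary_percent: int, z_threshold: float = 4.0) -> bool:
    """Return True if the observed canary share is implausibly far from canary_percent.

    Uses a normal approximation to the binomial distribution; the threshold is
    a hypothetical default that trades false alarms for sensitivity.
    """
    if total_requests <= 0:
        raise ValueError("need at least one request to compare")
    expected = canary_percent / 100
    observed = canary_requests / total_requests
    if expected in (0.0, 1.0):
        # With a 0% or 100% split, any traffic on the wrong side means misrouting.
        return observed != expected
    stderr = math.sqrt(expected * (1 - expected) / total_requests)
    z = abs(observed - expected) / stderr
    return z > z_threshold


if __name__ == "__main__":
    # Hypothetical counts: 500 of 1000 requests hit a canary configured for 20%.
    print(split_looks_wrong(canary_requests=500, total_requests=1000, canary_percent=20))
```

With 1000 requests and a 20% target, the standard error is about 1.3 percentage points, so a 50% observed share sits far outside normal sampling noise and points at a routing problem rather than chance.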