Integrate a feature flag SDK (e.g., LaunchDarkly, Unleash, or a homegrown flags service) into the model serving layer so each inference request can be evaluated against a flag at runtime
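A minimal sketch of per-request flag evaluation in the serving layer. It assumes a hypothetical `FlagClient` interface with an `is_enabled(flag_key, user_id)` method. This is not the API of LaunchDarkly, Unleash, or any other SDK. `InMemoryFlagClient`, `serve`, and the flag key `model-canary` are illustrative names, and a real SDK would need its own evaluation call adapted behind the same interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class FlagClient(Protocol):
    """Hypothetical minimal flag interface; real SDKs expose their own APIs."""

    def is_enabled(self, flag_key: str, user_id: str) -> bool: ...


@dataclass
class InMemoryFlagClient:
    """Toy 'homegrown' flag service: a flag is on for an explicit set of users."""

    enabled_users: dict[str, set[str]] = field(default_factory=dict)

    def is_enabled(self, flag_key: str, user_id: str) -> bool:
        return user_id in self.enabled_users.get(flag_key, set())


# A model is anything that maps a request payload to a prediction.
Model = Callable[[dict], float]


def serve(request: dict, flags: FlagClient, stable: Model, canary: Model) -> dict:
    # Evaluate the flag on every request, at runtime, so the rollout can
    # change without redeploying the serving code.
    use_canary = flags.is_enabled("model-canary", request["user_id"])
    model = canary if use_canary else stable
    return {
        "prediction": model(request),
        "model_version": "canary" if use_canary else "stable",
    }
```

Returning `model_version` with every response gives downstream logging the cohort label it needs for monitoring.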
Define a flag with percentage-based rollout rules: route N% of requests to the canary model and the remainder to the stable model, assigning each request by a hash of the user ID or session ID, or by a random value per request
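One way to implement percentage-based rollout, sketched under the assumption that you control bucketing yourself (managed SDKs typically do this internally): hash the flag key together with a stable unit ID into one of 10,000 buckets, and send the unit to the canary when its bucket falls below the rollout threshold. The names `bucket` and `route` are hypothetical.

```python
import hashlib

BUCKETS = 10_000  # each bucket represents 0.01% of traffic


def bucket(flag_key: str, unit_id: str) -> int:
    """Map (flag, unit) to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{flag_key}:{unit_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer; SHA-256 output is
    # close to uniform, so buckets are evenly populated.
    return int.from_bytes(digest[:8], "big") % BUCKETS


def route(flag_key: str, unit_id: str, canary_percent: float) -> str:
    """Return 'canary' for roughly canary_percent% of unit IDs, else 'stable'."""
    threshold = canary_percent / 100 * BUCKETS
    return "canary" if bucket(flag_key, unit_id) < threshold else "stable"
```

Salting the hash with the flag key keeps assignments independent across flags. Because the comparison is `bucket < threshold`, raising the percentage only adds units to the canary; no unit already in the canary is moved back to stable mid-rollout.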
Deploy both the stable and canary model versions behind the same endpoint or load balancer, ensuring both are healthy and warm before enabling traffic
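A sketch of a pre-traffic readiness check, assuming each model version can be called directly as a Python callable and that a handful of representative sample requests are available. The helpers `is_warm_and_healthy` and `ready_for_traffic`, the warm-up round count, and the worst-case latency criterion are illustrative choices, not a standard.

```python
import time
from typing import Callable, Sequence

Model = Callable[[dict], float]


def is_warm_and_healthy(
    model: Model,
    samples: Sequence[dict],
    latency_budget_s: float,
    warmup_rounds: int = 3,
) -> bool:
    """Send warm-up traffic, then check that one timed pass fits the budget."""
    try:
        # Untimed warm-up passes: trigger lazy loading, caches, compiled paths.
        for _ in range(warmup_rounds):
            for sample in samples:
                model(sample)
        # Timed pass: the slowest single request must fit the latency budget.
        worst = 0.0
        for sample in samples:
            start = time.perf_counter()
            model(sample)
            worst = max(worst, time.perf_counter() - start)
    except Exception:
        # Any error during warm-up or the timed pass counts as unhealthy.
        return False
    return worst <= latency_budget_s


def ready_for_traffic(
    stable: Model, canary: Model, samples: Sequence[dict], budget_s: float
) -> bool:
    # Enable the flag only once *both* versions pass, so neither cohort
    # sees cold-start latency or errors.
    return all(is_warm_and_healthy(m, samples, budget_s) for m in (stable, canary))
```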
Enable the flag at a low percentage (e.g., 5%) and monitor canary-specific metrics — latency, error rate, prediction distribution, and business KPIs — using a dimension or label that identifies the canary cohort
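Comparing cohorts requires the cohort label on every request record. The sketch below assumes hypothetical `RequestLog` records and computes per-cohort request count, error rate, and nearest-rank p95 latency. In production these figures would more likely come from metrics tagged with a cohort dimension in your monitoring system than from in-process aggregation.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestLog:
    cohort: str  # "canary" or "stable": the label that identifies the cohort
    latency_ms: float
    error: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def cohort_summary(logs: list[RequestLog]) -> dict[str, dict[str, float]]:
    # Group records by cohort label, then summarize each group separately.
    by_cohort: dict[str, list[RequestLog]] = defaultdict(list)
    for log in logs:
        by_cohort[log.cohort].append(log)
    summary = {}
    for cohort, rows in by_cohort.items():
        summary[cohort] = {
            "requests": len(rows),
            "error_rate": sum(r.error for r in rows) / len(rows),
            "p95_latency_ms": p95([r.latency_ms for r in rows]),
        }
    return summary
```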
Gradually increase the canary traffic percentage as confidence grows; use automated checks or manual gates to halt rollout if canary metrics degrade beyond a defined threshold
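An automated gate can be as simple as comparing canary metrics against the stable cohort before each ramp step. In this sketch the ramp schedule `RAMP_STEPS`, the absolute error-rate tolerance, and the latency ratio are hypothetical thresholds to tune. On failure it returns 0%, which disables the flag; holding at the current percentage for a manual decision is an equally valid policy.

```python
from dataclasses import dataclass

RAMP_STEPS = [5, 10, 25, 50, 100]  # hypothetical schedule, in percent


@dataclass
class CohortMetrics:
    error_rate: float
    p95_latency_ms: float


def gate(
    canary: CohortMetrics,
    stable: CohortMetrics,
    max_error_delta: float = 0.005,
    max_latency_ratio: float = 1.2,
) -> bool:
    """True if the canary stays within the thresholds relative to stable."""
    error_ok = canary.error_rate - stable.error_rate <= max_error_delta
    latency_ok = canary.p95_latency_ms <= stable.p95_latency_ms * max_latency_ratio
    return error_ok and latency_ok


def next_percentage(current: int, canary: CohortMetrics, stable: CohortMetrics) -> int:
    """Advance one ramp step if the gate passes; otherwise roll back to 0%."""
    if not gate(canary, stable):
        return 0
    later = [step for step in RAMP_STEPS if step > current]
    # At the final step there is nothing further to ramp to.
    return later[0] if later else current
```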
Complete the rollout by setting the flag to 100% canary traffic, then retire the stable model version and remove the flag from the code path
Known gotchas
Sticky sessions are critical for user-facing models: if successive requests from the same user can be served by the canary on one call and the stable model on the next, the inconsistent outputs make for a poor experience; hash on a stable user identifier rather than a random value per request
Prediction distribution drift between the canary and stable model is a leading indicator of a problem before business KPIs react; log and compare the output distributions in real time rather than waiting for downstream metric degradation
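One common way to compare output distributions is the population stability index (PSI) over binned predictions. This sketch assumes scalar scores roughly in [0, 1] and fixed bin edges; `bin_fractions` and `psi` are illustrative names, and the `eps` floor for empty bins is an assumption needed to avoid `log(0)`.

```python
import bisect
import math
from typing import Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> list[float]:
    """Fraction of values per bin; out-of-range values go to the edge bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # clamp into [0, num_bins - 1]
        counts[i] += 1
    return [c / len(values) for c in counts]


def psi(
    stable_preds: Sequence[float],
    canary_preds: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Population stability index of canary outputs relative to stable outputs."""
    expected = bin_fractions(stable_preds, edges)
    actual = bin_fractions(canary_preds, edges)
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        score += (a - e) * math.log(a / e)
    return score
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds are best calibrated against the natural variation between two time windows of the stable model alone.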
Rolling back by disabling the flag is fast but does not undo side effects of canary predictions (e.g., written recommendations, logged decisions); design the rollback plan to account for any state changes made by the canary model
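A rollback plan is easier if every side effect carries provenance. The sketch assumes a hypothetical store in which each persisted recommendation is tagged with the `model_version` that produced it, so canary-written rows can be found and removed after the flag is disabled. `RecommendationStore`, `rollback_version`, and the version strings are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    user_id: str
    items: list[str]
    model_version: str  # provenance tag stored alongside every side effect


@dataclass
class RecommendationStore:
    """Toy stand-in for a database table of persisted recommendations."""

    rows: list[Recommendation] = field(default_factory=list)

    def write(self, rec: Recommendation) -> None:
        self.rows.append(rec)

    def rollback_version(self, model_version: str) -> list[Recommendation]:
        """Remove and return every row written by the given model version."""
        removed = [r for r in self.rows if r.model_version == model_version]
        self.rows = [r for r in self.rows if r.model_version != model_version]
        return removed
```

Deletion only covers reversible state. Effects that have already reached users (sent notifications, executed decisions) need compensating actions instead, and the returned rows identify which users to revisit with the stable model.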