Define the SLO error rate as a recording rule that computes the ratio of bad events to total events, and record the target availability alongside it
Calculate burn rate as the ratio of the current error rate to the acceptable steady-state error rate derived from the SLO target (1 minus the target); a burn rate of 1 spends the error budget exactly over the SLO period
Create a fast-burn alert using a short window (such as 1 hour) and a high burn-rate threshold to catch sudden outages
Create a slow-burn alert using a longer window (such as 6 hours or 3 days) and a lower burn-rate threshold to catch gradual degradation
Require both the short and long windows to be above threshold simultaneously to reduce false positives from transient spikes
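The steps above can be sketched in Python to make the arithmetic concrete. This is a minimal illustration, not a monitoring-system configuration: the function names are hypothetical, and the 14.4 threshold and the event counts in the usage lines are assumed example values, not figures from this guide.

```python
def error_budget(slo_target: float) -> float:
    """Acceptable steady-state error ratio, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    if total_events <= 0:
        # No traffic in the window: nothing was burned.
        return 0.0
    return (bad_events / total_events) / error_budget(slo_target)


def multiwindow_alert(long_burn: float, short_burn: float, threshold: float) -> bool:
    """Fire only when both windows exceed the same burn-rate threshold."""
    return long_burn > threshold and short_burn > threshold


if __name__ == "__main__":
    slo = 0.999  # assumed example target
    # Assumed example counts for a long window and a short window.
    long_burn = burn_rate(bad_events=1_800, total_events=100_000, slo_target=slo)
    short_burn = burn_rate(bad_events=200, total_events=10_000, slo_target=slo)
    print(multiwindow_alert(long_burn, short_burn, threshold=14.4))
```

Requiring both windows means a spike that has already ended (high long-window burn, low short-window burn) does not fire, and neither does a brief blip that has not yet moved the long window.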
Known gotchas
Using only a short window for burn-rate alerts causes alert fatigue from transient spikes that do not materially impact the error budget; always pair with a longer window check
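The burn-rate thresholds must be calibrated to the SLO target percentage, because the error rate that triggers an alert is the burn-rate threshold multiplied by (1 minus the target); a threshold of 14.4 corresponds to a 1.44% error rate at a 99.9% target but to an unreachable 144% at a 90% target, so generic values used without adjustment can produce meaningless alerts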
Burn-rate alerting counts events (requests), not time; services with very low traffic may have high apparent burn rates from a small number of errors, requiring minimum-volume guards
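Two of the gotchas above lend themselves to a short Python sketch: checking that a threshold is reachable for a given target, and guarding against low traffic. All function names here are hypothetical, and the 30-day period (720 hours), the 2% budget fraction, and the minimum event count are assumed example values chosen for illustration.

```python
from typing import Optional


def threshold_for_budget(budget_fraction: float,
                         slo_period_hours: float,
                         window_hours: float) -> float:
    """Burn rate that spends `budget_fraction` of the budget within `window_hours`."""
    return budget_fraction * slo_period_hours / window_hours


def error_ratio_threshold(burn_threshold: float, slo_target: float) -> float:
    """Error ratio that corresponds to a burn-rate threshold at this SLO target."""
    return burn_threshold * (1.0 - slo_target)


def is_reachable(burn_threshold: float, slo_target: float) -> bool:
    """A threshold needing more than 100% errors can never fire."""
    return error_ratio_threshold(burn_threshold, slo_target) <= 1.0


def guarded_burn_rate(bad_events: int, total_events: int,
                      slo_target: float, min_events: int) -> Optional[float]:
    """Return the burn rate, or None when the window has too few events to judge."""
    if total_events < max(min_events, 1):
        return None
    return (bad_events / total_events) / (1.0 - slo_target)


if __name__ == "__main__":
    # Assumed example: spend 2% of a 30-day budget within 1 hour.
    fast = threshold_for_budget(0.02, slo_period_hours=720, window_hours=1)
    print(fast, is_reachable(fast, 0.999), is_reachable(fast, 0.90))
    # Two errors out of ten requests is skipped rather than treated as an outage.
    print(guarded_burn_rate(2, 10, 0.999, min_events=100))
```

Returning None instead of 0 for a low-volume window keeps "not enough data" distinct from "healthy", so the caller can decide whether to suppress the alert or fall back to a longer window.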