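Define an SLO spec in Sloth's native YAML format with version (e.g., prometheus/v1), service, slos[].name, slos[].objective (e.g., 99.9), and slos[].sli.events, whose error_query and total_query fields hold the PromQL for the error and total event rates (use the {{.window}} placeholder for the range interval)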
Run sloth generate -i slo-spec.yaml -o slo-rules.yaml to output Prometheus recording rules and multi-window multi-burn-rate alerting rules conforming to the Google SRE Workbook alert method
Apply the generated rules to Prometheus, either as a Kubernetes PrometheusRule resource or by listing the file under rule_files in prometheus.yml; the rules create metrics such as slo:sli_error:ratio_rate5m and slo:current_burn_rate:ratio
The generated alerts come in page and ticket severities, each pairing a long window with a short window: a page alert fires when both windows exceed the burn-rate threshold (e.g., 14.4x over 1h and 5m, or 6x over 6h and 30m); a ticket alert fires at lower thresholds over longer windows (e.g., 3x over 1d and 2h, or 1x over 3d and 6h)
Wire the page-severity alert to your PagerDuty or Opsgenie integration via Alertmanager routes, matching on the sloth_severity label (page or ticket) that Sloth sets on each alert
Validate burn rate calculations: a 14.4x burn rate sustained for 1 hour consumes 2% of a 30-day error budget, and if it continued the budget would be exhausted in about 50 hours (30 days / 14.4), regardless of the SLO target; use this as a sanity check for threshold tuning
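The sanity check in the last step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming a 30-day SLO window; the function names are hypothetical and not part of Sloth.

```python
# Burn-rate arithmetic, assuming a 30-day SLO window (an assumption, adjust as needed).
WINDOW_HOURS = 30 * 24  # 720 hours


def hours_to_exhaustion(burn_rate: float, window_hours: float = WINDOW_HOURS) -> float:
    """Hours until the whole error budget is spent at a constant burn rate."""
    return window_hours / burn_rate


def budget_consumed(burn_rate: float, alert_window_hours: float,
                    window_hours: float = WINDOW_HOURS) -> float:
    """Fraction of the error budget spent if the burn rate holds for the alert window."""
    return burn_rate * alert_window_hours / window_hours


def error_ratio_threshold(burn_rate: float, objective_percent: float) -> float:
    """Error ratio that corresponds to the burn rate for an objective such as 99.9."""
    return burn_rate * (1 - objective_percent / 100)


if __name__ == "__main__":
    # (burn rate, long window in hours) pairs from the SRE Workbook method
    for rate, long_hours in [(14.4, 1), (6, 6), (3, 24), (1, 72)]:
        print(
            f"{rate}x over {long_hours}h: "
            f"{budget_consumed(rate, long_hours):.0%} of budget, "
            f"exhausted in {hours_to_exhaustion(rate):.1f}h, "
            f"error ratio > {error_ratio_threshold(rate, 99.9):.4f} for a 99.9% SLO"
        )
```

For a 99.9% objective, the 14.4x threshold corresponds to an error ratio of about 1.44%, which is the number to compare against the generated alert expression.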
Known gotchas
Each multi-window alert requires both its short and long windows to be breached at the same time; a short burst that recovers before the long window crosses the threshold will not page. This is by design to reduce false positives, but it can miss severe, fast-recovering incidents
Sloth-generated alert rules reference the recording-rule metrics, so load the recording rules together with (or before) the alert rules; if the recording rules are missing, the alert expressions return no data and the alerts never fire
OpenSLO YAML is supported by Sloth as an alternative to its native format; if adopting OpenSLO for portability, verify that your Sloth version supports the OpenSLO spec version you use, as format support has varied across releases
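The multi-window gotcha above can be illustrated with a small simulation. This is a simplified sketch, not how Prometheus evaluates the generated rules: it assumes one sample per minute, steady traffic of 1000 requests per minute, and a 99.9% objective with the 14.4x page threshold over 5m and 1h windows. All names are hypothetical.

```python
def window_error_ratio(errors, totals, end, minutes):
    """Error ratio over the `minutes` per-minute samples ending at index `end`."""
    start = max(0, end - minutes + 1)
    total = sum(totals[start:end + 1])
    return sum(errors[start:end + 1]) / total if total else 0.0


def page_fires(errors, totals, end, objective_percent=99.9,
               burn_rate=14.4, short_min=5, long_min=60):
    """True only when BOTH the short and the long window exceed the threshold."""
    threshold = burn_rate * (1 - objective_percent / 100)
    short_ratio = window_error_ratio(errors, totals, end, short_min)
    long_ratio = window_error_ratio(errors, totals, end, long_min)
    return short_ratio > threshold and long_ratio > threshold


MINUTES = 120
totals = [1000] * MINUTES  # assumed steady traffic

# Scenario 1: a 2-minute burst of 20% errors that then recovers.
burst = [0] * MINUTES
burst[60] = burst[61] = 200

# Scenario 2: the same 20% error rate, but sustained from minute 60 onward.
sustained = [0] * 60 + [200] * 60

burst_pages = [t for t in range(MINUTES) if page_fires(burst, totals, t)]
sustained_pages = [t for t in range(MINUTES) if page_fires(sustained, totals, t)]

if __name__ == "__main__":
    print("burst pages at minutes:", burst_pages)
    print("sustained outage first pages at minute:", sustained_pages[0])
```

In this sketch the burst never pages because the 1h window stays under the threshold, while the sustained outage pages a few minutes after it starts, once the 1h window has accumulated enough errors.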