Ensure the application emits a Prometheus histogram metric (not a summary) for request latency; histograms expose _bucket, _count, and _sum series that Prometheus can use for arbitrary percentile computation at query time.
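Define the SLO target for p99 latency, e.g., 99% of requests must complete in under 500ms over a 30-day window; express this as the fraction of requests counted in the 500ms bucket (le="0.5" if the histogram records seconds) over total requests, and make sure the histogram actually has a bucket boundary at the threshold.
Write a recording rule that computes the SLI as a ratio: the good events are requests that fall within the latency threshold (the _bucket series whose le label equals the threshold; classic buckets are cumulative, so that one series already includes every faster request and lower buckets must not be added to it), and total events are all requests (the _count series, which matches the le="+Inf" bucket); use histogram_fraction() if your Prometheus version supports it for your histogram type, or otherwise the threshold bucket's rate divided by the _count rate.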
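The ratio in this step can be checked by hand. The following is a minimal Python sketch of the arithmetic for a classic histogram, where each le bucket is cumulative; it is not a recording rule. The bucket layout, the counts, and the name latency_sli are made up for illustration.

```python
from typing import Dict

def latency_sli(buckets: Dict[float, float], threshold: float) -> float:
    """Fraction of requests that completed within `threshold` seconds.

    `buckets` maps each `le` upper bound to the increase of its cumulative
    count over the window, already summed across instances. The float("inf")
    entry plays the role of the le="+Inf" bucket, i.e. the total.
    """
    if threshold not in buckets:
        # Without a boundary at the threshold the ratio can only be estimated.
        raise ValueError(f"{threshold} is not a bucket boundary")
    total = buckets[float("inf")]
    if total == 0:
        raise ValueError("no requests in the window")
    # Buckets are cumulative, so the threshold bucket alone is the good count.
    return buckets[threshold] / total

# Hypothetical counts for one window.
window = {0.1: 700.0, 0.25: 900.0, 0.5: 990.0, 1.0: 998.0, float("inf"): 1000.0}
sli = latency_sli(window, 0.5)
print(sli)
```

Adding the 0.1, 0.25, and 0.5 entries together would count the fastest requests three times, which is why only the threshold bucket appears in the numerator.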
Create a recording rule for the error ratio (1 - SLI) and use it in a multi-burn-rate alerting rule set following the same fast/slow burn pattern as availability SLOs.
For more accurate high-percentile computation, configure the application SDK to emit native histograms (exponential histograms in OTel, or enable --enable-feature=native-histograms on Prometheus); native histograms derive their bucket boundaries from a resolution setting instead of a hand-picked list, which largely removes bucket misconfiguration as a source of error.
Visualize the SLO on a Grafana dashboard using histogram_quantile(0.99, ...) in PromQL for real-time p99, and the recording-rule SLI ratio for error budget tracking; show both views to distinguish instantaneous latency from SLO compliance.
Known gotchas
histogram_quantile() in Prometheus interpolates linearly within a bucket, i.e. it assumes observations are uniformly distributed between the bucket's bounds, so a reported p99 can be off by up to the width of the bucket it falls in; likewise, if the SLO threshold falls between two bucket boundaries, the good-request fraction is only an approximation that can under- or overcount by up to that bucket's worth of requests.
Summary metrics (as opposed to histograms) compute their quantiles on the client side, and those quantiles cannot be meaningfully aggregated across multiple instances; use histograms for SLO computation across horizontally scaled services.
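To see how the within-bucket assumption shapes the result, here is a simplified Python sketch of the interpolation for a classic histogram. It is an illustration, not Prometheus's implementation, and it leaves out edge cases such as negative bounds. The bucket counts are invented.

```python
from typing import List, Tuple

def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Simplified quantile estimate from cumulative buckets.

    `buckets` is a list of (le, cumulative_count) pairs sorted by le,
    ending with the float("inf") bucket. Requires 0 < q <= 1.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if upper == float("inf"):
        # The quantile lies past the last finite bound; report that bound.
        return buckets[-2][0]
    lower = buckets[index - 1][0] if index > 0 else 0.0
    below = buckets[index - 1][1] if index > 0 else 0.0
    # Assume observations are spread uniformly between `lower` and `upper`.
    return lower + (upper - lower) * (rank - below) / (count - below)

# Hypothetical cumulative counts; the p99 rank (990) lands in the 0.5s-1.0s bucket.
data = [(0.25, 900.0), (0.5, 980.0), (1.0, 998.0), (float("inf"), 1000.0)]
p99 = histogram_quantile(0.99, data)
print(round(p99, 3))
```

The estimate is placed inside the 0.5s to 1.0s bucket purely by interpolation; the true p99 could be anywhere in that range, which is why a bucket boundary at the SLO threshold matters more than the displayed quantile.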
Very sparse histograms (few requests per evaluation interval) produce noisy SLI ratios that oscillate between 0 and 1; apply a minimum request volume condition (e.g., only alert when _count rate exceeds a threshold) to suppress alerts during low-traffic windows.
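The minimum-volume condition amounts to refusing to compute a ratio when traffic is too thin. A small sketch follows; the floor of 1 request per second and the names are hypothetical and would need tuning per service.

```python
from typing import Optional

MIN_REQUESTS_PER_SECOND = 1.0  # hypothetical floor; tune per service

def gated_error_ratio(good_rate: float, total_rate: float) -> Optional[float]:
    """Return the error ratio, or None when traffic is too low to trust it."""
    if total_rate < MIN_REQUESTS_PER_SECOND:
        # One slow request out of two would otherwise read as 50% errors.
        return None
    return 1.0 - good_rate / total_rate

quiet = gated_error_ratio(good_rate=0.0, total_rate=0.02)
busy = gated_error_ratio(good_rate=99.0, total_rate=100.0)
print(quiet, busy)
```

Returning None rather than 0 keeps "no data" distinct from "no errors", so a dashboard or alert can treat the two cases differently.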