Add the tail sampling processor (tailsamplingprocessor, shipped in otelcol-contrib and configured under the key tail_sampling) to the processors section; set decision_wait long enough for all spans of the longest expected trace to arrive (e.g., 30s)
Define a status_code policy with status_codes: [ERROR] so that every trace containing an error span is kept; do not use always_sample for this, because that policy type keeps every trace regardless of status
Add a latency policy to retain traces whose total duration exceeds a threshold (e.g., threshold_ms: 2000) to capture slow outliers even when the overall error rate is low; the duration is measured from the earliest span start to the latest span end, not per span
Define a probabilistic policy at a low rate (e.g., sampling_percentage: 5) to retain a representative baseline of healthy, fast traces for traffic pattern analysis
Combine conditions with the and policy type, listing the conditions under and_sub_policy, to express rules such as: retain traces that come from a specific service AND contain an error
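Set num_traces (the number of traces held in memory) based on expected trace throughput and the decision_wait duration; as a rough starting point, it should be at least the rate of new traces per second multiplied by decision_wait in seconds, plus headroom. Too small a buffer evicts traces before their decision is made, producing incomplete sampling decisions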
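The steps above can be sketched as a single processors fragment. This is an illustrative outline rather than a drop-in config: the policy names, the service name checkout, and the num_traces value are assumptions chosen for the example, and the fragment still needs receivers, exporters, and a traces pipeline that lists tail_sampling.

```yaml
processors:
  tail_sampling:
    # Wait this long after the first span of a trace before deciding.
    decision_wait: 30s
    # Assumed sizing: about 1,000 new traces/s * 30 s = 30,000, plus headroom.
    num_traces: 50000
    policies:
      # Keep every trace that contains at least one span with status ERROR.
      - name: keep-errors            # hypothetical policy name
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep traces whose overall duration exceeds 2 seconds.
      - name: keep-slow              # hypothetical policy name
        type: latency
        latency:
          threshold_ms: 2000
      # Keep roughly 5% of all traces as a baseline.
      - name: baseline               # hypothetical policy name
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
      # Keep traces that involve the (assumed) checkout service AND contain an error.
      - name: checkout-errors        # hypothetical policy name
        type: and
        and:
          and_sub_policy:
            - name: service-is-checkout
              type: string_attribute
              string_attribute:
                key: service.name
                values: [checkout]
            - name: has-error
              type: status_code
              status_code:
                status_codes: [ERROR]
```

With keep-errors already present, the checkout-errors rule is redundant; it is included only to show the shape of an and policy.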
Known gotchas
All spans for a given trace must arrive at the same Collector instance for tail sampling to work; deploy the loadbalancingexporter on a frontend tier to route by trace ID before the tail sampling tier
decision_wait starts from the first span seen for a trace; if spans arrive late (e.g., async Kafka consumers), increase decision_wait but be aware this linearly increases memory consumption
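Policy order does not decide the outcome: the processor evaluates every policy for a trace and keeps the trace if any policy votes to sample it, barring policies that explicitly exclude traces (for example, ones using invert_match). A status_code error policy therefore keeps error traces wherever it sits relative to a probabilistic policy, and the total sampled volume is the union of all policies rather than the rate of any single one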
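The two-tier layout from the first gotcha might look like the following on the frontend tier. This is a sketch under assumptions: the DNS name tail-sampling-collectors.example.internal is a placeholder for whatever resolves to the tail sampling Collectors, and insecure TLS is used only to keep the example short.

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    # Route by trace ID so every span of a trace reaches the same backend.
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true   # assumption for brevity; use real TLS settings in production
    resolver:
      dns:
        # Placeholder hostname that resolves to the tail sampling tier.
        hostname: tail-sampling-collectors.example.internal
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The tail sampling tier behind this exporter then runs the tail_sampling processor in its own traces pipeline.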