Install arize-phoenix and the OpenInference instrumentation package for your framework (e.g., openinference-instrumentation-openai), which emits OpenTelemetry-compatible spans
Launch Phoenix locally with px.launch_app() or point to a hosted Phoenix instance via the PHOENIX_COLLECTOR_ENDPOINT environment variable
Instrument your LLM calls by registering the tracer provider; spans are automatically captured and sent to Phoenix
After collecting traces, run LLM-as-a-judge evaluators from phoenix.evals (e.g., hallucination, relevance) against the captured span dataset
Review evaluation results in the Phoenix UI, filtering by evaluator label and score to identify failing traces
Export evaluation results or connect Phoenix to a CI pipeline to gate deployments on minimum quality thresholds
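The instrumentation and CI-gating steps above can be sketched as follows. This is a minimal sketch, not a definitive setup: `setup_tracing`, `pass_rate`, and `gate` are hypothetical helper names, the project name is a placeholder, and the row format (one dict per evaluated span with a `label` key) is an assumption about how you export evaluation results. The labels "factual" and "hallucinated" in the usage note are illustrative; check what your evaluator actually emits.

```python
def setup_tracing(project_name="my-llm-app"):
    """Register a tracer provider and instrument the OpenAI client.

    Imports are deferred so this module still loads where Phoenix is not
    installed. Assumes arize-phoenix and openinference-instrumentation-openai
    are installed in the environment that calls this function.
    """
    from phoenix.otel import register
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # With no explicit endpoint, register() is expected to fall back to the
    # PHOENIX_COLLECTOR_ENDPOINT environment variable.
    tracer_provider = register(project_name=project_name)
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider


def pass_rate(rows, passing_label):
    """Fraction of evaluation rows whose 'label' equals passing_label."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row.get("label") == passing_label)
    return hits / len(rows)


def gate(rows, passing_label, min_rate):
    """Hypothetical CI check: True when the pass rate meets the threshold."""
    return pass_rate(rows, passing_label) >= min_rate
```

In a CI job, you might load the exported rows and exit non-zero when the gate fails, for example `sys.exit(0 if gate(rows, "factual", 0.9) else 1)`, so the pipeline blocks the deployment. An empty result set is treated as a failure here, which is a design choice: a run that evaluated nothing should not pass the gate.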
Known gotchas
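Trace persistence depends on how Phoenix is launched: a session started with px.launch_app() may keep data only in memory or temporary storage, so restarting the server can lose all traces unless you configure a persistent backend (SQLite or PostgreSQL)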
LLM-as-a-judge evaluators make additional model API calls for each trace being evaluated, so running evals over large trace sets can be expensive and slow; consider sampling traces before evaluating
PHOENIX_COLLECTOR_ENDPOINT must point at the OTLP port and protocol (gRPC or HTTP) that Phoenix actually exposes; sending one protocol to the other's endpoint can cause spans to be dropped without an obvious error
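A quick sanity check on the endpoint string can catch the protocol mix-up before any spans are sent. The sketch below is a heuristic only: `guess_otlp_protocol` is a hypothetical helper, and it assumes the common defaults of gRPC on port 4317 and an HTTP OTLP path ending in `/v1/traces`. Adjust both if your deployment differs.

```python
from urllib.parse import urlparse

GRPC_PORT = 4317               # assumed default OTLP gRPC port
HTTP_TRACE_PATH = "/v1/traces"  # assumed OTLP HTTP traces path


def guess_otlp_protocol(endpoint):
    """Classify an endpoint as 'grpc', 'http', 'mismatch', or 'unknown'."""
    parsed = urlparse(endpoint)
    looks_grpc = parsed.port == GRPC_PORT
    looks_http = parsed.path.rstrip("/").endswith(HTTP_TRACE_PATH)
    if looks_grpc and looks_http:
        # gRPC port combined with an HTTP-style path: likely misconfigured.
        return "mismatch"
    if looks_grpc:
        return "grpc"
    if looks_http:
        return "http"
    # A bare base URL carries no protocol hint; verify it manually.
    return "unknown"
```

Running this against the value of PHOENIX_COLLECTOR_ENDPOINT at startup and logging a warning on "mismatch" makes the failure visible instead of silent.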