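Install Phoenix and the OpenTelemetry instrumentation: pip install arize-phoenix arize-phoenix-otel arize-phoenix-evals opentelemetry-sdk, plus the OpenInference instrumentor for your framework (for example openinference-instrumentation-langchain or openinference-instrumentation-llama-index)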
Start the Phoenix server locally with python -m phoenix.server.main serve (the UI is then available at http://localhost:6006), or use the hosted Arize Phoenix cloud instead
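Instrument the LLM application with OpenTelemetry: from phoenix.otel import register; tracer_provider = register(project_name='my-project', endpoint='http://localhost:6006/v1/traces'); then activate the OpenInference instrumentor for your framework, e.g. from openinference.instrumentation.langchain import LangChainInstrumentor; LangChainInstrumentor().instrument(tracer_provider=tracer_provider). Once instrumented, spans from supported frameworks such as LangChain or LlamaIndex are captured automatically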
Run the application on representative traffic to collect traces, then inspect the spans in the Phoenix UI under the Traces tab
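Run evaluations against the collected traces: import phoenix as px; traces_df = px.Client().get_spans_dataframe(); rename its columns so they match the template variables (input, reference and output for the hallucination template); then from phoenix.evals import llm_classify, HALLUCINATION_PROMPT_TEMPLATE; results = llm_classify(dataframe=traces_df, template=HALLUCINATION_PROMPT_TEMPLATE, model=eval_model, rails=['hallucinated', 'factual']), where eval_model is the judge model (for example a phoenix.evals.OpenAIModel)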
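Attach the evaluation scores back to the spans: import phoenix as px; from phoenix.trace import SpanEvaluations; px.Client().log_evaluations(SpanEvaluations(eval_name='hallucination', dataframe=results)). The results dataframe must be indexed by span ID so each score lands on the right span; spans exported with px.Client().get_spans_dataframe() are keyed by context.span_id, and llm_classify keeps the index of its input dataframe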
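The data contract behind steps 5 and 6 can be sketched in plain Python: the classification template is filled from same-named columns of each row, the judge's free-text reply is mapped onto exactly one of the rails, and the resulting labels are keyed by span ID. This is a minimal illustration under stated assumptions, not Phoenix's implementation: TEMPLATE is a hypothetical stand-in for HALLUCINATION_PROMPT_TEMPLATE (only its placeholder names matter), and the helper names and the exact snapping rule are made up for this sketch.

```python
import re
from string import Formatter

# Hypothetical stand-in template; the real HALLUCINATION_PROMPT_TEMPLATE text differs.
TEMPLATE = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Answer: {output}\n"
    "Respond with one word: hallucinated or factual."
)
RAILS = ["hallucinated", "factual"]


def template_variables(template: str) -> set:
    """Return the placeholder names a template expects, e.g. {'input', 'output'}."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_columns(rows: list, template: str) -> set:
    """Placeholders that at least one row (a dict of column -> value) cannot fill."""
    needed = template_variables(template)
    return {col for row in rows for col in needed if col not in row}


def snap_to_rails(response: str, rails: list) -> str:
    """Map a free-text judge reply onto exactly one rail, else NOT_PARSABLE."""
    text = response.lower()
    found = [r for r in rails if re.search(rf"\b{re.escape(r.lower())}\b", text)]
    return found[0] if len(found) == 1 else "NOT_PARSABLE"


def labels_by_span(span_ids: list, responses: list, rails: list) -> dict:
    """Key each snapped label by the span it evaluates (hypothetical helper)."""
    if len(span_ids) != len(responses):
        raise ValueError("one judge response per span is required")
    return {sid: snap_to_rails(resp, rails) for sid, resp in zip(span_ids, responses)}
```

For example, missing_columns([{"input": "q", "output": "a"}], TEMPLATE) reports {"reference"}, which is the kind of mismatch to fix by renaming columns before calling llm_classify. A reply that mentions both rails is treated as unparseable rather than guessed.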
Known gotchas
Auto-instrumentation patches framework internals when the instrumentor is activated; call register() and activate the instrumentor at startup, before importing and initializing LangChain, OpenAI or other instrumented libraries, otherwise spans may not be captured
Unless storage is configured explicitly, traces may not survive a restart (a notebook-launched session, for instance, uses a temporary directory); for persistence across server restarts, point the PHOENIX_SQL_DATABASE_URL environment variable at a PostgreSQL or SQLite database
llm_classify sends every row of the dataframe (one per span) to the LLM judge, so rate limits on the judge model can make evaluations of large trace sets slow or cause them to fail; tune the concurrency argument of llm_classify and the judge model's retry settings accordingly
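The rate-limit gotcha comes down to bounded concurrency plus retries with exponential backoff. The sketch below shows that general pattern with asyncio; it does not reproduce llm_classify's internal behaviour. The judge coroutine, RateLimitError and all helper names are hypothetical stand-ins for a real judge client and its HTTP 429 error.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for a judge provider's HTTP 429 error."""


async def call_with_retry(judge, row, max_retries=5, base_delay=0.5):
    """Call judge(row), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await judge(row)
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Wait base_delay * 2**attempt seconds plus up to 10% random jitter.
            delay = base_delay * 2 ** attempt
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))


async def classify_all(judge, rows, concurrency=4, **retry_kwargs):
    """Score every row with at most `concurrency` judge calls in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def one(row):
        async with sem:
            return await call_with_retry(judge, row, **retry_kwargs)

    # gather returns results in input order, so they line up with rows.
    return await asyncio.gather(*(one(r) for r in rows))


def make_flaky_judge(failures_per_row=2):
    """Hypothetical judge that is rate-limited a few times per row, then answers."""
    attempts = {}

    async def judge(row):
        attempts[row] = attempts.get(row, 0) + 1
        if attempts[row] <= failures_per_row:
            raise RateLimitError("429 Too Many Requests")
        return "factual"

    return judge


# Usage: three rows, each rate-limited twice before succeeding.
results = asyncio.run(classify_all(make_flaky_judge(), ["a", "b", "c"], base_delay=0.01))
```

The semaphore caps pressure on the judge's rate limit, while backoff with jitter spreads retries out so that concurrent callers do not all hit the limit again at the same moment.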