Steps

Install the Braintrust SDK (Python or Node.js) and set the BRAINTRUST_API_KEY environment variable
Create or reference an existing project in Braintrust; the SDK auto-creates a project if the name is new
Wrap your LLM call inside braintrust.init_experiment() or use the SDK's Eval() helper, passing the project name and experiment name
Log each input, output, and expected value as a span, and attach scores from your scoring functions
Use the Braintrust UI to compare the current experiment against a baseline experiment on the same dataset
Promote the best-performing experiment variant to be the new baseline for future regression comparisons
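
The logging and comparison steps above can be sketched in plain Python, without the SDK, to show the shape of the data involved. This is a minimal illustration under stated assumptions, not Braintrust's API: DATASET, exact_match, run_variant, mean_score, and the two stand-in tasks are hypothetical names, and a real task would call an LLM. With Braintrust, the equivalent dataset, task, and scorer would be handed to the SDK, and the comparison would happen in the UI.

```python
from statistics import mean
from typing import Callable

# Hypothetical dataset: each case pairs an input with its expected answer.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "3 * 3", "expected": "9"},
    {"input": "10 - 7", "expected": "3"},
]


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 when output equals expected (ignoring outer whitespace), else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_variant(task: Callable[[str], str]) -> list:
    """Run one prompt variant over the dataset and build one record per case."""
    records = []
    for case in DATASET:
        output = task(case["input"])
        records.append({
            "input": case["input"],
            "output": output,
            "expected": case["expected"],
            "scores": {"exact_match": exact_match(output, case["expected"])},
        })
    return records


def mean_score(records: list, name: str) -> float:
    """Average one named score across all records of a run."""
    return mean(r["scores"][name] for r in records)


# Stand-ins for two prompt variants; a real task would call a model here.
def baseline_task(text: str) -> str:
    return {"2 + 2": "4", "3 * 3": "6"}.get(text, "")


def candidate_task(text: str) -> str:
    return {"2 + 2": "4", "3 * 3": "9", "10 - 7": "3"}.get(text, "")


baseline = mean_score(run_variant(baseline_task), "exact_match")
candidate = mean_score(run_variant(candidate_task), "exact_match")
print(f"baseline={baseline:.2f} candidate={candidate:.2f} delta={candidate - baseline:+.2f}")
```

Keeping every score in the 0 to 1 range, as exact_match does, makes the per-variant means directly comparable against the baseline.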

Known gotchas

If an experiment with the same name already exists in the project, Braintrust returns the existing experiment unmodified rather than creating a new one; use unique names or timestamps for iterative runs
Scores must be numeric values between 0 and 1; values outside this range are accepted by the SDK but may render incorrectly in UI comparisons
Braintrust authenticates with an Authorization: Bearer YOUR_API_KEY header; the SDK reads BRAINTRUST_API_KEY from the environment, so a missing variable causes silent no-op logging rather than a loud failure
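
Each of the gotchas above can be guarded against with a small helper that runs before any SDK call. The following is a minimal sketch using only the standard library. The function names unique_experiment_name, clamp_score, and require_api_key are hypothetical and not part of the Braintrust SDK; only the BRAINTRUST_API_KEY variable name comes from the text above.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def unique_experiment_name(base: str) -> str:
    """Append a UTC timestamp so repeated runs do not reuse an experiment name."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base}-{stamp}"


def clamp_score(value: float) -> float:
    """Force a score into the [0, 1] range before it is logged."""
    return max(0.0, min(1.0, float(value)))


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Fail loudly when BRAINTRUST_API_KEY is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("BRAINTRUST_API_KEY", "").strip()
    if not key:
        raise RuntimeError("BRAINTRUST_API_KEY is not set; refusing to run the experiment")
    return key


if __name__ == "__main__":
    print(unique_experiment_name("prompt-v2"))
    print(clamp_score(1.7), clamp_score(-0.2))
```

The timestamp has one-second resolution, so two runs started within the same second would still collide; a random suffix could be appended if that matters. Clamping hides scorer bugs, so raising an error on out-of-range values may be preferable during development.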


Run a Braintrust experiment to benchmark prompt variants and compare scores
