Install the Braintrust SDK (pip install braintrust for Python, npm install braintrust for Node.js) and set the BRAINTRUST_API_KEY environment variable
Create a new project in Braintrust or reference an existing one by name; if the name does not exist yet, the SDK creates the project automatically
Wrap your LLM call inside an experiment created with braintrust.init_experiment(), or use the Eval() helper, passing the project name and an experiment name
Log each test case's input, output, and expected value as a span, and attach the scores produced by your scoring functions
Use the Braintrust UI to compare the current experiment against a baseline experiment on the same dataset
Promote the best-performing experiment variant to be the new baseline for future regression comparisons
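The workflow above can be sketched locally before wiring it to Braintrust. This minimal sketch does not call the Braintrust SDK; the dataset, the fake_llm task, both scorers, and the should_promote rule are hypothetical. It runs a task over a small dataset, records the input, output, expected value, and per-scorer scores for each case (the same fields the logging step attaches to a span), averages the scores, and compares those averages with a baseline run's.

```python
from statistics import mean
from typing import Callable

# Hypothetical dataset: each case pairs an input with its expected value.
DATASET = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3*3", "expected": "9"},
]

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call (returns canned answers)."""
    answers = {"2+2": "4", "capital of France": "paris", "3*3": "9"}
    return answers.get(prompt, "")

def exact_match(output: str, expected: str) -> float:
    """1.0 on an exact string match, else 0.0, so the score stays in [0, 1]."""
    return 1.0 if output == expected else 0.0

def case_insensitive_match(output: str, expected: str) -> float:
    """Like exact_match, but ignores case and surrounding whitespace."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

Scorer = Callable[[str, str], float]

def run_experiment(task: Callable[[str], str], data: list[dict],
                   scorers: dict[str, Scorer]) -> list[dict]:
    """Run the task on every case and record input, output, expected, and scores."""
    records = []
    for case in data:
        output = task(case["input"])
        scores = {name: fn(output, case["expected"]) for name, fn in scorers.items()}
        records.append({"input": case["input"], "output": output,
                        "expected": case["expected"], "scores": scores})
    return records

def summarize(records: list[dict]) -> dict[str, float]:
    """Mean score per scorer across all records."""
    if not records:
        return {}
    names = records[0]["scores"].keys()
    return {name: mean(r["scores"][name] for r in records) for name in names}

def compare(current: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Per-scorer difference; a positive value means the current run beats the baseline."""
    return {name: current[name] - baseline[name] for name in current}

def should_promote(diff: dict[str, float]) -> bool:
    """Hypothetical policy: no scorer regressed and at least one improved."""
    return all(d >= 0 for d in diff.values()) and any(d > 0 for d in diff.values())

# Usage: score the current variant and compare it with a hypothetical earlier baseline.
scorers = {"exact_match": exact_match, "case_insensitive": case_insensitive_match}
current = summarize(run_experiment(fake_llm, DATASET, scorers))
baseline = {"exact_match": 0.5, "case_insensitive": 0.5}
diff = compare(current, baseline)
print(current, diff, should_promote(diff))
```

Keeping every scorer inside [0, 1] makes per-scorer means directly comparable between runs. The promotion rule here is only one reasonable choice; a real setup would likely add a tolerance so that run-to-run noise does not block or trigger a promotion.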
Known gotchas
If an experiment with the same name already exists in the project, Braintrust returns the existing experiment unmodified rather than creating a new one; use unique names (for example, with a timestamp suffix) for iterative runs
Scores are meant to be numeric values between 0 and 1; values outside this range are accepted by the SDK but may render incorrectly in UI comparisons
Braintrust authenticates requests with an Authorization: Bearer YOUR_API_KEY header; the SDK reads BRAINTRUST_API_KEY from the environment, so if that variable is missing, logging silently becomes a no-op instead of failing loudly
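Each of these gotchas can be guarded against before anything reaches Braintrust. The sketch below uses only the Python standard library; the helper names unique_experiment_name, validate_scores, and require_api_key, the timestamp naming scheme, and the choice to raise exceptions are assumptions for illustration, not part of the Braintrust SDK.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional

def unique_experiment_name(base: str, now: Optional[datetime] = None) -> str:
    """Hypothetical helper: append a UTC timestamp so reruns never reuse a name."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now.strftime('%Y%m%d-%H%M%S')}"

def validate_scores(scores: Mapping[str, object]) -> None:
    """Hypothetical helper: raise if any score is non-numeric or outside [0, 1]."""
    for name, value in scores.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"score {name!r} must be a number, got {type(value).__name__}")
        # NaN also fails this chained comparison, so it is rejected too.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score {name!r}={value} is outside [0, 1]")

def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: fail loudly instead of letting logging become a no-op."""
    env = os.environ if env is None else env
    key = env.get("BRAINTRUST_API_KEY", "")
    if not key:
        raise RuntimeError("BRAINTRUST_API_KEY is not set; refusing to start an experiment that would log nothing")
    return key

# Usage: build a fresh experiment name and check a score dict before logging it.
print(unique_experiment_name("summarizer-v2"))
validate_scores({"exact_match": 1.0, "similarity": 0.42})
```

Call require_api_key() once at startup and pass the result of unique_experiment_name() as the experiment name when you create the experiment. Second-resolution timestamps can still collide if two runs start within the same second; appending a short random suffix avoids that.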