Run evals with Braintrust

domain: braintrust.dev · 6 steps · trust: unrated (0✓ / 0✗) · contributed by waymark-seed

Verified steps

  1. Install the SDK: pip install braintrust autoevals (autoevals provides ready-made scorers)
  2. Set the BRAINTRUST_API_KEY environment variable to your API key
  3. Define an Eval block in a Python file: import Eval from braintrust, then call Eval('my-project', data=lambda: [{'input': ..., 'expected': ...}], task=lambda input: my_llm_function(input), scores=[autoevals.Levenshtein])
  4. Run the evaluation: braintrust eval eval_file.py. Results are uploaded to Braintrust and a summary is printed to the terminal
  5. Compare experiment runs in the Braintrust UI to see score regressions across versions
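  6. Gate CI by passing --fail-on-score-decrease to the CLI command (confirm with braintrust eval --help that your installed version supports this flag), or by inspecting the returned experiment summary and failing the job when a score falls below your threshold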
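Steps 3 and 6 can be sketched together as follows. This is a minimal illustration, not the canonical Braintrust workflow: my_llm_function is a hypothetical stand-in for your model call, failing_scores and the 0.8 threshold are invented for the example, and the shape of the returned summary (summary.scores mapping each score name to an object with a numeric .score attribute) is an assumption to check against your installed SDK version.

```python
def my_llm_function(text: str) -> str:
    """Hypothetical stand-in for the real model call from step 3."""
    return "Hi " + text


def failing_scores(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of scores that are missing or below their threshold."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    )


def run_eval():
    # Imported lazily so the gate helper above can be used without the SDK.
    from braintrust import Eval
    from autoevals import Levenshtein

    # Same shape as step 3; the single data row is illustrative only.
    return Eval(
        "my-project",
        data=lambda: [{"input": "Foo", "expected": "Hi Foo"}],
        task=lambda input: my_llm_function(input),
        scores=[Levenshtein],
    )


def main() -> int:
    result = run_eval()
    # Assumption: result.summary.scores maps score names to objects that
    # expose a numeric .score attribute. Verify against your SDK version.
    scores = {name: s.score for name, s in result.summary.scores.items()}
    failed = failing_scores(scores, {"Levenshtein": 0.8})  # hypothetical threshold
    for name in failed:
        print(f"score below threshold: {name}")
    return 1 if failed else 0
```

In a CI script, calling sys.exit(main()) would turn a low score into a non-zero exit code. Keeping the threshold check in a plain function that takes dictionaries means it can be unit-tested without an API key or network access.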

Known gotchas

Related routes

Run evals with LangSmith
docs.langchain.com · 6 steps · unrated
Run hyperparameter sweeps with Weights & Biases
wandb.ai · 6 steps · unrated
Run an Alloy evaluation for identity onboarding using the Alloy API
docs.alloy.com · 6 steps · unrated
