Install promptfoo globally with npm install -g promptfoo, or run it without installing via npx promptfoo@latest
Create a promptfooconfig.yaml file defining providers (LLM endpoints), prompts, and test cases with assert blocks specifying expected behavior (e.g., type: contains, value: 'expected phrase')
Run evals locally to verify configuration: npx promptfoo eval prints a results table in the terminal and stores the run locally for the viewer; pass --output to also write a results file
Add a CI step to your pipeline (GitHub Actions, GitLab CI, etc.) that runs npx promptfoo eval --output results.json; promptfoo infers the output format from the file extension
Fail the CI job based on results: promptfoo eval exits with a non-zero code when any test case fails; to tolerate some failures, set the PROMPTFOO_PASS_RATE_THRESHOLD environment variable (a percentage, e.g. 90), or parse results.json with a script that computes the pass rate from the success and failure counts
Optionally run npx promptfoo view to open the local web viewer for the results, or share them for team review (for example with npx promptfoo share)
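The results-parsing option from the steps above can be sketched as a small Python gate. This is a minimal sketch, not promptfoo's own tooling: it assumes results.json nests its counts under results.stats with successes, failures, and errors keys, so check that against the file your promptfoo version actually writes. The names pass_rate, exit_code, and check_file are hypothetical helpers.

```python
import json


def pass_rate(results: dict) -> float:
    """Return the fraction of test cases that passed.

    ASSUMPTION: counts live under results["results"]["stats"] as
    "successes", "failures", and "errors". Verify against your own file.
    """
    stats = results["results"]["stats"]
    passed = stats.get("successes", 0)
    total = passed + stats.get("failures", 0) + stats.get("errors", 0)
    # Treat an empty run as a failure rather than dividing by zero.
    return passed / total if total else 0.0


def exit_code(results: dict, threshold: float = 0.9) -> int:
    """Return 0 when the pass rate meets the threshold, 1 otherwise."""
    return 0 if pass_rate(results) >= threshold else 1


def check_file(path: str = "results.json", threshold: float = 0.9) -> int:
    """Load an eval output file and return the exit code for the CI job."""
    with open(path, encoding="utf-8") as fh:
        results = json.load(fh)
    print(f"pass rate: {pass_rate(results):.1%} (threshold {threshold:.0%})")
    return exit_code(results, threshold)
```

In a CI step, a script could end with raise SystemExit(check_file("results.json", 0.9)) so that a pass rate below 90% fails the job.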
Known gotchas
Each assert type (contains, regex, llm-rubric, python, etc.) has different performance characteristics — llm-rubric assertions call an LLM judge on every test case and can significantly increase eval cost and latency in CI
Provider API keys must be available as environment variables in the CI environment (typically injected as CI secrets); by default promptfoo reads them from the environment at eval time, so a missing key causes every test for that provider to fail
promptfoo caches LLM responses by default; in CI, set PROMPTFOO_CACHE_ENABLED=false or pass --no-cache to get fresh responses rather than cached results from a previous run
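The key and cache gotchas above can be guarded against with a short preflight wrapper. This is a sketch under stated assumptions: REQUIRED_KEYS is a hypothetical list (OPENAI_API_KEY is only an example; use whatever variables your providers need), and missing_keys and run_eval are illustrative helper names, not part of promptfoo.

```python
import os
import subprocess

# ASSUMPTION: example only. Replace with the variables your providers need.
REQUIRED_KEYS = ("OPENAI_API_KEY",)


def missing_keys(required, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def run_eval(required=REQUIRED_KEYS) -> int:
    """Fail fast on missing keys, otherwise run an uncached eval."""
    missing = missing_keys(required)
    if missing:
        print("missing environment variables: " + ", ".join(missing))
        return 1
    # --no-cache forces fresh provider calls; --output writes the JSON report.
    cmd = ["npx", "promptfoo", "eval", "--no-cache", "--output", "results.json"]
    return subprocess.run(cmd).returncode
```

Failing before the eval starts gives one clear error message instead of a full run of provider failures, and calling run_eval() from the CI step keeps the cache setting in one place.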