Steps

Install promptfoo globally with npm install -g promptfoo, or skip the install and run it on demand with npx promptfoo@latest
Create a promptfooconfig.yaml file defining providers (LLM endpoints), prompts, and test cases with assert blocks specifying expected behavior (e.g., type: contains, value: 'expected phrase')
Run evals locally to verify the configuration: npx promptfoo eval. Results are printed as a table in the terminal; pass --output with a file path to also write them to a file
Add a CI step to your pipeline (GitHub Actions, GitLab CI, etc.) that runs npx promptfoo eval --output results.json; the output format is inferred from the file extension, so no separate format flag is needed
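Fail the CI job based on results: promptfoo eval exits with a non-zero code when test cases fail, and the minimum pass rate can be lowered with the PROMPTFOO_PASS_RATE_THRESHOLD environment variable (a percentage, e.g. 90 to fail only if fewer than 90% of test cases pass). Alternatively, parse results.json with a script that computes the pass rate from the success and failure counts under results.stats. Check npx promptfoo eval --help for the options your installed version supports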
Optionally run npx promptfoo view to browse results in the local web viewer, or push results to a shared promptfoo cloud account for team review
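The "parse results.json with a script" option from the steps above could be sketched roughly as follows. This is a minimal illustration, not promptfoo's own tooling: it assumes the JSON output stores its counts at results.stats.successes, results.stats.failures, and (optionally) results.stats.errors, and the script name check_evals.py and the 0.9 default are hypothetical. Inspect a real results.json from your promptfoo version before relying on these paths.

```python
import json
import sys
from pathlib import Path


def pass_rate(report: dict) -> float:
    """Fraction of passing test cases, computed from the stats block.

    Assumption: the counts live at report["results"]["stats"]; adjust the
    path if your promptfoo version lays the file out differently.
    """
    stats = report["results"]["stats"]
    passed = stats.get("successes", 0)
    # Errors (e.g. a provider call that never returned) are treated as failures.
    failed = stats.get("failures", 0) + stats.get("errors", 0)
    total = passed + failed
    # An empty run should not count as a pass.
    return passed / total if total else 0.0


def gate(path: str, threshold: float = 0.9) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    rate = pass_rate(report)
    print(f"pass rate: {rate:.1%} (threshold {threshold:.0%})")
    return 0 if rate >= threshold else 1


if __name__ == "__main__":
    # Usage: python check_evals.py results.json [threshold]
    # With no arguments the script does nothing, so it is safe to import.
    if len(sys.argv) > 1:
        limit = float(sys.argv[2]) if len(sys.argv) > 2 else 0.9
        sys.exit(gate(sys.argv[1], limit))
```

Run as a CI step after the eval, a non-zero exit code from this script fails the job. Counting errors as failures is a deliberate choice here, so that a run where a provider could not be reached does not pass the gate by accident.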

Known gotchas

Assert types (contains, regex, llm-rubric, python, etc.) differ widely in cost: deterministic checks such as contains and regex run locally, while llm-rubric assertions call an LLM judge on every test case and can significantly increase eval cost and latency in CI
Provider API keys must be available as environment variables in the CI environment (for example, injected from pipeline secrets). promptfoo reads them from the environment at eval time, so they should not be committed in the config file; a missing key causes all tests for that provider to fail
promptfoo caches LLM responses by default; in CI, set PROMPTFOO_CACHE_ENABLED=false or pass --no-cache to ensure fresh responses rather than serving cached results from a previous run
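Because a missing key makes every test for a provider fail, it can help to fail fast before the eval starts. The sketch below is one hypothetical way to do that; the helper names are illustrative and the key names in the usage comment are only examples, so substitute the variables your configured providers actually need.

```python
import os


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


def preflight(required, env=None):
    """Exit with a readable message if any required key is missing."""
    missing = missing_keys(required, env)
    if missing:
        raise SystemExit("missing provider API keys: " + ", ".join(missing))


# Example call (hypothetical key list; use the variables your providers need):
# preflight(["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
```

Calling preflight in a step before npx promptfoo eval turns a wall of failed test cases into a single clear error message.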

www.promptfoo.dev · 6 steps · unrated

Gate CI on LLM evals with promptfoo
