Steps

Install the promptfoo CLI (npm install -g promptfoo) and create a promptfooconfig.yaml in your repository
Define providers (e.g., openai:gpt-4o), prompts, and test cases with assert blocks specifying pass/fail criteria such as contains, llm-rubric, or regex
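Set the minimum pass rate required (e.g., 90%); runs below this threshold exit with a non-zero code. promptfoo documents the PROMPTFOO_PASS_RATE_THRESHOLD environment variable for this, given as a percentage, whereas a threshold field on a test case or assertion sets a minimum score for that item rather than an overall pass rate, so check the docs for your installed version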
Add a promptfoo eval --ci step to your CI workflow (GitHub Actions, GitLab CI, etc.); the non-zero exit code blocks merges on failure
Use promptfoo eval --output results.json to capture detailed per-test results as an artifact for review
Use the GitHub Action integration to automatically post evaluation result summaries as pull request comments
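As a rough illustration of the first two steps, a promptfooconfig.yaml might look like the sketch below. The prompt text, the variable name text, the sample input, and the assertion values are all hypothetical placeholders; only the provider ID and the three assertion types come from the steps above.

```yaml
# promptfooconfig.yaml: a minimal sketch, not a recommended test suite.
description: CI gate example (hypothetical)

prompts:
  # {{text}} is filled in from each test case's vars.
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o

tests:
  - vars:
      text: "Promptfoo runs prompts against providers and checks the outputs with assertions."
    assert:
      # Deterministic checks: cheap, no extra model call.
      - type: contains
        value: "assertions"
      - type: regex
        value: '\.$'   # output ends with a period
      # Model-graded check: calls an LLM judge, so it costs time and tokens.
      - type: llm-rubric
        value: "The summary is a single sentence and does not add facts absent from the input."
```

Running promptfoo eval in the same directory should pick this file up by default. Keeping most assertions deterministic and reserving llm-rubric for criteria that cannot be expressed otherwise keeps CI runs cheaper.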

Known gotchas

llm-rubric assertions call a configured LLM judge, which adds latency and cost to every CI run; cache results where possible and scope test cases tightly
The threshold applies to the overall pass rate across all test cases; a single catastrophic failure on a high-weight prompt can drop the overall rate below threshold even if most tests pass
API keys for providers must be available as CI secrets; missing keys cause provider calls to fail with authentication errors that can be confused with assertion failures
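A GitHub Actions workflow along the following lines is one way to wire the eval into CI while guarding against the missing-key gotcha. It is a sketch under assumptions: the workflow name, job name, secret name OPENAI_API_KEY, artifact name, and Node version are illustrative choices, and PROMPTFOO_PASS_RATE_THRESHOLD is assumed to behave as promptfoo's documentation describes (a percentage). The eval step uses only the --output flag from the steps above; add --ci if promptfoo eval --help on your installed version lists it.

```yaml
# .github/workflows/llm-eval.yml: hypothetical file name and layout.
name: llm-eval-gate
on:
  pull_request:

jobs:
  eval:
    runs-on: ubuntu-latest
    env:
      # Provider key comes from a repository secret (name is an assumption).
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      # Assumed: minimum pass rate as a percentage, per promptfoo's docs.
      PROMPTFOO_PASS_RATE_THRESHOLD: "90"
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - name: Install promptfoo
        run: npm install -g promptfoo
      # Fail early with a clear message so a missing secret is not
      # mistaken for an assertion failure later in the run.
      - name: Check provider key
        run: |
          if [ -z "$OPENAI_API_KEY" ]; then
            echo "OPENAI_API_KEY is not set; check the repository secrets" >&2
            exit 1
          fi
      - name: Run evals
        run: promptfoo eval -c promptfooconfig.yaml --output results.json
      # Upload results even when the eval step fails, for review.
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: promptfoo-results
          path: results.json
```

Because the eval step exits non-zero when the gate fails, marking this job as a required status check is what actually blocks the merge.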

Gate CI pipeline deployments on LLM eval pass rates using promptfoo
