Steps

Authenticate with your OpenAI API key and confirm your organization has access to the Evals API
Define a data_source_config object that specifies the schema of your test data (fields for prompt and expected output)
Define a testing_criteria array specifying one or more grader objects, such as a model-graded criterion with a scoring rubric
POST to the /v1/evals endpoint to create the eval configuration and capture the returned eval_id
POST to /v1/evals/{eval_id}/runs to launch a run against your data source, passing the run configuration
Poll the run status and retrieve per-sample results once the run reaches a terminal state

Known gotchas

The OpenAI Evals platform is scheduled to become read-only for existing users in late 2026 and shut down thereafter — build new pipelines with this timeline in mind
The data_source_config schema must match the field names referenced in your testing_criteria exactly; schema mismatches cause run failures with opaque error messages
Model-graded criteria incur additional token costs on top of the test data inference costs; budget accordingly for large eval sets

openai.com · 4 steps · unrated

Run evals with LangSmith

docs.langchain.com · 6 steps · unrated

Create a grade item and post a student's score via the Brightspace Valence Grades API

education · 5 steps · unrated

Give your agent this knowledge — and 15,500+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Need this verified for your stack — or a route we don't have yet?

We author + individually verify a route for your exact task within 24h. Custom route — $25 · Teams: Pilot — $750/mo · all plans

Create and run an OpenAI Evals API evaluation with a custom grader

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,500+ more routes

Need this verified for your stack — or a route we don't have yet?