Initialize a Great Expectations project (great_expectations init in v0.x) to create the project directory and configure a Data Context, then define a Datasource for your warehouse or execution engine (Snowflake, BigQuery, Spark, or Pandas) using the appropriate connection configuration.
Create an Expectation Suite for each critical dataset: use the interactive notebook workflow or the Python API to add expectations (expect_column_values_to_not_be_null, expect_column_values_to_be_between, expect_table_row_count_to_be_between, etc.) informed by profiling the reference dataset.
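As a rough illustration of what a suite contains, the sketch below assembles one as a plain Python dict and prints it as JSON. It mirrors the general shape of the JSON that v0.x stores for a suite (a name plus a list of expectation types with keyword arguments); treat the exact layout as an assumption and check it against a suite generated by your own installed version. The helper build_suite, the suite name orders.critical, the column names, and the thresholds are all hypothetical.

```python
import json


def build_suite(name: str, expectations: list[tuple[str, dict]]) -> dict:
    """Assemble a v0.x-style Expectation Suite document as a plain dict.

    The layout is an assumption based on the v0.x JSON format.
    """
    return {
        "expectation_suite_name": name,
        "expectations": [
            {"expectation_type": etype, "kwargs": kwargs, "meta": {}}
            for etype, kwargs in expectations
        ],
        "meta": {},
    }


# Hypothetical dataset: an "orders" table with order_id and amount columns.
suite = build_suite(
    "orders.critical",
    [
        ("expect_column_values_to_not_be_null", {"column": "order_id"}),
        (
            "expect_column_values_to_be_between",
            {"column": "amount", "min_value": 0, "max_value": 100000},
        ),
        (
            "expect_table_row_count_to_be_between",
            {"min_value": 1000, "max_value": 50000},
        ),
    ],
)

print(json.dumps(suite, indent=2))
```

Keeping suites in a reviewable form like this makes threshold changes visible in code review, whichever workflow produced them.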
Define a Checkpoint that pairs the Expectation Suite with a Batch Request (a query or table reference) and one or more Actions: UpdateDataDocsAction to regenerate HTML docs, SlackNotificationAction or a custom action to alert on failure.
Integrate the Checkpoint into your pipeline: call context.run_checkpoint(checkpoint_name=...) (v0.x API) from your Airflow task, your Spark job, or a step that runs after your dbt build (dbt hooks themselves execute SQL, not Python); check the success flag on the returned result and raise an exception on failure so that downstream tasks are blocked.
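The integration step above can be sketched as a small gate function. This assumes the v0.x API, where gx.get_context() loads the Data Context and the object returned by context.run_checkpoint exposes a boolean success attribute. ValidationFailed, gate, run_and_gate, and the checkpoint name orders_checkpoint are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class ValidationFailed(RuntimeError):
    """Raised so the orchestrator marks the task as failed."""


def gate(result, checkpoint_name: str) -> None:
    """Raise if the checkpoint result reports failure; otherwise return."""
    if not result.success:
        raise ValidationFailed(
            f"Checkpoint {checkpoint_name!r} failed validation"
        )


def run_and_gate(checkpoint_name: str) -> None:
    """Run a checkpoint and block the pipeline on failure (v0.x assumption)."""
    # Imported lazily so this module loads even where GX is not installed.
    import great_expectations as gx

    context = gx.get_context()
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    gate(result, checkpoint_name)


# Stand-in result object to exercise the gate without a real Data Context.
gate(SimpleNamespace(success=True), "orders_checkpoint")  # returns quietly
```

Used as the callable of an Airflow PythonOperator, run_and_gate would fail the task when validation fails, which keeps downstream tasks from running under the default trigger rule.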
Store Expectation Suites and Validation Results in a shared backend (GCS, S3, or a database store) so that results are accessible to all pipeline runners and the Data Docs site.
Review Data Docs after each run to inspect which expectations passed or failed, drill into failed batches, and refine thresholds based on observed data distributions.
Known gotchas
Expectations defined on a small sample or a clean historical batch may have thresholds that do not reflect legitimate seasonal or growth-driven variation; review and update suites regularly rather than treating them as static contracts.
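One way to keep thresholds from going stale is to recompute them from a window of observed values instead of hand-picking them once. The sketch below derives row-count bounds from recent daily counts and adds headroom; the resulting min_value and max_value could feed expect_table_row_count_to_be_between. The function name row_count_bounds, the 20% headroom, and the sample counts are illustrative assumptions, not recommended values.

```python
def row_count_bounds(daily_counts: list[int], tolerance_pct: int = 20) -> dict:
    """Derive min/max row-count bounds from observed history plus headroom.

    Integer arithmetic is used so the bounds are exact and reproducible.
    """
    if not daily_counts:
        raise ValueError("need at least one observed count")
    lowest, highest = min(daily_counts), max(daily_counts)
    # Round the lower bound down and the upper bound up.
    low = (lowest * (100 - tolerance_pct)) // 100
    high = -((-highest * (100 + tolerance_pct)) // 100)
    return {"min_value": low, "max_value": high}


# Hypothetical daily row counts covering quiet days and a seasonal peak.
history = [1000, 1100, 950, 1200, 2400, 2500, 1050]
kwargs = row_count_bounds(history, tolerance_pct=20)
print(kwargs)  # prints {'min_value': 760, 'max_value': 3000}
```

The window has to include the seasonal peaks you consider legitimate; a window that covers only quiet periods reproduces the problem described above.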
Running Great Expectations against large warehouse tables using full-table Batch Requests can be expensive; use a splitter or sampling configuration to validate a representative subset without scanning the entire table.
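To make the sampling idea concrete, here is a generic sketch of deterministic hash-based sampling in plain Python. It is not Great Expectations' own sampler configuration, and in a warehouse the equivalent filter would normally be pushed into SQL so that only the subset is read. The function in_sample and the order-N keys are hypothetical.

```python
import hashlib


def in_sample(key: str, percent: int) -> bool:
    """Deterministically keep roughly `percent`% of keys via a stable hash."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


# Hypothetical primary keys; the same keys are selected on every run.
order_ids = [f"order-{i}" for i in range(10_000)]
sampled = [oid for oid in order_ids if in_sample(oid, 10)]
print(len(sampled), "of", len(order_ids), "rows selected")
```

Because selection depends only on the key, reruns validate the same rows, which makes failures reproducible. The trade-off is that rows outside the sample are never checked, so a rare bad value can go unnoticed.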
Great Expectations v1 (GX Core) has a significantly different API from v0.x; the Expectation Suite and Checkpoint configuration formats changed, so an existing project needs a migration step before its suites work with the new version.