Define Dataset objects using URI strings that represent logical data assets (e.g., Dataset('s3://bucket/prefix/') or Dataset('snowflake://table/my_table')). Airflow treats each URI as an opaque identifier and never connects to it; it only requires a valid ASCII URI whose scheme is not the reserved airflow scheme.
In the producing DAG, set outlets=[my_dataset] on the task that writes the data so that Airflow records a dataset update event each time that task completes successfully.
In the consuming DAG, set schedule=[my_dataset] (a list of Dataset objects) in place of a cron or timedelta schedule; a run is queued once every listed dataset has been updated at least once since the consumer's last run.
Use the Airflow UI Datasets view to inspect the dataset dependency graph, see when each dataset was last updated, and identify which DAGs produce or consume each dataset.
Combine dataset scheduling with a time-based schedule by using DatasetOrTimeSchedule (Airflow 2.9+): the DAG runs when its datasets have been updated and also when the cron schedule fires, so neither condition has to wait for the other.
To test dataset-triggered runs locally, manually emit a dataset update event via the Airflow REST API dataset events endpoint.
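The last step can be sketched with the standard library alone. This is a minimal sketch, not a definitive client: it assumes Airflow 2.9 or later (where the stable REST API accepts POST requests on /api/v1/datasets/events with a dataset_uri field), a webserver at http://localhost:8080, and the basic-auth API backend with admin/admin credentials. The helper name build_dataset_event_request, the host, and the credentials are all hypothetical; check the path and payload against the API reference for your Airflow version.

```python
import base64
import json
import urllib.request


def build_dataset_event_request(base_url, dataset_uri, username, password, extra=None):
    """Build (but do not send) a POST request that records a dataset update event.

    Assumption: the target is the Airflow 2.9+ stable REST API with basic auth enabled.
    """
    payload = {"dataset_uri": dataset_uri, "extra": extra or {}}
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical local setup; the URI must match the one used in the DAGs exactly.
request = build_dataset_event_request(
    base_url="http://localhost:8080",
    dataset_uri="s3://bucket/prefix/",
    username="admin",
    password="admin",
)

# To actually emit the event against a running webserver, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything reaches the scheduler.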
Known gotchas
Dataset scheduling only triggers when the producing task completes successfully; a task failure does not update the dataset, so the downstream DAG waits until the next successful run of the producer.
Dataset URIs are case-sensitive and must match exactly between producer and consumer; a URI mismatch means the dependency is silently never satisfied.
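If multiple producing DAGs update the same dataset, the consuming DAG does not wait for all of them: a successful update from any one producer counts as an update of that dataset, so the consumer may run more often than expected. The wait-for-all rule applies across the distinct datasets in the schedule list, which can cause unexpected delays if one of those datasets is updated infrequently.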
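The trigger rule behind these gotchas can be illustrated with a toy model. This is a sketch in plain Python, not Airflow code: the class ConsumerModel and its method names are hypothetical, and it only mimics the documented rule that a consumer runs once every dataset in its schedule has been updated at least once since its last run. It tracks datasets, not which DAG produced the update.

```python
class ConsumerModel:
    """Toy model of when a dataset-scheduled DAG is queued (not Airflow code)."""

    def __init__(self, schedule_uris):
        self.schedule_uris = set(schedule_uris)
        self.updated = set()  # URIs updated since the last consumer run
        self.runs = 0

    def task_finished(self, outlet_uri, success=True):
        """Record a producer task finishing; return True if the consumer is queued."""
        # A failed task emits no dataset event at all.
        if not success:
            return False
        # Matching is an exact, case-sensitive string comparison.
        if outlet_uri in self.schedule_uris:
            self.updated.add(outlet_uri)
        # Queue one run once every scheduled dataset has at least one update.
        if self.updated == self.schedule_uris:
            self.runs += 1
            self.updated.clear()
            return True
        return False


consumer = ConsumerModel(["s3://bucket/orders/", "s3://bucket/customers/"])

consumer.task_finished("s3://bucket/orders/", success=False)  # failure: no event
consumer.task_finished("s3://Bucket/orders/")   # case mismatch: never matches
consumer.task_finished("s3://bucket/orders/")   # first producer updates orders
consumer.task_finished("s3://bucket/orders/")   # second producer, same dataset: still one update
triggered = consumer.task_finished("s3://bucket/customers/")  # last missing dataset
```

In this model the consumer is queued exactly once, on the final call, because that is the first moment both scheduled URIs have a recorded update.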