Steps

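Define Dataset objects using URI strings that represent logical data assets (e.g., Dataset('s3://bucket/prefix/') or Dataset('snowflake://table/my_table')). Airflow treats each URI as an opaque identifier: it never connects to the location or checks that the data exists, although the string must be a valid ASCII URI and must not use the reserved airflow scheme.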
In the producing DAG, set outlets=[my_dataset] on the task that writes the data so that Airflow records a dataset update event each time that task completes successfully.
In the consuming DAG, replace the time-based schedule value with schedule=[my_dataset] (a list of Dataset objects); Airflow queues a run once every listed dataset has been updated at least once since the consumer's last run.
Use the Airflow UI Datasets view to inspect the dataset dependency graph, see when each dataset was last updated, and identify which DAGs produce or consume each dataset.
Combine dataset scheduling with time-based constraints by using DatasetOrTimeSchedule (Airflow 2.9+) to trigger on whichever comes first: a dataset update or a cron schedule.
To test dataset-triggered runs locally, manually emit a dataset update event by sending a POST request to the Airflow REST API dataset events endpoint (/api/v1/datasets/events, available from Airflow 2.9).
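
The manual-event step can be sketched with the standard library alone. This is a minimal sketch, not a definitive client: the host http://localhost:8080, the admin/admin credentials, the basic-auth API backend, the helper name build_dataset_event_request, and the extra payload are all assumptions for illustration. The request is only built here, not sent, and the dataset is assumed to be already declared by at least one DAG so that Airflow knows about it.

```python
import base64
import json
import urllib.request


def build_dataset_event_request(base_url, dataset_uri, username, password, extra=None):
    """Build (but do not send) a POST request that records a dataset update event.

    Hypothetical helper: assumes Airflow 2.9+ with the basic-auth API backend enabled.
    """
    payload = {"dataset_uri": dataset_uri, "extra": extra or {}}
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_dataset_event_request(
    base_url="http://localhost:8080",  # assumed local webserver address
    dataset_uri="s3://bucket/prefix/",  # must match the consumer's URI exactly
    username="admin",  # placeholder credentials
    password="admin",
    extra={"source": "manual-test"},
)

# To emit the event against a running Airflow instance, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status, json.load(response))
```

If the consuming DAG lists only this dataset, a successful call should queue a run of it; with several datasets listed, each one needs its own event first.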

Known gotchas

Dataset scheduling only triggers when the producing task completes successfully; a task failure does not update the dataset, so the downstream DAG waits until the next successful run of the producer.
Dataset URIs are case-sensitive and must match exactly between producer and consumer; a URI mismatch means the dependency is silently never satisfied.
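If a consuming DAG lists multiple datasets, it waits until every one of them has been updated at least once since the last consumer run, which can cause unexpected delays if one producer runs infrequently. When several producing DAGs update the same dataset, a successful update from any one of them is enough to mark that dataset as updated.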
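
The triggering rules behind these gotchas can be illustrated with a toy model. This is a simplified sketch for reasoning about behavior, not Airflow's actual scheduler code: the ConsumerState class and the example URIs are hypothetical, and the model assumes that only successful producer tasks emit events, that URIs are compared as exact strings, and that an update from any producer marks its dataset as updated.

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerState:
    """Toy model of a dataset-scheduled DAG (not Airflow's implementation)."""

    schedule: frozenset  # dataset URIs listed in schedule=[...]
    updated: set = field(default_factory=set)  # URIs updated since the last run
    runs: int = 0

    def record_update(self, uri, succeeded=True):
        """Record a producer task result; return True if a consumer run is queued."""
        # Failed tasks emit no event, and a mismatched URI is a different dataset.
        if not succeeded or uri not in self.schedule:
            return False
        self.updated.add(uri)
        if self.updated == set(self.schedule):
            self.runs += 1
            self.updated.clear()  # start waiting for a fresh round of updates
            return True
        return False


consumer = ConsumerState(
    schedule=frozenset({"s3://bucket/orders/", "s3://bucket/customers/"})
)
events = [
    ("s3://bucket/orders/", True),      # first dataset updated; still waiting
    ("s3://bucket/orders/", True),      # another producer, same dataset: no change
    ("s3://bucket/customers/", False),  # producer failed: no event recorded
    ("s3://bucket/Customers/", True),   # case mismatch: silently ignored
    ("s3://bucket/customers/", True),   # every dataset updated: run queued
]
triggered = [consumer.record_update(uri, ok) for uri, ok in events]
print(triggered)
```

Only the final event queues a run in this model, which mirrors why a failed producer or a URI typo leaves the consumer waiting without any error.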



Configure Airflow dataset-aware (data-driven) scheduling to trigger DAGs on upstream data availability
