Install the provider: pip install apache-airflow-providers-openlineage. The provider hooks into the task lifecycle through Airflow's listener mechanism and emits OpenLineage START, COMPLETE, and FAIL run events for each task execution; input and output datasets are filled in for supported operators.
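To make the START/COMPLETE/FAIL lifecycle concrete, here is a small stdlib-only sketch that groups run events by run.runId and checks that each run has a START plus exactly one terminal COMPLETE or FAIL event. It assumes the events are already parsed into dicts following the OpenLineage RunEvent JSON shape; the function names, namespace, and job name are hypothetical.

```python
from collections import defaultdict

TERMINAL = {"COMPLETE", "FAIL"}


def check_lifecycle(events):
    """Return {run_id: problem} for runs that don't follow START -> COMPLETE/FAIL."""
    by_run = defaultdict(list)
    for event in events:
        by_run[event["run"]["runId"]].append(event["eventType"])

    problems = {}
    for run_id, types in by_run.items():
        terminal = [t for t in types if t in TERMINAL]
        if "START" not in types:
            problems[run_id] = "missing START"
        elif len(terminal) != 1:
            problems[run_id] = f"expected one terminal event, got {len(terminal)}"
    return problems


def event(run_id, event_type):
    # Minimal hypothetical event; real events also carry eventTime, producer,
    # schemaURL, inputs, and outputs.
    return {
        "eventType": event_type,
        "run": {"runId": run_id},
        "job": {"namespace": "my-airflow", "name": "etl_dag.load_orders"},
    }


events = [event("r1", "START"), event("r1", "COMPLETE"), event("r2", "START")]
print(check_lifecycle(events))  # {'r2': 'expected one terminal event, got 0'}
```

A run stuck at START (r2 above) usually means the terminal event was never delivered, which is worth checking before debugging dataset extraction.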
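Configure the OpenLineage transport in airflow.cfg (the transport option in the [openlineage] section) or via environment variable: set AIRFLOW__OPENLINEAGE__TRANSPORT to a JSON string such as '{"type": "http", "url": "http://marquez:5000", "endpoint": "api/v1/lineage"}' for HTTP transport, or to a file transport such as '{"type": "file", "log_file_path": "/tmp/openlineage_events.json"}' for local debugging. The value must be JSON, not a bare transport name like 'file'.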
Optionally set AIRFLOW__OPENLINEAGE__NAMESPACE to a string identifying your Airflow environment in the lineage backend; this namespaces all emitted job names and helps distinguish events from multiple Airflow instances writing to the same backend
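One way to avoid quoting mistakes is to generate both variables from Python and let json.dumps and shlex.quote handle escaping. This is a sketch: the backend URL, file path, namespace, and the helper name openlineage_env are hypothetical, and the file transport's log_file_path key is assumed from the openlineage-python client, so check your client version's transport docs.

```python
import json
import shlex

# Hypothetical backend URL, file path, and namespace; substitute your own.
HTTP_TRANSPORT = {"type": "http", "url": "http://marquez:5000", "endpoint": "api/v1/lineage"}
FILE_TRANSPORT = {"type": "file", "log_file_path": "/tmp/openlineage_events.json"}


def openlineage_env(transport, namespace):
    """Return both environment variables as strings; the transport is serialized to JSON."""
    return {
        "AIRFLOW__OPENLINEAGE__TRANSPORT": json.dumps(transport),
        "AIRFLOW__OPENLINEAGE__NAMESPACE": namespace,
    }


env = openlineage_env(HTTP_TRANSPORT, "prod-airflow")
for key, value in env.items():
    # shlex.quote keeps the JSON intact when the line is pasted into a shell
    print(f"export {key}={shlex.quote(value)}")
```

Swapping HTTP_TRANSPORT for FILE_TRANSPORT gives the local-debugging variant without touching the namespace.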
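Airflow 3's OpenLineage provider extracts lineage automatically for supported operators and picks up lineage reported by supported hooks. For custom operators, define a get_openlineage_facets_on_start() method that returns an OperatorLineage object whose inputs and outputs are lists of OpenLineage Dataset(namespace=..., name=...) objects; returning a bare list of datasets is not what the provider expects.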
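Verify that events reach the backend by querying the Marquez (or other backend) API for the jobs and runs emitted by a recent DAG run; task jobs are named dag_id.task_id under your configured namespace. For Spark jobs launched from Airflow, when the Spark integration is configured to receive parent job information, the parent run facet on the Spark run's events carries the OpenLineage run ID of the Airflow task that launched it, which lets you link the two runs.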
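A stdlib-only sketch of both checks, assuming Marquez's GET /api/v1/namespaces/{namespace}/jobs/{job}/runs endpoint (with a top-level "runs" list in the response) and the OpenLineage parent run facet layout, run.facets.parent.run.runId and run.facets.parent.job. Function names, namespace, job names, and run IDs are hypothetical placeholders; real run IDs are UUIDs.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def marquez_runs_url(base_url, namespace, job_name):
    """Build the URL of Marquez's 'list runs for a job' endpoint."""
    return (
        f"{base_url.rstrip('/')}/api/v1/namespaces/{quote(namespace, safe='')}"
        f"/jobs/{quote(job_name, safe='')}/runs"
    )


def fetch_runs(base_url, namespace, job_name, timeout=5):
    # Network call: needs a reachable Marquez instance. Assumes the response
    # body has a top-level "runs" list.
    with urlopen(marquez_runs_url(base_url, namespace, job_name), timeout=timeout) as resp:
        return json.load(resp)["runs"]


def parent_run(event):
    """Return (parent job name, parent runId) from an event's parent facet, or None."""
    parent = event.get("run", {}).get("facets", {}).get("parent")
    if parent is None:
        return None
    return parent["job"]["name"], parent["run"]["runId"]


# Hypothetical Spark event whose parent is the Airflow task that launched it.
spark_event = {
    "eventType": "START",
    "run": {
        "runId": "spark-run-id",
        "facets": {
            "parent": {
                "run": {"runId": "airflow-task-run-id"},
                "job": {"namespace": "prod-airflow", "name": "etl_dag.spark_transform"},
            }
        },
    },
    "job": {"namespace": "prod-airflow", "name": "spark_transform_app"},
}

print(marquez_runs_url("http://marquez:5000", "prod-airflow", "etl_dag.spark_transform"))
print(parent_run(spark_event))  # ('etl_dag.spark_transform', 'airflow-task-run-id')
```

fetch_runs needs a live backend, so in practice you would call it for the Airflow task's job and compare the run IDs it returns with the parent runId found on the Spark events.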
Known gotchas
Not all Airflow operators emit lineage automatically; the provider documents a list of supported classes with built-in lineage extraction. Unsupported operators still emit run events, but with empty input and output dataset arrays unless you implement get_openlineage_facets_on_start() on the operator or register a custom extractor.
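To find operators that fall into this gap, a sketch like the following scans newline-delimited JSON events and lists jobs whose COMPLETE event carries no inputs and no outputs. It assumes your events are stored one JSON object per line (for example, a file transport writing in append mode, which is an assumption about your setup); the helper name and sample events are hypothetical.

```python
import json


def jobs_without_lineage(lines):
    """Yield job names of COMPLETE events whose inputs and outputs are both empty."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        if event.get("eventType") != "COMPLETE":
            continue
        if not event.get("inputs") and not event.get("outputs"):
            yield event["job"]["name"]


sample = [
    '{"eventType": "COMPLETE", "job": {"namespace": "prod-airflow", "name": "etl_dag.bash_cleanup"},'
    ' "inputs": [], "outputs": []}',
    '{"eventType": "COMPLETE", "job": {"namespace": "prod-airflow", "name": "etl_dag.load_orders"},'
    ' "inputs": [{"namespace": "postgres://db.internal:5432", "name": "analytics.public.orders_raw"}],'
    ' "outputs": []}',
]
print(list(jobs_without_lineage(sample)))  # ['etl_dag.bash_cleanup']
```

Each job it reports is a candidate for an on_start/on_complete method or a custom extractor.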
OpenLineage emission runs as part of the task lifecycle in the default configuration, so a slow or unavailable backend can delay task completion or cause tasks to report errors unrelated to the actual task logic. Use an asynchronous transport if your client version provides one, or set a short timeout on the transport.
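A sketch of the timeout route, assuming the openlineage-python HTTP transport accepts a timeout key in seconds (check the transport docs for your client version). The helper with_timeout is hypothetical; it takes an existing AIRFLOW__OPENLINEAGE__TRANSPORT value and returns it with an explicit timeout.

```python
import json


def with_timeout(transport_json, seconds):
    """Return the transport JSON string with an explicit HTTP timeout set."""
    transport = json.loads(transport_json)
    if transport.get("type") != "http":
        # The timeout key is assumed to apply to the HTTP transport only.
        raise ValueError("expected an http transport")
    transport["timeout"] = seconds
    return json.dumps(transport)


current = '{"type": "http", "url": "http://marquez:5000", "endpoint": "api/v1/lineage"}'
print(with_timeout(current, 2.0))
```

A short timeout bounds how long a dead backend can hold up a task, at the cost of dropping events when the backend is merely slow.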
Airflow 3 renamed some internal task context variables relative to Airflow 2, so install an OpenLineage provider release that supports Airflow 3 (apache-airflow-providers-openlineage 2.0.0 or later) to avoid AttributeError exceptions during lineage collection.
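A quick environment check, as a sketch: it reads the installed provider version with importlib.metadata and compares its release numbers against 2.0.0. The helper names are hypothetical, and the hand-rolled parser ignores pre-release suffixes; if the packaging library is available, packaging.version.Version is the more robust comparison.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (2, 0, 0)  # minimum provider version for Airflow 3


def parse_release(ver):
    """Turn '2.1.0' or '2.1.0rc1' into (2, 1, 0); pre-release suffixes are ignored."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.14'
    return tuple(parts)


def provider_is_compatible(minimum=MINIMUM):
    """Return (is_compatible, installed_version); installed_version is None if absent."""
    try:
        installed = version("apache-airflow-providers-openlineage")
    except PackageNotFoundError:
        return False, None
    return parse_release(installed) >= minimum, installed


print(provider_is_compatible())
```

Running this inside the Airflow image, rather than on a developer machine, checks the version that will actually load at task time.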