Install the openlineage-dbt package, then point the OpenLineage client at Marquez by setting the http transport URL in the client config file (openlineage.yml), or the OPENLINEAGE_URL environment variable, to the Marquez API base URL
Run dbt through the dbt-ol wrapper that openlineage-dbt installs (for example, dbt-ol run in place of dbt run), or enable the integration via the dbt project's on-run-start/on-run-end hooks, depending on the integration method
After the run, query the Marquez API (or UI) to confirm that job and dataset lineage nodes were created for each dbt model, with input and output datasets correctly attributed
Verify that column-level lineage is captured for supported adapters by inspecting the columnLineage facet on each model's output dataset, either in the Marquez dataset detail view or in the dataset's API response
Integrate the lineage emission into CI so that every dbt run in the pipeline produces lineage events, enabling impact analysis across the full graph
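The steps above can be sketched as a small CI helper. This is a minimal sketch under stated assumptions, not the integration's own tooling: the Marquez URL http://localhost:5000, the namespace "analytics", the model names, and all helper function names are hypothetical. It relies on the dbt-ol wrapper, the OPENLINEAGE_URL and OPENLINEAGE_NAMESPACE environment variables, and the Marquez jobs endpoint (/api/v1/namespaces/{namespace}/jobs). Matching a model to a job by substring is also an assumption, since the exact job naming depends on the integration version.

```python
import json
import os
import subprocess
import urllib.parse
import urllib.request


def lineage_env(marquez_url, namespace, base_env=None):
    """Return a copy of the environment with the OpenLineage settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENLINEAGE_URL"] = marquez_url
    env["OPENLINEAGE_NAMESPACE"] = namespace
    return env


def run_dbt_with_lineage(env, dbt_args=("run",)):
    """Invoke dbt through the dbt-ol wrapper and return its exit code."""
    return subprocess.run(["dbt-ol", *dbt_args], env=env, check=False).returncode


def jobs_url(marquez_url, namespace):
    """Build the Marquez URL that lists the jobs in one namespace."""
    quoted = urllib.parse.quote(namespace, safe="")
    return f"{marquez_url.rstrip('/')}/api/v1/namespaces/{quoted}/jobs"


def fetch_job_names(marquez_url, namespace):
    """Fetch the names of all jobs Marquez knows in the namespace."""
    with urllib.request.urlopen(jobs_url(marquez_url, namespace), timeout=10) as resp:
        payload = json.load(resp)
    return {job["name"] for job in payload.get("jobs", [])}


def missing_models(expected_models, job_names):
    """Return expected models with no matching job (substring match is an assumption)."""
    return sorted(
        model for model in expected_models
        if not any(model in name for name in job_names)
    )


# Hypothetical CI usage (needs dbt-ol on PATH and a reachable Marquez):
#   env = lineage_env("http://localhost:5000", "analytics")
#   exit_code = run_dbt_with_lineage(env)
#   gaps = missing_models(["orders", "customers"],
#                         fetch_job_names("http://localhost:5000", "analytics"))
#   if exit_code != 0 or gaps:
#       raise SystemExit(f"dbt exit={exit_code}, models without lineage: {gaps}")
```

Failing the pipeline when a model has no job in Marquez turns a silent lineage gap into a visible CI failure, which is the main reason to run the check after every dbt invocation rather than occasionally by hand.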
Known gotchas
OpenLineage events are emitted on a best-effort basis by default; if the Marquez endpoint is unreachable, the dbt run still completes without error and the lineage for that run is silently lost, so configure transport error handling, or switch to a local file transport while debugging so that events are written to disk
Column-level lineage extraction depends on parsing dbt's compiled SQL, which may be incomplete for complex Jinja macros or ref() calls embedded in custom macros; manually authored SQL models without clear column mappings may have gaps
OpenLineage namespaces must be consistent across all producers (dbt, Airflow, Spark) for their events to link into a unified graph; a dataset is identified by its namespace and name together, so the same table reported under two different namespaces shows up as duplicate dataset nodes that appear unconnected in the lineage UI
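The namespace gotcha can be checked mechanically. The sketch below is illustrative only: the function name and the sample namespaces are hypothetical, and it assumes you have already collected (namespace, name) pairs for the datasets each producer reported, for example from the Marquez datasets API. It flags any dataset name that appears under more than one namespace, which is the usual symptom of a split graph.

```python
from collections import defaultdict


def split_datasets(datasets):
    """Map each dataset name seen under several namespaces to those namespaces.

    `datasets` is an iterable of (namespace, name) pairs. Names that occur
    under exactly one namespace are omitted from the result.
    """
    namespaces_by_name = defaultdict(set)
    for namespace, name in datasets:
        namespaces_by_name[name].add(namespace)
    return {
        name: sorted(namespaces)
        for name, namespaces in namespaces_by_name.items()
        if len(namespaces) > 1
    }


# Hypothetical input: dbt and Airflow describe the same Postgres host differently.
reported = [
    ("postgres://db:5432", "analytics.public.orders"),           # from dbt
    ("postgres://db.internal:5432", "analytics.public.orders"),  # from Airflow
    ("postgres://db:5432", "analytics.public.customers"),
]
duplicates = split_datasets(reported)
```

A dataset name under two namespaces is not always a mistake (two databases can hold tables with the same name), so treat the result as a list of candidates to review rather than a hard failure.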