Install openlineage-python; configure a transport by setting the OPENLINEAGE_URL environment variable to your backend (e.g., a Marquez server at http://localhost:5000, or a Snowflake Horizon endpoint) and optionally OPENLINEAGE_API_KEY for authenticated backends
Create a client: from openlineage.client import OpenLineageClient; client = OpenLineageClient.from_environment()
Build a RunEvent: construct a Run with a generated UUID, a Job with a namespace and name, input datasets as a list of Dataset objects (each with a namespace and name), and output datasets in the same form; every event also needs an eventTime (an ISO 8601 timestamp) and a producer URI; set eventType to START when the job begins and to COMPLETE or FAIL when it ends
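Emit the event with client.emit(run_event); for COMPLETE events, attach a SchemaDatasetFacet to dataset objects to send column-level schema information alongside the lineage, which supporting backends use as the basis for column-level lineage tracking; the column-to-column mappings themselves are carried by a ColumnLineageDatasetFacet on output datasets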
Validate that events are received by querying the backend API (e.g., GET /api/v1/namespaces/{namespace}/jobs or the equivalent Snowflake Horizon lineage endpoint if using the OpenLineage public preview feature)
Known gotchas
OpenLineage is a specification, not a platform; the client library emits events but the backend (Marquez, DataHub, Snowflake Horizon, etc.) determines what gets stored and how lineage is presented — test end-to-end with your specific backend
Run IDs must be UUIDs and must be consistent across the START, RUNNING, and COMPLETE/FAIL events for a single job execution; using a different UUID for the COMPLETE event creates a disconnected run in the backend rather than closing the open run
Snowflake Horizon's OpenLineage ingestion endpoint was in public preview as of early 2026; verify GA status and endpoint URL in current Snowflake documentation before building production integrations against it
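The run ID gotcha can be caught before events reach a backend. The sketch below uses only the standard library and works on events already serialized to dictionaries, relying on the eventType and run.runId fields from the OpenLineage event format. The function name find_unclosed_runs is hypothetical, and treating ABORT as a terminal state alongside COMPLETE and FAIL is an assumption taken from the specification's list of event types.

```python
import uuid
from collections import defaultdict

# Event types that close a run (ABORT included as an assumption from the spec).
TERMINAL_TYPES = {"COMPLETE", "FAIL", "ABORT"}


def find_unclosed_runs(events):
    """Return the run IDs that have a START event but no terminal event.

    Each item in `events` is a dict shaped like a serialized OpenLineage
    event, with at least "eventType" and "run": {"runId": ...}.
    """
    types_by_run = defaultdict(set)
    for event in events:
        run_id = event["run"]["runId"]
        uuid.UUID(run_id)  # raises ValueError if the run ID is not a UUID
        types_by_run[run_id].add(event["eventType"])

    return sorted(
        run_id
        for run_id, types in types_by_run.items()
        if "START" in types and not (types & TERMINAL_TYPES)
    )
```

If the COMPLETE event was built with a freshly generated UUID instead of the one used for START, the START run's ID shows up in the returned list, which mirrors the open, never-closed run the backend would display.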