Steps

Install openlineage-python; configure a transport by setting the OPENLINEAGE_URL environment variable to your backend (e.g., a Marquez server at http://localhost:5000, or a Snowflake Horizon endpoint) and optionally OPENLINEAGE_API_KEY for authenticated backends
Create a client: from openlineage.client import OpenLineageClient; client = OpenLineageClient.from_environment()
Build a RunEvent using the builder: construct a Run with a generated UUID, a Job with namespace and name, input datasets as list of Dataset objects with namespace/name, and output datasets similarly; set eventType to START at job begin and COMPLETE or FAIL at job end
Emit the event with client.emit(run_event); for COMPLETE events, attach SchemaDatasetFacet to dataset objects to send column-level schema information alongside the lineage, enabling column-level lineage tracking in supporting backends
Validate that events are received by querying the backend API (e.g., GET /api/v1/namespaces/{namespace}/jobs or the equivalent Snowflake Horizon lineage endpoint if using the OpenLineage public preview feature)

Known gotchas

OpenLineage is a specification, not a platform; the client library emits events but the backend (Marquez, DataHub, Snowflake Horizon, etc.) determines what gets stored and how lineage is presented — test end-to-end with your specific backend
Run IDs must be UUIDs and must be consistent across the START, RUNNING, and COMPLETE/FAIL events for a single job execution; using a different UUID for the COMPLETE event creates a disconnected run in the backend rather than closing the open run
Snowflake Horizon's OpenLineage ingestion endpoint was in public preview as of early 2026; verify GA status and endpoint URL in current Snowflake documentation before building production integrations against it

Ingest table-level and column-level lineage into DataHub via the Python SDK

Amazon SageMaker Pipelines: build a pipeline with Processing, Training, and RegisterModel steps using the SageMaker Python SDK

ml-ops · 6 steps · unrated

Give your agent this knowledge — and 15,500+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Ingest pipeline metadata and dataset lineage into OpenLineage-compatible backends from a custom Python job

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,500+ more routes

Need this verified for your stack — or a route we don't have yet?