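Install the DataHub Python SDK with pip install acryl-datahub (the REST emitter's dependencies ship in the datahub-rest extra: pip install 'acryl-datahub[datahub-rest]'); configure the REST emitter by creating a DatahubRestEmitter that points at your DataHub GMS endpoint (e.g., http://localhost:8080), passing a DataHub access token if authentication is enabled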
Create an UpstreamLineageClass aspect for table-level lineage: define a list of UpstreamClass entries, each containing an upstream dataset URN and a DatasetLineageTypeClass value (such as TRANSFORMED); wrap them in an UpstreamLineageClass and emit it in a MetadataChangeProposalWrapper whose entityUrn is the target (downstream) dataset URN
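For column-level lineage, create FineGrainedLineageClass entries mapping upstream fields to downstream fields, each field identified by a schemaField URN built from its dataset URN and fieldPath; include these in the fineGrainedLineages field of the same UpstreamLineageClass, alongside the table-level entries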
Emit the lineage event using emitter.emit(mcp), where mcp is a MetadataChangeProposalWrapper; for large graphs, batch the MCPs instead of sending one request per MCP (recent SDK releases expose emit_mcps on DatahubRestEmitter for this; check the version you have installed)
Verify lineage in the DataHub UI under the Lineage tab of the affected dataset, or query the GraphQL endpoint with searchAcrossLineage to validate upstream and downstream propagation
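The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the platform, table names, and column mapping are hypothetical placeholders, and the helper names (tables_missing_from_upstreams, build_lineage_mcp, emit_lineage) are invented for illustration. The SDK imports sit inside the functions so the mapping can be inspected without the SDK installed; the auditStamp argument of UpstreamClass is assumed to be optional, as in recent SDK releases.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical datasets: replace with names that already exist in your DataHub instance.
PLATFORM = "snowflake"
UPSTREAM_TABLES: List[str] = ["analytics.raw.orders", "analytics.raw.customers"]
DOWNSTREAM_TABLE = "analytics.mart.order_summary"

# Downstream column -> list of (upstream table, upstream column). Also hypothetical.
COLUMN_MAP: Dict[str, List[Tuple[str, str]]] = {
    "customer_name": [("analytics.raw.customers", "name")],
    "order_total": [("analytics.raw.orders", "amount")],
}


def tables_missing_from_upstreams() -> List[str]:
    """Tables used in column lineage but absent from the table-level list.

    A sanity check for this sketch, not an SDK requirement.
    """
    referenced = {table for sources in COLUMN_MAP.values() for table, _ in sources}
    return sorted(referenced - set(UPSTREAM_TABLES))


def build_lineage_mcp(env: str = "PROD"):
    """Build one MCP carrying both table-level and column-level lineage."""
    from datahub.emitter.mce_builder import make_dataset_urn, make_schema_field_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.metadata.schema_classes import (
        DatasetLineageTypeClass,
        FineGrainedLineageClass,
        FineGrainedLineageDownstreamTypeClass,
        FineGrainedLineageUpstreamTypeClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    downstream_urn = make_dataset_urn(PLATFORM, DOWNSTREAM_TABLE, env)

    # Table-level lineage: one UpstreamClass per upstream dataset.
    upstreams = [
        UpstreamClass(
            dataset=make_dataset_urn(PLATFORM, table, env),
            type=DatasetLineageTypeClass.TRANSFORMED,
        )
        for table in UPSTREAM_TABLES
    ]

    # Column-level lineage: fields are referenced by schemaField URNs.
    fine_grained = [
        FineGrainedLineageClass(
            upstreamType=FineGrainedLineageUpstreamTypeClass.FIELD_SET,
            upstreams=[
                make_schema_field_urn(make_dataset_urn(PLATFORM, table, env), column)
                for table, column in sources
            ],
            downstreamType=FineGrainedLineageDownstreamTypeClass.FIELD,
            downstreams=[make_schema_field_urn(downstream_urn, target)],
        )
        for target, sources in COLUMN_MAP.items()
    ]

    aspect = UpstreamLineageClass(upstreams=upstreams, fineGrainedLineages=fine_grained)
    return MetadataChangeProposalWrapper(entityUrn=downstream_urn, aspect=aspect)


def emit_lineage(gms_url: str = "http://localhost:8080", token: Optional[str] = None) -> None:
    """Send the lineage MCP to the GMS endpoint (token only needed when auth is enabled)."""
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter(gms_server=gms_url, token=token)
    emitter.emit(build_lineage_mcp())
```

Calling emit_lineage() against a running GMS would write the upstreamLineage aspect for the downstream dataset; running tables_missing_from_upstreams() first is a cheap way to catch a column mapping that references a table left out of the table-level list.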
Known gotchas
Dataset URNs must exactly match the URNs already registered in DataHub for the lineage edges to render; a URN that references a non-existent dataset will silently create a dangling lineage edge pointing to an unresolved entity
Column-level lineage requires fieldPath values to match the schema field names registered under the dataset's SchemaMetadata aspect; if the schema has not been ingested first, column lineage edges will not resolve in the UI
DataHub stores a dataset's lineage as a single upstreamLineage aspect, and each emit overwrites that aspect; emitting a partial lineage update therefore replaces the existing upstream list rather than merging into it, so send all upstream datasets in a single MCP to avoid unintentionally removing prior lineage edges
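One way to work around the replace-on-emit behavior is to read the current aspect, merge, and write the full list back. The sketch below assumes the DataHubGraph client from the same SDK (its get_aspect method and the DatahubClientConfig class); merge_upstreams and emit_merged_lineage are hypothetical helper names. The read-modify-write is not atomic, so concurrent writers can still overwrite each other.

```python
from typing import Any, Iterable, List, Optional


def merge_upstreams(existing: Iterable[Any], new: Iterable[Any]) -> List[Any]:
    """Combine upstream entries keyed by dataset URN; new entries win on conflict."""
    merged = {entry.dataset: entry for entry in existing}
    merged.update({entry.dataset: entry for entry in new})
    return list(merged.values())


def emit_merged_lineage(
    downstream_urn: str,
    new_upstreams: List[Any],
    gms_url: str = "http://localhost:8080",
    token: Optional[str] = None,
) -> None:
    """Fetch the current upstreamLineage aspect, merge in new entries, and re-emit."""
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph
    from datahub.metadata.schema_classes import UpstreamLineageClass

    graph = DataHubGraph(DatahubClientConfig(server=gms_url, token=token))

    # get_aspect returns None when the dataset has no lineage aspect yet.
    current = graph.get_aspect(entity_urn=downstream_urn, aspect_type=UpstreamLineageClass)
    existing = current.upstreams if current else []

    aspect = UpstreamLineageClass(
        upstreams=merge_upstreams(existing, new_upstreams),
        # Carry over existing column-level lineage unchanged.
        fineGrainedLineages=current.fineGrainedLineages if current else None,
    )
    graph.emit(MetadataChangeProposalWrapper(entityUrn=downstream_urn, aspect=aspect))
```

merge_upstreams only relies on each entry having a dataset attribute, so it accepts UpstreamClass objects as-is.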