Load the CDC source data into a Spark DataFrame or temp view that contains the inserts, updates, and deletes, with a change_type column indicating the operation for each row.
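Write the MERGE statement: MERGE INTO delta.`/path/to/customers` t USING cdc_source s ON t.id = s.id WHEN MATCHED AND s.change_type = 'delete' THEN DELETE WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED AND s.change_type != 'delete' THEN INSERT * WHEN NOT MATCHED BY SOURCE THEN DELETE. The condition on the insert clause prevents delete events for keys that never reached the target from being inserted as rows.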
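The WHEN NOT MATCHED BY SOURCE clause deletes target rows that have no matching row in the source, which gives full-table sync semantics. Include it only when the source holds a complete snapshot of the table (or of the filtered slice being merged); with an incremental change feed it would delete every target row that did not change in the batch.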
Add predicates on the target's partition columns to both the MERGE ON condition and the individual WHEN clauses to limit the scan and rewrite scope to specific partitions.
After the MERGE, run DESCRIBE HISTORY delta.`/path/to/customers` to confirm the MERGE operation was recorded with the correct operationMetrics.
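The steps above could be laid out roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: `region` is a hypothetical partition column of the target, the value 'EMEA' is illustrative, and cdc_source is assumed to contain only rows for that region and to be a complete snapshot of it, so that the WHEN NOT MATCHED BY SOURCE clause is safe to use.

```sql
-- Assumptions (not from the original): `region` is the target's partition
-- column, and cdc_source holds a full snapshot of the 'EMEA' slice only.
MERGE INTO delta.`/path/to/customers` t
USING cdc_source s
  ON t.id = s.id
 AND t.region = 'EMEA'   -- target-side predicate lets Delta prune partitions
WHEN MATCHED AND s.change_type = 'delete' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED AND s.change_type != 'delete' THEN
  INSERT *
WHEN NOT MATCHED BY SOURCE AND t.region = 'EMEA' THEN
  DELETE;                -- removes only rows inside the filtered partition

-- Inspect the most recent commit; for a MERGE, operationMetrics includes
-- counters such as numTargetRowsInserted, numTargetRowsUpdated,
-- and numTargetRowsDeleted.
DESCRIBE HISTORY delta.`/path/to/customers` LIMIT 1;
```

Repeating the partition predicate on the WHEN NOT MATCHED BY SOURCE clause matters: without it, target rows in other partitions have no source match and would be deleted.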
Known gotchas
WHEN NOT MATCHED BY SOURCE is available only in Delta Lake 2.3+ through the DataFrame API and in Delta Lake 2.4+ (Spark 3.4) through SQL; on Databricks it requires DBR 12.1+. Earlier versions do not support this clause and require a workaround such as a separate DELETE statement.
A MERGE with an unconditional WHEN NOT MATCHED BY SOURCE clause has to check every target row for presence in the source, which can cause a full table scan and a large rewrite on big tables; partition the target and add a condition to the clause so that only the relevant partitions are touched.
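Duplicate match keys in the source make the MERGE ambiguous: when several source rows match the same target row, Delta Lake generally fails the operation rather than picking one. Deduplicate or rank the source by change sequence before executing the merge so that each key appears at most once.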
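One way to deduplicate the source before merging is to keep only the latest change per key with a window function. This is a minimal sketch: `change_seq` is a hypothetical ordering column (for example a log sequence number or commit timestamp), and `cdc_source_latest` is an illustrative view name; neither appears in the original steps.

```sql
-- Assumption: `change_seq` orders changes for the same id, highest = newest.
CREATE OR REPLACE TEMP VIEW cdc_source_latest AS
SELECT *
FROM (
  SELECT
    s.*,
    ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.change_seq DESC) AS rn
  FROM cdc_source s
) ranked
WHERE rn = 1;  -- exactly one row per id: the most recent change
```

The MERGE would then read from cdc_source_latest instead of cdc_source. The helper column rn stays in the view; UPDATE SET * and INSERT * only assign columns that exist in the target, but drop rn explicitly if automatic schema evolution is enabled.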