Stage incoming CDC records into a temporary table or view whose columns match the target Iceberg table, plus an op_type column marking each record as an insert, update, or delete (the MERGE in the next step tests for 'U' and 'D')
Write the MERGE INTO statement targeting the Iceberg table:
    MERGE INTO catalog.db.target t
    USING staged_changes s
    ON t.id = s.id
    WHEN MATCHED AND s.op_type = 'D' THEN DELETE
    WHEN MATCHED AND s.op_type = 'U' THEN UPDATE SET t.col1 = s.col1, t.updated_at = s.updated_at
    WHEN NOT MATCHED AND s.op_type != 'D' THEN INSERT (id, col1, updated_at) VALUES (s.id, s.col1, s.updated_at)
Run the MERGE in Spark with the Iceberg Spark extensions enabled (spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions); without this, the MERGE INTO syntax is not available
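Check the newest snapshot's summary to confirm the commit: SELECT summary FROM catalog.db.target.snapshots ORDER BY committed_at DESC LIMIT 1. The summary map carries file- and record-level counters (for example added-records and deleted-records) rather than separate inserted, updated, and deleted row counts, so an update shows up as deleted plus added records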
For Copy-on-Write tables, MERGE rewrites entire affected data files; consider the write cost on large tables and evaluate whether a Merge-on-Read write mode (v2 format with position deletes) reduces write amplification
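The steps above can be sketched in Python as a small helper that assembles the MERGE statement from a key column and a list of data columns. The function name build_merge_sql and its parameters are hypothetical and not part of Iceberg or Spark; the table, view, and column names are the ones used in the steps. With PySpark and the Iceberg extensions configured, the resulting string would typically be passed to spark.sql(...), which is left out here because it needs a running Spark session.

```python
def build_merge_sql(target, source, key, columns):
    """Assemble a CDC-style MERGE INTO statement.

    Illustrative helper only (not an Iceberg or Spark API). It assumes the
    staged source exposes an op_type column with 'U' and 'D' markers.
    """
    if not columns:
        raise ValueError("at least one non-key column is required")
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    insert_cols = ", ".join([key, *columns])
    insert_vals = ", ".join(f"s.{c}" for c in [key, *columns])
    return "\n".join([
        f"MERGE INTO {target} t",
        f"USING {source} s",
        f"ON t.{key} = s.{key}",
        "WHEN MATCHED AND s.op_type = 'D' THEN DELETE",
        f"WHEN MATCHED AND s.op_type = 'U' THEN UPDATE SET {set_clause}",
        f"WHEN NOT MATCHED AND s.op_type != 'D' "
        f"THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


merge_sql = build_merge_sql(
    target="catalog.db.target",
    source="staged_changes",
    key="id",
    columns=["col1", "updated_at"],
)
print(merge_sql)
```

Generating the statement from a column list keeps the UPDATE and INSERT clauses in sync when the target schema gains columns. The identifiers are interpolated directly, so this sketch is only suitable for trusted, hard-coded names.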
Known gotchas
Merge-on-Read MERGE INTO requires the Iceberg table to use format-version 2, which introduces row-level delete files (equality or position deletes); format-version 1 tables do not support row-level deletes, so MERGE falls back to Copy-on-Write and rewrites full data files even for single-row matches
Spark's MERGE INTO does not support non-deterministic functions (e.g., current_timestamp()) in the SET clause on some Iceberg versions; use a literal or pre-computed column in the staged changes table instead
If the join key in the ON clause is not aligned with the table's partitioning or sort order, MERGE INTO can scan every file in the target; add a predicate on the target's partition column to the ON clause (and filter the USING subquery to the same range), or keep the target sorted on the merge key, to limit file scanning
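Two of the mitigations above can be sketched as statement builders: one emits an ALTER TABLE that moves the table to format-version 2 with merge-on-read row-level writes, the other emits an ON clause with a target-side partition predicate. The helper names, the event_date partition column, and the date bound are assumptions made for illustration; the property keys (format-version, write.delete.mode, write.update.mode, write.merge.mode) are Iceberg table properties.

```python
def merge_on_read_ddl(table):
    """ALTER TABLE statement switching a table to format v2 with
    merge-on-read deletes, updates, and merges (hypothetical helper)."""
    props = {
        "format-version": "2",
        "write.delete.mode": "merge-on-read",
        "write.update.mode": "merge-on-read",
        "write.merge.mode": "merge-on-read",
    }
    body = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({body})"


def pruned_on_clause(key, partition_col, lower_bound):
    """ON clause that adds a target-side partition predicate to the key match.

    partition_col and lower_bound are assumptions: they stand for whatever
    column the target is actually partitioned on.
    """
    return (
        f"ON t.{key} = s.{key} "
        f"AND t.{partition_col} >= DATE '{lower_bound}'"
    )


ddl = merge_on_read_ddl("catalog.db.target")
on_clause = pruned_on_clause("id", "event_date", "2024-01-01")
print(ddl)
print(on_clause)
```

A target-side predicate in the ON clause changes matching semantics: target rows outside the range never match, so a staged change for such a row would be inserted as a new row. It is only appropriate when the staged changes are known to touch rows inside the filtered range.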