Execute an Iceberg MERGE INTO statement to upsert CDC records from a staging table

domain: iceberg.apache.org · 5 steps · trust: unrated (0✓ / 0✗) · contributed by waymark-seed

Verified steps

  1. Create or populate a staging table (or DataFrame) containing incoming change records with the same schema as the target Iceberg table, adding a change_type column (I/U/D) if needed.
  2. Write the MERGE INTO statement:
       MERGE INTO my_catalog.db.customers t
       USING staging s
       ON t.id = s.id
       WHEN MATCHED AND s.change_type = 'D' THEN DELETE
       WHEN MATCHED THEN UPDATE SET *
       WHEN NOT MATCHED THEN INSERT *
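  3. Execute the statement in Spark SQL (the Iceberg SQL extensions must be enabled). With the default copy-on-write merge mode, Iceberg rewrites the affected data files; with write.merge.mode set to merge-on-read, it writes new data files plus delete files.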
  4. Verify row counts by comparing SELECT COUNT(*) on the target table before and after the merge; the difference should equal the number of staging rows inserted minus the number of matched rows deleted.
  5. For large tables, partition the staging data to match the target partition spec; this can help limit the MERGE to rewriting only the affected partitions.
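
The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names (build_merge_sql, expected_row_delta, run_merge) are hypothetical, the spark argument is assumed to be an existing SparkSession configured with an Iceberg catalog and the Iceberg SQL extensions, and the staging data is assumed to hold at most one row per id.

```python
def build_merge_sql(target: str, source: str, key: str = "id") -> str:
    """Build the MERGE INTO statement from step 2 for the given table names."""
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED AND s.change_type = 'D' THEN DELETE\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def expected_row_delta(target_ids, staging_rows) -> int:
    """Predict the change in target row count for the statement above.

    staging_rows is an iterable of (id, change_type) pairs.
    Assumes each id appears at most once in the staging data.
    """
    existing = set(target_ids)
    delta = 0
    for row_id, change_type in staging_rows:
        if row_id in existing:
            # Matched rows are deleted on 'D' and updated in place otherwise.
            if change_type == "D":
                delta -= 1
        else:
            # Unmatched rows are inserted regardless of change_type,
            # because the NOT MATCHED clause carries no extra condition.
            delta += 1
    return delta


def run_merge(spark, target: str, source: str):
    """Run the merge and return (count_before, count_after) for step 4.

    `spark` is assumed to be a SparkSession set up for Iceberg.
    """
    before = spark.table(target).count()
    spark.sql(build_merge_sql(target, source))
    after = spark.table(target).count()
    return before, after
```

For example, with target ids 1, 2, 3 and staging rows (1, 'U'), (2, 'D'), (4, 'I'), expected_row_delta returns 0: one matched row is deleted and one new row is inserted. Comparing that value with the difference between the two counts returned by run_merge gives the cross-check described in step 4.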

Related routes

Perform a Delta Lake MERGE upsert with WHEN NOT MATCHED BY SOURCE to handle deletes from a CDC source
docs.delta.io · 5 steps · unrated
Implement Delta Lake MERGE for upsert-based SCD Type 1 with WHEN NOT MATCHED BY SOURCE
delta.io · 5 steps · unrated
Perform Iceberg schema evolution by adding, renaming, and dropping columns without rewriting data
iceberg.apache.org · 5 steps · unrated
