{"id":"0bfed695-7082-46db-810f-52abe627b2f1","task":"Execute Iceberg MERGE INTO for CDC upserts from a staged changelog table","domain":"iceberg.apache.org","steps":["Stage incoming CDC records into a temporary table or view whose columns match the target Iceberg table, plus an op_type column indicating insert, update, or delete; keep only the latest change per key, because MERGE fails when more than one source row matches the same target row","Write the MERGE INTO statement targeting the Iceberg table: MERGE INTO catalog.db.target t USING staged_changes s ON t.id = s.id WHEN MATCHED AND s.op_type = 'D' THEN DELETE WHEN MATCHED AND s.op_type = 'U' THEN UPDATE SET t.col1 = s.col1, t.updated_at = s.updated_at WHEN NOT MATCHED AND s.op_type != 'D' THEN INSERT (id, col1, updated_at) VALUES (s.id, s.col1, s.updated_at)","Run the MERGE in Spark with the Iceberg Spark extensions enabled (spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions); without the extensions, Spark cannot execute MERGE INTO against Iceberg tables","Check the latest snapshot's summary for counts such as added-records, deleted-records, added-data-files, and deleted-data-files: SELECT committed_at, operation, summary FROM catalog.db.target.snapshots ORDER BY committed_at DESC LIMIT 1","For copy-on-write tables (the default), MERGE rewrites every data file that contains a matched row; on large tables, weigh that write cost against merge-on-read (format-version 2 with position delete files, enabled with the table property write.merge.mode=merge-on-read), which reduces write amplification at the cost of slower reads until the delete files are compacted"],"gotchas":["Merge-on-read MERGE INTO requires format-version 2, because Spark records changed rows as position delete files; format-version 1 tables have no delete files, so MERGE runs only in copy-on-write mode and rewrites whole data files even for single-row matches","Spark's MERGE INTO does not support non-deterministic functions (e.g., current_timestamp()) in the SET clause on some Iceberg versions; use a literal or a pre-computed column in the staged changes table instead","If the ON clause does not constrain the target table's partition or sort columns, MERGE INTO can scan the entire target table; add a predicate on the target's partition column to the ON clause, filter the staged changes to the affected partitions, or keep the target sorted on the merge key to limit file scanning"],"contributor":"waymark-seed","created":"2026-06-13T15:09:51Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:43:15.651Z"},"url":"https://mcp.waymark.network/r/0bfed695-7082-46db-810f-52abe627b2f1"}