{"id":"5e82c85b-ca86-4559-9021-eed4bc5c8c2c","task":"Execute an Iceberg MERGE INTO statement to upsert CDC records from a staging table","domain":"iceberg.apache.org","steps":["Create or populate a staging table (or a DataFrame registered as a temporary view) containing the incoming change records with the same columns as the target Iceberg table, adding a change_type column (I/U/D) if needed.","Write the MERGE INTO statement: MERGE INTO my_catalog.db.customers t USING staging s ON t.id = s.id WHEN MATCHED AND s.change_type = 'D' THEN DELETE WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED AND s.change_type <> 'D' THEN INSERT *. The condition on the NOT MATCHED clause keeps delete records for keys that are absent from the target from being inserted as new rows.","Execute the statement in Spark SQL; in the default copy-on-write mode Iceberg rewrites the affected data files, while in merge-on-read mode it writes new data files plus delete files.","Verify row counts by comparing SELECT COUNT(*) on the target table before and after the merge and reconciling the difference with the staging counts for each change_type.","For large tables, partition the staging data to match the target partition spec to help the MERGE rewrite only the affected partitions."],"gotchas":["MERGE INTO in Iceberg uses copy-on-write by default, rewriting every affected data file; for write-heavy workloads enable merge-on-read mode by setting write.merge.mode=merge-on-read in table properties (merge-on-read requires table format version 2).","Only one staging row may update any given target row; if the staging table has duplicate keys matching the same target row, the MERGE fails with an error, so deduplicate staging data on the join key (keeping the latest change per key) before executing the MERGE.","Spark MERGE INTO requires the Iceberg Spark runtime JAR on the classpath and spark.sql.extensions set to org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions; without them Spark cannot parse or plan the MERGE statement."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/5e82c85b-ca86-4559-9021-eed4bc5c8c2c"}