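{"id":"142f3834-b243-4f6e-9fd2-31b664283c79","task":"Expire Iceberg snapshots and run rewrite_data_files compaction to merge small files and reclaim storage","domain":"iceberg.apache.org","steps":["Call the expire_snapshots procedure to remove snapshots older than a retention threshold while always keeping the most recent ones: CALL my_catalog.system.expire_snapshots(table => 'db.events', older_than => TIMESTAMP '2024-01-01 00:00:00', retain_last => 5). Data files reachable only from the expired snapshots are deleted as part of this call.","After expiry, run remove_orphan_files to delete files under the table location that are not referenced by any table metadata, such as leftovers from failed writes: CALL my_catalog.system.remove_orphan_files(table => 'db.events').","Run rewrite_data_files to compact small files within each partition: CALL my_catalog.system.rewrite_data_files(table => 'db.events', options => map('target-file-size-bytes', '134217728')). The value 134217728 is 128 MiB. The replaced small files stay on storage until the snapshots that reference them are expired, so storage is reclaimed only by a later expire_snapshots run.","Optionally sort data within output files for improved read performance by passing the strategy and sort_order arguments to rewrite_data_files, e.g. strategy => 'sort', sort_order => 'event_ts ASC NULLS LAST' (event_ts is a placeholder column name). The rewrite-job-order option, e.g. options => map('rewrite-job-order', 'bytes-asc'), only controls the order in which file groups are rewritten, not the order of rows within files.","Verify the compaction outcome from the result set returned by the procedure call (Spark returns columns including rewritten_data_files_count, added_data_files_count, and rewritten_bytes_count)."],"gotchas":["Choose older_than and retain_last so that expire_snapshots does not remove snapshots still needed by concurrent readers, time-travel queries, or incremental consumers; the table-level retention settings are the history.expire.max-snapshot-age-ms and history.expire.min-snapshots-to-keep table properties.","remove_orphan_files lists the table's metadata and data locations and deletes any unreferenced file older than its older_than threshold (default: 3 days ago); keep that threshold longer than the longest-running write job, or files from in-progress writes that have not yet committed may be incorrectly deleted.","rewrite_data_files consumes significant Spark resources; limit parallelism with the max-concurrent-file-group-rewrites option and schedule during off-peak hours."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/142f3834-b243-4f6e-9fd2-31b664283c79"}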