Steps

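Diagnose manifest bloat by querying the Iceberg metadata tables: SELECT count(*), avg(added_data_files_count + existing_data_files_count) FROM <catalog>.<db>.<table>.manifests (the manifests table lists the manifests of the current snapshot); many small manifests, each tracking only a few data files, indicate that manifest compaction is needed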
Run the Spark procedure CALL catalog.system.rewrite_manifests(table => 'db.table'). It rewrites the manifest files, merging small manifests into larger ones, while preserving every existing data file reference; no data files are read or rewritten
use_caching controls whether Spark caches data during the rewrite and defaults to true; caching helps on large tables but can add executor memory pressure, so pass use_caching => false if executors run short of memory
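After the procedure completes, inspect the new snapshot with SELECT * FROM db.table.snapshots ORDER BY committed_at DESC LIMIT 1 (its operation should be replace), then rerun the manifests query from the first step to compare the manifest count before and after; the procedure's own output also reports rewritten_manifests_count and added_manifests_count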
Schedule rewrite_manifests after heavy incremental write periods (for example, once streaming micro-batch commits have accumulated thousands of small manifests) rather than immediately after data compaction, since rewrite_data_files already rewrites the manifests for the files it compacts
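The diagnose and run steps above can be sketched in plain Python. This is a minimal sketch, assuming each row of the manifests metadata table has already been fetched as a dict (for example with spark.sql(...).collect() and Row.asDict()) and that the count columns are non-null; the function names and the thresholds min_manifests and max_avg_files are hypothetical, not part of Iceberg.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class ManifestStats:
    manifest_count: int    # manifests in the current snapshot
    avg_live_files: float  # mean of added + existing data files per manifest


def manifest_stats(rows: Iterable[Mapping[str, int]]) -> ManifestStats:
    """Summarize rows read from the <table>.manifests metadata table."""
    rows = list(rows)
    if not rows:
        return ManifestStats(manifest_count=0, avg_live_files=0.0)
    # A manifest's live data files are the ones it added plus the ones it carried over.
    live = [r["added_data_files_count"] + r["existing_data_files_count"] for r in rows]
    return ManifestStats(manifest_count=len(rows), avg_live_files=sum(live) / len(rows))


def needs_rewrite(stats: ManifestStats, min_manifests: int = 100,
                  max_avg_files: float = 50.0) -> bool:
    """Hypothetical heuristic: many manifests that each track only a few files."""
    return stats.manifest_count >= min_manifests and stats.avg_live_files < max_avg_files


def rewrite_manifests_sql(catalog: str, table: str, use_caching: bool = True) -> str:
    """Build the Spark SQL CALL for rewrite_manifests; `table` is 'db.table'."""
    caching = "true" if use_caching else "false"
    return (f"CALL {catalog}.system.rewrite_manifests("
            f"table => '{table}', use_caching => {caching})")


if __name__ == "__main__":
    # Hypothetical input: 200 manifests, each tracking 2 data files.
    before = manifest_stats([{"added_data_files_count": 2,
                              "existing_data_files_count": 0}] * 200)
    if needs_rewrite(before):
        print(rewrite_manifests_sql("prod", "db.events"))
```

Running the script prints CALL prod.system.rewrite_manifests(table => 'db.events', use_caching => true). Calling manifest_stats again on the rows read after the rewrite gives the before/after comparison from the verification step; the thresholds should be tuned to the table rather than taken as given.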

Known gotchas

rewrite_manifests commits a new snapshot, so it interacts with snapshot expiry: the replaced manifest files stay in storage, still referenced by earlier snapshots, until those snapshots are expired with expire_snapshots; storage is reclaimed only after that expiry, and the current snapshot (with its new manifests) is never expired
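Iceberg commits use optimistic concurrency rather than a table lock, so the rewrite does not block concurrent writers; on very large tables the rewrite itself can still take several minutes, and if a concurrent commit removes or replaces manifests being rewritten, the rewrite commit can fail and must be rerun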
Running rewrite_manifests and rewrite_data_files at the same time will not corrupt the table, but their commits can conflict and the work overlaps, because rewrite_data_files already rewrites manifests for the data files it processes; schedule them apart to avoid double work
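Because the manifests replaced by rewrite_manifests are deleted only once the older snapshots that reference them expire, a follow-up expire_snapshots call is what reclaims their storage. A minimal sketch that builds that call, assuming the Spark expire_snapshots procedure with its older_than and retain_last arguments; the helper name and the seven-day window are illustrative assumptions, not Iceberg recommendations.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expire_snapshots_sql(catalog: str, table: str,
                         retention: timedelta = timedelta(days=7),
                         retain_last: int = 1,
                         now: Optional[datetime] = None) -> str:
    """Build a Spark SQL CALL that expires snapshots older than `retention`.

    Expiring the snapshots that predate a rewrite_manifests commit is what lets
    Iceberg delete the old manifest files those snapshots still reference.
    `now` is expected to be a UTC datetime.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    now = now or datetime.now(timezone.utc)
    # Assumes the Spark session time zone is UTC, since the TIMESTAMP literal
    # below is interpreted in the session time zone.
    cutoff = (now - retention).strftime("%Y-%m-%d %H:%M:%S")
    return (f"CALL {catalog}.system.expire_snapshots("
            f"table => '{table}', "
            f"older_than => TIMESTAMP '{cutoff}', "
            f"retain_last => {retain_last})")
```

For example, with now=datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc) the default window produces older_than => TIMESTAMP '2024-05-03 12:00:00'. Passing `now` explicitly keeps the generated statement reproducible in scheduled jobs and tests.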

iceberg.apache.org · 6 steps · unrated

Manage Iceberg table metadata compaction: rewrite manifests and expire old snapshots

iceberg.apache.org · 5 steps · unrated

Expire Iceberg snapshots and run rewrite_data_files compaction to merge small files and reclaim storage

iceberg.apache.org · 5 steps · unrated

Give your agent this knowledge — and 15,600+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Use Iceberg rewrite_manifests to compact small manifest files and reduce planning overhead
