Diagnose manifest bloat by querying the Iceberg manifests metadata table: SELECT count(*), avg(added_data_files_count) FROM <catalog>.<db>.<table>.manifests; a high manifest count with only a few data files per manifest indicates that the manifests need rewriting
Run the Spark procedure CALL catalog.system.rewrite_manifests(table => 'db.table') — this rewrites manifest files by merging small manifests into larger ones while preserving all existing data file references; no data files are touched
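Optionally pass use_caching => true to let Spark cache the manifest entries it reads during the rewrite, which can speed up large tables at the cost of executor memory; the default is reported as true, but confirm it for your Iceberg version or set the flag explicitly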
After the procedure completes, check the newest snapshot with SELECT * FROM db.table.snapshots ORDER BY committed_at DESC LIMIT 1 (its operation should be replace), then rerun the manifests query to compare the manifest count before and after
Schedule rewrite_manifests to run after heavy incremental write periods (e.g., after streaming micro-batch commits accumulate thousands of manifests) rather than after compaction, since rewrite_data_files already rewrites manifests for affected files
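The steps above can be sketched as a small Python helper. This is a minimal sketch, not a definitive implementation: run_sql is a hypothetical adapter (in PySpark it could wrap spark.sql(query).collect() and convert rows to dicts), max_manifests is an arbitrary illustrative threshold, and the column name added_data_files_count is an assumption based on recent Iceberg Spark releases, so adjust it if your manifests table differs.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical adapter: executes one SQL statement and returns rows as dicts.
RunSql = Callable[[str], List[Row]]


def manifest_stats(run_sql: RunSql, catalog: str, table: str) -> Row:
    """Return the manifest count and average data files added per manifest."""
    query = (
        "SELECT count(*) AS manifest_count, "
        "avg(added_data_files_count) AS avg_added_files "  # assumed column name
        f"FROM {catalog}.{table}.manifests"
    )
    return run_sql(query)[0]


def rewrite_call(catalog: str, table: str, use_caching: bool) -> str:
    """Build the CALL statement with use_caching set explicitly."""
    flag = "true" if use_caching else "false"
    return (
        f"CALL {catalog}.system.rewrite_manifests("
        f"table => '{table}', use_caching => {flag})"
    )


def compact_manifests(
    run_sql: RunSql,
    catalog: str,
    table: str,
    max_manifests: int = 1000,  # illustrative threshold, tune per table
    use_caching: bool = True,
) -> Row:
    """Rewrite manifests only when the count exceeds the threshold."""
    before = manifest_stats(run_sql, catalog, table)["manifest_count"]
    if before <= max_manifests:
        return {"rewritten": False, "before": before, "after": before}
    run_sql(rewrite_call(catalog, table, use_caching))
    after = manifest_stats(run_sql, catalog, table)["manifest_count"]
    return {"rewritten": True, "before": before, "after": after}
```

Passing the SQL runner in as a callable keeps the sketch independent of any particular Spark session setup and lets the before/after comparison be exercised with a stub.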
Known gotchas
rewrite_manifests commits a new snapshot, so it interacts with snapshot expiry: the old manifests are deleted only once the snapshots that still reference them are expired, and the new snapshot is subject to the same retention window before it can be expired in turn
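Iceberg commits are optimistic, so the procedure does not lock the table while it rewrites; only the final metadata swap is serialized, and some catalogs (for example Hive Metastore) take a short lock for it; on very large tables with many manifests the rewrite can run for several minutes, and writers that commit during that window can force the rewrite commit to retry or fail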
Running rewrite_manifests and rewrite_data_files simultaneously is safe but redundant — rewrite_data_files already rewrites manifests for the data files it processes; coordinate scheduling to avoid double work
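One way to act on these gotchas is to generate the maintenance statements for a run in a fixed order. The sketch below is illustrative only: the function name, the retain_days default, and the one-rewrite-per-run policy are assumptions, while the procedure names and the older_than argument follow the Iceberg Spark procedures.

```python
from datetime import datetime, timedelta
from typing import List


def maintenance_statements(
    catalog: str,
    table: str,
    now: datetime,
    retain_days: int = 7,  # illustrative retention window
    compact_data: bool = False,
) -> List[str]:
    """Return CALL statements to run sequentially for one maintenance pass."""
    # Expire only snapshots older than the retention window, so the snapshot
    # created by the rewrite is kept until it has aged as well.
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    # Run one rewrite per pass instead of both at once to avoid double work.
    rewrite = "rewrite_data_files" if compact_data else "rewrite_manifests"
    return [
        f"CALL {catalog}.system.{rewrite}(table => '{table}')",
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
    ]
```

Running the returned statements one after another, rather than in parallel jobs, keeps the two rewrite procedures from competing for the same commit.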