Steps

Identify partitions with many small files by querying the files metadata table for partitions where file count exceeds a threshold or average file size is below target (typically 128–512 MB).
Call rewrite_data_files with the target-file-size-bytes option. Use the binpack strategy (the default) for pure size optimization, or the sort strategy with a sort_order argument to co-locate rows on frequently filtered columns.
Set the max-concurrent-file-group-rewrites option to control parallelism; higher values speed compaction but increase cluster memory pressure.
Use the partial-progress options (partial-progress.enabled and partial-progress.max-commits) to commit rewrites incrementally so that a failure mid-job does not lose all progress.
After compaction, run expire_snapshots to expire the older snapshots that still reference the pre-compaction small files; only then can those files be deleted and storage reclaimed.
Monitor the compaction metrics returned by the procedure (such as rewritten_data_files_count, added_data_files_count, and rewritten_bytes_count) to tune parameters iteratively.
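The steps above can be sketched as a small Python helper that only builds the SQL text, so it runs without a cluster. The catalog name my_catalog, the table db.events, the columns event_ts and event_date, and all thresholds are hypothetical placeholders; the option keys and named arguments follow the Iceberg Spark procedure documentation, but check them against the Iceberg version in use.

```python
from typing import Dict, Optional

MB = 1024 * 1024


def small_file_query(table: str, max_files: int, min_avg_bytes: int) -> str:
    """Query the `files` metadata table for partitions with many or small files."""
    return (
        "SELECT partition, COUNT(*) AS file_count, "
        "AVG(file_size_in_bytes) AS avg_bytes "
        f"FROM {table}.files GROUP BY partition "
        f"HAVING COUNT(*) > {max_files} "
        f"OR AVG(file_size_in_bytes) < {min_avg_bytes}"
    )


def rewrite_call(catalog: str, table: str, options: Dict[str, str],
                 strategy: str = "binpack",
                 sort_order: Optional[str] = None,
                 where: Optional[str] = None) -> str:
    """Build a CALL statement for the rewrite_data_files procedure."""
    args = [f"table => '{table}'", f"strategy => '{strategy}'"]
    if sort_order:
        args.append(f"sort_order => '{sort_order}'")
    # Procedure options are passed as a flat map of string keys and values.
    pairs = ", ".join(f"'{k}', '{v}'" for k, v in options.items())
    args.append(f"options => map({pairs})")
    if where:
        # Assumes the filter contains no single quotes (no escaping is done).
        args.append(f"where => '{where}'")
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"


def expire_call(catalog: str, table: str, older_than: str, retain_last: int) -> str:
    """Build a CALL statement for the expire_snapshots procedure."""
    return (
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{older_than}', retain_last => {retain_last})"
    )


# Hypothetical values chosen for illustration only.
options = {
    "target-file-size-bytes": str(256 * MB),
    "max-concurrent-file-group-rewrites": "4",
    "partial-progress.enabled": "true",
    "partial-progress.max-commits": "10",
}
statements = [
    small_file_query("my_catalog.db.events", max_files=100, min_avg_bytes=128 * MB),
    rewrite_call("my_catalog", "db.events", options, strategy="sort",
                 sort_order="event_ts ASC NULLS LAST",
                 where='event_date = "2024-01-01"'),
    expire_call("my_catalog", "db.events", "2024-01-08 00:00:00", retain_last=5),
]
for statement in statements:
    print(statement)
```

In a Spark session with the Iceberg extensions enabled, each string could be passed to spark.sql(...); the DataFrame returned by the rewrite call carries the metrics mentioned in the last step.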

Known gotchas

Compaction with sort order rewrites all matched files, which is expensive; restrict the where filter to the partitions that actually need it rather than running table-wide.
The sort strategy changes file layout so that min/max statistics stay tight only for the sort key; queries that filter on other columns may see less effective statistics-based file pruning.
Running compaction concurrently with high-frequency writes can cause optimistic concurrency conflicts; schedule it in low-write windows or enable partial-progress commits so that each commit covers fewer file groups.
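One way to keep the where filter narrow, as the first gotcha suggests, is to derive it from per-partition statistics. The sketch below is pure Python and assumes the statistics were already collected as (partition value, file count, average bytes) tuples; the column name event_date, the thresholds, and the sample rows are hypothetical, and the double-quoted literals assume Spark's default string-literal handling.

```python
from typing import List, Tuple

MB = 1024 * 1024

# (partition value, file count, average file size in bytes)
PartitionStat = Tuple[str, int, int]


def partitions_needing_compaction(stats: List[PartitionStat],
                                  max_files: int,
                                  min_avg_bytes: int) -> List[str]:
    """Return partition values with too many files or too small an average size."""
    return [
        value for value, file_count, avg_bytes in stats
        if file_count > max_files or avg_bytes < min_avg_bytes
    ]


def where_filter(column: str, values: List[str]) -> str:
    """Build an OR-ed equality predicate restricted to the given partitions."""
    return " OR ".join(f'{column} = "{value}"' for value in values)


# Hypothetical sample statistics for three daily partitions.
stats = [
    ("2024-01-01", 900, 4_000_000),     # many tiny files
    ("2024-01-02", 3, 300_000_000),     # already well sized
    ("2024-01-03", 12, 20_000_000),     # few files, but small
]
selected = partitions_needing_compaction(stats, max_files=100, min_avg_bytes=128 * MB)
predicate = where_filter("event_date", selected)
print(predicate)
```

The resulting predicate can be supplied as the where argument of rewrite_data_files, so only the first and third partitions in this sample would be rewritten.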
