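Connect to Trino with the Iceberg catalog configured and identify a table with many small files by aggregating its Iceberg metadata table, <catalog>.<schema>."<table>$files" (the double quotes are required because of the $): SELECT count(*) AS file_count, sum(file_size_in_bytes) AS total_size FROM <catalog>.<schema>."<table>$files". The standard information_schema.tables view has no file count or size columns, so it cannot be used for this check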
Run the optimize command: ALTER TABLE <catalog>.<schema>.<table> EXECUTE optimize(file_size_threshold => '128MB'). Trino rewrites data files smaller than the threshold into fewer, larger files, the same kind of compaction that Iceberg's rewrite_data_files procedure performs in Spark. If file_size_threshold is omitted, Trino uses a default of 100MB
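Optionally scope the optimize to a specific partition to reduce resource usage. Parameters go in the parentheses after optimize and the filter follows as a WHERE clause on partition columns: ALTER TABLE <catalog>.<schema>.<table> EXECUTE optimize(file_size_threshold => '128MB') WHERE partition_column = 'value'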
Monitor query progress in the Trino Web UI. For large tables with many small files this can be a long-running query, so consider running it during off-peak hours and setting the query_max_execution_time session property high enough for the compaction to finish
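After completion, query the Iceberg metadata again to confirm that file counts have decreased and average file sizes are near the target. Then run expire_snapshots to clean up the now-superseded pre-compaction files. Trino supports this natively with ALTER TABLE <catalog>.<schema>.<table> EXECUTE expire_snapshots(retention_threshold => '7d'), and the Spark expire_snapshots procedure works as well. The old files are deleted only once the snapshots that reference them fall outside the retention threshold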
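The steps above can be sketched end to end as a single Trino session. This is an illustrative sequence, not a tuned procedure: the catalog example, schema sales, table orders, and partition column order_date are hypothetical names, and the 2h limit, 128MB threshold, and 7d retention are placeholder values to adjust for your cluster.

```sql
-- Hypothetical names: catalog "example", schema "sales", table "orders",
-- identity-partitioned by order_date. Substitute your own.

-- 1. Measure the small-file problem from the $files metadata table.
--    The table name must be double-quoted because it contains '$'.
SELECT
    count(*)                                         AS file_count,
    sum(file_size_in_bytes)                          AS total_bytes,
    avg(file_size_in_bytes)                          AS avg_file_bytes,
    count_if(file_size_in_bytes < 128 * 1024 * 1024) AS files_under_128mb
FROM example.sales."orders$files";

-- 2. Give the compaction query room to run (2h is an arbitrary example).
SET SESSION query_max_execution_time = '2h';

-- 3a. Compact the whole table: files below the threshold are merged.
ALTER TABLE example.sales.orders
EXECUTE optimize(file_size_threshold => '128MB');

-- 3b. Or compact a single partition. The WHERE clause comes after the
--     parameter list and should filter on partition columns.
ALTER TABLE example.sales.orders
EXECUTE optimize(file_size_threshold => '128MB')
WHERE order_date = DATE '2024-01-01';

-- 4. Re-run the query from step 1 and compare file_count and avg_file_bytes.

-- 5. Expire old snapshots so the pre-compaction files can be deleted.
--    Trino rejects thresholds below the catalog's configured minimum
--    retention, so very short values may fail.
ALTER TABLE example.sales.orders
EXECUTE expire_snapshots(retention_threshold => '7d');
```

Running 3b partition by partition keeps each query short and makes it easier to stop and resume than one table-wide rewrite.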
Known gotchas
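Compacting an Iceberg table with ALTER TABLE EXECUTE optimize requires a catalog that uses the Iceberg connector. A Hive connector catalog cannot compact Iceberg tables even if the data files are Parquet; the Hive connector has its own optimize command, disabled by default, that applies only to Hive tables. Ensure you are using the correct catalog type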
The optimize command runs as a regular distributed Trino query and uses worker CPU and memory for the duration of the compaction; on busy clusters, route it to a resource group with appropriate limits to prevent compaction from consuming all available worker memory
Trino's optimize does not sort data within the compacted files; if you also need sort-order optimization (equivalent to ZORDER or sort-based compaction in Iceberg), you must use the Spark rewrite_data_files procedure with sort_order options instead
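For the sort-based alternative mentioned in the last gotcha, a minimal sketch of the Spark side looks like the following. It runs in Spark SQL with the Iceberg SQL extensions enabled, not in Trino, and the catalog example, table sales.orders, and columns customer_id and order_date are hypothetical.

```sql
-- Run in Spark SQL, not Trino. Hypothetical names throughout.

-- Sort-based compaction: rewritten files are ordered by the given columns.
CALL example.system.rewrite_data_files(
    table => 'sales.orders',
    strategy => 'sort',
    sort_order => 'customer_id ASC NULLS LAST, order_date DESC'
);

-- Z-order variant: clusters on several columns at once.
CALL example.system.rewrite_data_files(
    table => 'sales.orders',
    strategy => 'sort',
    sort_order => 'zorder(customer_id, order_date)'
);
```

Use one variant or the other for a given table; choose the columns that appear most often in query filters.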