Call the expire_snapshots procedure to remove snapshots older than a retention threshold: CALL my_catalog.system.expire_snapshots(table => 'db.events', older_than => TIMESTAMP '2024-01-01 00:00:00', retain_last => 5).
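After expiry, run remove_orphan_files to delete files under the table location that no table metadata references, such as leftovers from failed or aborted writes (data files that only belonged to expired snapshots are already deleted by expire_snapshots itself): CALL my_catalog.system.remove_orphan_files(table => 'db.events').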
Run rewrite_data_files to compact small files within each partition: CALL my_catalog.system.rewrite_data_files(table => 'db.events', options => map('target-file-size-bytes', '134217728')).
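Optionally sort data within output files for improved read performance by passing strategy => 'sort' together with a sort_order argument, for example sort_order => '<column> ASC NULLS LAST'. Note that the rewrite-job-order option, as in options => map('rewrite-job-order', 'bytes-asc'), only controls the order in which file groups are rewritten; it does not sort rows.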
Verify the compaction outcome by inspecting the result of the procedure call: in Spark, rewrite_data_files returns a result set whose columns include rewritten_data_files_count, added_data_files_count, and rewritten_bytes_count.
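The steps above can be sketched as a small Python helper that computes the retention cutoff and builds the three CALL statements in the same order. This is a minimal sketch, not a definitive implementation: the function names, the keep_days parameter, and the 7-day default are assumptions for illustration, and the strings are meant to be passed to an existing SparkSession that already has the Iceberg catalog configured.

```python
from datetime import datetime, timedelta


def expire_snapshots_sql(catalog, table, older_than, retain_last):
    # Format the cutoff as a Spark SQL TIMESTAMP literal.
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )


def remove_orphan_files_sql(catalog, table):
    # No older_than is passed, so the procedure's own default cutoff applies.
    return f"CALL {catalog}.system.remove_orphan_files(table => '{table}')"


def rewrite_data_files_sql(catalog, table, target_bytes, sort_order=None):
    args = [f"table => '{table}'"]
    if sort_order is not None:
        # Sorting rows needs the sort strategy plus a sort_order expression.
        args += ["strategy => 'sort'", f"sort_order => '{sort_order}'"]
    args.append(f"options => map('target-file-size-bytes', '{target_bytes}')")
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"


def maintenance_statements(catalog, table, now, keep_days=7, retain_last=5,
                           target_bytes=128 * 1024 * 1024):
    # keep_days is a hypothetical knob: snapshots older than now - keep_days expire.
    cutoff = now - timedelta(days=keep_days)
    return [
        expire_snapshots_sql(catalog, table, cutoff, retain_last),
        remove_orphan_files_sql(catalog, table),
        rewrite_data_files_sql(catalog, table, target_bytes),
    ]


# Usage, assuming `spark` is a SparkSession with the Iceberg catalog registered:
# for stmt in maintenance_statements("my_catalog", "db.events", datetime.now()):
#     spark.sql(stmt).show(truncate=False)
```

Keeping the statement builders separate from execution makes the cutoff arithmetic easy to check before anything touches the table.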
Known gotchas
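The table properties history.expire.min-snapshots-to-keep and history.expire.max-snapshot-age-ms supply the defaults for expire_snapshots; explicit older_than and retain_last arguments override them, so choose values that do not expire snapshots still needed by concurrent readers.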
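remove_orphan_files lists the table's data and metadata locations and deletes files that no metadata references. Files written by an in-flight commit look orphaned until that commit lands, so keep older_than at least as far back as the default (3 days ago) and beyond your longest-running write, or valid in-progress files may be incorrectly deleted.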
rewrite_data_files consumes significant Spark resources; limit parallelism with the max-concurrent-file-group-rewrites option and schedule during off-peak hours.
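As a hedged illustration of the last two gotchas, the sketch below refuses to build a remove_orphan_files call whose cutoff is too recent and builds a rewrite_data_files call with limited concurrency. The helper names and the MIN_ORPHAN_AGE constant are assumptions for this example; dry_run and max-concurrent-file-group-rewrites are used here as the procedure argument and option of those names.

```python
from datetime import datetime, timedelta

# Assumption: mirror the procedure's default cutoff of 3 days as a safety floor.
MIN_ORPHAN_AGE = timedelta(days=3)


def orphan_cleanup_sql(catalog, table, now, older_than, dry_run=True):
    # Reject cutoffs recent enough to catch files from in-flight writes.
    if now - older_than < MIN_ORPHAN_AGE:
        raise ValueError("older_than is too recent; in-progress files could be deleted")
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    flag = "true" if dry_run else "false"
    return (
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{ts}', dry_run => {flag})"
    )


def throttled_rewrite_sql(catalog, table, max_groups=2):
    # Cap how many file groups are rewritten at the same time.
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('max-concurrent-file-group-rewrites', '{max_groups}'))"
    )
```

Running with dry_run first lists the files that would be removed without deleting them, which gives a chance to review the candidates before the real cleanup.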