Connect to a Spark session or a Delta Lake-compatible engine (Databricks, Delta Rust) with write access to the Delta table.
Run OPTIMIZE to compact small files: OPTIMIZE delta.`/path/to/table`; To also co-locate related values in the same files, which improves data skipping for queries that filter on that column, add Z-ordering: OPTIMIZE delta.`/path/to/table` ZORDER BY (column_name);
Wait for OPTIMIZE to complete; it returns metrics showing how many files were added and removed, and their total size.
Run VACUUM to delete files no longer referenced by the current or recent snapshots: VACUUM delta.`/path/to/table` RETAIN 168 HOURS; (168 hours is 7 days, which is both the default retention and the recommended minimum).
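Confirm the file count reduction by running DESCRIBE DETAIL on the table and inspecting the numFiles field. numFiles counts the files in the current table version, so it reflects the compaction done by OPTIMIZE; the files removed by VACUUM were already unreferenced and are not included in that count.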
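The steps above can be sketched in Python as a set of statement builders plus a small driver. This is a minimal sketch, not a definitive implementation: the function names (`optimize_sql`, `vacuum_sql`, `describe_detail_sql`, `run_maintenance`) are hypothetical, and `spark` is assumed to be an existing SparkSession with Delta Lake configured and write access to the table.

```python
def optimize_sql(path, zorder_cols=None):
    """Build the OPTIMIZE statement, optionally with ZORDER BY."""
    stmt = f"OPTIMIZE delta.`{path}`"
    if zorder_cols:
        stmt += f" ZORDER BY ({', '.join(zorder_cols)})"
    return stmt


def vacuum_sql(path, retain_hours=168):
    """Build the VACUUM statement; 168 hours is the 7-day default."""
    return f"VACUUM delta.`{path}` RETAIN {retain_hours} HOURS"


def describe_detail_sql(path):
    """Build the DESCRIBE DETAIL statement used to read numFiles."""
    return f"DESCRIBE DETAIL delta.`{path}`"


def run_maintenance(spark, path, zorder_cols=None):
    """Run OPTIMIZE then VACUUM and return (numFiles before, numFiles after).

    `spark` is assumed to be a SparkSession with Delta Lake enabled.
    """
    before = spark.sql(describe_detail_sql(path)).collect()[0]["numFiles"]
    # OPTIMIZE returns its metrics as a result set; collect() retrieves them.
    spark.sql(optimize_sql(path, zorder_cols)).collect()
    spark.sql(vacuum_sql(path))
    after = spark.sql(describe_detail_sql(path)).collect()[0]["numFiles"]
    return before, after
```

With a live session, `run_maintenance(spark, "/path/to/table", ["column_name"])` would return the file counts before and after, which makes the reduction easy to log or assert on.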
Known gotchas
Running VACUUM with a retention period shorter than 7 days (168 hours) requires explicitly setting spark.databricks.delta.retentionDurationCheck.enabled to false; the default safety check prevents accidental data loss.
VACUUM permanently deletes files; time-travel queries that reference versions older than the retention threshold will fail once VACUUM completes, and so can long-running readers that are still using those files.
OPTIMIZE with ZORDER rewrites all files in the affected partitions; on large tables this is expensive, so apply a partition filter (WHERE clause) to limit the scope.
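Two of the gotchas above lend themselves to a guard in code. The following is a minimal sketch under stated assumptions: `vacuum_statements` and `scoped_optimize_sql` are hypothetical helper names, the 168-hour threshold assumes the table uses the default retention, and the partition column `event_date` and Z-order column `user_id` in the usage note are purely illustrative.

```python
DEFAULT_RETENTION_HOURS = 168  # 7 days; assumes the table keeps the default
RETENTION_CHECK_CONF = "spark.databricks.delta.retentionDurationCheck.enabled"


def vacuum_statements(path, retain_hours, allow_short_retention=False):
    """Return the SQL statements to run, in order, for a VACUUM.

    A retention below the default is refused unless the caller opts in,
    mirroring the safety check that Delta Lake applies by default.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    short = retain_hours < DEFAULT_RETENTION_HOURS
    if short and not allow_short_retention:
        raise ValueError(
            f"retention of {retain_hours}h is below {DEFAULT_RETENTION_HOURS}h; "
            "pass allow_short_retention=True to override"
        )
    stmts = []
    if short:
        # Disable the safety check only for the duration of this VACUUM.
        stmts.append(f"SET {RETENTION_CHECK_CONF} = false")
    stmts.append(f"VACUUM delta.`{path}` RETAIN {retain_hours} HOURS")
    if short:
        stmts.append(f"SET {RETENTION_CHECK_CONF} = true")
    return stmts


def scoped_optimize_sql(path, partition_filter, zorder_cols):
    """Build an OPTIMIZE limited to the partitions matching the filter."""
    return (
        f"OPTIMIZE delta.`{path}` WHERE {partition_filter} "
        f"ZORDER BY ({', '.join(zorder_cols)})"
    )
```

For example, `scoped_optimize_sql("/path/to/table", "event_date >= '2024-01-01'", ["user_id"])` restricts the rewrite to recent partitions, and each statement returned by `vacuum_statements` can be passed to `spark.sql` in order. Forcing an explicit opt-in for short retention keeps the permanent deletion a deliberate choice.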