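Call the expire_snapshots procedure (or the equivalent expireSnapshots action in the SparkActions API) with an older_than timestamp. Snapshots older than the cutoff are removed from table metadata, and files referenced only by those snapshots are deleted. Set retain_last (retainLast in SparkActions) as a guard that keeps a minimum number of recent snapshots even when they are older than the cutoff. The current snapshot itself is never expired.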
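A minimal sketch of this step, assuming Spark SQL with an Iceberg catalog registered under a name you pass in (`my_catalog` and `db.events` below are hypothetical) and a Spark session time zone of UTC. It only builds the `CALL` statement, so it can be checked without a cluster:

```python
from datetime import datetime, timezone


def expire_snapshots_sql(catalog: str, table: str, older_than: datetime,
                         retain_last: int = 1) -> str:
    """Build a Spark SQL CALL for Iceberg's expire_snapshots procedure.

    Assumption: the Spark session time zone is UTC, because a TIMESTAMP
    literal is interpreted in the session time zone.
    Assumption: catalog and table names come from trusted configuration;
    they are interpolated into SQL without escaping.
    """
    if older_than.tzinfo is None:
        raise ValueError("older_than must be timezone-aware")
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    ts = older_than.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )
```

With an Iceberg-enabled SparkSession named `spark`, the resulting string would be executed with `spark.sql(...)`.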
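After expiry, run remove_orphan_files on the table location, which covers its data and metadata directories. Choose an older_than threshold that lies further in the past than the longest-running concurrent write could take to commit; files newer than the threshold are skipped, so files from in-progress writes are not deleted.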
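The threshold rule can be sketched as below, assuming you already know (or can bound) the longest write duration; the 12-hour safety margin, the helper names, and the decision to default `dry_run` to true in this helper are illustrative choices, not Iceberg defaults (the procedure's own `dry_run` default is false). As before, the session time zone is assumed to be UTC:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def orphan_cutoff(now: datetime, longest_write: timedelta,
                  safety_margin: timedelta = timedelta(hours=12)) -> datetime:
    """Return an older_than cutoff safely behind any in-flight write.

    Only files older than the cutoff are orphan candidates, so the cutoff
    must lie further in the past than the longest write takes to commit.
    The safety margin is an assumed buffer, not an Iceberg setting.
    """
    if longest_write <= timedelta(0):
        raise ValueError("longest_write must be positive")
    return now - (longest_write + safety_margin)


def remove_orphan_files_sql(catalog: str, table: str, older_than: datetime,
                            location: Optional[str] = None,
                            dry_run: bool = True) -> str:
    """Build a CALL for Iceberg's remove_orphan_files (session TZ assumed UTC).

    This helper defaults to a dry run so the candidate list can be reviewed
    before anything is deleted.
    """
    ts = older_than.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    args = [f"table => '{table}'", f"older_than => TIMESTAMP '{ts}'"]
    if location is not None:
        args.append(f"location => '{location}'")
    args.append(f"dry_run => {'true' if dry_run else 'false'}")
    return f"CALL {catalog}.system.remove_orphan_files({', '.join(args)})"
```

Running once with `dry_run => true` and inspecting the returned paths before a real run is a cheap way to catch a threshold that is too aggressive.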
Execute both procedures in a maintenance job outside peak query hours; these are metadata-intensive operations that generate many small file-system calls.
Before and after, record the table's row count and query its snapshots and all_files metadata tables. Snapshot and file counts should drop as expected while the row count stays the same, confirming no data was lost.
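One way to structure the before/after check, as a sketch: the queries target Iceberg's `snapshots` and `all_files` metadata tables, while the dataclass, the helper names, and the table name `my_catalog.db.events` are hypothetical. It assumes no other writer commits between the two measurements (for example, because maintenance runs in a quiet window), otherwise the row count can legitimately change:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableStats:
    row_count: int       # SELECT count(*) FROM <table>
    snapshot_count: int  # SELECT count(*) FROM <table>.snapshots
    file_count: int      # SELECT count(DISTINCT file_path) FROM <table>.all_files


def stats_queries(qualified_table: str) -> Dict[str, str]:
    """SQL used to fill a TableStats for one table (run before and after)."""
    return {
        "row_count": f"SELECT count(*) FROM {qualified_table}",
        "snapshot_count": f"SELECT count(*) FROM {qualified_table}.snapshots",
        # all_files can list a file once per snapshot, hence DISTINCT.
        "file_count": f"SELECT count(DISTINCT file_path) FROM {qualified_table}.all_files",
    }


def check_maintenance(before: TableStats, after: TableStats) -> List[str]:
    """Return a list of problems; an empty list means the run looks healthy."""
    problems = []
    if after.row_count != before.row_count:
        problems.append(f"row count changed: {before.row_count} -> {after.row_count}")
    if after.snapshot_count > before.snapshot_count:
        problems.append("snapshot count grew")
    if after.file_count > before.file_count:
        problems.append("file count grew")
    return problems
```

A run with nothing old enough to expire leaves the counts unchanged, so the check only flags growth or a changed row count rather than requiring a reduction.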
Optionally, run the procedures on a recurring schedule, for example from an Airflow DAG or a cron-triggered Spark job, passing the target table identifier and retention windows as parameters.
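A sketch of the parameter handling such a scheduled job might use, assuming the scheduler passes command-line arguments (for instance as spark-submit application arguments from an Airflow task or a cron entry). The flag names are hypothetical; the defaults of 5 and 3 days mirror the documented defaults of the expire_snapshots and remove_orphan_files `older_than` arguments:

```python
import argparse
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class MaintenanceParams:
    table: str                     # target table identifier, e.g. "db.events"
    snapshot_retention: timedelta  # expire snapshots older than this
    orphan_retention: timedelta    # only remove orphan candidates older than this


def parse_params(argv: Optional[Sequence[str]] = None) -> MaintenanceParams:
    parser = argparse.ArgumentParser(description="Iceberg table maintenance")
    parser.add_argument("--table", required=True, help="table identifier, e.g. db.events")
    # Defaults mirror Iceberg's procedure defaults (5 days / 3 days).
    parser.add_argument("--snapshot-retention-days", type=float, default=5.0)
    parser.add_argument("--orphan-retention-days", type=float, default=3.0)
    args = parser.parse_args(argv)
    if args.snapshot_retention_days <= 0 or args.orphan_retention_days <= 0:
        parser.error("retention windows must be positive")  # exits with status 2
    return MaintenanceParams(
        table=args.table,
        snapshot_retention=timedelta(days=args.snapshot_retention_days),
        orphan_retention=timedelta(days=args.orphan_retention_days),
    )
```

Keeping the table and windows as arguments lets one job definition serve many tables, with per-table overrides living in the scheduler configuration.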
Known gotchas
Setting the orphan-file retention window shorter than the duration of an in-progress write can delete files that the write has produced but not yet committed; once that commit lands, the table references missing files and is corrupted.
On object stores like S3, listing all objects for orphan detection is expensive; scope the location path to the table prefix to limit API calls.
expire_snapshots only deletes files referenced solely by expired snapshots; shared files used by still-live snapshots are never removed.
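The last two gotchas can be modeled in a few lines. This is a conceptual sketch of the reachability rule, not Iceberg's actual implementation (which walks manifests rather than plain sets), plus a hypothetical guard that keeps an orphan-scan location under the table's own prefix:

```python
from typing import Iterable, Set


def files_deleted_by_expiry(expired: Iterable[Set[str]],
                            retained: Iterable[Set[str]]) -> Set[str]:
    """Files reachable only from expired snapshots.

    Each set holds the file paths one snapshot references. Any file shared
    with a retained snapshot is excluded, which is why expiry never removes
    data that live snapshots still need.
    """
    expired_files = set().union(*expired)
    retained_files = set().union(*retained)
    return expired_files - retained_files


def location_within_table(location: str, table_location: str) -> bool:
    """True if an orphan-scan location stays under the table's own prefix.

    Comparing with a trailing slash avoids treating ".../events2" as if it
    were inside ".../events".
    """
    base = table_location.rstrip("/") + "/"
    return (location.rstrip("/") + "/").startswith(base)
```

For example, if expired snapshots reference `a.parquet`, `b.parquet`, and `c.parquet` while a retained snapshot still references `c.parquet`, only `a.parquet` and `b.parquet` are deleted.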