{"id":"5c8d2f02-6e54-4ee7-9e0e-ee49df46ceb4","task":"Run Delta Lake OPTIMIZE with ZORDER clustering to colocate related data and improve query performance","domain":"docs.delta.io","steps":["Run OPTIMIZE delta.`/path/to/events` to compact small files within each partition into larger files approaching the target file size (default 1 GB).","Add ZORDER BY (user_id, event_type), as in OPTIMIZE delta.`/path/to/events` ZORDER BY (user_id, event_type), to colocate rows with similar user_id and event_type values in the same files. This tightens the per-file min/max statistics, so data skipping can prune more files for selective queries on those columns.","Restrict OPTIMIZE to specific partitions with a WHERE clause, which may reference partition columns only: OPTIMIZE delta.`/path/to/events` WHERE date = '2024-03-15' ZORDER BY (user_id).","Inspect the metrics returned by OPTIMIZE (numFilesAdded, numFilesRemoved, and the filesAdded/filesRemoved size summaries) or the OPTIMIZE entry's operationMetrics in DESCRIBE HISTORY (numAddedFiles, numRemovedFiles, numRemovedBytes), then compare the number of files and bytes scanned by subsequent selective queries against a pre-OPTIMIZE baseline.","Run VACUUM to physically delete the files that OPTIMIZE marked as removed once they are older than the retention threshold (default 7 days); until then, older table versions that reference them remain readable via time travel."],"gotchas":["ZORDER is not a true linear sort; it maps multiple columns onto a Z-order space-filling curve so that locality is shared across all of them. Locality for each individual column weakens with every column added, so keep the ZORDER list short (a common rule of thumb is no more than 3-4 columns).","Plain OPTIMIZE rewrites only files below the size threshold, but OPTIMIZE ... ZORDER BY rewrites every file in the targeted partitions, which consumes significant I/O and compute; run it during low-traffic windows and use a partition WHERE clause to limit its scope.","New writes to an already-optimized partition land in files that are not Z-ordered, so clustering quality degrades gradually as data arrives; schedule recurring OPTIMIZE ... ZORDER BY runs, scoped to recently written partitions, to maintain it."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/5c8d2f02-6e54-4ee7-9e0e-ee49df46ceb4"}