{"id":"db6703cb-9cbf-4c45-a87a-9641ebc17bc0","task":"Run Hudi compaction and clustering to optimize a Merge-on-Read table for read performance","domain":"hudi.apache.org","steps":["Enable inline compaction by setting hoodie.compact.inline=true and hoodie.compact.inline.max.delta.commits=5 so that compaction runs after every 5 delta commits.","For async compaction, run the HoodieCompactor Spark job: invoke spark-submit with the HoodieCompactor class from the hudi-utilities bundle, specifying the table base path and the compaction instant time.","Enable inline clustering by setting hoodie.clustering.inline=true and hoodie.clustering.inline.max.commits=4; clustering rewrites base files to sort and colocate records by the specified columns.","Configure the clustering sort columns with hoodie.clustering.plan.strategy.sort.columns=region,user_id to define the colocation key for clustered files.","Monitor the Hudi timeline (inspect the .hoodie/ directory under the table base path) for requested, inflight, and completed compaction and clustering instants to confirm that operations are progressing."],"gotchas":["Compaction and clustering are separate operations in Hudi: compaction merges delta log files into base files, while clustering rearranges base files for colocation. Both may be needed for full optimization.","A pending compaction or clustering instant that never completes can block subsequent writes on MoR tables; when inline or async table services are enabled, make sure every scheduled compaction and clustering plan is actually executed to completion.","Clustering replaces file groups and may cause in-flight readers to encounter missing files; schedule clustering during low-read windows or use Hudi's conflict resolution settings."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/db6703cb-9cbf-4c45-a87a-9641ebc17bc0"}