On a Merge-on-Read (MoR) table, trigger inline compaction by setting hoodie.compact.inline=true and hoodie.compact.inline.max.delta.commits=5 so that compaction runs after every 5 delta commits.
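As a rough illustration, the two compaction settings can be passed alongside the usual writer options from PySpark. This is a minimal sketch: the table name, record key, and precombine field are hypothetical placeholders, and `write_batch` is an illustrative helper, not part of Hudi.

```python
# Writer options for a Merge-on-Read table with inline compaction enabled.
# "events", "event_id" and "ts" are hypothetical placeholders.
inline_compaction_opts = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.precombine.field": "ts",
    # The two settings from the step above.
    "hoodie.compact.inline": "true",
    "hoodie.compact.inline.max.delta.commits": "5",
}


def write_batch(df, base_path, opts=inline_compaction_opts):
    """Append one batch to the table; df is expected to be a pyspark.sql.DataFrame."""
    df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

With these options the writer itself runs compaction as part of the commit that crosses the delta-commit threshold, so that particular write takes longer than the others.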
For async compaction, use the HoodieCompactor Spark job: run spark-submit with the HoodieCompactor class (org.apache.hudi.utilities.HoodieCompactor), specifying the table base path and the instant time of the scheduled compaction.
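One way to picture the async job is as a spark-submit argument list assembled in Python. This is a sketch under assumptions: the jar path, table path, table name, and instant time are hypothetical, and depending on the Hudi version further flags (for example `--schema-file`) may be required, so check the utilities bundle you actually deploy.

```python
import shlex


def compactor_command(utilities_jar, base_path, table_name, instant_time):
    """Build a spark-submit argument list for the HoodieCompactor job."""
    return [
        "spark-submit",
        "--class", "org.apache.hudi.utilities.HoodieCompactor",
        utilities_jar,                   # path to the hudi-utilities bundle jar
        "--base-path", base_path,        # table base path
        "--table-name", table_name,
        "--instant-time", instant_time,  # instant of the scheduled compaction
    ]


# Hypothetical values, for illustration only.
cmd = compactor_command(
    "/opt/hudi/hudi-utilities-bundle.jar",
    "s3://my-bucket/tables/events",
    "events",
    "20240101120000",
)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` without shell quoting concerns; `shlex.join` is used here only to print a copy-pasteable command line.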
Enable clustering by setting hoodie.clustering.inline=true and hoodie.clustering.inline.max.commits=4; clustering rewrites base files to sort and colocate records by specified columns.
Configure clustering sort columns with hoodie.clustering.plan.strategy.sort.columns=region,user_id to define the colocation key for clustered files.
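The clustering settings above can be layered onto an existing set of writer options. The helper below is a hypothetical convenience function, not a Hudi API, and the table name is a placeholder; only the three `hoodie.clustering.*` keys come from the steps above.

```python
def with_inline_clustering(base_opts, sort_columns, max_commits=4):
    """Return a copy of base_opts with inline clustering settings added."""
    opts = dict(base_opts)  # copy so the caller's dict is left untouched
    opts.update({
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(max_commits),
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    })
    return opts


# "events" is a hypothetical table name.
clustering_opts = with_inline_clustering(
    {"hoodie.table.name": "events"},
    ["region", "user_id"],
)
```

The resulting dict can be passed to a Spark writer with `.options(**clustering_opts)`. The order of the sort columns matters, since records are sorted by `region` first and `user_id` second.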
Monitor the Hudi timeline (inspect the .hoodie/ directory under the table base path) for pending, inflight, and completed compaction and clustering instants to confirm that operations are progressing.
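A small script can make the timeline check repeatable. The sketch below assumes a table on a local or mounted filesystem and the pre-1.0 timeline layout, where instant files sit directly in .hoodie/ and are named `<timestamp>.<action>[.<state>]`; in that layout a pending compaction appears as `.compaction.requested` or `.compaction.inflight`, and clustering appears as a `replacecommit`. Newer Hudi releases use a different layout, so treat the parsing rules as an assumption to verify against your own table.

```python
import os
from collections import namedtuple

Instant = namedtuple("Instant", "timestamp action state")


def parse_instant(filename):
    """Parse '<timestamp>.<action>[.<state>]'; return None for other files."""
    parts = filename.split(".")
    if len(parts) < 2 or not parts[0].isdigit():
        return None  # e.g. hoodie.properties, .aux, archived
    state = parts[2] if len(parts) > 2 else "completed"
    return Instant(parts[0], parts[1], state)


def pending_table_services(hoodie_dir):
    """List compaction and clustering instants that have not completed."""
    pending = []
    for name in os.listdir(hoodie_dir):
        instant = parse_instant(name)
        if instant is None:
            continue
        if instant.action in ("compaction", "replacecommit") and instant.state != "completed":
            pending.append(instant)
    # The same instant can have both a .requested and an .inflight file.
    return sorted(pending)
```

Calling `pending_table_services("/path/to/table/.hoodie")` (a hypothetical path) on a schedule and alerting when the same timestamp stays pending for too long is one simple way to notice a stalled compaction or clustering run.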
Known gotchas
Compaction and clustering are separate operations in Hudi: compaction merges delta log files into base files, while clustering rewrites base files to colocate records. Both may be needed for full optimization.
A pending compaction or clustering instant that never completes can block subsequent writes on MoR tables; whichever mode you enable, inline or async, make sure the process that executes compaction runs reliably and that no instant stays stuck in a pending state.
Clustering modifies file groups and may cause in-flight readers to encounter missing files; schedule clustering during low-read windows or use Hudi's conflict resolution settings.