Steps

Trigger inline compaction by setting hoodie.compact.inline=true and hoodie.compact.inline.max.delta.commits=5 so compaction runs after every 5 delta commits.
For async compaction, run the HoodieCompactor Spark job: spark-submit the org.apache.hudi.utilities.HoodieCompactor class, passing the table base path and the instant time of the scheduled compaction.
Enable clustering by setting hoodie.clustering.inline=true and hoodie.clustering.inline.max.commits=4; clustering rewrites base files to sort and colocate records by specified columns.
Configure clustering sort columns with hoodie.clustering.plan.strategy.sort.columns=region,user_id to define the colocation key for clustered files.
Monitor the Hudi timeline (the .hoodie/ directory under the table base path) for pending (requested or inflight) and completed compaction and clustering instants to confirm that operations are progressing.
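The configuration steps above can be sketched as a small Python helper. This is a minimal sketch, not a definitive setup: the table name, key fields, jar path, table path, and instant time are hypothetical placeholders, and the HoodieCompactor flags (--base-path, --table-name, --instant-time) follow the Hudi documentation but should be checked against the Hudi version in use.

```python
import shlex


def mor_table_service_options(table_name, record_key, precombine_field,
                              sort_columns, compact_every=5, cluster_every=4):
    """Build Hudi writer options that enable inline compaction and clustering."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Compact after every `compact_every` delta commits.
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(compact_every),
        # Cluster after every `cluster_every` commits, sorting by the given columns.
        "hoodie.clustering.inline": "true",
        "hoodie.clustering.inline.max.commits": str(cluster_every),
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }


def compactor_command(utilities_jar, base_path, table_name, instant_time):
    """Build a spark-submit argv for the standalone HoodieCompactor job."""
    return [
        "spark-submit",
        "--class", "org.apache.hudi.utilities.HoodieCompactor",
        utilities_jar,  # assumed: path to a hudi-utilities bundle jar
        "--base-path", base_path,
        "--table-name", table_name,
        "--instant-time", instant_time,
    ]


# Hypothetical table and field names, for illustration only.
opts = mor_table_service_options("events", "event_id", "ts", ["region", "user_id"])

# With a SparkSession and a DataFrame `df` available (not created here):
# df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")

cmd = compactor_command("/path/to/hudi-utilities-bundle.jar",
                        "/path/to/table", "events", "20240101120000000")
print(shlex.join(cmd))
```

Inline compaction and the standalone compactor job are alternatives for a given table; when the standalone job is used, hoodie.compact.inline would normally be left off in the writer options.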

Known gotchas

Compaction and clustering are separate operations in Hudi: compaction merges delta log files into base files, while clustering rewrites base files to colocate records. Both may be needed for full read optimization.
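A compaction or clustering instant that stays pending (scheduled but never completed) can block subsequent writes on MoR tables, for example updates to file groups that belong to a pending clustering plan. If plans are scheduled and executed by separate processes, make sure the process that executes them runs reliably.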
Clustering replaces file groups, so in-flight readers may encounter missing files once the replaced files are removed; schedule clustering during low-read windows or use Hudi's conflict resolution settings.
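To check for stuck instants like those described above, the timeline file names can be scanned directly. The sketch below makes an assumption about the timeline layout: it expects the Hudi 0.x naming where a compaction appears as `<instant>.compaction.requested` and `<instant>.compaction.inflight` and completes as `<instant>.commit`, and clustering appears as `<instant>.replacecommit.requested` and `.inflight` and completes as `<instant>.replacecommit`. Newer releases may lay out the timeline differently, and the function names here are hypothetical.

```python
from pathlib import Path

# (action, suffix of the file that marks completion) under the assumed 0.x layout.
# Note: replacecommit also covers other replace operations, not only clustering.
ACTIONS = (("compaction", ".commit"), ("replacecommit", ".replacecommit"))


def pending_instants(file_names):
    """Return sorted (instant, action, state) tuples for uncompleted instants."""
    names = set(file_names)
    pending = []
    for name in names:
        for action, done_suffix in ACTIONS:
            requested_suffix = f".{action}.requested"
            if not name.endswith(requested_suffix):
                continue
            instant = name[: -len(requested_suffix)]
            if instant + done_suffix in names:
                continue  # a completed file exists, so the instant is done
            inflight = f"{instant}.{action}.inflight" in names
            pending.append((instant, action, "inflight" if inflight else "requested"))
    return sorted(pending)


def scan_timeline(table_path):
    """List pending instants from the .hoodie directory of a local table path."""
    hoodie_dir = Path(table_path) / ".hoodie"
    return pending_instants(p.name for p in hoodie_dir.iterdir() if p.is_file())


# Made-up timeline listing for illustration.
example = [
    "001.deltacommit.requested", "001.deltacommit.inflight", "001.deltacommit",
    "002.compaction.requested", "002.compaction.inflight", "002.commit",
    "003.compaction.requested",
    "004.replacecommit.requested", "004.replacecommit.inflight",
]
result = pending_instants(example)
print(result)
```

Here instant 002 is treated as completed because a matching `.commit` file exists, while 003 and 004 are reported as still pending.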


hudi.apache.org · 6 steps · unrated


Run Hudi compaction and clustering to optimize a Merge-on-Read table for read performance
