Understand when compaction is needed: on each upsert, Merge-on-Read (MOR) tables append changes to delta log files stored alongside the base Parquet files; snapshot queries must merge these logs with the base files at query time, so read performance degrades as logs accumulate; compaction merges the logs into new base files
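The read-time merge and the effect of compaction can be illustrated with a toy model. This is not Hudi code: plain dictionaries stand in for a base file and its delta logs, and the function names are made up for the illustration.

```python
# Toy model only: dicts stand in for a Parquet base file and its delta logs.

def read_merged(base, delta_logs):
    """Return the latest view of a file group by replaying logs over the base."""
    view = dict(base)
    for log in delta_logs:   # oldest log first
        view.update(log)     # a later upsert wins for the same record key
    return view


def compact(base, delta_logs):
    """Fold every log into a new base; no logs remain afterwards."""
    return read_merged(base, delta_logs), []


base = {"k1": "a", "k2": "b"}
logs = [{"k2": "b2"}, {"k3": "c"}, {"k2": "b3"}]

before = read_merged(base, logs)           # three logs merged at read time
new_base, new_logs = compact(base, logs)
after = read_merged(new_base, new_logs)    # same data, nothing left to merge
```

The data a reader sees is identical before and after; only the amount of merge work per read changes.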
Schedule compaction asynchronously: run the Spark write job with hoodie.datasource.write.operation=upsert and hoodie.compact.inline=false (the default); then create a compaction plan as a separate step, for example by running HoodieCompactor with --mode schedule, or by setting hoodie.compact.schedule.inline=true so the writer schedules a plan without executing it
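A minimal sketch of writer options for this setup, built as a plain dictionary so it can be passed to a PySpark Hudi write. The table name, key fields, and path are hypothetical. The hoodie.compact.schedule.inline switch is an assumption here: it is available in recent Hudi releases as one way to have the writer create plans for a separate job to execute, so check it against your version.

```python
def async_compaction_writer_options(table_name, record_key, precombine_field,
                                    schedule_inline=True):
    """Build Hudi writer options for MOR upserts with inline compaction off."""
    opts = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Do not execute compaction inside the write job (this is the default).
        "hoodie.compact.inline": "false",
    }
    if schedule_inline:
        # Assumption: the writer only schedules plans; another job executes them.
        opts["hoodie.compact.schedule.inline"] = "true"
    return opts


# Hypothetical table and field names.
opts = async_compaction_writer_options("events", "event_id", "updated_at")

# With a SparkSession and a DataFrame `df`, the write would look roughly like:
# df.write.format("hudi").options(**opts).mode("append").save("/data/hudi/events")
```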
Execute the scheduled compaction plan by running a dedicated compaction job: submit the HoodieCompactor class from the hudi-utilities-bundle with spark-submit, passing the table base path and table name; pass a compaction instant to run that specific plan, otherwise, depending on the Hudi version, the job typically picks the earliest pending plan
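The spark-submit invocation can be sketched as an argument list. The jar location, table path, and table name below are hypothetical, and the --mode flag is assumed from recent releases of the utilities bundle (older releases may only accept --instant-time), so verify the flags against your version before relying on this.

```python
import shlex

VALID_MODES = {"schedule", "execute", "scheduleAndExecute"}


def compactor_command(bundle_jar, base_path, table_name,
                      mode="execute", instant_time=None):
    """Build a spark-submit argv for the HoodieCompactor utility."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    cmd = [
        "spark-submit",
        "--class", "org.apache.hudi.utilities.HoodieCompactor",
        bundle_jar,
        "--base-path", base_path,
        "--table-name", table_name,
        "--mode", mode,  # assumed flag; check your bundle version
    ]
    if instant_time is not None:
        # Target one specific compaction plan.
        cmd += ["--instant-time", instant_time]
    return cmd


# Hypothetical jar path and table location.
cmd = compactor_command("/opt/hudi/hudi-utilities-bundle.jar",
                        "/data/hudi/events", "events")
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to hand to subprocess.run(cmd, check=True) without shell quoting issues.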
Alternatively, enable inline compaction by setting hoodie.compact.inline=true and hoodie.compact.inline.max.delta.commits=<N>, so compaction runs automatically once N delta commits have accumulated; this is simpler to operate, but the write that triggers compaction does not finish until compaction completes
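For the inline variant, the two settings named above can be produced by a small helper and merged into the writer options. The helper name and the value 5 are illustrative only.

```python
def inline_compaction_options(max_delta_commits):
    """Return the options that turn on inline compaction after N delta commits."""
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be at least 1")
    return {
        "hoodie.compact.inline": "true",
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


inline_opts = inline_compaction_options(5)

# Merge into existing writer options; later keys override earlier ones:
# all_opts = {**writer_opts, **inline_opts}
```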
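After compaction, verify by checking the .hoodie timeline: a scheduled compaction appears as a compaction.requested instant, a running one as compaction.inflight, and on a MOR table a finished compaction is recorded as a completed commit at the same instant time; then query the table and confirm that the number of log files per partition has dropped substantially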
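The timeline check can be sketched as a function over the file names found in the timeline directory. It assumes the pre-1.0 naming scheme, where files look like <instant>.<action>[.<state>] and the requested and inflight files stay in place after the commit lands; newer table versions name completed instants differently, and the sample file names are made up.

```python
def compaction_states(timeline_files):
    """Classify compaction instants from timeline file names (0.x-style layout)."""
    requested, inflight, commits = set(), set(), set()
    for name in timeline_files:
        instant, _, suffix = name.partition(".")
        if suffix == "compaction.requested":
            requested.add(instant)
        elif suffix == "compaction.inflight":
            inflight.add(instant)
        elif suffix == "commit":
            commits.add(instant)
    return {
        # A plan with a matching commit has finished.
        "completed": sorted(requested & commits),
        # A plan with no commit yet still needs to be executed.
        "pending": sorted(requested - commits),
        # Started but not finished; may indicate a killed job.
        "stuck_inflight": sorted(inflight - commits),
    }


# Made-up listing of a .hoodie directory.
files = [
    "20240101120000.deltacommit",
    "20240101130000.compaction.requested",
    "20240101130000.compaction.inflight",
    "20240101130000.commit",
    "20240101140000.deltacommit",
    "20240101150000.compaction.requested",
]
states = compaction_states(files)
```

In practice the list would come from listing the table's timeline directory on the underlying storage, or the same information can be read with the Hudi CLI.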
Known gotchas
Compaction applies only to MOR tables; COW tables have no delta logs, so there is nothing to compact, and depending on the tool a compaction call against a COW table is either a no-op or rejected with an error
Asynchronous compaction runs concurrently with writers; Hudi coordinates this safely through its timeline, but make sure the compaction job uses the same Hudi version as the writer, because version mismatches can cause timeline corruption
If inline compaction is enabled and the write job is killed mid-compaction, the compaction instant is left in a requested or inflight state; subsequent write attempts may stall or fail until the pending compaction is re-executed or cleared, for example by rolling it back with the Hudi CLI (the compaction unschedule and compaction repair commands are intended for this)