Hudi compaction (MOR tables): the compaction service merges delta log files into columnar base files on a configurable schedule (inline or async); tune hoodie.compact.inline.max.delta.commits to control how often compaction triggers relative to ingestion commits.
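A minimal sketch of the writer options this implies, assuming a PySpark job writing a MERGE_ON_READ table through the Hudi Spark datasource. The table name, key fields, path, and the threshold of 10 are hypothetical. The function only builds the option dict, so it runs without Spark.

```python
# Minimal sketch: Spark datasource options for a Hudi MERGE_ON_READ table that
# compacts inline. Table name, key fields, and values are illustrative assumptions.

def hudi_mor_compaction_options(table_name: str, record_key: str,
                                precombine_field: str,
                                max_delta_commits: int = 5) -> dict[str, str]:
    """Build writer options that compact once N delta commits have accumulated."""
    if max_delta_commits < 1:
        raise ValueError("max_delta_commits must be >= 1")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        # Run compaction synchronously inside the writer process...
        "hoodie.compact.inline": "true",
        # ...once this many delta commits exist since the last compaction.
        "hoodie.compact.inline.max.delta.commits": str(max_delta_commits),
    }


opts = hudi_mor_compaction_options("trips", "trip_id", "ts", max_delta_commits=10)
print(opts["hoodie.compact.inline.max.delta.commits"])  # prints 10
# With PySpark and a Hudi bundle on the classpath (hypothetical path):
# df.write.format("hudi").options(**opts).mode("append").save("s3://bucket/trips")
```

A lower threshold keeps read-time merging cheap at the cost of compacting more often. A higher one favors ingestion throughput.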
Iceberg compaction (rewrite_data_files): Iceberg itself has no built-in background maintenance service. Compaction is an explicit, user-triggered action via the rewrite_data_files procedure. That makes it a natural fit for scheduled batch maintenance jobs. With continuous streaming ingestion, you must schedule compaction separately to keep small files in check.
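A minimal sketch of the statement a scheduled batch job might issue, assuming Spark with the Iceberg runtime and SQL extensions enabled. The catalog "prod", the table "db.events", and the option values are hypothetical. The helper only builds the SQL string.

```python
# Minimal sketch: build a Spark SQL CALL to Iceberg's rewrite_data_files procedure.
# Catalog/table names and option values are illustrative assumptions.

def rewrite_data_files_sql(catalog: str, table: str,
                           target_file_size_bytes: int = 512 * 1024 * 1024,
                           min_input_files: int = 5) -> str:
    """Return a bin-pack compaction statement for one table."""
    options = {
        # Size the rewritten files should aim for.
        "target-file-size-bytes": str(target_file_size_bytes),
        # A file group with at least this many files qualifies for rewriting
        # even if it fails the other selection criteria.
        "min-input-files": str(min_input_files),
    }
    # Procedure options are passed as a Spark SQL map('k1', 'v1', ...).
    map_args = ", ".join(f"'{k}', '{v}'" for k, v in options.items())
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', strategy => 'binpack', "
        f"options => map({map_args}))"
    )


sql = rewrite_data_files_sql("prod", "db.events")
print(sql)
# In a scheduled batch job with an Iceberg-enabled SparkSession:
# spark.sql(sql)
```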
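Hudi cleaning: the cleaner service deletes old file versions according to the retention policy set in hoodie.cleaner.policy. The options are: keep the latest N commits (KEEP_LATEST_COMMITS, sized by hoodie.cleaner.commits.retained), keep the latest N versions of each file (KEEP_LATEST_FILE_VERSIONS), or keep the last N hours (KEEP_LATEST_BY_HOURS). Choose the policy and its retention value to bound storage growth.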
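A minimal sketch mapping each cleaner policy to the retention key it reads, assuming options passed through the Hudi Spark datasource. The retention value of 24 is hypothetical, not a recommendation.

```python
# Minimal sketch: Hudi cleaner options. Each policy reads its own retention key;
# the values used below are illustrative assumptions.

RETENTION_KEY_BY_POLICY = {
    "KEEP_LATEST_COMMITS": "hoodie.cleaner.commits.retained",
    "KEEP_LATEST_FILE_VERSIONS": "hoodie.cleaner.fileversions.retained",
    "KEEP_LATEST_BY_HOURS": "hoodie.cleaner.hours.retained",
}


def hudi_cleaner_options(policy: str, retained: int) -> dict[str, str]:
    """Return cleaner settings for one retention policy."""
    if policy not in RETENTION_KEY_BY_POLICY:
        raise ValueError(f"unknown cleaner policy: {policy}")
    if retained < 1:
        raise ValueError("retained must be >= 1")
    return {
        # Invoke the cleaner automatically after commits.
        "hoodie.clean.automatic": "true",
        "hoodie.cleaner.policy": policy,
        # Only the key matching the chosen policy is meaningful.
        RETENTION_KEY_BY_POLICY[policy]: str(retained),
    }


print(hudi_cleaner_options("KEEP_LATEST_COMMITS", 24))
```

Whatever the policy, the retention window should outlast your longest-running queries and incremental reads. File versions removed by the cleaner can no longer be read.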
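Iceberg cleanup: run expire_snapshots and remove_orphan_files as separate steps. expire_snapshots drops old snapshots and deletes files that only those snapshots referenced. remove_orphan_files deletes files in the table location that no table metadata references, such as leftovers from failed writes. Because there is no unified cleaning service, you get more explicit control but take on more operational orchestration.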
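A minimal sketch of the two cleanup statements a maintenance job might run in order, assuming Spark with Iceberg SQL extensions. The catalog, table, and retention windows (7 days of snapshots, at least 10 retained, a 3-day orphan grace period) are hypothetical. The grace period matters because files written by in-flight commits are not yet referenced by metadata and would otherwise look orphaned.

```python
# Minimal sketch: the two separate Iceberg cleanup calls, run in order.
# Catalog/table names and retention windows are illustrative assumptions.
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def iceberg_cleanup_sql(catalog: str, table: str, now: datetime,
                        snapshot_retention: timedelta = timedelta(days=7),
                        retain_last: int = 10,
                        orphan_grace: timedelta = timedelta(days=3)) -> list[str]:
    """Return [expire_snapshots, remove_orphan_files] statements."""
    expire_before = (now - snapshot_retention).strftime(TS_FORMAT)
    # Grace period so files from writes still in flight are not treated as orphans.
    orphan_before = (now - orphan_grace).strftime(TS_FORMAT)
    return [
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{expire_before}', retain_last => {retain_last})",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}', "
        f"older_than => TIMESTAMP '{orphan_before}')",
    ]


for stmt in iceberg_cleanup_sql("prod", "db.events", now=datetime(2024, 6, 30, 12, 0)):
    print(stmt)  # an orchestrator would run each one with spark.sql(stmt)
```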
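Hudi clustering: rewrites data within partitions to improve query performance, sorting records by configured columns and combining small files. This is similar to Iceberg's rewrite_data_files with the sort strategy. Enable it inline with hoodie.clustering.inline=true, or run clustering asynchronously via a separate Spark job.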
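A minimal sketch of inline versus async clustering options, assuming the Hudi Spark datasource. The sort columns and the commit threshold of 4 are hypothetical. How the async executor is hosted (a streaming writer or a standalone clustering job) depends on your deployment.

```python
# Minimal sketch: Hudi clustering options, inline or async.
# Sort columns and the commit threshold are illustrative assumptions.

def hudi_clustering_options(sort_columns: list[str], inline: bool = True,
                            max_commits: int = 4) -> dict[str, str]:
    """Return clustering settings that sort rewritten files by sort_columns."""
    if not sort_columns:
        raise ValueError("at least one sort column is required")
    opts = {
        # Columns used to order records inside the rewritten files.
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }
    if inline:
        # Cluster synchronously in the writer every max_commits commits.
        opts["hoodie.clustering.inline"] = "true"
        opts["hoodie.clustering.inline.max.commits"] = str(max_commits)
    else:
        # Schedule and execute clustering asynchronously instead.
        opts["hoodie.clustering.async.enabled"] = "true"
        opts["hoodie.clustering.async.max.commits"] = str(max_commits)
    return opts


print(hudi_clustering_options(["city", "ts"]))
```

On the Iceberg side, the roughly comparable call would be something like `CALL prod.system.rewrite_data_files(table => 'db.events', strategy => 'sort', sort_order => 'city ASC NULLS LAST')`, with hypothetical catalog, table, and column names.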
Choose Hudi for high-frequency upsert workloads where continuous background services reduce operational burden; choose Iceberg for batch-oriented workloads, polyglot engine support, or when tight control over maintenance timing is needed.
Known gotchas
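Hudi inline compaction increases write latency on ingestion jobs because compaction runs synchronously inside the writer, so the triggering commit waits for it to finish. Async compaction avoids that stall but still needs a host. It can run in the background of a long-running streaming writer (Spark Structured Streaming or DeltaStreamer in continuous mode), or as a separate compaction job that must be scheduled and monitored independently.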
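A minimal sketch of switching a MOR writer between the two modes, assuming the async path is a Spark Structured Streaming writer, which can run compaction in the background of the same job. It also assumes hoodie.compact.inline.max.delta.commits governs when the async scheduler plans a compaction. Verify that against your Hudi version.

```python
# Minimal sketch: inline vs async compaction for a Hudi MOR writer.
# Assumes "async" means a Spark Structured Streaming writer hosts compaction.

def compaction_mode_options(mode: str, max_delta_commits: int = 5) -> dict[str, str]:
    """mode is 'inline' (blocks the writer) or 'async' (streaming writer)."""
    # Scheduling threshold, assumed here to apply in both modes.
    base = {"hoodie.compact.inline.max.delta.commits": str(max_delta_commits)}
    if mode == "inline":
        # The triggering commit waits for compaction to finish.
        return {**base, "hoodie.compact.inline": "true"}
    if mode == "async":
        # Compaction runs alongside ingestion instead of blocking it.
        return {**base,
                "hoodie.compact.inline": "false",
                "hoodie.datasource.compaction.async.enable": "true"}
    raise ValueError(f"unknown compaction mode: {mode}")


print(compaction_mode_options("async"))
```

Batch writers have no streaming host. For them, async compaction typically runs as a separate job, for example Hudi's HoodieCompactor utility. That is the independently managed job this gotcha warns about.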
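Iceberg does not decide on its own when compaction is due. When you run rewrite_data_files, it selects files by size and delete-file thresholds, but deciding when to run it usually means querying metadata tables such as files to find partitions with many small files. Hudi, by contrast, schedules compaction automatically from its timeline and file-system view, which can be backed by its metadata table.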
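A minimal sketch of the candidate-selection logic, written in plain Python over file records so it runs without Spark. It mirrors a query against Iceberg's files metadata table, shown in the comment. The thresholds (128 MiB, 5 files), the table name, and the flattening of partition values to strings are hypothetical. The resulting partitions could then feed a where filter on rewrite_data_files.

```python
# Minimal sketch: pick partitions that look due for compaction, mirroring a
# query over Iceberg's `files` metadata table such as:
#   SELECT partition, count(*) AS small_files
#   FROM prod.db.events.files
#   WHERE content = 0 AND file_size_in_bytes < 134217728
#   GROUP BY partition HAVING count(*) >= 5
# Thresholds and table names are illustrative assumptions.
from collections import defaultdict
from typing import Iterable, NamedTuple


class DataFile(NamedTuple):
    partition: str           # partition value flattened to a string (assumption)
    file_path: str
    file_size_in_bytes: int


def compaction_candidates(files: Iterable[DataFile],
                          small_file_bytes: int = 128 * 1024 * 1024,
                          min_small_files: int = 5) -> dict[str, int]:
    """Map partition -> small-file count for partitions worth compacting."""
    counts: dict[str, int] = defaultdict(int)
    for f in files:
        if f.file_size_in_bytes < small_file_bytes:
            counts[f.partition] += 1
    return {p: n for p, n in counts.items() if n >= min_small_files}


sample = [DataFile("day=2024-06-01", f"s3://bucket/a{i}.parquet", 4_000_000) for i in range(6)]
sample.append(DataFile("day=2024-06-02", "s3://bucket/b0.parquet", 400_000_000))
print(compaction_candidates(sample))  # {'day=2024-06-01': 6}
```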
Both formats require the compaction/maintenance job to use the same catalog and table metadata configuration as the writer; catalog mismatches cause the maintenance job to operate on a stale table view and miss recent commits.
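A minimal sketch for the Iceberg side. It generates one catalog definition that both the writer and the maintenance job use, and adds a check that flags drift between two Spark configs. The catalog name, metastore URI, and warehouse path are hypothetical, and a Hive-backed catalog is assumed. For Hudi, the analogous requirement is that the maintenance job points at the same table base path as the writer.

```python
# Minimal sketch: one shared Iceberg catalog definition for writer and
# maintenance jobs, plus a check that flags drift between two Spark configs.
# Catalog name, metastore URI, and warehouse path are illustrative assumptions.

def iceberg_catalog_conf(catalog: str, metastore_uri: str, warehouse: str) -> dict[str, str]:
    """Spark conf for a Hive-backed Iceberg catalog with SQL extensions enabled."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Needed for CALL <catalog>.system.<procedure>(...) statements.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.uri": metastore_uri,
        f"{prefix}.warehouse": warehouse,
    }


def catalog_drift(writer: dict[str, str], maintenance: dict[str, str]) -> list[str]:
    """Return catalog keys whose values differ or exist in only one config."""
    keys = {k for k in writer.keys() | maintenance.keys()
            if k.startswith("spark.sql.catalog.")}
    return sorted(k for k in keys if writer.get(k) != maintenance.get(k))


writer_conf = iceberg_catalog_conf("prod", "thrift://metastore:9083", "s3://bucket/warehouse")
maint_conf = iceberg_catalog_conf("prod", "thrift://metastore-old:9083", "s3://bucket/warehouse")
print(catalog_drift(writer_conf, maint_conf))  # ['spark.sql.catalog.prod.uri']
```

Running a check like this at job startup turns a silent stale-view problem into an explicit failure.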