Create a MoR table by setting hoodie.datasource.write.table.type=MERGE_ON_READ in the write options; all other upsert options (record key, precombine) remain the same as CoW.
Write upsert batches; Hudi appends delta log files (Avro-encoded) alongside the base Parquet files rather than rewriting the base files on every write.
Query the read-optimized view (base files only) for fast reads without applying deltas: spark.read.format('hudi').option('hoodie.datasource.query.type', 'read_optimized').load('/path/to/hudi/events').
Query the real-time view (base + delta logs merged) for up-to-date results: spark.read.format('hudi').option('hoodie.datasource.query.type', 'snapshot').load('/path/to/hudi/events').
Schedule compaction to merge delta logs into base Parquet files: configure hoodie.compact.inline=true for inline compaction or trigger async compaction via HoodieCompactor.
Known gotchas
The read-optimized view may return stale data if compaction has not run recently and many delta log files have accumulated; always use the snapshot view for correctness-critical queries.
Inline compaction (hoodie.compact.inline=true) blocks the write path until compaction completes, increasing write latency; async compaction avoids this but requires a separate process to manage.
MoR tables have two separate Hive/Glue tables registered (_ro for read-optimized, _rt for real-time) when using the Hive Sync feature; ensure downstream consumers query the correct table suffix.
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp