Identify the begin instant: the timestamp of the latest successful commit that your downstream consumer has already processed. Find it by inspecting the .hoodie timeline directory or by querying the table
Configure the Spark read for incremental mode by setting hoodie.datasource.query.type=incremental, hoodie.datasource.read.begin.instanttime=<yyyyMMddHHmmss>, and optionally hoodie.datasource.read.end.instanttime=<yyyyMMddHHmmss> for a bounded window; use the same timestamp format as the instants on your table's timeline
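Issue the read: spark.read.format('hudi').options(**incrementalOptions).load('<table_path>'). For COW tables this reads the base files written in the instant range; for MOR tables the incremental query also merges the log files for that range, so it does not have to wait for compaction (read-optimized is a separate query type that reads only compacted base files, so it is not a substitute here)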
The result contains all records inserted or updated in the instant range; use the _hoodie_commit_time metadata column to track exactly which commit each record came from for your downstream checkpoint
Persist the latest _hoodie_commit_time seen in each batch as your checkpoint; on the next run pass this value as begin.instanttime to avoid reprocessing records
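The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a reference implementation: the helper names (build_incremental_options, load_checkpoint, save_checkpoint, run_incremental_batch) and the JSON checkpoint file are hypothetical, and the "000" default is assumed here as a begin instant that precedes every commit on a first run. Only the three hoodie.datasource.* keys and the _hoodie_commit_time column come from the steps themselves.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def build_incremental_options(begin_instant: str,
                              end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build the Hudi read options for an incremental query."""
    options = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    # The end instant is optional; set it only for a bounded window.
    if end_instant is not None:
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


def load_checkpoint(path: Path, default: str = "000") -> str:
    """Return the last processed instant, or `default` on the first run."""
    if not path.exists():
        return default
    return json.loads(path.read_text())["last_instant"]


def save_checkpoint(path: Path, instant: str) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_instant": instant}))
    tmp.replace(path)


def run_incremental_batch(spark, table_path: str, checkpoint_path: Path) -> None:
    """Read one incremental batch and advance the checkpoint."""
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import functions as F

    begin = load_checkpoint(checkpoint_path)
    df = (spark.read.format("hudi")
          .options(**build_incremental_options(begin))
          .load(table_path))

    # ... downstream processing of df goes here (hypothetical) ...

    # Advance the checkpoint only if the batch contained records.
    latest = df.agg(F.max("_hoodie_commit_time")).first()[0]
    if latest is not None:
        save_checkpoint(checkpoint_path, latest)
```

Saving the checkpoint only after the batch has been processed means a failed run is retried from the old instant instead of silently skipping records.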
Known gotchas
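Incremental queries on MOR tables return merged (snapshot) data for records in the time range but do not surface deletes as explicit events. Note that _hoodie_is_deleted is a write-side field for marking incoming records as deletes, not a change flag in query results; if you need delete detection, use a separate approach or, if your Hudi version supports CDC, a CDC-format incremental query (hoodie.table.cdc.enabled=true on the table and hoodie.datasource.query.incremental.format=cdc on the read)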
The begin instant is exclusive (records at exactly that instant are not included); ensure your checkpoint stores the last instant processed and passes it as begin.instanttime so you do not create gaps
Hudi cleaning removes old file versions based on the configured number of commits to retain; if your incremental consumer falls far behind and the begin.instanttime references files that have been cleaned, the query fails — monitor consumer lag relative to the clean policy
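One way to monitor consumer lag against the clean policy is to count completed instants newer than the checkpoint and compare that with the number of commits the cleaner retains. The sketch below rests on several assumptions: a pre-1.0 timeline layout in which completed actions appear as <instant>.commit, <instant>.deltacommit, or <instant>.replacecommit files directly under .hoodie, a locally listable path (object stores would need a different listing call), and a commits_retained value that you copy from your own cleaner configuration (for example hoodie.cleaner.commits.retained). The function names and the safety margin are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Assumption: completed actions are files named <instant>.<action> with no
# further suffix; .requested and .inflight files are pending and are skipped.
_COMPLETED = re.compile(r"^(\d{14,17})\.(commit|deltacommit|replacecommit)$")


def completed_instants(hoodie_dir: Path) -> List[str]:
    """Return the completed instants on the active timeline, oldest first."""
    instants = []
    for entry in hoodie_dir.iterdir():
        match = _COMPLETED.match(entry.name)
        if match:
            instants.append(match.group(1))
    return sorted(instants)


def commits_behind(hoodie_dir: Path, checkpoint: str) -> int:
    """Count completed instants strictly newer than the checkpoint."""
    return sum(1 for instant in completed_instants(hoodie_dir)
               if instant > checkpoint)


def is_at_risk(hoodie_dir: Path, checkpoint: str,
               commits_retained: int, safety_margin: int = 2) -> bool:
    """Flag a consumer whose lag is close to the cleaner's retention."""
    lag = commits_behind(hoodie_dir, checkpoint)
    return lag >= commits_retained - safety_margin
```

Because archived instants no longer appear in the active timeline, the count is a lower bound on the real lag, so treat a positive result as a prompt to catch up rather than as a guarantee that the next query will fail.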