Steps

Identify the begin instant: the timestamp of the latest successful commit that your downstream consumer has already processed. Take it from your stored checkpoint, or find candidate instants by inspecting the .hoodie timeline directory or by querying the table
Configure the Spark read for incremental mode by setting hoodie.datasource.query.type=incremental, hoodie.datasource.read.begin.instanttime=<yyyyMMddHHmmss>, and optionally hoodie.datasource.read.end.instanttime=<yyyyMMddHHmmss> for a bounded window
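Issue the read: spark.read.format('hudi').options(**incrementalOptions).load('<table_path>'). In PySpark the options dict has to be unpacked with ** because DataFrameReader.options accepts keyword arguments only. For COW tables this reads the base files written in the instant range; for MOR tables the incremental result is merged from base and log files, whereas the read-optimized query type is a separate, non-incremental mode that sees only compacted base files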
The result contains all records inserted or updated in the instant range; use the _hoodie_commit_time metadata column to track exactly which commit each record came from for your downstream checkpoint
Persist the latest _hoodie_commit_time seen in each batch as your checkpoint; on the next run pass this value as begin.instanttime to avoid reprocessing records
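
The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names incremental_options, read_increment, and next_checkpoint are hypothetical, and it assumes a SparkSession that already has the Hudi Spark bundle on its classpath. Only the three option keys and the _hoodie_commit_time column come from the steps themselves.

```python
from typing import Dict, Optional


def incremental_options(begin_instant: str,
                        end_instant: Optional[str] = None) -> Dict[str, str]:
    """Build the read options for a Hudi incremental query (hypothetical helper)."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    # The end instant is optional; set it only for a bounded window.
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


def read_increment(spark, table_path: str, begin_instant: str,
                   end_instant: Optional[str] = None):
    """Run the incremental read. Assumes `spark` is a SparkSession with Hudi available."""
    return (
        spark.read.format("hudi")
        # PySpark's DataFrameReader.options takes keyword arguments, hence the **.
        .options(**incremental_options(begin_instant, end_instant))
        .load(table_path)
    )


def next_checkpoint(df, previous: str) -> str:
    """Return the newest _hoodie_commit_time in the batch, or `previous` if it is empty."""
    # Imported here so the option builder above can be used without pyspark installed.
    from pyspark.sql import functions as F

    latest = df.agg(F.max("_hoodie_commit_time")).first()[0]
    return latest if latest is not None else previous
```

A consumer loop would call read_increment with the stored checkpoint, process the batch, and only then persist the value returned by next_checkpoint, so that a failed batch is retried from the old checkpoint rather than skipped.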

Known gotchas

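Incremental queries on MOR tables return merged (snapshot) data for records in the instant range but do not surface deletes as explicit events. Note that _hoodie_is_deleted is a write-side field that marks incoming records for deletion, not a change-event column in the incremental result; if you need delete detection, use a separate approach such as Hudi's CDC query format (hoodie.table.cdc.enabled=true on the table and hoodie.datasource.query.incremental.format=cdc on the read), if available in your Hudi version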
The begin instant is exclusive (records at exactly that instant are not included); ensure your checkpoint stores the last instant processed and passes it as begin.instanttime so you do not create gaps
Hudi cleaning removes old file versions based on the configured number of commits to retain; if your incremental consumer falls far behind and the begin.instanttime references files that have been cleaned, the query fails — monitor consumer lag relative to the clean policy
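
Two of the gotchas above, the exclusive begin instant and consumer lag relative to cleaning, can be illustrated with plain Python. This is a simplified model under stated assumptions: the list of completed commit instants is supplied by the caller, the end instant is treated as inclusive, instants are fixed-width digit strings so that string comparison matches time order, and retained_commits is a hypothetical parameter standing in for whatever your clean policy actually retains. The function names are also hypothetical.

```python
from typing import List, Optional


def instants_in_window(instants: List[str], begin: str,
                       end: Optional[str] = None) -> List[str]:
    """Return instants strictly after `begin` and, if given, up to and including `end`."""
    return [
        t for t in sorted(instants)
        if t > begin and (end is None or t <= end)
    ]


def is_at_risk(instants: List[str], checkpoint: str, retained_commits: int) -> bool:
    """Flag a consumer whose unprocessed commit count has reached the retention window."""
    lag = len(instants_in_window(instants, checkpoint))
    return lag >= retained_commits


timeline = ["20240101000000", "20240102000000", "20240103000000"]

# Passing the last processed instant as the begin value skips it and yields the rest.
pending = instants_in_window(timeline, "20240101000000")

# Two commits behind with only two commits retained: time to alert.
at_risk = is_at_risk(timeline, "20240101000000", retained_commits=2)
```

Because the begin instant is exclusive, storing the last processed instant as the checkpoint neither repeats nor drops a commit. The lag check is only a rough early warning; the real cleaning behavior depends on the policy configured for the table.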

hudi.apache.org · 5 steps · unrated

Configure a Hudi Record-Level Index (RLI) to accelerate upsert lookup performance on large tables

hudi.apache.org · 5 steps · unrated

Incrementally sync changed Employee Central records using lastModifiedDateTime queries against effective-dated entities

help.sap.com · 5 steps · unrated

Execute a Hudi incremental query to fetch only changed records since a given commit timestamp
