Identify the begin instant time (e.g., a prior commit timestamp) from the Hudi timeline by listing instants in the .hoodie/ directory or querying the active timeline.
Set the query type to incremental and specify the begin instant: spark.read.format('hudi').option('hoodie.datasource.query.type', 'incremental').option('hoodie.datasource.read.begin.instanttime', '20240315120000').load('/path/to/hudi/events').
Optionally set hoodie.datasource.read.end.instanttime to bound the incremental window at a specific end commit (inclusive); if it is omitted, the read extends to the latest completed commit.
The result contains only records inserted or updated within the specified instant range. Each row carries the _hoodie_commit_time metadata column, which you can use to filter further or to derive the next checkpoint.
Use the incremental DataFrame as a source for downstream pipelines: write it to another table, push it to a Kafka topic, or use it to drive incremental dbt runs.
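The steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (completed_instants, incremental_options, read_incremental) are hypothetical, the directory scan assumes a table on the local filesystem with the pre-1.0 timeline layout (completed instants stored as files such as 20240315120000.commit directly under .hoodie/), and the Spark session is assumed to have the Hudi bundle on its classpath.

```python
import os
import re

# Assumption: pre-1.0 timeline layout, where completed write instants are files
# named "<timestamp>.commit", "<timestamp>.deltacommit" or
# "<timestamp>.replacecommit". Pending instants end in ".requested" or
# ".inflight" and do not match this pattern.
_COMPLETED = re.compile(r"^(\d{14,17})\.(commit|deltacommit|replacecommit)$")


def completed_instants(hoodie_dir):
    """Return the sorted timestamps of completed instants in a local .hoodie/ dir."""
    found = []
    for name in os.listdir(hoodie_dir):
        match = _COMPLETED.match(name)
        if match:
            found.append(match.group(1))
    # Lexicographic order equals chronological order when all timestamps
    # have the same number of digits.
    return sorted(found)


def incremental_options(begin_instant, end_instant=None):
    """Build the reader options for an incremental query."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


def read_incremental(spark, table_path, begin_instant, end_instant=None):
    """Load the records that changed after begin_instant (hypothetical helper)."""
    return (
        spark.read.format("hudi")
        .options(**incremental_options(begin_instant, end_instant))
        .load(table_path)
    )
```

For example, read_incremental(spark, '/path/to/hudi/events', '20240315120000') is equivalent to the chained .option(...) calls in step 2. For tables on HDFS or object storage, the directory scan would need to be replaced with a listing through the appropriate filesystem client.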
Known gotchas
Incremental queries on MoR tables return data from both base files and delta logs within the window; if compaction has not run for a long time, the reader has to merge a large backlog of delta logs, which can slow down the incremental read.
The begin instant time is exclusive (records at exactly that instant are not included); account for this boundary when building pipelines that checkpoint by commit time.
Incremental queries do not work correctly across partition schema changes or clustering operations that reorganize file groups; reset the checkpoint if such operations occur.
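One way to handle the exclusive begin boundary when checkpointing by commit time is sketched below. The function name next_checkpoint is hypothetical; the sketch assumes all instant times are strings with the same number of digits, so that string comparison matches chronological order.

```python
def next_checkpoint(previous_checkpoint, commit_times):
    """Return the begin instant to use for the next incremental run.

    Because the begin instant is exclusive, storing the largest commit time
    already processed means the next run starts strictly after it: nothing
    is re-read and nothing is skipped.
    """
    latest = max(commit_times, default=None)
    if latest is None or latest <= previous_checkpoint:
        # Empty window (or only older commits): keep the old checkpoint.
        return previous_checkpoint
    return latest


# Typical use with an incremental DataFrame `df` (needs a live Spark session):
#   times = [row[0] for row in df.select("_hoodie_commit_time").distinct().collect()]
#   checkpoint = next_checkpoint(checkpoint, times)
```

Persist the returned value only after the downstream write has succeeded, so that a failed run is retried from the same begin instant.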