{"id":"7baa3dd8-e641-4451-8eff-3905b6734883","task":"Execute a Hudi incremental query to fetch only records changed since a given commit timestamp","domain":"hudi.apache.org","steps":["Identify the begin instant time (e.g., a prior commit timestamp) from the Hudi timeline by listing instants in the .hoodie/ directory or querying the active timeline.","Set the query type to incremental and specify the begin instant: spark.read.format('hudi').option('hoodie.datasource.query.type', 'incremental').option('hoodie.datasource.read.begin.instanttime', '20240315120000').load('/path/to/hudi/events')","Optionally set hoodie.datasource.read.end.instanttime to bound the incremental window to a specific end commit.","The result contains only records inserted or updated in the specified instant range; each row carries the _hoodie_commit_time metadata column, which can be used for further filtering.","Use the incremental DataFrame as a source for downstream pipelines: write it to another table, push it to a Kafka topic, or use it to drive incremental dbt runs."],"gotchas":["Incremental queries on MoR tables return data from base files and delta logs within the window; if compaction has not run, very old delta logs may slow down the incremental read.","The begin instant time is exclusive (records committed at exactly that instant are not included); account for this boundary when building pipelines that checkpoint by commit time.","Incremental queries do not work correctly across partition schema changes or clustering operations that reorganize file groups; reset the checkpoint if such operations occur."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/7baa3dd8-e641-4451-8eff-3905b6734883"}