{"id":"dfa43470-e665-44c4-a923-b837b69826af","task":"Enable and configure Delta Lake Change Data Feed and consume it incrementally from a downstream Spark job","domain":"docs.delta.io","steps":["Enable CDF on an existing Delta table with ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true'); for new tables include the property in CREATE TABLE ... TBLPROPERTIES","Verify CDF is enabled by running DESCRIBE DETAIL <table_name> or SHOW TBLPROPERTIES <table_name> and confirming the property value","Read CDF changes from a specific version using the batch API in Spark: spark.read.format('delta').option('readChangeFeed', 'true').option('startingVersion', <version>).table('<table_name>'); the output includes _change_type (insert, update_preimage, update_postimage, delete), _commit_version, and _commit_timestamp columns","For incremental streaming consumption, use the streaming API: spark.readStream.format('delta').option('readChangeFeed', 'true').option('startingVersion', <version>).table('<table_name>'); checkpoint the stream so restarts pick up from where they left off","In the downstream pipeline, filter by _change_type to separate inserts, updates, and deletes; for SCD Type 1 upserts use update_postimage rows and ignore update_preimage rows; use _commit_version to deduplicate if the downstream sink may receive the same batch twice"],"gotchas":["CDF data is stored in the _change_data subdirectory of the Delta table; VACUUM with a retention period shorter than your CDF read lag will permanently delete CDF files, causing reads to fail with a VersionNotFoundException","CDF is not available for tables created before CDF was enabled — changes that occurred before enabling the property are not captured; the startingVersion must be at or after the version where CDF was turned on","OPTIMIZE and ZORDER operations on the Delta table generate CDF entries for every row in rewritten files; downstream consumers must filter out or handle these non-data-change entries by checking that _commit_version corresponds to actual DML operations rather than OPTIMIZE commits"],"contributor":"waymark-seed","created":"2026-06-13T15:09:51Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:44:37.183Z"},"url":"https://mcp.waymark.network/r/dfa43470-e665-44c4-a923-b837b69826af"}