Enable CDF on an existing Delta table with ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true'); for new tables include the property in CREATE TABLE ... TBLPROPERTIES
Verify CDF is enabled by running DESCRIBE DETAIL <table_name> or SHOW TBLPROPERTIES <table_name> and confirming the property value
Read CDF changes from a specific version using the batch API in Spark: spark.read.format('delta').option('readChangeFeed', 'true').option('startingVersion', <version>).table('<table_name>'); the output includes _change_type (insert, update_preimage, update_postimage, delete), _commit_version, and _commit_timestamp columns
For incremental streaming consumption, use the streaming API: spark.readStream.format('delta').option('readChangeFeed', 'true').option('startingVersion', <version>).table('<table_name>'); set a checkpointLocation on the writing query so restarts resume where the stream left off (startingVersion only applies when the stream first starts without an existing checkpoint)
In the downstream pipeline, filter by _change_type to separate inserts, updates, and deletes; for SCD Type 1 upserts, apply update_postimage rows and ignore update_preimage rows; if the sink may receive the same batch twice, deduplicate by keeping only the row with the highest _commit_version per key so that replays are idempotent
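The read and filter steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: read_changes and collapse_for_scd1 are hypothetical helper names, the business key column (id in the usage note) is an assumption, and the collapsing logic runs over plain Python dicts so the rule is easy to follow. In a real pipeline the same rule would normally be expressed with Spark DataFrame operations rather than by collecting rows to the driver.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def read_changes(spark, table_name: str, starting_version: int):
    """Batch-read the change feed. `spark` is an existing SparkSession."""
    return (
        spark.read.format("delta")
        .option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table_name)
    )


def collapse_for_scd1(rows: Iterable[Row], key: str) -> Tuple[List[Row], List[Any]]:
    """Reduce change rows to one final action per key.

    Returns (upserts, deleted_keys). update_preimage rows are dropped, and
    for each key only the row with the highest _commit_version is kept, so
    replaying the same batch produces the same result.
    Assumption: at most one non-preimage change per key within a commit.
    """
    latest: Dict[Any, Row] = {}
    for row in rows:
        if row["_change_type"] == "update_preimage":
            continue  # SCD Type 1 only needs the new image
        current = latest.get(row[key])
        if current is None or row["_commit_version"] >= current["_commit_version"]:
            latest[row[key]] = row
    upserts = [r for r in latest.values() if r["_change_type"] != "delete"]
    deleted = [k for k, r in latest.items() if r["_change_type"] == "delete"]
    return upserts, deleted
```

A possible usage, assuming a table keyed by id: rows = [r.asDict() for r in read_changes(spark, '<table_name>', <version>).collect()], then upserts, deleted = collapse_for_scd1(rows, 'id'). The upserts would be written to the sink as inserts-or-overwrites and the deleted keys removed from it.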
Known gotchas
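Change data files are stored in the _change_data subdirectory of the Delta table and follow the table's retention policy; running VACUUM with a retention period shorter than your CDF read lag permanently deletes change files that consumers have not yet read, and reads spanning those versions then fail (for example with a VersionNotFoundException or a missing-file error)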
CDF is not retroactive: it can be enabled on an existing table, but changes committed before the property was set are not captured, so startingVersion must be at or after the version where CDF was turned on
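OPTIMIZE and ZORDER rewrite data files without changing any row values. Delta records these commits as non-data-changing (dataChange = false), so the change feed is expected to skip them rather than emit a row for every rewritten record; if a downstream consumer still receives rows it cannot attribute to DML, check which operation produced that _commit_version (for example with DESCRIBE HISTORY <table_name>) and drop rows from OPTIMIZE commits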
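One way to check whether a _commit_version corresponds to DML rather than a maintenance operation is to compare it against the table history. The sketch below is illustrative only: dml_versions and keep_dml_changes are hypothetical helper names, the history rows are assumed to be dicts with the version and operation fields that DESCRIBE HISTORY returns, and the set of operations treated as maintenance is an assumption to adjust for your table.

```python
from typing import Any, Dict, Iterable, List, Set

# Operations treated as maintenance rather than data changes.
# This deny-list is an assumption; extend it to match your table's history.
MAINTENANCE_OPS = {"OPTIMIZE", "VACUUM START", "VACUUM END"}


def dml_versions(history: Iterable[Dict[str, Any]]) -> Set[int]:
    """Return the commit versions whose operation is not maintenance."""
    return {h["version"] for h in history if h["operation"] not in MAINTENANCE_OPS}


def keep_dml_changes(
    changes: Iterable[Dict[str, Any]], history: Iterable[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Keep only change rows whose _commit_version is a non-maintenance commit."""
    allowed = dml_versions(history)
    return [c for c in changes if c["_commit_version"] in allowed]
```

The history rows could be obtained with something like [r.asDict() for r in spark.sql('DESCRIBE HISTORY <table_name>').collect()]. A deny-list is used here so that an unrecognized operation is kept rather than silently dropped; an allow-list of DML operations would be the stricter alternative.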