{"id":"a3bed209-9be2-4eff-97dd-72416cb45662","task":"Configure a Hudi Copy-on-Write table and perform an upsert using record key and precombine field","domain":"hudi.apache.org","steps":["Add the Hudi Spark bundle JAR to your Spark session and set spark.serializer=org.apache.spark.serializer.KryoSerializer and spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension.","Write a DataFrame to a new CoW table: df.write.format('hudi').option('hoodie.table.name', 'events').option('hoodie.datasource.write.recordkey.field', 'id').option('hoodie.datasource.write.precombine.field', 'updated_at').option('hoodie.datasource.write.operation', 'upsert').mode('append').save('/path/to/hudi/events').","On subsequent writes, use the same upsert operation with the same options; Hudi deduplicates by record key, and when the incoming batch contains several records with the same key it keeps the one with the highest precombine field value.","Verify the table was created with the correct key configuration by reading it back: spark.read.format('hudi').load('/path/to/hudi/events').show().","Inspect Hudi's per-record metadata columns with spark.read.format('hudi').load('/path/to/hudi/events').select('_hoodie_commit_time', '_hoodie_record_key').show() to confirm that the metadata fields are present and that each record key maps to the expected commit."],"gotchas":["The precombine field must increase with every new version of a record (e.g., an update timestamp or version counter) so that Hudi can reliably select the latest record; a field that does not order versions this way leads to non-deterministic deduplication.","CoW tables rewrite each affected Parquet file in full on every upsert, leading to write amplification on large tables with frequent small updates; consider MoR for high-frequency update workloads.","The Hudi Spark bundle JAR must be the build for your Spark and Scala versions (e.g., hudi-spark3.4-bundle_2.12 for Spark 3.4 with Scala 2.12); version mismatches cause ClassNotFoundException or incompatible API errors at runtime."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/a3bed209-9be2-4eff-97dd-72416cb45662"}