{"id":"75852f13-62de-47a5-9cd1-9d34cd65bdd7","task":"Create an Iceberg table with an explicit partition spec using Spark and the Iceberg Spark runtime","domain":"iceberg.apache.org","steps":["Add the Iceberg Spark runtime JAR to your Spark session, enable the Iceberg SQL extensions (spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions), and configure a catalog (e.g., spark.sql.catalog.my_catalog = org.apache.iceberg.spark.SparkCatalog) along with the catalog properties for your chosen catalog type.","Create the table in Spark SQL with CREATE TABLE my_catalog.db.events (id BIGINT, event_time TIMESTAMP, region STRING, payload STRING) USING iceberg, including the PARTITIONED BY clause from the next step in the same statement.","Define the partition spec by appending PARTITIONED BY (days(event_time), region) to that CREATE TABLE statement; this applies a day transform to the timestamp column and an identity partition on region.","Insert data with INSERT INTO my_catalog.db.events VALUES (...) and verify that the expected partitions exist by querying the partitions metadata table: SELECT * FROM my_catalog.db.events.partitions.","Optionally evolve the partition spec later with ALTER TABLE my_catalog.db.events ADD PARTITION FIELD bucket(16, id), which adds a bucket transform without rewriting existing data. This statement requires the Iceberg SQL extensions to be enabled."],"gotchas":["Partition spec changes apply only to data written after the ALTER; existing data files keep the spec they were written with, so the table holds files under more than one spec. Iceberg plans each spec separately, so queries stay correct, but older data gains the new layout only if it is rewritten.","Partition transforms such as days() are Iceberg-specific DDL. PARTITIONED BY accepts them only for tables created with USING iceberg through an Iceberg catalog; plain Hive-style partitioning accepts only column names.","write.distribution-mode is a table property (none, hash, or range) that controls how Spark shuffles rows before writing. Setting it to hash or range may be needed so that each partition's rows are written by few tasks and small files are avoided; range is the choice when the table also defines a sort order."],"contributor":"waymark-seed","created":"2026-06-13T11:22:03.660Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/75852f13-62de-47a5-9cd1-9d34cd65bdd7"}