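Add the Iceberg Spark runtime JAR that matches your Spark and Scala versions to the Spark session, then register a catalog (e.g., spark.sql.catalog.my_catalog = org.apache.iceberg.spark.SparkCatalog) and set its properties, such as spark.sql.catalog.my_catalog.type (for example hive, hadoop, or rest) and, where applicable, a warehouse location. Also set spark.sql.extensions = org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions, which the ALTER TABLE ... ADD PARTITION FIELD statement used later requires.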
Create the table with CREATE TABLE my_catalog.db.events (id BIGINT, event_time TIMESTAMP, region STRING, payload STRING) USING iceberg in Spark SQL.
Define the partition spec in the same CREATE TABLE statement by appending PARTITIONED BY (days(event_time), region), which applies a day transform to the timestamp column alongside an identity partition on region.
Insert data with INSERT INTO my_catalog.db.events VALUES (...) and verify partitions are created as expected by querying the partitions metadata table: SELECT * FROM my_catalog.db.events.partitions.
Optionally alter the partition spec later with ALTER TABLE my_catalog.db.events ADD PARTITION FIELD bucket(16, id) to add a bucket transform without rewriting existing data.
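The steps above can be sketched end to end in PySpark. This is a minimal sketch, not a definitive setup: it assumes the Iceberg runtime JAR is already on the classpath, and the hadoop catalog type, the /tmp/iceberg-warehouse path, and the sample row are illustrative assumptions rather than values from this guide.

```python
# Sketch only. Assumed for illustration: a "hadoop" catalog type, a local
# warehouse path, and a made-up sample row. Replace them for your environment.

ICEBERG_CONF = {
    # Needed for ALTER TABLE ... ADD PARTITION FIELD.
    "spark.sql.extensions":
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.type": "hadoop",                      # assumption
    "spark.sql.catalog.my_catalog.warehouse": "/tmp/iceberg-warehouse", # assumption
}

STATEMENTS = [
    "CREATE NAMESPACE IF NOT EXISTS my_catalog.db",
    # The partition spec is part of the CREATE TABLE statement.
    "CREATE TABLE IF NOT EXISTS my_catalog.db.events ("
    "id BIGINT, event_time TIMESTAMP, region STRING, payload STRING) "
    "USING iceberg "
    "PARTITIONED BY (days(event_time), region)",
    # Made-up sample row.
    "INSERT INTO my_catalog.db.events VALUES "
    "(1L, TIMESTAMP '2024-01-15 10:00:00', 'eu', 'hello')",
    # Optional spec evolution; affects only data written afterwards.
    "ALTER TABLE my_catalog.db.events ADD PARTITION FIELD bucket(16, id)",
]


def build_session():
    """Create a Spark session with the catalog settings above."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("iceberg-partitioning-sketch")
    for key, value in ICEBERG_CONF.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def run(spark):
    """Run each statement in order, then query the partitions metadata table."""
    for statement in STATEMENTS:
        spark.sql(statement)
    return spark.sql("SELECT * FROM my_catalog.db.events.partitions")
```

With PySpark and the runtime JAR available, run(build_session()).show() would list the partitions created by the insert.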
Known gotchas
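Partition spec changes apply only to data written after the ALTER; existing data files keep the spec they were written with. Iceberg plans scans for each spec separately, so queries keep working on the mixed-spec table without changes, but older files cannot be pruned by the new partition field unless they are rewritten.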
PARTITIONED BY sets the initial spec, but transforms such as days() are accepted only in the Iceberg DDL, that is, for a table created USING iceberg through an Iceberg catalog; plain Hive-style partitioning syntax accepts only columns, not transforms.
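The write.distribution-mode table property controls how Spark shuffles rows before writing. It may need to be set to range so that rows are clustered by partition key (or by sort key when the table defines a sort order), which keeps writes aligned with the partition spec and avoids small files.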
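Two of the gotchas above can be checked or addressed with short SQL statements. The helpers below are a hedged sketch: the function names are hypothetical, and the spec_id column of the files metadata table is assumed to be available in your Iceberg version.

```python
# Sketch only. Helper names are hypothetical; they just build SQL strings
# that can be passed to spark.sql(...).

TABLE = "my_catalog.db.events"
ALLOWED_MODES = ("none", "hash", "range")


def set_distribution_mode_sql(table, mode):
    """Build the statement that sets write.distribution-mode as a table property."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}, got {mode!r}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.distribution-mode' = '{mode}')"
    )


def files_per_spec_sql(table):
    """Build a query counting data files per partition spec (mixed-spec check).

    Assumes the files metadata table exposes a spec_id column.
    """
    return (
        f"SELECT spec_id, count(*) AS data_files "
        f"FROM {table}.files GROUP BY spec_id ORDER BY spec_id"
    )
```

For example, spark.sql(set_distribution_mode_sql(TABLE, "range")) applies the property, and spark.sql(files_per_spec_sql(TABLE)).show() reports more than one spec_id once data has been written both before and after a spec change.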