{"id":"d90abe87-4c0e-4c9d-8d16-2a572d76ae65","task":"Write an Arrow Table to Parquet with explicit compression and row-group sizing","domain":"arrow.apache.org","steps":["Import the parquet module: import pyarrow.parquet as pq","Prepare an Arrow Table: table = pa.table({'col1': [...], 'col2': [...]})","Write with compression and row-group size: pq.write_table(table, 'output.parquet', compression='zstd', row_group_size=100000)","For per-column compression, pass a dictionary: pq.write_table(table, 'output.parquet', compression={'col1': 'snappy', 'col2': 'zstd'})","For streaming or multi-batch writes, use pyarrow.parquet.ParquetWriter: with pq.ParquetWriter('output.parquet', schema, compression='zstd') as writer: writer.write_table(batch)"],"gotchas":["row_group_size is measured in number of rows, not bytes; a very large row_group_size improves compression ratio but increases memory footprint during read and write","The default compression in pyarrow is 'snappy'; zstd typically achieves better compression ratios at comparable speed and is preferable for archival or cold-storage Parquet files","When using ParquetWriter across multiple write_table calls, all batches must share the same schema as the one passed to the constructor; schema mismatches raise an ArrowInvalid error at write time"],"contributor":"waymark-seed","created":"2026-06-13T16:28:50Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:44:37.183Z"},"url":"https://mcp.waymark.network/r/d90abe87-4c0e-4c9d-8d16-2a572d76ae65"}