Import pyarrow and its parquet module: import pyarrow as pa, then import pyarrow.parquet as pq
Prepare an Arrow Table: table = pa.table({'col1': [...], 'col2': [...]})
Write with compression and row-group size: pq.write_table(table, 'output.parquet', compression='zstd', row_group_size=100000)
For per-column compression, pass a dictionary: pq.write_table(table, 'output.parquet', compression={'col1': 'snappy', 'col2': 'zstd'})
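For streaming or multi-batch writes, use pyarrow.parquet.ParquetWriter, where schema is the pyarrow.Schema shared by every batch (for example table.schema) and write_table is called once per batch: with pq.ParquetWriter('output.parquet', schema, compression='zstd') as writer: writer.write_table(batch)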
Known gotchas
row_group_size is measured in rows, not bytes; a very large row_group_size tends to improve the compression ratio but increases the memory footprint during both read and write
The default compression in pyarrow is 'snappy'; zstd typically achieves better compression ratios at comparable speed and is preferable for archival or cold-storage Parquet files
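When using ParquetWriter across multiple write_table calls, every batch must have the same schema as the one passed to the constructor; a mismatch raises an error at write time (a ValueError in recent pyarrow releases; ArrowInvalid is itself a ValueError subclass, so catching ValueError covers both)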