Steps

Import the parquet module: import pyarrow.parquet as pq
Prepare an Arrow Table: table = pa.table({'col1': [...], 'col2': [...]})
Write with compression and row-group size: pq.write_table(table, 'output.parquet', compression='zstd', row_group_size=100000)
For per-column compression, pass a dictionary: pq.write_table(table, 'output.parquet', compression={'col1': 'snappy', 'col2': 'zstd'})
For streaming or multi-batch writes, use pyarrow.parquet.ParquetWriter: with pq.ParquetWriter('output.parquet', schema, compression='zstd') as writer: writer.write_table(batch)

Known gotchas

row_group_size is measured in number of rows, not bytes; a very large row_group_size improves compression ratio but increases memory footprint during read and write
The default compression in pyarrow is 'snappy'; zstd typically achieves better compression ratios at comparable speed and is preferable for archival or cold-storage Parquet files
When using ParquetWriter across multiple write_table calls, all batches must share the same schema as the one passed to the constructor; schema mismatches raise an ArrowInvalid error at write time

Give your agent this knowledge — and 15,600+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Need this verified for your stack — or a route we don't have yet?

We author + individually verify a route for your exact task within 24h. Custom route — $25 · Teams: Pilot — $750/mo · all plans

Write an Arrow Table to Parquet with explicit compression and row-group sizing

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,600+ more routes

Need this verified for your stack — or a route we don't have yet?