Steps

Identify the directory structure: Hive-partitioned datasets use key=value folder names (e.g., year=2023/month=01/file.parquet)
Read with partition column inference: SELECT * FROM read_parquet('s3://bucket/data/*/*/*.parquet', hive_partitioning = true)
Filter by partition key to trigger partition pruning: SELECT * FROM read_parquet('data/*/*/*.parquet', hive_partitioning = true) WHERE year = 2023 AND month = 3
Verify that partition columns appear in the result schema: DESCRIBE SELECT * FROM read_parquet('data/*/*/*.parquet', hive_partitioning = true)
Use a glob that covers all partition directories; too-narrow globs will silently omit partitions
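
To make the pruning step concrete, here is a minimal Python sketch of the idea behind it. It is a conceptual model, not DuckDB's implementation: the helper names parse_hive_path and prune and the sample file list are hypothetical. It shows how key=value directory segments can be turned into column values, and how a filter on those values can discard files from the path alone, without opening them.

```python
from pathlib import PurePosixPath


def parse_hive_path(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a path (values stay strings)."""
    parts = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return dict(p.split("=", 1) for p in parts if "=" in p)


def prune(paths: list[str], **wanted: int) -> list[str]:
    """Keep only files whose partition values match; no file is ever opened."""
    kept = []
    for path in paths:
        keys = parse_hive_path(path)
        if all(k in keys and int(keys[k]) == v for k, v in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical listing, laid out like the year=.../month=... example above.
files = [
    "data/year=2022/month=12/a.parquet",
    "data/year=2023/month=01/b.parquet",
    "data/year=2023/month=03/c.parquet",
]

# Mirrors WHERE year = 2023 AND month = 3: only one file survives.
print(prune(files, year=2023, month=3))
```

Note that the directory value "03" is compared as the integer 3, which is why a numeric filter such as month = 3 can match a zero-padded folder name.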

Known gotchas

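If Hive partitioning is not in effect (recent DuckDB versions try to auto-detect it from the paths, but passing hive_partitioning = true makes the intent explicit), the key=value path segments are ignored: the derived columns do not appear in the result, and filters on them cannot prune files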
The glob pattern must reach the actual Parquet files (e.g., '**/*.parquet'); a glob that stops at a directory level will not match any files
Partition column types are inferred from the string values in the directory names; if inference produces the wrong type (e.g., string instead of integer), cast explicitly in the query or declare the types up front with the hive_types parameter of read_parquet
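
The glob and type-inference gotchas above can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not DuckDB's actual matching or detection logic: glob_matches and infer_type are hypothetical helpers, the matcher handles only single-level * wildcards (no **), and the type guess distinguishes only integers from strings.

```python
from fnmatch import fnmatchcase


def glob_matches(pattern: str, path: str) -> bool:
    """Segment-wise glob: '*' never crosses a '/' (no '**' in this sketch)."""
    pat_parts, path_parts = pattern.split("/"), path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(seg, pat) for pat, seg in zip(pat_parts, path_parts)
    )


def infer_type(values: list[str]) -> str:
    """Guess BIGINT if every directory value parses as an integer, else VARCHAR."""
    try:
        for value in values:
            int(value)
    except ValueError:
        return "VARCHAR"
    return "BIGINT"


# Hypothetical listing of a two-level partitioned dataset.
files = [
    "data/year=2022/month=12/a.parquet",
    "data/year=2023/month=01/b.parquet",
    "data/year=2023/month=03/c.parquet",
]

# Reaches the files: every partition is covered.
full = [f for f in files if glob_matches("data/*/*/*.parquet", f)]
# Stops at a directory level: matches no files at all.
too_short = [f for f in files if glob_matches("data/*/*", f)]
# Too narrow: the year=2022 partition is dropped without any error.
narrow = [f for f in files if glob_matches("data/year=2023/*/*.parquet", f)]

print(len(full), len(too_short), len(narrow))
print(infer_type(["01", "03", "12"]), infer_type(["2023", "unknown"]))
```

A single non-numeric directory name (such as the hypothetical "unknown" above) is enough to push the whole column to a string type in this model, which is the situation where an explicit cast becomes necessary.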

Read a partitioned Parquet dataset with Hive partitioning in DuckDB
