Identify the directory structure: Hive-partitioned datasets use key=value folder names (e.g., year=2023/month=01/file.parquet)
Read with partition column inference: SELECT * FROM read_parquet('s3://bucket/data/*/*/*.parquet', hive_partitioning = true)
Filter by partition key to trigger partition pruning: SELECT * FROM read_parquet('data/*/*/*.parquet', hive_partitioning = true) WHERE year = 2023 AND month = 3
Verify that partition columns appear in the result schema: DESCRIBE SELECT * FROM read_parquet('data/*/*/*.parquet', hive_partitioning = true)
Use a glob that covers all partition directories; too-narrow globs will silently omit partitions
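To make the key=value naming convention from the steps above concrete, here is a minimal Python sketch of how partition columns can be read off a path. It illustrates the directory convention only and is not the query engine's implementation; the function name parse_hive_partitions is hypothetical, and values are kept as strings, with no type inference attempted.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(path: str) -> dict[str, str]:
    """Collect key=value directory segments from a path, in order.

    Hypothetical helper for illustration; values stay as raw strings.
    """
    partitions: dict[str, str] = {}
    # Skip the final component: it is the file name, not a partition directory.
    for segment in PurePosixPath(path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments shaped like key=value
            partitions[key] = value
    return partitions


print(parse_hive_partitions("data/year=2023/month=01/file.parquet"))
# {'year': '2023', 'month': '01'}
```

Note that month comes back as the string '01'; whether it ends up as an integer in a query depends on the reader's type inference, which this sketch does not model.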
Known gotchas
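When hive_partitioning is off, partition key=value path segments are ignored and the derived columns do not appear in the result, so filters on those columns cannot prune files. Recent DuckDB releases auto-detect Hive partitioning when the option is omitted, but setting hive_partitioning = true explicitly keeps the behavior predictable across versions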
The glob pattern must reach the actual Parquet files (e.g., '**/*.parquet'); a glob that stops at a directory level will not match any files
Partition column types are inferred from the string values in the directory names; if inference produces the wrong type (e.g., string instead of integer), cast explicitly in the query or declare the type up front with the hive_types option (e.g., hive_types = {'year': INT})
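The glob gotchas above can be checked locally before running a query. The sketch below builds a small throwaway directory tree with empty placeholder files (not real Parquet data) and counts how many files each pattern matches. It uses Python's pathlib globbing as a stand-in, on the assumption that these simple patterns behave the same way in the query engine; the helper names and the sample layout are hypothetical.

```python
import tempfile
from pathlib import Path


def build_sample_tree(root: Path) -> None:
    """Create empty placeholder files in a year=/month= layout (hypothetical data)."""
    for rel in (
        "year=2023/month=01/a.parquet",
        "year=2023/month=02/b.parquet",
        "year=2024/month=01/c.parquet",
    ):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()


def count_files(root: Path, pattern: str) -> int:
    """Count regular files under root that match the glob pattern."""
    return sum(1 for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_tree(root)
    for pattern in ("**/*.parquet", "*/*/*.parquet", "year=2023/*/*.parquet", "*/*"):
        print(pattern, count_files(root, pattern))
# **/*.parquet 3
# */*/*.parquet 3
# year=2023/*/*.parquet 2
# */* 0
```

The third pattern is the too-narrow case: it silently drops the year=2024 partition. The last pattern stops at the month directories and matches no files at all.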