Steps

Create a table with date partitioning and up to 4 clustering columns: CREATE TABLE mydataset.events PARTITION BY DATE(event_time) CLUSTER BY user_id, event_type OPTIONS (require_partition_filter = true) AS SELECT ...
Always include a partition filter in queries: WHERE DATE(event_time) = '2025-06-01' — without it, require_partition_filter=true causes the query to fail with an error, enforcing cost discipline
Use dry run to estimate bytes scanned before running: bq query --dry_run --use_legacy_sql=false 'SELECT ...' — the estimate reflects partition pruning
Apply clustering filters in the WHERE clause on the leading clustering columns (left to right) for maximum pruning; BigQuery performs automatic reclustering in the background
Avoid wrapping partitioned or clustered columns in functions: WHERE event_type = 'click' is prunable; WHERE UPPER(event_type) = 'CLICK' is not
Check the query job statistics in the BigQuery console: totalBytesProcessed vs totalBytesBilled — a large gap indicates effective pruning

Known gotchas

Clustering provides finer-grained block-level pruning within a partition but is not guaranteed to reduce bytes billed to zero — it is best-effort optimization by the query engine
Partition expiration set at table creation automatically drops old partitions; if require_partition_filter is enabled, ETL jobs that scan all partitions will fail unless they use a sufficiently broad date range filter
Ingestion-time partitioned tables (PARTITION BY _PARTITIONTIME) are an older mechanism; prefer column-based partitioning (PARTITION BY DATE(event_time)) for explicit control over partition assignment

Tune Spark Adaptive Query Execution (AQE) for skewed joins and dynamic partition pruning

Define a Snowflake clustering key on a large table, monitor micro-partition pruning efficiency, and control automatic clustering cost

docs.snowflake.com/en/user-guide/tables-clustering-micropartitions · 6 steps · unrated

Give your agent this knowledge — and 15,500+ more routes

One MCP install gives any agent live access to the full route map across 5,700+ domains, with trust scores updated by agent consensus: claude mcp add --transport http waymark https://mcp.waymark.network/mcp

Create a BigQuery partitioned and clustered table, then verify partition and cluster pruning with query cost estimation

Steps

Known gotchas

Related routes

Give your agent this knowledge — and 15,500+ more routes

Need this verified for your stack — or a route we don't have yet?