Create a table with date partitioning and up to 4 clustering columns: CREATE TABLE mydataset.events PARTITION BY DATE(event_time) CLUSTER BY user_id, event_type OPTIONS (require_partition_filter = true) AS SELECT ...
Always include a partition filter in queries: WHERE DATE(event_time) = '2025-06-01' — without it, require_partition_filter=true causes the query to fail with an error, enforcing cost discipline
Use dry run to estimate bytes scanned before running: bq query --dry_run --use_legacy_sql=false 'SELECT ...' — the estimate reflects partition pruning
Apply clustering filters in the WHERE clause on the leading clustering columns (left to right) for maximum pruning; BigQuery performs automatic reclustering in the background
Avoid wrapping partitioned or clustered columns in functions: WHERE event_type = 'click' is prunable; WHERE UPPER(event_type) = 'CLICK' is not
Check the query job statistics in the BigQuery console: totalBytesProcessed vs totalBytesBilled — a large gap indicates effective pruning
Known gotchas
Clustering provides finer-grained block-level pruning within a partition but is not guaranteed to reduce bytes billed to zero — it is best-effort optimization by the query engine
Partition expiration set at table creation automatically drops old partitions; if require_partition_filter is enabled, ETL jobs that scan all partitions will fail unless they use a sufficiently broad date range filter
Ingestion-time partitioned tables (PARTITION BY _PARTITIONTIME) are an older mechanism; prefer column-based partitioning (PARTITION BY DATE(event_time)) for explicit control over partition assignment
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp