Enable query profiling with PRAGMA enable_profiling; optionally set profiling_output to a file path to save the profile to a file instead of printing it
Run EXPLAIN ANALYZE on a Parquet scan query: EXPLAIN ANALYZE SELECT event_type, count(*) FROM read_parquet('events.parquet') WHERE ts > '2025-01-01' GROUP BY event_type
Inspect the plan output for the PARQUET_SCAN operator (it may appear as READ_PARQUET in some DuckDB versions); confirm that 'Filters' shows the pushed-down predicate and that 'Projection' lists only the selected columns
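Check the Parquet file metadata to confirm that row group statistics exist: SELECT * FROM parquet_metadata('events.parquet'). Each output row describes one column chunk within one row group; where the min/max statistics (e.g. stats_min_value and stats_max_value) are NULL, the reader cannot use them to skip that row group for a filter on that column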
Write Parquet files with row group statistics using DuckDB: COPY (SELECT ...) TO 'output.parquet' (FORMAT PARQUET, ROW_GROUP_SIZE 122880) — DuckDB writes statistics automatically
Compare scan times for the same query with and without the filter to quantify the pushdown benefit; a filtered scan that reads fewer bytes from the same file indicates that pushdown is active and row groups are being skipped
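To make the mechanism behind the steps above concrete, here is a minimal Python sketch of how a reader can use per-row-group min/max statistics to skip row groups for a filter like ts > '2025-01-01'. It is a simplified model under stated assumptions, not DuckDB's actual implementation: the RowGroupStats class, the row_groups_to_read function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one row group's statistics on column ts."""
    row_group_id: int
    ts_min: Optional[date]  # None models a file written without statistics
    ts_max: Optional[date]


def row_groups_to_read(groups: List[RowGroupStats], lower: date) -> List[int]:
    """Return ids of row groups that may contain rows with ts > lower."""
    keep = []
    for g in groups:
        if g.ts_max is None:
            # No statistics: the group cannot be ruled out, so it must be read.
            keep.append(g.row_group_id)
        elif g.ts_max > lower:
            # The largest value exceeds the bound, so a match is possible.
            keep.append(g.row_group_id)
        # Otherwise every value is <= lower and the group is skipped.
    return keep


# Made-up statistics for four row groups of a hypothetical events file.
groups = [
    RowGroupStats(0, date(2024, 1, 1), date(2024, 6, 30)),
    RowGroupStats(1, date(2024, 7, 1), date(2025, 1, 1)),
    RowGroupStats(2, date(2025, 1, 2), date(2025, 3, 31)),
    RowGroupStats(3, None, None),
]
print(row_groups_to_read(groups, date(2025, 1, 1)))  # [2, 3]
```

In this model, row groups 0 and 1 are skipped without being read, while row group 3 has to be scanned only because its statistics are missing.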
Known gotchas
Predicates that wrap the column in a computed expression (e.g. WHERE YEAR(ts) = 2025) generally cannot be pushed down into Parquet row group filtering; rewrite them as an equivalent range on the raw column: WHERE ts >= '2025-01-01' AND ts < '2026-01-01'
Parquet files written by some tools (e.g. older pandas/PyArrow versions) may omit row group statistics; rewrite the file with DuckDB, or with PyArrow using write_statistics=True, to enable pushdown
EXPLAIN ANALYZE executes the full query — use it on representative but bounded data in development, not on full production datasets during peak hours
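The YEAR(ts) = 2025 gotcha above can be illustrated with a small Python sketch. The half-open range compares ts directly, so it can be tested against a row group's min/max values, whereas a function wrapped around the column hides it from that comparison. The helper names and sample statistics below are hypothetical, and the overlap check is a simplified model rather than DuckDB's own logic.

```python
from datetime import datetime
from typing import Tuple


def year_to_range(year: int) -> Tuple[datetime, datetime]:
    """Half-open range [start, end) equivalent to YEAR(ts) = year."""
    return datetime(year, 1, 1), datetime(year + 1, 1, 1)


def may_overlap(ts_min: datetime, ts_max: datetime,
                start: datetime, end: datetime) -> bool:
    """True if a row group spanning [ts_min, ts_max] could hold rows in [start, end)."""
    return ts_max >= start and ts_min < end


start, end = year_to_range(2025)

# Row group entirely in 2024: can be skipped.
print(may_overlap(datetime(2024, 3, 1), datetime(2024, 12, 31, 23, 59, 59), start, end))  # False
# Row group straddling the 2024/2025 boundary: must be read.
print(may_overlap(datetime(2024, 12, 1), datetime(2025, 1, 15), start, end))  # True
# Row group starting exactly at 2026-01-01: can be skipped because end is exclusive.
print(may_overlap(datetime(2026, 1, 1), datetime(2026, 2, 1), start, end))  # False
```

Using an exclusive upper bound avoids having to pick a "last instant of the year" value, which would depend on the timestamp precision.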