Confirm that the Spark version supports AQE (Spark 3.0+), then enable it with spark.sql.adaptive.enabled=true; from Spark 3.2 onward it is enabled by default
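Enable skew join optimization with spark.sql.adaptive.skewJoin.enabled=true and tune spark.sql.adaptive.skewJoin.skewedPartitionFactor and spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes to match the data distribution; a partition is treated as skewed only when it is larger than both the threshold and the factor times the median partition size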
Run the query and inspect the Spark UI's SQL tab for the AQEShuffleRead nodes (named CustomShuffleReader in Spark 3.0 and 3.1); verify from the node details that skewed partitions were split
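Enable dynamic partition pruning with spark.sql.optimizer.dynamicPartitionPruning.enabled=true (an optimizer feature separate from AQE, on by default since Spark 3.0) and confirm that the fact table scan in the query plan carries a dynamic pruning filter, which typically appears as a dynamicpruningexpression entry in the scan's PartitionFilters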
Compare runtime and shuffle bytes before and after AQE using the Spark UI metrics to validate the improvement
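The settings from the steps above can be collected in one place, and the documented skew rule can be mirrored in plain Python to reason about which factor and threshold values suit a given partition size distribution. This is a minimal sketch, not Spark's implementation: the names AQE_SETTINGS, apply_settings and skewed_partitions are hypothetical, the numeric values are Spark's documented defaults used as placeholders, and the median here is Python's statistics.median, which may differ slightly from the value Spark computes internally.

```python
from statistics import median

# Settings named in the steps above. The factor (5) and threshold (256MB)
# are Spark's documented defaults, used here only as starting points.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def apply_settings(spark, settings=AQE_SETTINGS):
    """Apply each setting to an existing SparkSession via spark.conf.set."""
    for key, value in settings.items():
        spark.conf.set(key, value)


def skewed_partitions(sizes_in_bytes, factor=5.0,
                      threshold_bytes=256 * 1024 * 1024):
    """Return the indexes of partitions the documented rule would flag.

    A partition counts as skewed when it is larger than factor * median
    size AND larger than the byte threshold. This approximates the rule
    for planning purposes; it is not a copy of Spark's code.
    """
    med = median(sizes_in_bytes)
    return [
        i for i, size in enumerate(sizes_in_bytes)
        if size > factor * med and size > threshold_bytes
    ]
```

Feeding skewed_partitions the shuffle partition sizes observed in the Spark UI for a stage gives a quick check of whether the current factor and threshold would flag the partitions that look problematic.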
Known gotchas
AQE skew join splitting applies only to shuffled joins: in Spark 3.0 and 3.1 that means sort-merge joins only, while Spark 3.2 and later also handle shuffled hash joins; broadcast joins are never skew-split, so very small tables should still be broadcast explicitly
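With the default spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=true, dynamic partition pruning requires that the smaller side of the join (the dimension table) fits within the broadcast threshold (spark.sql.autoBroadcastJoinThreshold); if the dimension is too large, the pruning filter is not injected and the full fact table is scanned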
AQE changes the query plan at runtime, which can make query plans non-reproducible across runs; this complicates benchmarking because two identical queries may produce different plans depending on runtime statistics
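Because plans can differ between runs, a before-and-after comparison is more trustworthy when each configuration is run several times and summarized with a median. The sketch below is one way to do that under stated assumptions: temporary_conf, median_runtime and compare_aqe are hypothetical helper names, spark is an existing SparkSession, and run_query is a caller-supplied function that builds the query and triggers a full execution.

```python
import time
from contextlib import contextmanager
from statistics import median


@contextmanager
def temporary_conf(spark, overrides):
    """Apply runtime settings for the duration of a block, then restore them."""
    # Passing None as the default yields None for keys that have no value.
    previous = {key: spark.conf.get(key, None) for key in overrides}
    for key, value in overrides.items():
        spark.conf.set(key, value)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                spark.conf.unset(key)
            else:
                spark.conf.set(key, old)


def median_runtime(run_query, repeats=5):
    """Run the query several times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return median(durations)


def compare_aqe(spark, run_query, repeats=5):
    """Median runtime with AQE off ("false") and on ("true")."""
    results = {}
    for flag in ("false", "true"):
        with temporary_conf(spark, {"spark.sql.adaptive.enabled": flag}):
            results[flag] = median_runtime(run_query, repeats)
    return results
```

One possible run_query writes the result to Spark's noop data source, for example spark.sql(query).write.format("noop").mode("overwrite").save(), so that the whole plan executes without output cost; query is a placeholder for the statement under test. Wall-clock medians complement, rather than replace, the shuffle byte metrics in the Spark UI.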