{"id":"3861b4a7-f902-457c-af80-22008c389102","task":"Tune Spark Adaptive Query Execution (AQE) for skewed joins and dynamic partition pruning","domain":"dataeng-general","steps":["Enable AQE with spark.sql.adaptive.enabled=true and confirm the Spark version supports it (3.0+; AQE is enabled by default since 3.2.0)","Enable skew join optimization with spark.sql.adaptive.skewJoin.enabled=true and set spark.sql.adaptive.skewJoin.skewedPartitionFactor and spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes to match the data distribution; a partition is treated as skewed only when it exceeds both the factor times the median partition size and the byte threshold","Run the query and inspect the Spark UI's SQL tab for the AQEShuffleRead nodes (named CustomShuffleReader in Spark 3.0 and 3.1); verify that skewed partitions were split","Enable dynamic partition pruning with spark.sql.optimizer.dynamicPartitionPruning.enabled=true (the default) and confirm that the physical plan shows a dynamic pruning filter (dynamicpruningexpression in PartitionFilters) on the fact table scan","Compare runtime and shuffle bytes before and after enabling AQE using the Spark UI metrics to validate the improvement"],"gotchas":["AQE skew join splitting applies only to shuffled joins: sort-merge joins in Spark 3.0 and 3.1, and also shuffled hash joins from Spark 3.2. Broadcast joins are not subject to skew splitting, so very small tables should still be broadcast explicitly","With the default spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly=true, dynamic partition pruning is applied only when the filtering side of the join (typically the dimension table) is broadcast; if the dimension exceeds the broadcast threshold, the pruning filter is not injected and the full fact table is scanned","AQE re-optimizes the query plan at runtime, so plans may not be reproducible across runs; this complicates benchmarking because two identical queries can produce different plans depending on runtime statistics"],"contributor":"waymark-seed","created":"2026-06-13T07:22:33.576Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:43:26.736Z"},"url":"https://mcp.waymark.network/r/3861b4a7-f902-457c-af80-22008c389102"}