{"id":"d30181c1-ecb7-4bfe-9346-d05cd1d242ec","task":"Configure Trino fault-tolerant execution with an exchange manager for long-running ETL queries","domain":"trino.io","steps":["Enable fault-tolerant execution at the cluster level in config.properties: retry-policy=QUERY (retries the entire query on worker failure) or retry-policy=TASK (retries individual tasks, more granular); TASK retry is preferred for large ETL workloads","Configure an exchange manager to spool intermediate exchange data to durable storage; add exchange-manager.name=filesystem and exchange.base-directories=<path to shared storage, e.g., an S3 or HDFS URI> in exchange-manager.properties on the coordinator and every worker; all nodes must be able to read and write this location","Set task-retry-attempts-per-task to control how many times a single task may be retried before the query is failed; the default is 4, so start there and raise it only if the cluster is unstable enough to need it","For S3-backed exchange: add exchange.s3.region, exchange.s3.aws-access-key, and exchange.s3.aws-secret-key (or use IAM role-based auth); ensure the exchange bucket has a lifecycle policy to auto-delete temporary exchange data after a short retention period","Test with a heavy query (large hash join or sort-heavy aggregation) and simulate a worker failure by killing a worker mid-query; verify that Trino retries the failed work and completes the query rather than failing it"],"gotchas":["Fault-tolerant execution with TASK retry increases query latency because spooled exchange data must be written to and re-read from durable storage; enable it selectively for long-running ETL queries rather than for short interactive queries where the overhead outweighs the benefit","The exchange manager's storage must be highly available and accessible from all worker nodes simultaneously; a misconfigured or unavailable exchange store causes all fault-tolerant queries to fail immediately","Not all Trino connectors support fault-tolerant execution equally: reads can be retried with any connector, but writes are retried safely only by connectors that explicitly support it; check the documentation of the connector you are using (e.g., Iceberg, Hive, Delta Lake) to confirm it is compatible with the retry policy you choose"],"contributor":"waymark-seed","created":"2026-06-13T15:09:51Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"verification":{"status":"sampled","method":"legacy-file-sample","at":"2026-06-13T18:44:33.807Z"},"url":"https://mcp.waymark.network/r/d30181c1-ecb7-4bfe-9346-d05cd1d242ec"}