Steps

Enable fault-tolerant execution at the cluster level by setting retry-policy in config.properties: retry-policy=QUERY retries the entire query when a worker fails, while retry-policy=TASK retries only the failed tasks, which is more granular. TASK retry is preferred for large ETL workloads
Configure an exchange manager to spool intermediate exchange data to durable storage: in exchange-manager.properties, set exchange-manager.name=filesystem and exchange.base-directories=<path to shared storage, e.g., an S3 or HDFS URI>. Every worker node must be able to reach this path
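Set the retry limit that controls how many task failures are tolerated before the query is aborted. The Trino documentation lists task-retry-attempts-per-task for the TASK policy and query-retry-attempts for the QUERY policy, each defaulting to 4; max-failed-tasks is not among the documented properties, so confirm the exact name against your Trino version. Raise the limit gradually based on cluster stability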
For an S3-backed exchange, add exchange.s3.region, exchange.s3.aws-access-key, and exchange.s3.aws-secret-key to exchange-manager.properties, or use IAM role-based auth instead of static keys. Give the exchange bucket a lifecycle policy that automatically deletes temporary exchange data after a short retention period
Test with a heavy query (a large hash join or a sort-heavy aggregation) and simulate a failure by killing a worker process mid-query; verify that Trino retries the affected work and completes the query instead of failing it
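
The configuration steps above can be sketched as a small Python script that renders the lines for the two properties files. This is a minimal sketch, not an official tool: the helper names (render_properties, build_fte_config), the bucket URI s3://example-exchange-bucket/trino, and the region us-east-1 are hypothetical placeholders. The property keys are the ones named in the steps, and the S3 key properties are left out on the assumption that IAM role-based auth is used.

```python
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def build_fte_config(base_dir: str, region: str, retry_policy: str = "TASK") -> Dict[str, str]:
    """Return a mapping of file name to the lines to add for fault-tolerant execution.

    base_dir and region are caller-supplied placeholders (hypothetical values).
    """
    if retry_policy not in ("QUERY", "TASK"):
        raise ValueError(f"unsupported retry policy: {retry_policy}")

    # Cluster-level switch that belongs in config.properties.
    config = {"retry-policy": retry_policy}

    # Exchange manager settings that belong in exchange-manager.properties.
    exchange = {
        "exchange-manager.name": "filesystem",
        "exchange.base-directories": base_dir,
    }
    # The region only matters when the exchange is backed by S3.
    if base_dir.startswith("s3://"):
        exchange["exchange.s3.region"] = region

    return {
        "config.properties": render_properties(config),
        "exchange-manager.properties": render_properties(exchange),
    }


if __name__ == "__main__":
    files = build_fte_config("s3://example-exchange-bucket/trino", "us-east-1")
    for name, content in files.items():
        print(f"# {name}")
        print(content)
```

The output is meant to be merged into the existing files by hand; config.properties in particular already holds other cluster settings that this sketch does not touch.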

Known gotchas

Fault-tolerant execution with TASK retry increases query latency because intermediate exchange data must be spooled to durable storage and read back; enable it selectively for long-running ETL queries rather than for short interactive queries, where the overhead outweighs the benefit
The exchange manager's storage must be highly available and accessible from all worker nodes simultaneously; a misconfigured or unavailable exchange store causes all fault-tolerant queries to fail immediately
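Not all Trino connectors support fault-tolerant execution equally; verify that the connector you are using (e.g., Iceberg, Hive, Delta) is compatible with the retry policy you choose. Per the Trino documentation, reads can be retried with any connector, while retried writes are limited to a documented list of connectors, so check that list for your Trino version before relying on retries for INSERT or CREATE TABLE AS workloads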
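
The gotchas above suggest a few checks worth running before a configuration is rolled out. The following is a minimal sketch under stated assumptions: parse_properties and preflight_warnings are hypothetical helpers, not part of Trino, and the rules only encode points made in this guide (TASK retry needs an exchange manager, and the exchange directory must be shared storage rather than a node-local path). It does not test whether the storage is actually reachable from the workers.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def preflight_warnings(config: Dict[str, str], exchange: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for an obviously inconsistent setup."""
    warnings: List[str] = []
    if config.get("retry-policy", "NONE").upper() != "TASK":
        return warnings  # the checks below only apply to TASK retry

    if exchange.get("exchange-manager.name") != "filesystem":
        warnings.append("TASK retry is set but no filesystem exchange manager is configured")

    raw_dirs = exchange.get("exchange.base-directories", "")
    base_dirs = [d.strip() for d in raw_dirs.split(",") if d.strip()]
    if not base_dirs:
        warnings.append("exchange.base-directories is empty")
    for directory in base_dirs:
        # Heuristic (assumption): no URI scheme, or a file: scheme, means node-local disk.
        if "://" not in directory or directory.startswith("file:"):
            warnings.append(f"{directory} looks node-local; workers need shared storage")
    return warnings


if __name__ == "__main__":
    cfg = parse_properties("retry-policy=TASK\n")
    exch = parse_properties(
        "exchange-manager.name=filesystem\nexchange.base-directories=/tmp/exchange\n"
    )
    for warning in preflight_warnings(cfg, exch):
        print("WARNING:", warning)
```

A check like this fits in a deployment pipeline, where a node-local exchange path is cheaper to catch than a wave of failed fault-tolerant queries.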

trino.io · 5 steps · unrated

Query federated data across an Iceberg catalog and a PostgreSQL connector in Trino with a cross-catalog join

trino.io · 5 steps · unrated

Use Trino EXPLAIN ANALYZE to diagnose slow query performance and identify bottleneck stages

trino.io · 5 steps · unrated

Configure Trino fault-tolerant execution with an exchange manager for long-running ETL queries
