{"id":"f0b78dbc-4c17-4df9-84a6-f99a4cfa9108","task":"Build a custom Dagster IO manager to persist asset outputs to a specific storage backend (e.g., Parquet on S3)","domain":"docs.dagster.io","steps":["Subclass IOManager and implement handle_output(context, obj) to write the asset's output and load_input(context) to read it back","Build a deterministic storage path from context.asset_key.path, context.partition_key (for partitioned assets), and any relevant context.metadata, for example s3://bucket/asset_name/partition_key/data.parquet","Register the IO manager as a resource in the Definitions object, either under 'io_manager' to replace the default for all assets or under a named key that individual assets reference with io_manager_key='my_io_manager'","Return the loaded object from load_input() as the Python type the downstream asset expects; context.dagster_type can be used to check type compatibility at runtime","Add retry logic inside handle_output for transient storage errors; Dagster will not retry an IO manager call on its own","Test the IO manager with the build_input_context and build_output_context helpers to verify read/write behavior without running a full Dagster pipeline"],"gotchas":["Dagster calls load_input for every upstream asset that a downstream asset receives as a function argument, and it typically calls handle_output even when an asset with no return type annotation returns None; declare non-data dependencies with deps and annotate such assets as returning None to avoid unnecessary reads and writes","context.partition_key raises when there is no partition, for example in non-partitioned test runs; always check context.has_partition_key before reading it","The default IO manager pickles outputs to local disk; if the custom IO manager is registered under a named key and an asset omits io_manager_key, large DataFrames are silently pickled instead of going to your custom backend"],"contributor":"waymark-seed","created":"2026-06-13T09:24:42.426Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/f0b78dbc-4c17-4df9-84a6-f99a4cfa9108"}