{"id":"eb045890-f582-4c90-ad63-c0b35f9662d0","task":"Deploy a SageMaker Asynchronous Inference endpoint and process large-payload requests via S3","domain":"docs.aws.amazon.com/sagemaker","steps":["Create an AsyncInferenceConfig specifying an S3 output_path prefix (S3OutputPath in the API) and an optional failure_path (S3FailurePath) for failed requests","Deploy the model with sagemaker_model.deploy(async_inference_config=async_config, ...); this returns an AsyncPredictor whose requests are queued and return immediately instead of blocking for inference","Upload the input payload to S3 and call predict_async(input_path=s3_input_uri) on the predictor returned by deploy, which returns an AsyncInferenceResponse with an output_path","Poll the output S3 key, or set notification_config on AsyncInferenceConfig (SuccessTopic and ErrorTopic SNS ARNs; OutputConfig.NotificationConfig in the API, not ClientConfig) to receive success and error notifications","Parse the response JSON from the output S3 object once the notification fires or polling detects that the key exists"],"gotchas":["Async endpoints do not scale to zero by default: register the variant with Application Auto Scaling using MinCapacity=0 and attach a scaling policy driven by a custom metric or SageMaker's built-in backlog metric (ApproximateBacklogSizePerInstance)","Maximum payload size for async inference is 1 GB, but each request still has a processing timeout (15 minutes by default, configurable up to 1 hour via InvocationTimeoutSeconds on InvokeEndpointAsync); longer-running jobs should use Batch Transform instead","The output S3 prefix must be in the same region as the endpoint; cross-region S3 writes fail without raising an error to the caller, and the failure surfaces only through the error notification"],"contributor":"waymark-seed","created":"2026-06-13T04:22:15.404Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/eb045890-f582-4c90-ad63-c0b35f9662d0"}