Steps

Create an AsyncInferenceConfig specifying an output_path S3 prefix for results and, optionally, a failure_path prefix for failed requests (S3OutputPath and S3FailurePath in the underlying API)
Deploy the model with sagemaker_model.deploy(async_inference_config=async_config, ...); deploy() returns a predictor once the endpoint is in service, and invocations against it return immediately instead of blocking for inference
Upload the input payload to S3 and call predict_async(input_path=s3_input_uri) on the predictor returned by deploy(); the call returns an AsyncInferenceResponse with an output_path
Poll the output S3 key, or set SNS topics (SuccessTopic, ErrorTopic) in the notification_config of AsyncInferenceConfig to receive success and error notifications
Parse the response JSON from the output S3 object once the notification fires or polling detects the key exists
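
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a verified implementation: it assumes the SageMaker Python SDK (v2) for deployment and a boto3 S3 client for polling. The helper names (split_s3_uri, deploy_async, invoke_async, wait_for_output), the instance type, and the polling interval are illustrative and not part of the original route. The failure_path and notification_config keyword arguments are assumed to be available in the installed SDK release.

```python
import json
import time
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def deploy_async(model, output_path, failure_path=None, sns_topics=None,
                 instance_type="ml.m5.xlarge"):
    """Deploy `model` (a sagemaker Model) behind an async endpoint.

    `sns_topics` is an optional dict such as
    {"SuccessTopic": "<arn>", "ErrorTopic": "<arn>"}.
    """
    # Imported lazily so the pure helpers in this sketch work without the SDK.
    from sagemaker.async_inference import AsyncInferenceConfig

    async_config = AsyncInferenceConfig(
        output_path=output_path,          # S3 prefix for successful results
        failure_path=failure_path,        # optional S3 prefix for failed requests
        notification_config=sns_topics,   # optional SNS topics
    )
    # By default deploy() waits until the endpoint is in service.
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,      # illustrative choice
        async_inference_config=async_config,
    )


def invoke_async(predictor, s3_input_uri):
    """Queue one request; returns the S3 URI where the result will appear."""
    response = predictor.predict_async(input_path=s3_input_uri)
    return response.output_path


def wait_for_output(s3_client, output_uri, delay=15, max_attempts=60,
                    sleep=time.sleep):
    """Poll until the output object exists, then return its parsed JSON."""
    bucket, key = split_s3_uri(output_uri)
    for _ in range(max_attempts):
        try:
            obj = s3_client.get_object(Bucket=bucket, Key=key)
        except s3_client.exceptions.NoSuchKey:
            # Not written yet. Without s3:ListBucket permission S3 reports a
            # missing key as AccessDenied, which this sketch does not handle.
            sleep(delay)
            continue
        return json.loads(obj["Body"].read())
    raise TimeoutError(f"no output at {output_uri} after {max_attempts} attempts")
```

With real AWS credentials, usage would look like predictor = deploy_async(sagemaker_model, "s3://my-bucket/async-out/"), then result = wait_for_output(boto3.client("s3"), invoke_async(predictor, s3_input_uri)); the bucket name here is a placeholder.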

Known gotchas

Async endpoints do not scale to zero by default; register the endpoint variant with Application Auto Scaling using MinCapacity=0 and attach a scaling policy on SageMaker's backlog metric (ApproximateBacklogSizePerInstance, supplied as a customized metric specification)
Maximum payload size for async inference is 1 GB, and each request is still subject to a processing-time limit (up to 1 hour); jobs that run longer should use Batch Transform instead
The output S3 prefix must be in the same region as the endpoint; a cross-region S3 write fails without raising an error to the caller, and the failure surfaces only through the error-path notification
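
The scale-to-zero gotcha above could be handled with Application Auto Scaling roughly as follows. This is a sketch under assumptions, not a definitive configuration: the helper names, the "AllTraffic" variant name, the capacity ceiling, the target backlog, and the cooldown values are all illustrative. The client is expected to be a boto3 "application-autoscaling" client and is passed in so the request builders stay free of AWS dependencies.

```python
def scalable_target_request(endpoint_name, variant_name="AllTraffic",
                            max_capacity=5):
    """Build kwargs for register_scalable_target with a floor of zero."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,              # allows the endpoint to scale to zero
        "MaxCapacity": max_capacity,   # illustrative ceiling
    }


def backlog_policy_request(endpoint_name, variant_name="AllTraffic",
                           target_backlog=5.0):
    """Build kwargs for put_scaling_policy tracking the async backlog metric."""
    return {
        "PolicyName": f"{endpoint_name}-backlog-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,  # queued requests per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                ],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,   # seconds, illustrative
            "ScaleOutCooldown": 300,  # seconds, illustrative
        },
    }


def apply_scale_to_zero(autoscaling_client, endpoint_name, **options):
    """Register the variant and attach the backlog policy.

    `options` may carry variant_name; other settings keep their defaults.
    """
    variant = options.get("variant_name", "AllTraffic")
    autoscaling_client.register_scalable_target(
        **scalable_target_request(endpoint_name, variant)
    )
    autoscaling_client.put_scaling_policy(
        **backlog_policy_request(endpoint_name, variant)
    )
```

With credentials configured, this would be called as apply_scale_to_zero(boto3.client("application-autoscaling"), "my-async-endpoint"), where the endpoint name is a placeholder.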

Deploy a SageMaker Asynchronous Inference endpoint and process large-payload requests via S3
