Steps

Package your model artifacts (typically as a model.tar.gz archive) and upload them to an S3 bucket, then create a SageMaker Model object that references the artifact's S3 path and the inference container image URI
Create an EndpointConfig that includes a ProductionVariant with a ServerlessConfig block, specifying MemorySizeInMB (must be one of the supported values: 1024, 2048, 3072, 4096, 5120, or 6144) and MaxConcurrency
Create a SageMaker Endpoint from the EndpointConfig; no instance type selection is required, and the endpoint starts in the Creating state and accepts invocations once it reaches InService
Invoke the endpoint using the SageMaker Runtime InvokeEndpoint API with a payload up to 4 MB and a processing timeout of up to 60 seconds
Monitor invocation metrics in CloudWatch, including invocation count, model latency, and billed duration, to understand cost and cold start behavior
Set ProvisionedConcurrency in the ServerlessConfig if cold start latency is unacceptable; provisioned concurrency keeps warm instances ready at additional cost
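
The steps above can be sketched with boto3 roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the `AllTraffic` variant name, and the `-config` suffix are illustrative choices, and the byte value used for the 4 MB payload limit (4 * 1024 * 1024) is an assumption. The check that ProvisionedConcurrency does not exceed MaxConcurrency is also an assumption to confirm against the current API reference.

```python
from typing import Optional

# Memory sizes listed in the steps above.
SUPPORTED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)
# Assumption: the 4 MB payload limit is interpreted as 4 MiB.
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024


def build_serverless_config(memory_mb: int, max_concurrency: int,
                            provisioned_concurrency: Optional[int] = None) -> dict:
    """Build a ServerlessConfig dict, rejecting unsupported values early."""
    if memory_mb not in SUPPORTED_MEMORY_MB:
        raise ValueError(f"MemorySizeInMB must be one of {SUPPORTED_MEMORY_MB}")
    if max_concurrency < 1:
        raise ValueError("MaxConcurrency must be at least 1")
    config = {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency}
    if provisioned_concurrency is not None:
        # Assumption: provisioned concurrency may not exceed MaxConcurrency.
        if not 1 <= provisioned_concurrency <= max_concurrency:
            raise ValueError("ProvisionedConcurrency must be in 1..MaxConcurrency")
        config["ProvisionedConcurrency"] = provisioned_concurrency
    return config


def deploy_serverless_endpoint(name: str, image_uri: str, model_data_url: str,
                               role_arn: str, serverless_config: dict) -> None:
    """Create the Model, EndpointConfig, and Endpoint, then wait for InService."""
    import boto3  # imported lazily so the config builder works without the SDK

    sm = boto3.client("sagemaker")
    sm.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    config_name = f"{name}-config"  # illustrative naming scheme
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # illustrative variant name
            "ModelName": name,
            "ServerlessConfig": serverless_config,
        }],
    )
    sm.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    # Block until the endpoint leaves the Creating state.
    sm.get_waiter("endpoint_in_service").wait(EndpointName=name)


def invoke(endpoint_name: str, payload: bytes,
           content_type: str = "application/json") -> bytes:
    """Invoke the endpoint, refusing payloads over the documented limit."""
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds the 4 MB serverless limit")
    import boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=payload
    )
    return response["Body"].read()
```

Validating the configuration locally before calling the API surfaces an unsupported memory size or an oversized payload as a plain `ValueError` instead of a failed AWS request. Passing `provisioned_concurrency` corresponds to the last step and can be omitted when cold starts are tolerable.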

Known gotchas

Serverless Inference does not support GPU instances, Multi-Model Endpoints, VPC configuration, Model Monitor, or inference pipelines; workloads requiring any of these must use real-time inference endpoints instead
Cold starts occur when no warm instance is available; cold start duration depends on model size and container initialization time and can range from seconds to over a minute for large models
MaxConcurrency caps the number of simultaneous requests; requests beyond this cap are rejected with a throttling error rather than queued, requiring the caller to implement retry logic
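
Because throttled requests are rejected rather than queued, the caller needs its own retry loop. The following is one possible sketch using exponential backoff with jitter; the helper names are hypothetical, and the `ThrottlingException` error code is an assumption that should be checked against the errors your endpoint actually returns.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: throttled invocations surface with this error code.
THROTTLING_CODES = {"ThrottlingException"}


def is_throttling_error(exc: Exception) -> bool:
    """True if exc carries a botocore-style error response with a throttling code."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code", "") in THROTTLING_CODES


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying throttling errors with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_throttling_error(exc):
                raise  # non-throttling errors and exhausted retries propagate
            # Wait a random time up to base_delay * 2^attempt before retrying.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

With a boto3 `sagemaker-runtime` client named `runtime`, a call might look like `call_with_backoff(lambda: runtime.invoke_endpoint(EndpointName=name, ContentType="application/json", Body=payload))`. The `sleep` parameter exists only so the delay can be replaced in tests.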

docs.aws.amazon.com/sagemaker · 6 steps · unrated



Deploy a machine learning model on SageMaker Serverless Inference for intermittent traffic workloads
