Steps

After a training job completes, call estimator.deploy() or create a Model object from the S3 model artifact and call model.deploy().
Specify the instance type (e.g., ml.m5.xlarge), initial instance count, and optionally a serializer/deserializer for input and output formats.
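Wait for the endpoint to reach the InService state. With the default wait=True, deploy() blocks until then and returns a Predictor object (a generic Model returns one only when predictor_cls is set; otherwise deploy() returns None).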
Send inference requests using predictor.predict(data), passing your input in the format expected by the model's serving container.
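Monitor endpoint metrics in Amazon CloudWatch: invocation metrics (invocations, latency, errors) are published in the AWS/SageMaker namespace, and instance metrics such as CPU and memory utilization in the /aws/sagemaker/Endpoints namespace.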
Delete the endpoint with predictor.delete_endpoint() or via the console when it is no longer needed to avoid ongoing charges.
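
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: deploy_predict_cleanup and invocation_count are hypothetical helper names, model is assumed to be a SageMaker Python SDK Estimator (or a Model with predictor_cls set) so that deploy() returns a Predictor, and cloudwatch is assumed to be a boto3 CloudWatch client. The variant name "AllTraffic" is the SDK's default and is assumed here.

```python
from datetime import datetime, timedelta, timezone


def deploy_predict_cleanup(model, payload, instance_type="ml.m5.xlarge",
                           instance_count=1, serializer=None, deserializer=None):
    """Deploy `model`, send one request, and always delete the endpoint."""
    # Assumption: `model` is an Estimator, or a Model with predictor_cls set,
    # so deploy() returns a Predictor. With the default wait=True the call
    # blocks until the endpoint is InService.
    predictor = model.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
        serializer=serializer,
        deserializer=deserializer,
    )
    try:
        # The payload must match what the serving container expects.
        return predictor.predict(payload)
    finally:
        # Endpoints bill while they exist, so clean up even on failure.
        predictor.delete_endpoint()


def invocation_count(cloudwatch, endpoint_name, variant_name="AllTraffic",
                     minutes=60):
    """Sum the Invocations metric for one endpoint variant over a window."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",  # namespace for invocation metrics
        MetricName="Invocations",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])
```

In real use, the serializer and deserializer would typically come from sagemaker.serializers and sagemaker.deserializers, and the CloudWatch client from boto3.client("cloudwatch"). Deleting the endpoint in a finally block suits one-off tests; a long-lived endpoint would skip that step.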

Known gotchas

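Endpoint creation can take several minutes. The SageMaker Python SDK's deploy() blocks until the endpoint is InService by default (wait=True); if you pass wait=False or create the endpoint with the boto3 create_endpoint call, poll or use a waiter before sending requests.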
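The container's serving script must implement the handlers the serving container expects (for example, model_fn and predict_fn in the SageMaker framework containers); a missing or mismatched handler, or a request content type the handler does not accept, causes 415 or 500 errors on invocation.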
Auto-scaling policies must be configured separately via Application Auto Scaling; the deploy call alone does not enable scaling.
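
The first and last gotchas can be handled with boto3 along these lines. This is a hedged sketch: wait_until_in_service and enable_target_tracking are hypothetical helper names, sm_client and aas_client are assumed to be boto3 clients for "sagemaker" and "application-autoscaling", and the capacity limits and target value are illustrative placeholders, not recommendations.

```python
def wait_until_in_service(sm_client, endpoint_name, delay=30, max_attempts=60):
    """Block until the endpoint is InService, using the boto3 waiter."""
    waiter = sm_client.get_waiter("endpoint_in_service")
    waiter.wait(
        EndpointName=endpoint_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )


def enable_target_tracking(aas_client, endpoint_name, variant_name="AllTraffic",
                           min_capacity=1, max_capacity=4,
                           invocations_per_instance=100.0):
    """Register a variant with Application Auto Scaling and attach a policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Step 1: declare the variant's instance count as a scalable target.
    aas_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

    # Step 2: attach a target-tracking policy on invocations per instance.
    # The policy name below is an illustrative choice.
    aas_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return resource_id
```

The waiter is only needed when the endpoint was created without blocking, for example with wait=False or a direct create_endpoint call. The scaling target should be registered after the endpoint is InService.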

SageMaker: deploy a real-time inference endpoint
