After a training job completes, call estimator.deploy() or create a Model object from the S3 model artifact and call model.deploy().
Specify the instance type (e.g., ml.m5.xlarge), initial instance count, and optionally a serializer/deserializer for input and output formats.
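By default (wait=True), deploy() blocks until the endpoint reaches the InService state and then returns a Predictor object. A plain Model returns a Predictor only if its predictor_cls is set; otherwise deploy() returns None.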
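Here is a minimal sketch of the deploy step, assuming SageMaker Python SDK v2 keyword names. The helper `deploy_model` and the endpoint name are hypothetical. `model` stands for a fitted estimator or a `sagemaker.model.Model` built from the S3 artifact, and it is taken as a parameter so the sketch runs without AWS access.

```python
from typing import Any, Optional


def deploy_model(
    model: Any,
    endpoint_name: str,
    instance_type: str = "ml.m5.xlarge",
    initial_instance_count: int = 1,
    serializer: Optional[Any] = None,
    deserializer: Optional[Any] = None,
) -> Any:
    """Deploy an estimator or Model and return its Predictor (hypothetical helper).

    `model` is anything exposing the SDK's deploy() method, e.g. a fitted
    estimator or sagemaker.model.Model(image_uri=..., model_data="s3://...", role=...).
    """
    kwargs = {
        "initial_instance_count": initial_instance_count,
        "instance_type": instance_type,
        "endpoint_name": endpoint_name,
        # wait=True (the SDK default) blocks until the endpoint is InService.
        "wait": True,
    }
    # Pass (de)serializers only when given, so the Predictor's defaults apply otherwise.
    if serializer is not None:
        kwargs["serializer"] = serializer
    if deserializer is not None:
        kwargs["deserializer"] = deserializer

    predictor = model.deploy(**kwargs)
    if predictor is None:
        # A plain Model without predictor_cls returns None instead of a Predictor.
        raise RuntimeError(
            f"deploy() returned no Predictor for {endpoint_name}; "
            "set predictor_cls on the Model or construct a Predictor yourself"
        )
    return predictor
```

In real use you might pass `serializer=JSONSerializer()` and `deserializer=JSONDeserializer()` from `sagemaker.serializers` and `sagemaker.deserializers`, assuming the container speaks JSON.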
Send inference requests using predictor.predict(data), passing your input in the format expected by the model's serving container.
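`predictor.predict(data)` runs `data` through the Predictor's serializer and sends it to the endpoint. The sketch below spells out that contract with the low-level boto3 `sagemaker-runtime` client's `invoke_endpoint` call. It assumes a container that accepts and returns JSON. `invoke_json` and the `{"instances": ...}` payload shape are hypothetical.

```python
import json
from typing import Any


def invoke_json(runtime_client: Any, endpoint_name: str, payload: Any) -> Any:
    """Send a JSON request to an endpoint and decode the JSON response.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        # ContentType tells the serving container how to parse Body; it must
        # match a format the container's input handler accepts.
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # Body is a streaming object; read() returns the raw response bytes.
    return json.loads(response["Body"].read())
```

For example, `invoke_json(boto3.client("sagemaker-runtime"), "my-endpoint", {"instances": [[1.0, 2.0]]})` sends one row, assuming the container expects that shape.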
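Monitor the endpoint in Amazon CloudWatch. Invocation metrics (Invocations, ModelLatency, Invocation4XXErrors, Invocation5XXErrors) are published in the AWS/SageMaker namespace. Instance metrics such as CPUUtilization and MemoryUtilization are in the /aws/sagemaker/Endpoints namespace, and container logs go to the /aws/sagemaker/Endpoints/<endpoint-name> log group.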
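The sketch below pulls an invocation metric with boto3's CloudWatch `get_metric_statistics`. It assumes the SDK's default variant name `AllTraffic`. `endpoint_metric_query` and `fetch_datapoints` are hypothetical helpers, and the client is injected so the query builder can be checked offline.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def endpoint_metric_query(
    endpoint_name: str,
    metric_name: str = "Invocations",
    statistic: str = "Sum",
    variant_name: str = "AllTraffic",
    hours: int = 1,
    period_seconds: int = 300,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        # Invocation metrics are published in AWS/SageMaker, keyed by endpoint and variant.
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        # Sum suits counters such as Invocations; Average suits ModelLatency (microseconds).
        "Statistics": [statistic],
    }


def fetch_datapoints(cloudwatch_client: Any, **query_kwargs: Any) -> List[Dict[str, Any]]:
    """Run the query; cloudwatch_client is expected to be boto3.client("cloudwatch")."""
    response = cloudwatch_client.get_metric_statistics(**endpoint_metric_query(**query_kwargs))
    # CloudWatch does not guarantee datapoint order, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```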
When the endpoint is no longer needed, delete it with predictor.delete_endpoint() or in the console; its instances are billed until the endpoint is deleted.
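A context manager makes cleanup hard to forget, because it deletes the endpoint even if inference raises. This sketch assumes the SDK's `Predictor.delete_model()` and `Predictor.delete_endpoint()` methods, and `temporary_endpoint` is a hypothetical helper. Model deletion runs first, because the Predictor resolves model names via the endpoint and deleting the endpoint first may prevent that lookup. Endpoint deletion sits in an inner `finally` so it still runs if model deletion fails.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def temporary_endpoint(deploy: Callable[[], Any], delete_model: bool = True) -> Iterator[Any]:
    """Deploy an endpoint, yield its Predictor, and always tear it down afterward.

    Usage (assuming a fitted estimator):
        with temporary_endpoint(lambda: estimator.deploy(
                initial_instance_count=1, instance_type="ml.m5.xlarge")) as predictor:
            result = predictor.predict(data)
    """
    predictor = deploy()
    try:
        yield predictor
    finally:
        try:
            if delete_model:
                # The model resource is separate from the endpoint; remove it too.
                predictor.delete_model()
        finally:
            # Always delete the endpoint so its instances stop billing.
            predictor.delete_endpoint()
```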
Known gotchas
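Endpoint creation can take several minutes. deploy() blocks until the endpoint is InService only when wait=True (the SDK default). If you pass wait=False or create the endpoint with boto3's create_endpoint, poll describe_endpoint or use a waiter before sending requests, and do not assume the endpoint is ready when the call returns.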
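boto3's built-in waiter (`client.get_waiter("endpoint_in_service").wait(EndpointName=...)`) covers the common case. The sketch below shows the equivalent explicit loop over `describe_endpoint`, which also surfaces `FailureReason` when creation fails. `wait_for_in_service` and its default delay and attempt counts are hypothetical choices, and `sm_client` is expected to be `boto3.client("sagemaker")`.

```python
import time
from typing import Any, Callable

# Statuses from which the endpoint may still reach InService.
_IN_PROGRESS = {"Creating", "Updating", "SystemUpdating"}


def wait_for_in_service(
    sm_client: Any,
    endpoint_name: str,
    delay_seconds: float = 30.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe_endpoint until InService; raise on failure or timeout."""
    for _ in range(max_attempts):
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status not in _IN_PROGRESS:
            # Any other status (e.g. Failed, OutOfService) is treated as an error.
            reason = desc.get("FailureReason", "no FailureReason reported")
            raise RuntimeError(f"{endpoint_name} is {status}: {reason}")
        sleep(delay_seconds)
    raise TimeoutError(f"{endpoint_name} not InService after {max_attempts} checks")
```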
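The serving container must expose a handler that accepts the request. In SageMaker framework containers this is usually an inference script defining model_fn and, where the defaults do not fit, input_fn, predict_fn, and output_fn. A missing or mismatched handler, or a request Content-Type the handler does not support, causes 415 or 500 errors on invocation.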
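The sketch below shows an inference script for a SageMaker framework container. It assumes the inference-toolkit handler names `model_fn`, `input_fn`, `predict_fn`, and `output_fn`. The artifact layout (a `model.json` with `weights` and `bias`) and the toy linear model are hypothetical stand-ins. Exact handler signatures, and how raised exceptions map to 4xx or 5xx responses, vary by container, so check your framework's serving documentation.

```python
import json
import os
from typing import Any, Dict, List


def model_fn(model_dir: str) -> Dict[str, Any]:
    """Load the model once at container start (hypothetical model.json artifact)."""
    with open(os.path.join(model_dir, "model.json")) as f:
        return json.load(f)


def input_fn(request_body: Any, request_content_type: str) -> List[List[float]]:
    """Deserialize the request, rejecting content types this handler does not support."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["instances"]


def predict_fn(input_data: List[List[float]], model: Dict[str, Any]) -> List[float]:
    """Score each row with a toy linear model: dot(row, weights) + bias."""
    weights, bias = model["weights"], model["bias"]
    return [sum(x * w for x, w in zip(row, weights)) + bias for row in input_data]


def output_fn(prediction: List[float], accept: str) -> str:
    """Serialize predictions in the format requested by the Accept header."""
    if accept not in ("application/json", "*/*"):
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"predictions": prediction})
```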
Auto-scaling policies must be configured separately via Application Auto Scaling; the deploy call alone does not enable scaling.
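The sketch below enables target-tracking scaling with boto3's `application-autoscaling` client, assuming the default variant name `AllTraffic`. `enable_invocation_autoscaling`, the policy name, and the capacity and target values are hypothetical. `SageMakerVariantInvocationsPerInstance` is the predefined metric for average invocations per instance.

```python
from typing import Any, Dict


def enable_invocation_autoscaling(
    aas_client: Any,
    endpoint_name: str,
    variant_name: str = "AllTraffic",
    min_capacity: int = 1,
    max_capacity: int = 4,
    target_invocations_per_instance: float = 100.0,
) -> Dict[str, Any]:
    """Register a variant as a scalable target and attach a target-tracking policy.

    aas_client is expected to be boto3.client("application-autoscaling").
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Step 1: make the variant's instance count a scalable target.
    aas_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )
    # Step 2: add or remove instances to keep invocations per instance near the target.
    return aas_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-invocations-target",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
```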