Upload all model artifacts to a shared S3 prefix (e.g. s3://bucket/models/) — each model is a separate .tar.gz file under that prefix
Create a SageMaker Model with a multi-model-capable container image (e.g. SageMaker built-in algorithm containers or BYO containers that implement the multi-model server spec)
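Set Mode='MultiModel' in the container definition (PrimaryContainer) passed to create_model, with ModelDataUrl pointing at the S3 prefix rather than a single archive; the ProductionVariants in the endpoint configuration only reference the model by name and take no Mode field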
Invoke a specific model: runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel='model_a.tar.gz', Body=payload, ContentType='text/csv')
SageMaker dynamically loads the requested model into container memory on first invocation and caches it; subsequent calls to the same model skip loading
Handle ModelNotReadyException by retrying: it is returned while a large target model is still downloading or loading. Because the container has up to 60 seconds to respond, set the SDK socket (read) timeout to 70 seconds and configure SDK retries
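The steps above can be sketched in boto3 roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (multi_model_container, model_s3_uri, create_multi_model_endpoint, invoke_model), the variant name "AllTraffic", and the default instance type are illustrative assumptions. The clients are passed in, so the caller is assumed to have created them with boto3.client("sagemaker") and boto3.client("sagemaker-runtime").

```python
def _with_trailing_slash(s3_prefix: str) -> str:
    """Normalize the prefix so that it always ends with '/'."""
    return s3_prefix if s3_prefix.endswith("/") else s3_prefix + "/"


def multi_model_container(image_uri: str, s3_prefix: str) -> dict:
    """Container definition for create_model: Mode lives here, and
    ModelDataUrl is the shared S3 prefix, not a single archive."""
    return {
        "Image": image_uri,
        "Mode": "MultiModel",
        "ModelDataUrl": _with_trailing_slash(s3_prefix),
    }


def model_s3_uri(s3_prefix: str, target_model: str) -> str:
    """Object the endpoint will look for: the prefix plus TargetModel."""
    return _with_trailing_slash(s3_prefix) + target_model


def create_multi_model_endpoint(sm_client, name, image_uri, s3_prefix,
                                role_arn, instance_type="ml.m5.xlarge"):
    """Create the model, endpoint config, and endpoint (hypothetical helper).

    The same name is reused for all three resources for brevity.
    """
    sm_client.create_model(
        ModelName=name,
        ExecutionRoleArn=role_arn,
        PrimaryContainer=multi_model_container(image_uri, s3_prefix),
    )
    sm_client.create_endpoint_config(
        EndpointConfigName=name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # assumed variant name
            "ModelName": name,
            "InstanceType": instance_type,  # assumed instance type
            "InitialInstanceCount": 1,
        }],
    )
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=name)


def invoke_model(runtime_client, endpoint_name, target_model, payload,
                 content_type="text/csv"):
    """Invoke one specific model behind the endpoint and return raw bytes."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,  # e.g. 'model_a.tar.gz'
        Body=payload,
        ContentType=content_type,
    )
    return response["Body"].read()
```

Taking the clients as arguments keeps the helpers easy to exercise with a stub before pointing them at a real account.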
Known gotchas
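ModelNotReadyException is expected for large models on first invocation. Retry with backoff for up to 360 seconds rather than failing immediately; boto3's retry configuration counts attempts rather than seconds, so a time budget usually needs a generous max_attempts or an application-level retry loop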
Models are evicted from container memory under memory pressure using LRU — high model-count endpoints may have frequent cold loads; size instances to fit the hot model set
TargetModel is concatenated with the ModelDataUrl S3 prefix — ensure the filename in TargetModel exactly matches the S3 object key suffix including file extension
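One way to honor the 360-second retry budget from the first gotcha is an application-level loop with exponential backoff. The sketch below is an assumption about how this could be structured, not the SDK's built-in behavior: invoke_with_retry, error_code, and make_runtime_client are hypothetical helpers, and the backoff delays are arbitrary defaults. It checks the error code that botocore's ClientError exposes under response["Error"]["Code"].

```python
import time

RETRY_BUDGET_SECONDS = 360  # time budget quoted in the gotcha above


def error_code(exc: Exception) -> str:
    """Return the AWS error code from a ClientError-like exception, or ''."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def invoke_with_retry(invoke, budget=RETRY_BUDGET_SECONDS, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep, clock=time.monotonic):
    """Call invoke() until it succeeds or the time budget is used up.

    Only ModelNotReadyException is retried; any other error is re-raised
    immediately. sleep and clock are injectable so the loop can be tested.
    """
    deadline = clock() + budget
    delay = base_delay
    while True:
        try:
            return invoke()
        except Exception as exc:
            if error_code(exc) != "ModelNotReadyException":
                raise
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # budget exhausted: surface the last error
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


def make_runtime_client(region_name=None):
    """Runtime client with a 70-second read (socket) timeout."""
    import boto3  # imported lazily so the retry helper has no hard dependency
    from botocore.config import Config

    return boto3.client(
        "sagemaker-runtime",
        region_name=region_name,
        config=Config(read_timeout=70),
    )
```

A call site might wrap the invocation in a lambda, for example invoke_with_retry(lambda: runtime_client.invoke_endpoint(EndpointName=endpoint_name, TargetModel='model_a.tar.gz', Body=payload, ContentType='text/csv')).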