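Prepare and package a training script for a SageMaker framework container (such as PyTorch or TensorFlow), or use the image URI of a built-in SageMaker algorithm instead; sagemaker.image_uris.retrieve looks up that URI for a given algorithm, region, and version.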
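A minimal training-script skeleton, assuming script mode in a framework container, where SageMaker passes hyperparameters as command-line arguments and exposes container paths through SM_* environment variables. The hyperparameters (--epochs, --learning-rate) and the metadata.json file are hypothetical placeholders; the defaults fall back to the standard /opt/ml paths so the script also runs outside SageMaker.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Parse hyperparameters and the data/model paths SageMaker provides."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters; replace with your model's own.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.01)
    # SM_MODEL_DIR / SM_CHANNEL_TRAIN are set inside SageMaker framework containers.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # ... load data from args.train and fit the model here ...
    os.makedirs(args.model_dir, exist_ok=True)
    # Files written to the model directory are uploaded as model.tar.gz when the job ends.
    with open(os.path.join(args.model_dir, "metadata.json"), "w") as f:
        json.dump({"epochs": args.epochs, "learning_rate": args.learning_rate}, f)


if __name__ == "__main__":
    main()
```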
Create an IAM execution role whose trust policy allows sagemaker.amazonaws.com to assume it, attach the AmazonSageMakerFullAccess managed policy (or a scoped equivalent), and note the role ARN.
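One way to create the role with boto3, sketched under the assumption that the caller has IAM permissions to create roles. The trust policy is what lets SageMaker assume the role; the role name passed in is up to you. boto3 is imported inside the function so the policy definition can be inspected without it.

```python
import json

# Trust policy allowing the SageMaker service to assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"


def create_sagemaker_role(role_name: str) -> str:
    """Create the execution role, attach the managed policy, and return the role ARN."""
    import boto3  # lazy import: only needed when actually calling AWS

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=MANAGED_POLICY_ARN)
    return response["Role"]["Arn"]
```

For example, create_sagemaker_role("SageMakerTrainingRole") (a hypothetical name) returns the ARN to pass to the Estimator.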
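Instantiate an estimator from the SageMaker Python SDK (sagemaker.estimator.Estimator for an image URI, or a framework estimator such as sagemaker.pytorch.PyTorch for a training script), specifying the instance type, instance count, role ARN, and output S3 path.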
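A sketch that validates the settings before building a generic Estimator, assuming the image-URI route rather than a framework estimator. TrainingConfig and its default instance type (ml.m5.xlarge) are illustrative choices, not SDK requirements; only build_estimator needs the sagemaker package installed.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for the settings the Estimator needs."""
    image_uri: str
    role_arn: str
    output_path: str  # e.g. "s3://my-bucket/output" (hypothetical bucket)
    instance_type: str = "ml.m5.xlarge"
    instance_count: int = 1

    def estimator_kwargs(self) -> dict:
        """Check the values and map them onto Estimator keyword arguments."""
        if self.instance_count < 1:
            raise ValueError("instance_count must be at least 1")
        if not self.instance_type.startswith("ml."):
            raise ValueError("SageMaker instance types start with 'ml.'")
        if not self.output_path.startswith("s3://"):
            raise ValueError("output_path must be an s3:// URI")
        return {
            "image_uri": self.image_uri,
            "role": self.role_arn,
            "instance_count": self.instance_count,
            "instance_type": self.instance_type,
            "output_path": self.output_path,
        }


def build_estimator(cfg: TrainingConfig):
    """Construct a sagemaker.estimator.Estimator from the validated settings."""
    from sagemaker.estimator import Estimator  # lazy import: requires the sagemaker package

    return Estimator(**cfg.estimator_kwargs())
```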
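Define data channels with sagemaker.inputs.TrainingInput, pointing to the S3 URIs of the training (and optionally validation) data, and collect them in a dict keyed by channel name; inside the container, each channel is available at /opt/ml/input/data/<channel_name>.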
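A minimal sketch of building the channel dict, assuming the channel names "train" and "validation" and CSV data (content_type="text/csv" is an assumption; use whatever format your script reads). The pure helper checks the URIs; the second function wraps them in TrainingInput objects and needs the sagemaker package.

```python
from typing import Dict, Optional


def channel_uris(train_uri: str, validation_uri: Optional[str] = None) -> Dict[str, str]:
    """Map channel names to S3 URIs, rejecting anything that is not an s3:// URI."""
    channels = {"train": train_uri}
    if validation_uri is not None:
        channels["validation"] = validation_uri
    for name, uri in channels.items():
        if not uri.startswith("s3://"):
            raise ValueError(f"channel {name!r} must point to an s3:// URI, got {uri!r}")
    return channels


def build_training_inputs(uris: Dict[str, str], content_type: str = "text/csv") -> dict:
    """Wrap each URI in a TrainingInput; the result is what estimator.fit() receives."""
    from sagemaker.inputs import TrainingInput  # lazy import: requires the sagemaker package

    return {name: TrainingInput(s3_data=uri, content_type=content_type) for name, uri in uris.items()}
```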
Call estimator.fit(inputs) to submit the training job; with the default wait=True, the call blocks and streams logs until the job reaches a terminal state (Completed, Failed, or Stopped).
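If you submit with wait=False instead, you can poll the job yourself. The helper below is a hand-rolled equivalent of that blocking wait, not the SDK's internal implementation: it calls a describe function (for example, a wrapper around the boto3 SageMaker client's describe_training_job) until TrainingJobStatus is terminal. The job name in the comments is hypothetical; boto3 also provides a training_job_completed_or_stopped waiter for the same purpose.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(
    describe: Callable[[], Dict],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll `describe` until the job status is terminal, then return the last response."""
    while True:
        response = describe()
        if response["TrainingJobStatus"] in TERMINAL_STATES:
            return response
        sleep(poll_seconds)


# Usage against AWS (requires boto3 and credentials):
#   sm = boto3.client("sagemaker")
#   estimator.fit(inputs, wait=False, job_name="my-training-job")
#   final = wait_for_training_job(lambda: sm.describe_training_job(TrainingJobName="my-training-job"))
```

Injecting the describe and sleep functions keeps the loop testable without AWS access.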
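Monitor progress in the SageMaker console under Training jobs, or stream the logs via the SDK (they are also stored in CloudWatch Logs under the /aws/sagemaker/TrainingJobs log group). On completion, the model artifact (model.tar.gz) is written under the output S3 path, and estimator.model_data holds its S3 URI.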
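A sketch for downloading the finished artifact, assuming you have its S3 URI from estimator.model_data or from the ModelArtifacts.S3ModelArtifacts field of describe_training_job. The local filename is an arbitrary default; the download step needs boto3 and read access to the bucket.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/key" into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_model_artifact(model_data_uri: str, local_path: str = "model.tar.gz") -> str:
    """Download the model artifact to local_path and return that path."""
    import boto3  # lazy import: only needed for the actual download

    bucket, key = split_s3_uri(model_data_uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```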
Known gotchas
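The IAM execution role needs s3:GetObject and s3:PutObject on the objects in the specific buckets used, plus s3:ListBucket on those buckets for prefix-based inputs. An overly restrictive policy makes the job fail with an access-denied error that may surface only in the job's failure reason or its CloudWatch logs, so it is easy to miss.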
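A scoped alternative to the broad managed policy, sketched as an inline role policy. Object actions (GetObject, PutObject) apply to bucket/* ARNs, while s3:ListBucket applies to the bucket ARN itself; the bucket list and the policy name SageMakerS3Access are hypothetical, and your job may need further permissions (for example, for ECR or CloudWatch Logs) that this sketch does not grant.

```python
import json
from typing import Dict, List


def s3_access_policy(buckets: List[str]) -> Dict:
    """Build an IAM policy granting object read/write and listing on the given buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],  # objects
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{b}" for b in buckets],  # the buckets themselves
            },
        ],
    }


def attach_s3_policy(role_name: str, buckets: List[str], policy_name: str = "SageMakerS3Access") -> None:
    """Attach the policy to the execution role as an inline policy (requires boto3)."""
    import boto3  # lazy import: only needed when calling AWS

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(s3_access_policy(buckets)),
    )
```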
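Training jobs and hosting endpoints support different sets of instance types. GPU families such as ml.p3.* and ml.g4dn.* are available for both, but some types are specific to one (for example, Inferentia-based ml.inf1 instances are for inference only), so check the supported list for the job or endpoint you are creating; a request with an unsupported type is rejected with a validation error.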
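Managed spot training requires use_spot_instances=True and a max_wait value at least as large as max_run. Also set checkpoint_s3_uri and have the training script save checkpoints to the local checkpoint directory (/opt/ml/checkpoints by default) and resume from the latest one, so an interrupted job continues instead of starting over.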
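A sketch of both halves of spot training: the estimator arguments (use_spot_instances, max_run, max_wait, checkpoint_s3_uri are SageMaker Python SDK Estimator parameters; the time limits shown are arbitrary examples) and a resume helper for the training script. The checkpoint-<epoch>.pt naming scheme is hypothetical; the helper parses the epoch number so that checkpoint-10 sorts after checkpoint-2.

```python
import os
import re
from typing import Optional

# Hypothetical checkpoint naming scheme: checkpoint-<epoch>.pt
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)\.pt$")


def spot_training_kwargs(checkpoint_s3_uri: str, max_run: int = 3600, max_wait: int = 7200) -> dict:
    """Extra Estimator keyword arguments for managed spot training."""
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint_s3_uri must be an s3:// URI")
    if max_wait < max_run:
        raise ValueError("max_wait must be at least max_run for managed spot training")
    return {
        "use_spot_instances": True,
        "max_run": max_run,    # seconds of actual training allowed
        "max_wait": max_wait,  # seconds in total, including waiting for spot capacity
        "checkpoint_s3_uri": checkpoint_s3_uri,
    }


def latest_checkpoint(checkpoint_dir: str = "/opt/ml/checkpoints") -> Optional[str]:
    """Return the path of the highest-epoch checkpoint, or None if there is none."""
    if not os.path.isdir(checkpoint_dir):
        return None
    found = []
    for name in os.listdir(checkpoint_dir):
        match = _CHECKPOINT_RE.match(name)
        if match:
            found.append((int(match.group(1)), name))
    if not found:
        return None
    return os.path.join(checkpoint_dir, max(found)[1])
```

In the training script, call latest_checkpoint() at startup and, if it returns a path, restore the model state from it before continuing training.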