Initialize the SDK: import google.cloud.aiplatform as aip; aip.init(project='your-project', location='us-central1')
Upload the model to Vertex AI Model Registry: model = aip.Model.upload(display_name='my-model', artifact_uri='gs://your-bucket/model/', serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest')
Create an endpoint: endpoint = aip.Endpoint.create(display_name='my-endpoint')
Deploy the model to the endpoint: model.deploy(endpoint=endpoint, deployed_model_display_name='my-model-v1', machine_type='n1-standard-4', min_replica_count=1, max_replica_count=3)
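Send a prediction request: endpoint.predict(instances=[[1.0, 2.0]]). The prebuilt scikit-learn container expects each instance as a list of feature values in training order; dict-style instances such as {'feature1': 1.0, 'feature2': 2.0} apply only to containers that accept named inputs (for example, TensorFlow models with named input tensors)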
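Undeploy the old model version when replacing it: endpoint.undeploy(deployed_model_id=deployed_model_id), where deployed_model_id is the id of an entry returned by endpoint.list_models(). Undeploying after the new version is deployed and serving avoids downtime; undeploying first leaves the endpoint without a model until the new deployment finishes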
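The steps above can be sketched as one Python function. This is a minimal sketch rather than a definitive implementation: the names to_instances and deploy_and_predict are hypothetical helpers, the project, bucket, and display names are the placeholders used in the steps, and it assumes a scikit-learn model whose container takes each instance as a list of feature values.

```python
from typing import Dict, List, Sequence


def to_instances(rows: Sequence[Dict[str, float]],
                 feature_order: Sequence[str]) -> List[List[float]]:
    """Turn feature dicts into list-of-lists instances (hypothetical helper).

    Assumes the serving container wants positional feature values,
    ordered the same way as the training data.
    """
    return [[float(row[name]) for name in feature_order] for row in rows]


def deploy_and_predict(project: str, location: str, artifact_uri: str,
                       image_uri: str, rows, feature_order):
    """Upload, deploy, and query a model (hypothetical wrapper around the steps)."""
    # Imported lazily so to_instances() works without the SDK installed.
    from google.cloud import aiplatform as aip

    aip.init(project=project, location=location)

    # Register the saved artifacts together with their serving container.
    model = aip.Model.upload(
        display_name="my-model",
        artifact_uri=artifact_uri,
        serving_container_image_uri=image_uri,
    )

    # Create the endpoint, then deploy the model onto it (blocks until done).
    endpoint = aip.Endpoint.create(display_name="my-endpoint")
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name="my-model-v1",
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=3,
    )

    response = endpoint.predict(instances=to_instances(rows, feature_order))
    return endpoint, response.predictions
```

Calling deploy_and_predict requires Google Cloud credentials and creates billable resources; to_instances can be checked locally without either.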
Known gotchas
The serving container image URI must match the framework and version of your saved model artifacts. A mismatch causes the deployment to fail at container startup, usually with an error that does not name the version mismatch as the cause
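Traffic splitting between deployed model versions on the same endpoint is configured via the traffic_percentage parameter (or an explicit traffic_split map) in deploy(). If you omit both, the new version receives 100% of traffic only when the endpoint has no other deployed models; on an endpoint that already serves a model, the SDK defaults the new version to 0%, so pass traffic_percentage=100 for an immediate cutover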
model.deploy() is synchronous by default (sync=True) and can take several minutes; in automated pipelines pass sync=False so the call returns immediately, then call wait() on the returned endpoint before sending traffic to it
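The traffic and sync gotchas can be combined into a small canary rollout. The sketch below is illustrative only: canary_split and deploy_canary are hypothetical helpers, 'my-model-v2' is an assumed display name, and the "0" key follows the SDK's documented convention for referring to the model being deployed inside a traffic_split map.

```python
from typing import Dict

# Key the SDK documents for "the model being deployed" in traffic_split.
NEW_MODEL_KEY = "0"


def canary_split(current: Dict[str, int], new_percentage: int) -> Dict[str, int]:
    """Build a traffic_split map giving new_percentage to the new model.

    Existing shares are scaled down proportionally; the result sums to 100.
    """
    if not 0 <= new_percentage <= 100:
        raise ValueError("new_percentage must be between 0 and 100")

    total = sum(current.values())
    if total == 0:
        # Nothing is serving yet, so the new model has to take all traffic.
        return {NEW_MODEL_KEY: 100}

    remaining = 100 - new_percentage
    split = {mid: share * remaining // total for mid, share in current.items()}

    # Integer division can leave points unassigned; hand them to the
    # existing model with the largest share.
    leftover = remaining - sum(split.values())
    split[max(current, key=current.get)] += leftover

    split[NEW_MODEL_KEY] = new_percentage
    return split


def deploy_canary(model, endpoint, new_percentage: int = 10) -> Dict[str, int]:
    """Deploy a new version without blocking, then wait for it to finish."""
    split = canary_split(dict(endpoint.traffic_split), new_percentage)
    # sync=False returns immediately; deploy() hands back the endpoint.
    endpoint = model.deploy(
        endpoint=endpoint,
        deployed_model_display_name="my-model-v2",  # assumed name
        machine_type="n1-standard-4",
        traffic_split=split,
        sync=False,
    )
    endpoint.wait()  # block here only when the result is actually needed
    return split
```

Passing an explicit traffic_split avoids depending on the default for traffic_percentage, which differs between an empty endpoint and one that already serves a model.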