Upload a model artifact to GCS and call aiplatform.Model.upload() specifying serving_container_image_uri and artifact_uri
Create a new Endpoint with aiplatform.Endpoint.create(display_name=...), or look up an existing one with aiplatform.Endpoint.list() or aiplatform.Endpoint(endpoint_name=...)
Deploy the model using endpoint.deploy(model=model, traffic_percentage=100, machine_type='n1-standard-4', min_replica_count=1)
To add a second model version for A/B testing, deploy it with traffic_percentage=20 and set the existing deployment to 80 by passing the full split, keyed by deployed model ID, to endpoint.update(traffic_split=...)
Monitor prediction latency and error rates via Cloud Monitoring metrics under the aiplatform.googleapis.com namespace
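The upload, endpoint, and deploy steps above can be sketched with the Vertex AI Python SDK roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names (upload_model, create_endpoint, deploy), the DEPLOY_DEFAULTS constant, and the example bucket path are assumptions made for illustration, and the SDK is imported lazily so the module loads without it installed.

```python
from typing import Any

# Machine settings taken from the deploy step above.
DEPLOY_DEFAULTS = {"machine_type": "n1-standard-4", "min_replica_count": 1}


def upload_model(display_name: str, artifact_uri: str, image_uri: str) -> Any:
    """Register artifacts already copied to GCS as a Vertex AI Model."""
    # Lazy import: requires the google-cloud-aiplatform package at call time.
    from google.cloud import aiplatform

    return aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,  # e.g. "gs://my-bucket/model/" (hypothetical path)
        serving_container_image_uri=image_uri,
    )


def create_endpoint(display_name: str) -> Any:
    """Create a new Endpoint (this does not reuse an existing one)."""
    from google.cloud import aiplatform

    return aiplatform.Endpoint.create(display_name=display_name)


def deploy(endpoint: Any, model: Any, traffic_percentage: int = 100) -> None:
    """Deploy a model to the endpoint with the given share of traffic."""
    if not 0 <= traffic_percentage <= 100:
        raise ValueError("traffic_percentage must be between 0 and 100")
    endpoint.deploy(
        model=model,
        traffic_percentage=traffic_percentage,
        **DEPLOY_DEFAULTS,
    )
```

A first version would be deployed with deploy(endpoint, model), and a second version for A/B testing with deploy(endpoint, new_model, traffic_percentage=20). Project and region are assumed to come from aiplatform.init(project=..., location=...) or the environment. After a partial-traffic deploy it is worth inspecting endpoint.traffic_split to confirm how the remaining traffic was allocated.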
Known gotchas
Traffic split percentages across all deployed models on an endpoint must sum to exactly 100 — partial updates that don't satisfy this constraint are rejected
The serving container must listen for HTTP requests on port 8080 by default and serve predictions at a route such as /predict; override the port with serving_container_ports and set the route with serving_container_predict_route in Model.upload()
Model upload does not validate the container image until deploy() is called — a bad image URI will not surface an error until deployment time
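To avoid the traffic split rejection described above, the full split can be built and checked locally before it is sent. The sketch below is illustrative only: validate_traffic_split, canary_split, and apply_split are hypothetical helpers, the IDs are placeholder deployed model IDs, and apply_split assumes an SDK version in which Endpoint.update() accepts a traffic_split argument.

```python
from typing import Any, Dict


def validate_traffic_split(split: Dict[str, int]) -> None:
    """Raise ValueError unless percentages are non-negative and sum to 100."""
    if any(pct < 0 for pct in split.values()):
        raise ValueError("traffic percentages must be non-negative")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic split must sum to 100, got {total}")


def canary_split(existing_id: str, new_id: str, new_pct: int) -> Dict[str, int]:
    """Build a complete two-model split, giving new_pct to the new model."""
    split = {existing_id: 100 - new_pct, new_id: new_pct}
    validate_traffic_split(split)
    return split


def apply_split(endpoint: Any, split: Dict[str, int]) -> None:
    """Validate locally, then send the whole split in one update."""
    validate_traffic_split(split)
    # Assumption: Endpoint.update(traffic_split=...) is available in the SDK in use.
    endpoint.update(traffic_split=split)
```

For example, canary_split("111", "222", 20) yields {"111": 80, "222": 20}, while a partial split such as {"222": 20} fails validation locally instead of at the API.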