Enable the Vertex AI API in your Google Cloud project and make sure your account holds the Vertex AI User role (roles/aiplatform.user).
Package your training code as a Python module or build a custom container image and push it to Google Artifact Registry.
Use the Vertex AI SDK: instantiate aiplatform.CustomTrainingJob (for a script), passing both script_path and the container_uri of the training image that will run it, or aiplatform.CustomContainerTrainingJob (for a custom container), passing the image URI as container_uri.
Call job.run() with machine_type, accelerator_type and accelerator_count if needed, replica_count, and base_output_dir set to the GCS output directory.
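Monitor the job in the Vertex AI console under Training, or poll job.state until it reaches a terminal state. For CustomTrainingJob and CustomContainerTrainingJob the success value is PipelineState.PIPELINE_STATE_SUCCEEDED; JobState.JOB_STATE_SUCCEEDED applies to the lower-level aiplatform.CustomJob.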
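Retrieve output artifacts from the GCS output directory. Vertex AI does not copy files for you: your training script must write them, typically to the path in the AIP_MODEL_DIR environment variable, which points at the model/ subdirectory of the output directory.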
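The SDK steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: build_run_kwargs and submit_container_job are hypothetical helper names, the display name is a placeholder, and the project, region, bucket and image values are supplied by the caller. It assumes the google-cloud-aiplatform package is installed; the import is deferred so the argument builder can be checked without it.

```python
from typing import Any, Dict, Optional


def build_run_kwargs(
    base_output_dir: str,
    machine_type: str = "n1-standard-8",
    replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> Dict[str, Any]:
    """Assemble keyword arguments for job.run() (hypothetical helper)."""
    if not base_output_dir.startswith("gs://"):
        raise ValueError("base_output_dir must be a gs:// URI")
    kwargs: Dict[str, Any] = {
        "machine_type": machine_type,
        "replica_count": replica_count,
        "base_output_dir": base_output_dir,
    }
    # Only pass accelerator settings when an accelerator is requested.
    if accelerator_type is not None:
        if accelerator_count < 1:
            raise ValueError("accelerator_count must be >= 1 with an accelerator")
        kwargs["accelerator_type"] = accelerator_type
        kwargs["accelerator_count"] = accelerator_count
    return kwargs


def submit_container_job(
    project: str,
    location: str,
    staging_bucket: str,
    image_uri: str,
    run_kwargs: Dict[str, Any],
):
    """Create and run a custom container training job (hypothetical helper)."""
    # Deferred import: requires the google-cloud-aiplatform package.
    from google.cloud import aiplatform

    aiplatform.init(
        project=project, location=location, staging_bucket=staging_bucket
    )
    job = aiplatform.CustomContainerTrainingJob(
        display_name="example-training-job",  # placeholder name
        container_uri=image_uri,
    )
    # run() blocks until the job finishes unless sync=False is passed.
    job.run(**run_kwargs)
    return job
```

A call might look like submit_container_job("my-project", "us-central1", "gs://my-bucket", image_uri, build_run_kwargs("gs://my-bucket/output")), where every value is a placeholder. Building the arguments separately keeps the accelerator settings out of the call entirely for CPU-only jobs.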
Known gotchas
The service account running the job needs Storage Object Admin (roles/storage.objectAdmin) on the GCS bucket used for output; without it, training proceeds and the failure only surfaces at artifact write time, with little indication of the cause.
Machine type names and accelerator type names must match the exact strings documented by Vertex AI (e.g., 'n1-standard-8', 'NVIDIA_TESLA_T4'); invalid strings cause immediate job rejection.
If using a custom container, the entrypoint must be set correctly; a wrong CMD or ENTRYPOINT causes the worker to exit immediately with a non-zero code.
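Some of these mistakes can be caught locally before submitting. The sketch below is a heuristic pre-flight check, not an official validator: preflight_problems is a hypothetical helper, KNOWN_ACCELERATORS is an illustrative subset rather than the full list, and the machine type pattern only checks the general shape of the name. The Vertex AI documentation remains the authority on valid strings.

```python
import re
from typing import List, Optional

# Illustrative subset only; consult the Vertex AI docs for the full list.
KNOWN_ACCELERATORS = {
    "NVIDIA_TESLA_T4",
    "NVIDIA_TESLA_V100",
    "NVIDIA_TESLA_P100",
    "NVIDIA_TESLA_A100",
    "NVIDIA_L4",
}

# Loose shape check: lowercase family, class word, size, e.g. "n1-standard-8".
MACHINE_TYPE_SHAPE = re.compile(r"^[a-z][a-z0-9]*-[a-z]+-[0-9]+[a-z]?$")


def preflight_problems(
    machine_type: str,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if not MACHINE_TYPE_SHAPE.match(machine_type):
        problems.append(f"machine type {machine_type!r} has an unexpected shape")
    if accelerator_type is not None:
        if accelerator_type not in KNOWN_ACCELERATORS:
            problems.append(f"accelerator {accelerator_type!r} is not in the known subset")
        if accelerator_count < 1:
            problems.append("accelerator_count must be at least 1")
    return problems
```

An empty result does not guarantee the job will be accepted, since a well-formed name can still be unavailable in the chosen region; the check only catches casing and spelling slips early.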