Ensure KServe is installed on the cluster (standard mode, or serverless mode with Knative) and the inferenceservices.serving.kserve.io CRD is registered
Write an InferenceService manifest specifying apiVersion: serving.kserve.io/v1beta1, kind: InferenceService, and a predictor block with the model framework and storage URI, for example: predictor.sklearn.storageUri pointing to a GCS or S3 path
Apply the manifest in the target namespace: kubectl apply -f isvc.yaml -n <namespace>
Wait for the service to reach Ready state: run kubectl get inferenceservice <name> -n <namespace> and check that the READY column shows True
Retrieve the inference URL from the status.url field: kubectl get inferenceservice <name> -n <namespace> -o jsonpath='{.status.url}'
Send a prediction using the V2 inference protocol: POST to <url>/v2/models/<name>/infer with a JSON body containing an inputs array, where each entry carries name, shape, datatype, and data
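The steps above can be sketched in Python as a manifest builder plus a V2 request builder. This is a minimal sketch, not a definitive implementation: the service name sklearn-iris, the bucket path, the hostname, and the tensor name input-0 are all hypothetical, and setting protocolVersion to v2 on the predictor is an assumption about what the runtime needs in order to expose the /v2 endpoints. The manifest is emitted as JSON, which kubectl apply -f also accepts.

```python
import json


def build_isvc(name, namespace, storage_uri):
    """Return an InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "sklearn": {
                    "storageUri": storage_uri,
                    # Assumption: needed so the runtime serves the V2 protocol.
                    "protocolVersion": "v2",
                }
            }
        },
    }


def build_v2_request(base_url, model_name, rows):
    """Return (url, body) for a V2 infer call.

    rows is a non-empty list of equal-length lists of floats.
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": "input-0",  # hypothetical tensor name
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                # Flatten row-major; the shape field describes the layout.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    return url, body


# Hypothetical values for illustration only.
manifest = build_isvc(
    "sklearn-iris", "default", "gs://example-bucket/models/sklearn/iris"
)
print(json.dumps(manifest, indent=2))  # save as isvc.json, then kubectl apply -f

url, body = build_v2_request(
    "http://sklearn-iris.default.example.com",  # stand-in for status.url
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4]],
)
print(url)
print(json.dumps(body))
```

The sketch stops short of sending the request so that it runs without a cluster; the printed URL and body can be passed to any HTTP client with a Content-Type of application/json.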
Known gotchas
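In serverless mode, Knative can scale a predictor to zero, and the first request after an idle period then incurs cold-start latency. KServe documents a default of minReplicas: 1, so this applies when the service is configured with minReplicas: 0; to keep a warm replica, set spec.predictor.minReplicas: 1 (a field on the predictor, not an annotation)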
Storage URI credentials (S3, GCS, Azure) must be provided as a Kubernetes Secret attached to the service account the predictor runs under (set via serviceAccountName), or referenced from the predictor's storage spec (check the installed version's API reference for the exact field). Missing credentials cause the storage-initializer init container to fail
In newer KServe versions (those based on ServingRuntime and ClusterServingRuntime), the v1beta1 API uses a predictor.model block with modelFormat.name rather than the older predictor.sklearn / predictor.xgboost shorthand. Check the installed KServe version to pick the correct spec
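As a rough illustration of the gotchas above, the following sketch builds a predictor in the newer predictor.model form, assuming that minReplicas and serviceAccountName are set as fields directly on the predictor. The bucket path and the service account name models-sa are hypothetical, and the exact fields accepted should be checked against the installed KServe version.

```python
import json


def build_model_predictor(model_format, storage_uri, service_account, min_replicas=1):
    """Return a predictor spec using the predictor.model form."""
    return {
        # Keeps at least one replica warm to avoid cold starts.
        "minReplicas": min_replicas,
        # Service account that carries the storage credentials Secret.
        "serviceAccountName": service_account,
        "model": {
            # Replaces the older predictor.sklearn / predictor.xgboost keys.
            "modelFormat": {"name": model_format},
            "storageUri": storage_uri,
        },
    }


# Hypothetical values for illustration only.
predictor = build_model_predictor(
    "sklearn", "s3://example-bucket/models/iris", "models-sa"
)
print(json.dumps({"spec": {"predictor": predictor}}, indent=2))
```

The returned dict slots under spec.predictor in an InferenceService manifest; swapping the first argument to xgboost covers the other shorthand mentioned above.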