Ensure KServe is installed in your Kubernetes cluster and that the deployment mode you intend to use, Serverless (Knative Serving) or RawDeployment, is configured.
Write an InferenceService manifest in YAML that sets apiVersion: serving.kserve.io/v1beta1 and kind: InferenceService, with a spec.predictor section that names the model framework (e.g., sklearn, xgboost, pytorch) and a storageUri pointing to the model artifact in S3 or GCS.
Apply the manifest to the target namespace with kubectl apply -f inferenceservice.yaml -n NAMESPACE.
Watch the resource with kubectl get inferenceservice -n NAMESPACE until the READY column shows True.
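Retrieve the endpoint URL from the InferenceService status (status.url) and send a POST request to the /v1/models/MODEL_NAME:predict path with a JSON body in the v1 inference protocol format ({"instances": [...]}). If the runtime serves the v2 (Open Inference) protocol instead, POST to /v2/models/MODEL_NAME/infer with an {"inputs": [...]} body.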
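If the service does not reach the Ready state, check the predictor pod logs with kubectl logs; model download errors appear in the storage-initializer init container (kubectl logs POD_NAME -c storage-initializer).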
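The manifest and request steps above can be sketched in Python as follows. This is a minimal illustration, not KServe's own tooling: the function names, the model name sklearn-iris, the namespace kserve-test, and the gs://example-bucket/... URI are all hypothetical placeholders. It assumes the predictor.model.modelFormat layout of the v1beta1 API and a runtime that speaks the v1 protocol. The manifest is emitted as JSON, which kubectl apply accepts just like YAML.

```python
import json
import urllib.request


def build_inference_service(name, namespace, model_format, storage_uri):
    """Build an InferenceService manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    # Framework name, e.g. sklearn, xgboost, pytorch.
                    "modelFormat": {"name": model_format},
                    # Must be readable by the predictor's service account.
                    "storageUri": storage_uri,
                }
            }
        },
    }


def build_v1_predict_request(base_url, model_name, instances):
    """Prepare (but do not send) a v1-protocol predict request."""
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace with your own model name, namespace, and bucket.
    manifest = build_inference_service(
        "sklearn-iris", "kserve-test", "sklearn",
        "gs://example-bucket/models/sklearn/iris",
    )
    print(json.dumps(manifest, indent=2))
```

Redirect the printed JSON to a file and apply it with kubectl apply -f, or pipe it to kubectl apply -f -. Once the service is Ready, pass status.url as base_url and send the prepared request with urllib.request.urlopen.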
Known gotchas
The storage URI must be readable by the service account the predictor pod runs as; missing IAM or Workload Identity bindings cause the model download to fail and the pod to crash-loop.
KServe's admission webhooks select namespaces by label and annotation; if the target namespace does not match what your installation expects (for example, it is missing the kserve label), the webhook does not fire and the resource may be only partially created.
Serverless mode requires Knative Serving and a compatible ingress (Istio or Kourier); without these, the InferenceService will never become Ready.
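When one of these gotchas bites, the status conditions on the InferenceService usually show which part is stuck. The sketch below is a hypothetical helper, not part of KServe: it assumes kubectl is on the PATH and that the resource reports standard Kubernetes-style conditions (type, status, reason, message) under status.conditions.

```python
import json
import subprocess


def fetch_inference_service(name, namespace):
    """Return the InferenceService as a dict by shelling out to kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservice", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def is_ready(isvc):
    """True only if a condition of type Ready exists with status "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def blocking_conditions(isvc):
    """List (type, reason, message) for every condition whose status is not "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return [
        (c.get("type"), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]
```

A resource with no conditions at all reports is_ready as False and an empty blocking list, which suggests the controller has not reconciled it yet. Call blocking_conditions(fetch_inference_service(NAME, NAMESPACE)) to see the reason and message for each condition that is holding the service back.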