Steps

Package your model using Cog (Replicate's containerization tool): define predict.py with a Predictor class (subclassing cog.BasePredictor) that implements predict(), and a cog.yaml specifying the build environment (Python version, Python dependencies, GPU support) and the predictor entry point
Build and push the Cog model to Replicate with cog push r8.im/<owner>/<model-name>, which creates a new model version on the platform
Create a Deployment using the POST /v1/deployments API endpoint or the Replicate dashboard, specifying the model, version, hardware type, min_instances, and max_instances for auto-scaling
The deployment provides a dedicated URL distinct from the shared model endpoint; reference the deployment name in API calls rather than the model version directly
Invoke the deployment via POST /v1/deployments/{owner}/{name}/predictions, passing only the input object; the model version is fixed by the deployment's configuration, so no version field is sent. The deployment auto-scales between min and max instances based on queue depth
Monitor deployment metrics (request volume, latency, instance utilization, error rates) from the Replicate dashboard and adjust min/max instance counts as traffic patterns evolve
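The create and invoke steps above can be sketched with the Python standard library alone. This is a minimal sketch rather than Replicate's official client: the helper names are hypothetical, and the owner, deployment name, version ID, and hardware value used with them are placeholders. The request body fields are the ones named in the steps (model, version, hardware, min_instances, max_instances) plus a deployment name. The helpers only build the requests; nothing is sent.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"


def _post_request(path, payload, token):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_deployment_request(token, name, model, version, hardware,
                              min_instances, max_instances):
    """Request for POST /v1/deployments with the auto-scaling bounds."""
    if not 0 <= min_instances <= max_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")
    payload = {
        "name": name,
        "model": model,          # "<owner>/<model-name>"
        "version": version,      # version ID created by cog push
        "hardware": hardware,    # placeholder value supplied by the caller
        "min_instances": min_instances,
        "max_instances": max_instances,
    }
    return _post_request("/deployments", payload, token)


def deployment_prediction_request(token, owner, name, inputs):
    """Request for a prediction addressed to the deployment by name."""
    path = f"/deployments/{owner}/{name}/predictions"
    # Only the input is sent; the deployment already pins the version.
    return _post_request(path, {"input": inputs}, token)
```

To actually send either request, pass it to urllib.request.urlopen and decode the JSON response. Keeping construction separate from sending makes the payloads easy to inspect and test before any billable call is made.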

Known gotchas

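As of 2025, POST /v1/predictions is the unified endpoint for running any model on Replicate, whether community or official; older documentation referencing separate endpoints for different model types may be outdated. Deployments are still addressed through their own endpoint, POST /v1/deployments/{owner}/{name}/predictions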
Deployments with min_instances set to 0 scale to zero during idle periods; cold starts require pulling and initializing the container, which can take tens of seconds for large model images
Cog expects predict() inputs to be declared with supported, serializable types (str, int, float, bool, and cog.Path for files, which API callers supply as URLs); passing non-serializable Python objects as inputs causes prediction failures at runtime
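The serialization gotcha can be caught on the client before a request is sent. A minimal sketch, assuming inputs travel as a JSON object: the hypothetical helper find_unserializable_inputs reports which input names hold values that json.dumps rejects. It checks JSON encodability only, not whether a value matches the type the predictor declares.

```python
import json


def find_unserializable_inputs(inputs):
    """Return the sorted names of inputs whose values cannot be JSON-encoded."""
    bad = []
    for key, value in inputs.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # TypeError: unsupported type (set, arbitrary object, ...)
            # ValueError: circular reference
            bad.append(key)
    return sorted(bad)
```

Calling it on the input dictionary just before building the prediction request turns a remote runtime failure into a local error that names the offending fields.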

Related routes

Replicate: run a model via the API

replicate.com/docs · 6 steps · unrated

Ray Serve: configure autoscaling for a deployment (min_replicas, max_replicas, target_ongoing_requests)

ml-ops · 5 steps · unrated

KServe: deploy a model as an InferenceService with autoscaling on Kubernetes

ml-ops · 5 steps · unrated

Deploy a custom model on Replicate and expose it as a production API deployment with auto-scaling
