Install the replicate Python client and set the REPLICATE_API_TOKEN environment variable to your API token.
Browse replicate.com to find the model and note its owner/name:version string, or use the latest version tag if the model exposes one.
Run the model synchronously: output = replicate.run('OWNER/MODEL_NAME:VERSION', input={'prompt': 'your input', ...}) where the input dict keys match the model's documented input schema.
For long-running models, create a prediction asynchronously: prediction = replicate.predictions.create(version=VERSION_ID, input={...}) then poll prediction.reload() until prediction.status is 'succeeded'.
Access the output from output (sync) or prediction.output (async); outputs are typically URLs to generated files or plain text strings depending on the model.
Handle the 'failed' status by checking prediction.error for the error message.
Known gotchas
Synchronous replicate.run() blocks until the prediction completes and can time out for very long-running models; use the async predictions API for jobs expected to run more than a minute.
Output files hosted on Replicate's CDN URLs expire after a period; download and store outputs immediately rather than saving only the URL.
Input parameter names and types are model-specific and documented on each model's page; passing an undocumented or wrongly typed parameter returns a validation error.
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp