Install the Replicate Python client: pip install replicate
Set the REPLICATE_API_TOKEN environment variable
For synchronous blocking calls: output = replicate.run('owner/model:version', input={'prompt': '...'})
For async concurrent runs: use async_client = replicate.AsyncClient() and await async_client.async_run() with asyncio.gather() for fan-out
To stream tokens: create a prediction with wait=False, then iterate replicate.predictions.stream() on the prediction object
Pass wait=False to replicate.predictions.create() to get the prediction ID immediately without blocking for the result
Known gotchas
replicate.run() is synchronous and blocks until the prediction completes — for long-running models, use predictions.create() with wait=False and poll for status
Model versions are pinned by a hash in the model string — not pinning a version means you may silently get a different model after a maintainer update
Streaming is available only for models that support it — check the model's output type in the Replicate model page before building a streaming pipeline
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp