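Enable horizontal autoscaling by setting --autoscalingAlgorithm=THROUGHPUT_BASED (Java SDK spelling; the Python SDK uses --autoscaling_algorithm). This is the default for streaming jobs that use Streaming Engine; without Streaming Engine, set it explicitly. Dataflow then adjusts the worker count based on backlog and throughput metrics.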
Set --maxNumWorkers to cap the worker count (and therefore cost) and --numWorkers to set the initial worker count.
Enable Streaming Engine by adding the --enable_streaming_engine flag (Python SDK spelling; the Java SDK uses --enableStreamingEngine; verify the current flag name against Dataflow docs for your SDK version). Streaming Engine is a separate feature from Runner v2. It moves shuffle and state storage off the worker VMs to a managed backend, reducing per-worker memory and enabling finer-grained scaling.
Monitor the Dataflow job graph in the Cloud Console for backlog per step, system lag, and worker CPU utilization to tune scaling thresholds.
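For high-throughput jobs that need very low latency, use Streaming Engine. "Streaming Appliance" appears in some Dataflow material as the name for the older on-worker mode rather than an add-on to Streaming Engine, so verify availability and naming against current docs.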
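The flag settings described above can be sketched as a small helper that assembles the argument list for a Beam Python pipeline. This is a minimal sketch, not a definitive launch script: the function name dataflow_streaming_args and the example project and region values are hypothetical, and the flags use the Python SDK's snake_case spelling, which differs from the camelCase Java spelling used above. Verify each flag against the Dataflow docs for your SDK version.

```python
def dataflow_streaming_args(project, region, max_workers,
                            initial_workers=1, streaming_engine=True):
    """Build a Beam Python SDK argument list for an autoscaled streaming job.

    Hypothetical helper for illustration; it only assembles strings and
    does not contact Dataflow.
    """
    if initial_workers < 1:
        raise ValueError("initial_workers must be at least 1")
    if initial_workers > max_workers:
        raise ValueError("initial_workers must not exceed max_workers")

    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        "--streaming",
        # Horizontal autoscaling driven by backlog and throughput.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        # Initial worker count and the upper bound that caps cost.
        f"--num_workers={initial_workers}",
        f"--max_num_workers={max_workers}",
    ]
    if streaming_engine:
        # Moves shuffle and state storage off the worker VMs.
        args.append("--enable_streaming_engine")
    return args


if __name__ == "__main__":
    # Placeholder project and region; replace with your own values.
    for arg in dataflow_streaming_args("my-project", "us-central1",
                                       max_workers=20, initial_workers=2):
        print(arg)
```

The returned list can be passed to the Beam Python SDK's PipelineOptions when constructing the pipeline. Keeping the worker bounds in one place makes it harder to launch a job with an initial count above the cap.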
Known gotchas
Without Streaming Engine, state is stored on worker disks; scaling down can trigger costly state migration.
Autoscaling reacts to backlog with some delay; bursty traffic may cause temporary lag spikes before workers are added.
Some Beam features (e.g., certain custom sources) may require worker-level state and are not fully compatible with Streaming Engine offloaded state; verify against current docs.
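The delayed reaction to bursty traffic can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Dataflow's actual autoscaling algorithm: the function simulate_backlog, its parameters, and the rule "request enough workers to cover current arrivals plus backlog, effective after reaction_delay steps" are all invented for illustration, and the model never scales down.

```python
def simulate_backlog(arrivals, per_worker_rate, start_workers,
                     max_workers, reaction_delay):
    """Return the backlog (in messages) at the end of each time step.

    Toy model only: a scale-up requested at step t takes effect at
    step t + reaction_delay.
    """
    if reaction_delay < 1:
        raise ValueError("reaction_delay must be at least 1 step")

    workers = start_workers
    backlog = 0
    pending = {}  # step at which a requested worker count becomes active
    history = []

    for step, arrived in enumerate(arrivals):
        # Apply a scale-up that has finished provisioning.
        if step in pending:
            workers = max(workers, pending.pop(step))

        capacity = workers * per_worker_rate
        backlog = max(0, backlog + arrived - capacity)

        if backlog > 0:
            # Ceiling division: workers needed for arrivals plus backlog.
            needed = -(-(arrived + backlog) // per_worker_rate)
            needed = min(max_workers, needed)
            if needed > workers:
                pending.setdefault(step + reaction_delay, needed)

        history.append(backlog)
    return history


if __name__ == "__main__":
    # Steady 100 msgs/step, then a burst of 500 msgs/step.
    burst = [100] * 3 + [500] * 6
    for delay in (1, 2):
        history = simulate_backlog(burst, per_worker_rate=100,
                                   start_workers=1, max_workers=10,
                                   reaction_delay=delay)
        print(f"delay={delay}: backlog per step {history}, peak {max(history)}")
```

In this model, a longer reaction delay lets the backlog keep growing before the extra capacity arrives, which is the temporary lag spike described above. A higher initial worker count shrinks the spike at the cost of idle capacity.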