Steps

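Enable horizontal autoscaling by setting --autoscalingAlgorithm=THROUGHPUT_BASED (Java SDK naming; the Python SDK uses --autoscaling_algorithm). Autoscaling is on by default for streaming jobs that use Streaming Engine, so check the current Dataflow docs to see whether your job needs the flag at all. Dataflow adjusts worker count based on backlog and throughput metrics.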
Set --maxNumWorkers to cap the worker count (and therefore cost) and --numWorkers to set the initial worker count.
Enable Streaming Engine by adding the --enable_streaming_engine flag (Python SDK naming; the Java SDK uses --enableStreamingEngine; verify the current flag name against Dataflow docs for your SDK version). Streaming Engine is a separate feature from Runner v2. It moves shuffle and state storage off the worker VMs to a managed backend, reducing per-worker memory and enabling finer-grained scaling.
Monitor the Dataflow job graph in the Cloud Console for backlog per step, system lag, and worker CPU utilization to tune scaling thresholds.
Use Streaming Engine with Streaming Appliance (verify availability and naming against current docs) for high-throughput jobs requiring very low latency.
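
The flag-related steps above can be sketched as a small helper that assembles the options for a streaming job. This is a minimal sketch, assuming the Python SDK spellings of the flags (--autoscaling_algorithm, --max_num_workers, --num_workers, --enable_streaming_engine); the function name build_dataflow_args and the example worker counts are hypothetical, and the flag names should be verified against the docs for your SDK version.

```python
from typing import List


def build_dataflow_args(
    max_num_workers: int,
    num_workers: int = 1,
    enable_streaming_engine: bool = True,
) -> List[str]:
    """Return Python-SDK-style Dataflow flags for an autoscaled streaming job.

    Hypothetical helper for illustration; it only builds strings and does not
    contact Dataflow.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if max_num_workers < num_workers:
        raise ValueError("max_num_workers must be >= num_workers")

    args = [
        "--runner=DataflowRunner",
        "--streaming",
        # Horizontal autoscaling driven by backlog and throughput.
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        # Upper bound on workers, which also bounds cost.
        f"--max_num_workers={max_num_workers}",
        # Initial worker count before autoscaling takes over.
        f"--num_workers={num_workers}",
    ]
    if enable_streaming_engine:
        # Offload shuffle and state storage to the managed backend.
        args.append("--enable_streaming_engine")
    return args


if __name__ == "__main__":
    # Example values are assumptions, not recommendations.
    print(" ".join(build_dataflow_args(max_num_workers=20, num_workers=2)))
```

The resulting list can be appended to a pipeline's command line or passed to the Beam options parser; project, region, and staging locations are deliberately left out and would still need to be supplied.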

Known gotchas

Without Streaming Engine, state is stored on worker disks; scaling down can trigger costly state migration.
Autoscaling reacts to backlog with some delay; bursty traffic may cause temporary lag spikes before workers are added.
Some Beam features (e.g., certain custom sources) may require worker-level state and are not fully compatible with Streaming Engine offloaded state; verify against current docs.
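
To make the autoscaling-delay gotcha concrete, here is a back-of-envelope model of how long a backlog takes to drain at a fixed worker count. It is a simplified sketch, not Dataflow's actual scaling algorithm: it assumes a constant input rate and a constant per-worker throughput, and the function names and all numbers are hypothetical.

```python
import math


def drain_seconds(backlog, input_rate, per_worker_rate, workers):
    """Seconds needed to clear `backlog` elements at a fixed worker count.

    Returns math.inf when the workers cannot even keep up with new input.
    """
    spare_rate = workers * per_worker_rate - input_rate
    if spare_rate <= 0:
        return math.inf
    return backlog / spare_rate


def workers_needed(backlog, input_rate, per_worker_rate, target_seconds):
    """Smallest worker count that clears the backlog within target_seconds."""
    required_rate = input_rate + backlog / target_seconds
    return math.ceil(required_rate / per_worker_rate)


if __name__ == "__main__":
    # Assumed burst: 600,000 queued elements, 5,000 elements/s still arriving,
    # and each worker processing about 1,000 elements/s.
    print(drain_seconds(600_000, 5_000, 1_000, workers=8))
    print(workers_needed(600_000, 5_000, 1_000, target_seconds=120))
```

Under these assumptions, a job sitting at exactly the worker count that matches the input rate never recovers from a burst on its own, which is why lag persists until extra workers come online.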

data-engineering · 5 steps · unrated

Configure Spark Structured Streaming trigger modes (processingTime, availableNow, continuous)

data-engineering · 5 steps · unrated

Configure Spark Structured Streaming watermarking to handle late-arriving data and bound state size

spark.apache.org · 6 steps · unrated


Configure Dataflow autoscaling and understand Streaming Engine
