Set serial: "25%" (or a list like [1, 5, "100%"]) in the play header so Ansible processes hosts in batches rather than all at once
Add a pre_tasks block that checks service health before the rolling action: use the uri module to hit the health endpoint and register the result; uri fails the host by default when the response status is not 200
Use the systemd or service module to restart the target service, with name: myapp and state: restarted
Add a post_tasks block that waits for the service to return healthy: use uri with until, retries, and delay to poll the health endpoint, then assert the expected response
Set max_fail_percentage: 0 in the play to abort the entire rolling restart if any host in a batch fails its health check, preventing a bad restart from propagating
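Run ansible-playbook rolling_restart.yml -i inventory/prod --check first for a dry run, then rerun without --check for the live execution; modules without check-mode support are skipped in the dry run, so it validates the play's structure rather than actual service health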
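The steps above could be assembled into a single play roughly as follows. This is a minimal sketch, not a definitive implementation: the host group `app`, the port `8080`, the `/health` path, and the retry counts are illustrative assumptions that do not come from the steps themselves.

```yaml
# rolling_restart.yml
# Sketch only: host group "app", port 8080, the /health path, and the
# retry/delay values are assumptions made for illustration.
- name: Rolling restart of myapp
  hosts: app
  become: true
  serial: "25%"            # or a list such as [1, 5, "100%"]
  max_fail_percentage: 0   # any failed host in a batch aborts the play

  pre_tasks:
    - name: Confirm the service is healthy before restarting it
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200   # any other status fails this host
      register: pre_health

  tasks:
    - name: Restart the service
      ansible.builtin.systemd:
        name: myapp
        state: restarted

  post_tasks:
    - name: Wait until the service reports healthy again
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: post_health
      until: post_health.status | default(0) == 200
      retries: 12          # up to 12 attempts
      delay: 5             # 5 seconds between attempts

    - name: Assert the expected response
      ansible.builtin.assert:
        that:
          - post_health.status == 200
        fail_msg: "myapp on {{ inventory_hostname }} did not return to healthy"
      # uri may be skipped under --check, leaving no status to assert on
      when: not ansible_check_mode
```

The health checks here run on each target host against `localhost`. If the endpoint is only reachable from outside the host, the uri tasks could instead be delegated to the control node and pointed at the host's address.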
Known gotchas
serial batches follow inventory order by default; if all primaries happen to land in the first batch and all replicas in the second, a bad restart can take down quorum before the failure is caught, so order the inventory intentionally (or set the play's order keyword) so that each batch mixes roles
Using state: restarted always restarts the service even if the configuration has not changed; prefer state: started combined with a notify handler triggered by a config template change for idempotent operation
wait_for_connection only verifies that Ansible can reach the host (typically over SSH), and a service restart does not interrupt that connection, so it can succeed well before the application is ready to serve traffic; use uri with retries against the application health endpoint instead
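The second and third gotchas can be addressed together with a handler-driven restart followed by an application-level health poll. The play below is a hedged sketch under assumed names: the group `app`, the template `myapp.conf.j2`, the destination `/etc/myapp/myapp.conf`, and the `http://localhost:8080/health` endpoint are all hypothetical.

```yaml
# Sketch only: the group, template name, config path, and health URL are
# assumptions made for illustration.
- name: Rolling config update that restarts only on change
  hosts: app
  become: true
  serial: "25%"
  max_fail_percentage: 0

  tasks:
    - name: Render the application config
      ansible.builtin.template:
        src: myapp.conf.j2
        dest: /etc/myapp/myapp.conf
      notify: Restart myapp        # fires only when the file changes

    - name: Ensure the service is running
      ansible.builtin.systemd:
        name: myapp
        state: started             # no restart if already running

    - name: Run any pending restart now, before the health check
      ansible.builtin.meta: flush_handlers

    - name: Poll the application health endpoint
      ansible.builtin.uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health.status | default(0) == 200
      retries: 12
      delay: 5

  handlers:
    - name: Restart myapp
      ansible.builtin.systemd:
        name: myapp
        state: restarted
```

Flushing handlers before the health poll matters here: without it, the restart would run at the end of the tasks section for each batch, after the poll had already passed against the old process.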