Steps

Before starting, review the target site's Terms of Service and robots.txt to confirm that automated access for your use case is permitted; consult legal counsel for commercial data collection
Implement a per-domain FIFO request queue, drained with setTimeout, that enforces a minimum delay between requests (taken from the robots.txt Crawl-delay directive when present, otherwise a conservative default such as 1–5 seconds)
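Set a descriptive User-Agent identifying your bot, its purpose, and a contact URL so site operators can reach you, preferably through the context option: browser.newContext({ userAgent: 'MyBot/1.0 (+https://mycompany.com/bot)' }). page.setExtraHTTPHeaders({ 'User-Agent': '...' }) only affects the request header and leaves navigator.userAgent unchanged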
Respect HTTP response signals: honor 429 (Too Many Requests) with exponential backoff, stop on 403 (Forbidden), and do not retry 404 unless the URL was expected to exist
Prefer fetching structured data feeds, APIs, sitemaps, or RSS/Atom where available instead of scraping rendered HTML — these are explicitly provided for programmatic access and impose less server load
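The queueing step above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the PoliteQueue class, its method names, and the 2000 ms default are assumptions made for this example, and the per-host delay would normally come from whatever robots.txt parser you use.

```javascript
// Hypothetical helper (not part of Playwright): one FIFO queue per hostname,
// drained with setTimeout so requests to the same host are spaced apart.
class PoliteQueue {
  constructor(defaultDelayMs = 2000) {
    this.defaultDelayMs = defaultDelayMs; // conservative fallback delay
    this.delays = new Map();              // hostname -> minimum delay in ms
    this.queues = new Map();              // hostname -> pending jobs (FIFO)
    this.running = new Set();             // hostnames that are being drained
  }

  // Record a per-host delay, e.g. a Crawl-delay value converted to ms.
  setDelay(hostname, delayMs) {
    this.delays.set(hostname, delayMs);
  }

  delayFor(hostname) {
    return this.delays.get(hostname) ?? this.defaultDelayMs;
  }

  // Queue a task for the URL's host; resolves with the task's result.
  enqueue(url, task) {
    const { hostname } = new URL(url);
    return new Promise((resolve, reject) => {
      if (!this.queues.has(hostname)) this.queues.set(hostname, []);
      this.queues.get(hostname).push({ url, task, resolve, reject });
      if (!this.running.has(hostname)) {
        this.running.add(hostname);
        this.drain(hostname);
      }
    });
  }

  // Run the next job for this host, then wait the host's delay before the
  // following one. The delay is also applied after the last job, so a job
  // queued shortly afterwards still waits its turn.
  async drain(hostname) {
    const job = this.queues.get(hostname).shift();
    if (!job) {
      this.running.delete(hostname);
      return;
    }
    try {
      job.resolve(await job.task(job.url));
    } catch (err) {
      job.reject(err);
    }
    setTimeout(() => this.drain(hostname), this.delayFor(hostname));
  }
}
```

With Playwright, a call might look like queue.enqueue(url, (u) => page.goto(u)). Sharing a single PoliteQueue instance among all workers in one process keeps their combined request rate per host under the configured delay; workers in separate processes would need an external coordinator instead.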

Known gotchas

Rate limiting per domain is critical: multiple concurrent Playwright workers each making requests without coordination can collectively violate crawl-delay requirements — use a shared queue or semaphore across workers
Retry-After headers in 429 responses specify how long to wait before retrying, either as a number of seconds or as an HTTP date; when the header is present, honor its value instead of your own backoff period
Even technically permitted scraping can become a ToS violation if it degrades site performance for real users; monitor your request rate against the site's observed capacity and throttle proactively
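The response handling described in the steps and gotchas above could look something like the sketch below. The function names (parseRetryAfter, nextAction), the action labels, and the default limits (1 s base, 60 s cap, 5 attempts) are assumptions chosen for illustration; tune them to the site you are working with.

```javascript
// Retry-After is either a whole number of seconds or an HTTP date.
// Returns the wait in milliseconds, or null when the header is absent
// or cannot be parsed.
function parseRetryAfter(value, nowMs = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// Decide what to do with a response. `attempt` counts earlier retries
// of the same URL, starting at 0.
function nextAction(status, retryAfter, attempt, opts = {}) {
  const { baseMs = 1000, maxMs = 60000, maxAttempts = 5, nowMs = Date.now() } = opts;
  if (status === 403) return { action: 'stop' };  // forbidden: stop crawling
  if (status === 404) return { action: 'skip' };  // missing: do not retry
  if (status === 429) {
    if (attempt >= maxAttempts) return { action: 'stop' };
    const headerMs = parseRetryAfter(retryAfter, nowMs);
    // Exponential backoff, used only when the server gives no Retry-After.
    const backoffMs = Math.min(maxMs, baseMs * 2 ** attempt);
    return { action: 'retry', waitMs: headerMs ?? backoffMs };
  }
  return { action: 'continue' };
}
```

In a Playwright scraper, the inputs would typically come from the Response returned by page.goto: response.status() for the status code and response.headers()['retry-after'] for the header value.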

playwright · 5 steps · unrated

Parse robots.txt and respect crawl-delay directives in a Playwright-based scraper

playwright.dev · 5 steps · unrated

Connect Playwright to a cloud browser pool (Browserless or Browserbase) via WebSocket

docs.browserless.io · 5 steps · unrated

Set up rate-limited, ToS-compliant web scraping with Playwright using request queuing and polite delays
