Steps

Host robots.txt at the root of each hostname (https://example.com/robots.txt); it applies only to that exact scheme, host, and port, and is not inherited by subdomains
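Define User-agent blocks followed by Allow and Disallow directives; list specific User-agent names (Googlebot, Bingbot) before the catch-all User-agent: * block for readability, and note that a crawler following the standard (RFC 9309) obeys only the most specific group that matches it, so rules in the * block are not merged into a named block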
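Verify that specific URLs are allowed or blocked as intended before deploying changes; Google retired the standalone robots.txt Tester in Search Console, so use the robots.txt report and the URL Inspection tool there, or run the candidate file through a robots.txt parser locally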
Avoid disallowing CSS, JavaScript, and font files that are necessary for rendering; Googlebot must be able to fetch page resources to evaluate the rendered content
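Add a Sitemap directive with the absolute URL of your sitemap to help crawlers discover it; the directive is independent of User-agent blocks, so it can go anywhere in the file, though the bottom is a common convention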
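
The steps above can be sanity-checked locally before deploying. The following is a minimal sketch using Python's standard-library urllib.robotparser; the sample file contents, the OtherBot agent name, and the helper names robots_url and load_rules are assumptions made for illustration. The standard-library parser applies rules in file order rather than by longest match, so its answers may differ from Googlebot's on overlapping Allow and Disallow rules; treat it as a quick check, not an authoritative one.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; paths and sitemap URL are assumptions.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /private/
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt location that governs page_url's origin."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Each hostname has its own file; a subdomain does not use the parent's.
print(robots_url("https://blog.example.com/post/1"))

# Googlebot matches its own group, so the /tmp/ rule from the * group
# does not apply to it; an unnamed bot falls back to the * group.
print(rules.can_fetch("Googlebot", "https://example.com/tmp/report.html"))
print(rules.can_fetch("OtherBot", "https://example.com/tmp/report.html"))

# Rendering resources should stay fetchable.
print(rules.can_fetch("Googlebot", "https://example.com/assets/site.css"))

# Sitemap lines are collected regardless of which group precedes them.
print(rules.site_maps())
```

In this sample, Googlebot is allowed to fetch /tmp/report.html while OtherBot is not, which illustrates why every rule a named crawler should follow has to be repeated inside its own block.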

Known gotchas

robots.txt blocks crawling but not indexing; a page disallowed in robots.txt can still appear in search results if other pages link to it. Use the noindex meta tag or the X-Robots-Tag header for indexing control, and leave the page crawlable so the crawler can actually see the noindex
Precedence depends on rule length, not on the order of rules in the file: the longest (most specific) matching rule wins, and when an Allow and a Disallow rule match with equal specificity, Allow takes precedence
URL-encoded and decoded paths are treated as different patterns by some crawlers; a Disallow for /search%3F is not guaranteed to block /search? in every implementation
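
The precedence gotcha can be made concrete with a small sketch. The function below is a simplified illustration of longest-match evaluation with Allow winning ties; it handles plain path prefixes only and deliberately omits the * and $ wildcards. The name is_allowed, the rule tuples, and the sample paths are all hypothetical, and this is not any crawler's actual implementation.

```python
from typing import List, Tuple

# (directive, path prefix), e.g. ("disallow", "/shop/")
Rule = Tuple[str, str]


def is_allowed(path: str, rules: List[Rule]) -> bool:
    """Longest matching prefix wins; on a tie, allow wins.

    A path that no rule matches is allowed by default.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules have no effect
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and is_allow
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Rule order is intentionally mixed to show that it does not matter.
sample_rules: List[Rule] = [
    ("disallow", "/shop/"),
    ("allow", "/shop/sale/"),
    ("disallow", "/docs"),
    ("allow", "/docs"),
]

print(is_allowed("/shop/cart", sample_rules))        # False: only /shop/ matches
print(is_allowed("/shop/sale/shoes", sample_rules))  # True: /shop/sale/ is longer
print(is_allowed("/docs/intro", sample_rules))       # True: equal length, allow wins
print(is_allowed("/about", sample_rules))            # True: no rule matches
```

A parser that evaluates rules in file order instead would block /docs/intro here, which is one reason results from different tools can disagree.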

developers.google.com · 6 steps · unrated

Write and audit robots.txt rules to control crawler access without blocking critical resources
