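Understand the RFC 9309 precedence rule: when both an Allow and a Disallow rule match a URL, the most specific rule (the longest matching path) wins, whether it is Allow or Disallow; if the matching rules are equally specific, Allow should be used
Test conflicting rule pairs against the specific URL to see which rule wins; Google has retired the standalone robots.txt Tester in Search Console, so use the URL Inspection tool to check whether a given URL is blocked by robots.txt, or evaluate the rules locally with Google's open-source robots.txt parser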
Resolve ambiguity by making the more permissive path explicitly more specific; for example, Allow: /admin/public/ paired with Disallow: /admin/ correctly allows the subdirectory
Declare Googlebot, Googlebot-Image, and Googlebot-News in separate user-agent groups if you need to apply different rules to each crawler type
After editing, confirm the file is served from the exact path https://yourdomain.com/robots.txt at the root of the host with no redirect
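The longest-match precedence described in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: the `is_allowed` function and the `Rule` tuple are hypothetical names, it handles plain path prefixes only (no wildcards, no percent-encoding normalization), and it measures specificity in characters rather than the octets the RFC refers to.

```python
from typing import List, Tuple

# (directive, path), e.g. ("Disallow", "/admin/"); hypothetical representation
Rule = Tuple[str, str]

def is_allowed(rules: List[Rule], url_path: str) -> bool:
    """Return True if url_path may be crawled under the given rules."""
    best_len = -1
    allowed = True  # no matching rule means the URL is allowed
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue  # skip empty paths and non-matching prefixes
        is_allow = directive.lower() == "allow"
        # The longer path wins; on an equal-length tie, Allow wins.
        if len(path) > best_len or (len(path) == best_len and is_allow):
            best_len = len(path)
            allowed = is_allow
    return allowed

rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]
print(is_allowed(rules, "/admin/public/help.html"))  # True
print(is_allowed(rules, "/admin/settings"))          # False
print(is_allowed(rules, "/blog/"))                   # True
```

The order of the rules in the list does not affect the result, which mirrors the point that specificity, not position in the file, decides the outcome.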
Known gotchas
RFC 9309 requires crawlers to parse at least 500 kibibytes of the robots.txt file; content beyond a crawler's limit may be ignored (Google stops at 500 KiB), so valid rules past that point in an oversized file can be silently dropped
The robots.txt Disallow directive is a crawling instruction for compliant crawlers, not an access control mechanism; pages blocked by robots.txt can still appear in search results if they are linked from other pages, because Google can index a URL it discovers through links without ever crawling it
Wildcard patterns (* and $) are defined in RFC 9309 but were not part of the original 1994 robots exclusion convention, so they are not universally supported by all crawlers; test behavior for each crawler you target if precision matters
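To check how a wildcard rule should behave before testing it against a real crawler, the usual reading of the two special characters (`*` matches any run of characters, a trailing `$` anchors the end of the URL) can be sketched as a translation to a regular expression. The helper names `pattern_to_regex` and `path_matches` are hypothetical, and the sketch assumes `$` is only meaningful as the last character of the pattern.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z anchors at the very end of the string when the pattern ended in '$'.
    return re.compile(body + (r"\Z" if anchored else ""))

def path_matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start, matching robots.txt prefix semantics.
    return pattern_to_regex(pattern).match(url_path) is not None

print(path_matches("/*.pdf$", "/files/report.pdf"))      # True
print(path_matches("/*.pdf$", "/files/report.pdf?x=1"))  # False
print(path_matches("/private*", "/private/notes"))       # True
```

A crawler without wildcard support would instead treat `*` and `$` as literal characters, which is why the same rule can block a URL for one crawler and not for another.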