Collect raw access logs from your web server (Apache, Nginx, or CDN log export) and filter log lines where the User-Agent field contains Googlebot to identify candidate bot requests
For each candidate IP address, perform a reverse DNS lookup (PTR record) to confirm the hostname resolves to a domain ending in .googlebot.com or .google.com
Forward-confirm each candidate IP by resolving the hostname returned by the PTR lookup back to an IP address and checking that it matches the original request IP; only requests that pass both the reverse and forward DNS checks count as genuine Googlebot
Aggregate verified Googlebot requests by URL path, response code, and time of day to identify crawl budget allocation, frequently crawled URLs, and URLs returning error codes to Googlebot
Identify crawl budget waste by finding frequently crawled URLs that return 404s, pass through 302 redirect chains, or serve soft 404s (a 200 response for a page that is effectively missing), and fix or block them to reclaim budget for important pages
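The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the Apache/Nginx "combined" log format, and the names LOG_RE, is_verified_googlebot, and summarize are hypothetical. The resolver functions are passed in as parameters so the verification logic can be exercised without live DNS.

```python
import re
import socket
from collections import Counter

# Assumption: Apache/Nginx "combined" log format; adjust for other formats.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """PTR lookup: return the hostname for ip, or None if it has none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Resolve hostname to the set of addresses it points to."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Pass only if the PTR hostname is Google's AND it resolves back to ip."""
    host = reverse(ip)
    if not host or not host.rstrip(".").lower().endswith(GOOGLE_SUFFIXES):
        return False
    return ip in forward(host)


def summarize(lines, verify=is_verified_googlebot):
    """Count verified Googlebot hits by (path, status) and by hour of day."""
    verdicts = {}  # cache one DNS verdict per IP
    by_url, by_hour = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or "googlebot" not in m["ua"].lower():
            continue  # not a candidate bot request
        ip = m["ip"]
        if ip not in verdicts:
            verdicts[ip] = verify(ip)
        if verdicts[ip]:
            by_url[(m["path"], m["status"])] += 1
            # Hour in the log's own timezone, e.g. "13" from 10/Oct/2023:13:55:36
            by_hour[m["time"].split(":")[1]] += 1
    return by_url, by_hour
```

Calling summarize(open("access.log")) on a real log performs live DNS lookups, one pair per distinct candidate IP. Filtering by_url.most_common() for statuses other than 200 gives a first list of budget-waste candidates; soft 404s return 200 and will not show up this way.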
Known gotchas
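Skipping the forward DNS confirmation step lets spoofed Googlebot requests pollute your analysis: anyone can set the User-Agent to Googlebot, and whoever controls an IP range also controls its PTR records, so a reverse lookup alone can return a hostname that merely claims to be under googlebot.com; Google's own documentation describes the reverse and forward lookups together as the DNS verification procedure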
CDN and load balancer logs may log the CDN edge IP instead of the original requester IP in the standard IP field; check whether your logging configuration captures the true client IP via X-Forwarded-For or equivalent headers
Googlebot crawl rate adapts to server response times; a server under load that responds slowly will see Googlebot back off, making crawl log data during high-traffic periods unrepresentative of normal crawl patterns
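For the CDN and load balancer gotcha above, one common way to recover the requester IP is to walk the X-Forwarded-For list from right to left and take the first address that is not one of your own proxies. The sketch below assumes your logs record both the connecting address and the X-Forwarded-For value; the function name client_ip and the trusted_proxies list are hypothetical, and the right-to-left rule is a general convention rather than something the steps above prescribe.

```python
import ipaddress


def client_ip(remote_addr, xff, trusted_proxies):
    """Return the most plausible client IP for one logged request.

    remote_addr: address that connected to the server (the standard IP field)
    xff: raw X-Forwarded-For value, e.g. "66.249.66.1, 10.0.0.7"
    trusted_proxies: CIDR blocks of your own CDN or load balancer (assumed known)
    """
    nets = [ipaddress.ip_network(n) for n in trusted_proxies]

    def trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # malformed entries are never treated as trusted
        return any(ip in net for net in nets)

    # A direct connection from outside the proxy tier: ignore the header,
    # because the sender could have written anything into it.
    if not trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in xff.split(",") if h.strip()]
    # Entries to the left of the first untrusted hop are client-supplied
    # and can be forged, so scan from the right.
    for hop in reversed(hops):
        if not trusted(hop):
            return hop
    return remote_addr
```

The returned address is what the reverse and forward DNS checks should be run against; a forged header value will simply fail those checks.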