Export raw access logs from your web server or CDN and keep only the lines whose user-agent string contains 'Googlebot'
For each candidate IP address, perform a reverse DNS lookup to confirm the hostname ends in .googlebot.com or .google.com, then do a forward DNS lookup on that hostname to confirm it resolves back to the original IP
Categorize crawled URLs by type (canonical pages, parameter variants, internal search results, faceted navigation URLs) to identify which URL classes consume disproportionate crawl share
Overlay the log-derived crawl time series with the Search Console Crawl Stats report to confirm alignment before making infrastructure changes
Prioritize the largest sources of crawl waste: consolidate parameter variants with canonical tags, block dead-end URL patterns in robots.txt, and shorten redirect chains
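The filtering and categorization steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes logs in the common "combined" format, and the names LOG_RE, SEARCH_PREFIX, FACET_PARAMS, classify, and crawl_share are hypothetical, as are the example URL rules, which you would replace with your own site's patterns. It also treats any URL without a query string as "canonical", which is only an approximation.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical site-specific rules; adjust to your own URL scheme.
SEARCH_PREFIX = "/search"
FACET_PARAMS = {"color", "size", "brand", "sort"}


def parse_googlebot_hits(lines):
    """Yield (ip, path) for lines whose user-agent claims to be Googlebot."""
    for line in lines:
        match = LOG_RE.match(line)
        if match and "Googlebot" in match.group("ua"):
            yield match.group("ip"), match.group("path")


def classify(path):
    """Assign a requested URL to one coarse URL class."""
    parts = urlsplit(path)
    params = parse_qs(parts.query, keep_blank_values=True)
    if parts.path.startswith(SEARCH_PREFIX):
        return "internal_search"
    if params.keys() & FACET_PARAMS:
        return "faceted_navigation"
    if params:
        return "parameter_variant"
    # Approximation: no query string is treated as a canonical page.
    return "canonical"


def crawl_share(lines):
    """Return each URL class's fraction of Googlebot-claimed requests."""
    counts = Counter(classify(path) for _, path in parse_googlebot_hits(lines))
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}
```

Calling crawl_share on an open log file (for example, crawl_share(open("access.log"))) returns a dictionary of class name to share. The user-agent filter only narrows the candidates; the IPs still need DNS verification before the shares can be attributed to Google.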
Known gotchas
Legitimate Googlebot IPs must pass both reverse and forward DNS verification; log entries that claim to be Googlebot but fail this check did not come from Google and should be treated as spoofed or scraper traffic
The Search Console Crawl Stats report shows crawl activity for verified properties but does not break down by individual URL or parameter pattern; log analysis is required for granular investigation
CDNs that serve cached responses may not forward the real client IP to origin logs; ensure your log collection captures the true remote IP (or the CDN's logged client IP) before building the IP-to-Googlebot mapping
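The reverse-then-forward DNS check described in the first gotcha can be sketched with Python's standard socket module. This is an illustrative sketch under stated assumptions: the function names and the GOOGLE_SUFFIXES tuple are hypothetical, the accepted suffixes are the two named in this guide, and the resolver functions are passed in as parameters so the logic can be exercised without network access. The IP you pass in should be the true client IP, not a CDN edge address.

```python
import ipaddress
import socket

# Hostname suffixes accepted by this guide's verification step.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of addresses (IPv4 and IPv6) hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """True only if ip passes both the reverse and the forward DNS check."""
    try:
        target = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a parseable IP address
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # The leading dot in each suffix rejects look-alikes such as
    # "fakegooglebot.com" or "googlebot.com.example.net".
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    for addr in forward(hostname):
        try:
            if ipaddress.ip_address(addr) == target:
                return True
        except ValueError:
            continue  # skip addresses that do not parse
    return False
```

DNS lookups are slow relative to log parsing, so in practice you would likely run this once per distinct IP and cache the result rather than once per log line.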