Steps

Add the 'X-Robots-Tag' header to HTTP responses for PDFs, images, and other non-HTML files where meta robots tags cannot be embedded: 'X-Robots-Tag: noindex'
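To block AI training crawlers selectively, use robots.txt user-agent groups: add a 'Disallow: /' group for each of GPTBot, ClaudeBot, and Google-Extended. X-Robots-Tag can complement this for crawlers that honor it, by naming the crawler before the directives (e.g. 'X-Robots-Tag: otherbot: noindex'). Google-Extended, however, is a robots.txt product token with no crawler of its own, so control it in robots.txt.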
Use 'X-Robots-Tag: nosnippet' to prevent Google from showing a text snippet or video preview for a URL in search results; the URL can still be crawled and indexed, and only the snippet is suppressed
Set 'X-Robots-Tag: noindex, nofollow' at the web server or CDN level for staging environments to prevent accidental indexing of dev sites
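Combine with robots.txt 'Disallow' carefully: if a URL is disallowed, Googlebot never fetches it and so cannot see a noindex in either the meta tag or the X-Robots-Tag header. Apply X-Robots-Tag only to URLs Googlebot is allowed to crawl
Verify the header is being sent with 'curl -I {url}' (a HEAD request) and confirm X-Robots-Tag appears in the response headers before relying on it for index control; if your server or CDN may answer HEAD differently from GET, check a real GET with 'curl -s -D - -o /dev/null {url}'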
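
The header-setting steps above (a noindex header on non-HTML files, a blanket noindex, nofollow on staging) can be sketched as a small piece of Python WSGI middleware. This is a minimal illustration, not a recommended production setup, since the web server or CDN is usually the better place for these rules. The suffix list, the `staging` flag, and the names `x_robots_value` and `XRobotsTagMiddleware` are hypothetical choices made for this example.

```python
from typing import Optional

# Hypothetical list of file types that cannot carry a <meta name="robots"> tag.
NON_HTML_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".docx", ".xlsx")


def x_robots_value(path: str, staging: bool) -> Optional[str]:
    """Return the X-Robots-Tag value for a request path, or None to send no header."""
    if staging:
        # Staging: keep every URL out of the index and don't follow its links.
        return "noindex, nofollow"
    if path.lower().endswith(NON_HTML_SUFFIXES):
        # Non-HTML files can't embed a meta robots tag, so use the header instead.
        return "noindex"
    return None


class XRobotsTagMiddleware:
    """WSGI middleware that injects X-Robots-Tag based on x_robots_value()."""

    def __init__(self, app, staging: bool = False):
        self.app = app
        self.staging = staging  # hypothetical flag, e.g. read from your deploy config

    def __call__(self, environ, start_response):
        value = x_robots_value(environ.get("PATH_INFO", ""), self.staging)

        def start_with_header(status, headers, exc_info=None):
            if value is not None:
                # Replace any X-Robots-Tag the app already set so this policy wins.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_header)


if __name__ == "__main__":
    def pdf_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "application/pdf")])
        return [b"%PDF-1.4"]

    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    XRobotsTagMiddleware(pdf_app)({"PATH_INFO": "/reports/q1.pdf"}, fake_start_response)
    print(dict(captured["headers"]))
```

Running the file prints `{'Content-Type': 'application/pdf', 'X-Robots-Tag': 'noindex'}`. Replacing rather than appending the header is a deliberate choice here so that one policy decides; if per-page values set by the app should survive, append instead.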

Known gotchas

A URL blocked by a robots.txt Disallow cannot be noindexed via X-Robots-Tag: Googlebot will not fetch the URL, so it never reads the header, and the URL can still be indexed without its content if other pages link to it. Use Disallow only to reduce crawl load; to remove a URL from the index, serve noindex (via the header or a meta tag) on a crawlable URL
X-Robots-Tag is advisory: well-behaved crawlers respect it, but rogue or non-compliant bots may ignore it. It is not a substitute for access control (e.g. authentication) on sensitive content
Google-Extended controls Google's use of your content for AI model training separately from Googlebot's use for Search; blocking Google-Extended does not affect Search indexing
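
The robots.txt gotchas above can be checked mechanically. The sketch below uses Python's standard-library `urllib.robotparser` as a stand-in for Google's own parser: it builds a robots.txt that blocks GPTBot, ClaudeBot, and Google-Extended, then flags any URL you plan to noindex that is also disallowed for Googlebot. `urllib.robotparser` only approximates Google's rules (it matches user-agent names by substring, uses the first matching group rather than the most specific one, and does not support `*` or `$` wildcards in paths), so treat its answers as a first pass. The function names and example URLs are hypothetical.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the steps above.
AI_TRAINING_AGENTS = ("GPTBot", "ClaudeBot", "Google-Extended")


def build_robots_txt(disallow_for_all: Iterable[str] = ()) -> str:
    """Block the AI training agents site-wide; apply optional Disallow paths to everyone else."""
    lines: List[str] = []
    for agent in AI_TRAINING_AGENTS:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append("User-agent: *")
    paths = list(disallow_for_all)
    # An empty "Disallow:" line means "allow everything" for this group.
    lines += [f"Disallow: {p}" for p in paths] or ["Disallow:"]
    return "\n".join(lines) + "\n"


def noindex_conflicts(robots_txt: str, noindex_urls: Iterable[str], agent: str = "Googlebot") -> List[str]:
    """Return URLs meant to carry noindex that `agent` is not allowed to crawl.

    Googlebot never fetches these, so it never sees their X-Robots-Tag or meta robots tag.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    robots = build_robots_txt(disallow_for_all=["/private/"])
    print(robots)
    planned_noindex = [
        "https://example.com/private/report.pdf",
        "https://example.com/docs/guide.pdf",
    ]
    for url in noindex_conflicts(robots, planned_noindex):
        print(f"conflict: {url} is disallowed, so its noindex will never be read")
```

The Google-Extended group has no bearing on what `can_fetch("Googlebot", ...)` returns; only the paths disallowed for `*` do, which mirrors the separation between AI training and Search described in the last gotcha.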

developers.google.com · 6 steps · unrated

Related routes

Configure per-engine crawl-delay directives in robots.txt, since Bing and Yandex honor it but Googlebot ignores it entirely

developers.google.com · 5 steps · unrated

Write and audit robots.txt rules to control crawler access without blocking critical resources

developers.google.com · 5 steps · unrated

Use X-Robots-Tag HTTP response headers to control indexing of non-HTML resources and to block specific AI training crawlers