Add the 'X-Robots-Tag' header to HTTP responses for PDFs, images, and other non-HTML files where meta robots tags cannot be embedded: 'X-Robots-Tag: noindex'
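As an illustration, the sketch below adds the header to non-HTML files served by Python's built-in `http.server`. It assumes a static-file setup. The extension list, the function `x_robots_tag_for`, and the class `NoIndexFilesHandler` are hypothetical choices rather than a standard. In production the same header is usually set in the web server or CDN configuration.

```python
import http.server
import os
from typing import Optional

# Hypothetical set of non-HTML extensions that cannot carry a <meta name="robots"> tag.
NOINDEX_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".docx", ".xlsx"}


def x_robots_tag_for(path: str) -> Optional[str]:
    """Return the X-Robots-Tag value to send for a request path, or None for no header."""
    # Ignore any query string before reading the file extension.
    clean_path = path.split("?", 1)[0]
    extension = os.path.splitext(clean_path)[1].lower()
    return "noindex" if extension in NOINDEX_EXTENSIONS else None


class NoIndexFilesHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds X-Robots-Tag to matching non-HTML files."""

    def end_headers(self) -> None:
        # Runs just before the blank line that terminates the response headers.
        value = x_robots_tag_for(self.path)
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()
```

To try it locally, run `http.server.ThreadingHTTPServer(("", 8000), NoIndexFilesHandler).serve_forever()` from the directory you want to serve. HTML pages get no header here because they can carry a meta robots tag instead.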
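To block AI training crawlers selectively, use robots.txt user-agent groups: add a 'User-agent' line followed by 'Disallow: /' for each of GPTBot, ClaudeBot, and Google-Extended. X-Robots-Tag can complement these rules only for crawlers that actually fetch a URL; a crawler disallowed in robots.txt never requests the URL, so it never sees the header. Google-Extended is a robots.txt control token rather than a crawler with its own HTTP user agent, so it is controlled through robots.txt.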
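The sketch below builds one robots.txt group per blocked token and checks the result. It verifies that the three tokens are disallowed while Googlebot, which Search relies on, is still allowed. The check uses Python's standard-library `urllib.robotparser`, which is simpler than Google's own parser, so treat it as a sanity check only. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text.
AI_TRAINING_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Emit a 'User-agent / Disallow: /' group per agent, then an allow-all default group."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    # An empty Disallow value allows everything for all other crawlers.
    groups.append("User-agent: *\nDisallow:")
    return "\n\n".join(groups) + "\n"


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the stdlib parser whether `agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Writing `build_robots_txt(AI_TRAINING_AGENTS)` to `/robots.txt` yields three blocking groups followed by a permissive default group.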
Use 'X-Robots-Tag: nosnippet' to prevent Google from showing a text snippet or video preview for a URL in search results. The URL can still be crawled and indexed, because nosnippet only affects how the result is displayed.
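To show that nosnippet and noindex are separate switches, here is a simplified sketch that collects directive names from one or more X-Robots-Tag values. The function names are hypothetical. The parser deliberately ignores user-agent prefixes (such as "googlebot: noindex") and parameterized directives (such as max-snippet), so it is not a complete model of Google's rules.

```python
# Simplified: ignores user-agent prefixes and directive parameters.
def parse_directives(header_values: list[str]) -> set[str]:
    """Collect lowercase directive names from one or more X-Robots-Tag header values."""
    directives: set[str] = set()
    for value in header_values:
        for part in value.split(","):
            name = part.strip().lower()
            if name:
                directives.add(name)
    return directives


def search_treatment(header_values: list[str]) -> dict[str, bool]:
    """Report whether a URL stays indexable and whether its snippet may be shown."""
    directives = parse_directives(header_values)
    return {
        # "none" is shorthand for "noindex, nofollow".
        "indexable": not ({"noindex", "none"} & directives),
        "snippet_shown": "nosnippet" not in directives,
    }
```

For a response that sends only `X-Robots-Tag: nosnippet`, the sketch reports the URL as indexable with its snippet hidden.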
Set 'X-Robots-Tag: noindex, nofollow' at the web server or CDN level for staging environments to prevent accidental indexing of dev sites
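If a staging site is a Python WSGI app rather than something fronted by a configurable server or CDN, the same header can be applied in middleware. The sketch below assumes a hypothetical `SITE_ENV` environment variable set to "staging" on non-production deployments. The class and function names are illustrative.

```python
import os
from typing import Any, Callable, Iterable

# Hypothetical environment variable; set SITE_ENV=staging only on staging/dev deployments.
STAGING_FLAG = "SITE_ENV"


class NoIndexEverythingMiddleware:
    """WSGI middleware that adds 'X-Robots-Tag: noindex, nofollow' to every response."""

    def __init__(self, app: Callable[..., Iterable[bytes]]) -> None:
        self.app = app

    def __call__(self, environ: dict[str, Any], start_response: Callable[..., Any]) -> Iterable[bytes]:
        def add_header(status: str, headers: list[tuple[str, str]], exc_info: Any = None) -> Any:
            # Drop any X-Robots-Tag the app set, so staging never sends a permissive value.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex, nofollow"))
            return start_response(status, headers, exc_info)

        return self.app(environ, add_header)


def wrap_for_environment(app: Callable[..., Iterable[bytes]]) -> Callable[..., Iterable[bytes]]:
    # Wrap only when the deployment declares itself as staging.
    if os.environ.get(STAGING_FLAG) == "staging":
        return NoIndexEverythingMiddleware(app)
    return app
```

Deciding at startup from an environment variable keeps one codebase for all environments while guaranteeing that only non-production deployments send the blanket noindex.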
Combine X-Robots-Tag with a robots.txt 'Disallow' carefully: if a URL is disallowed, Google cannot fetch it and therefore cannot read a noindex in either the meta tag or the X-Robots-Tag header. Use X-Robots-Tag only on URLs Googlebot is allowed to crawl.
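A quick way to catch this conflict is to check every URL you intend to noindex against your robots.txt. The sketch below assumes you already have such a list of URLs. It uses the standard-library `urllib.robotparser`, which treats path rules as plain prefixes and applies the first matching rule, whereas Google uses most-specific matching and supports wildcards. Confirm edge cases with Google's own tooling. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def noindex_is_visible(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """True if the crawler may fetch `url`, and so could read a noindex header or meta tag on it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def find_conflicts(robots_txt: str, noindexed_urls: list[str]) -> list[str]:
    """Return noindexed URLs that robots.txt disallows, so their noindex would never be seen."""
    return [url for url in noindexed_urls if not noindex_is_visible(robots_txt, url)]
```

Any URL returned by `find_conflicts` needs either its Disallow rule removed (so the noindex can be read) or the noindex dropped.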
Verify the header is being sent using 'curl -I {url}' and confirm the value appears in the response headers before relying on it for index control
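Note that `curl -I` sends a HEAD request, and some servers answer HEAD with different headers than GET. The standard-library sketch below issues a GET instead and returns every X-Robots-Tag value, since a response may carry the header more than once. The function names are hypothetical.

```python
import urllib.request
from email.message import Message


def x_robots_values(headers: Message) -> list[str]:
    """All X-Robots-Tag values in a response; header lookup is case-insensitive."""
    return headers.get_all("X-Robots-Tag") or []


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> list[str]:
    """Fetch `url` with GET and return its X-Robots-Tag values."""
    # urlopen follows redirects, so the values come from the final response.
    request = urllib.request.Request(url, method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return x_robots_values(response.headers)
```

`urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses; the exception's `.headers` attribute still exposes the response headers if you need to inspect an error page.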
Known gotchas
A URL blocked by a robots.txt Disallow cannot be noindexed via X-Robots-Tag, because Googlebot will not fetch the URL to read the header. Google may even keep such a URL indexed, without its content, if other pages link to it. Use Disallow only to reduce crawl load, and use noindex (via the header, or a meta tag on a crawlable page) to remove a URL from the index.
X-Robots-Tag is advisory: well-behaved crawlers respect it, but rogue or non-compliant bots may ignore it. It is not a substitute for access control, such as authentication, on sensitive content.
Google-Extended controls Google's use of your content for AI model training separately from Googlebot's use for Search; blocking Google-Extended does not affect Search indexing