Locate the merchant's product feed URL from their robots.txt, sitemap index, or Google Merchant Center public endpoint; prefer XML/JSON feed formats over HTML scraping.
Fetch the feed with a descriptive User-Agent that identifies your agent, honor Cache-Control, and send the stored ETag in an If-None-Match header so an unchanged feed returns 304 Not Modified instead of a redundant download.
Parse the feed schema: Google Merchant Center XML feeds use elements with the 'g:' namespace prefix (e.g., g:id, g:price, g:availability, g:condition); normalize these into your internal product model.
Handle pagination tokens or next-page links present in large feeds; many merchants paginate at 1000–5000 items per page.
Validate required fields (id, title, description, link, image_link, price, availability) and log products with missing mandatory attributes for manual review.
Store a feed snapshot with an ingestion timestamp so change detection can diff against the previous snapshot on the next run.
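The parsing and validation steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes an RSS 2.0 style feed whose products are `<item>` elements, and the names `parse_items`, `missing_fields`, `REQUIRED`, and `SAMPLE_FEED` are hypothetical helpers invented for this example. Fetching, pagination, and snapshot storage are left out.

```python
import xml.etree.ElementTree as ET

# Mandatory attributes from the validation step (names without the g: prefix).
REQUIRED = ("id", "title", "description", "link", "image_link", "price", "availability")


def parse_items(xml_text):
    """Return one dict per <item>, keyed by field name without the g: prefix."""
    root = ET.fromstring(xml_text)
    products = []
    for item in root.iter("item"):
        product = {}
        for child in item:
            # ElementTree spells namespaced tags as "{namespace-uri}local";
            # keep only the local part so g:price and price map to the same key.
            name = child.tag.split("}", 1)[-1]
            product[name] = (child.text or "").strip()
        products.append(product)
    return products


def missing_fields(product):
    """List the required fields that are absent or empty for one product."""
    return [field for field in REQUIRED if not product.get(field)]


# Hypothetical two-item feed used only to exercise the functions above.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>SKU-1</g:id>
      <g:title>Blue mug</g:title>
      <g:description>A ceramic mug.</g:description>
      <g:link>https://shop.example/mug</g:link>
      <g:image_link>https://shop.example/mug.jpg</g:image_link>
      <g:price>12.50 USD</g:price>
      <g:availability>in_stock</g:availability>
    </item>
    <item>
      <g:id>SKU-2</g:id>
      <g:title>Red mug</g:title>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for product in parse_items(SAMPLE_FEED):
        gaps = missing_fields(product)
        if gaps:
            # In a real pipeline this would go to a review queue or log.
            print(f"{product.get('id', '?')}: missing {', '.join(gaps)}")
```

Stripping the namespace rather than matching on it is a deliberate simplification: some feeds emit title, link, and description without the prefix, and this way both spellings land on the same key.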
Known gotchas
Feed URLs are often gated behind OAuth or API keys; never assume a feed is open to unauthenticated access just because the site shows no login page.
Price fields in feeds frequently omit currency or use locale-specific decimal separators; always parse the currency code separately and normalize to a canonical format before comparison.
Feeds can lag real-time inventory by hours; treat feed availability as a hint, not a guarantee, and confirm stock before committing to a purchase.
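One way to approach the price gotcha is sketched below. It is a heuristic under stated assumptions, not a complete locale parser: `parse_price` and its `default_currency` parameter are hypothetical names, the currency is assumed to appear as a three-letter uppercase code when present, and a lone separator followed by exactly three digits is assumed to be a thousands separator (which would misread three-decimal currencies).

```python
import re
from decimal import Decimal


def parse_price(raw, default_currency=None):
    """Split a feed price such as '1.299,00 EUR' into (Decimal amount, currency)."""
    text = raw.strip()

    # Pull the currency code out separately from the number.
    code = re.search(r"(?<![A-Za-z])[A-Z]{3}(?![A-Za-z])", text)
    currency = code.group(0) if code else default_currency
    if currency is None:
        raise ValueError(f"no currency in {raw!r} and no default given")

    # Keep only digits and separators, dropping stray leading/trailing punctuation.
    number = re.sub(r"[^0-9.,]", "", text).strip(".,")
    last = max(number.rfind("."), number.rfind(","))
    if last == -1:
        whole, frac = number, ""
    else:
        sep = number[last]
        other = "," if sep == "." else "."
        tail = number[last + 1:]
        # Assumption: with only one kind of separator, repeated use or a
        # three-digit tail means grouping ("1,299"), not a decimal point.
        grouping_only = other not in number and (number.count(sep) > 1 or len(tail) == 3)
        if grouping_only:
            whole, frac = number, ""
        else:
            whole, frac = number[:last], tail

    whole = re.sub(r"[.,]", "", whole)
    if not whole and not frac:
        raise ValueError(f"no numeric amount in {raw!r}")
    return Decimal(f"{whole or '0'}.{frac or '0'}"), currency


if __name__ == "__main__":
    for sample in ("12.50 USD", "1.299,00 EUR", "1,234,567.89 USD"):
        print(sample, "->", parse_price(sample))
```

Using `Decimal` rather than `float` keeps comparisons between snapshots exact, and raising when no currency can be determined avoids silently comparing amounts in different currencies.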