Request the product detail page (PDP) with a well-identified User-Agent and check first for embedded structured data, either JSON-LD (usually a schema.org Product object) or Open Graph meta tags, because these are cheaper to parse than a full DOM traversal.
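A minimal sketch of this first step using only Python's standard library. The User-Agent string and bot URL are hypothetical placeholders. The parser collects every JSON-LD block plus any `og:` or `product:` meta properties. Merchants differ in which Open Graph property names they emit (for example `product:price:amount`), so treat those names as an assumption. `fetch()` is included for completeness; the parsing functions work on any HTML string.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical identifying User-Agent: bot name, version, and a contact URL.
USER_AGENT = "ExamplePriceBot/1.0 (+https://example.com/bot-info)"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a PDP with an explicit User-Agent and decode it using the declared charset."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class StructuredDataCollector(HTMLParser):
    """Collects raw JSON-LD script bodies and og:/product: meta properties."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.meta = {}
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._in_jsonld, self._buf = True, []
        elif tag == "meta":
            prop = a.get("property", "")
            if prop.startswith(("og:", "product:")) and prop not in self.meta:
                self.meta[prop] = a.get("content", "")

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)  # script text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buf))
            self._in_jsonld = False


def _walk(node):
    """Yield every dict in a JSON-LD document, including nested and @graph members."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def extract_structured(html: str):
    """Return (list of schema.org Product dicts, dict of Open Graph meta properties)."""
    parser = StructuredDataCollector()
    parser.feed(html)
    parser.close()
    products = []
    for raw in parser.jsonld_blocks:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip that block
        for node in _walk(doc):
            types = node.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Product" in types:
                products.append(node)
    return products, parser.meta
```

If `extract_structured()` returns an empty product list and no useful meta properties, that is the signal to fall back to DOM parsing.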
If no structured markup is present, parse the HTML DOM to locate the price element (commonly identified by CSS classes, data attributes, or itemprop values), title (h1 or product title element), and availability indicator.
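A stdlib-only sketch of the DOM fallback, limited to the itemprop (microdata) case plus the first plain `<h1>` as a title candidate. Matching by CSS class or data attribute is site-specific and usually done with a selector library such as BeautifulSoup's `select()`; that part is not shown here. The set of field names is an assumption about what a typical PDP marks up.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataFallback(HTMLParser):
    """Collect the first value of selected itemprop fields, plus the first plain <h1>."""

    WANTED = {"name", "price", "priceCurrency", "availability", "h1"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._field = None  # field whose text content is being collected
        self._depth = 0     # open elements inside that field's element
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        a = {k: (v or "") for k, v in attrs}
        prop = a.get("itemprop") or ("h1" if tag == "h1" else None)
        if prop not in self.WANTED or prop in self.fields:
            return
        if "content" in a:                   # e.g. <meta itemprop="price" content="89.99">
            self.fields[prop] = a["content"].strip()
        elif tag == "link" and "href" in a:  # e.g. <link itemprop="availability" href="...">
            self.fields[prop] = a["href"].strip()
        elif tag not in VOID_TAGS:           # otherwise read the element's visible text
            self._field, self._depth, self._text = prop, 1, []

    def handle_endtag(self, tag):
        if self._field is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace in the captured text.
            self.fields[self._field] = " ".join("".join(self._text).split())
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._text.append(data)


def extract_fallback(html: str) -> dict:
    parser = MicrodataFallback()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Preferring the `content` or `href` attribute over visible text matters for prices: the attribute usually holds a machine-readable value ("89.99") while the text holds a formatted one ("$89.99").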
For variant-based products (size, color, etc.), identify the variant selector elements. Note that selected-variant data is often injected by JavaScript into window.__INITIAL_STATE__ or a similar global variable; extract this JSON blob from the page's inline script tags.
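A minimal sketch of pulling that blob out of the raw HTML. It assumes the right-hand side of the assignment is a plain JSON literal. Pages that assign a JavaScript object literal (unquoted keys, `undefined`, trailing commas) or wrap the data in `JSON.parse("...")` will not decode and return `None`. The variable name is a parameter because sites use many different globals.

```python
import json
import re
from typing import Any, Optional


def extract_js_state(html: str, var_name: str = "__INITIAL_STATE__") -> Optional[Any]:
    """Return the JSON value assigned to window.<var_name>, or None if absent or not JSON."""
    assignment = re.compile(r"window\." + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for match in assignment.finditer(html):
        try:
            # raw_decode parses one JSON value and ignores what follows (e.g. ";</script>").
            value, _consumed = decoder.raw_decode(html[match.end():])
        except json.JSONDecodeError:
            continue  # not plain JSON at this occurrence; try the next one
        return value
    return None
```

The shape of the returned object is entirely site-specific, so key paths such as a product's variant list have to be mapped per merchant.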
Normalize extracted price strings: strip currency symbols, handle locale-specific thousands separators and decimal points, and pair with a currency code inferred from the page locale or URL TLD.
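A sketch of the normalization step. When no locale hint is supplied, the decimal mark is guessed from separator positions: if both "." and "," appear, the later one is the decimal mark; a lone separator followed by one or two digits is treated as decimal. That guess is ambiguous for strings like "1.234", so passing `decimal_mark` from the page locale is safer. The TLD table is a hypothetical, deliberately incomplete example, and preferring an explicit currency code (for instance from structured data) over the TLD is an added assumption.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, incomplete TLD -> ISO 4217 mapping; .com and similar give no signal.
CURRENCY_BY_TLD = {"co.uk": "GBP", "de": "EUR", "fr": "EUR", "jp": "JPY", "ca": "CAD"}


def _guess_decimal_mark(s: str) -> Optional[str]:
    """Guess the decimal mark from separator positions when no locale hint is given."""
    last_dot, last_comma = s.rfind("."), s.rfind(",")
    if last_dot != -1 and last_comma != -1:
        return "." if last_dot > last_comma else ","  # the later separator is decimal
    sep = "." if last_dot != -1 else ("," if last_comma != -1 else None)
    if sep is None:
        return None
    # A lone separator followed by 1-2 digits is decimal ("12,50");
    # anything else ("1,234" or "1.234.567") is treated as thousands grouping.
    tail = s[s.rfind(sep) + 1:]
    return sep if s.count(sep) == 1 and len(tail) in (1, 2) else None


def parse_price(raw: str, decimal_mark: Optional[str] = None) -> Decimal:
    """Turn a display price like '1.234,56 €' into Decimal('1234.56')."""
    s = re.sub(r"[^0-9.,]", "", raw)  # drop symbols, letters, spaces, apostrophes
    if not re.search(r"[0-9]", s):
        raise ValueError(f"no digits in price string: {raw!r}")
    mark = decimal_mark if decimal_mark is not None else _guess_decimal_mark(s)
    for sep in ",.":
        if sep != mark:
            s = s.replace(sep, "")  # strip thousands separators
    if mark:
        s = s.replace(mark, ".")
    try:
        return Decimal(s)
    except InvalidOperation as exc:
        raise ValueError(f"unparseable price string: {raw!r}") from exc


def infer_currency(url: str, explicit: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit currency code; otherwise fall back to the URL's TLD."""
    if explicit:
        return explicit.strip().upper()
    host = (urlsplit(url).hostname or "").lower()
    for suffix, code in CURRENCY_BY_TLD.items():
        if host.endswith("." + suffix):
            return code
    return None
```

`Decimal` is used instead of `float` so that values like 0.10 are stored exactly. A `None` from `infer_currency()` should be treated as "unknown" rather than defaulted silently.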
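Validate the extracted data for internal consistency: the price should be a positive number, availability should map to a known state, and the product title should plausibly match the URL slug or breadcrumb (for example, share most of its words). That last check helps detect off-target extraction, such as picking up a recommended product's title instead of the main one.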
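A sketch of these consistency checks. The availability table, the 0.5 word-overlap threshold, and the rule for picking the slug (the path segment with the most alphabetic words, since product IDs often sit in their own segment) are all illustrative assumptions. The tokenizer is ASCII-only, so non-English titles would need a better one.

```python
import re
from decimal import Decimal
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical normalization table for availability strings seen on PDPs.
AVAILABILITY = {
    "instock": "in_stock", "in stock": "in_stock",
    "outofstock": "out_of_stock", "out of stock": "out_of_stock", "soldout": "out_of_stock",
    "preorder": "preorder", "backorder": "backorder",
}


def normalize_availability(raw: str) -> Optional[str]:
    """Map a raw availability string (including schema.org URLs) to a known state."""
    key = raw.strip().lower()
    for prefix in ("https://schema.org/", "http://schema.org/"):
        if key.startswith(prefix):
            key = key[len(prefix):]
    return AVAILABILITY.get(key)


def _words(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def title_matches_url(title: str, url: str, min_overlap: float = 0.5) -> bool:
    """True if enough of the URL slug's words also appear in the title."""
    segments = [seg for seg in urlsplit(url).path.split("/") if seg]
    slug_words = max((_words(seg) for seg in segments), key=len, default=set())
    if not slug_words:
        return True  # nothing to compare against; don't flag
    return len(slug_words & _words(title)) / len(slug_words) >= min_overlap


def validate(record: dict, url: str) -> list:
    """Return a list of consistency problems; an empty list means the record looks sane."""
    problems = []
    price = record.get("price")
    if not isinstance(price, Decimal) or not price.is_finite() or price <= 0:
        problems.append("price missing or not positive")
    if normalize_availability(record.get("availability") or "") is None:
        problems.append("unknown availability")
    title = record.get("title") or ""
    if not title.strip():
        problems.append("missing title")
    elif not title_matches_url(title, url):
        problems.append("title does not resemble URL slug")
    return problems
```

Returning a list of named problems, rather than a single boolean, makes it easier to see which selector broke when a record is rejected.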
Version your extraction selectors. When a merchant redesigns their PDP, selectors break silently, so add a health check that flags extractions with suspiciously empty or null fields for manual review.
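One way to sketch this: keep each merchant's selectors as a versioned record, stamp every extracted record with the version that produced it, and open a review ticket when required fields come back empty. The `SelectorSet` structure, the required-field list, and the example selector strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

REQUIRED_FIELDS = ("title", "price", "availability")


@dataclass
class SelectorSet:
    """One versioned set of selectors for one merchant; bump `version` on every change."""
    merchant: str
    version: int
    selectors: Dict[str, str]  # field name -> CSS selector (values are illustrative)


def stamp(record: dict, sel: SelectorSet) -> dict:
    """Attach the selector version that produced a record so breakages can be traced."""
    return {**record, "selector_version": f"{sel.merchant}@v{sel.version}"}


def _is_empty(value) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def review_flag(record: dict, sel: SelectorSet) -> Optional[dict]:
    """Return a review ticket if any required field came back empty, else None."""
    empty = [name for name in REQUIRED_FIELDS if _is_empty(record.get(name))]
    if not empty:
        return None
    return {"merchant": sel.merchant, "selector_version": sel.version, "empty_fields": empty}
```

Because each stored record carries its selector version, a sudden run of flags can be traced to the exact selector change (or merchant redesign) that caused it.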
Known gotchas
Client-side rendered PDPs (React, Vue, etc.) may return an empty or skeleton HTML document to a plain HTTP fetch; use a headless browser for these pages and be aware this is a heavier and more detectable operation.
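Because headless rendering is costly, it can help to escalate only when the plain fetch looks like an empty shell. The heuristic below is an illustrative assumption, not a reliable detector. It counts non-whitespace text outside script-like elements and treats fewer than 200 characters as a skeleton, unless the caller already found usable embedded data (for example JSON-LD or a window.__INITIAL_STATE__ blob), in which case rendering may be unnecessary.

```python
from html.parser import HTMLParser

NON_VISIBLE = {"script", "style", "noscript", "template"}


class _VisibleTextCounter(HTMLParser):
    """Counts non-whitespace text characters outside script/style-like elements."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in NON_VISIBLE:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_VISIBLE and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.chars += len("".join(data.split()))


def looks_like_skeleton(html: str, min_text_chars: int = 200,
                        has_embedded_data: bool = False) -> bool:
    """Heuristic: little visible text and no usable embedded data suggests a CSR shell."""
    if has_embedded_data:
        return False  # structured data or a state blob may make rendering unnecessary
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars < min_text_chars
```

The threshold should be tuned per merchant; some legitimate server-rendered PDPs have very little text.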
Prices on PDPs can be personalized (logged-in discounts, geo-pricing) or A/B tested; extracted prices may not match what a different user session would see.
Extraction selectors are brittle by nature — a CSS class rename during a merchant's A/B test or redesign will silently break data collection; monitor extraction success rates and alert on degradation.
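A sketch of success-rate monitoring with a rolling window per merchant. The window size, the 0.9 threshold, and the minimum sample count are illustrative defaults. What counts as "success" is up to the pipeline; one option is a record that passes the consistency checks described earlier.

```python
from collections import deque
from typing import Deque, Dict, Optional


class SuccessRateMonitor:
    """Rolling per-merchant extraction success rate with a simple degradation check."""

    def __init__(self, window: int = 200, threshold: float = 0.9, min_samples: int = 50):
        self.window = window
        self.threshold = threshold
        self.min_samples = min_samples  # avoid alerting on a handful of early failures
        self._results: Dict[str, Deque[bool]] = {}

    def record(self, merchant: str, ok: bool) -> bool:
        """Record one extraction outcome; return True if the merchant now looks degraded."""
        results = self._results.setdefault(merchant, deque(maxlen=self.window))
        results.append(bool(ok))
        return self.is_degraded(merchant)

    def rate(self, merchant: str) -> Optional[float]:
        results = self._results.get(merchant)
        return sum(results) / len(results) if results else None

    def is_degraded(self, merchant: str) -> bool:
        results = self._results.get(merchant)
        if not results or len(results) < self.min_samples:
            return False
        return sum(results) / len(results) < self.threshold
```

`record()` keeps returning True for as long as the rate stays low. A caller that sends alerts should therefore fire only on the False-to-True transition to avoid paging on every request.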