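Fetch a company's submissions JSON from https://data.sec.gov/submissions/CIK{10-digit-CIK}.json (the CIK is zero-padded to 10 digits, and SEC expects automated requests to send a descriptive User-Agent header). The filings.recent object holds parallel arrays rather than an array of filing records, so find the positions where form is "10-K" and read the accession number for the target filing from the same position in accessionNumber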
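Retrieve the filing index by constructing the URL https://www.sec.gov/Archives/edgar/data/{CIK}/{accession-no-dashes}/ (here the CIK is conventionally written without leading zeros) and fetching the index JSON (index.json under that path) or HTML to identify the primary iXBRL document, typically the .htm file listed as the 10-K document; the primaryDocument field in the submissions JSON usually names the same file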
Download the primary iXBRL HTML document. It embeds ix:nonNumeric and ix:nonFraction tags within standard HTML; parse these tags to extract the tagged XBRL facts without needing a separate XBRL instance document
Alternatively, use the pre-extracted companyfacts endpoint (GET https://data.sec.gov/api/xbrl/companyfacts/CIK{10-digit-CIK}.json), which returns the company's tagged facts across its filings as normalized JSON, so no raw iXBRL parsing is needed; filter on each fact's accn field to isolate a single filing
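Map extracted facts to their us-gaap or dei taxonomy concepts: in companyfacts the taxonomy and tag are nested keys, while in iXBRL they are the prefix and local name of each fact's name attribute (e.g., us-gaap:Revenues). Resolve the unitRef (e.g., USD, shares) and contextRef identifiers against the unit and context definitions in the document's ix:header to get the unit and reporting period, and apply any scale and sign attributes so each value is interpreted correctly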
Validate key figures against the SEC's Inline XBRL viewer on sec.gov to confirm your parser is interpreting the iXBRL namespace and context references correctly before running at scale
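The first steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive client: the helper names (fetch_json, find_filings, filing_dir_url and so on), the USER_AGENT placeholder, the CIK 1234567 and the accession numbers in the sample are all made up for the example. The sample dictionary mimics only the few filings.recent fields the sketch reads.

```python
import json
import urllib.request

# Assumption: placeholder identity string; replace with your own contact details.
USER_AGENT = "example-app admin@example.com"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (network call, not run in this sketch)."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submissions_url(cik: int) -> str:
    # data.sec.gov endpoints use the CIK zero-padded to 10 digits.
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def companyfacts_url(cik: int) -> str:
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def find_filings(submissions: dict, form: str = "10-K") -> list[dict]:
    """Turn the parallel filings.recent arrays into rows and keep one form type."""
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["accessionNumber"],
               recent["filingDate"], recent["primaryDocument"])
    return [
        {"accession": acc, "filed": filed, "primary_document": doc}
        for f, acc, filed, doc in rows
        if f == form
    ]


def filing_dir_url(cik: int, accession: str) -> str:
    # Archive paths use the accession number with its dashes removed.
    return (f"https://www.sec.gov/Archives/edgar/data/"
            f"{cik}/{accession.replace('-', '')}/")


# Made-up sample shaped like the submissions JSON (values are not real filings).
sample = {"filings": {"recent": {
    "form": ["10-Q", "10-K"],
    "accessionNumber": ["0000000000-24-000001", "0000000000-23-000002"],
    "filingDate": ["2024-05-01", "2024-02-15"],
    "primaryDocument": ["q1.htm", "annual.htm"],
}}}

for row in find_filings(sample):
    base = filing_dir_url(1234567, row["accession"])
    print(base + "index.json", base + row["primary_document"])
```

Against the live service, fetch_json(submissions_url(cik)) would supply the dictionary that find_filings expects, and fetch_json(companyfacts_url(cik)) covers the alternative route that skips iXBRL parsing.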
Known gotchas
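iXBRL files embed XBRL facts inside HTML. Plain text extraction keeps the visible numbers but silently drops the XBRL tagging, and HTML parsers typically lowercase element and attribute names (ix:nonFraction becomes ix:nonfraction) and ignore namespaces; use an iXBRL-aware parser or the pre-extracted companyfacts API instead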
The same financial value may appear tagged multiple times in a filing due to footnotes or restated periods; always resolve each fact's contextRef to its period dates and use those dates to select the correct reporting period
Large 10-K filings with many exhibits can exceed 100 MB; do not load the full filing into memory, but stream and parse it incrementally
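The gotchas above can be illustrated with a small sketch built on Python's standard html.parser, which accepts input in chunks and lowercases tag and attribute names. It is a simplified illustration under several assumptions: the class and function names are hypothetical, contexts are assumed to use the conventional xbrli: prefix, only ix:nonFraction is handled, and numbers are assumed to be comma-grouped with a dot decimal (real filings declare the number format in a format attribute that this sketch ignores). The sample document and its values are made up.

```python
from decimal import Decimal
from html.parser import HTMLParser


class IxFactParser(HTMLParser):
    """Collect ix:nonFraction facts and the period dates of each context."""

    def __init__(self):
        super().__init__()
        self.contexts = {}     # context id -> {"startdate"/"enddate"/"instant": str}
        self.facts = []
        self._ctx_id = None    # id of the context element currently open
        self._date_key = None  # which date element we are inside, if any
        self._fact = None      # fact currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # names arrive lowercased, e.g. contextref
        if tag == "xbrli:context":
            self._ctx_id = a.get("id")
            self.contexts[self._ctx_id] = {}
        elif tag in ("xbrli:startdate", "xbrli:enddate", "xbrli:instant") and self._ctx_id:
            self._date_key = tag.split(":")[1]
        elif tag == "ix:nonfraction":
            self._fact = {"name": a.get("name"), "context": a.get("contextref"),
                          "unit": a.get("unitref"), "scale": int(a.get("scale") or 0),
                          "negative": a.get("sign") == "-", "text": ""}

    def handle_data(self, data):
        # Text may arrive in pieces when the input is fed in chunks, so accumulate.
        if self._date_key:
            ctx = self.contexts[self._ctx_id]
            ctx[self._date_key] = ctx.get(self._date_key, "") + data.strip()
        elif self._fact is not None:
            self._fact["text"] += data

    def handle_endtag(self, tag):
        if tag == "xbrli:context":
            self._ctx_id = None
        elif tag.startswith("xbrli:"):
            self._date_key = None
        elif tag == "ix:nonfraction" and self._fact is not None:
            self.facts.append(self._fact)
            self._fact = None


def fact_value(fact):
    """Apply scale and sign; assumes comma-grouped, dot-decimal text."""
    number = Decimal(fact["text"].replace(",", "").strip())
    value = number * Decimal(10) ** fact["scale"]
    return -value if fact["negative"] else value


def select(parser, name, end_date):
    """Values of one concept whose context ends on, or is instant at, end_date."""
    out = []
    for fact in parser.facts:
        period = parser.contexts.get(fact["context"], {})
        if fact["name"] == name and end_date in (period.get("enddate"), period.get("instant")):
            out.append(fact_value(fact))
    return out


SAMPLE = """<html><body><ix:header><ix:resources>
<xbrli:context id="c1"><xbrli:period><xbrli:startDate>2023-01-01</xbrli:startDate>
<xbrli:endDate>2023-12-31</xbrli:endDate></xbrli:period></xbrli:context>
<xbrli:context id="c2"><xbrli:period><xbrli:startDate>2022-01-01</xbrli:startDate>
<xbrli:endDate>2022-12-31</xbrli:endDate></xbrli:period></xbrli:context>
</ix:resources></ix:header>
<p><ix:nonFraction name="us-gaap:Revenues" contextRef="c1" unitRef="usd" scale="6">1,234</ix:nonFraction>
<ix:nonFraction name="us-gaap:Revenues" contextRef="c2" unitRef="usd" scale="6">1,100</ix:nonFraction></p>
</body></html>"""

parser = IxFactParser()
for i in range(0, len(SAMPLE), 64):  # feed small chunks, as when streaming a download
    parser.feed(SAMPLE[i:i + 64])
parser.close()
print(select(parser, "us-gaap:Revenues", "2023-12-31"))
```

Feeding the parser in chunks keeps memory use bounded by the chunk size plus the collected facts, and selecting by resolved period dates rather than by position is what separates the current-year figure from the prior-year comparative. For production use, a dedicated iXBRL library or the companyfacts API is likely the safer choice.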