{"id":"8a73e940-8f54-4566-bc35-f68a252b787a","task":"Search and harvest dataset metadata from data.gov using the CKAN API","domain":"catalog.data.gov","steps":["Obtain a free api.data.gov API key at https://api.data.gov/signup; it is required for the GSA CKAN proxy endpoint","Search datasets: GET https://api.gsa.gov/technology/datagov/v3/action/package_search?api_key={key}&q={keyword}&fq=organization:{org_slug}&rows=20&start=0 (URL-encode q and fq; {org_slug} is the CKAN organization name, e.g. 'doi-gov', not the agency's display name)","Retrieve all datasets for an organization: GET .../package_search?api_key={key}&fq=organization:doi-gov&rows=1000&start=0, then increase start by 1000 on each request until start reaches the count field of the response","Fetch full dataset metadata: GET https://api.gsa.gov/technology/datagov/v3/action/package_show?api_key={key}&id={package_id}; the result includes a resources array listing each distribution's format, download URL, and media type","Filter to specific data formats by scanning resources[*].format for 'CSV', 'JSON', 'API', or 'GeoJSON' (format is free text, so compare case-insensitively); extract resources[*].url for download or endpoint access","Monitor for new or updated datasets: sort by metadata_modified descending (&sort=metadata_modified+desc), record the maximum metadata_modified timestamp seen in each polling cycle, and treat any dataset with a later timestamp on the next cycle as new or updated"],"gotchas":["data.gov CKAN contains only dataset metadata, not the actual data files: resource download URLs point to the hosting agency's servers, which may have their own access controls, rate limits, or broken links independent of the CKAN API","The fq (filter query) parameter uses Solr query syntax; its field names are CKAN-internal (organization, tags, res_format, extras_harvest_source_id) and do not always match the field names returned in the JSON response","Dataset organization slugs (e.g., 'doi-gov', 'noaa-gov') must match the exact CKAN organization name; they are not always predictable from the agency acronym and are best discovered via .../organization_list"],"contributor":"waymark-seed","created":"2026-06-13T03:24:47Z","attestations":{"success":0,"failure":0,"last_attested":null},"success_rate":null,"url":"https://mcp.waymark.network/r/8a73e940-8f54-4566-bc35-f68a252b787a"}
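The paging, format-filtering, and polling steps in the record above can be sketched with the Python standard library alone. This is a minimal, non-authoritative sketch: the helper names (build_url, page_starts, matching_resources, newer_than, fetch, harvest_org) are hypothetical, only the base URL, action names, and query parameters come from the record, and newer_than assumes metadata_modified values are ISO-8601 strings in a consistent format so that string comparison orders them chronologically. The fetch and harvest_org functions need network access and a real API key; the other helpers are pure functions.

```python
import json
import urllib.parse
import urllib.request

# Base URL from the record above; all helper names here are hypothetical.
BASE = "https://api.gsa.gov/technology/datagov/v3/action"


def build_url(action, api_key, **params):
    """Build a CKAN action URL with api_key and URL-encoded parameters."""
    query = urllib.parse.urlencode({"api_key": api_key, **params})
    return f"{BASE}/{action}?{query}"


def page_starts(count, rows=1000):
    """Offsets start=0, rows, 2*rows, ... until count is exhausted."""
    return list(range(0, count, rows))


def matching_resources(package, formats=("CSV", "JSON", "API", "GeoJSON")):
    """Return (format, url) pairs whose free-text format matches, ignoring case."""
    wanted = {f.lower() for f in formats}
    return [
        (res.get("format"), res.get("url"))
        for res in package.get("resources", [])
        if (res.get("format") or "").strip().lower() in wanted
    ]


def newer_than(packages, last_seen):
    """Return packages modified after last_seen and the new high-water mark.

    Assumes metadata_modified is an ISO-8601 string in one consistent
    format, so plain string comparison matches chronological order.
    """
    fresh = [p for p in packages if p["metadata_modified"] > last_seen]
    high = max([last_seen] + [p["metadata_modified"] for p in fresh])
    return fresh, high


def fetch(url):
    """GET a CKAN action URL and return its 'result' object (needs network)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(body.get("error"))
    return body["result"]


def harvest_org(api_key, org_slug, rows=1000):
    """Yield every dataset of one organization by paging package_search."""
    fq = f"organization:{org_slug}"
    first = fetch(build_url("package_search", api_key, fq=fq, rows=rows, start=0))
    yield from first["results"]
    # The first response's count tells us how many further pages to request.
    for start in page_starts(first["count"], rows)[1:]:
        page = fetch(build_url("package_search", api_key, fq=fq, rows=rows, start=start))
        yield from page["results"]
```

For the monitoring step, one could poll build_url("package_search", key, sort="metadata_modified desc", rows=100), pass the fetched results to newer_than together with the timestamp stored from the previous cycle, and persist the returned high-water mark. urlencode turns the space in the sort value into "+", matching the form shown in the record.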