Define matching keys in priority order: NPI (definitive unique identifier), tax ID + last name + DOB, DEA number + state, and CAQH ID; exact NPI match should always take precedence.
Apply blocking to limit comparison pairs before fuzzy matching: group records by NPI prefix or taxonomy code to avoid O(n²) comparisons across the full provider population.
Run deterministic matching first (exact NPI match, or tax ID + last name + DOB, since a tax ID alone can be shared by every provider in a group practice), then apply probabilistic matching on name + DOB + specialty + address for records with no deterministic match; set a score threshold at or above which pairs are auto-merged and below which they are queued for human review.
When merging duplicate records, designate one record as the golden record and retain all source system identifiers (CAQH ID, payer provider IDs) as cross-references on the golden record rather than discarding them.
Validate merged golden records against the NPPES NPI Registry API (for NPI fields) and CAQH ProView (for demographic fields) to ensure the winning record values are authoritative.
Implement survivorship rules that prefer primary-source-verified values (e.g., NPPES NPI record) over system-of-record values when fields conflict across duplicates.
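The blocking and two-pass matching steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `ProviderRecord` fields, the 0.7/0.3 weights, the 0.9 auto-merge threshold, and the 0.6 review floor are all assumptions chosen for the example. The deterministic pass here uses exact NPI only, and the probabilistic pass scores name + DOB only, with specialty covered by blocking on taxonomy code.

```python
from collections import defaultdict
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional

# Illustrative values only; tune against labeled duplicate pairs.
AUTO_MERGE_THRESHOLD = 0.9
REVIEW_FLOOR = 0.6  # assumption: pairs scoring below this are dropped, not queued


@dataclass(frozen=True)
class ProviderRecord:  # hypothetical record shape for this sketch
    record_id: str
    npi: Optional[str]
    full_name: str
    dob: str       # ISO date string, e.g. "1975-03-02"
    taxonomy: str  # taxonomy code, used as the blocking key


def fuzzy_score(a: ProviderRecord, b: ProviderRecord) -> float:
    """Weighted similarity on name and DOB (weights are assumptions)."""
    name_sim = SequenceMatcher(None, a.full_name.lower(), b.full_name.lower()).ratio()
    dob_sim = 1.0 if a.dob == b.dob else 0.0
    return 0.7 * name_sim + 0.3 * dob_sim


def match_pairs(records: list) -> dict:
    """Return {(id_a, id_b): "auto_merge" | "review"} for candidate pairs."""
    decisions = {}

    # Pass 1: deterministic. Group by exact NPI with a hash map, so no
    # pairwise scan of the full population is needed.
    by_npi = defaultdict(list)
    for rec in records:
        if rec.npi:
            by_npi[rec.npi].append(rec)
    for group in by_npi.values():
        for a, b in combinations(group, 2):
            decisions[(a.record_id, b.record_id)] = "auto_merge"

    # Pass 2: probabilistic, compared only within a taxonomy block.
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec.taxonomy].append(rec)
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if a.npi and b.npi:
                # Both NPIs present: pass 1 already decided this pair
                # (same NPI) or the records are distinct providers.
                continue
            score = fuzzy_score(a, b)
            if score >= AUTO_MERGE_THRESHOLD:
                decisions[(a.record_id, b.record_id)] = "auto_merge"
            elif score >= REVIEW_FLOOR:
                decisions[(a.record_id, b.record_id)] = "review"
    return decisions
```

Running the deterministic pass as a hash grouping rather than inside the blocks means two records with the same NPI still match when their taxonomy codes differ. Pair keys follow input order, so `match_pairs` returns `(earlier_id, later_id)` tuples.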
Known gotchas
NPI alone is insufficient as a deduplication key across provider datasets because the same provider may appear with multiple practice location records; for location-level deduplication, use a compound key of the provider's individual (Type 1) NPI plus the practice location's organizational (Type 2) NPI.
Name-based fuzzy matching across provider datasets has high false-positive rates for common first + last name combinations; always require at least two matching identifiers before auto-merging.
Merging records without retaining the source system cross-reference IDs breaks downstream integrations; ensure the golden record model stores all historical identifiers from merged duplicates.
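The two-identifier rule and cross-reference retention described above might look like the following in Python. This is a sketch under stated assumptions: `SourceRecord`, `GoldenRecord`, `can_auto_merge`, and `build_golden_record` are hypothetical names, and the `verified` flag stands in for whatever primary-source verification signal a real pipeline carries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SourceRecord:  # hypothetical shape of one duplicate from a source system
    source_system: str
    source_id: str
    identifiers: dict      # e.g. {"npi": "...", "caqh_id": "..."}
    fields: dict           # demographic values, e.g. {"phone": "..."}
    verified: bool = False  # assumption: True when primary-source verified


@dataclass
class GoldenRecord:
    fields: dict
    identifiers: dict        # identifier type -> set of every value seen
    cross_references: list   # (source_system, source_id) for each merged record


def shared_identifier_count(a: SourceRecord, b: SourceRecord) -> int:
    """Count identifier types on which both records carry the same value."""
    return sum(
        1 for key, value in a.identifiers.items()
        if value and b.identifiers.get(key) == value
    )


def can_auto_merge(a: SourceRecord, b: SourceRecord, minimum: int = 2) -> bool:
    """Gate auto-merge on at least `minimum` agreeing identifiers."""
    return shared_identifier_count(a, b) >= minimum


def build_golden_record(records: list) -> GoldenRecord:
    """Merge duplicates, keeping every source ID as a cross-reference."""
    # Survivorship: verified records come first, so their values win conflicts.
    # sorted() is stable, so input order breaks ties.
    ranked = sorted(records, key=lambda r: not r.verified)
    fields = {}
    identifiers = defaultdict(set)
    for rec in ranked:
        for key, value in rec.fields.items():
            fields.setdefault(key, value)  # first (highest-ranked) value survives
        for key, value in rec.identifiers.items():
            if value:
                identifiers[key].add(value)  # never discard historical identifiers
    cross_references = [(r.source_system, r.source_id) for r in records]
    return GoldenRecord(fields, dict(identifiers), cross_references)
```

Pairs that fail `can_auto_merge` would go to the human review queue rather than being merged. Storing identifiers as sets means a conflicting historical value is kept alongside the surviving one instead of being overwritten.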