Establish a canonical identity layer in your system that maps provider sourcedIds to an internal entity ID; never use a provider sourcedId as your primary key.
Use demographic matching (given name + family name + date of birth + grade level) to detect probable duplicates across providers; flag matches above a configurable confidence threshold for human review.
Prefer the district SIS sourcedId (typically delivered through a Clever or ClassLink roster) as the canonical anchor; treat LMS-originated sourcedIds as secondary references.
Store the full provenance set — provider name, sourcedId, last seen timestamp — for each entity so conflicts can be audited and manually overridden.
When a new provider introduces a sourcedId that fuzzy-matches an existing entity, emit a conflict event to an admin queue rather than automatically merging; auto-merge only when an exact shared external identifier (e.g., state student ID, NCES school ID) exists.
Publish a deduplication report after each sync cycle showing new conflicts detected, auto-merged records, and human-pending conflicts for governance sign-off.
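The steps above can be sketched as a small in-memory identity layer. This is a minimal illustration under stated assumptions, not a reference implementation: the class and field names (`IdentityLayer`, `Entity`, `ProviderRef`, `state_id`), the scoring weights, and the 0.85 review threshold are all hypothetical, and the name similarity uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever matcher you actually deploy.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from difflib import SequenceMatcher


@dataclass
class ProviderRef:
    """One provenance entry: provider name, sourcedId, last seen timestamp."""
    provider: str
    sourced_id: str
    last_seen: datetime


@dataclass
class Entity:
    """Canonical record keyed by an internal ID, never by a provider sourcedId."""
    given_name: str
    family_name: str
    birth_date: str                 # ISO date string
    grade: str
    state_id: str | None = None     # hypothetical exact shared external identifier
    entity_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    refs: list[ProviderRef] = field(default_factory=list)


def demographic_score(a: Entity, b: Entity) -> float:
    """Crude 0..1 similarity; the weights are illustrative assumptions."""
    name_a = f"{a.given_name} {a.family_name}".lower()
    name_b = f"{b.given_name} {b.family_name}".lower()
    name_sim = SequenceMatcher(None, name_a, name_b).ratio()
    same_dob = 1.0 if a.birth_date == b.birth_date else 0.0
    same_grade = 1.0 if a.grade == b.grade else 0.0
    return 0.5 * name_sim + 0.35 * same_dob + 0.15 * same_grade


class IdentityLayer:
    def __init__(self, review_threshold: float = 0.85) -> None:
        self.review_threshold = review_threshold
        self.entities: dict[str, Entity] = {}
        self.by_ref: dict[tuple[str, str], str] = {}  # (provider, sourcedId) -> entity_id
        self.conflict_queue: list[dict] = []          # stand-in for an admin queue
        self.report = {"new": 0, "seen": 0, "auto_merged": 0, "pending_review": 0}

    def _attach(self, entity: Entity, provider: str, sourced_id: str) -> None:
        now = datetime.now(timezone.utc)
        entity.refs.append(ProviderRef(provider, sourced_id, now))
        self.by_ref[(provider, sourced_id)] = entity.entity_id

    def ingest(self, provider: str, sourced_id: str, incoming: Entity) -> str:
        key = (provider, sourced_id)
        if key in self.by_ref:
            # Known reference: only refresh its last-seen timestamp.
            for ref in self.entities[self.by_ref[key]].refs:
                if (ref.provider, ref.sourced_id) == key:
                    ref.last_seen = datetime.now(timezone.utc)
            self.report["seen"] += 1
            return "seen"

        # Auto-merge only on an exact shared external identifier.
        if incoming.state_id:
            for existing in self.entities.values():
                if existing.state_id == incoming.state_id:
                    self._attach(existing, provider, sourced_id)
                    self.report["auto_merged"] += 1
                    return "auto_merged"

        # Otherwise keep the record separate and look for a probable duplicate.
        best = max(self.entities.values(),
                   key=lambda e: demographic_score(e, incoming), default=None)
        score = demographic_score(best, incoming) if best else 0.0
        self.entities[incoming.entity_id] = incoming
        self._attach(incoming, provider, sourced_id)

        if best is not None and score >= self.review_threshold:
            # Fuzzy match: emit a conflict for human review, never merge.
            self.conflict_queue.append({"existing": best.entity_id,
                                        "candidate": incoming.entity_id,
                                        "score": round(score, 3)})
            self.report["pending_review"] += 1
            return "pending_review"

        self.report["new"] += 1
        return "new"
```

After a sync cycle, `layer.report` and `layer.conflict_queue` hold the counts and pending pairs that would feed the deduplication report. A fuzzy match is deliberately stored as its own entity until a reviewer decides, so nothing has to be unmerged if the two records turn out to be different students.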
Known gotchas
Name-based matching fails for students with legal name changes or nicknames; always include a government-issued or SIS-assigned unique ID in the matching criteria when available.
Two providers may legitimately assign different sourcedIds to the same section if one represents the master section and the other a linked lab section; check section-type metadata before merging.
Merging identities that later turn out to be different students causes data corruption that is difficult to unwind; set the auto-merge threshold conservatively so that false-positive merges are rare, and invest in human-review tooling.
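The section gotcha above can be guarded with a small pre-merge check. This is a sketch only: `SectionRef`, `course_code`, and `section_type` (with values such as "master" and "lab") are hypothetical names, and the metadata actually available depends on what each provider exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionRef:
    provider: str
    sourced_id: str
    course_code: str
    section_type: str  # hypothetical metadata, e.g. "master" or "lab"


def may_merge_sections(a: SectionRef, b: SectionRef) -> bool:
    """Allow a merge only when both records describe the same kind of section."""
    if a.course_code != b.course_code:
        return False
    # A master section and its linked lab legitimately carry different sourcedIds.
    return a.section_type == b.section_type
```

When the check returns `False`, the two sourcedIds should stay as separate entities, optionally linked, rather than being sent to the merge queue.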