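Identify all data fields in your dataset and map each one against the 18 Safe Harbor identifier categories: names; geographic subdivisions smaller than a state; all elements of dates (except year) directly related to an individual, plus all ages over 89 and any date elements (including year) indicative of such ages; phone numbers; fax numbers; email addresses; Social Security numbers; medical record numbers; health plan beneficiary numbers; account numbers; certificate/license numbers; vehicle identifiers and serial numbers (including license plates); device identifiers and serial numbers; URLs; IP addresses; biometric identifiers (such as finger and voice prints); full-face photographs and comparable images; and any other unique identifying number, characteristic, or code.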
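A first pass over a wide schema can be automated by tokenizing column names and matching them against keywords. The sketch below assumes that column names are somewhat descriptive; the keyword sets and category labels are hypothetical, not an official mapping, and an empty result means "review manually", not "safe". The 18th category (any other unique code) cannot be detected this way at all.

```python
import re

# Hypothetical keyword sets: column-name tokens that *suggest* a Safe Harbor
# category. Illustrative only; every column still needs human review.
CATEGORY_KEYWORDS = {
    "names": {"name", "firstname", "lastname", "surname"},
    "geographic": {"address", "street", "city", "county", "zip", "zipcode", "postal"},
    "dates": {"date", "dob", "birth", "admit", "discharge", "death", "age"},
    "phone": {"phone", "telephone", "mobile"},
    "fax": {"fax"},
    "email": {"email"},
    "ssn": {"ssn"},
    "medical_record": {"mrn"},
    "health_plan": {"beneficiary", "insurance", "member"},
    "account": {"account", "acct"},
    "certificate_license": {"license", "certificate"},
    "vehicle": {"vin", "plate"},
    "device": {"device", "serial"},
    "url": {"url", "website"},
    "ip": {"ip"},
    "biometric": {"fingerprint", "voiceprint", "biometric"},
    "photo": {"photo", "image"},
}


def tokenize(column):
    """Split a column name on separators and camelCase: 'patientDOB' -> {'patient', 'dob'}."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", column)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def map_columns(columns):
    """Map each column to a sorted list of candidate categories (empty if none matched)."""
    result = {}
    for col in columns:
        tokens = tokenize(col)
        result[col] = sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if tokens & kws)
    return result


if __name__ == "__main__":
    for col, cats in map_columns(["patient_name", "zip_code", "dateOfBirth", "diagnosis_code"]).items():
        print(col, "->", cats or "UNMAPPED: review manually")
```

Keyword matching deliberately over-flags (for example, `ip_address` hits both `geographic` and `ip`); false positives cost a quick review, while a missed identifier is a disclosure.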
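Remove or generalize each identified field: drop direct identifiers entirely; truncate ZIP codes to their first 3 digits, replacing them with 000 when all ZIP codes sharing those 3 digits together contain 20,000 or fewer people (per current Census data); reduce dates to year only; and aggregate ages over 89 into a single "90 or older" category, removing any year that would reveal such an age.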
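The generalization rules in this step can be sketched as small pure functions. This is a minimal sketch, assuming ZIP codes arrive as strings and dates as `datetime.date` objects. The set of restricted 3-digit prefixes is passed in as a parameter because it has to be derived from current Census data; the prefix used in the usage lines is purely hypothetical.

```python
from datetime import date


def generalize_zip(zip_code, restricted_zip3):
    """Keep the first 3 digits, or return '000' if that 3-digit area is restricted.

    restricted_zip3: set of 3-digit prefixes whose combined population is
    20,000 or fewer, built from current Census data (not hard-coded here).
    """
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in restricted_zip3 else zip3


def generalize_date(d):
    """Reduce a date directly related to the individual to its year."""
    return d.year


def generalize_age(age):
    """Collapse ages over 89 into one '90+' category; keep other ages as-is."""
    return "90+" if age > 89 else age


def generalize_birth_year(birth, as_of):
    """Return the birth year, or None when it could imply an age over 89.

    Conservative choice: suppress whenever the year difference is 90 or more,
    even if the exact age (depending on the birthday) is still 89.
    """
    return None if as_of.year - birth.year >= 90 else birth.year


if __name__ == "__main__":
    hypothetical_restricted = {"021"}  # made-up prefix for illustration only
    print(generalize_zip("02139", hypothetical_restricted))          # '000'
    print(generalize_birth_year(date(1930, 1, 1), date(2024, 1, 1)))  # None
```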
Suppress any free-text fields (clinical notes, comments) or apply NLP-based named-entity recognition to detect and redact identifiers embedded in unstructured text.
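For identifiers with a predictable shape, simple regular expressions can act as a pre-filter. To be clear, the sketch below is regex matching, not named-entity recognition: it catches SSNs, US-style phone numbers, emails, IPv4 addresses, and URLs, but it cannot find names, street addresses, or free-form dates, which is where an NER model is still required. The patterns and placeholder labels are assumptions chosen for illustration.

```python
import re

# Regex stand-in for a few well-shaped identifier formats (NOT NER).
# Order matters: SSNs are replaced before the looser phone pattern runs.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "URL": re.compile(r"\bhttps?://\S+"),
}


def redact(text):
    """Replace each pattern match with a bracketed label such as [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "SSN 123-45-6789, call (555) 123-4567, seen by Dr. Smith"
    print(redact(note))  # the name 'Dr. Smith' survives: regex cannot catch it
```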
Verify that no remaining combination of fields could reasonably identify an individual; the Safe Harbor method also requires that you have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual who is a subject of the data.
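One practical way to look for identifying combinations is to group records by the fields that remain (quasi-identifiers such as ZIP3, birth year, and sex) and flag groups that are very small. Safe Harbor itself does not prescribe this check; it is a supplementary screen, and the threshold `k=5` and the field names below are assumptions.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return {combination: count} for quasi-identifier combinations seen in fewer than k records.

    records: list of dicts; quasi_identifiers: keys to combine.
    k=5 is an assumed threshold, not a HIPAA requirement.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}
```

A combination that appears only once (a sample unique) is the clearest warning sign; generalizing further or suppressing those records is the usual response.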
Document your de-identification process, the fields removed or transformed, and the date of de-identification to support compliance audits.
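Keeping the documentation machine-readable makes it easier to attach to each dataset release. The record below is a minimal sketch; the class name and its fields are a hypothetical schema, not a regulatory format, and should be adapted to whatever template your compliance team uses.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DeidentificationRecord:
    # Hypothetical schema for an audit trail entry.
    dataset: str
    method: str  # "Safe Harbor" or "Expert Determination"
    performed_on: str = field(default_factory=lambda: date.today().isoformat())
    removed_fields: list = field(default_factory=list)
    transformed_fields: dict = field(default_factory=dict)  # field -> transformation
    notes: str = ""

    def to_json(self):
        """Serialize the record as stable, human-readable JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    rec = DeidentificationRecord(
        dataset="clinic_visits_2023",  # hypothetical dataset name
        method="Safe Harbor",
        removed_fields=["patient_name", "mrn"],
        transformed_fields={"zip_code": "first 3 digits or 000", "visit_date": "year only"},
    )
    print(rec.to_json())
```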
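If you use Expert Determination instead of Safe Harbor, engage a qualified expert (a person with appropriate knowledge of and experience with generally accepted statistical and scientific methods) to apply those methods and document that the risk of an anticipated recipient re-identifying individuals is very small.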
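HIPAA does not name a specific statistical method or risk threshold for Expert Determination. As one example of the kind of metric an expert might compute, the sketch below derives two prosecutor-model estimates commonly used in the re-identification literature from equivalence-class sizes: the maximum risk (1 divided by the smallest class size) and the average risk (number of classes divided by number of records). Choosing the quasi-identifiers and acceptable thresholds remains the expert's judgment.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Prosecutor-model risk estimates from equivalence-class sizes.

    max_risk: 1 / size of the smallest equivalence class
    avg_risk: mean of 1/class_size over records = number of classes / number of records
    """
    if not records:
        raise ValueError("records must be non-empty")
    sizes = list(Counter(tuple(r[q] for q in quasi_identifiers) for r in records).values())
    return {"max_risk": 1 / min(sizes), "avg_risk": len(sizes) / sum(sizes)}
```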
Known gotchas
Safe Harbor de-identification does not equal anonymization; re-identification may still be possible via linkage attacks, especially with rare conditions or small geographic areas. For high-risk datasets, consider additional measures such as k-anonymity or differential privacy.
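As an illustration of the differential privacy option, the Laplace mechanism releases aggregate counts with calibrated noise instead of releasing row-level data. This sketch assumes a simple counting query (sensitivity 1, since adding or removing one person changes a count by at most 1), so the noise scale is 1/epsilon; the function names are hypothetical, and choosing epsilon is a policy decision outside the code.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count, epsilon, rng=random):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Assumes a counting query with sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    print(round(dp_count(137, epsilon=0.5, rng=rng)))
```

Rounding or clamping the noisy value afterwards does not weaken the guarantee, because differential privacy is preserved under post-processing; smaller epsilon means more noise and stronger protection.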
Free-text clinical notes are the most common source of residual PHI after structured-field removal; automated NLP redaction tools have imperfect recall and should be validated on representative samples before production use.
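Validation usually means comparing the tool's redactions against a hand-annotated sample. The sketch below computes span-level recall, assuming both annotations and predictions are `(start, end)` character offsets with `end` exclusive. The matching rule is an assumption chosen to be strict: a gold identifier counts as caught only if a single redaction covers it completely, since a partial overlap can still leak characters.

```python
def span_recall(gold_spans, predicted_spans):
    """Fraction of gold PHI spans fully covered by at least one predicted redaction span.

    Spans are (start, end) character offsets, end exclusive.
    Returns 1.0 when there are no gold spans to find.
    """
    if not gold_spans:
        return 1.0
    caught = sum(
        any(ps <= gs and ge <= pe for ps, pe in predicted_spans)
        for gs, ge in gold_spans
    )
    return caught / len(gold_spans)
```

Recall is the metric to watch here: each missed span is residual PHI, whereas a false positive only removes some clinical text.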
De-identified data is no longer protected health information under HIPAA, but re-identification and misuse may still be prohibited by state law, institutional policy, or data use agreements; check applicable obligations before sharing de-identified datasets.