Accuracy of privacy preserving record linkage for real world data in the United States: a systemic review

Khushi Tyagi; Sarah J Willis

doi:10.1093/jamiaopen/ooaf002

JAMIA Open. 2025 Jan 22;8(1):ooaf002.

PMCID: PMC11752849 PMID: 39845287

Abstract

Objectives

Examine the accuracy of privacy preserving record linkage (PPRL) matches in real world data (RWD).

Materials and Methods

We conducted a systematic literature review to identify articles evaluating PPRL methods from January 1, 2013 to June 15, 2023. Eligible studies included original research reporting quantitative metrics such as precision and recall in health-related data sources. Covidence software was used to manage the review process.

Results

Five studies met our inclusion criteria. Tokenization and hash functions were used to hash and encrypt personally identifiable information (PII) including first and last names, dates of birth (DOB), and Social Security Numbers (SSNs) in a variety of RWD. All identified studies utilized deterministic matching. Combinations of tokenized or hashed PII that included “quasi-identifiers” like names and DOBs had consistently high precision (>95%) but lower recall, likely due to misspelled or inconsistently spelled names and name changes. SSN-based combinations demonstrated high precision but variable recall due to incomplete SSN data in RWD. Studies that employed algorithms in which at least one match was identified from a specified set of PII combinations provided high precision and high recall.

Discussion

The systematic review indicates that PPRL methods generally provide highly accurate patient data linkage while maintaining privacy.

Conclusions

Researchers should carefully consider the completeness and stability of each PII element selected for PPRL and may want to employ a strategy that allows for patient records to be matched if they meet at least one of several combinations of PII.

Keywords: privacy preserving record linkage, administrative claims, healthcare, electronic health records, data anonymization, personally identifiable information

Background

Fragmentation within the United States (US) healthcare system results in patient data being scattered across various locations and databases. Linking healthcare data from disparate sources may enhance patient care and facilitate cross-institutional research.¹,² However, the methods used to link the data must protect patient privacy in accordance with the Health Insurance Portability and Accountability Act (HIPAA). The protection of patients’ medical information, which includes not only clinical data but also sensitive personal details about lifestyle, family life, and habits, is necessary to prevent damage to a patient’s reputation, opportunities, and dignity.³,⁴ Trust between a physician and patient is crucial for the exchange of confidential information.⁵ Breaches of privacy can lead to patients withholding information, which in turn affects their willingness to seek treatment or follow-up care.³

Privacy preserving record linkage (PPRL) methods are a potential solution to this linkage dilemma.⁶–¹³ PPRL techniques, which include a variety of tools available from open sources and commercial vendors, typically hash and encrypt personally identifiable information (PII) such as first and last names, dates of birth, Social Security Numbers (SSNs), and addresses.⁶,¹⁴ These techniques produce irreversible, nonsensical strings that hide the original PII. In addition, these techniques are reproducible: the same PII produces the same encoded string in every data source, so the encoded data elements can be used to identify the same individuals across disparate data sources.¹⁴,¹⁵ PPRL techniques also allow records to be compared either on a holistic level, where all PII attributes within a record are merged into a single string for comparison, or on an individual attribute level, which employs specialized functions tailored to the data type of each attribute.¹⁶ This facilitates both deterministic and probabilistic comparisons between data sources. Deterministic comparisons determine whether attribute values are identical, whereas probabilistic comparisons assess the degree of similarity between values, an essential feature given the inevitability of typographical errors and variations in real-world data-matching scenarios.¹⁷
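The difference between the two kinds of comparison can be sketched in a few lines of Python. This is an illustration only, not a method used by any of the reviewed studies: the comparison runs on plaintext names for readability (in PPRL it would run on encoded values), `SequenceMatcher` from the standard library stands in for a purpose-built similarity function, and the 0.8 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Trim whitespace and lowercase so trivial formatting differences are ignored."""
    return value.strip().lower()


def deterministic_match(a: str, b: str) -> bool:
    """Deterministic comparison: the two values must be identical after normalization."""
    return normalize(a) == normalize(b)


def similarity(a: str, b: str) -> float:
    """Similarity score between 0 and 1 (stand-in for a real string comparator)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similarity_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Similarity-based comparison: accept values that are close enough.

    The threshold of 0.8 is an arbitrary, illustrative choice.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [("Katherine", "katherine"),
                        ("Katherine", "Katharine"),
                        ("Katherine", "Robert")]:
        print(left, right,
              deterministic_match(left, right),
              round(similarity(left, right), 2),
              similarity_match(left, right))
```

In this sketch a one-letter spelling variant fails the deterministic comparison but passes the similarity-based one, which is the trade-off the paragraph above describes.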

Although PPRL provides the ability to link data across different databases, researchers and healthcare professionals need to understand the accuracy of potential matches identified using these methods. Several studies have evaluated this for specific PPRL techniques, but to our knowledge, a comprehensive review of these evaluations has not been done. Therefore, we conducted a systematic literature review to identify articles that evaluated PPRL in health-related data sources from the United States and which provided quantitative metrics associated with accuracy, to determine whether certain PII elements, individually or in combinations, consistently result in more accurate matches.

Methods

Protocol

We developed a protocol using the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Statement.¹⁸ The protocol is described below; it was not registered on a public website and was not amended. Our PRISMA 2020 checklist is presented in Table S1.

Information sources, literature eligibility, search, and selection

The literature search was conducted in 2 major databases, PubMed and Embase. The search terms were “(‘privacy preserving record linkage’ OR ‘patient tokenization’) AND (‘precision’ OR ‘recall’ OR ‘F1’ OR ‘accuracy’ OR ‘specificity’ OR ‘false discovery rate’ OR ‘sensitivity’)” and were not restricted to article titles or abstracts. Eligible articles were published between January 1, 2013 and June 15, 2023 (the date of the literature search) and written in English. The authors also searched the reference lists of eligible articles to identify additional studies.

To ensure the selection of appropriate studies, the 2 authors independently screened titles, abstracts, and full texts to determine inclusion, with discrepancies resolved by discussion. We excluded articles that did not use PII elements to conduct privacy preserving matches, conference abstracts, literature reviews and meta-analyses, and articles that did not evaluate the performance of PPRL or did not report at least one quantitative metric (precision, recall, F1, false discovery rate, accuracy, or specificity). We also excluded studies that evaluated PPRL methods in data collected outside of the United States, because researchers using US-based data may face unique challenges arising from the greater fragmentation of the US healthcare system. Covidence was used to manage and streamline the review process.

Data items, data collection, and review

During the extraction process, we collected data on each eligible study’s PPRL methodology and validation process. This encompassed the types of real world data (RWD) used for linkage, the total number of records or unique individuals in each dataset, and the specific PII elements used for matching. Data extraction was performed independently by the 2 authors to ensure accuracy and completeness. Additionally, we noted the number of records successfully matched in each instance, along with metrics assessing matching performance: precision, recall, and F1 scores. Precision is the proportion of matches made by the PPRL technique that are true matches; it reflects the accuracy of the positive predictions. Recall is the proportion of all true matches that the technique correctly identifies; it indicates the ability to capture all relevant matches. The F1 score is the harmonic mean of precision and recall, balancing the 2 metrics to provide a single measure of a model’s performance. Descriptions of the “gold standard” validation methods, linkage techniques employed (eg, Bloom filters, Datavant tokens, or specific software packages), and research limitations were also recorded. We did not conduct a risk of bias assessment for each study because our literature review was not designed to assess or compare the results of clinical trials, interventions, or outcome measurements from inferential statistical analyses.
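As a brief illustration of these definitions, the three metrics can be computed as in the Python sketch below. The function names and the example counts are hypothetical; only the 99.9% precision and 64.8% recall used in the last lines come from Table 2 (Token 4 in Bernstam et al).

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of declared matches that are true matches."""
    declared = true_pos + false_pos
    return true_pos / declared if declared else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of all true matches that were found."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical linkage result: 90 correct links, 10 wrong links, 30 missed links.
    p = precision(true_pos=90, false_pos=10)
    r = recall(true_pos=90, false_neg=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")

    # Reported precision and recall for Token 4 in Table 2.
    print(f"Token 4 F1={f1_score(0.999, 0.648):.3f}")
```

For the Token 4 values the harmonic mean is about 0.786, close to the 79.0% reported in Table 2, whereas the arithmetic mean would be about 0.82; the harmonic mean penalizes the gap between very high precision and much lower recall.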

Results

The search of the PubMed and Embase databases yielded 187 studies (Figure 1). Of these, 18 duplicates were removed, leaving 169 studies for title and abstract screening. During the screening phase, we excluded 145 studies. Of the remaining 24 studies, 1 could not be retrieved, and 18 were excluded for the reasons listed in the PRISMA diagram (Figure 1). Details about the excluded articles and the reasons for exclusion are provided in Table S2. Therefore, 5 studies met all the criteria and underwent a thorough data extraction process. The details of these 5 studies are included in Table 1.

Figure 1. PRISMA diagram. Flow chart depicting the number of articles identified during the search, the number excluded during screening, the number excluded during full-text review with the reasons for exclusion, and the number included in the review.

Table 1.

Article information for articles selected for systematic literature review.

Article title	Authors	Data type(s)	Location	Year	Reference
Real-world matching performance of deidentified record-linking tokens	Bernstam EV, Applegate RJ, Yu A, et al.	EHR	Texas	2022	⁸
A methodological assessment of privacy preserving record linkage using survey and administrative data	Mirel LB, Resnick DM, Aram J, et al.	Survey data (NHCS) and vital statistics (NDI)	All states and Washington, DC	2022	⁶
The impact of name transformation on match rates within a large consumer database	Leshin J, Sanghvi A, Ravuri K, et al.	Person-level consumer data	N/A	2022	⁷
Design and implementation of a privacy preserving electronic health record linkage tool in Chicago	Kho AN, Cashy JP, Jackson KL, et al.	EHR	Illinois	2015	⁹
Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network	Bian J, Loiacono A, Sura A, et al.	EHR and administrative health care claims	Florida	2019	¹⁰

Abbreviations: EHR = electronic health records; NDI = National Death Index; NHCS = National Hospital Care Survey.

Three of the 5 studies used tokenization to protect PII when matching or deduplicating database records. In the study by Bernstam et al, 9 single tokens and 4 token sets were generated via a commercial vendor, Datavant. The tokens used combinations of last name, first name, gender, date of birth (DOB), address information, SSN, and cell phone numbers (Table 2). In general, precision was >95%, while recall was lower for each single token and token set. The single token that combined last name + first name + gender + DOB had the highest precision (99.9%) but considerably lower recall (64.8%). Precision for single tokens and token sets that included SSN was >99% (Token 5, Token 16, SSN Match). However, <4% of eligible records had SSN available for matching (data not shown). The token set “Single Token Match,” in which at least one of 6 specified tokens matched, had high precision (97.0%) and the highest recall overall (95.5%). Token sets that relied upon multiple matched tokens (Demographic Token Match and Net Tokens Match) resulted in few false positive matches (precision >99%), but recall was ∼75%.⁸
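The logic of the three kinds of token set can be sketched as follows. This Python fragment is an illustration built from the token-set definitions in Table 2, not the vendor's implementation: the function names are hypothetical, each record pair is represented by a mapping from token number to whether that token agreed, and the handling of tokens that could not be generated (they are simply left out of the mapping and ignored) is an assumption.

```python
from typing import Mapping

# Token numbers follow Table 2; the rule definitions paraphrase its token sets.
SINGLE_TOKEN_SET = (1, 2, 3, 4, 5, 16)
NET_TOKEN_SET = (1, 2, 4, 5, 7, 9, 16)


def single_token_match(agree: Mapping[int, bool]) -> bool:
    """Match if at least one of the specified tokens agrees."""
    return any(agree.get(token, False) for token in SINGLE_TOKEN_SET)


def demographic_token_match(agree: Mapping[int, bool]) -> bool:
    """Match only if Token 1 and Token 2 both agree."""
    return agree.get(1, False) and agree.get(2, False)


def net_tokens_match(agree: Mapping[int, bool]) -> bool:
    """Match if more tokens agree than disagree.

    Assumption: tokens absent from the mapping (for example, no SSN on file)
    are not counted on either side.
    """
    compared = [agree[token] for token in NET_TOKEN_SET if token in agree]
    agreeing = sum(compared)
    return agreeing > len(compared) - agreeing


if __name__ == "__main__":
    # Hypothetical record pair: only Token 1 agrees; SSN-based tokens are unavailable.
    pair = {1: True, 2: False, 4: False, 7: False}
    print(single_token_match(pair))       # permissive rule
    print(demographic_token_match(pair))  # strict rule
    print(net_tokens_match(pair))         # majority rule
```

The permissive rule accepts this pair while the two stricter rules reject it, which mirrors the pattern in the reported results: requiring several tokens to agree limits false positives but misses more true matches.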

Table 2.

Combinations of personally identifiable information, precision, recall, and F1 score for articles selected for systematic literature review.

Reference	PII combinations	Precision	Recall	F1
⁸	Last Name + 1st Initial of First Name + Gender + DOB (Token 1)	97.9%	90.3%	94.0%
	Last Name (soundex) + First Name (soundex) + Gender + DOB (Token 2)	98.6%	78.7%	88.0%
	Last Name + First Name + DOB + 3-digit zip code (Token 3)	Not reported	Not reported	Not reported
	Last Name + First Name + Gender + DOB (Token 4)	99.9%	64.8%	79.0%
	Gender + DOB + SSN (Token 5)	99.7%	87.7%	93.0%
	Last Name + 1st 3 Characters of First Name + Gender + DOB (Token 7)	98.5%	88.6%	93.0%
	First Name + Address (Token 9)	99.3%	23.4%	38.0%
	First Name + SSN (Token 16)	99.6%	61.1%	76.0%
	Cell Phone Number (Token 22)	95.6%	52.1%	67.0%
	Single Token Match (Token 1 or 2, or 3 or 4, or 5 or 16)	97.0%	95.5%	96.0%
	Demographic Token Match (Token 1 and 2)	99.6%	76.2%	86.0%
	Net Tokens Match—more tokens match than do not (used Tokens 1, 2, 4, 5, 7, 9, 16)	99.9%	75.0%	86.0%
	SSN Match (Token 5 or 16)	99.5%	90.9%	95.0%
⁶	All possible combinations of Tokens 1, 2, 4, 5, 7, 16, and 40^a	93.8%	98.7%	97.8%
⁶	Combinations of Tokens 1, 2, 4, 5, 7, 16, and 40^a where combinations with false positives >50% removed	98.9%	97.8%	Not reported
⁷	Last Name + First Name + Gender + DOB	94.6%	70.4%	80.7%
	Last Name + Nickname (.5,10) + Gender + DOB	93.5%	86.2%	89.7%
	Last Name + Nickname (.8,10) + Gender + DOB	94.5%	80.1%	86.7%
	Last Name (MPH) + Nickname (.3,20) (MPH) + Gender + DOB	91.9%	89.0%	90.5%
	Last Name (MPH) + Nickname (.5,10) (MPH) + Gender + DOB	92.5%	87.8%	90.1%
	Last Name (soundex) + Nickname (.5,10) (soundex) + Gender + DOB	92.2%	87.9%	90.0%
	Last Name + First Name (1 character correction) + Gender + DOB	94.3%	69.1%	79.7%
	Last Name + Nickname (.5,10), 3 letter truncation + Gender + DOB	91.4%	89.1%	90.2%
	Last Name + Nickname (.5,10), 1 letter truncation + Gender + DOB	82.7%	90.0%	86.2%
⁹	At least one of the following matched: Last Name + First Name + DOB, DOB + SSN, Last Name + SSN, or 3 letters of Last Name + 3 letters of First Name + Last Name (soundex) + First Name (soundex) + DOB + SSN	>99.9%	95.7%	Not reported
¹⁰	Last Name + First Name + Race + DOB	96.7%	94.7%	Not reported
	Last Name + First Name + Gender + DOB	93.9%	95.5%	Not reported
	Last Name + Gender + Race + DOB	42.7%	94.7%	Not reported
	Last Name + First Name + City + DOB	39.6%	81.3%	Not reported
	Last Name + First Name + Zip + DOB	26.7%	71.5%	Not reported
	Last Name + First Name + Zip + Race	17.2%	60.0%	Not reported
	Last Name + First Name + Zip + Gender	16.2%	58.5%	Not reported
	Last Name + First Name + City + Zip	16.1%	58.5%	Not reported
	First Name + Gender + Race + DOB	10.7%	97.8%	Not reported
	Last Name + First Name + Race + DOB or Last Name + First Name + Gender + DOB	97.3%	Not reported	Not reported

^a Token 40 is defined as Last Name + First Name + DOB + State.

Abbreviations: DOB = date of birth; SSN = social security number; Zip = 5 digit zip code.

The study by Mirel et al used the same commercial vendor, Datavant, to tokenize PII. The authors created 7 single tokens and evaluated the accuracy of matches made for all possible combinations of these tokens (n = 29). Overall, the matches made from all combinations of tokens resulted in 93.8% precision and 98.7% recall. When matches made for token combinations with >50% false positives were removed, precision increased to 98.9% and recall decreased slightly to 97.8%.⁶

Leshin et al also used Datavant tokenization software and evaluated how name transformations affect precision and recall. The first token generated used last name + first name + gender + DOB (referred to as Token 4 by Bernstam et al and Mirel et al) and resulted in 94.6% precision and 70.4% recall. The authors then applied various transformations to first and last names, including nicknames for first names (ie, allowing Robert to match with Bob), phonetic spellings, character corrections, and truncation. All name transformations reduced precision, but most increased recall. The transformation that yielded the highest recall (90.0%) combined 1-letter truncation with a rule that allowed first names to match with nicknames having relative frequency ≥50% (relative to other nicknames used in nickname clusters) and counts ≥10 (at least 10 records in the nickname cluster used the specified nickname) (Table 2).⁷

The remaining 2 studies employed distinct deterministic PPRL approaches. In both studies, participating sites deidentified patient records using cryptographic hash functions and sent the deidentified data to an external agency for matching. Kho et al used SHA-512, a HIPAA-compliant cryptographic hash function created by the National Security Agency.⁹ Up to 17 hashes of 512 bits each were created for each patient in the study, based upon varying combinations of last name, first name, DOB, SSN, and gender. In their evaluation of accuracy, records were matched if they had the same hashed last name + first name + DOB, or DOB + SSN, or last name + SSN, or 3 letters of last name + 3 letters of first name + soundex last name + soundex first name + DOB + SSN. The precision and recall of their matching technique, which allowed patients to be matched if they met at least one of these 4 combinations, were >99.9% and 95.7%, respectively.⁹
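A minimal Python sketch of this style of matching is shown below. It is not the tool described by Kho et al: the record layout, field names, normalization, and the label mixed into each hash are assumptions, only 3 of the 4 combinations are shown (the Soundex-based one is omitted for brevity), and no secret salt or key is applied, which a real deployment would need because unsalted hashes of names and dates can be guessed by hashing candidate values.

```python
import hashlib


def normalize(value: str) -> str:
    """Keep letters and digits only and uppercase them (illustrative normalization)."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def hash_combo(label: str, *fields: str) -> str:
    """SHA-512 hex digest of one labelled combination of PII fields."""
    text = label + "|" + "|".join(normalize(field) for field in fields)
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def combo_hashes(record: dict) -> dict:
    """Hash every combination the record can support; SSN combinations need an SSN."""
    last, first, dob = record["last"], record["first"], record["dob"]
    ssn = record.get("ssn")
    hashes = {"name_dob": hash_combo("name_dob", last, first, dob)}
    if ssn:
        hashes["dob_ssn"] = hash_combo("dob_ssn", dob, ssn)
        hashes["last_ssn"] = hash_combo("last_ssn", last, ssn)
    return hashes


def is_match(record_a: dict, record_b: dict) -> bool:
    """Records match if at least one shared combination has identical hashes."""
    hashes_a, hashes_b = combo_hashes(record_a), combo_hashes(record_b)
    return any(hashes_a[key] == hashes_b[key] for key in hashes_a.keys() & hashes_b.keys())


if __name__ == "__main__":
    # Fabricated example records; the SSN is a dummy value.
    site_a = {"last": "Smith", "first": "Maria", "dob": "1980-02-03", "ssn": "123-45-6789"}
    site_b = {"last": "Jones", "first": "Maria", "dob": "1980-02-03", "ssn": "123456789"}
    site_c = {"last": "Jones", "first": "Maria", "dob": "1980-02-03"}
    print(is_match(site_a, site_b))  # linked through DOB + SSN despite the last-name change
    print(is_match(site_a, site_c))  # no SSN and a different last name: not linked
    print(is_match(site_b, site_c))  # linked through last name + first name + DOB
```

The second comparison shows why sites without SSN depend on the name-based combination, and why a changed last name then produces a missed match.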

Bian et al developed an open-source record linkage tool in Python, named OneFL Deduper. The tool uses a salted SHA-256 algorithm, another cryptographic hash function, and hashes various combinations of last name, first name, race, gender, and DOB. In their pilot study using voter registration records, the best performing combinations were hashed last name + first name + race + DOB (precision 96.7%, recall 94.7%) and hashed last name + first name + gender + DOB (precision 93.9%, recall 95.5%). When they applied the OneFL Deduper to match electronic health records with administrative claims, matches were defined as those that matched on at least one of the 2 best performing combinations. The resulting precision increased to 97.3%; recall was not reported.¹⁰
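The effect of salting can be illustrated with a generic Python sketch. This is not the OneFL Deduper code, whose salting scheme is not described here: the function name, the way the salt is prepended, the field separator, and the salt values are all assumptions made for illustration.

```python
import hashlib


def salted_hash(value: str, salt: str) -> str:
    """SHA-256 hex digest of a secret salt followed by the PII string."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    pii = "DOE|JANE|F|1975-06-30"  # hypothetical last name, first name, gender, DOB

    # Two sites that share the same secret salt produce the same hash, so their records can be linked.
    print(salted_hash(pii, "shared-secret") == salted_hash(pii, "shared-secret"))

    # A different salt produces an unrelated hash, so the records no longer line up.
    print(salted_hash(pii, "shared-secret") == salted_hash(pii, "other-secret"))
```

Under these assumptions, every participating site must use the same salt for linkage to work, and the salt must stay secret from the party that receives the hashes, otherwise the hashes could be reproduced from guessed PII.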

Discussion

We identified 5 studies evaluating accuracy of PPRL in the United States through our systematic literature review. Overall, PPRL provided accurate matches across different RWD types, settings, and methods.

The studies demonstrate that false positive matches are limited when only “quasi-identifiers” such as first and last names and dates of birth are used for PPRL, given the consistency of precision scores ≥95%. However, quasi-identifiers tend to miss a greater proportion of potential matches within databases. This may be due in large part to inconsistently spelled or misspelled names (for example, Caitlin and Katelyn), names with diacritical marks or hyphenation, and the common practice of changing one’s last name after marriage. Leshin et al observed that the proportion of false negative matches decreased when phonetic spellings of first and last names were allowed and when commonly used nicknames were allowed to match with first names.⁷ Kho et al also noted that 15% of the false negative matches were due to different last names captured in their evaluation database and that 97% of these false negatives occurred among patients with female sex.⁹

A consistently high-performing PII element for PPRL is SSN. SSN in combination with first name, and SSN along with gender and DOB, resulted in almost no false positive matches and few false negative matches in Bernstam et al.⁸ However, <4% of records available for matching had SSN recorded in that study. Kho et al relied upon several hashed combinations with SSN but included one PII combination that did not use SSN. They employed this strategy because one of their participating sites did not have SSN available for any patients and a second had SSN for only 28% of patients.⁹ Thus, while SSN can be used to improve the accuracy of PPRL methods, its lack of routine capture in RWD requires researchers to rely upon additional PII elements in combination with SSN.

Four of the 5 studies evaluated matches made with multiple combinations of tokens or hashed PII. Bernstam et al used 2 token sets that declared a match only when multiple tokens matched; for example, patients’ PII had to match on both Tokens 1 and 2 to be classified as a match.⁸ While this technique resulted in almost no false positive matches, approximately 1 in 4 true matches were missed. Bernstam et al and Kho et al also employed matching algorithms that required patients’ PII to match on at least one token or hashed PII combination from a series of several tokens or hashed PII combinations.⁸,⁹ These matching algorithms resulted in precision and recall >95%, which indicates that researchers employing similar algorithms could increase their capture of true positive matches while still limiting false positive matches.

Using location and address data may also impact PPRL precision and recall. Addresses are unstable identifiers that often differ across datasets because individuals move or because data are missing or incorrect. This instability tends to lead to lower recall, as true matches are missed when address data are inconsistent or incomplete.⁷,¹¹ While address data can enhance the uniqueness of patient data and improve precision, its variability and likelihood of missing or outdated information can negatively affect recall. Recall for matches made with address data may be particularly low among highly mobile patient populations, such as young adults.¹⁹ This type of bias highlights the need to consider not only the completeness of PII elements, such as SSN, in studied populations but also the stability of selected PII elements over time in the underlying patient population a researcher intends to study.

The reviewed studies exhibit several notable similarities to PPRL performance in evaluations using data collected outside of the United States. The use of a unique healthcare ID, analogous to an SSN in the United States, combined with DOB and gender demonstrated high precision.¹¹–¹³ Additionally, combinations using only quasi-identifiers like first and last names and DOB were also highly precise and had strong recall across various data sources.¹¹–¹³ The inclusion of additional demographic information with these quasi-identifiers, such as gender or race, mirrors the results of our literature review, showing enhanced linkage quality when such elements are used.¹²,¹³

There are several limitations to this review that should be acknowledged. First, we focused on US-based RWD, and our results may not generalize to RWD from other countries or in other languages. We acknowledge that this study does not fully capture the diversity of practices and outcomes in record linkage techniques worldwide. Second, the research is restricted to articles available on Embase and PubMed. These platforms, while comprehensive, may not encompass all relevant studies, potentially omitting significant findings published elsewhere. Additionally, the studies reviewed are limited to those published in English, which precludes inclusion of valuable research published in other languages. Furthermore, the time frame of the studies included is limited to those published up to June 15, 2023. There may be recent advancements and findings beyond this date that have been omitted.

Despite these limitations, this systematic literature review is one of few regarding PPRL methodologies and their associated accuracy. The review provides researchers with several important considerations to weigh when employing PPRL in their work.

Conclusions

PPRL is a reliable solution to link patient records across disparate data sources and may enhance patient care and facilitate cross-institutional research. Researchers should carefully consider the completeness and stability of each PII element selected for PPRL and may want to employ a strategy that allows for patient records to be matched if they meet at least one of several combinations of PII.

Contributor Information

Khushi Tyagi, US Commercial Office, Pfizer, Inc., New York, NY 10001, United States.

Sarah J Willis, US Commercial Office, Pfizer, Inc., Cambridge, MA 02139, United States.

Author contributions

Khushi Tyagi and Sarah J. Willis were responsible for the study concept and design, review of all articles, data extraction among eligible articles, and drafting and editing the manuscript.

Supplementary material

Supplementary material is available at JAMIA Open online.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflicts of interest

S.J.W. and K.T. are employees of Pfizer, Inc. and may hold stock in the company.

Data availability

The data underlying this article are available in the article and in its online supplementary material.

References

1. Jayaratne M, Nallaperuma D, De Silva D, et al. A data integration platform for patient-centered e-healthcare and clinical decision support. Future Gener Comput Syst. 2019;92:996-1008.
2. Batko K, Ślęzak A. The use of big data analytics in healthcare. J Big Data. 2022;9:3.
3. Noroozi M, Zahedi L, Bathaei FS, et al. Challenges of confidentiality in clinical settings: compilation of an ethical guideline. Iran J Public Health. 2018;47:875-883.
4. Sankar P, Mora S, Merz JF, Jones NL. Patient perspectives of medical confidentiality: a review of the literature. J Gen Intern Med. 2003;18:659-669.
5. Iott BE, Campos-Castillo C, Anthony DL. Trust and privacy: how patient trust in providers is related to privacy behaviors and attitudes. AMIA Annu Symp Proc. 2020;2019:487-493.
6. Mirel LB, Resnick DM, Aram J, et al. A methodological assessment of privacy preserving record linkage using survey and administrative data. Stat J IAOS. 2022;38:413-421.
7. Leshin J, Sanghvi A, Ravuri K, et al. The impact of name transformation on match rates within a large consumer database. AMIA Annu Symp Proc. 2023;2022:692-699.
8. Bernstam EV, Applegate RJ, Yu A, et al. Real-world matching performance of deidentified record-linking tokens. Appl Clin Inform. 2022;13:865-873.
9. Kho AN, Cashy JP, Jackson KL, et al. Design and implementation of a privacy preserving electronic health record linkage tool in Chicago. J Am Med Inform Assoc. 2015;22:1072-1080.
10. Bian J, Loiacono A, Sura A, et al. Implementing a hash-based privacy-preserving record linkage tool in the OneFlorida clinical research network. JAMIA Open. 2019;2:562-569.
11. Brown AP, Borgs C, Randall SM, et al. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets. BMC Med Inform Decis Mak. 2017;17:83.
12. Brown AP, Randall SM, Ferrante AM, et al. Estimating parameters for probabilistic linkage of privacy-preserved datasets. BMC Med Res Methodol. 2017;17:95.
13. Schnell R, Bachteler T, Reiher J. Privacy-preserving record linkage using Bloom filters. BMC Med Inform Decis Mak. 2009;9:41.
14. Pathak A, Serrer L, Bhalla M, et al. Proposed framework for adopting privacy-preserving record linkage for public health action. J Public Health Manag Pract. 2025;31:E26-E33.
15. Tachinardi U, Grannis SJ, Michael SG, et al. Privacy-preserving record linkage across disparate institutions and datasets to enable a learning health system: the national COVID cohort collaborative (N3C) experience. Learn Health Syst. 2024;8:e10404.
16. Vatsalan D, Christen P, Verykios VS. A taxonomy of privacy-preserving record linkage techniques. Inf Syst. 2013;38:946-969.
17. Nagels J, Wu S, Gorokhova V. Deterministic vs probabilistic: best practices for patient matching based on a comparison of two implementations. J Digit Imaging. 2019;32:919-924.
18. Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Syst Rev. 2021;10:89.
19. Pew Research Center. American Mobility: Who Moves? Who Stays Put? Where’s Home? Last updated December 2008. Accessed December 19, 2024. https://www.pewresearch.org/wp-content/uploads/sites/3/2010/10/Movers-and-Stayers.pdf
