Journal of the American Medical Informatics Association: JAMIA. 2025 Aug 26;32(10):1629–1637. doi: 10.1093/jamia/ocaf140

Electronic health records-based algorithms to screen for U.S. Centers for Disease Control and Prevention tier 1 genetic diseases: a scoping review

William R Harris 1, Marianna S Hernandez 2, Khanh N H Ngo 3, Anne Fladger 4, Charles A Brunette 5,6, Sulaiman R Hamarneh 7, Joshua W Knowles 8,9, Matthew S Lebo 10,11, Jason L Vassy 12,13
PMCID: PMC12451938  PMID: 40857647

Abstract

Objective

Missed diagnosis of genetic conditions is a persistent challenge in clinical care, particularly for familial hypercholesterolemia (FH), hereditary breast and ovarian cancer (HBOC), and Lynch syndrome—conditions designated by the U.S. Centers for Disease Control and Prevention (CDC) as Tier 1 genomic applications. This scoping review summarizes evidence on the use of electronic health record (EHR)-based algorithms to identify individuals with these conditions.

Materials and Methods

We conducted a scoping review using the JBI Manual for Evidence Synthesis and reported results according to PRISMA-ScR guidelines. We searched Ovid MEDLINE, Embase, and Web of Science through October 2024 for studies evaluating EHR-based algorithms to identify individuals with FH, HBOC, or Lynch syndrome. Eligible studies addressed (1) performance of algorithms in detecting clinically or genetically confirmed cases or (2) outcomes from the implementation of algorithms in unselected populations with follow-up to identify new diagnoses.

Results

Of 598 articles screened, 22 met inclusion criteria. Most studies (20/22) focused on FH. Fourteen FH studies assessed algorithm performance, and 7 reported prospective implementation. FH algorithm performance varied widely (AUROC range 0.78-0.95), with machine learning models outperforming rule-based approaches. Implementation studies reported positive predictive values ranging from 11% to 67%. Only two studies addressed HBOC or Lynch syndrome, both using rules-based algorithms with limited sensitivity.

Discussion

Machine learning models consistently outperform rules-based algorithms relying on clinical criteria, but limited evidence exists for HBOC and Lynch syndrome.

Conclusions

Early identification of CDC Tier 1 genetic conditions through EHR-based screening algorithms holds promise but will require both technical and implementation advances to realize improved patient care and outcomes.

Keywords: electronic health records, genomic screening, machine learning, familial hypercholesterolemia, hereditary breast and ovarian cancer syndrome, Lynch syndrome

Introduction

Missed diagnosis is a persistent challenge in clinical medicine, particularly for common monogenic diseases.1 Three such diseases (familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome) have been designated by the U.S. Centers for Disease Control and Prevention (CDC) as “Tier 1” genomic applications, because identification and management of these genetic conditions have the potential for significant positive impact on public health through evidence-based interventions.2 Familial hypercholesterolemia (FH) is an inherited disorder of lipid metabolism characterized by markedly elevated low-density lipoprotein cholesterol (LDL-C) levels; individuals with FH can lower their risk of heart disease by initiating lipid-lowering treatment early in life.3 Hereditary breast and ovarian cancer (HBOC) syndrome, most often caused by pathogenic variants in the BRCA1 or BRCA2 gene, is associated with increased risk of breast, ovarian, prostate, and other cancers, and men and women with HBOC can reduce their risk through earlier screening and prophylactic surgeries or medications.4 Similarly, Lynch syndrome is a hereditary predisposition to colorectal, endometrial, and other cancers, and patients who know they have Lynch syndrome can undergo earlier and more frequent screenings.5

The CDC estimates that over 2 million Americans are affected by these three conditions, but many individuals are not aware that they are at risk and many frontline clinicians do not recognize these genetic causes of otherwise common conditions, such as breast cancer and cardiovascular disease.1,2 Accurately estimating the prevalence of these conditions is difficult, given that many individuals remain undiagnosed. In the United States, recent estimates of the prevalence of these conditions are 1 in 250 for FH, 1 in 400 for HBOC, and 1 in 450 for Lynch syndrome.6–8

Diagnosis of FH, Lynch syndrome, and HBOC can be made through clinical criteria or through genetic testing. For FH, patients can be diagnosed using one of three sets of diagnostic criteria: the Simon-Broome (SB) Register Diagnostic Criteria, the Dutch Lipid Clinic Network (DLCN) Diagnostic Criteria, and the US Make Early Diagnosis Prevent Early Deaths (MEDPED) Diagnostic Criteria. Each of these clinical criteria involves some combination of lipid testing data, family history, and physical exam findings, many of which are not regularly collected in routine medical care.3 The diagnosis of FH can also be made through genetic testing that identifies pathogenic variants in the genes encoding the low-density lipoprotein receptor (LDLR), apolipoprotein B (APOB), and proprotein convertase subtilisin/kexin 9 (PCSK9).9 For Lynch syndrome, diagnosis is made through genetic testing that identifies inactivating mutations in mismatch repair genes (MLH1, MSH2, MSH6, or PMS2), including deletions in the EPCAM gene that cause epigenetic silencing of MSH2,10 while patients are identified as high-risk for Lynch syndrome through clinical criteria such as the Amsterdam Criteria or Bethesda Guidelines, which incorporate personal and family cancer history. Diagnosis of HBOC is made through genetic testing with a gene panel that includes BRCA1 and BRCA2. Several models, such as BRCAPRO and the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA), have been developed to assess an individual’s risk of carrying a BRCA1 or BRCA2 pathogenic variant.4 Still, making a genetic diagnosis of these conditions first requires the patient or treating clinician to consider genetic causes in their differential diagnosis. Moreover, in many cases genetic diagnosis of an affected individual represents a missed opportunity to have identified and addressed their genetic risk prior to disease manifestation.

Clinicians need support to identify undiagnosed cases of genetic disease. Traditional clinical genetics has relied on the diagnostic criteria or risk algorithms above to identify such individuals, but advances in electronic health records (EHR) and artificial intelligence (AI) present the opportunity for broader case-finding through computational methods, ranging from simple rules-based algorithms to more advanced machine learning (ML) methods, a subset of AI.11 Such approaches might aid health systems in screening for these hereditary conditions by identifying patients who are likely to have undiagnosed genetic disease and would benefit from diagnostic genetic testing.

To determine the tools available to address these challenges, we performed a scoping review to summarize the available evidence for two questions related to the EHR-based identification of undiagnosed genetic conditions:

Question 1: What is known about the performance of EHR-based algorithms in identifying clinically diagnosed or genetically confirmed cases of CDC Tier 1 genetic disorders within a healthcare system-based population?

Question 2: What is known about the outcomes of the implementation of EHR algorithms plus targeted follow-up in identifying cases of CDC Tier 1 genetic disorders within an unselected healthcare system-based population?

Methods

Approach

Our preliminary review of the literature identified a paucity of studies and heterogeneity in methods used to assess EHR-based algorithms for screening for Tier 1 genetic disorders. We therefore chose a scoping review design, instead of a systematic review, to map existing methods and evidence in this emerging field. Our protocol was formulated using the Population-Concept-Context guidelines described for scoping reviews in the JBI Manual for Evidence Synthesis.12 The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guided the reporting of this review.13

Search strategy

To plan the full scoping review, a preliminary scan of the literature was performed using OVID MEDLINE with search terms including “Hereditary Breast and Ovarian Cancer,” “Familial Hypercholesterolemia,” “Lynch Syndrome,” “Electronic Health Records,” “Machine Learning,” and “Algorithms,” with appropriate AND/OR statements to narrow results to a manageable number for initial review. Initial review of these results was completed by a single reviewer and discussed among the research team. This knowledge was used to refine the eligibility criteria and plan a more comprehensive search. The full search was conducted on October 16, 2024. The search was conducted using three databases: Ovid MEDLINE, Embase, and Web of Science; see full search strategy in the Supplementary Material. The search was restricted to studies published in English. Downloaded search results were imported to Covidence review software for deduplication, screening, full-text review, and data extraction. The references of eligible studies were searched for additional eligible studies, which were also imported to Covidence for screening and full-text review.

Inclusion criteria and screening process

We used two distinct sets of inclusion criteria for selecting eligible studies. Our first set of inclusion criteria was developed for observational studies that evaluated the performance of EHR-based screening algorithms for Tier 1 genetic disorders (see Question 1 in the Introduction). Studies were considered eligible if they evaluated performance metrics for EHR-based screening algorithms for Tier 1 genetic disease, as defined below in Data extraction. Eligible studies used populations with a known proportion of individuals with Tier 1 genetic disorders, diagnosed clinically or through genetic testing, and the output of EHR-based screening algorithms was compared to these known diagnoses to assess algorithm performance. We defined an EHR-based screening algorithm as any algorithm that uses electronic health record data as input and delivers an output that includes some classification of likelihood of genetic disease (eg, “high likelihood of FH”). Examples included machine learning models and simple decision trees using computable diagnostic criteria. Of note, studies were not included if they only assessed risk for a certain phenotype (eg, risk of breast cancer) rather than genotype (eg, HBOC).

Our second set of inclusion criteria was developed for prospective studies that evaluated clinical implementation of EHR-based screening algorithms to identify previously undiagnosed individuals with Tier 1 genetic disorders (Question 2). Studies were considered eligible if they implemented EHR-based algorithms on data from an unselected health care population—that is, not limited to patients with a specific phenotype—combined with an intervention that promoted genetic testing and/or clinical evaluation for individuals screening positive on the algorithm.

During record screening, titles and abstracts were reviewed independently in duplicate by at least two team members. Conflicts were resolved during virtual team meetings. If conflicts could not be resolved during abstract and title review, the study was included in full-text review. During full-text review, studies were again reviewed independently by at least two team members. All conflicts were resolved during team meetings.

Data extraction

Data extraction was carried out using Covidence data extraction tools. Data were extracted on the general characteristics of each study, including year, country, study population, methods, and algorithm components. For Question 1, given the scoping review design, we used an expansive definition of performance for observational studies assessing EHR-based screening algorithms, extracting metrics such as specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC). Similarly, for prospective studies that implemented EHR-based algorithms as a way of identifying undiagnosed Tier 1 genetic disorders (Question 2), we extracted data on any outcomes of clinical utility, defined conceptually as a change in clinical management or improved patient outcomes. Example metrics included the number of new cases identified per individual screened, PPV, specificity, and sensitivity.
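For readers less familiar with the threshold-dependent metrics named above, the following is a minimal sketch of how sensitivity, specificity, PPV, and NPV follow from a 2x2 comparison of algorithm output against a reference diagnosis. The class and function names and the example counts are illustrative assumptions; they are not drawn from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output against the reference diagnosis."""
    tp: int  # flagged by the algorithm, confirmed case
    fp: int  # flagged by the algorithm, not a case
    fn: int  # not flagged, confirmed case
    tn: int  # not flagged, not a case


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four threshold-dependent screening metrics."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # cases correctly flagged
        "specificity": _ratio(c.tn, c.tn + c.fp),  # non-cases correctly not flagged
        "ppv": _ratio(c.tp, c.tp + c.fp),          # flagged who are cases
        "npv": _ratio(c.tn, c.tn + c.fn),          # not flagged who are non-cases
    }


# Hypothetical counts for illustration only (not from any included study).
example = ConfusionCounts(tp=40, fp=160, fn=10, tn=9790)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical example the algorithm detects 80% of cases, yet only 20% of flagged individuals are true cases, because non-cases greatly outnumber cases.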

Results

Study characteristics

Our comprehensive search of Ovid MEDLINE, Embase, and Web of Science identified 532 articles. After reviewing the reference lists of eligible studies, an additional 66 articles were added to Covidence for screening. After screening, 22 studies met inclusion criteria. The PRISMA diagram (Figure 1) shows the results of the screening process. Studies were conducted between 2015 and 2024. Of the 22 eligible studies, 9 were conducted in the United States,14–22 6 in the United Kingdom,23–28 2 in the Netherlands,29,30 3 in Australia,31–33 1 in Portugal,34 and 1 in South Africa.35 Twenty studies evaluated FH,15–17,19–35 1 study evaluated HBOC,18 and 1 evaluated both HBOC and Lynch syndrome.14 Fourteen were retrospective studies that assessed algorithm performance (Question 1),16–18,20,23–27,29,30,32,34,35 7 were prospective studies that implemented EHR-based algorithms in clinical care (Question 2),14,15,21,22,28,31,33 and 1 study included both retrospective and prospective components.19

Figure 1. PRISMA diagram illustrating study selection process (598 records identified; 22 studies ultimately included).

Performance of EHR-based algorithms for FH (Question 1)

Observational studies that assessed the performance of EHR-based algorithms for FH (Question 1) are summarized in Table S1. Of the 14 studies listed, 6 assessed algorithm performance against genetically confirmed FH as the gold standard of diagnosis16,27,29,30,34,35; the remaining 8 studies used clinical diagnosis of FH or expert review of EHR data as the gold standard.17,19,20,23–26,32 Twelve studies used EHR-based algorithms that incorporated ML, including random forest classifiers, logistic regression, deep learning, gradient-boosting machines (GBM), neural networks (NN), and ensemble models that incorporated multiple types of ML methods.16,17,19,24–30,34,35 Five studies included EHR-based algorithms that used rules-based classification, typically based on one or more sets of commonly used clinical diagnostic criteria, such as the DLCN, SB, or MEDPED criteria.20,27,30,32,34

Reported performance metrics for FH observational studies are also provided in Table S1. Among the best-performing models in each study, the AUROC ranged from 0.78 to 0.95. Other performance metrics, such as sensitivity, specificity, PPV, and NPV, varied widely, depending on the thresholds used to classify a positive FH screen. In each study that compared ML models against algorithms that extracted EHR data for automated classification based on accepted clinical diagnostic criteria, ML models outperformed diagnostic criteria based on AUROC values. For example, in the study by Stevens, the authors trained an ensemble ML model that achieved an AUROC of 0.79, compared to a simple rules-based classification algorithm based on DLCN scores that achieved an AUROC of 0.67.27
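Because AUROC is the metric used above to compare ML models with criteria-based scores, a brief sketch of its meaning may help: it is the probability that a randomly chosen case receives a higher score than a randomly chosen non-case, with ties counted as one half. The function below is a plain pairwise implementation for illustration, and both score lists are hypothetical; none of the included studies is known to have computed AUROC this way.

```python
def auroc(scores, labels):
    """Rank-based AUROC: share of case/non-case pairs ordered correctly.

    `labels` uses 1 for confirmed cases and 0 for non-cases; ties in
    score contribute one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical data: three confirmed cases followed by four non-cases.
labels = [1, 1, 1, 0, 0, 0, 0]
ml_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]  # continuous model output
rule_points = [8, 3, 3, 6, 3, 2, 1]              # integer criteria-style score

ml_auroc = auroc(ml_scores, labels)
rule_auroc = auroc(rule_points, labels)
print(f"ML model AUROC: {ml_auroc:.3f}")
print(f"Rules-based AUROC: {rule_auroc:.3f}")
```

Unlike sensitivity or PPV, this quantity does not depend on the threshold chosen to define a positive screen, which is one reason it is convenient for comparing a continuous ML score with a point-based criteria score.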

Outcomes from implementing FH algorithms (Question 2)

Seven studies reported the results of implementing EHR-based screening algorithms to prospectively identify individuals with undiagnosed FH. Characteristics and key findings of these studies are presented in Table 1. The types of models used for EHR-based screening include rules-based classification, random forest classification (FIND FH), and logistic regression (FAMCAT). Five of the studies reported genetic testing results, while 2 of the studies reported clinical diagnosis of FH. The majority of these studies screened large healthcare system-based populations for FH and then recruited a relatively small subset for clinical follow-up with or without genetic testing. For example, a study in Kaiser Permanente Northern California screened over 1.8 million patients, finding that 1 in 245 (over 7000) screened positive for “Probable FH” on a rules-based classification algorithm based on MEDPED criteria; ultimately only 82 patients met with a cardiologist and underwent genetic testing, 55 of whom (67.1%) tested positive.22
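The attrition described in this example can be made explicit with a short calculation. The sketch below uses only the counts reported for the Kaiser Permanente Northern California study in Table 1; the variable names and the `pct` helper are illustrative, and the number flagged is approximated from the reported 1-in-245 rate.

```python
# Counts as reported for the Kaiser Permanente Northern California study (Table 1).
screened = 1_831_658
flag_rate = 1 / 245          # "1 out of 245" screened positive for "Probable FH"
invited = 182                # invited for a cardiology visit
seen_by_cardiologist = 100
genetically_tested = 82
tested_positive = 55


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The exact number flagged is not given, so it is approximated from the rate.
approx_flagged = round(screened * flag_rate)

print(f"Approximate number flagged: {approx_flagged}")
print(f"Invited who met a cardiologist: {pct(seen_by_cardiologist, invited)}%")
print(f"Seen who underwent testing: {pct(genetically_tested, seen_by_cardiologist)}%")
print(f"Tested who were positive: {pct(tested_positive, genetically_tested)}%")
print(f"Flagged who were tested: {pct(genetically_tested, approx_flagged)}%")
```

The 67.1% figure is therefore the yield among the 82 people tested, who were invited starting with the highest LDL-C values, rather than a PPV estimated across everyone who screened positive.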

Table 1.

Studies reporting results of prospective application of EHR-based algorithms to detect familial hypercholesterolemia (Question 2).

Eid 2022 (ref 15)
Population (n screened): St. Elizabeth Healthcare System (264 264)
Type of algorithm(s) (name): Rules-based classification based on a combination of DLCN criteria and American Heart Association (AHA) criteria
Key findings:
  • 1573 patients flagged as high risk for FH by the hybrid model were offered referral to a lipid clinic or a precision medicine clinic for genetic testing
  • 6%-10% of patients flagged underwent genetic testing
  • 22% of the patients tested were positive for FH

Birnbaum 2021 (ref 22)
Population (n screened): Kaiser Permanente Northern California (1 831 658)
Type of algorithm(s) (name): Rules-based classification based on MEDPED criteria
Key findings:
  • 1 out of 245 patients in the cohort screened positive for “Probable FH”
  • 182 patients were invited for a cardiology visit, starting with those with the highest LDL-C values in a given age range
  • 100 patients met with a cardiologist, of whom 82 underwent genetic testing
  • 55 out of 82 (67.1%) tested positive for FH; an additional 20 family members were diagnosed through cascade testing

Sheth 2021 (ref 21)
Population (n screened): University of Pennsylvania Healthcare System patients with cardiovascular co-morbidities (1 607 606)
Type of algorithm(s) (name): Random forest (FIND FH)
Key findings:
  • 8614 patients flagged as “likely FH”
  • Primary care providers successfully contacted for 5006 patients, 153 of whom were seen in preventive cardiology clinics
  • 112 out of 153 (73.2%) patients received genetic testing, 16 of whom (14%) tested positive for FH

Brett 2021 (ref 33)
Population (n screened): General practices in Australia (232 139)
Type of algorithm(s) (name): Rules-based classification based on DLCN criteria (TARB-Ex)
Key findings:
  • 1843 patients flagged for risk of FH based on DLCN ≥ 5, 900 of whom (49%) were confirmed as high risk by general practitioner EHR review
  • 556 out of 900 (62%) were clinically assessed for FH by a general practitioner, of whom 147 (26%) were clinically diagnosed with FH
  • Follow-up data obtained from 77 patients with FH showed statistically significant reduction in LDL-C

Qureshi 2021 (ref 28)
Population (n screened): Patients in general practices in the UK (86 219)
Type of algorithm(s) (name): Logistic regression (FAMCAT1, FAMCAT2)
Key findings:
  • 3375 patients identified as “at-risk” for FH and invited for study
  • 260 out of 3375 patients underwent genetic testing
  • 16 out of 260 patients who underwent genetic testing were positive for FH
  • Authors calculated FAMCAT1 score, FAMCAT2 score, and DLCN criteria for 260 patients who underwent genetic testing
  • FAMCAT2 outperformed FAMCAT1 and DLCN criteria in predicting the presence of FH pathogenic variants
  • FAMCAT2 performance based on a threshold of 95% sensitivity: Sensitivity: 94.7%; Specificity: 31.2%; PPV: 25.8%; NPV: 95.9%

Myers 2019 (ref 19)
Type of algorithm(s) (name): Random forest (FIND FH)
Population 1 (n screened): US National Research Database (170 416 201)
Key findings, Population 1:
  • 1 331 759 screened positive for FH
  • Physicians of a subset of patients who screened positive were contacted to provide an assessment of FH based on clinical diagnostic criteria
  • EHR data from 45 patients were evaluated by contacted physicians, 87% of whom were identified as possible, probable, or definite FH
Population 2 (n screened): Patients in Oregon Health & Science University (OHSU) health-care system (173 733)
Key findings, Population 2:
  • 866 screened positive for FH, 103 of whom were reviewed by an FH expert
  • 77% of those reviewed by an expert were found to have possible, probable, or definite FH

Kirke 2015 (ref 31)
Population (n screened): Adult patients in Australia (94 379)
Type of algorithm(s) (name), Method 1: Rules-based classification (Canning Tool)
Key findings, Method 1 (Canning Tool in primary care clinic):
  • 41 100 patients screened, 2494 patients with positive screen invited for clinical assessment, 26 patients underwent genetic testing, of whom 3 (11.5%) were positive
Type of algorithm(s) (name), Method 2: Rules-based classification (LDL-C, total cholesterol threshold)
Key findings, Method 2 (screening based on thresholds of total cholesterol > 7.5 mmol/L or LDL-C > 4.5 mmol/L):
  • 52 200 patients screened, 4517 patients with positive screen invited for clinical assessment, 30 patients underwent genetic testing, of whom 8 (26.7%) were positive

Abbreviations: AHA, American Heart Association; DLCN, Dutch Lipid Clinic Network; EHR, electronic health record; FAMCAT, Familial Hypercholesterolemia Case Ascertainment Tool; FH, familial hypercholesterolemia; LDL-C, low-density lipoprotein cholesterol; MEDPED, Make Early Diagnosis Prevent Early Deaths; NPV, negative predictive value; PPV, positive predictive value; TARB-Ex, Tool for Assessing Risk of Familial Hypercholesterolemia in the Electronic Medical Record.
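To illustrate what the simplest rules-based classification in Table 1 amounts to computationally, the sketch below applies the lipid thresholds reported for Method 2 of Kirke 2015 (total cholesterol > 7.5 mmol/L or LDL-C > 4.5 mmol/L). The record layout, the function name, and the handling of missing values are assumptions made for this example and do not describe the study's actual implementation.

```python
from typing import Optional

# Thresholds as reported for Method 2 of Kirke 2015 in Table 1 (mmol/L).
TOTAL_CHOLESTEROL_THRESHOLD = 7.5
LDL_C_THRESHOLD = 4.5


def screens_positive(total_cholesterol: Optional[float],
                     ldl_c: Optional[float]) -> bool:
    """Flag a record when either lipid value exceeds its threshold.

    Assumption for this sketch: a missing value (None) never triggers a flag.
    """
    tc_high = (total_cholesterol is not None
               and total_cholesterol > TOTAL_CHOLESTEROL_THRESHOLD)
    ldl_high = ldl_c is not None and ldl_c > LDL_C_THRESHOLD
    return tc_high or ldl_high


# Hypothetical EHR extracts; the field names are illustrative.
records = [
    {"id": "A", "total_cholesterol": 8.1, "ldl_c": None},
    {"id": "B", "total_cholesterol": 6.0, "ldl_c": 4.9},
    {"id": "C", "total_cholesterol": 5.2, "ldl_c": 3.1},
    {"id": "D", "total_cholesterol": None, "ldl_c": None},
]

flagged = [r["id"] for r in records
           if screens_positive(r["total_cholesterol"], r["ldl_c"])]
print(flagged)
```

Criteria sets such as DLCN or MEDPED add further inputs, including age, family history, and physical findings, which is where missing EHR data becomes the limiting factor.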

Performance and outcomes from implementation of EHR-based algorithms for HBOC and Lynch syndrome (Questions 1 and 2)

The search identified only 2 studies of EHR-based algorithms addressing either Question 1 or 2 for HBOC and Lynch syndrome (Table 2). The study by Kiser was a retrospective study that assessed an EHR-based screening algorithm for HBOC.18 Their study population included a subset of individuals with a known proportion of HBOC pathogenic variants, and they reported performance of their algorithm with genetic testing as the gold standard. To develop and train their model, they extracted family history of HBOC-related cancers from EHR data using a combination of discrete codes and string matches in comment fields. The study by Del Fiol was a prospective study that implemented an EHR-based algorithm to screen for hereditary cancers, including both HBOC and Lynch syndrome.14 They implemented a rules-based screening tool based on National Comprehensive Cancer Network (NCCN) guidelines, using structured family history data within EHRs, to identify patients at risk of hereditary cancers; 2 of 10 patients who ultimately underwent genetic testing had positive genetic results (PPV 20%).14
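The general idea of combining discrete codes with string matches in comment fields can be sketched as follows. This is a hypothetical illustration only: the code values, the relative terms, and the regular expression are invented for this example and are not the rules used by Kiser or any other included study.

```python
import re

# Hypothetical structured codes; a real system would use its own vocabulary.
FAMILY_HISTORY_CODES = {"FH_BREAST_CA", "FH_OVARIAN_CA"}

# Hypothetical free-text rule: a relative term followed, within the same
# sentence, by a mention of breast or ovarian cancer.
RELATIVE = r"(mother|sister|daughter|aunt|grandmother|father|brother)"
CANCER = r"(breast|ovarian)\s+(cancer|ca)\b"
PATTERN = re.compile(rf"\b{RELATIVE}\b[^.]*?\b{CANCER}", re.IGNORECASE)


def has_family_history(codes, comment):
    """Return True if a discrete code or a comment-field match is present."""
    if FAMILY_HISTORY_CODES & set(codes):
        return True
    return PATTERN.search(comment or "") is not None


# Hypothetical records for illustration.
coded = has_family_history(["FH_BREAST_CA"], "")
free_text = has_family_history([], "Mother dx with ovarian cancer at 45.")
no_history = has_family_history([], "No family history of cancer.")
other_sentence = has_family_history([], "Sister healthy. Patient has breast cancer.")
print(coded, free_text, no_history, other_sentence)
```

A pattern this simple misses family history phrased in other ways and would wrongly match negated statements (for example, "mother did not have breast cancer"), which illustrates why string matching alone may capture family history incompletely.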

Table 2.

Studies reporting EHR-based algorithms to detect hereditary breast and ovarian cancer or Lynch syndrome (Questions 1 and 2).

Kiser 2024 (ref 18)
Study type: Diagnostic Test Accuracy Study (Question 1)
Population (n screened): Renown Health, a health system in Northern Nevada (835 727 total, 38 003 with prior genetic testing)
Type of algorithm: Rules-based algorithm for HBOC
Key findings: Authors implemented an algorithm that extracts data from EHR to assess whether patients met USPSTF eligibility for genetic services. The performance of their algorithm is based on a subset of patients (38 003) who obtained prior genetic testing as part of a population genomics study.
  • Performance on female population: Sensitivity: 29%; Specificity: 85%
  • Performance on male population: Sensitivity: 14%; Specificity: 93%

Del Fiol 2019 (ref 14)
Study type: Implementation Study (Question 2)
Population (n screened): University of Utah Health Centers (143 012)
Type of algorithm: Rules-based algorithm based on National Comprehensive Cancer Network (NCCN) criteria
Key findings: Authors implemented a clinical decision support tool that automatically extracts EHR data to identify patients meeting criteria for hereditary breast cancer and colon cancer genetic testing.
  • 5245 (3.7%) of population met criteria for genetic evaluation
  • 71 patients were contacted to schedule genetic counseling appointments
  • 2 of 10 patients who received genetic testing were positive for hereditary cancer syndrome(s)

Abbreviations: EHR, electronic health record; HBOC, hereditary breast and ovarian cancer; NCCN, National Comprehensive Cancer Network; USPSTF, U.S. Preventive Services Task Force.

Discussion

This scoping review examined the current evidence on EHR-based algorithms to identify undiagnosed cases of CDC Tier 1 genetic conditions, with a focus on two questions: (1) what is known about the performance of these algorithms in identifying clinically diagnosed or genetically confirmed cases, and (2) what is known about the outcomes from any clinical implementations of such algorithms in identifying new cases in unselected populations. While prior reviews have focused on familial hypercholesterolemia (FH) alone,36,37 our work expands the scope to include hereditary breast and ovarian cancer (HBOC) and Lynch syndrome, three conditions collectively proposed for opportunistic or population genomic screening.38–40 Although methods and populations varied and the overwhelming weight of evidence is for FH, this synthesis suggests that EHR-based algorithms are a promising tool for screening for Tier 1 genetic conditions and merit further evaluation.

For Question 1, we found 15 studies that evaluated algorithm performance, nearly all of which focused on FH. Algorithm performance for FH varied, with AUROC values ranging from 0.78 to 0.95. As shown in prior reviews of FH,36,37 studies that employed machine learning (ML) techniques, such as random forests, gradient boosting, or ensemble models, generally outperformed rules-based approaches using clinical diagnostic criteria like the Dutch Lipid Clinic Network (DLCN) or Simon-Broome (SB) criteria, which rely heavily on the presence or absence of genetic testing and information about relatives that is often not contained in an EHR (eg, presence of xanthomas in first-degree relatives). These results underscore the potential of ML methods to improve case-finding by leveraging complex patterns in structured EHR data. However, variation in algorithm performance metrics across studies reflects heterogeneity in study populations, diagnostic gold standards, and classification thresholds. Notably, ML models trained and validated using genomic data from biobanks showed particularly strong performance, suggesting a valuable role for research databases in algorithm development.16,27 In contrast, there is limited evidence for the effectiveness of EHR-based algorithms in screening for HBOC and Lynch syndrome. This gap may be due to the challenge of extracting meaningful family health history data from EHRs; such data are often poorly collected during routine clinical visits and frequently lack structured documentation.41–43 FH screening algorithms, on the other hand, generally use structured data such as diagnostic codes and LDL-C and total cholesterol levels.
The only study we identified assessing EHR-based algorithm performance for HBOC reported low sensitivity—29% in females and 14% in males—using an algorithm that extracted discrete codes and string matches from EHR comment fields.18 Recent studies have used natural language processing (NLP) to extract family history data to screen for hereditary cancers.43,44 These studies did not meet our eligibility criteria, however, as they did not evaluate algorithm performance against confirmed cases or incorporate these algorithms in clinical care, but they could represent a promising path forward to improve the extraction of family health history data to enable better genomic screening.

For Question 2, we found eight studies that prospectively implemented EHR-based algorithms. Seven of these addressed FH, and, as in prior reviews,36,37 we found that reported PPVs ranged widely from 11.1% to 67.1%, likely due in large part to differences in prevalence among the studied populations and whether the study used genetic testing or clinical diagnosis as the gold standard for FH diagnosis. Among studies using genetic testing as the gold standard, the study by Birnbaum reported the highest PPV at 67.1%, notably by initially recruiting individuals with the highest LDL-C levels (median LDL-C of 287.5 mg/dL) for genetic testing.22 The one clinical implementation study we identified for HBOC and Lynch syndrome was a small pilot study: of 71 patients contacted who screened positive on a rules-based algorithm based on NCCN guidelines, only 10 ultimately underwent genetic testing, two of whom were positive for hereditary cancer syndromes.14 While this study supports the further evaluation of EHR-based algorithms to identify individuals with HBOC or Lynch syndrome, evidence of effectiveness is currently lacking.
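The dependence of PPV on prevalence noted above follows directly from Bayes' rule and can be sketched in a few lines. The sensitivity and specificity values below are hypothetical and held fixed; the 1-in-250 prevalence is the FH estimate cited in the Introduction, and the 1-in-10 figure is an assumed prevalence for a pre-selected subgroup, such as patients with very high LDL-C.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical operating point, held constant across both populations.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

general_ppv = ppv(SENSITIVITY, SPECIFICITY, prevalence=1 / 250)  # FH estimate
enriched_ppv = ppv(SENSITIVITY, SPECIFICITY, prevalence=1 / 10)  # assumed subgroup

print(f"PPV at 1-in-250 prevalence: {general_ppv:.1%}")
print(f"PPV at 1-in-10 prevalence: {enriched_ppv:.1%}")
```

With identical sensitivity and specificity, the PPV in this sketch rises roughly tenfold when the same algorithm is applied to the enriched subgroup, so PPVs from studies that recruited differently are not directly comparable.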

This review identifies gaps and suggests next steps to advance EHR-based screening algorithms for Tier 1 genetic conditions. First, while ML techniques have shown high performance in identifying FH, their application to HBOC and Lynch syndrome remains limited. Advances in artificial intelligence, including deep learning and large language models, offer potential solutions to extract unstructured data, such as family health history and imaging data, and to integrate heterogeneous data sources.45,46 The Genetic Cancer Risk Detector (GARDE), an EHR-based screening algorithm incorporating NLP to identify patients meeting NCCN genetic testing criteria for hereditary cancers, demonstrates this promise but requires further validation and assessment of clinical effectiveness.44,47 Second, the collective findings across studies identified for Question 2 point to implementation challenges as a critical bottleneck to EHR-enabled genomic screening, which will persist even after algorithm performance is optimized. Most studies in this review reported low uptake of genetic testing among patients flagged as high-risk, pointing to barriers between algorithm output and clinical action. Implementing EHR-based genomic screening into clinical practice will require not only technical innovation but also alignment with patient, provider, and health system priorities, incentives, and workflows. Scalable implementation strategies will be essential, such as clinical decision support, automated referrals, and provider and patient engagement.
For example, the recent Broadening the Reach, Impact, and Delivery of Genetic Services (BRIDGE) randomized clinical trial used the GARDE algorithm to identify patients meeting criteria for hereditary cancer screening and found that a chatbot service was equivalent to standard of care in completion of pretest cancer genetic services and completion of genetic testing.48 Further implementation science research will be needed to address the gaps between EHR-based screening and receipt of appropriate care.

Our review has limitations. We restricted our search to English-language studies. Given the emerging nature of this field, we performed a scoping review instead of a systematic review. Heterogeneity in populations, methods, and outcome measures limited comparability across studies, and we did not formally assess study quality or risk of bias. Nonetheless, given the burgeoning interest in genomic screening, this review provides a valuable synthesis of emerging efforts to use EHRs for case-finding of Tier 1 genetic conditions.

In conclusion, early identification of CDC Tier 1 genetic conditions—familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome—through EHR-based screening algorithms holds promise but will require both technical and implementation advances to realize improved patient care and outcomes.

Supplementary Material

ocaf140_Supplementary_Data

Acknowledgments

Jason L. Vassy is an employee of the Department of Veterans Affairs (VA); the views expressed in this manuscript do not reflect those of the VA or the US government.

Contributor Information

William R Harris, Harvard Medical School, Boston, MA, 02115, United States.

Marianna S Hernandez, Massachusetts College of Pharmacy and Health Sciences, Boston, MA, 02115, United States.

Khanh N H Ngo, University of California, Irvine, Irvine, CA, 92697, United States.

Anne Fladger, Countway Library of Medicine, Harvard Medical School, Boston, MA, 02115, United States.

Charles A Brunette, VA Boston Healthcare System, Boston, MA, 02130, United States; Department of Medicine, Harvard Medical School, Boston, MA, 02115, United States.

Sulaiman R Hamarneh, VA Boston Healthcare System, Boston, MA, 02130, United States.

Joshua W Knowles, Department of Medicine, Division of Cardiovascular Medicine, Cardiovascular Institute and Prevention Research Center, Stanford University School of Medicine, Stanford, CA, 94305, United States; The Family Heart Foundation, Fernandina Beach, FL, 32034, United States.

Matthew S Lebo, Harvard Medical School, Boston, MA, 02115, United States; Department of Pathology, Harvard Medical School, Personalized Medicine, Mass General Brigham, Boston, MA, 02139, United States.

Jason L Vassy, VA Boston Healthcare System, Boston, MA, 02130, United States; Department of Medicine, Harvard Medical School, Boston, MA, 02115, United States.

Author contributions

William R. Harris (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing—original draft, Writing—review & editing), Marianna Hernandez (Formal analysis, Writing—review & editing), Khanh N.H. Ngo (Formal analysis, Writing—review & editing), Anne Fladger (Conceptualization, Data curation, Methodology, Resources, Software, Writing—review & editing), Charles A. Brunette (Writing—review & editing), Sulaiman Ragheb Hamarneh (Conceptualization, Writing—review & editing), Joshua W. Knowles (Writing—review & editing), Matthew S. Lebo (Writing—review & editing), and Jason L. Vassy (Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Supervision, Writing—original draft, Writing—review & editing)

Supplementary material

Supplementary material is available at Journal of the American Medical Informatics Association online.

Funding

Jason L. Vassy was supported by the National Institutes of Health [R35 HG010706; U01 HG013781] and the Department of Veterans Affairs [I01 HX003627; I01 CX002635]. Joshua W. Knowles was supported by the National Institutes of Health [R01 DK116750; R01 DK120565; R01 DK106236; R01 DK107437; R01 DK137889; P30 DK116074 (to the Stanford Diabetes Research Center)]. The funders had no role in the scoping review. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Conflicts of interest

None declared.

Data availability

No new data were generated or analyzed in support of this research.

References

  • 1. Murray MF, Khoury MJ, Abul-Husn NS. Addressing the routine failure to clinically identify monogenic cases of common disease. Genome Med. 2022;14:60. 10.1186/s13073-022-01062-6
  • 2. Tier 1 Genomics Applications and their Importance to Public Health | CDC. October 24, 2022. Accessed February 3, 2025. https://archive.cdc.gov/www_cdc_gov/genomics/implementation/toolkit/tier1.htm
  • 3. Bouhairie VE, Goldberg AC. Familial hypercholesterolemia. Cardiol Clin. 2015;33:169-179. 10.1016/j.ccl.2015.01.001
  • 4. Samadder NJ, Giridhar KV, Baffy N, Riegert-Johnson D, Couch FJ. Hereditary cancer syndromes—a primer on diagnosis and management: Part 1: breast-ovarian cancer syndromes. Mayo Clin Proc. 2019;94:1084-1098. 10.1016/j.mayocp.2019.02.017
  • 5. Williams MH, Hadjinicolaou AV, Norton BC, Kader R, Lovat LB. Lynch syndrome: from detection to treatment. Front Oncol. 2023;13:1166238. 10.3389/fonc.2023.1166238
  • 6. US Prevalence of Familial Hypercholesterolemia. American College of Cardiology. Accessed May 9, 2025. https://www.acc.org/latest-in-cardiology/journal-scans/2016/03/15/14/42/http%3a%2f%2fwww.acc.org%2flatest-in-cardiology%2fjournal-scans%2f2016%2f03%2f15%2f14%2f42%2fprevalence-of-familial-hypercholesterolemia-in-the-1999
  • 7. Owens DK, Davidson KW, Krist AH, US Preventive Services Task Force, et al. Risk assessment, genetic counseling, and genetic testing for BRCA-related cancer: US Preventive Services Task Force recommendation statement. JAMA. 2019;322:652-665. 10.1001/jama.2019.10987
  • 8. Hegde M, Ferber M, Mao R, Samowitz W, Ganguly A, Working Group of the American College of Medical Genetics and Genomics (ACMG) Laboratory Quality Assurance Committee. ACMG technical standards and guidelines for genetic testing for inherited colorectal cancer (Lynch syndrome, familial adenomatous polyposis, and MYH-associated polyposis). Genet Med. 2014;16:101-116. 10.1038/gim.2013.166
  • 9. Sturm AC, Knowles JW, Gidding SS, et al.; Convened by the Familial Hypercholesterolemia Foundation. Clinical genetic testing for familial hypercholesterolemia: JACC scientific expert panel. J Am Coll Cardiol. 2018;72:662-680. 10.1016/j.jacc.2018.05.044
  • 10. Leclerc J, Vermaut C, Buisine MP. Diagnosis of Lynch syndrome and strategies to distinguish Lynch-related tumors from sporadic MSI/dMMR tumors. Cancers. 2021;13:467. 10.3390/cancers13030467
  • 11. Simon GJ, Aliferis C, eds. Artificial Intelligence and Machine Learning in Health Care and Medical Sciences: Best Practices and Pitfalls. Springer Nature; 2024. 10.1007/978-3-031-39355-6
  • 12. Peters MD, Godfrey C, McInerney P, Munn Z, Tricco AC, Khalil H. Scoping reviews. In: Aromataris E, Lockwood C, Porritt K, Pilla B, Jordan Z, eds. JBI Manual for Evidence Synthesis. JBI; 2024. 10.46658/JBIMES-24-09
  • 13. Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467-473. 10.7326/M18-0850
  • 14. Del Fiol G, Kohlmann W, Bradshaw RL, et al. Standards-based clinical decision support platform to manage patients who meet guideline-based criteria for genetic evaluation of familial cancer. JCO Clin Cancer Inform. 2020;4:1-9. 10.1200/CCI.19.00120
  • 15. Eid WE, Sapp EH, Wendt A, Lumpp A, Miller C. Improving familial hypercholesterolemia diagnosis using an EMR-based hybrid diagnostic model. J Clin Endocrinol Metab. 2022;107:1078-1090.
  • 16. Gidding SS, Kirchner HL, Brangan A, et al. Yield of familial hypercholesterolemia genetic and phenotypic diagnoses after electronic health record and genomic data screening. J Am Heart Assoc. 2023;12:e030073. 10.1161/JAHA.123.030073
  • 17. Banda JM, Sarraju A, Abbasi F, et al. Finding missed cases of familial hypercholesterolemia in health systems using machine learning. NPJ Digit Med. 2019;2:23.
  • 18. Kiser D, Elhanan G, Bolze A, et al. Screening familial risk for hereditary breast and ovarian cancer. JAMA Netw Open. 2024;7:e2435901.
  • 19. Myers KD, Knowles JW, Staszak D, et al. Precision screening for familial hypercholesterolaemia: a machine learning study applied to electronic health encounter data. Lancet Digit Health. 2019;1:e393-e402.
  • 20. Safarova MS, Liu H, Kullo IJ. Rapid identification of familial hypercholesterolemia from electronic health records: the SEARCH study. J Clin Lipidol. 2016;10:1230-1239.
  • 21. Sheth S, Lee P, Bajaj A, et al. Implementation of a machine-learning algorithm in the electronic health record for targeted screening for familial hypercholesterolemia: a quality improvement study. Circ Cardiovasc Qual Outcomes. 2021;14:e007641.
  • 22. Birnbaum RA, Horton BH, Gidding SS, Brenman LM, Macapinlac BA, Avins AL. Closing the gap: identification and management of familial hypercholesterolemia in an integrated healthcare delivery system. J Clin Lipidol. 2021;15:347-357. 10.1016/j.jacl.2021.01.008
  • 23. Akyea RK, Qureshi N, Kai J, et al. Evaluating a clinical tool (FAMCAT) for identifying familial hypercholesterolaemia in primary care: a retrospective cohort study. BJGP Open. 2020;4:bjgpopen20X101114. 10.3399/bjgpopen20X101114
  • 24. Weng SF, Kai J, Andrew Neil H, Humphries SE, Qureshi N. Improving identification of familial hypercholesterolaemia in primary care: derivation and validation of the familial hypercholesterolaemia case ascertainment tool (FAMCAT). Atherosclerosis. 2015;238:336-343. 10.1016/j.atherosclerosis.2014.12.034
  • 25. Weng S, Kai J, Akyea R, Qureshi N. Detection of familial hypercholesterolaemia: external validation of the FAMCAT clinical case-finding algorithm to identify patients in primary care. Lancet Public Health. 2019;4:e256-e264. 10.1016/S2468-2667(19)30061-1
  • 26. Akyea RK, Qureshi N, Kai J, Weng SF. Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care. NPJ Digit Med. 2020;3:142. 10.1038/s41746-020-00349-5
  • 27. Stevens CAT, Vallejo-Vaz AJ, Chora JR, et al. Improving the detection of potential cases of familial hypercholesterolemia: could machine learning be part of the solution? J Am Heart Assoc. 2024;13:e034434.
  • 28. Qureshi N, Akyea RK, Dutton B, et al. Comparing the performance of the novel FAMCAT algorithms and established case-finding criteria for familial hypercholesterolaemia in primary care. Open Heart. 2021;8:10.
  • 29. Besseling J, Reitsma JB, Gaudet D, et al. Selection of individuals for genetic testing for familial hypercholesterolaemia: development and external validation of a prediction model for the presence of a mutation causing familial hypercholesterolaemia. Eur Heart J. 2017;38:565-573. 10.1093/eurheartj/ehw135
  • 30. Mohammadnia N, Bax WA, Cornel J. Sensitivity analysis of an electronic health record-based algorithm to facilitate detection of familial hypercholesterolemia: results in genetically confirmed familial hypercholesterolemia. Circulation. 2021;144:000752020003379.
  • 31. Kirke AB, Barbour RA, Burrows S, et al. Systematic detection of familial hypercholesterolaemia in primary healthcare: a community based prospective study of three methods. Heart Lung Circ. 2015;24:250-256. 10.1016/j.hlc.2014.09.011
  • 32. Troeung L, Arnold-Reed D, Chan She Ping-Delfos W, et al. A new electronic screening tool for identifying risk of familial hypercholesterolaemia in general practice. Heart Br Card Soc. 2016;102:855-861. 10.1136/heartjnl-2015-308824
  • 33. Brett T, Chan DC, Radford J, et al. Improving detection and management of familial hypercholesterolaemia in Australian general practice. Heart Br Card Soc. 2021;107:1213-1219. 10.1136/heartjnl-2020-318813
  • 34. Pina A, Helgadottir S, Mancina RM, et al. Virtual genetic diagnosis for familial hypercholesterolemia powered by machine learning. Eur J Prev Cardiol. 2020;27:1639-1646. 10.1177/2047487319898951
  • 35. Hesse R, Raal FJ, Blom DJ, George JA. Familial hypercholesterolemia identification by machine learning using lipid profile data performs as well as clinical diagnostic criteria. Circ Genomic Precis Med. 2022;15:e003324.
  • 36. Luo RF, Wang JH, Hu LJ, Fu QA, Zhang SY, Jiang L. Applications of machine learning in familial hypercholesterolemia. Front Cardiovasc Med. 2023;10:1237258. 10.3389/fcvm.2023.1237258
  • 37. Osei J, Razavi AC, Otchere B, et al. A scoping review of electronic health records-based screening algorithms for familial hypercholesterolemia. JACC Adv. 2024;3:101297. 10.1016/j.jacadv.2024.101297
  • 38. Guzauskas GF, Garbett S, Zhou Z, et al. Population genomic screening for three common hereditary conditions: a cost-effectiveness analysis. Ann Intern Med. 2023;176:585-595. 10.7326/M22-0846
  • 39. Mighton C, Shickh S, Aguda V, Krishnapillai S, Adi-Wauran E, Bombard Y. From the patient to the population: use of genomics for population screening. Front Genet. 2022;13:893832. 10.3389/fgene.2022.893832
  • 40. Brothers KB, Vassy JL, Green RC. Reconciling opportunistic and population screening in clinical genomics. Mayo Clin Proc. 2019;94:103-109. 10.1016/j.mayocp.2018.08.028
  • 41. Qureshi N, Wilson B, Santaguida P, et al. Family history and improving health. Evid Rep Technol Assess. 2009;186:1-135.
  • 42. Welch BM, Dere W, Schiffman JD. Family health history: the case for better tools. JAMA. 2015;313:1711-1712. 10.1001/jama.2015.2417
  • 43. Shi J, Morgan KL, Bradshaw RL, et al. Identifying patients who meet criteria for genetic testing of hereditary cancers based on structured and unstructured family health history data in the electronic health record: natural language processing approach. JMIR Med Inform. 2022;10:e37842. 10.2196/37842
  • 44. Bradshaw RL, Kawamoto K, Bather JR, et al. Enhanced family history-based algorithms increase the identification of individuals meeting criteria for genetic testing of hereditary cancer syndromes but would not reduce disparities on their own. J Biomed Inform. 2024;149:104568. 10.1016/j.jbi.2023.104568
  • 45. Zurita AJW, del Rio HM, de Aguirre NUR, et al. The transformative potential of large language models in mining electronic health records data: content analysis. JMIR Med Inform. 2025;13:e58457. 10.2196/58457
  • 46. Cheng S, Wei Y, Zhou Y, et al. Deciphering genomic codes using advanced natural language processing techniques: a scoping review. J Am Med Inform Assoc. 2025;32:761-772. 10.1093/jamia/ocaf029
  • 47. Bradshaw RL, Kawamoto K, Kaphingst KA, et al. GARDE: a standards-based clinical decision support platform for identifying population health management cohorts. J Am Med Inform Assoc. 2022;29:928-936. 10.1093/jamia/ocac028
  • 48. Kaphingst KA, Kohlmann WK, Lorenz Chambers R, et al. Uptake of cancer genetic services for chatbot vs standard-of-care delivery models: The BRIDGE randomized clinical trial. JAMA Netw Open. 2024;7:e2432143. 10.1001/jamanetworkopen.2024.32143

Articles from Journal of the American Medical Informatics Association : JAMIA are provided here courtesy of Oxford University Press
