American Journal of Human Genetics. 2024 Dec 5;111(12):2618–2642. doi: 10.1016/j.ajhg.2024.10.021

Prequalification of genome-based newborn screening for severe childhood genetic diseases through federated training based on purifying hyperselection

Stephen F Kingsmore 1,2, Meredith Wright 1,2, Laurie D Smith 1, Yupu Liang 3, William R Mowrey 3, Liana Protopsaltis 1,2, Matthew Bainbridge 1,2, Mei Baker 4, Sergey Batalov 1,2, Eric Blincow 1,2, Bryant Cao 1,2, Sara Caylor 1,2, Christina Chambers 5, Katarzyna Ellsworth 1,2, Annette Feigenbaum 1,2,5, Erwin Frise 6, Lucia Guidugli 1,2, Kevin P Hall 7, Christian Hansen 1,2, Mark Kiel 8, Lucita Van Der Kraan 1,2, Chad Krilow 9, Hugh Kwon 1,2, Lakshminarasimha Madhavrao 1,2, Sebastien Lefebvre 10, Jeremy Leipzig 9, Rebecca Mardach 1,2,5, Barry Moore 11, Danny Oh 1,2, Lauren Olsen 1,2, Eric Ontiveros 1,2, Mallory J Owen 1,2, Rebecca Reimers 1,12, Gunter Scharer 1, Jennifer Schleit 1, Seth Shelnutt 9, Shyamal S Mehtalia 7, Albert Oriol 1,2, Erica Sanford 1, Steve Schwartz 8, Kristen Wigby 1,2, Mary J Willis 1, Mark Yandell 11, Chris M Kunard 7, Thomas Defay 3
PMCID: PMC11639087  PMID: 39642867

Summary

Genome-sequence-based newborn screening (gNBS) has substantial potential to improve outcomes in hundreds of severe childhood genetic disorders (SCGDs). However, a major impediment to gNBS is imprecision due to variants classified as pathogenic (P) or likely pathogenic (LP) that are not SCGD causal. gNBS with 53,855 P/LP variants, 342 genes, 412 SCGDs, and 1,603 therapies was positive in 74% of UK Biobank (UKB470K) adults, suggesting 97% false positives. We used the phenomenon of purifying hyperselection, which acts to decrease the frequency of SCGD causal diplotypes, to reduce false positives. Training of gene-disease-inheritance mode-diplotype tetrads in 618,290 control and affected subjects identified 293 variants or haplotypes and seven genes with variable inheritance contributing higher positive diplotype counts than consistent with purifying hyperselection and with little or no evidence of SCGD causality. With these changes, 2.0% of UKB470K adults were positive. In contrast, gNBS was positive in 7.2% of 3,118 critically ill children with suspected SCGDs and 7.9% of 705 infant deaths. When compared with rapid diagnostic genome sequencing (RDGS), gNBS had 99.1% recall. In eight true-positive children, gNBS was projected to decrease time to diagnosis by a median of 121 days and avoid life-threatening disease presentations in four children, organ damage in six children, ∼$1.25 million in healthcare cost, and ten (1.4%) infant deaths. Federated training predicated on purifying hyperselection provides a general framework to attain high precision in population screening. Federated training across many biobanks and clinical trials can provide a privacy-preserving mechanism for qualification of gNBS in diverse genetic ancestries.

Keywords: newborn screening, genome sequencing, severe childhood genetic diseases, false positive, purifying hyperselection, infant mortality, query federation, diplotype, genetic architecture, artificial intelligence


Screening newborns by genome sequencing is anticipated to improve outcomes in hundreds of severe genetic disorders. A major problem is false-positive variants that, while pathogenic, do not cause severe childhood disease. We used federated training in 618,290 subjects based on purifying hyperselection to identify and remove false-positive variants.

Introduction

Newborn screening (NBS) is risk evaluation at birth for severe childhood genetic disorders (SCGDs) with early onset for which effective therapeutic interventions are available. Morbidity and mortality are minimized by early intervention. Since inception in 1963, NBS has expanded worldwide to ∼40 million newborns/year and up to 80 SCGDs (https://www.cdph.ca.gov/Programs/CFH/DGDS/Pages/nbs/default.aspx).1,2,3,4,5,6 In the United States (US), NBS identifies ∼12,500 affected infants per year, with biochemical geneticists playing a critical role in communication of results and facilitation of follow-up for screen-positive newborns. However, expansion of NBS to keep abreast of new therapeutic interventions for SCGDs is impeded by the lengthy evidence generation to merit inclusion on the Recommended Uniform Screening Panel (RUSP), the need for new custom assay development for most added disorders, state-by-state implementation, and paucity of medical geneticists. A long-standing goal has been to supplement RUSP NBS ∼50-fold by inclusion of all SCGDs with effective treatments that are detectable by genome sequencing (GS).7,8,9 Early efforts at genome-based NBS (gNBS) demonstrated safety and feasibility but were impeded by GS cost, immaturity of population bioinformatics resources, and lack of knowledge of variant pathogenicity.10,11,12,13,14 Recent advances in these areas enabled development of prototypic methods for parentally consented gNBS of 388 SCGDs, 342 genes, 29,771 pathogenic (P) and likely pathogenic (LP) variants, confirmatory diagnosis, and ∼1,500 treatments by virtual management guidance.15,16,17,18 Named BeginNGS (Begin Newborn Genome Sequencing), it is intended to be a complement to traditional NBS.
Recently, numerous groups worldwide have started or are planning independent clinical trials to evaluate the clinical utility of gNBS for SCGDs.19,20,21,22,23,24,25,26,27,28,29 Unlike BeginNGS, they employ the mainly manual current methods of variant interpretation and return of results developed for diagnostic genome and exome sequencing, and they generate ad hoc guidance regarding confirmatory testing and therapeutic interventions. For population implementation in millions of newborns per year, however, gNBS requires a much more scalable framework with emerging requirements that include: (1) very-low-cost, clinical-grade GS from dried blood spot (DBS) at a scale of millions; (2) very-low-cost and, therefore, highly automated GS clinical analysis, interpretation, and reporting for hundreds of SCGDs with high precision (positive predictive value [PPV]) and recall (sensitivity); (3) translation of positive screens into confirmatory tests by tens of thousands of geographically dispersed primary care pediatricians who lack genomic literacy, facilitated and assisted by available medical geneticists; and (4) precision medicine intervention implementation without delay by the same workforce. Described herein is BeginNGS version 2 (BeginNGS.2), a healthcare delivery platform for SCGDs that starts to meet these requirements. Another requirement is that BeginNGS development is adaptive and versioned in response to rapid advances in understanding of SCGD prevalence, penetrance, expressivity, causal variants, new efficacious therapies, and rapid evolution of GS, informatics, and AI. The technical approach to adaptive BeginNGS development has three elements.

  • (1)

    A structured rare disease molecular and treatment knowledge base that rapidly grows by addition of new disorders, treatments, and variants.15,16,17,18 This knowledge base informs a gNBS algorithm that is trained, ideally by distributed or federated learning from multisite, large diplotype models, to achieve analytic performance suitable for population risk screening (Figure 1A).30,31,32,33 Large diplotype models are sets of genotypes (diplotypes) for individual genes, comprising putatively pathogenic variants, from hundreds of thousands of GS from subjects of known age, sex, genetic ancestry, and, ideally, of known affected status for each SCGD to be screened. Akin to federated learning with large language models to identify rare disease phenotypes, large diplotype models can identify incorrect algorithm parameters, such as variants curated as pathogenic but which do not cause SCGDs with high penetrance.34,35,36,37

  • (2)

    A highly automated platform for standardized, scalable population screening, diagnosis, and treatment that is empowered by the knowledge base, screening algorithm, and an imputation algorithm called Transformer (Figure 1B).15,38 The former informs electronic clinical decision support (eCDS, called Genome-to-Treatment [GTRx]) that disseminates results and management guidance in a manner understandable to frontline providers nationwide to upskill them to translate positive results into optimal outcomes, facilitated and assisted by available medical geneticists.15,16,17,18 The screening algorithm and Transformer provide automated, supervised GS interpretation, which is critical for scalability. It should be noted that irrespective of eCDS, population genetic screening will exacerbate the profound current need to attract many more physicians into careers in medical genetics.

  • (3)

    Adaptive clinical trials that employ elements (1) and (2) to fill knowledge-base gaps, further train the screening algorithm, and evaluate clinical utility and cost-effectiveness for each SCGD across diverse demographics and geographies, together with acceptability to parents and providers (ClinicalTrials.gov IDs NCT06276348 and NCT06306521).

Figure 1.

Technical approach to structured, adaptive development of the BeginNGS.2 SCGD screening, diagnosis, and treatment platform

(A) Development of a structured SCGD molecular and treatment knowledge base and screening algorithm that is trained in multicentric, large diplotype models. Federated training identifies variants in BeginNGS.2 genes contributing to diplotypes with frequencies (fdiplotype 1 … n) inconsistent with purifying hyperselection, such that fdiplotype 1 … n are greater than P, the population prevalence of the corresponding genetic disease(s) after correction for penetrance (p), expressivity (e), diplotype heterogeneity (d), and locus heterogeneity (l) (Figure 2).

(B) Highly automated platform for scalable population screening, diagnosis, and treatment that is empowered by the knowledge base and trained algorithm. GS, genome sequence; SME, subject matter expert; Rx, treatment. Automated interpretation includes a diplotype query and use of the Transformer tool.

(C) Federated learning by (1) iterative queries of genomic sequences of UKB470K and RCIGM RDGS cohorts, with (2) return of positive diplotypes with zygosity and count of positive subjects and (3) removal of NSDCC variants and disorder MOI contributing excess positive counts. Rx, therapeutic intervention; GS, genome sequence; SME, subject matter expert; MOI, mode of inheritance; GTRx, Genome-to-Treatment; eCDS, electronic clinical decision support; DBS, dried blood spot; Exp., expected; TP, true positive rate; TN, true negative rate; AWS, Amazon Web Services; aiSNPs, ancestry-informative single-nucleotide polymorphisms; ETL, extract, transform, load; Db, database; VEP, variant effect predictor.

Here, we report methods to differentiate SCGD causal and non-causal diplotypes in large cohorts and results of the evaluation of BeginNGS.2 regarding clinical requirements for population screening such as maintainability, reliability, usability, testability, recall, PPV, and clinical utility.

Subjects and methods

Research participants

Research subjects were 469,902 UK Biobank (UKB470K) participants, 141,046 Mexico City Prospective Study (MCPS) participants, and 7,342 children, parents, or siblings who received rapid diagnostic GS (RDGS) for a suspected genetic disease at Rady Children’s Institute for Genomic Medicine (RCIGM).15,39,40,41,42,43,44,45 De-identified UKB470K exomes and phenotypes were queried through the UK Biobank Research Analysis Platform under application 82213 as previously described.15 At enrollment, UKB470K participants were aged 40–69 years, and 86%, 10.1%, 1.3%, and 1.1% were of white British, other European (EUR), African (AFR), and East Asian (EAS) ancestry, respectively.39,40,41,42 At enrollment, MCPS participants were aged >35 years, and 66%, 31%, and 1.1% were of indigenous Mexican, EUR, and AFR ancestry, respectively.43,44,45 De-identified MCPS ancestry-specific allele frequencies were downloaded from https://rgc-mcps.regeneron.com.43 Retrospective analysis of genomes, phenotypes, and electronic medical records (EMRs) of critically ill newborns and children, their parents, and siblings at RCIGM was approved by the Institutional Review Board (IRB) of Rady Children’s Hospital/University of California, San Diego (UCSD).15 Analysis of genetic contributors to infant death was performed in 1,000 consecutive infant deaths in San Diego County from 2010 to 2018 in the San Diego Study of Outcomes in Mothers and Infants (SOMI) with approval from the IRBs of UCSD and the California Department of Public Health (California Biobank Program, SIS request 1830, Section 6555(b), 17 CCR).46,47 The genetic ancestries of SOMI infants were 10.4% AFR, 27.8% American ancestry, 7.7% EAS, and 53.9% EUR ancestry. Of SOMI infants, 32% had ≥30% ancestry admixture.

Selection of disease-gene dyads, therapeutic interventions, and inheritance modes for BeginNGS.2 queries

Disorder and intervention curation for the GTRx management guidance system have been described in detail for BeginNGS.1 (Figure 1A).15,16,17,18 In brief, we examined the efficacy of therapeutic interventions available for 645 childhood-onset, single-locus genetic disorders that met the following criteria: acute presentations that were likely to lead to neonatal, pediatric, or cardiovascular intensive care unit (ICU) admission; having somewhat effective treatments; high likelihood of rapid progression without treatment; and diagnosable by GS. Publications relating to ∼10,000 interventions associated with these disorders were extracted with custom scripts (Rancho Biosciences) and curated manually for relevance. The interventions were adjudicated by six pediatric clinical and biochemical geneticists using a modified Delphi technique and electronic data capture (REDCap).15,48,49 Consensus was required for inclusion of interventions and disorders regarding: (1) age groups in which the intervention was indicated; (2) optimal time of intervention initiation; (3) contraindications; (4) efficacy category (curative, effective, ameliorative); and (5) level of evidence supporting efficacy.15,16,17,18 For BeginNGS.2, we re-evaluated the therapeutic interventions available for the 388 BeginNGS.1 disorders and 77 new gene-disorder dyads. We also evaluated the suitability of these gene-disease dyads for NBS as described with the same expert panel, electronic data capture, and Delphi methods.15,48,49 The panel comprised five pediatric clinical and biochemical geneticists representing hospitals in four states. To reach consensus regarding inclusion of a gene-disease dyad in BeginNGS.2, the panel considered six questions and clarifying subquestions.15

  • (1)
    Is the natural history of this genetic disease well understood?
    • a.
      Is there at least one well-established gene-phenotype association?
    • b.
      Is there significant variation in expressivity? Is expressivity sufficient in children to be characterized as a severe disease?
    • c.
      Is there reduced penetrance (see below)?
    • d.
      Is the mode of inheritance (MOI)—autosomal dominant (AD), autosomal recessive (AR), X-linked, or mitochondrial—well understood (see below)?
    • e.
      Is pathogenicity of at least a subset of DNA variants well understood (gain vs. loss of function)?
    • f.
      Is genotype-phenotype correlation sufficient for those variants to predict disease course?
    • g.
      Can variability in outcome or disease severity be clarified by additional investigation (such as an analyte, enzyme, biomarker, or functional test)?
  • (2)
    Is this genetic disease a significant risk for morbidity and mortality in infants or young children?
    • a.
      Is penetrance high enough such that identification of clinically insignificant disease is minimal or causes minimal harm?
  • (3)
    Is a treatment or intervention available that is effective and accepted?
    • a.
      Is a treatment available that can affect outcome?
    • b.
      Is a treatment effective for all affected individuals?
    • c.
      Is response to treatment consistent for a given recognized pathogenic variant(s)?
    • d.
      Is treatment effective for all symptoms of a disorder?
    • e.
      If no specific treatment is available, would making a diagnosis change management in some other way?
    • f.
      Is a treatment widely available and are there sufficient providers, facilities, and resources to accommodate all identified individuals?
    • g.
      Is a treatment acceptable to the majority population? Considerations include cost, morbidity of the treatment, and religious or political beliefs. For example, does this intervention require use of fetal-derived tissue?
  • (4)
    Does early treatment improve outcome?
    • a.
      Is there a latent phase during which initiation of treatment leads to improved outcome or prevents complications?
    • b.
      Does delayed diagnosis lead to poorer outcome or serious complications?
    • c.
      Does early diagnosis and treatment lead to improved outcome over reactive care following symptom onset?
  • (5)
    Do the benefits of early intervention clearly outweigh the risks?
    • a.
      Are false positives problematic with this gene?
    • b.
      Might NBS adoption of this condition have a negative net benefit? Considerations include the proband, family, and the general population.
    • c.
      Do concerns exist regarding identification of carriers?
  • (6)

    For genes with more than one associated disorder, do their treatments differ, and can they be distinguished by RDGS or additional testing?

While BeginNGS.1 was a research-grade prototype, BeginNGS.2 was designed to meet criteria for clinical testing. An expert panel of laboratory directors, genetic counselors, and doctoral-level genome analysts met weekly for 6 months to address five additional questions related to the GS testing performance and test reporting of each disease-gene dyad.

  • (1)

    Did the gene have several phenotype associations that represent a single phenotype spectrum? The expert consensus was to amalgamate such phenotypes if the mode of inheritance and therapy were identical (see supplemental results).

  • (2)

    Was the gene-disorder association of uncertain significance (GUS)? The expert consensus was that inclusion required each gene-disorder dyad to have been adequately described in >5 independent families (see supplemental results).

  • (3)

    Can a majority of affected individuals be identified by short-read GS? Sensitivity may be low in “dead zone” exons (such as adjacent to LINE elements), genes with highly homologous pseudogenes, or difficult variants (such as inversions). The expert consensus, based on comparisons with other screening tests, was that an overall BeginNGS sensitivity of >90% was desired. This implied that only a few BeginNGS gene-disorder dyads should have lower sensitivity. In such disorders, GS identification of variants associated with >50% of affected individuals was required for retention (see supplemental results).

  • (4)

    Is childhood penetrance sufficient for population screening? The expert consensus was that an average PPV of >40% during early childhood was desired for BeginNGS.2 (see supplemental results).

  • (5)

    Several BeginNGS-eligible phenotypes are restricted to a very small subset of observed variants, such as rare gain of function or dominant negative variants. While these genes were not included in BeginNGS.2, causal variants were incorporated in an “Allow List” that permits screening for these disorders and inheritance patterns delimited to these variants.

A web resource integrated the BeginNGS.2 and GTRx information resources and the adjudicated interventions of 412 retained disorders associated with 342 genes and 1,603 interventions (https://gtrx.rbsapp.net/; Tables S1 and S2).15,16,17,18

The MOIs used in BeginNGS queries of gene-disorder dyads were also re-evaluated, using three questions, as part of the transition from the BeginNGS.1 research-grade prototype to the BeginNGS.2 clinical test for NBS.

  • (1)

    Would the burden of follow-up of asymptomatic positives overwhelm primary care pediatricians? For gene-disease dyads in which a mild dominant disorder also has a severe recessive form, should BeginNGS.2 queries be limited to the severe recessive form?

  • (2)

    For autosomal disorders in which Mendelian Inheritance in Man (MIM) lists inheritance as both AD and AR, which MOI should be employed in BeginNGS.2 queries?50 In BeginNGS.1, we elected to query these in a dominant manner.

  • (3)

    For X-linked recessive (XR) disorders, does Lyonization lead to sufficiently severe disease in female carriers to warrant population screening? BeginNGS.1 screened for these disorders under a recessive MOI.

Table S1 shows the 412 retained BeginNGS.2 disorders, 342 genes, and MOI. Periodic review of these allocations will be necessary considering new knowledge.

Variant selection

The input for BeginNGS.2 was 53,855 germline variants mapping to 412 disease-gene dyads (Table S3). They were 43,064 ClinVar P, LP, and variants with conflicting pathogenicity assertions that included at least one P or LP assertion (May 4, 2023 XML release). 15,624 P or LP published variants were identified in 119 BeginNGS.2 genes using Mastermind, with evidence curation and variant interpretation according to standard American College of Medical Genetics (ACMG) clinical guidelines (Genomenon).51 4,833 variants were common to ClinVar and Genomenon sets. Variants of uncertain significance (VUS) were excluded, except for variants common to ClinVar and Genomenon that were annotated as P or LP in one.
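The three counts above are consistent with the input set being the union of the two sources. A minimal arithmetic check, assuming (as the totals suggest) that the 4,833 shared variants are counted once; the variable and function names are illustrative only:

```python
# Counts reported in the text above.
clinvar_count = 43_064    # ClinVar P, LP, or conflicting with at least one P/LP assertion
genomenon_count = 15_624  # P or LP variants curated via Mastermind (Genomenon)
shared_count = 4_833      # variants common to both sets


def union_size(a: int, b: int, shared: int) -> int:
    """Size of the union of two sets by inclusion-exclusion."""
    return a + b - shared


total_variants = union_size(clinvar_count, genomenon_count, shared_count)
print(total_variants)  # 53855, the stated BeginNGS.2 input
```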

Rapid diagnostic whole-genome sequencing

Clinical GS methods from EDTA blood samples and DBS were as described.15,17,46,47 In brief, genomic DNA was isolated from blood with the EZ1 DSP DNA Blood Kit (Qiagen) or from five 3-mm² archived, de-identified, NBS DBS punches with the DNA Flex Lysis Reagent Kit (Illumina). Sequencing libraries were prepared with the DNA PCR-free Tagmentation library kits (Illumina). Libraries with concentration >3 nM were sequenced (2 × 101 nucleotides) on NovaSeq 6000 instruments (Illumina). Quality controls were Q30 ≥80%, error rate ≤3%, and >120 Gb sequence generated per sample. GSs were aligned to human genome GRCh37 and variants identified and genotyped with the DRAGEN platform (v.3.9, Illumina). Structural variants were filtered to retain those affecting coding regions associated with SCGDs and with allele frequencies <2% in the RCIGM database. GS variant quality controls included: (1) identity tracking by CODIS short tandem repeats by capillary electrophoresis (Thermo Fisher) and in silico from GS; (2) <15% duplicates; (3) >98% aligned reads; (4) Ti/Tv ratio 2.0–2.2; (5) Hom/Het variant ratio 0.40–0.61; (6) >90% of OMIM genes with >10-fold coverage of all coding nucleotides; (7) sex match; (8) coverage uniformity by GC bias, standard deviation of coverage normalized to average coverage, and the total length of the reference genome with read coverage. Variants were interpreted according to standard guidelines by clinical molecular geneticists with GEM and Enterprise software (Fabric Genomics) using the variant call format (VCF) file, list of observed human phenotype ontology terms, and individual metadata. Variant diplotypes were ranked according to phenotypic match with the associated genetic disease, pathogenicity classification, and rarity in population databases. Variants were confirmed by Sanger sequencing, multiplex-ligation-dependent probe amplification, or chromosomal microarray, as appropriate.
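The numeric thresholds above lend themselves to an automated gate. The following is a sketch only, not the RCIGM pipeline: the class and field names are hypothetical, range bounds are assumed to be inclusive, and the identity, sex-match, and coverage-uniformity checks are omitted because the text gives no numeric cutoffs for them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SampleQC:
    """Hypothetical container for per-sample QC metrics (all names are illustrative)."""
    q30_pct: float
    error_rate_pct: float
    yield_gb: float
    duplicate_pct: float
    aligned_pct: float
    ti_tv: float
    hom_het: float
    omim_genes_covered_pct: float  # % of OMIM genes with >10-fold coding coverage


def qc_failures(s: SampleQC) -> list[str]:
    """Return the names of the thresholds (as quoted in the text) that a sample fails."""
    checks = {
        "Q30 >= 80%": s.q30_pct >= 80,
        "error rate <= 3%": s.error_rate_pct <= 3,
        "yield > 120 Gb": s.yield_gb > 120,
        "duplicates < 15%": s.duplicate_pct < 15,
        "aligned reads > 98%": s.aligned_pct > 98,
        "Ti/Tv 2.0-2.2": 2.0 <= s.ti_tv <= 2.2,          # inclusive bounds assumed
        "Hom/Het 0.40-0.61": 0.40 <= s.hom_het <= 0.61,  # inclusive bounds assumed
        "OMIM genes covered > 90%": s.omim_genes_covered_pct > 90,
    }
    return [name for name, passed in checks.items() if not passed]


# Made-up example values, not data from the study.
good = SampleQC(92.0, 0.5, 135.0, 8.0, 99.2, 2.05, 0.55, 97.0)
bad = SampleQC(92.0, 0.5, 135.0, 20.0, 99.2, 2.05, 0.55, 97.0)
print(qc_failures(good))  # []
print(qc_failures(bad))   # ['duplicates < 15%']
```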

Re-pipelining of WGS, TileDB development, and queries

We realigned CSI and TBL files for 7,342 children, their relatives, and SOMI infants who received GS at RCIGM to the GRCh38 reference genome using DRAGEN (v.3.9) on Illumina Connected Analytics (ICA) as described.15 We developed array-based data models for genomic variants and metadata extracted from Fabric Enterprise, Ensembl, gnomAD, ClinVar, and Variant Effect Predictor (VEP). The resultant VCFs were ingested into a TileDB array (v.2.8) on AWS S3 using TileDB-VCF (v.0.15).15 Metadata fields from prior RDGS were extracted from Fabric Enterprise, and interpretation reports were de-identified, lifted to GRCh38 coordinates, and ingested into TileDB-Cloud (v.0.7.41), together with Ensembl (v.104), gnomAD (v.3.1.1), ClinVar (downloaded May 20, 2022) and VEP (v.105) metadata for each variant.15 We parsed 342 BeginNGS.2 genes and queried the VCFs with the variants selected above. Multiallelic variant rows were flattened. We retained high-quality variants and annotated the query results with gene information, project-specific subject codes, gender, and disorder MOI. We used custom scripts to calculate variant zygosity and determine whether genotypes represented BeginNGS.2 positives based on diplotypes and disorder MOI. Completeness of query results was assessed by comparison with results of prior diagnostic interpretation. Among individuals who had been diagnosed with a BeginNGS.2 disorder, additional BeginNGS-positive individuals were sought by analysis of VCFs using the automated interpretation tool, Transformer in Fabric Enterprise.15,38 In SOMI infants, GEM was performed with a Bayes-factor-based cutoff of >0.1 and the phenotype death in infancy, HP:0000152.38
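As an illustration of the flattening and zygosity steps, the sketch below splits a multiallelic record into one row per alternate allele and derives zygosity from the recoded genotype. It is not the authors' custom script: the record layout, the function names, and the choice to recode any other alternate allele as 0 are assumptions made for this example.

```python
def flatten_multiallelic(record):
    """Split one multiallelic record into one biallelic row per ALT allele.

    `record` is a hypothetical dict with keys chrom, pos, ref, alts (list of ALT
    alleles) and gt (tuple of called allele indices; 0 = reference).
    """
    rows = []
    for alt_index, alt in enumerate(record["alts"], start=1):
        # Recode relative to this ALT only: 1 = this ALT, 0 = anything else
        # (reference or a different ALT); this recoding is an assumption.
        gt = tuple(1 if allele == alt_index else 0 for allele in record["gt"])
        rows.append({"chrom": record["chrom"], "pos": record["pos"],
                     "ref": record["ref"], "alt": alt, "gt": gt})
    return rows


def zygosity(gt):
    """Classify a recoded genotype as absent, het, hom, or hemi (single-copy call)."""
    alt_copies = sum(gt)
    if alt_copies == 0:
        return "absent"
    if len(gt) == 1:
        return "hemi"
    return "hom" if alt_copies == len(gt) else "het"


# Made-up record: one site with two ALT alleles, subject carries one copy of each.
example = {"chrom": "chr1", "pos": 100, "ref": "A", "alts": ["G", "T"], "gt": (1, 2)}
flat = flatten_multiallelic(example)
print([(row["alt"], zygosity(row["gt"])) for row in flat])
```

In this example each flattened row is heterozygous, which is the configuration a downstream recessive query would need to recognize as a candidate compound heterozygote.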

UK Biobank exome queries

BeginNGS gene regions were extracted from UKB470K exome pVCFs, after which we split multiallelic rows, normalized indels, and filtered out low-quality variants as described.15 We intersected the two variant sets and identified positive individuals based on MOI and individual zygosity (heterozygous for dominant disorders, and compound heterozygous, hemizygous, or homozygous for recessive disorders). Federated UKB470K exome queries were performed on the AstraZeneca Center for Genomics Research instance. Federated UK Biobank genome queries are discussed in the supplemental information.
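The zygosity rule stated above can be sketched as a small predicate. This is an illustration under stated assumptions rather than the published query code: the function and constant names are hypothetical, any queried variant is treated as sufficient for a dominant disorder, and two heterozygous variants in the same gene are treated as compound heterozygous without confirming that they are in trans.

```python
DOMINANT_MOI = {"AD", "XD"}
RECESSIVE_MOI = {"AR", "XR"}


def is_screen_positive(moi, zygosities):
    """Decide positivity for one individual and one gene-disorder dyad.

    `zygosities` lists the zygosity ('het', 'hom', or 'hemi') of each queried
    P/LP variant observed in the gene for this individual.
    """
    observed = [z for z in zygosities if z in ("het", "hom", "hemi")]
    if not observed:
        return False
    if moi in DOMINANT_MOI:
        # One variant copy suffices under a dominant MOI (assumed to include hom/hemi).
        return True
    if moi in RECESSIVE_MOI:
        if any(z in ("hom", "hemi") for z in observed):
            return True
        # Two or more heterozygous variants: assumed compound heterozygous (phase unknown).
        return observed.count("het") >= 2
    raise ValueError(f"unsupported MOI: {moi}")


print(is_screen_positive("AD", ["het"]))         # True
print(is_screen_positive("AR", ["het"]))         # False (carrier)
print(is_screen_positive("AR", ["het", "het"]))  # True
print(is_screen_positive("XR", ["hemi"]))        # True
```

Because phase is not checked here, a real implementation would need either phasing or parental data to exclude two variants in cis.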

Analysis of results of federated queries related to severe childhood genetic diseases

Detailed methods of federated queries of UK Biobank and RCIGM datasets are provided in supplemental methods. Supervised, federated training in the UKB470K, MCPS, and RCIGM large diplotype models was performed to define the genetic prevalence of gene-disorder dyads and the genetic architecture of severe-childhood-disease-causing variation in BeginNGS genes with BeginNGS modes of inheritance across diverse genetic ancestries and diverse population age groups. A by-product of this analysis is a genetic record of the natural history of gene-disorder dyads across diverse genetic ancestries. To accomplish this, it was first necessary to identify variants that had been curated as P or LP but that were unlikely to be causal of severe childhood disease (non-severe disease causal in childhood, NSDCC).15 We sought to identify and remove likely NSDCC variants comprehensively in a generalizable manner to increase the PPV of BeginNGS.2 gNBS queries (Figure 2).

Figure 2.

Training of the BeginNGS.2 genetic disease screening algorithm in multicentric, large diplotype models

(A) Federated training in large GS cohorts flags P or LP variants for evaluation as non-severe disease causing in childhood (NSDCC) based on absence of purifying hyperselection evidenced by contributing diplotype frequencies (f) that are greater than those expected based on the sum of the corresponding disease prevalences (P) following correction for penetrance (p), expressivity (e), diplotype heterogeneity (d), and locus heterogeneity (l).

(B) Manhattan plot of counts of 2,785 diplotypes that were gNBS positive in UKB470K. The x axis shows chromosome number and relative nucleotide position from the lowest value (left) to the highest value (right). The y axis is the diplotype count in UKB470K. 113 diplotypes with counts ≥54 in UKB470K (frequency >1 in 8,703) are indicated in green if disease causal (n = 16), and in red if determined to be NSDCC (n = 97) using the method of (A). The top 109 CFTR diplotypes (with counts >3, 1 in 118,000) are also indicated as green if disease causal (n = 5) and red if not (n = 104).

(C) Rank ordering of 2,785 diplotype counts in UKB470K from largest (left) to smallest (right). The x axis shows the diplotype rank from most common (left) to least common (right). The y axis is the diplotype count in UKB470K. The top 10 (darker shaded blue) and 100 (lighter shaded blue) diplotypes accounted for 91% and 97%, respectively, of the total diplotype count. The 113 diplotypes with frequencies >1 in 8,703 (counts ≥54) are indicated in green if disease causal (n = 16), and in red (n = 97) if determined to be NSDCC using the method of (A), indicating the power to reduce false positives.

For each BeginNGS.2 gene, step 1 was to calculate P, the sum of the prevalence of the corresponding x genetic disease(s) associated with that gene (range 1–9) in a population (Figure 2 and Table 1), where

$$P = \sum_{1}^{x} \left(\text{Prevalence of genetic disease in the population}\right).$$
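To make the subsequent corrections concrete, the sketch below applies the two expressions given symbolically in the Table 1 header (Pl/pe for the corrected prevalence, then multiplication by d for the corrected diplotype frequency) to the HNF1A row, and then flags diplotypes whose observed frequency exceeds the corrected expectation. The function names, the diplotype labels, and the diplotype counts are hypothetical; only the HNF1A parameters and the cohort size are taken from the text.

```python
def corrected_prevalence(P, penetrance, expressivity, locus_het):
    """P_corrected = P * l / (p * e), per 100,000 individuals."""
    return P * locus_het / (penetrance * expressivity)


def corrected_diplotype_frequency(P_corrected, diplotype_het):
    """f_corrected = P_corrected * d, per 100,000 individuals."""
    return P_corrected * diplotype_het


def flag_excess_diplotypes(counts, cohort_size, f_corrected):
    """Return diplotypes whose observed frequency per 100,000 exceeds f_corrected."""
    return {
        name: count
        for name, count in counts.items()
        if count / cohort_size * 100_000 > f_corrected
    }


# HNF1A row of Table 1: P = 6,400; p = 0.7; e = 0.7; l = 0.01; d = 0.10.
P_corr = corrected_prevalence(6_400, 0.7, 0.7, 0.01)
f_corr = corrected_diplotype_frequency(P_corr, 0.10)
print(round(P_corr), round(f_corr, 1))  # 131 13.1, matching the tabulated values

# Hypothetical diplotype counts in a cohort the size of UKB470K (469,902 subjects).
hypothetical_counts = {"diplotype_A": 500, "diplotype_B": 12}
flagged = flag_excess_diplotypes(hypothetical_counts, 469_902, f_corr)
print(sorted(flagged))  # ['diplotype_A']
```

Flagged diplotypes are candidates for review as NSDCC rather than automatic exclusions; in the text, removal also requires little or no evidence of SCGD causality.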

Table 1.

Identification of variants and inheritance patterns associated with 80 BeginNGS severe childhood genetic diseases that, while pathogenic, are not severe childhood disease causal by comparison of actual UK adult population prevalence per 100,000 individuals with the observed UKB470K genetic prevalence and correction for penetrance, expressivity, and locus and diplotype heterogeneity

Symbols, in column order: P, p, e, l, Pl/pe, d, Pcorrected × d, fdiplotype 1 … n fcorrected, Ocorrected

| Gene | OMIM or Orphanet disorder ID | MOI | BeginNGS MOI | Observed UKB470K disorder genetic prevalence (O) | Actual UK adult disorder prevalence | Estimated adult penetrance | Estimated adult expressivity | OMIM disorders assoc. with gene | Genes assoc. with OMIM disorder | Estimated locus heterogeneity | Corrected adult UK genetic prevalence (Pcorrected) | BeginNGS.2 variants assoc. with gene | Diplotype heterogeneity | Corrected UK diplotype frequency (fcorrected) | UKB470K positive diplotype count | Blocklist variant count | UKB470K positive diplotype count minus blocklist | Mean (95% CI) UKB470K genetic prevalence minus blocklist (Ocorrected) | MOI Δ | Mean (95% CI) UKB470K genetic prevalence with blocklist & MOI Δ | Supplemental references |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HNF1A | 600496, O: 324575 | AD | | 52,905 | 6,400 | 0.7 | 0.7 | 3 | 396 | 0.01 | 131 | 259 | 0.10 | 13.1 | 78 | 2 | 43 | 69 (62–77) | | 69 (62–77) | Nkonge et al.,52 Shepherd et al.,53 Shields et al.,54 Snider et al.,55 Shepherd et al.56 |
| CYP21A2 | 201910 | AR | | 37,320 | 10 | 0.5 | 0.5 | 2 | 1 | 1 | 40 | 101 | 0.10 | 4.0 | 170 | 2 | 28 | 54 (48–62) | | 54 (48–62) | Berglund et al.,57 van der Linde et al.,58 Neocleous et al.,59 Barbaro et al.,60 Baumgartner-Parzer et al.61 |
| CFTR | 219700 | AR | | 2,349 | 20 | 0.7 | 0.7 | 1 | 4 | 0.95 | 39 | 1261 | 0.10 | 3.9 | 499 | 20 | 72 | 52 (46–59) | | 52 (46–59) | CFTR database, Boussaroque et al.,62 Barton et al.63 |
| GALT | 230400 | AR | | 1,043 | 4 | 0.6 | 0.6 | 1 | 4 | 0.9 | 10 | 250 | 0.10 | 1.0 | 40 | 2 | 6 | 1.3 (0.6–2.8) | | 1.3 (0.6–2.8) | Badiu Tișa et al.64 |
| FGG | 202400, 616004 | AD, AR | AD | 955 | 15 | 0.5 | 0.5 | 2 | 3 | 0.33 | 20 | 21 | 0.20 | 4.0 | 11 | 4 | 7 | 6.2 (4.3–8.9) | | 6.2 (4.3–8.9) | Simurda et al.,65 Ergoren and Ismail66 |
| TCF3 | 616941 | AR | | 870 | 0.5 | 0.5 | 0.5 | 2 | 39 | 0.1 | 0.2 | 18 | 0.20 | 0.0 | 9 | 2 | 1 | 0.2 (0–1.2) | | 0.2 (0–1.2) | Person67 |
| G6PD | 230400 | XD | XR | 852 | 300 | 0.75 | 0.75 | 1 | 1 | 1 | 533 | 239 | 0.10 | 53.3 | 77 | 4 | 46 | 311 (295–327) | | 311 (295–327) | Koromina et al.,68 Geck et al.,69 Powell et al.70 |
| BTD | 253260 | AR | | 396 | 2 | 0.5 | 0.5 | 1 | 1 | 1 | 8 | 191 | 0.10 | 0.8 | 38 | 2 | 9 | 2.8 (1.6–4.7) | | 2.8 (1.6–4.7) | ORPHA:79241, Wolf71 |
| SCN5A | 609634, 603830, 113900 | AD | | 390 | 450 | 0.35 | 0.7 | 9 | >50 | 0.15 | 276 | 474 | 0.10 | 27.6 | 119 | 9 | 113 | 214 (201–228) | | 214 (201–228) | Hayesmoore et al.,72 Coll et al.,73 Akai et al.,74 Verheul et al.,75 Villarreal-Molina et al.,76 Wilde and Amin,77 Ackerman,78 Van Driest et al.,79 Walsh et al.,80 ORPHA:154, ORPHA:166282, ORPHA:101016, McGurk et al.,81 Lipov et al.82 |
| SCN1A | 607208, 604403, 609634 | AD | | 349 | 49 | 0.5 | 0.7 | 4 | >100 | 0.2 | 28 | 1735 | 0.10 | 2.8 | 57 | 2 | 54 | 78 (71–87) | | 78 (71–87) | ORPHA:569, ORPHA:33069, ORPHA:2382, ORPHA:1942, ORPHA:293181, ORPHA:369992, Miller et al.83 |
| DSP | 605676, 615821 | AD, AR | AD | 312 | 19 | 0.5 | 0.7 | 5 | >100 | 0.2 | 11 | 754 | 0.10 | 1.1 | 105 | 34 | 99 | 78 (70–86) | | 78 (70–86) | ORPHA:65282, ORPHA:476096, ORPHA:2032, ORPHA:154, ORPHA:293165, ORPHA:217656 |
| KCNQ1 | 220400, 609621 | AD | | 289 | 51 | 0.5 | 0.7 | 4 | 16 | 0.4 | 58 | 500 | 0.10 | 5.8 | 115 | 2 | 112 | 239 (225–253) | | 239 (225–253) | ORPHA:101016, ORPHA:90647 |
| ABCC8 | 618857, 256450 | AD, AR | AD | 273 | 25 | 0.5 | 0.7 | 5 | 8 | 0.4 | 29 | 400 | 0.10 | 2.9 | 114 | 1 | 113 | 271 (239–302) | AR | 0 (0–0.8) | Snider et al.,55 Fan et al.,84 ORPHA:552, ORPHA:99885 |
| GLA | 301500 | XR | XD | 243 | 30 | 0.6 | 0.6 | 1 | 1 | 1 | 83 | 459 | 0.15 | 12.5 | 26 | 2 | 14 | 21 (17–25) | | 21 (17–25) | Bokhari et al.85 |
| PRF1 | 603553 | AR | | 186 | 0.5 | 0.4 | 0.4 | 2 | 38 | 0.2 | 0.6 | 177 | 0.15 | 0.09 | 18 | 1 | 4 | 1.3 (0.6–2.8) | | 1.3 (0.6–2.8) | West et al.86 |
| GLRA1 | 149400 | AD | AD | 156 | 1 | 0.5 | 0.7 | 1 | 4 | 0.5 | 1.4 | 64 | 0.15 | 0.2 | 26 | 1 | 25 | 64 (48–79) | AR | 0 (0–0.8) | ORPHA:3197 |
| F8 | 306700 | XR | | 132 | 24.6 | 0.5 | 0.7 | 1 | 1 | 1 | 70 | 2782 | 0.10 | 7.0 | 84 | 21 | 61 | 76 (68–84) | | 76 (68–84) | Iorio et al.,87 Johnsen et al.,88,89 Xu et al.90 |
| GCH1 | 128230, 233910 | AD, AR | AD | 121 | 10 | 0.5 | 0.7 | 2 | 37 | 0.27 | 8 | 94 | 0.15 | 1.2 | 13 | 3 | 9 | 3.4 (2.1–5.5) | | 3.4 (2.1–5.5) | Yoshino et al.,91 ORPHA:98808, ORPHA:2102, ORPHA:238583 |
| SCN4A | 170500, 613345, 614198, 168300 | AD, AR | AD | 104 | 11 | 0.5 | 0.7 | 7 | >25 | 0.33 | 10 | 220 | 0.10 | 1.0 | 62 | 0 | 61 | 104 (95–114) | | 104 (95–114) | ORPHA:99736, ORPHA:682, ORPHA:681, ORPHA:684, ORPHA:590 |
| SLC22A5 | 614707 | AR | | 91 | 2 | 0.5 | 0.5 | 1 | 2043 | 0.5 | 4 | 233 | 0.15 | 0.6 | 25 | 1 | 11 | 3.0 (1.8–5.0) | | 3.0 (1.8–5.0) | ORPHA:1582 |
| CPOX | 121300 | AD, AR | AD | 90 | 0.2 | 0.5 | 0.25 | 2 | 4 | 1 | 2 | 71 | 0.15 | 0.2 | 21 | 0 | 21 | 92 (73–111) | AR | 1.3 (0.6–2.8) | Andrews et al.,92 Lamoril et al.93 |
| RYR1 | 145600 | AD | | 87 | 50 | 0.5 | 0.7 | 1 | 6 | 0.75 | 107 | 684 | 0.10 | 10.7 | 44 | 92 | 44 | 87 (79–95) | | 87 (79–95) | Ibarra Moreno et al.,94 ORPHA:423, Monnier et al.,95 Rosenberg et al.,96 Ibarra et al.97 |
| CHRND | 616321, 616322, 616323 | AD; AR | AD | 86 | 7 | 0.5 | 0.5 | 4 | 32 | 0.2 | 6 | 52 | 0.15 | 0.8 | 22 | 0 | 22 | 70–106 | AR | 0 (0–0.8) | Finsterer98,99 |
| CHRNB1 | 616313, 616313 | AD; AR | AD | 57 | 7 | 0.5 | 0.5 | 4 | 32 | 0.2 | 6 | 27 | 0.20 | 1.1 | 10 | 0 | 10 | 44–74 | AR | 0 (0–0.8) | Finsterer98,99 |
CHRNA1 601462, 608930 AD; AR AD 40 7 0.5 0.5 4 32 0.2 6 40 0.15 0.8 15 1 14 28–52 AR 0 (0–0.8) Finsterer98,99
NR5A1 617480, 612965 AD 83 9 0.5 0.7 4 11 0.5 13 199 0.15 1.9 20 5 14 18 (15–22) 18 (15–22) ORPHA:2138, ORPHA:393, ORPHA:243, ORPHA:242, ORPHA:251510
HNF4A 616026, 125850, O:263455 AD 68 6,400 0.7 0.7 2 396 0.01 131 82 0.10 13.1 25 2 24 48 (42–55) 48 (42–55) Nkonge et al.,52 Shepherd et al.,53,56 Shields et al., 54 Snider et al.,55
HESX1 182230 AD, AR AD 68 9 0.5 0.7 1 8 0.3 8 24 0.20 1.5 10 0 10 54–86 AR 0 (0–0.8) ORPHA:478
KCNT1 615005, 614959 AD 67 49 0.5 0.7 2 >100 0.1 14 97 0.15 2.1 16 2 15 18 (14–22) 18 (14–22) ORPHA:569, ORPHA:33069, ORPHA:2382, ORPHA:1942, ORPHA:293181, ORPHA:369992, Miller et al.83
KCNQ2 613720, 121200 AD 42 49 0.5 0.7 3 >100 0.1 14 770 0.15 2.1 26 1 25 12–30 12–30 ORPHA:569, ORPHA:33069, ORPHA:2382, ORPHA:1942, ORPHA:293181, ORPHA:369992, Miller et al.83
SCN2A 613721 AD 42 49 0.5 0.7 3 >100 0.1 14 667 0.15 2.1 18 3 16 21 (17–25) 21 (17–25) ORPHA:569, ORPHA:33069, ORPHA:2382, ORPHA:1942, ORPHA:293181, ORPHA:369992, Miller et al.83
IDS 309900 XR 41 9 0.75 0.75 1 1 1 16 286 0.30 4.8 5 1 3 1.7 (0.9–3.4) 1.7 (0.9–3.4) ORPHA:580
IDUA 607014, 607015, 607016 AR 40 1 1 0.8 1 1 1 2 428 0.33 0.6 2 2 2 0.2 (0–1.2) 0.2 (0–1.2) Clarke100
ABCC6 614473 AR 35 0.2 0.8 0.8 3 2 1 0.4 414 0.10 0.036 59 1 32 8.7 (6.4–12) 8.7 (6.4–12) ORPHA:51608
CACNA1C 618447, 601005 AD 16 51 0.5 0.7 4 16 0.2 29 7 0.25 7.3 7 3 7 16 (12–20) 16 (12–20) ORPHA:101016, ORPHA:90647
F2 613679 AR 15 0.1 1.0 1.0 3 1 1 0 63 0.33 0.03 2 2 0 0 (0–0.8) 0 (0–0.8)
ACADVL 201475 AR 13 2 1.0 0.5 1 1 1 4 576 0.15 0.6 13 1 14 13 (10–16) 13 (10–16) ORPHA:26793
UNC13D 608898 AR 10 0.5 0.5 0.5 1 38 0.1 0.2 239 0.30 0.06 3 4 2 0.6 (0.2–1.9) 0.6 (0.2–1.9) West et al.86
DMD 310200, 300376 XR 9 7.3 0.72 0.7 3 3 1 14 1682 0.20 2.9 9 1 8 7.4 (5.0–9.9) 7.4 (5.0–9.9) Restrepo-Cordoba et al.,101 Whitehead et al.,102 Broomfield et al.,103 Crisafulli et al.104
ALDOB 229600 AR 5 5 0.8 0.8 1 1 1 8 110 0.25 2.0 6 2 6 5.0 (3.4–7.6) 5.0 (3.4–7.6) Gaughan et al.105
BTK 300755 XR 4 0.2 0.75 0.75 2 1 1 0.4 895 0.25 0.09 8 1 7 2.3 (1.3–4.2) 2.3 (1.3–4.2) Smith and Berglöf106
FANCL 614083 AR 2 5 0.9 0.8 1 21 0.15 1 143 0.33 0.3 2 1 0 0 (0–0.8) 0 (0–0.8) ORPHA:84
BRCA2 605724 AR 0.4 5 0.9 0.8 1 21 0.15 1 4687 0.33 0.3 2 1 2 0.4 (0.1–1.6) 0.4 (0.1–1.6) ORPHA:84
ACAD9 611126 AR 0.2 1 0.8 0.8 1 39 0.1 0.2 105 0.40 0.071 1 1 2 0.2 (0–1.2) 0.2 (0–1.2)
FANCA 227650 AR 0.2 5 0.9 0.8 1 21 0.15 1 713 0.40 0.4 1 5 1 0.2 (0–1.2) 0.2 (0–1.2) ORPHA:84
SLC6A5 614618 AD, AR AR 0.0 1 0.5 0.7 1 4 0.5 1 46 0.50 0.7 0 0 0 0 (0–0.8) 0 (0–0.8) ORPHA:3197
BRIP1 609054 AR 0.0 5 0.9 0.8 1 21 0.15 1 631 0.50 0.5 0 1 0 0 (0–0.8) 0 (0–0.8) ORPHA:84
FANCG 614082 AR 0.0 5 0.9 0.8 1 21 0.15 1 121 0.50 0.5 0 1 0 0 (0–0.8) 0 (0–0.8) ORPHA:84
NBN 251260 AR 0.0 0.5 1.0 1.0 1 1 1 1 466 0.50 0.3 0 2 0 0 (0–0.8) 0 (0–0.8) Varon et al.107
RAG2 233650, 603554, 601457 AR 0.0 1 0.5 0.5 3 >100 0.1 0.4 122 0.50 0.20 0 0 0 0 (0–0.8) 0 (0–0.8) Dorsey and Puck108
SCNN1B 264350 AR 0.0 2 0.5 0.5 3 4 0.25 2 61 0.50 1.0 0 2 0 0 (0–0.8) 0 (0–0.8) ORPHA:756
SCNN1G 264350 AR 0.0 2 0.5 0.5 3 4 0.25 2 13 0.50 1.0 0 1 0 0 (0–0.8) 0 (0–0.8) ORPHA:756

UK, United Kingdom; assoc., associated; O, Orphanet ID; MOI, mode of inheritance; Δ, change in. Formulas in the second row explain the derivation of corrected values.

x was obtained from MIM.50 Disease prevalence values were taken from the literature (supplemental references 1–119). Where possible, they were specific to the genetic ancestry or ancestries being evaluated (such as the United Kingdom [UK] for UK Biobank [UKB470K] queries). Where prevalence estimates differed, the most authoritative was selected.

For each BeginNGS.2 gene, step 2 was to calculate Pcorrected, the prevalence of the corresponding x genetic disease(s) in the population corrected for disease penetrance (p, range 0.25–1.0 in the UK), disease expressivity at the age of the population being queried (e, range 0.25–1.0 in UKB470K subjects aged 40–69 years), and locus heterogeneity (l, range 0.1–1). Thus,

$$P_{\mathrm{corrected}} = \frac{P\,l}{p\,e},$$

where values for p and e were obtained from the literature for that gene, population, and age range. Locus heterogeneity was the proportion of all disease-affected subjects attributable to that specific gene. Values for l were derived from the number of genes associated with each disease in MIM, together with the relative proportion of disease attributable to that gene from GeneReviews and the literature.109 Pcorrected is equivalent to the expected prevalence of disease-causing diplotypes for that disorder in that population.
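The correction can be checked numerically against Table 1. The following is a minimal Python sketch, assuming the correction takes the form P × l / (p × e); the function name `p_corrected` and the dictionary layout are illustrative and are not part of BeginNGS. The inputs are the P, l, p, and e values listed in Table 1 for three genes.

```python
def p_corrected(P, l, p, e):
    """Expected prevalence of disease-causing diplotypes (per 100,000).

    P: disorder prevalence per 100,000; l: locus heterogeneity;
    p: penetrance; e: expressivity at the age of the queried cohort.
    """
    if not (0 < p <= 1 and 0 < e <= 1 and 0 < l <= 1):
        raise ValueError("p, e, and l must lie in (0, 1]")
    return P * l / (p * e)


# (P, l, p, e) as listed in Table 1; the dictionary itself is illustrative.
table1_inputs = {
    "CFTR": (20, 0.95, 0.7, 0.7),
    "G6PD": (300, 1, 0.75, 0.75),
    "HNF1A": (6400, 0.01, 0.7, 0.7),
}

for gene, args in table1_inputs.items():
    print(gene, round(p_corrected(*args)))
```

Rounded to whole numbers, these agree with the Pcorrected column of Table 1 for the three genes shown.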

For each BeginNGS.2 gene, step 3 was to calculate O, the observed genetic prevalence, calculated by the sum of the frequencies of n unique diplotypes containing P and LP variants with appropriate MOI for the disorder mapped to that gene in the GS of individuals in a cohort derived from the population (such as UKB470K; Figure 2 and Table 1):

$$O = \sum_{1}^{n} (\text{observed diplotype frequencies in cohort}).$$

Step 3 then identified BeginNGS.2 genes for which O was considerably greater than Pcorrected:

$$O \gg P_{\mathrm{corrected}}.$$

For such genes, step 4 was to calculate fcorrected, the expected maximum diplotype frequency in the population, based on adjustment of Pcorrected for diplotype heterogeneity (d):

$$f_{\mathrm{corrected}} = P_{\mathrm{corrected}} \cdot d.$$

d (range 0.1–0.5) was derived from the number of unique diplotypes containing BeginNGS.2 P and LP variants in that gene, together with the relative proportion of disease attributable to that diplotype from MIM, GeneReviews, the literature, and locus-specific databases. It should be noted that the prevalence, penetrance, and expressivity of severe childhood single-locus diseases may vary with age and genetic ancestry, necessitating matching the population and cohorts for these.

Step 4 then examined fdiplotype1..n, the frequency of each of the n diplotypes mapped to that gene to identify those whose observed frequency in the cohort exceeded fcorrected:

$$f_{\mathrm{diplotype}\,1 \ldots n} > f_{\mathrm{corrected}}$$

or

$$f_{\mathrm{diplotype}\,1 \ldots n} > \frac{P\,l}{p\,e}\,d.$$
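Step 4 can be illustrated with a short Python sketch. It assumes that the threshold is Pcorrected × d and that a diplotype is flagged when its observed frequency exceeds that threshold. The function names, diplotype labels, and observed frequencies are hypothetical; only the CFTR values for Pcorrected (39/100,000) and d (0.10) come from Table 1.

```python
def f_corrected(p_corrected, d):
    """Expected maximum diplotype frequency (per 100,000)."""
    return p_corrected * d


def flag_diplotypes(observed, threshold):
    """Return labels of diplotypes whose observed frequency exceeds the threshold."""
    return sorted(label for label, freq in observed.items() if freq > threshold)


# CFTR values from Table 1: Pcorrected = 39 per 100,000, d = 0.10.
threshold = f_corrected(39, 0.10)

# Hypothetical diplotype labels and frequencies (per 100,000), for illustration only.
observed = {"diplotype_A": 1301.0, "diplotype_B": 2.5, "diplotype_C": 0.2}

flagged = flag_diplotypes(observed, threshold)
print(flagged)
```

Flagged diplotypes are candidates only; their variants still require the evidence review described in step 5.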

In step 5, variants contributing to diplotypes where fdiplotype1.n exceeded fcorrected were further evaluated as potential NSDCC variants based on pathogenicity evidence in ClinVar, Mastermind, and ACMG classification (Artificial Intelligence Classification Engine, Fabric Genomics) and functional consequences in the literature and locus-specific databases with quantitative phenotype-genotype information and disease severity, including, for CFTR (MIM: 602421)-associated cystic fibrosis (CF [MIM: 219700]) the Clinical and Functional Translation of CFTR (CFTR2) database (http://cftr2.org), and similar databases for F2 (MIM: 176930)-associated hypoprothrombinemia/dysprothrombinemia (MIM: 613679), F8 (MIM: 300841)-associated hemophilia A (MIM: 306700), F9 (MIM: 300746)-associated hemophilia B (MIM: 306900), FGA (MIM: 134820), FGB (MIM: 134830), and FGG (MIM: 134850)-associated dysfibrinogenemia/hypodysfibrinogenemia (MIM: 616004) and hypofibrinogenemia/afibrinogenemia (MIM: 202400), G6PD (MIM: 305900)-associated glucose-6-phosphate dehydrogenase deficiency (G6PDD [MIM: 300908]), GLA (MIM: 300644)-associated Fabry disease (MIM: 301500), OTC (MIM: 300461)-associated ornithine transcarbamylase deficiency (MIM: 311250), RYR1-associated malignant hyperthermia (MH) risk, and SCN5A-associated heart rhythm disorders (https://databases.lovd.nl/shared/genes/F2, https://www.cdc.gov/ncbddd/hemophilia/champs.html, https://site.geht.org/base-fibrinogene/).50,51,60,69,88,90,109,110,111,112 For example, we retained only the 237 G6PD variants associated with 1985 World Health Organization (WHO) class I–III and 2022 WHO class A and B G6PDD,69 and only the 339 RYR1 variants associated with MH in the Food and Drug Administration-recognized database.112 Variants classified as NSDCC by this evaluation were removed (blocklisted) from the BeginNGS.2 set, and Ocorrected was calculated:

$$O_{\mathrm{corrected}} = \sum_{1}^{n} f_{1 \ldots n} - \sum (f_{\text{diplotypes containing blocklist variants}}).$$

Steps 4 and 5 were repeated iteratively until O, the sum of the observed diplotype frequencies in that gene in the cohort, was in equilibrium with Pcorrected, the corrected, expected prevalence in the population. As a result of these steps, 286 variants were classified as NSDCC in BeginNGS.2. In disorders for which Ocorrected > Pcorrected despite removal of diplotypes containing blocklist variants, the MOI was reviewed. In seven disorders, OMIM lists the inheritance as either AD or AR. In these disorders, the MOI used for the BeginNGS query was changed from dominant to recessive, and Ocorrected was calculated in the same manner as before.
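The iteration of steps 4 and 5 can be sketched as a loop. This is a simplified illustration under stated assumptions, not the authors' implementation: the evidence review of step 5 is a curated, partly manual process and is represented here by a hypothetical callback `is_nsdcc`, and the variant identifiers and frequencies are invented.

```python
def observed_prevalence(diplotypes, blocklist=frozenset()):
    """Sum the frequencies of diplotypes that contain no blocklisted variant."""
    return sum(freq for variants, freq in diplotypes.items()
               if not blocklist.intersection(variants))


def train_blocklist(diplotypes, p_corr, f_corr, is_nsdcc):
    """Repeat steps 4 and 5 until O no longer exceeds Pcorrected.

    diplotypes maps a tuple of variant IDs to an observed frequency per 100,000.
    is_nsdcc stands in for the evidence review of step 5 (hypothetical callback).
    """
    blocklist = set()
    while observed_prevalence(diplotypes, blocklist) > p_corr:
        # Step 4: variants in remaining diplotypes that exceed fcorrected.
        candidates = {v for variants, freq in diplotypes.items()
                      if freq > f_corr and not blocklist.intersection(variants)
                      for v in variants}
        # Step 5: keep only candidates judged not to be causal.
        newly_blocked = {v for v in candidates if is_nsdcc(v)}
        if not newly_blocked:
            break  # nothing further to block; the MOI would be reviewed instead
        blocklist |= newly_blocked
    return blocklist, observed_prevalence(diplotypes, blocklist)


# Invented example data (per 100,000).
toy = {("v1", "v2"): 1301.0, ("v3", "v4"): 20.0, ("v5", "v5"): 1.5}
blocked, o_corr = train_blocklist(toy, p_corr=39, f_corr=3.9,
                                  is_nsdcc=lambda v: v == "v1")
print(blocked, o_corr)
```

The early exit reflects the text above: when blocklisting alone cannot bring Ocorrected down to Pcorrected, the next step is review of the MOI rather than further variant removal.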

BeginNGS.2 NBS assays

Supervised, automated GS analysis and interpretation by BeginNGS.2 has two steps. The first is an automated query that returns all positive diplotypes in an individual GS that contain BeginNGS.2 variants (after removal of blocklisted variants) in all BeginNGS.2 genes that obey BeginNGS.2 MOI rules. The second is a two-step process designed to recover disease-causing diplotypes composed of one or more non-BeginNGS.2 variants. This second step consists of GEM38 and Transformer (Fabric Genomics) in tandem. First GEM is run in NBS mode (without phenotype data and with variant analyses restricted to the BeginNGS genes and MOI; Table S1). The GEM output is input to Transformer, an AI agent that models a human reviewer’s actions when interpreting GEM results. Transformer was trained on a corpus of GEM results and clinical diagnoses. Transformer uses a probabilistic graphical model113 to discover and model the data features used by clinical geneticists to interpret GEM results for diagnosis. Transformer was trained using clinical diagnoses in 119 RDGS probands.38 BeginNGS.2 screen positives are the superset of results from the automated query and Transformer analysis.

Retrospective clinical utility assessment

The potential added clinical utility of BeginNGS was evaluated retrospectively by comparing the actual age at diagnosis with counterfactual return of BeginNGS.2 results on day of life (DOL) 10 in 152 of the 3,118 critically ill children who had previously received RDGS with disease detection by both modalities.15,47 By review of the EMRs, we calculated their ages at symptom onset, first hospitalization, and diagnosis by RDGS. For children in whom the age at diagnosis was considerably later than DOL 10, we undertook further EMR review to ascertain the severity and acuity of disease presentation, whether there were definitive changes in treatment upon diagnosis, the total number of hospitalizations and days hospitalized before diagnosis, and whether the child suffered disease-related organ damage. The observed presentations and organ damage were compared with the clinical features of the specific genetic disease in the literature to determine which were attributable to that molecular diagnosis. Based on the assessed efficacy of each indicated intervention for that disorder in GTRx, we compared the impact on the observed clinical features of disease of starting those interventions at the actual age of diagnosis by RDGS with that at the counterfactual age at BeginNGS.2 return of result (DOL 10).15,47 We assessed the degree of confidence for these assertions on a scale of low, medium, high, and very high. In SOMI infants for whom a genetic disease was identified by BeginNGS.2, we performed a similar analysis to determine whether infant death was likely to be attributable to that disorder. Based on the efficacy of each indicated intervention for that disorder in GTRx, we determined whether infant mortality might have been avoided by starting those interventions in the neonatal period.

Statistical analysis

Confidence intervals of binomial proportions were calculated with Wilson score 95% confidence intervals. Differences in categorical variables were calculated by Fisher’s exact test.
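As an illustration of the interval calculation, the sketch below implements the standard Wilson score interval using only the Python standard library. The cohort size of 470,000 is an assumption that approximates UKB470K; it is not a figure stated in this section. With that assumption, zero and one positive subjects give intervals of the same size as the "0 (0–0.8)" and "0.2 (0–1.2)" entries per 100,000 in Table 1.

```python
import math


def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for k successes in n trials (default z gives 95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


N_ASSUMED = 470_000  # assumed approximate cohort size, for illustration only

for k in (0, 1):
    low, high = wilson_interval(k, N_ASSUMED)
    # Report per 100,000 subjects, as in Table 1.
    print(k, round(low * 1e5, 1), round(high * 1e5, 1))
```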

Results

Disorder selection for version 2 of BeginNGS

While version 1 was a research-grade prototype, BeginNGS.2 was designed for clinical gNBS. Of 77 new gene-SCGD dyads evaluated by expert review of published evidence according to Wilson and Jungner principles, 44 (57%) were added to BeginNGS.2 (Table S1).3,15,16,17,18 Critical reappraisal of the 388 version 1 disorders led to removal of 34 (9%) due to insufficient penetrance, expressivity, or severity to warrant inclusion in population NBS, consolidation of disorders comprising a phenotypic spectrum, evidence as genes of uncertain significance, or insufficient identification of causative variants by short-read GS (see supplemental results for details). In total, BeginNGS.2 included 412 (73%) of 568 gene-SCGD dyads evaluated (Table S1). It should be noted that the consensus to retain a disorder did not imply sufficient evidence for inclusion in public health gNBS. Rather, it indicated that the benefit-to-harm ratio was sufficient for inclusion in gNBS research studies.

The MOIs used in BeginNGS queries were also critically reappraised for transition from research to clinical test (supplemental results). Inheritance was re-evaluated in 29 disorders for which MIM lists inheritance as both AD and AR,50 six disorders for which Lyonization leads to sufficiently severe disease in female carriers to warrant population screening, and disorders for which severe childhood disease is limited to homozygotes and compound heterozygotes (see supplemental results for details). For example, G6PDD queries in BeginNGS.2 were changed from X-linked dominant to X-linked recessive and limited to variants associated with more severe manifestations (WHO classifications I–III and A and B).69 As a result, the MOI used in BeginNGS.2 queries was changed in 36 (9%) of 412 gene-disorder dyads to limit screening to SCGDs (Table S1). As noted below, further review of gene-disorder dyads and MOI used in gNBS was undertaken in response to evaluation in large cohorts.

Therapeutic interventions for BeginNGS.2 disorders

One of the most important reasons for disorder selection for gNBS is the likelihood of improved outcomes by early therapeutic interventions in identified affected individuals.15,16,17,18 BeginNGS.2 results are returned together with the GTRx eCDS that contains disorder natural history information, appropriate confirmatory tests and specialist consultants, and structured evaluations of the efficacy, evidence of efficacy, indications, contraindications, and urgency of initiation of the therapeutic interventions corresponding to the screened disorders. Since the pace of therapeutic development and approval for childhood genetic disorders is accelerating,114 we reviewed or re-reviewed the therapeutic literature for new BeginNGS.2 disorders and version 1 disorders.15,16,17,18 Fifty interventions were added for 31 version 1 disorders, and 140 interventions were removed from 46 BeginNGS.1 disorders, representing a 12% change in an 18-month interval. For 44 new BeginNGS.2 disorders, 119 interventions were added. In total, BeginNGS.2 provides clinical guidance for 1,603 beneficial interventions for 412 SCGDs (Table S2). Of note, only 16.1% of 9,965 interventions reviewed were adjudged to have sufficient evidence of efficacy for inclusion. This reflects the paucity of clinical trials for existing SCGD therapies and the extensive use of off-label treatments and therapies based on case report and case series evidence.15,16,17,18

Variant selection by training and testing in large control-subject cohorts

Meeting the requirement that gNBS scales to whole populations at very low cost necessitates GS interpretation principally by supervised AI rather than the manually intensive current clinical standard. BeginNGS.1 provided a prototype solution.15 BeginNGS.1 featured an automated query with ∼30,000 ClinVar P or LP variants that were manually trained by root-cause analysis of UK Biobank exomes, identifying 94 variants with evidence against disease causality (i.e., NSDCC). Since UK Biobank subjects were aged 40–69 years at the time of recruitment, they serve as a control population for almost all SCGDs with high penetrance and expressivity. Upon removal of NSDCC, the estimated specificity of BeginNGS.1 was >99.7% in the UK Biobank.15 The input for BeginNGS.2 was almost twice as large (53,855 ClinVar and Mastermind variants) and featured variants that were curated to be P, LP, and with conflicting pathogenicity assertions (which included at least one P or LP assertion, Table S3). Since for many variants the associated condition was not delineated in ClinVar at that time, ClinVar P and LP assertions were not necessarily for the 412 BeginNGS.2 SCGDs. Additionally, P and LP assertions are generally developed in affected subjects receiving diagnostic GS. Thus, they generally ignore consideration of penetrance in general populations and early childhood expressivity, which are critically important considerations in risk assessment for SCGDs in healthy newborns. Likewise, in affected subjects undergoing diagnostic GS for disorders with variable MOI, the penetrance and early childhood expressivity of diagnostic variants are not considered, whereas they are highly pertinent in risk assessment for SCGDs in healthy newborns. Variable MOI, low penetrance, and low newborn expressivity were anticipated to exacerbate imprecision (low PPV) by contributing to contaminating NSDCC variants.
Indeed, query of the UKB470K exomes with 53,855 variants identified 474,264 positive diplotypes in 73.5% (345,443) of subjects (after correction for positive subjects with >1 diplotype per disease-gene dyad, Table S4). Of note, 99.4% of the variants map within the exome target regions. These contained 2,173 (4%) variants in diplotypes that matched the MOI of the 412 disease-gene dyads. This exceeded the combined prevalence of the 412 SCGDs in UK adults aged >40 years by 37-fold, suggesting 97% to be false positives.
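The step from a 37-fold excess to 97% false positives can be made explicit. As a rough worked estimate, assuming that every truly affected adult is among the screen positives and that the expected count equals the combined SCGD prevalence:

$$\text{false-positive fraction} \approx 1 - \frac{\text{expected positives}}{\text{observed positives}} = 1 - \frac{1}{37} \approx 0.973.$$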

To solve the problem of imprecision associated with NSDCC variants comprehensively in a reproducible, structured manner that met clinical requirements, we performed supervised federated training in several large diplotype models. Federated training (queries of datasets at different sites, currently without machine learning) identified high-likelihood NSDCC variants on the basis of population frequencies that violated the effects of purifying hyperselection on SCGDs. Diplotypes, rather than genotypes, were queried to provide direct counts and distinguish compound zygosity states from haplotypes with more than one variant. Following NSDCC variant removal, automated supervised interpretation of gNBS in individual subjects was performed both by variant diplotype query and by AI-based pathogenicity prediction algorithms (GEM and Transformer; Figure 1; see subjects and methods).38 The trained, adjudicated variant set was used as a true positive training set for refinement of Transformer, which supplemented direct gNBS queries by identification of novel variants with high likelihood of disease causality to achieve high recall.15,38

Federated training in large diplotype models occurred in several stages. BeginNGS.2 genes were identified in which the expected prevalence of the associated disease in the UK adult population (Pcorrected) was considerably lower than the observed genetic prevalence in UKB470K adult exome sequences (O) using all 53,855 variants and the MOI discussed above (Figure 2 and Table 1). Where more than one disease was associated with a gene, their combined prevalence was used. O was the sum of the observed BeginNGS.2 diplotype frequencies in UKB470K. Pcorrected was derived from the actual disorder prevalence, P, in UK adults by correcting for estimated penetrance, p, adult expressivity, e, and locus heterogeneity, l. Where more than one gene was associated with a disorder, locus heterogeneity was the proportion of disease-affected subjects associated with that gene. An exception was RYR1 (MIM:180901)-associated malignant hyperthermia susceptibility (MIM: 145600), where P and LP variants and prevalence excluded those associated with RYR1-associated congenital myopathy (MIM: 255320) and King-Denborough syndrome (MIM: 619542).112 Prevalence, penetrance, and age-specific expressivity values were estimates from the literature or results of prior population screening. The prevalence, penetrance, and expressivity of SCGDs often vary with age. For example, the prevalence of Duchenne muscular dystrophy (DMD [MIM: 310200]) in males aged 15–19 years is 2-fold higher than in those aged <5 years (due to increased expressivity) and 7-fold higher than in those aged >25 years (due to early death).102,115 Among 52 BeginNGS.2 genes and 80 associated SCGDs surveyed in detail, O exceeded Pcorrected in 37 (71%) genes and 59 (74%) SCGDs (Table 1). For example, the observed UKB470K genetic prevalence of cystic fibrosis (O) was 2,349/100,000 (the sum of the frequency of 499 diplotypes comprising 160 of 1,261 BeginNGS.2 CFTR variants; Table 1 and Figure S3A). 
The prevalence of cystic fibrosis (CF [MIM: 219700]) in UK newborns is 40/100,000 and median survival is 47 years,116 suggesting an adult UK prevalence, P, of ∼20/100,000 (∼117-fold less than O; Table 1 and Figure S3A). CFTR (MIM: 602421) is one of four loci associated with bronchiectasis and elevated sweat chloride (BESC). The others are SCNN1B (MIM: 600760)-associated BESC1 (MIM: 211400), SCNN1A (MIM: 600228)-associated BESC2 (MIM: 613021), and SCNN1G (MIM: 600761)-associated BESC3 (MIM: 613071).50 Since BESC1–3 are much less common than CF, the locus heterogeneity, l, of CFTR-CF was estimated at 0.95 (Table 1 and Figure S3A). The adult penetrance, p, and expressivity, e, of CFTR-CF are high, and thus were set to 0.7 (Table 1; Figure S3A). These values gave a Pcorrected value for CFTR-CF of 39/100,000, which was 60-fold lower than O, suggesting some of the 160 variants to be NSDCC (Table 1). Additional details of the federated query are provided in supplemental results.

The second stage was to identify BeginNGS.2 variants that contributed to diplotypes whose expected UK adult population frequency (fcorrected) was much lower than the observed diplotype frequency in UKB470K and evaluate these further as potential NSDCC variants (Figure 2 and Table 1). fcorrected had corrections for diplotype heterogeneity, penetrance, expressivity, and locus heterogeneity. Diplotype heterogeneity, d, was the largest proportion of subjects affected by a disease-gene dyad associated with a specific diplotype. For example, the UK diplotype heterogeneity for CF-CFTR was set at 0.1, since there is considerable diplotype heterogeneity (with the exception of diplotypes containing the most frequent P variant, CFTR F508del, which is identified in 30%–80% of UK patients dependent upon genetic ancestry).110 For CF-CFTR, this yielded an fcorrected value of 3.9/100,000, exceeded by 109 of 499 CFTR diplotypes in UKB470K (Table S5; Figures 2 and S3). We further evaluated potential NSDCC variants based on pathogenicity assertions in ClinVar, Mastermind, ACMG classification (Artificial Intelligence Classification Engine, Fabric Genomics), and functional consequences in the literature and disease databases. For example, the compound heterozygous diplotype [chr7:117590400G>C]; [chr7:117592169C>T] in CF-CFTR had a frequency of 1,301/100,000. The corresponding variants, p.Gly576Ala (ClinVar:7165) and p.Arg668Cys (ClinVar:35835), were part of a known haplotype and not a CF-causal diplotype according to the CFTR2 database (http://cftr2.org). Evaluation classified 20 CF-CFTR variants and 109 CF-CFTR diplotypes, and one additional haplotype, as non-CF causal (Table S6). After their removal, 83 unique CF-CFTR diplotypes remained (Table S5), and Ocorrected was 37/100,000 (95% confidence interval [CI] 27–51/100,000), which agreed with Pcorrected (39/100,000; Table 1 and Figure S3). 
The most frequent remaining positive CF diplotype [chr7:117603774T>C]; [chr7:117559590ATCT>A] was compound heterozygosity for CFTR p.Leu967Ser and p.Phe508del. Clinical information and functional testing indicate this diplotype to have varying clinical consequence (http://cftr2.org). Additional examples are discussed in supplemental results.

Federated training was extended to 96,811 MCPS exomes with genetic ancestry different from that of UKB470K. For example, IDS (MIM: 300823)-associated mucopolysaccharidosis II (MPS2 [MIM: 309900]) is an X-linked recessive disorder without expressivity in carrier females (Table 1).117 While the birth prevalence of iduronate-2-sulfatase deficiency is 10.4/100,000 by biochemical screening, this includes severe MPS2 (MPS2A, prevalence 0.7 per 100,000), attenuated MPS2 (MPS2B, 0.7/100,000), and pseudodeficiency (9.0/100,000).118 Since enzyme replacement therapy with idursulfase (EC 3.1.6.13) has been available only since 2006, individuals with MPS2A would not have survived to be enrolled in UKB470K or MCPS.119 Thus, the prevalence of IDS-associated disorders in the UKB470K and MCPS adult cohorts should be ∼9 per 100,000 (Table 1). The observed genetic prevalence, O, of IDS-associated disorders in UKB470K and MCPS were 41/100,000 and 51/100,000, respectively (Table 1). Assuming 75% penetrance and expressivity in adults, and a maximum contribution per diplotype of 30% of IDS positives, the corrected adult diplotype frequency fcorrected was 4.8/100,000 (Table 1). Of 170 UKB470K hemizygotes for IDS c.641C>T (ClinVar:92622, 1 P assertion, 2 LB assertions, 6 B assertions), 163 were of African ancestry (AFR diplotype frequency 2,100/100,000) and only six of European ancestry (EUR diplotype frequency 1.4/100,000). MCPS had 49 IDS c.641C>T hemizygotes (MCPS frequency 50/100,000). Thus, federated training with diverse genetic ancestries flagged this variant as NSDCC. After blocking this variant, Ocorrected for IDS-associated disorders in UKB470K was 1.7/100,000 (Table 1). Thus, differences in ancestry between the large diplotype models allowed assessment of the specificity of BeginNGS in populations reflective of the diverse US population and detection of ancestry-specific NSDCC.
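The ancestry-stratified comparison for IDS c.641C>T can be expressed as a simple threshold check. The sketch below uses the frequencies quoted above and the IDS fcorrected value from Table 1. The rule that a variant is flagged for review when any group exceeds the threshold is an assumption made for illustration, and the function name is hypothetical.

```python
F_CORRECTED_IDS = 4.8  # per 100,000, from Table 1

# Hemizygote frequencies per 100,000 for IDS c.641C>T, as quoted in the text.
group_frequencies = {
    "UKB470K AFR": 2100.0,
    "UKB470K EUR": 1.4,
    "MCPS": 50.0,
}


def groups_exceeding(frequencies, threshold):
    """Return the groups whose diplotype frequency exceeds the threshold."""
    return [group for group, freq in frequencies.items() if freq > threshold]


exceeding = groups_exceeding(group_frequencies, F_CORRECTED_IDS)
flag_for_review = len(exceeding) > 0  # assumed rule: any group over the threshold
print(exceeding, flag_for_review)
```

A check restricted to the European-ancestry group would not have flagged this variant, which is the point the paragraph makes about training across cohorts with different ancestries.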

Of 395 variants evaluated in this manner, 102 were retained based on functional or case-based evidence of severe disease causality, while 293 were classified as NSDCC and blocklisted (Table S6). They included 103 variants determined to lack a ClinVar P or LP assertion, 110 variants not associated with a BeginNGS.2 gene, disorder, or MOI, 53 variants whose functional data were demonstrated to be NSDCC, and 9 variants that were in haplotypes (linkage disequilibrium) with other LP or P variants. Blocking NSDCC variants decreased the number of evaluated BeginNGS.2 genes with Ocorrected > Pcorrected from 37 (71%) to 12 (23%). Seven of the remaining 12 genes were associated with disorders that OMIM identified as having variable (AD or AR) inheritance (ABCC8, GLRA1, CPOX, CHRND, CHRNA1, CHRNB1, and HESX1). These had been queried as AD disorders. However, early childhood expressivity and penetrance are much less in the AD forms of these disorders than the AR forms. For example, in ABCC8-associated familial hyperinsulinemic hypoglycemia 1 (MIM: 256450), newborns with the AR form develop severe refractory hypoglycemia that responds only partially to diet and diazoxide, whereas the AD form has considerably later onset and generally responds fully to diet and diazoxide therapy.120 For these disorders with variable inheritance, we reasoned that it was preferable for gNBS to have a high PPV for severe early-childhood-onset disease rather than a low PPV but higher sensitivity for late-onset, less severe disease. Therefore, we changed the MOI queried in BeginNGS.2 from AD to AR. This led to Ocorrected < Pcorrected for all seven genes (Table 1). Following blocking of NSDCC variants and changing the MOI queried from AD to AR for these genes, only five (10%) still had Ocorrected > Pcorrected.

After blocking NSDCC variants and changing the MOI of seven genes, a BeginNGS.2 query of UKB470K identified 1,682 of the 53,562 remaining variants in 1,758 diplotypes and 9,876 (2.1%) UKB470K subjects (9,810 [2.0%] following correction for subjects with more than one positive diplotype), a 48-fold reduction in positive diplotype count (Table S7). The MOI with the largest proportionate decrease in gNBS-positive UKB470K subjects were AR and AD (27% and 21% reduction in positive variants, 99% and 98% reduction in positive subjects, respectively; Table S8). Variants categorized as having conflicting pathogenicity assertions in ClinVar were those with the largest proportionate adjustment in positive subjects (99%) but remained the group with the largest number of UKB470K positive subjects (1%, Table S9), suggesting that additional evaluation was needed in this group in the remaining five Ocorrected > Pcorrected genes.

We evaluated whether simpler approaches could recapitulate the blocklist of NSDCC variants. First, we examined whether blocklist variants could be recapitulated based on having zero or one ClinVar quality stars. We found that 61 (19%) of blocklist variants had two or more stars, indicating that they would not have been removed on this basis (Table S6). Second, we evaluated whether blocklist variants could be recapitulated based on the gnomAD v.4.1 maximum credible genetic ancestry group allele frequency (Grpmax filtering AF, 95% confidence)121,122 exceeding the disease-specific threshold, fcorrected. However, only 145 (59%) of blocklist variants had Grpmax filtering AF > fcorrected (Table S6). Finally, we evaluated whether blocklist variants could be recapitulated based on gnomAD v.4.1 homozygote frequency. However, only 49 (17%) of blocklist variants had a gnomAD homozygote frequency >fcorrected (Table S6). Thus, simple outlier removal was not effective in identifying NSDCC variants.

We checked the results of federated training in large diplotype models in two other disease-gene dyads with well-understood prevalence, locus heterogeneity, allelic heterogeneity, and comprehensive diplotype-phenotype assessment databases (Table 1). First, G6PDD has a prevalence of 300/100,000 in the UK, and Pcorrected was 533/100,000. The observed UKB470K genetic prevalence of G6PDD, O, was 852/100,000. Following removal of 4 of 239 BeginNGS.2 variants, Ocorrected was 311/100,000 (95% CI 295–327/100,000), which was less than Pcorrected. Following removal, the most frequent UKB470K G6PDD-positive diplotype was also within fcorrected, the expected, corrected UK diplotype frequency. It was hemizygous G6PD p.Val98Met (G6PD Asahi, chrX:154536002C>T), which is associated with 2022 class B (1985 class III) G6PDD. Second, the prevalence of dystrophinopathies in the UK population >40 years of age is 7.3/100,000, which includes Becker muscular dystrophy (MIM: 300376) and dilated cardiomyopathy 3B (MIM: 302045) with a penetrance of ∼72% in adult European populations,50,102,115 providing a Pcorrected value of 14/100,000. Following removal of one of 1,682 variants, Ocorrected in UKB470K was 7.4/100,000 subjects. Finally, we evaluated the 100 most frequent diplotypes in UKB470K in detail. Only two (4%) of the top 50 were childhood disease causal, both of which were associated with G6PDD (Figure 2). Ten (20%) of diplotypes 51–100 were disease causal. This congruity demonstrates the power of training based on purifying hyperselection of SCGD diplotypes.

Transformer performance in the 1000 Genomes dataset

GEM and Transformer are automated variant interpretation software tools developed for use in RDGS.15,17,38 For gNBS, they were modified to limit interpretation to diplotypes in BeginNGS.2 genes and patterns of inheritance, to omit ten genes with the highest (anomalous) population-attributable risk, and to apply higher cutoff values than those used in RDGS. Details of the performance of GEM and Transformer with and without these adjustments are provided in supplemental results. In the 1000 Genomes Consortium dataset (genomes of 2,495 apparently healthy, unrelated adults from 26 populations), following optimization, GEM together with Transformer identified 125 high-confidence, positive diplotypes (5.0%), of which 16 (0.6%) were G6PDD, in addition to 25 (1.0%) diplotypes that they were unable to classify. Thus, it appears that GEM and Transformer will require additional optimization for use alone in gNBS in healthy newborns.

BeginNGS performance in critically ill newborns with suspected SCGDs

Critically ill children with suspected SCGDs in ICUs increasingly receive RDGS.123,124 We retrospectively evaluated the PPV and recall of BeginNGS.2 in 3,118 children who had received RDGS.15 Phenotype-informed, singleton or parent-child trio RDGS reported diagnostic findings in ∼30% of these children.15,123,124 Of these, 187 variants (152 diplotypes) were on target (associated with the 412 BeginNGS.2 gene-disorder dyads, Table 2). Query of 3,118 singleton GS (without phenotype information) by the retained 53,375 BeginNGS.2 variants identified 147 (78.6%) of the RDGS-reported variants and 124 (81.6%) of RDGS-reported diplotypes (Table 2). The 40 variants detected by RDGS but not BeginNGS.2 were either absent from ClinVar or Mastermind or present without P or LP assertions. We supplemented the query by identification of novel loss-of-function variants (in disorders with known loss-of-function genetic mechanism) and by utilizing GEM and Transformer.15,38 These tools identified 34 additional RDGS-reported variants (181 total variants, 96.8% sensitivity; Table 2 and supplemental results). Thus, the BeginNGS.2 recall with the variant query, Transformer, and novel loss-of-function variants was 184 (98.4%) of 187 RDGS-reported variants (150 [98.7%] of 152 RDGS-reported diplotypes; Table 2). Two remaining unidentified variants by Transformer were VUS in recessive disorders that had been reported by RDGS because they were in compound heterozygous diplotypes with pathogenic variants in gene-disorder dyads that were a good fit for the proband phenotypes. In addition, the BeginNGS query with 53,375 retained variants identified 81 diplotypes that had not been reported by RDGS in the 3,118 probands (Table 2). Of these, six were false positives (not associated with a BeginNGS.2 MOI or disorder), giving a BeginNGS.2 query PPV of 97.1% (199 of 205 diplotypes, Table 2). 
The 75 new BeginNGS diplotypes were disorders for which early detection allows monitoring for problems, earlier treatment, and better outcomes. In a random subset of 1,448 of the 3,118 children, Transformer also identified 62 diplotypes that were not reported by RDGS, giving a nominal Transformer PPV of 62% (Table 2 and supplemental results). As noted above, the PPV of Transformer in healthy newborns is anticipated to be considerably lower. Of 233 total diplotypes identified by RDGS or BeginNGS.2 in the 412 gene-disorder dyads in 3,118 GS, the recall of BeginNGS.2 and RDGS were 99.1% and 66.7%, respectively (Table S10). Thus, in a cohort of critically ill children with suspected underlying SCGD, the prevalence of true-positive BeginNGS screens was 7.2% (225 of 3,118), which was significantly greater than in the UKB470K (2.0%, 9,810 of 469,902, p < 0.00001; Table S10). Thirteen (6%) of the 228 findings were NBS RUSP core conditions.
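The recall and PPV figures quoted above follow from simple ratios of the reported counts. The short sketch below recomputes four of them; the function names are illustrative, and the counts are those given in the text and Table 2.

```python
def recall(recapitulated, reference_total):
    """Share of reference (RDGS-reported) findings that the test also found."""
    return recapitulated / reference_total


def ppv(true_positives, all_positives):
    """Share of a test's positive calls that were true positives."""
    return true_positives / all_positives


def pct(value):
    """Format a proportion as a percentage rounded to one decimal place."""
    return round(100 * value, 1)


query_variant_recall = pct(recall(147, 187))     # variant query alone
query_diplotype_recall = pct(recall(124, 152))   # diplotypes, query alone
combined_variant_recall = pct(recall(184, 187))  # query + Transformer + novel LoF
query_ppv = pct(ppv(199, 205))                   # 205 query positives, 6 false

print(query_variant_recall, query_diplotype_recall, combined_variant_recall, query_ppv)
```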

Table 2.

Comparison of recall and PPV of 412 gene-disorder dyads by RDGS, the BeginNGS.2 diplotype query both for all variants and limited to those with more than one ClinVar quality star, Transformer, and BeginNGS.2 with Transformer in 3,118 critically ill children receiving RDGS

| Metric | Definition | Total | RDGS (n, %) | BeginNGS.2 query (n, %) | BeginNGS query limited to ClinVar quality >1 star (n, %) | Transformer (n, %) | BeginNGS.2 with Transformer (n, %) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Recall | no. of on-target RDGS reported variants recapitulated by each test | 187 | 187 (100%) | 147 (78.6%) | 95 (50.8%) | 181 (96.8%) | 184 (98.4%) |
| Recall | no. of on-target RDGS reported diplotypes recapitulated by each test | 152 | 152 (100%) | 124 (81.6%) | 90 (59.2%) | 147 (96.7%) | 150 (98.7%) |
| Recall | total positive on-target diplotypes identified by each test | 233 | 152 (66.7%) | 205 (88.0%) | 117 (50.2%) | 223 (97.8%) | 231 (99.1%) |
| PPV | on-target PPV of each test in population receiving RDGS | 3,118 | 98.7% (150 of 152) | 97.1% (199 of 205) | 98.3% (114 of 116) | 62% (101 of 163)a | 97.8% (226 of 231)a |
a Transformer PPV was evaluated in 1,488 of the 3,118 probands.

We evaluated whether a simpler approach to generation of the blocklist of NSDCC variants would have similar PPV and sensitivity in the 3,118 critically ill children. Limiting the BeginNGS.2 query to ClinVar variants with at least two quality stars reduced sensitivity from 88.0% to 50.2% while marginally increasing PPV (from 97.1% to 98.3%, Table 2). Thus, simple approaches to variant removal both decrease the PPV of gNBS in adults and decrease the sensitivity of gNBS in critically ill children.

Akin to RUSP NBS, a potential benefit of BeginNGS is the possibility of effective treatment at or before symptom onset. With RDGS, in contrast, treatment is delayed until severe illness develops, ICU admission occurs, evaluation suggests a broad genetic differential diagnosis, RDGS is ordered, and results are returned. We evaluated the potential added clinical utility of BeginNGS by comparing the counterfactual return of BeginNGS.2 results on DOL 10 with the actual time of diagnosis by RDGS in children with SCGDs detected by both modalities.15,47,123,124 In eight children, RDGS was performed outside the neonatal period, and there was sufficient knowledge of the natural history of the disorder with and without early treatment to make quantitative assessment possible (Table S11). In those children, symptom onset was at a median DOL 70 (average 105 days, range 0–313, Table 3). A median of two (average 2, range 1–7) hospitalizations occurred before RDGS diagnosis, and diagnosis occurred after a median of 8 hospital days (average 23, range 2–97). BeginNGS and early treatment would potentially have prevented 19 hospitalizations and 181 hospital days in the eight children (Table 3). BeginNGS would also potentially have shortened the time to diagnosis by a median of 121 days (average 474 days, range 3–2,203 days, total 3,791 days, Table 3). Five of the eight children had life-threatening disease presentations, of which four would potentially have been avoided by BeginNGS (Table 3). All the children had organ damage, which would potentially have been avoided in six by BeginNGS. In three families, parents were evaluated by social services for potential child abuse due to failure to thrive that would also potentially have been avoided by BeginNGS. This incremental clinical utility of BeginNGS was in addition to the 75 findings by BeginNGS that were not reported by RDGS.

Table 3.

Comparison of actual clinical utility of RDGS with counterfactual clinical utility of BeginNGS.2 in eight children

| ID | RDGS Dx recapitulated by BeginNGS.2 | Age Sx start (days) | Life-threatening disease presentation | Age RDGS Dx (days) | Definitive Rx change due to Dx | Time Sx start to Dx (days) | Hospital admits before Dxa | Hospital days before Dx | Organ damage | Assertion confidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | dev. epileptic encephalopathy 6B | 313 | no | 2,516 | no | 2,203 | 2 | 4 | GDD | low |
| 2 | dev. epileptic encephalopathy 14 | 7 | no | 1,004 | no | 997 | 7 | 46 | GDD, FTT | low |
| 3 | pyruvate dehydrogenase E1α def. | 0 | infantile spasms | 304 | ketogenic diet, gastrostomy feeding | 304 | 2 | 2 | FTT | very high |
| 4 | cong. myasthenic syn. 5 | 129 | acute respiratory failure | 259 | albuterol, cholinesterase inhibitor avoidance, home O2 | 130 | 3 | 16 | FTT | very high |
| 5 | SCN5A-related heart rhythm disorders | 253 | supraventricular tachycardia | 256 | NK, lost to follow-up | 3 | 1 | 2 | HRD | very high |
| 6 | XL immunodysregulation, polyendocrinopathy, enteropathy | 100 | no | 139 | rituximab, tacrolimus, bone marrow transplant | 39 | 1 | 11 | FTT | very high |
| 7 | cong. disorder of glycosylation It | 0 | severe bradycardia | 112 | D-galactose supplement, home hypoglycemia plan | 112 | 2 | 97 | FTT | very high |
| 8 | Biotin-thiamine-responsive basal ganglia dis. | 40 | profound HIE | 43 | biotin and thiamine supplement | 3 | 1 | 3 | FTT, HIE | very high |
| Average | | 105 | | 579 | | 473.9 | 2.4 | 22.6 | | |
| Median | | 70 | | 258 | | 121 | 2 | 7.5 | | |
| Actual RDGS total | | | 5 | 4,633 | | 3,791 | 19 | 181 | 8 | |
| Counterfactual BeginNGS total | | | 1 | 80 | | 23 | 0 | 0 | 2 | |

Sx, symptoms; Dx, diagnosis; Rx, treatment; NK, not known; GDD, global developmental delay; FTT, failure to thrive; HRD, heart rhythm disorder; HIE, hypoxic ischemic encephalopathy.

a Includes admission in which RDGS was performed.
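The summary rows of Table 3 can be recomputed from the eight per-child values using the Python standard library. The variable names below are illustrative; the numbers are the per-child entries of Table 3.

```python
import statistics

# Per-child values from Table 3, in ID order 1 to 8.
days_symptoms_to_dx = [2203, 997, 304, 130, 3, 39, 112, 3]
hospital_admits = [2, 7, 2, 3, 1, 1, 2, 1]
hospital_days = [4, 46, 2, 16, 2, 11, 97, 3]

summary = {
    "delay_total": sum(days_symptoms_to_dx),
    "delay_median": statistics.median(days_symptoms_to_dx),
    "delay_mean": round(statistics.mean(days_symptoms_to_dx), 1),
    "admits_total": sum(hospital_admits),
    "admits_median": statistics.median(hospital_admits),
    "hospital_days_total": sum(hospital_days),
    "hospital_days_median": statistics.median(hospital_days),
}
print(summary)
```

The totals of 19 admissions and 181 hospital days are the quantities that the counterfactual analysis treats as potentially avoidable.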

BeginNGS performance in infant deaths and first-degree relatives of critically ill children

We also evaluated the BeginNGS.2 53,576 variant query in singleton GS of 3,519 parents and siblings of the 3,118 children who had received RDGS.15 Following removal of three false-positive variants (not pathogenic in the heterozygous state), there were 126 positive diplotypes (3.6%, 98% PPV, Table S10), of which 27 were G6PDD. Thus, the proportion of positive BeginNGS screens in young adult first-degree relatives of children with suspected SCGD was greater than in the UKB470K (2.0%, p < 0.00001) and less than in probands (7.2%, p < 0.00001, Table S10).

We also evaluated the BeginNGS.2 53,576 variant query among 705 consecutive infant deaths in San Diego County between 2005 and 2018 (SOMI) for which archived NBS DBS were available (Table S10). BeginNGS identified 61 positive diplotypes, of which five were query false positives (not disease causative in the heterozygous state), giving a PPV of 92% (56 of 61, Table S10). The 56 remaining positive individuals yielded a positive rate (7.9%) similar to that in children with suspected SCGD who received RDGS (7.2%, p = 0.52) but significantly greater than in parents and siblings of those who had received RDGS (3.6%, p < 0.00001, Table S10). In 18 of the 56 infant deaths, BeginNGS identified G6PDD, which was unlikely to have contributed to mortality. In ten infant deaths, however, BeginNGS identified SCGDs that are known to cause infant death and that have effective therapies (Table 4). Clinicopathologic correlation was not possible in these individuals, since the archived DBS were de-identified. However, one of them was almost certainly an affected older sister of a 5-week-old boy diagnosed in 16 h by RDGS with SLC19A3-associated biotin/thiamine-responsive basal ganglia disease (BTRBGD, MIM:607483, homozygous c.597dup) who received immediate, effective therapy and had a good outcome.125 His older sister died in infancy of progressive encephalopathy during the SOMI study period without a molecular diagnosis. Had these ten infants received BeginNGS.2 and the indicated therapeutic interventions, their deaths could potentially have been avoided (Table S12).47,126 The remaining 28 infant deaths with non-G6PDD BeginNGS findings were positive for SCGDs, in which early detection allows monitoring for problems and earlier treatment, and thus may also have had better outcomes had they received BeginNGS.2. Only eight (14%) of the 56 infant deaths with positive BeginNGS findings would have been positive by RUSP NBS. 
Thus, BeginNGS.2 might potentially decrease infant mortality by 1.4%–5.3%.
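The cohort comparisons above are comparisons of two proportions. The source does not state which test produced its p values, so the pooled two-proportion z-test below is an assumption made for illustration, and its p values need not match the reported ones exactly. The function name is hypothetical; the counts are those given in the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variate.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# First-degree relatives (126 of 3,519) versus probands (225 of 3,118).
z_rel, p_rel = two_proportion_z_test(126, 3519, 225, 3118)

# Infant deaths (56 of 705) versus probands (225 of 3,118).
z_inf, p_inf = two_proportion_z_test(56, 705, 225, 3118)

print(f"relatives vs probands: z={z_rel:.2f}, p={p_rel:.2g}")
print(f"infant deaths vs probands: z={z_inf:.2f}, p={p_inf:.2g}")
```

Under this assumed test, the first comparison is highly significant and the second is not, which agrees in direction with the reported results.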

Table 4.

Cause of death and indicated therapeutic interventions for severe, infant-onset disorders identified by BeginNGS in archived NBS DBS of SOMID infant deaths

ID Disease Gene Immediate cause of death Final cause of death (ICD code) Other contributory causes Indicated therapeutic interventions
1 her. thrombotic thrombocytopenic purpura ADAMTS13 cardiac arrest atrioventricular septal defect (Q212) heart failure, complex congenital heart disease, complete atrioventricular septal defect fresh frozen plasma, phototherapy, plasma exchange
2 infantile hypophosphatasia ALPL complications of hypophosphatasia X-linked hypophosphatasia (E833) none asfotase-α, calcium homeostasis
3 respiratory failure thoracic instability, hypophosphatasia
4 congenital adrenal hyperplasia due to 21-hydroxylase deficiency CYP21A2 respiratory failure necrotizing enterocolitis (P77) extreme prematurity hydrocortisone, sodium chloride supplementation, dexamethasone, fludrocortisone
5 Gaucher disease GBA1 respiratory arrest persistent fetal circulation (P293) pulmonary hypertension, genetic disorder 15q26 deletion ERT: imiglucerase, velaglucerase-α, or taliglucerase-α; miglustat
6 holocarboxylase synthetase deficiency HLCS progressive mitochondrial neurodegenerative disorder metabolic disorder (E889) seizures biotin, levocarnitine
7 ornithine carbamoyltransferase deficiency OTC cardiorespiratory failure congenital diaphragmatic hernia (Q790) congenital heart disease, renal failure, pulmonary hypertension glucose, ammonia scavenger, arginine, citrulline, hemodialysis, low-protein diet, hypothermia, intravenous lipids (liver transplant)
8 pyruvate dehydrogenase E1-α deficiency PDHA1 respiratory failure metabolic disorder (E889) severe metabolic acidosis, unknown inborn error of metabolism, multiple congenital anomalies, abnormal corpus callosum, cerebral ventriculomegaly thiamine, dichloroacetate, ketogenic diet, phenylbutyrate
9 familial hemophagocytic lymphohistiocytosis 2 PRF1 respiratory arrest hydrops fetalis (P832) severe respiratory failure, severe hydrops, multiorgan failure HLH-2004 protocol126
10 biotin-responsive basal ganglia disease SLC19A3 progressive mitochondrial neurodegenerative disorder metabolic disorder (E889) seizures thiamine, biotin

Indicated therapeutic interventions are from the BeginNGS.2 eCDS (https://gtrx.rbsapp.net/). Her., hereditary; ERT, enzyme replacement therapy.

Discussion

Here we have shown that it is feasible to screen populations for more than 400 SCGDs with high sensitivity and high PPV. In addition, we have described processes that will enable future scaling of gNBS to include many additional disorders, variants, interventions, and performance of adequately powered clinical trials. We have also shown that it is possible to validate these gNBS processes using adult populations as a proxy for unaffected newborns and critically ill children as a proxy for affected newborns, creating the possibility of validation in millions of previously sequenced genomes. Finally, we report initial measurements of the clinical utility and potential for improved childhood outcomes by BeginNGS.2, which are highly encouraging.

There are now more than 30 international research studies evaluating gNBS with individualized processes for selection of disease-gene dyads and with substantial heterogeneity in gene lists.29 Here we have described in detail the methods used to select disease-gene dyads, therapeutic interventions, and inheritance modes for BeginNGS.2, which were refined and further systematized in preparation for deployment in prospective clinical trials.15,16,17,18 In simplest terms, disorders were chosen by expert review of published evidence according to the 1968 Wilson and Jungner principles, and choices are re-reviewed approximately annually in light of new evidence.3 Unlike other gNBS efforts, we have invested significant effort in validating 1,603 therapeutic interventions for inclusion, which are the basis for clinical guidance for pediatricians that accompany positive results. While the methods may seem prosaic, disorders are selected not for use in public health NBS but rather for inclusion in prospective clinical trials. Suitable performance of each disorder in those trials (recall, PPV, clinical utility, outcome improvement) will be required for retention and eventual implementation in public health gNBS.

The genesis of BeginNGS was 13 years of experience in development and implementation of RDGS for SCGDs in critically ill children, including GTRx management guidance to facilitate rapid translation of results into precision interventions at scale.15,16,17,18,27,28 The type of diseases identified, indicated precision interventions, and core technologies employed are common to BeginNGS and RDGS (Table S13). However, their scope, objectives, indications, reporting, pretest probability of positives, results recipients, and required sensitivity, PPV, turnaround time, and cost are quite dissimilar (Table S13). While RDGS is ordered to diagnose individual acutely ill children with suspected SCGD (50% pretest probability of positive), BeginNGS is designed to screen all (3.7 million/year), mainly asymptomatic infants (pretest probability of SCGD <5%). Thus, while RDGS requires very rapid time to result (ideally 1 day) and 100% desired recall, BeginNGS demands high PPV and very low cost per subject. Achievement of acceptable PPV despite relatively low pretest probability of SCGDs necessitates a very low false-positive rate. Compounding this is Last’s iceberg of underestimated prevalence of mild formes frustes of SCGDs associated with variant diplotypes that lack full penetrance or expressivity.127,128 Thus, BeginNGS false positives encompass both variant diplotypes miscategorized as P or LP for SCGDs, and those that, while P or LP, have insufficient penetrance or expressivity for utility in population screening for SCGDs. The latter have been referred to as late-onset pathogenic (LOP) variants.129,130 This exemplar of Scylla and Charybdis is much more difficult to navigate than recognized131: untrained BeginNGS.2 screens with 53,855 P and LP variants were positive in 73.5% of UKB470K subjects. 
Here we showed, however, that training based on purifying hyperselection of SCGD-causing diplotypes in large diplotype models removes both types of false positives in a manner that could not be recapitulated by simpler data-cleaning methods. Federated training of BeginNGS disease-gene dyad (genetic) prevalence in ancestry- and age-diverse UKB470K, RCIGM, and MCPS genomic datasets identified patterns of inheritance and NSDCC variants that contributed 97% of likely false positives. By building upon prior allele frequency methods,121,122,132,133,134,135,136,137 federated comparison of diplotype prevalence in the three genomic datasets with prevalence in ancestry- and age-matched populations (with appropriate corrections) identified 293 (0.5%) of 53,855 BeginNGS.2 variants that contributed 96% of positive UKB470K subjects. Removal (blocklisting) or parameter adjustment of individual disease-gene-variant-MOI tetrads that exceeded credible frequency thresholds reduced positive BeginNGS.2 diplotype counts 48-fold to 2.0% of UKB470K subjects. In CF, G6PDD, and DMD, which have well-defined prevalence, locus heterogeneity, allelic heterogeneity, penetrance, expressivity, and functional consequence databases, federated training in large diplotype models removed known NSDCC variants, retained disease-causing variants, and yielded genetic prevalence in accord with population prevalence.
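The dependence of PPV on pretest probability, which is why population screening needs a much lower false-positive rate than diagnostic testing, follows from Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, and the sensitivity, specificity, and 2% prevalence are assumed values, not figures from this study; only the 50% pretest probability for RDGS is taken from the text.

```python
def screening_ppv(sensitivity, specificity, prevalence):
    """Positive predictive value of a test from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (99% sensitive, 99% specific) in two settings.
ppv_diagnostic = screening_ppv(0.99, 0.99, 0.50)  # pretest probability as for RDGS
ppv_screening = screening_ppv(0.99, 0.99, 0.02)   # assumed low-prevalence screening

# Lowering the false-positive rate tenfold restores PPV at low prevalence.
ppv_screening_strict = screening_ppv(0.99, 0.999, 0.02)

print(round(ppv_diagnostic, 3), round(ppv_screening, 3), round(ppv_screening_strict, 3))
```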

Prevalence assessments in large diplotype models appear to provide a generalizable framework for qualification of disease-gene-variant-MOI tetrads for population screening. While limited herein to NBS, there is considerable potential for these methods to be translated to other age groups for disorders with later age of onset. Tetrad qualification remains incomplete in BeginNGS.2, with six genes requiring further examination for NSDCC variants. In addition to inexact knowledge of age-related variation in prevalence, penetrance, and expressivity, or comprehensive diplotype-phenotype assessments for many gene-disorder pairs, our large diplotype models do not yet represent all major genetic ancestries nor positive findings for all BeginNGS.2 genes, precluding their qualification for screening in those ancestries or disorders, respectively. While 80 disorders were fully evaluated with these methods, extension to the remaining set is enormously facilitated by the phenomenon of homogeneous SCGD clusters with diverse genetic underpinnings. For SCGD clusters, such as Fanconi anemia or hyperinsulinemia, the underpinning model does not change between genes. Despite being incomplete, the success of this method to date is a testament to the power of purifying selection on prevalence of diplotypes causing severe early-childhood-onset diseases.138,139 The training algorithm parameterization can be further optimized on a gene-by-gene basis to obtain optimal F measures and by expanding the number of case and control genomes evaluated (see supplemental discussion). Another future direction will be to undertake a sensitivity analysis of O = x × Pcorrected at values of x between, for example, 0.1 and 1.0, to identify the optimal F measure.
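The proposed sensitivity analysis over x can be sketched as a sweep that trains to each threshold and scores the result with an F measure. Everything below is a simplified stand-in, not the BeginNGS training algorithm: the greedy rule (drop the most frequent diplotypes until summed observed prevalence is at most x times Pcorrected), the toy diplotypes, their frequencies, and their causal labels are all hypothetical.

```python
from typing import NamedTuple


class Diplotype(NamedTuple):
    name: str
    freq: float    # observed frequency per 100,000 (toy values)
    causal: bool   # toy ground-truth label


def train_to_threshold(diplotypes, p_corrected, x):
    """Greedily drop the most frequent diplotypes until total O <= x * Pcorrected."""
    retained = sorted(diplotypes, key=lambda d: d.freq)
    total = sum(d.freq for d in retained)
    while retained and total > x * p_corrected:
        total -= retained.pop().freq  # pop() removes the most frequent one
    return retained


def f_measure(retained, all_diplotypes):
    """F1 of the retained set, treating causal diplotypes as the positives."""
    causal_total = sum(1 for d in all_diplotypes if d.causal)
    true_pos = sum(1 for d in retained if d.causal)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(retained)
    rec = true_pos / causal_total
    return 2 * precision * rec / (precision + rec)


TOY = [
    Diplotype("A", 400, False), Diplotype("B", 150, False),
    Diplotype("C", 30, True), Diplotype("D", 20, True),
    Diplotype("E", 10, True), Diplotype("F", 5, False),
]
P_CORRECTED = 100  # toy corrected prevalence per 100,000

xs = [i / 10 for i in range(1, 11)]  # x = 0.1, 0.2, ..., 1.0
scores = {x: f_measure(train_to_threshold(TOY, P_CORRECTED, x), TOY) for x in xs}
best_x = max(xs, key=scores.get)  # smallest x that reaches the maximum F measure
print(best_x, round(scores[best_x], 3))
```

In this toy case, thresholds that are too strict start to remove causal diplotypes, which is the trade-off such a sweep is meant to expose.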

Retrospective cohort evaluations of BeginNGS.2, with 412 disease-gene dyads and 1,603 associated therapeutic interventions, suggest high potential for clinical utility. BeginNGS.2 was positive in 7.9% of 705 infant deaths in San Diego County, a 7-fold increase over disorders screened by RUSP NBS. Reassuringly, this was ∼3-fold higher than the positive rate of BeginNGS.2 in UKB470K. An earlier pilot study yielded concordant results: 47 SCGDs were identified by GS in 46 (41%) of 112 infant deaths in San Diego County between 2015 and 2020.47 Of these, five (4.4%) would have been identified by BeginNGS.2 (CACNA1C: long QT syndrome 8 [MIM: 618447]; COQ2: primary coenzyme Q10 deficiency; MOCS1: molybdenum cofactor deficiency; SCN1A: developmental and epileptic encephalopathy 6B; and TAFAZZIN: Barth syndrome). Had these infants received BeginNGS.2 at birth, together with confirmatory testing and prompt institution of indicated therapeutic interventions, morbidity and mortality may have been reduced substantially. In ten (1.4%) SOMI infant deaths and four (3.6%) of the 112 infant deaths reported by Owen et al., counterfactual analysis suggested that the BeginNGS-positive disorders were of sufficient severity and indicated therapeutic interventions of sufficient efficacy that death may have been avoidable.47 These data suggest that BeginNGS has the potential to reduce US infant mortality.

Retrospective analysis of BeginNGS.2 in 3,118 newborns and children in ICUs who received RDGS also suggested that BeginNGS had clinical utility in critically ill children. A recent review suggested that RDGS had a diagnostic rate of ∼37%, changed outcomes in ∼18% of children in ICUs tested, and led to net healthcare cost savings of $14,265 per ICU infant tested.123,124 BeginNGS.2 was positive in 7.2% of ICU infants and children, a 17-fold increase over disorders screened by RUSP NBS. Among eight children with disorders suitable for actual-counterfactual analysis, use of BeginNGS.2 would have shortened time to diagnosis by a median of 121 days (relative to RDGS) and potentially avoided life-threatening disease presentations in four children and organ damage in six. These results concur with findings from a pilot prospective clinical trial, in which five of 120 newborns in an ICU were true positive by BeginNGS.2 and are anticipated to receive changes in medical, surgical, or nutritional management.138 A similar analysis of BeginNGS.1 in 2,208 of these infants using different methods identified seven children in whom morbidity would likely have been completely avoided, 21 with most morbidity avoided, and 13 with partial avoidance.15 In the eight children herein, BeginNGS.2 would potentially have avoided healthcare cost of 181 neonatal ICU days (at an estimated 2024 average daily spending of $4,332; total, $784,092) and 150 RDGS tests (at an average reimbursement rate under Current Procedural Terminology codes of $8,521; total, $1,278,150, https://healthcostinstitute.org/hcci-originals-dropdown/all-hcci-reports/nicu-use-and-spending-1).123,124 Assuming no additional lifetime healthcare cost savings in the 3,118 infants, BeginNGS.2 would be cost neutral at a fee of $661, which is feasible given GS reagent and computation cost of ∼$220 per subject. 
By comparison, current NBS is considered cost effective at an average state fee of $109 (https://www.newsteps.org/data-resources/reports/nbs-fees-report). Despite Medicaid coverage policies in 12 US States, current ICU utilization of RDGS is less than 5% of those meeting indications for testing.123,124 Thus, an early target for BeginNGS implementation may be the 350,000 infants per year (9.5% of births) at time of admission to a neonatal ICU, with more expensive RDGS reserved either for those with rapidly progressing disorders or for those who screen negative (by less-expensive reinterpretation of GS generated for BeginNGS).123,124,139 These retrospective analyses were limited to children who received RDGS at our institution. A future direction will be to evaluate performance in third-party cohorts.
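The cost figures above can be checked with a few lines of arithmetic. The source does not show how the break-even fee was derived; dividing the two avoided-cost totals by the 3,118 children is one reading that reproduces the stated figure, and the variable names are illustrative.

```python
# Figures quoted in the text.
NICU_DAYS_AVOIDED = 181
NICU_COST_PER_DAY = 4_332      # estimated 2024 average daily spending, USD
RDGS_TESTS_AVOIDED = 150
RDGS_REIMBURSEMENT = 8_521     # average reimbursement per test, USD
COHORT_SIZE = 3_118            # children screened in the retrospective cohort

nicu_savings = NICU_DAYS_AVOIDED * NICU_COST_PER_DAY
rdgs_savings = RDGS_TESTS_AVOIDED * RDGS_REIMBURSEMENT
total_savings = nicu_savings + rdgs_savings

# Assumed reading: the fee per screened child at which savings equal spending.
break_even_fee = total_savings / COHORT_SIZE

print(nicu_savings, rdgs_savings, total_savings, round(break_even_fee))
```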

Numerous clinical trials of gNBS have recently commenced, each with a distinct panel of disease-gene dyads, endpoints, and unique representation of genetic ancestry.15,16,17,18,19,20,21,22,23,24,25,26,27,28,29 For example, we recently started an adaptive, multicenter clinical trial to compare clinical utility and cost-effectiveness of BeginNGS and RUSP NBS in up to 100,000 newborns, with an initial enrollment emphasis on Hispanic, Amerindian, and African American ancestry (ClinicalTrials.gov ID NCT06306521). In toto, gNBS trials worldwide are evaluating several thousand genes and SCGDs.29 As noted above, a major problem in the expansion of gNBS to hundreds of rare and ultra-rare SCGDs is that any single clinical trial will be underpowered to evaluate the clinical utility of all SCGDs, particularly ultra-rare disorders, or in all genetic ancestries needed to provide equitable accuracy to the entire population. For example, our trial will be underpowered in Polynesian and Asian genetic ancestries. It would be necessary to enroll millions of newborns to evaluate clinical utility and cost-effectiveness with stratification by disease and ancestry. While data aggregation is a possible solution, it faces significant jurisdictional, privacy, control, and consent-related objections, even when encrypted, de-identified, and anonymized. This is greatly exacerbated for genomic data that are clinical grade or have associated metadata such as age, geographic location, or phenotypic information. Extension of prevalence-based, federated learning to many gNBS trials and adult biobank cohorts would overcome most of these objections. An ideal query could be the superset of disease-gene dyads of all gNBS trials. The minimum query set would include, at the cohort level, categorical age (infant, child, young adult, older adult) and population characteristics. At the query level, it would be a list of variant coordinates with associated genes, disorders, and inheritance modes. 
The variants could include several hundred ancestry-informative SNPs. The minimum response set would be a list of positive diplotypes identified in the cohort together with their zygosity and count. Resultant analysis would yield a model of the genetic architecture of disease in that cohort for each informative gene, together with a list of validated and failed variants. The increase in power of query federation to evaluate precision and recall can be multiplicative rather than linear: evaluation of IDS c.641C>T in 7,728 UKB470K individuals of African ancestry, for example, was informative, while that in 419,019 UKB470K individuals of European ancestry was not. An alternative or addition to query federation in a few very large trials is a cluster of smaller trials, each enriching for a genetic ancestry. Herein, for example, we demonstrated that a cohort of only 700 infant deaths was informative. In conclusion, having demonstrated a proof of concept, we invite groups worldwide to collaborate in federated queries to define disorder prevalence, genetic architecture, and elements of natural history stratified by genetic ancestry and identify false-positive variants with a next phase goal of ∼2,000 disorders, representing the superset of disorders under evaluation by all international research studies evaluating gNBS and ∼2 million subjects.
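The minimum query and response sets described above can be sketched as small data structures plus a pooling step. This is a hypothetical schema for illustration, not the format used by the authors' federated query code: all class names, field names, cohort names, and counts are invented, and only the listed fields (categorical age, population, variant coordinates with gene, disorder, and inheritance mode, and diplotypes with zygosity and count) come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CohortInfo:
    """Cohort-level part of the minimum query set."""
    name: str
    age_category: str   # infant, child, young adult, or older adult
    population: str


@dataclass(frozen=True)
class QueryVariant:
    """Query-level part: one variant with its gene, disorder, and inheritance mode."""
    coordinate: str
    gene: str
    disorder: str
    inheritance: str


@dataclass(frozen=True)
class DiplotypeCount:
    """Minimum response: a positive diplotype with its zygosity and count."""
    variants: Tuple[str, ...]
    zygosity: str
    count: int


def pool_responses(responses):
    """Sum diplotype counts across cohorts; only aggregate counts are shared."""
    pooled = Counter()
    for rows in responses.values():
        for row in rows:
            pooled[(row.variants, row.zygosity)] += row.count
    return pooled


query = [QueryVariant("chrX:154536002C>T", "G6PD", "G6PDD", "X-linked")]
cohort_a = CohortInfo("cohort_a", "older adult", "population_1")
cohort_b = CohortInfo("cohort_b", "infant", "population_2")
responses = {
    cohort_a: [DiplotypeCount(("chrX:154536002C>T",), "hemizygous", 3)],
    cohort_b: [DiplotypeCount(("chrX:154536002C>T",), "hemizygous", 2)],
}
pooled = pool_responses(responses)
print(pooled)
```

Keeping cohort descriptors separate from the responses means that a coordinating site only ever handles counts, never individual-level genotypes.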

In summary, despite the initial challenges of false positives due to P/LP variants with low/no childhood penetrance or expressivity, gNBS is feasible for hundreds of SCGDs with effective therapies. Given the magnitude of changes in knowledge between BeginNGS.1 and BeginNGS.2, frequent re-review of the underpinning structured rare disease molecular and treatment knowledge base is warranted, and BeginNGS should remain an adaptive, open platform that could potentially expand to the superset of genes in all gNBS trials.29 Prevalence and diplotype-based federated learning appears to be a generalizable approach for achievement of acceptable analytic performance for population testing in newborns and older age groups.

Data and code availability

Consented proband and parent data analyzed in this study and non-human subjects data generated during this study are available at the Longitudinal Pediatric Data Resource (LPDR) under accession code nbs000003.v1.p at https://nbstrn.org/. Qualified researchers can obtain access by registration at https://nbstrn.org/login?token-expired=true&rel=/tools/lpdr. There are restrictions to the availability of raw individual data due to data privacy and confidentiality laws. Anonymized and pseudonymized individual data generated in this study, subject to the terms of informed written consent documents and state and federal laws, are provided in the supplemental information.

GTRx and the GTRx REDCap instance are available at https://gtrx.rbsapp.net/, and code is available from Christian Hansen (chansen@rchsd.org) and at https://github.com/rao-madhavrao-rcigm/gtrx. The DRAGEN Platform and Illumina Connected Analytics are available from Illumina (Shyamal Mehtalia, smehtalia@illumina.com). GEM and Transformer are available from Fabric Genomics (info@fabricgenomics.com). TileDB v.2.8.0 is available at https://github.com/TileDB-Inc/TileDB. TileDB-VCF v.0.15.0 is available at https://github.com/tiledb-inc/tiledb-vcf. Federated query data and code are available at https://github.com/rady-childrens-genomics/beginngs-partner-onboarding.

Acknowledgments

This work was supported by NIH grants UM1TR004407 from NCATS to E.J. Topol (with sub-award to S.F.K.) and R01HD101540; the Rady Children’s Institute for Genomic Medicine and Rady Children’s Hospital; research grant support from Alexion, Amgen, Chiesi Farmaceutici, Horizon Therapeutics, Inozyme Pharma, Ionis Pharmaceuticals, Mahzi Therapeutics, Orchard Therapeutics, Rocket Pharma, Sanofi, Sarepta Therapeutics, Sentinyl Therapeutics, Travere Therapeutics, and Ultragenyx; and in-kind support from Alexion, Illumina, TileDB, Genomenon, Nest Genomics, and Fabric Genomics. The California Department of Public Health is not responsible for the results or conclusions drawn by the authors of this publication. A Deo lumen, ab amicis auxilium.

This article is dedicated to the memory of Gunter Scharer, MD PhD.

Declaration of interests

K.P.H., C.M.K., and S.S.M. are employees and shareholders of Illumina, Inc. W.R.M., Y.L., and T.D. are employees and shareholders of Alexion, AstraZeneca Rare Disease. E.F. is an employee and shareholder of Fabric Genomics, Inc. M.K. and S. Schwartz are employees and shareholders of Genomenon, Inc. J.L., C.K., and S. Shelnutt are employees and shareholders of TileDB, Inc. M.Y. is a co-founder and consultant of Fabric Genomics, Inc. S.F.K. has filed a patent related to this work.

Published: December 5, 2024

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2024.10.021.

Web resources

BeginNGS, https://radygenomics.org/begin-ngs-newborn-sequencing/

CFTR2, http://cftr2.org

Genome-to-Treatment (GTRx), https://gtrx.rbsapp.net/

OMIM, https://www.omim.org/

Orphanet, https://www.orpha.net/en/disease

Supplemental information

Document S1. Supplemental methods, results, and discussion; Figures S1–S3; and supplemental table captions (mmc1.pdf, 746.9 KB)
Data S1. Tables S1–S15 (mmc2.xlsx, 5.4 MB)
Document S2. Article plus supplemental information (mmc3.pdf, 4 MB)

References

  • 1.GUTHRIE R., SUSI A. A SIMPLE PHENYLALANINE METHOD FOR DETECTING PHENYLKETONURIA IN LARGE POPULATIONS OF NEWBORN INFANTS. Pediatrics. 1963;32:338–343. [PubMed] [Google Scholar]
  • 2.IRWIN H.R., NOTRICA S., FLEMING W. Blood phenylalanine levels of newborn infants. A routine screening program for the hospital newborn nursery. Calif. Med. 1964;101:331–333. [PMC free article] [PubMed] [Google Scholar]
  • 3.Wilson J.M.G., Jungner G., World Health Organization . World Health Organization; 1968. Principles and Practice of Screening for Disease. [Google Scholar]
  • 4.Newborn Screening: A blueprint for the future. Pediatrics. 2000;106:S383–S427. [PubMed] [Google Scholar]
  • 5.Watson M.S., Mann M.Y., Lloyd-Puryear M.A., Rinaldo P., Howell R.R. Newborn Screening: Towards a Uniform Screening Panel and System. Genet. Med. 2006;8:1S–11S. doi: 10.1542/peds.2005-2633J. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Therrell B.L., Padilla C.D., Loeber J.G., Kneisser I., Saadallah A., Borrajo G.J.C., Adams J. Current status of newborn screening worldwide. Semin. Perinatol. 2015;39:171–187. doi: 10.1053/j.semperi.2015.03.002. [DOI] [PubMed] [Google Scholar]
  • 7.Alexander D., van Dyck P.C. A vision of the future of newborn screening. Pediatrics. 2006;117:S350–S354. doi: 10.1542/peds.2005-2633O. [DOI] [PubMed] [Google Scholar]
  • 8.Collins F.S. Harper; 2010. The Language of Life: DNA and the Revolution in Personalized Medicine. [Google Scholar]
  • 9.Clayton E.W. Currents in contemporary ethics. State run newborn screening in the genomic era, or how to avoid drowning when drinking from a fire hose. J. Law Med. Ethics. 2010;38:697–700. doi: 10.1111/j.1748-720X.2010.00522.x. [DOI] [PubMed] [Google Scholar]
  • 10.Bhattacharjee A., Sokolsky T., Wyman S.K., Reese M.G., Puffenberger E., Strauss K., Morton H., Parad R.B., Naylor E.W. Development of DNA confirmatory and high-risk diagnostic testing for newborns using targeted next-generation DNA sequencing. Genet. Med. 2015;17:337–347. doi: 10.1038/gim.2014.117. [DOI] [PubMed] [Google Scholar]
  • 11.Bodian D.L., Klein E., Iyer R.K., Wong W.S.W., Kothiyal P., Stauffer D., Huddleston K.C., Gaither A.D., Remsburg I., Khromykh A., et al. Utility of whole-genome sequencing for detection of newborn screening disorders in a population cohort of 1,696 neonates. Genet. Med. 2016;18:221–230. doi: 10.1038/gim.2015.111. [DOI] [PubMed] [Google Scholar]
  • 12.Pereira S., Robinson J.O., Gutierrez A.M., Petersen D.K., Hsu R.L., Lee C.H., Schwartz T.S., Holm I.A., Beggs A.H., Green R.C., et al. Perceived Benefits, Risks, and Utility of Newborn Genomic Sequencing in the BabySeq Project. Pediatrics. 2019;143:S6–S13. doi: 10.1542/peds.2018-1099C. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Roman T.S., Crowley S.B., Roche M.I., Foreman A.K.M., O'Daniel J.M., Seifert B.A., Lee K., Brandt A., Gustafson C., DeCristo D.M., et al. Genomic Sequencing for Newborn Screening: Results of the NC NEXUS Project. Am. J. Hum. Genet. 2020;107:596–611. doi: 10.1016/j.ajhg.2020.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Adhikari A.N., Gallagher R.C., Wang Y., Currier R.J., Amatuni G., Bassaganyas L., Chen F., Kundu K., Kvale M., Mooney S.D., et al. The role of exome sequencing in newborn screening for inborn errors of metabolism. Nat. Med. 2020;26:1392–1397. doi: 10.1038/s41591-020-0966-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Kingsmore S.F., Smith L.D., Kunard C.M., Bainbridge M., Batalov S., Benson W., Blincow E., Caylor S., Chambers C., Del Angel G., et al. A genome sequencing system for universal newborn screening, diagnosis, and precision medicine for severe genetic diseases. Am. J. Hum. Genet. 2022;109:1605–1619. doi: 10.1016/j.ajhg.2022.08.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Kingsmore S.F., BeginNGS Consortium Dispatches from Biotech beginning BeginNGS: Rapid newborn genome sequencing to end the diagnostic and therapeutic odyssey. Am. J. Med. Genet. C Semin. Med. Genet. 2022;190:243–256. doi: 10.1002/ajmg.c.32005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Owen M.J., Lefebvre S., Hansen C., Kunard C.M., Dimmock D.P., Smith L.D., Scharer G., Mardach R., Willis M.J., Feigenbaum A., et al. An automated 13.5-hour system for scalable diagnosis and acute management guidance for genetic diseases. Nat. Commun. 2022;13:4057. doi: 10.1038/s41467-022-31446-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Smith L., Willis M., Feigenbaum A., Scharer G., Mardach R., Hansen C., Kingsmore S. Genome-to-Treatment and Begin Newborn Genomic Screening: A Review of System Guides for the Acute Management and Newborn Screening Follow-up of Genetic Disorders in Infants and Children. Med. Res. Arch. 2023;11:1–20. doi: 10.18103/mra.v11i10.4528. [DOI] [Google Scholar]
  • 19.Jian M., Wang X., Sui Y., Fang M., Feng C., Huang Y., Liu C., Guo R., Guan Y., Gao Y., et al. A pilot study of assessing whole genome sequencing in newborn screening in unselected children in China. Clin. Transl. Med. 2022;12 doi: 10.1002/ctm2.843. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Xiang J., Zhang H., Sun X., Zhang J., Xu Z., Sun J., Peng Z. Utility of Whole Genome Sequencing for Population Screening of Deafness-Related Genetic Variants and Cytomegalovirus Infection in Newborns. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.883617. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Pichini A., Ahmed A., Patch C., Bick D., Leblond M., Kasperaviciute D., Deen D., Wilde S., Garcia Noriega S., Matoko C., et al. Developing a National Newborn Genomes Program: An Approach Driven by Ethics, Engagement and Co-design. Front. Genet. 2022;13 doi: 10.3389/fgene.2022.866168. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.White S., Mossfield T., Fleming J., Barlow-Stewart K., Ghedia S., Dickson R., Richards F., Bombard Y., Wiley V. Expanding the Australian Newborn Blood Spot Screening Program using genomic sequencing: do we want it and are we ready? Eur. J. Hum. Genet. 2023;31:703–711. doi: 10.1038/s41431-023-01311-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Brennenstuhl H., Schaaf C.P. Genomic newborn screening-research approaches, challenges, and opportunities. Bundesgesundheitsblatt - Gesundheitsforsch. - Gesundheitsschutz. 2023;66:1232–1242. doi: 10.1007/s00103-023-03777-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Garnier N., Berghout J., Zygmunt A., Singh D., Huang K.A., Kantz W., Blankart C.R., Gillner S., Zhao J., Roettger R., et al. Genetic newborn screening and digital technologies: A project protocol based on a dual approach to shorten the rare diseases diagnostic path in Europe. PLoS One. 2023;18 doi: 10.1371/journal.pone.0293503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Kiewiet G., Westra D., de Boer E.N., van Berkel E., Hofste T.G.J., van Zweeden M., Derks R.C., Leijsten N.F.A., Ruiterkamp-Versteeg M.H.A., Charbon B., et al. Future of Dutch NGS-Based Newborn Screening: Exploring the Technical Possibilities and Assessment of a Variant Classification Strategy. Int. J. Neonatal Screen. 2024;10:20. doi: 10.3390/ijns10010020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Lunke S., Bouffler S.E., Downie L., Caruana J., Amor D.J., Archibald A., Bombard Y., Christodoulou J., Clausen M., De Fazio P., et al. Prospective cohort study of genomic newborn screening: BabyScreen+ pilot study protocol. BMJ Open. 2024;14 doi: 10.1136/bmjopen-2023-081426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Ziegler A., Koval-Burt C., Kay D.M., Suchy S.F., Begtrup A., Langley K.G., Hernan R., Amendola L.M., Boyd B.M., Bradley J., et al. Expanded Newborn Screening Using Genome Sequencing for Early Actionable Conditions. JAMA. 2024;24 doi: 10.1001/jama.2024.19662. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Cope H.L., Milko L.V., Jalazo E.R., Crissman B.G., Foreman A.K.M., Powell B.C., DeJong N.A., Hunter J.E., Boyea B.L., Forsythe A.N., et al. A systematic framework for selecting gene-condition pairs for inclusion in newborn sequencing panels: Early Check implementation. Genet. Med. 2024;26 doi: 10.1016/j.gim.2024.101290. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Minten T., Gold N.B., Bick S., Adelson S., Gehlenborg N., Amendola L.M., Boemer F., Coffey A.J., Encina N., Ferlini A., et al. Determining the characteristics of genetic disorders that predict inclusion in newborn genomic sequencing programs. medRxiv. 2024 doi: 10.1101/2024.03.24.24304797. Preprint at. [DOI] [Google Scholar]
  • 30.Teo Z.L., Jin L., Liu N., Li S., Miao D., Zhang X., Ng W.Y., Tan T.F., Lee D.M., Chua K.J., et al. Federated machine learning in healthcare: A systematic review on clinical applications and technical architecture. Cell Rep. Med. 2024;5 doi: 10.1016/j.xcrm.2024.101481. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Raimondi D., Chizari H., Verplaetse N., Löscher B.S., Franke A., Moreau Y. Genome interpretation in a federated learning context allows the multi-center exome-based risk prediction of Crohn's disease patients. Sci. Rep. 2023;13 doi: 10.1038/s41598-023-46887-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Kirienko M., Sollini M., Ninatti G., Loiacono D., Giacomello E., Gozzi N., Amigoni F., Mainardi L., Lanzi P.L., Chiti A. Distributed learning: a reliable privacy-preserving strategy to change multicenter collaborations using AI. Eur. J. Nucl. Med. Mol. Imaging. 2021;48:3791–3804. doi: 10.1007/s00259-021-05339-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Yang Q., Liu Y., Chen T., Tong Y. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 2019;10:1–19. doi: 10.48550/arXiv.1902.04885. [DOI] [Google Scholar]
  • 34.Kolobkov D., Mishra Sharma S., Medvedev A., Lebedev M., Kosaretskiy E., Vakhitov R. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front. Big Data. 2024;7 doi: 10.3389/fdata.2024.1266031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Clemens D.J., Lentino A.R., Kapplinger J.D., Ye D., Zhou W., Tester D.J., Ackerman M.J. Using the genome aggregation database, computational pathogenicity prediction tools, and patch clamp heterologous expression studies to demote previously published long QT syndrome type 1 mutations from pathogenic to benign. Heart Rhythm. 2018;15:555–561. doi: 10.1016/j.hrthm.2017.11.032. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Labbe T., Castel P., Sanner J.M., Saleh M. ChatGPT for phenotypes extraction: one model to rule them all? Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2023;2023:1–4. doi: 10.1109/EMBC40787.2023.10340611. [DOI] [PubMed] [Google Scholar]
  • 37.Kim J., Wang K., Weng C., Liu C., Liu C. Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis. Am. J. Hum. Genet. 2024;111:2190–2202. doi: 10.1016/j.ajhg.2024.08.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.De La Vega F.M., Chowdhury S., Moore B., Frise E., McCarthy J., Hernandez E.J., Wong T., James K., Guidugli L., Agrawal P.B., et al. Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases. Genome Med. 2021;13:153. doi: 10.1186/s13073-021-00965-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Backman J.D., Li A.H., Marcketta A., Sun D., Mbatchou J., Kessler M.D., Benner C., Liu D., Locke A.E., Balasubramanian S., et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599:628–634. doi: 10.1038/s41586-021-04103-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Fry A., Littlejohns T.J., Sudlow C., Doherty N., Adamska L., Sprosen T., Collins R., Allen N.E. Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. Am. J. Epidemiol. 2017;186:1026–1034. doi: 10.1093/aje/kwx246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Szustakowski J.D., Balasubramanian S., Kvikstad E., Khalid S., Bronson P.G., Sasson A., Wong E., Liu D., Wade Davis J., Haefliger C., et al. Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank. Nat. Genet. 2021;53:942–948. doi: 10.1038/s41588-021-00885-0. [DOI] [PubMed] [Google Scholar]
  • 42.Constantinescu A.E., Mitchell R.E., Zheng J., Bull C.J., Timpson N.J., Amulic B., Vincent E.E., Hughes D.A. A framework for research into continental ancestry groups of the UK Biobank. Hum. Genomics. 2022;16:3. doi: 10.1186/s40246-022-00380-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Ziyatdinov A., Torres J., Alegre-Díaz J., Backman J., Mbatchou J., Turner M., Gaynor S.M., Joseph T., Zou Y., Liu D., et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature. 2023;622:784–793. doi: 10.1038/s41586-024-07051-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Tapia-Conyer R., Kuri-Morales P., Alegre-Díaz J., Whitlock G., Emberson J., Clark S., Peto R., Collins R. Cohort profile: the Mexico City Prospective Study. Int. J. Epidemiol. 2006;35:243–249. doi: 10.1093/ije/dyl042. [DOI] [PubMed] [Google Scholar]
  • 45.Addey T., Alegre-Díaz J., Bragg F., Trichia E., Wade R., Santacruz-Benitez R., Ramirez-Reyes R., Garcilazo-Ávila A., Gonzáles-Carballo C., Bello-Chavolla O.Y., et al. Educational and social inequalities and cause-specific mortality in Mexico City: a prospective study. Lancet Public Health. 2023;8:e670–e679. doi: 10.1016/S2468-2667(23)00153-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Ding Y., Owen M., Le J., Batalov S., Chau K., Kwon Y.H., Van Der Kraan L., Bezares-Orin Z., Zhu Z., Veeraraghavan N., et al. Scalable, high quality, whole genome sequencing from archived, newborn, dried blood spots. NPJ Genom. Med. 2023;8:5. doi: 10.1038/s41525-023-00349-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Owen M.J., Wright M.S., Batalov S., Kwon Y., Ding Y., Chau K.K., Chowdhury S., Sweeney N.M., Kiernan E., Richardson A., et al. Reclassification of the Etiology of Infant Mortality With Whole-Genome Sequencing. JAMA Netw. Open. 2023;6 doi: 10.1001/jamanetworkopen.2022.54069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Helmer-Hirschberg O. RAND Corporation; 1967. Analysis of the Future: The Delphi Method. [Google Scholar]
  • 49.Yousuf M.I. Using experts' opinions through Delphi technique. Pract Assess Res Eval. 2007;12:4. doi: 10.7275/rrph-t210. [DOI] [Google Scholar]
  • 50.Amberger J.S., Bocchini C.A., Scott A.F., Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019;47:D1038–D1043. doi: 10.1093/nar/gky1151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Chunn L.M., Nefcy D.C., Scouten R.W., Tarpey R.P., Chauhan G., Lim M.S., Elenitoba-Johnson K.S.J., Schwartz S.A., Kiel M.J. Mastermind: A Comprehensive Genomic Association Search Engine for Empirical Evidence Curation and Genetic Variant Interpretation. Front. Genet. 2020;11 doi: 10.3389/fgene.2020.577152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Nkonge K.M., Nkonge D.K., Nkonge T.N. The epidemiology, molecular pathogenesis, diagnosis, and treatment of maturity-onset diabetes of the young (MODY) Clin Diabetes Endocrinol. 2020;6:20. doi: 10.1186/s40842-020-00112-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Shepherd M.H., Shields B.M., Hudson M., Pearson E.R., Hyde C., Ellard S., Hattersley A.T., Patel K.A. UNITED study. A UK nationwide prospective study of treatment change in MODY: genetic subtype and clinical characteristics predict optimal glycaemic control after discontinuing insulin and metformin. Diabetologia. 2018;61:2520–2527. doi: 10.1007/s00125-018-4728-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Shields B.M., Hicks S., Shepherd M.H., Colclough K., Hattersley A.T., Ellard S. Maturity-onset diabetes of the young (MODY): how many cases are we missing? Diabetologia. 2010;53:2504–2508. doi: 10.1007/s00125-010-1799-4. [DOI] [PubMed] [Google Scholar]
  • 55.Snider K.E., Becker S., Boyajian L., Shyng S.L., MacMullen C., Hughes N., Ganapathy K., Bhatti T., Stanley C.A., Ganguly A. Genotype and phenotype correlations in 417 children with congenital hyperinsulinism. J Clin Endocrinol Metab. 2013;98:E355–E363. doi: 10.1210/jc.2012-2169. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Shepherd M., Shields B., Hammersley S., Hudson M., McDonald T.J., Colclough K., Oram R.A., Knight B., Hyde C., Cox J., et al. Systematic Population Screening, Using Biomarkers and Genetic Testing, Identifies 2.5% of the U.K. Pediatric Diabetes Population With Monogenic Diabetes. Diabetes Care. 2016;39:1879–1888. doi: 10.2337/dc16-0645. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Berglund A., Ornstrup M.J., Lind-Holst M., Dunø M., Bækvad-Hansen M., Juul A., Borch L., Jørgensen N., Rasmussen Å.K., Andersen M., et al. Epidemiology and diagnostic trends of congenital adrenal hyperplasia in Denmark: a retrospective, population-based study. Lancet Reg Health Eur. 2023;28 doi: 10.1016/j.lanepe.2023.100598. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.van der Linde A.A.A., Schönbeck Y., van der Kamp H.J., van den Akker E.L.T., van Albada M.E., Boelen A., Finken M.J.J., Hannema S.E., Hoorweg-Nijman G., Odink R.J., et al. Evaluation of the Dutch neonatal screening for congenital adrenal hyperplasia. Arch Dis Child. 2019;104:653–657. doi: 10.1136/archdischild-2018-315972. [DOI] [PubMed] [Google Scholar]
  • 59.Neocleous V., Fanis P., Toumba M., Stylianou C., Picolos M., Andreou E., Kyriakou A., Iasonides M., Nicolaou S., Kyriakides T.C., et al. The Spectrum of Genetic Defects in Congenital Adrenal Hyperplasia in the Population of Cyprus: A Retrospective Analysis. Horm Metab Res. 2019;51:586–594. doi: 10.1055/a-0957-3297. [DOI] [PubMed] [Google Scholar]
  • 60.Barbaro M., Soardi F.C., Östberg L.J., Persson B., de Mello M.P., Wedell A., Lajic S. In vitro functional studies of rare CYP21A2 mutations and establishment of an activity gradient for nonclassic mutations improve phenotype predictions in congenital adrenal hyperplasia. Clin. Endocrinol. 2015;82:37–44. doi: 10.1111/cen.12526. [DOI] [PubMed] [Google Scholar]
  • 61.Baumgartner-Parzer S., Witsch-Baumgartner M., Hoeppner W. EMQN best practice guidelines for molecular genetic testing and reporting of 21-hydroxylase deficiency. Eur J Hum Genet. 2020;28:1341–1367. doi: 10.1038/s41431-020-0653-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Boussaroque A., Audrézet M.P., Raynal C., Sermet-Gaudelus I., Bienvenu T., Férec C., Bergougnoux A., Lopez M., Scotet V., Munck A., Girodon E. Penetrance is a critical parameter for assessing the disease liability of CFTR variants. J Cyst Fibros. 2020;19:949–954. doi: 10.1016/j.jcf.2020.03.019. [DOI] [PubMed] [Google Scholar]
  • 63.Barton A.R., Hujoel M.L.A., Mukamel R.E., Sherman M.A., Loh P.R. A spectrum of recessiveness among Mendelian disease variants in UK Biobank. Am J Hum Genet. 2022;109:1298–1307. doi: 10.1016/j.ajhg.2022.05.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Badiu Tișa I., Achim A.C., Cozma-Petruț A. The Importance of Neonatal Screening for Galactosemia. Nutrients. 2022;15:10. doi: 10.3390/nu15010010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Simurda T., Asselta R., Zolkova J., Brunclikova M., Dobrotova M., Kolkova Z., Loderer D., Skornova I., Hudecek J., Lasabova Z., et al. Congenital Afibrinogenemia and Hypofibrinogenemia: Laboratory and Genetic Testing in Rare Bleeding Disorders with Life-Threatening Clinical Manifestations and Challenging Management. Diagnostics (Basel) 2021;11:2140. doi: 10.3390/diagnostics11112140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Ergoren M.C., Ismail A.B. Reference Module in Biomedical Sciences. Elsevier; 2023. Rare Thrombophilic Disorder.https://www.sciencedirect.com/science/article/pii/B9780443157172000329 [Google Scholar]
  • 67.Person D.A. 2019. Agammaglobulinemia.https://emedicine.medscape.com/article/884942-overview [Google Scholar]
  • 68.Koromina M., Pandi M.T., van der Spek P.J., Patrinos G.P., Lauschke V.M. The ethnogeographic variability of genetic factors underlying G6PD deficieny. Pharmacol Res. 2021;173 doi: 10.1016/j.phrs.2021.105904. [DOI] [PubMed] [Google Scholar]
  • 69.Geck R.C., Powell N.R., Dunham M.J. Functional interpretation, cataloging, and analysis of 1,341 glucose-6-phosphate dehydrogenase variants. Am. J. Hum. Genet. 2023;110:228–239. doi: 10.1016/j.ajhg.2023.01.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Powell N.R., Geck R.C., Lai D., Shugg T., Skaar T.C., Dunham M. Functional Analysis of G6PD Variants Associated With Low G6PD Activity in the All of Us Research Program. medRxiv. 2024 doi: 10.1093/genetics/iyae170. 2024.04.12.24305393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Wolf B. In: Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Bean L.J.H., Gripp K.W., Amemiya A., editors. GeneReviews; 2023. Biotinidase Deficiency. [Google Scholar]
  • 72.Hayesmoore J.B., Bhuiyan Z.A., Coviello D.A., du Sart D., Edwards M., Iascone M., Morris-Rosendahl D.J., Sheils K., van Slegtenhorst M., Thomson K.L. EMQN: Recommendations for genetic testing in inherited cardiomyopathies and arrhythmias. Eur J Hum Genet. 2023;31:1003–1009. doi: 10.1038/s41431-023-01421-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Coll M., Pérez-Serra A., Mates J., Del Olmo B., Puigmulé M., Fernandez-Falgueras A., Iglesias A., Picó F., Lopez L., Brugada R., et al. Incomplete Penetrance and Variable Expressivity: Hallmarks in Channelopathies Associated with Sudden Cardiac Death. Biology (Basel). 2017;7:3. doi: 10.3390/biology7010003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Akai J., Makita N., Sakurada H., Shirai N., Ueda K., Kitabatake A., Nakazawa K., Kimura A., Hiraoka M. A novel SCN5A mutation associated with idiopathic ventricular fibrillation without typical ECG findings of Brugada syndrome. FEBS Lett. 2000;479:29–34. doi: 10.1016/s0014-5793(00)01875-5. [DOI] [PubMed] [Google Scholar]
  • 75.Verheul L.M., van der Ree M.H., Groeneveld S.A., Mulder B.A., Christiaans I., Kapel G.F.L., Alings M., Bootsma M., Barge-Schaapveld D.Q.C.M., Balt J.C., et al. The genetic basis of apparently idiopathic ventricular fibrillation: a retrospective overview. Europace. 2023;25 doi: 10.1093/europace/euad336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Villarreal-Molina T., García-Ordóñez G.P., Reyes-Quintero Á.E., Domínguez-Pérez M., Jacobo-Albavera L., Nava S., Carnevale A., Medeiros-Domingo A., Iturralde P. Clinical Spectrum of SCN5A Channelopathy in Children with Primary Electrical Disease and Structurally Normal Hearts. Genes (Basel) 2021;13:16. doi: 10.3390/genes13010016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Wilde A.A.M., Amin A.S. Clinical Spectrum of SCN5A Mutations: Long QT Syndrome, Brugada Syndrome, and Cardiomyopathy. JACC Clin Electrophysiol. 2018;4:569–579. doi: 10.1016/j.jacep.2018.03.006. [DOI] [PubMed] [Google Scholar]
  • 78.Ackerman M.J. Genetic purgatory and the cardiac channelopathies: Exposing the variants of uncertain/unknown significance issue. Heart Rhythm. 2015;12:2325–2331. doi: 10.1016/j.hrthm.2015.07.002. [DOI] [PubMed] [Google Scholar]
  • 79.Van Driest S.L., Wells Q.S., Stallings S., Bush W.S., Gordon A., Nickerson D.A., Kim J.H., Crosslin D.R., Jarvik G.P., Carrell D.S., et al. Association of Arrhythmia-Related Genetic Variants With Phenotypes Documented in Electronic Medical Records. JAMA. 2016;315:47–57. doi: 10.1001/jama.2015.17701. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Walsh R., Lahrouchi N., Tadros R., Kyndt F., Glinge C., Postema P.G., Amin A.S., Nannenberg E.A., Ware J.S., Whiffin N., et al. Enhancing rare variant interpretation in inherited arrhythmias through quantitative analysis of consortium disease cohorts and population controls. Genet Med. 2021;23:47–58. doi: 10.1038/s41436-020-00946-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.McGurk K.A., Zhang X., Theotokis P., Thomson K., Harper A., Buchan R.J., Mazaika E., Ormondroyd E., Wright W.T., Macaya D., et al. The penetrance of rare variants in cardiomyopathy-associated genes: A cross-sectional approach to estimating penetrance for secondary findings. Am J Hum Genet. 2023;110:1482–1495. doi: 10.1016/j.ajhg.2023.08.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Lipov A., Jurgens S.J., Mazzarotto F., Allouba M., Pirruccello J.P., Aguib Y., Gennarelli M., Yacoub M.H., Ellinor P.T., Bezzina C.R., et al. Exploring the complex spectrum of dominance and recessiveness in genetic cardiomyopathies. Nat Cardiovasc Res. 2023;2:1078–1094. doi: 10.1038/s44161-023-00346-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Miller I.O., Sotero de Menezes M.A. In: SCN1A Seizure Disorders. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Bean L.J.H., Gripp K.W., Amemiya A., editors. GeneReviews; 2022. [PubMed] [Google Scholar]
  • 84.Fan Z.C., Ni J.W., Yang L., Hu L.Y., Ma S.M., Mei M., Sun B.J., Wang H.J., Zhou W.H. Uncovering the molecular pathogenesis of congenital hyperinsulinism by panel gene sequencing in 32 Chinese patients. Mol Genet Genomic Med. 2015;3:526–536. doi: 10.1002/mgg3.162. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Bokhari S.R.A., Zulfiqar H., Hariz A. StatPearls [Internet] StatPearls Publishing; Treasure Island (FL): 2023. Fabry Disease. [Google Scholar]
  • 86.West J., Stilwell P., Liu H., Ban L., Bythell M., Card T.R., Lanyon P., Nanduri V., Rankin J., Bishton M.J., et al. Temporal Trends in the Incidence of Hemophagocytic Lymphohistiocytosis: A Nationwide Cohort Study From England 2003-2018. Hemasphere. 2022;6:e797. doi: 10.1097/HS9.0000000000000797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Iorio A., Stonebraker J.S., Chambost H., Makris M., Coffin D., Herr C., Germini F., Data and Demographics Committee of the World Federation of Hemophilia Establishing the Prevalence and Prevalence at Birth of Hemophilia in Males: A Meta-analytic Approach Using National Registries. Ann Intern Med. 2019;171:540–546. doi: 10.7326/M19-1208. [DOI] [PubMed] [Google Scholar]
  • 88.Johnsen J.M., Fletcher S.N., Dove A., McCracken H., Martin B.K., Kircher M., Josephson N.C., Shendure J., Ruuska S.E., Valentino L.A., et al. Results of genetic analysis of 11 341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 2022;20:2022–2034. doi: 10.1111/jth.15805. [DOI] [PubMed] [Google Scholar]
  • 89.Johnsen J.M., Fletcher S.N., Huston H., Roberge S., Martin B.K., Kircher M., Josephson N.C., Shendure J., Ruuska S., Koerper M.A., et al. Novel approach to genetic analysis and results in 3000 hemophilia patients enrolled in the My Life, Our Future initiative. Blood Adv. 2017;1:824–834. doi: 10.1182/bloodadvances.2016002923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Xu Z., Spencer H.J., Harris V.A., Perkins S.J. An updated interactive database for 1692 genetic variants in coagulation factor IX provides detailed insights into hemophilia B. J. Thromb. Haemost. 2023;21:1164–1176. doi: 10.1016/j.jtha.2023.02.005. [DOI] [PubMed] [Google Scholar]
  • 91.Yoshino H., Nishioka K., Li Y., Oji Y., Oyama G., Hatano T., Machida Y., Shimo Y., Hayashida A., Ikeda A., et al. GCH1 mutations in dopa-responsive dystonia and Parkinson's disease. J Neurol. 2018;265:1860–1870. doi: 10.1007/s00415-018-8930-8. [DOI] [PubMed] [Google Scholar]
  • 92.Andrews J., Erdjument H., Nicholson D.C. Hereditary coproporphyria: incidence in a large English family. J Med Genet. 1984;21:341–349. doi: 10.1136/jmg.21.5.341. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Lamoril J., Puy H., Whatley S.D., Martin C., Woolf J.R., Da Silva V., Deybach J.C., Elder G.H. Characterization of mutations in the CPO gene in British patients demonstrates absence of genotype-phenotype correlation and identifies relationship between hereditary coproporphyria and harderoporphyria. Am J Hum Genet. 2001;68:1130–1138. doi: 10.1086/320118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Ibarra Moreno C.A., Hu S., Kraeva N., Schuster F., Johannsen S., Rueffert H., Klingler W., Heytens L., Riazi S. An Assessment of Penetrance and Clinical Expression of Malignant Hyperthermia in Individuals Carrying Diagnostic Ryanodine Receptor 1 Gene Mutations. Anesthesiology. 2019;131:983–991. doi: 10.1097/ALN.0000000000002813. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Monnier N., Krivosic-Horber R., Payen J.F., Kozak-Ribbens G., Nivoche Y., Adnet P., Reyford H., Lunardi J. Presence of two different genetic traits in malignant hyperthermia families: implication for genetic analysis, diagnosis, and incidence of malignant hyperthermia susceptibility. Anesthesiology. 2002;97:1067–1074. doi: 10.1097/00000542-200211000-00007. [DOI] [PubMed] [Google Scholar]
  • 96.Rosenberg H., Pollock N., Schiemann A., Bulger T., Stowell K. Malignant hyperthermia: a review. Orphanet J Rare Dis. 2015;10:93. doi: 10.1186/s13023-015-0310-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97.Ibarra M.C.A., Wu S., Murayama K., Minami N., Ichihara Y., Kikuchi H., Noguchi S., Hayashi Y.K., Ochiai R., Nishino I. Malignant hyperthermia in Japan: mutation screening of the entire ryanodine receptor type 1 gene coding region by direct sequencing. Anesthesiology. 2006;104:1146–1154. doi: 10.1097/00000542-200606000-00008. [DOI] [PubMed] [Google Scholar]
  • 98.Finsterer J. Prevalence in congenital myasthenic syndrome. Eur J Paediatr Neurol. 2020;26:5–6. doi: 10.1016/j.ejpn.2020.04.011. [DOI] [PubMed] [Google Scholar]
  • 99.Finsterer J. Congenital myasthenic syndromes. Orphanet J Rare Dis. 2019;14:57. doi: 10.1186/s13023-019-1025-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100.Clarke L.A. In: Mucopolysaccharidosis Type I. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. GeneReviews; 2024. [Google Scholar]
  • 101.Restrepo-Cordoba M.A., Wahbi K., Florian A.R., Jiménez-Jáimez J., Politano L., Arad M., Climent-Paya V., Garcia-Alvarez A., Hansen R.B., Larrañaga-Moreira J.M., et al. Prevalence and clinical outcomes of dystrophin-associated dilated cardiomyopathy without severe skeletal myopathy. Eur J Heart Fail. 2021;23:1276–1286. doi: 10.1002/ejhf.2250. [DOI] [PubMed] [Google Scholar]
  • 102.Whitehead N., Erickson S.W., Cai B., McDermott S., Peay H., Howard J.F., Ouyang L., Muscular Dystrophy Surveillance Tracking and Research Network Sources of variation in estimates of Duchenne and Becker muscular dystrophy prevalence in the United States. Orphanet J. Rare Dis. 2023;18:65. doi: 10.1186/s13023-023-02662-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Broomfield J., Hill M., Guglieri M., Crowther M., Abrams K. Life Expectancy in Duchenne Muscular Dystrophy: Reproduced Individual Patient Data Meta-analysis. Neurology. 2021;97:e2304–e2314. doi: 10.1212/WNL.0000000000012910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Crisafulli S., Sultana J., Fontana A., Salvo F., Messina S., Trifirò G. Global epidemiology of Duchenne muscular dystrophy: an updated systematic review and meta-analysis. Orphanet J Rare Dis. 2020;15:141. doi: 10.1186/s13023-020-01430-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Gaughan S., Ayres L., Baker P.R. II. In: Hereditary Fructose Intolerance. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. GeneReviews; 2021. [PubMed] [Google Scholar]
  • 106.Smith C.I.E., Berglöf A. In: GeneReviews. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. 2024. X-Linked Agammaglobulinemia. [Google Scholar]
  • 107.Varon R., Demuth I., Chrzanowska K.H. In: Nijmegen Breakage Syndrome. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. GeneReviews; 2023. [PubMed] [Google Scholar]
  • 108.Dorsey M.J., Puck J.M. Newborn Screening for Severe Combined Immunodeficiency in the United States: Lessons Learned. Immunol Allergy Clin North Am. 2019;39:1–11. doi: 10.1016/j.iac.2018.08.002. [DOI] [PubMed] [Google Scholar]
  • 109.Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. GeneReviews®. University of Washington; 1993. [Google Scholar]
  • 110.Cystic Fibrosis Foundation. Bethesda: Cystic Fibrosis Foundation; 2011. Cystic Fibrosis Foundation Patient Registry 2011 Annual Data Report. [Google Scholar]
  • 111.Lo R.S., Cromie G.A., Tang M., Teng K., Owens K., Sirr A., Kutz J.N., Morizono H., Caldovic L., Ah Mew N., et al. The functional impact of 1,570 individual amino acid substitutions in human OTC. Am. J. Hum. Genet. 2023;110:863–879. doi: 10.1016/j.ajhg.2023.03.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Johnston J.J., Dirksen R.T., Girard T., Hopkins P.M., Kraeva N., Ognoon M., Radenbaugh K.B., Riazi S., Robinson R.L., Saddic III L.A., et al. Updated variant curation expert panel criteria and pathogenicity classifications for 251 variants for RYR1-related malignant hyperthermia susceptibility. Hum. Mol. Genet. 2022;31:4087–4093. doi: 10.1093/hmg/ddac145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Koller D., Friedman N. The MIT Press; 2009. Probabilistic Graphical Models: Principles and Techniques. [Google Scholar]
  • 114.Yu T.W., Kingsmore S.F., Green R.C., MacKenzie T., Wasserstein M., Caggana M., Gold N.B., Kennedy A., Kishnani P.S., Might M., et al. Are we prepared to deliver gene-targeted therapies for rare diseases? Am. J. Med. Genet. C Semin. Med. Genet. 2023;193:7–12. doi: 10.1002/ajmg.c.32029. [DOI] [PubMed] [Google Scholar]
  • 115.Salari N., Fatahi B., Valipour E., Kazeminia M., Fatahian R., Kiaei A., Shohaimi S., Mohammadi M. Global prevalence of Duchenne and Becker muscular dystrophy: a systematic review and meta-analysis. J. Orthop. Surg. Res. 2022;17:96. doi: 10.1186/s13018-022-02996-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Schlüter D.K., Southern K.W., Dryden C., Diggle P., Taylor-Robinson D. Impact of newborn screening on outcomes and social inequalities in cystic fibrosis: a UK CF registry-based study. Thorax. 2020;75:123–131. doi: 10.1136/thoraxjnl-2019-213179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.de Camargo Pinto L.L., Maluf S.W., Leistner-Segal S., Zimmer da Silva C., Brusius-Facchin A., Burin M.G., Brustolin S., Llerena J., Moraes L., Vedolin L., et al. Are MPS II heterozygotes actually asymptomatic? A study based on clinical and biochemical data, X-inactivation analysis and imaging evaluations. Am. J. Med. Genet. 2011;155A:50–57. doi: 10.1002/ajmg.a.33770. [DOI] [PubMed] [Google Scholar]
  • 118.Burton B.K., Shively V., Quadri A., Warn L., Burton J., Grange D.K., Christensen K., Groepper D., Ashbaugh L., Ehrhardt J., Basheeruddin K. Newborn screening for mucopolysaccharidosis type II: Lessons learned. Mol. Genet. Metab. 2023;140 doi: 10.1016/j.ymgme.2023.107557. [DOI] [PubMed] [Google Scholar]
  • 119.Burton B.K., Jego V., Mikl J., Jones S.A. Survival in idursulfase-treated and untreated patients with mucopolysaccharidosis type II: data from the Hunter Outcome Survey (HOS). J. Inherit. Metab. Dis. 2017;40:867–874. doi: 10.1007/s10545-017-0075-x. [DOI] [PubMed] [Google Scholar]
  • 120.Gillis D. In: GeneReviews®. Adam M.P., Feldman J., Mirzaa G.M., Pagon R.A., Wallace S.E., Amemiya A., editors. University of Washington; 2003. Familial Hyperinsulinism. [Google Scholar]
  • 121.Whiffin N., Minikel E., Walsh R., O'Donnell-Luria A.H., Karczewski K., Ing A.Y., Barton P.J.R., Funke B., Cook S.A., MacArthur D., Ware J.S. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet. Med. 2017;19:1151–1158. doi: 10.1038/gim.2017.26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Whiffin N., Roberts A.M., Minikel E., Zappala Z., Walsh R., O'Donnell-Luria A.H., Karczewski K.J., Harrison S.M., Thomson K.L., Sage H., et al. Using High-Resolution Variant Frequencies Empowers Clinical Genome Interpretation and Enables Investigation of Genetic Architecture. Am. J. Hum. Genet. 2019;104:187–190. doi: 10.1016/j.ajhg.2018.11.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Kingsmore S.F., Nofsinger R., Ellsworth K. Rapid genomic sequencing for genetic disease diagnosis and therapy in intensive care units: a review. NPJ Genom. Med. 2024;9:17. doi: 10.1038/s41525-024-00404-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Kingsmore S.F., Cole F.S. The Role of Genome Sequencing in Neonatal Intensive Care Units. Annu. Rev. Genomics Hum. Genet. 2022;23:427–448. doi: 10.1146/annurev-genom-120921-103442. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Owen M.J., Niemi A.K., Dimmock D.P., Speziale M., Nespeca M., Chau K.K., Van Der Kraan L., Wright M.S., Hansen C., Veeraraghavan N., et al. Rapid Sequencing-Based Diagnosis of Thiamine Metabolism Dysfunction Syndrome. N. Engl. J. Med. 2021;384:2159–2161. doi: 10.1056/NEJMc2100365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Bergsten E., Horne A., Aricó M., Astigarraga I., Egeler R.M., Filipovich A.H., Ishii E., Janka G., Ladisch S., Lehmberg K., et al. Confirmed efficacy of etoposide and dexamethasone in HLH treatment: long-term results of the cooperative HLH-2004 study. Blood. 2017;130:2728–2738. doi: 10.1182/blood-2017-06-788349. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Last J.M. The iceberg: ‘Completing the clinical picture’ in general practice. Int. J. Epidemiol. 2013;42:1608–1613. doi: 10.1093/ije/dyt113. [DOI] [PubMed] [Google Scholar]
  • 128.Last J.M. Commentary: The iceberg revisited. Int. J. Epidemiol. 2013;42:1613–1615. doi: 10.1093/ije/dyt112. [DOI] [PubMed] [Google Scholar]
  • 129.Gilchrist M., Casanova F., Tyrrell J.S., Cannon S., Wood A.R., Fife N., Young K., Oram R.A., Weedon M.N. Prevalence of Fabry disease-causing variants in the UK Biobank. J. Med. Genet. 2023;60:391–396. doi: 10.1136/jmg-2022-108523. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130.Serebrinsky G., Calvo M., Fernandez S., Saito S., Ohno K., Wallace E., Warnock D., Sakuraba H., Politei J. Late onset variants in Fabry disease: Results in high risk population screenings in Argentina. Mol. Genet. Metab. Rep. 2015;4:19–24. doi: 10.1016/j.ymgmr.2015.05.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131.Homer. W. Heinemann; 1919. The Odyssey. [Google Scholar]
  • 132.Ceppellini R., Siniscalco M., Smith C.A. The estimation of gene frequencies in a random-mating population. Ann. Hum. Genet. 1955;20:97–115. doi: 10.1111/j.1469-1809.1955.tb01360.x. [DOI] [PubMed] [Google Scholar]
  • 133.Roberts M.F., Bricher S.E. Theoretical Framework for the Study of Genetic Diseases Caused by Dominant Alleles. Life. 2023;13:733. doi: 10.3390/life13030733. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134.Hanany M., Rivolta C., Sharon D. Worldwide carrier frequency and genetic prevalence of autosomal recessive inherited retinal diseases. Proc. Natl. Acad. Sci. USA. 2020;117:2710–2716. doi: 10.1073/pnas.1913179117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135.Gao J., Brackley S., Mann J.P. The global prevalence of Wilson disease from next-generation sequencing data. Genet. Med. 2019;21:1155–1163. doi: 10.1038/s41436-018-0309-9. [DOI] [PubMed] [Google Scholar]
  • 136.Pritchard J.K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 2001;69:124–137. doi: 10.1086/321272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137.Chen S., Francioli L.C., Goodrich J.K., Collins R.L., Kanai M., Wang Q., Alföldi J., Watts N.A., Vittal C., Gauthier L.D., et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625:92–100. doi: 10.1038/s41586-024-07050-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138.Kingsmore S.F., Wright M., Olsen L., Schultz B., Protopsaltis L., Averbuj D., Blincow E., Carroll J., Caylor S., Defay T., et al. Genome-based newborn screening for severe childhood genetic diseases has high positive predictive value and sensitivity in a NICU pilot trial. Am. J. Hum. Genet. 2024;111:2643–2667. doi: 10.1016/j.ajhg.2024.10.020. [DOI] [PubMed] [Google Scholar]
  • 139.Schroeder B.E., Gonzaludo N., Everson K., Than K.S., Sullivan J., Taft R.J., Belmont J.W. The diagnostic trajectory of infants and children with clinical features of genetic disease. NPJ Genom. Med. 2021;6:98. doi: 10.1038/s41525-021-00260-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

Document S1. Supplemental methods, results, and discussion; Figures S1–S3; and supplemental table captions
mmc1.pdf (746.9KB, pdf)
Data S1. Tables S1–S15
mmc2.xlsx (5.4MB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (4MB, pdf)

Data Availability Statement

Consented proband and parent data analyzed in this study and non-human subjects data generated during this study are available at the Longitudinal Pediatric Data Resource (LPDR) under accession code nbs000003.v1.p at https://nbstrn.org/. Qualified researchers can obtain access by registration at https://nbstrn.org/login?token-expired=true&rel=/tools/lpdr. There are restrictions to the availability of raw individual data due to data privacy and confidentiality laws. Anonymized and pseudonymized individual data generated in this study, subject to the terms of informed written consent documents and state and federal laws, are provided in the supplemental information.

GTRx and the GTRx REDCap instance are available at https://gtrx.rbsapp.net/, and code is available from Christian Hansen (chansen@rchsd.org) and at https://github.com/rao-madhavrao-rcigm/gtrx. The DRAGEN Platform and Illumina Connected Analytics are available from Illumina (Shyamal Mehtalia, smehtalia@illumina.com). GEM and Transformer are available from Fabric Genomics (info@fabricgenomics.com). TileDB v.2.8.0 is available at https://github.com/TileDB-Inc/TileDB. TileDB-VCF v.0.15.0 is available at https://github.com/tiledb-inc/tiledb-vcf. Federated query data and code are available at https://github.com/rady-childrens-genomics/beginngs-partner-onboarding.


Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
