Abstract
Background
Congenital anomalies (CAs) encompass a wide spectrum of structural and functional abnormalities during fetal development, commonly presenting at birth. Identifying the cause of CA is essential for accurate diagnosis and treatment. Using a target-gene approach, genetic variants could be found in certain CA patients. However, some patients were genetically undiagnosed; therefore, it is imperative to identify the causative variants from whole genome sequence (WGS) data of these patients.
Results
An in-house pipeline utilizing DRAGEN-GATK-Hail was established for trio-based WGS data analysis (n = 18 undiagnosed CA patients and their parents) and thirty-five candidate variants, including SNV/Indel, CNV, and SV were identified. Among them, 10 variants of seven coding genes were selected as possible causal variants by variant pathogenicity, genotype–phenotype analysis, and a multidisciplinary team. Finally, functional validation of six genes including RYR3, NRXN1, FREM2, CSMD1, RARS1, and NOTCH1, revealed various phenotypes in zebrafish models that aligned with those observed in each patient. In addition to the above findings, eleven diagnostic variants initially discovered in a targeted-gene analysis from a previous study were also identified as diagnostic variants and the in-house pipeline demonstrated a significant advantage in accurately and efficiently identifying de novo variants (DNVs), compound heterozygous (CH), and homozygous variants.
Conclusions
Taken together, the in-house pipeline established in this study provides a highly valuable diagnostic tool for the identification of potential candidate variants in patients with CA. Further research into the molecular mechanisms related to the development of CAs could shed light on the functional aspects of these genetic variations and contribute to the development of therapeutic drugs.
Supplementary Information
The online version contains supplementary material available at 10.1186/s40246-024-00709-2.
Keywords: Congenital Anomalies, Whole Genome Sequencing, Genetic Variant, Rare Disease
Background
Congenital anomalies (CAs) are structural and functional anomalies present at birth, with a global prevalence of 2–5%, influenced by geographical location, diagnostic criteria, and income levels. Annually, approximately 240,000 newborns worldwide succumb to congenital diseases within 28 days of birth, and CAs cause 170,000 deaths in children aged 1 month to 5 years [1–4]. In children, CAs include anomalies related to the cardiovascular system (particularly congenital heart diseases), renal system, central nervous system, and facial structure [2, 5–7]. The severity of CAs ranges from mild abnormalities to life-threatening conditions, leading to long-term treatment and economic burdens [8, 9].
Approximately 80% of CAs originate from unknown causes, while the remaining 20% are attributed to known etiologies such as chromosomal defects, genetic abnormalities, and exposure to teratogens [10]. Until recently, chromosomal microarray and karyotyping analyses were the primary methods used to identify the genetic causes of conditions such as Down syndrome, developmental disabilities, and cardiovascular malformations. However, the use of chromosomal microarray and karyotyping did not significantly improve CA diagnosis rates [11–13].
Therefore, the development of sensitive CA diagnostic techniques is crucial. Next-generation sequencing (NGS), particularly whole-exome sequencing (WES), significantly enhances the diagnostic rate for suspected or undiagnosed genetic diseases by precisely identifying genetic abnormalities such as single nucleotide variants (SNVs), insertions and deletions (Indels), and structural variants (SVs). The diagnostic rate varies based on factors such as disorder type and the sample composition (trio or proband-only) [14]. However, existing WES information production and analysis may miss some gene coding regions that have not been captured. In addition, whole-genome sequencing (WGS) information analysis increases the possibility of discovering structural variants, splicing variants, deep intronic variants, and 5′-UTR / 3′-UTR mutations that are difficult to detect with WES analysis [15]. Recently, WGS has emerged as a valuable tool for diagnosing rare genetic diseases due to its comprehensive genomic coverage, including noncoding regions and SVs. With regard to diagnostic yield, WGS outperforms WES, targeted-gene sequencing, and chromosomal microarray (CMA) analysis [16–19].
Early detection of genetic variants using WGS is critical in the Neonatal Intensive Care Unit (NICU), enabling rapid identification and treatment of genes responsible for various CAs [20, 21]. Recently, the U.K. 100,000 Genome Project demonstrated the effectiveness of WGS, with the application of the specialized pipeline increasing diagnostic yield and clinical interventions for 25% of diagnosed patients, resulting in significant economic savings and improved quality of life [22].
Therefore, in this study, we aimed to establish an in-house pipeline for trio-based WGS analysis of CA families, each comprising a patient who remained genetically undiagnosed after a targeted-gene analysis of WGS data and the patient's parents. Additionally, we aimed to identify potential CA-associated genetic variants using the established pipeline and to validate them functionally. Our results will help develop a diagnostic tool to identify potential candidate variants in patients with CA.
Methods
Study design and participants
This retrospective study included 18 unrelated trio families (54 total individuals including 18 CA probands and 36 parents) who remained genetically undiagnosed from a previous study of 80 patients with CA [23]. Neonates under 1 year old, born with two or more major congenital anomalies in 2021 were recruited at the Samsung Medical Center (SMC), Seoul, Korea. All participants provided written informed consent. Participant enrollment and phenotypic analysis were performed in the previous study [24]. Briefly, probands with disorders identified through a neonatal screening program, chromosomal studies, and congenital viral studies were excluded from the study. For diagnosis, each proband received a clinical examination, performed by multidisciplinary specialists. Clinical characteristics such as gender, age, height, weight, prenatal screening test results, diagnosis, pregnancy and delivery history, family history, residence and living environment risk factors, medication history, history of exposure to teratogenic substances, and other relevant factors were extracted from medical records and obtained from patient interviews. The obtained clinical characteristics were converted, by experienced clinicians, into standardized terms using Human Phenotype Ontology terms (HPO, https://hpo.jax.org/app/) (Table S1). The descriptive statistics of clinical characteristics, performed by clinical manifestations such as affected organ and phenotype property, allowed for redundancy in each participant. This study was reviewed and approved by the SMC Institutional Review Board (IRB) and the Korea National Institute of Health IRB (Approval Number: SMC-2021–04–189–002, 2022–02–07-P-A).
WGS data
Genomic DNA was isolated from peripheral blood. Library preparation and sequencing were performed using the TruSeq Nano DNA kit and the NovaSeq6000 platform (Illumina, San Diego, CA, USA), respectively, following the manufacturer’s protocols. The average coverage was approximately 30×. The resulting FASTQ files were used for variant analysis and subsequent steps in the analysis.
Construction of an in-house pipeline for variant identification using WGS
The WGS analysis pipeline was constructed to call, filter, annotate, and discover candidate genetic variants such as SNV/Indel, CNV, and SV in each patient using trio samples, including the patient as well as the patient’s mother and father (Fig. 1). DRAGEN (Illumina) and Hail (https://hail.is) systems were utilized for WGS analysis to achieve fast alignment and accurate variant calling, filtering, and annotation as previously described [25]. Sequence reads were then mapped and aligned to the Genome Reference Consortium human genome build 38 (GRCh38). Duplicated reads were marked, followed by variant calling and joint genotyping using the DRAGEN pipeline (version 3.8). To estimate variant call accuracy, Variant Quality Score Recalibration (VQSR) was applied using GATK (version 4.2.6.1). To identify all variant types such as SNV/Indels, CNVs, and SVs, we performed trio-based WGS analysis.
Fig. 1.
Workflow of congenital anomaly-associated variants discovered using whole genome sequence data
In the Hail system (Hail 0.2), variants in multi-allelic sites, low-complexity regions (LCR), and those that did not pass the VQSR filter were excluded. Subsequent genotype-based filtering was then applied. Genotypes with variant calling depths (DP) below 10 or above 1000 were filtered out. The variants were included according to the allelic balance (AB) criteria: 0.3 ≤ AB ≤ 0.7 for SNV heterozygotes, 0.2 ≤ AB ≤ 0.8 for Indel heterozygotes, and AB ≥ 0.95 for homozygotes. Variants were excluded if their call rates were below 0.1 or if the P-value of the Hardy–Weinberg equilibrium (HWE) test was below 10⁻¹². Additionally, any calls on the Y chromosome in female samples and heterozygous calls in non-pseudoautosomal regions (PAR) in male samples were excluded. The DRAGEN-Hail pipeline output de novo, compound heterozygous (CH), and homozygous SNV/Indels. For gene and consequence annotation of variants, we used the Variant Effect Predictor (VEP) [26]. To ensure high-quality de novo variant (DNV) identification, we filtered out variants with GQ ≤ 25 in the proband and those that were observed in more than 0.1% of the non-neuro subset of gnomAD GRCh38 v3.1.2. Variants were also excluded if the proband AB was less than 0.3, the parent AB was above 0.1, or the DP ratio (proband read depth / parental read depth) was below 0.3. For inherited heterozygous variants, only variants with an internal allele count (AC) below two were considered (proband: 0/1, mom: 0/1, dad: 0/0; proband: 0/1, mom: 0/0, dad: 0/1). Insertions and deletions of 50 base pairs (bp) or less were classified as Indels. To discover de novo CNVs and SVs, we used the CNV calling tool and the MANTA program, respectively (both embedded in the DRAGEN system), following the manufacturer’s protocols. Discovered variants were visualized with IGV and Samplot, and true positive variants were carried forward to variant interpretation and prioritization [27, 28].
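The genotype-level and de novo criteria described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Hail code: the `Call` record, its field names, and both function names are hypothetical; homozygous-reference calls are assumed to be filtered on depth only; the DP ratio is assumed to be proband depth divided by the combined depth of both parents (the text says only "parental read depth"); and only autosomal heterozygous de novo calls are considered.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One genotype call. Field names are illustrative, not the pipeline's schema."""
    gt: str        # "0/0", "0/1" or "1/1"
    dp: int        # read depth at the site
    ab: float      # allelic balance: alt reads / total reads
    gq: int = 99   # genotype quality


def passes_genotype_qc(call: Call, is_snv: bool) -> bool:
    """Apply the depth and allelic-balance thresholds quoted in the text."""
    if call.dp < 10 or call.dp > 1000:
        return False
    if call.gt == "0/1":
        low, high = (0.3, 0.7) if is_snv else (0.2, 0.8)
        return low <= call.ab <= high
    if call.gt == "1/1":
        return call.ab >= 0.95
    return True  # assumption: hom-ref calls are only depth-filtered


def is_high_quality_dnv(proband: Call, mother: Call, father: Call,
                        gnomad_af: float) -> bool:
    """Apply the de novo criteria quoted in the text to one trio site."""
    # A de novo call: heterozygous in the proband, absent in both parents.
    if proband.gt != "0/1" or mother.gt != "0/0" or father.gt != "0/0":
        return False
    # Excluded: GQ <= 25 in the proband, or population frequency above 0.1%.
    if proband.gq <= 25 or gnomad_af > 0.001:
        return False
    # Excluded: proband AB < 0.3 or any parent AB > 0.1.
    if proband.ab < 0.3 or max(mother.ab, father.ab) > 0.1:
        return False
    # Assumption: "parental read depth" means the combined depth of both parents.
    parental_dp = mother.dp + father.dp
    if parental_dp == 0:
        return False
    return proband.dp / parental_dp >= 0.3
```

For a hypothetical trio such as `Call("0/1", 32, 0.47, 60)` in the proband with `Call("0/0", 30, 0.0)` and `Call("0/0", 28, 0.0)` in the parents, the call would be retained when the population frequency is at or below 0.1%. Keeping the genotype QC and the de novo check as separate functions mirrors the two filtering stages named in the text.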
Variants discovered within the coding regions and splicing sequences were prioritized as potential candidate genes, based on their potential pathogenicity, using VarSome (Ver. 11.8, Aug. 2023, http://varsome.com/), which is aligned with the American College of Medical Genetics and Genomics (ACMG) variant classification guidelines [29, 30]. Additionally, we applied various prediction tools, including CADD (https://cadd.gs.washington.edu/), SIFT (https://sift.bii.a-star.edu.sg/), PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), SpliceAI (https://spliceailookup.broadinstitute.org/), REVEL, MetaLR, MetaRNN, and MutationTaster (VarSome), in combination with an assessment of amino acid conservation [29–32]. The impact of missense variants on protein structure was analyzed using the DynaMut2 protein prediction program (https://biosig.lab.uq.edu.au/dynamut2/), which leverages 3D structural data, PDB and AlphaFold, from UniProt [33, 34]. Protein–protein interaction analysis was used to determine the proteins associated with the potential candidate genes [35].
To establish genotype–phenotype correlations, we gathered information from medical databases and literature sources, including OMIM (https://www.omim.org/), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), IMPC (https://www.mousephenotype.org/), AMELIE (https://amelie.stanford.edu/), PanelApp (https://panelapp.genomicsengland.co.uk/), and DECIPHER (https://www.deciphergenomics.org/). Candidate variants were reviewed and possible causal variants were identified at a multidisciplinary team meeting (MDT), which included neonatal and pediatric clinicians, bioinformatics analysts, clinical geneticists, and a biologist.
Functional analysis of novel genetic variants
Zebrafish is widely used as an animal model to study the role of genes identified in rare diseases, such as CA [36]. The functional validation of possible causal variants, identified from the in-house pipeline WGS analysis, was performed using the zebrafish KO model. Wild-type (WT) zebrafish (Danio rerio) were maintained in a controlled laboratory environment. For efficient CRISPR-Cas9 genome editing, single-guide RNAs (sgRNAs) were designed to specifically target and disrupt the gene of interest, using established guidelines (IDT, USA) (Table S2). Fertilized zebrafish embryos were microinjected with a mixture of sgRNAs and Cas9 protein, resulting in the generation of zebrafish embryos carrying gene mutations. Genomic DNA was extracted from injected zebrafish embryos, and genotyping was performed to confirm the presence of the desired mutations, as previously described [37]. The KO zebrafish lines were subjected to comprehensive phenotypic analysis, including morphological, physiological, and behavioral assessments. To determine the role of the genes in disease pathogenesis, zebrafish models exhibiting disease-relevant phenotypes, such as body length, facial malformation, heart defects, curved tail/body, and ‘response to stimuli’ were compared to wild-type controls in a microscopic environment. The body length and atria/ventricle area were measured using ImageJ (National Institutes of Health, Bethesda, MD, USA), and the ratio was calculated. Statistical analysis was conducted to assess the significance of observed differences in disease-related phenotypes between knockout and wild-type zebrafish (two-tailed t-test). All animal experiments were conducted in accordance with institutional and ethical guidelines, ensuring the humane treatment of the zebrafish specimens throughout the study.
Results
Clinical characteristics and phenotypic classification of the study participants
The clinical features of patients with CA were analyzed using clinical indices such as gestational age, birth weight, gender, preterm, in vitro fertilization (IVF), and maternal/paternal age. The average gestational age was 37.2 weeks (± 1.8) and the average birth weight and height were 2501.4 g (± 602.5) and 45.4 cm (± 3.1), respectively. The proportion of IVF was 33.3% (6/18) and the average maternal age was 35.3 years (± 3.3). The number of mothers over 35 years old was 13 (72.2%) and the proportion of infants that were small for gestational age was 61% (11/18). The majority of phenotypes (75.1%) were detected at birth or less than 1 month after birth. Half of the patients have 5–10 phenotypes (Table 1). Abnormality of the nervous system was the top phenotypic subgroup followed by several organs/phenotypes such as abdomen, growth, head or neck, skeletal, cardiovascular, and genitourinary abnormalities. All participants had multiple anomalies with four or more phenotypes (Fig. S1 and Table S1).
Table 1.
Clinical characteristics of the study population
| Clinical characteristics | Patients (n=18) |
|---|---|
| Gestational age (weeks) | 37.2 ± 1.8 |
| Birth weight (g) | 2501.4 ± 602.5 |
| Birth height (cm) | 45.4 ± 3.1 |
| Male/Female (n) | 9 (50%)/9 (50%) |
| In vitro fertilization (n) | 6 (33.3%) |
| Maternal age (year) | 35.3 ± 3.3 |
| Advanced maternal age (≥35 years) | 13 (72.2%) |
| Non-Korean parents (n) | 1 (5.6%) |
| Small for gestational age (n) | 11 (61.1%) |
| Preterm infant (<37 weeks) | 5 (27.8%) |
| Low birth weight (<2500 g) | 8 (44.4%) |
| Death (n) | 0 (0%) |
| Detection of phenotype (n) | |
| Prenatal | 24 (13.9%) |
| Neonatal (at birth, < a month) | 130 (75.1%) |
| Post-neonatal (1–15 months) | 19 (11.0%) |
| HPO terms (n) | |
| ≤ 4 | 2 (11.1%) |
| ≥ 5 | 9 (50.0%) |
| ≥ 10 | 7 (38.9%) |
Variants discovery using the established in-house pipeline
We analyzed the variant-annotated data from the established in-house pipeline to identify potential CA-associated candidate variants (Fig. 1). To identify SNV/Indel variants, firstly, we focused on de novo variants because our participants were composed of trio samples. As shown in Fig. 2, after filtering variants for annotation of coding sequences and splicing sites, 26 DNVs were identified. Secondly, after filtering out non-private variants present in other family samples, we identified 64 CH variants within the coding region of 27 genes. Thirdly, after filtering out variants missing one parental side or those also discovered in other families, we collected 47 rare homozygous variants in the coding region of 43 genes. To identify CA-associated CNV/SV, we focused on de novo variants. From the CNV analysis using the DRAGEN platform, we identified 54 de novo CNVs (dnCNVs) following visualization using Samplot and IGV software. From the SV analysis, using Manta software embedded in DRAGEN system, 56 de novo SVs were identified (Fig. 2). From the 18 undiagnosed patients with CA, we identified 35 candidate SNV/Indel variants, including seven DNVs and 28 CH variants in 21 genes (Table 2). Some patients have more than one candidate gene because the most likely candidate genetic variants could not be identified. SNV/Indel variant types include three frameshifts and two stop-gains inducing premature termination of protein synthesis, one stop-loss, one in-frame deletion, one splicing, and 27 missense variants. Despite careful examination of all data, including CNV, SV, and SNV/Indel, we were unable to identify any candidate variants in one patient (patient-8). Furthermore, of the 29 families recruited, 11 CA patients had previously been diagnosed at the hospital with known disease-causing genes (Fig. 2). Using our in-house pipeline, we once again verified the genetic variants in these 11 pre-diagnosed CA patients and confirmed the capability of our pipeline [23]. 
As a result, we identified 11 diagnosed genes, including SNVs/Indels in KRT14, ABCC8, CREBBP, PURA, PKHD1, PTPN11, HNRNPR, SERPINC1, and NIPBL, as well as a CNV and an SV deletion in EYA1 and FOXC1 (Table S3).
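The compound heterozygous (CH) selection used in this section can be illustrated with a short Python sketch. It relies on the trio genotype patterns given in the Methods (proband 0/1 with exactly one heterozygous parent) and keeps only genes that carry at least one maternally and one paternally inherited variant. The function names and the tuple layout are hypothetical, the genotype strings are illustrative, and `GENE_X` is a made-up placeholder; the two RARS1 variants and their parental origins are taken from Table 2.

```python
from collections import defaultdict


def inheritance(proband_gt, mother_gt, father_gt):
    """Return the parental origin of a heterozygous proband call, or None."""
    if proband_gt != "0/1":
        return None
    if mother_gt == "0/1" and father_gt == "0/0":
        return "maternal"
    if mother_gt == "0/0" and father_gt == "0/1":
        return "paternal"
    return None  # de novo, both parents heterozygous, or otherwise uninformative


def compound_het_genes(variants):
    """Group inherited heterozygous variants by gene and keep genes hit on both alleles.

    `variants` is an iterable of (gene, hgvs, proband_gt, mother_gt, father_gt).
    """
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene, hgvs, proband_gt, mother_gt, father_gt in variants:
        origin = inheritance(proband_gt, mother_gt, father_gt)
        if origin is not None:
            by_gene[gene][origin].append(hgvs)
    # A CH candidate needs one variant from each parent in the same gene.
    return {
        gene: dict(origins)
        for gene, origins in by_gene.items()
        if "maternal" in origins and "paternal" in origins
    }


trio_calls = [
    ("RARS1", "c.1775dup", "0/1", "0/1", "0/0"),    # maternal (Table 2)
    ("RARS1", "c.1901G>T", "0/1", "0/0", "0/1"),    # paternal (Table 2)
    ("GENE_X", "c.100A>G", "0/1", "0/1", "0/0"),    # hypothetical, one side only
]
candidates = compound_het_genes(trio_calls)
```

In this example only RARS1 is retained, because `GENE_X` has no paternally inherited partner variant, which corresponds to the step of filtering out variants missing one parental side. The sketch does not cover the additional exclusion of variants shared with other families.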
Fig. 2.

Summary of results from the trio-based WGS analysis of patients with congenital anomaly using the established in-house pipeline. The numbers in parentheses indicate the number of variants
Table 2.
List of potential candidate or possible causal variants from 18 undiagnosed CA patients
| Patient no. | Gene* | Variant | Variant type | Zygosity | Inheritance | ACMG classification (Criteria)** | OMIM (Gene) |
|---|---|---|---|---|---|---|---|
| 4 | RYR3* | c.12295G > T (p.Glu4099Ter) | Stop Gained | Het | De novo | LP (PVS1, PM2) | 180903 |
| 5 | NRXN1* | c.298C > G (p.Pro100Ala) | Missense | Het | De novo | VUS (PM2, PP3, BP1) | 600565 |
| | MED24 | c.1775G > A (p.Trp592Ter) | Stop Gained | Comp. Het | Paternal | VUS (PVS1, PM2) | 607000 |
| | | c.1376G > C (p.Gly459Ala) | Missense | | Maternal | VUS (PM2, PP3, BP1) | |
| 6 | RANBP2* | c.9673T > C (p.Ter3225Glnext*6) | Stop Lost | Comp. Het | Maternal | LP (PM4, PM2, BP4) | 601181 |
| | | c.7918C > A (p.Pro2640Thr) | Missense | | Paternal | VUS (PM2, BP1) | |
| 7 | COL18A1 | c.2873C > T (p.Pro958Leu) | Missense | Comp. Het | Maternal | VUS (PM2, BP1) | 120328 |
| | | c.2973_2981del (p.Gly994_Pro996del) | Inframe Deletion | | Paternal | VUS (PM4, PM2, BP6) | |
| | AGRN | c.749A > C (p.Asn250Thr) | Missense | Comp. Het | Maternal | LB (BP4, BP1, PM2) | 103320 |
| | | c.5075C > T (p.Ser1692Leu) | Missense | | Paternal | VUS (PM2, BP1) | |
| 8 | No Candidate | – | – | – | – | – | – |
| 9 | CPLANE1 | c.2439G > C (p.Gln813His) | Missense | Comp. Het | Paternal | VUS (PM2, BP1) | 614571 |
| | | c.608A > G (p.Tyr203Cys) | Missense | | Maternal | LB (PM2, BP4, BP1) | |
| 11 | COL6A6 | c.1804A > G (p.Ile602Val) | Missense | Comp. Het | Paternal | LB (BP4, BP1, PM2) | 616613 |
| | | c.4504C > T (p.Arg1502Cys) | Missense | | Maternal | VUS (PM2, BP1) | |
| 12 | YY1AP1 | c.1843_1846del (p.Glu616ProfsTer13) | Frameshift | Het | De novo | P (PVS1, PS2, PM2, PP5) | 607860 |
| | UNC80 | c.3916G > A (p.Asp1306Asn) | Missense | Comp. Het | Maternal | VUS (PM2, BP1) | 612636 |
| | | c.9142G > A (p.Val3048Met) | Missense | | Paternal | LB (BP4, BP6, BP1, PM2) | |
| 13 | FREM2* | c.412T > C (p.Tyr138His) | Missense | Het | De novo | VUS (PS2, PM2, PP3, BP1) | 608945 |
| 15 | ITGA8 | c.1793C > T (p.Ser598Phe) | Missense | Comp. Het | Maternal | VUS (PM2, BP1) | 604063 |
| | | c.916A > G (p.Met306Val) | Missense | | Paternal | LB (PM2, BP1, BP4) | |
| 16 | CSMD1* | c.640G > A (p.Gly214Arg) | Missense | Het | De novo | VUS (PM2, PP3, BP1) | NA |
| 18 | AOX1 | c.233T > C (p.Ile78Thr) | Missense | Comp. Het | Maternal | LB (BP4, BP1, PM2) | NA |
| | | c.2264C > T (p.Thr755Ile) | Missense | | Paternal | B (BS1, BS2, BP4, BP1) | NA |
| 20 | SLC27A3 | c.878G > T (p.Gly293Val) | Missense | Comp. Het | Paternal | LB (BP4, PM2) | NA |
| | | c.1841del (p.Pro614HisfsTer49) | Frameshift | | Maternal | B (BS1, BS2) | |
| 21 | RARS1* | c.1775dup (p.Leu592PhefsTer8) | Frameshift | Comp. Het | Maternal | LP (PVS1, PM2) | 107820 |
| | | c.1901G > T (p.Arg634Leu) | Missense | | Paternal | VUS (PP3, PM2, BP1) | |
| 24 | COL5A2 | c.2968G > A (p.Gly990Arg) | Missense | Comp. Het | Paternal | VUS (PP3, PM2, PP5, BP1) | 120190 |
| | | c.1109C > G (p.Pro370Arg) | Missense | | Maternal | VUS (PP3, PM2, BP1) | |
| 26 | CFTR | c.374T > C (p.Ile125Thr) | Missense | Comp. Het | Maternal | VUS (PM1, PM2, BP4) | 602421 |
| | | c.2042A > T (p.Glu681Val) | Missense | | Paternal | LB (PM2, BP4) | |
| 28 | VAMP4 | c.114NA15T > G (Splicing) | Splicing acceptor loss | Het | De novo | VUS (PM2, BP4) | NA |
| | MACF1 | c.4629 + 6998T > C (Ile2588Thr) | Missense | Het | De novo | VUS (PS2, PM2, BP4) | 608271 |
| 30 | NOTCH1* | c.6557G > T (p.Gly2186Val) | Missense | Comp. Het | Paternal | B (BS1, BS2, BP1) | 190198 |
| | | c.2054A > T (p.Asn685Ile) | Missense | | Maternal | VUS (PP3, PM2, BP1) | |
| Patient no. | CADD | SIFT | PolyPhen-2 | Protein Stability*** | VarSome (Meta Scores) | VarSome (Individual Scores) |
|---|---|---|---|---|---|---|
| 4 | Pathogenic strong | NA | NA | NA | 8,0,0 | 12,2,1 |
| 5 | Benign moderate | Uncertain | Benign Supporting | Destabilizing | 2,4,0 | 3,13,10 |
| | Pathogenic strong | NA | NA | NA | 8,0,0 | 8,3,1 |
| | Uncertain | Uncertain | Pathogenic Supporting | Destabilizing | 1,4,1 | 7,13,3 |
| 6 | Benign strong | NA | NA | NA | 0,0,3 | 7,1,6 |
| | Uncertain | Pathogenic Supporting | Pathogenic Supporting | | 0,1,8 | 8,8,7 |
| 7 | Pathogenic supporting | Benign Supporting | Pathogenic Supporting | Destabilizing | 3,2,3 | 6,10,9 |
| | NA | NA | NA | NA | NA | NA |
| | Benign strong | Benign Moderate | Benign | Stabilizing | 0,0,18 | 1,4,26 |
| | Uncertain | Uncertain | Probably Damaging | Stabilizing | 2,4,0 | 5,14,0 |
| 8 | - | - | - | - | - | - |
| 9 | Benign moderate | Uncertain | Probably Damaging | NA | 0,1,9 | 2,9,15 |
| | Uncertain | Uncertain | Probably Damaging | | 0,0,16 | 2,11,11 |
| 11 | Uncertain | Benign Supporting | Uncertain | Destabilizing | 0,3,9 | 3,9,13 |
| | Benign supporting | Benign Moderate | Benign Supporting | Stabilizing | 0,3,4 | 1,7,22 |
| 12 | NA | NA | NA | NA | NA | NA |
| | Pathogenic moderate | Pathogenic Supporting | Uncertain | NA | 0,1,9 | 12,7,8 |
| | Pathogenic supporting | Pathogenic Supporting | Pathogenic Supporting | NA | 0,0,15 | 10,5,11 |
| 13 | Pathogenic moderate | Pathogenic Supporting | Pathogenic Supporting | NA | 4,3,0 | 9,6,3 |
| 15 | Pathogenic moderate | Pathogenic Supporting | Uncertain | Destabilizing | 1,3,2 | 12,10,4 |
| | Benign moderate | Uncertain | Benign Supporting | Destabilizing | 0,0,13 | 0,9,21 |
| 16 | Uncertain | Pathogenic Supporting | Pathogenic Moderate | NA | 3,1,5 | 8,10,10 |
| 18 | Pathogenic supporting | Pathogenic Supporting | 0.969/Uncertain | Destabilizing | 0,0,10 | 5,11,7 |
| | Pathogenic supporting | Pathogenic Supporting | Pathogenic Supporting | Destabilizing | 1,3,5 | 17,7,2 |
| 20 | Pathogenic moderate | Pathogenic Supporting | Probably Damaging | Destabilizing | 2,4,2 | 17,7,3 |
| | NA | NA | NA | NA | NA | NA |
| 21 | NA | NA | NA | NA | NA | NA |
| | Pathogenic moderate | Pathogenic Supporting | Pathogenic Supporting | Destabilizing | 11,0,0 | 17,11,0 |
| 24 | Pathogenic supporting | Pathogenic Supporting | Pathogenic Supporting | Destabilizing | 22,0,0 | 33,5,0 |
| | Pathogenic supporting | Uncertain | Uncertain | Destabilizing | 9,0,0 | 8,14,1 |
| 26 | Uncertain | Uncertain | Benign Supporting | Destabilizing | 1,4,4 | 5,9,12 |
| | Uncertain | Uncertain | Uncertain | Stabilizing | 4,2,4 | 5,17,1 |
| 28 | Benign supporting | NA | NA | NA | NA | 0,1,1 |
| | Benign moderate | Benign Moderate | Benign Moderate | NA | 0,0,13 | 0,4,26 |
| 30 | Pathogenic supporting | Benign Supporting | Pathogenic Supporting | Destabilizing | 0,5,1 | 9,12,5 |
| | Benign moderate | Pathogenic Supporting | Uncertain | Stabilizing | 4,1,3 | 7,8,18 |
* Variants selected as possible causal variant
** Modified ACMG classification
*** NA: No data from PDB/AlphaFold or Frameshift/Stop_gained Variants
In silico prediction of pathogenicity of candidate variants
We predicted the pathogenicity of the 35 candidate genetic variants (Table S4). As expected under ACMG guideline-based variant classification, stop-gain and frameshift variants that induce premature termination may have the strongest effects on CA development; variants of this type were found in RYR3, MED24, YY1AP1, SLC27A3, and RARS1. Additionally, variants in other genes, such as COL18A1, UNC80, FREM2, ITGA8, AOX1, COL5A2, and NOTCH1, exhibited pathogenic scores according to CADD. Among the genes, RARS1 in patient-21 carried the second-highest pathogenicity scores: one variant showed 11 pathogenic points in the ‘Meta-score’ and 17 pathogenic points in the ‘Individual-score’, and the other induces a frameshift with early termination, which is expected to be pathogenic. The third-most pathogenic variant was the DNV in RYR3 found in patient-4, with 8 points in the ‘Meta-score’ and 12 points in the ‘Individual-score’. Additionally, variants in several candidate genes, including MED24, FREM2, CSMD1, and NOTCH1, showed potentially pathogenic properties.
We also assessed the stability and flexibility of proteins altered by missense variants using DynaMut2. Destabilization was indicated in at least one variant of the following genes: NRXN1, MED24, COL18A1, COL6A6, ITGA8, AOX1, SLC27A3, RARS1, COL5A2, CFTR, and NOTCH1 (Table S5). However, due to the unavailability of protein structures in PDB and AlphaFold, these results could not be obtained for genetic variants in the genes RANBP2, CPLANE1, YY1AP1, FREM2, CSMD1, and MACF1.
Further details on pathogenicity prediction reveal that the de novo variant c.298C > G in the NRXN1 gene results in the substitution of proline with alanine at the 100th amino acid residue (Fig. S2A, B and C). This amino acid is situated within the laminin G-like 1 domain and is highly conserved across a variety of species, including humans, rhesus monkeys, mice, chickens, and zebrafish. Structural prediction of NRXN1 indicated that the Pro100Ala variant is expected to destabilize the NRXN1 protein, with a stability change of −0.58 kcal/mol (Table S5). Among the other genetic variants, the arginine at position 634 (Arg634), affected by one of the CH variants in RARS1, is highly conserved across various species, from humans to zebrafish (Fig. S2D, E and F).
Additionally, three-dimensional RARS1 modeling, using DynaMut2, indicated that in the wild-type (WT) form, Arg634, being a positively charged amino acid, forms hydrogen bonds (H-bonds) with five other amino acids (Cys615, Cys617, Asn631, Leu637, and Gln571). However, the change from Arg634 to Leu634 is predicted to result in the loss of H-bonds with Gln571, Cys615, and Asn631, while gaining additional H-bonds with Cys617, Leu637, and Cys638. Additionally, there are dramatic changes in the number of interactions, such as hydrophobic, polar, ionic, and van der Waals interactions, with neighboring amino acids. Therefore, the amino acid change, Arg634Leu, is expected to destabilize the RARS1 protein, with a stability change of −0.29 kcal/mol (Table S5).
Identification of possible causal variants for CA
Following the in-silico prediction of variant pathogenicity, genotype–phenotype relationships, and manual prioritization by MDT, we identified 10 possible causal variants of seven genes in seven of the patients. These variants comprise four DNVs and six CH variants, including one stop gain, one stop loss, one frameshift-induced early termination, and seven missense variants (Table 2). Detailed descriptions regarding variant pathogenicity, genotype–phenotype relationships, and the prioritization of possible causal variants for each patient are provided below.
In the case of patient-4, the variant (p.Glu4099Ter) with the stop-gain effect was predicted as likely pathogenic (LP). The inheritance pattern of RYR3 is listed as autosomal recessive (AR) in OMIM; however, we observed an autosomal dominant (AD) pattern (DNV). Patient-4 did not have another RYR3 variant to confirm the mode of inheritance. In patient-5, we identified one VUS DNV (p.Pro100Ala) in NRXN1. Patient-6 had CH variants in the RANBP2 gene; one was a maternally inherited p.Ter3225Glnext*6 (classified as LP) and the other was a paternally inherited p.Pro2640Thr (classified as VUS). In OMIM, the RANBP2 gene is known for encephalopathy. In patient-13, a de novo missense variant (p.Tyr138His), classified as a VUS, was observed in FREM2, which is related to ‘Cryptophthalmos’ and ‘Fraser syndrome 2’ in an AR inheritance mode. A DNV (p.Gly214Arg) of CSMD1 in patient-16 was predicted as VUS with PM2, PP3 and BP1, but upgraded to LP on applying the PS2 de novo variant criterion. Given that the CSMD1 variants are not documented in OMIM and ClinVar, the variants are considered novel. In patient-21, the RARS1 CH variants caused the ‘Leukodystrophy, hypomyelinating, 9’ phenotype in an AR inheritance mode. The frameshift variant p.Leu592PhefsTer8 was predicted to be pathogenic, while the missense variant p.Arg634Leu was predicted to be a variant of uncertain significance (VUS). Patient-30 had CH variants in NOTCH1, known as the causal gene for ‘Adams-Oliver syndrome 3’ and ‘Aortic valve disease 1’ in OMIM. Both p.Gly2186Val and p.Asn685Ile are missense variants, classified as benign and VUS, respectively.
Functional validation of possible causal variants and identification of potential causal genes of CA patients
We assessed the pathogenicity of the seven possible causal genes using zebrafish knockout (KO) models generated with the CRISPR-Cas9 system and identified six genes as potential causal genes of CAs. As shown in Fig. 3 and Table S1, the KO zebrafish models for these genes displayed phenotypic abnormalities corresponding to patient clinical characteristics.
Fig. 3.
Phenotypic characteristics of zebrafish knockout (KO) models for six genes, including RYR3, NRXN1, FREM2, CSMD1, RARS1, and NOTCH1. A, B Atria and ventricle areas in RYR3 KO zebrafish are larger compared to those in control at 3dpf. The red arrow indicates the area of ventricle and the blue indicates the atria. C, D Compared to that in the control, the body length decreased in the NRXN1 KO model, and some individuals showed pericardial edema/congestion in KO (red arrow). E Compared to that in the control, ‘Total moved distance’ was decreased in the FREM2 KO model (left panel) and ‘the number of stimuli until escape’ increased (right panel). F, G Compared to those in the control, the atria and ventricle sizes in heart were increased (blue and red arrow), the body length was shortened, and ear area was decreased (orange arrow) in the CSMD1 KO model. H, I Body length in the RARS1 KO model is shorter and the swim bladder is not developed compared to that of the control at 5dpf (blue arrow). Some individuals showed mandible protrusion and a short mandible length compared to the control (red arrow). J, K Compared to that in the control, the body length was shortened, the size of head and eye (green circle) decreased in NOTCH1 KO. The KO model showed tail curvature (purple circle) and pericardial edema/congestion (red arrow). The data depict the area of atria and ventricle (A, F), the length of body (B, F, H, J), and the area of ear (F) for each zebrafish individual, with the data plot representing the mean ± SEM (n = 10). Statistical significance was determined using a two-tailed t-test. *p < 0.05; **p < 0.01; ***p < 0.001; dpf, day post-fertilization; SEM, standard error of the mean
The atrial and ventricular areas of the RYR3 KO model were significantly enlarged compared to the control, which aligns with patient phenotypes, including ‘ventricular septal defect’, ‘secundum atrial septal defect’, and ‘tricuspid valve prolapse’ (Fig. 3A and B). The NRXN1 KO model showed a short body length and pericardial edema/congestion, related to patient phenotypes such as pulmonary artery hypoplasia (Fig. 3C and D). The FREM2 KO model showed an increase in the ‘number of stimuli until escape’, related to patient phenotypes including brain disorders such as seizures and global developmental delay (Fig. 3E). The CSMD1 KO model showed a short body length, small ears, and enlarged atrial and ventricular areas, which were related to patient phenotypes such as small for gestational age and secundum atrial septal defect (Fig. 3F and G). The RARS1 KO model exhibited a reduced body length and various deformities, including an uninflated swim bladder and mandible protrusion. These findings align with patient phenotypes such as ‘small for gestational age’, ‘short stature’, ‘high, narrow palate’, ‘failure to thrive’, and ‘oropharyngeal dysphagia’ (Fig. 3H and I). The NOTCH1 KO model showed pericardial edema/congestion, short body length, small head and eye, and a curved tail, which are similar to patient phenotypes such as ventricular septal defect, patent ductus arteriosus, and short stature (Fig. 3J and K).
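The group comparisons summarized in Fig. 3 (mean ± SEM, two-tailed t-test, significance stars) can be illustrated with a minimal sketch. It assumes an equal-variance Student's t-test, which the text does not specify, and all function names are hypothetical. The "ns" label for non-significant results is also an assumption. The p-value is obtained by numerically integrating the t density so that the example needs only the Python standard library.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the t density over [0, |t|].

    The integral uses Simpson's rule, so `steps` must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t_test(control, knockout):
    """Equal-variance two-sample t-test (assumed variant); returns (t, df, p)."""
    n1, n2 = len(control), len(knockout)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * stdev(control) ** 2 + (n2 - 1) * stdev(knockout) ** 2) / df
    t = (mean(control) - mean(knockout)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, two_tailed_p(t, df)


def stars(p):
    """Map a p-value to the star notation used in the figure legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"  # assumed label; the legend defines only the starred levels
```

With n = 10 larvae per group, `student_t_test` would use 18 degrees of freedom. In practice, a library routine such as `scipy.stats.ttest_ind` would normally replace the hand-written integration.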
Discussion
For undiagnosed CA patients in whom the genetic factors causing congenital disorders could not be identified, discovering the genetic variants that cause the anomaly is very important for treatment and prognosis [20]. Approximately 80% of the causes of CA are expected to be genetic abnormalities, but because many of them cannot be interpreted using the genetic variation known to date, more genes causing CAs need to be identified [10]. To date, targeted-gene analysis has been commonly used to discover genetic variants related to CA. However, with recent developments in sequencing technology, WGS data production and analysis are increasingly used to find causal genetic variations for rare diseases, especially CA [18, 19].
To identify CA-associated genetic variants in undiagnosed CA patients, we constructed an in-house pipeline designed to encompass all variant types, including SNV/Indel and CNV/SV, from trio-based WGS data. This in-house pipeline not only identifies DNVs, CH, and homozygous variants accurately and efficiently but also enables rapid quality control and variant calling through the DRAGEN-Hail platform. Despite these strengths, the pipeline has limitations, chiefly the high cost and limited accessibility of the DRAGEN system, which may hinder widespread adoption. Using this pipeline, we discovered, annotated, and interpreted these genetic variants. Furthermore, we validated the role of the candidate genes in zebrafish development using knockout models. On this basis, we suggest several genes as candidate causative genes in the development of CA.
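The three inheritance patterns named above (DNV, homozygous, and CH) can be illustrated with a simplified trio classifier. This is a sketch under stated assumptions and not the DRAGEN-GATK-Hail implementation used in this study: genotypes are encoded as alternate-allele counts (0, 1, or 2), only autosomal sites are considered, no quality or frequency filters are applied, and the class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    gene: str
    variant: str  # any variant identifier (hypothetical format)
    child: int    # alternate-allele count in the proband: 0, 1, or 2
    father: int
    mother: int


def classify_single(call):
    """Label a single site as 'de_novo', 'homozygous', or None."""
    # De novo: the proband carries the allele and neither parent does.
    if call.child >= 1 and call.father == 0 and call.mother == 0:
        return "de_novo"
    # Homozygous (recessive model): one allele inherited from each carrier parent.
    if call.child == 2 and call.father == 1 and call.mother == 1:
        return "homozygous"
    return None


def compound_het_pairs(calls):
    """Return {gene: [(paternal_variant, maternal_variant), ...]}.

    A pair qualifies when the proband is heterozygous at both sites, one
    variant is carried only by the father and the other only by the mother.
    """
    by_gene = defaultdict(lambda: {"paternal": [], "maternal": []})
    for call in calls:
        if call.child != 1:
            continue
        if call.father == 1 and call.mother == 0:
            by_gene[call.gene]["paternal"].append(call.variant)
        elif call.mother == 1 and call.father == 0:
            by_gene[call.gene]["maternal"].append(call.variant)
    pairs = {}
    for gene, origin in by_gene.items():
        combos = [(p, m) for p in origin["paternal"] for m in origin["maternal"]]
        if combos:
            pairs[gene] = combos
    return pairs
```

Requiring that each CH allele comes from a different parent is what makes trio data informative here: with the proband alone, two heterozygous variants in one gene could lie on the same haplotype.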
In general, patients with CA exhibit multiple phenotypes in various organs, including heart, kidney, craniofacial features, and the brain, among others [1, 2]. Similarly, our patients present a wide range of phenotypes, with counts ranging from four to twenty phenotypes distributed across different organs (Table S1 and Fig. S1). The most prevalent phenotype affects the nervous system, followed by phenotypes in other systems. Notably, as patients may exhibit multiple phenotypes within the same affected system, there was no significant difference in the number of phenotypes within these affected systems among patients.
Through our in-house pipeline for trio-based WGS data analysis of patients with CA, we sought to discover various genetic variants, including SNV/Indel, dnCNV, and dnSV. Our investigation led to the identification of multiple candidate variants in 21 novel genes that had not previously been reported as causal for CA phenotypes (Fig. 2 and Table 2). We also retrospectively analyzed the WGS data of all 29 CA patients, including the 18 undiagnosed patients examined in this study, and identified the 11 variants previously diagnosed in a separate study [23], consisting of 9 SNV/Indels and 2 CNVs (Table S3). Hence, it is unlikely that genetic variants in known disease-related genes would be discovered among the 18 undiagnosed CA patients. However, a thorough examination of variants and genotype–phenotype studies, based on the results from our WGS analysis pipeline, enabled us to identify novel potential candidate genetic variants associated with each patient. Among these variants, several exhibited strong pathogenicity according to ACMG rules and in silico predictions (Table 2). After analyzing the phenotypes of zebrafish KO models for the 7 genes, we proposed six of them as candidate causal genes for undiagnosed CA patients.
In the case of patient-4, who had duodenal atresia, cardiovascular abnormalities, and vesicoureteral reflux, we identified a stop-gain variant in RYR3 as a potential causative candidate. The RYR3 gene belongs to the ryanodine receptor (RyR) family, which serves as the calcium release channel controlling calcium entry from intracellular stores into the cytosol, playing a crucial role in muscular contraction [38]. While RYR1 and RYR2 are predominantly expressed in skeletal muscle and the heart, respectively, RYR3 exhibits relatively low levels of expression in a wide range of tissues [39]. Notably, mutations in RYR3 have been associated with congenital myopathy (OMIM: 620310). Furthermore, damaging missense variants in RYR3 are found in patients with congenital heart disease (CHD), presenting RYR3 as a candidate gene alongside RYR2 [40]. Additionally, the zebrafish RYR3 KO model revealed enlarged atrial and ventricular areas, which correlated with the phenotype of patient-4 (Fig. 3A, B and Table S1). Given that the stop-gain variant has not been reported in the general population and is categorized as LP based on ACMG guidelines (Table 2), RYR3 could underlie the congenital abnormality seen in patient-4.
A de novo variant of NRXN1 was identified in patient-5 with esophageal atresia/tracheoesophageal fistula (EA/TEF). While the CADD, SIFT, and PolyPhen-2 scores did not suggest damaging effects for this de novo variant, the MetaRNN score, which combines multiple in-silico predictors, was 0.7844, providing support for its pathogenicity (Table S4). Additionally, the P100A mutation was predicted to lead to destabilizing changes in the protein structure (Table S5). It is worth noting that EA/TEF in patient-5 is not a typical feature associated with NRXN1-related diseases. Normally, autosomal dominant and recessive variants in NRXN1 are related to schizophrenia (OMIM: 614332) and Pitt-Hopkins-like syndrome 2 (OMIM: 614325), respectively. However, a recent study revealed the presence of exonic NRXN1 deletions in individuals with congenital anomalies, including tracheoesophageal fistula [41]. We expanded our investigation using the DECIPHER database to explore the association between NRXN1 and EA/TEF. Among the phenotypes of people with an SNV in the NRXN1 gene, patients with a heterozygous stop-gain variant had tracheoesophageal fistula, which is one of the phenotypes observed in patient-5. Additionally, the zebrafish NRXN1 KO model exhibited a short body length and pericardial edema and congestion (Fig. 3C and D). These findings suggest that the missense variant in NRXN1 found in patient-5 might be related to the phenotypes of the patient.
In the case of FREM2 from patient-13, the discovered DNV was annotated as a missense variant substituting tyrosine with histidine at amino acid position 138. It was classified as a VUS with the PS2, PM2, PP3, and BP1 criteria. In addition, because of its pathogenic properties in CADD, SIFT, and PolyPhen prediction as well as MetaRNN analysis (Tables 2 and S4), it serves as a potential CA causative candidate variant. According to the gnomAD database, the variant is not reported in the general population. The effect of the missense variant on the protein structure of FREM2 could not be predicted because no predicted structure of WT FREM2 is available (Table S5). However, FREM2 is known to be related to ‘Fraser syndrome 2’ and ‘Cryptophthalmos’ with an AR inheritance mode in OMIM, as well as to ‘delayed speech and language’ in PanelApp. FREM2 is also involved in the development of congenital diaphragmatic hernia (CDH) [42]. The zebrafish FREM2 KO model shows several phenotypes related to the patient's disorders, including changes in ‘total moved distance’ and ‘number of stimuli until escape’. However, there is no difference in larval morphology (Fig. 3E). Taken together, the findings suggest that the FREM2 DNV might be involved in several phenotypes of the patient, such as ‘global developmental delay’, ‘encephalomalacia’, and ‘hypotonia’.
In patient-16, a de novo missense variant in CSMD1, predicted as a VUS, shows high pathogenicity scores and conservation across species, suggesting it may be pathogenic (Tables 2 and S4). Although no association with a disorder is listed in the OMIM database, five LP variants at different positions are associated with autism and schizophrenia in ClinVar. CSMD1 is expressed in sensory hair cells of the organ of Corti and may play a role in age-related hearing loss. It is also a newly identified complement-regulatory protein with multiple domains, expressed in the central nervous system and epithelial tissues [43]. Furthermore, CSMD1 is associated with various processes, including development, neurodevelopment, and cancer progression. The identified variant lies in the 2nd CUB domain, which is crucial for development and immunity, highlighting its potential involvement in these pathways [44, 45]. Additionally, the zebrafish CSMD1 KO model showed phenotypes such as enlarged atria and ventricles, reduced body length, and smaller ear area, which resembled the patient’s features (Fig. 3F, G, and Table S1). These findings suggest CSMD1’s involvement in phenotypes such as ‘small for gestational age,’ ‘microtia,’ ‘hearing impairment,’ and ‘atrial septal defect’.
In patient-21, heterozygous RARS1 variants were identified: a likely pathogenic frameshift/early termination variant and a VUS missense variant with high pathogenicity scores (Table 2 and S4). The missense variant, Arg634Leu, is predicted to destabilize the RARS1 protein and alter amino acid interactions. Both variants are conserved across species from zebrafish to humans (Table S5 and Fig. S2D, E, F). RARS1 is a known causal gene for ‘hypomyelinating leukodystrophy 9 (HLD9)’, a disorder with diverse phenotypes due to myelin deficits and potential errors in RNA synthesis enzymes, such as aspartyl-tRNA synthetase [46–48]. HLD9 symptoms caused by RARS1 mutation include microcephaly, eye disorders, hypertonicity, mental retardation, and seizures (https://www.omim.org/entry/616140). As shown in Table S1, patient-21 exhibits similar phenotypes, such as hypomyelination, cerebral calcification, seizures, and global developmental delay. Insufficient amino-acyl tRNA synthetase during embryonic development can lead to varied clinical phenotypes across multiple organs [49]. RARS1 knockout mice show significant behavioral and neurological issues, suggesting RARS1’s critical role in development (IMPC: https://www.mousephenotype.org/data/genes/MGI:1914297). The zebrafish RARS1 KO model also displays phenotypes similar to those of patient-21, including shorter body length and deformities (Fig. 3H and I). These findings support RARS1 as a potential causal gene for the hypomyelination phenotype observed in patient-21.
In patient-30, CH variants were identified in the NOTCH1 gene, which is associated with cardiovascular and developmental disorders, including those affecting the heart, bone, eye, and limb [50, 51]. Although one variant was classified as a VUS (p.N685I) and the other as benign (p.G2186V) based on ACMG guidelines, further analysis indicated that both variants show evidence of pathogenicity. The benign variant shows high pathogenicity in ‘individual pathogenic analysis’, CADD and PolyPhen-2 and is predicted to reduce protein stability. The VUS variant also exhibits high pathogenicity in multiple analyses, including SIFT, MetaLR, and MetaRNN (Tables S4 and S5). Dysregulation of the Notch signaling pathway has been linked to developmental anomalies similar to those seen in patient-30, including VACTERL association [52, 53]. Therefore, these CH variants in NOTCH1 may contribute to the development of multiple anomalies through an autosomal recessive mechanism. Moreover, zebrafish NOTCH1 knockout models display disorders similar to the patient’s phenotypes (Fig. 3J and K). Additionally, abnormal vertebrate morphology in NOTCH1 KO mice, observed in IMPC (https://www.mousephenotype.org/data/genes/MGI:97363), suggests that NOTCH1 is crucial in the development of VACTERL association.
This study has three limitations. First, the inclusion of CA patients from a single local hospital may have introduced a bias toward specific clinical characteristics of CA. However, considering that genetic variants of CA are very rare, and the phenotypes of CA are highly varied, this limitation may not be a significant concern. Nevertheless, analyzing data from a larger cohort of patients could lead to the discovery of more genetic variations and allow for the analysis of specific sub-phenotypes of CA. Second, our focus was on isolating SNVs/Indels, including de novo, CH, and homozygous variants, as well as de novo CNVs/SVs. We did not consider inherited CNVs/SVs from one parent, as neither parent has a CA. However, if we identified an SNV/Indel variant in a gene reported as causative in OMIM, we attempted to detect CNVs/SVs inherited from one parent. Third, we did not identify variants in the complex types of SVs, repetitive regions, and transposable elements. We also filtered out variants in multi-allelic sites and low-complexity regions, as depicted in Fig. 1. However, given that most genetic variants discovered in rare diseases, especially CA, consist of SNVs/Indels and de novo CNVs/SVs, our pipeline proves helpful in uncovering the genetic causes of CA. Additionally, we validated the functional effects of identified genes using a zebrafish knockout model to address the limitations of variant interpretation. Nonetheless, future studies conducted on larger CA cohorts by independent groups of researchers may be required to evaluate the performance of our proposed in-house pipeline and determine whether it can be considered as an established workflow for future studies.
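The exclusion of multi-allelic sites and low-complexity regions mentioned above can be sketched as a simple site-level filter. This is an illustration only, not the filtering code of the pipeline shown in Fig. 1: the record layout, the interval source, and all names are assumptions. Regions are assumed to be non-overlapping, 1-based, inclusive intervals, and only the start position of each variant is checked.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int      # 1-based start position
    ref: str
    alts: tuple   # all alternate alleles observed at the site


def build_index(regions):
    """Turn {chrom: [(start, end), ...]} into sorted start/end lists per chromosome.

    Intervals are assumed to be non-overlapping, 1-based, and inclusive.
    """
    index = {}
    for chrom, intervals in regions.items():
        ordered = sorted(intervals)
        index[chrom] = ([s for s, _ in ordered], [e for _, e in ordered])
    return index


def in_region(index, chrom, pos):
    """True if pos falls inside any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def keep_variant(variant, lcr_index):
    """Keep bi-allelic sites that do not start inside a low-complexity region."""
    is_biallelic = len(variant.alts) == 1
    return is_biallelic and not in_region(lcr_index, variant.chrom, variant.pos)
```

A filter of this kind trades sensitivity for specificity: calls in such regions are more error-prone, but any causal variant located there is missed, which is the limitation acknowledged above.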
Conclusions
In conclusion, we developed an in-house pipeline for trio-based WGS analysis of patients with CA and proposed novel causative genetic variants in six genes (RYR3, NRXN1, FREM2, CSMD1, RARS1, and NOTCH1) through variant calling, interpretation, in-silico prediction, genotype–phenotype analysis, and the use of zebrafish KO models. Additionally, using the established in-house pipeline, we identified eleven genetic variants that were previously reported as diagnostic variants in a targeted-gene study [23]. This demonstrates the effectiveness and applicability of our in-house WGS analysis pipeline for the identification of genetic variants, including SNVs/Indels and CNVs/SVs, related to the development of CA.
Supplementary Information
Acknowledgements
We would like to thank Editage (www.editage.co.kr) for English language editing.
Abbreviations
- CAs
Congenital Anomalies
- WGS
Whole Genome Sequence
- CHD
Congenital Heart Defects
- NGS
Next-Generation Sequencing
- WES
Whole Exome Sequence
- Indels
Insertions and Deletions
- SVs
Structural Variants
- HPO
Human Phenotype Ontology
- VEP
Variant Effect Predictor
- DNV
De Novo Variant
- IGV
Integrative Genomics Viewer
- ACMG
American College of Medical Genetics and Genomics
- WT
Wild-Type
- KO
Knock Out
- sgRNAs
Single-Guide RNAs
- VUS
Variants of Uncertain Significance
Authors’ contributions
JMK, HWC, HYP and MHP designed and organized the study. JMK and HWC performed the formal analysis, interpreted the data and wrote the main manuscript text. HYP and MHP conceived and supervised the study and revised the manuscript. DMS and JK analyzed the WGS data. OHK analyzed the data of zebrafish models. HL, GHL and JYA constructed the analysis pipeline for WGS data. MY, HSJ, JHJ and YSC contributed to the clinical data. All authors reviewed the manuscript.
Funding
Korea National Institute of Health, 2022-NI-060-00 & 2022-NI-060-01
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information files.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jeong-Min Kim and Hye-Won Cho have contributed equally to this work.
Contributor Information
Hyun-Young Park, Email: hypark65@korea.kr.
Mi-Hyun Park, Email: mihyun4868@korea.kr.
References
- 1. WHO. WHO birth defects (FactSheets). 2023.
- 2. NCARDRS. NCARDRS Congenital Anomaly Official Statistics Report, 2020. 2020.
- 3. Lupo PJ, Mitchell LE, Jenkins MM. Genome-wide association studies of structural birth defects: a review and commentary. Birth Defects Res. 2019;111(18):1329–42.
- 4. Lee KS, Choi YJ, Cho J, Lee H, Lee H, Park SJ, et al. Environmental and genetic risk factors of congenital anomalies: an umbrella review of systematic reviews and meta-analyses. J Korean Med Sci. 2021;36(28):e183.
- 5. Allred ET, Perens EA, Coufal NG, Sanford Kobayashi E, Kingsmore SF, Dimmock DP. Genomic sequencing has a high diagnostic yield in children with congenital anomalies of the heart and urinary system. Front Pediatr. 2023;11:1157630.
- 6. Boyle B, Addor MC, Arriola L, Barisic I, Bianchi F, Csaky-Szunyogh M, et al. Estimating global burden of disease due to congenital anomaly: an analysis of European data. Arch Dis Child Fetal Neonatal Ed. 2018;103(1):F22–8.
- 7. Al-Dewik N, Samara M, Younes S, Al-Jurf R, Nasrallah G, Al-Obaidly S, et al. Prevalence, predictors, and outcomes of major congenital anomalies: a population-based register study. Sci Rep. 2023;13(1):2198.
- 8. DeSilva M, Munoz FM, McMillan M, Kawai AT, Marshall H, Macartney KK, et al. Congenital anomalies: case definition and guidelines for data collection, analysis, and presentation of immunization safety data. Vaccine. 2016;34(49):6015–26.
- 9. Verma RP. Evaluation and risk assessment of congenital anomalies in neonates. Children (Basel). 2021;8(10):862.
- 10. Feldkamp ML, Carey JC, Byrne JLB, Krikov S, Botto LD. Etiology and clinical presentation of birth defects: population based study. BMJ. 2017;357:j2249.
- 11. Kang H, Wang L, Li X, Gao C, Xie Y, Hu Y. Application of chromosome microarray analysis and karyotyping in diagnostic assessment of abnormal down syndrome screening results. BMC Pregnancy Childbirth. 2022;22(1):813.
- 12. Mastromoro G, Khaleghi Hashemian N, Guadagnolo D, Giuffrida MG, Torres B, Bernardini L, et al. Chromosomal microarray analysis in fetuses detected with isolated cardiovascular malformation: a multicenter study, systematic review of the literature and meta-analysis. Diagnostics (Basel). 2022;12(6):1328.
- 13. Miller DT, Adam MP, Aradhya S, Biesecker LG, Brothman AR, Carter NP, et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am J Hum Genet. 2010;86(5):749–64.
- 14. Lee H, Deignan JL, Dorrani N, Strom SP, Kantarci S, Quintero-Rivera F, et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA. 2014;312(18):1880–7.
- 15. Ewans LJ, Minoche AE, Schofield D, Shrestha R, Puttick C, Zhu Y, et al. Whole exome and genome sequencing in mendelian disorders: a diagnostic and health economic analysis. Eur J Hum Genet. 2022;30(10):1121–31.
- 16. Bick D, Jones M, Taylor SL, Taft RJ, Belmont J. Case for genome sequencing in infants and children with rare, undiagnosed or genetic diseases. J Med Genet. 2019;56(12):783–91.
- 17. Sweeney NM, Nahas SA, Chowdhury S, Batalov S, Clark M, Caylor S, et al. Rapid whole genome sequencing impacts care and resource utilization in infants with congenital heart disease. NPJ Genom Med. 2021;6(1):29.
- 18. Posey JE. Genome sequencing and implications for rare disorders. Orphanet J Rare Dis. 2019;14(1):153.
- 19. Lionel AC, Costain G, Monfared N, Walker S, Reuter MS, Hosseini SM, et al. Improved diagnostic yield compared with targeted gene sequencing panels suggests a role for whole-genome sequencing as a first-tier genetic test. Genet Med. 2018;20(4):435–43.
- 20. Kingsmore SF, Cole FS. The role of genome sequencing in neonatal intensive care units. Annu Rev Genomics Hum Genet. 2022;23:427–48.
- 21. French CE, Delon I, Dolling H, Sanchis-Juan A, Shamardina O, Megy K, et al. Whole genome sequencing reveals that genetic conditions are frequent in intensively ill children. Intensive Care Med. 2019;45(5):627–36.
- 22. Investigators GPP, Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, et al. 100,000 genomes pilot on rare-disease diagnosis in health care - preliminary report. N Engl J Med. 2021;385(20):1868–80.
- 23. Misun Yang JAK, Jo HS, Park J-H, Ahn SY, Sung SI, Park WS, Cho H-W, Kim J-M, Park H-H, Jang J-H, Chang YS. Diagnostic utility of whole genome sequencing after negative karyotyping/chromosomal microarray in infants born with multiple congenital anomalies. J Korean Med Sci. 2024;39(36):e250.
- 24. Jo HS, Yang M, Ahn SY, Sung SI, Park WS, Jang JH, et al. Optimal protocols and management of clinical and genomic data collection to assist in the early diagnosis and treatment of multiple congenital anomalies. Children (Basel). 2023;10(10):1673.
- 25. Fu JM, Satterstrom FK, Peng M, Brand H, Collins RL, Dong S, et al. Rare coding variation provides insight into the genetic architecture and phenotypic context of autism. Nat Genet. 2022;54(9):1320–31.
- 26. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122.
- 27. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6.
- 28. Belyeu JR, Chowdhury M, Brown J, Pedersen BS, Cormier MJ, Quinlan AR, et al. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol. 2021;22(1):161.
- 29. Kopanos C, Tsiolkas V, Kouris A, Chapple CE, Albarca Aguilera M, Meyer R, et al. VarSome: the human genomic variant search engine. Bioinformatics. 2019;35(11):1978–80.
- 30. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24.
- 31. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535-48.e24.
- 32. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11(5):863–74.
- 33. Rodrigues CHM, Pires DEV, Ascher DB. DynaMut2: assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Sci. 2021;30(1):60–9.
- 34. UniProt C. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–31.
- 35. Uhlen M, Oksvold P, Fagerberg L, Lundberg E, Jonasson K, Forsberg M, et al. Towards a knowledge-based human protein atlas. Nat Biotechnol. 2010;28(12):1248–50.
- 36. Adamson KI, Sheridan E, Grierson AJ. Use of zebrafish models to investigate rare human disease. J Med Genet. 2018;55(10):641–9.
- 37. Kim OH, Cho HJ, Han E, Hong TI, Ariyasiri K, Choi JH, et al. Zebrafish knockout of down syndrome gene, DYRK1A, shows social impairments relevant to autism. Mol Autism. 2017;8:50.
- 38. Kushnir A, Betzenhauser MJ, Marks AR. Ryanodine receptor studies using genetically engineered mice. FEBS Lett. 2010;584(10):1956–65.
- 39. Lanner JT, Georgiou DK, Joshi AD, Hamilton SL. Ryanodine receptors: structure, expression, molecular details, and function in calcium release. Cold Spring Harb Perspect Biol. 2010;2(11):a003996.
- 40. Morton SU, Quiat D, Seidman JG, Seidman CE. Genomic frontiers in congenital heart disease. Nat Rev Cardiol. 2022;19(1):26–42.
- 41. Lowther C, Speevak M, Armour CM, Goh ES, Graham GE, Li C, et al. Molecular characterization of NRXN1 deletions from 19,263 clinical microarray cases identifies exons important for neurodevelopmental disease expression. Genet Med. 2017;19(1):53–61.
- 42. Jordan VK, Beck TF, Hernandez-Garcia A, Kundert PN, Kim BJ, Jhangiani SN, et al. The role of FREM2 and FRAS1 in the development of congenital diaphragmatic hernia. Hum Mol Genet. 2018;27(12):2064–75.
- 43. Vuckovic D, Mezzavilla M, Cocca M, Morgan A, Brumat M, Catamo E, et al. Whole-genome sequencing reveals new insights into age-related hearing loss: cumulative effects, pleiotropy and the role of selection. Eur J Hum Genet. 2018;26(8):1167–79.
- 44. Kraus DM, Elliott GS, Chute H, Horan T, Pfenninger KH, Sanford SD, et al. CSMD1 is a novel multiple domain complement-regulatory protein highly expressed in the central nervous system and epithelial tissues. J Immunol. 2006;176(7):4419–30.
- 45. Ermis Akyuz E, Bell SM. The diverse role of CUB and Sushi multiple domains 1 (CSMD1) in human diseases. Genes (Basel). 2022;13(12):2332.
- 46. Wolf NI, Salomons GS, Rodenburg RJ, Pouwels PJ, Schieving JH, Derks TG, et al. Mutations in RARS cause hypomyelination. Ann Neurol. 2014;76(1):134–9.
- 47. Taft RJ, Vanderver A, Leventer RJ, Damiani SA, Simons C, Grimmond SM, et al. Mutations in DARS cause hypomyelination with brain stem and spinal cord involvement and leg spasticity. Am J Hum Genet. 2013;92(5):774–80.
- 48. Nafisinia M, Sobreira N, Riley L, Gold W, Uhlenberg B, Weiss C, et al. Mutations in RARS cause a hypomyelination disorder akin to Pelizaeus-Merzbacher disease. Eur J Hum Genet. 2017;25(10):1134–41.
- 49. Fuchs SA, Schene IF, Kok G, Jansen JM, Nikkels PGJ, van Gassen KLI, et al. Aminoacyl-tRNA synthetase deficiencies in search of common themes. Genet Med. 2019;21(2):319–30.
- 50. Stittrich AB, Lehman A, Bodian DL, Ashworth J, Zong Z, Li H, et al. Mutations in NOTCH1 cause Adams-Oliver syndrome. Am J Hum Genet. 2014;95(3):275–84.
- 51. Penton AL, Leonard LD, Spinner NB. Notch signaling in human development and disease. Semin Cell Dev Biol. 2012;23(4):450–7.
- 52. Sachan N, Sharma V, Mutsuddi M, Mukherjee A. Notch signalling: multifaceted role in development and disease. FEBS J. 2023;291:3030.
- 53. Stevenson RE, Hunter AG. Considering the embryopathogenesis of VACTERL association. Mol Syndromol. 2013;4(1–2):7–15.


