Abstract
Background:
Tetralogy of Fallot (TOF) is the most common cyanotic heart defect in newborns. Its etiology is complex, and genetic variation is considered one of the main pathogenic factors. Identifying genetic variations associated with TOF has important clinical value for understanding its pathogenesis, patient susceptibility, and prognosis. Therefore, this study aimed to identify potential pathogenic genes of TOF through comprehensive genetic analysis.
Materials and Methods:
In this study, we performed whole exome sequencing (WES) on DNA from 47 Chinese children who received surgical treatment for TOF at the Children’s Hospital of Zhejiang University of Medicine. DNA was extracted and quantified, and WES was carried out on the Illumina NovaSeq platform. The WES data underwent strict quality control and analysis, including alignment, postprocessing, variant calling, annotation, and prioritization. Key tools included GATK’s HaplotypeCaller module for variant calling and ANNOVAR (Annotate Variation) for variant annotation. In addition, by combining bioinformatics tools such as SIFT, PolyPhen2, and ClinPred, we evaluated the potential impact of nonsynonymous mutations on protein function and referred to the relevant literature to support our predictions.
Results:
Data analysis and quality assessment confirmed the quality of the WES dataset generated from the 47 patients with TOF. Interpreting the variants in terms of clinical pathogenicity revealed novel polymorphisms and variants associated with TOF. The results provide evidence for a major contribution of MUTYH, RARB, GFM1, PDZD2, CEP57, DCPS, POMT2, BUB1B, CYP19A1, MAZ, USP10, and TCF3, and offer novel findings on functionally interacting proteins associated with the pathomechanism of TOF. Seven pathogenic variants related to TOF were detected in this cohort, most of which were previously unreported.
Conclusions:
The genetic variations discovered in this study emphasize the importance of genetic factors in the pathogenesis of TOF, revealing its complex molecular pathways and protein–protein interactions. The study of genetic diversity provides a new perspective for understanding the etiology of TOF and promotes an in-depth exploration of its pathological mechanisms. These findings lay the foundation for subsequent clinical research and the development of treatment strategies.
Keywords: congenital heart disease, tetralogy of Fallot, whole exome sequencing, genetic variant
Introduction
Congenital heart disease (CHD) is the predominant birth defect, affecting eight out of every 1000 live births (Ferencz et al., 1985). CHD encompasses a wide range of cardiovascular characteristics, from isolated and specific problems to intricate structural anomalies. Tetralogy of Fallot (TOF) is the most common complex, cyanotic congenital cardiac anomaly, affecting approximately one in every 3000 newborns (Krieger and Valente, 2020). TOF is identified by the presence of a deformity in the cardiac outflow tract. This malformation arises from uneven separation between the truncus arteriosus and bulbus arteriosus during the embryonic stage and is characterized by four specific structural abnormalities in the ventricular outflow anatomy that are present after birth (Shinebourne et al., 2006). Owing to surgical procedures in infancy, 85–90% of children with TOF survive until at least 30 years of age (Apitz et al., 2009). The cause of TOF is uncertain, and no individual gene can be identified as being solely responsible for the characteristics of the disease. Therefore, the genetic status of individuals with syndromic TOF offers crucial insights into the causal genes in certain patients. Approximately 20% of cases are linked to a known condition or chromosomal abnormality. Approximately 15% of patients with TOF are affected by 22q11.2 deletion syndrome, with the main causative gene being TBX1 (Mercer-Rosa et al., 2015). Around 80% of TOF cases are nonsyndromic and typically lack an identifiable cause, mainly because of their non-Mendelian inheritance patterns (Palomino Doza et al., 2013; Althali and Hentges, 2022). A polygenic genetic architecture has been proposed, and genome-wide techniques have been used to understand the complicated genetic changes associated with TOF and other CHDs (Zaidi et al., 2013; Soemedi et al., 2012a).
Heart development is a complex and systematic process that involves the formation of the cardiac tube, looping of the tube, the separation of structures within the heart, and the establishment of vascular connections. The growth of the heart is intricately linked to several genes that are expressed and interact within distinct spatial and temporal contexts to establish a highly precise regulatory mechanism. Any deviation in the expression of these genes is likely to affect the development of the heart, resulting in abnormal heart structures. Nevertheless, the precise mechanism by which these molecules cause disease remains unknown (Soemedi et al., 2012a; Soemedi et al., 2012b). Whole exome sequencing (WES) has been used successfully to identify novel candidate genes for CHD (Ekure et al., 2021; Zaidi et al., 2013; Sifrim et al., 2016). Several studies have indicated that gene variants can produce specific phenotypic effects. For example, deletions in 22q11.2 or mutations in TBX1 are typically associated with defects in the outflow tract and major vessels (Griffin et al., 2010), while Down syndrome or mutations in NKX2-5 typically cause septal defects (Benhaourech et al., 2016). To date, several studies have used WES together with integrated bioinformatics analysis to decode the genetic basis of pediatric TOF. WES selectively targets and sequences the exons, which are the coding regions of genes, and identifies the genetic variations within them (Yoshihara et al., 2016). Thus, in this study, we used WES to analyze the genetic data of 47 patients with TOF, with the aim of identifying potential pathogenic genes associated with TOF.
Through comprehensive bioinformatics analysis, we screened the genetic variations found in these patients to identify possible pathological variants, thereby enhancing our understanding of the genetic underpinnings of TOF and contributing to the identification of novel genetic markers and therapeutic targets for this congenital heart condition.
Materials and Methods
Subjects, sample collection, study design, and participant recruitment
A total of 47 Chinese children who received surgical treatment for TOF at the Children’s Hospital of Zhejiang University of Medicine between October 2018 and July 2020 were enrolled in this study. The children, aged 4 months to 14 years, included 27 males and 20 females. All 47 children were diagnosed with TOF on the basis of echocardiography, clinical symptoms and signs, and intraoperative findings, as described previously. Blood samples were collected by a team of trained medical professionals from the Children’s Hospital of Zhejiang University of Medicine using a standardized venipuncture technique. Approximately 5–10 mL of venous blood was drawn from each participant using sterile venipuncture equipment.
WES
Genomic DNA was extracted from 200 μL of peripheral blood from each of the 47 patients using the Qiagen Genomic DNA Extraction Kit (QIAGEN, USA). DNA concentration and purity were assessed using a Thermo Fisher spectrophotometer (Thermo Fisher, USA). For library preparation, DNA samples were randomly fragmented with a high-performance ultrasonic sample processing system, fragments of approximately 150–250 bp were selected, and end repair was performed. High-quality DNA was then subjected to real-time fluorescence-based amplification detection to quantify the library accurately and to ensure that its effective concentration was above 3 nmol/L. DNA was sequenced on the Illumina NovaSeq platform using paired-end sequencing with a 150 bp read length and a sequencing depth of 200×, in collaboration with Zhejiang Bo Sheng Biotechnology Co., Ltd.
WES data analysis pipeline
The WES data were analyzed with a GATK4-based pipeline, following the overall workflow described by Seaby et al. (2016). Briefly, raw reads were produced with the Illumina base-calling software and quality-checked with FastQC and MultiQC, and the sequenced reads were mapped to the hg38 reference genome using the BWA mapping algorithm (Table 2). Deduplication of the reads in the BAM files was carried out using the Picard module in GATK. Finally, base quality scores were recalibrated with the Base Quality Score Recalibration (BQSR) module in GATK, and the BAM files were updated with the recalibrated base quality scores.
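The preprocessing steps described above (alignment, deduplication, and base quality score recalibration) can be sketched as a list of command lines. This is a minimal illustration only, not the exact commands used in this study: the file names, the known-sites file, the thread count, and the intermediate `samtools sort` step are assumptions introduced for the example, and the helper `build_preprocessing_commands` is hypothetical. The function only builds the commands and does not run them.

```python
from typing import List


def build_preprocessing_commands(
    sample: str,
    fastq_r1: str,
    fastq_r2: str,
    reference: str = "hg38.fa",               # hypothetical reference path
    known_sites: str = "known_sites.vcf.gz",  # hypothetical known-variant file
    threads: int = 8,                         # assumed thread count
) -> List[List[str]]:
    """Build (but do not run) command lines for alignment, deduplication, and BQSR."""
    # Read-group header; bwa itself expands the literal "\t" into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    return [
        # 1. Map paired-end reads to the reference with BWA-MEM.
        ["bwa", "mem", "-t", str(threads), "-R", read_group, "-o", sam,
         reference, fastq_r1, fastq_r2],
        # 2. Coordinate-sort the alignments and write a BAM file (assumed step).
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Mark duplicate reads (Picard MarkDuplicates as bundled with GATK4).
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics],
        # 4. Model systematic base-quality errors against known variant sites.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        # 5. Write a new BAM file with recalibrated base quality scores.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
    ]


if __name__ == "__main__":
    for command in build_preprocessing_commands("P1", "P1_R1.fastq.gz", "P1_R2.fastq.gz"):
        print(" ".join(command))
```

Keeping the commands as data makes it straightforward to log them per sample or to hand them to a workflow manager.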
Table 2.
Alignment Statistics for the 47 TOF Samples Mapped to the hg38 Reference Genome
| Sample name | Error rate | Non-primary alignments (M) | Reads mapped (M) | % Mapped | Total sequences (M) |
|---|---|---|---|---|---|
| P1 | 0.51% | 0.9 | 66.2 | 99.80% | 66.4 |
| P2 | 0.46% | 2.5 | 73.2 | 99.80% | 73.3 |
| P3 | 0.39% | 1.6 | 75.8 | 99.90% | 75.9 |
| P4 | 0.35% | 0.2 | 39.6 | 100.00% | 39.7 |
| P5 | 0.29% | 0.4 | 54.7 | 99.90% | 54.7 |
| P6 | 0.26% | 0.3 | 49.6 | 100.00% | 49.6 |
| P7 | 0.31% | 0.2 | 55.1 | 100.00% | 55.2 |
| P8 | 0.42% | 0.2 | 64.3 | 100.00% | 64.3 |
| P9 | 0.56% | 0.2 | 83.9 | 100.00% | 84 |
| P10 | 0.55% | 0.2 | 98.8 | 100.00% | 98.8 |
| P11 | 0.56% | 0.2 | 100.1 | 100.00% | 100.1 |
| P12 | 0.32% | 0.2 | 48 | 100.00% | 48 |
| P13 | 0.76% | 1.5 | 72.9 | 100.00% | 72.9 |
| P14 | 0.58% | 0.2 | 56.1 | 100.00% | 56.1 |
| P15 | 0.33% | 0.3 | 123.6 | 100.00% | 123.6 |
| P16 | 0.61% | 0.9 | 83.3 | 100.00% | 83.3 |
| P17 | 0.77% | 0.2 | 60.7 | 100.00% | 60.7 |
| P18 | 0.74% | 0.2 | 44.3 | 100.00% | 44.3 |
| P19 | 0.90% | 0.2 | 40.9 | 100.00% | 40.9 |
| P20 | 0.90% | 3.9 | 111.7 | 100.00% | 111.7 |
| P21 | 0.82% | 1.5 | 88.8 | 100.00% | 88.8 |
| P22 | 0.82% | 0.4 | 88.9 | 100.00% | 88.9 |
| P23 | 0.74% | 1.7 | 74.7 | 100.00% | 74.7 |
| P24 | 0.87% | 2.6 | 69.1 | 100.00% | 69.1 |
| P25 | 0.35% | 0.5 | 120.1 | 100.00% | 120.1 |
| P26 | 0.46% | 0.4 | 113.2 | 100.00% | 113.3 |
| P27 | 0.44% | 0.7 | 106.3 | 100.00% | 106.3 |
| P28 | 0.27% | 1.5 | 62.4 | 100.00% | 62.4 |
| P29 | 0.54% | 4.3 | 108.1 | 100.00% | 108.1 |
| P30 | 0.36% | 2.2 | 115.2 | 100.00% | 115.2 |
| P31 | 0.33% | 0.8 | 121.6 | 100.00% | 121.6 |
| P32 | 0.41% | 1.3 | 78.2 | 100.00% | 78.2 |
| P33 | 0.54% | 1.2 | 108 | 100.00% | 108 |
| P34 | 0.55% | 2.3 | 114.2 | 100.00% | 114.3 |
| P35 | 0.36% | 3.8 | 151.4 | 100.00% | 151.4 |
| P36 | 0.86% | 0.8 | 61 | 100.00% | 61.1 |
| P37 | 0.61% | 0.8 | 82.3 | 100.00% | 82.3 |
| P38 | 0.47% | 0.6 | 103.5 | 100.00% | 103.5 |
| P39 | 0.72% | 0.4 | 61.2 | 100.00% | 61.2 |
| P40 | 0.54% | 0.3 | 96.3 | 100.00% | 96.3 |
| P41 | 0.45% | 0.3 | 41.2 | 100.00% | 41.2 |
| P42 | 0.23% | 0.4 | 47.6 | 100.00% | 47.6 |
| P43 | 0.33% | 0.3 | 105.4 | 100.00% | 105.4 |
| P44 | 0.73% | 0.4 | 73.5 | 100.00% | 73.5 |
| P45 | 0.87% | 2.3 | 79.1 | 100.00% | 79.1 |
| P46 | 0.75% | 1.8 | 68.3 | 100.00% | 68.3 |
| P47 | 0.54% | 0.6 | 96.3 | 100.00% | 96.3 |
TOF, tetralogy of Fallot.
Identification of germline mutations and variants annotation
The HaplotypeCaller module in GATK was used to identify germline mutations, producing germline VCF files that list the identified variants. ANNOVAR (Annotate Variation), a tool widely used in genomic research to assess the functional implications of genetic variation, was then used to annotate the detected variants. Gene-based annotation described the relationship between the variant sites and known genes, as well as their functional impact. Region-based annotation identified the relationship between variant sites and specific genomic regions. Filter-based annotation determined whether the variant sites were recorded in databases such as dbSNP, 1000G, ESP6500, ClinVar, and Online Mendelian Inheritance in Man (OMIM), and retrieved scores from tools such as SIFT, PolyPhen2, and MutationTaster.
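As a rough sketch of how germline calling and the three ANNOVAR annotation modes fit together, the commands can be assembled as follows. The database names (`refGene`, `cytoBand`, `avsnp147`) are stand-ins chosen so that there is one gene-based (`g`), one region-based (`r`), and one filter-based (`f`) protocol; the databases actually used in this study are not specified at this level of detail, and all file paths and the helper name are hypothetical.

```python
from typing import List, Tuple

# One example protocol per ANNOVAR operation type (assumed, for illustration).
PROTOCOLS: List[Tuple[str, str]] = [
    ("refGene", "g"),   # gene-based: relationship to known genes
    ("cytoBand", "r"),  # region-based: relationship to genomic regions
    ("avsnp147", "f"),  # filter-based: presence in a variant database
]


def build_calling_and_annotation_commands(
    sample: str,
    reference: str = "hg38.fa",    # hypothetical reference path
    annovar_db: str = "humandb/",  # hypothetical ANNOVAR database directory
) -> List[List[str]]:
    """Build (but do not run) commands for germline calling and annotation."""
    recal_bam = f"{sample}.recal.bam"
    vcf = f"{sample}.germline.vcf.gz"
    return [
        # Call germline SNPs and indels from the recalibrated BAM file.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", recal_bam, "-O", vcf],
        # Annotate the calls with gene-, region-, and filter-based protocols.
        ["table_annovar.pl", vcf, annovar_db,
         "-buildver", "hg38",
         "-out", f"{sample}.annot",
         "-remove",
         "-protocol", ",".join(name for name, _ in PROTOCOLS),
         "-operation", ",".join(op for _, op in PROTOCOLS),
         "-nastring", ".",
         "-vcfinput"],
    ]


if __name__ == "__main__":
    for command in build_calling_and_annotation_commands("P1"):
        print(" ".join(command))
```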
Functional analysis of single nucleotide polymorphism and INDELs
SIFT results are denoted as D (deleterious) or T (tolerated), and PolyPhen2 uses Bayesian machine learning to assess amino acid changes. The likelihood ratio test (LRT) provides a score between 0 and 1 to evaluate the significance of a variant site, with higher scores indicating greater significance. The LRT results are categorized as D (deleterious), N (neutral), or U (unknown). MutationTaster assesses the importance of variant sites with a score ranging from 0 to 1, where a higher score signifies greater importance; its categories are A (disease-causing automatic), D (disease-causing), N (polymorphism), and P (polymorphism automatic).
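The single-letter codes above can be decoded and tallied with a small lookup table. The sketch below is illustrative: treating MutationTaster A and D as damaging and counting a "." as "no prediction" are assumptions made for the example, and PolyPhen2 is omitted because its letter codes are not defined in the text. The helper names are hypothetical.

```python
from typing import Dict, Tuple

# Code-to-label tables taken from the category definitions in the text.
LABELS: Dict[str, Dict[str, str]] = {
    "SIFT": {"D": "deleterious", "T": "tolerated"},
    "LRT": {"D": "deleterious", "N": "neutral", "U": "unknown"},
    "MutationTaster": {
        "A": "disease-causing automatic",
        "D": "disease-causing",
        "N": "polymorphism",
        "P": "polymorphism automatic",
    },
}

# Codes counted as damaging in this sketch (an assumption, not a study rule).
DAMAGING_CODES = {"SIFT": {"D"}, "LRT": {"D"}, "MutationTaster": {"A", "D"}}


def describe(tool: str, code: str) -> str:
    """Translate a single-letter code into its label; '.' means no prediction."""
    return LABELS.get(tool, {}).get(code, "no prediction")


def count_damaging_calls(calls: Dict[str, str]) -> Tuple[int, int]:
    """Return (number of damaging calls, number of tools that gave a call)."""
    damaging = 0
    scored = 0
    for tool, code in calls.items():
        if tool not in DAMAGING_CODES or code == ".":
            continue  # unknown tool or missing prediction
        scored += 1
        if code in DAMAGING_CODES[tool]:
            damaging += 1
    return damaging, scored


if __name__ == "__main__":
    example = {"SIFT": "T", "LRT": "D", "MutationTaster": "A"}
    for tool, code in example.items():
        print(tool, code, describe(tool, code))
    print(count_damaging_calls(example))  # (2, 3)
```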
Pathogenicity prediction of single nucleotide polymorphisms and INDELs
The variants were classified as follows: (a) Pathogenic: variants reported to cause disease, (b) Likely Pathogenic: variants with the potential to cause disease, (c) Likely Benign: variants that are likely harmless, (d) Risk Factors: variants that may increase disease risk, (e) Drug Response: variants potentially affecting drug efficacy, (f) Uncertain Significance: variants with unclear implications, and (g) Benign: harmless variants. The variant filtering rules were as follows: (a) variants located in exonic, splicing, and untranslated regions (UTRs) of genes were selected, and (b) variants with a frequency above 0.01 in databases such as 1000G, GnomAD, ExAC, and ESP6500 were excluded.
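The two filtering rules can be expressed as a single predicate. This is a sketch under stated assumptions: the region labels (`exonic`, `splicing`, `UTR3`, `UTR5`) follow ANNOVAR-style naming, the database keys are hypothetical column names, and a variant absent from a frequency database is treated as rare. None of these details are specified in the text.

```python
from typing import Dict, Optional

KEPT_REGIONS = {"exonic", "splicing", "UTR3", "UTR5"}        # assumed labels
FREQUENCY_DATABASES = ("1000G", "GnomAD", "ExAC", "ESP6500")  # assumed keys
MAX_POPULATION_FREQUENCY = 0.01


def passes_filters(region: str, frequencies: Dict[str, Optional[float]]) -> bool:
    """Apply rule (a), region selection, and rule (b), frequency exclusion."""
    # A variant may carry several labels, e.g. "exonic;splicing".
    regions = set(region.split(";"))
    if not regions & KEPT_REGIONS:
        return False
    for database in FREQUENCY_DATABASES:
        frequency = frequencies.get(database)
        # A missing frequency is treated as "not observed" (assumption).
        if frequency is not None and frequency > MAX_POPULATION_FREQUENCY:
            return False
    return True


if __name__ == "__main__":
    print(passes_filters("exonic", {"1000G": 0.002, "GnomAD": None}))  # kept
    print(passes_filters("exonic", {"ExAC": 0.05}))                    # too common
    print(passes_filters("intronic", {}))                              # wrong region
```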
Results
All 47 children with TOF presented with cyanosis and grunting systolic ejection murmurs along the left sternal border, between the second and fourth ribs. Echocardiography revealed enlargement of the right ventricle, thickening of the anterior wall of the right ventricle, stenosis in the right ventricular outflow tract and pulmonary artery, a ventricular septal defect, and aortic overriding. The echocardiographic findings of the 47 children were examined across various heart regions, including the right ventricle, right ventricular anterior wall, right ventricular outflow tract, main pulmonary artery, left pulmonary artery, right pulmonary artery, ventricular septal defect, and left ventricular ejection fraction. The 47 children with TOF showed no further abnormalities (Table 1).
Table 1.
The Clinical Information in the 47 Children with Tetralogy of Fallot
| ID | Sex | Age (month/years) | TOF | Ventricular septal defect (VSD) | Patent foramen ovale (PFO) | Right ventricular outflow tract stenosis | Right ventricular wall and ventricular septal hypertrophy | Atrial septal defect (ASD) | Patent ductus arteriosus (PDA) |
|---|---|---|---|---|---|---|---|---|---|
| P1 | Male | 1 year | Yes | No | Yes | No | No | No | No |
| P2 | Male | 6 months | Yes | No | Yes | No | No | No | Yes |
| P3 | Male | 8 months | Yes | No | No | No | No | No | No |
| P4 | Male | 1 year | Yes | No | Yes | No | No | No | No |
| P5 | Male | 1 month | Yes | No | No | No | No | No | No |
| P6 | Female | 1 year | Yes | No | No | No | No | No | No |
| P7 | Female | 9 months | Yes | No | No | No | No | Yes | Yes |
| P8 | Male | 7 months | Yes | No | No | No | No | Yes | No |
| P9 | Female | 10 months | Yes | No | No | No | No | Yes | Yes |
| P10 | Female | — | Yes | No | No | No | No | Yes | No |
| P11 | Female | — | Yes | No | No | No | No | Yes | Yes |
| P12 | Male | 4 years | Yes | No | Yes | No | No | Yes | No |
| P13 | Male | — | Yes | No | Yes | No | No | No | No |
| P14 | Male | 5 years | Yes | No | Yes | No | No | No | No |
| P15 | Female | — | Yes | No | No | No | No | No | No |
| P16 | Male | 4 years | Yes | No | Yes | No | No | No | No |
| P17 | Male | 1 year | Yes | No | No | No | No | Yes | No |
| P18 | Female | 9 months | Yes | No | No | No | No | Yes | No |
| P19 | Female | 6 months | Yes | No | Yes | No | No | Yes | Yes |
| P20 | Female | 10 months | Yes | No | Yes | No | No | No | Yes |
| P21 | Female | — | Yes | No | No | No | No | No | No |
| P22 | Female | — | Yes | Yes | Yes | No | No | No | No |
| P23 | Male | — | Yes | No | No | No | No | No | No |
| P24 | Female | — | Yes | No | Yes | No | No | Yes | No |
| P25 | Female | — | Yes | No | No | No | No | No | No |
| P26 | Female | — | Yes | No | Yes | No | No | Yes | No |
| P27 | Male | — | Yes | No | No | No | No | Yes | No |
| P28 | Male | — | Yes | No | No | No | No | No | No |
| P29 | Male | 6 years | Yes | No | No | No | No | No | No |
| P30 | Female | 10 months | Yes | No | No | No | No | No | No |
| P31 | Male | 8 months | Yes | No | No | No | No | Yes | No |
| P32 | Male | 1 year | Yes | No | No | No | No | No | No |
| P33 | Female | 10 months | Yes | No | Yes | No | No | No | No |
| P34 | Male | 6 months | Yes | No | Yes | No | No | No | Yes |
| P35 | Male | 14 years | Yes | No | No | No | No | No | Yes |
| P36 | Male | 10 months | Yes | No | Yes | No | No | No | No |
| P37 | Female | — | Yes | No | No | No | No | No | No |
| P38 | Male | — | Yes | No | Yes | No | No | No | Yes |
| P39 | Female | 8 months | Yes | No | No | No | No | Yes | No |
| P40 | Female | 8 months | Yes | No | No | No | No | No | Yes |
| P41 | Male | 9 months | Yes | No | Yes | No | No | Yes | No |
| P42 | Male | — | Yes | No | No | No | No | No | No |
| P43 | Female | 7 months | Yes | No | Yes | No | No | Yes | Yes |
| P44 | Male | 4 months | Yes | No | No | No | No | No | No |
| P45 | Male | 9 months | Yes | No | Yes | No | No | Yes | No |
| P46 | Male | 8 months | Yes | No | No | No | No | Yes | No |
| P47 | Male | 1 year | Yes | No | No | No | No | No | Yes |
POMT, protein O-mannosyl transferase; TOF, tetralogy of Fallot.
TOF germline variant calling and statistics
A total of 115,211 variants from the WES data of the 47 TOF samples passed quality control, comprising 107,351 single nucleotide polymorphisms (SNPs) and 7860 indels. Both raw and postfiltering variant calls were assessed, and all 47 samples were retained throughout the filtering process. The initial raw count of 138,455 variants was reduced to 115,211 high-confidence variants after stringent filtering criteria were applied. SNPs decreased from 129,154 to 107,351 after filtering, and indels decreased from 10,529 to 7860. Variants classified as “others” fell sharply from 1422 to 520, further reflecting the stringency of the filter. Multiallelic sites were narrowed from a raw count of 10,458 to 6575, and multiallelic SNP sites decreased from 4924 to 3267. Retaining only the most reliable variant calls in this way supports data quality and reduces the likelihood of false-positive calls that could impede downstream genetic analysis (Table 3) (Fig. 1).
Table 3.
Variant Call Statistics for the 47 TOF Samples
| Keys | Raw variants called | Pass filter |
|---|---|---|
| Number of samples: | 47 | 47 |
| Number of records: | 138455 | 115211 |
| Number of SNPs: | 129154 | 107351 |
| Number of indels: | 10529 | 7860 |
| Number of others: | 1422 | 520 |
| Number of multiallelic sites: | 10458 | 6575 |
| Number of multiallelic SNP sites: | 4924 | 3267 |
SNP, single nucleotide polymorphism; TOF, tetralogy of Fallot.
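The effect of filtering can be summarized as the fraction of calls retained in each category of Table 3. The short calculation below uses only the counts reported in the table; the dictionary keys and the helper name are chosen for the example.

```python
from typing import Dict

# Counts copied from Table 3 (raw variants called vs. variants passing filters).
RAW: Dict[str, int] = {
    "records": 138455,
    "snps": 129154,
    "indels": 10529,
    "others": 1422,
    "multiallelic_sites": 10458,
    "multiallelic_snp_sites": 4924,
}
PASSED: Dict[str, int] = {
    "records": 115211,
    "snps": 107351,
    "indels": 7860,
    "others": 520,
    "multiallelic_sites": 6575,
    "multiallelic_snp_sites": 3267,
}


def retention(raw: Dict[str, int], passed: Dict[str, int]) -> Dict[str, float]:
    """Fraction of raw calls that survive filtering, per category."""
    return {key: passed[key] / raw[key] for key in raw}


if __name__ == "__main__":
    for category, fraction in retention(RAW, PASSED).items():
        print(f"{category}: {fraction:.1%} retained")
```

Roughly 83% of all records are retained, whereas only about 37% of the "others" category survives, which is consistent with the stringency described in the text.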
FIG. 1.
Variant Statistics for TOF Substitutions and Transitions/Transversions (Ts/Tv) ratio. (A) Ts/Tv ratio versus variant quality, with sites sorted by descending quality score. For human exomes, the Ts/Tv ratio is typically 2.0–2.1, and deviations from this range at lower quality scores may indicate reduced confidence in variant calls. (B) Variant allele frequency distribution for SNPs. Bars represent the frequency of nucleotide mutations (e.g., A→C, A→G), though the mutation categories are unclear without a legend. (C) Insertion and deletion (InDel) length distribution. A strong peak at 0 indicates most InDels are small, with frequency decreasing as length increases. SNP, single nucleotide polymorphism; TOF, tetralogy of Fallot.
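For reference, the Ts/Tv ratio in panel A is the number of transitions (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) divided by the number of transversions (all other substitutions). A minimal sketch of the calculation is given below; the input format, a list of (reference, alternate) allele pairs, and the function names are assumptions for the example.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return pair == PURINES or pair == PYRIMIDINES


def ts_tv_ratio(variants: Iterable[Tuple[str, str]]) -> float:
    """Transitions divided by transversions over single-nucleotide variants."""
    transitions = 0
    transversions = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, non-ACGT alleles, and records where nothing changes.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions: Ts/Tv ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    toy_calls = [("G", "A"), ("C", "T"), ("T", "C"), ("A", "G"), ("C", "G"), ("A", "T")]
    print(ts_tv_ratio(toy_calls))  # 4 transitions / 2 transversions = 2.0
```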
Variants annotated to the genome
The annotation of variants encompasses a comprehensive range of information, including minor allele frequencies, predictions of protein functional impact, nucleotide conservation scores, clinical significance from ClinVar, associated diseases or traits from the Human Gene Mutation Database, genetic disorder correlations from OMIM, and overall pathogenicity assessments.
Table 4 summarizes the variant annotation results in five categories: variants processed, novel/existing variants, overlapped genes, overlapped transcripts, and overlapped regulatory features. The left column lists the categories, and the right column gives the corresponding counts. All 115,211 variants were processed, of which 102,584 (89.0%) were existing variants and 12,627 (11.0%) were novel. The variants overlapped 49,971 genes, 304,793 transcripts, and 19,615 regulatory features (Table 5, Fig. 2). The overlap with 19,615 regulatory features suggests that a substantial portion of the variants may have regulatory implications for gene expression and may be important for understanding the overall genomic architecture.
Table 4.
Summary Statistics of Annotated Variants
| Category | Count |
|---|---|
| Variants processed | 115211 |
| Novel/existing variants | 12627 (11.0%) / 102584 (89.0%) |
| Overlapped genes | 49971 |
| Overlapped transcripts | 304793 |
| Overlapped regulatory features | 19615 |
Table 5.
Proportions of Annotated Variants for All Consequences and Coding Consequences
| Consequence type | Coding consequences (A) | All consequences (B) |
|---|---|---|
| Missense variant | 47% | 14% |
| Synonymous variant | 45% | 13% |
| Coding sequence variant | 5% | — |
| Inframe deletion | 1% | — |
| Frameshift variant | 1% | — |
| Stop gained | 1% | — |
| Inframe insertion | 0% | — |
| Start lost | 0% | — |
| Stop retained variant | 0% | — |
| Intron variant | — | 22% |
| Downstream gene variant | — | 11% |
| Upstream gene variant | — | 9% |
| Noncoding transcript exon | — | 4% |
| Noncoding transcript variant | — | 4% |
| Splice polypyrimidine tract | — | 4% |
| 3′ UTR variant | — | 4% |
| Others | 1% | 4% |
FIG. 2.
Circos Plot of Genetic Variants Across Human Chromosomes. This Circos plot illustrates the distribution of genetic variants across chromosomes 1–22, X, and Y. The outer ring displays chromosome numbers, while the inner rings represent variant density and distribution. Colors denote the functional impact of variants, with high-impact variants (e.g., nonsynonymous mutations) in red and low-impact variants in blue. The proximity of points to the center reflects variant frequency, with more frequent variants located closer to the edge. Connecting lines indicate structural variations, such as gene fusions or translocations.
In this study, we identified several mutations in keratin-associated proteins (KAPs), Kelch-like proteins, SERPINA9, and ABCG1, which may have clinical significance. The KRTAP10-1 gene was found to have a point mutation P39R at the exon 1 active site (C116G) on chromosome 21. While this mutation occurs in a critical region, further experimental evidence is required to confirm its role as a disease-causing variant. Similarly, the KLHL33 gene exhibited a mutation L65V at the exon 1 active site (C139G) also located on chromosome 21, but additional studies are necessary to establish its pathogenicity. The SERPINA9 gene, which encodes a member of the serpin family, was found to have 13 distinct mutations, with notable ones including C372A, C612A, and C312A in exon 2, and C312A and C219A in exon 3. The ABCG1 gene, part of the ATP-binding cassette (ABC) transporter family, was identified with eight exonic mutations spread across exons 3, 4, and 5. These mutations suggest that ABCG1 could be a potential candidate gene for the disease, though further functional studies are needed to confirm their pathogenicity. We are pursuing additional research to validate their biological impact.
Several genes carried mutations at various positions. Dynein axonemal heavy chain 1, uncharacterized (HGC6.3), Kelch domain containing 7B (KLHDC7B), endosomal transmembrane epsin interactor 1 (FAM189A2), YLP motif containing 1 (YLPM1), CASK interacting protein 1 (CASKIN1), TMEM114, ZDHHC7, RNF227, SPC24, and SYMPK each carried only two exonic nonsense mutations at various positions (Supplementary Table S2). Although the number of mutations identified in these genes was much lower than in the other reported genes, they may still have a role in TOF. Moreover, in the WES data associated with TOF, exoribonuclease 1 (ERI1), KLHDC7B, TIR domain containing adaptor protein, SPNS lysolipid transporter 2 (SPNS2), and Mex-3 RNA binding family member D (MEX3D) carried stop-gain exonic mutations at various positions (Supplementary Table S1).
Analysis of pathogenic variants in coding genes
Pathogenic variants can occur in various types of genes, including noncoding RNA (ncRNA) genes; ncRNAs are RNA molecules that do not encode proteins but play crucial roles in regulating gene expression and other cellular processes. Here, we categorized the variants in the exome sequence data as pathogenic, likely pathogenic, or likely benign. In line with the recommendations for Mendelian disorder classification, we focused only on the pathogenic variants. The following genes carried exonic mutations at various positions. MUTYH encodes a pivotal enzyme in the base excision repair pathway, which is crucial for rectifying DNA errors arising from guanine oxidation. RARB encodes a retinoic acid receptor; retinoic acid signaling plays a key role in the development and function of several mammalian systems. GFM1 provides instructions for producing an enzyme called mitochondrial translation elongation factor G1. Centrosomal protein 57 (CEP57) encodes a cytoplasmic protein known as translokin, which is involved in intracellular transport. Vacuolar protein sorting-associated protein 11 participates in vesicle-mediated protein trafficking to lysosomal compartments. The protein O-mannosyl transferase 2 (POMT2) gene provides instructions for making one piece of the POMT enzyme complex. CYP19A1 provides instructions for making an enzyme called aromatase, and PNPLA6 provides instructions for making a protein called neuropathy target esterase. BUB1 mitotic checkpoint serine/threonine kinase B (BUB1B), Myc-associated zinc finger protein (MAZ), ubiquitin-specific peptidase 10 (USP10), and bromodomain containing 1 were also affected. Among these genes, GFM1, BUB1B, MUTYH, RARB, CEP57, POMT2, and MAZ were categorized as deleterious based on LRT, whereas PDZD2, DCPS, USP10, and PNPLA6 were categorized as neutral (Table 6 and Supplementary Table S2).
Table 6.
Pathogenic and Likely Pathogenic Variants, Their Predicted Effects, and the Patients Carrying Them
| Gene name | Position | Ref/Alt | AAChange | SIFT | PolyPhen2 | LRT | MutationTaster | InterVar | dbSNP147 | Mutated patients |
|---|---|---|---|---|---|---|---|---|---|---|
| MUTYH | chr1:45332300 | G/A | c.C715T:p.Q239X | T | . | D | A | Pathogenic | rs786203115 | P48 het |
| NPAS2 | chr2:100925235 | G/A | c.G122A:p.R41Q | D | D | D | D | Likely pathogenic | . | P48 het |
| ERBB4 | chr2:211428451 | T/C | c.A2676G:p.I892M | D | D | D | D | Likely pathogenic | . | P14 het |
| RARB | chr3:25461308 | C/G | c.C294G:p.Y98X | T | . | D | A | Pathogenic | . | P45 het |
| GFM1 | chr3:158666361 | C/T | c.C1633T:p.R545X | T | . | D | D | Pathogenic | . | P37 het |
| GPM6A | chr4:176002453 | G/T | c.C138A:p.C46X | . | . | . | . | Likely pathogenic | . | P17 het, P22 het |
| PDZD2 | chr5:32073837 | A/T | c.A2731T:p.K911X | T | . | N | A | Pathogenic | . | P17 het |
| PLXNA4 | chr7:132484863 | G/A | p.Q508X | D | . | . | N | Likely pathogenic | rs867987171 | P3 het |
| LPL | chr8:19955991 | G/A | c.G926A:p.R309H | . | D | U | D | Likely pathogenic | . | P12 het |
| CA1 | chr8:85338347 | G/T | c.C140A:p.P47H | D | D | D | D | Likely pathogenic | . | P16 het |
| SF1 | chr11:64778068 | G/A | c.C325T:p.Q109X | T | . | . | D | Likely pathogenic | . | P10 het |
| CEP57 | chr11:95822526 | C/T | c.C754T:p.Q252X | D | . | D | A | Pathogenic | . | P47 het |
| VPS11 | chr11:119077945 | G/A | c.G1640A:p.R547H | . | . | . | . | Likely pathogenic | rs200774813 | P9 het |
| DCPS | chr11:126331482 | C/T | c.C454T:p.R152X | T | . | N | A | Pathogenic | . | P3 het |
| ATP7B | chr13:51937570 | T/C | c.A3476G:p.N1159S | D | D | D | A | Likely pathogenic | rs121907990 | P41 het |
| POMT2 | chr14:77285035 | C/T | c.G1491A:p.W497X | T | . | D | A | Pathogenic | . | P41 het |
| BUB1B | chr15:40218549 | C/T | c.C2944T:p.Q982X | T | . | D | D | Pathogenic | . | P12 het |
| CYP19A1 | chr15:51218611 | G/A | c.C673T:p.Q225X | D | . | D | A | Pathogenic | . | P36 het |
| MAZ | chr16:29807062 | C/T | c.C208T:p.Q70X | T | . | . | D | Pathogenic | . | P37 het |
| USP10 | chr16:84758745 | A/T | c.A1234T:p.K412X | T | . | N | A | Pathogenic | . | P22 het |
| TCF3 | chr19:1612298 | C/G | c.G1722C:p.Q574H | D | D | . | D | Likely pathogenic | . | P12 het |
| PNPLA6 | chr19:7542814 | C/A | c.C1299A:p.Y433X | T | . | N | A | Pathogenic | . | P45 het |
| PIK3R2 | chr19:18169251 | C/A | c.C2144A:p.P715Q | D | D | D | D | Likely pathogenic | . | P36 het |
| MMP9 | chr20:46010964 | T/C | c.T563C:p.L188P | D | D | D | D | Likely pathogenic | . | P19 het |
| RPGR | chrX:38286703 | C/A | c.G2296T:p.G766X | T | . | U | N | Likely pathogenic | . | P15 het |
LRT, likelihood ratio test; POMT, protein O-mannosyl transferase.
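The AAChange column of Table 6 encodes the protein-level consequence in a compact form (for example, `c.C715T:p.Q239X`, where X denotes a stop codon). As an illustration of how such strings can be sorted into stop-gain and missense changes, the following sketch parses the protein part with a regular expression. The function name and the category labels are chosen for the example and are not part of the annotation output.

```python
import re

# Matches the trailing protein change, e.g. "p.Q239X" -> ("Q", "239", "X").
PROTEIN_CHANGE = re.compile(r"p\.([A-Z])(\d+)([A-Z])$")


def classify_protein_change(aa_change: str) -> str:
    """Classify an AAChange string by its protein-level effect."""
    match = PROTEIN_CHANGE.search(aa_change)
    if match is None:
        return "unparsed"
    ref_aa, _position, alt_aa = match.groups()
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "X":
        return "stop-gain"   # an amino acid codon became a stop codon
    if ref_aa == "X":
        return "stop-loss"   # a stop codon became an amino acid codon
    return "missense"


if __name__ == "__main__":
    # A few entries copied from Table 6.
    for gene, change in [
        ("MUTYH", "c.C715T:p.Q239X"),
        ("NPAS2", "c.G122A:p.R41Q"),
        ("PLXNA4", "p.Q508X"),
        ("TCF3", "c.G1722C:p.Q574H"),
    ]:
        print(gene, change, classify_protein_change(change))
```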
Protein–protein interaction between deleterious variants
Protein–protein interactions (PPIs) involving deleterious variants can have important implications for understanding disease mechanisms and developing targeted therapies (Chelu et al., 2022). To explore the impact of deleterious mutations on protein interactions, we analyzed the PPIs among the genes carrying these mutations. Our analysis revealed a clear interaction between MUTYH and CDC20, indicating a potential link between these proteins. POMT2 and CYP19A1 also interacted, but only weakly, and this interaction requires further investigation. Overall, apart from the interaction between MUTYH and CDC20, no strong or significant PPIs were detected among the proteins with deleterious variants. These findings indicate that most of the analyzed proteins appear to function independently in the context of TOF development (Fig. 3).
FIG. 3.
A protein–protein interaction network. Each node represents a protein, and edges indicate functional or physical interactions. Proteins such as SRD5A3, CYP19A1, POMT2, and others are involved. Line colors represent interaction types: curated (blue), experimental (pink), and predicted (yellow). Nodes are color-coded by biological function, illustrating their roles within the network. POMT, protein O-mannosyl transferase.
Discussion
CHD is the most prevalent type of birth defect, affecting eight out of every 1000 live births, whereas TOF stands out as the most common complex and cyanotic subtype of CHD, with a prevalence of one out of every 3000 births (Saxena et al., 2016). The genetic landscape of TOF, as detailed in our study and supported by the current literature, underscores the significance of chromosomal anomalies and single-gene diseases in the etiology of TOF. Research has clarified the pivotal role of the 22q11.2 deletion syndrome, accounting for a significant proportion of postnatally diagnosed TOF cases (Reuter et al., 2021), thereby accentuating the complex genetic foundation of this condition. The employment of chromosomal microarray analysis has been instrumental in revealing that a considerable number of CHDs (Reuter et al., 2021) can be attributed to copy number variations, reinforcing the necessity for advanced genetic screening techniques for the early detection and management of TOF. In the detailed genetic exploration of TOF using WES, our research has highlighted the extensive genetic heterogeneity that characterizes this CHD (Pan et al., 2023).
WES in a cohort of 47 pediatric patients revealed a wide spectrum of genetic variations, including 107,351 SNPs and 7860 insertions and deletions (indels). This diversity emphasizes the complexity of genetic factors contributing to TOF, beyond the traditionally recognized chromosomal anomalies and single-gene mutations (Pan et al., 2023). Our analysis identified significant pathogenic variants across multiple genes, highlighting the nuanced interplay between genetic elements in cardiac development and function. For instance, the discovery of stop gains and nonsynonymous single nucleotide variants (SNVs) in critical genes such as RARB, FRAS1, ZFHX3, and NPAS2 underlines the potential for substantial disruptions in protein functionality. These disruptions may directly contribute to the pathophysiological mechanisms underlying TOF. For example, variants in RARB are linked to arterial compliance issues and an increased risk of cardiomyopathy, suggesting a heightened susceptibility to TOF. Similarly, mutations in FRAS1, which are associated with heart malformations in Fraser syndrome, offer insights into the genetic mechanisms involved in CHDs. Furthermore, the identification of novel genetic variants previously undocumented in genetic databases (e.g., in SYMPK, SPNS2, CASKIN1, and ABCG1) opens new avenues for research. These findings not only enrich our genetic databases but also offer new insights into potential biomarkers and therapeutic targets for TOF. ABCG1, which is involved in cholesterol transport and homeostasis, exemplifies the broader implications of our findings, suggesting that metabolic dysfunctions may play a role in the spectrum of cardiovascular diseases, extending beyond the direct causation of structural heart defects.
Interestingly, upon analyzing mutations across all patients in the cohort, we observed that KRTAP10-1, KLHL33, ERI1, SERPINA9, and ABCG1 carried mutations in a large proportion of patients, indicating a need for further investigation into their roles in TOF. These genes have a wide array of functions in cellular processes, with no apparent shared or related roles in cardiac function.
In our study, keratin-associated proteins (KAPs) showed the highest frequency of SNPs in heterozygous and homozygous families (73%). KAPs are a diverse group of proteins that interact with keratins and play essential roles in the formation, structure, and function of hair fibers. A P39R mutation at the N-terminal domain of human αB-crystallin has been reported to regulate its oligomeric state and chaperone-like activity (Akter et al., 2019). The KLHL3 protein serves as a substrate recognition factor for the E3 ubiquitin ligase complex, facilitating the attachment of the complex to its target. SNPs in KLHL3 were found at a frequency of 34% in both homozygous and heterozygous families. After KAP, KLHL3 and ERI1 had the next-highest rates of homozygous and heterozygous SNPs, indicating polymorphism among the families with CHD.
ERI1 carried mutations in 13 heterozygous and homozygous families, corresponding to a SNP frequency of 26%, which may have a greater functional impact than SPNS2 or MEX3D, which carried only 3–4 mutations each. Nonsense mutations contribute to both rare genetic disorders, such as Duchenne muscular dystrophy, cystic fibrosis, and hemophilia, and more common conditions, such as cancers, metabolic disorders, and neurological diseases (Gerasimavicius et al., 2022). Some genes, such as SERPINA9, were found to carry nonsense mutations. Correcting a nonsense mutation restores the functional expression of the affected gene. Promoting premature termination codon (PTC) readthrough leads to the synthesis of a full-length protein, which may differ by one amino acid from the wild-type protein because the inserted amino acid at the PTC position can differ from that encoded by the normal DNA sequence (Bidou et al., 2012).
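The effect of a nonsense mutation and of PTC readthrough on the translated protein can be illustrated with a short sketch. The toy coding sequence, the choice of tyrosine (Y) as the inserted residue, and the function name `translate` are assumptions made for illustration; they do not correspond to any variant reported in this study, and the codon table covers only the codons used here.

```python
# Minimal codon table (standard genetic code), limited to the codons used below.
CODON_TABLE = {
    "ATG": "M", "CAG": "Q", "TGG": "W", "AAA": "K",
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(cds, readthrough_aa=None):
    """Translate a coding sequence, stopping at the first stop codon.

    If readthrough_aa is given, the first premature stop codon (a stop that
    is not the final codon) is decoded as that amino acid instead of
    terminating translation, mimicking PTC readthrough.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    protein = []
    readthrough_used = False
    for index, codon in enumerate(codons):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            is_premature = index < len(codons) - 1
            if is_premature and readthrough_aa and not readthrough_used:
                protein.append(readthrough_aa)
                readthrough_used = True
                continue
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequences: the second codon CAG (Q) is mutated to TAG (stop).
wild_type = "ATGCAGTGGAAATAA"
nonsense = "ATGTAGTGGAAATAA"

full_length = translate(wild_type)
truncated = translate(nonsense)
rescued = translate(nonsense, readthrough_aa="Y")  # inserted residue is an assumption
print(full_length, truncated, rescued)
```

In this toy case the nonsense allele yields a truncated product, whereas readthrough restores a full-length product that differs from the wild type at a single position, as described by Bidou et al. (2012).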
All of these mutations occur in genes that are well documented to cause various diseases. For example, MUTYH mutations have been associated with MUTYH-associated polyposis syndrome (MAP), a hereditary condition characterized by the presence of multiple colorectal adenomas. Individuals with MAP have a significantly increased lifetime risk of gastrointestinal cancers (Curia et al., 2020). In this variant dataset, some genes, such as GFM1, BUB1B, MUTYH, RARB, CEP57, POMT2, and MAZ, were categorized as deleterious based on the likelihood ratio test (LRT), as they carried deleterious mutations, that is, genetic variants that have been shown to correlate with increased susceptibility to a particular disease. Mitochondrial protein synthesis requires three elongation factors: EF-Tu (TUFM), EF-Ts (TSFM), and EF-G1 (GFM1). Pathogenic variants in any of these three components can impair mitochondrial translation, leading to a deficiency in oxidative phosphorylation (Shen et al., 2020). Thus, a mutation in GFM1 is strongly associated with TOF.
Similarly, mutations in the CEP57 gene have been identified as being responsible for mosaic-variegated aneuploidy (MVA) syndrome. MVA is a rare genetic disorder characterized by abnormal chromosome numbers (aneuploidy) in a mosaic pattern, meaning that some cells have an abnormal number of chromosomes, whereas others have a normal number. This leads to a variety of clinical features, including developmental delay, growth retardation, intellectual disability, and an increased risk of cancer (Tatton-Brown et al., 2011). Overall, all the deleterious mutations were associated with several illnesses, supporting the role of deleterious variants in the severity of TOF. In contrast, the variants in USP10, PNPLA6, DCPS, and PDZD2 were reported as neutral.
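The split between deleterious and neutral genes described above can be expressed as a simple grouping step. The following is a minimal sketch, assuming one LRT call per gene and a single-letter coding (D for deleterious, N for neutral, U for unknown) of the kind used in common annotation tables; the record layout and the function name `group_by_lrt` are hypothetical, and only the gene-to-category assignments are taken from the text.

```python
from collections import defaultdict

# One call per gene, following the categories stated in the text.
# The single-letter coding and the one-record-per-gene layout are assumptions.
LRT_CALLS = [
    ("GFM1", "D"), ("BUB1B", "D"), ("MUTYH", "D"), ("RARB", "D"),
    ("CEP57", "D"), ("POMT2", "D"), ("MAZ", "D"),
    ("USP10", "N"), ("PNPLA6", "N"), ("DCPS", "N"), ("PDZD2", "N"),
]

LABELS = {"D": "deleterious", "N": "neutral", "U": "unknown"}


def group_by_lrt(calls):
    """Group gene symbols by their LRT prediction label, sorted per group."""
    groups = defaultdict(list)
    for gene, code in calls:
        # Any unrecognized code is treated as "unknown" rather than dropped.
        groups[LABELS.get(code, "unknown")].append(gene)
    return {label: sorted(genes) for label, genes in groups.items()}


grouped = group_by_lrt(LRT_CALLS)
for label, genes in grouped.items():
    print(f"{label}: {', '.join(genes)}")
```

In practice a gene may carry several variants with different predictions, so a per-variant table would be grouped first by gene and then summarized; the per-gene simplification here is only for readability.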
Notably, USP10 drives the progression of triple-negative breast cancer through its role in stabilizing the TCF4 protein (Bhattacharya et al., 2020). PNPLA6 gene mutations give rise to a spectrum of neurological conditions termed PNPLA6-related disorders. PNPLA6 disorders span a phenotypic continuum characterized by variable combinations of features, including cerebellar ataxia and upper motor neuron involvement manifesting as spasticity and/or brisk reflexes (Liu and Hufnagel, 2023). Loss of function of the mRNA decapping enzyme DCPS, which serves as a scavenger, causes syndromic intellectual disability with associated neuromuscular defects. The protein encoded by PDZD2 comprises six PDZ domains and shares sequence similarities with pro-interleukin-16 (pro-IL-16). Similar to pro-IL-16, the PDZD2 protein localizes to the endoplasmic reticulum and is likely cleaved by a caspase to form a secreted peptide containing two PDZ domains.
The apparent underrepresentation of mutations in known pathogenic genes may suggest a more intricate relationship between the genotype and phenotype. Certain mutations in less characterized regions of the genome may contribute to disease manifestation in ways that are not yet fully understood. These findings advocate for a broader perspective of the genomic landscape of CHDs, recognizing the significance of both established and emerging genetic factors in shaping patient outcomes. This study population may partially explain the identification of variants in genes that were not previously reported to be associated with TOF. Geographic factors can influence the frequency of specific variants, highlighting the need for a diverse and representative cohort in genetic research. The implications of our findings are multifaceted. First, they reinforce the importance of genetic screening for early detection and management of TOF. The identification of a wide array of genetic variations, including novel variants, demonstrates the potential of personalized medical approaches for treating CHDs. Second, our results underscore the critical role of advanced genomic technologies such as WES in uncovering the complex genetic landscape of CHDs. By providing a more comprehensive understanding of the genetic underpinnings of TOF, our study paves the way for novel diagnostic, therapeutic, and preventive strategies for this disease.
Conclusion
The genetic landscape of TOF, as revealed in our study, highlights the vast genetic diversity and complexity inherent in this condition. This comprehensive genetic analysis not only enhances our understanding of TOF’s pathophysiology but also contributes to the ongoing development of more accurate diagnostic tools and personalized treatment plans. Our findings support the adoption of a holistic genetic strategy in CHD research, which underlines the critical role of precise genomic technologies in revealing the molecular mechanisms underlying CHDs, thereby propelling us toward better patient prognosis and treatment outcomes.
Acknowledgments
The authors would like to thank Yang Feng for his valuable comments during the writing of the article and his help with statistical analysis.
Authors’ Contributions
K.M.A. and A.F.A.: conceptualization, methodology, software, investigation, writing (original draft), and writing (review and editing). M.M.A.: conceptualization, formal analysis, methodology, and data curation. Q.S.: resources, supervision, project administration, and funding acquisition. All authors contributed to revisions and proofreading and approved the submitted version.
Ethical Consideration
This study was approved by the Ethical Committee of the Children’s Hospital, Zhejiang University School of Medicine and National Clinical Research Center for Child Health, and National Children’s Regional Medical Center, Hangzhou, China (2021-IRB-182). All methods were performed in accordance with the Declaration of Helsinki (as revised in 2013). Written consent was obtained from the parents or guardians of all patients.
Consent for Publication
All authors approved the final version of this paper.
Availability of Data and Materials
Data Sharing Statement: Available with the author.
Author Disclosure Statement
The authors declare that they have no conflicts of interest. All authors have read the article and approved its submission to this journal.
Funding Information
No funding was received to assist with the article.
Supplementary Material
References
- Akter H, Sultana N, Martuza N, et al. Novel mutations in actionable breast cancer genes by targeted sequencing in an ethnically homogenous cohort. BMC Med Genet 2019;20(1):150.
- Althali NJ, Hentges KE. Genetic insights into non-syndromic tetralogy of fallot. Front Physiol 2022;13:1012665.
- Apitz C, Webb GD, Redington AN. Tetralogy of Fallot. Lancet 2009;374(9699):1462–1471.
- Benhaourech S, Drighil A, El Hammiri A. Congenital heart disease and Down syndrome: Various aspects of a confirmed association. Cardiovasc J Afr 2016;27(5):287–290.
- Bhattacharya U, Neizer-Ashun F, Mukherjee P, et al. When the chains do not break: The role of USP10 in physiology and pathology. Cell Death Dis 2020;11(12):1033.
- Bidou L, Allamand V, Rousset JP, et al. Sense from nonsense: Therapies for premature stop codon diseases. Trends Mol Med 2012;18(11):679–688.
- Chelu A, Williams SG, Keavney BD, et al. Joint analysis of functionally related genes yields further candidates associated with tetralogy of fallot. J Hum Genet 2022;67(10):613–615.
- Curia MC, Catalano T, Aceto GM. MUTYH: Not just polyposis. World J Clin Oncol 2020;11(7):428–449.
- Ekure EN, Adeyemo A, Liu H, et al. Exome sequencing and congenital heart disease in Sub-Saharan Africa. Circ Genom Precis Med 2021;14(1):e003108.
- Ferencz C, Rubin JD, Mccarter RJ, et al. Congenital heart disease: Prevalence at livebirth: The Baltimore-Washington infant study. Am J Epidemiol 1985;121(1):31–36.
- Gerasimavicius L, Livesey BJ, Marsh JA. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 2022;13(1):3895.
- Griffin HR, Töpf A, Glen E, et al. Systematic survey of variants in TBX1 in non-syndromic tetralogy of fallot identifies a novel 57 base pair deletion that reduces transcriptional activity but finds no evidence for association with common variants. Heart 2010;96(20):1651–1655.
- Krieger EV, Valente AM. Tetralogy of fallot. Cardiol Clin 2020;38(3):365–377.
- Liu J, Hufnagel RB. PNPLA6 disorders: What’s in a name? Ophthalmic Genet 2023;44(6):530–538.
- Mercer-Rosa L, Paridon SM, Fogel MA, et al. 22q11.2 deletion status and disease burden in children and adolescents with tetralogy of fallot. Circ Cardiovasc Genet 2015;8(1):74–81.
- Palomino Doza J, Topf A, Bentham J, et al. Low-frequency intermediate penetrance variants in the ROCK1 gene predispose to Tetralogy of Fallot. BMC Genet 2013;14:57.
- Pan Y, Liu M, Zhang S, et al. Whole-exome sequencing revealed novel genetic alterations in patients with tetralogy of fallot. Transl Pediatr 2023;12(10):1835–1841.
- Reuter MS, Chaturvedi RR, Jobling RK, et al. Clinical genetic risk variants inform a functional protein interaction network for tetralogy of fallot. Circ Genom Precis Med 2021;14(4):e003410.
- Saxena A, Mehta A, Sharma M, et al. Birth prevalence of congenital heart disease: A cross-sectional observational study from North India. Ann Pediatr Cardiol 2016;9(3):205–209.
- Seaby EG, Pengelly RJ, Ennis S. Exome sequencing explained: A practical guide to its clinical application. Brief Funct Genomics 2016;15(5):374–384.
- Shen Y, Yan K, Dong M, et al. Analysis of GFM1 gene mutations in a family with combined oxidative phosphorylation deficiency 1. Zhejiang Da Xue Xue Bao Yi Xue Ban 2020;49(5).
- Shinebourne EA, Babu-Narayan SV, Carvalho JS. Tetralogy of fallot: From fetus to adult. Heart 2006;92(9):1353–1359.
- Sifrim A, Hitz MP, Wilsdon A, et al. Deciphering Developmental Disorders Study. Distinct genetic architectures for syndromic and nonsyndromic congenital heart defects identified by exome sequencing. Nat Genet 2016;48(9):1060–1065.
- Soemedi R, Topf A, Wilson IJ, et al. Phenotype-specific effect of chromosome 1q21.1 rearrangements and GJA5 duplications in 2436 congenital heart disease patients and 6760 controls. Hum Mol Genet 2012a;21(7):1513–1520.
- Soemedi R, Wilson IJ, Bentham J, et al. Contribution of global rare copy-number variants to the risk of sporadic congenital heart disease. Am J Hum Genet 2012b;91(3).
- Tatton-Brown K, Hanks S, Ruark E, et al. Childhood Overgrowth Collaboration. Germline mutations in the oncogene EZH2 cause weaver syndrome and increased human height. Oncotarget 2011;2(12):1127–1133.
- Yoshihara M, Saito D, Sato T, et al. Design and application of a target capture sequencing of exons and conserved non-coding sequences for the rat. BMC Genomics 2016;17(1):593.
- Zaidi S, Choi M, Wakimoto H, et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 2013;498(7453):220–223.