ABSTRACT
Background:
Individuals with Down syndrome (DS) have a 40%–60% chance of being born with congenital heart disease (CHD). This indicates that CHD in individuals with DS is not caused by trisomy 21 alone and that other genetic factors may contribute to the development of CHD in these children. A previous study identified gene variants in three patient groups: children with both CHD and DS, children with isolated DS, and children with CHD only. Computational studies of these variants, which together with trisomy 21 determine the risk of CHD in DS cases, have been limited. Here, we aimed to assess the impact of the identified variants that contribute to the pathogenesis of CHD in children with DS through in silico prediction, molecular modeling, and dynamics studies.
Methodology and Results:
The target single-nucleotide polymorphisms included in the study were examined for pathogenicity, residue conservation, and protein structural changes. The structural predictions were done using I-TASSER, Robetta, SWISS-MODEL, and Phyre2 tools. Further, the predicted models were validated through the PROCHECK server and molecular dynamics simulation using GROMACS software. The conservation analysis conducted on the identified variant highlights its significance in relation to the genetic disorders. Furthermore, a dynamics simulation study revealed the impact of the variant on protein structural stability (≤3 Å), providing valuable insights into its pathogenicity. We have also observed that the structure of the centrosomal protein of 290 kDa gene is relatively unstable, which may be attributed to its exclusive inclusion of helices within its secondary structural components.
Conclusions:
This computational study explores, for the first time, the association between genes and CHD-DS, evaluating the identified specific frameshift variants. The observed pathogenic mutations in CHD-DS patients require further experimental validation and may contribute to the development of prospective drug design research. The insights gained from the structural and functional implications of these variants could potentially serve as a cornerstone in the development of effective treatments for this debilitating condition.
Keywords: Congenital heart disease, Down syndrome, frameshift mutation, in silico prediction, molecular dynamics simulation, next-generation sequencing, single-nucleotide polymorphisms
INTRODUCTION
Down syndrome (DS) is the most common chromosomal disorder in humans, resulting from trisomy 21 and causing various complications, including intellectual disability. Congenital heart disease (CHD) is the leading cause of mortality and morbidity during the first 2 years of life in the DS population.[1,2,3] Although some children with DS have a normal heart, 40%–63.5% of them develop CHD.[1,4,5,6] While trisomy of chromosome 21 increases the risk of CHD, it alone is not sufficient to cause it. Other genetic variations or environmental factors may also contribute to the risk of developing CHD.[7]
A study from Saudi Arabia aimed to identify the genomic variations in conjunction with trisomy 21 to determine the risk of CHD in DS cases. The study screened 240 patients, including 100 patients with CHD only, 110 patients with CHD and DS, and 30 patients with isolated DS, as well as 100 control samples. Targeted next-generation sequencing (NGS) was used to analyze 406 cardiovascular genes involved in heart development and function. The study identified variants in GATA3, GUSB, KCNH2, and filamin A (FLNA) genes that are involved in the pathogenesis of CHD in DS children. In addition, variants in period circadian protein homolog 1 (PER1), Myosin-9 (MYH9), and coproporphyrinogen-III oxidase (CPOX) genes were identified in the isolated DS group, while variants in centrosomal protein of 290 kDa (CEP290), endoglin (ENG), and myocyte-specific enhancer factor 2A (MEF2A) were identified in the CHD group.[8]
Computational studies of the identified variants that, in concert with trisomy 21, determine the risk of CHD in DS cases have been very limited. Therefore, in this study, the available sequence and structural data were used to analyze the impact of the identified frameshift mutations associated with CHD-DS. Among the variants investigated, one splice site variant was identified in the FLNA gene, which was associated with both DS and CHD. However, owing to a lack of adequate data, it was not feasible to demonstrate the impact of this splice site variant on the protein structure. This study represents the first attempt to report the impact of these frameshift mutations through computational analyses, providing insight into the functional and structural consequences of these genetic alterations. The reported data could serve as a source of information for further investigation. A molecular modeling approach was employed to predict the 3D structure of proteins lacking a suitable crystal structure. Furthermore, molecular dynamics (MD) simulations were conducted to investigate the effect of the variants on the structural integrity and stability of the proteins, providing a more detailed view of the molecular mechanisms underlying these genetic variations.
METHODOLOGY
Computational resources
All computational analyses were performed on a CentOS v7.0 Linux platform with an Intel® Core™ i7-4770 CPU (3.40 GHz), using the software packages Maestro (version 11.4.011; Schrödinger, LLC, NY, USA) and GROMACS (version 5.0.5; developed by University of Groningen, Royal Institute of Technology, Uppsala University, Sweden).
Structural analysis of target genes
The focus of the study was to perform structural analysis of the genes reported in patients with CHD-DS to gain insights into the potential mechanisms underlying the disease. Table 1 lists the gene variants that are associated with DS, CHD, or both. The genes analyzed in this study were classified into three distinct groups based on their clinical significance,[8] enabling an analysis of the molecular mechanisms underlying the observed genetic variations and their potential impact on disease pathogenesis.[9] Previous studies did not include information on the protein structure and functional mechanism of each gene in relation to its association with CHD and DS.
To study the impact of mutations on protein structure, a reliable protein structure obtained through crystallography is the optimal choice. When a reliable crystal structure is unavailable, molecular modeling is the most viable approach to investigate the structural implications of genetic mutations on protein function. We chose between homology modeling, comparative modeling, and ab initio modeling based on the availability and query coverage of the templates. To conduct a template search, we used the BLASTp tool to screen the available structure databases for proteins with similar sequences. If the available templates do not cover the target region of the protein sequence, structural analysis becomes considerably more difficult, and alternative computational approaches may be required to derive meaningful insights into the protein’s structural and functional properties. In such cases, a multitemplate approach can be employed, which uses multiple templates to cover the target region and enable a more accurate prediction of the protein’s overall structure. Structural data on the genes associated with CHD-DS obtained in this way would greatly benefit future research on this genetic disease.
Table 1.
Data of all the studied genetic variants with sequence and structural information
| Category | Genes | Variant details | UniProt and PDB accession codes (amino acid length) |
|---|---|---|---|
| Group A (variants common in CHD-DS patients) | GATA3 | NM_001002295:exon3:c.380_381del: p.P127fs | P23771/NA (443aa) |
| | GUSB | NM_000181:exon3:c.479_480del: p.G160fs | P08236/3HN3 (651aa) |
| | KCNH2 | NM_172057:exon8:c.1755delG: p.G585fs | Q12809/NA (1159aa) |
| | FLNA* | NM_001456:exon18:c.2405-5C>T | H0Y5C6/NA (281aa) |
| Group B (variants common in DS patients) | PER1 | NM_002616:exon18:c.2289_2290insC: p.V764fs | O15534/NA (1290aa) |
| | MYH9 | NM_002473:exon16:c.1960_1961insG: p.L654fs | P35579/NA (1960aa) |
| | CPOX | NM_000097:exon5:c.990_991insG: p.R331fs | P36551/2AEX (454aa) |
| Group C (variants common in CHD patients) | CEP290 | NM_025114:exon40:c.5506_5507insG: p.I1836fs | O15078/NA (2479aa) |
| | MEF2A | NM_001130928:exon8:c.1050_1058del: p.350_353del | Q02078/NA (507aa) |
| | ENG | NM_000118:exon4:c.392_393del: p.P131fs | P17813/5I04 (658aa) |
*This variant represents a splice site mutation, which disrupts the normal mRNA splicing process and has the potential to alter the resulting protein sequence and function. NA: Not available, DS: Down syndrome, CHD: Congenital heart disease, PDB: Protein data bank, FLNA: Filamin A, PER1: Period circadian protein homolog 1, CEP290: Centrosomal protein of 290 kDa
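The protein-level labels in Table 1 follow a regular pattern, so they can be turned into numeric residue positions before any structural lookup. The following is a minimal Python sketch of that step, not part of the original workflow; the function name `parse_variant` and the two regular expressions are illustrative assumptions, and the FLNA splice site variant is left out because it has no protein-level label.

```python
import re

# Protein-level labels copied from Table 1 (FLNA omitted: splice site variant).
VARIANTS = {
    "GATA3": "p.P127fs",
    "GUSB": "p.G160fs",
    "KCNH2": "p.G585fs",
    "PER1": "p.V764fs",
    "MYH9": "p.L654fs",
    "CPOX": "p.R331fs",
    "CEP290": "p.I1836fs",
    "MEF2A": "p.350_353del",
    "ENG": "p.P131fs",
}

# Hypothetical patterns that cover only the two label shapes used in Table 1.
FRAMESHIFT = re.compile(r"^p\.([A-Z])(\d+)fs$")
DELETION = re.compile(r"^p\.(\d+)_(\d+)del$")


def parse_variant(label):
    """Return (kind, reference residue or None, first position, last position)."""
    match = FRAMESHIFT.match(label)
    if match:
        position = int(match.group(2))
        return ("frameshift", match.group(1), position, position)
    match = DELETION.match(label)
    if match:
        return ("deletion", None, int(match.group(1)), int(match.group(2)))
    raise ValueError(f"unrecognised variant label: {label}")


parsed = {gene: parse_variant(label) for gene, label in VARIANTS.items()}
for gene, (kind, residue, first, last) in parsed.items():
    print(gene, kind, residue, first, last)
```

Keeping the first and last position separately allows the four-residue MEF2A deletion to be handled in the same way as the single-position frameshifts.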
The availability of crystal structures for proteins belonging to Groups A, B, and C was explored by searching the structural databases.[10] Molecular modeling approaches were employed to predict a protein’s structure when the position of its genetic variant was not present in the available crystal structure records. To achieve this, we employed several molecular modeling tools, namely the SWISS-MODEL workspace,[11] I-TASSER,[12] the Robetta server,[13] and the Phyre2 server,[14] to generate protein models that were then used in the subsequent analyses.
Data collection
Group A
GATA3
The gene encoding the trans-acting T-cell-specific transcription factor GATA3 carries a variant at position 127 that causes premature termination of the protein.[15,16] A 3D structure is needed to investigate the impact of this variant on the GATA3 protein. The protein sequence for GATA3 was retrieved from the UniProt database (ID: P23771) and saved in *.fasta format for further analysis. The crystal structure currently available for the GATA3 protein spans only amino acid positions 260–370. Because the target residue, proline 127, does not fall within the sequence covered by the crystal structure, molecular modeling was performed using the Robetta server. Among the several predicted structures, we selected the most reliable one based on the PROCHECK analysis results.
GUSB
The beta-glucuronidase has a variant in G160 position that leads to premature truncation of the protein.[17] The crystal structure of the GUSB was obtained from protein databank (PDB ID: 3HN3). The available protein structure has a sequence coverage of amino acids 21–633, which includes our target site, position 160. The protein sequence of the GUSB protein was retrieved from the UniProt database (P08236). The variant position was located at the 6th helix of the GUSB protein structure.
KCNH2
The potassium voltage-gated channel subfamily H member 2 has a variant at position G585, which results in premature truncation of the protein.[8,18] Despite the availability of the crystal structure of KCNH2 in the Protein Data Bank (PDB ID: 5KA1), the variant at position G585 was not reported. Therefore, we utilized homology modeling, using the protein sequence obtained from UniProt with the accession ID Q12809-2. Subsequently, we conducted PROCHECK analysis to evaluate the stereochemical properties of the resulting model.
Filamin A
The FLNA gene encodes for the FLNA protein, which plays a key role in the cytoskeletal organization of cells. Mutations in the FLNA gene have been associated with various genetic disorders, including skeletal dysplasia, cardiovascular abnormalities, and neurological disorders.[19,20] It has a variant in exon 18:c. 2405-5C>T which is a splice site mutation. Splice site mutations are valuable in protein structural analysis as they can impact the inclusion or exclusion of specific protein domains or segments. These changes can lead to modified protein structures and functions, which can be examined using methods like crystallography, molecular modeling, and MD simulations. However, owing to the limited data available for computational predictions, we have excluded this mutation from the remainder of our analysis.
Group B
Period circadian protein homolog 1
The PER1 is a transcriptional repressor that maintains the circadian clock’s internal time-keeping system.[21] A variant was observed in the V764 region responsible for phosphorylation by Casein Kinase 1 Epsilon (CSNK1E). The PER1 protein sequence was obtained from UniProt (ID: O15534), and a suitable crystal structure with the mutated position was not available. Therefore, we utilized homology modeling to predict the protein’s 3-D structure.
Myosin-9
The cellular MYH9 appears to play a vital role in cytokinesis, cell shape, and specialized functions like capping and secretion.[22,23] A variant at position L654 in the gene lacks a suitable crystal structure for structural analysis. Hence, we used the SWISS-MODEL workspace to model a new 3D structure. The protein sequence was obtained from the UniProt database (ID: P35579).
Coproporphyrinogen-III oxidase
CPOX encodes the oxygen-dependent coproporphyrinogen-III oxidase, a mitochondrial enzyme involved in heme biosynthesis.[8] The identified variant position, R331, is present in the available crystal structure (PDB ID: 2AEX). We obtained this crystal structure from the RCSB Protein Data Bank; it is a high-resolution structure (1.58 Å). The structure covers the variant position, which is located in the 8th beta-strand of the CPOX protein. The protein sequence was taken from the UniProt database (ID: P36551).
Group C
Centrosomal protein of 290 kDa
CEP290 is a large protein involved in the early and late steps of cilia formation.[24] The protein sequence, comprising 2479 amino acid residues, was obtained from the UniProt database (ID: O15078). None of the available crystal structures covered the variant at position I1836. Hence, we used the SWISS-MODEL workspace to generate a new model specifically for the mutated position.
Myocyte-specific enhancer factor 2A
MEF2A is a transcriptional activator that is also involved in the activation of numerous growth factor and stress-induced genes.[25] The protein sequence of MEF2A was taken from the UniProt database (ID: Q02078). The observed variant in this gene is a deletion of residues 350–353; these four residues were taken into consideration for model building. Since no crystal structure was available, a new model was developed using the SWISS-MODEL workspace.
Endoglin
ENG, also known as vascular endothelium glycoprotein, is a protein that plays a crucial role in the regulation of angiogenesis.[26] The protein sequence of ENG was obtained from the UniProt database (ID: P17813) and consists of 658 amino acids. A variant at position P131 of this gene has been identified in patients with congenital heart disease (CHD). The crystal structure of the ENG protein was taken from the RCSB Protein Data Bank (ID: 5I04).
We used the BLASTp tool from NCBI to analyze the protein sequences of the three groups and identify suitable template protein sequences from the PDB.[27] For any variant sequence that did not align with the template sequence, molecular modeling was performed using several online servers, including the SWISS-MODEL workspace, Phyre2, I-TASSER, and the Robetta server. These modeling interfaces were used to predict the tertiary structure of the target proteins. The final model for each protein was selected after PROCHECK validation.
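The decision between using an existing crystal structure and building a model comes down to whether the variant position lies inside the residue range resolved in the structure. A minimal Python sketch of that check is given here for illustration, using only the two residue ranges stated explicitly in the text (GATA3, 260–370; GUSB, 21–633); the names `is_covered` and `choose_route` are hypothetical and do not correspond to any tool used in the study.

```python
# Residue ranges resolved in the available crystal structures, as stated in
# the Data collection section. Only GATA3 and GUSB have explicit ranges there.
STRUCTURE_RANGES = {
    "GATA3": (260, 370),
    "GUSB": (21, 633),
}

# Variant positions taken from Table 1.
VARIANT_POSITIONS = {
    "GATA3": 127,
    "GUSB": 160,
}


def is_covered(position, residue_range):
    """True if the position lies inside the inclusive residue range."""
    start, end = residue_range
    return start <= position <= end


def choose_route(gene):
    """Pick the structural route for a gene (illustrative labels only)."""
    if is_covered(VARIANT_POSITIONS[gene], STRUCTURE_RANGES[gene]):
        return "use crystal structure"
    return "build model"


for gene in STRUCTURE_RANGES:
    print(gene, "->", choose_route(gene))
```

With these inputs, GATA3 falls outside its crystal structure and is routed to modeling, whereas GUSB is covered, which matches the choices described above.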
Secondary structure analysis
The secondary structure analysis was performed by aligning the protein sequences using Geneious Pro software.[28] The mutant sequences of the genes were aligned using Multiple Alignment using Fast Fourier Transform (MAFFT) v7.017 with the BLOSUM62 matrix, and secondary structure prediction was carried out on the aligned sequences. A color scheme representing the hydrophobicity, similarity, and polarity of specific amino acids was applied in a universal format. The secondary structure components, including beta-sheets, helices, loops, and turns, were predicted from the protein sequence. Any change in the protein sequence due to a mutation would cause the predicted secondary structure to differ from that of the original (wild type). The sequence alignment combined with secondary structure prediction enabled the prediction and localization of structural changes in the target proteins.[29,30]
Molecular dynamics simulations
The crystal and modeled structures of all proteins were subjected to MD simulations to observe the dynamic behavior and structural changes of the proteins over time. For each protein, a 10 ns MD simulation was performed using GROMACS v5.0.5 with the GROMOS96 43a1 force field. Each protein was embedded in a cubic box filled with simple point-charge water molecules.[31] Before the simulation, the energy of the system was minimized using the steepest descent method with a cutoff of 9 Å for van der Waals and Coulomb forces, and appropriate ions were added for neutralization. The simulations were performed in the constant pressure and temperature (NPT) ensemble using the Berendsen thermostat, with the temperature and pressure coupled at 310 K and 1 bar, respectively.[32] During the MD simulations, the heavy atoms of the proteins were restrained by a harmonic force with a constant of 1000 kJ/mol/nm² under the NPT and NVT ensembles. The simulation trajectories were saved at 2 fs intervals over the 10 ns period, and the output files were subsequently analyzed to calculate the root mean square deviation (RMSD). The RMSD was calculated with respect to the backbone atoms of each residue to observe the structural deviation. The data were calculated for the whole 10 ns simulation period and plotted using Origin Pro 8 to examine deviations among the proteins.
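For readers unfamiliar with the RMSD measure, the quantity is the square root of the mean squared displacement of the selected atoms between a reference structure and a trajectory frame. The sketch below shows this calculation in plain Python on made-up coordinates; it is an illustration only and is not the GROMACS implementation, which additionally performs a least-squares superposition of each frame onto the reference before the deviation is measured. The function name `rmsd` and the toy coordinates are assumptions.

```python
import math


def rmsd(reference, frame):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    No superposition is done here; the result is in the units of the input.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(reference, frame):
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(total / len(reference))


# Toy backbone coordinates in nm (illustrative values, not simulation output).
reference = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.30, 0.0, 0.0)]
# The same atoms, every one displaced by 0.1 nm along x.
shifted = [(x + 0.1, y, z) for (x, y, z) in reference]

print(rmsd(reference, reference))
print(rmsd(reference, shifted))
```

Because every atom in the toy frame moves by the same 0.1 nm, the RMSD equals that displacement; in a real trajectory the superposition step removes such rigid-body motion so that only internal structural change is reported.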
Protein sequence conservation analysis
The protein sequences of all the genes were subjected to residue conservation analysis using the ConSurf server.[33,34] This analysis helps in identifying the level of conservation of the mutated residue based on the evolutionary conservation score as compared to the UNIREF-90 database. HMMER was used for the alignment, and single iteration with Bayesian calculation methods was employed to calculate the conservation score. Mutations in the evolutionarily conserved region could potentially have a huge impact on the protein’s function. The degree of conservation was represented using a color scheme, with the more conserved residues appearing in dark red and the less conserved ones in blue. This analysis provides crucial insights into the potential impact of the mutation on the protein’s function.
RESULTS
Secondary structure analysis
The predicted secondary structure for the targeted genes emphasizes the frameshift mutation location [Figure 1]. All three groups were categorized and aligned independently with their predicted secondary structure. Figure 1a represents the alignment of Group A genes GATA3, GUSB, and KCNH2 protein sequences. Figure 1b covers Group B with PER1, MYH9, and CPOX genes. Figure 1c consists of CEP290, MEF2A, and ENG protein sequences with secondary structure patterns. The protein residue charge is also indicated in the line graph below the sequence. It was observed that the predicted secondary structure applied to the target sequence reveals that most of the variants lie in the helix and beta-strands of the protein structure. The variant positions in the protein sequence were marked with red asterisk symbols.
Figure 1.
Secondary structure prediction results of all the genes highlighting the mutated regions (a-c)
Protein structural modeling
SWISS-MODEL workspace
The prediction of protein structure through a homology modeling approach is helpful in analyzing and correlating structural and conformational changes. The SWISS-MODEL workspace predicts a 3D structure of the proteins, and the predicted protein is validated through PROCHECK analysis. The variant region of the proteins was predicted to have a refined structure with the mutation site [Figure 2a-c] highlighted. The mutation site is located in the helix and loop regions of the protein.
Figure 2.
Predicted protein structure models and highlighting the mutated amino acid sites in the target proteins (a-c)
Online modeling servers
The GATA3 protein sequence has very low identity and coverage with the sequences of existing protein structures. Since it is a disordered protein, we carried out ab initio protein modeling using the Robetta server. The server predicted three domains using the Ginzu prediction algorithm; of these, we modeled the second domain, which covers the P127 mutation site. In the SWISS-MODEL workspace, the best templates identified by the template identification tools were taken for modeling. All the modeled proteins were then validated using PROCHECK [Table 2] and taken forward for protein preparation and energy minimization. For all models except CEP290 and MEF2A, a few residues were also found in the additionally allowed regions in the PROCHECK analysis. The most favored or allowed regions are the regions of the Ramachandran plot where the backbone dihedral angles are commonly observed in experimentally determined protein structures. These regions are considered energetically favorable for protein folding and stability.
Table 2.
The PROCHECK analysis results for the modeled proteins
| Gene | Residues in most favored/allowed regions (%) | Residues in disallowed regions (%) |
|---|---|---|
| GATA3 | 88.6 | 1.4 |
| KCNH2 | 88.5 | 1.5 |
| PER1 | 85.7 | 4.3 |
| MYH9 | 89.7 | 0.3 |
| CEP290 | 100 | 0.0 |
| MEF2A | 92.3 | 7.7 |
PER1: Period circadian protein homolog 1, CEP290: Centrosomal protein of 290 kDa
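The two columns of Table 2 do not add up to 100% for every model, and the difference can be read as the share of residues in the remaining Ramachandran regions. The short Python sketch below computes that remainder from the tabulated values; it assumes, without confirmation from the PROCHECK output itself, that the two reported columns and the remainder together account for all residues. The function name `remainder` is hypothetical.

```python
# (most favored/allowed %, disallowed %) copied from Table 2.
PROCHECK = {
    "GATA3": (88.6, 1.4),
    "KCNH2": (88.5, 1.5),
    "PER1": (85.7, 4.3),
    "MYH9": (89.7, 0.3),
    "CEP290": (100.0, 0.0),
    "MEF2A": (92.3, 7.7),
}


def remainder(gene):
    """Percentage not accounted for by the two reported columns.

    Assumption: the reported columns plus this remainder total 100%.
    Rounded to one decimal to absorb floating-point noise.
    """
    favoured, disallowed = PROCHECK[gene]
    return round(100.0 - favoured - disallowed, 1)


for gene in PROCHECK:
    print(gene, remainder(gene))
```

Under this assumption the remainder is zero for CEP290 and MEF2A only, which is consistent with the statement that residues in the additionally allowed regions were found for every model except these two.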
Residue conservation analysis
Evolutionarily conserved regions in a protein sequence are hot spots for pathogenic mutations. The frameshift variants were observed in domains, active sites, repeats, coils, and, in some cases, hypothetical regions of the target proteins. One variant position (MYH9) lay in the most conserved region, with a conservation score of 9, whereas several lay in less conserved regions, with scores ranging from 1 to 5 [Table 3]. The confidence interval assigned to each inferred evolutionary score was calculated using the Bayesian method. The ConSurf server also predicted whether the mutated amino acids are exposed, buried, or functional residues [Figure 3].
Table 3.
The residue conservation analysis results for the respective gene mutations
| Gene | Position | Score normalized | Conservation color | CI | Buried/exposed |
|---|---|---|---|---|---|
| GATA3 | 127/P | 0.124 | 5 | −0.256–0.343 | Exposed |
| GUSB | 160/G | 0.116 | 5 | −0.187–0.323 | Functional |
| KCNH2 | 585/G | 1.247 | 1 | 0.493–2.193 | Exposed |
| PER1 | 764/V | 0.471 | 3 | 0.143–0.710 | Exposed |
| MYH9 | 654/L | −1.321 | 9 | −1.435–1.302 | Buried |
| CPOX | 331/G | 0.452 | 3 | 0.087–0.662 | Functional |
| CEP290 | 1836/I | 0.239 | 4 | −0.153–0.576 | Buried |
| MEF2A | 350/L | 0.234 | 4 | −0.203–0.457 | Buried |
| | 351/Q | 0.309 | 4 | −0.103–0.457 | Exposed |
| | 352/G | 0.590 | 3 | 0.137–0.934 | Exposed |
| | 353/F | −0.294 | 6 | −0.581–0.103 | Buried |
| ENG | 131/P | 2.062 | 1 | 1.018–3.110 | Exposed |
CI: Confidence interval, CEP290: Centrosomal protein of 290 kDa, PER1: Period circadian protein homolog 1
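The ranking implied by Table 3 can be reproduced with a few lines of Python, shown here as an illustrative sketch rather than as part of the ConSurf analysis. It relies on the usual ConSurf convention that a lower normalized score and a higher color grade both indicate stronger conservation; the tuple layout and variable names are assumptions.

```python
# (gene, position, normalized score, conservation color grade) from Table 3.
TABLE3 = [
    ("GATA3", 127, 0.124, 5),
    ("GUSB", 160, 0.116, 5),
    ("KCNH2", 585, 1.247, 1),
    ("PER1", 764, 0.471, 3),
    ("MYH9", 654, -1.321, 9),
    ("CPOX", 331, 0.452, 3),
    ("CEP290", 1836, 0.239, 4),
    ("MEF2A", 350, 0.234, 4),
    ("MEF2A", 351, 0.309, 4),
    ("MEF2A", 352, 0.590, 3),
    ("MEF2A", 353, -0.294, 6),
    ("ENG", 131, 2.062, 1),
]

# Highest grade and lowest normalized score should point to the same residue.
most_conserved_by_grade = max(TABLE3, key=lambda row: row[3])
most_conserved_by_score = min(TABLE3, key=lambda row: row[2])

# Positions sorted from most to least conserved by normalized score.
ranked = sorted(TABLE3, key=lambda row: row[2])

print(most_conserved_by_grade)
print([f"{gene}:{position}" for gene, position, _, _ in ranked])
```

Both criteria select the MYH9 position 654, and the least conserved positions by score are those in KCNH2 and ENG, in line with their grade of 1.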
Figure 3.
Amino acid conservation analysis of the proteins locating the mutated positions
Structural stability of predicted protein structure
The dynamic behavior of the protein was analyzed to understand the structural changes and stability of the predicted model. The MD simulation analysis was performed for 10 ns and the results were plotted from the exported RMSD data. The RMSD plot clearly represents that the predicted model is well stable and has less fluctuation during the simulation period [Figure 4]. Moreover, the overall predicted structure has stabilized within 2 Å after a 3 ns time period, indicating that the predicted structure is stable in a dynamic environment and suitable for further computational analysis.
Figure 4.

Root mean square deviation plot of the modeled structures obtained from molecular dynamics simulation
The stability of the predicted protein structure suggests that the homology modeling approach used to predict the protein structure was reliable and accurate. The results of this study provide valuable insights into the structural stability of the predicted protein structure and can be useful for further studies on the function and dynamics of the protein.
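A statement such as "stabilized within 2 Å after 3 ns" can be checked mechanically on an exported RMSD series. The Python sketch below shows one possible reading of that criterion, namely the earliest time from which every later RMSD value stays at or below the threshold. The RMSD values are invented for illustration and are not the simulation data, and the helper names `nm_to_angstrom` and `settles_after` are hypothetical. GROMACS reports RMSD in nm, so the values are converted to Å before comparison.

```python
def nm_to_angstrom(value_nm):
    """Convert nanometres to angstroms (1 nm = 10 Å)."""
    return value_nm * 10.0


def settles_after(times_ns, rmsd_nm, threshold_angstrom):
    """Earliest time from which all later RMSD values stay <= threshold.

    Returns None if the series ends above the threshold.
    """
    settle_time = None
    for time, value in zip(times_ns, rmsd_nm):
        if nm_to_angstrom(value) <= threshold_angstrom:
            if settle_time is None:
                settle_time = time
        else:
            # Threshold exceeded again, so any earlier settling is discarded.
            settle_time = None
    return settle_time


# Invented example series (time in ns, RMSD in nm); not the study's data.
times_ns = [0, 1, 2, 3, 4, 5]
rmsd_nm = [0.0, 0.30, 0.25, 0.18, 0.19, 0.17]

print(settles_after(times_ns, rmsd_nm, threshold_angstrom=2.0))
```

Other readings of stability, such as the spread of RMSD values within a time window, would need a different test; the point of the sketch is only that the unit conversion and the chosen criterion should be stated explicitly.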
DISCUSSION
Children with DS often show co-occurrence of CHD. Children with DS may have a normal heart; however, there is a 40%–63.5% chance of them having CHD. CHD is a major cause of morbidity and mortality in patients with DS. Although trisomy 21 (T21) increases the risk of CHD, it is not sufficient on its own to cause CHD, as 40% of children with DS have a normal heart. Investigating the genomic variations that occur together with T21 might help determine the risk of CHD in DS. The molecular genetics of CHD risk in DS patients are complex and include alterations on chromosome 21 such as trisomies, single-nucleotide polymorphisms (SNPs), and copy number variations (CNVs).[7]
A recent study from Saudi Arabia identified genomic variations that, in conjunction with trisomy 21, determine the risk of CHD in DS cases. The study identified variants in the GATA3, GUSB, KCNH2, and FLNA genes that contribute to the pathogenesis of CHD in children with DS. Variants in the PER1, MYH9, and CPOX genes were identified in the isolated DS group. In the CHD group, variants in CEP290, ENG, and MEF2A were found. However, computational studies of these identified variants, which together with trisomy 21 determine the risk of CHD in DS cases, were insufficient.
Therefore, our aim was to predict the pathogenicity, residue conservation, and effects on the protein structure in response to the selected SNPs. In doing so, we identified the impact and position of the identified genetic variants in the protein structure reported in all the genes studied. The secondary structural changes and residue conservation analysis have provided new insights into the identified mutations. The variant position of MYH9 was found to be the most conserved region using ConSurf analysis with a score of 9, and it is a buried residue. The impact of these mutations on protein structure and function was predicted using MD simulation studies. All the studied mutations were stable after modeling, except the structure modeled with the variant in CEP290, which had unfavorable stability during the simulation period. Furthermore, CEP290 (p.I1836fs) mutations have been extensively reported to have an impact on various health issues, including heart disease, retinal dystrophy, chronic kidney disease, and Joubert syndrome.[35]
The observed results indicate the pathogenic potential of each variant in contributing to CHD and DS. The protein secondary structure analysis revealed changes in the secondary structural elements of the mutated regions, including helices, beta-sheets, and coils. The conservation analysis of the identified variants has emphasized the significance of the mutations in genetic disorders. Furthermore, the molecular visualization and simulation studies provide additional evidence of structural stability for future use in drug design studies.
Variants in Group A
The variants in GATA3, GUSB, and KCNH2 are associated with both CHD and DS. The structural information for GATA3 (exon 3: p.P127fs), GUSB (exon 3: p.G160fs), and KCNH2 (exon 8: p.G585fs) suggests that these frameshift mutations dramatically alter the remaining functional hotspots in the protein structure. The resulting loss of function is expected to render the protein nonfunctional and abolish its biological activity. In GATA3, the mutation at P127 affects the zinc finger domains of the protein, which are responsible for metal binding. The protein–protein interaction data show that region 1–258 of GATA3 is responsible for the TBX21 interaction. The GUSB variant G160 is located 13 residues away from the first of the four N-linked glycosylation sites, which partially restricts the glycosylation process of the protein. In KCNH2, the variant G585 falls in the extracellular topological domain of this membrane protein. The changes in the protein due to the mutation affect the remaining segments of the membrane protein.
Variants in Group B
In Group B, PER1, MYH9, and CPOX genes were reported to be associated with DS cases. Unique variants, such as PER1 (p.V764fs), MYH9 (p.L654fs), and CPOX (p.R331fs), were identified in isolated DS cases compared to CHD. The frameshift mutations identified in the genes were taken into the structure of the protein to find their effect on the functional role of the protein. The V764 variant in PER1 is located next to the PAC domain, responsible for CSNK1E phosphorylation, which is completely inhibited by this frameshift mutation. The MYH9 variant is located in the myosin motor domain, which plays an essential role in its molecular function. In CPOX, the R331 variant site is located near the region important for protein dimerization and the substrate-binding site.
Variants in Group C
The insertion mutation in exon 40 of the CEP290 gene (p.I1836fs), the deletion mutation in MEF2A (p.350–353del), and the deletion mutation in exon 4 of the ENG gene (p.P131fs) were identified in CHD cases alone. In CEP290, the variant is located in the region responsible for self-association with the N-terminus. The MEF2A variant is located between cleavage site 2 and site 3 and is also close to the beta domain of the MEF2A protein. The variant position in ENG is located in the extracellular domain, close to the N-linked glycosylation site. The variant site is also responsible for the GDF2 protein interaction and the OR2 region of the ENG protein.
Thus, the present computational study of the genes GATA3, GUSB, KCNH2, PER1, MYH9, CPOX, CEP290, MEF2A, and ENG indicates a strong association with CHD-DS, supported by the specific deleterious variants identified in the NGS analysis [Table 1]. This study provides additional insights into the impact of each gene at the protein structure level. However, a significant number of the genes targeted for mutation analysis do not have an available crystal structure, which is a critical prerequisite for accurately analyzing the effects of a mutation on the protein’s structure. Therefore, several computational algorithms were used to predict the protein structures of the pathogenic genes. Secondary structure analysis provides key information on the mutated regions of the protein, and residue conservation analysis helps to identify evolutionarily conserved regions in the protein sequence. In the residue conservation analysis, the MYH9 mutation position was found in the most conserved region among the studied variants. The modeled protein structures were validated and energy-minimized for MD simulation studies, which indicated that the predicted structures were stable in a simulated biological environment, with low RMSD values (0.25–0.65 nm). In the MD simulation study, only the structure carrying the CEP290 I1836 frameshift mutation was found to be unstable throughout the simulation, which may be due to its secondary structural elements. The stable modeled structures may be useful for future research targeting similar genetic disorders. However, crystal structure data are required to reveal the entire structure of these proteins as well as the structure of the disordered proteins. Moreover, the computational studies have shown the structural effect of the identified mutations. Previous studies have found that the targeted genes are highly associated with DS patients with CHD, and the frameshift variants examined in this study are strongly linked to this risk.
CONCLUSIONS
This theoretical study sheds light on the genes associated with CHD-DS and identifies specific deleterious SNPs. Although crystal structures are lacking for some of the targeted genes, computational algorithms allowed us to predict protein structures that are stable in a simulated biological environment. These modeled structures provide valuable insight into the impact of the pathogenic mutations and may pave the way for future research on similar genetic disorders and on the same protein targets. The study also highlights the potential for prospective drug design based on the pathogenic mutations observed in CHD-DS patients.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
Acknowledgment
The author KM gratefully acknowledges the MHRD-RUSA 2.0 - F.24/51/2014-U Policy (TN Multi-Gen), Department of Education, DST-FIST (SR/FST/LSI-667/2016)(C), and DBT-Bioinformatics and Computational Biology Centre (BIC) – No.BT/PR40154/BTIS/137/34/2021 for the infrastructure facilities.



