Abstract
Distinct missense mutations in a specific gene have been associated with different diseases as well as with differing severity of a disease. Current computational methods predict the potential pathogenicity of a missense variant but fail to differentiate between distinct disease or severity phenotypes. We have developed a method to overcome this limitation by applying machine learning to features extracted from molecular dynamics simulations, providing a way to predict the role of novel genetic variants in causing a disease, drug resistance, or another specific trait. As examples, we have applied this approach to variants in calmodulin associated with two distinct arrhythmias and to variants in amyloid-β peptide that cause two different neurodegenerative diseases. The new method successfully predicts the specific disease caused by a gene variant and ranks its severity with greater accuracy than existing methods. We call this method the molecular dynamics phenotype prediction model (MDPPM).
Significance
We describe a computational strategy for extracting the effect of genomic variants on function. We do this by extracting features from all-atom molecular simulations of wild-type and variant proteins and applying machine learning to differentiate between them. This method provides a way to quantify the differential contribution of a genetic variant to a phenotype (disease or trait) and to predict the severity of its departure from the wild-type protein. As an example, we used genetic variants of the protein calmodulin that lead to two significant and mechanistically distinct cardiac arrhythmias, catecholaminergic polymorphic ventricular tachycardia (CPVT) and long QT syndrome (LQTS), to demonstrate our method. Our method offers greater accuracy and more capabilities than other methods used to predict the phenotype of gene variants.
Introduction
Variations in genomic sequences are believed to confer traits or underlie genetic diseases through subtle changes to protein conformational dynamics and the subsequent molecular interactions that drive complex biological systems (1). Breakthroughs in sequencing technology have made the identification of variants faster and more cost-effective and have led to the discovery of many new variants, many of which remain unclassified (2). Moreover, because these variants are rare, larger sample sizes or improved methods will be required to associate them with complex traits and disease (3). As a result, one of the largest unsolved challenges in biology is predicting how discovered genomic variation alters the phenotype, i.e., the traits of an organism or the role of variants in disease (4, 5, 6, 7, 8). To address this pressing need, we present a method that uses machine learning to identify and quantify features of all-atom molecular dynamics simulations that predict the specific functional effect and disruptive severity of genetic variants.
All-atom simulations are widely used to understand the function of proteins and how variation can affect structure (9, 10, 11, 12, 13). However, their transformative impact on clinical translation has yet to be demonstrated. Previous molecular simulation studies demonstrate that the structures of genetic variants differ from those of wild-type (WT) proteins, and combined molecular simulation and docking studies predict that variants have different drug-binding affinities (14, 15, 16). Unfortunately, these analysis methods fall short of predicting and ranking the relative functional impact of a given variant. Machine learning approaches have recently garnered much attention for their ability to differentiate between inputs (17,18). By mining features from these simulations and using machine learning to classify them, the method presented here can differentiate between different diseases caused by genomic variation in a specific protein and can predict and rank the disease severity of multiple variants that result in the same disease phenotype.
We present two examples of proteins with variants responsible for more than one pathology: calmodulin (CaM) and amyloid-β (Aβ) peptide. Identified variants of CaM give rise to different, potentially fatal cardiac arrhythmias, catecholaminergic polymorphic ventricular tachycardia (CPVT) and long QT syndrome (LQTS), whereas Aβ mutants may result in different neurodegenerative pathologies (19). Calmodulin and Aβ peptide were chosen because both have known associations between missense variants and more than one disease, existing data quantitatively measure the severity of the disease phenotype caused by each of the missense variants, each protein/peptide is reasonably small, and a Protein Data Bank (PDB) structure is available for each. These two candidates were also chosen because of the contrast they provide. Aβ peptide is considered an intrinsically disordered protein, with no strict dependence of function on global structure. Calmodulin, on the other hand, has a known structure-function relationship: it consists of two lobes joined by a linker, and the relative positions of these elements are part of its functional mechanism.
The methods presented in this manuscript apply clustering and machine-learning techniques to global structural changes and to the ϕ and ψ backbone dihedral angles of the protein. PDB files used in the analysis are taken from the all-atom simulation trajectories, and some are clustered and parsed into sets of three or four based on the root mean-square deviation (RMSD). A principal component analysis (PCA) is then performed on the resulting ϕ and ψ angle matrices, followed by random forest and k-nearest neighbors (KNN) algorithms applied to the principal component data to evaluate pathogenicity, phenotype, and accuracy.
Methods
Calmodulin
For all molecular dynamics (MD) simulations, the structure of wild-type Ca2+-CaM bound to the IQ domain of the L-type Ca2+ channel (PDB: 2F3Y) was used. Because the LQTS mutations (D96V, D130G, F142L) disrupt calcium binding in their respective EF-hand domains in the C-lobe, the corresponding ion was removed from each variant structure (20). The Mutator plugin for Visual Molecular Dynamics (VMD) (21) was used to mutate the appropriate residue. All Ca2+ ions were retained in the CPVT variants (N54I and N98S) because those residues do not directly interact with Ca2+. The five variant structures and the wild-type structure were solvated in an explicit TIP3P water box (77 Å × 70 Å × 72 Å), and each system was neutralized using the Solvate and Autoionize plugins for VMD (21). The MD simulations were carried out using the GPU-enabled implementation of Nanoscale Molecular Dynamics (NAMD) (22) with the CHARMM36m force field (23), producing three 200-ns trajectories at 310 K for each CaM variant, with constant pressure control using the Langevin piston method (24). Periodic boundary conditions were used, the time step was 2 fs, and structures were saved every 10 ps. For all MD simulations, once the system was equilibrated to the correct density in the NPT ensemble, we switched to NVT simulations. Because pressure fluctuations in NVT are minor compared with those that could cause structural changes in peptides, we believe that the NVT ensemble is appropriate. The results were processed in VMD, first using the pbctools plugin to unwrap the structure trajectories (21).
The ∼30,000 structures generated through the MD simulations were characterized by several metrics to compare the impact of the mutations on the conformational dynamics of CaM. Structural features established in prior molecular dynamics studies were used to define the global dynamics that govern transitions between the “open” and “closed” states, based on the relative orientation of the two lobes of calmodulin (25). The end-to-end distance between the linker termini provides a measure of the distance between the CaM lobes. The relative orientation of the N-lobe and C-lobe is measured by the dihedral angle formed by the end points of the linker region and the centers of mass of each CaM lobe. Additionally, a crude measure of binding tightness was obtained from the distance between the center of mass of each lobe and the center of mass of the IQ peptide. The nonbonded energies between CaM and the IQ peptide were calculated with the NAMD Energy plugin in VMD (21).
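The global descriptors above reduce to simple vector geometry on each frame. The following Python sketch assumes NumPy and atom-index selections for the N-lobe, C-lobe, IQ peptide, and linker termini; the selections and function names are hypothetical and are not taken from the original analysis scripts. It computes the linker end-to-end distance, the interlobe dihedral (N-lobe center of mass, linker start, linker end, C-lobe center of mass), and the lobe-to-IQ center-of-mass distances for one frame.

```python
import numpy as np


def center_of_mass(coords, masses):
    """Mass-weighted center of an (N, 3) coordinate array."""
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    b2 = p3 - p2
    # Project the outer vectors onto the plane perpendicular to the central axis.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def cam_global_metrics(coords, masses, n_lobe, c_lobe, iq_peptide, linker_start, linker_end):
    """Global CaM descriptors for one frame (hypothetical helper).

    n_lobe, c_lobe, iq_peptide: index arrays of atoms in each region (assumed selections).
    linker_start, linker_end: atom indices of the linker termini (e.g., C-alpha atoms).
    """
    com_n = center_of_mass(coords[n_lobe], masses[n_lobe])
    com_c = center_of_mass(coords[c_lobe], masses[c_lobe])
    com_iq = center_of_mass(coords[iq_peptide], masses[iq_peptide])
    a, b = coords[linker_start], coords[linker_end]
    return {
        "linker_distance": float(np.linalg.norm(b - a)),
        "interlobe_dihedral": dihedral_deg(com_n, a, b, com_c),
        "n_lobe_to_iq": float(np.linalg.norm(com_n - com_iq)),
        "c_lobe_to_iq": float(np.linalg.norm(com_c - com_iq)),
    }
```

In practice the selections would come from a topology-aware reader such as VMD, and the function would be applied frame by frame; the CaM–IQ nonbonded energies are computed separately with NAMD Energy.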
To identify variant-specific structural populations, clustering was performed based on the RMSD of the CaM structures after alignment on the IQ peptide. The workflow for this study is shown in Fig. 1 i. Hierarchical clustering of the pairwise structural similarity produced distinct populations; the clustering threshold was chosen to maximize the number of clusters containing at least 10% of the overall trajectory, subject to the condition that these populated clusters together account for more than 50% of the trajectory.
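A minimal sketch of the threshold search, assuming a precomputed symmetric pairwise RMSD matrix and SciPy's hierarchical clustering. The text does not state the linkage method, so average linkage here is an assumption, as is the grid of candidate thresholds; `select_threshold` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def select_threshold(rmsd, thresholds, min_frac=0.10, min_total=0.50, method="average"):
    """Pick the cut height that maximizes the number of 'populated' clusters.

    rmsd: symmetric (n, n) pairwise RMSD matrix with a zero diagonal.
    A cluster is populated if it holds at least `min_frac` of all frames; a cut is
    admissible only if the populated clusters together hold more than `min_total`.
    Returns (n_populated, threshold, labels) for the best cut, or None.
    """
    condensed = squareform(rmsd, checks=False)
    Z = linkage(condensed, method=method)  # linkage method is an assumption
    n = rmsd.shape[0]
    best = None
    for t in thresholds:
        labels = fcluster(Z, t=t, criterion="distance")
        fracs = np.bincount(labels)[1:] / n  # fcluster labels start at 1
        populated = fracs[fracs >= min_frac]
        if populated.sum() <= min_total:
            continue
        if best is None or len(populated) > best[0]:
            best = (len(populated), t, labels)
    return best
```

Ties are resolved in favor of the first threshold in the grid, and `None` is returned if no threshold satisfies the 50% coverage condition.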
Figure 1.
Flowchart of methods used in the study. Different simulation strategies (second row, simulation) were used to show that the method works as long as the conformational space is adequately sampled. After the simulations, the DCD trajectory files are used in different machine-learning techniques for different endpoints (third row, machine learning). Originally, the energies from the trajectory were used in the analysis of calmodulin (i). Because structural details of a protein might not be well studied, the method was extended to use all ϕ-ψ angles as features for different ML methods; this version was used to analyze Aβ and different simulation types and was confirmed by application to calmodulin. The outcomes of the workflow are indicated in the fourth row (outcome). The final comparison with the ground truth evaluates the quality of the methods (fifth row, QA/QC).
Visualization of conformational features and PCA was carried out using ggplot in R (26,27). Variant centroids were calculated by first removing the subpopulations most similar to wild-type CaM, as defined through hierarchical clustering. The remaining variant clusters were combined in the PC feature space, where the centroids of the subpopulation clusters were merged into an overall variant centroid, each weighted according to its proportion of the overall trajectory conformations. The distance between centroids was calculated as a Euclidean distance.
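The weighted-centroid step can be sketched as follows, assuming each frame already has PC coordinates and a cluster label and that the wild-type-like clusters have been identified beforehand. The helper names are hypothetical.

```python
import numpy as np


def variant_centroid(pc_scores, labels, wt_like_clusters):
    """Population-weighted centroid of a variant's non-wild-type-like clusters.

    pc_scores: (n_frames, n_pcs) principal-component coordinates of one variant's frames.
    labels: cluster label per frame from hierarchical clustering.
    wt_like_clusters: labels of clusters judged most similar to wild type (removed).
    """
    labels = np.asarray(labels)
    n_total = len(labels)
    centroids, weights = [], []
    for lab in np.unique(labels):
        if lab in wt_like_clusters:
            continue
        members = pc_scores[labels == lab]
        centroids.append(members.mean(axis=0))
        weights.append(len(members) / n_total)  # share of the full trajectory
    # np.average renormalizes the weights over the retained clusters.
    return np.average(np.array(centroids), axis=0, weights=np.array(weights))


def euclidean(a, b):
    """Euclidean distance between two points in PC space."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))
```

Because each subcluster centroid is weighted by its frame count, the result equals the plain mean of all retained frames; writing it as a weighted merge simply mirrors the description in the text.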
Aβ peptide
To test the robustness of this method and to show that it works with different approaches to sampling conformational space, replica-exchange molecular dynamics (REMD) simulations were used to explore the conformations of the Aβ42 mutant and wild-type peptides (28). Because Aβ is intrinsically disordered, its dynamics could not be quantified using global structural changes as with CaM. Instead, the ϕ-ψ angles of the peptide backbone were used as the features to characterize variant-specific dynamics. For each system, N = 10 replicas were employed, distributed exponentially from 270 to 600 K. The temperature closest to body temperature, 295 K, was used to assess the conformational states of Aβ variants. We used the wide temperature range to enhance conformational sampling. It is difficult to ascertain the connection between free energy states and temperatures because Aβ is a natively disordered peptide that samples a manifold of free energy basins. In addition, classic pairwise nonpolarizable force fields do not capture the temperature dependence of conformational ensembles well (see, for example, (29)). Therefore, the REMD temperature range serves mainly to generate diverse conformations. Please see https://github.com/MDPPM/initialCode for all configuration file parameters. Three separate REMD simulations were carried out for each variant and for the wild-type peptide. Variants were generated with the VMD mutation tool by deleting the wild-type residue and inserting the variant residue into wild-type Aβ42.
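Exponential spacing of replica temperatures usually means a geometric ladder, T_i = T_min (T_max/T_min)^(i/(N − 1)) for i = 0, …, N − 1. Assuming that rule (the exact temperatures are in the configuration files referenced above and may differ), the ladder for N = 10 between 270 and 600 K can be computed as follows; `geometric_ladder` is a hypothetical helper.

```python
import numpy as np


def geometric_ladder(t_min, t_max, n_replicas):
    """Replica temperatures spaced geometrically (exponentially) from t_min to t_max."""
    i = np.arange(n_replicas)
    return t_min * (t_max / t_min) ** (i / (n_replicas - 1))


# Assumed ladder for the Aβ42 REMD runs: 10 replicas, 270-600 K.
abeta_ladder = geometric_ladder(270.0, 600.0, 10)
```

Under this assumption the second rung falls at about 295.05 K. The same helper, called as `geometric_ladder(310.0, 326.0, 4)`, gives a comparable ladder for the four-replica calmodulin runs, again assuming geometric spacing.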
Two preliminary simulations for the wild-type and variant peptides, one to assess the folded state and one the unfolded state, were initiated from the PDB structure 1IYT of wild-type Aβ42 and carried out before launching the REMD simulations. To this end, wild-type or variant structures were simulated for more than 1 ns at 300 or 600 K. The final structures from the preliminary simulations were used to start the REMD simulations. The Aβ NPT preliminary simulations in explicit solvent had initial unit-cell dimensions of 94 Å × 93 Å × 97 Å and used a time step of 2 fs for 20-ns trajectories, which was enough time for the RMSD from the initial wild-type structure to stabilize. We verified the quality of the simulations by computing J-coupling constants and comparing them with experiment (below). We also observed minimal changes in the distribution of the structural ensemble over subintervals at the end of the simulation. Nonbonded interactions were computed using a smooth switching function applied between 10 and 12 Å. Particle-mesh Ewald summation with a grid spacing of 1 Å was used for electrostatic interactions. Langevin dynamics with a damping coefficient of 10 ps−1 was employed for temperature control, and pressure was controlled using the Langevin piston method with coupled unit-cell dimensions. The production NVT REMD simulations of the Aβ peptides used a time step of 1 fs. Nonbonded interactions were smoothly switched off between 7 and 8 Å. Although these switching cutoffs differ from the CHARMM default values, we applied them uniformly to all Aβ simulations, including the WT and mutants. Consequently, we believe that any resulting differences are likely to cancel out when we compare the conformational ensembles of the WT and mutants. A further argument for the validity of our simulations is the good agreement between computed and experimental J-coupling values.
Temperature was controlled using Langevin dynamics with a damping coefficient of 10 ps−1. Three separate REMD simulations were performed for each variant and for wild-type Aβ using NAMD default parameter ratios.
Machine learning used the PDB files from approximately the last 500 frames of each simulation trajectory. Calculations of Euclidean distances and of the Karplus equation used files that were sorted by RMSD values according to the workflow in Fig. 1 iv. The files were sorted by first applying a specific RMSD threshold and then a root mean-square fluctuation (RMSF) criterion. A Tcl script run in the VMD TkConsole wrote out the ϕ and ψ angles for each PDB file, and machine learning was applied to the resulting raw backbone angle values by performing a principal component analysis in RStudio, as shown by the workflow in Fig. 1, ii and iii.
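The PCA on the raw backbone angles can be sketched with NumPy as a centered, unscaled decomposition, which matches the default behaviour of R's `prcomp()`; whether the original R analysis used these defaults is an assumption. For Aβ42, dropping the undefined N-terminal ϕ and C-terminal ψ would leave 41 + 41 = 82 angle columns per frame. The function name is hypothetical.

```python
import numpy as np


def backbone_angle_pca(angles_deg):
    """PCA of a (n_frames, n_angles) matrix of raw phi/psi values in degrees.

    Columns are centered but not scaled (the prcomp() default in R).
    Returns PC scores, loadings (one column per PC), and the fraction of variance per PC.
    """
    X = np.asarray(angles_deg, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S  # equivalent to Xc @ Vt.T
    var = S ** 2 / (X.shape[0] - 1)
    return scores, Vt.T, var / var.sum()
```

Dihedral angles are periodic, so values near ±180° can be split across the two ends of the range; a common alternative is to encode each angle as its cosine and sine before PCA. The sketch follows the text and uses the raw values.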
Comparison of Aβ with experiments for simulation convergence
To quantitatively assess the accuracy of the simulations, the consistency between the experimental and computational conformational ensembles was evaluated by computing ³J(HN,Hα) coupling constants for the wild-type simulation (Fig. 2). This J-coupling reflects the three-bond interaction between the HN and Hα protons, and its value is sensitive to the peptide's secondary structure (31). In this study, we used two sets of experimentally measured J-coupling constants, Jexp, for the Aβ 1–42 peptide (30). In silico J-coupling constants, Jcomp, were determined from the backbone dihedral angles using the Karplus equation (30):

$$ {}^{3}J_{\mathrm{HN}\,\mathrm{H}\alpha}(\phi) = A\cos^{2}\theta + B\cos\theta + C, \qquad \theta = \phi - 60^{\circ}, $$

where ϕ is the backbone dihedral angle, and A, B, and C are coefficients determined by fitting experimental data. We used the set of coefficients reported by Pardi et al., A = 6.4, B = −1.4, and C = 1.9 (32). The correlation coefficients between the experimental and simulated J-couplings were 0.46 and 0.30 for the SOFAST and JHNA data sets, respectively (Fig. 2). Amino acids without reported experimental values were excluded from the Pearson correlation computations. The degree of correlation between experimental and computational J-couplings in our study approaches that reported previously (33).
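The Karplus step and the Pearson comparison can be sketched as below, assuming the Pardi et al. convention θ = ϕ − 60° with A = 6.4, B = −1.4, and C = 1.9, per-frame ϕ angles in degrees, and NaN for residues without an experimental value. Averaging J over frames (rather than evaluating J at the mean angle) is a design choice assumed here; the helper names are hypothetical.

```python
import numpy as np

# Pardi et al. coefficients quoted in the text (Hz).
A, B, C = 6.4, -1.4, 1.9


def karplus_j(phi_deg):
    """3J(HN,HA) from backbone phi, assuming theta = phi - 60 degrees."""
    theta = np.radians(np.asarray(phi_deg, dtype=float) - 60.0)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C


def ensemble_j(phi_frames_deg):
    """Per-residue J averaged over frames; input shape (n_frames, n_residues)."""
    return karplus_j(phi_frames_deg).mean(axis=0)


def pearson_vs_experiment(j_comp, j_exp):
    """Pearson r over residues that have an experimental value (NaN marks missing)."""
    j_comp = np.asarray(j_comp, dtype=float)
    j_exp = np.asarray(j_exp, dtype=float)
    mask = ~np.isnan(j_exp)
    return float(np.corrcoef(j_comp[mask], j_exp[mask])[0, 1])
```

As a sanity check, an ideal α-helical ϕ of −60° gives 4.2 Hz and a β-strand ϕ of −120° gives 9.7 Hz.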
Figure 2.
J-coupling calculations. The calculated ³J(HN,Hα) coupling constants (J-coupling) indicate that the simulated structures are consistent with experimental structures from (30). To see this figure in color, go online.
Consistency between different time points was evaluated to confirm that the simulations had converged between the beginning and the closing stages of sampling. Approximately 10% of the structures in each cluster were taken from the beginning of the trajectory and another 10% from the end, and the two subsets were analyzed separately for their qualitative correlation with age of onset. Both yielded similar Spearman correlation coefficients: −0.95 for the first 10% and −0.80 for the last 10%.
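The early/late convergence check can be sketched as follows, assuming frames are stored in trajectory order with one cluster label each, and using SciPy's `spearmanr` for the correlation with age of onset. The helper names and the rounding rule for the 10% subset are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr


def early_late_split(labels, frac=0.10):
    """Indices of the first and last `frac` of frames within each cluster.

    labels: cluster label per frame, listed in trajectory (time) order.
    """
    labels = np.asarray(labels)
    early, late = [], []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)  # already time-ordered
        k = max(1, int(round(frac * len(idx))))  # at least one frame per cluster
        early.extend(idx[:k])
        late.extend(idx[-k:])
    return np.sort(np.array(early)), np.sort(np.array(late))


def spearman_with_onset(distances, age_of_onset):
    """Spearman rank correlation between variant distances and age of onset."""
    rho, _ = spearmanr(distances, age_of_onset)
    return float(rho)
```

The distances for each subset would be recomputed from the early and late frames separately and then correlated with the same age-of-onset values.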
Calmodulin implicit-solvent REMD simulations
As method development progressed, a second set of CaM simulations was carried out to test the reliability of MDPPM under different simulation parameters and to further demonstrate the robustness of the method. Implicit solvent was chosen in this case (instead of explicit solvent) to demonstrate that the method remains valid under this assumption. REMD was used for these simulations, with four replicas spanning temperatures from 310 to 326 K. We selected a narrow temperature range for the calmodulin simulations to avoid global unfolding of its structure. Nevertheless, this range was sufficient to facilitate rigid rotations of the calmodulin lobes. Approximately the last 500 frames from all four replicas were used to calculate the protein backbone dihedral angles used in the analyses (Fig. 1 ii). The variants E141V and E141G were added to this set because they are associated with LQTS (34,35).
To perform REMD simulations of calmodulin, we used the Generalized Born Implicit Solvent (GBIS) model. The simulations used a time step of 2 fs for a 10-ns trajectory, which was sufficient for the RMSD from the initial wild-type structure to stabilize. We also observed minimal changes in the distribution of the structural ensemble over subintervals at the end of the simulation. All covalent bonds were kept rigid using the ShakeH algorithm, except for water molecules, for which we employed the SETTLE algorithm. GBIS parameters included a 0.3 M ion concentration, an α-cutoff of 14 Å, and a solvent dielectric constant of 80. The surface tension was set to 0.006 kcal/mol/Å². van der Waals and electrostatic interactions were evaluated every two and four steps, respectively. Nonbonded interactions were smoothly switched off between 15 and 16 Å. Temperature was controlled using a Langevin thermostat with a damping coefficient of 5 ps−1. Simulations of Aβ and calmodulin used the CHARMM36m protein force field and the modified TIP3P water model.
Prediction testing
The method outlined in this article was used to address the following three questions, and its accuracy and capabilities were compared with those of other methods:
1) Can the method accurately predict whether a variant is pathogenic or nonpathogenic?
2) Can the method accurately predict the specific phenotype caused by the variant?
3) Can the method accurately predict the severity of the phenotype caused by the variant?
To do this, we took the PCA coordinates computed above from the ϕ and ψ angles of Aβ for all conformations in the REMD ensemble. For each variant, a centroid was calculated in each PC. A second PCA coordinate transformation was then applied to these centroids. This reduced the number of PCs from 82 to 8, with the majority of the variance captured by just a few PCs. These new data were used to address the three questions above.
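The two-stage (“double”) PCA can be sketched as below, assuming first-round PC scores for every frame and a variant label per frame; the helper names and the variance tolerance are hypothetical. One useful property: n centroids span at most n − 1 dimensions after centering, so the second round yields at most n − 1 informative PCs regardless of how many first-round PCs are supplied.

```python
import numpy as np


def pca_scores(X):
    """Centered, unscaled PCA via SVD: returns scores and fraction of variance per PC."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    return U * S, var / var.sum()


def double_pca(frame_scores, variant_of_frame, tol=1e-10):
    """Second-round PCA on per-variant centroids of first-round PC scores."""
    variant_of_frame = np.asarray(variant_of_frame)
    names = list(dict.fromkeys(variant_of_frame.tolist()))  # first-seen order
    centroids = np.vstack(
        [frame_scores[variant_of_frame == v].mean(axis=0) for v in names]
    )
    scores, frac = pca_scores(centroids)
    keep = frac > tol  # drop components with numerically zero variance
    return names, scores[:, keep], frac[keep]
```

With nine variant centroids in an 82-dimensional first-round space, for example, this returns eight components.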
The KNN machine-learning method was used both to predict pathogenic versus nonpathogenic variants and to predict the phenotype for each disease (Fig. 1 iii). It was also used to assess the sensitivity, specificity, and balanced accuracy of the method (Fig. 1 ii). Either a random forest algorithm was applied to the ensemble of PCs to calculate accuracy, or a pairs plot (Figs. S1–S4) of the double-PCA or aggregate-PCA centroids was used to identify appropriate PCs for the KNN “yes/no” classifications. Using the matrix of the double-PCA set or the aggregate centroids, a leave-one-out strategy was applied, with one variant held out as the test variant and the remaining variants used as training data. KNN classified the test variant as “pathogenic” or “nonpathogenic,” “CPVT” or “LQTS,” and “cerebral amyloid angiopathy (CAA)” or “familial Alzheimer’s disease (AD1)” (Fig. 1 iii). In addition to the pathogenicity and phenotype classifications, accuracy was measured using the entire set of original PCA coordinates to gain insight into accuracy across the trajectory (Fig. S5). In this case, a leave-two-out strategy was used, in which one pathogenic and one nonpathogenic variant were removed from the training set and then classified. This ensured that the test variants could not bias the training. All combinations of one nonpathogenic and one pathogenic variant (or of one variant of each disease type) were evaluated (see Figs. S1–S5 for quantitative results).
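The leave-one-out and leave-two-out KNN evaluations can be sketched with plain NumPy as follows. The value of k, the Euclidean metric, and the tie-breaking rule are assumptions, and the function names are hypothetical; the rows of `X` stand for variants (for example, centroids on the selected PCs) and `y` holds their class labels as a NumPy array.

```python
import numpy as np
from itertools import product


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d, kind="stable")[:k]
    # np.unique sorts labels, so a voting tie goes to the alphabetically first class.
    votes, counts = np.unique(train_y[nearest], return_counts=True)
    return votes[np.argmax(counts)]


def leave_one_out(X, y, k=3):
    """Predict each row's label from all other rows."""
    n = len(y)
    preds = []
    for i in range(n):
        keep = np.arange(n) != i
        preds.append(knn_predict(X[keep], y[keep], X[i], k))
    return np.array(preds)


def balanced_accuracy(y_true, y_pred, positive):
    """Sensitivity, specificity, and their mean for one 'positive' class."""
    is_pos, said_pos = (y_true == positive), (y_pred == positive)
    sensitivity = np.sum(is_pos & said_pos) / np.sum(is_pos)
    specificity = np.sum(~is_pos & ~said_pos) / np.sum(~is_pos)
    return sensitivity, specificity, (sensitivity + specificity) / 2


def leave_two_out(X, y, positive, k=3):
    """Hold out every (positive, negative) pair in turn; return the fraction classified correctly."""
    pos = np.flatnonzero(y == positive)
    neg = np.flatnonzero(y != positive)
    correct = total = 0
    for i, j in product(pos, neg):
        keep = np.ones(len(y), dtype=bool)
        keep[[i, j]] = False  # neither held-out row is seen during training
        for held in (i, j):
            correct += int(knn_predict(X[keep], y[keep], X[held], k) == y[held])
            total += 1
    return correct / total
```

An odd k avoids voting ties in two-class problems. Balanced accuracy is the mean of sensitivity and specificity, which keeps the score meaningful when pathogenic and nonpathogenic variants are unevenly represented.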
Several methods were used to fit predicted phenotype severity curves. In all instances, however, the Euclidean distance of the variant from the wild-type protein was used as the predictor of severity. Curve fitting for data points (except those without known severity) was performed by using the formula , where y is the predicted characteristic, x is the Euclidean distance, and c1, c2, and c3 are the fitted constants (Figs. 7, 8, and 9).
Figure 7.
Figures showing the predicted values in blue and the actual values in red. (A) Distance versus RyR open probability has a Spearman correlation coefficient of 1.0 and a Pearson correlation coefficient of 0.96. (B) Distance versus CaM calcium binding affinity has a Spearman correlation coefficient of 1.0 and a Pearson correlation coefficient of 0.73. (C) Distance versus escapes (arrhythmic events) has a Spearman correlation coefficient of 1.0 and a Pearson correlation coefficient of 0.83. Shown are experimental data for RyR open probability, CaM calcium binding affinity, and escapes from (36,37). To see this figure in color, go online.
Figure 8.
Studies with Aβ peptide. (A) The Aβ wild-type structure (PDB: 1IYT) before simulation shows helical morphology. (B) The 1IYT structure during simulation shows coil morphology with β-sheets. (C) 3D image of PCA separation (PC1, PC8, and PC11) by phenotype (AD1, blue; CAA, red; negative controls, gray). (D) Age of onset versus the Euclidean distance calculated from the PCA; the Spearman correlation coefficient is −0.92 and the Pearson correlation coefficient is −0.76. (E) Table of distances from wild-type. To see this figure in color, go online.
Figure 9.
(A) Linear plot of all variants of CaM based on ϕ-ψ-angles as the feature set using explicit solvent REMD simulations. (B) Shown is a 3D image of variant separation of CaM. (C) Table shows quantitative data found in the literature and predicted by MDPPM. The experimental Ca2+-CaM dissociation constant values are from (34,35,37). Shown are experimental data for RyR open probability and escapes from (36,37). To see this figure in color, go online.
Results
Calmodulin
Several disease-causing variants have been identified in calmodulin (CaM). For example, CaM variants give rise to different, potentially fatal cardiac arrhythmias: catecholaminergic polymorphic ventricular tachycardia (CPVT) (19,36,38–49) and LQTS (46,50). Correct diagnosis is key to disease management (see Supporting Materials and Methods for a description of calmodulin physiology). Here, we show how features extracted from all-atom simulations of LQTS- and CPVT-associated variants in CaM can distinguish the nature and severity of functional disruption. Metrics describing the global structural conformations and the energetic contributions to molecular interactions reveal each variant's specific divergence from the wild-type dynamics (25). More broadly, this result highlights the potential of structural and energetic attributes measured from molecular dynamics simulations as features for more advanced machine-learning classification algorithms.
Modeling the conformational dynamics of the folded structure of CaM variants provides insight into the severity of functional disruption by comparing changes in structural metrics and energetic interactions with those of the wild-type structure. Using all-atom molecular dynamics simulations of CaM bound to the IQ domain of the L-type Ca2+ channel, features of variant-specific conformational diversity were used to measure the impact of three LQTS-associated variants (D96V, D130G, and F142L) and two CPVT-associated variants (N54I and N98S) on CaM structural dynamics and interactions.
The simulations were initiated using the experimentally determined structure of wild-type CaM bound to a short peptide representing the IQ domain from the L-type Ca2+ channel (Fig. 3 A; (20)). In this initial structure, a compact CaM is wrapped around the α-helical IQ peptide with both the N-lobe and C-lobe forming intermolecular interactions. Three 200-ns simulation trajectories were used to sample the conformational landscape. In the course of the simulations, neither CaM lobe dissociated from the IQ peptide, nor did the RMSD deviate by more than 5 Å from the wild-type structure (data not shown).
Figure 3.
(A) The ribbon representation of calmodulin bound to the L-type Ca2+ channel IQ domain peptide (PDB: 2F3Y) shows the calcium binding protein in a compact conformation in which both the C-lobe (blue) and N-lobe (orange) make contacts with the peptide (yellow). The flexible linker region of calmodulin (red) imparts a high degree of conformational flexibility. (B) The results of molecular dynamics simulations produced consistently compact structures, as measured by the end-to-end distance of the linker region and the dihedral angle formed by the center of mass of each calmodulin lobe and their respective intersections with the linker region. Variant trajectories showed a distinct variation from the wild-type structures. To see this figure in color, go online.
The subtle structural variation induced by CaM mutations is captured by metrics relating the relative positions of the N- and C-lobes, which include the linker distance and the relative twist of the N- and C-lobes (see Methods for their definitions). These metrics have been used in previous molecular dynamics simulations to characterize variant-induced changes to CaM conformational dynamics (25). Fig. 3 B shows that the conformational distributions of the CaM variants overlap, while each variant retains a uniquely distributed conformational profile. For the LQTS mutations, the divergence of a variant from the wild-type distribution approximates the severity of the functional disruption of calcium-dependent inactivation (CDI) observed in single-cell experiments (e.g., the functional impairment of F142L is less severe than that of D130G or D96V) (36).
However, the conformational distributions captured by the two measures are not always sufficient to distinguish between functional differences, as illustrated by the overlap of the LQTS variant D130G with the CPVT variant N54I. Although all variants show decreased affinity for RyR2 (36,37), the N54I variant does not impair CDI (36). The distribution of the N98S variant overlaps substantially with those of other variants but does not clearly show how its unique population-level profile translates into a more severe CPVT phenotype than the other variants (37). Molecular-scale structural comparisons alone are not enough to discriminate between these subtle functional differences, but structural clustering of individual variant trajectories revealed subpopulations with distinct changes to local structure (Fig. 4). Analysis of these clusters through several distinct population features (Fig. 5) reveals large variation within individual variant clusters and between the overall distributions of individual variants.
Figure 4.
The ribbon backbone for a representative structure from each distinct subpopulation of variant structures observed during the simulated trajectories shows a wide range of local structural dynamics in the (A) N-lobe, (B) C-lobe, and (C) linker regions of calmodulin. (D) The clustering of RMSD between representative structures of variant subpopulations shows no clear separation between variant and wild-type structures. To see this figure in color, go online.
Figure 5.
The variant trajectories were used to define several (A) structural and (B) energetic features associated with the variant subpopulations. The features were chosen to describe global aspects of CaM related to the function such as the overall compactness (dihedral and linker distance), association of each CaM lobe with the IQ peptide, and interaction energies between CaM and the IQ peptide. Variant clusters take on a range of values when compared to other subpopulations and show some overlapping similarities between structures from alternate variant simulations. To see this figure in color, go online.
The complexity of the conformational profiles lends itself to multivariate analysis, and plotting the first two principal components of the variant trajectory structures shows distinct separation of certain conformational subpopulations (Fig. 6 A). These distributions distinguish the variant-specific conformational profiles more fully than molecular-scale structural attributes alone. Furthermore, if variant clusters that share features with the wild-type protein are removed (Fig. 6 B), the centroids of the remaining variant structures collapse into distinct and informative regions of the principal component feature space (Fig. 6 C). Fig. 6 D shows that a WPGMA tree built on the Euclidean distances from the wild-type protein (Fig. 6 C) separates the different phenotypes into different clades. Pairwise distances between variants were computed as one-dimensional Euclidean distances along PC1 in Fig. 6 C.
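WPGMA corresponds to SciPy's `linkage(..., method="weighted")`. A minimal sketch, assuming one PC1 coordinate per centroid (wild type included) and a two-clade cut; the helper name and the number of clades are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def wpgma_clades(pc1_values, n_clades=2):
    """WPGMA tree on 1D Euclidean distances between centroids (e.g., along PC1)."""
    x = np.asarray(pc1_values, dtype=float).reshape(-1, 1)
    # method="weighted" is SciPy's name for WPGMA.
    Z = linkage(pdist(x, metric="euclidean"), method="weighted")
    return Z, fcluster(Z, t=n_clades, criterion="maxclust")
```

The linkage matrix `Z` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the tree.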
Figure 6.
(A) Principal component analysis of the structural and energetic features shows both separation and overlap of variant subpopulations with those derived from the wild-type simulations. When the (B) overlapping variant populations are removed and the variant distributions are (C) collapsed into a weighted centroid, the separation from the wild-type centroid correlates with the observed functional impairment of calcium dynamics in LQTS variants (D130G > D96V ≫ F142L) (36). The separation of the CPVT variants from the LQTS mutations could indicate alternate functional changes. The separation of the CPVT variants (N98S > N54I) also correlates with the severity of the defect (37). (D) Clustering by WPGMA on the Euclidean distance from wild-type CaM in (C) shows that the different defects can be classified into separate clades. To see this figure in color, go online.
The Euclidean distances of the centroids for the LQTS and CPVT mutants are separate, and each follows the relative impact on Ca2+ transient dynamics in cardiac myocytes that defines the role of the variant in disease (36,37). It should be noted that although the multivariate feature space captures interactions with the L-type Ca2+ channel peptide, the distribution of variant centroids may also contain information related to similar CaM interactions, such as the CDI of voltage-gated Na+ channels, which maintain ionic homeostasis and contribute to arrhythmogenic mechanisms (51). Fig. 7 demonstrates how the Euclidean distances can be plotted against physiological parameters. In Fig. 7 A, for CPVT, the RyR open probability increases with the Euclidean distance from wild-type CaM. In Fig. 7 B, for LQTS, the Ca2+ binding affinity of CaM increases with the Euclidean distance from wild-type CaM. In Fig. 7 C, for LQTS, the number of arrhythmic events increases with the Euclidean distance from wild-type CaM.
An additional feature of the principal component analysis is that it measures the contributions of individual features to the first two principal components. This identifies the features involved in functional disruption, potentially giving mechanistic insight. For example, D130G and F142L both lose calcium binding at the same EF hand, but their divergence from the wild-type conformational distribution shows that the underlying mechanisms are driven by differences in van der Waals interactions. Another example is the difference in the distance of the N-lobe to the IQ domain, which appears to contribute significantly to the functional differences in the wild-type-divergent structural populations of most variants. This is notable because all of the variants except N54I are located in the C-lobe, which further reveals the intricate internal dynamics of CaM that propagate structural change to more distant regions of the interacting complex.
Aβ peptide
Missense variants in the amyloid precursor protein (APP) have been associated with two neurodegenerative diseases: familial Alzheimer's disease (AD1) and cerebral amyloid angiopathy (CAA). AD1 is a rare, inherited form of Alzheimer's disease that can be traced through family pedigrees. AD1 accounts for 2–3% of all Alzheimer's cases and usually has a much earlier onset than other types of Alzheimer's disease, with symptoms developing as early as the 30s or 40s (52). AD1 may be accompanied by CAA and typically presents the classic phenotypes of senile plaques, neurofibrillary tangles, and abundant brain parenchymal amyloid. AD1 variants are usually named after the region where the family of origin was located. The AD1 variants used in this study are the Arctic (E22G), Flemish (A21G), Tottori (D7N), and recently discovered C-terminal (A42T) mutations.
CAA is a disorder characterized by amyloid deposition in the walls of cerebral blood vessels. Amyloid deposits occur frequently in blood vessels of the frontal, parietal, and occipital cortex, primarily in medium-sized arteries and arterioles (53). CAA is an age-associated condition that is not always associated with Alzheimer's disease but is present in more than 80% of all Alzheimer's cases (54). Aβ variants with substitutions at positions 21–23 are principally associated with CAA, although variants at these positions can produce strikingly dissimilar clinical phenotypes, including cerebral hemorrhage, dementia, or both (55). The CAA-associated Aβ variants used in this study are the Italian (E22K), Dutch (E22Q), Iowa (D23N), and recently described Piedmont (L34V) variants (see Supporting Materials and Methods for a description of Aβ physiology and further details of all variants).
Experimental validation can be difficult because definitions of phenotype or severity vary or are lacking altogether. Candidate pathogenic variants were selected from the literature. To test whether the method can identify variants that are most likely innocuous, we included negative-control variants with conservative E-to-D or D-to-E substitutions. Glutamic acid (E) and aspartic acid (D) are both negatively charged amino acids with similar structures and properties, so substitutions between them at positions outside highly conserved regions are likely nonpathogenic.
The structure and aggregation mechanisms of mutated Aβ fibrils are not completely understood, although many hypotheses have been proposed for the mechanisms of aggregation and for the structure of a single monomer (56, 57, 58). This study focuses on the random-coil structures that each variant monomer may assume and hypothesizes that these monomeric structures may not be as random as once supposed.
In nonpolar media and organic solvents, the Aβ structures deposited in the PDB appear to adopt a uniformly helical conformation (28,59). However, NMR data suggest that in water the monomer exists as a coil with distinct structural propensities at specific residues (56,60,61). These residues can shift from random coil to β-sheet under different conditions, and previous studies found that the two most hydrophobic regions (L17–A21 and A30–V40) have a higher propensity for β-conformations (56). The simulations in this study support this finding. Fig. 8 A shows the wild-type PDB structure. Fig. 8 B shows a structure in which β-sheets developed at residues F19, F20, G38, and V39 during a simulation.
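Residue-level β-propensity of the kind discussed above can be estimated directly from backbone dihedrals by counting how often each residue falls in the β basin of the Ramachandran plot. The sketch below is illustrative only: the rectangular β-basin boundaries are a coarse assumption, not a definition taken from this study, and the input format (one residue-to-(ϕ, ψ) mapping per frame, angles in degrees) is hypothetical. A residue can populate the β basin without forming a hydrogen-bonded β-sheet, so this measures conformational propensity rather than sheet formation.

```python
from typing import Dict, List, Tuple

Frame = Dict[int, Tuple[float, float]]  # residue number -> (phi, psi) in degrees


def in_beta_basin(phi: float, psi: float) -> bool:
    """Coarse, assumed beta-basin boundaries on the Ramachandran plot."""
    return -180.0 <= phi <= -45.0 and (psi >= 90.0 or psi <= -150.0)


def beta_propensity(frames: List[Frame]) -> Dict[int, float]:
    """Fraction of frames in which each residue's (phi, psi) lies in the beta basin."""
    counts: Dict[int, int] = {}
    for frame in frames:
        for res, (phi, psi) in frame.items():
            counts[res] = counts.get(res, 0) + int(in_beta_basin(phi, psi))
    return {res: c / len(frames) for res, c in sorted(counts.items())}


if __name__ == "__main__":
    # Two invented frames for residues F19 and F20.
    frames = [
        {19: (-120.0, 130.0), 20: (-60.0, -45.0)},   # F19 beta-like, F20 helical
        {19: (-130.0, 150.0), 20: (-120.0, 120.0)},  # both beta-like
    ]
    print(beta_propensity(frames))  # → {19: 1.0, 20: 0.5}
```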
For the Aβ peptide, only the ϕ- and ψ-angles of the protein backbone from the simulation trajectories were used as the feature set for predicting disease-state phenotypes. The results of the principal component analysis (PCA) show that the CAA, AD1, and WT groups can be separated by choosing appropriate components. Fig. 8 C (AD1, blue; CAA, red; negative control, gray) shows a three-dimensional (3D) view of the centroids of the clustered structures in PCA space (PC1, PC8, and PC11). Euclidean distances from the PCA correlate with age of symptom onset, with a Pearson correlation coefficient of −0.7601 (Fig. 8 D). The distances from WT are tabulated in Fig. 8 E. These distances were calculated using the full 82-dimensional PCA space of the ϕ- and ψ-angles.
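The 82 dimensions follow from the peptide length: for Aβ(1–42), ϕ is undefined for the first residue and ψ for the last, leaving 41 + 41 backbone angles. If the PCA was run on unscaled features and all components were kept, it is an orthogonal rotation of the centered data, so Euclidean distances in the full PC space equal distances in the original angle space; whether the features were scaled is not stated here, so treat that as an assumption. The sketch below therefore measures the distance between per-angle mean profiles directly. Using circular means and wrapping differences into [−180°, 180°) is our choice for handling angle periodicity, not necessarily what the study did, and all names are hypothetical.

```python
import math
from typing import List, Sequence

N_RES = 42  # Abeta(1-42)


def feature_names(n_res: int = N_RES) -> List[str]:
    """phi is undefined for residue 1 and psi for the last residue: 2 * (n_res - 1) angles."""
    return [f"phi{i}" for i in range(2, n_res + 1)] + [f"psi{i}" for i in range(1, n_res)]


def circular_mean(angles_deg: Sequence[float]) -> float:
    """Mean direction of a set of angles, in degrees within [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def wrapped_diff(a: float, b: float) -> float:
    """Signed difference a - b mapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def mean_profile(frames: List[Sequence[float]]) -> List[float]:
    """Per-feature circular mean over frames (each frame is one 82-angle vector)."""
    return [circular_mean([f[k] for f in frames]) for k in range(len(frames[0]))]


def distance_from_wt(variant: Sequence[float], wt: Sequence[float]) -> float:
    """Euclidean distance between two mean angle profiles, respecting periodicity."""
    return math.sqrt(sum(wrapped_diff(a, b) ** 2 for a, b in zip(variant, wt)))


if __name__ == "__main__":
    names = feature_names()
    print(len(names))  # → 82
    # Two invented frames per system; only the first angle differs between systems.
    wt_frames = [[-60.0] * len(names), [-70.0] * len(names)]
    var_frames = [[170.0] + [-60.0] * (len(names) - 1), [-170.0] + [-70.0] * (len(names) - 1)]
    print(round(distance_from_wt(mean_profile(var_frames), mean_profile(wt_frames)), 1))
```

Without wrapping, the invented variant angles 170° and −170° would average to 0° instead of 180°, which is why periodicity has to be handled explicitly for dihedral features.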
Calmodulin REMD simulations
In the calmodulin studies described above, structural and thermodynamic features were used in the machine-learning analysis. For many proteins, such structural features may not be well studied. For that reason, the calmodulin analysis was repeated, this time using only the ϕ- and ψ-angles of the protein backbone. Predictions of disease-state phenotype and pathogenicity were 100% accurate (Table 1) when using the binary (yes/no) KNN classification with each variant left out in turn (Fig. 1 iii). Fig. 9 shows the results of the CaM REMD simulations: the separation of the variants and the predicted values versus the quantitative measure of severity. The relationship between the predicted values and the CaM Kd was fitted exponentially, following the graphs in Fig. 7, because D130G has a very high dissociation constant (Kd). Only the LQTS variants were used for these new severity measurements. The predicted values gave a Spearman correlation of 0.75 and a Pearson correlation coefficient of 0.52.
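The leave-one-out evaluation described above can be sketched as a k-nearest-neighbour classifier that, for each variant in turn, is trained on all of the other variants and then asked to label the held-out one. This is a minimal sketch under assumptions: each variant is represented by a single feature vector (for example, its weighted centroid in PC space), k = 1 by default, ties are broken in favour of the label of the nearest tied neighbour, and the labels and coordinates in the example are invented. The yes/no form of the task simply uses two labels, such as pathogenic versus benign.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, phenotype label)


def knn_predict(train: List[Example], x: Sequence[float], k: int = 1) -> str:
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    if not nearest:
        raise ValueError("empty training set")
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Tie-break (an assumption): the tied label with the closest member wins.
    for _, label in nearest:
        if votes[label] == top:
            return label
    raise AssertionError("unreachable")


def leave_one_out_accuracy(data: List[Example], k: int = 1) -> float:
    """Hold out each variant once, train on the rest, and score the prediction."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += knn_predict(train, x, k) == label
    return correct / len(data)


if __name__ == "__main__":
    # Invented 2-D coordinates (e.g. weighted centroids in PC space) and labels.
    data = [
        ([2.0, 0.0], "CPVT"), ([2.5, 0.5], "CPVT"),
        ([0.0, 3.0], "LQTS"), ([0.5, 4.0], "LQTS"), ([0.2, 3.4], "LQTS"),
        ([0.1, 0.1], "benign"), ([0.2, -0.1], "benign"),
    ]
    print("LOO accuracy:", leave_one_out_accuracy(data, k=1))  # → LOO accuracy: 1.0
```

With only a handful of variants, leave-one-out is a natural choice because it uses every labeled variant for both training and testing without reusing the test point.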
Table 1.
Calmodulin Prediction Comparison to Phenotype Prediction Software
Variant | MDPPM: Phenotype Prediction | PON-P | Condel: SIFT | Condel: PPH2 | Condel: MA | Condel: FATHMM | Condel: CONDEL | PredictSNP: predictSNP | PredictSNP: PhD-SNP | PredictSNP: PolyPhen-1 | PredictSNP: PolyPhen-2 | PredictSNP: SIFT | PredictSNP: SNAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A21G | AD1b | 0.845a | 0.420d | 0.978b | 0.805b | −4.910d | 0.561b | 75%d | 83%d | 67%d | 61%d | 76%d | 56%b |
A42T | AD1b | 0.949b | 0.000b | 1.000b | 1.040b | −4.910d | 0.570b | 60%d | 51%d | 67%d | 68%d | 53%b | 81%b |
D7N | AD1b | 0.950b | 1.000d | 0.336c | 0.345d | −4.090d | 0.519d | 72%b | 55%d | 59%b | 63%b | 79%b | 85%b |
E22G | AD1b | 0.937b | 0.010b | 0.912b | 0.975b | −4.930d | 0.568b | 51%b | 61%b | 67%d | 41%b | 53%d | 72%b |
D23N | CAAb | 0.791a | 0.230d | 0.984b | 0.975b | −5.110d | 0.573b | 87%b | 59%b | 74%b | 60%b | 53%b | 81%b |
E22K | CAAb | 0.895b | 0.010b | 0.460c | 0.625d | −4.910d | 0.552b | 87%b | 61%b | 59%b | 60%b | 45%b | 89%b |
E22Q | CAAb | 0.956b | 0.010b | 0.936b | 0.975b | −4.920d | 0.568b | 61%b | 51%d | 67%d | 81%b | 79%b | 89%b |
L34V | CAAb | 0.885b | 0.000b | 0.999b | 1.040b | −4.990d | 0.572b | 76%b | 51%d | 74%b | 68%b | 79%b | 62%b |
E3D | WTd | 0.861b | 0.150d | 0.015d | 0.975b | −4.870d | 0.566b | 71%d | 72%d | 67%d | 68%d | 79%b | 58%d |
E22D | WTd | 0.938b | 0.150d | 0.015d | 0.975b | −4.870d | 0.566b | 75%d | 66%d | 67%d | 63%d | 67%d | 56%b |
Pearson | −0.726 | 0.395 | 0.268 | −0.258 | −0.267 | 0.448 | −0.362 | 0.007 | −0.380 | −0.123 | 0.673 | 0.823 | 0.067 |
Accuracy | 100.0% | 60.0% | 70.0% | 80.0% | 60.0% | 0.0% | 70.0% | 60.0% | 30.0% | 40.0% | 60.0% | 60.0% | 80.0% |
a, unknown.
b, deleterious/pathological.
c, possible damage.
d, neutral/nonpathological.
Comparison with other phenotype prediction tools
Several tools have been developed to predict whether a particular missense mutation will cause a pathogenic change in function. These tools include, but are not limited to, SIFT, PolyPhen-1, PolyPhen-2, MutationAssessor, Functional Analysis through Hidden Markov Models (FATHMM), PhD-SNP, SNAP, PANTHER, Auto-Mute, PLINK, CC/PBSA, and I-Mutant (62). Several consensus tools, such as PON-P, Condel, and PredictSNP, combine multiple tools to improve the prediction of phenotype (63). These tools generally classify a variant as nonpathological or pathogenic, with some allowing an intermediate classification of possibly pathogenic.
Tables 1 and 2 compare the method developed here (called MDPPM) with these tools for calmodulin and the amyloid-β peptide. The scores in the tables are those returned by each tool; their interpretation is indicated by the coloring in the published tables and by the superscripts a–d explained in the table footnotes. The ground truth used in the comparison is the experimental or clinical observation of pathogenicity or nonpathogenicity reported in the scientific literature and in databases such as UniProt and ClinVar. For example, cells marked neutral/nonpathological (white in the published tables) are predicted to be nonpathogenic, meaning that the software predicted no pathology associated with the variant. Note that we could not get the Condel server to return reasonable results for calmodulin. MDPPM correctly classifies every variant tested as pathological or nonpathological (100% accuracy), whereas all of the other methods make some errors. Furthermore, the other methods cannot differentiate between different phenotypes caused by variants of a particular gene, whereas MDPPM does so accurately. Finally, MDPPM can predict the severity of the phenotype, as shown by the high correlation coefficients between the distance from WT and the different quantitative measures (Table 3). The other methods show little consistent correlation between their scores and these measures.
Table 3.
Pearson Correlation
Quantitative Measure | MDPPM | PredictSNP |
---|---|---|
Escapes | 0.915 | −0.969 |
RyR Open Probability | 0.995 | 1.000 |
Ca2+–CaM Binding Affinity | 0.788 | −0.990 |
Note that, for MDPPM, WT is not included in the Pearson correlation calculation.
Table 2.
Aβ Prediction Comparison to Phenotype Prediction Software
Variant | MDPPM: Phenotype Prediction | PON-P | Condel: SIFT | Condel: PPH2 | Condel: MA | Condel: FATHMM | Condel: CONDEL | PredictSNP: predictSNP | PredictSNP: PhD-SNP | PredictSNP: PolyPhen-1 | PredictSNP: PolyPhen-2 | PredictSNP: SIFT | PredictSNP: SNAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
N54I | CPVTb | 0.847b | NDb | 0.778a | NDd | −1.3d | 0.437d | 43%b | 77%b | 74%b | 63%d | 79%b | 55%d |
N98S | CPVTb | 0.893b | NDb | 0.813d | NDd | 0.51b | 0.369d | 75%d | 51%d | 67%d | 72%d | 79%b | 58%d |
D96V | LQTSb | 0.832b | NDb | 0.118a | NDd | −0.59d | 0.410d | 59%b | 73%b | 74%b | 40%b | 79%b | 62%b |
D130G | LQTSb | 0.881b | NDb | 0.983b | NDd | −2.82d | 0.495d | 51%b | 86%b | 74%b | 47%b | 46%b | 72%b |
E141G | LQTSb | 0.818b | NDb | 0.994b | NDd | −2.17d | 0.470d | 87%b | 86%b | 74%b | 65%b | 53%b | 51%b |
E141V | LQTSb | 0.842b | NDb | 0.999b | NDd | −2.18d | 0.471d | 87%b | 73%b | 74%b | 81%b | 79%b | 81%b |
F142L | LQTSb | 0.259c | NDb | 0.983b | NDd | −2.82d | 0.495d | 66%b | 61%b | 59%b | 45%b | 79%b | 62%b |
D79E | WTd | 0.43c | NDb | 0.001d | NDd | −1.79d | 0.456d | 74%d | 72%d | 67%d | 87%d | 45%b | 77%d |
D119E | WTd | 0.698c | NDb | 0.003d | NDd | −0.67d | 0.414d | 74%d | 55%d | 67%d | 87%d | 46%b | 7%d |
Pearson | See Table 3 | ||||||||||||
Accuracy | 100.0% | 66.7% | 77.8% | 66.7% | ND | 33.3% | 22.2% | 88.9% | 88.9% | 88.9% | 77.8% | 77.8% | 77.8% |
ND, not determinable.
a, unknown.
b, deleterious/pathological.
c, possible damage.
d, neutral/nonpathological.
Discussion
As genomic sequencing inundates translational researchers with increasingly large numbers of sequences and variants, it is important to have an accurate way to predict and prioritize variants for further study. Innovative computational methods have been applied to predict the likelihood that a particular variant disrupts protein function, but they are limited in their translational application. The enormous scale of variability has driven the development of high-throughput algorithms that can be applied independently of the underlying molecular mechanisms. In many cases, this approach can accurately predict the pathogenicity of a particular variant, but even the best methods fail at a rate high enough to prevent their clinical use (64). Furthermore, the categorical predictions produced by these methods are of little use in complex diseases, especially where therapeutic interventions rely on an accurate diagnosis of the underlying mechanisms. Here, we present a computational method that can not only predict the different phenotypes caused by missense mutations but also quantify the severity of those changes with respect to the wild-type protein.
Because the approach relies on computationally intensive molecular simulations, it is fundamentally not high throughput. However, it is still generalizable across the proteome, and the additional layer of biophysical information enables mechanistic interpretation of the results. Furthermore, although many of the existing tools are computationally less intensive, they do not offer the same level of accuracy. The simulations allow variant-specific contributions to the structural dynamics to be quantified, and the divergence from wild-type behavior can be used to estimate the relative functional impact. By examining how these profiles correlate with known phenotypic changes, specific mechanistic insight can be derived. This was evident not only in the ability to capture variant-specific changes in calmodulin interactions with the L-type calcium channel and RyR2 but also in the Aβ peptide analysis, where the use of ϕ and ψ angles demonstrates the ability to isolate specific residue contributions to disease severity. By comparing against the wild-type structural dynamics, bias in the system (from interaction partners, the initial structure, and the molecular dynamics implementation) is neutralized, so that the results reflect a variant-specific change to the broader conformational dynamics of the folded protein structure.
There have been other efforts to apply machine learning to features from molecular simulations; however, they differ in their focus and relative success. Agrahari and co-workers combined phenotype prediction software and other measures from molecular simulations with PCA to predict phenotype severity; however, they were able to make only qualitative suggestions, not quantitative predictions (65). A more recent study by Sinha and Wang used root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF), radius of gyration (Rg), solvent-accessible surface area (SASA), hydrogen bonds (NH bond), and covariance analysis calculated from molecular dynamics simulations to predict whether unclassified variants of the BRCA1 gene were cancer associated (66). That study essentially made predictions of pathogenicity. Kumar and Purohit used the prediction tools SIFT, PolyPhen-2, PhD-SNP, Pmut, MutPred, Dr Cancer, FATHMM, and SNP Function Portal to identify SNPs in the Aurora A protein that were likely to be cancer causing, followed by molecular dynamics simulations and docking. They showed that the RMSD, RMSF, radius of gyration, docking energy, total energy, and protein-solvent interactions were altered in the mutant predicted to be cancer causing (67). Other methods have predicted ligand binding using molecular simulation and machine learning (68,69). However, none of these methods predicts how mutations in the same gene can lead to different phenotypes, nor do they predict phenotype severity (33,70).
Because the method presented here is built on a molecular structure, it can be extended beyond missense variation. Aside from an accurate atomic structure, the only other requirement is a training set of phenotypic data. Future applications could adapt the approach to investigate the contribution of variants in RNA and DNA to changes in regulatory function, as well as to quantify the impact on specific molecular interactions in druggable proteins to optimize the selection of targeted inhibitors. A molecular simulation-based approach to predicting the consequences of genomic variation provides a higher resolution of functional interpretation. Not only is the method presented here able to differentiate between distinct phenotypes in complex diseases, but variant-specific changes can also be quantified to provide insight into the underlying disease mechanism. The translational potential of these methods to inform clinical decision-making is therefore significant.
Author Contributions
M.D.M. carried out the all-atom simulations and data analysis on CaM and wrote the original version of the manuscript. J.H. carried out the all-atom simulations and data analysis on Aβ and the additional simulations and analyses on CaM using the ϕ- and ψ-angles. M.S.J. originally envisioned the project, which was completed with significant contributions from M.D.M. and D.K.K., and performed data analysis on the feature sets and the comparison with other methods. D.K.K. helped with the molecular simulations and feature selection. All authors participated in writing the manuscript.
Acknowledgments
This work was supported in part by National Institutes of Health, United States grants R01-HL105239 (M.S.J. and M.D.M.) and U01-HL116321 (M.S.J.).
Editor: Tamar Schlick.
Footnotes
Matthew D. McCoy and John Hamre contributed equally to this work.
Supporting Material can be found online at https://doi.org/10.1016/j.bpj.2020.12.002.
Contributor Information
Matthew D. McCoy, Email: matthew.mccoy@georgetown.edu.
M. Saleet Jafri, Email: sjafri@gmu.edu.
References
- 1.Nelson M.R., Wegmann D., Mooser V. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012;337:100–104. doi: 10.1126/science.1217876. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Pennisi E. Breakthrough of the year. Human genetic variation. Science. 2007;318:1842–1843. doi: 10.1126/science.318.5858.1842. [DOI] [PubMed] [Google Scholar]
- 3.Tennessen J.A., Bigham A.W., Akey J.M., Broad GO. Seattle GO. NHLBI Exome Sequencing Project Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69. doi: 10.1126/science.1219240. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Jelier R., Semple J.I., Lehner B. Predicting phenotypic variation in yeast from individual genome sequences. Nat. Genet. 2011;43:1270–1274. doi: 10.1038/ng.1007. [DOI] [PubMed] [Google Scholar]
- 5.Botstein D., Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat. Genet. 2003;33(Suppl):228–237. doi: 10.1038/ng1090. [DOI] [PubMed] [Google Scholar]
- 6.Rehm H.L. A new era in the interpretation of human genomic variation. Genet. Med. 2017;19:1092–1095. doi: 10.1038/gim.2017.90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Dewey F.E., Grove M.E., Quertermous T. Clinical interpretation and implications of whole-genome sequencing. JAMA. 2014;311:1035–1045. doi: 10.1001/jama.2014.1717. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Posey J.E., Harel T., Lupski J.R. Resolution of disease phenotypes resulting from multilocus genomic variation. N. Engl. J. Med. 2017;376:21–31. doi: 10.1056/NEJMoa1516767. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.O’Connor M., Deeks H.M., Glowacki D.R. Sampling molecular conformations and dynamics in a multiuser virtual reality framework. Sci. Adv. 2018;4:eaat2731. doi: 10.1126/sciadv.aat2731. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Klein M.L., Shinoda W. Large-scale molecular dynamics simulations of self-assembling systems. Science. 2008;321:798–800. doi: 10.1126/science.1157834. [DOI] [PubMed] [Google Scholar]
- 11.Bharadwaj V.S., Kim S., Crowley M.F. Different behaviors of a substrate in P450 decarboxylase and hydroxylase reveal reactivity-enabling actors. Sci. Rep. 2018;8:12826. doi: 10.1038/s41598-018-31237-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Moffett A.S., Bender K.W., Shukla D. Molecular dynamics simulations reveal the conformational dynamics of Arabidopsis thaliana BRI1 and BAK1 receptor-like kinases. J. Biol. Chem. 2017;292:12643–12652. doi: 10.1074/jbc.M117.792762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Hakala M., Kalimeri M., Lappalainen P. Molecular mechanism for inhibition of twinfilin by phosphoinositides. J. Biol. Chem. 2018;293:4818–4829. doi: 10.1074/jbc.RA117.000484. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Nagasundaram N., Zhu H., Chen L. Analysing the effect of mutation on protein function and discovering potential inhibitors of CDK4: molecular modelling and dynamics studies. PLoS One. 2015;10:e0133969. doi: 10.1371/journal.pone.0133969. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Pirolli D., Sciandra F., De Rosa M.C. Insights from molecular dynamics simulations: structural basis for the V567D mutation-induced instability of zebrafish alpha-dystroglycan and comparison with the murine model. PLoS One. 2014;9:e103866. doi: 10.1371/journal.pone.0103866. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Swetha R.G., Ramaiah S., Anbarasu A. Molecular dynamics studies on D835N mutation in FLT3-its impact on FLT3 protein structure. J. Cell. Biochem. 2016;117:1439–1445. doi: 10.1002/jcb.25434. [DOI] [PubMed] [Google Scholar]
- 17.Jordan M.I., Mitchell T.M. Machine learning: trends, perspectives, and prospects. Science. 2015;349:255–260. doi: 10.1126/science.aaa8415. [DOI] [PubMed] [Google Scholar]
- 18.Ghahramani Z. Probabilistic machine learning and artificial intelligence. Nature. 2015;521:452–459. doi: 10.1038/nature14541. [DOI] [PubMed] [Google Scholar]
- 19.Boczek N., Will M., Ackerman M. Spectrum and prevalence of CALM1, CALM2, and CALM3 mutations in long QT syndrome, catecholaminergic polymorphic ventricular tachycardia, idiopathic ventricular fibrillation, and sudden unexplained death in the young. Circulation. 2013;128:A14699. [Google Scholar]
- 20.Fallon J.L., Halling D.B., Quiocho F.A. Structure of calmodulin bound to the hydrophobic IQ domain of the cardiac Ca(v)1.2 calcium channel. Structure. 2005;13:1881–1886. doi: 10.1016/j.str.2005.09.021. [DOI] [PubMed] [Google Scholar]
- 21.Humphrey W., Dalke A., Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996;14:33–38, 27–28.. doi: 10.1016/0263-7855(96)00018-5. [DOI] [PubMed] [Google Scholar]
- 22.Phillips J.C., Braun R., Schulten K. Scalable molecular dynamics with NAMD. J. Comput. Chem. 2005;26:1781–1802. doi: 10.1002/jcc.20289. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Huang J., Rauscher S., MacKerell A.D., Jr. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods. 2017;14:71–73. doi: 10.1038/nmeth.4067. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Best R.B., Zhu X., Mackerell A.D., Jr. Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles. J. Chem. Theory Comput. 2012;8:3257–3273. doi: 10.1021/ct300400x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Aykut A.O., Atilgan A.R., Atilgan C. Designing molecular dynamics simulations to shift populations of the conformational states of calmodulin. PLoS Comput. Biol. 2013;9:e1003366. doi: 10.1371/journal.pcbi.1003366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.R Core Team . R Foundation for Statistical Computing; Vienna, Austria: 2016. R: A language and environment for statistical computing.https://www.R-project.org/ [Google Scholar]
- 27.Vu V.Q. 2011. ggbiplot: a ggplot2 based biplot. R package version 0.55.http://github.com/vqv/ggbiplot [Google Scholar]
- 28.Crescenzi O., Tomaselli S., Picone D. Solution structure of the Alzheimer amyloid beta-peptide (1-42) in an apolar microenvironment. Similarity with a virus fusion domain. Eur. J. Biochem. 2002;269:5642–5648. doi: 10.1046/j.1432-1033.2002.03271.x. [DOI] [PubMed] [Google Scholar]
- 29.Jephthah S., Staby L., Skepö M. Temperature dependence of intrinsically disordered proteins in simulations: what are we missing? J. Chem. Theory Comput. 2019;15:2672–2683. doi: 10.1021/acs.jctc.8b01281. [DOI] [PubMed] [Google Scholar]
- 30.Rosenman D.J., Connors C.R., García A.E. Aβ monomers transiently sample oligomer and fibril-like configurations: ensemble characterization using a combined MD/NMR approach. J. Mol. Biol. 2013;425:3338–3359. doi: 10.1016/j.jmb.2013.06.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Barnwal R.P., Rout A.K., Atreya H.S. Rapid measurement of 3J(H N-H alpha) and 3J(N-H beta) coupling constants in polypeptides. J. Biomol. NMR. 2007;39:259–263. doi: 10.1007/s10858-007-9200-8. [DOI] [PubMed] [Google Scholar]
- 32.Pardi A., Billeter M., Wüthrich K. Calibration of the angular dependence of the amide proton-C alpha proton coupling constants, 3JHN alpha, in a globular protein. Use of 3JHN alpha for identification of helical secondary structure. J. Mol. Biol. 1984;180:741–751. doi: 10.1016/0022-2836(84)90035-4. [DOI] [PubMed] [Google Scholar]
- 33.McCoy M.D. School of Systems Biology. George Mason University, ProQuest Dissertations Publishing; 2016. Calmodulin and cardiac arrhythmia: a multi-scale computational approach to understanding the relationship between sequence variation and disease. PhD thesis. [Google Scholar]
- 34.Wren L.M., Jiménez-Jáimez J., George A.L., Jr. Genetic mosaicism in calmodulinopathy. Circ. Genom. Precis. Med. 2019;12:375–385. doi: 10.1161/CIRCGEN.119.002581. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Boczek N.J., Gomez-Hurtado N., Ackerman M.J. Spectrum and prevalence of CALM1-, CALM2-, and CALM3-encoded calmodulin variants in long QT syndrome and functional characterization of a novel long QT syndrome-associated calmodulin missense variant, E141G. Circ. Cardiovasc. Genet. 2016;9:136–146. doi: 10.1161/CIRCGENETICS.115.001323. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Yin G., Hassan F., Satin J. Arrhythmogenic calmodulin mutations disrupt intracellular cardiomyocyte Ca2+ regulation by distinct mechanisms. J. Am. Heart Assoc. 2014;3:e000996. doi: 10.1161/JAHA.114.000996. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Hwang H.S., Nitu F.R., Knollmann B.C. Divergent regulation of ryanodine receptor 2 calcium release channels by arrhythmogenic human calmodulin missense mutants. Circ. Res. 2014;114:1114–1124. doi: 10.1161/CIRCRESAHA.114.303391. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Nyegaard M., Overgaard M.T., Børglum A.D. Mutations in calmodulin cause ventricular tachycardia and sudden cardiac death. Am. J. Hum. Genet. 2012;91:703–712. doi: 10.1016/j.ajhg.2012.08.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Crotti L., Johnson C.N., George A.L., Jr. Calmodulin mutations associated with recurrent cardiac arrest in infants. Circulation. 2013;127:1009–1017. doi: 10.1161/CIRCULATIONAHA.112.001216. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Makita N., Yagihara N., George A.L., Jr. Novel calmodulin mutations associated with congenital arrhythmia susceptibility. Circ. Cardiovasc. Genet. 2014;7:466–474. doi: 10.1161/CIRCGENETICS.113.000459. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Marsman R.F., Barc J., Bezzina C.R. A mutation in CALM1 encoding calmodulin in familial idiopathic ventricular fibrillation in childhood and adolescence. J. Am. Coll. Cardiol. 2014;63:259–266. doi: 10.1016/j.jacc.2013.07.091. [DOI] [PubMed] [Google Scholar]
- 42.Reed G.J., Boczek N.J., Ackerman M.J. CALM3 mutation associated with long QT syndrome. Heart Rhythm. 2015;12:419–422. doi: 10.1016/j.hrthm.2014.10.035. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Gomez-Hurtado N., Kryshtal D., Knollmann B. Calmodulin mutation (CALM1–E141G) associated with long QT syndrome disrupts calmodulin calcium binding and impairs L-type Ca channel inactivation. Heart Rhythm. 2014;11:2135–2136. [Google Scholar]
- 44.Pipilas D.C., Johnson C.N., George A.L., Jr. Novel calmodulin mutations associated with congenital long QT syndrome affect calcium current in human cardiomyocytes. Heart Rythm. 2016;13:2012–2019. doi: 10.1016/j.hrthm.2016.06.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Vincent G.M. The long-QT syndrome--bedside to bench to bedside. N. Engl. J. Med. 2003;348:1837–1838. doi: 10.1056/NEJMp030039. [DOI] [PubMed] [Google Scholar]
- 46.Wilde A.A.M., Bhuiyan Z.A., Schwartz P.J. Left cardiac sympathetic denervation for catecholaminergic polymorphic ventricular tachycardia. N. Engl. J. Med. 2008;358:2024–2029. doi: 10.1056/NEJMoa0708006. [DOI] [PubMed] [Google Scholar]
- 47.Viskin S. Long QT syndromes and torsade de pointes. Lancet. 1999;354:1625–1633. doi: 10.1016/S0140-6736(99)02107-8. [DOI] [PubMed] [Google Scholar]
- 48.Kathiresan S., Srivastava D. Genetics of human cardiovascular disease. Cell. 2012;148:1242–1257. doi: 10.1016/j.cell.2012.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Wehrens X.H., Lehnart S.E., Marks A.R. FKBP12.6 deficiency and defective calcium release channel (ryanodine receptor) function linked to exercise-induced sudden cardiac death. Cell. 2003;113:829–840. doi: 10.1016/s0092-8674(03)00434-3. [DOI] [PubMed] [Google Scholar]
- 50.Angrist M., Chandrasekharan S., Cook-Deegan R. Impact of gene patents and licensing practices on access to genetic testing for long QT syndrome. Genet. Med. 2010;12(Suppl):S111–S154. doi: 10.1097/GIM.0b013e3181d68293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Ben-Johny M., Dick I.E., Yue D.T. Towards a unified theory of calmodulin regulation (calmodulation) of voltage-gated calcium and sodium channels. Curr. Mol. Pharmacol. 2015;8:188–205. doi: 10.2174/1874467208666150507110359. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Sauer A. 2017. What you need to know about familial Alzheimer’s disease, Alzheimers.net. [Google Scholar]
- 53.Nešić S., Kukolj V., Jovanović M. Histological and immunohistochemical characteristics of cerebral amyloid angiopathy in elderly dogs. Vet. Q. 2017;37:1–7. doi: 10.1080/01652176.2016.1235301. [DOI] [PubMed] [Google Scholar]
- 54.DeSimone C.V., Graff-Radford J., Holmes D.R., Jr. Cerebral amyloid angiopathy: diagnosis, clinical implications, and management strategies in atrial fibrillation. J. Am. Coll. Cardiol. 2017;70:1173–1182. doi: 10.1016/j.jacc.2017.07.724. [DOI] [PubMed] [Google Scholar]
- 55.Fossati S., Cam J., Rostagno A. Differential activation of mitochondrial apoptotic pathways by vasculotropic amyloid-beta variants in cells composing the cerebral vessel walls. FASEB J. 2010;24:229–241. doi: 10.1096/fj.09-139584. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Roche J., Shen Y., Bax A. Monomeric Aβ(1-40) and Aβ(1-42) peptides in solution adopt very similar ramachandran map distributions that closely resemble random coil. Biochemistry. 2016;55:762–775. doi: 10.1021/acs.biochem.5b01259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Simmons L.K., May P.C., Brems D.N. Secondary structure of amyloid beta peptide correlates with neurotoxic activity in vitro. Mol. Pharmacol. 1994;45:373–379. [PubMed] [Google Scholar]
- 58.Terzi E., Hölzemann G., Seelig J. Reversible random coil-beta-sheet transition of the Alzheimer beta-amyloid fragment (25-35) Biochemistry. 1994;33:1345–1350. doi: 10.1021/bi00172a009. [DOI] [PubMed] [Google Scholar]
- 59.Coles M., Bicknell W., Craik D.J. Solution structure of amyloid beta-peptide(1-40) in a water-micelle environment. Is the membrane-spanning domain where we think it is? Biochemistry. 1998;37:11064–11077. doi: 10.1021/bi972979f. [DOI] [PubMed] [Google Scholar]
- 60.Wälti M.A., Orts J., Riek R. Solution NMR studies of recombinant Aβ(1-42): from the presence of a micellar entity to residual β-sheet structure in the soluble species. ChemBioChem. 2015;16:659–669. doi: 10.1002/cbic.201402595. [DOI] [PubMed] [Google Scholar]
- 61.Jarvet J., Damberg P., Gräslund A. Reversible random coil to β-sheet transition and the early stage of aggregation of the Aβ(12–28) fragment from the alzheimer peptide. J. Am. Chem. Soc. 2000;122:4261–4268. [Google Scholar]
- 62. Tang H., Thomas P.D. Tools for predicting the functional impact of nonsynonymous genetic variation. Genetics. 2016;203:635–647. doi: 10.1534/genetics.116.190033.
- 63. Bendl J., Stourac J., Damborsky J. PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput. Biol. 2014;10:e1003440. doi: 10.1371/journal.pcbi.1003440.
- 64. Ghosh R., Oak N., Plon S.E. Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol. 2017;18:225. doi: 10.1186/s13059-017-1353-5.
- 65. Agrahari A.K., Krishna Priya M., Zayed H. Understanding the structure-function relationship of HPRT1 missense mutations in association with Lesch-Nyhan disease and HPRT1-related gout by in silico mutational analysis. Comput. Biol. Med. 2019;107:161–171. doi: 10.1016/j.compbiomed.2019.02.014.
- 66. Sinha S., Wang S.M. Classification of VUS and unclassified variants in BRCA1 BRCT repeats by molecular dynamics simulation. Comput. Struct. Biotechnol. J. 2020;18:723–736. doi: 10.1016/j.csbj.2020.03.013.
- 67. Kumar A., Purohit R. Use of long term molecular dynamics simulation in predicting cancer associated SNPs. PLoS Comput. Biol. 2014;10:e1003318. doi: 10.1371/journal.pcbi.1003318.
- 68. Wang D.D., Ou-Yang L., Yan H. Predicting the impacts of mutations on protein-ligand binding affinity based on molecular dynamics simulations and machine learning methods. Comput. Struct. Biotechnol. J. 2020;18:439–454. doi: 10.1016/j.csbj.2020.02.007.
- 69. Jamal S., Grover A., Grover S. Machine learning from molecular dynamics trajectories to predict caspase-8 inhibitors against Alzheimer’s disease. Front. Pharmacol. 2019;10:780. doi: 10.3389/fphar.2019.00780.
- 70. McCoy M.D., Shivakumar V., Madhavan S. SNP2SIM: a modular workflow for standardizing molecular simulation and functional analysis of protein variants. BMC Bioinformatics. 2019;20:171. doi: 10.1186/s12859-019-2774-9.
Associated Data