Abstract
Dynamics of individual amino acids play key roles in the overall properties of proteins. However, the knowledge of protein structural features at the residue level is limited due to the current resolutions of experimental and computational techniques. To address this issue, we designed a novel machine-learning (ML) framework that uses Molecular Dynamics (MD) trajectories to identify the major conformational states of individual amino acids, classify amino acids switching between two distinct modes, and evaluate their degree of dynamic stability. The Random Forest model achieved 96.94% classification accuracy in identifying switch residues within proteins. Additionally, our framework distinguishes between the stable switch (SS) residues, which remain stable in one angular state and jump once to another state during protein dynamics, and unstable switch (US) residues, which constantly fluctuate between the two angular states. This study also illustrates the correlation between the dynamics of SS residues and the protein’s global properties.
Introduction
A protein is composed of amino acids, and their dynamics play crucial roles in determining the characteristics of the protein.1−23 However, investigating the structures and dynamics of amino acid residues within a protein poses challenges both experimentally and computationally.4,6,24−28 Although techniques such as X-ray crystallography and NMR spectroscopy can be used to characterize the structure and dynamics of amino acids,11,29,30 current spatiotemporal experimental resolutions prohibit accurate descriptions of their dynamical features.12,31−35 The rapid motion of amino acids occurs at frequencies in the gigahertz (GHz) range, which cannot be precisely resolved by current experimental instrumentation. However, investigating conformational changes of amino acids with femtosecond (fs) resolution is practical using molecular dynamics (MD) simulations,36−41 which provide valuable data on the 3D conformational dynamics of proteins.36,40,41 In the case of individual amino acids within a protein, one notable challenge arises from rare events such as switching to different conformations. These events may not be observed even in long trajectory simulations. Instead, utilizing numerous short trajectories generated from various conformational states of proteins enables us to capture those rare events statistically. However, analyzing this huge amount of data poses another challenge due to the high dimensionality of the data. Given these problems with MD simulation sampling, we addressed three questions in this study: 1. Can we identify the major conformational states of a single amino acid residue using sufficient MD trajectories and classify residues that switch between two distinct structures? 2. Can we evaluate the degree of dynamic stability of the switch residues (that is, whether the two conformations are stable)? 3. Which of the switch residues contribute to the overall function of the protein?
To answer these questions, knowledge of statistics and machine learning (ML) is required to deal with the large amounts of data.42−46 ML models can automatically uncover patterns and relationships in the data, which can be used to infer and predict physical phenomena.43,47,48 In addition, ML can be used to take advantage of thousands of short trajectories.49−52 Inspired by the aforementioned challenges and the power of ML, we developed a framework that processes MD trajectories and uses ML models to characterize amino acid conformations and dynamics. Aggregating amino acid residue dynamics at different conformational states and extracting biophysical knowledge from short trajectories become viable by projecting residue structural features onto the Density of States (DoS). For the structural features of an amino acid, we focused on the angles formed by sets of three atoms. If a residue experiences no significant conformational changes, its density of angular states is unimodal. On the other hand, some amino acid residues undergo transitions between different conformations, resulting in shifts between two or more angular states. For the purpose of this investigation, we defined residues that demonstrate a bimodal DoS with no intermediate state(s) as switch residues. The proposed approach can be extended to investigate the switching function of residues up to n-modal, where all the modes may play critical roles in protein dynamics and function. However, in this study, we focused on bimodal switch residues and aimed to introduce a new metric to analyze the dynamics of a given protein at the residue level. The ultimate objective of our study is to investigate potential correlations between the functionality of switch residues and the overall properties of proteins.
Methods
Our framework processes the MD trajectories of any protein and iterates over all of the residues within it. Figure 1 describes this framework, which characterizes the switching function of residues in proteins. It consists of two main sections: preprocessing and ML-processing. In the preprocessing section, we calculate all of the angles within every single residue along the trajectories and build the histogram (DoS) for each of the angles. For the ML-processing section, we prepared a training data set for ML models consisting of labels and features of the DoS to classify individual residues as switch or nonswitch. Finally, to validate the performance and effectiveness of our method, the framework is applied to trajectory data sets of two different proteins: the Fs peptide protein and the β2AR receptor.
Figure 1.
Framework for classifying amino acid residues into switch and nonswitch residues in proteins using MD simulations and ML models. The framework consists of preprocessing of protein trajectories and ML-processing. It is followed by two examples of using this method in Fs peptide protein and the β2AR receptor.
Data Preprocessing
To extract information from the simulations, we considered residues as rigid bodies and identified only atoms located at the nodes and edges of their structures (see Table S1 for a list of selected atoms in amino acids). The whole structure of an amino acid, along with its dynamics, can be effectively described through all the angles formed by combinations of three of these atoms. We chose this specific subset of atoms to decrease the large number of possible angles within a single amino acid. For instance, the 12 atoms within a TYR residue (N, CA, CB, CG, CD1, CE1, CZ, OH, CE2, CD2, C, O) result in a total of 220 angles; by implementing the atom selection approach, we were able to reduce them to 35 angles created by seven atoms (N, CA, CB, CG, OH, C, O). Using the selected atoms, we first measured each angle for every frame of the trajectories in order to provide a DoS for the angle. Generally, angles formed by sets of three atoms provide more detailed information about the local spatial arrangement of atoms surrounding a central atom and the dynamics of an amino acid compared to dihedral angles (Supporting Information, Section 2). For large proteins, however, notable challenges arise due to both the size of the data throughout the entire protein53 and the extremely diverse patterns of DoS within it. To overcome these challenges, we employed ML models to accurately classify the switch residues.
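The angle counting and the per-angle histogram described above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names, the choice of the middle atom of each triple as the angle vertex, the 36-bin histogram, and the toy coordinates are all assumptions made for illustration and are not taken from the authors' code.

```python
import math
from itertools import combinations

# Selected TYR atoms as listed in the text (the full per-residue list is in Table S1).
TYR_SELECTED = ["N", "CA", "CB", "CG", "OH", "C", "O"]


def angle(a, b, c):
    """Return the angle in radians at vertex b for the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def residue_angles(coords):
    """One angle per atom triple; the middle atom is taken as the vertex (assumption)."""
    return {
        (a, b, c): angle(coords[a], coords[b], coords[c])
        for a, b, c in combinations(coords, 3)
    }


def density_of_states(values, n_bins=36):
    """Normalized histogram of one angle over [0, pi]; n_bins is an illustrative choice."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v / math.pi * n_bins), n_bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


# Toy single-frame coordinates (made up, not from any simulation).
frame = {name: (i, (i * i) % 5, (3 * i) % 7) for i, name in enumerate(TYR_SELECTED)}
angles = residue_angles(frame)

# Toy time series of a single angle across four frames (made up).
dos = density_of_states([1.6, 1.65, 2.5, 2.55])

# Number of angles with the selected atoms vs. with all 12 TYR atoms.
print(len(angles), math.comb(12, 3))
```

In practice the per-frame coordinates would come from a trajectory reader, and one DoS would be built for each of the angles of each residue.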
Training ML Models
To train ML models capable of classifying switch residues within a protein, a data set containing information on angular states as well as their labels is required. After obtaining various patterns of distributions of the DoS, we labeled them by visual inspection, determining whether a given density represents a switch or not. As shown in Figure 2, a bimodal DoS with no state(s) in between is labeled 1, and all others are labeled 0. Besides the labels, the features of the DoS are required for training ML models. We found that the raw DoS data yield low accuracy if used as features. Therefore, we defined and engineered 14 features based on patterns of distributions of the bimodal switch histograms as input for the ML models (Figure 2). We trained three ML models: the Decision Tree model, which builds a tree-like structure by recursively splitting data based on the features’ values; the Random Forest model, which creates an ensemble of multiple decision trees and combines their predictions through voting or averaging;54 and the XGBoost model, which uses a gradient boosting framework to build an ensemble of learners (usually decision trees).55 We benchmarked the accuracy of the different ML algorithms using 5-fold cross-validation and found that the Random Forest model demonstrated the highest test accuracy (96.94%), compared with 96.42% for XGBoost and 95.91% for the Decision Tree model (Table 1).
Figure 2.
Data preprocessing and ML model training process for classifying switch and nonswitch residues.
Table 1. Performance of Random Forest, XGBoost, and Decision Tree Models in Classifying Switch and Nonswitch Residues
ML Model | Switch classification accuracy (%) |
---|---|
Random Forest | 96.94 (0.04) |
XGBoost | 96.42 (0.03) |
Decision Tree | 95.91 (0.03) |
The standard deviations are shown in parentheses.
Characterizing Stability
While the switch residues exhibit similar DoS patterns (blue histograms in Figure 3a,b), we observed that their kinetics (dynamics) may be totally different. The brown plot in Figure 3a shows that the conformation of the switch residue changes with a single transition between the two angular states during the protein dynamics. On the other hand, another switch residue (shown in the brown plot in Figure 3b) constantly oscillates between the two angular states throughout the same trajectory. It can be inferred that the DoS can accurately determine the switching modes within a residue, but it lacks the ability to identify the dynamic stability of the switches. To address this challenge, we developed another framework aimed at classifying switch residues into stable switch (SS) (Figure 3a) and unstable switch (US) (Figure 3b) classes. To differentiate between the SS and US residues, we defined an Instability ratio as the total number of transitions between the two angular states divided by the length of the trajectories, expressed as a percentage. To determine the total transitions between the two angular states, we need to know which point belongs to which of the angular states. To obtain this information, we used clustering techniques in an unsupervised fashion. We utilized the k-means algorithm with two clusters using the Scikit-learn library.56 It was observed that the Instability ratio effectively differentiates the SS and US residues when it is either below 1% (Figure S2a) or above 6% (Figure S2b). However, it becomes challenging to distinguish between the two in cases where the Instability ratio falls within the intermediate range (1% < Instability ratio < 6%) (Figure S2c,d). The reason for this lies in the diverse distribution of the time steps at which transitions between the two angular states occur throughout the trajectories (see Supporting Information Section 3).
To resolve this challenge, we trained a Logistic Regression model on the Instability ratio to establish the threshold value that separates SS from US residues. The training data set for the model is built from time series like the brown plots in Figure 3 and contains the Instability ratio and label (SS or US) of each. Ultimately, the model achieved an accuracy of 98.97% with a standard deviation of 0.01 for classifying the SS and US residues using 5-fold cross-validation. Figure 3c illustrates how the Logistic Regression model established the Instability ratio threshold used to classify the SS and US residues.
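The Instability ratio defined in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the two-state assignment uses a small pure-Python one-dimensional two-means routine as a stand-in for the scikit-learn k-means call mentioned in the text, the trajectory length is taken to be the number of frames, and the function names and toy angle series are hypothetical.

```python
def two_state_labels(angles, n_iter=50):
    """Assign each frame to one of two angular states (0 or 1).

    Stand-in for k-means with two clusters: 1D Lloyd iterations started
    from the minimum and maximum angle. Requires a non-empty sequence.
    """
    lo, hi = min(angles), max(angles)
    labels = [0] * len(angles)
    for _ in range(n_iter):
        labels = [0 if abs(a - lo) <= abs(a - hi) else 1 for a in angles]
        g0 = [a for a, lab in zip(angles, labels) if lab == 0]
        g1 = [a for a, lab in zip(angles, labels) if lab == 1]
        if not g0 or not g1:
            break  # only one state is populated
        new_lo, new_hi = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return labels


def instability_ratio(labels):
    """Transitions between the two states per frame, as a percentage.

    Assumes 'length of the trajectories' means the number of frames.
    """
    transitions = sum(1 for p, q in zip(labels, labels[1:]) if p != q)
    return 100.0 * transitions / len(labels)


# Toy angle series in radians (made up): one jump vs. constant oscillation.
stable = [1.6] * 100 + [2.5] * 100
unstable = [1.6, 2.5] * 100

ratio_stable = instability_ratio(two_state_labels(stable))
ratio_unstable = instability_ratio(two_state_labels(unstable))
print(ratio_stable, ratio_unstable)
```

With these toy series the single-jump trace falls below the 1% bound quoted above and the oscillating trace falls far above the 6% bound; real traces in the intermediate range are the ones the Logistic Regression model is meant to resolve.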
Figure 3.
Comparative representation of switch residues exhibiting a similar DoS but different dynamics. DoS along with dynamical representation for (a) stable switch (SS) residue and (b) unstable switch (US) residue in a protein. (c) Established Instability ratios by the Logistic Regression model to perform classification of switch residues into SS and US residues.
Results and Discussion
The global properties of a protein describe its characteristics as a whole rather than focusing on specific local regions or individual amino acids. Some common examples of global features of proteins include their stability, activity, folding patterns, and functional properties. We observed that the switchlike transition between the two angular states in SS residues occurs when there is a notable change in the overall properties of the protein. On the other hand, there is no significant correlation between the US residue dynamics and essential changes in the global features of proteins. To validate our findings, we evaluated the SS and US residues within the Fs peptide protein as well as in the β2AR receptor. We examined the global characteristics of these proteins that were associated with SS residues (shown in Figure 4 and Figure 5). It is essential to emphasize that, compared to traditional methods such as correlation analysis, the method we developed works with the density of angular states (histograms), which incorporates the statistical information derived from thousands of trajectories in the case of large proteins. By contrast, correlation coefficient methods are not based on the collective behavior of atoms. When it comes to analyzing complex, multidimensional relationships, the use of ML techniques, like the methods we have developed, becomes imperative. Another advantage of our method compared to traditional methods is its computational efficiency. We first filter out the SS residues and then work only on them to specify their correlation to protein properties. We postulate that traditional methods have no means of obtaining such prior knowledge, which results in redundant calculations.
Figure 4.
The A9 and R10 residues in Fs peptide protein are identified as SS and US residues, respectively. (a) The A9 residue (SS residue) exhibits a strong correlation with the RMSD. (b) The R10 residue (US residue) typically oscillates between the two angular states throughout the protein dynamics. The protein conformations are shown using PyMOL representation.58
Figure 5.
The S143(34.55) and I112(3.31) residues in the β2AR receptor are classified as SS and US residues, respectively. (a) The S143(34.55) residue exhibits a strong correlation with the activation states of the receptor. (b) Conformational representation58 of the receptor in the active and inactive states shows the correlation between the H3–H6 distance (measured as the Cα contact distance between the R131(3.50)–L272(6.34) residues) and the angular states in the S143(34.55) residue. (c) The I112(3.31) residue fluctuates between the two angular states throughout the protein dynamics. (d) The switching function of the S143(34.55) residue is correlated with the activation states and H3–H6 distances in the receptor. (e) All switch residues over the β2AR receptor structure are highlighted in red (see Table S2 for a list of switch residues in this receptor). The I205(5.45), V206(5.46), and I309(7.36) switch residues are within 7 Å of the ligand.
SS Residue in Fs Peptide Protein Correlated to RMSD
In this study, we used trajectories of the Fs peptide protein (Ace-A_5(AAARA)_3A-NME), which is a well-established model system for studying protein folding. The trajectory is 500 ns in duration and is saved every 50 ps. The simulation was executed using OpenMM 6.0.1 and the AMBER99SB-ILDN force field with GBSA-OBC implicit solvent at 300 K.57 It was initiated from randomly selected conformations obtained from an initial 400 K unfolding simulation.57 For this trajectory, the RF model classified seven residues (A3, A7, A9, R10, R15, R20, and A22) as switch residues out of 21 residues within the protein (Table 2). The A9 residue is also detected as the SS residue. In Table 2, the angle switch ratio (ANSR) is defined as the ratio of switch angles to the total number of angles in a residue. For example, 10 angles can be formed within the A9 residue, and only two of them exhibit switching functions. Thus, the ANSR for the A9 residue is 2/10. It is reasonable to assume that residues with higher ANSR values would have stronger correlations with the global properties of a protein. The atom switch contribution (ATSC) counts, for each atom within a residue, the number of switch angles in which it participates, and thus highlights its degree of contribution to the switching function. For example, the two switch angles in residue A9 are formed by the O–C–N and the O–CA–N atoms. Hence, the ATSC for atoms O, C, CA, and N is 2, 1, 1, and 2, respectively. After identifying the SS residue in the Fs peptide protein, we proceeded to assess any correlation with the global properties of the protein. In this study, we specifically focused on the folding process of the protein, as indicated by the Root Mean Square Deviation (RMSD) feature. As shown in Figure 4a, there is a strong correlation between the SS residue and the RMSD feature in the protein. At time ∼280 ns, the RMSD significantly increases; simultaneously, the A9 residue (SS residue) switches to another angular state.
On the other hand, the R10 residue (US residue) mostly oscillates between the two states throughout the folding process, regardless of the significant change in the RMSD.
Table 2. List of Amino Acid Residues in Fs Peptide Protein Classified as Switch Residues Using the RF Model
Residue | ANSR | ATSC | Residue | ANSR | ATSC |
---|---|---|---|---|---|
ALA9 | 2/10 | O:2, N:2, C:1, CA:1 | ARG10 | 6/120 | CG:5, CB:4, C:3, N:2, NE:2, CA:1, CD:1 |
ALA7 | 1/10 | O:1, C:1, N:1 | ARG15 | 3/120 | CG:3, C:2, CB:2, CA:1, NE:1 |
ALA3 | 1/10 | O:1, CA:1, N:1 | ARG20 | 1/120 | CB:1, CD:1, NE:1 |
ALA22 | 1/10 | O:1, C:1, N:1 |
ALA9 is detected as SS residue. The angle switch ratio (ANSR) and atom switch contribution (ATSC) are represented for the switch residues.
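The ANSR and ATSC definitions used in Table 2 can be reproduced with a short sketch. The function names are hypothetical; the A9 inputs (two switch angles, O–C–N and O–CA–N, out of 10 possible angles) are the ones stated in the text.

```python
from collections import Counter


def angle_switch_ratio(switch_angles, total_angles):
    """ANSR written as 'switch angles / total angles', unreduced as in Table 2."""
    return f"{len(switch_angles)}/{total_angles}"


def atom_switch_contribution(switch_angles):
    """ATSC: number of switch angles in which each atom participates."""
    return Counter(atom for triple in switch_angles for atom in triple)


# Residue A9 of the Fs peptide, as described in the text.
a9_switch_angles = [("O", "C", "N"), ("O", "CA", "N")]
a9_ansr = angle_switch_ratio(a9_switch_angles, total_angles=10)
a9_atsc = atom_switch_contribution(a9_switch_angles)
print(a9_ansr, dict(a9_atsc))
```

The ratio is kept as an unreduced string so that it matches the way the table reports it (2/10 rather than 1/5).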
SS Residues in β2AR Receptor Correlated to Activation States
In some proteins, certain global properties such as protein-drug binding, protein folding, and major conformational changes necessary for protein function may occur within microseconds to milliseconds,59 which corresponds to the time required for SS residues to jump between the angular states. Running such long MD simulations is computationally expensive when dealing with large proteins. In such scenarios, instead of relying on long trajectories, employing thousands of short trajectories can assist us in capturing the switching functions of amino acids. For this case, we worked on G protein-coupled receptors (GPCRs), which are membrane proteins that play crucial roles in transmitting signals into cells. When a ligand binds to the GPCR, it induces conformational changes in the receptor, causing the receptor to be activated. The activation states are essential for receptor functions and are associated with physiological responses. The study of switch residues in GPCRs can shed light on the mechanisms of cell signaling and the transition pathways between activation states in the receptors at the residue level.1,28,60−63 We analyzed the conformational changes in the β2AR receptor, which plays a crucial role in regulating bronchodilation, heart rate, and blood pressure.64−66 We assessed the switch residues and the correlation between the SS residues and activation states in this receptor. For this study, we randomly selected 5000 short MD trajectories (consisting of 110,000 frames) representing structures of the β2AR receptor in the presence of the partial inverse agonist carazolol. The simulations were performed for both the inactive and active crystal structures of the receptor (referred to as PDB 2RH1(67) and PDB 3P0G,53 respectively). The protein was embedded in a POPC lipid bilayer and solvated with water. The TIP3P water model was used in this simulation. The simulation snapshots were saved every 0.5 ns.
All simulations were carried out with the Gromacs 4.5.3 MD package on Google Exacycle.53,68
Table 3 shows a list of SS residues in the β2AR receptor (see Table S2 for the list of US residues). Similar to Table 2, the ANSR and ATSC metrics are reported for the switch residues. The E268(6.30) and S143(34.55) residues displayed US and SS functions, respectively. During the inactive states of the receptor, the S143(34.55) residue interacts with the D130(3.49) residue within the conserved DRY motif (D130(3.49), R131(3.50), Y132(3.51)).69,70 The DRY region forms an energy barrier that must be broken to achieve the activated state that is necessary for G protein coupling and downstream cellular responses.71,72 The DRY motif also interacts with the highly conserved E268(6.30) switch residue to form an “ionic lock” associated with the receptor’s inactive state.72−74 These interactions are essential for stabilizing conformations and the function of the receptor. As the global property of the GPCR, we focused on the activation states of the receptor.
Table 3. List of SS Residues in the β2AR Receptor
Residue | ANSR | ATSC | Residue | ANSR | ATSC |
---|---|---|---|---|---|
S143(34.55) | 6/20 | OG:4, N:4, C:3, CA:3, CB:2, O:2 | Q142(34.54) | 5/84 | C:3, CG:3, N:3, CA:2, CB:2, O:2 |
I121(3.40) | 12/56 | CG1:7, CG2:6, CA:5, CD1:5, CB:4, N:4, C:3, O:2 | Y219(5.58) | 2/35 | N:2, OH:2, C:1, CA:1 |
I278(6.40) | 4/56 | CG2:3, C:2, CA:2, CB:2, N:1, CG1:1, CD1:1 | M215(5.54) | 3/56 | CG:3, C:2, CB:2, CA:1, N:1 |
See Table S2 for the list of US residues. The angle switch ratio (ANSR) and atom switch contribution (ATSC) are reported for the switch residues. The Ballesteros–Weinstein numbers are utilized to represent the amino acids.
In our previous work,62 we introduced an XGBoost model that predicts the activation state (classification task) and activity level (regression task) of a given receptor with a prediction accuracy of 97.27% and an MAE of 8.55%, respectively. We applied that model to the data set employed in this study with the aim of estimating the correlation between the activation states and the dynamics of SS residues in the receptor. Figure 5a demonstrates how this correlation appeared in the S143(34.55) residue. When the angle in the residue is in the range of ∼1.5–1.8 rad, the average activity level of the receptor is 18.08%, and it increases to 52.66% for angles in the range of ∼2.4–2.7 rad. Figure 5b displays conformational representations of the angles in the inactive and active structures of the receptor. As shown in Figure 5c, the I112(3.31) residue (US residue) mostly fluctuates between the two angular states during the protein dynamics. The figure shows that the average activity levels of the receptor for the two angular states are 34.25% and 34.91%, meaning that the dynamics of the I112(3.31) residue are not correlated with the activation states of the entire protein. To verify the strength of our models, we incorporated a critical structural feature of the receptor into this analysis. Helix 3 (H3) and helix 6 (H6) are two of the seven transmembrane helices that make up the core structure of GPCRs. Changes in the H3–H6 distance contribute significantly to downstream signaling pathways and activation states of the receptor. In this study, we measured the H3–H6 distance as the Cα contact distance between the R131(3.50)–L272(6.34) amino acids. Figure 5d illustrates that the dynamics of S143(34.55) as the SS residue, the activation states of the receptor, and the H3–H6 distances are heavily correlated with each other. The other SS residues shown in Table 3 (I121(3.40), I278(6.40), Q142(34.54), Y219(5.58), and M215(5.54)) exhibit similar correlations with the activation states and H3–H6 distances.
Figure 5e highlights all switch residues throughout the β2AR receptor structure in red. The N322(7.49), L324(7.51), and I325(7.52) switch residues belong to the NPLIY motif within the receptor. The NPxxY motif is a conserved sequence in GPCRs (the xx amino acids can vary in different GPCRs) that has significant effects on the activation process, impacting ligand affinity and G protein coupling.75,76 Furthermore, during the activation process of receptors, the conserved polar network in GPCRs undergoes essential rearrangements that contribute to stabilizing the receptor’s structure and aiding intracellular signaling.77 Our model detected I278(6.40) as an SS residue within the polar network.77 Figure 5e also shows that the I205(5.45), V206(5.46), and I309(7.36) switch residues are identified within 7 Å of the ligand. Understanding the dynamics and functions of amino acids within the binding pocket and its proximity can provide insights into ligand–protein interactions. Each of these switch residues can play significant roles in ligand-binding mechanisms and perform an induced fit for optimal signal transduction. Overall, our model demonstrates that amino acids already known to be crucial to protein properties may exhibit a switching function. Studying the dynamics of SS residues and their correlation with activation states in GPCRs provides valuable insights into the mechanisms of conformational changes in the receptors. This understanding will also facilitate the design of more effective drugs targeting GPCRs by specifically interacting with the key amino acids associated with activation states in the receptors.
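The per-state averaging behind the activity levels quoted for Figure 5a,c can be sketched as a grouped mean. This is an illustrative sketch only: the function name and the toy numbers are made up, and the angular ranges are the approximate ones quoted in the text for the SS residue.

```python
def mean_activity_by_state(angles, activity, ranges):
    """Average predicted activity level over the frames in each angular range.

    angles and activity are per-frame sequences of equal length; ranges maps
    a state name to an inclusive (low, high) interval in radians.
    """
    result = {}
    for name, (low, high) in ranges.items():
        values = [act for ang, act in zip(angles, activity) if low <= ang <= high]
        result[name] = sum(values) / len(values) if values else None
    return result


# Toy per-frame data (made up, not the values from the study).
toy_angles = [1.6, 1.7, 2.5, 2.6]
toy_activity = [10.0, 20.0, 50.0, 60.0]
toy_ranges = {"state_1": (1.5, 1.8), "state_2": (2.4, 2.7)}

means = mean_activity_by_state(toy_angles, toy_activity, toy_ranges)
print(means)
```

A large gap between the two means, as reported for the SS residue, suggests that the angular state tracks the activation state, whereas nearly equal means, as reported for the US residue, suggest no such relationship.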
Conclusion
In this study, we developed a framework to characterize the structure and dynamics of amino acid residues in proteins using MD simulations and ML models. We mainly focused on residues containing angle(s) that switch between two distinct angular states without passing through any intermediate states. We implemented ML models to classify such residues in proteins. Our analysis indicated that the Random Forest model achieved an accuracy of 96.94% in classifying switch residues. Notably, our study revealed that switch residues, despite exhibiting similar densities of angular states, can differ substantially in terms of their structural dynamics and stability. The stable switch (SS) residue tends to remain stable in one of the angular states, and its conformation changes with a single transition to another state during the protein dynamics, while the unstable switch (US) residue constantly fluctuates between the two angular states. To distinguish between SS and US residues, we developed another method using the Logistic Regression model. This model exhibits an accuracy of 98.97% in classifying these residues. In addition, we found that there is a strong correlation between the dynamics of SS residues and the protein’s global properties. We confirmed this by evaluating the correlation between the SS residue and the folding characteristics, measured as RMSD, in the Fs peptide protein. Additionally, this study has demonstrated the correlation between the SS residues and activation states in the β2AR receptor. This knowledge of the switching function of amino acids serves as a foundation for advancing our understanding and analysis of protein characteristics, which can be applied to protein engineering, protein-based therapeutics, and drug discovery approaches.
Acknowledgments
The authors gratefully acknowledge that this work is supported by the Center for Machine Learning in Health (CMLH) at Carnegie Mellon University and a start-up fund from the Mechanical Engineering Department at CMU.
Data Availability Statement
The necessary information containing the training data sets, codes, and scripts for ML models used in this study is available here: https://github.com/pmollaei/AminoSwitch.
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jctc.3c00665.
The Supporting Information includes selected atoms in amino acids, a comparison between basic vs dihedral angles, the Instability ratio, and a list of switch residues in the β2AR receptor (PDF)
The authors declare no competing financial interest.
References
- Tzeng S.-R.; Kalodimos C. G. Protein dynamics and allostery: an NMR view. Curr. Opin. Struct. Biol. 2011, 21, 62–67. 10.1016/j.sbi.2010.10.007. [DOI] [PubMed] [Google Scholar]
- Yang L.-Q.; Sang P.; Tao Y.; Fu Y.-X.; Zhang K.-Q.; Xie Y.-H.; Liu S.-Q. Protein dynamics and motions in relation to their functions: several case studies and the underlying mechanisms. J. Biomol. Struct. Dyn. 2014, 32, 372–393. 10.1080/07391102.2013.770372. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Damodaran S.Amino acids, peptides and proteins. Fennema’s food chemistry; Taylor & Francis: 2008; Vol. 4, pp 425–439. [Google Scholar]
- Salsbury F. R. Jr Molecular dynamics simulations of protein dynamics and their relevance to drug discovery. Current opinion in pharmacology 2010, 10, 738–744. 10.1016/j.coph.2010.09.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao H.; Caflisch A. Molecular dynamics in drug design. European journal of medicinal chemistry 2015, 91, 4–14. 10.1016/j.ejmech.2014.08.004. [DOI] [PubMed] [Google Scholar]
- Karplus M.; McCammon J. A. Molecular dynamics simulations of biomolecules. Nature structural biology 2002, 9, 646–652. 10.1038/nsb0902-646. [DOI] [PubMed] [Google Scholar]
- Levitt M.; Warshel A. Computer simulation of protein folding. Nature 1975, 253, 694–698. 10.1038/253694a0. [DOI] [PubMed] [Google Scholar]
- Amadei A.; Linssen A. B.; Berendsen H. J. Essential dynamics of proteins. Proteins: Struct., Funct., Bioinf. 1993, 17, 412–425. 10.1002/prot.340170408. [DOI] [PubMed] [Google Scholar]
- Daggett V.; Fersht A. R. Is there a unifying mechanism for protein folding?. Trends in biochemical sciences 2003, 28, 18–25. 10.1016/S0968-0004(02)00012-9. [DOI] [PubMed] [Google Scholar]
- Nisthal A.; Wang C. Y.; Ary M. L.; Mayo S. L. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc. Natl. Acad. Sci. U. S. A. 2019, 116, 16367–16377. 10.1073/pnas.1903888116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Branden C. I.; Tooze J.. Introduction to protein structure; Garland Science: 2012. [Google Scholar]
- Murphy K. P.Protein structure, stability, and folding; Springer Science & Business Media: 2008; Vol. 168. [Google Scholar]
- Schulz G. E.; Schirmer R. H.. Principles of protein structure; Springer Science & Business Media: 2013. [Google Scholar]
- Branden C.; Tooze J.. Introduction to Protein Structure; Garland Science:New York, 1999. [Google Scholar]
- Orengo C. A.; Todd A. E.; Thornton J. M. From protein structure to function. Curr. Opin. Struct. Biol. 1999, 9, 374–382. 10.1016/S0959-440X(99)80051-7. [DOI] [PubMed] [Google Scholar]
- Han K.-L.; Zhang X.; Yang M.-j.. Protein conformational dynamics; Springer: 2014; Vol. 805. [Google Scholar]
- Hammes-Schiffer S.; Benkovic S. J. Relating protein motion to catalysis. Annu. Rev. Biochem. 2006, 75, 519–541. 10.1146/annurev.biochem.75.103004.142800. [DOI] [PubMed] [Google Scholar]
- Alberts B.; Johnson A.; Lewis J.; Raff M.; Roberts K.; Walter P.. Molecular Biology of the Cell, 4th ed.; Garland Science: 2002. [Google Scholar]
- Koshland D. E. Conformational changes: how small is big enough?. Nature medicine 1998, 4, 1112–1114. 10.1038/2605. [DOI] [PubMed] [Google Scholar]
- Karplus M.; Kuriyan J. Molecular dynamics and protein function. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 6679–6685. 10.1073/pnas.0408930102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baldwin J. M. The probable arrangement of the helices in G protein-coupled receptors. EMBO journal 1993, 12, 1693–1703. 10.1002/j.1460-2075.1993.tb05814.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Grant B. J.; Gorfe A. A.; McCammon J. A. Large conformational changes in proteins: signaling and other functions. Curr. Opin. Struct. Biol. 2010, 20, 142–147. 10.1016/j.sbi.2009.12.004
- Zacharias M. Accounting for conformational changes during protein–protein docking. Curr. Opin. Struct. Biol. 2010, 20, 180–186. 10.1016/j.sbi.2010.02.001
- Henzler-Wildman K.; Kern D. Dynamic personalities of proteins. Nature 2007, 450, 964–972. 10.1038/nature06522
- Frauenfelder H.; Sligar S. G.; Wolynes P. G. The energy landscapes and motions of proteins. Science 1991, 254, 1598–1603. 10.1126/science.1749933
- Kleckner I. R.; Foster M. P. An introduction to NMR-based approaches for measuring protein dynamics. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics 2011, 1814, 942–968. 10.1016/j.bbapap.2010.10.012
- Brändén G.; Neutze R. Advances and challenges in time-resolved macromolecular crystallography. Science 2021, 373, eaba0954. 10.1126/science.aba0954
- Shaw D. E.; Maragakis P.; Lindorff-Larsen K.; Piana S.; Dror R. O.; Eastwood M. P.; Bank J. A.; Jumper J. M.; Salmon J. K.; Shan Y.; Wriggers W.; et al. Atomic-level characterization of the structural dynamics of proteins. Science 2010, 330, 341–346. 10.1126/science.1187409
- Drenth J. Principles of protein X-ray crystallography; Springer Science & Business Media: 2007.
- Cavanagh J.; Fairbrother W. J.; Palmer A. G. III; Skelton N. J. Protein NMR spectroscopy: principles and practice; Academic Press: 1996.
- Cundall R. Time-resolved fluorescence spectroscopy in biochemistry and biology; Springer Science & Business Media: 2013; Vol. 69.
- Daune M. Molecular biophysics: structures in motion; Oxford University Press: 1999.
- Torre R. Time-resolved spectroscopy in complex liquids; Springer: 2007.
- Karp G. Cell and molecular biology: concepts and experiments; John Wiley & Sons: 2009.
- Finkelstein A. V.; Ptitsyn O. Protein physics: a course of lectures; Elsevier: 2016.
- Allen M. P. Introduction to molecular dynamics simulation. Computational soft matter: from synthetic polymers to proteins; John von Neumann Institute for Computing: 2004; Vol. 23, pp 1–28.
- Kukol A. Molecular modeling of proteins; Springer: 2008; Vol. 443.
- Freddolino P. L.; Harrison C. B.; Liu Y.; Schulten K. Challenges in protein-folding simulations. Nat. Phys. 2010, 6, 751–758. 10.1038/nphys1713
- Schlick T. Molecular modeling and simulation: an interdisciplinary guide; Springer: 2010; Vol. 2.
- Hollingsworth S. A.; Dror R. O. Molecular dynamics simulation for all. Neuron 2018, 99, 1129–1143. 10.1016/j.neuron.2018.08.011
- Rapaport D. C. The art of molecular dynamics simulation; Cambridge University Press: 2004.
- Hastie T.; Tibshirani R.; Friedman J. H. The elements of statistical learning: data mining, inference, and prediction; Springer: 2009; Vol. 2.
- Bishop C. M.; Nasrabadi N. M. Pattern recognition and machine learning; Springer: 2006; Vol. 4.
- Kuhn M.; Johnson K. Applied predictive modeling; Springer: 2013; Vol. 26.
- Warren J.; Marz N. Big Data: Principles and best practices of scalable realtime data systems; Simon and Schuster: 2015.
- James G.; Witten D.; Hastie T.; Tibshirani R. An introduction to statistical learning; Springer: 2013; Vol. 112.
- Murphy K. P. Machine learning: a probabilistic perspective; MIT Press: 2012.
- Müller A. C.; Guido S. Introduction to machine learning with Python: a guide for data scientists; O’Reilly Media, Inc.: 2016.
- Farimani A. B.; Feinberg E. N.; Pande V. S. Binding Pathway of Opiates to μOpioid Receptors Revealed by Unsupervised Machine Learning. arXiv, 1804.08206, 2018.
- Chodera J. D.; Singhal N.; Pande V. S.; Dill K. A.; Swope W. C. Automatic discovery of metastable states for the construction of Markov models of macromolecular conformational dynamics. J. Chem. Phys. 2007, 126, 155101. 10.1063/1.2714538
- Feinberg E. N.; Farimani A. B.; Uprety R.; Hunkele A.; Pasternak G. W.; Majumdar S.; Pande V. S. Machine Learning Harnesses Molecular Dynamics to Discover New μOpioid Chemotypes. arXiv, 1803.04479, 2018.
- Pande V. S.; Beauchamp K.; Bowman G. R. Everything you wanted to know about Markov State Models but were afraid to ask. Methods 2010, 52, 99–105. 10.1016/j.ymeth.2010.06.002
- Rasmussen S. G.; Choi H.-J.; Fung J. J.; Pardon E.; Casarosa P.; Chae P. S.; DeVree B. T.; Rosenbaum D. M.; Thian F. S.; Kobilka T. S.; et al. Structure of a nanobody-stabilized active state of the β2 adrenoceptor. Nature 2011, 469, 175–180. 10.1038/nature09648
- Breiman L. Random forests. Machine Learning 2001, 45, 5–32. 10.1023/A:1010933404324
- Chen T.; Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM Digital Library: 2016; pp 785–794.
- Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 2011, 12, 2825–2830.
- McGibbon R. T. Fs MD Trajectories. figshare, Dataset, 2014; 10.6084/m9.figshare.1030363.v1 (accessed 10 April 2023).
- DeLano W. L. Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr. 2002, 40, 82–92.
- Dror R. O.; Dirks R. M.; Grossman J.; Xu H.; Shaw D. E. Biomolecular simulation: a computational microscope for molecular biology. Annual Review of Biophysics 2012, 41, 429–452. 10.1146/annurev-biophys-042910-155245
- Kobilka B. The structural basis of G-protein-coupled receptor signaling (Nobel Lecture). Angew. Chem., Int. Ed. 2013, 52, 6380–6388. 10.1002/anie.201302116
- Cahill T. J. III; Thomsen A. R.; Tarrasch J. T.; Plouffe B.; Nguyen A. H.; Yang F.; Huang L.-Y.; Kahsai A. W.; Bassoni D. L.; Gavino B. J.; et al. Distinct conformations of GPCR−β-arrestin complexes mediate desensitization, signaling, and endocytosis. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, 2562–2567. 10.1073/pnas.1701529114
- Mollaei P.; Barati Farimani A. Activity Map and Transition Pathways of G Protein-Coupled Receptor Revealed by Machine Learning. J. Chem. Inf. Model. 2023, 63, 2296–2304. 10.1021/acs.jcim.3c00032
- Yadav P.; Mollaei P.; Cao Z.; Wang Y.; Barati Farimani A. Prediction of GPCR activity using Machine Learning. Computational and Structural Biotechnology Journal 2022, 20, 2564–2573. 10.1016/j.csbj.2022.05.016
- Liggett S. B. β2-Adrenergic receptor pharmacogenetics. American Journal of Respiratory and Critical Care Medicine 2000, 161, S197–S201. 10.1164/ajrccm.161.supplement_2.a1q4-10
- Liggett S. B. The pharmacogenetics of β2-adrenergic receptors: relevance to asthma. Journal of Allergy and Clinical Immunology 2000, 105, S487–S492. 10.1016/S0091-6749(00)90048-4
- Brodde O.-E. β-Adrenoceptor blocker treatment and the cardiac β-adrenoceptor-G-protein(s)-adenylyl cyclase system in chronic heart failure. Naunyn-Schmiedeberg’s Archives of Pharmacology 2007, 374, 361–372. 10.1007/s00210-006-0125-7
- Cherezov V.; Rosenbaum D. M.; Hanson M. A.; Rasmussen S. G.; Thian F. S.; Kobilka T. S.; Choi H.-J.; Kuhn P.; Weis W. I.; Kobilka B. K.; et al. High-resolution crystal structure of an engineered human β2-adrenergic G protein–coupled receptor. Science 2007, 318, 1258–1265. 10.1126/science.1150577
- Kohlhoff K. J.; Shukla D.; Lawrenz M.; Bowman G. R.; Konerding D. E.; Belov D.; Altman R. B.; Pande V. S. Cloud-based simulations on Google Exacycle reveal ligand modulation of GPCR activation pathways. Nat. Chem. 2014, 6, 15–21. 10.1038/nchem.1821
- Manglik A.; Kruse A. C.; Kobilka T. S.; Thian F. S.; Mathiesen J. M.; Sunahara R. K.; Pardo L.; Weis W. I.; Kobilka B. K.; Granier S. Crystal structure of the μ-opioid receptor bound to a morphinan antagonist. Nature 2012, 485, 321–326. 10.1038/nature10954
- Mitra A.; Sarkar A.; Borics A. Universal Properties and Specificities of the β2-Adrenergic Receptor-Gs Protein Complex Activation Mechanism Revealed by All-Atom Molecular Dynamics Simulations. International Journal of Molecular Sciences 2021, 22, 10423. 10.3390/ijms221910423
- Giraldo J.; Pin J.-P. G protein-coupled receptors: From structure to function; Royal Society of Chemistry: 2011.
- Ballesteros J. A.; Jensen A. D.; Liapakis G.; Rasmussen S. G.; Shi L.; Gether U.; Javitch J. A. Activation of the β2-adrenergic receptor involves disruption of an ionic lock between the cytoplasmic ends of transmembrane segments 3 and 6. J. Biol. Chem. 2001, 276, 29171–29177. 10.1074/jbc.M103747200
- Topiol S.; Sabio M. X-ray structure breakthroughs in the GPCR transmembrane region. Biochemical Pharmacology 2009, 78, 11–20. 10.1016/j.bcp.2009.02.012
- Bhattarai A.; Wang J.; Miao Y. G-protein-coupled receptor–membrane interactions depend on the receptor activation state. Journal of Computational Chemistry 2020, 41, 460–471. 10.1002/jcc.26082
- Fritze O.; Filipek S.; Kuksa V.; Palczewski K.; Hofmann K. P.; Ernst O. P. Role of the conserved NPxxY(x)5,6F motif in the rhodopsin ground state and during activation. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 2290–2295. 10.1073/pnas.0435715100
- He R.; Browning D. D.; Ye R. D. Differential roles of the NPXXY motif in formyl peptide receptor signaling. J. Immunol. 2001, 166, 4099–4105. 10.4049/jimmunol.166.6.4099
- Huang W.; Manglik A.; Venkatakrishnan A.; Laeremans T.; Feinberg E. N.; Sanborn A. L.; Kato H. E.; Livingston K. E.; Thorsen T. S.; Kling R. C.; et al. Structural insights into μ-opioid receptor activation. Nature 2015, 524, 315–321. 10.1038/nature14886
Data Availability Statement
The training data sets, code, and scripts for the ML models used in this study are available at https://github.com/pmollaei/AminoSwitch.