Abstract
Pepsin-like aspartic proteases (PAPs) are a class of aspartic proteases that share substantial structural similarity with human pepsin. One of the key structural features of PAPs is a β-hairpin motif known as the flap. The biological function of PAPs is highly dependent on the conformational dynamics of the flap region. In apo PAPs, the conformational dynamics of the flap is dominated by the rotational degrees of freedom associated with the χ1 and χ2 angles of a conserved Tyr (or Phe in some cases). However, it is plausible that dihedral order parameters associated with several other residues also play crucial roles in the conformational dynamics of apo PAPs. Owing to their size, the complexity of their conformational dynamics and their clinical significance (drug targets for malaria, Alzheimer's disease etc.), PAPs provide a challenging testing ground for computational and experimental methods that aim to understand conformational dynamics and molecular recognition in biomolecules. Opening of the flap region is necessary to accommodate a substrate/ligand in the active site of PAPs. The big challenge is to gain atomistic insight into how reversible ligand binding/unbinding (molecular recognition) affects the conformational dynamics. Recent reports of binding constants (Ki, Kd) and thermodynamic parameters (ΔH, TΔS, and ΔG) for macro-cyclic ligands bound to BACE1 (a member of the PAP family) provide a perfect challenge for computational methods that predict binding free energies and kinetics beyond typical test systems e.g. benzamide–trypsin: how to deal with big ligands with multiple torsional angles, and how to select optimum order parameters to study reversible ligand binding/unbinding. In this work, I reviewed several order parameters that have been proposed to capture the conformational dynamics and molecular recognition in PAPs. I further highlighted how machine learning methods can be used to construct order parameters in the context of PAPs.
I then proposed some open ideas and challenges in the context of molecular simulation and put forward my case on how biophysical experiments e.g. NMR, time-resolved FRET etc. can be used in conjunction with biomolecular simulation to gain complete atomistic insights into the conformational dynamics of PAPs.
Introduction
Aspartic proteases are a class of enzymes that contain two highly conserved aspartates in the active site. These enzymes use an active water molecule bound (by H-bond interaction) to the aspartates for catalysis of their substrates. Based on their three-dimensional structural similarity, aspartic proteases can be divided into two categories: (1) pepsin-like aspartic proteases (PAPs) and (2) retroviral aspartic proteases (RAPs).1 PAPs belong to the A01 family whereas RAPs belong to the A02 family of proteases within the MEROPS database2 (Table 1).
Structures (PDB IDs) and members of a few representative aspartic proteases. Abbreviations: BACE: β-secretase, HIV: human immunodeficiency viruses, SIV: simian immunodeficiency virus.
PAPs (MEROPS: A01) | PDBs | RAPs (MEROPS: A02) | PDBs |
---|---|---|---|
Human pepsin | 1PSN, 1QRP, 1PSO | HIV-1 | 4L1A, 1TW7, 4NKK, 2PC0 |
Human renin | 1RNE, 1HRN, 3D91 | HIV-2 | 1IDA |
BACE-1 | 1W50, 3TPL, 1SGZ | SIV | 1TCW, 1AZ5, 1SIP |
BACE-2 | 3ZKG, 3ZKI, 2EWY | | |
Cathepsin D | 1LYA, 1LYB | | |
Plasmepsin I | 4CKU, 3QRV | | |
Plasmepsin II | 1LF4, 1LF3, 4CKU, 4Z22, 1SME, 2BJU | | |
Plasmepsin IV | 1LS5 | | |
Plasmepsin V | 4ZL4, 6C4G | | |
Plasmepsin IX | N/A | | |
Plasmepsin X | N/A | | |
Penicillopepsin | 1APW, 1PPM, 1PPK, 1APU | | |
In recent years, HIV protease (which belongs to the RAPs) has gained significant attention in structural and computational biology due to its role in HIV/AIDS.3 A similar argument can be put forward for the β-secretase enzyme BACE1, which belongs to the PAPs and is a promising drug target for Alzheimer's disease. A comprehensive review‡ on PAPs was published by Ben Dunn in 2002.1 Since then several drug molecules targeting BACE1 have been proposed.4 However, conformational dynamics and molecular recognition in PAPs have not been extensively studied by biophysical experiments e.g. NMR, single molecule FRET, Raman spectroscopy etc. Recent computational studies on BACE1 and plasmepsins (malarial PAPs) have identified key order parameters (dynamical structural properties) that can be further explored by biophysical experiments combined with state-of-the-art molecular simulation.5–10 The purpose of this review is to highlight common order parameters that govern the conformational dynamics and molecular recognition of PAPs and to highlight the need for integrated computational and experimental approaches to gain atomistic insights into the molecular mechanism of PAPs. I will also highlight how machine learning methods can be used for this purpose and will put forward my case for using PAPs as a test system for methodological development on the computational and experimental frontiers. In this review, I will use plasmepsins (Plms) and BACE1 as test systems to guide the discussion. PAPs share a high degree of structural similarity, hence I believe that computational/experimental methodologies applied to the aforementioned proteins will be broadly applicable to other enzymes of this class. Biophysical experiments, machine learning and biomolecular simulation are the three pillars that underpin drug discovery in the 21st century.11 I will further discuss how the structural similarity of PAPs can be exploited in the context of drug re-purposing towards plasmepsins (drug targets for malaria).
Structural features
The main aim of this section is to clarify the language, organize the concepts and define structural features across a few representative PAPs. Common sequence features that are conserved in PAPs are as follows: (1) the presence of two catalytic aspartates (often known as the catalytic site) and (2) the presence of the –Asp–Thr–Gly–Ser– sequence (Fig. 1). Structurally, PAPs can be characterised by two main regions: (1) the flap region and (2) the coil region.3,5,6 The name coil region is not accurate, since the so-called coil region consists of both β-strand and coil structures. In this review, it will be denoted as the CS region (coil-strand region) (Fig. 2 and Table 2).
Residue numbers of the flap and CS region in PlmII and BACE1.
Protein | Flap | CS region |
---|---|---|
PlmII | 58–88 | 282–302 |
BACE1 | 52–82 | 314–334 |
The flap region acts as a lid over the catalytic site and governs the entry of the substrate or small molecule inhibitors. The extent of flap opening (described in detail in later sections) also dictates the volume of the binding site, i.e. an open flap conformation increases the volume of the binding site, which can then accommodate bigger ligands. The CS region also plays a role in ligand/substrate binding but its role is not as pronounced as that of the flap region.
One of the striking structural features of PAPs is the presence of conserved tyrosine and tryptophan residues. The tyrosine (Tyr) residue is highly conserved in the flap region of most PAPs. Tryptophan (Trp) is another conserved residue in PAPs. In most cases, Trp is present in the S1 region (Fig. 3). However, in BACE1 and BACE2 the conserved Trp is part of the flap region (Fig. 3). Plasmepsin V (PlmV), an emerging drug target against malaria, does not possess the conserved Trp residue,14 whereas in PlmIX and PlmX the conserved Tyr residue is replaced by phenylalanine (Phe) (Fig. 3). Visual inspection of several 3D crystal structures of Plm and BACE-1 shows that the orientation of Tyr and Trp influences the extent of flap opening (Fig. 4). In the next section, I will propose/review several order parameters that can be used to study the extent of flap opening and its connection with the orientation of Tyr and Trp.
Order parameters to study conformational dynamics
Order parameter is a commonly used term in physical chemistry. A poor man's working definition of an order parameter (sometimes referred to as a reaction co-ordinate) in the context of biomolecular conformational dynamics can be expressed as follows: ‘geometric or abstract co-ordinate systems which capture conformational changes along a pathway’. In biomolecular simulation, order parameters are often known as collective variables (CVs). Some of the commonly used geometric order parameters are distances between two atoms, bond lengths, torsional angles etc. Abstract order parameters include vectorised linear combinations e.g. principal components, time-lagged independent components, latent variables from a variational auto-encoder etc. In this section, I will discuss how geometric and abstract order parameters can be used to capture conformational dynamics and molecular recognition in PAPs.
Geometric order parameters
Karubiu et al. proposed the Cα–Cα distance (DIST2) between a catalytic Asp and the flap tip residue (the flap tip residue varies across different PAPs) in PlmII as an order parameter to capture the extent of flap opening in unbiased all-atom molecular dynamics (MD) simulation.5
Rule of thumb to select flap tip residue
Residue number of the conserved Tyr/Phe + 1. For example, in PlmII it will be Val78 (77(Tyr) + 1).
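As a minimal sketch of how DIST2 can be evaluated frame by frame, the snippet below applies the rule of thumb for the flap tip and computes a Cα–Cα distance from an array of coordinates. The function names, the random coordinates standing in for a trajectory, and the residue number 34 used for the catalytic Asp of PlmII are assumptions made for illustration; they are not taken from the works cited above.

```python
import numpy as np

def flap_tip_resid(conserved_tyr_resid):
    """Rule of thumb from the text: flap tip = conserved Tyr/Phe residue + 1."""
    return conserved_tyr_resid + 1

def ca_distance(ca_xyz, resids, resid_a, resid_b):
    """Per-frame distance between the C-alpha atoms of two residues.

    ca_xyz : (n_frames, n_residues, 3) array of C-alpha coordinates (nm)
    resids : residue numbers labelling axis 1 of ca_xyz
    """
    resids = np.asarray(resids)
    i = int(np.flatnonzero(resids == resid_a)[0])
    j = int(np.flatnonzero(resids == resid_b)[0])
    return np.linalg.norm(ca_xyz[:, i, :] - ca_xyz[:, j, :], axis=1)

# Random coordinates stand in for a real trajectory (illustration only).
rng = np.random.default_rng(0)
resids = np.arange(1, 330)
ca_xyz = rng.normal(size=(100, resids.size, 3))

tip = flap_tip_resid(77)   # Tyr77 -> Val78 in PlmII
CATALYTIC_ASP = 34         # assumed residue number, not given in the text
dist2 = ca_distance(ca_xyz, resids, CATALYTIC_ASP, tip)
```

With a real trajectory, `ca_xyz` would hold the Cα coordinates extracted by the analysis package of choice, and `dist2` could be histogrammed or plotted against time.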
This distance criterion was inspired by the order parameter proposed to study spontaneous flap opening and closing in apo HIV protease.21,22 Karubiu et al. further proposed a distance metric (DIST3) which captures the opening/closing motion associated with the CS domain (Fig. 5). A recent study by Bhakat and Söderhjelm6 showed that the fluctuations of these two order parameters (DIST2 and DIST3) are uncorrelated, which led to the hypothesis that in apo PAPs the conformational dynamics of the flap and CS domains are independent of each other (Fig. 12). Both DIST2 and DIST3 were further applied to BACE1 and other plasmepsins in order to gain insights into conformational dynamics from classical MD simulations. Bhakat and Söderhjelm performed test calculations (not published) in which they used DIST2 and DIST3 as CVs within the metadynamics framework. The authors observed that metadynamics biasing along these distances led to distorted flap conformations, which suggests that these distance-based order parameters are not optimal choices for driving reversible conformational dynamics in PAPs but can rather be used as quantitative measures of the extent of flap/CS opening/closing. The authors further proposed two additional distance-based order parameters, COMflap and COMCS, for PlmII with the aim that both would simultaneously act as quantitative measures of the conformational dynamics and as CVs in enhanced sampling calculations e.g. metadynamics.
COMflap: distance between centre of mass (COM) of the catalytic aspartates (only Cα atoms of catalytic aspartates) and COM of the flap region (Cα atoms of residues 58–88).
COMCS: distance between COM of the catalytic aspartates (only Cα atoms of catalytic aspartates) and COM of the CS region (Cα atoms of residues 282–302).
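The two COM-based definitions above can be sketched as follows. Because only Cα atoms enter each group, all atoms carry the same mass and the centre of mass reduces to a plain average. The residue ranges are those of PlmII (Table 2); the catalytic Asp numbers (34 and 214), the function name and the random stand-in coordinates are assumptions for illustration only.

```python
import numpy as np

def com_distance(ca_xyz, resids, group_a, group_b):
    """Per-frame distance between the centres of two groups of C-alpha atoms.

    All C-alpha atoms have the same mass, so the COM is the plain mean.
    """
    resids = np.asarray(resids)
    mask_a = np.isin(resids, list(group_a))
    mask_b = np.isin(resids, list(group_b))
    com_a = ca_xyz[:, mask_a, :].mean(axis=1)
    com_b = ca_xyz[:, mask_b, :].mean(axis=1)
    return np.linalg.norm(com_a - com_b, axis=1)

FLAP = range(58, 89)         # residues 58-88 of PlmII (Table 2)
CS = range(282, 303)         # residues 282-302 of PlmII (Table 2)
CATALYTIC_ASP = (34, 214)    # assumed residue numbers, not given in the text

# Random coordinates stand in for a real trajectory (illustration only).
rng = np.random.default_rng(0)
resids = np.arange(1, 330)
ca_xyz = rng.normal(size=(50, resids.size, 3))

com_flap = com_distance(ca_xyz, resids, CATALYTIC_ASP, FLAP)
com_cs = com_distance(ca_xyz, resids, CATALYTIC_ASP, CS)
```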
The authors further used COMflap and COMCS as CVs in two-dimensional well-tempered metadynamics simulations. Metadynamics simulations with the COM CVs showed significant hysteresis and a lack of reversible sampling of the conformational landscape, as these CVs do not represent the slow order parameters (more on this later) necessary for optimum conformational sampling (Fig. 13).
Karubiu et al. further proposed a distance-based order parameter (Fig. 5) which measures the distance (DIST1) between the flap and CS regions as a function of time. Because the dynamics of these regions are uncorrelated, DIST1 does not provide any additional information compared to DIST2/DIST3. A few other distance-based order parameters have also been proposed by different research groups. Gorfe and Caflisch9 proposed (in the context of BACE1) the distance between the Cα of the flap tip residue Thr72 and the Cβ of the catalytic aspartate as a measure of flap opening (Y). This order parameter is almost identical to the DIST2 proposed by Karubiu et al. The authors further proposed a distance criterion analogous to DIST1: the distance between the Cα atoms of the flap tip Thr72 and Thr329 of the CS region (Y′). Spronk and Carlson10 introduced a distance parameter (in BACE1) analogous to DIST2 (denoted as z) which measures the distance between the Cα of Gln73 and the Cβ of Asp32 (Fig. 6). Due to their similarity, either DIST2 or z can be used in PAPs to capture the extent of flap opening. Recently, Shen and co-workers23 proposed the following distance parameters to measure the extent of flap opening in human renin: (1) the distance between Tyr-83-OH and Asp-38-CG and (2) the distance between Ser-84-CB and Asp-226-CG (Fig. 27 in ESI†). The distance between Tyr-83-OH and Asp-38-CG can be a deceptive measure of flap opening, as the position of Tyr-83-OH is highly dependent on the rotational degrees of freedom (χ1 and χ2) of Tyr. Flipping (discussed later) of the Tyr side-chain changes the position of Tyr-83-OH, in which case the distance between Tyr-83-OH and Asp-38-CG will not measure the true extent of flap opening. The distance between Ser-84-CB and Asp-226-CG involves two stable anchor points and can act as an alternative to the Tyr-83-OH to Asp-38-CG distance.
Additional order parameters have been proposed to capture flap twisting. Spronk and Carlson proposed the flap twisting angle, ϕ, as the dihedral angle involving Trp76C–Val69N–Thr72CA–Gln73CA (Fig. 7). However, this definition is specific to BACE1. A general order parameter for capturing flap twisting in PAPs can be expressed as a dihedral angle involving the following atoms:
(i + 5)C–(i − 2)N–(i + 1)CA–(i + 2)CA, where i is the residue number of the conserved Tyr.
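The general definition can be illustrated with a short sketch that lists the four atoms for a given Tyr residue number and evaluates a signed dihedral angle from four points. The function names are hypothetical, and the four-point test geometries are artificial; for BACE1 the conserved Tyr is residue 71, which recovers the Trp76C–Val69N–Thr72CA–Gln73CA definition.

```python
import numpy as np

def flap_twist_atoms(i):
    """Atoms of the general flap-twist dihedral; i = residue number of the conserved Tyr."""
    return [(i + 5, "C"), (i - 2, "N"), (i + 1, "CA"), (i + 2, "CA")]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1   # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1   # part of b2 perpendicular to the central bond
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

bace1_atoms = flap_twist_atoms(71)   # Tyr71 in BACE1

# Artificial geometries: eclipsed (cis) and anti (trans) arrangements.
cis = dihedral(*np.array([[1.0, 0, 0], [0, 0, 0], [0, 0, 1.0], [1.0, 0, 1.0]]))
trans = dihedral(*np.array([[1.0, 0, 0], [0, 0, 0], [0, 0, 1.0], [-1.0, 0, 1.0]]))
```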
Recently, Bhakat and Söderhjelm proposed the torsion angles (χ1 and χ2) of the conserved Tyr (in PlmII and BACE1) as order parameters to capture conformational dynamics in PAPs.6 A typical free energy profile of Tyr (along χ1) has three free energy minima which correspond to three different side-chain orientations: gauche+, gauche− and trans. MD and metadynamics investigations using PlmII and BACE1 predicted that in apo PAPs the gauche+ and trans conformations are equally populated§ (Fig. 8 and 9). Gauche+ is stabilised by a side-chain H-bond interaction between Tyr and Trp (denoted as the normal state), whereas the trans conformation is stabilised by a side-chain H-bond interaction between Tyr and one of the catalytic Asp residues (denoted as the flipped state; Shen and co-workers recently sampled this interaction using constant pH molecular dynamics on human renin) (Fig. 10). Gauche− is predicted to be the minor population, which in the case of BACE1 is stabilised by a side-chain H-bond interaction between Tyr and Lys (PDB:1SGZ25) (Fig. 4 and 8). A typical way to understand the rotation of Tyr and its connection with flap elevation is to plot a free energy profile involving DIST2/z against χ1 (Fig. 10).¶
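A free energy profile along χ1 can be estimated from unbiased sampling through F(χ1) = −kBT ln p(χ1). The sketch below applies this relation to synthetic χ1 values drawn around three assumed rotamer centres; the populations (45%, 45%, 10%), the centres and the widths are invented for illustration and do not reproduce the published profiles.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def free_energy_profile(chi1, temperature=300.0, n_bins=36):
    """F = -kB*T*ln(p) on a periodic histogram of chi1 (radians), shifted so min(F) = 0."""
    hist, edges = np.histogram(chi1, bins=n_bins, range=(-np.pi, np.pi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -KB * temperature * np.log(hist)   # empty bins give +inf
    f -= f[np.isfinite(f)].min()
    return centers, f

# Synthetic chi1 samples (illustration only): two major states and one minor state.
rng = np.random.default_rng(0)
chi1 = np.concatenate([
    rng.vonmises(-1.1, 10.0, 9000),   # around the gauche+ values of Table 4
    rng.vonmises(3.1, 10.0, 9000),    # around the trans values of Table 4
    rng.vonmises(1.0, 10.0, 2000),    # around the gauche- values of Table 4
])
centers, f = free_energy_profile(chi1)
```

For biased simulations such as metadynamics the histogram would have to be reweighted first; this sketch covers only the unbiased case.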
Investigation of crystal structures of plasmepsins has shown that not only Tyr but also the conserved Trp can adopt a flipped orientation (Fig. 11). Flipping of Trp (Table 3) combined with flap opening (measured using DIST2, see Table 4) leads to expansion of the binding pocket, which can then accommodate open flap inhibitors. The interplay between flap opening and the dihedral order parameters of Tyr and Trp has further been exploited by Oefner and colleagues in human renin.26 Increasing the flap elevation by 1 Å, rotating Tyr by −120° (flipped conformation) and displacing Trp slightly expanded the binding site,|| which led to the identification of novel piperidine-based inhibitors with micromolar Ki values. Further computational and experimental studies (X-ray crystallography and NMR) are necessary to understand whether the combined flipping of Tyr and Trp is a rare or a common phenomenon in PAPs that correlates with flap opening and expansion of the binding site.
Dihedral order parameters χ1 and χ2 (values in radians) show different conformational states of Trp in Plm-II.
PDBs | χ1 | χ2 |
---|---|---|
1LF4 | 1.22 | −1.74 |
2BJU | −1.04 | 1.43 |
Crystal structures of Plasmepsin-II (PDB: 1LF4, 2BJU, 2IGY, 4Z22), bovine chymosin protease (PDB: 1CMS28), human cathepsin-D (PDB: 1LYA, 1LYW) and BACE-1 (PDB: 3TPL, 1W50, 1SGZ) show different conformational states of Tyr. The crystal structures of these PAPs also show variation in flap opening (DIST2). In the case of human cathepsin-D, pH alters the extent of flap opening and the orientation of Tyr.
PDB | χ1 (radian) | DIST2 (nm) | State |
---|---|---|---|
1LF4 | −1.12 | 1.19 | Gauche+ |
2BJU | −3.12 | 1.50 | Trans |
2IGY | −2.6 | 1.74 | Trans |
4Z22 | −1.02 | 2.11 | Gauche+ |
1CMS | 3.08 | 1.28 | Trans |
1LYA | −1.29 | 1.20 | Gauche+ |
1LYW | 0.96 | 1.78 | Gauche− |
3TPL | −1.31 | 1.23 | Gauche+ |
1W50 | −1.06 | 1.76 | Gauche+ |
1SGZ | 0.87 | 1.61 | Gauche− |
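The state labels in the table follow a convention in which gauche+ lies near −60°, gauche− near +60° and trans near ±180°. A minimal sketch of such an assignment is given below; the ±120° and 0° boundaries between states are my assumption (the table does not define them), chosen so that the χ1 values listed above reproduce the tabulated labels.

```python
import numpy as np

def chi1_state(chi1_rad):
    """Assign a rotamer label to chi1 (radians) using assumed 120-degree wide windows."""
    deg = (np.degrees(chi1_rad) + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)
    if -120.0 <= deg < 0.0:
        return "Gauche+"
    if 0.0 <= deg < 120.0:
        return "Gauche-"
    return "Trans"

# chi1 values (radians) of the conserved Tyr, copied from the table above.
chi1_table = {
    "1LF4": -1.12, "2BJU": -3.12, "2IGY": -2.6, "4Z22": -1.02, "1CMS": 3.08,
    "1LYA": -1.29, "1LYW": 0.96, "3TPL": -1.31, "1W50": -1.06, "1SGZ": 0.87,
}
states = {pdb: chi1_state(chi1) for pdb, chi1 in chi1_table.items()}
```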
Mutation of Tyr and its effect on conformational dynamics
Mutation of Tyr to Ala (an amino acid with no bulky side-chain) in apo PlmII and BACE1 led to complete flap collapse in MD simulations, which was captured by DIST2 (Fig. 12). This observation is consistent with a previous experimental study by Suzuki and co-workers29 which showed that the Tyr to Ala mutation resulted in loss of enzyme activity in human renin. Further, mutation of Tyr to Thr, Ile or Val in chymosin also led to loss of enzyme activity. The MD simulation based mutational study together with these experiments leads to the conclusion that the conserved Tyr plays a critical role in enzyme activity. In future, similar experiments on PlmII and BACE1 are necessary to generalise this claim.
Substitution of Tyr by Phe in pepsin retains the enzyme activity. Structural investigation of Toxoplasma gondii PAP, PlmIX, PlmX and PlmV reveals the presence of Phe in place of Tyr (Fig. 4). Since Phe possesses rotational degrees of freedom along the χ1 and χ2 order parameters, it is reasonable to assume that it dictates the flap dynamics in the aforementioned PAPs. In future, this assumption can be tested by mutating Phe to Ala and checking whether this results in loss of enzyme activity and in flap collapse in MD simulation.
Mutation of Trp
Mutation of Trp and its effect on the enzyme activity of PAPs have not been studied extensively. Park et al.30 mutated the conserved Trp (Trp39) of R. pusillus PAP to other residues and observed decreased enzyme activity. However, the structures of PlmV14 from P. vivax and P. falciparum (homology modelled) do not possess the conserved Trp. In future, detailed computational and experimental studies are required in order to decipher the exact role of Trp in the conformational dynamics of PAPs.
Abstract order parameters
MD simulations of PAPs can produce high-dimensional datasets with a million or more data points. Together, these data points describe the conformational motions of the protein. Recently, several dimensionality reduction techniques (e.g. PCA,31–33 tICA,34–38 tSNE,39–42 diffusion map43,44) have been proposed which can extract useful information related to conformational dynamics from the plethora of data generated during MD simulations.
Principal component analysis
Principal component analysis (PCA) performs a linear transformation which (in the context of MD simulation) aims at finding the motions that maximize the variance. Before I dive into how it can be used as an order parameter to capture conformational dynamics associated with PAPs, I will briefly introduce the fundamental concept behind PCA.
The first principal component (often denoted as PC1, PCA1 or $z_1$) is a linear transformation of the original variables $x_1, x_2, \dots, x_p$ with the weight vector $\alpha_1$. Mathematically this transformation can be expressed as:
$z_1 = \alpha_1^{T} x = \alpha_{11}x_1 + \alpha_{12}x_2 + \dots + \alpha_{1p}x_p$ | 1 |
The weight components ($\alpha_{11}, \alpha_{12}, \dots, \alpha_{1p}$) are chosen to maximize the variance of $z_1$ subject to the following constraint (normalization):
$\alpha_{11}^2 + \alpha_{12}^2 + \dots + \alpha_{1p}^2 = 1$ | 2 |
Generally speaking, the kth PC can be expressed as $z_k = \alpha_k^{T} x$,* which has the maximum variance among all such combinations that are uncorrelated with $z_1, z_2, \dots, z_{k-1}$. Here x is the data matrix: each row of x holds the Cartesian coordinates of one time-step of the MD simulation, and its p = 3N columns hold the Cartesian co-ordinates of the N atoms. For a detailed description of PCA in the context of MD simulation please refer to ref. 32 and 33.
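The construction above amounts to an eigen-decomposition of the covariance matrix of the mean-centred data: the eigenvectors are the weight vectors α and the eigenvalues are the variances of the corresponding PCs. A minimal NumPy sketch, using synthetic data in place of MD coordinates (the array shapes, scales and function name are assumptions for illustration), is:

```python
import numpy as np

def pca(x, n_components=2):
    """PCA of a data matrix x with shape (n_frames, p).

    Returns the projections z, the weight vectors alpha (as columns) and their variances.
    """
    x_centered = x - x.mean(axis=0)
    cov = np.cov(x_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    alphas = eigvecs[:, order]
    z = x_centered @ alphas
    return z, alphas, eigvals[order]

# Synthetic stand-in for MD coordinates: the first column has the largest variance.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 6)) * np.array([5.0, 2.0, 1.0, 1.0, 1.0, 1.0])
z, alphas, variances = pca(x)
```

Each column of `alphas` has unit norm, which is the normalization constraint stated above, and the columns of `z` are mutually uncorrelated.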
PCA as a post-processing tool for MD simulation has been implemented in several state-of-the-art software packages e.g. Gromacs,45 MSMBuilder,46 MDTraj47 etc. In practice, the first few principal components capture the motions of maximal variance in MD trajectories. Bhakat and Söderhjelm performed two independent MD simulations (∼500 ns) of PlmII and performed PCA on the Cα atoms of the combined trajectory (ignoring the tail part) using the g_covar tool integrated with Gromacs. The authors expected that the first few PCs would capture the degrees of freedom associated with flap opening in PlmII. However, metadynamics simulations using PC1 and PC2 as order parameters sampled the conformational space of apo PlmII poorly. PCA on the Cα atomic co-ordinates does not take into account the dihedral angles associated with Tyr (as described in the previous section), hence it is no surprise that metadynamics with PC1 and PC2 as order parameters failed to enhance the sampling of the conformational space in apo PlmII (Fig. 13). An alternative approach would be to perform PCA in dihedral space (often known as dihedral PCA31), using the χ1 and χ2 angles of the flap region in apo PlmII, which in principle should be able to capture Tyr mediated flap dynamics in PAPs.
Independent component analysis
Second order independent component analysis (ICA), otherwise known as time-lagged ICA (TICA), has also been applied in the context of PAPs. While in PCA the aim is to find orthogonal linear transformations (PCs) that maximize the variance, the goal in TICA is to identify linear transformations whose vectors (ICs) are statistically independent and maximize the auto-correlation (Fig. 14). ICA based approaches have been used widely to analyze time-series data from fMRI48 and EEG49 experiments. The first TIC vector (often denoted as TIC1) captures the slowest (in the context of the timescales of biomolecular motions) dynamical mode in the input high-dimensional time-series data (here, the dihedral order parameters of the flap region in PAPs). TIC vectors can also be used as order parameters to perform well-tempered metadynamics simulations. Bhakat and Söderhjelm performed TICA (for a full mathematical description of TICA please refer to ref. 36 and 38) on the MD trajectory of apo PlmII to identify the key dihedral order parameters (more precisely, linear combinations of dihedral angles with a different weight for each angle) that capture the slow dynamical modes of the MD trajectory (Fig. 15 and 16). The authors further used TIC vectors as order parameters in metadynamics simulations, which enhanced the sampling of flap dynamics compared to classical MD simulations (Fig. 13).
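To make the contrast with PCA concrete, the sketch below implements a bare-bones TICA as a generalised eigenvalue problem between the time-lagged and instantaneous covariance matrices, solved by whitening. It is a simplified illustration rather than the implementation used in the cited work. The two synthetic features are assumptions: a slow, low-variance signal and a fast, high-variance one. PCA would rank the fast feature first, whereas TIC1 follows the slow one. For real dihedral data the angles would normally be converted to sine/cosine pairs first.

```python
import numpy as np

def tica(x, lag, n_components=1):
    """Minimal TICA: maximise the autocorrelation at the given lag time (in frames)."""
    x = x - x.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2.0 * n)   # instantaneous covariance
    ct = (x0.T @ xt + xt.T @ x0) / (2.0 * n)   # symmetrised time-lagged covariance
    s, u = np.linalg.eigh(c0)
    keep = s > 1e-10 * s.max()
    w = u[:, keep] / np.sqrt(s[keep])          # whitening transformation
    lam, v = np.linalg.eigh(w.T @ ct @ w)
    order = np.argsort(lam)[::-1][:n_components]
    vectors = w @ v[:, order]
    return x @ vectors, vectors, lam[order]

# Synthetic features (illustration only).
rng = np.random.default_rng(0)
n_frames = 20000
noise = rng.normal(size=n_frames)
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + 0.1 * noise[t]   # slowly decorrelating signal
fast = 3.0 * rng.normal(size=n_frames)              # large variance, no memory
features = np.column_stack([slow, fast])

tics, tic_vectors, autocorr = tica(features, lag=10)
```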
In principle, methods analogous to TICA,51 i.e. VAC52 and SGOOP,53†† can also be applied to PAPs to identify and bias order parameters that capture the slow dynamical modes (from the MD trajectory) corresponding to flap dynamics.
Integration of several ICA algorithms e.g. Infomax,55 kernel ICA,37 JADE,57 Fast ICA58 with programming interfaces such as Python opens up the door to test the effectiveness of these methods in identifying order parameters from the high-dimensional dihedral space of PAPs.
Binary classifiers
Binary classifiers are a set of supervised machine learning algorithms which aim at classifying an object into one of two possible categories, e.g. distinguishing pictures of cats from pictures of dogs. The concept of using binary classifiers for the automated selection of order parameters was first introduced by Sultan and Pande.59 One prerequisite of using binary classifiers is that the start and end states must already have been sampled. In the case of PAPs, these two states (otherwise known as state A and state B) can be the normal and flipped states. Once the co-ordinates corresponding to the normal and flipped states have been extracted from an MD trajectory, the next step is to generate a subset of order parameters (e.g. the χ1 and χ2 angles of the flap region) which represents these two states. This is followed by training a binary classifier, e.g. a support vector machine (SVM),60 passive-aggressive (PasAg) classifier,61 logistic regression62 or perceptron,63 on this subset of order parameters. In the case of the PasAg classifier, the classifier decision boundary can be used as an order parameter to drive enhanced sampling calculations (Fig. 17). One advantage of using binary classifier based order parameters in metadynamics is that one can simultaneously bias multiple degrees of freedom (using parallel-bias metadynamics), which can lead to faster convergence of the metadynamics simulations (less chance for metadynamics to remain stuck along arbitrary orthogonal slow degrees of freedom). For example, if the transition from the normal to the flipped state requires changes in both the χ1 and χ2 angles of Tyr and in additional χ1 and χ2 angles of other amino acids (of the flap region), then a binary classifier based order parameter would allow the metadynamics bias to be added along these additional features using a single order parameter (the classifier decision boundary). Initial calculations (unpublished) by the author have shown the effectiveness of such order parameters in sampling the normal to flipped transition in apo PlmII (Fig. 18 and 19).
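The workflow described above can be sketched with logistic regression, one of the classifiers listed, written here in plain NumPy. The two states are represented by synthetic χ1/χ2 samples (the centres, widths and sample sizes are invented for illustration), the angles are converted to sine/cosine features to respect periodicity, and the linear decision function w·x + b plays the role of the single order parameter. This is an illustration of the idea, not the protocol of Sultan and Pande or of the unpublished calculations mentioned above.

```python
import numpy as np

def featurize(angles):
    """Map dihedral angles (n_frames, n_angles) to sin/cos features."""
    return np.hstack([np.sin(angles), np.cos(angles)])

def train_logistic(x, y, lr=0.5, n_steps=2000):
    """Logistic regression fitted by plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * x.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def decision_function(x, w, b):
    """Linear decision function; its sign separates the two states."""
    return x @ w + b

# Synthetic (chi1, chi2) samples for the two end states (illustration only).
rng = np.random.default_rng(1)
n = 500
normal = np.column_stack([rng.vonmises(-1.1, 20.0, n), rng.vonmises(1.5, 20.0, n)])
flipped = np.column_stack([rng.vonmises(3.1, 20.0, n), rng.vonmises(1.5, 20.0, n)])

x = featurize(np.vstack([normal, flipped]))
y = np.concatenate([np.zeros(n), np.ones(n)])   # 0 = normal, 1 = flipped
w, b = train_logistic(x, y)
cv = decision_function(x, w, b)
accuracy = np.mean((cv > 0) == (y == 1))
```

In an enhanced sampling run, the trained weights would be fixed and the same linear combination of sine/cosine terms evaluated on the fly as the biased CV.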
Another method that can be applied to capture the normal to flipped transition in PAPs is Fisher's linear discriminant analysis (LDA)64 (Fig. 18). LDA has been routinely used in machine learning as a robust supervised classification method. Recently, Parrinello and co-workers proposed a modified version of LDA, harmonic linear discriminant analysis (HLDA),65 whose order parameter can be applied in the context of the normal to flipped transition in PAPs. However, from a sampling point of view, I believe HLDA will not provide any additional benefit compared to binary classifiers e.g. SVM, PasAg, logistic regression etc.
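For two states, Fisher's LDA reduces to a closed-form projection: the direction w is proportional to the inverse of the summed within-state covariance applied to the difference of the state means. The sketch below shows this for two synthetic, correlated descriptors (all numbers are assumptions for illustration); it covers standard LDA only, not the harmonic variant.

```python
import numpy as np

def fisher_lda(xa, xb):
    """Unit vector that best separates states A and B in Fisher's sense."""
    mu_a, mu_b = xa.mean(axis=0), xb.mean(axis=0)
    sw = np.cov(xa, rowvar=False) + np.cov(xb, rowvar=False)   # within-state scatter
    w = np.linalg.solve(sw, mu_b - mu_a)
    return w / np.linalg.norm(w)

# Synthetic descriptors of the two states (illustration only).
rng = np.random.default_rng(2)
cov = [[1.0, 0.8], [0.8, 1.0]]
state_a = rng.multivariate_normal([0.0, 0.0], cov, 500)
state_b = rng.multivariate_normal([3.0, 0.0], cov, 500)

w_lda = fisher_lda(state_a, state_b)
proj_a, proj_b = state_a @ w_lda, state_b @ w_lda
threshold = 0.5 * (proj_a.mean() + proj_b.mean())
accuracy = 0.5 * (np.mean(proj_a < threshold) + np.mean(proj_b > threshold))
```

Because the descriptors are correlated, the LDA direction is tilted away from the simple difference of the means, which is the feature that distinguishes it from a naive mean-difference coordinate.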
PAPs: testing ground for computational method development
‘If you know the answer beforehand it is easier to test new methods’
This was the philosophy behind using alanine dipeptide, chignolin and BPTI (together, these systems are known as the mice models of simulation methods) as typical test systems for novel computational methods directed towards order parameter selection. As discussed before, the χ1 and χ2 order parameters6 of Tyr play a critical role in the conformational dynamics of apo PAPs. The rotational degrees of freedom associated with Tyr are believed to be among the slow degrees of freedom which govern the conformational dynamics of the flap. However, the rotational degrees of freedom associated with the conserved Trp and other residues present in the flap region also play a crucial role in the overall conformational dynamics of PAPs. The high dimensionality associated with the dihedral angles of PAPs makes them an ideal playground to test neural network based latent variable order parameters. In mathematical terms, latent variables are variables that are not directly observed but rather emerge (through mathematical transformations) from observed variables (here, the probability distributions of dihedral angles). The latent variable from the bottleneck layer of a neural network can effectively capture slow degrees of freedom from unbiased MD trajectories (Fig. 20 and 21). This type of neural network based order parameter can be further used to drive enhanced sampling calculations. Sidky et al.66 and Wang et al.67 laid out several machine learning algorithms that have been tested in the context of biomolecular simulation. I believe most of these algorithms can be applied to PAPs with the aim of automating the identification of the order parameters that govern their conformational dynamics. The larger size of PAPs and the prior knowledge of the role of the conserved Tyr in flap dynamics (which makes it easier to interpret abstract order parameters) make PAPs an ideal testing ground for novel machine learning algorithms (e.g. slow feature analysis,68 or LSTM69 and transformer70 like neural networks).
Open idea
Can one design a deep learning algorithm which takes multiple X-ray structures as input and predicts trial CVs that capture the slow degrees of freedom which govern the conformational dynamics of a biomolecule? One can use PAPs as a test case in this project. The variations in the PDB structures of BACE1 and PlmII are mainly dominated by the torsional order parameters associated with the conserved Tyr. An output trial CV from a deep learning architecture should give higher weights to the torsional order parameters associated with Tyr. Besides, it must capture some non-linear combinations of other torsional order parameters that vary among these X-ray structures. The trial CVs can be iteratively optimized using a metadynamics framework similar to VAC.52
Order parameters to capture ligand binding/unbinding
Ligand or substrate binding to PAPs requires a conformational change, i.e. opening of the flap, so that the ligand/substrate can bind to the active site. Flap opening in PAPs is similar to flap opening in HIV/SIV protease.75 A generalised order parameter that can describe ligand binding/unbinding can be defined as follows:
‘COMunbind: the distance between COM of the ligand heavy atoms and the COM of the catalytic aspartates (Cα atoms)’
Plotting the aforementioned order parameter COMunbind in combination with DIST2 gives an indication of the extent of flap opening during ligand binding/unbinding. Further, plotting COMunbind against the χ1 and χ2 angles of Tyr can give an idea of how flipping of the conserved Tyr residue affects ligand binding/unbinding.
However, reversible sampling of ligand binding/unbinding using physical order parameters such as COMunbind is not a trivial task. In one such initial effort, Bhakat and Söderhjelm used the COMunbind order parameter within well-tempered metadynamics simulations. The results showed that flap opening is necessary for ligand unbinding (Fig. 22). However, the challenge is to sample multiple reversible ligand binding/unbinding events in order to validate the role of flap opening with statistical certainty. Another open question which goes hand in hand with ligand binding is: ‘is flipping of the conserved Tyr/Phe in PAPs necessary during ligand binding to accommodate the ligand in the active site?’ Recently, Limongelli77 reviewed several pathway based simulation methods that can capture conformational change induced ligand binding/unbinding. These simulation methods include funnel metadynamics78 (which facilitates frequent ligand binding/unbinding by using a funnel-like restraint potential that reduces the sampling of the unbound state) and its variant volume-biased (sphere shaped restraint potential instead of funnel) metadynamics.79 In theory it is possible to use the COMunbind order parameter within the funnel/volume biased‡‡ metadynamics framework to understand how conformational changes in PAPs induce ligand binding/unbinding (Fig. 23 and 24). Other enhanced sampling methods such as accelerated MD (aMD),80–82 the weighted ensemble (WE) method,83–85 adiabatic-biased MD (ABMD)86,87 etc. can also be tested on PAPs to explore the conformational dynamics, kinetics and free energy associated with ligand binding/unbinding.
Challenge to the simulation community
Recently, macro-cyclic ligand bound structures of BACE1 were reported by Yen et al.88 (PDB IDs: 6NV7, 6NV9, 6NW3). The authors further reported experimental Ki and Kd values as well as several thermodynamic parameters (ΔH, TΔS, and ΔG) of ligand binding (the thermodynamic parameters were measured using isothermal titration calorimetry (ITC) experiments). Keeping in mind the availability of experimental data, the size and number of torsion angles of the macro-cyclic ligands and the large scale conformational dynamics of BACE1, this is a perfect challenge for methods e.g. funnel metadynamics, volume biased metadynamics, variational autoencoder driven infrequent metadynamics,89,90 WE,91 aMD etc. to predict binding free energies and kinetics for clinically important protein-ligand systems, beyond typical test cases e.g. benzamide–trypsin and other systems where the ligand is relatively small with few torsional angles and there is no large scale conformational dynamics upon ligand binding.
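When comparing computed binding free energies with such measurements, the experimental quantities are linked by two standard relations: ΔG = RT ln(Kd/c°), with the standard concentration c° = 1 M, and ΔG = ΔH − TΔS. A small sketch is given below; the numerical inputs are hypothetical and are not the values reported by Yen et al.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)

def dg_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)   # Kd relative to c0 = 1 M

def entropic_term(dg, dh):
    """-T*dS (kcal/mol) from dG = dH - T*dS."""
    return dg - dh

dg = dg_from_kd(1e-9)                       # hypothetical 1 nM binder
minus_t_ds = entropic_term(dg, dh=-8.0)     # hypothetical dH of -8 kcal/mol
```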
Combining experiment with molecular simulation
Rotation of the Tyr side-chain and opening of the flap region make PAPs an ideal case for applying methods that combine molecular simulations with biophysical experiments. Combining MD simulation with dynamical information derived from NMR spin relaxation93–97 to capture residue-wise backbone/side-chain configurational entropy is a powerful way to understand the stability of proteins and how it changes upon perturbation. An analytical approach to capture configurational entropy is to measure the residue-wise order parameter $O^2$ (often written as $S^2$) from MD simulations. It has been shown that the configurational entropy of a protein can be described from the dihedral distributions sampled during MD simulations.95 For each side-chain dihedral angle $\omega_j$, the probability distribution $p(\omega_j)$ is determined by von Mises kernel density estimation and contributes to the configurational entropy S:
$$S = -k_B \sum_{j=1}^{M} \int_{0}^{2\pi} p(\omega_j)\,\ln p(\omega_j)\,\mathrm{d}\omega_j \tag{3}$$
$O^2$ is mathematically connected with S through the following equation:
$$S = k_B M\left[A + B\,f\!\left(1 - O^2_{\mathrm{NMR}}\right)\right] \tag{4}$$
where M denotes the number of side-chain dihedral angles, and A and B are fitted parameters. One way to obtain $O^2$ from MD simulation is to use a principal-component-based method known as isotropic reorientational eigenmode dynamics (iRED).98 In principle, $O^2$ derived from MD simulation can be compared with NMR spin relaxation experiments, which gives an idea of the residue-wise dynamics associated with the flap region of PAPs (Fig. 25). NMR experiments can further give insights (both dynamics and kinetics) into ring flips of the conserved Tyr/Phe and Trp in PAPs (ref. 99–101).
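The dihedral-distribution entropy described above can be sketched numerically as follows. This is an illustrative implementation under stated assumptions: the concentration parameter `kappa`, the grid size and the function names are hypothetical choices, the entropy is returned in units of $k_B$, and correlations between dihedral angles are ignored (each angle is treated independently and the contributions are summed).

```python
import numpy as np


def von_mises_kde(samples, grid, kappa=50.0):
    """Periodic kernel density estimate p(omega) evaluated on `grid` (radians)."""
    diff = grid[:, None] - samples[None, :]
    kernel = np.exp(kappa * np.cos(diff)) / (2.0 * np.pi * np.i0(kappa))
    return kernel.mean(axis=1)


def dihedral_entropy(samples, n_grid=360, kappa=50.0):
    """-integral of p ln p over one dihedral angle, in units of k_B."""
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    p = von_mises_kde(np.asarray(samples, dtype=float), grid, kappa)
    d_omega = 2.0 * np.pi / n_grid
    mask = p > 0.0  # guard against log(0) on empty regions of the circle
    return float(-np.sum(p[mask] * np.log(p[mask])) * d_omega)


def configurational_entropy(dihedrals, **kwargs):
    """Sum of per-angle entropies; `dihedrals` has shape (n_frames, M)."""
    dihedrals = np.asarray(dihedrals, dtype=float)
    return sum(dihedral_entropy(column, **kwargs) for column in dihedrals.T)


# Synthetic data: a freely rotating angle versus one locked in a single rotamer.
rng = np.random.default_rng(0)
free = rng.uniform(-np.pi, np.pi, size=5000)
locked = rng.vonmises(mu=1.0, kappa=100.0, size=5000)
print(dihedral_entropy(free), dihedral_entropy(locked))
```

A side-chain that samples several rotamers (for example a flipping Tyr) yields a larger value than one confined to a single well, which is the qualitative behaviour the entropy measure is meant to capture.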
Besides combining NMR with MD simulation, time-resolved fluorescence spectroscopy, FRET and Raman spectroscopy§§ can be applied to capture the conformational dynamics of PAPs. This is an open area of research which has not gained much attention from the biophysical or structural biology community. In theory, time-dependent order parameters from experiments (e.g. FRET) can be integrated with MD simulations using embedding theorems, e.g. Takens' embedding theorem (ref. 104). The thermodynamics of ligand binding to PAPs can be measured using isothermal titration calorimetry (ITC)105 or differential scanning calorimetry (DSC),106 which measure different thermodynamic parameters involved in the binding process. The major challenge facing the simulation community is to calculate thermodynamic parameters from pathway-based simulation methods (or from enhanced sampling calculations) that agree with experiment. This is a growing area of research within the biophysical community, and PAPs can be an ideal test system for integrating biophysical experiments with biomolecular simulation.
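The delay-embedding idea mentioned above can be illustrated with a short sketch: a univariate time series (for example a FRET efficiency trace) is turned into vectors of time-lagged copies of itself, which under the conditions of Takens' theorem reconstruct the underlying dynamics up to a smooth transformation. The function name, the embedding dimension `dim` and the delay `lag` are illustrative assumptions; in practice both parameters have to be chosen for the data at hand.

```python
import numpy as np


def delay_embed(series, dim=3, lag=1):
    """Time-delay embedding of a 1D series.

    Row t of the result is (x[t], x[t + lag], ..., x[t + (dim - 1) * lag]).
    """
    x = np.asarray(series, dtype=float)
    n_vectors = x.size - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dim/lag combination")
    return np.column_stack([x[i * lag: i * lag + n_vectors] for i in range(dim)])


# Synthetic stand-in for an experimental trace: a noisy oscillation.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 20.0, 2000)
trace = np.sin(t) + 0.05 * rng.normal(size=t.size)
embedded = delay_embed(trace, dim=3, lag=25)
print(embedded.shape)
```

The embedded vectors can then be passed to the same dimensionality reduction tools used for MD trajectories, which is what makes a comparison between simulated and experimental order parameters possible in principle.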
Ideal playing field for drug repurposing
Plasmepsins (especially PlmII, PlmIV, PlmV, PlmIX and PlmX) are drug targets for malaria.107,108 Artemisinin and chloroquine are the two major blockbuster drugs against malaria. The emergence of artemisinin-resistant109 and chloroquine-resistant110 P. falciparum poses a huge risk to malaria eradication efforts in Africa and South-East Asia. This calls for the identification of novel drug targets against malaria, and plasmepsins have the potential to become the next big target. BACE-1 is a promising drug target for Alzheimer's disease, with several inhibitors in different stages of clinical trials and several ligand-bound crystal structures deposited in the PDB. Due to the structural similarity and the similar mechanism governing the conformational dynamics of apo PlmII and BACE-1 (and PAPs in general), ligands targeting BACE-1 can be repurposed against plasmepsins. Machine learning based methods (e.g. convolutional neural networks) combined with molecular docking,111–113 free energy calculations114 (e.g. FEP, ABFE, MM/PBSA or MM/GBSA) and biochemical assays can then be used to identify BACE-1 inhibitors as potential plasmepsin inhibitors (in a broad sense this concept is applicable to other PAP inhibitors, e.g. human renin inhibitors). This repurposing strategy can help to identify scaffolds¶¶ that can be further modified by synthetic chemists to develop potent molecules targeting plasmepsins (Fig. 26). I believe that in future, machine learning driven molecular docking combined with experiment will play an important role in repurposing other PAP inhibitors against plasmepsins.
Conclusions and future directions
The main aim of this review was to highlight the possibilities of using PAPs as model systems to test novel simulation methods and to combine biomolecular simulation with biophysical experiments. In order to kick-start this effort, I have curated a GitHub profile which contains inputs for performing biomolecular simulations and order parameters that can be monitored during molecular simulation. Although there have been a few advances in the application of state-of-the-art biomolecular simulations to PAPs, combining biophysical experiments with biomolecular simulation (otherwise known as integrative structural biology115) is far from standard practice. The dynamic nature of PAPs makes them an ideal case for integrating experiment with molecular simulation using methods such as Bayesian/maximum entropy116 etc., which can provide both an atomic-level description and a mechanistic understanding of PAPs (e.g. population of normal/flipped states, role of force fields and water models in the population of different states, role of Tyr/Phe in conformational dynamics and ligand binding, extent of flap opening etc.).
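As an illustration of how experimental information can be folded into a simulated ensemble, the sketch below performs a plain maximum-entropy reweighting of trajectory frames so that the average of one observable matches a target value. It is a simplified stand-in, not the full Bayesian/maximum entropy procedure: experimental error is ignored, only one observable is handled, and the function name, the bracketing interval `lam_bounds` and the example populations are hypothetical.

```python
import numpy as np


def maxent_weights(observable, target, lam_bounds=(-50.0, 50.0), tol=1e-10):
    """Frame weights w_i proportional to exp(-lam * f_i) such that sum(w_i * f_i) = target.

    The multiplier lam is found by bisection; the weighted average decreases
    monotonically with lam, so a bracketing interval is sufficient. The target
    is only matched if the required lam lies inside `lam_bounds`.
    """
    f = np.asarray(observable, dtype=float)
    if not (f.min() < target < f.max()):
        raise ValueError("target must lie strictly inside the sampled range")

    def weights(lam):
        a = -lam * f
        w = np.exp(a - a.max())  # shift the exponent for numerical stability
        return w / w.sum()

    lo, hi = lam_bounds
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weights(mid) @ f > target:
            lo = mid  # average still too high: increase lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights(0.5 * (lo + hi))


# Hypothetical example: 60% of frames are in a "flipped" state (f = 1),
# while an experiment suggests a flipped population of 30%.
flipped = np.array([1.0] * 60 + [0.0] * 40)
w = maxent_weights(flipped, target=0.30)
print(w @ flipped)
```

The resulting weights change the simulated ensemble as little as possible (in the relative-entropy sense) while reproducing the target average, which is the basic idea behind reweighting simulated populations of normal and flipped states against experiment.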
Finally, the COVID-19 pandemic has shown a resurgence of drug repurposing by combining molecular simulation with biochemical experiments. Projects like COVID Moonshot show how an old simulation technique (free energy perturbation) can be used effectively to hunt for novel drug candidates against COVID-19. Similar simulation strategies (AI-driven molecular docking, free energy perturbation etc.) can be applied to repurpose BACE1, human renin and pepsin inhibitors against plasmepsins, followed by experimental validation (biochemical assay, X-ray crystallography, NMR etc.). Further, keeping in mind the neglected117 status of malaria, it is of utmost importance to share computational and experimental protocols openly using public data/code-sharing platforms, e.g. Jupyter Notebook, GitHub, Google Colab etc. Finally, the large size and dynamic nature of PAPs and their role in human disease make them an ideal model system for experimentalists and computational chemists to work together on developing new methods and applying old ones, with direct application to developing novel therapeutics against malaria, Alzheimer's disease etc.
Note
To date, no pathway-based simulation method has been able to predict the thermodynamic and kinetic parameters of HIV protease inhibitors (for the sake of sample size we can focus on the nine clinically approved drugs) and compare them with experiments. Even in the year 2020, the most widely used method for clinically relevant small-molecule drug discovery is molecular docking (often followed by visual inspection and FEP), which is computationally less expensive than physical pathway-based methods. Two major problems of molecular docking are approximate energy functions and lack of sampling (especially of the protein). The remedy to this is a skin-in-the-game problem.
‘when the solution is about solving this very problem’ (Nassim Nicholas Taleb).
Observation
In D3R Grand Challenge 4 (ref. 118), which aimed at predicting binding poses and free energies using BACE1, the top-performing submissions did not use pathway-based sampling methods.
Conflicts of interest
The author declares no potential conflicts of interest.
Supplementary Material
Acknowledgments
SB thanks the Swedish A-kassa for financial support. The computations were performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at LUNARC (Lund University) and HPC2N (Umeå University).
Electronic supplementary information (ESI) available. See DOI: 10.1039/d0ra10359d
Footnotes
Focusing on structural and mechanistic aspects of PAPs.
Caution: due to the lack of experimental validation of the populations of different Tyr conformations in apo PlmII/BACE1, readers should treat the computational predictions with skepticism. The computational predictions are sensitive to the choice of water models, force fields and protonation states.
One can apply a cosine transformation to χ1, which handles the periodicity associated with χ angles.
The lack of structural reports (PDB structures) on this particular system makes it difficult to cross-validate the claims.
The second principal component can be expressed as $z_2 = \alpha_2^{T} x$.
TICA and VAC are methodologically equivalent. Tiwary and coworkers have demonstrated that SGOOP follows the principle of maximum entropy (MaxEnt).53,54 An information-theoretic approach to ICA has also highlighted its relation with MaxEnt.38,55,56
The restraint potential should not affect the natural dynamics associated with the flap region.
Raman spectra can provide the H-bonding pattern of Tyr and Trp side-chains. If a Tyr residue shows Fermi resonance in the Raman spectrum, that gives an indication of the rotational degrees of freedom associated with the Tyr side-chain.103
Without designing ligands from scratch.
References
- Dunn B. M. Structure and Mechanism of the Pepsin-Like Family of Aspartic Peptidases. Chem. Rev. 2002;102:4431–4458. doi: 10.1021/cr010167q.
- Rawlings N. D. Barrett A. J. Thomas P. D. Huang X. Bateman A. Finn R. D. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res. 2017;46:D624–D632. doi: 10.1093/nar/gkx1134.
- Mahanti M. Bhakat S. Nilsson U. J. Söderhjelm P. Flap Dynamics in Aspartic Proteases: A Computational Perspective. Chem. Biol. Drug Des. 2016;88:159–177. doi: 10.1111/cbdd.12745.
- Moussa-Pacha N. M. Abdin S. M. Omar H. A. Alniss H. Al-Tel T. H. BACE1 inhibitors: Current status and future directions in treating Alzheimer's disease. Med. Res. Rev. 2020;40:339–384. doi: 10.1002/med.21622.
- Karubiu W. Bhakat S. McGillewie L. Soliman M. E. S. Flap dynamics of plasmepsin proteases: insight into proposed parameters and molecular dynamics. Mol. BioSyst. 2015;11:1061–1066. doi: 10.1039/C4MB00631C.
- Bhakat S. Söderhjelm P. Flap dynamics in pepsin-like aspartic proteases: a computational perspective using Plasmepsin-II and BACE-1 as model systems. bioRxiv. 2020:1–47. doi: 10.1021/acs.jcim.1c00840.
- Kumalo H. M. Bhakat S. Soliman M. E. Investigation of flap flexibility of β-secretase using molecular dynamic simulations. J. Biomol. Struct. Dyn. 2016;34:1008–1019. doi: 10.1080/07391102.2015.1064831.
- Kumalo H. M. Soliman M. E. A comparative molecular dynamics study on BACE1 and BACE2 flap flexibility. J. Recept. Signal Transduction. 2016;36:505–514. doi: 10.3109/10799893.2015.1130058.
- Gorfe A. A. Caflisch A. Functional Plasticity in the Substrate Binding Site of β-Secretase. Structure. 2005;13:1487–1498. doi: 10.1016/j.str.2005.06.015.
- Spronk S. A. Carlson H. A. The role of tyrosine 71 in modulating the flap conformations of BACE1. Proteins: Struct., Funct., Bioinf. 2011;79:2247–2259. doi: 10.1002/prot.23050.
- Bottaro S. Lindorff-Larsen K. Biophysical experiments and biomolecular simulations: A perfect match? Science. 2018;361:355–360. doi: 10.1126/science.aat4010.
- Xu Y. Li M.-j. Greenblatt H. Chen W. Paz A. Dym O. Peleg Y. Chen T. Shen X. He J. Jiang H. Silman I. Sussman J. L. Flexibility of the flap in the active site of BACE1 as revealed by crystal structures and molecular dynamics simulations. Acta Crystallogr., Sect. D: Biol. Crystallogr. 2012;68:13–25. doi: 10.1107/S0907444911047251.
- Asojo O. A. Gulnik S. V. Afonina E. Yu B. Ellman J. A. Haque T. S. Silva A. M. Novel Uncomplexed and Complexed Structures of Plasmepsin II, an Aspartic Protease from Plasmodium falciparum. J. Mol. Biol. 2003;327:173–181. doi: 10.1016/S0022-2836(03)00036-6.
- Hodder A. N. Sleebs B. E. Czabotar P. E. Gazdik M. Xu Y. O'Neill M. T. Lopaticki S. Nebl T. Triglia T. Smith B. J. Lowes K. Boddey J. A. Cowman A. F. Structural basis for plasmepsin V inhibition that blocks export of malaria proteins to human erythrocytes. Nat. Struct. Mol. Biol. 2015;22:590–596. doi: 10.1038/nsmb.3061.
- Waterhouse A. Bertoni M. Bienert S. Studer G. Tauriello G. Gumienny R. Heer F. T. de Beer T. A. Rempfer C. Bordoli L. Lepore R. Schwede T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46:W296–W303. doi: 10.1093/nar/gky427.
- Prade L. Jones A. F. Boss C. Richard-Bildstein S. Meyer S. Binkert C. Bur D. X-ray Structure of Plasmepsin II Complexed with a Potent Achiral Inhibitor. J. Biol. Chem. 2005;280:23837–23843. doi: 10.1074/jbc.M501519200.
- Boss C. Corminboeuf O. Grisostomi C. Meyer S. Jones A. Prade L. Binkert C. Fischli W. Weller T. Bur D. Achiral, Cheap, and Potent Inhibitors of Plasmepsins I, II, and IV. ChemMedChem. 2006;1:1341–1345. doi: 10.1002/cmdc.200600223.
- Rasina D. Otikovs M. Leitans J. Recacha R. Borysov O. V. Kanepe-Lapsa I. Domraceva I. Pantelejevs T. Tars K. Blackman M. J. Jaudzems K. Jirgensons A. Fragment-Based Discovery of 2-Aminoquinazolin-4(3H)-ones As Novel Class Nonpeptidomimetic Inhibitors of the Plasmepsins I, II, and IV. J. Med. Chem. 2016;59:374–387. doi: 10.1021/acs.jmedchem.5b01558.
- Lee A. Y. Gulnik S. V. Erickson J. W. Conformational switching in an aspartic proteinase. Nat. Struct. Biol. 1998;5:866–871. doi: 10.1038/2306.
- Baldwin E. T. Bhat T. N. Gulnik S. Hosur M. V. Sowder R. C. Cachau R. E. Collins J. Silva A. M. Erickson J. W. Crystal structures of native and inhibited forms of human cathepsin D: implications for lysosomal targeting and drug design. Proc. Natl. Acad. Sci. U. S. A. 1993;90:6796–6800. doi: 10.1073/pnas.90.14.6796.
- Hornak V. Okur A. Rizzo R. C. Simmerling C. HIV-1 protease flaps spontaneously close to the correct structure in simulations following manual placement of an inhibitor into the open state. J. Am. Chem. Soc. 2006;128:2812–2813. doi: 10.1021/ja058211x.
- Hornak V. Okur A. Rizzo R. C. Simmerling C. HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations. Proc. Natl. Acad. Sci. U. S. A. 2006;103:915–920. doi: 10.1073/pnas.0508452103.
- Ma S. Henderson J. A. Shen J. Exploring the pH-dependent structure-dynamics-function relationship of human renin. J. Chem. Inf. Model. 2021;61(1):400–407. doi: 10.1021/acs.jcim.0c01201.
- Patel S. Vuillard L. Cleasby A. Murray C. W. Yon J. Apo and Inhibitor Complex Structures of BACE (β-secretase). J. Mol. Biol. 2004;343:407–416. doi: 10.1016/j.jmb.2004.08.018.
- Hong L. Tang J. Flap Position of Free Memapsin 2 (β-Secretase), a Model for Flap Opening in Aspartic Protease Catalysis. Biochemistry. 2004;43:4689–4695. doi: 10.1021/bi0498252.
- Oefner C. et al. Renin inhibition by substituted piperidines: A novel paradigm for the inhibition of monomeric aspartic proteinases? Chem. Biol. 1999;6:127–131. doi: 10.1016/S1074-5521(99)89004-8.
- Bobrovs R. Jaudzems K. Jirgensons A. Exploiting Structural Dynamics To Design Open-Flap Inhibitors of Malarial Aspartic Proteases. J. Med. Chem. 2019;62:8931–8950. doi: 10.1021/acs.jmedchem.9b00184.
- Gilliland G. L. Winborne E. L. Nachman J. Wlodawer A. The three-dimensional structure of recombinant bovine chymosin at 2.3 Å resolution. Proteins: Struct., Funct., Bioinf. 1990;8:82–101. doi: 10.1002/prot.340080110.
- Suzuki F. Goto K. Shiratori Y. Inagami T. Murakami K. Nakamura Y. Tyrosine 83 of renin has an important role in renin–angiotensinogen reaction. Protein Pept. Lett. 1996;3:45–49.
- Park Y.-N. Aikawa J.-i. Nishiyama M. Horinouchi S. Beppu T. Involvement of a residue at position 75 in the catalytic mechanism of a fungal aspartic proteinase, Rhizomucor pusillus pepsin. Replacement of tyrosine 75 on the flap by asparagine enhances catalytic efficiency. Protein Eng., Des. Sel. 1996;9:869–875. doi: 10.1093/protein/9.10.869.
- Sittel F. Jain A. Stock G. Principal component analysis of molecular dynamics: On the use of Cartesian vs. internal coordinates. J. Chem. Phys. 2014;141:014111. doi: 10.1063/1.4885338.
- David C. C., and Jacobs D. J., in Protein Dynamics: Methods and Protocols, ed. D. R. Livesay, Humana Press, Totowa, NJ, 2014, pp. 193–226.
- Stein S. A. M., Loccisano A. E., Firestine S. M., and Evanseck J. D., Principal Components Analysis: A Review of its Application on Molecular Dynamics Data, in Annual Reports in Computational Chemistry, ed. D. C. Spellmeyer, Elsevier, 2006, vol. 2, ch. 13, pp. 233–261.
- Naritomi Y. Fuchigami S. Slow dynamics of a protein backbone in molecular dynamics simulation revealed by time-structure based independent component analysis. J. Chem. Phys. 2013;139:215102. doi: 10.1063/1.4834695.
- Blaschke T. Berkes P. Wiskott L. What Is the Relation Between Slow Feature Analysis and Independent Component Analysis? Neural Comput. 2006;18:2495–2508. doi: 10.1162/neco.2006.18.10.2495.
- Molgedey L. Schuster H. G. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 1994;72:3634–3637. doi: 10.1103/PhysRevLett.72.3634.
- Schwantes C. R. Pande V. S. Modeling Molecular Kinetics with tICA and the Kernel Trick. J. Chem. Theory Comput. 2015;11:600–608. doi: 10.1021/ct5007357.
- Shlens J., A Tutorial on Independent Component Analysis, CoRR abs/1404.2986, 2014.
- Zhou H. Wang F. Tao P. t-Distributed Stochastic Neighbor Embedding Method with the Least Information Loss for Macromolecular Simulations. J. Chem. Theory Comput. 2018;14:5499–5510. doi: 10.1021/acs.jctc.8b00652.
- Spiwok V. Kříž P. Time-Lagged t-Distributed Stochastic Neighbor Embedding (t-SNE) of Molecular Simulation Trajectories. Front. Mol. Biosci. 2020;7:132. doi: 10.3389/fmolb.2020.00132.
- Wattenberg M. Viégas F. Johnson I. How to Use t-SNE Effectively. Distill. 2016. doi: 10.23915/distill.00002.
- van der Maaten L. Hinton G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008;9:2579–2605.
- Ferguson A. L. Panagiotopoulos A. Z. Kevrekidis I. G. Debenedetti P. G. Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach. Chem. Phys. Lett. 2011;509:1–11. doi: 10.1016/j.cplett.2011.04.066.
- Zheng W. Rohrdanz M. A. Clementi C. Rapid Exploration of Configuration Space with Diffusion-Map-Directed Molecular Dynamics. J. Phys. Chem. B. 2013;117:12769–12776. doi: 10.1021/jp401911h.
- Abraham M. J. Murtola T. Schulz R. Pall S. Smith J. C. Hess B. Lindahl E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25. doi: 10.1016/j.softx.2015.06.001.
- Harrigan M. P. Sultan M. M. Hernández C. X. Husic B. E. Eastman P. Schwantes C. R. Beauchamp K. A. McGibbon R. T. Pande V. S. MSMBuilder: Statistical Models for Biomolecular Dynamics. Biophys. J. 2017;112:10–15. doi: 10.1016/j.bpj.2016.10.042.
- McGibbon R. T. Beauchamp K. A. Harrigan M. P. Klein C. Swails J. M. Hernández C. X. Schwantes C. R. Wang L.-P. Lane T. J. Pande V. S. MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophys. J. 2015;109:1528–1532. doi: 10.1016/j.bpj.2015.08.015.
- McKeown M. J. Hansen L. K. Sejnowski T. J. Independent component analysis of functional MRI: what is signal and what is noise? Curr. Opin. Neurobiol. 2003;13:620–629. doi: 10.1016/j.conb.2003.09.012.
- Sun L., Liu Y., and Beadle P. J., Independent component analysis of EEG signals, Proceedings of 2005 IEEE International Workshop on VLSI Design and Video Technology, 2005, pp. 219–222.
- Klus S. Nüske F. Koltai P. Wu H. Kevrekidis I. Schütte C. Noé F. Data-Driven Model Reduction and Transfer Operator Approximation. J. Nonlinear Sci. 2018;28:985–1010. doi: 10.1007/s00332-017-9437-7.
- Sultan M. Pande V. S. tICA-Metadynamics: Accelerating Metadynamics by Using Kinetically Selected Collective Variables. J. Chem. Theory Comput. 2017;13:2440–2447. doi: 10.1021/acs.jctc.7b00182.
- McCarty J. Parrinello M. A variational conformational dynamics approach to the selection of collective variables in metadynamics. J. Chem. Phys. 2017;147:204109. doi: 10.1063/1.4998598.
- Tiwary P. Berne B. J. Spectral gap optimization of order parameters for sampling complex molecular systems. Proc. Natl. Acad. Sci. U. S. A. 2016;113:2839–2844. doi: 10.1073/pnas.1600917113.
- Ghosh K. Dixit P. D. Agozzino L. Dill K. A. The Maximum Caliber Variational Principle for Nonequilibria. Annu. Rev. Phys. Chem. 2020;71:213–238. doi: 10.1146/annurev-physchem-071119-040206.
- Bell A. J. Sejnowski T. J. An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Comput. 1995;7:1129–1159. doi: 10.1162/neco.1995.7.6.1129.
- Hyvärinen A., Karhunen J. and Oja E., Independent Component Analysis, John Wiley and Sons, Ltd, 2001, ch. 10, pp. 221–227.
- Rutledge D. Jouan-Rimbaud Bouveresse D. Independent Components Analysis with the JADE algorithm. TrAC, Trends Anal. Chem. 2013;50:22–32. doi: 10.1016/j.trac.2013.03.013.
- Hyvärinen A. Oja E. Independent component analysis: algorithms and applications. Neural Networks. 2000;13:411–430. doi: 10.1016/S0893-6080(00)00026-5.
- Sultan M. M. Pande V. S. Automated design of collective variables using supervised machine learning. J. Chem. Phys. 2018;149:094106. doi: 10.1063/1.5029972.
- Cortes C. Vapnik V. Support-vector networks. Mach. Learn. 1995;20:273–297.
- Crammer K. Dekel O. Keshet J. Shalev-Shwartz S. Singer Y. Online Passive-Aggressive Algorithms. J. Mach. Learn. Res. 2006;7:551–585.
- Yu H.-F. Huang F.-L. Lin C.-J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach. Learn. 2011;85:41–75. doi: 10.1007/s10994-010-5221-8.
- Freund Y. Schapire R. E. Large Margin Classification Using the Perceptron Algorithm. Mach. Learn. 1999;37:277–296. doi: 10.1023/A:1007662407062.
- Martinez A. M. Kak A. C. PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell. 2001;23:228–233. doi: 10.1109/34.908974.
- Mendels D. Piccini G. Parrinello M. Collective Variables from Local Fluctuations. J. Phys. Chem. Lett. 2018;9:2776–2781. doi: 10.1021/acs.jpclett.8b00733.
- Sidky H. Chen W. Ferguson A. L. Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation. Mol. Phys. 2020;118:e1737742. doi: 10.1080/00268976.2020.1737742.
- Wang Y. Lamim Ribeiro J. M. Tiwary P. Machine learning approaches for analyzing and enhancing molecular dynamics simulations. Curr. Opin. Struct. Biol. 2020;61:139–145. doi: 10.1016/j.sbi.2019.12.016.
- Creutzig F. Sprekeler H. Predictive Coding and the Slowness Principle: An Information-Theoretic Approach. Neural Comput. 2008;20:1026–1041. doi: 10.1162/neco.2008.01-07-455.
- Hochreiter S. Schmidhuber J. Long Short-Term Memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735.
- Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A. N., Kaiser L., and Polosukhin I., Attention Is All You Need, 2017.
- Hernández C. X. Wayment-Steele H. K. Sultan M. M. Husic B. E. Pande V. S. Variational encoding of complex dynamics. Phys. Rev. E. 2018;97:062412. doi: 10.1103/PhysRevE.97.062412.
- Wang Y. Ribeiro J. M. L. Tiwary P. Past–future information bottleneck for sampling molecular reaction coordinate simultaneously with thermodynamics and kinetics. Nat. Commun. 2019;10:3573. doi: 10.1038/s41467-019-11405-4.
- Goldfeld Z., and Polyanskiy Y., The Information Bottleneck Problem and Its Applications in Machine Learning, 2020.
- Shwartz-Ziv R., and Tishby N., Opening the Black Box of Deep Neural Networks via Information, 2017.
- Yu Y. Wang J. Chen Z. Wang G. Shao Q. Shi J. Zhu W. Structural insights into HIV-1 protease flap opening processes and key intermediates. RSC Adv. 2017;7:45121–45128. doi: 10.1039/C7RA09691G.
- Recacha R. Leitans J. Akopjana I. Aprupe L. Trapencieris P. Jaudzems K. Jirgensons A. Tars K. Structures of plasmepsin II from Plasmodium falciparum in complex with two hydroxyethylamine-based inhibitors. Acta Crystallogr., Sect. F: Struct. Biol. Commun. 2015;71:1531–1539. doi: 10.1107/S2053230X15022049.
- Limongelli V. Ligand binding free energy and kinetics calculation in 2020. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2020;10:e1455. doi: 10.1002/wcms.1455.
- Raniolo S. Limongelli V. Ligand binding free-energy calculations with funnel metadynamics. Nat. Protoc. 2020;15:2837–2866. doi: 10.1038/s41596-020-0342-4.
- Capelli R. Carloni P. Parrinello M. Exhaustive Search of Ligand Binding Pathways via Volume-Based Metadynamics. J. Phys. Chem. Lett. 2019;10:3495–3499. doi: 10.1021/acs.jpclett.9b01183.
- Kappel K. Miao Y. McCammon J. A. Accelerated molecular dynamics simulations of ligand binding to a muscarinic G-protein-coupled receptor. Q. Rev. Biophys. 2015;48:479–487. doi: 10.1017/S0033583515000153.
- Araki M. Matsumoto S. Bekker G.-J. Isaka Y. Sagae Y. Kamiya N. Okuno Y. Exploring ligand binding pathways on proteins using hypersound–accelerated molecular dynamics. bioRxiv. 2020:1–14. doi: 10.1038/s41467-021-23157-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kokh D. B. Amaral M. Bomke J. Grädler U. Musil D. Buchstaller H.-P. Dreyer M. K. Frech M. Lowinski M. Vallee F. Bianciotto M. Rak A. Wade R. C. Estimation of Drug-Target Residence Times by τ−Random Acceleration Molecular Dynamics Simulations. J. Chem. Theory Comput. 2018;14:3859–3869. doi: 10.1021/acs.jctc.8b00230. [DOI] [PubMed] [Google Scholar]
- Nunes-Alves A. Zuckerman D. M. Arantes G. M. Escape of a Small Molecule from Inside T4 Lysozyme by Multiple Pathways. Biophys. J. 2018;114:1058–1066. doi: 10.1016/j.bpj.2018.01.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ahn S.-H. Jagger B. Amaro R. E. Ranking of Ligand Binding Kinetics using a Weighted Ensemble Approach and Comparison with Milestoning. Biophys. J. 2020;118:305a. doi: 10.1016/j.bpj.2019.11.1725. [DOI] [PubMed] [Google Scholar]
- Dickson A. Lotz S. D. Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore. Biophys. J. 2017;112:620–629. doi: 10.1016/j.bpj.2017.01.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Johnston J. M., and Filizola M. in G Protein-Coupled Receptors - Modeling and Simulation, ed.M. Filizola, Springer Netherlands, Dordrecht, 2014, pp. 95–125 [Google Scholar]
- Marchi M. Ballone P. Adiabatic bias molecular dynamics: A method to navigate the conformational space of complex molecular systems. J. Chem. Phys. 1999;110:3697–3702. doi: 10.1063/1.478259. [DOI] [Google Scholar]
- Yen Y.-C. Kammeyer A. M. Jensen K. C. Tirlangi J. Ghosh A. K. Mesecar A. D. Development of an Efficient Enzyme Production and Structure-Based Discovery Platform for BACE1 Inhibitors. Biochemistry. 2019;58:4424–4435. doi: 10.1021/acs.biochem.9b00714. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Casasnovas R. Limongelli V. Tiwary P. Carloni P. Parrinello M. Unbinding Kinetics of a p38 MAP Kinase Type II Inhibitor from Metadynamics Simulations. J. Am. Chem. Soc. 2017;139:4780–4788. doi: 10.1021/jacs.6b12950. [DOI] [PubMed] [Google Scholar]
- Lamim Ribeiro J. M. Provasi D. Filizola M. A combination of machine learning and infrequent metadynamics to efficiently predict kinetic rates, transition states, and molecular determinants of drug dissociation from G protein-coupled receptors. J. Chem. Phys. 2020;153:124105. doi: 10.1063/5.0019100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dickson A. Lotz S. D. Multiple Ligand Unbinding Pathways and Ligand-Induced Destabilization Revealed by WExplore. Biophys. J. 2017;112:620–629. doi: 10.1016/j.bpj.2017.01.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Limongelli V. Bonomi M. Parrinello M. Funnel metadynamics as accurate binding free-energy method. Proc. Natl. Acad. Sci. U. S. A. 2013;110:6358–6363. doi: 10.1073/pnas.1303186110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trbovic N. Kim B. Friesner R. A. Palmer III A. G. Structural analysis of protein dynamics by MD simulations and NMR spin-relaxation. Proteins: Struct., Funct., Bioinf. 2008;71:684–694. doi: 10.1002/prot.21750. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sharp K. A. O'Brien E. Kasinath V. Wand A. J. On the relationship between NMR-derived amide order parameters and protein backbone entropy changes. Proteins: Struct., Funct., Bioinf. 2015;83:922–930. doi: 10.1002/prot.24789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li D.-W. Brüschweiler R. A Dictionary for Protein Side-Chain Entropies from NMR Order Parameters. J. Am. Chem. Soc. 2009;131:7226–7227. doi: 10.1021/ja902477s. [DOI] [PubMed] [Google Scholar]
- Gu Y. Li D.-W. Brüschweiler R. NMR Order Parameter Determination from Long Molecular Dynamics Trajectories for Objective Comparison with Experiment. J. Chem. Theory Comput. 2014;10:2599–2607. doi: 10.1021/ct500181v. [DOI] [PubMed] [Google Scholar]
- Villa A. Gerhard S. What NMR Relaxation Can Tell Us about the Internal Motion of an RNA Hairpin: A Molecular Dynamics Simulation Study. J. Chem. Theory Comput. 2006;2:1228–1236. doi: 10.1021/ct600160z. doi: 10.1021/ct600160z. [DOI] [PubMed] [Google Scholar]
- Prompers J. J. Brüschweiler R. General Framework for Studying the Dynamics of Folded and Nonfolded Proteins by NMR Relaxation Spectroscopy and MD Simulation. J. Am. Chem. Soc. 2002;124:4522–4534. doi: 10.1021/ja012750u.
- Weininger U. Modig K. Akke M. Ring Flips Revisited: 13C Relaxation Dispersion Measurements of Aromatic Side Chain Dynamics and Activation Barriers in Basic Pancreatic Trypsin Inhibitor. Biochemistry. 2014;53:4519–4525. doi: 10.1021/bi500462k.
- Dreydoppel M. Raum H. N. Weininger U. Slow ring flips in aromatic cluster of GB1 studied by aromatic 13C relaxation dispersion methods. J. Biomol. NMR. 2020;74:183–191. doi: 10.1007/s10858-020-00303-3.
- Hattori M. Li H. Yamada H. Akasaka K. Hengstenberg W. Gronwald W. Kalbitzer H. R. Infrequent cavity-forming fluctuations in HPr from Staphylococcus carnosus revealed by pressure and temperature dependent tyrosine ring flips. Protein Sci. 2004;13:3104–3114. doi: 10.1110/ps.04877104.
- Pastor N. Amero C. Information flow and protein dynamics: the interplay between nuclear magnetic resonance spectroscopy and molecular dynamics simulations. Front. Plant Sci. 2015;6:306. doi: 10.3389/fpls.2015.00306.
- Valldeperas M. Talaikis M. Dhayal S. K. Velička M. Barauskas J. Niaura G. Nylander T. Encapsulation of Aspartic Protease in Nonlamellar Lipid Liquid Crystalline Phases. Biophys. J. 2019;117:829–843. doi: 10.1016/j.bpj.2019.07.031.
- Wang J. Ferguson A. L. Nonlinear reconstruction of single-molecule free-energy surfaces from univariate time series. Phys. Rev. E. 2016;93:032412. doi: 10.1103/PhysRevE.93.032412.
- Su H. Xu Y. Application of ITC-Based Characterization of Thermodynamic and Kinetic Association of Ligands With Proteins in Drug Design. Front. Pharmacol. 2018;9:1133. doi: 10.3389/fphar.2018.01133.
- Celej M. S. Dassie S. A. Gonzalez M. Bianconi M. L. Fidelio G. D. Differential scanning calorimetry as a tool to estimate binding parameters in multiligand binding proteins. Anal. Biochem. 2006;350:277–284. doi: 10.1016/j.ab.2005.12.029.
- Dan N. Bhakat S. New paradigm of an old target: An update on structural biology and current progress in drug design towards plasmepsin II. Eur. J. Med. Chem. 2015;95:324–348. doi: 10.1016/j.ejmech.2015.03.049.
- Cheuka P. M. Dziwornu G. Okombo J. Chibale K. Plasmepsin Inhibitors in Antimalarial Drug Discovery: Medicinal Chemistry and Target Validation (2000 to Present). J. Med. Chem. 2020;63:4445–4467. doi: 10.1021/acs.jmedchem.9b01622.
- Dondorp A. M. et al. Artemisinin Resistance in Plasmodium falciparum Malaria. N. Engl. J. Med. 2009;361:455–467. doi: 10.1056/NEJMoa0808859. PMID: 19641202.
- Wellems T. E. Plowe C. V. Chloroquine-Resistant Malaria. J. Infect. Dis. 2001;184:770–776. doi: 10.1086/322858.
- Jiang H. Fan M. Wang J. Sarma A. Mohanty S. Dokholyan N. V. Mahdavi M. Kandemir M. T. Guiding Conventional Protein–Ligand Docking Software with Convolutional Neural Networks. J. Chem. Inf. Model. 2020;60:4594–4602. doi: 10.1021/acs.jcim.0c00542.
- Wang B. Ng H.-L. Deep neural network affinity model for BACE inhibitors in D3R Grand Challenge 4. J. Comput.-Aided Mol. Des. 2020;34:201–217. doi: 10.1007/s10822-019-00275-z.
- Wallach I. Dzamba M. Heifets A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. CoRR. 2015;abs/1510.02855.
- Cournia Z. Allen B. Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017;57:2911–2937. doi: 10.1021/acs.jcim.7b00564.
- Verteramo M. L. Stenström O. Ignjatović M. M. Caldararu O. Olsson M. A. Manzoni F. Leffler H. Oksanen E. Logan D. T. Nilsson U. J. Ryde U. Akke M. Interplay between Conformational Entropy and Solvation Entropy in Protein–Ligand Binding. J. Am. Chem. Soc. 2019;141:2012–2026. doi: 10.1021/jacs.8b11099.
- He J. Kolovos A. Bayesian maximum entropy approach and its applications: a review. Stoch. Environ. Res. Risk Assess. 2018;32:859–877. doi: 10.1007/s00477-017-1419-7.
- Allarakhia M. Ajuwon L. Understanding and creating value from open source drug discovery for neglected tropical diseases. Expert Opin. Drug Discov. 2012;7:643–657. doi: 10.1517/17460441.2012.690390. PMID: 22657529.
- Parks C. D. Gaieb Z. Chiu M. Yang H. Shao C. Walters W. P. Jansen J. M. McGaughey G. Lewis R. A. Bembenek S. D. Ameriks M. K. Mirzadegan T. Burley S. K. Amaro R. E. Gilson M. K. D3R grand challenge 4: blind prediction of protein–ligand poses, affinity rankings, and relative binding free energies. J. Comput.-Aided Mol. Des. 2020;34:99–119. doi: 10.1007/s10822-020-00289-y.