Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread widely around the world. It is necessary to examine the viral proteins that play a key role in the invasion of the human body. The main protease (3CLpro) facilitates the maturation of the coronavirus. The dimerization of 3CLpro is thought to lead to its catalytic activity; the detailed mechanism, however, has not been established. Furthermore, the structural differences between the predecessor SARS-CoV 3CLpro and SARS-CoV-2 3CLpro are not fully understood. Here, we show the structural and dynamical differences between the two main proteases, and demonstrate the relationship between the dimerization and the activity via atomistic molecular dynamics simulations. Simulating monomeric and dimeric 3CLpro systems for each protease, we show that (i) global dynamics between the two different proteases are not conserved, (ii) the dimerization stabilizes the catalytic dyad and hydration water molecules behind the dyad, and (iii) the substrate-binding site (active site) and hydration water molecules in each protomer fluctuate asymmetrically. We then speculate on the roles of hydration water molecules in the catalytic activity.
Keywords: SARS-CoV-2, Cysteine protease, Atomistic molecular dynamics simulation, Catalysis, Water molecule, Dimerization
1. Introduction
Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) was originally identified in 2003 [1]. In 2019, the novel SARS coronavirus (SARS-CoV-2) emerged and has been spreading around the world. Both viruses cause severe pneumonia. Each virus encapsulates a single-stranded RNA in its viral envelope, and protrusions (i.e., spike proteins) project from the surface so that the virus can recognize certain receptors on cells.
The main protease (3-chymotrypsin-like protease, 3CLpro) plays an important role in the maturation of the SARS-CoV and SARS-CoV-2 coronaviruses. The protease is a cysteine protease and has a catalytic dyad formed by histidine (H41) and cysteine (C145), which are highly conserved among different coronaviruses (Fig. S1). The protease cleaves the polyprotein translated from the viral RNA, breaking the polyprotein into functional pieces, which include the protease itself. In the present study, we call the protease “3CLpro” and use the term “active site” for the region consisting of the physiological-ligand binding pocket and the hydrolysis-reaction catalytic site.
Because SARS-CoV 3CLpro and SARS-CoV-2 3CLpro are proteases, designing binders to the active site is a direct way to suppress the catalytic activity. Following this approach, many investigations have attempted to identify novel binders to the active site [2], [3], [4], [5], [6], [7]. For instance, a Michael acceptor inhibitor (N3) was originally designed to inhibit SARS-CoV 3CLpro [2]. N3 also binds to the active site of SARS-CoV-2 3CLpro (PDBID: 6LU7 and 7BQY) [3].
Another way to impair the catalytic activity can be to inhibit the dimerization of the protease [8]. For dimeric SARS-CoV 3CLpro, an X-ray experiment suggested that the substrate pocket S1 in the active site of one protomer was collapsed whereas that of the other was not, and hence that one protomer is active but the other is inactive [9]. An analysis of MD trajectories for the dyad conformation suggested that one protomer in the dimeric SARS-CoV 3CLpro was inactive and that monomeric SARS-CoV 3CLpro was inactive [10]. However, for dimeric SARS-CoV-2 3CLpro, room-temperature X-ray crystallography demonstrated that the collapse of the S1 pocket in the dimer did not occur; the authors pointed out that previous crystal structures might contain an artefact [11]. The dimer dissociation constant for both SARS-CoV and SARS-CoV-2 3CLpro was estimated; the catalytic efficiency of SARS-CoV-2 3CLpro was also found to be slightly higher than that of SARS-CoV 3CLpro [12]. Subsequently, the dissociation constant was revised by native mass spectrometry, giving a value of about 0.14 ± 0.03 [13]. The dimerization and the enzymatic activity were weakened by both the C- and N-terminal truncations of SARS-CoV 3CLpro [14]. On the other hand, the C-terminal truncation, which ranged from residue 300 to 306, did not cause noticeable effects on the quaternary structure [15]. The stability of several known compounds on the active site was assessed to find potent binding sites [16]. The blind docking result of that study implied that the dimer interface would be a potent target.
Although the relationship between the dimerization and the catalytic activity has been investigated so far [8], few studies demonstrate how they relate and why the dimerization matters for the catalytic activity. Furthermore, it is unclear whether the results for SARS-CoV 3CLpro also apply to SARS-CoV-2 3CLpro. It is thus necessary to investigate this relationship and the differences between the two proteases. It should be confirmed whether SARS-CoV-2 3CLpro behaves in the same way as SARS-CoV 3CLpro, that is, whether it is inactive in a monomeric form and active in a dimeric form. No study has revealed, from a statistical and structural viewpoint, why the monomer is inactive but the dimer is active. The present study aims to unveil the relationship between the protease's structure/dynamics and its catalytic activity.
In the present study, we demonstrate that the dimerization of the proteases stabilizes their catalytic dyad and the hydration water behind the dyad. We apply atomistic molecular dynamics simulations to calculate the global dynamics, dyad stability, and hydration water dynamics. We find that the global dynamics of SARS-CoV-2 3CLpro differ from those of SARS-CoV 3CLpro. For both dimeric proteases, the two active sites follow different structural probability distributions. The catalytic dyad is retained in a dimeric state but not in a monomeric state, and hydration water molecules behind the dyad stay longer in a dimeric state than in a monomeric state.
2. Materials and methods
2.1. Preparation of simulation systems
All the target proteases in the present study are shown in Table 1. One simulation system for a dimeric system of SARS-CoV-2 3CLpro was constructed by homology modelling based on the structure of PDBID 1UJ1. The template sequence is shown in Table S1. Since the sequence identity is 96.3 %, the homology modelling approach would provide a reliable structural model. Furthermore, another dimeric system of SARS-CoV-2 3CLpro was constructed from the X-ray structure of PDBID 6LU7. The structural data of the homodimer was downloaded from the “Biological Assembly” of the PDB 6LU7. Note that one protomer's conformation in the dimeric system is identical to the other protomer's because the asymmetric unit of the PDBID 6LU7 is monomeric (i.e., the root mean square displacement between the two protomers is zero). On the basis of the chain IDs, each chain was named “protomer A” and “protomer B”.
Table 1.
Target | State | No. | Ref. | PDB ID | No. of atoms | No. of runs | 
---|---|---|---|---|---|---|---
SARS-CoV 3CLpro | Dimeric | 1 | [9] | 1UK3 (a) | 87,554 | 5 | 1.0
SARS-CoV 3CLpro | Dimeric | 2 | [9] | 1UJ1 (b) | 79,123 | 5 | 1.0
SARS-CoV 3CLpro | Monomeric | 3 | [9] | 1UK3A (c) | 68,521 | 5 | 1.0
SARS-CoV 3CLpro | Monomeric | 4 | [9] | 1UK3B (d) | 70,073 | 5 | 1.0
SARS-CoV-2 3CLpro | Dimeric | 5 | N/A | HM (e) | 78,938 | 5 | 1.0
SARS-CoV-2 3CLpro | Dimeric | 6 | [18] | 6LU7 (f) | 79,466 | 10 | 1.0
SARS-CoV-2 3CLpro | Monomeric | 7 | [18] | 6LU7 (f) | 62,290 | 5 | 1.0
SARS-CoV-2 3CLpro | Dimeric (g) | 8 | [19] | 6WQF | 77,617 | 5 | 1.0
SARS-CoV-2 3CLpro | Monomeric (g) | 9 | [19] | 6WQF | 62,200 | 5 | 1.0
a. This structure was determined at pH 7.6 [9]. MD simulations were performed for this structure in our previous study [20].
b. This structure was determined at pH 6.0 [9].
c. Protomer A of 1UK3.
d. Protomer B of 1UK3.
e. A homology model (HM) derived from 1UJ1. This model was built by SWISS-MODEL [21],[22]. The query sequence is shown in Table S1 in the supporting information.
f. The ligand was removed from 6LU7, and thus the systems from 6LU7 were in an apo form.
g. The dyad was ionized (i.e., H41+…C145−).
The protonation state of the histidine H41 would be crucial for the catalytic activity of 3CLpro. For a careful assignment, the catalytic site in a known X-ray structure of 3CLpro (PDBID: 1UJ1) was examined. The surroundings of the catalytic dyad (H41-C145) imply that one imidazole nitrogen of H41 is protonated and the other is not (Fig. S2). Accordingly, H41 was modelled with only that nitrogen protonated in simulations No. 1-7. Nevertheless, because it is also plausible for 3CLpro to form an ionized dyad [17], a dimeric system with two ionized dyads and a monomeric system with an ionized dyad were also constructed (No. 8-9 in Table 1). The computational results of the ionized systems are given in the Supporting Information.
For each structure prepared, the N- and C-termini were capped by acetyl and amine groups, respectively. Then, each of the structures was immersed in a tetrahedron water box whose margin was 10 Å from each of the planes of the box to the structure. The salt concentration in the box was set to 0.1 M. Energy minimization was executed with all heavy atoms restrained and then without any restraints. Subsequently, five NVT simulations with different initial velocities were performed for 100 ps, followed by five NPT simulations at 1 bar for 100 ps. From the five equilibrated structures, five independent NPT simulations at 1 bar and 310 K were performed. Only for the dimeric system of PDBID 6LU7, the number of simulations was 10. The production runs for the dimeric system of 6LU7 and for the other systems are summarized in Table 1.
Computational settings were as follows. The time step was set to 2 fs, and an interaction table for the cut-off scheme [23] was used, whose list was updated every 20 fs. A leap-frog algorithm was used as the integrator, and LINCS [24] was used for hydrogen-bond constraints (lincs_iter=1, lincs_order=4). V-rescale [25] was used for temperature control of two groups, protein and non-protein (compressibility = $4.5 \times 10^{-6}$). Berendsen coupling [26] was used for the first box equilibration, and Parrinello-Rahman coupling was used for all production runs to control pressure [27],[28]. Short-range electrostatic and van der Waals interaction cut-off values were set to 12 Å. Amber ff99SB-ILDN was employed as the protein force field [29] and the TIP3P model [30] for water molecules. The reaction-field method [31] was used for the long-range interaction calculation with the relative dielectric constant set to infinity. Note that, in Gromacs notation, the option epsilon-rf=0 means that the relative dielectric constant is infinity. Snapshots were stored every 100 ps. Gromacs 2019.2 [32], [33], [34], [35], [36] was used for all simulations, which were run on the supercomputer TSUBAME 3.0. For analysis, MDAnalysis [37],[38] was used for parsing MD trajectories. All pictures of molecules were drawn with PyMOL 2.3.0 [39]. Every MD trajectory was analyzed over a fixed time window (see Figs. S3 and S4).
2.2. Principal Component Analysis (PCA)
Cartesian-based PCA (cPCA) was used to describe global protein dynamics. cPCA provides dynamic modes, i.e., eigenvectors $\mathbf{v}_i^{X}$ ($i = 1, \dots, 3N$), where $X$ is a system name and $3N$ is the number of coordinates of interest. The eigenvectors were arranged in descending order of their eigenvalues. Correlations between $\mathbf{v}_i^{X}$ and $\mathbf{v}_j^{Y}$ of different proteases were calculated (i.e., inner products $\mathbf{v}_i^{X} \cdot \mathbf{v}_j^{Y}$) [40]. Here, cPCA was applied to trajectories of the SARS-CoV 3CLpro and SARS-CoV-2 3CLpro systems. An MD snapshot of a trajectory was expressed as $\mathbf{q} = (x_1, y_1, z_1, \dots, x_N, y_N, z_N)$, where $N$ is the number of atoms. Structural superimposition for every snapshot was carried out for all the atoms. The variance-covariance matrix was defined as $C = \langle (\mathbf{q} - \langle \mathbf{q} \rangle)(\mathbf{q} - \langle \mathbf{q} \rangle)^{\mathsf{T}} \rangle$, where $\langle \cdot \rangle$ is the ensemble average at temperature $T$. $C$ was diagonalized, thereby generating eigenvectors $\mathbf{v}_i$ ($i = 1, \dots, 3N$). Inner products $\mathbf{v}_i^{X} \cdot \mathbf{v}_j^{Y}$ were computed. Global modes were defined as the top-3 eigenvectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. The inner products indicate the correlation of the global modes between SARS-CoV-2 3CLpro and SARS-CoV 3CLpro.
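The cPCA procedure and the mode comparison can be illustrated with a minimal NumPy sketch. It is not the authors' analysis script: the function names `global_modes` and `mode_overlap`, the synthetic random "trajectories", and the use of the absolute value of the inner product (eigenvector signs are arbitrary) are assumptions made for illustration. Real input would be superimposed coordinates parsed from the MD trajectories.

```python
import numpy as np

def global_modes(traj, n_modes=3):
    """Return the top eigenvalues and unit eigenvectors of the covariance matrix.

    traj: array of shape (n_frames, 3N) holding superimposed coordinates.
    """
    centered = traj - traj.mean(axis=0)
    cov = centered.T @ centered / traj.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]     # re-sort to descending order
    return eigvals[order], eigvecs[:, order].T      # one mode per row

def mode_overlap(modes_a, modes_b):
    """Absolute inner products between two sets of unit-length modes."""
    return np.abs(modes_a @ modes_b.T)

# Synthetic stand-ins for two systems (hypothetical data, 4 atoms -> 12 coordinates).
rng = np.random.default_rng(0)
scale = np.linspace(3.0, 0.5, 12)
traj_a = rng.normal(size=(200, 12)) * scale
traj_b = rng.normal(size=(200, 12)) * scale

_, modes_a = global_modes(traj_a)
_, modes_b = global_modes(traj_b)
overlap = mode_overlap(modes_a, modes_b)   # 3 x 3 table of mode correlations
self_overlap = mode_overlap(modes_a, modes_a)
```

Because the eigenvectors returned by `np.linalg.eigh` are orthonormal, each entry of `overlap` is a cosine similarity between 0 and 1, and comparing a system with itself gives the identity matrix.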
Furthermore, distance-based PCA (dPCA) was also utilized to grasp a structural distribution [41]. In terms of a structural distribution, dPCA shows better performance (e.g., convergence of cumulative variances) than cPCA [41]. dPCA was thus applied to the active site, which was represented as a set of distances between atoms (Table S2). The number of selected atoms is denoted by $n$. The active site of an MD snapshot was expressed as a set of distances $\mathbf{d} = (d_1, d_2, \dots, d_M)$, where $M$ was the number of distance pairs, i.e., $M = n(n-1)/2$. The variance-covariance matrix was defined as $C_d = \langle (\mathbf{d} - \langle \mathbf{d} \rangle)(\mathbf{d} - \langle \mathbf{d} \rangle)^{\mathsf{T}} \rangle$. The matrix $C_d$ was diagonalized, generating eigenvectors $\mathbf{w}_i$ ($i = 1, \dots, M$). The active site of each snapshot was projected onto every eigenvector by $p_i = \mathbf{w}_i \cdot (\mathbf{d} - \langle \mathbf{d} \rangle)$, where $p_i$ is the position of each snapshot on the $i$-th eigenvector, i.e., the principal component (PC).
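A minimal sketch of dPCA along these lines is given below. The helper names `pairwise_distances` and `dpca_projection`, the use of all atom pairs, the mean-centering of the distances, and the random coordinates are illustrative assumptions; the atoms actually used are those listed in Table S2.

```python
import numpy as np
from itertools import combinations

def pairwise_distances(coords):
    """Convert coordinates (n_frames, n_atoms, 3) into distances (n_frames, n_pairs)."""
    pairs = np.array(list(combinations(range(coords.shape[1]), 2)))
    i, j = pairs[:, 0], pairs[:, 1]
    return np.linalg.norm(coords[:, i] - coords[:, j], axis=-1)

def dpca_projection(distances, n_components=1):
    """Project each snapshot onto the leading eigenvectors of the distance covariance."""
    centered = distances - distances.mean(axis=0)
    cov = centered.T @ centered / distances.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]    # keep the largest ones
    return centered @ eigvecs[:, order]

# Hypothetical active site of 5 atoms over 50 snapshots.
rng = np.random.default_rng(1)
coords = rng.normal(size=(50, 5, 3))
distances = pairwise_distances(coords)    # 5 atoms -> 10 distance pairs
pc1 = dpca_projection(distances)          # position of every snapshot on PC1
```

A histogram of `pc1` computed separately for each protomer would then give probability distributions of the kind compared between active sites.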
2.3. Statistical analysis of the catalytic dyad
The probability distribution of the catalytic dyad's distance was calculated (Fig. 1). On the basis of this distribution, the dyad was judged to be either broken or retained by the following procedure: (i) The probability distribution $P(d)$ of the dyad distance $d$ between H41 and C145 was calculated. (ii) The mode of $P(d)$ (the value that appears most frequently) was calculated and expressed as $d_{\mathrm{mode}}$. (iii) The deviation of $d_{\mathrm{mode}}$ from the reference distance $d_{\mathrm{ref}}$ was calculated by

$$\Delta d = \left| d_{\mathrm{mode}} - d_{\mathrm{ref}} \right| \qquad (1)$$

where $d_{\mathrm{ref}}$ was the corresponding dyad distance in the X-ray structure of PDBID 6LU7. The following condition was applied for the judgement of the dyad's state:

$$\text{dyad state} = \begin{cases} \text{broken} & \text{if } \Delta d > d_{\mathrm{c}} \\ \text{retained} & \text{otherwise} \end{cases} \qquad (2)$$

where $d_{\mathrm{c}}$ denotes the cutoff on the deviation. Using this condition, a confusion matrix was created (Table 3). This matrix was used to evaluate the structural difference of the dyad between the monomeric and the dimeric systems. A two-sided Fisher's exact test of independence was employed for the statistical assessment. The null hypothesis was that the oligomeric states (i.e., monomeric or dimeric) are independent of the states of the dyad (i.e., broken or retained). Although this test is similar to the $\chi^2$ test of independence, the use of Fisher's exact test is recommended if the sample size is less than 1000 [42]. The Matthews correlation coefficient (MCC) of the matrix was also calculated to assess the correlation between the two categories: monomeric/dimeric states and broken/retained dyad. MCC is considered a reliable statistical measure [43].
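The judgement of the dyad state from the mode of a distance distribution can be sketched as follows. The bin width, the reference distance, the cutoff, and the distance values are all hypothetical placeholders chosen for illustration, and the use of the absolute deviation is an assumption; none of them is taken from the study.

```python
from collections import Counter

def distribution_mode(distances, bin_width=0.1):
    """Return the centre of the most populated histogram bin."""
    counts = Counter(int(d // bin_width) for d in distances)
    top_bin = max(counts, key=counts.get)
    return (top_bin + 0.5) * bin_width

def dyad_state(distances, d_ref, cutoff, bin_width=0.1):
    """Classify a trajectory as 'broken' or 'retained' from the mode's deviation."""
    deviation = abs(distribution_mode(distances, bin_width) - d_ref)
    return "broken" if deviation > cutoff else "retained"

# Hypothetical dyad distances (in angstrom) from two short trajectories.
stable_traj = [3.52, 3.55, 3.58, 3.54, 6.03, 6.20]
unstable_traj = [6.03, 6.05, 6.02, 3.55]

# d_ref and cutoff are illustrative values only.
state_stable = dyad_state(stable_traj, d_ref=3.6, cutoff=1.0)
state_unstable = dyad_state(unstable_traj, d_ref=3.6, cutoff=1.0)
```

Counting the resulting labels per oligomeric state gives a 2 x 2 table of the kind used for the statistical assessment.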
Table 3.
Dyad state | Monomeric | Dimeric
---|---|---
Broken dyad | 10 | 8
Retained dyad | 5 | 32
p-value = 0.0025; MCC = 0.48.
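For readers who wish to rerun the statistics on the counts in Table 3, the following standard-library sketch implements a two-sided Fisher's exact test and the Matthews correlation coefficient. The function names are illustrative, and the two-sided convention used here (summing all tables that are no more probable than the observed one) is an assumption about how the test was configured. Values recomputed this way may differ slightly from the reported ones.

```python
from math import comb, sqrt

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)

def matthews_cc(a, b, c, d):
    """Matthews correlation coefficient of the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Counts from Table 3: rows are broken/retained dyad, columns monomeric/dimeric.
p_value = fisher_exact_two_sided(10, 8, 5, 32)
mcc = matthews_cc(10, 8, 5, 32)
```

The weights are kept as exact integers so that the comparison with the observed table does not suffer from floating-point ties.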
2.4. Residence time of water molecules
Survival probability (SP) [44] of water molecules within the defined sphere (Fig. 2) was calculated by the following procedure. The average coordinates $\mathbf{r}_{\mathrm{c}}$ of the triad (H41, H164, and D187) are calculated for each snapshot. The number of water molecules at time $t$ within the sphere of radius 5 Å centered at $\mathbf{r}_{\mathrm{c}}$ is expressed as $N(t)$. The number of water molecules that remain within the sphere from $t$ to $t + \tau$ is represented as $N(t, t + \tau)$, where $\tau$ is called a lag time. The SP is then defined as:

$$P(\tau) = \frac{1}{T} \sum_{t=1}^{T} \frac{N(t, t + \tau)}{N(t)} \qquad (3)$$

where $T$ is the entire simulation time of interest and $0 \le P(\tau) \le 1$. The SP tells us how fast water molecules within the sphere diffuse. Let us consider two extreme cases: if $P(\tau) = 1$, the number of water molecules within the sphere never decreases after time $t$, i.e., the water molecules stay in the sphere for the time $\tau$; if $P(\tau) = 0$, the water molecules within the sphere vanish within the time $\tau$. The residence time was defined as the time at which the SP reaches 0.5. A Python library, MDAnalysis, was used for the SP calculation [37], [38], [44].
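The survival probability and the residence time can be illustrated with a small pure-Python sketch. It is a simplified stand-in for the MDAnalysis routine used in the study: the function names, the averaging over only those start frames that contain water, and the toy frame data are assumptions. The frame spacing of 100 ps follows the snapshot interval stated in the methods.

```python
def survival_probability(frames, tau):
    """Average fraction of waters that stay in the sphere for tau consecutive frames.

    frames: list of sets, each holding the IDs of waters inside the sphere.
    """
    ratios = []
    for t in range(len(frames) - tau):
        start = frames[t]
        if not start:
            continue  # no water at the start frame, so the ratio is undefined
        stayed = set.intersection(*frames[t:t + tau + 1])
        ratios.append(len(stayed) / len(start))
    return sum(ratios) / len(ratios) if ratios else 0.0

def residence_time(frames, dt, threshold=0.5):
    """Smallest lag time (in units of dt) at which the SP drops to the threshold."""
    for tau in range(1, len(frames)):
        if survival_probability(frames, tau) <= threshold:
            return tau * dt
    return None  # the SP never reached the threshold within the trajectory

# Toy example: water 2 leaves the sphere after the second snapshot.
frames = [{1, 2}, {1, 2}, {1}, {1}]
sp_lag1 = survival_probability(frames, 1)
t_res = residence_time(frames, dt=100)   # dt in ps
```

In practice the sets of water IDs would be obtained for every snapshot by selecting water oxygens within 5 Å of the triad's average coordinates.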
3. Results and discussion
This section presents results of the systems with neutral dyads mainly, whereas results of the systems with ionized dyads are given in the supporting information.
3.1. Correlation between major dynamic modes of the two proteases
Performing cPCA for the entire protein structure, we produced the dynamic modes (i.e., eigenvectors) for the dimeric SARS-CoV 3CLpro and SARS-CoV-2 3CLpro systems. Since it is unclear how much of the two proteases' global motion is conserved, we examined the correlations of the dynamic modes between the two proteases. Here, the correlation was defined as the cosine similarity between a pair of eigenvectors, which were scaled to be unit vectors. Table 2 shows the correlation for each pair of eigenvectors. The highest correlation was 0.37 between mode 3 of SARS-CoV 3CLpro and mode 3 of SARS-CoV-2 3CLpro, the second was 0.36 between mode 2 of SARS-CoV 3CLpro and mode 1 of SARS-CoV-2 3CLpro, and the third was 0.35 between mode 2 of SARS-CoV 3CLpro and mode 3 of SARS-CoV-2 3CLpro. The second mode of SARS-CoV 3CLpro was thus distributed over modes 1 and 3 of SARS-CoV-2 3CLpro. These results indicated that the major modes were not highly conserved, even though the amino-acid sequence identity between the two proteases was as high as 96%. The small number of mutations in SARS-CoV-2 3CLpro altered the structural dynamics relative to SARS-CoV 3CLpro. This alteration implied that the global dynamics were not a fundamental aspect of the proteases’ function.
Table 2.
SARS-CoV 3CLpro (1UJ1 & 1UK3) (a) | SARS-CoV-2 3CLpro (6LU7), mode 1 | SARS-CoV-2 3CLpro (6LU7), mode 2 | SARS-CoV-2 3CLpro (6LU7), mode 3
---|---|---|---
Mode 1 | 0.03 | 0.02 | 0.07
Mode 2 | 0.36 | 0.11 | 0.35
Mode 3 | 0.05 | 0.13 | 0.37
a. The trajectories were initiated from two different initial structures, 1UJ1 and 1UK3. Subsequently, they were merged into a single trajectory, which was analyzed via cPCA.
3.2. Structural Fluctuations of the Active Site
This section deals with structural fluctuations of the active site. For the analysis, we applied distance-based principal component analysis (dPCA) (Table S2 and Fig. 3E). Recall that the dimer has one active site in each protomer. We executed dPCA for each active site, obtaining the first principal component (PC1).
For the dimeric systems with neutral dyads, the active site in one protomer fluctuated differently from that in the other protomer (Fig. 3). For instance, Traj2 in Fig. 3A shows two distinguishable probability distributions of PC1. Although indistinguishable distributions such as that of Traj5 in Fig. 3D also occurred, they were rarer than the separated distributions. At the beginning of the simulations for 6LU7, the two active sites were structurally identical. This eliminates the possibility that the separated distributions were attributable to the initial structures. The asymmetric fluctuations may imply that the substrate specificity of one protomer differs from that of the other. These asymmetric fluctuations of the active site are supported by several crystal structures whose protomers have conformationally distinct active sites (e.g., PDBID: 1UK3) [9] and by earlier computational studies [10],[47]. An earlier study also pointed out the structural heterogeneity of the active site on the basis of known crystal structures [46].
For the monomeric systems with a neutral dyad, we observed multimodal probability distributions along PC1 (Fig. S5). For example, Traj5 in Fig. S5C shows a bimodal distribution, which indicated that the active site of the monomeric systems was less stable than that of the dimeric systems. Keeping the active site's conformation was thus likely to be difficult for the monomer; this can be one explanation of why the monomer is inactive. For both proteases, the dynamic modes were altered by the small number of mutations (Table 2), but the distributions of the active site showed the same tendency: each active site fluctuated asymmetrically (Fig. 3).
3.3. Dyad stability
We investigated the distance fluctuations of the catalytic dyad in order to connect the dyad's fluctuations with the catalytic activity of 3CLpro. We calculated the dyad distance between H41 and C145. For each trajectory, we computed a probability distribution of this distance and obtained its mode (i.e., the most frequent value). Using the mode and the reference distance taken from the X-ray structure, we judged whether the dyad was retained or broken (Table 3). Further details of this procedure are given in Materials and Methods.
We found that the dimeric systems tended to retain the active dyad's conformation more than the monomeric systems did (Fig. 4 and Table 3). To assess the difference between the monomeric and dimeric systems, we conducted Fisher's exact test and calculated the Matthews correlation coefficient (MCC). The difference was statistically significant (p-value = 0.0025), well below conventional significance levels such as 0.05. Furthermore, MCC was 0.48, showing a positive correlation of the broken dyad with the monomeric systems. Thus, from a statistical viewpoint, the dimeric systems are likely to retain the dyad, whereas the monomeric systems are less likely to do so. The dyad instability in the monomer was in agreement with the result that SARS-CoV 3CLpro (not SARS-CoV-2 3CLpro) at low concentration (corresponding to a monomeric state) did not show enzymatic activity [10].
Table S3 shows that about 20% of the dyads of one protomer in the dimeric systems tended to be broken. This implied that the dyad in each protomer kept its distance asymmetrically, like the asymmetric conformation of the active sites in Fig. 3. We thus conjectured that the active site and the dyad in one protomer form conformations different from those in the other protomer and that the two protomers have inhomogeneous catalytic activity. Although no study has given direct evidence for this inhomogeneity, the structural heterogeneity of the active site has been pointed out from crystal structures [46].
It should be noted that the homology modelling structure could not retain the geometry of the dyad even though the template structure 1UJ1 retained its dyad geometry (Table S3). The entire dynamics seemed appropriate, but the probability distribution of the distance for the dyad might have been affected by the modelled initial structure. It suggested that a homology model should be handled carefully by examining not only the overall dynamics but also local structural fluctuations.
3.4. Consensus hydration sites of water molecules
To investigate the role of hydration water in the catalytic activity, we identified sites in which water molecules stayed for more than 500 ns (Figs. S6 and S7). From these sites, we selected and studied five sites (M1 and M2, and D1, D2, and D3 in Figs. S6 and S7; M and D stand for Monomeric and Dimeric, respectively). Since these were found in most trajectories, we called the sites “consensus hydration sites”.
The consensus hydration sites had one noticeable feature: several water molecules were closely located and were stabilized mainly by the hydrogen bonds to N-H or C=O of the backbone (Fig. 5). For example, backbone's polar atoms in the site D2 of both protomers interacted with a water molecule (Fig. 5D). The water molecule stabilized the loop in the site D2. Interestingly, since some residues in the site D2 (e.g., F291, E290, and E288) were seen even in the site M2, this hydration site was likely to be conserved in both the monomeric and dimeric systems. The hydration site D2 may play a crucial role in the dimerization: An earlier study suggested that the mutation E290A for SARS-CoV 3CLpro led to a complete loss of the activity and dimerization because E290 in a protomer formed a salt bridge with R4 in the other protomer [48]. If the hydration water molecule does not stabilize the site D2 containing E290, forming an appropriate orientation of E290 towards R4 can be difficult. The hydration water was also observed in crystal structures.
The site D1 in the dimeric systems contained three water molecules (Fig. 5C). This site accommodated the water molecule found in most crystal structures that is hydrogen-bonded to H41, H164, and D187 [19],[45]. The three water molecules were positioned behind the catalytic dyad, whereas water molecules in the site D1 were not observed in the monomeric systems. Instead, the site M1 appeared adjacent to D1. M1 was likely to be transformed into D1 in the dimeric systems (Fig. S8).
The site D1 accommodating the three water molecules corresponds to the one where a single crystal water molecule is bound (e.g., PDB ID 6WQF, 6LU7, 1UK3, and 1UJ1) (Fig. 2). The difference between our MD trajectories and the crystal structures was the number of hydration water molecules. The crystal structures show the existence of a single crystal water molecule in the site D1, whereas our trajectories indicated several water molecules in the site.
The site D3 was observed in most of the trajectories, which was located near D1 (Fig. 5E and the blue circles in Fig. S7). Thus, the site D3 can be said to be the most agreed hydration site among the dimeric systems. Our result indicated that this site was conserved among the two different proteases (Fig. S7). Since the water molecule in the site D3 stayed near D1, it can contribute to the stability of the site D1, and thus to the stability of the dyad. In the site D3, a crystal water molecule is also seen in crystal structures (e.g., 6WQF, 6LU7, 1UK3, 1UJ1, 6M03); it corresponds to the water molecule in Fig. 5E(b). Our trajectories demonstrated that the water molecule in the site D3 fluctuated between the following hydrogen-bond networks: Y182-G174-F134 and Y161-H172 (Fig. 5E(a) and 5E(b)). We noted that D3 was not observed in all the trajectories of the monomeric systems (Fig. S6).
3.5. Residence time of water molecules in the hydration site D1
In the previous section, we discussed the five hydration sites, and the site D1 contained three water molecules, one of which corresponded to the catalytic water Oin. Here, we investigate the site D1 further, using the residence time, which indicates how long water molecules stay in a site on average (see the definition in Eqn. 3). We computed the survival probability (SP) of water molecules within the spherical space defined in Fig. 2 (this space was the definition of the site D1). We then estimated the residence time for each system (Figs. 6 and S9, and Table S4).
For the monomeric systems, the residence times for 1UK3A and 1UK3B were shorter than that for 6LU7 (left columns in Table S4). This showed that the ability of the monomeric SARS-CoV-2 3CLpro to keep water molecules in the site D1 was slightly higher than that of the monomeric SARS-CoV 3CLpro. Also, since the site D1 of the monomeric SARS-CoV-2 3CLpro had two distinctive values of the residence time (Table S4), the site D1 appeared to form at least two water-bound modes. We referred to the two water-bound modes as the “water-weakly-bound mode” (shorter residence time) and the “water-strongly-bound mode” (longer residence time), respectively. On the other hand, the site D1 in the monomeric SARS-CoV 3CLpro appeared to form only the water-weakly-bound mode, because of its short residence time.
For the dimeric systems, the site D1 of one protomer was likely to hold water molecules for a longer residence time than that of the other protomer (right columns in Table S4). For both SARS-CoV 3CLpro and SARS-CoV-2 3CLpro, the site D1 of one protomer was capable of forming the water-strongly-bound mode, whereas the site D1 of the other protomer formed the water-weakly-bound mode. The two protomers were unable to form the water-strongly-bound mode simultaneously (e.g., Fig. 6), but both were able to form the water-weakly-bound mode at the same time.
The comparison of the monomeric and dimeric systems demonstrated that the dimerization stabilized water molecules in the site D1, because the residence times of the monomeric systems were shorter than those of the dimeric systems (Table S4). Indeed, several water molecules settled in the site D1 for more than 800 ns (not residence time but simulation time). As further analysis, the WaterMap method would also be useful to obtain insight into the hydration sites and drug design [49],[50].
It is worth reiterating two features of the dimeric systems: (i) a longer residence time of water molecules in the site D1 than in the monomeric systems; (ii) a more stable dyad than in the monomeric systems. The two results showed that the dimerization stabilized the dyad and the hydration water in the site D1. Unexpectedly, the dyad stability appeared unrelated to the residence time of the hydration water (Eqn S1 and Table S5), i.e., the hydration water was unlikely to stabilize the dyad. The hydration water molecules may have other roles in the catalytic activity, and we speculate on these roles in the subsequent section.
Fig. 7 summarizes the relationship between the water molecules, the dyad, and the oligomeric states. Fig. 7A illustrates the mobility of the hydration water molecules in D1. The mobility depended on the oligomeric state of the proteases (Table S4 and Fig. S9). In a monomeric state, hydration water molecules did not stay in D1, and the dyad tended to be broken (Fig. 7B). The dimerization stabilized the water molecules in D1 and the dyad; the hydration water in D2 appeared to contribute to the dimerization (Fig. 7C).
3.6. Speculative roles of hydration water
For further consideration, we conjectured three roles of the hydration water molecules. The first possibility is that the water molecules stabilize the dyad, thereby keeping the activity. The water molecules, however, seemed irrelevant to the dyad stability because of the conditional probabilities calculated from Table S5. For the dyad stability, the dimerization did matter (Fig. 4 and Table 3).
The second possibility is that some of the water molecules in D1 may transfer protons to C145, which is covalently bonded to a substrate, allowing the substrate to dissociate. It is thought that the catalytic process involves the hydrolysis of the covalent bond, which is attacked by a water molecule [51]. We speculated that a water molecule in D1 can donate a proton to the covalent bond more easily than bulk water molecules can because of the steric hindrance of the covalently bonded substrate.
The third possibility is that the hydration water may prevent the active site from being denatured by accepting the heat generated by the catalytic reaction and dissipating it into the bulk. The hydration water molecules in the site D1 might act as a medium that takes up the heat and releases it into the bulk. Earlier studies investigated the heat generated by the catalytic reaction of some enzymes [52], [53], [54]: the heat increased the enzyme's diffusivity [52],[53], and the energy of the heat was about 5 kcal/mol [54]. These results raise the question of why the resultant heat does not cause severe denaturation of the active site. We speculated that if the heat dissipates into the hydration water molecules in D1, the active site may not be intensely perturbed. Since the water molecules in D1 were mobile (Figs. 6 and S9), they were readily released into the bulk and replaced. In this way, the active site may be prevented from being overheated. For direct verification, future work will take the propagation of the heat into account. A computational method by the group of Gerhard Stock would be suitable, which applies a local temperature jump and then keeps track of the energy propagation [55], [56], [57].
4. Conclusion
The investigation of SARS-CoV 3CLpro and SARS-CoV-2 3CLpro in an apo form provides fundamental insight into how they sustain their catalytic activity. In the present study, we simulated the two proteases. We revealed that the major global modes were not conserved (Table 2). The changes in the modes would originate from the small number of mutations in SARS-CoV-2 3CLpro. We observed that the two active sites in the dimeric 3CLpro followed different conformational distributions (Fig. 3). This may imply that each protomer has a different substrate specificity. We found that the catalytic dyad in the dimeric systems tended to be retained more than that in the monomeric systems (Fig. 4 and Table 3), which is consistent with the dimeric state being active.
Furthermore, we identified hydration water around the two proteases that might be functionally relevant (Fig. 5). Hydration water molecules in the site D1 stayed longer in a dimeric state than in a monomeric state (Fig. 6). Their residence time depended on the oligomeric states of the two proteases. Since it is thought that the dimer is the active form, these hydration water molecules might be functionally relevant. The tendencies above were also observed in the systems with ionized dyads.
Lastly, we point out unsolved problems. Although the dimerization is crucial for the catalytic activity, the dimerization interface has a large interaction surface, which makes the surface difficult to target. For future work, we illustrated important residues on the interface in “Contact Residues on Dimer Interface” in the supporting information. In addition, even though the allostery caused by several mutations has been investigated [58], the knowledge of the effects on the protease's dynamics is still not adequate for application. Future work should describe more (e.g., by pointing out allosteric pathways) to allow us to manipulate the protease's function. SARS-CoV-2 can mutate readily because it is an RNA virus, and RNA viruses mutate more quickly than DNA viruses [59]. Understanding the effects of mutations on the activity is indispensable in the fight against the viruses’ drug resistance.
CRediT authorship contribution statement
Shinji Iida: Conceptualization, Methodology, Investigation, Validation, Formal analysis, Visualization, Writing – original draft, Data curation. Yoshifumi Fukunishi: Conceptualization, Project administration, Supervision, Funding acquisition, Writing – review & editing.
Declarations of Competing Interest
None.
Acknowledgements
This work was supported by grants for the Development of core technologies for innovative drug development based upon IT from the Japan Agency for Medical Research and Development (AMED). The simulations were performed on the TSUBAME3.0 supercomputer at the Tokyo Institute of Technology via the HPCI System Research Project (Project ID: hp170020) and the TSUBAME Encouragement Program for Young/Female/Younger Users (Project ID: tge-20IJ0047).
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.bbadva.2021.100016.
References
- 1. Drosten C., et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N. Engl. J. Med. 2003;348:1967–1976. doi: 10.1056/NEJMoa030747.
- 2. Yang H., et al. Design of wide-spectrum inhibitors targeting coronavirus main proteases. PLoS Biol. 2005;3. doi: 10.1371/journal.pbio.0030324.
- 3. Jin Z., et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature. 2020;582:289–293. doi: 10.1038/s41586-020-2223-y.
- 4. Dai W., et al. Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease. Science. 2020;368:1331–1335. doi: 10.1126/science.abb4489.
- 5. Tripathi M.K., et al. Identification of bioactive molecule from Withania somnifera (Ashwagandha) as SARS-CoV-2 main protease inhibitor. J. Biomol. Struct. Dyn. 2020;0:1–14. doi: 10.1080/07391102.2020.1790425.
- 6. Tam N.M., et al. Binding of inhibitors to the monomeric and dimeric SARS-CoV-2 Mpro. RSC Adv. 2021;11:2926–2934. doi: 10.1039/d0ra09858b.
- 7. Ullrich S., Nitsche C. The SARS-CoV-2 main protease as drug target. Bioorg. Med. Chem. Lett. 2020;30. doi: 10.1016/j.bmcl.2020.127377.
- 8. Goyal B., Goyal D. Targeting the dimerization of the main protease of coronaviruses: a potential broad-spectrum therapeutic strategy. ACS Comb. Sci. 2020;22:297–305. doi: 10.1021/acscombsci.0c00058.
- 9. Yang H., et al. The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. Proc. Natl. Acad. Sci. U. S. A. 2003;100:13190–13195. doi: 10.1073/pnas.1835675100.
- 10. Chen H., et al. Only one protomer is active in the dimer of SARS 3C-like proteinase. J. Biol. Chem. 2006;281:13894–13898. doi: 10.1074/jbc.M510745200.
- 11. Kneller D.W., et al. Room-temperature X-ray crystallography reveals the oxidation and reactivity of cysteine residues in SARS-CoV-2 3CL Mpro: Insights into enzyme mechanism and drug design. IUCr J. 2020;7:1028–1035. doi: 10.1107/S2052252520012634.
- 12. Zhang L., et al. Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science. 2020;368:409–412. doi: 10.1126/science.abb3405.
- 13. El-Baba T.J., et al. Allosteric inhibition of the SARS-CoV-2 main protease: insights from mass spectrometry based assays. Angew. Chem. Int. Ed. 2020;59:23544–23548. doi: 10.1002/anie.202010316.
- 14. Hsu W.C., et al. Critical assessment of important regions in the subunit association and catalytic action of the severe acute respiratory syndrome coronavirus main protease. J. Biol. Chem. 2005;280:22741–22748. doi: 10.1074/jbc.M502556200.
- 15. Lin P.Y., Chou C.Y., Chang H.C., Hsu W.C., Chang G.G. Correlation between dissociation and catalysis of SARS-CoV main protease. Arch. Biochem. Biophys. 2008;472:34–42. doi: 10.1016/j.abb.2008.01.023.
- 16. Liang J., et al. Site mapping and small molecule blind docking reveal a possible target site on the SARS-CoV-2 main protease dimer interface. Comput. Biol. Chem. 2020;89. doi: 10.1016/j.compbiolchem.2020.107372.
- 17. Kneller D.W., et al. Unusual zwitterionic catalytic site of SARS–CoV-2 main protease revealed by neutron crystallography. J. Biol. Chem. 2020;295:17365–17373. doi: 10.1074/jbc.AC120.016154.
- 18. Jin Z., et al. Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature. 2020;582:289–293. doi: 10.1038/s41586-020-2223-y.
- 19. Kneller D.W., et al. Structural plasticity of SARS-CoV-2 3CL Mpro active site cavity revealed by room temperature X-ray crystallography. Nat. Commun. 2020;11:7–12. doi: 10.1038/s41467-020-16954-7.
- 20. Iida S., Nakamura H.K., Mashimo T., Fukunishi Y. Structural fluctuations of aromatic residues in an apo-form reveal cryptic binding sites: implications for fragment-based drug design. J. Phys. Chem. B. 2020;124:9977–9986. doi: 10.1021/acs.jpcb.0c04963.
- 21. Biasini M., et al. SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42. doi: 10.1093/nar/gku340.
- 22. Hancock J.M., Zvelebil M.J., Dunbrack R. Dictionary of Bioinformatics and Computational Biology. 2004. SwissModel.
- 23. Páll S., Hess B. A flexible algorithm for calculating pair interactions on SIMD architectures. Comput. Phys. Commun. 2013;184:2641–2650.
- 24. Hess B., Bekker H., Berendsen H.J.C., Fraaije J.G.E.M. LINCS: a linear constraint solver for molecular simulations. J. Comput. Chem. 1997;18:1463–1472.
- 25. Bussi G., Donadio D., Parrinello M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007;126. doi: 10.1063/1.2408420.
- 26. Berendsen H.J.C., Postma J.P.M., van Gunsteren W.F., DiNola A., Haak J.R. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984;81:3684–3690.
- 27. Parrinello M., Rahman A. Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys. 1981;52:7182–7190.
- 28. Nosé S., Klein M.L. Constant pressure molecular dynamics for molecular systems. Mol. Phys. 1983;50:1055–1076.
- 29. Lindorff-Larsen K., et al. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins Struct. Funct. Bioinforma. 2010;78:1950–1958. doi: 10.1002/prot.22711.
- 30. Jorgensen W.L., Chandrasekhar J., Madura J.D., Impey R.W., Klein M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983;79:926–935.
- 31. Tironi I.G., Sperb R., Smith P.E., Van Gunsteren W.F. A generalized reaction field method for molecular dynamics simulations. J. Chem. Phys. 1995;102:5451–5459.
- 32. Páll S., Abraham M.J., Kutzner C., Hess B., Lindahl E. Tackling exascale software challenges in molecular dynamics simulations with GROMACS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 2015;8759:3–27.
- 33. Berendsen H.J.C., van der Spoel D., van Drunen R. GROMACS: a message-passing parallel molecular dynamics implementation. Comput. Phys. Commun. 1995;91:43–56.
- 34. Abraham M.J., et al. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25.
- 35. Lindahl E., Hess B., van der Spoel D. GROMACS 3.0: a package for molecular simulation and trajectory analysis. J. Mol. Model. 2001;7:306–317.
- 36. Pronk S., et al. GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics. 2013;29:845–854. doi: 10.1093/bioinformatics/btt055.
- 37. Michaud-Agrawal N., Denning E.J., Woolf T.B., Beckstein O. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 2011;32:2319–2327. doi: 10.1002/jcc.21787.
- 38. Gowers R., et al. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. Proc. 15th Python Sci. Conf. 2016:98–105. doi: 10.25080/Majora-629e541a-00e.
- 39. DeLano W.L. PyMOL: An Open-Source Molecular Graphics Tool. CCP4 Newslett. Protein Crystallogr. 2002. http://www.ccp4.ac.uk/newsletters/newsletter36.pdf
- 40. Hanson B.S., et al. Continuum mechanical parameterisation of cytoplasmic dynein from atomistic simulation. Methods. 2020;185:39–48. doi: 10.1016/j.ymeth.2020.01.021.
- 41. Ernst M., Sittel F., Stock G. Contact- and distance-based principal component analysis of protein dynamics. J. Chem. Phys. 2015;143. doi: 10.1063/1.4938249.
- 42. McDonald J.H. Fisher's exact test of independence. Handb. Biol. Stat. 2014;9. doi: 10.1002/0470011815.b2a10020.
- 43. Chicco D., Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;21(6). doi: 10.1186/s12864-019-6413-7.
- 44. Liu P., Harder E., Berne B.J. On the calculation of diffusion coefficients in confined fluids and interfaces with an application to the liquid-vapor interface of water. J. Phys. Chem. B. 2004;108:6595–6602.
- 45. Tan J., et al. pH-dependent conformational flexibility of the SARS-CoV main proteinase (Mpro) dimer: Molecular dynamics simulations and multiple X-ray structure analyses. J. Mol. Biol. 2005;354:25–40. doi: 10.1016/j.jmb.2005.09.012.
- 46. Behnam M.A.M. Protein structural heterogeneity: A hypothesis for the basis of proteolytic recognition by the main protease of SARS-CoV and SARS-CoV-2. Biochimie. 2021;182:177–184. doi: 10.1016/j.biochi.2021.01.010.
- 47. Suárez D., Díaz N. SARS-CoV-2 Main Protease: A Molecular Dynamics Study. J. Chem. Inf. Model. 2020. doi: 10.1021/acs.jcim.0c00575.
- 48. Chou C.Y., et al. Quaternary structure of the severe acute respiratory syndrome (SARS) coronavirus main protease. Biochemistry. 2004. doi: 10.1021/bi0490237.
- 49. Wang L., Berne B.J., Friesner R.A. Ligand binding to protein-binding pockets with wet and dry regions. Proc. Natl. Acad. Sci. 2011;108:1326–1330. doi: 10.1073/pnas.1016793108.
- 50. Fernandez A. WaterMap originates in dehydron-based drug design. J. Pharmacogenomics Pharmacoproteomics. 2017. doi: 10.4172/2153-0645.100e156.
- 51. Wang H., et al. Comprehensive Insights into the Catalytic Mechanism of Middle East Respiratory Syndrome 3C-Like Protease and Severe Acute Respiratory Syndrome 3C-Like Protease. ACS Catal. 2020;10:5871–5890. doi: 10.1021/acscatal.0c00110.
- 52. Wand A.J. Enzymes surf the heat wave. Nature. 2015;517:149–150. doi: 10.1038/nature14079.
- 53. Riedel C., et al. The heat released during catalytic turnover enhances the diffusion of an enzyme. Nature. 2015;517:227–230. doi: 10.1038/nature14043.
- 54. Dobry A., Sturtevant J.M. Heats of hydrolysis of amide and peptide bonds. J. Biol. Chem. 1952;195:141–147.
- 55. Nguyen P.H., Park S.M., Stock G. Nonequilibrium molecular dynamics simulation of the energy transport through a peptide helix. J. Chem. Phys. 2010;132. doi: 10.1063/1.3284742.
- 56. Park S.M., Nguyen P.H., Stock G. Molecular dynamics simulation of cooling: heat transfer from a photoexcited peptide to the solvent. J. Chem. Phys. 2009;131:37–40. doi: 10.1063/1.3259971.
- 57. Gulzar A., Valiño Borau L., Buchenberg S., Wolf S., Stock G. Energy transport pathways in proteins: a non-equilibrium molecular dynamics simulation study. J. Chem. Theory Comput. 2019;15:5750–5757. doi: 10.1021/acs.jctc.9b00598.
- 58. Lim L., Shi J., Mu Y., Song J. Dynamically-driven enhancement of the catalytic machinery of the SARS 3C-Like Protease by the S284-T285-I286/A mutations on the extra domain. PLoS One. 2014;9. doi: 10.1371/journal.pone.0101941.
- 59. Peck K.M., Lauring A.S. Complexities of Viral Mutation Rates. J. Virol. 2018. doi: 10.1128/jvi.01031-17.