Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2023 Oct 16.
Published in final edited form as: J Chem Inf Model. 2020 Sep 16;60(10):5080–5102. doi: 10.1021/acs.jcim.0c00634

Impact of early pandemic stage mutations on molecular dynamics of SARS-CoV-2 Mpro

Olivier Sheik Amamuddy 1, Gennady M Verkhivker 2,3,4, Özlem Tastan Bishop 1,*
PMCID: PMC7496595  NIHMSID: NIHMS1922616  PMID: 32853525

Abstract

The new coronavirus (SARS-CoV-2) is a global threat to the world health and economy. Its dimeric main protease (Mpro), which is required for the proteolytic cleavage of viral precursor proteins, is a good candidate for drug development owing to its conservation and the absence of a human homolog. Improving our understanding of Mpro behaviour can accelerate the discovery of effective therapies to reduce mortality. 100 ns all-atom molecular dynamics simulations of 50 mutant Mpro dimers obtained from filtered sequences from the GISAID database were analysed using RMSD, RMSF, Rg, the averaged betweenness centrality and geometry calculations. The results showed that SARS-CoV-2 Mpro essentially behaves in a similar manner to its SAR-CoV homolog. However, we report the following new findings from the variants: (1) residues GLY15, VAL157 and PRO184 have mutated more than once in SARS CoV-2; (2) the D48E variant has lead to a novel “TSEEMLN”” loop at the binding pocket; (3) inactive apo Mpro does not show signs of dissociation in 100 ns MD; (4) a non-canonical pose for PHE140 widens substrate binding surface; (5) dual allosteric pockets coinciding with various stabilising and functional components of the substrate binding pocket were found to display correlated compaction dynamics; (6) high betweenness centrality values for residues 17 and 128 in all Mpro samples suggest their high importance in dimer stability - one such consequence has been observed for the M17I mutation whereby on of the N-fingers was highly unstable. (7) Independent coarse-grained Monte Carlo simulations suggest a relationship between rigidity/mutability and enzymatic function.

Our entire approach combining database preparation, variant retrieval, homology modelling, Dynamic Residue Network (DRN), relevant conformation retrieval from 1-D kernel density estimates from reaction coordinates, to other existing approaches of structural analysis and data visualisation within the coronaviral Mpro is also novel and is applicable to other coronaviral proteins.

Graphical Abstract

graphic file with name nihms-1922616-f0020.jpg

1. Introduction

The human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strain is the causative agent of the COVID-19 pandemic.1 After being first reported from the Wuhan seafood and animal market in late December 2019,2 the total number of reported cases worldwide has reached over 21 million, with observed case-fatality rates ranging between 0.8 and 14.5% among the most affected countries.3 The disease is however more severe amongst the elderly, and those living with co-morbidities that involve endothelial dysfunction,4 such as hypertension, obesity and diabetes.5 While there is currently no cure,2,6 the drugs dexamethasone and remdesivir have both been recently reported to have significantly positive effects in clinical trials.7,8 The World Health Organisation also reports 160 candidate vaccines being at different stages of clinical evaluation.9 While this is very encouraging, there is a lot of uncertainty around the behaviour of the pathogen. Drastic measures designed to limit the rate of new infections10 have resulted in global economic problems, which have affected many livelihoods, even exacerbating food insecurity.11

Owing to the rapid generation of genomic sequence data12,13 and the timely availability of 3D structural data, research into potential drugs is greatly accelerated. Fundamental research is key to understanding the pathogen’s strategies such that more informed decisions can be made about clinical interventions. As seen in other pathogens, mutations occur through the normal process of evolution, and certain advantageous variations can be selected for over time. The SARS-CoV-2 genome is RNA-based, and viruses from this category have been reported to have increased rates of mutation.14 For instance, in HIV this has lead to several levels of classification of the virus, in which certain strains can manifest different transmissibility patterns and show differing responses to existing therapies.15,16 Nevertheless, coronaviruses comprise an RNA-dependent RNA polymerase genetic proofreading machinery17,18, which may be responsible for their reduced genetic diversity, even though the persistence of the pandemic may allow for the accumulation of immunologically important mutations.3 From the data gathered from the GISAID database12 and real-time sub-sample estimates of genetic relatedness from the Nextstrain web resource,19 it is clear that the virus is evolving within the human host. More recently, a SAR-CoV-2 spike variant was reported to occur asynchronously in multiple geographic locations with the capacity for expressing higher viral concentrations in the upper respiratory tract3, hinting to some form of selection.

Although progress is being gradually made in understanding the viral structural biology and symptomatology of the disease, current knowledge is still fragmentary.20,21 The death toll and the number of infections keep on rising albeit at variable rates in different parts of the world. Thus, time is of the essence for the discovery of effective therapies. It is imperative to better characterise parts of the viral mechanisms to better understand the behaviour of the new coronavirus. Already, with the help of experimentally determined structures, genomic data and annotations, a growing number of in silico work is suggesting potential solutions to the COVID-19 pandemic using various techniques, including the use of molecular docking2227 and dynamics (MD)2836, network-analysis3743 and machine learning.2,44,45 Collectively, these may pave the way to a potential solution by prioritising inhibitors or drug targets for against COVID-19. Most of the mentioned MD research has been focussed on the application of the technique to filter and prioritise hit compounds from small compound docking. Suárez and colleagues however, have investigated apo and substrate-bound Mpro using 2μs MD simulations in monomeric and dimeric form, to reveal instability of domain III in the former, using 4 protease crystal structures from SARS-CoV-236. In this study we examine some of the early mutations of SARS-CoV-2 Mpro, and investigate different aspects of their dynamics using MD.

The Mpro enzyme, also known as the 3C-like protease, is one of the best studied drug targets among the coronaviruses.46 This is mainly due to the similarities in their active site and their mechanisms with the related pathogenic betacoronaviruses from previous epidemics of SARS-CoV and MERS-CoV (Middle East respiratory syndrome coronavirus).47 Mpro is a conserved drug target present in all members of the Coronavirinae subfamily48,49 and is highly similar to its SARS-CoV counterpart.47 SARS-CoV-2 Mpro does not have a human homolog,39 which reduces the chances of accidentally targeting host proteins. Alongside the papain-like protease (PLP) enzyme, Mpro plays an essential role in the process of viral maturation,2 cleaving the large precursor replicase polyprotein 1ab to produce 16 non-structural proteins.2,50 The cysteine protease functions as a homodimer and mainly comprises three domains (I-III).2 Homo-dimerisation plays an important role in the catalytic activity of Mpro, as reported in the case of the SARS-CoV Mpro homolog, where the G11A mutation completely abolished its activity by interfering with the insertion of the N-finger region (residues 1-9).51 Additionally, only one of the dimers was shown to be active at a time in52 SARS-CoV Mpro. At the N-terminus the chymotrypsin-like domain I (residues 10-99) is connected to the picornavirus 3C-protease like domain II (residues 100-182), which together form a hydrophobic substrate binding site, with catalytic residues HIS41 and CYS145.50,53 Domain III (residues 198-303), also referred to as the helical domain) is connected to domain II54 by a 15 residue linker loop, and was shown to regulate enzymatic activity in SARS-CoV55. While each domain minimally contacts its equivalent domain from the alternate chain, the majority of the dimer contact interface is a result of interactions present between domain II (chain A) and the N-finger (chain B).50 In the same manner, the N-finger from chain A contacts domain II from chain B. Each chain is referred to as a protomer.34,42,50

In this work we study the collective effects of various Mpro mutations from a filtered sample of 50 SARS-CoV-2 isolates by first mapping them on 3D structures and performing all-atom MD for each of the mutants, in addition to the reference protein. All-atom simulations were carried out at a constant protonation state initially corresponding to a pH of 7. Multiple aspects of the protein dynamics were analysed using a battery of techniques, including the averaged betweenness centrality (BC) - a metric of Dynamic Residue Network (DRN) analysis, dynamic cross-correlation (DCC),56 geometry calculations (inter-domain angles and interprotomer distances) based on the centre of mass (COM), cavity compaction analyses, the anisotropic network model (ANM) and the analysis of residue and backbone fluctuations. A new metric based on the Jaccard’s coefficient was developed to sort non-Gaussian distributions to facilitate the differentiation of large numbers such curves obtained from MD reaction coordinates. Coarse-grained Monte Carlo simulations were also independently investigated.

2. Material and Methods

2.1. Sequence and template retrieval

A high resolution (1.48 Å) biological unit for crystal structure of the SARS-CoV-2 main protease (PDB ID: 5RFV57) was retrieved from the Protein Data Bank (PDB)58 to be used as a template for homology modelling. Its sequence was used as reference throughout this work. PyMOL (version 2.4)59 was used to remove any non-protein molecules and to reconstitute the biological unit as chains A and B. SARS-CoV-2 genomes of any length were acquired from the GISAID website as a FASTA-formatted file.12

2.2. Building the mutation data set

Low coverage sequences (entries with >5% unknown nucleotides) from GISAID were not selected. A local BLAST database was then set up for these sequences using the makeblastdb command available from the BLAST+ application (version 2.8.1).60 Protease mutants were subsequently retrieved using the reference sequence as query parameter for the tblastn command with default parameters, except for the maximum number of target sequences, which was set at 10000. Identical sequences were then filtered out before selecting BLAST hits that had a 100% sequence coverage and a percentage sequence identity of < 100%. Sequences coming from non-human hosts were discarded. In order to retain as much data as possible and minimize the incorporation of sequencing errors, fold coverage was used where provided, while quality was imputed on the basis of an identical Mpro sequence being present more than once in the filtered dataset for samples not possessing sequencing coverage information.

This resulted in 50 mutant sequences with either high coverage, where available or with additional sources of support otherwise. Further details about the samples are given in the supplementary acknowledgement Table S1.

2.3. Homology modelling, pH adjustment and analysis of residue interactions

PIR-formatted target-template sequence alignment files were generated for each mutant using the BioPython library (version 1.76)61 within ad hoc Python scripts for use in MODELLER (version 9.22).62 The automodel class was used with slow refinement and a deviation of 2 Å to generate 12 models in parallel for each mutant, after which the ones with the lowest z-DOPE scores were retained. The protein was then adjusted to a pH of 7 using the PROPKA algorithm from the PDB2PQR tool (version 2.1.1).63 For visualising the overall interactions at given residue positions, the Arpeggio tool64 was used to programmatically generate the inter-residue interactions, before computing their sums using an in-house Python script. More generally, the Discovery Studio Visualizer (version 19.1) was used for describing the non-bonded interactions.65

2.4. Molecular dynamics simulations

All-atom protein MD simulations (using the AMBER03 force field) were run for the protonated dimers using GROMACS (version 2016.1)66 at the Center for High Performance Computing (CHPC). Proteins were placed in a triclinic box containing 0.15M NaCl in SPC modelled water. A minimum image distance of 1.5 nm between the solute and the box was used. The system was then energy minimized using the steepest descent algorithm with an initial step size of 0.01 nm for a maximum force of 1000 kJ/mol/nm and a maximum of 50000 steps. Temperature was subsequently equilibrated at 310 K for 50 ps, according to the NVT ensemble. Pressure was then equilibrated at 1 bar for 50 ps, using the Berendsen algorithm according to the NPT ensemble. During both NVT and NPT, the protein was position restrained and constraints were applied on all bonds. 100 ns unrestrained production runs were then performed, with constraints were applied only on H-bonds, and the Parrinello-Rahman algorithm was used for pressure coupling. In all cases a time step of 2 fs was used, with a short-range non-bonded cut-off distance of 1.1 nm and the PME algorithm for long-range electrostatic interaction calculations. Prior to analysis, the periodic boundary conditions were removed and the trajectories were corrected for rotational and translational motions.

2.5. Coarse-grained simulations

Coarse-grained (CG) models enable simulations of long timescales for protein systems and assemblies, and represent a computationally effective strategy for adequate sampling of the conformational space while maintaining physical rigour. The CABS model was employed for multiple CG simulations6771 of the SARS-CoV-2 main protease dimer structures (PDB ID: 5RFV, 6Y2E, and 6Y2F50). In this model, the CG representation of protein residues is reduced to four united atoms. The residues are represented by main-chain α-carbons (Cα), β-carbons (Cβ), the COM of side chains and another pseudoatom placed in the centre of the Cα-Cα pseudo-bond. The sampling scheme involved Monte Carlo (MC) dynamics moves including local moves of individual residues and moves of small fragments composed of 3 protein residues. 100 independent CG simulations were carried out for each studied system with the CABS-flex standalone Python package for fast simulations of protein dynamics, which is implemented as a Python 2.7 object-oriented package.71 In each simulation, the total number of cycles was set to 1,000 and the number of cycles between trajectory frames was 100. Accordingly, the total number of generated models was 2,000,000 and the total number of saved models in the trajectory used for analysis was 20,000. It was previously shown that the CABS-flex approach can accurately recapitulate all-atom MD simulations on a long timescale.6771 The results of 100 independent CG-CABS simulations for each system were averaged to obtain adequate sampling and ensure convergence of simulation runs.

2.6. Dynamic Residue Network analysis, dynamic cross correlation and normal mode analysis

The MD-TASK tool kit56 was used to calculate the residue averaged BC values over the last 50 ns of simulation for each proteins using a cut-off distance of 6.70 Å and a step size of 25, generating a total of 10,001 frames. DCC was calculated for each of the proteins using the same frames and time step, before linearising each matrix. Pairwise Pearson correlations were then performed for all linearised matrices before performing hierarchical clustering. In all cases, the Cβ and glycine Cα atoms were used. The GROMACS commands trjconv and make_ndx were utilised to reduce the trajectory sizes, to only keep Cα and Cβ atoms prior to computation. Normal modes were obtained from Cα using ProDy,72 with default parameters for the ANM.

2.7. Pocket detection and dynamic analysis

The reconstituted biological unit for the reference structure was submitted to the FTMap web server73 using default parameters. The PyMOL plugin PyVOL74 was then used to identify the surfaces of any potential cavity, specifying the protein as selection, with the default minimum volume of 200 Å3. Predictions from both tools were combined. Residues for the interprotomer subpocket were defined by visually inspecting residues in close proximity to the cavity surfaces detected by PyVOL that overlapped with part of the FTMap probe binding predictions. The pocket was selected due to its location and accessibility to the outside. The substrate binding residues (identified using PyVOL) from both protomers of Mpro were also investigated due their functional importance in catalysis, even though FTMap did not identify a binding hotspot at that location. The radius of gyration (Rg) for the subpocket and the binding pockets was then computed from the entirety of the simulated MD data in each case, using the GROMACS gyrate command. The generated data was then visualised and analysed using various open source Python libraries, such as matplotlib,75 Seaborn, Pandas,76 NumPy,77 SciPy,78 MDTraj79 and NGLview.80

2.8. Coupling peak finding to an adapted Jaccard distance metric to rank and extract equilibrium conformations from KDE distributions of MD measurements

In order to facilitate the comparison of distributions of kernel density estimation (KDE) from a large number of simulations, a new metric based the Jaccard distance (dJ) was devised and applied to rank both Gaussian and non-Gaussian distributions obtained from MD measurements based on their distance from a comparator. Non-Gaussian curves can generally be observed from a collective variable when multiple equilibrium states are sampled (leading to multimodal distributions) or when there is a skewed distribution. The metric assumes no similarity of variance or distribution shapes, but requires a range [xmin,xmax] common to both samples to be set for the KDE fitting calculations and a common area under curve, here having a value of one. KDEdJ has a minimum of zero and a maximum of one, corresponding to complete dissimilarity and identity, respectively. Scott’s rule, defined by the equation (n1d+4), was used for Gaussian kernel bandwidth selection, where n and d are the sample size and number of dimensions78,81. The fitting range was set to the global minimum and maximum values of the complete dataset to be compared. The equation used, which is based on numerical approximation of the integrals, is defined as:

KDEdJ=1xmin,g(x)f(x)xmaxf(x)+xmin,f(x)>g(x)xmaxg(x)xmin,f(x)g(x)xmaxf(x)+xmin,g(x)>f(x)xmaxg(x) (1)

where f(x) and g(x) are the kernel density estimates of the reference and test samples. The metric was applied to rank the mutant distributions according to their distance from the reference Mpro sample in the case of Cα RMSD and COM distances distributions.

Thereafter a straightforward 1D peak finding algorithm was applied to detect the modes from the ranked samples. The KDE modes correspond to zones in a conformational landscape about which structures tend to visit more frequently. Only peaks above a set minimum KDE estimate (ϵ = 0.05) were used to limit the detection insignificantly low peaks. Using this approach, equilibrium conformations corresponding to various energy minima along given reaction coordinates were extracted.

3. Results and discussion

3.1. Analysis of residue mutations and their distribution in the 3D structure

As a preliminary investigation of the propensity of the sequences to acquire novel mutations, unique residue mutations were determined across our set of 50 protein sequences, which were filtered from the GISAID database.12 While we cannot infer population frequencies (across the world) from our relatively small sample, we show from our estimate that multiple missense mutations have already occurred on each domain of the Mpro (Figure 1). These mutations are shown in Table 1 and further experimental support is reported in supplementary Table S1. From these, it can be observed that many mutations are inter-conversions of the hydrophobic side-chain residues alanine and valine. As seen in Figure 1, most residue mutations have occurred in solvent-accessible surfaces, with the exception of A7V, V20L, L89F, A116V, A129V, T135I, I136V, V157I/L, C160S, A173V, T201A, A234V and A266V, which were predicted to be buried by the PyMOL script findSurfaceResidues, using the default cut-off of 2.5 Å2.

Figure 1:

Figure 1:

Mapping of the positions showing unique mutations from the reference Mpro sequence. For clarity, domains (I-III) are coloured (red, blue and orange respectively) only for one of the monomers, while the other is represented as a grey surface. The domain linker region is in green and the N-finger is in cyan. The size of the labels denotes the number of unique mutations recorded at that position.

Table 1:

List of missense mutations in SARS-CoV-2 Mpro. Samples with noticeably large differences in Cα RMSD from the reference protease are marked with an asterisk.

GISAID ID Mutation GISAID ID Mutation GISAID ID Mutation
EPI_ISL_425319* A7V EPI_ISL_425284* A116V EPI_ISL_424470* T196M
EPI_ISL_420422 G15D EPI_ISL_421005* P108S EPI_ISL_423642 T201A
EPI_ISL_420181 G15S EPI_ISL_422860* A129V EPI_ISL_419256* L220F
EPI_ISL_425242 G15S,D48E EPI_ISL_420579 P132L EPI_ISL_421506 L232F
EPI_ISL_423772* M17I EPI_ISL_425655 T135I EPI_ISL_425235 A234V
EPI_ISL_425342 V20L EPI_ISL_420182 I136V EPI_ISL_426097* K236R
EPI_ISL_421312* T45I EPI_ISL_420510* N151D EPI_ISL_416720* Y237H
EPI_ISL_425839* M49I EPI_ISL_415503 V157I EPI_ISL_425886* D248E
EPI_ISL_418269* R60C EPI_ISL_426028 V157L EPI_ISL_418075 A255V
EPI_ISL_420306 K61R EPI_ISL_417413 C160S EPI_ISL_422919 I259T
EPI_ISL_421763* A70T EPI_ISL_418082 A173V EPI_ISL_423725* A260V
EPI_ISL_413021 G71S EPI_ISL_423288 P184S EPI_ISL_425498* V261A
EPI_ISL_415643 L89F EPI_ISL_420241* P184L EPI_ISL_421380* A266V
EPI_ISL_420059 K90R EPI_ISL_419710* A191V,L220F EPI_ISL_420610* N274D
EPI_ISL_419756 P99L EPI_ISL_415610* A193V EPI_ISL_425643 R279C
EPI_ISL_425132 Y101C EPI_ISL_421515* T198I EPI_ISL_422184* S301L
EPI_ISL_419984* R105H EPI_ISL_423007 T190I

Current genomic sequence data suggests a generally low rate of mutation in SARS-CoV-282,83. This being said, in the case of the Mpro enzyme, a relatively higher rate of non-synonymous mutations has occurred at residue position 15 (G15D/S) in domain I, residue position 157 (V157I/L) in domain II and at position 184 (P184L/S) within the inter-domain linker region. On the 3D structure, it can be seen that these mutable areas occur away from the core areas of domains I and II, hence the probable lower selective pressure for these regions. For this reason, we posit that individually, these loci may be less important for basic enzymatic function. Additionally, samples EPI_ISL_425242 and EPI_ISL_419710 have accumulated two mutations each. Of notable interest is the mutation of the active site flap residue 48 from the “TSEDMLN” loop described by Ma and coworkers84 to “TSEEMLN”” in sample EPI_ISL_425242. Earlier, the homologous region displayed the motif “TAEDMLN” in SARS-CoV. It possible that this may affect the affinity of the enzyme for its substrate and play a role in the virus’s level of fitness. Residue 15 mutations are examined in section 3.2.

Mutation rates of RNA viruses are known to be generally high, endowing them with the ability to escape host immune responses, improve their virulence, and even change tissue tropism.14,85 Extinction events are not uncommon among the RNA viruses, as seen in the influenza A H1N1 strains86 and the previous SARS-CoV strain,87 and are may be associated with the gradual accumulation of non-synonymous mutations. On the other hand, a reduction in replicative speed and genetic diversity was independently observed in the poliovirus 3DG64S mutant compared to its wild-type (WT) when rates of mutation were artificially increased by exposure to a mutagen.14,88,89 In the case of the HIV, multiple mutations of minor effect are known to collaboratively modulate the effect of other advantageous mutations already present in the protease.90,91 As a newly emerged pathogen with a relatively long incubation period and an incompletely understood biology, these facts from related viruses presuppose a potentially complex mechanism of viral evolution and adaptation, which suggests that mutations have to be closely monitored for global health and security. It is interesting to note that amongst the buried residue mutations, domain II has accumulated the highest number of these mutations in such little time, which may suggest a certain degree of tolerance to mutations in that region, despite their presence within beta strands. In the same domain, the A116V mutation occurs on a beta strand, which is supported by a rich network of hydrogen bonds. The local impacts of the A116V mutation are discussed further in section 3.2.

While several mutations are present in domain III, the current data suggests that these have not yet (at the time of writing) further evolved at these positions, and probably suggests that there might a higher fitness cost involved with mutating residues from this domain. This may be due to its major role in regulating enzyme activity, together with the stabilising N-finger region.53 It is also possible that under-sampling may have lead to a similar observation. Most of the mutations have occurred in solvent-exposed surfaces of the protein, which may be under reduced selective pressure, especially if these are in loop regions. On the other hand, interfacial residue mutations (particularly at the chain interface) may exert their effects in a more exacerbated manner by either over-stabilising or destabilising protein-protein interactions, as reported in work on other proteins.92,93 One such mutation has already happened at position 7 (A7V) in the N-finger. The local impacts of the A7V mutation are discussed in section 3.2.

3.2. Homology modelling and inspection of residue protonation states

After the preliminary analysis of Mpro at the sequence level, their 3D structures were built for further investigation. The z-DOPE scores for the best homology models obtained for each sample were all below −1, indicating that they were all native-like.62,94 Overall, z-DOPE values had a minimum of −1.48, a maximum of −1.38, with a median of −1.42. While it is not possible to simulate changes in protonation state using classical MD, an initial approximation of the most prevalent residue protonation states (at pH 7) was used for each of the Mpro samples. As there was a relatively higher level of variation in residue protonation states amongst each of the seven histidine residues found in each Mpro protomer, only the catalytic residue and the missense mutations are described herein, post homology modelling. We suspect that the high number of titratable amino acids may play a role in influencing protein behaviour at varying pH levels. Previous work in SARS-CoV indeed reports of a pH-dependent activity-switch of the main protease.95 In our case, the catalytic residue HIS41 was generally protonated at the delta nitrogen (HID) atom, but also occurred in its fully protonated state (HIP) in one of the protomers for samples EPI_ISL_419710 and EPI_ISL_425655. Protonated aspartic acid (ASH) was found in both protomers of sample EPI_ISL_420510, which was the only isolate to contain the N151D mutation. ASH was also found in sample EPI_ISL_421312, for only one of its protomers at residue position 289, which otherwise occurs in its deprotonated form in all other samples. The R105H mutation (present only in sample EPI_ISL_419984) occurs as a HID in each protomer. Similarly the Y237H mutation, present only in sample EPI_ISL_416720 occurs as HID in both of its protomers. The local residue interactions around residue mutations of interest were also investigated, by comparing them with their equivalent position in the Mpro reference, using their modelled structure. These mutations comprised residues that underwent a higher number of mutations (G15D/S, V157I/L and P184L/S) in addition to mutations that occurred at or close to the dimer interface (A7V and A116V). A7V significantly increased the number of proximal interactions to neighbouring residues when compared to the reference protein (Figure 2), and gaining in hydrophobic interactions, although a clash in van der Waal radius is additionally present.

Figure 2:

Figure 2:

Differences in the sum of each interaction type for the Mpro, only for the residue locations that either accumulated more than one non-synonymous mutation or the ones occurring close to protein interfaces. The differences were obtained by subtracting the reference values from the matching residue loci in the mutant. Sample names and the selected residue mutations are shown along the y-axis.

G15D is found to increase the number of proximal contacts by the largest extent, while also modestly increasing the number of hydrophobic interactions. By replacing alanine with valine at position 116, an increased amount of proximal interactions is gained at V116 in sample EPI_ISL_425284, with a higher amount of hydrophobic contacts to the residue. Mutations V157I/L both reduced the number of local hydrophobic contacts and resulted in a reduced van der Waal clash compared to the reference. P184L had a reduced number of proximal contacts compared to the reference, while P184S was very similar to its equivalent position in the reference. These observations only give a general indication of the local changes present in the static starting structures. In the sections that follow, the dynamic aspects of Mpro are investigated.

3.3. Estimation of the protein backbone flexibility from MD using Cα RMSD

The Cα RMSD values obtained after frame fitting to the initial frames and periodic image correction (3A) revealed a relatively high range of divergence from the reference Mpro among the 50 isolates, ranging from 15% to 94%. From the shapes of the kernel density plots, it can be seen that some mutants may be equilibrating around single energy minima (for uni-modal distributions), while others are selecting for multiple major conformations, as depicted by the multi-modal shapes. In all, these samples involve mutations A7V, M17I, A70T, A116V, K236R, Y237H, D248E, A266V and N274D, in which the last five are exclusive to domain III. A7V occurs on the N-finger, which is a critical region for Mpro dimer stability. M17I occurs on an internal loop that connects a β strand to a helix in domain I, while A70T occurs on solvent-exposed loop in the same domain. Mutation A116V occurs in a buried β strand within domain II. From their visibly positively shifted upper quartiles (and based on the relatively large KDE dJ and the visibly higher modes), we may infer that more mobile backbones were sampled for mutants EPI_ISL_421380, EPI_ISL_423772 and EPI_ISL_425886, compared to the reference. Upon visualising the trajectories, a slight, global twisting motion was observed between their protomers, whereby each protomer moved in opposite directions, whilst being tethered at the centre. However, a similar motion was also generally observed in all other samples, and is summarised using the first non-trivial mode (number 7) from the ANM using the highest probability equilibrium pose from the reference (Figure 3B). This may indicate that the global twisting motions are a normal behaviour of dimeric Mpro, at least under our simulated conditions for the apo state. On the other hand, the second non-trivial ANM mode (number 8) showed the domain rotational motions - they are similar overall for both the reference (Figure 3C) and in EPI_ISL_425886 where the highest equilibrium backbone RMSD was observed (Figure 3D), with the exception that protomer A behaved as protomer B, and vice versa in each case. The predictions were obtained from ProDy in VMD.96

Figure 3:

Figure 3:

(A) Violin plots of Cα RMSD values for the reference (in grey) and the mutant (coloured in blue) Mpro, showing the 25th, 50th and 75th percentiles in dotted lines inside the kernel density plots. Distributions are scaled by area, and have been sorted by the KDE DJ distance (shown above each distribution) computed between the each sample and the reference protease. (B) The general twisting motion of the protomers observed across Mpro samples, inferred from the highest probability density reference protease pose using the ANM mode 7. (C) Mode 8 is shown for the same reference pose and for (D) the highest recorded distribution mode in EPI_ISL_425886 at 77 ns.

We further examined these samples by collecting single conformations found at each of the local peaks (local energy minima) from their individual KDE distributions. A cut-off KDE dJ value of 60% was used as a prior filter to limit the number of samples to be assessed visually. Using this criterion, modes from the KDE distributions of 25 mutant samples (marked by an asterisk symbol in Table 1), in addition to the reference protease, were used to extract conformations (Figure 4) corresponding to different local energy minima.

Figure 4:

Figure 4:

3D visualisation of Mpro conformations extracted from KDE distributions of Cα RMSD. All panels share a common colour scale (ranging from blue, through yellow to red) depicting the Cα distances calculated from an aligned high probability reference conformation obtained at 67.84ns (in white cartoon representation). Regions of appreciably higher distances are circled in red (domain III) and black (domain I) dotted lines.

For a fair comparison, the reference protease conformation corresponding to the highest KDE peak was used as a comparator for all samples. Thereafter, conformations displaying the highest displacements were visually selected. From the bird’s eye view of the sampled backbone RMSD in Figure 4, it can be seen that the regions of highest divergence include part of domain III, and to a lesser extent domain I. Only one subunit is seen to display more movement, in each of the extracted high probability density poses. Due to the homodimeric nature of the protein, a permutation of structural superpositions of the mutant samples was performed against the reference protease chains to determine the best reference position, based on the lowest RMSD to domain II, which was the least mobile protease domain during MD. Domain I also showed notable movement, albeit of lesser magnitude than domain III. This region mainly comprised the active site components. The fact that motion happens about a more stationary centre is reminiscent of a fulcrum. The overall backbone dynamics and asymmetries are similar to what is described in SARS-CoV Mpro.

3.4. Estimation of the N-finger flexibility from MD using all-atom RMSD

The N-finger region is an important structural foundation required for the stabilisation of the functional Mpro dimer, as reported by a large body of work done in SARS-CoV97. The N-finger from one protomer controls the activity of the dimer’s alternate subunit98. Early work by Chou and co-workers demonstrated the importance of the salt bridge found between ARG4 (N-finger) of and GLU290 (domain III) from alternate protomers in subunit association99. Additional studies involving deletion53 and missense mutations51,100 have highlighted the importance of the coronaviral Mpro N-finger region in enzymatic activity. Multiple studies have also reported the dimer to be active form of the enzyme51,100, with only one protomer being active at a time52. Chen and colleagues suggested that the N-terminus only stabilises the active form of the enzyme and is not a prerequisite for dimerisation101 of the SARS-Cov Mpro. More specifically, an N-finger from each protomer interacts with the GLU166 residue from the alternate protomer to stabilise the catalytic pocket102. More recently, Li and co-workers showed that enzymatic activity is in fact coupled to the flexibility of a short loop (residues 139-141) near the active site, by engineering an active tetramutant Mpro monomer103. Their work, which was focused on understanding the stability of SARS-CoV Mpro obviates the need for a similar inquisition in its SARS-CoV-2 homolog. Therefore we started our investigation of the N-finger stability within the dimeric protein across all Mpro samples, as a means to detect any hint of reduced enzymatic activity via a reduction dimer stability.

While the observation of dissociation would give a clear verdict about inactivation (as experimentally reported in SARS-CoV98,100), its absence during an MD simulation may not inform us about inactivation. Independent simulations performed in duplicate (not shown) of dimeric Mpro containing the mutants G11A, E14A, R298A and even of the triple mutant (G11A/R298A/Q299A) did not dissociate over a period of 100ns, even though these were experimentally shown to be inactive monomers in SARS-CoV, suggesting that particular equilibrium states that may need to be traversed before observing such events. Another possibility may lie behind an incompletely understood mechanism of dimer formation, for which two models have been proposed using SARS-CoV, both suggesting that Mpro dimerisation may be substrate-induced102,104. We nevertheless examined the 100ns dynamics for any notable differences between the SARS-CoV-2 variants.

After fitting the proteins globally, no further fitting was done for the N-finger RMSD calculation in order to better represent the N-finger motions. As seen the KDE plots (Figure 5), the reference protein has two very stable N-finger conformations, characterised by unimodal distributions although the protomer equilibria are asymmetric. This tendency towards asymmetry is seen in most cases, with the exception of samples EPI_ISL_421506, EPI_ISL_417413, and to some extent EPI_ISL_425235. Due to the existence of both forms of symmetry, with asymmetry largely dominating the dynamics, it is probable that some form of dynamically-associated constraint may drive one N-finger into an equilibrium state while the other is forced into another. While one may speculate that EPI_ISL_421506 may be in an inactive state due to the perceived symmetry, other parts of the enzyme are not asymmetric. Multimodal distributions were observed in several cases, which clearly suggest the presence of multiple local minima, with transitory states occurring in between these modes. On the far right of Figure 5, samples EPI_ISL_423288, EPI_ISL_422191, EPI_ISL_420241, EPI_ISL_425643, EPI_ISL_419256, EPI_ISL_423007, EPI_ISL_420306 show median differences above 1.1 Å between the protomers. EPI_ISL_423772 showed the most prominent difference, which corresponds to a larger amount of time spent away from a stable conformational equilibrium, even though a stable equilibrium was also visited in chain A (the lowest mode for the same sample).

Figure 5:

Figure 5:

Kernel density distributions of RMSD values for the N-finger region across the mutant and reference protease complexes. The violin plots are split in two for each protein sample, showing the RMSD values for chains A (in blue) and B (in red). The tips of the distributions mark the minimum and maximum values for both chains combined in each complex. Samples have been sorted by increasing median distance between the chains, also shown (in Å) at the top of each sample distribution.

N-finger poses corresponding to the RMSD KDE peaks for EPI_ISL_423772 (M17I mutant) and the reference protease are shown in Figure 6. From the N-finger visualisation, we can observe that its distance from the high probability reference protease equilibrium pose varies from 1.0 to 13.5 Å. While subfigure (g) shows the highest distance reached by the N-finger Cα atom, all other poses (a-f and h) in Figure 6 correspond to multiple modes from the KDE distribution. Additionally, from the chain labels, the asymmetry can also be observed. It is also noteworthy to observe the transient nature of the α-helical region (residues 10-15) to which the N-finger is connected, both in the mutant and the reference. The STRIDE105 functionality within VMD (version 1.9.3) predicts the helical region to be shorter (residues 14-16) in the reference protease. This lengthening and shortening is likely related to interprotomer stability and may potentially be associated with enzymatic activity. It may therefore be a generalisation of the similar functionality observed in the chameleon loop that is directly associated with enzymatic activity switching. The same protein displayed discernible backbone motions in domain I and III (Figures 3 and 4). While the mutation occurs on a beta strand located in a core area of the protein, the non-bonded interactions are similar for both M17 and I17.

Figure 6:

Figure 6:

3D visualisation of equillibrium N-finger conformations from different Mpro samples. Each panel shows equilibrium poses of the N-finger using all-atom RMSD as the collective variable. The mutant chain labels are shown, next to the sampled times. The reference pose (chain A, 46.34ns in grey) is used as comparator in each panel, while the samples are coloured according to their Cα atom displacements from the reference, starting with blue (lowest), through yellow to red (highest). The distance values are coloured from blue (low values), through yellow to red (highest values). The distance between the first Cα atom (in Å) of each sample from the equivalent position in the reference protein is also shown.

It is clear that the N-finger can adopt a range of equilibrium conformations, with some showing protomer symmetries and others not. Above all, it can safely be assumed that all of the samples are enzymatically active, otherwise the SARS-CoV-2 would not be able to perform catalysis and thrive in the human host. The distribution shapes may therefore be different parts of a continuum along the potential energy surface of active Mpro. It is also possible that a different energy surface may manifest itself in the presence of a substrate to emphasize the differences in distribution shapes. There are hypotheses that support this claim, and are elaborated further in section 3.8.

3.5. Analysis of residue fluctuations and BC across Mpro samples

The analysis of residue fluctuations shows that there are generally interspersed areas of low and high flexibility along Mpro (supplementary Figure S1). Some local areas of higher flexibility were also seen, as is generally observed in unstructured secondary structures.106 Additionally, from the lack of a clustering between chains belonging to the same sample for each segment of Mpro (supplementary Figure S1; cluster tree was removed for figure visibility), we conclude that the residue dynamics for the same domain between alternate chains of Mpro are asymmetric in the dimeric apo form. Focusing onto the specific regions of Mpro, it can be seen that the N-terminal region of the N-finger generally displayed moderate residue fluctuations despite being sandwiched at the interface of two Mpro chains, thus agreeing with the N-finger RMSD data. Domain I is generally moderately flexible at residue positions 22-25, 33-34 and 92-93. Higher flexibility was globally observed along the intervals 45-65 and 71-74, and at position 76. Very high fluctuations were recorded for a subset of mutants at positions 46-54, most particularly for only one chain amongst the mutants EPI_ISL_416720, EPI_ISL_423642, EPI_ISL_415503 and EPI_ISL_425886, thus reinforcing the observation of asymmetry between chains. Mutations from samples EPI_ISL_416720 and EPI_ISL_425886 were already found to lead to increased backbone fluctuation using Cα RMSD. The Y237H mutation in EPI_ISL_416720 introduces two carbon H-bonds (one with L272 and another with V233), in addition to the pi-alkyl interaction that is present in both the reference and this mutant, which seem to hold the solvent exposed helices together in domain III. In a review by Horowitz and Trievel, the carbon H-bond was highlighted as an underappreciated interaction that is otherwise widespread in proteins, with the ability to form interactions as strong as conventional H-bonds via polarisation.107 It is possible that such an increase in interaction may improve the stability around this region in domain III for the Y237H mutation. In the case of D248E mutation in EPI_ISL_425886, the D248 side chain is H-bonded to Q244 in the reference structure, possibly stabilising the helical structure. This interaction is absent upon mutating to an E248. In the last frame from MD, the side chain epsilon oxygen was found to interact with its backbone hydrogen atom, indicating a decreased stabilisation of the helical region of domain III. T201A in EPI_ISL_423642 abolishes the H-bond that is otherwise present between T201 and the backbone oxygen atom of E240, very likely weakening their interaction. The V157I mutation in EPI_ISL_415503 does not significantly alter the non-bonded interactions, but occurs on a beta strand on domain II. A unique behaviour was observed in the case of chain B of EPI_ISL_423772, where the leading residues of the N-finger were the most flexible while the rest of the protein was the least flexible across all samples. However, the M17I mutation does not significantly change the non-bonded interaction in EPI_ISL_423772, but occurs on a beta strand in domain I. Domain II is most flexible across all samples at residue position 153-155. Moderate flexibility is generally observed at residue positions 100, 107, 119, 137, 141-142, 165-171, 178 and 180. The linker region was generally highly flexible in all cases within the region spanned by residues 188-197. Notably higher fluctuations were observed at position 185 in samples EPI_ISL_420610 (chain B) and EPI_ISL_423007 (chain A). Across all domains, parts of domain III contain the most flexible residues within Mpro. It is highly flexible at residue positions 222-224, 231-233, 235-236, 244-245 and 273-279. The high fluctuations observed at C-terminus residues may not be very informative in our case as they freely interact with the solvent and do not have a strong enough network of non-bonded interactions with the Mpro domains. As a whole, from the empirical cumulative distribution (ECD) of averaged RMSF values across all Mpro samples, the top 5% positions (most variable regions) comprise residues 222, 277, 223, 154, 47, 72, 50, 224, 232, 64, 279, 236, 235, 244 and 51, with values ranging from 0.386 to 0.245 nm. The bottom 5% (most stable regions) of the distribution comprise positions 149, 146, 147, 174, 29, 113, 163, 39, 124, 150, 7, 175, 38, 161 and 173, with RMSF values ranging from 0.057 to 0.077 nm. On the 3D mapping (Figure 7A), it can be seen that the regions with the highest flexibility are solvent exposed surfaces, comprising loops or parts of helices connected by loops. The central core of the enzyme has the lowest flexibility, most likely to provide structural stability to the functional dimer. Catalytic residues (HIS41 and CYS145) are connected to these stable core residues on one side. However, HIS41 is connected on the other side to a more mobile structure composed of a 310 helix connected by loops on each end, which forms a lid structure, similar to what is described earlier for previous human coronavirus strains.108,109 Further discussions around the substrate binding site are given in section 3.8.

Figure 7:

Figure 7:

3D mapping of averaged values for (A) RMSF and (B) the average BC, computed across all Mpro samples. Only the extremes (top and bottom 5% of the ECD values across all samples) of averaged values are labelled for each metric. The lowest averaged values are in yellow while the highest ones are in red. The last three C-terminus residues (in white) were not mapped by RMSF as their high values would mask other values. While only one protomer is detailed, the data is applicable to both protomers. The other protomer is represented as a grey surface.

BC is a network centrality metric that is maximised when a large number of nodes (Cβ or GLY Cα atoms) traverse a single node along geodesic paths to reach a large number of other nodes within a network. When averaged from MD frames, this metric has been shown to be approximately inversely related to RMSF.110 These values were very similar across all samples (supplementary Figure S2). For this reason, the discussion of our findings will be around the overall BC behaviour recorded across all samples. High and low BC values are present within all domains (supplementary Figure S2). The N-finger was found to be generally composed of high BC residues, most likely due to its high degree of non-bonded interactions between the protease chains. A large portion of domains I and III have low BC values, most likely due to the comparatively lower amount of contact with protein surfaces, which allows for a relatively higher mobility compared to domain II. The top 5% highest averaged BC values across all Mpro samples were either found in the monomer core regions or at the dimer interface, as seen in Figure 7B. These comprised residue positions 17, 128, 115, 111, 112, 14, 18, 39, 11, 13, 146, 148, 147, 139, 6 and 140 in descending order of overall averaged BC values, ranging from 10480.987 to 6064.650. The residue positions within the lowest 5% overall averaged BC values comprise the positions 96, 72, 92, 258, 64, 244, 47, 59, 228, 56, 76, 60, 255, 78 and 232, ranging from 8.095 to 46.251. A complete listing of the top BC residues for each protomer is given in Table S2. Compared to the high BC areas that correspond to very well maintained communication residues (which may well be actual functional sites110), the low BC values represent residues that are least important for maintaining the flow of communication across the protein, due to the transient nature of their medium to long-ranged path lengths. From the computed sample average BC values and as seen the supplementary Table S2, residues positions 17 and 128 were found to occur as the most common first two residues in all cases. In the previous work done in human heat shock protein, high BC residues found within cavities were found to correspond to allosteric hotspots, as these had been independently verified by the sequential application of external forces on protein residues using the perturbation response scanning (PRS) method. In our case, however these two positions are not found in cavities, and are rather buried structural units that are not very mobile in the short to medium range. The high BC at M17 is possibly due to the increased stability imparted by their presence at a dimer interface. Due to its presence between the N-finger and the beginning of domain I, it is likely a pivot point for the N-finger. Due to the high centrality of these residues, it is possible that mutations leading to the alteration of their physicochemical activity may be accompanied by a decrease in dimer stability. EPI_ISL_423772, which contains the M17I mutation displayed a relatively high backbone mobility, more particular along one of the α-helices of domain III. Even though both residues are hydrophobic an increase in backbone motion was observed. It is possible that by the replacing the unbranched methionine side chain111 by the branched amino acid isoleucine, the potential for local mobility is reduced, possible leading to farranging compensatory effects.

3.6. Estimation of interprotomer distances using COM distance

The COM distance distributions estimated from all atoms between the Mpro protomers (Figure 8) indicate that the distance between them can vary from over about 2.4 nm to 2.8 nm globally, and the presence of modes of differing height suggests at least two different equilibrium chain COM distances in these cases. About half of the samples displayed KDE dJ distances with that were very close to the reference protease, corresponding to a distance value of about 0.5 nm. More specifically, in isolates EPI_ISL_421763 and EPI_ISL_419984 a significant proportion of the sampled conformations depict the presence of closer chain COM values, though the percentage of such conformations is lower in EPI_ISL_419984. 3D visualisation showed that the THR70 side chain formed an additional (but transient) H-bond with either VAL73 or with the MET17 backbone oxygen in EPI_ISL_421763. In addition to increasing interconnectivity within the β hairpin, temporary H-bonding interactions between MET17 and THR70 connect the N and C-termini from domain I may be the main for the lowest interprotomer COM distance. We note that ALA70 cannot form such a bridge with MET17 due to its hydrophobic side chain. From Figure 9, it can be seen that domain III is a key player associated with an increased COM distance, given that the largest displacements are experienced there. Therefore, it is not impossible that THR70 may have remotely interfered with the motion of domain III. Longer MD simulations may be needed to ascertain this observation. Isolate EPI_ISL_415610 (bearing mutation A193V) displayed the highest median interprotomer COM distance. The main physicochemical difference in A193V is the higher molecular weight and volume of the valine side chain, however the mutation occurs in the solvent exposed linker that connects domains II and III. Weighted residue contact mapping showed that the main difference was a six-fold increase in the residue contact frequency (0.72 in the mutant versus 0.17 in the reference) between residue 193 and THR196, in chain A. Samples starting from EPI_ISL_425655 onwards were sampled for modes, aligned to the reference protease and coloured by their Cα-Cα distance in Figure 9. Only samples showing the most marked differences from the reference are shown. Superposition adjustment was made as previously explained, to obtain the most probable structural alignment for the dimer. Due to the observation of a generally higher deviation of domain III from the rest of the protein, a correlation was calculated at a 95% significance level for all shortlisted equilibrium conformations against the interprotomer COM distance, and as intuited, a non-negligible correlation of 0.59 with a p-value of 0.003 was obtained. It is very likely that the domain III COM motions could drive larger conformational changes due to its size and flexibility, should their stabilising interprotomer interactions weaken. This is in good agreement with previous studies suggesting its involvement in regulating of dimerisation, more specifically with the proposition by Hu and colleagues that domain III might act as a “motivator” controlling N-finger movement in SARS-CoV112. Another notable observation is the relatively higher mobility of one of the domain III α helices and one of the 310 helices in some mutant equilibrium conformations, when compared to the homologous positions in the reference equilibrium conformation. COM distance can be influenced to a certain extent by the shape of domain arrangements as it can modulate the COM. As such, this may not represent other changes to the protein geometry. Therefore, we proceeded to quantify inter-domain angles as a next step.

Figure 8:

Figure 8:

Distributions of interprotomer COM distances across samples, arranged in ascending order of the KDE dJ. The reference protease is in grey while the mutants are coloured from yellow to red, in increasing order of distance from the reference KDE.

Figure 9:

Figure 9:

3D visualisation (top view) of the interprotomer COM distances across samples. Domain III from each protomer is circled in black, while the chains are coloured white and salmon in the reference (a). All other samples are coloured according to their Cα atom displacements from the aligned reference, starting with blue (lowest), through yellow to red (highest). The interprotomer COM distance is depicted by horizontal yellow dashes (and yellow font). Domain III COM distance is also shown, in red dashes and font.

3.7. Estimation of inter-domain angles in each protomer of Mpro

The angle formed between domains I, II and II has a relatively high information content for Mpro dynamics due to the variability it captures. While both chains A and B behave similarly in the reference, a higher degree of inter-domain angle variation is seen across the mutants, comprising skewed distributions and angle asymmetries between protomers (Figure 10). The angle distributions are similar to those of the reference in some cases, but are also more variable in others. The difference between inter-domain angles was computed for each protomer pair and used to sort the samples in increasing order of divergence, such that the most asymmetric distributions occur on the right. Among the divergent samples, using a median difference of 2 degrees as cut-off, chains A and B were found to be most divergent in samples EPI_ISL_423280, EPI_ISL_424470, EPI_ISL_418075, EPI_ISL_425319, EPI_ISL_420510, EPI_ISL_425284, EPI_ISL_423642, EPI_ISL_425342, EPI_ISL_425643, EPI_ISL_421312, EPI_ISL_421515, EPI_ISL_419710, EPI_ISL_425886, EPI_ISL_418269, EPI_ISL_413021, EPI_ISL_420181, EPI_ISL_417413, EPI_ISL_423007, EPI_ISL_422860, EPI_ISL_419256, EPI_ISL_425132, EPI_ISL_425839, EPI_ISL_419756, EPI_ISL_420241 and EPI_ISL_421005 in ascending order of difference. This provides additional support behind our general observation of the protomers generally behaving in an independent manner in SARS-CoV-2 Mpro, as similarly described in SARS-CoV. As explained earlier, symmetric behaviour may be part of the protease mechanics in the absence of a substrate. Due to the richness of information retrieved from the measurement interdomain angles, we propose that this metric may also help assist the in silico characterisation (or differentiation) of Mpro variants from future strains of SARS-CoV-2 should any particular virus-associated phenotype become available.

Figure 10:

Figure 10:

Kernel density distributions of inter-domain angles (domains I-II-III) across the mutant and reference protease complexes. The violin plots are split in two for each protein, showing the inter-domain angles for chains A (in blue) and B (in red). The tips of the distributions mark the minimum and maximum values for both chains combined in each protein complex.

Samples of the most divergent conformations from the reference protease are shown in Figure 11. Using domain II as fulcrum, it can be seen that inter-domain angles are not necessarily planar. On the basis of divergence from the reference protease, it can be observed that domain III moves the more than domain I around the fulcrum. While the equilibrium for EPI_ISL_418075 is quasi-planar (even though it displayed the widest range of inter-domain angles), the equilibrium displaying the largest angle shown (183.2 degrees) was recorded from EPI_ISL_421005 (from the explement the shown angle in Figure 11c).

Figure 11:

Figure 11:

Visualisation of inter-domain angles (domains I-II-III) across the mutant and reference protease complexes. The reference is in white, while the mutant samples are coloured according to their Cα atom displacements from the reference protease, starting with blue (lowest), through yellow to red (highest). Angles are shown in degrees. Only one of the protomers is shown in each case.

3.8. Analysis of pocket compaction and proposition of a dual allosteric site in SARS-CoV-2

The substrate binding pocket, nested in the cleft between domains I and II112, is composed of the very flexible loops intertwined with the catalytic dyad residues HIS41 and CYS145. With the assistance of HIS41, residue CYS45 acts as an electrophile during the first step of hydrolysis.36 Previously, studies in SARS-CoV have reported that components of this pocket - the S1 subsite (residues PHE140, HIS163, MET165, GLU166, and HIS172), an oxyanion hole (main chain amides of residues GLY143, SER144 and CYS145) and oxyanion loop (residues SER139-LEU141) - play an important role in stabilisation of the binding site in the active form of Mpro in SARS-CoV95,102,112. Conversely, inactive Mpro is characterised by the collapse of both the S1 subsite and the oxyanion hole95, whereby the loop moves inwards leaving no space for the tetrahedral intermediate product, which normally results from proteolytic cleavage of a substrate112,113. A more recent study by Li positioned the same loop (also known as the “chameleon loop”) as a switch by producing an active monomer103, whereby its stabilisation into a 310 helix was associated with the inactive enzyme. The stacking of the PHE140 phenyl ring against the HIS163 imidazole ring (found in the S1 subsite) is a key interaction that maintains the latter uncharged over a range of pH changes100. Another important feature of the binding site is the presence of a very flexible loop at residue positions 45-51 which form subsites S2 and S3’, termed the “TAEDMLN loop” in SARS-CoV. In SARS-CoV-2, this loop has mutated to “TSEDMLN”84. The loop has further mutated to “TSEEMLN”” in sample EPI_ISL_425242, which we speculate may have an impact on substrate binding affinity or even specificity. Different cavity calculations methods were then used to scan the reference protease for any potentially unknown functional sites, namely FTMap and PyVOL. Their predictions were concordant for all cases, except for the substrate binding site, which FTMap did not detect (Figure 12). However, a very good coverage of interprotomer cavities was obtained by combining both methods, but more interestingly a potential allosteric site was found at the interprotomer interface, next to the substrate binding site. It is mirrored across each side of the interacting protomers. For this reason, the dynamics of both pockets were examined, as this could play an important role in the dimerisation properties of the protomers. The substrate binding site from each protomer was also examined due to its already known functional importance in catalysis. While the substrate binding pockets are easily defined as belonging to a given protomer, the interfacial pocket is composed of residues from each chain, namely residues 116, 118, 123, 124, 139 and 141 on chain A, and residues 5-8, 111, 127, 291, 295, 298, 299, 302, 303 on chain B. The mirrored interfacial pocket comprised the same residue positions, but for the opposite chain labels. When compared to available literature from the SARS-CoV, multiple residues corresponded to stabilising elements of the catalytic pocket. For instance, residues 139 and 141 form part of the “chameleon” switch, which assumes a 310 helix form when inactive, while residues 5-8 are part of the N-finger, which is essential for the stability and activity of the alternate protomer. The FTMap probes formed identical cross-clusters composed of probe IDs 1amn, 1ady, 1eth and 1acn at each site, which respectively correspond to the compounds methylamine, acetaldehyde, ethane and acetonitrile. As noted from the probe chemical compositions, this dual site has the potential to accommodate compounds of low molecular weight, with no (or probably limited) rotatable bonds and a low octanol-water partition coefficients.114 The substrate binding site, as determined by PyVOL comprised residues 25, 26, 27, 41, 44, 49, 140, 141, 142, 143, 144, 145, 163, 166 in each protomer.

Figure 12:

Figure 12:

Pocket detection using combined predictions from FTMap and PyVOL. FTMap probes are shown as stick figure representations, while those from PyVOL are shown as surfaces. The protomers are depicted as cartoon representations, in grey and light orange.

From the distributions of Rg values for the catalytic site (Figure 13), we promptly observe the asymmetry in compaction between substrate binding pockets coming from each protomer in each sample. Partial symmetry is seen in few cases, such as EPI_ISL_419710, EPI_ISL_425242, EPI_ISL_423007, 416720, EPI_ISL_421380, EPI_ISL_425284 and EPI_ISL_419256. In the reference sample we observed a shifted in equilibrium Rg, where one protomer oscillates around value while its counterpart also explores two others. Multi-modal distributions indicate the presence of more than one equilibrium, which hints at cavity expansion and compaction movements. The most compact substrate binding cavities are observed in one of the protomers from sample EPI_ISL_425489, and to some extent samples EPI_ISL_418082

Figure 13:

Figure 13:

Kernel density distributions of Rg values for the substrate binding site from each protomer of Mpro, arranged in ascending order of difference in median from each chain. The differences, shown at the top are in nanometres. Chain A values are in red while chain B values are in blue. The maxima and minima are across protomers. Quartiles for each binding site are shown as dotted lines.

Some substrate binding cavities of interest are shown in Figure 14. Shi and colleagues described several features surrounding the substrate binding pocket in SARS-CoV100. One of them is the requirement of a key interaction between residues PHE140 and HIS163 for the stability and activity of the protease. This same region is shown in Figure 14, using poses identified from local energy minima for (a) the reference protease, (b) EPI_ISL_420610 and (c) EPI_ISL_423642. The highest probability density pose (in white) is used as a comparator. In Figure 14(a), the stabilising PHE140 and HIS163 display their stacked rings and the stabilising H-bonds linking the N-finger SER1 to GLU166 and HIE172. ASN28 is also shown to stabilise the positioning of CYS145 in all cases. In EPI_ISL_420610 (N274D), however, we observed a yet undescribed relatively stable pose for PHE140. It basically loses its described interaction and flips to hydrophopically interact with the C-terminal helix, via ARG298 (pi-alkyl interaction), GLN299 (amide-pi stacked interaction and a carbon H-bond) and MET6 (pi-sulfur interaction)65. This has the effect of pulling the oxyanion loop to face away from the catalytic residues, thus making the pocket wider. Also shown is the coupled movement of an opposite loop (containing ARG188) by a distance of 8.9 Å, compared to the reference protease conformation. In EPI_ISL_423642, the oxyanion loop showed signs of stabilisation, which may correlate with a of lack enzymatic activity for that protomer. As this was observed for only one of the protomers and given that the isolate was obtained from an infected individual, this sequence we cannot infer global inactivation.

Figure 14:

Figure 14:

3D visualisation of the Mpro substrate binding site poses of interest. (a) A local minimum reference from the reference protease conformation. (b) Detection of a non-canonical pose for residue 140 (in purple) together with a significant loop opening (dashed red line) in EPI_ISL_420610. (c) Early stages of a 310 helix formation in EPI_ISL_423642. Another local minimum (highest probability density) pose from the reference protease (in white) is used as comparator in all panes. Chain A is in pale blue while chain B is in green. The key interacting residue for PHE140 - HIS163 is depicted as yellow sticks.

As done for the substrate binding cavity, the degree of compaction of the interprotomer cavities was also measured. Here as well, the distributions are asymmetric (Figure 15). It is tempting to do a comparison between the substrate binding cavity and the interprotomer pockets. However, when the median Rg values obtained for the substrate binding pocket are correlated (using Spearman’s correlation) against the corresponding medians for the interprotomer pocket across all samples, no significant correlation was obtained, even though in our case, residue 141 was shared between both pockets. On an individual level, however there is a significant degree of correlation between the interprotomer pocket and the substrate binding site, ranging from −0.68 to 0.54 (using Spearman’s correlation), with p-values < 0.01 and absolute correlations > 0.1 for 64.9% of the sample comparisons (two substrate binding sites vs two interprotomer pockets for each sample). As the interprotomer pocket is mirrored between chains this may explain the negative correlations. From this finding we propose that the interprotomer pockets may play an important role in affecting the degree of compaction of the binding cavity and vice-versa. This suggests the possibility of a potentially bivalent modulation of the interprotomer pocket and the substrate binding site. As Rg only captures the overall degree of compaction, it may not entirely inform us about the cavity volume accessible to an allosteric modulator, however this may be an interesting lead for allosteric modulator targeting. Snapshots where large deviations were obtained (compared to a high probability density reference conformation) are shown in Figure 16. The Rg values are used as radii for the spheres shown in Figure 16, centred at the COM of the allosteric pockets. Different distribution modes for EPI_ISL_423772 (a-c) point to a greater movement of an α helical region of domain III, from only on protomer at a time, as seen previously. The equilibrium from EPI_ISL_425886, and to some extent EPI_ISL_425886 behaved similarly. The second equilibrium from the reference sample (d) performed very similarly to the reference of highest probability density used as comparator (in white) for all the shown cases. From all the equilibrium poses, it can be seen that the differences in subpocket Rg between each protomer pair are not associated with similar displacements of the domain III helix.

Figure 15:

Figure 15:

Kernel density distributions of Rg values across samples for the mirrored interfacial (and potentially allosteric) pockets. The differences, shown at the top are in nanometres.

Figure 16:

Figure 16:

Visualisations of the Rg values (Å) of the mirrored interfacial pockets in various samples. The reference (obtained after 63ns) is in white, while the mutant samples are coloured according to their Cα atom displacements from the reference protease, starting with blue (lowest), through yellow to red (highest). Regions of high divergence are circled. The spheres are centred at the COM of the pocket residues, while their radii are the Rg values in Å.

3.9. Investigating correlated residue motions using DCC

In order to compare the DCC across all samples, the DCC matrices were linearised before calculating pairwise correlations and clustering the final matrix (Figure 17). While abstracting out the intricate details of residue correlation, clusters of correlated samples inform us on the global similarity of correlated protein motion across each pair of samples. From Figure 17, it can be seen that the samples are generally highly correlated, with the exception of samples EPI_ISL_415503, EPI_ISL_416720, EPI_ISL_419984, EPI_ISL_420241, EPI_ISL_422184, and EPI_ISL_418075, which form a sub-cluster of moderately correlated samples. Both EPI_ISL_415503 and EPI_ISL_416720 have been described in section 3.5 as having higher RMSF values at positions positions 46-54 due to the mutations V157I and Y237H. Mutation R105H (occurring on a loop region) in EPI_ISL_419984 leads to the loss of an H-bond with F181, but forms a pi-pi stacking interaction with Y182. The most likely explanation for a difference in this case may be related to the sampling of at least two conformational equilibria using protomer COM distance, as described in section 3.5. In sample EPI_ISL_420241, the P184L mutation occurs in a solvent-exposed loop and no major non-bonded interactions were detected from the side chains or backbone atoms. The main reason for the observed difference is due to the increased divergence in inter-domain angles sampled from MD. In the case of EPI_ISL_422184, as explained in section 3.7, it was found the N-finger from one chains had moved by a larger extent at the end of the simulation (at about 93ns), diminishing its contacts with the alternate protomer, to interact more with its own protomer. This may be attributable to the S301L mutation, which reduces H-bonding at the end of the C-terminal helical structure. In the reference, four H-bonds are formed between S301, and the backbone oxygen atoms of V297 and R298, whereas two H-bonds are formed by L301. Sample EPI_ISL_418075 had the lowest positive correlation to all samples. Upon closer examination, it was found that the terminal α helix for each protomer was getting gradually destabilised towards the end of the simulation. This is very likely an artefact linked to the absence of the two C-terminal residues, as the residue is solvent exposed protein and display very similar residue interactions at the position 255 in both the reference and the mutant, even though A255V occurs on a helix.

Figure 17:

Figure 17:

Heat map of correlations obtained from the linearised DCC matrices for the mutants and the reference Mpro clustered using the Euclidean distance.

A main observation across all samples is that residues within their individual domains are highly positively correlated within their respective domains. Domains I and II are seen to share a high degree of correlation with each other, behaving as a single unit most likely to maintain the integrity of the catalytic surface. From the MD simulations, we find that domain III is generally not positively correlated with any of the other two domains, and can even be negatively correlated with itself on the alternate chain, and may suggest a degree of independence for domain III in terms of dynamics and possibly function as well, at least in its dimeric apo form.

3.10. Coarse-grained simulations of SARS-CoV-2 Mpro structures reveal relationships between mutational patterns and functional motions

To complement all-atom MD simulations and obtain a more granular description of structural dynamics in studied systems, we performed coarse-grained (CG) simulations of in the SARS-CoV-2 main protease structures in the free and ligand-bound forms using the CABS approach6771 (Figure 18). By using a large number of independent CG simulations, we obtained conformational dynamics profiles for the studied systems and analysed these distributions in the context of examined mutations.

Figure 18:

Figure 18:

Conformational dynamics and collective motion slow mode profiles of the SARS-CoV-2 main protease structures. (A) The computed root mean square deviations (RMSF) from CG MC dynamics simulations of the free enzyme of SARS-CoV-2 main protease (PDB ID: 5RVF, 6Y2E). The profile is shown in magenta lines. The positions of the residues undergoing mutations are shown in red filled circles (A7, G15, M17, V20, T45, D48, M49, R60, K61, A70, G71, L89, K90, P99, Y101, R105, P108, A116, A129, P132, T135, I136, N151, V157, C160, A173, P184, T190, A191, A193, T196, T198, T201, L220, L232, A234, K236, Y237, D248, A255, T259, A260, V261, A266, N274, R279 and S301L). (B) PCA analysis of functional dynamics SARS-CoV-2 main protease structures. The slow mode shapes are shown as mean square fluctuations averaged over first five lowest frequency modes (in magenta lines). The residues undergoing mutations are shown in red filled circles as in the panel A. (C) Structural map of the functional dynamics profiles derived from CG-MD simulations in the SARS-CoV-2 main protease (PDB ID: 5RVF, 6Y2E). The colour gradient from blue to red indicates the decreasing structural rigidity and increasing flexibility as averaged over the first five low frequency modes. The positions of the residues undergoing mutations are shown in spheres coloured according to their level of rigidity/flexibility in slow modes (blue-rigid, red-flexible). The locations of the protease domains I, II, and III are indicated. The catalytic residues HIS41 and CYS145 are shown in sticks. (D) The rotated view for the structural map of functional dynamics profiles in the SARS-CoV-2 main protease with sites of mutations in spheres.

RMSF of protein residues revealed the distribution of stable and flexible regions, thereby allowing the assessment of the extent of mobility for mutational sites (Figure 18, panel A). In the domain I the most flexible residues are in the region 45-75, while for domain II the L2 loop (residues 165-172 and L3 (residues 185-200) located around the substrate-binding pocket also harbour a significant number of flexible positions. In addition, we observed significant fluctuations for surface residues 153-155 and 274-277 (Figure 18, panel A). Of particular interest is the distribution of stable and flexible residues in the substrate binding pocket. Some pocket residues from domain I (T24, T25, T26, and L27) experience moderate fluctuations, while several other sites (M49, Y54) displayed considerably higher mobility. Another group of the substrate binding site residues from domain II (S139, F140, Q189 ) also exhibited appreciable fluctuations, while residues G143, S144, H164,H163, E166, P168 C145 showed only moderate changes and remained stable in CG simulations (Figure 18, panel A). Notably and as expected the catalytic residues C145 and H41 from the substrate binding site also remained stable. The analysis generally showed that domain II residues were stable, while domain III (residues 198 to 303) showed more flexibility, especially in the peripheral solvent-exposed regions. This domain is involved in regulation of the dimerisation through a salt-bridge interaction between GLU290 of one protomer and ARG4 of the other protomer. Importantly, we found that these residues remained extremely stable in simulations. Interestingly, buried positions subjected to mutations exhibited different level of flexibility. While positions A7, V20, A116, A129, T135, I136, V157, C160, A173, and T201 were very stable, other buried sites with registered mutations in the domain III (A234, A266) showed larger fluctuations. It is worth mentioning that the interfacial residue A7 in the N-finger region important for enzymatic activity showed extreme level of rigidity.Simulation-derived residue-residue couplings were evaluated using principal component analysis (PCA). By comparing slow mode profiles we found that functionally significant patterns can be yielded with up to the five slowest eigenvectors that account for 90% of the total variance of the dynamic fluctuations. The functional dynamics profile averaged over the five slow modes showed that the domain I (residues 10-99) and domain III (residues 198-303) are mostly mobile in functional motions and can undergo large structural changes (Figure 18, panel B). At the same time, domain II (residues 100-182), is mostly stable during functional dynamics. The distribution of mutational sites clearly indicated the existence of two major clusters. One cluster of mutations is located in highly mobile regions of domain III (T198I, T201A, L220F, L232F, A234V, K236R, Y237H, D248E, A255V, T259I, A260V, V261A, A266V, N274D, R279C and S301L). These residues involved in protein motion are likely under different evolutionary constraints than are other functional sites. Another cluster of mutations is distributed in the domain II and includes 3 subgroups: a group of fully immobilized positions (A116V, A129V, P132L, T135I, I136V), a group of bridging (hinge-like) sites that connect rigid and flexible regions (Y101C, R105H, P108S, N151D, V157I/L, C160S, A173V, P184L/S) and a group of mostly mobile residues (T190I, A191V, A193V) (Figure 18, panel B). The group of potential hinge sites may be important for controlling regulatory motions and mutations in these regions (such as V157I/L, P184L/S) may affect global movements in the protease and its enzymatic activity.It is particularly important to dissect the connection between the function of some key residues and their contribution in collective movements. The dimerisation residues (R4, M6, S10, G11, E14, N28, S139, F140, S147, E166, E290, R298) are characterized by different local flexibility but tend to correspond to low moving regions of the protein in collective motions (Figure 18, panel C, D). The key substrate binding residues (H163, H164, M165, E166, and L167) are located at the very border of structurally immobilized and more flexible regions, and as such may constitute a hinge region that controls cooperative movements. Notably, some other binding site residues D187, R188, Q189, T190, and A191 are more flexible in slow modes and may undergo functional motions. Substrate recognition sites tend to exhibit structural flexibility and sequence variations so as to enable specific recognition required for mediating substrate specificity. We also explored the functional dynamics profile of the ligand bound protease complex (Figure 19). This structural map clearly illustrated that the ligand binding site is comprised of both rigid and flexible residues and located in the region that bridges area of high and low structural stability. In particular, we highlighted that residues S46, M49, T190, A191 in the substrate recognition site and in the ligand proximity may belong to moving regions in the global motions (Figure 19, panel B).

Figure 19:

Figure 19:

Structural map of functional motion profiles of the SARS-CoV-2 main protease structure complex with a ligand. (A) Structural map of the functional dynamics profiles derived from CG-MD simulations in the SARS-CoV-2 main protease in the complex with -ketoamide ligand (PDB ID: 6Y2F). The slow mode shapes are averaged over first five lowest frequency modes. The colour gradient from blue to red indicates the decreasing structural rigidity and increasing flexibility as averaged over the first five low frequency modes. The positions of the residues undergoing mutations (A7, G15, M17, V20, T45, D48, M49, R60, K61, A70, G71, L89, K90, P99, Y101, R105, P108, A116, A129, P132, T135, I136, N151, V157, C160, A173, P184, T190, A191, A193, T196, T198, T201, L220, L232, A234, K236, Y237, D248, A255, T259, A260, V261, A266, N274, R279 and S301L) are shown in spheres coloured according to their level of rigidity and flexibility in the low frequency modes (blue-rigid, red-flexible). The locations of the protease domains I, II, and III are indicated. The α-ketoamide ligands are shown in sticks in both protomers. (D) The rotated view for the structural map of functional dynamics profiles in the SARS-CoV-2 main protease with sites of mutations in spheres. The position of the α-ketoamide ligand is shown in sticks. The mobile residues in the slow modes from the substrate binding site that form interactions with the ligand (S46, M49, T190, A191) are indicated by arrows and annotations.

Our analysis shows that structural clusters of mutations may be distinguished by their evolution propensity and global mobility in slow modes regions. The mobile residues may be predisposed to serve as substrate recognition sites, whereas residues acting as global hinges during collective dynamics are often supported by conserved residues. The observed conservation and mutational patterns may thus be determined by functional catalytic requirements, structural stability and geometrical constraints, and functional dynamics patterns. We found that some sites and corresponding mutations may be associated with dynamic hinge function. The mutability of hinge sites (Y101C, R105H, P108S, V157I/L, C160S, A173V, P184L/S) and nearby sites (T190I, A191V, A193V) may be related with their structural and dynamic signatures to reside in the exposed protein regions rather than in the more conserved protein core. We could also conclude that these sites are located near the active site and control in the bending motions needed for catalysis, so their mutability may have an important functional role for enzymatic activity especially when combined with mutations in adjacent regions.

4. Conclusions

COVID-19 represents a significant global threat for which no effective solution currently exists. Physical distancing115 has significantly contributed in the slowing down the viral progression. However, time is of the essence to find a cure that will counter current and future impacts of the virus, especially for those at higher risk and for the world economies. Analysing the structural and dynamic properties of the novel mutants of the SARS-CoV-2 Mpro and contrasts with known SARS-CoV homolog facts gives important pointers and details about its dynamics behaviour.

From this work, several missense mutations were found across all domains of the SARS-CoV-2 Mpro, with various levels of support. There are various single residue substitutions, among which several are substitutions of alanine and valine. These mutations have occurred both in buried and solvent-accessible surfaces. From our filtered data set, residue positions 15, 157 and 184 appear to have mutated more than once. A relatively high number of titratable amino acids present in Mpro, which similarly to SARS-CoV, may play an important role in influencing its behaviour at various pH levels. Higher backbone flexibility was observed for the isolates EPI_ISL_416720, EPI_ISL_426097, EPI_ISL_421763, EPI_ISL_420610, EPI_ISL_425284, EPI_ISL_421380, EPI_ISL_423772 and EPI_ISL_425886 mainly due to domain III motions, and to some extent part of domain I. More generally, regions of lowest flexibility (and high BC) were core residues, while solvent-exposed loops were most flexible. All samples displayed a slight interprotomer twisting motion and domain rotational motions. A high degree of variation was observed in (1) the angle formed between domains I, II and II, (2) the substrate binding pocket Rg, (3) the interprotomer pocket Rg and (4) the N-finger flexibility, which may all be good descriptors for characterising Mpro dynamics. BC values were very similar across all samples, with extreme values being essentially anti-correlated to RMSF. Residues 17 and 128 appear to be very central residues based on the BC network metric, and it is likely that mutations altering their physicochemical property may have the potential to alter dimer stability. The D48E variant (from sample EPI_ISL_425242) lead to a novel “TSEEMLN”” motif at the substrate binding flap, which may have repercussions on the efficiency and specificity of substrate binding. Inactive apo Mpro did not show signs of dissociation. A non-canonical pose for the PHE140 widens the accessible surface of the substrate binding pocket by pulling on the oxyanion loop. We propose the presence of a mirrored allosteric interprotomer pocket, supported by two cavity detection approaches, and correlations between the interprotomer pocket compaction and the substrate binding pocket. The mirrored pocket may have the potential to accommodate compounds of low molecular weight and polarity. Asymmetries to partial symmetries in Rg distributions were seen for each of the substrate binding pockets and the interprotomer cavities for each isolate. The compactions of the allosteric subpockets does not seem to a noticeably affect the movement of domain III. However, a large portion of the samples displayed overall positive correlations according to DCC, meaning that the mutants generally behave similarly. In each individual DCC plot, domains I and II are found to behave as a single unit while domain III is generally more independent. Thorough, independent CG simulations of the apo and the ligand-bound Mpro further revealed a connection between regions accumulating clusters of mutations and their degree of residue fluctuation from the slowest modes. Additionally, we report of a possible set of dynamic hinging residues and their tendency to acquire mutations in exposed protein regions, whilst being grounded by less mutable core residues.

As a final note, it is important to be aware that there is an inherent lack of sampling depth in this current analysis of COVID-19 sequences due to the existence of undiagnosed mutations that may be present among infected individuals116 at the time of writing, and in our case we might have missed certain mutations using our filtering criteria, in our effort to balance accuracy and the number of high confidence representative samples. Therefore frequencies should be handled with caution as the sampling was done at an early stage of the pandemic.

Supplementary Material

ci-2020-00634r SI

Acknowledgements

We would like to thank the Centre for High Performance Computing for providing the computer time for the simulations. We also thank GISAID for making available the COVID-19 genomic material, together with the originating and submitting laboratories (acknowledged in supplementary Table S1). Genetic sequences are not disclosed in this article due to the restrictions from GISAID, but are fully accessible from the database.

Funding

O.S.A. is funded as a postdoctoral fellow by H3ABioNet, which is supported by the National Institutes of Health Common Fund grant number U24HG006941. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The content of this publication is solely the responsibility of the authors and does not necessarily represent the official views of the funders.

Footnotes

Supporting Information Available

Heat maps of RMSF (Figure S1) and average BC (Figure S2). Acknowledgement table detailing sample information (Table S1), and another (Table S2) showing the high BC residues for each chain.

Conflict of Interest Statement

The authors declare no conflict of interest.

References

  • (1).Zhou P; Yang X-L; Wang X-G; Hu B; Zhang L; Zhang W; Si H-R; Zhu Y; Li B; Huang C-L; Chen H-D; Chen J; Luo Y; Guo H; Jiang R-D; Liu M-Q; Chen Y; Shen X-R; Wang X; Zheng X-S; Zhao K; Chen Q-J; Deng F; Liu L-L; Yan B; Zhan F-X; Wang Y-Y; Xiao G-F; Shi Z-L A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020. , 579, 270–273. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (2).Dai W; Zhang B; Su H; Li J; Zhao Y; Xie X; Jin Z; Liu F; Li C; Li Y; Bai F; Wang H; Cheng X; Cen X; Hu S; Yang X; Wang J; Liu X; Xiao G; Jiang H; Rao Z; Zhang L-K; Xu Y; Yang H; Liu H Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease. Science 2020, eabb4489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (3).Korber B; Fischer WM; Gnanakaran S; Yoon H; Theiler J; Abfalterer W; Hengartner N; Giorgi EE; Bhattacharya T; Foley B; Hastie KM; Parker DG Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 2020, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (4).Varga Z; Flammer AJ; Steiger P; Haberecker M; Andermatt R; Zinker-nagel AS; Mehra MR; Schuepbach RA; Ruschitzka F; Moch H Endothelial cell infection and endotheliitis in COVID-19. The Lancet 2020, 395, 1417–1418. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (5).Richardson S; Hirsch JS; Narasimhan M; Crawford JM; McGinn T; Davidson KW; Barnaby DP; Becker LB; Chelico JD; Cohen SL; Cookingham J; Coppa K; Diefenbach MA; Dominello AJ; Duer-Hefele J; Falzon L; Gitlin J; Hajizadeh N; Harvin TG; Hirschwerk DA; Kim EJ; Kozel ZM;Marrast LM; Mogavero JN; Osorio GA; Qiu M; Zanos TP Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area. JAMA 2020, 10022, 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (6).Helmy YA; Fawzy M; Elaswad A; Sobieh A; Kenney SP; Shehata AA The COVID-19 Pandemic: A Comprehensive Review of Taxonomy, Genetics, Epidemiology, Diagnosis, Treatment, and Control. Journal of Clinical Medicine 2020, Vol. 9, Page 1225 2020, 9, 1225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (7).Horby P; Lim WS; Emberson J; Mafham M; Bell J; Linsell L; Staplin N; Brightling C; Ustianowski A; Elmahi E; Prudon B; Green C; Felton T; Chadwick D; Rege K; Fegan C; Chappell LC; Faust SN; Jaki T; Jeffery K; Montgomery A; Rowan K; Juszczak E; Baillie JK; Haynes R; Landray MJ; Group RC Effect of Dexamethasone in Hospitalized Patients with COVID-19: Preliminary Report. medRxiv 2020, 2020.06.22.20137273. [Google Scholar]
  • (8).Beigel JH; Tomashek KM; Dodd LE; Mehta AK; Zingman BS; Kalil AC; Hohmann E; Chu HY; Luetkemeyer A; Kline S; Lopez de Castilla D; Finberg RW; Dierberg K; Tapson V; Hsieh L; Patterson TF; Paredes R; Sweeney DA; Short WR; Touloumi G; Lye DC; Ohmagari N; Oh MD; Ruiz-Palacios GM; Benfield T; Fatkenheuer G; Kortepeter MG; Atmar RL; Creech CB; Lundgren J; Babiker AG; Pett S; Neaton JD; Burgess TH; Bonnett T; Green M; Makowski M; Osinusi A; Nayak S; Lane HC Remdesivir for the Treatment of Covid-19 — Preliminary Report. New England Journal of Medicine 2020, NEJMoa2007764. [DOI] [PubMed] [Google Scholar]
  • (9).World Health Organization, Draft landscape of COVID-19 candidate vaccines - 7 July 2020. 2020; https://www.who.int/who-documents-detail/draft-landscape-of-covid-19-candidate-vaccines. [Google Scholar]
  • (10).Rothan HA; Byrareddy SN The epidemiology and pathogenesis of coronavirus disease (COVID-19) outbreak. Journal of Autoimmunity 2020, 109, 102433. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (11).Van Lancker W; Parolin Z COVID-19, school closures, and child poverty: a social crisis in the making. The Lancet Public Health 2020, 5, e243–e244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (12).Elbe S; Buckland-Merrett G Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges 2017, 1, 33–46. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (13).Hatcher EL; Zhdanov SA; Bao Y; Blinkova O; Nawrocki EP; Ostapchuck Y; Schäffer AA; Brister JR Virus Variation Resource – improved response to emergent viral outbreaks. Nucleic Acids Research 2017, 45, D482–D490. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (14).Duffy S. Why are RNA virus mutation rates so damn high? PLOS Biology 2018, 16, e3000003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (15).Kosakovsky Pond SL; Smith DM Are All Subtypes Created Equal? The Effectiveness of Antiretroviral Therapy against Non–Subtype B HIV-1. Clinical Infectious Diseases 2009, 48, 1306–1309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (16).Abecasis AB; Wensing AMJ; Paraskevis D; Vercauteren J; Theys K; Van de Vijver D. a. M. C.; Albert J; Asjö B; Balotta C; Beshkov D; Camacho RJ; Clotet B; De Gascun C; Griskevicius A; Grossman Z; Hamouda O; Horban A; Kolupajeva T; Korn K; Kostrikis LG; Kücherer C; Liitsola K; Linka M; Nielsen C; Otelea D; Paredes R; Poljak M; Puchhammer-Stöckl E; Schmit J-C; Sönnerborg A; Stanekova D; Stanojevic M; Struck D; Boucher C. a. B.; Vandamme A-M HIV-1 subtype distribution and its demographic determinants in newly diagnosed patients in Europe suggest highly compartmentalized epidemics. Retrovirology 2013, 10, 7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (17).Sevajol M; Subissi L; Decroly E; Canard B; Imbert I Insights into RNA synthesis, capping, and proofreading mechanisms of SARS-coronavirus. Virus Research 2014, 194 , 90–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (18).Pachetti M; Marini B; Benedetti F; Giudici F; Mauro E; Storici P; Masciovecchio C; Angeletti S; Ciccozzi M; Gallo RC; Zella D; Ippodrino R Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant. Journal of Translational Medicine 2020, 18, 179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (19).Hadfield J; Megill C; Bell SM; Huddleston J; Potter B; Callender C; Sagulenko P; Bedford T; Neher RA Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 2018. , 34 , 4121–4123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (20).Liang Y; Wang M.-l.; Chien C.-s.; Yarmishyn AA; Yang Y-P; Lai W-Y; Luo Y-H; Lin Y-T; Chen Y-J; Chang P-C; Chiou S-H. Highlight of Immune Pathogenic Response and Hematopathologic Effect in SARS-CoV, MERS-CoV, and SARS-Cov-2 Infection. Frontiers in Immunology 2020, 11, 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (21).Mousavizadeh L; Ghasemi S Genotype and phenotype of COVID-19: Their roles in pathogenesis. Journal of Microbiology, Immunology and Infection 2020, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (22).Aanouz I; Belhassan A; El-Khatabi K; Lakhlifi T; El-ldrissi M; Bouachrine M Moroccan Medicinal plants as inhibitors against SARS-CoV-2 main protease: Computational investigations. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (23).Hu X; Cai X; Song X; Li C; Zhao J; Luo W; Zhang Q; Ekumi IO; He Z Possible SARS-coronavirus 2 inhibitor revealed by simulated molecular docking to viral main protease and host toll-like receptor. Future Virology 2020, fvl–2020–0099. [Google Scholar]
  • (24).Peele KA; Potla Durthi C; Srihansa T; Krupanidhi S; Ayyagari VS; Babu DJ; Indira M; Reddy AR; Venkateswarulu T Molecular docking and dynamic simulations for antiviral compounds against SARS-CoV-2: A computational study. Informatics in Medicine Unlocked 2020, 19, 100345. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (25).Yu R; Chen L; Lan R; Shen R; Li P Computational screening of antagonists against the SARS-CoV-2 (COVID-19) coronavirus by molecular docking. International Journal of Antimicrobial Agents 2020, 56, 106012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (26).Gentile D; Patamia V; Scala A; Sciortino MT; Piperno A; Rescifina A Putative Inhibitors of SARS-CoV-2 Main Protease from A Library of Marine Natural Products: A Virtual Screening and Molecular Modeling Study. Marine Drugs 2020, 18, 225. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (27).Wu C; Liu Y; Yang Y; Zhang P; Zhong W; Wang Y; Wang Q; Xu Y; Li M; Li X; Zheng M; Chen L; Li H Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods. Acta Pharmaceutica Sinica B 2020, 10, 766–788. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (28).Alazmi M; Motwalli O In silico virtual screening, characterization, docking and molecular dynamics studies of crucial SARS-CoV-2 proteins. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (29).Kumar S; Sharma PP; Shankar U; Kumar D; Joshi SK; Pena L; Durvasula R; Kumar A; Kempaiah P; Poonam,; Rathi B Discovery of New Hydroxyethylamine Analogs against 3CL pro Protein Target of SARS-CoV-2: Molecular Docking, Molecular Dynamics Simulation, and Structure–Activity Relationship Studies. Journal of Chemical Information and Modeling 2020, [DOI] [PubMed] [Google Scholar]
  • (30).Ghosh R; Chakraborty A; Biswas A; Chowdhuri S Identification of polyphenols from Broussonetia papyrifera as SARS CoV-2 main protease inhibitors using in silico docking and molecular dynamics simulation approaches. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (31).Khan MT; Ali A; Wang Q; Irfan M; Khan A; Zeb MT; Zhang Y-J; Chinnasamy S; Wei D-Q Marine natural compounds as potents inhibitors against the main protease of SARS-CoV-2—a molecular dynamic study. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (32).Khelfaoui H; Harkati D; Saleh BA Molecular docking, molecular dynamics simulations and reactivity, studies on approved drugs library targeting ACE2 and SARS-CoV-2 binding with ACE2. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (33).Alamri MA; Tahir ul Qamar M; Mirza MU; Bhadane R; Alqahtani SM; Muneer I; Froeyen M; Salo-Ahen OMH Pharmacoinformatics and molecular dynamics simulation studies reveal potential covalent and FDA-approved inhibitors of SARS-CoV-2 main protease 3CL pro. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (34).Tahir ul Qamar M; Alqahtani SM; Alamri MA; Chen L-L Structural basis of SARS-CoV-2 3CLpro and anti-COVID-19 drug discovery from medicinal plants. Journal of Pharmaceutical Analysis 2020, 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (35).Mahmud S; Uddin MAR; Zaman M; Sujon KM; Rahman ME; Shehab MN; Islam A; Alom MW; Amin A; Akash AS; Saleh MA Molecular docking and dynamics study of natural compound for potential inhibition of main protease of SARS-CoV-2. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (36).Suárez D; Díaz N SARS-CoV-2 Main Protease: A Molecular Dynamics Study. Journal of Chemical Information and Modeling 2020, [DOI] [PubMed] [Google Scholar]
  • (37).Cava C; Bertoli G; Castiglioni I In silico discovery of candidate drugs against covid-19. Viruses 2020, 12, 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (38).Gysi DM; Valle ÍD; Zitnik M; Ameli A; Gan X; Varol O; Sanchez H; Baron RM; Ghiassian D; Loscalzo J; Barabási A-L Network Medicine Framework for Identifying Drug Repurposing Opportunities for COViD-19. arXiv preprint arXiv:2004.07229 2020, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (39).Joshi RS; Jagdale SS; Bansode SB; Shankar SS; Tellis MB; Pandya VK; Chugh A; Giri AP; Kulkarni MJ Discovery of potential multi-target-directed ligands by targeting host-specific SARS-CoV-2 structurally conserved main protease. Journal of Biomolecular Structure and Dynamics 2020, 0, 1–16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (40).Zhou Y; Hou Y; Shen J; Huang Y; Martin W; Cheng F Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2. Cell Discovery 2020, 6, 14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (41).Das S; Sarmah S; Lyndem S; Singha Roy A An investigation into the identification of potential inhibitors of SARS-CoV-2 main protease using molecular docking study. Journal of Biomolecular Structure and Dynamics 2020, 1–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (42).Jin Z; Du X; Xu Y; Deng Y; Liu M; Zhao Y; Zhang B; Li X; Zhang L; Peng C; Duan Y; Yu J; Wang L; Yang K; Liu F; Jiang R; Yang X; You T; Liu X; Yang X; Bai F; Liu H; Liu X; Guddat LW; Xu W; Xiao G; Qin C; Shi Z; Jiang H; Rao Z; Yang H Structure of Mpro from COViD-19 virus and discovery of its inhibitors. Nature 2020, [DOI] [PubMed] [Google Scholar]
  • (43).Shah B; Modi P; Sagar SR in silico studies on therapeutic agents for COViD-19: Drug repurposing approach. Life Sciences 2020, 252, 117652. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (44).Randhawa GS; Soltysiak MP; El Roz H; de Souza CP; Hill KA; Kari L Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COViD-19 case study. PLoS ONE 2020, 15, 1–24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (45).Ton AT; Gentile F; Hsing M; Ban F; Cherkasov A Rapid Identification of Potential Inhibitors of SARS-CoV-2 Main Protease by Deep Docking of 1.3 Billion Compounds. Molecular Informatics 2020, 2000028, 1–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (46).Anand K. Coronavirus Main Proteinase (3CLpro) Structure: Basis for Design of Anti-SARS Drugs. Science 2003, 300, 1763–1767. [DOI] [PubMed] [Google Scholar]
  • (47).Chen YW; Yiu C-PB; Wong K-Y Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CLpro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates. F1000Research 2020, 9, 129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (48).Yang H; Xie W; Xue X; Yang K; Ma J; Liang W; Zhao Q; Zhou Z; Pei D; Ziebuhr J; Hilgenfeld R; Yuen KY; Wong L; Gao G; Chen S; Chen Z; Ma D; Bartlam M; Rao Z Design of Wide-Spectrum Inhibitors Targeting Coronavirus Main Proteases. PLoS Biology 2005, 3, e324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (49).Xue X; Yu H; Yang H; Xue F; Wu Z; Shen W; Li J; Zhou Z; Ding Y; Zhao Q; Zhang XC; Liao M; Bartlam M; Rao Z Structures of Two Coronavirus Main Proteases: Implications for Substrate Binding and Antiviral Drug Design. Journal of Virology 2008, 82, 2515–2527. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (50).Zhang L; Lin D; Sun X; Curth U; Drosten C; Sauerhering L; Becker S; Rox K; Hilgenfeld R Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors. Science 2020, 412, eabb3405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (51).Chen S; Hu T; Zhang J; Chen J; Chen K; Ding J; Jiang H; Shen X Mutation of Gly-11 on the Dimer Interface Results in the Complete Crystallographic Dimer Dissociation of Severe Acute Respiratory Syndrome Coronavirus 3C-like Protease. Journal of Biological Chemistry 2008, 283, 554–564. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (52).Chen H; Wei P; Huang C; Tan L; Liu Y; Lai L Only One Protomer Is Active in the Dimer of SARS 3C-like Proteinase. Journal of Biological Chemistry 2006, 281, 13894–13898. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (53).Anand K. Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra alpha-helical domain. The EMBO Journal 2002, 21, 3213–3224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (54).Shi J; Wei Z; Song J Dissection Study on the Severe Acute Respiratory Syndrome 3C-like Protease Reveals the Critical Role of the Extra Domain in Dimerization of the Enzyme. Journal of Biological Chemistry 2004, 279, 24765–24773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (55).Shi J; Song J The catalysis of the SARS 3C-like protease is under extensive regulation by its extra domain. The FEBS journal 2006. , 273, 1035–45. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (56).Brown DK; Penkler DL; Sheik Amamuddy O; Ross C; Atilgan AR; Atilgan C; Taştan Bishop Ö MD-TASK: a software suite for analyzing molecular dynamics trajectories. Bioinformatics 2017, 33, 2768–2771. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (57).Fearon D; Owen C; Douangamath A; Lukacik P; Powell A; Strain-Damerell C; Resnick E; Krojer T; Gehrtz P; Wild C; Aimon A; Brandao-Neto J, Carbery A; Dunnett L, Skyner R; Snee M; London N; Walsh M; von Delft F PanDDA analysis group deposition – Crystal Structure of SARS-CoV-2 main protease in complex with PCM-0102306. 2020. [Google Scholar]
  • (58).Burley SK; Berman HM; Bhikadiya C; Bi C; Chen L; Di Costanzo L; Christie C; Dalenberg K; Duarte JM; Dutta S; Feng Z; Ghosh S; Goodsell DS; Green RK; Guranović V; Guzenko D; Hudson BP; Kalro T; Liang Y; Lowe R; Namkoong H; Peisach E; Periskova I; Prlić A; Randle C; Rose A; Rose P; Sala R; Sekharan M; Shao C; Tan L; Tao Y-P; Valasa-tava Y; Voigt M; Westbrook J; Woo J; Yang H; Young J; Zhuravleva M; Zardecki C RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy. Nucleic Acids Research 2019, 47, D464–D474. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (59).Schrödinger L. The PyMOL Molecular Graphics System, Version 2.4.0a0. 2015. [Google Scholar]
  • (60).Camacho C; Coulouris G; Avagyan V; Ma N; Papadopoulos J; Bealer K; Madden TL BLAST+: architecture and applications. BMC Bioinformatics 2009, 10, 421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (61).Cock PJA; Antao T; Chang JT; Chapman BA; Cox CJ; Dalke A; Friedberg I; Hamelryck T; Kauff F; Wilczynski B; de Hoon MJL Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009, 25, 1422–1423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (62).Šali A; Blundell TL Comparative Protein Modelling by Satisfaction of Spatial Restraints. Journal of Molecular Biology 1993. , 234 , 779–815. [DOI] [PubMed] [Google Scholar]
  • (63).Dolinsky TJ; Nielsen JE; McCammon JA; Baker NA PDB2PQR: An automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Research 2004, 32. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (64).Jubb HC; Higueruelo AP; Ochoa-Montaño B; Pitt WR; Ascher DB; Blundell TL Arpeggio: A Web Server for Calculating and Visualising Interatomic Interactions in Protein Structures. Journal of Molecular Biology 2017, 429, 365–371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (65).Dassault Systèmes BIOVIA, Discovery Studio Visualizer, v19.1.0.18287 2019; http://accelrys.com/products/discovery-studio/.
  • (66).Van Der Spoel D; Lindahl E; Hess B; Groenhof G; Mark AE; Berendsen HJC GROMACS: Fast, flexible, and free. Journal of Computational Chemistry 2005, 26, 1701–1718. [DOI] [PubMed] [Google Scholar]
  • (67).Kolinski A. Protein modeling and structure prediction with a reduced representation. Acta Biochimica Polonica 2004, 51, 349–371. [PubMed] [Google Scholar]
  • (68).Kmiecik S; Gront D; Kolinski M; Wieteska L; Dawid AE; Kolinski A Coarse-Grained Protein Models and Their Applications. Chemical Reviews 2016, 116, 7898–7936. [DOI] [PubMed] [Google Scholar]
  • (69).Kmiecik S; Kouza M; Badaczewska-Dawid A; Kloczkowski A; Kolinski A Modeling of Protein Structural Flexibility and Large-Scale Dynamics: Coarse-Grained Simulations and Elastic Network Models. International Journal of Molecular Sciences 2018, 19, 3496. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (70).Ciemny M; Badaczewska-Dawid A; Pikuzinska M; Kolinski A; Kmiecik S Modeling of Disordered Protein Structures Using Monte Carlo Simulations and Knowledge-Based Statistical Force Fields. International Journal of Molecular Sciences 2019, 20, 606. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (71).Kurcinski M; Oleniecki T; Ciemny MP; Kuriata A; Kolinski A; Kmiecik S CABS-flex standalone: a simulation environment for fast modeling of protein flexibility. Bioinformatics 2019, 35, 694–695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (72).Bakan A; Meireles LM; Bahar I ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics 2011, 27, 1575–1577. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (73).Kozakov D; Grove LE; Hall DR; Bohnuud T; Mottarella SE; Luo L; Xia B; Beglov D; Vajda S The FTMap family of web servers for determining and characterizing ligand-binding hot spots of proteins. Nature Protocols 2015, 10, 733–755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (74).Smith RH; Dar AC; Schlessinger A PyVOL: a PyMOL plugin for visualization, comparison, and volume calculation of drug-binding sites. bioRxiv 2019, c, 816702. [Google Scholar]
  • (75).Hunter JD. Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering 2007, 9, 90–95. [Google Scholar]
  • (76).McKinney W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference. 2010; pp 51–56. [Google Scholar]
  • (77).Van Der Walt S; Colbert SC; Varoquaux G. The NumPy array: A structure for efficient numerical computation. Computing in Science and Engineering 2011, 13, 22–30. [Google Scholar]
  • (78).Virtanen P; Gommers R; Oliphant TE; Haberland M; Reddy T; Courna-peau D; Burovski E; Peterson P; Weckesser W; Bright J; van der Walt SJ; Brett M; Wilson J; Millman KJ; Mayorov N; Nelson ARJ; Jones E; Kern R; Larson E; Carey CJ; Polat I; Feng Y; Moore EW; VanderPlas J; Laxalde D; Perktold J; Cimrman R; Henriksen I; Quintero EA; Harris CR; Archibald AM; Ribeiro AH; Pedregosa F; van Mulbregt P SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 2020, 17, 261–272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (79).McGibbon RT; Beauchamp KA; Harrigan MP; Klein C; Swails JM; Hernández CX; Schwantes CR; Wang L-P; Lane TJ; Pande VS MDTraj: A Modern Open Library for the Analysis of Molecular Dynamics Trajectories. Biophysical Journal 2015, 109, 1528–1532. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (80).Nguyen H; Case DA; Rose AS NGLview–interactive molecular graphics for Jupyter notebooks. Bioinformatics 2018. , 34, 1241–1242. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (81).Scott DW Multivariate Density Estimation: Theory, Practice, and Visualization; 1992; Chapter 6, pp 125–193. [Google Scholar]
  • (82).Rice AM; Morales AC; Ho AT; Mordstein C; Mühlhausen S; Watson S; Cano L; Young B; Kudla G; Hurst LD Evidence for strong mutation bias towards, and selection against, U content in SARS-CoV-2: implications for vaccine design. Molecular Biology and Evolution 2020, [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (83).Rambaut A; Holmes EC; O’Toole Á; Hill V; McCrone JT; Ruis C; du Plessis L; Pybus OG A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology 2020, 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (84).Ma C; Sacco MD; Hurst B; Townsend JA; Hu Y; Szeto T; Zhang X; Tarbet B; Marty MT; Chen Y; Wang J Boceprevir, GC-376, and calpain inhibitors II, XII inhibit SARS-CoV-2 viral replication by targeting the viral main protease. Cell Research 2020, 30, 678–692. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (85).Rasschaert D; Duarte M; Laude H Porcine respiratory coronavirus differs from transmissible gastroenteritis virus by a few genomic deletions. Journal of General Virology 1990, 71, 2599–2607. [DOI] [PubMed] [Google Scholar]
  • (86).Carter RW; Sanford JC A new look at an old virus: patterns of mutation accumulation in the human H1N1 influenza virus since 1918. Theoretical Biology and Medical Modelling 2012, 9, 42. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (87).He JF; Peng GW; Min J; Yu DW; Liang WJ; Zhang SY; Xu RH; Zheng HY; Wu XW; Xu J; Wang ZH; Fang L; Zhang X; Li H; Yan XG; Lu JH; Hu ZH; Huang JC; Wan ZY; Hou JL; Lin JY; Song HD; Wang SY; Zhou XJ; Zhang GW; Gu BW; Zheng HJ; Zhang XL; He M; Zheng K; Wang BF; Fu G; Wang XN; Chen SJ; Chen Z; Hao P; Tang H; Ren SX; Zhong Y; Guo ZM; Liu Q; Miao YG; Kong XY; He WZ; Li YX; Wu CI; Zhao GP; Chiu RW; Chim SS; Tong YK; Chan PK; Tam JS; Lo YM Molecular Evolution of the SARS Coronavirus, during the Course of the SARS Epidemic in China. Science 2004, 303, 1666–1669. [DOI] [PubMed] [Google Scholar]
  • (88).Pfeiffer JK; Kirkegaard K Increased Fidelity Reduces Poliovirus Fitness and Virulence under Selective Pressure in Mice. PLoS Pathogens 2005, 1, e11. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (89).Vignuzzi M; Stone JK; Andino R Ribavirin and lethal mutagenesis of poliovirus: molecular mechanisms, resistance and biological implications. Virus Research 2005, 107, 173–181. [DOI] [PubMed] [Google Scholar]
  • (90).Wensing A; Calvez V; Gunthard H; Johnson V; Paredes R; Pillay D; Shafer R; Richman D 2017 Update of the Drug Resistance Mutations in HIV-1. Topics in antiviral medicine 2017, 24, 132–133. [PMC free article] [PubMed] [Google Scholar]
  • (91).Wensing AM; Calvez V; Ceccherini-Silberstein F; Charpentier C; Gunthard H; Paredes R; W. SR; Richman D 2019 Update of the Drug Resistance Mutations in HIV-1. Topics in Antiviral Medicine 2019, 27. [PMC free article] [PubMed] [Google Scholar]
  • (92).Brown DK; Sheik Amamuddy O; Tastan Bishop Ö Structure-Based Analysis of Single Nucleotide Variants in the Renin-Angiotensinogen Complex. Global Heart 2017, 12, 121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (93).Jubb HC; Pandurangan AP; Turner MA; Ochoa-Montaño B; Blundell TL; Ascher DB Mutations at protein-protein interfaces: Small changes over big surfaces have large impacts on human health. Progress in Biophysics and Molecular Biology 2017, 128, 3–13. [DOI] [PubMed] [Google Scholar]
  • (94).Šali A. MODELLER: A Program for Protein Structure Modeling Release 9.12, r9480. 2013; http://scholar.google.com/scholar?hl=en{\&}btnG=Search{\&}q=intitle:MODELLER+A+Program+for+Protein+Structure+Modeling{\#}6. [Google Scholar]
  • (95).Yang H; Yang M; Ding Y; Liu Y; Lou Z; Zhou Z; Sun L; Mo L; Ye S; Pang H; Gao GF; Anand K; Bartlam M; Hilgenfeld R; Rao Z The crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. Proceedings of the National Academy of Sciences 2003, 100, 13190–13195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (96).Humphrey W; Dalke A; Schulten K VMD: Visual molecular dynamics. Journal of Molecular Graphics 1996, 14, 33–38. [DOI] [PubMed] [Google Scholar]
  • (97).Goyal B; Goyal D Targeting the Dimerization of the Main Protease of Coronaviruses: A Potential Broad-Spectrum Therapeutic Strategy. ACS Combinatorial Science 2020, 22, 297–305. [DOI] [PubMed] [Google Scholar]
  • (98).Wei P; Fan K; Chen H; Ma L; Huang C; Tan L; Xi D; Li C; Liu Y; Cao A; Lai L The N-terminal octapeptide acts as a dimerization inhibitor of SARS coronavirus 3C-like proteinase. Biochemical and Biophysical Research Communications 2006, 339, 865–872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (99).Chou C-Y; Chang H-C; Hsu W-C; Lin T-Z; Lin C-H; Chang G-G Quaternary Structure of the Severe Acute Respiratory Syndrome (SARS) Coronavirus Main Protease. Biochemistry 2004, 43, 14958–14970. [DOI] [PubMed] [Google Scholar]
  • (100).Shi J; Sivaraman J; Song J Mechanism for Controlling the Dimer-Monomer Switch and Coupling Dimerization to Catalysis of the Severe Acute Respiratory Syndrome Coronavirus 3C-Like Protease. Journal of Virology 2008, 82, 4620–4629. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (101).Chen S; Chen L; Tan J; Chen J; Du L; Sun T; Shen J; Chen K; Jiang H; Shen X Severe Acute Respiratory Syndrome Coronavirus 3C-like Proteinase N Terminus Is Indispensable for Proteolytic Activity but Not for Enzyme Dimerization. Journal of Biological Chemistry 2005, 280, 164–173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (102).Cheng S-C; Chang G-G; Chou C-Y Mutation of Glu-166 Blocks the Substrate-Induced Dimerization of SARS Coronavirus Main Protease. Biophysical Journal 2010, 98, 1327–1336. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (103).Li C; Teng X; Qi Y; Tang B; Shi H; Ma X; Lai L Conformational Flexibility of a Short Loop near the Active Site of the SARS-3CLpro is Essential to Maintain Catalytic Activity. Scientific Reports 2016, 6, 1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (104).Li C; Qi Y; Teng X; Yang Z; Wei P; Zhang C; Tan L; Zhou L; Liu Y; Lai L Maturation mechanism of severe acute respiratory syndrome (SARS) coronavirus 3C-like proteinase. Journal of Biological Chemistry 2010, 285, 28134–28140. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (105).Frishman D; Argos P Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Genetics 1995, 23, 566–579. [DOI] [PubMed] [Google Scholar]
  • (106).Faraggi E; Dunker AK; Jernigan RL; Kloczkowski A Entropy, Fluctuations, and Disordered Proteins. Entropy 2019. , 21, 764. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (107).Horowitz S; Trievel RC Carbon-Oxygen Hydrogen Bonding in Biological Structure and Function. Journal of Biological Chemistry 2012, 287, 41576–41582. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (108).Zhao Q; Li S; Xue F; Zou Y; Chen C; Bartlam M; Rao Z Structure of the Main Protease from a Global Infectious Human Coronavirus, HCoV-HKU1. Journal of Virology 2008, 82, 8647–8655. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (109).Wang F; Chen C; Tan W; Yang K; Yang H Structure of Main Protease from Human Coronavirus NL63: Insights for Wide Spectrum Anti-Coronavirus Drug Design. Scientific Reports 2016, 6, 22677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (110).Penkler DL; Atilgan C; Tastan Bishop Ö Allosteric Modulation of Human Hsp90α Conformational Dynamics. Journal of Chemical Information and Modeling 2018, 58, 383–404. [DOI] [PubMed] [Google Scholar]
  • (111).Aledo JC. Methionine in proteins: The Cinderella of the proteinogenic amino acids. Protein Science 2019, 28, 1785–1796. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (112).Hu T; Zhang Y; Li L; Wang K; Chen S; Chen J; Ding J; Jiang H; Shen X Two adjacent mutations on the dimer interface of SARS coronavirus 3C-like protease cause different conformational changes in crystal structure. Virology 2009, 388, 324–334. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (113).Zheng K; Ma G; Zhou J; Zen M; Zhao W; Jiang Y; Yu Q; Feng J Insight into the activity of SARS main protease: Molecular dynamics study of dimeric and monomeric form of enzyme. Proteins: Structure, Function, and Bioinformatics 2006, 66, 467–479. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (114).Kim S; Chen J; Cheng T; Gindulyte A; He J; He S; Li Q; Shoemaker BA; Thiessen PA; Yu B; Zaslavsky L; Zhang J; Bolton EE PubChem 2019 update: improved access to chemical data. Nucleic Acids Research 2019, 47, D1102–D1109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (115).Chu DK; Akl EA; Duda S; Solo K; Yaacoub S; Schünemann HJ; Chu DK; Akl EA; El-harakeh A; Bognanni A; Lotfi T; Loeb M; Hajizadeh A; Bak A; Izcovich A; Cuello-Garcia CA; Chen C; Harris DJ; Borowiack E; Chamseddine F; Schünemann F; Morgano GP; Muti Schünemann GEU; Chen G; Zhao H; Neumann I; Chan J; Khabsa J; Hneiny L; Harrison L; Smith M; Rizk N; Giorgi Rossi P; AbiHanna P; El-khoury R; Stalteri R; Baldeh T; Piggott T; Zhang Y; Saad Z; Khamis A; Reinap M; Duda S; Solo K; Yaacoub S; Schünemann HJ Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis. The Lancet 2020, 6736, 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • (116).Chookajorn T. Evolving COVID-19 conundrum and its impact. Proceedings of the National Academy of Sciences 2020, 117, 12520–12521. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ci-2020-00634r SI

RESOURCES