Science. 2022 May 24:abm1208. doi: 10.1126/science.abm1208

Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness

Fritz Obermeyer 1,2,*, Martin Jankowiak 1,2, Nikolaos Barkas 1, Stephen F Schaffner 1,3,4, Jesse D Pyle 1,5, Leonid Yurkovetskiy 6, Matteo Bosso 6, Daniel J Park 1, Mehrtash Babadi 1, Bronwyn L MacInnis 1,4,7, Jeremy Luban 1,6,7,8, Pardis C Sabeti 1,3,4,7,9, Jacob E Lemieux 1,10,*
PMCID: PMC9161372  PMID: 35608456

Abstract

Repeated emergence of SARS-CoV-2 variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.


The SARS-CoV-2 pandemic has been characterized by repeated waves of cases driven by the emergence of new lineages with higher fitness, where fitness encompasses any trait that affects the lineage’s growth, including its basic reproduction number (R0), ability to evade existing immunity, and generation time. Rapidly identifying such lineages as they emerge, and accurately forecasting their dynamics, is critical for guiding outbreak response. Doing so effectively would benefit from the ability to interrogate the entirety of the global SARS-CoV-2 genomic dataset. The large size (currently over 7.5 million virus genomes) and geographic and temporal variability of the available data present significant challenges that will become greater as more viruses are sequenced. Current phylogenetic approaches are computationally inefficient on datasets with more than ~5000 samples and take days to run at that scale. Ad hoc methods to estimate the relative fitness of particular SARS-CoV-2 lineages are a computationally efficient alternative ( 1 – 3 ), but have typically relied on models in which one or two lineages of interest are compared to all others and do not capture the complex dynamics of multiple co-circulating lineages.

Furthermore, estimates of relative fitness based on lineage frequency data alone ( 2 , 4 , 5 ) do not take advantage of additional statistical power that can be gained from analyzing the independent appearance and growth of the same mutation in multiple lineages. Performing a mutation-based analysis of lineage prevalence has the additional advantage of identifying specific genetic determinants of a lineage’s phenotype, which is critically important both for understanding the biology of transmission and pathogenesis and for predicting the phenotype of new lineages. The SARS-CoV-2 pandemic has already been dominated by several genetic changes of functional and epidemiological importance, including the spike (S) D614G mutation that is associated with higher SARS-CoV-2 loads ( 6 , 7 ). Mutations found in Variants of Concern (VoC), such as S:N439R, S:N501Y, and S:E484K, have been linked, respectively, to increased transmissibility ( 8 ), enhanced binding to ACE2 ( 9 ), and antibody escape ( 10 , 11 ). Despite these successes, identifying functionally important mutations in the context of a large background of genetic variants of little or no phenotypic consequence remains challenging.

In modeling the relative fitness of SARS-CoV-2 lineages, we estimated their growth as a linear combination of the effects of individual mutations. To this end, we developed PyR0, a hierarchical Bayesian regression model that enables scalable analysis of the complete set of publicly available SARS-CoV-2 genomes and that can be applied to any viral genomic dataset and to other viral phenotypes. The model, which is summarized in fig. S1 and described in detail in the supplementary materials, avoids the complexity of full phylogenetic inference by first clustering genomes by genetic similarity (refining PANGO lineages ( 12 )) and then estimating the incremental effect on growth rate of each of the most common amino acid changes on the lineages in which they appear. By regressing growth rate on genome sequence, the model shares statistical strength among genetically similar lineages without explicitly relying on phylogeny. By modeling only the multinomial proportion of different lineages rather than the absolute number of samples for each lineage ( 13 , 14 ), and by doing so within 14-day intervals in 1,560 globally-distributed geographic regions, the model achieves robustness to a number of sources of bias that affect all lineages, across regions and over time, including differences in data collection and changes in transmission due to such factors as social behavior, public health policy, and vaccination.
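The core of this regression can be illustrated with a minimal sketch. It is not the PyR0 implementation: it assumes a single region, a hypothetical binary lineage-by-mutation feature matrix, made-up mutation coefficients, and hypothetical per-lineage intercepts, and it omits the hierarchical priors and the Bayesian inference entirely. All function and variable names are illustrative.

```python
import math

def lineage_growth_rates(features, coefficients):
    """Growth rate of each lineage as a linear combination of mutation effects."""
    return [sum(x * b for x, b in zip(row, coefficients)) for row in features]

def lineage_proportions(intercepts, rates, t):
    """Multinomial (softmax) lineage proportions at time t."""
    logits = [a + r * t for a, r in zip(intercepts, rates)]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: 3 lineages x 2 mutations (1 = lineage carries the mutation).
features = [[0, 0], [1, 0], [1, 1]]
coefficients = [0.05, 0.10]      # illustrative per-mutation effects on growth rate
intercepts = [0.0, -3.0, -6.0]   # illustrative initial log-prevalence offsets

rates = lineage_growth_rates(features, coefficients)
early = lineage_proportions(intercepts, rates, t=0)    # first lineage dominates
late = lineage_proportions(intercepts, rates, t=100)   # third lineage dominates
```

Because the softmax is unchanged when the same constant is added to every logit, any factor that scales all lineages equally within a region and time interval cancels out of the proportions, which is the intuition behind the robustness described above.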

We fit PyR0 to 6,466,300 SARS-CoV-2 genomes available on GISAID ( 15 ) as of January 20, 2022, in a model that contained 3,000 clusters, derived from 1,544 PANGO lineages, and 2,904 nonsynonymous mutations. The output of the model is a posterior distribution for the relative fitness (exponential growth rate) of each lineage and for the contribution to the fitness from each mutation. Fitting this large model is computationally challenging, so we used stochastic variational inference, an approximate inference method that reduced our task to solving a 75-million-dimensional optimization problem on a GPU. Inference was implemented in the Pyro ( 16 ) probabilistic programming framework (see Supplemental Materials). The trained model can be used to infer lineage fitness, predict the fitness of completely new lineages, forecast future lineage proportions, and estimate the effects of individual mutations on fitness.

The model's lineage fitness estimates (Fig. 1B) show a modest upward trend over time among all lineages, interrupted by several lineages with much higher fitness. Sensitivity analyses revealed qualitative consistency of fitness estimates across spatial data subsets (fig. S2). The upward trend may in part reflect an upward bias caused by the lineage assignment process, as can be seen in simulation studies (fig. S3), but the high tail of the distribution exhibits elevated fitness values far in excess of this trend. The spread of the virus in human populations between late 2019 and early 2022 has been marked by periods of rapid evolution in fitness and waves of increase in case counts (Fig. 1). While PANGO lineages facilitate communication by providing a stable nomenclature, we observed some PANGO lineages with multiple successive peaks in some regions, suggesting that sublineages within them had differing fitnesses. We therefore algorithmically refined the 1,544 PANGO lineages into 3,000 finer clusters, and found that our model identified significant heterogeneity within some PANGO lineages (fig. S4). When we tested the model's predictive ability (fig. S5), we found that forecasts were reliable for 1-2 months into the future for variants of concern, but not necessarily other variants, at which point they tended to be disrupted by the emergence of a completely new strain (table S1, fig. S6). The accuracy of forecasts typically stabilized within two weeks after the emergence of a new competitive lineage in a region (fig. S6).

Fig. 1. Relative fitness versus date of lineage emergence.


Circle size is proportional to cumulative case count inferred from lineage proportion estimates and confirmed case counts. Inset table lists the 10 fittest lineages inferred by the model. R/R_A is the fold increase in relative fitness over the Wuhan (A) lineage, assuming a fixed generation time of 5.5 days.

The model correctly infers WHO classification variant Omicron (PANGO BA.2) to have the highest fitness to date, 8.9-fold (95% CI, 8.6-9.2) higher than the original A lineage (Fig. 1 inset), accurately foreshadowing its rise in regions where it is circulating (fig. S7). Through systematic backtesting, we found that the model would have provided early warning and aided in the identification of VoCs had it been routinely applied to SARS-CoV-2 samples, confirming the importance for public health of timely publication of genomic data. For example, the elevated fitness of BA.2 was identified by mid-December 2021 on the basis of 76 reported sequences (fig. S8); sharing statistical strength over mutations enabled an earlier and more confident prediction that BA.2 was the fittest lineage yet observed (fig. S10). Likewise, PyR0 would have forecast the dominance of B.1.1.7 in late November 2020 (fig. S9), AY.4 by May 2021 (fig. S10), and BA.1 by early December 2021 (fig. S8). While variant-specific models were accurate and useful in predicting the rise of these lineages ( 2 ), each modeling effort was specific to a particular lineage and geographic region. PyR0’s global approach provides similar early detection while also offering automated, rapid, and standardized unbiased consideration of all variants and lineages, together with ranking based on relative fitness.
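To make the fold-increase figures concrete, the sketch below converts between a fold increase per generation and a difference in daily exponential growth rate. It assumes the relation fold = exp(rate difference × generation time), which is one natural reading of "fold increase in relative fitness ... assuming a fixed generation time of 5.5 days"; the exact definition is given in the supplementary materials, and the function names here are hypothetical.

```python
import math

GENERATION_TIME_DAYS = 5.5  # fixed generation time stated in the Fig. 1 caption

def fold_to_daily_log_rate(fold, tau=GENERATION_TIME_DAYS):
    """Difference in daily exponential growth rate implied by a fold increase."""
    return math.log(fold) / tau

def daily_log_rate_to_fold(rate, tau=GENERATION_TIME_DAYS):
    """Fold increase per generation implied by a daily growth-rate difference."""
    return math.exp(rate * tau)

# BA.2 point estimate and 95% CI bounds reported in the text (fold over lineage A).
ba2_rate = fold_to_daily_log_rate(8.9)
ba2_rate_low = fold_to_daily_log_rate(8.6)
ba2_rate_high = fold_to_daily_log_rate(9.2)
```

Under this assumed relation, an 8.9-fold increase per 5.5-day generation corresponds to a growth-rate advantage of roughly 0.4 per day over the A lineage.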

Compared to standard multinomial regression models, PyR0 estimates of lineage fitness were similar (Pearson’s R = 0.95, figs. S11 and S12), but including mutations in the model enables PyR0 to infer elevated fitness of Omicron lineages BA.1 and BA.2 faster than the model without mutations (fig. S14). In contrast to non-hierarchical binomial logistic regression (fig. S13), PyR0 estimates displayed less variability as data accumulated, benefitting from the sharing of information across regions and the regularizing effect of the priors. Lineage fitness estimates were also stable between our initial analysis of 2.1 million genomes in August 2021 ( 17 ), performed shortly after the emergence of Delta lineages and before the emergence of Omicron, and the present analysis (Spearman’s rho = 0.78, fig. S15C). The correlation between individual amino acid coefficients in the two models was weaker than that for lineages (fig. S15D-E, rho = 0.48) but still significant (test of no association for rho, p < 2 × 10^−16), reflecting both the inherent difficulty of estimating high-dimensional mutational coefficients observed indirectly through lineage counts (Supplementary Note 1), as well as the addition of 4.3 million sequences, including highly fit Omicron lineages distinguished by their enhanced immune escape.
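The comparisons above rely on two different correlation measures. As a reminder of how they differ, here is a minimal standard-library sketch; the two lists of fitness estimates are hypothetical values chosen for illustration, not results from either analysis.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical fitness estimates for five lineages from two model fits.
earlier_fit = [1.0, 1.2, 1.5, 2.0, 3.0]
later_fit = [1.1, 1.3, 1.4, 2.5, 8.9]

rho = spearman(earlier_fit, later_fit)  # the ordering is identical in both fits
r = pearson(earlier_fit, later_fit)     # sensitive to the one very large value
```

Spearman's rho depends only on the ordering of the estimates, so it is less affected by a few lineages with very high fitness than Pearson's R.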

By jointly modeling fitness estimates using lineage counts and individual mutations, PyR0 harnesses convergent evolution (Table 1 and fig. S16) to infer the fitness of new constellations of mutations based on the trajectories of other lineages in which they have previously emerged. This predictive capability could aid public health efforts because incorporating mutations may allow the model to learn faster than it would by relying on lineage counts alone (fig. S14). To test the reliability of this kind of estimate, we fit leave-one-out estimators for PANGO lineages: for each lineage, we refit the model on the dataset with that entire lineage removed and predicted its fitness based solely on the mutational content of the omitted lineage (fig. S17). These estimators showed excellent agreement with estimators based on the observed behavior of the lineages, and they were also more accurate than naive phylogenetic estimators that assume the fitness of each new strain is equal to its parent lineage's fitness (Pearson's R = 0.983, after correcting for parent fitness, fig. S17). Together, these analyses suggest that PyR0 has the potential to aid genomic surveillance efforts by providing an automated early warning system on a similar time scale as sophisticated regional surveillance efforts ( 18 , 19 ).
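The difference between a mutation-based estimate and a naive parent-based estimate can be sketched as follows. This is an illustration only: the per-mutation fold increases are taken from Table 1, but the parent and child mutation sets are hypothetical, unseen mutations are arbitrarily assigned zero effect, and no model refitting (as in the actual leave-one-out analysis) is performed.

```python
import math

# Per-mutation effects on log fitness, from the fold increases listed in Table 1.
effects = {
    "S:H655Y": math.log(1.051),
    "S:T95I": math.log(1.046),
    "S:N679K": math.log(1.04),
}

def mutation_based_log_fitness(mutations, effects):
    """Additive estimate: sum the effects of the mutations a lineage carries.

    Mutations without an estimated effect contribute zero (an assumption).
    """
    return sum(effects.get(m, 0.0) for m in mutations)

# Hypothetical parent lineage and a hypothetical descendant with two extra mutations.
parent = {"S:T95I"}
child = {"S:T95I", "S:H655Y", "S:N679K"}

naive_estimate = mutation_based_log_fitness(parent, effects)     # child := parent
informed_estimate = mutation_based_log_fitness(child, effects)   # uses child's mutations
informed_fold = math.exp(informed_estimate)
```

In this toy case the naive estimator assigns the child the parent's fitness, whereas the additive estimate multiplies the three fold increases together, giving a fold increase of about 1.14.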

Table 1. Amino acid substitutions most significantly associated with increased fitness.

Significance is defined as posterior mean / posterior standard deviation. Fitness is per 5.5 days (estimated generation time of the Wuhan (A) lineage ( 1 , 23 )). Final column: number of PANGO lineages in which each substitution emerged independently.

Rank Gene Substitution Fold increase in fitness Number of lineages
1 S H655Y 1.051 33
2 S T95I 1.046 30
3 ORF1a P3395H 1.039 5
4 S N764K 1.04 6
5 ORF1a K856R 1.039 2
6 S S371L 1.041 3
7 E T9I 1.04 5
8 S Q954H 1.04 5
9 ORF9b P10S 1.039 25
10 S L981F 1.04 2
11 N P13L 1.04 25
12 S G339D 1.039 4
13 S S375F 1.04 5
14 S S477N 1.039 47
15 S N679K 1.04 11
16 S S373P 1.04 5
17 M Q19E 1.039 5
18 S D796Y 1.038 11
19 S N969K 1.04 5
20 S T547K 1.038 3
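Because Table 1 is ranked by significance (posterior mean divided by posterior standard deviation) rather than by effect size, a substitution with a slightly smaller fold increase can outrank one with a larger fold increase. The sketch below illustrates this ranking rule; the posterior means and standard deviations are hypothetical placeholders, not values from the paper.

```python
def significance(mean, sd):
    """Significance as defined for Table 1: posterior mean / posterior sd."""
    return mean / sd

# Hypothetical posterior summaries of the effect on log fitness: (mean, sd).
posterior = {
    "mutation_a": (0.040, 0.004),  # moderate effect, tightly estimated
    "mutation_b": (0.050, 0.010),  # larger effect, more uncertain
    "mutation_c": (0.010, 0.020),  # small effect, very uncertain
}

ranked = sorted(
    posterior,
    key=lambda name: significance(*posterior[name]),
    reverse=True,
)
```

Here `mutation_a` ranks first even though `mutation_b` has the larger mean effect, because its estimate is more certain.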

Genome-wide estimates of the effect of SARS-CoV-2 mutations on fitness also provide a powerful tool for better understanding the biology of fitness. Our model allowed us to estimate the contribution of 2,904 amino acid substitutions (Fig. 2A and Table 1) to lineage fitness and to rank them by inferred statistical significance (fig. S18). Cross-validation confirmed that these results replicate qualitatively across different geographic regions (fig. S19).

Fig. 2. Manhattan plot of amino acid changes assessed in this study.


(A) Changes across the entire genome. (B) Changes in the first 850 amino acids of S. (C) Changes in the first 250 amino acids of N. In each of (A) to (C) the y axis shows effect size Δ log R, the estimated change in log relative fitness due to each amino acid change. The bottom three axes show the background density of all observed amino acid changes, the density of those associated with growth (weighted by |Δ log R|), and the ratio of the two. The top 55 amino acid changes are labeled. See fig. S13 for detailed views of S, N, ORF1a, and ORF1b. (D) Structure of the spike-ACE2 complex (PDB: 7KNB). Spike subunits are colored light blue, light orange, and gray. Top-ranked mutations are shown as red spheres. ACE2 is shown in magenta. (E) Close-up view of the RBD interface. (F) Top-ranked mutations in the N-terminal RNA-binding domain of N. Residues 44-180 of N (PDB: 7ACT) are shown in light blue. Amino acid positions corresponding to top mutations in this region are shown as red spheres. A 10-nt bound RNA is shown in gray.

The highest concentrations of fitness-associated mutations were found in the S, N, and the ORF1 polyprotein genes (ORF1a and ORF1b, Fig. 2, A and B, and figs. S20 and S21). Using spatial autocorrelation as a measure of spatial structure, we found evidence of functional hotspots in the S, N, ORF7a, ORF3a, and ORF1a genes (table S2). Within S, we confirmed three hotspots of fitness-enhancing mutations, each within a defined functional region: the N-terminal domain, the receptor-binding domain (RBD), and the furin-cleavage site (Fig. 2B). We assessed mutational enrichment in the top-ranked set of mutations and identified an enrichment for lysine to asparagine mutations in the S gene (fig. S22C). We visualized top scoring mutations within atomic structures for the spike protein (Fig. 2, D to E), the nucleocapsid's N-terminal domain (Fig. 2F), the polymerase (fig. S23), and two proteases (fig. S24). Many of the top mutations in the S gene occurred in the RBD, making direct contacts with the ACE2 receptor, including K417N/T and E484K (Fig. 2, D to E). Two top-ranked mutations, T478K and S477N, occur in a flexible loop adjacent to the S-ACE2 interface (Fig. 2E), suggesting that these mutations may affect the kinetics of receptor engagement or the Spike conformational changes that follow. Other mutations occurred in regions proximal to essential enzymatic active sites of the viral replication (fig. S15) or protein processing (fig. S16) machinery.
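One widely used statistic for spatial autocorrelation is Moran's I. This section does not state which statistic or weighting scheme was used, so the following is only an illustrative sketch under assumed choices: positions along a gene are treated as a one-dimensional sequence, neighbors within `max_distance` positions get weight 1, and the effect sizes are made-up values.

```python
def morans_i(values, max_distance=1):
    """Moran's I for a 1D sequence with binary weights for nearby positions.

    Assumes the values are not all identical (otherwise the variance is zero).
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and abs(i - j) <= max_distance:
                cross_products += deviations[i] * deviations[j]
                weight_sum += 1.0
    variance_sum = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross_products / variance_sum)

# Hypothetical effect sizes along a stretch of protein sequence.
clustered = [1, 1, 1, -1, -1, -1]      # similar values sit next to each other
alternating = [1, -1, 1, -1, 1, -1]    # neighbors always differ

i_clustered = morans_i(clustered)      # positive: evidence of spatial structure
i_alternating = morans_i(alternating)  # negative: neighbors are dissimilar
```

A clearly positive value indicates that positions with large effects tend to lie near one another, which is the signature of a hotspot.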

Fig. 3. (A) Infectivity relative to WT of lentiviral vectors pseudotyped with the indicated Spike mutants.


Target cells were HEK293T cells expressing ACE2 and TMPRSS2 transgenes. The genetic background of the Spike was Wuhan-Hu-1 bearing D614G. Red bars were significantly different from WT (adjusted p values shown). Black bars were not significantly different from WT. (B) For the 1,701 SARS-CoV-2 clusters with at least one amino acid substitution in the RBD we compare: i) the PyR0 prediction for the contribution to Δ log R from RBD substitutions only; to ii) antibody binding computed using the antibody-escape calculator in ( 20 ). The escape calculator is based on an intuitive non-linear model parameterized using deep mutational scanning data for 33 neutralizing antibodies elicited by SARS-CoV-2. PyR0 predictions exhibit high (Spearman) correlation with predictions from Greaney et al. ( 20 ). (C to E) We dissect PyR0 Δ log R estimates into S-gene (C), RBD (D), and non-S-gene (E) contributions for 3,000 SARS-CoV-2 clusters (blue dots). The horizontal axis corresponds to the date at which each cluster first emerged. Red squares denote the median Δ log R within each monthly bin. The increased importance of S-gene mutations (notably in the RBD) over non-S-gene mutations starting around November 2021 is apparent.

We tested several of the high-scoring mutations in single-cycle infectivity assays as done previously ( 7 ), focusing on the RBD (Fig. 3A). We found that while some individual mutations increased infectivity, on average, high-scoring RBD mutations did not promote infectivity per se. We considered an alternate possibility that fitness of Spike mutations is driven by immune escape. Using RBD-aggregated mutations as a proxy for immune escape, we found that the fitness effect of these Spike mutations correlates well with antibody escape estimates from Greaney et al. ( 20 ) (Fig. 3B). Together with the observed jump in fitness beginning in late 2021 (Fig. 3C) associated with Spike mutations, but not mutations elsewhere in the genome (Fig. 3E), these results suggest that immune escape is the dominant driver of current fitness increases. BA.1 and BA.2 had similar estimated fitness from Spike mutations, potentially consistent with similar Spike antibody neutralization of these variants ( 21 ), whereas PyR0 inferred that the elevated fitness of BA.2 is attributed to non-Spike mutations (fig. S25). In contrast to mutations in Spike, those in the serine-arginine rich region of N were linked to increased efficiency of SARS-CoV-2 genomic RNA packaging ( 22 ). Within ORF1, we found fitness-associated mutations across all viral enzymes, and clusters within additional non-structural proteins (nsps). The highest concentration of fitness-associated mutations is found in nsp4, nsp6, and nsp12–14 (fig. S12B, S13C-D), suggesting unexplored function at those sites. For example, nsp4 and nsp6 have roles in assembly of replication compartments, and substitutions in these regions may influence the kinetics of replication (see Supplemental Note 3). We caution that while convergent evolution makes it possible to identify candidate functional mutations, observational data alone are insufficient to declare mutations as causal rather than merely correlated. Our uncertainty-ranked list of important mutations can be used to prioritize hits identified by our study for functional follow-up.

Some lineages increased in fitness more than others over the course of the pandemic (fig. S4). Notably, B.1.1 displayed the greatest variability among sublineages, followed by B.1. Fitness appeared to reach a plateau over time for most lineages (Fig. 1 and fig. S4). In contrast to Omicron sublineages, Alpha and Delta showed little variability in Spike-attributable fitness (fig. S25), suggesting that the propensity to acquire new Spike mutations depends on the constellation of mutations that comprise a lineage, consistent with epistasis. A limitation of PyR0 is that it does not incorporate epistatic interactions between mutations (Supplemental Note 1); however, our results demonstrate the feasibility of inferring genetic determinants and lineage fitness using the simplest possible linear-additive model and provide a foundation for future research for more complex modeling that includes epistatic effects between mutations and migration across geographic regions.

In summary, PyR0 provides a genome-wide, automated approach for detecting viral lineages with increased fitness. By combining a model-based assessment of lineage fitness with absolute case counts, our model provides a global picture of the events of the first two years of the pandemic. Because it assesses the contribution of individual mutations and aggregates across all lineages and geographic regions, it can identify mutations and gene regions that likely increase fitness, and mutation-level information may help detect fitter lineages earlier than case counts alone. Applied to the full set of publicly available SARS-CoV-2 genomes, it provides a genomic view of the mutations driving increased fitness of the virus, identifying experimentally established driver mutations in S and highlighting the key role of non-S mutations, particularly in N, ORF1b, and ORF1a, which have received relatively less research attention. By modeling millions of viral sequences across thousands of regions, PyR0 yields mechanistic insight into viral fitness and offers a panoramic view of viral evolution, revealing a pattern whereby major circulating lineages fragment into sublineages with modest differences in fitness before they are collectively displaced by the sudden emergence of markedly fitter variants.

Acknowledgements

We acknowledge crucial assistance in data preprocessing from A. Hinrichs. We thank T. Bedford and C. Roemer for visualizing the outputs of our model on nextstrain.org. We acknowledge helpful discussions and feedback from D. Phan, W. Hanage, C. Tomkins-Tinch, S. Weingarten-Gabbay, K. Siddle, S. Gosai, S. Reilly, E. Bingham, H. Soutter, D. Marks, N. Youssef, S. Gurev, and N. Thadani. We gratefully acknowledge the authors from the originating laboratories and the submitting laboratories, who generated and shared via GISAID genetic sequence data on which this research is based (Supplementary Data File 3). This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. This license does not apply to figures/photos/artwork or other content included in the article that is credited to a third party; obtain authorization from the rights holder before using such material.

Funding: This work was sponsored by the U.S. Centers for Disease Control and Prevention (BAA 75D30120C09605 to B.L.M.), as well as support from the Doris Duke Charitable Foundation (J.E.L.), the Howard Hughes Medical Institute (P.C.S.), the National Institute of Allergy and Infectious Diseases R37AI147868 (J.L.) and U19AI110818 (P.C.S.), and the Evergrande COVID-19 Response Fund Award from the Massachusetts Consortium on Pathogen Readiness (J.L.).

Competing interests: P.C.S. is a co-founder of and consultant to Sherlock Biosciences and a Board Member of Danaher Corporation, and holds equity in the companies. The other authors declare no conflicts of interest.

Author contributions: Conceptualization: F.O., S.F.S., J.E.L., M.J. Data curation: F.O., N.B. Formal Analysis: F.O., S.F.S, M.J., N.B., J.E.L. Funding acquisition: D.J.P., B.M., P.C.S, J.L., J.E.L. Investigation: all authors. Methodology: F.O., S.F.S, M.J., J.E.L., L.Y., M.B. Project administration: all authors. Software: F.O., N.B., M.J. Supervision: D.J.P., B.M., J.L., P.C.S., J.E.L. Validation: F.O., N.B., M.J., S.F.S. Visualization: F.O., J.E.L., N.B., J.P., S.F.S. Writing – original draft: F.O., S.F.S., B.M., P.C.S, J.E.L. Writing – review and editing: All authors.

Data and materials availability: Code is available at ( 24 ). We gratefully acknowledge all data contributors, i.e., the authors and their originating laboratories responsible for obtaining the specimens, and their submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID initiative ( 15 ) on which this research is based. A total of 6,466,300 submissions are included in this study. A complete list of 6.4 million accession numbers is included as data S3.

Supplementary Materials

This PDF file includes:

Materials and Methods

Figures S1 to S34

Tables S1 to S5

References ( 25 – 56 )

MDAR Reproducibility Checklist

Data S1 to S5

GISAID Acknowledgments table

Other Supplementary Material for this manuscript includes the following:

MDAR Reproducibility Checklist

Data S1 to S5

References

  • 1. Davies N. G., Abbott S., Barnard R. C., Jarvis C. I., Kucharski A. J., Munday J. D., Pearson C. A. B., Russell T. W., Tully D. C., Washburne A. D., Wenseleers T., Gimma A., Waites W., Wong K. L. M., van Zandvoort K., Silverman J. D., Diaz-Ordaz K., Keogh R., Eggo R. M., Funk S., Jit M., Atkins K. E., Edmunds W. J.; CMMID COVID-19 Working Group; COVID-19 Genomics UK (COG-UK) Consortium, Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 372, eabg3055 (2021). doi:10.1126/science.abg3055
  • 2. Volz E., Mishra S., Chand M., Barrett J. C., Johnson R., Geidelberg L., Hinsley W. R., Laydon D. J., Dabrera G., O’Toole Á., Amato R., Ragonnet-Cronin M., Harrison I., Jackson B., Ariani C. V., Boyd O., Loman N. J., McCrone J. T., Gonçalves S., Jorgensen D., Myers R., Hill V., Jackson D. K., Gaythorpe K., Groves N., Sillitoe J., Kwiatkowski D. P., Flaxman S., Ratmann O., Bhatt S., Hopkins S., Gandy A., Rambaut A., Ferguson N. M.; COVID-19 Genomics UK (COG-UK) consortium, Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature 593, 266–269 (2021).
  • 3. Stefanelli P., Trentini F., Guzzetta G., Marziano V., Mammone A., Poletti P., Grané C. M., Manica M., del Manso M., Andrianou X., Others, Co-circulation of SARS-CoV-2 variants B.1.1.7 and P.1. medRxiv (2021) (available at https://www.medrxiv.org/content/10.1101/2021.04.06.21254923v1.abstract).
  • 4. Stefanelli P., Trentini F., Guzzetta G., Marziano V., Mammone A., Sane Schepisi M., Poletti P., Molina Grané C., Manica M., Del Manso M., Andrianou X., Ajelli M., Rezza G., Brusaferro S., Merler S.; COVID-19 National Microbiology Surveillance Study Group, Co-circulation of SARS-CoV-2 Alpha and Gamma variants in Italy, February and March 2021. Euro Surveill. 27, (2022). doi:10.2807/1560-7917.ES.2022.27.5.2100429
  • 5. Vöhringer H. S., Sanderson T., Sinnott M., De Maio N., Nguyen T., Goater R., Schwach F., Harrison I., Hellewell J., Ariani C. V., Gonçalves S., Jackson D. K., Johnston I., Jung A. W., Saint C., Sillitoe J., Suciu M., Goldman N., Panovska-Griffiths J., Birney E., Volz E., Funk S., Kwiatkowski D., Chand M., Martincorena I., Barrett J. C., Gerstung M.; Wellcome Sanger Institute COVID-19 Surveillance Team; COVID-19 Genomics UK (COG-UK) Consortium, Genomic reconstruction of the SARS-CoV-2 epidemic in England. Nature 600, 506–511 (2021).
  • 6. Korber B., Fischer W. M., Gnanakaran S., Yoon H., Theiler J., Abfalterer W., Hengartner N., Giorgi E. E., Bhattacharya T., Foley B., Hastie K. M., Parker M. D., Partridge D. G., Evans C. M., Freeman T. M., de Silva T. I., McDanal C., Perez L. G., Tang H., Moon-Walker A., Whelan S. P., LaBranche C. C., Saphire E. O., Montefiori D. C., Angyal A., Brown R. L., Carrilero L., Green L. R., Groves D. C., Johnson K. J., Keeley A. J., Lindsey B. B., Parsons P. J., Raza M., Rowland-Jones S., Smith N., Tucker R. M., Wang D., Wyles M. D.; Sheffield COVID-19 Genomics Group, Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus. Cell 182, 812–827.e19 (2020). doi:10.1016/j.cell.2020.06.043
  • 7. Yurkovetskiy L., Wang X., Pascal K. E., Tomkins-Tinch C., Nyalile T. P., Wang Y., Baum A., Diehl W. E., Dauphin A., Carbone C., Veinotte K., Egri S. B., Schaffner S. F., Lemieux J. E., Munro J. B., Rafique A., Barve A., Sabeti P. C., Kyratsous C. A., Dudkina N. V., Shen K., Luban J., Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant. Cell 183, 739–751.e8 (2020). doi:10.1016/j.cell.2020.09.032
  • 8. Deng X., Garcia-Knight M. A., Khalid M. M., Servellita V., Wang C., Morris M. K., Sotomayor-González A., Glasner D. R., Reyes K. R., Gliwa A. S., Reddy N. P., Sanchez San Martin C., Federman S., Cheng J., Balcerek J., Taylor J., Streithorst J. A., Miller S., Sreekumar B., Chen P.-Y., Schulze-Gahmen U., Taha T. Y., Hayashi J. M., Simoneau C. R., Kumar G. R., McMahon S., Lidsky P. V., Xiao Y., Hemarajata P., Green N. M., Espinosa A., Kath C., Haw M., Bell J., Hacker J. K., Hanson C., Wadford D. A., Anaya C., Ferguson D., Frankino P. A., Shivram H., Lareau L. F., Wyman S. K., Ott M., Andino R., Chiu C. Y., Transmission, infectivity, and neutralization of a spike L452R SARS-CoV-2 variant. Cell 184, 3426–3437.e8 (2021). doi:10.1016/j.cell.2021.04.025
  • 9. Starr T. N., Greaney A. J., Hilton S. K., Ellis D., Crawford K. H. D., Dingens A. S., Navarro M. J., Bowen J. E., Tortorici M. A., Walls A. C., King N. P., Veesler D., Bloom J. D., Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding. Cell 182, 1295–1310.e20 (2020). doi:10.1016/j.cell.2020.08.012
  • 10. Choi B., Choudhary M. C., Regan J., Sparks J. A., Padera R. F., Qiu X., Solomon I. H., Kuo H.-H., Boucau J., Bowman K., Adhikari U. D., Winkler M. L., Mueller A. A., Hsu T. Y.-T., Desjardins M., Baden L. R., Chan B. T., Walker B. D., Lichterfeld M., Brigl M., Kwon D. S., Kanjilal S., Richardson E. T., Jonsson A. H., Alter G., Barczak A. K., Hanage W. P., Yu X. G., Gaiha G. D., Seaman M. S., Cernadas M., Li J. Z., Persistence and Evolution of SARS-CoV-2 in an Immunocompromised Host. N. Engl. J. Med. 383, 2291–2293 (2020). doi:10.1056/NEJMc2031364
  • 11. Greaney A. J., Starr T. N., Gilchuk P., Zost S. J., Binshtein E., Loes A. N., Hilton S. K., Huddleston J., Eguia R., Crawford K. H. D., Dingens A. S., Nargi R. S., Sutton R. E., Suryadevara N., Rothlauf P. W., Liu Z., Whelan S. P. J., Carnahan R. H., Crowe J. E. Jr., Bloom J. D., Complete Mapping of Mutations to the SARS-CoV-2 Spike Receptor-Binding Domain that Escape Antibody Recognition. Cell Host Microbe 29, 44–57.e9 (2021). doi:10.1016/j.chom.2020.11.007
  • 12. Rambaut A., Holmes E. C., O’Toole Á., Hill V., McCrone J. T., Ruis C., du Plessis L., Pybus O. G., A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020). doi:10.1038/s41564-020-0770-5
  • 13. Vöhringer H. S., Sanderson T., Sinnott M., De Maio N., Nguyen T., Goater R., Schwach F., Harrison I., Hellewell J., Ariani C., Gonçalves S., Jackson D., Johnston I., Jung A. W., Saint C., Sillitoe J., Suciu M., Goldman N., Birney E., Funk S., Volz E., Kwiatkowski D., Chand M., Martincorena I., Barrett J. C., Gerstung M., The Wellcome Sanger Institute Covid-19 Surveillance Team, The COVID-19 Genomics UK (COG-UK) Consortium, Genomic reconstruction of the SARS-CoV-2 epidemic across England from September 2020 to May 2021. bioRxiv (2021). doi:10.1101/2021.05.22.21257633
  • 14. Campbell F., Archer B., Laurenson-Schafer H., Jinnai Y., Konings F., Batra N., Pavlin B., Vandemaele K., Van Kerkhove M. D., Jombart T., Morgan O., le Polain de Waroux O., Increased transmissibility and global spread of SARS-CoV-2 variants of concern as at June 2021. Euro Surveill. 26, (2021). doi:10.2807/1560-7917.ES.2021.26.24.2100509
  • 15. Elbe S., Buckland-Merrett G., Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. 1, 33–46 (2017). 10.1002/gch2.1018 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Bingham E., Chen J. P., Jankowiak M., Obermeyer F., Pradhan N., Karaletsos T., Singh R., Szerlip P., Horsfall P., Goodman N. D., Pyro: Deep universal probabilistic programming. J. Mach. Learn. Res. 20, 973–978 (2019). [Google Scholar]
  • 17. Obermeyer F., Schaffner S. F., Jankowiak M., Barkas N., Pyle J. D., Park D. J., MacInnis B. L., Luban J., Sabeti P. C., Lemieux J. E., Analysis of 2.1 million SARS-CoV-2 genomes identifies mutations associated with transmissibility. medRxiv (2021), doi:10.1101/2021.09.07.21263228 [DOI] [PMC free article] [PubMed]
  • 18. Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations. Virological (2020) (available at https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars-cov-2-lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/563).
  • 19. Viana R., Moyo S., Amoako D. G., Tegally H., Scheepers C., Althaus C. L., Anyaneji U. J., Bester P. A., Boni M. F., Chand M., Choga W. T., Colquhoun R., Davids M., Deforche K., Doolabh D., du Plessis L., Engelbrecht S., Everatt J., Giandhari J., Giovanetti M., Hardie D., Hill V., Hsiao N.-Y., Iranzadeh A., Ismail A., Joseph C., Joseph R., Koopile L., Kosakovsky Pond S. L., Kraemer M. U. G., Kuate-Lere L., Laguda-Akingba O., Lesetedi-Mafoko O., Lessells R. J., Lockman S., Lucaci A. G., Maharaj A., Mahlangu B., Maponga T., Mahlakwane K., Makatini Z., Marais G., Maruapula D., Masupu K., Matshaba M., Mayaphi S., Mbhele N., Mbulawa M. B., Mendes A., Mlisana K., Mnguni A., Mohale T., Moir M., Moruisi K., Mosepele M., Motsatsi G., Motswaledi M. S., Mphoyakgosi T., Msomi N., Mwangi P. N., Naidoo Y., Ntuli N., Nyaga M., Olubayo L., Pillay S., Radibe B., Ramphal Y., Ramphal U., San J. E., Scott L., Shapiro R., Singh L., Smith-Lawrence P., Stevens W., Strydom A., Subramoney K., Tebeila N., Tshiabuila D., Tsui J., van Wyk S., Weaver S., Wibmer C. K., Wilkinson E., Wolter N., Zarebski A. E., Zuze B., Goedhals D., Preiser W., Treurnicht F., Venter M., Williamson C., Pybus O. G., Bhiman J., Glass A., Martin D. P., Rambaut A., Gaseitsiwe S., von Gottberg A., de Oliveira T., Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature 603, 679–686 (2022). 10.1038/s41586-022-04411-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Greaney A. J., Starr T. N., Bloom J. D., An antibody-escape estimator for mutations to the SARS-CoV-2 receptor-binding domain. Virus Evol. 8, veac021 (2022). 10.1093/ve/veac021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Yu J., Collier A. Y., Rowe M., Mardas F., Ventura J. D., Wan H., Miller J., Powers O., Chung B., Siamatu M., Hachmann N. P., Surve N., Nampanya F., Chandrashekar A., Barouch D. H., Neutralization of the SARS-CoV-2 omicron BA.1 and BA.2 variants. N. Engl. J. Med. 386, 1579–1580 (2022). 10.1056/NEJMc2201849 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Syed A. M., Taha T. Y., Tabata T., Chen I. P., Ciling A., Khalid M. M., Sreekumar B., Chen P.-Y., Hayashi J. M., Soczek K. M., Ott M., Doudna J. A., Rapid assessment of SARS-CoV-2-evolved variants using virus-like particles. Science 374, 1626–1632 (2021). 10.1126/science.abl6184 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Ferretti L., Ledda A., Wymant C., Zhao L., Ledda V., Abeler-Dörner L., Kendall M., Nurtay A., Cheng H.-Y., Ng T.-C., Lin H.-H., Hinch R., Masel J., Kilpatrick A. M., Fraser C., The timing of COVID-19 transmission. bioRxiv (2020), doi:10.1101/2020.09.04.20188516 [DOI]
  • 24. broadinstitute/pyro-cov: v0.2.1 (2022; https://zenodo.org/record/6399987).
  • 25. Turakhia Y., Thornlow B., Hinrichs A. S., De Maio N., Gozashti L., Lanfear R., Haussler D., Corbett-Detig R., Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 53, 809–816 (2021). 10.1038/s41588-021-00862-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. McBroome J., Thornlow B., Hinrichs A. S., Kramer A., De Maio N., Goldman N., Haussler D., Corbett-Detig R., Turakhia Y., A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees. Mol. Biol. Evol. 38, 5819–5824 (2021). 10.1093/molbev/msab264 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Nersisyan S., Zhiyanov A., Shkurnikov M., Tonevitsky A., T-CoV: a comprehensive portal of HLA-peptide interactions affected by SARS-CoV-2 mutations. bioRxiv, 2021.07.06.451227 (2021). 10.1101/2021.07.06.451227 [DOI] [PMC free article] [PubMed]
  • 28. J. F. Crow, M. Kimura, An Introduction to Population Genetics Theory (The Blackburn Press, 1970). [Google Scholar]
  • 29. Hopf T. A., Schärfe C. P. I., Rodrigues J. P. G. L. M., Green A. G., Kohlbacher O., Sander C., Bonvin A. M. J. J., Marks D. S., Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014). 10.7554/eLife.03430 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Frazer J., Notin P., Dias M., Gomez A., Min J. K., Brock K., Gal Y., Marks D. S., Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021). 10.1038/s41586-021-04043-8 [DOI] [PubMed] [Google Scholar]
  • 31. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch (2017) (available at https://openreview.net/pdf?id=BJJsrmfCZ).
  • 32. M. Gorinova, D. Moore, M. Hoffman, in Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research. H. Daumé III, A. Singh, Eds. (PMLR, 2020), vol. 119, pp. 3648–3657. [Google Scholar]
  • 33. Neal R. M., Slice sampling. Ann. Stat. 31, (2003). 10.1214/aos/1056562461 [DOI] [Google Scholar]
  • 34. Kingma D. P., Ba J., Adam: A Method for Stochastic Optimization. arXiv [cs.LG] (2014) (available at https://arxiv.org/abs/1412.6980).
  • 35. Cappello L., Kim J., Liu S., Palacios J. A., Statistical Challenges in Tracking the Evolution of SARS-CoV-2. arXiv [stat.AP] (2021) (available at https://arxiv.org/abs/2108.13362). [DOI] [PMC free article] [PubMed]
  • 36. Cao Y., Wang J., Jian F., Xiao T., Song W., Yisimayi A., Huang W., Li Q., Wang P., An R., Wang J., Wang Y., Niu X., Yang S., Liang H., Sun H., Li T., Yu Y., Cui Q., Liu S., Yang X., Du S., Zhang Z., Hao X., Shao F., Jin R., Wang X., Xiao J., Wang Y., Xie X. S., Omicron escapes the majority of existing SARS-CoV-2 neutralizing antibodies. Nature 602, 657–663 (2022). 10.1038/s41586-021-04385-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Planas D., Saunders N., Maes P., Guivel-Benhassine F., Planchais C., Buchrieser J., Bolland W.-H., Porrot F., Staropoli I., Lemoine F., Péré H., Veyer D., Puech J., Rodary J., Baele G., Dellicour S., Raymenants J., Gorissen S., Geenen C., Vanmechelen B., Wawina-Bokalanga T., Martí-Carreras J., Cuypers L., Sève A., Hocqueloux L., Prazuck T., Rey F. A., Simon-Loriere E., Bruel T., Mouquet H., André E., Schwartz O., Considerable escape of SARS-CoV-2 Omicron to antibody neutralization. Nature 602, 671–675 (2022). 10.1038/s41586-021-04389-z [DOI] [PubMed] [Google Scholar]
  • 38. Weisblum Y., Schmidt F., Zhang F., DaSilva J., Poston D., Lorenzi J. C., Muecksch F., Rutkowska M., Hoffmann H.-H., Michailidis E., Gaebler C., Agudelo M., Cho A., Wang Z., Gazumyan A., Cipolla M., Luchsinger L., Hillyer C. D., Caskey M., Robbiani D. F., Rice C. M., Nussenzweig M. C., Hatziioannou T., Bieniasz P. D., Escape from neutralizing antibodies by SARS-CoV-2 spike protein variants. eLife 9, e61312 (2020). 10.7554/eLife.61312 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Lin A. E., Diehl W. E., Cai Y., Finch C. L., Akusobi C., Kirchdoerfer R. N., Bollinger L., Schaffner S. F., Brown E. A., Saphire E. O., Andersen K. G., Kuhn J. H., Luban J., Sabeti P. C., Reporter Assays for Ebola Virus Nucleoprotein Oligomerization, Virion-Like Particle Budding, and Minigenome Activity Reveal the Importance of Nucleoprotein Amino Acid Position 111. Viruses 12, 105 (2020). 10.3390/v12010105 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Syed A. M., Taha T. Y., Khalid M. M., Tabata T., Chen I. P., Sreekumar B., Chen P.-Y., Hayashi J. M., Soczek K. M., Ott M., Doudna J. A., Rapid assessment of SARS-CoV-2 evolved variants using virus-like particles. bioRxiv, 2021.08.05.455082 (2021). [DOI] [PMC free article] [PubMed]
  • 41. Angelini M. M., Akhlaghpour M., Neuman B. W., Buchmeier M. J., Severe acute respiratory syndrome coronavirus nonstructural proteins 3, 4, and 6 induce double-membrane vesicles. mBio 4, e00524-13 (2013). 10.1128/mBio.00524-13 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Graham R. L., Sims A. C., Brockway S. M., Baric R. S., Denison M. R., The nsp2 replicase proteins of murine hepatitis virus and severe acute respiratory syndrome coronavirus are dispensable for viral replication. J. Virol. 79, 13399–13411 (2005). 10.1128/JVI.79.21.13399-13411.2005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Jungreis I., Sealfon R., Kellis M., SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. Nat. Commun. 12, 2642 (2021). 10.1038/s41467-021-22905-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Islam M. R., Hoque M. N., Rahman M. S., Alam A. S. M. R. U., Akther M., Puspo J. A., Akter S., Sultana M., Crandall K. A., Hossain M. A., Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity. Sci. Rep. 10, 14004 (2020). 10.1038/s41598-020-70812-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Cornillez-Ty C. T., Liao L., Yates J. R. 3rd, Kuhn P., Buchmeier M. J., Severe acute respiratory syndrome coronavirus nonstructural protein 2 interacts with a host protein complex involved in mitochondrial biogenesis and intracellular signaling. J. Virol. 83, 10314–10318 (2009). 10.1128/JVI.00842-09 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Gupta M., Azumaya C. M., Moritz M., Pourmal S., Diallo A., Merz G. E., Jang G., Bouhaddou M., Fossati A., Brilot A. F., Diwanji D., Hernandez E., Herrera N., Kratochvil H. T., Lam V. L., Li F., Li Y., Nguyen H. C., Nowotny C., Owens T. W., Peters J. K., Rizo A. N., Schulze-Gahmen U., Smith A. M., Young I. D., Yu Z., Asarnow D., Billesbølle C., Campbell M. G., Chen J., Chen K.-H., Chio U. S., Dickinson M. S., Doan L., Jin M., Kim K., Li J., Li Y.-L., Linossi E., Liu Y., Lo M., Lopez J., Lopez K. E., Mancino A., Moss F. R., Paul M. D., Pawar K. I., Pelin A., Pospiech T. H., Puchades C., Remesh S. G., Safari M., Schaefer K., Sun M., Tabios M. C., Thwin A. C., Titus E. W., Trenker R., Tse E., Tsui T. K. M., Wang F., Zhang K., Zhang Y., Zhao J., Zhou F., Zhou Y., Zuliani-Alvarez L., QCRG Structural Biology Consortium, D. A. Agard, Y. Cheng, J. S. Fraser, N. Jura, T. Kortemme, A. Manglik, D. R. Southworth, R. M. Stroud, D. L. Swaney, N. J. Krogan, A. Frost, O. S. Rosenberg, K. A. Verba, CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes. bioRxiv (2021), doi:10.1101/2021.05.10.443524 [DOI]
  • 47. Jin Z., Du X., Xu Y., Deng Y., Liu M., Zhao Y., Zhang B., Li X., Zhang L., Peng C., Duan Y., Yu J., Wang L., Yang K., Liu F., Jiang R., Yang X., You T., Liu X., Yang X., Bai F., Liu H., Liu X., Guddat L. W., Xu W., Xiao G., Qin C., Shi Z., Jiang H., Rao Z., Yang H., Structure of Mpro from SARS-CoV-2 and discovery of its inhibitors. Nature 582, 289–293 (2020). 10.1038/s41586-020-2223-y [DOI] [PubMed] [Google Scholar]
  • 48. Osipiuk J., Azizi S.-A., Dvorkin S., Endres M., Jedrzejczak R., Jones K. A., Kang S., Kathayat R. S., Kim Y., Lisnyak V. G., Maki S. L., Nicolaescu V., Taylor C. A., Tesar C., Zhang Y.-A., Zhou Z., Randall G., Michalska K., Snyder S. A., Dickinson B. C., Joachimiak A., Structure of papain-like protease from SARS-CoV-2 and its complexes with non-covalent inhibitors. Nat. Commun. 12, 743 (2021). 10.1038/s41467-021-21060-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Hillen H. S., Kokic G., Farnung L., Dienemann C., Tegunov D., Cramer P., Structure of replicating SARS-CoV-2 polymerase. Nature 584, 154–156 (2020). 10.1038/s41586-020-2368-8 [DOI] [PubMed] [Google Scholar]
  • 50. Yan L., Ge J., Zheng L., Zhang Y., Gao Y., Wang T., Huang Y., Yang Y., Gao S., Li M., Liu Z., Wang H., Li Y., Chen Y., Guddat L. W., Wang Q., Rao Z., Lou Z., Cryo-EM Structure of an Extended SARS-CoV-2 Replication and Transcription Complex Reveals an Intermediate State in Cap Synthesis. Cell 184, 184–193.e10 (2021). 10.1016/j.cell.2020.11.016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Chen J., Malone B., Llewellyn E., Grasso M., Shelton P. M. M., Olinares P. D. B., Maruthi K., Eng E. T., Vatandaslar H., Chait B. T., Kapoor T. M., Darst S. A., Campbell E. A., Structural Basis for Helicase-Polymerase Coupling in the SARS-CoV-2 Replication-Transcription Complex. Cell 182, 1560–1573.e13 (2020). 10.1016/j.cell.2020.07.033 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Chen Y., Cai H., Pan J., Xiang N., Tien P., Ahola T., Guo D., Functional screen reveals SARS coronavirus nonstructural protein nsp14 as a novel cap N7 methyltransferase. Proc. Natl. Acad. Sci. U.S.A. 106, 3484–3489 (2009). 10.1073/pnas.0808790106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Huang Y., Yang C., Xu X.-F., Xu W., Liu S.-W., Structural and functional properties of SARS-CoV-2 spike protein: Potential antivirus drug development for COVID-19. Acta Pharmacol. Sin. 41, 1141–1149 (2020). 10.1038/s41401-020-0485-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Cubuk J., Alston J. J., Incicco J. J., Singh S., Stuchell-Brereton M. D., Ward M. D., Zimmerman M. I., Vithani N., Griffith D., Wagoner J. A., Bowman G. R., Hall K. B., Soranno A., Holehouse A. S., The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA. Nat. Commun. 12, 1936 (2021). 10.1038/s41467-021-21953-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Chen Z., Pei D., Jiang L., Song Y., Wang J., Wang H., Zhou D., Zhai J., Du Z., Li B., Qiu M., Han Y., Guo Z., Yang R., Antigenicity analysis of different regions of the severe acute respiratory syndrome coronavirus nucleocapsid protein. Clin. Chem. 50, 988–995 (2004). 10.1373/clinchem.2004.031096 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Alaa Abdel Latif, Julia L. Mullen, Manar Alkuzweny, Ginger Tsueng, Marco Cano, Emily Haag, Jerry Zhou, Mark Zeller, Emory Hufbauer, Nate Matteson, Chunlei Wu, Kristian G. Andersen, Andrew I. Su, Karthik Gangavarapu, Laura D. Hughes, and the Center for Viral Systems Biology, Spike:D614G Mutation Report.

Associated Data

Supplementary Materials

Materials and Methods

Figures S1 to S34

Tables S1 to S5

References (25–56)

MDAR Reproducibility Checklist

Data S1 to S5

GISAID Acknowledgments table


Articles from Science (New York, N.y.) are provided here courtesy of American Association for the Advancement of Science
