Abstract
Accurate and reliable forecasting of emerging dominant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants enables policymakers and vaccine makers to get prepared for future waves of infections. The last three waves of SARS-CoV-2 infections caused by dominant variants Omicron (BA.1), BA.2, and BA.4/BA.5 were accurately foretold by our artificial intelligence (AI) models built with biophysics, genotyping of viral genomes, experimental data, algebraic topology, and deep learning. Based on newly available experimental data, we analyzed the impacts of all possible viral spike (S) protein receptor-binding domain (RBD) mutations on the SARS-CoV-2 infectivity. Our analysis sheds light on viral evolutionary mechanisms, i.e., natural selection through infectivity strengthening and antibody resistance. We forecast that BA.2.10.4, BA.2.75, BQ.1.1, and particularly, BA.2.75+R346T, have high potential to become new dominant variants to drive the next surge.
Keywords: COVID-19, SARS-CoV-2, Omicron, infectivity, subvariants, deep learning, algebraic topology
1. Introduction
In the past two years, the coronavirus disease-2019 (COVID-19) pandemic was fueled by the spread of a few dominant variants of severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), as shown in Figure 1. Specifically, the Alpha and Beta variants contributed to a peak of infections and deaths from October 2020 to January 2021. The Gamma variant caused another peak of infections and deaths in April and May 2021. The Delta variant led to the third wave of COVID-19 infections and deaths around August 2021. The Omicron (B.1.1.529), which was extraordinary in its infectivity, vaccine breakthrough, and antibody resistance, created a huge spike in the world’s daily infection record in December 2021 and January 2022. Omicron BA.2 subvariant rapidly replaced the original Omicron (i.e., BA.1) in March 2022. Around July 2022, Omicron subvariants BA.4 and BA.5 took over BA.2 and became the new dominant SARS-CoV-2 variant. These variant-driven waves of infections are also associated with spikes in deaths and have given rise to tremendous economic loss. A life-and-death question is: what will be future dominant variants?
Figure 1:
Illustration of daily COVID-19 cases (light blue) and deaths (red) since 2020 [1]. The curves are smoothed by five-day averages.
Forecasting and surveillance of emerging SARS-CoV-2 variants are some of the most challenging tasks of our time. Among about half a million SARS-CoV-2/COVID-19 related publications recorded in Google Scholar, few accurately foretold the emerging SARS-CoV-2 variants. Accurate and reliable forecasting of emerging SARS-CoV-2 variants enables policymakers and vaccine makers to plan, leading to enormous social, economic, and health benefits. To foretell future variants, one must have the full understanding of the mechanisms of viral evolution, the mechanisms of viral mutations, and the relationship between viral evolution and viral mutation.
Future variants are created through the SARS-CoV-2 evolution, in which is a SARS-CoV-2 evolves through changes in its RNA at molecular scale to gain fitness over its counterparts at the host population scale. At the molecular scale, most mutations occur randomly. Indeed, random genetic drift is a major mechanism of mutations, resulting in errors in various biological processes, such as replication, transcription, and translation. Additionally, virus-virus intra-organismic recombination can alter SASR-CoV-2 genes, which has a stochastic nature too. However, SARS-CoV-2 has a genetic proofreading mechanism facilitated by the synergistic interactions between RNA-dependent RNA polymerase and non-structure proteins 14 (NSP14) [2, 3]. At the organismic scale, inter-organismic recombination happens but the resulting variants may not be clinically significant. In contrast, host editing of virus genes is known to be a significant mechanism for SARS-CoV-2 mutations [4]. At the population scale, mutations occurring at molecular and organismic scales are regulated, i.e., enhanced and/or suppressed via natural selection, giving rise to SARS-CoV-2 variants with increased fitness [5]. Therefore, natural selection is the fundamental driven force for viral evolution.
It remains to understand what controls the natural selection of SARS-CoV-2. The mechanism of SARS-CoV-2 evolution was elusive at the beginning of the COVID-19 pandemic. Indeed, the life cycle of SARS-CoV-2 is extremely sophisticated, involving the viral entry of host cells, the release of the viral genome, the synthesis of viral NSPs, RNA replication, the transcription, translation, and synthesis of viral structural proteins, and the packing, assembly, and release of new viruses [6]. The SARS-CoV-2 mutations occur nearly randomly on all of its 29 genes, as shown in Figure 2. Nonetheless, in early 2020, we hypothesized that SARS-CoV-2 natural selection is controlled through infectivity-strengthening mutations [5], which primarily occur at the viral spike (S) protein receptor-binding domain (RBD) that binds with host angiotensin-converting enzyme 2 (ACE2) to facilitate the viral cell entry [7–11]. Our hypothesis was initially supported by our genotyping of 15,140 SARS-CoV-2 genomes extracted from patients. We demonstrated that among 89 unique RBD mutations, the observed frequencies of infectivity-strengthening mutations outpace those of infectivity-weakening ones in their time evolution. Our infectivity-strengthening mechanism of natural selection was proven beyond doubt in April 2021, with 506,768 SARS-CoV-2 genomes isolated from patients [12].
Figure 2:
Illustration of unique mutations on SARS-CoV-2 genomes extracted from patients. Each dot represents a unique mutation. The x-axis is the gene position of a mutation and the y-axis represents its observed frequency in the natural logarithmic scale.
However, we found that not all of the most observable RBD mutations strengthen viral infectivity [13]. This exception took place in the middle and late 2021 when a good portion of the population in many developed countries was vaccinated. By the genotyping of 2,298,349 complete SARS-CoV-2 genomes, we discovered vaccination-induced antibody-resistant mutations, which make the virus less infectious [13]. This discovery leads to a complementary mechanism of natural selection, namely antibody-resistant mutations. In other words, viral evolution also favors RBD mutations in a population that enable the virus to escape antibody protection generated from vaccination or infection.
The Omicron variant was the first example that was induced by both infectivity strengthening and antibody resistance mechanisms [13]. It has 32 mutations on the S protein, the main antigenic target of antibodies [14]. Among them, 15 are on the Omicron RBD, leading to a dramatic increase in SARS-CoV-2 infectivity, vaccine breakthrough, and antibody resistance [15]. The World Health Organization (WHO) declared Omicron as a variant of concern (VOC) on November 26, 2021. On December 1, 2021, when there were no experimental data available, we announced our topological artificial intelligence (AI) predictions based on the genotyping of viral genomes, biophysics, experimental data of protein-protein interactions, algebraic topology, and deep learning [15]. We predicted that Omicron is about 2.8 times as infectious as the Delta and has nearly 90% likelihood to escape vaccines, which would compromise essentially all of existing monoclonal antibody (mAb) therapies from Eli Lilly, Regeneron, AstraZeneca, etc. These predictions were subsequently confirmed by experiments [14, 16–20]. On February 10, our topological AI model foretold the taking over of Omicron BA.1 by Omicron subvariant BA.2 [21]. The WHO declared BA.2’s dominance on March 26, 2022. On May 1, 2022, our topological AI model projected the incoming dominance of BA.4 and BA.5 [22], which became reality in late June 2022. Currently, BA.5 is still the world’s dominant variant. Therefore, our topological AI model has been offering unusually accurate two-month forecasts of emerging dominant variants.
The COVID fatigue and the worldwide relaxation of COVID-19 prevention measures have given the virus enormous new opportunities to spread in world populations, which enables the virus to further evolve. Additionally, the newly generated Omicron subvariant RBD structures leave abundant room for the virus to further optimize its binding with the ACE2 and disrupt existing antibodies, resulting in a large number of emerging subvariants. It is of paramount importance to analyze their growth potentials in the world’s populations and alert future dominant variants.
This work analyzes SARS-CoV-2 evolutionary trends. We predict the SARS-CoV-2 S protein RBD mutation-induced binding free energy (BFE) changes of RBD-human ACE2 complexes at all RBD residues. Such changes are employed to forecast Omicron subvariants’ growth potentials and chances of becoming future dominant variants. Topological AI models are built from newly available deep mutational screening data and Omicron BA.1 and BA.2 three-dimensional (3D) structures. Our studies are assisted with the genotyping of over three million SARS-CoV-2 genomes extracted from patients and the evolutionary pattern of viral lineages among infections in the United States.
2. Results
We carry out single nucleotide polymorphism (SNP) calling for 3,616,783 million complete genomes extracted from patients. All unique mutations and their observed frequencies are illustrated in Fig. 2. Our interactive website, Mutation Tracker, also provides detailed records of mutations for download. On average, each nucleic acid site has one mutation. Overall, mutations occur essentially randomly at all 29,903 bases. Therefore, simple SNP calling and genotyping does not offer any direct evidence for SARS-CoV-2 variants as discussed earlier. More specific analysis of the RBD mutations is used for the forecasting of future dominant variants.
We collect emerging Omicron sublineages and compare them with previous VOCs. We use the notation “*” to represent the lineage and its sublineage. For instance, BA.2* represents BA.2 and all its sublineages in Figure 3. Figure 3a and b show the 3D structures of RBD binding to human ACE2. Figure 3a includes the RBD mutations of previous VOCs and Omicron subvariants BA.1, BA.2, BA.3, BA.4, and BA.5, while Figure 3b shows mutations of the subvariants of Omicron BA.2 and BA.5. Lineages originated from BA.2 are marked in red type of colors. Subvariants originated from BA.5 are labeled in green type of colors. In Figure 3c, the BFE changes of previous VOCs, BA.1 and BA.2 are calculated as the accumulation of single mutations according to the original structure (PDB: 6M0J [24]). The BFE changes of BA.1.1 is calculated based on the BA.1 RBD-ACE2 structure (PDB: 7T9L [25]). For the sublineages of BA.2, as well as BA.3, BA.4* and BA.5* (including BF.7, BQ.1*, and BE.1.2), their BFE changes are calculated based on the BA.2 structure (PDB: 7XB0 [23]).
Figure 3:
a. and b. the 3D structure of BA.2 (PDB: 7XB0 [23]) with two sets of mutations (colors are consistent with those in c and integrated colors indicate that mutation appears on multiple variants). a. the mutations of precious VOCs (cyan) and BA.1 (blue), BA.2 (pink), BA.3 (orange), BA.4, and BA.5 (green). b. the mutations of the Omicron subvariants with BA.2 sublineages (pink) and BA.5 sublineages (green). c. A comparison of predicted mutation-induced BFE changes for previous VOCs and Omicron subvariants. Previous VOCs (cyan): Alpha, Beta, Gamma, Delta, Theta, Kappa, Lambda, Mu; BA.1 and BA.1.1 (blue); BA.2 and sublineages (pink): BA.2, BA.2.10.4, BA.2.11, BA.2.12.1, BA.2.75, BA.2.75+R346T, BA.2.75+F486S, BA.2.75.2, BJ.1 (BA.2.10.1.1); BA.3 (light orange); BA.4 and BA.4.6 (orange) ; BA.5 and sublineages (green): BA.5, BA.5+R346T/I/S, BA.5+K444T/N/R/M, BF.7 (BA.5.2.1.7), BQ.1 (BA.5.3.1.1.1.1.1), BQ.1.1 (BA.5.3.1.1.1.1.1.1), BE.1.2 (BA.5.3.1.1.2).
In Figure 3c, the variants prior to the Omicron are presented in light blue. There are two main clades, one from BA.2 and the other from BA.5. Others, including previous VOCs, BA.1, BA.3, and BA.4, as well as their sublineages, are also presented. Firstly, three mutations from BA.2 to BA.5 are L452R, F486V, and R493Q, which make BA.5 two-fold as infectious as BA.2. This explains why BA.5 replaced BA.2 as a new dominant variant in late June 2022. Among the BA.2 sublineages, BA.2.10.4, BA.2.75, and BA.2.75+R346T were predicted to have BFE changes greater than 4.0 kcal/mol. These three sublineages together with BA.2.10.4 and BA.2.75.2 have higher BFE changes and are more infectious than BA.4 and BA.5. As for BA.4 and BA.5 sublineages, BA.4.6 is more infectious than BA.4 and BA.5 and has potential to become a dominant variant. Among the sublineages of BA.5, BQ.1.1 has the highest potential to replace the spreading of BA.5 as its BFE change is greater than 4.0 kcal/mol, while BA.5+R346T, BF.7, BQ.1, and BE.1.2 have larger BFE changes than that of BA.5. Based on this analysis, we forecast that BA.2.10.4, BA.2.75, BA.2.75+R346T, and BQ.1.1 have high potentials to become new dominant variants.
Figure 4 shows the heatmap of mutation-induced BFE change predictions of BA.1 (top panel) and BA.2 (bottom panel). We plot those residues that have at least one mutation-induced BFE change greater than0.1 kcal/mol, which gives rise to 89 residues in the plots. In other words, we keep mutations that will lead to more infectious variants. The deep blue color indicates infectivity-strengthening mutations. Deep red color shows infectivity-weakening mutations. It is seen from Figure 4 that most mutations will weaken the binding between RBD and ACE2 for BA.1 and BA.2. However, such mutations, once occurred, will have little chance of becoming clinically significant due the natural selection. Figure 4 indicates that both BA.1 and BA.2 are highly infectivity-optimized variants. They just leave a few residues to be further optimized. Obviously, for both B.1 and BA.2, many mutations on residue sites R439, Y453, and N417 will most likely lead to more infectious new variants. For BA.2, surveillance is also required for residue sites N504 and R403.
Figure 4:
Heatmap of mutation-induced BFE change predictions of BA.1 (top panel) and BA.2 (bottom panel). Residues that have at least one mutation-induced BFE change greater than 0.1 kcal/mol are selected. The sites of BA.2’s six distinct mutations are marked red and framed in the heatmap.
Compared with BA.2, BA.5 has three additional mutations, i.e., L452R, F486V, and R493Q. Among them, R493Q makes BA.5 significantly more infectious as shown in Figure 4. This reverse mutation (original residue is glutamine) occurs in many other lineages showing in Figure 3c, namely, BA.2.10.4, BA.2.75*, BA.4*, BA.5, BF.7, BQ.1* and BE.1.2. In addition to R493Q, BA.2.75* and BQ.1.1 in Figure 3 share the mutation N460K with the BFE change 0.267 kcal/mol. This indicates that more infectious variants will emerge with multiple infectivity-strengthening mutations. Overall, comparing the two heatmaps in Figure 4, it is easy to note that BA.2 has more positive BFE changes, which makes future BA.2 sublineages more competitive than future BA.1 sublineages in terms of infectivity.
The top panel of Figure 4 explains why BA.2 is more infectious than BA.1. BA.2 shares 12 of its RBD mutations with BA.1, except for six mutations, i.e., L371F, T376A, D405N, R408S, S446G, and S496G. These residue sites are marked with red in both panels of Figure 4. Among these mutations, L371F, T376A, D405N, and R408S induced minor BFE changes as shown in the top panel of Figure 4. However, S446G and S496G render BA.2 significantly more infectious than BA.1.
3. Discussion
Figure 5 presents the evolution pattern of weekly viral lineage distribution among infections in the United States from 06/26/2022 to 10/08/2022. Each lineage is illustrated by aggregating its sublineages to except for its sublineage is also listed. Note that BA.2.75 sublineages are aggregated to BA.2.75, which means lineages BA.2.75+R346T and BA.2.75+F486S in Figure 3 belong to this category. It is interesting to note that there is high consistence between Figures 3 and 5. Specifically, all the emerging variants listed in Figure 5 have relatively high BFE changes as depicted in Figure 3.
Figure 5:
Weekly viral lineages among infections in the United States from 06/26/2022 to 10/08/2022. AY.1-AY.133, Delta (B.1.617.2), BA.1 and sublineages of BA.1 variant are aggregated to category “Others”. BA.2 sublineages except BA.2.12.1, BA.2.75 and BA.2.75.2, are aggregated with BA.2. BA.4 sublineages are aggregated to BA.4 except BA.4.6. Sublineages of BA.5 are aggregated to BA.5 except BF.7, BQ.1 and BQ.1.1. The spike substitution R346T is included in lineages BA.4.6, BF.7, and BA.2.75 Data from CDC website [26].
It is also interesting to note from Figure 5 that the relative populations of BA.2.12.1, BA.4, and BA.5 are shrinking during this period. BA.5 slightly expanded at the beginning and took a portion of BA.2.12.1’s population. BA.4.6 is a sublineage of BA.4, while BF.7, BQ.1, and BQ.1.1 are the sublineages of BA.5. Their relative populations are increasing. BQ.1.1 has a faster growth rate than BQ.1 and BF.7, which indicates that the predicted BFE change of BQ.1.1 is the highest among the sublineages of BA.5. As shown in Figure 3, BA.2.75, and BQ.1.1 have higher potentials to become future dominant variants.
In our earlier predictions of Omicron [15] and BA.2 [21], we utilized nearly 200 antibody-RBD complexes to analyze the impact of antibody resistance. Such analysis is necessary because the Omicron variant involves a dramatic increase in the number of RBD mutations. For the most variants studied in the present work, there are only gradual changes in the number of new RBD mutations and thus the impact of antibody resistance on natural selection may be relatively small.
While the BFE change-based prediction favors the variant with the highest BFE change, its dominance in the population is also determined by the viral transmission environment (i.e., vaccination, prevention measures, human interaction intensity, etc.) and temporal dynamics. Therefore, a variant with slightly lower BFE change might become a dominant variant over a short period, which is called kinetic reaction control in thermodynamics. In an idealized viral transmission environment, the variant with the highest BFE change would have an exponential advantage over other variants, according to the Boltzmann distribution, which is called thermodynamic reaction control.
4. Methods
4.1. Deep learning model
The model applied in this work is an updated version of the recently proposed machine learning model, TopLapNet, by integrating the SKEMPI 2.0 dataset [27] and deep mutational scanning datasets [28–31]. Briefly speaking, the TopLapNet model is a deep neural network model and implements biophysics and biochemistry descriptors, as well as mathematical descriptors based on algebraic topology [32–34] to predict the binding free energy (BFE) changes of protein-protein interactions (PPIs) induced by single mutations. A deep neural network maps sample features to an output layer where hidden layers in the network contain numerous neuron units and weights updated by backpropagation methods. The single neuron gets fully connected with the neurons in the following layers. For the model cross-validations, the Pearson correlation of 10-fold cross-validation is 0.864, and the root mean square error is 1.019 kcal/mol. As for predictions, the TopLapNet model is used to calculate all possible mutation impacts on RBD binding to ACE2 for the original virus (PDB: 6M0J [24]), BA.1 (PDB: 7T9L [25]), and BA.2 structures (PDB: 7XB0 [23]). Thus, previous VOCs’ infectivities as well as that of BA.1 and BA.2 are calculated based on the original structure. The infectivity of BA.1.1 is calculated by accumulating BFE changes based on the BA.1 structure. The infectivities of all other sublineages presented in Figure 3 are calculated by the accumulations of BFE changes based on the BA.2 structure.
4.2. Feature generation
Feature generation methods decipher protein structures to extract their biophysics, biochemistry, and mathematical information. These methods use physical, chemical, and mathematical modeling of protein structures to provide suitable features for machine-learning algorithms. There are two types of features, i.e., residue-level ones and atom-level ones. Residue-level features are generated from secondary structures, which are provided by a position-specific scoring matrix (PSSM) in the form of conservation scores of each amino acid [35]. Atom-level features consider seven groups of atom types, including C, N, O, S, H, all heavy atoms, and all atoms. Surface areas, partial changes, atomic pairwise interactions, and electrostatics are assembled in an element-specific manner in terms of these seven groups. Moreover, the most important features from modelings are topological features and graph features generated by using persistent homology [32, 33] and persistent Laplacian [34].
Persistent homology describes proteins by analogy to point cloud data. Atoms are regarded as vertices to build a simplicial complex, which is a collection of infinitely many simplicies such as nodes, edges, triangles, and tetrahedrons. The simplicies among atoms are defined by whether there is an overlap under a given influence domain or radius r. Filtration of this topological space is defined by varying the radius as a sequence of snapshots of each simplicial complex to extract more geometric and topological properties. Then, the Betti numbers on each snapshot are computed as descriptors of the number of connected components, cycles, and cavities in a protein structure. Persistent Laplacian (also known as persistent spectral graph [34]) on the other hand unveil the homotopic shape evolution of a protein structure in filtration that the persistent homology cannot provide. Persistent Laplacian applies the same scheme as persistent homology to construct simplicial complexes during filtration. However, persistent Laplacian calculates all eigenvalues of the combinatorial Laplacian with boundary operators on simplices. Our mathematical features consist of both topological invariants from persistent homology and spectral invariants from persistent Laplacian.
4.3. SNP calling and Mutation Tracker
For genotyping, SARS-CoV-2 complete genome sequences with high coverage and exact collection date were downloaded from the GISAID database [36] (https://www.gisaid.org/) as of September 30, 2022. Such sequences were aligned to the reference genome downloaded from GenBank (NC_045512.2) [37]. Next, we applied single nucleotide polymorphism (SNP) calling [38, 39] to measure the genetic variations between SARS-CoV-2 sequences through Cluster Omega with default parameters. The SNP calling can track differences between various SARS-CoV-2 sequences and the reference genome. By applying it, we decoded 29,290 unique single mutations from more than 3.6 million complete SARS-CoV-2 genomes. The detailed mutation information can be viewed at Mutation Tracker. Lastly, the Omicron sublineages analyzed in Figure 3 are selected from the SNP analysis and other web-servers [26,40,41].
Acknowledgment
This work was supported in part by NIH grants R01GM126189 and R01AI164266, NSF grants DMS-2052983, DMS-1761320, and IIS-1900473, NASA grant 80NSSC21M0023, MSU Foundation, Bristol-Myers Squibb 65109, and Pfizer.
Footnotes
Supporting information
The Supporting information (Supporting_Material.zip) includes the predicted BFE changes for all possible RBD mutations based on BA.1 and BA.2 structures, PDB files (6M0J, 7T9L, 7XB0) and a README file for the links of external datasets.
Data and model availability
The world’s SARS-CoV-2 SNP data are available at Mutation Tracker (https://users.math.msu.edu/users/weig/SARS-CoV-2_Mutation_Tracker.html). The TopNetTree model is available at https://github.com/WeilabMSU/TopNetmAb.
References
- [1].Daily new cases. https://www.worldometers.info/coronavirus/. Accessed: 2022-10-16.
- [2].Sevajol Marion, Subissi Lorenzo, Decroly Etienne, Canard Bruno, and Imbert Isabelle. Insights into RNA synthesis, capping, and proofreading mechanisms of SARS-coronavirus. Virus Research, 194:90–99, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Ferron François, Subissi Lorenzo, Silveira De Morais Ana Theresa, Le Nhung Thi Tuyet, Sevajol Marion, Gluais Laure, Decroly Etienne, Vonrhein Clemens, Bricogne Gérard, Canard Bruno, et al. Structural and molecular basis of mismatch correction and ribavirin excision from coronavirus RNA. Proceedings of the National Academy of Sciences, 115(2):E162–E171, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Wang Rui, Hozumi Yuta, Zheng Yong-Hui, Yin Changchuan, and Wei Guo-Wei. Host immune response driving SARS-CoV-2 evolution. Viruses, 12(10):1095, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Chen Jiahui, Wang Rui, Wang Menglun, and Wei Guo-Wei. Mutations strengthened SARS-CoV-2 infectivity. J. Mol. Biol., 432(19):5212–5226, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Trougakos Ioannis P, Stamatelopoulos Kimon, Terpos Evangelos, Tsitsilonis Ourania E, Aivalioti Evmorfia, Paraskevis Dimitrios, Kastritis Efstathios, Pavlakis George N, and Dimopoulos Meletios A. Insights to sars-cov-2 life cycle, pathophysiology, and rationalized treatments that target covid-19 clinical complications. Journal of Biomedical Science, 28(1):1–18, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Li Wendong, Shi Zhengli, Yu Meng, Ren Wuze, Smith Craig, Epstein Jonathan H, Wang Hanzhong, Crameri Gary, Hu Zhihong, Zhang Huajun, et al. Bats are natural reservoirs of SARS-like coronaviruses. Science, 310(5748):676–679, 2005. [DOI] [PubMed] [Google Scholar]
- [8].Qu Xiu-Xia, Hao Pei, Song Xi-Jun, Jiang Si-Ming, Liu Yan-Xia, Wang Pei-Gang, Rao Xi, Song Huai-Dong, Wang Sheng-Yue, Zuo Yu, et al. Identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy. Journal of Biological Chemistry, 280(33):29588–29595, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [9].Song Huai-Dong, Tu Chang-Chun, Zhang Guo-Wei, Wang Sheng-Yue, Zheng Kui, Lei Lian-Cheng, Chen Qiu-Xia, Gao Yu-Wei, Zhou Hui-Qiong, Xiang Hua, et al. Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human. Proceedings of the National Academy of Sciences, 102(7):2430–2435, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Hoffmann Markus, Kleine-Weber Hannah, Schroeder Simon, Krüger Nadine, Herrler Tanja, Erichsen Sandra, Schiergens Tobias S, Herrler Georg, Wu Nai-Huei, Nitsche Andreas, et al. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell, 181(2):271–280, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Walls Alexandra C, Park Young-Jun, Tortorici M Alejandra, Wall Abigail, McGuire Andrew T, and Veesler David. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell, 181(2):281–292, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Wang Rui, Chen Jiahui, Gao Kaifu, and Wei Guo-Wei. Vaccine-escape and fast-growing mutations in the United Kingdom, the United States, Singapore, Spain, India, and other COVID-19-devastated countries. Genomics, 113(4):2158–2170, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [13].Wang Rui, Chen Jiahui, and Wei Guo-Wei. Mechanisms of sars-cov-2 evolution revealing vaccine-resistant mutations in europe and america. The Journal of Physical Chemistry Letters, 12:11850–11857, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Cele Sandile, Jackson Laurelle, Khoury David S, Khan Khadija, Moyo-Gwete Thandeka, Tegally Houriiyah, San James Emmanuel, Cromer Deborah, Scheepers Cathrine, Amoako Daniel G, et al. Omicron extensively but incompletely escapes pfizer bnt162b2 neutralization. Nature, pages 1–5, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Chen Jiahui, Wang Rui, Gilby Nancy Benovich, and Wei Guo-Wei. Omicron variant (b. 1.1. 529): Infectivity, vaccine breakthrough, and antibody resistance. J Chem Inf Model, 62(2):412–422, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [16].Shuai Huiping, Chan Jasper Fuk-Woo, Hu Bingjie, Chai Yue, Yuen Terrence Tsz-Tai, Yin Feifei, Huang Xiner, Yoon Chaemin, Hu Jing-Chu, Liu Huan, et al. Attenuated replication and pathogenicity of sars-cov-2 b. 1.1. 529 omicron. Nature, 603:693–699, 2022. [DOI] [PubMed] [Google Scholar]
- [17].Zhang Li, Li Qianqian, Liang Ziteng, Li Tao, Liu Shuo, Cui Qianqian, Nie Jianhui, Wu Qian, Qu Xiaowang, Huang Weijin, et al. The significant immune escape of pseudotyped sars-cov-2 variant omicron. Emerg. microbes & infect., 11(1):1–5, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [18].Liu Lihong, Iketani Sho, Guo Yicheng, Chan Jasper FW, Wang Maple, Liu Liyuan, Luo Yang, Chu Hin, Huang Yiming, Nair Manoj S, et al. Striking antibody evasion manifested by the omicron variant of sars-cov-2. Nature, pages 1–8, 2021. [DOI] [PubMed] [Google Scholar]
- [19].Lu Lu, Mok Bobo Wing-Yee, Chen Linlei, Chan Jacky Man-Chun, Tsang Owen Tak-Yin, Lam Bosco Hoi-Shiu, Chuang Vivien Wai-Man, Chu Allen Wing-Ho, Chan Wan-Mui, Ip Jonathan Daniel, et al. Neutralization of sars-cov-2 omicron variant by sera from bnt162b2 or coronavac vaccine recipients. Clin Infect Dis, doi: 10.1093/cid/ciab1041, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [20].Hoffmann Markus, Krüger Nadine, Schulz Sebastian, Cossmann Anne, Rocha Cheila, Kempf Amy, Nehlmeier Inga, Graichen Luise, Moldenhauer Anna-Sophie, Winkler Martin S, et al. The omicron variant is highly resistant against antibody-mediated neutralization–implications for control of the covid-19 pandemic. Cell, 185:447–456, 2021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [21].Chen Jiahui and Wei Guo-Wei. Omicron ba.2 (b.1.1.529.2): high potential to becoming the next dominating variant. arXiv:2202.05031, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Chen Jiahui, Qiu Yuchi, Wang Rui, and Wei Guo-Wei. Persistent laplacian projected omicron ba. 4 and ba. 5 to become new dominating variants. arXiv preprint arXiv:2205.00532, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Li Linjie, Liao Hanyi, Meng Yumin, Li Weiwei, Han Pengcheng, Liu Kefang, Wang Qing, Li Dedong, Zhang Yanfang, Wang Liang, et al. Structural basis of human ace2 higher binding affinity to currently circulating omicron sars-cov-2 sub-variants ba. 2 and ba. 1.1. Cell, 185(16):2952–2960, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Lan Jun, Ge Jiwan, Yu Jinfang, Shan Sisi, Zhou Huan, Fan Shilong, Zhang Qi, Shi Xuanling, Wang Qisheng, Zhang Linqi, et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature, pages 1–6, 2020. [DOI] [PubMed] [Google Scholar]
- [25].Mannar Dhiraj, Saville James W, Zhu Xing, Srivastava Shanti S, Berezuk Alison M, Tuttle Katharine S, Marquez Ana Citlali, Sekirov Inna, and Subramaniam Sriram. Sars-cov-2 omicron variant: Antibody evasion and cryo-em structure of spike protein–ace2 complex. Science, 375(6582):760–764, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Centers for Disease Control and Prevention. Variant proportions. https://covid.cdc.gov/covid-data-tracker/#variant-proportions, 2021.
- [27].Jankauskaite Justina, Jiménez-García Brian, Dapkūnas Justas, Fernández-Recio Juan, and Moal Iain H. SKEMPI 2.0: an updated benchmark of changes in protein–protein binding energy, kinetics and thermodynamics upon mutation. Bioinformatics, 35(3):462–469, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [28].Chan Kui K, Dorosky Danielle, Sharma Preeti, Abbasi Shawn A, Dye John M, Kranz David M, Herbert Andrew S, and Procko Erik. Engineering human ACE2 to optimize binding to the spike protein of SARS coronavirus 2. Science, 369(6508):1261–1265, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Starr Tyler N, Greaney Allison J, Hilton Sarah K, Ellis Daniel, Crawford Katharine HD, Dingens Adam S, Navarro Mary Jane, Bowen John E, Tortorici M Alejandra, Walls Alexandra C, et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell, 182(5):1295–1310, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Linsky Thomas W, Vergara Renan, Codina Nuria, Nelson Jorgen W, Walker Matthew J, Su Wen, Barnes Christopher O, Hsiang Tien-Ying, Esser-Nobis Katharina, Yu Kevin, et al. De novo design of potent and resilient hACE2 decoys to neutralize SARS-CoV-2. Science, 370(6521):1208–1214, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Starr Tyler N, Greaney Allison J, Stewart Cameron M, Walls Alexandra C, Hannon William W, Veesler David, and Bloom Jesse D. Deep mutational scans for ace2 binding, rbd expression, and antibody escape in the sars-cov-2 omicron ba. 1 and ba. 2 receptor-binding domains. bioRxiv, 2022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Zomorodian Afra and Carlsson Gunnar. Computing persistent homology. Discrete Comput Geom, 33(2):249–274, 2005. [Google Scholar]
- [33].Edelsbrunner Herbert, Harer John, et al. Persistent homology-a survey. Contemp. Math., 453:257–282, 2008. [Google Scholar]
- [34].Wang Rui, Nguyen Duc Duy, and Wei Guo-Wei. Persistent spectral graph. International journal for numerical methods in biomedical engineering, 36(9):e3376, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Altschul Stephen F, Madden Thomas L, Schäffer Alejandro A, Zhang Jinghui, Zhang Zheng, Miller Webb, and Lipman David J. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research, 25(17):3389–3402, 1997. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Shu Yuelong and McCauley John. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance, 22(13):30494, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Wu Fan, Zhao Su, Yu Bin, Chen Yan-Mei, Wang Wen, Song Zhi-Gang, Hu Yi, Tao Zhao-Wu, Tian Jun-Hua, Pei Yuan-Yuan, et al. A new coronavirus associated with human respiratory disease in China. Nature, 579(7798):265–269, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [38].Yin Changchuan. Genotyping coronavirus SARS-CoV-2: methods and implications. Genomics, 112(5):3588–3596, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [39].Kim Sobin and Misra Ashish. Snp genotyping: technologies and biomedical applications. Annu. Rev. Biomed. Eng., 9:289–320, 2007. [DOI] [PubMed] [Google Scholar]
- [40].NIH National Library of Medicine. SARS-CoV-2 Variants Overview. https://www.ncbi.nlm.nih.gov/activ. Accessed: 2022-10-16.
- [41].covSPECTRUM. Detect and analyze variants of SARS-CoV-2. https://cov-spectrum.org/explore/World/AllSamples/Past6M/. Accessed: 2022-10-16.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
The world’s SARS-CoV-2 SNP data are available at Mutation Tracker (https://users.math.msu.edu/users/weig/SARS-CoV-2_Mutation_Tracker.html). The TopNetTree model is available at https://github.com/WeilabMSU/TopNetmAb.





