Abstract
Increasing the expression level of the SARS-CoV-2 spike (S) protein has been critical for COVID-19 vaccine development. While previous efforts largely focused on engineering the receptor-binding domain (RBD) and the S2 subunit, the amino-terminal domain (NTD) has been long overlooked because of the limited understanding of its biophysical constraints. In this study, the effects of thousands of NTD single mutations on S protein expression were quantified by deep mutational scanning. Our results revealed that in terms of S protein expression, the mutational tolerability of NTD residues was inversely correlated with their proximity to the RBD and S2. We also identified NTD mutations at the interdomain interface that increased S protein expression without altering its antigenicity. Overall, this study not only advances the understanding of the biophysical constraints of the NTD but also provides invaluable insights into S-based immunogen design.
A survey on thousands of SARS-CoV-2 S protein NTD mutations reveals a complementary strategy for immunogen optimization.
INTRODUCTION
The emergence of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has led to the coronavirus disease 2019 (COVID-19) pandemic (1, 2). As the major antigen of SARS-CoV-2, spike (S) glycoprotein plays a critical role in facilitating virus entry (3, 4). Therefore, antibodies to SARS-CoV-2 S are often neutralizing (5, 6). SARS-CoV-2 S protein consists of an N-terminal S1 subunit, which is responsible for engaging the host receptor angiotensin-converting enzyme 2 (ACE2) via the receptor-binding domain (RBD), as well as a C-terminal S2 subunit, which mediates virus-host membrane fusion (4, 7, 8). The S1 subunit also contains an N-terminal domain (NTD) in addition to the RBD (4, 7). While the RBD is generally considered to be immunodominant, the NTD is also a target of neutralizing antibodies (9–11). Structural studies revealed the presence of an antigenic supersite on the NTD that is frequently mutated in SARS-CoV-2 variants of concern (VOCs) (12–17). Amino acid mutations and indels rapidly accumulate within the NTD during the evolution of SARS-CoV-2 in humans, at least partly due to immune selection pressure (18). On the other hand, antibodies to NTD epitopes that are conserved across VOCs have also been identified (16, 19). Despite the importance of the NTD in the immune response against SARS-CoV-2, the biophysical constraints of the NTD remain largely elusive.
COVID-19 vaccines, including both recombinant protein–based and mRNA-based vaccines, are proven to be highly protective against SARS-CoV-2 infection (20–23). There is an inverse relationship between the production yield and cost of recombinant protein–based COVID-19 vaccines, such as that from Novavax, which showed promising results in phase 3 clinical trials (22), as well as others that are in earlier phases of clinical trials (24). High protein expression level is also believed to be critical for the effectiveness of mRNA vaccines (25). As a result, identifying mutations that increase S protein expression is crucial for optimizing COVID-19 vaccines. While most studies focused on mutating the S2 subunit and the RBD to increase S protein expression (7, 26–29), little effort has been spent on the NTD because of the lack of understanding of its biophysical properties.
Phenotypes of numerous mutations can be measured in a massively parallel manner using deep mutational scanning, which combines saturation mutagenesis and next-generation sequencing (30). Previous studies have applied deep mutational scanning to evaluate the effects of RBD mutations on protein expression, ACE2-binding affinity, and antibody escape (31–36). Although deep mutational scanning of the RBD provided important insights into immunogen design and SARS-CoV-2 evolution (29, 31, 32, 35, 36), similar studies on other regions of the S protein have not yet been carried out.
Here, we used deep mutational scanning to quantify the effects of thousands of NTD single mutations on S protein expression. One notable observation was that NTD residues, unlike RBD residues, showed a weak correlation between mutational tolerability and relative solvent accessibility (RSA). Instead, the mutational tolerability of NTD residues correlated with their distance to RBD and S2. Residues S50 and G232 were two exceptions: They were proximal to S2 and RBD, respectively, and yet had a high mutational tolerability. Subsequently, we functionally characterized two mutations that increased S protein expression, namely, S50Q and G232E. These results have important implications for understanding NTD evolution and S-based immunogen design.
RESULTS
Most NTD mutations have minimal impact on S protein expression
To study how SARS-CoV-2 S protein expression is influenced by NTD mutations, we created a mutant library that contained all possible single amino acid mutations across residues 14 to 301 of the S protein. Each of these 288 residues was mutated to all 19 other amino acids and the stop codon, leading to a mutant library with 5760 single amino acid mutations. We used a cassette-based polymerase chain reaction (PCR) strategy that was designed to avoid potential off-target errors and generation of double mutants during the library construction process (fig. S1). As a quality control, Sanger sequencing was performed on 10 selected colonies following the construction of the mutant library. All of them contained a single mutation as expected. The mutant library was expressed using the human embryonic kidney (HEK) 293T landing pad cell system such that each transfected cell stably expressed only one mutant (37, 38). Fluorescence-activated cell sorting (FACS) was then performed using the human anti-S2 antibody CC40.8 (39), with phycoerythrin (PE) anti-human immunoglobulin G (IgG) Fc as the secondary antibody. Four separate gates were set up on the basis of the PE signals, each covering 25% of the entire population (fig. S2). The frequency of each mutant among the entire population was calculated (see Materials and Methods), and a cutoff of 0.0075% was applied to filter out mutants with potentially noisy measurements. Among the 5760 missense and nonsense mutations, 3999 (69%) satisfied the frequency cutoff for downstream analysis. Notably, the design of our mutant library adopted an internal barcoding strategy that uses synonymous mutations to facilitate sequencing error correction (40). As described previously (41), the expression score of each mutation was calculated on the basis of its frequency in each of the four gates and normalized such that the average expression score of silent mutations was 1 and that of nonsense mutations was 0.
Blocks of missing sites could be observed periodically across the NTD, potentially due to the locations of NNK codons in our cassette PCR design (fig. S3).
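The library dimensions quoted above follow from simple arithmetic, which can be checked with a short sketch. The variable names are illustrative and not taken from the study's code; the cassette layout (36 cassettes of eight primers each) is the one described in Materials and Methods.

```python
# Residue range covered by the NTD mutant library (inclusive).
first_residue, last_residue = 14, 301
n_residues = last_residue - first_residue + 1

# Each residue is mutated to the 19 other amino acids plus a stop codon.
variants_per_residue = 19 + 1
library_size = n_residues * variants_per_residue

# 36 cassettes of eight forward primers each give one primer per residue.
n_cassettes, primers_per_cassette = 36, 8

# Mutants that passed the frequency cutoff, as reported in the text.
n_passed = 3999
percent_passed = 100 * n_passed / library_size

print(n_residues, library_size, round(percent_passed))
```

The printed values correspond to the 288 residues, 5760 mutations, and 69% coverage reported in the text.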
To evaluate the quality of the deep mutational scanning results, we assessed the expression score distributions of missense, nonsense, and silent mutations (fig. S4A). The difference between the expression scores of silent mutations and nonsense mutations was apparent and significant (P = 6 × 10⁻¹⁶⁶), which validated the selectivity of the deep mutational scanning experiment. Silent mutations and missense mutations had similar expression scores, although the difference was statistically significant (P = 2 × 10⁻⁵), indicating that most amino acid mutations in the NTD did not affect S protein expression. While dead cells could potentially increase the noise of the deep mutational scanning results, this effect seemed to be relatively minor in our experiment (fig. S5). In addition, a Pearson correlation of 0.53 was obtained between the expression scores of each mutant from two independent biological replicates (fig. S4B), demonstrating the reproducibility of the deep mutational scanning experiment.
To summarize the expression scores for individual mutations, a heatmap was generated (Fig. 1). We noticed that high-expressing mutations were enriched within the five NTD loop regions (fig. S7A) (12). High-expressing mutations were also found in residues outside of the loop regions, such as residues S50 and G232. This observation shows that some NTD mutations can improve the expression of S protein.
Mutational tolerability has minimal correlation with solvent accessibility
While some residues were enriched in high-expression mutations (see above), others were enriched in low-expression mutations (e.g., residues D40, L84, and N234; Fig. 1). Consequently, we aimed to identify the biophysical determinants of mutational tolerability in terms of S protein expression. For each residue, we defined the mutational tolerability as the mean expression score of mutations. A higher mutational tolerability would indicate the enrichment of high-expressing mutations at the specified residue. In contrast, a lower mutational tolerability would indicate the enrichment of low-expressing mutations at the specified residue. A total of 243 NTD residues had six or more mutations with expression score available and were included in this analysis. Notably, a Pearson correlation of 0.68 was obtained between the mutational tolerability values of each position from the two independent biological replicates (fig. S4C).
First, we investigated whether a correlation existed between the mutational tolerability and RSA. Because buried residues are typically important for protein folding stability, residues with a lower RSA are generally expected to have a lower mutational tolerability. For example, previous deep mutational scanning studies on the RBD have shown a decent correlation between RSA and mutational tolerability (Spearman correlation = 0.73; Fig. 2A) (34, 42). In contrast, the mutational tolerability of NTD residues had a much weaker correlation with RSA (Spearman correlation = 0.19; Fig. 2B). These observations indicate that the folding stability of NTD does not have a strong influence on its mutational tolerability and, hence, the S expression level. Alternatively, it is possible that some mutations can destabilize NTD, but NTD instability does not affect the S expression level.
To investigate whether the mutational tolerability correlated with sequence conservation, we then analyzed the NTD sequences of 27 sarbecovirus strains, including SARS-CoV-2. Less conserved residues tended to have a higher mutational tolerability, while more conserved residues tended to have a lower mutational tolerability, although the correlation was not strong (Spearman correlation = −0.30; Fig. 2C). In comparison, the correlation between sequence conservation and RSA was even weaker (Spearman correlation = −0.16; Fig. 2D).
Mutational tolerability correlates with distance to RBD/S2
We further calculated the distance from each NTD residue to RBD/S2 of the S protein. A positive correlation was observed between the mutational tolerability and the distance to RBD/S2 (Spearman correlation = 0.55; Fig. 2E). In other words, the more distant an NTD residue was from the RBD/S2, the higher the mutational tolerability was. This correlation was apparent when the mutational tolerability of each NTD residue was projected on the S protein structure (Fig. 2G). Consistently, the epitopes of two cross-neutralizing antibodies, namely, C1717 and C1791, were significantly closer to RBD/S2 (P ≤ 1 × 10⁻⁴) and had lower mutational tolerability (P ≤ 0.03) when compared to the rapidly evolving NTD antigenic supersite (Fig. 2F and fig. S7B) (14, 16, 43).
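The rank correlations reported in this and the preceding section can be illustrated with a minimal Spearman implementation: rank both variables (averaging ranks across ties) and take the Pearson correlation of the ranks. This is a sketch, not the study's analysis code, and the distance and tolerability values below are invented for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: distance to RBD/S2 (angstroms) and mutational tolerability.
distance = [6.0, 9.5, 14.0, 21.0, 30.0]
tolerability = [0.55, 0.70, 0.65, 0.90, 1.05]
rho = spearman(distance, tolerability)
```

Because only ranks enter the calculation, the statistic captures monotonic trends without assuming that tolerability scales linearly with distance.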
Naturally circulating NTD indel sites have significantly higher mutational tolerability
We then examined the naturally occurring NTD mutations and indels observed among 17 SARS-CoV-2 major variants using our deep mutational scanning data. Among these 17 variants, there are 25 different amino acid mutations and 25 indel sites relative to the ancestral strain. Among the 25 amino acid mutations and 25 indel sites, 20 and 23, respectively, have available expression scores and site-wise mutational tolerability in our dataset. The expression score distribution of the 20 natural amino acid mutations was similar to the rest of the missense mutations (P = 0.15; Fig. 3A). In contrast, the 23 natural indel sites had significantly higher mutational tolerability than the other NTD residues (P = 2 × 10⁻⁷; Fig. 3B). This observation is consistent with the enrichment of natural indels in the five NTD loops, which have high mutational tolerability (Fig. 3, C and D).
Two buried NTD mutations increase S protein expression
While NTD residues adjacent to RBD/S2 typically had a low mutational tolerability, S50 and G232 were two exceptions (Fig. 2G). For example, mutations S50Q and G232E had a high expression score in our deep mutational scanning results. To validate this finding, we used the same landing pad system to construct HEK293T cell lines that stably expressed S50Q, G232E, and S50Q/G232E double mutant. As quantified by flow cytometry analysis (Fig. 4A and fig. S8), the mean fluorescence intensity (MFI) of S50Q and G232E increased from wild type (WT) by 1.7-fold (P = 0.002) and 1.5-fold (P = 9 × 10⁻⁴), respectively, whereas that of S50Q/G232E increased by 2.5-fold (P = 2 × 10⁻⁶). Although fold change in MFI is unlikely to linearly correlate with the fold change in protein synthesis, such an increase in MFI shows that S50Q, G232E, and S50Q/G232E double mutant have increased surface expression compared to WT.
Subsequently, we examined the natural occurrences of S50Q and G232E mutations. Both S50Q and G232E rarely occur in circulating SARS-CoV-2. Among over 10 million NTD sequences on Global Initiative for Sharing Avian Influenza Data (GISAID) (44), only 22 and 3 sequences contain S50Q and G232E, respectively. To probe the structural impact of S50Q and G232E, we analyzed their local environments on the structure of S protein and performed structural modeling using Rosetta (Fig. 4, C and D) (45–47). S50 forms a hydrogen bond with K304 and is proximal to the S2 subunit. Structural modeling showed that S50Q not only is able to maintain the hydrogen bond with K304 but also strengthens the van der Waals interaction between the NTD and S2 by pushing K304 toward S2 from the adjacent protomer (Fig. 4C). G232 is proximal to a positively charged region on the RBD that is featured by R355 and R466 (Fig. 4D). Structural modeling suggested that G232E could form favorable electrostatic interactions with both R355 and R466. We further recombinantly expressed these mutants and tested their thermostability using a thermal shift assay (Fig. 4B). Notably, all the recombinantly expressed S proteins contained K986P/V987P mutations in the S2 subunit, which are known to stabilize the prefusion conformation and increase expression (26, 48). The melting temperatures of WT and NTD mutants were almost identical at a Tm of 46°C to 46.5°C. These observations indicate that although both S50Q and G232E improve the interaction between NTD and the rest of the S protein, they have minimal impact on the global folding stability of the S protein. Notably, all mutants increased the yield of the soluble recombinantly expressed S protein compared to WT (fig. S6), although the rank order was different from that of the membrane-bound form (Fig. 4A and fig. S6).
S50Q and G232E have minimal effects on the fusion activity and antigenicity
To understand the functional consequences of S50Q and G232E, we further tested whether S50Q, G232E, and S50Q/G232E exhibited a change in fusion activity compared to WT. A fluorescence-based cell-cell fusion assay that relied on the split mNeonGreen2 (mNG2) (49) was performed (see Materials and Methods; fig. S9). Briefly, HEK293T landing pad cells that expressed human ACE2 (hACE2) and mNG2(1–10) were mixed with HEK293T landing pad cells that expressed S proteins and mNG2(11). Green fluorescence due to mNG2 complementation was generated when fusion between the two cell lines occurred. Fluorescence microscopy analysis showed that all mutants facilitated hACE2-mediated fusion (Fig. 5, A and B). Consistently, flow cytometry analysis at both 3-hour and 24-hour postmixing indicated that none of the tested mutants diminished the fusion activity when compared to WT (Fig. 5, C and D). At 3-hour postmixing, both S50Q (24%, P = 0.03) and G232E (25%, P = 0.01) showed mild, yet significant, increases in fusion activity compared to WT. Similarly, at 24-hour postmixing, S50Q (19%, P = 0.01), G232E (13%, P = 0.02), and S50Q/G232E double mutant (37%, P = 0.005) all showed an increase in fusion activity compared to WT. Such a mild increase in fusion activity may simply be attributed to the higher expression level of the mutants. However, because the relationship between protein expression level and fusion activity remains elusive, we are unable to assess the fusion activity per S protein molecule based on the current data. Negative control cells expressing the K986P/V987P double mutant, which is known to stabilize the prefusion form of the S protein (26, 48), did not show any fusion activity (Fig. 5, A to D). Overall, our results demonstrated that neither S50Q nor G232E impaired the fusogenic capability of the S protein.
We then proceeded to investigate whether S50Q, G232E, and S50Q/G232E alter the antigenicity of the S protein. The binding of three antibodies targeting different domains of the S protein was tested, namely, CC12.3 (anti-RBD) (50), S2M28 (anti-NTD) (14), and COVA1-07 (anti-S2) (51). Notably, S2M28 is an NTD supersite–targeting antibody. Flow cytometry analysis showed that all three antibodies bound to the tested mutants at a similar level to WT (Fig. 6 and fig. S10), indicating that S50Q, G232E, and S50Q/G232E did not alter the structural conformation and antigenicity of the S protein.
DISCUSSION
S protein is central to the research of SARS-CoV-2 evolution and COVID-19 vaccines (50, 52–54). While both the RBD and the NTD on the S protein are targets of neutralizing antibodies and are involved in the antigenic drift of SARS-CoV-2 (43, 55–61), the NTD often receives less attention than does the RBD. Using deep mutational scanning, this study shows that many NTD mutations at buried residues do not affect S protein expression. At the same time, the closer an NTD mutation is to RBD/S2, the more likely it is detrimental to S protein expression. These observations imply that for optimum S protein expression, the structural stability at the NTD-RBD and the NTD-S2 interfaces is more critical than the folding stability of the NTD. Our results also at least partly explain why the N1 to N5 loops, which contain the NTD antigenic supersite (62) and are far from the NTD-RBD/S2 interfaces, are highly diverse among SARS-CoV-2 variants and sarbecovirus strains. Overall, this study provides crucial biophysical insights into the evolution of the NTD.
NTD mutations S50Q and G232E, which are located at the interdomain interface and increase S protein expression, represent another important finding of this study. Engineering high-expressing S protein can lower the production cost of recombinant COVID-19 vaccine and may improve the effectiveness of mRNA vaccines (25). Similar to certain previously characterized mutations in the S2 (26, 27), S50Q and G232E in the NTD increase the expression yield of the S protein without changing its Tm. Consistently, a recent study showed that NTD mutations in BA.1 improve the expression of S protein without increasing its thermostability (63). Furthermore, S50Q and G232E are not solvent-exposed on the S protein surface and do not seem to alter the antigenicity of the S protein. Notably, according to our deep mutational scanning data, S50Q and G232E are just two of many mutations that enhance S protein expression. Therefore, although most studies on S-based immunogen design focus on the mutations in the RBD and S2 (7, 26–29), our results suggest that mutations in NTD can provide a complementary strategy.
We acknowledge that the S protein expression level does not necessarily correlate with virus replication fitness. For example, NTD mutations that do not affect the S protein expression may be detrimental to the replication fitness of SARS-CoV-2 due to the negative impact on NTD functionality. While the functional importance of the NTD in natural infection remains largely unclear, NTD has been proposed to facilitate virus entry by interacting with DC-SIGN, L-SIGN, AXL, ASGR1, and KREMEN1 (64–66). Studies have also shown that the NTD can allosterically evade antibody binding by interacting with a heme metabolite (46) and modulate the efficiency of virus-host membrane fusion (67, 68). To fully comprehend the biophysical constraints of NTD, future studies should systematically investigate how different NTD mutations affect virus replication fitness.
MATERIALS AND METHODS
Construction of the NTD mutant library
SARS-CoV-2 S NTD mutant library was constructed based on the HEK293T landing pad system (37, 38). The template for constructing the NTD mutant library was a plasmid that encoded (from 5′ to 3′) an attB site, a codon-optimized SARS-CoV-2 S (GenBank ID: NC_045512.2) with the PRRA motif in the furin cleavage site deleted, an internal ribosome entry site (IRES), and a puromycin resistance marker. This plasmid was used as a PCR template to generate a linearized vector and a library of mutant NTD inserts. The linearized vector was generated using 5′-TGCTCGTCTCTACAACTCCGCCAGCTTCAGCACC-3′ and 5′-TGCTCGTCTCTTCACTGGCCGTCGTTTTACAACG-3′ as primers. Inserts were generated by two separate batches of PCRs to cover the entire NTD. The first batch of PCRs consisted of 36 reactions, each containing one cassette of forward primers and the universal reverse primer 5′-TGCTCGTCTCGTTGTACAGCACGGAGTAGTCGGC-3′. Each cassette contained an equal molar ratio of eight forward primers that had the same 21 nucleotides (nt) at the 5′ end and 15 nt at the 3′ end. Each primer within a cassette also encoded an NNK (N: A, C, G, T; K: G, T) sequence at a specified codon position for saturation mutagenesis. In addition, each primer carried unique silent mutations (also known as synonymous mutations) to help distinguish between sequencing errors and true mutations in downstream sequencing data analysis as described previously (40). The forward primers, named as CassetteX_N (X: cassette number, N: primer number), are listed in table S1. The second batch of PCR consisted of another 36 PCRs, each with a universal forward primer 5′-TGCTCGTCTCAGTGAATTGTAATACGACTCACTA-3′ and a unique reverse primer as listed in table S2. Subsequently, 36 overlapping PCRs were performed using the universal forward and reverse primers, as well as a mixture of 10 ng each of the corresponding products from the first and second batches of PCR.
The 36 overlap PCR products were then mixed at equal molar ratio to generate the final insert of the NTD mutant library. All PCRs were performed using PrimeSTAR Max polymerase (Takara Bio, catalog no. R045B) per the manufacturer’s instruction, followed by purification using the Monarch Gel Extraction Kit (New England Biolabs, catalog no. T1020L). The final insert and the linearized vector were digested by BsmBI-v2 (New England Biolabs, catalog no. R0739L) and ligated using T4 DNA Ligase (New England Biolabs, catalog no. M0202L). Ligation product was purified by the PureLink PCR Purification Kit (Thermo Fisher Scientific, catalog no. K310002) and then transformed into MegaX DH10B T1R cells (Thermo Fisher Scientific, catalog no. C640003). At least half a million colonies were collected. Plasmid mutant library was purified from the bacterial colonies using the PureLink HiPure Plasmid Midiprep Kit (Invitrogen, catalog no. K210005). All primers in this study were ordered from Integrated DNA Technologies.
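The choice of NNK for saturation mutagenesis can be illustrated by enumerating the codons it encodes under the standard genetic code. The sketch below is not part of the study's code; it builds the codon table from the conventional TCAG ordering and lists which amino acids and stop codons an NNK position can produce.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N is any base at positions 1 and 2; K is G or T at position 3.
nnk_codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]

encoded_amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

print(len(nnk_codons), len(encoded_amino_acids), stop_codons)
```

The 32 NNK codons cover all 20 amino acids while retaining only one of the three stop codons (TAG), which is consistent with a library that contains 19 amino acid substitutions and one nonsense mutation per residue.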
Construction of stable cell lines using HEK293T landing pad cells
HEK293T landing pad cells (37, 38) were used to display the NTD mutant library for deep mutational scanning. Landing pad cells were maintained using complete growth medium consisting of Dulbecco’s modified Eagle’s medium (DMEM; Corning), 10% (v/v) fetal bovine serum (FBS; VWR), penicillin-streptomycin (Gibco), nonessential amino acids (Gibco), and doxycycline (2 μg/ml). Plasmid (1.2 μg) was transfected into 6 × 10⁵ landing pad cells. For the deep mutational scanning experiment, eight transfection reactions were carried out in parallel to minimize loss of mutant diversity at the transfection step. Transfected cells were then incubated at 37°C with 5% CO₂. After 48 hours, 10 nM AP1903 was supplemented to carry out negative selection. At 72 hours after the negative selection, positive selection antibiotic [puromycin (1 μg/ml) for NTD cell lines or hygromycin (100 μg/ml) for hACE2 cell lines] was supplemented to the medium to carry out positive enrichment of cells with successful recombination. Constructed cell lines were maintained in the complete growth medium supplemented with doxycycline and the positive selection antibiotics.
Sorting the NTD mutant library based on S protein expression level
Four T-75 flasks (Corning) that were 90% confluent with cells that carried the NTD mutant library were washed with 1× phosphate-buffered saline (PBS), harvested with warm versene, and pelleted via centrifugation at 300g for 5 min at room temperature. Cells were then resuspended in FACS buffer [2% (v/v) FBS, 5 mM EDTA in DMEM supplemented with glucose, L-glutamine, and HEPES but without phenol red (Gibco)]. Subsequently, cells were incubated with CC40.8 (5 μg/ml) at 4°C with gentle shaking for 1 hour. Cells were washed once with ice-cold FACS buffer and incubated with PE anti-human IgG Fc (1 μg/ml; BioLegend, catalog no. 410708) at 4°C with gentle shaking in the dark for 1 hour. Cells were washed once and resuspended in ice-cold FACS buffer. Cells were then filtered using a 40-μm cell strainer (VWR) before cell sorting. FACS was performed using a BD FACSAria II cell sorter (BD Biosciences) with a 561-nm laser and a 582/15 band-pass filter. Cells were collected into ice-cold D10 medium [DMEM with glucose (4.5 g/liter), 4 mM L-glutamine, and sodium pyruvate (110 mg/liter; Corning), supplemented with 10% (v/v) FBS (VWR), 1× penicillin-streptomycin (Gibco), and 1× nonessential amino acids (Gibco)] and binned into no (bin 0), low (bin 1), medium (bin 2), and high (bin 3) expression according to PE signal, where each bin contains 25% of the singlet population (fig. S2). A biological replicate of the deep mutational scanning experiment was performed, starting from the transfection step.
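The quartile-based binning described above can be mimicked computationally as follows. This is only an illustration of the gating logic: In the experiment, gates were drawn on the cell sorter, and the function name and PE intensities below are hypothetical.

```python
from statistics import quantiles


def assign_expression_bins(pe_signals):
    """Assign each cell to bin 0-3 using quartile cut points of the PE signal."""
    cuts = quantiles(pe_signals, n=4)  # three cut points: Q1, median, Q3
    # A cell's bin index is the number of cut points its signal exceeds.
    return [sum(signal > cut for cut in cuts) for signal in pe_signals]


# Invented PE intensities for eight singlet cells.
signals = [7.0, 1.0, 5.0, 3.0, 8.0, 2.0, 6.0, 4.0]
bins = assign_expression_bins(signals)
print(bins)  # [3, 0, 2, 1, 3, 0, 2, 1]
```

With this construction, each bin receives about a quarter of the cells, so differences in a mutant's distribution across bins reflect its expression level rather than unequal gate sizes.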
Next-generation sequencing of the NTD mutant library
Sorted cells from each bin were pelleted at 300g, 4°C for 15 min and then resuspended in 200 μl of PBS (Corning). Genomic DNA extraction was then performed using the DNeasy Blood and Tissue Kit (Qiagen, catalog no. 69504) according to the manufacturer’s instructions with a modification: Cells were incubated at 56°C for 30 min instead of 10 min. The NTD mutant library was amplified from the genomic DNA in two nonoverlapping fragments using KOD Hot Start DNA polymerase (MilliporeSigma, catalog no. 710863) per the manufacturer’s instruction with the following two primer sets, respectively (also see table S3): set 1: 5′-CACTCTTTCCCTACACGACGCTCTTCCGATCTCTGCTGCCTCTGGTGTCCAGC-3′ (NTD-DMS-recover-1F) and 5′-GACTGGAGTTCAGACGTGTGCTCTTCCGATCTGTTGGCGCTGCTGTACACCCG-3′ (NTD-DMS-recover-1R); set 2: 5′-CACTCTTTCCCTACACGACGCTCTTCCGATCTAGCTGGATGGAAAGCGAGTTC-3′ (NTD-DMS-recover-2F) and 5′-GACTGGAGTTCAGACGTGTGCTCTTCCGATCTCACGGTGAAGGACTTCAGGGT-3′ (NTD-DMS-recover-2R).
A second round of PCR was carried out to add the adapter sequence and index to the amplicons as described previously (69). The final PCR products were submitted for next-generation sequencing using Illumina MiSeq PE300.
Analysis of next-generation sequencing data
Next-generation sequencing data were obtained in FASTQ format. Forward and reverse reads of each paired-end read were merged by PEAR (70). The merged reads were parsed by SeqIO module in BioPython (71). Primer sequences were trimmed from the merged reads. Trimmed reads with lengths inconsistent with the expected length were discarded. The trimmed reads were then translated to amino acid sequences, with sequencing error correction performed at the same time as previously described (40). Amino acid mutations were called by comparing the translated reads to the WT amino acid sequence. Frequency (F) of a mutant i at position s within bin n of replicate k was computed for each replicate as follows
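$$F_{i,s,n,k} = \frac{\mathrm{readcount}_{i,s,n,k} + 1}{\sum_{s'} \sum_{i'} \left(\mathrm{readcount}_{i',s',n,k} + 1\right)} \tag{1}$$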
A pseudocount of 1 was added to the read counts of each mutant to avoid division by zero in subsequent steps. We then calculated the total frequency (F_total) of mutant i at position s as follows
(2) |
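The expression for Eq. 2 did not survive in this copy of the text. One reading that is consistent with "the frequency of each mutant among the entire population", given that each of the N = 4 bins holds 25% of the cells, is the mean of the per-bin frequencies over all bins and the K replicates. This is an assumption; a plain sum over bins and replicates would differ only by a constant factor, which would change the effective stringency of the 0.0075% cutoff.

```latex
% Assumed form of Eq. 2: mean frequency over N bins and K replicates
F_{\mathrm{total},i,s} = \frac{1}{K N} \sum_{k=1}^{K} \sum_{n=0}^{N-1} F_{i,s,n,k}
```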
Mutants with F_total equal to or greater than 0.0075% were selected for downstream analysis. Subsequently, the weighted average (W) of each mutant among the four bins (bin 0 to bin 3) in each replicate was computed as described previously (41)
(3) |
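The expression for Eq. 3 is also missing here. A generic form of a weighted average across bins is sketched below, where the bin weights w_n are a placeholder symbol introduced for illustration; the specific weights are given in the cited reference (41) and are not reproduced in this text.

```latex
% Generic weighted average over bins 0-3; w_n is a hypothetical bin weight
W_{i,s,k} = \frac{\sum_{n=0}^{3} w_n \, F_{i,s,n,k}}{\sum_{n=0}^{3} F_{i,s,n,k}}
```

For any weights that increase linearly with the bin index, w_n = a + b n with b ≠ 0, the subsequent rescaling to the silent and nonsense means cancels both a and b, so the resulting expression scores do not depend on the particular linear weights chosen.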
Selected mutants were then categorized based on the mutation types (missense, nonsense, and silent). The mean value of weighted average for nonsense and silent mutations was calculated. Expression score (ES) of a mutant i at position s of replicate k was calculated as described previously (41)
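$$ES_{i,s,k} = \frac{W_{i,s,k} - \overline{W}_{\mathrm{nonsense},k}}{\overline{W}_{\mathrm{silent},k} - \overline{W}_{\mathrm{nonsense},k}} \tag{4}$$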
The final expression score of a mutant i at position s was calculated by taking the average of the expression scores between replicates. Mutational tolerability of position s was then calculated by taking the average of the expression scores of all mutants at that position
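$$MT_{s} = \frac{1}{N_s} \sum_{i} ES_{i,s} \tag{5}$$
where N_s is the number of mutants at position s with an available expression score.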
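The scoring steps in this section can be summarized in a short sketch. It is not the study's analysis code: The function names and read counts are invented, the bin weights are an assumption (the text only specifies a weighted average), and the total frequency is assumed to be a mean over bins and replicates.

```python
from statistics import mean

# ASSUMPTION: evenly spaced weights for bins 0-3; not stated in the text.
BIN_WEIGHTS = (0.25, 0.50, 0.75, 1.00)


def bin_frequencies(counts):
    """Per-bin frequency with a pseudocount of 1 (one replicate).
    counts maps mutant -> [reads in bin 0, bin 1, bin 2, bin 3]."""
    n_bins = len(BIN_WEIGHTS)
    totals = [sum(c[n] + 1 for c in counts.values()) for n in range(n_bins)]
    return {m: [(c[n] + 1) / totals[n] for n in range(n_bins)]
            for m, c in counts.items()}


def total_frequency(freqs_by_replicate, mutant):
    """ASSUMED form of the total frequency: mean over bins and replicates."""
    return mean(f for rep in freqs_by_replicate for f in rep[mutant])


def weighted_average(freqs):
    """Frequency-weighted mean of the bin weights."""
    return sum(w * f for w, f in zip(BIN_WEIGHTS, freqs)) / sum(freqs)


def expression_scores(counts, kind):
    """Rescale W so that silent mutations average 1 and nonsense average 0."""
    w = {m: weighted_average(f) for m, f in bin_frequencies(counts).items()}
    w_silent = mean(v for m, v in w.items() if kind[m] == "silent")
    w_nonsense = mean(v for m, v in w.items() if kind[m] == "nonsense")
    return {m: (v - w_nonsense) / (w_silent - w_nonsense) for m, v in w.items()}


def mutational_tolerability(scores):
    """Mean expression score per position; scores maps (position, aa) -> ES."""
    by_position = {}
    for (position, _aa), es in scores.items():
        by_position.setdefault(position, []).append(es)
    return {position: mean(v) for position, v in by_position.items()}


# Toy read counts (invented), keyed by (position, amino acid); "*" is a stop.
kind = {(50, "Q"): "missense", (50, "S"): "silent", (50, "*"): "nonsense",
        (84, "P"): "missense", (84, "L"): "silent", (84, "*"): "nonsense"}
rep1 = {(50, "Q"): [5, 10, 30, 55], (50, "S"): [10, 20, 35, 35],
        (50, "*"): [90, 8, 1, 1], (84, "P"): [60, 30, 8, 2],
        (84, "L"): [12, 22, 33, 33], (84, "*"): [85, 10, 3, 2]}
rep2 = {(50, "Q"): [4, 12, 28, 60], (50, "S"): [11, 19, 36, 34],
        (50, "*"): [88, 9, 2, 1], (84, "P"): [58, 32, 7, 3],
        (84, "L"): [10, 21, 34, 35], (84, "*"): [87, 9, 2, 2]}

es1, es2 = expression_scores(rep1, kind), expression_scores(rep2, kind)
final_es = {m: mean((es1[m], es2[m])) for m in kind}
tolerability = mutational_tolerability(final_es)
```

In this toy example, the mutant enriched in the high-expression bins receives a score above 1, the mutant enriched in the low-expression bins scores well below 1, and the per-position means follow the same order.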
Structural analysis of deep mutational scanning results
DSSP (72, 73) was used to calculate the solvent-accessible surface area (SASA) of each residue in NTD and RBD on the S trimer [Protein Data Bank (PDB) 6ZGE] (45). Deep mutational scanning results for the RBD were extracted from a previous study (42). RSA was computed by dividing the SASA by the theoretical maximum allowed solvent accessibility of the corresponding amino acid (74).
Each NTD residue’s distance to RBD/S2 was calculated on the basis of the S trimer structure (PDB 6ZGE) (45), with the NTD replaced by the high-resolution crystal structure (PDB 7B62) (46). For each NTD residue, the distances to all RBD and S2 residues were measured. The shortest distance was then recorded as the “distance to RBD/S2.” Residue-residue distance was defined as the distance between the centroid coordinates of two residues.
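The distance definition above (shortest centroid-to-centroid distance from an NTD residue to any RBD or S2 residue) can be sketched as follows. The function names and coordinates are invented for illustration; in practice, atom coordinates would be read from the structure files.

```python
from math import dist


def centroid(atoms):
    """Mean x, y, z of a residue's atom coordinates."""
    n = len(atoms)
    return tuple(sum(atom[axis] for atom in atoms) / n for axis in range(3))


def distance_to_targets(residue_atoms, target_residues):
    """Shortest centroid-to-centroid distance from one residue to a set of
    target residues (for example, all RBD and S2 residues)."""
    c = centroid(residue_atoms)
    return min(dist(c, centroid(target)) for target in target_residues)


# Toy coordinates in angstroms (invented).
ntd_residue = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rbd_s2_residues = [
    [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)],   # centroid at (5, 0, 0)
    [(1.0, 2.0, 0.0), (1.0, 4.0, 0.0)],   # centroid at (1, 3, 0)
]
print(distance_to_targets(ntd_residue, rbd_s2_residues))  # 3.0
```

Using centroids rather than the closest atom pair makes the measure less sensitive to side-chain rotamers, at the cost of slightly overestimating contact distances for large residues.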
To visualize the mutational tolerability of each NTD residue, the crystal structure of SARS-CoV-2 S protein NTD (PDB 7B62) was used (46). The NTD crystal structure was then aligned with the S trimer (PDB 6ZGE) (45) to generate the figures.
Evolution analysis of NTD sequences
The sequence conservation analysis of NTD was based on 27 sarbecovirus strains (table S6) (1, 75–79). S sequences of these strains were retrieved from GenBank and GISAID (44). Their NTD sequences were then identified by tBlastn search using the amino acid sequence of SARS-CoV-2 Hu-1 NTD (gene ID: 43740568) as the query sequence. The BlastXML output of tBlastn was then parsed and used as the input for multiple sequence alignment using MAFFT (80, 81). For each residue position, sequence conservation was defined as the proportion of strains that contain the same amino acid variant as SARS-CoV-2 Hu-1. The information of the 17 SARS-CoV-2 major variants and the observed naturally occurring NTD missense/indel sites were collected from ViralZone (table S7) (82).
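The conservation measure defined above can be sketched on a toy alignment. The strain names and sequences are invented, and two details are assumptions made for this sketch: The reference strain is counted in the denominator, and alignment columns where the reference has a gap are skipped.

```python
def conservation(alignment, reference_id):
    """Per-residue conservation relative to a reference sequence.
    alignment maps strain name -> aligned sequence (equal lengths, '-' = gap).
    Returns one value per non-gap reference position."""
    reference = alignment[reference_id]
    n_strains = len(alignment)  # ASSUMPTION: reference included in the count
    scores = []
    for column, ref_aa in enumerate(reference):
        if ref_aa == "-":
            continue  # position absent from the reference
        same = sum(1 for seq in alignment.values() if seq[column] == ref_aa)
        scores.append(same / n_strains)
    return scores


# Toy alignment (invented sequences).
toy_alignment = {
    "Hu-1": "MF-V",
    "strainA": "MFAV",
    "strainB": "MY-V",
    "strainC": "MY-I",
}
print(conservation(toy_alignment, "Hu-1"))  # [1.0, 0.5, 0.75]
```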
Rosetta-based mutagenesis
The structure of the S protein was obtained from PDB (PDB 6ZGE) (45). Water molecules and N-acetyl-D-glucosamine were removed using PyMOL (Schrödinger). Then, the amino acids were renumbered using pdb-tools (83). Fixed backbone point mutagenesis for S50Q and G232E was performed using the “fixbb” application in Rosetta (RosettaCommons). One hundred poses were generated for each mutagenesis. Using the lowest-scoring structure from fixed backbone mutagenesis as input, a constraint file was obtained using the minimize_with_cst application in Rosetta. Fast relax was then performed via the “relax” application in Rosetta (47) with the corresponding constraint file. The lowest-scoring structure out of eight was then used for structural analysis. Code and source files for structural modeling are available in https://github.com/nicwulab/SARS-CoV-2_NTD_DMS/tree/main/rosetta.
Split mNG2-based cell-cell fusion assay
The hACE2 construct was generated in a previous study (38). A split mNG2 reporter system was integrated into the S plasmid (see above) and the hACE2 plasmid (49). Specifically, a gene fragment that encodes (from 5′ to 3′) a GCN4 leucine zipper, a GS linker, mNG2₁₋₁₀, and a 2A self-cleaving peptide was inserted into the hACE2 plasmid between the IRES and the hygromycin resistance marker. Similarly, a gene fragment that encodes (from 5′ to 3′) a GCN4 leucine zipper, a GS linker, mNG2₁₁, and a 2A self-cleaving peptide was inserted into the S plasmid between the IRES and the puromycin resistance marker. Each plasmid construct was transfected and recombined into HEK293T landing pad cells following the steps described above.
Once the stable cell lines were created, 5 × 10⁵ landing pad cells expressing hACE2 with mNG2₁₋₁₀ were seeded in six-well plates (Fisher Scientific). The cells were then incubated at 37°C with 5% CO2 for 15 min to allow seeding. Subsequently, 5 × 10⁵ landing pad cells expressing S with mNG2₁₁ were added dropwise to the seeded hACE2 cells. Both cell populations were filtered through a 40-μm cell strainer (VWR) before seeding. At 3 and 24 hours postmixing, fusion events in each well were qualitatively assessed with an ECHO Revolve epifluorescence microscope (ECHO) in inverted format. Overlaid images were captured on white light and fluorescein isothiocyanate filter channels using a UPlanFL N 10×/0.30 numerical aperture objective (Olympus) with identical light intensity and exposure settings for all conditions. Cells in each well were then collected using 0.5 mM EDTA, pelleted via centrifugation at 300g for 5 min at room temperature, and resuspended in the FACS buffer. LSRII flow cytometry (BD Biosciences) was used to quantify the fusion events of each sample. Negative controls were measured first to set up proper gating strategies (fig. S9). Then, the flow cytometry analysis was performed on 10⁵ live cells for each sample. Data were analyzed using FCS Express 6 software (De Novo Software). The percentage of the mNG2-positive population of each sample was used for normalization (table S4).
Flow cytometry analysis for the protein expression assay and antibody binding assay
Approximately 1 × 10⁶ cells that carried the selected SARS-CoV-2 S NTD mutant were washed with 1× PBS, harvested with warm versene, and pelleted via centrifugation at 300g for 5 min at room temperature. The cells were resuspended in the FACS buffer. Subsequently, cells were incubated with 5 μg/ml of the selected antibodies at 4°C with gentle shaking for 1 hour. Cells were then washed once with ice-cold FACS buffer and incubated with PE anti-human IgG Fc (2 μg/ml; BioLegend, catalog no. 410708) at 4°C with gentle shaking in the dark for 1 hour. Cells were washed once, pelleted via centrifugation at 300g for 5 min at room temperature, and resuspended in ice-cold FACS buffer. LSRII flow cytometry (BD Biosciences) was used to measure the PE signal of each sample. Negative controls were measured first to set up proper gating strategies (figs. S8 and S10). Then, the flow cytometry analysis was performed on 10⁵ singlets for each sample. Data were analyzed using FCS Express 6 software (De Novo Software).
Normalization of the expression assay results
The MFI of the entire population was recorded for each sample, followed by the normalization as previously described (42). For a given sample i, the following equation was used to compute the normalized expression (NE)
(6) |
Normalizations were performed for each sample within a given biological replicate (table S4).
Recombinant expression and purification of soluble S protein
SARS-CoV-2 S ectodomain with the PRRA motif in the furin cleavage site deleted and mutations K986P/V987P, which are known to stabilize the prefusion conformation and increase expression (26, 48), was cloned into a phCMV3 vector. The S ectodomain construct contained a trimerization domain and a 6×His-tag at the C terminus. Expi293F cells (Gibco), which were maintained using Expi293 expression medium (Gibco), were used to express soluble S protein. Briefly, 25 μg of the plasmid was transfected into 25 ml of Expi293F cells at 3 × 10⁶ cells ml⁻¹ using the ExpiFectamine 293 Transfection Kit (Thermo Fisher Scientific) following the manufacturer’s instructions. Transfected cells were then incubated at 37°C and 8% CO2 with shaking at 125 rpm for 6 days. Cell cultures were then harvested and centrifuged at 4000g at 4°C for 15 min. The supernatant was clarified using a 0.22-μm polyethersulfone filter (Millipore). S protein in the clarified supernatant was then purified using Nickel Sepharose Excel resin (Cytiva), with 20 mM imidazole in PBS as wash buffer and 300 mM imidazole in PBS as elution buffer. Three rounds of 2-ml elutions were performed. The eluted protein was then concentrated and analyzed on reducing SDS–polyacrylamide gel electrophoresis gels (Bio-Rad; fig. S6A). The concentrated protein solution was further purified using a Superdex 200 XK 16/100 size exclusion column (Cytiva) in 20 mM tris-HCl (pH 8.0) and 150 mM NaCl (fig. S6B). Selected elution fractions were combined and concentrated. Final protein concentration was measured using a NanoDrop One (Thermo Fisher Scientific).
Protein thermostability assay
Five micrograms of purified protein was mixed with 5× SYPRO orange (Thermo Fisher Scientific) in 20 mM tris-HCl (pH 8.0) and 150 mM NaCl at a final volume of 25 μl. The sample mixture was then transferred into an optically clear PCR tube (VWR). SYPRO orange fluorescence data in relative fluorescence units (RFU) were collected from 10° to 95°C using a CFX Connect Real-Time PCR Detection System (Bio-Rad). The temperature corresponding to the lowest point of the first derivative, −d(RFU)/dT, was defined as the melting temperature (Tm). Data were analyzed using OriginPro 2020b (Origin Lab). Raw data are shown in table S5.
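The Tm definition above can be illustrated with a short sketch. It assumes the derivative is estimated by central differences on the raw readings with no smoothing, which may differ from the procedure used in OriginPro; the function name and the synthetic melt curve are hypothetical and are not data from table S5.

```python
import math

def melting_temperature(temps, rfu):
    """Return the temperature at the minimum of -d(RFU)/dT.

    The derivative is approximated by central differences on interior points.
    """
    best_t, best_val = None, None
    for i in range(1, len(temps) - 1):
        slope = (rfu[i + 1] - rfu[i - 1]) / (temps[i + 1] - temps[i - 1])
        neg_deriv = -slope
        if best_val is None or neg_deriv < best_val:
            best_t, best_val = temps[i], neg_deriv
    return best_t

# Synthetic sigmoidal melt curve with its midpoint at 60 degrees C
# (hypothetical values, 1-degree steps from 10 to 95 degrees C).
temps = [float(t) for t in range(10, 96)]
rfu = [1000.0 / (1.0 + math.exp(-(t - 60.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, rfu)
```

For this synthetic curve the steepest rise in fluorescence, and therefore the lowest point of −d(RFU)/dT, falls at the 60°C midpoint. Real melt curves are noisier, so smoothing before differentiation may be needed.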
Acknowledgments
We thank M. Yuan, H. Lv, and Q. W. Teo for helpful discussion and the Roy J. Carver Biotechnology Center at the University of Illinois at Urbana-Champaign for assistance with fluorescence-activated cell sorting and next-generation sequencing.
Funding: This work was supported by NIH grants R00 AI139445 (to N.C.W.), DP2 AT011966 (to N.C.W.), and R01 AI167910 (to N.C.W.) and The Michelson Prizes for Human Immunology and Vaccine Research (to N.C.W.).
Author contributions: Conceptualization: W.O.O., T.J.C.T., and N.C.W. Methodology: W.O.O., T.J.C.T., K.A.M., and N.C.W. Investigation: W.O.O., T.J.C.T., R.L., and N.C.W. Resources: G.S., R.A., and K.A.M. Formal analysis: W.O.O., T.J.C.T., C.K., and N.C.W. Visualization: W.O.O., T.J.C.T., and N.C.W. Supervision: N.C.W. Writing—original draft: W.O.O., T.J.C.T., and N.C.W. Writing—review and editing: W.O.O., T.J.C.T., R.L., G.S., C.K., R.A., K.A.M., and N.C.W.
Competing interests: N.C.W. consults for HeliXon. The authors declare no other competing interests.
Data and materials availability: Raw sequencing data have been submitted to the NIH Short Read Archive under accession number: BioProject PRJNA792013. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Source data and custom python/R scripts for data analysis and plotting in this study have been deposited to Zenodo (https://doi.org/10.5281/zenodo.7066099) and GitHub (https://github.com/nicwulab/SARS-CoV-2_NTD_DMS).
REFERENCES AND NOTES
- 1.Zhou P., Yang X.-L., Wang X.-G., Hu B., Zhang L., Zhang W., Si H.-R., Zhu Y., Li B., Huang C.-L., Chen H.-D., Chen J., Luo Y., Guo H., Jiang R.-D., Liu M.-Q., Chen Y., Shen X.-R., Wang X., Zheng X.-S., Zhao K., Chen Q.-J., Deng F., Liu L.-L., Yan B., Zhan F.-X., Wang Y.-Y., Xiao G.-F., Shi Z.-L., A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Zhu N., Zhang D., Wang W., Li X., Yang B., Song J., Zhao X., Huang B., Shi W., Lu R., Niu P., Zhan F., Ma X., Wang D., Xu W., Wu G., Gao G. F., Tan W., A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Letko M., Marzi A., Munster V., Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat. Microbiol. 5, 562–569 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Walls A. C., Park Y.-J., Tortorici M. A., Wall A., McGuire A. T., Veesler D., Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292.e6 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Strohl W. R., Ku Z., An Z., Carroll S. F., Keyt B. A., Strohl L. M., Passive immunotherapy against SARS-CoV-2: From plasma-based therapy to single potent antibodies in the race to stay ahead of the variants. BioDrugs 36, 231–323 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Yuan M., Liu H., Wu N. C., Wilson I. A., Recognition of the SARS-CoV-2 receptor binding domain by neutralizing antibodies. Biochem. Biophys. Res. Commun. 538, 192–203 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Wrapp D., Wang N., Corbett K. S., Goldsmith J. A., Hsieh C.-L., Abiona O., Graham B. S., McLellan J. S., Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Huang Y., Yang C., Xu X., Xu W., Liu S., Structural and functional properties of SARS-CoV-2 spike protein: Potential antivirus drug development for COVID-19. Acta Pharmacol. Sin. 41, 1141–1149 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Premkumar L., Segovia-Chumbez B., Jadi R., Martinez D. R., Raut R., Markmann A. J., Cornaby C., Bartelt L., Weiss S., Park Y., Edwards C. E., Weimer E., Scherer E. M., Rouphael N., Edupuganti S., Weiskopf D., Tse L. V., Hou Y. J., Margolis D., Sette A., Collins M. H., Schmitz J., Baric R. S., de Silva A. M., The receptor-binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients. Sci. Immunol. 5, eabc8413 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Piccoli L., Park Y.-J., Tortorici M. A., Czudnochowski N., Walls A. C., Beltramello M., Silacci-Fregni C., Pinto D., Rosen L. E., Bowen J. E., Acton O. J., Jaconi S., Guarino B., Minola A., Zatta F., Sprugasci N., Bassi J., Peter A., De Marco A., Nix J. C., Mele F., Jovic S., Rodriguez B. F., Gupta S. V., Jin F., Piumatti G., Presti G. L., Pellanda A. F., Biggiogero M., Tarkowski M., Pizzuto M. S., Cameroni E., Havenar-Daughton C., Smithey M., Hong D., Lepori V., Albanese E., Ceschi A., Bernasconi E., Elzi L., Ferrari P., Garzoni C., Riva A., Snell G., Sallusto F., Fink K., Virgin H. W., Lanzavecchia A., Corti D., Veesler D., Mapping neutralizing and immunodominant sites on the SARS-CoV-2 spike receptor-binding domain by structure-guided high-resolution serology. Cell 183, 1024–1042.e21 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Greaney A. J., Loes A. N., Crawford K. H. D., Starr T. N., Malone K. D., Chu H. Y., Bloom J. D., Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies. Cell Host Microbe 29, 463–476.e6 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chi X., Yan R., Zhang J., Zhang G., Zhang Y., Hao M., Zhang Z., Fan P., Dong Y., Yang Y., Chen Z., Guo Y., Zhang J., Li Y., Song X., Chen Y., Xia L., Fu L., Hou L., Xu J., Yu C., Li J., Zhou Q., Chen W., A neutralizing human antibody binds to the N-terminal domain of the spike protein of SARS-CoV-2. Science 369, 650–655 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Liu L., Wang P., Nair M. S., Yu J., Rapp M., Wang Q., Luo Y., Chan J. F.-W., Sahi V., Figueroa A., Guo X. V., Cerutti G., Bimela J., Gorman J., Zhou T., Chen Z., Yuen K.-Y., Kwong P. D., Sodroski J. G., Yin M. T., Sheng Z., Huang Y., Shapiro L., Ho D. D., Potent neutralizing antibodies against multiple epitopes on SARS-CoV-2 spike. Nature 584, 450–456 (2020). [DOI] [PubMed] [Google Scholar]
- 14.McCallum M., De Marco A., Lempp F. A., Tortorici M. A., Pinto D., Walls A. C., Beltramello M., Chen A., Liu Z., Zatta F., Zepeda S., di Iulio J., Bowen J. E., Montiel-Ruiz M., Zhou J., Rosen L. E., Bianchi S., Guarino B., Fregni C. S., Abdelnabi R., Foo S.-Y. C., Rothlauf P. W., Bloyet L.-M., Benigni F., Cameroni E., Neyts J., Riva A., Snell G., Telenti A., Whelan S. P. J., Virgin H. W., Corti D., Pizzuto M. S., Veesler D., N-terminal domain antigenic mapping reveals a site of vulnerability for SARS-CoV-2. Cell 184, 2332–2347.e16 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Suryadevara N., Shrihari S., Gilchuk P., VanBlargan L. A., Binshtein E., Zost S. J., Nargi R. S., Sutton R. E., Winkler E. S., Chen E. C., Fouch M. E., Davidson E., Doranz B. J., Chen R. E., Shi P.-Y., Carnahan R. H., Thackray L. B., Diamond M. S., Crowe J. E., Neutralizing and protective human monoclonal antibodies recognizing the N-terminal domain of the SARS-CoV-2 spike protein. Cell 184, 2316–2331.e15 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Wang Z., Muecksch F., Cho A., Gaebler C., Hoffmann H.-H., Ramos V., Zong S., Cipolla M., Johnson B., Schmidt F., DaSilva J., Bednarski E., Ben Tanfous T., Raspe R., Yao K., Lee Y. E., Chen T., Turroja M., Milard K. G., Dizon J., Kaczynska A., Gazumyan A., Oliveira T. Y., Rice C. M., Caskey M., Bieniasz P. D., Hatziioannou T., Barnes C. O., Nussenzweig M. C., Analysis of memory B cells identifies conserved neutralizing epitopes on the N-terminal domain of variant SARS-Cov-2 spike proteins. Immunity 55, 998–1012.e8 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Cerutti G., Guo Y., Zhou T., Gorman J., Lee M., Rapp M., Reddem E. R., Yu J., Bahna F., Bimela J., Huang Y., Katsamba P. S., Liu L., Nair M. S., Rawi R., Olia A. S., Wang P., Zhang B., Chuang G.-Y., Ho D. D., Sheng Z., Kwong P. D., Shapiro L., Potent SARS-CoV-2 neutralizing antibodies directed against spike N-terminal domain target a single supersite. Cell Host Microbe 29, 819–833.e7 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Wolf K. A., Kwan J. C., Kamil J. P., Structural dynamics and molecular evolution of the SARS-CoV-2 spike protein. MBio 13, e02030-21 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Lin W.-S., Chen I.-C., Chen H.-C., Lee Y.-C., Wu S.-C., Glycan masking of epitopes in the NTD and RBD of the spike protein elicits broadly neutralizing antibodies against SARS-CoV-2 variants. Front. Immunol. 12, 795741 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Tenforde M. W., Self W. H., Gaglani M., Ginde A. A., Douin D. J., Talbot H. K., Casey J. D., Mohr N. M., Zepeski A., McNeal T., Ghamande S., Gibbs K. W., Files D. C., Hager D. N., Shehu A., Prekker M. E., Frosch A. E., Gong M. N., Mohamed A., Johnson N. J., Srinivasan V., Steingrub J. S., Peltan I. D., Brown S. M., Martin E. T., Monto A. S., Khan A., Hough C. L., Busse L. W., Duggal A., Wilson J. G., Qadir N., Chang S. Y., Mallow C., Rivas C., Babcock H. M., Kwon J. H., Exline M. C., Botros M., Lauring A. S., Shapiro N. I., Halasa N., Chappell J. D., Grijalva C. G., Rice T. W., Jones I. D., Stubblefield W. B., Baughman A., Womack K. N., Rhoads J. P., Lindsell C. J., Hart K. W., Zhu Y., Adams K., Surie D., McMorrow M. L., Patel M. M., Network I. V. Y., Effectiveness of mRNA vaccination in preventing COVID-19-associated invasive mechanical ventilation and death - United States, March 2021-January 2022. MMWR Morb. Mortal. Wkly Rep. 71, 459–465 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Tartof S. Y., Slezak J. M., Fischer H., Hong V., Ackerson B. K., Ranasinghe O. N., Frankland T. B., Ogun O. A., Zamparo J. M., Gray S., Valluri S. R., Pan K., Angulo F. J., Jodar L., McLaughlin J. M., Effectiveness of mRNA BNT162b2 COVID-19 vaccine up to 6 months in a large integrated health system in the USA: A retrospective cohort study. Lancet 398, 1407–1416 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Heath P. T., Galiza E. P., Baxter D. N., Boffito M., Browne D., Burns F., Chadwick D. R., Clark R., Cosgrove C., Galloway J., Goodman A. L., Heer A., Higham A., Iyengar S., Jamal A., Jeanes C., Kalra P. A., Kyriakidou C., McAuley D. F., Meyrick A., Minassian A. M., Minton J., Moore P., Munsoor I., Nicholls H., Osanlou O., Packham J., Pretswell C. H., Ramos A. S. F., Saralaya D., Sheridan R. P., Smith R., Soiza R. L., Swift P. A., Thomson E. C., Turner J., Viljoen M. E., Albert G., Cho I., Dubovsky F., Glenn G., Rivers J., Robertson A., Smith K., Toback S.; 2019nCoV-302 Study Group , Safety and efficacy of NVX-CoV2373 COVID-19 vaccine. N. Engl. J. Med. 385, 1172–1183 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Sridhar S., Joaquin A., Bonaparte M. I., Bueso A., Chabanon A.-L., Chen A., Chicz R. M., Diemert D., Essink B. J., Fu B., Grunenberg N. A., Janosczyk H., Keefer M. C., M D. M. R., Meng Y., Michael N. L., Munsiff S. S., Ogbuagu O., Raabe V. N., Severance R., Rivas E., Romanyak N., Rouphael N. G., Schuerman L., Sher L. D., Walsh S. R., White J., von Barbier D., de Bruyn G., Canter R., Grillet M.-H., Keshtkar-Jahromi M., Koutsoukos M., Lopez D., Masotti R., Mendoza S., Moreau C., Ceregido M. A., Ramirez S., Said A., Tavares-Da-Silva F., Shi J., Tong T., Treanor J., Diazgranados C. A., Savarino S., Safety and immunogenicity of an AS03-adjuvanted SARS-CoV-2 recombinant protein vaccine (CoV2 preS dTM) in healthy adults: Interim findings from a phase 2, randomised, dose-finding, multicentre study. Lancet Infect. Dis. 22, 636–648 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Pollet J., Chen W.-H., Strych U., Recombinant protein vaccines, a proven approach against coronavirus pandemics. Adv. Drug Deliv. Rev. 170, 71–82 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Schlake T., Thess A., Fotin-Mleczek M., Kallen K.-J., Developing mRNA-vaccine technologies. RNA Biol. 9, 1319–1330 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Juraszek J., Rutten L., Blokland S., Bouchier P., Voorzaat R., Ritschel T., Bakkers M. J. G., Renault L. L. R., Langedijk J. P. M., Stabilizing the closed SARS-CoV-2 spike trimer. Nat. Commun. 12, 244 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Hsieh C.-L., Goldsmith J. A., Schaub J. M., DiVenere A. M., Kuo H.-C., Javanmardi K., Le K. C., Wrapp D., Lee A. G., Liu Y., Chou C.-W., Byrne P. O., Hjorth C. K., Johnson N. V., Ludes-Meyers J., Nguyen A. W., Park J., Wang N., Amengor D., Lavinder J. J., Ippolito G. C., Maynard J. A., Finkelstein I. J., McLellan J. S., Structure-based design of prefusion-stabilized SARS-CoV-2 spikes. Science 369, 1501–1505 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.E. Olmedillas, C. J. Mann, W. Peng, Y.-T. Wang, R. D. Avalos, D. Bedinger, K. Valentine, N. Shafee, S. L. Schendel, M. Yuan, G. Lang, R. Rouet, D. Christ, W. Jiang, I. A. Wilson, T. Germann, S. Shresta, J. Snijder, E. O. Saphire, Structure-based design of a highly stable, covalently-linked SARS-CoV-2 spike trimer with improved structural properties and immunogenicity. bioRxiv 2021.05.06.441046 [Preprint]. 6 May 2021. 10.1101/2021.05.06.441046. [DOI]
- 29.Ellis D., Brunette N., Crawford K. H. D., Walls A. C., Pham M. N., Chen C., Herpoldt K.-L., Fiala B., Murphy M., Pettie D., Kraft J. C., Malone K. D., Navarro M. J., Ogohara C., Kepl E., Ravichandran R., Sydeman C., Ahlrichs M., Johnson M., Blackstone A., Carter L., Starr T. N., Greaney A. J., Lee K. K., Veesler D., Bloom J. D., King N. P., Stabilization of the SARS-CoV-2 spike receptor-binding domain using deep mutational scanning and structure-based design. Front. Immunol. 12, 710263 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Fowler D. M., Fields S., Deep mutational scanning: A new style of protein science. Nat. Methods 11, 801–807 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Greaney A. J., Starr T. N., Barnes C. O., Weisblum Y., Schmidt F., Caskey M., Gaebler C., Cho A., Agudelo M., Finkin S., Wang Z., Poston D., Muecksch F., Hatziioannou T., Bieniasz P. D., Robbiani D. F., Nussenzweig M. C., Bjorkman P. J., Bloom J. D., Mapping mutations to the SARS-CoV-2 RBD that escape binding by different classes of antibodies. Nat. Commun. 12, 4196 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Greaney A. J., Starr T. N., Bloom J. D., An antibody-escape estimator for mutations to the SARS-CoV-2 receptor-binding domain. Virus Evol. 8, veac021 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Starr T. N., Czudnochowski N., Liu Z., Zatta F., Park Y.-J., Addetia A., Pinto D., Beltramello M., Hernandez P., Greaney A. J., Marzi R., Glass W. G., Zhang I., Dingens A. S., Bowen J. E., Tortorici M. A., Walls A. C., Wojcechowskyj J. A., De Marco A., Rosen L. E., Zhou J., Montiel-Ruiz M., Kaiser H., Dillen J. R., Tucker H., Bassi J., Silacci-Fregni C., Housley M. P., di Iulio J., Lombardo G., Agostini M., Sprugasci N., Culap K., Jaconi S., Meury M., Dellota E. Jr., Abdelnabi R., Foo S.-Y. C., Cameroni E., Stumpf S., Croll T. I., Nix J. C., Havenar-Daughton C., Piccoli L., Benigni F., Neyts J., Telenti A., Lempp F. A., Pizzuto M. S., Chodera J. D., Hebner C. M., Virgin H. W., Whelan S. P. J., Veesler D., Corti D., Bloom J. D., Snell G., SARS-CoV-2 RBD antibodies that maximize breadth and resistance to escape. Nature 597, 97–102 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Starr T. N., Greaney A. J., Hilton S. K., Ellis D., Crawford K. H. D., Dingens A. S., Navarro M. J., Bowen J. E., Tortorici M. A., Walls A. C., King N. P., Veesler D., Bloom J. D., Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310.e20 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Starr T. N., Greaney A. J., Dingens A. S., Bloom J. D., Complete map of SARS-CoV-2 RBD mutations that escape the monoclonal antibody LY-CoV555 and its cocktail with LY-CoV016. Cell Rep. Med. 2, 100255 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Starr T. N., Greaney A. J., Addetia A., Hannon W. W., Choudhary M. C., Dingens A. S., Li J. Z., Bloom J. D., Prospective mapping of viral mutations that escape antibodies used to treat COVID-19. Science 371, 850–854 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Matreyek K. A., Stephany J. J., Fowler D. M., A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Shukla N., Roelle S. M., Suzart V. G., Bruchez A. M., Matreyek K. A., Mutants of human ACE2 differentially promote SARS-CoV and SARS-CoV-2 spike mediated infection. PLOS Pathog. 17, e1009715 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Zhou P., Yuan M., Song G., Beutler N., Shaabani N., Huang D., He W., Zhu X., Callaghan S., Yong P., Anzanello F., Peng L., Ricketts J., Parren M., Garcia E., Rawlings S. A., Smith D. M., Nemazee D., Teijaro J. R., Rogers T. F., Wilson I. A., Burton D. R., Andrabi R., A human antibody reveals a conserved site on beta-coronavirus spike proteins and confers protection against SARS-CoV-2 infection. Sci. Transl. Med. 14, eabi9215 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Olson C. A., Wu N. C., Sun R., A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Matreyek K. A., Starita L. M., Stephany J. J., Martin B., Chiasson M. A., Gray V. E., Kircher M., Khechaduri A., Dines J. N., Hause R. J., Bhatia S., Evans W. E., Relling M. V., Yang W., Shendure J., Fowler D. M., Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Chan K. K., Tan T. J. C., Narayanan K. K., Procko E., An engineered decoy receptor for SARS-CoV-2 broadly binds protein S sequence variants. Sci. Adv. 7, eabf1738 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.McCallum M., Walls A. C., Sprouse K. R., Bowen J. E., Rosen L. E., Dang H. V., De Marco A., Franko N., Tilles S. W., Logue J., Miranda M. C., Ahlrichs M., Carter L., Snell G., Pizzuto M. S., Chu H. Y., Van Voorhis W. C., Corti D., Veesler D., Molecular basis of immune evasion by the Delta and Kappa SARS-CoV-2 variants. Science 374, 1621–1626 (2021). [DOI] [PubMed] [Google Scholar]
- 44.Shu Y., McCauley J., GISAID: Global initiative on sharing all influenza data—From vision to reality. Euro Surveill. 22, 30494 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Wrobel A. G., Benton D. J., Xu P., Roustan C., Martin S. R., Rosenthal P. B., Skehel J. J., Gamblin S. J., SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat. Struct. Mol. Biol. 27, 763–767 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Rosa A., Pye V. E., Graham C., Muir L., Seow J., Ng K. W., Cook N. J., Rees-Spear C., Parker E., Santos M. S. D., Rosadas C., Susana A., Rhys H., Nans A., Masino L., Roustan C., Christodoulou E., Ulferts R., Wrobel A. G., Short C.-E., Fertleman M., Sanders R. W., Heaney J., Spyer M., Kjær S., Riddell A., Malim M. H., Beale R., MacRae J. I., Taylor G. P., Nastouli E., van Gils M. J., Rosenthal P. B., Pizzato M., McClure M. O., Tedder R. S., Kassiotis G., McCoy L. E., Doores K. J., Cherepanov P., SARS-CoV-2 can recruit a heme metabolite to evade antibody immunity. Sci. Adv. 7, eabg7607 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Kellogg E. H., Leaver-Fay A., Baker D., Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Sanders R. W., Moore J. P., Virus vaccines: Proteins prefer prolines. Cell Host Microbe 29, 327–333 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Feng S., Sekine S., Pessino V., Li H., Leonetti M. D., Huang B., Improved split fluorescent proteins for endogenous protein labeling. Nat. Commun. 8, 370 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Yuan M., Huang D., Lee C.-C. D., Wu N. C., Jackson A. M., Zhu X., Liu H., Peng L., van Gils M. J., Sanders R. W., Burton D. R., Reincke S. M., Prüss H., Kreye J., Nemazee D., Ward A. B., Wilson I. A., Structural and functional ramifications of antigenic drift in recent SARS-CoV-2 variants. Science 373, 818–823 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Claireaux M., Caniels T. G., de Gast M., Han J., Guerra D., Kerster G., van Schaik B. D., Jongejan A., Schriek A. I., Grobben M., Brouwer P. J., van der Straten K., Aldon Y., Capella-Pujol J., Snitselaar J. L., Olijhoek W., Aartse A., Brinkkemper M., Bontjer I., Burger J. A., Poniman M., Bijl T. P., Torres J. L., Copps J., Martin I. C., de Taeye S. W., de Bree G. J., Ward A. B., Sliepen K., van Kampen A. H., Moerland P. D., Sanders R. W., van Gils M. J., A public antibody class recognizes an S2 epitope exposed on open conformations of SARS-CoV-2 spike. Nat. Commun. 13, 4539 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Yewdell J. W., Antigenic drift: Understanding COVID-19. Immunity 54, 2681–2687 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Harvey W. T., Carabelli A. M., Jackson B., Gupta R. K., Thomson E. C., Harrison E. M., Ludden C., Reeve R., Rambaut A., Peacock S. J., Robertson D. L., SARS-CoV-2 variants, spike mutations and immune escape. Nat. Rev. Microbiol. 19, 409–424 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Telenti A., Arvin A., Corey L., Corti D., Diamond M. S., García-Sastre A., Garry R. F., Holmes E. C., Pang P. S., Virgin H. W., After the pandemic: Perspectives on the future trajectory of COVID-19. Nature 596, 495–504 (2021). [DOI] [PubMed] [Google Scholar]
- 55.Wang P., Nair M. S., Liu L., Iketani S., Luo Y., Guo Y., Wang M., Yu J., Zhang B., Kwong P. D., Graham B. S., Mascola J. R., Chang J. Y., Yin M. T., Sobieszczyk M., Kyratsous C. A., Shapiro L., Sheng Z., Huang Y., Ho D. D., Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7. Nature 593, 130–135 (2021). [DOI] [PubMed] [Google Scholar]
- 56.Chen R. E., Zhang X., Case J. B., Winkler E. S., Liu Y., VanBlargan L. A., Liu J., Errico J. M., Xie X., Suryadevara N., Gilchuk P., Zost S. J., Tahan S., Droit L., Turner J. S., Kim W., Schmitz A. J., Thapa M., Wang D., Boon A. C. M., Presti R. M., O’Halloran J. A., Kim A. H. J., Deepak P., Pinto D., Fremont D. H., Crowe J. E., Corti D., Virgin H. W., Ellebedy A. H., Shi P.-Y., Diamond M. S., Resistance of SARS-CoV-2 variants to neutralization by monoclonal and serum-derived polyclonal antibodies. Nat. Med. 27, 717–726 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.McCallum M., Bassi J., De Marco A., Chen A., Walls A. C., Di Iulio J., Tortorici M. A., Navarro M.-J., Silacci-Fregni C., Saliba C., Sprouse K. R., Agostini M., Pinto D., Culap K., Bianchi S., Jaconi S., Cameroni E., Bowen J. E., Tilles S. W., Pizzuto M. S., Guastalla S. B., Bona G., Pellanda A. F., Garzoni C., Van Voorhis W. C., Rosen L. E., Snell G., Telenti A., Virgin H. W., Piccoli L., Corti D., Veesler D., SARS-CoV-2 immune evasion by the B.1.427/B.1.429 variant of concern. Science 373, 648–654 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Mlcochova P., Kemp S. A., Dhar M. S., Papa G., Meng B., Ferreira I. A. T. M., Datir R., Collier D. A., Albecka A., Singh S., Pandey R., Brown J., Zhou J., Goonawardane N., Mishra S., Whittaker C., Mellan T., Marwal R., Datta M., Sengupta S., Ponnusamy K., Radhakrishnan V. S., Abdullahi A., Charles O., Chattopadhyay P., Devi P., Caputo D., Peacock T., Wattal C., Goel N., Satwik A., Vaishya R., Agarwal M.; Indian SARS-CoV-2 Genomics Consortium (INSACOG); Genotype to Phenotype Japan (G2P-Japan) Consortium; CITIID-NIHR BioResource COVID-19 Collaboration, Mavousian A., Lee J. H., Bassi J., Silacci-Fegni C., Saliba C., Pinto D., Irie T., Yoshida I., Hamilton W. L., Sato K., Bhatt S., Flaxman S., James L. C., Corti D., Piccoli L., Barclay W. S., Rakshit P., Agrawal A., Gupta R. K., SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion. Nature 599, 114–119 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Cameroni E., Bowen J. E., Rosen L. E., Saliba C., Zepeda S. K., Culap K., Pinto D., VanBlargan L. A., De Marco A., di Iulio J., Zatta F., Kaiser H., Noack J., Farhat N., Czudnochowski N., Havenar-Daughton C., Sprouse K. R., Dillen J. R., Powell A. E., Chen A., Maher C., Yin L., Sun D., Soriaga L., Bassi J., Silacci-Fregni C., Gustafsson C., Franko N. M., Logue J., Iqbal N. T., Mazzitelli I., Geffner J., Grifantini R., Chu H., Gori A., Riva A., Giannini O., Ceschi A., Ferrari P., Cippà P. E., Franzetti-Pellanda A., Garzoni C., Halfmann P. J., Kawaoka Y., Hebner C., Purcell L. A., Piccoli L., Pizzuto M. S., Walls A. C., Diamond M. S., Telenti A., Virgin H. W., Lanzavecchia A., Snell G., Veesler D., Corti D., Broadly neutralizing antibodies overcome SARS-CoV-2 Omicron antigenic shift. Nature 602, 664–670 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Singh Y., Fuloria N. K., Fuloria S., Subramaniyan V., Meenakshi D. U., Chakravarthi S., Kumari U., Joshi N., Gupta G., N-terminal domain of SARS CoV-2 spike protein mutation associated reduction in effectivity of neutralizing antibody with vaccinated individuals. J. Med. Virol. 93, 5726–5728 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61. Gobeil S. M.-C., Janowska K., McDowell S., Mansouri K., Parks R., Stalls V., Kopp M. F., Manne K., Li D., Wiehe K., Saunders K. O., Edwards R. J., Korber B., Haynes B. F., Henderson R., Acharya P., Effect of natural mutations of SARS-CoV-2 on spike structure, conformation, and antigenicity. Science 373, eabi6226 (2021).
- 62. Lok S.-M., An NTD supersite of attack. Cell Host Microbe 29, 744–746 (2021).
- 63. Javanmardi K., Segall-Shapiro T. H., Chou C.-W., Boutz D. R., Olsen R. J., Xie X., Xia H., Shi P.-Y., Johnson C. D., Annapareddy A., Weaver S., Musser J. M., Ellington A. D., Finkelstein I. J., Gollihar J. D., Antibody escape and cryptic cross-domain stabilization in the SARS-CoV-2 Omicron spike protein. Cell Host Microbe 30, 1242–1254.e6 (2022).
- 64. Wang S., Qiu Z., Hou Y., Deng X., Xu W., Zheng T., Wu P., Xie S., Bian W., Zhang C., Sun Z., Liu K., Shan C., Lin A., Jiang S., Xie Y., Zhou Q., Lu L., Huang J., Li X., AXL is a candidate receptor for SARS-CoV-2 that promotes infection of pulmonary and bronchial epithelial cells. Cell Res. 31, 126–140 (2021).
- 65. Gu Y., Cao J., Zhang X., Gao H., Wang Y., Wang J., He J., Jiang X., Zhang J., Shen G., Yang J., Zheng X., Hu G., Zhu Y., Du S., Zhu Y., Zhang R., Xu J., Lan F., Qu D., Xu G., Zhao Y., Gao D., Xie Y., Luo M., Lu Z., Receptome profiling identifies KREMEN1 and ASGR1 as alternative functional receptors of SARS-CoV-2. Cell Res. 32, 24–37 (2022).
- 66. Soh W. T., Liu Y., Nakayama E. E., Ono C., Torii S., Nakagami H., Matsuura Y., Shioda T., Arase H., The N-terminal domain of spike glycoprotein mediates SARS-CoV-2 infection by associating with L-SIGN and DC-SIGN. bioRxiv 2020.11.05.369264 [Preprint]. 5 November 2020. 10.1101/2020.11.05.369264.
- 67. Qing E., Li P., Cooper L., Schulz S., Jäck H.-M., Rong L., Perlman S., Gallagher T., Inter-domain communication in SARS-CoV-2 spike proteins controls protease-triggered cell entry. Cell Rep. 39, 110786 (2022).
- 68. Qing E., Kicmal T., Kumar B., Hawkins G. M., Timm E., Perlman S., Gallagher T., Dynamics of SARS-CoV-2 spike proteins in cell entry: Control elements in the amino-terminal domains. mBio 12, e0159021 (2021).
- 69. Wang Y., Lei R., Nourmohammad A., Wu N. C., Antigenic evolution of human influenza H3N2 neuraminidase is constrained by charge balancing. eLife 10, e72516 (2021).
- 70. Zhang J., Kobert K., Flouri T., Stamatakis A., PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
- 71. Cock P. J. A., Antao T., Chang J. T., Chapman B. A., Cox C. J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., de Hoon M. J. L., Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
- 72. Kabsch W., Sander C., Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
- 73. Joosten R. P., te Beek T. A. H., Krieger E., Hekkelman M. L., Hooft R. W. W., Schneider R., Sander C., Vriend G., A series of PDB related databases for everyday needs. Nucleic Acids Res. 39, D411–D419 (2011).
- 74. Tien M. Z., Meyer A. G., Sydykova D. K., Spielman S. J., Wilke C. O., Maximum allowed solvent accessibilites of residues in proteins. PLOS ONE 8, e80635 (2013).
- 75. Wu N. C., Yuan M., Bangaru S., Huang D., Zhu X., Lee C.-C. D., Turner H. L., Peng L., Yang L., Burton D. R., Nemazee D., Ward A. B., Wilson I. A., A natural mutation between SARS-CoV-2 and SARS-CoV determines neutralization by a cross-reactive antibody. PLOS Pathog. 16, e1009089 (2020).
- 76. Temmam S., Vongphayloth K., Baquero E., Munier S., Bonomi M., Regnault B., Douangboubpha B., Karami Y., Chrétien D., Sanamxay D., Xayaphet V., Paphaphanh P., Lacoste V., Somlor S., Lakeomany K., Phommavanh N., Pérot P., Dehan O., Amara F., Donati F., Bigot T., Nilges M., Rey F. A., van der Werf S., Brey P. T., Eloit M., Bat coronaviruses related to SARS-CoV-2 and infectious for human cells. Nature 604, 330–336 (2022).
- 77. Zhou H., Chen X., Hu T., Li J., Song H., Liu Y., Wang P., Liu D., Yang J., Holmes E. C., Hughes A. C., Bi Y., Shi W., A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein. Curr. Biol. 30, 2196–2203.e3 (2020).
- 78. Lam T. T.-Y., Jia N., Zhang Y.-W., Shum M. H.-H., Jiang J.-F., Zhu H.-C., Tong Y.-G., Shi Y.-X., Ni X.-B., Liao Y.-S., Li W.-J., Jiang B.-G., Wei W., Yuan T.-T., Zheng K., Cui X.-M., Li J., Pei G.-Q., Qiang X., Cheung W. Y.-M., Li L.-F., Sun F.-F., Qin S., Huang J.-C., Leung G. M., Holmes E. C., Hu Y.-L., Guan Y., Cao W.-C., Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature 583, 282–285 (2020).
- 79. Tao Y., Tong S., Complete genome sequence of a severe acute respiratory syndrome-related coronavirus from Kenyan bats. Microbiol. Resour. Announc. 8, e00548-19 (2019).
- 80. Ream D., Kiss A. J., NCBI/GenBank BLAST output XML parser tool (2013); www.semanticscholar.org/paper/NCBI-%2F-GenBank-BLAST-Output-XML-Parser-Tool-Ream-Kiss/3ead0ae31b91d3096369de11f3488024f752bdc5.
- 81. Katoh K., Standley D. M., MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
- 82. Hulo C., de Castro E., Masson P., Bougueleret L., Bairoch A., Xenarios I., Le Mercier P., ViralZone: A knowledge resource to understand virus diversity. Nucleic Acids Res. 39, D576–D582 (2011).
- 83. Rodrigues J. P. G. L. M., Teixeira J. M. C., Trellet M., Bonvin A. M. J. J., pdb-tools: A swiss army knife for molecular structures. F1000Research 7, 1961 (2018).