Summary
The receptor binding domain (RBD) of the SARS-CoV-2 spike glycoprotein mediates viral attachment to the ACE2 receptor and is a major determinant of host range and a dominant target of neutralizing antibodies. Here, we experimentally measure how all amino acid mutations to the RBD affect expression of folded protein and its affinity for ACE2. Most mutations are deleterious for RBD expression and ACE2 binding, and we identify constrained regions on the RBD’s surface that may be desirable targets for vaccines and antibody-based therapeutics. Nevertheless, a substantial number of mutations are well tolerated or even enhance ACE2 binding, including at ACE2 interface residues that vary across SARS-related coronaviruses. However, we find no evidence that these ACE2-affinity-enhancing mutations have been selected in current SARS-CoV-2 pandemic isolates. We present an interactive visualization and open analysis pipeline to facilitate use of our dataset for vaccine design and functional annotation of mutations observed during viral surveillance.
Keywords: deep mutational scanning, SARS-CoV-2, receptor-binding domain, ACE2
Highlights
• Measured effects on folding and ACE2 binding of all mutations to the SARS-CoV-2 RBD
• Provide open data and interactive visualization for vaccine design and surveillance
• Identify constrained surfaces as ideal targets for vaccines and antibody therapeutics
• Mutations that enhance ACE2 affinity exist but are not selected in pandemic isolates
Starr et al. systematically change every amino acid in the receptor binding domain (RBD) of the SARS-CoV-2 spike protein and determine the effects of the substitutions on RBD expression, folding, and ACE2 binding. The work identifies structurally constrained regions of the spike RBD that would be ideal targets for COVID-19 countermeasures and demonstrates that mutations in the virus that enhance ACE2 affinity can be engineered but have not, to date, been naturally selected during the pandemic.
Introduction
The SARS-related (sarbecovirus) subgenus of betacoronaviruses comprises a diverse lineage of viruses that circulate in bat reservoirs and spill over into other mammalian species (Figure 1A; Bolles et al., 2011; Cui et al., 2019). Sarbecoviruses initiate infection by binding to receptors on host cells via the viral spike protein. The entry receptor for both SARS-CoV-2 and the original SARS-CoV (which we refer to here as SARS-CoV-1) is the human cell-surface protein angiotensin converting enzyme 2 (ACE2). The receptor binding domain (RBD) of spike from both these viruses binds ACE2 with high affinity (Hoffmann et al., 2020; Letko et al., 2020; Li et al., 2003; Walls et al., 2020; Wrapp et al., 2020a). Because of its role in viral entry, the RBD is a major determinant of cross-species transmission and evolution (Becker et al., 2008; Frieman et al., 2012; Letko et al., 2020; Li, 2008; Li et al., 2005b; Qu et al., 2005; Ren et al., 2008; Sheahan et al., 2008a, 2008b; Wu et al., 2012). In addition, the RBD is the target of the most potent anti-SARS-CoV-2-neutralizing antibodies identified to date (Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Rogers et al., 2020; Seydoux et al., 2020; Shi et al., 2020; Wu et al., 2020b; Zost et al., 2020), and several promising vaccine candidates use the RBD as the sole antigen (Chen et al., 2020a, 2020b; Mulligan et al., 2020; Quinlan et al., 2020; Ravichandran et al., 2020; Yang et al., 2020; Zang et al., 2020).
Despite its important function, the RBD is highly variable among sarbecoviruses (Hu et al., 2017), reflecting the complex selective pressures shaping its evolution (Demogines et al., 2012; Frank et al., 2020; MacLean et al., 2020). Furthermore, RBD mutations have already appeared among SARS-CoV-2 pandemic isolates, including some near the ACE2-binding interface—but their impacts on receptor recognition and other biochemical phenotypes remain largely uncharacterized. Therefore, comprehensive knowledge of how mutations impact the SARS-CoV-2 RBD would aid efforts to understand viral evolution and guide the design of vaccines and other countermeasures.
To address this need, we used a quantitative deep mutational scanning approach (Adams et al., 2016; Fowler and Fields, 2014; Weile and Roth, 2018) to experimentally measure how all possible SARS-CoV-2 RBD amino acid mutations affect ACE2-binding affinity and protein expression (a correlate of protein folding stability). The resulting sequence-phenotype maps illuminate the forces that shape RBD evolution, quantify constraint on antibody epitopes, and suggest that purifying selection is the main force acting on RBD mutations observed in human SARS-CoV-2 isolates to date. To facilitate use of our measurements in immunogen design and viral surveillance, we provide interactive visualizations, an open analysis pipeline, and complete raw and processed data.
Results
Yeast Display of RBDs from SARS-CoV-2 and Related Sarbecoviruses
To enable rapid functional characterization of thousands of RBD variants, we developed a yeast-surface-display platform for measuring expression of folded RBD protein and its binding to ACE2 (Adams et al., 2016; Boder and Wittrup, 1997). This platform enables RBD expression on the cell surface of yeast (Figure 1B), where it can be assayed for ligand-binding affinity or protein expression levels, a close correlate of protein folding efficiency and stability (Kowalski et al., 1998a, 1998b; Shusta et al., 1999). Because yeast have protein-folding quality control and glycosylation machinery similar to mammalian cells, they add N-linked glycans at the same RBD sites as human cells (Chen et al., 2014), although these glycans are more mannose rich than mammalian-derived glycans (Hamilton et al., 2003). The yeast-expressed RBD from SARS-CoV-1 has similar antigenic and structural properties to the RBD expressed in mammalian cells (Chen et al., 2014, 2017, 2020a) and binds to ACE2 as expected (Chen et al., 2014).
To validate the yeast-display platform, we selected RBDs from the Wuhan-Hu-1 SARS-CoV-2 isolate and six related sarbecoviruses (Figure 1A). These other sarbecoviruses include the closest known relatives of SARS-CoV-2 from bats and pangolins (RaTG13 and GD-Pangolin), SARS-CoV-1 (Urbani strain) and a close bat relative (LYRa11), and two more distantly related bat sarbecoviruses (BM48-31 and HKU3-1). Based on prior work, all these RBDs are expected to bind human ACE2 except those from BM48-31 and HKU3-1 (Lam et al., 2020; Letko et al., 2020; Shang et al., 2020). We cloned the RBDs into a vector for yeast display, induced RBD expression, and incubated with varying concentrations of fluorescently labeled human ACE2 (Figure 1B). We then used flow cytometry to measure ACE2 binding across 11 ACE2 concentrations, enabling the calculation of a dissociation constant for the binding of each RBD to ACE2 (Figure 1C). Because we used ACE2 in its native dimeric form (Yan et al., 2020), we refer to the measured constants as apparent dissociation constants (KD,app), which are affected by binding avidity. We report log binding constants Δlog10(KD,app) relative to the wild-type SARS-CoV-2 RBD, polarized such that a positive value reflects stronger binding (Figure 1D).
All RBDs exhibited ACE2-binding affinities consistent with prior knowledge. We measure KD,app = 3.9 × 10⁻¹¹ M for the SARS-CoV-2 RBD (Figure 1C), which is tighter than affinities reported for monomeric ACE2 (Shang et al., 2020; Walls et al., 2020; Wrapp et al., 2020a) due to avidity effects caused by our use of native dimeric ACE2. Consistent with previous studies (Shang et al., 2020; Walls et al., 2020; Wrapp et al., 2020a), the SARS-CoV-1 RBD binds ACE2 with lower affinity than SARS-CoV-2 (Figures 1C and 1D). The SARS-CoV-1-related bat strain LYRa11 binds with even lower affinity, while the more distant bat RBDs (HKU3-1 and BM48-31) have no detectable binding. These measurements are consistent with the ability of these RBDs to enable viral particles to enter cells expressing human ACE2 (Letko et al., 2020; Figure 1D). Within the newly described SARS-CoV-2 clade, GD-Pangolin binds ACE2 with slightly higher affinity than SARS-CoV-2, while the bat isolate RaTG13 binds with two orders of magnitude lower affinity, consistent with prior reports (Shang et al., 2020; Wrobel et al., 2020). These results validate our yeast-surface-display platform for RBD affinity measurements and map variation in ACE2 affinity within the SARS-CoV-2 clade and the broader sarbecovirus subgenus.
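The sign convention for Δlog10(KD,app) can be illustrated with a minimal Python sketch. The function name is hypothetical and not taken from the published analysis pipeline; the wild-type constant is the KD,app reported above, and the variant value is an invented example chosen to bind 100-fold more weakly.

```python
import math

# Wild-type SARS-CoV-2 RBD apparent dissociation constant reported in the text (molar).
WILDTYPE_KD_APP_M = 3.9e-11


def delta_log10_kd(variant_kd_m, reference_kd_m=WILDTYPE_KD_APP_M):
    """Return delta-log10(KD,app) relative to the reference.

    The difference is oriented so that a positive value means tighter
    binding (a smaller dissociation constant) than the reference.
    """
    return math.log10(reference_kd_m) - math.log10(variant_kd_m)


# Invented example: a variant that binds 100-fold more weakly than wild type.
weaker = delta_log10_kd(3.9e-9)
```

Under this convention, a 100-fold weaker binder has a Δlog10(KD,app) of about −2, the scale of the difference described for RaTG13.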
Deep Mutational Scanning of All Amino Acid Mutations to the SARS-CoV-2 RBD
We next integrated the yeast-display platform with deep mutational scanning to determine how all amino acid mutations to the SARS-CoV-2 RBD impact expression and binding affinity for ACE2. We constructed two independent mutant libraries of the RBD using a PCR-based mutagenesis method that introduces all 19 mutant amino acids at each position (Bloom, 2014). To facilitate sequencing and obtain linkage among amino acid mutations within a single variant, we appended 16-nucleotide barcodes downstream of the coding sequence (Hiatt et al., 2010), bottlenecked each library to ∼100,000 barcoded variants, and linked each RBD variant to its barcode via long-read PacBio SMRT sequencing (Matreyek et al., 2018; Figure S1A). By examining the concordance of RBD variant sequences for barcodes sampled by multiple PacBio reads, we validated that this process correctly determined the sequence of >99.8% of the variants (Figure S1B). RBD variants contained an average of 2.7 amino acid mutations, with the number of mutations per variant roughly following a Poisson distribution (Figure S1C). Our libraries covered 3,804 of the 3,819 possible RBD amino acid mutations, of which 95.7% were present as the sole amino acid mutation in at least one barcoded variant (Figures S1D and S1E). To provide internal standards for our measurements, we spiked the mutant libraries with a barcoded panel of 11 unmutated sarbecovirus RBD homologs (strains in color in Figure 1A), including those tested in Figure 1C.
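The library statistics quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and treating the per-variant mutation count as exactly Poisson is an idealization of the "roughly Poisson" distribution described in the text.

```python
import math

# Figures quoted in the text.
N_POSSIBLE_MUTATIONS = 3819   # all possible single amino acid mutations
N_COVERED_MUTATIONS = 3804    # mutations present in the libraries
MUTANT_AAS_PER_SITE = 19      # non-wild-type amino acids at each position
MEAN_MUTATIONS_PER_VARIANT = 2.7

# Number of mutagenized positions implied by the totals above.
n_sites = N_POSSIBLE_MUTATIONS // MUTANT_AAS_PER_SITE

# Fraction of all possible mutations that the libraries cover.
coverage = N_COVERED_MUTATIONS / N_POSSIBLE_MUTATIONS


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


# Idealized fractions of unmutated and singly mutated variants.
frac_unmutated = poisson_pmf(0, MEAN_MUTATIONS_PER_VARIANT)
frac_single = poisson_pmf(1, MEAN_MUTATIONS_PER_VARIANT)
```

Under the Poisson idealization, fewer than one in five variants carries exactly one amino acid mutation, which is why a model that also uses multiply mutated variants is helpful for estimating single-mutant effects.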
To determine how mutations affect RBD expression and ACE2 binding, we combined fluorescence-activated cell sorting (FACS) with deep sequencing of variant barcodes (Adams et al., 2016; Peterman and Levine, 2016). To measure expression, we fluorescently labeled the RBD protein on the yeast surface via a C-terminal epitope tag and used FACS to collect ∼15 million cells from each library, partitioned into four bins from low to high expression (Figures 2A and S2A). We sequenced the barcodes from each bin and reconstructed each variant’s mean fluorescence intensity (MFI) from its distribution of reads across bins (Figure S2C). We represent expression as Δlog(MFI) relative to the unmutated SARS-CoV-2 RBD, such that a positive Δlog(MFI) indicates increased expression. To measure ACE2-binding affinity, we incubated yeast libraries that had been pre-sorted for RBD expression with 16 concentrations of fluorescently labeled ACE2 (10⁻⁶ to 10⁻¹³ M, and 0 M ACE2) and used FACS to collect >5 million RBD+ yeast cells at each concentration, partitioned into 4 bins from low to high ACE2 binding (Figures 2B and S2B). We again sequenced the barcodes from each bin, reconstructed the mean ACE2 binding of each variant at each concentration (Figure S2C), and used the resulting titration curves to infer dissociation constants KD,app (Figure S2D), which we represent as Δlog10(KD,app) relative to the unmutated SARS-CoV-2 RBD, with positive values indicating stronger binding.
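The logic of this sort-and-sequence readout can be sketched in a few lines of Python. This is a simplified illustration, not the published pipeline: the function names are hypothetical, the per-variant signal is summarized as a read-weighted mean bin index rather than a reconstructed fluorescence value, and the titration is fit by a plain grid search over log10(KD,app) with a one-site binding model, all of which are assumptions made for brevity.

```python
def mean_bin(read_counts):
    """Read-weighted mean bin index for one variant (bins numbered from 1)."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    return sum(i * n for i, n in enumerate(read_counts, start=1)) / total


def bound_fraction(conc_m, kd_m):
    """Fraction bound under a simple one-site binding model."""
    return conc_m / (conc_m + kd_m)


def fit_log10_kd(concs_m, responses, grid=None):
    """Estimate log10(KD) by grid search.

    For each candidate KD, the baseline and amplitude of
    response = baseline + amplitude * bound_fraction are obtained by
    ordinary least squares; the KD with the smallest squared error wins.
    """
    if grid is None:
        grid = [-13 + 0.01 * i for i in range(701)]  # log10(KD) from -13 to -6
    n = len(concs_m)
    mean_y = sum(responses) / n
    best_sse, best_log_kd = None, None
    for log_kd in grid:
        x = [bound_fraction(c, 10 ** log_kd) for c in concs_m]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # no variation in predicted binding; cannot fit
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, responses))
        amplitude = sxy / sxx
        baseline = mean_y - amplitude * mean_x
        sse = sum((baseline + amplitude * xi - yi) ** 2
                  for xi, yi in zip(x, responses))
        if best_sse is None or sse < best_sse:
            best_sse, best_log_kd = sse, log_kd
    return best_log_kd
```

In use, `mean_bin` would be applied to a variant's read counts at each ACE2 concentration, and the resulting values passed to `fit_log10_kd` together with the concentrations. A production analysis would also need to handle the open-ended boundary bins and low read counts, which this sketch ignores.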
These high-throughput measurements of expression and ACE2 binding were consistent with expectations about the effects of mutations. RBD variants containing premature stop codons universally failed to express folded full-length protein (Figure 2C). Unmutated variants and those with synonymous mutations had a tight distribution of neutral expression and binding measurements (Figures 2C and 2D). Variants containing amino acid mutations had a wide range of expression and binding phenotypes, with variants containing just one mutation tending to have milder functional defects than those with multiple mutations (Figures 2C and 2D). These trends are consistent with the fact that most mutations are deleterious to protein folding or function (Soskine and Tawfik, 2010); however, some mutated variants exhibit expression or binding that is comparable to or even higher than that of the parental SARS-CoV-2 RBD. The panel of RBD homologs from other sarbecovirus strains all expressed well but exhibited a wide range of ACE2-binding affinities (Figures 2C and 2D; Table S1), as expected since only some are derived from viruses that can enter cells using human ACE2 (Letko et al., 2020).
These measurements show that the RBD possesses considerable mutational tolerance (Figures 2C and 2D). For instance, 46% of single amino acid mutations to SARS-CoV-2 RBD maintain an affinity to ACE2 at least as high as that of SARS-CoV-1, suggesting that there is a substantial mutational space consistent with sufficient affinity to maintain human infectivity. Many single amino acid mutants also maintain expression comparable to unmutated SARS-CoV-2, indicating that a large mutational space is compatible with properly folded RBD protein.
We next aggregated the measurements on all variants to quantify the effects of individual amino acid mutations. Because many variants contain multiple mutations, we used global epistasis models to determine the effects of individual mutations from all singly and multiply mutated variants (Otwinowski et al., 2018; Figures S2E–S2K). The resulting single-mutant Δlog(MFI) and Δlog10(KD,app) measurements correlated well between the independent library duplicates (R² = 0.93 and 0.95, respectively; Figures 2E and 2F). Throughout the rest of this paper, we report single-mutant effects as the average of the duplicate measurements. Overall, we obtained expression measurements for 99.5% and binding measurements for 99.6% of all 3,819 single amino acid mutations.
Visualization of Sequence-to-Phenotype Maps
The complete measurements of how amino acid mutations affect expression and ACE2 binding represent rich sequence-to-phenotype maps for the RBD. We visualize the data in several ways. Figure 3 provides heatmaps that show how each mutation affects expression or ACE2 binding, with sites annotated by whether they contact ACE2, their relative solvent accessibility, and their amino acid identities in SARS-CoV-2 and SARS-CoV-1. Interactive versions of these heatmaps are in Data S1 and at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS and enable zooming, subsetting by functional annotations, and mouse-selection-based readouts of numerical measurements. As an alternative representation, Figure S3 provides logo plots that enable side-by-side comparison of how mutations affect expression and ACE2 binding. Finally, interactive structure-based visualizations using dms-view (Hilton et al., 2020) are at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures/ and project the effects of mutations onto a crystal structure of the ACE2-bound RBD (Lan et al., 2020) and a cryoelectron microscopy (cryo-EM) structure of the full spike ectodomain (Walls et al., 2020). The underlying raw data are in Table S2.
The sequence-phenotype maps reveal tremendous heterogeneity in mutational constraint across the RBD. Many sites are highly tolerant of mutations with respect to one or both of expression and ACE2 binding, while other sites are constrained to the wild-type amino acid. A substantial number of sites (e.g., 382–395) are tolerant of mutations with respect to ACE2 binding but are constrained with respect to expression—consistent with folding and stability being global constraints common to many sites (Fane et al., 1991; Poteete et al., 1997). There are also a handful of sites where ACE2 binding imposes strong constraint but expression does not (e.g., 489, 502, and 505). Moreover, at some sites there are mutations that clearly enhance expression or ACE2-binding affinity (blue colors in Figure 3).
Validation of Deep Mutational Scanning Measurements
We performed a series of experiments to confirm the dynamic range of our assays and their relevance for RBD expressed in mammalian cells or full spike trimer on pseudotyped lentiviral particles (Figures 4 and S4).
To validate the dynamic range of our deep mutational scanning, we re-cloned and tested RBD mutants in isogenic yeast-display assays. These experiments recapitulated the deep mutational scanning (Figures 4A–4C), including confirmation that some mutations enhance expression (V367F and G502D) or ACE2 affinity (N501F, N501T, and Q498Y) in the context of yeast-expressed RBD.
We next compared our deep mutational scanning to measurements on mammalian-expressed RBDs. We purified mammalian-expressed RBDs from six sarbecoviruses (SARS-CoV-2, SARS-CoV-1, WIV1, RaTG13, ZXC21, and ZC45) and measured their 1:1 binding affinities for monomeric human ACE2 using biolayer interferometry, which agreed with the measurements from our deep mutational scan (Figures 4D and S4A–S4F). Moreover, we observed that using a natively dimeric ACE2 enables detection of binding by the RaTG13 RBD, which can support ACE2-mediated cell entry (Shang et al., 2020) even though the 1:1 affinity is too weak to detect (Figure S4D).
We also validated that mutations enhancing yeast surface expression improve soluble yield and stability of mammalian-expressed RBD protein. We tested five expression-enhancing mutations and found that each greatly increased soluble RBD yield (2.3- to 4.8-fold increase; Figures 4E, 4F, and S4G). Four of the mutations also increased RBD stability (Figures 4G and S4H), including one (V367F) that increased the melting temperature by 3.9°C. All five mutations also maintained ACE2 binding and antigenicity (Figure S4I), suggesting they could be useful for enhancing production of RBD-based vaccine immunogens.
Finally, we validated the deep mutational scanning measurements in the context of spike-pseudotyped lentiviral particles (Figures 4H and S4J; Crawford et al., 2020). The trends observed for entry by the spike-pseudotyped lentiviral particles generally confirmed the deep mutational scanning: three of four mutations that were detrimental for RBD expression or ACE2 binding reduced pseudovirus entry, while a mutation that had little phenotypic effect in the deep mutational scan did not affect viral entry. We also tested two ACE2 affinity-enhancing mutations and found that both increased pseudovirus entry. Note that this result with single-cycle pseudovirus does not necessarily imply that these mutations would increase growth of authentic SARS-CoV-2, since multi-cycle viral replication often involves tuning of receptor affinity to simultaneously optimize viral attachment and release (Callaway et al., 2018; Hensley et al., 2009; Lang et al., 2020). Taken together, these experiments help validate the accuracy and relevance of the deep mutational scanning.
Interpreting Mutation Effects in the Context of the RBD Structure
To relate our sequence-phenotype maps to the RBD structure, we mapped the effects of mutations onto the ACE2-bound SARS-CoV-2 RBD crystal structure (Lan et al., 2020), coloring each residue’s Cα by the mean effect of mutations at that site on expression (Figure 5A) or binding (Figure 5B). Interactive structure-based visualizations of specific residue sets discussed in the following sections can be found at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures/.
The two subdomains of the RBD differ in mutational constraint on expression and binding. The core-RBD subdomain consists of a central beta sheet flanked by alpha-helices and presents a stably folded scaffold for the receptor binding motif (RBM, residues 437–508; Li et al., 2005a), which encodes ACE2 binding and receptor specificity (Letko et al., 2020). The RBM consists of a concave surface anchored by a β-hairpin and a disulfide bond stabilizing one of the lateral loops, which cradles the ACE2 α1 helix and a β-hairpin centered on ACE2 residue K353. Consistent with the modularity of core-RBD-encoded stability and RBM-encoded binding, constraint on expression primarily focuses on buried residues within the core-RBD (Figure 5A), while constraint on binding focuses on the RBM-proximal core-RBD in addition to the RBM itself (Figure 5B), particularly on RBM residues that contact ACE2 residues K31 and K353, which are “hotspots” of binding for SARS-CoV-1 and SARS-CoV-2 (Li, 2008; Shang et al., 2020; Wu et al., 2012).
Several ACE2-contact residues exhibit binding-stability tradeoffs, as has been seen in the active sites and binding interfaces of other proteins (Julian et al., 2017; Tokuriki et al., 2008; Wang et al., 2002). For example, several mutations to G502 enhance RBD expression (Figure 3A) but abolish binding (Figure 3B) due to steric clashes with ACE2 (Figure S5A). Similarly, mutations to polar amino acids enhance expression at interface residues Y449, L455, F486, and Y505 (Figure 3A), consistent with the destabilizing effect of surface-exposed hydrophobic patches (Schwehm et al., 1998)—but these hydrophobic residues form ACE2-packing contacts and are required for binding (Figures 3B and S5B).
However, our data also indicate that global RBD stability contributes to ACE2-binding affinity. In general, mutation effects on RBD binding and expression are correlated (Figures 5C and S5C), with residues that deviate from this trend clustering at the ACE2 interface (Figure 5C, cyan points). This correlation between expression and binding is consistent with studies on antibodies, where mutations that improve stability and rigidity accompany increases in binding affinity (Davenport et al., 2016; Ovchinnikov et al., 2018; Schmidt et al., 2013). Because ACE2 binding is influenced by both global RBD stability and interface-specific constraints, a site’s tolerance to mutation is better explained by its extent of burial in the ACE2-bound RBD structure than its burial in the free RBD structure alone (Figure S5D). The contribution of RBD stability to ACE2 binding may be influenced by other factors in the full spike trimer, though our measurements on pseudotyped lentiviral particles (Figure 4H) indicate that a destabilizing RBD mutation (C432D) reduces ACE2-mediated cellular entry in the context of spike trimer.
Our data also reveal the importance of other sequence features. For example, the four disulfide bonds in the RBD have varying tolerance to mutation (Figures 5A, 5B, and S5E), with the RBM C480:C488 disulfide completely constrained for ACE2 binding. The two RBD N-linked glycans contribute to RBD stability, as mutations that ablate the NxS/T glycosylation motif decrease RBD expression (Figure S5F). The SARS-CoV-1 RBD contains a third glycan, but its introduction at the homologous N370 in SARS-CoV-2 is mildly deleterious for expression (Figure S5F). However, there are other surface positions where introduction of NxS/T glycosylation motifs is tolerated or even beneficial for RBD expression (Figures S5G–S5I); adding glycans at some of these sites could be useful in resurfacing RBDs as antibody probes (Wu et al., 2010; Zhou et al., 2020c) or epitope-focused immunogens (Duan et al., 2018; Eggink et al., 2014; Jardine et al., 2016; Kulp et al., 2017; Weidenbacher and Kim, 2019).
Mutation Effects at ACE2 Contact Sites and Implications for Sarbecovirus Evolution
An initially surprising feature of SARS-CoV-2 was that its RBD tightly binds ACE2 despite differing in sequence from SARS-CoV-1 at many residues that had been defined as important for ACE2 binding (Andersen et al., 2020; Wan et al., 2020). Our map of mutational effects explains this observation by revealing remarkable degeneracy at ACE2 contact positions, with many interface mutations being tolerated or even enhancing affinity (Figure 5D). Mutations that enhance affinity are notable at RBD sites Q493, Q498, and N501. Although these residues are involved in a dense network of polar contacts with ACE2 (Shang et al., 2020; Figure 5E), our measurements show there is substantial plasticity in this network, as mutations that reduce the polar character of these residues can enhance affinity.
Within the SARS-CoV-2 clade of sarbecoviruses, our maps of mutational effects on binding explain variation in ACE2 affinity among different viruses. For example, GD-Pangolin has higher affinity for ACE2 than SARS-CoV-2 (Figures 1C and 2D), and this can be explained by the affinity-enhancing Q498H mutation present in this virus’s RBD sequence (Figure 5F). In contrast, RaTG13 has substantially lower affinity for ACE2 than SARS-CoV-2 (Figures 1C and 2D), consistent with the presence of affinity-decreasing mutations including Y449F and N501D (Figure 5F). The fact that differences in binding affinity of GD-Pangolin and RaTG13 are well explained by summing the effects of individual mutations relative to SARS-CoV-2 suggests that our deep mutational scanning is useful for sequence-based predictions of the ACE2-binding potential of future viruses isolated from the SARS-CoV-2 clade.
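The additive prediction described here amounts to summing single-mutant Δlog10(KD,app) values over the differences between a homolog and SARS-CoV-2. The Python sketch below illustrates the idea; the function name is hypothetical, and the numeric effects are placeholders invented for the example, not measurements from this study (the real values are in the supplementary data).

```python
def predict_delta_log10_kd(mutations, single_mutant_effects):
    """Additive prediction: sum the single-mutant effects of each mutation.

    `mutations` is a list of strings such as "N501D", and
    `single_mutant_effects` maps each mutation to its measured
    delta-log10(KD,app). Interactions between mutations are ignored.
    """
    missing = [m for m in mutations if m not in single_mutant_effects]
    if missing:
        raise KeyError("no measurement for: " + ", ".join(missing))
    return sum(single_mutant_effects[m] for m in mutations)


# PLACEHOLDER effects for illustration only; these are NOT values from the paper.
example_effects = {"Q498H": 0.3, "Y449F": -0.5, "N501D": -1.5}

# Hypothetical homolog differing from SARS-CoV-2 by two affinity-decreasing mutations.
predicted = predict_delta_log10_kd(["Y449F", "N501D"], example_effects)
```

Because the sum ignores interactions between mutations, a prediction of this kind is only as good as the assumption that effects combine independently.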
In contrast, the ACE2-binding interface of SARS-CoV-1 has many more mutations relative to SARS-CoV-2, and this increased divergence causes shifts in the actual effects of mutations on ACE2 binding. In particular, our deep mutational scanning shows that most SARS-CoV-1 amino acid states are individually deleterious in SARS-CoV-2, despite being compatible with high-affinity binding by SARS-CoV-1 (Figure 5F). This shift in the effects of mutations between more distantly related RBDs is consistent with studies of protein evolution demonstrating that epistatic entrenchment causes amino acid preferences to change as proteins diverge (Hilton and Bloom, 2018; Lee et al., 2018; Pollock et al., 2012; Povolotskaya and Kondrashov, 2010; Shah et al., 2015; Starr and Thornton, 2016; Starr et al., 2018). Therefore, our current SARS-CoV-2 deep mutational scanning data are likely to be most useful for predicting the effects of mutations to RBDs closely related to that of SARS-CoV-2.
Mutational Constraint of Antibody Epitopes
The RBD is the dominant target of neutralizing antibodies to SARS-CoV-2 (Brouwer et al., 2020; Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Premkumar et al., 2020; Rogers et al., 2020; Suthar et al., 2020; Yuan et al., 2020a; Zhang et al., 2020; Zost et al., 2020). It is unclear to what extent the RBD will evolve to escape such antibodies in a manner reminiscent of some other viruses (Smith et al., 2004; Trkola et al., 2005), although in vitro studies suggest that SARS-CoV-2 and SARS-CoV-1 RBDs are capable of fixing mutations that escape neutralizing antibodies (Baum et al., 2020; Rockx et al., 2010). To better define the RBD’s evolutionary capacity for antibody escape, we examined mutational constraint in the epitopes of antibodies that bind the SARS-CoV-1 or SARS-CoV-2 RBD (Figures 6A, S6A, and S6B; Hwang et al., 2006; Pak et al., 2009; Pinto et al., 2020; Prabakaran et al., 2006; Walls et al., 2019; Wrapp et al., 2020b; Wu et al., 2020b; Yuan et al., 2020b).
Many antibodies have epitopes that overlap the RBD ACE2-contact interface and are therefore strongly constrained by mutation effects on binding. For instance, antibodies B38 and 80R engage the two constrained patches that comprise the ACE2-binding interface, while S230, F26G19, and m396 engage either one of these ACE2-binding patches. However, none of the currently characterized antibodies have epitopes as constrained as the ACE2-contact surface itself (Figure 6B), suggesting further epitope focusing could be achieved. The importance of such focusing is demonstrated by a recent study that identified RBD mutations enabling escape from RBM-directed neutralizing antibodies (Baum et al., 2020)—our data indicate that the escape occurs at sites that have high mutational tolerance (Figures S6C and S6D).
Epitopes of core-RBD-directed antibodies tend to be mutationally constrained with respect to expression rather than binding (Figures 6A and 6B). These core-RBD epitopes are conserved across the sarbecovirus alignment (Figure S6E), explaining the possible cross-reactivity of these antibodies between SARS-CoV-1 and SARS-CoV-2 (Huo et al., 2020; Pinto et al., 2020; Wrapp et al., 2020b). Although residues in these epitopes are constrained for stability even in our measurements on the isolated RBD, some of them likely exhibit additional constraint due to quaternary contacts in the full spike trimer (Walls et al., 2020; Wrapp et al., 2020a; Yuan et al., 2020b). We identified an additional core-RBD patch centered on residue E465 that is also mutationally constrained (Figure 6C) and evolutionarily conserved (Figure S6E) but is not targeted by any currently known antibody and might represent a promising target.
Taken together, our results identify multiple mutationally constrained patches on the RBD surface that can be targeted by antibodies. These findings provide a framework that could inform the formulation of antibody cocktails aiming to limit the emergence of viral escape mutants (Baum et al., 2020; Pinto et al., 2020; Wu et al., 2020b; Zost et al., 2020), particularly if deep mutational scanning approaches like our own are extended to define antibody epitopes in functional as well as structural terms (Dingens et al., 2019).
Using Sequence-Phenotype Maps to Interpret Genetic Variation in SARS-CoV-2
An important question is whether any mutations that have appeared in circulating SARS-CoV-2 isolates have functional consequences. Despite intense interest in this question, experimental work to characterize the effects of SARS-CoV-2 mutations has lagged far behind their identification in viral sequences. Our comprehensive maps of the phenotypic effects of mutations provide a direct way to interpret the impact of current and future genetic variation in the SARS-CoV-2 RBD.
To assess the phenotypic impacts of mutations that have appeared in the SARS-CoV-2 RBD to date, we downloaded all 31,570 spike sequences available from GISAID (Elbe and Buckland-Merrett, 2017) on May 27, 2020 and identified RBD amino acid mutations present in high-quality clinical isolates. All observed RBD mutations are at low frequency, with 56 of the 98 observed mutations present only in a single sequence. The observed mutations are significantly less deleterious for ACE2 binding and RBD expression than random single-nucleotide-accessible mutations (Figures 7A, S7A, and S7B; p < 10⁻⁶, permutation tests), consistent with the action of purifying selection. Purifying selection against deleterious mutations is especially apparent for mutations that are observed multiple times in circulating variants, with a substantial number of singletons being mildly or moderately deleterious, whereas mutations observed multiple times are largely neutral. This general pattern of increased purifying selection on more common mutations is consistent with theoretical expectation and empirical patterns observed for other viruses (Pybus et al., 2007; Xue and Bloom, 2020).
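A permutation test of this kind can be sketched as follows. This is a generic illustration, not the exact procedure used in the study: the function name, the choice of the mean as test statistic, the number of draws, and the add-one correction are all assumptions.

```python
import random


def permutation_p_value(observed_effects, pool_effects, n_draws=10000, seed=0):
    """One-sided permutation test on the mean effect.

    Repeatedly draws random sets of mutations (same size as the observed
    set, without replacement) from the pool of single-nucleotide-accessible
    mutations, and returns the fraction of draws whose mean effect is at
    least as high (least deleterious) as the observed mean. An add-one
    correction keeps the estimate strictly above zero.
    """
    rng = random.Random(seed)
    k = len(observed_effects)
    observed_mean = sum(observed_effects) / k
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(pool_effects, k)
        if sum(draw) / k >= observed_mean:
            hits += 1
    return (hits + 1) / (n_draws + 1)
```

With this construction, the smallest p value that can be reported is 1 / (n_draws + 1), so resolving a p value below 10⁻⁶ requires more than a million random draws.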
Our discovery of affinity-enhancing mutations to the SARS-CoV-2 RBD raises the question of whether positive selection favors such mutations, since the relationship between receptor affinity and fitness can be complex for viruses that are well adapted to their hosts (Callaway et al., 2018; Hensley et al., 2009; Lang et al., 2020). Affinity-enhancing mutations are accessible via single-nucleotide mutation from SARS-CoV-2 (Figure S7C), but none are observed among circulating viral sequences (Figure 7A), and observed mutations do not enhance ACE2 affinity more than randomly drawn samples of single-nucleotide mutations (Figure S7D). Taken together, we see no clear evidence of selection for stronger ACE2 binding, consistent with SARS-CoV-2 already possessing adequate ACE2 affinity at the beginning of the pandemic.
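Whether an amino acid mutation is accessible by a single nucleotide change depends on the wild-type codon. The Python sketch below enumerates the accessible amino acids for a given codon using the standard genetic code; the function name is hypothetical, and the asparagine codon AAT is used purely as an example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_accessible(codon):
    """Amino acids reachable from `codon` by changing exactly one nucleotide.

    Synonymous changes and stop codons are excluded from the result.
    """
    codon = codon.upper()
    wild_type = CODON_TABLE[codon]
    accessible = set()
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            aa = CODON_TABLE[mutant]
            if aa != wild_type and aa != "*":
                accessible.add(aa)
    return accessible


# Example: amino acids one nucleotide away from the asparagine codon AAT.
from_aat = single_nucleotide_accessible("AAT")
```

From AAT, tyrosine and aspartate are one nucleotide away whereas phenylalanine is not, which illustrates why only a subset of the measured mutations at a site is immediately available to the virus.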
Last, we validated our deep mutational scanning for mutations that are especially prevalent among naturally occurring sequences in GISAID. The deep mutational scanning suggests small phenotypic effects for the most prevalent mutations, with the exception of V367F, which substantially enhances expression (Figure 7B). We re-cloned and tested most of these prevalent mutations for expression and ACE2 binding in isogenic yeast-display assays. Consistent with the deep mutational scanning, the only large phenotypic effect was increased expression of V367F (Figures 7C and 7D), which we also validated enhances thermal stability of mammalian-expressed RBD (Figures 4G and S4H). The relevance of V367F’s stability-enhancing effect for viral fitness is unclear, though this mutation has independently arisen multiple times (van Dorp et al., 2020). We also validated that N439K, the most prevalent RBD mutation, which may have a very slight affinity-enhancing effect (Figures 7B and 7C), has no measurable impact on entry of spike-pseudotyped lentiviral particles (Figure 4H). Taken together, our results suggest that there is little phenotypic diversity in ACE2 binding among circulating variants at this early stage of the pandemic—although it will be interesting to use our maps to continually assess the phenotypic effects of future mutations as the virus evolves.
Discussion
Vast numbers of viral genomes have been sequenced in almost real-time during the SARS-CoV-2 pandemic. These genomic sequences have been useful for understanding viral emergence and spread (Andersen et al., 2020; Bedford et al., 2020; Fauver et al., 2020), but the lack of corresponding high-throughput functional characterization means that speculation has outpaced experimental data when it comes to understanding the phenotypic consequences of mutations. Here, we take a step toward providing phenotypic maps commensurate with the scale of genomic data by experimentally characterizing how all amino acid mutations to the RBD affect the expression of folded protein and its affinity for ACE2, two key factors for viral fitness. These maps show that RBD mutations that have appeared in SARS-CoV-2 to date are nearly neutral with respect to these two biochemical phenotypes, with the exception of one mutation (V367F) that increases RBD stability. Notably, there has been no selection to date for any of the evolutionarily accessible mutations that enhance ACE2 binding affinity. The genetic diversity of SARS-CoV-2 is likely to increase as it continues to circulate in the human population, and so our phenotypic maps should become increasingly valuable for viral surveillance as mutations accumulate over time.
It is important to remember that our maps define biochemical phenotypes of the RBD, not how these phenotypes relate to viral fitness. There are many complexities in the relationship between biochemical phenotypes of yeast-displayed RBD and viral fitness. First, there are subtle differences in glycan structures between yeast and human cells (Hamilton et al., 2003), though the overall role of glycans in RBD stability is preserved in yeast systems (Chen et al., 2014). Second, the RBD is just one domain of the viral spike, which engages in complex dynamic movements to mediate viral entry (Huo et al., 2020; Walls et al., 2019, 2020; Wrapp et al., 2020b). Finally, spike-mediated entry is just one component of fitness, which involves a myriad of incompletely understood factors that determine how well a virus spreads from one human to another (Kutter et al., 2018). To some degree, these caveats are universal to experimental studies, as even sophisticated animal models are imperfect proxies for true fitness (Louz et al., 2013), but they are especially true for basic biochemical phenotypes like the ones we measure. However, on a hopeful note, our measurements correlate well with cellular entry by spike-pseudotyped viral particles expressing sarbecovirus RBD homologs (Figure 1D) and single mutants of the SARS-CoV-2 RBD (Figure 4H). Fitness ultimately arises from the concerted action of biochemical phenotypes, which are in turn determined by genotype (Dean and Thornton, 2007; Harms and Thornton, 2013; Russell et al., 2014). By making the first link from mutations to biochemical phenotypes, we have taken a step toward enabling better interpretation of viral genetic variation.
One important area where our maps do have clear relevance is assessing the potential for SARS-CoV-2 to undergo antigenic drift by fixing mutations at sites targeted by antibodies, as occurs for some other viruses such as influenza (Smith et al., 2004). The RBD is the dominant target of neutralizing antibodies (Cao et al., 2020; Ju et al., 2020; Pinto et al., 2020; Rogers et al., 2020; Seydoux et al., 2020; Shi et al., 2020; Wu et al., 2020b; Zost et al., 2020), and so any antigenic drift will be constrained by its mutational tolerance. Our results show that many mutations to the RBD are well tolerated with respect to both protein folding and ACE2 binding. However, the ACE2 binding interface is more constrained than most of the RBD’s surface, which could limit viral escape from antibodies that target this interface (Rockx et al., 2010). In this respect, our maps enable several important observations. First, no characterized antibodies have epitopes as constrained as the actual RBD surface that contacts ACE2, suggesting that there is room for epitope focusing to minimize viral escape. Second, there are a number of RBD mutations that enhance ACE2 affinity, which implies evolutionary potential for compensation of deleterious mutations in the ACE2 interface in a manner reminiscent of multi-step escape pathways that have been described for other viruses (Bloom et al., 2010; Friedrich et al., 2004; Gong et al., 2013; Lynch et al., 2015; Wu et al., 2017). It should be possible to shed further experimental light on the potential for antigenic drift by extending our deep mutational scanning methodology to directly map immune-escape mutations as has been done for other viruses (Dingens et al., 2019; Lee et al., 2019; Wu et al., 2020a).
RBD-based antigens represent a promising vaccine approach (Chen et al., 2020a, 2020b; Mulligan et al., 2020; Quinlan et al., 2020; Ravichandran et al., 2020; Zang et al., 2020). Our sequence-phenotype maps can directly inform efforts to engineer such vaccines in several ways. First, we identify many mutations that enhance RBD expression and thermal stability, a desirable property in vaccine immunogens. Second, our maps show which mutations can be introduced into the RBD without disrupting key biochemical phenotypes, thereby opening the door to resurfacing immunogens to focus antibodies on specific epitopes (Duan et al., 2018; Eggink et al., 2014; Jardine et al., 2016; Kulp et al., 2017; Weidenbacher and Kim, 2019; Wu et al., 2010). Finally, our maps show which surfaces of the RBD are under strong constraint and might thereby be targeted by structure-guided vaccines to stimulate immunity with breadth across the sarbecovirus clade: in addition to the ACE2 interface itself, these surfaces include several core-RBD patches targeted by currently described antibodies and a previously undescribed core-RBD patch surrounding residue E465.
Finally, our work should be useful for understanding the evolution of sarbecoviruses more broadly, including the potential for more spillovers into the human population. There is a dizzying diversity of RBD genotypes and phenotypes among sarbecoviruses within bat reservoirs (Boni et al., 2020; Demogines et al., 2012; Frank et al., 2020; Hu et al., 2017; Latinne et al., 2020; Letko et al., 2020; MacLean et al., 2020). A prerequisite for these viruses to jump to humans is the ability to efficiently bind human receptors (Becker et al., 2008; Letko et al., 2020; Menachery et al., 2015, 2016). Our maps are immediately useful in assessing the effects on ACE2 binding of mutations to viruses within the SARS-CoV-2 clade, and extensions to account for epistasis and genetic background could further inform understanding of the evolutionary trajectories that enable sarbecoviruses to efficiently infect human cells.
STAR★Methods
Key Resources Table
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Antibodies | ||
FITC-conjugated chicken anti-cMyc antibody | Immunology Consultants Laboratory, Inc. | Cat# CMYC-45F |
PE-conjugated streptavidin | Thermo Fisher | Cat# S866 |
Chemicals, Peptides, and Recombinant Proteins | ||
Biotinylated human ACE2 | ACROBiosystems | Cat# AC2-H82E6 |
Critical Commercial Assays | ||
HIV-1 p24 Antigen Capture Assay | Advanced Bioscience Laboratories, Inc. | Cat# 5421 |
Deposited Data | ||
Raw sequencing data | This paper | NCBI SRA: BioProject PRJNA639956 |
GISAID EpiCoV SARS-CoV-2 sequence isolates | GISAID | Full list of contributing labs and accessions: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/alignments/Spike_GISAID/gisaid_hcov-19_acknowledgement_table.xls |
Sarbecovirus homolog RBD sequences | NCBI | NCBI GenBank: MN908947, MN996532, AY278741, KF367457, KF569996, DQ071615, DQ022305, DQ412042, MG772934, MG772933, NC014470 |
GD-Pangolin RBD sequence | Lam et al., 2020 | N/A |
ACE2-bound RBD structures | Li et al., 2005a; Lan et al., 2020 | PDB 2AJF, 6M0J |
Antibody-bound RBD structures | Yuan et al., 2020b; Wrapp et al., 2020b; Prabakaran et al., 2006; Pak et al., 2009; Hwang et al., 2006; Wu et al., 2020b; Walls et al., 2019; Pinto et al., 2020 | PDB 6W41, 6WAQ, 2DD8, 3BGF, 2GHW, 7BZ5, 6NB6, 6NB7, 6WPS |
Experimental Models: Cell Lines | ||
Saccharomyces cerevisiae strain AWY101 | Wentz and Shusta 2007 | AWY101 |
Human Embryonic Kidney cells (HEK293T) | ATCC | ATCC CRL-3216 |
Human Embryonic Kidney cells expressing human ACE2 (HEK293T-hACE2) | BEI | BEI NR-52511 |
Expi293F cells | Thermo Fisher | Cat# A14527 |
FreeStyle 293F cells | Thermo Fisher | Cat# R79007 |
Oligonucleotides | ||
primers for RBD library mutagenesis | Integrated DNA Technologies | sequences given at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/primers/mutational_lib/SARS-CoV-2_RBD_NNSprimers.txt |
primers for RBD library construction and Illumina sequencing | Integrated DNA Technologies | sequences given at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/primers/primers.csv |
Recombinant DNA | ||
plasmid 2649: pETcon_SARS-CoV-2_RBD | This paper | sequence at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/plasmid_maps/2649_pETcon-SARS-CoV-2-RBD-201aa.gb |
plasmid 2736: HDM_IDTSpike_EcoKozak | This paper | sequence at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/plasmid_maps/2736_HDM_IDTSpike_EcoKozak.gb |
pHAGE2-CMV-ZsGreen-W | BEI | BEI Resources NR-52520 |
HDM-Hgpm2 | BEI | BEI Resources NR-52517 |
pRC-CMV-Rev1b | BEI | BEI Resources NR-52519 |
HDM-tat1b | BEI | BEI Resources NR-52518 |
Software and Algorithms | ||
ccs, version 4.2.0 | Pacific Biosciences | https://github.com/PacificBiosciences/ccs |
alignparse, version 0.1.3 | Crawford and Bloom, 2019 | https://github.com/jbloomlab/alignparse |
minimap2, version 2.17 | Li 2018 | https://github.com/lh3/minimap2 |
dms_variants, version 0.6.0 | GitHub | https://jbloomlab.github.io/dms_variants/ |
custom code | This paper | all analyses provided on github: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS |
Other | ||
SARS-CoV-2 RBD mutant libraries | This paper | N/A |
Resource Availability
Lead Contact
Further information and requests for reagents and resources should be directed to and will be fulfilled by the Lead Contact, Jesse Bloom (jbloom@fredhutch.org).
Materials Availability
SARS-CoV-2 mutant libraries generated in this study will be made available on request by the Lead Contact with a completed Materials Transfer Agreement.
Data and Code Availability
We provide all data and code in the following ways:
- Raw data tables of our replicate functional scores at the level of single mutations (Table S2 and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/single_mut_effects.csv)
- Raw data tables of our replicate functional scores among sarbecovirus homologs (Table S1 and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/homolog_effects.csv)
- Interactive heatmaps for lookup of individual mutational effects and related information (Data S1 and GitHub: https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/)
- Illumina sequencing counts for each barcode among FACS bins (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/counts/variant_counts.csv)
- The complete variant:barcode lookup table (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/variants/codon_variant_table.csv)
- The complete computational workflow to generate and analyze these data, including reproducible code within a programmatically constructed computational environment (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS)
- A Markdown summary of the organization of analysis steps, with links to key data files and Markdown summaries of each step in the analysis pipeline (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/summary.md), with specific Markdown summaries linked in the relevant STAR Methods sections below
- All raw sequencing data, uploaded to the NCBI Sequence Read Archive: BioProject PRJNA639956
Experimental Model and Subject Details
Saccharomyces cerevisiae strain AWY101 (Wentz and Shusta, 2007) was cultured at 30°C (except where indicated) in baffled flasks while shaking at 275 rpm. Selective media contained 6.7 g/L Yeast Nitrogen Base, 5.0 g/L Casamino acids, 1.065 g/L MES, and 2% w/v carbon source (dextrose for routine maintenance, galactose supplemented with 0.1% dextrose for RBD induction). HEK293T cells (ATCC CRL-3216) were cultured in D10 growth media (DMEM with 10% heat-inactivated FBS, 2 mM L-glutamine, 100 U/mL penicillin, and 100 μg/mL streptomycin) at 37°C in a humidified 5% CO2 incubator. Expi293F (Thermo Fisher Cat No. A14527) and FreeStyle 293F (Thermo Fisher Cat No. R79007) suspension cells were grown at 37°C in a humidified 8% CO2 incubator rotating at 130 rpm. Cell lines were not authenticated.
Method Details
RBD cloning
The Spike receptor binding domain (RBD) from SARS-CoV-2 (isolate Wuhan-Hu-1, GenBank : MN908947, residues N331-T531) and additional sarbecovirus homologs (RaTG13, GenBank: MN996532; GD-Pangolin consensus from Lam et al. (2020); SARS-CoV-1 Urbani, GenBank: AY278741; WIV1, GenBank: KF367457 (identical RBD sequence to WIV16); LYRa11, GenBank: KF569996; Rp3, GenBank: DQ071615; HKU3-1, GenBank: DQ022305; Rf1, GenBank: DQ412042; ZXC21, GenBank: MG772934; ZC45, GenBank: MG772933; and BM48-31, GenBank: NC014470) were ordered as yeast codon-optimized gBlocks (IDT) and cloned into the pETcon yeast surface-display expression vector. The destination vector was modified downstream of the yeast surface-display fusion construct to include a barcode landing pad for subsequent library generation, along with Illumina sequencing priming handles for downstream barcode sequencing and NotI digestion sites for downstream PacBio sequencing preparation. This plasmid sequence is provided on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/plasmid_maps/2649_pETcon-SARS-CoV-2-RBD-201aa.gb.
Isogenic yeast display induction and titration
RBD variant plasmids were transformed into the AWY101 Saccharomyces cerevisiae strain (Wentz and Shusta, 2007), selecting for the plasmid Trp auxotrophic marker on SD-CAA selective plates (6.7g/L Yeast Nitrogen Base, 5.0g/L Casamino acids, 1.065 g/L MES acid, and 2% w/v dextrose). Single colonies were inoculated into 1.5mL liquid SD-CAA media, and grown overnight at 30°C. Then 1 OD unit of yeast were back-diluted into 1.5mL SG-CAA+0.1%D induction media (2% w/v galactose supplemented with 0.1% dextrose), and incubated for 16-18 hours at room temperature.
Induced cells were spun down at 250,000 cells per sample and washed in PBS-BSA (0.2 mg/mL). Samples were resuspended in primary labeling solutions across a range of concentrations of biotinylated human ACE2 ectodomain (ACROBiosystems AC2-H82E6), which contains its natural dimerization domain. Primary labeling reactions were conducted in sufficient reaction volumes for each concentration to avoid ligand depletion effects of greater than 10%. For instance, the lowest sample concentration of 10⁻¹³ M was scaled to 25 mL, at which volume 2.9% of total ligand molecules are estimated to be titrated in RBD:ACE2 complexes given the wild-type KD,app and an estimated 50,000 surface RBDs per cell (Boder and Wittrup, 1997). Following overnight equilibration of ACE2 binding at room temperature, cells were washed in ice-cold PBS-BSA, and resuspended in PBS-BSA containing 1:200 diluted FITC-conjugated anti c-Myc antibody (Immunology Consultants Lab, CMYC-45F) to label for RBD surface expression via a C-terminal c-Myc epitope tag, and 1:200 diluted PE-conjugated streptavidin (Thermo Fisher S866) to detect bound biotinylated ACE2 ligand. Following 1 hour of secondary labeling at 4°C, cells were washed twice in ice-cold PBS-BSA, and resuspended in PBS.
RBD surface expression and ACE2-binding levels were determined via flow cytometry using a BD LSRFortessa X-50. For flow cytometry, 10,000 cells were analyzed at each ACE2 concentration across a titration series. Cells were gated to select for singleton events, FITC labeling was used to subset RBD+ cells, and PE labeling was measured within this FITC+ population. To mimic the subsequent library sorting experiments in which we are blinded to exact PE fluorescence within a given PE fluorescence bin (since we only sequence barcodes within a bin), we analyzed isogenic titration data by drawing equivalent bins of PE fluorescence that capture 95% of unbound unmutated SARS-CoV-2 cells (bin1), 95% of saturated SARS-CoV-2 cells (bin4), and a bin2/bin3 boundary evenly spaced on the log-scale between the boundaries of the bin1 and bin4 partitions (see Figure 2B). For each ACE2 concentration, we determine the mean bin of PE fluorescence as a simple weighted mean value across integer-weighted bins:
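$$\text{mean bin}_{[\mathrm{ACE2}]} = \frac{\sum_{i=1}^{4} i \, n_{i,[\mathrm{ACE2}]}}{\sum_{i=1}^{4} n_{i,[\mathrm{ACE2}]}}$$

where $n_{i,[\mathrm{ACE2}]}$ is the number of cells that fall into bin $i$ at a given ACE2 concentration, and $i$ is the simple integer value of a bin from 1 to 4.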
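As a small illustration of the weighted mean bin defined above, the following sketch computes it from per-bin cell counts at one ACE2 concentration. The function name is hypothetical and the code is not taken from the authors' pipeline.

```python
import numpy as np

def mean_bin(cell_counts):
    """Weighted mean bin for one ACE2 concentration.

    cell_counts: number of cells in bins 1..4, in order.
    """
    counts = np.asarray(cell_counts, dtype=float)
    bins = np.arange(1, counts.size + 1)  # integer bin weights 1, 2, 3, 4
    return float(np.sum(bins * counts) / np.sum(counts))
```

A sample with all cells in bin 1 has a mean bin of 1.0, one with all cells in bin 4 has a mean bin of 4.0, and an even split across the four bins gives 2.5.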
We determined the binding constant KD,app describing the affinity of each RBD variant for human ACE2 ligand along with free parameters a (titration response range) and b (titration curve baseline) via nonlinear least-squares regression using a standard non-cooperative Hill equation relating the mean bin response variable to the ACE2 labeling concentration:
$$\text{mean bin}_{[\mathrm{ACE2}]} = a \times \frac{[\mathrm{ACE2}]}{[\mathrm{ACE2}] + K_{D,\mathrm{app}}} + b$$

We report apparent KD values (KD,app) that do not take into account the stoichiometry of the multivalent yeast-displayed RBD interaction with dimeric ACE2. Following this “apparent” nomenclature, we report ACE2 concentrations as molarity of the monomeric subunit. Computational notebooks detailing the fits of all isogenic RBD titrations are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/isogenic_titrations/homolog_validations.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/isogenic_titrations/point-mut-validations.md).
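The nonlinear least-squares fit of a non-cooperative Hill curve to mean-bin data can be sketched as follows. This is a minimal illustration using SciPy rather than the authors' own notebooks. The function names, starting guesses, and parameter bounds are assumptions, and fitting log10(KD,app) instead of KD,app directly is a choice made here for numerical stability across many orders of magnitude.

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, a, b, kd):
    """Non-cooperative Hill curve: response range a, baseline b, constant kd."""
    return a * conc / (conc + kd) + b

def fit_kd(conc, mean_bins):
    """Fit a, b and KD,app to mean-bin values measured at molar ACE2 concentrations."""
    conc = np.asarray(conc, dtype=float)
    y = np.asarray(mean_bins, dtype=float)

    # Assumption: parameterize by log10(KD) so the optimizer takes sensible steps.
    def model(c, a, b, log_kd):
        return hill(c, a, b, 10.0 ** log_kd)

    p0 = [3.0, 1.0, -9.0]                           # assumed starting guesses
    bounds = ([0.0, 0.0, -15.0], [5.0, 5.0, -5.0])  # assumed search limits
    popt, _ = curve_fit(model, conc, y, p0=p0, bounds=bounds)
    a, b, log_kd = popt
    return a, b, 10.0 ** log_kd
```

Because the mean bin ranges from 1 to 4, a well-behaved titration is expected to give a baseline b near 1 and a response range a of at most 3.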
Library mutagenesis
Mutagenesis of the SARS-CoV-2 RBD was performed in two independent replicates via the method described in Bloom (2014), with the modifications that primer lengths were adjusted to ensure equal melting temperatures as described in Dingens et al. (2017) and that we used NNS rather than NNN primers. Our general library generation and sequencing workflow is outlined in Figure S1A. Briefly, we designed mutagenic primers containing degenerate NNS codons that tile across the SARS-CoV-2 RBD, which were ordered as oPools from Integrated DNA Technologies. The script used to design the mutagenic primers and the resulting primer sequences are available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers/mutational_lib. We conducted three rounds of mutagenesis, each consisting of 7 mutagenic PCR cycles and 20 joining PCR cycles. The final joined products were amplified for 10 cycles with primers that append a unique identifier N16 barcode sequence to the 3′ end of each mutagenized insert, downstream of the RBD stop codon and mRNA 3′ UTR. Barcodes were also PCR appended to the un-mutagenized RBD homologs via the same primer addition PCR. Primers used in library assembly are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers).
Mutagenized SARS-CoV-2 libraries and pooled wild-type homolog RBDs were cloned into EcoRI-HF/SacI-HF digested pETcon 2649 vector (sequence linked above) using NEBuilder HiFi DNA Assembly (NEB E2621). Assembled products were Ampure purified and electroporated into electrocompetent NEB10-beta cells. Electroporated cells were plated on 15cm LB+ampicillin plates at an estimated bottleneck of 100,000 (SARS-CoV-2 mutant libraries) or 1,000 (pooled RBD homologs) colony forming units to limit library size. After approximately 18 hours of outgrowth, colonies were scraped into liquid LB+ampicillin, and grown for 2.5 hours in liquid culture prior to plasmid purification.
Plasmid pools were transformed into the AWY101 strain of Saccharomyces cerevisiae via the protocol of Gietz and Schiestl (2007). SARS-CoV-2 mutant libraries were transformed at 50 μg scale and the pooled RBD homolog controls were transformed at 10 μg scale. Colony forming unit counts from plated serial dilutions indicate transformation yield of > 1 million cfus. Transformed yeast grew for 14 hours post-transformation in 100 mL selective SD-CAA media, and were subsequently back-diluted into 100 mL fresh SD-CAA at 1 OD600 for an additional 9-hour passage, to enable further resolution of multiple vector transformants (Scanlon et al., 2009). Transformed yeast libraries were flash frozen in 1e8 cfu aliquots and stored at −80°C.
PacBio library sequencing and analysis
PacBio sequencing was used to acquire long sequence reads spanning the N16 barcode and the RBD gene sequence. PacBio sequencing inserts were prepared from bacterially-purified plasmid pools via NotI-HF restriction digest followed by gel purification and SMRTbell ligation. The use of restriction digest rather than PCR eliminates the possibility of PCR strand exchange scrambling barcodes. Each SARS-CoV-2 RBD mutant library was spiked to 1% frequency with the internal standard pool of RBD homologs. Each replicate library was sequenced in two SMRT Cells on a PacBio Sequel using 20-hour movie collection times. PacBio circular consensus sequences (CCSs) were generated from the raw subreads using the ccs program (https://github.com/PacificBiosciences/ccs, version 4.2.0), setting the parameters to require 99.9% accuracy and a minimum of 3 passes. The resulting CCSs are available on the NCBI Sequence Read Archive at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA639956.
We then processed the CCSs to identify the RBD sequence (SARS-CoV-2 or one of the 11 homologs), call any mutations in the RBD sequence, and determine the associated 16-nucleotide barcode. To do this, we used alignparse (Crawford and Bloom, 2019), version 0.1.3, which in turn makes use of minimap2 (Li, 2018), version 2.17. We only retained CCSs that matched the parental RBD sequence with no more than 45 nucleotide mutations (corresponding to up to 15 codon mutations), had a barcode of the expected 16 nucleotide length, and had no more than one mismatch in the flanking regions expected in the sequenced amplicon. A computational notebook providing full details is available on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/process_ccs.md.
We next used these processed CCSs to generate a codon-variant lookup table that links each barcode to its associated codon mutations in the RBD sequence. To do this, we first filtered only for CCSs where the PacBio ccs-reported accuracy was at least 99.99% in both the RBD gene sequence and the barcode (the vast majority of CCSs passed this filter). We then determined the empirical accuracy of the CCSs by determining the concordance between the RBD gene sequence called by CCSs with the same barcode using the method implemented at https://jbloomlab.github.io/alignparse/alignparse.consensus.html#alignparse.consensus.empirical_accuracy. For both libraries, the empirical accuracy of the entire region of the CCS covering the RBD sequence was 99.8% if we ignored those with indels (Figure S1B). Most barcodes were covered by multiple CCSs (Figure S1B), and in that case we built a consensus of these CCSs after discarding any barcodes for which the CCSs differed often or at many sites using the method implemented at https://jbloomlab.github.io/alignparse/alignparse.consensus.html#alignparse.consensus.simple_mutconsensus. Finally, we discarded any variants with indels in the RBD. Therefore, more than 99.8% of the final barcode-linked variants should have the correctly determined RBD sequence, since 99.8% is the accuracy for those covered by just one CCS and most variants were called by the consensus of multiple CCSs. For further analysis of the barcoded variants, we then created a codon variant table using dms_variants (https://jbloomlab.github.io/dms_variants/, version 0.6.0). The final barcode-variant lookup table (which associates each barcode with its RBD sequence) is at https://media.githubusercontent.com/media/jbloomlab/SARS-CoV-2-RBD_DMS/master/results/variants/codon_variant_table.csv. 
Some summary statistics about the final composition of the libraries are in Figure S1, and the complete code used to generate the barcode-variant lookup table and many additional plots characterizing the composition of the libraries are on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/build_variants.md.
Deep mutational scanning library yeast surface-display induction and labeling
Yeast libraries were thawed and grown overnight at 30°C in 180mL SD-CAA media at an initial OD600 of 0.1. We spiked our SARS-CoV-2 mutant libraries with the barcoded RBD homolog pool at a total fraction of 0.6% yeast density, such that each RBD homolog barcode should be present at a frequency on the same order of magnitude as the typical SARS-CoV-2 variant barcode. To induce RBD surface expression, yeast were back-diluted to 50mL (expression experiments) or 200mL (binding experiments) SG-CAA+0.1%D induction media at 0.67 OD600 and incubated at room temperature for 16-18 hours with mild agitation.
For library expression experiments, 45 OD units yeast were washed twice with PBS-BSA and labeled in 3 mL 1:100 diluted anti-Myc-FITC antibody for 1 hour at 4°C with gentle mixing. Labeled cells were washed twice in PBS-BSA and resuspended in 5 mL PBS for FACS. For library binding experiments, 8 OD units yeast per titration concentration (10⁻¹³ M to 10⁻⁶ M ACE2 at half-log intervals, and a 0 M ACE2 sample) were washed twice with PBS-BSA, and incubated with ACE2 ligand overnight at room temperature with gentle agitation. Labeling volumes were scaled at low ACE2 concentration to limit ligand depletion effects, as with isogenic titrations described above. Following equilibration of ACE2 labeling, cells were kept chilled while washing once with PBS-BSA, labeling for one hour in 1 mL PBS-BSA with 1:100 diluted Myc-FITC and 1:200 Streptavidin-PE, washed two more times with PBS-BSA, and resuspended to 1 mL in PBS.
Fluorescence-activated cell sorting (FACS) of yeast libraries
Yeast libraries were sorted into bins of FITC or PE fluorescence using a BD FACS Aria II. Cells were sorted into 5mL FACS tubes containing 1mL of 2xYPAD supplemented with 1% BSA. Tubes were pre-wet with collection media prior to sample collection, to reduce sticking and improve post-sort yield.
For expression sorts, cells were gated for singleton events (Figure S2A), followed by partitioning into four bins of FITC fluorescence (Figure 2A): bin 1 captures 99% of unstained cells, and bins 2-4 split the remaining library fraction into tertiles. We sorted > 50 million cells from each library into these bins. From these same inductions, we also sorted 15 million RBD+ cells from each library (P4 population, Figure S2A), to enrich RBD-expressing cells within our libraries for our titration sorting experiments.
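The expression gating scheme described above (bin 1 set by unstained cells, bins 2-4 as tertiles of the remainder) can be sketched numerically. On the instrument these gates are drawn interactively, so the code below is only an illustration under assumed inputs: per-cell FITC values for an unstained control and for the stained library. The function name is hypothetical.

```python
import numpy as np

def expression_bin_edges(unstained_fitc, library_fitc):
    """Return the three boundaries separating four FITC sort bins.

    edge1: upper edge of bin 1, capturing 99% of unstained cells
    edge2, edge3: tertile boundaries of library cells above edge1
    """
    unstained = np.asarray(unstained_fitc, dtype=float)
    library = np.asarray(library_fitc, dtype=float)
    edge1 = np.percentile(unstained, 99)
    positive = library[library > edge1]  # library fraction outside bin 1
    edge2, edge3 = np.percentile(positive, [100 / 3, 200 / 3])
    return edge1, edge2, edge3
```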
For ACE2-binding titrations, we gated cells for singleton events and RBD+ expression (Figure S2B). For each ACE2 concentration sample, we sorted cells into four bins of PE fluorescence as described above: bin1 captures 95% of unmutated SARS-CoV-2 cells incubated with 0M ACE2, bin4 captures 95% of unmutated cells at saturating ACE2 ligand, and the bin2/bin3 boundary evenly splits the log-MFI scale between the bin1 and bin4 boundaries (Figure 2B). We sorted each ACE2 concentration sample into these four bins for approximately 15 minutes, capturing 5-6 million cells per ACE2 concentration.
Following each sort, cells from each collection tube were spun for 5 min at 3,000 g in a tabletop centrifuge, yielding a visible pellet for any sample with at least ∼500,000 collected cells. Collection supernatant was removed, and cells were resuspended in SD-CAA media supplemented with 1:100 penicillin-streptomycin. Cells were resuspended to an estimated 2e6 cells/mL in 15 mL culture tubes or baffled flasks for expression post-sort samples, 5e5 cells/mL in baffled flasks for RBD+ sort samples, and 1 mL (< 1e6 cells) or 1.5 mL (> 1e6 cells) in 96-deep-well plates for titration samples. For expression FACS experiments, total cell recovery from all samples was measured via serial dilution and plating on YPD and SD-CAA plates for each sample, which showed average cellular recovery of 85% (range 79%–94%), with 62% (range 52%–77%) of cells retaining plasmid, with the exception of the FITC-negative bin 1 populations, which showed 20% plasmid retention. These per-sample cell recovery counts were used to calibrate downstream sequencing numbers for the actual number of cells that grew out from each sort bin. For titration sorts, we did not titer all 64 post-sort samples, but instead spot checked 6 samples to ensure normal levels of cell recovery, which showed an average 66% cell recovery and 46% plasmid retention. As we did not titer all samples, we use the cell count from the FACS log as the estimate of the number of cells collected in each bin. This assumes that there are no systematic differences in post-sort cell yield across bins, an assumption that is more appropriate for these titration sorts, where the ACE2-binding gates are nested within an overall RBD+ selection gate that selects for even plasmid retention (Figure S2B).
Post-sort samples were grown overnight in liquid media at 30°C. Plasmids were purified from post-sort yeast samples of < 4e7 cfu using Zymo Yeast Miniprep kits (single column or 96-well plate formats) according to kit instructions, but with the addition of > 2 hours Zymolyase treatment and a −80°C freeze/thaw cycle prior to cell lysis.
Illumina Sequencing
Post-sort plasmid samples were PCR amplified from 10 μL plasmid template input using primers flanking the N16 barcode that append remaining Illumina sequencing handles that are not already plasmid encoded, and unique NextFlex sample indices (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/tree/master/data/primers). PCRs were conducted with KOD polymerase for 20 cycles, except for titration sort samples of less than 10,000 cells, where 28 cycles were necessary to obtain sufficient PCR product due to low sample input:
1. 95°C, 2 min
2. 95°C, 20 s
3. 58°C, 10 s
4. 70°C, 10 s
5. Return to 2, 19x (27x for low-input samples)
PCR products were Ampure purified, quantified via PicoGreen, and pooled to mirror desired sample frequencies given cell counts in each FACS sample. Pooled samples were gel purified, Ampure purified, and submitted for 2 lanes of 50bp single end Illumina HiSeq sequencing per library.
Demultiplexed reads were aligned to library barcodes determined from PacBio sequencing, yielding a count of the number of times each library barcode was sequenced within each FACS partition. Read counts for each FACS sample were downweighted by the ratio of total reads from a bin compared to the number of cells that were actually sorted into that bin. For one bin in which the number of HiSeq reads was less than the number of cells sorted into a bin, we re-amplified PCR product from a newly purified plasmid aliquot, and obtained reads via a single lane of MiSeq 50bp single end sequencing. Computational notebooks providing additional details on our Illumina sequencing processing and statistics are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/count_variants.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/analyze_counts.md).
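The read-count downweighting described above can be illustrated with a minimal sketch. It assumes the scaling is a simple division of every barcode's read count by the bin's reads-per-cell ratio, so that scaled counts sum to the number of cells sorted into the bin; the function name, barcode strings, and numbers are hypothetical and are not taken from the analysis notebooks.

```python
def scale_counts_to_cells(barcode_reads, cells_sorted):
    """Downweight read counts so they sum to the number of sorted cells.

    barcode_reads: dict of barcode -> reads in one FACS bin (hypothetical)
    cells_sorted:  number of cells sorted into that bin
    """
    total_reads = sum(barcode_reads.values())
    if total_reads == 0:
        raise ValueError("bin has no reads")
    reads_per_cell = total_reads / cells_sorted  # the downweighting ratio
    return {bc: n / reads_per_cell for bc, n in barcode_reads.items()}


# Hypothetical bin: 1,000 reads obtained from 250 sorted cells.
reads = {"AAAC": 600, "GGTA": 300, "TTCG": 100}
estimated_cells = scale_counts_to_cells(reads, cells_sorted=250)
```

Under this assumption, a bin with fewer reads than sorted cells would give a ratio below one and inflate the counts, which is consistent with the decision to re-sequence the one bin where that occurred.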
Calculating variant phenotypes for expression
For each library variant, we estimated mean expression based on its distribution of cell counts across FITC sort bins and the known censored fluorescence boundaries of each sort bin using a maximum likelihood approach (Peterman and Levine, 2016), enacted in the fitdistrplus R package (Delignette-Muller and Dutang, 2015), assuming the uncensored log-transformed fluorescence values for a genotype follow a normal distribution. Expression measurements were retained for barcodes for which at least 20 cells were sampled across the four sort bins, resulting in measured expression phenotypes for 92.9 and 90.5% of variants in libraries 1 and 2, respectively.
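The interval-censored maximum likelihood idea can be sketched as follows. This is not the fitdistrplus implementation: it is a standard-library Python stand-in that maximizes the same kind of likelihood by a coarse-to-fine grid search, and the bin boundaries, cell counts, and search ranges are hypothetical.

```python
import math


def norm_cdf(x, mu, sd):
    """Normal CDF; accepts +/- infinity for open-ended bins."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))


def censored_loglik(mu, sd, bins, counts):
    """Log-likelihood of cell counts that are only known per fluorescence bin."""
    ll = 0.0
    for (lo, hi), n in zip(bins, counts):
        p = norm_cdf(hi, mu, sd) - norm_cdf(lo, mu, sd)
        ll += n * math.log(max(p, 1e-300))
    return ll


def fit_censored_normal(bins, counts, mu_range=(5.0, 13.0),
                        sd_range=(0.1, 3.0), rounds=4, steps=40):
    """Coarse-to-fine grid search for the maximum likelihood mean and sd."""
    (mu_lo, mu_hi), (sd_lo, sd_hi) = mu_range, sd_range
    mu, sd = None, None
    for _ in range(rounds):
        d_mu = (mu_hi - mu_lo) / steps
        d_sd = (sd_hi - sd_lo) / steps
        grid = ((censored_loglik(mu_lo + i * d_mu, sd_lo + j * d_sd, bins, counts),
                 mu_lo + i * d_mu, sd_lo + j * d_sd)
                for i in range(steps + 1) for j in range(steps + 1))
        _, mu, sd = max(grid)
        # Zoom in around the current best grid point.
        mu_lo, mu_hi = mu - d_mu, mu + d_mu
        sd_lo, sd_hi = max(sd - d_sd, 1e-3), sd + d_sd
    return mu, sd


# Hypothetical log-fluorescence boundaries of four sort bins and cell counts.
inf = float("inf")
bins = [(-inf, 8.0), (8.0, 9.0), (9.0, 10.0), (10.0, inf)]
counts = [1587, 3413, 3413, 1587]
mu_hat, sd_hat = fit_censored_normal(bins, counts)
```

A barcode-level filter such as the 20-cell minimum would be applied to `sum(counts)` before fitting.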
Expression measurements were represented as the difference in log-mean fluorescence intensity (MFI) relative to wild-type (ΔlogMFI = logMFIvariant - logMFIwild-type), such that a positive value indicates higher RBD expression. A very small fraction of wild-type and synonymous barcodes were ascribed non-fluorescing phenotypes, likely reflecting expression-abolishing mutations that occurred outside of the PacBio sequencing window. These variants were selected out prior to titration measurements by our RBD+ pre-sort, but remain in the expression measurements. To avoid artificially depressing the wild-type SARS-CoV-2 expression measurement and therefore miscalibrating this Δlog(MFI) scale, potentially annotating slightly deleterious mutational effects as beneficial, we computed the mean wild-type expression excluding these outliers (logMFI < 10.2 or 10.1 in lib1 and lib2, respectively). We note that we are unable to do the same for any library mutants for which we observe non-fluorescence, because we are unable to a priori determine whether a lack of expression is due to the library mutation versus external, unobserved factors. This uncertainty makes our calling of expression-enhancing mutations conservative, as mutational effects, if biased by these outliers, will tend to be pulled slightly down in their measurement. The global epistasis approach we explain below can mitigate the influence of these outlier observations on our final estimates of mutational effects. A computational notebook presenting our calculation of expression phenotypes and results is included on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/compute_expression_meanF.md).
Calculating variant phenotypes for ACE2-binding affinity
For each library barcode at each ACE2 sample concentration, we determined its simple mean bin of ACE2-binding via the equation used above in isogenic titrations. We fit titration curves as above to determine barcode-specific KD,app from the series of FACS-seq derived mean bin measurements across ACE2 concentration (Figure S2D). Because a barcode’s mean bin might be measured with varying certainty across different ACE2 concentrations, we used weighted least-squares nonlinear regression, weighting each mean bin estimate according to an empirical estimate of its variance based on the per-sample cell count, derived from estimates of variance in repeated wild-type/synonymous barcode measurements grouped by sampling depth (Figure S2C, right panels). To avoid fits of errant titration curves, we constrained the baseline parameter b to be fit between 1 and 1.5, and the response parameter a to be fit between 1.5 and 3. Through initial curve fit constraints and subsequent QC filtering, our fit KD,app binding constants were constrained to be within the concentration range of our titration (10⁻¹³ to 10⁻⁶ M), and therefore many barcodes are censored at the upper limit with true KD,app ≥ 10⁻⁶ M. We filtered out titration curve fits for variants with an average cell count < 5 across sample concentrations, or with cell count < 2 in 7 or more of the 16 ACE2 concentration samples. Finally, we filtered out the 5% of curves with the highest normalized mean square residual, where residuals are normalized from 0 to 1 by the fit response parameter a, such that titration curves that plateau at lower levels of saturated binding don’t have systematically smaller mean square residuals. This process yielded KD,app estimates for 75.2 and 75.4% of variants in libraries 1 and 2, respectively.
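A minimal sketch of a constrained, weighted titration fit is given below. It makes several assumptions that are not stated in this section: the titration model is taken to be mean bin = a × c/(c + KD,app) + b, sort bins are numbered 1 to 4, and weights are inverse variances. The actual fits were done by nonlinear regression in R; this stand-in scans log10(KD,app) on a grid and solves for a and b by box-constrained linear least squares at each grid point. All function names and example values are hypothetical.

```python
def mean_bin(cell_counts):
    """Simple mean bin for cell counts in sort bins 1..4."""
    total = sum(cell_counts)
    return sum((i + 1) * n for i, n in enumerate(cell_counts)) / total


def curve(conc, kd, a, b):
    """Assumed titration model: baseline b plus saturable response a."""
    return a * conc / (conc + kd) + b


def wsse(concs, ys, ws, kd, a, b):
    """Weighted sum of squared residuals."""
    return sum(w * (y - curve(c, kd, a, b)) ** 2 for c, y, w in zip(concs, ys, ws))


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def best_a_b(concs, ys, ws, kd, a_bounds, b_bounds):
    """Box-constrained weighted linear least squares for a and b at fixed KD."""
    xs = [c / (c + kd) for c in concs]
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    if det > 1e-12 * sw * sxx:
        a = (sw * sxy - sx * sy) / det
        b = (sy - a * sx) / sw
        if a_bounds[0] <= a <= a_bounds[1] and b_bounds[0] <= b <= b_bounds[1]:
            return a, b
    # Otherwise the constrained optimum lies on an edge of the box.
    edges = [(a, clip((sy - a * sx) / sw, *b_bounds)) for a in a_bounds]
    edges += [(clip((sxy - b * sx) / sxx, *a_bounds), b) for b in b_bounds]
    return min(edges, key=lambda ab: wsse(concs, ys, ws, kd, ab[0], ab[1]))


def fit_titration(concs, ys, variances, a_bounds=(1.5, 3.0),
                  b_bounds=(1.0, 1.5), n_grid=701):
    """Scan log10(KD) from -13 to -6; return (log10_kd, a, b) with lowest error."""
    ws = [1.0 / v for v in variances]
    best = None
    for i in range(n_grid):
        log_kd = -13.0 + 7.0 * i / (n_grid - 1)
        kd = 10.0 ** log_kd
        a, b = best_a_b(concs, ys, ws, kd, a_bounds, b_bounds)
        err = wsse(concs, ys, ws, kd, a, b)
        if best is None or err < best[0]:
            best = (err, log_kd, a, b)
    return best[1], best[2], best[3]


# Hypothetical noise-free barcode with KD,app = 1e-9 M at 16 concentrations.
concs = [0.0] + [10.0 ** (-13 + 0.5 * i) for i in range(15)]
ys = [curve(c, 1e-9, 2.5, 1.2) for c in concs]
variances = [0.02] * len(concs)
log_kd, a_fit, b_fit = fit_titration(concs, ys, variances)
```

Because the grid spans only 10⁻¹³ to 10⁻⁶ M, a barcode with weaker binding would be returned at the upper grid edge, which mirrors the censoring described in the text.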
Binding measurements were represented as the difference in log10(KD,app) relative to wild-type (Δlog10(KD,app) = log10(KD,app)wild-type – log10(KD,app)variant), polarized such that a positive value indicates higher variant ACE2 affinity. A computational notebook presenting our calculation of binding phenotypes and results is included on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/compute_binding_Kd.md).
Decomposing single-mutant effects from multiple-mutant genotypes
Barcodes in our libraries contain a Poisson-distributed number of mutations (Figure S1C). Though most mutations are sampled in at least one barcode as a unique single mutant (Figure S1E), most library genotypes contain multiple amino acid mutations, and some amino acid mutations are only sampled on multiple-mutant backgrounds. Therefore, we used global epistasis models (Otwinowski et al., 2018) to decompose single-mutation effects from the set of all single- and multi-mutant backgrounds (Figures S2E–S2K). Briefly, we fit regression models that represent the phenotype of each library variant as a sum of latent-scale effects of all component amino acid mutations, which are transformed by a flexible nonlinear curve to the observed experimental scale; the shape of the nonlinear curve and the single-mutant effect terms are fit simultaneously to all of the data. For variance estimates on each library variant, we used the standard error of the estimate on KD,app to estimate a variance for our per-variant binding measurements; for expression, we calculated empirical estimates of variance as a function of cell count, based on binning replicate wild-type/synonymous mutant barcodes present in the library across bins of sampling depth (Figure S2C, left panels). Our analysis, implemented in the dms_variants package (see https://jbloomlab.github.io/dms_variants/dms_variants.globalepistasis.html), is as described by Otwinowski et al., with two exceptions. First, we used a Cauchy likelihood model to relate observed measurements to the global epistasis modeled phenotype, which should be more tolerant of outliers than the Gaussian likelihood used by Otwinowski et al. Second, we transformed our single-mutant effect latent-scale coefficients back to the experimentally measured observed scale, to facilitate comparison with additional measurements made on this scale such as the RBD homologs spiked into each library.
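The structure of such a model can be sketched in a few lines. This is an illustration only and does not use the dms_variants API: the sigmoid used for the nonlinearity is a hypothetical stand-in for the flexible curve that is actually fit, the mutation names and effect sizes are invented, and the two likelihood functions are the textbook Gaussian and Cauchy densities.

```python
import math


def latent_phenotype(mutations, latent_effects, wt_latent=4.0):
    """Additive latent phenotype: wild-type value plus per-mutation effects."""
    return wt_latent + sum(latent_effects[m] for m in mutations)


def observed_phenotype(latent, low=-5.0, high=0.0):
    """Hypothetical monotonic nonlinearity mapping latent to observed scale."""
    return low + (high - low) / (1.0 + math.exp(-latent))


def gaussian_loglik(measured, predicted, scale):
    z = (measured - predicted) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2.0 * math.pi))


def cauchy_loglik(measured, predicted, scale):
    z = (measured - predicted) / scale
    return -math.log(math.pi * scale * (1.0 + z * z))


# Hypothetical latent effects for two mutations on one barcode.
effects = {"A1C": -1.0, "D2E": -2.5}
latent = latent_phenotype(["A1C", "D2E"], effects)
predicted = observed_phenotype(latent)

# An outlier 5 units from the prediction is penalized far less under Cauchy.
gauss_penalty = gaussian_loglik(predicted + 5.0, predicted, scale=0.5)
cauchy_penalty = cauchy_loglik(predicted + 5.0, predicted, scale=0.5)
```

The Cauchy log-likelihood falls off logarithmically with the squared residual rather than quadratically, which is why a few errant barcodes pull the fitted coefficients less than they would under a Gaussian model.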
Computational notebooks detailing the global epistasis fits are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/global_epistasis_expression.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/global_epistasis_binding.md).
For our binding titration measurements, directly measured single-mutant phenotypes correlated extremely well between replicates (R² = 0.97, Figure S2I), and this correlation was not further improved by the global epistasis decomposition (Figure S2J); therefore, we retained all directly measured single-mutant effects, and only used global epistasis decomposition to interpolate the 14% of single mutants in each library that were not directly measured on any single-mutant backgrounds (which together comprise the measurements correlated in Figure 2F). It is important to note that the shape of global epistasis nonlinearity that was fit to the data disallows mutations from increasing affinity relative to wild-type (Figures S2H and S2K). This prevents us from ascribing affinity-enhancing effects to any of the mutations that we did not directly measure as single mutants (only 5.7% of mutants were not sampled as single mutants in either library), which we accept as an appropriately conservative approach.
In the case of our expression measurements, directly sampled single mutants correlated moderately well between replicates (R² = 0.88, Figure S2F), but this correlation was improved between the global epistasis estimates derived from each library (R² = 0.93, Figure 2E). This may be in part because the expression phenotype is a more widely distributed phenotype with smaller relative shifts in the mean caused by mutation, and because of the errant outliers that we could not account for as discussed above with regards to wild-type barcodes, such that measurements of mutational effects are improved when integrating across many different backgrounds instead of taking a single observed barcode at face value. Therefore, for expression phenotypes, we used the global epistasis estimates for all mutations. We filtered out four coefficients from library 1 and three from library 2 that had nonsensically high model estimates, likely due to partial collinearities among some low-coverage mutations. Our final binding and expression single-mutant phenotypes were determined from the average effect across the two independent library replicates. A computational notebook detailing the full derivation of our final single-mutant phenotypic scores for binding and expression is on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/single_mut_effects.md#assessing-global-epistasis-models-for-binding-data).
Data visualization
The interactive heatmap of mutational effects shown at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/ (Data S1) was made using the altair Python package (VanderPlas et al., 2018).
For the logo plot representation of the data in Figure S3, the experimental measurements of Δlog(MFI) and Δlog10(KD,app) were converted to letter heights as follows. For binding, we first computed a Boltzmann-like weighting factor for each amino acid a at site r as $w_{r,a} = \exp(\alpha\, x_{r,a})$, where $x_{r,a}$ is the experimental measurement for the effect of the mutation of site r to amino acid a, in other words the Δlog(MFI) or Δlog10(KD,app) value. The α parameter is a temperature-like scaling factor, which was set to 1.4 for the binding values and chosen for the expression values so that the range of exponents for expression is the same as for binding. The letter heights were then computed by re-scaling the weighting factors at each site to sum to one, so that the letter height is $p_{r,a} = w_{r,a} / \sum_{a'} w_{r,a'}$. The logo plots themselves were rendered using Logomaker (Tareen and Kinney, 2020). The code that creates these logo plots is on GitHub at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/logoplots_of_muteffects.md.
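The letter-height calculation can be written out directly from the two formulas above. The sketch below uses α = 1.4 as stated for binding; the per-site effect values are hypothetical, and the wild-type amino acid is assumed to enter with an effect of zero.

```python
import math


def letter_heights(site_effects, alpha=1.4):
    """Convert mutation effects at one site to logo letter heights.

    site_effects: dict of amino acid -> measured effect x_{r,a}
    Returns a dict of amino acid -> height p_{r,a}; heights sum to one.
    """
    weights = {aa: math.exp(alpha * x) for aa, x in site_effects.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Hypothetical effects at one site (wild-type amino acid has effect 0).
heights = letter_heights({"N": 0.0, "F": 0.3, "D": -2.0})
```

A larger α concentrates letter height on the best-tolerated amino acids, while α near zero makes all letters nearly equal in height.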
The interactive structure-based visualizations at https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/structures were built using dms-view (Hilton et al., 2020). In these visualizations, the logo plot letter heights were computed as for Figure S3 (see paragraph immediately above). Number of effective amino acids was calculated as the exponentiated preferences. Mean, minimum, and maximum mutational effects per site were calculated from the set of Δlog(MFI) or Δlog10(KD,app) measurements of all missense mutations at a site.
Structural analyses
Structural analyses of the ACE2-bound SARS-CoV-2 and SARS-CoV-1 RBDs used the crystal structures from PDB: 6M0J (Lan et al., 2020) and PDB: 2AJF (Li et al., 2005a), respectively. ACE2 contacts were annotated as residues with any non-hydrogen atom within 4 Angstrom from any ACE2 residue. Solvent-accessible surface area was calculated from the 6M0J structure using dssp (Kabsch and Sander, 1983), with and without the ACE2 ligand present. Relative solvent accessibilities were determined by normalizing to the maximum theoretical solvent accessibility of a residue (Tien et al., 2013). Structural images were rendered in PyMol. Full analyses of our mutational measurements in context of structural and evolutionary features are provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/structure_function.md and https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/sarbecovirus_diversity.md).
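The contact definition above (any non-hydrogen atom within 4 Angstrom of any ACE2 residue) can be sketched as a brute-force distance search. The atom records below are hypothetical tuples rather than parsed PDB entries, and the sketch assumes that hydrogens are excluded on both the RBD and ACE2 sides.

```python
import math


def contact_residues(rbd_atoms, ace2_atoms, cutoff=4.0):
    """Return sorted RBD residue numbers with a heavy atom near ACE2.

    Each atom is (residue_number, element, (x, y, z)) with coordinates
    in Angstrom. Hydrogen atoms are ignored on both sides.
    """
    ace2_heavy = [xyz for _, element, xyz in ace2_atoms if element != "H"]
    contacts = set()
    for residue, element, xyz in rbd_atoms:
        if element == "H":
            continue
        if any(math.dist(xyz, other) <= cutoff for other in ace2_heavy):
            contacts.add(residue)
    return sorted(contacts)


# Hypothetical coordinates for illustration only.
rbd_atoms = [
    (501, "N", (0.0, 0.0, 0.0)),
    (501, "H", (1.0, 0.0, 0.0)),
    (417, "C", (10.0, 0.0, 0.0)),
    (350, "H", (0.0, 0.0, 3.0)),
]
ace2_atoms = [
    (41, "O", (0.0, 0.0, 3.5)),
    (30, "H", (10.0, 0.0, 1.0)),
]
contacts = contact_residues(rbd_atoms, ace2_atoms)
```

The same function with a different cutoff and atom selection would cover the epitope definitions used for antibody structures.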
Antibody epitopes were mapped from crystal structures PDB: 6W41 (Yuan et al., 2020b), PDB: 6WAQ (Wrapp et al., 2020b), PDB: 2DD8 (Prabakaran et al., 2006), PDB: 3BGF (Pak et al., 2009), PDB: 2GHW (Hwang et al., 2006), PDB: 7BZ5 (Wu et al., 2020b), and cryo-EM structures PDB: 6NB6, 6NB7 (Walls et al., 2019), and PDB: 6WPS (Pinto et al., 2020). RBD residues were annotated as being in an antibody epitope if any non-hydrogen atom was within 4 Angstroms of an antibody residue, with the exception of the backbone-only models of 6NB6 and 6NB7, where epitopes were defined as RBD residues with Cɑ within 8 Angstroms of any antibody residue. Our full analysis of mutational constraint in antibody epitopes is provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/antibody_epitopes.md).
Analysis of circulating variants
All 31,570 spike sequences on GISAID as of 27 May 2020 were downloaded and aligned via mafft (Katoh and Standley, 2013). Sequences from non-human origins and sequences containing any gap characters were removed. All amino-acid mutations among GISAID sequences were enumerated. Some low-coverage spike sequences contain undetermined ‘X’ characters. We excluded any mutation from our curated set of GISAID mutations if it was solely observed on sequence backgrounds containing at least one undetermined X character in the RBD sequence; however, sequences with X characters were allowed to contribute to observations of mutation count for mutations that were observed on at least one other high-coverage RBD sequence. To characterize patterns of selection on amino-acid mutations observed among GISAID sequences, we conducted permutation tests as described in the Figure S7 legend. Our full analysis of mutational effects of circulating variants is provided on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/circulating_variants.md). We acknowledge all GISAID contributors for their sharing of sequencing data (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/alignments/Spike_GISAID/gisaid_hcov-19_acknowledgement_table.xls).
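The curation rule for undetermined X characters can be sketched as follows. The sketch assumes pre-aligned amino-acid sequences of equal length, treats an X itself as missing data rather than a mutation, and uses hypothetical four-residue sequences; the function name and site numbering are illustrative.

```python
from collections import Counter


def curate_mutations(reference, sequences, first_site=1):
    """Count amino-acid mutations, applying the X-character rule.

    A mutation is kept only if it appears on at least one sequence with no
    X characters; once kept, sequences containing X also add to its count.
    """
    counts = Counter()
    seen_on_clean = set()
    for seq in sequences:
        if "-" in seq or len(seq) != len(reference):
            continue  # sequences containing gaps are removed
        has_x = "X" in seq
        for site, (wt, aa) in enumerate(zip(reference, seq), start=first_site):
            if aa == wt or aa == "X":
                continue
            mutation = f"{wt}{site}{aa}"
            counts[mutation] += 1
            if not has_x:
                seen_on_clean.add(mutation)
    return {m: n for m, n in counts.items() if m in seen_on_clean}


# Hypothetical toy alignment.
reference = "NITN"
sequences = ["NITN", "KITN", "KIXN", "NIXD", "N-TN"]
curated = curate_mutations(reference, sequences)
```

Here N1K is kept with a count of two because one of its two observations is on a sequence without X, whereas N4D is dropped because it is seen only on a sequence containing X.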
Alignment and phylogeny
We used the curated RBD sequence set from Letko et al. (2020), adding newly described RBD sequences from sarbecovirus strains RaTG13 (Zhou et al., 2020b), RmYN02 (Zhou et al., 2020a), GD-Pangolin and GX-Pangolin (Lam et al., 2020), and the additional non-Asian bat sarbecovirus isolate BtKY72 (Tong et al., 2009). RBD nucleotide sequences were aligned via mafft with a gap opening penalty of 4.5, and the maximum likelihood phylogeny was inferred in RAxML (Stamatakis, 2014) under the GTR model with 4 gamma-distributed discrete categories of among-site rate variation.
Pseudotyped lentiviral particle infection assays
We selected seven single mutations from our deep mutational scanning measurements for validation of phenotypic effects in a spike-pseudotyped lentivirus assay (Crawford et al., 2020). Mutations were selected that exhibited deleterious effects on RBD expression (C432D) or ACE2 binding (L455Y, N501D and G502), no strong phenotypic effect on either binding or expression (N439K), and affinity-enhancing effects (Q498Y and N501F). These point mutations were introduced via site-directed mutagenesis (New England Biolabs E0554S) into the HDM vector containing codon-optimized SARS-CoV-2 Spike from Wuhan-Hu-1, with an upstream Kozak sequence. The full sequence of this plasmid is available at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/data/plasmid_maps/2736_HDM_IDTSpike_EcoKozak.gb.
Pseudotyped lentiviral particles were generated as previously described (Crawford et al., 2020). Viruses were rescued in triplicate (i.e., independent transfections), which should average out variation in transfection efficiency such that viral entry phenotypes are reflective of both pseudovirus production and entry efficiency. Briefly, 2.5e5 293T cells per well were seeded in 12-well plates in 1 mL D10 growth media (DMEM with 10% heat-inactivated FBS, 2 mM L-glutamine, 100 U/mL penicillin, and 100 μg/mL streptomycin). 24 h later, cells were transfected using BioT transfection reagent (Bioland Scientific, Paramount, CA, USA) with 0.5 μg of the ZsGreen lentiviral backbone pHAGE2-CMV-ZsGreen-W (BEI Resources NR-52520), 0.11 μg each of the lentiviral helper plasmids HDM-Hgpm2 (BEI Resources NR-52517), pRC-CMV-Rev1b (BEI Resources NR-52519), and HDM-tat1b (BEI Resources NR-52518), and 0.17 μg wild-type or mutant SARS-CoV-2 Spike plasmids. Media was changed to fresh D10 at 24 h post-transfection. At 60 h post-transfection, the viral supernatant was collected, filtered through a 0.45 μm SFCA low protein-binding filter, and stored at −80°C. To assess efficiency of pseudovirus production, we quantified p24 levels (in pg/mL) in viral transfection supernatants via ELISA, in technical duplicate, per kit instructions (Advanced Bioscience Laboratories Cat. # 5421).
The resulting viruses were titered as previously described (Crawford et al., 2020). 293T cells stably expressing ACE2 (BEI NR-52511) were seeded at 1e4 cells per well in poly-L-lysine coated 96-well plates (Greiner 655930). 24 h later, 3 wells were counted and averaged to determine the number of cells per well at time of infection. Media was removed from the 293T-ACE2 cells and replaced with fresh D10 containing 50 μL of pseudovirus supernatant in a final volume of 150 μL. Polybrene (TR-1003-G, Sigma Aldrich, St. Louis, MO, USA) was added to a final concentration of 5 μg/mL. 60 h post-infection, cells were analyzed by flow cytometry. Titers were calculated using the Poisson formula. If P is the percentage of cells that are ZsGreen positive, as determined by drawing a ZsGreen+ gate from uninfected controls, then the titer per mL is: −ln(1 − P/100) × (number of cells per well) / (volume of virus per well in mL). Titers are only accurate when the percentage of ZsGreen+ cells is relatively low, i.e., ∼1%–10%. Titers are reported relative to the mean of the wild-type, which had similar titers as Crawford et al. of ∼10⁴ infectious particles per mL (Crawford et al., 2020; Figure 4H), and normalized by p24 levels in transfection supernatants (Figure S4J). The dashed horizontal line in Figure 4H showing the limit of detection was calculated as the minimum titer that would be determined in the case of a single positive event.
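As a worked illustration of the Poisson titer formula, the sketch below computes a titer from a percentage of ZsGreen+ cells, and a limit of detection under the assumption that a single positive event corresponds to P = 100 / (number of events analyzed). The cell number, event count, and function names are hypothetical.

```python
import math


def titer_per_ml(percent_positive, cells_per_well, virus_volume_ml):
    """Poisson-corrected titer in infectious particles per mL."""
    if not 0.0 <= percent_positive < 100.0:
        raise ValueError("percent_positive must be in [0, 100)")
    fraction = percent_positive / 100.0
    return -math.log(1.0 - fraction) * cells_per_well / virus_volume_ml


def limit_of_detection(events_analyzed, cells_per_well, virus_volume_ml):
    """Titer implied by a single positive event among the analyzed events."""
    percent = 100.0 / events_analyzed
    return titer_per_ml(percent, cells_per_well, virus_volume_ml)


# Hypothetical well: 5% ZsGreen+ among 2e4 cells infected with 50 uL of virus.
example_titer = titer_per_ml(5.0, cells_per_well=2e4, virus_volume_ml=0.05)
example_lod = limit_of_detection(10000, cells_per_well=2e4, virus_volume_ml=0.05)
```

The −ln(1 − P/100) term corrects for cells infected by more than one particle; at low P it is close to P/100, and the correction grows as P increases, which is one reason titers are trusted only at low percentages.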
RBD homolog purifications and binding assays
Receptor binding domains of SARS-CoV-2 (328-531) (Walls et al., 2020), WIV1 (316-518), RaTG13 (359-562), ZC45 (324-508), and ZXC21 (323-507) were synthesized by GenScript into vector pcDNA3.1- with a preceding mu-phosphatase signal peptide and a C-terminal octahistidine tag. SARS-CoV-1 (306-575) was subcloned from a GenArt synthesized SARS-CoV-1 Spike ectodomain (Walls et al., 2020). Human ACE2-Fc was synthesized and cloned by Twist with a C-terminal human Fc tag. The ACE2 construct begins with 19STIEE and ends with PYAD615.
The RBD constructs were transfected into 150mL suspension Expi293F (Thermo Fisher Cat No. A14527) cells at 37°C in a humidified 8% CO2 incubator rotating at 130rpm and harvested 3 days later. Clarified supernatants were purified in batch over Talon resin (Takara) prior to buffer exchanging into 20mM Tris pH 8, 150mM NaCl and flash freezing.
Expression and purification of human ACE2 was performed as previously described (Walls et al. 2020). Briefly, human ACE2-Fc was produced in FreeStyle 293F cells (ThermoFisher Cat# R79007) grown in suspension using FreeStyle 293 expression medium (Life Technologies) at 37°C in a humidified 8% CO2 incubator rotating at 130rpm. The cultures were transfected using PEI-MAX (Polyscience) with cells grown to a density of 2.5 million cells per mL and cultivated for 6 days. Human ACE2-Fc from clarified supernatants was affinity purified using a protein A column (GE Healthcare). The Fc tag was removed by thrombin cleavage in a reaction mixture containing 6mg of recombinant ACE2-Fc and 20μg thrombin in 20mM Tris-HCl pH 8.0, 150mM NaCl and 2.5mM CaCl2 for 4h at room temperature. The reaction mixture was re-purified using a protein A column to remove uncleaved protein and cleaved Fc tag. The cleaved ACE2 protein was further purified by gel filtration using a Superdex 200 column 10/300 GL (GE Life Sciences) equilibrated in a buffer containing 20mM Tris pH 8.0 and 150mM NaCl. Purified protein was quantified using absorption at 280nm, and concentrated to approximately 1mg/mL.
Binding measurements were performed on an Octet Red instrument (Forte Bio) at 30°C with shaking at 1,000 RPM. For monomeric ACE2 affinity measurements, Ni-NTA biosensors were hydrated in water for 10min and placed into 10X Kinetics Buffer (ForteBio). 10-50 μg/mL of RBD was loaded for a 1.5nm threshold prior to baseline stabilization in 10X Kinetics Buffer. The sensors were immersed in a 1:3 serial dilution of monomeric Fc-cleaved ACE2 ranging from 1,000 to 4.11nM in 10X Kinetics Buffer. For measurements of RaTG13, ZC45, and ZXC21 binding to dimeric ACE2-Fc, ARG2 biosensors were hydrated in water then activated for 300 s with an NHS-EDC solution (ForteBio) prior to amine coupling. 5-10 μg/mL of RBD in 10mM pH6 sodium acetate was loaded onto ARG2 tips (ForteBio) for 600 s and then quenched into 1M ethanolamine for 300 s. A baseline in 10X Kinetics Buffer was collected for 120 s prior to immersing the sensors in a 1:3 serial dilution of dimeric ACE2-his (SinoBiological # 10108-H08H, residues 1-740) ranging from 1,000 to 4.11nM in 10X Kinetics Buffer. Curve fitting was performed using a 1:1 binding model and the ForteBio data analysis software when applicable. Mean kon and koff values were determined with a global fit applied to all data.
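The 1:3 dilution series and the relation between rate constants and affinity can be made explicit with a short sketch. The number of points (six) is inferred from the stated range of 1,000 to 4.11 nM and is an assumption, as are the example rate constants; KD = koff / kon is the standard relation for a 1:1 binding model.

```python
def serial_dilution(top_nm, fold, n_points):
    """Concentrations (nM) of a serial dilution starting at top_nm."""
    return [top_nm / fold ** i for i in range(n_points)]


def kd_from_rates(kon, koff):
    """Equilibrium dissociation constant (M) from kon (1/Ms) and koff (1/s)."""
    return koff / kon


# Six points of a 1:3 series from 1,000 nM (point count is an assumption).
series_nm = serial_dilution(1000.0, 3.0, 6)

# Hypothetical rate constants for illustration only.
example_kd = kd_from_rates(kon=1e5, koff=1e-3)
```

The last concentration in the series is 1000 / 243 nM, which matches the reported lower end of the range up to rounding.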
Expression-enhancing mutant RBD purifications, binding, and stability assays
Codon-optimized RBDs of SARS-CoV-2 with its unmutated sequence or with single mutations (I358F, Y365F, Y365W, V367F or F392W) were synthesized by IDT as gBlocks with an N-terminal EGT secretion signal (MGILPSPGMPALLSLVSLLSVLLMGCVA) and C-terminal Avi- and octa-histidine tags (GLNDIFEAQKIEWHEHHHHHHHH) and cloned into the CMV/R (VRC 8400) mammalian expression vector. Plasmids were transfected into 200mL suspension Expi293F cells at 37°C in a humidified 8% CO2 incubator rotating at 130 rpm and harvested 3 days later. Clarified supernatants were purified in batch over Talon resin (Takara). After elution at 125mL in 20mM Tris (pH 8.0), 300mM NaCl, and 300mM imidazole, concentrated solutions of L-arginine (pH 8.0), CHAPS and glycerol were added to eluate to final concentrations of 100mM, 0.75%, and 5%, respectively, to prevent adhesion to concentrator membranes. To quantify yield, each sample was concentrated to a final volume of 1500uL, and 1000uL was applied to a Superdex 75 Increase 10/300 GL column (GE) pre-equilibrated with 50mM Tris (pH 8.0), 185mM NaCl, 100mM L-arginine, 0.75% CHAPS and 5% glycerol. Peak integration was quantified using UNICORN software (GE), and relative quantity from the SEC trace was corrected for unique extinction coefficients and molecular weights of each RBD mutant. Purified peaks from monomeric species were dialyzed three times into 25mM Tris (pH 8.0), 150mM NaCl and 5% glycerol at 4°C.
BLI binding assays were performed on an Octet Red instrument at 25°C with shaking at 1,000 RPM in the presence of 25mM Tris pH 8.0, 150mM NaCl and 5% glycerol. Anti-hIgG Capture (AHC) tips were loaded with human ACE2-Fc or CR3022 at 0.02mg/mL for 300 s prior to a baseline for 60 s, association with monomeric RBDs at 500nM for 600 s, and dissociation for 300 s.
Non-equilibrium measurements of melting temperatures were determined from thermal denaturation melt curves using an UNcle (UNchained Labs) based on the barycentric mean of intrinsic tryptophan fluorescence, with data collected from 20-95°C using a thermal ramp of 1°C per minute in a background of 25mM Tris pH 8.0, 150mM NaCl and 5% glycerol. Melting temperatures were defined as the maximum point of the first derivative of the melting curve, with first derivatives calculated using GraphPad Prism software after smoothing with four neighboring points using 2nd order polynomial settings.
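The derivative-based melting temperature can be sketched without GraphPad Prism. This stand-in combines smoothing and differentiation in one step, using the slope of a local second-order polynomial fit over four neighbors on each side (a Savitzky-Golay-style estimate for evenly spaced data); it is not claimed to reproduce Prism's exact procedure. The melt curve is a synthetic sigmoid with a hypothetical midpoint.

```python
import math


def first_derivative(values, step, half_window=4):
    """Slope of a local quadratic fit at each interior point.

    For evenly spaced data, the least-squares slope at the window center
    is sum(k * y[i + k]) / (step * sum(k ** 2)) for k = -m..m.
    """
    m = half_window
    norm = step * sum(k * k for k in range(-m, m + 1))
    return [sum(k * values[i + k] for k in range(-m, m + 1)) / norm
            for i in range(m, len(values) - m)]


def melting_temperature(temps, signal, half_window=4):
    """Temperature at the maximum of the smoothed first derivative."""
    step = temps[1] - temps[0]
    deriv = first_derivative(signal, step, half_window)
    inner_temps = temps[half_window:len(temps) - half_window]
    best = max(range(len(deriv)), key=lambda i: deriv[i])
    return inner_temps[best]


# Synthetic barycentric-mean curve from 20 to 95 C with a midpoint at 52 C.
temps = [20.0 + i for i in range(76)]
signal = [330.0 + 20.0 / (1.0 + math.exp(-(t - 52.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, signal)
```

The resolution of the estimate is limited by the temperature spacing of the data, so a finer read interval or interpolation around the peak would be needed for sub-degree precision.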
Quantification and Statistical Analysis
Quantitative analyses were performed using custom code, available on GitHub (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS).
For quantitative analysis of deep mutational scanning expression phenotypes (see Method Details section, “Calculating variant phenotypes for expression”), we determined per-variant expression via maximum likelihood inference using the fitdistrplus R package (Delignette-Muller and Dutang, 2015).
For quantitative analysis of deep mutational scanning binding phenotypes (see Method Details section, “Calculating variant phenotypes for ACE2-binding affinity”), we determined per-variant titration curves via weighted least-squares nonlinear regression in R.
To quantitatively decompose single-mutant effects on expression and binding (see Figure S2 legend and Method Details section, “Decomposing single-mutant effects from multiple-mutant genotypes”), we fit global epistasis regression models (Otwinowski et al., 2018) using the dms_variants Python package (https://jbloomlab.github.io/dms_variants/dms_variants.globalepistasis.html).
For quantification of binding via Biolayer Interferometry (see Figures 4D, S4A–S4F, and S4I), global curve fitting to determine kon and koff was performed using a 1:1 binding model in the ForteBio data analysis software.
To quantify thermal stability from melting curves (see Figures 4G and S4H), the GraphPad Prism software was used to identify the maximum point of the first derivative of the melting curve.
For the statistical analysis of mutations observed among circulating SARS-CoV-2 isolates described in Figure S7, we used permutation tests to assess significant trends in effects of observed mutations compared to the distribution of randomly sampled mutation subsets.
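A generic version of such a permutation test is sketched below. The details of the actual test are given in the Figure S7 legend, which is not reproduced here, so the choices in this sketch are assumptions: the test statistic is the mean effect, random subsets are drawn without replacement at the same size as the observed set, the test is one-sided, and the effect values are synthetic.

```python
import random


def permutation_p_value(all_effects, observed_mutations, n_perm=2000, seed=0):
    """One-sided p value that observed mutations have a higher mean effect
    than equally sized random subsets of all measured mutations."""
    rng = random.Random(seed)
    pool = list(all_effects.values())
    k = len(observed_mutations)
    observed_mean = sum(all_effects[m] for m in observed_mutations) / k
    hits = 0
    for _ in range(n_perm):
        sample_mean = sum(rng.sample(pool, k)) / k
        if sample_mean >= observed_mean:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic effects: 100 hypothetical mutations, mostly deleterious.
all_effects = {f"mut{i}": -0.05 * i for i in range(100)}
observed = ["mut0", "mut1", "mut2", "mut3", "mut4"]  # the best tolerated
p_value = permutation_p_value(all_effects, observed)
```

A small p value in this setup indicates that the observed mutations are better tolerated than expected for a random draw, which is the kind of trend a test for purifying selection would look for.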
Acknowledgments
We thank Keara Malone for experimental assistance, Katherine Xue for helpful suggestions, and Frederick Matsen for intellectual support and hospitality. We thank the Flow Cytometry and Genomics core facilities at the Fred Hutchinson Cancer Research Center for experimental support as well as Mike Murphy, Deleah Pettie, and the Mammalian Production Core at the Institute for Protein Design for assistance with protein purification. This work was supported by the NIAID/NIH (R01AI141707 and R01AI12893 to J.D.B., HHSN272201700059C to D.V., F30AI149928 to K.H.D.C., and T32AI083203 to A.J.G.), NIGMS/NIH (R01GM120553 to D.V.), a Pew Biomedical Scholars Award to D.V., Burroughs Wellcome Investigators in the Pathogenesis of Infectious Diseases awards to D.V. and J.D.B., the Bill & Melinda Gates Foundation (OPP1156262 to D.V. and N.P.K.), Fast Grants to N.P.K. and D.V., and a generous gift from the Open Philanthropy Project to N.P.K. T.N.S. is a Washington Research Foundation Innovation Fellow at the University of Washington Institute for Protein Design and a Howard Hughes Medical Institute Fellow of the Damon Runyon Cancer Research Foundation (DRG-2381-19). J.D.B. is an Investigator of the Howard Hughes Medical Institute.
Author Contributions
Conceptualization, T.N.S., D.V., and J.D.B.; Methodology, T.N.S. and J.D.B.; Investigation, T.N.S. and A.J.G.; Code, T.N.S., S.K.H., K.H.D.C., and J.D.B.; Formal Analysis, T.N.S. and J.D.B.; Validation, A.J.G., D.E., K.H.D.C., A.S.D., M.J.N., J.E.B., M.A.T., and A.C.W.; Visualization, T.N.S., S.K.H., and J.D.B.; Writing – Original Draft, T.N.S. and J.D.B.; Writing – Review and Editing, all authors; Supervision, N.P.K., D.V., and J.D.B.
Declaration of Interests
N.P.K. is a co-founder, shareholder, and chair of the scientific advisory board of Icosavax, Inc.
Published: August 11, 2020, corrected online August 26, 2020
Footnotes
Supplemental Information can be found online at https://doi.org/10.1016/j.cell.2020.08.012.
References
- Adams R.M., Mora T., Walczak A.M., Kinney J.B. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves. eLife. 2016;5:e23156. doi: 10.7554/eLife.23156.
- Andersen K.G., Rambaut A., Lipkin W.I., Holmes E.C., Garry R.F. The proximal origin of SARS-CoV-2. Nat. Med. 2020;26:450–452. doi: 10.1038/s41591-020-0820-9.
- Baum A., Fulton B.O., Wloga E., Copin R., Pascal K.E., Russo V., Giordano S., Lanza K., Negron N., Ni M. Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science. 2020:eabd0831. doi: 10.1126/science.abd0831. Published online June 15, 2020.
- Becker M.M., Graham R.L., Donaldson E.F., Rockx B., Sims A.C., Sheahan T., Pickles R.J., Corti D., Johnston R.E., Baric R.S., Denison M.R. Synthetic recombinant bat SARS-like coronavirus is infectious in cultured cells and in mice. Proc. Natl. Acad. Sci. USA. 2008;105:19944–19949. doi: 10.1073/pnas.0808116105.
- Bedford T., Greninger A.L., Roychoudhury P., Starita L.M., Famulare M., Huang M.-L., Nalla A., Pepper G., Reinhardt A., Xie H. Cryptic transmission of SARS-CoV-2 in Washington State. medRxiv. 2020 doi: 10.1101/2020.04.02.20051417.
- Bloom J.D. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol. Biol. Evol. 2014;31:1956–1978. doi: 10.1093/molbev/msu173.
- Bloom J.D., Gong L.I., Baltimore D. Permissive secondary mutations enable the evolution of influenza oseltamivir resistance. Science. 2010;328:1272–1275. doi: 10.1126/science.1187816.
- Boder E.T., Wittrup K.D. Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 1997;15:553–557. doi: 10.1038/nbt0697-553.
- Bolles M., Donaldson E., Baric R. SARS-CoV and emergent coronaviruses: viral determinants of interspecies transmission. Curr. Opin. Virol. 2011;1:624–634. doi: 10.1016/j.coviro.2011.10.012.
- Boni M.F., Lemey P., Jiang X., Lam T.T.-Y., Perry B.W., Castoe T.A., Rambaut A., Robertson D.L. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat. Microbiol. 2020 doi: 10.1038/s41564-020-0771-4. Published online July 28, 2020.
- Brouwer P.J.M., Caniels T.G., van der Straten K., Snitselaar J.L., Aldon Y., Bangaru S., Torres J.L., Okba N.M.A., Claireaux M., Kerster G. Potent neutralizing antibodies from COVID-19 patients define multiple targets of vulnerability. Science. 2020;369:643–650. doi: 10.1126/science.abc5902.
- Callaway H.M., Welsch K., Weichert W., Allison A.B., Hafenstein S.L., Huang K., Iketani S., Parrish C.R. Complex and Dynamic Interactions between Parvovirus Capsids, Transferrin Receptors, and Antibodies Control Cell Infection and Host Range. J. Virol. 2018;92:e00460-e18. doi: 10.1128/JVI.00460-18.
- Cao Y., Su B., Guo X., Sun W., Deng Y., Bao L., Zhu Q., Zhang X., Zheng Y., Geng C. Potent neutralizing antibodies against SARS-CoV-2 identified by high-throughput single-cell sequencing of convalescent patients’ B cells. Cell. 2020;182:73–84. doi: 10.1016/j.cell.2020.05.025.
- Chen W.-H., Du L., Chag S.M., Ma C., Tricoche N., Tao X., Seid C.A., Hudspeth E.M., Lustigman S., Tseng C.-T.K. Yeast-expressed recombinant protein of the receptor-binding domain in SARS-CoV spike protein with deglycosylated forms as a SARS vaccine candidate. Hum. Vaccin. Immunother. 2014;10:648–658. doi: 10.4161/hv.27464.
- Chen W.-H., Chag S.M., Poongavanam M.V., Biter A.B., Ewere E.A., Rezende W., Seid C.A., Hudspeth E.M., Pollet J., McAtee C.P. Optimization of the Production Process and Characterization of the Yeast-Expressed SARS-CoV Recombinant Receptor-Binding Domain (RBD219-N1), a SARS Vaccine Candidate. J. Pharm. Sci. 2017;106:1961–1970. doi: 10.1016/j.xphs.2017.04.037.
- Chen W.-H., Tao X., Peng B.-H., Pollet J., Strych U., Bottazzi M.E., Hotez P.J., Lustigman S., Du L., Jiang S. Yeast-Expressed SARS-CoV Recombinant Receptor-Binding Domain (RBD219-N1) Formulated with Alum Induces Protective Immunity and Reduces Immune Enhancement. bioRxiv. 2020 doi: 10.1101/2020.05.15.098079.
- Chen W.-H., Strych U., Hotez P.J., Bottazzi M.E. The SARS-CoV-2 Vaccine Pipeline: an Overview. Curr. Trop. Med. Rep. 2020;7:1–4. doi: 10.1007/s40475-020-00201-6.
- Crawford K.H.D., Bloom J.D. alignparse: A Python package for parsing complex features from high-throughput long-read sequencing. J. Open Source Softw. 2019;4:1915. doi: 10.21105/joss.01915. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crawford K.H.D., Eguia R., Dingens A.S., Loes A.N., Malone K.D., Wolf C.R., Chu H.Y., Tortorici M.A., Veesler D., Murphy M. Protocol and Reagents for Pseudotyping Lentiviral Particles with SARS-CoV-2 Spike Protein for Neutralization Assays. Viruses. 2020;12:513. doi: 10.3390/v12050513. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cui J., Li F., Shi Z.-L. Origin and evolution of pathogenic coronaviruses. Nat. Rev. Microbiol. 2019;17:181–192. doi: 10.1038/s41579-018-0118-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Davenport T.M., Gorman J., Joyce M.G., Zhou T., Soto C., Guttman M., Moquin S., Yang Y., Zhang B., Doria-Rose N.A. Somatic Hypermutation-Induced Changes in the Structure and Dynamics of HIV-1 Broadly Neutralizing Antibodies. Structure. 2016;24:1346–1357. doi: 10.1016/j.str.2016.06.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dean A.M., Thornton J.W. Mechanistic approaches to the study of evolution: the functional synthesis. Nat. Rev. Genet. 2007;8:675–688. doi: 10.1038/nrg2160. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Delignette-Muller M., Dutang C. fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software. Articles. 2015;64:1–34. [Google Scholar]
- Demogines A., Farzan M., Sawyer S.L. Evidence for ACE2-utilizing coronaviruses (CoVs) related to severe acute respiratory syndrome CoV in bats. J. Virol. 2012;86:6350–6353. doi: 10.1128/JVI.00311-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dingens A.S., Haddox H.K., Overbaugh J., Bloom J.D. Comprehensive Mapping of HIV-1 Escape from a Broadly Neutralizing Antibody. Cell Host Microbe. 2017;21:777–787. doi: 10.1016/j.chom.2017.05.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dingens A.S., Arenz D., Weight H., Overbaugh J., Bloom J.D. An Antigenic Atlas of HIV-1 Escape from Broadly Neutralizing Antibodies Distinguishes Functional and Structural Epitopes. Immunity. 2019;50:520–532. doi: 10.1016/j.immuni.2018.12.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Duan H., Chen X., Boyington J.C., Cheng C., Zhang Y., Jafari A.J., Stephens T., Tsybovsky Y., Kalyuzhniy O., Zhao P. Glycan Masking Focuses Immune Responses to the HIV-1 CD4-Binding Site and Enhances Elicitation of VRC01-Class Precursor Antibodies. Immunity. 2018;49:301–311. doi: 10.1016/j.immuni.2018.07.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eggink D., Goff P.H., Palese P. Guiding the immune response against influenza virus hemagglutinin toward the conserved stalk domain by hyperglycosylation of the globular head domain. J. Virol. 2014;88:699–704. doi: 10.1128/JVI.02608-13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elbe S., Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Chall. 2017;1:33–46. doi: 10.1002/gch2.1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fane B., Villafane R., Mitraki A., King J. Identification of global suppressors for temperature-sensitive folding mutations of the P22 tailspike protein. J. Biol. Chem. 1991;266:11640–11648. [PubMed] [Google Scholar]
- Fauver J.R., Petrone M.E., Hodcroft E.B., Shioda K., Ehrlich H.Y., Watts A.G., Vogels C.B.F., Brito A.F., Alpert T., Muyombwe A. Coast-to-Coast Spread of SARS-CoV-2 during the Early Epidemic in the United States. Cell. 2020;181:990–996. doi: 10.1016/j.cell.2020.04.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fowler D.M., Fields S. Deep mutational scanning: a new style of protein science. Nat. Methods. 2014;11:801–807. doi: 10.1038/nmeth.3027. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Frank H.K., Enard D., Boyd S.D. Exceptional diversity and selection pressure on SARS-CoV and SARS-CoV-2 host receptor in bats compared to other mammals. bioRxiv. 2020 doi: 10.1101/2020.04.20.051656. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Friedrich T.C., Frye C.A., Yant L.J., O’Connor D.H., Kriewaldt N.A., Benson M., Vojnov L., Dodds E.J., Cullen C., Rudersdorf R. Extraepitopic compensatory substitutions partially restore fitness to simian immunodeficiency virus variants that escape from an immunodominant cytotoxic-T-lymphocyte response. J. Virol. 2004;78:2581–2585. doi: 10.1128/JVI.78.5.2581-2585.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Frieman M., Yount B., Agnihothram S., Page C., Donaldson E., Roberts A., Vogel L., Woodruff B., Scorpio D., Subbarao K., Baric R.S. Molecular determinants of severe acute respiratory syndrome coronavirus pathogenesis and virulence in young and aged mouse models of human disease. J. Virol. 2012;86:884–897. doi: 10.1128/JVI.05957-11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gietz R.D., Schiestl R.H. High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2007;2:31–34. doi: 10.1038/nprot.2007.13. [DOI] [PubMed] [Google Scholar]
- Gong L.I., Suchard M.A., Bloom J.D. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife. 2013;2:e00631. doi: 10.7554/eLife.00631. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hamilton S.R., Bobrowicz P., Bobrowicz B., Davidson R.C., Li H., Mitchell T., Nett J.H., Rausch S., Stadheim T.A., Wischnewski H. Production of complex human glycoproteins in yeast. Science. 2003;301:1244–1246. doi: 10.1126/science.1088166. [DOI] [PubMed] [Google Scholar]
- Harms M.J., Thornton J.W. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat. Rev. Genet. 2013;14:559–571. doi: 10.1038/nrg3540. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hensley S.E., Das S.R., Bailey A.L., Schmidt L.M., Hickman H.D., Jayaraman A., Viswanathan K., Raman R., Sasisekharan R., Bennink J.R., Yewdell J.W. Hemagglutinin receptor binding avidity drives influenza A virus antigenic drift. Science. 2009;326:734–736. doi: 10.1126/science.1178258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hiatt J.B., Patwardhan R.P., Turner E.H., Lee C., Shendure J. Parallel, tag-directed assembly of locally derived short sequence reads. Nat. Methods. 2010;7:119–122. doi: 10.1038/nmeth.1416. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hilton S.K., Bloom J.D. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol. 2018;4:vey033. doi: 10.1093/ve/vey033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hilton S.K., Huddleston J., Black A., North K., Dingens A.S., Bedford T., Bloom J.D. dms-view: Interactive visualization tool for deep mutational scanning data. Journal of Open Source Software. 2020;5:2353. doi: 10.21105/joss.02353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoffmann M., Kleine-Weber H., Schroeder S., Krüger N., Herrler T., Erichsen S., Schiergens T.S., Herrler G., Wu N.-H., Nitsche A. SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell. 2020;181:271–280. doi: 10.1016/j.cell.2020.02.052. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hu B., Zeng L.-P., Yang X.-L., Ge X.-Y., Zhang W., Li B., Xie J.-Z., Shen X.-R., Zhang Y.-Z., Wang N. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLoS Pathog. 2017;13:e1006698. doi: 10.1371/journal.ppat.1006698. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Huo J., Zhao Y., Ren J., Zhou D., Duyvesteyn H.M.E., Ginn H.M., Carrique L., Malinauskas T., Ruza R.R., Shah P.N.M. Neutralization of SARS-CoV-2 by Destruction of the Prefusion Spike. Cell Host Microbe. 2020 doi: 10.1016/j.chom.2020.06.010. Published online June 19, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hwang W.C., Lin Y., Santelli E., Sui J., Jaroszewski L., Stec B., Farzan M., Marasco W.A., Liddington R.C. Structural basis of neutralization by a human anti-severe acute respiratory syndrome spike protein antibody, 80R. J. Biol. Chem. 2006;281:34610–34616. doi: 10.1074/jbc.M603275200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jardine J.G., Kulp D.W., Havenar-Daughton C., Sarkar A., Briney B., Sok D., Sesterhenn F., Ereño-Orbea J., Kalyuzhniy O., Deresa I. HIV-1 broadly neutralizing antibody precursor B cells revealed by germline-targeting immunogen. Science. 2016;351:1458–1463. doi: 10.1126/science.aad9195. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ju B., Zhang Q., Ge J., Wang R., Sun J., Ge X., Yu J., Shan S., Zhou B., Song S. Human neutralizing antibodies elicited by SARS-CoV-2 infection. Nature. 2020;584:115–119. doi: 10.1038/s41586-020-2380-z. Published online May 26, 2020. [DOI] [PubMed] [Google Scholar]
- Julian M.C., Li L., Garde S., Wilen R., Tessier P.M. Efficient affinity maturation of antibody variable domains requires co-selection of compensatory mutations to maintain thermodynamic stability. Sci. Rep. 2017;7:45259. doi: 10.1038/srep45259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22:2577–2637. doi: 10.1002/bip.360221211. [DOI] [PubMed] [Google Scholar]
- Katoh K., Standley D.M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kowalski J.M., Parekh R.N., Mao J., Wittrup K.D. Protein folding stability can determine the efficiency of escape from endoplasmic reticulum quality control. J. Biol. Chem. 1998;273:19453–19458. doi: 10.1074/jbc.273.31.19453. [DOI] [PubMed] [Google Scholar]
- Kowalski J.M., Parekh R.N., Wittrup K.D. Secretion efficiency in Saccharomyces cerevisiae of bovine pancreatic trypsin inhibitor mutants lacking disulfide bonds is correlated with thermodynamic stability. Biochemistry. 1998;37:1264–1273. doi: 10.1021/bi9722397. [DOI] [PubMed] [Google Scholar]
- Kulp D.W., Steichen J.M., Pauthner M., Hu X., Schiffner T., Liguori A., Cottrell C.A., Havenar-Daughton C., Ozorowski G., Georgeson E. Structure-based design of native-like HIV-1 envelope trimers to silence non-neutralizing epitopes and eliminate CD4 binding. Nat. Commun. 2017;8:1655. doi: 10.1038/s41467-017-01549-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kutter J.S., Spronken M.I., Fraaij P.L., Fouchier R.A., Herfst S. Transmission routes of respiratory viruses among humans. Curr. Opin. Virol. 2018;28:142–151. doi: 10.1016/j.coviro.2018.01.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lam T.T.-Y., Jia N., Zhang Y.-W., Shum M.H.-H., Jiang J.-F., Zhu H.-C., Tong Y.-G., Shi Y.-X., Ni X.-B., Liao Y.-S. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature. 2020;583:282–285. doi: 10.1038/s41586-020-2169-0. [DOI] [PubMed] [Google Scholar]
- Lan J., Ge J., Yu J., Shan S., Zhou H., Fan S., Zhang Q., Shi X., Wang Q., Zhang L., Wang X. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature. 2020;581:215–220. doi: 10.1038/s41586-020-2180-5. [DOI] [PubMed] [Google Scholar]
- Lang Y., Li W., Li Z., Koerhuis D., van den Burg A.C.S., Rozemuller E., Bosch B.-J., van Kuppeveld F.J.M., Boons G.-J.P.H., Huizinga E.G. Coronavirus hemagglutinin-esterase and spike proteins co-evolve for functional balance and optimal virion avidity. bioRxiv. 2020 doi: 10.1101/2020.04.03.003699. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Latinne A., Hu B., Olival K.J., Zhu G., Zhang L., Li H., Chmura A.A., Field H.E., Zambrana-Torrelio C., Epstein J.H. Origin and cross-species transmission of bat coronaviruses in China. bioRxiv. 2020 doi: 10.1101/2020.05.31.116061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee J.M., Huddleston J., Doud M.B., Hooper K.A., Wu N.C., Bedford T., Bloom J.D. Deep mutational scanning of hemagglutinin helps predict evolutionary fates of human H3N2 influenza variants. Proc. Natl. Acad. Sci. USA. 2018;115:E8276–E8285. doi: 10.1073/pnas.1806133115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lee J.M., Eguia R., Zost S.J., Choudhary S., Wilson P.C., Bedford T., Stevens-Ayers T., Boeckh M., Hurt A.C., Lakdawala S.S. Mapping person-to-person variation in viral mutations that escape polyclonal serum targeting influenza hemagglutinin. eLife. 2019;8:e49324. doi: 10.7554/eLife.49324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Letko M., Marzi A., Munster V. Functional assessment of cell entry and receptor usage for SARS-CoV-2 and other lineage B betacoronaviruses. Nat. Microbiol. 2020;5:562–569. doi: 10.1038/s41564-020-0688-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li F. Structural analysis of major species barriers between humans and palm civets for severe acute respiratory syndrome coronavirus infections. J. Virol. 2008;82:6984–6991. doi: 10.1128/JVI.00442-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li W., Moore M.J., Vasilieva N., Sui J., Wong S.K., Berne M.A., Somasundaran M., Sullivan J.L., Luzuriaga K., Greenough T.C. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature. 2003;426:450–454. doi: 10.1038/nature02145. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li F., Li W., Farzan M., Harrison S.C. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science. 2005;309:1864–1868. doi: 10.1126/science.1116480. [DOI] [PubMed] [Google Scholar]
- Li W., Zhang C., Sui J., Kuhn J.H., Moore M.J., Luo S., Wong S.-K., Huang I.-C., Xu K., Vasilieva N. Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2. EMBO J. 2005;24:1634–1643. doi: 10.1038/sj.emboj.7600640. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Louz D., Bergmans H.E., Loos B.P., Hoeben R.C. Animal models in virus research: their utility and limitations. Crit. Rev. Microbiol. 2013;39:325–361. doi: 10.3109/1040841X.2012.711740. [DOI] [PubMed] [Google Scholar]
- Lynch R.M., Wong P., Tran L., O’Dell S., Nason M.C., Li Y., Wu X., Mascola J.R. HIV-1 fitness cost associated with escape from the VRC01 class of CD4 binding site neutralizing antibodies. J. Virol. 2015;89:4201–4213. doi: 10.1128/JVI.03608-14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- MacLean O.A., Lytras S., Singer J.B., Weaver S., Kosakovsky Pond S.L., Robertson D.L. Evidence of significant natural selection in the evolution of SARS-CoV-2 in bats, not humans. bioRxiv. 2020 doi: 10.1101/2020.05.28.122366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Matreyek K.A., Starita L.M., Stephany J.J., Martin B., Chiasson M.A., Gray V.E., Kircher M., Khechaduri A., Dines J.N., Hause R.J. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 2018;50:874–882. doi: 10.1038/s41588-018-0122-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Menachery V.D., Yount B.L., Jr., Debbink K., Agnihothram S., Gralinski L.E., Plante J.A., Graham R.L., Scobey T., Ge X.-Y., Donaldson E.F. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat. Med. 2015;21:1508–1513. doi: 10.1038/nm.3985. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Menachery V.D., Yount B.L., Jr., Sims A.C., Debbink K., Agnihothram S.S., Gralinski L.E., Graham R.L., Scobey T., Plante J.A., Royal S.R. SARS-like WIV1-CoV poised for human emergence. Proc. Natl. Acad. Sci. USA. 2016;113:3048–3053. doi: 10.1073/pnas.1517719113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mulligan M.J., Lyke K.E., Kitchin N., Absalon J., Gurtman A., Lockhart S.P., Neuzil K., Raabe V., Bailey R., Swanson K.A. Phase 1/2 Study to Describe the Safety and Immunogenicity of a COVID-19 RNA Vaccine Candidate (BNT162b1) in Adults 18 to 55 Years of Age: Interim Report. medRxiv. 2020 doi: 10.1101/2020.06.30.20142570. [DOI] [Google Scholar]
- Otwinowski J., McCandlish D.M., Plotkin J.B. Inferring the shape of global epistasis. Proc. Natl. Acad. Sci. USA. 2018;115:E7550–E7558. doi: 10.1073/pnas.1804015115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ovchinnikov V., Louveau J.E., Barton J.P., Karplus M., Chakraborty A.K. Role of framework mutations and antibody flexibility in the evolution of broadly neutralizing antibodies. eLife. 2018;7:e33038. doi: 10.7554/eLife.33038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pak J.E., Sharon C., Satkunarajah M., Auperin T.C., Cameron C.M., Kelvin D.J., Seetharaman J., Cochrane A., Plummer F.A., Berry J.D., Rini J.M. Structural insights into immune recognition of the severe acute respiratory syndrome coronavirus S protein receptor binding domain. J. Mol. Biol. 2009;388:815–823. doi: 10.1016/j.jmb.2009.03.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peterman N., Levine E. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics. 2016;17:206. doi: 10.1186/s12864-016-2533-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pinto D., Park Y.-J., Beltramello M., Walls A.C., Tortorici M.A., Bianchi S., Jaconi S., Culap K., Zatta F., De Marco A. Cross-neutralization of SARS-CoV-2 by a human monoclonal SARS-CoV antibody. Nature. 2020;583:290–295. doi: 10.1038/s41586-020-2349-y. [DOI] [PubMed] [Google Scholar]
- Pollock D.D., Thiltgen G., Goldstein R.A. Amino acid coevolution induces an evolutionary Stokes shift. Proc. Natl. Acad. Sci. USA. 2012;109:E1352–E1359. doi: 10.1073/pnas.1120084109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Poteete A.R., Rennell D., Bouvier S.E., Hardy L.W. Alteration of T4 lysozyme structure by second-site reversion of deleterious mutations. Protein Sci. 1997;6:2418–2425. doi: 10.1002/pro.5560061115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Povolotskaya I.S., Kondrashov F.A. Sequence space and the ongoing expansion of the protein universe. Nature. 2010;465:922–926. doi: 10.1038/nature09105. [DOI] [PubMed] [Google Scholar]
- Prabakaran P., Gan J., Feng Y., Zhu Z., Choudhry V., Xiao X., Ji X., Dimitrov D.S. Structure of severe acute respiratory syndrome coronavirus receptor-binding domain complexed with neutralizing antibody. J. Biol. Chem. 2006;281:15829–15836. doi: 10.1074/jbc.M600697200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Premkumar L., Segovia-Chumbez B., Jadi R., Martinez D.R., Raut R., Markmann A., Cornaby C., Bartelt L., Weiss S., Park Y. The receptor binding domain of the viral spike protein is an immunodominant and highly specific target of antibodies in SARS-CoV-2 patients. Sci. Immunol. 2020;5:eabc8413. doi: 10.1126/sciimmunol.abc8413. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pybus O.G., Rambaut A., Belshaw R., Freckleton R.P., Drummond A.J., Holmes E.C. Phylogenetic evidence for deleterious mutation load in RNA viruses and its contribution to viral evolution. Mol. Biol. Evol. 2007;24:845–852. doi: 10.1093/molbev/msm001. [DOI] [PubMed] [Google Scholar]
- Qu X.-X., Hao P., Song X.-J., Jiang S.-M., Liu Y.-X., Wang P.-G., Rao X., Song H.-D., Wang S.-Y., Zuo Y. Identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy. J. Biol. Chem. 2005;280:29588–29595. doi: 10.1074/jbc.M500662200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Quinlan B.D., Mou H., Zhang L., Guo Y., He W., Ojha A., Parcells M.S., Luo G., Li W., Zhong G. The SARS-CoV-2 receptor-binding domain elicits a potent neutralizing response without antibody-dependent enhancement. bioRxiv. 2020 doi: 10.1101/2020.04.10.036418. [DOI] [Google Scholar]
- Ravichandran S., Coyle E.M., Klenow L., Tang J., Grubbs G., Liu S., Wang T., Golding H., Khurana S. Antibody signature induced by SARS-CoV-2 spike protein immunogens in rabbits. Sci. Transl. Med. 2020;12:eabc3539. doi: 10.1126/scitranslmed.abc3539. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ren W., Qu X., Li W., Han Z., Yu M., Zhou P., Zhang S.-Y., Wang L.-F., Deng H., Shi Z. Difference in receptor usage between severe acute respiratory syndrome (SARS) coronavirus and SARS-like coronavirus of bat origin. J. Virol. 2008;82:1899–1907. doi: 10.1128/JVI.01085-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rockx B., Donaldson E., Frieman M., Sheahan T., Corti D., Lanzavecchia A., Baric R.S. Escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus. J. Infect. Dis. 2010;201:946–955. doi: 10.1086/651022. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rogers T.F., Zhao F., Huang D., Beutler N., Burns A., He W.-T., Limbo O., Smith C., Song G., Woehl J. Isolation of potent SARS-CoV-2 neutralizing antibodies and protection from disease in a small animal model. Science. 2020:eabc7520. doi: 10.1126/science.abc7520. Published online June 15, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Russell C.A., Kasson P.M., Donis R.O., Riley S., Dunbar J., Rambaut A., Asher J., Burke S., Davis C.T., Garten R.J. Improving pandemic influenza risk assessment. eLife. 2014;3:e03883. doi: 10.7554/eLife.03883. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Scanlon T.C., Gray E.C., Griswold K.E. Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries. BMC Biotechnol. 2009;9:95. doi: 10.1186/1472-6750-9-95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schmidt A.G., Xu H., Khan A.R., O’Donnell T., Khurana S., King L.R., Manischewitz J., Golding H., Suphaphiphat P., Carfi A. Preconfiguration of the antigen-binding site during affinity maturation of a broadly neutralizing influenza virus antibody. Proc. Natl. Acad. Sci. USA. 2013;110:264–269. doi: 10.1073/pnas.1218256109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schwehm J.M., Kristyanne E.S., Biggers C.C., Stites W.E. Stability effects of increasing the hydrophobicity of solvent-exposed side chains in staphylococcal nuclease. Biochemistry. 1998;37:6939–6948. doi: 10.1021/bi9725069. [DOI] [PubMed] [Google Scholar]
- Seydoux E., Homad L.J., MacCamy A.J., Parks K.R., Hurlburt N.K., Jennewein M.F., Akins N.R., Stuart A.B., Wan Y.-H., Feng J. Analysis of a SARS-CoV-2-Infected Individual Reveals Development of Potent Neutralizing Antibodies with Limited Somatic Mutation. Immunity. 2020;53:98–105.e5. doi: 10.1016/j.immuni.2020.06.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shah P., McCandlish D.M., Plotkin J.B. Contingency and entrenchment in protein evolution under purifying selection. Proc. Natl. Acad. Sci. USA. 2015;112:E3226–E3235. doi: 10.1073/pnas.1412933112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shang J., Ye G., Shi K., Wan Y., Luo C., Aihara H., Geng Q., Auerbach A., Li F. Structural basis of receptor recognition by SARS-CoV-2. Nature. 2020;581:221–224. doi: 10.1038/s41586-020-2179-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sheahan T., Rockx B., Donaldson E., Sims A., Pickles R., Corti D., Baric R. Mechanisms of zoonotic severe acute respiratory syndrome coronavirus host range expansion in human airway epithelium. J. Virol. 2008;82:2274–2285. doi: 10.1128/JVI.02041-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Sheahan T., Rockx B., Donaldson E., Corti D., Baric R. Pathways of cross-species transmission of synthetically reconstructed zoonotic severe acute respiratory syndrome coronavirus. J. Virol. 2008;82:8721–8732. doi: 10.1128/JVI.00818-08. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shi R., Shan C., Duan X., Chen Z., Liu P., Song J., Song T., Bi X., Han C., Wu L. A human neutralizing antibody targets the receptor-binding site of SARS-CoV-2. Nature. 2020;584:120–124. doi: 10.1038/s41586-020-2381-y. [DOI] [PubMed] [Google Scholar]
- Shusta E.V., Kieke M.C., Parke E., Kranz D.M., Wittrup K.D. Yeast polypeptide fusion surface display levels predict thermal stability and soluble secretion efficiency. J. Mol. Biol. 1999;292:949–956. doi: 10.1006/jmbi.1999.3130. [DOI] [PubMed] [Google Scholar]
- Smith D.J., Lapedes A.S., de Jong J.C., Bestebroer T.M., Rimmelzwaan G.F., Osterhaus A.D.M.E., Fouchier R.A.M. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004;305:371–376. doi: 10.1126/science.1097211. [DOI] [PubMed] [Google Scholar]
- Soskine M., Tawfik D.S. Mutational effects and the evolution of new protein functions. Nat. Rev. Genet. 2010;11:572–582. doi: 10.1038/nrg2808. [DOI] [PubMed] [Google Scholar]
- Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Starr T.N., Thornton J.W. Epistasis in protein evolution. Protein Sci. 2016;25:1204–1218. doi: 10.1002/pro.2897. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Starr T.N., Flynn J.M., Mishra P., Bolon D.N.A., Thornton J.W. Pervasive contingency and entrenchment in a billion years of Hsp90 evolution. Proc. Natl. Acad. Sci. USA. 2018;115:4453–4458. doi: 10.1073/pnas.1718133115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Suthar M.S., Zimmerman M., Kauffman R., Mantus G., Linderman S., Vanderheiden A., Nyhoff L., Davis C., Adekunle S., Affer M. Rapid generation of neutralizing antibody responses in COVID-19 patients. Cell Reports Medicine. 2020;1:100040. doi: 10.1016/j.xcrm.2020.100040. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tareen A., Kinney J.B. Logomaker: beautiful sequence logos in Python. Bioinformatics. 2020;36:2272–2274. doi: 10.1093/bioinformatics/btz921. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tien M.Z., Meyer A.G., Sydykova D.K., Spielman S.J., Wilke C.O. Maximum allowed solvent accessibilites of residues in proteins. PLoS ONE. 2013;8:e80635. doi: 10.1371/journal.pone.0080635. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tokuriki N., Stricher F., Serrano L., Tawfik D.S. How protein stability and new functions trade off. PLoS Comput. Biol. 2008;4:e1000002. doi: 10.1371/journal.pcbi.1000002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tong S., Conrardy C., Ruone S., Kuzmin I.V., Guo X., Tao Y., Niezgoda M., Haynes L., Agwanda B., Breiman R.F. Detection of novel SARS-like and other coronaviruses in bats from Kenya. Emerg. Infect. Dis. 2009;15:482–485. doi: 10.3201/eid1503.081013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trkola A., Kuster H., Rusert P., Joos B., Fischer M., Leemann C., Manrique A., Huber M., Rehr M., Oxenius A. Delay of HIV-1 rebound after cessation of antiretroviral therapy through passive transfer of human neutralizing antibodies. Nat. Med. 2005;11:615–622. doi: 10.1038/nm1244. [DOI] [PubMed] [Google Scholar]
- van Dorp L., Acman M., Richard D., Shaw L.P., Ford C.E., Ormond L., Owen C.J., Pang J., Tan C.C.S., Boshier F.A.T. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 2020;83:104351. doi: 10.1016/j.meegid.2020.104351. [DOI] [PMC free article] [PubMed] [Google Scholar]
- VanderPlas J., Granger B., Heer J., Moritz D., Wongsuphasawat K., Satyanarayan A., Lees E., Timofeev I., Welsh B., Sievert S. Altair: Interactive Statistical Visualizations for Python. JOSS. 2018;3:1057. [Google Scholar]
- Walls A.C., Xiong X., Park Y.-J., Tortorici M.A., Snijder J., Quispe J., Cameroni E., Gopal R., Dai M., Lanzavecchia A. Unexpected Receptor Functional Mimicry Elucidates Activation of Coronavirus Fusion. Cell. 2019;176:1026–1039.e15. doi: 10.1016/j.cell.2018.12.028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Walls A.C., Park Y.-J., Tortorici M.A., Wall A., McGuire A.T., Veesler D. Structure, Function, and Antigenicity of the SARS-CoV-2 Spike Glycoprotein. Cell. 2020;181:281–292.e6. doi: 10.1016/j.cell.2020.02.058. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wan Y., Shang J., Graham R., Baric R.S., Li F. Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus. J. Virol. 2020;94:e00127-20. doi: 10.1128/JVI.00127-20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang X., Minasov G., Shoichet B.K. Evolution of an antibiotic resistance enzyme constrained by stability and activity trade-offs. J. Mol. Biol. 2002;320:85–95. doi: 10.1016/S0022-2836(02)00400-X. [DOI] [PubMed] [Google Scholar]
- Weidenbacher P.A., Kim P.S. Protect, modify, deprotect (PMD): A strategy for creating vaccines to elicit antibodies targeting a specific epitope. Proc. Natl. Acad. Sci. USA. 2019;116:9947–9952. doi: 10.1073/pnas.1822062116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weile J., Roth F.P. Multiplexed assays of variant effects contribute to a growing genotype-phenotype atlas. Hum. Genet. 2018;137:665–678. doi: 10.1007/s00439-018-1916-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wentz A.E., Shusta E.V. A novel high-throughput screen reveals yeast genes that increase secretion of heterologous proteins. Appl. Environ. Microbiol. 2007;73:1189–1198. doi: 10.1128/AEM.02427-06. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wrapp D., Wang N., Corbett K.S., Goldsmith J.A., Hsieh C.-L., Abiona O., Graham B.S., McLellan J.S. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science. 2020;367:1260–1263. doi: 10.1126/science.abb2507. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wrapp D., De Vlieger D., Corbett K.S., Torres G.M., Wang N., Van Breedam W., Roose K., van Schie L., Hoffmann M., Pöhlmann S., VIB-CMB COVID-19 Response Team Structural Basis for Potent Neutralization of Betacoronaviruses by Single-Domain Camelid Antibodies. Cell. 2020;181:1004–1015.e15. doi: 10.1016/j.cell.2020.04.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wrobel A.G., Benton D.J., Xu P., Roustan C., Martin S.R., Rosenthal P.B., Skehel J.J., Gamblin S.J. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat. Struct. Mol. Biol. 2020;27:763–767. doi: 10.1038/s41594-020-0468-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu X., Yang Z.-Y., Li Y., Hogerkorp C.-M., Schief W.R., Seaman M.S., Zhou T., Schmidt S.D., Wu L., Xu L. Rational design of envelope identifies broadly neutralizing human monoclonal antibodies to HIV-1. Science. 2010;329:856–861. doi: 10.1126/science.1187659. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu K., Peng G., Wilken M., Geraghty R.J., Li F. Mechanisms of host receptor adaptation by severe acute respiratory syndrome coronavirus. J. Biol. Chem. 2012;287:8904–8911. doi: 10.1074/jbc.M111.325803. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu N.C., Xie J., Zheng T., Nycholat C.M., Grande G., Paulson J.C., Lerner R.A., Wilson I.A. Diversity of Functionally Permissive Sequences in the Receptor-Binding Site of Influenza Hemagglutinin. Cell Host Microbe. 2017;21:742–753.e8. doi: 10.1016/j.chom.2017.05.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu N.C., Thompson A.J., Lee J.M., Su W., Arlian B.M., Xie J., Lerner R.A., Yen H.-L., Bloom J.D., Wilson I.A. Different genetic barriers for resistance to HA stem antibodies in influenza H3 and H1 viruses. Science. 2020;368:1335–1340. doi: 10.1126/science.aaz5143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wu Y., Wang F., Shen C., Peng W., Li D., Zhao C., Li Z., Li S., Bi Y., Yang Y. A noncompeting pair of human neutralizing antibodies block COVID-19 virus binding to its receptor ACE2. Science. 2020;368:1274–1278. doi: 10.1126/science.abc2241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xue K.S., Bloom J.D. Linking influenza virus evolution within and between human hosts. Virus Evol. 2020;6:veaa010. doi: 10.1093/ve/veaa010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yan R., Zhang Y., Li Y., Xia L., Guo Y., Zhou Q. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science. 2020;367:1444–1448. doi: 10.1126/science.abb2762. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yang J., Wang W., Chen Z., Lu S., Yang F., Bi Z., Bao L., Mo F., Li X., Huang Y. A vaccine targeting the RBD of the S protein of SARS-CoV-2 induces protective immunity. Nature. 2020 doi: 10.1038/s41586-020-2599-8. Published online July 29, 2020. [DOI] [PubMed] [Google Scholar]
- Yuan M., Liu H., Wu N.C., Lee C.-C.D., Zhu X., Zhao F., Huang D., Yu W., Hua Y., Tien H. Structural basis of a shared antibody response to SARS-CoV-2. Science. 2020 doi: 10.1126/science.abd2321. Published online 13 July 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yuan M., Wu N.C., Zhu X., Lee C.D., So R.T.Y., Lv H., Mok C.K.P., Wilson I.A. A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV. Science. 2020;368:630–633. doi: 10.1126/science.abb7269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zang J., Gu C., Zhou B., Zhang C., Yang Y., Xu S., Zhang X., Zhou Y., Bai L., Wu Y. Immunization with the receptor-binding domain of SARS-CoV-2 elicits antibodies cross-neutralizing SARS-CoV-2 and SARS-CoV without antibody-dependent enhancement. bioRxiv. 2020 doi: 10.1101/2020.05.21.107565. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhang B.-Z., Hu Y.-F., Chen L.-L., Yau T., Tong Y.-G., Hu J.-C., Cai J.-P., Chan K.-H., Dou Y., Deng J. Mining of epitopes on spike protein of SARS-CoV-2 from COVID-19 patients. Cell Res. 2020;30:702–704. doi: 10.1038/s41422-020-0366-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou H., Chen X., Hu T., Li J., Song H., Liu Y., Wang P., Liu D., Yang J., Holmes E.C. A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein. Curr. Biol. 2020;30:2196–2203. doi: 10.1016/j.cub.2020.05.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou P., Yang X.-L., Wang X.-G., Hu B., Zhang L., Zhang W., Si H.-R., Zhu Y., Li B., Huang C.-L. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–273. doi: 10.1038/s41586-020-2012-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhou T., Teng I.-T., Olia A.S., Cerutti G., Gorman J., Nazzari A., Shi W., Tsybovsky Y., Wang L., Wang S. Structure-Based Design with Tag-Based Purification and In-Process Biotinylation Enable Streamlined Development of SARS-CoV-2 Spike Molecular Probes. bioRxiv. 2020 doi: 10.1101/2020.06.22.166033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zost S.J., Gilchuk P., Case J.B., Binshtein E., Chen R.E., Nkolola J.P., Schäfer A., Reidy J.X., Trivette A., Nargi R.S. Potently neutralizing and protective human antibodies against SARS-CoV-2. Nature. 2020 doi: 10.1038/s41586-020-2548-6. Published online July 15, 2020. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
We provide all data and code in the following ways:
- Raw data tables of our replicate functional scores at the level of single mutations (Table S2, and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/single_mut_effects.csv)
- Raw data tables of our replicate functional scores among sarbecovirus homologs (Table S1 and GitHub: https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/single_mut_effects/homolog_effects.csv)
- Interactive heatmaps for lookup of individual mutational effects and related information (Data S1 and GitHub: https://jbloomlab.github.io/SARS-CoV-2-RBD_DMS/)
- Illumina sequencing counts for each barcode among FACS bins (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/counts/variant_counts.csv)
- The complete variant:barcode lookup table (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/variants/codon_variant_table.csv)
- The complete computational workflow to generate and analyze these data, including reproducible code within a programmatically constructed computational environment (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS)
- A Markdown summary of the organization of analysis steps, with links to key data files and Markdown summaries of each step in the analysis pipeline (https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/results/summary/summary.md), with specific Markdown summaries linked in the relevant STAR Methods sections below
- All raw sequencing data are uploaded to the NCBI Short Read Archive: BioProject PRJNA639956.
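The CSV tables listed above are linked as GitHub "blob" pages, which return HTML rather than the file itself. The following minimal Python sketch shows one way a reader might fetch such a table programmatically. It is not part of the published pipeline: the helper names `blob_to_raw` and `fetch_rows` are hypothetical, and the sketch assumes that the file still exists at the listed path on the `master` branch. No column names are assumed; the header row of the downloaded file defines them.

```python
import csv
import io
import urllib.request

# URL copied from the data availability list (single-mutation functional scores).
BLOB_URL = (
    "https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS/blob/master/"
    "results/single_mut_effects/single_mut_effects.csv"
)


def blob_to_raw(url: str) -> str:
    """Convert a GitHub 'blob' page URL to the matching raw-file URL.

    Hypothetical helper for illustration only.
    """
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # Split "owner/repo/blob/branch/path" into "owner/repo" and "branch/path".
    owner_repo, branch_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}"


def fetch_rows(url: str) -> list[dict[str, str]]:
    """Download a CSV (requires network access) and return one dict per row.

    Keys come from the file's own header row; all values are left as strings.
    """
    with urllib.request.urlopen(blob_to_raw(url)) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `fetch_rows(BLOB_URL)` and inspecting the keys of the first row is a reasonable first step before any analysis, because the column layout should be read from the file rather than assumed. The same two functions apply unchanged to the homolog, barcode count, and variant lookup tables listed above.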