eLife. 2024 Jun 28;13:RP94836. doi: 10.7554/eLife.94836

Modulation of biophysical properties of nucleocapsid protein in the mutant spectrum of SARS-CoV-2

Ai Nguyen 1, Huaying Zhao 1, Dulguun Myagmarsuren 1, Sanjana Srinivasan 1, Di Wu 2, Jiji Chen 3, Grzegorz Piszczek 2, Peter Schuck 1
Editors: Mauricio Comas-Garcia4, Qiang Cui5
PMCID: PMC11213569  PMID: 38941236

Abstract

Genetic diversity is a hallmark of RNA viruses and the basis for their evolutionary success. Taking advantage of the uniquely large genomic database of SARS-CoV-2, we examine the impact of mutations across the spectrum of viable amino acid sequences on the biophysical phenotypes of the highly expressed and multifunctional nucleocapsid protein. We find variation in the physicochemical parameters of its extended intrinsically disordered regions (IDRs) sufficient to allow local plasticity, but also observe functional constraints that similarly occur in related coronaviruses. In biophysical experiments with several N-protein species carrying mutations associated with major variants, we find that point mutations in the IDRs can have nonlocal impact and modulate thermodynamic stability, secondary structure, protein oligomeric state, particle formation, and liquid-liquid phase separation. In the Omicron variant, distant mutations in different IDRs have compensatory effects in shifting a delicate balance of interactions controlling protein assembly properties, and include the creation of a new protein-protein interaction interface in the N-terminal IDR through the defining P13L mutation. A picture emerges where genetic diversity is accompanied by significant variation in biophysical characteristics of functional N-protein species, in particular in the IDRs.

Research organism: Viruses

eLife digest

Like other types of RNA viruses, the genetic material of SARS-CoV-2 (the agent responsible for COVID-19) is formed of an RNA molecule which is prone to accumulating mutations. This gives SARS-CoV-2 the ability to evolve quickly, and often to remain one step ahead of treatments. Understanding how these mutations shape the behavior of RNA viruses is therefore crucial to keep diseases such as COVID-19 under control.

The gene that codes for the protein that ‘packages’ the genetic information inside SARS-CoV-2 is particularly prone to mutations. This nucleocapsid (N) protein participates in many key processes during the life cycle of the virus, including potentially interfering with the immune response. Exactly how the physical properties of the N-protein are impacted by the mutations in its genetic sequence remains unclear.

To investigate this question, Nguyen et al. predicted the various biophysical properties of different regions of the N-protein based on a computer-based analysis of SARS-CoV-2 genetic databases. This allowed them to determine if specific protein regions were positively or negatively charged in different mutants. The analyses showed that some domains exhibited great variability in their charge between protein variants – reflecting the fact that the corresponding genetic sequences showed high levels of plasticity. Other regions remained conserved, however, including across related coronaviruses.

Nguyen et al. also conducted biochemical experiments on a range of N-proteins obtained from clinically relevant SARS-CoV-2 variants. Their results highlighted the importance of protein segments with no fixed three-dimensional structure. Mutations in the related sequences created high levels of variation in the physical properties of these ‘intrinsically disordered’ regions, which had wide-ranging consequences. Some of these genetic changes even gave individual N-proteins the ability to interact with each other in a completely new way.

These results shed new light on the relationship between genetic mutations and the variable physical properties of RNA virus proteins. Nguyen et al. hope that this knowledge will eventually help to develop more effective treatments for viral infections.

Introduction

A salient characteristic of RNA viruses is their high error rate in transcription and their resulting quasispecies nature (Eigen, 1996; Domingo and Holland, 1997). This diversity is also reflected in the ensemble of consensus sequences sampled across the infected host population, as is apparent in the GISAID (Global Initiative on Sharing All Influenza Data) repository of SARS-CoV-2 genomes (Elbe and Buckland-Merrett, 2017). With currently ≈15 million entries, this unprecedentedly large database has provided the basis for phylogenetic analyses that have identified critical amino acid mutations associated with immune evasion, infectivity, and disease severity, and allowed the rapid identification of variants of concern (Greaney et al., 2022; Kepler et al., 2021; Obermeyer et al., 2022; Rochman et al., 2021; Viana et al., 2022). The vast majority of mutations, however, seem inconsequential in that they usually do not lead to any fixed substitutions. Nonetheless, the mutant spectrum exhaustively describes a landscape of amino acids that may occupy any position in the viral proteins, as in a natural deep mutational scan (Bloom and Neher, 2023; Schuck and Zhao, 2023; Zhao et al., 2022). Biophysical constraints implicit in the shape of such landscapes are key to understanding the function and molecular evolution of viral proteins (Starr and Thornton, 2016; Wang et al., 2021).

Unfortunately, the wealth of genomic information on SARS-CoV-2 stands in stark contrast with our knowledge of the phenotypic consequences of sequence mutations. In conjunction with biophysical and structural studies, inspections of local mutations have increased our understanding of mechanisms of SARS-CoV-2 entry, mechanisms of replication and assembly, and interaction with various host factors (Dadonaite et al., 2023; Del Veliz et al., 2021; Greaney et al., 2022; Hu et al., 2023; Stevens et al., 2022; Syed et al., 2021; Zhao et al., 2023; Zhao et al., 2022). Furthermore, the range of naturally occurring mutations at target sites is an important consideration for potential drugs, vaccines, and diagnostics (Artesi et al., 2020; Saldivar-Espinoza et al., 2022; Tian et al., 2022). Outside these focused studies of relatively well-understood hot spots, however, the mutational landscape has remained relatively unexplored.

Biophysical fitness landscapes have been studied with regard to observables such as thermal stability of globular proteins, solvent accessibility, catalytic activity, or binding affinity of protein-protein interfaces, which has led to significant advances in understanding the relationship between molecular properties, population fitness, and evolutionary processes (Bershtein et al., 2017; Bloom et al., 2006; Echave and Wilke, 2017; Lässig et al., 2017; Liberles et al., 2012; Serohijos and Shakhnovich, 2014; Sikosek and Chan, 2014; Wang et al., 2015). However, it was found that constraints for evolution of intrinsically disordered regions (IDRs) are quite different from those of globular proteins (Brown et al., 2010; Lafforgue et al., 2022). Generally, intrinsic disorder and loose packing are common characteristics of many RNA virus proteins (Tokuriki et al., 2009), which are thought to promote functional promiscuity, permit greater diversity, and enhance evolvability to adopt new functions with few mutations (Charon et al., 2018; Gitlin et al., 2014; Tokuriki and Tawfik, 2009). One possible mechanism is viral mimicry of host-protein short linear motifs (SLiMs) that allow binding to host protein domains and cause subversion of host cellular pathways (Davey et al., 2015; Davey et al., 2011; Hagai et al., 2014; Kruse et al., 2021; Mihalič et al., 2023; Schuck and Zhao, 2023; Shuler and Hagai, 2022). It was also shown how nonlocal biophysical properties, such as the charge of IDRs, can be relevant evolutionary traits (Zarin et al., 2021; Zarin et al., 2017). More recently, it was recognized that the formation of membrane-less cellular compartments driven by liquid-liquid phase separation (LLPS) is a key aspect of many intrinsically disordered proteins, including many viral proteins (Cascarina and Ross, 2022; Zhang et al., 2023). An understanding of the sequence constraints that may derive from the biophysical requirement to conserve LLPS properties is only now emerging (Brown et al., 2011; Chin et al., 2022; Ho and Huang, 2022; Lin et al., 2017; Riback et al., 2017).

The goal of the present work is to probe the phenotypic diversity with respect to several biophysical properties of SARS-CoV-2 nucleocapsid (N-)protein, taking advantage of the vast mutational landscape of SARS-CoV-2. N-protein is the most abundant viral protein in the infected cell (Finkel et al., 2021), and as we reported previously (Zhao et al., 2022), it is also the most diverse structural protein with approximately 86% of its 419 residues capable of assuming on average four to five different amino acids evidently without impairment of viability. The highest frequency of mutations occurs in the substantial IDRs which are the N-arm, linker, and C-arm that flank and connect the folded nucleic acid binding domain (NTD) and the dimerization domain (CTD) (Figure 1). The IDRs comprise approximately half of the molecule and allow large conformational fluctuations (Botova et al., 2024; Cubuk et al., 2021; Redzic et al., 2021). The eponymous structural function of N-protein is that of scaffolding genomic RNA for virion assembly. It proceeds via nucleic acid (NA) binding-induced conformational changes and oligomerization, leading to the formation of ribonucleoprotein (RNP) particles with as yet unknown molecular architecture, ≈38 of which are arranged like beads-on-a-string in the viral particle (Carlson et al., 2022; Cubuk et al., 2021; Klein et al., 2020; Yao et al., 2020; Zhao et al., 2024; Zhao et al., 2023; Zhao et al., 2021), and are anchored through binding of N-protein to viral M-protein (Lu et al., 2021; Masters, 2019). Beyond this structural role, N-protein is highly multifunctional and binds to multiple host proteins to modulate or exploit different pathways, including stress granules (Biswal et al., 2022; Gordon et al., 2020; Savastano et al., 2020), the type 1 interferon signaling pathway (Chen et al., 2020; Li et al., 2020), the NLRP3 inflammasome (Pan et al., 2021), and others, as recently reviewed (Wu et al., 2023; Yu et al., 2023). N-protein can form macromolecular condensates through LLPS that aid in assembly functions and interactions with host proteins (Carlson et al., 2020; Cascarina and Ross, 2022; Cubuk et al., 2021; Iserman et al., 2020; Jack et al., 2021; Lu et al., 2021; Perdikari et al., 2020; Savastano et al., 2020). In addition, it is also localized at exterior cell surfaces, where it was found to bind many different chemokines, likely manipulating innate immunity through chemokine sequestration (López-Muñoz et al., 2022).

Figure 1. Structural organization and sequence plasticity of N-protein.

(A) Schematics of folded regions (NTD and CTD, rectangles) and disordered regions (N-arm, linker, and C-arm, straight line) along the N-protein sequence. Defining mutations from the Delta variant are indicated in blue, those from Omicron variants in magenta. Transient helices in the disordered regions are highlighted, as well as SR-rich and L-rich linker sequences and the C-terminal N3 region. (B) Histogram of the number of distinct amino acid mutations at each position. For clarity and reference to other figures, intrinsically disordered regions (IDRs) are shaded with N-arm highlighted in yellow, linker in magenta, and C-arm in cyan.

The large number of structural and non-structural N-protein functions poses the question of how they are conserved in light of the significant sequence diversity. In the present work we computationally evaluate the range of several biophysical traits resulting from diversity in the SARS-CoV-2 N-protein folded domains and IDRs across the observed mutant spectrum, as well as related coronaviruses. In complementary biophysical experiments with several representative N-protein mutants derived from SARS-CoV-2 variants of concern, we characterize their variation in thermodynamic stability, secondary structure, oligomeric state, energetics of NA binding, assembly, and LLPS propensity. We find that a large biophysical parameter space is available for viable N-protein, with the potential for mutations to exert nonlocal effects modulating overall protein biophysical properties.

Results

Distribution of physicochemical properties across the SARS-CoV-2 mutant spectrum

SARS-CoV-2 sequence data were downloaded from Nextstrain (Hadfield et al., 2018) in January 2023 and 5.06 million high-quality sequences were selected for analysis. The N-protein amino acid sequences exhibit ≈43 million instances of mutations distributed across ≈92% of its residues. We have previously characterized this dataset with regard to the amino acid mutational landscape of N-protein, and found mutation frequencies that are strongly dependent on position and largely time-invariant, except for the defining mutations arising in variants of concern, the latter comprising ≈36% Delta variant and ≈49% Omicron variant sequences (Schuck and Zhao, 2023). A histogram of the number of different amino acid mutations found at each residue is shown in Figure 1B. It may be discerned that sequence plasticity is highest in the IDRs, with an average of 5.2 different possible amino acid mutations at each residue compared to 2.9 different mutations on average in the folded domains.
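As a minimal illustration of the per-residue statistic plotted in Figure 1B, the following Python sketch counts the distinct substituted amino acids at each position and averages them over a region. It is not the authors' pipeline: the function names are hypothetical, and it assumes that all sequences are already aligned to the reference with equal length, with gap (`-`) and unknown (`X`) characters ignored.

```python
def distinct_mutations_per_position(reference, sequences):
    """Count the distinct substituted amino acids seen at each position.

    Assumption: every sequence is aligned to `reference` (same length);
    gap ('-') and unknown ('X') characters are not counted as mutations.
    """
    observed = [set() for _ in reference]
    for seq in sequences:
        for i, (ref_aa, aa) in enumerate(zip(reference, seq)):
            if aa != ref_aa and aa not in "-X":
                observed[i].add(aa)
    return [len(s) for s in observed]


def mean_over_region(counts, start, end):
    """Average count over residues start..end (1-based, inclusive)."""
    window = counts[start - 1:end]
    return sum(window) / len(window)
```

Applied to the real dataset, `mean_over_region` evaluated over the IDR and folded-domain boundaries would correspond to the kind of averages quoted above (5.2 versus 2.9), provided the same filtering is used.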

Exploiting the N-protein mutational landscape and sequence data, previous work in our laboratory has focused on local amino acid sequence properties such as mutation effects on transient structural features in the linker IDR (Zhao et al., 2023) and the creation of SLiMs (Schuck and Zhao, 2023). However, nonlocal biophysical properties may also be functionally critical and evolutionarily conserved despite amino acid sequence heterogeneity in IDRs (Zarin et al., 2021; Zarin et al., 2017). The sequence ensembles extracted from the genomic database allow us to ask whether physicochemical properties are constrained or can vary across viable sequences of the mutation spectrum.

To this end, genome data were sorted into unique groups with distinct N-protein amino acid sequences, each sequence carrying a set of distinct mutations that represent a viable N-protein species. For a robust analysis, each mutated sequence was required to be represented in at least 10 different genomes in the database. This led to 6300 distinct full-length N-protein sequences (N-FL; 1–419). We similarly subdivided the N-protein into different regions (Figure 1A) and grouped unique sets of mutations in each region: for the folded domains we found 720 distinct NTD (N:45–179) and 399 distinct CTD (N:248–363) sequences, while for the IDRs there are 512 N-arm (N:1–44), 1039 linker (N:175–247), and 556 C-arm (N:364–419) sequences. (Due to ambiguity in delineation between NTD and linker, designations overlapping in 175–180 were used to avoid artificial truncation and permit conservative evaluation of the properties of each domain.) Further subdividing the linker, there are 349 distinct sequences for the SR-rich region (N:175–205) and 442 for the L-rich region (N:206–247), respectively. Finally, similarly subdividing the C-arm, we obtained 176 sequences for the N3 region (N:390–419) and 242 for the remainder of the C-arm (N:364–389).
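The grouping and thresholding step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used: sequences are taken to be aligned strings of equal length (so that residue numbers map directly to string indices), the helper names are hypothetical, and applying the 10-genome threshold at the level of each regional peptide is one possible reading of the text.

```python
from collections import Counter

# Region boundaries (1-based, inclusive) as quoted in the text.
REGIONS = {
    "N-arm": (1, 44),
    "NTD": (45, 179),
    "linker": (175, 247),
    "CTD": (248, 363),
    "C-arm": (364, 419),
}


def viable_sequences(sequences, min_genomes=10):
    """Return the distinct sequences observed in at least `min_genomes` genomes."""
    counts = Counter(sequences)
    return {seq for seq, n in counts.items() if n >= min_genomes}


def viable_region_sequences(sequences, start, end, min_genomes=10):
    """Apply the same filter to the peptide spanning residues start..end."""
    peptides = [seq[start - 1:end] for seq in sequences]
    return viable_sequences(peptides, min_genomes)
```

For example, `viable_region_sequences(all_n_sequences, *REGIONS["linker"])` would return the set of distinct linker peptides, where `all_n_sequences` is a hypothetical list holding one N-protein sequence per genome.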

We first examine polarity and hydrophobicity of N-protein and different regions based on their amino acid compositions. As shown in beehive plots of Figure 2, where each of the partially overlapping black dots represents one species from the cloud of mutant sequences, the index values of all N-FL sequences fall within a very narrow range (left column). Properties of the full-length protein may obscure significant differences on a smaller scale, in particular since the polarity and hydrophobicity indices are weighted-average properties. Focusing on folded N-protein modules, we find that hydrophobicity is uniformly high and polarity correspondingly low in the folded NTD and CTD domains, which is consistent with the expectation that folded structures are stabilized by buried hydrophobic residues (Eisenberg and McLachlan, 1986; Kauzmann, 1959). By contrast, IDRs exhibit significantly higher polarity and lower hydrophobicity. In particular, the N-arm and C-arm are most polar: despite a very large dispersion across the mutant spectrum, their values do not overlap with those of the folded domains.
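To make the notion of a composition-based index concrete, the sketch below averages a per-residue hydropathy value over a peptide. The Kyte-Doolittle scale is used here purely as an assumed stand-in; the scales underlying the polarity and hydrophobicity indices of Figure 2 are not specified in this section and may differ.

```python
# Kyte-Doolittle hydropathy values (assumed stand-in scale, not necessarily
# the one used for Figure 2).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Composition-weighted average hydropathy; non-standard symbols are skipped."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)
```

Because the index is an average over all residues, a single substitution shifts the value of a short peptide far more than that of the 419-residue full-length protein, which is consistent with the narrow N-FL distribution and the broader regional distributions described above.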

Figure 2. Beehive plots showing the distributions of polarity and hydrophobicity of viable N-protein species across the mutant spectrum.

The polarity index (A) and hydrophobicity index (B) were calculated based on amino acid composition for all distinct sequences of N-FL, the folded domains (NTD and CTD), and the intrinsically disordered regions (IDRs) (N-arm, linker, and C-arm). Further subdivisions of the linker into the SR-rich and L-rich regions, and subdivisions of the C-arm into the N3 region and the remainder of the C-arm (C-arm1), are indicated by the arrows. Highlighted by horizontal lines are the values for the corresponding peptides from the ancestral sequence Wuhan-Hu-1 (blue), and including the defining mutations of the Delta variant (dotted red) and the Omicron variant (dashed green), respectively. Symbols indicate values for SARS-CoV-2 (ancestral reference, light blue circles), and corresponding peptides from SARS-CoV-1 (red up triangles), Middle East respiratory syndrome coronavirus (MERS) (red down triangles), murine hepatitis virus (MHV) (red squares), human coronavirus NL63 (gray pentagrams), and the bat coronavirus APD51511.1 (gray diamonds).

It is useful to subdivide the linker IDR further to distinguish the SR-rich region (N:175–205), which exhibits high polarity and low hydrophobicity, from the L-rich region (N:206–247), which exhibits opposite behavior and is among the sequence stretches with lowest polarity values and highest hydrophobicity (Figure 2, red arrows in magenta shaded columns). Despite significant spread across the mutant spectrum, there is no overlap in these properties, which suggests biophysical constraints require the distinct polar and non-polar properties of the SR-rich region and the L-rich region, respectively. Indeed, these regions in the linker IDR have been recognized to play distinct functional roles: The SR-rich region provides a major hub for phosphorylation, aids in NA binding, and mediates NA binding-induced allosteric interactions between NTD and the L-rich region (Pontoriero et al., 2022; Yaron et al., 2022; Zhao et al., 2023). This is distinct from the L-rich region, which has a propensity for the formation of transient helices that interact with NSP3 (Bessa et al., 2022), and can assemble via hydrophobic interactions to form coiled-coil oligomers that contribute to the architecture of RNPs in viral assembly (Adly et al., 2023; Zhao et al., 2024; Zhao et al., 2023).

Similarly, the C-arm IDR can be subdivided into the N3 region (N:390–419) and the remainder (‘C-arm1’, N:364–389), which also have strikingly different properties (Figure 2, blue arrows in cyan shaded columns): whereas the connecting C-arm portion is by far the most polar, the C-terminal N3 region is among the most hydrophobic regions of the entire protein. Interestingly, the N3 region contains a transient helix (Cubuk et al., 2021; Zhao et al., 2023; Zhao et al., 2022), which may be involved in recognition of the packaging signal and M-protein interactions localized here (Kuo et al., 2016; Masters, 2019). Again, the difference in the physicochemical properties of these regions persists throughout the entire ensemble of sequences despite their significant spread and high mutation frequencies (Figure 1B).

Charges in proteins can control multiple properties related to electrostatic interactions, from functions of active sites to protein solubility, protein interactions, and conformational ensembles in IDRs (Garcia-Viloca et al., 2004; Gerstein and Chothia, 1996; Gitlin et al., 2006; Mao et al., 2010). The net charges of the different N-protein regions at pH 7.4 are displayed in Figure 3A. Similar to polarity and hydrophobicity, viable sequences can have significant spread of net charges among all the mutants, amounting to departures by ±(1–2) from the ancestral sequence. This is expected considering the replacement and introduction of charged residues in the mutational landscape, e.g., including those from the defining substitutions of variants. The positive charge of the overall basic protein is shared similarly among all folded domains and IDRs. However, noteworthy is again the contrast arising from subdivision of the linker and C-arm, which displays uneven and non-overlapping distributions: despite the strongly basic character of the linker, its L-rich sequence is nearly neutral; similarly, the basic C-arm splits into an even more basic C-arm1 and an acidic N3 tail region. These differences are highly significant and persist throughout the mutant spectrum.
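A net charge at a given pH can be estimated from composition with the Henderson-Hasselbalch relation, as in the following sketch. The side-chain pKa values are assumed, illustrative numbers rather than those used for Figure 3, and terminal groups are omitted because the regions are internal segments of a longer chain; both choices are assumptions of this example.

```python
# Assumed, illustrative side-chain pKa values (not taken from the paper).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph=7.4):
    """Estimate the net charge of a peptide from its amino acid composition."""
    charge = 0.0
    for aa in seq:
        if aa in POSITIVE_PKA:
            # Fraction of basic side chains that are protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            # Fraction of acidic side chains that are deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge
```

In this model a substitution such as G to R adds close to one positive charge, and D to G removes close to one negative charge, which illustrates how a few substitutions can produce the departures of ±(1–2) from the ancestral sequence noted above.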

Figure 3. Beehive plots showing the distributions of charges of viable N-protein species.

(A) Charges were calculated based on the amino acid composition of different N-protein regions as in Figure 2. Highlighted by horizontal lines are the values for the corresponding peptides from the ancestral sequence Wuhan-Hu-1 (blue), and including the defining mutations of the Delta variant (dotted red) and the Omicron variant (dashed green), respectively. Symbols indicate values for SARS-CoV-2 (ancestral sequence, blue circles), SARS-CoV-1 (red up triangles), Middle East respiratory syndrome coronavirus (MERS) (red down triangles), murine hepatitis virus (MHV) (red squares), NL63 (gray pentagrams), and bat coronavirus APD51511.1 (gray diamonds). (B) Same as in (A), with added charges from maximally phosphorylated serine, threonine, and tyrosine residues in the intrinsically disordered regions (IDRs).

It is well established that intracellular N-protein can be heavily phosphorylated (in contrast to N-protein in the virion) (Botova et al., 2024; Carlson et al., 2020; Fung and Liu, 2018; Johnson et al., 2022; Yaron et al., 2022). As reviewed in Yaron et al., 2022, most serine, threonine, and tyrosine residues in the disordered regions (30 of 37) have been found phosphorylated in different proteomic analyses. Accordingly, we estimated the maximum charge when all of these residues in the IDRs are phosphorylated (Figure 3B). This leads to a negative charge for all IDRs. As might be expected, the largest impact was found in the SR-rich region of the linker, which carries the highest density of phosphorylation sites. Interestingly, despite the considerable spread of net charges within families of mutant sequences, the differences between the regions remain highly significant.
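The maximal-phosphorylation estimate can be sketched as a simple offset to the unmodified net charge. The charge of −2 per phosphate group is an assumption of this example (phosphate monoesters are close to doubly deprotonated at pH 7.4), and the function name is hypothetical; the exact value used for Figure 3B is not stated in this section.

```python
def max_phospho_charge(seq, base_charge, charge_per_site=-2.0):
    """Net charge if every Ser, Thr, and Tyr in `seq` were phosphorylated.

    `base_charge` is the net charge of the unmodified peptide;
    `charge_per_site` is an assumed charge contribution per phosphate group.
    """
    n_sites = sum(seq.count(aa) for aa in "STY")
    return base_charge + charge_per_site * n_sites
```

Because the offset scales with the number of S, T, and Y residues, peptides dense in serine, such as the SR-rich region, shift the most, in line with the observation above.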

It is noteworthy that the defining mutations of the Delta and Omicron variants (denoted by dotted red and dashed green horizontal lines, respectively) do impact the hydrophobicity, polarity, and charges in all of the N-protein regions. However, their values do not stand out from the clouds of values across the mutant spectrum, which include more extreme values throughout.

Physicochemical properties of related coronaviruses

The distinct physicochemical properties of the linker and C-arm sub-segments persist throughout the mutant spectrum, which suggests these constitute biophysical constraints for functional SARS-CoV-2 N-protein. Therefore, we asked whether this holds true for N-protein from related coronaviruses such as SARS-CoV-1 (P59595.1), Middle East respiratory syndrome coronavirus (MERS, YP_009047211.1), murine hepatitis virus (MHV, NP_045302.1), human coronavirus NL63 (Q6Q1R8.1), and the 229E-related bat coronavirus APD51511.1. To this end, we used alignment of their consensus sequences to SARS-CoV-2 N-protein (shown previously; Zhao et al., 2022) to subdivide all N-proteins into equivalent regions (Supplementary file 1). As shown in Table 1, the resulting peptides present high sequence identity scores for the FL protein and the folded domains, but, with exception of SARS-CoV-1, have little to no sequence identity in the IDRs. This observation is consistent with the high mutation frequency of the IDRs.

Table 1. Sequence alignment score of segments from related coronaviruses.

Virus Full-length N-arm NTD Linker SR-rich L-rich CTD C-arm C-arm1 N3
SARS-CoV-1 672* 68.6 263 41.6 44.7 30 231 60.5 75.3 77
MERS 276 13.9 157 112 14.6 23.5
MHV 192 114 14.6 80.5 14.6 13.4
NL63 67.4 58.9 61.6
APD51511.1 61.2 44.3 44.3
* Values are BLASTp total alignment scores.

The resulting peptides were subjected to the same analyses of physicochemical properties described above for SARS-CoV-2 N-protein. The results are displayed in Figures 2 and 3 as symbols. With regard to hydrophobicity (Figure 2B), the FL proteins and folded domains show values within the range of the SARS-CoV-2 mutant spectrum. By contrast, more significant spread is observed in most IDR peptides. Nonetheless, the pattern observed for SARS-CoV-2 of hydrophobicity and polarity values of IDRs relative to those of the folded domains, and the pattern comparing subdivisions of the IDRs is closely mirrored for SARS-CoV-1, MERS, and MHV (red symbols). Similar patterns, although with some divergence, are observed for the NL63 and APD51511.1 IDRs (gray pentagrams and diamonds, respectively) which have the least sequence identity to SARS-CoV-2.

Polarity values (Figure 2A) of all coronavirus linker peptides are higher than those of their corresponding FL, NTD, and CTD regions. The subdivision of the linker into the peptides corresponding to the SR-rich and L-rich regions of SARS-CoV-2 follows the same qualitative trend, with higher polarity in the equivalent SR-rich and lower polarity in the equivalent L-rich peptides for all coronaviruses studied. Similarly, the properties of the equivalent C-arm and its subdivision into C-arm1 and N3 peptides for SARS-CoV-1, MERS, and MHV (red symbols) closely track the values from the mutant spectrum of SARS-CoV-2, although this is not the case for the more distant NL63 and APD51511.1 (gray symbols).

Charge properties of related coronaviruses follow a pattern similar to that of SARS-CoV-2 (Figure 3A), although with somewhat greater differences, particularly again for NL63 and APD51511.1. Peptides corresponding to L-rich regions exhibit low charge, distinctly below those of the SR-rich regions, and similarly, N3 peptides have lower charges than C-arm1 peptides of the corresponding viral species, and nearly all are acidic. Even though it is unclear to what extent IDRs of other coronaviruses can be phosphorylated, their amino acid composition would provide a potential similar to that of SARS-CoV-2, as the completely phosphorylated charges of all peptides follow closely those of SARS-CoV-2 (Figure 3B).

This suggests that the charge properties and phosphorylation, like polarity and hydrophobicity, of the equivalent IDR sub-regions are functional biophysical constraints maintained across related coronaviruses despite little sequence conservation.

Biophysical properties of select mutants

Unfortunately, it is impossible to express and experimentally characterize biophysical properties of all mutant species. Therefore, to assess the range of phenotype variation, we examine only six exemplary protein constructs related to variants of concern in comparison with the Wuhan-Hu-1 reference molecule, Nref (Table 2): (1) N:R203K/G204R with a double mutation in the disordered linker that arose early in the Alpha variant (B.1.1.7), but occurs also in the Gamma variant (P.1) and all Omicron variants (BA.1 through BA.5), and was found to modulate phosphorylation of cytosolic N-protein, enhance assembly in a VLP assay, and increase viral fitness (Johnson et al., 2022; Javed et al., 2023; Syed et al., 2022); (2) N:P13L/Δ31–33 carrying the mutation P13L and the deletion Δ31–33 that are part of the defining mutations of all Omicron variants, with P13L epidemiologically ranked as the most statistically significant N-protein mutation linked to increased fitness (Obermeyer et al., 2022; Oulas et al., 2021); (3) Nο, a combination of N:R203K/G204R and N:P13L/Δ31–33, thereby carrying the complete set of defining mutations of the BA.1 Omicron variant; (4) N:G215C with a key mutation in the disordered linker that was associated with the rise of the 21J clade of the Delta variant, found to modulate a transient helix in the L-rich linker region (Zhao et al., 2022), and recently reported in a reverse genetics system to cause significantly increased viral growth and altered virion morphology (Kubinski et al., 2024); (5) N:D63G containing another defining mutation of the Delta variant, located in the NTD and epidemiologically ranked above G215C in increasing SARS-CoV-2 fitness (Obermeyer et al., 2022); and (6) Nδ carrying all four defining mutations D63G, R203M, G215C, and D377Y of the Delta variant. As detailed in Table 2, all of these species are found in the genomic database, and in combination with additional mutations occur in a high fraction of all genomes (exceeding the frequency of the ancestral Wuhan-Hu-1 N-protein by an order of magnitude). However, with the exception of N:G215C, none of the mutants has been studied in detail with regard to their macromolecular biophysical properties.

Table 2. Overview of N-protein species compared in biophysical experiments.

Designation | N-protein mutations | n exclusive instances* | Occurs in # of distinct sequences | Occurs in % of all genomes | In set of defining VOC mutations§
N:R203K/G204R | R203K, G204R | 53,282 | 17,552 | 57% | α, γ, ο
N:P13L/Δ31–33 | P13L, Δ31–33 | 9548 | 12,503 | 47% | ο
Nο | P13L, Δ31–33, R203K, G204R | 791,613 | 10,238 | 46% | ο (all BA.1)
Nδ | D63G, R203M, G215C, D377Y | >1.2 × 10^6 | 9397 | 33% | δ (all 21J)
N:G215C | G215C | 60 | 10,562 | 34% | δ
N:D63G | D63G | 182 | 12,443 | 36% | δ
Nref | none | 38,929 | NA | 3.6% | NA

* Number of genomes where the indicated mutations are the only N mutations.

Number of unique N-protein sequences in which indicated mutations are present, alongside other mutations.

Percentage of all sequenced genomes carrying the specific mutation.

§ Variants of concern for which indicated mutations are part (or all) of the defining set of N-m.

These sets of mutations comprise all defining N-protein mutations of this variant.

Literature on definition or biophysical characterization of the mutant.

All mutations considered here are within the IDRs, except for N:D63G, a mutation characteristic of the Delta variant. The presence of the N:D63G mutation in the NTD is reflected in the shift of the intrinsic fluorescence quantum yield of this mutant in comparison to Nref (Figure 4A). This may be attributed to changes in the local environment of tryptophan W108, which is partially surface exposed and structurally near the aspartic acid D63, as indicated by AlphaFold structural predictions (Figure 4—figure supplement 1). D63G ablates a negative surface charge near the NA binding site of the NTD, which raises the question of whether this mutation alters NA binding affinity. We assessed this using sedimentation velocity analytical ultracentrifugation (SV-AUC) with the oligonucleotide T10 as an NA probe. T10 is comparable in length to the NTD binding canyon for NA but does not permit multi-valent binding (Dinesh et al., 2020; Zhao et al., 2021). No significant difference in the intrinsic binding affinity to T10 was detected between N:D63G, other mutants, and the ancestral species (Figure 4—figure supplement 2).

Figure 4. Thermodynamic stability and structural differences of N-protein reference and mutant species.

(A) Intrinsic fluorescence spectrum of N:D63G in comparison with Nref, showing spectra in triplicate. (B) Differential scanning fluorometry, with the temperature of maximum fluorescence ratio derivative (Ti-values, with an estimated precision of 0.3°C). (C) Circular dichroism spectra of all N-protein species (spectra with error bars are shown in Figure 4—figure supplement 3).

Figure 4—figure supplement 1. Structural comparison of N:D63G mutant and ancestral N-protein.

Structures are predicted using ColabFold for N:D63G (left) and the ancestral protein (right).
Figure 4—figure supplement 2. N-protein affinity for binding nucleic acids (NA) probed by sedimentation velocity analytical ultracentrifugation (SV-AUC) of N-protein mixtures with oligonucleotide T10.

T10 can occupy the NA binding groove of the NTD of N-protein, but does not permit multi-valent binding. Titration series of N-protein with T10 allows separation of concentration-dependent populations of free and bound/co-migrating T10 in the mixtures. This provides the basis for the determination of equilibrium binding constants through non-linear regression of the isotherm of signal weighted-average sedimentation coefficients using a two-site binding model of T10 to N-protein dimers. Best-fit KD-values and 95% confidence intervals are 1.1 [0.8–1.6] µM for Nref, 2.8 [1.6–4.9] µM for No, 2.4 [1.1–5.0] µM for N:D63G, and 1.3 [0.9–1.9] µM for N:R203K/G204R, respectively. SV-AUC experiments were carried out in high-salt (HS) buffer. Similarly, no significant difference in binding affinity between Nref and N:D63G was observed in low-salt (LS) buffer.
Figure 4—figure supplement 3. Individual comparison of circular dichroism (CD) spectra.

The data from Figure 4C are reproduced and plotted in comparison with Nref. Standard deviations from three acquired spectra are depicted as shaded bands.
Figure 4—figure supplement 4. Comparison of differential scanning fluorometry (DSF) data of N-protein species in low-salt (LS) and high-salt (HS) buffer.

Protein preparations were dialyzed in either HS buffer consisting of 20 mM HEPES, 150 mM NaCl, pH 7.5, or LS buffer consisting of 10.1 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, 10 mM NaCl, pH 7.4. DSF experiments show no significant shift in Ti for the same protein species in LS or HS buffer.

A parameter of great interest from an evolutionary perspective is the thermal stability of the folded domains. This property can be assessed experimentally by differential scanning fluorometry (DSF), which reports on temperature-driven changes in the environment of aromatic amino acids due to changes in solvent exposure (Eftink, 2000). Such changes may occur during unfolding or as a result of other conformational changes. Conveniently, all tryptophan and tyrosine residues of N-protein are located in the NTD and CTD, such that changes in the intrinsic fluorescence report exclusively on changes in the state of the folded domains. As shown in Figure 4B, a major transition is observed with an inflection point at Ti ≈ 49°C. Compared to the reproducibility of transition temperatures of ±0.3°C, significant shifts from the ancestral N-protein can be discerned: While Omicron mutations No, N:R203K/G204R, and N:P13L/Δ31–33 are neutral, those occurring in the Delta variant (N:D63G, N:G215C, and Nδ) are destabilizing, i.e., they lower the transition temperature. Interestingly, apparent destabilization of the folded domains occurs in N:G215C despite the absence of mutations in the folded domains – 215C being located in the middle of the linker IDR. This nonlocal mutation effect points to altered intra-molecular interactions between IDRs and the folded domains, and/or changes in contacts between folded domains mediated through an altered oligomeric state. (This is corroborated in non-natural point mutants N:L222P and N:L222P/R226P, which abrogate linker helix oligomerization [Zhao et al., 2023] and exhibit Ti-values of ≈51°C.) Furthermore, Figure 4B shows that additional transitions occur at higher temperatures, broadly in the range of 60–70°C. While their origin is unclear, this signal may accompany the formation of higher-order structure. It is noteworthy that N:G215C is also distinctly different in this feature.
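
To illustrate how a Ti-value of this kind can be read from a DSF trace, the following Python sketch locates the temperature at which the first derivative of a fluorescence ratio curve is largest. It is a minimal example under stated assumptions: the function name, the central-difference derivative, and the synthetic sigmoidal trace are hypothetical and do not reproduce the instrument software or the analysis used in this work.

```python
import math

def inflection_temperature(temps, ratio):
    """Return the temperature at which d(ratio)/dT is largest.

    Hypothetical helper for illustration only. Uses central finite
    differences on interior points and assumes that temps are
    strictly increasing.
    """
    if len(temps) != len(ratio) or len(temps) < 3:
        raise ValueError("need at least three paired data points")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (ratio[i + 1] - ratio[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic trace (assumed shape, not measured data): a sigmoid centred at 49 °C.
temps = [20 + 0.5 * k for k in range(141)]  # 20 to 90 °C in 0.5 °C steps
ratio = [0.8 + 0.3 / (1 + math.exp(-(t - 49.0) / 1.5)) for t in temps]
ti = inflection_temperature(temps, ratio)
```

With real data, noise in the derivative would usually call for smoothing before the maximum is taken; that step is omitted here for brevity.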

Secondary structure information from the entire molecule including the IDRs can be extracted from circular dichroism (CD) spectra. As may be observed from Figure 4C (and in more detail in Figure 4—figure supplement 3), significant variation occurs both in the magnitude of the negative ellipticity at ≈200 nm, which mainly reflects disordered residues, and in the magnitude of the negative ellipticity at ≈220 nm, which reports on helical structure. Compared to the ancestral Nref, significantly less disorder and greater helicity are observed for N:G215C (and to a lesser extent also for Nδ), whereas slightly more disorder is indicated for N:R203K/G204R. Little difference to the ancestral molecule is observed for No, N:P13L/Δ31–33, and N:D63G. The absence of significant changes for N:D63G is consistent with this mutation having only a subtle, if any, impact on the NTD conformation. For N:G215C, increased helicity can be attributed to the stabilization of transient helices in the leucine-rich region of the central linker IDR, as shown previously (Zhao et al., 2023; Zhao et al., 2022).

Tertiary and quaternary structure can be assessed by SV-AUC (Figure 5A). As reported previously, the ancestral N-protein at micromolar concentrations in NA-free form is a tightly linked dimer sedimenting at ≈4 S, without significant populations of higher oligomers (Forsythe et al., 2021; Ribeiro-Filho et al., 2022; Tarczewska et al., 2021; Zhao et al., 2022; Zhao et al., 2021). The same behavior is observed for N:D63G, No, N:R203K/G204R, as well as N:P13L/Δ31–33 at low micromolar concentrations (Figure 5A). By contrast, the G215C mutation promotes the formation of higher oligomers via stabilization of coiled-coil interactions of transient helices in the L-rich linker region (Zhao et al., 2023; Zhao et al., 2022). This is consistent with the enhanced helical content of this mutant (Figure 4C). Oligomerization beyond the dimeric Nref is also observed for Nδ, which incorporates the 215C mutation, but less than for N:G215C. This is consistent with the intermediate helical content of Nδ observed in CD. Of the three additional mutations of Nδ relative to N:G215C, we speculate that D63G does not impact dimerization (as in N:D63G, Figure 5A), and that therefore either the distant D377Y and/or R203M might cause this reduction of helicity and oligomerization relative to N:G215C, noting that R203M is proximal to the L-rich region (215–235) reshaped by 215C (Zhao et al., 2023).

Figure 5. Tertiary and quaternary structure of N-protein species.

(A) Sedimentation coefficient distributions c(s) from sedimentation velocity analytical ultracentrifugation (SV-AUC) experiments show ≈4 S dimers and higher oligomers. Data for N:G215C and Nδ are reproduced from Zhao et al., 2022. (B) Temperature-dependent particle formation reported as average Stokes radius measured by dynamic light scattering.

Figure 5—figure supplement 1. Comparison of solution state of N-protein species in low-salt (LS) and high-salt (HS) buffer.

Protein preparations were dialyzed in either HS buffer consisting of 20 mM HEPES, 150 mM NaCl, pH 7.5, or LS buffer consisting of 10.1 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, 10 mM NaCl, pH 7.4. (A) Differential scanning fluorometry (DSF) experiments show no significant shift in Ti for the same protein species in LS or HS buffer. (B) Sedimentation velocity analytical ultracentrifugation (SV-AUC) experiments exhibit sedimentation coefficient distributions with peak s-values increased by ≈5% in LS buffer relative to HS buffer. This apparent change is negligible compared to the ≈60–90% increase in sedimentation coefficients from altered oligomeric states observed for N:G215C and Nδ (Figure 5A).

As outlined in the Introduction, N-protein has a propensity to form large particles and undergo LLPS (Carlson et al., 2020; Cascarina and Ross, 2022; Cubuk et al., 2021; Iserman et al., 2020; Jack et al., 2021; Lu et al., 2021; Perdikari et al., 2020; Savastano et al., 2020), which can be promoted at higher temperatures (Iserman et al., 2020; Zhao et al., 2021). Figure 5B shows the z-average particle size measured by dynamic light scattering (DLS) as a function of temperature. Particle formation is governed by a combination of processes, including the hydrophobicity-driven stabilization of the linker helix and its self-association, ultra-weak interactions across the entire protein contributing to LLPS, and unfolding and aggregation processes. This complicates a comparison of the temperature transitions observed in DSF (Figure 4B) and DLS (Figure 5B) (and a further technical difficulty may be potential differences in temporal lag of conformational rearrangements versus particle assembly kinetics).

Nevertheless, several clear observations can be made. As reported previously, Nref forms clusters and particles at >55°C (Zhao et al., 2021), which is strongly enhanced and occurs at a lower temperature for N:G215C, due to the enhancement of the linker oligomerization (Figure 5B; Zhao et al., 2023). Very similar behavior is observed for Nδ, which suggests that at higher temperatures any potential inhibitory role suspected of the R203M mutation on self-association may be less relevant compared to G215C. It is interesting to note that, correspondingly, both show a lower Ti in DSF. More moderate enhancement of particle formation is observed for N:D63G, which shows an onset already at ≈50°C and larger particle averages than the ancestral protein. This also correlates with its significantly lower Ti in DSF. Thus, even subtle structural changes (as shown in Figure 4—figure supplement 1) can impact the assembly behavior.

The opposite effect, strong inhibition of particle formation, is observed for the N:R203K/G204R double mutant. Here, particles form only at temperatures >70°C, as a mixture of smaller clusters with some very large aggregates that adventitiously enter the light path in DLS and cause fluctuations in the z-average Stokes radius. Interestingly, although No comprises the R203K/G204R mutation, No does not share this behavior but instead exhibits slightly enhanced particle formation relative to the ancestral Nref, comparable to N:D63G. This points to the role of additional mutations in No, which besides R203K/G204R features the N-arm mutations P13L and Δ31–33. Interestingly, by themselves in N:P13L/Δ31–33, these N-arm mutations also suppress particle formation relative to Nref, although less so than R203K/G204R does in N:R203K/G204R. This again points to non-additive effects, suggesting that the combination of N-arm and linker IDR mutations in No alters the effect of either set of inhibitory mutations alone, to jointly promote particle formation of No.

We were curious whether IDR mutations might alter particle formation through modulation of existing or introduction of new protein-protein interfaces. We focused on Omicron mutations as these are obligatory in all currently circulating strains, and specifically on N-arm mutations, which have recently been implicated in altered intramolecular interactions with NA-occupied NTD (Cubuk et al., 2023). Even though SV-AUC showed no indication of self-association of N:P13L/Δ31–33 at low micromolar concentrations, weak interactions with Kd > mM would not be detectable under these conditions yet could be highly relevant in the context of multi-valent complexes (Zhao et al., 2024). Following the roadmap used previously for the study of the weak self-association of the leucine-rich linker IDR (Zhao et al., 2023), we restricted the protein to the N-arm peptide such that it can be studied at much higher concentrations. To this end, we compared solution behavior of the N-arm constructs Nref:(1–43) with the Omicron N-arm N:P13L/Δ31–33(1–43), as well as the N-arm with individual mutation N:P13L(1–43) and deletion N:Δ31–33(1–43). Unexpectedly, solutions of N:P13L/Δ31–33(1–43) exhibited elevated viscosity after storage for several days at 4°C in 20 mM HEPES, 150 mM NaCl, pH 7.5. Since this is a tell-tale sign of weak protein interactions, we carried out ColabFold structural predictions. Even though ColabFold is trained to predict folded structures, it has been found to be frequently successful in predicting transient folds in IDRs (Alderson et al., 2023; Zhao et al., 2023). Indeed, it predicts that replacement of proline at position 13 by leucine allows for formation of parallel sheets symmetrically arranged in higher-order N-arm oligomers (Figure 6—figure supplement 1). We proceeded to test oligomerization of the N-arm constructs experimentally in hydrodynamic studies. Figure 6A shows autocorrelation functions of all peptides. 
While the reference N-arm Nref:(1–43) and the construct carrying the Δ31–33 deletion behave as expected for non-interacting peptides of this size, the N-arm constructs carrying the P13L mutation (in particular, the Omicron N-arm N:P13L/Δ31–33(1–43)) exhibit very large correlation times. This may be indicative of either formation of large particles or the presence of weak interaction networks as in gels. Similarly, in SV-AUC (Figure 6B) the ancestral reference and the Δ31–33 deletion mutant sediment as expected for non-interacting N-arm peptides (Zhao et al., 2023), whereas rapidly sedimenting, anomalously shaped boundaries with ≈100-fold larger sedimentation coefficient were observed for the Omicron N-arm and the construct carrying solely the P13L mutation. This unequivocally demonstrates the introduction of new protein self-association interfaces from the P13L mutation. They are weak and not apparent in studies of the full-length protein N:P13L/Δ31–33 at low micromolar concentrations, but oligomers can be populated at the ≈100-fold higher achievable concentrations of the peptides, which mirrors the concentration range for in vitro observation of interactions of the leucine-rich linker helices (Zhao et al., 2023).

Figure 6. Protein-protein interactions of N-arm peptide containing the Omicron P13L mutation lead to large structures at high concentrations.

Autocorrelation functions from dynamic light scattering (DLS) (A) and sedimentation coefficient distributions from sedimentation velocity analytical ultracentrifugation (SV-AUC) (B) for the ancestral reference Nref:(1–43) (black), N:Δ31–33(1–43) (blue), N:P13L(1–43) (cyan), and N:P13L/Δ31–33(1–43) (identical to the Omicron N-arm, magenta). All peptide concentrations are 400 µM, except for Nref:(1–43) in the SV-AUC experiment which is 275 µM, reproduced from previously reported data (Zhao et al., 2023).

Figure 6—figure supplement 1. Structural prediction of Omicron N-arm self-interactions.

(A) Best ColabFold prediction of eight Omicron N-arm (1:41) peptides with P13L and Δ31–33 mutations. For one chain shown in magenta, atoms of 13L are depicted and labeled, and contacts of this chain within 3.5 Å are color-coded by confidence. (B) Top view of (A). (C) Predicted alignment error (PAE) map showing symmetry and confidence of predicted interactions. (D) Best analogous prediction of ancestral N-arm interactions, highlighting the absence of order.

The ability of N-protein to undergo LLPS is thought to be crucial for several functions including interactions with stress granules, RNP assembly, and interactions with viral M-protein (Carlson et al., 2022; Cascarina and Ross, 2022; Iserman et al., 2020; Lu et al., 2021; Savastano et al., 2020). Weak protein-protein interactions and cluster formation such as shown in Figures 5 and 6 can be coupled to LLPS, or alternatively LLPS may occur independent of clusters following Flory-Huggins theory (Kar et al., 2022). Therefore, we examined the impact of mutations on the propensity for LLPS. Images of phase-separated condensates are shown in Figure 7, and corresponding histograms of droplet numbers and areas are shown in Figure 7—figure supplement 1. As may be discerned from the top left panel of Figure 7, Nref readily forms droplets in the presence of T40 oligonucleotides. Under the same conditions, N:R203K/G204R (bottom left) does not display droplets, but forms few large particles with fibrillar morphology. In stark contrast, N:P13L/Δ31–33 (bottom center) readily forms droplets that appear to be more rapidly merging and growing than those of Nref (Figure 7—figure supplement 2). The combination of these mutations in No exhibits an intermediate propensity for LLPS with droplets in a dispersion of sizes. The most polydisperse distribution with the largest droplets was observed for N:G215C (Figure 7—figure supplement 1).

Figure 7. Differences in liquid-liquid phase separation (LLPS) propensity of N-protein mutant species.

Optical microscopy images were taken of 10 μM N-protein with 5 μM T40 (except Nδ, which is 4 μM N-protein with 2 μM T40) in low-salt (LS) buffer after incubation for 15 min at room temperature. For N:P13L/Δ31–33, a second image was taken at the 21 min time point highlighting the growth of condensed phases. All scale bars are 10 µm. Histograms of particle areas are in Figure 7—figure supplement 1, and a comparison of two time points for Nref, N:R203K/G204R and N:P13L/Δ31–33, is provided in Figure 7—figure supplement 2.

Figure 7—figure supplement 1. Comparison of area distributions of droplets in images of Figure 7.

For each N-protein species, images were segmented to identify droplets. The values indicated are particle numbers, the mean area, the standard deviation of the area, and the probability that the sample is from the same distribution as Nref based on the two-sample Kolmogorov-Smirnov test.
Figure 7—figure supplement 2. Comparison of droplet area after liquid-liquid phase separation (LLPS) at two points in time.

Similar to Figure 7, images of LLPS were recorded for Nref, N:R203K/G204R, and N:P13L/Δ31–33 at two time points for the same sample. The upper plot shows droplet numbers. The lower plot shows mean and standard deviations of the droplet area. Images and histograms for the early time points and the later time point of N:P13L/Δ31–33 are shown in Figure 7 and Figure 7—figure supplement 1, respectively.

Discussion

The SARS-CoV-2 pandemic has motivated the collection of virus genomic sequences on an unprecedented scale, which has generated invaluable data on the genomic diversity of an RNA virus. From the ensemble of observed consensus sequences of infected hosts, we can extract, for the first time, an exhaustive map of possible amino acid replacements in viral proteins that are tolerable for viable virus (Bloom et al., 2023; Saldivar-Espinoza et al., 2023; Zhao et al., 2022). This brings into stark relief our limited understanding of the genotype/phenotype relationship, which is very detailed on some local functional aspects, such as spike protein antigenicity, but not much developed in general. This limits our ability to draw conclusions from the observed mutant spectrum on their variation in biophysical functions and fitness. Besides traditional sequence-based structure prediction and structure/function relationships, and more recent recognition of structural dynamics, new paradigms have emerged with increased understanding of the role of IDRs, their mimicry of SLiMs, nonlocal physicochemical properties of sequence regions, and the ability of IDRs to promote macroscopic phase separation to generate or usurp condensates with virus-related functions. The extensive genomic data of SARS-CoV-2 presents an opportunity to probe how sequence diversity impacts these biophysical properties, and to examine what biophysical constraints exist for viral proteins to support viability. Focusing on SARS-CoV-2 N-protein we have studied the diversity of biophysical phenotypes with the goal to increase understanding of salient mechanisms of the many N-protein functions, and also to glean aspects of the biophysical fitness landscape underlying evolution.

On one hand, our studies of the diversity of nonlocal physicochemical properties of N-protein revealed the absence of tightly controlled hydrophobicity, polarity, and charges outside the folded domains. In the IDRs, individual mutations may alter each of these properties apparently without impacting viability, although modulatory fitness effects may be possible. For example, viable linker sequences span from 4.8 to 9.1 charges. On the other hand, a very clear separation of physicochemical parameters far exceeding mutational dispersion is maintained between the L-rich and SR-rich region of the linker IDR, and between the N3 and remaining region of the C-arm IDR. These distinctions are likely functionally important, with the polarity and charges of the SR-rich linker region aiding in NA binding (Pontoriero et al., 2022), the hydrophobicity of the L-rich region aiding in assembly functions (Bessa et al., 2022; Zhao et al., 2024; Zhao et al., 2023), and the acidic N3 region probably playing a role in NA- and M-protein interactions as suggested from analogy to MHV- and SARS-CoV-1 (Masters, 2019). These nonlocal features are also maintained in analogous consensus sequence regions of related coronaviruses, and thus provide further examples for nonlocal biophysical properties that are evolutionarily conserved despite amino acid sequence divergence (Zarin et al., 2017; Zarin et al., 2021). It may seem paradoxical that despite this conservation these features do not appear very fine-tuned, and that significant variation of these properties is still observed within the viable mutant spectrum, for polarity and hydrophobicity significantly exceeding the spread of parameter values of the folded domains. However, as mentioned above, the differences between IDR regions that appear associated with biophysical functions are of significantly larger magnitude. 
The tolerance for the remaining comparatively smaller fluctuations in physicochemical parameters may be important to allow sufficient local variation in sequence space for additional functions to evolve, such as the emergence of SLiMs to manipulate the host/virus interface (Davey et al., 2011; Schuck and Zhao, 2023). Correspondingly, in a recent study of SLiMs variation across the mutant spectrum, we found the total number and detailed location of phosphorylation SLiMs to vary considerably in the SR-rich region, but to be maintained overall at a high level across this region (Schuck and Zhao, 2023).

Other nonlocal properties were studied experimentally, though unavoidably only by example of several different SARS-CoV-2 N-protein species. We selected conspicuous mutations in variants of concern, but each of the constructs studied also represents in itself viable N-protein species occurring in consensus sequences of the genomic database. Strikingly, point mutations can affect protein properties on all levels of organization, from thermodynamic stability and secondary structure to intra- and inter-molecular interactions, oligomeric state, particle formation, and LLPS. These results must be considered in the context of the highly dynamic nature of N-protein, which is caused by the flexibility of intrinsically disordered domains (Cubuk et al., 2023; Cubuk et al., 2021; Redzic et al., 2021; Zhao et al., 2021), the NTD and its disordered β-hairpin (Redzic et al., 2021), and the large-scale conformational fluctuations of the N-protein dimer in solution (Botova et al., 2024; Ribeiro-Filho et al., 2022; Różycki and Boura, 2022). High sequence plasticity is accompanied by high plasticity of protein configuration and delicate balances of protein interactions that can be significantly shifted by single mutations with nonlocal effects.

Our results highlight two different mechanisms through which mutation effects may be propagated across the protein. First, mutations can impact the transient helix in the hydrophobic L-rich region of the linker, and, as we have shown previously, promote its helical conformation and self-association into higher oligomeric states (Zhao et al., 2023; Zhao et al., 2022). This, in turn, may impact collision frequency or other intra-molecular interactions with folded domains, such as the recently reported intra-molecular contact of the L-rich region to the NTD observed by NMR (Botova et al., 2024). This is reflected in the altered secondary structure observed in CD of Nδ and N:G215C, as well as their oligomers observed in SV-AUC, and this would explain the impact of the G215C mutation on the thermal stability reported by intrinsic fluorescence localized to the NTD and CTD. In addition, changes near the L-rich transient helix also impact weak protein interactions and amplify to enhanced particle formation and altered LLPS. Notably, introduction of N:G215C in a reverse genetics system resulted in enhanced viral replication and larger virions (Kubinski et al., 2024).

Second, mutation frequencies peak in the downstream end of the SR-rich linker region, including the double mutation R203K/G204R that is part of the defining mutations of Omicron (and other) variants. In different VLP and cellular assays (Johnson et al., 2022; Syed et al., 2022), it has been shown to modulate N-protein phosphorylation and thereby the balance between replication and assembly, with contributions from an emerging alternate, truncated N-protein (210–419) that itself supports assembly (Adly et al., 2023; Leary et al., 2021; Mears et al., 2022; Javed et al., 2023). In the present study, we found that full-length N:R203K/G204R strongly opposes both temperature-driven particle formation and LLPS with oligonucleotides. Interestingly, this effect can be compensated for by the additional N-arm mutation P13L that is present in all Omicron variants. P13L itself has been identified epidemiologically as the most important driver of fitness in N-protein (Obermeyer et al., 2022; Oulas et al., 2021), but its biophysical effects have not been previously studied. We identified a distinct self-association propensity of N-arm peptides carrying the P13L mutation, and enhanced LLPS propensity of full-length N-protein carrying the complete set of N-arm mutations in Omicron, N:P13L/Δ31–33. This is consistent with the partial ‘rescue’ of particle formation and full restoration of LLPS propensity that we have observed in the No molecule with the complete set of P13L/Δ31–33/R203K/G204R mutations defining N-protein from the BA.1 (B.1.1.529) Omicron variant. 
It is interesting to note that the R203K/G204R mutation, the P13L mutation, and the P13L/Δ31–33 combination each can occur independently of each other in viable virus species, with 261 genomes in the database carrying only the P13L mutation, 9548 only the combination P13L/Δ31–33, and >50,000 genomes exclusively the double mutation R203K/G204R, even though their more frequent coexistence (by approximately 10-fold, in all Omicron variants) might suggest epistatic interactions and a fitness advantage. Relatedly, it was shown that the P13L mutation causes complete loss of recognition of a CD8+ T-cell epitope, which may cause T-cell evasion (de Silva et al., 2021) and provide an additional fitness effect of this mutation. Compensating effects between linker IDR and N-arm mutations highlight the nonlocal consequences of IDR mutations. They also highlight the difficulty of assigning variant properties and fitness effects to a single mutation, given the entangled effects among the sets of multiple mutations defining the variants of concern.

In summary, the importance of IDRs in viral evolution was recognized previously for several reasons. Their inherent flexibility makes them more permissive of amino acid changes, which is borne out in the mutational landscape of SARS-CoV-2. As mentioned above, this makes them well suited for host adaptation through remodeling of host protein interaction networks, which is exemplified in the clusters of host-specific mutations located in IDRs of Dengue virus proteins (Charon et al., 2018; Dolan et al., 2021). Mimicry of eukaryotic SLiMs is ubiquitous (Davey et al., 2011; Hagai et al., 2014; Mihalič et al., 2023), and as we have shown recently, the sequence space of SARS-CoV-2 N-protein IDRs allows presentation of a large fraction of known eukaryotic SLiMs (Schuck and Zhao, 2023). In addition, nonlocal sequence-distributed physicochemical features of IDRs such as their charge and hydrophobicity have been demonstrated recently to mediate biological functions and present evolutionary constraints (Moses et al., 2023; Zarin et al., 2021). This principle also holds true in the distinct properties of linker and C-arm regions of SARS-CoV-2 N-protein. A related nonlocal physicochemical property of IDRs is their propensity for supporting LLPS (Abyzov et al., 2022; Brocca et al., 2020; Pappu et al., 2023), which plays a key role in different N-protein functions (Carlson et al., 2020; Cascarina and Ross, 2022; Roden et al., 2022; Savastano et al., 2020). Finally, here we have observed the ability of mutations in IDRs to modulate overall biophysical properties such as thermal stability, oligomeric state, and assembly properties. In SARS-CoV-2 N-protein IDRs, the latter are mediated via weak interactions in transiently folded structures. In addition, the high flexibility of the IDRs and their resulting high intra-chain contact frequencies (Botova et al., 2024; Różycki and Boura, 2022) may magnify nonlocal consequences of mutations. 
This endows viral protein IDRs with yet another level of variation of the biophysical phenotype that can impact evolutionary fitness. Exploiting the emerging mutational landscape and sequence space presents both a challenge and opportunity to explore the biophysical phenotype spectrum and thereby to uncover the salient functional principles of RNA-virus proteins.

Materials and methods

Mutational landscape, sequence alignment, and prediction of physicochemical properties

The Wuhan-Hu-1 isolate (GenBank QHD43423) (Wu et al., 2020) was used as the ancestral reference. Sequence data were based on consensus sequences of SARS-CoV-2 isolates submitted to GISAID as previously described (Schuck and Zhao, 2023; Zhao et al., 2022). Briefly, sequence data were downloaded on January 20, 2023, from Nextstrain (Hadfield et al., 2018) and 5.06 million high-quality preprocessed sequences were included in the analysis. A total of 746 sequences exhibiting insertions in the N-protein were omitted, as well as those with more than 10 deletions in N-protein and those represented in fewer than 10 genome instances.
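
The exclusion criteria above can be sketched in a few lines of Python. This is an illustrative reconstruction only, not the MATLAB code used for the analysis: the record layout (`NRecord`), the function name, and the order in which the criteria are applied are assumptions, and the toy records are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class NRecord:
    """N-protein call for one genome (hypothetical record layout)."""
    sequence: str    # aligned N-protein amino acid sequence
    insertions: int  # inserted residues relative to the reference
    deletions: int   # deleted residues relative to the reference

def viable_unique_sequences(records, max_deletions=10, min_instances=10):
    """Apply the three exclusion rules and return {sequence: genome count}.

    Assumed order: drop records with insertions or too many deletions,
    then drop unique sequences seen in fewer than min_instances genomes.
    """
    kept = [r for r in records
            if r.insertions == 0 and r.deletions <= max_deletions]
    counts = Counter(r.sequence for r in kept)
    return {seq: n for seq, n in counts.items() if n >= min_instances}

# Toy input (invented, not real data).
records = (
    [NRecord("MSDNGPQ", 0, 0)] * 12             # kept: 12 genome instances
    + [NRecord("MSDNGLQ", 0, 0)] * 3            # dropped: fewer than 10 instances
    + [NRecord("MSDNGPQQ", 1, 0)] * 20          # dropped: contains an insertion
    + [NRecord("M" + "-" * 11, 0, 11)] * 15     # dropped: more than 10 deletions
)
result = viable_unique_sequences(records)
```

The returned dictionary maps each retained unique sequence to the number of genomes carrying it, which is the quantity needed for the per-mutation counts discussed in the text.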

The resulting sequence database was parsed for different unique sequences for N-proteins and different segments, using code written in MATLAB (MathWorks, Natick, MA, USA). Sequence hydrophobicity was calculated in RStudio (https://posit.co/) using the package PEPTIDES (Osorio et al., 2015) and polarity and charge using the package ALAKAZAM (Gupta et al., 2015). For maximally phosphorylated charge, –2 was added to the total charge for each serine, threonine, and tyrosine in the IDRs.
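
The maximally phosphorylated charge described above amounts to q_phos = q − 2·(n_S + n_T + n_Y), where q is the unmodified net charge and n_S, n_T, n_Y count the serine, threonine, and tyrosine residues in the IDR. A minimal Python sketch follows; the function name and the toy sequence are hypothetical, and the unmodified charge is taken as an input rather than recomputed, since it was obtained with the ALAKAZAM package in this work.

```python
def max_phosphorylated_charge(base_charge, idr_sequence):
    """Add -2 to a precomputed net charge for each S, T, or Y residue.

    base_charge: net charge of the unmodified segment (computed elsewhere).
    idr_sequence: one-letter amino acid sequence of the IDR segment.
    Hypothetical helper for illustration only.
    """
    sites = sum(idr_sequence.upper().count(aa) for aa in "STY")
    return base_charge - 2 * sites

# Toy segment (invented): 2 Ser + 1 Thr + 1 Tyr = 4 potential phosphosites.
charge = max_phosphorylated_charge(2.0, "GSRSTYAK")
```

For the toy segment, four potential sites shift an assumed base charge of +2 to −6.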

Alignment of SARS and related coronavirus sequences (SARS-CoV-1 P59595.1, MERS YP_009047211.1, MHV NP_045302.1, human coronavirus NL63 Q6Q1R8.1, and 229E-related bat coronavirus APD51511.1) was carried out with COBALT at NLM (Papadopoulos and Agarwala, 2007), as shown in Zhao et al., 2022. This alignment was used to dissect related viruses into regions corresponding to the SARS-CoV-2 regions (N-arm, NTD, linker, SR-rich, L-rich, CTD, Carm, Carm1, N3) (Supplementary file 1). The resulting segments of the related viruses were subjected to analysis of physicochemical properties as described above. Sequence similarity of the corresponding regions relative to the SARS-CoV-2 regions was calculated using BLAST blastp suite (Altschul et al., 1997), using an expectation threshold of 0.9, word size 2, and BLOSUM62 scoring matrix.

Structure prediction

Structural predictions for NTD and N-arm were carried out using ColabFold (Mirdita et al., 2022) and graphics were generated using ChimeraX (Pettersen et al., 2021).

Proteins, peptides, and oligonucleotides

N:D63G and N:G215C were purchased from EXONBIO (catalog# 19CoV-N170 and 19CoV-N180, San Diego, CA, USA), while Nref, N:R203K/G204R, N:P13L/Δ31–33, No, and Nδ were expressed in-house as described previously (Zhao et al., 2023; Zhao et al., 2022). Briefly, the full-length protein with an N-terminal Tobacco etch virus (TEV) cleavage site and 6xHis tag was cloned into the pET-29a(+) expression vector and transformed into One Shot BL21(DE3)pLysS Escherichia coli (Thermo Fisher Scientific, Carlsbad, CA, USA). After cell lysis, the protein was bound to a Ni-NTA column, and unfolded and refolded to remove residual protein-bound bacterial NA (Carlson et al., 2020). After elution, the 6xHis tag was cleaved and the protein was purified by size exclusion chromatography. Greater than 95% purity of the proteins was confirmed by SDS-PAGE, and a ratio of absorbance at 260 nm to 280 nm of ~0.50–0.55 confirmed the absence of NA. The latter is important to eliminate higher-order N-protein oligomers induced by NA binding (Carlson et al., 2020; Tarczewska et al., 2021; Zhao et al., 2021). For a subset of mutants, the protein sequence and mass were tested and confirmed by LC-MS/MS and LC-MS, respectively. Biophysical experiments were preceded by dialysis in either high-salt (HS) buffer consisting of 20 mM HEPES, 150 mM NaCl, pH 7.5, or low-salt (LS) buffer consisting of 10.1 mM Na2HPO4, 1.8 mM KH2PO4, 2.7 mM KCl, 10 mM NaCl, pH 7.4, as indicated below.

The oligonucleotide T40 was purchased from Integrated DNA Technologies (Skokie, IL, USA), as purified by HPLC and lyophilized. N-arm peptides were purchased from ABI Scientific (Sterling, VA, USA), as purified by HPLC, examined by MALDI for purity and identity, and lyophilized.

Spectroscopy

CD spectra were acquired in a Chirascan Q100 (Applied Photophysics, UK), using cuvettes of 1 mm pathlength, and data acquisition with 1 nm steps and 1 s integration time. Results are averages of three acquisitions, corrected for buffer background. Protein concentration was 3 µM in buffer LS, except No in buffer HS.

For the acquisition of fluorescence spectra, protein samples at 1 µM were loaded into a quartz cuvette with 1.0 cm optical pathlength. Steady-state tryptophan fluorescence emission spectra in the range from 305 nm to 500 nm were recorded in a spectrofluorimeter (QuantaMaster, Photon Technology) with excitation at 295 nm using a 1.0 nm increment. Scans were acquired in triplicate.

DSF was carried out in a Tycho instrument (Nanotemper, Germany) as previously described (Zhao et al., 2021). Briefly, 10 µL samples were aspirated in capillaries (TY-C001, Nanotemper, Germany), and intrinsic fluorescence was measured at 350 nm and 330 nm while the temperature was ramped from 35°C to 95°C at a rate of 30°C/min. The first derivative of the intensity ratio was calculated as a function of temperature. DSF experiments were carried out at protein concentrations of 2 µM in buffer LS, except for N:R203K/G204R which was measured in buffer HS. As a buffer control, the difference in Ti for Nref in LS and HS buffer was measured and found to be within error of data acquisition (Figure 4—figure supplement 4).
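
The derivative step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the instrument software: it assumes the 350 nm and 330 nm intensities are available as equal-length lists, it uses a plain central difference without smoothing, and it takes the temperature of the largest absolute derivative as the inflection temperature Ti, which is an assumption about how Ti is read out. The function names are hypothetical.

```python
def ratio_derivative(temps, f350, f330):
    """First derivative of the 350 nm / 330 nm intensity ratio versus temperature.

    Uses central differences, so the first and last points are dropped.
    Returns (temperatures, derivative values) of equal length.
    """
    ratio = [a / b for a, b in zip(f350, f330)]
    deriv = [
        (ratio[i + 1] - ratio[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]
    return temps[1:-1], deriv

def inflection_temperature(temps, f350, f330):
    """Temperature at which the derivative of the ratio has its largest magnitude."""
    t_mid, deriv = ratio_derivative(temps, f350, f330)
    i_max = max(range(len(deriv)), key=lambda i: abs(deriv[i]))
    return t_mid[i_max]
```

Real traces are noisy and may show more than one transition, so a practical analysis would typically smooth the ratio before differentiation and inspect all local extrema rather than only the global one.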

Hydrodynamic techniques

SV-AUC experiments were carried out in a ProteomeLab XL-I analytical ultracentrifuge (Beckman Coulter, Indianapolis, IN, USA) in standard configurations (Schuck et al., 2015), with instruments subjected to routine calibrations (Ghirlando et al., 2013). Briefly, 2 µM protein samples were filled in cell assemblies composed of charcoal-filled Epon double-sector centerpieces with sapphire windows, inserted in an 8-hole AN-50 TI rotor and temperature equilibrated. After acceleration to 50,000 rpm, data acquisition commenced using the absorbance optical detector at 280 nm and the interference optical detector. Data were analyzed in SEDFIT (https://sedfitsedphat.nibib.nih.gov/software/default.aspx) in terms of a sedimentation coefficient distribution c(s) (Schuck, 2016). Proteins for self-association studies were in buffer HS, except for Nref, Nδ, and N:G215C, which were in LS, the latter causing an ≈5% increase in s-value (Figure 5—figure supplement 1). Typical accuracy of c(s) peaks is on the order of ≈1% for peak s-values and ≈1–2% for relative peak areas (Zhao et al., 2015).

NA binding experiments were analyzed in buffer HS and LS with isotherms of signal weighted-average sedimentation coefficients in SEDPHAT (Schuck and Zhao, 2017). For studies of the N-arm peptide species, 400 µM peptide samples were studied by gravitational sweep sedimentation using rotor speed steps of 3000 rpm, 10,000 rpm, 40,000 rpm, and 55,000 rpm (Ma et al., 2016) and analyzed with a model for apparent sedimentation coefficient distributions ls-g*(s) (Schuck, 2016) as a qualitative representation of rapidly migrating boundaries of N:P13L(1:43) and N:P13L/Δ31-33(1:43), or with c(s) distributions for Nref:(1:43) and N:Δ31-33(1:43).

Temperature-dependent DLS autocorrelation data of N-protein species were collected in a NanoStar instrument (Wyatt Technology, Santa Barbara, CA, USA) equipped with a 658 nm laser and using a detection angle of 90°. 100 µL samples at 3 µM N-protein in LS buffer were inserted into a 1 µL quartz cuvette (WNQC01-00, Wyatt Instruments), with excess sample to prevent evaporation in the observation chamber. A temperature ramp rate of 1 °C/min was applied with 5 s data acquisitions and averaging three replicates for each temperature point. Data were collected and processed with the software Dynamics 7.4 (Wyatt Instruments) to determine the average hydrodynamic radius by cumulant analysis.
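
For orientation, the conversion from a mean decay rate (the first cumulant) to a hydrodynamic radius can be sketched in Python via the Stokes-Einstein relation. This sketch illustrates the standard textbook relations only and does not reproduce the Dynamics software. The default wavelength and detection angle follow the instrument description above, the refractive index of 1.333 is an assumed value for water, and the solvent viscosity must be supplied for each temperature. All function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def scattering_vector(wavelength_m, angle_deg, refractive_index):
    """Magnitude of the scattering vector q in 1/m."""
    return (4.0 * math.pi * refractive_index / wavelength_m
            * math.sin(math.radians(angle_deg) / 2.0))

def hydrodynamic_radius(decay_rate, temperature_k, viscosity_pa_s,
                        wavelength_m=658e-9, angle_deg=90.0,
                        refractive_index=1.333):
    """Hydrodynamic radius in m from the mean decay rate (1/s) of the field ACF.

    D = Gamma / q^2 and R_h = k_B T / (6 pi eta D) (Stokes-Einstein).
    refractive_index = 1.333 is an assumed value for water.
    """
    q = scattering_vector(wavelength_m, angle_deg, refractive_index)
    diffusion = decay_rate / q ** 2
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * diffusion)
```

Because the viscosity of water decreases substantially with temperature, a temperature ramp requires a temperature-specific viscosity at every point; using a single value would bias the apparent radius.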

DLS studies of N-arm peptides were carried out in a Prometheus Panta (Nanotemper, Germany) instrument at 20°C. The samples were loaded into a capillary (Nanotemper PR-AC002) and ACFs were acquired using the 405 nm laser at the detection angle of 140°.

Optical microscopy

Optical imaging of in vitro phase-separated condensates was carried out as described previously (Zhao et al., 2021). Briefly, reaction mixtures of N-protein and T40 in buffer LS were combined and mixed immediately prior to imaging. 3 µL samples were transferred onto a glass-bottom 35 mm dish (catalog# P35G-1.5-20-C, MatTek) for imaging at room temperature. Images were acquired on a Nikon Ti-E microscope equipped with a 100× 1.49 NA oil objective lens (LIDA light engine, Lumencor, Beaverton, OR, USA) and recorded with a Prime 95B camera (Teledyne Photometrics) with a pixel size of 110 nm. Images were background-subtracted and contrast-enhanced using MATLAB (MathWorks, Natick, MA, USA).

The segmentation of different shapes in the brightfield images was performed with deep learning methods. Specifically, a pre-trained model (versatile) from StarDist Napari Plugin (Schmidt et al., 2018) was employed to segment the shapes with the following parameters: Input image scaling: 0.5, probability threshold: 0.2, overlap threshold: 0.2. The labels were imported into Fiji and LABKIT (Arzt et al., 2022) for manual verification and correction. For each segmented object, the area was measured in MATLAB.
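
The per-object area measurement can be illustrated with a short Python sketch. It is not the MATLAB code used in the study. It assumes the segmentation is available as a 2D label image (nested lists of integers, 0 for background), and the default pixel size of 0.110 µm assumes the labels are on the original camera pixel grid, which should be checked if the image was rescaled for segmentation. The function name `object_areas` is hypothetical.

```python
from collections import Counter

def object_areas(label_image, pixel_size_um=0.110):
    """Area of each labeled object in square micrometers.

    label_image:   2D nested list of integer labels; 0 marks background.
    pixel_size_um: edge length of one pixel in micrometers (assumed to match
                   the grid on which the labels are defined).
    """
    # Count the pixels belonging to each non-background label.
    pixel_counts = Counter(
        label for row in label_image for label in row if label != 0
    )
    pixel_area = pixel_size_um ** 2
    return {label: n * pixel_area for label, n in sorted(pixel_counts.items())}
```

Counting pixels per label gives the area directly and does not depend on object shape, so it applies equally to round droplets and irregular particles.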

Materials availability

Plasmids for mutant N-proteins generated in this study are available from the author upon request.

Acknowledgements

We thank Dr. Yan Li (NINDS, NIH) for carrying out mass spectrometry experiments and Dr. Jiamin Liu (NIBIB, NIH) for her help in quantitative image analysis. This work was supported by the Intramural Research Programs of the National Institute of Biomedical Imaging and Bioengineering (ZIA EB000099-02) and the National Heart, Lung, and Blood Institute, National Institutes of Health. This work utilized the computational resources of the NIH HPC Biowulf cluster for sequence analyses.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Peter Schuck, Email: schuckp@mail.nih.gov.

Mauricio Comas-Garcia, Universidad Autónoma de San Luis Potosí, Mexico.

Qiang Cui, Boston University, United States.

Funding Information

This paper was supported by the following grant:

  • National Institutes of Health ZIA EB000099-02 to Peter Schuck.

Additional information

Competing interests

No competing interests declared.

Author contributions

Resources, Investigation, Methodology, Writing – review and editing.

Conceptualization, Data curation, Software, Formal analysis, Investigation, Methodology, Writing – review and editing.

Investigation.

Investigation.

Resources, Formal analysis, Investigation, Methodology, Writing – review and editing.

Resources, Software, Investigation, Methodology, Writing – review and editing.

Resources, Formal analysis, Supervision, Investigation, Methodology, Writing – review and editing.

Conceptualization, Formal analysis, Supervision, Funding acquisition, Writing – original draft, Project administration, Writing – review and editing.

Additional files

Supplementary file 1. Sequence alignment of nucleocapsid protein of related coronaviruses and sequences corresponding to SARS-CoV-2 regions.

N-protein sequences of SARS-CoV-1 P59595.1, MERS YP_009047211.1, MHV NP_045302.1, human coronavirus NL63 Q6Q1R8.1, and 229E-related bat coronavirus APD51511.1 were aligned with SARS-CoV-2 N-protein. Regions corresponding to the SARS-CoV-2 regions (N-arm, NTD, linker, SR-rich, L-rich, CTD, Carm, Carm1, N3) and their alignment score were determined.

MDAR checklist

Data availability

Raw data supporting this study can be found at the Harvard Dataverse https://doi.org/10.7910/DVN/PZ6LRK.

The following dataset was generated:

Nguyen A, Zhao H, Myagmarsuren D, Srinivasan S, Wu D, Chen J, Piszczek G, Schuck P. 2024. Replication Data for: Modulation of Biophysical Properties of Nucleocapsid Protein in the Mutant Spectrum of SARS-CoV-2. Harvard Dataverse.

References

  1. Abyzov A, Blackledge M, Zweckstetter M. Conformational dynamics of intrinsically disordered proteins regulate biomolecular condensate chemistry. Chemical Reviews. 2022;122:6719–6748. doi: 10.1021/acs.chemrev.1c00774.
  2. Adly AN, Bi M, Carlson CR, Syed AM, Ciling A, Doudna JA, Cheng Y, Morgan DO. Assembly of SARS-CoV-2 ribonucleosomes by truncated N∗ variant of the nucleocapsid protein. The Journal of Biological Chemistry. 2023;299:105362. doi: 10.1016/j.jbc.2023.105362.
  3. Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. PNAS. 2023;120:2022. doi: 10.1073/pnas.2304302120.
  4. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  5. Artesi M, Bontems S, Göbbels P, Franckh M, Maes P, Boreux R, Meex C, Melin P, Hayette MP, Bours V, Durkin K. A recurrent mutation at position 26340 of SARS-CoV-2 is associated with failure of the E gene quantitative reverse transcription-PCR utilized in a commercial dual-target diagnostic assay. Journal of Clinical Microbiology. 2020;58:1–8. doi: 10.1128/JCM.01598-20.
  6. Arzt M, Deschamps J, Schmied C, Pietzsch T, Schmidt D, Tomancak P, Haase R, Jug F. LABKIT: labeling and segmentation Toolkit for big image data. Frontiers in Computer Science. 2022;4:777728. doi: 10.3389/fcomp.2022.777728.
  7. Bershtein S, Serohijos AW, Shakhnovich EI. Bridging the physical scales in evolutionary biology: from protein sequence space to fitness of organisms and populations. Current Opinion in Structural Biology. 2017;42:31–40. doi: 10.1016/j.sbi.2016.10.013.
  8. Bessa LM, Guseva S, Camacho-Zarco AR, Salvi N, Maurin D, Perez LM, Botova M, Malki A, Nanao M, Jensen MR, Ruigrok RWH, Blackledge M. The intrinsically disordered SARS-CoV-2 nucleoprotein in dynamic complex with its viral partner nsp3a. Science Advances. 2022;8:eabm4034. doi: 10.1126/sciadv.abm4034.
  9. Biswal M, Lu J, Song J. SARS-CoV-2 nucleocapsid protein targets a conserved surface groove of the NTF2-like Domain of G3BP1. Journal of Molecular Biology. 2022;434:167516. doi: 10.1016/j.jmb.2022.167516.
  10. Bloom JD, Labthavikul ST, Otey CR, Arnold FH. Protein stability promotes evolvability. PNAS. 2006;103:5869–5874. doi: 10.1073/pnas.0510098103.
  11. Bloom JD, Beichman AC, Neher RA, Harris K. Evolution of the SARS-CoV-2 mutational spectrum. Molecular Biology and Evolution. 2023;40:2022. doi: 10.1093/molbev/msad085.
  12. Bloom JD, Neher RA. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evolution. 2023;9:vead055. doi: 10.1093/ve/vead055.
  13. Botova M, Camacho-Zarco AR, Tognetti J, Bessa LM, Guseva S, Mikkola E, Salvi N, Maurin D, Herrmann T, Blackledge M. A Specific Phosphorylation-Dependent Conformational Switch of SARS-CoV-2 Nucleoprotein Inhibits RNA Binding. bioRxiv. 2024. doi: 10.1101/2024.02.22.579423.
  14. Brocca S, Grandori R, Longhi S, Uversky V. Liquid-liquid phase separation by intrinsically disordered protein regions of viruses: roles in viral life cycle and control of virus-host interactions. International Journal of Molecular Sciences. 2020;21:1–31. doi: 10.3390/ijms21239045.
  15. Brown CJ, Johnson AK, Daughdrill GW. Comparing models of evolution for ordered and disordered proteins. Molecular Biology and Evolution. 2010;27:609–621. doi: 10.1093/molbev/msp277.
  16. Brown CJ, Johnson AK, Dunker AK, Daughdrill GW. Evolution and disorder. Current Opinion in Structural Biology. 2011;21:441–446. doi: 10.1016/j.sbi.2011.02.005.
  17. Carlson CR, Asfaha JB, Ghent CM, Howard CJ, Hartooni N, Safari M, Frankel AD, Morgan DO. Phosphoregulation of phase separation by the SARS-CoV-2 N protein suggests a biophysical basis for its dual functions. Molecular Cell. 2020;80:1092–1103. doi: 10.1016/j.molcel.2020.11.025.
  18. Carlson CR, Adly AN, Bi M, Howard CJ, Frost A, Cheng Y, Morgan DO. Reconstitution of the SARS-CoV-2 ribonucleosome provides insights into genomic RNA packaging and regulation by phosphorylation. The Journal of Biological Chemistry. 2022;298:102560. doi: 10.1016/j.jbc.2022.102560.
  19. Cascarina SM, Ross ED. Phase separation by the SARS-CoV-2 nucleocapsid protein: Consensus and open questions. The Journal of Biological Chemistry. 2022;298:101677. doi: 10.1016/j.jbc.2022.101677.
  20. Charon J, Barra A, Walter J, Millot P, Hébrard E, Moury B, Michon T. First experimental assessment of protein intrinsic disorder involvement in an RNA virus natural adaptive process. Molecular Biology and Evolution. 2018;35:38–49. doi: 10.1093/molbev/msx249.
  21. Chen K, Xiao F, Hu D, Ge W, Tian M, Wang W, Pan P, Wu K, Wu J. SARS-CoV-2 nucleocapsid protein interacts with RIG-I and represses RIG-mediated IFN-β production. Viruses. 2020;13:47. doi: 10.3390/v13010047.
  22. Chin AF, Zheng Y, Hilser VJ. Phylogenetic convergence of phase separation and mitotic function in the disordered protein BuGZ. Protein Science. 2022;31:822–834. doi: 10.1002/pro.4270.
  23. Cubuk J, Alston JJ, Incicco JJ, Singh S, Stuchell-Brereton MD, Ward MD, Zimmerman MI, Vithani N, Griffith D, Wagoner JA, Bowman GR, Hall KB, Soranno A, Holehouse AS. The SARS-CoV-2 nucleocapsid protein is dynamic, disordered, and phase separates with RNA. Nature Communications. 2021;12:1936. doi: 10.1038/s41467-021-21953-3.
  24. Cubuk J, Alston JJ, Incicco JJ, Holehouse AS, Hall KB, Stuchell-Brereton MD, Soranno A. The Disordered N-Terminal Tail of SARS CoV-2 Nucleocapsid Protein Forms a Dynamic Complex with RNA. bioRxiv. 2023. doi: 10.1101/2023.02.10.527914.
  25. Dadonaite B, Crawford KHD, Radford CE, Farrell AG, Yu TC, Hannon WW, Zhou P, Andrabi R, Burton DR, Liu L, Ho DD, Chu HY, Neher RA, Bloom JD. A pseudovirus system enables deep mutational scanning of the full SARS-CoV-2 spike. Cell. 2023;186:1263–1278. doi: 10.1016/j.cell.2023.02.001.
  26. Davey NE, Travé G, Gibson TJ. How viruses hijack cell regulation. Trends in Biochemical Sciences. 2011;36:159–169. doi: 10.1016/j.tibs.2010.10.002.
  27. Davey NE, Cyert MS, Moses AM. Short linear motifs - ex nihilo evolution of protein regulation. Cell Communication and Signaling: CCS. 2015;13:9–11. doi: 10.1186/s12964-015-0120-z.
  28. Del Veliz S, Rivera L, Bustos DM, Uhart M. Analysis of SARS-CoV-2 nucleocapsid phosphoprotein N variations in the binding site to human 14-3-3 proteins. Biochemical and Biophysical Research Communications. 2021;569:154–160. doi: 10.1016/j.bbrc.2021.06.100.
  29. de Silva TI, Liu G, Lindsey BB, Dong D, Moore SC, Hsu NS, Shah D, Wellington D, Mentzer AJ, Angyal A, Brown R, Parker MD, Ying Z, Yao X, Turtle L, Dunachie S, Maini MK, Ogg G, Knight JC, Peng Y, Rowland-Jones SL, Dong T, COVID-19 Genomics UK (COG-UK) Consortium, ISARIC4C Investigators. The impact of viral mutations on recognition by SARS-CoV-2 specific T cells. iScience. 2021;24:103353. doi: 10.1016/j.isci.2021.103353.
  30. Dinesh DC, Chalupska D, Silhan J, Koutna E, Nencka R, Veverka V, Boura E. Structural basis of RNA recognition by the SARS-CoV-2 nucleocapsid phosphoprotein. PLOS Pathogens. 2020;16:e1009100. doi: 10.1371/journal.ppat.1009100.
  31. Dolan PT, Taguwa S, Rangel MA, Acevedo A, Hagai T, Andino R, Frydman J. Principles of dengue virus evolvability derived from genotype-fitness maps in human and mosquito cells. eLife. 2021;10:e61921. doi: 10.7554/eLife.61921.
  32. Domingo E, Holland JJ. RNA virus mutations and fitness for survival. Annual Review of Microbiology. 1997;51:151–178. doi: 10.1146/annurev.micro.51.1.151.
  33. Echave J, Wilke CO. Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annual Review of Biophysics. 2017;46:85–103. doi: 10.1146/annurev-biophys-070816-033819.
  34. Eftink MR. In: Topics in Fluorescence Spectroscopy. Lakowicz JR, editor. Kluwer Academic Publishers; 2000. Intrinsic fluorescence of proteins; pp. 1–13.
  35. Eigen M. On the nature of virus quasispecies. Trends in Microbiology. 1996;4:216–218. doi: 10.1016/0966-842X(96)20011-3.
  36. Eisenberg D, McLachlan AD. Solvation energy in protein folding and binding. Nature. 1986;319:199–203. doi: 10.1038/319199a0.
  37. Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges. 2017;1:33–46. doi: 10.1002/gch2.1018.
  38. Finkel Y, Mizrahi O, Nachshon A, Weingarten-Gabbay S, Morgenstern D, Yahalom-Ronen Y, Tamir H, Achdout H, Stein D, Israeli O, Beth-Din A, Melamed S, Weiss S, Israely T, Paran N, Schwartz M, Stern-Ginossar N. The coding capacity of SARS-CoV-2. Nature. 2021;589:125–130. doi: 10.1038/s41586-020-2739-1.
  39. Forsythe HM, Rodriguez Galvan J, Yu Z, Pinckney S, Reardon P, Cooley RB, Zhu P, Rolland AD, Prell JS, Barbar E. Multivalent binding of the partially disordered SARS-CoV-2 nucleocapsid phosphoprotein dimer to RNA. Biophysical Journal. 2021;120:2890–2901. doi: 10.1016/j.bpj.2021.03.023.
  40. Fung TS, Liu DX. Post-translational modifications of coronavirus proteins: roles and function. Future Virology. 2018;13:405–430. doi: 10.2217/fvl-2018-0008.
  41. Garcia-Viloca M, Gao J, Karplus M, Truhlar DG. How enzymes work: analysis by modern rate theory and computer simulations. Science. 2004;303:186–195. doi: 10.1126/science.1088172.
  42. Gerstein M, Chothia C. Packing at the protein-water interface. PNAS. 1996;93:10167–10172. doi: 10.1073/pnas.93.19.10167.
  43. Ghirlando R, Balbo A, Piszczek G, Brown PH, Lewis MS, Brautigam CA, Schuck P, Zhao H. Improving the thermal, radial, and temporal accuracy of the analytical ultracentrifuge through external references. Analytical Biochemistry. 2013;440:81–95. doi: 10.1016/j.ab.2013.05.011.
  44. Gitlin I, Carbeck JD, Whitesides GM. Why are proteins charged? networks of charge–charge interactions in proteins measured by charge ladders and capillary electrophoresis. Angewandte Chemie International Edition. 2006;45:3022–3060. doi: 10.1002/anie.200502530.
  45. Gitlin L, Hagai T, LaBarbera A, Solovey M, Andino R. Rapid evolution of virus sequences in intrinsically disordered protein regions. PLOS Pathogens. 2014;10:e1004529. doi: 10.1371/journal.ppat.1004529.
  46. Gordon DE, Jang GM, Bouhaddou M, Xu J, Obernier K, White KM, O’Meara MJ, Rezelj VV, Guo JZ, Swaney DL, Tummino TA, Hüttenhain R, Kaake RM, Richards AL, Tutuncuoglu B, Foussard H, Batra J, Haas K, Modak M, Kim M, Haas P, Polacco BJ, Braberg H, Fabius JM, Eckhardt M, Soucheray M, Bennett MJ, Cakir M, McGregor MJ, Li Q, Meyer B, Roesch F, Vallet T, Mac Kain A, Miorin L, Moreno E, Naing ZZC, Zhou Y, Peng S, Shi Y, Zhang Z, Shen W, Kirby IT, Melnyk JE, Chorba JS, Lou K, Dai SA, Barrio-Hernandez I, Memon D, Hernandez-Armenta C, Lyu J, Mathy CJP, Perica T, Pilla KB, Ganesan SJ, Saltzberg DJ, Rakesh R, Liu X, Rosenthal SB, Calviello L, Venkataramanan S, Liboy-Lugo J, Lin Y, Huang X-P, Liu Y, Wankowicz SA, Bohn M, Safari M, Ugur FS, Koh C, Savar NS, Tran QD, Shengjuler D, Fletcher SJ, O’Neal MC, Cai Y, Chang JCJ, Broadhurst DJ, Klippsten S, Sharp PP, Wenzell NA, Kuzuoglu-Ozturk D, Wang H-Y, Trenker R, Young JM, Cavero DA, Hiatt J, Roth TL, Rathore U, Subramanian A, Noack J, Hubert M, Stroud RM, Frankel AD, Rosenberg OS, Verba KA, Agard DA, Ott M, Emerman M, Jura N, von Zastrow M, Verdin E, Ashworth A, Schwartz O, d’Enfert C, Mukherjee S, Jacobson M, Malik HS, Fujimori DG, Ideker T, Craik CS, Floor SN, Fraser JS, Gross JD, Sali A, Roth BL, Ruggero D, Taunton J, Kortemme T, Beltrao P, Vignuzzi M, García-Sastre A, Shokat KM, Shoichet BK, Krogan NJ. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature. 2020;583:459–468. doi: 10.1038/s41586-020-2286-9.
  47. Greaney AJ, Starr TN, Bloom JD. An antibody-escape estimator for mutations to the SARS-CoV-2 receptor-binding domain. Virus Evolution. 2022;8:veac021. doi: 10.1093/ve/veac021.
  48. Gupta NT, Vander Heiden JA, Uduman M, Gadala-Maria D, Yaari G, Kleinstein SH. Change-O: A toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics. 2015;31:3356–3358. doi: 10.1093/bioinformatics/btv359.
  49. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018;34:4121–4123. doi: 10.1093/bioinformatics/bty407.
  50. Hagai T, Azia A, Babu MM, Andino R. Use of host-like peptide motifs in viral proteins is a prevalent strategy in host-virus interactions. Cell Reports. 2014;7:1729–1739. doi: 10.1016/j.celrep.2014.04.052.
  51. Ho WL, Huang JR. The return of the rings: Evolutionary convergence of aromatic residues in the intrinsically disordered regions of RNA‐binding proteins for liquid–liquid phase separation. Protein Science. 2022;31:1–7. doi: 10.1002/pro.4317.
  52. Hu Y, Lewandowski EM, Tan H, Zhang X, Morgan RT, Zhang X, Jacobs LMC, Butler SG, Gongora MV, Choy J, Deng X, Chen Y, Wang J. Naturally occurring mutations of SARS-CoV-2 main protease confer drug resistance to nirmatrelvir. ACS Central Science. 2023;9:1658–1669. doi: 10.1021/acscentsci.3c00538.
  53. Iserman C, Roden CA, Boerneke MA, Sealfon RSG, McLaughlin GA, Jungreis I, Fritch EJ, Hou YJ, Ekena J, Weidmann CA, Theesfeld CL, Kellis M, Troyanskaya OG, Baric RS, Sheahan TP, Weeks KM, Gladfelter AS. Genomic RNA elements drive phase separation of the SARS-CoV-2 nucleocapsid. Molecular Cell. 2020;80:1078–1091. doi: 10.1016/j.molcel.2020.11.041.
  54. Jack A, Ferro LS, Trnka MJ, Wehri E, Nadgir A, Nguyenla X, Fox D, Costa K, Stanley S, Schaletzky J, Yildiz A. SARS-CoV-2 nucleocapsid protein forms condensates with viral genomic RNA. PLOS Biology. 2021;19:e3001425. doi: 10.1371/journal.pbio.3001425.
  55. Javed I, Butt MA, Khalid S, Shehryar T, Amin R, Syed AM, Sadiq M. Face mask detection and social distance monitoring system for COVID-19 pandemic. Multimedia Tools and Applications. 2023;82:14135–14152. doi: 10.1007/s11042-022-13913-w.
  56. Johnson BA, Zhou Y, Lokugamage KG, Vu MN, Bopp N, Crocquet-Valdes PA, Kalveram B, Schindewolf C, Liu Y, Scharton D, Plante JA, Xie X, Aguilar P, Weaver SC, Shi PY, Walker DH, Routh AL, Plante KS, Menachery VD. Nucleocapsid mutations in SARS-CoV-2 augment replication and pathogenesis. PLOS Pathogens. 2022;18:e1010627. doi: 10.1371/journal.ppat.1010627.
  57. Kar M, Dar F, Welsh TJ, Vogel LT, Kühnemuth R, Majumdar A, Krainer G, Franzmann TM, Alberti S, Seidel CAM, Knowles TPJ, Hyman AA, Pappu RV. Phase-separating RNA-binding proteins form heterogeneous distributions of clusters in subsaturated solutions. PNAS. 2022;119:e2202222119. doi: 10.1073/pnas.2202222119.
  58. Kauzmann W. Some factors in the interpretation of protein denaturation. Advances in Protein Chemistry. 1959;14:1–63. doi: 10.1016/S0065-3233(08)60608-7.
  59. Kepler L, Hamins-Puertolas M, Rasmussen DA. Decomposing the sources of SARS-CoV-2 fitness variation in the United States. Virus Evolution. 2021;7:veab073. doi: 10.1093/ve/veab073.
  60. Klein S, Cortese M, Winter SL, Wachsmuth-Melm M, Neufeldt CJ, Cerikan B, Stanifer ML, Boulant S, Bartenschlager R, Chlanda P. SARS-CoV-2 structure and replication characterized by in situ cryo-electron tomography. Nature Communications. 2020;11:5885. doi: 10.1038/s41467-020-19619-7.
  61. Kruse T, Benz C, Garvanska DH, Lindqvist R, Mihalic F, Coscia F, Inturi R, Sayadi A, Simonetti L, Nilsson E, Ali M, Kliche J, Moliner Morro A, Mund A, Andersson E, McInerney G, Mann M, Jemth P, Davey NE, Överby AK, Nilsson J, Ivarsson Y. Large scale discovery of coronavirus-host factor protein interaction motifs reveals SARS-CoV-2 specific mechanisms and vulnerabilities. Nature Communications. 2021;12:1–13. doi: 10.1038/s41467-021-26498-z.
  62. Kubinski HC, Despres HW, Johnson BA, Schmidt MM, Jaffrani SA, Mills MG, Lokugamage K, Dumas CM, Shirley DJ, Estes LK, Pekosz A, Crothers JW, Roychoudhury P, Greninger AL, Jerome KR, Di Genova BM, Walker DH, Ballif BA, Ladinsky MS, Bjorkman PJ, Menachery VD, Bruce EA. Variant Mutation in SARS-CoV-2 Nucleocapsid Enhances Viral Infection via Altered Genomic Encapsidation. bioRxiv. 2024. doi: 10.1101/2024.03.08.584120.
  63. Kuo L, Hurst-Hess KR, Koetzner CA, Masters PS. Analyses of coronavirus assembly interactions with interspecies membrane and nucleocapsid protein chimeras. Journal of Virology. 2016;90:4357–4368. doi: 10.1128/JVI.03212-15.
  64. Lafforgue G, Michon T, Charon J. Analysis of the contribution of intrinsic disorder in shaping potyvirus genetic Diversity. Viruses. 2022;14:1959. doi: 10.3390/v14091959.
  65. Lässig M, Mustonen V, Walczak AM. Predicting evolution. Nature Ecology & Evolution. 2017;1:77. doi: 10.1038/s41559-017-0077.
  66. Leary S, Gaudieri S, Parker MD, Chopra A, James I, Pakala S, Alves E, John M, Lindsey BB, Keeley AJ, Rowland-Jones SL, Swanson MS, Ostrov DA, Bubenik JL, Das SR, Sidney J, Sette A, COVID-19 Genomics UK (COG-UK) consortium. de Silva TI, Phillips E, Mallal S. Generation of a novel SARS-CoV-2 Sub-genomic RNA due to the R203K/G204R variant in nucleocapsid: homologous recombination has potential to change SARS-CoV-2 at both protein and RNA level. Pathogens & Immunity. 2021;6:27–49. doi: 10.20411/pai.v6i2.460. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Li JY, Liao CH, Wang Q, Tan YJ, Luo R, Qiu Y, Ge XY. The ORF6, ORF8 and nucleocapsid proteins of SARS-CoV-2 inhibit type I interferon signaling pathway. Virus Research. 2020;286:198074. doi: 10.1016/j.virusres.2020.198074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science. 2012;21:769–785. doi: 10.1002/pro.2071. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Lin Y, Currie SL, Rosen MK. Intrinsically disordered sequences enable modulation of protein phase separation through distributed tyrosine motifs. The Journal of Biological Chemistry. 2017;292:19110–19120. doi: 10.1074/jbc.M117.800466. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. López-Muñoz AD, Kosik I, Holly J, Yewdell JW. Cell surface SARS-CoV-2 nucleocapsid protein modulates innate and adaptive immunity. Science Advances. 2022;8:eabp9770. doi: 10.1126/sciadv.abp9770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Lu S, Ye Q, Singh D, Cao Y, Diedrich JK, Yates JR, Villa E, Cleveland DW, Corbett KD. The SARS-CoV-2 nucleocapsid phosphoprotein forms mutually exclusive condensates with RNA and the membrane-associated M protein. Nature Communications. 2021;12:502. doi: 10.1038/s41467-020-20768-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Ma J, Zhao H, Sandmaier J, Liddle JA, Schuck P. Variable field analytical ultracentrifugation: II gravitational sweep sedimentation velocity. Biophysical Journal. 2016;110:103–112. doi: 10.1016/j.bpj.2015.11.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Mao AH, Crick SL, Vitalis A, Chicoine CL, Pappu RV. Net charge per residue modulates conformational ensembles of intrinsically disordered proteins. PNAS. 2010;107:8183–8188. doi: 10.1073/pnas.0911107107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Masters PS. Coronavirus genomic RNA packaging. Virology. 2019;537:198–207. doi: 10.1016/j.virol.2019.08.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Mears HV, Young GR, Sanderson T, Harvey R, Crawford M, Snell DM, Fowler AS, Hussain S, Nicod J, Peacock TP, Emmott E, Finsterbusch K, Luptak J, Wall E, Williams B, Gandhi S, Swanton C, Bauer DL. Emergence of new subgenomic mRNAs in SARS-CoV-2. bioRxiv. 2022. doi: 10.1101/2022.04.20.488895. [DOI]
  76. Mihalič F, Simonetti L, Giudice G, Sander MR, Lindqvist R, Peters MBA, Benz C, Kassa E, Badgujar D, Inturi R, Ali M, Krystkowiak I, Sayadi A, Andersson E, Aronsson H, Söderberg O, Dobritzsch D, Petsalaki E, Överby AK, Jemth P, Davey NE, Ivarsson Y. Large-scale phage-based screening reveals extensive pan-viral mimicry of host short linear motifs. Nature Communications. 2023;14:2409. doi: 10.1038/s41467-023-38015-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nature Methods. 2022;19:679–682. doi: 10.1038/s41592-022-01488-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Moses D, Ginell GM, Holehouse AS, Sukenik S. Intrinsically disordered regions are poised to act as sensors of cellular chemistry. Trends in Biochemical Sciences. 2023;48:1019–1034. doi: 10.1016/j.tibs.2023.08.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  79. Obermeyer F, Jankowiak M, Barkas N, Schaffner SF, Pyle JD, Yurkovetskiy L, Bosso M, Park DJ, Babadi M, MacInnis BL, Luban J, Sabeti PC, Lemieux JE. Analysis of 6.4 million SARS-CoV-2 genomes identifies mutations associated with fitness. Science. 2022;376:1327–1332. doi: 10.1126/science.abm1208. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Osorio D, Rondón-Villarreal P, Torres R. Peptides: a package for data mining of antimicrobial peptides. The R Journal. 2015;7:4. doi: 10.32614/RJ-2015-001. [DOI] [Google Scholar]
  81. Oulas A, Zanti M, Tomazou M, Zachariou M, Minadakis G, Bourdakou MM, Pavlidis P, Spyrou GM. Generalized linear models provide a measure of virulence for specific mutations in SARS-CoV-2 strains. PLOS ONE. 2021;16:e0238665. doi: 10.1371/journal.pone.0238665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Pan P, Shen M, Yu Z, Ge W, Chen K, Tian M, Xiao F, Wang Z, Wang J, Jia Y, Wang W, Wan P, Zhang J, Chen W, Lei Z, Chen X, Luo Z, Zhang Q, Xu M, Li G, Li Y, Wu J. SARS-CoV-2 N protein promotes NLRP3 inflammasome activation to induce hyperinflammation. Nature Communications. 2021;12:1–17. doi: 10.1038/s41467-021-25015-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Papadopoulos JS, Agarwala R. COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics. 2007;23:1073–1079. doi: 10.1093/bioinformatics/btm076. [DOI] [PubMed] [Google Scholar]
  84. Pappu RV, Cohen SR, Dar F, Farag M, Kar M. Phase transitions of associative biomacromolecules. Chemical Reviews. 2023;123:8945–8987. doi: 10.1021/acs.chemrev.2c00814. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Perdikari TM, Murthy AC, Ryan VH, Watters S, Naik MT, Fawzi NL. SARS‐CoV‐2 nucleocapsid protein phase‐separates with RNA and with human hnRNPs. The EMBO Journal. 2020;39:1–35. doi: 10.15252/embj.2020106478. [DOI] [PMC free article] [PubMed] [Google Scholar]
  86. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Protein Science. 2021;30:70–82. doi: 10.1002/pro.3943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Pontoriero L, Schiavina M, Korn SM, Schlundt A, Pierattelli R, Felli IC. NMR reveals specific tracts within the intrinsically disordered regions of the SARS-CoV-2 nucleocapsid protein involved in RNA encountering. Biomolecules. 2022;12:929. doi: 10.3390/biom12070929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Redzic JS, Lee E, Born A, Issaian A, Henen MA, Nichols PJ, Blue A, Hansen KC, D’Alessandro A, Vögeli B, Eisenmesser EZ. The inherent dynamics and interaction sites of the SARS-CoV-2 nucleocapsid N-terminal region. Journal of Molecular Biology. 2021;433:167108. doi: 10.1016/j.jmb.2021.167108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Riback JA, Katanski CD, Kear-Scott JL, Pilipenko EV, Rojek AE, Sosnick TR, Drummond DA. Stress-triggered phase separation is an adaptive, evolutionarily tuned response. Cell. 2017;168:1028–1040. doi: 10.1016/j.cell.2017.02.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Ribeiro-Filho HV, Jara GE, Batista FAH, Schleder GR, Costa Tonoli CC, Soprano AS, Guimarães SL, Borges AC, Cassago A, Bajgelman MC, Marques RE, Trivella DBB, Franchini KG, Figueira ACM, Benedetti CE, Lopes-de-Oliveira PS. Structural dynamics of SARS-CoV-2 nucleocapsid protein induced by RNA binding. PLOS Computational Biology. 2022;18:e1010121. doi: 10.1371/journal.pcbi.1010121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Rochman ND, Wolf YI, Faure G, Mutz P, Zhang F, Koonin EV. Ongoing global and regional adaptive evolution of SARS-CoV-2. PNAS. 2021;118:1–10. doi: 10.1073/pnas.2104241118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  92. Roden CA, Dai Y, Giannetti CA, Seim I, Lee M, Sealfon R, McLaughlin GA, Boerneke MA, Iserman C, Wey SA, Ekena JL, Troyanskaya OG, Weeks KM, You L, Chilkoti A, Gladfelter AS. Double-stranded RNA drives SARS-CoV-2 nucleocapsid protein to undergo phase separation at specific temperatures. Nucleic Acids Research. 2022;50:8168–8192. doi: 10.1093/nar/gkac596. [DOI] [PMC free article] [PubMed] [Google Scholar]
  93. Różycki B, Boura E. Conformational ensemble of the full-length SARS-CoV-2 nucleocapsid (N) protein based on molecular simulations and SAXS data. Biophysical Chemistry. 2022;288:106843. doi: 10.1016/j.bpc.2022.106843. [DOI] [PMC free article] [PubMed] [Google Scholar]
  94. Saldivar-Espinoza B, Macip G, Pujadas G, Garcia-Vallve S. Could nucleocapsid be a next-generation COVID-19 vaccine candidate? International Journal of Infectious Diseases. 2022;125:231–232. doi: 10.1016/j.ijid.2022.11.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  95. Saldivar-Espinoza B, Garcia-Segura P, Novau-Ferré N, Macip G, Martínez R, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. The mutational landscape of SARS-CoV-2. International Journal of Molecular Sciences. 2023;24:9072. doi: 10.3390/ijms24109072. [DOI] [PMC free article] [PubMed] [Google Scholar]
  96. Savastano A, Ibáñez de Opakua A, Rankovic M, Zweckstetter M. Nucleocapsid protein of SARS-CoV-2 phase separates into RNA-rich polymerase-containing condensates. Nature Communications. 2020;11:6041. doi: 10.1038/s41467-020-19843-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  97. Schmidt U, Weigert M, Broaddus C, Myers G. Cell detection with star-convex polygons. In: Frangi A, Schnabel J, Davatzikos C, Alberola-López C, Fichtinger G, editors. Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science. Springer; 2018. pp. 265–273. [DOI] [Google Scholar]
  98. Schuck P, Zhao H, Brautigam CA, Ghirlando R. Basic principles of analytical ultracentrifugation. CRC Press; 2015. [DOI] [Google Scholar]
  99. Schuck P. Sedimentation Velocity Analytical Ultracentrifugation: Discrete Species and Size-Distributions of Macromolecules and Particles. CRC Press; 2016. [DOI] [Google Scholar]
  100. Schuck P, Zhao H. Sedimentation velocity analytical ultracentrifugation: interacting systems. CRC Press; 2017. [DOI] [Google Scholar]
  101. Schuck P, Zhao H. Diversity of short linear interaction motifs in SARS-CoV-2 nucleocapsid protein. mBio. 2023;14:e0238823. doi: 10.1128/mbio.02388-23. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Serohijos AWR, Shakhnovich EI. Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Current Opinion in Structural Biology. 2014;26:84–91. doi: 10.1016/j.sbi.2014.05.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  103. Shuler G, Hagai T. Rapidly evolving viral motifs mostly target biophysically constrained binding pockets of host proteins. Cell Reports. 2022;40:111212. doi: 10.1016/j.celrep.2022.111212. [DOI] [PubMed] [Google Scholar]
  104. Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. Journal of the Royal Society, Interface. 2014;11:20140419. doi: 10.1098/rsif.2014.0419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  105. Starr TN, Thornton JW. Epistasis in protein evolution. Protein Science. 2016;25:1204–1218. doi: 10.1002/pro.2897. [DOI] [PMC free article] [PubMed] [Google Scholar]
  106. Stevens LJ, Pruijssers AJ, Lee HW, Gordon CJ, Tchesnokov EP, Gribble J, George AS, Hughes TM, Lu X, Li J, Perry JK, Porter DP, Cihlar T, Sheahan TP, Baric RS, Götte M, Denison MR. Mutations in the SARS-CoV-2 RNA-dependent RNA polymerase confer resistance to remdesivir by distinct mechanisms. Science Translational Medicine. 2022;14:eabo0718. doi: 10.1126/scitranslmed.abo0718. [DOI] [PMC free article] [PubMed] [Google Scholar]
  107. Syed AM, Taha TY, Tabata T, Chen IP, Ciling A, Khalid MM, Sreekumar B, Chen PY, Hayashi JM, Soczek KM, Ott M, Doudna JA. Rapid assessment of SARS-CoV-2-evolved variants using virus-like particles. Science. 2021;374:1626–1632. doi: 10.1126/science.abl6184. [DOI] [PMC free article] [PubMed] [Google Scholar]
  108. Syed AM, Ciling A, Khalid MM, Sreekumar B, Chen PY, Kumar GR, Silva I, Milbes B, Kojima N, Hess V, Shacreaw M, Lopez L, Brobeck M, Turner F, Spraggon L, Taha TY, Tabata T, Chen IP, Ott M, Doudna JA. Omicron mutations enhance infectivity and reduce antibody neutralization of SARS-CoV-2 virus-like particles. bioRxiv. 2022. doi: 10.1101/2021.12.20.21268048. [DOI] [PMC free article] [PubMed]
  109. Tarczewska A, Kolonko-Adamska M, Zarębski M, Dobrucki J, Ożyhar A, Greb-Markiewicz B. The method utilized to purify the SARS-CoV-2 N protein can affect its molecular properties. International Journal of Biological Macromolecules. 2021;188:391–403. doi: 10.1016/j.ijbiomac.2021.08.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  110. Tian Y, Zhang G, Liu H, Ding P, Jia R, Zhou J, Chen Y, Qi Y, Du J, Liang C, Zhu X, Wang A. Screening and identification of B cell epitope of the nucleocapsid protein in SARS-CoV-2 using the monoclonal antibodies. Applied Microbiology and Biotechnology. 2022;106:1151–1164. doi: 10.1007/s00253-022-11769-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  111. Tokuriki N, Oldfield CJ, Uversky VN, Berezovsky IN, Tawfik DS. Do viral proteins possess unique biophysical features? Trends in Biochemical Sciences. 2009;34:53–59. doi: 10.1016/j.tibs.2008.10.009. [DOI] [PubMed] [Google Scholar]
  112. Tokuriki N, Tawfik DS. Protein dynamism and evolvability. Science. 2009;324:203–207. doi: 10.1126/science.1169375. [DOI] [PubMed] [Google Scholar]
  113. Viana R, Moyo S, Amoako DG, Tegally H, Scheepers C, Althaus CL, Anyaneji UJ, Bester PA, Boni MF, Chand M, Choga WT, Colquhoun R, Davids M, Deforche K, Doolabh D, du Plessis L, Engelbrecht S, Everatt J, Giandhari J, Giovanetti M, Hardie D, Hill V, Hsiao N-Y, Iranzadeh A, Ismail A, Joseph C, Joseph R, Koopile L, Kosakovsky Pond SL, Kraemer MUG, Kuate-Lere L, Laguda-Akingba O, Lesetedi-Mafoko O, Lessells RJ, Lockman S, Lucaci AG, Maharaj A, Mahlangu B, Maponga T, Mahlakwane K, Makatini Z, Marais G, Maruapula D, Masupu K, Matshaba M, Mayaphi S, Mbhele N, Mbulawa MB, Mendes A, Mlisana K, Mnguni A, Mohale T, Moir M, Moruisi K, Mosepele M, Motsatsi G, Motswaledi MS, Mphoyakgosi T, Msomi N, Mwangi PN, Naidoo Y, Ntuli N, Nyaga M, Olubayo L, Pillay S, Radibe B, Ramphal Y, Ramphal U, San JE, Scott L, Shapiro R, Singh L, Smith-Lawrence P, Stevens W, Strydom A, Subramoney K, Tebeila N, Tshiabuila D, Tsui J, van Wyk S, Weaver S, Wibmer CK, Wilkinson E, Wolter N, Zarebski AE, Zuze B, Goedhals D, Preiser W, Treurnicht F, Venter M, Williamson C, Pybus OG, Bhiman J, Glass A, Martin DP, Rambaut A, Gaseitsiwe S, von Gottberg A, de Oliveira T. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature. 2022;603:679–686. doi: 10.1038/s41586-022-04411-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  114. Wang K, Yu S, Ji X, Lakner C, Griffing A, Thorne JL. Roles of solvent accessibility and gene expression in modeling protein sequence evolution. Evolutionary Bioinformatics Online. 2015;11:85–96. doi: 10.4137/EBO.S22911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  115. Wang Y, Lei R, Nourmohammad A, Wu NC. Antigenic evolution of human influenza H3N2 neuraminidase is constrained by charge balancing. eLife. 2021;10:e72516. doi: 10.7554/eLife.72516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  116. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ. A new coronavirus associated with human respiratory disease in China. Nature. 2020;579:265–269. doi: 10.1038/s41586-020-2008-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  117. Wu W, Cheng Y, Zhou H, Sun C, Zhang S. The SARS-CoV-2 nucleocapsid protein: its role in the viral life cycle, structure and functions, and use as a potential target in the development of vaccines and diagnostics. Virology Journal. 2023;20:6. doi: 10.1186/s12985-023-01968-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  118. Yao H, Song Y, Chen Y, Wu N, Xu J, Sun C, Zhang J, Weng T, Zhang Z, Wu Z, Cheng L, Shi D, Lu X, Lei J, Crispin M, Shi Y, Li L, Li S. Molecular architecture of the SARS-CoV-2 virus. Cell. 2020;183:730–738. doi: 10.1016/j.cell.2020.09.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  119. Yaron TM, Heaton BE, Levy TM, Johnson JL, Jordan TX, Cohen BM, Kerelsky A, Lin T-Y, Liberatore KM, Bulaon DK, Van Nest SJ, Koundouros N, Kastenhuber ER, Mercadante MN, Shobana-Ganesh K, He L, Schwartz RE, Chen S, Weinstein H, Elemento O, Piskounova E, Nilsson-Payant BE, Lee G, Trimarco JD, Burke KN, Hamele CE, Chaparian RR, Harding AT, Tata A, Zhu X, Tata PR, Smith CM, Possemato AP, Tkachev SL, Hornbeck PV, Beausoleil SA, Anand SK, Aguet F, Getz G, Davidson AD, Heesom K, Kavanagh-Williamson M, Matthews DA, tenOever BR, Cantley LC, Blenis J, Heaton NS. Host protein kinases required for SARS-CoV-2 nucleocapsid phosphorylation and viral replication. Science Signaling. 2022;15:1–17. doi: 10.1126/scisignal.abm0808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Yu H, Guan F, Miller H, Lei J, Liu C. The role of SARS-CoV-2 nucleocapsid protein in antiviral immunity and vaccine development. Emerging Microbes & Infections. 2023;12:2164219. doi: 10.1080/22221751.2022.2164219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Zarin T, Tsai CN, Nguyen Ba AN, Moses AM. Selection maintains signaling function of a highly diverged intrinsically disordered region. PNAS. 2017;114:E1450–E1459. doi: 10.1073/pnas.1614787114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. Zarin T, Strome B, Peng G, Pritišanac I, Forman-Kay JD, Moses AM. Identifying molecular features that are associated with biological function of intrinsically disordered protein regions. eLife. 2021;10:e60220. doi: 10.7554/eLife.60220. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Zhang X, Zheng R, Li Z, Ma J. Liquid-liquid phase separation in viral function. Journal of Molecular Biology. 2023;435:167955. doi: 10.1016/j.jmb.2023.167955. [DOI] [PubMed] [Google Scholar]
  124. Zhao H, Ghirlando R, Alfonso C, Arisaka F, Attali I, Bain DL, Bakhtina MM, Becker DF, Bedwell GJ, Bekdemir A, Besong TMD, Birck C, Brautigam CA, Brennerman W, Byron O, Bzowska A, Chaires JB, Chaton CT, Cölfen H, Connaghan KD, Crowley KA, Curth U, Daviter T, Dean WL, Díez AI, Ebel C, Eckert DM, Eisele LE, Eisenstein E, England P, Escalante C, Fagan JA, Fairman R, Finn RM, Fischle W, de la Torre JG, Gor J, Gustafsson H, Hall D, Harding SE, Cifre JGH, Herr AB, Howell EE, Isaac RS, Jao S-C, Jose D, Kim S-J, Kokona B, Kornblatt JA, Kosek D, Krayukhina E, Krzizike D, Kusznir EA, Kwon H, Larson A, Laue TM, Le Roy A, Leech AP, Lilie H, Luger K, Luque-Ortega JR, Ma J, May CA, Maynard EL, Modrak-Wojcik A, Mok Y-F, Mücke N, Nagel-Steger L, Narlikar GJ, Noda M, Nourse A, Obsil T, Park CK, Park J-K, Pawelek PD, Perdue EE, Perkins SJ, Perugini MA, Peterson CL, Peverelli MG, Piszczek G, Prag G, Prevelige PE, Raynal BDE, Rezabkova L, Richter K, Ringel AE, Rosenberg R, Rowe AJ, Rufer AC, Scott DJ, Seravalli JG, Solovyova AS, Song R, Staunton D, Stoddard C, Stott K, Strauss HM, Streicher WW, Sumida JP, Swygert SG, Szczepanowski RH, Tessmer I, Toth RT, Tripathy A, Uchiyama S, Uebel SFW, Unzai S, Gruber AV, von Hippel PH, Wandrey C, Wang S-H, Weitzel SE, Wielgus-Kutrowska B, Wolberger C, Wolff M, Wright E, Wu Y-S, Wubben JM, Schuck P. A multilaboratory comparison of calibration accuracy and the performance of external references in analytical ultracentrifugation. PLOS ONE. 2015;10:e0126420. doi: 10.1371/journal.pone.0126420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Zhao H, Wu D, Nguyen A, Li Y, Adão RC, Valkov E, Patterson GH, Piszczek G, Schuck P. Energetic and structural features of SARS-CoV-2 N-protein co-assemblies with nucleic acids. iScience. 2021;24:102523. doi: 10.1016/j.isci.2021.102523. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Zhao H, Nguyen A, Wu D, Li Y, Hassan SA, Chen J, Shroff H, Piszczek G, Schuck P. Plasticity in structure and assembly of SARS-CoV-2 nucleocapsid protein. PNAS Nexus. 2022;1:pgac049. doi: 10.1093/pnasnexus/pgac049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Zhao H, Wu D, Hassan SA, Nguyen A, Chen J, Piszczek G, Schuck P. A conserved oligomerization domain in the disordered linker of coronavirus nucleocapsid proteins. Science Advances. 2023;9:eadg6473. doi: 10.1126/sciadv.adg6473. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Zhao H, Syed AM, Khalid MM, Nguyen A, Ciling A, Wu D, Yau WM, Srinivasan S, Esposito D, Doudna JA, Piszczek G, Ott M, Schuck P. Assembly of SARS-CoV-2 nucleocapsid protein with nucleic acid. Nucleic Acids Research. 2024;52:6647–6661. doi: 10.1093/nar/gkae256. [DOI] [PMC free article] [PubMed] [Google Scholar]

eLife assessment

Mauricio Comas-Garcia 1

This important manuscript provides new insights into the biophysics of the SARS-CoV-2 nucleocapsid. The evidence, which relies on a convincing combination of genetic and biophysical data, nicely supports the conclusions.

Reviewer #2 (Public Review):

Anonymous

This work focuses on the biochemical features of the SARS-CoV-2 Nucleocapsid (N) protein, which condenses the large viral RNA genome inside the virus and also plays other roles in the infected cell. The N protein of SARS-CoV-2 and other coronaviruses is known to contain two globular RNA-binding domains, the NTD and CTD, flanked by disordered regions. The central disordered linker is particularly well understood: it contains a long SR-rich region that is extensively phosphorylated in infected cells, followed by a leucine-rich helical segment that was shown previously by these authors to promote N protein oligomerization.

In the current work, the authors analyze 5 million viral sequence variants to assess the conservation of specific amino acids and general sequence features in the major regions of the N protein. This analysis shows that disordered regions are particularly variable but that the general hydrophobic and charge character of these regions is conserved, particularly in the SR and leucine-rich regions of the central linker. The authors then construct a series of N proteins bearing the most prevalent mutations seen in the Delta and Omicron variants, and they subject these mutant proteins to a comprehensive array of biophysical analyses (temperature sensitivity, circular dichroism, oligomerization, RNA binding, and phase separation).

The results include a number of novel findings that are worthy of further exploration. Most notable are the analyses of the previously unstudied P13L mutation of the Omicron variant. The authors use ColabFold and sedimentation analysis to suggest that this mutation promotes self-association of the disordered N-terminal region and stimulates the formation of N protein condensates. Although the affinity of this interaction is low, it seems likely that this mutation enhances viral fitness by promoting N-terminal interactions. The work also addresses the impact of another unstudied mutation, D63G, that is located on the surface of the globular NTD and has no significant effect on the properties analyzed here, raising interesting questions about how this mutation enhances viral fitness. Finally, the paper ends with studies showing that another common mutant, R203K/G204R, disrupts phase separation and might thereby alter N protein function in a way that enhances viral fitness. These provocative results set the stage for in-depth analyses of these mutations in future work.

Reviewer #3 (Public Review):

Anonymous

Nguyen, Zhao et al. used bioinformatic analysis of mutational variants of SARS-CoV-2 Nucleocapsid (N) protein from the large genomic database of SARS-CoV-2 sequences to identify domains and regions of N where mutations are more highly represented, and computationally determined the effects of these mutations on the physicochemical properties of the protein. They found that the intrinsically disordered regions (IDRs) of N protein are more highly mutated than structured regions, and that these mutations can lead to higher variability in the physical properties of these domains. These computational predictions are compared to in vitro biophysical experiments to assess the effects of identified mutations on the thermodynamic stability, oligomeric state, particle formation, and liquid-liquid phase separation of a few exemplary mutants.

The paper is well written, easy to follow and the conclusions drawn are supported by the evidence presented. The analyses and conclusions are interesting and will be of value to virologists, cell biologists, and biophysicists studying SARS-CoV-2 function and assembly.

eLife. 2024 Jun 28;13:RP94836. doi: 10.7554/eLife.94836.3.sa3

Author response

Ai Nguyen 1, Joy Zhao 2, Dulguun Myagmarsuren 3, Sanjana Srinivasan 4, Di Wu 5, Jiji Chen 6, Grzegorz Piszczek 7, Peter Schuck 8

The following is the authors’ response to the original reviews.

Public Reviews:

Reviewer #1 (Public Review):

The study is highly interesting and the applied methods are target-oriented. The biophysical characterization of viable N-protein species and several representative N-protein mutants is supported by the data, including polarity, hydrophobicity, thermodynamic stability, CD spectra, particle size, and especially protein self-association. The physicochemical parameters for viable N-protein and related coronaviruses are described in detail for comparison. However, the conclusions about the interaction of peptides or motifs are less convincing, because they were inferred from different biophysical results without more direct data on peptide interaction. The manuscript would benefit from more results on peptide interaction to support the authors' interpretation, or from more precise wording where the interaction of motifs is concerned. Although the authors put a lot of effort into the study, there are still some questions to answer.

We thank the Reviewer for this assessment and wholeheartedly agree that there are still many questions. The main thrust of the present work was not intended to unravel the detailed mechanistic origin of all observations, but rather to juxtapose the different observations made with different viable N-protein species across the mutant spectrum, in order to get a sense of how narrowly the biophysical phenotype is confined to ensure virus viability. Such a study has become possible for the first time with the unprecedented genomic database of SARS-CoV-2. This has led to observations of non-local effects of individual mutations that are neither independent of nor additive with the effects of other mutations, and in that sense we have inferred ‘interactions’. These might be mediated by direct contacts or indirectly through altered chain configurations. In the revised manuscript we have clarified this point.
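As an illustration only (the notation below is an assumption introduced here and does not appear in the manuscript), non-additivity of two mutations a and b with respect to a measured property X, such as a melting temperature, can be written as the deviation of the double-mutant effect from the sum of the single-mutant effects:

```latex
\varepsilon_{ab} = \left( X_{ab} - X_{\mathrm{wt}} \right)
  - \left[ \left( X_{a} - X_{\mathrm{wt}} \right) + \left( X_{b} - X_{\mathrm{wt}} \right) \right]
```

In this sketch, a value of ε_ab indistinguishable from zero would indicate additive, independent effects, whereas a nonzero value corresponds to an 'interaction' in the sense used above, whether it arises from direct contacts or from altered chain configurations.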

Meanwhile, a number of documented direct physical intra-molecular and intra-dimer interactions provide a context to our study of mutation effects. The flexibility of the IDRs provides a rich variety of contacts that have been observed in molecular dynamics and single-molecule fluorescence studies (Różycki & Boura, Biophys Chem 2022 and Cubuk et al., Nat Commun 2021). We have previously carried out detailed hydrodynamic studies of self-association interfaces located in the leucine-rich region. More recently, NMR data just published by the Blackledge laboratory (Botova et al., bioRxiv 2024) extend the list of intra-molecular contacts with the observation of long-range intra-molecular interactions between the NTD and the CTD, NTD and the phosphorylated SR-rich region, and NTD and the previously studied leucine-rich region. The latter contacts require the C-terminal region of the linker to loop back onto the NTD, which may well introduce susceptibility to any of the linker mutations. However, detailed linker configurations are beyond the scope of the present work.

With regard to the effects of the Omicron mutations in the N-arm IDR, we have shown hydrodynamic data directly demonstrating peptide self-association, and we are currently working on a more detailed functional follow-up study which we hope to communicate soon.

Reviewer #2 (Public Review):

Summary: This work focuses on the biochemical features of the SARS-CoV-2 Nucleocapsid (N) protein, which condenses the large viral RNA genome inside the virus and also plays other roles in the infected cell. The N protein of SARS-CoV-2 and other coronaviruses is known to contain two globular RNA-binding domains, the NTD and CTD, flanked by disordered regions. The central disordered linker is particularly well understood: it contains a long SR-rich region that is extensively phosphorylated in infected cells, followed by a leucine-rich helical segment that was shown previously by these authors to promote N protein oligomerization.

In the current work, the authors analyze 5 million viral sequence variants to assess the conservation of specific amino acids and general sequence features in the major regions of the N protein. This analysis shows that disordered regions are particularly variable but that the general hydrophobic and charge character of these regions is conserved, particularly in the SR and leucine-rich regions of the central linker. The authors then construct a series of N proteins bearing the most prevalent mutations seen in the Delta and Omicron variants, and they subject these mutant proteins to a comprehensive array of biophysical analyses (temperature sensitivity, circular dichroism, oligomerization, RNA binding, and phase separation).

Strengths:

The results include a number of novel findings that are worthy of further exploration. Most notable are the analyses of the previously unstudied P13L mutation of the Omicron variant. The authors use ColabFold and sedimentation analysis to suggest that this mutation promotes the self-association of the disordered N-terminal region and stimulates the formation of N protein condensates. Although the affinity of this interaction is low, it seems likely that this mutation enhances viral fitness by promoting N-terminal interactions. The work also addresses the impact of another unstudied mutation, D63G, that is located on the surface of the globular NTD and has no significant effect on the properties analyzed here, raising interesting questions about how this mutation enhances viral fitness. Finally, the paper ends with studies showing that another common mutant, R203K/G204R, disrupts phase separation and might thereby alter N protein function in a way that enhances viral fitness.

Thank you for highlighting the strengths of our paper.

Weaknesses:

In general, the results in the paper confirm previous ideas about the role of N protein regions. The key novelty of the paper lies in the identification of point mutations, notably P13L, that suggest previously unsuspected functions of the N-terminal disordered region in protein oligomerization. The paper would benefit from further exploration of these possibilities.

We agree that the bioinformatic results confirm previous ideas about the role of the N protein regions. However, we believe our results go beyond the previous thinking in a crucial aspect, which is that we examine the full (so far known) mutant spectrum of N-protein. Properties previously inferred from the inspection of single consensus sequences can be misleading because of the quasispecies nature of RNA viruses. By considering the mutant spectrum we can obtain a sense for how significant differences in the physicochemical properties of the different regions are, and how much variation is possible without jeopardizing essential protein functions.

With regard to the N-arm IDR mutations, we believe they deserve a separate study focusing on the apparent N-arm function. Our rationale for presenting some initial N-arm results in the current paper was to highlight how the variability of N-protein species in the mutant spectrum can even include differences in the type and number of protein self-association interfaces.

Reviewer #3 (Public Review):

Nguyen, Zhao, et al. used bioinformatic analysis of mutational variants of SARS-CoV-2 Nucleocapsid (N) protein from the large genomic database of SARS-CoV-2 sequences to identify domains and regions of N where mutations are more highly represented and computationally determined the effects of these mutations on the physicochemical properties of the protein. They found that the intrinsically disordered regions (IDRs) of N protein are more highly mutated than structured regions and that these mutations can lead to higher variability in the physical properties of these domains. These computational predictions are compared to in vitro biophysical experiments to assess the effects of identified mutations on the thermodynamic stability, oligomeric state, particle formation, and liquid-liquid phase separation of a few exemplary mutants.

The paper is well-written and easy to follow, and the conclusions drawn are supported by the evidence presented. The analyses and conclusions are interesting and will be of value to virologists, cell biologists, and biophysicists studying SARS-CoV-2 function and assembly. It would be nice if some further extrapolation or comments could be made regarding the effects of the observed mutations on the in vivo behavior and properties of the virus, but I appreciate that this is much higher-order than could be addressed with the approaches employed here.

We thank the Reviewer for this positive assessment. With regard to the possible in vivo behavior of mutant species, we agree that this would require additional data beyond the scope of the present work.

However, for the N:G215C mutant we can point to a very recent preprint by Kubinski et al. (bioRxiv 2024) that describes reverse genetics experiments where the isolated N:G215C mutation caused altered in vivo pathology, enhanced viral replication, and altered virion morphology. We have cited this work in the revised manuscript.

As mentioned above, for the P13L mutation we hope to communicate a more detailed follow-up study that will allow us to extrapolate on its in vivo behavior.

Recommendations For The Authors:

Reviewer #1:

(1) Given the structure organization of N-protein in Figure 1, the authors should explain why linker region 180-247 is different from linker (175-247) mentioned in the first result.

We thank the reviewer for bringing up this point, which we agree deserves clarification. While often the NTD has been assigned a C-terminal limit of 180 (e.g., in the NMR structure by Dinesh et al, Plos Pathogens 2020), the last several residues in the NTD are already disordered and contain the S176/R177 pair and therefore may be ascribed to the beginning of the SR-rich portion of the linker. In order not to artificially truncate functional sequences of either NTD or linker, we have decided to allow the designations of the NTD and linker regions to overlap. We believe this is conservative in that possible NTD or linker properties extending into this transition region will be preserved. In order to explain this in the manuscript, we have modified Figure 1 and inserted a brief sentence “(Due to ambiguity in delineation between NTD and linker, designations overlapping in 175-180 were used to avoid artificial truncation and permit conservative evaluation of the properties of each domain.)”.

(2) Please specify the "physicochemical requirements" in the fourth paragraph of the first result, and its physicochemical meaning and references.

Thank you for pointing this out; we agree this was not well expressed. We have rephrased this (including new references) to “…we find that hydrophobicity is uniformly high and polarity correspondingly low in the folded NTD and CTD domains, which is consistent with the expectation that folded structures are stabilized by buried hydrophobic residues (Eisenberg and McLachlan, 1986; Kauzmann, 1959)”.

(3) The authors should clarify the biological meaning of the net charge and phosphorylation charge in the first result, just like the description in the results of polarity and hydrophobicity.

We agree this will improve readability, and have inserted an introductory sentence to the study of charges in the mutant spectrum: “Charges in proteins can control multiple properties related to electrostatic interactions, from functions of active sites to protein solubility, protein interactions, and conformational ensembles in IDRs (Garcia-Viloca et al., 2004; Gerstein and Chothia, 1996; Gitlin et al., 2006; Mao et al., 2010).”.

(4) The authors should clarify the calculation method and meaning of the column "occurs in % of all genomes" in Table 2.

We have inserted a footnote specifying that this is the “Percentage of all sequenced genomes carrying the specific mutation.”.

(5) Please specify what information or conclusion we can get for the shift of the intrinsic fluorescent spectrum of N: D63G in the third result paragraph 2.

We have rephrased the second sentence of this paragraph to “The presence of the N:D63G mutation in the NTD is highlighted in the shift of the intrinsic fluorescence quantum yield of this mutant in comparison to Nref ”. It confirms the structural prediction, which positions D63G at the protein surface near the NA binding site, and sets up the question whether this obligatory mutation of Delta-variant N-protein affects NA binding and thereby possibly assembly. Unexpectedly, we did not find any impact of the D63G mutation on NA binding, although we observed a modest impact on temperature-dependent particle formation by DLS.

(6) The conclusion, "some epistatic interaction between mutation of the linker and N-arm" in the third result paragraph 4, is over-interpreted from the result of the CD spectra because they didn't detect peptide interaction between mutation of the linker and N-arm.

Thank you for raising this point. We did not mean to make a strong conclusion here, and have now deleted this statement.

(7) The parallel assay for N: G215C and Nδ in SV-AUC experiments is recommended to be conducted with other groups to avoid experimental error.

I believe this may be a misunderstanding: Indeed we had carried out SV-AUC experiments for all the mutants, as shown in Figure 5A. However, since all but the N:G215C and Nδ formed only dimers, like the reference protein, we did not comment on these in the results text. We have rectified this omission in the revision by inserting the sentence: “…The same behavior is observed for N:D63G, No, N:R203K/G204R, as well as N:P13L/Δ31-33 at low micromolar concentrations (Figure 5A). By contrast, the G215C mutation promotes the formation of higher oligomers…”

With regard to experimental error, SV-AUC is an absolute method based on first principles and we have maintained our instruments by performing regular calibrations, using methods developed by us and colleagues at NIST, as described in the literature (Anal Biochem 2013, PLOS ONE 2018, Eur. Biophys. J. 2021). Previously we have critically examined the accuracy of s-values by SV-AUC before and after calibration in a large multi-laboratory study (PLOS ONE 2015), and found that the accuracy of s-values is ~1%. This allows detailed comparisons of results from different runs and different points in time. To alleviate any concerns we have now mentioned our calibration methods in the methods section.

(8) The authors did not test the function of Nδ R203M mutation, so they should not mention about it like in the third result paragraph 5, which is over-interpreted from result 5A.

We accept the criticism that we have not yet examined the R203M mutation in isolation. However, we believe some speculation is in order: Nδ consists of D63G, R203M, G215C, and D377Y, of which D63G is unlikely to impact oligomeric state based on our data for N:D63G. It is therefore reasonable to assume that R203M and/or D377Y interfere with the promotion of oligomerization that we have observed with N:G215C. In previous work, we have traced the 215C-induced oligomerization to the transient helix in the leucine-rich region of the linker, residues 215-235 (Science Advances, 2023). Since 377Y is quite far away, the more proximal 203M appears to be the most plausible origin of the modulation of dimerization.

In the revision we have more clearly outlined this speculation: “Of the three additional mutations of Nδ relative to N:G215C, we speculate that D63G does not impact dimerization (as in N:D63G, Figure 5A), and that therefore either the distant D377Y and/or R203M might cause this reduction of helicity and oligomerization relative to N:G215C, noting that R203M is proximal to the L-rich region (215-235) reshaped by 215C.” Later we refer to this as “any potential inhibitory role suspected of the R203M mutation on self-association…”.

(9) The description of LLPS formation lacks reference in the third result paragraph 6.

Thank you. To improve the transition to this new paragraph in the results, we have inserted “As outlined in the introduction, …” and repeated the 8 references to the fact that N-protein undergoes LLPS. The two additional, separate references refer to just those published studies that examined the temperature-dependence of LLPS, which I believe is now clearer.

(10) The authors did not test the interaction between the N-arm IDR mutation and linker IDR, it is not exponible that interaction promoted particle formation of No in the third result paragraph 8, which is over-interpreted from result 5B.

We thank the Reviewer for raising this point. In fact, we did not want to imply a direct physical interaction (in terms of binding) between the N-arm IDR mutation and that in the linker. But clearly there are non-additive effects in particle formation, since P13L/Δ31-33 inhibits slightly and R203K/G204R inhibits almost completely, whereas the combination of the two (constituting No) promotes particle formation. We have rephrased this to “alter the effect of”, avoiding the term “interact with” so as not to suggest a picture of direct binding, and invoking instead the idea of epistatic interactions.

(11) In the third result paragraph 9, why did the authors choose to examine the role of the N-arm mutations of the Omicron variants in greater detail? This reason should be added to the manuscript.

Thank you for this suggestion. Naturally, we were curious how the defining N-arm mutations of Omicron variants could impact particle formation. Even though no obvious enhancement of self-association by either Omicron N-arm or linker mutations was observed at low micromolar concentrations in SV-AUC (Figure 5A), we knew from experience with the study of the leucine-rich transient helix in the linker IDR that even weak interfaces with mM Kd can be highly relevant in the context of multivalent assemblies (Science Advances, 2023). Therefore we followed the same roadmap and focused on IDR peptides with the goal to study them at higher concentrations that might reveal weak interactions.

We have described this motivation as follows: “We were curious whether IDR mutations might alter particle formation through modulation of existing or introduction of new protein-protein interfaces. We focused on Omicron mutations as these are obligatory in all currently circulating strains, and specifically on N-arm mutations, which have recently been implicated in altered intramolecular interactions with NA-occupied NTD (Cubuk et al., 2023). Even though SV-AUC showed no indication of self-association of N:P13L/Δ31-33 at low micromolar concentrations, weak interactions with Kd > mM would not be detectable under these conditions yet could be highly relevant in the context of multi-valent complexes (Zhao et al., 2024). Following the roadmap used previously for the study of the weak self-association of the leucine-rich linker IDR (Zhao et al., 2023), we restricted the protein to the N-arm peptide such that it can be studied at much higher concentrations. To this end, we …”

(12) Why were different proteins dissolved in either high-salt buffer or low-salt buffer for biophysical experiments? Did this affect the experimental results? Explanations and evidence are required.

We appreciate this is an important point. Unfortunately, for practical reasons of available sample concentrations and quantities, it was not always possible to dialyze protein into both buffers. For example, the DSF data in Figure 4B show all proteins in low-salt buffer except N:R203K/G204R, which is in high-salt buffer. We had previously reported the absence of changes in Ti in DSF for Nref in the two buffers, which we have documented better in the revised manuscript by providing an additional Supplementary Figure S7: “As a buffer control, the difference in Ti for Nref in LS and HS buffer was measured and found to be within error of data acquisition (Supplementary Figure S7A).” This new Supplementary Figure provides an overlay of low-salt and high-salt DSF data for Nref, N:D63G, and No, which have variations in the Ti values for different buffers on the order of 0.1 °C. This is comparable to the precision of the measurement, and significantly smaller than the changes in Ti values between the different mutant protein species. Finally, we note that the one species for which we were unable to collect DSF data in low-salt buffer, N:R203K/G204R, was unremarkable relative to Nref, No, and N:P13L/Δ31-33.

In the case of CD, the only species for which we could not collect spectra in low-salt buffer was No. Again, this spectrum was similar to the group including Nref, along with N:P13L/Δ31-33, and N:D63G. In the results we interpreted significant differences from Nref for N:G215C and N:R203K/G204R.

Similarly, SV-AUC experiments were carried out in high-salt buffer, except Nref, Nδ, and N:G215C. In this case, we could observe a ≈5% difference in s-value for the same protein in different buffers, but the magnitude of this change is negligible compared to the ≈60-90% increase observed for altered oligomeric states. To clarify this we have inserted a sentence “Proteins for self-association studies were in buffer HS, except Nref, Nδ, and N:G215C were in LS, the latter causing a ≈5% increase in s-value (Supplementary Figure S7B).”, with the new Supplementary Figure S7B showing a comparison of sedimentation coefficient distributions of Nref and N:D63G in low- and high-salt buffers. Whether the small differences in s-values are indeed significant and reflective of salt-dependent conformational ensembles of IDRs will require a more detailed follow-up study, but is outside the scope of the present work.

All other experiments were carried out with uniform buffer conditions for all protein species.

(13) DLS data of N from other research suggests oligomers beyond dimer. Please address this discrepancy.

Unfortunately several previous studies in the literature did not recognize the importance of eliminating nucleic acid contaminations in the N-protein preparations, and/or did not succeed in completely removing nucleic acid from the protein. We and others have repeatedly commented on this issue. For example, Tarczewska et al (IJBM 188 (2021) 391-403) clearly demonstrate this in much detail in a study dedicated to this problem.

To clarify this point we have included a sentence in the paragraph describing the protein preparation: “…the ratio of absorbance at 260 nm and 280 nm of ~0.50-0.55 confirmed absence of nucleic acid. The latter is important to eliminate higher order N-protein oligomers induced by nucleic acid binding (Carlson et al., 2020; Tarczewska et al., 2021; Zhao et al., 2021)”.

In order to strengthen the statement in the Results that the ancestral N-protein is dimeric we have added additional references from other labs that have carried out detailed biophysical analyses: “As reported previously, the ancestral N-protein at micromolar concentrations in NA-free form is a tightly linked dimer sedimenting at ≈4 S, without significant populations of higher oligomers (Forsythe et al., 2021; Ribeiro-Filho et al., 2022; Tarczewska et al., 2021; Zhao et al., 2022, 2021).”

Reviewer #2:

The key novel finding of the work lies in the evidence that P31L promotes N-terminal interactions. The paper would be strengthened by additional studies of the impact of P31L on the oligomerization of full-length N protein. The sedimentation analysis in Fig 6 shows that high concentrations of the N arm alone self-associate, while the analysis in Fig 5 argues that P31L does not have an effect on the oligomerization of the full-length protein. Perhaps there are specific conditions or mutation combinations that would provide evidence that P31L has an effect on protein behavior that might explain the prevalence of this mutation.

We agree that the finding of P13L promoting N-terminal interactions is of great interest, and we thank the Reviewer for the suggestion to examine cross-correlations of N-arm mutations with other mutations as a tool to study its function and relevance.

The observation of self-association in Figure 6 at high concentrations is not necessarily at odds with the absence of self-association at 100-fold lower concentrations. Rather, it seems to show that the interaction mediated by the N-terminal mutation P13L is weak, with an effective Kd in the mM range. It will likely not be possible to reach sufficiently high protein concentrations with the full-length protein to visualize the oligomerization of the N-terminal IDR. But even if it were possible to concentrate the protein enough, very likely other assembly processes would take place, including LLPS, obscuring potential P13L interfaces. Nonetheless we believe the protein-protein interface created by the N-arm IDR is highly relevant in the context of multi-valent complexes, where entropic co-localization enhances the effective N-arm IDR concentration that then can provide additional binding energy and strengthen the assembly of multi-protein complexes.
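The concentration argument above can be illustrated with a minimal sketch. It assumes a simple reversible monomer-dimer equilibrium (2M ⇌ D, Kd = [M]²/[D]), which is a simplification and not necessarily the association scheme of the N-arm peptide. The Kd of 1 mM and the loading concentrations are hypothetical placeholders chosen only to span a 100-fold range; the function name `dimer_fraction` is likewise introduced here for illustration.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protomers found in dimers for 2M <-> D, Kd = [M]^2 / [D].

    total_conc is the total protomer concentration, [M] + 2[D];
    both arguments must use the same units (e.g. molar).
    """
    if total_conc <= 0 or kd <= 0:
        raise ValueError("concentrations and Kd must be positive")
    # Mass balance: total = [M] + 2[M]^2/Kd, solved for [M] (positive root).
    monomer = kd * (math.sqrt(1.0 + 8.0 * total_conc / kd) - 1.0) / 4.0
    return 1.0 - monomer / total_conc


if __name__ == "__main__":
    kd = 1e-3  # hypothetical Kd of 1 mM (assumption, not a measured value)
    for conc in (1e-5, 1e-4, 1e-3):  # 10 uM, 100 uM, 1 mM total protomer
        frac = dimer_fraction(conc, kd)
        print(f"total = {conc:.0e} M -> dimer fraction = {frac:.3f}")
```

Under these assumptions, about 2% of protomers are dimeric when the loading concentration is 100-fold below Kd, compared with 50% when it equals Kd, which is why a weak interface can go unnoticed at low micromolar concentrations.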

We are currently pursuing further experiments examining the properties and relevance of the N-arm mutations and intend to publish these in a separate study, so as not to distract from the thrust of the current work exploring the extent of the biophysical phenotype space.

The R203K/G204R mutations have a surprising impact on LLPS in Figure 7: it is not clear how such limited mutations would alter the many nonspecific, multivalent interactions that presumably lead to phase separation. The paper would benefit from a more extensive analysis of LLPS in this mutant and in the P31L mutant, perhaps by performing the analysis at various protein concentrations and times.

Following this recommendation we have expanded the study of LLPS of Figure 7 by comparison of two different time points for Nref, N:R203K/G204R, and N:P13L in a new Supplementary Figure S6. We have also quantified the droplet distributions as shown in the new Supplementary Figure S5. Both clearly confirm the strong inhibitory effect of the R203K/G204R mutation on LLPS under our experimental conditions. What this shows is not that this protein could not undergo LLPS per se, but that the phase boundaries have shifted such that under the experimental conditions we applied LLPS does not occur yet. (In this context it is interesting to note that ≈50,000 genomes in the GISAID database have R203K/G204R as the sole N-protein mutation, without impact on viral viability.)

That individual point-mutations in IDRs can have significant impact on LLPS has been observed previously for several other proteins. Examples include SPOP [Bouchard et al., Mol Cell 72 (2018) 19-36.e8], SHP2 [Zhu et al., Cell 183 (2020) 490-502.e18], FUS [Niaki et al., Mol Cell 77 (2020) 82-94.e4], and CAPRIN1 [Kim et al., PNAS 118 (2021) 1-11]. The latter work applies NMR and reveals that promotion of LLPS is not uniform but centered in hot-spot residues of CAPRIN1.

While the precise molecular mechanism for LLPS of the N-protein is unclear, we can speculate how the effect of 203K/204R might be amplified. As shown by the coarse-grained MD simulations from Rozycki & Boura (Biophys. Chem. 2022), the linker IDR is highly flexible and the 203/204 residues make transient contacts to other residues throughout the linker as well as to distinct sites on the NTD. Furthermore, recent NMR data from the Blackledge lab (Botova et al., bioRxiv 2024, doi:10.1101/2024.02.22.579423) have revealed intra-molecular interactions, including a state where the L-rich (C-terminal) portion of the linker IDR interacts with a site on the distant NTD. (We have included a reference to this preprint in the discussion.) This intra-molecular contact observed in NMR must cause significant chain compaction and may thereby modulate the accessibility of portions of the linker IDR available to inter-molecular interactions contributing to LLPS. The residues 203/204 are in the middle between the SR-rich and L-rich region where bending of the chain must occur to allow for the intra-molecular contacts. The 203K/204R mutation may alter the dynamics or population of this intra-molecular bound state, especially considering the introduction of a bulky positively charged R replacing G204.

In summary, considering the dynamics of intra-molecular contacts and considering precedent of several other disordered proteins, we believe it is not unreasonable that the local mutation in the IDR R203K/G204R may cause a significant shift in LLPS phase boundaries. We note that this mutant also shows a very distinct behavior in the temperature-dependent DLS, entirely lacking particle formation below 70 °C. This observation seems consistent with altered inter-molecular interactions.

Reviewer #3:

I have only a few minor specific comments:

(1) Page 4, last paragraph - typo: "The large number of structural and non-structural N-protein functions poses the question of how they are conserved...". This either needs a colon or to be changed to "... poses the question of how they are conserved...".

Thank you – we have changed this sentence accordingly.

(2) Page 7, 2nd and 3rd paragraphs of "Physicochemical properties" section: why is Figure 2B discussed before Figure 2A?

Initially when we present the results of polarity and hydrophobicity we refer more generally to Figure 2, as the two properties are so closely related. Later, in the section on related coronaviruses we do refer once more to Figure 2. Here we begin this section by discussing Figure 2B since in this plot the symbols for the different viruses are most recognizable.

(3) Page 11, lines 1-2: "Since this is a tell-tale of weak protein..." -> "tell-tale sign of ...".

We thank the reviewer for pointing this out and have fixed this sentence.

(4) Further down in the same paragraph, the meaning of "SV-AUC" should be spelled out at its first use.

We have double checked that SV-AUC is spelled out at its first use.

(5) Figures 1 and 2. Is there a good reason that the color scheme for the IDRs (magenta and cyan) is so close to the color scheme for the identifying mutations of Omicron and Delta (magenta and blue)? This initially led me to try to search for some connection, and it remains unclear to me if there is.

We apologize for this confusion. This was indeed a poor color choice, and we have rectified this in the revised manuscript by changing the colors of the identifying mutations of Omicron and Delta to dashed green and dotted red, respectively, so that there is no connection to the shading of the IDRs. Thank you very much for pointing this out!

(6) Figure 1: The physical limits of the subdomains, e.g. SR-rich, L-rich, C-arm1, and N3 could be more clearly delineated with lines, or some other visual representation.

Once more, we thank the reviewer for pointing this out. We have revised Figure 1 to indicate the limits between these subdomains.

(7) Figures 4, 5, and 6: are there any kind of error bars or confidence intervals on these measurements?

We appreciate this concern and have addressed it in different ways for the different methods.

For the spectra of intrinsic fluorescence in Figure 4A, we have now plotted an overlay of three acquired spectra, from which the experimental error as a function of wavelength may be assessed. It is clear that the differences between Nref and N:D63G are far greater than the measurement error.

With regard to DSF, we have provided an error estimate of 0.3 °C for the Ti-values, a value that we have revised from the previously reported errors of sequential replicates to now include Ti variation observed with different preparations of the same protein over long time periods.

For CD spectra we have included a new Supplementary Figure S3 that shows standard deviations of triplicate measurements as a function of wavelength. Since an overlay including errors for all species would be too crowded, we have created separate plots for all species in comparison with Nref. (On this occasion we discovered a 3% error in the magnitude of the Nref spectrum due to previously incorrect conversion to MRE, which we have now fixed.)

In SV-AUC, for data with typical signal-to-noise ratio, the statistical error is very small due to the large number (> 10⁴) of raw data points included in the calculation of each c(s) trace, with each data point carrying a statistical error that is usually better than 1%. Therefore, the dominant error is systematic. In the past we have carried out large studies quantifying the accuracy of the major peaks of the sedimentation coefficient distributions, and found they are typically ≈1% in s-value and 1-2% for relative peak areas. In the AUC methods section we have now included the sentence “Typical accuracy of c(s) peaks are on the order of ≈1% for peak s-values and ≈1-2% for relative peak areas (Zhao et al., 2015).”

Finally, for the temperature-dependent DLS data we have to resort to the scatter in the temperature-dependent Rh-values. The calculated Rh-values can exhibit fluctuations once particles start to form and the distribution becomes highly polydisperse. As is characteristic for DLS under those conditions, individual Rh-values can be dominated by adventitious diffusion of few large particles into the laser focal spot. Although customarily autocorrelation functions can be filtered out through software filters (e.g., setting baseline and amplitude thresholds), this still presents the largest source of error in the Rh-values. These are systematic for the individual autocorrelation functions. We believe that the variation of Rh-values at similar temperatures outside the transition region provides a reasonable estimate for the experimental error.

(8) Figure 7: My most major comment. It would be good to somehow quantify the differences between these images. The claim is made that the LLPS droplets are different sizes, or for the P13L/Δ31-33 variant that droplets are coalescing or changing shape over time. It would be good to quantify this rather than rely on eyeballing the pictures.

We are grateful to the Reviewer for this suggestion. As mentioned above, to improve the LLPS analysis we have now carried out segmentation of the images in Figure 7 to quantify the droplet numbers and areas. Histograms and statistical analyses are now provided in the new Supplementary Figure S5. In addition, we have added a comparison of the droplet numbers and sizes at two time-points for Nref, N:R203K/G204R, in addition to the previously shown N:P13L/Δ31-33, provided in the new Supplementary Figure S6. The results corroborate the previous conclusions, and depict how droplets in the N:P13L/Δ31-33 merge and grow in area more strongly than those from Nref.
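As a rough illustration of what counting droplets and measuring their areas by image segmentation involves, the following sketch thresholds a grayscale image and labels 4-connected bright regions. It is not the procedure used for Supplementary Figure S5, whose software and parameters are not described here; the function name `droplet_areas`, the fixed intensity threshold, and the toy image are all hypothetical.

```python
from collections import deque


def droplet_areas(image, threshold):
    """Return the pixel area of each 4-connected region brighter than threshold.

    image is a list of equally long rows of intensities (a toy stand-in for a
    micrograph); threshold is a hypothetical fixed intensity cutoff.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one connected bright region.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


if __name__ == "__main__":
    toy_image = [
        [0, 9, 9, 0, 0],
        [0, 9, 0, 0, 8],
        [0, 0, 0, 0, 8],
        [7, 0, 0, 0, 0],
    ]
    areas = droplet_areas(toy_image, threshold=5)
    print(f"{len(areas)} droplets with areas {sorted(areas)} (pixels)")
```

The resulting list of areas is the kind of per-droplet quantity from which histograms of droplet number and size at different time points could be built.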

Associated Data


    Data Citations

    1. Nguyen A, Zhao H, Myagmarsuren D, Srinivasan S, Wu D, Chen J, Piszczek G, Schuck P. 2024. Replication Data for: Modulation of Biophysical Properties of Nucleocapsid Protein in the Mutant Spectrum of SARS-CoV-2. Harvard Dataverse.

    Supplementary Materials

    Supplementary file 1. Sequence alignment of nucleocapsid protein of related coronaviruses and sequences corresponding to SARS-CoV-2 regions.

    N-protein sequences of SARS-CoV-1 P59595.1, MERS YP_009047211.1, MHV NP_045302.1, human coronavirus NL63 Q6Q1R8.1, and 229E-related bat coronavirus APD51511.1 were aligned with SARS-CoV-2 N-protein. Regions corresponding to the SARS-CoV-2 regions (N-arm, NTD, linker, SR-rich, L-rich, CTD, Carm, Carm1, N3) and their alignment score were determined.

    elife-94836-supp1.xlsx (12.1KB, xlsx)
    MDAR checklist

    Data Availability Statement

    Raw data supporting this study can be found at the Harvard Dataverse https://doi.org/10.7910/DVN/PZ6LRK.

    The following dataset was generated:

    Nguyen A, Zhao H, Myagmarsuren D, Srinivasan S, Wu D, Chen J, Piszczek G, Schuck P. 2024. Replication Data for: Modulation of Biophysical Properties of Nucleocapsid Protein in the Mutant Spectrum of SARS-CoV-2. Harvard Dataverse.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
