. 2020 Sep 9:1–15. doi: 10.1080/07391102.2020.1815581

Ribonucleocapsid assembly/packaging signals in the genomes of the coronaviruses SARS-CoV and SARS-CoV-2: detection, comparison and implications for therapeutic targeting

Vladimir R Chechetkin a, Vasily V Lobzin b
PMCID: PMC7544952  PMID: 32901577

Abstract

The genomic ssRNA of coronaviruses is packaged within a helical nucleocapsid. Owing to the translational symmetry of a helix, weakly specific cooperative interactions between the ssRNA and nucleocapsid proteins lead to the natural selection of specific quasi-periodic assembly/packaging signals in the related genomic sequence. Such signals, coordinated with the nucleocapsid helical structure, were detected and reconstructed in the genomes of the coronaviruses SARS-CoV and SARS-CoV-2. The main period of the signals for both viruses was about 54 nt, which implies 6.75 nt per N protein. Complete coverage of an ssRNA genome of about 30,000 nt by the nucleocapsid would need 4.4 × 10^3 N proteins, which makes them the most abundant among the structural proteins. The repertoires of motifs for SARS-CoV and SARS-CoV-2 were divergent but nearly coincided for different isolates of SARS-CoV-2. We obtained the distributions of assembly/packaging signals over the genomes using nonoverlapping windows of width 432 nt. Finally, using the spectral entropy, we compared the load from point mutations and indels accumulated over the age of each virus for SARS-CoV and SARS-CoV-2. We found a higher mutational load on SARS-CoV. In this sense, SARS-CoV-2 can be treated as a ‘newborn’ virus. These observations may be helpful in practical medical applications and are of basic interest.

Communicated by Ramaswamy H. Sarma

Keywords: COVID-19, SARS-CoV, SARS-CoV-2, helical nucleocapsid, genome packaging

1. Introduction

By the end of July 2020, the COVID-19 pandemic had caused more than 18.4 million coronavirus cases and more than 695,000 deaths worldwide (https://www.worldometers.info/coronavirus/). The pandemic is still continuing, and the probability of new disease waves is considered to be very high. The development of efficient medications and vaccines against coronaviruses requires knowledge of the main molecular mechanisms of the virus life cycle and virus–host interaction (Feng et al., 2020; Fung & Liu, 2019; Maier et al., 2015; Saxena, 2020; Xie & Chen, 2020; Ziebuhr, 2016). In this article, we discuss a specific interaction between nucleocapsid (N) proteins and genomic ssRNA in the coronaviruses SARS-CoV and SARS-CoV-2.

The ssRNA genome of coronaviruses is packaged within a helical nucleocapsid, while the whole ribonucleocapsid is packaged within a membrane envelope (for a review see, e.g. Masters, 2019; Neuman & Buchmeier, 2016). In the coronavirus literature, the term ‘packaging signal’ is overwhelmingly attributed to the specific interaction between the genomic RNA and membrane (M) proteins that ensures the transport of the ribonucleocapsid into the membrane envelope (Fosmire et al., 1992; Madhugiri et al., 2016; Masters, 2019; Narayanan & Makino, 2001; Woo et al., 2019). The interactions between genomic RNA and N proteins are assumed to be nonspecific and governed mainly by electrostatic effects. The question of how N proteins recognize the related genomic RNA remains unanswered. A similar point of view long prevailed in the virology community studying ssRNA viruses with icosahedral capsids. The importance of cooperative weakly specific interactions between ssRNA and capsid proteins was recognized only recently (see the discussion by Twarock et al. (2018) and references therein). Stockley et al. (2016) suggested and proved experimentally a two-stage model for the assembly of ssRNA viruses with icosahedral capsids. At the first, more rapid, stage RNA binds to the coat proteins to facilitate capsid assembly, whereas at the second, slower, stage RNA is compactly packaged within the capsid. The specific cooperative RNA–coat protein interactions play an important role at both stages. The two stages may be associated with different signals (Chechetkin & Lobzin, 2019), and the whole dynamic process may be called assembly/packaging.

Generalizing these ideas to viruses whose ribonucleocapsid is enclosed in a membrane envelope, such as coronaviruses, implies three stages in the complete packaging of the genomic RNA within the envelope: the two-stage assembly/packaging of the helical ribonucleocapsid and the packaging of the ribonucleocapsid within the envelope. This article is devoted to the search for specific signals in the genomic ssRNA sequences related to the two-stage assembly/packaging of the helical ribonucleocapsid. As has been shown previously, the icosahedral symmetry of the capsid strongly affects the large-scale quasi-periodic segmentation in the related viral genomes (Chechetkin & Lobzin, 2019). The whole ribonucleocapsid structure of coronaviruses likewise remains invariant under translation by one helical turn. Therefore, the putative weakly specific assembly/packaging signals in the genomic RNA of coronaviruses should be coordinated with the parameters of the helical nucleocapsid (such as the helix pitch and the inner and outer diameters), which are established by cryo-electron microscopy (cryo-EM) and other structural methods. In this article, we provide methods for the detection and comparative analysis of assembly/packaging signals in the genomic RNA of the coronaviruses SARS-CoV and SARS-CoV-2 and describe the main results of our study.

2. Theory and methods

The quasi-periodic patterns in genomic DNA/RNA sequences can be efficiently detected with the discrete Fourier transform (DFT). As periodic patterns generate equidistant series of harmonics in the DFT spectra (see, e.g. Chechetkin & Turygin, 1995; Lobzin & Chechetkin, 2000), long enough patterns can be detected by the iteration of DFT, that is, by the discrete double Fourier transform (DDFT) (Chechetkin & Lobzin, 2017, 2019, 2020a, 2020b). Although the correlation functions are the main tools in this article, our approach is based implicitly and explicitly on DFT and DDFT. Therefore, we begin with the definitions of these operations. Below, we follow the methods developed previously (Chechetkin & Lobzin, 2017, 2020a, 2020b; Chechetkin & Turygin, 1994, 1995; Lobzin & Chechetkin, 2000).

2.1. DFT and DDFT: main definitions and relationships

The DFT harmonics corresponding to the nucleotides of type α(A, C, G, T) in a genomic sequence of length M are calculated as

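$$\rho_\alpha(q_n) = M^{-1/2} \sum_{m=1}^{M} \rho_{m,\alpha}\, e^{-i q_n m}, \qquad q_n = 2\pi n/M, \quad n = 0, 1, \ldots, M-1. \tag{1}$$

Here, $\rho_{m,\alpha}$ indicates the position occupied by the nucleotide of type $\alpha$; $\rho_{m,\alpha} = 1$ if the nucleotide of type $\alpha$ occupies the $m$th site and 0 otherwise. The amplitudes of Fourier harmonics (or structure factors) are defined as

$$F_{\alpha\alpha}(q_n) = \rho_\alpha(q_n)\, \rho_\alpha^{*}(q_n), \tag{2}$$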

where the asterisk denotes the complex conjugation. Taking into account the symmetry relationship for the structure factors, the analysis of their spectra can be restricted by the range from n = 1 to

$$N = [M/2], \tag{3}$$

where the brackets denote the integer part of the quotient. The structure factors will always be normalized on the mean spectral values, which are determined by the exact sum rules,

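$$f_{\alpha\alpha}(q_n) = F_{\alpha\alpha}(q_n)/\bar{F}_{\alpha\alpha}; \qquad \bar{F}_{\alpha\alpha} = \frac{N_\alpha (M - N_\alpha)}{M(M-1)}, \tag{4}$$

where $N_\alpha$ is the total number of nucleotides of type $\alpha$ in a sequence of length $M$. Below, we will also use the sums,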

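$$S_{\alpha\beta}(q_n) = f_{\alpha\alpha}(q_n) + f_{\beta\beta}(q_n); \quad \alpha \ne \beta, \tag{5}$$
$$S_{\alpha\beta\gamma}(q_n) = f_{\alpha\alpha}(q_n) + f_{\beta\beta}(q_n) + f_{\gamma\gamma}(q_n); \quad \alpha \ne \beta \ne \gamma, \tag{6}$$
$$S_4(q_n) = f_{AA}(q_n) + f_{CC}(q_n) + f_{GG}(q_n) + f_{TT}(q_n), \tag{7}$$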

which can be applied to the detection of quasi-periodic patterns or motifs composed of the nucleotides of different types. The period p is measured in terms of the number of nucleotides (these units will always be tacitly implied below) and is calculated as,

$$p = M/n. \tag{8}$$
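As an illustration of Equations (1)–(4) and (8), the following minimal Python sketch computes the normalized structure factors for one nucleotide type with NumPy's FFT. It is not the authors' code: the function name `normalized_structure_factors` and the toy sequence are assumptions made here for illustration. The FFT indexes sites from 0 rather than from 1, which changes only the phases of the harmonics and not their amplitudes.

```python
import numpy as np

def normalized_structure_factors(seq, base):
    """Return f_aa(q_n) for n = 1..N with N = floor(M/2), following Eqs. (1)-(4)."""
    M = len(seq)
    # Indicator sequence rho_{m,alpha}: 1 where the chosen base occurs, else 0
    rho = np.array([1.0 if ch == base else 0.0 for ch in seq])
    # Structure factors F(q_n) = |rho(q_n)|^2 with the M^{-1/2} normalization of Eq. (1)
    F = np.abs(np.fft.fft(rho)) ** 2 / M
    N_alpha = rho.sum()
    # Mean spectral value from the exact sum rule, Eq. (4)
    F_mean = N_alpha * (M - N_alpha) / (M * (M - 1))
    N = M // 2
    return F[1:N + 1] / F_mean

# Toy sequence (hypothetical): 'A' occupies every 6th site, M = 120
seq = "ACGTCG" * 20
M = len(seq)
f_AA = normalized_structure_factors(seq, "A")

n = 20                 # harmonic number of interest
period = M / n         # Eq. (8): p = M/n = 6
peak_height = f_AA[n - 1]   # array index n - 1 holds harmonic n
```

For a perfectly periodic toy sequence all of the spectral weight sits in the harmonics that are multiples of n = 20, so the height at n = 20 is far above the mean spectral level of 1.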

The harmonics in DDFT are calculated as

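$$\Phi_\alpha(\tilde{q}_{n'}) = (N-1)^{-1/2} \sum_{n=2}^{N} f_{\alpha\alpha}(q_n)\, e^{-i \tilde{q}_{n'} n}, \qquad \tilde{q}_{n'} = 2\pi n'/(N-1), \quad n' = 0, 1, \ldots, N-2, \tag{9}$$

where $N$ is defined by Equation (3) and $f_{\alpha\alpha}(q_n)$ are the normalized structure factors (see Equation (4)). A similar transform can be used for the sums defined by Equations (5)–(7). The amplitudes of harmonics are given by

$$F_{\alpha\alpha,II}(\tilde{q}_{n'}) = \Phi_\alpha(\tilde{q}_{n'})\, \Phi_\alpha^{*}(\tilde{q}_{n'}). \tag{10}$$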

Similarly to DFT, the analysis of the spectra for the amplitudes defined by Equation (10) can be restricted from n' = 1 to

$$N' = [(N-1)/2]. \tag{11}$$

The DDFT amplitudes are normalized as

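$$f_{II}(\tilde{q}_{n'}) = F_{II}(\tilde{q}_{n'})/\bar{F}_{II}, \tag{12}$$
$$\bar{F}_{II} = \frac{1}{N'} \sum_{n'=1}^{N'} F_{II}(\tilde{q}_{n'}). \tag{13}$$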

Generally, equidistant series in DFT spectra also generate the corresponding equidistant series in DDFT spectra with the spectral numbers $kn'$, $k = 1, \ldots, k_{max}$; $k_{max} n' \le N'$, where $N'$ is defined by Equation (11). The number of quasi-periodic patterns can be assessed by the spectral number $n'$ for the peak amplitude $f_{II}(\tilde{q}_{n'})$ as

$$N_p = (N-1)/n', \tag{14}$$

while their periods in nucleotides are given by

$$p_{II} = M/N_p. \tag{15}$$
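The DDFT step of Equations (9)–(15) can be sketched in the same way. The Python code below is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `ddft_normalized`, the synthetic spectrum and the value M = 402 are hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np

def ddft_normalized(f):
    """Normalized DDFT amplitudes f_II for n' = 1..N', following Eqs. (9)-(13).

    f holds the normalized DFT structure factors f(q_n) for n = 1..N.
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    x = f[1:]                                 # harmonics n = 2..N (N - 1 values)
    # Second Fourier transform; the index offset relative to Eq. (9)
    # alters only the phases, not the amplitudes.
    Phi = np.fft.fft(x) / np.sqrt(N - 1)
    F_II = np.abs(Phi) ** 2                   # Eq. (10)
    N_prime = (N - 1) // 2                    # Eq. (11)
    F_II = F_II[1:N_prime + 1]                # keep n' = 1..N'
    return F_II / F_II.mean()                 # Eqs. (12)-(13)

# Synthetic DFT spectrum (hypothetical): flat background with equidistant
# peaks at n = 20, 40, ..., 200, as a periodic pattern would produce.
N = 201
f = np.ones(N)
f[19::20] += 30.0                             # array index n - 1 holds harmonic n
f_II = ddft_normalized(f)

# First DDFT harmonic that stands clearly above the mean level of 1
n_prime = int(np.flatnonzero(f_II > 5.0)[0]) + 1
N_p = (N - 1) / n_prime                       # Eq. (14): number of patterns
M = 402                                       # hypothetical sequence length with floor(M/2) = N
p_II = M / N_p                                # Eq. (15): period in nucleotides
```

In this toy case the equidistant DFT series with spacing 20 yields DDFT peaks at n' = 10, 20, …, so that N_p = 20 and p_II = 20.1.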

2.2. Correlation functions

The nucleotide correlation functions (NCF) are determined as,

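$$K_{\alpha\alpha}(m_0) = M^{-1} \sum_{m=1}^{M} \rho^{c}_{m,\alpha}\, \rho^{c}_{m+m_0,\alpha}, \qquad m_0 = 0, 1, \ldots, M-1, \tag{16}$$
$$\rho^{c}_{m,\alpha} = \begin{cases} \rho_{m,\alpha}, & \text{if } 1 \le m \le M; \\ \rho_{m-M,\alpha}, & \text{if } M+1 \le m \le 2M-1. \end{cases} \tag{17}$$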

The circular NCFs used in this article are especially suitable for the detection of periodic patterns. Periodic patterns with a period p produce a series of equidistant peaks at the multiple spacings, m0 = kp, k = 1, 2, … The corresponding mean value is given by

$$\bar{K}_{\alpha\alpha} = \frac{1}{M-1} \sum_{m_0=1}^{M-1} K_{\alpha\alpha}(m_0) = \frac{N_\alpha (N_\alpha - 1)}{M(M-1)}. \tag{18}$$

The correlation functions are symmetrical,

$$K_{\alpha\alpha}(m_0) = K_{\alpha\alpha}(M - m_0). \tag{19}$$

This allows us to restrict the analysis of NCF from m0 = 1 to N defined by Equation (3). The normalized deviations,

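$$\kappa_{\alpha\alpha}(m_0) = \left( K_{\alpha\alpha}(m_0) - \bar{K}_{\alpha\alpha} \right) / \langle \Delta K_{\alpha\alpha}^{2} \rangle_{random}^{1/2}, \tag{20}$$

where

$$\langle \Delta K_{\alpha\alpha}^{2} \rangle_{random} = \bar{F}_{\alpha\alpha}^{2}/M, \qquad \bar{F}_{\alpha\alpha} = \frac{N_\alpha (M - N_\alpha)}{M(M-1)}, \tag{21}$$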

are Gaussian for the random sequences. Similarly to the sums defined by Equations (5)–(7), it is useful to introduce the combinations,

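$$Q_{\alpha\beta}(m_0) = \left( \kappa_{\alpha\alpha}(m_0) + \kappa_{\beta\beta}(m_0) \right)/\sqrt{2}; \quad \alpha \ne \beta, \tag{22}$$
$$Q_{\alpha\beta\gamma}(m_0) = \left( \kappa_{\alpha\alpha}(m_0) + \kappa_{\beta\beta}(m_0) + \kappa_{\gamma\gamma}(m_0) \right)/\sqrt{3}; \quad \alpha \ne \beta \ne \gamma, \tag{23}$$
$$Q_4(m_0) = \left( \kappa_{AA}(m_0) + \kappa_{TT}(m_0) + \kappa_{CC}(m_0) + \kappa_{GG}(m_0) \right)/2, \tag{24}$$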

which are also Gaussian for the random sequences.

The correlation functions and the DFT structure factors are not independent and are related by the Wiener–Khinchin relationship,

$$K_{\alpha\alpha}(m_0) = M^{-1} \sum_{n=0}^{M-1} F_{\alpha\alpha}(q_n) \exp(-i q_n m_0). \tag{25}$$

The normalized deviations for NCF can be expressed as,

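$$\kappa_{\alpha\alpha}(m_0) \approx \Delta k_{\alpha\alpha}(m_0)/(1/M)^{1/2}, \tag{26}$$
$$\Delta k_{\alpha\alpha}(m_0) = \left( K_{\alpha\alpha}(m_0) - \bar{K}_{\alpha\alpha} \right)/\bar{F}_{\alpha\alpha} = M^{-1} \sum_{n=1}^{M-1} \left( f_{\alpha\alpha}(q_n) - 1 \right) e^{-i q_n m_0}. \tag{27}$$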

These deviations are insensitive to the nucleotide composition and genome length but may strongly depend on the dominating underlying periodicities in the genomic sequences. In viral genomes this is the triplet periodicity p = 3 inherent to the protein-coding regions (for a review and further references see, e.g. Lobzin & Chechetkin, 2000; Marhon & Kremer, 2011). The relationship defined by Equation (27) makes it possible to control the contribution of underlying periodicities to the normalized NCF deviations by cutting off the dominating peaks and renormalizing the DFT spectra. Such a procedure can be used to detect weaker, longer periodicities against a background of strong short periodicities.
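The definitions of this subsection can be illustrated with a short Python sketch that computes the circular NCF through the Wiener–Khinchin relationship and then forms the normalized deviations of Equation (20) and the sum of Equation (24). The function names and the toy sequence are assumptions introduced here for illustration; the sketch omits the cut-off of dominating peaks described above.

```python
import numpy as np

def ncf_normalized_deviations(seq, base):
    """kappa_aa(m0) for m0 = 1..N with N = floor(M/2), following Eqs. (16)-(21)."""
    M = len(seq)
    rho = np.array([1.0 if ch == base else 0.0 for ch in seq])
    N_a = rho.sum()
    # Circular autocorrelation via the Wiener-Khinchin relationship, Eq. (25):
    # the inverse FFT of |FFT(rho)|^2 gives sum_m rho_m * rho_{m+m0}.
    K = np.fft.ifft(np.abs(np.fft.fft(rho)) ** 2).real / M
    K_mean = N_a * (N_a - 1) / (M * (M - 1))      # Eq. (18)
    F_mean = N_a * (M - N_a) / (M * (M - 1))      # Eq. (21)
    sigma = F_mean / np.sqrt(M)                   # square root of Eq. (21)
    return (K[1:M // 2 + 1] - K_mean) / sigma     # Eq. (20)

def q4(seq):
    """Sum over the four nucleotide types divided by 2, as in Eq. (24)."""
    return sum(ncf_normalized_deviations(seq, b) for b in "ACGT") / 2.0

# Toy sequence (hypothetical): an exact repeat with period 9, M = 108
seq = "GATTACACT" * 12
kappa_G = ncf_normalized_deviations(seq, "G")
Q4 = q4(seq)
# Array index m0 - 1 holds spacing m0, so kappa_G[8] is the deviation at m0 = 9.
```

For an exact repeat the deviation at the spacing equal to the period reaches sqrt(M) for each nucleotide type, which is the upper bound set by the normalization; real genomic signals are far weaker and are compared with the Gaussian threshold.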

2.3. Statistical criteria

Throughout this article, we will use the standard statistical criteria corresponding to the significance level Pr = 0.05. For random sequences, the DFT and DDFT normalized harmonics defined by Equations (4) and (12) follow Rayleigh statistics, whereas the normalized deviations defined by Equations (20) and (22)–(24) follow Gaussian statistics. To study the distribution of periodic patterns over the genome, we will use a set of nonoverlapping windows of width w. Averaging the DFT spectra over the windows provides the corresponding periodogram, while averaging the normalized deviations for NCF over the windows provides the corresponding correlogram (see, e.g. Marple, 1987). Averaging over windows diminishes the effects of indels on the periodicity phasing.

2.4. Reconstruction of motifs related to quasi-periodic patterns

The motifs related to quasi-periodic patterns are presumably the most important for practical applications. For their reconstruction, we developed a method of transitional automorphic mapping of the genome on itself (TAMGI). The algorithm for TAMGI is as follows. Let a step length s be chosen (equal to the detected period of the periodic patterns in the problem concerned). Then, the pairs of nucleotides ($N_m$, $N_{m+s}$) separated by the step s are compared with each other while moving site by site along the genomic sequence. If both nucleotides are of the same type, both are retained in the genomic sequence; otherwise, the nucleotide $N_m$ is replaced by a void (traditionally denoted by a hyphen). Thus, the nucleotide $N_m$ is retained if it has at least one neighbor $N_{m-s}$ or $N_{m+s}$ of the same type and is replaced by a void otherwise. The resulting sequence after TAMGI is composed of the nucleotides of four types (A, C, G, T) and the hyphens ‘-’ denoting voids. Further analysis reduces to the enumeration of all complete words of length k (k-mers) composed only of nucleotides (voids within the complete words are prohibited) and flanked by voids ‘-’ at the 5′- and 3′-ends, -$N_k$-. By definition, the complete words are nonoverlapping. At the next stage, the mismatches to the complete words can be studied. If the presence of periodic patterns is established, e.g. by DFT or DDFT, TAMGI with the step s equal to the corresponding period p provides a sequence enriched in the periodic patterns. Thus, the sequence after TAMGI contains the most frequent motifs related to quasi-periodic patterns and provides their distribution over the genome. As this sequence also contains a quasi-random fraction, the latter can be partially filtered out by combining TAMGI with the steps s and 2s. The TAMGI method is robust with respect to indels but may depend on the nucleotide content and underlying short periodicities.

Generally, TAMGI may also be extended to noninteger steps s by the best integer approximation of the transitional mapping with noninteger s. The latter can be obtained using a set of chains $(N_1, N_{1+\{s\}}, \ldots, N_{1+\{k_{max}s\}})$, $(N_2, N_{2+\{s\}}, \ldots, N_{2+\{k_{max}s\}})$, ..., $(N_{\{s\}}, N_{\{2s\}}, \ldots, N_{\{(k_{max}+1)s\}})$, where $\{ks\}$ means rounding to the nearest integer and $\{(k_{max}+1)s\} < M$. The choice of consecutive pairs in the chains is performed by an algorithm similar to that described above.
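For an integer step, the TAMGI procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `tamgi` and `complete_words` are hypothetical, the sequence is assumed to consist of uppercase A, C, G and T, and the sequence ends are treated as linear (a site near an end simply lacks the neighbor that would fall outside the sequence), which is an assumption not specified in the text.

```python
import re
from collections import Counter

def tamgi(seq, s):
    """Keep a nucleotide if the site s upstream or s downstream holds the same
    type; otherwise replace it by a void '-'."""
    M = len(seq)
    out = []
    for m, nt in enumerate(seq):
        left = m - s >= 0 and seq[m - s] == nt
        right = m + s < M and seq[m + s] == nt
        out.append(nt if (left or right) else "-")
    return "".join(out)

def complete_words(mapped):
    """Runs of nucleotides flanked by voids on both sides (-N_k-)."""
    return re.findall(r"(?<=-)[ACGT]+(?=-)", mapped)

# Toy sequence (hypothetical) with a partly conserved 4-nt repeat
seq = "ACGTACGAACTT"
mapped = tamgi(seq, 4)                 # "ACG-ACG-AC--"
words = complete_words(mapped)         # ["ACG", "AC"]
length_counts = Counter(len(w) for w in words)
```

The leading run "ACG" is not counted because it is not flanked by a void at its 5′-end; whether such end runs should be counted is a design choice of this sketch.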

2.5. Spectral entropy

The abundance of quasi-periodic patterns in the genomic DNA/RNA sequences can be assessed by the spectral entropy (Balakirev et al., 2003, 2005, 2014; Chechetkin, 2011; Chechetkin & Lobzin, 1996; Chechetkin & Turygin, 1994). The spectral entropy is defined as,

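$$S_\alpha = -\sum_{n=1}^{N} f_{\alpha\alpha}(q_n) \ln f_{\alpha\alpha}(q_n); \qquad S_{total} = \sum_\alpha S_\alpha. \tag{28}$$

Its mean value,

$$\langle S_\alpha \rangle_{random} = -(1 - C) N, \tag{29}$$

where $C$ is the Euler constant and $(1 - C) = 0.422785\ldots$, attains an approximate maximum for random sequences. The corresponding variance for the spectral entropy is given by

$$\sigma^{2}(S_\alpha)_{random} = 0.289868\ldots N. \tag{30}$$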

The abundance of quasi-periodic patterns in the genomes of different lengths can be assessed by the relative spectral entropies,

$$S_{\alpha,rel} = S_\alpha / |\langle S_\alpha \rangle_{random}|. \tag{31}$$

The relative spectral entropy serves also for the assessment of the load from point mutations and indels on the genomes or on the particular genes and pseudogenes (Balakirev et al., 2003, 2005, 2014).
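A minimal Python sketch of the spectral entropy and its relative form is given below. It assumes the sign convention in which the entropy is negative for random sequences (consistent with the use of the absolute value in Equation (31)) and models a random spectrum by exponentially distributed normalized amplitudes with unit mean; both the function names and the synthetic spectra are assumptions for illustration only.

```python
import numpy as np

def spectral_entropy(f):
    """S = -sum f ln f over the normalized harmonics, as in Eq. (28)."""
    f = np.asarray(f, dtype=float)
    f = f[f > 0.0]                      # f ln f tends to 0 as f -> 0
    return float(-np.sum(f * np.log(f)))

def relative_spectral_entropy(f):
    """S divided by |<S>_random| = (1 - C) N, as in Eqs. (29) and (31)."""
    N = len(f)
    return spectral_entropy(f) / ((1.0 - np.euler_gamma) * N)

# Random-like spectrum (assumption: exponential amplitudes, renormalized to mean 1)
rng = np.random.default_rng(0)
f_random = rng.exponential(1.0, size=20000)
f_random /= f_random.mean()
S_rel_random = relative_spectral_entropy(f_random)    # expected to be close to -1

# Spectrum with pronounced equidistant peaks (hypothetical), also with mean 1
f_peaky = np.full(20000, 0.5)
f_peaky[::100] = 50.5
S_rel_peaky = relative_spectral_entropy(f_peaky)      # markedly below -1
```

A flat spectrum gives zero entropy, a random-like spectrum gives a relative entropy near −1, and a spectrum dominated by a few strong harmonics gives a more negative value, in line with the interpretation used in the Discussion.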

3. Results

3.1. Nucleocapsid structure and packaging of genomic ssRNA

Early studies based on electron microscopy have revealed that the ribonucleocapsid of coronaviruses is helical, consisting of coils of 9–16 nm in diameter and a hollow interior of approximately 3–4 nm (Macnaughton et al., 1978). Chang et al. (2014) asserted that for the SARS-CoV nucleocapsid an outer diameter of 16 nm and an inner diameter of 4 nm are consistent with cryo-EM observations. The length of a helical turn per pitch is

$$l_t = \left( 1 + (\pi d/h)^{2} \right)^{1/2} h, \tag{32}$$

where d is the diameter of the helix and h is the pitch. According to Chen et al. (2007), the pitch for the SARS-CoV nucleocapsid is h = 14 nm. Taking the distance between RNA bases as 0.34 nm, the positioning of RNA near the inner diameter of the nucleocapsid gives a length of the RNA turn of about 54–56 nt, the positioning of RNA in the middle between the inner and outer diameters would give a length of turn of about 84–87 nt, whereas the positioning of RNA at the outer diameter would give a length of turn of about 153–154 nt. Chang et al. (2009) found multiple (at least three) nucleic acid binding sites in N proteins. Therefore, the intermediate dynamic positioning of RNA in the middle during assembly/packaging cannot be excluded. At the final stage of packaging, ssRNA is assumed to be positioned at the inner diameter of the nucleocapsid in accordance with cryo-EM observations (Chang et al., 2014). We performed the complete combined analysis of the SARS-CoV and SARS-CoV-2 genomes based on DFT, DDFT, NCF and pattern correlation functions (Chechetkin & Lobzin, 2020b) and screened the whole range of putative periods from the shortest period of 2 nt to the large-scale periods comparable to the whole genome lengths. The most interesting results related to the ribonucleocapsid assembly/packaging are presented below.
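The estimates for the inner and outer positions follow directly from Equation (32) and can be checked with the short Python sketch below. The function name `turn_length_nt` is hypothetical; the numerical inputs (inner diameter 4 nm, outer diameter 16 nm, pitch 14 nm, 0.34 nm per base) are those quoted in the text.

```python
import math

def turn_length_nt(d_nm, h_nm, rise_nm=0.34):
    """Length of one helical turn, Eq. (32), converted to nucleotides."""
    l_t = math.sqrt(1.0 + (math.pi * d_nm / h_nm) ** 2) * h_nm
    return l_t / rise_nm

inner = turn_length_nt(4.0, 14.0)    # RNA at the inner diameter
outer = turn_length_nt(16.0, 14.0)   # RNA at the outer diameter
print(f"inner: {inner:.1f} nt, outer: {outer:.1f} nt")
```

The inner value falls within the quoted 54–56 nt and the outer value within 153–154 nt.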

3.2. Overview of NCF and characteristic patterns

We took for analysis one genomic sequence for SARS-CoV (GenBank accession: NC_004718; M = 29,751; NA = 8481, NG = 6187, NT = 9143, NC = 5940) as a reference and the genomic sequences for three isolates of SARS-CoV-2 (GenBank accessions: MT371038; M = 29,719; NA = 8873, NG = 5834, NT = 9554, NC = 5458; MT295464; M = 29,892; NA = 8948, NG = 5862, NT = 9592, NC = 5490; MT371037; M = 29,694; NA = 8866, NG = 5829, NT = 9544, NC = 5455) to assess the impact of point mutations and indels on the detected patterns. Henceforth, the viruses will be denoted by their accessions. Taking into account the translational invariance of a helix, the main results will be given for NCF. The presence of periodic components in NCF was confirmed by combining DFT and DDFT. The general overviews of the plots for the normalized NCF deviations defined by Equation (20) are shown in Figures 1 and 2. The overview for MT371038 is closer to that shown in Figure 1, while the corresponding plots for MT295464 are similar to those shown in Figure 2.

Figure 1.

The plots for the normalized NCF deviations defined by Equations (26) and (27). The initial ranges of plots shown in the inserts were re-calculated by replacing the highest Fourier harmonics by the peaks defined by extreme value statistics in the DFT spectra. The characteristic spacings m0 are explicitly marked by the arrows. The horizontal lines correspond to the significance Pr = 0.05 for the reshuffled random sequences. The panels A–D correspond to the nucleotides of particular types in the genome of SARS-CoV (accession NC_004718).

Figure 2.

The plots for the normalized NCF deviations defined by Equations (26) and (27). The initial ranges of plots shown in the inserts were re-calculated by replacing the highest Fourier harmonics by the peaks defined by extreme value statistics in the DFT spectra. The characteristic spacings m0 are explicitly marked by the arrows. The horizontal lines correspond to the significance Pr = 0.05 for the reshuffled random sequences. The panels A–D correspond to the nucleotides of particular types in the genome of SARS-CoV-2 (accession MT371037).

Then, all plots for NCF were recalculated using Equations (26) and (27) and replacing all the highest harmonics in the DFT spectra by the peaks assessed by extreme value statistics (cf. Chechetkin & Lobzin, 2019). The initial ranges of the recalculated plots are shown in the inserts to Figures 1 and 2. The deviations corresponding to the most pronounced patterns are shown explicitly by arrows. Such patterns are quasi-periodic because the corresponding approximately equidistant series can be traced in these plots (the subsequent peaks are shown only for the most pronounced patterns with periodicity p = 54).

3.3. Correlograms and periodograms for the genomes of SARS-CoV and SARS-CoV-2

For further analysis and as a cross-check of the above results, the NCF and DFT spectra were computed for the set of nonoverlapping windows of width 432 nt. The 3′-end windows #69 were incomplete for the genomes of NC_004718, MT371038 and MT371037. The characteristics used in our analysis are robust with respect to the length of the window. The normalized deviations for NCF were calculated using Equations (26) and (27) and replacing the peaks corresponding to the triplet periodicity p = 3 by the heights corresponding to Pr = 0.05 in the Rayleigh spectra. A similar cut-off was used after the calculations of the DFT spectra within windows. The correlograms obtained by averaging the plots of normalized NCF deviations for the sums defined by Equation (24) are shown in Figure 3. The significance threshold of Pr = 0.05 for the correlograms corresponds to $\pm 1.96/N_w^{1/2}$, where $N_w$ is the total number of windows. In all genomes the deviations for m0 = 54 were the highest and the deviations with m0 = 108 were significant as well. For SARS-CoV the deviations with m0 = 216 (= 4 × 54) were also significant. The next characteristic high deviations for SARS-CoV were for m0 = 87, while in the isolates of SARS-CoV-2 they were for m0 = 84.
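The window layout and the correlogram threshold quoted above can be reproduced with a few lines of Python. The helper `window_bounds` is a hypothetical name introduced here, and the sketch assumes that windows are counted from site 1 and that the last, incomplete window is kept, as stated for NC_004718.

```python
import math

def window_bounds(M, w=432):
    """1-based inclusive (start, end) sites of nonoverlapping windows of width w;
    the last window may be incomplete."""
    return [(start, min(start + w - 1, M)) for start in range(1, M + 1, w)]

bounds = window_bounds(29751)          # genome length of NC_004718
N_w = len(bounds)                      # total number of windows
last_width = bounds[-1][1] - bounds[-1][0] + 1
threshold = 1.96 / math.sqrt(N_w)      # two-sided Pr = 0.05 for the window average
print(N_w, last_width, round(threshold, 3))
```

With these conventions the genome of NC_004718 is covered by 69 windows, the last of which spans 375 nt, and window #3 covers sites 865–1296.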

Figure 3.

The correlograms obtained by averaging of the normalized NCF deviations defined by Equations (24), (26) and (27) calculated within nonoverlapping windows of width 432 nt for the genomes of NC_004718 (A), MT371038 (B), MT295464 (C) and MT371037 (D). The horizontal lines correspond to the significance Pr = 0.05.

The corresponding periodograms obtained by averaging the DFT spectra over windows were then re-computed by DDFT (Chechetkin & Lobzin, 2020a). Due to the restrictions related to the applicability of DDFT, the left boundary in the DDFT spectra is positioned at n' = 10. Then, the DDFT spectra were renormalized in this range. The resulting DDFT spectra for the sums defined by Equation (7) are shown in Figure 4. Again, the harmonic with n' = 27, p' = 54.2 was reproducibly significant and the highest in the range under study for all genomes. For SARS-CoV the harmonic with n' = 43, p' = 86.4 was also significant, whereas for the isolates of SARS-CoV-2 the harmonic with n' = 42, p' = 84.4 appeared to be insignificant. The harmonic with n' = 57, p' = 114.5 for SARS-CoV can be treated as a distorted and modified doubled period p = 54 (as is typical of hidden fuzzy repeating patterns). Thus, combining correlograms for NCF with the analysis of periodograms by DDFT clearly reveals the persistently reproducible quasi-periodic patterns with the period p ≈ 54 in all genomes and indicates the relevance of the less robust patterns with p ≈ 84 and 87.

Figure 4.

The DDFT spectra of the periodograms obtained by averaging of the DFT spectra defined by Equations (4) and (7) calculated within nonoverlapping windows of width 432 nt for the genomes of NC_004718 (A), MT371038 (B), MT295464 (C) and MT371037 (D). The horizontal lines correspond to the significance Pr = 0.05.

3.4. Distribution over the genomes for deviations of NCF components putatively related to ribonucleocapsid assembly/packaging signals

To assess the distribution of the detected patterns over the genomes, the normalized deviations for NCF were computed in separate windows of width 432 nt as described above. The NCF spacings m0 were chosen to correspond to the periods of the detected patterns and were equal to 54, 84 and 87, respectively. The resulting plots for the sums defined by Equation (24) are shown in Figure 5. The numerical data for the profiles in Figure 5 and for the profiles corresponding to the nucleotides of particular types as well as to the sums defined by Equations (22) and (23) are collected in Supporting Information S1. We assessed the correlations between different profiles by the Pearson correlation coefficients. The NCF profiles for the different genomes were significantly correlated for the same spacings m0, while the profiles with different spacings can be considered uncorrelated. The coefficients for correlations between profiles for SARS-CoV and the three isolates of SARS-CoV-2 at m0 = 54 were 0.623, 0.491 and 0.636 (Pr < 2 × 10^−5 for 69 components). The related coefficients for the correlations MT371038–MT295464, MT371038–MT371037 and MT295464–MT371037 were 0.817, 0.954 and 0.751. Similar but slightly lower values were obtained for the correlations at the two other spacings.

Figure 5.

The profiles of the normalized NCF deviations defined by Equations (24), (26) and (27) for the components with m0 = 54 (A), 84 (B) and 87 (C) calculated within nonoverlapping windows of width 432 nt for the genomes of NC_004718, MT371038, MT295464 and MT371037. The horizontal lines correspond to the significance Pr = 0.05.

We suppose that the motifs detected at the different spacings m0 are related to the different stages of ribonucleocapsid assembly/packaging. Such motifs can be incorporated into the genomic sequence by silent mutations due to the degeneracy of the genetic code. A regular close positioning of different assembly/packaging motifs would be too restrictive, because the main function of the genomic sequence is coding for proteins. Therefore, the windows enriched simultaneously in motifs of different types are especially interesting, as are the windows enriched in or depleted of motifs of the same type. Despite the evolutionary divergence between the two viruses and the action of point mutations and indels, some features appear to be remarkably reproducible in all genomes. In particular, in the window #3 (sites 865–1296) the normalized NCF deviations exceeded the significance threshold Pr = 0.05 for all genomes at m0 = 54. Similar but stronger effects were observed for the window #5 (1729–2160); in the latter case for SARS-CoV, this window was also enriched in the motifs with m0 = 87. The window #29 (12097–12528) was enriched in the motifs with m0 = 54; additionally, for all isolates of SARS-CoV-2 this window was enriched in the motifs with m0 = 87. An opposite example with depletion of motifs associated with m0 = 87 can be seen in the window #34 (14257–14688). These profiles may explain why the mean deviation with m0 = 84 exceeds the deviation with m0 = 87 in the genomes of SARS-CoV-2. In the latter case, despite significant enrichment in the motif with m0 = 87 in some of the windows, there are also windows with significant depletion of this motif.

Similar profiles were also obtained for DDFT harmonics with the spectral numbers n' = 27, 42 and 43. DDFT spectra in windows of width 432 nt were computed for the sum defined by Equation (7) as described above. The related profiles can be found in Supporting Information S2. The counterpart profiles for the normalized NCF deviations and DDFT harmonics appear to be significantly correlated in the same genomes. Therefore, the characteristic features in both sets of profiles were approximately reproducible. In addition to these features, an extremely high peak for the DDFT harmonic with n' = 27, p' = 54.2 was observed in the window #60 (25489–25920) in the genome of SARS-CoV.

3.5. Distributions and repertoires of motifs obtained by TAMGI

Reconstructed motifs and their positions on the genomes were obtained by TAMGI with the steps s = 54, 84 and 87. The resulting sequences after TAMGI are explicitly reproduced in Supporting Information S3–S6. The data on the total fractions of nucleotides after TAMGI are summarized in Table 1. A simple theoretical consideration shows that the partial fractions of nucleotides after TAMGI for the randomly reshuffled genomic sequences are given by

$$\Phi_\alpha = \phi_\alpha^{2} (2 - \phi_\alpha); \qquad \Phi_{total} = \sum_\alpha \Phi_\alpha, \tag{33}$$

where ϕα is the frequency of nucleotides of the type α retained under reshuffling. Equation (33) was additionally verified by simulations. The frequencies given by Equation (33) are independent of steps and also are reproduced in Table 1 for reference. The variances of frequencies related to particular random realizations are about

$$\sigma^{2}(\Phi_\alpha) = \Phi_\alpha (1 - \Phi_\alpha)/M; \qquad \sigma_{total}^{2} = \sum_\alpha \sigma^{2}(\Phi_\alpha). \tag{34}$$

Table 1.

The total frequencies of nucleotides after TAMGI with different steps in the genomes of SARS-CoV and SARS-CoV-2.

                     Genome accession
           NC_004718   MT371038   MT295464   MT371037
Random     0.448       0.456      0.456      0.456
s = 54     0.481       0.490      0.489      0.490
s = 84     0.464       0.483      0.483      0.483
s = 87     0.476       0.477      0.477      0.477

The total frequencies of nucleotides after TAMGI for the randomly reshuffled genomic sequences were calculated by Equation (33).

Equation (34) yields a value of about 0.004 for σtotal, which is much lower than the differences between the frequencies for the viral and random sequences. In this sense, Table 1 reveals the distinctly nonrandom character of the variations related to the detected quasi-periodic patterns in the viral genomes. The mutual comparison of the total frequencies of nucleotides after TAMGI for the different isolates of SARS-CoV-2 shows their robustness against point mutations and indels.
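The random-sequence reference values can be checked directly from the nucleotide counts quoted in Subsection 3.2. The short Python sketch below evaluates Equations (33) and (34) for NC_004718; the variable names are chosen here for illustration.

```python
import math

# Nucleotide counts for NC_004718 as quoted in Subsection 3.2
counts = {"A": 8481, "G": 6187, "T": 9143, "C": 5940}
M = sum(counts.values())                       # 29,751

phi = {b: n / M for b, n in counts.items()}    # nucleotide frequencies
# Eq. (33): a site of type alpha is retained if at least one of its two
# neighbors at distance s has the same type, i.e. phi * (1 - (1 - phi)^2).
Phi = {b: p * p * (2.0 - p) for b, p in phi.items()}
Phi_total = sum(Phi.values())

# Eq. (34): variance of the retained fractions for random realizations
sigma_total = math.sqrt(sum(P * (1.0 - P) / M for P in Phi.values()))
print(round(Phi_total, 3), round(sigma_total, 4))
```

The result agrees with the entry 0.448 in the row ‘Random’ of Table 1 and with the quoted σtotal of about 0.004, so the observed value 0.481 for s = 54 lies roughly nine standard deviations above the random expectation.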

The general distributions of k-mers, -Nk-, over the length k are presented in Table 2. The period of p ≈ 54 implies the association of 6.75 nt per one N protein (see Subsection 4.2 below). All motifs with k ≥ 6 and their positions on the genomes are enumerated in Supporting Information S7. The profiles for the total numbers of nucleotides within nonoverlapping windows of width 432 nt after TAMGI with the steps s = 54, 84 and 87 are shown in Figure 6. For the incomplete windows #69 these numbers were increased proportionally to obtain estimates for the width of 432 nt. The profiles in Figures 5 and 6 are close but differ in some features. The corresponding Pearson correlation coefficients between the counterpart profiles in Figures 5 and 6 were highly significant, 0.72–0.86. Nevertheless, the highest peaks and the lowest troughs may differ between the counterpart profiles. In particular, the highest peak in Figure 6(A) was observed for the window #8 (sites 3025–3456). The profiles for s = 54 and 87 were slightly biased from the higher values at the 5′-end to the lower values at the 3′-end, although the first windows #1 (1–432), comprising the 5′-UTR, were depleted of motifs. The numerical values for all profiles in Figure 6 can be found in Supporting Information S7.

Table 2.

The occurrences of k-mers, -Nk-, in the genomes of SARS-CoV and SARS-CoV-2 after TAMGI with steps s = 54, 84 and 87.

                     Genome accession
k      NC_004718   MT371038   MT295464   MT371037

s = 54
1      3618        3617       3633       3609
2      1834        1823       1847       1828
3      862         898        901        901
4      458         448        449        446
5      218         226        227        226
6      106         128        125        126
7      51          51         52         52
8      28          23         23         23
9      16          21         20         20
10     5           10         10         10
11     4           2          2          2
12     5           2          2          2
13     1           –          –          –
14     –           1          1          1
17     –           1          1          1

s = 84
1      3844        3790       3808       3787
2      1782        1807       1819       1802
3      870         899        903        901
4      368         417        422        416
5      211         197        198        197
6      93          108        108        108
7      49          50         51         51
8      20          35         35         35
9      12          13         12         12
10     4           10         10         10
11     2           7          7          7
12     1           1          1          1
13     –           1          1          1
15     2           –          –          –

s = 87
1      3667        3726       3752       3732
2      1855        1798       1816       1793
3      941         905        908        902
4      405         417        420        415
5      197         196        194        195
6      102         113        116        115
7      52          44         44         44
8      23          33         31         32
9      8           9          10         9
10     6           9          9          9
11     3           5          5          5
12     2           1          1          1
13     –           1          1          1

Figure 6.

The profiles of the total numbers of nucleotides after TAMGI with steps s = 54 (A), 84 (B) and 87 (C) within nonoverlapping windows of width 432 nt for the genomes of NC_004718, MT371038, MT295464 and MT371037.

The comparison of the repertoires of motifs with k ≥ 6 presented in Supporting Information S7 revealed nearly complete correspondence (up to one or two motifs) between the motifs for the three isolates of SARS-CoV-2. The divergence between the motifs for SARS-CoV and SARS-CoV-2 appeared to be more significant. In particular, at the step s = 54, only 22 of the 102 different hexamer motifs (106 in total) in the SARS-CoV genome coincided with those for SARS-CoV-2, and 36 hexamers differed by one letter from the repertoires of hexamers for SARS-CoV-2. A similar comparison for the other steps yielded the coincidence of 18 of 89 different motifs (93 in total) and 38 motifs differing by one letter at s = 84, and the coincidence of 15 of 93 different motifs (102 in total) and 37 motifs differing by one letter at s = 87. This means that the repertoires of relatively long motifs are robust to point mutations and indels within each coronavirus but diverge (and in this sense are specific enough) between the two viruses despite the conservation of the main helical periodicity p ≈ 54 nt. The relationships between the motifs found for assembly/packaging and the other cis-acting elements (Madhugiri et al., 2016) should be established separately. Our study showed that, in fact, any cis-acting element should include a contextual vicinity of several tens of nucleotides upstream and downstream of the element.
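A comparison of two motif repertoires of the kind described above can be sketched as follows. The Python code counts the distinct motifs of one repertoire that occur exactly in the other and those that differ from some motif of the other by exactly one letter. The function names and the toy motif lists are hypothetical, and treating the one-letter class as excluding exact coincidences is an assumption of this sketch rather than a statement about how the reported counts were obtained.

```python
def hamming(a, b):
    """Number of mismatching positions between two words of equal length."""
    return sum(x != y for x, y in zip(a, b))

def compare_repertoires(motifs_a, motifs_b):
    """Return (exact, one_off): distinct motifs of A found in B, and distinct
    motifs of A not found in B that differ from some motif of B by one letter."""
    set_a, set_b = set(motifs_a), set(motifs_b)
    exact = set_a & set_b
    one_off = {
        m for m in set_a - exact
        if any(len(m) == len(n) and hamming(m, n) == 1 for n in set_b)
    }
    return len(exact), len(one_off)

# Toy repertoires of hexamers (hypothetical, not taken from the genomes)
repertoire_a = ["AACGTT", "GGGCCC", "TTTAAA", "AACGTT"]
repertoire_b = ["AACGTT", "GGGCCA", "ACACAC"]
n_exact, n_one_off = compare_repertoires(repertoire_a, repertoire_b)
```

In the toy example one motif coincides exactly (AACGTT) and one differs by a single letter (GGGCCC versus GGGCCA); repeated occurrences of the same motif are counted once.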

The occurrences of the motifs determined by TAMGI can be compared with their counterparts in the whole genome. The statistical significance of such motifs in the whole genome can be assessed against their occurrences in sequences obtained by random reshuffling of the genome. Instead of modeling with genome reshuffling, the rigorous theory by Zubkov and Mikhailov (1974) and Karlin and Altschul (1990) can be used for the assessment of motif occurrences (see also Boeva et al., 2006; Suvorova et al., 2014).
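The reshuffling approach mentioned above can be sketched as follows. This is only an illustration under simple assumptions: the genome is shuffled at the level of single nucleotides (which preserves only the base composition), overlapping occurrences are counted, and the empirical p-value is estimated with the usual add-one correction. The function names, the number of shuffles and the toy sequence are hypothetical choices, not the procedure of the article.

```python
import random


def count_occurrences(seq: str, motif: str) -> int:
    """Count occurrences of `motif` in `seq`, overlapping ones included."""
    count, start = 0, seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def shuffle_p_value(seq: str, motif: str, n_shuffles: int = 200, seed: int = 0):
    """Return (observed count, empirical p-value) from mononucleotide shuffling.

    The p-value is the fraction of shuffled sequences containing at least
    as many occurrences as the original one (with add-one correction).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = count_occurrences(seq, motif)
    letters = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if count_occurrences("".join(letters), motif) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)


# Toy sequence with a deliberately enriched hexamer (illustrative only).
toy_seq = "ATTATA" * 5 + "GC" * 20
observed, p_value = shuffle_p_value(toy_seq, "ATTATA")
print(observed, p_value)
```

For real genomes a shuffle that preserves di- or trinucleotide composition may be a more appropriate null model; the single-nucleotide shuffle is used here only for brevity.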

4. Discussion

4.1. Comparison of abundance of quasi-periodic patterns in the SARS-CoV and SARS-CoV-2 genomes

Short tandem repeats in human genomes are widely used in medical diagnostics and forensics (see, e.g. Baine & Hui, 2019; Butler, 2011; Grover & Sharma, 2016; Kayser, 2017; Sznajder & Swanson, 2019; and references therein). Similar patterns were also found in some prokaryotic genomes (Subirana & Messeguer, 2019). Quasi-repeating patterns in viral genomes are commonly present in a hidden form against the background of frequent random point mutations and indels. Nevertheless, many quasi-repeating patterns remain persistent and robust and contain important information about the molecular mechanisms of the virus life cycle, including genome packaging. Such patterns can be detected and quantified by DFT, DDFT, NCF and other methods. Surprisingly, the quasi-repeating patterns in viral genomes are usually completely ignored when discussing evolutionary and subtyping problems in virology (see, e.g. Andersen et al., 2020; Cagliani et al., 2020; Forster et al., 2020; MacLean et al., 2020; Tang et al., 2020).

The general abundance of quasi-periodic patterns in viral genomes can be conveniently assessed by the relative spectral entropy (Subsection 2.5). The more negative the spectral entropy, the higher the abundance of quasi-periodic patterns in the genome. The relevant data for the genomes of SARS-CoV and three isolates of SARS-CoV-2 are summarized in Table 3. For the significance Pr = 0.05, the absolute difference between the total spectral entropies should exceed the threshold 1.96·√2·σ(Stotal,rel) ≈ 0.055. This condition is fulfilled for all three differences between Stotal,rel for SARS-CoV and the isolates of SARS-CoV-2, whereas the mutual differences between Stotal,rel for the isolates of SARS-CoV-2 are smaller, in accordance with the evolutionary divergence of SARS-CoV and SARS-CoV-2. The values of Stotal,rel in Table 3 reveal the higher abundance of periodic patterns in the SARS-CoV-2 genomes in comparison with the SARS-CoV genome. It can also be said that, over the age of the virus, the load from point mutations and indels on the genome of SARS-CoV was higher than the load on the genome of SARS-CoV-2. Within such an interpretation, SARS-CoV-2 may be treated as a ‘newborn’ virus.

Table 3.

The relative spectral entropies (see Equation (31)) characterizing the abundance of quasi-periodic patterns in the viral genomes.

Accession SA,rel SG,rel ST,rel SC,rel Stotal,rel
NC_004718 –1.017 –1.233 –1.200 –1.008 –4.459
MT371038 –1.001 –1.267 –1.220 –1.029 –4.517
MT295464 –1.008 –1.251 –1.231 –1.046 –4.536
MT371037 –1.007 –1.271 –1.222 –1.042 –4.542

The standard deviations for the relative spectral entropies Sα,rel in random sequences of the same lengths are about 0.010. The standard deviation for Stotal,rel is twice this value.
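The significance check for the differences between the total entropies in Table 3 can be reproduced with a short calculation. The sketch below assumes that the threshold is the two-sided 5% Gaussian quantile (1.96) times the standard deviation of a difference of two independent values, √2·σ(Stotal,rel), with σ(Stotal,rel) = 2 × 0.010; this reading reproduces the quoted value of about 0.055. The variable and function names are illustrative.

```python
import math

sigma_component = 0.010             # std of each S_alpha,rel (table footnote)
sigma_total = 2 * sigma_component   # std of S_total,rel (table footnote)
# Threshold for the difference of two independent S_total,rel values
# at Pr = 0.05 (assumed form: 1.96 * sqrt(2) * sigma_total).
threshold = 1.96 * math.sqrt(2) * sigma_total

# Total relative spectral entropies from Table 3.
s_total = {
    "NC_004718": -4.459,  # SARS-CoV
    "MT371038": -4.517,   # SARS-CoV-2 isolates
    "MT295464": -4.536,
    "MT371037": -4.542,
}


def significant(a: str, b: str) -> bool:
    """True if the entropies of genomes a and b differ beyond the threshold."""
    return abs(s_total[a] - s_total[b]) > threshold


print(f"threshold = {threshold:.3f}")  # threshold = 0.055
isolates = ["MT371038", "MT295464", "MT371037"]
for iso in isolates:
    print("NC_004718 vs", iso, significant("NC_004718", iso))
for i, a in enumerate(isolates):
    for b in isolates[i + 1:]:
        print(a, "vs", b, significant(a, b))
```

With the tabulated values, all three SARS-CoV versus SARS-CoV-2 differences (0.058, 0.077 and 0.083) exceed the threshold, while the differences between the isolates (0.019, 0.025 and 0.006) do not.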

4.2. How many N proteins are needed for the complete packaging of the SARS-CoV and SARS-CoV-2 ssRNA genomes?

The periods of the ssRNA turns packaged within the helical ribonucleocapsid and detected via repeating motifs in the genomic RNA sequences proved to be persistent in the genomes of SARS-CoV and SARS-CoV-2, although the repertoires of the related motifs were divergent. Taking into account that the turn of the nucleocapsid is composed of two octamers (Chen et al., 2007) polymerized from dimeric N proteins, the detected period of 54 nt implies that one N protein should be associated with 6.75 nt. This is very close to the estimate of 7 nt per N protein obtained by Chang et al. (2014). Consequently, for genomes of length about 30,000 nt typical of coronaviruses, about 4.4 × 10³ N proteins are needed for complete packaging of the genomic ssRNA. The latter estimate significantly exceeds the value suggested by Neuman and Buchmeier (2016), 0.7–2.2 × 10³ N proteins per virion, which corresponds to the association of each N protein with 14–40 nt of genomic RNA. The flower-like packaging of the helical nucleocapsid within the envelope (see, e.g. Gui et al., 2017; Masters, 2019; and further references therein) implies the integrity of the nucleocapsid and gives evidence against the rods-on-a-string model for the nucleocapsid. Therefore, such a difference in estimates cannot be attributed to a part of the genome remaining uncovered. Presumably, the total number of N proteins per virion is underestimated, and the number 4.4 × 10³ makes N proteins the most abundant in the active phase of the virus life cycle.
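The arithmetic behind these estimates is simple enough to check directly. The sketch below uses only the numbers quoted in the text (genome length of about 30,000 nt, period of 54 nt, 6.75 nt per N protein, and 14–40 nt per N protein for the alternative estimate); the function name is illustrative.

```python
GENOME_LENGTH_NT = 30_000   # typical coronavirus genome length (approximate)
PERIOD_NT = 54              # detected main period of the packaging signals
NT_PER_N_PROTEIN = 6.75     # nucleotides associated with one N protein


def n_proteins_for_full_coverage(genome_length: float, nt_per_protein: float) -> float:
    """Number of N proteins needed to cover the whole genome."""
    return genome_length / nt_per_protein


# N proteins per 54-nt period implied by the 6.75 nt figure.
proteins_per_period = PERIOD_NT / NT_PER_N_PROTEIN
print(proteins_per_period)  # 8.0

# Estimate of this article: about 4.4e3 N proteins per genome.
full = n_proteins_for_full_coverage(GENOME_LENGTH_NT, NT_PER_N_PROTEIN)
print(round(full))  # 4444

# Range corresponding to 14-40 nt per N protein: about 0.7e3 to 2.2e3.
low = n_proteins_for_full_coverage(GENOME_LENGTH_NT, 40)
high = n_proteins_for_full_coverage(GENOME_LENGTH_NT, 14)
print(round(low), round(high))  # 750 2143
```

The two estimates therefore differ by a factor of about two to six, which is the discrepancy discussed above.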

4.3. Implications for therapeutic targeting

The N proteins of coronaviruses are promising therapeutic targets (Chang et al., 2014; 2016; Lin et al., 2020; Tilocca et al., 2020; Yadav et al., 2020). The advantages of using N proteins for therapeutic targeting are as follows. (i) As N proteins are abundant, antibodies against them can be used for early diagnostics and in vaccines. (ii) N proteins are multifunctional and participate not only in the assembly/packaging of the ribonucleocapsid but also in the regulation of the replication-transcription processes (Hurst et al., 2010; McBride et al., 2014; Verheije et al., 2010). The interaction between M and N proteins plays an important role in the packaging of the ribonucleocapsid within the envelope (Kuo et al., 2016). (iii) Coronavirus M and N proteins stand out as being the most conserved among the structural proteins (Neuman & Buchmeier, 2016). They should be more stable against the load from point mutations and indels, which are especially frequent in viruses. Most vaccines are currently being developed against spike (S) proteins. However, S proteins are rather variable, and in any case multitarget vaccines should be more efficient than single-target ones.

Another strategy is related to the development of RNA vaccines (Kramps & Elbers, 2017) or to the targeting of specific motifs in the viral RNA. The latter can be performed by RNA aptamers, RNA interference (Min & Ichim, 2010) or specially designed RNA-binding proteins (Filipovska & Rackham, 2012; Hall, 2016; Lunde et al., 2007). The assembly/packaging signals look quite promising as targets in the genomic ssRNA. Modified N proteins or their fragments can be used for similar purposes and may introduce defects in the nucleocapsid and make the virus less viable. The incorporation of assembly/packaging motifs into oligonucleotides immobilized on the surface of microarrays may facilitate the detection of coronaviruses by microarrays (for a review on microarrays see, e.g. Dufva, 2009).

4.4. Comments on the specificity of motifs

Presumably, most of the working motifs (or, more exactly, the complete words defined in Subsection 2.4) participating in the specific interactions between ssRNA and N proteins are 2–4 nt in length. They are frequent enough (see Table 2), and their coordinated positioning over the genome may provide specific cooperative interaction with N proteins. The dense incorporation of longer motifs would be too restrictive because of the protein-coding function of the genomic ssRNA. However, the longer and rarer motifs may be multifunctional and may play the role of cis/trans-elements for other molecular mechanisms during the virus life cycle. This conclusion looks nearly definite for the pairwise motifs at the step s = 84 such as ATTATAATTATAAAT (SARS-CoV; start sites 22711 and 22795) and ATTATAATTA (isolates of SARS-CoV-2; sites 22766 and 22850, 22810 and 22894, and 22757 and 22841, respectively). Note that the positions of these motifs on the genomes are also closely conserved. The same concerns the longest motif found at s = 54 in the genomes of the SARS-CoV-2 isolates, TATTCAAACAATTGTTG (sites 3213, 3257 and 3204, respectively).
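The pairwise character of the motifs at the step s = 84 can be checked directly: in each quoted pair the two start sites are separated by exactly 84 nt. The sketch below verifies this for the quoted sites and shows one simple way to search a sequence for a motif that recurs at a given step. It is a plain string search assuming 1-based start sites, not an implementation of TAMGI; the function names and the toy sequence are hypothetical.

```python
def find_sites(seq: str, motif: str) -> list:
    """1-based start sites of all (possibly overlapping) motif occurrences."""
    sites, start = [], seq.find(motif)
    while start != -1:
        sites.append(start + 1)
        start = seq.find(motif, start + 1)
    return sites


def paired_sites(seq: str, motif: str, step: int) -> list:
    """Start sites where the motif occurs both at the site and at site + step."""
    sites = set(find_sites(seq, motif))
    return sorted(s for s in sites if s + step in sites)


# Start-site pairs quoted in the text for the step s = 84.
reported_pairs = [(22711, 22795), (22766, 22850), (22810, 22894), (22757, 22841)]
print(all(b - a == 84 for a, b in reported_pairs))  # True

# Toy sequence with the motif planted twice, 84 nt apart (illustrative only).
motif = "ATTATAATTA"
toy_seq = "C" * 10 + motif + "C" * 74 + motif + "C" * 10
print(paired_sites(toy_seq, motif, 84))  # [11]
```

Applied to a real genome, `paired_sites` would require the sequence of the corresponding accession to be loaded first, which is outside the scope of this sketch.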

The specific binding of N proteins with ssRNA results in the lowering of free energy, which may approximately be assessed by the Boltzmann factor,

\phi_b \propto \exp(-\Delta F / T); \quad \Delta F = F_f - F_i < 0. \qquad (35)

Typically, the Boltzmann factor grows at lower temperatures. This means that weakly specific effects should be more pronounced at lower temperatures. Taking into account the huge numbers of species in virus populations, even a small decrease in free energy may produce a significant impact and be advantageous for natural selection.
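The temperature dependence stated above can be illustrated numerically. The sketch assumes that Equation (35) has the form exp(−ΔF/T) with ΔF < 0 and with the temperature T expressed in energy units (the Boltzmann constant absorbed into T). The numerical values are arbitrary illustrative choices, not estimates for the ssRNA–N protein interaction.

```python
import math


def boltzmann_factor(delta_f: float, temperature: float) -> float:
    """exp(-delta_f / temperature); both arguments in the same energy units."""
    return math.exp(-delta_f / temperature)


delta_f = -0.1  # small free energy gain from specific binding (arbitrary units)
for temperature in (1.0, 0.9, 0.8):
    print(temperature, boltzmann_factor(delta_f, temperature))
```

For a fixed negative ΔF the factor exceeds unity and increases as T decreases, which is the sense in which weakly specific effects become more pronounced at lower temperatures.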

5. Conclusion

The methods developed in this article are quite general and can be applied to the detection of assembly/packaging signals in all viral genomes packaged within helical capsids, including the other infectious coronaviruses such as 229E, NL63, OC43, HKU1 and MERS-CoV. The ssRNA genomes of numerous filamentous and rod-shaped plant viruses are also packaged within capsids with helical symmetry (Solovyev & Makarov, 2016; Stubbs & Kendall, 2012). As shown, the combination of NCF, DFT and DDFT provides efficient tools for the investigation of this problem. It is essential that the dominating triplet periodicity p = 3 typical of protein-coding regions in viral genomes be suppressed in order to discern the longer periodic patterns related to the assembly/packaging signals. After the detection of periodic patterns and the determination of their periods, the underlying motifs can be explicitly reconstructed by TAMGI. Generally, TAMGI can be efficiently used for data mining and the search for cis/trans-elements in genomic sequences. Combined experimental and bioinformatic analysis and knowledge of the assembly/packaging mechanisms in viral genomes should facilitate the choice of the most efficient strategy in practical medical applications. The systematic study of hidden quasi-periodic patterns is of basic interest for virology.

Supplementary Material

Supplement_S7.xlsx
Supplement_S6.txt
Supplement_S5.txt
Supplement_S4.txt
Supplement_S3.txt
Supplement_S2.xlsx
Supplement_S1.xlsx

Glossary

Abbreviations

DDFT: discrete double Fourier transform
DFT: discrete Fourier transform
NCF: nucleotide correlation functions
N proteins: nucleocapsid proteins
SARS-CoV: severe acute respiratory syndrome coronavirus
SARS-CoV-2: severe acute respiratory syndrome coronavirus 2
ssRNA: single-stranded RNA
TAMGI: transitional automorphic mapping of the genome on itself
UTR: untranslated region

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  1. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C., & Garry, R. F. (2020). The proximal origin of SARS-CoV-2. Nature Medicine, 26(4), 450–452. 10.1038/s41591-020-0820-9
  2. Baine, I., & Hui, P. (2019). Practical applications of DNA genotyping in diagnostic pathology. Expert Review of Molecular Diagnostics, 19(2), 175–188. 10.1080/14737159.2019.1568874
  3. Balakirev, E. S., Chechetkin, V. R., Lobzin, V. V., & Ayala, F. J. (2003). DNA polymorphism in the β-esterase gene cluster of Drosophila melanogaster. Genetics, 164(2), 533–544. PMID: 12807774
  4. Balakirev, E. S., Chechetkin, V. R., Lobzin, V. V., & Ayala, F. J. (2005). Entropy and GC content in the beta-esterase gene cluster of the Drosophila melanogaster subgroup. Molecular Biology and Evolution, 22(10), 2063–2072. 10.1093/molbev/msi197
  5. Balakirev, E. S., Chechetkin, V. R., Lobzin, V. V., & Ayala, F. J. (2014). Computational methods of identification of pseudogenes based on functionality: Entropy and GC content. Methods in Molecular Biology (Clifton, N.J.), 1167, 41–62. 10.1007/978-1-4939-0835-6_4
  6. Boeva, V., Regnier, M., Papatsenko, D., & Makeev, V. (2006). Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics (Oxford, England), 22(6), 676–684. 10.1093/bioinformatics/btk032
  7. Butler, J. (2011). Advanced topics in forensic DNA typing: Methodology. Elsevier Academic Press.
  8. Cagliani, R., Forni, D., Clerici, M., & Sironi, M. (2020). Computational inference of selection underlying the evolution of the novel coronavirus, severe acute respiratory syndrome coronavirus 2. Journal of Virology, 94(12), e00411–e00420. 10.1128/JVI.00411-20
  9. Chang, C., Hou, M., Chang, C., Hsiao, C., & Huang, T. (2014). The SARS coronavirus nucleocapsid protein-forms and functions. Antiviral Research, 103, 39–50. 10.1016/j.antiviral.2013.12.009
  10. Chang, C. K., Hsu, Y. L., Chang, Y. H., Chao, F. A., Wu, M. C., Huang, Y. S., Hu, C. K., & Huang, T. H. (2009). Multiple nucleic acid binding sites and intrinsic disorder of severe acute respiratory syndrome coronavirus nucleocapsid protein: Implications for ribonucleocapsid protein packaging. Journal of Virology, 83(5), 2255–2264. 10.1128/JVI.02001-08
  11. Chang, C-K., Lo, S.-C., Wang, Y.-S., & Hou, M.-H. (2016). Recent insights into the development of therapeutics against coronavirus diseases by targeting N protein. Drug Discovery Today, 21(4), 562–572. 10.1016/j.drudis.2015.11.015
  12. Chechetkin, V. R. (2011). Spectral sum rules and search for periodicities in DNA sequences. Physics Letters A, 375(16), 1729–1732. 10.1016/j.physleta.2011.03.007
  13. Chechetkin, V. R., & Lobzin, V. V. (1996). Levels of ordering in coding and non-coding regions of DNA sequences. Physics Letters A, 222(5), 354–360. 10.1016/0375-9601(96)00672-X
  14. Chechetkin, V. R., & Lobzin, V. V. (2017). Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique. Journal of Theoretical Biology, 426, 162–179. 10.1016/j.jtbi.2017.05.033
  15. Chechetkin, V. R., & Lobzin, V. V. (2019). Genome packaging within icosahedral capsids and large-scale segmentation in viral genomic sequences. Journal of Biomolecular Structure & Dynamics, 37(9), 2322–2338. 10.1080/07391102.2018.1479660
  16. Chechetkin, V. R., & Lobzin, V. V. (2020a). Detection of large-scale noisy multi-periodic patterns with discrete double Fourier transform. Fluctuation and Noise Letters, 19(02), 2050019. 10.1142/S0219477520500194
  17. Chechetkin, V. R., & Lobzin, V. V. (2020b). Detection of large-scale noisy multi-periodic patterns with discrete double Fourier transform. II. Study of correlations between patterns. Fluctuation and Noise Letters, 20(1), 2150003. 10.1142/S0219477521500036
  18. Chechetkin, V. R., & Turygin, A. Y. (1994). On the spectral criteria of disorder in non-periodic sequences: Application to inflation models, symbolic dynamics and DNA sequences. Journal of Physics A: Mathematical and General, 27(14), 4875–4898. 10.1088/0305-4470/27/14/016
  19. Chechetkin, V. R., & Turygin, A. Y. (1995). Search of hidden periodicities in DNA sequences. Journal of Theoretical Biology, 175, 477–494. 10.1006/jtbi.1995.0155
  20. Chen, C. Y., Chang, C. K., Chang, Y. W., Sue, S. C., Bai, H. I., Riang, L., Hsiao, C. D., & Huang, T. H. (2007). Structure of the SARS coronavirus nucleocapsid protein RNA-binding dimerization domain suggests a mechanism for helical packaging of viral RNA. Journal of Molecular Biology, 368(4), 1075–1086. 10.1016/j.jmb.2007.02.069
  21. Dufva, M. (Ed.). (2009). DNA microarrays for biomedical research: Methods and protocols. Springer Humana Press. 10.1007/978-1-59745-538-1
  22. Feng, W., Zong, W., Wang, F., & Ju, S. (2020). Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2): A review. Molecular Cancer, 19(1), 100. 10.1186/s12943-020-01218-1
  23. Filipovska, A., & Rackham, O. (2012). Modular recognition of nucleic acids by PUF, TALE and PPR proteins. Molecular Biosystems, 8(3), 699–708. 10.1039/c2mb05392f
  24. Forster, P., Forster, L., Renfrew, C., & Forster, M. (2020). Phylogenetic network analysis of SARS-CoV-2 genomes. Proceedings of the National Academy of Sciences of the United States of America, 117(17), 9241–9243. 10.1073/pnas.2004999117
  25. Fosmire, J. A., Hwang, K., & Makino, S. (1992). Identification and characterization of a coronavirus packaging signal. Journal of Virology, 66(6), 3522–3530. 10.1128/JVI.66.6.3522-3530.1992
  26. Fung, T. S., & Liu, D. X. (2019). Human coronavirus: Host-pathogen interaction. Annual Review of Microbiology, 73, 529–557. 10.1146/annurev-micro-020518-115759
  27. Grover, A., & Sharma, P. C. (2016). Development and use of molecular markers: Past and present. Critical Reviews in Biotechnology, 36(2), 290–302. 10.3109/07388551.2014.959891
  28. Gui, M., Liu, X., Guo, D., Zhang, Z., Yin, C., Chen, Y., & Xiang, Y. (2017). Electron microscopy studies of the coronavirus ribonucleoprotein complex. Protein & Cell, 8(3), 219–224. 10.1007/s13238-016-0352-8
  29. Hall, T. M. (2016). De-coding and re-coding RNA recognition by PUF and PPR repeat proteins. Current Opinion in Structural Biology, 36, 116–121. 10.1016/j.sbi.2016.01.010
  30. Hurst, K. R., Ye, R., Goebel, S. J., Jayaraman, P., & Masters, P. S. (2010). An interaction between the nucleocapsid protein and a component of the replicase-transcriptase complex is crucial for the infectivity of coronavirus genomic RNA. Journal of Virology, 84(19), 10276–10288. 10.1128/JVI.01287-10
  31. Karlin, S., & Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the United States of America, 87(6), 2264–2268. 10.1073/pnas.87.6.2264
  32. Kayser, M. (2017). Forensic use of Y-chromosome DNA: A general overview. Human Genetics, 136(5), 621–635. 10.1007/s00439-017-1776-9
  33. Kramps, T., & Elbers, K. (Eds.). (2017). RNA vaccines. In Methods and protocols. Springer Humana Press. 10.1007/978-1-4939-6481-9
  34. Kuo, L., Hurst-Hess, K. R., Koetzner, C. A., & Masters, P. S. (2016). Analyses of coronavirus assembly interactions with interspecies membrane and nucleocapsid protein chimeras. Journal of Virology, 90(9), 4357–4368. 10.1128/JVI.03212-15
  35. Lin, S.-M., Lin, S.-C., Hsu, J.-N., Chang, C-K., Chien, C.-M., Wang, Y.-S., Wu, H.-Y., Jeng, U.-S., Kehn-Hall, K., & Hou, M.-H. (2020). Structure-based stabilization of non-native protein-protein interactions of coronavirus nucleocapsid proteins in antiviral drug design. Journal of Medicinal Chemistry, 63(6), 3131–3141. 10.1021/acs.jmedchem.9b01913
  36. Lobzin, V. V., & Chechetkin, V. R. (2000). Order and correlations in genomic DNA sequences. The spectral approach. Physics-Uspekhi, 43(1), 55–78. 10.1070/PU2000v043n01ABEH000611
  37. Lunde, B. M., Moore, C., & Varani, G. (2007). RNA-binding proteins: Modular design for efficient function. Nature Reviews. Molecular Cell Biology, 8(6), 479–490. 10.1038/nrm2178
  38. MacLean, O. A., Orton, R. J., Singer, J. B., & Robertson, D. L. (2020). No evidence for distinct types in the evolution of SARS-CoV-2. Virus Evolution, 6(1), veaa034. 10.1093/ve/veaa034
  39. Macneughton, M. R., Davies, H. A., & Nermut, M. V. (1978). Ribonucleoprotein-like structures from coronavirus particles. Journal of General Virology, 39(3), 545–549. 10.1099/0022-1317-39-3-545
  40. Madhugiri, R., Fricke, M., Marz, M., & Ziebuhr, J. (2016). Coronavirus cis-acting RNA elements. Advances in Virus Research, 96, 127–163. 10.1016/bs.aivir.2016.08.007
  41. Maier, H. J., Bickerton, E., & Britton, P. (Eds.). (2015). Coronaviruses. In Methods and protocols. Springer Humana Press. 10.1007/978-1-4939-2438-7
  42. Marhon, S. A., & Kremer, S. C. (2011). Gene prediction based on DNA spectral analysis: A literature review. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 18(4), 639–676. 10.1089/cmb.2010.0184
  43. Marple, S. L., Jr. (1987). Digital spectral analysis with applications. Prentice-Hall.
  44. Masters, P. S. (2019). Coronavirus genomic RNA packaging. Virology, 537, 198–207. 10.1016/j.virol.2019.08.031
  45. McBride, R., van Zyl, M., & Fielding, B. C. (2014). The coronavirus nucleocapsid is a multifunctional protein. Viruses, 6(8), 2991–3018. 10.3390/v6082991
  46. Min, W.-P., & Ichim, T. (Eds.). (2010). RNA interference. In From biology to clinical applications. Springer Humana Press. 10.1007/978-1-60761-588-0
  47. Narayanan, K., & Makino, S. (2001). Cooperation of an RNA packaging signal and a viral envelope protein in coronavirus RNA packaging. Journal of Virology, 75(19), 9059–9067. 10.1128/JVI.75.19.9059-9067.2001
  48. Neuman, B. W., & Buchmeier, M. J. (2016). Supramolecular architecture of the coronavirus particle. Advances in Virus Research, 96, 1–27. 10.1016/bs.aivir.2016.08.005
  49. Saxena, S. (Ed.). (2020). Coronavirus disease 2019 (COVID-19). In Medical virology: From pathogenesis to disease control. Springer. 10.1007/978-981-15-4814-7_13
  50. Solovyev, A. G., & Makarov, V. V. (2016). Helical capsids of plant viruses: Architecture with structural lability. The Journal of General Virology, 97(8), 1739–1754. 10.1099/jgv.0.000524
  51. Stockley, P. G., White, S. J., Dykeman, E., Manfield, I., Rolfsson, O., Patel, N., Bingham, R., Barker, A., Wroblewski, E., Chandler-Bostock, R., Weiß, E. U., Ranson, N. A., Tuma, R., & Twarock, R. (2016). Bacteriophage MS2 genomic RNA encodes an assembly instruction manual for its capsid. Bacteriophage, 6(1), e1157666. 10.1080/21597081.2016.1157666
  52. Stubbs, G., & Kendall, A. (2012). Helical viruses. Advances in Experimental Medicine and Biology, 726, 631–658. 10.1007/978-1-4614-0980-9_28
  53. Subirana, J. A., & Messeguer, X. (2019). Satellites in the prokaryote world. BMC Evolutionary Biology, 19(1), 181. 10.1186/s12862-019-1504-2
  54. Suvorova, Y. M., Korotkova, M. A., & Korotkov, E. V. (2014). Comparative analysis of periodicity search methods in DNA sequences. Computational Biology and Chemistry, 53, 43–48. 10.1016/j.compbiolchem.2014.08.008
  55. Sznajder, L. J., & Swanson, M. S. (2019). Short tandem repeat expansions and RNA-mediated pathogenesis in myotonic dystrophy. International Journal of Molecular Sciences, 20(13), 3365. 10.3390/ijms20133365
  56. Tang, X., Wu, C., Li, X., Song, Y., Yao, X., Wu, X., Duan, Y., Zhang, H., Wang, Y., Qian, Z., Cui, J., & Lu, J. (2020). On the origin and continuing evolution of SARS-CoV-2. National Science Review, 7(6), 1012–1023. 10.1093/nsr/nwaa036
  57. Tilocca, B., Soggiu, A., Sanguinetti, M., Musella, V., Britti, D., Bonizzi, L., Urbani, A., & Roncada, R. (2020). Comparative computational analysis of SARS-CoV-2 nucleocapsid protein epitopes in taxonomically related coronaviruses. Microbes and Infection, 22(4–5), 188–194. 10.1016/j.micinf.2020.04.002
  58. Twarock, R., Bingham, R. J., Dykeman, E. C., & Stockley, P. G. (2018). A modelling paradigm for RNA virus assembly. Current Opinion in Virology, 31, 74–81. 10.1016/j.coviro.2018.07.003
  59. Verheije, M. H., Hagemeijer, M. C., Ulasli, M., Reggiori, F., Rottier, P. J., Masters, P. S., & de Haan, C. A. (2010). The coronavirus nucleocapsid protein is dynamically associated with the replication-transcription complexes. Journal of Virology, 84(21), 11575–11579. 10.1128/JVI.00569-10
  60. Woo, J., Lee, E. Y., Lee, M., Kim, T., & Cho, Y.-E. (2019). An in vivo cell-based assay for investigating the specific interaction between the SARS-CoV N-protein and its viral RNA packaging sequence. Biochemical and Biophysical Research Communications, 520(3), 499–506. 10.1016/j.bbrc.2019.09.115
  61. Xie, M., & Chen, Q. (2020). Insight into 2019 novel coronavirus - An updated interim review and lessons from SARS-CoV and MERS-CoV. International Journal of Infectious Diseases: IJID: Official Publication of the International Society for Infectious Diseases, 94, 119–124. 10.1016/j.ijid.2020.03.071
  62. Yadav, R., Imran, M., Dhamija, P., Suchal, K., & Handu, S. (2020). Virtual screening and dynamics of potential inhibitors targeting RNA binding domain of nucleocapsid phosphoprotein from SARS-CoV-2. Journal of Biomolecular Structure and Dynamics. 10.1080/07391102.2020.1778536
  63. Ziebuhr, J. (Ed.). (2016). Coronaviruses. Elsevier Academic Press. 10.1016/bs.aivir.2016.08.005
  64. Zubkov, A. M., & Mikhailov, V. G. (1974). Limit distributions of random variables associated with long duplications in a sequence of independent trials. Theory of Probability & Its Applications, 19(1), 172–179. 10.1137/1119017

Articles from Journal of Biomolecular Structure & Dynamics are provided here courtesy of Taylor & Francis
