PLOS ONE. 2021 Jun 8;16(6):e0250654. doi: 10.1371/journal.pone.0250654

Potential G-quadruplexes and i-Motifs in the SARS-CoV-2

Efres Belmonte-Reche 1,*, Israel Serrano-Chacón 2, Carlos Gonzalez 2, Juan Gallo 1, Manuel Bañobre-López 1
Editor: Eric Charles Dykeman 3
PMCID: PMC8186786  PMID: 34101725

Abstract

Quadruplex structures have been identified in a plethora of organisms, where they play important roles in the regulation of molecular processes, and hence have been proposed as therapeutic targets for many diseases. In this paper we report an extensive bioinformatic analysis of the SARS-CoV-2 genome and related viruses using an upgraded version of the open-source algorithm G4-iM Grinder. This version improves the functionality of the software, including an easy way to determine the potential biological features affected by the candidates found. The quadruplex definitions of the algorithm were optimized for SARS-CoV-2. Using a lax quadruplex definition ruleset, which accepts, amongst other parameters, G- and C-tracks of two residues, we discovered 512 potential quadruplex candidates. These sequences were evaluated by their in vitro formation probability, their position in the viral RNA, their uniqueness and their conservation rates (calculated over more than seventeen thousand COVID-19 clinical cases sequenced at different times and locations during the ongoing pandemic). These results were then compared to other Coronaviridae members, other Group IV (+)ssRNA viruses and the entire viral realm. Sequences found in common with other viral species were further analyzed and characterized. High-scoring sequences unique to the SARS-CoV-2 were studied to investigate the variations amongst similar species. Quadruplex formation by the best candidates was then confirmed experimentally. Using NMR and CD spectroscopy, we found several highly stable RNA quadruplexes that may be suitable therapeutic targets for the SARS-CoV-2.

Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-sense single-stranded RNA virus of the Betacoronavirus genus, within the Coronaviridae family of the Nidovirales order. Although it is believed to have originated from a bat-borne coronavirus [1–5], the SARS-CoV-2 can spread between humans with no need for other vectors or reservoirs. The virus is responsible for the ongoing COVID-19 pandemic, which has caused hundreds of thousands of deaths, millions of infections and a disastrous strain on the economies of most countries and on citizens worldwide.

The origin of the virus has been traced back to the Chinese city of Wuhan, where the first cases of infected individuals were reported amongst the workers of the Huanan Seafood Market [6, 7]. This wet exotic animal market, where wild animals including bats and pangolins are sold and prepared for consumption, offers ample opportunities for pathogenic bacteria and viruses to adapt and thrive [8, 9]. Such circumstances led Cheng and colleagues to predict the current pandemic back in 2007 [10]. In their own words: “the presence of a large reservoir of SARS-CoV-like viruses in horseshoe bats, together with the culture of eating exotic mammals in southern China, is a time bomb. The possibility of the re-emergence of SARS and other novel viruses from animals or laboratories and therefore the need for preparedness should not be ignored”.

SARS-CoV-2 has now become a global problem. In this current scenario, the scientific community is playing a fundamental role in minimizing the number of victims. Their work includes, to name a few, the development of fast and reliable detection methods, the identification of therapeutic targets within the virus, and the development of active drugs and vaccines to cure and to prevent infections, respectively.

G-Quadruplexes (G4s) and i-Motifs (iMs) have been proposed as therapeutic targets in many disease aetiologies. G4s are Guanine (G)-rich DNA or RNA sequences in which four Gs associate in a planar G-quartet through Hoogsteen hydrogen bonds; successive quartets stack on one another to form four-stranded structures stabilized by monovalent cations [11]. iMs, on the contrary, are Cytosine (C)-rich regions that fold into four-stranded structures made of two intercalated parallel-stranded duplexes [12–14]. These are held together by hydrogen bonds between hemiprotonated C·C+ base pairs, which form under acidic conditions.

The importance of these genomic secondary structures has been studied extensively in recent years [15–20]. They have been found to be regulatory elements in the human genome, implicated in key functions such as telomere maintenance and the regulation of transcription, replication and repair [21]. G4 structures have also been identified in fungi [22–25], bacteria [26–30] and parasites [31–36]. Their occurrence is also known in many viruses that infect humans. These include the HIV-1 [37–39], Epstein-Barr [40, 41], human and manatee papilloma [42, 43], herpes simplex 1 [44, 45], Hepatitis B [46], Ebola [47] and Zika [48] viruses. In these viruses, they can regulate viral replication, recombination and virulence [32, 49, 50].

iMs have been less studied in general, especially outside the human context. With regard to viruses, Ruggiero et al. recently reported the formation of an iM in HIV-1 [51], whilst we reported the presence of the known cMyb.S [52] iM within the Epstein-Barr virus [53]. Despite the lack of reports, iMs are interesting potential therapeutic targets in viruses. For example, the in silico analysis of the rubella virus revealed an extremely dense genome of potential iMs (density as counts per genomic length) that surpassed its human counterpart by over an order of magnitude [53]. In the same study, other viruses such as measles and hepacivirus C presented potential iM densities similar to that of the human genome.

In this work, we wished to contribute to the ongoing research efforts related to the COVID-19 pandemic by investigating the SARS-CoV-2 for the presence of quadruplex structures. With this aim, we analysed the prevalence, distribution and relationships of Potential G4 Sequences (PQS) and Potential iM Sequences (PiMS) in its genome. These PQS and PiMS were evaluated according to their potential to form quadruplex structures in vitro and their localization within the genome. The presence of confirmed quadruplex-forming sequences and the candidates’ frequency, uniqueness and conservation rates across 17312 different SARS-CoV-2 clinical cases were also analyzed. The study of the SARS-CoV-2 and its quadruplex results was expanded to include the Coronaviridae family, Group IV of the Baltimore classification and the entire virus realm, to allow a wider range of interpretation. With all this information at hand, our final objective was to identify biologically important PQS and PiMS candidates in the virus. To substantiate our bioinformatic analysis, we analysed some of these sequences experimentally by CD and NMR spectroscopy. Our in vitro results confirmed the formation of stable quadruplexes by sequences present in the viral genome, suggesting that they may be suitable targets for new therapeutic or diagnostic agents [50, 54]. Hence, our analysis of the SARS-CoV-2, and by extension of the entire virus realm, may provide useful insights into using quadruplex structures as targets in future antiviral treatments.

Materials and methods

G4-iM Grinder and G4-iM Grinder’s parameter configuration

In this work, we have used an upgraded version of the G4-iM Grinder package (GiG, https://github.com/EfresBR/G4iMGrinder) for the analysis of all viruses (S1 File, section 1). GiG is an R-based algorithm that locates, quantifies and qualifies PQS, PiMS and their potential higher-order versions in RNA and DNA genomes [53]. We retrieved the SARS-CoV-2 reference sequence (GCF_009858895.2) from the NCBI database [55]. For comparison, we also downloaded those of 18 other viruses that can cause fatal illness in humans, including six other pathogenic coronaviruses (S1 File, section 2).

As a workflow, we applied the functions GiG.Seq.Analysis (to study their G- and C-run characteristics), G4iMGrinder (to locate quadruplex candidates) and GiGList.Analysis (to compare quadruplex results between genomes) from the GiG package to all the viruses. The ‘size-restricted overlapping search and frequency count’ method (Method 2, M2A and M2B) was used to locate all the candidates. These PQS and PiMS were then evaluated by whether they contain in vitro confirmed G4 or iM sequences, by their frequency of appearance in the corresponding genome, and by their probability-of-formation score (the mean of G4Hunter [56] and an adaptation of the PQSfinder algorithm [57]). To compare between virus species, we calculated the density of potential quadruplex sequences per 100000 nucleotides, $\text{Density} = 100000 \times \frac{\text{Number of candidates}}{\text{Genome length}}$.
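
The density formula and the G4Hunter component of the score can be illustrated in a few lines of Python. This is a minimal sketch, not GiG's implementation: `density_per_100k` and `g4hunter_score` are hypothetical helper names, the scoring follows the published G4Hunter rule (each G in a run of n Gs scores min(n, 4), each C in a run of n Cs scores -min(n, 4), every other base scores 0), and GiG's rescaling of this value and its averaging with the adapted PQSfinder score are not reproduced. The genome length of 29903 nt used in the usage line is that of the Wuhan-Hu-1 reference sequence.

```python
# Illustrative sketch; helper names are hypothetical and not part of the GiG package.
from itertools import groupby


def density_per_100k(n_candidates: int, genome_length: int) -> float:
    """Candidates per 100000 nucleotides."""
    return 100_000 * n_candidates / genome_length


def g4hunter_score(seq: str) -> float:
    """Mean G4Hunter-style score of a sequence (raw scale, -4 to 4).

    Each G in a run of n consecutive Gs scores min(n, 4); each C in a run of
    n consecutive Cs scores -min(n, 4); every other base (A, T, U) scores 0.
    GiG's rescaling and its combination with PQSfinder are NOT reproduced.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    total = 0
    for base, run in groupby(seq):
        n = len(list(run))
        if base == "G":
            total += n * min(n, 4)
        elif base == "C":
            total -= n * min(n, 4)
    return total / len(seq)


# Usage
print(round(density_per_100k(323, 29903), 2))  # 1080.16
print(g4hunter_score("GGGAGGGAGGGAGGG"))       # 2.4
```

For the 323 PQS found in the reference genome, this gives roughly 1080 PQS per 100000 nucleotides; positive scores point towards G4s and negative scores towards iMs, matching the sign convention used for GiG's score.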

We previously saw that viruses have a wider range of PQS and PiMS densities than the human, fungal, bacterial and parasite genomes [53]. Some were totally void of candidates whilst others were very rich in them. We therefore explored different quadruplex definitions to determine the most useful configurations for the analysis of the viruses at hand. These definitions control the characteristics of what the algorithm considers a quadruplex. They include the acceptable size of G- or C-repetitions to be considered a run, the acceptable number of bulges within these runs, the acceptable loop sizes between runs, the acceptable number of runs to constitute a PQS or PiMS, and the total acceptable length of the sequence (Fig 1A). A flexible configuration of quadruplex definitions will detect more candidates at the expense of requiring more computing power and of accepting sequences that are less likely to form quadruplex structures in vitro (as determined by their score: sequences with longer loops, shorter runs, more bulges and a higher percentage of complementary G/C, Fig 1B). More constrained definitions have the opposite effect. Hence, for the analysis, we chose three different configurations: a Lax configuration (which accepts run bulges and wider ranges of run, loop and total sizes), the Predefined configuration of the package (which restricts sizes but still accepts run bulges), and the original Folding Rule [58, 59] (which restricts length and does not accept run bulges) (Fig 1C Left).
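
To make the role of these definitions concrete, the following Python sketch builds a regular expression from the run size, loop size and number of runs, and reports every overlapping match. It is a simplification under stated assumptions: the function and parameter names are hypothetical, bulges within runs and the total-length limit are not modelled, and the default values only approximate the classic folding rule (four runs of at least three Gs separated by 1-7 nt loops); they are not the Lax, Predefined or Folding Rule values listed in Fig 1C.

```python
# Illustrative sketch; not GiG's search engine. Names and defaults are hypothetical.
import re


def find_candidates(seq, base="G", min_run=3, max_run=5,
                    min_loop=1, max_loop=7, n_runs=4):
    """Return (start, sequence) pairs for every overlapping candidate.

    Simplified: runs are uninterrupted stretches of `base` (no bulges) and
    no limit on the total candidate length is applied.
    """
    run = f"{base}{{{min_run},{max_run}}}"        # e.g. G{3,5}
    loop = f"[ACGTU]{{{min_loop},{max_loop}}}?"   # lazy loop, e.g. [ACGTU]{1,7}?
    body = f"{run}(?:{loop}{run}){{{n_runs - 1}}}"
    # Wrapping the pattern in a zero-width lookahead lets matches overlap,
    # so one candidate can be reported for each possible start position.
    pattern = re.compile(f"(?=({body}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Usage: a folding-rule-like search vs. searches allowing two-residue runs
print(find_candidates("GGAGGAGGAGG"))                                 # []
print(find_candidates("GGAGGAGGAGG", min_run=2, max_loop=3))          # [(0, 'GGAGGAGGAGG')]
print(find_candidates("CCTCCTCCTCC", base="C", min_run=2, max_loop=3))  # [(0, 'CCTCCTCCTCC')]
```

Lowering `min_run` to 2, as the Lax configuration allows, is what makes two-residue tracks visible at all; the same function with `base="C"` gives the iM counterpart.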

Fig 1.

A. Results with G4-iM Grinder depend on the quadruplex definitions introduced to the algorithm. The sizes of G- or C-runs, loops and the entire sequence, together with the acceptable number of bulges within the runs, are part of the definitions. B. The structures found with GiG under the definitions proposed by the user can be evaluated for their in vitro probability of formation. More positive scores mean that the sequence is more likely to form G4s, whilst more negative values mean that it is more likely to form iMs. C. Left, Quadruplex definitions used by GiG’s search engine in this work. C. Right, Total results found within the SARS-CoV-2 by configuration and score criteria. D. PQS and PiMS densities (per 100000 nucleotides) found per configuration and score criteria for 19 viruses. The G and C content (as a percentage) is shown under each virus. The x axis is in logarithmic scale (base 10). Results are categorized by their |score|: intense colours (blue for PQS, yellow for PiMS) mark the structures most likely to form in vitro (|score| ≥ 40), lighter bars are the densities of structures with at least |score| ≥ 20, and grey bars are the densities without the score filter.

Then, we calculated the PQS and PiMS densities of each virus to allow a direct, size-independent comparison between them all (Fig 1D), and filtered the results by their in vitro probability-of-formation score. The |score| filters were set to 20 and 40 to allow us to study both the medium (PQS score ≥ 20; PiMS score ≤ -20) and the high probability candidates (PQS score ≥ 40; PiMS score ≤ -40; Fig 1B) within the results. These score filters are important because they qualify the sequences and grant specificity to the results of GiG’s extremely flexible search engine (which was designed solely for sensitivity), as highlighted by the results of a recent review [60].
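
The |score| thresholds described above amount to a simple classification rule, sketched below in Python. The function name, the 'high'/'medium'/'low' labels and the example scores are hypothetical; only the cut-offs (20 and 40, positive for PQS and negative for PiMS) come from the text.

```python
# Illustrative sketch; the helper name and labels are hypothetical.
def probability_class(score: float, kind: str) -> str:
    """Classify a candidate by |score| using the 20/40 cut-offs.

    PQS scores are positive and PiMS scores negative, so the sign is
    flipped for PiMS before comparing against the thresholds.
    """
    if kind not in ("PQS", "PiMS"):
        raise ValueError("kind must be 'PQS' or 'PiMS'")
    signed = score if kind == "PQS" else -score
    if signed >= 40:
        return "high"
    if signed >= 20:
        return "medium"
    return "low"


# Usage: keep only candidates with at least a medium probability of formation
candidates = [("PQS", 45.0), ("PQS", 12.0), ("PiMS", -25.0), ("PiMS", 30.0)]
kept = [c for c in candidates if probability_class(c[1], c[0]) != "low"]
print(kept)  # [('PQS', 45.0), ('PiMS', -25.0)]
```

Note that a PiMS with a positive score (the last toy entry) is classed as "low": its sequence favours G4s rather than iMs.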

For the viruses analysed, the best configuration to obtain a significant number of candidates was the Lax set-up. This was also the case for the reference genome of the SARS-CoV-2 (Fig 1C Right). Given the small size of the viral genomes, the increase in computational power was deemed acceptable, and hence we established this Lax configuration as the default configuration for all subsequent searches with GiG. Although some authors have reported that iMs cannot form with tracks of only two Cs [19], this claim was later rebutted [61], which allows the use of this configuration for potential iMs as well.

The search was then expanded to 17312 different SARS-CoV-2 genomes sequenced during the pandemic (from December 2019 to January 2021 by different laboratories worldwide, and downloaded from the GISAID database [62]), to other Coronaviridae family members and to the entire virus realm (6678 other viruses), using the methodologies described previously and in the S1 File, section 1. To validate the in silico findings, the most interesting candidates were selected and confirmed by NMR and CD spectroscopy.

In silico methodology

To analyse these genomes, we employed the workflow described in the G4-iM Grinder’s parameter configuration section of the manuscript, using the Lax parameter configuration. We investigated the biological features potentially affected by the candidates using the function GiG.df.GenomicFeatures of the GiG package. The conservation of each PQS and PiMS found in the reference genome was calculated as

$$\text{Conservation}(\%) = 100 \times \frac{\sum N_{g+}}{\sum N_{g}}$$

where $N_g$ is the number of genomes and $N_{g+}$ is the number of genomes containing the PQS or PiMS candidate. The genomic pairwise alignments, used to study the similarity between viruses and to detect PQS and PiMS variations between species, were performed with the pairwiseAlignment function (global alignment type) of the Biostrings package from the Bioconductor repository. We calculated the divergence from the reference genome per clade (or lineage) as

$$\text{Divergence} = \left| \overline{N}^{\,|S| \ge 20}_{\text{Clade/Lineage}} - N^{\,|S| \ge 20}_{\text{ref}} \right| + \overline{N}^{\,|S| \ge 20}_{\text{LineageVariants}}$$

where $\overline{N}^{\,|S| \ge 20}_{\text{Clade/Lineage}}$ is the clade’s or lineage’s mean number of PQS or PiMS with a |score| of at least 20, $N^{\,|S| \ge 20}_{\text{ref}}$ is the number of PQS or PiMS with a |score| of at least 20 in the reference genome, and $\overline{N}^{\,|S| \ge 20}_{\text{LineageVariants}}$ is the mean number of variants of PQS or PiMS with a |score| of at least 20 per lineage. To compare potential quadruplex presence and prevalence between genomic groupings (species, families, groups and the entire virus realm), we also calculated the genomic density (per 100000 nucleotides) of several arguments with the GiGList.Analysis function of the GiG package. These arguments were the density of results (PQS and PiMS), the density of results passing the |score| filters (at least 20 or 40), the density of already confirmed G4- or iM-forming sequences within the results, and the uniqueness, defined as

$$\text{Uniqueness}(\%) = 100 \times \frac{\sum N_{s,f=1}}{\sum N_{s}}$$

where $N_s$ is the number of sequences and $N_{s,f=1}$ is the number of sequences with a frequency of appearance of 1 in their respective genome.
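
The conservation, divergence and uniqueness formulas translate directly into code. The Python sketch below is illustrative only: the helper names and toy data are hypothetical, conservation is computed by exact substring matching (an assumption; GiG may detect candidates in each genome differently), and the divergence inputs are assumed to be counts already restricted to candidates with |score| ≥ 20.

```python
# Illustrative sketch; helper names and toy data are hypothetical.
from statistics import mean


def conservation_pct(candidate: str, genomes: list[str]) -> float:
    """100 * (genomes containing the candidate) / (all genomes)."""
    hits = sum(candidate in genome for genome in genomes)  # exact-match assumption
    return 100 * hits / len(genomes)


def uniqueness_pct(frequencies: list[int]) -> float:
    """100 * (sequences appearing exactly once) / (all sequences)."""
    return 100 * sum(f == 1 for f in frequencies) / len(frequencies)


def divergence(lineage_counts: list[int], ref_count: int,
               lineage_variant_counts: list[int]) -> float:
    """|mean lineage count - reference count| + mean number of variants.

    Both lists hold one value per genome of the lineage; all counts are
    assumed to be restricted to candidates with |score| >= 20.
    """
    return float(abs(mean(lineage_counts) - ref_count) + mean(lineage_variant_counts))


# Usage with hypothetical toy data
print(conservation_pct("GGAGG", ["AGGAGGA", "AGGUGGA", "GGAGG", "CCC"]))  # 50.0
print(uniqueness_pct([1, 1, 2, 1]))                                     # 75.0
print(divergence([70, 72, 71], 71, [2, 4]))                             # 3.0
```
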
For the G- and C-runs density analysis of the viruses, we used the function GiG.Seq.Analysis from the GiG package. The arguments here were: densities of runs with different sizes (two or three to five long G- or C-runs) and with different bulges per run (zero and/or one). All of these results can be found in the S1 File, section 5.
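
A run-density count of this kind can be approximated by locating maximal runs of a base and binning them by length. This Python sketch is a simplification under stated assumptions, not GiG.Seq.Analysis: the function name is hypothetical, it counts only uninterrupted (zero-bulge) runs, bins them as two-residue or three-to-five-residue runs, and ignores runs longer than five.

```python
# Illustrative sketch; not GiG.Seq.Analysis. The function name is hypothetical.
import re
from collections import Counter


def run_densities(seq: str, base: str = "G") -> dict[str, float]:
    """Density per 100000 nt of maximal, bulge-free runs of `base`.

    Runs are binned as exactly two residues ("2") or three to five residues
    ("3-5"); single residues and runs longer than five are not counted.
    """
    seq = seq.upper()
    bins = Counter()
    for match in re.finditer(f"{base}+", seq):
        n = len(match.group())
        if n == 2:
            bins["2"] += 1
        elif 3 <= n <= 5:
            bins["3-5"] += 1
    return {label: 100_000 * bins[label] / len(seq) for label in ("2", "3-5")}


# Usage: runs GG, GGG and GGGGGG (the last one falls outside both bins)
print(run_densities("GGAGGGAUGGGGGGA"))
print(run_densities("CCACCC", base="C"))
```

Extending this to one-bulge runs would require a pattern that tolerates a single non-matching residue inside a run, which is where a dedicated tool such as GiG becomes preferable.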

Candidate selection

PQS and PiMS candidates were selected according to their potential to form quadruplex structures in vitro, uniqueness, frequency of appearance, conservation between 17312 different SARS-CoV-2 clinical case genomes, confirmed quadruplex presence and localization within the genome.

NMR experiments

Oligonucleotides (0.3 mM) for NMR experiments were purchased from IDT, and suspended in 200 μl of H2O/D2O 9:1 in 25 mM KH2PO4 and 25 mM KCl buffer, pH 7. Samples at acidic pH were prepared by adding aliquots of concentrated HCl. Spectra were acquired on Bruker Avance spectrometers operating at 600 MHz, and processed with Topspin software. Experiments were carried out at temperatures ranging from 5.1 to 45°C and pH from 5 to 7. NOESY spectra in H2O were acquired with a 150 ms mixing time. Water suppression was achieved by including a WATERGATE module in the pulse sequence prior to acquisition.

Circular Dichroism (CD)

Circular dichroism (CD) studies were performed on a JASCO J-810 spectropolarimeter using a 1 mm path length cuvette. Spectra were recorded in a 320–220 nm range at a scan rate of 100 nm min−1 and a response time of 4.0 s, with four acquisitions recorded for each spectrum. Data were smoothed using the means-movement function within the JASCO graphing software. Melting transitions were recorded by monitoring the decrease of the CD signal at 264 nm. Heating rates were 30°C/h. Transitions were evaluated using a nonlinear least squares fit assuming a two-state model with sloping pre- and post-transitional baselines. Oligonucleotide solutions for CD measurements were prepared in the same buffer conditions as for the NMR experiments. The oligonucleotide concentration was 50 μM.
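
The two-state model with sloping baselines used for the melting transitions can be written as a small Python function. It is a sketch under stated assumptions (a unimolecular transition, temperatures in kelvin, a van't Hoff enthalpy ΔH in J/mol, and a folded-state baseline that dominates at low temperature); the function name and parameterization are hypothetical, and the fitting software used in this work is not specified. To fit measured 264 nm data, a NumPy-vectorized version of this function could be passed to a nonlinear least-squares routine such as `scipy.optimize.curve_fit`.

```python
# Illustrative sketch of a two-state melting model; names are hypothetical.
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def two_state_signal(T, Tm, dH, m_f, b_f, m_u, b_u):
    """CD signal for a two-state (folded <-> unfolded) unimolecular transition.

    T and Tm are in kelvin, dH is the van't Hoff enthalpy of unfolding in
    J/mol; (m_f, b_f) and (m_u, b_u) are the slope and intercept of the
    folded and unfolded baselines.
    """
    x = (dH / R) * (1 / T - 1 / Tm)   # ln K, with K = [folded]/[unfolded]
    alpha = 1 / (1 + math.exp(-x))     # fraction folded, 0.5 at T = Tm
    folded = m_f * T + b_f
    unfolded = m_u * T + b_u
    return alpha * folded + (1 - alpha) * unfolded


# Usage: flat baselines at 10 (folded) and 0 (unfolded), Tm = 333.15 K (60 °C)
print(two_state_signal(333.15, 333.15, 2e5, 0, 10, 0, 0))  # 5.0
```

At T = Tm the signal lies exactly halfway between the two baselines, which is how the fitted Tm is read off a melting curve.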

Results and discussion

A detailed analysis of the results of SARS-CoV-2, Coronaviridae family and the entire virus realm with G4-iM Grinder can be found in the S1 File, Section 3.

G4-iM Grinder and settings

The genome of the SARS-CoV-2, and those of many other viruses, were analysed with G4-iM Grinder in search of potential quadruplex (both G4 and iM) therapeutic targets. To do so, we first expanded G4-iM Grinder’s quadruplex identification and characterization repertoire with two new functions, GiG.Seq.Analysis and GiG.df.GenomicFeatures. Other functions, such as G4iMGrinder and GiGList.Analysis, were upgraded to better analyse and summarise the quadruplex results obtained. Furthermore, over 2800 quadruplex-related sequences were collected from the literature and included in G4-iM Grinder’s database to rapidly identify confirmed G4s and iMs within all results.

An initial study of the SARS-CoV-2 genome and 18 other pathogenic viruses revealed special characteristics that need to be considered in quadruplex-related examinations of these organisms. For most of them, the original folding rule (which accepts no bulges within the runs and is very constrained in its quadruplex definitions) and the predefined parameters of G4-iM Grinder (which allow more freedom by accepting bulges and longer loops) are too strict to find associated runs that can give rise to quadruplexes. Although other organisms such as Plasmodium falciparum or Entamoeba histolytica may be poorer in G and C content [53], the size of their genomes enables finding G- or C-rich tracks that can ultimately form potential quadruplexes. In most viruses, however, this does not occur because of the small size of their genomes (in the range of tens to hundreds of thousands of nucleotides, versus tens of millions for the parasites mentioned and billions for humans). Furthermore, most of the G4s found in viruses are complex sequences, with short runs and bulges (for example, in HIV-1 [37, 39] and Ebola [47]), which elude detection when traditional quadruplex definitions are used. To overcome these problems, we took advantage of the great adaptability of G4-iM Grinder and developed, tested and successfully employed a lax quadruplex definition configuration for the analysis. With these settings, the number of candidates found increased greatly and included the complex sequences expected in viruses, at the expense of requiring more computational power.

SARS-CoV-2

With all these updates and configurations at hand, we focused on the reference SARS-CoV-2 genome and located 323 PQS and 189 PiMS unique sequences (i.e., occurring only once in the genome), dispersed unevenly along its genome (Fig 2). About 20% of these candidates had at least a medium probability of formation (|score| ≥ 20), and 7 PQS and 10 PiMS had a |score| ≥ 30 (Fig 2D). Candidates with at least a medium probability of formation are concentrated in the N, S and especially the orf1ab genes (in the nsp1 and nsp3 regions for PQS, and in the nsp3, nsp4 and nsp12 regions for PiMS). The orf3a, orf8 and UTR regions also presented these candidates. Other genes, such as orf7a, orf7b and orf10, were totally void of them.

Fig 2.

A. Top. Percentage of conservation of each PQS found along the genome of the SARS-CoV-2. Each point represents one PQS. The PQS score is given by the fill colour of the points, where lower |scores| are greyer and bluer points have higher |scores|. Bottom. PQS count density plot related to the genome position (counts per 200 nucleotides). Grey density plots are all the results found, whilst blue density plots are the results found with at least a |score| ≥ 20. B. Distribution of the biological features of the SARS-CoV-2 by genomic position. UTR regions are in red, CDS and gene regions are in green, and nsps of the orf1ab gene are in purple. Orange dots are mature protein regions of the CDS. C. Top. PiMS count density plot related to the genome position (counts per 200 nucleotides). Grey density plots are all the results found, whilst yellow density plots are the results found with at least a |score| ≥ 20. Bottom. Percentage of conservation of each PiMS found along the genome of the SARS-CoV-2. Each point represents one PiMS. The PiMS score is given by the fill colour of the points, where lower |scores| are greyer and higher |scores| are more yellow. D. Top-scoring PQS (Score ≥ 30, entries 1 to 7) and PiMS (Score ≤ -30, entries 8 to 17) found in the SARS-CoV-2, ordered by their localization in the genome. G-runs are in blue, C-runs are in yellow, loops are in red and bulges within the runs are in green. For each entry, the biological feature column lists the genomic landmark that hosts the potential quadruplex. The percentage of conservation is also given.

We calculated the conservation rates of the SARS-CoV-2 quadruplex candidates and the variability of the quadruplex-related regions under three different scopes.

First, attention was focused exclusively on the virus, in an intra-species analysis comprising 17312 SARS-CoV-2 genomes sequenced at different places and times during the pandemic. Here, we found that the least conserved candidates were located in the 5’UTR, orf1ab and N regions, with conservation as low as 9.8%. On the other hand, most of the analysed sequences with |score| ≥ 20 presented conservation rates of over 99% (46/71 PQSs and 21/35 PiMSs). Of these, only 18 PQSs and 7 PiMSs had conservation rates above the mean sequence identity between the 17312 SARS-CoV-2 genomes and the reference genome (99.83%). To further investigate these differences, we first identified the 5429 new PQS and 3298 new PiMS variants with |score| ≥ 20 amongst all the SARS-CoV-2 genomes, and then associated them with the versions found in the reference genome. In this manner, for one of the highest-scoring PQSs found in the N gene (entry 7, Fig 2D and entry 1, Fig 3A), we identified a variant with the same probability of formation (entry 3, Fig 3A) that is exclusive to the lineages within B.1.1/clade GR and B.1.160/clade G. These lineages carry a C-to-U substitution in the first loop, which, together with several other less frequent variations with similar modifications in the loops, partially explains the 99.08% conservation rate of this PQS. Furthermore, a nearby four-membered G-run may influence this PQS, to the point of potentially acting as a fifth domain [63, 64] or forming an alternative G4 (entry 2, Fig 3A). This extra G-run is separated from the PQS by a 19-nucleotide loop that has a conservation rate of only 35%. The most frequent variants found for this poorly conserved area also carried a C-to-U substitution, as seen before (entry 4, Fig 3A). Variants of lineage A/clade S displayed a different substitution, in which a C mutates to a G and becomes an additional G-run that may further influence the PQS (entry 5, Fig 3A).
How this affects the known activity of the PQS and the N gene is yet to be determined [65]. Variants of specific lineages with a heightened quadruplex formation probability were also detected for several other high-scoring candidates, including a PQS found in the 5’UTR (entry 1, Fig 2D and entry 6, Fig 3A) and a PiMS in the orf1ab gene (entry 14, Fig 2D and entry 8, Fig 3A); these lineage variants are the only results found in the SARS-CoV-2 with a high probability of forming a quadruplex (|score| ≥ 40).
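
Substitutions such as the loop changes discussed above can be listed by comparing an aligned variant with its reference version. In this minimal Python sketch, the helper names and the example sequences are hypothetical (they are not the Fig 3A entries), and the inputs are assumed to be pre-aligned and of equal length; `region` simply reports whether a reference position lies in a G-run of at least two residues or elsewhere, which is treated here as loop.

```python
# Illustrative sketch; helper names and sequences are hypothetical.
def substitutions(ref: str, variant: str) -> list[tuple[int, str, str]]:
    """(1-based position, reference base, variant base) for each mismatch.

    Both sequences are assumed to be already aligned and of equal length.
    """
    if len(ref) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(ref, variant)) if r != v]


def region(ref: str, pos: int, base: str = "G", min_run: int = 2) -> str:
    """'run' if 1-based `pos` lies in a run of >= min_run `base`s, else 'loop'."""
    i = pos - 1
    if ref[i] != base:
        return "loop"
    start, end = i, i
    while start > 0 and ref[start - 1] == base:
        start -= 1
    while end + 1 < len(ref) and ref[end + 1] == base:
        end += 1
    return "run" if end - start + 1 >= min_run else "loop"


# Usage with hypothetical sequences: a C-to-U change in the first loop
ref, var = "GGACGGAGGAGG", "GGAUGGAGGAGG"
for pos, r, v in substitutions(ref, var):
    print(pos, r, v, region(ref, pos))  # 4 C U loop
```

Running `region` on the variant instead of the reference would flag the opposite case described in the text, where a C-to-G change turns a loop position into part of a new G-run.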

Fig 3.

A. Sequences found in the SARS-CoV-2 reference genome (those with a starting position) and some of the variants identified in specific lineages for four high-scoring candidates. Mutations are underlined. B. Centre, SARS-CoV-2 phylogenetic tree by clade and lineage of the sequences analysed. Lineages with fewer than 100 genomes were grouped (suffix x). Inner segment, Lineage: Mean PQS count (A), Mean PQS count with |score| ≥ 20 (B) and PQS divergence from the reference genome (C). Centre segment, Mean lineage percentage sequence identity with the reference genome (dots) compared to the overall mean found for the 17312 sequences analysed (black line). Outer segment, Clade: Mean PQS count (A), Mean PQS count with |score| ≥ 20 (B) and PQS divergence from the reference genome (C). R packages used: ggtree [66] and circlize [67].

We observed significant differences between the SARS-CoV-2 lineages and clades when considering the overall PQS differences. On the one hand, the GR clade displayed fewer PQSs, fewer PQSs with |score| ≥ 20, and the fewest variants per genome analysed (Fig 3C). On the other hand, the S clade presented, on average, additional PQSs in its genome and a higher number of variants per genome analysed. Both clades differed significantly from the reference genome, as well as from each other. The rest of the clades presented fewer differences, although some specific lineage groupings (B.1.1/Clade O and B.1.1/Clade G) also displayed a lower number of PQSs overall. For PiMS, the differences between clades were smaller and more homogeneous (S1 File, section 3, Fig 2C).

The search was then expanded to the rest of the Coronaviridae family. In total, 53 SARS-CoV-2 PQS and PiMS candidates were found in common with the SARS-CoV and/or the Bat coronavirus BM48-31/BGR/2008 (Bat-CoV-BM), all of which are suspected of having had bats as hosts during their evolution (S1 File, section 3, Fig 3). These common sequences were located in the 3’UTR, N and E genes of the SARS-CoV-2, although most were positioned in the orf1ab gene and especially in the 5’UTR region. Paradoxically, the candidates found in the 5’UTR (which regulates the translation of the RNA transcript) include the least conserved group of candidates of the intra-species analysis (with conservation rates as low as 9%), while also hosting a group of candidates that is highly conserved across the family. On the one hand, high conservation of candidates (maintained through natural selection) may be an important factor for the survival of the virus, and this importance may extend beyond the SARS-CoV-2 to the related species in which the same PQS and PiMS were found. On the other hand, variability in the region may also play a vital role in the ability of the virus to adapt to new hosts, situations and environments.

The highest-|scoring| candidates found in the SARS-CoV-2, however, were not common to any other Coronaviridae member species. We therefore investigated the differences between them through genome alignments and found that most of the sequence versions amongst species (6 out of 8) were still able to form potential quadruplex structures despite the modifications. These PQS and PiMS, although different from those in the SARS-CoV-2, may therefore maintain their potential biological role and importance.
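
For readers unfamiliar with global alignment, the following Needleman-Wunsch sketch in Python shows what a global pairwise alignment computes. It is not the Biostrings `pairwiseAlignment` implementation used in this work: the function name is hypothetical, and it uses a linear gap penalty with simple match/mismatch scores instead of the scoring scheme and affine gap model of Biostrings. Percent identity is reported over the alignment length, which is one of several possible definitions.

```python
# Illustrative Needleman-Wunsch sketch; not Biostrings::pairwiseAlignment.
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str, int, float]:
    """Global alignment with a linear gap penalty.

    Returns the two aligned strings, the alignment score and the percent
    identity (identical positions / alignment length).
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    aligned_a, aligned_b = "".join(reversed(out_a)), "".join(reversed(out_b))
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    identity = 100 * same / len(aligned_a) if aligned_a else 0.0
    return aligned_a, aligned_b, score[n][m], identity


# Usage: one G deleted from a toy G-rich segment
a_aln, b_aln, s, ident = global_align("GGAGGAGG", "GGAGAGG")
print(a_aln)
print(b_aln)
print(s, ident)  # 5 87.5
```

Once a candidate's counterpart in another species has been located this way, the aligned segment can be re-scored to check whether it can still form a quadruplex despite the differences.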

Expanding the search for common candidates to the entire virus realm, we matched one PQS and one PiMS from the SARS-CoV-2 with potential quadruplexes found in four viruses from Group I belonging to the Herpesviridae, Podoviridae and Siphoviridae families (all dsDNA); these matches cannot be explained by the number of sequences analysed alone.

SARS-CoV-2 and the virus realm

We analysed the entire virus realm in a similar fashion to other studies in the literature [68, 69]. However, we employed the lax definition of quadruplexes to detect G- and C- structures and searched for verified G4 and iM sequences already described in literature. These results were then matched and compared to SARS-CoV-2.

Whilst the SARS-CoV-2 did not present within its genome any of the published quadruplex sequences listed in the GiG.DB (as of V2.5.0), other viruses, including a wigeon-afflicting coronavirus, did. In the entire virus realm, 1725 viruses presented at least one confirmed G4 sequence in their genome, while 195 presented at least one confirmed iM sequence (the difference in magnitude between both results may partially be due to the difference in the number of G4 and iM entries in the database: 2568 and 283, respectively). The sheer number of species with confirmed quadruplex structures across all groups of viruses suggests that quadruplexes may be common and necessary genomic regulatory elements for viruses, as seen in other organisms such as humans. However, the prevalence is not homogeneous: it varies broadly at the group level, although less so at the family level. For example, families such as Group I’s Herpesviridae and Sphaerolipoviridae, Group IV’s Matonaviridae and Flaviviridae, and Group II’s Spiroviridae presented the highest PQS densities, whilst Group V’s Aspiviridae and Fimoviridae, Group IV’s Mononiviridae and Mesoniviridae, and Group I’s Mimiviridae displayed the lowest. PiMS showed a similar tendency, with Group I (Sphaerolipoviridae and Herpesviridae) and especially Group IV (Tymoviridae, Matonaviridae and Gammaflexiviridae) families being the densest in candidates, whilst Group IV (Monoviridae and Yueviridae), Group V (Fimoviridae and Phasmaviridae) and Group I (Mimiviridae) families displayed the lowest. These results indicate that viruses and families (particularly single-stranded ones) probably favour one kind of quadruplex structure independently of their group or genome type, whilst formation remains contingent upon the cation concentration and pH of the environment.

Altogether, the SARS-CoV-2 genome displayed a scarcity of quadruplex candidates when compared, from a macroscopic perspective, with the virus realm. Its PQS and PiMS densities were at the lower end of the results for the Coronaviridae family, which was itself at the lower end of the (+)ssRNA Group IV (in an approximate ratio of 1:2:4 for PQS and 1:2:8 for PiMS). In the context of the entire virus realm, the SARS-CoV-2 PQS density was lower than that of 5813 other viruses analysed (out of 6680), whilst its PiMS density was lower than that of 6125. Furthermore, when we compared the SARS-CoV-2 reference genome results with those of five hundred randomly shuffled sequences of the same size and composition as the SARS-CoV-2 genome, the number of candidates found in the SARS-CoV-2 was significantly lower than the mean number expected for the genome’s size and composition. Whilst 362 ± 42 PQSs were expected, only 323 were found in the SARS-CoV-2. Similarly, 97 ± 22 PQSs scoring over 20 and 3.0 ± 2.6 candidates scoring over 40 were expected for this genomic size and composition, whilst 71 and 0 were found in the virus, respectively. For PiMS, the expected total number of candidates for the SARS-CoV-2 size and composition was 250 ± 33, and the expected numbers of candidates scoring -20 or less and -40 or less were 60 ± 16 and 1.5 ± 1.6, respectively. However, the SARS-CoV-2 presented only 189, 32 and 0 PiMSs for each of these groups. Although the SARS-CoV-2 genomic organization limits the number of potential quadruplex structures almost to a minimum, other viruses with similarly low quadruplex densities were found here to contain confirmed G4 and iM sequences, supporting the potential of these structures as targets in the SARS-CoV-2.
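
The shuffled-genome comparison can be sketched as a permutation null model: shuffle the genome (which preserves its length and mononucleotide composition), count candidates in each shuffle, and compare the observed count with the resulting distribution. In this Python sketch, `count_pqs` is a hypothetical stand-in counter (four G3+ runs separated by 1-7 nt loops, counted without overlaps) rather than GiG's Lax search, the other helper names are illustrative, and shuffling at the mononucleotide level is an assumption about how the shuffled sequences were generated.

```python
# Illustrative permutation-null sketch; counter and helper names are hypothetical.
import random
import re
from statistics import mean, stdev


def count_pqs(seq: str) -> int:
    """Stand-in counter: four G3+ runs with 1-7 nt loops (non-overlapping)."""
    return len(re.findall(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}", seq))


def null_counts(seq: str, counter, n_shuffles: int = 500, seed: int = 0) -> list[int]:
    """Candidate counts in shuffled copies of `seq`.

    Shuffling the residues keeps the length and mononucleotide composition.
    """
    rng = random.Random(seed)
    residues = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        counts.append(counter("".join(residues)))
    return counts


def summarize(observed: int, counts: list[int]) -> tuple[float, float, float]:
    """Mean and SD of the null counts, plus the fraction of shuffles <= observed."""
    frac_le = sum(c <= observed for c in counts) / len(counts)
    return mean(counts), stdev(counts), frac_le


# Usage with a random toy genome (not SARS-CoV-2)
toy = "".join(random.Random(1).choice("ACGU") for _ in range(20_000))
mu, sd, p = summarize(count_pqs(toy), null_counts(toy, count_pqs, n_shuffles=50))
print(f"expected {mu:.1f} +/- {sd:.1f}; fraction of shuffles <= observed: {p:.2f}")
```

The fraction of shuffles with a count at or below the observed value acts as an empirical one-sided p-value, which complements the mean ± SD summary used in the text.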

Candidate confirmation in vitro

We therefore selected the best candidates for in vitro evaluation. The NMR spectra of CoVID-RNA.G4-1 and CoVID-RNA.G4-2 exhibited imino signals in the 10.5–12.0 ppm region, characteristic of guanine imino protons involved in G-tetrads (Fig 4A and 4B). In both cases, the CD spectra also showed the positive band characteristic of parallel G-quadruplexes, which, together with the NMR results, confirmed the formation of very stable structures. The highly conserved CoVID-RNA.G4-1, located in the N gene, could potentially interact with viral RNA packaging, transcription and replication functions [70]. Indeed, a recent study showed that a known G4 ligand can interact with this sequence and reduce expression of the N protein [65]. Although CoVID-RNA.G4-2 also formed a stable parallel quadruplex, its NMR signals were broader than those of CoVID-RNA.G4-1, possibly owing to the formation of higher-order structures through self-association of G-quadruplex units. CoVID-RNA.G4-2 is located in the nsp3 region of orf1ab, very near its SUD domain. This area has been associated with the increased pathogenicity of the virus compared with other Coronaviridae that lack it [71], and it has been suggested that the SUD domain interacts with host G-quadruplexes. Our results additionally raise the possibility of an intrinsic gene modulation that may be linked to increased virulence. This hypothesis may extend to SARS-CoV, as another stable PQS candidate was found at the same location in its genome (S1 File, Section 3, Fig 3B1).

Fig 4.

A. Candidates examined in vitro through biophysical assays; the in vitro column states whether the sequence forms a quadruplex (Y, yes; N, no). B. NMR spectra of the two RNA-G4s at different temperatures (pH 7.0). C. NMR spectra of the DNA-iM at different temperatures (pH 5.3). D. CD spectra of the two RNA-G4s.

For PiMS, NMR confirmed that the DNA version of a candidate located in the orf1ab gene of SARS-CoV-2, with a 99.54% conservation rate, formed an iM at near-neutral pH (Fig 4C and S1 File, Section 3, Fig 5). However, the SARS-CoV version of this iM (which differs by one nucleotide in the first loop, TG instead of TT) failed to form even at pH 5.1. As T·T pairs are common capping elements, substituting the second T might prevent folding in SARS-CoV. Additionally, the presence of C in G4s lowers the overall stability of the quadruplex, as C can base pair with G and ultimately hinder G-quartet formation [72]; similarly, pairing of G with C may impede the formation of C-based structures. The RNA version of the SARS-CoV-2 candidate did not form an iM. Although the sequences found in SARS-CoV-2 have an intermediate probability of formation, RNA iMs are known to be less stable than their DNA counterparts [73]. Nevertheless, G4-iM Grinder identified several other candidates in the virus with the potential to form iMs.
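A conservation rate such as the 99.54% quoted above can be illustrated as the percentage of sequenced genomes that contain the candidate unchanged. The exact-substring definition, the helper names and the toy sequences below are assumptions made for illustration; the precise criterion used by G4-iM Grinder is not given in this section. The same helper also shows the DNA/RNA conversion implied by testing "DNA versions" of RNA candidates.

```python
from __future__ import annotations


def to_dna(seq: str) -> str:
    """Upper-case and convert U to T so RNA and DNA strings compare equal."""
    return seq.upper().replace("U", "T")


def conservation_rate(candidate: str, genomes: list[str]) -> float:
    """Percentage of genomes containing `candidate` as an exact substring
    (assumed definition)."""
    if not genomes:
        raise ValueError("need at least one genome")
    cand = to_dna(candidate)
    hits = sum(1 for g in genomes if cand in to_dna(g))
    return 100.0 * hits / len(genomes)


# Toy sequences only. The second genome carries a TT -> TG change in the motif,
# so under this exact-match definition it does not count as conserved.
genomes = ["AACCTTCCAA", "AACCTGCCAA", "GGCCUUCCGG", "TTTTTTTTTT"]
print(conservation_rate("CCUUCC", genomes))  # 50.0
```

Under an exact-match criterion, a single substitution like the TT/TG loop difference between SARS-CoV-2 and SARS-CoV is enough to remove a genome from the conserved set, which is why variants are tracked separately.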

PQS result comparison

The results of G4-iM Grinder were compared with other recent reports of quadruplex-related analyses of the single-stranded SARS-CoV-2 genome. QGRS Mapper [74] has been the main tool for these searches because of its browser-based interface, its predefined capability to detect two-residue G-runs and its design, which returns all the PQSs found regardless of their score [65, 75–78]. Other search engines, such as G4Hunter and PQSfinder, automatically filter their results by a score threshold, which makes optimizing that criterion fundamental to a successful analysis. For example, using G4Hunter (on its scale of -4 to 4), one PQS was found in the virus with a threshold of ≥ 1.2 and none with higher thresholds [75]. In contrast, 25 candidates have been reported using QGRS Mapper, all with very low scores (mean QGRS score of 12 ± 5 on QGRS Mapper's scale of ≈ 0 to 100; mean G4Hunter score of 0.6 ± 0.2 on the G4Hunter scale). G4Catchall [79], PQSfinder and QGRS Mapper were also combined to select 15 PQSs, 13 of which were among the original QGRS Mapper results [80]. With one exception, all the sequences reported to date in SARS-CoV-2 were also found by G4-iM Grinder and are part of the analysis presented here. They belong mainly to the 71 sequences with a medium probability of forming G4s (scores between 20 and 40 on G4-iM Grinder's scale). G4-iM Grinder, however, found 47 additional PQSs with the same probability of forming G4s that had not been previously reported for SARS-CoV-2. Additionally, over 5000 different variants of these PQSs with the same probability were identified in the analysis of the 17312 different SARS-CoV-2 genomes.
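The threshold dependence described above is easier to see with G4Hunter's scoring scheme in hand. The sketch below follows our understanding of the published method [56] and is not its reference implementation: each G is scored by the length of the G-run it belongs to (capped at 4), Cs are scored negatively in the same way, other bases score 0, and the mean is taken over a sliding window. The 25-nt window and 1.2 threshold used as defaults here are assumptions for illustration.

```python
from __future__ import annotations


def g4hunter_scores(seq: str) -> list[int]:
    """Per-nucleotide G4Hunter-style scores.

    Each G in a run of n consecutive Gs scores min(n, 4); each C in a run of
    n Cs scores -min(n, 4); every other base (A, T, U, N...) scores 0.
    """
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1
        run = min(j - i, 4)
        value = run if base == "G" else -run if base == "C" else 0
        scores[i:j] = [value] * (j - i)
        i = j
    return scores


def g4hunter_mean(seq: str) -> float:
    """Mean score of the whole sequence (range -4 to 4)."""
    if not seq:
        raise ValueError("empty sequence")
    s = g4hunter_scores(seq)
    return sum(s) / len(s)


def windows_above(seq: str, window: int = 25,
                  threshold: float = 1.2) -> list[tuple[int, float]]:
    """Start positions and mean scores of windows with |score| >= threshold.

    Positive windows are G-rich (G4 potential on this strand); negative ones
    are C-rich (G4 potential on the complementary strand).
    """
    s = g4hunter_scores(seq)
    hits = []
    for start in range(len(s) - window + 1):
        score = sum(s[start:start + window]) / window
        if abs(score) >= threshold:
            hits.append((start, score))
    return hits


print(round(g4hunter_mean("GGGTTAGGGTTAGGGTTAGGG"), 3))  # 1.714
print(windows_above("GGGTTAGGGTTAGGGTTAGGG", window=21))
```

Because two-residue G-runs contribute only 2 per G and loop bases contribute 0, a G2-based candidate with long loops averages well below a three-residue telomeric-like repeat, which illustrates why such candidates can fall under a 1.2 cut-off even when a lax, unfiltered search reports them.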

Overall, these results complement current knowledge of quadruplexes in SARS-CoV-2. They also open new avenues for targeting viruses in general, and SARS-CoV-2 in particular, by using these nucleic acid sequences as therapeutic targets in future antiviral treatments. Small-molecule G4 ligands that stabilize G4s have recently been proposed as viable antiviral strategies for viruses such as Ebola, HIV and HCV (reviewed in [50]). For SARS-CoV-2, G4 ligands have already been reported to significantly reduce protein translation levels in vivo and in vitro [65]. Another report highlighted existing evidence that helicase inhibitors may also exert antiviral activity, offering a further therapeutic approach against SARS-CoV-2 [78].

Supporting information

S1 File

(PDF)

Acknowledgments

The authors thank Dr. Matilde Arévalo, Rafael Ferreira and Sarah Heselden for their help regarding this topic.

Data Availability

All results and the R package are available from the URL https://github.com/EfresBR/G4iMGrinder.

Funding Statement

M.B-L & J.G: a. Grants: 1. NORTE-01-0145-FEDER-000019, 2. NORTE-01-0145-FEDER-031142 and 3. 0624_2IQBIONEURO_6_E. b. Funders: 1. 2014-2020 North Portugal Regional Operational Program (NORTE 2020) and the European Regional Development Fund (ERDF), 2. the Fundação para a Ciência e a Tecnologia (FCT), ERDF and NORTE 2020, 3. 2014-2020 INTERREG Cooperation Programme Spain–Portugal (POCTEP). c. URLs: 1. https://norte2020.pt, https://ec.europa.eu/regional_policy/en/funding/erdf/, 2. https://www.fct.pt, https://norte2020.pt, https://ec.europa.eu/regional_policy/en/funding/erdf/, 3. https://interreg.eu/programme/interreg-spain-portugal-poctep/. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. C.G: a. Grants: BFU2017-89707-P. b. Funders: Spanish Ministry of Science, Innovation and Universities (MCIU). c. URL: https://www.ciencia.gob.es. d. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Benvenuto D, Giovanetti M, Ciccozzi A, Spoto S, Angeletti S, Ciccozzi M. The 2019-new coronavirus epidemic: Evidence for virus evolution. J Med Virol. 2020;92: 455–459. doi: 10.1002/jmv.25688
  • 2. Perlman S. Another Decade, Another Coronavirus. N Engl J Med. 2020;382: 760–762. doi: 10.1056/NEJMe2001126
  • 3. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579: 270–273. doi: 10.1038/s41586-020-2012-7
  • 4. Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. The proximal origin of SARS-CoV-2. Nat Med. 2020;26: 450–452. doi: 10.1038/s41591-020-0820-9
  • 5. Lau SKP, Luk HKH, Wong ACP, Li KSM, Zhu L, He Z, et al. Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2. Emerg Infect Dis. 2020;26: 1542–1547. doi: 10.3201/eid2607.200092
  • 6. Huang C, Wang Y, Li X, Ren L, Zhao J, Hu Y, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020;395: 497–506. doi: 10.1016/S0140-6736(20)30183-5
  • 7. Chen N, Zhou M, Dong X, Qu J, Gong F, Han Y, et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet. 2020;395: 507–513. doi: 10.1016/S0140-6736(20)30211-7
  • 8. Peck KM, Burch CL, Heise MT, Baric RS. Coronavirus Host Range Expansion and Middle East Respiratory Syndrome Coronavirus Emergence: Biochemical Mechanisms and Evolutionary Perspectives. Annu Rev Virol. 2015;2: 95–117. doi: 10.1146/annurev-virology-100114-055029
  • 9. Menachery VD, Yount BL, Debbink K, Agnihothram S, Gralinski LE, Plante JA, et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat Med. 2015;21: 1508–1513. doi: 10.1038/nm.3985
  • 10. Cheng VCC, Lau SKP, Woo PCY, Yuen KY. Severe Acute Respiratory Syndrome Coronavirus as an Agent of Emerging and Reemerging Infection. Clin Microbiol Rev. 2007;20: 660–694. doi: 10.1128/CMR.00023-07
  • 11. Gellert M, Lipsett MN, Davies DR. Helix formation by guanylic acid. Proc Natl Acad Sci USA. 1962;48: 2013–2018. doi: 10.1073/pnas.48.12.2013
  • 12. Day HA, Pavlou P, Waller ZAE. i-Motif DNA: Structure, stability and targeting with ligands. Bioorg Med Chem. 2014;22: 4407–4418. doi: 10.1016/j.bmc.2014.05.047
  • 13. Benabou S, Aviñó A, Eritja R, González C, Gargallo R. Fundamental aspects of the nucleic acid i-motif structures. RSC Adv. 2014;4: 26956–26980. doi: 10.1039/C4RA02129K
  • 14. Abou Assi H, Garavís M, González C, Damha MJ. i-Motif DNA: structural features and significance to cell biology. Nucleic Acids Res. 2018;46: 8038–8056. doi: 10.1093/nar/gky735
  • 15. Manzini G, Yathindra N, Xodo LE. Evidence for intramolecularly folded i-DNA structures in biologically relevant CCC-repeat sequences. Nucleic Acids Res. 1994;22: 4634–4640. doi: 10.1093/nar/22.22.4634
  • 16. Du Z, Zhao Y, Li N. Genome-wide analysis reveals regulatory role of G4 DNA in gene transcription. Genome Res. 2008;18: 233–241. doi: 10.1101/gr.6905408
  • 17. Bugaut A, Balasubramanian S. 5’-UTR RNA G-quadruplexes: translation regulation and targeting. Nucleic Acids Res. 2012;40: 4727–4741. doi: 10.1093/nar/gks068
  • 18. Rhodes D, Lipps HJ. G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res. 2015;43: 8627–8637. doi: 10.1093/nar/gkv862
  • 19. Wright EP, Huppert JL, Waller ZAE. Identification of multiple genomic DNA sequences which form i-motif structures at neutral pH. Nucleic Acids Res. 2017;45: 2951–2959. doi: 10.1093/nar/gkx090
  • 20. Kwok CK, Merrick CJ. G-Quadruplexes: Prediction, Characterization, and Biological Application. Trends Biotechnol. 2017;35: 997–1013. doi: 10.1016/j.tibtech.2017.06.012
  • 21. Varshney D, Spiegel J, Zyner K, Tannahill D, Balasubramanian S. The regulation and functions of DNA and RNA G-quadruplexes. Nat Rev Mol Cell Biol. 2020;21: 459–474. doi: 10.1038/s41580-020-0236-x
  • 22. Hershman SG, Chen Q, Lee JY, Kozak ML, Yue P, Wang L-S, et al. Genomic distribution and functional analyses of potential G-quadruplex-forming sequences in Saccharomyces cerevisiae. Nucleic Acids Res. 2008;36: 144–156. doi: 10.1093/nar/gkm986
  • 23. Capra JA, Paeschke K, Singh M, Zakian VA. G-Quadruplex DNA Sequences Are Evolutionarily Conserved and Associated with Distinct Genomic Features in Saccharomyces cerevisiae. Stormo GD, editor. PLoS Comput Biol. 2010;6: e1000861. doi: 10.1371/journal.pcbi.1000861
  • 24. Paeschke K, Capra JA, Zakian VA. DNA Replication through G-Quadruplex Motifs Is Promoted by the Saccharomyces cerevisiae Pif1 DNA Helicase. Cell. 2011;145: 678–691. doi: 10.1016/j.cell.2011.04.015
  • 25. Götz S, Pandey S, Bartsch S, Juranek S, Paeschke K. A Novel G-Quadruplex Binding Protein in Yeast—Slx9. Molecules. 2019;24: 1774. doi: 10.3390/molecules24091774
  • 26. Rawal P. Genome-wide prediction of G4 DNA as regulatory motifs: Role in Escherichia coli global regulation. Genome Res. 2006;16: 644–655. doi: 10.1101/gr.4508806
  • 27. Waller ZAE, Pinchbeck BJ, Buguth BS, Meadows TG, Richardson DJ, Gates AJ. Control of bacterial nitrate assimilation by stabilization of G-quadruplex DNA. Chem Commun (Camb). 2016;52: 13511–13514. doi: 10.1039/c6cc06057a
  • 28. Ding Y, Fleming AM, Burrows CJ. Case studies on potential G-quadruplex-forming sequences from the bacterial orders Deinococcales and Thermales derived from a survey of published genomes. Sci Rep. 2018;8: 15679. doi: 10.1038/s41598-018-33944-4
  • 29. Bartas M, Čutová M, Brázda V, Kaura P, Šťastný J, Kolomazník J, et al. The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria. Molecules. 2019;24: 1711. doi: 10.3390/molecules24091711
  • 30. Shao X, Zhang W, Umar MI, Wong HY, Seng Z, Xie Y, et al. RNA G-Quadruplex Structures Mediate Gene Regulation in Bacteria. Chang Y-F, editor. mBio. 2020;11: e02926–19. doi: 10.1128/mBio.02926-19
  • 31. Abu-Ghazalah RM, Macgregor RB. Structural polymorphism of the four-repeat Oxytricha nova telomeric DNA sequences. Biophys Chem. 2009;141: 180–185. doi: 10.1016/j.bpc.2009.01.013
  • 32. Harris LM, Merrick CJ. G-Quadruplexes in Pathogens: A Common Route to Virulence Control? PLoS Pathog. 2015;11: e1004562. doi: 10.1371/journal.ppat.1004562
  • 33. Bhartiya D, Chawla V, Ghosh S, Shankar R, Kumar N. Genome-wide regulatory dynamics of G-quadruplexes in human malaria parasite Plasmodium falciparum. Genomics. 2016;108: 224–231. doi: 10.1016/j.ygeno.2016.10.004
  • 34. Demkovičová E, Bauer Ľ, Krafčíková P, Tlučková K, Tóthova P, Halaganová A, et al. Telomeric G-Quadruplexes: From Human to Tetrahymena Repeats. J Nucleic Acids. 2017;2017: 1–14. doi: 10.1155/2017/9170371
  • 35. Belmonte-Reche E, Martínez-García M, Guédin A, Zuffo M, Arévalo-Ruiz M, Doria F, et al. G-Quadruplex Identification in the Genome of Protozoan Parasites Points to Naphthalene Diimide Ligands as New Antiparasitic Agents. J Med Chem. 2018;61: 1231–1240. doi: 10.1021/acs.jmedchem.7b01672
  • 36. Dumetz F, Merrick C. Parasitic Protozoa: Unusual Roles for G-Quadruplexes in Early-Diverging Eukaryotes. Molecules. 2019;24: 1339. doi: 10.3390/molecules24071339
  • 37. Perrone R, Nadai M, Frasson I, Poe JA, Butovskaya E, Smithgall TE, et al. A Dynamic G-Quadruplex Region Regulates the HIV-1 Long Terminal Repeat Promoter. J Med Chem. 2013;56: 6521–6530. doi: 10.1021/jm400914r
  • 38. Perrone R, Nadai M, Poe JA, Frasson I, Palumbo M, Palù G, et al. Formation of a Unique Cluster of G-Quadruplex Structures in the HIV-1 nef Coding Region: Implications for Antiviral Activity. Qiu J, editor. PLoS ONE. 2013;8: e73121. doi: 10.1371/journal.pone.0073121
  • 39. Amrane S, Kerkour A, Bedrat A, Vialet B, Andreola M-L, Mergny J-L. Topology of a DNA G-Quadruplex Structure Formed in the HIV-1 Promoter: A Potential Target for Anti-HIV Drug Development. J Am Chem Soc. 2014;136: 5249–5252. doi: 10.1021/ja501500c
  • 40. Norseen J, Johnson FB, Lieberman PM. Role for G-quadruplex RNA binding by Epstein-Barr virus nuclear antigen 1 in DNA replication and metaphase chromosome attachment. J Virol. 2009;83: 10336–10346. doi: 10.1128/JVI.00747-09
  • 41. Murat P, Zhong J, Lekieffre L, Cowieson NP, Clancy JL, Preiss T, et al. G-quadruplexes regulate Epstein-Barr virus–encoded nuclear antigen 1 mRNA translation. Nat Chem Biol. 2014;10: 358–364. doi: 10.1038/nchembio.1479
  • 42. Tlučková K, Marušič M, Tóthová P, Bauer L, Šket P, Plavec J, et al. Human Papillomavirus G-Quadruplexes. Biochemistry. 2013;52: 7207–7216. doi: 10.1021/bi400897g
  • 43. Zahin M, Dean WL, Ghim S, Joh J, Gray RD, Khanal S, et al. Identification of G-quadruplex forming sequences in three manatee papillomaviruses. Buratti E, editor. PLoS ONE. 2018;13: e0195625. doi: 10.1371/journal.pone.0195625
  • 44. Artusi S, Nadai M, Perrone R, Biasolo MA, Palù G, Flamand L, et al. The Herpes Simplex Virus-1 genome contains multiple clusters of repeated G-quadruplex: Implications for the antiviral activity of a G-quadruplex ligand. Antiviral Res. 2015;118: 123–131. doi: 10.1016/j.antiviral.2015.03.016
  • 45. Biswas B, Kumari P, Vivekanandan P. Pac1 Signals of Human Herpesviruses Contain a Highly Conserved G-Quadruplex Motif. ACS Infect Dis. 2018;4: 744–751. doi: 10.1021/acsinfecdis.7b00279
  • 46. Biswas B, Kandpal M, Vivekanandan P. A G-quadruplex motif in an envelope gene promoter regulates transcription and virion secretion in HBV genotype B. Nucleic Acids Res. 2017;45: 11268–11280. doi: 10.1093/nar/gkx823
  • 47. Wang S-R, Zhang Q-Y, Wang J-Q, Ge X-Y, Song Y-Y, Wang Y-F, et al. Chemical Targeting of a G-Quadruplex RNA in the Ebola Virus L Gene. Cell Chem Biol. 2016;23: 1113–1122. doi: 10.1016/j.chembiol.2016.07.019
  • 48. Fleming AM, Ding Y, Alenko A, Burrows CJ. Zika Virus Genomic RNA Possesses Conserved G-Quadruplexes Characteristic of the Flaviviridae Family. ACS Infect Dis. 2016;2: 674–681. doi: 10.1021/acsinfecdis.6b00109
  • 49. Ravichandran S, Kim Y-E, Bansal V, Ghosh A, Hur J, Subramani VK, et al. Genome-wide analysis of regulatory G-quadruplexes affecting gene expression in human cytomegalovirus. Lieberman PM, editor. PLoS Pathog. 2018;14: e1007334. doi: 10.1371/journal.ppat.1007334
  • 50. Ruggiero E, Richter SN. G-quadruplexes and G-quadruplex ligands: targets and tools in antiviral therapy. Nucleic Acids Res. 2018;46: 3270–3283. doi: 10.1093/nar/gky187
  • 51. Ruggiero E, Lago S, Šket P, Nadai M, Frasson I, Plavec J, et al. A dynamic i-motif with a duplex stem-loop in the long terminal repeat promoter of the HIV-1 proviral genome modulates viral transcription. Nucleic Acids Res. 2019;47: 11057–11068. doi: 10.1093/nar/gkz937
  • 52. Brazier JA, Shah A, Brown GD. I-Motif formation in gene promoters: unusually stable formation in sequences complementary to known G-quadruplexes. Chem Commun. 2012;48: 10739–10741. doi: 10.1039/c2cc30863k
  • 53. Belmonte-Reche E, Morales JC. G4-iM Grinder: when size and frequency matter. G-Quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genom Bioinform. 2020;2: lqz005. doi: 10.1093/nargab/lqz005
  • 54. Métifiot M, Amrane S, Litvak S, Andreola M-L. G-quadruplexes in viruses: function and potential therapeutic applications. Nucleic Acids Res. 2014;42: 12352–12366. doi: 10.1093/nar/gku999
  • 55. NCBI Resource Coordinators, Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018;46: D8–D13. doi: 10.1093/nar/gkx1095
  • 56. Bedrat A, Lacroix L, Mergny J-L. Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 2016;44: 1746–1759. doi: 10.1093/nar/gkw006
  • 57. Hon J, Martínek T, Zendulka J, Lexa M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics. 2017;33: 3373–3379. doi: 10.1093/bioinformatics/btx413
  • 58. Huppert JL. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005;33: 2908–2916. doi: 10.1093/nar/gki609
  • 59. Todd AK. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005;33: 2901–2907. doi: 10.1093/nar/gki553
  • 60. Miskiewicz J, Sarzynska J, Szachniuk M. How bioinformatics resources work with G4 RNAs. Brief Bioinform. 2020; bbaa201. doi: 10.1093/bib/bbaa201
  • 61. Mir B, Serrano I, Buitrago D, Orozco M, Escaja N, González C. Prevalent Sequences in the Human Genome Can Form Mini i-Motif Structures at Physiological pH. J Am Chem Soc. 2017;139: 13985–13988. doi: 10.1021/jacs.7b07383
  • 62. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Euro Surveill. 2017;22: 30494. doi: 10.2807/1560-7917.ES.2017.22.13.30494
  • 63. Fleming AM, Zhou J, Wallace SS, Burrows CJ. A Role for the Fifth G-Track in G-Quadruplex Forming Oncogene Promoter Sequences during Oxidative Stress: Do These “Spare Tires” Have an Evolved Function? ACS Cent Sci. 2015;1: 226–233. doi: 10.1021/acscentsci.5b00202
  • 64. Omaga CA, Fleming AM, Burrows CJ. The Fifth Domain in the G-Quadruplex-Forming Sequence of the Human NEIL3 Promoter Locks DNA Folding in Response to Oxidative Damage. Biochemistry. 2018;57: 2958–2970. doi: 10.1021/acs.biochem.8b00226
  • 65. Zhao C, Qin G, Niu J, Wang Z, Wang C, Ren J, et al. Targeting RNA G‐Quadruplex in SARS‐CoV‐2: A Promising Therapeutic Target for COVID‐19? Angew Chem Int Ed. 2021;60: 432–438. doi: 10.1002/anie.202011419
  • 66. Yu G, Smith DK, Zhu H, Guan Y, Lam TT. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. McInerny G, editor. Methods Ecol Evol. 2017;8: 28–36. doi: 10.1111/2041-210X.12628
  • 67. Gu Z, Gu L, Eils R, Schlesner M, Brors B. circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30: 2811–2812. doi: 10.1093/bioinformatics/btu393
  • 68. Lavezzo E, Berselli M, Frasson I, Perrone R, Palù G, Brazzale AR, et al. G-quadruplex forming sequences in the genome of all known human viruses: A comprehensive guide. Lexa M, editor. PLoS Comput Biol. 2018;14: e1006675. doi: 10.1371/journal.pcbi.1006675
  • 69. Puig Lombardi EP, Londoño-Vallejo A, Nicolas A. Relationship Between G-Quadruplex Sequence Composition in Viruses and Their Hosts. Molecules. 2019;24: 1942. doi: 10.3390/molecules24101942
  • 70. Narayanan K, Kim KH, Makino S. Characterization of N protein self-association in coronavirus ribonucleoprotein complexes. Virus Res. 2003;98: 131–140. doi: 10.1016/j.virusres.2003.08.021
  • 71. Tan J, Vonrhein C, Smart OS, Bricogne G, Bollati M, Kusov Y, et al. The SARS-Unique Domain (SUD) of SARS Coronavirus Contains Two Macrodomains That Bind G-Quadruplexes. Rey FA, editor. PLoS Pathog. 2009;5: e1000428. doi: 10.1371/journal.ppat.1000428
  • 72. Beaudoin J-D, Jodoin R, Perreault J-P. New scoring system to identify RNA G-quadruplex folding. Nucleic Acids Res. 2014;42: 1209–1223. doi: 10.1093/nar/gkt904
  • 73. Snoussi K, Nonin-Lecomte S, Leroy J-L. The RNA i-motif. J Mol Biol. 2001;309: 139–153. doi: 10.1006/jmbi.2001.4618
  • 74. Kikin O, D’Antonio L, Bagga PS. QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 2006;34: W676–W682. doi: 10.1093/nar/gkl253
  • 75. Bartas M, Brázda V, Bohálová N, Cantara A, Volná A, Stachurová T, et al. In-Depth Bioinformatic Analyses of Nidovirales Including Human SARS-CoV-2, SARS-CoV, MERS-CoV Viruses Suggest Important Roles of Non-canonical Nucleic Acid Structures in Their Lifecycles. Front Microbiol. 2020;11: 1583. doi: 10.3389/fmicb.2020.01583
  • 76. Ji D, Juhas M, Tsang CM, Kwok CK, Li Y, Zhang Y. Discovery of G-quadruplex-forming sequences in SARS-CoV-2. Brief Bioinform. 2020; bbaa114. doi: 10.1093/bib/bbaa114
  • 77. Cui H, Zhang L. G-Quadruplexes Are Present in Human Coronaviruses Including SARS-CoV-2. Front Microbiol. 2020;11: 567317. doi: 10.3389/fmicb.2020.567317
  • 78. Panera N, Tozzi AE, Alisi A. The G-Quadruplex/Helicase World as a Potential Antiviral Approach Against COVID-19. Drugs. 2020;80: 941–946. doi: 10.1007/s40265-020-01321-z
  • 79. Doluca O. G4Catchall: A G-quadruplex prediction approach considering atypical features. J Theor Biol. 2019;463: 92–98. doi: 10.1016/j.jtbi.2018.12.007
  • 80. Zhang R, Xiao K, Gu Y, Liu H, Sun X. Whole Genome Identification of Potential G-Quadruplexes and Analysis of the G-Quadruplex Binding Domain for SARS-CoV-2. Front Genet. 2020;11: 587829. doi: 10.3389/fgene.2020.587829

Decision Letter 0

Eric Charles Dykeman

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

14 Jan 2021

PONE-D-20-34438

Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2

PLOS ONE

Dear Dr. Belmonte Reche,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

After careful review of the comments, it appears that the reviewers have several issues with the manuscript. One of the main concerns was the use of the 'lax' setting for G4-iM Grinder and the resulting prediction of G-quadruplexes based on this setting, as well as the comparison with alternative G-quadruplex tools such as cGcC or G4Hunter.

Please address these concerns, along with the other comments made by the reviewers, in your re-submission.

Please submit your revised manuscript by Feb 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Eric Charles Dykeman, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

Reviewer #3: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2

By Efres Belmonte-Reche* et al (*Corresponding author)

Submitted to PLoS One (Editorial No. PONE-D-20-34438)

General Comments

G- and C-quadruplex secondary structures (G4s and iMs, respectively) are found in the cellular genomes of many animal species and have complex roles in the regulation of metabolic pathways, including the biochemistry of telomeres and of oncogenic promoters (genome stability). Bacteria and many viruses have recently been explored for the presence of such structures in their genomes. Here, the genomes of SARS-CoV-2 isolates (of the reference strain, GCF_009858895.2, and of >3000 subsequent isolates), of other coronaviruses (CoVs) and of other (+)ssRNA viruses were investigated for the presence of such structures using the G4-iM Grinder (GiG) algorithm with some modifications. The quadruplex sequence structures identified were evaluated for the probability of occurring in infected cells, their position in the genome, and the degree of conservation in closely related viruses of the Coronaviridae family. The ‘best’ candidates for potential quadruplex formation were explored for stability by biophysical techniques (NMR, CD spectroscopy). Several relatively stable quadruplexes were identified which may be considered as possible targets for the development of novel antivirals.

The data observed are interesting. However, their presentation is rather confusing:

- The Materials and Methods section frequently refers to the Suppl. Mat. component of the manuscript for information which should be in the main text;

- The modifications of the GiG method used are not properly explained;

- The ‘detailed analysis of the Results’ has been moved into Suppl. Mat., which is rather confusing and has also led to some duplicated figure panels between the main text and Suppl. Mat.;

- The frequency of quadruplex structure candidates is very high at the lax quadruplex definition, but the relevance of such data in biological terms is unclear;

- The significance of differences in conservation of potential quadruplex structures is not clear;

- Text and Legend of Fig. 2 differ in the nomenclature of structures assessed;

- The findings of this manuscript are claimed to ‘greatly expand the current knowledge regarding quadruplexes and the SARS-CoV-2 in particular [refs 66-68]’ (lines 320/321), but no details are given;

- No particular information is given on compounds which could potentially react with (and disturb) quadruplex structures, using them as therapeutic targets.

Numerous clarifications are requested, and the presentation of this potentially very important information should be improved.

Specific Comments

Line

1 Reconsider title, e.g., ‘Potential G- and C-quadruplex structures of SARS-CoV-2 as potential targets for the development of antivirals’, or similar. [Since no antiviral candidates are even mentioned, consider: ‘G- and C-quadruplex structures of SARS-CoV-2 and their potential biological relevance’]

18 A short paragraph should introduce the topic.

28 … the entire realm… Please clarify. …. in common with other species… Please specify.

32 … may be suitable targets for… Please clarify (see comment above).

38 Consider citation of: Lau SKP, et al. Possible Bat Origin of Severe Acute Respiratory Syndrome Coronavirus 2. Emerg Infect Dis. 2020 Jul;26(7):1542-1547.

Andersen KG, et al. The proximal origin of SARS-CoV-2. Nat Med. 2020 Apr;26(4):450-452.

45 … pathogenic bacteria and viruses…

47 Consider citation of:

Menachery VD, et al. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat Med. 2015 Dec;21(12):1508-13.

Peck KM, et al. Coronavirus Host Range Expansion and Middle East Respiratory Syndrome Coronavirus Emergence: Biochemical Mechanisms and Evolutionary Perspectives. Annu Rev Virol. 2015 Nov;2(1):95-117.

57 … DNA or RNA [read RNA throughout the ms; ARN is the French abbreviation.]

74 Consider reading: … great potential as targets for virus inhibition…

77 Clarify the description of data from ref. [49]. Correct the site of publication to: NAR Genomics and Bioinformatics, Volume 2, Issue 1, March 2020, lqz005, https://doi.org/10.1093/nargab/lqz005

78 Support the statement by refs.

79 Rephrase sentence: … to the ongoing research efforts related to the COVID-19 pandemic by investigating SARS-CoV-2 for the presence of quadruplex structures…

84 Rephrase sentence.

111 … presence of known-to-form quadruplex structures… Please clarify.

133 and 158f. Please clarify what ‘positive’ and ‘negative’ mean in this context. Are these designations just used for presentational purposes?

154 Fig. 1A. Consider showing a quadruplex structure.

172f This summarizing referral of the reader to Suppl. Mat. is confusing. In addition, some figures in Suppl. Mat. partially duplicate those in the main text. Consider transferring essential components of Suppl. Mat. into the main text (including relevant figures). See comments below.

208 to 213. The meaning of this text is not clear. The concluding sentence ‘Here they may play their biological role if formed’ is speculative and should be omitted as long as no hard data are available.

220 This interpretation of Fig. 2A does not describe what is shown. Please adjust accordingly.

224 and 230. The citation of components of Suppl. Mat. is out of order and confusing.

247 Fig. 2, panel C. Explain how COVID DNA.iM-1 was obtained and what this means in context.

260f Clarify sentence.

267 … quadruplexes may be common and… for viruses to ‘live’, thrive and adapt… This statement is highly speculative and should be considered for omission.

293 and 296. Clarify: … CoVID-RNA G4-1… CoVID-RNA G4-2… The interpretation of the latter structure is highly speculative and should be omitted.

310 … the opposite but with the same effect might also be happening… Please spell out more clearly what you want to say.

318f This paragraph should contain details of how the data presented here augment those of refs. [60, 61, 63, 66-68]. Furthermore, the potential of developing novel antivirals interfering with quadruplex structure formation should be assessed in more detail.

476 Correct ref. , see comment above.

520 Qu X is last, not first author.

Suppl. Mat.

Page

3 The new functions developed and their exact website locations should be mentioned under Methods in the main text.

4 The subheadings of Suppl. Fig. 1 could be incorporated into Fig. 1, making Suppl. Fig. 1 redundant.

5 Essential information of ‘in silico methodology’ should be in the Methods section of the main text.

6 The procedure of NMR and circular dichroism experiments should be in the Methods section of the main text.

7 A condensed version of this text should be transferred to the Methods section of the main Text.

Paragraph 2, line 3. Omit ref [2].

8f Suppl. Fig. 2 panel A duplicates Fig. 2 panel A. Suppl Fig. 2 panels B-D could remain in Suppl Mat.

11f Abbreviated versions of these data should be transferred to the main text.

14f and Suppl. Fig. 4 could remain in Suppl. Mat. in a condensed form.

18f Biophysical experiments. The Suppl. Fig. 5 duplicates data of Fig. 2. The section should be transferred to the main text and abandoned in Suppl. Mat.

21f ‘Other bioinformatics figures’. Suppl. Figures 6-11 and 13-16 are not considered to be essential. Suppl. Figs. 12 and 17 could be considered for the main text.

33f ‘Other biophysical figures’. Suppl. Figs. 18 and 19 could be considered for Suppl. Mat., preferentially Suppl. Fig. 19.

35 to 37. Condensed components of this information (including relevant additional refs.) should be transferred to Methods of the main text.

Reviewer #2: In their manuscript the authors present a genome-wide screen for putative G- and C-quadruplex sites (PQS and PiMS, respectively) in SARS-CoV-2 and other viral genomes. For that purpose, the authors updated the pre-existing R-based scanning tool G4-iM Grinder.

The genome-wide screen in the SARS-CoV-2 reference genome resulted in about 300 PQS and about 200 PiMS. However, when the resulting scores were filtered by a previously suggested value of |score| > 40, none of the predicted sites remained. Therefore, the authors concentrated on those results with |score| > 20, which consisted of 71 PQS and 32 PiMS. The authors then compared these sites to other viral genomes within the Coronaviridae, the entire group IV and a selection of all virus genomes to find sites conserved within different virus species. Furthermore, the authors conducted NMR and CD experiments on a handful of selected high-scoring PQS and PiMS found in the SARS-CoV-2 reference genome. Here they found that both of the PQS tested do form a G-quadruplex. The single PiMS, however, did not show formation of a C-quadruplex. For the latter, the authors also tested a DNA variant and found that, in DNA, this sequence indeed adopts a C-quadruplex conformation. Interestingly, a single point mutation in the first loop region of this DNA variant disrupts the ability to form the C-quadruplex.

Overall, the article is very well written, and the analysis performed by the authors is comprehensive and sound. Nevertheless, there are some minor points that require more attention or need to be described in more detail. Furthermore, in some instances the authors draw conclusions that I can't follow or agree with.

In particular, I have the following remarks and questions:

1. In the Methods section, the authors state that in their analysis they chose 3 different configurations of G4-iM Grinder: the lax, the default, and the 'original folding rule'. However, all results are only for the lax configuration. If I understand correctly, this was the only configuration that yielded any results for SARS-CoV-2. If none of the results presented in this study (and the corresponding supplement) rely on the other configurations, please correct the Methods section accordingly.

Along with that, I wonder how the authors justify their threshold of |score| > 20. From the original G4-iM Grinder publication I took that |score| > 40 is suggested for good prediction performance. Here, I am missing some more discussion about this lower threshold, especially in combination with the lax configuration, as I assume that this might greatly increase the number of false-positive predictions!

2. The authors should consider comparing their result of 73 PQS to those found in another previous study by Ji et al. 2020, "Discovery of G-quadruplex-forming sequences in SARS-CoV-2", Briefings in Bioinformatics (https://doi.org/10.1093/bib/bbaa114).

3. 3rd paragraph of the Materials and Methods section, line 125ff. The authors state that a more flexible configuration requires 'more computing power'. Maybe I overlooked this, but I wonder why this is the case and how the computation time scales with the choice of the different 'flexibility' parameters. Does the analysis scale linearly, quadratically, or even exponentially in the number of candidate sites? One can only speculate at this point, especially since the underlying scoring schemes (cGcC, G4Hunter, etc.) seem to be independent of the flexibility of the sequence constraint.

4. Results and discussion, SARS-CoV-2, 5th paragraph, lines 242-245, as well as Supplement Section 3, 'SARS-CoV-2, the Virus Realm and quadruplexes'. The authors matched PQS and PiMS found in SARS-CoV-2 in other viruses and claim that these findings 'cannot be explained by chance'. I argue against that, since there is no obvious relation between dsDNA viruses and the (+)ssRNA virus SARS-CoV-2. Given the probability in the order of 1e-13 derived in the supplement and the vast size of the entirety of viral genomes, I would expect a few sequences harboring common subsequences of the size of the PQS tested. Note that most viruses mutate much faster than bacteria, eukaryotes, or archaea, so chances are that subsequences of about 30 nt develop independently. This seems especially likely when the alternative hypothesis is that dsDNA and (+)ssRNA viruses are somehow related and have conserved these small subsequences during evolution.

5. For the conservation of the PQS found in the SARS-CoV-2 reference and the remaining 3297 SARS-CoV-2 genomes, please also state the overall sequence identity. Otherwise, simply showing that the PQS are conserved by 98.6% ± 7.4% makes it difficult to assess whether this is expected or unexpected.

6. Suppl. Figure 17 and related text. The examples given do not seem well chosen. First, the linker sequences between the G-runs are quite long compared to the number of layers (2), so I'd assume that, if at all, the resulting quadruplexes would be exceptionally weak. Second, the regions of Cluster 1 and Cluster 2 are located in the 5'UTR, which is known to be well structured and conserved throughout all betacoronaviruses and even the remaining coronaviruses. In particular, Cluster 2 overlaps the well-annotated SL5C, which is required for replication. Cluster 2 resides in a highly complementary region that is known to form well-conserved secondary structure. This might also explain the low conservation of 27% among the coronaviruses, if I'm not mistaken, since secondary structure, not G-quadruplex formation, seems to play the most important role here. The authors should therefore relate their findings to the known annotation of SARS-CoV-2.

Along with that, the authors should add to their discussion a paragraph about the reliability of their predictions, if such can be assessed. In particular, PQS (and PiMS) with just 2 layers and/or bulges are known to be less stable and thus potentially do not form at all (in vitro or in vivo). Moreover, even when NMR and CD measurements of the PQS suggest their formation, they still have to compete against regular secondary structure formation when they reside in their viral sequence context. The authors should elaborate on that problem, at least in the discussion.

General remarks:

- There are multiple occurrences of ANR instead of RNA throughout the manuscript and Supplement. Please correct them.

- In the 4th paragraph of the introduction, the phrase 'nucleic acid sequences' appears right after DNA and RNA. This is redundant, since the NA in both already stands for Nucleic Acid.

- In the second paragraph of Materials and Methods, line 112: what do the authors attempt to convey with 'presence of known-to-form quadruplex structures sequences'? Maybe a simple 'presence of known-to-form quadruplexes' would suffice?

Reviewer #3: Thank you for the opportunity to review the manuscript “Exploring G- and C-quadruplex structures as potential targets for the severe acute respiratory syndrome coronavirus 2”. This manuscript presents analyses of potential G-quadruplex-forming sequences (PQS) in the genome of SARS-CoV-2 and related viruses. It was interesting to read, but several points must be improved substantially before publication. The main problems are: 1. the conclusions of the analyses are wrongly interpreted; 2. additional methods must be used to evaluate the results; and 3. the methods used must be described in detail in the manuscript. Although I found this manuscript interesting, a major revision is necessary before its publication. I recommend a major revision of this manuscript.

Major points:

1. Conclusions are misleading. The authors find that, compared to other tested viruses, there are no potential G-quadruplex-forming sequences (PQS) in the SARS-CoV-2 genome. They then changed the algorithm for the PQS search (the “lax” configuration, with a loop size of 20 bp) and claimed that these PQS may be suitable targets. This approach is wrong. The stability of these G-quadruplexes will be extremely low, so their formation in vivo is very doubtful. The correct conclusion would be that, compared to other genomes, G-quadruplexes are probably very rare (if present) in SARS-CoV-2.

2. The alternative algorithms must be used as a control and discussed (for example QGRS Mapper and G4Hunter web are freely available and easily accessible for quick evaluations).

3. The title should be improved; the authors did not target any structures, they only predicted them.

4. A description of the experimental methods is missing from the Materials and Methods section.

5. How is the PQS score set? What do these numbers mean? Please compare with the G4Hunter and QGRS scores.

6. The PQS found in real genomes must be compared statistically with those found in scrambled/random sequences; it is possible that the same number of PQS, or even more, will be found in scrambled sequences. Without this control, your analyses do not prove anything.

7. Do not use suggestions in the Results section. Just describe your results (e.g. line 231: “Here they may play their biological role if formed.”; lines 266 and 277: “may be”). We are not interested in what “may be” in the Results; we would like to see the facts only. Keep your theories and “maybe” for the Discussion section.

8. Figure 2B. Compare results with sequences already proven in vivo.

9. Figure 2C. Show the same result with a confirmed iM and compare the spectra.
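Points 2 and 5 ask for a comparison with G4Hunter scores. As a reference point, the following is a minimal Python sketch of the G4Hunter scoring scheme as published by Bedrat, Lacroix and Mergny (2016): each G scores the length of its G-run capped at 4, each C scores the negative of its C-run length capped at 4, all other bases score 0, and the sequence score is the mean over all positions. The function name `g4hunter_score` is hypothetical; this is a reimplementation from the published description, not the G4Hunter web tool or G4-iM Grinder's code, and the sliding-window step that the original tool applies is omitted.

```python
import re


def g4hunter_score(seq: str) -> float:
    """Mean G4Hunter-style score of a DNA or RNA sequence (hypothetical helper).

    Each G in a run of n consecutive Gs scores min(n, 4); each C in a run of
    n consecutive Cs scores -min(n, 4); A, T and U score 0.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    total = 0
    # Visit every maximal run of Gs or Cs; all other bases contribute 0.
    for match in re.finditer(r"G+|C+", seq):
        run = match.group()
        per_base = min(len(run), 4)
        sign = 1 if run[0] == "G" else -1
        total += sign * len(run) * per_base
    return total / len(seq)
```

Under this scheme, GGGTTAGGGTTAGGGTTAGGG scores 36/21 (about 1.71), while C-rich sequences such as i-motif candidates score negative, so a single signed value can rank both kinds of candidate.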

Minor points:

1. Please polish the terminology. i-Motifs are formed by four strands; however, using “C-quadruplex” is unusual, because the structure is completely different from that of G-quadruplexes. Four-stranded structures: G-quadruplexes and i-motifs.

2. Please do not repeat the same first sentence in the Methods and Results sections.

3. Lines 195 and 196: do not discuss in the Results section.

4. Lines 294 and 295: again, this is not your result, so move it to the Discussion section.

5. Lines 316 and 317: I do not understand. How do your results on RNA viruses prove “the especially for DNA, G4-iM Grinder can be used”? Any algorithm can be used if you use “lax” settings.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 8;16(6):e0250654. doi: 10.1371/journal.pone.0250654.r002

Author response to Decision Letter 0


24 Feb 2021

A rebuttal letter that responds to each point raised by the academic editor and reviewer has been submitted.

A marked-up copy of the manuscript that highlights changes made to the original version has been submitted.

An unmarked version of the revised paper without tracked changes has been submitted.

Figures that comply with PACE have been submitted within the RAR file.

The supplementary material has been submitted.

Attachment

Submitted filename: DV.2.docx

Decision Letter 1

Eric Charles Dykeman

25 Mar 2021

PONE-D-20-34438R1

Potential G-quadruplexes and i-Motifs in the SARS-CoV-2

PLOS ONE

Dear Dr. Belmonte Reche,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

In particular, one of the referees notes a technical issue with the way that you have performed your analysis on random 16-nt long sequences and the probability that they would form an exact match. Specifically,

"The authors matched PQS and PiMS found in SARS-CoV-2 in other viruses and claim that these findings 'cannot be explained by chance'."

Although you have addressed this in your response, the second referee notes that the SARS-CoV-2 genome has a length of > 25000 nt, which presents multiple chances for alignment, while your analysis only gives statistics when comparing 16 nt against 16 nt (not 16 nt against 25,000+). The referee is therefore concerned that the PQS and PiMS are more common than you expect. I would be glad to reconsider your revised manuscript after you have responded to this issue.
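The editor's point can be made concrete with a back-of-the-envelope expectation. Under a deliberately simple null model (each position independently A, C, G or U with probability 1/4, an assumption that ignores real base composition), a fixed k-mer matches one window with probability 4^-k, and by linearity of expectation the expected number of exact hits in a sequence of length L is (L - k + 1)/4^k, even though the windows overlap. The Python sketch below uses a hypothetical helper, `expected_exact_matches`, to show how that expectation grows once a single 16-nt query is scanned along whole genomes, or a large collection of them, rather than compared with one 16-nt target.

```python
def expected_exact_matches(k: int, seq_len: int, n_seqs: int = 1,
                           both_strands: bool = False) -> float:
    """Expected exact hits of one fixed k-mer under a uniform i.i.d. null model.

    Hypothetical helper: assumes each base is drawn independently with
    probability 1/4, which real genomes do not satisfy exactly.
    """
    if k <= 0 or seq_len < k:
        raise ValueError("require 0 < k <= seq_len")
    p_window = 0.25 ** k              # chance that one window matches exactly
    windows = seq_len - k + 1         # overlapping windows per sequence
    strands = 2 if both_strands else 1
    return p_window * windows * strands * n_seqs


# One 16-nt query against a single 16-nt target versus a 29903-nt genome.
single = expected_exact_matches(16, 16)
genome = expected_exact_matches(16, 29_903)
print(f"16 vs 16 nt:    {single:.3e}")
print(f"16 vs 29903 nt: {genome:.3e} ({genome / single:.0f}x more)")

# Hypothetical collection: 10,000 genomes of 30 kb each, both strands scanned.
collection = expected_exact_matches(16, 30_000, n_seqs=10_000, both_strands=True)
print(f"16 vs collection: {collection:.3f}")
```

Scanning along a genome multiplies the chance-hit expectation by the number of windows (29888 here), and a genome collection multiplies it again. Because real genomes have skewed composition and shared ancestry, this uniform model is only a first approximation; an empirical control on shuffled full-length sequences is a more faithful test.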

==============================

Please submit your revised manuscript by May 09 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Eric Charles Dykeman, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: No

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: N/A

Reviewer #3: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors revised their manuscript and properly addressed all the comments and questions raised in my previous report.

Reviewer #3: 1. The experiment with the "shuffled" or "random" sequence was performed incorrectly:

"To complement the analysis, we randomly shuffled sequences (16 nucleotides long)."

Please take the complete virus sequence, then shuffle and analyse this 29903-nt sequence, and compare the number of PQS in the real virus sequence with that in the "shuffled" sequence.

2. The abstract does not correspond to the results. It was found that there are fewer PQS in Coronaviridae than in other viruses. This fact must be part of the abstract and discussed accordingly.

3. "hundreds of potential quadruplex candidates were discovered" - please use exact numbers and significance in the abstract statements.
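Reviewer #3's first request can be sketched as a permutation test on the full-length genome. The Python code below is a minimal illustration under stated assumptions: the PQS definition is a simplified "folding rule" style regular expression (four runs of at least three Gs separated by loops of 1 to 7 nt), not G4-iM Grinder's lax or default definitions; the shuffle is a mononucleotide shuffle, which preserves length and base composition but not dinucleotide content; and the names `count_pqs` and `shuffle_control` are hypothetical.

```python
import random
import re

# Simplified stand-in PQS rule (assumption): four G-runs of >= 3 separated by
# loops of 1-7 nt. Matches are counted without overlap.
PQS_RE = re.compile(r"G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}[ACGTU]{1,7}G{3,}")


def count_pqs(seq: str) -> int:
    """Count non-overlapping matches of the simplified PQS pattern."""
    return sum(1 for _ in PQS_RE.finditer(seq.upper()))


def shuffle_control(seq: str, n_shuffles: int = 1000, seed: int = 0):
    """Compare the observed PQS count with counts in shuffled full-length copies.

    Returns (observed, mean_null, p_fewer, p_more); the empirical one-sided
    p-values use the usual +1 correction.
    """
    if n_shuffles < 1:
        raise ValueError("n_shuffles must be at least 1")
    rng = random.Random(seed)
    bases = list(seq.upper())
    observed = count_pqs(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)            # keeps length and base composition
        null.append(count_pqs("".join(bases)))
    p_fewer = (1 + sum(c <= observed for c in null)) / (n_shuffles + 1)
    p_more = (1 + sum(c >= observed for c in null)) / (n_shuffles + 1)
    return observed, sum(null) / n_shuffles, p_fewer, p_more
```

Run on the 29903-nt genome sequence (loaded separately, for example from a FASTA file), a small `p_fewer` would support the observation that the genome carries fewer PQS than its composition alone predicts. A dinucleotide-preserving shuffle is a stricter null model if local composition is a concern.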

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jun 8;16(6):e0250654. doi: 10.1371/journal.pone.0250654.r004

Author response to Decision Letter 1


1 Apr 2021

Please find the response to Reviewer 3's second comments attached to the submission (as it contains graphs that cannot simply be pasted here).

Attachment

Submitted filename: Reviewer.answers.2.March.docx

Decision Letter 2

Eric Charles Dykeman

12 Apr 2021

Potential G-quadruplexes and i-Motifs in the SARS-CoV-2

PONE-D-20-34438R2

Dear Dr. Belmonte Reche,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Eric Charles Dykeman, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Thank you. The authors have improved the manuscript and added the requested analyses. I recommend this manuscript for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: No

Acceptance letter

Eric Charles Dykeman

27 May 2021

PONE-D-20-34438R2

Potential G-quadruplexes and i-Motifs in the SARS-CoV-2

Dear Dr. Belmonte-Reche:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Eric Charles Dykeman

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 File

    (PDF)

    Attachment

    Submitted filename: DV.2.docx

    Attachment

    Submitted filename: Reviewer.answers.2.March.docx

    Data Availability Statement

    All results and the R package are available from the URL https://github.com/EfresBR/G4iMGrinder.

