Abstract
Purpose
The aim of this work was to investigate the genomes of SARS-CoV, MERS-CoV, and SARS-CoV-2 using the paradigm of chaos theory and fractal geometry. Coronaviruses are the agents that cause severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS), and emergent infections such as the present COVID-19 pandemic. Understanding their genome patterns is important for developing new and faster tests for identifying the viral genome, and for better understanding its origin and evolution.
Methods
To this end, we calculated the alpha coefficient by detrended fluctuation analysis (DFA) for these genome sequences, after converting them to binary numbers, in order to determine whether each is a chaotic or a random data series. We also applied random walking to obtain a fractal map of the whole genome and calculated the fractal dimension (FD) of this map by box-counting, using two different software packages.
Results
We found that the alpha coefficient of the SARS viruses (SARS-CoV and SARS-CoV-2) was > 0.5, indicating that the series is chaotic or fractal, with persistent long-range memory or self-similarity along the sequence. This was not the case for MERS-CoV, whose sequence was essentially random (α = 0.49 ≈ 0.5). For the fractal dimension, the SARS viruses presented an FD of about 1.5 (box-counting with FDE), whereas the fractal dimension of MERS was lower (FD < 1.5).
Conclusion
The images generated by random walking of the entire RNA genome are in themselves a fractal signature of the virus, which may be applied to study its origin and for faster diagnosis of COVID-19.
Keywords: Fractal analysis, Random walking, Deterministic chaos, Coronavirus, COVID-19
Introduction
A high incidence of flu in adults and children is caused by coronaviruses. The SARS-CoV-2 virus is the cause of COVID-19, a severe acute respiratory syndrome (SARS). In 2002, a pathogenic coronavirus caused the first SARS outbreak, and 10 years later, MERS-CoV caused the Middle East respiratory syndrome (MERS) (Chen 2020).
Previous studies have shown that, during the 2002 epidemic, SARS-CoV underwent mutations that improved its binding to the human cellular receptor ACE2, which it uses to replicate, increasing its virulence (Cui et al. 2019).
Viruses therefore change their genomes over time, giving rise to emergent infections. However, understanding the dynamics and pattern of these mutations, and their consequences for the entire RNA genome of these viruses, also requires a new scientific paradigm that is not based on classical methods.
The non-linear mathematical approaches applied to study the shape and dynamics of many biological processes belong to the fields of fractal geometry and chaos theory (Mandelbrot 1983; Liebovitch 1998; Stam 2005; Kunicki et al. 2009). Important developments in this field began with the work of Peng et al. (1994, 1995), who described the detrended fluctuation analysis (DFA) method for analyzing long-range correlations in genes and intergenic regions, as well as in heartbeat intervals.
In the DFA algorithm, the time series is first integrated and then divided into boxes of equal length n. In each box of length n, a least-squares line is fitted to the data (representing the trend in that box). The average fluctuation around these trends is then computed as a function of box size n and plotted on a double-logarithmic graph, on which a linear relationship indicates the presence of a scaling exponent, or power law (Stam 2005). Using this method, the authors found that sequences rich in genes behave as a random data series (α ≈ 0.5), whereas non-coding regions present long-range correlation or self-similarity (α > 0.5). The method also proved powerful for distinguishing health from disease: healthy hearts present a chaotic or fractal pattern in the intervals between heartbeats, whereas pathological hearts present a random pattern of these intervals. Self-similarity, a characteristic of fractal phenomena, is therefore found from molecules such as DNA and proteins to organs such as the heart (Peng et al. 1994; Stanley et al. 1996; Havlin et al. 1999; Wang et al. 2008).
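To make the steps above concrete, the following is a minimal sketch of first-order DFA (linear detrending) in plain Python using only the standard library. It is not the authors' implementation: the function names, the use of non-overlapping boxes that ignore leftover points at the end of the series, and the choice of powers-of-two box sizes are assumptions made for illustration. Applied to an uncorrelated ±1 series, it should return α close to 0.5; applied to the cumulative sum of that series (a random walk), it should return a value close to 1.5.

```python
import math
import random


def profile(series):
    """Integrate the mean-subtracted series: y(k) = sum over i <= k of (x_i - mean)."""
    mean = sum(series) / len(series)
    y, total = [], 0.0
    for x in series:
        total += x - mean
        y.append(total)
    return y


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def fluctuation(y, n):
    """RMS deviation of the profile y from a least-squares line fitted in each box of length n."""
    xs = list(range(n))
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    n_boxes = len(y) // n  # trailing points that do not fill a whole box are ignored
    squares = 0.0
    for b in range(n_boxes):
        box = y[b * n:(b + 1) * n]
        y_mean = sum(box) / n
        b1 = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, box)) / sxx
        b0 = y_mean - b1 * x_mean
        squares += sum((v - (b0 + b1 * x)) ** 2 for x, v in zip(xs, box))
    return math.sqrt(squares / (n_boxes * n))


def dfa_alpha(series, box_sizes):
    """Scaling exponent alpha: slope of log F(n) versus log n."""
    y = profile(series)
    log_n = [math.log(n) for n in box_sizes]
    log_f = [math.log(fluctuation(y, n)) for n in box_sizes]
    return slope(log_n, log_f)


if __name__ == "__main__":
    rng = random.Random(42)
    noise = [rng.choice((-1, 1)) for _ in range(2 ** 14)]
    sizes = [2 ** k for k in range(2, 11)]  # box sizes 4 .. 1024
    print(f"alpha for uncorrelated +/-1 noise: {dfa_alpha(noise, sizes):.3f}")
```

Fitting the regression over a limited range of box sizes (here up to 1/16 of the series length) keeps enough boxes at the largest scale for a stable estimate; other ranges are equally defensible and change α slightly.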
In the present work, we applied methods based on non-linear dynamics to understand the genomic pattern of coronaviruses that cause severe acute respiratory syndromes, in order to obtain their fractal signatures for a better understanding of their origin and mutations, and for faster identification of their genomes in the diagnosis of COVID-19.
Methods
The genomes of the coronaviruses SARS-CoV, MERS-CoV, and SARS-CoV-2 were obtained from GenBank (www.ncbi.nlm.nih.gov) under the accession codes FJ588686.1, PRJNA485481, and MT192773.1, respectively.
The genetic sequences were converted to binary numbers, with purines converted to −1 and pyrimidines to 1, generating a series of −1 and 1 values. This series was then submitted to detrended fluctuation analysis (DFA), calculated as described by Peng et al. (1994, 1995) using the free software GNU Octave (https://www.gnu.org/software/octave/). The series was integrated and divided into boxes of equal length (n). In each box of length n, a least-squares line was fitted to the data, representing the trend in that box. The y coordinate of the straight-line segments was denoted by yₙ(k). The fluctuation F(n) was then computed according to Eq. 1.
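$$F(n) = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left[y(k) - y_n(k)\right]^{2}} \qquad (1)$$

where y(k) is the integrated series and N is its total length.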
This computation was repeated over all time scales (box sizes) to obtain the relationship between the average root-mean-square fluctuation function F(n) and the box size n (i.e., the number of points in each box, which defines the window of observation). F(n) typically increases with box size n, and a linear relationship on a double-log plot indicates the presence of scaling. The fluctuations can then be characterized by the scaling exponent alpha (α), the slope of the line relating log F(n) to log n, so that F(n) ∝ n^α. If α = 0.5, the series is uncorrelated (the result of a random process); α < 0.5 indicates anti-persistent behavior, in which large values are likely to be followed by small values and vice versa; and α > 0.5 indicates persistent long-range correlations. In addition, α = 1 corresponds to 1/f noise (very rough landscape); α > 1 indicates that correlations exist but cease to be of a power-law form, as in random walk-like fluctuations; and α = 1.5 indicates Brown noise, the integration of white noise (very smooth landscape) (Peng et al. 1995).
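The binary encoding and the interpretation ranges described above can be sketched as follows. This is an illustrative Python sketch, not the authors' Octave code; the function names, skipping ambiguity codes such as N, treating U as a pyrimidine (so RNA-alphabet input also works), and the tolerance `tol` used for the "equal to" cases are all assumptions.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CTU")  # U included as an assumption for RNA-alphabet input


def encode_purine_pyrimidine(sequence):
    """Purines -> -1, pyrimidines -> +1; other symbols (N, R, Y, gaps) are skipped."""
    values = []
    for base in sequence.upper():
        if base in PURINES:
            values.append(-1)
        elif base in PYRIMIDINES:
            values.append(1)
    return values


def interpret_alpha(alpha, tol=0.01):
    """Label a DFA exponent using the ranges listed in the text.

    tol is a hypothetical tolerance for the 'alpha equals 0.5 / 1 / 1.5' cases.
    """
    if abs(alpha - 0.5) <= tol:
        return "random (white noise)"
    if alpha < 0.5:
        return "anti-persistent"
    if abs(alpha - 1.5) <= tol:
        return "Brown noise"
    if abs(alpha - 1.0) <= tol:
        return "1/f noise"
    if alpha > 1.0:
        return "correlated, not power-law"
    return "persistent long-range correlation"


if __name__ == "__main__":
    print(encode_purine_pyrimidine("ATGCNu"))  # [-1, 1, -1, 1, 1]
    for virus, alpha in [("SARS-CoV", 0.5970), ("MERS-CoV", 0.4939), ("SARS-CoV-2", 0.5740)]:
        print(virus, interpret_alpha(alpha))
```

With the hypothetical tolerance of 0.01, the MERS-CoV value in Table 1 (0.4939) is labelled random; with `tol=0` it would fall in the anti-persistent range, which shows how much a verbal label depends on the threshold chosen.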
For random walking, the nucleotide sequences were converted into vectors according to the method described by Abramson et al. (1999), in which A is drawn as a step to the left, T to the right, C down, and G up, using the Python language, version 2.7 (http://www.python.org). The images were submitted to fractal analysis by the box-counting method, using the plugin “fracLac” of the software ImageJ™ (https://imagej.nih.gov/ij/plugins/fraclac/fraclac.html) and the Fractal Dimension Estimator (FDE) (http://www.fractal-lab.org/Downloads/FDEstimator.html). This method consists of overlaying grids of boxes on the image while the box size is progressively reduced; at each reduction, the number of boxes needed to cover the image is counted.
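A minimal sketch of the DNA walk just described, written in Python 3 rather than the Python 2.7 used by the authors. The step convention (A left, T right, C down, G up) follows the text; the function name, treating U like T, and skipping symbols without a defined step (such as N) are assumptions, and rendering the path to an image is left out.

```python
# Unit step for each nucleotide: A left, T right, C down, G up.
# U is treated like T (an assumption for RNA-alphabet input).
STEPS = {"A": (-1, 0), "T": (1, 0), "U": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence):
    """Return the (x, y) lattice points visited, starting at the origin.

    Symbols without a defined step (e.g. N) are skipped.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        step = STEPS.get(base)
        if step is None:
            continue
        x, y = x + step[0], y + step[1]
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(dna_walk("ATCG"))  # [(0, 0), (-1, 0), (0, 0), (0, -1), (0, 0)]
```

The resulting path can be drawn as a polyline (for instance with matplotlib) and saved as an image for box-counting; line width and image resolution at that stage are further choices that can affect the measured FD.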
From these counts, the fractal dimension (FD) was calculated by Eq. 2:
$$FD = \lim_{r \to 0} \frac{\log N_r}{\log(1/r)} \qquad (2)$$
where Nᵣ is the number of boxes needed to cover the image at each progressive reduction of the box size r. In practice, FD is the slope of the regression line of the logarithm of the number of boxes against the logarithm of 1/r.
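Eq. 2 can be illustrated with a simple box-counting estimate on a set of pixel coordinates. This sketch is not how FracLac or FDE work internally: the function names, the grid anchored at the origin, the powers-of-two box sizes, and the input format (a set of non-negative integer pixel coordinates) are assumptions. It checks itself on two shapes of known dimension, a straight line (FD = 1) and a filled square (FD = 2).

```python
import math


def box_count(pixels, size):
    """Number of grid boxes of side `size` that contain at least one pixel."""
    return len({(x // size, y // size) for x, y in pixels})


def box_counting_dimension(pixels, sizes):
    """Slope of log N_r versus log(1/r) over the given box sizes r (Eq. 2)."""
    xs = [math.log(1 / r) for r in sizes]
    ys = [math.log(box_count(pixels, r)) for r in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    sizes = [1, 2, 4, 8, 16]
    line = {(x, 0) for x in range(256)}
    square = {(x, y) for x in range(64) for y in range(64)}
    print(round(box_counting_dimension(line, sizes), 6))    # 1.0
    print(round(box_counting_dimension(square, sizes), 6))  # 2.0
```

Real tools differ in grid placement, the range of box sizes, and image preprocessing (for example line thickness and binarization); such choices are one possible reason why two programs can return different FD values for the same image.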
Results
Table 1 presents the dimension and the alpha coefficient (α) obtained with the DFA method for SARS-CoV, MERS-CoV, and SARS-CoV-2.
Table 1.
Values of dimension and alpha coefficients (α) from DFA method applied to the sequences of SARS-CoV, MERS-CoV, and SARS-CoV-2
| Virus | Dimension from DFA | Alpha coefficient (α) |
|---|---|---|
| SARS-CoV | 2.4020 | 0.5970 |
| MERS-CoV | 2.5061 | 0.4939 |
| SARS-CoV-2 | 2.4261 | 0.5740 |
Figure 1 shows the double-log plots of F(n) versus n for all three viruses; the slope of each line is the alpha coefficient (α).
Fig. 1.
Double-log graphic of F(n) versus n. (a) The result of SARS-CoV, (b) MERS-CoV, and (c) SARS-CoV-2
The random walks of the SARS-CoV, MERS-CoV, and SARS-CoV-2 sequences are shown in Fig. 2.
Fig. 2.
Random walking for the sequences of SARS-CoV, MERS-CoV, and SARS-CoV-2
The FD of the random walk of each virus was calculated by box-counting using two different programs: ImageJ and Fractal Dimension Estimator (FDE). The FD values are listed in Table 2.
Table 2.
Values of fractal dimensions (FDs) obtained by software ImageJ and Fractal Dimension Estimator (FDE)
| Virus | FD by ImageJ | FD by FDE |
|---|---|---|
| SARS-CoV | 1.3079 | 1.5388 |
| MERS-CoV | 1.1689 | 1.4000 |
| SARS-CoV-2 | 1.2338 | 1.4983 |
Figure 3 shows the double-log plot of the box-counting method (log of the number of boxes versus log of 1/r, where r is the box size); the slope of the line is the FD. Only FDE generates this plot.
Fig. 3.
Double-log graphic of box-counting algorithm for SARS-CoV (a), MERS-CoV (b), and SARS-CoV-2 (c) random walking images
Discussion
Conventional methods focus on identifying which mutations are involved in genome modifications. In this work, a different paradigm was adopted, based on the fractal signature of the whole genome rather than on specific point mutations.
It is well known that DNA in eukaryotes is packed in a fractal pattern that optimizes space, allowing a very long DNA molecule to be organized into a coiled structure only a few micrometers across (Liebovitch 1998; Lieberman-Aiden et al. 2009). Also, the organization of DNA sequences is not considered to be random: non-coding regions present long-range correlation, with a DFA alpha value (α) higher than 0.5, whereas sequences rich in genes show a random pattern (α ≈ 0.5) (Peng et al. 1994).
When the sequences of viruses known to cause severe acute respiratory syndromes, SARS-CoV (2002), MERS-CoV (2012), and SARS-CoV-2 (2019), were analyzed by DFA, SARS-CoV and SARS-CoV-2 presented the long-range memory characteristic of a self-similar sequence, whereas the MERS-CoV sequence was random (α ≈ 0.5). The FD estimated with two different programs also showed that SARS-CoV and SARS-CoV-2 are fractal and that this fractality decreases in MERS-CoV. This reinforces the DFA result indicating that SARS-CoV and SARS-CoV-2 have more self-similarity, or long-range memory, along the sequence than MERS-CoV.
The two programs used to estimate fractal dimensions (ImageJ and Fractal Dimension Estimator) gave different values. The reason for this software dependence should be evaluated by comparing the box-counting algorithms of both programs. Despite this discrepancy, both methods gave non-integer dimensions between 1 and 2, and both showed the same pattern of decreasing fractality from SARS-CoV to SARS-CoV-2. The biological significance of this loss of complexity should be further evaluated.
When the random walk of each investigated virus is analyzed on its own, even without fractal dimension or detrended fluctuation analysis data, the pattern it generates is a specific signature of that virus, a “fractal signature”. This could help identify the virus quickly, simply by inspecting the generated pattern. Combined with faster and simpler DNA and RNA sequencing methods, this approach could be helpful for studying the origin of SARS-CoV-2 and its mutations, and for the diagnosis of COVID-19.
Conclusion
This work presented an approach for obtaining a genetic signature of coronaviruses based on detrended fluctuation analysis (DFA), random walking, and the fractal dimension of the random walks, which allows specific identification and whole-genome analysis and may be helpful as a faster tool for the diagnosis of COVID-19.
Acknowledgements
The authors would like to thank the Federal University of Pernambuco and Dr. Romildo Nogueira, from Federal Rural University of Pernambuco, for his discussions on chaos and fractals.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- Abramson G, Cerdeira HA, Bruschi C. Fractal properties of DNA walks. BioSystems. 1999;49:63–70. doi: 10.1016/S0303-2647(98)00032-X.
- Chen J. Pathogenicity and transmissibility of 2019-nCoV – a quick overview and comparison with other emerging viruses. Microbes Infect. 2020;22:69–71. doi: 10.1016/j.micinf.2020.01.004.
- Cui J, Li F, Shi Z. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol. 2019;17:181–192. doi: 10.1038/s41579-018-0118-9.
- Havlin S, Buldyrev SV, Bunde A, Goldberger AL, Ivanov PC, Peng CK, Stanley HE. Scaling in nature: from DNA through heartbeats to weather. Physica A. 1999;273:46–49. doi: 10.1016/S0378-4371(99)00340-4.
- Kunicki ACB, Oliveira AJ, Mendonça MBM, Barbosa CTF, Nogueira RA. Can the fractal dimension be applied for the early diagnosis of non-proliferative diabetic retinopathy? Braz J Med Biol Res. 2009;42:930–934. doi: 10.1590/S0100-879X2009005000020.
- Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- Liebovitch LS. Fractals and chaos simplified for life sciences. New York: Oxford University Press; 1998.
- Mandelbrot BB. The fractal geometry of nature. 2nd ed. New York: Freeman; 1983.
- Peng CK, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL. Mosaic organization of DNA nucleotides. Phys Rev E. 1994;49:1685–1689. doi: 10.1103/PhysRevE.49.1685.
- Peng CK, Havlin S, Stanley HE, Goldberger AL. Quantification of scaling exponents and crossover phenomena in non-stationary heartbeat time series. Chaos. 1995;5:82–87.
- Stam CJ. Nonlinear dynamical analysis of EEG and MEG: review of an emerging field. Clin Neurophysiol. 2005;116:2266–2301. doi: 10.1016/j.clinph.2005.06.011.
- Stanley HE, Afanasyev V, Amaral LAN, Buldyrev SV, Goldberger AL, Havlin S, Leschhorn H, Maass P, Mantegna RN, Peng CK, Prince PA, Salinger MA, Stanley MHR, Viswanathan GM. Anomalous fluctuations in the dynamics of complex systems: from DNA and physiology to econophysics. Physica A. 1996;224:302–321. doi: 10.1016/0378-4371(95)00409-2.
- Wang SC, Li PC, Tseng HC. Long range correlation and possible electron conduction through DNA sequences. Physica A. 2008;387:5159–5168. doi: 10.1016/j.physa.2008.04.029.



