Data stream dataset of SARS-CoV-2 genome

Raquel de M Barbosa; Marcelo AC Fernandes

2020 Jun 10;31:105829. doi: 10.1016/j.dib.2020.105829


PMCID: PMC7306612 PMID: 32596428

Abstract

As of May 25, 2020, the novel coronavirus disease (COVID-19) had spread to more than 185 countries/regions, with more than 348,000 deaths and more than 5,550,000 confirmed cases. In bioinformatics, one crucial task is the analysis of the virus nucleotide sequences, for which data stream techniques and algorithms can be used. However, to make this approach feasible, the nucleotide sequence strings must first be transformed into a numerical stream representation. To that end, this dataset provides four kinds of data stream representation (DSR) of SARS-CoV-2 virus nucleotide sequences. It provides the DSR of 1557 instances of the SARS-CoV-2 virus, 11540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).

Keywords: SARS-CoV-2, Data stream, COVID-19

Specifications Table
Subject	Biochemistry, Genetics and Molecular Biology (General)
Specific subject area	Bioinformatics
Type of data	Table
	Number
How data were acquired	NCBI - Genbank - SARS-CoV2 https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs
	Virus-Host-DB https://www.genome.jp/virushostdb
	Matlab Software
	Excel Software
Data format	Raw and analyzed data are in Matlab file (.mat) and Microsoft Excel file (.xlsx).
Parameters for data collection	The entire dataset was generated using MATLAB 2019b on Windows operating system with Intel Core - i5 6500T 2.5 GHz quad-core processor with 16GB of RAM.
Description of data collection	The raw data were downloaded from NCBI - Genbank, and Virus-Host-DB. The data stream values were generated using Matlab.
Data source location	Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte.
Data accessibility	https://data.mendeley.com/datasets/g5ktw4y4pz/2


Value of the Data

• These data are useful because they provide a numeric representation of the virus behind the COVID-19 epidemic (SARS-CoV-2), which makes it possible to apply data stream algorithms.
• Researchers in bioinformatics, computer science, and computer engineering can benefit from these data because, with this numeric representation, they can apply stream algorithms and techniques such as TEDA (Typicality and Eccentricity Data Analytics), TEDA-Cloud, TEDA-Cluster and TEDA-Class to genomic information.
• Experiments that apply stream analytics techniques to SARS-CoV-2 genomic information can use this dataset.
• These data offer a simple way to evaluate the SARS-CoV-2 virus genome with stream algorithms.
• Unlike conventional bioinformatics techniques based on dynamic programming (such as BLAST and others), this approach allows techniques common in other areas to be used to find similarities between genome sequences.

1. Data Description

This work presents a dataset of data stream representations (DSR) of SARS-CoV-2 virus nucleotide sequences. The dataset contains two kinds of data: raw data and processed data. The raw data comprise 1557 instances of the SARS-CoV-2 virus genome collected from the National Center for Biotechnology Information (NCBI) [1], 11540 instances of other viruses from the Virus-Host DB [2], [3], and three further viruses also collected from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21). These three viruses have high similarity with SARS-CoV-2 [4], [5]. The processed data comprise four kinds of DSR, called Direct Mapping (DM), DM with Chaos Game Representation (DM-CGR), k-mers mapping (kMersM) and k-mers mapping with CGR (kMersM-CGR). A k-mer is a sub-sequence of k consecutive bases, and k-mer frequency counts are widely used in bioinformatics. Other k-mers datasets are presented in [6], [7], [8].

In the Chaos Game Representation (CGR) [8], the genome sequence is transformed into a bi-dimensional signal (1D vector), which is then passed through an infinite impulse response (IIR) filter [9]. The result of CGR is a signal that expresses the density of the bases and, at the same time, the transitions between bases, because the IIR filter is a system with memory. CGR can be used as a signature of the genome sequence. With the k-mers representation [10], the genome can be transformed into a 1D or 2D vector that represents the number of occurrences of each base (the frequency of the bases). k-mers can also be used as a signature of the genome sequence. In this manuscript, however, the genome sequence is transformed into a linear data stream, a form that can be used with stream algorithms. Another important aspect of this dataset is that CGR is applied not to the whole sequence but to each group of k bases (with or without k-mers). This strategy maintains the statistical characteristics and reduces the size of the stream.

The data are organized into three main directories: “SARS-CoV-2 data”, “Virus-Host DB data” and “Other viruses data”. Each main directory contains three files, “RawDataTable.mat”, “RawData.mat” and “RawData.xlsx”, and four sub-directories, “DirectMapping”, “DirectMappingCGR”, “kmersMapping” and “kmersMappingCGR”. The three files store the raw data from the virus databases and hold the same information in different formats: “RawDataTable.mat” stores the attributes as a Matlab table (available from version 2013b onward), “RawData.mat” stores them as Matlab cell arrays, and “RawData.xlsx” stores them in a Microsoft Excel file. The sub-directories “DirectMapping”, “DirectMappingCGR”, “kmersMapping” and “kmersMappingCGR” store the DM, DM-CGR, kMersM and kMersM-CGR data stream representations, respectively. Inside each sub-directory the files are named as follows:

• For DM, the DSR was generated for $k = 1 \dots 5$ and the files are called “PointsData_1_k=k.mat”;
• For DM-CGR, the DSR was generated for $k = 1 \dots 7$ and the files are called “PointsDataCGR_1_k=k.mat”;
• For kMersM, the DSR was generated for $k = 2 \dots 5$ and the files are called “PointsDatakmers_1_k=k.mat”;
• For kMersM-CGR:
  - In the directories “Other viruses data” and “SARS-CoV-2 data”, the DSR was generated for $k = 2 \dots 7$ and the files are called “PointsDatakmersCGR_1_k=k.mat”;
  - In the “Virus-Host DB data” directory, the DSR was generated for $k = 2, 3, 5$ and $7$ and the files are called “PointsDatakmersCGR_1_k=k.mat”.

For the main directory “Virus-Host DB data”, the values are split across 10 files, where the i-th file is called “PointsData_i_k=k.mat” in the sub-directory “DirectMapping” (DM), “PointsDataCGR_i_k=k.mat” for DM-CGR, “PointsDatakmers_i_k=k.mat” for kMersM and “PointsDatakmersCGR_i_k=k.mat” for kMersM-CGR.
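The naming scheme described above can be sketched as a small path generator. This is only an illustration in Python (the dataset itself was produced with Matlab): the function name `dsr_files` and the `LAYOUT` table are hypothetical, and the sketch assumes that “Virus-Host DB data” holds 10 files for every listed k, while the other two directories hold a single file with index 1.

```python
from pathlib import PurePosixPath

# Sub-directory -> (file-name prefix, k values), as listed in the text.
# LAYOUT and dsr_files are illustrative names, not part of the dataset.
LAYOUT = {
    "DirectMapping": ("PointsData", range(1, 6)),          # DM, k = 1..5
    "DirectMappingCGR": ("PointsDataCGR", range(1, 8)),    # DM-CGR, k = 1..7
    "kmersMapping": ("PointsDatakmers", range(2, 6)),      # kMersM, k = 2..5
    "kmersMappingCGR": ("PointsDatakmersCGR", range(2, 8)),  # kMersM-CGR, k = 2..7
}
VIRUS_HOST = "Virus-Host DB data"
VIRUS_HOST_KMERS_CGR_K = (2, 3, 5, 7)  # reduced k set for Virus-Host DB


def dsr_files(main_dir):
    """List the expected DSR file paths under one main directory."""
    # Assumption: 10 parts per k for Virus-Host DB, one part otherwise.
    n_parts = 10 if main_dir == VIRUS_HOST else 1
    paths = []
    for sub_dir, (prefix, ks) in LAYOUT.items():
        if main_dir == VIRUS_HOST and sub_dir == "kmersMappingCGR":
            ks = VIRUS_HOST_KMERS_CGR_K
        for k in ks:
            for i in range(1, n_parts + 1):
                name = f"{prefix}_{i}_k={k}.mat"
                paths.append(PurePosixPath(main_dir, sub_dir, name))
    return paths


if __name__ == "__main__":
    for path in dsr_files("SARS-CoV-2 data")[:3]:
        print(path)
```

Under these assumptions the generator yields 22 paths for “SARS-CoV-2 data” and 200 for “Virus-Host DB data”; the actual directory contents should be checked against the repository before relying on these counts.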

2. Experimental design, materials, and methods

The streams are based on the nucleotide sequence, s, expressed as

s = [s_{1}, \dots, s_{n}, \dots, s_{N}]

(1)

where N is the length of the sequence and s_n is the n-th nucleotide of the sequence.

For DM and DM-CGR, the nucleotide sequence, s, is grouped into sub-sequences of k bases. The group of sub-sequences can be expressed as

B = [\begin{matrix} b_{1} \\ ⋮ \\ b_{i} \\ ⋮ \\ b_{K} \end{matrix}] = [\begin{matrix} s_{1} & \dots & s_{k} \\ ⋮ & ⋱ & ⋮ \\ s_{k (i - 1) + 1} & \dots & s_{k (i - 1) + k} \\ ⋮ & ⋱ & ⋮ \\ s_{K - k + 1} & \dots & s_{K} \end{matrix}]

(2)

where

K = k \times ⌊ \frac{N}{k} ⌋

(3)

and the i-th vector b_i is the i-th group of k nucleotides, that is

b_{i} = [b_{i, 1}, \dots, b_{i, j}, \dots, b_{i, k}] = [s_{k (i - 1) + 1}, \dots, s_{k (i - 1) + j}, \dots, s_{k (i - 1) + k}] .

(4)
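The grouping in Eqs. (2)–(4) can be illustrated with a short Python sketch. The function name `group_bases` is hypothetical. Eq. (2) labels the last row b_K while its entries end at s_K with K = k⌊N/k⌋; the sketch assumes the second reading, so it produces ⌊N/k⌋ non-overlapping groups and drops any trailing bases that do not fill a complete group.

```python
def group_bases(s, k):
    """Split sequence s into consecutive, non-overlapping groups of k bases.

    With 0-based index i, the returned element i corresponds to b_{i+1} in
    Eq. (4), i.e. s_{k*i + 1} ... s_{k*i + k} in the 1-based notation.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    n_groups = len(s) // k  # floor(N / k); trailing bases are discarded
    return [s[k * i : k * i + k] for i in range(n_groups)]


if __name__ == "__main__":
    # N = 10 and k = 3 give K = 9, so the final base is not used.
    print(group_bases("ATTAAAGGTT", 3))
```

For the 10-base toy sequence and k = 3, this returns the three groups `ATT`, `AAA` and `GGT`.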

For DM, the group of sub-sequences, stored in matrix B, is transformed into a sequence of integer values expressed as

c = [c_{1}, \dots, c_{i}, \dots, c_{K}]

(5)

where c is the DM stream stored in the dataset. The computation of the DM stream, c, can be expressed as

{[\begin{matrix} c_{1} \\ ⋮ \\ c_{i} \\ ⋮ \\ c_{K} \end{matrix}]}^{T} = f_{map} (B) = [\begin{matrix} f_{map} (b_{1}) \\ ⋮ \\ f_{map} (b_{i}) \\ ⋮ \\ f_{map} (b_{K}) \end{matrix}]

(6)

where f_map( · ) is the mapping function expressed by

c_{i} = f_{map} (b_{i}) = (\sum_{j = 0}^{k - 1} 4^{j} \times (u_{i, j} - 1)) + 1

(7)

and

u_{i, j} = \begin{cases} 1 & \text{for } b_{i, j + 1} = T \text{ or } U \\ 2 & \text{for } b_{i, j + 1} = C \\ 3 & \text{for } b_{i, j + 1} = A \\ 4 & \text{for } b_{i, j + 1} = G \end{cases} .

(8)

For DM-CGR, the stream is characterized by vector a expressed as

a = [a_{1}, \dots, a_{i}, \dots, a_{K}]

(9)

where the a_i is the i-th value of CGR. In CGR (see [11], [12]) each element a_i is a bi-dimensional value expressed as

a_{i} = (a_{i}^{x}, a_{i}^{y})

(10)

where $a_{i}^{x}$ and $a_{i}^{y}$ are the x-axis and y-axis coordinates in the bi-dimensional space, respectively. The values of the CGR are calculated by applying the functions $f_{CGR}^{x} (\cdot)$ and $f_{CGR}^{y} (\cdot)$ to matrix B, that is

{[\begin{matrix} (a_{1}^{x}, a_{1}^{y}) \\ ⋮ \\ (a_{i}^{x}, a_{i}^{y}) \\ ⋮ \\ (a_{K}^{x}, a_{K}^{y}) \end{matrix}]}^{T} = (f_{CGR}^{x} (B), f_{CGR}^{y} (B)) = [\begin{matrix} (f_{CGR}^{x} (b_{1}), f_{CGR}^{y} (b_{1})) \\ ⋮ \\ (f_{CGR}^{x} (b_{i}), f_{CGR}^{y} (b_{i})) \\ ⋮ \\ (f_{CGR}^{x} (b_{K}), f_{CGR}^{y} (b_{K})) \end{matrix}] .

(11)

The function $f_{CGR}^{x} (\cdot)$ calculates the x-axis value of the CGR and can be expressed as

a_{i}^{x} = f_{CGR}^{x} (b_{i}) = p_{i, k}^{x}

(12)

where

p_{i, j}^{x} = \frac{1}{2} u_{i, j}^{x} + \frac{1}{2} p_{i, j - 1}^{x}, \quad \text{for } j = 1, \dots, k

(13)

and

u_{i, j}^{x} = \begin{cases} 1 & \text{for } b_{i, j} = A \\ - 1 & \text{for } b_{i, j} = T \text{ or } U \\ - 1 & \text{for } b_{i, j} = C \\ 1 & \text{for } b_{i, j} = G \end{cases} .

(14)

For the y-axis, the function $f_{CGR}^{y} (\cdot)$ can be expressed as

a_{i}^{y} = f_{CGR}^{y} (b_{i}) = p_{i, k}^{y}

(15)

where

p_{i, j}^{y} = \frac{1}{2} u_{i, j}^{y} + \frac{1}{2} p_{i, j - 1}^{y}, \quad \text{for } j = 1, \dots, k

(16)

and

u_{i, j}^{y} = \begin{cases} 1 & \text{for } b_{i, j} = A \\ 1 & \text{for } b_{i, j} = T \text{ or } U \\ - 1 & \text{for } b_{i, j} = C \\ - 1 & \text{for } b_{i, j} = G \end{cases} .

(17)

For the initial condition, $j = 0,$ $p_{i, 0}^{x} = α_{x}$ and $p_{i, 0}^{y} = α_{y}$ [11], [12]. The dataset was generated with $α_{x} = 0$ and $α_{y} = 0$ .
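The recursions in Eqs. (12)–(17) can be sketched in Python as follows. The names `UX`, `UY`, `f_cgr` and `dm_cgr_stream` are hypothetical, and the defaults follow the stated initial condition $α_{x} = α_{y} = 0$. As in Eqs. (2)–(4), the sketch assumes ⌊N/k⌋ non-overlapping groups and restarts the recursion from the initial condition for every group.

```python
# Eqs. (14) and (17): corner coordinates assigned to each base.
UX = {"A": 1, "T": -1, "U": -1, "C": -1, "G": 1}
UY = {"A": 1, "T": 1, "U": 1, "C": -1, "G": -1}


def f_cgr(b, alpha_x=0.0, alpha_y=0.0):
    """Return (p_k^x, p_k^y) for one group b, following Eqs. (13) and (16)."""
    px, py = alpha_x, alpha_y  # initial condition at j = 0
    for base in b.upper():
        # Each step moves halfway from the current point to the base's corner.
        px = 0.5 * UX[base] + 0.5 * px
        py = 0.5 * UY[base] + 0.5 * py
    return px, py


def dm_cgr_stream(s, k):
    """DM-CGR stream of Eq. (9): one (x, y) point per group of k bases."""
    n_groups = len(s) // k
    return [f_cgr(s[k * i : k * i + k]) for i in range(n_groups)]


if __name__ == "__main__":
    print(f_cgr("ATG"))
    print(dm_cgr_stream("ATTAAAGGTT", 2))
```

For the group `ATG` the point moves from (0, 0) to (0.5, 0.5) after A, to (−0.25, 0.75) after T, and to (0.375, −0.125) after G, which is the value stored for that group.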

For kMersM and kMersM-CGR, the nucleotide sequence, s, is grouped into k-mers sub-sequences [13], [14] in the matrix H, which can be expressed as

H = [\begin{matrix} h_{1} \\ h_{2} \\ ⋮ \\ h_{i} \\ ⋮ \\ h_{N - k} \\ h_{N - k + 1} \end{matrix}] = [\begin{matrix} s_{1} & \dots & s_{k} \\ s_{2} & \dots & s_{k + 1} \\ ⋮ & ⋱ & ⋮ \\ s_{i} & \dots & s_{i + k - 1} \\ ⋮ & ⋱ & ⋮ \\ s_{N - k} & \dots & s_{N - 1} \\ s_{N - k + 1} & \dots & s_{N} \end{matrix}] .

(18)

The kMersM stream is characterized as a sequence of integer values expressed as

r = [r_{1}, \dots, r_{i}, \dots, r_{N - k + 1}]

(19)

where

{[\begin{matrix} r_{1} \\ ⋮ \\ r_{i} \\ ⋮ \\ r_{N - k + 1} \end{matrix}]}^{T} = f_{map} (H) = [\begin{matrix} f_{map} (h_{1}) \\ ⋮ \\ f_{map} (h_{i}) \\ ⋮ \\ f_{map} (h_{N - k + 1}) \end{matrix}] .

(20)
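A minimal Python sketch of the k-mers construction in Eq. (18) and the kMersM stream in Eqs. (19) and (20) is given below. The names `kmers`, `f_map` and `kmers_stream` are hypothetical; the mapping reuses the base codes of Eq. (8), and ambiguous bases are not handled.

```python
BASE_CODE = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}  # Eq. (8)


def f_map(h):
    """Eq. (7) applied to one k-mer h."""
    return sum(4**j * (BASE_CODE[base] - 1) for j, base in enumerate(h.upper())) + 1


def kmers(s, k):
    """Rows of H in Eq. (18): overlapping windows of k bases, shifted by one."""
    # Window i (0-based) covers s_{i+1} ... s_{i+k} in 1-based notation,
    # which gives N - k + 1 windows in total.
    return [s[i : i + k] for i in range(len(s) - k + 1)]


def kmers_stream(s, k):
    """kMersM stream r of Eq. (19): one integer per overlapping k-mer."""
    return [f_map(h) for h in kmers(s, k)]


if __name__ == "__main__":
    print(kmers("ATTAAAGG", 2))
    print(kmers_stream("ATTAAAGG", 2))
```

Unlike the DM stream, which has ⌊N/k⌋ values, this stream has N − k + 1 values because consecutive windows overlap by k − 1 bases; the 8-base toy sequence with k = 2 gives 7 values.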

The function f_map( · ) is the mapping process characterized by Eqs. (7) and (8). The kMersM-CGR stream is stored in the vector z expressed as

z = [z_{1}, \dots, z_{i}, \dots, z_{N - k + 1}]

(21)

where z_i is the i-th value of the CGR. Each i-th element z_i is a bi-dimensional value expressed as

z_{i} = (z_{i}^{x}, z_{i}^{y})

(22)

where $z_{i}^{x}$ and $z_{i}^{y}$ are the x-axis and y-axis coordinates in the bi-dimensional space, respectively. The values of the CGR are calculated by applying the functions $f_{CGR}^{x} (\cdot)$ (see Eqs. (12)–(14)) and $f_{CGR}^{y} (\cdot)$ (see Eqs. (15)–(17)) to matrix H, that is

{[\begin{matrix} (z_{1}^{x}, z_{1}^{y}) \\ ⋮ \\ (z_{i}^{x}, z_{i}^{y}) \\ ⋮ \\ (z_{N - k + 1}^{x}, z_{N - k + 1}^{y}) \end{matrix}]}^{T} = (f_{CGR}^{x} (H), f_{CGR}^{y} (H)) = [\begin{matrix} (f_{CGR}^{x} (h_{1}), f_{CGR}^{y} (h_{1})) \\ ⋮ \\ (f_{CGR}^{x} (h_{i}), f_{CGR}^{y} (h_{i})) \\ ⋮ \\ (f_{CGR}^{x} (h_{N - k + 1}), f_{CGR}^{y} (h_{N - k + 1})) \end{matrix}] .

(23)
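Eq. (23) can be sketched in Python by applying the CGR recursion of Eqs. (12)–(17) to each overlapping k-mer. The names `f_cgr` and `kmers_cgr_stream` are hypothetical, and the sketch assumes, as for DM-CGR, that the recursion restarts from $α_{x} = α_{y} = 0$ for every k-mer.

```python
UX = {"A": 1, "T": -1, "U": -1, "C": -1, "G": 1}   # Eq. (14)
UY = {"A": 1, "T": 1, "U": 1, "C": -1, "G": -1}    # Eq. (17)


def f_cgr(h, alpha_x=0.0, alpha_y=0.0):
    """CGR point of one k-mer h, following Eqs. (13) and (16)."""
    px, py = alpha_x, alpha_y
    for base in h.upper():
        px = 0.5 * UX[base] + 0.5 * px
        py = 0.5 * UY[base] + 0.5 * py
    return px, py


def kmers_cgr_stream(s, k):
    """kMersM-CGR stream z of Eq. (21): one (x, y) point per overlapping k-mer."""
    # N - k + 1 windows, each processed independently (assumption).
    return [f_cgr(s[i : i + k]) for i in range(len(s) - k + 1)]


if __name__ == "__main__":
    for point in kmers_cgr_stream("ATGC", 2):
        print(point)
```

For the toy sequence `ATGC` with k = 2, the windows `AT`, `TG` and `GC` give the points (−0.25, 0.75), (0.25, −0.25) and (−0.25, −0.75).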

Fig. 1, Fig. 2, Fig. 3 and Fig. 4 show examples of the four DSRs for a SARS-CoV-2 sequence from Brazil.

Fig. 3 — Example of the kMersM-DSR values for the SARS-CoV-2 sequence ( $i = 500 \dots 600$ ) stored in dataset (MT126808 - Brazil).

Fig. 4 — Example of the kMersM-CGR-DSR values for the SARS-CoV-2 sequence ( $i = 500 \dots 600$ ) stored in dataset (MT126808 - Brazil).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors wish to acknowledge the financial support of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Footnotes

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.dib.2020.105829

Contributor Information

Raquel de M. Barbosa, Email: raquelmb@mit.edu.

Marcelo A.C. Fernandes, Email: mfernandes@dca.ufrn.br.

Appendix A. Supplementary materials

Supplementary Data S1

Supplementary Raw Research Data. This is open data under the CC BY license http://creativecommons.org/licenses/by/4.0/

mmc1.xml (1.1 KB, xml)

References

1. NCBI, SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) Sequences, 2020, https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/.
2. Mihara T., Nishimura Y., Shimizu Y., Nishiyama H., Yoshikawa G., Uehara H., Hingamp P., Goto S., Ogata H. Linking virus genomes with host taxonomy. Viruses. 2016;8(3). doi: 10.3390/v8030066. URL https://www.mdpi.com/1999-4915/8/3/66
3. Virus-Host DB, Virus-Host DB - Website, 2020, https://www.genome.jp/virushostdb.
4. Randhawa G.S., Soltysiak M.P., Roz H.E., de Souza C.P., Hill K.A., Kari L. Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: Covid-19 case study. bioRxiv. 2020. doi: 10.1101/2020.02.03.932350.
5. Randhawa G.S., Soltysiak M.P., Roz H.E., de Souza C.P., Hill K.A., Kari L. Machine learning-based analysis of genomes suggests associations between wuhan 2019-ncov and bat betacoronaviruses. bioRxiv. 2020. doi: 10.1101/2020.02.03.932350.
6. de M. Barbosa R., Fernandes M.A. Chaos game representation dataset of sars-cov-2 genome. Mendeley Data. 2020;v2. doi: 10.17632/nvk5bf3m2f.2.
7. de M. Barbosa R., Fernandes M.A. k-mers 1d and 2d representation dataset of sars-cov-2 nucleotide sequences. Mendeley Data. 2020;v2. doi: 10.17632/f5y9cggnxy.2.
8. de M. Barbosa R., Fernandes M.A. Chaos game representation dataset of sars-cov-2 genome. Data Brief. 2020;30:105618. doi: 10.1016/j.dib.2020.105618.
9. Proakis J.G., Manolakis D.K. Digital Signal Processing. 4th Ed. Prentice-Hall, Inc.; USA: 2006.
10. Pinello L., Lo Bosco G., Yuan G.-C. Applications of alignment-free methods in epigenomics. Briefings in Bioinf. 2014;15(3):419–430. doi: 10.1093/bib/bbt078.
11. Jeffrey H. Chaos game representation of gene structure. Nucleic Acids Research. 1990;18(8):2163–2170. doi: 10.1093/nar/18.8.2163.
12. Yin C. Encoding dna sequences by integer chaos game representation. 2017. arXiv: 1712.04546.
13. Mapleson D., Garcia Accinelli G., Kettleborough G., Wright J., Clavijo B.J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics. 2016;33(4):574–576. doi: 10.1093/bioinformatics/btw663. URL https://academic.oup.com/bioinformatics/article-pdf/33/4/574/25146635/btw663.pdf
14. Chor B., Horn D., Goldman N., Levy Y., Massingham T. Genomic dna k-mer spectra: models and modalities. Genome Biol. 2009;10(10):R108. doi: 10.1186/gb-2009-10-10-r108.

