GigaScience. 2020 Jul 6;9(7):giaa072. doi: 10.1093/gigascience/giaa072

Table 2: Test datasets

A) Genome sequence datasets
Category | Organism | Accession | Size
Virus | Gordoniaphage GAL1 [61] | GCF_001884535.1 | 50.7 kB
Bacteria | WS1 bacterium JGI 0000059-K21 [60] | GCA_000398605.1 | 522 kB
Protist | Astrammina rara [60] | GCA_000211355.2 | 1.71 MB
Fungus | Nosema ceranae [60] | GCA_000988165.1 | 5.81 MB
Protist | Cryptosporidium parvum Iowa II [60] | GCA_000165345.1 | 9.22 MB
Protist | Spironucleus salmonicida [60] | GCA_000497125.1 | 13.1 MB
Protist | Tieghemostelium lacteum [60] | GCA_001606155.1 | 23.7 MB
Fungus | Fusarium graminearum PH-1 [61] | GCF_000240135.3 | 36.9 MB
Protist | Salpingoeca rosetta [60] | GCA_000188695.1 | 56.2 MB
Algae | Chondrus crispus [60] | GCA_000350225.2 | 106 MB
Algae | Kappaphycus alvarezii [60] | GCA_002205965.2 | 341 MB
Animal | Strongylocentrotus purpuratus [61] | GCF_000002235.4 | 1.01 GB
Plant | Picea abies [60] | GCA_900067695.1 | 13.4 GB
B) Other DNA datasets
Dataset | No. of sequences | Size | Source | Date
Mitochondrion [61] | 9,402 | 245 MB | RefSeq FTP: ftp://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/mitochondrion.1.1.genomic.fna.gz and ftp://ftp.ncbi.nlm.nih.gov/refseq/release/mitochondrion/mitochondrion.2.1.genomic.fna.gz | 15 March 2019
NCBI Virus Complete Nucleotide Human [62] | 36,745 | 482 MB | NCBI Virus: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/ | 11 May 2020
Influenza [63] | 700,001 | 1.22 GB | Influenza Virus Database: ftp://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.fna.gz | 27 April 2019
Helicobacter [60] | 108,292 | 2.76 GB | NCBI Assembly: https://www.ncbi.nlm.nih.gov/assembly | 24 April 2019
C) RNA datasets
SILVA 132 LSURef [64] | 198,843 | 610 MB | SILVA database: https://ftp.arb-silva.de/release_132/Exports/SILVA_132_LSURef_tax_silva.fasta.gz | 11 December 2017
SILVA 132 SSURef Nr99 [64] | 695,171 | 1.11 GB | SILVA database: https://ftp.arb-silva.de/release_132/Exports/SILVA_132_SSURef_Nr99_tax_silva.fasta.gz | 11 December 2017
SILVA 132 SSURef [64] | 2,090,668 | 3.28 GB | SILVA database: https://ftp.arb-silva.de/release_132/Exports/SILVA_132_SSURef_tax_silva.fasta.gz | 11 December 2017
D) Multiple DNA sequence alignments
UCSC hg38 7way knownCanonical-exonNuc [65] | 1,470,154 | 340 MB | UCSC: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz7way/alignments/knownCanonical.exonNuc.fa.gz | 6 June 2014
UCSC hg38 20way knownCanonical-exonNuc [65] | 4,211,940 | 969 MB | UCSC: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz20way/alignments/knownCanonical.exonNuc.fa.gz | 30 June 2015
E) Protein datasets
PDB [66] | 109,914 | 67.6 MB | PDB database FTP: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz | 9 April 2019
Homo sapiens GRCh38 [67] | 105,961 | 73.2 MB | Ensembl FTP: ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.all.fa.gz | 12 March 2019
NCBI Virus RefSeq Protein [62] | 373,332 | 122 MB | NCBI Virus: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/ | 10 May 2020
UniProtKB Reviewed (Swiss-Prot) [68] | 560,118 | 277 MB | UniProt FTP: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz | 2 April 2019
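The "No. of sequences" column can be cross-checked against a downloaded copy of any FASTA file listed above. The following is a minimal sketch, not the procedure used to build this table: it counts records and residues in a local plain or gzipped FASTA file. The function names `fasta_records` and `summarize_fasta` are hypothetical, and because the table does not state whether "Size" means compressed or uncompressed bytes, the residue total is not assumed to match that column.

```python
import gzip
from typing import Iterable, Iterator, Tuple


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence_length) for each FASTA record.

    Lines before the first '>' header are ignored; blank lines add nothing.
    """
    header = None
    length = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header = line[1:]  # header text without the leading '>'
            length = 0
        elif header is not None:
            length += len(line.strip())  # residues on this sequence line
    if header is not None:
        yield header, length  # emit the final record


def summarize_fasta(path: str) -> Tuple[int, int]:
    """Return (number of sequences, total residues) for a FASTA file.

    Files ending in '.gz' are opened with gzip in text mode (an assumption
    based only on the file extension).
    """
    opener = gzip.open if path.endswith(".gz") else open
    n_seqs = 0
    n_residues = 0
    with opener(path, "rt") as handle:
        for _header, length in fasta_records(handle):
            n_seqs += 1
            n_residues += length
    return n_seqs, n_residues
```

For example, `summarize_fasta("uniprot_sprot.fasta.gz")` could be run on a downloaded Swiss-Prot file. Note that URLs pointing at a moving release (such as UniProt's `current_release`) will now return a newer release, so counts are expected to differ from the dated values in the table.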