2019 Jul 25;20:144. doi: 10.1186/s13059-019-1755-7

Table 2. Overview of the reference data sets

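| Category | Name | # Sequences | Average sequence length | # Files | # Sequence comparisons |
|---|---|---:|---:|---:|---:|
| Regulatory element detection | Cis-regulatory modules (CRMs) [6] | 370 | 764 nt | 370 | 68,265 |
| Protein sequence classification | Low sequence identity (< 40%) [57] | 1,066 | 180 aa | 1,066 | 567,645 |
| | High sequence identity (≥ 40%) [57] | 2,128 | 184 aa | 2,128 | 2,263,128 |
| Gene tree inference | SwissTree [58] | 651 | 398 aa | 651 | 211,575 |
| Genome-based phylogeny | Assembled genomes: 29 E. coli/Shigella strains | 29 | 4,895,247 nt | 29 | 406 |
| | Assembled genomes: 14 plant species | 14 | 337,515,688 nt | 14 | 91 |
| | Assembled genomes: 25 fish mitochondrial genomes [59] | 25 | 16,623 nt | 25 | 300 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 0.03125 | 29,557 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 0.0625 | 59,116 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 0.125 | 118,266 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 0.25 | 236,541 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 0.5 | 473,081 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 1 | 946,169 | 150 nt | 29 | 406 |
| | Unassembled genomes: 29 E. coli/Shigella strains, coverage 5 | 4,730,778 | 150 nt | 29 | 406 |
| | Unassembled genomes: 14 plant species, coverage 0.015625 | 48,274 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 0.03125 | 96,489 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 0.0625 | 1,931,268 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 0.125 | 3,862,905 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 0.25 | 7,725,928 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 0.5 | 15,461,718 | 150 nt | 14 | 91 |
| | Unassembled genomes: 14 plant species, coverage 1 | 30,903,727 | 150 nt | 14 | 91 |
| Horizontal gene transfer | 27 E. coli/Shigella genomes [60] | 27 | 4,905,896 nt | 27 | 351 |
| | 8 Yersinia species [61] | 8 | 4,605,553 nt | 8 | 28 |
| | 33 simulated genomes [62], HGT level 0 | 33 | 2,205,524 nt | 33 | 528 |
| | 33 simulated genomes [62], HGT level 250 | 33 | 2,149,620 nt | 33 | 528 |
| | 33 simulated genomes [62], HGT level 500 | 33 | 2,230,317 nt | 33 | 528 |
| | 33 simulated genomes [62], HGT level 750 | 33 | 2,263,926 nt | 33 | 528 |
| | 33 simulated genomes [62], HGT level 1,000 | 33 | 2,238,661 nt | 33 | 528 |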

An interactive visualization of the results for all data sets is available online (http://afproject.org).
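
The "# Sequence comparisons" column of Table 2 appears to follow from an all-against-all design: with n files per data set, the number of unordered pairs is n(n − 1)/2. This reading is inferred from the tabulated values and is not stated in the table itself. The short Python sketch below shows the arithmetic; the function name `pairwise_comparisons` is illustrative and not part of any published tool.

```python
def pairwise_comparisons(n_files: int) -> int:
    """Return the number of unordered pairs among n_files items, n * (n - 1) / 2.

    Assumption: every file is compared once with every other file
    (no self-comparisons, and A-vs-B counts the same as B-vs-A).
    """
    if n_files < 0:
        raise ValueError("n_files must be non-negative")
    return n_files * (n_files - 1) // 2


if __name__ == "__main__":
    # File counts taken from the "# Files" column of Table 2.
    data_sets = {
        "E. coli/Shigella strains": 29,
        "Plant species": 14,
        "Fish mitochondrial genomes": 25,
        "SwissTree": 651,
        "High sequence identity": 2128,
    }
    for name, n_files in data_sets.items():
        print(f"{name}: {n_files} files, {pairwise_comparisons(n_files):,} comparisons")
```

For the unassembled genomes, the count depends on the number of files (one set of reads per genome) and not on the number of reads, which is why every coverage level of the 29 E. coli/Shigella strains lists the same 406 comparisons.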