2021 May 5;9:e11348. doi: 10.7717/peerj.11348

Table 1. Details of TQMD runs and phylogenomic datasets built on eight different subsets of Bacteria.

For each dataset, TQMD was launched with the Jaccard Index as the distance, a pack size of 200, and the loose clustering mode, and was allocated a maximum of 50 CPUs. The other parameters (direct or indirect strategy and distance threshold) are provided in the table, along with the total running time in CPU hours (h.CPU), the initial number of genomes (# starting), the number of representatives obtained (# repr.), the number of ribosomal protein alignments used in the supermatrix (# prot.), and the number of unambiguously aligned amino acids in the supermatrix (# AA). Further details (taxonomy and download links, Krona taxonomic plots, Forty-Two reports, supermatrices and trees) are available at https://doi.org/10.6084/m9.figshare.13238936.
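As an illustration of the distance named in the legend, the sketch below computes a Jaccard index between the k-mer sets of two sequences, together with the complementary distance 1 - J. It is not TQMD's implementation: the function names, the toy sequences and the k-mer length are assumptions made for the example, and the legend does not say how TQMD compares the resulting value with the threshold.

```python
# Minimal sketch of a Jaccard index on k-mer sets.
# All names, the toy sequences and k = 3 are illustrative assumptions,
# not TQMD's actual code or parameters.

def kmers(seq: str, k: int) -> set:
    """Return the set of all overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    """Complement of the Jaccard index (0 = identical sets, 1 = disjoint sets)."""
    return 1.0 - jaccard_index(a, b)


if __name__ == "__main__":
    x = kmers("ACGTAC", 3)
    y = kmers("ACGTTC", 3)
    print("index:", jaccard_index(x, y))
    print("distance:", jaccard_distance(x, y))
```

Sets are used because the Jaccard index only depends on which k-mers are present, not on how often each one occurs.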

Label  Dataset         Strategy  Threshold  h.CPU  # starting  # repr.  # prot.  # AA
A      Bacteria (49)   indirect  0.900      656    63,863      49       53       6338
B      Bacteria (151)  indirect  0.880      656    63,863      151      53       6187
C      Actinobacteria  direct    0.900      96     8859        20       51       6562
D      Bacteroidetes   direct    0.850      16     1225        37       49       6605
E      Chlamydia       direct    0.800      6      360         32       44       6131
F      Cyanobacteria   direct    0.800      8      428         46       48       6314
G      Firmicutes      direct    0.900      242    21,544      22       52       6536
H      Proteobacteria  direct    0.885      310    30,690      36       53       6471
I      Archaea         direct    0.850      8      432         86       57       7810
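To make the scale of the dereplication easier to compare across rows, the sketch below derives a reduction factor (# starting divided by # repr.) for each run. The counts are copied from the table; the reduction factor is a derived quantity that the table does not report, and the names `RUNS` and `reduction_factor` are assumptions made for this example.

```python
# Sketch: reduction factor (# starting / # repr.) per run in Table 1.
# RUNS and reduction_factor are hypothetical names; the counts come from the table.

RUNS = {
    # label: (dataset, number of starting genomes, number of representatives)
    "A": ("Bacteria (49)", 63863, 49),
    "B": ("Bacteria (151)", 63863, 151),
    "C": ("Actinobacteria", 8859, 20),
    "D": ("Bacteroidetes", 1225, 37),
    "E": ("Chlamydia", 360, 32),
    "F": ("Cyanobacteria", 428, 46),
    "G": ("Firmicutes", 21544, 22),
    "H": ("Proteobacteria", 30690, 36),
    "I": ("Archaea", 432, 86),
}


def reduction_factor(starting: int, representatives: int) -> float:
    """Number of starting genomes per selected representative."""
    return starting / representatives


if __name__ == "__main__":
    for label, (dataset, starting, repr_count) in RUNS.items():
        factor = reduction_factor(starting, repr_count)
        print(f"{label}  {dataset:<15} {factor:10.1f}")
```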