2022 Jan 11;39(2):msab365. doi: 10.1093/molbev/msab365

Table 4.

Description of the Empirical Data Sets Used in Our Benchmark.

| Data Set | Families | Genes | GFTs | Gene Data Source |
| --- | --- | --- | --- | --- |
| Primates13 | 16,670 | 268,338 | Inferred | Ensembl Compara |
| Cyanobacteria36 | 1,099 | 41,035 | Inferred | Hogenom |
| Vertebrates22 | 18,829 | 1,521,587 | Extracted | PhylomeDB |
| Fungi16 | 7,180 | 85,866 | Extracted | Butler et al. (2009) |
| Fungi60 | 5,665 | 391,471 | Inferred | PhylomeDB |
| Plants23 | 21,469 | 1,652,464 | Inferred | PhylomeDB |
| Life92 | 41,222 | 628,747 | Inferred | Williams et al. (2020) |
| Archaea364 | 150 | 46,801 | Inferred | Dombrowski et al. (2020) |
| Plants83 | 9,237 | 1,294,695 | Extracted | 1000k plants |
| Vertebrates188 | 31,612 | 3,725,332 | Inferred | Ensembl Compara |

Note.—Each data set name is suffixed by the number of species in that data set. Families is the number of input gene families. Genes is the total number of gene copies in the data set. GFTs indicates whether we inferred the gene family trees (GFTs) ourselves (“Inferred”) or extracted them from the data source (“Extracted”). Gene Data Source is the database or the project/publication from which the GFTs and/or gene family alignments were obtained.
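
The naming convention in the note can be illustrated with a short Python sketch. It splits a data set name into its clade label and species count, then derives the mean number of gene copies per family from the Families and Genes columns. The helper name `split_name` and the choice of three sample rows are assumptions made for illustration; they are not part of the benchmark itself.

```python
import re

# Three rows copied from Table 4: (data set, families, genes).
ROWS = [
    ("Primates13", 16670, 268338),
    ("Cyanobacteria36", 1099, 41035),
    ("Archaea364", 150, 46801),
]


def split_name(name):
    """Split a name such as 'Primates13' into ('Primates', 13).

    Hypothetical helper: assumes a purely alphabetic clade label
    followed by the species count, as described in the table note.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected data set name: {name!r}")
    return match.group(1), int(match.group(2))


for name, families, genes in ROWS:
    clade, species = split_name(name)
    # Mean gene copies per family, derived from the two count columns.
    per_family = genes / families
    print(f"{clade}: {species} species, {per_family:.1f} gene copies per family")
```

The per-family average is only a rough summary; it says nothing about how gene copies are distributed across families or species.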