Table 4. Description of the Empirical Data Sets Used in Our Benchmark.
Data Set | Families | Genes | GFTs | Gene Data Source |
---|---|---|---|---|
Primates13 | 16,670 | 268,338 | Inferred | Ensembl Compara |
Cyanobacteria36 | 1,099 | 41,035 | Inferred | Hogenom |
Vertebrates22 | 18,829 | 1,521,587 | Extracted | PhylomeDB |
Fungi16 | 7,180 | 85,866 | Extracted | Butler et al. (2009) |
Fungi60 | 5,665 | 391,471 | Inferred | PhylomeDB |
Plants23 | 21,469 | 1,652,464 | Inferred | PhylomeDB |
Life92 | 41,222 | 628,747 | Inferred | Williams et al. (2020) |
Archaea364 | 150 | 46,801 | Inferred | Dombrowski et al. (2020) |
Plants83 | 9,237 | 1,294,695 | Extracted | 1000k plants |
Vertebrates188 | 31,612 | 3,725,332 | Inferred | Ensembl Compara |
Note: Data set names are suffixed by the number of species in the respective data set. Families gives the number of input gene families. Genes gives the total number of gene copies in the data set. GFTs indicates whether we inferred the gene family trees (GFTs) ourselves (“Inferred”) or extracted them from the data source (“Extracted”). Gene Data Source is the database, project, or publication from which the GFTs and/or gene family alignments were obtained.
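
The naming convention and column definitions above make two derived quantities easy to compute: the species count (read from the numeric suffix of the data set name) and the mean number of gene copies per family (Genes divided by Families). The following is a minimal sketch of that arithmetic, not part of the benchmark pipeline; the names `TABLE_4`, `species_count`, and `summarize` are hypothetical and introduced only for illustration.

```python
import re

# (data set, families, genes) transcribed from Table 4.
TABLE_4 = [
    ("Primates13", 16_670, 268_338),
    ("Cyanobacteria36", 1_099, 41_035),
    ("Vertebrates22", 18_829, 1_521_587),
    ("Fungi16", 7_180, 85_866),
    ("Fungi60", 5_665, 391_471),
    ("Plants23", 21_469, 1_652_464),
    ("Life92", 41_222, 628_747),
    ("Archaea364", 150, 46_801),
    ("Plants83", 9_237, 1_294_695),
    ("Vertebrates188", 31_612, 3_725_332),
]


def species_count(data_set: str) -> int:
    """Read the species count from the numeric suffix of a data set name."""
    match = re.search(r"(\d+)$", data_set)
    if match is None:
        raise ValueError(f"no numeric suffix in {data_set!r}")
    return int(match.group(1))


def summarize(rows):
    """Return one dict per data set with species count and mean family size."""
    return [
        {
            "data_set": name,
            "species": species_count(name),
            # Mean gene copies per family: Genes / Families.
            "genes_per_family": genes / families,
        }
        for name, families, genes in rows
    ]


if __name__ == "__main__":
    for row in summarize(TABLE_4):
        print(
            f"{row['data_set']:<16}"
            f"{row['species']:>5}"
            f"{row['genes_per_family']:>10.1f}"
        )
```

Running the script prints one line per data set. The mean family size is only a coarse summary, since it says nothing about how gene copies are distributed across families or species.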