Proceedings of the National Academy of Sciences of the United States of America. 2020 Nov 24;117(50):31582. doi: 10.1073/pnas.2015631117

Reply to Li et al.: Organism tree of life: Gene phylogeny vs. whole-proteome phylogeny

JaeJin Choi a,b,c, Sung-Hou Kim a,b,c,1
PMCID: PMC7749327  PMID: 33234566

In responding to the Letter by Li et al. (1) regarding our paper (2), we would like to remind its authors of two points. 1) The concept and structure of the organism tree of life (“organism ToL”) have been evolving since Darwin’s time and are expected to continue to evolve as the types and amounts of genomic data increase and as the methods for analyzing those data and constructing the ToL improve. 2) As a surrogate for the organism ToL, we construct a “whole-proteome ToL” from whole-proteome sequences (the protein sequences predicted from all protein-coding genes), in contrast to the commonly used “gene ToLs.” The whole-proteome ToL is built with an “alignment-free” method (4), based on information theory (3), that compares the entire sequence content of each proteome. Gene ToLs are built with an “alignment-based” method (5) that uses only the reliably aligned regions of a selected set of gene or protein sequences, which, in general, account for a very small fraction of a whole proteome.
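As an illustration of what an information-theory-based, alignment-free comparison of proteome content can look like, the Python sketch below compares two proteomes by the relative frequencies of all overlapping length-k words (“features”) and scores the difference with the Jensen-Shannon divergence. It is a minimal sketch under stated assumptions, not necessarily the exact procedure of ref. 4: the word length k = 3, the function names, and the toy sequences are hypothetical choices made for illustration, and a real analysis would need a carefully chosen k and complete proteomes.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(proteins, k):
    """Relative frequency of every overlapping length-k word across all proteins.

    Words are counted within each protein separately, so no artificial words
    span the boundary between two proteins.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no protein is at least k residues long")
    return {word: n / total for word, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical profiles, at most 1."""
    jsd = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        m = 0.5 * (pw + qw)  # frequency in the averaged profile
        if pw > 0:
            jsd += 0.5 * pw * log2(pw / m)
        if qw > 0:
            jsd += 0.5 * qw * log2(qw / m)
    return jsd


def proteome_distance(proteome_a, proteome_b, k=3):
    """Alignment-free distance between two proteomes (lists of protein sequences)."""
    return jensen_shannon_divergence(
        feature_frequency_profile(proteome_a, k),
        feature_frequency_profile(proteome_b, k),
    )


# Hypothetical toy "proteomes"; k = 3 is an illustrative assumption only.
proteome_a = ["MKTAYIAKQRQISFVKSHFSRQ"]
proteome_b = ["MKTAYIAKQRQISFVKSHFSRE"]  # one substitution at the last residue
proteome_c = ["GSHMLEDPVAGNWWPTTCCGHH"]  # unrelated sequence

print(proteome_distance(proteome_a, proteome_b))
print(proteome_distance(proteome_a, proteome_c))
```

Because no alignment is computed, every region of every protein contributes to the profile, including regions that could not be aligned reliably. A matrix of pairwise distances like these could then be passed to a distance-based tree-building method.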

Although the grouping patterns of the organisms agree well between the two types of ToL, the branch lengths and branching orders of the trees do not. This difference is expected because the two approaches differ in 1) the content and descriptors of the input data and 2) the basic assumptions about which types of mutations drive the evolution of organisms, from which the branch lengths of the trees are derived (positional substitution mutations for gene trees vs. all types of mutations for whole-proteome trees).

These two differences are among the major reasons for the differences in tree topology, as we pointed out in our paper (2). Figure 1A of the Letter (1) simply illustrates such differences. However, the authors of the Letter interpret the “differences” in the figure as evidence of the “inaccuracy” of our method. This interpretation suggests that they take the current consensus gene ToL as the “true” organism ToL, a view we do not share, because a subjectively selected set of genes does not fully represent an organism.
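The second source of difference listed above, the types of mutations from which branch lengths are derived, can be illustrated in simplified form. The Python sketch below compares a hypothetical pair of sequences that differ only by a three-residue deletion. Assumptions, labeled here and in the code: the substitution-only p-distance treats gap columns as missing data, as many alignment-based pipelines do, and the k-mer Jaccard distance is a deliberately simple stand-in for comparing whole sequence content, chosen only for brevity; the sequences and function names are hypothetical.

```python
def p_distance(aligned_a, aligned_b):
    """Substitution-only distance: fraction of differing residues over
    gap-free aligned columns (gap columns are treated as missing data)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in columns) / len(columns)


def kmer_jaccard_distance(seq_a, seq_b, k=3):
    """Simple stand-in for a whole-content comparison (an assumption, not ref. 4):
    1 - |shared k-mers| / |all distinct k-mers| on unaligned sequences."""
    words_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    words_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    union = words_a | words_b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1.0 - len(words_a & words_b) / len(union)


# Hypothetical aligned pair: the second sequence lacks three residues ("-" = gap).
aligned_a = "MKTAYIAKQRQISFVKSHFSRQ"
aligned_b = "MKTAYIA---QISFVKSHFSRQ"

print(p_distance(aligned_a, aligned_b))
print(kmer_jaccard_distance(aligned_a.replace("-", ""), aligned_b.replace("-", "")))
```

In this toy case the substitution-only distance is zero, because the deleted residues fall entirely in gap columns, whereas the word-content distance is nonzero, because the deletion removes and creates words. Branch lengths built from these two kinds of distance therefore need not agree.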

What is needed is to start discussions on two issues: whether the current gene ToLs are proper surrogates for the organism ToL, and whether the whole-proteome ToL has enough merit to be considered as one possible alternative surrogate, among others likely to emerge in the future. These issues are especially relevant now, when the number of whole-genome sequences available for diverse as well as rare life forms is expanding rapidly.

After all, an organism ToL is a conceptual and metaphorical tree that captures a simplified narrative of what must have been a very complex and unpredictable evolutionary course leading to the organisms extant today. Because such a tree cannot be validated experimentally, we expect the search for a better surrogate of the organism ToL to continue, through more comprehensive and computable descriptors of the characteristics of each organism and through better methods for analyzing those descriptors, under as few subjective assumptions as possible at the time of investigation.

Footnotes

The authors declare no competing interest.

References


