In response to the Letter by Li et al. (1) regarding our paper (2), we would like to remind the authors of the Letter that 1) the concept and structure of the organism tree of life (“organism ToL”) have been evolving since Darwin’s time and are expected to continue to evolve as the types and amounts of genomic data increase and as the methods of analyzing the data and constructing the ToL improve; and 2) as a surrogate for the organism ToL, we use whole-proteome sequences (predicted from all protein-coding genes) to construct a “whole-proteome ToL,” in contrast to the commonly used “gene ToLs.” The former uses an information theory (3)-based “alignment-free” method (4) to compare whole-proteome sequence content, while the latter use an “alignment-based” method (5) on only the reliably aligned regions of a set of selected gene or protein sequences, which in general account for a very small fraction of a whole proteome.
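To make the notion of an alignment-free, information-theoretic comparison concrete, the following is a minimal sketch and not the implementation used in our paper (2). It assumes that each proteome is described by the relative frequencies of overlapping k-residue strings and that two such profiles are compared with the Jensen-Shannon divergence; the function names, the toy sequences, and the value of `k` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(proteome, k):
    """Relative frequencies of overlapping k-mers over all proteins.

    `proteome` is an iterable of protein sequences (strings). This
    descriptor is an illustrative assumption, not the published pipeline.
    """
    counts = Counter()
    for seq in proteome:
        # Sequences shorter than k contribute no k-mers (empty range).
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers of the requested length were found")
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two profiles.

    The value lies between 0 (identical profiles) and 1 (profiles that
    share no k-mers). No sequence alignment is involved.
    """
    jsd = 0.0
    for kmer in set(p) | set(q):
        pi = p.get(kmer, 0.0)
        qi = q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            jsd += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * log2(qi / mi)
    return jsd


if __name__ == "__main__":
    # Toy "proteomes" (hypothetical sequences, for illustration only).
    organism_a = ["MKVLAAGIVK", "MKVLSAGIV"]
    organism_b = ["MKVLAAGIVR", "MKILSAGIV"]
    k = 3  # hypothetical feature length
    profile_a = feature_frequency_profile(organism_a, k)
    profile_b = feature_frequency_profile(organism_b, k)
    print(jensen_shannon_divergence(profile_a, profile_b))
```

In such a scheme every protein of an organism contributes to its profile, whereas an alignment-based gene tree draws only on the aligned regions of the selected genes. A matrix of pairwise divergences of this kind could then be passed to a distance-based tree-building method.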
Although the grouping patterns of the organisms agree well between the two types of ToL, the branch lengths and branching orders of the trees do not. This difference is expected because the two approaches differ in 1) the content and descriptors of the input data and 2) the basic assumptions about the types of mutations that drive the evolution of organisms, from which the branch lengths of the trees are derived (positional substitution mutations for gene trees vs. all types of mutations for whole-proteome trees).
These differences are two of the major reasons for the differences in tree topology, as we pointed out in our paper (2). Figure 1A of the Letter (1) is simply an example of such differences. However, the authors of the Letter interpret the “differences” in the figure as showing the “inaccuracy” of our method, which suggests that they take the current consensus gene ToL as the “true” organism ToL. We do not agree with this view, because the subjectively selected genes of an organism do not represent the organism fully.
What is needed is a discussion of whether the current gene ToLs are proper surrogates for the organism ToL, and of whether the whole-proteome ToL has merits to be considered as a possible alternative surrogate, among others to come in the future. These issues are especially relevant now, when the availability of new whole-genome sequences of diverse as well as rare life forms is expanding rapidly.
After all, an organism ToL is a conceptual and metaphorical tree that captures a simplified narrative of what must have been a very complex and unpredictable evolutionary course leading to the extant organisms of today. Since such a tree cannot be experimentally validated, we expect the effort to find a better surrogate for the organism ToL to continue, through more comprehensive and computable descriptors of the characteristics associated with each organism and through advances in the methods for analyzing those descriptors, under as few subjective assumptions as possible at the time of investigation.
Footnotes
The authors declare no competing interest.
References
- 1. Li Y., et al., Feature frequency profile-based phylogenies are inaccurate. Proc. Natl. Acad. Sci. U.S.A. 117, 31580–31581 (2020).
- 2. Choi J., Kim S.-H., Whole-proteome tree of life suggests a deep burst of organism diversity. Proc. Natl. Acad. Sci. U.S.A. 117, 3678–3686 (2020).
- 3. Shannon C., A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948).
- 4. Zielezinski A., Vinga S., Almeida J., Karlowski W. M., Alignment-free sequence comparison: Benefits, applications, and tools. Genome Biol. 18, 186 (2017).
- 5. National Center for Biotechnology Information, Genetic analysis software. https://www.ncbi.nlm.nih.gov/guide/howto/dwn-software/. Accessed 26 July 2020.
