Abstract
Two errors in the article Best Match Graphs (Geiß et al. in JMB 78: 2015–2057, 2019) are corrected. One concerns the tacit assumption that digraphs are sink-free, which has to be added as an additional precondition in Lemma 9, Lemma 11, and Theorem 4. Correspondingly, Algorithm 2 requires that its input is sink-free. The second correction concerns an additional necessary condition in Theorem 9 required to characterize best match graphs. The amended results simplify the construction of least resolved trees for n-cBMGs, i.e., Algorithm 1. All other results remain unchanged and are correct as stated.
Keywords: Corrigendum, Phylogenetic Combinatorics, Colored digraph, Rooted triples
Best match graphs (BMGs) must be sink-free
Throughout Geiß et al. (2019) we have tacitly assumed that all vertex-colored digraphs satisfy the following property, which, by construction, is true for all colored best match graphs (cBMGs):
For each vertex x, there is an arc xy to at least one vertex y of every color other than the color of x.
All properly 2-colored digraphs appearing in the text are therefore assumed to be sink-free, i.e., the out-neighborhoods of their -classes are assumed to be non-empty:
(N4) $N(\alpha) \neq \emptyset$ for all $\alpha \in \mathcal{N}$.
This assumption was not clearly stated in the text.
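The sink-free property can be checked directly from the arc list. The following Python sketch is illustrative only: the function name `is_sink_free`, the encoding of arcs as a set of ordered pairs, and the `color` dictionary are assumptions made here and do not appear in Geiß et al. (2019). It tests the vertex-wise property stated above (an out-arc to at least one vertex of every other color) in a single pass over the arcs.

```python
def is_sink_free(vertices, arcs, color):
    """Return True if every vertex has an out-arc to each color other than its own.

    vertices: iterable of hashable vertex names (hypothetical encoding)
    arcs:     iterable of ordered pairs (x, y), read as the arc xy
    color:    dict mapping each vertex to its color
    """
    vertices = list(vertices)
    all_colors = {color[v] for v in vertices}

    # Colors reached by the out-arcs of each vertex, collected in one pass.
    reached = {v: set() for v in vertices}
    for x, y in arcs:
        if color[x] != color[y]:
            reached[x].add(color[y])

    # A vertex is fine if it reaches every color except its own.
    return all(reached[v] == all_colors - {color[v]} for v in vertices)
```

For a 2-colored digraph this amounts to requiring that no vertex has an empty out-neighborhood, which is the situation addressed by (N4).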
Property (N4) is required in the proof of Lemma 9 [last line on page 2032]: Here only implies for the classes with if , which in turn is equivalent to . Furthermore, implies and thus . Property (N4) is therefore also necessary to ensure that and thus that the tree is phylogenetic [page 2035, just before Theorem 4]. In summary, Lemma 9, Lemma 11, and Theorem 4 require (N4) as additional precondition.
Corrected characterization of n-cBMGs
The second paragraph of the proof of Theorem 9 in (Geiß et al. 2019, page 2045) incorrectly states that “for any and any color the out-neighborhood is the same w.r.t. and w.r.t. .”, leading to the incorrect conclusion that whenever exists. We recall that the triple set R is defined as the union
of all triples in the least resolved trees that explain the induced subgraphs of , and the Aho tree is defined on the leaf set (which may not have been clear from the wording in the text). We shall show in Proposition 3 below that is always a subgraph of whenever R is consistent. The example in Fig. 1 shows, however, that is possible because can contain triples that are not present in any of the 2-colored trees .
As a consequence, the characterization of n-cBMGs requires the equality as an additional condition. The corrected result, with the correction underlined, reads as follows:
Theorem 9
A connected colored digraph $(G,\sigma)$ is an n-cBMG if and only if (1) all induced subgraphs on two colors are 2-cBMGs, (2) the union R of all triples obtained from their least resolved trees forms a consistent set, and (3) $G(\mathrm{Aho}(R),\sigma) = (G,\sigma)$. In particular, $(\mathrm{Aho}(R),\sigma)$ is the unique least resolved tree that explains $(G,\sigma)$.
Condition (1) in Theorem 9, i.e., the requirement that all 2-colored induced subgraphs of $(G,\sigma)$ are 2-cBMGs, is necessary to ensure that the least resolved trees of these subgraphs exist and thus that their triple sets (and therefore also the set R of all triples displayed by the 2-colored induced subgraphs) are well-defined. Consistency of R is necessary for the existence of the Aho tree $\mathrm{Aho}(R)$. Clearly, Condition (3) is sufficient to ensure that $(G,\sigma)$ is an n-cBMG. Hence, it remains to show that Condition (3) is also necessary. This is achieved in Proposition 2 below.
Proof of Theorem 9
Instead of proving the corrected version of Theorem 9 directly, we first state and prove a slightly stronger and more convenient result, Theorem 1 below, and then proceed to derive Theorem 9. To this end, we first generalize Definition 8 in Geiß et al. (2019, page 2036) to digraphs with an arbitrary number of colors:
Definition 1
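(Schaller et al. (2021), Definition 2.7) Let $(G,\sigma)$ be a colored digraph. We say that a triple $ab|c$ is informative for $(G,\sigma)$ if a, b and c are pairwise distinct vertices in G such that (i) $\sigma(a) \neq \sigma(b) = \sigma(c)$ and (ii) $ab \in E(G)$ and $ac \notin E(G)$. The set of informative triples is denoted by $\mathcal{R}(G,\sigma)$.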
We briefly argue that, for 2-colored digraphs, the definition of informative triples given here is equivalent to the one given in Geiß et al. (2019): By definition, an informative triple of some colored digraph has vertices with exactly two colors, and thus is also an informative triple in one of its 2-colored induced subgraphs. It is easy to check that, for 2-colored digraphs, Definition 1 is equivalent to Definition 8 in Geiß et al. (2019), since the four induced subgraphs shown in Fig. 8 in (Geiß et al. (2019), page 2036) correspond to the presence or absence of the two optional arcs ba and ca in the informative triple ab|c (as defined here).
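To make the definition concrete, the following Python sketch enumerates informative triples. It assumes the reading of Definition 1 in which ab|c is informative when b and c share a color different from that of a, ab is an arc, and ac is not an arc (the two arcs ba and ca being optional, as noted above). The function name, the tuple encoding `(a, b, c)` for ab|c, and the input format are hypothetical choices made for this illustration.

```python
def informative_triples(vertices, arcs, color):
    """Return the set of triples (a, b, c), read as ab|c, such that
    color[a] != color[b] == color[c], ab is an arc and ac is not an arc.

    vertices: iterable of vertex names; arcs: collection of ordered pairs;
    color: dict vertex -> color. (Encoding assumed for this sketch.)
    """
    arcs = set(arcs)
    vertices = list(vertices)
    triples = set()
    for a, b in arcs:
        if color[a] == color[b]:
            continue  # an informative triple needs a and b of different colors
        for c in vertices:
            # c must differ from b, share the color of b, and not be an out-neighbor of a
            if c != b and color[c] == color[b] and (a, c) not in arcs:
                triples.add((a, b, c))
    return triples
```

Every triple returned involves exactly two colors, which matches the observation above that an informative triple of a colored digraph is also informative in one of its 2-colored induced subgraphs.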
We will also make use of a generalization of Lemma 12 in (Geiß et al. (2019), page 2036):
Lemma 1
(Schaller et al. (2021), Lemma 2.8) Let be an n-cBMG and an informative triple for . Then, every tree that explains displays the triple , i.e. .
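Given a digraph $(G,\sigma)$ for which R exists, Lemma 1 in particular implies that every informative triple for $(G,\sigma)$ is contained in R, i.e.,
$\mathcal{R}(G,\sigma) \subseteq R. \qquad (1)$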
With these preliminaries, we are ready to formulate our new main result as
Theorem 1
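A colored digraph $(G,\sigma)$ is an n-cBMG if and only if $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma) = (G,\sigma)$. Moreover, $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ is the unique least resolved tree explaining an n-cBMG $(G,\sigma)$.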
In order to prove Theorem 1, we will first provide several technical results that make use of the notion of (non-)redundant tree edges and, in particular, of least resolved trees. Recall that an inner edge e in a leaf-colored tree is redundant if the tree obtained from T by contraction of e explains the same n-cBMG, i.e., if . A tree is called least resolved if it does not contain any redundant edges. We will need the following, simplified, characterization of redundant edges:
Lemma 2
(Schaller et al. (2021), Lemma 2.10) Let be an n-cBMG explained by a tree . The edge with in is redundant w.r.t. if and only if (i) e is an inner edge of T and (ii) there is no arc such that and .
We note that the proofs of Lemma 1 (Schaller et al. (2021), Lemma 2.8) and Lemma 2 (Schaller et al. (2021), Lemma 2.10) only require the definition of best match graphs, and are thus independent of the results proved in Geiß et al. (2019).
Following Bryant and Steel (1995), an inner edge e of a rooted tree T is distinguished by a triple if the path from a to c in T intersects the path from b to the root precisely on the edge e. In other words, with is distinguished by ab|c if and . Lemma 2 immediately implies the following generalization of Lemma 13 in Geiß et al. (2019):
Corollary 1
Let be an n-cBMG explained by a tree . An inner edge e of is non-redundant w.r.t. if and only if it is distinguished by an informative triple for . In particular, if is least resolved, then each of its inner edges is distinguished by an informative triple.
In addition, we will need the following two technical results relating subtrees and induced subgraphs of n-cBMGs.
Lemma 3
Let be a tree explaining an n-cBMG . Then holds for every .
Proof
Let and . By definition, we have . First assume that for some . Hence, it holds that for all with in T(u) and thus, since T(u) is a subtree of T, we have for all with in T. Therefore, . Since and is the subgraph of induced by L(T(u)), we have and thus . Now assume for some . Hence, . Consequently, there is no leaf in T with such that . This clearly also holds for the subtree T(u). Therefore, we have and thus .
Lemma 4
If is least resolved for an n-cBMG , then the subtree T(u) is least resolved for the n-cBMG for each .
Proof
The statement is trivially satisfied if T(u) does not contain any inner edges, which is exactly the case if either or with . Thus, let and . Since is least resolved, it does not contain redundant edges. Let vw be an inner edge of T(u) with , and note that vw must also be an inner edge in T. By Lemma 2 and since vw is not redundant in T, there is an arc such that and . Since , Lemma 3 implies that ab is also an arc in and . Hence, in particular, we have . We can now apply Lemma 2 to conclude that vw is not redundant in T(u). Since vw was chosen arbitrarily, we conclude that T(u) does not contain any redundant edge and thus, it must be least resolved for for all .
We finally relate the subtrees T(u) to the construction of the Aho-graph as specified in (Geiß et al. (2019), Sec. 3.4). Given a set of triples R on L, we will write for the set of triples with .
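As an illustration of these two notions, the sketch below restricts a triple set to a subset of leaves and computes the connected components of the corresponding Aho-graph. It assumes the standard construction of Aho et al., in which the Aho-graph on a leaf set has an edge ab for every triple ab|c whose three leaves all lie in that set. The names `restrict` and `aho_components` and the tuple encoding `(a, b, c)` for ab|c are hypothetical.

```python
def restrict(triples, leaves):
    """Keep only the triples whose three leaves all lie in `leaves`."""
    leaves = set(leaves)
    return {t for t in triples if set(t) <= leaves}


def aho_components(triples, leaves):
    """Connected components of the Aho-graph of `triples` on `leaves`.

    Each triple (a, b, c), read as ab|c, contributes the edge ab,
    provided a, b and c are all contained in `leaves`.
    """
    leaves = set(leaves)
    adjacency = {x: set() for x in leaves}
    for a, b, _c in restrict(triples, leaves):
        adjacency[a].add(b)
        adjacency[b].add(a)

    components, seen = [], set()
    for start in leaves:
        if start in seen:
            continue
        # Depth-first search collecting the component of `start`.
        component, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in component:
                component.add(x)
                stack.extend(adjacency[x] - component)
        seen |= component
        components.append(frozenset(component))
    return components
```

Note that restricting to a smaller leaf set can only remove edges, so a set that is connected in the Aho-graph on all leaves may fall apart after restriction.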
Lemma 5
Let be least resolved for an n-cBMG with informative triple set . Then, L(T(v)) is a connected component in the Aho-graph for every inner vertex u and each of its children .
Proof
We proceed by induction on . The statement trivially holds for . Hence, suppose that and assume that the statement is true for every n-cBMG with less than |L| vertices.
Let u be an inner vertex of T and . We first show that L(T(v)) is connected in , and then argue that there are no edges between L(T(v)) and , i.e., that L(T(v)) forms a connected component.
If uv is an outer edge, i.e. v is a leaf, then L(T(v)) is trivially connected. Now suppose that uv is an inner edge of T. By Lemmas 3 and 4, is explained by the least resolved tree . By the induction hypothesis, L(T(w)) forms a connected component in for all children . Together with , this implies that the elements in L(T(w)) are also connected in for all . Since uv is an inner edge of the least resolved tree , we can apply Corollary 1 to conclude that there is an informative triple in that distinguishes uv, i.e. and with color . Hence, is also contained in . In particular, there are children such that and , and the edge ab connects L(T(w)) and in .
Now suppose that there is an additional child . We distinguish two cases. Either there is a leaf with or no such leaf exists. If there is such a leaf , then forms an arc in and is an informative triple making L(T(w)) and connected in . Otherwise, take an arbitrary leaf . Since , we have and thus, there is an arc cb in . Since , the arc is not contained in . Hence, is an informative triple making and connected in .
Therefore, the subgraph in induced by L(T(v)) must be connected.
It remains to show that L(T(v)) is a connected component in and thus, that there are no edges ab in with and . Assume, for contradiction, that there exists such an edge ab. Hence, this edge must be supported by an informative triple w.l.o.g. with and . Lemma 1 implies that must be displayed by T. However, implies that such a triple cannot exist. Thus, L(T(v)) is a connected component in .
The least resolved tree of an n-cBMG therefore coincides with the Aho tree of its informative triples. In more detail, we have
Proposition 1
If $(G,\sigma)$ is an n-cBMG, then $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ is the unique least resolved tree for $(G,\sigma)$.
Proof
Since is an n-cBMG, Lemma 1 implies that there is a tree displaying all triples in . In particular, therefore, exists. Moreover, there must be a least resolved tree for . To see this, consider an arbitrary tree that explains , and repeatedly identify and contract a redundant edge until no redundant edges remain. By definition, the resulting tree still explains and is least resolved. By Lemma 5 and by construction of , any least resolved tree for coincides with the latter. The uniqueness of therefore implies that the least resolved tree is also unique.
We now have all the pieces in place to complete the proof of the main result:
Proof of Theorem 1
If is an n-cBMG, then Proposition 1 implies that is its unique least resolved tree, and thus . Conversely, is an n-cBMG.
None of the intermediate results used to prove Theorem 9 in Geiß et al. (2019) is used in our proof of Theorem 1. However, to the best of our knowledge, all results in Geiß et al. (2019) with the exception of the aforementioned Lemmas 9 and 11, and Thms. 4 and 9 are correct as stated. It is worth noting, furthermore, that Theorem 1 immediately implies Thms. 5, 6, and 7, as well as the existence of a unique least resolved tree in Thms. 2 and 8 of Geiß et al. (2019). In particular, Theorem 1 allows us to obtain the least resolved tree of an n-cBMG without the need to explicitly construct the least resolved trees of all its 2-colored induced subgraphs.
To prove the correctness of the amended version of Theorem 9, it only remains to show
Proposition 2
If $(G,\sigma)$ is an n-cBMG, then $G(\mathrm{Aho}(R),\sigma) = (G,\sigma)$.
Proof
For brevity set . From Eq. (1), i.e., , we immediately have for every inner vertex u of T. Moreover, by Theorem 1, with is the least resolved tree that explains .
Hence, we can apply the same arguments as in the proof of Lemma 5 to conclude that L(T(v)) forms a connected component in the Aho-graph for every inner vertex u and each of its children . More precisely, note that connectedness of any such L(T(v)) is guaranteed by the informative triples. Now assume, for contradiction, that there is an edge ab in with and connecting L(T(v)) and for some child . In this case, there is a triple and thus, and . By definition of R and Observation 4 in Geiß et al. (2019), ab|c must be displayed by T. However, and imply that ab|c is not displayed by T; a contradiction. Therefore, , which completes the proof.
For completeness, we show that conditions (1) and (2) of Theorem 9 ensure that $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ and $G(\mathrm{Aho}(R),\sigma)$ are subgraphs of $(G,\sigma)$.
Proposition 3
Let be a properly n-colored digraph with all 2-colored induced subgraphs being 2-cBMGs. Then the following two statements hold:
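(i) If $\mathcal{R}(G,\sigma)$ is consistent, then $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ is a subgraph of $(G,\sigma)$.
(ii) If R is consistent, then $G(\mathrm{Aho}(R),\sigma)$ is a subgraph of $(G,\sigma)$.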
Proof
We set . Since is defined on V(G), we have and . Now assume, for contradiction, that there is an arc such that . By assumption, the induced subgraph of , where and , is a 2-cBMG and thus sink-free. Therefore, there must be a vertex of color with . Hence, is informative for and contained in . In particular, must be displayed by ; contradicting that ab is an arc in . Hence, statement (i) is true.
Statement (ii) can be shown using Eq. (1), i.e., , and arguments similar to the previous paragraph.
Consequences for the algorithms
Finally, we discuss the consequences of the corrections for the algorithmic aspects outlined in Section 5 of Geiß et al. (2019).
Algorithm 2 constructs the least resolved tree for 2-cBMGs based on Theorem 4 in Geiß et al. (2019). It therefore requires a sink-free graph as input, or needs to be amended to check that its input satisfies condition (N4). This can be done trivially in O(|E|) time. The statements concerning its complexity, i.e., Lemmas 18 and 19, therefore are still correct.
Regarding the recognition of n-cBMGs, we have noted above that the consistency of the triple set R and the fact that all 2-colored induced subgraphs are 2-cBMGs are not sufficient. Algorithm 1 of Geiß et al. (2019) therefore also needs to be corrected. By Theorem 1, it suffices to construct the Aho tree of the informative triples, $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$, and to check whether $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma) = (G,\sigma)$. On the other hand, it is no longer necessary to require connectedness of the input graph. We therefore obtain a considerably simpler procedure, see Alg. 1.
The same arguments as in Geiß et al. (2019) show that can be constructed in time using the algorithm by Deng and Fernández-Baca (2018). The construction of can then be achieved in time e.g. using Algorithm 1 of the Supplement of Geiß et al. (2020). The equality can be checked in operations. The total effort therefore remains dominated by the construction of the least resolved tree T.
We note that Algorithm 3 in Geiß et al. (2019) is essentially the simplified Algorithm 1 above with its input restricted to 2-colored connected digraphs. Its correctness therefore follows immediately from Theorem 1.
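The recognition test suggested by Theorem 1 (collect the informative triples, build their Aho tree, and compare the graph explained by that tree with the input) can be sketched in Python as follows. This is a naive illustration under stated assumptions, not the authors' Algorithm 1 and not the algorithm of Deng and Fernández-Baca (2018), so it does not attain the running times discussed above. All function names and the input encoding are hypothetical. The tree is stored as its set of clusters (leaf sets of its vertices), and the best matches of x for a color s are taken to be the s-colored leaves in the smallest cluster that contains x and at least one leaf of color s, which is the definition of best matches as recalled from Geiß et al. (2019).

```python
def informative_triples(vertices, arcs, color):
    """Triples (a, b, c), read as ab|c: color[a] != color[b] == color[c],
    ab is an arc and ac is not (assumed reading of Definition 1)."""
    return {(a, b, c)
            for a, b in arcs if color[a] != color[b]
            for c in vertices
            if c != b and color[c] == color[b] and (a, c) not in arcs}


def components(triples, leaves):
    """Connected components of the Aho-graph of `triples` on `leaves`."""
    adjacency = {x: set() for x in leaves}
    for a, b, c in triples:
        if a in leaves and b in leaves and c in leaves:
            adjacency[a].add(b)
            adjacency[b].add(a)
    comps, seen = [], set()
    for start in leaves:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adjacency[x] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def build_clusters(triples, leaves):
    """Recursive BUILD: clusters of the Aho tree, or None if inconsistent."""
    leaves = frozenset(leaves)
    clusters = {leaves}
    if len(leaves) <= 1:
        return clusters
    comps = components(triples, leaves)
    if len(comps) == 1:
        return None  # connected Aho-graph on >= 2 leaves: no tree exists
    for comp in comps:
        sub = build_clusters(triples, comp)
        if sub is None:
            return None
        clusters |= sub
    return clusters


def explained_arcs(clusters, color):
    """Arcs of the best match graph explained by the tree given as clusters."""
    root = max(clusters, key=len)
    all_colors = {color[v] for v in root}
    arcs = set()
    for x in root:
        # Clusters containing x form a chain; sort from smallest to largest.
        ancestors = sorted((c for c in clusters if x in c), key=len)
        for s in all_colors - {color[x]}:
            for cluster in ancestors:
                matches = {y for y in cluster if color[y] == s}
                if matches:
                    arcs |= {(x, y) for y in matches}
                    break
    return arcs


def is_bmg(vertices, arcs, color):
    """True if the explained graph of the Aho tree equals the input digraph."""
    arcs = set(arcs)
    clusters = build_clusters(informative_triples(vertices, arcs, color), vertices)
    return clusters is not None and explained_arcs(clusters, color) == arcs
```

In this sketch an input that is not sink-free is rejected automatically, because the graph explained by any tree contains an out-arc from every vertex to every other color.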
Funding
Open Access funding enabled and organized by Projekt DEAL.
Contributor Information
David Schaller, Email: sdavid@bioinf.uni-leipzig.de.
Manuela Geiß, Email: manuela.geiss@scch.at.
Edgar Chávez, Email: echavezaparicio@gmail.com.
Marcos González Laffitte, Email: marcoslaffitte@im.unam.mx.
Alitzel López Sánchez, Email: Alitzel.Lopez.Sanchez@USherbrooke.ca.
Bärbel M. R. Stadler, Email: baer@bioinf.uni-leipzig.de.
Dulce I. Valdivia, Email: dulce.valdivia@cinvestav.mx.
Marc Hellmuth, Email: mhellmuth@mailbox.org.
Maribel Hernández Rosales, Email: maribel.hr@cinvestav.mx.
Peter F. Stadler, Email: studla@bioinf.uni-leipzig.de.
References
- Bryant D, Steel M. Extension operations on sets of leaf-labeled trees. Adv Appl Math. 1995;16:425–453. doi: 10.1006/aama.1995.1020.
- Deng Y, Fernández-Baca D. Fast compatibility testing for rooted phylogenetic trees. Algorithmica. 2018;80:2453–2477. doi: 10.1007/s00453-017-0330-4.
- Geiß M, Chávez E, González Laffitte M, López Sánchez A, Stadler BMR, Valdivia DI, Hellmuth M, Hernández Rosales M, Stadler PF. Best match graphs. J Math Biol. 2019;78:2015–2057. doi: 10.1007/s00285-019-01332-9.
- Geiß M, González Laffitte ME, López Sánchez A, Valdivia DI, Hellmuth M, Hernández Rosales M, Stadler PF. Best match graphs and reconciliation of gene trees with species trees. J Math Biol. 2020;80:1459–1495. doi: 10.1007/s00285-020-01469-y.
- Schaller D, Geiß M, Stadler PF, Hellmuth M. Complete characterization of incorrect orthology assignments in best match graphs. J Math Biol. 2021. doi: 10.1007/s00285-021-01564-8.