J Math Biol. 2021 Apr 5;82(6):47. doi: 10.1007/s00285-021-01601-6

Corrigendum to “Best match graphs”

David Schaller 1, Manuela Geiß 2, Edgar Chávez 3, Marcos González Laffitte 4, Alitzel López Sánchez 5, Bärbel M R Stadler 1, Dulce I Valdivia 6, Marc Hellmuth 7, Maribel Hernández Rosales 6, Peter F Stadler 1,8,9,10,11,12
PMCID: PMC8021527  PMID: 33818665

Abstract

Two errors in the article Best Match Graphs (Geiß et al. in JMB 78: 2015–2057, 2019) are corrected. One concerns the tacit assumption that digraphs are sink-free, which has to be added as an additional precondition in Lemma 9, Lemma 11, and Theorem 4. Correspondingly, Algorithm 2 requires that its input be sink-free. The second correction concerns an additional necessary condition in Theorem 9 required to characterize best match graphs. The amended results simplify the construction of least resolved trees for n-cBMGs, i.e., Algorithm 1. All other results remain unchanged and are correct as stated.

Keywords: Corrigendum, Phylogenetic Combinatorics, Colored digraph, Rooted triples

Best match graphs (BMGs) must be sink-free

Throughout Geiß et al. (2019) we have tacitly assumed that all vertex-colored digraphs (G,σ) satisfy the following property, which, by construction, is true for all colored best match graphs (cBMGs):

  • For each vertex $x$ with color $\sigma(x)$, there is an arc $xy$ to at least one vertex $y$ of every other color $\sigma(y)\neq\sigma(x)$.

All properly 2-colored digraphs appearing in the text are therefore assumed to be sink-free, i.e., the out-neighborhoods of their $\mathcal{N}$-classes are assumed to be non-empty:

  • (N4) $N(\alpha)\neq\emptyset$ for all $\alpha\in\mathcal{N}$.

This assumption was not clearly stated in the text.
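The color-completeness property and, for properly 2-colored digraphs, property (N4) can be verified directly from an arc list. The following Python sketch is only an illustration: the encoding of a digraph as a vertex list, a set of arc pairs `(x, y)` and a color dictionary, as well as the helper name `missing_color_targets`, are our own assumptions. In a properly 2-colored digraph every out-neighbor of $x$ has the other color, so an empty result means that no vertex is a sink; assuming, as in Geiß et al. (2019), that all vertices of an $\mathcal{N}$-class share the same out-neighborhood, this is the same as (N4).

```python
def missing_color_targets(vertices, arcs, color):
    """For each vertex x, the colors other than color[x] that occur in the
    digraph but are not the color of any out-neighbor of x.
    An empty dict means every vertex reaches all other colors."""
    all_colors = {color[v] for v in vertices}
    out_colors = {v: set() for v in vertices}
    for x, y in arcs:                      # arc x -> y
        out_colors[x].add(color[y])
    missing = {}
    for x in vertices:
        lacking = all_colors - {color[x]} - out_colors[x]
        if lacking:
            missing[x] = lacking
    return missing


# Hypothetical properly 2-colored digraph in which a2 is a sink.
V = ["a1", "a2", "b1"]
E = {("a1", "b1"), ("b1", "a1")}
sigma = {"a1": "A", "a2": "A", "b1": "B"}
print(missing_color_targets(V, E, sigma))  # {'a2': {'B'}}
```

Reporting the missing colors per vertex, rather than a single boolean, shows directly which vertices violate the assumption.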

Property (N4) is required in the proof of Lemma 9 [last line on page 2032]: there, $R(\beta)\cap R(\alpha)=\emptyset$ implies $\beta\notin R(\alpha)$ for the $\mathcal{N}$-classes $\beta$ with $R(\beta)\neq R(\alpha)$ only if $R(\beta)\neq\emptyset$, which in turn is equivalent to $N(\beta)\neq\emptyset$. Furthermore, $N(\alpha)=\emptyset$ implies $Q(\alpha)=\alpha$ and thus $R(\alpha)=\alpha$. Property (N4) is therefore also necessary to ensure that $|R(\alpha)|>1$ and thus that the tree $T(H)$ is phylogenetic [page 2035, just before Theorem 4]. In summary, Lemma 9, Lemma 11, and Theorem 4 require (N4) as an additional precondition.

Corrected characterization of n-cBMGs

The second paragraph of the proof of Theorem 9 in (Geiß et al. 2019, page 2045) incorrectly states that “for any $\alpha\in\mathcal{N}$ and any color $s\neq\sigma(\alpha)$ the out-neighborhood $N_s(\alpha)$ is the same w.r.t. $(T_{st},\sigma_{st})$ and w.r.t. $\mathrm{Aho}(\mathscr{R})$”, leading to the incorrect conclusion that $G(\mathrm{Aho}(\mathscr{R}),\sigma)=(G,\sigma)$ whenever $\mathrm{Aho}(\mathscr{R})$ exists. We recall that the triple set $\mathscr{R}$ is defined as the union

$$\mathscr{R}:=\bigcup_{s,t\in S} r(T_{st})$$

of all triples in the least resolved trees $(T_{st},\sigma_{st})$ that explain the induced subgraphs $(G_{st},\sigma_{st})$ of $(G,\sigma)$, and the Aho tree $\mathrm{Aho}(\mathscr{R})$ is defined on the leaf set $L=V(G)$ (which may not have been clear from the wording in the text). We shall show in Proposition 3 below that $G(\mathrm{Aho}(\mathscr{R}),\sigma)$ is always a subgraph of $(G,\sigma)$ whenever $\mathscr{R}$ is consistent. The example in Fig. 1 shows, however, that $G(\mathrm{Aho}(\mathscr{R}),\sigma)\neq(G,\sigma)$ is possible because $\mathrm{Aho}(\mathscr{R})$ can contain triples that are not present in any of the 2-colored trees $(T_{st},\sigma_{st})$.
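To experiment with the definition of $\mathscr{R}$, one can compute $r(T)$ for given trees and take the union. The sketch below assumes that each tree is supplied as a parent map (root mapped to `None`, leaves being the vertices without children) and stores $ab|c$ as `(frozenset({a, b}), c)`; these conventions and all function names are hypothetical. Constructing the least resolved trees $(T_{st},\sigma_{st})$ themselves is not shown; they must be provided by the caller.

```python
from itertools import combinations


def root_path(parent, x):
    """Vertices from x up to the root, x included."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, *xs):
    """Lowest common ancestor of one or more vertices."""
    common = set(root_path(parent, xs[0]))
    for x in xs[1:]:
        common &= set(root_path(parent, x))
    # walking up from xs[0], the first common vertex is the lowest one
    return next(v for v in root_path(parent, xs[0]) if v in common)


def displayed_triples(parent):
    """r(T): all ab|c with lca(a, b) strictly below lca(a, b, c),
    stored as (frozenset({a, b}), c)."""
    inner = {p for p in parent.values() if p is not None}
    leaves = sorted(v for v in parent if v not in inner)
    triples = set()
    for x, y, z in combinations(leaves, 3):
        for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
            if lca(parent, a, b) != lca(parent, a, b, c):
                triples.add((frozenset((a, b)), c))
    return triples


def union_of_triples(trees):
    """Union of r(T) over the given trees (e.g. the trees T_st)."""
    return set().union(*(displayed_triples(t) for t in trees))


# Two hypothetical 2-colored trees on overlapping leaf sets.
T_ab = {"r": None, "u": "r", "a1": "u", "b1": "u", "a2": "r"}
T_ac = {"r": None, "w": "r", "a1": "w", "c1": "w", "a2": "r"}
print(len(union_of_triples([T_ab, T_ac])))  # 2
```

For each set of three leaves at most one of the three candidate triples is displayed, so checking the three orderings suffices.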

Fig. 1.

Counterexample for the original version of Theorem 9. a A colored digraph $(G,\sigma)$ with vertex set $L$ that is not a 3-cBMG. b The least resolved subtrees for the three 2-colored induced subgraphs. The union of their triples is $\mathscr{R}:=\{a_1b|a_2,\ a_1c_1|a_2,\ a_1c_1|c_2,\ a_2c_2|a_1,\ a_2c_2|c_1\}$. c The Aho-graph $[\mathscr{R},L]$. In particular, $\mathscr{R}$ forms a consistent set. d The tree $T:=\mathrm{Aho}(\mathscr{R})$. e The 3-cBMG $G(T,\sigma)$. The arc $bc_2$ that was present in $(G,\sigma)$ is missing in $G(T,\sigma)$.
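Panel c can be reproduced mechanically. The Python sketch below (our own; the tuple `(a, b, c)` encodes $ab|c$, and the function name is hypothetical) builds the Aho-graph $[\mathscr{R},L]$ for the triples listed in the caption and prints its connected components.

```python
def aho_graph_components(triples, leaves):
    """Connected components of the Aho-graph [R, L]: vertex set L and an
    edge ab for every triple ab|c in R with a, b, c all in L."""
    leaves = set(leaves)
    adj = {x: set() for x in leaves}
    for a, b, c in triples:
        if {a, b, c} <= leaves:
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for start in sorted(leaves):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:                      # depth-first search
            x = stack.pop()
            comp.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        components.append(comp)
    return components


# The triple set of Fig. 1(b); (a, b, c) stands for ab|c.
R = [("a1", "b", "a2"), ("a1", "c1", "a2"), ("a1", "c1", "c2"),
     ("a2", "c2", "a1"), ("a2", "c2", "c1")]
L = ["a1", "a2", "b", "c1", "c2"]
for comp in aho_graph_components(R, L):
    print(sorted(comp))  # prints ['a1', 'b', 'c1'] and then ['a2', 'c2']
```

The two components $\{a_1,b,c_1\}$ and $\{a_2,c_2\}$ confirm that $\mathscr{R}$ is consistent at the top level. No triple of $\mathscr{R}$ has all three leaves inside either component, so the recursion turns both into stars. Reading the letters as colors (an assumption suggested by the labels), $c_1$ is then strictly closer to $b$ than $c_2$, which is why the arc $bc_2$ is missing from $G(T,\sigma)$.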

As a consequence, the characterization of n-cBMGs requires the equality $(G,\sigma)=G(\mathrm{Aho}(\mathscr{R}),\sigma)$ as an additional condition. The corrected result, with the correction underlined, reads as follows:

Theorem 9

A connected colored digraph $(G,\sigma)$ is an n-cBMG if and only if (1) all induced subgraphs $(G_{st},\sigma_{st})$ on two colors are 2-cBMGs, (2) the union $\mathscr{R}$ of all triples obtained from their least resolved trees $(T_{st},\sigma_{st})$ forms a consistent set, and (3) $\underline{G(\mathrm{Aho}(\mathscr{R}),\sigma)=(G,\sigma)}$. In particular, $(\mathrm{Aho}(\mathscr{R}),\sigma)$ is the unique least resolved tree that explains $(G,\sigma)$.

Condition (1) in Theorem 9, i.e., the requirement that all 2-colored induced subgraphs $(G_{st},\sigma_{st})$ of $(G,\sigma)$ are 2-cBMGs, is necessary to ensure that the least resolved trees $(T_{st},\sigma_{st})$ exist and thus that the triple sets $r(T_{st})$, and therefore also the set $\mathscr{R}$ of all triples obtained from the least resolved trees of the 2-colored induced subgraphs, are well-defined. Consistency of $\mathscr{R}$ is necessary for the existence of $\mathrm{Aho}(\mathscr{R})$. Clearly, Condition (3) is sufficient to ensure that $(G,\sigma)$ is an n-cBMG. Hence, it remains to show that Condition (3) is also necessary. This is achieved in Proposition 2 below.

Proof of Theorem 9

Instead of proving the corrected version of Theorem 9 directly, we first state and prove a slightly stronger and more convenient result, Theorem 1 below, and then derive Theorem 9 from it. To this end, we first generalize Definition 8 in (Geiß et al. (2019), page 2036) to digraphs with an arbitrary number of colors:

Definition 1

(Schaller et al. (2021), Definition 2.7) Let $(G,\sigma)$ be a colored digraph. We say that a triple $ab|b'$ is informative for $(G,\sigma)$ if $a$, $b$ and $b'$ are pairwise distinct vertices in $G$ such that (i) $\sigma(a)\neq\sigma(b)=\sigma(b')$ and (ii) $ab\in E(G)$ and $ab'\notin E(G)$. The set of informative triples is denoted by $\mathcal{R}(G,\sigma)$.

We briefly argue that, for 2-colored digraphs, the definition of informative triples given here is equivalent to the one given in Geiß et al. (2019): by definition, the vertices of an informative triple of a colored digraph have exactly two colors, and thus the triple is also informative in one of the 2-colored induced subgraphs. It is easy to check that, for 2-colored digraphs, Definition 1 is equivalent to Definition 8 in Geiß et al. (2019), since the four induced subgraphs shown in Fig. 8 in (Geiß et al. (2019), page 2036) correspond to the presence or absence of the two optional arcs $ba$ and $ca$ in the informative triple $ab|c$ (i.e., Definition 1 with $b'=c$).
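Definition 1 translates into a brute-force enumeration that, for every arc $ab$, scans all vertices $b'$ of color $\sigma(b)$, i.e., $O(|E|\,|V|)$ work. A minimal Python sketch, assuming a digraph is given as a vertex list, a set of arc pairs and a color dictionary (our encoding; the function name is hypothetical):

```python
def informative_triples(vertices, arcs, color):
    """Informative triples ab|b2 of Definition 1, as tuples (a, b, b2)."""
    arcs = set(arcs)
    triples = set()
    for a, b in arcs:
        if color[a] == color[b]:
            continue
        for b2 in vertices:
            # color[b2] == color[b] != color[a] already rules out b2 == a
            if b2 != b and color[b2] == color[b] and (a, b2) not in arcs:
                triples.add((a, b, b2))
    return triples


# Hypothetical 2-colored digraph: a1 prefers b1 over b2.
V = ["a1", "b1", "b2"]
E = {("a1", "b1"), ("b1", "a1"), ("b2", "a1")}
sigma = {"a1": "A", "b1": "B", "b2": "B"}
print(informative_triples(V, E, sigma))  # {('a1', 'b1', 'b2')}
```

The arcs $b_1a_1$ and $b_2a_1$ yield no informative triple because $a_1$ is the only vertex of its color.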

We will also make use of a generalization of Lemma 12 in (Geiß et al. (2019), page 2036):

Lemma 1

(Schaller et al. (2021), Lemma 2.8) Let $(G,\sigma)$ be an n-cBMG and $ab|b'$ an informative triple for $(G,\sigma)$. Then, every tree $(T,\sigma)$ that explains $(G,\sigma)$ displays the triple $ab|b'$, i.e., $\mathrm{lca}_T(a,b)\prec_T\mathrm{lca}_T(a,b')=\mathrm{lca}_T(b,b')$.
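Lemma 1 yields a simple necessary test for a candidate tree: it must display every informative triple. The sketch below (our own; trees are parent maps with the root mapped to `None`, and all names are hypothetical) checks whether a tree displays $ab|b'$ via lowest common ancestors and collects the triples it fails to display. A non-empty result rules the tree out as an explanation of $(G,\sigma)$.

```python
def root_path(parent, x):
    """Vertices from x up to the root, x included."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, x, y):
    """Lowest common ancestor of two vertices."""
    above_y = set(root_path(parent, y))
    return next(v for v in root_path(parent, x) if v in above_y)


def displays(parent, a, b, c):
    """True iff lca(a, b) is a proper descendant of lca(a, c) = lca(b, c)."""
    ab, ac, bc = lca(parent, a, b), lca(parent, a, c), lca(parent, b, c)
    return ac == bc and ab != ac and ac in root_path(parent, ab)


def undisplayed(parent, triples):
    """Triples (a, b, b2), read as ab|b2, that the tree does not display."""
    return {t for t in triples if not displays(parent, *t)}


# Hypothetical tree ((a1, b1), a2) and the informative triple b1a1|a2.
T = {"r": None, "u": "r", "a1": "u", "b1": "u", "a2": "r"}
print(displays(T, "b1", "a1", "a2"))         # True
print(undisplayed(T, {("b1", "a1", "a2")}))  # set()
```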

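Given a digraph $(G,\sigma)$ for which $\mathscr{R}$ exists, i.e., whose 2-colored induced subgraphs are all 2-cBMGs, Lemma 1 applied to these 2-cBMGs and their least resolved trees $(T_{st},\sigma_{st})$ in particular implies that

$$\mathcal{R}(G,\sigma)\subseteq\mathscr{R}. \qquad (1)$$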

With these preliminaries, we are ready to formulate our new main result as

Theorem 1

A colored digraph $(G,\sigma)$ is an n-cBMG if and only if $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)=(G,\sigma)$. Moreover, $\mathrm{Aho}(\mathcal{R}(G,\sigma))$ is the unique least resolved tree explaining an n-cBMG $(G,\sigma)$.

In order to prove Theorem 1, we will first provide several technical results that make use of the notion of (non-)redundant tree edges and, in particular, of least resolved trees. Recall that an inner edge $e$ in a leaf-colored tree $(T,\sigma)$ is redundant if the tree $(T_e,\sigma)$ obtained from $T$ by contraction of $e$ explains the same n-cBMG, i.e., if $G(T,\sigma)=G(T_e,\sigma)$. A tree $(T,\sigma)$ is called least resolved if it does not contain any redundant edges. We will need the following simplified characterization of redundant edges:
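The definition of redundancy suggests a direct, if slow, test: compute $G(T,\sigma)$ and $G(T_e,\sigma)$ and compare them. The Python sketch below does exactly that. It is an illustration under our own assumptions (trees as parent maps with the root mapped to `None`, colors given for the leaves only, hypothetical function names) and makes no attempt at efficiency.

```python
def root_path(parent, x):
    """Vertices from x up to the root, x included."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def best_match_graph(parent, color):
    """Arcs (x, y) of G(T, sigma): color[y] != color[x] and no leaf y2 of
    color[y] has lca(x, y2) strictly below lca(x, y)."""
    inner = {p for p in parent.values() if p is not None}
    leaves = [v for v in parent if v not in inner]
    arcs = set()
    for x in leaves:
        path = root_path(parent, x)
        best = {}                       # color -> (index of lca on path, leaves)
        for y in leaves:
            if color[y] == color[x]:
                continue
            anc = set(root_path(parent, y))
            # lca(x, y) is the first vertex on x's root path that is an ancestor of y
            h = next(i for i, v in enumerate(path) if v in anc)
            h_best, ys = best.get(color[y], (float("inf"), []))
            if h < h_best:
                best[color[y]] = (h, [y])
            elif h == h_best:
                ys.append(y)
        for _, ys in best.values():
            arcs.update((x, y) for y in ys)
    return arcs


def contract(parent, v):
    """T_e for the inner edge e = (parent[v], v): v's children move up to parent[v]."""
    u = parent[v]
    return {x: (u if p == v else p) for x, p in parent.items() if x != v}


def is_redundant(parent, color, v):
    """True iff contracting the inner edge (parent[v], v) leaves G(T, sigma) unchanged."""
    return best_match_graph(parent, color) == best_match_graph(contract(parent, v), color)


T = {"r": None, "v": "r", "a": "v", "b": "v", "c": "r"}
print(is_redundant(T, {"a": "A", "b": "B", "c": "C"}, "v"))  # True
print(is_redundant(T, {"a": "A", "b": "B", "c": "B"}, "v"))  # False
```

With three distinct colors the edge $rv$ is redundant, whereas it is not redundant once $c$ shares the color of $b$: then $ab$ is an arc with $\mathrm{lca}_T(a,b)=v$ and $\sigma(b)$ occurs among the leaves below $r$ outside $T(v)$, exactly the situation excluded by condition (ii) of Lemma 2.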

Lemma 2

(Schaller et al. (2021), Lemma 2.10) Let $(G,\sigma)$ be an n-cBMG explained by a tree $(T,\sigma)$. The edge $e=uv$ with $v\prec_T u$ in $(T,\sigma)$ is redundant w.r.t. $(G,\sigma)$ if and only if (i) $e$ is an inner edge of $T$ and (ii) there is no arc $ab\in E(G)$ such that $\mathrm{lca}_T(a,b)=v$ and $\sigma(b)\in\sigma(L(T(u))\setminus L(T(v)))$.

We note that the proofs of Lemma 1 (Schaller et al. (2021), Lemma 2.8) and Lemma 2 (Schaller et al. (2021), Lemma 2.10) only require the definition of best match graphs, and are thus independent of the results proved in Geiß et al. (2019).

Following Bryant and Steel (1995), an inner edge $e$ of a rooted tree $T$ is distinguished by a triple $ab|c\in r(T)$ if the path from $a$ to $c$ in $T$ intersects the path from $b$ to the root $\rho_T$ precisely on the edge $e$. In other words, $e=uv$ with $v\prec_T u$ is distinguished by $ab|c$ if $\mathrm{lca}_T(a,b)=v$ and $\mathrm{lca}_T(a,b,c)=u$. Lemma 2 immediately implies the following generalization of Lemma 13 in Geiß et al. (2019):
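The second formulation is easy to evaluate. The sketch below (our own; parent-map trees with the root mapped to `None`, hypothetical function names) returns the inner edge $uv$ distinguished by a triple $ab|c$, or `None` if $\mathrm{lca}_T(a,b,c)$ is not the parent of $\mathrm{lca}_T(a,b)$. Applied to all informative triples of $(G,\sigma)$, it marks exactly the non-redundant inner edges of a tree explaining $(G,\sigma)$, by Corollary 1 below.

```python
def root_path(parent, x):
    """Vertices from x up to the root, x included."""
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def lca(parent, x, y):
    """Lowest common ancestor of two vertices (leaves or inner vertices)."""
    above_y = set(root_path(parent, y))
    return next(v for v in root_path(parent, x) if v in above_y)


def distinguished_edge(parent, a, b, c):
    """The edge (u, v) distinguished by ab|c, i.e. v = lca(a, b) and
    u = lca(a, b, c) = parent[v]; None if ab|c distinguishes no edge."""
    v = lca(parent, a, b)
    u = lca(parent, v, c)          # lca(a, b, c) = lca(lca(a, b), c)
    return (u, v) if parent[v] == u else None


# Hypothetical tree (((a, b), d), c) with inner vertices v, u and root r.
T = {"r": None, "u": "r", "c": "r", "v": "u", "d": "u", "a": "v", "b": "v"}
print(distinguished_edge(T, "a", "b", "d"))  # ('u', 'v')
print(distinguished_edge(T, "a", "b", "c"))  # None
```

The triple $ab|c$ is displayed by this tree but its path intersection spans two edges, so it distinguishes none of them.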

Corollary 1

Let $(G,\sigma)$ be an n-cBMG explained by a tree $(T,\sigma)$. An inner edge $e$ of $(T,\sigma)$ is non-redundant w.r.t. $(G,\sigma)$ if and only if it is distinguished by an informative triple $ab|b'$ for $(G,\sigma)$. In particular, if $(T,\sigma)$ is least resolved, then each of its inner edges is distinguished by an informative triple.

In addition, we will need the following two technical results relating subtrees and induced subgraphs of n-cBMGs.

Lemma 3

Let $(T,\sigma)$ be a tree explaining an n-cBMG $(G,\sigma)$. Then $G(T(u),\sigma_{|L(T(u))})=(G[L(T(u))],\sigma_{|L(T(u))})$ holds for every $u\in V(T)$.

Proof

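Let $(G_1,\sigma_u):=G(T(u),\sigma_u)$ and $(G_2,\sigma_u):=(G[L(T(u))],\sigma_u)$, where $\sigma_u:=\sigma_{|L(T(u))}$. By definition, we have $V(G_1)=V(G_2)=L(T(u))$. First assume that $xy\in E(G_1)$ for some $x,y\in L(T(u))$. Hence, it holds that $\mathrm{lca}_{T(u)}(x,y)\preceq_{T(u)}\mathrm{lca}_{T(u)}(x,y')$ for all $y'$ with $\sigma(y')=\sigma(y)$ in $T(u)$. Since $T(u)$ is a subtree of $T$, and since $\mathrm{lca}_T(x,y')\succ_T u\succeq_T\mathrm{lca}_T(x,y)$ for every leaf $y'\notin L(T(u))$, we have $\mathrm{lca}_T(x,y)\preceq_T\mathrm{lca}_T(x,y')$ for all $y'$ with $\sigma(y')=\sigma(y)$ in $T$. Therefore, $xy\in E(G)$. Since $x,y\in L(T(u))$ and $G_2$ is the subgraph of $G$ induced by $L(T(u))$, we have $xy\in E(G_2)$ and thus $E(G_1)\subseteq E(G_2)$. Now assume $xy\in E(G_2)$ for some $x,y\in L(T(u))$. Hence, $xy\in E(G)$. Consequently, there is no leaf $y'$ in $T$ with $\sigma(y')=\sigma(y)\neq\sigma(x)$ such that $\mathrm{lca}_T(x,y')\prec_T\mathrm{lca}_T(x,y)\preceq_T u$. This clearly also holds for the subtree $T(u)$. Therefore, we have $xy\in E(G_1)$ and thus $E(G_2)\subseteq E(G_1)$.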

Lemma 4

If $(T,\sigma)$ is least resolved for an n-cBMG $(G,\sigma)$, then the subtree $T(u)$ is least resolved for the n-cBMG $G(T(u),\sigma_{|L(T(u))})$ for each $u\in V(T)$.

Proof

The statement is trivially satisfied if $T(u)$ does not contain any inner edges, which is exactly the case if either $u\in L(T)$ or $u\in V^0(T)$ with $\mathrm{child}_T(u)\subseteq L(T)$. Thus, let $u\in V^0(T)$ with $\mathrm{child}_T(u)\not\subseteq L(T)$. Since $(T,\sigma)$ is least resolved, it does not contain redundant edges. Let $vw$ be an inner edge of $T(u)$ with $w\prec_T v\preceq_T u$, and note that $vw$ must also be an inner edge in $T$. By Lemma 2 and since $vw$ is not redundant in $T$, there is an arc $ab\in E(G)$ such that $\mathrm{lca}_T(a,b)=w$ and $\sigma(b)\in\sigma(L(T(v))\setminus L(T(w)))$. Since $u\succeq_T v$, Lemma 3 implies that $ab$ is also an arc in $G(T(u),\sigma_{|L(T(u))})$ and $\mathrm{lca}_{T(u)}(a,b)=w$. Hence, in particular, we have $\sigma(b)\in\sigma(L(T(v))\setminus L(T(w)))$ also within $T(u)$. We can now apply Lemma 2 to conclude that $vw$ is not redundant in $T(u)$. Since $vw$ was chosen arbitrarily, we conclude that $T(u)$ does not contain any redundant edge and thus, it must be least resolved for $G(T(u),\sigma_{|L(T(u))})$ for all $u\in V(T)$.

We finally relate the subtrees $T(u)$ to the construction of the Aho-graph as specified in (Geiß et al. (2019), Sec. 3.4). Given a set of triples $\mathcal{R}$ on $L$, we will write $\mathcal{R}_{|L'}$ for the set of triples $ab|c\in\mathcal{R}$ with $a,b,c\in L'\subseteq L$.

Lemma 5

Let $(T,\sigma)$ be least resolved for an n-cBMG $(G,\sigma)$ with informative triple set $\mathcal{R}:=\mathcal{R}(G,\sigma)$. Then, $L(T(v))$ is a connected component in the Aho-graph $[\mathcal{R}_{|L(T(u))},L(T(u))]$ for every inner vertex $u$ and each of its children $v\in\mathrm{child}_T(u)$.

Proof

We proceed by induction on $|L|$, where $L:=V(G)$. The statement trivially holds for $|L|=1$. Hence, suppose that $|L|>1$ and assume that the statement is true for every n-cBMG with fewer than $|L|$ vertices.

Let $u$ be an inner vertex of $T$ and $v\in\mathrm{child}_T(u)$. We first show that $L(T(v))$ is connected in $[\mathcal{R}_{|L(T(u))},L(T(u))]$, and then argue that there are no edges between $L(T(v))$ and $L(T(u))\setminus L(T(v))$, i.e., that $L(T(v))$ forms a connected component.

If $uv$ is an outer edge, i.e., $v$ is a leaf, then $L(T(v))$ is trivially connected. Now suppose that $uv$ is an inner edge of $T$. By Lemmas 3 and 4, $(G[L(T(v))],\sigma_{|L(T(v))})$ is explained by the least resolved tree $(T(v),\sigma_{|L(T(v))})$. By the induction hypothesis, $L(T(w))$ forms a connected component in $[\mathcal{R}_{|L(T(v))},L(T(v))]$ for all children $w\in\mathrm{child}_T(v)$. Together with $\mathcal{R}_{|L(T(v))}\subseteq\mathcal{R}_{|L(T(u))}$, this implies that the elements in $L(T(w))$ are also connected in $[\mathcal{R}_{|L(T(u))},L(T(u))]$ for all $w\in\mathrm{child}_T(v)$. Since $uv$ is an inner edge of the least resolved tree $(T,\sigma)$, we can apply Corollary 1 to conclude that there is an informative triple $ab|b'$ in $(G,\sigma)$ that distinguishes $uv$, i.e., $\mathrm{lca}_T(a,b)=v$ and $b'\in L(T(u))\setminus L(T(v))$ with color $\sigma(b')=\sigma(b)$. Hence, $ab|b'$ is also contained in $\mathcal{R}_{|L(T(u))}$. In particular, there are children $w,w'\in\mathrm{child}_T(v)$ such that $a\preceq_T w$ and $b\preceq_T w'$, and the edge $ab$ connects $L(T(w))$ and $L(T(w'))$ in $[\mathcal{R}_{|L(T(u))},L(T(u))]$.

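Now suppose that there is an additional child $w''\in\mathrm{child}_T(v)\setminus\{w,w'\}$. We distinguish two cases: either there is a leaf $b''\preceq_T w''$ with $\sigma(b'')=\sigma(b)$, or no such leaf exists. If there is such a leaf $b''$, then $ab''$ forms an arc in $(G,\sigma)$ and $ab''|b'$ is an informative triple making $L(T(w))$ and $L(T(w''))$ connected in $[\mathcal{R}_{|L(T(u))},L(T(u))]$. Otherwise, take an arbitrary leaf $c\preceq_T w''$. Since $\sigma(b)\notin\sigma(L(T(w'')))$, we have $\sigma(c)\neq\sigma(b)$, and no leaf of color $\sigma(b)$ has a lowest common ancestor with $c$ below $v=\mathrm{lca}_T(c,b)$; thus, there is an arc $cb$ in $(G,\sigma)$. Since $\mathrm{lca}_T(c,b')=u\succ_T v=\mathrm{lca}_T(c,b)$, the arc $cb'$ is not contained in $(G,\sigma)$. Hence, $cb|b'$ is an informative triple making $L(T(w''))$ and $L(T(w'))$ connected in $[\mathcal{R}_{|L(T(u))},L(T(u))]$.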

Therefore, the subgraph of $[\mathcal{R}_{|L(T(u))},L(T(u))]$ induced by $L(T(v))$ must be connected.

It remains to show that $L(T(v))$ is a connected component in $[\mathcal{R}_{|L(T(u))},L(T(u))]$, i.e., that there are no edges $ab$ in $[\mathcal{R}_{|L(T(u))},L(T(u))]$ with $a\in L(T(v))$ and $b\in L(T(u))\setminus L(T(v))$. Assume, for contradiction, that there exists such an edge $ab$. Hence, this edge must be supported by an informative triple, w.l.o.g. $ab|b'$ with $\sigma(a)\neq\sigma(b)=\sigma(b')$ and $b'\in L(T(u))$. Lemma 1 implies that $ab|b'$ must be displayed by $T$. However, $\mathrm{lca}_T(a,b)=u=\mathrm{lca}_T(a,b,b')$ implies that $T$ does not display $ab|b'$, a contradiction. Thus, $L(T(v))$ is a connected component in $[\mathcal{R}_{|L(T(u))},L(T(u))]$.

The least resolved tree of an n-cBMG therefore coincides with the Aho tree of its informative triples. In more detail, we have

Proposition 1

If $(G,\sigma)$ is an n-cBMG, then $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ is the unique least resolved tree for $(G,\sigma)$.

Proof

Since $(G,\sigma)$ is an n-cBMG, Lemma 1 implies that there is a tree displaying all triples in $\mathcal{R}(G,\sigma)$. In particular, therefore, $\mathrm{Aho}(\mathcal{R}(G,\sigma))$ exists. Moreover, there must be a least resolved tree $(T,\sigma)$ for $(G,\sigma)$. To see this, consider an arbitrary tree $(T',\sigma)$ that explains $(G,\sigma)$, and repeatedly identify and contract a redundant edge until no redundant edges remain. By definition, the resulting tree still explains $(G,\sigma)$ and is least resolved. By Lemma 5 and by construction of $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$, any least resolved tree $(T,\sigma)$ for $(G,\sigma)$ coincides with the latter. The uniqueness of $\mathrm{Aho}(\mathcal{R}(G,\sigma))$ therefore implies that the least resolved tree is also unique.

We now have all the pieces in place to complete the proof of the main result:

Proof of Theorem 1

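If $(G,\sigma)$ is an n-cBMG, then Proposition 1 implies that $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ is its unique least resolved tree, and thus $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)=(G,\sigma)$. Conversely, if $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)=(G,\sigma)$, then $(G,\sigma)$ is explained by the tree $(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ and is therefore an n-cBMG. The uniqueness of the least resolved tree is the statement of Proposition 1.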

None of the intermediate results used to prove Theorem 9 in Geiß et al. (2019) is used in our proof of Theorem 1 above. However, to the best of our knowledge, all results in Geiß et al. (2019) with the exception of the aforementioned Lemmas 9 and 11, and Thms. 4 and 9 are correct as stated. It is worth noting, furthermore, that Theorem 1 immediately implies Thms. 5, 6, and 7, as well as the existence of a unique least resolved tree in Thms. 2 and 8 of Geiß et al. (2019). In particular, Theorem 1 allows us to obtain the least resolved tree of an n-cBMG without the need to explicitly construct the least resolved trees of all its 2-colored induced subgraphs.

To prove the correctness of the amended version of Theorem 9, it only remains to show

Proposition 2

If $(G,\sigma)$ is an n-cBMG, then $\mathrm{Aho}(\mathcal{R}(G,\sigma))=\mathrm{Aho}(\mathscr{R})$.

Proof

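For brevity, set $\mathcal{R}:=\mathcal{R}(G,\sigma)$ and $T:=\mathrm{Aho}(\mathcal{R})$. By Theorem 1, $(T,\sigma)$ is the least resolved tree that explains $(G,\sigma)$. From Eq. (1), i.e., $\mathcal{R}\subseteq\mathscr{R}$, we immediately have $\mathcal{R}_{|L(T(u))}\subseteq\mathscr{R}_{|L(T(u))}$ for every inner vertex $u$ of $T$.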

Hence, we can apply the same arguments as in the proof of Lemma 5 to conclude that $L(T(v))$ forms a connected component in the Aho-graph $[\mathscr{R}_{|L(T(u))},L(T(u))]$ for every inner vertex $u$ and each of its children $v\in\mathrm{child}_T(u)$. More precisely, note that connectedness of any such $L(T(v))$ is guaranteed by the informative triples. Now assume, for contradiction, that there is an edge $ab$ in $[\mathscr{R}_{|L(T(u))},L(T(u))]$ with $a\in L(T(v))$ and $b\in L(T(u))\setminus L(T(v))$, connecting $L(T(v))$ and $L(T(v'))$ for some child $v'\in\mathrm{child}_T(u)\setminus\{v\}$. In this case, there is a triple $ab|c\in\mathscr{R}_{|L(T(u))}$ and thus, $a,b,c\in L(T(u))$ and $\mathrm{lca}_T(a,b,c)=u$. By definition of $\mathscr{R}$ and Observation 4 in Geiß et al. (2019), $ab|c$ must be displayed by $T$. However, $a,b,c\in L(T(u))$ and $\mathrm{lca}_T(a,b)=u=\mathrm{lca}_T(a,b,c)$ imply that $ab|c$ is not displayed by $T$; a contradiction. Therefore, $(T,\sigma)=(\mathrm{Aho}(\mathscr{R}),\sigma)$, which completes the proof.

For completeness, we show that conditions (1) and (2) of Theorem 9 ensure that $G(\mathrm{Aho}(\mathscr{R}),\sigma)$ and $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$ are subgraphs of $(G,\sigma)$.

Proposition 3

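Let $(G,\sigma)$ be a properly n-colored digraph all of whose 2-colored induced subgraphs are 2-cBMGs. Then the following two statements hold:

  1. If $\mathcal{R}(G,\sigma)$ is consistent, then $G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)\subseteq(G,\sigma)$.

  2. If $\mathscr{R}$ is consistent, then $G(\mathrm{Aho}(\mathscr{R}),\sigma)\subseteq(G,\sigma)$.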

Proof

We set $(G',\sigma'):=G(\mathrm{Aho}(\mathcal{R}(G,\sigma)),\sigma)$. Since $\mathrm{Aho}(\mathcal{R}(G,\sigma))$ is defined on $V(G)$, we have $V(G')=V(G)$ and $\sigma'=\sigma$. Now assume, for contradiction, that there is an arc $ab\in E(G')$ such that $ab\notin E(G)$. By assumption, the induced subgraph $(G_{st},\sigma_{st})$ of $(G,\sigma)$, where $s=\sigma(a)$ and $t=\sigma(b)$, is a 2-cBMG and thus sink-free. Therefore, there must be a vertex $b'$ of color $\sigma(b)$ with $ab'\in E(G)$. Hence, $ab'|b$ is informative for $(G,\sigma)$ and contained in $\mathcal{R}(G,\sigma)$. In particular, $ab'|b$ must be displayed by $\mathrm{Aho}(\mathcal{R}(G,\sigma))$, contradicting that $ab$ is an arc in $(G',\sigma')$. Hence, statement 1 is true.

Statement 2 can be shown using Eq. (1), i.e., $\mathcal{R}(G,\sigma)\subseteq\mathscr{R}$, and arguments similar to those in the previous paragraph.

Consequences for the algorithms

Finally, we discuss the consequences of the corrections for the algorithmic aspects outlined in Section 5 of Geiß et al. (2019).

Algorithm 2 constructs the least resolved tree for 2-cBMGs based on Theorem 4 in Geiß et al. (2019). It therefore requires a sink-free graph as input, or needs to be amended to check that its input satisfies condition (N4). This can be done trivially in O(|E|) time. The statements concerning its complexity, i.e., Lemmas 18 and 19, therefore are still correct.

Regarding the recognition of n-cBMGs, we have noted above that the consistency of the triple set $\mathscr{R}$ together with the fact that all 2-colored induced subgraphs are 2-cBMGs is not sufficient. Algorithm 1 of Geiß et al. (2019) therefore also needs to be corrected. By Theorem 1, it suffices to construct the tree $T:=\mathrm{Aho}(\mathcal{R}(G,\sigma))$ and to check whether $G(T,\sigma)=(G,\sigma)$. On the other hand, it is no longer necessary to require connectedness of the input graph. We therefore obtain a considerably simpler procedure, see Alg. 1.
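The simplified procedure can be sketched in Python as follows. This is our own illustration of the steps just described (enumerate $\mathcal{R}(G,\sigma)$, build $\mathrm{Aho}(\mathcal{R}(G,\sigma))$, compare $G(T,\sigma)$ with $(G,\sigma)$), not a transcription of Alg. 1. It uses the classical recursive BUILD procedure instead of the faster algorithm by Deng and Fernández-Baca (2018), so it does not attain the running time discussed below; the graph encoding, the inner-vertex names `_v0`, `_v1`, ... and all function names are hypothetical. An inconsistent $\mathcal{R}(G,\sigma)$ leads to rejection, since by Proposition 1 the Aho tree exists for every n-cBMG.

```python
from itertools import count


def informative_triples(vertices, arcs, color):
    """R(G, sigma) of Definition 1, as tuples (a, b, b2) standing for ab|b2."""
    return {(a, b, b2) for (a, b) in arcs for b2 in vertices
            if color[a] != color[b] == color[b2] and b2 != b
            and (a, b2) not in arcs}


def components(L, edges):
    """Connected components of the undirected graph (L, edges)."""
    adj = {x: set() for x in L}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for s in L:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x] - seen:
                seen.add(y)
                stack.append(y)
        comps.append(comp)
    return comps


def aho_tree(triples, leaves):
    """Recursive BUILD: parent map of Aho(R), or None if R is inconsistent.
    Inner vertices receive the hypothetical names _v0, _v1, ..."""
    parent, fresh = {}, count()

    def build(L, anchor):
        if len(L) == 1:
            parent[next(iter(L))] = anchor
            return True
        node = f"_v{next(fresh)}"
        parent[node] = anchor
        edges = [(a, b) for a, b, c in triples if {a, b, c} <= L]
        comps = components(L, edges)
        if len(comps) == 1:     # connected Aho graph: no tree displays R
            return False
        return all(build(C, node) for C in comps)

    return parent if build(set(leaves), None) else None


def root_path(parent, x):
    path = [x]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def best_match_graph(parent, leaves, color):
    """Arcs of G(T, sigma): x -> y whenever lca(x, y) is lowest among all
    leaves of color[y] != color[x]."""
    arcs = set()
    for x in leaves:
        path = root_path(parent, x)
        best = {}                           # color -> (index on path, leaves)
        for y in leaves:
            if color[y] == color[x]:
                continue
            anc = set(root_path(parent, y))
            h = next(i for i, v in enumerate(path) if v in anc)
            h_best, ys = best.get(color[y], (float("inf"), []))
            if h < h_best:
                best[color[y]] = (h, [y])
            elif h == h_best:
                ys.append(y)
        for _, ys in best.values():
            arcs.update((x, y) for y in ys)
    return arcs


def recognize_cbmg(vertices, arcs, color):
    """Return (True, Aho tree) if (G, sigma) is an n-cBMG, else (False, None)."""
    arcs = set(arcs)
    tree = aho_tree(informative_triples(vertices, arcs, color), vertices)
    if tree is None or best_match_graph(tree, vertices, color) != arcs:
        return False, None
    return True, tree


V = ["a1", "a2", "b1"]
sigma = {"a1": "A", "a2": "A", "b1": "B"}
print(recognize_cbmg(V, {("a1", "b1"), ("a2", "b1"), ("b1", "a1")}, sigma)[0])  # True
print(recognize_cbmg(V, {("a1", "b1"), ("b1", "a1")}, sigma)[0])  # False
```

In the example, the first digraph is accepted, while the second is rejected: there $a_2$ is a sink, yet the Aho tree of the informative triple $b_1a_1|a_2$ explains a best match graph containing the arc $a_2b_1$.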

The same arguments as in Geiß et al. (2019) show that $T=\mathrm{Aho}(\mathcal{R}(G,\sigma))$ can be constructed in $O(|E||L|\log^2(|E||L|))=O(|E||L|\log^2|L|)$ time using the algorithm by Deng and Fernández-Baca (2018). The construction of $G(T,\sigma)$ can then be achieved in $O(|L|^2)$ time, e.g., using Algorithm 1 of the Supplement of Geiß et al. (2020). The equality $G(T,\sigma)=(G,\sigma)$ can be checked in $O(|L|^2)$ operations. The total effort therefore remains dominated by the construction of the least resolved tree $T$.

We note that Algorithm 3 in Geiß et al. (2019) is essentially the simplified Algorithm 1 above with its input restricted to 2-colored connected digraphs. Its correctness therefore follows immediately from Theorem 1.

Funding

Open Access funding enabled and organized by Projekt DEAL.


Contributor Information

David Schaller, Email: sdavid@bioinf.uni-leipzig.de.

Manuela Geiß, Email: manuela.geiss@scch.at.

Edgar Chávez, Email: echavezaparicio@gmail.com.

Marcos González Laffitte, Email: marcoslaffitte@im.unam.mx.

Alitzel López Sánchez, Email: Alitzel.Lopez.Sanchez@USherbrooke.ca.

Bärbel M. R. Stadler, Email: baer@bioinf.uni-leipzig.de.

Dulce I. Valdivia, Email: dulce.valdivia@cinvestav.mx.

Marc Hellmuth, Email: mhellmuth@mailbox.org.

Maribel Hernández Rosales, Email: maribel.hr@cinvestav.mx.

Peter F. Stadler, Email: studla@bioinf.uni-leipzig.de.

References

  1. Bryant D, Steel M. Extension operations on sets of leaf-labeled trees. Adv Appl Math. 1995;16:425–453. doi: 10.1006/aama.1995.1020.
  2. Deng Y, Fernández-Baca D. Fast compatibility testing for rooted phylogenetic trees. Algorithmica. 2018;80:2453–2477. doi: 10.1007/s00453-017-0330-4.
  3. Geiß M, Chávez E, González Laffitte M, López Sánchez A, Stadler BMR, Valdivia DI, Hellmuth M, Hernández Rosales M, Stadler PF. Best match graphs. J Math Biol. 2019;78:2015–2057. doi: 10.1007/s00285-019-01332-9.
  4. Geiß M, González Laffitte ME, López Sánchez A, Valdivia DI, Hellmuth M, Hernández Rosales M, Stadler PF. Best match graphs and reconciliation of gene trees with species trees. J Math Biol. 2020;80:1459–1495. doi: 10.1007/s00285-020-01469-y.
  5. Schaller D, Geiß M, Stadler PF, Hellmuth M. Complete characterization of incorrect orthology assignments in best match graphs. J Math Biol. 2021. doi: 10.1007/s00285-021-01564-8.

Articles from Journal of Mathematical Biology are provided here courtesy of Springer
