Algorithms for Molecular Biology. 2018 Feb 6;13:2. doi: 10.1186/s13015-018-0121-8

Time-consistent reconciliation maps and forbidden time travel

Nikolai Nøjgaard 1,2, Manuela Geiß 5, Daniel Merkle 2, Peter F Stadler 5,6,7,8,9,10,11, Nicolas Wieseke 3, Marc Hellmuth 1,4
PMCID: PMC5800358  PMID: 29441122

Abstract

Background

In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer.

Results

We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an O(|V(T)|log(|V(S)|))-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparatively easy task of checking whether a small auxiliary graph is acyclic. The algorithms are implemented in C++ using the Boost Graph Library and are freely available at https://github.com/Nojgaard/tc-recon.

Significance

The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.

Keywords: Tree reconciliation, Horizontal gene transfer, Reconciliation map, Time-consistency, History of gene families

Background

Modern molecular biology describes the evolution of species in terms of the evolution of the genes that collectively form an organism’s genome. In this picture, genes are viewed as atomic units whose evolutionary history by definition forms a tree. The phylogeny of species also forms a tree. This species tree is either interpreted as a consensus of the gene trees or it is inferred from other data. An interesting formal manner to define a species tree independent of genes and genetic data is discussed, e.g. in [1].

In this contribution, we assume that gene and species trees are given independently of each other. The relationship between gene and species evolution is therefore given by a reconciliation map that describes how the gene tree is embedded in the species tree: after all, genes reside in organisms, and thus at each point in time can be assigned to a species.

From a formal point of view, a reconciliation map μ identifies vertices of a gene tree with vertices and edges in the species tree in such a way that (partial) ancestor relations given by the genes are preserved by μ. Vertices in the species tree correspond to speciation events. By definition, in a speciation event all genes are faithfully transmitted from the parent species into both (all) daughter species. Some of the vertices in the gene tree therefore correspond to speciation events. In gene duplications, two copies of a gene are formed from a single ancestral gene and then keep residing in the same species. In horizontal gene transfer (HGT) events, the original remains within the parental species, while the offspring copy “jumps” into a different branch of the species tree. Given a gene tree with event types assigned to its interior vertices, it is customary to define pairwise relations between genes depending on the event type of their last common ancestor [2–4].

Most of the literature on this topic assumes that both the gene tree and the species tree are known but that no information is available about the types of events [5–8]. The aim is then to find a mapping of the gene tree T into the species tree S and, at least implicitly, an event-labeling on the vertices of the gene tree T. Here we take a different point of view and assume that T and the types of evolutionary events on T are known. This setting has ample practical relevance because event-labeled gene trees can be derived from the pairwise orthology relation [4, 9]. These relations in turn can be estimated directly from sequence data using a variety of algorithmic approaches that are based on the pairwise best match criterion and hence do not require any a priori knowledge of the topology of either the gene tree or the species tree, see e.g. [10–13].

Genes that share a common origin (homologs) can be classified into orthologs, paralogs, and xenologs depending on whether they originated by a speciation, duplication or horizontal gene transfer (HGT) event [2, 4]. Recent advances in mathematical phylogenetics [9, 14] have shown that the knowledge of these event-relations (orthologs, paralogs and xenologs) suffices to construct event-labeled gene trees and, in some cases, also a species tree [3, 15, 16].

Conceptually, both the gene tree and species tree are associated with a timing of each event. Reconciliation maps must preserve this timing information because there are biologically infeasible event-labeled gene trees that cannot be reconciled with any species tree. In the absence of HGT, biological feasibility can be characterized in terms of certain triples (rooted binary trees on three leaves) that are displayed by the gene trees [16]. In the presence of HGT such triples give at least necessary conditions for a gene tree being biologically feasible [15]. In particular, the timing information must be taken into account explicitly in the presence of HGT. That is, gene trees with HGT must be mapped to species trees in such a way that genes do not travel back in time.

There have been several attempts in the literature to handle this issue, see e.g. [17] for a review. In [18, 19] a single HGT adds timing constraints to a time map for a reconciliation to be found. Time-consistency is then defined as the existence of a topological order of the digraph reflecting all the time constraints. In [20] NP-hardness was shown for finding a parsimonious time-consistent reconciliation based on a definition for time-consistency that in essence considers pairs of HGTs. However, the latter definitions are explicitly designed for binary gene trees and do not apply to non-binary gene trees, which are used here to model incomplete knowledge of the exact gene phylogenies. Different algorithmic approaches for tackling time-consistency exist [17] such as the inclusion of “time-zones” known for specific evolutionary events. It is worth noting that a posteriori modifications of time-inconsistent solutions will in general violate parsimony [18]. So far, no results have become available to determine the existence of time-consistent reconciliation maps given the (undated) species tree and the event-labeled gene tree.

Here, we introduce an axiomatic framework for time-consistent reconciliation maps and characterize for given event-labeled gene trees T and species trees S whether there exists a time-consistent reconciliation map. We provide an O(|V(T)|log(|V(S)|))-time algorithm that constructs a time-consistent reconciliation map if one exists.

Notation and preliminaries

We consider rooted trees $T=(V,E)$ (on $L_T$) with root $\rho_T\in V$ and leaf set $L_T\subseteq V$. A vertex $v\in V$ is called a descendant of $u\in V$, $v\preceq_T u$, and $u$ is an ancestor of $v$, $u\succeq_T v$, if $u$ lies on the path from $\rho_T$ to $v$. As usual, we write $v\prec_T u$ and $u\succ_T v$ to mean $v\preceq_T u$ and $u\ne v$. The partial order $\succeq_T$ is known as the ancestor order of $T$; the root is the unique maximal element w.r.t. $\succeq_T$. If $u\preceq_T v$ or $v\preceq_T u$ then $u$ and $v$ are comparable and otherwise, incomparable. We consider edges of rooted trees to be directed away from the root, that is, the notation for edges $(u,v)$ of a tree is chosen such that $u\succ_T v$. If $(u,v)$ is an edge in $T$, then $u$ is called parent of $v$ and $v$ child of $u$. It will be convenient for the discussion below to extend the ancestor relation $\preceq_T$ on $V$ to the union of the edge and vertex sets of $T$. More precisely, for the edge $e=(u,v)\in E$ we put $x\prec_T e$ if and only if $x\preceq_T v$ and $e\prec_T x$ if and only if $u\preceq_T x$. For edges $e=(u,v)$ and $f=(a,b)$ in $T$ we put $e\preceq_T f$ if and only if $v\preceq_T b$. For $x\in V$, we write $L_T(x):=\{y\in L_T\mid y\preceq_T x\}$ for the set of leaves in the subtree $T(x)$ of $T$ rooted in $x$.

For a non-empty subset of leaves $A\subseteq L$, we define $\mathrm{lca}_T(A)$, or the least common ancestor of $A$, to be the unique $\preceq_T$-minimal vertex of $T$ that is an ancestor of every vertex in $A$. In case $A=\{u,v\}$, we put $\mathrm{lca}_T(u,v):=\mathrm{lca}_T(\{u,v\})$. We have in particular $u=\mathrm{lca}_T(L_T(u))$ for all $u\in V$. We will also frequently use that for any two non-empty vertex sets $A$, $B$ of a tree, it holds that $\mathrm{lca}(A\cup B)=\mathrm{lca}(\mathrm{lca}(A),\mathrm{lca}(B))$.
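The ancestor order and the least common ancestor can be illustrated with a minimal Python sketch. The child-to-parent dictionary `PARENT` and all function names are hypothetical choices made for this illustration; they are not taken from the authors' C++ implementation.

```python
from functools import reduce

# Hypothetical toy tree, stored as child -> parent; the root maps to None.
PARENT = {"r": None, "x": "r", "a": "x", "b": "x", "c": "r"}

def ancestors(parent, v):
    """Return the path v, parent(v), ..., root as a list."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def is_descendant(parent, v, u):
    """True if v is a descendant of u, i.e. u lies on the path from the root to v."""
    return u in ancestors(parent, v)

def comparable(parent, u, v):
    """Two vertices are comparable if one is a descendant of the other."""
    return is_descendant(parent, u, v) or is_descendant(parent, v, u)

def lca2(parent, u, v):
    """Least common ancestor of two vertices."""
    above_u = set(ancestors(parent, u))
    for w in ancestors(parent, v):  # walk upward from v
        if w in above_u:
            return w
    raise ValueError("vertices are not in the same tree")

def lca(parent, vertices):
    """Least common ancestor of a non-empty vertex collection, folded pairwise."""
    return reduce(lambda u, v: lca2(parent, u, v), vertices)
```

Folding `lca2` over a collection relies on the identity lca(A ∪ B) = lca(lca(A), lca(B)) stated above. The sketch walks root paths and is therefore linear in the tree height per query, which is enough for illustration but not tuned for large trees.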

A phylogenetic tree is a rooted tree such that no interior vertex $v\in V\setminus L_T$ has degree two, except possibly the root. If $L_T$ corresponds to a set of genes $\mathbb{G}$ or species $\mathbb{S}$, we call a phylogenetic tree on $L_T$ gene tree or species tree, respectively. In this contribution we will not restrict the gene or species trees to be binary, although this assumption is made implicitly or explicitly in much of the literature on the topic. The more general setting allows us to model incomplete knowledge of the exact gene or species phylogenies. Of course, all mathematical results proved here also hold for the special case of binary phylogenetic trees.

In our setting a gene tree $T=(V,E)$ on $\mathbb{G}$ is equipped with an event-labeling map $t: V\cup E\to I\cup\{0,1\}$ with $I=\{\bullet,\square,\triangle,\odot\}$ that assigns to each interior vertex $v$ of $T$ a value $t(v)\in I$ indicating whether $v$ is a speciation event ($\bullet$), duplication event ($\square$) or HGT event ($\triangle$). It is convenient to use the special label $\odot$ for the leaves $x$ of $T$. Moreover, to each edge $e$ a value $t(e)\in\{0,1\}$ is added that indicates whether $e$ is a transfer edge (1) or not (0). Note, only edges $(x,y)$ for which $t(x)=\triangle$ might be labeled as transfer edges. We write $\mathcal{E}=\{e\in E\mid t(e)=1\}$ for the set of transfer edges in $T$. We assume here that all edges labeled “0” transmit the genetic material vertically, that is, from an ancestral species to its descendants.

We remark that the restriction $t|_V$ of $t$ to the vertex set $V$ coincides with the “symbolic dating maps” introduced in [21]; these have a close relationship with cographs [14, 22, 23]. Furthermore, there is a map $\sigma:\mathbb{G}\to\mathbb{S}$ that assigns to each gene the species in which it resides. The set $\sigma(M)$, $M\subseteq\mathbb{G}$, is the set of species from which the genes $M$ are taken. We write $(T;t,\sigma)$ for the gene tree $T=(V,E)$ with event-labeling $t$ and corresponding map $\sigma$.

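Removal of the transfer edges from $(T;t,\sigma)$ yields a forest $T_{\overline{\mathcal{E}}}:=(V,E\setminus\mathcal{E})$ that inherits the ancestor order on its connected components, i.e., $x\preceq_{T_{\overline{\mathcal{E}}}} y$ iff $x\preceq_T y$ and $x$, $y$ are in the same subtree of $T_{\overline{\mathcal{E}}}$ [20]. Clearly $T_{\overline{\mathcal{E}}}$ uniquely defines a root for each subtree and the set of descendant leaf nodes $L_{T_{\overline{\mathcal{E}}}}(x)$.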
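As a rough illustration, the forest obtained by deleting transfer edges and the resulting descendant leaf sets can be computed as follows. The example tree, the names `PARENT`, `TRANSFER`, `SIGMA` and both helper functions are assumptions made for this sketch only.

```python
from collections import defaultdict

# Hypothetical event-labeled gene tree: child -> parent, plus its transfer edges.
PARENT = {"r": None, "h": "r", "a": "r", "b": "h", "c": "h"}
TRANSFER = {("h", "c")}                  # edges (parent, child) with t(e) = 1
SIGMA = {"a": "A", "b": "B", "c": "C"}   # gene -> species in which it resides

def forest_children(parent, transfer):
    """Children lists of the forest obtained by deleting all transfer edges."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None and (par, child) not in transfer:
            children[par].append(child)
    return children

def forest_leaves(parent, transfer, genes, x):
    """Genes reachable from x without crossing a transfer edge."""
    children = forest_children(parent, transfer)
    stack, found = [x], set()
    while stack:
        v = stack.pop()
        if v in genes:
            found.add(v)
        stack.extend(children[v])
    return found

def sigma_forest(parent, transfer, sigma, x):
    """Species of all genes reachable from x without crossing a transfer edge."""
    return {sigma[g] for g in forest_leaves(parent, transfer, set(sigma), x)}
```

In this toy tree the transfer edge (h, c) separates gene c from the rest, so the leaf set below the root of the forest no longer contains c.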

In order to account for duplication events that occurred before the first speciation event, we need to add an extra vertex and an extra edge “above” the last common ancestor of all species in the species tree $S=(V,E)$. Hence, we add an additional vertex to $V$ (that is now the new root $\rho_S$ of $S$) and the additional edge $(\rho_S,\mathrm{lca}_S(\mathbb{S}))$ to $E$. Strictly speaking $S$ is not a phylogenetic tree in the usual sense; however, it will be convenient to work with these augmented trees. For simplicity, we omit drawing the augmenting edge $(\rho_S,\mathrm{lca}_S(\mathbb{S}))$ in our examples.

Observable scenarios

The true history of a gene family, as it is considered here, is an arbitrary sequence of speciation, duplication, HGT, and gene loss events. The applications we envision for the theory developed here, however, assume that the gene tree and its event labels are inferred from (sequence) data, i.e., $(T;t,\sigma)$ is restricted to those labeled trees that can be constructed at least in principle from observable data. The issue here is that gene losses may completely eradicate the information on parts of the history. Specifically, we require that $(T;t,\sigma)$ satisfies the following three conditions:

  • (O1) Every internal vertex v has degree at least 3, except possibly the root which has degree at least 2.

  • (O2) Every HGT node has at least one transfer edge, t(e)=1, and at least one non-transfer edge, t(e)=0;

  • (O3)
    • (a) If $x$ is a speciation vertex, then there are at least two distinct children $v$, $w$ of $x$ such that the species $V$ and $W$ that contain $v$ and $w$, resp., are incomparable in $S$.
    • (b) If $(v,w)$ is a transfer edge in $T$, then the species $V$ and $W$ that contain $v$ and $w$, resp., are incomparable in $S$.

Condition (O1) ensures that every event leaves a historical trace in the sense that there are at least two children that have survived in at least two of its subtrees. If this were not the case, no evidence would be left for all but one descendant tree, i.e., we would have no evidence that event $v$ ever happened. We note that this condition was used, e.g., in [16] for scenarios without HGT. Condition (O2) ensures that for an HGT event a historical trace remains of both the transferred and the non-transferred copy. If there is no transfer edge, we have no evidence to classify $v$ as an HGT node. Conversely, if all edges were transfers, no evidence of the lineage of origin would be available and any reasonable inference of the gene tree from data would assume that the gene family was vertically transmitted in at least one of the lineages in which it is observed. In particular, Condition (O2) implies that for each internal vertex there is a path consisting entirely of non-transfer edges to some leaf. This excludes in particular scenarios in which a gene is transferred to a different “host” and later reverts back to descendants of the original lineage without any surviving offspring in the intermediate host lineage.

Furthermore, a speciation vertex $x$ cannot be observed from data if it does not “separate” lineages, where separating means that there are two leaf descendants of distinct children of $x$ that are in distinct species. However, here we only assume the weaker Condition (O3.a), which ensures that any “observable” speciation vertex $x$ separates at least locally two lineages. In other words, if all children of $x$ were contained in species that are comparable in $S$ or, equivalently, in the same lineage of $S$, then there would be no clear historical trace that justifies $x$ being a speciation vertex. In particular, most likely there are two leaf descendants of distinct children of $x$ that are in the same species even if only $T_{\overline{\mathcal{E}}}$ is considered. Hence, $x$ would rather be classified as a duplication than as a speciation upon inference of the event labels from actual data. Analogously, if $(v,w)\in\mathcal{E}$ then $v$ signifies the transfer event itself but $w$ refers to the next (visible) event in the gene tree $T$. Given that $(v,w)$ is an HGT edge in the observable part, in a “true history” $v$ is contained in a species $V$ that transmits its genetic material (maybe along a path of transfers) to a contemporary species $Z$ that is an ancestor of the species $W$ containing $w$. Clearly, the latter allows $V\succeq_S W$, which happens if the path of transfers points back to the descendant lineage of $V$ in $S$. In this case the transfer edge $(v,w)$ must be placed in the species tree such that $\mu(v)$ and $\mu(w)$ are comparable in $S$. However, then there is no evidence that this transfer ever happened, and thus $v$ would rather be classified as a speciation or duplication vertex.

Assuming that (O2) is satisfied, we obtain the following useful result:

Lemma 1

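Let $T_1,\dots,T_k$ be the connected components of $T_{\overline{\mathcal{E}}}$ with roots $\rho_1,\dots,\rho_k$, respectively. If (O2) holds, then $\{L_{T_{\overline{\mathcal{E}}}}(\rho_1),\dots,L_{T_{\overline{\mathcal{E}}}}(\rho_k)\}$ forms a partition of $\mathbb{G}$.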

Proof

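Since $L_{T_{\overline{\mathcal{E}}}}(\rho_i)\subseteq V(T)$, it suffices to show that $L_{T_{\overline{\mathcal{E}}}}(\rho_i)$ does not contain vertices of $V(T)\setminus\mathbb{G}$. Note, $x\in L_{T_{\overline{\mathcal{E}}}}(\rho_i)$ with $x\notin\mathbb{G}$ is only possible if all edges $(x,y)$ are removed.

Let $x\in V$ with $t(x)=\triangle$ such that all edges $(x,y)$ are removed. Thus, all such edges $(x,y)$ are contained in $\mathcal{E}$. Therefore, every edge of the form $(x,y)$ is a transfer edge; a contradiction to (O2).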

We will show in Proposition 1 that (O1), (O2), and (O3) together imply two important properties of event-labeled gene trees, (Σ1) and (Σ2), which play a crucial role for the results reported here.

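  • (Σ1) If $t(x)=\bullet$, then there are distinct children $v$, $w$ of $x$ in $T$ such that $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))\cap\sigma(L_{T_{\overline{\mathcal{E}}}}(w))=\emptyset$.

  • (Σ2) If $(v,w)\in\mathcal{E}$, then $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))\cap\sigma(L_{T_{\overline{\mathcal{E}}}}(w))=\emptyset$.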
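Both properties only involve set intersections and are therefore easy to test. The following Python sketch shows one possible check; the data layout (a `children` dictionary for interior vertices, string event labels such as `"speciation"`, and a set of transfer edges) is an assumption of this example and not the representation used by the authors.

```python
from itertools import combinations

def sigma_sets(children, transfer, sigma):
    """Map every vertex u to the species of all genes reachable from u
    without crossing a transfer edge."""
    result = {}

    def visit(u):
        # A leaf (extant gene) contributes its own species.
        result[u] = {sigma[u]} if u in sigma else set()
        for c in children.get(u, []):
            visit(c)
            if (u, c) not in transfer:
                result[u] |= result[c]

    all_children = {c for cs in children.values() for c in cs}
    for root in set(children) - all_children:
        visit(root)
    return result

def satisfies_sigma1(children, transfer, sigma, label):
    """Every speciation vertex needs two children with disjoint species sets."""
    s = sigma_sets(children, transfer, sigma)
    for x, kids in children.items():
        if label[x] == "speciation":
            if not any(s[v].isdisjoint(s[w]) for v, w in combinations(kids, 2)):
                return False
    return True

def satisfies_sigma2(children, transfer, sigma):
    """Both ends of every transfer edge need disjoint species sets."""
    s = sigma_sets(children, transfer, sigma)
    return all(s[v].isdisjoint(s[w]) for v, w in transfer)
```

The sketch assumes that (O2) holds, so that none of the species sets is empty; an empty set would be trivially disjoint from every other set. The recursion is meant for small illustrative trees only.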

Intuitively, (Σ1) is true because within a component of $T_{\overline{\mathcal{E}}}$ no genetic material is exchanged between non-comparable nodes. Thus, a gene separated in a speciation event necessarily ends up in distinct species in the absence of horizontal transfer. It is important to note that we do not require the converse: $\sigma(L_{T_{\overline{\mathcal{E}}}}(y))\cap\sigma(L_{T_{\overline{\mathcal{E}}}}(y'))=\emptyset$ does not imply $t(\mathrm{lca}_T(L_{T_{\overline{\mathcal{E}}}}(y)\cup L_{T_{\overline{\mathcal{E}}}}(y')))=\bullet$, that is, the last common ancestor of two sets of genes from different species is not necessarily a speciation vertex.

Now consider a transfer edge $(v,w)\in\mathcal{E}$, i.e., $t(v)=\triangle$. Then $T_{\overline{\mathcal{E}}}(v)$ and $T_{\overline{\mathcal{E}}}(w)$ are subtrees of distinct connected components of $T_{\overline{\mathcal{E}}}$. Since HGT amounts to the transfer of genetic material across distinct species, the genes $v$ and $w$ must be contained in distinct species $X$ and $Y$, respectively. Since no genetic material is transferred in $T_{\overline{\mathcal{E}}}$ between contemporary species $X'$ and $Y'$, where $X'$ and $Y'$ are descendants of $X$ and $Y$, respectively, we derive (Σ2).

Proposition 1

Conditions (O1)–(O3) imply (Σ1) and (Σ2).

Proof

Since (O2) is satisfied we can apply Lemma 1 and conclude that neither $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))=\emptyset$ nor $\sigma(L_{T_{\overline{\mathcal{E}}}}(w))=\emptyset$. Let $x\in V(T)$ with $t(x)=\bullet$. By Condition (O1) $x$ has (at least two) children. Moreover, (O3) implies that there are (at least) two children $v$ and $w$ in $T$ that are contained in distinct species $V$ and $W$ that are incomparable in $S$. Note, the edges $(x,v)$ and $(x,w)$ remain in $T_{\overline{\mathcal{E}}}$, since only transfer edges are removed. Since no transfer is contained in $T_{\overline{\mathcal{E}}}$, the genetic material $v$ and $w$ of $V$ and $W$, respectively, is always vertically transmitted. Therefore, for any leaf $v'\in L_{T_{\overline{\mathcal{E}}}}(v)$ we have $\sigma(v')\preceq_S V$ and for any leaf $w'\in L_{T_{\overline{\mathcal{E}}}}(w)$ we have $\sigma(w')\preceq_S W$ in $S$. Assume now for contradiction that $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))\cap\sigma(L_{T_{\overline{\mathcal{E}}}}(w))\ne\emptyset$. Let $z_1\in L_{T_{\overline{\mathcal{E}}}}(v)$ and $z_2\in L_{T_{\overline{\mathcal{E}}}}(w)$ with $\sigma(z_1)=\sigma(z_2)=Z$. Since $Z\preceq_S V,W$ and $S$ is a tree, the species $V$ and $W$ must be comparable in $S$; a contradiction to (O3). Hence, Condition (Σ1) is satisfied.

To see (Σ2), note that since (O2) is satisfied we can apply Lemma 1 and conclude that neither $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))=\emptyset$ nor $\sigma(L_{T_{\overline{\mathcal{E}}}}(w))=\emptyset$. Let $(v,w)\in\mathcal{E}$. By (O3) the species $V$ and $W$ containing $v$ and $w$ are incomparable in $S$. Now we can argue along the same lines as in the proof for (Σ1) to conclude that $\sigma(L_{T_{\overline{\mathcal{E}}}}(v))\cap\sigma(L_{T_{\overline{\mathcal{E}}}}(w))=\emptyset$.

From here on we simplify the notation a bit and write $\sigma_{T_{\overline{\mathcal{E}}}}(u):=\sigma(L_{T_{\overline{\mathcal{E}}}}(u))$. We are aware of the fact that condition (O3) cannot be checked directly for a given event-labeled gene tree. In contrast, (Σ1) and (Σ2) are easily determined. Hence, in the remainder of this paper we consider the more general case, that is, gene trees that satisfy (O1), (O2), (Σ1), and (Σ2).

DTL-scenario and time-consistent reconciliation maps

In case the event-labeling of T is unknown, but the gene tree T and a species tree S are given, the authors in [20, 24] provide an axiom set, called DTL-scenario, to reconcile T with S. This reconciliation is then used to infer the event-labeling t of T. Instead of defining a DTL-scenario as an octuple [20, 24], we use here the notation established above:

Definition 1

(DTL-scenario) For a given gene tree $(T;t,\sigma)$ on $\mathbb{G}$ and a species tree $S$ on $\mathbb{S}$ the map $\gamma:V(T)\to V(S)$ maps the gene tree into the species tree such that

  • (I) For each leaf $x\in\mathbb{G}$, $\gamma(x)=\sigma(x)$.

  • (II) If $u\in V(T)\setminus\mathbb{G}$ with children $v$, $w$, then
    • (a) $\gamma(u)$ is not a proper descendant of $\gamma(v)$ or $\gamma(w)$, and
    • (b) At least one of $\gamma(v)$ or $\gamma(w)$ is a descendant of $\gamma(u)$.

  • (III) $(u,v)$ is a transfer edge if and only if $\gamma(u)$ and $\gamma(v)$ are incomparable.

  • (IV) If $u\in V(T)\setminus\mathbb{G}$ with children $v$, $w$, then
    • (a) $t(u)=\triangle$ if and only if either $(u,v)$ or $(u,w)$ is a transfer edge,
    • (b) If $t(u)=\bullet$, then $\gamma(u)=\mathrm{lca}_S(\gamma(v),\gamma(w))$ and $\gamma(v)$, $\gamma(w)$ are incomparable,
    • (c) If $t(u)=\square$, then $\gamma(u)\succeq_S\mathrm{lca}_S(\gamma(v),\gamma(w))$.

DTL-scenarios are explicitly defined for fully resolved binary gene and species trees. Indeed, Fig. 1 (right) shows a valid reconciliation between a gene tree $T$ and a species tree $S$ that is not consistent with a DTL-scenario. To see this, let us call the duplication vertex $v$. The vertex $v$ and the leaf $a$ are both children of the speciation vertex $\rho_T$. Condition (IVb) implies that $\gamma(a)$ and $\gamma(v)$ must be incomparable. However, this is not possible since $\gamma(v)\succeq_S\mathrm{lca}_S(B,C)$ (Cond. (IVc)) and $\gamma(a)=A$ (Cond. (I)) and therefore, $\gamma(v)\succeq_S\mathrm{lca}_S(B,C)=\mathrm{lca}_S(A,B,C)\succ_S\gamma(a)$.

Fig. 1

Left: A “true” evolutionary scenario for a gene tree with leaf set $\mathbb{G}$ evolving along the tube-like species tree is shown. The symbol “x” denotes losses. All speciations along the path from the root $\rho_T$ to the leaf $a$ are followed by losses and we omit drawing them. Middle: The observable gene tree is shown in the upper-left. The orthology graph $G=(\mathbb{G},E)$ (edges are placed between genes $x$, $y$ for which $t(\mathrm{lca}(x,y))=\bullet$) is drawn in the lower part. This graph is a cograph and the corresponding non-binary gene tree $T$ on $\mathbb{G}$ that can be constructed from such data is given in the upper-right part (cf. [3, 4, 14] for further details). Right: Shown is the species tree $S$ on $\mathbb{S}=\sigma(\mathbb{G})$ with reconciled gene tree $T$. The reconciliation map $\mu$ for $T$ and $S$ is given implicitly by drawing the gene tree $T$ within $S$. Note, this reconciliation is not consistent with DTL-scenarios [20, 24]. A DTL-scenario would require that the duplication vertex and the leaf $a$ are incomparable in $S$. Note, a non-binary duplication or HGT vertex $v$ can always be “binary resolved” such that the newly created vertices are placed on the same edge $\mu(v)$ as $v$. However, there are cases that show that non-binary speciation vertices cannot be “binary resolved”. For instance, for the non-binary gene tree $T$ there is no way to resolve its root without violating the conditions of a reconciliation map (cf. [15, Fig. 3]). Yet, such cases strongly imply that the speciation event must have been followed by (several) duplication/HGT events that are not observable due to losses

The problem of reconciliations between gene trees and species tree is formalized in terms of so-called DTL-scenarios in the literature [20, 24]. This framework, however, usually assumes that the event labels t on T are unknown, while a species tree S is given. The “usual” DTL axioms, furthermore, explicitly refer to binary, fully resolved gene and species trees. We therefore use a different axiom set here that is a natural generalization of the framework introduced in [16] for the HGT-free case:

Definition 2

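Let $T=(V,E)$ and $S=(W,F)$ be phylogenetic trees on $\mathbb{G}$ and $\mathbb{S}$, resp., $\sigma:\mathbb{G}\to\mathbb{S}$ the assignment of genes to species and $t:V\cup E\to\{\bullet,\square,\triangle,\odot\}\cup\{0,1\}$ an event labeling on $T$. A map $\mu:V\to W\cup F$ is a reconciliation map if for all $v\in V$ it holds that:

  • (M1) Leaf Constraint. If $t(v)=\odot$, then $\mu(v)=\sigma(v)$.

  • (M2) Event Constraint.
    • (i) If $t(v)=\bullet$, then $\mu(v)=\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$.
    • (ii) If $t(v)\in\{\square,\triangle\}$, then $\mu(v)\in F$.
    • (iii) If $t(v)=\triangle$ and $(v,w)\in\mathcal{E}$, then $\mu(v)$ and $\mu(w)$ are incomparable in $S$.

  • (M3) Ancestor Constraint. Suppose $v,w\in V$ with $v\prec_{T_{\overline{\mathcal{E}}}}w$.
    • (i) If $t(v),t(w)\in\{\square,\triangle\}$, then $\mu(v)\preceq_S\mu(w)$,
    • (ii) Otherwise, i.e., at least one of $t(v)$ and $t(w)$ is a speciation $\bullet$, $\mu(v)\prec_S\mu(w)$.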

We say that $S$ is a species tree for $(T;t,\sigma)$ if a reconciliation map $\mu:V\to W\cup F$ exists.

For the special case that gene and species trees are binary, Definition 2 is equivalent to the definition of a DTL-scenario, which is summarized in the following

Theorem 1

For a binary gene tree (T;t,σ) and a binary species tree S there is a DTL-scenario if and only if there is a reconciliation μ for (T;t,σ) and S.

The proof of Theorem 1 is a straightforward but tedious case-by-case analysis. In order to keep this section readable, we relegate the proof of Theorem 1 to "Proof of Theorem 1" section. Figure 1 shows an example of a biologically plausible reconciliation of non-binary trees that is valid w.r.t. Definition 2 but does not satisfy the conditions of a DTL-scenario.

Condition (M1) ensures that each leaf of $T$, i.e., an extant gene in $\mathbb{G}$, is mapped to the species in which it resides. Conditions (M2.i) and (M2.ii) ensure that each inner vertex of $T$ is either mapped to a vertex or an edge in $S$ such that a vertex of $T$ is mapped to an interior vertex of $S$ if and only if it is a speciation vertex. Condition (M2.i) might seem overly restrictive, an issue to which we will return below. Condition (M2.iii) reflects Condition (O3): it maps the endpoints of a transfer edge so that they are incomparable in the species tree, since an HGT occurs between distinct (co-existing) species. It becomes void in the absence of HGT; thus Definition 2 reduces to the definition of reconciliation maps given in [16] for the HGT-free case. Importantly, condition (M3) refers only to the connected components of $T_{\overline{\mathcal{E}}}$, since comparability of two vertices w.r.t. $\prec_{T_{\overline{\mathcal{E}}}}$ implies that the path between them in $T$ does not contain transfer edges. It ensures that the ancestor order $\preceq_T$ of $T$ is preserved along all paths that do not contain transfer edges.

We will make use of the following bound that effectively restricts how close to the leaves the image of a vertex in the gene tree can be located.

Lemma 2

If $\mu:(T;t,\sigma)\to S$ satisfies (M1) and (M3), then $\mu(u)\succeq_S\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for any $u\in V(T)$.

Proof

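If $u$ is a leaf, then by Condition (M1) $\mu(u)=\sigma(u)$ and we are done. Thus, let $u$ be an interior vertex. By Condition (M3), $z\preceq_S\mu(u)$ for all $z\in\sigma_{T_{\overline{\mathcal{E}}}}(u)$. Hence, if $\mu(u)\prec_S\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ or if $\mu(u)$ and $\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ are incomparable in $S$, then there is a $z\in\sigma_{T_{\overline{\mathcal{E}}}}(u)$ such that $z$ and $\mu(u)$ are incomparable; contradicting (M3).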

Condition (M2.i) implies in particular the weaker property “(M2.i’) if $t(v)=\bullet$ then $\mu(v)\in W$”. In the light of Lemma 2, $\mu(v)=\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ is the lowest possible choice for the image of a speciation vertex. Clearly, this restricts the possibly exponentially many reconciliation maps for which $\mu(v)\succeq_S\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ for speciation vertices $v$ to only those that satisfy (M2.i). However, the latter is justified by the observation that if $v$ is a speciation vertex with children $u$, $w$, then there is only one unique piece of information given by the gene tree to place $\mu(v)$, that is, the unique vertex $x$ in $S$ with children $y$, $z$ such that $\sigma_{T_{\overline{\mathcal{E}}}}(u)\subseteq L_S(y)$ and $\sigma_{T_{\overline{\mathcal{E}}}}(w)\subseteq L_S(z)$. The latter argument easily generalizes to the case that $v$ has more than two children in $T$. Moreover, any observable speciation node $v'\succ_T v$ closer to the root than $v$ must be mapped to a node ancestral to $\mu(v)$ due to (M3.ii). Therefore, we require $\mu(v)=x=\mathrm{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ here.

If $S$ is a species tree for the gene tree $(T;t,\sigma)$ then there is no freedom in the construction of a reconciliation map $\mu$ on the set $\{x\in V(T)\mid t(x)\in\{\bullet,\odot\}\}$. The duplication and HGT vertices of $T$, however, can be placed differently. As a consequence there is a possibly exponentially large set of reconciliation maps from $(T;t,\sigma)$ to $S$.
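The forced part of a reconciliation map, i.e., the images of leaves under (M1) and of speciation vertices under (M2.i), can be sketched in Python as follows. The function names, the string event labels and the precomputed dictionary `sigma_te` (vertex to the species set of its descendant leaves in the forest without transfer edges) are assumptions made for this illustration.

```python
def lca_species(parent_s, species):
    """Least common ancestor in S of a non-empty collection of species.
    parent_s maps each vertex of S to its parent; the root maps to None."""
    def path_to_root(v):
        path = []
        while v is not None:
            path.append(v)
            v = parent_s[v]
        return path

    species = list(species)
    common = path_to_root(species[0])          # ordered from low to root
    for sp in species[1:]:
        on_path = set(path_to_root(sp))
        common = [v for v in common if v in on_path]
    return common[0]                            # lowest vertex on all root paths

def forced_images(label, sigma, sigma_te, parent_s):
    """Images that every reconciliation map must assign."""
    mu = {}
    for v, event in label.items():
        if event == "leaf":
            mu[v] = sigma[v]                                # (M1)
        elif event == "speciation":
            mu[v] = lca_species(parent_s, sigma_te[v])      # (M2.i)
    return mu
```

Duplication and HGT vertices are deliberately left out of the returned dictionary, since their images are edges of the species tree and are not determined by (M1) and (M2.i) alone.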

From a biological point of view, however, the notion of reconciliation used so far is too weak. In the absence of HGT, subtrees evolve independently and hence, the linear order of points along each path from root to leaf is consistent with a global time axis. This is no longer true in the presence of HGT events, because HGT events imply additional time-consistency conditions. These stem from the fact that the appearance of the HGT copy in a distant subtree of S is concurrent with the HGT event. To investigate this issue in detail, we introduce time maps and the notion of time-consistency, see Figs. 2, 3, 4 for illustrative examples.

Fig. 2

Shown are two (tube-like) species trees with reconciled gene trees. The reconciliation map μ for T and S is given implicitly by drawing the gene tree (upper right to the respective species tree) within the species tree. In the left example, the map μ is unique. However, μ is not time-consistent and thus, there is no time consistent reconciliation for T and S. In the example on the right hand side, μ is time-consistent

Fig. 3

Shown are a gene tree (T;t,σ) (right) and two identical (tube-like) species trees S (left and middle). There are two possible reconciliation maps for T and S that are given implicitly by drawing T within the species tree S. These two reconciliation maps differ only in the choice of placing the HGT-event either on the edge (lcaS(C,D),C) or on the edge (lcaS({A,B,C,D}),lcaS(C,D)). In the first case, it is easy to see that μ would not be time-consistent, i.e., there are no time maps τT and τS that satisfy (C1) and (C2). The reconciliation map μ shown in the middle is time-consistent

Fig. 4

Shown are a gene tree (T;t,σ) (right) and two identical (tube-like) species trees S (left and middle). There are two possible reconciliation maps for T and S that are given implicitly by drawing T within the species tree S. The left reconciliation maps each gene tree vertex as high as possible into the species tree. However, in this case only the middle reconciliation map is time-consistent

Definition 3

(Time Map) The map $\tau_T:V(T)\to\mathbb{R}$ is a time map for the rooted tree $T$ if $x\prec_T y$ implies $\tau_T(x)>\tau_T(y)$ for all $x,y\in V(T)$.
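A time map can be verified edge by edge, since the inequality along a path follows from the inequalities on its edges. The sketch below assumes a hypothetical child-to-parent dictionary in which the root maps to `None`.

```python
def is_time_map(parent, tau):
    """Check tau(child) > tau(parent) on every edge of the tree.
    By transitivity this covers all ancestor-descendant pairs."""
    return all(par is None or tau[child] > tau[par]
               for child, par in parent.items())
```

Note that time increases from the root towards the leaves in this convention, so descendants always carry strictly larger values than their ancestors.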

Definition 4

A reconciliation map $\mu$ from $(T;t,\sigma)$ to $S$ is time-consistent if there are time maps $\tau_T$ for $T$ and $\tau_S$ for $S$ that satisfy the following conditions for all $u\in V(T)$:

  • (C1) If $t(u)\in\{\bullet,\odot\}$, then $\tau_T(u)=\tau_S(\mu(u))$.

  • (C2) If $t(u)\in\{\square,\triangle\}$ and, thus $\mu(u)=(x,y)\in E(S)$, then $\tau_S(y)>\tau_T(u)>\tau_S(x)$.
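Given candidate time maps, the two conditions can be checked vertex by vertex. In the following sketch, the string event labels and the convention that an edge image is stored as a pair (x, y) with x the parent of y are assumptions of this example; the sketch also assumes that both dictionaries have already been verified to be time maps.

```python
def violations(label, mu, tau_t, tau_s):
    """List the vertices of the gene tree that violate (C1) or (C2)."""
    bad = []
    for u, event in label.items():
        if event in ("speciation", "leaf"):
            # (C1): the vertex shares the time stamp of its image in S.
            if tau_t[u] != tau_s[mu[u]]:
                bad.append((u, "C1"))
        else:
            # (C2): duplication or HGT, mu[u] is an edge (x, y) of S.
            x, y = mu[u]
            if not (tau_s[x] < tau_t[u] < tau_s[y]):
                bad.append((u, "C2"))
    return bad
```

An empty result means that the given triple of reconciliation map and time maps witnesses time-consistency. A non-empty result only rules out these particular time maps, not the existence of others.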

Condition (C1) is used to identify the time-points of speciation vertices and leaves $u$ in the gene tree with the time-points of their respective images $\mu(u)$ in the species trees. In particular, all genes $u$ that reside in the same species must be assigned the same time point $\tau_T(u)=\tau_S(\sigma(u))$. Analogously, all speciation vertices in $T$ that are mapped to the same speciation in $S$ are assigned matching time stamps, i.e., if $t(u)=t(v)=\bullet$ and $\mu(u)=\mu(v)$ then $\tau_T(u)=\tau_T(v)=\tau_S(\mu(u))$.

To understand the intuition behind (C2) consider a duplication or HGT vertex $u$. By construction of $\mu$ it is mapped to an edge of $S$, i.e., $\mu(u)=(x,y)$ in $S$. The time point of $u$ must thus lie between time points of $x$ and $y$. Now suppose $(u,v)\in\mathcal{E}$ is a transfer edge. By construction, $u$ signifies the transfer event itself. The node $v$, however, refers to the next (visible) event in the gene tree. Thus $\tau_T(u)<\tau_T(v)$. In particular, $\tau_T(v)$ must not be misinterpreted as the time of introducing the HGT-duplicate into the new lineage. While this time of course exists (and in our model coincides with the timing of the transfer event) it is not marked by a visible event in the new lineage, and hence there is no corresponding node in the gene tree $T$.

W.l.o.g. we fix the time axis so that $\tau_T(\rho_T)=0$ and $\tau_S(\rho_S)=-1$. Thus, $\tau_S(\rho_S)<\tau_T(\rho_T)<\tau_T(u)$ for all $u\in V(T)\setminus\{\rho_T\}$.

Clearly, a necessary condition to have biologically feasible gene trees is the existence of a reconciliation map μ. However, not all reconciliation maps are time-consistent, see Fig. 2.

Definition 5

An event-labeled gene tree (T;t,σ) is biologically feasible if there exists a time-consistent reconciliation map from (T;t,σ) to some species tree S.

As a main result of this contribution, we provide simple conditions that characterize (the existence of) time-consistent reconciliation maps and thus provide a first step towards the characterization of biologically feasible gene trees.

Theorem 2

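Let $\mu$ be a reconciliation map from $(T;t,\sigma)$ to $S$. There is a time-consistent reconciliation map from $(T;t,\sigma)$ to $S$ if and only if there are two time-maps $\tau_T$ and $\tau_S$ for $T$ and $S$, respectively, such that the following conditions are satisfied for all $x\in V(S)$:

  • (D1) If $\mu(u)=x$, for some $u\in V(T)$, then $\tau_T(u)=\tau_S(x)$.

  • (D2) If $x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for some $u\in V(T)$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$, then $\tau_S(x)>\tau_T(u)$.

  • (D3) If $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S x$ for some $(u,v)\in\mathcal{E}$, then $\tau_T(u)>\tau_S(x)$.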
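Conditions (D1)–(D3) can be checked mechanically once candidate time maps are given. The following is a minimal sketch, not the authors' implementation. It assumes hypothetical dictionary encodings: `parent_S` maps each species-tree vertex to its parent (root to `None`), `label` holds the event type of each gene-tree vertex, `ell[u]` stores $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$, and `mu` is only consulted for leaves and speciations. It does not verify that `tau_T` and `tau_S` are themselves time maps.

```python
def ancestors_or_self(x, parent_S):
    """Yield x and all of its ancestors in S (parent_S: child -> parent, root -> None)."""
    while x is not None:
        yield x
        x = parent_S[x]

def satisfies_d(label, mu, transfer_edges, ell, parent_S, tau_T, tau_S):
    """Return True if (D1)-(D3) hold for the given (hypothetical) encodings."""
    for u, lab in label.items():
        # (D1): vertices mapped to a species-tree vertex share its time stamp.
        if lab in ("leaf", "speciation") and tau_T[u] != tau_S[mu[u]]:
            return False
        # (D2): every x at or below ell[u] must be later than a duplication/transfer u.
        if lab in ("duplication", "transfer"):
            below = [x for x in parent_S if ell[u] in ancestors_or_self(x, parent_S)]
            if any(tau_S[x] <= tau_T[u] for x in below):
                return False
    for u, v in transfer_edges:
        # lca_S of the union equals the lca of the two individual lca vertices.
        anc_u = set(ancestors_or_self(ell[u], parent_S))
        top = next(x for x in ancestors_or_self(ell[v], parent_S) if x in anc_u)
        # (D3): u must be later than every x at or above that lca.
        if any(tau_T[u] <= tau_S[x] for x in ancestors_or_self(top, parent_S)):
            return False
    return True

# Illustrative data (all names are made up for this example).
parent_S = {"rho": None, "r": "rho", "x": "r", "y": "r",
            "A": "x", "B": "x", "C": "y", "D": "y"}
leaf_species = {"a": "A", "a2": "A", "b": "B", "c": "C", "d": "D"}
label = {"u": "transfer", "v": "speciation", "w": "transfer", "z": "speciation"}
label.update({g: "leaf" for g in leaf_species})
mu = {"u": ("r", "x"), "v": "y", "w": ("y", "C"), "z": "x", **leaf_species}
ell = {"u": "A", "v": "y", "w": "C", "z": "x", **leaf_species}
transfer_edges = [("u", "v"), ("w", "z")]
tau_S = {"rho": -1, "r": 1, "y": 3, "x": 5, "A": 7, "B": 7, "C": 7, "D": 7}
tau_T = {"u": 2, "v": 3, "w": 4, "z": 5, **{g: 7 for g in leaf_species}}

ok = satisfies_d(label, mu, transfer_edges, ell, parent_S, tau_T, tau_S)
```

The quadratic scan over `parent_S` in the (D2) check is chosen for readability only.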

Proof

In what follows, x and u denote vertices in S and T, respectively.

Assume that there is a time-consistent reconciliation map μ from (T;t,σ) to S, and thus two time-maps τS and τT for S and T, respectively, that satisfy (C1) and (C2).

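To see (D1), observe that if $\mu(u)=x\in V(S)$, then (M1) and (M2) imply that $t(u)\in\{\mathfrak{s},\odot\}$. Now apply (C1).

To show (D2), assume that $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ and $x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. By Condition (M2) it holds that $\mu(u)=(y,z)\in E(S)$. Together with Lemma 2 we obtain that $x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S z\prec_S\mu(u)$. By the properties of $\tau_S$ we have

$$\tau_S(x)\ge\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))\ge\tau_S(z)\overset{(C2)}{>}\tau_T(u).$$

To see (D3), assume that $(u,v)\in\mathcal{E}$ and $z:=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S x$. Since $t(u)=\mathfrak{t}$ and by (M2ii), we have $\mu(u)=(y,y')\in E(S)$. Thus, $\mu(u)\prec_S y$. By (M2iii) $\mu(u)$ and $\mu(v)$ are incomparable and therefore, we have either $\mu(v)\prec_S y$ or $\mu(v)$ and $y$ are incomparable. In either case we see that $y\preceq_S z$, since Lemma 2 implies that $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S\mu(u)$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S\mu(v)$. In summary, $\mu(u)\prec_S y\preceq_S z\preceq_S x$. Therefore,

$$\tau_T(u)\overset{(C2)}{>}\tau_S(y)\ge\tau_S(z)\ge\tau_S(x).$$

Hence, conditions (D1)–(D3) are satisfied.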

To prove the converse, assume that there exists a reconciliation map $\mu$ that satisfies (D1)–(D3) for some time-maps $\tau_T$ and $\tau_S$. In the following we will make use of $\tau_S$ and $\tau_T$ to construct a time-consistent reconciliation map $\mu'$.

First we define “anchor points” by $\mu'(v)=\mu(v)$ for all $v\in V(T)$ with $t(v)\in\{\mathfrak{s},\odot\}$. Condition (D1) implies $\tau_T(v)=\tau_S(\mu(v))$ for these vertices, and therefore $\mu'$ satisfies (C1).

The next step will be to show that for each vertex $u\in V(T)$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ there is a unique edge $(x,y)$ along the path from $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ to $\rho_S$ with $\tau_S(x)<\tau_T(u)<\tau_S(y)$. We set $\mu'(u)=(x,y)$ for these points. In the final step we will show that $\mu'$ is a valid reconciliation map.

Consider the unique path $P_u$ from $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ to $\rho_S$. By construction, $\tau_S(\rho_S)<\tau_T(\rho_T)\le\tau_T(u)$ and by Condition (D2) we have $\tau_T(u)<\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))$. Since $\tau_S$ is a time map for $S$, every edge $(x,y)\in E(S)$ satisfies $\tau_S(x)<\tau_S(y)$. Therefore, there is a unique edge $(x_u,y_u)\in E(S)$ along $P_u$ such that either $\tau_S(x_u)<\tau_T(u)<\tau_S(y_u)$, $\tau_S(x_u)=\tau_T(u)<\tau_S(y_u)$, or $\tau_S(x_u)<\tau_T(u)=\tau_S(y_u)$. The addition of a sufficiently small perturbation $\epsilon_u$ to $\tau_T(u)$ does not violate the conditions for $\tau_T$ being a time-map for $T$. Clearly $\epsilon_u$ can be chosen to break the equalities in the latter two cases in such a way that $\tau_S(x_u)<\tau_T(u)<\tau_S(y_u)$ for each vertex $u\in V(T)$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$. We then continue with the perturbed version of $\tau_T$ and set $\mu'(u)=(x_u,y_u)$. By construction, $\mu'$ satisfies (C2).

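It remains to show that $\mu'$ is a valid reconciliation map from $(T;t,\sigma)$ to $S$. Again, let $P_u$ denote the unique path from $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ to $\rho_S$ for any $u\in V(T)$.

By construction, Conditions (M1), (M2i), (M2ii) are satisfied. To check condition (M2iii), assume $(u,v)\in\mathcal{E}$. The original map $\mu$ is a valid reconciliation map, and thus, Lemma 2 implies that $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S\mu(u)$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S\mu(v)$. Since $\mu(u)$ and $\mu(v)$ are incomparable in $S$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))$ lies on both paths $P_u$ and $P_v$ we have $\mu(u),\mu(v)\prec_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))=:x$. In particular, $x\ne\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ and $x\ne\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$.

Conditions (D1) and (D2) imply that $\tau_S(x)<\tau_T(u)<\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))$ and $\tau_S(x)<\tau_T(v)\le\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)))$. By construction of $\mu'$, the vertex $u$ is mapped to a unique edge $e_u=(x_u,y_u)$ and $v$ is mapped either to $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))\ne x$ or to the unique edge $e_v=(x_v,y_v)$, respectively. In particular, $\mu'(u)$ lies on the path $P'$ from $x$ to $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ and $\mu'(v)$ lies on the path $P''$ from $x$ to $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$. The paths $P'$ and $P''$ are edge-disjoint and have $x$ as their only common vertex. Hence, $\mu'(u)$ and $\mu'(v)$ are incomparable in $S$, and (M2iii) is satisfied.

In order to show (M3), assume that $u\prec_{T_{\overline{\mathcal{E}}}}v$. Since $u\prec_{T_{\overline{\mathcal{E}}}}v$, we have $\sigma_{T_{\overline{\mathcal{E}}}}(u)\subseteq\sigma_{T_{\overline{\mathcal{E}}}}(v)$. Hence, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S\rho_S$. In other words, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ lies on the path $P_u$ and thus, $P_v$ is a subpath of $P_u$. By construction of $\mu'$, both $\mu'(u)$ and $\mu'(v)$ are comparable in $S$. Moreover, since $\tau_T(u)>\tau_T(v)$ and by construction of $\mu'$, it immediately follows that $\mu'(u)\preceq_S\mu'(v)$.

It is now an easy task to verify that (M3) is fulfilled by considering the distinct event-labels in (M3i) and (M3ii), which we leave to the reader.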
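The edge-placement step of this construction can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: `parent_S` (child to parent, root to `None`) and `tau_S` are hypothetical dictionary encodings, `start` plays the role of $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$, and the time stamps are assumed to be already perturbed so that no equalities occur.

```python
def place_on_edge(u_time, start, parent_S, tau_S):
    """Walk from `start` towards the root of S and return the edge (x, y)
    with tau_S[x] < u_time < tau_S[y].

    Assumes tau_S increases strictly from parent to child and that u_time
    differs from every species time stamp on the path.
    """
    y = start
    while parent_S[y] is not None:
        x = parent_S[y]
        if tau_S[x] < u_time < tau_S[y]:
            return (x, y)
        y = x
    raise ValueError("no edge on the path to the root contains u_time")

# Illustrative species-tree path rho -> r -> x -> A (names are made up).
parent_S = {"rho": None, "r": "rho", "x": "r", "A": "x"}
tau_S = {"rho": -1, "r": 1, "x": 5, "A": 7}

edge_for_u = place_on_edge(2, "A", parent_S, tau_S)
```

With the values above, a vertex with time stamp 2 whose lowest admissible position is the leaf `A` is moved up to the edge between `r` and `x`.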

Interestingly, the existence of a time-consistent reconciliation map from a gene tree T to a species tree S can be characterized in terms of a time map defined on T, only.

Theorem 3

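Let $\mu$ be a reconciliation map from $(T;t,\sigma)$ to $S$. There is a time-consistent reconciliation map from $(T;t,\sigma)$ to $S$ if and only if there is a time map $\tau_T$ such that for all $u,v,w\in V(T)$:

  • (T1) If $t(u)=t(v)\in\{\mathfrak{s},\odot\}$ then
    • (a) If $\mu(u)=\mu(v)$, then $\tau_T(u)=\tau_T(v)$.
    • (b) If $\mu(u)\prec_S\mu(v)$, then $\tau_T(u)>\tau_T(v)$.
  • (T2) If $t(u)\in\{\mathfrak{s},\odot\}$, $t(v)\in\{\mathfrak{d},\mathfrak{t}\}$ and $\mu(u)\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$, then $\tau_T(u)>\tau_T(v)$.

  • (T3) If $(u,v)\in\mathcal{E}$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$ for some $w\in V(T)$, then $\tau_T(u)>\tau_T(w)$.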

Proof

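Suppose that $\mu$ is a time-consistent reconciliation map from $(T;t,\sigma)$ to $S$. By Definition 4 and Theorem 2, there are two time maps $\tau_T$ and $\tau_S$ that satisfy (D1)–(D3). We first show that $\tau_T$ also satisfies (T1)–(T3), for all $u,v\in V(T)$. Condition (T1a) is trivially implied by (D1). Let $t(u),t(v)\in\{\mathfrak{s},\odot\}$, and $\mu(u)\prec_S\mu(v)$. Since $\tau_T$ and $\tau_S$ are time maps, we may conclude that

$$\tau_T(u)\overset{(D1)}{=}\tau_S(\mu(u))>\tau_S(\mu(v))\overset{(D1)}{=}\tau_T(v).$$

Hence, (T1b) is satisfied.

Now, assume that $t(u)\in\{\mathfrak{s},\odot\}$, $t(v)\in\{\mathfrak{d},\mathfrak{t}\}$ and $\mu(u)\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$. By the properties of $\tau_S$, we have:

$$\tau_T(u)\overset{(D1)}{=}\tau_S(\mu(u))\overset{(D2)}{>}\tau_T(v).$$

Hence, (T2) is fulfilled.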

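Finally, assume that $(u,v)\in\mathcal{E}$, and $x:=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$ for some $w\in V(T)$. Lemma 2 implies that $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))\preceq_S\mu(w)$. If $t(w)\in\{\mathfrak{s},\odot\}$, then $\mu(w)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$ by (M1) and (M2i), and (D1) gives $\tau_T(w)=\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w)))$; otherwise (D2) gives $\tau_T(w)<\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w)))$. Since $\tau_S$ is a time map, we obtain

$$\tau_T(w)\le\tau_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w)))\le\tau_S(x)\overset{(D3)}{<}\tau_T(u).$$

Hence, (T3) is fulfilled.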

To see the converse, assume that there exists a reconciliation map $\mu$ that satisfies (T1)–(T3) for some time map $\tau_T$. In the following we construct a time map $\tau_S$ for $S$ that satisfies (D1)–(D3). To this end, we first set

$$\tau_S(x)=\begin{cases}-1 & \text{if } x=\rho_S\\ \tau_T(v) & \text{else if } v\in\mu^{-1}(x)\\ \ast & \text{else, i.e., } \mu^{-1}(x)=\emptyset \text{ and } x\ne\rho_S.\end{cases}$$

We use the symbol $\ast$ to denote the fact that so far no value has been assigned to $\tau_S(x)$. Note, by (M2i) and (T1a) the value $\tau_S(x)$ is uniquely determined and thus, by construction, (D1) is satisfied. Moreover, if $x,y\in V(S)$ have non-empty preimages w.r.t. $\mu$ and $x\prec_S y$, then we can use the fact that $\tau_T$ is a time map for $T$ together with condition (T1) to conclude that $\tau_S(x)>\tau_S(y)$.

If $x\in V(S)$ with $a\in\mu^{-1}(x)$, then (T2) implies (D2) [by (D1) and setting $u=a$ in (T2)] and (T3) implies (D3) [by (D1) and setting $w=a$ in (T3)]. Thus, (D2) and (D3) are satisfied for all $x\in V(S)$ with $\mu^{-1}(x)\ne\emptyset$.

Using our choices $\tau_T(\rho_T)=0$ and $\tau_S(\rho_S)=-1$ for the augmented root of $S$, we must have $\mu^{-1}(\rho_S)=\emptyset$. Thus, $\rho_S\succ_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ for any $v\in V(T)$. Hence, (D2) is trivially satisfied for $\rho_S$. Moreover, $\tau_T(\rho_T)=0$ implies $\tau_T(u)>\tau_S(\rho_S)$ for any $u\in V(T)$. Hence, (D3) is always satisfied for $\rho_S$.

In summary, Conditions (D1)–(D3) are met for any vertex $x\in V(S)$ that up to this point has been assigned a value, i.e., $\tau_S(x)\ne\ast$.

We will now assign to all vertices $x\in V(S)$ with $\mu^{-1}(x)=\emptyset$ a value $\tau_S(x)$ in a stepwise manner. To this end, we give upper and lower bounds for the possible values that can be assigned to $\tau_S(x)$. Let $x\in V(S)$ with $\tau_S(x)=\ast$. Set

$$\begin{aligned}LO(x)&=\{\tau_S(y)\mid x\prec_S y,\ y\in V(S)\text{ and }\tau_S(y)\ne\ast\}\\ UP(x)&=\{\tau_S(y)\mid x\succ_S y,\ y\in V(S)\text{ and }\tau_S(y)\ne\ast\}.\end{aligned}$$

We note that $LO(x)\ne\emptyset$ and $UP(x)\ne\emptyset$ because the root and the leaves of $S$ already have been assigned a value $\tau_S$ in the initial step. In order to construct a valid time map $\tau_S$ we must ensure $\max(LO(x))<\tau_S(x)<\min(UP(x))$.

Moreover, we strengthen the bounds as follows. Put

$$\begin{aligned}lo(x)&=\{\tau_T(u)\mid t(u)\in\{\mathfrak{d},\mathfrak{t}\},\ x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\}\\ up(x)&=\{\tau_T(u)\mid (u,v)\in\mathcal{E}\text{ and }\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S x\}.\end{aligned}$$

Observe that $\max(lo(x))<\min(up(x))$, since otherwise there are vertices $u,w\in V(T)$ with $\tau_T(w)\in lo(x)$ and $\tau_T(u)\in up(x)$ and $\tau_T(w)\ge\tau_T(u)$. However, this implies that $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$; a contradiction to (T3).

Since (D2) is satisfied for all vertices $y$ that obtained a value $\tau_S(y)\ne\ast$, we have $\max(lo(x))<\min(UP(x))$. Likewise because of (D3), it holds that $\max(LO(x))<\min(up(x))$. Thus we set $\tau_S(x)$ to an arbitrary value such that

$$\max(LO(x)\cup lo(x))<\tau_S(x)<\min(UP(x)\cup up(x)).$$

By construction, (D1), (D2), and (D3) are satisfied for all vertices in $V(S)$ that have already obtained a time value distinct from $\ast$. Moreover, for all such vertices with $x\prec_S y$ we have $\tau_S(x)>\tau_S(y)$. In each step we choose a vertex $x$ with $\tau_S(x)=\ast$, which then obtains a real-valued time stamp. Hence, in each step the number of vertices that have value $\ast$ is reduced by one. Therefore, repeating the latter procedure will eventually assign to all vertices a real-valued time stamp such that, in particular, $\tau_S$ satisfies (D1), (D2), and (D3) and thus is indeed a time map for $S$.

From the algorithmic point of view it is desirable to design methods that allow us to check whether a reconciliation map is time-consistent. Moreover, given a gene tree $T$ and species tree $S$ we wish to decide whether there exists a time-consistent reconciliation map $\mu$, and if so, we should be able to construct $\mu$.

To this end, observe that any constraints given by Definition 3, Theorem 2 (D2)–(D3), and Definition 4 (C2) can be expressed as a total order on $V(S)\cup V(T)$, while the constraints (C1) and (D1) together suggest that we can treat the preimage of any vertex in the species tree as a “single vertex”. In fact we can create an auxiliary graph in order to answer questions that are concerned with time-consistent reconciliation maps.

Definition 6

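Let $\mu$ be a reconciliation map from $(T;t,\sigma)$ to $S$. The auxiliary graph $A$ is defined as a directed graph with a vertex set $V(A)=V(S)\cup V(T)$ and an edge-set $E(A)$ that is constructed as follows:

  • (A1) For each $(u,v)\in E(T)$ we have $(u',v')\in E(A)$, where
    $u'=\mu(u)$ if $t(u)\in\{\odot,\mathfrak{s}\}$ and $u'=u$ otherwise,
    and
    $v'=\mu(v)$ if $t(v)\in\{\odot,\mathfrak{s}\}$ and $v'=v$ otherwise.
  • (A2) For each $(x,y)\in E(S)$ we have $(x,y)\in E(A)$.

  • (A3) For each $u\in V(T)$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ we have $(u,\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))\in E(A)$.

  • (A4) For each $(u,v)\in\mathcal{E}$ we have $(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v)),u)\in E(A)$.

  • (A5) For each $u\in V(T)$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ and $\mu(u)=(x,y)\in E(S)$ we have $(x,u)\in E(A)$ and $(u,y)\in E(A)$.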

We define A1 and A2 as the subgraphs of A that contain only the edges defined by (A1), (A2), (A5) and (A1), (A2), (A3), (A4), respectively.

We note that the edge sets defined by conditions (A1) through (A5) are not necessarily disjoint. The mapping of vertices in T to edges in S is considered only in condition (A5). The following two theorems are the key results of this contribution.

Theorem 4

Let μ be a reconciliation map from (T;t,σ) to S. The map μ is time-consistent if and only if the auxiliary graph A1 is a directed acyclic graph (DAG).

Proof

Assume that $\mu$ is time-consistent. By Definition 4, there are two time-maps $\tau_T$ and $\tau_S$ satisfying (C1) and (C2). Let $\tau=\tau_T\cup\tau_S$ be the map from $V(T)\cup V(S)\to\mathbb{R}$. Let $A'$ be the directed graph with $V(A')=V(S)\cup V(T)$ and set for all $x,y\in V(A')$: $(x,y)\in E(A')$ if and only if $\tau(x)<\tau(y)$. By construction $A'$ is a DAG since $\tau$ provides a topological order on $A'$ [25].

We continue to show that $A'$ contains all edges of $A_1$.

To see that (A1) is satisfied for $E(A')$ let $(u,v)\in E(T)$. Note, $\tau(v)>\tau(u)$, since $\tau_T$ is a time map for $T$ and by construction of $\tau$. Hence, all edges $(u,v)\in E(T)$ are also contained in $A'$, independent of the respective event-labels $t(u)$, $t(v)$. Moreover, if $u$ or $v$ is a speciation vertex or a leaf, then (C1) implies that $\tau_S(\mu(u))=\tau_T(u)<\tau_T(v)$ or $\tau_T(u)<\tau_T(v)=\tau_S(\mu(v))$. By construction of $\tau$, all edges satisfying (A1) are contained in $E(A')$. Since $\tau_S$ is a time map for $S$, all edges as in (A2) are contained in $E(A')$. Finally, (C2) implies that all edges satisfying (A5) are contained in $E(A')$.

Although $A'$ might have more edges than required by (A1), (A2) and (A5), the graph $A_1$ is a subgraph of $A'$. Since $A'$ is a DAG, also $A_1$ is a DAG.

For the converse assume that $A_1$ is a directed graph with $V(A_1)=V(S)\cup V(T)$ and edge set $E(A_1)$ as constructed in Definition 6 (A1), (A2) and (A5). Moreover, assume that $A_1$ is a DAG. Hence, there is a topological order $\tau$ on $A_1$ with $\tau(x)<\tau(y)$ whenever $(x,y)\in E(A_1)$. In what follows we construct the time-maps $\tau_T$ and $\tau_S$ such that they satisfy (C1) and (C2). Set $\tau_S(x)=\tau(x)$ for all $x\in V(S)$. Additionally, set for all $u\in V(T)$:

$$\tau_T(u)=\begin{cases}\tau(\mu(u)) & \text{if } t(u)\in\{\odot,\mathfrak{s}\}\\ \tau(u) & \text{otherwise.}\end{cases}$$

By construction it follows that (C1) is satisfied. Due to (A2), $\tau_S$ is a valid time map for $S$. It follows from the construction and (A1) that $\tau_T$ is a valid time map for $T$. Assume now that $u\in V(T)$, $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$, and $\mu(u)=(x,y)\in E(S)$. Since $\tau$ provides a topological order we have:

$$\tau(x)\overset{(A5)}{<}\tau(u)\overset{(A5)}{<}\tau(y).$$

By construction, it follows that $\tau_S(x)<\tau_T(u)<\tau_S(y)$ satisfying (C2).
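The proof of Theorem 4 is constructive and can be sketched as follows. This is a minimal illustration, not the authors' C++ code: the function names and the dictionary encodings (`label`, `mu`, edge lists) are assumptions made for the example, gene-tree and species-tree vertex names are assumed to be disjoint, and `mu` is assumed to be a valid reconciliation map. Python's standard `graphlib` module supplies the topological order; the resulting time stamps are plain ranks and are not normalized to $\tau_T(\rho_T)=0$ and $\tau_S(\rho_S)=-1$.

```python
from graphlib import TopologicalSorter, CycleError

ANCHORED = ("leaf", "speciation")  # event types mapped to vertices of S

def build_a1(gene_edges, species_edges, label, mu):
    """Edge set of A1 following (A1), (A2) and (A5) of Definition 6."""
    rep = lambda u: mu[u] if label[u] in ANCHORED else u
    edges = {(rep(u), rep(v)) for u, v in gene_edges}   # (A1)
    edges |= set(species_edges)                          # (A2)
    for u, lab in label.items():                         # (A5)
        if lab not in ANCHORED:
            x, y = mu[u]
            edges |= {(x, u), (u, y)}
    return edges

def time_maps_from_a1(edges, label, mu):
    """Return (tau_T, tau_S) from a topological order, or None if A1 has a cycle."""
    preds = {}
    for a, b in edges:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None
    tau = {x: i for i, x in enumerate(order)}
    tau_T = {u: tau[mu[u]] if lab in ANCHORED else tau[u] for u, lab in label.items()}
    tau_S = {x: t for x, t in tau.items() if x not in label}
    return tau_T, tau_S

# Illustrative data (all names are made up for this example).
species_edges = [("rho", "r"), ("r", "x"), ("r", "y"),
                 ("x", "A"), ("x", "B"), ("y", "C"), ("y", "D")]
gene_edges = [("u", "a"), ("u", "v"), ("v", "w"), ("v", "d"),
              ("w", "c"), ("w", "z"), ("z", "a2"), ("z", "b")]
leaf_species = {"a": "A", "a2": "A", "b": "B", "c": "C", "d": "D"}
label = {"u": "transfer", "v": "speciation", "w": "transfer", "z": "speciation"}
label.update({g: "leaf" for g in leaf_species})
mu_low = {"u": ("x", "A"), "v": "y", "w": ("y", "C"), "z": "x", **leaf_species}
mu_high = dict(mu_low, u=("r", "x"))  # same map, but u placed one edge higher

result_low = time_maps_from_a1(build_a1(gene_edges, species_edges, label, mu_low), label, mu_low)
result_high = time_maps_from_a1(build_a1(gene_edges, species_edges, label, mu_high), label, mu_high)
```

In this example `mu_low` yields a cycle through `x`, `u`, `y` and `w`, so `result_low` is `None`, whereas `mu_high` admits a topological order and hence time maps.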

Theorem 5

Assume there is a reconciliation map μ from (T;t,σ) to S. There is a time-consistent reconciliation map, possibly different from μ, from (T;t,σ) to S if and only if the auxiliary graph A2 (defined on μ) is a DAG.

Proof

Let $\mu$ be a reconciliation map for $(T;t,\sigma)$ and $S$ and $\mu'$ be a time-consistent reconciliation map for $(T;t,\sigma)$ and $S$. Let $A_2$ and $A_2'$ be the auxiliary graphs that satisfy Definition 6 (A1)–(A4) for $\mu$ and $\mu'$, respectively. Since $\mu(u)=\mu'(u)$ for all $u\in V(T)$ with $t(u)\in\{\mathfrak{s},\odot\}$ and (A2)–(A4) do not rely on the explicit reconciliation map, it is easy to see that $A_2=A_2'$.

Now we can re-use similar arguments as in the proof of Theorem 4. Assume there is a time-consistent reconciliation map from $(T;t,\sigma)$ to $S$. By Theorem 2, there are two time-maps $\tau_T$ and $\tau_S$ satisfying (D1)–(D3). Let $\tau$ and $A'$ be defined as in the proof of Theorem 4.

Analogously to the proof of Theorem 4, we show that $A'$ contains all edges of $A_2$. Application of (D1) immediately implies that all edges satisfying (A1) and (A2) are contained in $E(A')$. Condition (D2) yields $(u,\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))\in E(A')$ and (D3) implies $(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v)),u)\in E(A')$. We conclude by the same arguments as before that the graph $A_2$ is a DAG.

For the converse, assume we are given the directed acyclic graph $A_2$. As before, there is a topological order $\tau$ on $A_2$ with $\tau(x)<\tau(y)$ whenever $(x,y)\in E(A_2)$. The time-maps $\tau_T$ and $\tau_S$ are given as in the proof of Theorem 4.

By construction, it follows that (D1) is satisfied. Again, by construction and the Properties (A1) and (A2), $\tau_S$ and $\tau_T$ are valid time-maps for $S$ and $T$, respectively.

Assume now that $u\in V(T)$, $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$, and $x\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for some $x\in V(S)$. Since there is a topological order on $V(A_2)$, we have

$$\tau(x)\overset{(A2)}{\ge}\tau(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)))\overset{(A3)}{>}\tau(u).$$

By construction, it follows that $\tau_S(x)>\tau_T(u)$. Thus, (D2) is satisfied.

Finally assume that $(u,v)\in\mathcal{E}$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))\preceq_S x$ for some $x\in V(S)$. Again, since $\tau$ provides a topological order, we have:

$$\tau(x)\overset{(A2)}{\le}\tau(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v)))\overset{(A4)}{<}\tau(u).$$

By construction, it follows that $\tau_S(x)<\tau_T(u)$, satisfying (D3).

Thus $\tau_T$ and $\tau_S$ are valid time maps satisfying (D1)–(D3).
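The existence test of Theorem 5 only needs the graph $A_2$, which does not depend on where duplications and transfers are placed. The sketch below is an illustration under assumptions rather than the published code: `ell[u]` is a hypothetical precomputed dictionary holding $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$, `parent_S` maps child to parent (root to `None`), vertex names of the two trees are disjoint, and the lca routine is a naive ancestor walk rather than a logarithmic-time structure.

```python
from graphlib import TopologicalSorter, CycleError

def lca(x, y, parent_S):
    """Naive lowest common ancestor in S via ancestor sets."""
    anc = set()
    while x is not None:
        anc.add(x)
        x = parent_S[x]
    while y not in anc:
        y = parent_S[y]
    return y

def build_a2(gene_edges, transfer_edges, label, ell, parent_S):
    """Edge set of A2 following (A1)-(A4); leaves and speciations are mapped to ell[u]."""
    anchored = ("leaf", "speciation")
    rep = lambda u: ell[u] if label[u] in anchored else u
    edges = {(rep(u), rep(v)) for u, v in gene_edges}                          # (A1)
    edges |= {(p, c) for c, p in parent_S.items() if p is not None}            # (A2)
    edges |= {(u, ell[u]) for u, lab in label.items() if lab not in anchored}  # (A3)
    edges |= {(lca(ell[u], ell[v], parent_S), u) for u, v in transfer_edges}   # (A4)
    return edges

def is_dag(edges):
    """True if the directed graph given by `edges` has no cycle."""
    preds = {}
    for a, b in edges:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)
    try:
        TopologicalSorter(preds).prepare()
        return True
    except CycleError:
        return False

# Illustrative data (all names are made up for this example).
parent_S = {"rho": None, "r": "rho", "x": "r", "y": "r",
            "A": "x", "B": "x", "C": "y", "D": "y"}
gene_edges = [("u", "a"), ("u", "v"), ("v", "w"), ("v", "d"),
              ("w", "c"), ("w", "z"), ("z", "a2"), ("z", "b")]
transfer_edges = [("u", "v"), ("w", "z")]
leaf_species = {"a": "A", "a2": "A", "b": "B", "c": "C", "d": "D"}
label = {"u": "transfer", "v": "speciation", "w": "transfer", "z": "speciation"}
label.update({g: "leaf" for g in leaf_species})
ell = {"u": "A", "v": "y", "w": "C", "z": "x", **leaf_species}

exists = is_dag(build_a2(gene_edges, transfer_edges, label, ell, parent_S))
```

For this input `exists` is `True`: a time-consistent map exists as long as the transfer vertex `u` is placed above `x`.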

[Algorithm listings appear as images in the original (13015_2018_121_Figa_HTML.jpg and 13015_2018_121_Figb_HTML.jpg) and are not reproduced here.]

Naturally, Theorems 4 or 5 can be used to devise algorithms for deciding time-consistency. To this end, the efficient computation of $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for all $u\in V(T)$ is necessary. This can be achieved with Algorithm 2 in $O(|V(T)|\log(|V(S)|))$ time. More precisely, we have the following statement:

Lemma 3

For a given gene tree $(T=(V,E);t,\sigma)$ and a species tree $S=(W,F)$, Algorithm 2 correctly computes $\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for all $u\in V(T)$ in $O(|V|\log(|W|))$ time.

Proof

Let $u\in V(T)$. In what follows, we show that $\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. In fact, the algorithm is (almost) a depth first search through $T$ that assigns the (species tree) vertex $\ell(u)$ to $u$ if and only if every child $v$ of $u$ has obtained an assignment $\ell(v)$ (cf. Line (9)–(10)). That there are children $v$ with non-empty $\ell(v)$ at some point is ensured by Line (7). That is, if $t(u)=\odot$, then $\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))=\sigma(u)$. Now, assume there is an interior vertex $u\in V(T)$, where every child $v$ has been assigned a value $\ell(v)$, then

$$\begin{aligned}\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))&=\operatorname{lca}_S\Big(\bigcup\{\sigma_{T_{\overline{\mathcal{E}}}}(v)\mid (u,v)\in E(T)\text{ and }t(u,v)=0\}\Big)\\&=\operatorname{lca}_S(\{\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))\mid (u,v)\in E(T)\text{ and }t(u,v)=0\})\\&=\operatorname{lca}_S(\{\ell(v)\mid (u,v)\in E(T)\text{ and }t(u,v)=0\}).\end{aligned}$$

The latter is achieved by Line (10).

Since $T$ is a tree and the algorithm is in effect a depth first search through $T$, the while loop runs at most $O(|V(T)|+|E(T)|)$ times, and thus in $O(|V(T)|)$ time.

The only non-constant operation within the while loop is the computation of $\operatorname{lca}_S$ in Line (10). Clearly $\operatorname{lca}_S$ of a set of vertices $C=\{c_1,c_2,\ldots,c_k\}$, where $c_i\in V(S)$ for all $c_i\in C$, can be computed as a sequence of $\operatorname{lca}_S$ operations taking two vertices: $\operatorname{lca}_S(c_1,\operatorname{lca}_S(c_2,\ldots\operatorname{lca}_S(c_{k-1},c_k)))$, each taking $O(\lg(|V(S)|))$ time. Note however, that since Line (10) is called exactly once for each vertex in $T$, at most $|E(T)|$ two-vertex $\operatorname{lca}_S$ operations are performed throughout the entire algorithm. Hence, the total time complexity is $O(|V(T)|\lg(|V(S)|))$.
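The bottom-up computation described in this proof can be sketched as follows. Algorithm 2 itself is not reproduced in this text, so this is an independent illustration under assumptions: the dictionary encodings (`children`, `label`, `sigma`, `parent_S`) and function names are hypothetical, and the pairwise lca is a naive ancestor walk, so the sketch does not attain the stated $O(|V|\log(|W|))$ bound.

```python
def lca(x, y, parent_S):
    """Naive lowest common ancestor in S (parent_S: child -> parent, root -> None)."""
    anc = set()
    while x is not None:
        anc.add(x)
        x = parent_S[x]
    while y not in anc:
        y = parent_S[y]
    return y

def lca_sigma(root, children, transfer_edges, label, sigma, parent_S):
    """Map every gene-tree vertex u to the lca in S of the species reachable
    from u without crossing a transfer edge (iterative post-order traversal)."""
    ell = {}
    stack = [(root, False)]
    while stack:
        u, children_done = stack.pop()
        if label[u] == "leaf":
            ell[u] = sigma[u]
        elif not children_done:
            stack.append((u, True))  # revisit u after all children
            stack.extend((v, False) for v in children[u])
        else:
            # Only children reached via non-transfer edges contribute; the
            # observability conditions guarantee at least one such child.
            kept = [ell[v] for v in children[u] if (u, v) not in transfer_edges]
            acc = kept[0]
            for x in kept[1:]:
                acc = lca(acc, x, parent_S)
            ell[u] = acc
    return ell

# Illustrative data (all names are made up for this example).
parent_S = {"rho": None, "r": "rho", "x": "r", "y": "r",
            "A": "x", "B": "x", "C": "y", "D": "y"}
children = {"u": ["a", "v"], "v": ["w", "d"], "w": ["c", "z"], "z": ["a2", "b"]}
sigma = {"a": "A", "a2": "A", "b": "B", "c": "C", "d": "D"}
label = {"u": "transfer", "v": "speciation", "w": "transfer", "z": "speciation"}
label.update({g: "leaf" for g in sigma})
transfer_edges = {("u", "v"), ("w", "z")}

ell = lca_sigma("u", children, transfer_edges, label, sigma, parent_S)
```

Children behind transfer edges are still visited, since they are roots of other components of the forest obtained by removing the transfer edges, but they are excluded when the value of their parent is computed.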

Let S be a species tree for (T;t,σ), that is, there is a valid reconciliation between the two trees. Algorithm 1 describes a method to construct a time-consistent reconciliation map for (T;t,σ) and S, if one exists, else “No time-consistent reconciliation map exists” is returned. First, an arbitrary reconciliation map μ that satisfies the condition of Definition 2 is computed. Second, Theorem 5 is utilized and it is checked whether the auxiliary graph A2 is not a DAG in which case no time-consistent map μ exists for (T;t,σ) and S. Finally, if A2 is a DAG, then we continue to adjust μ to become time-consistent. The latter is based on Theorem 2, see the proof of Theorems 2 and 6 for details.

Theorem 6

Let $S=(W,F)$ be a species tree for the gene tree $(T=(V,E);t,\sigma)$. Algorithm 1 correctly determines whether there is a time-consistent reconciliation map $\mu$ and in the positive case, returns such a $\mu$ in $O(|V|\log(|W|))$ time.

Proof

In order to produce a time-consistent reconciliation map, we first construct some valid reconciliation map μ from (T;t,σ) to S. Using the lca-map from Algorithm 2, μ will be adjusted to become time-consistent, if possible.

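By assumption, there is a reconciliation map from $(T;t,\sigma)$ to $S$. The for-loop (Line (3)–(5)) ensures that each vertex $u\in V$ obtained a value $\mu(u)$. We continue to show that $\mu$ is a valid reconciliation map satisfying (M1)–(M3).

Assume that $t(u)=\odot$, in this case $\ell(u)=\sigma(u)$, and thus (M1) is satisfied. If $t(u)=\mathfrak{s}$, it holds that $\mu(u)=\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$, thus satisfying (M2i). Note that $\rho_S\succ_S\ell(u)$, and hence, $\mu(u)\in F$ by Line (5), implying that (M2ii) is satisfied. Now, assume $t(u)=\mathfrak{t}$ and $(u,v)\in\mathcal{E}$. By assumption, we know there exists a reconciliation map from $T$ to $S$, thus by (Σ1):

$$\sigma_{T_{\overline{\mathcal{E}}}}(u)\cap\sigma_{T_{\overline{\mathcal{E}}}}(v)=\emptyset.$$

It follows that $\ell(u)$ is incomparable to $\ell(v)$, satisfying (M2iii).

Now assume that $u,v\in V$ and $u\prec_{T_{\overline{\mathcal{E}}}}v$. Note that $\sigma_{T_{\overline{\mathcal{E}}}}(u)\subseteq\sigma_{T_{\overline{\mathcal{E}}}}(v)$. It follows that $\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))=\ell(v)$. By construction, (M3) is satisfied. Thus, $\mu$ is a valid reconciliation map.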

By Theorem 5, two time maps $\tau_T$ and $\tau_S$ satisfying (D1)–(D3) exist only if the auxiliary graph $A$ built on Line (7) is a DAG. Thus if $A:=A_2$ contains a cycle, no such time-maps exist and the statement “No time-consistent reconciliation map exists.” is returned (Line (7)). On the other hand, if $A$ is a DAG, the construction in Line (8)–(11) is identical to the construction used in the proof of Theorem 5. Hence correctness of this part of the algorithm follows directly from the proof of Theorem 5.

Finally, we adjust $\mu$ to become a time-consistent reconciliation map. By the latter arguments, $\tau_T$ and $\tau_S$ satisfy (D1)–(D3) w.r.t. $\mu$. Note that $\mu$ is chosen to be the “lowest point” where a vertex $u\in V$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ can be mapped, that is, $\mu(u)$ is set to $(p(x),x)$ where $x=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. However, by the arguments in the proof of Theorem 2, there is a unique edge $(y,z)\in F$ on the path from $x$ to $\rho_S$ such that $\tau_S(y)<\tau_T(u)<\tau_S(z)$. The latter is ensured by choosing a different value for distinct vertices in $V(A)$, see comment in Line (9). Hence, Line (14) ensures that $\mu(u)$ is mapped on the correct edge such that (C2) is satisfied. It follows that the adjusted $\mu$ is a valid time-consistent reconciliation map.

We are now concerned with the time-complexity. By Lemma 3, computation of $\ell$ in Line (1) takes $O(|V|\log(|W|))$ time and the for-loop (Line (3)–(5)) takes $O(|V|)$ time. We continue to show that the auxiliary graph $A$ (Line (6)) can be constructed in $O(|V|\log(|W|))$ time.

Since we know $\ell(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$ for all $u\in V$ and since $T$ and $S$ are trees, the subgraph with edges satisfying (A1)–(A3) can be constructed in $O(|V|+|W|+|E|+|F|)=O(|V|+|W|)$ time. To ensure (A4), we must compute for each transfer edge $(u,v)\in\mathcal{E}$ the vertex $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u)\cup\sigma_{T_{\overline{\mathcal{E}}}}(v))$, which can be done in $O(\log(|W|))$ time. Note, the number of transfer edges is bounded by the number of possible transfer events $O(|V|)$. Hence, generating all edges satisfying (A4) takes $O(|V|\log(|W|))$ time. In summary, computing $A$ can be done in $O(|V|+|W|+|V|\log(|W|))=O(|V|\log(|W|))$ time.

To detect whether $A$ contains cycles one has to determine whether there is a topological order $\tau$ on $V(A)$ which can be done via depth first search in $O(|V(A)|+|E(A)|)$ time. Since $|V(A)|=|V|+|W|$ and $O(|E(A)|)=O(|F|+|E|+|W|+|V|)$ and $S$ and $T$ are trees, the latter task can be done in $O(|V|+|W|)$ time. Clearly, Line (10)–(11) can be performed in $O(|V|+|W|)$ time.

Finally, we have to adjust $\mu$ according to $\tau_T$ and $\tau_S$. Note that for each $u\in V$ with $t(u)\in\{\mathfrak{d},\mathfrak{t}\}$ (Line (12)) we possibly have to adjust $\mu$ to the next edge $(p(x),x)$. However, the number of possible choices of $(p(x),x)$ is bounded by the height of $S$, which is in the worst case $\log(|W|)$. Hence, the for-loop in Line (12) has total time complexity $O(|V|\log(|W|))$.

In summary, the overall time complexity of Algorithm 1 is O(|V|log(|W|)).

So far, we have shown how to find a time consistent reconciliation map $\mu$ given a species tree $S$ and a single gene tree $T$. In practical applications, however, one often considers more than one gene family, and thus, a set of gene trees $\mathcal{F}=\{(T_1;t_1,\sigma_1),\ldots,(T_n;t_n,\sigma_n)\}$ that has to be reconciled with one and the same species tree $S$.

In this case it is possible to aggregate all gene trees $(T_i;t_i,\sigma_i)\in\mathcal{F}$ to a single gene tree $(T;t,\sigma)$ that is constructed from $\mathcal{F}$ by introducing an artificial duplication as the new root of all $T_i$. More precisely, $T=(V,E)$ is constructed from $\mathcal{F}$ such that $V=\{\rho_T\}\cup\bigcup_{i=1}^{n}V(T_i)$ and $E=\bigcup_{i=1}^{n}\big(E(T_i)\cup\{(\rho_T,\rho_{T_i})\}\big)$. Moreover, the event-labeling map $t$ is defined as

$$t(x)=\begin{cases}t_i(x) & \text{if } x\in V(T_i)\cup E(T_i)\\ \mathfrak{d} & \text{if } x=\rho_T\\ 0 & \text{if } x=(\rho_T,\rho_{T_i}).\end{cases}$$

Finally, $\sigma(x)=\sigma_i(x)$ for all leaves $x$ of $T_i$.

Finding a time consistent reconciliation for a species tree $S$ and a set of gene trees $\mathcal{F}$ then corresponds to finding a time map $\tau_S$ for $S$ and a time map $\tau_T$ for the aggregated gene tree $(T;t,\sigma)$, such that (D1)–(D3) are satisfied.

If there exists a time consistent reconciliation map $\mu$ from $(T;t,\sigma)$ to $S$ then, by Theorem 2, there exist two time maps $\tau_T$ and $\tau_S$ that satisfy (D1)–(D3). But then $\tau_T$ and $\tau_S$ also satisfy (D1)–(D3) w.r.t. any $(T_i;t_i,\sigma_i)\in\mathcal{F}$ and therefore, $\mu$ immediately gives a time-consistent reconciliation map for each $(T_i;t_i,\sigma_i)\in\mathcal{F}$.
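The aggregation of several gene trees under an artificial duplication root can be sketched as below. The dictionary layout of a tree (`root`, `edges`, `transfer_edges`, `label`, `sigma`) and the name `rho_T` for the new root are assumptions made for this illustration. Vertices are renamed to pairs `(i, v)` so that identical names in different trees do not collide.

```python
def aggregate(trees, new_root="rho_T"):
    """Join event-labeled gene trees below one artificial duplication root.

    Each tree is a dict with keys "root", "edges", "transfer_edges",
    "label" and "sigma" (a hypothetical encoding used only in this sketch).
    """
    edges, transfer = [], set()
    label, sigma = {new_root: "duplication"}, {}
    for i, tree in enumerate(trees):
        tag = lambda v, i=i: (i, v)  # make vertex names unique per tree
        # The edge from the new root is a non-transfer edge (edge label 0).
        edges.append((new_root, tag(tree["root"])))
        edges.extend((tag(u), tag(v)) for u, v in tree["edges"])
        transfer |= {(tag(u), tag(v)) for u, v in tree["transfer_edges"]}
        label.update({tag(v): lab for v, lab in tree["label"].items()})
        sigma.update({tag(v): s for v, s in tree["sigma"].items()})
    return {"root": new_root, "edges": edges, "transfer_edges": transfer,
            "label": label, "sigma": sigma}

# Two small illustrative gene trees (names are made up).
t1 = {"root": "d", "edges": [("d", "a"), ("d", "b")], "transfer_edges": [],
      "label": {"d": "duplication", "a": "leaf", "b": "leaf"},
      "sigma": {"a": "A", "b": "A"}}
t2 = {"root": "s", "edges": [("s", "a"), ("s", "b")], "transfer_edges": [],
      "label": {"s": "speciation", "a": "leaf", "b": "leaf"},
      "sigma": {"a": "A", "b": "B"}}

merged = aggregate([t1, t2])
```

The merged tree can then be passed to whatever time-consistency test is applied to a single gene tree.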

Outlook and summary

We have characterized here whether a given event-labeled gene tree $(T;t,\sigma)$ and species tree $S$ can be reconciled in a time-consistent manner in terms of two auxiliary graphs $A_1$ and $A_2$ that must be DAGs. These are defined in terms of given reconciliation maps. This condition yields an $O(|V|\log(|W|))$-time algorithm to check whether a given reconciliation map $\mu$ is time-consistent, and an algorithm with the same time complexity for the construction of a time-consistent reconciliation map, provided one exists.

Our results depend on three conditions on the event-labeled gene trees that are motivated by the fact that event-labels can be assigned to internal vertices of gene trees only if there is observable information on the event. The question of which event-labeled gene trees are actually observable given an arbitrary, true evolutionary scenario deserves further investigation in future work. Here we have used conditions that arguably are satisfied when gene trees are inferred using sequence comparison and synteny information. A more formal theory of observability is still missing, however.

Our results point to an efficient way of deciding whether a given pair of gene and species tree can be time-consistently reconciled. Such gene and species trees can be obtained from genomic sequence data using the following workflow: (i) Estimate putative orthologs and HGT events using e.g. one of the methods detailed in [11, 12, 26–38], respectively. Importantly, this step uses only sequence data as input and does not require the construction of either gene or species trees. (ii) Correct these estimates in order to derive “biologically feasible” homology relations as described in [15, 16, 26, 39–44]. The result of this step is a set of (not necessarily fully resolved) gene trees together with event-labels. (iii) Extract “informative triples” from the event-labeled gene tree. These imply necessary conditions for gene trees to be biologically feasible [15, 16].

In general, there will be exponentially many putative species trees. This raises the question whether there is at least one species tree $S$ for a gene tree and if so, how to construct $S$. In the absence of HGT, the answer is known: time-consistent reconciliation maps are fully characterized in terms of “informative triples” [16]. Hence, the central open problem that needs to be addressed in further research is to find sufficient conditions for the existence of a time-consistent species tree given an event-labeled gene tree with HGT.

Proof of Theorem 1

We show that Definition 2 is equivalent to the traditional definition of a DTL-scenario [20, 24] in the special case that both the gene tree and the species tree are binary. To this end we establish a series of lemmas detailing some useful properties of reconciliation maps.

Lemma 4

Let $\mu$ be a reconciliation map from $(T;t,\sigma)$ to $S$ and assume that $T$ is binary. Then the following conditions are satisfied:

  1. If $v,w\in V(T)$ are in the same connected component of $T_{\overline{\mathcal{E}}}$, then $\mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v,w))\succeq_S\operatorname{lca}_S(\mu(v),\mu(w))$.

Let $u$ be an arbitrary interior vertex of $T$ with children $v$, $w$, then:

  2. $\mu(u)$ and $\mu(v)$ are incomparable in $S$ if and only if $(u,v)\in\mathcal{E}$.

  3. If $t(u)=\mathfrak{s}$, then $\mu(v)$ and $\mu(w)$ are incomparable in $S$.

  4. If $\mu(v),\mu(w)$ are comparable or $\mu(u)\succ_S\operatorname{lca}_S(\mu(v),\mu(w))$, then $t(u)=\mathfrak{d}$.

Proof

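We prove Items 1–4 separately. Recall, Lemma 1 implies that $\sigma(L_{T_{\overline{\mathcal{E}}}}(x))\ne\emptyset$ for all $x\in V(T)$.

Proof of Item 1: Let $v$ and $w$ be distinct vertices of $T$ that are in the same connected component of $T_{\overline{\mathcal{E}}}$. Consider the unique path $P$ connecting $w$ with $v$ in $T_{\overline{\mathcal{E}}}$. This path $P$ is uniquely subdivided into a path $P'$ and a path $P''$ from $\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v,w)$ to $v$ and $w$, respectively. Condition (M3) implies that the images of the vertices of $P'$ and $P''$ under $\mu$, resp., are ordered in $S$ with regards to $\preceq_S$ and hence, are contained in the intervals $Q'$ and $Q''$ that connect $\mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v,w))$ with $\mu(v)$ and $\mu(w)$, respectively. In particular, $\mu(\operatorname{lca}_{T_{\overline{\mathcal{E}}}}(v,w))$ is the largest element (w.r.t. $\preceq_S$) in the union $Q'\cup Q''$, which contains the unique path from $\mu(v)$ to $\mu(w)$ and hence also $\operatorname{lca}_S(\mu(v),\mu(w))$.

Proof of Item 2: If $(u,v)\in\mathcal{E}$ then $t(u)=\mathfrak{t}$ and (M2iii) implies that $\mu(u)$ and $\mu(v)$ are incomparable.

To see the converse, let $\mu(u)$ and $\mu(v)$ be incomparable in $S$. Item (M3) implies that for any edge $(x,y)\in E(T_{\overline{\mathcal{E}}})$ we have $\mu(y)\preceq_S\mu(x)$. However, since $\mu(u)$ and $\mu(v)$ are incomparable it must hold that $(u,v)\notin E(T_{\overline{\mathcal{E}}})$. Since $(u,v)$ is an edge in the gene tree $T$, $(u,v)\in\mathcal{E}$ is a transfer edge.

Proof of Item 3: Let $t(u)=\mathfrak{s}$. Since none of $(u,v)$ and $(u,w)$ are transfer-edges, it follows that both edges are contained in $T_{\overline{\mathcal{E}}}$.

Then, since $T$ is a binary tree, it follows that $L_{T_{\overline{\mathcal{E}}}}(u)=L_{T_{\overline{\mathcal{E}}}}(v)\cup L_{T_{\overline{\mathcal{E}}}}(w)$ and therefore, $\sigma_{T_{\overline{\mathcal{E}}}}(u)=\sigma_{T_{\overline{\mathcal{E}}}}(v)\cup\sigma_{T_{\overline{\mathcal{E}}}}(w)$.

Therefore and by Item (M2i),

$$\mu(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)\cup\sigma_{T_{\overline{\mathcal{E}}}}(w))=\operatorname{lca}_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)),\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))).$$

Assume for contradiction that $\mu(v)$ and $\mu(w)$ are comparable, say, $\mu(w)\succeq_S\mu(v)$. By Lemma 2, $\mu(w)\succeq_S\mu(v)\succeq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ and $\mu(w)\succeq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$. Thus,

$$\mu(w)\succeq_S\operatorname{lca}_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)),\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))).$$

Thus, $\mu(w)\succeq_S\mu(u)$; a contradiction to (M3ii).

Proof of Item 4: Let $\mu(v),\mu(w)$ be comparable in $S$. Item 3 implies that $t(u)\ne\mathfrak{s}$. Assume for contradiction that $t(u)=\mathfrak{t}$. Since by (O2) only one of the edges $(u,v)$ and $(u,w)$ is a transfer edge, we have either $(u,v)\in\mathcal{E}$ or $(u,w)\in\mathcal{E}$. W.l.o.g. let $(u,v)\in\mathcal{E}$ and $(u,w)\in E(T_{\overline{\mathcal{E}}})$. By Condition (M3), $\mu(u)\succeq_S\mu(w)$. However, since $\mu(v)$ and $\mu(w)$ are comparable in $S$, also $\mu(u)$ and $\mu(v)$ are comparable in $S$; a contradiction to Item 2. Thus, $t(u)\ne\mathfrak{t}$. Since each interior vertex is labeled with one event, we have $t(u)=\mathfrak{d}$.

Assume now that $\mu(u)\succ_S\operatorname{lca}_S(\mu(v),\mu(w))$. Hence, $\mu(u)$ is comparable to both $\mu(v)$ and $\mu(w)$ and thus, (M2iii) implies that $t(u)\ne\mathfrak{t}$. Lemma 2 implies $\mu(v)\succeq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v))$ and $\mu(w)\succeq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w))$. Hence,

$$\operatorname{lca}_S(\mu(v),\mu(w))\succeq_S\operatorname{lca}_S(\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)),\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(w)))=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)\cup\sigma_{T_{\overline{\mathcal{E}}}}(w)).$$

Since $t(u)\ne\mathfrak{t}$ it follows that neither $(u,v)\in\mathcal{E}$ nor $(u,w)\in\mathcal{E}$ and hence, both edges are contained in $T_{\overline{\mathcal{E}}}$. By the same argumentation as in Item 3 it follows that $\sigma_{T_{\overline{\mathcal{E}}}}(u)=\sigma_{T_{\overline{\mathcal{E}}}}(v)\cup\sigma_{T_{\overline{\mathcal{E}}}}(w)$ and therefore, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(v)\cup\sigma_{T_{\overline{\mathcal{E}}}}(w))=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. Hence, $\mu(u)\succ_S\operatorname{lca}_S(\mu(v),\mu(w))\succeq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. Now, (M2i) implies $t(u)\ne\mathfrak{s}$. Since each interior vertex is labeled with one event, we have $t(u)=\mathfrak{d}$.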

Lemma 5

Let $\mu$ be a reconciliation map for the gene tree $(T;t,\sigma)$ and the species tree $S$ as in Definition 2. Moreover, assume that $T$ and $S$ are binary. Set for all $u\in V(T)$:

$$\gamma(u)=\begin{cases}\mu(u), & \text{if } \mu(u)\in V(S)\\ y, & \text{if } \mu(u)=(x,y)\in E(S)\end{cases}$$

Then $\gamma:V(T)\to V(S)$ is a map according to the DTL-scenario.

Proof

We first emphasize that, by construction, μ(u)Sγ(u) for all uV(T). Moreover, μ(u)=μ(v) implies that γ(u)=γ(v), and γ(u)=γ(v) implies that μ(u) and μ(v) are comparable. Furthermore, μ(u)Sμ(v) implies γ(u)Sγ(v), while γ(u)Sγ(v) implies that μ(u)Sμ(v). Thus, μ(u) and μ(v) are comparable if and only if γ(u) and γ(v) are comparable.

Item (I) and (M1) are equivalent.

For Item (II) let uV(T)\G be an interior vertex with children vw. If (u,w)E, then wTE¯u. Applying Condition (M3) yields μ(w)Sμ(u) and thus, by construction, γ(w)Sγ(u). Therefore, γ(u) is not a proper descendant of γ(w) and γ(w) is a descendant of γ(u). If one of the edges, say (uv), is a transfer edge, then t(u)= and by Condition (M2iii) μ(u) and μ(v) are incomparable. Hence, γ(u) and γ(v) are incomparable. Therefore, γ(u) is no proper descendant of γ(v). Note that (O2) implies that for each vertex uV(T)\G at least one of its outgoing edges must be a non-transfer edge, which implies that γ(w)Sγ(u) or γ(v)Sγ(u) as shown before. Hence, Item (IIa) and (IIb) are satisfied.

For Item (III) assume first that (u,v)E and therefore t(u)=. Then, (M2iii) implies that μ(u) and μ(v) are incomparable and thus, γ(u) and γ(v) are incomparable. Now assume that (uv) is an edge in the gene tree T and γ(u) and γ(v) are incomparable. Therefore, μ(u) and μ(v) are incomparable. Now, apply Lemma 4(2).

Item (IVa) is clear by the event-labeling t of T and since (O2). Now assume for (IVb) that t(u)=. Lemma 4(3) implies that μ(v) and μ(w) are incomparable and thus, γ(v) and γ(w) must be incomparable as well. Furthermore, Condition (M2i) implies that μ(u)=lcaS(σTE¯(u)). Lemma 2 implies that μ(v)SlcaS(σTE¯(v)) and μ(w)SlcaS(σTE¯(w)). The latter together with the incomparability of μ(v) and μ(u) implies that

lcaS(μ(v),μ(w))=lcaS(lcaS(σTE¯(v)),lcaS(σTE¯(w)))=lcaS(σTE¯(v)σTE¯(w))=lcaS(σTE¯(u))=μ(u).

If μ(v) is mapped on the edge (xy) in T, then γ(v)=y. By definition of lca for edges, lcaS(μ(v),γ(w))=lcaS(y,γ(w))=lcaS(γ(v),γ(w)). The same argument applies if μ(w) is mapped on an edge. Since for all zV(T) either μ(z)Sγ(z) (if μ(z) is mapped on an edge) or μ(z)=γ(z), we always have

lcaS(γ(v),γ(w))=lcaS(μ(v),μ(w))=μ(u).

Since t(u)=, (M2i) implies that μ(u)V(S) and therefore, by construction of γ it holds that μ(u)=γ(u). Thus, γ(u)=lcaS(γ(v),γ(w)). For (IVc) assume that t(u)=. Condition (M3) implies that μ(u)Sμ(v),μ(w) and therefore, γ(u)Sγ(v),γ(w). If γ(v) and γ(w) are incomparable, then γ(u)Sγ(v),γ(w) implies that γ(u)SlcaS(γ(v),γ(w)). If γ(v) and γ(w) are comparable, say γ(v)Sγ(w), then γ(u)Sγ(v)=lcaS(γ(v),γ(w)). Hence, Statement (IVc) is satisfied.

Lemma 6

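Let $\gamma:V(T)\to V(S)$ be a map according to the DTL-scenario for the binary gene tree $(T;t,\sigma)$ and the binary species tree $S$. For all $u\in V(T)$ define:

$$\mu(u)=\begin{cases}\gamma(u), & \text{if } t(u)\in\{\odot,\mathfrak{s}\}\\ (x,\gamma(u))\in E(S), & \text{if } t(u)\in\{\mathfrak{d},\mathfrak{t}\}\end{cases}$$

Then $\mu:V(T)\to V(S)\cup E(S)$ is a reconciliation map according to Definition 2.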

Proof

Let $\gamma:V(T)\to V(S)$ be a map according to the DTL-scenario for the binary gene tree $(T;t,\sigma)$ and the binary species tree $S$.

Condition (M1) is equivalent to (I).

For (M3) assume that $v\prec_{T_{\overline{\mathcal{E}}}}w$. The path $P_{wv}$ from $w$ to $v$ in $T_{\overline{\mathcal{E}}}$ does not contain transfer edges. Thus, by (III), the images under $\gamma$ of adjacent vertices along $P_{wv}$ are comparable. Moreover, by (IIa), $\gamma(w)$ is not a proper descendant of the image of its child in $S$, and therefore, by repeating these arguments along the vertices $x$ in $P_{wv}$, we obtain $\gamma(v)\preceq_S\gamma(x)\preceq_S\gamma(w)$.

If $\gamma(v)\prec_S\gamma(w)$, then by construction of $\mu$ it follows that $\mu(v)\prec_S\mu(w)$. Thus, (M3) is satisfied whenever $\gamma(v)\prec_S\gamma(w)$. Assume now that $\gamma(v)=\gamma(w)$. If $t(v),t(w)\in\{\square,\triangle\}$, then $\mu(v)=(x,\gamma(v))=(x,\gamma(w))=\mu(w)$ and thus (M3i) is satisfied. If $t(v)=\bullet$ and $t(w)\neq\bullet$, then $\mu(v)=\gamma(v)$ and $\mu(w)=(x,\gamma(w))$, and thus $\mu(v)\prec_S\mu(w)$.

Now assume that $\gamma(v)=\gamma(w)$ and $w$ is a speciation vertex. Since $t(w)=\bullet$, for its two children $w'$ and $w''$ the images $\gamma(w')$ and $\gamma(w'')$ must be incomparable due to (IVb). W.l.o.g. assume that $w'$ is a vertex of $P_{wv}$. Since $\gamma(v)\preceq_S\gamma(x)\preceq_S\gamma(w)$ for any vertex $x$ along $P_{wv}$ and $\gamma(v)=\gamma(w)$, we obtain $\gamma(w')=\gamma(w)$. However, since $\gamma(w'')\preceq_S\gamma(w)$, the vertices $\gamma(w')$ and $\gamma(w'')$ are comparable in $S$, contradicting (IVb). Thus, whenever $w$ is a speciation vertex, $\gamma(w')=\gamma(w)$ is not possible. Therefore, $\gamma(v)\preceq_S\gamma(w')\prec_S\gamma(w)$ and, by construction of $\mu$, $\mu(v)\prec_S\mu(w)$. Thus, (M3ii) is satisfied.

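Finally, we show that (M2) is satisfied. To this end, observe first that (M2ii) is fulfilled by construction of $\mu$ and that (M2iii) is an immediate consequence of (III). Thus, it remains to show that (M2i) is satisfied, that is, for a given speciation vertex $u$ we need to show that $\mu(u)=\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. By construction, $\mu(u)=\gamma(u)$. Note that $T_{\overline{\mathcal{E}}}$ does not contain transfer edges. Applying (III) implies that for all edges $(x,y)$ in $T_{\overline{\mathcal{E}}}$ the images $\gamma(x)$ and $\gamma(y)$ must be comparable. The latter and (IIa) imply that for all edges $(x,y)$ in $T_{\overline{\mathcal{E}}}$ we have $\gamma(y)\preceq_S\gamma(x)$. Taken together, $\sigma(z)=\gamma(z)\preceq_S\gamma(u)$ for any leaf $z\in L_{T_{\overline{\mathcal{E}}}}(u)$. Therefore $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S\gamma(u)=\mu(u)$.

Assume for contradiction that $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\prec_S\gamma(u)=\mu(u)$. Consider the two children $u'$ and $u''$ of $u$ in $T_{\overline{\mathcal{E}}}$. Since neither $(u,u')\in\mathcal{E}$ nor $(u,u'')\in\mathcal{E}$ and $T$ is a binary tree, it follows that $L_{T_{\overline{\mathcal{E}}}}(u)=L_{T_{\overline{\mathcal{E}}}}(u')\cup L_{T_{\overline{\mathcal{E}}}}(u'')$ and we obtain that $\sigma_{T_{\overline{\mathcal{E}}}}(u)=\sigma_{T_{\overline{\mathcal{E}}}}(u')\cup\sigma_{T_{\overline{\mathcal{E}}}}(u'')$. Moreover, re-using the arguments above, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u'))\preceq_S\gamma(u')$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u''))\preceq_S\gamma(u'')$. By the arguments we used in the proof for (M3), we have $\gamma(u')\prec_S\gamma(u)$ and $\gamma(u'')\prec_S\gamma(u)$. In particular, $\gamma(u')$ and $\gamma(u'')$ must be contained in the subtree of $S$ that is rooted in the child $a$ of $\gamma(u)$ in $S$ with $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\preceq_S a$, as otherwise $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u'))\not\preceq_S\gamma(u')$ or $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u''))\not\preceq_S\gamma(u'')$. Moreover, neither $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u'))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u''))$ nor $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u''))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u'))$ is possible, since then $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u'))\preceq_S\gamma(u')$ and $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u''))\preceq_S\gamma(u'')$ would imply that $\gamma(u')$ and $\gamma(u'')$ are comparable, contradicting (IVb). Hence, there remains only one way to locate $\gamma(u')$ and $\gamma(u'')$: they must be located in the subtree of $S$ that is rooted in $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))$. But then we have $\operatorname{lca}_S(\gamma(u'),\gamma(u''))\preceq_S\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))\prec_S\gamma(u)$, a contradiction to (IVb), which requires $\gamma(u)=\operatorname{lca}_S(\gamma(u'),\gamma(u''))$. Therefore, $\operatorname{lca}_S(\sigma_{T_{\overline{\mathcal{E}}}}(u))=\gamma(u)=\mu(u)$ and (M2i) is satisfied.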
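The quantity at the centre of (M2i), the lca in S of the species reached from u without crossing a transfer edge, is easy to compute on a small instance. The Python sketch below is a hedged illustration only: the toy trees, the set `TRANSFER_EDGES`, and the helper names `sigma_without_transfer` and `lca_of_set` are assumptions made for this example and do not come from the authors' implementation.

```python
from functools import reduce

# Toy species tree S as child -> parent (hypothetical data); the root has parent None.
S_PARENT = {"r": None, "x": "r", "C": "r", "A": "x", "B": "x"}
# Toy gene tree T as parent -> children; vertices without an entry are leaves.
T_CHILDREN = {"u": ["v", "b"], "v": ["a", "c"]}
TRANSFER_EDGES = {("v", "c")}            # the only transfer edge of T
SIGMA = {"a": "A", "b": "B", "c": "C"}   # leaf of T -> species in which it resides

def ancestors(s):
    """Path from s up to the root of S, s included."""
    path = []
    while s is not None:
        path.append(s)
        s = S_PARENT[s]
    return path

def lca(a, b):
    """Last common ancestor of two vertices of S."""
    above_a = set(ancestors(a))
    return next(s for s in ancestors(b) if s in above_a)

def sigma_without_transfer(u):
    """Species of all leaves reachable from u in T without using a transfer edge."""
    children = T_CHILDREN.get(u, [])
    if not children:
        return {SIGMA[u]}
    species = set()
    for child in children:
        if (u, child) not in TRANSFER_EDGES:
            species |= sigma_without_transfer(child)
    return species

def lca_of_set(species):
    """lca in S of a non-empty set of species."""
    return reduce(lca, species)

species_u = sigma_without_transfer("u")   # leaf c is cut off by the transfer edge
image_u = lca_of_set(species_u)           # candidate image of the speciation vertex u
```

In this toy instance the transfer edge (v, c) removes species C from consideration, so the speciation vertex u is placed at x rather than at the root r, which is where the lca over all three leaves would put it.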

Finally, Lemmas 5 and 6 imply Theorem 1.

Authors' contributions

NN and MH designed the study. NN implemented and designed the algorithms. All authors collaborated in research and the writing of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank the organizers of the 32nd TBI Winterseminar 2017 in Bled (Slovenia), where the authors met and jointly drafted the main ideas of this paper with the help of an unknown number of cold and tasty cans of red Union, or was it green Laško?

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Not applicable.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Funding

Supported in part by the Danish Council for Independent Research, Natural Sciences, Grants DFF-1323-00247 and DFF-7014-00041.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Nikolai Nøjgaard, Email: nojgaard@imada.sdu.dk.

Manuela Geiß, Email: manuela@bioinf.uni-leipzig.de.

Daniel Merkle, Email: daniel@imada.sdu.dk.

Peter F. Stadler, Email: studla@bioinf.uni-leipzig.de.

Nicolas Wieseke, Email: wieseke@informatik.uni-leipzig.de.

Marc Hellmuth, Email: mhellmuth@mailbox.org.

Articles from Algorithms for Molecular Biology : AMB are provided here courtesy of BMC
