. 2019 Jul 30;79(5):1623–1638. doi: 10.1007/s00285-019-01405-9

Not all phylogenetic networks are leaf-reconstructible

Péter L Erdős 1, Leo van Iersel 2, Mark Jones 2,
PMCID: PMC6800874  PMID: 31363828

Abstract

Unrooted phylogenetic networks are graphs used to represent reticulate evolutionary relationships. Accurately reconstructing such networks is of great relevance for evolutionary biology. It has recently been conjectured that all unrooted phylogenetic networks for at least five taxa can be uniquely reconstructed from their subnetworks obtained by deleting a single taxon. Here, we show that this conjecture is false, by presenting a counter-example for each possible number of taxa that is at least 4. Moreover, we show that the conjecture is still false when restricted to binary networks. This means that, even if we are able to reconstruct the unrooted evolutionary history of each proper subset of some taxon set, this still does not give us enough information to reconstruct their full unrooted evolutionary history.

Keywords: Graph reconstruction, Phylogenetics, Undirected graphs, Leaf removal, Ulam’s Conjecture, Phylogenetic Networks

Introduction

The reconstruction conjecture, introduced in 1941 by Kelly and Ulam (see Bondy and Hemminger 1977), conjectures that each graph with at least three vertices is uniquely reconstructable from its multiset of vertex-deleted subgraphs. Despite more than seven decades of research, the conjecture is still open.

Recently, a variant of this conjecture was introduced that is relevant for the field of phylogenetics, the study of evolutionary relationships. Such relationships among a set X of entities (e.g. biological species or languages) are traditionally described by a tree with no degree-2 vertices and its leaves bijectively labelled by the elements of X; this is called a phylogenetic tree on X. More recently, evolutionary histories are more and more often described by phylogenetic networks (Bapteste et al. 2013), which are basically (directed or undirected) graphs with their leaves bijectively labelled by the elements of X. These networks are able to describe more complex evolutionary relationships than trees.

To find out whether it may be possible to accurately reconstruct phylogenetic networks, an important question to answer is which substructures uniquely define a phylogenetic network. For example, although there is much research directed at reconstructing rooted phylogenetic networks from embedded trees [see e.g. Van Iersel et al. (2016) and Whidden et al. (2013)], these trees do not uniquely define a network [see e.g. Pardi and Scornavacca (2015)]. Hence, no method based on embedded trees can be guaranteed to reconstruct the right network, even when it gets error-free and complete trees as input. Moreover, it has recently been shown that rooted phylogenetic networks also cannot be reconstructed uniquely from their subnetworks obtained by deleting one or more leaves and transforming the result into a valid rooted phylogenetic network (Huber et al. 2014). A similar reconstruction question for pedigrees has also been answered negatively (Thatte 2008).

Here, we focus on unrooted phylogenetic networks, which are undirected graphs with leaves labelled by the elements of some taxon set $X$. Although real evolutionary histories are rooted, it is not always possible to identify the root location and the directions of all arcs. Therefore, just as unrooted phylogenetic trees are studied in addition to rooted phylogenetic trees, unrooted phylogenetic networks are increasingly studied. Van Iersel and Moulton (2018) studied reconstructing such networks from their $X$-deck, which consists of the graphs obtained by deleting a single taxon from the network (see Fig. 1 for an example). Several promising results were obtained, including a proof that all phylogenetic trees and all decomposable networks (i.e. networks that can be decomposed into two nontrivial subnetworks by deleting a single edge) are reconstructible from their $X$-deck, assuming $|X| \geq 5$. Moreover, the same was shown for networks that can be turned into a tree by deleting at most four edges, and for all networks with sufficiently many leaves. The only known networks not reconstructible from their $X$-decks were ones for which $|X| \leq 4$. It was conjectured that all unrooted phylogenetic networks on $X$, with $|X| \geq 5$, can be uniquely reconstructed from their $X$-deck.

Fig. 1. Example of a phylogenetic network $N$ and its $X$-deck

Here, we show that this conjecture is false. To do so, we present, for each finite set X containing at least four elements, two unrooted phylogenetic networks on X that are not isomorphic but have the same X-deck. Moreover, we also give binary networks with these properties, hence showing that the conjecture restricted to binary networks is still false. These results can be seen as the unrooted counterpart to the results from Huber et al. (2014). However, we also note that there are important differences between the rooted and unrooted case, which make it impossible to directly transform the rooted counter-examples to the unrooted case, see Sect. 2.1.

Our result may have consequences for developing “supernetwork” methods, which attempt to reconstruct phylogenetic networks from subnetworks. Supertree methods work well for phylogenetic trees, which can be explained from the fact that a phylogenetic tree is uniquely determined by its induced set of four-leaved trees (or three-leaved trees in the case of rooted trees). Since phylogenetic networks are not uniquely determined by their subnetworks, developing supernetwork methods will be significantly more challenging than in the tree-case, even for unrooted networks.

The structure of the paper is as follows. We start off by giving formal definitions related to phylogenetic networks and binary sequences, which are central to the construction of our counter-examples, in Sect. 2. In Sect. 2.1, we explain why unrooting the counter-example for the rooted case from Huber et al. (2014) does not give a counter-example for the conjecture considered here. Then, in Sect. 3, we present our counter-examples for the unrooted, non-binary case. Finally, in Sect. 4 we show how these can be transformed into counter-examples for the unrooted, binary case.

Preliminaries

A phylogenetic tree on $X$ is an undirected simple tree, with no degree-2 vertices, whose leaves are bijectively labelled by the elements of $X$. A biconnected component of a graph is a maximal 2-edge-connected subgraph and it is called a blob if it contains at least two edges. Let $X$ be a finite set with $|X| \geq 2$, and let $N$ be an undirected simple graph in which the leaves (degree-1 vertices) are bijectively labelled by the elements of $X$. We say $N$ is an unrooted phylogenetic network on $X$ if contracting each blob into a single vertex gives a phylogenetic tree (or equivalently, each cut-edge induces a unique partition of the leaves). In addition, we say that $N$ is binary if every vertex has degree 1 or 3. In what follows, we will refer to unrooted phylogenetic networks as networks for short.

Let $G$ and $H$ be two partially labelled undirected multigraphs with the same label set, such that $|V(G)| = |V(H)|$. Let $f : V(G) \to V(H)$ be a bijective function. We say that $f$ is an isomorphism between $G$ and $H$ if it is both label-preserving (that is, a vertex $a \in V(G)$ has label $l$ if and only if $f(a)$ has label $l$) and edge-preserving (that is, for any $a, b \in V(G)$ the number of edges between $a$ and $b$ in $G$ is equal to the number of edges between $f(a)$ and $f(b)$ in $H$). We say $G$ and $H$ are equivalent, denoted $G \cong H$, if there is an isomorphism between $G$ and $H$.

Given an undirected multigraph $G$ with no vertices of degree 2, and a vertex $a \in V(G)$, we denote by $G_a$ the undirected multigraph derived from $G$ by deleting $a$ and all incident edges, and then suppressing any degree-2 vertices. We say $G_a$ is derived from $G$ by removing the vertex $a$. For a label $x$, we may write $G_x$ to refer to $G_a$, where $a$ is the unique vertex in $G$ with label $x$.
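The removal operation can be illustrated with a short Python sketch. This is an illustration under our own assumptions, not part of the formal development: the multigraph is stored as a list of vertex pairs (a repeated pair stands for a parallel edge), and the function name `remove_vertex` is hypothetical.

```python
from collections import defaultdict

def remove_vertex(edges, a):
    """Edge list of G_a: delete vertex a, then suppress degree-2 vertices.

    `edges` is a list of 2-tuples; a repeated tuple models a parallel edge.
    """
    # Step 1: delete a together with all of its incident edges.
    edges = [e for e in edges if a not in e]
    while True:
        # Record, for every vertex, the indices of its incident edges.
        incident = defaultdict(list)
        for idx, (p, q) in enumerate(edges):
            incident[p].append(idx)
            incident[q].append(idx)
        # Step 2: pick a degree-2 vertex (a vertex carrying only a loop is skipped).
        target = next((v for v, ids in incident.items()
                       if len(ids) == 2 and ids[0] != ids[1]), None)
        if target is None:
            return sorted(tuple(sorted(e)) for e in edges)
        i, j = incident[target]
        p = edges[i][0] if edges[i][1] == target else edges[i][1]
        q = edges[j][0] if edges[j][1] == target else edges[j][1]
        # Replace the two edges at `target` by one edge between its neighbours.
        edges = [e for k, e in enumerate(edges) if k not in (i, j)] + [(p, q)]

# Quartet tree ab|cd with internal vertices s and t.
quartet = [("a", "s"), ("b", "s"), ("s", "t"), ("t", "c"), ("t", "d")]
# A triangle s1, s2, s3 with one leaf attached to each corner.
triangle = [("a", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s1"),
            ("b", "s2"), ("c", "s3")]
print(remove_vertex(quartet, "a"))
print(remove_vertex(triangle, "a"))
```

In the quartet tree, deleting the leaf a leaves s with degree 2, so s is suppressed and b becomes adjacent to t. In the triangle example, suppressing s1 creates a second edge between s2 and s3, which is why the result of a removal is in general a multigraph.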

Given a network $N$ on $X$, an $X$-reconstruction of $N$ is a network $N'$ on $X$ such that $N_x \cong N'_x$ for all $x \in X$. We call a phylogenetic network $N$ leaf-reconstructible if $N \cong N'$ for every $X$-reconstruction $N'$ of $N$. That is, all $X$-reconstructions of $N$ are isomorphic to each other.

It was conjectured in Van Iersel and Moulton (2018) that all unrooted phylogenetic networks with 5 or more leaves are leaf-reconstructible. (We note that phylogenetic trees on 5 or more leaves are leaf-reconstructible, as it is clearly possible to reconstruct every quartet in the tree.)

In this paper, we show that the conjecture is false. More precisely, we will show that for each $r \geq 4$, there exist binary unrooted phylogenetic networks $N$ and $N'$ on $X$ with $|X| = r$, such that $N \not\cong N'$, but $N_x \cong N'_x$ for all $x \in X$. Thus, $N$ and $N'$ are not leaf-reconstructible.

Finally, for an integer $k$, let $[k]$ denote the set $\{1, 2, \ldots, k\}$.

Unrooting the rooted counter-example

Huber et al. (2014) showed that for any $r \geq 3$, there exist rooted binary networks $M$ and $M'$ on $X$ with $|X| = r$, such that $M \not\cong M'$, but $M|_{X'} \cong M'|_{X'}$ for any strict subset $X'$ of $X$. Here $M|_{X'}$ denotes the subnet of $M$ induced by $X'$; roughly speaking, $M|_{X'}$ is derived from $M$ by deleting any vertices not on a directed path from the root to an element of $X'$, then suppressing any degree-2 vertices and parallel arcs [see Huber et al. (2014) for full details].

We note that one cannot create a counter-example to the leaf-reconstruction conjecture by simply taking the directed networks $M, M'$ given by Huber et al. and replacing them with their underlying undirected graphs $G, G'$. A key observation here is that for any $x \in X$, the network $M|_{X \setminus \{x\}}$ may have many fewer vertices and arcs than $M$, whereas the graph $G_x$ has at most two fewer edges and two fewer vertices than $G$. Indeed, Fig. 2 gives two networks $N, N'$ on $X = \{a, b, c, d\}$ that correspond to the undirected versions (after suppressing degree-2 vertices) of the networks given by Huber et al. for $r = 4$. We observe that the distance between $a$ and $b$ is 7 in $N_d$, and 6 in $N'_d$, and thus these networks do not have the same $X$-deck. Thus the approach of Huber et al. cannot be naively used to give our result. However, the two papers do use similar ideas, in particular the use of binary sequences in the construction of a network (see Sect. 2.2).

Fig. 2. The underlying undirected networks $N$ and $N'$ of two rooted networks which, in Huber et al. (2014), were shown to have the same induced subnetworks for any strict subset of the leaf set $X = \{a, b, c, d\}$. We observe that $N_d \not\cong N'_d$, since the shortest path between $a$ and $b$ has length 7 in $N_d$ and length 6 in $N'_d$. Hence, these networks cannot be used as a counter-example for the leaf-reconstruction conjecture

Binary sequences

Given an alphabet $\Sigma$, let $w \in \Sigma^*$ be a sequence of elements drawn from $\Sigma$. If $\Sigma = \{0, 1\}$ then we call $w$ a binary sequence. The length of the sequence $w$, denoted $l(w)$, is the number of elements in $w$. We write $w_i$ to denote the $i$'th element of $w$. We often write $e_1 e_2 \ldots e_l$ to denote the sequence $w$ such that $l(w) = l$ and $w_i = e_i$ for each $i \in [l]$. (Thus, for example, 1011 denotes the length-4 binary sequence whose second element is 0 and whose first, third and fourth elements are 1.) Given a binary sequence $w$, the weight of $w$ is the number of 1's in $w$. For an integer $l$, we write $B_l$ to denote the set of binary sequences of length $l$. Given a sequence $w \in B_r$ and $i \in [r]$, let $w^i$ be the sequence derived from $w$ by replacing the $i$'th element with $1 - w_i$ (for example, if $w = 1001$ and $i = 3$, then $w^i = 1011$).

Central to the proof of our result is the idea that for a binary sequence $w$, one needs to know all elements of $w$ in order to decide whether $w$ has odd or even weight. (Note that here and in the rest of the paper, we consider a sequence of weight 0 to have even weight.) For some integer $r$, consider the set $B_r^{\mathrm{even}}$ of all length-$r$ binary sequences of even weight, and the set $B_r^{\mathrm{odd}}$ of all length-$r$ binary sequences of odd weight. Given a length-$r$ binary sequence $w$ and integer $i \in [r]$, let $w_{-i}$ denote the sequence on $\{0, 1, \ast\}$ derived from $w$ by replacing the $i$'th element with $\ast$. Then for each $w \in B_r^{\mathrm{even}}$, there exists a sequence $w' \in B_r^{\mathrm{odd}}$ such that $(w')_{-i} = w_{-i}$ (indeed, $w^i$ is such a sequence). For a set of sequences $S$ and $i \in [r]$, let $S_{-i} = \{w_{-i} : w \in S\}$. Then it follows that for each $i \in [r]$, the sets $(B_r^{\mathrm{odd}})_{-i}$ and $(B_r^{\mathrm{even}})_{-i}$ are the same.
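The parity observation is easy to check by brute force for small $r$. The following Python sketch does so for $r = 4$. The helper names `sequences`, `flip` and `mask`, the tuple encoding of sequences, and the use of the string `"*"` as the placeholder symbol are illustrative assumptions.

```python
from itertools import product

def sequences(r, parity):
    """All length-r binary tuples whose weight is congruent to `parity` mod 2."""
    return [w for w in product((0, 1), repeat=r) if sum(w) % 2 == parity]

def flip(w, i):
    """Flip the i-th element (1-indexed) of w, leaving the rest unchanged."""
    return w[:i - 1] + (1 - w[i - 1],) + w[i:]

def mask(w, i):
    """Replace the i-th element (1-indexed) of w by a placeholder."""
    return w[:i - 1] + ("*",) + w[i:]

r = 4
even, odd = sequences(r, 0), sequences(r, 1)
for i in range(1, r + 1):
    # Hiding position i makes the even and odd families indistinguishable.
    assert {mask(w, i) for w in even} == {mask(w, i) for w in odd}
    # Flipping position i is a bijection from even-weight to odd-weight sequences.
    assert {flip(w, i) for w in even} == set(odd)
```

The second assertion is the reason the first one holds: flipping a single position changes the parity of the weight but is invisible once that position is hidden.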

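We will use this concept to guide our construction of two networks $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ on a set $X = \{x_1, \ldots, x_r\}$. Roughly speaking, $N_{\mathrm{even}}$ can be thought of as a representation of $B_r^{\mathrm{even}}$, and $N_{\mathrm{odd}}$ can be thought of as a representation of $B_r^{\mathrm{odd}}$. Then for each $i \in [r]$, $(N_{\mathrm{even}})_{x_i}$ corresponds to $(B_r^{\mathrm{even}})_{-i}$, and $(N_{\mathrm{odd}})_{x_i}$ corresponds to $(B_r^{\mathrm{odd}})_{-i}$. Just as $(B_r^{\mathrm{even}})_{-i} = (B_r^{\mathrm{odd}})_{-i}$, we will be able to show that $(N_{\mathrm{even}})_{x_i}$ and $(N_{\mathrm{odd}})_{x_i}$ are equivalent, even though $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ themselves are not.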

A non-binary example

In order to demonstrate the main concepts of our construction, we first give a construction using non-binary graphs. In the next section, we will construct an example with binary phylogenetic networks, using these non-binary graphs as a guide.

For some integer $r \geq 4$, let $X$ denote the set of labels $\{x_1, \ldots, x_r\}$. We will construct two graphs $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$, in which the leaves are bijectively labelled by the elements of $X$. As in the previous section, let $B_r^{\mathrm{even}}$ denote the set of all length-$r$ binary sequences of even weight, and let $B_r^{\mathrm{odd}}$ denote the set of all length-$r$ binary sequences of odd weight.

The graph $M_{\mathrm{even}}$ is constructed as follows. For each $i \in [r]$, let $M_{\mathrm{even}}$ contain vertices $v_{i,0}$ and $v_{i,1}$, and a leaf labelled with $x_i$, such that $x_i$ is adjacent to $v_{i,0}$. For each $w \in B_r^{\mathrm{even}}$, let $M_{\mathrm{even}}$ contain a vertex $u_w$. For each $w \in B_r^{\mathrm{even}}$ and $i \in [r]$, let $u_w$ be adjacent to $v_{i,0}$ if $w_i = 0$, and let $u_w$ be adjacent to $v_{i,1}$ if $w_i = 1$. This completes the construction of $M_{\mathrm{even}}$ (see Fig. 3a).

Fig. 3. Non-binary example of $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$ for the case when $r = 4$. Vertices $u_w$ are adjacent to vertices $v_{i,h}$ if and only if $w_i = h$

The construction of $M_{\mathrm{odd}}$ is identical to that of $M_{\mathrm{even}}$, except that we have a vertex $u_w$ for each $w \in B_r^{\mathrm{odd}}$ rather than each $w \in B_r^{\mathrm{even}}$. For completeness, the full construction is as follows: For each $i \in [r]$, let $M_{\mathrm{odd}}$ contain vertices $v_{i,0}$ and $v_{i,1}$, and a leaf labelled with $x_i$, such that $x_i$ is adjacent to $v_{i,0}$. For each $w \in B_r^{\mathrm{odd}}$, let $M_{\mathrm{odd}}$ contain a vertex $u_w$. For each $w \in B_r^{\mathrm{odd}}$ and $i \in [r]$, let $u_w$ be adjacent to $v_{i,0}$ if $w_i = 0$, and let $u_w$ be adjacent to $v_{i,1}$ if $w_i = 1$. This completes the construction of $M_{\mathrm{odd}}$ (see Fig. 3b).
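The two constructions differ only in the parity of the sequences used, which a single parameterised routine makes explicit. The following Python sketch is one possible encoding and is not taken from the paper: vertices are tuples such as `("x", i)`, `("v", i, h)` and `("u", w)`, and the function name `build_M` is hypothetical.

```python
from itertools import product

def build_M(r, parity):
    """Adjacency sets of M_even (parity=0) or M_odd (parity=1)."""
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for i in range(1, r + 1):
        add_edge(("x", i), ("v", i, 0))        # leaf x_i is adjacent to v_{i,0}
    for w in product((0, 1), repeat=r):
        if sum(w) % 2 == parity:
            for i in range(1, r + 1):
                # u_w is adjacent to v_{i,h} exactly when w_i = h
                add_edge(("u", w), ("v", i, w[i - 1]))
    return adj

M_even, M_odd = build_M(4, 0), build_M(4, 1)
num_edges = sum(len(nbrs) for nbrs in M_even.values()) // 2
print(len(M_even), num_edges)
```

For $r = 4$ each graph has 4 leaves, 8 vertices $v_{i,h}$ and 8 vertices $u_w$, with 4 leaf edges and $8 \cdot 4$ edges at the vertices $u_w$, so the sketch prints 20 vertices and 36 edges.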

Lemma 1

$M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$ are not equivalent.

Proof

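Suppose for a contradiction that $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$ are equivalent, and let $f : V(M_{\mathrm{even}}) \to V(M_{\mathrm{odd}})$ be an isomorphism between $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$. Let $\mathbf{0}$ denote the all-0 sequence from $B_r^{\mathrm{even}}$. Observe that for each $i \in [r]$, the distance between $u_{\mathbf{0}}$ and $x_i$ is 2 (as both $u_{\mathbf{0}}$ and $x_i$ are adjacent to $v_{i,0}$). It follows that $f(u_{\mathbf{0}})$ must have distance 2 to $f(x_i) = x_i$ in $M_{\mathrm{odd}}$, for each $i \in [r]$. We will show that no such $f(u_{\mathbf{0}})$ exists in $M_{\mathrm{odd}}$, a contradiction to the existence of $f$.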

Observe that by construction of $M_{\mathrm{odd}}$ (in particular, the fact that it is a bipartite graph with one side consisting of the vertices $v_{j,0}$ and $v_{j,1}$), the distance between any leaf $x_i$ and any vertex $v_{j,0}$ or $v_{j,1}$ is odd. It follows that $f(u_{\mathbf{0}})$ must be the vertex $u_w$, for some $w \in B_r^{\mathrm{odd}}$ (any other vertex is either a leaf, which has distance 0 from itself, or has odd distance from any leaf). However, for any $w \in B_r^{\mathrm{odd}}$ there exists $i \in [r]$ such that $w_i = 1$, and so $u_w$ is not adjacent to $v_{i,0}$. As $v_{i,0}$ is the only vertex adjacent to $x_i$, it follows that the distance between $u_w$ and $x_i$ is greater than 2, and so $f(u_{\mathbf{0}}) \neq u_w$.

As there is no choice for $f(u_{\mathbf{0}})$ that satisfies the conditions of an isomorphism, we have that there is no possible isomorphism between $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$, and so $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$ are not equivalent.
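The distance argument of this proof can be replayed numerically. The sketch below, for $r = 4$, computes the vector of distances from a vertex to the leaves $x_1, \ldots, x_r$ by breadth-first search and confirms that no vertex of $M_{\mathrm{odd}}$ has the same vector as the all-0 vertex of $M_{\mathrm{even}}$. The tuple encoding of vertices and the names `build_M`, `distances` and `leaf_profile` are our own assumptions.

```python
from collections import deque
from itertools import product

def build_M(r, parity):
    """Adjacency sets of M_even (parity=0) or M_odd (parity=1)."""
    adj = {}
    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for i in range(1, r + 1):
        add_edge(("x", i), ("v", i, 0))
    for w in product((0, 1), repeat=r):
        if sum(w) % 2 == parity:
            for i in range(1, r + 1):
                add_edge(("u", w), ("v", i, w[i - 1]))
    return adj

def distances(adj, source):
    """Breadth-first search distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def leaf_profile(adj, a, r):
    """Distances from vertex a to the leaves x_1, ..., x_r, in that order."""
    d = distances(adj, a)
    return tuple(d[("x", i)] for i in range(1, r + 1))

r = 4
M_even, M_odd = build_M(r, 0), build_M(r, 1)
target = leaf_profile(M_even, ("u", (0,) * r), r)
matches = [a for a in M_odd if leaf_profile(M_odd, a, r) == target]
print(target, matches)
```

Since an isomorphism that fixes every leaf must preserve these distance vectors, an empty list of matches rules out any isomorphism for $r = 4$.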

Lemma 2

For each $i \in [r]$, $(M_{\mathrm{even}})_{x_i} \cong (M_{\mathrm{odd}})_{x_i}$.

Proof

Observe that $v_{i,0}$ and $v_{i,1}$ each have $2^{r-2} \geq 4$ neighbors in $M_{\mathrm{even}}$ not including $x_i$ (as $|B_r^{\mathrm{even}}| = 2^{r-1}$ and exactly half of the sequences in $B_r^{\mathrm{even}}$ have 1 as their $i$'th element). Also any vertex $u_w$ has $r \geq 4$ neighbors in $M_{\mathrm{even}}$. It follows that if $x_i$ is deleted from $M_{\mathrm{even}}$, the remaining graph has no vertices of degree 2, and thus $(M_{\mathrm{even}})_{x_i}$ is exactly $M_{\mathrm{even}}$ with $x_i$ deleted. By a similar argument, $(M_{\mathrm{odd}})_{x_i}$ is exactly $M_{\mathrm{odd}}$ with $x_i$ deleted.

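Now define a bijective function $f : V((M_{\mathrm{even}})_{x_i}) \to V((M_{\mathrm{odd}})_{x_i})$ as follows. For each $w \in B_r^{\mathrm{even}}$, let $f(u_w) = u_{w^i}$. Observe that this defines a bijection between $\{u_w : w \in B_r^{\mathrm{even}}\}$ and $\{u_w : w \in B_r^{\mathrm{odd}}\}$. Let $f(v_{i,0}) = v_{i,1}$ and $f(v_{i,1}) = v_{i,0}$. For $j \in [r] \setminus \{i\}$, let $f(v_{j,0}) = v_{j,0}$, $f(v_{j,1}) = v_{j,1}$ and $f(x_j) = x_j$ (recall that the leaf $x_i$ does not appear in $(M_{\mathrm{even}})_{x_i}$ or $(M_{\mathrm{odd}})_{x_i}$, so we do not need to define $f(x_i)$).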

By construction, $f$ is a bijective function from $V((M_{\mathrm{even}})_{x_i})$ to $V((M_{\mathrm{odd}})_{x_i})$. It remains to show that $f$ is label-preserving and edge-preserving. As $f$ is the identity on all labelled vertices, $f$ is label-preserving. As $(M_{\mathrm{even}})_{x_i}$ and $(M_{\mathrm{odd}})_{x_i}$ are simple graphs, to show that $f$ is edge-preserving it is enough to show that two vertices $a, b$ are adjacent in $(M_{\mathrm{even}})_{x_i}$ if and only if $f(a)$ and $f(b)$ are adjacent in $(M_{\mathrm{odd}})_{x_i}$.

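So consider any $a, b \in V((M_{\mathrm{even}})_{x_i})$. Suppose first that $a = u_w$ for some $w \in B_r^{\mathrm{even}}$ and that $b = v_{j,h}$ for some $j \in [r] \setminus \{i\}$ and $h \in \{0, 1\}$. Then $a$ and $b$ are adjacent if and only if $w_j = h$. By definition of $f$, we have $f(a) = u_{w^i}$, and we note that $(w^i)_j = w_j$. Finally, we have that $f(a)$ and $f(b) = v_{j,h}$ are adjacent if and only if $(w^i)_j = h$. Putting it together, we have that $ab \in E((M_{\mathrm{even}})_{x_i}) \Leftrightarrow w_j = h \Leftrightarrow (w^i)_j = h \Leftrightarrow f(a)f(b) \in E((M_{\mathrm{odd}})_{x_i})$. Thus $a$ and $b$ are adjacent if and only if $f(a)$ and $f(b)$ are adjacent.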

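Next suppose that $a = u_w$ for some $w \in B_r^{\mathrm{even}}$ and that $b = v_{i,h}$ for some $h \in \{0, 1\}$. Then $a$ and $b$ are adjacent if and only if $w_i = h$. Furthermore $f(a) = u_{w^i}$ where $(w^i)_i = 1 - w_i$, and $f(a)$ and $f(b) = v_{i,1-h}$ are adjacent if and only if $(w^i)_i = 1 - h$. Thus $ab \in E((M_{\mathrm{even}})_{x_i}) \Leftrightarrow w_i = h \Leftrightarrow (w^i)_i = 1 - h \Leftrightarrow f(a)f(b) \in E((M_{\mathrm{odd}})_{x_i})$.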

If $a$ and $b$ are $u_w, u_{w'}$ for some $w, w' \in B_r^{\mathrm{even}}$, then $a$ and $b$ are not adjacent, and neither are $f(a)$ and $f(b)$ (which are both vertices of the form $u_{w''}$ with $w'' \in B_r^{\mathrm{odd}}$). By a similar argument, if $a$ and $b$ are both vertices $v_{j,h}$ for some $j \in [r]$ and $h \in \{0, 1\}$, then $a, b$ are not adjacent and $f(a)$, $f(b)$ are not adjacent. If $b = x_j$ for some $j \in [r] \setminus \{i\}$, then $a$ and $b$ are adjacent if and only if $a = v_{j,0}$, which holds if and only if $f(a) = v_{j,0}$, which in turn holds if and only if $f(a)$ is adjacent to $x_j = f(b)$. This covers all possible cases, and so we have that $a$ and $b$ are adjacent if and only if $f(a)$ and $f(b)$ are adjacent. This completes the proof that $f$ is an isomorphism, and so $(M_{\mathrm{even}})_{x_i} \cong (M_{\mathrm{odd}})_{x_i}$.
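The map $f$ used in this proof is explicit enough to be checked mechanically. The following Python sketch, for $r = 4$, applies $f$ to every edge of $M_{\mathrm{even}}$ with $x_i$ deleted and compares the result with the edge set of $M_{\mathrm{odd}}$ with $x_i$ deleted. The encoding of vertices as tuples and the names `edges_M`, `make_f` and `image` are illustrative assumptions.

```python
from itertools import product

def edges_M(r, parity):
    """Edge set of M_even (parity=0) or M_odd (parity=1) as 2-element frozensets."""
    edges = {frozenset({("x", i), ("v", i, 0)}) for i in range(1, r + 1)}
    for w in product((0, 1), repeat=r):
        if sum(w) % 2 == parity:
            edges |= {frozenset({("u", w), ("v", i, w[i - 1])})
                      for i in range(1, r + 1)}
    return edges

def make_f(i):
    """The vertex map of the proof: flip position i of w and swap v_{i,0}, v_{i,1}."""
    def f(a):
        if a[0] == "u":
            w = a[1]
            return ("u", w[:i - 1] + (1 - w[i - 1],) + w[i:])
        if a[0] == "v" and a[1] == i:
            return ("v", i, 1 - a[2])
        return a                      # v_{j,h} with j != i, and all leaves
    return f

def image(edges, f):
    """Edge set obtained by applying f to both endpoints of every edge."""
    return {frozenset(f(a) for a in e) for e in edges}

r = 4
for i in range(1, r + 1):
    x_i = ("x", i)
    even_minus = {e for e in edges_M(r, 0) if x_i not in e}
    odd_minus = {e for e in edges_M(r, 1) if x_i not in e}
    assert image(even_minus, make_f(i)) == odd_minus

# With x_1 kept in place, the same map is no longer edge-preserving.
full_image = image(edges_M(r, 0), make_f(1))
print(full_image == edges_M(r, 1))
```

The final comparison fails because the edge between $x_1$ and $v_{1,0}$ is sent to the pair $x_1$, $v_{1,1}$, which is not an edge of $M_{\mathrm{odd}}$.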

A binary example

In this section, we show how to construct two binary networks on $X$ that are $X$-reconstructions of each other but are not equivalent, for $|X| \geq 4$. (An example of two such networks for the case when $|X| = 4$ is given in Fig. 7.) This is enough to show that networks on $r \geq 4$ leaves are not, in general, leaf-reconstructible.

Fig. 7. Binary example of $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ for the case when $r = 4$. The vertex $u_{0000}$ in $N_{\mathrm{even}}$ has distance $d_1 = 3$ from $x_1$, $d_2 = 3$ from $x_2$, $d_3 = 4$ from $x_3$, and $d_4 = 4$ from $x_4$. Moreover there is no vertex in $N_{\mathrm{odd}}$ with distance $d_i$ from $x_i$ for each $i \in [4]$. Thus $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are not equivalent. However, for each $i \in [4]$ the multigraphs $(N_{\mathrm{even}})_{x_i}$ and $(N_{\mathrm{odd}})_{x_i}$ are equivalent, using an isomorphism that maps each vertex $u_w$ to $u_{w^i}$

Given the non-binary networks $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$ constructed in the previous section, we proceed to construct two graphs $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ in the following way. For each binary sequence $w \in B_r$, $u_w$ will be expanded into a caterpillar $\mathrm{Cat}(w)$ (details of the construction are given below). Each vertex $v_{i,h}$ will be expanded into a lexicographic tree $\mathrm{Lex}(i,h)^{\mathrm{even}}$ or $\mathrm{Lex}(i,h)^{\mathrm{odd}}$ (defined below). These subgraphs contain leaves denoted $z_{w,i}$, for $w \in B_r$ and $i \in [r]$. Two subgraphs $\mathrm{Cat}(w)$ and $\mathrm{Lex}(i,h)^{\mathrm{even}}$ (or $\mathrm{Cat}(w)$ and $\mathrm{Lex}(i,h)^{\mathrm{odd}}$) will share a vertex $z_{w,i}$ if and only if $w_i = h$ (analogous to how in $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$, the vertices $u_w$ and $v_{i,h}$ are adjacent if and only if $w_i = h$).

Similarly to $M_{\mathrm{even}}$ and $M_{\mathrm{odd}}$, we will show that $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ are not equivalent, but that they become equivalent if a single leaf $x_i$ is deleted.

We note that $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ are not technically networks, because while they have maximum degree 3, they contain some vertices of degree 2 (in particular, every vertex $z_{w,i}$ has degree 2). In the last part of this section, we will produce two networks $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ from $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$.

We now define the two types of tree that will be used in our construction.

Definition 1

For any sequence $w \in B_r$, the caterpillar $\mathrm{Cat}(w)$ is the tree with internal vertices $u_w$ and $y_{w,i}$ for each $i \in [r-3]$, leaves $z_{w,i}$ for each $i \in [r]$, and edges $u_w z_{w,1}$, $u_w z_{w,2}$, $u_w y_{w,1}$, $y_{w,r-3} z_{w,r-1}$, $y_{w,r-3} z_{w,r}$, and $y_{w,i} z_{w,i+2}$, $y_{w,i} y_{w,i+1}$ for each $1 \leq i \leq r-4$.

See Fig. 4 for an example. Observe that all internal vertices of $\mathrm{Cat}(w)$ have degree 3.

Fig. 4. The caterpillar $\mathrm{Cat}(w)$ for the case $r = 5$
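Definition 1 translates directly into an edge list. The Python sketch below builds the caterpillar for a sequence of length $r = 5$ and tallies vertex degrees. The tuple names `("u", w)`, `("y", w, i)`, `("z", w, i)` and the function name `caterpillar` are our own illustrative choices.

```python
from collections import Counter

def caterpillar(w):
    """Edge list of Cat(w) for a binary tuple w of length r >= 4."""
    r = len(w)
    if r < 4:
        raise ValueError("the definition needs r >= 4")
    u = ("u", w)

    def y(i):
        return ("y", w, i)

    def z(i):
        return ("z", w, i)

    # The two ends of the caterpillar: u_w carries z_{w,1}, z_{w,2};
    # y_{w,r-3} carries z_{w,r-1}, z_{w,r}.
    edges = [(u, z(1)), (u, z(2)), (u, y(1)),
             (y(r - 3), z(r - 1)), (y(r - 3), z(r))]
    # The spine in between: y_{w,i} carries z_{w,i+2} for i = 1, ..., r-4.
    for i in range(1, r - 3):
        edges += [(y(i), z(i + 2)), (y(i), y(i + 1))]
    return edges

w = (1, 0, 0, 1, 1)
edges = caterpillar(w)
degree = Counter(v for e in edges for v in e)
print(len(edges), sorted(degree.values()))
```

For $r = 5$ the tree has 8 vertices and 7 edges: the three internal vertices $u_w$, $y_{w,1}$, $y_{w,2}$ have degree 3 and the five leaves have degree 1. Note that the shape of the tree does not depend on the entries of $w$, only on its length.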

Observation 1

Given sequences $w, w' \in B_r$, the trees $\mathrm{Cat}(w)$ and $\mathrm{Cat}(w')$ are equivalent. In particular, there exists an isomorphism $f$ between $\mathrm{Cat}(w)$ and $\mathrm{Cat}(w')$ such that $f(u_w) = u_{w'}$ and $f(z_{w,i}) = z_{w',i}$ for all $i \in [r]$.

Definition 2

Given a set $S$ of binary sequences such that $|S| = 2^t$ for some positive integer $t$, and $i \in [r]$, the lexicographic tree $\mathrm{Lex}(i, S)$ is a fully balanced binary tree with leaves $z_{w,i}$ for $w \in S$. All non-leaf vertices have degree 3 except for a single vertex, called the root, of degree 2, and all leaves are of distance exactly $t$ from the root. Moreover, the leaves are arranged in such a way that there exists a depth-first search of the vertices of $\mathrm{Lex}(i, S)$ that traverses the leaves $z_{w,i}$ in lexicographic order with respect to $w$. (Note that this uniquely determines $\mathrm{Lex}(i, S)$.)
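One way to realise Definition 2 is to sort the leaves lexicographically and repeatedly join neighbouring pairs until a single root remains. The Python sketch below follows this idea. The internal vertex names `("lex", i, height, k)` and the function name `lex_tree` are hypothetical, since the definition does not name internal vertices.

```python
from itertools import product

def lex_tree(i, S):
    """Return (root, edges) of Lex(i, S); |S| must be 2^t with t >= 1."""
    leaves = [("z", w, i) for w in sorted(S)]      # lexicographic leaf order
    n = len(leaves)
    if n < 2 or n & (n - 1):
        raise ValueError("|S| must be a power of two and at least 2")
    level, edges, height = leaves, [], 0
    while len(level) > 1:
        height += 1
        parents = []
        # Join consecutive pairs; this keeps the left-to-right leaf order.
        for k in range(0, len(level), 2):
            parent = ("lex", i, height, k // 2)
            edges += [(parent, level[k]), (parent, level[k + 1])]
            parents.append(parent)
        level = parents
    return level[0], edges

r = 4
# Even-weight sequences of length 4 whose second element is 0.
S = [w for w in product((0, 1), repeat=r) if sum(w) % 2 == 0 and w[1] == 0]
root, edges = lex_tree(2, S)
root_degree = sum(root in e for e in edges)
print(root, len(edges), root_degree)
```

With these four sequences (0000, 0011, 1001, 1010) the tree has two levels of internal vertices, 6 edges, and a root of degree 2, in line with the definition for $t = 2$.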

Definition 3

Let $(B_r^{\mathrm{even}})_{i:h}$ be the set of all length-$r$ binary sequences $w$ of even weight such that $w_i = h$. Let $(B_r^{\mathrm{odd}})_{i:h}$ be the set of all length-$r$ binary sequences $w$ of odd weight such that $w_i = h$.

Definition 4

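For any $i \in [r]$ and $h \in \{0, 1\}$, define $\mathrm{Lex}(i,h)^{\mathrm{even}} = \mathrm{Lex}(i, (B_r^{\mathrm{even}})_{i:h})$, and define $\mathrm{Lex}(i,h)^{\mathrm{odd}} = \mathrm{Lex}(i, (B_r^{\mathrm{odd}})_{i:h})$. (Thus the leaves of $\mathrm{Lex}(i,h)^{\mathrm{even}}$ are $z_{w,i}$ for $w \in (B_r^{\mathrm{even}})_{i:h}$, and the leaves of $\mathrm{Lex}(i,h)^{\mathrm{odd}}$ are $z_{w,i}$ for $w \in (B_r^{\mathrm{odd}})_{i:h}$.) We refer to the root of $\mathrm{Lex}(i,h)^{\mathrm{even}}$ by $v_{i,h}^{\mathrm{even}}$, and we refer to the root of $\mathrm{Lex}(i,h)^{\mathrm{odd}}$ by $v_{i,h}^{\mathrm{odd}}$.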

See Fig. 5 for some examples.

Fig. 5. The lexicographic trees $\mathrm{Lex}(2,0)^{\mathrm{even}}$ and $\mathrm{Lex}(2,0)^{\mathrm{odd}}$ for the case $r = 4$. Leaves of $\mathrm{Lex}(2,0)^{\mathrm{even}}$ (respectively, $\mathrm{Lex}(2,0)^{\mathrm{odd}}$) are $z_{w,2}$ for every length-$r$ sequence $w$ of even weight (odd weight) such that $w_2 = 0$

Lemma 3

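Let $i \in [r]$. For any $j \in [r] \setminus \{i\}$ and $h \in \{0, 1\}$, there exists an isomorphism $f$ between $\mathrm{Lex}(j,h)^{\mathrm{even}}$ and $\mathrm{Lex}(j,h)^{\mathrm{odd}}$ such that $f(v_{j,h}^{\mathrm{even}}) = v_{j,h}^{\mathrm{odd}}$, and $f(z_{w,j}) = z_{w^i,j}$ for all $w \in (B_r^{\mathrm{even}})_{j:h}$.

Also, for any $h \in \{0, 1\}$ there exists an isomorphism $f$ between $\mathrm{Lex}(i,h)^{\mathrm{even}}$ and $\mathrm{Lex}(i,1-h)^{\mathrm{odd}}$ such that $f(v_{i,h}^{\mathrm{even}}) = v_{i,1-h}^{\mathrm{odd}}$, and $f(z_{w,i}) = z_{w^i,i}$ for all $w \in (B_r^{\mathrm{even}})_{i:h}$.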

Proof

Observe that the root of a lexicographic tree is unique, as it is the only vertex of degree 2. Then for any integer $l$ and leaf $z_{w,j}$ in a lexicographic tree, we may define the depth-$l$ ancestor of $z_{w,j}$ as follows. The depth-$l$ ancestor of $z_{w,j}$ is the unique vertex on the path between $z_{w,j}$ and the root that has distance $l$ from $z_{w,j}$. Note that we count the root itself as a depth-$(r-2)$ ancestor of every leaf, and each leaf is the depth-0 ancestor of itself. Moreover, because a lexicographic tree is fully balanced, if a vertex $a$ is the depth-$l$ ancestor of one leaf and the depth-$l'$ ancestor of another leaf then $l = l'$.

In order to prove the first claim, we first show that for any two sequences $w, w' \in (B_r^{\mathrm{even}})_{j:h}$ and integer $l$, the leaves $z_{w,j}, z_{w',j}$ share a depth-$l$ ancestor in $\mathrm{Lex}(j,h)^{\mathrm{even}}$ if and only if $z_{w^i,j}$, $z_{w'^i,j}$ share a depth-$l$ ancestor in $\mathrm{Lex}(j,h)^{\mathrm{odd}}$. Indeed, it is easy to see that $z_{w,j}, z_{w',j}$ share a depth-$l$ ancestor if and only if $w, w'$ agree on the first $r-2-l$ elements not including $j$. But if $w, w'$ agree on these elements then so do $w^i, w'^i$, and so $z_{w^i,j}$, $z_{w'^i,j}$ also share a depth-$l$ ancestor.

Thus, we may define a bijective function $f : V(\mathrm{Lex}(j,h)^{\mathrm{even}}) \to V(\mathrm{Lex}(j,h)^{\mathrm{odd}})$ as follows. For any vertex $a \in V(\mathrm{Lex}(j,h)^{\mathrm{even}})$ with distance $r-2-l$ from the root, choose any sequence $w \in (B_r^{\mathrm{even}})_{j:h}$ such that $a$ is a depth-$l$ ancestor of $z_{w,j}$, and let $f(a)$ be the depth-$l$ ancestor of $z_{w^i,j}$ in $\mathrm{Lex}(j,h)^{\mathrm{odd}}$. Observe that $f$ is well-defined, since we have just shown that if two leaves $z_{w,j}, z_{w',j}$ share $a$ as a depth-$l$ ancestor, then $z_{w^i,j}$, $z_{w'^i,j}$ also have the same depth-$l$ ancestor.

By construction, it is clear that $f(v_{j,h}^{\mathrm{even}}) = v_{j,h}^{\mathrm{odd}}$, and $f(z_{w,j}) = z_{w^i,j}$ for all $w \in (B_r^{\mathrm{even}})_{j:h}$. To see that $f$ is an isomorphism it remains to show that $f$ is edge-preserving. To see this, observe that two vertices $a, b \in V(\mathrm{Lex}(j,h)^{\mathrm{even}})$ are adjacent if and only if one is the depth-$l$ ancestor and the other the depth-$(l+1)$ ancestor of some leaf, and that this holds if and only if $f(a)$, $f(b)$ are also adjacent.

The proof of the second claim is similar.
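The key step of this proof, that flipping position $i$ preserves which leaves share a depth-$l$ ancestor, can be checked exhaustively for $r = 4$. In the Python sketch below, a leaf is identified with its position in the lexicographic order, and two leaves of a fully balanced binary tree share a depth-$l$ ancestor exactly when their positions agree after discarding the $l$ lowest bits. This positional encoding and the names `leaf_order`, `flip` and `share_ancestor` are our own assumptions.

```python
from itertools import product

def leaf_order(r, parity, j, h):
    """Sorted sequences w (weight parity `parity`, w_j = h) labelling the leaves."""
    return sorted(w for w in product((0, 1), repeat=r)
                  if sum(w) % 2 == parity and w[j - 1] == h)

def flip(w, i):
    """Flip the i-th element (1-indexed) of w."""
    return w[:i - 1] + (1 - w[i - 1],) + w[i:]

def share_ancestor(order, w1, w2, l):
    """True if the leaves for w1 and w2 have the same depth-l ancestor."""
    return order.index(w1) >> l == order.index(w2) >> l

r = 4
for j in range(1, r + 1):
    for h in (0, 1):
        even = leaf_order(r, 0, j, h)
        for i in range(1, r + 1):
            # First claim (i != j): compare Lex(j,h)^even with Lex(j,h)^odd.
            # Second claim (i == j): compare Lex(i,h)^even with Lex(i,1-h)^odd.
            odd = leaf_order(r, 1, j, h if i != j else 1 - h)
            for w1, w2 in product(even, repeat=2):
                for l in range(r - 1):          # l = 0, ..., r-2
                    assert (share_ancestor(even, w1, w2, l)
                            == share_ancestor(odd, flip(w1, i), flip(w2, i), l))
```

The check passes because flipping a fixed position either leaves the position of a leaf unchanged or toggles one fixed bit of it, and neither operation affects whether two positions agree on their high-order bits.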

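We can now describe the structure of $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$.

For each $w \in B_r^{\mathrm{even}}$, let $G_{\mathrm{even}}$ contain the caterpillar $\mathrm{Cat}(w)$. For each $i \in [r]$ and $h \in \{0, 1\}$, let $G_{\mathrm{even}}$ contain the lexicographic tree $\mathrm{Lex}(i,h)^{\mathrm{even}}$. Finally, for each $i \in [r]$ let $G_{\mathrm{even}}$ contain the labelled leaf $x_i$ adjacent to $v_{i,0}^{\mathrm{even}}$.

The construction of $G_{\mathrm{odd}}$ is similar: For each $w \in B_r^{\mathrm{odd}}$, let $G_{\mathrm{odd}}$ contain the caterpillar $\mathrm{Cat}(w)$. For each $i \in [r]$ and $h \in \{0, 1\}$, let $G_{\mathrm{odd}}$ contain the lexicographic tree $\mathrm{Lex}(i,h)^{\mathrm{odd}}$. Finally, for each $i \in [r]$ let $G_{\mathrm{odd}}$ contain the labelled leaf $x_i$ adjacent to $v_{i,0}^{\mathrm{odd}}$.

Observe that in both $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$, the vertices $z_{w,i}$ have degree 2 (as they appear as a leaf in the caterpillar $\mathrm{Cat}(w)$ and in the lexicographic tree $\mathrm{Lex}(i,w_i)^{\mathrm{even}}$ or $\mathrm{Lex}(i,w_i)^{\mathrm{odd}}$). The roots $v_{i,1}^{\mathrm{even}}$ and $v_{i,1}^{\mathrm{odd}}$ also have degree 2, and all other non-leaf vertices have degree 3.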

We will later show that $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ are not equivalent. First though, we will show that the multigraphs derived from $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ by deleting (not removing) the same leaf are in fact equivalent. (Recall that the difference between deleting and removing a vertex $v$ is that removing $v$ involves the extra step of suppressing any degree-2 vertices left after deleting $v$.)

Lemma 4

For $i \in [r]$, let $G_{\mathrm{even}} - x_i$ be the graph derived from $G_{\mathrm{even}}$ by deleting $x_i$ and its incident edge, and similarly let $G_{\mathrm{odd}} - x_i$ be the graph derived from $G_{\mathrm{odd}}$ by deleting $x_i$ and its incident edge. Then $G_{\mathrm{even}} - x_i$ and $G_{\mathrm{odd}} - x_i$ are equivalent.

Proof

We will describe a set of isomorphisms between subgraphs of $G_{\mathrm{even}} - x_i$ and $G_{\mathrm{odd}} - x_i$, then combine them to produce an isomorphism between $G_{\mathrm{even}} - x_i$ and $G_{\mathrm{odd}} - x_i$. Each isomorphism will be one that maps vertex $z_{w,j}$ to $z_{w^i,j}$.

For each $w \in B_r^{\mathrm{even}}$, Observation 1 implies that there exists an isomorphism $f$ between $\mathrm{Cat}(w)$ and $\mathrm{Cat}(w^i)$ such that $f(z_{w,j}) = z_{w^i,j}$ for each $j \in [r]$. For each $j \in [r] \setminus \{i\}$ and $h \in \{0, 1\}$, Lemma 3 implies that there exists an isomorphism $f$ between $\mathrm{Lex}(j,h)^{\mathrm{even}}$ and $\mathrm{Lex}(j,h)^{\mathrm{odd}}$, such that $f(v_{j,h}^{\mathrm{even}}) = v_{j,h}^{\mathrm{odd}}$ and $f(z_{w,j}) = z_{w^i,j}$ for each leaf $z_{w,j}$. Finally, for each $h \in \{0, 1\}$, Lemma 3 implies that there exists an isomorphism $f$ between $\mathrm{Lex}(i,h)^{\mathrm{even}}$ and $\mathrm{Lex}(i,1-h)^{\mathrm{odd}}$, such that $f(v_{i,h}^{\mathrm{even}}) = v_{i,1-h}^{\mathrm{odd}}$ and $f(z_{w,i}) = z_{w^i,i}$ for each leaf $z_{w,i}$.

Observe that all of these isomorphisms agree on $z_{w,j}$ for any $w \in B_r^{\mathrm{even}}$, $j \in [r]$ (that is, they each map this vertex to $z_{w^i,j}$), and such vertices are the only vertices that are shared between caterpillars and lexicographic trees. Thus we can combine these isomorphisms into a single edge-preserving function $f$ that maps every non-leaf vertex of $G_{\mathrm{even}} - x_i$ to a non-leaf vertex of $G_{\mathrm{odd}} - x_i$. Moreover, as each caterpillar and lexicographic tree in $G_{\mathrm{even}} - x_i$ is mapped to a different caterpillar or lexicographic tree in $G_{\mathrm{odd}} - x_i$, this function is a bijection. Finally, set $f(x_j) = x_j$ for every $j \in [r] \setminus \{i\}$. Then $f$ is now a bijective function from $V(G_{\mathrm{even}} - x_i)$ to $V(G_{\mathrm{odd}} - x_i)$ that is both edge-preserving and label-preserving.

We note that we cannot extend the above graph isomorphism between $G_{\mathrm{even}} - x_i$ and $G_{\mathrm{odd}} - x_i$ to an isomorphism between $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ by setting $f(x_i) = x_i$, because $f(v_{i,0}^{\mathrm{even}}) = v_{i,1}^{\mathrm{odd}}$, and so there would be no edge between $x_i = f(x_i)$ and $f(v_{i,0}^{\mathrm{even}}) = v_{i,1}^{\mathrm{odd}}$ in $G_{\mathrm{odd}}$.

In fact, the next lemma shows that there is no isomorphism between $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$.

Lemma 5

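Let $\mathbf{0}$ denote the all-0 sequence from $B_r$. For two vertices $a, b$ in $G_{\mathrm{even}}$, let $\mathrm{dist}_{\mathrm{even}}(a,b)$ denote the distance between $a$ and $b$ in $G_{\mathrm{even}}$. Similarly for two vertices $a, b$ in $G_{\mathrm{odd}}$, let $\mathrm{dist}_{\mathrm{odd}}(a,b)$ denote the distance between $a$ and $b$ in $G_{\mathrm{odd}}$. Then for any vertex $a$ in $G_{\mathrm{odd}}$:

  1. If $\mathrm{dist}_{\mathrm{odd}}(a,x_1) = \mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1)$ then $a = u_w$ for some $w \in B_r^{\mathrm{odd}}$.

  2. If $a = u_w$ for some $w \in B_r^{\mathrm{odd}}$ then there exists $i \in [r]$ such that $\mathrm{dist}_{\mathrm{odd}}(a,x_i) > \mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_i)$.

This holds even if we suppress all degree-2 vertices in $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$.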

Proof

We consider the two parts of the claim separately.

  1. We first calculate the value of $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1)$. Recall that in $G_{\mathrm{even}}$, $x_1$ is adjacent to the root $v_{1,0}^{\mathrm{even}}$ of $\mathrm{Lex}(1,0)^{\mathrm{even}}$, and (by definition) every leaf of $\mathrm{Lex}(1,0)^{\mathrm{even}}$ has distance $r-2$ from $v_{1,0}^{\mathrm{even}}$. As $u_{\mathbf{0}}$ is adjacent to a leaf $z_{\mathbf{0},1}$ of $\mathrm{Lex}(1,0)^{\mathrm{even}}$, it follows that $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1) = 1 + (r-2) + 1 = r$ (there is no shorter path from $u_{\mathbf{0}}$ to $x_1$, as any path must pass through $z_{w,1}$ for some $w$).

    As all leaves in $\mathrm{Lex}(1,0)^{\mathrm{odd}}$ have distance $r-2$ from $v_{1,0}^{\mathrm{odd}}$ in $G_{\mathrm{odd}}$, and therefore distance $r-1 = \mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1) - 1$ from $x_1$, it follows that the only vertices in $G_{\mathrm{odd}}$ of distance $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1)$ from $x_1$ are those which are not in $\mathrm{Lex}(1,0)^{\mathrm{odd}}$ but adjacent to a leaf $z_{w,1}$ of $\mathrm{Lex}(1,0)^{\mathrm{odd}}$. By construction, all such vertices are $u_w$ for some $w \in B_r^{\mathrm{odd}}$ such that $w_1 = 0$.

    When degree-2 vertices are suppressed, a similar argument holds, except that $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1)$ is reduced by 1 (as we suppress $z_{\mathbf{0},1}$). It remains the case that the vertices in $G_{\mathrm{odd}}$ of distance $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_1)$ from $x_1$ are those which are adjacent to a vertex from $\mathrm{Lex}(1,0)^{\mathrm{odd}}$ but not in $\mathrm{Lex}(1,0)^{\mathrm{odd}}$ themselves, and again all such vertices are $u_w$ for some $w \in B_r^{\mathrm{odd}}$.

  2. For any $w \in B_r^{\mathrm{odd}}$, there exists $i \in [r]$ such that $w_i = 1$. Any path from $u_w$ to $x_i$ must pass through a vertex $z_{w',i}$ where $w'_i = 0$, and all such vertices have equal distance from $x_i$. Thus, it is enough to show that the distance in $G_{\mathrm{odd}}$ between $u_w$ and any such $z_{w',i}$ is greater than the distance between $u_{\mathbf{0}}$ and $z_{\mathbf{0},i}$ in $G_{\mathrm{even}}$.

    To see this, consider a path $P$ between $u_w$ and $z_{w',i}$. As $w'_i = 0$, we note that $w' \neq w$ and so $P$ must traverse at least one lexicographic tree. We construct a mapping $g : V(P) \to V(\mathrm{Cat}(\mathbf{0}))$, as follows. For any $a \in V(P)$, if $a$ is in $\mathrm{Cat}(w'')$ for any $w'' \in B_r^{\mathrm{odd}}$ (including $w$ or $w'$), set $g(a) = f(a)$, where $f$ is the isomorphism between $\mathrm{Cat}(w'')$ and $\mathrm{Cat}(\mathbf{0})$ such that $f(u_{w''}) = u_{\mathbf{0}}$ and $f(z_{w'',j}) = z_{\mathbf{0},j}$ for all $j \in [r]$ (such an isomorphism exists by Observation 1). Otherwise, it must be the case that $a \in V(\mathrm{Lex}(j,h)^{\mathrm{odd}})$ for some $j \in [r]$, $h \in \{0, 1\}$. In this case, set $g(a) = z_{\mathbf{0},j}$. Let $Q$ be the set of all $g(a)$ for any vertex $a$ in $P$. Observe that for any vertices $a, b$ in $P$, if $a$ and $b$ are adjacent then either $g(a) = g(b)$ or $g(a)$ and $g(b)$ are adjacent. It follows that $Q$ forms a connected set of vertices in $\mathrm{Cat}(\mathbf{0})$, and thus $Q$ contains a path between $g(u_w) = u_{\mathbf{0}}$ and $g(z_{w',i}) = z_{\mathbf{0},i}$. Moreover, as $P$ must traverse at least one lexicographic tree, there are consecutive vertices in $P$ that are mapped to the same vertex by $g$. It follows that the path in $Q$ is shorter than the path $P$, as required. It follows that the distance between $u_w$ and $x_i$ is greater than $\mathrm{dist}_{\mathrm{even}}(u_{\mathbf{0}},x_i)$. We note that a similar argument applies even when vertices of degree 2 are suppressed.

Corollary 1

$G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ are not equivalent.

The next lemma will be used to show that when we suppress degree-2 vertices in $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$, the resulting graphs $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are networks.
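For $r = 4$ the non-equivalence can also be confirmed by direct computation. The Python sketch below assembles the caterpillars, the lexicographic trees and the labelled leaves into one graph per parity, then compares leaf-distance vectors as in Lemma 5. It is an illustration under our own assumptions: the tuple names of vertices, the pairing scheme used to build each lexicographic tree, and the function names `build_G`, `distances` and `leaf_profile` are not taken from the paper.

```python
from collections import deque
from itertools import product

def build_G(r, parity):
    """Adjacency sets of G_even (parity=0) or G_odd (parity=1), for r >= 4."""
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    seqs = sorted(w for w in product((0, 1), repeat=r) if sum(w) % 2 == parity)
    for w in seqs:                                   # caterpillar Cat(w)
        add_edge(("u", w), ("z", w, 1))
        add_edge(("u", w), ("z", w, 2))
        add_edge(("u", w), ("y", w, 1))
        add_edge(("y", w, r - 3), ("z", w, r - 1))
        add_edge(("y", w, r - 3), ("z", w, r))
        for i in range(1, r - 3):
            add_edge(("y", w, i), ("z", w, i + 2))
            add_edge(("y", w, i), ("y", w, i + 1))
    for i in range(1, r + 1):
        for h in (0, 1):                             # lexicographic tree Lex(i,h)
            level = [("z", w, i) for w in seqs if w[i - 1] == h]
            height = 0
            while len(level) > 1:
                height += 1
                parents = [("lex", i, h, height, k) for k in range(len(level) // 2)]
                for k, parent in enumerate(parents):
                    add_edge(parent, level[2 * k])
                    add_edge(parent, level[2 * k + 1])
                level = parents
            if h == 0:
                add_edge(("x", i), level[0])         # x_i is adjacent to the root of Lex(i,0)
    return adj

def distances(adj, source):
    """Breadth-first search distances from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def leaf_profile(adj, a, r):
    """Distances from vertex a to x_1, ..., x_r (None if unreachable)."""
    d = distances(adj, a)
    return tuple(d.get(("x", i)) for i in range(1, r + 1))

r = 4
G_even, G_odd = build_G(r, 0), build_G(r, 1)
target = leaf_profile(G_even, ("u", (0,) * r), r)
matches = [a for a in G_odd if leaf_profile(G_odd, a, r) == target]
print(target, matches)
```

The first entry of the computed vector equals $r$, in agreement with the calculation in part 1 of the proof of Lemma 5, and the list of matching vertices in the odd graph is empty.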

Lemma 6

In both $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$, there exists a single blob containing all non-leaf vertices.

Proof

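Observe that any non-leaf vertex is part of a path between $u_w$ and $v_{i,h}$ for some $w \in B_r$, $i \in [r]$, $h \in \{0, 1\}$. Furthermore every vertex $v_{i,h}$ appears on a path between $u_w$ and $u_{w'}$ for some $w, w'$. Therefore it is enough to show that for any $w \neq w'$, $u_w$ and $u_{w'}$ appear in the same blob.

Let $\mathbf{00}, \mathbf{01}, \mathbf{11}, \mathbf{10}$ be four sequences in $B_r^{\mathrm{even}}$ such that $(\mathbf{hk})_1 = h$ and $(\mathbf{hk})_2 = k$ for $h, k \in \{0, 1\}$ (such sequences exist as $r > 3$).

Then there exists a cycle

$u_{\mathbf{00}},\ z_{\mathbf{00},1},\ \ldots,\ z_{\mathbf{01},1},\ u_{\mathbf{01}},\ z_{\mathbf{01},2},\ \ldots,\ z_{\mathbf{11},2},\ u_{\mathbf{11}},\ z_{\mathbf{11},1},\ \ldots,\ z_{\mathbf{10},1},\ u_{\mathbf{10}},\ z_{\mathbf{10},2},\ \ldots,\ z_{\mathbf{00},2},\ u_{\mathbf{00}}.$

Here the path between $z_{\mathbf{00},1}$ and $z_{\mathbf{01},1}$ passes through $\mathrm{Lex}(1,0)^{\mathrm{even}}$, the path between $z_{\mathbf{01},2}$ and $z_{\mathbf{11},2}$ passes through $\mathrm{Lex}(2,1)^{\mathrm{even}}$, the path between $z_{\mathbf{11},1}$ and $z_{\mathbf{10},1}$ passes through $\mathrm{Lex}(1,1)^{\mathrm{even}}$, and the path between $z_{\mathbf{10},2}$ and $z_{\mathbf{00},2}$ passes through $\mathrm{Lex}(2,0)^{\mathrm{even}}$. See Fig. 6 for an example when $\mathbf{00} = 0000$, $\mathbf{01} = 0101$, $\mathbf{11} = 1100$ and $\mathbf{10} = 1001$.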

Fig. 6. A cycle containing the vertices $u_{0000}, u_{0101}, u_{1001}, u_{1100}$ in $G_{\mathrm{even}}$ for the case $r=4$
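The example sequences used for Fig. 6 can be checked mechanically. The following is a minimal sketch, not part of the proof. It assumes that the even set of sequences consists of the binary sequences of length r with an even number of 1s, and that Lex(j,h) joins the vertices z with index (w,j) for which the j-th entry of w equals h, as the description of the cycle suggests. The names `example`, `is_even`, `matches_name` and `hop_ok` are hypothetical helpers introduced only for this illustration.

```python
# Sequences from the Fig. 6 example, keyed by their two-symbol names "hk".
example = {"00": "0000", "01": "0101", "11": "1100", "10": "1001"}


def is_even(seq: str) -> bool:
    """Assumed membership test for the even set: an even number of 1s."""
    return seq.count("1") % 2 == 0


def matches_name(name: str, seq: str) -> bool:
    """The sequence named 'hk' must have first entry h and second entry k."""
    return seq[0] == name[0] and seq[1] == name[1]


# Each hop (w, w', j, h) of the cycle passes through Lex(j, h), which
# (by assumption) requires the j-th entries of w and w' to both equal h.
hops = [("00", "01", 1, 0), ("01", "11", 2, 1),
        ("11", "10", 1, 1), ("10", "00", 2, 0)]


def hop_ok(a: str, b: str, j: int, h: int) -> bool:
    """Check that both named sequences have j-th entry (1-indexed) equal to h."""
    return example[a][j - 1] == example[b][j - 1] == str(h)


sequences_ok = all(is_even(s) and matches_name(n, s) for n, s in example.items())
cycle_ok = all(hop_ok(*hop) for hop in hops)
print(sequences_ok, cycle_ok)
```

Both checks succeed for the four sequences of the example, so each of the four lexicographic trees named in the proof is indeed available for the corresponding hop.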

As $u_{\mathbf{00}}, u_{\mathbf{01}}, u_{\mathbf{11}}, u_{\mathbf{10}}$ appear on a cycle, they appear in the same blob of $G_{\mathrm{even}}$. Moreover, as any vertex $u_w$ could fill the role of one of $u_{\mathbf{00}}, u_{\mathbf{01}}, u_{\mathbf{11}}, u_{\mathbf{10}}$, we have that all $u_w$ appear in the same blob. A similar argument holds for $G_{\mathrm{odd}}$.

Now we are ready to construct the networks $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$: Let $N_{\mathrm{even}}$ be derived from $G_{\mathrm{even}}$ by suppressing all vertices of degree 2. Similarly, let $N_{\mathrm{odd}}$ be derived from $G_{\mathrm{odd}}$ by suppressing all vertices of degree 2 (see Fig. 7 for the networks when $r=4$).

Lemma 7

$N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are networks on $X$.

Proof

We show that $N_{\mathrm{even}}$ is a network on $X$ (the proof for $N_{\mathrm{odd}}$ is similar). By construction, all vertices in $N_{\mathrm{even}}$ have degree 1 or 3 and the leaves are bijectively labelled with the elements of $X$. It remains to show that contracting each blob into a single vertex gives a tree with no degree-2 vertices, which we will do by showing that $N_{\mathrm{even}}$ has only one blob. By Lemma 6, all non-leaf vertices in $G_{\mathrm{even}}$ are part of the same blob in $G_{\mathrm{even}}$. Observe that if two degree-3 vertices are in the same blob, then they are still in the same blob after suppressing degree-2 vertices. Thus, all non-leaf vertices in $N_{\mathrm{even}}$ are part of the same blob, and thus $N_{\mathrm{even}}$ has a single blob, as required.

Lemma 8

$N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are not equivalent.

Proof

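As $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are derived from $G_{\mathrm{even}}$ and $G_{\mathrm{odd}}$ by suppressing degree-2 vertices, Lemma 5 implies that there is no vertex in $N_{\mathrm{odd}}$ that has the same distance from each leaf $x_i$ as $u_{\mathbf{0}}$ has from $x_i$ in $N_{\mathrm{even}}$.

This implies that there is no isomorphism between $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$: if $f$ were an edge-preserving and label-preserving bijection, then for every $i \in [r]$ the distance between $u_{\mathbf{0}}$ and $x_i$ in $N_{\mathrm{even}}$ would be equal to the distance between $f(u_{\mathbf{0}})$ and $f(x_i)=x_i$ in $N_{\mathrm{odd}}$.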
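The invariant used in this proof, the vector of distances from a vertex to each labelled leaf, is easy to compute by breadth-first search. The sketch below is an illustration only: it uses two small toy graphs, not the networks constructed in this paper, and the names `bfs_distances`, `leaf_profiles`, `star` and `fork` are hypothetical. If some vertex of one graph has a distance vector that no vertex of the other graph has, then no label-preserving isomorphism can exist.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for nb in adj[v]:
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return dist


def leaf_profiles(adj, leaves):
    """Set of leaf-distance vectors, one per non-leaf vertex.

    Assumes the graph is connected, so every leaf is reachable.
    """
    ordered = sorted(leaves)
    profiles = set()
    for v in adj:
        if v in leaves:
            continue
        dist = bfs_distances(adj, v)
        profiles.add(tuple(dist[x] for x in ordered))
    return profiles


leaves = {"x1", "x2", "x3"}

# Toy graph 1: a single internal vertex adjacent to all three leaves.
star = {"c": ["x1", "x2", "x3"], "x1": ["c"], "x2": ["c"], "x3": ["c"]}

# Toy graph 2: x1 and x2 hang off a, while x3 hangs off b.
fork = {"a": ["x1", "x2", "b"], "b": ["a", "x3"],
        "x1": ["a"], "x2": ["a"], "x3": ["b"]}

target = (1, 1, 1)  # profile of vertex c in the first toy graph
print(target in leaf_profiles(star, leaves))
print(target in leaf_profiles(fork, leaves))
```

In the toy example the vertex c has profile (1, 1, 1), which no vertex of the second graph attains, so the two toy graphs cannot be equivalent. This mirrors the role played by the vertex with all-zero index in the argument above.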

Lemma 9

For each $i \in [r]$, $(N_{\mathrm{even}})_{x_i}$ and $(N_{\mathrm{odd}})_{x_i}$ are equivalent.

Proof

Recall the definitions of $G_{\mathrm{even}}-x_i$ and $G_{\mathrm{odd}}-x_i$, and observe that $(N_{\mathrm{even}})_{x_i}$ (respectively, $(N_{\mathrm{odd}})_{x_i}$) can be derived from $G_{\mathrm{even}}-x_i$ ($G_{\mathrm{odd}}-x_i$) by suppressing degree-2 vertices. By Lemma 4, there exists an isomorphism $f$ between $G_{\mathrm{even}}-x_i$ and $G_{\mathrm{odd}}-x_i$. So define a bijective function $f': V((N_{\mathrm{even}})_{x_i}) \to V((N_{\mathrm{odd}})_{x_i})$ by setting $f'(a)=f(a)$ for all $a \in V((N_{\mathrm{even}})_{x_i})$. Note that if $a$ does not have degree 2 in $G_{\mathrm{even}}-x_i$, $f(a)$ also does not have degree 2 in $G_{\mathrm{odd}}-x_i$. Thus if $a \in V((N_{\mathrm{even}})_{x_i})$ then $f'(a)=f(a) \in V((N_{\mathrm{odd}})_{x_i})$, and so $f'$ is well-defined.

By construction, $f'$ is label-preserving. To see that $f'$ is edge-preserving, consider some $a,b \in V((N_{\mathrm{even}})_{x_i})$. Observe that the number of edges between $a$ and $b$ in $(N_{\mathrm{even}})_{x_i}$ is equal to the number of paths between $a$ and $b$ in $G_{\mathrm{even}}-x_i$ whose internal vertices have degree 2. As $f$ is an isomorphism, this is equal to the number of paths between $f(a)$ and $f(b)$ in $G_{\mathrm{odd}}-x_i$ whose internal vertices have degree 2, which in turn is equal to the number of edges between $f'(a)$ and $f'(b)$ in $(N_{\mathrm{odd}})_{x_i}$. Thus, $f'$ is edge-preserving, and so $f'$ is an isomorphism.
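The counting observation in this proof (edges after suppression correspond to paths whose internal vertices all have degree 2) can be illustrated on a small multigraph. The following is a minimal sketch under our own conventions, not the construction of the paper: a multigraph is stored as a list of vertex pairs, and `suppress_degree_two` is a hypothetical helper that repeatedly replaces a degree-2 vertex and its two incident edges by a single edge.

```python
from collections import Counter


def suppress_degree_two(edges):
    """Return the multigraph obtained by suppressing all degree-2 vertices.

    `edges` is a list of (u, v) pairs; parallel edges are allowed.
    """
    edges = [tuple(e) for e in edges]
    while True:
        degree = Counter()
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        # A suppressible vertex has degree 2 and is not the endpoint of a loop.
        target = next(
            (x for x in degree
             if degree[x] == 2
             and all(u != v for u, v in edges if x in (u, v))),
            None,
        )
        if target is None:
            return edges
        incident = [e for e in edges if target in e]
        rest = [e for e in edges if target not in e]
        # The two neighbours of `target` become joined by one new edge.
        n1, n2 = (u if v == target else v for u, v in incident)
        edges = rest + [(n1, n2)]


# Toy graph: a and b are joined by two paths through degree-2 vertices
# (a-p-b and a-q-r-b), and each carries one leaf.
toy = [("a", "p"), ("p", "b"), ("a", "q"), ("q", "r"), ("r", "b"),
       ("a", "la"), ("b", "lb")]

reduced = suppress_degree_two(toy)
parallel_ab = sum(1 for u, v in reduced if {u, v} == {"a", "b"})
print(len(reduced), parallel_ab)
```

In the toy graph there are exactly two paths between a and b whose internal vertices have degree 2, and after suppression a and b are joined by exactly two parallel edges, in line with the observation used above.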

Lemmas 7, 8 and 9 imply the following theorem:

Theorem 2

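For any $r \geq 4$, there exist networks $N_{\mathrm{even}}$, $N_{\mathrm{odd}}$ on $X$ with $|X|=r$, such that $N_{\mathrm{odd}}$ is a leaf-reconstruction of $N_{\mathrm{even}}$, but $N_{\mathrm{even}}$ and $N_{\mathrm{odd}}$ are not equivalent. Thus, $N_{\mathrm{even}}$ is not leaf-reconstructible.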

Concluding remarks

Although we have shown that not all phylogenetic networks with five or more leaves are leaf-reconstructible, this does not mean that reconstructing networks from subnetworks is completely hopeless. There are already some positive results for interesting restricted network classes (Van Iersel and Moulton 2018). Moreover, since the presented counter-examples are very complex, it is certainly possible that other reasonable network classes are also leaf-reconstructible.

For example, while it is known that all networks with at least five leaves and $|E|-|V| \leq 3$ are leaf-reconstructible, the counter-examples presented in this paper have $|E|-|V| = 2^{r-1}(r-1)-2r$, with $r$ the number of leaves. Hence, whether networks with $3 < |E|-|V| < 2^{r-1}(r-1)-2r$ are leaf-reconstructible is still open. In particular, is it possible to construct counter-examples where $|E|-|V|$ is bounded by a linear function of the number of leaves?
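To give a feeling for the size of this gap, the following short sketch evaluates the quantity 2^(r-1)(r-1) - 2r for small r. The function name `counterexample_gap` is hypothetical, and the code only tabulates the formula stated above; it does not build the networks.

```python
def counterexample_gap(r: int) -> int:
    """Value of |E| - |V| for the counter-examples on r leaves, per the formula."""
    return 2 ** (r - 1) * (r - 1) - 2 * r


# Tabulate the value for the first few admissible numbers of leaves.
for r in range(4, 9):
    print(r, counterexample_gap(r))
```

For r=4 this gives 16 and for r=5 it gives 54, so already for five leaves the open range 3 < |E|-|V| < 54 is substantial, and the upper end grows exponentially in r.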

Footnotes

1

It was previously known that networks on r=4 leaves are not leaf-reconstructible in general. We nevertheless include the case r=4 in our paper, as it allows us to give simpler figures than for the r=5 case.

2

We note that in this section and next, we will often give names to particular vertices in the graphs we construct. This is done to differentiate between vertices, in order to aid in the description of the construction and help define isomorphisms. However, this is not the same as labelling the vertices; the only labelling that will occur is the labelling of leaves with elements of X.

LvI and MJ were funded in part by the Netherlands Organization for Scientific Research (NWO), including Vidi Grant 639.072.602 and LvI also partly by the 4TU Applied Mathematics Institute. PLE was supported in part by the National Research, Development and Innovation Office—NKFIH Grant K 116769 and KH 126853.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Péter L. Erdős, Email: erdos.peter@renyi.mta.hu

Leo van Iersel, Email: L.J.J.vanIersel@tudelft.nl.

Mark Jones, Email: markelliotlloyd@gmail.com.

References

  1. Bapteste E, van Iersel L, Janke A, Kelchner S, Kelk S, McInerney JO, Morrison DA, Nakhleh L, Steel M, Stougie L, et al. Networks: expanding evolutionary thinking. Trends Genet. 2013;29(8):439–441. doi: 10.1016/j.tig.2013.05.007.
  2. Bondy JA, Hemminger RL. Graph reconstruction: a survey. J Graph Theory. 1977;1(3):227–268. doi: 10.1002/jgt.3190010306.
  3. Huber KT, van Iersel L, Moulton V, Wu T. How much information is needed to infer reticulate evolutionary histories? Syst Biol. 2014;64(1):102–111. doi: 10.1093/sysbio/syu076.
  4. Pardi F, Scornavacca C. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLOS Comput Biol. 2015;11(4):e1004135. doi: 10.1371/journal.pcbi.1004135.
  5. Thatte BD. Combinatorics of pedigrees I: counterexamples to a reconstruction question. SIAM J Discrete Math. 2008;22(3):961–970. doi: 10.1137/060675964.
  6. van Iersel L, Moulton V. Leaf-reconstructibility of phylogenetic networks. SIAM J Discrete Math. 2018;32(3):2047–2066. doi: 10.1137/17M1111930.
  7. van Iersel L, Kelk S, Scornavacca C. Kernelizations for the hybridization number problem on multiple nonbinary trees. J Comput Syst Sci. 2016;82(6):1075–1089. doi: 10.1016/j.jcss.2016.03.006.
  8. Whidden C, Beiko RG, Zeh N. Fixed-parameter algorithms for maximum agreement forests. SIAM J Comput. 2013;42(4):1431–1466. doi: 10.1137/110845045.

Articles from Journal of Mathematical Biology are provided here courtesy of Springer
