Abstract
Trees with labelled leaves and with all other vertices of degree three play an important role in systematic biology and other areas of classification. A classical combinatorial result ensures that such trees can be uniquely reconstructed from the distances between the leaves (when the edges are given any strictly positive lengths). Moreover, a linear number of these pairwise distance values suffices to determine both the tree and its edge lengths. A natural set of pairs of leaves is provided by any ‘triplet cover’ of the tree (based on the fact that each non-leaf vertex is the median vertex of three leaves). In this paper we describe a number of new results concerning triplet covers of minimum size. In particular, we characterize such covers in terms of an associated graph being a 2-tree. Also, we show that minimum triplet covers are ‘shellable’ and thereby provide a set of pairs for which the inter-leaf distance values will uniquely determine the underlying tree and its associated branch lengths.
Keywords: Trees, Median vertex, 2-Trees, Shellability, Reconstruction
Introduction
Trees play a central role in systematic biology, and other areas of classification, such as linguistics. It is often assumed that such a tree T has a labelled leaf set X, that all vertices have degree 1 or at least three, and that there is an assignment of a positive real-valued length to each edge of T.
A classical and important result from the 1960s and 1970s asserts that any such tree T with edge lengths is uniquely determined from the induced leaf-to-leaf distances between each pair of elements of X. This result is the basis of widely-used methods for inferring trees from distance data, such as the popular ‘Neighbor-Joining’ algorithm (Saitou and Nei 1987). Moreover, when T is binary (each non-leaf vertex has degree 3) then we do not require distance values for all $\binom{n}{2}$ of the pairs from X (where $n = |X|$), since just $2n-3$ carefully selected pairs of leaves suffice to determine T and its edge lengths [see Guénoche et al. (2004); more recent results appear in Dress et al. (2012), motivated by the irregular distribution of genes across species in biological data].
This value of $2n-3$ cannot be made any smaller, since a binary unrooted tree with n leaves has $2n-3$ edges, and the inter-leaf distances are linear combinations of the corresponding edge lengths (so, by linear algebra, these values cannot be uniquely determined by fewer than $2n-3$ equations).
There is a particularly natural way to select a subset of $\binom{X}{2}$ for T when T is binary. Since each non-leaf vertex is incident with three subtrees of T, let us (i) select a leaf from each subtree, (ii) consider the three pairs of leaves we can form from this triple, and then (iii) take the union of these sets of pairs over all non-leaf vertices of T. This process produces a ‘triplet cover’ of T (defined more precisely below).
A triplet cover need not be of this minimum size (i.e. of size $2n-3$), but in this paper we characterize when it is. Also, we show that in that case the resulting triplet cover is ‘shellable’, which implies that the inter-leaf distances defined on these pairs uniquely determine the tree and its edge lengths. These and other results obtained along the way complement recent work on phylogenetic ‘lasso’ sets (Dress et al. 2012; Huber and Steel 2014), as well as a Hall-type characterization of the median function on trees in Dress and Steel (2009).
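Steps (i)–(iii) can be sketched in Python as follows. This is a minimal illustration, not code from the paper. It makes several assumptions: the tree is an adjacency dict whose leaves are the degree-1 vertices, the interior vertex names u, v, w are taken from Fig. 1ii, and leaves carry single-character labels. The helper names `leaf_components` and `natural_triplet_cover` are hypothetical. Step (i) is made deterministic by picking the smallest label in each component.

```python
from itertools import combinations

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def leaf_components(tree, v):
    """Leaf sets of the components of T - v, one per neighbour of v."""
    comps = []
    for start in tree[v]:
        seen, stack, leaves = {v, start}, [start], set()
        while stack:
            node = stack.pop()
            if len(tree[node]) == 1:
                leaves.add(node)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(leaves)
    return comps

def natural_triplet_cover(tree, choose=min):
    """(i) pick one leaf per component of T - v, (ii) form the three pairs,
    (iii) take the union over all interior (degree-3) vertices v."""
    cover = set()
    for v, nbrs in tree.items():
        if len(nbrs) == 3:
            triple = [choose(comp) for comp in leaf_components(tree, v)]
            cover.update(frozenset(p) for p in combinations(triple, 2))
    return cover

cover = natural_triplet_cover(FIG1)
print(sorted("".join(sorted(p)) for p in cover))
# ['ab', 'ac', 'ad', 'ae', 'bc', 'cd', 'de']
```

With this particular choice of leaves, the result for the tree of Fig. 1i has 7 = 2·5 − 3 pairs. Other choices in step (i) can produce more pairs (for instance picking abe, bcd and ade gives 8).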
We begin with some definitions.
Definitions
Let X be a finite set with $|X| \ge 3$. We denote elements in $\binom{X}{2}$ and $\binom{X}{3}$ also by ab and abc, respectively, where $a, b, c \in X$ are distinct. We refer to the elements in $\binom{X}{3}$ as triples.
A (binary) phylogenetic X-tree is an unrooted tree which has leaf set X, and for which each non-leaf vertex is unlabelled and of degree three. We let B(X) denote the set of binary phylogenetic X-trees (two such trees are regarded as equivalent if there is a graph isomorphism between them that maps leaf x in one tree to leaf x in the other tree, for all $x \in X$). In evolutionary biology, the set X usually corresponds to some collection of species or taxa.
Note that a phylogenetic X-tree T must contain at least one cherry $\{a,b\}$, that is, a pair of leaves a and b that are adjacent to the same interior vertex of T. Moreover, if $|X| \ge 4$ then each tree has at least two cherries that are vertex disjoint from each other; if T has exactly two cherries we say it is a caterpillar tree [every tree in B(X) is a caterpillar when $|X| = 4$ or $|X| = 5$]. When $|X| = 4$, we say that $T \in B(X)$ is a quartet, and if the two cherries of this tree are (say) $\{a,b\}$ and $\{c,d\}$ then we denote T by ab|cd.
We let $\mathring{V}(T)$ denote the set of interior vertices of T. Given $x \in X$ where $|X| \ge 4$, we let $T-x$ denote the phylogenetic $(X-\{x\})$-tree which is obtained by removing the leaf x (and its incident edge) from T and suppressing the resulting degree 2 vertex.
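A minimal Python sketch of the operation $T-x$ follows. It uses the same adjacency-dict encoding of the tree of Fig. 1i as an assumption, with interior vertices u, v, w. `remove_leaf` is a hypothetical helper name, and it returns a modified copy rather than changing T.

```python
# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def remove_leaf(tree, x):
    """Return a copy of T - x: delete leaf x and its edge, then suppress the
    former neighbour of x, which now has degree 2."""
    t = {node: set(nbrs) for node, nbrs in tree.items()}
    (p,) = t.pop(x)            # the unique neighbour of the leaf x
    t[p].discard(x)
    if len(t[p]) == 2:         # suppress p by joining its two remaining neighbours
        y, z = t.pop(p)
        t[y].discard(p)
        t[z].discard(p)
        t[y].add(z)
        t[z].add(y)
    return t

t = remove_leaf(FIG1, "a")
print(sorted(t["v"]))  # ['b', 'c', 'w']
```

Removing a suppresses u, so b becomes adjacent to v and the result is the quartet bc|de.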
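Suppose that $\mathcal{C}$ is a subset of $\binom{X}{2}$, and $T \in B(X)$. We say that a triple abc in $\binom{X}{3}$ supports a vertex $v \in \mathring{V}(T)$ in T (relative to $\mathcal{C}$) if we can select the leaves a, b, c, one from each connected component of the graph obtained by removing v and its incident edges from T, such that $ab, bc, ac \in \mathcal{C}$. We call a subset $\mathcal{C} \subseteq \binom{X}{2}$ a triplet cover for T if for each vertex $v \in \mathring{V}(T)$ there is some triple in $\binom{X}{3}$ that supports v (relative to $\mathcal{C}$). Note that in this case every element of X is contained in some element of $\mathcal{C}$. Given a non-empty subset $\mathcal{C} \subseteq \binom{X}{2}$, we define the cover graph $\Gamma(\mathcal{C})$ (of $\mathcal{C}$) to be the graph with vertex set X and edge set $\mathcal{C}$.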
We illustrate these concepts in Fig. 1. For the binary phylogenetic X-tree in Fig. 1i (with $X = \{a,b,c,d,e\}$) the vertex v (in Fig. 1ii) is supported by the triple bce (there are three other triples that support v). If u is supported by, say, abc and w by cde then we obtain the triplet cover
$$\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}.$$
The corresponding cover graph $\Gamma(\mathcal{C})$ is shown in Fig. 1iii.
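Given a tree $T \in B(X)$, a triplet cover $\mathcal{C}$ for T is called
minimal if $\mathcal{C} - \{ab\}$ is not a triplet cover for T, for any $ab \in \mathcal{C}$;
minimum if $|\mathcal{C}| \le |\mathcal{C}'|$ for every triplet cover $\mathcal{C}'$ for T.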
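The definitions of support, triplet cover and minimality can be checked mechanically. The sketch below is a hypothetical Python illustration. The helper names and the adjacency-dict encoding are assumptions, and leaf labels are single characters so that `frozenset("ab")` denotes the pair ab. It tests the triplet cover $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$ of the Fig. 1 tree, in which u, v and w are supported by abc, bce and cde respectively.

```python
from itertools import product

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def leaf_components(tree, v):
    """Leaf sets of the components of T - v, one per neighbour of v."""
    comps = []
    for start in tree[v]:
        seen, stack, leaves = {v, start}, [start], set()
        while stack:
            node = stack.pop()
            if len(tree[node]) == 1:
                leaves.add(node)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(leaves)
    return comps

def support(tree, cover, v):
    """Triples abc, one leaf per component of T - v, with ab, bc, ac in cover."""
    found = set()
    for a, b, c in product(*leaf_components(tree, v)):
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= cover:
            found.add(frozenset((a, b, c)))
    return found

def is_triplet_cover(tree, cover):
    """Every interior (degree-3) vertex has at least one supporting triple."""
    return all(support(tree, cover, v) for v, nbrs in tree.items() if len(nbrs) == 3)

def is_minimal(tree, cover):
    """A triplet cover from which no single pair can be dropped."""
    return is_triplet_cover(tree, cover) and not any(
        is_triplet_cover(tree, cover - {p}) for p in cover)

C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
print(is_triplet_cover(FIG1, C))                      # True
print(is_triplet_cover(FIG1, C - {frozenset("be")}))  # False
print(is_minimal(FIG1, C))                            # True
print(is_minimal(FIG1, C | {frozenset("ad")}))        # False
```

Dropping be leaves v without a supporting triple. Adding the redundant pair ad destroys minimality, since ad can then be removed again.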
These two concepts are different; there exist minimal triplet covers that are not minimum (we describe an example in the final section).
Note that it can be shown that any minimum triplet cover on X must have cardinality $2|X|-3$ [by applying Theorem 1 and Proposition 1 of Dress et al. (2012)]. Moreover, there are various ways to construct triplet covers that are minimum [for example, ‘pointed covers’ (Dress et al. 2012, Theorem 7) and ‘stable triplet covers’ (Huber and Steel 2014, Theorem 1)].
Outline of main results
In this paper, we prove a structural result concerning minimum triplet covers. Namely, we prove that a set $\mathcal{C} \subseteq \binom{X}{2}$ is a minimum triplet cover for a tree $T \in B(X)$ if and only if the associated cover graph $\Gamma(\mathcal{C})$ is a 2-tree (see Theorem 1 and Sect. 5 for the definition of a 2-tree).
Using the concepts that we develop to prove this result, we also give an independent proof [that does not require the notion of phylogenetic ‘lassos’ from Dress et al. (2012)] that any minimum triplet cover on X must have cardinality $2|X|-3$ (Proposition 3). As a corollary of our structural result, we also show that if $\mathcal{C}$ is a minimum triplet cover for T then it is shellable for T (Proposition 4).
This corollary has two important implications. First, it implies [from results in Dress et al. (2012)] that if $\mathcal{C}$ is a minimum triplet cover for T, then T (together with its edge lengths) can be uniquely reconstructed from the tree metric restricted to the pairs in $\mathcal{C}$. Note that this can also be deduced from results in Leclerc and Makarenkov (1998) that relate 2-trees and tree metrics [see also Guénoche et al. (2004)].
Second, the corollary gives an independent proof of Theorem 7 in Dress et al. (2012) and Theorem 1 in Huber and Steel (2014), which state, respectively, that pointed triplet covers and stable triplet covers are shellable.
The support graph
In this section we introduce a graph that can be associated to a triplet cover of a tree. Properties of this graph will be used to help prove our results later on. We begin with some further definitions.
Suppose for the following that $T \in B(X)$. Given a subset $\mathcal{C} \subseteq \binom{X}{2}$ and $v \in \mathring{V}(T)$, we let $\mathcal{S}_{\mathcal{C}}(v)$ be the subset of $\binom{X}{3}$ which contains precisely those triples in $\binom{X}{3}$ that support v (relative to $\mathcal{C}$). We call $\mathcal{S}_{\mathcal{C}}(v)$ the support of v (relative to $\mathcal{C}$). In addition, suppose that $a, b, c \in X$ are pairwise distinct. Then we call the unique vertex of T that simultaneously lies on the shortest path from a to b, from b to c, and from a to c the median of a, b, and c, denoted by $\mathrm{med}_T(a,b,c)$. The following observation linking medians with supports will be useful.
Lemma 1
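Let $T \in B(X)$ and $\mathcal{C} \subseteq \binom{X}{2}$. If $abc \in \mathcal{S}_{\mathcal{C}}(v)$, $v \in \mathring{V}(T)$, then $v = \mathrm{med}_T(a,b,c)$. Moreover, $\mathcal{C}$ is a triplet cover of T if and only if $\mathcal{S}_{\mathcal{C}}(v) \neq \emptyset$ for all $v \in \mathring{V}(T)$.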
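Lemma 1 can be illustrated by computing medians directly. The Python sketch below uses hypothetical helper names and the same assumed adjacency-dict encoding of the Fig. 1 tree. It intersects the three leaf-to-leaf paths. For the supporting triples abc, bce and cde of the example cover $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$, it recovers u, v and w respectively.

```python
from collections import deque

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def tree_path(tree, s, t):
    """Vertices on the unique s-t path, found with BFS parent pointers."""
    parent, queue = {s: None}, deque([s])
    while queue:
        node = queue.popleft()
        for nxt in tree[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    route = []
    while t is not None:
        route.append(t)
        t = parent[t]
    return route

def median(tree, a, b, c):
    """med_T(a, b, c): the single vertex lying on all three pairwise paths."""
    common = (set(tree_path(tree, a, b)) & set(tree_path(tree, b, c))
              & set(tree_path(tree, a, c)))
    (m,) = common  # in a tree the three paths meet in exactly one vertex
    return m

print(median(FIG1, "a", "b", "c"))  # u
print(median(FIG1, "b", "c", "e"))  # v
print(median(FIG1, "c", "d", "e"))  # w
```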
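Now, given a non-empty subset $\mathcal{C} \subseteq \binom{X}{2}$ and some $x \in X$, we put
$$\mathcal{C}-x = \{ab \in \mathcal{C} : x \notin \{a,b\}\}.$$
Put differently, $\mathcal{C}-x$ is the subset of $\mathcal{C}$ obtained by removing from $\mathcal{C}$ precisely those elements in $\mathcal{C}$ which contain x. We also define a bipartite graph $G(\mathcal{C})$ with vertex set $X \cup \mathring{V}(T)$, with edge $\{x,v\}$, $x \in X$, $v \in \mathring{V}(T)$, if $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(v)$. We call $G(\mathcal{C})$ the support graph associated to $\mathcal{C}$. For any vertex p of $G(\mathcal{C})$, we let $d_{\mathcal{C}}(p)$ denote the degree of p in $G(\mathcal{C})$. In Fig. 2ii we illustrate the support graph for the triplet cover given in Fig. 1.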
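To illustrate the support graph, the following hypothetical Python sketch builds $G(\mathcal{C})$ for the cover $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$ of the Fig. 1 tree, that is, the graph illustrated in Fig. 2ii. Helper names and the adjacency-dict encoding are assumptions. An interior vertex with empty support is joined to every leaf, which matches the vacuous reading of "for all $t \in \mathcal{S}_{\mathcal{C}}(v)$".

```python
from collections import Counter
from itertools import product

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def leaf_components(tree, v):
    """Leaf sets of the components of T - v, one per neighbour of v."""
    comps = []
    for start in tree[v]:
        seen, stack, leaves = {v, start}, [start], set()
        while stack:
            node = stack.pop()
            if len(tree[node]) == 1:
                leaves.add(node)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(leaves)
    return comps

def support(tree, cover, v):
    """Triples abc, one leaf per component of T - v, with ab, bc, ac in cover."""
    found = set()
    for a, b, c in product(*leaf_components(tree, v)):
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= cover:
            found.add(frozenset((a, b, c)))
    return found

def support_graph(tree, cover):
    """Edges (x, v) of G(C): leaf x lies in every triple supporting interior v."""
    leaves = {x for x, nbrs in tree.items() if len(nbrs) == 1}
    edges = set()
    for v, nbrs in tree.items():
        if len(nbrs) == 3:
            supp = support(tree, cover, v)
            common = set.intersection(*map(set, supp)) if supp else leaves
            edges.update((x, v) for x in common)
    return edges

C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
edges = support_graph(FIG1, C)
degree = Counter(p for edge in edges for p in edge)
print(sorted(degree.items()))
# [('a', 1), ('b', 2), ('c', 3), ('d', 1), ('e', 2), ('u', 3), ('v', 3), ('w', 3)]
```

Every leaf is joined to its neighbour in T (for instance a to u). For a and d this is their only edge in $G(\mathcal{C})$.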
We now list some properties of .
Proposition 1
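Suppose that $\mathcal{C}$ and $\mathcal{C}'$ are triplet covers of a tree $T \in B(X)$, and that $x \in X$.
(P1) If $v \in \mathring{V}(T)$, then $d_{\mathcal{C}}(v) \le 3$, $d_{\mathcal{C}}(x) \ge 1$ and $|E(G(\mathcal{C}))| \le 3|X|-6$.
(P2) If $\mathcal{C} \subseteq \mathcal{C}'$, then $E(G(\mathcal{C}')) \subseteq E(G(\mathcal{C}))$. In particular, if there exists some $v \in \mathring{V}(T)$ with $d_{\mathcal{C}'}(v) = 3$, then $d_{\mathcal{C}}(v) = 3$.
(P3) If $\mathcal{C}$ is a minimal triplet cover for T, then for all $ab \in \mathcal{C}$, there exists some $v \in \mathring{V}(T)$ such that a, v, b is a path in $G(\mathcal{C})$.
(P4) Suppose that v is the vertex adjacent to x in T. Then $\{x,v\} \in E(G(\mathcal{C}))$. Furthermore $d_{\mathcal{C}}(x) = 1$ if and only if $\{x,v\}$ is the only edge in $G(\mathcal{C})$ that contains x.
(P5) $d_{\mathcal{C}}(x) = 1$ if and only if $\mathcal{C}-x$ is a triplet cover of $T-x$.
(P6) If $d_{\mathcal{C}}(x) = 1$, then $|\mathcal{C}-x| \le |\mathcal{C}|-2$.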
Proof
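(P1): The inequality $d_{\mathcal{C}}(v) \le 3$ follows immediately from the definition of the support of a vertex and the fact that T is binary. The inequality $d_{\mathcal{C}}(x) \ge 1$ follows since $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(u)$ for the vertex u that is adjacent to x in T. The inequality $|E(G(\mathcal{C}))| \le 3|X|-6$ follows from the first inequality and the fact that $T \in B(X)$, so that T has $|X|-2$ interior vertices.
(P2): Suppose that $\{x,v\} \in E(G(\mathcal{C}'))$, $v \in \mathring{V}(T)$. Then $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}'}(v)$. Since $\mathcal{S}_{\mathcal{C}}(v) \subseteq \mathcal{S}_{\mathcal{C}'}(v)$ as $\mathcal{C} \subseteq \mathcal{C}'$, it follows that $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(v)$. Hence, $\{x,v\} \in E(G(\mathcal{C}))$. The second statement is a trivial consequence in light of the inequality $d_{\mathcal{C}}(v) \le 3$ from (P1).
(P3): Suppose for contradiction that there exists some $ab \in \mathcal{C}$ such that for all $v \in \mathring{V}(T)$, a, v, b is not a path in $G(\mathcal{C})$. Then for all $v \in \mathring{V}(T)$ there must exist some $t \in \mathcal{S}_{\mathcal{C}}(v)$ such that $\{a,b\} \not\subseteq t$. Hence, $\mathcal{C} - \{ab\}$ is a triplet cover of T. Since $|\mathcal{C} - \{ab\}| < |\mathcal{C}|$ clearly holds, we obtain a contradiction in view of the minimality of $\mathcal{C}$.
(P4): That $\{x,v\} \in E(G(\mathcal{C}))$ holds is an immediate consequence of the choice of v. If $d_{\mathcal{C}}(x) = 1$, then since $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(v)$, it follows that $\{x,v\}$ is the unique edge of $G(\mathcal{C})$ containing x. The rest of the statement follows immediately.
(P5): Suppose that $\mathcal{C}-x$ is not a triplet cover of $T-x$. Then, by Lemma 1, there exists an interior vertex u of $T-x$ such that $\mathcal{S}_{\mathcal{C}-x}(u) = \emptyset$. Let $u'$ be the vertex in T that corresponds to u in $T-x$. Then as $\mathcal{S}_{\mathcal{C}-x}(u) = \emptyset$, it follows that $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(u')$. Hence $\{x,u'\} \in E(G(\mathcal{C}))$ and, so, $d_{\mathcal{C}}(x) \ge 1$. Moreover, if v is the vertex adjacent to x in T, then $v \neq u'$. By (P4), it follows that $\{x,v\}$ is also an edge in $G(\mathcal{C})$. Therefore $d_{\mathcal{C}}(x) \ge 2$.
Conversely, suppose that $\mathcal{C}-x$ is a triplet cover for $T-x$, and assume for contradiction that $d_{\mathcal{C}}(x) \ge 2$. Then there exist distinct $u, v \in \mathring{V}(T)$ such that $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(u)$ and for all $t \in \mathcal{S}_{\mathcal{C}}(v)$. Without loss of generality, we may assume that v is the vertex in T that is adjacent to x. Let $u'$ be the vertex in $T-x$ that corresponds to u in T. Then $\mathcal{S}_{\mathcal{C}-x}(u') = \emptyset$ since $x \in t$ for all $t \in \mathcal{S}_{\mathcal{C}}(u)$. Hence $\mathcal{C}-x$ is not a triplet cover for $T-x$, a contradiction.
(P6): If v is the vertex in T adjacent to x, then $\mathcal{S}_{\mathcal{C}}(v) \neq \emptyset$ by Lemma 1. Hence, there must be some $xab \in \mathcal{S}_{\mathcal{C}}(v)$ with $xa, xb \in \mathcal{C}$. But then $|\mathcal{C}-x| \le |\mathcal{C}|-2$.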
We now show that the size of any minimal triplet cover of a tree in B(X) is bounded above by a linear function of |X|.
Corollary 1
Suppose that $\mathcal{C}$ is a minimal triplet cover of some $T \in B(X)$. Then $|\mathcal{C}| \le 3|X|-6$.
Proof
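Put $G = G(\mathcal{C})$. First we observe that if B is a bipartite graph with vertex set $X \cup V$ in which every vertex in V has degree at most 3, then the number of length 2 paths in B of the form x, v, y with $x, y \in X$ and $v \in V$ is equal to
$$\sum_{v \in V} \binom{d_B(v)}{2},$$
where $d_B(v)$ denotes the degree of v in B.
Now, by (P3), $|\mathcal{C}|$ is less than or equal to the number of length 2 paths in G of the form x, v, y with $x, y \in X$ and $v \in \mathring{V}(T)$. Since $|\mathring{V}(T)| = |X|-2$, and each term in the above sum is at most $\binom{3}{2} = 3$ by (P1), the corollary follows.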
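As a worked instance of this count, take the tree of Fig. 1i and the triplet cover $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$. Working out the supports gives $\mathcal{S}_{\mathcal{C}}(u) = \{abc\}$, $\mathcal{S}_{\mathcal{C}}(v) = \{bce\}$ and $\mathcal{S}_{\mathcal{C}}(w) = \{cde\}$. Each interior vertex therefore has degree 3 in $G(\mathcal{C})$, and the bound reads:

```latex
\[
|\mathcal{C}| = 7 \;\le\; \sum_{p \in \{u,v,w\}} \binom{d_{\mathcal{C}}(p)}{2}
  = 3 + 3 + 3 = 9 = 3|X| - 6 .
\]
```

Here the nine length 2 paths cover the seven pairs of $\mathcal{C}$. The pair bc (via u and v) and the pair ce (via v and w) are each counted twice.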
Multiplicities
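In this section we derive some bounds for degrees of vertices in the cover graph of a triplet cover. Suppose that $\mathcal{C}$ is a triplet cover of $T \in B(X)$. For $x \in X$ we define the multiplicity $m_{\mathcal{C}}(x)$ of x (relative to $\mathcal{C}$) to be the number of elements in $\mathcal{C}$ that contain x [or in other words, the degree of the vertex x in the cover graph $\Gamma(\mathcal{C})$]. The multiplicity of $\mathcal{C}$ is $m(\mathcal{C}) = \min_{x \in X} m_{\mathcal{C}}(x)$.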
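Multiplicities are just degrees in the cover graph, so they are easy to tabulate. Below is a hypothetical Python sketch for the cover $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$ of Fig. 1, with single-character leaf labels assumed. A leaf with multiplicity 0 would be missing from the `Counter`, but that cannot happen for a triplet cover.

```python
from collections import Counter

def multiplicities(cover):
    """m_C(x): the number of pairs in the cover that contain x."""
    return Counter(x for pair in cover for x in pair)

C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
m = multiplicities(C)
print(sorted(m.items()))  # [('a', 2), ('b', 3), ('c', 4), ('d', 2), ('e', 3)]
print(min(m.values()))    # m(C) = 2
```

The total $2+3+4+2+3 = 14 = 2|\mathcal{C}|$ is the double count used in the proof of Proposition 2 below. The value $m(\mathcal{C}) = 2$ is also what Corollary 2 predicts for a cover with $7 = 2|X|-3$ elements.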
The following observation relating multiplicities with degrees will be useful later.
Lemma 2
Suppose that $\mathcal{C}$ is a triplet cover for some tree $T \in B(X)$ and $x \in X$. If $m_{\mathcal{C}}(x) = 2$, then $d_{\mathcal{C}}(x) = 1$.
Proof
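If $m_{\mathcal{C}}(x) = 2$, then x can be contained in at most one element of $\bigcup_{w \in \mathring{V}(T)} \mathcal{S}_{\mathcal{C}}(w)$, namely the triple xab, where xa and xb are the two elements of $\mathcal{C}$ that contain x. But x must be contained in every element of $\mathcal{S}_{\mathcal{C}}(u)$ for u the vertex in $\mathring{V}(T)$ that is adjacent to x in T. Hence $\mathcal{S}_{\mathcal{C}}(u) = \{xab\}$. By Lemma 1, xab supports only $\mathrm{med}_T(x,a,b) = u$, so the only edge contained in the support graph $G(\mathcal{C})$ that contains x (which must exist by (P1)) is $\{x,u\}$. In particular, $d_{\mathcal{C}}(x) = 1$.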
We now derive some bounds for multiplicities of minimal and minimum triplet covers.
Proposition 2
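Suppose that $T \in B(X)$.
(M1) If $\mathcal{C}$ is a minimal triplet cover for T, then $2 \le m(\mathcal{C}) \le 5$.
(M2) If $\mathcal{C}$ is a minimum triplet cover for T, then $2 \le m(\mathcal{C}) \le 3$.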
Proof
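(M1): Suppose that $x \in X$. Let v be the vertex in T adjacent to x in T. Then, as $\mathcal{C}$ is a triplet cover for T, by Lemma 1 there must exist some $xab \in \mathcal{S}_{\mathcal{C}}(v)$ where x, a, b are distinct. Therefore $xa, xb \in \mathcal{C}$, so $m_{\mathcal{C}}(x) \ge 2$ for all $x \in X$, and hence $m(\mathcal{C}) \ge 2$.
To see that the remaining inequality $m(\mathcal{C}) \le 5$ holds, we show that there is some element of X that is contained in at most 5 elements of $\mathcal{C}$. We use a simple counting argument based on pairs (x, c) where $x \in X$ is an element of some $c \in \mathcal{C}$. By Corollary 1, $|\mathcal{C}| \le 3|X|-6$ as $\mathcal{C}$ is minimal. Since each element of $\mathcal{C}$ contains 2 elements of X, the size of the set R of pairs (x, c) is at most $6|X|-12$. On the other hand, $|R| = \sum_{x \in X} m_{\mathcal{C}}(x)$. Hence, since $6|X|-12 < 6|X|$, there must exist some $x \in X$ with $m_{\mathcal{C}}(x) \le 5$.
(M2): We again count pairs (x, c) where $x \in X$ and c is an element of $\mathcal{C}$ containing x. This number is $\sum_{x \in X} m_{\mathcal{C}}(x)$, and it is also equal to $2|\mathcal{C}| = 4|X|-6$, since, as noted earlier, a minimum triplet cover has $2|X|-3$ elements. Since $4|X|-6 < 4|X|$, there is some $x \in X$ with $m_{\mathcal{C}}(x) \le 3$. That $m(\mathcal{C}) \ge 2$ holds follows from (M1).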
A lower bound
In this section, we show that a minimum triplet cover of a tree $T \in B(X)$ has size $2|X|-3$. As mentioned in the introduction, this result can also be derived by applying Theorem 1 and Proposition 1 of Dress et al. (2012). However, it is of interest to have a direct proof that is independent of results concerning tree metrics.
Proposition 3
Suppose that $\mathcal{C}$ is a triplet cover for some $T \in B(X)$. Then we have $|\mathcal{C}| \ge 2|X|-3$. Moreover, this bound is tight.
Proof
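We use induction on $|X|$. The result clearly holds for $|X| = 3$. So, suppose that $n \ge 4$ and that the result holds for all triplet covers of trees in B(X) with $3 \le |X| < n$.
Suppose that $\mathcal{C}$ is a triplet cover for a tree T in B(X) with $|X| = n$. If there exists some $x \in X$ such that $d_{\mathcal{C}}(x) = 1$, then by (P5) $\mathcal{C}-x$ is a triplet cover for $T-x$. Hence, by (P6) and induction, $|\mathcal{C}| \ge |\mathcal{C}-x| + 2 \ge 2(n-1)-3+2 = 2n-3$.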
So, suppose that for all . Note that there must exist some with (otherwise, for all implies that there is a vertex with , which contradicts (P1)). Suppose that are distinct with in . Then there exist distinct elements with such that and . Put . Then since it follows that and so we consider the two possible cases ( and ).
- Case 1:
-
. Without loss of generality we may assume and . Then it is straight-forward to see that without loss of generality, v is adjacent to a in T, u lies on the path in T between v and c, and T restricted to the set is the quartet ab|cy. Note that since otherwise which contradicts .
Consider the triplet cover of T. Then . Hence, since by (P2), . Therefore, by (P5), is a triplet cover of . But the elements ab, ac, ay of are not contained in and, so,
The fact that $|\mathcal{C}| \ge 2|X|-3$ holds now follows immediately by induction.
- Case 2:
. Then and . Without loss of generality, we can assume that v is adjacent to a in T, and that T restricted to the set is a caterpillar tree with cherry . We consider the case where is also a cherry in this caterpillar tree and u is adjacent to both y and c in T. The argument for the remaining case (where or is also a cherry) is similar.
First note that if , then , since otherwise which would contradict . Similarly if , then . Hence, by symmetry, we can assume that does not contain at least one element from the set and at least one element from the set . Now, let P be a subset of of minimum size such that contains precisely one of the sets or , noting that . Consider the triplet cover of T. Then it is easily seen that , and so by (P5) is a triplet cover of . But the elements ab, ac, ax, ay of are not contained in and so
The fact that $|\mathcal{C}| \ge 2|X|-3$ holds now follows by induction.
The fact that the bound is tight follows since for every $T \in B(X)$ there exists some triplet cover of T with cardinality $2|X|-3$ [e.g. a pointed cover; Dress et al. (2012)].
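For very small trees, the bound of Proposition 3 can be confirmed by exhaustive search. The Python sketch below is exponential in the number of pairs and is only meant as a sanity check on the tree of Fig. 1i. Helper names and the adjacency-dict encoding are hypothetical. It returns the smallest k for which some k-subset of $\binom{X}{2}$ is a triplet cover.

```python
from itertools import combinations, product

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def leaf_components(tree, v):
    """Leaf sets of the components of T - v, one per neighbour of v."""
    comps = []
    for start in tree[v]:
        seen, stack, leaves = {v, start}, [start], set()
        while stack:
            node = stack.pop()
            if len(tree[node]) == 1:
                leaves.add(node)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(leaves)
    return comps

def is_triplet_cover(tree, cover):
    """Every interior vertex v has a triple (one leaf per component of T - v)
    whose three pairs all lie in `cover`."""
    for v, nbrs in tree.items():
        if len(nbrs) == 3 and not any(
                {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= cover
                for a, b, c in product(*leaf_components(tree, v))):
            return False
    return True

def minimum_cover_size(tree):
    """Smallest k such that some k-subset of all leaf pairs is a triplet cover."""
    leaves = sorted(x for x, nbrs in tree.items() if len(nbrs) == 1)
    pairs = [frozenset(p) for p in combinations(leaves, 2)]
    for k in range(3, len(pairs) + 1):
        if any(is_triplet_cover(tree, set(s)) for s in combinations(pairs, k)):
            return k
    return None

print(minimum_cover_size(FIG1), 2 * 5 - 3)  # 7 7
```

For this five-leaf tree the search stops at $k = 7 = 2|X|-3$, which matches both the lower bound and its tightness.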
A characterization of minimum triplet covers
In this section, we prove our main result, namely a characterization of minimum triplet covers in terms of the structure of their cover graphs. First, we recall that a graph $H = (V,E)$ is called a 2-tree if there exists an ordering $v_1, \dots, v_n$ of V such that $\{v_1, v_2\} \in E$ and, for $3 \le i \le n$, the vertex $v_i$ has degree 2 and belongs to a unique triangle in the subgraph induced by H on the set $\{v_1, \dots, v_i\}$ (Guénoche et al. 2004, p. 235). It is easily seen that a 2-tree has treewidth at most 2, and conversely, every graph of treewidth at most 2 is a subgraph of a 2-tree.
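The 2-tree condition can be tested by running the defining ordering backwards. Repeatedly delete a vertex of degree 2 whose two neighbours are adjacent, until a single edge remains. The Python sketch below is one such greedy test (the function name is hypothetical). It relies on the standard fact that deleting such a vertex from a 2-tree again yields a 2-tree, so the choice of vertex at each step should not matter.

```python
def is_two_tree(vertices, edges):
    """Greedy recognition of 2-trees: peel off degree-2 vertices whose two
    neighbours are adjacent, then require a single remaining edge."""
    adj = {x: set() for x in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    if len(adj) < 2:
        return False
    while len(adj) > 2:
        for x, nbrs in adj.items():
            if len(nbrs) == 2:
                y, z = nbrs
                if z in adj[y]:      # x lies in a triangle with its neighbours
                    break
        else:
            return False             # no removable vertex: not a 2-tree
        for y in adj.pop(x):
            adj[y].discard(x)
    y, z = adj
    return z in adj[y]               # the last two vertices must be joined

C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
print(is_two_tree("abcde", C))  # True
print(is_two_tree("abcd", [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]))  # False
```

Applied to the cover graph of $\mathcal{C} = \{ab, ac, bc, be, ce, cd, de\}$, the test succeeds. One valid elimination order is a, d, e, which leaves the edge bc. A 4-cycle fails at once, because no vertex has adjacent neighbours.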
Theorem 1
Suppose that $\mathcal{C}$ is a triplet cover for a tree $T \in B(X)$. Then $\mathcal{C}$ is a minimum triplet cover if and only if $\Gamma(\mathcal{C})$ is a 2-tree.
Proof
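Put $n = |X|$. Suppose that $\Gamma(\mathcal{C})$ is a 2-tree. Then since 2-trees on n vertices have $2n-3$ edges (Leclerc and Makarenkov 1998, p. 227) and $|\mathcal{C}| = |E(\Gamma(\mathcal{C}))|$, we have $|\mathcal{C}| = 2n-3$. So, by Proposition 3, $\mathcal{C}$ is a minimum triplet cover for T.
Conversely, suppose that $\mathcal{C}$ is a minimum triplet cover for some tree $T \in B(X)$. We shall prove that $\Gamma(\mathcal{C})$ is a 2-tree by induction on $n = |X|$. If $n = 3$ it is clearly true. Suppose that $n \ge 4$ and that the statement holds for all X with $3 \le |X| < n$.
Let $\mathcal{C}$ be a minimum triplet cover for T on X with $|X| = n$. Note that, by (M2), $m(\mathcal{C})$ equals 2 or 3. Also, note that $\mathcal{C}$ must be a minimal triplet cover for T.
Suppose that $m(\mathcal{C}) = 2$. Let $x \in X$ be such that $m_{\mathcal{C}}(x) = 2$. Then there exist $a, b \in X$ with $xa, xb \in \mathcal{C}$. Consider the vertex v adjacent to x in T (as shown in Fig. 3i). Then, as $\mathcal{C}$ is a triplet cover, and xa, xb are the only elements in $\mathcal{C}$ containing x, it follows that $\mathcal{S}_{\mathcal{C}}(v) = \{xab\}$; in particular, $ab \in \mathcal{C}$.
Hence, $|\mathcal{C}-x| = |\mathcal{C}|-2 = 2(n-1)-3$. By Lemma 2 and (P5), $\mathcal{C}-x$ is a triplet cover for $T-x$ (see Fig. 3ii), and since $|\mathcal{C}-x| = 2(n-1)-3$, it follows from Proposition 3 that $\mathcal{C}-x$ is a minimum triplet cover for $T-x$. Since $T-x$ has one fewer leaf than T, we can apply the induction hypothesis and conclude that $\Gamma(\mathcal{C}-x)$ is a 2-tree. Then, since $\Gamma(\mathcal{C})$ is obtained from $\Gamma(\mathcal{C}-x)$ by attaching x to the endpoints of the edge ab in $\Gamma(\mathcal{C}-x)$, it follows that $\Gamma(\mathcal{C})$ is also a 2-tree.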
Now suppose that . We shall show that this is not possible, from which the theorem follows. Let be such that and let denote the vertex adjacent to x in T. Then since is a minimal triplet cover for T there must exist distinct such that . Moreover, as there must exist some with . Since we also have , and since is a minimum triplet cover, it follows that .
Without loss of generality, assume T restricted to x, a, b, c is the quartet xa|bc (notice that we have symmetry involving a and b, and the quartet cannot be xc|ab because of the assumption that where v is the vertex adjacent to x in T), as shown in Fig. 3iii. Let be such that .
We claim that . Assume for contradiction that . Since is minimal and , there exists some vertex and some such that . Note that as , we must have . If then is a smaller minimum triplet cover for T (since v is still supported by abx), and this contradicts the minimality of . Thus we may assume that , in which case there is a set with . Since and we already have it follows that which implies that . However, as we already have , the additional assumption that means that contains ab, ac, bc which provides an alternative set, namely abc in , in which case remains a triplet cover for T. But again this contradicts the minimality of . Thus, , as claimed.
Therefore, in summary, and . We claim next that is a triplet cover for T. Indeed, if xb is contained in some element in for some , then since we must have . Since and it follows that must be a triplet cover for T, as claimed.
To complete the proof, note that since , Lemma 2 implies . Hence, by (P5), is a triplet cover of . Since has one fewer leaf than T we can apply the induction hypothesis and conclude that the graph is a 2-tree. Since any 2-tree has at least two vertices with degree 2 (Leclerc and Makarenkov 1998, p. 227), it follows that in at least one of the two vertices a or c has degree 2 (since there cannot be a vertex such that the degree of y in is equal to 2 as, by assumption, ). But if, without loss of generality, the degree of a in is equal to 2, then must hold too which contradicts . This completes the proof.
The next result follows immediately from the last theorem and the fact that any 2-tree has at least two vertices with degree 2 [see e.g. Leclerc and Makarenkov (1998), p. 227]. It improves on the bound given in Proposition 2 (M2).
Corollary 2
If $\mathcal{C}$ is a minimum triplet cover for some tree $T \in B(X)$ then $m(\mathcal{C}) = 2$.
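Theorem 1 and Corollary 2 can be spot-checked by brute force on the tree of Fig. 1i. The hypothetical Python sketch below enumerates all subsets of $\binom{X}{2}$ of size $2|X|-3$. By Proposition 3, the triplet covers among them are exactly the minimum triplet covers. The sketch keeps those covers and tests that each cover graph is a 2-tree with multiplicity 2. It only illustrates the statements on one small example and is not a proof. The helpers repeat the assumed adjacency-dict encoding used above.

```python
from collections import Counter
from itertools import combinations, product

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def leaf_components(tree, v):
    """Leaf sets of the components of T - v, one per neighbour of v."""
    comps = []
    for start in tree[v]:
        seen, stack, leaves = {v, start}, [start], set()
        while stack:
            node = stack.pop()
            if len(tree[node]) == 1:
                leaves.add(node)
            for nxt in tree[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(leaves)
    return comps

def is_triplet_cover(tree, cover):
    """Every interior vertex has a supporting triple whose pairs lie in cover."""
    for v, nbrs in tree.items():
        if len(nbrs) == 3 and not any(
                {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= cover
                for a, b, c in product(*leaf_components(tree, v))):
            return False
    return True

def is_two_tree(vertices, edges):
    """Peel off degree-2 vertices with adjacent neighbours down to one edge."""
    adj = {x: set() for x in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    while len(adj) > 2:
        for x, nbrs in adj.items():
            if len(nbrs) == 2:
                y, z = nbrs
                if z in adj[y]:
                    break
        else:
            return False
        for y in adj.pop(x):
            adj[y].discard(x)
    y, z = adj
    return z in adj[y]

leaves = sorted(x for x, nbrs in FIG1.items() if len(nbrs) == 1)
pairs = [frozenset(p) for p in combinations(leaves, 2)]
minimum_covers = [set(s) for s in combinations(pairs, 2 * len(leaves) - 3)
                  if is_triplet_cover(FIG1, set(s))]
all_two_trees = all(is_two_tree(leaves, c) for c in minimum_covers)
all_mult_two = all(min(Counter(x for p in c for x in p).values()) == 2
                   for c in minimum_covers)
C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
print(len(minimum_covers), all_two_trees, all_mult_two, C in minimum_covers)
```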
Note that a 2-tree is a 2d-tree, but not necessarily conversely [Guénoche et al. (2004), Proposition 3.4] (a graph $G = (V,E)$ is called a 2d-tree if there exists an ordering $v_1, \dots, v_n$ of V such that $\{v_1, v_2\} \in E$ and, for $3 \le i \le n$, the vertex $v_i$ has degree 2 in the subgraph of G induced by $\{v_1, \dots, v_i\}$). So Theorem 1 can be used to strengthen Theorem 1 of Huber and Steel (2014).
Shellings
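Given a triplet cover $\mathcal{C}$ of a tree $T \in B(X)$, we say that $\mathcal{C}$ is T-shellable if there exists an ordering of the elements in $\binom{X}{2} - \mathcal{C}$, say $a_1b_1, a_2b_2, \dots, a_kb_k$, such that for every $1 \le i \le k$, there exists a pair of distinct elements p, q in $X - \{a_i, b_i\}$ such that the restriction of T to the set $\{a_i, b_i, p, q\}$ is the quartet $a_ip|b_iq$, and all elements in $\binom{\{a_i, b_i, p, q\}}{2}$ except $a_ib_i$ are contained in $\mathcal{C} \cup \{a_1b_1, \dots, a_{i-1}b_{i-1}\}$. If T is clear from the context then we sometimes just say that $\mathcal{C}$ is shellable, and we refer to the ordering of $\binom{X}{2} - \mathcal{C}$ as a shellable ordering.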
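Below is a hypothetical Python sketch of a T-shellability test. It uses the fact that T restricted to {x, u, y, v} is the quartet xu|yv exactly when the path from x to u and the path from y to v share no vertex. Missing pairs are added greedily. Greedy extension suffices here because a pair that can be added at some stage can still be added after further pairs become known. So if any shellable ordering exists, the greedy closure reaches all of $\binom{X}{2}$. Helper names and the tree encoding are assumptions.

```python
from collections import deque
from itertools import combinations, permutations

# Tree of Fig. 1i as an adjacency dict (interior vertices u, v, w as in Fig. 1ii).
FIG1 = {
    "a": {"u"}, "b": {"u"}, "c": {"v"}, "d": {"w"}, "e": {"w"},
    "u": {"a", "b", "v"}, "v": {"u", "c", "w"}, "w": {"v", "d", "e"},
}

def tree_path(tree, s, t):
    """Vertex set of the unique s-t path in the tree."""
    parent, queue = {s: None}, deque([s])
    while queue:
        node = queue.popleft()
        for nxt in tree[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = set()
    while t is not None:
        path.add(t)
        t = parent[t]
    return path

def is_quartet(tree, x, u, y, v):
    """True iff T restricted to {x, u, y, v} is the quartet xu|yv."""
    return not (tree_path(tree, x, u) & tree_path(tree, y, v))

def can_shell(tree, known, x, y, leaves):
    """Exist u, v with quartet xu|yv and every other pair of {x,y,u,v} known?"""
    for u, v in permutations([p for p in leaves if p not in (x, y)], 2):
        if is_quartet(tree, x, u, y, v):
            quad = {frozenset(q) for q in combinations((x, y, u, v), 2)}
            if quad - {frozenset((x, y))} <= known:
                return True
    return False

def shelling_order(tree, cover):
    """Greedily add missing pairs; return the order found, or None if stuck."""
    leaves = sorted(x for x, nbrs in tree.items() if len(nbrs) == 1)
    known = set(cover)
    missing = {frozenset(p) for p in combinations(leaves, 2)} - known
    order, progress = [], True
    while missing and progress:
        progress = False
        for pair in sorted(missing, key=sorted):
            x, y = sorted(pair)
            if can_shell(tree, known, x, y, leaves):
                known.add(pair)
                missing.discard(pair)
                order.append(x + y)
                progress = True
    return None if missing else order

C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
print(shelling_order(FIG1, C))  # ['ae', 'bd', 'ad']
```

For the cover of Fig. 1iii, the sketch finds the ordering ae, bd, ad. The text below uses ae, ad, bd instead. Both orderings satisfy the definition.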
Although this combinatorial definition of shellability seems somewhat involved, its motivation rests on it being a sufficient condition for recursively determining the distances between all pairs of leaves (when the edges of T are assigned arbitrary positive edge lengths) starting with just the distance values for the pairs in the triplet cover. In other words, if a triplet cover $\mathcal{C}$ of a tree is shellable, then the pairs of elements from X that are not already present in $\mathcal{C}$ can be ordered in a sequence so that the distance in T between the leaves in each pair is uniquely determined from the distance values on pairs that are either (i) present as an element of $\mathcal{C}$ or (ii) appear earlier in the sequence.
For example, for the tree T shown in Fig. 1i, and the triplet cover $\mathcal{C}$ consisting of the 7 pairs of elements of X that form the edges of $\Gamma(\mathcal{C})$ in Fig. 1iii, there are just three pairs from $\binom{X}{2}$ that are not present in $\mathcal{C}$, namely ad, ae, bd. Ordering the pairs as ae, ad, bd provides a shellable ordering. For ae we can select $\{p,q\} = \{b,c\}$. Then ab|ce is the quartet obtained by restricting T to $\{a,b,c,e\}$, the distance between a and e in T is determined uniquely by the five other distances involving pairs from $\{a,b,c,e\}$, and these five pairs are present in $\mathcal{C}$. Having determined the distance for ae, one can now use this (and the distances for pairs in $\mathcal{C}$) to compute the distance value for the pair ad and, subsequently, for the pair bd.
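The recovery of the missing distances in this example can be sketched numerically in Python. The edge lengths below are hypothetical; any positive values would do. Each step uses the four-point relation for a quartet xu|yv, namely $d(x,y) + d(u,v) = d(x,v) + d(u,y)$, which expresses $d(x,y)$ through three of the five known distances. The steps follow the ordering ae, ad, bd, with partner pairs {b, c}, {c, e} and {c, e}. All names in the code are illustrative.

```python
# Hypothetical positive edge lengths for the tree of Fig. 1i.
EDGES = [("a", "u", 1.0), ("b", "u", 2.0), ("u", "v", 3.0), ("c", "v", 4.0),
         ("v", "w", 5.0), ("d", "w", 6.0), ("e", "w", 7.0)]

def leaf_distances(edges):
    """Path length between every pair of leaves of an edge-weighted tree."""
    adj = {}
    for x, y, length in edges:
        adj.setdefault(x, {})[y] = length
        adj.setdefault(y, {})[x] = length
    leaves = sorted(x for x in adj if len(adj[x]) == 1)
    dist = {}
    for s in leaves:
        reach, stack = {s: 0.0}, [s]
        while stack:                       # depth-first sweep from leaf s
            node = stack.pop()
            for nxt, length in adj[node].items():
                if nxt not in reach:
                    reach[nxt] = reach[node] + length
                    stack.append(nxt)
        for t in leaves:
            if s < t:
                dist[frozenset((s, t))] = reach[t]
    return dist

TRUE = leaf_distances(EDGES)
C = {frozenset(p) for p in ["ab", "ac", "bc", "be", "ce", "cd", "de"]}
known = {p: TRUE[p] for p in C}            # only the distances on C are observed

def d(x, y):
    return known[frozenset((x, y))]

# Shelling steps (x, y, u, v): T restricted to {x, y, u, v} is the quartet xu|yv.
for x, y, u, v in [("a", "e", "b", "c"), ("a", "d", "c", "e"), ("b", "d", "c", "e")]:
    # Four-point relation for xu|yv: d(x,y) + d(u,v) = d(x,v) + d(u,y).
    known[frozenset((x, y))] = d(x, v) + d(u, y) - d(u, v)

print(all(known[p] == TRUE[p] for p in TRUE))  # True
```

With these lengths the recovered values are d(a,e) = 8 + 17 − 9 = 16, d(a,d) = 16 + 15 − 16 = 15 and d(b,d) = 17 + 15 − 16 = 16. Each agrees with the path length in the tree.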
We now gather together some facts concerning the shellability of triplet covers, including shellability of minimum triplet covers.
Proposition 4
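(S1) Suppose that $|X| \ge 4$, $T \in B(X)$, $x \in X$, and $\mathcal{C}$ is a triplet cover of T such that $\mathcal{C}-x$ is a triplet cover of $T-x$. If $\mathcal{C}-x$ is $(T-x)$-shellable, then $\mathcal{C}$ is T-shellable.
(S2) Suppose that $\mathcal{C}$, $\mathcal{C}'$ are triplet covers of some tree $T \in B(X)$ and that $\mathcal{C} \subseteq \mathcal{C}'$. If $\mathcal{C}$ is T-shellable, then so is $\mathcal{C}'$.
(S3) If $\mathcal{C}$ is a minimum triplet cover for a tree $T \in B(X)$, then $\mathcal{C}$ is T-shellable.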
Proof
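(S1): Put . Suppose that $x \in X$ is such that $\mathcal{C}-x$ is a triplet cover of $T-x$ which is shellable. Suppose that $v \in \mathring{V}(T)$ is the vertex in T that is adjacent to x in T. Then there must exist distinct $a, b \in X - \{x\}$ with $xab \in \mathcal{S}_{\mathcal{C}}(v)$. Let and , so that and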
Since $\mathcal{C}-x$ is $(T-x)$-shellable, there is a shellable ordering of $\binom{X-\{x\}}{2} - (\mathcal{C}-x)$ so that all of the elements in that set can be added into $\mathcal{C}-x$ to obtain $\binom{X-\{x\}}{2}$.
To complete the shellable ordering, it remains to add the elements of $\binom{X}{2} - \mathcal{C}$ that contain x to the ordering so far constructed. We consider two cases. First, suppose that neither $\{x,a\}$ nor $\{x,b\}$ forms a cherry of T. Then for all $p \in X - \{x,a,b\}$, without loss of generality, the quartet induced by T on $\{x,a,b,p\}$ is ap|xb. Since $xa, xb, ab \in \mathcal{C}$ (as $xab \in \mathcal{S}_{\mathcal{C}}(v)$), and since ap and bp are also available (as we have all elements in $\binom{X-\{x\}}{2}$), it follows that we can add in xp as a next element of the shellable ordering. We can repeat this adding-in process for all remaining elements of $\binom{X}{2} - \mathcal{C}$ that contain x (in any order) to obtain $\binom{X}{2}$. So $\mathcal{C}$ is T-shellable in this case.
Second, suppose without loss of generality that $\{x,a\}$ forms a cherry. Then if $p \in X - \{x,a,b\}$, the quartet induced by T on the set $\{x,a,b,p\}$ is xa|bp. So, using similar arguments as in the previous case, we can add in xp as a next element in the shellable ordering. It follows that we can repeat this process for all remaining elements of $\binom{X}{2} - \mathcal{C}$ that contain x (in any order) to obtain a shellable ordering of $\binom{X}{2} - \mathcal{C}$. So $\mathcal{C}$ is T-shellable in this case too.
(S2): This follows immediately from the definition of shellability.
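(S3): We proceed using induction on $|X|$. For $|X| = 3$ the statement is clearly true. Suppose that $n \ge 4$ and that the statement is true up to and including $|X| = n-1$.
Let $\mathcal{C}$ be a minimum triplet cover for some binary phylogenetic X-tree T with $|X| = n$. By Corollary 2, $m(\mathcal{C}) = 2$. Suppose that $x \in X$ with $m_{\mathcal{C}}(x) = 2$. Then, by Lemma 2, $d_{\mathcal{C}}(x) = 1$. By (P5) it follows that $\mathcal{C}-x$ is a triplet cover for $T-x$. Note that $\mathcal{C}-x$ is minimum since $|\mathcal{C}-x| = |\mathcal{C}|-2 = 2(n-1)-3$. Thus, by induction, $\mathcal{C}-x$ is $(T-x)$-shellable. Therefore, $\mathcal{C}$ is T-shellable by (S1).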
Corollary 3
For any tree $T \in B(X)$, suppose that $\mathcal{C}$ is a minimum triplet cover for T. Consider any assignment of strictly positive lengths to the edges of T, and the resulting assignment of inter-leaf distances on the pairs from $\mathcal{C}$. This function from $\mathcal{C}$ to $\mathbb{R}_{>0}$ uniquely determines T and its edge lengths, in the sense that no different tree $T' \in B(X)$ can induce the same inter-leaf distances on pairs from $\mathcal{C}$ under any positive weighting of the edges of $T'$.
Proof
This follows immediately from Part (S3) of Proposition 4, combined with Theorem 6 of Dress et al. (2012).
Note that there are examples of sets $\mathcal{C} \subseteq \binom{X}{2}$ having cardinality $2|X|-3$ that determine T and any set of positive edge lengths from inter-leaf distances, but which are not T-shellable (see Example 1).
Example 1
Put and let T be the caterpillar tree with exactly two cherries and intermediate leaves c, d, e (as shown in Fig. 4ii). Put . Then determines T and any set of positive edge lengths from inter-leaf distances, but it is not T-shellable (Dress et al. 2012, Example 6.2).
Conclusion and open problems
As mentioned earlier, there are examples of minimal triplet covers that are not minimum. The following provides a specific example.
Example 2
Let and T be the phylogenetic X-tree having cherries and leaves, starting with cherry , labeled in the order g, c, h, d (see Fig. 4i). Let
Then $\mathcal{C}$ is a minimal triplet cover for T. Since $|\mathcal{C}| > 2|X|-3$, it follows that $\mathcal{C}$ is not minimum.
An interesting problem would be to investigate the structure of the cover graph for minimal triplet covers.
Our results also suggest further questions for future work.
- (i) There are formulae for counting the number of labeled 2-trees (Moon 1969). Is there a formula for counting the number of minimum triplet covers for a given phylogenetic X-tree?
- (ii) We have shown that minimum triplet covers are shellable. It would be interesting to see how far this result extends. For example, is every triplet cover shellable? Understanding the structure of minimal triplet covers might help to shed light on this question.
Acknowledgements
We thank the two anonymous reviewers for helpful comments, particularly Reviewer 1 for numerous helpful suggestions. MS thanks the Allan Wilson Centre for helping fund this work. VM and KTH thank the London Mathematical Society for helping to fund their visit to the University of Canterbury, Christchurch.
Contributor Information
K. T. Huber, Email: K.Huber@uea.ac.uk
V. Moulton, Email: V.Moulton@uea.ac.uk
M. Steel, Email: mike.steel@canterbury.ac.nz
References
- Dress A, Steel M. A Hall-type theorem for triplet systems based on medians in trees. Appl Math Lett. 2009;22:1789–1793. doi: 10.1016/j.aml.2009.07.001.
- Dress A, Huber KT, Steel M. ‘Lassoing’ a phylogenetic tree I: basic properties, shellings and covers. J Math Biol. 2012;65:77–105. doi: 10.1007/s00285-011-0450-4.
- Guénoche A, Leclerc B, Makarenkov V. On the extension of a partial metric to a tree metric. Discrete Math. 2004;276:229–248. doi: 10.1016/S0012-365X(03)00294-2.
- Huber KT, Steel M. Reconstructing fully-resolved trees from triplet cover distances. Electron J Comb. 2014;21(2):P2.15.
- Leclerc B, Makarenkov V. On some relations between 2-trees and tree metrics. Discrete Math. 1998;192:223–249. doi: 10.1016/S0012-365X(98)00073-9.
- Moon J. The number of labeled k-trees. J Comb Theory. 1969;6:196–199. doi: 10.1016/S0021-9800(69)80119-5.
- Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.