RECOVERING A TREE FROM THE LENGTHS OF SUBTREES SPANNED BY A RANDOMLY CHOSEN SEQUENCE OF LEAVES

Steven N Evans; Daniel Lanoue

doi:10.1016/j.aam.2018.01.001

Author manuscript; available in PMC: 2019 May 1.

Published in final edited form as: Adv Appl Math. 2018 Feb 28;96:39–75. doi: 10.1016/j.aam.2018.01.001


PMCID: PMC6135540 NIHMSID: NIHMS938484 PMID: 30220760

Abstract

Given an edge-weighted tree T with n leaves, sample the leaves uniformly at random without replacement and let W_k, 2 ≤ k ≤ n, be the length of the subtree spanned by the first k leaves. We consider the question, “Can T be identified (up to isomorphism) by the joint probability distribution of the random vector (W₂, …, W_n)?” We show that if T is known a priori to belong to one of various families of edge-weighted trees, then the answer is, “Yes.” These families include the edge-weighted trees with edge-weights in general position, the ultrametric edge-weighted trees, and certain families with equal weights on all edges such as (k + 1)-valent and rooted k-ary trees for k ≥ 2 and caterpillars.

Key words and phrases: tree reconstruction, graph isomorphism, phylogenetic diversity, random tree

1. Introduction

1.1. Background and motivation

What features of an edge-weighted tree identify it uniquely up to isomorphism, perhaps within some class of such trees? Here an edge-weighted tree is a connected, acyclic finite graph T with vertex set V(T) and edge set E(T) which is equipped with a function W_T : E(T) → ℝ₊₊ ≔ (0, ∞). The value of W_T(e) for an edge e ∈ E(T) is called the weight or the length of e. Two such trees T′ and T″ are isomorphic if there is a bijection σ : V(T′) → V(T″) such that:

{u, υ} ∈ E(T′) if and only if {σ(u), σ(υ)} ∈ E(T″),
W_T′ ({u, υ}) = W_T″ ({σ(u), σ(υ)}) for all {u, υ} ∈ E(T′).

The question above is, more formally, one of asking for a given class of edge-weighted trees 𝕋 about the possible sets 𝕌 and functions Φ : 𝕋 → 𝕌 such that for all T′, T″ ∈ 𝕋 we have Φ(T′) = Φ(T″) if and only if T′ and T″ are isomorphic. If the class 𝕋 consists of edge-weighted trees for which all edges have length 1 (we will call such objects combinatorial trees for the sake of emphasis), then determining whether two trees in 𝕋 are isomorphic is just a particular case of the standard graph isomorphism problem. The general graph isomorphism problem has been the subject of a large amount of work in combinatorics and computer science – [RC77] already speaks of the “graph isomorphism disease” – and, in particular, there are many results on reconstructing the isomorphism type of a graph from the isomorphism types of subgraphs of various sorts (see, for example, the review [Bon91]). There is also a substantial volume of somewhat parallel research on graph isomorphism in computational chemistry (see, for example, [Diu13] for a review). There seems to be considerably less work on determining isomorphism (in the obvious sense) of edge-weighted graphs; of course, in order for two edge-weighted graphs to be isomorphic the underlying combinatorial graphs must be isomorphic, but this does not imply that the best way for checking that two edge-weighted graphs are isomorphic proceeds by first determining whether the underlying combinatorial graphs are isomorphic and then somehow testing whether some isomorphism of the combinatorial graphs is still an isomorphism when the edge-weights are considered.

We begin with a discussion of previous results that address various aspects of the problem of determining when two edge-weighted or combinatorial trees are isomorphic.

A result in [Bed74] gives the following criterion for a bijection σ : V(T′) → V(T″), where T′ and T″ are combinatorial trees, to be an isomorphism: if υ₀, υ₁, …, υ_m is any sequence from V(T′) ⊔ V(T″) (here ⊔ denotes a disjoint union) such that υ₀ = υ_m and

\{υ_{i}, υ_{j}\} \in E (T') ⊔ E (T″) ⊔ \{\{u, σ (u)\} : u \in V (T')\} \Leftrightarrow i - j \equiv \pm 1 \pmod{m},

then m = 4.

The above result is elegant, but, of course, one does not need to apply it to all possible bijections to determine whether two combinatorial trees are isomorphic: there is a much more explicit and efficient procedure, which we now describe for the sake of completeness. First of all, suppose that T′ and T″ have distinguished vertices ρ′ and ρ″ and, in addition to the requirements in the above definition of an isomorphism σ, we require that σ maps ρ′ to ρ″; that is, we have rooted trees and we require that an isomorphism maps the root of one tree to the root of the other. The presence of a root allows us to think of a combinatorial tree as a directed graph, where the head of an edge is the vertex that is closer to the root and the tail is the vertex farther from the root. The children of a vertex are the adjacent vertices that are farther from the root and, more generally, the descendants of a vertex u are those vertices υ such that the path from the root to υ passes through u. The subtree spanned by a vertex u and its descendants contains no other vertices and can be thought of as a combinatorial tree rooted at u, and we call this subtree the subtree below u. Then, two rooted, combinatorial trees T′ and T″ are isomorphic if the two roots have the same number of children, say m, and there is an ordering of these children for each tree such that the subtree below the i^th child of the root of T′ is isomorphic (as a rooted, combinatorial tree) to the subtree below the i^th child of the root of T″. This observation can be turned into an efficient algorithm (see, for example, [AHU75, Example 3.2]). Now, two combinatorial trees are isomorphic if there is some choice of roots such that the resulting rooted, combinatorial trees are isomorphic. A center of a combinatorial tree is a vertex c such that

max_{υ \in V (T)} r_{T} (c, υ) = min_{u \in V (T)} max_{υ \in V (T)} r_{T} (u, υ),

where r_T(u, υ) is the number of edges in the unique path between u and υ for u, υ ∈ V(T), and a combinatorial tree has either a unique center or two centers that are adjacent. It is therefore possible to determine if two combinatorial trees are isomorphic by rooting each of them at their various centers and checking if any two such rooted, combinatorial trees are isomorphic.
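The procedure just described can be sketched in a few lines of Python. This is a minimal illustration and not the linear-time algorithm of [AHU75]: it assumes each tree is given as a list of unordered vertex pairs with at least one edge, and it encodes a rooted tree by sorting the parenthesised codes of the subtrees below the children of the root. The function names `adjacency`, `centers`, `canonical` and `isomorphic` are hypothetical.

```python
from collections import defaultdict

def adjacency(edges):
    """Adjacency lists of a combinatorial tree given as (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def centers(adj):
    """Return the one or two centers by repeatedly stripping off the leaves."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = len(adj)
    layer = [v for v, d in degree.items() if d <= 1]
    while remaining > 2:
        remaining -= len(layer)
        next_layer = []
        for leaf in layer:
            for w in adj[leaf]:
                degree[w] -= 1
                if degree[w] == 1:
                    next_layer.append(w)
        layer = next_layer
    return layer

def canonical(adj, root, parent=None):
    """Code of the subtree below `root`: the sorted codes of its children."""
    codes = sorted(canonical(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(codes) + ")"

def isomorphic(edges1, edges2):
    """Compare the trees rooted at their centers, as in the text."""
    a1, a2 = adjacency(edges1), adjacency(edges2)
    if len(a1) != len(a2):
        return False
    codes1 = {canonical(a1, c) for c in centers(a1)}
    codes2 = {canonical(a2, c) for c in centers(a2)}
    return bool(codes1 & codes2)
```

Sorting strings at every vertex keeps the sketch short at the cost of efficiency, and the recursion depth limits it to trees of moderate height.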

We, however, are interested in whether there are “statistics” of a more numerical character that can be used to decide tree isomorphism. For combinatorial trees, one somewhat obvious possibility is the multiset of eigenvalues of some matrix associated with the tree such as the adjacency matrix or the distance matrix. Unfortunately, the results of [Sch73, BM93, SF83, FGM97, ME11, BES12] show that not only is the isomorphism type of a tree not uniquely determined by the spectrum of its adjacency matrix but for various ensembles of combinatorial trees if one picks a tree uniformly at random from those in the ensemble with n vertices, then the probability there is another tree in the ensemble with an adjacency matrix that has the same spectrum converges to one as n → ∞. The results of [McK77, ME11] can be used to show that an analogous phenomenon is present when one considers, respectively, the spectrum of the matrix of vertex-to-vertex distances and the matrix of leaf-to-leaf distances.

Two trees have adjacency matrices with the same spectrum if and only if the characteristic polynomials of the adjacency matrices are equal. Given some irreducible representation of the symmetric group on the number of letters equal to the dimension of a square matrix, the immanantal polynomial of the matrix is constructed in the same manner as the characteristic polynomial except that the determinant is replaced by a similarly defined object for which the sign character is replaced by the character of the representation. One might hope that the immanantal polynomials are more successful at deciding isomorphism of combinatorial trees, but a result of [BM93] shows that if the adjacency matrices of two combinatorial trees have the same characteristic polynomials, then they have the same immanantal polynomials for every irreducible representation. We note that [Tur68] already contains an example of two combinatorial trees with adjacency matrices that are explicitly shown to have the same immanantal polynomial.

The greedoid Tutte polynomial of a combinatorial tree T encodes for each i and ℓ the number of subtrees of T that have i internal vertices and ℓ leaves. It was conjectured in [GMOY95] that this information identifies the isomorphism type of a combinatorial tree. However, it was shown in [EG06] that there are infinitely many pairs of nonisomorphic caterpillars that share the same greedoid Tutte polynomial: a caterpillar is a combinatorial tree that consists of some number of internal vertices along a single path and leaves that are each adjacent to one of the internal vertices. This contrasts with the situation for rooted, combinatorial trees; it is shown in [GM89] that there is a two-variable polynomial defined for all rooted, directed graphs (and hence, in particular, for rooted, combinatorial trees) such that two rooted, combinatorial trees have the same polynomial if and only if they are isomorphic. The polynomial in [GM89] is defined recursively, but it is not hard to see that it encodes in a compact manner the total number of vertices in the tree, the number of children of the root, the number of vertices in each of the subtrees below the children of the root, and so on.

The chromatic symmetric function of a graph was introduced in [Sta95]. A proper coloring of a finite graph is a function κ from the vertices of the graph to ℕ such that adjacent vertices are assigned different values. We can introduce an equivalence relation on the proper colorings by declaring that two colorings κ′ and κ″ are equivalent if there is a bijection π : ℕ → ℕ such that κ″ = π ◦ κ′. For a graph with m vertices, each equivalence class gives rise to a partition λ₁ ≥ λ₂ ≥ … ≥ λ_k > 0 of m by taking, for any κ in the equivalence class, λ_i to be the i^th largest of the cardinalities ⧣{υ : κ(υ) = j} as j ranges over ℕ. The chromatic symmetric function encodes for each partition of m the number of equivalence classes of colorings that give rise to that partition. It was conjectured in [Sta95] that nonisomorphic combinatorial trees have distinct chromatic symmetric functions. It was shown in [MMW08, APZ14] that this conjecture is true for certain classes of caterpillars and the latter paper also reports on computational results verifying that the conjecture holds for the class of trees with at most 23 vertices. Further work related to the conjecture is contained in [OS14, SST15].

Our point of departure in this paper is the well-known fact [Zar65, SP69, Bun71, Bun74] that an edge-weighted tree can be reconstructed from its matrix of leaf-to-leaf distances (see [Fel04] for an indication of the importance of this observation in the statistical reconstruction of phylogenetic trees). In fact, an edge-weighted tree with n leaves can be reconstructed if one knows for every subset of m leaves the total length of the subtree they span, provided that n ≥ 2m − 1 [PS04]. We remark that the total length of the subtree spanned by a set of leaves is an important quantity in phylogenetics where it is called the phylogenetic diversity of the corresponding set of taxa [HS07].

Given these results, one might imagine that the multiset of leaf-to-leaf distances suffices to identify the isomorphism type of an edge-weighted tree. This is certainly not the case. For example, consider the two combinatorial caterpillars T′ and T″ with 28 vertices each, where T′ has 3 internal vertices a′, b′, c′ in order along a path that are adjacent respectively to 2, 11, 12 leaves, and T″ has 3 internal vertices a″, b″, c″ in order along a path that are adjacent respectively to 3, 14, 8 leaves. Taking the $\binom{25}{2}$ pairs of distinct leaves in T′, we see that the distance 2 appears $\binom{2}{2} + \binom{11}{2} + \binom{12}{2} = 1 + 55 + 66 = 122$ times, the distance 3 appears 2 × 11 + 11 × 12 = 154 times, and the distance 4 appears 2 × 12 = 24 times. Similarly, taking the $\binom{25}{2}$ pairs of distinct leaves in T″, we see that the distance 2 appears $\binom{3}{2} + \binom{14}{2} + \binom{8}{2} = 3 + 91 + 28 = 122$ times, the distance 3 appears 3 × 14 + 14 × 8 = 154 times, and the distance 4 appears 3 × 8 = 24 times. Probabilistically, we have just shown that if we pick two leaves uniformly at random without replacement from an edge-weighted tree, then the isomorphism type of the tree is not uniquely identified by the probability distribution of the distance between the two leaves.

Note in this last example that if we looked at the multisets of lengths of subtrees spanned by three leaves, then we would see the length 3 appearing $\binom{11}{3} + \binom{12}{3} = 165 + 220 = 385$ times for T′ and $\binom{3}{3} + \binom{14}{3} + \binom{8}{3} = 1 + 364 + 56 = 421$ times for T″, and hence the probability distribution of the length of the subtree spanned by three leaves chosen uniformly at random is not the same for the two trees.
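The counts in this example are easy to check mechanically. The following sketch assumes a caterpillar with unit edge lengths is described by the tuple of numbers of leaves attached to successive vertices of its central path; the function names are hypothetical. Two leaves attached to path vertices i and j are at distance 2 + |i − j|, and three leaves span a subtree of length 3 exactly when all three are attached to the same path vertex.

```python
from collections import Counter
from math import comb

def pair_distance_counts(leaf_counts):
    """Multiset of leaf-to-leaf distances in a unit-length caterpillar.

    leaf_counts[i] is the number of leaves attached to the i-th path vertex.
    """
    counts = Counter()
    for i, n_i in enumerate(leaf_counts):
        counts[2] += comb(n_i, 2)  # both leaves on the same path vertex
        for j in range(i + 1, len(leaf_counts)):
            counts[2 + (j - i)] += n_i * leaf_counts[j]
    return counts

def triples_of_length_three(leaf_counts):
    """Number of leaf triples whose spanned subtree has exactly 3 edges."""
    return sum(comb(n_i, 3) for n_i in leaf_counts)

t1, t2 = (2, 11, 12), (3, 14, 8)
same_pair_distances = pair_distance_counts(t1) == pair_distance_counts(t2)
triple_counts = (triples_of_length_three(t1), triples_of_length_three(t2))
```

Here `same_pair_distances` records that the two distance multisets agree, while `triple_counts` holds the two differing counts for three leaves.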

In order to proceed further, we need to introduce some more notation. Write L(T) for the set of leaves of an edge-weighted tree T. Given a subset K of L(T), let W_T(K) be the length of the subtree spanned by K; that is, W_T(K) is the sum of the lengths of the edges in the smallest connected subgraph of T with a vertex set that contains K.

It is possible to calculate the total length of T, that is, W_T(L(T)), using the following result from [SS04] that extends one for the special case of 3-valent trees in [Pau00]. Write d_T(υ) for the degree of an interior vertex υ of T (that is, υ ∈ V(T) \ L(T)). For distinct leaves x, y ∈ L(T) denote by I_T(x, y) the set of interior vertices on the unique path in T between x and y and put

h_{T} (x, y) ≔ \prod_{υ \in I_{T} (x, y)} {(d_{T} (υ) - 1)}^{- 1} .

Let r_T(x, y) be the sum of the lengths of the edges in the path between x and y. Then,

W_{T} (L (T)) = \sum_{{x, y} \subseteq L (T), x \neq y} h_{T} (x, y) r_{T} (x, y) .

Of course, a similar formula gives W_T(K) for any K ⊆ L(T); the path between a pair of leaves of the subtree is the same as the path between them in T, the length of this path is the same in the subtree as it is in T, but the degree of an interior vertex of the subtree can be less than its degree as an interior vertex of T.
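As a sanity check, the pairwise formula for the total length can be compared with the sum of the edge weights on small examples. The sketch below takes the weight h_T(x, y) to be the product of 1/(d_T(υ) − 1) over the interior vertices υ of the path from x to y, and assumes a tree is supplied as a list of hypothetical `(u, v, weight)` triples; exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import combinations

def build(edges):
    """Nested dictionary adj[u][v] = weight of the edge {u, v}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def path(adj, x, y):
    """Vertices on the unique path from x to y, found by depth-first search."""
    stack = [(x, [x])]
    while stack:
        v, p = stack.pop()
        if v == y:
            return p
        for w in adj[v]:
            if len(p) < 2 or w != p[-2]:  # do not step straight back
                stack.append((w, p + [w]))

def length_via_pairs(edges):
    """Sum of h_T(x, y) * r_T(x, y) over unordered pairs of distinct leaves."""
    adj = build(edges)
    leaves = [v for v in adj if len(adj[v]) == 1]
    total = Fraction(0)
    for x, y in combinations(leaves, 2):
        p = path(adj, x, y)
        r = sum(adj[a][b] for a, b in zip(p, p[1:]))
        h = Fraction(1)
        for v in p[1:-1]:               # interior vertices of the path
            h /= len(adj[v]) - 1
        total += h * r
    return total

quartet = [("a", "u", 1), ("b", "u", 2), ("u", "v", 3), ("c", "v", 4), ("d", "v", 5)]
star = [("a", "x", 1), ("b", "x", 2), ("c", "x", 3), ("d", "x", 4)]
```

For both hypothetical examples, `length_via_pairs` agrees with the plain sum of the edge weights.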

Suppose that ⧣L(T) = n and Y₁, …, Y_n is a uniformly distributed random listing of L(T); that is, Y₁, …, Y_n is the result of sampling the leaves of T uniformly at random without replacement. Set W_k ≔ W_T({Y₁, …, Y_k}) for 2 ≤ k ≤ n; that is, the random variable W_k is the length of the subtree spanned by the first k of the randomly chosen leaves. We write 𝒲_T for the (n − 1)-dimensional random vector (W₂, …, W_n) and call this random vector the random length sequence of T.

In this paper we address the following question.

Question 1.1

Can we reconstruct the edge-weighted tree T up to isomorphism from the joint probability distribution of the random length sequence 𝒲_T?

Another way of framing this question is the following. Write y₁, …, y_n for the leaves of T and let 𝒥_T be the multiset with cardinality n! that results from listing the (n − 1)-dimensional vectors

(W_{T} ({y_{π (1)}, y_{π (2)}}), W_{T} ({y_{π (1)}, y_{π (2)}, y_{π (3)}}), \dots, W_{T} ({y_{π (1)}, \dots, y_{π (n)}}))

as π ranges over the permutations of [n] ≔ {1, …, n}. We stress that 𝒥_T is a multiset; that is, we do not know which increasing sequences of lengths go with which ordered listings of the leaves.
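For small trees the multiset 𝒥_T can be enumerated by brute force, which may help in experimenting with Questions 1.1 and 1.2. The following sketch assumes a tree is given as hypothetical `(u, v, weight)` triples with sortable vertex labels, and uses the fact that an edge belongs to the subtree spanned by a set K of leaves exactly when K has leaves on both sides of that edge. Dividing each multiplicity by n! gives the joint probability distribution of the random length sequence.

```python
from collections import Counter
from itertools import permutations

def leaf_sets_by_edge(edges):
    """Leaves of the tree and, for each edge, (leaves on one side, weight)."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    sides = []

    def below(v, parent):
        found = {v} if v in leaves else set()
        for child, w in adj[v]:
            if child != parent:
                sub = below(child, v)
                sides.append((frozenset(sub), w))
                found |= sub
        return found

    below(next(iter(adj)), None)  # root the tree at an arbitrary vertex
    return sorted(leaves), sides

def spanned_length(K, sides):
    """W_T(K): total weight of the edges that separate two leaves of K."""
    K = set(K)
    return sum(w for side, w in sides if K & side and K - side)

def length_sequences(edges):
    """The multiset J_T as a Counter over (W_2, ..., W_n) tuples."""
    leaves, sides = leaf_sets_by_edge(edges)
    return Counter(
        tuple(spanned_length(order[:k], sides) for k in range(2, len(leaves) + 1))
        for order in permutations(leaves)
    )
```

The enumeration visits all n! listings, so it is only practical for a handful of leaves.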

Question 1.2

Can we reconstruct the edge-weighted tree T up to isomorphism from the multiset of length sequences 𝒥_T?

We end this section with some remarks about the problem of reconstructing trees from various so-called decks, as this subject has some similarities to the questions we consider. In [Ula60], Ulam asked whether it is possible to reconstruct the isomorphism type of a graph with at least 3 vertices from the isomorphism types of the subgraphs obtained by deleting each of the vertices. This question was resolved in the affirmative for combinatorial trees in [Kel57]. Moreover, later results established that it is not necessary to know the forests obtained by deleting every vertex. For example, it was shown in [HP66] that it suffices to know the subtrees obtained by deleting leaves. This latter result was strengthened in [Man70], where it was found that it is only necessary to know which nonisomorphic forests are obtained and not what the multiplicity of each isomorphism type is, and in [Bon69], where it was shown that it suffices to take only those leaves p that are peripheral in the sense that

max_{υ \in V (T)} r_{T} (p, υ) = max_{υ \in V (T)} max_{u \in V (T)} r_{T} (u, υ) .

Along the same lines, it was established in [Lau83] that it is enough to take only the nonleaf vertices, provided that there are at least three of them. The line of inquiry in [KS85] is the most similar to ours: an example was presented of two trees for which the respective sets of vertices may be paired up in such a way that for each pair the sizes of the trees in the forests produced by removing each element of the pair from its tree are the same, and a necessary and sufficient condition was given for a tree to be uniquely reconstructible from this sort of data, which the authors of [KS85] call the number deck of the tree.

1.2. Overview of the main results

We will answer Question 1.1 in the affirmative for a few different classes of trees. Some classes will have general edge-weights and some classes will be combinatorial trees. It is clear that in the case of general edge-weights we must restrict to trees that have no vertices with degree 2 because otherwise we can subdivide any edge into arbitrarily many edges with the same total length and the joint probability distribution of the random length sequence will be unchanged – see Figure 1.1. We call such trees simple. The terms irreducible or homomorphically irreducible are also used in the literature. Any simple tree with integer edge-weights can be associated with a combinatorial tree that is typically not simple by replacing an edge with weight m by a path of length m. Furthermore, any tree with commensurable edge-weights can be turned into one with integer edge-weights by multiplying each edge-weight by a suitable constant. This, however, does not reduce Question 1.1 for edge-weighted trees to the corresponding question for combinatorial trees: a priori, it could be the case that there is an example of an edge-weighted tree with incommensurable edge-weights that is not reconstructable from the joint probability distribution of its random length sequence – such a tree can be “approximated” by trees with commensurable edge-weights and hence by combinatorial trees, but in principle there is nothing that prevents each of the approximations being reconstructable while their “limit” is not.

Figure 1.1 — Two non-isomorphic edge-weighted trees that cannot be distinguished by the joint probability distribution of their random length sequences.

Our first result is for the class of stars; that is, edge-weighted trees with n ≥ 3 leaves that have a single interior vertex. Note that such trees are simple. For any edge-weighted tree with n leaves, W_n is a constant (the total length of the tree) and W_n − W_{n−1} is a uniformly distributed random pick from the lengths of the n edges that are adjacent to one of the leaves. The following simple result is immediate from this observation.

Theorem 1.3

For n ≥ 3 the isomorphism type of a star is uniquely determined by the joint probability distribution of its random length sequence.
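The observation behind Theorem 1.3 can be made concrete as follows. In this sketch the law of W_n − W_{n−1} is assumed to be given as a dictionary from values to exact probabilities (a hypothetical input format), and a value with probability p is read as the common length of p · n edges of the star.

```python
from fractions import Fraction

def star_edge_weights(increment_distribution, n):
    """Edge weights of a star with n leaves from the law of W_n - W_{n-1}.

    `increment_distribution` maps each possible value of W_n - W_{n-1}
    to its probability; a value with probability p labels p * n edges.
    """
    weights = []
    for value, prob in sorted(increment_distribution.items()):
        multiplicity = Fraction(prob) * n
        if multiplicity.denominator != 1:
            raise ValueError("probabilities must be multiples of 1/n")
        weights.extend([value] * int(multiplicity))
    return weights
```

For instance, with n = 3 and probabilities 2/3 and 1/3 on the values 1 and 5, the sketch returns the edge weights 1, 1, 5. The number of leaves n is itself available from the random length sequence, which has n − 1 coordinates.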

The simple trees with two leaves all consist of a single edge and have a random length sequence (W₂), where W₂ is the length of that edge, and so the isomorphism type of such a tree is uniquely determined by the joint probability distribution of its random length sequence. The simple trees with three leaves are stars, and it follows from Theorem 1.3 that the isomorphism type of such a tree is uniquely determined by the joint probability distribution of its random length sequence.

We next consider simple, edge-weighted trees with four leaves.

Theorem 1.4

For 2 ≤ n ≤ 4, the isomorphism type of a simple, edge-weighted tree T with n leaves is uniquely determined by the joint probability distribution of its random length sequence.

The proof of this result is via consideration of possible cases. Similar proofs could be attempted for larger numbers of leaves, but the main reason we include the result is to show how such a proof for even a small number of leaves leads to quite a few cases and because we will need the case of four leaves later.

It is well-known that any simple, combinatorial tree with labeled leaves can be reconstructed from the simple, combinatorial trees spanned by each subset of four leaves (the so-called quartets) [SS03, Theorem 6.3.7]. With this and Theorem 1.4 in mind, one might imagine that the isomorphism type of a simple, edge-weighted tree can be determined from the joint probability distribution of (W₂, W₃, W₄). However, putting such a strategy into practice would seem to be rather complicated because there can be two sets of leaves $\{y_{1}', y_{2}', y_{3}', y_{4}'\}$ and $\{y_{1}'', y_{2}'', y_{3}'', y_{4}''\}$ such that $\{y_{1}', y_{2}', y_{3}', y_{4}'\} \neq \{y_{1}'', y_{2}'', y_{3}'', y_{4}''\}$ but $W_{T} (\{y_{1}', y_{2}'\}) = W_{T} (\{y_{1}'', y_{2}''\})$, $W_{T} (\{y_{1}', y_{2}', y_{3}'\}) = W_{T} (\{y_{1}'', y_{2}'', y_{3}''\})$, and $W_{T} (\{y_{1}', y_{2}', y_{3}', y_{4}'\}) = W_{T} (\{y_{1}'', y_{2}'', y_{3}'', y_{4}''\})$. One way of ruling out such annoying algebraic coincidences is to assume that the edge-weighted tree T has edge-weights in general position, by which we mean that the sums of the lengths of any two different (not necessarily disjoint) subsets of edges of T are not equal.

Theorem 1.5

The isomorphism type of a simple, edge-weighted tree T with edge-weights in general position is uniquely determined by the joint probability distribution of its random length sequence.

The last family of edge-weighted trees with general edge-weights whose elements we can identify up to isomorphism from the joint probability distributions of their random length sequences is the class of ultrametric trees. For the sake of completeness, we now define this class. Recall that for leaves i, j ∈ L(T) we denote by r_T(i, j) the distance between them; that is, r_T(i, j) is the sum of the lengths of the edges on the unique path between i and j. The edge-weighted tree T is ultrametric if for any leaves i, j, k ∈ L(T) we have

r_{T} (i, k) \leq r_{T} (i, j) \lor r_{T} (j, k),

from which it follows that for any leaves i, j, k ∈ L(T) at least two of the distances r_T(i, j), r_T(i, k), and r_T(j, k) are equal while the third is no greater than that common value. Equivalently, an edge-weighted tree T is ultrametric if, when it is thought of as a real tree (that is, a metric space where the edges are treated as real intervals of varying lengths given by their edge-weights – see, for example, [Eva08]), then there is a (unique) point ρ called the root (which may be in the interior of an edge) such that the distance from ρ to a leaf is the same for all leaves. We will make use of both definitions. It is immediate from the former definition that the subtree of an ultrametric tree spanned by a subset of leaves is itself ultrametric.

Theorem 1.6

The isomorphism type of an ultrametric, simple, edge-weighted tree T is uniquely determined by the joint probability distribution of its random length sequence.

Remark 1.7

The proof of Theorem 1.6 establishes an even stronger result. Namely, the isomorphism type of an ultrametric, simple, edge-weighted tree T is uniquely determined by the minimal element of 𝒥_T in the lexicographic order.

Remark 1.8

We call attention to a subtle point in the statements of Theorem 1.5 and Theorem 1.6. Both results say that if we are given the joint probability distribution of the random length sequence of an edge-weighted tree T – information that certainly includes the number of leaves of T – and we know, a priori, that T has a certain extra property (edge-weights in general position or ultrametricity), then we can determine the isomorphism type of T. The theorems do not, however, say whether it is possible to determine from the joint probability distribution of its random length sequence whether a simple, edge-weighted tree T has its edge-weights in general position or is ultrametric. We do not have results that settle this question, but we say some more about it in Remark 4.1 and believe it is an interesting area for future research.

Observe that if T is an edge-weighted tree, a is any vertex of T, and c is a constant such that c ≥ max{r_T(a, i) : i ∈ L(T)}, then r̃_T : L(T) × L(T) → ℝ₊ defined by

{\tilde{r}}_{T} (i, j) ≔ c + \frac{1}{2} (r_{T} (i, j) - r_{T} (a, i) - r_{T} (a, j)), i \neq j,

and

{\tilde{r}}_{T} (i, i) ≔ 0,

is an ultrametric on L(T) that arises from suitable edge-weights on T. The metric r̃_T is often called the Farris transform of r_T – see [DHM07] for a review of the many appearances of this object in various areas from phylogenetics to metric geometry. It might be hoped that an affirmative answer to Question 1.1 for general edge-weighted trees will follow from Theorem 1.6. However, we have been unable to find an argument which shows that the joint probability distribution of the random length sequence of the tree T equipped with the new edge-weights is determined by the joint probability distribution of the random length sequence for the original edge-weights.
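The definition of the Farris transform can be tried out numerically. The sketch below assumes a tree is given as hypothetical `(u, v, weight)` triples, takes c to be the smallest admissible constant max{r_T(a, i) : i ∈ L(T)}, and checks the three-point inequality r(i, k) ≤ r(i, j) ∨ r(j, k) over all ordered triples of distinct leaves.

```python
from itertools import permutations

def build(edges):
    """Adjacency lists with weights for a tree given as (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def distances_from(adj, source):
    """Path lengths from `source` to every vertex of the tree."""
    dist = {source: 0}
    stack = [source]
    while stack:
        v = stack.pop()
        for w, weight in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + weight
                stack.append(w)
    return dist

def leaf_metric(adj):
    """Leaves of the tree and the leaf-to-leaf distances r_T."""
    leaves = sorted(v for v in adj if len(adj[v]) == 1)
    r = {}
    for i in leaves:
        d = distances_from(adj, i)
        for j in leaves:
            r[i, j] = d[j]
    return leaves, r

def farris_transform(adj, a):
    """Farris transform of r_T based at the vertex a."""
    leaves, r = leaf_metric(adj)
    from_a = distances_from(adj, a)
    c = max(from_a[i] for i in leaves)  # smallest admissible constant
    return {
        (i, j): 0 if i == j else c + (r[i, j] - from_a[i] - from_a[j]) / 2
        for i in leaves for j in leaves
    }

def is_ultrametric(d, points):
    """Three-point condition over ordered triples of distinct points."""
    return all(d[i, k] <= max(d[i, j], d[j, k])
               for i, j, k in permutations(points, 3))

# Hypothetical example: a tree with four leaves and two interior vertices.
edges = [("a", "u", 1), ("b", "u", 2), ("u", "v", 3), ("c", "v", 4), ("d", "v", 5)]
adj = build(edges)
leaves, r = leaf_metric(adj)
r_tilde = farris_transform(adj, "u")
```

In this example the original leaf metric `r` fails the ultrametric inequality while `r_tilde` satisfies it.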

Suppose that T is a rooted, simple combinatorial tree with root ρ. We can define a partial order on V(T) by declaring that x ≤ y if x is on the unique path from ρ to y. Two vertices x, y ∈ V(T) have a unique greatest lower bound in this partial order that we write as x ∧ y and call the most recent common ancestor of x and y. The map r̂_T : L(T) × L(T) → ℝ₊ defined by

{\hat{r}}_{T} (i, j) ≔ ⧣ {k \in L (T) : i \land j < k}

is an ultrametric on L(T) and hence it arises from a collection of edge-weights Ŵ_T on T. A directed edge (x, y) in T with x ≤ y is necessarily of the form x = i ∧ j = i ∧ k and y = j ∧ k for some i, j, k ∈ L(T). If e = {x, y} is the corresponding undirected edge, then

{\hat{W}}_{T} (e) = \frac{1}{2} ({\hat{r}}_{T} (i, j) - {\hat{r}}_{T} (j, k)) = \frac{1}{2} (⧣ {ℓ \in L (T) : x < ℓ} - ⧣ {ℓ \in L (T) : y < ℓ}) .

Therefore, if T′ is a subtree of T spanned by some set of leaves K ⊆ L(T) and D(T′) is the set of directed edges of T′, then we have that the length of T′ is

{\hat{W}}_{T} (K) = \frac{1}{2} \sum_{(x, y) \in D (T')} (\sum_{ℓ \in L (T)} 𝟙 {x < ℓ} - 𝟙 {y < ℓ}) = \frac{1}{2} ⧣ {((x, y), ℓ) \in D (T') \times L (T) : x < ℓ, y ≮ ℓ} .

The following result is immediate from Theorem 1.6 and Remark 1.7.

Corollary 1.9

The isomorphism type of a simple, combinatorial tree T is uniquely determined by the minimal element of the set 𝒥_T of length sequences obtained after designating a root for T and equipping T with the edge-weights Ŵ_T.

We now turn our focus to combinatorial trees and drop the assumption of simplicity. That is, all edge-weights are equal to one and there may be vertices with degree two. We answer Question 1.1 in the affirmative for two families of combinatorial trees.

First, a combinatorial tree T is a caterpillar if the deletion of the leaves along with the edges adjacent to them results in a path with ℓ + 1 vertices (and hence ℓ edges) – see, for example, Figure 1.2. Choose some direction for the path and number from 0 to ℓ the vertices on the path encountered successively in that direction and write n_i for the number of leaves adjacent to the vertex numbered i. Note that n₀ ≥ 1 and n_ℓ ≥ 1. Two sequences $n_{0}^{'}, \dots, n_{ℓ'}^{'}$ and $n_{0}^{″}, \dots, n_{ℓ ″}^{″}$ correspond to isomorphic trees if and only if ℓ′ = ℓ″ = ℓ, say, and either $n_{i}^{'} = n_{i}^{″}$ , 0 ≤ i ≤ ℓ or $n_{i}^{'} = n_{ℓ - i}^{″}$ , 0 ≤ i ≤ ℓ.

Figure 1.2 — A caterpillar tree. Removing the leaves (white vertices) results in a path of length 5 (black vertices).

Theorem 1.10

The isomorphism type of a caterpillar is uniquely determined by the joint probability distribution of its random length sequence. Furthermore, it is possible to determine from the joint probability distribution of the random length sequence of a combinatorial tree whether the tree is a caterpillar.

Our final results are for the classes of (unrooted) (k + 1)-valent and rooted k-ary combinatorial trees. For k ≥ 2, a (k + 1)-valent combinatorial tree is a combinatorial tree for which all vertices have degree either k + 1 (the internal vertices) or 1 (the leaves). For k ≥ 2, a rooted k-ary combinatorial tree is a combinatorial tree for which one internal vertex (the root) has degree k and the remaining internal vertices have degree k + 1; the leaves, of course, have degree 1. When k = 2 we refer to a rooted 2-ary combinatorial tree as a rooted binary combinatorial tree. Attaching an extra vertex via an edge to the root of a rooted k-ary tree produces a (k + 1)-valent combinatorial tree.

Theorem 1.11

The isomorphism type of a (k + 1)-valent (respectively, a rooted k-ary) combinatorial tree is uniquely determined by the joint probability distribution of its random length sequence.

In fact, our proof of Theorem 1.11 leads us to a stronger conclusion.

Theorem 1.12

Fix n > 1. Let 𝒯 be a random (k + 1)-valent (respectively, a random rooted k-ary) combinatorial tree. Then, the probability distribution of the isomorphism type of 𝒯 is uniquely determined by the joint probability distribution of its random length sequence.

Note that in Theorem 1.12 there are two sources of randomness in the construction of the random length sequence: we first choose a realization of the random 𝒯 and then take an independent uniform random listing of the leaves to build the increasing sequence of subtrees and their lengths.

The rest of the paper consists primarily of proofs of the above results in the order we have presented them. In Section 10 we briefly discuss further open questions related to Question 1.1.

2. Trees with up to n = 4 leaves: Proof of Theorem 1.4

We begin by looking at Question 1.1 for edge-weighted trees with a small number of leaves and give a proof of Theorem 1.4 that answers Question 1.1 in the affirmative for general, simple edge-weighted trees with n = 2, 3 or 4 leaves.

The case of Theorem 1.4 for simple trees with n = 2 leaves is trivial, as all such trees have two leaves and one edge, 𝒲_T = (W₂) in this case, and W₂ is the length of the edge.

The case of n = 3 leaves is only slightly more complicated, as all such trees are star-shaped. Thus, determining T from 𝒲_T consists of determining its three edge weights. These can be inferred easily from 𝒲_T by looking at the distribution of W₃ − W₂, which, since W₃ is constant (equal to the total length of T), is distributed as a uniform random choice from the three edge weights.

Finally, we give a proof of Theorem 1.4 in the case when n = 4.

Proof

For n = 4 leaves, there are two possible simple combinatorial trees, and hence two possibilities for the shape of T. The first is the star-shaped tree with four edges and one interior vertex. The second is the 3-valent tree with two interior vertices and one interior edge. See Figure 2.1.

Figure 2.1 — The two possible simple combinatorial trees with n = 4 leaves.

To determine which possibility T is, we first look at the distribution of W₄ − W₃ to find the lengths of the four edges connecting directly to the four leaves. Call these edges pendent. If the sum of the four pendent edge lengths equals W₄, then T is star shaped and we have determined T up to isomorphism. If not, then T is 3-valent and the difference between W₄ and the sum of the pendent edge lengths is the length e of the interior edge. All that is left to determine T up to isomorphism in this second case is determining how the pendent edges pair on each side of the interior edge.

First, if the multiset of the lengths of pendent edges is of the form {a, a, a, a} or {a, a, a, b}, then T is already uniquely determined.

Next, if the multiset is of the form {a, a, b, b}, then we need to distinguish between the case where the leaves with pendent edges of length a are siblings (and thus so are the leaves with pendent edge length b) and the case where leaves with pendent edge lengths a and b are paired. In the former case the possible values of W₂ are a + a, b + b, a + b + e with respective probabilities $\frac{4}{24}, \frac{4}{24}, \frac{16}{24}$ , whereas in the latter case the possible values of W₂ are a + b, a + b + e, a + a + e, b + b + e with respective probabilities $\frac{8}{24}, \frac{8}{24}, \frac{4}{24}, \frac{4}{24}$ , and we can certainly distinguish between the two cases.

If the multiset of pendent edge lengths is of the form {a, a, b, c}, then there are the following two possibilities:

(P1)
the two leaves with pendent edge length a are siblings and the two leaves with pendent edge lengths b and c are siblings, in which case the possible values of W₂ are a + a, a + b + e, a + c + e, b + c with respective probabilities $\frac{4}{24}, \frac{8}{24}, \frac{8}{24}, \frac{4}{24}$ ;
(P2)
a leaf with pendent edge length a is the sibling of the one with pendent edge length b and the other leaf with pendent edge length a is the sibling of the one with pendent edge length c, in which case the possible values of W₂ are a + b, a + c, a + a + e, a + b + e, a + c + e, b + c + e with respective probabilities $\frac{4}{24}, \frac{4}{24}, \frac{4}{24}, \frac{4}{24}, \frac{4}{24}, \frac{4}{24}$ .

Suppose without loss of generality that b < c. If a < b < c, then ℙ{W₂ = a + a} is $\frac{4}{24}$ for (P1) and 0 for (P2). If b < a < c or b < c < a, then ℙ{W₂ = a + c + e} is $\frac{8}{24}$ for (P1) and $\frac{4}{24}$ for (P2). In all cases we can distinguish between (P1) and (P2).

Finally, if the multiset of pendent edge lengths is of the form {a, b, c, d}, then there are the following three possibilities:

(P3)
the leaf with pendent edge length a is paired with the one with pendent edge length b and the leaf with pendent edge length c is paired with the one with pendent edge length d, in which case the possible values of W₂ are a + b, c + d, a + c + e, a + d + e, b + c + e, b + d + e with common probability $\frac{4}{24}$ ;
(P4)
the leaf with pendent edge length a is paired with the one with pendent edge length c and the leaf with pendent edge length b is paired with the one with pendent edge length d, in which case the possible values of W₂ are a + c, b + d, a + b + e, a + d + e, b + c + e, c + d + e with common probability $\frac{4}{24}$ ;
(P5)
the leaf with pendent edge length a is paired with the one with pendent edge length d and the leaf with pendent edge length b is paired with the one with pendent edge length c, in which case the possible values of W₂ are a + d, b + c, a + c + e, a + b + e, c + d + e, b + d + e with common probability $\frac{4}{24}$ .

Suppose without loss of generality that a < b < c < d. Then possibility (P3) holds if and only if ℙ{W₂ = a + b} > 0 and possibility (P5) holds if and only if ℙ{W₂ = a + b} = 0 and ℙ{W₂ = b + d + e} > 0, so we can distinguish between (P3), (P4) and (P5).
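The case analysis for n = 4 can be checked numerically. The sketch below is illustrative only: the helper name `w2_law` and the specific weights are assumptions chosen so that no two candidate values of W₂ coincide. It computes the law of W₂ for a 3-valent tree with four leaves from the two sibling pairs and the interior edge length e.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def w2_law(pair1, pair2, e):
    """Law of W_2 for a 4-leaf 3-valent tree.

    pair1, pair2: pendent edge lengths of the two sibling pairs.
    e: length of the interior edge. (Hypothetical helper for illustration.)
    """
    leaves = [(w, 0) for w in pair1] + [(w, 1) for w in pair2]
    law = Counter()
    for (w1, side1), (w2, side2) in combinations(leaves, 2):
        # The path between two leaves crosses the interior edge
        # only when the leaves belong to different sibling pairs.
        length = w1 + w2 + (e if side1 != side2 else 0)
        law[length] += Fraction(1, 6)  # each unordered pair has probability 1/6
    return dict(law)


a, b, c, d, e = 1, 2, 4, 8, 16
p3 = w2_law((a, b), (c, d), e)  # possibility (P3)
p4 = w2_law((a, c), (b, d), e)  # possibility (P4)
p5 = w2_law((a, d), (b, c), e)  # possibility (P5)
```

With these weights, a + b has positive probability only under (P3), and b + d + e has positive probability under (P5) but not under (P4), which is the criterion used in the proof.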

The argument in the proof of Theorem 1.4 seems rather ad hoc and it does not suggest a systematic approach to obtaining the analogous result for trees with an arbitrary number of leaves. The number of simple combinatorial trees with n leaves grows so rapidly with n (see, for example, [Fel04]) that even for trees with a relatively small fixed number of leaves a case-by-case argument seems rather forbidding. Nonetheless, we do conjecture that an affirmative answer to Question 1.1 holds more generally.

3. Trees in general position: Proof of Theorem 1.5

Recall that the edge-weights of a simple, edge-weighted tree T are in general position if the sums of the lengths of any two distinct subsets of edges of T are not equal.
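As an illustration of this definition, the following sketch tests whether a list of edge weights is in general position by comparing all subset sums. The function name `in_general_position` is an assumption made for this example, and the exhaustive search is only practical for a small number of edges; exact weights (integers or fractions) are assumed so that equality of sums is meaningful.

```python
from itertools import combinations


def in_general_position(weights):
    """Return True if distinct subsets of the edges always have distinct total lengths.

    Subsets are taken by edge index, so two edges with equal weight
    already violate the condition. (Hypothetical helper for illustration.)
    """
    seen = set()
    for size in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), size):
            total = sum(weights[i] for i in subset)
            if total in seen:
                return False
            seen.add(total)
    return True
```

For example, the weights 1, 2, 4, 8 are in general position, whereas 1, 2, 3 are not because 1 + 2 = 3.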

Proof

By assumption, if $\{y_{1}^{'}, \dots, y_{k}^{'}\}$ and $\{y_{1}^{″}, \dots, y_{k}^{″}\}$ are two subsets of L(T) such that $W_{T} (\{y_{1}^{'}, \dots, y_{k}^{'}\}) = W_{T} (\{y_{1}^{″}, \dots, y_{k}^{″}\})$ , then $\{y_{1}^{'}, \dots, y_{k}^{'}\} = \{y_{1}^{″}, \dots, y_{k}^{″}\}$ . Consequently, if $\{y_{1}^{'}, \dots, y_{k}^{'}\}$ and $\{y_{1}^{″}, \dots, y_{k}^{″}\}$ are two subsets of L(T) such that $W_{T} (\{y_{1}^{'}, \dots, y_{j}^{'}\}) = W_{T} (\{y_{1}^{″}, \dots, y_{j}^{″}\})$ for 2 ≤ j ≤ k, then $\{y_{1}^{'}, y_{2}^{'}\} = \{y_{1}^{″}, y_{2}^{″}\}$ and $y_{j}^{'} = y_{j}^{″}$ for 3 ≤ j ≤ k.

Recall that Y₁, …, Y_n are the successive randomly chosen leaves used in the construction of 𝒲_T = (W₂, …, W_n).

Because W_n − W_n−1 is the length of the pendent edge attaching Y_n to the rest of T, it follows that the set C ≔ {ℓ > 0 : ℙ{W_n − W_n−1 = ℓ} > 0} has n elements and $ℙ {W_{n} - W_{n - 1} = ℓ} = \frac{1}{n}$ for each ℓ ∈ C. There are at least two leaves of T that are siblings, and so there exist ℓ′, ℓ″ ∈ C such that ℙ{W₂ = ℓ′ + ℓ″} > 0. Fix such a pair of lengths and write x₁ and x₂ for the (unique) leaves of T with pendent edges having respective lengths ℓ′ and ℓ″. We have $ℙ {W_{2} = ℓ' + ℓ ″} = \frac{1}{\binom{n}{2}}$ , and the event {W₂ = ℓ′ + ℓ″} coincides with the event {{Y₁, Y₂} = {x₁, x₂}}.

By assumption, the set D ≔ {ℓ > 0 : ℙ{W₃ − W₂ = ℓ|W₂ = ℓ′ + ℓ″} > 0} has n − 2 elements and $ℙ {W_{3} - W_{2} = ℓ | W_{2} = ℓ' + ℓ ″} = \frac{1}{n - 2}$ for each ℓ ∈ D. Index the values of D as ℓ₃, …, ℓ_n and write x_k, 3 ≤ k ≤ n, for the unique leaf of T that is distance ℓ_k from the unique vertex of T that is adjacent to both of the sibling leaves x₁ and x₂. We will show that it is possible to determine the leaf-to-leaf distances r_T(x_i, x_j), 1 ≤ i, j ≤ n. As we recalled in the Introduction, this information uniquely identifies the isomorphism type of T.

Again by assumption, the set E ≔ {ℓ > 0 : ℙ{W₄ = ℓ|W₂ = ℓ′ + ℓ″} > 0} has $\binom{n - 2}{2}$ elements and $ℙ {W_{4} = ℓ | W_{2} = ℓ' + ℓ ″} = \frac{1}{\binom{n - 2}{2}}$ for each ℓ ∈ E. For a given ℓ ∈ E there is a unique ordered pair (x_i, x_j), 3 ≤ i ≠ j ≤ n, and a unique e ≥ 0 such that

ℙ {W_{3} - W_{2} = ℓ_{i}, W_{4} - W_{3} = ℓ_{j} - e | W_{2} = ℓ' + ℓ ″, W_{4} = ℓ} > 0

and

ℙ {W_{3} - W_{2} = ℓ_{j}, W_{4} - W_{3} = ℓ_{i} - e | W_{2} = ℓ' + ℓ ″, W_{4} = ℓ} > 0,

in which case the two conditional probabilities in question are both $\frac{1}{2}$ . Moreover, every ordered pair (x_i, x_j), 3 ≤ i ≠ j ≤ n, corresponds to some unique ℓ ∈ E and e ≥ 0 in this way. The event {W₂ = ℓ′ + ℓ″, W₃ − W₂ = ℓ_i, W₄ − W₃ = ℓ_j − e, W₄ = ℓ} coincides with the event {{Y₁, Y₂} = {x₁, x₂}, Y₃ = x_i, Y₄ = x_j} and the event {W₂ = ℓ′ + ℓ″, W₃ − W₂ = ℓ_j, W₄ − W₃ = ℓ_i − e, W₄ = ℓ} coincides with the event {{Y₁, Y₂} = {x₁, x₂}, Y₃ = x_j, Y₄ = x_i}. Considering the subtree of T spanned by {x₁, x₂, x_i, x_j} and ignoring the vertices with degree two to produce a simple tree, the leaves x_i and x_j are siblings in this simple tree (as are x₁ and x₂), and the quantity e is the distance between the vertex in the subtree to which x_i and x_j are adjacent and the vertex to which x₁ and x₂ are adjacent; the lengths of the pendent edges connecting x_i and x_j to the rest of the subtree are ℓ_i − e and ℓ_j − e. Thus, if the ordered pair (x_i, x_j) corresponds to ℓ ∈ E and e ≥ 0, then, recalling the notation r_T for the path length distance in T, r_T(x₁, x₂) = ℓ′ + ℓ″, r_T(x₁, x_i) = ℓ′ + ℓ_i, r_T(x₁, x_j) = ℓ′ + ℓ_j, r_T(x₂, x_i) = ℓ″ + ℓ_i, r_T(x₂, x_j) = ℓ″ + ℓ_j, and r_T(x_i, x_j) = ℓ_i + ℓ_j − 2e.

Therefore, the joint probability distribution of the random length sequence 𝒲_T uniquely determines the matrix of leaf-to-leaf distances in T and hence the isomorphism type of T.

4. Ultrametric trees: Proof of Theorem 1.6

Recall that 𝒥_T is the set of sequences (ℓ₂, …, ℓ_n) such that ℙ{W_k = ℓ_k, 2 ≤ k ≤ n} > 0. Write ≺ for the usual lexicographic total order on 𝒥_T (that is, ℓ′ ≺ ℓ″ if in the first coordinate where the two sequences differ the entry of ℓ′ is smaller than the entry of ℓ″). Equivalently, ℓ′ ≺ ℓ″ if either $ℓ_{2}^{'} < ℓ_{2}^{″}$ or $ℓ_{2}^{'} = ℓ_{2}^{″}$ and for the smallest k ≥ 2 such that $ℓ_{k + 1}^{'} - ℓ_{k}^{'} \neq ℓ_{k + 1}^{″} - ℓ_{k}^{″}$ we have $ℓ_{k + 1}^{'} - ℓ_{k}^{'} < ℓ_{k + 1}^{″} - ℓ_{k}^{″}$ . In this section we prove Theorem 1.6 by showing that the tree T is determined up to isomorphism by the minimal element of 𝒥_T.

We use a similar technique (but with a different total order) to establish Theorem 1.11 for (k + 1)-valent and rooted k-ary combinatorial trees in Section 7, Section 8, and Section 9.

Proof

Let (ℓ₂, ℓ₃, …, ℓ_n) be the minimal element of 𝒥_T. Write x₁, x₂, …, x_n for an ordering of L(T) such that ℓ_k = W_T({x₁, x₂, …, x_k}) for k = 2, …, n.

We will establish by induction that for 2 ≤ k ≤ n the ultrametric real tree spanned by the leaves {x₁, x₂, …, x_k} can be reconstructed from (ℓ₂, ℓ₃, …, ℓ_k) and, moreover, if we adopt the convention that we draw ultrametric real trees in the plane with the root at the top and leaves along the bottom, then this particular real tree can be embedded in the plane with the leaves x₁, x₂, …, x_k in order from left to right.

The claim is certainly true when k = 2. Suppose the claim is true for 2, 3, …, k.

Write T_k for the ultrametric real tree spanned by {x₁, x₂, …, x_k} and denote the height of T_k by h_k; that is, h_k is the common distance from each of the leaves of T_k to the root ρ_k of T_k. We can, of course, suppose that T₂ ⊂ T₃ ⊂ … ⊂ T_n.

If ℓ_k+1 − ℓ_k ≥ h_k, then the ultrametric real tree T_k+1 spanned by {x₁, x₂, …, x_k, x_k+1} must consist of T_k together with an arc of length

h_{k + 1} = \frac{1}{2} (ℓ_{k + 1} - ℓ_{k} + h_{k})

from the root ρ_k+1 of T_k+1 to the leaf x_k+1 and an arc of length $\frac{1}{2} (ℓ_{k + 1} - ℓ_{k} - h_{k})$ from the “new root” ρ_k+1 to the “old root” ρ_k. In this case we can, by the inductive hypothesis, certainly embed T_k+1 in the plane with the leaf x_k+1 to the right of the leaves x₁, x₂, …, x_k.

Assume, therefore, that ℓ_k+1 − ℓ_k < h_k. Then the ultrametric real tree T_k+1 must consist of T_k and an arc of length ℓ_k+1 − ℓ_k joining x_k+1 to a point y ∈ T_k. It will suffice to show that y must be on the arc [ρ_k, x_k] that connects ρ_k to x_k because there is a unique ultrametric real tree consisting of T_k and an arc of length ℓ_k+1 − ℓ_k joining a new leaf to a point on the arc [ρ_k, x_k] (this tree must have root ρ_k and the point where the arc of length ℓ_k+1 − ℓ_k attaches to [ρ_k, x_k] must be at distance h_k − (ℓ_k+1 − ℓ_k) from ρ_k) and, moreover, such a tree can be embedded in the plane with the new leaf to the right of the leaves {x₁, x₂, …, x_k}.

Suppose, then, that y is not on the arc [ρ_k, x_k]. Let j be the maximum of the indices i < k such that y is on the arc connecting x_i to ρ_k. Write u for the point that is closest to x_j+1 in the subtree spanned by {x₁, x₂, …, x_j} and ρ_k. Write υ for the point that is closest to x_j+1 in the subtree spanned by {x₁, x₂, …, x_j}. Equivalently, υ is the point in the subtree spanned by {x₁, x₂, …, x_j} that is closest to u. We may, of course, have u = υ (which occurs if and only if h_j+1 = h_j). By the inductive hypothesis, u and υ are on the arc connecting x_j to ρ_k and

r_{T} (x_{j + 1}, u) + r_{T} (u, υ) = ℓ_{j + 1} - ℓ_{j} .

By construction, y is the point closest to x_k+1 in the subtree spanned by {x₁, x₂, …, x_j} and ρ_k. Write w for the point closest to x_k+1 in the subtree spanned by {x₁, x₂, …, x_j}. Equivalently, w is the point in the subtree spanned by {x₁, x₂, …, x_j} that is closest to y. We have

W_{T} ({x_{1}, \dots, x_{j}, x_{k + 1}}) - ℓ_{j} = r_{T} (x_{k + 1}, y) + r_{T} (y, w) .

By the definition of j, the points y and u are on the arc connecting x_j to ρ_k and r_T (u, x_j) > r_T (y, x_j). This implies that r_T (u, υ) ≥ r_T (y, w). It also implies, by ultrametricity, that

r_{T} (x_{k + 1}, y) = r_{T} (x_{j}, y) < r_{T} (x_{j}, u) .

Consequently,

W_{T} ({x_{1}, \dots, x_{j}, x_{k + 1}}) - ℓ_{j} < ℓ_{j + 1} - ℓ_{j} .

This, however, contradicts the minimality of (ℓ₂, …, ℓ_n).
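The inductive reconstruction in this proof can be sketched in code. This is a minimal illustration under stated assumptions: the names `comb_heights` and `leaf_distance` are hypothetical, the input is assumed to be the lexicographically minimal element of 𝒥_T for an ultrametric tree, and the distance formula relies on the planar embedding described above, in which each new leaf attaches either at a new root or on the arc from the current root to the most recently added leaf.

```python
from fractions import Fraction


def comb_heights(ell):
    """Reconstruct an ultrametric tree from its minimal sequence (l_2, ..., l_n).

    Returns (h_n, joins), where joins[m] is the height above the leaves at which
    leaf x_{m+2} is joined to its left neighbour x_{m+1}.
    (Hypothetical helper; assumes the input really is a minimal sequence.)
    """
    ell = [Fraction(x) for x in ell]
    h = ell[0] / 2          # two leaves at distance l_2 meet at height l_2 / 2
    joins = [h]
    for prev, cur in zip(ell, ell[1:]):
        d = cur - prev
        if d >= h:
            # New root above the old one: h_{k+1} = (l_{k+1} - l_k + h_k) / 2.
            h = (d + h) / 2
            joins.append(h)
        else:
            # New leaf hangs off the arc [rho_k, x_k] by an arc of length d.
            joins.append(d)
    return h, joins


def leaf_distance(joins, i, j):
    """Distance between leaves x_i and x_j (1-based, i < j) in the reconstructed tree."""
    return 2 * max(joins[i - 1:j - 1])


# Two cherries of height 1 joined at a root of height 3 have minimal sequence (2, 7, 8).
height, joins = comb_heights((2, 7, 8))
```

In this example the second step falls in the case ℓ₃ − ℓ₂ ≥ h₂ and creates a new root, while the third step attaches x₄ on the arc from the root to x₃.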

Remark 4.1

As we noted in Remark 1.8, it is interesting to know whether it is possible to determine from the joint probability distribution of the random length sequence whether an edge-weighted tree is ultrametric. The preceding proof of Theorem 1.6 contains a procedure for reconstructing T from the minimal element of 𝒥_T in the lexicographic order when T is an ultrametric tree. If T is an arbitrary edge-weighted tree and this procedure is applied to the minimal element of 𝒥_T in the lexicographic order, then it will still produce an ultrametric tree and so a necessary condition for T to be ultrametric is that the joint probability distribution of the random length sequence of this ultrametric tree coincides with the joint probability distribution of 𝒲_T.

Along the same lines, suppose that T is an arbitrary edge-weighted tree and, thinking of T as a real tree, we root it at the unique point ρ such that

max_{υ \in L (T)} r_{T} (ρ, υ) = r^{*} ≔ \frac{1}{2} max_{u \in L (T)} max_{υ \in L (T)} r_{T} (u, υ) .

Then ρ will have k children for some k. Let m_i, 1 ≤ i ≤ k, be the number of leaves υ in the subtree below the i^th child of ρ such that r_T (ρ, υ) = r*. It is clear that T is ultrametric if and only if m₁ + ⋯ + m_k = n. Let n₁, …, n_ℓ be a listing of the nonzero terms in the list m₁, …, m_k. Note for 2 ≤ j ≤ ℓ that

ℙ {W_{2} = 2 r^{*}, \dots, W_{j} = j r^{*}} = j! \frac{1}{n (n - 1) \dots (n - j + 1)} \sum_{1 \leq h_{1} < \dots < h_{j} \leq ℓ} n_{h_{1}} \dots n_{h_{j}}

and

max {j \geq 2 : ℙ {W_{2} = 2 r^{*}, \dots, W_{j} = j r^{*}} > 0} = ℓ .

Thus, the joint probability distribution of 𝒲_T determines ℓ and the values of the elementary symmetric polynomials of degrees 2 ≤ j ≤ ℓ evaluated at n₁, …, n_ℓ, and we want to know whether n₁ + ⋯ + n_ℓ, the value of the elementary symmetric polynomial of degree 1 evaluated at n₁, …, n_ℓ, is n. The elementary symmetric polynomials of degrees 1, 2, …, ℓ in ℓ real variables are algebraically independent over the reals, and so we cannot expect to recover n₁ + ⋯ + n_ℓ from the values of the other elementary symmetric polynomials. However, there are inequalities connecting the values of the various elementary symmetric polynomials that can be used to establish necessary conditions and sufficient conditions for T to be ultrametric. For example, set

p_{1} ≔ \frac{1}{ℓ} (n_{1} + \dots + n_{ℓ})

and

p_{j} ≔ \frac{1}{\binom{ℓ}{j}} \sum_{1 \leq h_{1} < \dots < h_{j} \leq ℓ} n_{h_{1}} \dots n_{h_{j}} = \frac{1}{\binom{ℓ}{j}} \frac{n (n - 1) \dots (n - j + 1)}{j!} ℙ {W_{2} = 2 r^{*}, \dots, W_{j} = j r^{*}}, 2 \leq j \leq ℓ .

If α₁, …, α_ℓ and β₁, …, β_ℓ are positive constants such that

α_{1} + 2 α_{2} + \dots + ℓ α_{ℓ} = β_{1} + 2 β_{2} + \dots + ℓ β_{ℓ}

and

α_{j} + 2 α_{j + 1} + \dots + (ℓ - j + 1) α_{ℓ} \geq β_{j} + 2 β_{j + 1} + \dots + (ℓ - j + 1) β_{ℓ}, 2 \leq j \leq ℓ,

(4.1)

then, the inequality [HLP88, Chapter II, Theorem 77] says that

\prod_{j = 1}^{ℓ} p_{j}^{α_{j}} \leq \prod_{j = 1}^{ℓ} p_{j}^{β_{j}} .

Thus, if α₂, …, α_ℓ and β₂, …, β_ℓ satisfy the inequalities (4.1) and

γ ≔ \sum_{j = 2}^{ℓ} j (β_{j} - α_{j}),

then

p_{1} \leq {(\prod_{j = 2}^{ℓ} p_{j}^{β_{j} - α_{j}})}^{\frac{1}{γ}}

when γ > 0, and the opposite inequality holds when γ < 0. This observation leads to necessary conditions and sufficient conditions for T to be ultrametric.

Remark 4.2

As we will see in Section 7, Section 8, and Section 9 for (k + 1)-valent and rooted k-ary combinatorial trees, a somewhat similar argument based on the consideration of length sequences that are minimal with respect to a suitable order leads to a stronger result in that case. There we can not only determine T from the joint probability distribution of its random length sequence, but if we have a random tree 𝒯 with a fixed number of leaves, then it is possible to determine the distribution of 𝒯 from the joint probability distribution of the random length sequence obtained by first picking a realization of 𝒯 and then independently picking a random ordering of the leaves to build a random length sequence.

Formally, we have some space 𝕋 of isomorphism types of trees, a corresponding space 𝕊 of possible length sequences, and a probability kernel ν from 𝕋 to 𝕊, where, for T ∈ 𝕋, ν(T, ·) is the element of ℘(𝕊), the space of probability measures on 𝕊, that is the joint probability distribution of the random length sequence built from T. An affirmative answer to Question 1.1 for a particular 𝕋 means that the map T ⟼ ν(T, ·) from 𝕋 to ℘(𝕊) is injective. Given an element μ of ℘(𝕋), the space of probability measures on 𝕋, let μν ∈ ℘(𝕊) be defined as usual by μν(B) = ∫_𝕋 ν(T, B) μ(dT) for B ⊆ 𝕊. The stronger results obtained in Section 9 say that, in the situations considered there, the map μ ⟼ μν from ℘(𝕋) to ℘(𝕊) is injective.

One can ask if an analogous strengthening is also true for ultrametric trees. A proof along the lines of that given for Theorem 1.12 doesn’t appear to apply immediately in this situation where the relevant space 𝕋 is uncountable rather than finite. We leave this as one of many open questions.

5. Caterpillar trees: Proof of Theorem 1.10

Recall that a caterpillar is a (not necessarily simple) combinatorial tree such that deleting the leaves of the tree results in a path consisting of ℓ+1 vertices (and hence ℓ edges of length 1).

Remark 5.1

Choosing one end of the path, we can label the vertices on the path consecutively with 0, 1, …, ℓ and denote by n_r the number of leaves that are attached to vertex r on the path. Both n₀ and n_ℓ are non-zero, but the remaining n_i may be zero.

The isomorphism types of caterpillars with n leaves are thus seen to be in a bijective correspondence with equivalence classes of nonnegative integer sequences (n₀, n₁, …, n_ℓ−1, n_ℓ), where n = n₀ + ⋯ + n_ℓ and n₀, n_ℓ ≠ 0, and we declare that (n₀, n₁, …, n_ℓ−1, n_ℓ) and (n_ℓ, n_ℓ−1, …, n₁, n₀) are equivalent.

The proof of the following, which establishes the first claim in Theorem 1.10, is straightforward and we omit it.

Proposition 5.2

A combinatorial tree T with n leaves is a caterpillar with an associated path of length ℓ if and only if

max {k : ℙ {W_{2} = k + 2} > 0} = ℓ

and W_n = ℓ + n almost surely.

We now turn to the proof of the main claim in Theorem 1.10.

Proof

The caterpillar consists of a path of length ℓ and edges connecting the ℓ+1 vertices in the path to leaves. Label the vertices in the path successively with the elements of {0, 1, …, ℓ} and define random variables X₁, X₂, …, X_n by setting X_k = i if Y_k, the k^th leaf chosen, is adjacent to the vertex in the path labeled i. Set

K_{r} ≔ max_{1 \leq j \leq r} X_{j} - min_{1 \leq j \leq r} X_{j} .

It is clear that (W₂, W₃, …, W_n) has the same joint probability distribution as (K₂ + 2, K₃ + 3, …, K_n + n), and so it suffices to show that it is possible to determine {(n₀, n₁, …, n_ℓ−1, n_ℓ), (n_ℓ, n_ℓ−1, …, n₁, n₀)} from a knowledge of the joint probability distribution of 𝒦 ≔ (K₂, …, K_n).

To begin with, note that, as in Proposition 5.2,

max {k : ℙ {K_{2} = k} > 0} = ℓ,

and so we can determine ℓ from the joint probability distribution of 𝒦.

Observe next that

ℙ {K_{2} = ℓ} = ℙ {(X_{1}, X_{2}) \in {(0, ℓ), (ℓ, 0)}} = 2 \frac{n_{0} n_{ℓ}}{n (n - 1)},

and

max {k : ℙ {K_{2} = 0, \dots, K_{k} = 0, K_{k + 1} = ℓ} > 0} = n_{0} \lor n_{ℓ} .

We can thus determine the multiset {n₀, n_ℓ} and, in particular, n₀ + n_ℓ.

For $1 \leq r < \frac{ℓ}{2}$ we have

ℙ {K_{2} = r, K_{3} = ℓ} = ℙ {(X_{1}, X_{2}, X_{3}) \in {(0, r, ℓ), (r, 0, ℓ), (ℓ, ℓ - r, 0), (ℓ - r, ℓ, 0)}} = \frac{2 n_{0} (n_{r} + n_{ℓ - r}) n_{ℓ}}{n (n - 1) (n - 2)},

and so we can determine n_r + n_ℓ−r. If ℓ is even, then

ℙ {K_{2} = \frac{ℓ}{2}, K_{3} = ℓ} = ℙ {(X_{1}, X_{2}, X_{3}) \in {(0, \frac{ℓ}{2}, ℓ), (\frac{ℓ}{2}, 0, ℓ), (ℓ, \frac{ℓ}{2}, 0), (\frac{ℓ}{2}, ℓ, 0)}} = \frac{4 n_{0} n_{\frac{ℓ}{2}} n_{ℓ}}{n (n - 1) (n - 2)},

and so we can determine $n_{\frac{ℓ}{2}}$ .

Also,

ℙ {K_{2} = 0} = \sum_{r = 0}^{ℓ} ℙ {X_{1} = r, X_{2} = r} = \frac{\sum_{r = 0}^{ℓ} n_{r} (n_{r} - 1)}{n (n - 1)} = \frac{\sum_{r = 0}^{ℓ} n_{r}^{2} - n}{n (n - 1)}

and, for 1 ≤ k ≤ ℓ,

ℙ {K_{2} = k} = \sum_{r = 0}^{ℓ - k} ℙ {(X_{1}, X_{2}) \in {(r, r + k), (r + k, r)}} = \frac{2 \sum_{r = 0}^{ℓ - k} n_{r} n_{r + k}}{n (n - 1)} .

We can therefore determine $\sum_{r = 0}^{ℓ - k} n_{r} n_{r + k}$ for 0 ≤ k ≤ ℓ.
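The two formulas for the law of K₂ can be compared against direct enumeration. The sketch below is illustrative; the function names and the example sequence (n₀, n₁, n₂, n₃) = (2, 0, 1, 3) are assumptions made for this check.

```python
from fractions import Fraction


def k2_law_formula(counts):
    """Law of K_2 from the autocorrelation sums of counts = (n_0, ..., n_l)."""
    n, ell = sum(counts), len(counts) - 1
    law = {0: Fraction(sum(c * (c - 1) for c in counts), n * (n - 1))}
    for k in range(1, ell + 1):
        auto = sum(counts[r] * counts[r + k] for r in range(ell - k + 1))
        law[k] = Fraction(2 * auto, n * (n - 1))
    return {k: p for k, p in law.items() if p > 0}


def k2_law_bruteforce(counts):
    """Law of K_2 = |X_1 - X_2| by enumerating ordered pairs of distinct leaves."""
    positions = [r for r, c in enumerate(counts) for _ in range(c)]
    n = len(positions)
    law = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                k = abs(positions[i] - positions[j])
                law[k] = law.get(k, 0) + Fraction(1, n * (n - 1))
    return law


example = (2, 0, 1, 3)
```

For this example both functions give probabilities 8/30, 6/30, 4/30 and 12/30 for K₂ = 0, 1, 2, 3.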

We claim that the information we have just derived suffices to determine {(n₀, n₁, …, n_ℓ−1, n_ℓ), (n_ℓ, n_ℓ−1, …, n₁, n₀)}. That is, if $n_{0}^{'}, \dots, n_{ℓ}^{'}$ is a sequence with

n_{0} + \dots + n_{ℓ} = n_{0}^{'} + \dots + n_{ℓ}^{'} = n,

n_{r} + n_{ℓ - r} = n_{r}^{'} + n_{ℓ - r}^{'}

for 0 ≤ r ≤ ℓ, and

\sum_{r = 0}^{ℓ - k} n_{r} n_{r + k} = \sum_{r = 0}^{ℓ - k} n_{r}^{'} n_{r + k}^{'}

for 0 ≤ k ≤ ℓ, then either $n_{r} = n_{r}^{'}$ for 0 ≤ r ≤ ℓ or $n_{r} = n_{ℓ - r}^{'}$ for 0 ≤ r ≤ ℓ.

To see that this is so, introduce the functions

g (z) ≔ \sum_{k = 0}^{ℓ} n_{k} e^{izk}

and

g' (z) ≔ \sum_{k = 0}^{ℓ} n_{k}^{'} e^{izk}

for z ∈ ℂ. These are entire functions that uniquely determine n₀, …, n_ℓ and $n_{0}^{'}, \dots, n_{ℓ}^{'}$ . Note that

\sum_{k = 0}^{ℓ} n_{ℓ - k} e^{izk} = e^{iz ℓ} g (- z),

and a similar formula holds for g′. It will thus suffice to show that either g(z) = g′(z) or g(z) = e^izℓg′(−z) (equivalently, g′(z) = e^izℓg(−z)).

It follows from the assumption that

\sum_{r = 0}^{ℓ - k} n_{r} n_{r + k} = \sum_{r = 0}^{ℓ - k} n_{r}^{'} n_{r + k}^{'}

for 0 ≤ k ≤ ℓ that if we define N : ℤ → ℤ by

N (j) = \begin{cases} n_{j}, & 0 \leq j \leq ℓ, \\ 0, & \text{otherwise}, \end{cases}

and define N′ similarly, then

\sum_{{r, j \in ℤ : r - j = k}} N (r) N (j) = \sum_{{r, j \in ℤ : r - j = k}} N' (r) N' (j)

for all k ∈ ℤ and hence

g (z) g (- z) = g' (z) g' (- z)

for all z ∈ ℂ. Theorem 2.2 in [RS82] says that if g and g′ are related in this way, then there exist finitely supported functions C : ℤ → ℤ and D : ℤ → ℤ such that if we set

ϕ (z) ≔ \sum_{k \in ℤ} C (k) e^{izk}

and

ψ (z) ≔ \sum_{k \in ℤ} D (k) e^{izk},

then

g (z) = ϕ (z) ψ (z)

and

g' (z) = ϕ (z) ψ (- z) .

It follows from the assumption that

n_{r} + n_{ℓ - r} = n_{r}^{'} + n_{ℓ - r}^{'}

for 0 ≤ r ≤ ℓ that

g (z) + e^{iz ℓ} g (- z) = g' (z) + e^{iz ℓ} g' (- z)

for all z ∈ ℂ. Therefore,

ϕ (z) ψ (z) + e^{iz ℓ} ϕ (- z) ψ (- z) = ϕ (z) ψ (- z) + e^{iz ℓ} ϕ (- z) ψ (z)

and hence

(ϕ (z) - e^{iz ℓ} ϕ (- z)) (ψ (z) - ψ (- z)) = 0

for all z ∈ ℂ. Because the functions z ↦ ϕ(z) − e^izℓϕ(−z) and z ↦ ψ(z) − ψ(−z) are both entire, we must have either that ϕ(z) = e^izℓϕ(−z) for all z ∈ ℂ or ψ(z) = ψ(−z) for all z ∈ ℂ. If ϕ(z) = e^izℓϕ(−z) for all z ∈ ℂ, then

g' (z) = ϕ (z) ψ (- z) = e^{iz ℓ} ϕ (- z) ψ (- z) = e^{iz ℓ} g (- z)

and $n_{i} = n_{ℓ - i}^{'}$ for 0 ≤ i ≤ ℓ. If ψ(z) = ψ(−z) for all z ∈ ℂ, then

g' (z) = ϕ (z) ψ (- z) = ϕ (z) ψ (z) = g (z)

and $n_{r} = n_{r}^{'}$ for 0 ≤ r ≤ ℓ.
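The conclusion of this proof can be checked exhaustively for small caterpillars. The sketch below is only a sanity check, not a substitute for the argument; the function names are hypothetical. It enumerates all admissible sequences (n₀, …, n_ℓ) with a given number of leaves and confirms that the symmetrized counts n_r + n_ℓ−r together with the sums Σ n_r n_r+k determine the sequence up to reversal.

```python
from itertools import product


def invariants(counts):
    """Quantities shown above to be determined by the law of (K_2, ..., K_n)."""
    ell = len(counts) - 1
    sym = tuple(counts[r] + counts[ell - r] for r in range(ell + 1))
    auto = tuple(
        sum(counts[r] * counts[r + k] for r in range(ell - k + 1))
        for k in range(ell + 1)
    )
    return sym, auto


def caterpillars(n, ell):
    """All sequences (n_0, ..., n_l) of nonnegative integers with sum n and n_0, n_l > 0."""
    for counts in product(range(n + 1), repeat=ell + 1):
        if sum(counts) == n and counts[0] > 0 and counts[-1] > 0:
            yield counts


def determined_up_to_reversal(n, ell):
    """True if equal invariants force equal sequences up to reversal."""
    seen = {}
    for counts in caterpillars(n, ell):
        canonical = min(counts, counts[::-1])  # one representative per reversal class
        if seen.setdefault(invariants(counts), canonical) != canonical:
            return False
    return True
```

Calling `determined_up_to_reversal(n, ell)` for small values of n and ℓ returns `True`, consistent with the claim just proved.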

6. (k + 1)-valent and rooted k-ary trees – preliminaries

Recall that a (k + 1)-valent tree is a tree with all vertices of degree either k + 1 or 1. For k ≥ 2 a rooted k-ary tree is a tree with one vertex of degree k and the rest of degrees either k + 1 or 1. We refer to a rooted 2-ary tree as a rooted binary tree. Note that any k-ary tree is obtained by removing one leaf of a suitable (k + 1)-valent tree.

Our general proof methodology for these families of trees is similar to that used in Section 4 for ultrametric trees. We first define a particular class of sequences that can appear as elements of 𝒥_T and a total order on such sequences. We then show that the minimal sequence in 𝒥_T uniquely identifies T.

The idea of the proof is the same for all k and depends on the following fact.

Lemma 6.1

Let T be a (k + 1)-valent tree or a rooted k-ary tree and let S be a subtree of T. Then S is a rooted k-ary tree if and only if

⧣ E (S) = \frac{k}{k - 1} (⧣ L (S) - 1) .

Proof

Because S is a subtree of T, every interior vertex of S has degree at most k + 1. Write d₁ ≔ ⧣L(S), d₂, …, d_k+1 for the number of vertices of S of degrees 1, 2, …, k + 1. We need to show that d_j = 0 for 1 < j ≤ k − 1 and d_k = 1, or, equivalently, that d_k = 1 and $d_{k + 1} = \sum_{j = 2}^{k + 1} d_{j} - 1 = ⧣ V (S) - d_{1} - 1 = ⧣ E (S) - d_{1}$ . This is in turn equivalent to showing that

\sum_{j = 2}^{k + 1} j d_{j} = k + (k + 1) (⧣ E (S) - d_{1}),

which, by the “handshaking identity”

2 ⧣ E (S) = \sum_{j = 1}^{k + 1} j d_{j},

becomes

2 ⧣ E (S) - d_{1} = k + (k + 1) (⧣ E (S) - d_{1})

or, upon rearranging,

⧣ E (S) = \frac{k}{k - 1} (d_{1} - 1) = \frac{k}{k - 1} (⧣ L (S) - 1) .
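The edge-count criterion of Lemma 6.1 is easy to test on explicit trees. In the sketch below the construction `rooted_k_ary_tree`, which repeatedly replaces a leaf by an interior vertex with k new leaf children, and the checker `satisfies_edge_count` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction


def rooted_k_ary_tree(k, expansions):
    """Adjacency lists of a rooted k-ary tree.

    Starts from a root (vertex 0) with k leaf children and then, for each
    vertex listed in `expansions`, attaches k new leaves to that current leaf.
    """
    adj = {0: list(range(1, k + 1))}
    for child in range(1, k + 1):
        adj[child] = [0]
    for leaf in expansions:
        assert len(adj[leaf]) == 1, "only a current leaf can be expanded"
        for _ in range(k):
            new = len(adj)  # vertices are labelled 0, 1, 2, ...
            adj[new] = [leaf]
            adj[leaf].append(new)
    return adj


def satisfies_edge_count(adj, k):
    """Check #E(S) = k / (k - 1) * (#L(S) - 1)."""
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2  # handshaking identity
    leaves = sum(1 for nbrs in adj.values() if len(nbrs) == 1)
    return edges == Fraction(k, k - 1) * (leaves - 1)


ternary = rooted_k_ary_tree(3, [1, 4])      # 9 edges and 7 leaves
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # a path with 3 edges and 2 leaves
```

The ternary example satisfies the identity with k = 3, whereas the four-vertex path, which can occur as a subtree of a 3-valent tree but has two vertices of degree 2, fails it with k = 2.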

7. Proofs of Theorem 1.11 and Theorem 1.12 for 3-valent trees

For simplicity of notation we present the details of the proofs of the claims in Theorem 1.11 and Theorem 1.12 in the special case k = 2. In this section we deal with 3-valent trees. We deal with rooted binary trees in Section 8. We comment on how these proofs generalize to arbitrary k in Section 9.

We begin with an analysis of random length sequences for marked (also known as planted) 3-valent trees. A marked 3-valent tree (T, υ) is a 3-valent tree T along with a distinguished leaf υ of T. We define the modified random length sequence 𝒲_(T,υ) of (T, υ) to be the random length sequence 𝒲_T of T conditioned on Y₁ = υ.

7.1. Down-split sequences

We need to distinguish some particular sequences that appear in the support of 𝒲_(T,υ).

Remark 7.1

As usual, we can define a partial order on V(T) by declaring that x precedes y if x ≠ y is on the path between υ and y, and we can extend this partial order to a total order < such that if w, x, y, z are such that w and x are not comparable in the partial order but w < x, w precedes y in the partial order, and x precedes z in the partial order, then y < z. Such a total order corresponds to embedding T in the plane and listing the elements of V(T) in the order they are encountered as one walks around T starting from υ. That is, the total order is a depth–first–search (DFS) ordering of V(T).

Suppose that υ = y₁ < y₂ < … < y_n is the ordered listing of L(T). Set s_p = W_T({y₁, …, y_p}), 2 ≤ p ≤ n. If s_p = 2p − 2, then the subtree spanned by the p leaves {y₁, …, y_p} has 2p − 2 edges and hence, by Lemma 6.1, this subtree is a binary tree. If we write o for the vertex adjacent to the marked leaf υ, denote by υ′, υ″ the other two vertices adjacent to o, and suppose that υ′ < υ″, then it must be the case that {y₂, …, y_p} = {y ∈ L(T) : υ′ ≤ y}. Write T′ (respectively, T″) for the subtree of T consisting of o and the vertices u such that υ′ (respectively, υ″) is on the path from o to u. The sequence $(s_{2}^{'}, \dots, s_{n'}^{'}) ≔ (s_{2} - 1, \dots, s_{p} - 1)$ satisfies $s_{k}^{'} = W_{T'} (o, y_{2}, \dots, y_{k})$ for 2 ≤ k ≤ n′ = p. The sequence $(s_{2}^{″}, \dots, s_{n ″}^{″}) ≔ (s_{p + 1} - (2 p - 2), \dots, s_{n} - (2 p - 2))$ satisfies $s_{k}^{″} = W_{T ″} (o, y_{p + 1}, \dots, y_{p + k - 1})$ for 2 ≤ k ≤ n″ = n − p + 1.

Definition 7.2

A down-split sequence is an element of the class of increasing sequences of positive integers defined recursively as follows. The sequence

s = (1)

is a down-split sequence.

A sequence s = (s₂, …, s_n), n > 2, is down-split if

{2 \leq p < n : s_{p} = 2 p - 2} \neq \emptyset

and, setting

k_{s} = min {2 \leq p < n : s_{p} = 2 p - 2},

(s₂ − 1, …, s_{k_s} − 1) is down-split,
(s_{k_s+1} − (2k_s − 2), …, s_n − (2k_s − 2)) is down-split.

The index k_s is the down-splitting index of s.

Example 7.3

For n = 3, the sequence s = (s₂, s₃) = (2, 3) is a down-split sequence. Here k_s = 2, (s₂−1, …, s_{k_s} −1) = (1) and (s_{k_s+1} − (2k_s − 2), …, s_n − (2k_s − 2)) = (1).
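The recursive definition translates directly into a short membership test. The sketch below is illustrative; the names `split_index` and `is_down_split` are assumptions, and a sequence (s₂, …, s_n) is represented as a Python tuple whose first entry is s₂.

```python
def split_index(s):
    """Smallest p in {2, ..., n - 1} with s_p = 2p - 2, or None if there is none.

    s is the tuple (s_2, ..., s_n), so s_p is stored at position p - 2.
    """
    n = len(s) + 1
    for p in range(2, n):
        if s[p - 2] == 2 * p - 2:
            return p
    return None


def is_down_split(s):
    """Test whether s = (s_2, ..., s_n) is down-split in the sense of Definition 7.2."""
    s = tuple(s)
    if s == (1,):
        return True          # base case of the recursion
    if len(s) < 2:
        return False
    k = split_index(s)
    if k is None:
        return False
    left = tuple(x - 1 for x in s[:k - 1])              # (s_2 - 1, ..., s_k - 1)
    right = tuple(x - (2 * k - 2) for x in s[k - 1:])   # (s_{k+1} - (2k-2), ..., s_n - (2k-2))
    return is_down_split(left) and is_down_split(right)
```

For instance, (2, 3) from Example 7.3 is accepted, as are (2, 4, 5) and (3, 4, 5), while (2, 3, 5) is rejected because its second block (1, 3) has no down-splitting index.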

The following result is immediate from Remark 7.1.

Lemma 7.4

For every marked 3-valent tree (T, υ) there is at least one down-split sequence s with

ℙ {𝒲_{(T, υ)} = s} > 0 .

We record the following fact for later use.

Lemma 7.5

If s = (s₂, …, s_n) is a down-split sequence then s_n = 2n − 3.

Proof

This follows easily by induction. If s splits at k_s, then, as

(s_{2}^{'}, \dots, s_{n'}^{'}) = (s_{k_{s} + 1} - (2 k_{s} - 2), \dots, s_{n} - (2 k_{s} - 2))

is a down-split sequence with n′ = n − k_s + 1, we have by the inductive hypothesis that

s_{n} - (2 k_{s} - 2) = 2 (n - k_{s} + 1) - 3

and the claim follows.

Example 7.6

Given any down-split sequence s, it is possible to reverse the argument in Remark 7.1 and construct a marked 3-valent tree with a suitable total ordering on its vertices such that s is the corresponding down-split sequence. However, a marked 3-valent tree (T, υ) is not uniquely identified by an arbitrary down-split sequence in the support of 𝒲_(T,υ), as the example in Figure 7.1 shows.

Figure 7.1 — Two marked binary trees (T̂, υ) and (Ť, υ) with particular realizations of the random selection of leaves.

Write (Ŷ₁, …, Ŷ_n) and (Y̌₁, …, Y̌_n) for the random selections of the leaves of T̂ and Ť. Suppose that the realizations are such that Ŷ_k = Y̌_k ∈ T̄ for 4 ≤ k ≤ n and that these leaves of the subtree T̄ appear in an order of the type discussed in Remark 7.1. The corresponding realizations for the modified random length sequences are equal. The common value (3, 4, …) is a down-split sequence with down-splitting index 3. Thus, two non-isomorphic marked 3-valent trees can have a common down-split sequence in the supports of their modified random length sequences. Note that the common down-split sequence results from taking the leaves of Ť according to an order of the type described in Remark 7.1, but this is not the case for T̂.

With Example 7.6 in mind we see that it would be useful to have a way of recognizing down-split sequences in the support of 𝒲_(T,υ) that result from realizations where the leaves are selected in an order that arises from a suitable total order on the vertices of T. The key is the following total order on down-split sequences. We re-use the notation ≺ that was used in Section 4 for the lexicographic order.

Definition 7.7

Define a total order ≺ on the set of down-split sequences of a given length recursively as follows. Firstly, (1) ≺ (1) does not hold. Next, let s, r be down-split sequences indexed by {2, …, n} with respective down-splitting indices k_s and k_r. Set

s' = (s_{2} - 1, \dots, s_{k_{s}} - 1), r' = (r_{2} - 1, \dots, r_{k_{r}} - 1),

and

s ″ = (s_{k_{s} + 1} - (2 k_{s} - 2), \dots, s_{n} - (2 k_{s} - 2)), r ″ = (r_{k_{r} + 1} - (2 k_{r} - 2), \dots, r_{n} - (2 k_{r} - 2)) .

Declare that s ≺ r if one of the following holds:

k_{s} < k_{r}, or

k_{s} = k_{r} and s' ≺ r', or

k_{s} = k_{r} and s' = r' and s ″ ≺ r ″ .
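The recursive comparison in Definition 7.7 can be sketched as follows. The helper names `split` and `precedes` are hypothetical, sequences (s₂, …, s_n) are stored as tuples starting with s₂, and both arguments are assumed to be down-split sequences of the same length; no validation is performed.

```python
def split(s):
    """Return (k_s, s', s'') for a down-split sequence s = (s_2, ..., s_n) with n > 2."""
    n = len(s) + 1
    k = next(p for p in range(2, n) if s[p - 2] == 2 * p - 2)
    first = tuple(x - 1 for x in s[:k - 1])
    second = tuple(x - (2 * k - 2) for x in s[k - 1:])
    return k, first, second


def precedes(s, r):
    """Strict order of Definition 7.7 on down-split sequences of equal length."""
    if len(s) == 1:
        return False                 # (1) does not precede (1)
    ks, s1, s2 = split(s)
    kr, r1, r2 = split(r)
    if ks != kr:
        return ks < kr               # smaller down-splitting index comes first
    if s1 != r1:
        return precedes(s1, r1)      # then compare the first blocks
    return precedes(s2, r2)          # then compare the second blocks
```

For example, (2, 4, 5) precedes (3, 4, 5) because its down-splitting index is smaller, and (2, 4, 6, 7) precedes (2, 5, 6, 7) because the two sequences share the index 2 and the first block (1), while their second blocks are (2, 4, 5) and (3, 4, 5).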

The next result follows easily by induction.

Lemma 7.8

The binary relation ≺ is a total order on the set of down-split sequences of a given length.

Definition 7.9

The minimal down-split sequence for a marked 3-valent tree (T, υ) is the minimal element (with respect to the total order ≺) of the set

{s down - split : ℙ {𝒲_{(T, υ)} = s} > 0} .

We now proceed to establish some results that culminate in showing that (T, υ) is determined by its minimal down-split sequence.

Lemma 7.10

Let (T, υ) be a marked 3-valent tree, with modified random length sequence 𝒲_(T,υ) = (W₂, …, W_n) constructed from the random sequence of leaves (Y₁, …, Y_n) with Y₁ = υ. Denote by o the vertex adjacent to the marked leaf υ and denote by υ′, υ″ the other two vertices adjacent to o. Write T′ (respectively, T″) for the subtree of T consisting of o and the vertices u such that υ′ (respectively, υ″) is on the path from o to u. Set

m ≔ min {k : ℙ {W_{k} = 2 k - 2} > 0} .

Then W_m = 2m − 2 if and only if

Y_{2}, \dots, Y_{m} \in L (T') and Y_{m + 1}, \dots, Y_{n} \in L (T ″),

or vice versa.

Proof

If Y₂, …, Y_m ∈ L(T′) and Y_{m+1}, …, Y_n ∈ L(T″), then the subtree spanned by {υ, Y₂, …, Y_m} consists of the leaf υ adjoined to T′ via an edge to the vertex o. This subtree is a rooted binary tree with root o. It follows from Lemma 6.1 that W_m = 2m − 2.

For the other direction, assume that W_m = 2m − 2. By Lemma 6.1, the subtree S spanned by {υ, Y₂, …, Y_m} is a rooted binary tree with m leaves. We have L(S) ⊆ L(T), υ ∈ L(S), and L(S) \ {υ} ⊆ (L(T′) \ {o}) ∪ (L(T″) \ {o}). We need to show that S consists of the leaf υ adjoined to either T′ or T″ via an edge to the vertex o that is common to both T′ and T″.

By the construction prior to the statement of Lemma 7.4 we know that

m \leq ⧣ L (T') \land ⧣ L (T ″)

and so if L(T′) \ {o} ⊆ L(S) \ {υ}, then L(T″) ∩ L(S) = (L(T″) \ {o}) ∩ (L(S) \ {υ}) = ∅, and similarly with the roles of T′ and T″ reversed.

We can rule out the possibility that L(S) intersects both L(T′) and L(T″) as follows. If L(T′) ∩ L(S) ≠ ∅ and L(T″) ∩ L(S) ≠ ∅, then L(T′) ∩ L(S) must be a proper subset of L(T′) \ {o} and L(T″) ∩ L(S) must be a proper subset of L(T″) \ {o}. If L(T′) ∩ L(S) is a proper, nonempty subset of L(T′) \ {o}, then S must have a degree 2 vertex that belongs to V(T′) \ {o}, and similarly for T″. However, S is a rooted binary tree and cannot have two or more vertices of degree 2.

Finally, we need to rule out the possibility that L(S) \ {υ} is a proper subset of L(T′) \ {o} or of L(T″) \ {o}. However, if L(S) \ {υ} is a proper subset of L(T′) \ {o}, then S would have at least one degree 2 vertex that belongs to V(T′) \ {o} as well as the degree 2 vertex o, which contradicts S being a rooted binary tree. The same argument holds with T″ in place of T′.

Corollary 7.11

Let (T, υ) be a marked 3-valent tree with modified random length sequence 𝒲_(T,υ) = (W₂, …, W_n). Then

m ≔ min {k : ℙ {W_{k} = 2 k - 2} > 0}

is the down-splitting index for the minimal down-split sequence for (T, υ).

Proof

If k_s is the down-splitting index of any down-split sequence s in the support of 𝒲_(T,υ), then s_{k_s} = 2k_s − 2 by definition. Thus ℙ{W_{k_s} = 2k_s − 2} > 0 and hence m ≤ k_s.

On the other hand, let o, υ′, υ″, T′, T″ be as in the statement of Lemma 7.10. It follows from that result that m = ⧣L(T′) ∧ ⧣L(T″). Using the construction in Remark 7.1 if m = ⧣L(T′), or the analogous construction with the roles of T′ and T″ reversed if m = ⧣L(T″), we may construct a down-split sequence for (T, υ) that has down-splitting index m. By the definition of the total order ≺, the down-splitting index for the minimal down-split sequence for (T, υ) is at most m. Combined with the inequality m ≤ k_s established above, this shows that the down-splitting index of the minimal down-split sequence is exactly m.
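Corollary 7.11 can be checked by brute force on a small example. The sketch below is illustrative only: it assumes a unit-weight tree given as an adjacency dictionary, assumes that the recursive definition of a down-split sequence (Definition 7.2) mirrors the decomposition used in Definition 7.7, and uses hypothetical function and vertex names throughout.

```python
from functools import cmp_to_key
from itertools import permutations


def length_sequence(adj, marked, order):
    """(W_2, ..., W_n) when the leaves in `order` are added after `marked`."""
    parent = {marked: None}
    stack = [marked]
    while stack:                          # orient every edge toward `marked`
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    covered = set()                       # one non-root vertex per spanned edge
    seq = []
    for leaf in order:
        u = leaf
        while u != marked and u not in covered:
            covered.add(u)
            u = parent[u]
        seq.append(len(covered))
    return tuple(seq)


def parts(s):
    """Return (k, s', s'') for s = (s_2, ..., s_n), or None if no k < n works."""
    n = len(s) + 1
    for k in range(2, n):
        if s[k - 2] == 2 * k - 2:
            first = tuple(x - 1 for x in s[: k - 1])
            second = tuple(x - (2 * k - 2) for x in s[k - 1:])
            return k, first, second
    return None


def is_down_split(s):
    if s == (1,):
        return True
    split = parts(s)
    return split is not None and is_down_split(split[1]) and is_down_split(split[2])


def compare(s, r):
    """Negative when s precedes r in the order of Definition 7.7, zero when s == r."""
    if s == r:
        return 0
    ks, s1, s2 = parts(s)
    kr, r1, r2 = parts(r)
    if ks != kr:
        return -1 if ks < kr else 1
    return compare(s1, r1) or compare(s2, r2)


def minimal_down_split(adj, marked):
    leaves = [v for v in adj if len(adj[v]) == 1 and v != marked]
    support = {length_sequence(adj, marked, p) for p in permutations(leaves)}
    down_split = [s for s in support if is_down_split(s)]
    return min(down_split, key=cmp_to_key(compare)), support


# Hypothetical example: marked leaf "v" joined to "o", which carries two cherries.
adj = {
    "v": ["o"], "o": ["v", "p", "q"],
    "p": ["o", "a", "b"], "q": ["o", "c", "d"],
    "a": ["p"], "b": ["p"], "c": ["q"], "d": ["q"],
}
s_min, support = minimal_down_split(adj, "v")
# m = min{k : P{W_k = 2k - 2} > 0}; the entry W_k sits at position k - 2.
m = min(k for s in support for k in range(2, len(s) + 2) if s[k - 2] == 2 * k - 2)
```

In this example the support consists of (3, 4, 6, 7) and (3, 5, 6, 7). Both are down-split sequences, with down-splitting indices 3 and 4, so the minimal one is (3, 4, 6, 7) and its down-splitting index agrees with m = 3. The second sequence arises from alternating between the two cherries, so it is a down-split sequence that does not come from an order of the type in Remark 7.1.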

Proposition 7.12

Let s be the minimal down-split sequence for a marked 3-valent tree (T, υ). There is no other marked 3-valent tree for which s is the minimal down-split sequence.

Proof

We will prove this by induction. The claim is clearly true for the down-split sequence s = (1).

Let (T, υ) be a marked 3-valent tree and s the minimal down-split sequence for (T, υ). Define o, υ′, υ″, T′, T″ as in the statement of Lemma 7.10. Let k_s be the down-splitting index of s. Let y₁, …, y_n be an ordered listing of L(T) such that W_(T,υ)({y₁, …, y_k}) = s_k for 2 ≤ k ≤ n. By Corollary 7.11 and Lemma 7.10 we must either have {y₂, …, y_{k_s}} = L(T′) \ {o} and {y_{k_s+1}, …, y_n} = L(T″) \ {o} or the analogous conclusion with the roles of T′ and T″ interchanged holds (if ⧣L(T′) ≠ ⧣L(T″), then only one alternative is possible). We may suppose without loss of generality that the choice of υ′ and υ″ is such that the first alternative holds.

Set

s' ≔ (s_{2} - 1, \dots, s_{k_{s}} - 1), s ″ ≔ (s_{k_{s} + 1} - (2 k_{s} - 2), \dots, s_{n} - (2 k_{s} - 2)) .

By definition, s′ and s″ are down-split sequences. Because ℙ{𝒲_(T,υ) = s} > 0, we have

ℙ {𝒲_{(T', o)} = s'} > 0

and

ℙ {𝒲_{(T ″, o)} = s ″} > 0 .

We claim that s′ must be the minimal down-split sequence for (T′, o). To see this, note that if there was a down-split sequence s̃′ with s̃′ ≺ s′ such that

ℙ {𝒲_{(T', o)} = \tilde{s}'} > 0,

then, writing

\bar{m} ≔ (m, \dots, m)

for a positive integer m, we would have

ℙ {𝒲_{(T, υ)} = (\tilde{s}' + \bar{1}, s ″ + \bar{2 k_{s} - 2})} > 0

and, by definition of the total order ≺,

(\tilde{s}' + \bar{1}, s ″ + \bar{2 k_{s} - 2}) ≺ (s' + \bar{1}, s ″ + \bar{2 k_{s} - 2}) = s .

This, however, contradicts the minimality of s. Similarly, s″ is the minimal down-split sequence for (T″, o). By induction, (T′, o) and (T″, o) are uniquely determined.

Since (T, υ) is obtained by gluing (T′, o) and (T″, o) together at the shared vertex o and attaching the marked leaf υ to o by an edge, we see that (T, υ) is also determined by s.

While the proof of Proposition 7.12 is not in the form of an explicit reconstruction procedure, the argument clearly leads to an algorithm for building a marked 3-valent tree (T, υ) from the corresponding minimal down-split sequence. Namely, (T, υ) is simply the recursion tree that results from parsing s as a down-split sequence as in Definition 7.2, with leaves corresponding to edges that terminate in the sequence (1).
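A minimal sketch of this parsing procedure follows. It assumes that Definition 7.2 splits a sequence at the smallest index k < n with s_k = 2k − 2, exactly as in the decomposition of Definition 7.7. The function names, the nested-tuple representation, and the integer vertex labels are hypothetical choices made for illustration.

```python
def parse_down_split(s):
    """Parse s = (s_2, ..., s_n) into its recursion tree.

    The sequence (1) becomes the string "leaf"; any other down-split
    sequence becomes a pair (parse of s', parse of s'').
    """
    s = list(s)
    if s == [1]:
        return "leaf"
    n = len(s) + 1
    for k in range(2, n):
        if s[k - 2] == 2 * k - 2:
            break
    else:
        raise ValueError("not a down-split sequence")
    left = parse_down_split([x - 1 for x in s[: k - 1]])
    right = parse_down_split([x - (2 * k - 2) for x in s[k - 1:]])
    return (left, right)


def to_edges(shape, marked=0):
    """Turn a recursion tree into the edge list of a marked 3-valent tree.

    Vertices are labelled by consecutive integers starting at `marked`,
    which plays the role of the marked leaf.
    """
    edges = []
    counter = [marked]

    def attach(node, parent):
        counter[0] += 1
        child = counter[0]
        edges.append((parent, child))     # edge from the marked vertex of this piece
        if node != "leaf":
            attach(node[0], child)        # the piece (T', o)
            attach(node[1], child)        # the piece (T'', o)

    attach(shape, marked)
    return edges
```

For example, `parse_down_split((2, 4, 5))` gives `("leaf", ("leaf", "leaf"))`, and `to_edges` applied to that shape returns five edges, which is 2n − 3 for n = 4 leaves.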

7.2. Completion of the proof of Theorem 1.11 for 3-valent trees

With Proposition 7.12 in hand, we are able to easily prove Theorem 1.11 for (unmarked) 3-valent trees.

Proof

Let T be a fixed (unknown) 3-valent tree with n leaves and let 𝒲_T be its random length sequence. Conditional on Y₁, 𝒲_T is the modified random length sequence of the marked 3-valent tree (T, Y₁). Thus, if

ℙ {𝒲_{T} = s} > 0,

then there must be some leaf υ ∈ T such that

ℙ {𝒲_{(T, υ)} = s} > 0 .

Let s* be the minimal element of the set

{s down-split : ℙ{𝒲_{T} = s} > 0}.

Then s* must be the minimal down-split sequence for (T, υ) for at least one leaf υ of T. By Proposition 7.12 we can reconstruct (T, υ) and hence T from s*.

7.3. Completion of the proof of Theorem 1.12 for 3-valent trees

The above argument can be pushed further to prove Theorem 1.12 for 𝒯 a random 3-valent tree.

Proof

Let 𝒯 be a random 3-valent tree with n leaves and random length sequence 𝒲_𝒯.

Given a 3-valent tree T with n leaves, let s^T be the minimal element of the set of down-split sequences of the marked 3-valent trees (T, υ) as υ ranges over L(T). We equip the set of 3-valent trees with n leaves with a total order, which with a slight abuse of notation we also denote by ≺, by declaring that T′ ≺ T″ if s^{T′} ≺ s^{T″}. Note that if T′ ≺ T″, then ℙ{𝒲_{T′} = s^{T′}} > 0 and ℙ{𝒲_{T″} = s^{T′}} = 0. Now, for each choice of T we have

ℙ {𝒲_{𝒯} = s^{T}} = \sum_{T'} ℙ {𝒯 = T'} ℙ {𝒲_{T'} = s^{T}} = \sum_{T' ⪯ T} ℙ {𝒯 = T'} ℙ {𝒲_{T'} = s^{T}},

and the conclusion that we can recover ℙ{𝒯 = T} as T ranges over the 3-valent trees with n leaves follows simply from the observation that if b is a row vector of length N and A is an N × N matrix that has all entries below the diagonal zero and all entries on the diagonal strictly positive, then there is a unique row vector x of length N such that b = xA.
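The triangular system b = xA above can be solved one coordinate at a time. The following is a small sketch of that substitution, with a hypothetical function name and a made-up 2 × 2 example; in the setting of the proof, the entries of x would be the probabilities ℙ{𝒯 = T′}, the entries of A the probabilities ℙ{𝒲_{T′} = s^T}, and the entries of b the probabilities ℙ{𝒲_𝒯 = s^T}, with trees listed in increasing order.

```python
from fractions import Fraction


def solve_row_upper_triangular(b, A):
    """Solve b = x A for the row vector x.

    A is assumed to have zeros below the diagonal and strictly positive
    diagonal entries, so b[j] = sum over i <= j of x[i] * A[i][j].
    """
    n = len(b)
    x = [Fraction(0)] * n
    for j in range(n):
        partial = sum(x[i] * A[i][j] for i in range(j))
        x[j] = (Fraction(b[j]) - partial) / A[j][j]
    return x


# Made-up illustrative numbers, not taken from any particular pair of trees.
A = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3)]]
b = [Fraction(1, 8), Fraction(5, 16)]
x = solve_row_upper_triangular(b, A)
```

Exact rational arithmetic is used here so that the recovered vector can be compared with equality; floating-point arithmetic would work equally well for a numerical check.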

8. Proofs of Theorem 1.11 and Theorem 1.12 for rooted binary trees

For simplicity of notation we present the details of the proofs of the claims in Theorem 1.11 and Theorem 1.12 for rooted k-ary trees in the special case k = 2 (that is, for rooted binary trees). We comment on how the proof generalizes to arbitrary k in Section 9.

8.1. Up-split sequences

Analogous to the objects we introduced for marked 3-valent trees, we begin with a definition of a class of sequences that will appear in the support of the random length sequence of a rooted binary tree.

Definition 8.1

An up-split sequence is an element of the class of increasing sequences of nonnegative integers defined recursively as follows.

The sequence

s = (0)

is an up-split sequence.

A sequence s = (s₁, …, s_n), n > 1, is an up-split sequence if

{1 \leq p < n : s_{p} = 2 p - 2} \neq \emptyset

and, setting

K_{s} ≔ max {1 \leq p < n : s_{p} = 2 p - 2},

(s₁, …, s_{K_s}) is an up-split sequence,
(s_{K_s+1} − (2K_s − 1), …, s_n − (2K_s − 1)) is a down-split sequence.

The index K_s is the up-splitting index of s.
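Definition 8.1 translates directly into a recursive membership test. The sketch below is illustrative: the function names are hypothetical, and the test for down-split sequences assumes that Definition 7.2 splits at the smallest index k < n with s_k = 2k − 2, as in the decomposition of Definition 7.7.

```python
def is_down_split(s):
    """Membership test for down-split sequences s = (s_2, ..., s_n)."""
    s = tuple(s)
    if s == (1,):
        return True
    n = len(s) + 1
    for k in range(2, n):                 # smallest k < n with s_k = 2k - 2
        if s[k - 2] == 2 * k - 2:
            first = tuple(x - 1 for x in s[: k - 1])
            second = tuple(x - (2 * k - 2) for x in s[k - 1:])
            return is_down_split(first) and is_down_split(second)
    return False


def up_split_index(s):
    """Largest p < n with s_p = 2p - 2 for s = (s_1, ..., s_n), or None."""
    n = len(s)
    for p in range(n - 1, 0, -1):
        if s[p - 1] == 2 * p - 2:
            return p
    return None


def is_up_split(s):
    """Membership test for up-split sequences s = (s_1, ..., s_n)."""
    s = tuple(s)
    if s == (0,):
        return True
    K = up_split_index(s)
    if K is None:
        return False
    head = s[:K]                                   # (s_1, ..., s_K)
    tail = tuple(x - (2 * K - 1) for x in s[K:])   # shifted remainder
    return is_up_split(head) and is_down_split(tail)
```

For a rooted binary tree with three leaves, both (0, 2, 4) and (0, 3, 4) pass this test, with up-splitting indices 2 and 1 respectively, whereas (0, 2, 3) fails because its shifted remainder (0) is not a down-split sequence.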

Example 8.2

Suppose that T is a rooted binary tree with root o. In a manner similar to the construction in Remark 7.1 we can define a partial order on V(T) by declaring that x precedes y if x ≠ y is on the path between o and y, and we can extend this partial order to a total order < such that if w, x, y, z are such that w and x are not comparable in the partial order but w < x, w precedes y in the partial order, and x precedes z in the partial order, then y < z. Suppose that y₁ < y₂ < … < y_n is the ordered listing of L(T). Set s₁ ≔ 0 and s_k ≔ W_T({y₁, …, y_k}), 2 ≤ k ≤ n. Then (s₁, …, s_n) is an up-split sequence. The leaves y₁, …, y_{K_s} and y_{K_s+1}, …, y_n respectively span the two binary subtrees T′ and T″ that are rooted at the two children of the root o. The subtree spanned by o and y_{K_s+1}, …, y_n is a 3-valent tree.

The following analogue of Lemma 7.4 is clear from Example 8.2.

Lemma 8.3

For every rooted binary tree T, there is at least one up-split sequence s with

ℙ {(0, 𝒲_{T}) = s} > 0 .

The following analogue of Lemma 7.5 can be established using a similar inductive proof.

Lemma 8.4

If s = (s₁, …, s_n) is an up-split sequence then s_n = 2n − 2.

Definition 8.5

Define a total order ≪ on the set of up-split sequences of a given length recursively as follows. Firstly, (0) ≪ (0) does not hold. Next, let s and r be two up-split sequences indexed by {1, …, n} with respective up-splitting indices K_s and K_r. Set

s' = (s_{1}, \dots, s_{K_{s}}), r' = (r_{1}, \dots, r_{K_{r}})

and

s ″ = (s_{K_{s} + 1} - (2 K_{s} - 1), \dots, s_{n} - (2 K_{s} - 1)), r ″ = (r_{K_{r} + 1} - (2 K_{r} - 1), \dots, r_{n} - (2 K_{r} - 1)) .

Declare that s ≪ r if one of the following holds:

K_{s} > K_{r},
K_{s} = K_{r} and s' ≪ r',
K_{s} = K_{r} and s' = r' and s ″ ≺ r ″.

Remark 8.6

Note that, for up-split sequences s and r, s ≪ r implies that the up-splitting index of s is greater than or equal to the up-splitting index of r. For down-split sequences u and t, u ≺ t implies that the down-splitting index of u is less than or equal to the down-splitting index of t. This change in the direction of the inequalities matches the switch in the definition of the splitting index from a minimum for down-split sequences to a maximum for up-split sequences.

The next result follows easily by induction.

Lemma 8.7

The binary relation ≪ is a total order on the set of up-split sequences of a given length.
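The order ≪ can be sketched in the same way as ≺, with the comparison of splitting indices reversed and the final clause delegated to the order on down-split sequences. The code below is a hypothetical illustration that assumes both arguments are valid up-split sequences of the same length; it performs no validation beyond what Python raises on its own.

```python
def down_parts(s):
    """(k, s', s'') for a down-split sequence s = (s_2, ..., s_n)."""
    n = len(s) + 1
    k = next(k for k in range(2, n) if s[k - 2] == 2 * k - 2)
    return k, [x - 1 for x in s[: k - 1]], [x - (2 * k - 2) for x in s[k - 1:]]


def down_precedes(s, r):
    """The order of Definition 7.7 on down-split sequences of equal length."""
    if len(s) == 1:
        return False
    ks, s1, s2 = down_parts(s)
    kr, r1, r2 = down_parts(r)
    if ks != kr:
        return ks < kr                    # smaller index comes first
    if s1 != r1:
        return down_precedes(s1, r1)
    return down_precedes(s2, r2)


def up_parts(s):
    """(K, s', s'') for an up-split sequence s = (s_1, ..., s_n)."""
    n = len(s)
    K = max(p for p in range(1, n) if s[p - 1] == 2 * p - 2)
    return K, list(s[:K]), [x - (2 * K - 1) for x in s[K:]]


def up_precedes(s, r):
    """The order of Definition 8.5 on up-split sequences of equal length."""
    if len(s) == 1:
        return False                      # (0) does not precede (0)
    Ks, s1, s2 = up_parts(s)
    Kr, r1, r2 = up_parts(r)
    if Ks != Kr:
        return Ks > Kr                    # larger index comes first
    if s1 != r1:
        return up_precedes(s1, r1)
    return down_precedes(s2, r2)          # tails are down-split sequences
```

For example, `up_precedes((0, 2, 4), (0, 3, 4))` is true because the up-splitting indices are 2 and 1, and `up_precedes((0, 3, 5, 6), (0, 4, 5, 6))` is true because the indices and heads agree while the shifted tails (2, 4, 5) and (3, 4, 5) are ordered by ≺.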

Definition 8.8

The minimal up-split sequence for a rooted binary tree T is the minimal element (with respect to the total order ≪) of the set

{s up-split : ℙ{𝒲_{T} = s} > 0}.

The up-split sequence analogues of Lemma 7.10 and Corollary 7.11 are the following and they are proved in essentially the same manner.

Lemma 8.9

Given a binary tree T with root o, let T′ and T″ be the binary subtrees rooted at the two children of o. Set

m ≔ max {1 \leq k < n : ℙ {W_{k} = 2 k - 2} > 0} .

Then W_m = 2m − 2 if and only if Y₁, …, Y_m ∈ T′ and Y_{m+1}, …, Y_n ∈ T″ or vice versa.

Corollary 8.10

Let T be a rooted binary tree with random length sequence 𝒲_T = (W₂, …, W_n). Then

m ≔ max {1 \leq k < n : ℙ {W_{k} = 2 k - 2} > 0}

is the up-splitting index for the minimal up-split sequence for T.

The following analogue of Proposition 7.12 for up-split sequences follows from Lemma 8.9 and Corollary 8.10 in essentially the same manner that Proposition 7.12 followed from Lemma 7.10 and Corollary 7.11.

Proposition 8.11

Let s be the minimal up-split sequence for a rooted binary tree T. There is no other rooted binary tree for which s is the minimal up-split sequence.

8.2. Completion of the proofs for rooted binary trees

Clearly, Proposition 8.11 completes the proof of Theorem 1.11 for rooted binary trees. To establish Theorem 1.12 in the case of 𝒯 a random rooted binary tree, we need only repeat the argument of the proof of Theorem 1.12 given in Section 7.3 for 3-valent trees.

9. Extension to (k + 1)-valent and rooted k-ary combinatorial trees

The proofs of Theorem 1.11 and Theorem 1.12 for (k+1)-valent and rooted k-ary combinatorial trees with general k ≥ 3 are very similar to the k = 2 case and involve the introduction of suitable notions of down-split and up-split sequences along with appropriate total orders on these sets of sequences. The only difference is that both types of split sequences are now split into k smaller sequences, instead of just two. We leave the details to the reader.

10. Open problems

The original conjecture Question 1.1 remains open in general, both for simple trees with arbitrary edge weights (not in general position), and for combinatorial trees. An even more general question is suggested by Theorem 1.12.

Question 10.1

Let 𝒯 be a random tree with probability distribution supported either on the set of simple trees with n leaves and general edge weights or the set of combinatorial trees with n leaves. Can the probability distribution of 𝒯 be determined uniquely from the joint probability distribution of the random length sequence 𝒲_𝒯?

Even if the answer to Question 10.1 is “no”, the answer may still be “yes” if the probability distribution of 𝒯 is known a priori to belong to some particular family of probability distributions. There are, of course, many families of probability models for random trees with n leaves that are described by a small number of parameters (for example, conditioned Galton-Watson models, the various preferential attachment models), and perhaps the value of these parameters can be determined from the joint probability distribution of the random length sequence of a random tree that is known a priori to be distributed according to a member of one of these families.

Question 10.2

Given a vector x, is there a necessary and sufficient condition for x to be in the support of 𝒲_T for some edge-weighted tree T?

We remarked in the Introduction that the focus of this paper is superficially similar to that in [KS85], where the problem of reconstructing a combinatorial tree from its number deck (the sizes of the subtrees in the forests produced by deleting each vertex) was studied. The lists of lists that are the number deck of some combinatorial tree are characterized in [KEM86].

Question 10.3

Are there more parsimonious quantities derived from the joint probability distribution of the random length sequence that contain enough information to reconstruct T, at least within some class of trees or up to some degree of ambiguity? For example, how much information about T is contained in the expectation (𝔼[W₂], …, 𝔼[W_n]) of the random length sequence and is it possible to characterize those vectors which can arise as the expectation of the random length sequence?
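For small trees, the expectation vector in Question 10.3 can be computed exactly by enumerating all leaf orders. The following sketch is purely illustrative: the function name and the edge-list input format are hypothetical, and the enumeration is only feasible for a handful of leaves.

```python
from fractions import Fraction
from itertools import permutations


def expected_length_sequence(edges):
    """Return [E[W_2], ..., E[W_n]] for a tree given as (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    leaves = [v for v in adj if len(adj[v]) == 1]
    totals = [Fraction(0)] * (len(leaves) - 1)
    count = 0
    for order in permutations(leaves):
        root = order[0]
        parent = {root: None}
        stack = [root]
        while stack:                      # orient every edge toward the first leaf
            u = stack.pop()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    stack.append(v)
        covered = set()                   # vertices whose parent edge is spanned
        length = Fraction(0)
        for idx, leaf in enumerate(order[1:]):
            u = leaf
            while u != root and u not in covered:
                covered.add(u)
                length += adj[u][parent[u]]
                u = parent[u]
            totals[idx] += length
        count += 1
    return [t / count for t in totals]


# Hypothetical example: the unit-weight 3-valent tree with four leaves.
quartet = [("a", "u", 1), ("b", "u", 1), ("u", "w", 1), ("c", "w", 1), ("d", "w", 1)]
expectation = expected_length_sequence(quartet)
```

In this example the first two leaves share a neighbour with probability 1/3, giving 𝔼[W₂] = 2 · (1/3) + 3 · (2/3) = 8/3, while W₃ = 4 and W₄ = 5 for every order.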

Figure 7.2 — A marked 3-valent tree with its leaves ordered minimally and the corresponding parse tree for the minimal down-split sequence.

Figure 8.1 — A rooted binary tree split as a rooted binary subtree T′ and a marked 3-valent tree (T″, o).

Acknowledgments

We thank David Aldous, Persi Diaconis, Ron Graham, Alberto Grünbaum, Mike Steel, Bernd Sturmfels, and the anonymous referees for helpful comments.

SNE supported in part by NSF grants DMS-0907630 and DMS-1512933, and NIH grant 1R01GM109454-01.

Contributor Information

Steven N. Evans, Department of Statistics, University of California, 367 Evans Hall #3860, Berkeley, CA 94720-3860, U.S.A

Daniel Lanoue, Department of Mathematics, University of California, 970 Evans Hall #3840, Berkeley, CA 94720-3840, U.S.A.

References

[AHU75] Aho Alfred V, Hopcroft John E, Ullman Jeffrey D. Second printing, Addison-Wesley Series in Computer Science and Information Processing. Addison-Wesley Publishing Co.; Reading, Mass.-London-Amsterdam: 1975. The design and analysis of computer algorithms.
[APZ14] Aliste-Prieto José, Zamora José. Proper caterpillars are distinguished by their chromatic symmetric function. Discrete Math. 2014;315:158–164.
[Bed74] Bednarek AR. A note on tree isomorphisms. J. Combinatorial Theory Ser. B. 1974;16:194–196.
[BES12] Bhamidi Shankar, Evans Steven N, Sen Arnab. Spectra of large random trees. J. Theoret. Probab. 2012;25(3):613–654.
[BM93] Botti Phillip, Merris Russell. Almost all trees share a complete set of immanantal polynomials. J. Graph Theory. 1993;17(4):467–476.
[Bon69] Bondy JA. On Kelly’s congruence theorem for trees. Proc. Cambridge Philos. Soc. 1969;65:387–397.
[Bon91] Bondy JA. Surveys in combinatorics, 1991 (Guildford, 1991), London Math. Soc. Lecture Note Ser. Vol. 166. Cambridge Univ. Press; Cambridge: 1991. A graph reconstructor’s manual; pp. 221–252.
[Bun71] Buneman Peter. The recovery of trees from measures of dissimilarity. In: Hodson FR, Kendall DG, Tautu P, editors. Mathematics in the archaeological and historical sciences. Edinburgh University Press; Edinburgh: 1971. pp. 387–395.
[Bun74] Buneman Peter. A note on the metric properties of trees. J. Combinatorial Theory Ser. B. 1974;17:48–50.
[DHM07] Dress A, Huber KT, Moulton V. Some uses of the Farris transform in mathematics and phylogenetics—a review. Ann. Comb. 2007;11(1):1–37.
[Diu13] Diudea Mircea V. Hosoya-Diudea polynomials revisited. MATCH Commun. Math. Comput. Chem. 2013;69(1):93–100.
[EG06] Eisenstat David, Gordon Gary. Non-isomorphic caterpillars with identical subtree data. Discrete Math. 2006;306(8–9):827–830.
[Eva08] Evans Steven N. Lecture Notes in Mathematics. Vol. 1920. Springer, Berlin: Lectures from the 35th Summer School on Probability Theory held in Saint-Flour; Jul 6–23, 2005. Probability and real trees. 2008.
[Fel04] Felsenstein J. Inferring phylogenies. Sinauer Press; Sunderland, MA: 2004.
[FGM97] Flajolet Philippe, Gourdon Xavier, Martínez Conrado. Patterns in random binary search trees. Random Structures Algorithms. 1997;11(3):223–244.
[GM89] Gordon Gary, McMahon Elizabeth. A greedoid polynomial which distinguishes rooted arborescences. Proc. Amer. Math. Soc. 1989;107(2):287–298.
[GMOY95] Gordon Gary, McDonnell Eleanor, Orloff Darren, Yung Nen. On the Tutte polynomial of a tree. Proceedings of the Twenty-sixth Southeastern International Conference on Combinatorics, Graph Theory and Computing (Boca Raton, FL, 1995) 1995;108:141–151.
[HLP88] Hardy GH, Littlewood JE, Pólya G. Inequalities. Cambridge Mathematical Library, Cambridge University Press, Cambridge. 1988 Reprint of the 1952 edition.
[HP66] Harary Frank, Palmer Ed. The reconstruction of a tree from its maximal subtrees. Canad. J. Math. 1966;18:803–810.
[HS07] Hartmann Klaas, Steel Mike. Reconstructing evolution. Oxford Univ. Press; Oxford: 2007. Phylogenetic diversity: from combinatorics to ecology; pp. 171–196.
[Kel57] Kelly Paul J. A congruence theorem for trees. Pacific J. Math. 1957;7:961–968.
[KEM86] Krasikov I, Ellingham MN, Myrvold Wendy. Legitimate number decks for trees. Ars Combin. 1986;21:15–17.
[KS85] Krasikov I, Schönheim J. The reconstruction of a tree from its number deck. Discrete Math. 1985;53:137–145. Special volume on ordered sets and their applications (L’Arbresle, 1982).
[Lau83] Lauri J. Proof of Harary’s conjecture on the reconstruction of trees. Discrete Math. 1983;43(1):79–90.
[Man70] Manvel Bennet. Reconstruction of trees. Canad. J. Math. 1970;22:55–60.
[McK77] McKay Brendan D. On the spectral characterisation of trees. Ars Combinatoria. 1977;3:219–232.
[ME11] Matsen Frederick A, Evans Steven N. Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials. Algorithms for molecular biology: AMB. 2011;7(1):14–14. doi: 10.1186/1748-7188-7-14.
[MMW08] Martin Jeremy L, Morin Matthew, Wagner Jennifer D. On distinguishing trees by their chromatic symmetric functions. J. Combin. Theory Ser. A. 2008;115(2):237–253.
[OS14] Orellana Rosa, Scott Geoffrey. Graphs with equal chromatic symmetric functions. Discrete Math. 2014;320:1–14.
[Pau00] Pauplin Yves. Direct calculation of a tree length using a distance matrix. Journal of Molecular Evolution. 2000;51(1):41–47. doi: 10.1007/s002390010065.
[PS04] Pachter L, Speyer D. Reconstructing trees from subtree weights. Appl. Math. Lett. 2004;17(6):615–621.
[RC77] Read Ronald C, Corneil Derek G. The graph isomorphism disease. J. Graph Theory. 1977;1(4):339–363.
[RS82] Rosenblatt Joseph, Seymour Paul D. The structure of homometric sets. SIAM Journal on Algebraic Discrete Methods. 1982;3(3):343–350.
[Sch73] Schwenk Allen J. New Directions in the Theory of Graphs. Academic Press; New York: 1973. Almost all trees are cospectral; pp. 275–307.
[SF83] Steyaert Jean-Marc, Flajolet Philippe. Patterns and pattern-matching in trees: an analysis. Inform. and Control. 1983;58(1–3):19–58.
[SP69] Simões Pereira JMS. A note on the tree realizability of a distance matrix. J. Combinatorial Theory. 1969;6:303–310.
[SS03] Semple Charles, Steel Mike. Oxford Lecture Series in Mathematics and its Applications. Vol. 24. Oxford University Press; Oxford: 2003. Phylogenetics.
[SS04] Semple Charles, Steel Mike. Cyclic permutations and evolutionary trees. Adv. in Appl. Math. 2004;32(4):669–680.
[SST15] Smith Isaac, Smith Zane, Tian Peter. Symmetric chromatic polynomial of trees. 2015 arxiv.org/abs/1505.01889.
[Sta95] Stanley Richard P. A symmetric function generalization of the chromatic polynomial of a graph. Adv. Math. 1995;111(1):166–194.
[Tur68] Turner James. Generalized matrix functions and the graph isomorphism problem. SIAM J. Appl. Math. 1968;16:520–526.
[Ula60] Ulam SM. Interscience Tracts in Pure and Applied Mathematics. Vol. 8. Interscience Publishers; New York-London: 1960. A collection of mathematical problems.
[Zar65] Zaretskii KA. Constructing trees from the set of distances between pendant vertices. Uspehi Matematiceskih Nauk. 1965;20:90–92.
