THE SHAPE OF THE ONE-DIMENSIONAL PHYLOGENETIC LIKELIHOOD FUNCTION

Vu Dinh; Frederick A Matsen, IV

doi:10.1214/16-aap1240

Author manuscript; available in PMC: 2023 May 2.

Published in final edited form as: Ann Appl Probab. 2017 Jul 19;27(3):1646–1677. doi: 10.1214/16-aap1240


PMCID: PMC10153603 NIHMSID: NIHMS1841431 PMID: 37139100

Abstract

By fixing all parameters in a phylogenetic likelihood model except for one branch length, one obtains a one-dimensional likelihood function. In this work, we introduce a mathematical framework to characterize the shapes of such one-dimensional phylogenetic likelihood functions. This framework is based on analyses of algebraic structures on the space of all frequency patterns with respect to a polynomial representation of the likelihood functions. Using this framework, we provide conditions under which the one-dimensional phylogenetic likelihood functions are guaranteed to have at most one stationary point, and this point is the maximum likelihood branch length. These conditions are satisfied by common simple models including all binary models, the Jukes-Cantor model and the Felsenstein 1981 model.

We then prove that for the simplest model that does not satisfy our conditions, namely, the Kimura 2-parameter model, the one-dimensional likelihood functions may have multiple stationary points. As a proof of concept, we construct a non-degenerate example in which the phylogenetic likelihood function has two local maxima and a local minimum. To construct such examples, we derive a general method of constructing a tree and sequence data with a specified frequency pattern at the root. We then extend the result to prove that the space of all rescaled and translated one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model is dense in the space of all non-negative continuous functions on [0, ∞) with finite limits. These results indicate that one-dimensional likelihood functions under advanced evolutionary models can be more complex than is typically assumed by phylogenetic inference algorithms; however, these complexities can be effectively captured by the Kimura 2-parameter model.

Keywords: evolutionary model, molecular evolution, phylogenetics, likelihood model, characteristic polynomial, algebraic representation, multimodality, universal model

MSC 2010 subject classifications: Primary 05C05, 92B10; secondary 05C25, 92D15

1. Introduction.

The likelihood of a phylogenetic model is a function of the parameters of continuous time Markov chains (CTMCs) used to model sequence evolution along each branch. It is common to assume a single rate matrix and stationary frequency for the CTMCs but allow the branch lengths to vary, representing a single evolutionary process but differing amounts of evolution along each branch. Commonly used maximum-likelihood phylogeny programs improve likelihood by modifying branch lengths iteratively and one at a time [1]. The general approach for numerical maximization of the one-dimensional likelihood function given by fixing every parameter except for one branch length is to iteratively sample the function at a number of points, use surrogate functions to fit simple curves to those points, and use those fits as approximations to locate the maximum-likelihood branch length. For example, programs often employ Newton's method, in which the intuitive idea is to use first and second derivatives to approximate the likelihood function (varying along that branch) by a surrogate quadratic function. Since evaluations of the likelihoods (and their derivatives) are computationally expensive, many approaches have been tried to improve the efficiency of this optimization procedure [1].
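To make the Newton step concrete, here is a minimal sketch for the simplest case that can be checked by hand. It assumes the Jukes-Cantor model scaled to one expected substitution per unit time (so the non-zero eigenvalue of the rate matrix is −4/3) and just two sequences with n_same identical and n_diff differing sites; the helper names (`jc_terms`, `derivatives`, `newton_branch_length`) are illustrative and not taken from any phylogenetics package. For two sequences the maximizer has the well-known closed form t̂ = −(3/4) log(1 − 4p̂/3), where p̂ is the proportion of differing sites, which serves as a check.

```python
import math

GAMMA = 4.0 / 3.0  # magnitude of the non-zero JC eigenvalue at unit mean rate (assumption)

def jc_terms(n_same, n_diff):
    """(count, a, b) triples: each site adds count * log(a + b * exp(-GAMMA * t))."""
    return [(n_same, 0.25, 0.75), (n_diff, 0.25, -0.25)]

def derivatives(terms, t):
    """First and second derivatives of the one-dimensional log-likelihood at t."""
    y = math.exp(-GAMMA * t)
    d1 = d2 = 0.0
    for n, a, b in terms:
        denom = a + b * y
        d1 += n * (-GAMMA * b * y) / denom
        d2 += n * (GAMMA ** 2 * a * b * y) / denom ** 2
    return d1, d2

def newton_branch_length(terms, t0=0.1, tol=1e-12, max_iter=100):
    """Newton's method: jump to the maximum of the local quadratic surrogate."""
    t = t0
    for _ in range(max_iter):
        d1, d2 = derivatives(terms, t)
        if d2 >= 0:
            raise ValueError("quadratic surrogate is not concave here; Newton step undefined")
        t_new = t - d1 / d2
        if t_new <= 0:          # branch lengths must stay positive
            t_new = t / 2
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

t_hat = newton_branch_length(jc_terms(90, 10))
t_closed = -0.75 * math.log(1 - 4.0 * 0.1 / 3.0)   # JC distance for p = 0.1
print(t_hat, t_closed)
```

The sketch refuses to step where the second derivative is non-negative; real implementations add line searches or bracketing, which is exactly where assumptions about the shape of ℓ(t) enter.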

Such approaches, however, rely on the assumptions that one-dimensional phylogenetic likelihood functions belong to some class of simple functions, and that the surrogate model can, at least, capture the shape of the functions. While there has been a considerable amount of work on finding multiple maxima of the multi-dimensional likelihood surfaces parameterized by all branch lengths for a tree [2, 3, 4], little is known about the shapes of one-dimensional phylogenetic likelihood functions. The only previous investigation of the shape of one-dimensional phylogenetic likelihood functions is [5], which provided a proof of uniqueness of the stationary points for one-dimensional phylogenetic likelihood functions in the case of the one-parameter model of nucleotide substitution. Based on this proof, the authors of [5] asserted that there is at most one stationary point of the full likelihood surface. This claim was later disproved by [2], although the proof for the one-dimensional case still holds. However, the result has not been examined for the more complex models used in practice.

In this work, we introduce a mathematical framework to characterize the shapes of such one-dimensional phylogenetic likelihood functions. This framework is based on analyses of algebraic structures on the space of all frequency patterns with respect to a polynomial representation of the likelihood functions. Specifically, we introduce the new concept of logarithmic relative frequency patterns and analyze algebraic structures on the space of such patterns. These structures, along with the characteristic polynomial representations of one-dimensional phylogenetic likelihood functions, open a new way to explore the space of all possible likelihood functions. Moreover, by composing these structures, we are able to tackle the inverse problem of constructing a phylogenetic tree that has a given frequency pattern at the root. This enables us to construct phylogenetic trees that approximate any given likelihood function with arbitrary precision.

Using this framework, we provide conditions under which the one-dimensional phylogenetic likelihood functions are guaranteed to have at most one stationary point, and this point is the maximum of the one-dimensional function. These conditions are satisfied by common simple models including all binary models, the Jukes-Cantor model [6] and the Felsenstein 1981 model [7]. We then prove that for the simplest model that does not satisfy our conditions, namely, the Kimura 2-parameter model [8], the one-dimensional likelihood functions may have multiple stationary points. As a proof of concept, we construct a non-degenerate example in which the phylogenetic likelihood function has two local maxima and a local minimum.

We then extend the result to prove that the space of all rescaled and translated one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model is dense in the set of all non-negative continuous functions on [0, ∞) with finite limits. These results indicate that one-dimensional likelihood functions under advanced evolutionary models can be more complex than is typically assumed by phylogenetic inference algorithms; however, these complexities can be effectively captured by the Kimura 2-parameter model.

2. Background and Definitions.

2.1. Markov models of sequence evolution.

Our setting is the standard IID setting for likelihood-based phylogenetics with a finite number of sites; we review the basics here but refer the reader to [9] for more details. Let Ω denote the set of states and let r = ∣Ω∣. For convenience, we assume that the states have indices 1 to r.

For an unrooted tree T with N taxa, we use E(T) and V(T) to denote the sets of edges and vertices of T, respectively. On each edge e ∈ E(T), we assume that the mutation events occur according to a continuous time Markov chain on states Ω with instantaneous rate matrix Q_e. This rate matrix Q_e and the branch length t_e on the edge e define the transition matrix $P^{e} = e^{Q_{e} t_{e}}$ on edge e, where $P_{i j}^{e} (t_{e})$ denotes the probability of mutating from state i to state j across the edge e (with length t_e).

We further assume that for all edges e ∈ E(T), the Markov chains that describe the mutation events are ergodic and time-reversible with respect to a fixed stationary distribution π, that is,

\lim_{t \to \infty} P_{i j}^{e} (t) = π_{j},

and

π_{i} P_{i j}^{e} (t) = π_{j} P_{j i}^{e} (t) \forall t,

for all i, j ∈ Ω and e ∈ E(T).
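Both conditions can be checked numerically for a candidate pair (Q, π). The sketch below is illustrative only: it assumes NumPy and uses the Felsenstein 1981 rate matrix Q_ij = μπ_j (i ≠ j) as a test case because its stationary distribution is non-uniform; the function names are hypothetical. Reversibility is also what makes D^{1/2} Q D^{−1/2} symmetric (D = diag(π)), so the matrix exponential can be computed from a symmetric eigendecomposition.

```python
import numpy as np

def f81_rate_matrix(pi, mu=1.0):
    """F81 rate matrix: Q_ij = mu * pi_j for i != j, with rows summing to zero."""
    pi = np.asarray(pi, dtype=float)
    q = mu * np.tile(pi, (len(pi), 1))
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q

def transition_matrix(q, pi, t):
    """P(t) = expm(Q t) for Q reversible w.r.t. pi, via the symmetrized matrix."""
    s = np.sqrt(np.asarray(pi, dtype=float))
    sym = s[:, None] * q / s[None, :]              # D^{1/2} Q D^{-1/2}, symmetric
    w, v = np.linalg.eigh(sym)
    exp_sym = (v * np.exp(w * t)) @ v.T            # V diag(e^{w t}) V^T
    return exp_sym * (s[None, :] / s[:, None])     # D^{-1/2} expm(S t) D^{1/2}

pi = np.array([0.1, 0.2, 0.3, 0.4])
q = f81_rate_matrix(pi)
p = transition_matrix(q, pi, 0.7)
flux = pi[:, None] * p                             # pi_i P_ij(t)
print(np.allclose(flux, flux.T))                   # detailed balance
print(np.allclose(transition_matrix(q, pi, 50.0), np.tile(pi, (4, 1))))  # P_ij(t) -> pi_j
```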

The phylogenetic likelihood is computed as follows given a set of (aligned) observed sequences ψ = (ψ₁, ψ₂, …, ψ_S) ∈ Ω^{N×S} of length S over N taxa of a tree τ. First orient the edges of τ away from an arbitrarily chosen root ρ of the tree. (We can choose the root arbitrarily since each P^e is reversible with respect to π.) Each site i in the sequences determines a labeling ψ_i of each leaf by a state in Ω. An extension a of a labeling ψ_i is an assignment of states to all of the nodes in the tree that agrees with ψ_i on the leaves.

The probability of an extension a given the vector of branch lengths t = (t_e)_{e∈E(T)} is defined to be the probability of the state at the root (given by the stationary distribution) multiplied by the probabilities of all the state transitions (including self-transitions) across each branch in the tree:

P (a ∣ t) = π (a_{ρ}) \prod_{(u, v) \in E (T)} P_{a_{u} a_{v}}^{u v} (t_{u v}),

where a_u denotes the assigned state of node u by a.

The likelihood of the data at site i is then the marginal probability over all the extensions

P (ψ_{i} ∣ t) = \sum_{a \text{ extends } ψ_{i}} P (a ∣ t) .

We further assume, as is standard, that evolution is independent between sites. This implies that the likelihood of a set of sequences evolving is just the product of the probabilities for the individual sites

L (ψ ∣ t) = \prod_{s = 1}^{S} P (ψ_{s} ∣ t) .

In summary, the likelihood of observing ψ given the tree topology τ and the vector of branch lengths t = (t_e)_{e∈E(T)} has the form

L (ψ ∣ t) = \prod_{s = 1}^{S} \sum_{a} π (a_{ρ}) \prod_{(u, v) \in E (T)} P_{a_{u} a_{v}}^{u v} (t_{u v})

where a ranges over all extensions of ψ to the internal nodes of T and a_u denotes the assigned state of node u by a.
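For a tree with only a few internal nodes the formula can be evaluated literally, by enumerating every extension. The sketch below assumes Jukes-Cantor transition probabilities (unit mean rate) and a hypothetical rooted tree ρ → leaf 1, ρ → u, u → leaf 2, u → leaf 3; all names are illustrative. It cross-checks the enumeration against Felsenstein's pruning algorithm, the standard recursive way of computing the same sum (named here as a well-known alternative, not something this paper introduces).

```python
import itertools
import math

PI = [0.25] * 4  # uniform stationary distribution of the Jukes-Cantor model

def jc_p(i, j, t):
    """Jukes-Cantor transition probability P_ij(t) at unit mean rate."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

# Hypothetical rooted tree: rho -> leaf 1, rho -> u, u -> leaf 2, u -> leaf 3.
T = {"rho-1": 0.1, "rho-u": 0.2, "u-2": 0.3, "u-3": 0.4}

def site_likelihood_enumerate(leaves, t=T):
    """Sum of P(a | t) over all extensions a = (a_rho, a_u) of the leaf labeling."""
    x1, x2, x3 = leaves
    return sum(PI[ar] * jc_p(ar, x1, t["rho-1"]) * jc_p(ar, au, t["rho-u"])
               * jc_p(au, x2, t["u-2"]) * jc_p(au, x3, t["u-3"])
               for ar, au in itertools.product(range(4), repeat=2))

def site_likelihood_pruning(leaves, t=T):
    """Same quantity computed bottom-up (Felsenstein's pruning)."""
    x1, x2, x3 = leaves
    cond_u = [jc_p(k, x2, t["u-2"]) * jc_p(k, x3, t["u-3"]) for k in range(4)]
    return sum(PI[i] * jc_p(i, x1, t["rho-1"])
               * sum(jc_p(i, k, t["rho-u"]) * cond_u[k] for k in range(4))
               for i in range(4))

def log_likelihood(sites, t=T):
    """log L(psi | t): sum over independent sites of log P(psi_s | t)."""
    return sum(math.log(site_likelihood_enumerate(s, t)) for s in sites)

psi = [(0, 0, 0), (0, 1, 1), (2, 3, 3)]   # three sites, states coded 0..3
print(log_likelihood(psi))
```

Enumeration costs r^(number of internal nodes) per site, which is why pruning is used in practice; the two agree to rounding error.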

For readers familiar with the theory of probabilistic inference on graphical models, the likelihood functions studied in this paper can be alternatively described as follows. Consider a tree T and let {X_ν : ν ∈ V(T)} be a collection of random variables indexed by the nodes of the tree. For each edge (u, v) ∈ E(T), we define the nonnegative potential function

k_{(u, v)} (i, j, t) ≔ P_{i j}^{u v} (t) .

We assume that the joint probability distribution p(x_{V(T)}) factorizes over the tree edges:

p (x_{V (T)}) \propto \prod_{(u, v) \in E (T)} k_{(u, v)} (x_{u}, x_{v}, t_{u v}) .

The likelihood functions of interest may then be represented as the marginal probability of the observation ψ on the leaves of the tree T. This formulation allows us to study the phylogenetic likelihood functions beyond the reversible Markov framework. We will investigate partial extensions to this more general case in Section 7, but for the next several sections we will focus on the standard phylogenetic setting (in which we can prove the strongest results).

2.2. One-dimensional phylogenetic likelihood functions.

To investigate the one-dimensional likelihood function on one branch e₀, we fix the lengths of all other branches, partition the set of all extensions of ψ according to their labels at the endpoints of e₀, and split E(T) ∖ {e₀} into two sets of edges E_left and E_right according to which side of e₀ they lie on. The likelihood function can be rewritten as a univariate function of t, the branch length of e₀:

L (ψ ∣ t) = \prod_{s = 1}^{S} \sum_{i j} \sum_{a \in 𝒜_{i j}} π (a_{ρ}) (\prod_{e \in E_{left}} P_{a_{u} a_{v}}^{e} (t_{u v})) \times P_{i j}^{e_{0}} (t) \times (\prod_{e \in E_{right}} P_{a_{u} a_{v}}^{e} (t_{u v}))

where 𝒜_ij denotes the set of all extensions of ψ for which the labels at the left end point and the right end point of e₀ are i and j, respectively. We note that some 𝒜_ij may be empty if e₀ is a pendant edge and the observed value on the corresponding leaf is not i.

By grouping the products over E_left and E_right as well as the sum over a into a single term $b_{i j}^{s}$, we can define the one-dimensional log-likelihood function as

ℓ_{e_{0}} (t) = \log L (ψ ∣ t) = \sum_{s = 1}^{S} \log (\sum_{i j} b_{i j}^{s} P_{i j}^{e_{0}} (t)) .
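Once the sums over both sides of e₀ have been absorbed into the coefficients b^s_ij, evaluating ℓ(t) only needs the transition probabilities on e₀. A minimal sketch, assuming Jukes-Cantor transition probabilities and a hypothetical tree ρ → leaf 1, ρ → u, u → leaf 2, u → leaf 3 with e₀ = (ρ, u) and the root at ρ; the function names are illustrative. Here b_ij is the product of a left factor π(i)P_{i,x₁} and a right factor P_{j,x₂}P_{j,x₃}.

```python
import math

def jc_p(i, j, t):
    """Jukes-Cantor transition probability P_ij(t) at unit mean rate."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

FIXED = {"rho-1": 0.2, "u-2": 0.3, "u-3": 0.4}   # branch lengths held fixed

def b_coefficients(leaves, fixed=FIXED):
    """b_ij for one site: left factor (at rho) times right factor (at u)."""
    x1, x2, x3 = leaves
    left = [0.25 * jc_p(i, x1, fixed["rho-1"]) for i in range(4)]     # pi(i) * edge rho-1
    right = [jc_p(j, x2, fixed["u-2"]) * jc_p(j, x3, fixed["u-3"]) for j in range(4)]
    return [[left[i] * right[j] for j in range(4)] for i in range(4)]

def one_dim_loglik(b_per_site, t):
    """l(t) = sum_s log(sum_ij b_ij^s P_ij(t)), where t is the length of e0."""
    return sum(math.log(sum(b[i][j] * jc_p(i, j, t) for i in range(4) for j in range(4)))
               for b in b_per_site)

b_sites = [b_coefficients(s) for s in [(0, 0, 0), (0, 1, 1), (2, 3, 3)]]
print([round(one_dim_loglik(b_sites, t), 4) for t in (0.05, 0.2, 1.0)])
```

The coefficients are computed once; every subsequent evaluation of ℓ(t) costs only r² operations per site.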

These functions ℓ_{e₀}(t) are the objects of study of this paper.

For convenience, we will assume that e₀ has been chosen and will drop the index e₀ hereafter.

2.3. Evolutionary models.

Throughout the paper, we use the term evolutionary model on state set Ω to refer to a collection ℋ of (Q,π) pairs, where π is a vector of stationary frequencies and Q is a rate matrix on Ω that is reversible with respect to π. If at every edge of the tree τ, the matrix-frequency pair (Q_e, π) belongs to ℋ, we say that τ is a tree under evolutionary model ℋ.

We will consider a number of different evolutionary models of DNA sequences. These DNA substitution models differ in terms of the parameters used to describe the rates at which one state replaces another during evolution and the stationary frequencies:

Jukes-Cantor model [6]: this model assumes equal stationary frequencies (π_A = π_G = π_T = π_C = 1/4) and equal mutation rates.
Felsenstein 1981 model [7]: this is an extension of the Jukes-Cantor model in which stationary frequencies are allowed to vary.
Kimura 2-parameter model [8]: this model assumes equal stationary frequencies, but distinguishes between the rates of transitions (A ↔ G, i.e., from purine to purine, or C ↔ T, i.e., from pyrimidine to pyrimidine) and transversions (from purine to pyrimidine or vice versa).

Following common usage, we use κ to denote the transition/transversion rate ratio and write the rate matrix for this model as
$Q_{κ} = \frac{1}{2 (κ + 1)} (\begin{matrix} - (κ + 2) & κ & 1 & 1 \\ κ & - (κ + 2) & 1 & 1 \\ 1 & 1 & - (κ + 2) & κ \\ 1 & 1 & κ & - (κ + 2) \end{matrix}) .$
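A quick numerical check of the spectrum of Q_κ shows why κ = 3 is convenient. The sketch assumes NumPy and orders the states with the two purines first (e.g. A, G, C, T), which is how the transition entries κ sit in the matrix above; the function names are illustrative. One can check that the eigenvalues are 0, −2/(κ + 1), −1, −1 (eigenvectors (1,1,1,1), (1,1,−1,−1), (1,−1,0,0), (0,0,1,−1)). For κ = 3 they are 0, −1/2, −1, −1, i.e. integer multiples 0, 1, 2, 2 of γ = 1/2, the form required by Assumption 2.1 below; κ = 1 recovers the single non-zero eigenvalue of Jukes-Cantor.

```python
import numpy as np

def k2p_rate_matrix(kappa):
    """Q_kappa from the text; states ordered purine, purine, pyrimidine, pyrimidine."""
    m = np.array([[-(kappa + 2), kappa, 1, 1],
                  [kappa, -(kappa + 2), 1, 1],
                  [1, 1, -(kappa + 2), kappa],
                  [1, 1, kappa, -(kappa + 2)]], dtype=float)
    return m / (2.0 * (kappa + 1.0))

def k2p_eigenvalues(kappa):
    """Closed-form spectrum 0, -2/(kappa + 1), -1, -1, sorted ascending."""
    return np.sort([0.0, -2.0 / (kappa + 1.0), -1.0, -1.0])

for kappa in (0.5, 1.0, 2.0, 3.0, 10.0):
    q = k2p_rate_matrix(kappa)
    assert np.allclose(q.sum(axis=1), 0.0)                              # rows sum to zero
    assert np.allclose(np.linalg.eigvalsh(q), k2p_eigenvalues(kappa))   # Q_kappa is symmetric

gamma = 0.5
d = np.rint(-k2p_eigenvalues(3.0) / gamma).astype(int)   # exponents d_k for kappa = 3
print(d)
```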

The special case κ = 3 will play a central role in the analysis of this paper. We clarify that the single κ parameter in the Kimura 2-parameter model determines a rate matrix that is shared across the tree, while this paper primarily concerns the effect of changing a single branch length parameter.

While the focus here is on DNA models, we emphasize that our theoretical framework is capable of analyzing any time-reversible evolutionary model on any state space. In fact, we do not assume a uniform molecular clock, or even a single evolutionary model along the edges of the tree.

2.4. Characteristic polynomials of one-dimensional phylogenetic likelihood functions.

We will frequently use the following assumption:

Assumption 2.1. The eigenvalues of the rate matrix Q are equal to

0 = - d_{0} γ \geq - d_{1} γ \geq - d_{2} γ \geq \dots \geq - d_{r - 1} γ

for some positive number γ and non-negative integers d₀ = 0, d₁, …, d_{r−1}.

The following remark, whose proof is provided in the Appendix, guarantees that Assumption 2.1 does not affect the generality of our analyses up to an arbitrarily small approximation error:

Remark 2.1. The set of rate matrices Q for a given evolutionary model that satisfy Assumption 2.1 is dense in the set of rate matrices under the same evolutionary model.

Under Assumption 2.1, if we write Q = M D N with D = diag(−d₀γ, …, −d_{r−1}γ) and N = M^{−1}, and denote the entries of the diagonalizing matrices M and N by m_ij and n_ij, respectively, then the transition probabilities can be computed as

P_{i j} (t) = \sum_{k} m_{i k} e^{- d_{k} γ t} n_{k j} .

By reparametrizing with x := e^{−γt}, we can represent these transition probabilities as polynomial functions

P_{i j} (x) = \sum_{k} m_{i k} x^{d_{k}} n_{k j} .
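For the Kimura model with κ = 3 the transition polynomials can be read off from an eigendecomposition. A minimal NumPy sketch, assuming the purine-first state order and γ = 1/2; since this Q is symmetric, N = M^{−1} = M^T, and the function names are illustrative. Grouping the products m_ik n_kj by exponent d_k gives P_ii(x) = 1/4 + x/4 + x²/2 for identical states, 1/4 + x/4 − x²/2 for a transition and 1/4 − x/4 for a transversion.

```python
import numpy as np

# Kimura 2-parameter rate matrix with kappa = 3 (two purines, then two pyrimidines)
Q = np.array([[-5, 3, 1, 1],
              [3, -5, 1, 1],
              [1, 1, -5, 3],
              [1, 1, 3, -5]], dtype=float) / 8.0
GAMMA = 0.5

w, M = np.linalg.eigh(Q)             # Q = M diag(w) N
N = M.T                              # N = M^{-1} because M is orthogonal here
d = np.rint(-w / GAMMA).astype(int)
assert np.allclose(w, -d * GAMMA)    # Assumption 2.1 holds with integer d_k

def transition_polynomial(i, j):
    """Coefficients (ascending powers of x) of P_ij(x) = sum_k m_ik x^{d_k} n_kj."""
    coeffs = np.zeros(d.max() + 1)
    for k in range(len(d)):
        coeffs[d[k]] += M[i, k] * N[k, j]
    return coeffs

def transition_matrix(t):
    """Direct P(t) = M diag(exp(w t)) N, for comparison."""
    return M @ np.diag(np.exp(w * t)) @ N

t = 0.8
x = np.exp(-GAMMA * t)
for i in range(4):
    for j in range(4):
        assert np.isclose(np.polyval(transition_polynomial(i, j)[::-1], x), transition_matrix(t)[i, j])

print(np.round(transition_polynomial(0, 0), 6))   # identical states
```

The coefficients of the x² term come from a two-dimensional eigenspace; their sum is basis-independent, so the choice of eigenvectors returned by `eigh` does not matter.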

Similarly, the log-likelihood function can be rewritten as

ℓ (x) = \sum_{s = 1}^{S} \log (λ_{s} (x)) where λ_{s} (x) = \sum_{i j} b_{i j}^{s} P_{i j} (x) .

Hereafter, we will refer to P_ij(x) and λ_s(x) as the transition polynomials of the evolutionary model and the characteristic polynomials of the one-dimensional phylogenetic likelihood function, respectively.

As we will see in later sections, this polynomial representation will enable us to exploit many algebraic and analytic properties of the likelihood functions. The most notable feature is that one can use the Fundamental Theorem of Algebra to factorize each λ_s(x) as a product of linear and quadratic polynomials with real coefficients. As a result, the log-likelihood function can be written in the form

ℓ (x) = \sum_{s = 1}^{S} \sum_{i = 1}^{i_{s, 1}} \log (α_{s, i} + β_{s, i} x) + \sum_{s = 1}^{S} \sum_{i = 1}^{i_{s, 2}} \log (μ_{s, i} + ν_{s, i} x + ω_{s, i} x^{2})

where μ_{s,i}, ν_{s,i}, ω_{s,i} are the (real) coefficients of the quadratic factors in the decomposition of λ_s, α_{s,i}, β_{s,i} are the coefficients of the linear factors, and i_{s,1} and i_{s,2} are the numbers of linear and quadratic factors, respectively.
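The decomposition can be computed from the roots: each real root r gives a linear factor x − r, each conjugate pair z, z̄ gives the real quadratic x² − 2Re(z)x + |z|². A minimal NumPy sketch with illustrative names, keeping the leading coefficient separate (it can be absorbed into any factor). As an example it uses the Kimura (κ = 3) transition polynomials in x = e^{−t/2}, which follow from the spectrum 0, −1/2, −1, −1; note that the identical-state polynomial 1/4 + x/4 + x²/2 has no real root, so a single-site λ_s under this model can already violate the hypothesis of Theorem 3.1.

```python
import numpy as np

def real_factorization(coeffs_ascending, tol=1e-9):
    """Factor a real polynomial as lead * prod(alpha + beta x) * prod(mu + nu x + omega x^2).

    Linear factors are returned as (alpha, beta), quadratic ones as (mu, nu, omega).
    """
    c = np.trim_zeros(np.asarray(coeffs_ascending, dtype=float), "b")
    lead = c[-1]
    linear, quadratic = [], []
    for z in np.roots(c[::-1]):                   # np.roots expects highest degree first
        if abs(z.imag) <= tol:
            linear.append((-z.real, 1.0))         # factor x - z
        elif z.imag > 0:                          # one representative per conjugate pair
            quadratic.append((abs(z) ** 2, -2.0 * z.real, 1.0))   # (x - z)(x - conj(z))
    return lead, linear, quadratic

def evaluate(lead, linear, quadratic, x):
    """Multiply the factors back together at x."""
    value = lead
    for alpha, beta in linear:
        value *= alpha + beta * x
    for mu, nu, omega in quadratic:
        value *= mu + nu * x + omega * x * x
    return value

# Kimura (kappa = 3) transition polynomials, ascending powers of x = exp(-t/2)
basis = {"same": [0.25, 0.25, 0.5], "transition": [0.25, 0.25, -0.5], "transversion": [0.25, -0.25]}
for name, coeffs in basis.items():
    lead, linear, quadratic = real_factorization(coeffs)
    print(name, len(linear), "linear,", len(quadratic), "quadratic")
```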

This enables us to decompose a complicated evolutionary model into smaller modules, each of which can be approximated either by a "linear" model (like the binary symmetric model) or by a "quadratic" model (like the Kimura 2-parameter model). In Section 3, we use this formulation to prove that if the phylogenetic log-likelihood function is essentially linear (that is, there are no quadratic terms in the expression), its shape resembles those generated by binary models, with a unique stationary point that is also the maximum point. In Section 5, we illustrate that this property does not hold for quadratic models by constructing a counter-example with the Kimura 2-parameter model. Finally, in Section 6, we use this formulation once again to prove that the space of all rescaled and translated one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model is dense in the space of all non-negative continuous functions on [0, ∞) with finite limits.

3. Uniqueness of the stationary point.

In this section, we discuss a condition under which the uniqueness of the stationary branch length is guaranteed.

The analyses in this section stem from two observations:

If for every site index s, the characteristic polynomial λ_s has no non-real root, then the likelihood function can be decomposed into smaller modules, each of which resembles a binary model.
The likelihood functions of binary models and sums of such functions are incave.

Definition 3.1 (Hanson [10]). A differentiable real-valued function f is said to be incave in $R^{n}$ if there exists a vector-valued function ϕ(t, u) such that

f (t) - f (u) \leq ϕ (t, u) \cdot \nabla f (u), \forall t, u \in R^{n}

where ∇f denotes the gradient of f.

Incave functions were introduced in the optimization literature as a generalization of concave functions [10]. It can be proven that a function is incave if and only if every stationary point is a global maximum [11]. We are interested in the case of functions of a single real argument, for which the following result also holds:

Lemma 3.1. If f is a real-valued incave function with a finite number of stationary points, then f has at most one stationary point. Moreover, if such a point exists, it is also a global maximum.

Proof. Denote A = {t ∈ [0, ∞) : f′(t) = 0} and assume that A has more than one element. Since A is finite, we can choose two elements t₁ < t₂ in A such that $(t_{1}, t_{2}) \subset R - A$. Since f is incave, every stationary point of f is a global maximum. We deduce that t₁ and t₂ are both global maxima of f, so f(t₁) = f(t₂). By Rolle's theorem, there exists t ∈ (t₁, t₂) such that f′(t) = 0, which contradicts the choice of t₁ and t₂.

This enables us to prove the following theorem.

Theorem 3.1. If for every site index s, the polynomial λ_s has only real roots, then ℓ has at most one stationary point. Moreover, if such a point exists, it is also a global maximum.

Proof. Since λ_s has only real roots, it can be written as a product of linear functions

λ_{s} (x) = \prod_{i = 1}^{d_{p}} (α_{s, i} + β_{s, i} x)

where d_p, defined in Assumption 2.1, is the degree of the polynomial λ_s.

The log-likelihood function ℓ can be computed as

ℓ (t) = \sum_{s = 1}^{S} \log (λ_{s} (e^{- γ t})) = \sum_{s = 1}^{S} \log (\prod_{i = 1}^{d_{p}} (α_{s, i} + β_{s, i} e^{- γ t})) = \sum_{s = 1}^{S} \sum_{i = 1}^{d_{p}} \log (α_{s, i} + β_{s, i} e^{- γ t}) .

For any t, u > 0, we have

ℓ (t) - ℓ (u) = \sum_{s = 1}^{S} \sum_{i = 1}^{d_{p}} \log (\frac{α_{s, i} + β_{s, i} e^{- γ t}}{α_{s, i} + β_{s, i} e^{- γ u}}) \leq \sum_{s = 1}^{S} \sum_{i = 1}^{d_{p}} (\frac{α_{s, i} + β_{s, i} e^{- γ t}}{α_{s, i} + β_{s, i} e^{- γ u}} - 1) = \sum_{s = 1}^{S} \sum_{i = 1}^{d_{p}} (\frac{β_{s, i} (e^{- γ t} - e^{- γ u})}{α_{s, i} + β_{s, i} e^{- γ u}}) = \frac{1}{γ} (1 - e^{- γ (t - u)}) \sum_{s = 1}^{S} \sum_{i = 1}^{d_{p}} (\frac{- β_{s, i} γ e^{- γ u}}{α_{s, i} + β_{s, i} e^{- γ u}}) = \frac{1}{γ} (1 - e^{- γ (t - u)}) ℓ^{'} (u) ,

where the inequality follows from log y ≤ y − 1 for y > 0. Hence, ℓ is an incave function (take ϕ(t, u) = (1 − e^{−γ(t−u)})/γ in Definition 3.1).

Furthermore, since the λ_s are polynomials and t ↦ e^{−γt} is a bijective map from [0, ∞) to (0, 1], we deduce that ℓ(t) has only a finite number of stationary points. Using Lemma 3.1, we conclude that ℓ has at most one stationary point; moreover, if such a point exists, it is also a global maximum.
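Theorem 3.1 can be probed numerically: draw random linear factors, evaluate ℓ′(t) on a grid and count sign changes. A sketch assuming NumPy, γ = 1, and factors α + βx chosen positive on (0, 1] (the logarithms in the proof require positive factors); the function names are illustrative. A grid check is an illustration, not a proof, and cannot resolve sign changes finer than the grid spacing.

```python
import numpy as np

def dloglik(alpha, beta, gamma, t):
    """Derivative in t of sum_i log(alpha_i + beta_i * exp(-gamma t)), on a grid of t."""
    y = np.exp(-gamma * t)[:, None]
    return np.sum(-beta * gamma * y / (alpha + beta * y), axis=1)

def sign_changes(values):
    """Number of sign changes, plus the sequence of non-zero signs."""
    s = np.sign(values)
    s = s[s != 0]
    return int(np.sum(s[1:] != s[:-1])), s

rng = np.random.default_rng(0)
t_grid = np.linspace(1e-3, 10.0, 4000)
for _ in range(500):
    n = rng.integers(2, 12)
    alpha = rng.uniform(0.1, 1.0, n)
    beta = rng.uniform(-0.99, 2.0, n) * alpha      # keeps alpha + beta*x > 0 on (0, 1]
    n_changes, s = sign_changes(dloglik(alpha, beta, 1.0, t_grid))
    assert n_changes <= 1                          # at most one stationary point
    if n_changes == 1:
        assert s[0] > 0 and s[-1] < 0              # increasing, then decreasing: a maximum
```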

We note that Theorem 3.1 imposes a condition on the characteristic polynomials rather than on the evolutionary model, and can be applied to assess the uniqueness of the stationary point under any time-reversible evolutionary model satisfying Assumption 2.1. In fact, Theorem 3.1 does not assume a uniform molecular clock, or even a single evolutionary model along the edges of the tree. It is worth noting that for models whose rate matrices have only one distinct non-zero eigenvalue, every characteristic polynomial λ_s is linear in x, so the result holds automatically:

Corollary 3.1. For binary, Jukes-Cantor and Felsenstein 1981 models, the one-dimensional likelihood function has at most one stationary point; if such a point exists, it is the global maximum.
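The reason the corollary is automatic: with a single distinct non-zero eigenvalue −μ, every transition probability is affine in x = e^{−μt}. For the Felsenstein 1981 model (Jukes-Cantor is the case of uniform π), P(t) = xI + (1 − x)1π^T, i.e. P_ij(x) = π_j + (δ_ij − π_j)x, so λ_s(x) = Σ b_ij P_ij(x) = c₀ + c₁x has degree at most one and any root is real. A small NumPy sketch with illustrative names and arbitrary positive b_ij:

```python
import numpy as np

def f81_transition_matrix(pi, x):
    """F81 transition matrix in x = exp(-mu t): x I + (1 - x) 1 pi^T."""
    pi = np.asarray(pi, dtype=float)
    return x * np.eye(len(pi)) + (1.0 - x) * np.tile(pi, (len(pi), 1))

def characteristic_coeffs(b, pi):
    """(c0, c1) such that sum_ij b_ij P_ij(x) = c0 + c1 x."""
    b = np.asarray(b, dtype=float)
    c0 = float(np.sum(b * np.asarray(pi, dtype=float)[None, :]))   # sum_ij b_ij pi_j
    c1 = float(np.trace(b)) - c0                                    # sum_i b_ii - c0
    return c0, c1

rng = np.random.default_rng(1)
pi = rng.dirichlet(np.ones(4))
b = rng.uniform(0.0, 1.0, (4, 4))
c0, c1 = characteristic_coeffs(b, pi)
for x in (0.0, 0.3, 1.0):
    assert np.isclose(c0 + c1 * x, np.sum(b * f81_transition_matrix(pi, x)))
print(c0, c1)
```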

We also note that the results in previous studies about the number of maxima of likelihood surfaces [2, 3, 4] are derived for binary models. Theorem 3.1 complements those results in the sense that while the likelihood surfaces considered in those works may have multiple (or even a continuum of) local maxima, the stationary points of one-dimensional likelihood functions are still unique.

This corollary also extends and clarifies a result from the first attempt to investigate the shape of the one-dimensional phylogenetic likelihood functions [5]. By studying the location of the solutions of phylogenetic likelihood functions, the paper proves that one-dimensional phylogenetic likelihood functions have unique stationary points under the same model assumptions as Corollary 3.1.

This result also provides a full characterization of the one-dimensional likelihood functions of binary models (and of the other models covered by Corollary 3.1). Indeed, since the derivatives of these log-likelihood functions are continuous with at most one zero, this result implies that:

If there is no stationary point, then ℓ(t) is a monotonic function (either strictly decreasing or strictly increasing).
If the stationary point t₀ exists and is unique, then the function is increasing in the interval (0, t₀) and is decreasing in (t₀, ∞).

This simplicity of the shapes of phylogenetic likelihood functions provides a strong theoretical foundation for the use of simple optimization methods to locate the maximum likelihood branch length. However, we emphasize that these results are only about one-dimensional phylogenetic likelihood functions and do not mean that there is a unique (multivariate) stationary point of the likelihood surface or that simple hill-climbing methods will find this optimum.

4. Algebraic structures on the space of all logarithmic relative frequency patterns under the Kimura 2-parameter model.

While Section 3 provides a uniqueness result for the maximum likelihood branch lengths under three simple models, the result does not extend to more general models. In fact, as we will illustrate in the next section, the shapes of likelihood functions under the Kimura 2-parameter model [8] can be quite complicated, for example with multiple local and global maxima.

In order to enable theoretical analyses of phylogenetic likelihood functions under more complex evolutionary models, here we introduce the concept of logarithmic relative frequency patterns and study the algebraic structures on the space of such patterns.

Definition 4.1. Given a rooted tree τ_ρ with root ρ and N taxa, labelings ψ = (ψ₁, …, ψ_S) ∈ Ω^{N×S} of its taxa and a vector of real constants c = (c₁, …, c_S), we define the logarithmic relative frequency pattern ϕ(τ_ρ, ψ, c) as the r × S matrix with entries

ϕ_{i, s} = c_{s} + \log \sum_{a \in ℒ_{i, s}} π (i) \prod_{(u, v) \in E (τ_{ρ})} P_{a_{u} a_{v}}^{u v} (t_{u v})

for i ∈ Ω, s = 1, …, S and ℒ_i,s being the set of all extensions a of ψ_s to all the nodes of τ such that a(ρ) = i. For convenience, we will use the shorter term frequency pattern to refer to a logarithmic relative frequency pattern. Because the Kimura 2-parameter model has a uniform stationary distribution, we will drop the constant term π(i) in the analyses.
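To make the definition concrete, the sketch below computes ϕ(τ_ρ, ψ, c) for a small rooted tree under the Kimura model with κ = 3. It evaluates the sum over ℒ_{i,s} with Felsenstein's pruning recursion (a standard technique, not specific to this paper) and drops π(i) as in the text. The tree encoding (a taxon name, or a list of (subtree, branch length) pairs), the purine-first state order A, G, C, T and all function names are assumptions made for illustration; the root must be an internal node.

```python
import math

STATES = "AGCT"   # assumed order: purines A, G, then pyrimidines C, T

def k2p3_p(i, j, t):
    """Kimura 2-parameter transition probability with kappa = 3, in x = exp(-t/2)."""
    x = math.exp(-t / 2.0)
    if i == j:
        return 0.25 + 0.25 * x + 0.5 * x * x
    if i // 2 == j // 2:                  # transition: A<->G or C<->T
        return 0.25 + 0.25 * x - 0.5 * x * x
    return 0.25 - 0.25 * x                # transversion

def conditional(node, labeling):
    """Sum over extensions below `node`, for each state at `node` (pruning recursion).

    `node` is a taxon name (str) or a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        k = STATES.index(labeling[node])
        return [1.0 if i == k else 0.0 for i in range(4)]
    vec = [1.0] * 4
    for child, length in node:
        below = conditional(child, labeling)
        for i in range(4):
            vec[i] *= sum(k2p3_p(i, j, length) * below[j] for j in range(4))
    return vec

def frequency_pattern(tree, labelings, c=None):
    """r x S matrix phi with phi[i][s] = c_s + log(sum over extensions with root state i)."""
    c = c if c is not None else [0.0] * len(labelings)
    columns = [[c_s + math.log(v) for v in conditional(tree, lab)]
               for c_s, lab in zip(c, labelings)]
    return [list(row) for row in zip(*columns)]

tree = [("t1", 0.3), ([("t2", 0.7), ("t3", 0.2)], 0.5)]   # root -> t1, root -> (t2, t3)
sites = [{"t1": "A", "t2": "G", "t3": "G"}, {"t1": "C", "t2": "C", "t3": "T"}]
phi = frequency_pattern(tree, sites, c=[0.0, 1.5])
```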

In probabilistic terms, for a fixed site index s, the (i, s)-entry of a logarithmic relative frequency pattern ϕ(τ_ρ, ψ, c) is (up to a constant c_s) the logarithm of the likelihood of observing state i at the root of the tree, given leaf states ψ_s. This definition is directly related to the formulation of the characteristic polynomials λ_s, whose coefficients $b_{i j}^{s}$ are the product of the probabilities of observing states i and j at the two endpoints of an edge, given that the labeling ψ_s is observed at the taxa. It is straightforward to verify that for models with uniform stationary distribution on a fixed tree, we have

\log b_{i j}^{s} = ϕ_{i, s} (τ_{1}) + ϕ_{j, s} (τ_{2}) + {\tilde{c}}_{s}

for all i, j, s, where ${\tilde{c}}_{s}$ is a constant depending only on s, and τ₁ and τ₂ are the trees obtained by removing the edge e₀ from the tree τ and rooting the newly created trees at the endpoints of e₀ (see the proof of Theorem 4.1 in the Appendix for more details).
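The identity for log b^s_ij can be checked on a small example. The sketch assumes Kimura κ = 3 transition probabilities with states ordered A, G, C, T, a single site, and a hypothetical tree in which e₀ = (u, v), the root is u, τ₁ is u → a, u → w, w → b, w → c and τ₂ is v → d, v → e; the branch lengths and names are illustrative. With c_s = 0 for both subtrees, the constant is c̃ = log π(i) = log(1/4). b_ij is computed by brute force over the one remaining internal node w and compared with ϕ_i(τ₁) + ϕ_j(τ₂) + c̃.

```python
import math

def k2p3_p(i, j, t):
    """Kimura 2-parameter transition probability, kappa = 3, states ordered A, G, C, T."""
    x = math.exp(-t / 2.0)
    if i == j:
        return 0.25 + 0.25 * x + 0.5 * x * x
    if i // 2 == j // 2:                  # transition: A<->G or C<->T
        return 0.25 + 0.25 * x - 0.5 * x * x
    return 0.25 - 0.25 * x                # transversion

# Hypothetical tree; e0 = (u, v) and the root is u.
# tau_1 (rooted at u): u -> a (0.2), u -> w (0.4), w -> b (0.3), w -> c (0.5).
# tau_2 (rooted at v): v -> d (0.6), v -> e (0.1).
SITE = {"a": 0, "b": 1, "c": 2, "d": 1, "e": 3}   # observed states at one site

def b_bruteforce(i, j):
    """b_ij: sum over extensions with a_u = i, a_v = j (only w is free); e0 excluded."""
    return sum(0.25 * k2p3_p(i, SITE["a"], 0.2) * k2p3_p(i, w, 0.4)
               * k2p3_p(w, SITE["b"], 0.3) * k2p3_p(w, SITE["c"], 0.5)
               * k2p3_p(j, SITE["d"], 0.6) * k2p3_p(j, SITE["e"], 0.1)
               for w in range(4))

def phi_left(i):
    """phi_i(tau_1) with c = 0: log partial likelihood of tau_1 given state i at u."""
    below_w = [k2p3_p(w, SITE["b"], 0.3) * k2p3_p(w, SITE["c"], 0.5) for w in range(4)]
    return math.log(k2p3_p(i, SITE["a"], 0.2)
                    * sum(k2p3_p(i, w, 0.4) * below_w[w] for w in range(4)))

def phi_right(j):
    """phi_j(tau_2) with c = 0."""
    return math.log(k2p3_p(j, SITE["d"], 0.6) * k2p3_p(j, SITE["e"], 0.1))

C_TILDE = math.log(0.25)   # pi(i) = 1/4 at the root u ends up in the constant

worst = max(abs(math.log(b_bruteforce(i, j)) - (phi_left(i) + phi_right(j) + C_TILDE))
            for i in range(4) for j in range(4))
print(worst)
```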

Hence, to characterize the space of all phylogenetic characteristic polynomials under a given evolutionary model, we just need to characterize the space of all possible logarithmic relative frequency patterns under that model.

Definition 4.2. We denote the space of all possible logarithmic relative frequency patterns under the Kimura 2-parameter model by

G = {ϕ (τ, ψ, c) : τ \in 𝒯, ψ \in Ψ_{τ}^{S}, c \in R^{S}}

where 𝒯 denotes the set of all rooted trees and $Ψ_{τ}^{S}$ denotes the set of all tuples (ψ₁, …, ψ_S) of S labelings of the taxa of τ.

The goal of this section is to establish that for any sequence of S column vectors v₁, v₂, …, v_S in $R^{4}$ , there exists a tree τ under the Kimura 2-parameter model, labelings ψ = (ψ₁, ψ₂, …, ψ_S) of its taxa and a vector of real constants c such that

ϕ (τ, ψ, c) = [v_{1} v_{2} \dots v_{S}] .

The existence of such a tree is guaranteed indirectly by proving that, under the Kimura 2-parameter model:

G is an algebraic subgroup of ( $R^{4 \times S}$ , +).
G is a linear subspace of $R^{4 \times S}$ .
G is equal to $R^{4 \times S}$ itself.

Noting again that the stationary distribution of the Kimura 2-parameter model is the uniform distribution across states π = (1/4, 1/4, 1/4, 1/4), the first two steps are confirmed by the following theorem.

Theorem 4.1. If the stationary frequency of the evolutionary model is the same for every state, then the following properties hold:

(G, +) is a subgroup of ( $R^{4 \times S}$ , +).
G is path-connected.
G is a linear subspace of $R^{4 \times S}$ .

Sketch of proof.

A detailed proof of this Theorem is provided in the Appendix, but the main arguments can be simply illustrated. The fact that G is closed under addition follows because we can add two frequency patterns just by gluing the roots of the two corresponding trees, labeling the taxa of the new tree accordingly and taking the pattern at the new root (Figure 2). Similarly, we can create the inverse of a pattern by gluing all permuted versions of its corresponding tree (with an appropriate vector of real constants).

Fig 2: G is closed under addition: we can add two frequency patterns [X] and [Y] just by gluing the roots of the two corresponding trees, labeling the taxa of the new tree accordingly and taking the pattern at the new root.
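Gluing roots multiplies the partial likelihoods at the root, so their logarithms add, which is the group operation on G. A minimal sketch, assuming Kimura 2-parameter transition probabilities with κ = 3, the purine-first state order A, G, C, T, per-site constants c_s = 0, and a hypothetical tree encoding in which a tree is a taxon name or a list of (subtree, branch length) pairs; all names are illustrative. With this encoding, identifying the two roots is just concatenating their child lists.

```python
import math

def k2p3_p(i, j, t):
    """Kimura 2-parameter transition probability with kappa = 3, states A, G, C, T."""
    x = math.exp(-t / 2.0)
    if i == j:
        return 0.25 + 0.25 * x + 0.5 * x * x
    if i // 2 == j // 2:                      # transition (A<->G or C<->T)
        return 0.25 + 0.25 * x - 0.5 * x * x
    return 0.25 - 0.25 * x                    # transversion

def conditional(node, labeling):
    """Partial likelihoods at `node`: a taxon name or a list of (child, length) pairs."""
    if isinstance(node, str):
        k = "AGCT".index(labeling[node])
        return [1.0 if i == k else 0.0 for i in range(4)]
    vec = [1.0] * 4
    for child, length in node:
        below = conditional(child, labeling)
        for i in range(4):
            vec[i] *= sum(k2p3_p(i, j, length) * below[j] for j in range(4))
    return vec

def pattern_column(tree, labeling):
    """One column of the logarithmic frequency pattern (c_s = 0, pi(i) dropped)."""
    return [math.log(v) for v in conditional(tree, labeling)]

def glue(tree_x, tree_y):
    """Identify the two roots: the new root keeps the children of both."""
    return tree_x + tree_y

X = [("a", 0.3), ("b", 0.9)]
Y = [("c", 0.4), ("d", 0.2)]
labeling = {"a": "A", "b": "C", "c": "G", "d": "G"}
glued = pattern_column(glue(X, Y), labeling)
summed = [p + q for p, q in zip(pattern_column(X, labeling), pattern_column(Y, labeling))]
print(max(abs(g - s) for g, s in zip(glued, summed)))
```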

To prove that G is path-connected, given two arbitrary trees with roots ρ₁, ρ₂, we create a new tree τ by adding a new root ρ, joining ρ₁, ρ₂ with ρ by two new edges of length t and 1/t, respectively, and making ρ the root of τ (Figure 3). By varying t continuously from zero to infinity, we can make a continuous path in G that connects the two frequency patterns. Since any path-connected subgroup of $R^{n}$ is a linear subspace [12], so is G.

Fig 3: G is path-connected: we can connect any two patterns [X] and [Y] by adding a new root ρ, joining it with the roots of the two corresponding trees with two new edges of length t and 1/t, respectively, and making ρ the root of the new tree.
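The endpoints of this path can be checked numerically. As t → 0 the edge to ρ₂ becomes infinitely long, its transition probabilities tend to the stationary value 1/4, and that subtree contributes the same constant to every entry of the column; so the pattern tends to that of the first tree up to the per-site constant c_s, and symmetrically as t → ∞. A minimal sketch, assuming Kimura κ = 3 transition probabilities with states ordered A, G, C, T and a hypothetical tree encoding (a taxon name, or a list of (subtree, branch length) pairs); the function names are illustrative.

```python
import math

def k2p3_p(i, j, t):
    """Kimura 2-parameter transition probability with kappa = 3, states A, G, C, T."""
    x = math.exp(-t / 2.0)
    if i == j:
        return 0.25 + 0.25 * x + 0.5 * x * x
    if i // 2 == j // 2:                      # transition (A<->G or C<->T)
        return 0.25 + 0.25 * x - 0.5 * x * x
    return 0.25 - 0.25 * x                    # transversion

def conditional(node, labeling):
    """Partial likelihoods at `node`: a taxon name or a list of (child, length) pairs."""
    if isinstance(node, str):
        k = "AGCT".index(labeling[node])
        return [1.0 if i == k else 0.0 for i in range(4)]
    vec = [1.0] * 4
    for child, length in node:
        below = conditional(child, labeling)
        for i in range(4):
            vec[i] *= sum(k2p3_p(i, j, length) * below[j] for j in range(4))
    return vec

def pattern_column(tree, labeling):
    return [math.log(v) for v in conditional(tree, labeling)]

def path_tree(tree_x, tree_y, t):
    """New root joined to tree_x by an edge of length t and to tree_y by one of length 1/t."""
    return [(tree_x, t), (tree_y, 1.0 / t)]

def spread(u, v):
    """max - min of u - v: zero exactly when u and v differ by a per-site constant."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)

X = [("a", 0.3), ("b", 0.9)]
Y = [("c", 0.4), ("d", 0.2)]
labeling = {"a": "A", "b": "C", "c": "G", "d": "G"}
for t in (1e-8, 0.5, 2.0, 1e8):
    col = pattern_column(path_tree(X, Y, t), labeling)
    print(t, round(spread(col, pattern_column(X, labeling)), 6),
          round(spread(col, pattern_column(Y, labeling)), 6))
```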

We note that although the aforementioned arguments are made for the Kimura 2-parameter model, which describes a model of DNA evolution (r = 4), Theorem 4.1 only requires that the stationary frequency of the evolutionary model is the same for every state. Hence, this result also extends to models with more parameters.

Similarly, the fact that (G, +) is a subgroup of ( $R^{r \times S}$ , +) can be established under the assumption that the root distribution π is uniform, without assuming that it is the stationary distribution of the evolutionary process. However, our current approach requires the uniform root distribution to be the stationary distribution for the proof of path-connectivity of G, and an alternative approach to the proof of path-connectivity will be needed if we want to extend the analyses to a more general framework.

Recalling that the Kimura 2-parameter model corresponds to the uniform stationary distribution and a family of rate matrices Q_κ indexed by κ, the transition/transversion rate ratio, we then establish that when κ = 3, the space of all frequency patterns G is equal to $R^{4 \times S}$. The proof proceeds by induction, showing that G contains 4 × S linearly independent frequency patterns (also proven in the Appendix):

Theorem 4.2. The set of all possible logarithmic relative frequency patterns with S sites under the Kimura 2-parameter model with κ = 3 is equal to $R^{4 \times S}$.

With these results, we can finally establish the main theorems of the section.

Theorem 4.3. For any sequence of column vectors v₁, v₂, …, v_S in $R^{4}$ , there exists a rooted tree τ under the Kimura 2-parameter model with κ = 3, S labelings ψ₁, ψ₂, …, ψ_S of its taxa, and a vector of real constants c such that

ϕ (τ, ψ, c) = [v_{1} v_{2} \dots v_{S}] .

While Theorem 4.3 provides a theoretical guarantee about the existence of a tree under the Kimura model with a given frequency pattern, the proof is not constructive. This raises some concerns about the practicality of the approach. For example, one cannot derive an estimate of the number of edges required to produce a given frequency pattern. These concerns are addressed by the following theorem.

Theorem 4.4. A tree as in Theorem 4.3 can be constructed with at most 64S edges.

Not only does the theorem provide an upper bound on the number of edges required to construct a tree with a given frequency pattern, its proof also provides a simple algorithm to construct such a tree.

Proof of Theorem 4.4. The main steps of the proof are as follows:

Step 1. As shown in the Appendix, any frequency pattern of the form [x, 0, 0, 0]^t can be produced (up to a real constant c₁) by a tree τ with 4 edges and some labeling ψ of its taxa.

Step 2. Using τ from Step 1, we create a tree τ′ of 16 edges by gluing the roots of 4 different versions τ₁, τ₂, τ₃, τ₄ of τ together and define S labelings of τ′ as follows.

For s = 1, we copy the labeling of τ onto τ′: $ψ_{1} (a) = ψ (a)$ for each taxon a of τ₁, τ₂, τ₃, τ₄.
For all s ≥ 2, the labelings are defined by $ψ_{s} (a) = σ^{j} (ψ (a))$ if a is a taxon of τ_j, where σ is the permutation (A G T C) in cycle notation.

The construction of τ′ is similar to the construction of inverse elements of the group G in the proof of Theorem 4.1. By symmetry, for s ≥ 2 the frequency pattern corresponding to site s at the root of the newly created tree is the same for every state. For s = 1, the frequency pattern of τ′ is obtained by multiplying the frequency pattern of τ by a factor of 4, because gluing roots adds logarithmic frequency patterns.

We deduce that the pattern created by (τ′, {ψ_i}) is:

(\begin{matrix} 4 x & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \\ 0 & 0 & \dots & 0 \end{matrix}) + (\begin{matrix} c_{1} & c_{2} & \dots & c_{S} \\ c_{1} & c_{2} & \dots & c_{S} \\ c_{1} & c_{2} & \dots & c_{S} \\ c_{1} & c_{2} & \dots & c_{S} \end{matrix})

for some real constants c₁, c₂, …, c_S.
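The symmetry argument of Step 2 can be checked numerically. The Python sketch below is a minimal illustration, not the authors' code. It represents one site of a star tree (all taxa attached directly to the root, as in this construction) as a list of (state, branch length) pairs. It computes the logarithmic conditional pattern h_i = Σ log P_{i, label}(t) at the root under the K2P model with κ = 3 and checks two claims: copying the labels four times multiplies the pattern by 4, and relabeling copy j by σ^j yields a pattern that is the same for every root state. The one-edge tree `tau` is a hypothetical stand-in for the four-edge tree of Step 1. The state order A, G, T, C and all helper names are our own choices.

```python
import numpy as np

STATES = "AGTC"  # A<->G and T<->C are the transition pairs in (5.1)


def k2p_matrix(t):
    """K2P transition matrix with kappa = 3, rows indexed by the parent state."""
    x = np.exp(-0.5 * t)
    p1 = 0.25 + 0.25 * x + 0.5 * x**2  # same state
    p2 = 0.25 + 0.25 * x - 0.5 * x**2  # transition partner
    p3 = 0.25 - 0.25 * x               # transversion
    return np.array([[p1, p2, p3, p3],
                     [p2, p1, p3, p3],
                     [p3, p3, p1, p2],
                     [p3, p3, p2, p1]])


def star_pattern(leaves):
    """h_i = sum over leaves of log P_{i, label}(t) for one site of a star tree.

    `leaves` is a list of (label, branch_length) pairs.
    """
    h = np.zeros(4)
    for label, t in leaves:
        h += np.log(k2p_matrix(t)[:, STATES.index(label)])
    return h


def sigma(label, j):
    """Apply the 4-cycle (A G T C) j times to a state label."""
    return STATES[(STATES.index(label) + j) % 4]


def glued_patterns(tau):
    """Site-1 and site-s (s >= 2) patterns of tau' built from four copies of tau."""
    site1 = star_pattern([(lab, t) for _ in range(4) for lab, t in tau])
    site_s = star_pattern([(sigma(lab, j), t) for j in range(1, 5) for lab, t in tau])
    return site1, site_s


tau = [("A", 0.7)]  # hypothetical one-edge stand-in for the tree of Step 1
site1, site_s = glued_patterns(tau)
print(np.allclose(site1, 4 * star_pattern(tau)))  # True: pattern multiplied by 4
print(site_s.max() - site_s.min() < 1e-12)        # True: same value for every state
```

Relabeling works leaf by leaf here. As j runs over 1, …, 4, each leaf's label visits all four states, and every row of the K2P matrix is a permutation of (P₁, P₂, P₃, P₃).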

Step 3. By similar arguments, for any i = 1, 2, 3, 4 and s = 1, 2, …, S, we can construct a tree with 16 edges whose frequency pattern with S sites has its only non-zero entry at the (i, s)-position. Gluing the roots of these 4S trees adds their frequency patterns. Hence an arbitrary frequency pattern can be produced by a tree with at most 16 × 4S = 64S edges.

5. Non-uniqueness of stationary points: Kimura 2-parameter model.

In this section, we provide an example for which the likelihood function has multiple stationary points. To construct such an example, we find two polynomials p₁(x) and p₂(x), with coefficient vectors b¹ and b², that satisfy two conditions. First, the product p₁p₂ has two local maxima in [0, 1]. Second, both p₁ and p₂ can be expressed as positive linear combinations of the basis polynomial functions P_i derived from an evolutionary model (as described in detail in this section). This gives a counter-example with S = 2 sites.

Consider the Kimura 2-parameter model with κ = 3 which has the rate matrix

Q = (\begin{matrix} - 5 ∕ 8 & 3 ∕ 8 & 1 ∕ 8 & 1 ∕ 8 \\ 3 ∕ 8 & - 5 ∕ 8 & 1 ∕ 8 & 1 ∕ 8 \\ 1 ∕ 8 & 1 ∕ 8 & - 5 ∕ 8 & 3 ∕ 8 \\ 1 ∕ 8 & 1 ∕ 8 & 3 ∕ 8 & - 5 ∕ 8 \end{matrix}) .

(5.1)

This matrix has eigenvalues 0 > −γ > −2γ, where γ = 0.5 (the eigenvalue −2γ = −1 has multiplicity two). The transition probabilities under this evolutionary model can be computed explicitly by

P_{1} (t) = 0.25 + 0.25 exp (- 0.5 t) + 0.5 exp (- t),
P_{2} (t) = 0.25 + 0.25 exp (- 0.5 t) - 0.5 exp (- t),
P_{3} (t) = P_{4} (t) = 0.25 - 0.25 exp (- 0.5 t),

where P₁(t), P₂(t), P₃(t), P₄(t) are the probabilities of transitioning from state A to state A, G, T, C, respectively. This simple model is “universal” in an appropriate sense, as shown in Section 6.
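As a quick sanity check on these closed forms, the sketch below computes exp(tQ) through the eigendecomposition of the symmetric matrix Q and compares its first row with P₁, …, P₄. It assumes NumPy and orders the states as A, G, T, C, so that the rate-3/8 pairs in (5.1) are A↔G and T↔C. The function names are illustrative.

```python
import numpy as np

# Rate matrix (5.1); states ordered A, G, T, C so that A<->G and T<->C have rate 3/8.
Q = np.array([[-5.0, 3.0, 1.0, 1.0],
              [3.0, -5.0, 1.0, 1.0],
              [1.0, 1.0, -5.0, 3.0],
              [1.0, 1.0, 3.0, -5.0]]) / 8.0


def transition_matrix(t):
    """P(t) = exp(tQ), computed from the eigendecomposition of the symmetric Q."""
    evals, evecs = np.linalg.eigh(Q)
    return evecs @ np.diag(np.exp(evals * t)) @ evecs.T


def closed_form_row(t):
    """[P1(t), P2(t), P3(t), P4(t)] as given in the text."""
    a, b = np.exp(-0.5 * t), np.exp(-t)
    return np.array([0.25 + 0.25 * a + 0.5 * b,
                     0.25 + 0.25 * a - 0.5 * b,
                     0.25 - 0.25 * a,
                     0.25 - 0.25 * a])


# Eigenvalues in ascending order: -2*gamma (twice), -gamma, 0, with gamma = 0.5.
assert np.allclose(np.linalg.eigvalsh(Q), [-1.0, -1.0, -0.5, 0.0])
for t in (0.1, 1.0, 5.0):
    assert np.allclose(transition_matrix(t)[0], closed_form_row(t))
```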

This leads to a representation of the likelihood as the product of two different linear combinations of the transition polynomials

P_{1} (x) = 0.25 + 0.25 x + 0.5 x^{2},
P_{2} (x) = 0.25 + 0.25 x - 0.5 x^{2},
P_{3} (x) = P_{4} (x) = 0.25 - 0.25 x

(5.2)

where x = exp(−0.5t).

We assume that the likelihood is computed from two observed sites s₁ and s₂, and that the edge of interest e is a pendant edge whose tip is observed in state A at both sites. Assume further that the state observation probabilities at the inner node of e are given by

b^{1} = [0.24977275, 0.34067358, 0.2051904, 0.20436327]

and

b^{2} = [0.25, 0.16087344, 0.29328435, 0.29584221] .

As discussed earlier, the log-likelihood function can be computed as

ℓ (t) = \log (λ_{1} (t)) + \log (λ_{2} (t))

(5.3)

where

λ_{s} (t) = \sum_{i = 1}^{4} b^{s} (i) P_{i} (t) .

Plots of the log-likelihood function ℓ and its perturbations (obtained by slightly varying the coefficients), as functions of x and of t, are provided in Figure 4 and Figure 5, respectively. The figures show that ℓ has three stationary points (two local maxima at t₁ < t₂ and one local minimum), all lying in [0, 1] when expressed in terms of x. Since ℓ(t₁) > ℓ(t₂) for some perturbations and ℓ(t₁) < ℓ(t₂) for others, continuity implies that there exist values of $b_{i}^{s}$ for which ℓ(t₁) = ℓ(t₂). In that case, the smoothly varying likelihood function has two global maxima.

Fig 4: The log-likelihood (5.3) as a function of x = exp(−0.5t) for various values of the coefficients of the characteristic polynomial.

Fig 5: The log-likelihood (5.3) as a function of branch length t for various values of the coefficients of the characteristic polynomial.
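The claim about the stationary points can be reproduced directly from (5.2). We have ℓ = log(λ₁λ₂), and λ₁λ₂ > 0 on [0, 1]. The stationary points of ℓ are therefore the real roots in (0, 1) of the derivative of the degree-four polynomial λ₁λ₂, each corresponding to t = −2 log x. The following NumPy sketch (our own illustration, using `numpy.polynomial.Polynomial`) builds λ₁ and λ₂ from b¹ and b², finds these roots, and classifies them by the sign of the second derivative.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Transition polynomials (5.2) in x = exp(-0.5 t); P4 = P3.
P1 = Polynomial([0.25, 0.25, 0.5])
P2 = Polynomial([0.25, 0.25, -0.5])
P3 = Polynomial([0.25, -0.25])

b1 = [0.24977275, 0.34067358, 0.2051904, 0.20436327]
b2 = [0.25, 0.16087344, 0.29328435, 0.29584221]


def characteristic(b):
    """lambda_s(x) = sum_i b^s(i) P_i(x), with P4 = P3."""
    return b[0] * P1 + b[1] * P2 + (b[2] + b[3]) * P3


f = characteristic(b1) * characteristic(b2)  # the likelihood exp(l) as a polynomial in x

# Real roots of f' in (0, 1) are the stationary points of l = log f.
roots = f.deriv().roots()
crit = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
crit = crit[(crit > 0) & (crit < 1)]
curvature = f.deriv(2)(crit)

for x, c in zip(crit, curvature):
    kind = "local max" if c < 0 else "local min"
    print(f"x = {x:.4f}  t = {-2 * np.log(x):.4f}  {kind}")
```

With the coefficients b¹ and b² above, the derivative changes sign once in each of the x-intervals (0, 0.5), (0.5, 0.9) and (0.9, 1). These sign changes give, in order, a local maximum, a local minimum and a local maximum.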

We note that these examples rely on an assumption: for any positive coefficients $b_{i}^{s}$ at the inner node, there is a tree under the Kimura 2-parameter model that produces exactly these coefficients. This assumption is confirmed by the following result, proven in the Appendix.

Theorem 5.1. For every set of positive coefficients $η_{i}^{s}$ , there exist a phylogenetic tree τ and S labelings ψ₁, ψ₂, …, ψ_S of the taxa such that for some edge e in τ, the one-dimensional likelihood function on e under the Kimura 2-parameter model with κ = 3 satisfies

ℓ (τ, t) = C_{0} + \sum_{s = 1}^{S} \log (\sum_{i} η_{i}^{s} P_{i} (t))

where P_i(t) is the probability of transition from state A to state i and C₀ is a constant. Moreover, such a tree τ can be constructed with at most 64S + 1 edges.

In our examples, the upper bound on the number of edges to produce the given frequency pattern is 64 × 2 + 1 = 129 edges.

Remark 5.1. The algorithm given by Theorem 4.4 for constructing a tree with a given frequency pattern always outputs a star tree (a tree without internal edges). Nevertheless, we note the following:

any star tree can be approximated with arbitrary precision by resolved trees (for instance, by inserting internal edges of arbitrarily small length);
a polynomial of degree four has at most three stationary points, hence small perturbations of the coefficients of a degree-four polynomial with three distinct stationary points do not change the number of stationary points.

We deduce that there are resolved trees for which the one-dimensional likelihood function on certain edges has multiple maxima.

A resolved tree with n taxa has 2n − 3 edges, and the star tree constructed above has 129 edges and hence 129 taxa. An upper bound on the number of edges of a resolved tree for which the one-dimensional likelihood function on certain edges has multiple maxima is therefore 2 × 129 − 3 = 255 edges.

6. Universality and complexity of the Kimura 2-parameter model.

As we discussed earlier in the paper, the main idea behind the results in Section 3 and Section 4 is to use the Fundamental Theorem of Algebra to decompose a complicated evolutionary model into smaller modules, each of which can be approximated either by a “linear” model or by a “quadratic” model. This paradigm focuses on the branch lengths of the tree and is independent of the state space Ω of the evolutionary model. It therefore provides a way to represent advanced evolutionary models (amino acid models, codon models) by simple ones (nucleotide models).

This motivates the problem of constructing a complete characterization of one-dimensional likelihood functions. The main question is: does there exist an evolutionary model that can represent all one-dimensional likelihood functions of any time-reversible evolutionary model?

Such a model M, if it exists, will be referred to as a universal model. It must satisfy the following two conditions:

All one-dimensional likelihood functions under any reversible evolutionary model can be written as a product of polynomials, each of which is a positive linear combination of the transition polynomials of M.
For every set of positive coefficients $b_{i j}^{s}$ , there exists a phylogenetic tree τ and S labelings ψ₁, ψ₂, …, ψ_S of the taxa such that for some edge e in τ, the one-dimensional likelihood function on e under M satisfies
$ℓ (τ, t) = C_{1} + \sum_{s = 1}^{S} \log (\sum_{i j} b_{i j}^{s} P_{i j} (t))$
for some constant C₁.

In this section, we will prove that the Kimura 2-parameter model with κ = 3 is, in fact, a universal model. The key components of the proof are Theorem 5.1, the Fundamental Theorem of Algebra and the fact that the transition polynomials of the Kimura 2-parameter model effectively span a large class of linear and quadratic polynomials.

6.1. Universality of the Kimura 2-parameter model.

We first make the following observation, proven in the Appendix.

Lemma 6.1. If f is a real-coefficient polynomial that satisfies

f is positive on [0, 1],
deg f = 1 or f is a quadratic polynomial with no real root,

then f can be written as a positive linear combination of the transition polynomials of the Kimura 2-parameter model if and only if

deg f = 1 and f(−1) > 0,

or
deg f = 2 and f has no root inside the set
$B = \{z \in \mathbb{C} : |z + 1| \leq 1 \text{ or } |z - 1| \leq \sqrt{2}\}$. (6.1)
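A direct way to test the condition in Lemma 6.1 is to solve for the weights. Matching the coefficients of A + Bx + Cx² against (5.2) (our own derivation, not taken from the proof in the Appendix) gives f = (A + B + C)P₁ + (A + B − C)P₂ + 2(A − B)P₃. Because P₄ = P₃, the last weight may be split between P₃ and P₄. For a quadratic C(x − z)(x − z̄) with C > 0, these weights equal C|z − 1|², C(|z − 1|² − 2) and 2C(|z + 1|² − 1). They are therefore all positive exactly when z ∉ B. For a linear f, they reduce to f(1), f(1) and 2f(−1). The Python sketch below implements this check; the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def k2p_weights(A, B, C=0.0):
    """Weights (p, q, r) with A + Bx + Cx^2 = p*P1(x) + q*P2(x) + r*P3(x).

    Obtained by matching coefficients with (5.2); since P4 = P3, the weight r
    may be split between P3 and P4.
    """
    return np.array([A + B + C, A + B - C, 2.0 * (A - B)])


def in_B(z):
    """Membership of a complex number z in the set B of (6.1)."""
    return abs(z + 1) <= 1 or abs(z - 1) <= SQRT2


def representable(A, B, C=0.0):
    """True if A + Bx + Cx^2 is a positive combination of K2P transition polynomials."""
    return bool(np.all(k2p_weights(A, B, C) > 0))


print(k2p_weights(0.25, 0.75))                # [ 1.  1. -1.]  (J(-1) < 0)
print(k2p_weights(4.0, 0.0, 1.0), in_B(2j))   # [5. 3. 8.] False  (roots +-2i)
```

For example, the Jukes-Cantor polynomial 0.25 + 0.75x receives a negative weight on P₃. This is consistent with the remark following Corollary 6.1.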

This enables us to establish the universality of the Kimura 2-parameter model.

Theorem 6.1 (Universality). If L is a one-dimensional phylogenetic likelihood function of a tree under an arbitrary time-reversible model that satisfies Assumption 2.1, then up to translation and rescaling, L is equal to a one-dimensional likelihood under the Kimura 2-parameter model.

That is, there exist c₁, c₂, c₃ > 0 such that

L (t) = c_{2} L_{K2P} (τ, ψ, c_{3} t) - c_{1}, \forall t \in [0, \infty),

where L_K2P(τ, ψ, ·) is the one-dimensional likelihood function under the Kimura 2-parameter model on some edge of a tree τ with labeling ψ.

Proof. Assumption 2.1 implies that the function

ℒ (x) ≔ L (- \frac{1}{γ} \log x)

is a polynomial in x for some γ > 0, which we also regard as a polynomial on the complex plane. Since ℒ is continuous and the set B defined by (6.1) is compact, if we define

c_{1} = 1 + sup_{z \in B} ∣ ℒ (z) ∣,

then, by the triangle inequality, |ℒ(z) + c₁| ≥ c₁ − |ℒ(z)| ≥ 1 for all z ∈ B. Hence the polynomial ℒ(x) + c₁ has no root in B.

By the Fundamental Theorem of Algebra, the polynomial ℒ(x) + c₁ can be written as

ℒ (x) + c_{1} = \prod_{s = 1}^{S} g_{s} (x),

where each g_s is either a quadratic polynomial with no real root or a polynomial of degree 1. Since ℒ(x) + c₁ ≥ 1 on [0, 1] ⊂ B, no g_s vanishes on [0, 1]. After distributing the leading coefficient and the signs among the factors, we may therefore assume that each g_s is positive on [0, 1]. Moreover, each g_s has no root in B (which also implies g_s(−1) > 0 if deg g_s = 1). Lemma 6.1 implies that each g_s can be written as a positive linear combination of the transition polynomials of the Kimura 2-parameter model

g_{s} (x) = \sum_{i j} b_{i j}^{s} P_{i j} (x) .

We deduce that

\log (ℒ (x) + c_{1}) = \sum_{s = 1}^{S} \log (\sum_{i j} b_{i j}^{s} P_{i j} (x)) .

We recall that the Kimura 2-parameter model has symmetries such that any transition probability P_ij(t) is in fact equal to P_Al(t) = P_l(t) for some l. Therefore, by grouping

η_{l}^{s} ≔ \sum_{i, j : P_{i j} = P_{A l}} b_{i j}^{s},

we have

\log (ℒ (x) + c_{1}) = \sum_{s = 1}^{S} \log (\sum_{l} η_{l}^{s} P_{l} (x)) .

Also, the characteristic polynomials of the Kimura 2-parameter model (5.1) with κ = 3 are expressed in terms of x = exp(−0.5t). Hence the one-dimensional likelihood L_K2P(τ, ψ, t) satisfies

L_{K2P} (τ, ψ, t) = ℒ_{K2P} (τ, ψ, exp (- 0.5 t)) .

Now, Theorem 5.1 guarantees that there exist a tree τ and a labeling ψ under the Kimura 2-parameter model such that, on some edge of τ,

\log ℒ_{K2P} (τ, ψ, x) = - \log c_{2} + \sum_{s = 1}^{S} \log (\sum_{l} η_{l}^{s} P_{l} (x))

for some positive constant c₂.

In other words, we have

ℒ (x) = c_{2} ℒ_{K2P} (τ, ψ, x) - c_{1}, \forall x \in (0, 1] .

Hence,

L (- \frac{1}{γ} \log x) = c_{2} L_{K2P} (τ, ψ, - 2 \log x) - c_{1}, \forall x \in (0, 1] .

Substituting x = exp(−γt), so that −2 log x = 2γt, we obtain

L (t) = c_{2} L_{K2P} (τ, ψ, c_{3} t) - c_{1}, c_{3} = 2γ, \forall t \in [0, \infty) .

That is, up to translation and rescaling, L is equal to a one-dimensional phylogenetic likelihood function under the Kimura 2-parameter model.
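The proof is constructive up to the choice of c₁, and its steps can be traced numerically. The sketch below is a hypothetical illustration, not the authors' implementation. It proceeds in three steps:

1. It approximates c₁ = 1 + sup_{z∈B} |ℒ(z)| on a grid; the extra 1 absorbs the small grid error in practice.
2. It splits ℒ(x) + c₁ into real factors of degree 1 or 2 that are positive on [0, 1].
3. It checks that each factor has positive K2P weights. The weights are computed by matching coefficients against (5.2), as in f = (A + B + C)P₁ + (A + B − C)P₂ + 2(A − B)P₃.

The input ℒ(x) = (0.25 + 0.75x)³, a product of Jukes-Cantor transition polynomials, is an arbitrary example.

```python
import numpy as np
from numpy.polynomial import Polynomial

SQRT2 = np.sqrt(2.0)


def k2p_weights(f):
    """(p, q, r) with f = p*P1 + q*P2 + r*P3 for a polynomial f of degree <= 2."""
    A, B, C = np.pad(f.coef, (0, 3 - len(f.coef)))
    return np.array([A + B + C, A + B - C, 2.0 * (A - B)])


def shift_constant(L, n=801):
    """Grid approximation of c1 = 1 + sup over B of |L(z)|."""
    re = np.linspace(-2.0, 1.0 + SQRT2, n)  # B lies inside this box
    im = np.linspace(-SQRT2, SQRT2, n)
    Z = re[None, :] + 1j * im[:, None]
    mask = (np.abs(Z + 1) <= 1) | (np.abs(Z - 1) <= SQRT2)
    return 1.0 + np.abs(L(Z[mask])).max()


def real_factors(g, tol=1e-9):
    """Split g into real factors of degree 1 or 2 that are positive on [0, 1]."""
    factors = []
    for z in g.roots():
        if abs(z.imag) <= tol:
            factors.append(Polynomial([-z.real, 1.0]))
        elif z.imag > 0:  # one factor per complex-conjugate pair
            factors.append(Polynomial([abs(z) ** 2, -2.0 * z.real, 1.0]))
    factors = [fac if fac(0.5) > 0 else -fac for fac in factors]
    # Put the remaining constant into the first factor; it is positive when g > 0 on [0, 1].
    scale = float(g.coef[-1] / np.prod([fac.coef[-1] for fac in factors]))
    factors[0] = scale * factors[0]
    return factors


L = Polynomial([0.25, 0.75]) ** 3  # hypothetical example: product of Jukes-Cantor polynomials
c1 = shift_constant(L)
factors = real_factors(L + c1)
for fac in factors:
    print(np.round(fac.coef, 4), np.round(k2p_weights(fac), 4))
```

For this input, ℒ(x) + c₁ splits into one linear factor and one quadratic factor with complex roots. Both factors have strictly positive weights, as Lemma 6.1 predicts.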

Since the set of rate matrices for a given evolutionary model that satisfy Assumption 2.1 is dense in the set of all possible rate matrices under the same evolutionary model (Remark 2.1), we also have the following corollary.

Corollary 6.1. Any one-dimensional phylogenetic likelihood function under an arbitrary time-reversible evolutionary model can be uniformly approximated with arbitrary precision by (rescaled and translated) one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model.

We also note that the rescaling and translation constants in the statement of Theorem 6.1 cannot be removed: Lemma 6.1 indicates that some polynomial functions cannot be represented exactly as Kimura 2-parameter likelihood functions. For example, one of the transition polynomials of the Jukes-Cantor model is

J (x) = 0.25 + 0.75 x

which has J(−1) = −0.5 < 0. For this reason, some likelihood functions under the Jukes-Cantor model may not be represented exactly by the Kimura 2-parameter model without adjusting by an additive constant.

6.2. Complexity of the Kimura 2-parameter model.

The universality results of the previous section can easily be adapted to analyze the set of all one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model. The following complexity results imply that one-dimensional likelihood functions under advanced evolutionary models can be more complex than is typically assumed by phylogenetic inference algorithms.

First, it is straightforward to check that Theorem 6.1 still holds (with the same proof) if we replace the one-dimensional phylogenetic likelihood function L by an arbitrary polynomial P in x = exp(−γt) for some γ > 0 and drop Assumption 2.1. Moreover, suppose P has degree n. Then P + c₁ factors into at most n polynomials g_s in that proof. By Theorem 5.1, P can therefore be represented by a one-dimensional likelihood function of a tree with at most 64n + 1 edges, with respect to a labeling of its taxa with at most n sites.

Corollary 6.2. Given an arbitrary polynomial P of degree n and γ > 0, then up to translation and rescaling, P(exp(−γt)) is equal to a one-dimensional likelihood under the Kimura 2-parameter model on a phylogeny with at most 64n + 1 edges.

This corollary indicates that, by increasing the number of sites and the size of the tree, we can obtain likelihood functions shaped like an arbitrary polynomial on [0, 1]. For example, take an arbitrary finite sequence t₁, t₂, …, t_k ∈ (0, ∞). We can construct a polynomial P whose local maxima are located precisely at x_i = exp(−0.5t_i), i = 1, …, k. Applying Corollary 6.2 to P gives the following result.

Corollary 6.3. Given an arbitrary finite sequence t₁, t₂, …, t_k ∈ (0, ∞), there exists a phylogenetic tree τ and some labeling of its taxa such that for some edge of the tree, the one-dimensional likelihood function under the Kimura 2-parameter model peaks precisely at t₁, t₂, …, t_k.

Furthermore, positive rescaling and translation do not change the relative order of the likelihood values at the stationary points. Hence, by choosing P appropriately, we can make any of the t_i's (or all of them) global maxima of the function.
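The text does not specify how to build the polynomial behind Corollary 6.3, so the following is one hypothetical construction that prescribes its derivative. Let P′ have simple roots at x_i = exp(−0.5t_i) and at the midpoints between consecutive x_i, and give it a negative leading coefficient. The signs of P′ then alternate so that the x_i become local maxima and the midpoints local minima. Integrating and adding a constant keeps P positive on [0, 1] without moving its stationary points. The NumPy sketch below does this and recovers the peaks.

```python
import numpy as np
from numpy.polynomial import Polynomial


def polynomial_with_peaks(ts, gamma=0.5):
    """Hypothetical construction of P with local maxima at x_i = exp(-gamma * t_i).

    P' gets simple roots at the x_i and at the midpoints between consecutive x_i,
    and a negative leading coefficient, so P' changes sign from + to - at each x_i.
    """
    xs = np.sort(np.exp(-gamma * np.asarray(ts, dtype=float)))
    mids = 0.5 * (xs[:-1] + xs[1:])
    P = (-Polynomial.fromroots(np.concatenate([xs, mids]))).integ()
    grid = np.linspace(0.0, 1.0, 2001)
    # Translation does not move stationary points; shift so that P >= 1 on the grid.
    return P + (1.0 - P(grid).min()), xs


gamma = 0.5
P, xs = polynomial_with_peaks([0.5, 2.0, 4.0], gamma)
crit = np.sort(P.deriv().roots().real)
maxima = crit[P.deriv(2)(crit) < 0]
print(maxima, -np.log(maxima) / gamma)  # x-locations and recovered t-locations
```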

Finally, we can replace the phylogenetic likelihood functions in Corollary 6.1 by an arbitrary continuous function f with finite limit to obtain the following density result.

Corollary 6.4. The space of all rescaled and translated one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model is dense in the space of all non-negative continuous functions on [0, ∞) with finite limits.

Proof. Let f be a continuous function on [0, ∞) with a finite limit at infinity. Define

g (x) = f (- \log (x)) \forall x \in (0, 1],

then g(x) can be extended continuously to [0, 1] by setting g(0) = lim_{t→∞} f(t). By Weierstrass’s theorem [13], there exists a sequence of positive polynomials {P_n} such that

sup_{x \in [0, 1]} ∣ P_{n} (x) - g (x) ∣ \to 0 .

This implies that

sup_{t \in [0, \infty)} ∣ P_{n} (exp (- t)) - f (t) ∣ \to 0 .

On the other hand, Corollary 6.2 implies that P_n(exp(−t)) is, up to rescaling and translation, a one-dimensional likelihood under the Kimura 2-parameter model. This completes the proof.
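The proof needs only some polynomial approximation of g. Bernstein polynomials are one explicit choice (the same family mentioned in Section 7), and they are positive whenever g is. The sketch below assumes NumPy and a hypothetical target f(t) = 1 + exp(−t)cos t. It evaluates the degree-n Bernstein polynomial of g(x) = f(−log x) by de Casteljau's algorithm and reports the sup-norm error over a grid of t values, which shrinks as n grows.

```python
import numpy as np


def f(t):
    """Hypothetical target: continuous and non-negative on [0, inf) with limit 1."""
    return 1.0 + np.exp(-t) * np.cos(t)


def g(x):
    """g(x) = f(-log x) on (0, 1], extended continuously by g(0) = 1."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)
    pos = x > 0
    out[pos] = f(-np.log(x[pos]))
    return out


def bernstein(n, x):
    """Degree-n Bernstein polynomial of g evaluated at the points x (de Casteljau)."""
    b = np.repeat(g(np.arange(n + 1) / n)[:, None], len(x), axis=1)
    for _ in range(n):
        b = (1.0 - x) * b[:-1] + x * b[1:]
    return b[0]


t = np.linspace(0.0, 40.0, 400)
x = np.exp(-t)
errors = {n: np.max(np.abs(bernstein(n, x) - f(t))) for n in (20, 80, 320)}
print(errors)  # sup-norm error over the grid for each degree n
```

Each such polynomial, evaluated at exp(−t), is then (up to rescaling and translation) a one-dimensional Kimura 2-parameter likelihood, as the proof states.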

7. Non-reversible Markov models of evolution.

As we mentioned in Section 2, the analyses in the previous sections can be described in the more general framework of probabilistic inference for graphical models. In this framework, the likelihood function can be defined as the marginal distribution on the leaf nodes of a joint probability distribution. This joint distribution factorizes over the edges of the tree via the non-negative kernels k_e(i, j, t) (also referred to as potential functions). The one-dimensional phylogenetic likelihood functions can be obtained by fixing all but one branch length.

In this section, we briefly examine the extent to which our analyses of one-dimensional likelihood functions remain valid in this more general framework. As we illustrate below, the results in this section do not assume reversibility of the kernels and thus apply to non-reversible models of evolution. However, we need to modify our assumptions accordingly.

Several parts of our analysis rely on the core assumption that the kernel functions are polynomials in x = exp(−γt) for some γ > 0. We therefore require the following assumption, which is the analogue of Assumption 2.1 in this more general setting.

Assumption 7.1 (Polynomial representation). There exist a constant γ_e > 0 and polynomials $p_{e}^{i j}(x)$ such that

k_{e} (i, j, t) = p_{e}^{i j} (exp (- γ_{e} t)) \forall t,

for all i, j ∈ Ω and e ∈ E(T).

This assumption implies that the limit of k_e(i, j, t) for large t exists, that is, the kernels are stationary. We also note that using the density results for Bernstein’s polynomial approximation (see, for example, [13]), any non-negative continuous function on [0, ∞) with finite limit can be approximated with arbitrary precision by some kernels that satisfy Assumption 7.1.

Under this assumption, the characteristic polynomials λ_s(x) can be defined in a similar manner and the result in Section 3 (Theorem 3.1) is still valid.

Theorem 7.1. Under Assumption 7.1, if for every site index s the polynomial λ_s has only real roots, then the one-dimensional likelihood function has at most one stationary point. Moreover, if such a point exists, it is also a global maximum.

While Section 4 is specifically developed to analyze the Kimura 2-parameter model, the notion of logarithmic relative frequency patterns extends easily. We replace the transition probabilities across each edge e by the kernel k_e and take the distribution at the root to be uniform. Building upon this concept, we can study the algebraic structure of the space of all frequency patterns in this more general setting. However, the proofs of Theorems 4.2 and 4.3 are tailor-made for the Kimura 2-parameter model and do not extend easily to the general case. We leave these extensions as open problems.

We do obtain the following partial extension of Theorem 4.1 in the more general setting.

Theorem 7.2. The following properties hold:

(1) (G, +) is a subgroup of ( $R^{r \times S}$ , +).
(2) Assume
$\lim_{t \to \infty} k_{e} (i, j, t) = p_{e}^{i j} (0) = \frac{1}{r}$ and $\lim_{t \to 0} k_{e} (i, j, t) = p_{e}^{i j} (1) = δ_{i j}$
for all i, j ∈ Ω and e ∈ E(T), where δ_ij = 1 if i = j and δ_ij = 0 otherwise. Then G is a linear subspace of $R^{r \times S}$ .

We recall that frequency patterns can be defined without the polynomial representation of the likelihood. Thus, Assumption 7.1 is not required in Theorem 7.2. We further note that part (1) of the theorem does not require the kernels to be stationary; the only condition required is that the distribution at the root is uniform. To prove that G is path-connected, however, we need the conditions on the behavior of the kernels at 0 and ∞.

Polynomial representation of likelihood functions (Assumption 7.1) is needed to extend the results of Section 6. By the same arguments as in the proof of Theorem 6.1, we obtain an equivalent result.

Theorem 7.3. If L is a one-dimensional phylogenetic likelihood function of a tree under a model whose kernels satisfy Assumption 7.1, then up to translation and rescaling, L is equal to a one-dimensional likelihood under the Kimura 2-parameter model.

8. Conclusions and discussion.

In this work, we investigate the problem of characterizing the shape of one-dimensional phylogenetic likelihood functions. Our results classify all evolutionary models into two categories:

For binary, Jukes-Cantor and Felsenstein 1981 models: the one-dimensional likelihood function has at most one stationary point.
For the Kimura 2-parameter model and more advanced evolutionary models: the shape of the one-dimensional likelihood function can be much more complex. In fact, the space of all rescaled and translated one-dimensional phylogenetic likelihood functions under such a model is dense in the set of all non-negative continuous functions on [0, ∞) with finite limits.

Despite the complexity of the one-dimensional likelihood functions under advanced evolutionary models, we prove that all one-dimensional phylogenetic likelihood functions are essentially (up to translation and rescaling) Kimura 2-parameter likelihood functions. This result establishes a strong foundation for the use of the Kimura 2-parameter model as the building block of all evolutionary models.

Our results are based on two novel techniques. First, we introduce characteristic polynomial representations of one-dimensional phylogenetic likelihood functions. Together with the Fundamental Theorem of Algebra, they let us decompose any evolutionary model into smaller modules, each of which resembles the Kimura 2-parameter model. Second, we introduce the new concept of logarithmic relative frequency patterns and analyze algebraic structures on the space of such patterns. These structures open a new way to explore the space of all possible likelihood functions. Moreover, by analyzing these structures, we are able to tackle the inverse problem of constructing a phylogenetic tree that has a given frequency pattern at the root. This enables us to construct phylogenetic trees that approximate any given likelihood function with arbitrary precision.

There are several avenues for improvement. First, while we know that the shape of one-dimensional likelihood functions can be very complex, it is not clear how frequently multimodality is encountered in practice and to what degree it affects the accuracy of phylogenetic algorithms. Since the space of high-degree polynomials is dominated by multimodal functions, one might expect multimodality to become more likely as the number of sites and the size of the tree increase. However, the space of phylogenies is known to possess considerable hidden structure, which sometimes leads to counter-intuitive properties. Evaluating this hypothesis therefore requires a careful analysis of the space of all rescaled and translated one-dimensional phylogenetic likelihood functions under the Kimura 2-parameter model. Second, although the focus of this work is on one-dimensional phylogenetic likelihood functions, it is possible to use the framework we propose to study full phylogenetic likelihood functions. This will be a subject of future work.

Fig 1: [Left:] A general Markov model of DNA evolution along a tree edge. [Right:] An extension a of the labeling ψ_i (corresponding to site i of the sequences) of the leaves of a simple tree τ to its inner nodes.

9. Acknowledgements.

We are grateful to Connor McCoy and Brian Claywell for their work on surrogate functions for likelihood computation, which motivated this research. This work is supported by DMS-1223057 and CISE-1564137 from the National Science Foundation and U54GM111274 from the National Institutes of Health.

10. Appendix.

Proof of Remark 2.1. If we denote the entries of the diagonalizing matrices M and N of Q by m_ij and n_ij, respectively, then

Q_{i j} = - \sum_{k} m_{i k} r_{k} n_{k j} ,

where −r_k are the eigenvalues of Q. (The eigenvalues are known to be non-positive, so r_k are non-negative.)

Since the set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}^{+}$, we can find $r (k, l) \in \mathbb{Q}^{+}$ such that for all k, r(k, l) → r_k as l approaches infinity. If we define

Q_{i j}^{l} = - \sum_{k} m_{i k} r (k, l) n_{k j},

then Q^l → Q element-wise as l approaches infinity. The r(k, l) are all rational and the matrices have fixed finite dimension. We can therefore find γ_l > 0 and $d (k, l) \in \mathbb{N}$ such that r(k, l) = d(k, l) γ_l. Consequently, the (i, j) entry of exp(tQ^l), namely Σ_k m_ik exp(−d(k, l)γ_l t) n_kj, is a polynomial in exp(−γ_l t).
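The last step is easy to make concrete. The Python sketch below assumes the eigenvalue magnitudes r_k are given as floats and proceeds in three steps:

1. It uses `fractions.Fraction.limit_denominator` to pick rationals r(k, l) with bounded denominators.
2. It takes γ_l to be the reciprocal of the least common multiple of those denominators.
3. It returns the integers d(k, l) = r(k, l)/γ_l, so that exp(−r(k, l)t) = x^{d(k, l)} with x = exp(−γ_l t).

The bound `max_den` plays the role of l, and the sample rates are hypothetical.

```python
from fractions import Fraction
from math import lcm


def polynomial_exponents(rates, max_den):
    """Rational approximations r(k, l) of the r_k and integers d(k, l) with r(k, l) = d(k, l) * gamma_l.

    `max_den` bounds the denominators and plays the role of the index l.
    """
    approx = [Fraction(r).limit_denominator(max_den) for r in rates]
    gamma = Fraction(1, lcm(*(q.denominator for q in approx)))
    degrees = [int(q / gamma) for q in approx]  # exact: gamma divides every approx
    return approx, gamma, degrees


rates = [0.0, 0.5, 2 ** 0.5, 3.14159]  # hypothetical eigenvalue magnitudes r_k
approx, gamma, degrees = polynomial_exponents(rates, max_den=100)
print(approx, gamma, degrees)
```

Increasing `max_den` improves the approximation (each rational is within 1/(2·max_den) of its target). The price is larger degrees d(k, l), and hence higher-degree polynomial representations.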

Proof of Theorem 4.1. We define the equivalence relation ~ on $R^{4 \times S}$ as follows: u ~ v if and only if there exists a vector of real constants c = (c₁, …, c_S) such that for all i = 1, 2, 3, 4 and s = 1, …, S, we have

u_{i, s} = v_{i, s} + c_{s} .

Define

[h (τ, ψ)]_{i, s} = \log \sum_{a \in ℒ_{i, s}} \prod_{(u, v) \in E (τ)} P_{a_{u} a_{v}}^{u v} (t_{u v})

for i ∈ Ω and s = 1, …, S, where ℒ_{i,s} is the set of all extensions a of ψ_s to all the nodes of τ such that a(ρ) = i.

Then for all τ, ψ, c, we have ϕ(τ, ψ, c) ~ h(τ, ψ).

(Addition): Consider any two elements x₁, x₂ ∈ G. By the definition of G and since the stationary frequency of the evolutionary model is the same for every state, there exist trees τ₁, τ₂ with n₁, n₂ taxa and labelings ψ₁, ψ₂ such that
$x_{i} \sim h (τ_{i}, ψ_{i}), i = 1, 2 .$

If we construct a new tree τ from τ₁ and τ₂ by gluing the roots ρ₁, ρ₂ and label the taxa of τ corresponding to ψ₁, ψ₂, then we have
$[h (τ, ψ)]_{i, s} = \log \sum_{a \in ℒ_{i, s}} \prod_{(u, v) \in E (τ_{1})} P_{a_{u}^{1} a_{v}^{1}}^{u v} (t_{u v}) \prod_{(u, v) \in E (τ_{2})} P_{a_{u}^{2} a_{v}^{2}}^{u v} (t_{u v})$
where each term a in the sum corresponds uniquely to a pair of extensions (a¹, a²) of $ψ_{1}^{s}$, $ψ_{2}^{s}$ to the internal nodes of τ₁, τ₂, respectively, such that a¹(ρ) = a²(ρ) = i. The sum over such pairs therefore factors into the product of the two separate sums.

Therefore,
$[h (τ, ψ)]_{i, s} = \log \sum_{a^{1}} \prod_{(u, v) \in E (τ_{1})} P_{a_{u}^{1} a_{v}^{1}}^{u v} (t_{u v}) + \log \sum_{a^{2}} \prod_{(u, v) \in E (τ_{2})} P_{a_{u}^{2} a_{v}^{2}}^{u v} (t_{u v}) = [h (τ_{1}, ψ_{1})]_{i, s} + [h (τ_{2}, ψ_{2})]_{i, s}$
for all i ∈ Ω and s = 1, …, S.

Therefore
$h (τ, ψ) \sim h (τ_{1}, ψ_{1}) + h (τ_{2}, ψ_{2})$
and
$x_{1} + x_{2} \sim h (τ, ψ) \in G$
which implies that G is closed under addition.
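The additivity argument can be illustrated with Felsenstein's pruning algorithm, which computes the sum over extensions in the definition of h(τ, ψ) recursively. The Python sketch below is our own illustration. It uses a K2P κ = 3 transition matrix with states ordered A, G, T, C, and a hypothetical tree encoding in which a node is either a leaf name or a list of (child, branch length) pairs. Gluing two roots amounts to concatenating their child lists, and the resulting pattern is the sum of the two patterns.

```python
import numpy as np

STATES = "AGTC"


def k2p_matrix(t):
    """K2P transition matrix with kappa = 3 (rows: parent state, columns: child state)."""
    x = np.exp(-0.5 * t)
    p1, p2, p3 = 0.25 + 0.25 * x + 0.5 * x**2, 0.25 + 0.25 * x - 0.5 * x**2, 0.25 - 0.25 * x
    return np.array([[p1, p2, p3, p3], [p2, p1, p3, p3], [p3, p3, p1, p2], [p3, p3, p2, p1]])


def pruning(node, site):
    """Vector over the state i of `node` of the sum over extensions of products of P's.

    A node is a leaf name (str) or a list of (child, branch_length) pairs;
    `site` maps leaf names to observed states for one site.
    """
    if isinstance(node, str):
        return np.eye(4)[STATES.index(site[node])]
    out = np.ones(4)
    for child, t in node:
        out *= k2p_matrix(t) @ pruning(child, site)
    return out


def h(tree, labelings):
    """[h(tau, psi)]_{i, s}: one column per site."""
    return np.column_stack([np.log(pruning(tree, site)) for site in labelings])


# Two hypothetical rooted trees with disjoint taxa and two-site labelings.
tau1 = [("a", 0.3), ([("b", 0.2), ("c", 0.9)], 0.5)]
tau2 = [("d", 1.1), ("e", 0.4)]
psi1 = [{"a": "A", "b": "G", "c": "G"}, {"a": "T", "b": "T", "c": "C"}]
psi2 = [{"d": "C", "e": "A"}, {"d": "G", "e": "G"}]

glued = tau1 + tau2  # gluing the roots concatenates the lists of root children
psi = [{**s1, **s2} for s1, s2 in zip(psi1, psi2)]
print(np.allclose(h(glued, psi), h(tau1, psi1) + h(tau2, psi2)))  # True
```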
(Inverse): Consider any element x ∈ G and its corresponding representative tree τ and labeling ψ. For any permutation σ of the states, we define the labeling ψ_σ as
$ψ_{σ} (ω) = σ (ψ (ω))$
for every taxon ω of τ. For example, if σ is the permutation (A G T C) in cycle notation, then ψ_σ is obtained from ψ by replacing A by G, G by T, T by C and C by A.

Now let σ₀ be a cyclic permutation of all r states of Ω (an r-cycle), and create r identical copies τ₁, τ₂, …, τ_r of the tree τ with labelings ψ_σ₀, $ψ_{σ_{0}^{2}}$ , …, $ψ_{σ_{0}^{r}}$. Glue the roots of all the trees together, with taxon labeling γ corresponding to the labelings of τ₁, τ₂, …, τ_r. Then, because of symmetry, the frequency pattern f at the root of the newly created tree μ is the same for every state, i.e., f ~ 0. We deduce that 0 ∈ G and that for every x ∈ G, there exists y ∈ G such that x + y = 0.

This property and the fact that G is closed under addition prove that (G, +) is a subgroup of ( $R^{r \times S}$ , +).
(Connectedness): Consider any two elements x₁, x₂ ∈ G and their corresponding trees τ₁, τ₂, labelings ψ₁, ψ₂ and vectors of real constants c₁, c₂. For any α ∈ (0, 1), we create a new tree τ(α) by adding a new node ρ and joining it to ρ₁ and ρ₂ by new edges of lengths $t_{1} = \tan (\frac{π}{2} α)$ and t₂ = 1/t₁, respectively. We make ρ the root of τ(α) and label the taxa of τ(α) according to ψ₁, ψ₂.

Now we note that when α → 0, we have
$h (τ (α), ψ) \to h (τ_{1}, ψ_{1}) + \log \frac{1}{r}$
since the contribution of τ₂ becomes stationary (the stationary frequency is 1/r because of the model’s symmetry). Similarly, when α → 1, we have
$h (τ (α), ψ) \to h (τ_{2}, ψ_{2}) + \log \frac{1}{r} .$

Therefore, the function g(α) = ϕ(τ(α),ψ) can be extended continuously to the closed interval [0, 1]. By changing c continuously from c₁ to log (1/r), varying α continuously from 0 to 1, then changing c continuously from log (1/r) to c₂, we can make a path in G that connects x₁ and x₂.
Since any path-connected subgroup of $R^{n}$ is a linear subspace [12], so is G.

Proof of Theorem 4.2. Denote by ℋ the set of all rooted trees with one edge (which have varying branch lengths) and

H = {ϕ (τ, ψ, c) : τ \in ℋ, ψ = (ψ_{1}, ψ_{2}, \dots, ψ_{S}) \in Ω^{S}, c \in R^{S}} .

Note that in the context of this paper, trees with different branch lengths (or in other words, different values of x) are considered as different trees. Thus the set H defined here is non-trivial.

We have

[h (τ, ψ)]_{j, s} = \log P_{ψ_{s} j} (x)

(10.1)

where x = exp(−0.5t), j = A, G, T, C, and t is the length of the unique edge of τ.

Let x₁ = 1/4, x₂ = 1/2, x₃ = 3/4. We will prove, by induction on S, that H contains 4 × S independent frequency patterns.

For S = 1, consider the 4 different patterns (A), (G), (T), (C) at the only leaf and the 3 values of x (corresponding to different branch lengths) described above. Together they give a set of 4 × 3 = 12 different pairs (τ, ψ). A quick check by computer shows that the frequency patterns generated by these pairs span the whole vector space $R^{4 \times 1}$. We can achieve a similar result for S = 2 with the patterns (A, G), (G, T), (T, C), (C, A).
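The computer check for S = 1 is short. The sketch below assumes NumPy and the K2P κ = 3 transition polynomials (5.2) with states ordered A, G, T, C. It stacks the 12 patterns [log P_{lA}(x), log P_{lG}(x), log P_{lT}(x), log P_{lC}(x)] of (10.1), for l ∈ {A, G, T, C} and x ∈ {1/4, 1/2, 3/4}, and computes their rank. The helper name is illustrative.

```python
import numpy as np

STATES = "AGTC"


def k2p_matrix_x(x):
    """K2P (kappa = 3) transition matrix in terms of x = exp(-0.5 t), from (5.2)."""
    p1, p2, p3 = 0.25 + 0.25 * x + 0.5 * x**2, 0.25 + 0.25 * x - 0.5 * x**2, 0.25 - 0.25 * x
    return np.array([[p1, p2, p3, p3], [p2, p1, p3, p3], [p3, p3, p1, p2], [p3, p3, p2, p1]])


# Row (l, x): the pattern (10.1) of a one-edge tree whose leaf is labeled l.
patterns = np.array([np.log(k2p_matrix_x(x)[STATES.index(l)])
                     for l in STATES for x in (0.25, 0.5, 0.75)])
print(patterns.shape, np.linalg.matrix_rank(patterns))  # (12, 4) 4
```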

Now assume that for S = n, H contains 4 × n independent frequency patterns of the form (10.1).

For l = A, G, T, C and x ∈ [0, 1], we define the building blocks

R_{l} (x) ≔ [\log P_{l A} (x), \log P_{l G} (x), \log P_{l T} (x), \log P_{l C} (x)],

W_{l} (x) ≔ (\begin{matrix} R_{l} (x_{1}) \\ R_{l} (x_{2}) \\ R_{l} (x_{3}) \end{matrix}),

The induction hypothesis implies that there exist 4 × n independent frequency patterns of the form (10.1). This means that for some labelings ψ₁, ψ₂, …, ψ_{4n}, the block matrix

J = (\begin{matrix} B_{1} \\ B_{2} \\ \dots \\ B_{4 n} \end{matrix})

has maximal rank 4n, where

B_{s} ≔ (\begin{matrix} R_{ψ_{s}^{1}} (x_{1}) \dots R_{ψ_{s}^{n}} (x_{1}) \\ R_{ψ_{s}^{1}} (x_{2}) \dots R_{ψ_{s}^{n}} (x_{2}) \\ R_{ψ_{s}^{1}} (x_{3}) \dots R_{ψ_{s}^{n}} (x_{3}) \end{matrix}) .

For s = 1, …, 4n, we consider all the labelings obtained by appending to ψ_s a new site labeled with one of the four nucleotides A, G, T, C. Together with the three values of x, this creates a set of 48n different frequency patterns. We want to prove that the block matrix

C = (\begin{matrix} B_{1} W_{A} (x) \\ B_{2} W_{A} (x) \\ \dots \\ B_{4 n} W_{A} (x) \\ B_{1} W_{G} (x) \\ B_{2} W_{G} (x) \\ \dots \\ B_{4 n} W_{G} (x) \\ B_{1} W_{T} (x) \\ B_{2} W_{T} (x) \\ \dots \\ B_{4 n} W_{T} (x) \\ B_{1} W_{C} (x) \\ B_{2} W_{C} (x) \\ \dots \\ B_{4 n} W_{C} (x) \end{matrix})

has maximal rank 4n + 4.

Note that this matrix is row-equivalent to

(\begin{matrix} J & U \\ 0 & V \end{matrix})

where each row of V is of the form R_i(x_k) − R_A(x_k) for i = G, T, C. (This is done by subtracting the block (B_s R_A(x)) from each block (B_s R_i(x)) and then rearranging the rows to obtain the sub-matrix J in the top-left corner.)

On the other hand, from the case S = 1, we have

rank (\begin{matrix} W_{A} (x) \\ W_{G} (x) \\ W_{T} (x) \\ W_{C} (x) \end{matrix}) = 4,

which implies that rank(V) = 4. Hence, rank(C) = rank(J) + rank(V) = 4n + 4.

We deduce that for every S, the set G of all possible logarithmic conditional frequency patterns with S sites under the Kimura 2-parameter model is a linear subspace of $R^{4 \times S}$ (Theorem 4.1) that contains 4S linearly independent vectors. This implies that $G = R^{4 \times S}$ .

Proof of Theorem 4.4 (Step 1). (Any pattern of the form v = [x 0 0 0] can be produced by a tree τ with four edges.)

Denote

x_{1} (t) = P_{A A} (t), x_{2} (t) = P_{A G} (t), x_{3} (t) = P_{A T} (t), x_{4} (t) = P_{A C} (t);

we note that in the Kimura 2-parameter model, x₃(t) = x₄(t).

Now consider two trees τ₁ and τ₂, each with one edge, whose branch lengths are t and s, respectively. We label the only leaves of τ₁ and τ₂ by the patterns ψ₁ = (A) and ψ₂ = (G), and obtain the frequency patterns f₁(t) and f₂(s), respectively. Gluing the roots of τ₁ (1 edge) and of the “inverse” of the tree τ₂ (3 edges) gives a tree T(t, s) with 4 edges whose frequency pattern is equivalent to

f_{1} (t) - f_{2} (s) \sim [\log \frac{x_{1} (t) x_{4} (s)}{x_{4} (t) x_{2} (s)}, \log \frac{x_{2} (t) x_{4} (s)}{x_{4} (t) x_{1} (s)}, 0, 0] .

On the other hand, we note that for the Kimura 2-parameter model (5.1),

\frac{x_{2} (t)}{x_{4} (t)} = 1 + 2 \exp (- 0.5 t)

takes values only in the interval (1, 3) for t > 0, while x₁(s)/x₄(s) is a continuous, strictly decreasing function of s that takes every value in the interval (1, ∞). Hence, for every t > 0, there exists a unique k(t) > 0 such that

\frac{x_{2} (t) x_{4} (k (t))}{x_{4} (t) x_{1} (k (t))} = 1 .
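Both ratios can be written out explicitly. Here we assume, as the proof of Lemma 6.1 suggests, that x₁ = P₁, x₂ = P₂ and x₃ = x₄ = P₃ are the transition polynomials of (5.2) evaluated at x = e^{−t/2}; this identification is inferred from (5.2) and the formula for x₂/x₄ above, not stated in this passage.

```latex
\frac{x_2(t)}{x_4(t)}
  = \frac{\tfrac14 + \tfrac14 x - \tfrac12 x^2}{\tfrac14 - \tfrac14 x}
  = \frac{(1 - x)(1 + 2x)}{1 - x} = 1 + 2x \in (1, 3),
  \qquad x = e^{-t/2} \in (0, 1),
\\[4pt]
\frac{x_1(s)}{x_4(s)}
  = \frac{\tfrac14 + \tfrac14 y + \tfrac12 y^2}{\tfrac14 - \tfrac14 y}
  = \frac{1 + y + 2y^2}{1 - y},
  \qquad y = e^{-s/2} \in (0, 1).
```

The second ratio increases from 1 to ∞ as y runs over (0, 1), because its numerator increases and its positive denominator decreases; since y = e^{−s/2} decreases in s, the ratio is strictly decreasing in s.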

Moreover, k(t) is a continuous function in t and

\lim_{t \to \infty} k (t) = \infty, \qquad \lim_{t \to 0} k (t) = k_{0},

where k₀ satisfies x₁(k₀)/x₄(k₀) = 3.
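Assuming, as the proof of Lemma 6.1 suggests, that x₁, x₂ and x₃ = x₄ are the transition polynomials P₁, P₂, P₃ of (5.2) evaluated at x = e^{−t/2} (our reading of (5.1), not a formula stated here), one has x₁/x₄ = (1 + y + 2y²)/(1 − y) with y = e^{−k/2}, so the equation defining k(t) is a quadratic in y and k(t) has a closed form. Setting the ratio equal to 3 gives y² + 2y − 1 = 0, hence y = √2 − 1 and k₀ = 2 log(1 + √2). The minimal sketch below uses hypothetical helper names (x1_over_x4, k_of_t), checks the defining equation numerically, and compares k(t) for small t with k₀.

```python
import math


def x1_over_x4(y):
    """Assumed ratio x1/x4 = (1 + y + 2 y^2) / (1 - y), written in terms of y = exp(-k/2)."""
    return (1 + y + 2 * y * y) / (1 - y)


def k_of_t(t):
    """Unique k > 0 with x1(k)/x4(k) = x2(t)/x4(t) = 1 + 2 exp(-t/2)."""
    rho = 1 + 2 * math.exp(-t / 2)  # lies in (1, 3) for t > 0
    # (1 + y + 2y^2) / (1 - y) = rho  <=>  2y^2 + (1 + rho) y + (1 - rho) = 0.
    # The roots have product (1 - rho)/2 < 0, so exactly one is positive; it lies in (0, 1).
    y = (-(1 + rho) + math.sqrt((1 + rho) ** 2 + 8 * (rho - 1))) / 4
    return -2 * math.log(y)


k0 = 2 * math.log(1 + math.sqrt(2))  # solves x1(k0)/x4(k0) = 3
for t in (1e-8, 0.5, 2.0, 10.0):
    k = k_of_t(t)
    residual = x1_over_x4(math.exp(-k / 2)) - (1 + 2 * math.exp(-t / 2))
    print(f"t = {t:g}: k(t) = {k:.6f}, residual = {residual:.1e}")
print(f"k0 = {k0:.6f}")
```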

Now if we denote

g (t) = \frac{x_{1} (t) x_{4} (k (t))}{x_{4} (t) x_{2} (k (t))}

then g(t) is a continuous function that satisfies

\lim_{t \to \infty} g (t) = 1, \qquad \lim_{t \to 0} g (t) = \infty .

We deduce that, for every t > 0,

f_{1} (t) - f_{2} (k (t)) \sim [\log g (t), 0, 0, 0] .

Since g is continuous, with log g(t) → ∞ as t → 0 and log g(t) → 0 as t → ∞, the intermediate value theorem shows that these patterns include every pattern of the form [x, 0, 0, 0] with x > 0. Similarly,

f_{2} (k (t)) - f_{1} (t) \sim [- \log g (t), 0, 0, 0]

yields every pattern of the form [x, 0, 0, 0] with x < 0. This completes the proof.
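The construction can be sketched numerically under the assumption that (x₁, x₂, x₃, x₄) = (P₁, P₂, P₃, P₃) at x = e^{−t/2}, inferred from (5.2) rather than stated here. Root states are ordered A, G, T, C as in the vectors above, patterns are compared modulo constant vectors by shifting the C-coordinate to 0, and the helper names (pattern, glued_pattern, t_for) are hypothetical. The code checks only the algebra of f₁(t) − f₂(k(t)), not the gluing of trees; a bisection search then locates a t with log g(t) equal to a prescribed x > 0, as the intermediate value argument guarantees.

```python
import math


def probs(t):
    """Assumed K2P values (x1, x2, x3, x4) = (P1, P2, P3, P3) at x = exp(-t/2)."""
    x = math.exp(-t / 2)
    p3 = 0.25 - 0.25 * x
    return 0.25 + 0.25 * x + 0.5 * x * x, 0.25 + 0.25 * x - 0.5 * x * x, p3, p3


def pattern(leaf, t):
    """Log conditional frequencies (root states A, G, T, C) of a one-edge tree with leaf A or G."""
    x1, x2, x3, x4 = probs(t)
    vals = (x1, x2, x3, x4) if leaf == "A" else (x2, x1, x3, x4)  # leaf "G"
    return [math.log(v) for v in vals]


def normalise(v):
    """Representative modulo constant vectors: shift so that the C-coordinate is 0."""
    return [a - v[-1] for a in v]


def k_of_t(t):
    """Unique k > 0 with x1(k)/x4(k) = x2(t)/x4(t), via the closed-form quadratic root."""
    rho = 1 + 2 * math.exp(-t / 2)
    y = (-(1 + rho) + math.sqrt((1 + rho) ** 2 + 8 * (rho - 1))) / 4
    return -2 * math.log(y)


def glued_pattern(t):
    """Normalised f1(t) - f2(k(t)); expected to equal [log g(t), 0, 0, 0]."""
    f1, f2 = pattern("A", t), pattern("G", k_of_t(t))
    return normalise([a - b for a, b in zip(f1, f2)])


def t_for(target, lo=1e-6, hi=60.0, iters=200):
    """Bisection for t with log g(t) = target, keeping log g(lo) > target >= log g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if glued_pattern(mid)[0] > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for x in (0.1, 1.0, 5.0):
    t = t_for(x)
    print(f"x = {x}: t = {t:.6f}, k(t) = {k_of_t(t):.6f}, pattern = {glued_pattern(t)}")
```

The bisection needs only continuity of g and a sign change on [lo, hi], not monotonicity; the defaults cover targets up to roughly 14, since log g(10⁻⁶) is of that order.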

Proof of Theorem 5.1. From Theorem 4.1, there exists a rooted tree τ, a labeling ψ and a vector of real constants c = (c₁, …, c_S) such that

c_{s} + \log \sum_{a \in ℒ_{i, s}} π (i) \prod_{(u, v) \in E (τ)} P_{a_{u} a_{v}}^{u v} (t_{u v}) = \log (η_{i}^{s}) .

For any t > 0, we create a new tree τ(t) by adding an edge e of length t to the root ρ and labeling the additional taxon by the constant vector (A, A, …, A). The log-likelihood function on e of τ(t) given this taxon labeling is

ℓ (t) = \sum_{s = 1}^{S} \log (\sum_{i} \sum_{a \in ℒ_{i, s}} π (i) \prod_{(u, v) \in E (τ)} P_{a_{u} a_{v}}^{u v} (t_{u v}) P_{i A} (t)) = - \sum_{s = 1}^{S} c_{s} + \sum_{s = 1}^{S} \log (\sum_{i} η_{i}^{s} P_{i} (t)) .

Theorem 4.4 implies that the tree τ can be constructed with at most 64S edges. Hence, τ(t) has at most (64S + 1) edges.
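Once the patterns η^s and constants c_s are given, the identity for ℓ(t) can be evaluated directly. The minimal sketch below does not construct τ. It reads P_i(t) as P_{iA}(t) and assumes (P_{AA}, P_{GA}, P_{TA}, P_{CA}) = (P₁, P₂, P₃, P₃) at x = e^{−t/2}, as inferred from (5.2). The function names are hypothetical. A grid scan for sign changes of the forward-difference slope serves as a rough locator of stationary points.

```python
import math


def p_to_A(t):
    """Assumed (P_AA, P_GA, P_TA, P_CA)(t) = (P1, P2, P3, P3) evaluated at x = exp(-t/2)."""
    x = math.exp(-t / 2)
    p3 = 0.25 - 0.25 * x
    return (0.25 + 0.25 * x + 0.5 * x * x, 0.25 + 0.25 * x - 0.5 * x * x, p3, p3)


def loglik(t, etas, c):
    """ell(t) = -sum_s c_s + sum_s log(sum_i eta_i^s P_iA(t)), eta^s ordered A, G, T, C."""
    p = p_to_A(t)
    return -sum(c) + sum(math.log(sum(e * q for e, q in zip(eta, p))) for eta in etas)


def stationary_points_on_grid(etas, c, t_max=20.0, n=4000):
    """Grid points where the forward-difference slope of ell changes sign (rough locator only)."""
    ts = [t_max * (i + 1) / n for i in range(n)]
    vals = [loglik(t, etas, c) for t in ts]
    slopes = [b - a for a, b in zip(vals, vals[1:])]
    return [ts[i + 1] for i in range(len(slopes) - 1) if slopes[i] * slopes[i + 1] < 0]


# Hypothetical two-site input with c = 0.
etas = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)]
c = [0.0, 0.0]
print(loglik(1.0, etas, c), stationary_points_on_grid(etas, c))
```

For this input ℓ(t) = log P₁ + log P₂, whose single maximum can be found by hand from 8x³ = 1 + x (x ≈ 0.583, t ≈ 1.08). The scan is only a diagnostic: it misses stationary points that lie closer together than the grid spacing or beyond t_max.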

Proof of Lemma 6.1. We first consider the case of linear functions. Assume that f(x) = ax + b is positive on [0, 1]. Then b + a = f(1) > 0, and f can be written as

f (x) = a x + b = 2 (b - a) (\frac{1}{4} - \frac{1}{4} x) + (b + a) (\frac{1}{4} + \frac{1}{4} x - \frac{1}{2} x^{2}) + (b + a) (\frac{1}{4} + \frac{1}{4} x + \frac{1}{2} x^{2}) = 2 (b - a) P_{3} (x) + (b + a) P_{2} (x) + (b + a) P_{1} (x)

using the transition polynomials P_i(x) from equation (5.2).

Since {P₁, P₂, P₃} are linearly independent, this representation is unique. As b + a > 0, we deduce that f can be expressed as a positive linear combination of P₁, P₂, P₃ if and only if f(−1) = b − a > 0.
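The linear case can be checked with exact rational arithmetic. The sketch below (hypothetical helper names, using Python's standard fractions module) returns the coefficients b + a, b + a, 2(b − a) of P₁, P₂, P₃ from the display above. It verifies the identity at seven points, which suffices because both sides are polynomials of degree at most 2.

```python
from fractions import Fraction as Fr


def P1(x):
    return Fr(1, 4) + Fr(1, 4) * x + Fr(1, 2) * x * x


def P2(x):
    return Fr(1, 4) + Fr(1, 4) * x - Fr(1, 2) * x * x


def P3(x):
    return Fr(1, 4) - Fr(1, 4) * x


def linear_coeffs(a, b):
    """Coefficients (c1, c2, c3) with a*x + b = c1*P1(x) + c2*P2(x) + c3*P3(x)."""
    return (b + a, b + a, 2 * (b - a))


def identity_holds(a, b):
    """Check the identity at 7 points (enough for polynomials of degree <= 2)."""
    c1, c2, c3 = linear_coeffs(a, b)
    return all(a * x + b == c1 * P1(x) + c2 * P2(x) + c3 * P3(x) for x in map(Fr, range(-3, 4)))


def is_positive_combination(a, b):
    """Given f(1) = a + b > 0, the coefficients are positive iff f(-1) = b - a > 0."""
    return all(c > 0 for c in linear_coeffs(a, b))


# Hypothetical examples, both positive on [0, 1]: f(x) = x + 3 and f(x) = 2x + 1.
for a, b in [(Fr(1), Fr(3)), (Fr(2), Fr(1))]:
    print(a, b, linear_coeffs(a, b), identity_holds(a, b), is_positive_combination(a, b))
```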

If f(x) is a monic polynomial of degree 2 with no real roots, then f can be written as

f (x) = x^{2} - 2 a x + a^{2} + b^{2} = [(a - 1)^{2} + b^{2}] P_{1} (x) + [(a - 1)^{2} + b^{2} - 2] P_{2} (x) + 2 [(a + 1)^{2} + b^{2} - 1] P_{3} (x) .

The first coefficient exceeds the second by 2, so it is positive whenever the second one is. Hence the coefficients are positive if and only if a ± bi do not belong to B.
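Matching the x², x and constant coefficients of αP₁ + βP₂ + γP₃ against x² − 2ax + a² + b² gives α − β = 2, α + β − γ = −8a and α + β + γ = 4(a² + b²). The sketch below (hypothetical helper names) solves this system in closed form. It then checks the resulting identity exactly with Python's fractions module at seven points, which again suffices for polynomials of degree at most 2.

```python
from fractions import Fraction as Fr


def P1(x):
    return Fr(1, 4) + Fr(1, 4) * x + Fr(1, 2) * x * x


def P2(x):
    return Fr(1, 4) + Fr(1, 4) * x - Fr(1, 2) * x * x


def P3(x):
    return Fr(1, 4) - Fr(1, 4) * x


def quadratic_coeffs(a, b):
    """(alpha, beta, gamma) with x^2 - 2ax + a^2 + b^2 = alpha*P1 + beta*P2 + gamma*P3.

    Solves alpha - beta = 2, alpha + beta - gamma = -8a, alpha + beta + gamma = 4(a^2 + b^2).
    """
    alpha = (a - 1) ** 2 + b ** 2
    beta = alpha - 2
    gamma = 2 * ((a + 1) ** 2 + b ** 2 - 1)
    return alpha, beta, gamma


def identity_holds(a, b):
    """Exact check of the identity at 7 points."""
    alpha, beta, gamma = quadratic_coeffs(a, b)
    return all(
        x * x - 2 * a * x + a * a + b * b == alpha * P1(x) + beta * P2(x) + gamma * P3(x)
        for x in map(Fr, range(-3, 4))
    )


# Hypothetical root pairs a ± bi: 3 ± i and 1 ± i/2.
for a, b in [(Fr(3), Fr(1)), (Fr(1), Fr(1, 2))]:
    coeffs = quadratic_coeffs(a, b)
    print(a, b, coeffs, identity_holds(a, b), all(c > 0 for c in coeffs))
```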


