Are Skyline Plot-Based Demographic Estimates Overly Dependent on Smoothing Prior Assumptions?

Kris V Parag; Oliver G Pybus; Chieh-Hsi Wu

Syst Biol. 2021 May 13;71(1):121–138. doi: 10.1093/sysbio/syab037


Editor: Simon Ho

PMCID: PMC8677568 PMID: 33989428

Abstract

In Bayesian phylogenetics, the coalescent process provides an informative framework for inferring changes in the effective size of a population from a phylogeny (or tree) of sequences sampled from that population. Popular coalescent inference approaches such as the Bayesian Skyline Plot, Skyride, and Skygrid all model these population size changes with a discontinuous, piecewise-constant function but then apply a smoothing prior to ensure that their posterior population size estimates transition gradually with time. These prior distributions implicitly encode extra population size information that is not available from the observed coalescent data or tree. Here, we present a novel statistic, $\Omega$, to quantify and disaggregate the relative contributions of the coalescent data and prior assumptions to the resulting posterior estimate precision. Our statistic also measures the additional mutual information introduced by such priors. Using $\Omega$, we show that, because it is surprisingly easy to overparametrize piecewise-constant population models, common smoothing priors can lead to overconfident and potentially misleading inference, even under robust experimental designs. We propose $\Omega$ as a useful tool for detecting when effective population size estimates are overly reliant on prior assumptions and for improving quantification of the uncertainty in those estimates. [Coalescent processes; effective population size; information theory; phylodynamics; prior assumptions; skyline plots.]

The coalescent process models how changes in the effective size of a target population influence the phylogenetic patterns of sequences sampled from that population. First derived by Kingman (1982) under the assumption of a constant-sized population, the coalescent process has since been extended to account for temporal variation in the population size (Griffiths and Tavare 1994), structured demographics (Beerli and Felsenstein 1999), and multilocus sampling (Li and Durbin 2011). Inference under these models aims to statistically recover the unknown effective population size (or demographic) history from the reconstructed phylogeny (or tree) and has provided insights into infectious disease epidemiology, population genetics, and molecular ecology (Pybus et al. 2003; Wakeley 2008; Shapiro et al. 2004). Here, we focus on coalescent processes that describe the genealogies of serially sampled individuals from populations with deterministically varying size. These are widely applied to study the phylodynamics of infectious diseases (Griffiths and Tavare 1994; Rodrigo and Felsenstein 1999).

Early approaches to inferring effective population size from coalescent phylogenies used pre-defined parametric models (e.g., exponential or logistic growth functions) to represent temporal demographic changes (Kuhner et al. 1998; Pybus et al. 2003). While these formulations required only a few variables and provided interpretable estimates, selecting the most appropriate parametric description could be challenging and risk underfitting complex trends (Minin et al. 2008). This motivated the introduction of the classic skyline plot (Pybus et al. 2000), which, by proposing an independent, piecewise-constant demographic change at every coalescent event (i.e., at the branching times in the phylogeny), maximized flexibility and removed parametric restrictions. However, this flexibility came at the cost of increased estimation noise and potential overfitting of changes in effective population size (Ho and Shapiro 2011).

Efforts to redress these issues within a piecewise-constant framework subsequently spawned a family of skyline plot-based methods (Ho and Shapiro 2011). Among these, the most popular and commonly used are the Bayesian Skyline Plot (BSP) (Drummond et al. 2005), the Skyride (Minin et al. 2008), and the Skygrid (Gill et al. 2013) approaches. All three attempted to regulate the sharp fluctuations of the inferred piecewise-constant demographic function by enforcing a priori assumptions about the smoothness (i.e., the level of autocorrelation among piecewise-constant segments) of real population dynamics. This was seen as a biologically sensible compromise between noise regulation and model flexibility (Parag and Donnelly 2020; Strimmer and Pybus 2001).

The BSP limited overfitting by (i) predefining fewer piecewise demographic changes than coalescent events and (ii) smoothing noise by asserting a priori that the population size after a change-point was exponentially distributed around the population size before it. This method was questioned by Minin et al. (2008) for making strong smoothing and change-point assumptions, which stimulated the development of the Skyride, a method that embeds the flexible classic skyline plot within a tunable Gaussian smoothing field. The Skygrid, which extends the Skyride to multiple loci and allows arbitrary change-points (the BSP and Skyride change-times coincide with coalescent events), also uses this prior. The Skyride and Skygrid methods aimed to better trade off prior influence with noise reduction and, while somewhat effective, are still imperfect because they can fail to recover genuinely abrupt demographic changes such as bottlenecks (Faulkner et al. 2019).

As a result, studies continue to explore and address the nontrivial problem of optimizing this tradeoff, either by searching for less-restrictive and more adaptive priors (Faulkner et al. 2019) or by deriving new data-driven skyline change-point grouping strategies (Parag and Donnelly 2020). The evolution of coalescent model inference thus reflects a desire to understand and fine-tune how prior assumptions and observed phylogenetic data interact to yield reliable posterior population size estimates. Surprisingly, and in contrast to this desire, no study has yet tried to directly and rigorously measure the relative influence of the priors and data on these estimates.

Here, we develop and present a novel information theoretic statistic, $\Omega$, to formally disaggregate and quantify the contributions of both priors and data to the uncertainty around the posterior demographic estimates of popular skyline-based coalescent methods. Using $\Omega$, we show how widely used smoothing priors can result in overconfident population size inferences (i.e., estimates with unjustifiably small credible intervals) and provide practical guidelines against such circumstances. We illustrate the utility of this approach on well-characterized data sets describing the population size of HCV in Egypt (Pybus et al. 2003) and ancient Beringian steppe bison (Shapiro et al. 2004).

To our knowledge, $\Omega$, which in theory can be adapted to any prior-data comparison problem, is new not only to the field of phylogenetics but also across statistics and data science. While inference that is strongly driven by prior assumptions can be beneficial, for example when a prior encodes expert knowledge or salient dynamics, having a measure of the relative information introduced by data and prior distributions can improve the reproducibility and interpretability of analyses. Our statistic will help to detect when prior assumptions are inadvertently and overly influencing demographic estimates and will hopefully serve as a diagnostic tool that future methods can employ to optimize and validate their prior-data tradeoffs.

Materials and Methods

Coalescent Inference

We provide an overview of the coalescent process and statistical inference under skyline plot-based demographic models. The coalescent is a stochastic process that describes the ancestral genealogy of sampled individuals or lineages from a target population (Kingman 1982). Under the coalescent, a tree or phylogeny of relationships among these individuals is reconstructed backwards in time with coalescent events defined as the points where pairs of lineages merge (i.e., coalesce) into their ancestral lineage. This tree, $\mathcal{T}$, is rooted at time $T$ into the past, which is the time to the most recent common ancestor (TMRCA) of the sample. The tips of $\mathcal{T}$ correspond to sampled individuals.

The rate at which coalescent events occur (i.e., the rate of branching in $\mathcal{T}$) is determined by, and hence informative about, the effective size of the target population. We assume that a total of $n \geq 2$ samples are taken from the target population at $n_s \geq 1$ distinct sampling times, which are independent of and uninformative about population size changes (Drummond et al. 2005). We do not specify the sample generating process because, under this independence assumption, it does not affect our analysis (Parag and Pybus 2019). We let $c_i$ be the time of the $i$th coalescent event in $\mathcal{T}$ with $1 \leq i \leq n-1$ and $c_{n-1} = T$ ($n$ samples can coalesce $n-1$ times before reaching the TMRCA).

We use $l(t)$ to count the number of lineages in $\mathcal{T}$ at time $t \geq 0$ into the past; $l(t)$ then decrements by 1 at every $c_i$ and increases at sampling times. Here, $t = 0$ is the present. The effective population size or demographic function at $t$ is $N(t)$ so that the coalescent rate underlying $\mathcal{T}$ is $\binom{l(t)}{2} N(t)^{-1}$ (Kingman 1982). While $N(t)$ can be described using appropriate parametric formulations (Parag and Pybus 2017), it is more common to represent $N(t)$ by some tractable $p$-dimensional piecewise-constant approximation (Ho and Shapiro 2011). Thus, we can write $N(t) = \sum_{j=1}^{p} N_j \mathbf{1}(\epsilon_{j-1} \leq t < \epsilon_j)$, with $p \geq 1$ as the number of piecewise-constant segments. Here, $N_j$ is the constant population size of the $j$th segment, which is delimited by times $[\epsilon_{j-1}, \epsilon_j)$, with $\epsilon_0 = 0$ and $\epsilon_p = T$, and $\mathbf{1}(\cdot)$ is an indicator function. The rate of producing new coalescent events is then $\binom{l(t)}{2} \sum_{j=1}^{p} N_j^{-1} \mathbf{1}(\epsilon_{j-1} \leq t < \epsilon_j)$. Kingman's coalescent model is obtained by setting $p = 1$ (constant population of $N_1$).

When reconstructing the population size history of infectious diseases, it is often of interest to infer $N(t)$ from $\mathcal{T}$ (Ho and Shapiro 2011), which forms our coalescent data generating process. If $\mathbf{N} = [N_1, \ldots, N_p]$ denotes the vector of demographic parameters to be estimated, then the coalescent data log-likelihood $\ell(\mathbf{N}) = \log P(\mathcal{T} \mid \mathbf{N})$ can be obtained from Parag and Pybus (2019) and Snyder and Miller (1991) as

$$\ell(\mathbf{N}) = \sum_{j=1}^{p} \left( a_j - m_j \log N_j - b_j N_j^{-1} \right), \tag{1}$$

with $a_j$ and $b_j$ as constants that depend on the times and lineage counts of the coalescent events that fall within the segment duration $[\epsilon_{j-1}, \epsilon_j)$, $m_j$ as the number of those events, and $\sum_{j=1}^{p} m_j = n-1$. Equation 1 is equivalent to the standard serially sampled skyline log-likelihood in Drummond et al. (2005), except that we do not restrict $N(t)$ to change only at coalescent event times.
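
As a rough illustration of this log-likelihood, the sketch below evaluates a piecewise-constant coalescent log-likelihood from a list of tree intervals. The interval encoding (duration, lineage count, segment index, and whether the interval ends in a coalescent event) and the name `skyline_loglik` are assumptions made for this example; they are not taken from the original analysis.

```python
import math

def skyline_loglik(intervals, pop_sizes):
    """Log-likelihood of a fixed tree under a piecewise-constant N(t).

    intervals: list of (duration, lineages, segment, is_coalescent) tuples,
               a hypothetical encoding of the tree ordered from the present.
    pop_sizes: list of N_j, one per segment.
    """
    loglik = 0.0
    for duration, lineages, segment, is_coalescent in intervals:
        pairs = lineages * (lineages - 1) / 2.0   # binomial(l, 2)
        rate = pairs / pop_sizes[segment]         # coalescent rate in this interval
        loglik -= rate * duration                 # no event during the interval
        if is_coalescent:
            loglik += math.log(rate)              # event at the end of the interval
    return loglik

# Toy 4-tip tree with p = 2 segments; the second interval ends at a
# segment boundary rather than at a coalescent event.
toy = [(0.5, 4, 0, True), (0.2, 3, 0, False), (0.8, 3, 1, True), (1.5, 2, 1, True)]
print(skyline_loglik(toy, [10.0, 20.0]))
```

Within each segment the function is maximized at the ratio of the summed pair-weighted durations to the number of coalescent events, which is a convenient sanity check on any implementation.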

In Bayesian phylogenetic inference, skyline-based methods such as the BSP, Skyride and Skygrid combine this likelihood with a prior distribution $P(\mathbf{N})$, which encodes a priori beliefs about the demographic function. This yields a population size posterior, from Bayes law, which depends on both the prior and coalescent data-likelihood as:

$$P(\mathbf{N} \mid \mathcal{T}) = \frac{P(\mathcal{T} \mid \mathbf{N})\, P(\mathbf{N})}{\int P(\mathcal{T} \mid \mathbf{N})\, P(\mathbf{N})\, \mathrm{d}\mathbf{N}}. \tag{2}$$

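Here, we assume that the phylogeny, $\mathcal{T}$, is known without error. In some instances, only sampled sequence data, $\mathcal{D}$, are available and a distribution over $\mathcal{T}$ must be reconstructed from $\mathcal{D}$ under a model of molecular evolution with parameters $\theta$. Equation 2 becomes embedded in the more complex expression $P(\mathcal{T}, \mathbf{N}, \theta \mid \mathcal{D}) \propto P(\mathcal{D} \mid \mathcal{T}, \theta)\, P(\mathcal{T} \mid \mathbf{N})\, P(\mathbf{N})\, P(\theta)$, which then involves inferring both the tree and population size (Drummond et al. 2002).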

While we do not consider this extension here, we note that our results remain applicable and relevant. This follows because the output of the more complex Bayesian analysis above (i.e., when sequence data $\mathcal{D}$ are used directly) is a posterior distribution over tree space. We can sample from this posterior and treat each sampled tree effectively as a fixed tree. Consequently, we expect that any summary statistic that we derive here under the assumption of a fixed tree will be usable in studies that incorporate genealogical uncertainty, by computing the distribution of that statistic over this covering set of sampled posterior trees.

Information and Estimation Theory

We review and extend some concepts from information and estimation theory, applying them to skyline-based coalescent inference. We consider a general parametrization of the effective population size $\boldsymbol{\psi} = [\psi_1, \ldots, \psi_p]$, where $\psi_j = g(N_j)$ for all $j \in \{1, \ldots, p\}$ and $g(\cdot)$ is a differentiable function. Popular skyline-based methods usually choose the identity function (e.g., BSP) or the natural logarithm (e.g., the Skyride and Skygrid) for $g$. Equations 1 and 2 are then reformulated with $\ell(\boldsymbol{\psi}) = \log P(\mathcal{T} \mid \boldsymbol{\psi})$ as the coalescent data log-likelihood and $P(\boldsymbol{\psi})$ as the demographic prior. The Bayesian posterior, $P(\boldsymbol{\psi} \mid \mathcal{T})$, combines this likelihood and prior and hence is influenced by both the coalescent data and prior beliefs. We can formalize these influences using information theory.

The expected Fisher information, $\mathcal{I}(\boldsymbol{\psi})$, is a $p \times p$ matrix with $(i,j)$th element $\mathcal{I}(\boldsymbol{\psi})_{ij} = -\mathrm{E}\left[\partial^2 \ell(\boldsymbol{\psi}) / \partial \psi_i \partial \psi_j\right]$ (Lehmann and Casella 1998). The expectation is taken over the coalescent tree branches. As observed in Parag and Pybus (2019), $\mathcal{I}(\boldsymbol{\psi})$ quantifies how precisely we can estimate the demographic parameters, $\boldsymbol{\psi}$, from the coalescent data, $\mathcal{T}$. Precision is defined as the inverse of variance (Lehmann and Casella 1998). The BSP, Skyride, and Skygrid parametrizations all yield $\mathcal{I}(\mathbf{N}) = [m_1 N_1^{-2}, \ldots, m_p N_p^{-2}]\, \mathbf{I}_p$ and $\mathcal{I}(\log \mathbf{N}) = [m_1, \ldots, m_p]\, \mathbf{I}_p$, with $\mathbf{I}_p$ as a $p \times p$ identity matrix (i.e., the bracketed terms are the diagonal entries) (Parag and Pybus 2019). These matrices provide several useful insights that we will exploit in later sections. First, $\mathcal{I}(\boldsymbol{\psi})$ is orthogonal (diagonal), meaning that the coalescent process over the $j$th segment can be treated as deriving from an independent Kingman coalescent with constant population size $N_j$ (Parag and Pybus 2017). Second, the number of coalescent events in that segment, $m_j$, controls the Fisher information available about $N_j$. Last, working under $\log N_j$ removes any dependence of this Fisher information component on the unknown parameter $N_j$ (Parag and Pybus 2019).
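
The last two points can be checked numerically. The sketch below assumes the per-segment log-likelihood $-m\gamma - A e^{-\gamma}$ in $\gamma = \log N_j$ (constants dropped), where `m` is the number of coalescent events in the segment and `A` is the sum of pair-weighted interval durations; both names, and the helper functions, are hypothetical and used only for illustration.

```python
import math

def segment_loglik(gamma, m, A):
    """One-segment log-likelihood in gamma = log N (constants dropped).
    m: coalescent events in the segment; A: sum of binom(l, 2) * duration."""
    return -m * gamma - A * math.exp(-gamma)

def observed_information(gamma, m, A, h=1e-4):
    """Negative second derivative by central finite differences."""
    f = lambda g: segment_loglik(g, m, A)
    return -(f(gamma + h) - 2.0 * f(gamma) + f(gamma - h)) / h**2

def fisher_diagonal(events, pop_sizes=None):
    """Diagonal of the expected Fisher information: m_j under log N_j,
    or m_j / N_j**2 when pop_sizes is given (identity parametrization)."""
    if pop_sizes is None:
        return [float(m) for m in events]
    return [m / N**2 for m, N in zip(events, pop_sizes)]

m, A = 12, 30.0
gamma_hat = math.log(A / m)                     # MLE of log N for this segment
print(observed_information(gamma_hat, m, A))    # close to m, whatever A is
print(fisher_diagonal([12, 5]), fisher_diagonal([12, 5], [2.0, 10.0]))
```

Under the log parametrization the curvature at the MLE depends only on the event count, whereas under the identity parametrization it also depends on the unknown population size.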

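The prior distribution, $P(\boldsymbol{\psi})$, that is placed on the demographic parameters can alter and impact both estimate bias and precision. We can gauge prior-induced bias by comparing the maximum likelihood estimate (MLE), $\hat{\boldsymbol{\psi}} = \arg\max_{\boldsymbol{\psi}} \ell(\boldsymbol{\psi})$, with the maximum a posteriori estimate (MAP), $\tilde{\boldsymbol{\psi}} = \arg\max_{\boldsymbol{\psi}} \{\ell(\boldsymbol{\psi}) + \log P(\boldsymbol{\psi})\}$ (van Trees 1968). The difference $\hat{\boldsymbol{\psi}} - \tilde{\boldsymbol{\psi}}$ measures this bias. We can account for prior-induced precision by computing Fisher-type matrices for the prior and posterior as $\mathcal{P}(\boldsymbol{\psi})_{ij} = -\partial^2 \log P(\boldsymbol{\psi}) / \partial \psi_i \partial \psi_j$ and $\mathcal{J}(\boldsymbol{\psi})_{ij} = -\mathrm{E}\left[\partial^2 \log P(\boldsymbol{\psi} \mid \mathcal{T}) / \partial \psi_i \partial \psi_j\right]$ (Tichavsky et al. 1998; Huang and Zhang 2018). Combining these gives

$$\mathcal{J}(\boldsymbol{\psi}) = \mathcal{I}(\boldsymbol{\psi}) + \mathcal{P}(\boldsymbol{\psi}). \tag{3}$$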

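Equation 3 describes how the posterior Fisher information matrix, $\mathcal{J}(\boldsymbol{\psi})$, relates to the standard Fisher information $\mathcal{I}(\boldsymbol{\psi})$ and the prior second derivative $\mathcal{P}(\boldsymbol{\psi})$. We make the common regularity assumptions (see Huang and Zhang 2018 for details) that ensure $\mathcal{J}(\boldsymbol{\psi})$ is positive definite and that all Fisher matrices exist. These assumptions are valid for exponential families such as the piecewise-constant coalescent (Lehmann and Casella 1998; Parag and Pybus 2019). Equation 3 will prove fundamental to resolving the relative impact of the prior and data on the best precision achievable using the posterior $P(\boldsymbol{\psi} \mid \mathcal{T})$. We also define expectations on these matrices with respect to the prior as $\mathcal{I} = \mathrm{E}_0[\mathcal{I}(\boldsymbol{\psi})]$, $\mathcal{P} = \mathrm{E}_0[\mathcal{P}(\boldsymbol{\psi})]$, and $\mathcal{J} = \mathrm{E}_0[\mathcal{J}(\boldsymbol{\psi})]$, with $\mathrm{E}_0[\mathcal{I}(\boldsymbol{\psi})] = \int \mathcal{I}(\boldsymbol{\psi})\, P(\boldsymbol{\psi})\, \mathrm{d}\boldsymbol{\psi}$, for example. These matrices are now constants instead of functions of $\boldsymbol{\psi}$. Equation 3 also holds for these constant matrices (Tichavsky et al. 1998).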

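These Fisher information matrices set theoretical upper bounds on the precision attainable by all possible statistical inference methods. For any unbiased estimate of $\boldsymbol{\psi}$, $\hat{\boldsymbol{\psi}}$, the Cramer–Rao bound (CRB) states that $\mathrm{E}\left[(\hat{\boldsymbol{\psi}} - \boldsymbol{\psi})(\hat{\boldsymbol{\psi}} - \boldsymbol{\psi})^{\top} \mid \boldsymbol{\psi}\right] \geq \mathcal{I}(\boldsymbol{\psi})^{-1}$ with $\top$ indicating transpose. If we relax the unbiased estimation requirement and include prior (distribution) information then the Bayesian or posterior Cramer–Rao lower bound (BCRB) controls the best estimate precision (van Trees 1968). If $\tilde{\boldsymbol{\psi}}$ is any estimator of $\boldsymbol{\psi}$ then the BCRB states that $\mathrm{E}_0\left[\mathrm{E}\left[(\tilde{\boldsymbol{\psi}} - \boldsymbol{\psi})(\tilde{\boldsymbol{\psi}} - \boldsymbol{\psi})^{\top} \mid \boldsymbol{\psi}\right]\right] \geq \mathcal{J}^{-1}$. This bound is not dependent on $\boldsymbol{\psi}$ due to the extra expectation over the prior (Tichavsky et al. 1998).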

The CRB describes how precisely we can estimate demographic parameters using just the coalescent data and is achieved (asymptotically) with equality for skyline (piecewise-constant) coalescent models (Parag and Pybus 2019). The BCRB, instead, defines the precision limit for the combined contributions of the data and the prior. The CRB is a frequentist bound that assumes a true fixed $\boldsymbol{\psi}$, while the BCRB is a Bayesian bound that treats $\boldsymbol{\psi}$ as a random parameter. The expectation over the prior connects the two formalisms (Ben-Haim and Eldar 2009). Given their importance in delimiting precision, the $\mathcal{I}$ and $\mathcal{J}$ Fisher matrices will be central to our analysis, which focuses on resolving and quantifying the individual contributions of the data versus prior assumptions.

Results

The Coalescent Information Ratio, $\Omega$

We propose and derive the coalescent information ratio, $\Omega$, as a statistic for evaluating the relative contributions of the prior and coalescent data to the posterior estimates obtained as solutions to Bayesian skyline inference problems (see Materials and Methods section). Consider such a problem in which the $n$-tip phylogeny $\mathcal{T}$ is used to estimate the $p$-element demographic parameter vector $\boldsymbol{\psi}$. Let $\hat{\boldsymbol{\psi}}$ be the MLE of $\boldsymbol{\psi}$ given the coalescent data $\mathcal{T}$. Asymptotically, the uncertainty around this MLE can be described with a multivariate Gaussian distribution with covariance matrix $\mathcal{I}^{-1}$. The Fisher information, $\mathcal{I}$, then defines a confidence ellipsoid that circumscribes the total uncertainty from this distribution. In Parag and Pybus (2019), this ellipsoid was found to be central to understanding the statistical properties of skyline-based estimates.

The volume of this ellipsoid is $V_1 = C \det[\mathcal{I}]^{-1/2}$, with $C$ as some $p$-dependent constant. Decreasing $V_1$ increases the best estimate precision attainable from the data (Lehmann and Casella 1998). In a Bayesian framework, the asymptotic posterior distribution of $\boldsymbol{\psi}$ also follows a multivariate Gaussian distribution with covariance matrix of $\mathcal{J}^{-1}$. We can therefore construct an analogous ellipsoid from $\mathcal{J}$ with volume $V_2 = C \det[\mathcal{J}]^{-1/2}$ that measures the uncertainty around the MAP estimate (Tichavsky et al. 1998). This volume includes the effect of both prior and data on estimate precision. Accordingly, we propose the ratio

$$\Omega := \frac{V_2}{V_1} = \sqrt{\frac{\det[\mathcal{I}]}{\det[\mathcal{J}]}} \tag{4}$$

as a novel and natural statistic for dissecting the relative impact of the data and prior distribution on posterior estimate precision.
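
A minimal sketch of how such a ratio could be computed is given below. It assumes that the ratio equals the square root of det(I)/det(J) with J = I + P, as the ellipsoid-volume argument suggests, that the data matrix I is diagonal, and that the prior matrix P is symmetric tridiagonal. The list-based matrix encoding and the function names are hypothetical.

```python
import math

def tridiag_det(diag, off):
    """Determinant of a symmetric tridiagonal matrix via the standard
    three-term recurrence; diag has length p, off has length p - 1."""
    prev, curr = 1.0, diag[0]
    for k in range(1, len(diag)):
        prev, curr = curr, diag[k] * curr - off[k - 1] ** 2 * prev
    return curr

def information_ratio(data_diag, prior_diag, prior_off):
    """Omega = sqrt(det I / det J) with J = I + P, where I is diagonal
    (data_diag) and P is symmetric tridiagonal (prior_diag, prior_off)."""
    post_diag = [i + p for i, p in zip(data_diag, prior_diag)]
    det_data = math.prod(data_diag)
    det_post = tridiag_det(post_diag, prior_off)
    return math.sqrt(det_data / det_post)

print(information_ratio([10.0, 10.0], [0.0, 0.0], [0.0]))   # flat prior
print(information_ratio([10.0], [10.0], []))                # prior as informative as the data
```

Working with the diagonal and off-diagonal entries directly keeps the example dependency-free; a general matrix library would be the more natural choice for priors that are not tridiagonal.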

From Equation 4, we observe that $0 \leq \Omega \leq 1$, with $\Omega = 1$ signifying that the information from our prior distribution is negligible in comparison to that from the data and $\Omega = 0$ indicating the converse. Importantly, we find

(5)

At this threshold value, $\mathcal{P}$ contributes at least as much information as the data. Moreover, since the prior contribution becomes negligible with increasing data, $\Omega \to 1$ as the amount of coalescent data grows, and $\Omega$ is undefined when $\boldsymbol{\psi}$ is unidentifiable from $\mathcal{T}$ (i.e., when $\mathcal{I}$ is singular) (Rothenburg 1971). Consequently, we posit that a smaller $\Omega$ implies the prior provides a greater contribution to estimate precision.

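We define $\Omega$ as an information ratio due to its close connection to both the Fisher and mutual information. The mutual information between $\boldsymbol{\psi}$ and $\mathcal{T}$, $I(\boldsymbol{\psi}; \mathcal{T})$, measures how much information (in bits for example) $\mathcal{T}$ contains about $\boldsymbol{\psi}$ (Cover and Thomas 2006). This is distinct from but related to $\mathcal{I}(\boldsymbol{\psi})$, which quantifies the precision of estimating $\boldsymbol{\psi}$ from $\mathcal{T}$ (Brunel and Nadal 1998). Recent work from Huang and Zhang (2018) into the connection between the Fisher and mutual information has yielded two key approximations to $I(\boldsymbol{\psi}; \mathcal{T})$. These can be obtained by substituting either $\mathcal{I}(\boldsymbol{\psi})$ or $\mathcal{J}(\boldsymbol{\psi})$ for $\mathcal{X}(\boldsymbol{\psi})$ in

$$I_{\mathcal{X}} = H(\boldsymbol{\psi}) + \frac{1}{2}\, \mathrm{E}_0\left[\log \det\left(\frac{\mathcal{X}(\boldsymbol{\psi})}{2 \pi e}\right)\right], \tag{6}$$

with $H(\boldsymbol{\psi})$ as the differential entropy of $\boldsymbol{\psi}$ (Cover and Thomas 2006).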

For a flat prior or many observations, $I(\boldsymbol{\psi}; \mathcal{T}) \approx I_{\mathcal{I}}$ (Equation 6 evaluated with $\mathcal{I}(\boldsymbol{\psi})$), as the prior contributes little or no information (Brunel and Nadal 1998). For sharper priors, $I(\boldsymbol{\psi}; \mathcal{T}) \approx I_{\mathcal{J}}$ (Equation 6 evaluated with $\mathcal{J}(\boldsymbol{\psi})$), as the prior contribution is significant; using $I_{\mathcal{I}}$ would lead to large errors (Huang and Zhang 2018). Equation 6 is predicated on (i) regularity assumptions for the distributions used (i.e., that the second derivatives exist), (ii) conditional dependence of the observed data given $\boldsymbol{\psi}$, and (iii) that the likelihood is peaked around its most probable value (Lehmann and Casella 1998; Brunel and Nadal 1998; Huang and Zhang 2018). The skyline-based inference problems that we consider here automatically satisfy (i) and (ii) as these models belong to an exponential family. Condition (iii) is satisfied for moderate to large trees (and asymptotically) (Lehmann and Casella 1998; Parag and Pybus 2019).

Using the above approximations, we derive the interesting expression

$$\log \Omega^{-1} = I_{\mathcal{J}} - I_{\mathcal{I}} := \Delta I, \tag{7}$$

which suggests that our ratio directly measures the excess mutual information introduced by the prior, providing a substantive link between how sharper estimate precision is attained with extra mutual information. Observe that both sides of Equation (7) diminish to zero as the prior contribution vanishes (i.e., as $\mathcal{J} \to \mathcal{I}$). Because the mutual information and its approximations (see Equation (6)) are invariant to invertible parameter transformations (Huang and Zhang 2018), our coalescent information ratio does not depend on whether we infer $\mathbf{N}$, its inverse, or its logarithm.

Moreover, we can use normalizing transformations to make $\Omega$ valid at even small tree sizes. Slate (1994) derives several such transformations for exponentially distributed models like the coalescent. Among them, the logarithmic transform can achieve approximately normal log-likelihoods for about seven observations and above. Thus, $\log \mathbf{N}$, which is also optimal for experimental design (Parag and Pybus 2019), ensures the validity of $\Omega$ on small trees. This is the parametrization adopted by the Skyride and Skygrid methods (Minin et al. 2008). Other (cubic-root) parametrizations under which $\Omega$ would be valid at even smaller tree sizes also exist (Slate 1994).

Equations 4–7 are not restricted to coalescent inference problems and are generally applicable to statistical models that involve exponential families (Lehmann and Casella 1998). We now specify $\Omega$ for skyline-based models, which all possess piecewise-constant population sizes and orthogonal $\mathcal{I}$ matrices (Parag and Pybus 2019). These properties permit the expansion (Ipsen and Rehman 2008):

where Inline graphic are the diagonal elements of with , and is the sub-matrix formed by deleting the rows and columns of .

This allows us to formulate a prior signal-to-noise ratio

(8)

which quantifies the relative excess Fisher information (the ``signal'') that is introduced by the prior. This ratio signifies when the prior contribution overwhelms that of the data i.e., Inline graphic . Having derived theoretically meaningful metrics for resolving prior-data precision contributions, we next investigate their ramifications.

The Kingman Conjugate Prior

Kingman's coalescent process (Kingman 1982), which describes the phylogeny of a constant-sized population $N_1$, is the foundation of all skyline model formulations. Specifically, a $p$-dimensional skyline model is analogous to having $p$ Kingman coalescent models, the $j$th of which is valid over $[\epsilon_{j-1}, \epsilon_j)$ and describes the genealogy under population size $N_j$. Here, we use Kingman's coalescent to validate and clarify the utility of $\Omega$ as a measure of relative data-prior precision contributions.

We assume an Inline graphic -tip Kingman coalescent tree, and initially work with the inverse parametrization, . We scale at by as in (Parag and Pybus, 2017) so that for with . If defines the space of values, and has prior distribution , then, by (Snyder and Miller, 1991), its posterior distribution is

where Inline graphic is a constant and is the scaled TMRCA of .

The likelihood function embedded within Inline graphic is proportional to a shape-rate parametrized gamma distribution, with known shape . The conjugate prior for is also gamma (Fink 1997) i.e., with shape and rate . The posterior distribution is then with counting coalescent events in (Robert 2007). Transforming to implies . This is an inverse gamma distribution with mean Inline graphic , shape and inverse rate . If describes the space of possible values and then

We can interpret the parameters of the gamma posterior distribution as involving a prior contribution of Inline graphic coalescent events from a virtual tree, , with scaled TMRCA . This is then combined with the actual coalescent data, which contributes coalescent events from , with scaled TMRCA of (Robert 2007). This offers a clear breakdown of how our posterior estimate precision is derived from prior and likelihood contributions and suggests that if Inline graphic has more tips than then we are depending more on the prior than the data. We now calculate to determine if we can formalize this intuition.
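
The virtual-observation reading of the conjugate update can be sketched as follows. The sketch assumes a shape-rate gamma prior on the inverse population size, updated by adding the number of observed coalescent events to the shape and the scaled TMRCA to the rate; the function and argument names are hypothetical.

```python
def gamma_posterior(prior_shape, prior_rate, events, scaled_tmrca):
    """Conjugate update for the inverse population size of a Kingman tree.
    The shape gains the number of coalescent events; the rate gains the
    scaled TMRCA of the observed tree."""
    return prior_shape + events, prior_rate + scaled_tmrca

def inverse_gamma_mean(shape, rate):
    """Mean of N = 1/x when x ~ Gamma(shape, rate); requires shape > 1."""
    if shape <= 1:
        raise ValueError("mean undefined for shape <= 1")
    return rate / (shape - 1)

# A prior worth 5 virtual coalescent events combined with 20 observed ones.
shape, rate = gamma_posterior(prior_shape=5, prior_rate=0.05, events=20, scaled_tmrca=0.2)
print(shape, rate, inverse_gamma_mean(shape, rate))
```

Written this way, the prior and the data enter the posterior symmetrically, which is what makes the comparison of virtual and observed event counts natural.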

The Fisher information values of Inline graphic are and . The information ratio and mutual information difference, , which hold for all parametrizations, then follow from Equations 4, 7, and 8 as

(9)

with Inline graphic , as the effective signal-to-noise ratio. The approximations shown are valid when . Interestingly, when so that , we get (see Equation (5)). This exactly quantifies the relative impact of real and virtual observations described previously. At this point, we are being equally informed by both the conjugate prior and the likelihood. Prior over-reliance can be defined by the threshold condition of Inline graphic .

The expression of $\Delta I$ confirms our interpretation of $r$ as an effective signal-to-noise ratio controlling the extra mutual information introduced by the conjugate prior. This can be seen by comparison with the standard Shannon mutual information expressions from information theory (Cover and Thomas 2006). At small $r$, where the data dominate, we find that the prior linearly detracts from $\Omega$ and linearly increases $\Delta I$. We also observe that $\beta$, the gamma rate parameter, has no effect on estimate precision or mutual information.
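
The small signal-to-noise behaviour described here can be explored numerically. The closed forms used below are assumptions for illustration, not a quotation of Equation 9: the signal-to-noise ratio `r` is taken as the ratio of virtual to observed coalescent events, the information ratio as $(1 + r)^{-1/2}$, and the mutual information difference as $\frac{1}{2}\log(1 + r)$ in nats.

```python
import math

def kingman_ratio(virtual_events, observed_events):
    """Return (r, omega, delta_i) assuming r = virtual / observed events,
    omega = (1 + r) ** -0.5 and delta_i = 0.5 * log(1 + r) in nats."""
    r = virtual_events / observed_events
    omega = 1.0 / math.sqrt(1.0 + r)
    delta_i = 0.5 * math.log1p(r)
    return r, omega, delta_i

for m in (5, 20, 200, 2000):
    r, omega, delta_i = kingman_ratio(virtual_events=20, observed_events=m)
    # Small-r linearizations: omega is near 1 - r/2 and delta_i is near r/2.
    print(m, round(omega, 4), round(1 - r / 2, 4), round(delta_i, 4), round(r / 2, 4))
```

Under these assumptions, equal numbers of virtual and observed events give an information ratio of one over the square root of two, and the linear approximations only become accurate once the observed events greatly outnumber the virtual ones.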

Our information ratio Inline graphic therefore provides a systematic decomposition of the posterior population size estimate precision and generalizes the virtual observation idea to any prior distribution. In essence, the prior is contributing an effective sample size, which for the conjugate Kingman prior is . We summarize these points in Figure 1, which shows the conjugate prior and two posteriors together with their corresponding Inline graphic values.

Figure 1. — Effect of conjugate prior on Kingman coalescent estimation. We examine the relative impact on estimate precision of a conjugate Kingman prior that contributes virtual observations. We work in for convenience. We compare this prior to posteriors, which are obtained under observed trees with (red) and (yellow) coalescent events. The true value is in black. The prior contribution decays as increases towards 1.

Skyline Smoothing Priors

In this section, we tailor $\Omega$ for the BSP, Skyride, and Skygrid coalescent inference methods. These popular skyline-based approaches couple a piecewise-constant demographic coalescent data likelihood with a smoothing prior to produce population size estimates that change more continuously with time. The smoothing prior achieves this by assuming informative relationships between $N_j$ and its neighboring parameters $N_{j-1}$ and $N_{j+1}$. Such a priori correlation implicitly introduces additional demographic information that is not available from the coalescent data $\mathcal{T}$. While these priors can embody sensible biological assumptions, we show that they may also engender overconfident statements or obscure parameter non-identifiability. We propose $\Omega$ as a simple but meaningful analytic for diagnosing these problems.

We first define uniquely objective (i.e., uninformative) reference skyline priors, which we denote Inline graphic . Finding objective priors for multivariate statistical models is generally nontrivial, but (Berger et al., 2015) state that if has form then . Here, and are some functions and symbolizes the vector excluding . Following this, we obtain the objective priors

with Inline graphic , as normalization constants. Given its optimal properties (Parag and Pybus 2019), we only consider , and drop explicit notational references to it. Under this parametrization, and its expectation with respect to the prior are equal, that is . In addition, the reference prior in this case is Inline graphic , with as a matrix of zeros. This yields by Equation (4). A uniform prior over log-population space is hence uniquely objective for skyline inference.

Other prior distributions, which are subjective by this definition, necessarily introduce extra information and contribute to the posterior estimate precision. This contribution will result in $\Omega < 1$. The two most widely used, subjective, skyline plot smoothing priors are:

(i) the Sequential Markov Prior (SMP) used in the BSP (Drummond et al. 2005), and
(ii) the Gaussian Markov Random Field (GMRF) prior employed in both the Skyride and Skygrid methods (Minin et al. 2008; Gill et al. 2013).

As the SMP and GMRF both propose nearest neighbor autocorrelations among elements of $\mathbf{N}$, tridiagonal posterior Fisher information matrices result. We represent these as $\mathcal{J}_{\mathrm{SMP}}$ and $\mathcal{J}_{\mathrm{GMRF}}$, respectively.

The SMP is defined as: Inline graphic (Drummond et al. 2005). It assumes that with a prior mean of . An objective prior is used for . To adapt this for , we define for . In the Appendix, we show how this expression yields Equation A1 and hence the transformed prior . We then take relevant derivatives to obtain , which for the minimally representative Inline graphic case is written as:

(10)

The Inline graphic matrices simply extend the tridiagonal pattern of Equation (10).

An issue with the SMP is its dependence on the unknown ``true'' demographic parameter values. As a result, we cannot evaluate (or control) a priori how much information is contributed by this smoothing prior. Rapidly declining populations could feature Inline graphic , for example, which would result in prior over-reliance. Conversely, exponentially growing populations would be more data-dependent. This likely reflects the asymmetry in using sequential exponential distributions. The only control we have on smoothing implicitly emerges from choosing the number of segments, Inline graphic . Some recent implementations of the BSP include an alternative log-normal prior that links with (Bouckaert et al. 2019), which is conceptually similar to the GMRF below.

The possibly strong or inflexible prior assumptions under the BSP motivated the development of the GMRF for the Skyride and Skygrid methods (Minin et al. 2008). The GMRF works directly with Inline graphic and models the autocorrelation between the neighbouring segments with multivariate Gaussian distributions. The GMRF prior (Minin et al. 2008) is defined as . In this model, is a normalization constant, a smoothing parameter, to which a gamma prior is often applied, and the values adjust for the duration of the piecewise-constant skyline segments. Usually, either (i) Inline graphic is chosen based on the inter-coalescent midpoints in or (ii) a uniform GMRF is assumed with for every .

Similarly, we calculate Inline graphic for the as:

(11)

The appendix provides the general derivation for any Inline graphic . As is arbitrary and the depend only on , the GMRF is insensitive to the unknown parameter values. This property makes it more desirable than the SMP and gives us some control (via ) of the level of smoothing introduced. Nevertheless, the next section demonstrates that this model still tends to over-smooth demographic estimates.
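
To make the tridiagonal structure concrete, the sketch below builds the prior contribution of a GMRF and combines it with a diagonal data matrix. It assumes the GMRF log-density takes the form $-(\tau/2) \sum_j (\gamma_{j+1} - \gamma_j)^2 / \delta_j$ over log population sizes $\gamma_j$ (the form used by Minin et al. 2008, up to normalization), that the data contribute their coalescent event counts on the diagonal, and that the information ratio is the square root of det(I)/det(J). All function names are hypothetical.

```python
import math

def gmrf_prior_matrix(tau, deltas):
    """Diagonal and off-diagonal of the negative Hessian of the assumed GMRF
    log-density -(tau / 2) * sum_j (g[j+1] - g[j]) ** 2 / deltas[j]."""
    w = [tau / d for d in deltas]      # one weight per neighbouring pair
    diag = [0.0] * (len(w) + 1)
    for j, weight in enumerate(w):
        diag[j] += weight              # pair (j, j+1) adds to both of its segments
        diag[j + 1] += weight
    off = [-weight for weight in w]
    return diag, off

def tridiag_det(diag, off):
    """Determinant of a symmetric tridiagonal matrix (three-term recurrence)."""
    prev, curr = 1.0, diag[0]
    for k in range(1, len(diag)):
        prev, curr = curr, diag[k] * curr - off[k - 1] ** 2 * prev
    return curr

def gmrf_omega(events, tau, deltas):
    """Omega = sqrt(det I / det J) with I = diag(events) and J = I + P_GMRF."""
    p_diag, p_off = gmrf_prior_matrix(tau, deltas)
    j_diag = [m + d for m, d in zip(events, p_diag)]
    return math.sqrt(math.prod(events) / tridiag_det(j_diag, p_off))

for tau in (0.1, 1.0, 10.0):
    print(tau, gmrf_omega([10, 10, 10], tau, [1.0, 1.0]))
```

Because the prior matrix depends only on the smoothing parameter and the segment weights, the ratio can be evaluated before any population sizes are estimated, which is the practical sense in which this prior is insensitive to the unknown parameters.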

We diagonalize Inline graphic and to obtain matrices of form . Here is an orthogonal transformation matrix (i.e., ) and with as the eigenvalue of . Since , we can use Equation 4 to find that . This equality reveals that acts as a prior perturbed version of . When objective reference priors are used we recover Inline graphic and . We can use the matrix to gain insight into how the GMRF and SMP encode population size correlations. The principal components of our posterior demographic estimates (which are obtained from ) are the vectors forming the axes of the uncertainty ellipsoid described by .

These principal component vectors take the form Inline graphic when we apply the reference prior . Thus, as we would expect, our uncertainty ellipses are centered on the parameters we wish to infer. However, if we use the GMRF prior these axes are instead transformed to . These new axes are linear combinations of and elucidate how smoothing priors share information (i.e., introduce autocorrelations) about Inline graphic across its elements. These geometrical changes also hint at how smoothing priors influence the statistical properties of our coalescent inference problem.

To solidify these ideas, we provide a visualization of Inline graphic and an example of . We consider the simple case, where the posterior Fisher information and for the GMRF and SMP both take the form:

(12)

with Inline graphic for the GMRF and for the SMP. The signal-to-noise ratio is (see Equation 9), and performance clearly depends on how the coalescent events in are apportioned between the two population size segments.

We can lower bound the contribution of these priors to Inline graphic under any settings by using the robust coalescent design from (Parag and Pybus, 2019). This stipulates that we define our skyline segments such that in order to optimize estimate precision under . At this robust point, we also find that (or ) is attained. Figure 2 gives the uncertainty ellipses for this robust Inline graphic model at . These are constructed in coordinates centered about population size means as with controlling the confidence level.

Figure 2. — Uncertainty ellipses for SMP and GMRF. We show the improvement in asymptotic precision rendered by use of a smoothing prior for a segment skyline inference problem. The prior informed ellipse (red) is smaller in volume and has skewed principal axes relative to the purely data informed one (blue). All ellipses represent confidence with the indicating coordinate directions about their means, which are the log population sizes, . The covariance that smoothing introduces controls the skew of these ellipses. Here, , (total coalescent event count) and (this controls the prior influence see Equation 12). Larger values lead to over-reliance on the smoothing prior.

Here Inline graphic is either or . Because is diagonal the data-informed confidence ellipse has principal axes aligned with . The covariance among population size segments in , which is induced by the smoothing prior, skews these principal axes. We can see this by diagonalizing at and for every to obtain:

(13)

Applying Inline graphic , we find that the axes of our uncertainty ellipse (as visible in Figure 2) have changed from to . Sums and differences of log-populations are now the parameters that can be most naturally estimated under the SMP and GMRF. The reduction in the area of the ellipses of Figure 2 is a proxy for Inline graphic .
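The rotation of the principal axes can be checked numerically. The sketch below is an illustration only: it assumes the two-segment matrix has the tridiagonal, zero-row-sum form described in the Appendix, with a single prior coupling term `a` added to two data terms `m1` and `m2`. The numerical values are hypothetical and are not the settings used for Figure 2.

```python
import numpy as np

def two_segment_fisher(m1, m2, a):
    """Assumed two-segment posterior Fisher information: data counts on the
    diagonal plus a prior coupling a (zero row sums in the prior part)."""
    return np.array([[m1 + a, -a],
                     [-a, m2 + a]], dtype=float)

m1 = m2 = 50.0   # hypothetical equal split of 100 coalescent events
a = 25.0         # hypothetical prior coupling strength

P = two_segment_fisher(m1, m2, a)
# eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(P)

# Semi-axis lengths of the uncertainty ellipse scale as 1/sqrt(eigenvalue).
semi_axes = 1.0 / np.sqrt(eigvals)

print("eigenvalues:", eigvals)
print("principal axes (columns):")
print(eigvecs)
print("relative semi-axis lengths:", semi_axes)
```

With equal event counts the eigenvectors lie along the sum and the difference of the two log-population sizes. In this sketch only the difference direction gains precision from the prior, which is one way to see how smoothing couples neighbouring segments.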

The Dangers of Smoothing

Having defined ratios for measuring the contribution of smoothing priors to the precision of estimates, we now use them to explore and expose the conditions under which prior over-reliance is likely to occur in practice. We assume that skyline segments are chosen to satisfy the robust design Inline graphic for (Parag and Pybus 2019), with as the total number of skyline segments. We previously proved that robust designs, at , minimize dependence on the prior (maximize ). While this is not the case for , in Figure A1 of the Appendix, we illustrate that the maximal point is generally well approximated by this robust setting. The Inline graphic values computed here are therefore conservative for most settings. Other experimental designs rely more on the prior.

As in Equation 5, we use the Inline graphic threshold to diagnose when the coalescent data (likelihood) and prior are equally influencing demographic posterior estimate precision. At the total Fisher information doubles since . We previously uncovered the importance of this threshold in the Kingman conjugate prior problem, where it signified an equality between the number of pseudo and real samples contributed by the prior and data, respectively. As Inline graphic (see Equation 8), this setting is also meaningful because it achieves a unit signal-to-noise ratio for any skyline-based model.

We first reconsider the Inline graphic case of Equation 12, where controls the prior contribution to . Here suggests , which implies that we are overly-reliant on smoothing when is larger than of the total observed coalescent events. This occurs when or , for the SMP and GMRF respectively. The improved precision due to the prior at this Inline graphic threshold is shown in Figure 2. The relative ellipse area (and hence ) will shrink further as we deviate from robust designs.

As the number of skyline segments, Inline graphic , increases, smoothing becomes more influential and can promote misleading conclusions. For the cases, we will only examine the GMRF, since the SMP has the undesirable property of dependence on the unknown values. To better expose the impact of the smoothing parameter , we will assume a uniform GMRF ( Inline graphic ) so that then only depends on and . We compute and hence , at various . For example, we find that

under the robust design. Interestingly, the order of the polynomial dependence of Inline graphic (and hence ) on increases with . We find that this trend holds for any design. We will use the term robust for when is calculated under a robust design.

Figure 3 plots the robust Inline graphic against and for the uniform GMRF. A key feature of Figure 3 is the steep -dependent decay of relative to the threshold, which exposes how easily we can be unduly reliant on the prior, as increases. Given a phylogeny , increasing the complexity of a skyline-based model enhances the dependence of our posterior estimate precision on the smoothing prior. This pattern is intuitive as fewer coalescent events now inform each demographic parameter (Parag and Pybus 2019). However, Inline graphic decays with surprising speed. For example, at (the lowest curve in Figure 3), we get for and . Usually, has a gamma-prior with mean of 1 (Minin et al. 2008). We show the corresponding mutual information increases due to these GMRF priors in Figure A2 of the Appendix.
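The qualitative trend described here can be reproduced with a short Python sketch. It rests on two assumptions that are not restated in this section: that the information ratio takes the determinant form det(data information) / det(data plus prior information), which matches the Kingman interpretation of real events over real plus virtual events, and that the robust design assigns an equal share of the coalescent events to every segment. All function names and the example values of `m` and `tau` are hypothetical.

```python
import numpy as np

def uniform_gmrf_hessian(p, tau):
    """Uniform first-order GMRF prior term: tridiagonal, zero row sums."""
    Q = np.zeros((p, p))
    for j in range(p - 1):
        Q[j, j] += tau
        Q[j + 1, j + 1] += tau
        Q[j, j + 1] -= tau
        Q[j + 1, j] -= tau
    return Q

def info_ratio(m_counts, Q):
    """Assumed ratio det(I) / det(I + Q), evaluated in log space."""
    m_counts = np.asarray(m_counts, dtype=float)
    _, logdet_post = np.linalg.slogdet(np.diag(m_counts) + Q)
    return float(np.exp(np.sum(np.log(m_counts)) - logdet_post))

def robust_ratio(m, p, tau):
    """Ratio when the m events are split equally across the p segments."""
    return info_ratio(np.full(p, m / p), uniform_gmrf_hessian(p, tau))

m, tau = 100, 1.0  # illustrative tree size and smoothing level
for p in (1, 2, 5, 10, 20):
    print(p, robust_ratio(m, p, tau))
```

Under these assumptions the ratio equals one for a single segment and falls quickly as segments are added, since each segment is then informed by fewer events while the prior coupling is unchanged.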

While Figure 3 might seem specific to the uniform GMRF, it is broadly applicable to the BSP, Skyride, and Skygrid methods. We now outline the implications of Figure 3 for each of these skyline-based approaches.

(1) Bayesian Skyline Plot. This method uses the SMP, which depends on the unknown Inline graphic values. However, the results of Figure 3 remain valid if we set to , which results in the smallest non-data contribution to Equation 10. This follows as and have similar forms. While this choice underestimates the impact of the SMP, it still cautions against high- skylines and confirms suspected BSP issues related to poor estimation precision when skylines are too complex, or the coalescent data are not sufficiently informative (Ho and Shapiro 2011). However, good use of the BSP grouping parameter (Drummond et al. 2005), which sets Inline graphic , could alleviate these problems.

(2) Skyride. When this method uses the uniform GMRF, all results apply exactly. In its full implementation, the Skyride employs a time-aware GMRF that sets Inline graphic based on and estimates from the data (Minin et al. 2008). However, even with these adjustments, the GMRF can over-smooth and fail to recover population size changes (Ho and Shapiro 2011; Faulkner et al. 2019). Our results provide a theoretical grounding for this observation. The Skyride constrains Inline graphic and then smooths this noisy piecewise model. Consequently, it constructs a skyline which is too complex by our measures (the lowest curve in Figure 3 is at ). By rescaling the smoothing parameter to , the curves in Figure 3 upper bound the true values of the time-aware GMRF.

(3) Skygrid. This method uses a scaled GMRF. For a tree with TMRCA Inline graphic , the Skygrid assumes new population size segments every time units (Gill et al. 2013). As a result, every and the time-aware GMRF becomes uniform with rescaled smoothing parameter . Therefore, the conclusions of Figure 3 hold exactly for the Skygrid, provided the horizontal axis is scaled by Inline graphic . This setup reduces the rate of decay but the curves still caution strongly against using skylines with . Unfortunately, as its default formulation sets to 1 less than the number of sampled taxa (or lineages) (Gill et al. 2013), the Skygrid is also vulnerable to prior over-reliance.

The popular skyline-based coalescent inference methods therefore all tend to over-smooth, resulting in population size estimates that can be overconfident or misleading. This issue can be even more severe than Figure 3 suggests since in current practice Inline graphic is often close to and non-robust designs are generally employed. Further, skylines are only statistically identifiable if every segment has at least 1 coalescent event (Parag and Pybus 2019; Parag et al. 2020). Consequently, if is set, smoothing priors can even mask identifiability problems. We recommend that Inline graphic must be guaranteed and in the next section derive a model rejection guideline for finding , the suggested minimum number of coalescent events per skyline segment, and diagnosing prior over-reliance.

Prior Informed Model Rejection

We previously demonstrated how commonly-used smoothing priors can dominate the posterior estimate precision when coalescent inference involves complex, highly parametrized (large- Inline graphic ) skyline models. Since data are more influential than the prior when , we can use this threshold to define a simple -rejection policy to guard against prior over-reliance. Assume that the matrix resulting from our prior of interest is symmetric and positive definite. This holds for the GMRF and SMP. The standard arithmetic–geometric mean inequality, Inline graphic , then applies with denoting the matrix trace. Since , we can expand this inequality and substitute in Equation 4 to get .

Since this inequality applies to all Inline graphic , we can maximize its right hand side to get a tighter lower bound on . This bound, termed , is achieved at the robust design and is given by

(14)

We define Inline graphic as a conservative model rejection criterion with implying that . If is the largest satisfying these inequalities (see Equation 14, indicates argument), then any skyline with more than segments is likely to be overly dependent on the prior and should be rejected under the current coalescent data or tree.

Alternatively, we recommend that skylines using a smoothing prior (with matrix Inline graphic ) should have at least events per segment to avoid prior reliance. The condition in Equation 14 ensures skyline identifiability (Parag and Pybus 2019) and generally (i.e., ). The dependence of on means that additions to the diagonals of necessarily increase the precision contribution from the prior. This insight supports our previous analysis, which used Inline graphic from the uniform GMRF to bound the performance of the SMP and time-aware GMRF. In the Appendix (see Equation A2) we derive analogous rejection bounds based on the excess mutual information, , from Equation 7. There we find that acts like an information-theoretic bandwidth, controlling the prior-contributed mutual information.

Equation 14, which forms a key contribution of this work, can be computed and is valid for any smoothing prior of interest. For the uniform GMRF where Inline graphic , we get . Note that here whenever or , as expected (i.e., there is no smoothing at these values). In Figure A4 of the Appendix, we confirm that is a good lower bound of . We enumerate across and , for an observed tree with , to get Figure 4, which recommends using no more than segments ( Inline graphic ). In Figure A5, we plot curves for various and , defining boundaries beyond which skyline estimates will be overly dependent on the GMRF.
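A rejection rule of this kind can be sketched in a few lines of Python. The sketch is not the authors' code. It assumes the determinant-ratio form of the statistic, applies the arithmetic-geometric mean inequality to the posterior matrix, and evaluates the result at an equal split of events, which gives a bound of the form (1 + trace / m) raised to the power -p. A square-root convention for the ratio would halve that exponent. The uniform GMRF trace of 2 * tau * (p - 1) follows from summing the diagonal of a tridiagonal, zero-row-sum prior matrix. All names and example values are hypothetical.

```python
def trace_bound(m, p, prior_trace):
    """Assumed trace-based lower bound on the robust information ratio."""
    return (1.0 + prior_trace / m) ** (-p)

def uniform_gmrf_trace(p, tau):
    """Trace of the uniform GMRF prior matrix: tau at both ends, 2*tau inside."""
    return 2.0 * tau * (p - 1)

def max_segments(m, tau, threshold=0.5):
    """Largest p (capped at m, one event per segment) keeping the bound
    at or above the threshold."""
    best = 1
    for p in range(1, m + 1):
        if trace_bound(m, p, uniform_gmrf_trace(p, tau)) >= threshold:
            best = p
    return best

m = 100  # illustrative number of coalescent events
for tau in (0.1, 1.0, 10.0):
    p_star = max_segments(m, tau)
    print(tau, p_star, m / p_star)  # smoothing, max segments, events per segment
```

Capping the search at `m` keeps at least one coalescent event per segment, in line with the identifiability condition mentioned above.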

Figure 4. — Bounding skyline complexity using the prior-data tradeoff. For the GMRF with uniform smoothing, we show how the maximum number of recommended skyline segments, (red), decreases with prior contribution (level of smoothing, i.e., increasing ). Hence the minimum recommended number of coalescent events per segment, (blue), rises. Here, we use the boundary (Equation 14), which approximates and provides a more easily computed measure of prior-data contributions. At larger the at a given decreases. The measure provides a model rejection tool, suggesting that models with should not be used, as they would risk being overly informed by the prior.

In the Appendix, we further analyze Equation 14 for the uniform GMRF to discover that Inline graphic is bounded by curves with exponents linear in and quadratic in (see Equation A3). This explains how the influence of smoothing increases with skyline complexity and yields a simple transformation , which can negate prior over-reliance. For comparison, the Skyride implements . The marked improvement, relative to Figure 3, is striking in Figure A3. Other revealing prior-specific insights can be obtained from Equation 14, reaffirming its importance as a model rejection statistic.

Our model rejection tool of Equation 14 can serve as a useful diagnostic for skyline over-parametrization, and as a precaution against prior over-reliance. However, we do not propose Inline graphic as the sole measure of optimal skyline complexity; because while warns against the prior being too relatively influential, it does not guarantee any absolute estimate precision. For example, a small pair might produce the same as a larger pair. Choosing an optimal in a data-justified manner is an open problem that is still under active study (Parag and Donnelly 2020). We next illustrate how Inline graphic , via its more easily computed approximation, , can be practically applied to detect and reject over-smoothed skyline plot models, using data sets that are commonly employed to evaluate the performance of coalescent demographic inference.

Illustrative Examples: Egyptian HCV and Beringian Bison

We validate the practical utility of Inline graphic (and hence ), as a diagnostic of prior over-dependence, by investigating changes in effective population size inferred from the well-studied Egyptian HCV-4 (Pybus et al. 2003) and Beringian steppe bison (Shapiro et al. 2004) data sets. The first consists of 63 partial sequences of HCV genotype 4 and was previously analyzed by Pybus et al. (2003) using a coalescent model with a parametric demographic function that featured periods of constant population size separated by a phase of exponential growth. The second data set comprises 152 modern and partial mtDNA and was investigated by Shapiro et al. (2004), where skyline plot models confirmed a demographic history of exponential growth then decline (boom-bust) with an additional bottleneck dynamic (Drummond et al. 2005). These two data sets have since been re-examined under various alternate models by Minin et al. (2008), Gill et al. (2013), Parag et al. (2020), and several other studies.

We simulated 100 trees with Inline graphic and tips, using the software package MASTER (Vaughan and Drummond 2013), according to inferred HCV and bison population size trends, respectively. The HCV population size trend that we simulated from is provided in Pybus et al. (2003). We inferred the population size trend of the bison data set using the BSP (with sequential Markovian prior) in accordance with published analyses (Drummond et al. 2005). We used 20 population groups and the optimal design from Parag and Pybus (2019) to ensure that we captured complex bison population dynamics reliably. As our focus is on exploring the behavior of skylines and Inline graphic given a particular underlying population size trend and not the uncertainty associated with that trend, we used the posterior mean (HCV) or median (bison) of these inferred trends for simulating trees and do not consider genealogical uncertainty.

The simulated set of coalescent trees from each data set provide an approximate measure of the coalescent variance that could arise from the inferred underlying population size trends. We then estimated Inline graphic from every simulated tree using various skyline models with time-aware GMRF smoothing priors, as in (Minin et al., 2008). We varied the relative contributions of the coalescent data and GMRF to our posterior log-population size estimates by changing either the skyline dimension, , or the GMRF smoothing parameter Inline graphic . As is fixed for a given data set and robust designs are applied, increasing the number of coalescent events in each segment, , reduces .
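The link between events per segment and skyline dimension can be made concrete with a small Python sketch. It assumes that segment change-points are placed at every k-th coalescent event time, so that each segment is informed by the same number of events. This is a simplification for illustration: the helper name `group_events` and the handling of leftover events are hypothetical and do not come from the analyses described here.

```python
import numpy as np

def group_events(coal_times, k):
    """Group sorted coalescent times into segments of k events each.

    Returns the segment end times and the event count in each segment.
    Any leftover events form a final, smaller segment (an assumed choice;
    merging them into the previous segment is an equally simple alternative).
    """
    times = np.sort(np.asarray(coal_times, dtype=float))
    m = times.size
    end_idx = list(range(k, m + 1, k))   # index (1-based) of each k-th event
    if not end_idx or end_idx[-1] != m:
        end_idx.append(m)                # close the last segment at the final event
    counts = np.diff([0] + end_idx)
    boundaries = times[np.array(end_idx) - 1]
    return boundaries, counts

# Ten hypothetical coalescent times, grouped by 1, 2 and 5 events.
times = np.arange(1.0, 11.0)
for k in (1, 2, 5):
    boundaries, counts = group_events(times, k)
    print(k, len(counts), counts)
```

For a fixed number of events, doubling `k` roughly halves the number of segments, which is the tradeoff the analyses above explore.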

We analyzed every tree over all combinations of Inline graphic across a wide range of . For comparison, we also generated purely data-informed estimates of , for the same , by replacing the subjective GMRF with a uniform, objective prior. We computed from Equation 14 for these settings in Figure 5 and observe that, as expected, it decreases with both Inline graphic and (i.e., increases with ). Practical analyses of these data sets using Skyride or Skygrid approaches, would choose or infer a value and set . However, Figure 5 shows and hence events per skyline parameter are often necessary to achieve . This raises questions about the validity of the common practice of applying these methods using their default settings.

Figure 5 confirms that the recommended maximum skyline dimension Inline graphic falls and hence the minimum allowable number of coalescent events per segment grows as the smoothing parameter increases. We demonstrate the qualitative difference in skyline-based estimates between values on either side of the criterion for a single simulated HCV and bison tree in Figure 6. In panels A and C, we present the Skyride estimate, which uses Inline graphic and implements , at the chosen values (0.05 and 1). Contrastingly, in B and D, we illustrate an equivalent skyline with a different , which achieves at this same , according to our metric (see the and curves at and in panels A and B of Figure 5, respectively). We overlay the corresponding skyline (with the same Inline graphic ) obtained with an objective uniform prior, to visualize the uncertainty engendered from the coalescent data alone.

At Inline graphic (panels A and C of Figure 6), the uniform prior produces a skyline that infers more rapid demographic fluctuations through time than that estimated with the GMRF prior. Further, the 95% HPD intervals from the uniform prior (red) are substantially wider than those from the GMRF prior (blue) in both examples, highlighting the marked contribution of the time-aware GMRF prior to posterior estimate precision. While this smoothed trajectory looks reliable we argue that, because Inline graphic (and hence ), it is difficult to justify using the data alone and that the prior is responsible for too much of the estimate precision. In contrast, at and (panels B and D of Figure 6), which apply , both prior distributions yield more similar skylines, implying that GMRF smoothing has not substantially inflated posterior estimate precision.

Under these settings, we have fewer demographic fluctuations than for Inline graphic because 4 and 2 times more coalescent events are informing each parameter or skyline segment, respectively. We achieve smaller uncertainty than with a uniform prior (which is overfitted) but without excessively relying on the GMRF smoothing, which at is likely underfitting. The metric and hence Inline graphic criterion help us better balance data, noise, and our prior assumptions. In contextualizing these results it is important to note that skyline plots provide harmonic mean and not point estimates of population size (Pybus et al. 2000). Consequently, we are inferring sequences of means from our coalescent data, which a priori may not need to conform to a smooth pattern.

The HCV example shows that for times beyond Inline graphic years there are so few events that it is more sensible to estimate a single mean (panel B), which we are confident in across this period, as opposed to several less certain and overfitted means (panel A). In contrast, for the bison example, the bottleneck over years is over-smoothed (panel C), despite many coalescent events occurring in that region. The simple correction of extending our harmonic mean over 2 events (panel D) restores the necessary fall in population size. Deciding on how to balance uncertainty with model complexity is non-trivial and, as shown in these examples, caution is needed to avoid misleading conclusions. We posit that Inline graphic (and hence ) can help formalize this decision-making and improve our quantification of the uncertainty across skyline plots.

Having confirmed Inline graphic as a credible measure of relative uncertainty, we briefly explore how it relates to more easily ascertained measures of uncertainty. For each simulated coalescent tree in the HCV example above, we computed (via Equation 4) and two ancillary statistics based on the 95% highest posterior density (HPD) intervals of the Inline graphic estimates. These are the median HPD ratio and the relative HPD product (across the skyline segments) , which are formulated as:

with med indicating the median value of a set. Here Inline graphic is the 95% HPD interval of under a GMRF with smoothing parameter and is the equivalent HPD when the objective uniform prior is applied instead.
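Since the two formulas are not reproduced in this text, the following Python sketch shows one plausible way to compute such summaries. It assumes that each HPD interval is represented by its width, that every ratio divides the GMRF-based width by the uniform-prior width for the same segment (so that values fall as smoothing tightens the intervals), and that the product is taken over all segments. The function and argument names are hypothetical.

```python
import numpy as np

def hpd_summaries(width_gmrf, width_uniform):
    """Median and product of per-segment HPD width ratios (GMRF / uniform).

    width_gmrf, width_uniform : positive arrays of 95% HPD widths, one
    entry per skyline segment, under the two priors.
    """
    ratios = np.asarray(width_gmrf, dtype=float) / np.asarray(width_uniform, dtype=float)
    median_ratio = float(np.median(ratios))
    # Sum of logs avoids underflow when there are many segments.
    relative_product = float(np.exp(np.sum(np.log(ratios))))
    return median_ratio, relative_product

# Hypothetical widths for a four-segment skyline.
gmrf_widths = [0.8, 0.6, 0.9, 0.5]
uniform_widths = [1.0, 1.2, 1.0, 1.0]
print(hpd_summaries(gmrf_widths, uniform_widths))
```

The median responds to a typical segment, whereas the product accumulates shrinkage across every segment and therefore reflects the skyline dimension.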

The 95% HPD interval is closely connected to the inverse of the Fisher information matrices that define Inline graphic and, further, describes the most visually conspicuous representation of the uncertainty present in skyline plot estimates. Comparing to these ancillary statistics, which evaluate the median and total 95% uncertainty of a skyline plot, allows us to contextualize against more relatable (though different) and obvious visualizations of posterior performance. We present these comparisons in Figure A6 of the Appendix. There we find that all statistics monotonically decay with Inline graphic , that is, as the time-aware GMRF becomes more informative. The sharpness of this decay is highly sensitive to . Larger means that more coalescent data are informing each estimated parameter (smaller ).

The reduced decay with Inline graphic supports our assertion that acts as an exponent controlling prior over-reliance (see Fig. 3). The gentler decay of (relative to and ), which largely does not account for , confirms that we could be misled in our understanding of the impact of smoothing if we neglected skyline dimension. In contrast Inline graphic and , which both measure, in some sense, the relative volumes of uncertainty across the entire skyline-plot due to the data alone and the data and prior, fall more significantly and consistently. At (), which is the most common setting in the Skyride and Skygrid methods, both statistics are markedly below Inline graphic and posterior estimates will often be too dependent on the prior. This high- behavior is also indicative of model overparametrization (Parag and Donnelly, 2020). Our metric therefore relates sensibly to visible and common proxies of uncertainty.

Discussion

Popular approaches to coalescent inference, such as the BSP, Skyride, and Skygrid methods, all rely on combining a piecewise-constant population size likelihood function with prior assumptions that enforce continuity. This combination, which is meant to maximize descriptive flexibility without sacrificing the smoothness that is expected to be exhibited by real population size curves over time, has led to many insights in phylodynamics (Ho and Shapiro 2011). However, it has also spawned concerns related to over-smoothing and lack of methodological transparency (Minin et al. 2008; Faulkner et al. 2019). In this work, we attempted to address these concerns by deriving metrics for diagnosing and clarifying the existing assumptions present in current best practice.

Detecting and correcting for underfitting or over-smoothing is crucial if reliable and meaningful assessments of the effective population size changes of a species or pathogen of interest are to be made from sequence data. Abrupt changes in effective population size are not only biologically plausible but may also signal key events that have shaped the demographic histories of populations (Pyron and Burbrink 2013). In ecology, identifying rapid extinctions and bottlenecks in diversity might signify the impact of environmental change or anthropogenic influences (e.g., hunting or changes in land use) (Stiller et al. 2010; Thomas et al. 2019). Similarly, in epidemiology, sharp fluctuations in the prevalence of an infection might support hypotheses about emergence in novel populations, seasonality, or the effect of interventions, vaccines, or drug treatments. Further, rapid exponential growth of any population may, when observed over a longer timescale, appear as a near-stepwise transition in population size.

Underfitting or over-smoothing these changes would limit understanding of the dynamics of the study population and could affect conclusions about the potential causative factors that influenced those dynamics. However, recognizing when commonly used methods for inferring these demographic trends are over-smoothing is difficult. By capitalizing on (mutual) information theory and (Fisher) information geometry, we formulated the novel coalescent information ratio, Inline graphic , which provides a rigorous means of solving this over-smoothing problem. This ratio describes both the proportion of the asymptotic uncertainty around our posterior estimates that is due solely to the data and the additional mutual information that the prior assumptions introduce.

We derived analytic expressions for Inline graphic for the BSP, Skyride, and Skygrid estimators of effective population size, which combine piecewise skyline likelihoods with either SMP or GMRF smoothing priors. We also showed that has an exact and intuitive interpretation as the ratio of real coalescent events to the sum of real and virtual (prior-contributed) ones in a Kingman coalescent model. Using Inline graphic as a threshold delimiting when the prior contributes as much information as the coalescent data, we found that it is easy to become overly dependent on prior assumptions as the skyline dimension, , increases (for a fixed tree size). This central result emerges from the drastic reduction in the number of coalescent events informing on any population size parameter as Inline graphic rises. Per parameter, the BSP and Skyride use only a few or one event respectively (Minin et al. 2008; Drummond et al. 2005), while the Skygrid may have no events informing some parameters (Gill et al. 2013).

These issues can be obscured by current Bayesian implementations, which can still produce apparently reasonable population size estimates, at least visually, as illustrated in our simulated HCV and bison case studies. Our simulations indicate that analyses that combine maximally parametrized skylines (one event per segment or parameter) with GMRF smoothing can lead to errors in population size inference. For trees simulated according to the HCV demographic scenario, estimates were likely overfitted in the far past, inflating HPDs, but over-smoothed towards the present. The resulting skyline uncertainty contrasted with that from the original (Pybus et al. 2003) and later (Parag and Pybus 2017) analyses. In the bison example, we found evidence for underfitting. The inferred skyline there emphasized a smoother boom-bust trend with concentrated HPDs. However, this underestimated the depth of a bottleneck during which coalescent events were concentrated.

These mismatches between data and smoothing can be difficult to diagnose and problematic, not just for prior over-dependence. Low coalescent event counts, for example, can lead to poor statistical identifiability (Rothenberg 1971), which might manifest in spurious MCMC mixing. Consequently, we proposed a practical Inline graphic rejection criterion for ensuring that coalescent data are the main source of inferential information. This criterion, which was based on an approximation to , provided a way of regularizing skyline complexity. When applied to our examples it recommended a 4-event skyline grouping that resulted in demographic reconstructions that were more consistent with the above-mentioned HCV studies. It also suggested a simple 2-event grouping that recovered the bison bottleneck dynamic without generating too much estimate noise.

This Inline graphic criterion bounds the maximum recommended skyline dimension for a given data set (tree) size and provides a usable means of defining the minimum number of coalescent events, , which we should allocate to each skyline segment to guard against too much prior influence. Since only requires our computing the sum of the diagonals of the prior Fisher matrix, it can serve as a simple rule-of-thumb for sensibly balancing the prior-data tradeoff in skyline plots (e.g., in the BSP, the grouping parameter might be set to a value above Inline graphic to ensure well-regularized estimates). As we found to be lower-bounded by more visible measures of skyline uncertainty, such as the product of relative HPD widths, useful approximations to and may also be computed from these measures.

Our Inline graphic metric also provides insight into how we can alleviate the dramatic impact of skyline complexity on prior over-reliance. When specialized to the GMRF, for example, it reveals that we can negate over-smoothing by scaling the smoothing parameter with a quadratic of . Moreover, it shows that only by increasing the information available from the sampled phylogeny can we reasonably allow for more complex piecewise-constant functions under a given prior. Recent methods, such as the epoch sampling skyline plot (Parag et al. 2020), which can double the Fisher information extracted from a given phylogeny by exploiting the informativeness of sampling times, would support higher dimensional skylines. Such approaches have the potential to increase the contribution of the data without elevating the influence of the smoothing prior.

While in this article we have applied Inline graphic to non-parametric, skyline inference problems in population genetics, ecology and infectious disease epidemiology, its general formulation in Equation 4 is more widely applicable. It can also be applied to coalescent inference problems where specific parametric models (e.g., exponential/logistic growth) are used, in order to disentangle the contributions of observed data and the prior distributions over these parameters, though numerical solutions will likely be necessary. More generally, our approach is valid for any statistical problem, provided the Hessian matrices necessary for deriving the prior and data Fisher information terms are valid and computable. This is not limited to prior-data tradeoffs. Similar ratio metrics should be derivable by comparing Fisher information terms from different sources (e.g., to test whether one source of data is more informative than another).

Thus, we have devised and validated a rigorous means of better understanding, diagnosing and preventing prior over-dependence. We hope that our statistic, which clarifies and quantifies the often inscrutable impact of the prior and data, will help researchers make more active and considered design decisions when adapting popular skyline-based techniques. Our work also aligns with recent studies, which have started to re-examine both model selection and prior definition (Parag and Donnelly 2020; Faulkner et al. 2019) in an attempt to derive more reliable effective population size estimates from coalescent trees. While we believe that data-driven conclusions are generally the most justifiable we note that, in the context of skyline plots, this can be open to interpretation and the choice of prior is far from trivial.

Acknowledgments

We thank Louis du Plessis for his useful comments and insights on this project.

Appendix

Smoothing Prior Fisher Information Matrices

Here, we derive the prior-informed Fisher information matrices for the SMP and GMRF smoothing priors. We start by finding the log-population size transformed version of the SMP smoothing prior. We then calculate its Hessian to get Inline graphic , and so obtain the general form of Equation 10. The SMP is given in Drummond et al. (2005) as . We define so that its inverse . These expressions are in vector form so . We want the transformed prior . Applying the multivariate change of variables formula gives , with as the Jacobian of Inline graphic . This implies that . Substituting gives the SMP log-prior:

(A1)

We can then obtain Inline graphic , with . The diagonals of are: for , and . The non-zero off-diagonal terms are: and . The result is a symmetric tridiagonal matrix that has zero row and column sums. The matrix is then added to the Fisher information matrix (with as the number of coalescent events informing on the Inline graphic parameter), to get .

We now compute Inline graphic , which is given in the main text as Equation (11). For the GMRF (Minin et al. 2008) and so . Taking second derivatives we get diagonal terms of the Hessian, , as: for , and . The nonzero off diagonal terms are: and . The GMRF also gives a symmetric tridiagonal with row and column sums of zero. Adding Inline graphic to the diagonal matrix yields .
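
As a concrete illustration of the GMRF construction above, the following Python sketch builds the prior-informed matrix numerically. It assumes the standard first-order GMRF log-prior, whose exponent is -(tau/2) times the sum of squared differences between adjacent log-population sizes, a diagonal Fisher information matrix whose entries are the per-segment coalescent event counts, and a square-root determinant ratio as the summary statistic. The function names, the variables `m` and `tau`, and the exact form of the ratio are assumptions made for illustration and are not taken from the authors' code; Equation 4 of the main text gives the formal definition.

```python
import math


def gmrf_prior_information(p, tau):
    """Negative Hessian of an assumed first-order GMRF log-prior.

    Assumes log-prior = const - (tau / 2) * sum_j (g[j + 1] - g[j]) ** 2
    over p log-population sizes g. Returns a p x p list of lists.
    """
    P = [[0.0] * p for _ in range(p)]
    for j in range(p - 1):
        # Each adjacent pair (j, j + 1) adds tau to both diagonal entries
        # and -tau to the two off-diagonal entries linking them.
        P[j][j] += tau
        P[j + 1][j + 1] += tau
        P[j][j + 1] -= tau
        P[j + 1][j] -= tau
    return P


def tridiagonal_det(diag, off):
    """Determinant of a symmetric tridiagonal matrix (three-term recurrence)."""
    prev, curr = 1.0, diag[0]
    for k in range(1, len(diag)):
        prev, curr = curr, diag[k] * curr - off[k - 1] ** 2 * prev
    return curr


def information_ratio(m, tau):
    """Illustrative ratio sqrt(det(I) / det(I + P)), with I = diag(m).

    m[j] is the (assumed) number of coalescent events informing segment j.
    """
    p = len(m)
    P = gmrf_prior_information(p, tau)
    diag = [m[j] + P[j][j] for j in range(p)]
    off = [P[j][j + 1] for j in range(p - 1)]
    return math.sqrt(math.prod(m) / tridiagonal_det(diag, off))


# Two segments with 10 events each and tau = 1:
# det(I) = 100, det(I + P) = 11 * 11 - 1 = 120, so the ratio is about 0.913.
example = information_ratio([10, 10], 1.0)
```

Under these assumptions the ratio equals 1 when `tau` is 0 (no smoothing) and falls towards 0 as `tau` grows, which mirrors the qualitative behaviour described in the text. The matrix returned by `gmrf_prior_information` is symmetric and tridiagonal with zero row and column sums, as stated above.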

Further Smoothing Results

In the main text, we asserted that the Inline graphic computed at the robust point of (Parag and Pybus 2019) generally upper bounds the achievable values at other settings. Here we provide evidence for this assertion. While strictly (except for ), we numerically find that . We show this for the GMRF under uniform smoothing in Figure A1. This makes sense as while (for fixed smoothing parameters) Inline graphic and , there is no reason to believe that this also maximizes their ratio. The sawtooth curves in Figure A1 reflect changes in the other values, given a fixed .

Hence, we used the robust design point in our calculation of the Inline graphic curves for the GMRF in Figure 3. The corresponding additional mutual information () curves for this case are provided in Figure A2. These show how larger values of the smoothing parameter, , directly lead to increases in the relative mutual information contribution from the prior. Observe that Inline graphic is highly sensitive to the skyline complexity, , thus clarifying how estimates from overparametrized skyline plots can be dominated by prior information.

Interestingly, we can largely negate the impact of skyline complexity by making Inline graphic a function of . In the main text we explained how the Skyride implicitly implements the scaling . While this reduces some of the effect of shown in Figure 3, it still leads to decaying curves that can, for a given , be deceptively dependent on smoothing. Here we propose the key transformation Inline graphic , as a means of reducing our smoothing in line with our skyline complexity. This transformation was inspired by the dependence of a lower bound on , which we derive in Equation A3 later in the Appendix. Its striking impact on the spread of curves from Figure 3 is given in Figure A3.
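
The effect of tying the smoothing parameter to skyline complexity can be explored numerically. The Python sketch below is a minimal illustration under several assumptions that are not taken from the source: a uniform first-order GMRF, a design in which a fixed total number of coalescent events is split equally among the segments, a square-root determinant ratio as the statistic, and the specific scaling `tau / p**2`. The source's exact transformation is not reproduced here, and all names are hypothetical.

```python
import math


def ratio_uniform_gmrf(total_events, p, tau):
    """Illustrative sqrt(det(I) / det(I + P)) for p equal segments.

    Assumes I = m * identity with m = total_events / p, and P equal to tau
    times the path-graph Laplacian (an assumed uniform GMRF).
    """
    if p == 1:
        return 1.0  # a single segment has no neighbours to smooth against
    m = total_events / p
    x = tau / m
    # Diagonal of (I + P) / m: end segments have one neighbour, others two.
    diag = [1.0 + x] + [1.0 + 2.0 * x] * (p - 2) + [1.0 + x]
    # Three-term recurrence for det((I + P) / m); off-diagonals are -x.
    prev, curr = 1.0, diag[0]
    for k in range(1, p):
        prev, curr = curr, diag[k] * curr - x * x * prev
    # det(I) / det(I + P) = 1 / det((I + P) / m)
    return 1.0 / math.sqrt(curr)


total_events, tau = 100, 1.0
segments = [2, 5, 10, 20, 50]

# Fixed smoothing: the same tau regardless of the number of segments.
fixed = [ratio_uniform_gmrf(total_events, p, tau) for p in segments]

# Assumed quadratic scaling: smoothing reduced as tau / p**2.
scaled = [ratio_uniform_gmrf(total_events, p, tau / p**2) for p in segments]

for p, f, s in zip(segments, fixed, scaled):
    print(f"p={p:3d}  fixed tau: {f:.4f}  scaled tau: {s:.4f}")
```

In this toy setting the ratio under fixed smoothing decays quickly as the number of segments grows, whereas under the assumed quadratic scaling the values stay within a narrow band close to 1. This is only a qualitative check of the compression described in the text, not a reproduction of Figure A3.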

Figure A1. — Robust and optimal designs. For the GMRF smoothing prior with for all and , we show that the optimal design point is not always the same as the robust design point, at which . The colored curves are (along the dashed arrow) for at , and computed across all partitions for any given (hence the zig-zagged form). The gray vertical lines mark the robust point for each curve, and the black circles give the optimal points. While these lines and circles do not always match, both generally feature approximately the same values. We found this to be the case across several and values.

Figure A2. — Prior mutual information increases with skyline complexity. For the uniform GMRF, we show that under fixed smoothing (and hence ), the additional mutual information introduced by the prior, , significantly increases with the complexity, , of our skyline. The colored curves are (along the gray arrow) for at with (robust design point). The dashed is also given for comparison. Clearly, the more skyline segments we have for a given tree, the more likely it is that we are overly informed by our prior.

Figure A3. — Negating the impact of skyline dimension. We show how an appropriate quadratic scaling of the GMRF precision parameter, , can remove the complexity () induced smoothing contribution portrayed in Figure 3 of the main text. This scaling significantly compresses the colored curves shown, which are for at with (robust design point). The resulting values are now all comfortably above the threshold and justified by our information theoretic metrics.

Further Model Selection Bounds

In the main text, we derived lower bounds on Inline graphic , which led to the model rejection parameter, (see Equation 14). Here, we extend and support those results. In Figure A4, we first show that the bound of Equation 14 is a good measure of the true value, for a skyline with uniform GMRF smoothing. We used this bound to define a maximum Inline graphic , , above which the skyline would be over-parametrized and susceptible to prior induced overconfidence. We explore over and for this GMRF in Figure A5 and observe that becomes more restrictive with fewer observed data (coalescent events) or increased smoothing. This supports as a useful measure of prior-data contribution.

Figure A4. — Lower bounds on . For the GMRF smoothing prior with for all and , we compare the lower bound on (red, dashed, see Equation 14) with the actual value of (cyan) at the robust design point of . We examine all integer values that are factors of , and find that qualitatively similar comparisons hold for different and settings. In general the lower bound () is a good approximation to .

Figure A5. — Maximum model selection boundary. For the GMRF smoothing prior with for all and at the robust point , we compute the maximum allowed number of skyline segments, , such that . These curves increase with and decrease with , indicating how the prior-data contribution can be used to define model rejection regions. Skylines with would be overly informed by the prior and hence should not be used.

Figure A6. — Trends in HPD-based statistics and under various time-aware GMRF settings. The (panel A), median HPD ratio of (panel B) and HPD product (panel C) statistics are computed across over various combinations of and . Box-plots summarize our results over 100 observed coalescent trees simulated from previously inferred demographic trends found for the Egyptian HCV data set. Analyses with are in dark green, in yellow and in orange. The solid lines link the median values across boxes for a given value. The dashed line is positioned at the threshold .

Lower bounds on Inline graphic imply upper bounds on the excess mutual information, (see Equation 7). We manipulate Equation 14 (under a robust design) to obtain the first inequality in Equation A2, with as follows

(A2)

This expression reveals that Inline graphic is akin to a signal bandwidth, by comparison with standard Shannon–Hartley theory (Cover and Thomas 2006) and is therefore a key controlling factor in defining how much additional information the prior will introduce. This supports our proposed rejection criterion.

Under the Inline graphic parametrization, and are symmetric, positive definite matrices. For such matrices we can apply a theorem from Huang and Zhang (2018), which states that , with . At the robust point, we get , which leads to the second inequality in Equation A2. Thus, our bound is tighter than that in Huang and Zhang (2018), and useful for broader, future mathematical analyses of Inline graphic . This inequality also clarifies why is often important for characterizing performance here.

We can also use the bound of Huang and Zhang (2018) to derive alternate (but slacker) lower bounds on Inline graphic . This gives the first inequality in Equation A3. Applying this to the uniform GMRF gives the second inequality:

(A3)

Interestingly, Equation A3 shows that the dependence of Inline graphic on the smoothing parameter is at most only linear, while the dependence on complexity can be quadratic. This provides further theoretical backing for the use of to reject models and emphasizes how smoothing can play a deceptively prominent role in the resulting estimate precision produced under complex (high-dimensional) skyline plots.

Ancillary Uncertainty Statistics

In the Egyptian-HCV simulated example, we defined two 95% HPD based ancillary statistics for characterizing the visual uncertainty present in a skyline plot demographic estimate. In Figure A6, we plot these statistics and Inline graphic for various and values under a time-aware GMRF. We discuss the implications of Figure A6 in the main text but observe here that trends between the more common (and more easily visualized) HPD based measures and our novel statistic are largely consistent.

Funding

This study was funded by the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement and is also part of the EDCTP2 programme supported by the European Union [grant reference MR/R015600/1]. This work was also supported by the Oxford Martin School.

Supplementary Material

Data available from the Dryad Digital Repository: https://datadryad.org/stash/dataset/doi:10.5061/dryad.1jwstqjs2.

References

Beerli P., Felsenstein J. 1999. Maximum likelihood estimation of migration rates and effective population numbers in two populations using a coalescent approach. Genetics 152:763–773.
Ben-Haim Z., Eldar Y. 2009. A lower bound on the Bayesian MSE based on the optimal bias function. IEEE Trans. Information Theory 55(11):5179–5196.
Berger J., Bernardo J., Sun D. 2015. Overall objective priors. Bayesian Anal. 10(1):189–221.
Bouckaert R., Vaughan T., Barido-Sottani J., Duchêne S., Fourment M., Gavryushkina A., Heled J., Jones G., Kühnert D., De Maio N., Matschiner M., Mendes F., Müller N., Ogilvie H., du Plessis L., Popinga A., Rambaut A., Rasmussen D., Siveroni I., Suchard M., Wu C., Xie D., Zhang C., Stadler T., Drummond A. 2019. BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 15(4):e1006650.
Brunel N., Nadal J. 1998. Mutual information, Fisher information, and population coding. Neural Comput. 10:1731–1757.
Cover T., Thomas J. 2006. Elements of information theory. 2nd ed. New Jersey:Wiley.
Drummond A., Nicholls G., Rodrigo A., Solomon W. 2002. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics 161:1307–1320.
Drummond A., Rambaut A., Shapiro B., Pybus O. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22:1185–1192.
Faulkner J., Magee A., Shapiro B., Minin V. 2019. Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories. Biometrics 76:677–690.
Fink D. 1997. A compendium of conjugate priors. Technical Report, Montana State University.
Gill M., Lemey P., Faria N., Rambaut A., Shapiro B., Suchard M. 2013. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30(3):713–724.
Griffiths R., Tavare S. 1994. Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. B 344:403–410.
Ho S., Shapiro B. 2011. Skyline-plot methods for estimating demographic history from nucleotide sequences. Mol. Ecol. Resour. 11:423–434.
Huang W., Zhang K. 2018. Information-theoretic bounds and approximations in neural population coding. Neural Comput. 30(4):885–944.
Ipsen I., Rehman R. 2008. Perturbation bounds for determinants and characteristic polynomials. SIAM J. Matrix Anal. Appl. 30(2):762–776.
Kingman J. 1982. On the genealogy of large populations. J. Appl. Probab. 19:27–43.
Kuhner M., Yamato J., Felsenstein J. 1998. Maximum likelihood estimation of population growth rates based on the coalescent. Genetics 149:429–434.
Lehmann E., Casella G. 1998. Theory of point estimation. 2nd ed. New York:Springer.
Li H., Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475(7357):493–496.
Minin V., Bloomquist E., Suchard M. 2008. Smooth Skyride through a rough Skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25(7):1459–1471.
Parag K., Donnelly C. 2020. Adaptive estimation for epidemic renewal and phylogenetic Skyline models. Syst. Biol. 69(6):1163–1179.
Parag K., Pybus O. 2017. Optimal point process filtering and estimation of the Coalescent process. J. Theor. Biol. 421:153–167.
Parag K., Pybus O. 2019. Robust design for coalescent model inference. Syst. Biol. 68(5):730–743.
Parag K., du Plessis L., Pybus O. 2020. Jointly inferring the dynamics of population size and sampling intensity from molecular sequences. Mol. Biol. Evol. 37(8):2414–2429.
Pybus O., Rambaut A., Harvey P. 2000. An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics 155:1429–1437.
Pybus O., Drummond A., Nakano T., Robertson B., Rambaut A. 2003. The epidemiology and iatrogenic transmission of hepatitis C virus in Egypt: a Bayesian coalescent approach. Mol. Biol. Evol. 20(3):381–387.
Pyron R., Burbink F. 2013. Phylogenetic estimates of speciation and extinction rates for testing ecological and evolutionary hypotheses. Trends Ecol. Evol. 28(12):729–736.
Robert C. 2007. The Bayesian choice. New York:Springer Science and Business Media.
Rodrigo A., Felsenstein J. 1999. Coalescent approaches to HIV-1 population. The evolution of HIV. Baltimore:Johns Hopkins University Press.
Rothenburg T. 1971. Identification in parametric models. Econometrica 39(3):577–591.
Shapiro B., Drummond A., Rambaut A., Wilson M., Matheus P., Sher A., Pybus O., Gilbert M., Barnes I., Binladen J., Willerslev E., Hansen A., Baryshnikov G., Burns J., Davydov S., Driver J., Froese D., Harington C., Keddie G., Kosintsev P., Kunz M., Martin L., Stephenson R., Storer J., Tedford R., Zimov S., Cooper A. 2004. Rise and fall of the Beringian steppe bison. Science 306(5701):1561–1565.
Slate E. 1994. Parameterizations for natural exponential families with quadratic variance functions. J. Am. Stat. Assoc. 89(428):1471–1481.
Snyder D., Miller M. 1991. Random point processes in time and space. 2nd ed. New York:Springer.
Stiller M., Baryshnikov G., Bocherens H., d’Anglade A., Hilpert B., Munzel S., Pinhasi R., Rabeder G., Rosendahl W., Trinkaus E., Hofreiter M., Knapp M. 2010. Withering away-25,000 years of genetic decline preceded cave bear extinction. Mol. Biol. Evol. 27(5):975–978.
Strimmer K., Pybus O. 2001. Exploring the demographic history of DNA sequences using the generalized skyline plot. Mol. Biol. Evol. 18(12):2298–2305.
Thomas J., Carvalho G., Haile J., Rawlence N., Martin M., Ho S., Sigfusson A., Josefsson V., Frederiksen M., Linnebjerg J., Castruita J., Niemann J., Sinding M., Sandoval-Velasco M., Soares A., Lacy R., Barilaro C., Best J., Brandis D., Cavallo C., Elorza M., Garrett K., Groot M., Johansson F., Lifjeld J., Nilson G., Serjeanston D., Sweet P., Fuller E., Hufthammer A., Meldgaard M., Fjeldsa J., Shapiro B., Hofreiter M., Stewart J., Gilbert M., Knapp M. 2019. Demographic reconstruction from ancient DNA supports rapid extinction of the great auk. eLife 8:e47509.
Tichavsky P., Muravchik C., Nehorai A. 1998. Posterior Cramer-Rao bounds for discrete-time nonlinear filtering. IEEE Trans. Signal Process. 46(5):1386–1395.
van Trees H. 1968. Detection, estimation, and modulation theory, Part I. New Jersey:Wiley.
Vaughan T., Drummond A. 2013. A stochastic simulator of birth–death master equations with application to phylodynamics. Mol. Biol. Evol. 30(6):1480–1493.
Wakeley J. 2008. Coalescent theory: an introduction. Colorado:Roberts and Company Publishers.

