Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models

Kris V Parag; Christl A Donnelly

doi:10.1093/sysbio/syaa035

. 2020 Apr 25;69(6):1163–1179. doi: 10.1093/sysbio/syaa035

Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models

Kris V Parag ^s1,^✉, Christl A Donnelly ^s1,^s2

Editor: Simon Ho

PMCID: PMC7584150 PMID: 32333789

Abstract

Estimating temporal changes in a target population from phylogenetic or count data is an important problem in ecology and epidemiology. Reliable estimates can provide key insights into the climatic and biological drivers influencing the diversity or structure of that population and evidence hypotheses concerning its future growth or decline. In infectious disease applications, the individuals infected across an epidemic form the target population. The renewal model estimates the effective reproduction number, R, of the epidemic from counts of observed incident cases. The skyline model infers the effective population size, N, underlying a phylogeny of sequences sampled from that epidemic. Practically, R measures ongoing epidemic growth while N informs on historical caseload. While both models solve distinct problems, the reliability of their estimates depends on p-dimensional piecewise-constant functions. If p is misspecified, the model might underfit significant changes or overfit noise and promote a spurious understanding of the epidemic, which might misguide intervention policies or misinform forecasts. Surprisingly, no transparent yet principled approach for optimizing p exists. Usually, p is heuristically set, or obscurely controlled via complex algorithms. We present a computable and interpretable p-selection method based on the minimum description length (MDL) formalism of information theory. Unlike many standard model selection techniques, MDL accounts for the additional statistical complexity induced by how parameters interact. As a result, our method optimizes p so that R and N estimates properly and meaningfully adapt to available data. It also outperforms comparable Akaike and Bayesian information criteria on several classification problems, given minimal knowledge of the parameter space, and exposes statistical similarities among renewal, skyline, and other models in biology. Rigorous and interpretable model selection is necessary if trustworthy and justifiable conclusions are to be drawn from piecewise models. [Coalescent processes; epidemiology; information theory; model selection; phylodynamics; renewal models; skyline plots]

Inferring the temporal trends or dynamics of a target population is an important problem in ecology, evolution, and systematics. Reliable estimates of the demographic changes underlying empirical data sampled from an animal or human population, for example, can corroborate or refute hypotheses about the historical and ongoing influence of environmental or anthropogenic factors, or inform on the major forces shaping the diversity and structure of that population (Turchin 2003; Ho and Shapiro 2011). In infectious disease epidemiology, where the target population is often the number of infected individuals (infecteds), demographic fluctuations can provide insight into key shifts in the fitness and transmissibility of a pathogen and motivate or validate public health intervention policy (Rambaut et al. 2008; Churcher et al. 2014).

Sampled phylogenies (or genealogies) and incidence curves (or epi-curves) are two related but distinct types of empirical data that inform about the population dynamics and ecology of infectious disease epidemics. Phylogenies map the tree of ancestral relationships among genetic sequences that were sampled from the infected population (Drummond et al. 2005). They facilitate a retrospective view of epidemic dynamics by allowing estimation of the historical effective size or diversity of that population. Incidence curves chart the number of new infecteds observed longitudinally across the epidemic (Wallinga and Teunis 2004). They provide insight into the ongoing rate of spread of that epidemic, by enabling the inference of its effective reproduction number. Minimal examples of each empirical data type are given in Fig. 1(a)(i) and (b)(ii).

Figure 1. — Skyline and renewal model inference problems. The left panels (a) show how the reconstructed phylogeny of infecteds (i) leads to (branching) coalescent events, which form the Poisson count record of (ii). The timing of these observable events encodes information about the piecewise effective population size function to be inferred in (iii). The right panels (b) indicate how infecteds, which naturally conform to the Poisson count record of (iv) are usually only observed at the resolution of days or weeks, leading to the Poisson histogram record in (v). The number of infecteds in these histogram bins inform on the piecewise effective reproduction number in (vi). Both models feature data with size and involve parameters to be estimated. See Materials and Methods for notation.

The effective reproduction number at time Inline graphic , , is a key diagnostic of whether an outbreak is growing or under control. It defines how many secondary infections an infected will, on average, generate (Wallinga and Teunis 2004). The renewal or branching process model (Fraser 2007) is a popular approach for inferring from incidence curves that generalizes the Lotka–Euler equation from ecology (Wallinga and Lipsitch 2007). Renewal models describe how fluctuations in Inline graphic modulate the tree-like propagation structure of an epidemic and have been used to predict Ebola virus disease case counts and assess the transmissibility of pandemic influenza, for example (Fraser et al. 2011; Cori et al. 2013; Nouvellet et al. 2018). Here indicates discrete time, for example, days.

The effective population size at Inline graphic , , is a popular proxy for census (or true) population size that derives from the genetic diversity of the target demography. When applied to epidemics, measures the number of infecteds contributing offspring (i.e., transmitting the disease) to the next generation (Ho and Shapiro 2011). The skyline plot model (Pybus et al. 2000) is a prominent means of estimating Inline graphic from phylogenies that extends the Kingman coalescent process from population genetics (Kingman 1982). Skyline models explain how variations in influence the shape and size of the infected genealogy and have informed on the historical transmission and origin of HIV, influenza and hepatitis C, among others (Pybus et al. 2001; Lemey et al. 2003; Rambaut et al. 2008). Here, Inline graphic is continuous and usually in units of genealogical time.

While renewal and skyline models depict very different aspects of an infectious disease, they possess some statistical similarities. Foremost is their approximation of Inline graphic and by -dimensional, piecewise-constant functions (see Fig. 1(iii)). Here, is the number of parameters to be inferred from the data and time is regressive for phylogenies but progressive for incidence curves. The choice of is critical to the quality of inference. Models with large Inline graphic can better track rapid changes but are susceptible to noise and uncertainty (overfitting) (Cori et al. 2013). Smaller improves estimate precision but reduces flexibility, easily over-smoothing (underfitting) salient changes (Minin et al. 2008). Optimally selecting , in a manner that is justified by the available data, is integral to deriving reliable and sensible conclusions from these models.

Surprisingly, no transparent, principled and easily computable Inline graphic -selection strategy exists. In renewal models, is often set by trial and error, or defined using heuristic sliding windows (Fraser 2007; Cori et al. 2013). Existing theory on window choice is limited, with (Cori et al. 2013) positing a bound on the minimum number of infecteds a window should contain for a given level of estimate uncertainty and (Nouvellet et al. 2018) initially proposing a “naïve-rational” squared error based window-sizing approach, which they subsequently found inferior to other subjective window choices examined in that study. In skyline models, this problem has been more actively researched because the classic skyline plot (Pybus et al. 2000), which forms the core of most modern skyline methods, overfits by construction, that is, it infers a parameter per data-point. Accordingly, various approaches for reducing Inline graphic , by ensuring that each population size parameter is informed by groups of data points, have been proposed.

The generalized skyline plot (Strimmer and Pybus 2001) uses a small sample correction to the Akaike information criterion (AIC) to achieve one such grouping in an interpretable and computable fashion. However, basing analyses solely on the AIC can still lead to overfitting (Kass and Raftery 1995). The Bayesian skyline plot built on the generalized skyline by additionally incorporating a prior distribution that assumed an exponentially distributed autocorrelation between successive parameters (Drummond et al. 2005). This implicitly influenced group choices but is known to oversmooth or underfit (Minin et al. 2008). As a result, later approaches such as the Skyride and Skygrid reverted to the classic skyline plot and applied Gaussian–Markov smoothing prior distributions to achieve implicit grouping (Minin et al. 2008; Gill et al. 2012). However, these methods also raised concerns about underfitting and the relationship between model selection and smoothing prior settings is obscure (Parag et al. 2020a).

Other approaches to effective population size model selection are considerably more involved. The extended Bayesian skyline plot and the multiple change-point method use piecewise-linear functions and apply Bayesian stochastic search variable selection (Heled and Drummond 2008) and reversible jump MCMC (Opgen-Rhein et al. 2005) to optimize Inline graphic . These algorithms, while capable, are more computationally demanding, and lack interpretability (their results are not easily debugged and linear functions do not possess the biological meaningfulness of constant ones, which estimate the harmonic mean of time-varying population sizes, Pybus et al. 2000). Note that we assume phylogenetic data is available without error (i.e., we do not consider extensions of the above or subsequent methods to genealogical uncertainty) and limit the definition of skyline models to those with piecewise-constant functions. In Fig. A4 of the Appendix, we illustrate estimates from some of these approaches on an empirical HIV data set.

New Inline graphic -selection metrics, which can balance between the interpretability of the generalized skyline and the power of more sophisticated Bayesian selection methods, are therefore needed. Here, we attempt to answer this need by developing and validating a minimum description length (MDL)-based approach that unifies renewal and skyline model selection. MDL is a formalism from information theory that treats model selection as equivalent to finding the best way of compressing observed data (i.e., its shortest description) (Rissanen 1978). MDL is advantageous because it includes both model dimensionality and parametric complexity within its definition of model complexity (Rissanen 1996). Parametric complexity describes how the functional relationship between parameters matters (Myung et al. 2006) and is usually ignored by standard selection criteria. However, MDL is generally difficult to compute (Grunwald 2007), which may explain why it has not penetrated the epidemiological or phylodynamics literature.

We overcome this issue by deriving a tractable Fisher information approximation (FIA) to MDL. This is achieved by recognizing that sampled phylogenies and incidence curves both sit within a Poisson point process framework and by capitalizing on the piecewise-constant structure of skyline and renewal models. The result is a pair of analogous FIA metrics that lead to adaptive estimates of Inline graphic and by selecting the most justified by the observed Poisson data. These expressions decompose model complexity into clearly interpretable contributions and are as computable as the standard AIC and the Bayesian information criterion (BIC). We find, over a range of selection problems, that the FIA generally outperforms the AIC and BIC, emphasizing the importance of including parametric complexity. This improvement requires some knowledge about the piecewise parameter space domain.

Materials and Methods

Phylogenetic Skyline and Epidemic Renewal Models

The phylogenetic skyline and epidemic renewal models are popular approaches for solving inference problems in infectious disease epidemiology. The skyline plot or model (Ho and Shapiro 2011) infers the hidden, time-varying effective population size, Inline graphic , from a phylogeny of sequences sampled from that infected population; while the renewal or branching process model (Fraser et al. 2011) estimates the hidden, time-varying effective reproduction number, , from the observed incidence of an infectious disease. Here, indicates continuous time, which is progressive (moving from past to present) in the renewal model, but reversed (retrospective) in the skyline, while Inline graphic is its discrete equivalent. We use here initially as we work in continuous time before deriving the discretized version .

While both models solve different problems, they approximate their variable of interest, Inline graphic , with a -dimensional piecewise-constant function, and assume a Poisson point process (PP) relationship between it and the observed data, , as in Eq. (1).

(1)

Here, Inline graphic is either or and is either phylogenetic or incidence data, depending on the model of interest. The piecewise component of , which is valid over the interval , is . The rate function, depends on and allows us to treat the usually distinct skyline and renewal models within the same Poisson point process framework. We want to estimate the parameter vector Inline graphic from the data over , denoted . We consider two fundamental mechanisms for observing and then show how they apply to skyline and renewal models in turn.

The first, known as a Poisson count record (Snyder and Miller 1991), involves having access to every event time of the Poisson process, that is, Inline graphic is observed directly. Eq. (2) gives the likelihood of these data, in which a total of events occur.

(2)

The Inline graphic event time is and . The set collects all event indices within the piecewise interval and emphasizes that the parameter controlling the rate in is . We denote the portion of events falling within as so that . The number of elements in is therefore . The boundaries of are defined by the times of the Inline graphic event (exclusive) and the event (inclusive). The size of the data is also summarized by and starts at 0.

The second is called a Poisson histogram record (Snyder and Miller 1991) and applies when individual events are not observed. Instead only counts of the events occurring within time bins are available and the size of the data is now defined by the number of bins. We redefine Inline graphic for this data type as the number of bins so that it again controls data size. The bin is defined on interval and has count . We use to denote the bin transformed version of . The likelihood is then given by Eq. (3).

(3)

Here, Inline graphic is the Poisson rate integrated across the observation bin and again defines the indices (of bins in this case) that are controlled by . The time interval over which is valid is . Figure 1 illustrates the relationship between histogram and count records. We now detail how these two observation schemes apply to phylogenetic and incidence data and hence skyline and renewal models.

The skyline model is founded on the coalescent approach to phylogenetics (Kingman 1982). Here, genetic sequences (lineages) sampled from an infected population across time elicit a reconstructed phylogeny or tree, in which these lineages successively merge into their common ancestor. The observed branching or coalescent times of this tree form a Poisson point process that contains information about the piecewise effective population parameters Inline graphic . Since the coalescent event times are observable, phylogenetic data correspond to a Poisson count record. The rate underlying the events for is with counting the lineages in the phylogeny at time (this increases at sample event times and decrements at coalescent times).

The log-likelihood of the observed, serially sampled tree data, denoted by count record Inline graphic is then derived from Eq. (2) to obtain Eq. (4), which is equivalent to standard skyline log-likelihoods (Drummond et al. 2005), but with constant terms removed.

(4)

Here, Inline graphic and counts the number of coalescent events falling within . The endpoints of coincide with coalescent event times, as in (Pybus et al., 2000), (Drummond et al., 2005), and (Parag et al., 2020b). Figure 1a outlines the skyline coalescent inference problem and summarizes its notation. Since Inline graphic can have a large dynamic range (e.g., for exponentially growing epidemics), we will analyze the skyline model under the robust log transform (Parag and Pybus 2019), which ensures good statistical properties.

The maximum likelihood estimate (MLE) and Fisher information (FI) are important measures for describing how estimates of Inline graphic (or ) depend on . We compute the MLE, , and FI, , of the skyline model by solving and and then log-transforming, with as the vector derivative operator (Lehmann and Casella 1998). The result is Eq. (5) (Parag and Pybus 2019).

(5)

For a given Inline graphic , the MLE controls the per-segment bias because as increases decreases. The FI defines the precision, that is, the inverse of the variance around the MLEs, and also (directly) improves with . We will find these two quantities to be integral to formulating our approach to -model selection. Thus, the FI and MLE control the per-segment performance, while Inline graphic determines how well the overall piecewise function adapts to the underlying generating process.

The renewal model is based on the classic (Lotka–Euler) renewal equation or branching process approach to epidemic transmission (Wallinga and Lipsitch 2007). This states that the number of new infecteds depends on past incidence through the generation time distribution, and the effective reproduction number Inline graphic . As incidence is usually observed on a coarse temporal scale (e.g., days or weeks), exact infection times are not available. As a result, incidence data conform to a Poisson histogram record with the number of infecteds observed in the bin denoted . For simplicity, we assume daily (unit) bins. The generation time distribution is specified by Inline graphic , the probability that an infected takes between and days to transmit that infection (Fraser 2007).

The total infectiousness of the disease is Inline graphic . We make the common assumptions that is known (it is disease specific) and stationary (does not change with time) (Cori et al. 2013). If an epidemic is observed for days then the historical incidence counts, , constitute the histogram record informing on the piecewise parameters to be estimated, Inline graphic . The renewal equation asserts that (Fraser 2007). Setting this as the integrated bin rate allows us to obtain the log-likelihood of Eq. (6) from Eq. (3).

(6)

Here, Inline graphic and are sums across the indices , which define the bins composing . Equation 6 is equivalent to the standard renewal log-likelihood (Fraser et al. 2011) but with the constant terms removed.

This derivation emphasizes the statistical similarity between count and histogram records (and hence skyline and renewal models) and allows generalization to variable width histogram records (e.g., irregularly timed epi-curves). Figure 1b illustrates the renewal inference problem and its associated notation. We can compute the relevant MLE and robust FI from Eq. (6) as Eq. (7) (Fraser et al. 2011; Parag and Pybus 2019).

(7)

As each Inline graphic becomes large the per-segment bias decreases. Using results from (Parag and Pybus, 2019), we find the square root transform of to be robust for renewal models, that is, it guarantees optimal estimation properties. We compute the FI under this parametrization to reveal that the total infectiousness controls the precision around our MLEs (via Inline graphic ). This will also improve as increases, but with the caveat that the parameters underlying bigger epidemics (specified by larger historical incidence values and controlled via ) are easier to estimate than those of smaller ones.

In both models, we find a clear piecewise separation of MLEs and FIs. Per-segment bias and precision depend on the quantity of data apportioned to each parameter. This data division is controlled by Inline graphic , which balances per-segment performance against the overall fit of the model to its generating process. Thus, model dimensionality fundamentally controls inference quality. Large means more segments, which can adapt to rapid or changes. However, this also rarefies the per-segment data (grouped sums like Inline graphic or decrease) with both models becoming unidentifiable if . Small improves segment inference, but stiffens the model. We next explore information theoretic approaches to -selection that formally utilize both MLEs and FIs within their decision making.

Model and Parametric Complexity

Our proposed approach to model selection relies on the MDL framework of (Rissanen, 1978). This treats modeling as an attempt to compress the regularities in the observed data, which is equivalent to learning about its statistical structure. MDL evaluates a Inline graphic -parameter model, , in terms of its code length (in e.g., nats or bits) as (Grunwald 2007). Here, computes the length to encode and is the observed data. is the sum of the information required to describe and the data given that is chosen. More complex models have larger (more bits are needed to depict just the model), and smaller Inline graphic (as complex models should better fit the data, there is less remaining information to detail).

If Inline graphic models are available to describe , then the model with best compresses or most succinctly represents the data. The model with is known to possess the desirable properties of generalizability and consistency (Grunwald 2007). The first means that provides good predictions on newly observed data (i.e., it fits the underlying data generating process instead of a specific instance of data obtained from that process), while the second indicates that the selected Inline graphic will converge to the true model index (if one exists) as data increase (Barron et al. 1998; Pitt et al. 2002). If represents the -parameter vector of and is a potential instance of data derived from the same generating process as then the MDL code lengths can be reframed as (Rissanen 1996).

The first term of MDL Inline graphic describes the goodness-of-fit of the model to the observed data, while the second term balances this against the fit to unobserved data ( is the MLE of the parameters of but with as data) from the same process. This is done over all possible data that could be obtained from that process (hence the integral with respect to Inline graphic ) and measures the generalizability of the model. This generalizability term is usually intractable. We therefore use a well-known FI approximation from (Rissanen, 1996), which we denote FIA for in Eq. (8), with “det” as the standard matrix determinant.

(8)

The approximation of Eq. (8) is good, provided certain regularity conditions are met. These mostly relate to the FI being identifiable and continuous in Inline graphic and are not issues for either skyline or renewal models (Myung et al. 2006). While we will apply the FIA within a class of renewal or skyline models, this restriction is unnecessary. The FIA can be used to select among any variously parametrized and non-nested models (Grunwald 2007).

The FIA not only maintains the advantages of MDL, but also has strong links to Bayesian model selection (BMS). BMS compares models based on their posterior evidence, that is, BMS Inline graphic (Kass and Raftery 1995). BMS and MDL are considered the two most complete and rigorous model selection measures (Grunwald 2007). As with MDL, the BMS integral is often intractable and it can be difficult to disentangle and interpret how the formulation of impacts its associated complexity according to these metrics (Pitt et al. 2002). Interestingly, if a Jeffreys prior distribution is used for Inline graphic , then it can be shown that BMS FIA (via an asymptotic expansion) (Myung et al. 2006). Consequently, the FIA uniquely trades off the performance of BMS and MDL for some computational ease.

However, this tradeoff is not perfect. For many model classes the integral of the FI in Eq. (8) can be divergent or difficult to compute (Grunwald 2007). At the other end of the computability–completeness spectrum are standard metrics such as the AIC and BIC, which are quick and simple to construct, calculate, and interpret. These generally penalize a goodness-of-fit term (e.g., Inline graphic ) with the number of parameters and may also consider the total size of the data . Unfortunately, these methods often ignore the parametric complexity of a model, which measures the contribution of the functional form of a model to its overall complexity. Parametric complexity explains why two-parameter sinusoidal and exponential models have non-identical complexities, for example. This concept is detailed in (Pitt et al., 2002) and (Grunwald, 2007) and corresponds to the FI integral term in Eq (8).

This provides the statistical context for our proposing the FIA as a meaningful metric for skyline and renewal models. In the Results section, we will show that the piecewise separable MLEs and FIs (Eqs 5 and 7) of these models not only ensure that the FI integral is tractable, but also guarantee that Eq. (8) is no more difficult to compute than the AIC or BIC. Consequently, our proposed adaptation of the FIA is able to combine the simplicity of standard measures such as the AIC and BIC while still capturing the more sophisticated and comprehensive descriptions of complexity inherent to the BMS and MDL by including parametric complexity. This point is embodied by the relationship between the FIA and BIC. As data size asymptotically increases, the parametric complexity becomes less important (it does not grow with Inline graphic ) and FIA BIC. The BIC is hence a coarser approximation to both the MDL and BMS, than the FIA (Myung et al. 2006).

While the FIA achieves a favorable compromise among interpretability, completeness and computability in its description of complexity, it does depend on roughly specifying the domain of the FI integral. We will generally assume some arbitrary but sensible domain. However, when this is not possible the Qian–Kunsch approximation to MDL, denoted QK Inline graphic and given in Eq. (9), can be used (Qian and Kunsch 1998).

(9)

This approximation trades off some interpretability and performance for the benefit of not having to demarcate the multidimensional domain of integration.

Lastly, we provide some intuition about Eq. (8), which balances fit via the maximum log-likelihood Inline graphic against model complexity, which can be thought of as a geometric volume defining the set of distinguishable behaviors (i.e., parameter distributions) that can be generated from the model. This volume is composed of two terms. The first, , shows, unsurprisingly, that higher model dimensionality, Inline graphic , expands the volume of possible behaviors. Less obvious is the fact that increased data size also enlarges this volume because distinguishability improves with inference resolution. The second term, which is parametric complexity, is invariant to transformations of , independent of and is an explicit volume integral measuring how different functional relationships among the parameters, defined via the FI, influence the possible, distinguishable behaviors the model can describe (Grunwald 2007).

Results

The Insufficiency of Log-Likelihoods

The inference performance of both the renewal and skyline models, for a given data set, strongly depends on the chosen model dimensionality, Inline graphic . As observed previously, current approaches to -selection utilize ad hoc rules or elaborate algorithms that are difficult to interrogate. Here, we emphasize why finding an optimal , denoted , is important and illustrate the pitfalls of inadequately balancing bias and precision. We start by proving that overfitting is a guaranteed consequence of depending solely on the log-likelihood for Inline graphic -selection. While this may seem obvious, early formulations of piecewise models did over-parametrize by setting (Strimmer and Pybus 2001) and our proof can be applied more generally, for example, when selecting among models with . Substituting the MLEs of Eq. (5) and Eq. (7) into Eq. (4) and Eq. (6), we get Eq. (10).

(10)

Both the renewal and skyline log-likelihoods take the form Inline graphic , due to their inherent and dominant piecewise-Poisson structure. Here, and are grouped variables that are directly computed from the observed data ( or ). The most complex model supportable by the data is at , with . As the data size () is fixed, we can clump the indices falling within the duration of the Inline graphic group as and . The log-sum inequality from (Cover and Thomas, 2006) states that . Repeating this across all possible groupings results in Eq. (11).

(11)

Thus, log-likelihood based model selection always chooses the highest dimensional renewal or skyline model. This result also holds when solving Eq. (11) over a subset of all possible Inline graphic , provided smaller models are non-overlapping groupings of larger ones (Hanson and Fu 2004). Thus, it is necessary to penalize with some term that increases with .

The highest Inline graphic -model is most sensitive to changes in , but extremely noisy and likely to overfit the data. This noise is reflected in a poor FI. From Eq. (5) and Eq. (7) it is clear that grouping linearly increases the FI, hence smoothing noise. However, this improved precision comes with lower flexibility. At the extreme of Inline graphic , for example, is approximated by a single, perennial parameter, and the log-likelihood is unchanged for all combinations of data that produce the same grouped sums. This oversmooths and underfits. We will always select if our log-likelihood penalty is too sensitive to dimensionality.

We now present some concrete examples of bad model selection. We use adjacent groupings of size Inline graphic to control that is, every clumps successive indices (the last index is ). In Fig. 2(a), we examine skyline models with periodic exponential fluctuations ((i)–(ii)) and bottleneck variations ((iii)–(iv)). The periodic case describes seasonal epidemic oscillations in infecteds, while the bottleneck simulates the severe decline that results from a catastrophic event. In Fig. 2(b), we investigate renewal models featuring cyclical ((i)–(ii)) and sigmoidal ((iii)–(iv)) Inline graphic dynamics. The cyclical model depicts the pattern of spread for a seasonal epidemic (e.g., influenza), while the sigmoidal one might portray a vaccination policy that quickly leads to outbreak control.

Figure 2. — Skyline and renewal model under and overfitting. Small leads to smooth but biased estimates characteristic of underfitting ((i) and (iii) in (a) and (b)). Large results in noisy estimates that respond well to changes. This is symptomatic of overfitting ((ii) and (iv) in (a) and (b)). The MLEs ( or ) are in blue and the true or in black. Panel (a) shows cyclic and bottleneck skyline models at and (b) focuses on sinusoidal and sigmoidal renewal models at .

In both Fig. 2(a) and (b), we observe underfitting at low Inline graphic ((i) and (iii)) and overfitting at high ((ii) and (iv)). The detrimental effects of choosing the wrong model are not only dramatic, but also realistic. For example, in the skyline examples the underfitted case corresponds to the fundamental Kingman coalescent model (Kingman 1982), which is often used as a null model in phylogenetics. Alternatively, the classic skyline (Pybus et al. 2000), which is at the core of many coalescent inference algorithms, is exactly as noisy as the overfitted case. Correctly, penalizing the log-likelihood is therefore essential for good estimation, and forms the subject of the subsequent section.

Minimum Description Length Selection

Having clarified the impact of non-adaptive estimation, we develop and appraise various, easily computed, model selection metrics, in terms of how they penalize renewal and skyline log-likelihoods. The most common and popular metrics are the AIC and BIC (Kass and Raftery 1995), which we reformulate in Eqs 12 and 13, with Inline graphic or for skyline and renewal models, respectively.

(12)

(13)

By decomposing the AIC and BIC on a per-segment basis (for a model with Inline graphic segments or dimensions), as in Eqs 12 and 13, we gain insight into exactly how they penalize the log-likelihood. Specifically, the AIC simply treats model dimensionality as a proxy for complexity, while the BIC also factors in the total dimension of the available data. A small-sample correction to the AIC, which adds a further Inline graphic to the penalty in Eq. (12), was used in (Strimmer and Pybus, 2001) for skyline models. We found this correction inconsequential to our later simulations and so used the standard AIC only.

As discussed in the Materials and Methods section, these metrics are insufficient descriptions because they ignore parametric complexity. Consequently, we suggested the MDL approximations of Eqs 8 and 9. We now derive and specialize these expressions to skyline and renewal models. Adapting the FIA metric of Eq. (8) forms a main result of this work. Its integral term, Inline graphic , can, in general, be intractable (Rissanen 1996). However, the piecewise structure of both the skyline and renewal models, which leads to orthogonal (diagonal) FI matrices, allows us to decompose as with as the diagonal element of , which only depends on . Note that or for the skyline and renewal model, respectively.

Using this decomposition, we partition Inline graphic across each piecewise segment as . The is known to be invariant to parameter transformations (Grunwald 2007). This is easily verified by using the FI change of variable formula (Lehmann and Casella 1998). This asserts that , with as some function of . The orthogonality of our piecewise-constant FI matrices allows this component-by-component transformation. Hence Inline graphic , which equals . We let denote the robust transform of or for the skyline or renewal model, respectively. Robust transforms make the integral more transparent by removing the dependence of on (Parag and Pybus 2019).

Hence, we use Eq. (5) ( Inline graphic ) and Eq. (7) () to further obtain and . The domain of integration for each parameter is all that remains to be solved. We make the reasonable assumption that each piecewise parameter, , has an identical domain. This is and , with as an unknown model-dependent maximum. The minima of 1 and 0 are sensible for these models. This gives Inline graphic or for the skyline or renewal model. Substituting into and Eq. (8) yields Eq. (14) and Eq. (15).

(14)

(15)

Equations 14 and 15 present an interesting and more complete view of piecewise model complexity. Comparing to Eq. (13) reveals that the FIA further accounts for how the data are divided among segments, making explicit use of the robust FI of each model. This is an improvement over simply using the (clumped) data dimension Inline graphic . Intriguingly, the maximum value of each parameter to be inferred, , is also central to computing model complexity. This makes sense as models with larger parameter spaces can describe more types of dynamical behaviors (Grunwald 2007). By comparing, these terms we can disentangle the relative contribution of the data and parameter spaces to complexity.

One limitation of the FIA is its dependence on the unknown Inline graphic , which is assumed finite. This is reasonable as similar assumptions would be implicitly made to compute the BMS or MDL (in cases where they are tractable). The QK metric (Qian and Kunsch 1998), which also approximates the MDL, partially resolves this issue. We compute QK by substituting FIs and MLEs into Eq. (9). Expressions identical to Eqs 14 and 15 result, except for the Inline graphic -based terms, which are replaced as in Eqs 16 and 17.

(16)

(17)

These replacements require no knowledge of the parameter domain, but still approximate the parametric complexity of the model (Qian and Kunsch 1998). However, in gaining this domain independence we lose some performance (see later sections), and transparency. Importantly, both the FIA and QK are as easy to compute as the AIC or BIC. The similarity in the skyline and renewal model expressions reflects the significance of their piecewise-Poisson structure. We next investigate the practical performance of these metrics.

Adaptive Estimation: Epidemic Renewal Models

We validate our FIA approach on several renewal inference problems. We simulate incidence curves, Inline graphic , via the renewal or branching process relation with as the true effective reproduction number that we wish to estimate and Poiss indicating the Poisson distribution. We construct using a gamma generation time distribution that approximates the one used in (Nouvellet et al., 2018) for Ebola virus outbreaks. We initialize each epidemic with Inline graphic infecteds as in (Cori et al., 2013). We condition on the epidemic not dying out, and remove initial sequences of zero incidence to ensure model identifiability. We consider an observation period of days, and select among models with such that is divisible by . Here counts how many days are grouped to form a piecewise segment (i.e., the size of every Inline graphic ), and model dimensionality, , is bijective in that is, .

We apply the criteria developed above to select among possible Inline graphic -parameter (or -grouped) renewal models. For the FIA, we set as a conservative upper bound on the reproduction number domain. We start by highlighting how the FIA (1) regulates between the over and underfitting extremes from Fig. 2(b), and (2) updates its selected as the data increase. These points are illustrated in Fig. 3(a) and Fig. 3(b). Graphs (i) and (iii) exemplify (1) as the FIA ((iii)) reduces Inline graphic from the maximum chosen by the log-likelihood ((i)), leading to estimates that balance noise against dimensionality. Interestingly, the FIA chooses a minimum of segments for the sigmoidal fall in Fig. 3, and so pinpoints its key dynamics. As the observed data are increased (graphs (ii) and (iv) of Fig. 3(a) and 3(b)) the FIA adapts Inline graphic to reflect the improved resolution that is now justified, hence demonstrating (2). The increased data use more, conditionally independent (on ) curves and have size . The and used now sum over all 6 curves.

Figure 3. — Adaptive cyclical and sigmoidal estimation with FIA. In (a) and (b), graphs (i)–(ii) present optimal log-likelihood based MLEs for ((i)) and ((ii)) observed incidence data streams, simulated under renewal models with time-varying effective reproduction numbers. Graphs (iii)–(iv) give the FIA adaptive estimates at the same settings with . Panels (a) and (b) examine cyclical and sigmoidal (also called logistic) reproduction number profiles, respectively.

While the above examples provide practical insight into the merits of the FIA, they cannot rigorously assess its performance, since continuous Inline graphic functions have no true or . We therefore study two problems in which a true exists: a simple binary classification, and a more complex piecewise model search. In both, we benchmark the FIA against the AIC, BIC, and QK metric over the same set of simulated curves. We note that, when Inline graphic is piecewise-constant, increasing the number of conditionally independent curves improves the probability of recovering . We discuss the results of the first problem in the Appendix (see Fig. A1), where we show that the FIA most accurately identifies between a null model of an uncontrolled epidemic and an alternative model featuring rapid outbreak control. The FIA uniformly outperforms all other metrics at every Inline graphic in this problem, with the QK a close second.

For the second and more complicated problem, we consider models involving piecewise-constant Inline graphic changes after every days, with looping over and days. For every we generate independent epidemics, allowing to vary in each run, with magnitudes uniformly drawn from . Fig. 4(a) illustrates typical random telegraph models at each (these change in magnitude for each run). Key selection results are shown in Fig. 4(b) with Inline graphic , in (i) and in (ii). In both cases, the FIA attains the best overall accuracy, that is, the largest sum of across , followed by the QK (which overlaps the FIA curve in (i)), BIC and AIC. The dominance of both MDL-based criteria suggests that parametric complexity is important. However, the FIA can do worse than the BIC and QK when Inline graphic is large compared to (or if is notably above 0). We discuss these cases in the Appendix (see Fig. A3), explaining why the reduced is used in (ii).

Figure 4. — Renewal model selection. We simulate epidemics from renewal models with and . We test the ability of several model selection criteria to recover the true from among this set. Each epidemic has an independent, piecewise-constant , examples of which are shown in (a). These models change in amplitude but not for every simulation. Panel b) shows the probability of detecting the true model as a function of and (i) considers with while (ii) uses and . The FIA performs best at every in (i) and overall in (ii).

Adaptive Estimation: Phylogenetic Skyline Models

We verify the FIA performance on several skyline problems. We simulate serially sampled phylogenies with sampled tips spread evenly over some interval using the phylodyn R package of (Karcher et al., 2017). Increasing the sampling density within that interval increases overall data size Inline graphic (each pair of sampled tips can produce a coalescent event). We define our segments as groups of coalescent events. Skyline model selection is more involved because the end-points of the segments coincide with coalescent events. While this ensures statistical identifiability, it means that grouping is sensitive to phylogenetic noise (Strimmer and Pybus 2001), and that Inline graphic changes for a given if varies (). This can result in MLEs, even at optimal groupings, appearing delayed or biased relative to , when is not a grouped piecewise function. Methods are currently under being developed to resolve these biases (Parag et al. 2020b).

Nevertheless, we start by examining how our FIA approach mediates the extremes of Fig. 2(a). We restrict our grouping parameter to Inline graphic , set () and apply the FIA of Eq. (14) to obtain Fig. 5(a) and (b). Two points are immediately visible: (1) the FIA ((iii)–(iv)) regulates the noise from the log-likelihood ((i)–(ii)), and (2) the FIA supports higher when the data are increased ((iv)). Specifically, the FIA characterizes the bottleneck of Fig. 5(b) using a minimum of segments but with a delay. As data accumulate, more groups can be justified and so the FIA is able to compensate for the delay. Note that the last 1–2 coalescent events are often truncated, as they can span half the time-scale, and bias all model selection criteria (Nordborg 2001). In the Appendix (see Fig. A4), we show how the sensitivity of the FIA to event density compares to other methods on empirical data (see the Materials and Methods section).

Figure 5. — Adaptive periodic and bottleneck estimation with FIA. For (a) and (b), graphs (i)–(ii) present inferred under optimal log-likelihood groupings, while (iii)–(iv) show corresponding estimates under the FIA at . Graphs (i) and (iii) feature while (ii) and (iv) have (data size increases). Panels (a) and (b) respectively consider periodically exponential and bottleneck population size changes, with phylogenies sampled approximately uniformly over and time units.

We consider two model selection problems involving a piecewise-constant Inline graphic , to formally evaluate the FIA against the QK, BIC, and AIC. We slightly abuse notation by redefining as the number of coalescent events per piecewise segment. The first is a binary hypothesis test between a Kingman coalescent null model (Kingman 1982) and an alternative with a single shift to Inline graphic . We investigate this problem in the Appendix and show in Fig. A2 (i) that the FIA is, on average, better at selecting the true model than other criteria, with the QK a close second. Further, these metrics generally improve in accuracy with increased data. Closer examination also reveals that the FIA and QK have the best overall true positive and lowest false positive rates (Fig. A2(ii)).

The second classification problem is more complex, requiring selection from among 5 possible square waves, with half-periods that are powers of 2. We define 15 change-point times at multiples of Inline graphic time units (i.e., there are 16 components) and allow to fluctuate between maximum and . At each change-point and 0, equal numbers of samples are introduced, to allow approximately coalescent events per component (the phylogeny has total events). The possible models are in Fig. 6(a). A similar problem, but for Gaussian MDL selection, was investigated in (Hanson and Fu, 2004). We simulate 200 phylogenies from each wave and compute the probability that each metric selects the correct model (i.e., Inline graphic ) at ((i)) and ((ii)) with in Fig. 6(b). The group size () search space is times the half-period of every wave.

Figure 6. — Skyline model selection. We simulate 200 sampled phylogenies from each of the 5 square wave models of (a), with coalescent events per segment. Each square wave varies between and (ratios shown on y axes), and occurs with varying half-periods over 16 segments (x axes) of duration . Each phylogeny contains sampled tips at 0 and every multiple of time units after. Panel (b) gives the probability that several model selection criteria select the true () model from among these waves at for ((i)) and ((ii)). The FIA is the most accurate criterion on average and improves with and as gets closer to the true .

We find that the FIA has the best overall accuracy at both Inline graphic settings (i.e., the largest sum of across ), though the BIC is not far behind. The QK displays slightly worse performance than the BIC and the AIC is the worst (except at low ). At ((i)), there is a greater mismatch with and so the FIA is not as dominant. As ((ii)) gets closer to Inline graphic this issue dissipates. We discuss this dependence of FIA on in the Appendix (see Fig. A3). Observe that the improves for most metrics as the sample phylogeny data size () increases (consistency). The strong performance of the FIA confirms the impact of parametric complexity, while the suboptimal QK curves suggest that these advantages are sometimes only realizable when this complexity component is properly specified.

Discussion

Identifying salient fluctuations in effective population size, Inline graphic , and effective reproduction number, , is essential to understanding the retrospective and continuing behavior of an epidemic, at the population level. A significant swing in could inform on whether an outbreak is exponentially growing (e.g., if for a sustained period) or if enacted control measures are working (e.g., if Inline graphic falls rapidly below ) (Fraser et al. 2011; Cori et al. 2013). Similarly, sharp changes in could evidence the historical impact of a public health policy (e.g., if has a bottleneck or logistic growth) or corroborate hypotheses about past transmissions (e.g., if correlates with seasonal changes) (Rambaut et al. 2008; Pybus et al. 2001). Together, Inline graphic and can provide a holistic view of the temporal dynamics of an epidemic, with their change-points signifying the impact of climatic, ecological, and anthropogenic factors (Ho and Shapiro 2011.

Piecewise-constant approaches, such as skyline plots and renewal models, are tractable and popular ways of separating insignificant fluctuations (the constant segments) from meaningful ones (the change points). However, the efficacy of these models requires principled and data-justified selection of their dimension, Inline graphic . Failure to do so, as in Fig. 2, could result in salient changes being misidentified (i.e., underfitting) or random noise being over-interpreted (i.e., overfitting). Existing approaches to -selection for renewal models usually involve heuristics or trial and error (Cori et al. 2013). Skyline models feature a more developed set of Inline graphic -selection methods but many of these, though widely used, are either computationally complex (e.g., involving sophisticated MCMC algorithms) (Heled and Drummond 2008) or difficult to interpret (e.g., when is implicitly controlled with smoothing prior distributions) (Ho and Shapiro 2011; Parag et al. 2020a).

We therefore focused on finding a Inline graphic -selection metric that favorably compromises among simplicity, transparency, and performance. We started by proving that ascribing solely on the evidence of the log-likelihood (i.e., the model fit) guarantees overfitting (see Eq. (11)). Consequently, it is absolutely necessary to penalize the log-likelihood with a measure of model complexity. However, getting this measure wrong can just as easily lead to underfitting. This is a known issue in common skyline methods that apply smoothing prior distributions for example, where the prior-induced penalty is unclear (Minin et al. 2008). Standard metrics, such as the AIC and BIC, are easy to compute and offer transparent penalties; treating model complexity as either equivalent to Inline graphic or mediated by the observed data size (see Eqs 12 and 13). However, this description, while useful, is incomplete, and neglects parametric complexity (Rissanen 1996).

Parametric complexity describes how the functional relationship among parameters matters. MDL and BMS, which are the most powerful model selection methods, both account for parametric complexity but are often intractable (Grunwald 2007). The general FIA of Eq. (8) approximates both the MDL and BMS and defines this complexity as an integral across parameter space (Myung et al. 2006). Unfortunately, this integral is often difficult to evaluate, also rendering the FIA impractical. However, we found that the piecewise-constant nature of renewal and skyline models, together with their Poisson data structures, allowed us to analytically solve this integral and obtain Eqs 14 and 15. These expressions form our main results, are of similar computability to the AIC and BIC, and disaggregate model complexity into interpretable elements as follows for Eq. (15).

A similar breakdown exists for Eq. (14). Intriguingly, the parametric complexity now only depends on the unknown parameter domain maximum, Inline graphic .

Knowledge of Inline graphic is the main cost of our metric. This parameter limit requirement is not unusual and can often improve estimates. In (Parag and Pybus, 2017) and (Parag and Pybus, 2018), this knowledge facilitated exact inference from sampled phylogenies, for example. Similar domain choices are also implicitly made when setting prior distributions on Inline graphic and or practically performing MCMC sampling. In Fig. A3, we explored the effect of misspecifying . While drastic mismatches between the true and assumed can be detrimental, we found that in some cases poor knowledge of can be inconsequential. We adapted the QK metric (Qian and Kunsch 1998) to obtain Eqs 16 and 17 which, though less interpretable than the FIA, also somewhat account for parametric complexity and offer good performance should reasonable knowledge of Inline graphic be unavailable.

The FIA balances performance with simplicity. The MDL method it approximates has the desirable theoretical properties of generalizability (it mediates overfitting and underfitting) and consistency (it selects the true model with increasing probability as data accumulate) (Grunwald 2007). We therefore investigated whether the FIA maintained these properties. In Figs 3 and 5, we demonstrated that the FIA not only inherits the generalizability property, but also regulates its selections based on the available data. Higher data resolution supports larger Inline graphic as both bias and variance can be simultaneously reduced under these conditions (van Erven and Grunwald 2012). Figures 4, 6, A1, and A2 confirmed the consistency of the FIA, in addition to benchmarking its performance against the comparable AIC and BIC. We found that the FIA consistently outperformed all other metrics, provided that Inline graphic was not drastically misspecified.

We recommend the FIA as a principled, transparent and computationally simple means of adaptively estimating informative changes in Inline graphic and , and for diagnosing the relative contributions of different components of model complexity. We provide software for computing the FIA in the Supplementary Material. The FIA can be easily interfaced with the EpiEstim and projections packages (Cori et al. 2013; Nouvellet et al. 2018), which are common renewal model toolboxes for analyzing real epidemic data, to formalize the window size choices used in Inline graphic inference. Until now, these choices have been subjective. For skyline analyses, we propose the FIA as a useful diagnostic for verifying the estimates generated by phylogenetic software such as BEAST or phylodyn (Karcher et al. 2017; Suchard et al. 2018). This can help validate or interrogate the outputs of common but complex MCMC methods. Comparing MCMC grouping choices to the FIA-optimized Inline graphic for example might help flag when known issues such as oversmoothing (underfitting) are biasing estimates (Minin et al. 2008; Parag et al. 2020a).

Sampled phylogenies and incidence curves, and hence skyline and renewal models, have often been treated separately in the epidemiological and phylodynamics literature. While they do solve different problems, we showed how refocusing on their shared piecewise-Poisson framework exposed their common complexity properties. Our information theoretic approach could also generate broad insight into other distinct models in genetics, molecular evolution, and ecology (Parag and Pybus 2019). The structured coalescent model is often used to estimate migration rate and population size changes from phylogeographic data (Beerli and Felsenstein 2001) while sequential Markovian coalescent methods are widely applied to infer demographic changes from metazoan genomes (Li and Durbin 2011). These models all involve Poisson count and histogram records and piecewise parameter sets and are promising candidates for future application of our metrics.

Appendix

Binary Model Selection

We examine the binary classification performance of the FIA, QK, BIC, and AIC for both renewal and skyline models. For the first, we set Inline graphic days and use a constant null model with , to exemplify an uncontrolled epidemic. The alternative model changes to , simulating rapid control at (inset of Fig. A1). We randomly generate epidemics with some null model probability () and compute the frequentist probability that each criterion selects the correct model ( Inline graphic ) in Fig. A1. We find that the FIA uniformly outperforms all other criteria, with the QK as its closest competitor. The AIC performs poorly, as does (not shown), because they are biased towards the more complex model. Relative metric performance is unchanged if we instead set (an accelerating epidemic).

Figure A1. — Binary renewal model selection. The consistency of several selection criteria is tested on a binary classification problem in which the null model 1 has no change in (solid, inset), while the alternative model 2 has a rapid decline (dashed, inset). We generate independent incidence curves randomly according to model 1 with probability , and compute the ability of each criterion to decipher the correct model, . The FIA outperforms other metrics at every with QK a close second.

For the skyline problem, we test between a Kingman coalescent null model (Kingman 1982) with Inline graphic , and an alternative with a single shift to that simulates rapid change potentially due to some environmental driver at units. We set and generate 500 replicate phylogenies, with controlling the quantity of data available per piecewise component (so the total number of coalescent events is Inline graphic ). This is a slight abuse of previous definitions of but is more useful here as we want for the null model and for the alternative. We introduce sampled tips at 0 and time units only. The grouping parameter search space is with . Figure A2 presents our main results, showing that the FIA is, overall, more accurate (achieving a higher sum of Inline graphic ) with the QK second. We find relative performance to be largely unchanged with and to hold when is doubled. Observe that all metrics except the AIC (which is known to be inconsistent) improve with data size, .

Figure A2. — Binary skyline model selection. We simulate 500 conditionally independent phylogenies from skyline models and test the classification ability of model selection criteria. The null model is a Kingman coalescent with , and the alternative features a sharp fall to at time units. The sampled tips of the phylogeny are introduced at and only. Graph (i) gives the probability of correct classification as a function of data size . The FIA performs best, on average, but the BIC is better at small . Graph (ii) gives the true (TPR) and false positive rates (FPR) of the metrics. The FIA and QK have the best overall rates.

Figure A3. — FIA parameter space sensitivity. In (a), we repeat the simulations from Fig. 4(b)(ii) but at different . The accuracy of the FIA clearly depends on the discrepancy between and , and becomes inferior when is dramatically above this maximum ((i)). In (b), we revisit the simulations of Fig. 6(a), but vary between and . The AIC, BIC, and QK from Fig. 6(a) are in cyan, while the best and worst case FIA values are in grey. While the FIA does depend on , interestingly, its performance is still superior on average, for both ((i)) and ((ii)).

Figure A4. — HIV demographic estimates. We estimate the effective population size history underlying an empirical HIV phylogeny with coalescent events. All tips of this tree are sampled in 1997 from the Democratic Republic of Congo. We plot the generalized skyline, the multiple change-point method and the FIA-optimized skyline estimates in (i), (ii), and (iv) in black against the classic skyline in grey. In (iii), we show the optimization over of the FIA (black) against its associated BIC (grey).

Weaknesses of Piecewise Model Selection

In Results section, we found the FIA to be a viable and top performing model selection strategy, when compared to standard metrics of similar computability such as the AIC and BIC. However, the FIA can do worse if the parameter maximum Inline graphic is large relative to the actual domain or space from which or is drawn. In such cases, the incorrect parameter bounds can cause the FIA to overestimate the complexity of the generating renewal or skyline models. While the QK criterion offers a more stable and reasonably performing MDL alternative, it is less interpretable. Here, we examine the nature of this Inline graphic dependence, and discuss some general issues limiting piecewise model selection.

In Fig. 4(b)(ii), we showed the FIA outperforming other metrics for a model selection problem over piecewise Inline graphic functions drawn within the artificial range (the AIC was better at higher due to its tendency to overfit). We achieved this by setting to the true . However, when there is a significant mismatch between and we find that the FIA is notably inferior to the QK and BIC. Figure A3(a) illustrates, at Inline graphic and , how the magnitude of this mismatch influences relative performance. However, this effect is not always important, as seen in Fig. 4(b)(i), where and . The skyline model also has this FIA -dependence. We re-examine the square wave model selection problem of Fig. 6(b), but for ranging between Inline graphic and . Figure A3(b) plots the resulting changes in the FIA detection probability at ((i)) and ((ii)). There we observe, that while the FIA is sensitive to , it still performs well over the entire range. Thus, the FIA can sometimes be a choice selection metric, even in the absence of reasonable parameter space knowledge.

Lastly, we comment on some general issues limiting Inline graphic -selection performance of any metric on renewal and skyline models. The MLEs and FIs of the renewal model depend on the and groups. As a result, epidemics with low observed incidence (i.e., likely to have ) and diseases possessing sharp (low variance) generation time distributions (i.e., likely to feature Inline graphic ) will be difficult to adaptively estimate. This is why we conditioned on the epidemic not dying out. Similarly, the MLEs and FIs of the skyline are sensitive to , meaning that it is necessary to ensure each group has coalescent events falling within its duration. Forcing segment end-points to coincide with coalescent events, as in (Drummond et al., 2005), guards against this identifiability problem (Parag and Pybus 2019). However, skyline model selection remains difficult even after averting this issue.

This follows from the random timing of coalescent events, which means that regular Inline graphic groupings can miss change-points, and that long branches can bias analysis (Parag et al. 2020b). These are known skyline plot issues and evidence why we truncated the last few events in the simulations. Further, there will always be limits to the maximum temporal precision attainable by Inline graphic and estimates under renewal and skyline models. It is impossible to infer changes in on a finer time scale than that of the observed incidence curve or estimate more segments than the number of available coalescent events (Parag and Pybus 2019). This cautions against naively applying the criteria we have developed here. It is necessary to first understand and then prepare for these preconditions before sensible model selection results can be obtained. Practical model selection is rarely straightforward and the performance of most metrics is often only strictly guaranteed under asymptotic conditions (Grunwald 2007).

Empirical Case Study: HIV-1

We consider an empirical, ultrametric phylogeny composed of HIV-1 sequences sampled in 1997 from the Democratic Republic of Congo. This data set was previously examined in (Strimmer and Pybus, 2001). In Fig. A4, we illustrate several estimates of the effective population size underling this phylogeny. As a baseline, we plot the classic skyline plot (Pybus et al. 2000) for this phylogeny in grey on (i), (ii), and (iv). This represents the maximally parametrized skyline model and is known to overfit. Because the classic skyline converts every coalescent interval into a population size estimate, it also portrays where the events in the HIV tree are located. The clustering of estimates between 1940 and 1980 indicates that this period is considerably more informative (i.e., has a higher count record event density) than the time-regions around it.

We investigate two extremes of skyline model selection methodology. In Fig. A4(i), we consider the generalized skyline plot, which uses a small sample AIC. This method is simple, computable and improves on the noisy classic skyline by using Inline graphic . However, it does require more extensive optimization than our metrics (it chooses groups based on their durations) and can be susceptible to overfitting (Kass and Raftery 1995). In (ii), we plot estimates from the multiple change-point method of (Opgen-Rhein et al., 2005). This approach is computationally intensive and lacks transparency but uses powerful reversible jump MCMC algorithms. Its output smooths over all demographic fluctuations.

In (iv), we compute the FIA solution with Inline graphic , which mediates between (i) and (ii). The FIA responds to the varying data-density across the tree by using notably more parameters than (i) in the 1940–1980 period, where it can be confident of a smooth trend, and fewer otherwise. This approach to group choice was theoretically supported in (Parag and Pybus 2019). In (iii), we compare the FIA ( Inline graphic ) and BIC () curves, where we find that they agree at small due to the large sample size in that region (the BIC is an asymptotic approximation to the FIA). However, at larger , where the space of parameter interactions is more notable, the parametric complexity terms matter.

Supplementary material

Data (and code in Matlab) available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.mpg4f4qv6.

Funding

K.V.P. and C.A.D. acknowledge joint funding from the UK Medical Research Council (MRC) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement and the EDCTP2 programme supported by the European Union (grant MR/R015600/1). C.A.D. thanks the UK National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Modelling Methodology at Imperial College London in partnership with Public Health England (PHE) for funding (grant HPRU-2012-10080)

References

Barron A., Rissanen J., Yu B. 1998. The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory 44:2743–2760. [Google Scholar]
Beerli P., Felsenstein J. 2001. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA, 98:4563–4568. [DOI] [PMC free article] [PubMed] [Google Scholar]
Churcher T., Cohen J., Novotny J., Ntshalintshali N., Kunene S., and Cauchemez S. 2014. Measuring the path toward malaria elimination. Science 344:1230–1232. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cori A., Ferguson N., Fraser C., and Cauchemez S. 2013. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178:1505–1512. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cover T. and Thomas J. 2006. Elements of information theory. 2nd ed New York: John Wiley and Sons. [Google Scholar]
Drummond A., Rambaut A., Shapiro B., and Pybus O. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22:1185–1192. [DOI] [PubMed] [Google Scholar]
Fraser C. 2007. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One 8:e758. [DOI] [PMC free article] [PubMed] [Google Scholar]
Fraser C., Cummings D., Klinkenberg D., Burke D., and Ferguson N. 2011. Influenza transmission in households during the 1918 pandemic. Am. J. Epidemiol. 174:505–514. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gill M., Lemey P., Faria N., Rambaut A., Shapiro B., and Suchard M. 2012. Improving Bayesian population dynamics inference:a coalescent-based model for multiple loci. Mol. Biol. Evol. 30:713–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
Grunwald P. 2007. The minimum description length principle. Cambridge (MA): The MIT Press. [Google Scholar]
Hanson A. and Fu P. 2004. Applications of MDL to selected families of models In: Advances in minimum description length:theory and applications. MIT Press. [Google Scholar]
Heled J. and Drummond A. 2008. Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8(289). [DOI] [PMC free article] [PubMed] [Google Scholar]
Ho S. and Shapiro B. 2011. Skyline-plot methods for estimating demographic history from nucleotide sequences. Mol. Ecol. Res. 11:423–434. [DOI] [PubMed] [Google Scholar]
Karcher M., Palacios J., Lan S., Minin V. 2017. PHYLODYN:an R package for phylodynamic simulation and inference. Mol. Ecol. Res. 17:96–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kass R. and Raftery A. 1995. Bayes factors. J. Am. Stat. Assoc. 90:773–795. [Google Scholar]
Kingman J. 1982. On the genealogy of large populations. J. Appl. Prob. 19:27–43. [Google Scholar]
Lehmann E. and Casella G. 1998. Theory of point estimation. 2nd ed Springer. [Google Scholar]
Lemey P., Pybus O., Wang B., Saksena N., Salemi M., and Vandamme A. 2003. Tracing the origin and history of the HIV-2 epidemic. Proc. Natl. Acad. Sci. USA 100:6588–6592. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H. and Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475:493–496. [DOI] [PMC free article] [PubMed] [Google Scholar]
Minin V., Bloomquist E., and Suchard M. 2008. Smooth skyride through a rough skyline:Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25:1459–1471. [DOI] [PMC free article] [PubMed] [Google Scholar]
Myung J., Navarro D., and Pitt M. 2006. Model selection by normalized maximum likelihood. J. Math. Psychol. 50:167–179. [Google Scholar]
Nordborg M. 2001. Handbook of statistical genetics:coalescent theory. Chichester, UK: John Wiley and Sons. [Google Scholar]
Nouvellet P., Cori A., Garske T., Blake I., Dorigatti I., Hinsley W., Jombart T., Mills H., Nedjati-Gilani G., Van Kerkhove M., Fraser C., Donnelly C., Ferguson N., and Riley S. 2018. A simple approach to measure transmissibility and forecast incidence. Epidemics 22:29–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
Opgen-Rhein R., Fahrmeir L., and Strimmer K. 2005. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evol. Biol. 5(6). [DOI] [PMC free article] [PubMed] [Google Scholar]
Parag K., Pybus O. 2017. Optimal point process filtering and estimation of the coalescent process. J. Theor. Biol. 421:153–167. [DOI] [PubMed] [Google Scholar]
Parag K., Pybus O. 2018. Exact Bayesian inference for phylogenetic birth-death models. Bioinformatics 34:3638–3645. [DOI] [PubMed] [Google Scholar]
Parag K., Pybus O. 2019. Robust design for coalescent model inference. Syst. Biol. 68:730–743. [DOI] [PubMed] [Google Scholar]
Parag K., Pybus O., Wu C. 2020a. Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions? BioRxiv 920215. [DOI] [PMC free article] [PubMed] [Google Scholar]
Parag K., du Plessis L., Pybus O. 2020b. Jointly inferring the dynamics of population size and sampling intensity from molecular sequences. Mol. Biol. Evol. (in press) doi: 10.1093/molbev/msaa016. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pitt M., Myung I., Zhang S. 2002. Toward a method of selecting among computational models of cognition. Psych. Rev. 109:472–491. [DOI] [PubMed] [Google Scholar]
Pybus O., Rambaut A., Harvey P. 2000. An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics 155:1429–1437. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pybus O., Charleston M., Gupta S., Rambaut A., Holmes E., and Harvey P. 2001. The epidemic behavior of the hepatitis C virus. Science 292:2323–2325. [DOI] [PubMed] [Google Scholar]
Qian G., Kunsch H. 1998. Some notes on Rissanen’s stochastic complexity. IEEE Trans. Inf. Theory 44:782–786. [Google Scholar]
Rambaut A., Pybus O., Nelson M., Viboud C., Taubenberger J., and Holmes E. 2008. The genomic and epidemiological dynamics of human influenza A virus. Nature 453:615–619. [DOI] [PMC free article] [PubMed] [Google Scholar]
Rissanen J. 1978. Modeling by shortest data description. Automatica 14:465–471. [Google Scholar]
Rissanen J. 1996. Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 42:40–47. [Google Scholar]
Snyder D., Miller M. 1991. Random point processes in time and space. 2nd ed New York: Springer. [Google Scholar]
Strimmer K., Pybus O. 2001. Exploring the demographic history of DNA sequences using the generalized skyline plot. Mol. Biol. Evol. 18:2298–2305. [DOI] [PubMed] [Google Scholar]
Suchard M., Lemey P., Baele G., Ayres D., Drummond A., and Rambaut A. 2018. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turchin P. 2003. Complex population dynamics:a theoretical/empirical synthesis. Princeton University Press. [Google Scholar]
van Erven T. and Grunwald P. 2012. Catching up faster by switching sooner:a predictive approach to adaptive estimation with an application to the AIC–BIC dilemma. J. R. Stat. Soc. B, 74:361–417. [Google Scholar]
Wallinga J. and Lipsitch M. 2007. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. R. Soc. B, 274:599–604. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wallinga J. and Teunis P. 2004. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am. J. Epidemiol. 160:509–516. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B1] Barron A., Rissanen J., Yu B. 1998. The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory 44:2743–2760. [Google Scholar]

[B2] Beerli P., Felsenstein J. 2001. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA, 98:4563–4568. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] Churcher T., Cohen J., Novotny J., Ntshalintshali N., Kunene S., and Cauchemez S. 2014. Measuring the path toward malaria elimination. Science 344:1230–1232. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] Cori A., Ferguson N., Fraser C., and Cauchemez S. 2013. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178:1505–1512. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5] Cover T. and Thomas J. 2006. Elements of information theory. 2nd ed New York: John Wiley and Sons. [Google Scholar]

[B6] Drummond A., Rambaut A., Shapiro B., and Pybus O. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22:1185–1192. [DOI] [PubMed] [Google Scholar]

[B7] Fraser C. 2007. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS One 8:e758. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] Fraser C., Cummings D., Klinkenberg D., Burke D., and Ferguson N. 2011. Influenza transmission in households during the 1918 pandemic. Am. J. Epidemiol. 174:505–514. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] Gill M., Lemey P., Faria N., Rambaut A., Shapiro B., and Suchard M. 2012. Improving Bayesian population dynamics inference:a coalescent-based model for multiple loci. Mol. Biol. Evol. 30:713–724. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] Grunwald P. 2007. The minimum description length principle. Cambridge (MA): The MIT Press. [Google Scholar]

[B11] Hanson A. and Fu P. 2004. Applications of MDL to selected families of models In: Advances in minimum description length:theory and applications. MIT Press. [Google Scholar]

[B12] Heled J. and Drummond A. 2008. Bayesian inference of population size history from multiple loci. BMC Evol. Biol. 8(289). [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] Ho S. and Shapiro B. 2011. Skyline-plot methods for estimating demographic history from nucleotide sequences. Mol. Ecol. Res. 11:423–434. [DOI] [PubMed] [Google Scholar]

[B14] Karcher M., Palacios J., Lan S., Minin V. 2017. PHYLODYN:an R package for phylodynamic simulation and inference. Mol. Ecol. Res. 17:96–100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] Kass R. and Raftery A. 1995. Bayes factors. J. Am. Stat. Assoc. 90:773–795. [Google Scholar]

[B16] Kingman J. 1982. On the genealogy of large populations. J. Appl. Prob. 19:27–43. [Google Scholar]

[B17] Lehmann E. and Casella G. 1998. Theory of point estimation. 2nd ed Springer. [Google Scholar]

[B18] Lemey P., Pybus O., Wang B., Saksena N., Salemi M., and Vandamme A. 2003. Tracing the origin and history of the HIV-2 epidemic. Proc. Natl. Acad. Sci. USA 100:6588–6592. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] Li H. and Durbin R. 2011. Inference of human population history from individual whole-genome sequences. Nature 475:493–496. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] Minin V., Bloomquist E., and Suchard M. 2008. Smooth skyride through a rough skyline:Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25:1459–1471. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] Myung J., Navarro D., and Pitt M. 2006. Model selection by normalized maximum likelihood. J. Math. Psychol. 50:167–179. [Google Scholar]

[B22] Nordborg M. 2001. Handbook of statistical genetics:coalescent theory. Chichester, UK: John Wiley and Sons. [Google Scholar]

[B23] Nouvellet P., Cori A., Garske T., Blake I., Dorigatti I., Hinsley W., Jombart T., Mills H., Nedjati-Gilani G., Van Kerkhove M., Fraser C., Donnelly C., Ferguson N., and Riley S. 2018. A simple approach to measure transmissibility and forecast incidence. Epidemics 22:29–35. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] Opgen-Rhein R., Fahrmeir L., and Strimmer K. 2005. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evol. Biol. 5(6). [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] Parag K., Pybus O. 2017. Optimal point process filtering and estimation of the coalescent process. J. Theor. Biol. 421:153–167. [DOI] [PubMed] [Google Scholar]

[B26] Parag K., Pybus O. 2018. Exact Bayesian inference for phylogenetic birth-death models. Bioinformatics 34:3638–3645. [DOI] [PubMed] [Google Scholar]

[B27] Parag K., Pybus O. 2019. Robust design for coalescent model inference. Syst. Biol. 68:730–743. [DOI] [PubMed] [Google Scholar]

[B28] Parag K., Pybus O., Wu C. 2020a. Are skyline plot-based demographic estimates overly dependent on smoothing prior assumptions? BioRxiv 920215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] Parag K., du Plessis L., Pybus O. 2020b. Jointly inferring the dynamics of population size and sampling intensity from molecular sequences. Mol. Biol. Evol. (in press) doi: 10.1093/molbev/msaa016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B30] Pitt M., Myung I., Zhang S. 2002. Toward a method of selecting among computational models of cognition. Psych. Rev. 109:472–491. [DOI] [PubMed] [Google Scholar]

[B31] Pybus O., Rambaut A., Harvey P. 2000. An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics 155:1429–1437. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] Pybus O., Charleston M., Gupta S., Rambaut A., Holmes E., and Harvey P. 2001. The epidemic behavior of the hepatitis C virus. Science 292:2323–2325. [DOI] [PubMed] [Google Scholar]

[B33] Qian G., Kunsch H. 1998. Some notes on Rissanen’s stochastic complexity. IEEE Trans. Inf. Theory 44:782–786. [Google Scholar]

[B34] Rambaut A., Pybus O., Nelson M., Viboud C., Taubenberger J., and Holmes E. 2008. The genomic and epidemiological dynamics of human influenza A virus. Nature 453:615–619. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] Rissanen J. 1978. Modeling by shortest data description. Automatica 14:465–471. [Google Scholar]

[B36] Rissanen J. 1996. Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 42:40–47. [Google Scholar]

[B37] Snyder D., Miller M. 1991. Random point processes in time and space. 2nd ed New York: Springer. [Google Scholar]

[B38] Strimmer K., Pybus O. 2001. Exploring the demographic history of DNA sequences using the generalized skyline plot. Mol. Biol. Evol. 18:2298–2305. [DOI] [PubMed] [Google Scholar]

[B39] Suchard M., Lemey P., Baele G., Ayres D., Drummond A., and Rambaut A. 2018. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B40] Turchin P. 2003. Complex population dynamics:a theoretical/empirical synthesis. Princeton University Press. [Google Scholar]

[B41] van Erven T. and Grunwald P. 2012. Catching up faster by switching sooner:a predictive approach to adaptive estimation with an application to the AIC–BIC dilemma. J. R. Stat. Soc. B, 74:361–417. [Google Scholar]

[B42] Wallinga J. and Lipsitch M. 2007. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. R. Soc. B, 274:599–604. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B43] Wallinga J. and Teunis P. 2004. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am. J. Epidemiol. 160:509–516. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models

Kris V Parag

Christl A Donnelly

Roles

Abstract

Figure 1.

Materials and Methods

Phylogenetic Skyline and Epidemic Renewal Models

Model and Parametric Complexity

Results

The Insufficiency of Log-Likelihoods

Figure 2.

Minimum Description Length Selection

Adaptive Estimation: Epidemic Renewal Models

Figure 3.

Figure 4.

Adaptive Estimation: Phylogenetic Skyline Models

Figure 5.

Figure 6.

Discussion

Appendix

Binary Model Selection

Figure A1.

Figure A2.

Figure A3.

Figure A4.

Weaknesses of Piecewise Model Selection

Empirical Case Study: HIV-1

Supplementary material

Funding

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Adaptive Estimation for Epidemic Renewal and Phylogenetic Skyline Models

Kris V Parag

Christl A Donnelly

Roles

Abstract

Figure 1.

Materials and Methods

Phylogenetic Skyline and Epidemic Renewal Models

Model and Parametric Complexity

Results

The Insufficiency of Log-Likelihoods

Figure 2.

Minimum Description Length Selection

Adaptive Estimation: Epidemic Renewal Models

Figure 3.

Figure 4.

Adaptive Estimation: Phylogenetic Skyline Models

Figure 5.

Figure 6.

Discussion

Appendix

Binary Model Selection

Figure A1.

Figure A2.

Figure A3.

Figure A4.

Weaknesses of Piecewise Model Selection

Empirical Case Study: HIV-1

Supplementary material

Funding

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases