Abstract
Many important stochastic counting models can be written as general birth-death processes (BDPs). BDPs are continuous-time Markov chains on the non-negative integers in which only jumps to adjacent states are allowed. BDPs can be used to easily parameterize a rich variety of probability distributions on the non-negative integers, and straightforward conditions guarantee that these distributions are proper. BDPs also provide a mechanistic interpretation – birth and death of actual particles or organisms – that has proven useful in evolution, ecology, physics, and chemistry. Although the theoretical properties of general BDPs are well understood, traditionally statistical work on BDPs has been limited to the simple linear (Kendall) process. Aside from a few simple cases, it remains impossible to find analytic expressions for the likelihood of a discretely-observed BDP, and computational difficulties have hindered development of tools for statistical inference. But the gap between BDP theory and practical methods for estimation has narrowed in recent years. There are now robust methods for evaluating likelihoods for realizations of BDPs: finite-time transition, first passage, equilibrium probabilities, and distributions of summary statistics that arise commonly in applications. Recent work has also exploited the connection between continuously- and discretely-observed BDPs to derive EM algorithms for maximum likelihood estimation. Likelihood-based inference for previously intractable BDPs is much easier than previously thought and regression approaches analogous to Poisson regression are straightforward to derive. In this review, we outline the basic mathematical theory for BDPs and demonstrate new tools for statistical inference using data from BDPs.
Graphical abstract
Realization of a birth-death process X(t).

INTRODUCTION
Birth-death processes (BDPs) are a flexible class of continuous-time Markov chains that model the number of “particles” in a system, where each particle can “give birth” to another particle or “die” (Feller, 1971; Karlin & Taylor, 1975). The rate of births and deaths at any given time depends on how many extant particles there are. When there are k particles, a birth occurs with instantaneous rate λk and a death with instantaneous rate μk. In the classical “simple linear” BDP, λk = kλ and μk = kμ so that per-particle birth and death rates remain constant. In a “general” BDP, λk and μk can be any function of k but are time-homogeneous (Kendall, 1948, 1949). Table 1 gives examples of well-known BDPs and their birth and death rates. Figure 1 shows an example realization from a BDP.
Table 1.
Some well-known BDPs with birth and death rates λk and μk. The SI and SIS models refer to the susceptible-infectious(-susceptible) process in epidemiology in which there are k infectious individuals in a finite population of size N. The Moran/Ehrenfest process models the change in the numbers of particles of two types, where transitions between types occur at a rate proportional to the number of potential contacts between members of each type in a finite population of size N.
| Model | λk | μk |
|---|---|---|
| Poisson | λ | 0 |
| Yule/Pure birth | kλ | 0 |
| Survival/Pure death | 0 | kμ |
| Kendall | kλ | kμ |
| Kendall + immigration | kλ + α | kμ |
| M/M/1 queue | λ | μ |
| M/M/c queue | λ | min(k, c)μ |
| M/M/∞ queue | λ | kμ |
| SI/Logistic | k(N – k)λ | 0 |
| SIS | k(N – k)λ | kμ |
| Moran/Ehrenfest | k(N – k)λ | k(N – k)μ |
Figure 1.
Stochastic simulation of a BDP starting at X(0) = 1 on the interval 0 < t < 2.
The usefulness of BDPs lies in the fact that “particle” can refer to a member of any discrete potentially interacting system in which one only keeps track of the number of objects in existence. BDPs are popular modeling tools in evolution, population biology, genetics, and ecology (Novozhilov, Karev, & Koonin, 2006). For example, if we interpret the particles as species in a macro-evolutionary setting, BDPs can be used to study speciation and extinction over evolutionary timescales (Nee, 2006; Nee, May, & Harvey, 1994). BDPs can also be used to study infectious disease dynamics in a finite population, where the number of individuals infected is the quantity of interest (Andersson & Britton, 2000; N. T. J. Bailey, 1964). In molecular evolution, BDPs can model inserted and deleted nucleotides in a DNA or RNA sequence as part of a probabilistic alignment method (Holmes & Bruno, 2001; Thorne, Kishino, & Felsenstein, 1991), mobile/transposable genetic elements (Rosenberg, Tsolaki, & Tanaka, 2003), gene families (Demuth, De Bie, Stajich, Cristianini, & Hahn, 2006), or even whole chromosomes (Mayrose, Barker, & Otto, 2010). BDPs can model populations of organisms in a resource-limited environment (Renshaw, 1993, 2011; Tan & Piantadosi, 1991). In finite populations, BDPs are commonly used to model quantities of interest in an evolutionary setting, such as mutations, allele frequencies, selection, or coalescence (Fudenberg, Imhof, Nowak, & Taylor, 2004; Kingman, 1982; Krone & Neuhauser, 1997; McFarland, Mirny, & Korolev, 2014; P. A. P. Moran, 1958; Nowak, Michor, & Iwasa, 2003; Nowak, Sasaki, Taylor, & Fudenberg, 2004).
Branching processes are popular tools used in biological modeling that share some of these properties with BDPs (Guttorp, 1991; Kimmel & Axelrod, 2016). Branching processes are specified by assumptions about what happens to individual particles, which typically behave independently: they can die, give birth to or be replaced by offspring of the same or different types. But the nature of this process is individualistic – rules about how individual particles behave in isolation give rise to models for population-level dynamics. In contrast, BDPs are specified at the population level. The conditional rate of particle birth and death given the numbers in existence can depend arbitrarily on the current size of the population.
Many important models in queuing theory can be written as general BDPs (Norris, 1998; Renshaw, 2011; Ross, 1995). In basic Markovian queues, customers arrive into a queue or buffer as a Poisson process with rate λ, and waiting customers are served (removed from the queue) with per-customer service rate μ. In the M/M/∞ queue, also known as the immigration-death process, there are infinitely many servers, so the arrival and service (birth and death) rates are λk = λ and μk = kμ for k > 0. In the M/M/1 queue, also known as the immigration-emigration model, there is only a single server, so the rates are λk = λ and μk = μ. In the M/M/c queue, there are exactly c servers, so μk = min{c, k}μ.
BDPs can also be useful for defining arbitrary probability distributions on the non-negative integers. Crawford and Zelterman (2015) demonstrate that any sum of exchangeable Bernoulli random variables can be exactly represented as a pure-birth BDP. In fact, M. J. Faddy (1997) shows that one can define a pure birth process (a BDP with death rates μk = 0 for all k) whose transition probabilities reproduce any discrete distribution on the counting numbers. Klar, Parthasarathy, and Henze (2010) establish a correspondence between several power law distributions and the long-time limit of specially constructed BDPs, providing a time-dependent interpretation that may be useful for modeling mechanistic processes that give rise to power law outcomes. Sometimes this power law behavior is motivated by a mechanistic model: researchers have developed models and statistical methodology for estimating BDP parameters for power law behavior in protein domains (Karev, Wolf, Rzhetsky, Berezovskaya, & Koonin, 2002) and gene and protein family sizes (Reed & Hughes, 2004). Crawford and Suchard (2012) define a BDP to mimic a process of frameshift-aware insertions and deletions in DNA sequences. Lee, Weiss, and Suchard (2011) set the birth and death rates of a BDP to exhibit over-dispersion relative to the Poisson distribution, and Crawford, Weiss, and Suchard (2015) define a BDP to model rounding to multiples of 5, 10, 25, or 50 in self-reported counts of sex partners in a public health study.
There is a rich history of theoretical research into the properties of BDPs. Kendall (1948, 1949) introduces the process with constant per-particle birth and death rates and finds the transition probabilities by a generating function argument. In their groundbreaking series of papers, Karlin and McGregor (1957a, 1957b) analyze properties of BDPs, including stationary distributions, moments, transition probabilities, recurrence and passage times, and other quantities of interest. They also explore in-depth applications of this theory to BDPs whose rates depend linearly on k (Karlin & McGregor, 1958a), and queuing processes (Karlin & McGregor, 1958b).
Beyond the work of Karlin and McGregor (1957b), many authors have discovered extensions and deeper interpretations for the theoretical properties of BDPs. For example, the theory of BDPs is intimately related to properties of continued fractions (Guillemin & Pinchon, 1999). Flajolet and Guillemin (2000) elucidate the relationship between sample trajectories (or state paths) of a BDP and lattice path combinatorics via continued fractions and develop expressions for a variety of recurrence and passage time variables in terms of continued fractions. Lenin and Parthasarathy (2000) and Parthasarathy, Lenin, Schoutens, and Van Assche (1998) discuss further some well-known continued fractions whose connection to BDPs previously went unappreciated.
The study of BDPs has benefited from wide interest in the theoretical properties of this class of processes. But their usefulness as flexible tools for statistical inference has been under-appreciated. In this review, we outline basic properties of BDPs and show how to perform principled statistical inference using data from continuous and discrete observation of BDPs. First, we present the basic time-evolution equations of general BDPs, derive the transition probabilities for the Kendall process (Feller, 1971; Kendall, 1948), and describe the analytic theory developed by Karlin and McGregor (1957a, 1957b) for general BDPs. Then we outline a computational strategy for evaluating BDP transition probabilities using a continued fraction representation of their Laplace transform, which allows routine computation of likelihoods for discretely observed processes (Crawford & Suchard, 2012). We describe a generic class of EM algorithms for maximum likelihood (or maximum a posteriori ) inference for discretely observed BDPs (Crawford, Minin, & Suchard, 2014). Finally, we derive the distribution of integral summary statistics of BDPs that arise often in applications.
BACKGROUND
A BDP is a continuous-time Markov chain X(t) counting the number of particles in a system at time t, taking values on the non-negative integers ℕ. To construct a general BDP in a formal way, we must define the rules according to which the number of particles evolves. We do this by specifying the behavior of the process for a very short time dt, when there are k particles in the system. If dt is very small, the probability of an event during (t, t+dt) that occurs with rate r is approximately rdt. Therefore, the probability of a birth in the interval (t, t + dt), given X(t) = k, is
$$\Pr\big(X(t+dt) = k+1 \mid X(t) = k\big) = \lambda_k\,dt + o(dt). \tag{1}$$
Intuitively, this means that the probability of more than one birth event in a small time dt is negligibly small. The probability of a death in (t, t + dt) is likewise
$$\Pr\big(X(t+dt) = k-1 \mid X(t) = k\big) = \mu_k\,dt + o(dt), \tag{2}$$
where k ≥ 1. Together, these assumptions imply that the probability of no births or deaths occurring during (t, t + dt) is
$$\Pr\big(X(t+dt) = k \mid X(t) = k\big) = 1 - (\lambda_k + \mu_k)\,dt + o(dt). \tag{3}$$
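The infinitesimal description above can be turned into a simulation by drawing an exponential waiting time with rate λk + μk and then choosing a birth with probability λk/(λk + μk). The following is a minimal sketch of that idea, not the procedure used to produce Figure 1; the function name `simulate_bdp`, its arguments, and the example rates are illustrative assumptions.

```python
import random

def simulate_bdp(birth_rate, death_rate, x0, t_max, rng=None):
    """Simulate one path of a general BDP on [0, t_max).

    birth_rate(k) and death_rate(k) return lambda_k and mu_k.
    Returns the jump times (starting with 0.0) and the state entered at each time.
    """
    rng = rng or random.Random()
    t, k = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        lam, mu = birth_rate(k), death_rate(k)
        total = lam + mu
        if total <= 0.0:              # absorbing state: no further jumps
            break
        t += rng.expovariate(total)   # exponential waiting time with rate lam + mu
        if t >= t_max:                # the next jump falls outside the window
            break
        # The jump is a birth with probability lam / (lam + mu), otherwise a death.
        k = k + 1 if rng.random() < lam / total else k - 1
        times.append(t)
        states.append(k)
    return times, states

# Kendall process with lambda = 2 and mu = 1 (hypothetical values), X(0) = 1.
times, states = simulate_bdp(lambda k: 2.0 * k, lambda k: 1.0 * k, 1, 2.0,
                             random.Random(1))
```

Passing the rates as functions of k keeps the same routine usable for any row of Table 1.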
TRANSITION PROBABILITIES
Let Pab(t) = Pr(X(t) = b | X(0) = a) be the transition probability from state X(0) = a to X(t) = b. We can use the above expressions to form a differential equation describing the change in transition probabilities over time. Suppose that X(0) = a. At the current time t, we want to know the probability that in the next dt units of time, the process will reach state b. We look into the future by writing the probabilities of three types of events that can take the process to state b: birth from b − 1, death from b + 1, or no change from b:
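$$P_{ab}(t+dt) = \lambda_{b-1}\,dt\,P_{a,b-1}(t) + \mu_{b+1}\,dt\,P_{a,b+1}(t) + \big[1 - (\lambda_b+\mu_b)\,dt\big]P_{ab}(t) + o(dt). \tag{4}$$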
Subtracting Pab(t) from both sides, dividing by dt, and sending dt to zero, we obtain the Kolmogorov forward equations:
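$$\frac{dP_{ab}(t)}{dt} = \lambda_{b-1}P_{a,b-1}(t) + \mu_{b+1}P_{a,b+1}(t) - (\lambda_b+\mu_b)P_{ab}(t), \tag{5}$$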
where Pab(0) = 1 if a = b and zero otherwise. In this article, we always assume μ0 = λ−1 = 0; this keeps the process on the non-negative integers. Letting P(t) = {Pab(t)} in matrix form, (5) becomes
$$\frac{dP(t)}{dt} = P(t)\,A, \tag{6}$$
where A is the infinitesimal generator matrix with entries A = {aij}, ai,i−1 = μi, aii = −(λi + μi), ai,i+1 = λi, and all other entries zero. In the matrix case, the initial condition becomes P(0) = I. This infinite sequence of coupled ordinary differential equations can be difficult or impossible to solve for many general BDPs (Novozhilov et al., 2006; Renshaw, 2011).
KENDALL PROCESS
In the simple linear BDP, also known as the Kendall process, where λk = kλ and μk = kμ, it is possible to solve for these transition probabilities explicitly by finding a generating function solution to the forward equations (N. T. J. Bailey, 1964; Lange, 2010a). To illustrate, let Ga(s, t) = Σk Pak(t) s^k, summing over k ≥ 0, be the probability generating function of X(t) given X(0) = a. Let b = k in (5), multiply both sides by s^k, and sum on k to obtain
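$$\frac{\partial G_a(s,t)}{\partial t} = (\lambda s - \mu)(s-1)\,\frac{\partial G_a(s,t)}{\partial s}, \tag{7}$$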
with the initial condition Ga(s, 0) = s^a. The solution is
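$$G_a(s,t) = \left[\frac{\mu(s-1) - (\lambda s - \mu)\,e^{-(\lambda-\mu)t}}{\lambda(s-1) - (\lambda s-\mu)\,e^{-(\lambda-\mu)t}}\right]^{a}. \tag{8}$$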
Inverting and finding the bth coefficient of the power series Ga(s, t), we find the transition probabilities
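$$P_{ab}(t) = \sum_{j=0}^{\min(a,b)} \binom{a}{j}\binom{a+b-j-1}{a-1}\,\alpha^{a-j}\,\beta^{b-j}\,(1-\alpha-\beta)^{j}, \tag{9}$$
where
$$\alpha = \frac{\mu\big(e^{(\lambda-\mu)t}-1\big)}{\lambda e^{(\lambda-\mu)t}-\mu} \quad\text{and}\quad \beta = \frac{\lambda\big(e^{(\lambda-\mu)t}-1\big)}{\lambda e^{(\lambda-\mu)t}-\mu}. \tag{10}$$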
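The closed form for the Kendall process is easy to evaluate directly. The sketch below assumes the standard expressions P_ab(t) = Σ_j C(a, j) C(a + b − j − 1, a − 1) α^(a−j) β^(b−j) (1 − α − β)^j with α = μ(e^((λ−μ)t) − 1)/(λe^((λ−μ)t) − μ) and β = λ(e^((λ−μ)t) − 1)/(λe^((λ−μ)t) − μ); the function names, the separate handling of a = 0 and λ = μ, and the numerical values are illustrative assumptions.

```python
import math

def kendall_alpha_beta(lam, mu, t):
    """Return (alpha, beta) for the Kendall process at time t."""
    if lam == mu:
        # Limit of the general expressions as lam -> mu.
        a = lam * t / (1.0 + lam * t)
        return a, a
    e = math.exp((lam - mu) * t)
    denom = lam * e - mu
    return mu * (e - 1.0) / denom, lam * (e - 1.0) / denom

def kendall_transition(a, b, t, lam, mu):
    """Closed-form P_ab(t) for birth rate k*lam and death rate k*mu."""
    if a == 0:
        # State 0 is absorbing because lambda_0 = mu_0 = 0.
        return 1.0 if b == 0 else 0.0
    alpha, beta = kendall_alpha_beta(lam, mu, t)
    gamma = 1.0 - alpha - beta
    total = 0.0
    for j in range(min(a, b) + 1):
        total += (math.comb(a, j) * math.comb(a + b - j - 1, a - 1)
                  * alpha ** (a - j) * beta ** (b - j) * gamma ** j)
    return total

# Distribution of X(1) given X(0) = 3, truncated at 200 states for checking.
probs = [kendall_transition(3, b, 1.0, 0.5, 0.3) for b in range(200)]
```

Two quick sanity checks are that the probabilities sum to one and that the mean equals a·e^((λ−μ)t), the known mean of the simple linear BDP.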
As in the Bienaymé-Galton-Watson branching process, the Kendall process gives rise to a linear-fractional distribution (Athreya & Ney, 2004) when starting from one individual (a = 1). Sagitov (2013) recently developed a multi-dimensional linear-fractional distribution to characterize the multi-type Bienaymé-Galton-Watson branching process with countably many types. This generalization may also be applicable for some multivariate extensions of the Kendall process.
GENERAL BDPS
The problem becomes much more complicated for general BDPs. Karlin and McGregor (1957b) present the definitive treatment of the existence of transition probabilities and other properties of BDPs. They obtain the following integral form for the transition probabilities:
$$P_{ab}(t) = \omega_b \int_0^\infty e^{-xt}\,Q_a(x)\,Q_b(x)\,\psi(dx), \tag{11}$$
where ω0 = 1 and ωk = (λ0 · · · λk−1)/(μ1 · · · μk) for k ≥ 1. Here, Qk(x), k = 0, 1, 2, … is a system of orthogonal polynomials and ψ(x) is an orthogonalizing spectral measure that are specific to a particular set of birth and death rates.
This integral representation is intuitively satisfying because the time-dependency of Pab(t) is contained entirely in the exponential term, and Pab(t) depends on Qa(x) and Qb(x) in a simple way. In addition, we have the obvious corollary that
$$\omega_a\,P_{ab}(t) = \omega_b\,P_{ba}(t). \tag{12}$$
Beyond these simple results related to the interpretation of (11), the formalism developed by Karlin and McGregor (1957b) makes possible deep analytic insight into the behavior of general BDPs, including recurrence times and first passage times. Notably, a similar spectral representation for the transition probabilities of time-inhomogeneous linear BDPs has been derived recently (Ohkubo, 2014).
EQUILIBRIUM PROBABILITIES AND EXPLOSION
Equilibrium solutions are straightforward to obtain (Renshaw, 2011). Setting the left-hand side of the Kolmogorov forward equations (5) to zero and replacing the finite-time transition probabilities Pab(t) with the equilibrium probabilities πb, we find that
$$\mu_{b+1}\pi_{b+1} - \lambda_b\pi_b = \mu_b\pi_b - \lambda_{b-1}\pi_{b-1}. \tag{13}$$
Since this is the case for every b, it is true for b = 0 in particular, and μ0 = λ−1 = 0, so both sides of (13) are zero for every b by induction. This gives the detailed balance condition for continuous-time Markov chains,
$$\lambda_b\,\pi_b = \mu_{b+1}\,\pi_{b+1}. \tag{14}$$
Therefore every general BDP is a reversible Markov chain. Iterating the recurrence (14), we find that
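$$\pi_k = \pi_0 \prod_{i=1}^{k} \frac{\lambda_{i-1}}{\mu_i}, \qquad \pi_0 = \left[1 + \sum_{k=1}^{\infty}\prod_{i=1}^{k}\frac{\lambda_{i-1}}{\mu_i}\right]^{-1}, \tag{15}$$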
where we have chosen π0 so that Σk πk = 1. Note that πk ∝ ωk for every k.
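Because πk ∝ ωk, the equilibrium distribution can be computed by accumulating the ratios λk−1/μk and normalizing. The sketch below truncates the state space at a finite number of states, which is an assumption made for illustration (it is only sensible when the tail mass is negligible); the function name `equilibrium` and the rates are hypothetical. For the M/M/∞ queue the result should match a Poisson distribution with mean λ/μ.

```python
import math

def equilibrium(birth_rate, death_rate, n_states):
    """Equilibrium probabilities pi_0, ..., pi_{n_states-1}, with pi_k proportional to omega_k."""
    omega = [1.0]                                   # omega_0 = 1
    for k in range(1, n_states):
        # omega_k = omega_{k-1} * lambda_{k-1} / mu_k
        omega.append(omega[-1] * birth_rate(k - 1) / death_rate(k))
    z = sum(omega)                                  # normalizing constant (truncated)
    return [w / z for w in omega]

# M/M/infinity queue with lambda = 3 and mu = 2 (hypothetical values).
lam, mu = 3.0, 2.0
pi = equilibrium(lambda k: lam, lambda k: k * mu, 60)
```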
The birth and death rates for a general BDP may be such that the process “runs away” to infinity in finite time. This is known as explosive growth. Formally, suppose the process begins at X(0) = 0 and there are no absorbing states. Renshaw (2011) shows that the expected first passage time to infinity τ∞ is
| (16) |
where π1 = 1 and
| (17) |
for i > 1. When (16) diverges, the process is non-explosive, and the first passage time from 0 to any finite state j is almost surely finite. When (16) is finite, the first passage time to infinity is finite with non-zero probability.
One result of special interest to us gives the conditions under which a BDP with a given generator A is unique: Karlin and McGregor (1957a) show that there is only one transition probability matrix P(t) that satisfies (6) if and only if
$$\sum_{k=0}^{\infty}\left(\omega_k + \frac{1}{\lambda_k\,\omega_k}\right) = \infty. \tag{18}$$
This property assures that probability is conserved on the non-negative integers. We will always assume this is the case in what follows.
Despite the elegant representation (11) for the transition probabilities, it can be very difficult to find the polynomials {Qk(x)} (Novozhilov et al., 2006; Renshaw, 2011). Settings in which the expression (11) leads to a tractable analytic representation are rare, even for linear processes. As outlined in Crawford and Suchard (2012), Ismail, Letessier, and Valent (1988) give the polynomials Qk(x) and measure for the birth-death-immigration-emigration process with λk = kλ + ν and μk = kμ + γ, but a closed-form expression for transition probabilities remains out of reach. In addition, the problem of finding these polynomials and measure ψ(x) is a fundamentally analytical task, and is generally not amenable to computational solution. In other words, one cannot simply compute Pab(t) using a computer for an arbitrary set of birth and death rates {λk} and {μk} using the formula (11) alone.
Since analytic derivation of ψ(x) is so complicated, Renshaw (2011, page 111) writes of the need for an alternative approach to solving the forward system in order to find transition probabilities for general BDPs:
“A worthwhile and potentially rewarding challenge would be to develop a simplified and user-friendly version of this technique which would work over a wide range of stochastic processes.”
The next section is devoted to this task.
TRANSITION PROBABILITIES FOR GENERAL BDPS
We now outline a method, first presented in Crawford and Suchard (2012) and based on work by Murphy and O’Donohoe (1975), for numerically computing the transition probabilities for a general BDP with arbitrary birth and death rates. To proceed, denote the Laplace transform of Pab(t) as
$$f_{ab}(s) = \mathcal{L}\big[P_{ab}(t)\big](s) = \int_0^\infty e^{-st}\,P_{ab}(t)\,dt. \tag{19}$$
Now, applying the Laplace transform to (5) with a = 0, we have
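$$\begin{aligned} s f_{00}(s) - P_{00}(0) &= \mu_1 f_{01}(s) - \lambda_0 f_{00}(s),\\ s f_{0b}(s) - P_{0b}(0) &= \lambda_{b-1} f_{0,b-1}(s) + \mu_{b+1} f_{0,b+1}(s) - (\lambda_b + \mu_b) f_{0b}(s), \end{aligned} \tag{20}$$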
for b ≥ 1. Recalling that P00(0) = 1 and P0b(0) = 0 for b ≥ 1, we rearrange (20) to find
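$$f_{00}(s) = \cfrac{1}{s + \lambda_0 - \mu_1\,\dfrac{f_{01}(s)}{f_{00}(s)}}, \qquad \frac{f_{0b}(s)}{f_{0,b-1}(s)} = \cfrac{\lambda_{b-1}}{s + \lambda_b + \mu_b - \mu_{b+1}\,\dfrac{f_{0,b+1}(s)}{f_{0b}(s)}}. \tag{21}$$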
By combining these recurrence relations, we obtain the generalized continued fraction
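$$f_{00}(s) = \cfrac{1}{s + \lambda_0 - \cfrac{\lambda_0\mu_1}{s + \lambda_1 + \mu_1 - \cfrac{\lambda_1\mu_2}{s + \lambda_2 + \mu_2 - \cdots}}} \tag{22}$$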
that is an exact expression for the Laplace transform of the transition probability P00(t) (Bordes & Roehner, 1983; Flajolet & Guillemin, 2000; Guillemin & Pinchon, 1999; Karlin & McGregor, 1957b). Now define a1 = 1, an = −λn−2μn−1, b1 = s+λ0 and bn = s+λn−1+μn−1 for n ≥ 2. Then (22) becomes
$$f_{00}(s) = \frac{a_1}{b_1\,+}\;\frac{a_2}{b_2\,+}\;\frac{a_3}{b_3\,+}\cdots \tag{23}$$
in more concise notation. We denote the kth convergent of the Laplace transform f00(s) by
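$$f_{00}^{(k)}(s) = \frac{a_1}{b_1\,+}\;\frac{a_2}{b_2\,+}\cdots\frac{a_k}{b_k} = \frac{A_k(s)}{B_k(s)}. \tag{24}$$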
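A convergent of this continued fraction can be evaluated numerically from the partial numerators a1 = 1, an = −λn−2 μn−1 and denominators b1 = s + λ0, bn = s + λn−1 + μn−1. The sketch below uses plain backward evaluation at a fixed depth, a deliberately simple stand-in for the more efficient continued fraction algorithms cited in the text; the function name `f00_convergent` and the example rates are illustrative assumptions.

```python
def f00_convergent(s, birth_rate, death_rate, depth):
    """Evaluate the convergent of f_00(s) with `depth` levels by backward recurrence.

    s may be real or complex; birth_rate(k) and death_rate(k) return lambda_k and mu_k.
    """
    # tail holds a_n / (b_n + a_{n+1} / (b_{n+1} + ...)), built from n = depth down to 2.
    tail = 0.0
    for n in range(depth, 1, -1):
        a_n = -birth_rate(n - 2) * death_rate(n - 1)
        b_n = s + birth_rate(n - 1) + death_rate(n - 1)
        tail = a_n / (b_n + tail)
    # First level: a_1 / (b_1 + tail) with a_1 = 1 and b_1 = s + lambda_0.
    return 1.0 / (s + birth_rate(0) + tail)

# Two-state chain (lambda_0 = 2, mu_1 = 3, all other rates zero): the fraction terminates,
# and f_00(s) = (s + mu) / (s (s + lambda + mu)) exactly.
two_state = f00_convergent(0.7, lambda k: 2.0 if k == 0 else 0.0,
                           lambda k: 3.0 if k == 1 else 0.0, 10)

# M/M/infinity queue (lambda_k = 1, mu_k = k): deeper convergents agree to high accuracy.
shallow = f00_convergent(1.0, lambda k: 1.0, lambda k: float(k), 20)
deep = f00_convergent(1.0, lambda k: 1.0, lambda k: float(k), 40)
```

In practice the depth would be chosen adaptively, stopping when successive convergents differ by less than a tolerance.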
The main result of Crawford and Suchard (2012) is the following theorem giving continued fraction expressions for the Laplace transform of the transition probability in a general birth-death process.
Theorem 1
The Laplace transform of the transition probability Pab(t) is given by
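$$f_{ab}(s) = \begin{cases}
\left(\displaystyle\prod_{j=b+1}^{a}\mu_j\right)\dfrac{B_b(s)}{B_{a+1}(s) + \dfrac{B_a(s)\,a_{a+2}}{b_{a+2}\,+}\;\dfrac{a_{a+3}}{b_{a+3}\,+}\cdots} & \text{for } b \le a,\\[3ex]
\left(\displaystyle\prod_{j=a}^{b-1}\lambda_j\right)\dfrac{B_a(s)}{B_{b+1}(s) + \dfrac{B_b(s)\,a_{b+2}}{b_{b+2}\,+}\;\dfrac{a_{b+3}}{b_{b+3}\,+}\cdots} & \text{for } a \le b,
\end{cases} \tag{25}$$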
where an, bn, and Bn are as defined above.
The proof of this theorem relies on elementary manipulation of the continued fraction recurrences (21).
COMPUTATION OF TRANSITION PROBABILITIES
The Laplace transforms (25) usually cannot be inverted analytically to obtain time-domain transition probabilities. However, the representation (25) has several desirable properties for computational inversion. First, infinite continued fraction representations often converge much faster than their corresponding power series (Bankier & Leighton, 1942; Wall, 1948). Second, highly efficient numerical methods exist for evaluating continued fractions to a specified depth (Blanch, 1964; Lange, 2010b; Lorentzen & Waadeland, 1992; Wallis, 1972). Third, stable numerical inversion of Laplace transforms is well established, and methods exist for bounding the discretization and approximation error (Abate & Whitt, 1999; Craviotto, Jones, & Thron, 1993; Cuyt, Petersen, Verdonk, Waadeland, & Jones, 2008). For these reasons, Crawford and Suchard (2012) obtain time-domain transition probabilities Pab(t) from (25) by numerically inverting the Laplace transforms. Error bounds for numerical Laplace inversion are derived in Crawford and Suchard (2012) and Crawford et al. (2014).
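To illustrate the inversion step, the sketch below implements the Euler algorithm associated with Abate and Whitt, one standard scheme for numerical Laplace inversion. It is offered as an example of the general approach and is not necessarily the exact scheme used by Crawford and Suchard (2012). The function name `euler_invert` and the tuning constants A, n, and m are assumptions; `F` can be any transform that accepts complex arguments, such as a continued fraction evaluator for fab(s).

```python
import math

def euler_invert(F, t, A=18.4, n=15, m=11):
    """Approximate f(t) from its Laplace transform F by the Euler algorithm.

    A controls the discretization error (roughly exp(-A) for |f| <= 1);
    n + m is the number of series terms, with Euler averaging over the last m + 1
    partial sums to accelerate the alternating series.
    """
    scale = math.exp(A / 2.0)
    # First term of the trapezoidal approximation to the Bromwich integral.
    partial = scale / (2.0 * t) * F(complex(A / (2.0 * t), 0.0)).real
    sums = []
    for k in range(1, n + m + 1):
        s = complex(A, 2.0 * k * math.pi) / (2.0 * t)
        partial += scale / t * (-1) ** k * F(s).real
        if k >= n:
            sums.append(partial)          # stores s_n, ..., s_{n+m}
    # Binomial (Euler) average of the stored partial sums.
    return sum(math.comb(m, j) * sums[j] for j in range(m + 1)) / 2.0 ** m

# Poisson process with rate 2: f_00(s) = 1 / (s + 2), so P_00(t) = exp(-2 t).
p00 = euler_invert(lambda s: 1.0 / (s + 2.0), 0.5)
```

Each evaluation of the transform at a complex point would, for a general BDP, be one continued fraction evaluation, so the cost of a single transition probability is n + m + 1 such evaluations.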
Alternatively, when the state space of a BDP is finite, numerical computation of transition probabilities can sometimes be accomplished by matrix methods. For a BDP that takes values on {0, …, N}, consider the (N + 1) × (N + 1) stochastic transition rate matrix Q whose elements are
$$q_{ij} = \begin{cases} \lambda_i & \text{if } j = i+1,\\ \mu_i & \text{if } j = i-1,\\ -(\lambda_i+\mu_i) & \text{if } j = i,\\ 0 & \text{otherwise.} \end{cases} \tag{26}$$
When the eigendecomposition Q = UDU^{−1} exists, where U is invertible and D is diagonal, the matrix of transition probabilities P(t) satisfies
$$\frac{dP(t)}{dt} = P(t)\,Q, \tag{27}$$
with P(0) = I. Then the time-domain transition probabilities can be obtained by matrix exponentiation,
$$P(t) = e^{Qt} = U\,e^{Dt}\,U^{-1}. \tag{28}$$
When the state space is large or infinite, it is sometimes possible to truncate the state space at a suitably large index N and compute transition probabilities using (28). When the decomposition exists and can be found in a numerically stable way, Crawford, Stutz, and Lange (2016) give coupling bounds for finding an appropriate truncation index to control the truncation error.
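A truncated computation of this kind can be sketched without an eigendecomposition. The example below swaps in uniformization, a different but standard way to evaluate the matrix exponential of a rate matrix, and computes one row of P(t) on the truncated space {0, …, N}. The function name `transition_row`, the choice to set the birth rate out of state N to zero, and the numerical values are illustrative assumptions.

```python
import math

def transition_row(a, birth_rate, death_rate, n_max, t, tol=1e-12):
    """Approximate (P_a0(t), ..., P_a,n_max(t)) on the truncated space {0, ..., n_max}.

    Assumes rate * t is moderate, so that exp(-rate * t) does not underflow.
    """
    size = n_max + 1
    lam = [birth_rate(k) if k < n_max else 0.0 for k in range(size)]  # no birth out of n_max
    mu = [death_rate(k) if k > 0 else 0.0 for k in range(size)]       # mu_0 = 0
    rate = max(l + m for l, m in zip(lam, mu)) or 1.0                 # uniformization rate
    v = [0.0] * size
    v[a] = 1.0                        # distribution after 0 jumps of the uniformized chain
    weight = math.exp(-rate * t)      # Poisson(rate * t) probability of 0 jumps
    row = [weight * x for x in v]
    covered = weight
    n = 0
    while 1.0 - covered > tol and n < 100000:
        n += 1
        # One step of the discrete-time chain R = I + Q / rate, which is tridiagonal.
        v = [v[j] * (1.0 - (lam[j] + mu[j]) / rate)
             + (v[j - 1] * lam[j - 1] / rate if j > 0 else 0.0)
             + (v[j + 1] * mu[j + 1] / rate if j < n_max else 0.0)
             for j in range(size)]
        weight *= rate * t / n        # Poisson probability of exactly n jumps
        covered += weight
        row = [r + weight * x for r, x in zip(row, v)]
    return row

# M/M/infinity queue (lambda = 2, mu = 1) started empty: X(t) is Poisson with
# mean (lambda / mu) * (1 - exp(-mu * t)), which gives an independent check.
row = transition_row(0, lambda k: 2.0, lambda k: float(k), 40, 1.0)
```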
FIRST PASSAGE TIMES
Now consider the time of first arrival of a BDP X(t) into an arbitrary set S of taboo states, and suppose X(0) = i ∈ ℕ \ S. This first passage time is defined formally as
$$T_S = \inf\{t \ge 0 : X(t) \in S\}. \tag{29}$$
To find the relationship between first passage times and the expressions for transition probabilities discussed above, construct a new process Y(t) identical to X(t) except that λj = μj = 0 for every j ∈ S, so every state in S is absorbing. Then for this modified process, with Pij(t) = Pr(Y(t) = j | Y(0) = i),
$$\Pr(T_S \le t) = \sum_{j \in S} P_{ij}(t). \tag{30}$$
The intuitive reason for this equality is the absorbing nature of the states in S: if Y reaches an absorbing state j ∈ S at any time before t, we must have Y(t) = j. Furthermore, Y cannot visit more than one state in S, so the absorption events are mutually exclusive and the probability of absorption is simply the sum of the individual absorption probabilities. Therefore the cumulative distribution function of the first passage time into S is given by the sum of the transition probabilities from i to every taboo state in S for the modified process Y(t).
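As a worked illustration (a sketch, assuming the Kendall rates λk = kλ and μk = kμ with λ ≠ μ and a start state a ≥ 1), take S = {0} and write T for the first passage time into S, that is, the extinction time. State 0 is already absorbing, so the modified process coincides with the original one, and the distribution function of T is the b = 0 transition probability of the Kendall process:

```latex
\Pr(T \le t) = P_{a0}(t) = \alpha(t)^{a},
\qquad
\alpha(t) = \frac{\mu\,\bigl(e^{(\lambda-\mu)t} - 1\bigr)}{\lambda\,e^{(\lambda-\mu)t} - \mu},
\qquad
\lim_{t\to\infty}\Pr(T \le t) =
\begin{cases}
1, & \lambda < \mu,\\
(\mu/\lambda)^{a}, & \lambda > \mu.
\end{cases}
```

The limit recovers the familiar result that extinction is certain when the per-particle death rate exceeds the birth rate, and otherwise occurs with probability (μ/λ)^a.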
LIKELIHOODS
One factor hindering more widespread adoption of BDPs by applied researchers is the difficulty in performing statistical estimation of the unknown parameters in a BDP using real-world data (Doss, Suchard, Holmes, Kato-Maeda, & Minin, 2013; Holmes & Bruno, 2001). Typically efforts in estimation for BDPs have been limited to continuous observation of the process (Anscombe, 1953; Darwin, 1956; P. Moran, 1951, 1953; Reynolds, 1973; Wolff, 1965). In addition, much work to date has focused on the simple linear BDP because it is analytically tractable (Dauxois, 2004; Keiding, 1975; Rosenberg et al., 2003; Thorne et al., 1991), though important progress has been made in analysis of nonlinear processes (Karev et al., 2002; Klar et al., 2010; Reed & Hughes, 2004). In practice researchers often observe data from BDPs only at discrete times through longitudinal sampling. In addition, the simple linear BDP may be unappealing because it fails to capture more complicated dynamics of population growth and decay that arise when particles do not behave independently. To learn from discretely-observed general BDPs, we will need more advanced statistical tools.
LIKELIHOOD FOR THE CONTINUOUSLY-OBSERVED PROCESS
In a discretely-observed general BDP, the likelihood cannot be written in closed form, making analytic maximum likelihood estimation impossible. However, the likelihood of a continuously-observed BDP is straightforward to express (Keiding, 1975; Reynolds, 1973). To develop the likelihood for continuously-observed data from a general BDP, we note the following important fact: the exponentially distributed waiting time of a continuous-time Markov process in a certain state is independent of the destination of the next jump (Lange, 2010a). Recall that the waiting time W for the first event to occur from state k is exponentially distributed with rate λk +μk. If the waiting time in the current state k is W = τ, and the next change is a birth,
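$$\Pr(W = \tau,\ \text{birth}) = (\lambda_k+\mu_k)\,e^{-(\lambda_k+\mu_k)\tau}\cdot\frac{\lambda_k}{\lambda_k+\mu_k} = \lambda_k\,e^{-(\lambda_k+\mu_k)\tau}. \tag{31}$$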
Likewise, the probability of a waiting time W = τ followed by a death is
$$\Pr(W = \tau,\ \text{death}) = \mu_k\,e^{-(\lambda_k+\mu_k)\tau}. \tag{32}$$
Since we can only observe the process for a finite time t, the last observation will be the waiting time in some state k from the time of the jump to k to the end of observation. Using the same reasoning,
$$\Pr(W \ge \tau) = e^{-(\lambda_k+\mu_k)\tau}. \tag{33}$$
To write the likelihood of a continuously-observed BDP from time 0 to t, we introduce some notation to ease our presentation. Suppose we observe n jumps in the time interval (0, t), and label the jumps i = 1, …, n. Let Wi be the waiting time in the current state just before the ith jump. Define the indicator Bi = 1 if the ith jump is a birth, and Bi = 0 if the ith jump is a death. Let t1, …, tn be the times of the n jumps, with t0 = 0 and tn < t. Then the likelihood of a sequence of observations Y = {X(τ), 0 < τ < t} is
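$$L(Y) = \exp\!\big[-(\lambda_{X(t_n)}+\mu_{X(t_n)})(t-t_n)\big]\prod_{i=1}^{n}\lambda_{X(t_{i-1})}^{B_i}\,\mu_{X(t_{i-1})}^{1-B_i}\exp\!\big[-(\lambda_{X(t_{i-1})}+\mu_{X(t_{i-1})})\,W_i\big], \tag{34}$$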
where X(ti−1) is the state just before the ith jump. This cumbersome notation can be eliminated if we instead keep track of the total waiting time in each state and the number of births and deaths from each state. Define 𝟙{E} to be the indicator of an event E, and let
$$T_k = \int_0^t \mathbb{1}\{X(\tau) = k\}\,d\tau \tag{35}$$
be the total time spent in state k over all visits to k. Then let
$$U_k = \sum_{i=1}^{n}\mathbb{1}\{X(t_{i-1}) = k,\ B_i = 1\} \tag{36}$$
be the number of up steps (births) from state k, and let
$$D_k = \sum_{i=1}^{n}\mathbb{1}\{X(t_{i-1}) = k,\ B_i = 0\} \tag{37}$$
be the number of down steps (deaths) from state k. Then we can re-write the likelihood (34) in much simpler and more transparent form as
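$$L(Y) = \prod_{k=0}^{\infty}\lambda_k^{U_k}\,\mu_k^{D_k}\exp\!\big[-(\lambda_k+\mu_k)\,T_k\big]. \tag{38}$$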
Of course, in a BDP observed continuously for a finite time (for which (18) holds), there are only finitely many jumps observed, so the product above is not really infinite in practice.
Equation (38) also reveals that the likelihood for a continuously-observed BDP is a member of the exponential family, where {Uk}, {Dk}, and {Tk} for k = 0, 1, … are the sufficient statistics of the continuously-observed BDP likelihood. In other words, one only needs to know the total number of up and down steps from, and time spent in, each state k visited by the process in order to compute the likelihood.
EXAMPLE: CONTINUOUSLY-OBSERVED KENDALL PROCESS
Maximum likelihood estimation for continuously-observed BDPs is often straightforward. Consider the simple linear BDP with birth rate λk = kλ and death rate μk = kμ. The likelihood (38) of a single observation, up to a normalizing constant, becomes
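$$L(\lambda,\mu) \propto \lambda^{U}\,\mu^{D}\exp\!\big[-(\lambda+\mu)\,T_{\text{particle}}\big], \tag{39}$$
where U = Σk Uk is the total number of up steps (births), D = Σk Dk is the total number of down steps (deaths) during the interval (0, t), and
$$T_{\text{particle}} = \sum_{k=0}^{\infty} k\,T_k = \int_0^t X(\tau)\,d\tau \tag{40}$$
is the “total particle time” or total time lived by every particle that existed during the interval (0, t). Maximizing (39) with respect to the unknown parameters λ and μ, we obtain the maximum likelihood estimators
$$\hat{\lambda} = \frac{U}{T_{\text{particle}}} \quad\text{and}\quad \hat{\mu} = \frac{D}{T_{\text{particle}}}, \tag{41}$$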
first given by Reynolds (1973). Although the estimators provided by (41) involve an integral over the state path of the process, the integrand is simply a step function that is fully observed over (0, t).
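For a fully observed path, the sufficient statistics and the resulting estimators take only a few lines to compute. The sketch below assumes the path is stored as the times at which each state was entered together with the states themselves; the function names and the small hand-made path are illustrative assumptions, not data from the paper.

```python
from collections import defaultdict

def sufficient_statistics(times, states, t_end):
    """Return dicts U, D, T keyed by state k for a path observed on (0, t_end).

    times[i] is the time at which the process entered states[i], with times[0] = 0.
    """
    U, D, T = defaultdict(int), defaultdict(int), defaultdict(float)
    for i, k in enumerate(states):
        leave = times[i + 1] if i + 1 < len(times) else t_end
        T[k] += leave - times[i]          # time spent in state k on this visit
        if i + 1 < len(states):
            if states[i + 1] == k + 1:
                U[k] += 1                 # up step (birth) from state k
            else:
                D[k] += 1                 # down step (death) from state k
    return U, D, T

def kendall_mle(times, states, t_end):
    """Maximum likelihood estimates (lambda, mu) for the simple linear BDP."""
    U, D, T = sufficient_statistics(times, states, t_end)
    particle_time = sum(k * T[k] for k in T)   # integral of X over (0, t_end)
    return sum(U.values()) / particle_time, sum(D.values()) / particle_time

# Path 1 -> 2 -> 3 -> 2 with jumps at 0.5, 1.0, 1.5, observed until t = 2.
lam_hat, mu_hat = kendall_mle([0.0, 0.5, 1.0, 1.5], [1, 2, 3, 2], 2.0)
```

Here two births and one death occur over a total particle time of 4, so the estimates are 0.5 and 0.25.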
LIKELIHOOD FOR THE DISCRETELY-OBSERVED PROCESS
Suppose now that the process X(τ) is observed only discretely, once at time 0 and again at time t, without loss of generality owing to the Markov assumption. Let us label the state of the BDP at these times as X(0) = a and X(t) = b. Then given that X(0) = a, the probability that X(t) = b is the transition probability Pab(t). Above we outlined a method for numerically computing this probability for any general BDP. If we regard the transition probability Pab(t) as a function of some unknown parameters θ which control the birth and death rates, writing Pab(t|θ), then we have the likelihood of our observation,
$$L(\theta) = P_{ab}(t \mid \theta). \tag{42}$$
In principle, we could numerically maximize the likelihood for discrete observations to find an estimate of θ. However, as the number of parameters increases, naïve numerical optimization often suffers from poor convergence (Doss et al., 2013). The difficulty in writing or computing the likelihood for discrete observations from BDPs has limited the usefulness of BDPs in applications.
In contrast to the appealing analytic characterization (38) of the continuously-observed process likelihood, the discretely-observed process is hard to characterize. To bridge this gap, it is helpful to view computation of the likelihood in the discretely-observed process as a missing data problem. When a BDP is observed discretely, we do not know the sufficient statistics {Uk}, {Dk}, and {Tk}. This perspective suggests that we exploit analytic information about these statistics, conditional on the start and end states of the observed process.
EM ALGORITHMS FOR MAXIMUM LIKELIHOOD ESTIMATION
In this section, we review the estimation machinery developed by Crawford et al. (2014) for maximum likelihood or maximum a posteriori estimation in BDPs. When a BDP is discretely sampled, Uk, Dk, and Tk are unobserved for every k; we cannot maximize the likelihood without knowing these statistics. We therefore appeal to the expectation-maximization (EM) algorithm for iterative maximum likelihood estimation with missing data (Dempster, Laird, & Rubin, 1977). When the incomplete data likelihood is intractable but the complete data likelihood has a simple form, the EM algorithm operates by replacing each missing datum by a conditional expectation as follows. If X is the complete (unobserved) data, Y represents the incomplete (observed) data, and ℓ(θ|X) is the complete data log-likelihood, we form a surrogate function Q as the expectation of the complete data log-likelihood, conditional on the observed data Y and the current (mth) parameter iterate:
$$Q\big(\theta \mid \theta^{(m)}\big) = E\big[\ell(\theta \mid X) \mid Y, \theta^{(m)}\big]. \tag{43}$$
This is the E-step of the EM algorithm, and it accomplishes a minorization of ℓ(θ|X) at θ(m). The M-step maximizes (or takes a step toward the maximum of) Q. By alternating these steps (minorizing ℓ by Q, then finding a θ that increases Q), the EM algorithm drives succeeding iterates toward the MLE.
Taking the expectation of the logarithm of (38), conditional on the observed data Y = (X(0) = a, X(t) = b, t) and the current parameter estimate θ(m), we write the surrogate function for the BDP as follows:
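$$Q\big(\theta\mid\theta^{(m)}\big) = \sum_{k=0}^{\infty}\Big[E(U_k\mid Y)\log\lambda_k(\theta) + E(D_k\mid Y)\log\mu_k(\theta) - E(T_k\mid Y)\big(\lambda_k(\theta)+\mu_k(\theta)\big)\Big]. \tag{44}$$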
In the above equation and many that follow, we omit the dependence of the conditional expectations on θ(m) from the mth iterate for visual clarity.
To calculate the conditional expectations necessary for the E-step of the EM algorithm, we appeal to the following integral expressions
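$$E(U_k \mid Y) = \frac{\int_0^t P_{ak}(\tau)\,\lambda_k\,P_{k+1,b}(t-\tau)\,d\tau}{P_{ab}(t)}, \tag{45a}$$
$$E(D_k \mid Y) = \frac{\int_0^t P_{ak}(\tau)\,\mu_k\,P_{k-1,b}(t-\tau)\,d\tau}{P_{ab}(t)}, \tag{45b}$$
$$E(T_k \mid Y) = \frac{\int_0^t P_{ak}(\tau)\,P_{kb}(t-\tau)\,d\tau}{P_{ab}(t)}. \tag{45c}$$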
These expressions have appeared repeatedly in literature on inference for discretely-observed continuous-time Markov chains (Bladt & Sorensen, 2005; Hobolth & Jensen, 2005; Holmes & Rubin, 2002; Lange, 1995; Metzner, Dittmer, Jahnke, & Schütte, 2007). When the process takes only finitely many states, matrix solutions are possible using the uniformization method (Neuts, 1995). Hobolth and Stone (2009) develop efficient Monte Carlo methods using simulation conditioned on the start and end points of the discrete observation Y. Finally, Doss et al. (2013) study a linear BDP on an infinite state space and derive the expectations analytically using a generating function argument. None of the exact methods is a general approach for arbitrary BDPs on an infinite state space. The Monte Carlo approaches, while not reliant on a particular parameterization of the process, can suffer from poor performance when observed realizations occur with low probability. The lack of a reliable method for computing the E-step of the EM algorithm for discretely-observed BDPs has hindered progress on statistical inference for these processes.
An alternative approach introduced by Crawford et al. (2014) takes advantage of the Laplace transforms fab(s) of the transition probabilities (25). The numerators in (45) are time-domain convolutions of transition probabilities. The functional form of these expressions suggests using the Laplace convolution property to obtain
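$$\mathbb{E}(U_k \mid Y) = \frac{\lambda_k\, \mathcal{L}^{-1}\left[f_{a,k}(s)\, f_{k+1,b}(s)\right](t)}{P_{a,b}(t)} \tag{46a}$$
$$\mathbb{E}(D_k \mid Y) = \frac{\mu_k\, \mathcal{L}^{-1}\left[f_{a,k}(s)\, f_{k-1,b}(s)\right](t)}{P_{a,b}(t)} \tag{46b}$$
$$\mathbb{E}(T_k \mid Y) = \frac{\mathcal{L}^{-1}\left[f_{a,k}(s)\, f_{k,b}(s)\right](t)}{P_{a,b}(t)} \tag{46c}$$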
where ℒ−1[·] denotes inverse Laplace transformation. These expressions are formally equivalent to (45), but they offer substantial computational time savings over numerical integration of (45), and make possible efficient computation of conditional expectations for EM algorithms for any BDP (Crawford et al., 2014).
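The equivalence of the convolution and Laplace forms can be checked by hand in one case where everything is available in closed form. For a Poisson process with rate λ, the transform of P_{a,k}(t) is λ^{k−a}/(s+λ)^{k−a+1}, so the product of transforms for the expected time in state k inverts to λ^{b−a} t^{b−a+1} e^{−λt}/(b−a+1)!, and dividing by P_{a,b}(t) gives t/(b−a+1). The sketch below compares that value with direct Simpson quadrature of the time-domain convolution. The function names are hypothetical, and the Poisson process is chosen only because its transition probabilities are known; a general BDP would need the numerical Laplace inversion described in the text.

```python
import math

def poisson_transition(i, j, rate, t):
    """P_ij(t) for a Poisson process (lambda_k = rate, mu_k = 0)."""
    if j < i:
        return 0.0
    n = j - i
    return math.exp(-rate * t) * (rate * t) ** n / math.factorial(n)

def expected_time_in_state(a, b, k, rate, t, n_intervals=200):
    """E(T_k | X(0)=a, X(t)=b) by Simpson quadrature of the convolution integral."""
    h = t / n_intervals                     # n_intervals must be even

    def integrand(s):
        return (poisson_transition(a, k, rate, s)
                * poisson_transition(k, b, rate, t - s))

    total = integrand(0.0) + integrand(t)
    for m in range(1, n_intervals):
        total += (4 if m % 2 else 2) * integrand(m * h)
    return (h / 3.0) * total / poisson_transition(a, b, rate, t)

a, b, k, rate, t = 0, 5, 2, 1.5, 2.0
by_quadrature = expected_time_in_state(a, b, k, rate, t)
by_laplace = t / (b - a + 1)    # closed form obtained from the transform product
```

The two values agree because, conditional on the endpoints, the b − a jump times of a Poisson process are uniform order statistics on [0, t], so each of the b − a + 1 sojourns has the same expected length.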
We now show how to complete the M-step for several BDP models. The first two, variations on the simple linear (Kendall) process, were given in Crawford et al. (2014). The others are novel, yet remarkably easy to derive and implement computationally. In each case, we describe the surrogate likelihood function Q(θ|θ(m)) and give the M-step updates for each unknown parameter.
EXAMPLE: DISCRETELY-OBSERVED KENDALL PROCESS
In the simple linear BDP, births and deaths happen at constant per-particle rates, so λk = kλ and μk = kμ. The unknown is θ = (λ, μ). The surrogate function Q becomes
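$$Q\left(\theta \mid \theta^{(m)}\right) = \sum_{k} \Big\{ \mathbb{E}(U_k \mid Y) \log(k\lambda) + \mathbb{E}(D_k \mid Y) \log(k\mu) - \mathbb{E}(T_k \mid Y)\, k(\lambda + \mu) \Big\} \tag{47}$$
Maximizing (47) with respect to θ yields the updates:
$$\lambda^{(m+1)} = \frac{\mathbb{E}(U \mid Y)}{\mathbb{E}(T_{\text{particle}} \mid Y)} \tag{48a}$$
$$\mu^{(m+1)} = \frac{\mathbb{E}(D \mid Y)}{\mathbb{E}(T_{\text{particle}} \mid Y)} \tag{48b}$$
where
$$\mathbb{E}(U \mid Y) = \sum_{k} \mathbb{E}(U_k \mid Y), \qquad \mathbb{E}(D \mid Y) = \sum_{k} \mathbb{E}(D_k \mid Y), \qquad \mathbb{E}(T_{\text{particle}} \mid Y) = \sum_{k} k\, \mathbb{E}(T_k \mid Y) \tag{49}$$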
and we have again suppressed the dependence of the conditional expectations on θ(m) for typographic clarity. These expressions are identical in form to the estimators given in (41), but are instead iterative updates in the EM algorithm.
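A minimal sketch of this M-step, assuming the updates take the usual form for the simple linear process: expected total births (or deaths) divided by expected total particle time, where particle time weights the time spent in state k by k. The function name `kendall_m_step` and the numbers in the example are hypothetical; in practice the inputs would come from an E-step.

```python
def kendall_m_step(EU, ED, ET):
    """One M-step for the simple linear BDP.

    EU, ED, ET map each state k to the conditional expectations of the
    number of births from k, deaths from k, and time spent in k.
    """
    births = sum(EU.values())
    deaths = sum(ED.values())
    # Total particle time: time in state k counts k particles at risk.
    particle_time = sum(k * time for k, time in ET.items())
    return births / particle_time, deaths / particle_time

# Made-up expected statistics, for illustration only.
EU = {1: 2.0, 2: 1.0}
ED = {1: 0.5, 2: 1.0}
ET = {1: 1.0, 2: 0.5}
lam_new, mu_new = kendall_m_step(EU, ED, ET)
```

With these inputs the particle time is 1(1.0) + 2(0.5) = 2, so the updated rates are 3/2 and 1.5/2.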
EXAMPLE: LINEAR BDP WITH IMMIGRATION
The linear BDP with immigration is similar to the simple linear BDP, but there is a source of new arrivals whose rate is constant and does not depend on the number of particles already in existence. This yields the birth and death rates λk = kλ+ν and μk = kμ. The log-likelihood becomes
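$$\ell(\theta) = \sum_{k} \Big\{ U_k \log(k\lambda + \nu) + D_k \log(k\mu) - T_k \left[k(\lambda + \mu) + \nu\right] \Big\} \tag{50}$$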
Unfortunately, it is difficult to maximize the resulting surrogate function analytically. But since each term in the sum is a concave function of the unknown parameters, we can separate them in a second minorizing function H such that for all θ, H(θ|θ(m)) ≤ ℓ(θ) and H(θ(m)|θ(m)) = ℓ(θ(m)). To accomplish the minorization, note that
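$$\log(k\lambda + \nu) \ge p_k \log\left(\frac{k\lambda}{p_k}\right) + (1 - p_k) \log\left(\frac{\nu}{1 - p_k}\right) \tag{51}$$
We form a minorizing log-likelihood function H as follows:
$$H\left(\theta \mid \theta^{(m)}\right) = \sum_{k} \Big\{ U_k \left[ p_k \log\left(\frac{k\lambda}{p_k}\right) + (1 - p_k) \log\left(\frac{\nu}{1 - p_k}\right) \right] + D_k \log(k\mu) - T_k \left[k(\lambda + \mu) + \nu\right] \Big\} \tag{52}$$
where
$$p_k = \frac{k\lambda^{(m)}}{k\lambda^{(m)} + \nu^{(m)}} \tag{53}$$
Taking the conditional expectation of H given Y and maximizing with respect to the unknown parameters λ and ν gives the updates
$$\lambda^{(m+1)} = \frac{\sum_{k} p_k\, \mathbb{E}(U_k \mid Y)}{\sum_{k} k\, \mathbb{E}(T_k \mid Y)} \tag{54a}$$
$$\nu^{(m+1)} = \frac{\sum_{k} (1 - p_k)\, \mathbb{E}(U_k \mid Y)}{t} \tag{54b}$$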
The update for μ is the same as (48b).
EXAMPLE: PURE-BIRTH AND GENERALIZED POISSON PROCESSES
Recall that the Poisson process with arrival rate λ is a BDP with λk = λ, μk = 0 for all k. Many researchers have found that real-world count data are sometimes over- or under-dispersed relative to the Poisson distribution. Statisticians seeking a more flexible distribution for count outcomes that can accommodate over- and under-dispersion have arrived at several alternative distributions. A notable example that fits neatly into the BDP framework is the general pure-birth process with arbitrary birth rates λk, k = 0, 1, …, and μk = 0 for all k. This class of processes has an appealing property: it can recover any discrete probability distribution on the counting numbers by appropriately setting the birth rates (M. Faddy & Bosch, 2001; M. J. Faddy, 1997). Crawford and Zelterman (2015) show that any such pure-birth process can be represented as a sum of exchangeable Bernoulli random variables, a result that connects BDPs with phenomenological models often used for dependent outcomes in toxicology and epidemiology. Renshaw (2011, page 65) gives an analytic form for these transition probabilities
| (55) |
for 0 ≤ a ≤ b and t > 0 provided that λi ≠ λj for all i and j. While (55) has an appealing form, it depends on none of the birth rates being equal. Another potentially serious drawback is that it can be numerically troublesome to compute; the summands may be alternating in sign and the product of small differences in the denominators can lead to serious roundoff error. In many scenarios, especially when some observed counts are large and some λk’s are nearly or exactly equal, (55) provides an unappealing way to compute the likelihood. Exactly equal λk may arise, for example, when entertaining a Bayesian non-parametric prior. Fortunately, the EM framework does not require use of (55). We now provide an example of a pure birth process intended to generalize the Poisson distribution to accommodate over- and under-dispersion.
M. J. Faddy (1997) describes a class of pure-birth BDPs with λk = λ(γ + k)^c and μk = 0, where c = 0 corresponds to a Poisson process with rate λ, c > 0 results in overdispersion relative to Poisson, and c < 0 results in underdispersion. The log-likelihood for the continuously-observed process beginning at X(0) = a and ending at X(t) = b is
| (56) |
Letting θ = (λ, γ, c), the surrogate function is
| (57) |
The update for λ is given by
| (58) |
but the updates for γ and c are not available in closed form. However, Lange (1995) shows that one step of a gradient ascent algorithm suffices to preserve the ascent property of the EM algorithm. Therefore a Newton-Raphson update can be derived, and
| (59) |
where ∇Q and d²Q are the gradient and Hessian of Q with respect to γ and c respectively.
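The dispersion behavior of the rates λk = λ(γ + k)^c can be checked numerically without the closed-form transition probabilities. The sketch below computes the distribution of X(t) given X(0) = 0 for a pure-birth process by uniformization on a truncated state space and reports the variance-to-mean ratio. The function names, the truncation level `n_max`, and the parameter values are assumptions chosen for the illustration; uniformization here is a convenient stand-in, not the Laplace-transform method reviewed in this article.

```python
import math

def pure_birth_pmf(rate, t, n_max, tol=1e-12, max_terms=100000):
    """Distribution of X(t) given X(0) = 0 for a pure-birth process.

    rate(k) is the birth rate in state k. States above n_max are lumped into
    n_max, so n_max must be large enough that almost no mass reaches it.
    """
    lam = [rate(k) for k in range(n_max)] + [0.0]
    big = max(lam)                          # uniformization rate
    v = [1.0] + [0.0] * n_max               # state distribution after j jumps
    weight = math.exp(-big * t)             # Poisson(big * t) mass at j = 0
    pmf = [weight * x for x in v]
    covered, j = weight, 0
    while covered < 1.0 - tol and j < max_terms:
        # One step of the uniformized jump chain (stay put or move up by one).
        v = [v[0] * (1.0 - lam[0] / big)] + [
            v[n] * (1.0 - lam[n] / big) + v[n - 1] * lam[n - 1] / big
            for n in range(1, n_max + 1)
        ]
        j += 1
        weight *= big * t / j
        covered += weight
        pmf = [p + weight * x for p, x in zip(pmf, v)]
    return pmf

def dispersion(pmf):
    """Variance-to-mean ratio of a distribution on 0, 1, 2, ..."""
    mean = sum(n * p for n, p in enumerate(pmf))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
    return var / mean

# Illustrative parameters: lambda = 2, gamma = 1, t = 1.
ratios = {
    c: dispersion(pure_birth_pmf(lambda k, c=c: 2.0 * (1.0 + k) ** c, 1.0, 150))
    for c in (-0.5, 0.0, 0.5)
}
```

For c = 0 the ratio should be essentially 1 (the Poisson case), while c > 0 and c < 0 should give ratios above and below 1 respectively.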
EXAMPLE: MORAN MODEL
The Moran process models genetic drift in a finite population by keeping track of the number of alleles of a certain type at a biallelic locus in a haploid population of constant size N < ∞. Call the two alleles A and B, and suppose we wish to keep track of the number of A carriers in the population. In the Moran model with selection, carriers of A have fitness α, and carriers of B have fitness β. For the sake of identifiability in a statistical setting, we specify β = 1 and let α denote the relative fitness of A carriers over B carriers. Furthermore, A mutates to B in one generation with probability u, and vice versa with probability v. When an existing individual dies, a new allele is drawn at random. The birth and death rates are
| (60) |
for n = 0, … , N. Forming the surrogate function from (44), we see that maximizing the log-likelihood with respect to the unknowns α, u, and v is difficult. However, we can again construct a minorizing function to separate the parameters in the logarithm terms. We minorize the birth rate as
| (61) |
where
| (62) |
Although (61) and (62) may appear complicated, this minorization has the effect of separating the parameters α and u in the surrogate function, allowing closed-form updates. In a similar way, we minorize the log-death rate log(μn) as
| (63) |
where
| (64) |
We form the complete minorizing function H as
| (65) |
and the surrogate function is Q(θ) = 𝔼(H(θ)|Y, θ(m)). A simple way to proceed is to find updates for each of the unknowns individually, conditional on the previous (mth) estimate of the others, giving a cyclic coordinate ascent algorithm. The update for α is
| (66) |
The update for u is the positive solution of the quadratic equation
| (67) |
when 0 < u < 1. The update for v is obtained by similar manipulations.
EXAMPLE: MAXIMUM A POSTERIORI ESTIMATION FOR THE KENDALL PROCESS
In a Bayesian setting, a prior distribution f(θ) on the unknown parameters θ is given, and we seek to maximize the log-posterior distribution of the parameters, given the data, Pr(θ | Y ) ∝ Pr(Y | θ)f(θ) to obtain the maximum a posteriori (MAP) estimate of θ. Here the surrogate function becomes Q(θ|θ(m)) = 𝔼(ℓ(θ)|Y, θ(m)) + log [f (θ)]. To illustrate, suppose that independent observations from a BDP follow the simple linear model, and we believe that λ and μ are a priori independent and are Gamma-distributed:
| (68) |
Then the unknowns are θ = (λ, μ) and the log-prior for θ is
| (69) |
Ignoring irrelevant terms, the surrogate function becomes
| (70) |
The MAP updates are
| (71a) |
| (71b) |
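The effect of the prior on the updates can be sketched as follows, under the assumption that each Gamma prior is parameterized by a shape and a rate, so that its log-density contributes (shape − 1) log λ − rate·λ to the surrogate. Setting the derivative to zero then adds shape − 1 to the expected event count and the rate to the expected particle time. The parameterization, the function name `kendall_map_step`, and the numbers are assumptions for this illustration and may differ from the form used in (68).

```python
def kendall_map_step(EU, ED, ET, shape_lam, rate_lam, shape_mu, rate_mu):
    """One MAP update for the simple linear BDP with independent Gamma priors.

    Assumes shape/rate parameterization: log f(x) = (shape - 1) log x - rate * x
    up to a constant. Requires births + shape_lam > 1 and deaths + shape_mu > 1.
    """
    births = sum(EU.values())
    deaths = sum(ED.values())
    particle_time = sum(k * time for k, time in ET.items())
    lam = (births + shape_lam - 1.0) / (particle_time + rate_lam)
    mu = (deaths + shape_mu - 1.0) / (particle_time + rate_mu)
    return lam, mu

# Made-up expected statistics and prior settings, for illustration only.
EU = {1: 2.0, 2: 1.0}
ED = {1: 0.5, 2: 1.0}
ET = {1: 1.0, 2: 0.5}
lam_map, mu_map = kendall_map_step(EU, ED, ET, 2.0, 1.0, 3.0, 2.0)
```

With shape 1 and rate 0 the prior is flat and the update reduces to the maximum likelihood M-step.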
EXAMPLE: REGRESSION FOR COUNT DATA
Perhaps the most interesting use of EM algorithms for BDP inference is to provide a unified framework for regression estimation. To illustrate, consider a collection of n independent BDPs Xi(t) with birth rates λk = λi for all k and log λi = Zi⊤β for i = 1, … , n, where Zi is a d × 1 vector of covariates, β is a coefficient vector of corresponding dimension, and μk = 0 for all k. Then letting Xi(0) = 0 and Xi(1) = xi for each i, the log-likelihood becomes
$$\ell(\beta) = \sum_{i=1}^{n} \left[ x_i Z_i^{\top}\beta - \exp\left(Z_i^{\top}\beta\right) - \log(x_i!) \right] \tag{72}$$
This is the log-likelihood for classical Poisson regression, and updates are found using a Newton-Raphson step (Dobson, 2001).
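For concreteness, here is a small pure-Python sketch of Newton-Raphson for the Poisson regression log-likelihood with a log link. The gradient is Σ (xi − exp(Zi⊤β)) Zi and the negative Hessian is Σ exp(Zi⊤β) Zi Zi⊤. The helper names, the Gaussian-elimination solver, the starting value β = 0, and the toy data are all assumptions made for the example.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def poisson_regression(Z, x, n_iter=50):
    """Newton-Raphson for counts x_i with mean exp(Z_i' beta)."""
    d = len(Z[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        rate = [math.exp(sum(zj * bj for zj, bj in zip(z, beta))) for z in Z]
        grad = [sum((xi - ri) * z[j] for z, xi, ri in zip(Z, x, rate))
                for j in range(d)]
        info = [[sum(ri * z[j] * z[k] for z, ri in zip(Z, rate))
                 for k in range(d)] for j in range(d)]
        step = solve(info, grad)            # (negative Hessian)^{-1} gradient
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Toy data: an intercept plus a group indicator; group means are 3 and 10.
Z = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
x = [2, 4, 9, 11]
beta_hat = poisson_regression(Z, x)
```

Because this toy model is saturated at the group level, the fitted rates should match the group means, so exp(β0) is 3 and exp(β0 + β1) is 10.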
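It is possible to formulate an analogous model for the Kendall process. Let λk = k exp(Zi⊤β) and μk = k exp(Zi⊤γ) be the birth and death rates of a BDP Xi(t). The log-likelihood is
$$\ell(\beta, \gamma) = \sum_{i=1}^{n} \left[ U^{(i)} Z_i^{\top}\beta + D^{(i)} Z_i^{\top}\gamma - \left( e^{Z_i^{\top}\beta} + e^{Z_i^{\top}\gamma} \right) T^{(i)}_{\text{particle}} \right] + \text{const} \tag{73}$$
where the statistics U(i), D(i), and T(i)particle (total births, total deaths, and total particle time) correspond to observation i. When the process is discretely-observed, we form the surrogate as before, and find the gradient vector
$$\nabla_{\beta} Q = \sum_{i=1}^{n} \left[ \mathbb{E}\left(U^{(i)} \mid Y_i\right) - e^{Z_i^{\top}\beta}\, \mathbb{E}\left(T^{(i)}_{\text{particle}} \mid Y_i\right) \right] Z_i \tag{74}$$
for β. The Hessian matrix is
$$d^2_{\beta} Q = -\sum_{i=1}^{n} e^{Z_i^{\top}\beta}\, \mathbb{E}\left(T^{(i)}_{\text{particle}} \mid Y_i\right) Z_i Z_i^{\top} \tag{75}$$
Then, the Newton-Raphson update for β becomes
$$\beta^{(m+1)} = \beta^{(m)} - \left[ d^2_{\beta} Q\left(\beta^{(m)}\right) \right]^{-1} \nabla_{\beta} Q\left(\beta^{(m)}\right) \tag{76}$$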
A similar update is available for γ. We contrast the simplicity of the update expressions (76) with the formula for the Kendall process transition probability (9).
INTEGRAL FUNCTIONALS OF BDPS
Many important real-life applications of BDPs can be characterized as questions about the distribution of summary statistics. A common feature of stochastic processes in decision-making contexts is that the parameters estimated by the statistical inference procedure are not always the ones of interest in the application. Often the quantity of interest is a summary statistic related to the time-integral of the process. To illustrate, let g : ℕ → [0,∞) be a function and let S be a set of “taboo” or prohibited states. Suppose the initial state of the BDP is X(0) = i ∈ ℕ\S. Define the functional
$$C_i = \int_0^{\tau_i} g\left(X(t)\right) dt \tag{77}$$
where the upper limit of integration is the first passage time
$$\tau_i = \inf\left\{ t \ge 0 : X(t) \in S \mid X(0) = i \right\} \tag{78}$$
Here, Ci is a functional because it maps a realization of the stochastic process g(X(t)) to its integral. Figure 2 shows an example realization of a BDP and its integral Ci with S = {0}. The left-hand side shows a BDP beginning at X(0) = 1, and ending at X(τ1) = 0. The right-hand plot shows g(X(t)) over the same time interval, and the area under the trajectory is Ci.
Figure 2.
Illustration of the integral of a functional of a general birth-death process (BDP). On the left, a BDP begins at X(0) = 1 and ends when the process reaches the absorbing state 0 just before time t = 2. On the right, C1 is the area under the trajectory of g(X(t)), where g : ℕ → [0,∞) is an arbitrary positive “reward” or “cost” function. The upper limit of integration τ1 is the first passage time to zero, beginning at X(0) = 1.
Expressions like (77) arise often in applied work. For example, epidemiologists usually estimate the parameters (contact/infection rate and recovery rate) of an epidemic process from data, but their objective is to draw inferences about the predictive distribution of the future cost of the epidemic. Operations researchers may estimate the arrival rate λ and service rate μ in a queuing process, but the object of inference is the distribution of customer-hours waited. Traffic engineers may be interested in the number of vehicle-hours waited in models for highway accident delays (Gaver, 1969).
To illustrate the role of integral summaries of BDPs in statistical prediction, let p(c|θ) be the density of Ci given θ. The posterior predictive uncertainty about the statistic is the marginal distribution
$$p(c \mid Y) = \int p(c \mid \theta)\, p(\theta \mid Y)\, d\theta \tag{79}$$
where p(θ|Y ) is the sampling distribution of θ given the realized data Y. In a Bayesian context, p(θ|Y ) is a posterior distribution, and we might estimate p(c|Y ) by a Monte Carlo approximation involving N draws θj ~ p(θ|Y ) via
$$p(c \mid Y) \approx \frac{1}{N} \sum_{j=1}^{N} p(c \mid \theta_j) \tag{80}$$
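The Monte Carlo average can be illustrated on a toy case where the answer is known. Suppose, purely for illustration, that the process is a single particle that dies at rate μ, with g(n) = n and S = {0}; then the integral is just the lifetime, with density μ exp(−μc). If the posterior for μ were Gamma with a given shape and rate, the predictive density would have the closed form shape · rate^shape / (rate + c)^(shape + 1). The posterior, its parameters, and the function name below are all assumptions for this sketch.

```python
import math
import random

def predictive_density(c, draws, density):
    """Monte Carlo average of p(c | theta_j) over posterior draws theta_j."""
    return sum(density(c, theta) for theta in draws) / len(draws)

def lifetime_density(c, mu):
    """Density of the integral for one particle dying at rate mu, g(n) = n."""
    return mu * math.exp(-mu * c)

# Hypothetical Gamma posterior for mu (shape 5, rate 2).
shape, rate = 5.0, 2.0
rng = random.Random(0)
# random.gammavariate takes (shape, scale), so the scale is 1 / rate.
draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(20000)]

c = 1.0
approx = predictive_density(c, draws, lifetime_density)
exact = shape * rate ** shape / (rate + c) ** (shape + 1.0)
```

For a general BDP the density p(c|θj) has no closed form and would itself be computed numerically for each draw, which is where most of the cost of the approximation lies.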
BACKGROUND ON INTEGRALS OF BDPS
Karlin and McGregor (1957a, 1957b) provided the first theoretical tools for working with integral functionals of general BDPs. P. Puri (1966) and P. S. Puri (1968) derive the characteristic function for the joint distribution of a simple linear BDP and its integral and give expressions for moments and limiting distributions (P. Puri, 1972; P. S. Puri, 1971, 1972). McNeil (1970) gives the first results for general BDPs, Gani and McNeil (1971) derive expressions for the joint distribution of a general BDP and its integral, and Kaplan (1974) provides limit theorems for integrals of simple BDPs with immigration. Straightforward methods for moments of integrals of general BDPs using Laplace transforms are also available (Gani & Swift, 2008; Hernández-Suárez & Castillo-Chavez, 1999; Pollett, 2003; Pollett & Stefanov, 2003). However, most analyses of integral functionals of general BDPs are limited to simple analytically tractable models or focused on moments.
Now we consider the problem of computing the distribution of (77). Our emphasis on first-passage times as the upper limit of integration in (77) has two benefits. First, our analyses need not be conditional on an arbitrary time in the future. Second, first passage times allow us to exploit powerful analytic tools that establish a correspondence between transition probabilities and first-passage times, enabling analytic progress on integrals for arbitrary well-behaved processes. Our presentation follows the outline given by McNeil (1970). Let ci(s) = 𝔼[e^(−sCi)] be the Laplace transform of Ci. Note that if X(0) = i ∈ S then τi = 0, Ci = 0, and so ci(s) = 1. Now by an analogous conditioning argument for X(0) = i ∉ S, we re-write the Laplace transform as
$$c_i(s) = \frac{\lambda_i\, c_{i+1}(s) + \mu_i\, c_{i-1}(s)}{\lambda_i + \mu_i + s\, g(i)} \tag{81}$$
that gives
$$s\, g(i)\, c_i(s) = \lambda_i\, c_{i+1}(s) - (\lambda_i + \mu_i)\, c_i(s) + \mu_i\, c_{i-1}(s) \tag{82}$$
Now dividing both sides of the above by g(i), we find that
$$s\, c_i(s) = \lambda_i^{*}\, c_{i+1}(s) - \left(\lambda_i^{*} + \mu_i^{*}\right) c_i(s) + \mu_i^{*}\, c_{i-1}(s) \tag{83}$$
where λ*i = λi/g(i) and μ*i = μi/g(i). Therefore, we see that (83) is simply the backward equation for a modified process with birth and death rates λ*i and μ*i for i ∈ ℕ. The forward equation for the cumulative distribution function of Ci is therefore equivalent to (5) with the modified birth and death rates.
Pollett (2003) gives the conditions, analogous to those for (16), under which this modified process explodes. We note that differentiation of solutions of (83) yields the moments of Ci, as noted by McNeil (1970) and subsequently refined by Hernández-Suárez and Castillo-Chavez (1999), Stefanov and Wang (2000), and Pollett (2003). We refer interested readers to those papers and focus here on results for the distribution of Ci that are more useful in statistical and decision applications.
To take advantage of (83), we modify (29) as follows. Fix S ⊂ ℕ and suppose X(t) is a general BDP with rates {λn} and {μn} with starting state X(0) = i ∈ ℕ\S. Suppose g(n) is a positive function defined for all n ∈ ℕ. Let Y(t) be a general BDP with rates λ*n = λn/g(n) and μ*n = μn/g(n) for all n ∈ ℕ\S, and λ*n = μ*n = 0 for every n ∈ S. Then let P*ij(t) = Pr(Y(t) = j | Y(0) = i). We then have
$$H(c) = \Pr(C_i \le c) = \sum_{j \in S} P^{*}_{ij}(c) \tag{84}$$
If instead of the cumulative distribution function H(c) of Ci, we wish to have the probability density, we could numerically differentiate (84). However, using the properties of the Laplace transform,
$$h(c) = \frac{dH(c)}{dc} = \mathcal{L}^{-1}\left[ s \sum_{j \in S} f^{*}_{ij}(s) \right](c) \tag{85}$$
where f*ij(s) is the Laplace transform of P*ij(t), ℒ−1[·] denotes Laplace inversion, and P*ij(0) = 0 for all j ∈ S since we have assumed i ∉ S.
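The time-change idea can be sketched numerically on a finite state space. The code below assumes, as the division by g(i) above suggests, that the modified process has rates λn/g(n) and μn/g(n) outside S and is absorbed in S, so that Pr(Ci ≤ c) is the probability that the modified process has reached S by "time" c. It uses uniformization rather than Laplace inversion, and the function name, the truncation at `n_max`, and the example rates are assumptions for this illustration.

```python
import math

def integral_cdf(birth, death, g, i, c, n_max, taboo=(0,)):
    """Pr(C_i <= c), where C_i integrates g(X(t)) up to the first visit to taboo.

    Computed as an absorption probability of the time-changed process with
    rates birth(n)/g(n) and death(n)/g(n) on the states 0..n_max.
    """
    states = range(n_max + 1)
    lam = [0.0 if (n in taboo or n == n_max) else birth(n) / g(n) for n in states]
    mu = [0.0 if (n in taboo or n == 0) else death(n) / g(n) for n in states]
    big = max(l + m for l, m in zip(lam, mu))        # uniformization rate
    if big == 0.0:
        return 1.0 if i in taboo else 0.0
    if big * c > 700.0:
        raise ValueError("big * c too large for this simple implementation")
    v = [0.0] * (n_max + 1)
    v[i] = 1.0
    weight = math.exp(-big * c)                      # Poisson(big * c) mass at 0
    cdf = weight * sum(v[n] for n in taboo)
    covered, j = weight, 0
    while covered < 1.0 - 1e-12 and j < 1000000:
        nxt = [0.0] * (n_max + 1)
        for n in states:
            if v[n] == 0.0:
                continue
            nxt[n] += v[n] * (1.0 - (lam[n] + mu[n]) / big)
            if lam[n] > 0.0:
                nxt[n + 1] += v[n] * lam[n] / big
            if mu[n] > 0.0:
                nxt[n - 1] += v[n] * mu[n] / big
        v = nxt
        j += 1
        weight *= big * c / j
        covered += weight
        cdf += weight * sum(v[n] for n in taboo)     # mass already absorbed
    return cdf

# Example: SIS-type rates on 0..N with cost 0.3 per infected per unit time.
N, lam0, mu0, cost = 20, 0.1, 1.5, 0.3
sis_birth = lambda n: lam0 * n * (N - n)
sis_death = lambda n: mu0 * n
cdf_1 = integral_cdf(sis_birth, sis_death, lambda n: cost * n, 5, 1.0, N)
cdf_3 = integral_cdf(sis_birth, sis_death, lambda n: cost * n, 5, 3.0, N)
```

A simple check is a single particle with death rate μ and g(n) = bn: the integral is b times an exponential lifetime, so its distribution function is 1 − exp(−μc/b).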
EXAMPLE: PROBABILISTIC CONTROL OF AN EPIDEMIC
In infectious disease epidemiology, stochastic modeling can give valuable insight into both disease dynamics and optimal intervention strategies (Ball, 1986; Wickwire, 1977). The total cost of an infectious disease epidemic is proportional to the area under the time trajectory of the number of infected people (Gani & Jerwood, 1972; Jerwood, 1970). To illustrate, we model the number of infected persons in a homogeneously mixing population as a type of general BDP. This simple model, called the susceptible-infected-susceptible (SIS) model, keeps track of the number of infected in a finite population of size N (N. T. Bailey, 1975). If there are currently n < N infected persons in the population, the rate of new infections is proportional to the product of the number infected n and susceptible N − n. The contact/transmission rate between infected and susceptible persons is λ. Infected persons recover and revert to susceptible status with constant per-person rate μ. For a SIS process X(t), the addition and removal rates are
| (86) |
where ε is a positive control parameter related to vaccination or some other public health intervention strategy. Suppose the initial number of infected is X(0) = i ≤ N and we are interested in the total cost of the epidemic until its eventual extinction, so S = {0}. Let the cost of managing the epidemic per unit time be aε. Additionally, let the cost per infected person per unit time be b > 0, so the cost function becomes g(n) = aε + bn. Then the total cost is
$$C_i = \int_0^{\tau_i} \left[ a\varepsilon + b\, X(t) \right] dt \tag{87}$$
where τi is the time to extinction of the epidemic.
Most optimal control models seek a policy that minimizes the expected total cost, corresponding to the expectation of (77) under certain conditions on the intervention and cost functions (Cai & Luo, 1994; Clancy, 1999; Guo & Hernández-Lerma, 2009; Lefévre, 1981). The availability of probability distributions for the total cost allows us to seek the minimal intervention policy that guarantees that the total cost of the epidemic is small with high probability. Let X(t) be the process with rates given by (86) for a certain control setting ε. Then we wish to find the smallest ε such that
$$\Pr(C_i < c) \ge 1 - \alpha \tag{88}$$
where c is a desired bound on the total cost, and 0 < α < 1 is a small probability. Assuming this probability is continuous and increases monotonically with ε near 1 − α, it is straightforward to find the smallest ε that satisfies (88).
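Under the stated monotonicity assumption, the search for the smallest control setting reduces to bisection on ε. The sketch below treats the probability as a black-box function `prob(eps)`; in an application it would be the distribution function of the total cost evaluated at c, whereas here a made-up increasing function stands in so the example is self-contained. The function name, the bracketing interval, and the stand-in are assumptions.

```python
import math

def smallest_control(prob, target, lo, hi, tol=1e-6):
    """Smallest eps in [lo, hi] with prob(eps) >= target, by bisection.

    Assumes prob is continuous and nondecreasing on [lo, hi].
    """
    if prob(hi) < target:
        raise ValueError("target not reached on the bracketing interval")
    if prob(lo) >= target:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob(mid) >= target:
            hi = mid            # mid already meets the bound; search below it
        else:
            lo = mid            # bound not met; the answer lies above mid
    return hi

# Stand-in for Pr(C_i < c) as a function of eps (illustration only).
toy_prob = lambda eps: 1.0 - math.exp(-eps)
eps_star = smallest_control(toy_prob, target=0.95, lo=0.0, hi=10.0)
```

For this stand-in the exact answer is log(20), which the bisection recovers to within the tolerance. Each evaluation of the real probability requires one computation of the cost distribution, so the number of bisection steps matters in practice.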
Figure 3 shows how to find the minimal ε for a SIS process with N = 100 individuals, X(0) = 50, infectivity λ = 0.1, recovery rate μ = 8, control cost a = 0.1, and per-infected cost b = 0.3 per unit time. The top traces show the cumulative distribution function of the total cost for ε = 0, 0.5, 1, 1.5, 2. The vertical gray line shows Ci = 7, and we wish to keep the total cost less than 7 with probability 1− α = 0.95. The bottom trace shows Pr(Ci < 7) as a function of ε. The horizontal gray dashed line shows 0.95 probability, and the vertical gray dashed line shows the smallest value of ε (ε ≈ 3.4) that achieves this bound.
Figure 3.
Probabilistic control of a stochastic SIS epidemic. At top, the distribution of total epidemic cost Ci for different values of a control parameter ε. The gray vertical line is at Ci = 7, and we wish to keep Ci < 7 with high probability. At bottom, the probability that Ci < 7 as a function of the control parameter ε. The horizontal gray dashed line denotes 0.95, and the vertical dashed line is the smallest ε that achieves Pr(Ci < 7) > 0.95; this yields ε ≈ 3.4. In this way, we can easily find the smallest value of a control parameter that bounds the probability that the cost of the epidemic will exceed a certain threshold.
DISCUSSION: COMPUTATION AND LIKELIHOOD-BASED INFERENCE FOR BDPS
BDPs are vital tools for modeling stochastic counting processes in epidemiology, evolution, ecology, chemistry, physics, and other fields. Modeling with BDPs is often straightforward; by considering rates of addition of new particles and removal of existing particles, conditional on the number already present, researchers can specify the birth and death rates λk and μk. The ease of modeling with BDPs stands in stark contrast to the computational difficulty of inference using stochastic realizations of BDPs. Routine use of BDPs in statistical settings has been thwarted by intractable likelihoods and burdensome computations. A unified perspective on BDPs with arbitrary birth and death rates has remained elusive, until recently.
Laplace transforms of transition probabilities provide the essential analytic tools for bridging this gap in practice. Our approach for computing transition probabilities (likelihoods) in (25) and conditional expectations in the E-step (46) is general, robust, and computationally efficient. Laplace transforms of first-passage times also play an important role in finding the distribution of integral functionals of BDPs in applications. As a theoretic tool, this Laplace-perspective is not new; Karlin and McGregor (1957a, 1958a, 1957b) discuss the fundamental importance of Laplace transforms for analysis of BDPs. More recent results related to combinatorial properties of BDPs also rely on Laplace transforms (Flajolet & Guillemin, 2000; Guillemin & Pinchon, 1998, 1999).
Extensions to the general approach presented here have been developed for multivariate BDPs, where progress has been slower and analytic approaches to reducing computation are less readily available. Analytic formulae of transition probabilities for the simplest multivariate Kendall-like processes – called monomolecular reaction systems – have only recently been derived (Jahnke & Huisinga, 2007). Notably, Xu, Guttorp, Kato-Maeda, and Minin (2015) propose a fast algorithm to compute the transition probabilities of multi-type branching processes using a generating function approach. However, this method is only applicable for processes whose transition rates are linear. The first result for non-linear multivariate BDPs has recently been established by Ho, Xu, Crawford, Minin, and Suchard (2017), who consider a subclass of bivariate BDPs called birth/birth-death processes and develop an efficient computation method for the transition probabilities by deriving recursion formulae for their Laplace transforms. Nonetheless, the problem of computing the transition probabilities of general multivariate BDPs remains open and will be an exciting research direction for the future.
In this review, we have outlined new tools for practical likelihood-based inference of BDP parameters under discrete and continuous observation of the process. In particular, BDP generalizations of Poisson regression yield more flexible and easy-to-fit models for count data. We have intentionally limited our discussion to basic computation of likelihoods, algorithms for maximum likelihood estimation, and finding the distribution of integral summary statistics for general BDPs. But these are only the first steps toward a comprehensive theory of estimation for BDPs. Ideally, we would like to see an analysis of identifiability, consistency and other statistical properties, like the groundbreaking work of Guttorp (1991) for Galton-Watson branching processes. We hope this review will stimulate statistical research related to BDPs with a view to bringing this rich class of stochastic models into wider use by applied scientists.
Acknowledgments
FWC was supported by NIH grants DP2 OD022614 and T32GM008185. LSTH was supported by startup funds from Dalhousie University and the Canada Research Chairs program. MAS was supported by NIH grants R01 AI107034 and R01 HG006139, and NSF grants DMS 1264153 and IIS 1251151. We thank Yiyi Liu for helpful comments.
References
- Abate J, Whitt W. Computing Laplace transforms for numerical inversion via continued fractions. INFORMS J Comput. 1999;11(4):394–405.
- Andersson H, Britton T. Stochastic Epidemic Models and their Statistical Analysis. Springer; New York: 2000.
- Anscombe FJ. Sequential estimation. Journal of the Royal Statistical Society B. 1953;15(1):1–29.
- Athreya KB, Ney PE. Branching processes. Courier Corporation; 2004.
- Bailey NT. The mathematical theory of infectious diseases and its applications. 2. Charles Griffin & Company Ltd; 5a Crendon Street, High Wycombe, Bucks HP13 6LE: 1975.
- Bailey NTJ. The Elements of Stochastic Processes with Applications to the Natural Sciences. Wiley; New York: 1964.
- Ball F. A unified approach to the distribution of total size and total area under the trajectory of infectives in epidemic models. Advances in Applied Probability. 1986:289–310.
- Bankier JD, Leighton W. Numerical continued fractions. Am J Math. 1942;64(1):653–668.
- Bladt M, Sorensen M. Statistical inference for discretely observed Markov jump processes. Journal of the Royal Statistical Society B. 2005;67(3):395–410.
- Blanch G. Numerical evaluation of continued fractions. SIAM Rev. 1964;6(4):383–421.
- Bordes G, Roehner B. Application of Stieltjes theory for S-fractions to birth and death processes. Advances in Applied Probability. 1983;15(3):507–530.
- Cai H, Luo X. Stochastic control of an epidemic process. International Journal of Systems Science. 1994;25(4):821–828.
- Clancy D. Optimal intervention for epidemic models with general infection and removal rate functions. Journal of Mathematical Biology. 1999;39(4):309–331. doi: 10.1007/s002850050193.
- Craviotto C, Jones WB, Thron WJ. A survey of truncation error analysis for Padé and continued fraction approximants. Acta Appl Math. 1993;33:211–272.
- Crawford FW, Minin VN, Suchard MA. Estimation for general birth-death processes. Journal of the American Statistical Association. 2014;109(506):730–747. doi: 10.1080/01621459.2013.866565.
- Crawford FW, Stutz TC, Lange K. Coupling bounds for approximating birth–death processes by truncation. Statistics & probability letters. 2016;109:30–38. doi: 10.1016/j.spl.2015.10.013.
- Crawford FW, Suchard MA. Transition probabilities for general birth-death processes with applications in ecology, genetics, and evolution. Journal of Mathematical Biology. 2012;65:553–580. doi: 10.1007/s00285-011-0471-z.
- Crawford FW, Weiss RE, Suchard MA. Sex, lies, and self-reported counts: Bayesian mixture models for longitudinal heaped count data via birth-death processes. The Annals of Applied Statistics. 2015;9(2):572–596. doi: 10.1214/15-AOAS809.
- Crawford FW, Zelterman D. Markov counting models for correlated binary responses. Biostatistics. 2015;16(3):427–440. doi: 10.1093/biostatistics/kxv006.
- Cuyt A, Petersen V, Verdonk B, Waadeland H, Jones W. Handbook of Continued Fractions for Special Functions. Springer Berlin/Heidelberg; 2008.
- Darwin JH. The behaviour of an estimator for a simple birth and death process. Biometrika. 1956;43(1):23–31.
- Dauxois J-Y. Bayesian inference for linear growth birth and death processes. Journal of Statistical Planning and Inference. 2004;121(1):1–19.
- Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B. 1977:1–38.
- Demuth JP, De Bie T, Stajich JE, Cristianini N, Hahn MW. The evolution of mammalian gene families. PloS One. 2006;1(1):e85. doi: 10.1371/journal.pone.0000085.
- Dobson AJ. An introduction to generalized linear models. CRC press; 2001.
- Doss CR, Suchard MA, Holmes I, Kato-Maeda M, Minin VN. Fitting birth-death processes to panel data with applications to bacterial DNA fingerprinting. The Annals of Applied Statistics. 2013;7(4):2315. doi: 10.1214/13-AOAS673.
- Faddy M, Bosch R. Likelihood-based modeling and analysis of data underdispersed relative to the Poisson distribution. Biometrics. 2001;57(2):620–624. doi: 10.1111/j.0006-341x.2001.00620.x.
- Faddy MJ. Extended poisson process modelling and analysis of count data. Biometrical Journal. 1997;39(4):431–440. doi: 10.1002/bimj.201100214.
- Feller W. An Introduction to Probability Theory and its Applications. Wiley; New York: 1971.
- Flajolet P, Guillemin F. The formal theory of birth-and-death processes, lattice path combinatorics and continued fractions. Advances in Applied Probability. 2000;32(3):750–778.
- Fudenberg D, Imhof L, Nowak MA, Taylor C. Stochastic evolution as a generalized moran process. Unpublished manuscript. 2004.
- Gani J, Jerwood D. The cost of a general stochastic epidemic. Journal of Applied Probability. 1972;9(2):257–269.
- Gani J, McNeil DR. Joint distributions of random variables and their integrals for certain birth-death and diffusion processes. Advances in Applied Probability. 1971;3(2):339–352.
- Gani J, Swift R. A simple approach to the integrals under three stochastic processes. Journal of Statistical Theory and Practice. 2008;2(4):559–568.
- Gaver D. Highway delays resulting from flow-stopping incidents. Journal of Applied Probability. 1969;6(1):137–153.
- Guillemin F, Pinchon D. Continued fraction analysis of the duration of an excursion in an M/M/∞ system. Journal of Applied Probability. 1998;35(1):165–183.
- Guillemin F, Pinchon D. Excursions of birth and death processes, orthogonal polynomials, and continued fractions. Journal of Applied Probability. 1999;36(3):752–770.
- Guo X, Hernández-Lerma O. Continuous-time Markov decision processes. Springer; 2009.
- Guttorp P. Statistical inference for branching processes. Wiley-Interscience; 1991.
- Hernández-Suárez C, Castillo-Chavez C. A basic result on the integral for birth-death Markov processes. Mathematical Biosciences. 1999;161(1):95–104. doi: 10.1016/s0025-5564(99)00034-6.
- Ho LST, Xu J, Crawford FW, Minin VN, Suchard MA. Birth/birth-death processes and their computable transition probabilities with biological applications. Journal of Mathematical Biology. 2017 doi: 10.1007/s00285-017-1160-3. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hobolth A, Jensen JL. Statistical inference in evolutionary models of DNA sequences via the EM algorithm. Statistical Applications in Genetics and Molecular Biology. 2005;4(1) doi: 10.2202/1544-6115.1127. [DOI] [PubMed] [Google Scholar]
- Hobolth A, Stone EA. Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution. Annals of Applied Statistics. 2009;3(3):1024–1231. doi: 10.1214/09-AOAS247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holmes I, Bruno WJ. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics. 2001 Sep;17(9):803–820. doi: 10.1093/bioinformatics/17.9.803. [DOI] [PubMed] [Google Scholar]
- Holmes I, Rubin G. An expectation maximization algorithm for training hidden substitution models. Journal of Molecular Biology. 2002;317(5):753–764. doi: 10.1006/jmbi.2002.5405. [DOI] [PubMed] [Google Scholar]
- Ismail MEH, Letessier J, Valent G. Linear birth and death models and associated Laguerre and Meixner polynomials. J Approx Theory. 1988;55(3):337–348. [Google Scholar]
- Jahnke T, Huisinga W. Solving the chemical master equation for monomolecular reaction systems analytically. Journal of Mathematical Biology. 2007;54(1):1–26. doi: 10.1007/s00285-006-0034-x. [DOI] [PubMed] [Google Scholar]
- Jerwood D. A note on the cost of the simple epidemic. Journal of Applied Probability. 1970;7(2):440–443. [Google Scholar]
- Kaplan N. Limit theorems for the integral of a population process with immigration. Stochastic Processes and their Applications. 1974;2(3):281–294. [Google Scholar]
- Karev GP, Wolf YI, Rzhetsky AY, Berezovskaya FS, Koonin EV. Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evolutionary Biology. 2002;2(1):18. doi: 10.1186/1471-2148-2-18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karlin S, McGregor J. The classification of birth and death processes. Transactions of the American Mathematical Society. 1957a;86(2):366–400. [Google Scholar]
- Karlin S, McGregor J. Linear growth, birth and death processes. Journal of Mathematics and Mechanics. 1958a:643–662. [Google Scholar]
- Karlin S, McGregor J. Many server queueing processes with Poisson input and exponential service times. Pacific Journal of Mathematics. 1958b;8(1):87–118.
- Karlin S, McGregor JL. The differential equations of birth-and-death processes, and the Stieltjes moment problem. Transactions of the American Mathematical Society. 1957b;85(2):489–546.
- Karlin S, Taylor HM. A First Course in Stochastic Processes. Academic Press; 1975.
- Keiding N. Maximum likelihood estimation in the birth-and-death process. Annals of Statistics. 1975;3(2):363–372.
- Kendall DG. On the generalized “birth-and-death” process. The Annals of Mathematical Statistics. 1948:1–15.
- Kendall DG. Stochastic processes and population growth. Journal of the Royal Statistical Society. Series B (Methodological). 1949;11(2):230–282.
- Kimmel M, Axelrod DE. Branching processes in biology. Springer Publishing Company, Incorporated; 2016.
- Kingman JF. On the genealogy of large populations. Journal of Applied Probability. 1982;19(A):27–43.
- Klar B, Parthasarathy P, Henze N. Zipf and Lerch limit of birth and death processes. Probability in the Engineering and Informational Sciences. 2010;24(1):129–144.
- Krone SM, Neuhauser C. Ancestral processes with selection. Theoretical Population Biology. 1997;51:210–237. doi: 10.1006/tpbi.1997.1299.
- Lange K. A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society B. 1995;57(2):425–437.
- Lange K. Applied Probability. 2nd ed. Springer; New York: 2010a.
- Lange K. Numerical Analysis for Statisticians. 2nd ed. Springer; New York: 2010b.
- Lee J, Weiss RE, Suchard MA. Using a birth-death process to account for reporting errors in longitudinal self-reported counts of behavior. Unpublished UCLA Biostatistics Technical Report. 2011.
- Lefèvre C. Optimal control of a birth and death epidemic process. Operations Research. 1981;29(5):971–982. doi: 10.1287/opre.29.5.971.
- Lenin R, Parthasarathy P. A birth-death process suggested by a chain sequence. Computers & Mathematics with Applications. 2000;40(2–3):239–247.
- Lorentzen L, Waadeland H. Continued Fractions with Applications. North-Holland; Amsterdam: 1992.
- Mayrose I, Barker MS, Otto SP. Probabilistic models of chromosome number evolution and the inference of polyploidy. Systematic Biology. 2010;59(2):132–144. doi: 10.1093/sysbio/syp083.
- McFarland CD, Mirny LA, Korolev KS. Tug-of-war between driver and passenger mutations in cancer and other adaptive processes. Proceedings of the National Academy of Sciences. 2014;111(42):15138–15143. doi: 10.1073/pnas.1404341111.
- McNeil D. Integral functionals of birth and death processes and related limiting distributions. Annals of Mathematical Statistics. 1970:480–485.
- Metzner P, Dittmer E, Jahnke T, Schütte C. Generator estimation of Markov jump processes. Journal of Computational Physics. 2007;227(1):353–375.
- Moran P. Estimation methods for evolutive processes. Journal of the Royal Statistical Society. Series B (Methodological). 1951:141–146.
- Moran P. The estimation of the parameters of a birth and death process. Journal of the Royal Statistical Society. Series B (Methodological). 1953:241–245.
- Moran PAP. Random processes in genetics. Mathematical Proceedings of the Cambridge Philosophical Society. 1958;54:60–71.
- Murphy J, O’Donohoe M. Some properties of continued fractions with applications in Markov processes. IMA Journal of Applied Mathematics. 1975;16(1):57–71.
- Nee S. Birth-death models in macroevolution. Annual Review of Ecology, Evolution, and Systematics. 2006;37:1–17.
- Nee S, May RM, Harvey PH. The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London B: Biological Sciences. 1994;344(1309):305–311. doi: 10.1098/rstb.1994.0068.
- Neuts MF. Algorithmic probability: A collection of problems (Stochastic Modeling Series). Chapman and Hall/CRC; 1995.
- Norris JR. Markov chains (No. 2008). Cambridge University Press; 1998.
- Novozhilov AS, Karev GP, Koonin EV. Biological applications of the theory of birth-and-death processes. Briefings in Bioinformatics. 2006;7(1):70–85. doi: 10.1093/bib/bbk006.
- Nowak MA, Michor F, Iwasa Y. The linear process of somatic evolution. Proceedings of the National Academy of Sciences. 2003;100(25):14966–14969. doi: 10.1073/pnas.2535419100.
- Nowak MA, Sasaki A, Taylor C, Fudenberg D. Emergence of cooperation and evolutionary stability in finite populations. Nature. 2004;428(6983):646. doi: 10.1038/nature02414.
- Ohkubo J. Karlin–McGregor-like formula in a simple time-inhomogeneous birth–death process. Journal of Physics A: Mathematical and Theoretical. 2014;47(40):405001.
- Parthasarathy P, Lenin R, Schoutens W, Van Assche W. A birth and death process related to the Rogers–Ramanujan continued fraction. Journal of Mathematical Analysis and Applications. 1998;224(2):297–315.
- Pollett P. Integrals for continuous-time Markov chains. Mathematical Biosciences. 2003;182(2):213–225. doi: 10.1016/s0025-5564(02)00161-x.
- Pollett P, Stefanov V. A method for evaluating the distribution of the total cost of a random process over its lifetime. International Congress on Modelling and Simulation. 2003;4:1863–1867.
- Puri P. On the homogeneous birth-and-death process and its integral. Biometrika. 1966;53(1–2):61–71.
- Puri P. A method for studying the integral functional of stochastic processes with applications II. Sojourn time distributions for Markov chains. Probab Theory Rel. 1972;23(2):85–96.
- Puri PS. Some further results on the birth-and-death process and its integral. Mathematical Proceedings of the Cambridge Philosophical Society. 1968;64:141–154.
- Puri PS. A method for studying the integral functionals of stochastic processes with applications: I. Markov chain case. Journal of Applied Probability. 1971;8(2):331–343.
- Puri PS. A method for studying the integral functionals of stochastic processes with applications III. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 3. University of California Press; 1972. pp. 481–500.
- Reed WJ, Hughes BD. A model explaining the size distribution of gene and protein families. Mathematical Biosciences. 2004;189(1):97–102. doi: 10.1016/j.mbs.2003.11.002.
- Renshaw E. Modelling Biological Populations in Space and Time. Cambridge University Press; 1993.
- Renshaw E. Stochastic Population Processes: Analysis, Approximations, Simulations. Oxford University Press; 2011.
- Reynolds JF. On estimating the parameters of a birth-death process. Australian & New Zealand Journal of Statistics. 1973;15(1):35–43.
- Rosenberg NA, Tsolaki AG, Tanaka MM. Estimating change rates of genetic markers using serial samples: applications to the transposon IS6110 in Mycobacterium tuberculosis. Theoretical Population Biology. 2003;63(4):347–363. doi: 10.1016/s0040-5809(03)00010-8.
- Ross SM. Stochastic processes. 2nd ed. Wiley; 1995.
- Sagitov S. Linear-fractional branching processes with countably many types. Stochastic Processes and their Applications. 2013;123(8):2940–2956.
- Stefanov V, Wang S. A note on integrals for birth-death processes. Mathematical Biosciences. 2000;168(2):161–165. doi: 10.1016/s0025-5564(00)00046-8.
- Tan WY, Piantadosi S. On stochastic growth processes with application to stochastic logistic growth. Statistica Sinica. 1991;1:527–540.
- Thorne J, Kishino H, Felsenstein J. An evolutionary model for maximum likelihood alignment of DNA sequences. Journal of Molecular Evolution. 1991 Aug;33(2):114–124. doi: 10.1007/BF02193625.
- Wall HS. Analytic Theory of Continued Fractions. D. Van Nostrand Company, Inc; New York: 1948.
- Wallis J. Opera Mathematica, Volume 1. Oxoniae e Theatro Sheldoniano. 1972. Reprinted by Georg Olms Verlag, Hildesheim, New York.
- Wickwire K. Mathematical models for the control of pests and infectious diseases: a survey. Theoretical Population Biology. 1977;11(2):182–238. doi: 10.1016/0040-5809(77)90025-9.
- Wolff RW. Problems of statistical inference for birth and death queuing models. Operations Research. 1965;13(3):343–357.
- Xu J, Guttorp P, Kato-Maeda M, Minin VN. Likelihood-based inference for discretely observed birth–death-shift processes, with applications to evolution of mobile genetic elements. Biometrics. 2015;71(4):1009–1021. doi: 10.1111/biom.12352.



