Skip to main content
NIHPA Author Manuscripts logoLink to NIHPA Author Manuscripts
. Author manuscript; available in PMC: 2012 Aug 1.
Published in final edited form as: Theor Popul Biol. 2011 Apr 2;80(1):1–15. doi: 10.1016/j.tpb.2011.03.004

Derivatives of the Stochastic Growth Rate

David Steinsaltz 1, Shripad Tuljapurkar 2, Carol Horvitz 3
PMCID: PMC3186700  NIHMSID: NIHMS287297  PMID: 21463645

Abstract

We consider stochastic matrix models for population driven by random environments which form a Markov chain. The top Lyapunov exponent a, which describes the long-term growth rate, depends smoothly on the demographic parameters (represented as matrix entries) and on the parameters that define the stochastic matrix of the driving Markov chain. The derivatives of a — the “stochastic elasticities” — with respect to changes in the demographic parameters were derived by Tuljapurkar (1990). These results are here extended to a formula for the derivatives with respect to changes in the Markov chain driving the environments. We supplement these formulas with rigorous bounds on computational estimation errors, and with rigorous derivations of both the new and the old formulas.

1 Introduction

Stochastic matrix models for structured populations are widely used in evolutionary biology, demographic forecasting, ecology, and population viability analysis (e.g., Tuljapurkar (1990); Lee and Tuljapurkar (1994); Morris and Doak (2002); Caswell (2001); Lande et al. (2003)). In these models, a discrete-time stochastic process drives changes in environmental conditions that determine the population's stage-transition rates (survival, fertility, growth, regression and so on). Population dynamics are described by a product of randomly chosen population projection matrices. In most biological situations the population's stage structure converges to a time-varying but stable structure Cohen (1977), and in the long run the population grows at a stochastic growth rate a that is not random and is the leading Lyapunov exponent of the random product of population projection matrices Furstenberg and Kesten (1960a); Cohen (1977); Lange (1979); Lange and Holmes (1981); Tuljapurkar and Orzack (1980). This growth rate a is of considerable biological interest, as a fitness measure for a stage-structured phenotype Tuljapurkar (1982), as a determinant of population viability and persistence Tuljapurkar and Orzack (1980); Morris and Doak (2002); Lande et al. (2003), and in a variety of invasion problems in evolution and epidemiology Metz et al. (1992).

The map between environments and projection matrices describes how pheno-type change depends on the environment — the phenotypic norm of response — and we are often interested in how populations respond to changes in, say, the mean or variance of the projection matrix elements. Such questions are answered by computing the derivatives of a with respect to changes in the projection matrices, using a formula derived by Tuljapurkar (1990). Tuljapurkar et al. (2003) called these derivatives stochastic elasticities, to contrast with the elasticity of the dominant eigenvalue of a fixed projection matrix to the elements of that matrix (Caswell, 2001). Stochastic elasticity has been used to examine evolutionary questions (Haridas and Tuljapurkar, 2005) and the effects of climate change (Morris et al., 2008). At the same time, a is also a function of the stochastic process that drives environments. Many processes, such as climate change (Boyce et al., 2006), will result in changes in the frequencies of, or the probabilities of transition between, environmental states. How is a affected by a change in the pattern and distribution of environments, rather than by a change in the population projection matrices? To answer this question, we consider a model in which the environment makes transitions among several discrete states, according to a Markov chain. Then what we want is the derivative of a with respect to changes in the transition probabilities of this Markov chain. This derivative exists (at least away from the boundaries of the space of stochastic matrices), and in fact we know from Peres (1992) that a is an analytic function of the parameters of both the projection matrices and the parameters defining the stochastic matrix, in an open neighborhood of the set of stochastic matrices. In deterministic models (Caswell, 2001), the growth rate is represented as λ = er; then sensitivities are derivatives of the form (∂λ/∂x) with respect to a parameter x whereas elasticities are proportional derivatives of the form (∂r/∂ log x). In stochastic models we compute derivatives of a, and these can be used to compute elasticities (as in Tuljapurkar et al. (2003)) or sensitivities.

Our first contribution here is a new formula for computing the derivative of a with respect to changes in the transition probabilities of the environmental Markov chain, given in abstract form as equation (40). To obtain this result we show how an initial environmental state affects future growth, using coupling and importance sampling; this analysis may be of independent interest. Even with a formula in hand we must compute derivatives of a by numerical simulation which is subject to both sampling (Monte Carlo) error and bias. Our second contribution here is to show how one can bound these estimation errors. Our third contribution is a rigorous proof of the heuristically derived formula given by Tuljapurkar (1990) for the derivatives of a to the elements of the population projection matrices.

In Section 2 of this paper we set out the model and assumptions, the approach to finding derivatives, along with necessary facts about the convergence of population structures and distributions. In Section 3 we discuss systematic and sampling errors and show how we can bound them. We illustrate this approach in Section 4 by presenting bounds (in Theorem 1) for simulation estimates of the stochastic growth rate a and (in Theorem 2) for the derivatives of a with respect to projection matrix elements. In Section 5 we define a measure of the effect of an initial environmental state on subsequent population growth and show how to estimate this measure using coupling arguments. Section 6 presents (in Theorem 4) the formula, algorithm, and error bounds for the derivative of a with respect to the elements of the Markov chain that drives environments. We end by discussing how these theorems can be applied and some related issues concerning parameter estimation in such models. Proofs are in the Appendix.

2 The Model

We consider a population whose individuals exist in K different stages (these may be, for example, ages, developmental stages or size classes). Newborns are in stage i = 1. The progression between stages occurs at discrete time intervals at rates that depend on the environment in each time interval. The environment et in period t is in one of M possible states; we denote the set of possible environments by M = {1, …, M}. Individuals in stage i at time t move to stage j at a rate Xet+1(j, i). These rates are elements of a nonnegative population projection matrix, and at time t when the environment is et this matrix is denoted by Xet; there are M such matrices, one for each environmental state. We assume that allocation of individuals to classes and the identification of environment states are certain. We also assume that the total number of individuals in the population is large enough that we can ignore sampling variation. Successive environments are chosen according to a Markov process with transition matrix P whose elements are P(e,e˘) and whose stationary distribution is ν = {ν(e)}. We follow the standard convention for Markov chains, that P(e,e˘) represents the probability of a transition from state e to state e˘; note that this is the reverse of the convention used in matrix population models.

To guarantee demographic weak ergodicity (Cohen 1977) we assume that

  • (i)

    There exists some R > 0 such that any product Xe1XeR has all entries positive. This implies, in particular, that each population projection matrix is row-allowable and column-allowable. That is, every row and every column has at least one positive entry.

  • (ii)

    The chain is ergodic (so transitive and aperiodic), and environments are in the stationary distribution of P, which will also be denoted by ν(e).

The population in year t is represented by a vector NtR+K{(nt(1),,nt(K))T:nt(i)>0}. The superscript T will always mean transpose; here it indicates that population vectors are column vectors. (There may be population classes early on that have 0 members; condition (ii) above forces all classes eventually to have positive membership, and so we assume without loss of generality that we start with all population classes occupied.) The population structure changes according to Nt+1 = Xet+1Nt, and

Nt=XetXet1Xe1N0. (1)

The normalized population structure Nt/(Σi Nt(i)) does not converge to a fixed limit (as it would if the environment were constant) but it does converge in distribution.

At each stage i, we define

alimtt1logNt(i)limtt1logΣiNt(i)ΣiN0(i) (2)

A fundamental result in this area of research is the Furstenberg-Kesten Theorem Furstenberg and Kesten (1960b), which tells us that the long-run growth rate exists and is not random. This a is called the “stochastic growth rate”. The proof of this fact is fairly straightforward, within the framework of modern Markov chain theory. We present a version in section 3.1 and a proof in the appendix, both for the reader's convenience and because it illustrates some of the important basic ideas that we draw on throughout this work.

The main result of this paper is formula (40), which represents the derivative of a with respect to an incremental change in the probability of transitioning from e to e~. If we denote this derivative by Ae,e~, then the change corresponding to shifting P in the direction of a matrix W is a linear functional

da(P+W)d=e,e~MAe,e~We,e~.

We note that, while the existence of such a derivative follows from the general theory of Peres (1992), the computable formula is new.

Counting the K2 parameters in each matrix, there are at most (MK2 + M2) parameters. While all parameters must be nonnegative, and all elements of the population projection matrices but the birth rates must be ≤ 1, the only universal constraint is ΣeMP(e,e˘)=1. (There may, however, be further constraints imposed, as some transitions may be impossible. If we are considering age-structured populations, the matrices are Leslie matrices, each with only 2K − 1 potentially nonzero parameters.) The sensitivities we examine are derivatives of a with respect to these parameters. We will define a one-dimensional family of population matrices or environment transition matrices, and we refer to change “in the direction of” the derivative of this family. This will be made precise in the analyses that follow.

3 Technical background

3.1 Contraction

We denote the K-dimensional column vector with 1's in all places by 1. By default we use the L1 norm ∥x∥ := Σ |xi| when x is a vector in RK, and write Xi,j=1KXi,j when X is a K × K matrix. Note that ∥X∥ = ∥X1∥ when X is a nonnegative matrix. Our assumptions imply that there are positive constants k^ and r^, such that for any environments e1, …, em,

logXemXe1k^+mr^ (3)

We use the Hilbert projective metric ρ, described in Golubitsky et al. (1975) and in section 3.1 of Seneta (1981), defined by

ρ(x,y)logmax1iKx(i)y(i)+logmax1iKy(i)x(i). (4)

This is a pseudometric on R+K that is a metric on S{(x(1),,x(K))T:x(i)>0} and Σx(i) = 1}. The distance between two vectors is defined by the ray from the origin; that is, ρ(x, y) = ρ(x/∥x∥, y/∥y∥). It is straightforward to show — cf. Bushell (1973) — that

12[min{x(i)}+min{y(i)}]eρ(x,y)ρ(x,y)xyeρ(x,y)1

for any x, yS. (The upper bound is shown by Bushell with respect to the Euclidean norm, but the same argument holds for any Lp norm.) Thus, convergence in the projective metric implies that the projections onto S converge in the standard norms. It is also straightforward to show that

max1iKlogx(i)y(i)ρ(x,y)+logxy. (5)

The Birkhoff contraction coefficient (cf. Seneta (1981)) is defined for nonnegative matrices X as

τB(X)supuvSρ(Xu,Xv)ρ(x,v). (6)

It is clear that τB is sub-multiplicative (that is, τB(XY) ≤ τB(XB(Y)). We also have the formula (Theorem 3.12 of Seneta (1981))

τB(X)=1ϕ(X)121+ϕ(X)12,whereϕ(X)={mini,j,k,Xi,kXjXjkXiifX>0,0ohterwise.} (7)

(This depends on the assumption that X is row-allowable. We will be applying this result only to cases in which X is positive, so our bound on τB will be strictly less than 1.)

An immediate consequence is Lemma 3.1.

Lemma 3.1. For any fixed R, set rmaxe1,,eRMτB(XeRXe1)1R. Then for any e1,,enM,

τB(XenXe1)k1rn, (8)

where k1 := r1/R−1. In particular, if R is chosen so that every product XeR … Xe1 is positive, then r < 1.

We also know that for any u, u˘S, and vT any positive row vector,

log min{u(i)u˘(i)}logvTuvTu˘log max{u(i)u˘(i)}.

Since the left-hand side is ≤ 0 and the right-hand side ≥ 0, so

logvTulogvTu˘ρ(u,u˘). (9)

Following Lemma 1 of Lange (1979), we define a compact subset US which is stable under the transformations uXeu/∥Xeu∥ for any eM and includes the vector 1/K, as well as all vectors of the form XeRXe1 y/∥XeRXe1 y∥ where yS; and a compact subset VST which is stable under the transformations vTvT Xe/∥vT Xe∥, and includes the vector 1T/K as well as all vectors of the form yT XeRXe1/∥yT XeRXe1∥ where yS. Note that we could take U to be any Un defined to be the union of all products Xe1Xen S. These are a decreasing sequence of compact sets, with XeUnUn+1. whose limit U is in general a fractal set, which is the support of the stationary distribution π. For practicality we are likely to work with a U which is a finite-stage approximation to U.

Since U is a compact subset of S, its diameter Δsupu,u˘Uρ(u,u˘) is finite. Combining this with Lemma 3.1, there is a constant k2 = k1Δ such that

ρ(XemXe1u,XemXe1u˘)1rmρ(u,u˘)k2rm. (10)

(Cf. Theorem 2 of Lange and Holmes (1981).) Of course, the constants may be chosen so that the same relation holds for the transposed matrices, with uT,u˘TV. We use then the definition

Δmax{supu,u˘Uρ(u,u˘),supv,v˘Uρ(v,v˘)}. (11)

We also define

ΔU(u)supu˘U(u,u˘);ΔV(u)supv˘V(v,v˘). (12)

Since for any positive vectors u, u′, u″ we have ρ(u+u′, u″) ≤ max {ρ(u, u″), ρ(u′, u″)}, the maximum of ρ(u, u′) among vectors in a cone is taken between extreme points of the cone, we can compute Δ as the maximum of all distances of the form ρ(XeRXe11,XeRXe11); and distances of the form ρ(1XeRXe1,1XeRXe1).

It follows (as in Lemma 2 of Lange and Holmes (1981)) that for any u, u˘U and environments e, e1, …, em,

logXeXemXe1uXemXe1ulogXeXemXe1u˘XemXe1u˘k1rmρ(u,u˘)k2rm. (13)

The same relation holds when the matrices X are replaced by their transposes, with uT, u˘TV.

Since ∥X∥ = ∥X1∥, and 1/K is in U, it immediately follows that if e˘1,,e˘i are any other environments,

logXeXemXe1XemXe1logXeXemXei+1Xe˘iXe˘1XemXei+1Xe˘iXe˘1k2rmi. (14)

We note that the results in Lange and Holmes (1981) depend only on the set of matrices X being compact, not on it being finite. In section 5 and beyond we will be letting the matrices Xe and/or the transition matrix P depend smoothly on a parameter x, which will take values either in [−x0, x0] or [0, x0]. We may then choose the sets U and V and constants k1, k2, r such that the properties above — in particular, the stability of U and V and the bounds (10) and (13) — hold simultaneously for all values of the parameter.

One slightly more complicated bound is proved in the appendix.

Lemma 3.2 Given v, v˘, u, u˘R+R, and constants ku, kv14 with maxlogviv˘ikv and maxloguju˘jku we have

logvTuv˘TulogvTuv˘Tu˘8kukv. (15)

This condition is satisfied if

kvρ(v,v˘)+logvv˘14

and

kuρ(u,u˘)+logvv˘14.

3.2 Time reversal and the stationary distribution

The transition matrix for the time-reversal of P will be denoted P~, and is given by

P~(e,e˘)ν(e˘)νe˘P(e˘,e).

A standard result (for example, see Theorem 6.5.1 of Grimmett and Stirzaker (2001)) tells us that if e1, e2, …, em form a stationary Markov chain with transition matrix P, for a fixed m, the reversed sequence em, em−1, …, e1 is a Markov chain with transition matrix P~. (We use the boldface e to represent a random environment — a random variable with values in M — and e to represent a particular environment or a realisation of a random variable e.)

As described in Lange and Holmes (1981), if e1, e2, … forms a stationary Markov chain with transition probabilities P, there is a unique distribution π on U×M which is stable under the transformation (u, et) ↦ (Xetu/∥Xetu∥, et+1). That is, if the normalized population structure Nt/∥Nt∥ paired with et+1 is chosen from the distribution π, then the pair (Nt+1/∥Nt+1∥, et+2) will also be in the distribution π.

This Markov chain is naturally represented as an iterated function system; that is, a step in the chain is made by choosing randomly from a collection of possible functions, and applying it to the current position of the chain. In this case, the function is uXeu/∥Xeu∥, where e is the next random environment. Time reversal is a convenient approach for studying the asymptotic behavior of such systems, and more general random dynamical systems. For applications, see Barnsley and Elton (1988); Elton (1990); Steinsaltz (1999).

To understand how this works, suppose we extend our Markov chain of environments forward and backward in time to infinity: …, e−2, e−1, e0, e1, e2, …. Starting at any s, positive or negative, moving forward we have a Markov chain with transition matrix P: es, es+1, es+2, …. Moving backward, es, es−1, es−2, … is a Markov chain with transition matrix P~. Choose any population vector u0U, and define for s > t,

Us,tXesXes1Xetu0XesXes1Xetu0. (16)

From (10), we know that for any t1, t2m, we have the bound ρ(Us,−t1, Ust2) ≤ k2rm+s, which means that (Us,t)t=0 is a Cauchy sequence. We may denote its limit by Us,−∞.

We note four important features of this backward iterated sequence:

  • (i)

    If the environments all shift one to the left, so that we start at es+1 instead of es, the distribution remains exactly the same. Thus, the distribution of Us,−∞ is the same for any s.

  • (ii)
    If we start our population Markov chain in the state (U0,−∞, e1), then the next state will be
    (Xe1U0,Xe1U0,,e2)=(U1,,e2).
    Thus, the distribution of (U0,−∞, e1) is a stationary distribution π for the chain.
  • (iii)
    For each fixed t, U0,−t has the same distribution as
    Ut,0=UetXet1Xe0u0XetXet1Xe0u0, (17)
    which is a realization of the normalized population vector Nt/∥Nt∥, when it is started at the vector u0. Thus, the population-environment Markov chain (Nt/∥Nt∥) converges in distribution to the same stationary distribution π, no matter what the initial condition.
  • (iv)
    We will use frequently the fact that (U0,−t, e1) may be considered an approximate sample from the distribution π, with an error that is bounded by
    ρ(U0,t,U0,)=ρ(Xe0Xetu0,Xe0XetUt1,)ΔU(u0)τB(Xe0Xet),
    which is straightforward to bound (cf. section 3.1), independent of any information about Ut,−∞.

The same holds true, of course, if we reverse the matrix multiplication: Starting from any nonnegative K-dimensional row vector v0T, we define the sequence of row vectors

V0,tTv0TXetXe0v0TXetXe0V.

Then V0,t converges pointwise to a vector V0,∞ := limt→∞ V0,t. We denote the distribution of (V0,∞, e−1) by π~. As before, π~ is the stationary distribution for the Markov chain on V×M, defined by taking (vT, et) at time t to (vT Xet/∥vT Xet∥, e−(t+1)) at time t + 1.

We also define the regular conditional distributions πe on U as follows: pick (U, e) from the distribution π, conditioned on e being e, and take πe to be the distribution of U. We define π~e similarly. In the case of i.i.d. environments, of course, π and π~ would be simply products of an independent population vector and the stationary environment with distribution ν.

4 Errors and How to Bound Them

A standard approach to estimating a — and a similar approach to the derivatives that we describe later on — is to choose a fixed starting vector u0U, simulate sequences e0(i), e1(i), …, emi (i independently from the stationary Markov chain with transition probabilities P (i = 1, …, J), and then compute

amE[logXemi(i)Xemi1(i)Xe0(i)u0Xemi1(i)Xe0(i)u0]1Ji=1J(logXemi(i)Xem1(i)Xe0(i)u0logXemi1(i)Xe0(i)u0). (18)

It is important not only to know what would be an appropriate approximation to a or its derivatives in the sense of being asymptotically correct, but also to have rigorous bounds for the error arising from any finite simulation procedure, such as (18). There are two sources of error: systematic error, arising from the fact that (Xe0, u0) is not exactly a sample from the distribution π, and sampling error, arising from the fact that we have estimated the expectation by averaging over a random sample.

For obtaining an unbiased estimate there is no reason the sample Markov chain realizations should need to be independent. For instance, a standard approach would be to take a single realization of the chain e0; e1; e2,…, and take et(i) = et, with mi = B + i for some “burn-in time” B. The problem with this approach is that it becomes more difficult to bound the errors; in principle, the convergence could be extremely slow for these kinds of sums. We leave the problem of bounding errors for these cumulative simulations to a future work.

4.1 Systematic error

By “systematic error” we mean the error in our estimate of a arising from the difference between the distribution we are aiming for and the distribution we are actually sampling from. The quantity we are trying to estimate may be represented as a = π[F], the expectation of a certain function F with respect to the distribution π. If we can simulate Z1,…, ZJ from π, then a^jj=1JF(Zj) is an unbiased estimator of π[F], and will be consistent under modest Passumptions on F and the independence of the samples. Suppose, though that what we have are not samples from π, but samples Zj from a “similar” distribution π*. Then we can bound the error by

aa^J1j=1JF(Zj)π[F]+π[F]π[F]. (19)

Here the first term on the right-hand side is the sampling error, and the second term is the bias, the expected value of systematic error. The problem is that the bounds we can obtain for the bias are likely to be crude, absent good computational tools for the distribution π. (And if we could compute analytically from π, we wouldn't need to be simulating).

An alternative is to couple the samples Zj from the approximate distribution π′ to exact samples Zj from the distribution π, we can break up the error in a slightly di erent way:

aa^J1j=1JF(Zj)π[F]+J1j=1JF(Zj)J1j=1JF[Zj]J1j=1JF(Zj)π[F]+J1j=1JF(Zj)F(ZJ) (20)

Bounds for the sampling error in (19) will generally also be bounds for the first term in (20). The second term in (20), on the other hand, which takes the place of the bias, is a random variable, computed from the samples Zj. Its expectation is still a bound on the bias. The crucial fact is that the last line may be computable without knowing in detail what the “true” sample Zj is.

A small disadvantage of this approach is that the systematic error varies with the sample. To achieve a particular fixed error bound we need an adaptive approach, whereby we successively extend our sequence of matrices until the error crosses the desired threshold. We note here that this approach to estimating the systematic error in simulations is essentially just a version of the Propp-Wilson algorithm (cf. Chapter 10 of Häggström (2002)). Unlike standard applications, though, the space is continuous, so the systematic error never reaches 0.

4.2 Sampling error

The sampling error is difficult to control with current techniques, because the distribution of the samples is so poorly understood — the very reason why we resort to the Monte Carlo approximation in the rst place. The best we can do for a rigorous bound is to use Hoeffiding's inequality (see Hoeffding (1963)), relying on crude bounds on the terms in the expectation. Hoeffding's inequality tells us that if X1,…,XJ are i.i.d. random variables such that α ≤ Xi β almost surely, then for any z > 0,

P{1JXiE[X]>z}2e2Jz2(βα)2. (21)

This is essentially the same bound that we would estimate from the normal approximation if the standard deviation of X were (β – α)/2. Generally we will want to fix p0, the con dence level, and compute the corresponding z, which will be

z0=(βα)12Jlog(p02). (22)

Of course, the standard deviation will be smaller than this, but we do not know how much smaller. An alternative approach then would be to use the bound 2τ(zJσ^), where σ^ is the standard deviation of the simulated samples, and τ is the cumulative distribution function of the Student t distribution with J – 1 degrees of freedom. This will be a smaller bound, in that sense “better”, but not precisely true for finite samples, to the extent that the sample distribution is not normal. The corresponding bound on the error, at probability level p0, is

z0=σ^Jt1p02(J1), (23)

where tp(J – 1) is the p quantile of the Student T distribution with J – 1 degrees of freedom; that is, if T has this distribution then P{T > tp(J – 1) = p.

These asymptotic bounds are perfectly conventional in Markov chain Monte Carlo analysis (see, for example, Asmussen and Glynn (2007)), and there is no reason particularly to eschew them in this context. They are likely to be quite accurate, and superior to the rigorous bounds, and might also be applied to the setting where the expectations are estimated not from independent samples, but from a single run of the environment chain. We have nonetheless emphasised the Hoeffding-based rigorous bounds in the statements of our theorems for three reasons: First, they are likely to be less familiar, and the reader may require more guidance in applying them to the individual cases; second, because some residue of skepticism must remain for the asymptotic bounds, while these results may be applied to calculations that are otherwise analytically precise; and third, as a spur to further thought on the best ways to bound errors rigorously in these sorts of problems.

5 Growth Rate and Sensitivity to Projection Matrices

We present here extensions of two known results. In these cases (and in later results) we start by defining an estimator that converges to the the quantity we desire, and follow that by bounds on the systematic and sampling errors, as well as an error bound for estimates from a simulation estimator. We state our results on error bounds in the form “The quantity Q may be approximated by the expectation of A, with systematic error bounded by B and sampling error bounded by C(J, p).” This means that if A1,…, AJ are independent realizations of A, then the probability that the true value of Q is not in the interval J−1 ΣAi ± [B + C(J, p)] is no bigger than p. When describing an adaptive bound on the systematic error, B will depend upon the particular simulation result. Again, the sampling error may be bounded either by a universally valid Hoeffding bound, based on known upper bounds on the samples, or by the Student t distribution, using the standard deviation estimated from the sample, which provides a generally much superior bound, but which can only be treated as an approximation.

5.1 Computing a

The stochastic growth rate a is commonly estimated by numerical simulation but, as discussed with examples by Caswell (2001), there is no general way to bound the errors in the estimated values. The following result provides suitable bounds.

Theorem 5.1 Let u0 be any fixed element of U, and Ym := Xem Xem–1 …Xe1, where e0; e1, … form a Markov chain with transition rates P. The stochastic growth rate may be approximated by the simulated expectation of

logXem+1Ymu0Ymu0, (24)

with systematic error bounded by k2rm and sampling error at level p on J samples bounded by

(logsupuUmaxeMXeuinfuUmineMXeu)(logp2J)12. (25)

When the simulated expectation is

1Ji=1JlogXem+1(i)Ym(i)u0Ym(i)u0

we may also bound the systematic error by

1Ji=1Jsupu,u˘Uρ(Ym(i)u,Ym(i)u˘)ΔU(u0)Ji=1JτB(Ym(i)), (26)

where ΔU(u0) is de ned as in (12) and τB is defined as in (6).

5.2 Derivatives with respect to Projection Matrices

We need care in defining derivatives of a with respect to elements of the population projection matrices. As discussed in Tuljapurkar and Horvitz (2003) we must define how the matrix entries change, e.g., do we change fertility rates in a particular environment, or in all possible environments? Although the main formula here is known, Tuljapurkar's (1990) derivation did not justify a crucial exchange of limits (between taking the perturbation to zero and time to infinity). We provide a rigorous proof (see Appendix) and of course the error bounds here are new.

We will suppose that the matrices Xe depend smoothly on a parameter ε, so that we may define Xe := ∂Xe/∂ε, and we define the base matrices to be at ε = 0. In some cases, the parametrization will be defined only for ε ≥ 0, and in those cases we will understand the partial derivatives to be one-sided derivatives, and the limits limε→0 will be the one-sided likits limε↓0.

Theorem 5.2 Let Ue and Ve be independent random variables with distributions πe and π~e (the conditional stationary distributions defined in section 3.2). Then

a(0)=eMνeE[VeTXeUeVeTXeUe] (27)

Each term may be approximated by averaging samples of the form

eMνeV(m)TXeU(m)V(m)TXeU(m), (28)

where u0; v0T are any fixed elements of U, V respectively, U(m)=Xe~1Xe~mu0 and V(m)T=v0TXemXe1, e = ẽ0; ẽ1,…,ẽm form a sample from the Markov chain P̃, and e = e0, e1,…, em form a sample from the Markov chain P. The systematic error may be bounded uniformly by

2(exp(4k2rm)1)a(0), (29)

while the sampling error at level p on J samples is bounded by

(log(p2)2J)12supuU,vTVu˘U,v˘TVeMνe(vTXeuvTXeuv˘Xeu˘v˘TXeu˘). (30)

Suppose the simulated expectation is

1Jj=1JeMνe(V(m)(j))TXeU(m)(j)(V(m)(j))TXeU(m)(j),

where

U(m)(j)=Xe~1(j)Xe~m(j)u0Y~m(j)u0

and

(V(m)(j))T=v0TXem(j)Xe1(j)v0TYm(j).

Let

U(j)Y~m(j)U={Y~m(j)u:uU},
V(j)VY~m(j)={vTYm(j):vTV}.

Then we may also bound the systematic error by

eMνeJj=1JsupuU(j)vTV(j)vTXeuvTXeu(V(m)(j))TXeU(m)(j)(V(m)(j))TXeU(m)(j)eMνeJj=1J(exp{2supuU(j)ρ(u,U(m)(j))+2supvTV(j)ρ(u,V(m)(j))}1)×supuU(j)vTV(j)vTXeuvTXeueMνeJj=1Jsu[uU(j)vTV(j)vTXeuvTXeu(exp{2Δ(τB(Y~m(j))+τB(Ym(j))T)}1). (31)

Note that the bound (29) is given as a proportion of the unknown a…(0). It can be turned into an explicit bound by using an upper bound on a…(0). We have the trivial bound

a(0)eMνesupuUvTVvTXeuvTXeu. (32)

We recall that Δ and τB may be bounded according to the formulas given in section 3.1.

It may seem surprising that the vectors Ue and Ve are independent. If we think of the environments as a doubly infinite sequence with e0 = e, then the vector Ue in equation (27) depends only on the past (that is, ei with i < 0) and Ve depends only on the future (ei with i > 0). By the Markov property, these two are independent, when conditioned on e0 = e.

6 Environments and Coupling

Suppose we change the transition matrix P to a slightly different matrix P(ε), and want to compare population growth along environmental sequences generated by the original and the perturbed matrix. We expect that the perturbed environmental sequences will only occasionally deviate from the environment that we “would have had” in the original distribution of environments. Computing the derivative of a is then a matter of measuring the cumulative deviations due to these changes. These may be split into two parts: First, the process moves under P(ε) to a state e˘ , different from the state e that it would have moved to under P. Then there is a sequence of environments following on from e˘ that is different from the sequence that would have followed from e, until the Markov chain gradually “forgets” its starting point. The change to a new sequence of matrices induces two separate changes on the growing population: The new sequence accumulates a difference in magnitude on its way to stationarity; and it produces a different stationary distribution of unit vectors depending on the starting environment e˘ rather than e.

In this section we examine the first effect. We fix the transition matrix P and compare the growth of total population size when starting from environment e to the growth starting from the stationary distribution ν. A standard method for doing this is coupling. For an outline of coupling techniques in MCMC, see Kendall (2005) and Roberts and Rosenthal (2004). We use coupling in two ways, corresponding to the two components of the Markov chain: the environment and the population vector.

Fix environments e and e˘ (possibly the same). We define sequences e0, e1, …; e˘0, e˘1, …; and e0, e1, …: all three are Markov chains with transition probabilities P, but with e0 = e, e˘0=e˘ and e0 having distribution ν (so that (ei) is stationary). We define the total population effect of starting in state e rather than e˘ as

ζee˘limt(E[logXetXe0]E[logXe˘tXe˘0])ζelimt(E[logXetXe0]E[logXetXe0])=e˘Mνe˘ζee˘. (33)

This is one of the terms that will come into formula (40), the derivative of a with respect to shifting transition probabilities from e to e˘.

Note that when the environments are i.i.d. — so P(e,e˘)=νe˘ — we have

ζee˘=E[logVTXeVTXe˘],

where VT has the distribution π.

Computing ζe depends on coupling the version of the Markov chain starting at e, to another version starting in the distribution ν. We define the coupling time τ to be the first time such that eτ=eτ; after this time the chains follow identical trajectories. If we know the distribution of τ and of the sequences followed by the two chains from time 0 to τ, we can average the diferences in (33) to nd ζ. The advantage of coupling is, first, that it reduces the variability of the estimates, and second, that we know from the simulation when the coupling time has been achieved, which gives bounds on the error. A suitable choice is Griffeath's maximal coupling (Griffeath, 1975) which we will apply in Pitman's (Pitman, 1976) path-decomposition representation. (The coupling is “maximal” in the sense of making the coupling time, and hence the variance of the estimate, as small as possible.) However we must be careful about sampling values of τ, because they may be large if the Markov chain mixes slowly. To deal with this we overweight large coupling times that generate a large contribution to ζ.

Beginning with a fixed environment e, the procedure is as follows:

  • (i)

    Define the sequence of vectors αt := Pt(e, ·) − ν(·). We also define αt+ and αt to be the vectors of pointwise positive and negative parts respectively. Let C(t) be a bound on logXe1XetlogXe˘1Xe˘t, where the ei and e˘i are any environments. From (3) we know that 2k^+2tr^ is a possible choice for C(t).

  • (ii)
    For pairs (t, e), where t is a positive integer and eM, define a probability distribution
    q(t,e˘){νeife=e˘,t=0;0ifee˘,t=0;[αt1+P](e˘)αt+(e˘)otherwise.}
    This is the distribution of the pair (τ, eτ) for the maximally coupled chain. Define
    At=1e˘Mq(t,e˘)C(t),
    and a probability distribution on N×M
    q˚(t,e˘)q(t,e˘)C(t)A.
  • (iii)
    Average J independent realizations of the following random variable: Let (τ, e0) be chosen from the distribution q˚ on N×M, and e˘0 independently from the distribution ν. From these starting states e0 and e˘0, let (e0, …, eτ, …, em) and (e˘0, …, e˘τ, …, e˘m) be realizations of the coupled pair of Markov chains with transition probabilities P, conditioned on the coupling time being τ and eτ=e˘τ=e˘. These realizations are generated from independent inhomogeneous Markov chains running backward, with transition probabilities
    P{ei1=xei=y}=αi1+(x)P(x,y)ΣzMαi1+(z)P(z,y),
    P{e~i1=xe˘i=y}=αi1(x)P(x,y)ΣzMαi1(z)P(z,y).
    We extend the chain past τ, requiring et=e˘t for t > τ, as a realization of the Markov transition probabilities P, to obtain a total sample of predetermined length m. The random variable is then
    ZAC(τ)logXemXeτXe0XemXeτ+1Xe˘τXe˘0.
    (Note that the realizations corresponding to τ = 0 are identically 0. The possibility of τ = 0 has been included only to simplify the notation. In practice, we are free to condition on τ > 0.)

The change from q to q˚ is an example of importance sampling (cf. Chapter V.1 in Asmussen and Glynn (2007)). We oversample the values of the random variable with high τ to reduce the variability of the estimate. The importance sampling makes Z(j) a bounded random variable, with bound A. Imagine that we had a source of perfect samples VT (j) from the distribution π~em, and define

Z~(j)AC(τ(j))logVT(j)Xem(j)Xe0(j)VT(j)Xe˘m(j)Xe˘0(j).

Let Y(j) := Xem(j)Xe0(j) and Y˘(j)Xe˘m(j)Xe˘0(j). Then

Z~(j)Z(j)AΔν(1)C(τ(j))(τB(Y(j))+τB(Y˘(j))). (34)

Since E[Z~]=ζe, we may use (22) to compute the bound

P{ζen1j=1nZ~(j)>2A12Jlog(p02)}p0. (35)

Lemma 6.1 The limits defining the coefficients ζee˘ and ζe exist and are finite. We may approximate ζe by

1Jj=1JAC(τ(j))logXeτ(j)Xe0(j)Xe~τ(j)Xe~0(j). (36)

If 0 < p0 ≤ 1, the probability is no more than p0 that the error in this estimation is larger than

1Jj=1JAΔν(1)C(τ(j))(τB(Y(j))+τB(Y˘(j)))+2A12Jlog(p02), (37)

It remains to bound A. From standard Markov chain theory, for any vector vS,

vTPtvtDξt. (38)

Setting setting Q := P − 1νT, αtT is the e-th row of Qt, which may also be written as

αt=1eT(Qt)T,

where 1e is the column vector with 1 in place e and 0 elsewhere. Thus

αtDξt.

If we use the bound C(t)=k^+tr^, then

A=t=1(k^+tr^)(αt1αt)=k^α1+r^t=1αtDξ(k^+Dr^1ξ) (39)

7 Derivatives with respect to Environmental Transitions

We are now ready to compute derivatives of a with respect to changes in the distribution of environments, as determined by P. Complicating the notation slightly is the constraint {P:e˘P(e,e˘)=1 for each e}; thus, there can be no sense in speaking of the derivative with respect to changes in P(e,e˘) for some particular e, e˘. Instead, we must compute directional derivatives along the direction of some matrix W, in the plane e˘We,e˘=0.

For the purposes of this result we write a(ε) = a(P(ε)), where P(ε) is a differentiable curve of M × M matrices, where the parameter ε takes values either in a two-sided interval [−ε0, ε0], or a one-sided interval [0, ε0]. Let W := ∂P(ε)/∂ε, an M × M matrix whose rows all sum to 0. The perturbations are such that P(ε) retains the ergodicity and irreducibility of P. (The result should be the same whether ε is positive or negative. If P is on the boundary of the set of possible values, one or the other sign may be impossible. Some choices of W may be impossible in both directions.) In the special case in which We,e˘=1 and We,e˘=1, with all other entries 0, we are computing the derivative corresponding to a small increase in the rate of transitioning from environment e to e˘, and a decrease in the rate of transitioning to e˘.

In this section the matrices X1, …, XM are assumed fixed.

Theorem 7.1 The derivative of the stochastic growth rate is

a(0)=e,e˘Mνe˘We˘,e(ζe+E[logVeTXeXe˘Ue˘VeTXe]), (40)

where Ue˘, VeT are independent random variables with distributions πe˘ and π~e respectively.

The quantities ζe may be approximated, with error bounds, according to the algorithm described in section 6.

The other part of the expression may be approximated by averaging samples of the form

e,e~Mνe~We~,elogVe(m)XeXe~Ue~(m)Ve(m)TXe, (41)

where

Ue~(m)=Xe~1Xe~mu0Xe~1Xe~mu0

and

Ve(m)T=v0TXemXe1v0TXemXe1,

and e = e0, e1, …, em is a Markov chain with transition matrix P, and e~=e~0, e~1, …, e~m is an independent Markov chain with transition probabilities P~, and u0U and v0TV.

The systematic error may be bounded uniformly by 2k2rm∥(νT|W|)∥, while the sampling error at level p on J samples bounded by

2νTWsupuMvTVmaxeMlogvTXeu(logp2J)12.. (42)

Suppose the simulated expectation is

1Jj=1Je~,eMνe~We,e~logV(m,e~)T(j)Xe~XeU(m,e)(j)V(m,e~)T(j)Xe~,

where

Ue~(m)(j)Xe~1(j)Xe~m(j)u0Xe~1(j)Xe~m(j)u0Y~e~(m)(j)u0Y~e~(m)(j)u0

and

Ve(m)T(j)=v0TXem(j)Xe1(j)v0TXem(j)Xe1(j)v0TYe(m)(j)v0TYe(m)(j).

We may also bound the systematic error by

1Jj=1Je~,eMνe~We~,e(supu,u˘Uρ(Y~e~(m)(j)u,Y~e~(m)(j)u˘)+supvT,v˘TVρ(vTYe(m)(j)Xe,v˘TYe(m)(j)Xe))ΔJj=1Je~,eMνe~We,e~(τB(Y~e~(m)(j))+τB(XeTYe(m)(j)T)). (43)

We note that expressions like (40) are examples of what Brémaud (1992) calls “ersatz derivatives”. In a rather different class of applications Brémaud suggests applying maximal coupling.

8 Discussion

Our results provide analytical formulas and simulation estimators for the derivatives of stochastic growth rate with respect to the transition probability matrix or the population projection matrices. We have concentrated here on the theoretical results; although this may not be obvious, we have made considerable effort at brevity. Partly for this reason, we will present elsewhere numerical applications of these results. We expect that our results should carry over to integral population models (IPMs), given the strong parallels between the stochastic ergodicity properties of IPMs and matrix models (Ellner and Rees, 2007).

Our results apply not only to stochastic structured populations but to any stochastic system in which a Lyapunov exponent of a product of random matrices determines stability or other dynamic properties. Examples include the net reproductive rate in epidemic models and some models of network dynamics. An obvious application of our results is to the analysis of optimal life histories, i.e., environment-to-projection matrix maps that maximize the stochastic growth rate. As discussed by McNamara (1997), this optimization problem translates into what is called an average reward problem in stochastic control theory, and so our results may be more generally useful in such control problems.

Highlights

  • >

    We analyse the growth rate a in matrix population models driven by Markov environments.

  • >

    We derive formulas for derivatives of a with respect to changes in the Markov parameters.

  • >

    We prove known formulas for the derivatives with respect to demographic parameters.

9 Acknowledgements

We thank NIA (BSR) for support under 1P01 AG22500. David Steinsaltz was supported by a New Dynamics of Ageing grant, a joint interdisciplinary program of the UK research councils.

Appendix

Proofs of the theorems.

A.1 Proof of Lemma 3.2

We have

logvTuv˘TulogvTu˘v˘Tu˘=logi,jviuiv˘ju˘ji,jviu˘iv˘juj=logiuiviu˘iu˘i+i<j(viuiv˘ju˘j+vjujv˘iu˘i)iuiviu˘iu˘i+i<j(viu˘iv˘juj+vju˘jv˘iui)maxi,jlogviuiv˘ju˘j+vjujv˘iu˘iviu˘iv˘juj+vju˘jv˘iui

For some choice of i, j, let

αloguiu˘i,βlogviv˘i,γloguju˘j,δlogvjv˘j.

Note that by (5),

αρ(u,u˘)+loguu˘14,βρ(v,v˘)+logvv˘14,γρ(u,u˘)+loguu˘14,δρ(v,v˘)+loguu˘14. (44)

Then we need to bound

logeα+β+eγ+δeβ+δ+eα+γeα+β+eγ+δeβ+δeα+γmin{eα+β+eγ+δ,eβ+δ+eα+δ}eα+β+eγ+δeβ+δeα+γ, (45)

where we have used the relation |log x| ≤ max{x, 1/x} − 1.

We now use the Taylor series expansion

ex=k=0nxkk!+Cxk+1(k+1)!,

where |C| ≤ max{ex, 1}. This turns the right-hand side of (45) into

2k=21k!(α+β)k+(γ+δ)k(β+δ)k(α+γ)k=2k=2=1k11!(k)!(αβk+γδkβδkαγk)2k=01(k2)!(αβ=0k2(k2)αβk2+γδ=0k2(k2)γδk2+βδ=0k2(k2)βδk2+αγ=0k2(k2)αγk2).

Applying the bounds (44) yields finally the upper bound in (15).

A.2 Estimating the stochastic growth rate

We prove here Theorem 5.1. The quantity we are trying to compute is

a=E[logXeU],where(U,e)is selected from the distributionπ. (46)

Let e0, e1, e2, … be a realization of the stationary Markov chain with transition matrix P. Let Ym := XemXem−1Xe1. Choose u0U, and let U be a random variable with distribution πe0. Then a=E[logXem+1YmUYmU], which may be approximated by E[logXem+1Ymu0Ymu0].

If we identify systematic error with bias, this is

Errorsys=E[logXem+1Ymu0Ymu0]E[logXem+1YmUYmU],

since (YmU/∥YmU∥, em) also has the distribution π (if Ym and U are taken to be independent, conditioned on e0). Thus

ErrorsysE[logXem+1Ymu0Ymu0logXem+1YmUYmU]E[supu,u˘UlogXem+1YmuYmulogXem+1Ymu˘Ymu˘]k2rm,

by (13). The corresponding bound on the sampling error may be computed from (22).

For a particular choice of of e1, …, em+1 and U we can also represent the random systematic error as

logXem+1Ymu0Ymu0logXem+1YmUYmU,

which may be bounded by the summand in (26).

A.3 Estimating sensitivities: Matrix entries

We prove here Theorem 5.2. As discussed at the end of section 3.1, we may assume that the compact sets U and V are stable and satisfy the bounds of section 3.1 simultaneously for all Xe(). The stationary distributions corresponding to products of the perturbed matrices are denoted π(ε) and π~e(), and the corresponding regular conditional distributions are πe(ε) and π~e()

The derivative a′(0) may be written as

lim01(limmE[logXem()Xem1()Xe0()u0Xem1()Xe0()u0]limmE[logXemXem1Xe0u0Xem1Xe0u0])=(limmE[logXe~0()Xe~1()Xe~m()u0Xe~1()Xe~m()u0]limmE[logXe~0Xe~1Xemu0Xe~1Xemu0])=lim0limms=0mas,m()=:lim0limmA(m,),

where

as,m():=E[1(logXe~0()Xe~1()Xe~s()Xe~s+1Xe~mu0Xe~1()Xe~s()Xe~s+1Xe~mu0logXe~0()Xe~1()Xe~s1()Xe~sXe~mu0Xe~1()Xe~s1()Xe~sXe~mu0)].

Here e0, e1, … is the stationary Markov chain with transition probabilities P, and e~0, e~1, … is the reverse stationary Markov chain, with transition probabilities P~.

By (13), for ∊ > 0 sufficiently small

as,m()1k2rs1supuUE[ρ(Xe~s()u,Xe~su)]2k2Crs,

where

C:=r1maxeMmax1K{maxuUXeu(Xeu),maxvVvTXe(vTXe)}.

If we define

Vi,j:=1TXe~i()Xe~i+1()Xe~j()1TXe~i()Xe~i+1()Xe~j()
Ui,j:=Xe~iXe~i+1Xe~ju0Xe~iXe~i+1Xe~ju0,

then for mn,

as,m()as,n()1E[logV(0,s1)TXe~s()U(s+1,m)V(0,s1)TXe~sU(s+1,m)logV(0,s1)TXe~s()U(s+1,n)V(0,s1)TXe~sU(s+1,n)+logV1,s1TXe~s()Us+1,mV1,s1TXe~sUs+1,mlogV1,s1TXe~s()Us+1,nV1,s1TXe~sUs+1,n]

We note by (10) that ρ(Us+1,m, Us+1,n) ≤ k2rms; and for ∊ > 0 sufficiently small and any vV we have

max1jK,eM,log(vTXe())j(vTXe)j2C.

It follows by Lemma 3.2 that

as,m()as,n()min{16k2Crms,4k2Crs}. (47)

Putting these together, we get, for all nm,

A(m,)A(n,)s=m+1nas,n()+s=0mas,m()as,n()s=m+1n2k2Crs+1sm216k2Crms+m2<m4k2Crs20k2C1rrm2.

By Lemma A.1 we may exchange the order of the limits, to see that

a=limmlim01s=1m(E[logXem()Xem1()Xes()Xes1Xe0u0Xem1()Xes()Xes1Xe0u0]E[logXem()Xem1()Xes+1()XesXe0u0Xem1()Xes+1()XesXe0u0]). (48)

This limit is the same for any choice of u0, hence would also be the same if we replaced u0 by a random U, with any distribution on U. We choose U to have the distribution πe0, independent of the rest of the Markov chain. By the invariance property of the distributions π,

a=limmlim01(s=1mE[logXem()Xem1()Xes()Xes1Xe0U0Xem1()Xes()Xes1Xe0U0]E[logXem()Xem1()Xes+1()XesXe0U0Xem1()Xes+1()XesXe0U0])=limms=0mlim01E[logXem()Xem1()Xes()Us1Xem1()Xes()Us1logXem()Xem1()Xes+1()XesUs1Xem1()Xes+1()XesUs1], (49)

where (Us−1, es) has distribution π.

For ms ≥ 1 define functions

fs(,δ)1TXem(δ)Xem1(δ)Xes+1(δ)Xes()Us11TXem1(δ)Xes+1(δ)Xes()Us1,

where the denominator is understood to be 1 for s = m. The summand on the right of (49) may be written as

lim01E[logfs(,)log,fs(0,)]. (50)

By (13),

1Elogfs(,δ)log,fs(0,δ)]1k2rsρ(Xes()U,XesU)<2Ck2rs

for ∊ in a neighborhood of 0, so the Bounded Convergence Theorem turns (50) into

E[lim01(logfs(,)log,fs(0,))]=E[logfs(0,0)]. (51)

We have, by linearity of the matrix product and ∥ · ∥,

logfs(0,0)=1TXemXes+1Xes()Us1XemXesUs11TXem1Xes+1Xes()Us1Xem1XesUs1={(1TXemXes+1XesUs1XemXesUs11TXem1Xes+1XesUs1Xem1XesUs1)for1sm1,1XesUm1XemUm1fors=m}

Combining this with (49) yields the telescoping sum

a(0)=limm(t=1mE[1TXetXe1Xe0U1TXetXe1Xe0U])t=1m1E[1TXetXe1Xe0U1TXetXe1Xe0U]=limmE[1TXemXe1Xe0U1TXemXe1Xe0U],

where in the last line (U, e0) has the distribution π. De ne VmT:=1TXemXe11TXeme1. Then VmT converges in distribution to V, with distribution π~e0 (conditioned on e0), and so

a(0)=limmEVmTXe0UVmTXe0U=eMνeEVeTXeUeVeTXeUe,

which is identical to (27).

Now we estimate the error. We use the representation

UXe~1Xe~mU0Xe~1Xe~mU0
VTV0TXemXe1V0TXemXe1,

where U0 and V0T are assumed to have distributions πe~m and π~em respectively. Then

V(m)TXeU(m)V(m)TXeU(m)VTXeVTXeU2(e2ρ(V,V(m)T)+2ρ(U,U(m))1)VXeUVXeU2(e4k2rm1)VXeUVXeU, (52)

by (13). This implies the uniform bound on systematic error, and the bound on sampling error (30) follows from applying (22) to a trivial bound on the terms in the average. The simulated bound (31) also follows directly from (52).

Lemma A.1 Let A(m, ∊) be a two-dimensional array of real numbers, indexed by m ∈ N and ∊ > 0, with A(m, 0) := lim∊↓0 A(m, ∊) existing for each m, and A(∞, ∊) := limm→∞ A(m, ∊) existing for all ∊ sufficiently small (independent of m). Suppose A satisfies

limMlim sup0supm,n>MA(m,)A(n,)=0. (53)

Then the two limits

limmlim0A(m,)

and

lim0limmA(m,)

are equal; in particular, if one exists the other exists as well.

Proof. Suppose A* := lim∊↓0 A(∞, ∊) exists. Then we need to show that A* = limm→∞ A(m, 0).

Choose any δ > 0, and choose M such that

lim sup0supm,n>MA(m,)A(n,)<δ.

This means that we may find ∊0 > 0 such that |A(m, ∊) − A(n, ∊)| < 2δ when 0 < ∊ < ∊0 and m, n < M, and such that also |A(∞, ∊) − A*| < 2δ for all 0 < ∊ < ∊0. It follows, in particular, that |A(m, ∊) − A*| < 4δ when m > M and 0 < ∊ < ∊0. Thus |A(m, 0) − A*| < 4δ for m > M, and consequently |lim supm→∞ A(m, 0) − A*| < 4δ. Since δ is arbitrary, it follows that limm→∞ A(m, 0) = A*.

The converse result (starting from the assumption that limm→∞ A(m, 0) exists) follows identically.

A.4 Estimating sensitivities: Markov environments

We derive here the formula (40) by a combination of the coupling method and importance sampling. We use importance sampling for the actual computation, but coupling provides a more direct path to validating the crucial exchange of limits. As in the proof of Theorem 5.2, the error bounds are an obvious consequence of the formula (40) and the general formulas for errors described in section 4.

Given two distributions q and q* on {1, …,M}, we define a standard coupling between q and q*. Suppose we are given a uniform random variable ω on [0, 1]. Let M{e:qe<qe} and M+{e:qe<qe}. Let δΣeM(qeqe)=ΣeM+(qeqe). We define three random variables e˘ on M, e˘+ on M+, and e˘ on M, according to the following distributions:

P{e˘=e}=min{qe,qe}(1δ),
P{e˘+=e}={qeqe}+δ,
P{e˘=e}={qeqe}+δ,

The joint distribution is irrelevant, but for definiteness we let them be independent. Then we define the coupled pair (e, e*) to have the values

(e˘,e˘)ifω>δ.(e˘+,e˘)ifωδ. (54)

Then e has distribution q, e* has distribution q*, and e = e* with probability 1 − δ. This δ is called the total-variation distance between q and q*.

We write EP for the expectation with respect to the distribution that makes e0, …, em a stationary Markov chain with transition matrix P. Define ν(∊) to be the stationary distribution corresponding to P(∊), and define P~() to be the time-reversed chain of P(∊). We define

g(m;;u)EP()[logXemXem1Xe0uXem1Xe0u]EP[logXemXem1Xe0uXem1Xe0u].

By the time-reversal property,

g(m;;u)EP~()[logXe0Xe1XemuXe1Xemu]EP~[logXe0Xe1XemuXe1Xemu].

For ∊ > 0 we couple a sequence e0, …, em selected from the distribution P~ to a sequence e0(),,em() selected from the distribution P~() as follows: We start by choosing (e0,e0()) according to the standard coupling of (ν, ν(∊). Assume now that we have produced sequences of length i, ending in ei−1 and ei1(). We then produce (ei,ei()) according to the standard coupling of row ei−1 of P~ to row ei1() of P~(). (To simplify the typography in some places we use e(i) and e(∊)(i) interchangeably with ei and ei().)

Let δ = δ(∊) be the maximum of the total variation distance between ν and ν(∊), and all of the pairs of rows. It is easy to see that there is a constant c such that δ ≤ c∊ for ∊ sufficiently small. Define ω1, ω2, … to be an i.i.d. sequence of uniform random variables on [0; 1], and two sequences of random times as follows: T0 := S0 := −1, and

Ti+1=min{t>Si:ωtδ},
Si+1=min{t>Ti+1:et()=et}.

Thus, et()=et for all Sit < Ti+1. Define for any u0U the random vector

UtlimmXe(t)Xe(t+m)u0Xe(t)Xe(t+m)u0,

and define a version of g conditioned on T1 and T2

g(m;;u;T1,T2)EP~()[logXe0Xe1XemuXe1XemuT1,T2]EP~[logXe0Xe1XemuXe1XemuT1,T2].

Then for any uU,

Wa(P)=a(0)=lim0limm1E[g(m;;u;T1,T2)]. (55)

We also define

γ(;T1,T2)E[logXe()(0)Xe()(S11)US1Xe()(1)Xe()(S11)US1logXe(0)Xe(1)Xe(S11)US1Xe(1)Xe(S11)US1T1,T2]

We break up these expectations into their portion overlapping three different events:

  • (i)

    {T1 > m};

  • (ii)

    {T2 > mT1};

  • (iii)

    {mT2}.

On the event {T1 > m} we have g(m, ∊ u; T1, T2) = 0, and T1m is geometrically distributed with parameter δ. By (13), γ is bounded by k2rT1−1.

On the event {T2 > mT1}: We have e(∊)(i) = e(i) for i < T1 and for S1im. If S1m,

US1=Xe(S1)Xe(m)Um+1Xe(S1)Xe(m)Um+1=Xe(S1)()Xe(m)()Um+1Xe(S1)()Xe(m)()Um+1

Thus we may write where

γ(;T1,T2)g(m,;u;T1,T2)E[logXe(0)Xe(T11)Xe()(T1)Xe()(m)U˘()Xe(1)Xe(T11)Xe()(T1)Xe()(m)U()T1,T2]E[logXe(0)Xe(T11)Xe()(T1)Xe()(m)uXe(1)Xe(T11)Xe()(T1)Xe()(m)uT1,T2]+E[logXe(0)Xe(1)Xe(m)U˘Xe(1)Xe(m)U˘logXe(0)Xe(T11)Xe(T1)Xe(m)uXe(1)Xe(T11)Xe(T1)Xe(m)uT1,T2]2k2rm,

where U˘()=U˘=Um+1 if S1m; otherwise

U˘()=Xe()(m+1)Xe()(S11)US1Xe()(m+1)Xe()(S11)US1;
U˘=X(m+1)X(S11)US1X(m+1)X(S11)US1.

On the event {T2m}: The above approach shows that

γ(;T1,T2)g(m,;u;T1,T2)2k2rT21. (57)

Combining these bounds, we obtain

γ(;T1,T2)g(m,;u;T1,T2)k2rT111{T1>m}+2k2rm1{T1m}+2k2rT21. (58)

Taking the expectation with respect to the distribution of T1 and T2, using the fact that T1 and T2S1 are independent with distribution geometric with parameter δ, we obtain

E[γ(;T1,T2)g(m,;u;T1,T2)]k21rrm1δ+2k2δmrm+2k2r2(1r)2δ2. (59)

Since δ is bounded by a constant times |∊|, we may find a constant C such that (by the triangle inequality) for all ∊, positive integers m, and uU,

E[γ(;T1,T2)]E[g(m,;u;T1,T2)]E[γ(;T1,T2)g(m,;u;T1,T2)]C(mrm+2). (60)

This bound allows us to exchange the limits in (55):

a(0)=lim0limm1E[g(m,;u;T1,T2)]=lim01E[γ(;T1,T2)]=limmlim01E[g(m,;u;T1,T2)]=limmdd=0EP()[logXemXem1Xe0uXem1Xe0u] (61)

Now we apply the method of importance sampling. We may assume without loss of generality that W (e, e′) = 0 whenever P (e, e′) = 0 (using the analyticity of a, and the fact that the formula (40) is nonsingular on the nonnegative orthant). For any function Z:Mm+1R,

EP()[Z(e0,,em)]=EP[Z(e0,,em)F(;e0,,em)],

where F is the Radon-Nikodym derivative

F(;e0,,em)=dP()dP(e0,,em)=νe0()νe0i=0m1P()(ei,ei+1)P(ei,ei+1).

This allows us to rewrite

a(0)=limmdd=0EP[νe0()νe0i=0m1P()(ei,ei+1)P(ei,ei+1)logXemXem1Xe0uXem1Xe0u] (62)

For any fixed m, there is an upper bound on ∊−1(F(∊ e0, …, em) −1), so we may move the differentiation inside the expectation, to obtain

a(0)=limmEP[dd=0νe0()νe0i=0m1P()(ei,ei+1)P(ei,ei+1)logXemXem1Xe0uXem1Xe0u]=limmEP[(νe0)1dνe0()d=0logXemXem1Xe0uXem1Xe0u]+limmi=0m1EP[W(ei,ei+1)P(ei,ei+1)logXemXem1Xe0uXem1Xe0u] (63)

The first limit is 0. To see this, rewrite it as a sum over possible values of e0:

limmeMνeEPe[(νe)1dνe()d=0logXemXem1XeuXem1Xeu]=limmeMdνe()dEPe[logXemXem1Xe1XeuXem1Xe1Xeu]

Since ν(∊) is a probability distribution, it must be that e~=1Mdνe~()d=0. Thus, the expression in the limit becomes 0 if we replace the expectation by a constant, independent of e. By Lemma A.2 it follows that the limit is 0.

To compute the other limit, we sum over all possible pairs (ei, ei+1) = (e~,e). The summand becomes

e~,eMν(e~)W(e~,e)E[logXemXem1Xe0uXem1Xe0uei=e~,ei+1=e] (64)

In order to analyze this, we need to consider the distribution of e0, …, em, conditioned on ei=e~ and ei+1 = e. By the Markov property, this splits into two independent Markov chains: e = ei+1, …, em is a Markov chain of length mi, with transition probabilities P and starting point e, while e~=ei, ei−1, …, e0 is a Markov chain of length i + 1 with transition probabilities P~ and starting point e~. Define two independent infinite sequences e~0, e~1, … and e0e1, …, which are Markov chains with transitions P~ and P respectively, beginning in e~0=e~ and ei+1 = e. Define for i1,Uˇi(e~)Xe~Xe~1Xe~iu with Uˇ0(e~)1, and VˇiT(e)1TXei1Xe1Xe with Vˇ0T(e)1T. Also define

Ui(e)Uˇi(e)Uˇi(e),ViT(e)VˇiT(e)VˇiT(e).

Since ∥u∥ = 1T u for any nonnegative column vector u, the expression (63) becomes

a(0)=e~,eMν(e~)W(e~,e)limm{i=0m1(E[logVˇi+1T(e)Uˇm1(e~)]E[logVˇiT(e)Uˇm1(e~)])}=e~,eMν(e~)W(e~,e)limm{i=0m1(E[logVˇi+1T(e)]E[logVˇiT(e)]+E[logVi+1T(e)Ui(e~)]E[logViT(e)Ui(e~)])}=e~Mν(e~)limmeMW(e~,e)E[logVˇmT(e)]+e,e~Mν(e~)W(e~,e)limmi=0m1E[logVi+1T(e)Um1(e~)ViT(e)Um1(e~)] (65)

In the last line we have used the fact that ΣeMW(e~,e)=0, which means that ΣeMW(e~,e)E[logVˇ0T(e)]=0 as well, since Vˇ0T(e)=1T is independent of e. The same reasoning implies that if we define VˇiT(ν) to be the version of VˇiT started in the stationary distribution — for instance, starting from realizations of VˇiT(e), define VˇiT(ν) to be equal to VˇiT(e) with probability νe — then ΣeMW(e~,e)E[logVˇmT(ν)]=0. The first term on the right-hand side of (65) may then be written as

\[
\sum_{\tilde e,e\in\mathcal{M}}\nu(\tilde e)\,W(\tilde e,e)\,\lim_{m\to\infty}\mathbb{E}\left[\log\frac{\|\check V_m^T(e)\|}{\|\check V_m^T(\nu)\|}\right] = \sum_{\tilde e,e\in\mathcal{M}}\nu(\tilde e)\,W(\tilde e,e)\,\zeta_e. \tag{66}
\]

To compute the second term, we note that $U_\infty(\tilde e) := \lim_{i\to\infty}U_i(\tilde e)$ exists, with distribution $\pi_{\tilde e}$, and $\rho(U_i(\tilde e),U_\infty(\tilde e))\le k^2 r^i$; similarly, $V_\infty^T(e) := \lim_{i\to\infty}V_i^T(e)$ exists, with distribution $\tilde\pi_e$, and $\rho(V_i(e),V_{i+1}(e))\le k^2 r^i$. Thus

\[
\left|\log V_{i+1}^T(e)\,U_{m-1-i}(\tilde e) - \log V_i^T(e)\,U_{m-1-i}(\tilde e)\right| \le \rho\big(V_i(e),V_{i+1}(e)\big) \le k^2 r^i.
\]

We break up the sum on the right-hand side of (65) into three pieces:

\[
\begin{aligned}
\sum_{0\le i\le m-1}&\mathbb{E}\left[\log V_{i+1}^T(e)\,U_{m-1-i}(\tilde e) - \log V_i^T(e)\,U_{m-1-i}(\tilde e)\right]\\
&= \sum_{0\le i\le\lfloor m/2\rfloor}\mathbb{E}\left[\log V_{i+1}^T(e)\,U_\infty(\tilde e) - \log V_i^T(e)\,U_\infty(\tilde e)\right]\\
&\quad+ \sum_{0\le i\le\lfloor m/2\rfloor}\mathbb{E}\left[\log\frac{V_{i+1}^T(e)\,U_{m-1-i}(\tilde e)}{V_{i+1}^T(e)\,U_\infty(\tilde e)} - \log\frac{V_i^T(e)\,U_{m-1-i}(\tilde e)}{V_i^T(e)\,U_\infty(\tilde e)}\right]\\
&\quad+ \sum_{\lfloor m/2\rfloor< i\le m-1}\mathbb{E}\left[\log V_{i+1}^T(e)\,U_{m-1-i}(\tilde e) - \log V_i^T(e)\,U_{m-1-i}(\tilde e)\right].
\end{aligned}
\]

The first sum telescopes to

\[
\mathbb{E}\left[\log V_{1+\lfloor m/2\rfloor}^T(e)\,U_\infty(\tilde e) - \log V_0^T(e)\,U_\infty(\tilde e)\right] = \mathbb{E}\left[\log V_{1+\lfloor m/2\rfloor}^T(e)\,U_\infty(\tilde e)\right] + \log K,
\]

applying the fact that $V_0^T = \mathbf{1}^T/K$, so that $V_0^T(e)\,U_\infty(\tilde e) = \|U_\infty(\tilde e)\|/K = 1/K$. By (9), the second and third sums are bounded by

\[
\sum_{0\le i\le\lfloor m/2\rfloor} 2k^2\,r^{m-1-i} + \sum_{\lfloor m/2\rfloor< i\le m-1} k^2\,r^i \;\le\; \frac{3k^2\,r^{\lfloor m/2\rfloor-1}}{1-r}.
\]

Thus

\[
\lim_{m\to\infty}\sum_{i=0}^{m-1}\mathbb{E}\left[\log\frac{V_{i+1}^T(e)\,U_{m-1-i}(\tilde e)}{V_i^T(e)\,U_{m-1-i}(\tilde e)}\right] = \mathbb{E}\big[\log V_\infty^T(e)\,U_\infty(\tilde e)\big] + \log K, \tag{67}
\]

completing the proof of Theorem 7.1. (The $\log K$ term contributes nothing once (67) is substituted into (65), since $\sum_{e\in\mathcal{M}}W(\tilde e,e)=0$.)
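To connect the formula with something directly computable: $a'(0)$ is the slope of the stochastic growth rate as the environmental kernel is tilted along $W$. The sketch below (entirely our own toy example, with placeholder projection matrices and kernel) checks this interpretation by a crude central finite difference, simulating $a(\epsilon)$ for $P \pm \epsilon W$ with common random numbers to reduce variance; it is a sanity check, not a substitute for the estimators analyzed in the paper.

```python
import numpy as np

K = 3
X = np.random.default_rng(4).uniform(0.5, 1.5, size=(2, K, K))  # positive matrices
P = np.array([[0.7, 0.3], [0.4, 0.6]])                          # environmental kernel
W = np.array([[-0.1, 0.1], [0.2, -0.2]])                        # direction; rows sum to 0

def growth_rate(kernel, T=100_000, seed=0):
    """a = lim T^{-1} log ||X_{e_T} ... X_{e_1} u||, via normalized iterates."""
    r = np.random.default_rng(seed)
    e, u, log_norm = 0, np.full(K, 1.0 / K), 0.0
    for _ in range(T):
        e = r.choice(2, p=kernel[e])
        u = X[e] @ u
        s = u.sum()                  # l1 norm on the nonnegative orthant
        log_norm += np.log(s)
        u /= s                       # renormalize to avoid overflow
    return log_norm / T

eps = 0.01
# common random numbers (same seed) make the difference much less noisy
da = (growth_rate(P + eps * W, seed=1) - growth_rate(P - eps * W, seed=1)) / (2 * eps)
print("finite-difference estimate of a'(0):", da)
```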

Lemma A.2. For any $u, \breve u \in U$ and $e, e' \in \mathcal{M}$, let $e_0, e_1, \dots$ and $e'_0, e'_1, \dots$ be realisations of the Markov chain $P$ starting at $e_0 = e$ and $e'_0 = e'$ respectively. Then

\[
\left|\mathbb{E}\left[\log\frac{\|X_{e_{m+1}}\cdots X_{e_0}u\|}{\|X_{e_m}\cdots X_{e_0}u\|}\right] - \mathbb{E}\left[\log\frac{\|X_{e'_{m+1}}\cdots X_{e'_0}\breve u\|}{\|X_{e'_m}\cdots X_{e'_0}\breve u\|}\right]\right| \le \frac{k^2 D}{1-r}\,(m+1)\,(\xi\vee r)^m, \tag{68}
\]

where ξ and D are the constants that satisfy (38).

Proof. Using the maximal coupling, we create coupled versions of $(e_i, e'_i)$, such that the coupling time $\tau$ satisfies

\[
P\{\tau\ge t\} \le \big\|\nu - P^t(e,\cdot)\big\| + \big\|\nu - P^t(e',\cdot)\big\| \le 2D\xi^t.
\]

Define

\[
u_\tau := \frac{X_{e_{\tau-1}}\cdots X_{e_0}u}{\|X_{e_{\tau-1}}\cdots X_{e_0}u\|}, \qquad \breve u_\tau := \frac{X_{e'_{\tau-1}}\cdots X_{e'_0}\breve u}{\|X_{e'_{\tau-1}}\cdots X_{e'_0}\breve u\|}.
\]

Then by the bound (13),

\[
\begin{aligned}
\left|\mathbb{E}\left[\log\frac{\|X_{e_{m+1}}\cdots X_{e_0}u\|}{\|X_{e_m}\cdots X_{e_0}u\|}\right] - \mathbb{E}\left[\log\frac{\|X_{e'_{m+1}}\cdots X_{e'_0}\breve u\|}{\|X_{e'_m}\cdots X_{e'_0}\breve u\|}\right]\right|
&\le \mathbb{E}\left[\left|\log\frac{\|X_{e_{m+1}}\cdots X_{e_0}u\|}{\|X_{e_m}\cdots X_{e_0}u\|} - \log\frac{\|X_{e'_{m+1}}\cdots X_{e'_0}\breve u\|}{\|X_{e'_m}\cdots X_{e'_0}\breve u\|}\right|\right]\\
&= \mathbb{E}\left[\left|\log\frac{\|X_{e_{m+1}}\cdots X_{e_\tau}u_\tau\|}{\|X_{e_m}\cdots X_{e_\tau}u_\tau\|} - \log\frac{\|X_{e_{m+1}}\cdots X_{e_\tau}\breve u_\tau\|}{\|X_{e_m}\cdots X_{e_\tau}\breve u_\tau\|}\right|\right]\\
&\le \mathbb{E}\big[k^2 r^{m-\tau}\big]\\
&\le \frac{k^2 D}{1-r}\sum_{t=0}^m r^{m-t}\,\xi^t \;\le\; \frac{k^2 D}{1-r}\,(m+1)\,(\xi\vee r)^m.
\end{aligned}
\]
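To see the coupling time in action, the sketch below couples two copies of a small chain in the naive way, running them independently until they first meet and together afterwards (the construction used in Appendix A.5 below). This is not the maximal coupling of Griffeath (1975) invoked above, which achieves $P\{\tau > t\}$ equal to the total variation distance, but its coupling time has the same kind of geometric tail; the kernel is a placeholder of our own.

```python
import numpy as np

rng = np.random.default_rng(5)
P = np.array([[0.7, 0.3], [0.4, 0.6]])

def coupling_time(e, e_prime, t_max=10_000):
    """Run two chains independently until they first meet; return that time."""
    t = 0
    while e != e_prime and t < t_max:
        e = rng.choice(2, p=P[e])
        e_prime = rng.choice(2, p=P[e_prime])
        t += 1
    return t

taus = np.array([coupling_time(0, 1) for _ in range(20_000)])
for t in (1, 2, 4, 8):
    print(t, (taus >= t).mean())   # geometric decay, as in P{tau >= t} <= 2 D xi^t
```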

A.5 A version of the Furstenberg-Kesten Theorem

Theorem A.3 (Furstenberg-Kesten). The limit defined in (2) exists almost surely and is deterministic, given by (46).

Proof. Let $e_1, e_2, \dots$ be a stationary Markov chain with transition matrix $P$. If we choose $N_0 \in U$, this induces a Markov chain $Y_t := (N_{t-1}/\|N_{t-1}\|,\, e_t)$ with state space $U\times\mathcal{M}$. We extend the Hilbert metric to $U\times\mathcal{M}$ by $\rho((u,e),(u^*,e^*)) = \rho(u,u^*) + \mathbf{1}_{\{e\ne e^*\}}$.
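Since the Hilbert metric does the real work in this argument, a small numerical illustration may help. The function below (our own code, with an arbitrary positive matrix) computes $\rho(u,v) = \log\max_{i,j}(u_i v_j)/(u_j v_i)$ for strictly positive vectors and exhibits the strict contraction that underlies the $k^2 r^n$ bounds used throughout.

```python
import numpy as np

def hilbert_metric(u, v):
    """Hilbert projective distance between strictly positive vectors."""
    ratio = u / v
    return float(np.log(ratio.max() / ratio.min()))

rng = np.random.default_rng(6)
X = rng.uniform(0.5, 2.0, size=(4, 4))   # a strictly positive matrix
u = rng.uniform(0.1, 1.0, size=4)
v = rng.uniform(0.1, 1.0, size=4)

# A strictly positive matrix strictly contracts the Hilbert metric (Bushell, 1973):
print(hilbert_metric(u, v), hilbert_metric(X @ u, X @ v))
```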

If we define the function $F : U\times\mathcal{M} \to \mathbb{R}$ by

\[
F(u, m) := \log\|X_m u\|,
\]

then

\[
\log\|N_t\| = \sum_{i=1}^t \log\frac{\|N_i\|}{\|N_{i-1}\|} = \sum_{i=1}^t \log\left\|X_{e_i}\,\frac{N_{i-1}}{\|N_{i-1}\|}\right\| = \sum_{i=1}^t F(Y_i).
\]
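This telescoping identity can be verified directly in a few lines (placeholder matrices of our own; $\ell^1$ norm, and $N_0 = u$ so that $\|N_0\| = 1$):

```python
import numpy as np

rng = np.random.default_rng(7)
K = 3
X = rng.uniform(0.5, 1.5, size=(2, K, K))
P = np.array([[0.7, 0.3], [0.4, 0.6]])

e, N, sum_F = 0, np.full(K, 1.0 / K), 0.0
for _ in range(30):
    e = rng.choice(2, p=P[e])            # e_i
    Y = N / N.sum()                      # first component of Y_i
    sum_F += np.log((X[e] @ Y).sum())    # F(Y_i) = log || X_{e_i} N_{i-1}/||N_{i-1}|| ||
    N = X[e] @ N                         # N_i = X_{e_i} N_{i-1}
print(np.log(N.sum()), sum_F)            # equal up to floating-point error
```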

If (Yi) were a chain on a finite space, we would invoke the law of large numbers for Markov chains, also known as the pathwise Individual Ergodic Theorem (cf. Theorem 1.10.2 of Norris (1998)) to see that

\[
\lim_{t\to\infty} t^{-1}\log\|N_t\| = \lim_{t\to\infty} t^{-1}\sum_{i=1}^t F(Y_i) \quad\text{exists and equals}\quad \pi[F] := \sum_{y}\pi_y\,F(y),
\]

where π is the unique stationary distribution. The same would be true for a positive recurrent Markov chain on a countable state space.

On a more general state space, the pathwise ergodic theorem still holds as long as the Markov chain is uniquely ergodic; that is, it has a unique stationary distribution π (see Theorem 6.1 of Hernández-Lerma and Lasserre (1998), or Chapter 6 of Walters (1982)), and π is ergodic.

We first show the existence of $\pi$. Choose $u_0 \in U$, and let $\dots, e_{-1}, e_0, e_1, e_2, \dots$ be a stationary Markov chain on $\mathcal{M}$ with transitions $P$, infinite in both directions (as in section 3.2). Then

\[
\rho\big(U_{-1,-n},\, U_{-1,-n-1}\big) = \rho\big(X_{e_{-1}}\cdots X_{e_{-n}}u_0,\; X_{e_{-1}}\cdots X_{e_{-n}}X_{e_{-n-1}}u_0\big) \le k^2 r^n,
\]

implying that $(U_{-1,-n})_{n=1}^{\infty}$ is always a Cauchy sequence, hence converges to a random variable $U_{-1,-\infty} \in U$. Define $\pi$ to be the distribution of the pair $(U_{-1,-\infty}, e_0)$.

Now, conditioned on starting at $Y_0 = (U_{-1,-\infty}, e_0)$, the next step $Y_1$ is $\big(X_{e_0}U_{-1,-\infty}/\|X_{e_0}U_{-1,-\infty}\|,\; e_1\big)$. Notice that

\[
\frac{X_{e_0}U_{-1,-\infty}}{\|X_{e_0}U_{-1,-\infty}\|} = U_{0,-\infty}
\]

has the same distribution as $U_{-1,-\infty}$; hence $(U_{0,-\infty}, e_1)$ has the same distribution as $(U_{-1,-\infty}, e_0)$, so that $\pi$ is a stationary distribution for $(Y_t)$.
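In simulation terms, this backward construction gives an approximate exact sampler for $\pi$: draw $e_0$ from $\nu$, walk backward with the time-reversed kernel $\tilde P(e,e') = \nu_{e'}P(e',e)/\nu_e$, and accumulate matrix factors on the right until the starting vector $u_0$ is forgotten. A sketch, with placeholder matrices of our own:

```python
import numpy as np

rng = np.random.default_rng(8)
K = 3
X = rng.uniform(0.5, 1.5, size=(2, K, K))
P = np.array([[0.7, 0.3], [0.4, 0.6]])

# stationary law nu of P (left Perron eigenvector) and the reversed kernel P~
evals, evecs = np.linalg.eig(P.T)
nu = np.real(evecs[:, np.argmax(np.real(evals))])
nu /= nu.sum()
P_rev = nu[None, :] * P.T / nu[:, None]   # P~(e, e') = nu_{e'} P(e', e) / nu_e

def sample_from_pi(n_back=60):
    """Approximate draw of (U_{-1,-infinity}, e_0); error is O(k^2 r^n_back)."""
    e0 = rng.choice(2, p=nu)
    M, e = np.eye(K), e0
    for _ in range(n_back):
        e = rng.choice(2, p=P_rev[e])     # e_{-1}, e_{-2}, ...
        M = M @ X[e]                      # X_{e_{-1}} X_{e_{-2}} ... enters on the right
        M /= M.max()                      # projective rescaling
    u = M @ np.full(K, 1.0 / K)           # the choice of u_0 is forgotten geometrically
    return u / u.sum(), e0

print(sample_from_pi())
```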

It remains to show that the chain is uniquely ergodic. Suppose that we start the chain in an alternative distribution $\mu$. Define $Y_0 = (U, e_1)$ to have distribution $\pi$ and $Y'_0 = (U', e'_1)$ to have distribution $\mu$. Let $e_1, e_2, \dots$ and $e'_1, e'_2, \dots$ be coupled versions of the environment chain, started at $e_1$ and $e'_1$ respectively. For definiteness, we say that they are independent up to the first time $\tau$ when $e_\tau = e'_\tau$, and that $e'_{\tau+i} = e_{\tau+i}$ for $i \ge 0$. Then

\[
Y_m := \left(\frac{X_{e_m}\cdots X_{e_1}U}{\|X_{e_m}\cdots X_{e_1}U\|},\; e_{m+1}\right), \qquad
Y'_m := \left(\frac{X_{e'_m}\cdots X_{e'_1}U'}{\|X_{e'_m}\cdots X_{e'_1}U'\|},\; e'_{m+1}\right)
\]

are both realisations of the Markov chain $Y$, with initial distributions $\pi$ and $\mu$ respectively. We write $\mu_m$ for the distribution of $Y'_m$. We see that

\[
\rho(Y_m, Y'_m) \le k^2 r^{m-\tau} \quad\text{on } \{\tau\le m\}, \qquad\text{so}\qquad \rho(Y_m, Y'_m)\ \xrightarrow{\,P\,}\ 0 \ \text{ as } m\to\infty.
\]

Thus, $\mu_m \to \pi$ weakly. By Theorem 6.12 of Walters (1982), it follows that $\pi$ is ergodic (in fact, strongly mixing). Furthermore, if $\mu$ were an invariant distribution then $\mu_m = \mu$ for every $m$, so the convergence implies $\mu = \pi$. Therefore, the chain is uniquely ergodic.


References

  1. Asmussen S, Glynn P. Stochastic Simulation: Algorithms and Analysis. Springer-Verlag; 2007.
  2. Barnsley MF, Elton JH. A new class of Markov processes for image encoding. Advances in Applied Probability. 1988;20:14–32.
  3. Boyce M, Haridas C, Lee C. Demography in an increasingly variable world. Trends in Ecology & Evolution. 2006;21(3):141–148. doi:10.1016/j.tree.2005.11.018.
  4. Brémaud P. Maximal coupling and rare perturbation sensitivity analysis. Queueing Systems. 1992;11:307–333.
  5. Bushell PJ. Hilbert's metric and positive contraction mappings in a Banach space. Archive for Rational Mechanics and Analysis. 1973;52(4):330–338.
  6. Caswell H. Matrix Population Models: Construction, Analysis and Interpretation. 2nd edition. Sinauer Associates; Sunderland, MA: 2001.
  7. Cohen J. Ergodicity of age structure in populations with Markovian vital rates. II. General states. Advances in Applied Probability. 1977:18–37.
  8. Ellner S, Rees M. Stochastic stable population growth in integral projection models: theory and application. Journal of Mathematical Biology. 2007;54(2):227–256. doi:10.1007/s00285-006-0044-8.
  9. Elton JH. A multiplicative ergodic theorem for Lipschitz maps. Stochastic Processes and their Applications. 1990;34(1):39–47.
  10. Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960a;31(2):457–469.
  11. Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960b;31(2):457–469.
  12. Golubitsky M, Keeler EB, Rothschild M. Convergence of the age structure: applications of the projective metric. Theoretical Population Biology. 1975;7(1):84. doi:10.1016/0040-5809(75)90007-6.
  13. Griffeath D. A maximal coupling for Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 1975;31:95–106.
  14. Grimmett G, Stirzaker D. Probability and Random Processes. Oxford University Press; 2001.
  15. Häggström O. Finite Markov Chains and Algorithmic Applications. Cambridge University Press; Cambridge: 2002.
  16. Haridas CV, Tuljapurkar S. Elasticities in variable environments: properties and implications. American Naturalist. 2005;166(4):481–495. doi:10.1086/444444.
  17. Hernández-Lerma O, Lasserre JB. Ergodic theorems and ergodic decomposition for Markov chains. Acta Applicandae Mathematicae. 1998;54:99–119.
  18. Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963:13–30.
  19. Kendall WS. Notes on perfect simulation. In: Liang F, Wang J-S, editors. Markov Chain Monte Carlo: Innovations and Applications. World Scientific; 2005. pp. 93–146.
  20. Lande R, Engen S, Saether B-E. Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press; Oxford: 2003.
  21. Lange K. On Cohen's stochastic generalization of the strong ergodic theorem of demography. Journal of Applied Probability. 1979:496–504.
  22. Lange K, Holmes W. Stochastic stable population growth. Journal of Applied Probability. 1981;18(2):325–334.
  23. Lee RD, Tuljapurkar S. Stochastic population forecasts for the United States: beyond high, medium, and low. Journal of the American Statistical Association. 1994;89(428):1175–1189.
  24. McNamara J. Optimal life histories for structured populations in fluctuating environments. Theoretical Population Biology. 1997;51(2):94–108.
  25. Metz J, Nisbet R, Geritz S. How should we define 'fitness' for general ecological scenarios? Trends in Ecology & Evolution. 1992;7(6):198–202. doi:10.1016/0169-5347(92)90073-K.
  26. Morris W, Doak D. Quantitative Conservation Biology. Sinauer Associates; Sunderland, MA: 2002.
  27. Morris W, Pfister C, Tuljapurkar S, Haridas C, Boggs C, Boyce M, Bruna E, Church D, Coulson T, Doak D. Longevity can buffer plant and animal populations against changing climatic variability. Ecology. 2008;89(1):19–25. doi:10.1890/07-0774.1.
  28. Norris J. Markov Chains. Cambridge University Press; 1998.
  29. Peres Y. Domains of analytic continuation for the top Lyapunov exponent. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques. 1992;28(1):131–148.
  30. Pitman J. On coupling of Markov chains. Probability Theory and Related Fields. 1976;35(4):315–322.
  31. Roberts GO, Rosenthal JS. General state space Markov chains and MCMC algorithms. Probability Surveys. 2004;1:20–71 (electronic).
  32. Seneta E. Non-negative Matrices and Markov Chains. Springer-Verlag; 1981.
  33. Steinsaltz D. Locally contractive iterated function systems. Annals of Probability. 1999:1952–1979.
  34. Tuljapurkar S. Population dynamics in variable environments. III. Evolutionary dynamics of r-selection. Theoretical Population Biology. 1982;21(1):141–165. doi:10.1016/0040-5809(85)90019-x.
  35. Tuljapurkar S. Population Dynamics in Variable Environments. Lecture Notes in Biomathematics 85. Springer-Verlag; New York: 1990.
  36. Tuljapurkar S, Horvitz CC, Pascarella JB. The many growth rates and elasticities of populations in random environments. The American Naturalist. 2003;162(4):489–502. doi:10.1086/378648.
  37. Tuljapurkar SD, Orzack SH. Population dynamics in variable environments. I. Long-run growth rates and extinction. Theoretical Population Biology. 1980;18:314–342.
  38. Walters P. An Introduction to Ergodic Theory. Springer-Verlag; New York: 1982.
