Derivatives of the Stochastic Growth Rate

David Steinsaltz; Shripad Tuljapurkar; Carol Horvitz

doi:10.1016/j.tpb.2011.03.004

. Author manuscript; available in PMC: 2012 Aug 1.

Published in final edited form as: Theor Popul Biol. 2011 Apr 2;80(1):1–15. doi: 10.1016/j.tpb.2011.03.004

Derivatives of the Stochastic Growth Rate

David Steinsaltz ¹, Shripad Tuljapurkar ², Carol Horvitz ³

PMCID: PMC3186700 NIHMSID: NIHMS287297 PMID: 21463645

Abstract

We consider stochastic matrix models for population driven by random environments which form a Markov chain. The top Lyapunov exponent a, which describes the long-term growth rate, depends smoothly on the demographic parameters (represented as matrix entries) and on the parameters that define the stochastic matrix of the driving Markov chain. The derivatives of a — the “stochastic elasticities” — with respect to changes in the demographic parameters were derived by Tuljapurkar (1990). These results are here extended to a formula for the derivatives with respect to changes in the Markov chain driving the environments. We supplement these formulas with rigorous bounds on computational estimation errors, and with rigorous derivations of both the new and the old formulas.

1 Introduction

Stochastic matrix models for structured populations are widely used in evolutionary biology, demographic forecasting, ecology, and population viability analysis (e.g., Tuljapurkar (1990); Lee and Tuljapurkar (1994); Morris and Doak (2002); Caswell (2001); Lande et al. (2003)). In these models, a discrete-time stochastic process drives changes in environmental conditions that determine the population's stage-transition rates (survival, fertility, growth, regression and so on). Population dynamics are described by a product of randomly chosen population projection matrices. In most biological situations the population's stage structure converges to a time-varying but stable structure Cohen (1977), and in the long run the population grows at a stochastic growth rate a that is not random and is the leading Lyapunov exponent of the random product of population projection matrices Furstenberg and Kesten (1960a); Cohen (1977); Lange (1979); Lange and Holmes (1981); Tuljapurkar and Orzack (1980). This growth rate a is of considerable biological interest, as a fitness measure for a stage-structured phenotype Tuljapurkar (1982), as a determinant of population viability and persistence Tuljapurkar and Orzack (1980); Morris and Doak (2002); Lande et al. (2003), and in a variety of invasion problems in evolution and epidemiology Metz et al. (1992).

The map between environments and projection matrices describes how pheno-type change depends on the environment — the phenotypic norm of response — and we are often interested in how populations respond to changes in, say, the mean or variance of the projection matrix elements. Such questions are answered by computing the derivatives of a with respect to changes in the projection matrices, using a formula derived by Tuljapurkar (1990). Tuljapurkar et al. (2003) called these derivatives stochastic elasticities, to contrast with the elasticity of the dominant eigenvalue of a fixed projection matrix to the elements of that matrix (Caswell, 2001). Stochastic elasticity has been used to examine evolutionary questions (Haridas and Tuljapurkar, 2005) and the effects of climate change (Morris et al., 2008). At the same time, a is also a function of the stochastic process that drives environments. Many processes, such as climate change (Boyce et al., 2006), will result in changes in the frequencies of, or the probabilities of transition between, environmental states. How is a affected by a change in the pattern and distribution of environments, rather than by a change in the population projection matrices? To answer this question, we consider a model in which the environment makes transitions among several discrete states, according to a Markov chain. Then what we want is the derivative of a with respect to changes in the transition probabilities of this Markov chain. This derivative exists (at least away from the boundaries of the space of stochastic matrices), and in fact we know from Peres (1992) that a is an analytic function of the parameters of both the projection matrices and the parameters defining the stochastic matrix, in an open neighborhood of the set of stochastic matrices. In deterministic models (Caswell, 2001), the growth rate is represented as λ = e^r; then sensitivities are derivatives of the form (∂λ/∂x) with respect to a parameter x whereas elasticities are proportional derivatives of the form (∂r/∂ log x). In stochastic models we compute derivatives of a, and these can be used to compute elasticities (as in Tuljapurkar et al. (2003)) or sensitivities.

Our first contribution here is a new formula for computing the derivative of a with respect to changes in the transition probabilities of the environmental Markov chain, given in abstract form as equation (40). To obtain this result we show how an initial environmental state affects future growth, using coupling and importance sampling; this analysis may be of independent interest. Even with a formula in hand we must compute derivatives of a by numerical simulation which is subject to both sampling (Monte Carlo) error and bias. Our second contribution here is to show how one can bound these estimation errors. Our third contribution is a rigorous proof of the heuristically derived formula given by Tuljapurkar (1990) for the derivatives of a to the elements of the population projection matrices.

In Section 2 of this paper we set out the model and assumptions, the approach to finding derivatives, along with necessary facts about the convergence of population structures and distributions. In Section 3 we discuss systematic and sampling errors and show how we can bound them. We illustrate this approach in Section 4 by presenting bounds (in Theorem 1) for simulation estimates of the stochastic growth rate a and (in Theorem 2) for the derivatives of a with respect to projection matrix elements. In Section 5 we define a measure of the effect of an initial environmental state on subsequent population growth and show how to estimate this measure using coupling arguments. Section 6 presents (in Theorem 4) the formula, algorithm, and error bounds for the derivative of a with respect to the elements of the Markov chain that drives environments. We end by discussing how these theorems can be applied and some related issues concerning parameter estimation in such models. Proofs are in the Appendix.

2 The Model

We consider a population whose individuals exist in K different stages (these may be, for example, ages, developmental stages or size classes). Newborns are in stage i = 1. The progression between stages occurs at discrete time intervals at rates that depend on the environment in each time interval. The environment e_t in period t is in one of M possible states; we denote the set of possible environments by $M$ = {1, …, M}. Individuals in stage i at time t move to stage j at a rate X_{e_t+1}(j, i). These rates are elements of a nonnegative population projection matrix, and at time t when the environment is e_t this matrix is denoted by X_{e_t}; there are M such matrices, one for each environmental state. We assume that allocation of individuals to classes and the identification of environment states are certain. We also assume that the total number of individuals in the population is large enough that we can ignore sampling variation. Successive environments are chosen according to a Markov process with transition matrix P whose elements are $P (e, \overset{˘}{e})$ and whose stationary distribution is ν = {ν(e)}. We follow the standard convention for Markov chains, that $P (e, \overset{˘}{e})$ represents the probability of a transition from state e to state $\overset{˘}{e}$ ; note that this is the reverse of the convention used in matrix population models.

To guarantee demographic weak ergodicity (Cohen 1977) we assume that

(i)
There exists some R > 0 such that any product X_e₁ … X_{e_R} has all entries positive. This implies, in particular, that each population projection matrix is row-allowable and column-allowable. That is, every row and every column has at least one positive entry.
(ii)
The chain is ergodic (so transitive and aperiodic), and environments are in the stationary distribution of P, which will also be denoted by ν(e).

The population in year t is represented by a vector $N_{t} \in R_{+}^{K} ≔ {{(n_{t} (1), \dots, n_{t} (K))}^{T} : n_{t} (i) > 0}$ . The superscript T will always mean transpose; here it indicates that population vectors are column vectors. (There may be population classes early on that have 0 members; condition (ii) above forces all classes eventually to have positive membership, and so we assume without loss of generality that we start with all population classes occupied.) The population structure changes according to N_t+1 = X_{e_t+1}N_t, and

N_{t} = X_{e_{t}} X_{e_{t - 1}} \dots X_{e_{1}} N_{0} .

(1)

The normalized population structure N_t/(Σ_iN_t(i)) does not converge to a fixed limit (as it would if the environment were constant) but it does converge in distribution.

At each stage i, we define

a ≔ lim_{t \to \infty} t^{- 1} log N_{t} (i) lim_{t \to \infty} t^{- 1} log \frac{Σ_{i} N_{t} (i)}{Σ_{i} N_{0} (i)}

(2)

A fundamental result in this area of research is the Furstenberg-Kesten Theorem Furstenberg and Kesten (1960b), which tells us that the long-run growth rate exists and is not random. This a is called the “stochastic growth rate”. The proof of this fact is fairly straightforward, within the framework of modern Markov chain theory. We present a version in section 3.1 and a proof in the appendix, both for the reader's convenience and because it illustrates some of the important basic ideas that we draw on throughout this work.

The main result of this paper is formula (40), which represents the derivative of a with respect to an incremental change in the probability of transitioning from e to $\tilde{e}$ . If we denote this derivative by $A_{e, \tilde{e}}$ , then the change corresponding to shifting P in the direction of a matrix W is a linear functional

\frac{da (P + ∊ W)}{d ∊} = \sum_{e, \tilde{e} \in M} A_{e, \tilde{e}} W_{e, \tilde{e}} .

We note that, while the existence of such a derivative follows from the general theory of Peres (1992), the computable formula is new.

Counting the K² parameters in each matrix, there are at most (MK² + M²) parameters. While all parameters must be nonnegative, and all elements of the population projection matrices but the birth rates must be ≤ 1, the only universal constraint is $Σ_{e \in M} P (e, \overset{˘}{e}) = 1$ . (There may, however, be further constraints imposed, as some transitions may be impossible. If we are considering age-structured populations, the matrices are Leslie matrices, each with only 2K − 1 potentially nonzero parameters.) The sensitivities we examine are derivatives of a with respect to these parameters. We will define a one-dimensional family of population matrices or environment transition matrices, and we refer to change “in the direction of” the derivative of this family. This will be made precise in the analyses that follow.

3 Technical background

3.1 Contraction

We denote the K-dimensional column vector with 1's in all places by 1. By default we use the L¹ norm ∥x∥ := Σ |x_i| when x is a vector in $R^{K}$ , and write $∥ X ∥ ≔ \sum_{i, j = 1}^{K} ∣ X_{i, j} ∣$ when X is a K × K matrix. Note that ∥X∥ = ∥X1∥ when X is a nonnegative matrix. Our assumptions imply that there are positive constants $\hat{k}$ and $\hat{r}$ , such that for any environments e₁, …, e_m,

∣ log ∥ X_{e_{m}} \dots X_{e_{1}} ∥ ∣ \leq \hat{k} + m \hat{r}

(3)

We use the Hilbert projective metric ρ, described in Golubitsky et al. (1975) and in section 3.1 of Seneta (1981), defined by

ρ (x, y) ≔ log max_{1 \leq i \leq K} x (i) ∕ y (i) + log max_{1 \leq i \leq K} y (i) ∕ x (i) .

(4)

This is a pseudometric on $R_{+}^{K}$ that is a metric on $S ≔ {{(x (1), \dots, x (K))}^{T} : x (i) > 0}$ and Σx(i) = 1}. The distance between two vectors is defined by the ray from the origin; that is, ρ(x, y) = ρ(x/∥x∥, y/∥y∥). It is straightforward to show — cf. Bushell (1973) — that

\frac{1}{2} [min {x (i)} + min {y (i)}] e^{- ρ (x, y)} ρ (x, y) \leq ∥ x - y ∥ \leq e^{ρ (x, y)} - 1

for any x, y ∈ $S$ . (The upper bound is shown by Bushell with respect to the Euclidean norm, but the same argument holds for any L^p norm.) Thus, convergence in the projective metric implies that the projections onto $S$ converge in the standard norms. It is also straightforward to show that

max_{1 \leq i \leq K} ∣ log \frac{x (i)}{y (i)} ∣ \leq ρ (x, y) + ∣ log \frac{∥ x ∥}{∥ y ∥} ∣ .

(5)

The Birkhoff contraction coefficient (cf. Seneta (1981)) is defined for nonnegative matrices X as

τ_{B} (X) ≔ sup_{u \neq v \in S} \frac{ρ (Xu, Xv)}{ρ (x, v)} .

(6)

It is clear that τ_B is sub-multiplicative (that is, τ_B(XY) ≤ τ_B(X)τ_B(Y)). We also have the formula (Theorem 3.12 of Seneta (1981))

τ_{B} (X) = \frac{1 - ϕ {(X)}^{1 ∕ 2}}{1 + ϕ {(X)}^{1 ∕ 2}}, where ϕ (X) = {\begin{matrix} {min}_{i, j, k, ℓ} \frac{X_{i, k} X_{j ℓ}}{X_{jk} X_{i ℓ}} & if X > 0, \\ 0 & ohterwise . \end{matrix}

(7)

(This depends on the assumption that X is row-allowable. We will be applying this result only to cases in which X is positive, so our bound on τ_B will be strictly less than 1.)

An immediate consequence is Lemma 3.1.

Lemma 3.1. For any fixed R, set $r ≔ {max}_{e_{1}, \dots, e_{R} \in M} τ_{B} {(X_{e_{R}} \dots X_{e_{1}})}^{1 ∕ R}$ . Then for any $e_{1}, \dots, e_{n} \in M$ ,

τ_{B} (X_{e_{n}} \dots X_{e_{1}}) \leq k_{1^{r^{n}}},

(8)

where k₁ := r^1/R−1. In particular, if R is chosen so that every product X_{e_R} … X_e₁ is positive, then r < 1.

We also know that for any u, $\overset{˘}{u} \in S$ , and $v^{T}$ any positive row vector,

log min {\frac{u (i)}{\overset{˘}{u} (i)}} \leq log \frac{v^{T} u}{v^{T} \overset{˘}{u}} \leq log max {\frac{u (i)}{\overset{˘}{u} (i)}} .

Since the left-hand side is ≤ 0 and the right-hand side ≥ 0, so

∣ log v^{T} u - log v^{T} \overset{˘}{u} ∣ \leq ρ (u, \overset{˘}{u}) .

(9)

Following Lemma 1 of Lange (1979), we define a compact subset $U \subset S$ which is stable under the transformations u → X_eu/∥X_eu∥ for any $e \in M$ and includes the vector 1/K, as well as all vectors of the form X_{e_R} … X_e1y/∥X_{e_R} … X_e₁y∥ where $y \in S$ ; and a compact subset $V \subset S^{T}$ which is stable under the transformations v^T → v^TX_e/∥v^TXe∥, and includes the vector 1^T/K as well as all vectors of the form y^TX_{e_R} … X_e1/∥y^TX_{e_R} … X_e1∥ where $y \in S$ . Note that we could take $U$ to be any $U_{n}$ defined to be the union of all products X_e1 … X_{e_n} $S$ . These are a decreasing sequence of compact sets, with $X_{e} U_{n} \subset U_{n + 1}$ . whose limit $U_{\infty}$ is in general a fractal set, which is the support of the stationary distribution π. For practicality we are likely to work with a $U$ which is a finite-stage approximation to $U_{\infty}$ .

Since $U$ is a compact subset of $S$ , its diameter $Δ ≔ {sup}_{u, \overset{˘}{u} \in U} ρ (u, \overset{˘}{u})$ is finite. Combining this with Lemma 3.1, there is a constant k₂ = k₁Δ such that

ρ (X_{e_{m}} \dots X_{e_{1}} u, X_{e_{m}} \dots X_{e_{1}} \overset{˘}{u}) \leq_{1^{r^{m}}} ρ (u, \overset{˘}{u}) \leq k_{2^{r^{m}}} .

(10)

(Cf. Theorem 2 of Lange and Holmes (1981).) Of course, the constants may be chosen so that the same relation holds for the transposed matrices, with $u^{T}, {\overset{˘}{u}}^{T} \in V$ . We use then the definition

Δ ≔ max {sup_{u, \overset{˘}{u} \in U} ρ (u, \overset{˘}{u}), sup_{v, \overset{˘}{v} \in U} ρ (v, \overset{˘}{v})} .

(11)

We also define

Δ_{U} (u) ≔ sup_{\overset{˘}{u} \in U} (u, \overset{˘}{u}); Δ_{V} (u) ≔ sup_{\overset{˘}{v} \in V} (v, \overset{˘}{v}) .

(12)

Since for any positive vectors u, u′, u″ we have ρ(u+u′, u″) ≤ max {ρ(u, u″), ρ(u′, u″)}, the maximum of ρ(u, u′) among vectors in a cone is taken between extreme points of the cone, we can compute Δ as the maximum of all distances of the form $ρ (X_{e_{R}} \dots X_{e_{1}} 1, X_{e_{R}^{'}} \dots X_{e_{1}^{'}} 1)$ ; and distances of the form $ρ (1 X_{e_{R}} \dots X_{e_{1}}, 1 X_{e_{R}^{'}} \dots X_{e_{1}^{'}})$ .

It follows (as in Lemma 2 of Lange and Holmes (1981)) that for any u, $\overset{˘}{u} \in U$ and environments e, e₁, …, e_m,

\begin{matrix} ∣ log \frac{∥ X_{e} X_{e_{m}} \dots X_{e_{1}} u ∥}{∥ X_{e_{m}} \dots X_{e_{1}} u ∥} - log \frac{∥ X_{e} X_{e_{m}} \dots X_{e_{1}} \overset{˘}{u} ∥}{∥ X_{e_{m}} \dots X_{e_{1}} \overset{˘}{u} ∥} ∣ & \leq k_{1^{r^{m}}} ρ (u, \overset{˘}{u}) \\ \leq k_{2^{r^{m}}} . \end{matrix}

(13)

The same relation holds when the matrices X are replaced by their transposes, with u^T, ${\overset{˘}{u}}^{T} \in V$ .

Since ∥X∥ = ∥X1∥, and 1/K is in $U$ , it immediately follows that if ${\overset{˘}{e}}_{1}, \dots, {\overset{˘}{e}}_{i}$ are any other environments,

∣ log \frac{∥ X_{e} X_{e_{m}} \dots X_{e_{1}} ∥}{∥ X_{e_{m}} \dots X_{e_{1}} ∥} - log \frac{∥ X_{e} X_{e_{m}} \dots X_{e_{i + 1}} X_{{\overset{˘}{e}}_{i}} \dots X_{{\overset{˘}{e}}_{1}} ∥}{∥ X_{e_{m}} \dots X_{e_{i + 1}} X_{{\overset{˘}{e}}_{i}} \dots X_{{\overset{˘}{e}}_{1}} ∥} ∣ \leq k_{2^{r^{m - i}}} .

(14)

We note that the results in Lange and Holmes (1981) depend only on the set of matrices X being compact, not on it being finite. In section 5 and beyond we will be letting the matrices X_e and/or the transition matrix P depend smoothly on a parameter x, which will take values either in [−x₀, x₀] or [0, x₀]. We may then choose the sets $U$ and $V$ and constants k₁, k₂, r such that the properties above — in particular, the stability of $U$ and $V$ and the bounds (10) and (13) — hold simultaneously for all values of the parameter.

One slightly more complicated bound is proved in the appendix.

Lemma 3.2 Given v, $\overset{˘}{v}$ , u, $\overset{˘}{u} \in R_{+}^{R}$ , and constants k_u, $k_{v} \leq \frac{1}{4}$ with $max ∣ log v_{i} ∕ {\overset{˘}{v}}_{i} ∣ \leq k_{v}$ and $max ∣ log u_{j} ∕ {\overset{˘}{u}}_{j} ∣ \leq k_{u}$ we have

∣ log \frac{v^{T} u}{{\overset{˘}{v}}^{T} u} - log \frac{v^{T} u}{{\overset{˘}{v}}^{T} \overset{˘}{u}} ∣ \leq 8 k_{u} k_{v} .

(15)

This condition is satisfied if

k_{v} ≔ ρ (v, \overset{˘}{v}) + ∣ log ∥ v ∥ ∕ ∥ \overset{˘}{v} ∥ ∣ \leq \frac{1}{4}

and

k_{u} ≔ ρ (u, \overset{˘}{u}) + ∣ log ∥ v ∥ ∕ ∥ \overset{˘}{v} ∥ ∣ \leq \frac{1}{4} .

3.2 Time reversal and the stationary distribution

The transition matrix for the time-reversal of P will be denoted $\tilde{P}$ , and is given by

\tilde{P} (e, \overset{˘}{e}) ≔ \frac{ν (\overset{˘}{e})}{ν \overset{˘}{e}} P (\overset{˘}{e}, e) .

A standard result (for example, see Theorem 6.5.1 of Grimmett and Stirzaker (2001)) tells us that if e₁, e₂, …, e_m form a stationary Markov chain with transition matrix P, for a fixed m, the reversed sequence e_m, e_m−1, …, e₁ is a Markov chain with transition matrix $\tilde{P}$ . (We use the boldface e to represent a random environment — a random variable with values in $M$ — and e to represent a particular environment or a realisation of a random variable e.)

As described in Lange and Holmes (1981), if e₁, e₂, … forms a stationary Markov chain with transition probabilities P, there is a unique distribution π on $U \times M$ which is stable under the transformation (u, e_t) ↦ (X_{e_t}u/∥X_{e_t}u∥, e_t+1). That is, if the normalized population structure N_t/∥N_t∥ paired with e_t+1 is chosen from the distribution π, then the pair (N_t+1/∥N_t+1∥, e_t+2) will also be in the distribution π.

This Markov chain is naturally represented as an iterated function system; that is, a step in the chain is made by choosing randomly from a collection of possible functions, and applying it to the current position of the chain. In this case, the function is u ↦ X_eu/∥X_eu∥, where e is the next random environment. Time reversal is a convenient approach for studying the asymptotic behavior of such systems, and more general random dynamical systems. For applications, see Barnsley and Elton (1988); Elton (1990); Steinsaltz (1999).

To understand how this works, suppose we extend our Markov chain of environments forward and backward in time to infinity: …, e₋₂, e₋₁, e₀, e₁, e₂, …. Starting at any s, positive or negative, moving forward we have a Markov chain with transition matrix P: e_s, e_s+1, e_s+2, …. Moving backward, e_s, e_s−1, e_s−2, … is a Markov chain with transition matrix $\tilde{P}$ . Choose any population vector $u_{0} \in U$ , and define for s > t,

U_{s, t} ≔ \frac{X_{e_{s}} X_{e_{s - 1}} \dots X_{e_{t}} u_{0}}{∥ X_{e_{s}} X_{e_{s - 1}} \dots X_{e_{t}} u_{0} ∥} .

(16)

From (10), we know that for any t₁, t₂ ≥ m, we have the bound ρ(U_s,−t₁, U_s−t₂) ≤ k₂r^m+s, which means that ${(U_{s, - t})}_{t = 0}^{\infty}$ is a Cauchy sequence. We may denote its limit by U_s,−∞.

We note four important features of this backward iterated sequence:

(i)
If the environments all shift one to the left, so that we start at e_s+1 instead of e_s, the distribution remains exactly the same. Thus, the distribution of U_s,−∞ is the same for any s.
(ii)
If we start our population Markov chain in the state (U_0,−∞, e₁), then the next state will be
$(\frac{X_{e_{1}} U_{0, - \infty}}{∥ X_{e_{1}} U_{0, - \infty} ∥}, e_{2}) = (U_{1, - \infty}, e_{2}) .$
Thus, the distribution of (U_0,−∞, e₁) is a stationary distribution π for the chain.
(iii)
For each fixed t, U_0,−t has the same distribution as
$U_{t, 0} = \frac{U_{e_{t}} X_{e_{t - 1}} \dots X_{e_{0}} u_{0}}{∥ X_{e_{t}} X_{e_{t - 1}} \dots X_{e_{0}} u_{0} ∥},$ (17)
which is a realization of the normalized population vector N_t/∥N_t∥, when it is started at the vector u₀. Thus, the population-environment Markov chain (N_t/∥N_t∥) converges in distribution to the same stationary distribution π, no matter what the initial condition.
(iv)
We will use frequently the fact that (U_0,−t, e₁) may be considered an approximate sample from the distribution π, with an error that is bounded by
$\begin{matrix} ρ (U_{0, - t}, U_{0, - \infty}) & = ρ (X_{e_{0}} \dots X_{e_{- t}} u_{0}, X_{e_{0}} \dots X_{e_{- t}} \cdot U_{- t - 1, - \infty}) \\ \leq Δ_{U} (u_{0}) τ_{B} (X_{e_{0}} \dots X_{e_{- t}}), \end{matrix}$
which is straightforward to bound (cf. section 3.1), independent of any information about U_−t,−∞.

The same holds true, of course, if we reverse the matrix multiplication: Starting from any nonnegative K-dimensional row vector $v_{0}^{T}$ , we define the sequence of row vectors

V_{0, t}^{T} ≔ \frac{v_{0}^{T} X_{e_{t}} \cdot X_{e_{0}}}{∥ v_{0}^{T} X_{e_{t}} \cdot X_{e_{0}} ∥} \in V .

Then V_0,t converges pointwise to a vector V_0,∞ := lim_t→∞V_0,t. We denote the distribution of (V_0,∞, e₋₁) by $\tilde{π}$ . As before, $\tilde{π}$ is the stationary distribution for the Markov chain on $V \times M$ , defined by taking (v^T, e_−t) at time t to (v^TX_e−t/∥v^TX_e−t∥, e_−(t+1)) at time t + 1.

We also define the regular conditional distributions π_e on $U$ as follows: pick (U, e) from the distribution π, conditioned on e being e, and take π_e to be the distribution of U. We define ${\tilde{π}}_{e}$ similarly. In the case of i.i.d. environments, of course, π and $\tilde{π}$ would be simply products of an independent population vector and the stationary environment with distribution ν.

4 Errors and How to Bound Them

A standard approach to estimating a — and a similar approach to the derivatives that we describe later on — is to choose a fixed starting vector $u_{0} \in U$ , simulate sequences e₀(i), e₁(i), …, emi (i independently from the stationary Markov chain with transition probabilities P (i = 1, …, J), and then compute

\begin{matrix} a_{m} ≔ & E [log \frac{X_{e_{m_{i}} (i)} X_{e_{m_{i} - 1} (i)} \dots X_{e_{0} (i)} u_{0}}{∥ X_{e_{m_{i} - 1} (i)} \dots X_{e_{0} (i)} u_{0} ∥}] \\ \approx \frac{1}{J} \sum_{i = 1}^{J} (log ∥ X_{e_{m_{i}} (i)} X_{e_{m - 1} (i)} \dots X_{e_{0} (i)} u_{0} ∥ - log ∥ X_{e_{m_{i} - 1} (i)} \dots X_{e_{0} (i)} u_{0} ∥) . \end{matrix}

(18)

It is important not only to know what would be an appropriate approximation to a or its derivatives in the sense of being asymptotically correct, but also to have rigorous bounds for the error arising from any finite simulation procedure, such as (18). There are two sources of error: systematic error, arising from the fact that (X_e0, u₀) is not exactly a sample from the distribution π, and sampling error, arising from the fact that we have estimated the expectation by averaging over a random sample.

For obtaining an unbiased estimate there is no reason the sample Markov chain realizations should need to be independent. For instance, a standard approach would be to take a single realization of the chain e₀; e₁; e₂,…, and take e_t(i) = e_t, with m_i = B + i for some “burn-in time” B. The problem with this approach is that it becomes more difficult to bound the errors; in principle, the convergence could be extremely slow for these kinds of sums. We leave the problem of bounding errors for these cumulative simulations to a future work.

4.1 Systematic error

By “systematic error” we mean the error in our estimate of a arising from the difference between the distribution we are aiming for and the distribution we are actually sampling from. The quantity we are trying to estimate may be represented as a = π[F], the expectation of a certain function F with respect to the distribution π. If we can simulate Z₁,…, Z_J from π, then ${\hat{a}}_{j} ≔ \sum_{j = 1}^{J} F (Z_{j})$ is an unbiased estimator of π[F], and will be consistent under modest Passumptions on F and the independence of the samples. Suppose, though that what we have are not samples from π, but samples $Z_{j}^{*}$ from a “similar” distribution π*. Then we can bound the error by

∣ a - \hat{a} ∣ \leq ∣ J^{- 1} \sum_{j = 1}^{J} F (Z_{j}^{*}) - π * [F] ∣ + ∣ π [F] - π * [F] ∣ .

(19)

Here the first term on the right-hand side is the sampling error, and the second term is the bias, the expected value of systematic error. The problem is that the bounds we can obtain for the bias are likely to be crude, absent good computational tools for the distribution π. (And if we could compute analytically from π, we wouldn't need to be simulating).

An alternative is to couple the samples $Z_{j}^{*}$ from the approximate distribution π′ to exact samples Z_j from the distribution π, we can break up the error in a slightly di erent way:

\begin{matrix} ∣ a - \hat{a} ∣ & \leq ∣ J^{- 1} \sum_{j = 1}^{J} F (Z_{j}) - π [F] ∣ + ∣ J^{- 1} \sum_{j = 1}^{J} F (Z_{j}) - J^{- 1} \sum_{j = 1}^{J} F [Z_{j}^{*}] ∣ \\ \leq ∣ J^{- 1} \sum_{j = 1}^{J} F (Z_{j}) - π [F] ∣ + J^{- 1} \sum_{j = 1}^{J} ∣ F (Z_{j}) - F (Z_{J}^{*}) ∣ \end{matrix}

(20)

Bounds for the sampling error in (19) will generally also be bounds for the first term in (20). The second term in (20), on the other hand, which takes the place of the bias, is a random variable, computed from the samples $Z_{j}^{*}$ . Its expectation is still a bound on the bias. The crucial fact is that the last line may be computable without knowing in detail what the “true” sample Z_j is.

A small disadvantage of this approach is that the systematic error varies with the sample. To achieve a particular fixed error bound we need an adaptive approach, whereby we successively extend our sequence of matrices until the error crosses the desired threshold. We note here that this approach to estimating the systematic error in simulations is essentially just a version of the Propp-Wilson algorithm (cf. Chapter 10 of Häggström (2002)). Unlike standard applications, though, the space is continuous, so the systematic error never reaches 0.

4.2 Sampling error

The sampling error is difficult to control with current techniques, because the distribution of the samples is so poorly understood — the very reason why we resort to the Monte Carlo approximation in the rst place. The best we can do for a rigorous bound is to use Hoeffiding's inequality (see Hoeffding (1963)), relying on crude bounds on the terms in the expectation. Hoeffding's inequality tells us that if X₁,…,X_J are i.i.d. random variables such that α ≤ X_i β almost surely, then for any z > 0,

P {∣ \frac{1}{J} \sum X_{i} - E [X] ∣ > z} \leq 2 e^{- 2 J z^{2} ∕ {(β - α)}^{2}} .

(21)

This is essentially the same bound that we would estimate from the normal approximation if the standard deviation of X were (β – α)/2. Generally we will want to fix p₀, the con dence level, and compute the corresponding z, which will be

z_{0} = (β - α) \sqrt{- \frac{1}{2 J} log (p_{0} ∕ 2)} .

(22)

Of course, the standard deviation will be smaller than this, but we do not know how much smaller. An alternative approach then would be to use the bound $2 τ (z \sqrt{J} ∕ \hat{σ})$ , where $\hat{σ}$ is the standard deviation of the simulated samples, and τ is the cumulative distribution function of the Student t distribution with J – 1 degrees of freedom. This will be a smaller bound, in that sense “better”, but not precisely true for finite samples, to the extent that the sample distribution is not normal. The corresponding bound on the error, at probability level p₀, is

z_{0} = \frac{\hat{σ}}{\sqrt{J}} t_{1 - p_{0} ∕ 2} (J - 1),

(23)

where t_p(J – 1) is the p quantile of the Student T distribution with J – 1 degrees of freedom; that is, if T has this distribution then P{T > t_p(J – 1) = p.

These asymptotic bounds are perfectly conventional in Markov chain Monte Carlo analysis (see, for example, Asmussen and Glynn (2007)), and there is no reason particularly to eschew them in this context. They are likely to be quite accurate, and superior to the rigorous bounds, and might also be applied to the setting where the expectations are estimated not from independent samples, but from a single run of the environment chain. We have nonetheless emphasised the Hoeffding-based rigorous bounds in the statements of our theorems for three reasons: First, they are likely to be less familiar, and the reader may require more guidance in applying them to the individual cases; second, because some residue of skepticism must remain for the asymptotic bounds, while these results may be applied to calculations that are otherwise analytically precise; and third, as a spur to further thought on the best ways to bound errors rigorously in these sorts of problems.

5 Growth Rate and Sensitivity to Projection Matrices

We present here extensions of two known results. In these cases (and in later results) we start by defining an estimator that converges to the the quantity we desire, and follow that by bounds on the systematic and sampling errors, as well as an error bound for estimates from a simulation estimator. We state our results on error bounds in the form “The quantity Q may be approximated by the expectation of A, with systematic error bounded by B and sampling error bounded by C(J, p).” This means that if A₁,…, A_J are independent realizations of A, then the probability that the true value of Q is not in the interval J⁻¹ ΣA_i ± [B + C(J, p)] is no bigger than p. When describing an adaptive bound on the systematic error, B will depend upon the particular simulation result. Again, the sampling error may be bounded either by a universally valid Hoeffding bound, based on known upper bounds on the samples, or by the Student t distribution, using the standard deviation estimated from the sample, which provides a generally much superior bound, but which can only be treated as an approximation.

5.1 Computing a

The stochastic growth rate a is commonly estimated by numerical simulation but, as discussed with examples by Caswell (2001), there is no general way to bound the errors in the estimated values. The following result provides suitable bounds.

Theorem 5.1 Let u₀ be any fixed element of $U$ , and Y_m := X_em X_em–1 …X_e1, where e₀; e₁, … form a Markov chain with transition rates P. The stochastic growth rate may be approximated by the simulated expectation of

log \frac{∥ X_{e_{m + 1}} Y_{m} u_{0} ∥}{∥ Y_{m} u_{0} ∥},

(24)

with systematic error bounded by $k_{2^{r^{m}}}$ and sampling error at level p on J samples bounded by

(log \frac{{sup}_{u \in U} {max}_{e \in M} ∥ X_{e} u ∥}{{inf}_{u \in U} {min}_{e \in M} ∥ X_{e} u ∥}) {(\frac{- log p}{2 J})}^{1 ∕ 2} .

(25)

When the simulated expectation is

\frac{1}{J} \sum_{i = 1}^{J} log \frac{∥ X_{e_{m + 1}} (i) Y_{m} (i) u_{0} ∥}{∥ Y_{m} (i) u_{0} ∥}

we may also bound the systematic error by

\frac{1}{J} \sum_{i = 1}^{J} sup_{u, \overset{˘}{u} \in U} ρ (Y_{m} (i) u, Y_{m} (i) \overset{˘}{u}) \leq \frac{Δ_{U} (u_{0})}{J} \sum_{i = 1}^{J} τ_{B} (Y_{m} (i)),

(26)

where $Δ_{U}$ (u₀) is de ned as in (12) and τ_B is defined as in (6).

5.2 Derivatives with respect to Projection Matrices

We need care in defining derivatives of a with respect to elements of the population projection matrices. As discussed in Tuljapurkar and Horvitz (2003) we must define how the matrix entries change, e.g., do we change fertility rates in a particular environment, or in all possible environments? Although the main formula here is known, Tuljapurkar's (1990) derivation did not justify a crucial exchange of limits (between taking the perturbation to zero and time to infinity). We provide a rigorous proof (see Appendix) and of course the error bounds here are new.

We will suppose that the matrices X_e depend smoothly on a parameter ε, so that we may define X′_e := ∂X_e/∂ε, and we define the base matrices to be at ε = 0. In some cases, the parametrization will be defined only for ε ≥ 0, and in those cases we will understand the partial derivatives to be one-sided derivatives, and the limits lim_ε→0 will be the one-sided likits lim_ε↓0.

Theorem 5.2 Let U_e and V_e be independent random variables with distributions π_e and ${\tilde{π}}_{e}$ (the conditional stationary distributions defined in section 3.2). Then

a' (0) = \sum_{e \in M} ν_{e} E [\frac{V_{e}^{T} X_{e}^{'} U_{e}}{V_{e}^{T} X_{e} U_{e}}]

(27)

Each term may be approximated by averaging samples of the form

\sum_{e \in M} ν_{e} \frac{V^{(m) T} X_{e}^{'} U^{(m)}}{V^{(m) T} X_{e} U^{(m)}},

(28)

where u₀; $v_{0}^{T}$ are any fixed elements of $U$ , $V$ respectively, $U^{(m)} = X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{m}} u_{0}$ and $V^{(m) T} = v_{0}^{T} X_{e_{m}} \dots X_{e_{1}}$ , e = ẽ₀; ẽ₁,…,ẽ_m form a sample from the Markov chain P̃, and e = e₀, e₁,…, e_m form a sample from the Markov chain P. The systematic error may be bounded uniformly by

2 (exp (4 k_{2^{r^{m}}}) - 1) ∣ a' (0) ∣,

(29)

while the sampling error at level p on J samples is bounded by

{(\frac{- log (p ∕ 2)}{2 J})}^{1 ∕ 2} sup_{\underset{\overset{˘}{u} \in U, {\overset{˘}{v}}^{T} \in V}{u \in U, v^{T} \in V}} \sum_{e \in M} ν_{e} (\frac{v^{T} X_{e}^{'} u}{v^{T} X_{e} u} - \frac{\overset{˘}{v} X_{e}^{'} \overset{˘}{u}}{{\overset{˘}{v}}^{T} X_{e} \overset{˘}{u}}) .

(30)

Suppose the simulated expectation is

\frac{1}{J} \sum_{j = 1}^{J} \sum_{e \in M} ν_{e} \frac{{(V^{(m)} (j))}^{T} X_{e}^{'} U^{(m)} (j)}{{(V^{(m)} (j))}^{T} X_{e} U^{(m)} (j)},

where

U^{(m)} (j) = X_{{\tilde{e}}_{1} (j)} \dots X_{{\tilde{e}}_{m} (j)} u_{0} ≔ {\tilde{Y}}_{m} (j) u_{0}

and

{(V^{(m)} (j))}^{T} = v_{0}^{T} X_{e_{m} (j)} \dots X_{e_{1} (j)} ≕ v_{0}^{T} Y_{m} (j) .

Let

U (j) ≔ {\tilde{Y}}_{m} (j) U = {{\tilde{Y}}_{m} (j) u : u \in U},

V (j) ≔ V {\tilde{Y}}_{m} (j) = {v^{T} Y_{m} (j) : v^{T} \in V} .

Then we may also bound the systematic error by

\begin{matrix} \sum_{e \in M} \frac{ν_{e}}{J} \sum_{j = 1}^{J} sup_{\underset{v^{T} \in V (j)}{u \in U (j)}} ∣ \frac{v^{T} X_{e}^{'} u}{v^{T} X_{e} u} - \frac{{(V^{(m)} (j))}^{T} X_{e}^{'} U^{(m)} (j)}{{(V^{(m)} (j))}^{T} X_{e} U^{(m)} (j)} ∣ \\ \leq \sum_{e \in M} \frac{ν_{e}}{J} \sum_{j = 1}^{J} (exp {2 sup_{u \in U (j)} ρ (u, U^{(m)} (j)) + 2 sup_{v^{T} \in V (j)} ρ (u, V^{(m)} (j))} - 1) \times sup_{\underset{v^{T} V (j)}{u \in U (j)}} \frac{v^{T} ∣ X_{e}^{'} ∣ u}{v^{T} X_{e} u} \\ \leq \sum_{e \in M} \frac{ν_{e}}{J} \sum_{j = 1}^{J} \underset{\underset{v^{T} \in V (j)}{u \in U (j)}}{su[} \frac{v^{T} ∣ X_{e}^{'} u ∣}{v^{T} X_{e} u} (exp {2 Δ (τ_{B} ({\tilde{Y}}_{m} (j)) + τ_{B} {(Y_{m} (j))}^{T})} - 1) . \end{matrix}

(31)

Note that the bound (29) is given as a proportion of the unknown a…(0). It can be turned into an explicit bound by using an upper bound on a…(0). We have the trivial bound

∣ a' (0) ∣ \leq \sum_{e \in M} ν_{e} sup_{\underset{v^{T} \in V}{u \in U}} \frac{∣ v^{T} X_{e}^{'} u ∣}{v^{T} X_{e} u} .

(32)

We recall that Δ and τ_B may be bounded according to the formulas given in section 3.1.

It may seem surprising that the vectors U_e and V_e are independent. If we think of the environments as a doubly infinite sequence with e₀ = e, then the vector U_e in equation (27) depends only on the past (that is, e_i with i < 0) and V_e depends only on the future (e_i with i > 0). By the Markov property, these two are independent, when conditioned on e₀ = e.

6 Environments and Coupling

Suppose we change the transition matrix P to a slightly different matrix P^(ε), and want to compare population growth along environmental sequences generated by the original and the perturbed matrix. We expect that the perturbed environmental sequences will only occasionally deviate from the environment that we “would have had” in the original distribution of environments. Computing the derivative of a is then a matter of measuring the cumulative deviations due to these changes. These may be split into two parts: First, the process moves under P^(ε) to a state $\overset{˘}{e}$ , different from the state e that it would have moved to under P. Then there is a sequence of environments following on from $\overset{˘}{e}$ that is different from the sequence that would have followed from e, until the Markov chain gradually “forgets” its starting point. The change to a new sequence of matrices induces two separate changes on the growing population: The new sequence accumulates a difference in magnitude on its way to stationarity; and it produces a different stationary distribution of unit vectors depending on the starting environment $\overset{˘}{e}$ rather than e.

In this section we examine the first effect. We fix the transition matrix P and compare the growth of total population size when starting from environment e to the growth starting from the stationary distribution ν. A standard method for doing this is coupling. For an outline of coupling techniques in MCMC, see Kendall (2005) and Roberts and Rosenthal (2004). We use coupling in two ways, corresponding to the two components of the Markov chain: the environment and the population vector.

Fix environments e and $\overset{˘}{e}$ (possibly the same). We define sequences e₀, e₁, …; ${\overset{˘}{e}}_{0}$ , ${\overset{˘}{e}}_{1}$ , …; and ${\overset{‒}{e}}_{0}$ , ${\overset{‒}{e}}_{1}$ , …: all three are Markov chains with transition probabilities P, but with e₀ = e, ${\overset{˘}{e}}_{0} = \overset{˘}{e}$ and ${\overset{‒}{e}}_{0}$ having distribution ν (so that ( ${\overset{‒}{e}}_{i}$ ) is stationary). We define the total population effect of starting in state e rather than $\overset{˘}{e}$ as

\begin{matrix} {}_{\overset{˘}{e}}ζ_{e} & ≔ lim_{t \to \infty} (E [log ∥ X_{e_{t}} \dots X_{e_{0}} ∥] - E [log ∥ X_{{\overset{˘}{e}}_{t}} \dots X_{{\overset{˘}{e}}_{0}} ∥]) \\ ζ_{e} & ≔ lim_{t \to \infty} (E [log ∥ X_{e_{t}} \dots X_{e_{0}} ∥] - E [log ∥ X_{{\overset{‒}{e}}_{t}} \dots X_{{\overset{‒}{e}}_{0}} ∥]) \\ = \sum_{\overset{˘}{e} \in M} ν_{\overset{˘}{e}} \cdot {}_{\overset{˘}{e}}ζ_{e} . \end{matrix}

(33)

This is one of the terms that will come into formula (40), the derivative of a with respect to shifting transition probabilities from e to $\overset{˘}{e}$ .

Note that when the environments are i.i.d. — so $P (e, \overset{˘}{e}) = ν_{\overset{˘}{e}}$ — we have

{}_{\overset{˘}{e}}ζ_{e} = E [log \frac{∥ V^{T} X_{e} ∥}{∥ V^{T} X_{\overset{˘}{e}} ∥}],

where V^T has the distribution π.

Computing ζ_e depends on coupling the version of the Markov chain starting at e, to another version starting in the distribution ν. We define the coupling time τ to be the first time such that $e_{τ} = {\overset{‒}{e}}_{τ}$ ; after this time the chains follow identical trajectories. If we know the distribution of τ and of the sequences followed by the two chains from time 0 to τ, we can average the diferences in (33) to nd ζ. The advantage of coupling is, first, that it reduces the variability of the estimates, and second, that we know from the simulation when the coupling time has been achieved, which gives bounds on the error. A suitable choice is Griffeath's maximal coupling (Griffeath, 1975) which we will apply in Pitman's (Pitman, 1976) path-decomposition representation. (The coupling is “maximal” in the sense of making the coupling time, and hence the variance of the estimate, as small as possible.) However we must be careful about sampling values of τ, because they may be large if the Markov chain mixes slowly. To deal with this we overweight large coupling times that generate a large contribution to ζ.

Beginning with a fixed environment e, the procedure is as follows:

(i)
Define the sequence of vectors α_t := P^t(e, ·) − ν(·). We also define $α_{t}^{+}$ and $α_{t}^{-}$ to be the vectors of pointwise positive and negative parts respectively. Let C(t) be a bound on $∣ log ∥ X_{e_{1}} \dots X_{e_{t}} ∥ - log ∥ X_{{\overset{˘}{e}}_{1}} \dots X_{{\overset{˘}{e}}_{t}} ∥ ∣$ , where the e_i and ${\overset{˘}{e}}_{i}$ are any environments. From (3) we know that $2 \hat{k} + 2 t \hat{r}$ is a possible choice for C(t).
(ii)
For pairs (t, e), where t is a positive integer and $e * \in M$ , define a probability distribution
$q (t, \overset{˘}{e}) ≔ {\begin{matrix} ν_{e} & if e = \overset{˘}{e}, t = 0; \\ 0 & if e \neq \overset{˘}{e}, t = 0; \\ [α_{t - 1}^{+} P] (\overset{˘}{e}) - α_{t}^{+} (\overset{˘}{e}) & otherwise . \end{matrix}$
This is the distribution of the pair (τ, e_τ) for the maximally coupled chain. Define
$A ≔ \sum_{t = 1}^{\infty} \sum_{\overset{˘}{e} \in M} q (t, \overset{˘}{e}) C (t),$
and a probability distribution on $N \times M$
$\overset{˚}{q} (t, \overset{˘}{e}) ≔ \frac{q (t, \overset{˘}{e}) C (t)}{A} .$
(iii)
Average J independent realizations of the following random variable: Let (τ, e₀) be chosen from the distribution $\overset{˚}{q}$ on $N \times M$ , and ${\overset{˘}{e}}_{0}$ independently from the distribution ν. From these starting states e₀ and ${\overset{˘}{e}}_{0}$ , let (e₀, …, e_τ, …, e_m) and ( ${\overset{˘}{e}}_{0}$ , …, ${\overset{˘}{e}}_{τ}$ , …, ${\overset{˘}{e}}_{m}$ ) be realizations of the coupled pair of Markov chains with transition probabilities P, conditioned on the coupling time being τ and $e_{τ} = {\overset{˘}{e}}_{τ} = \overset{˘}{e}$ . These realizations are generated from independent inhomogeneous Markov chains running backward, with transition probabilities
$P {e_{i - 1} = x ∣ e_{i} = y} = \frac{α_{i - 1}^{+} (x) P (x, y)}{Σ_{z \in M} α_{i - 1}^{+} (z) P (z, y)},$

$P {{\tilde{e}}_{i - 1} = x ∣ {\overset{˘}{e}}_{i} = y} = \frac{α_{i - 1}^{-} (x) P (x, y)}{Σ_{z \in M} α_{i - 1}^{-} (z) P (z, y)} .$
We extend the chain past τ, requiring $e_{t} = {\overset{˘}{e}}_{t}$ for t > τ, as a realization of the Markov transition probabilities P, to obtain a total sample of predetermined length m. The random variable is then
$Z ≔ \frac{A}{C (τ)} log \frac{∥ X_{e_{m}} \dots X_{e_{τ}} \dots X_{e_{0}} ∥}{∥ X_{e_{m}} \dots X_{e_{τ + 1}} X_{{\overset{˘}{e}}_{τ}} \dots X_{{\overset{˘}{e}}_{0}} ∥} .$
(Note that the realizations corresponding to τ = 0 are identically 0. The possibility of τ = 0 has been included only to simplify the notation. In practice, we are free to condition on τ > 0.)

The change from q to $\overset{˚}{q}$ is an example of importance sampling (cf. Chapter V.1 in Asmussen and Glynn (2007)). We oversample the values of the random variable with high τ to reduce the variability of the estimate. The importance sampling makes Z(j) a bounded random variable, with bound A. Imagine that we had a source of perfect samples V^T (j) from the distribution ${\tilde{π}}_{e_{m}}$ , and define

\tilde{Z} (j) ≔ \frac{A}{C (τ (j))} log \frac{∥ V^{T} (j) X_{e_{m} (j)} \dots X_{e_{0} (j)} ∥}{∥ V^{T} (j) X_{{\overset{˘}{e}}_{m} (j)} \dots X_{{\overset{˘}{e}}_{0} (j)} ∥} .

Let Y(j) := X_{e_m(j)} ⋯ X_e₀(j) and $\overset{˘}{Y} (j) ≔ X_{{\overset{˘}{e}}_{m} (j)} \dots X_{{\overset{˘}{e}}_{0} (j)}$ . Then

∣ \tilde{Z} (j) - Z (j) ∣ \leq \frac{A Δ_{ν} (1)}{C (τ (j))} (τ_{B} (Y (j)) + τ_{B} (\overset{˘}{Y} (j))) .

(34)

Since $E [\tilde{Z}] = ζ_{e}$ , we may use (22) to compute the bound

P {∣ ζ_{e} - n^{- 1} \sum_{j = 1}^{n} \tilde{Z} (j) ∣ > 2 A \sqrt{- \frac{1}{2 J} log (p_{0} ∕ 2)}} \leq p_{0} .

(35)

Lemma 6.1 The limits defining the coefficients ${}_{\overset{˘}{e}}ζ_{e}$ and ζ_e exist and are finite. We may approximate ζ_e by

\frac{1}{J} \sum_{j = 1}^{J} \frac{A}{C (τ (j))} log \frac{∥ X_{e_{τ} (j)} \dots X_{e_{0} (j)} ∥}{∥ X_{{\tilde{e}}_{τ} (j)} \dots X_{{\tilde{e}}_{0} (j)} ∥} .

(36)

If 0 < p₀ ≤ 1, the probability is no more than p₀ that the error in this estimation is larger than

\frac{1}{J} \sum_{j = 1}^{J} \frac{A Δ_{ν} (1)}{C (τ (j))} (τ_{B} (Y (j)) + τ_{B} (\overset{˘}{Y} (j))) + 2 A \sqrt{- \frac{1}{2 J} log (p_{0} ∕ 2)},

(37)

It remains to bound A. From standard Markov chain theory, for any vector $v \in S$ ,

∥ v^{T} P^{t} - v^{t} ∥ \leq D ξ^{t} .

(38)

Setting setting Q := P − 1ν^T, $α_{t}^{T}$ is the e-th row of Q_t, which may also be written as

α_{t} = 1_{e}^{T} {(Q^{t})}^{T},

where 1_e is the column vector with 1 in place e and 0 elsewhere. Thus

∥ α_{t} ∥ \leq D ξ^{t} .

If we use the bound $C (t) = \hat{k} + t \hat{r}$ , then

\begin{matrix} A & = \sum_{t = 1}^{\infty} (\hat{k} + t \hat{r}) (∥ α_{t - 1} ∥ - ∥ α_{t} ∥) \\ = \hat{k} ∥ α_{1} ∥ + \hat{r} \sum_{t = 1}^{\infty} ∥ α_{t} ∥ \\ \leq D ξ (\hat{k} + \frac{D \hat{r}}{1 - ξ}) \end{matrix}

(39)

7 Derivatives with respect to Environmental Transitions

We are now ready to compute derivatives of a with respect to changes in the distribution of environments, as determined by P. Complicating the notation slightly is the constraint { $P : \sum_{\overset{˘}{e}} P (e, \overset{˘}{e}) = 1$ for each e}; thus, there can be no sense in speaking of the derivative with respect to changes in $P (e, \overset{˘}{e})$ for some particular e, $\overset{˘}{e}$ . Instead, we must compute directional derivatives along the direction of some matrix W, in the plane $\sum_{\overset{˘}{e}} W_{e, \overset{˘}{e}} = 0$ .

For the purposes of this result we write a(ε) = a(P^(ε)), where P^(ε) is a differentiable curve of M × M matrices, where the parameter ε takes values either in a two-sided interval [−ε₀, ε₀], or a one-sided interval [0, ε₀]. Let W := ∂P^(ε)/∂ε, an M × M matrix whose rows all sum to 0. The perturbations are such that P^(ε) retains the ergodicity and irreducibility of P. (The result should be the same whether ε is positive or negative. If P is on the boundary of the set of possible values, one or the other sign may be impossible. Some choices of W may be impossible in both directions.) In the special case in which $W_{e, \overset{˘}{e}} = 1$ and $W_{e, \overset{˘}{e}} = - 1$ , with all other entries 0, we are computing the derivative corresponding to a small increase in the rate of transitioning from environment e to $\overset{˘}{e}$ , and a decrease in the rate of transitioning to $\overset{˘}{e}$ .

In this section the matrices X₁, …, X_M are assumed fixed.

Theorem 7.1 The derivative of the stochastic growth rate is

a' (0) = \sum_{e, \overset{˘}{e} \in M} ν_{\overset{˘}{e}} W_{\overset{˘}{e}, e} (ζ_{e} + E [log \frac{V_{e}^{T} X_{e} X_{\overset{˘}{e}} U_{\overset{˘}{e}}}{∥ V_{e}^{T} X_{e} ∥}]),

(40)

where $U_{\overset{˘}{e}}$ , $V_{e}^{T}$ are independent random variables with distributions $π_{\overset{˘}{e}}$ and ${\tilde{π}}_{e}$ respectively.

The quantities ζ_e may be approximated, with error bounds, according to the algorithm described in section 6.

The other part of the expression may be approximated by averaging samples of the form

\sum_{e, \tilde{e} \in M} ν_{\tilde{e}} W_{\tilde{e}, e} log \frac{V_{e}^{(m)} X_{e} X_{\tilde{e}} U_{\tilde{e}}^{(m)}}{∥ V_{e}^{(m) T} X_{e} ∥},

(41)

where

U_{\tilde{e}}^{(m)} = \frac{X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{m}} u_{0}}{∥ X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{m}} u_{0} ∥}

and

V_{e}^{(m) T} = \frac{v_{0}^{T} X_{e_{m}} \dots X_{e_{1}}}{∥ v_{0}^{T} X_{e_{m}} \dots X_{e_{1}} ∥},

and e = e₀, e₁, …, e_m is a Markov chain with transition matrix P, and $\tilde{e} = {\tilde{e}}_{0}$ , ${\tilde{e}}_{1}$ , …, ${\tilde{e}}_{m}$ is an independent Markov chain with transition probabilities $\tilde{P}$ , and $u_{0} \in U$ and $v_{0}^{T} \in V$ .

The systematic error may be bounded uniformly by 2k₂r^m∥(ν^T|W|)∥, while the sampling error at level p on J samples bounded by

2 ∥ ν^{T} ∣ W ∣ ∥ sup_{\underset{v^{T} \in V}{u \in M}} max_{e \in M} ∣ log v^{T} X_{e} u ∣ {(\frac{- log p}{2 J})}^{1 ∕ 2 .} .

(42)

Suppose the simulated expectation is

\frac{1}{J} \sum_{j = 1}^{J} \sum_{\tilde{e}, e \in M} ν_{\tilde{e}} W_{e, \tilde{e}} log \frac{V^{(m, \tilde{e}) T} (j) X_{\tilde{e}} X_{e} U^{(m, e)} (j)}{∥ V^{(m, \tilde{e}) T} (j) X_{\tilde{e}} ∥},

where

U_{\tilde{e}}^{(m)} (j) ≔ \frac{X_{{\tilde{e}}_{1} (j)} \dots X_{{\tilde{e}}_{m} (j)} u_{0}}{∥ X_{{\tilde{e}}_{1} (j)} \dots X_{{\tilde{e}}_{m} (j)} u_{0} ∥} ≕ \frac{{\tilde{Y}}_{\tilde{e}}^{(m)} (j) u_{0}}{∥ {\tilde{Y}}_{\tilde{e}}^{(m)} (j) u_{0} ∥}

and

V_{e}^{(m) T} (j) = \frac{v_{0}^{T} X_{e_{m} (j)} \dots X_{e_{1} (j)}}{∥ v_{0}^{T} X_{e_{m} (j)} \dots X_{e_{1} (j)} ∥} ≕ \frac{v_{0}^{T} Y_{e}^{(m)} (j)}{∥ v_{0}^{T} Y_{e}^{(m)} (j) ∥} .

We may also bound the systematic error by

\begin{matrix} \frac{1}{J} \sum_{j = 1}^{J} & \sum_{\tilde{e}, e \in M} ν_{\tilde{e}} ∣ W_{\tilde{e}, e} ∣ (sup_{u, \overset{˘}{u} \in U} ρ ({\tilde{Y}}_{\tilde{e}}^{(m)} (j) u, {\tilde{Y}}_{\tilde{e}}^{(m)} (j) \overset{˘}{u}) + sup_{v^{T}, {\overset{˘}{v}}^{T} \in V} ρ (v^{T} Y_{e}^{(m)} (j) X_{e}, {\overset{˘}{v}}^{T} Y_{e}^{(m)} (j) X_{e})) \\ \leq \frac{Δ}{J} \sum_{j = 1}^{J} \sum_{\tilde{e}, e \in M} ν_{\tilde{e}} ∣ W_{e, \tilde{e}} ∣ (τ_{B} ({\tilde{Y}}_{\tilde{e}}^{(m)} (j)) + τ_{B} (X_{e}^{T} Y_{e}^{(m)} {(j)}^{T})) . \end{matrix}

(43)

We note that expressions like (40) are examples of what Brémaud (1992) calls “ersatz derivatives”. In a rather different class of applications Brémaud suggests applying maximal coupling.

8 Discussion

Our results provide analytical formulas and simulation estimators for the derivatives of stochastic growth rate with respect to the transition probability matrix or the population projection matrices. We have concentrated here on the theoretical results; although this may not be obvious, we have made considerable effort at brevity. Partly for this reason, we will present elsewhere numerical applications of these results. We expect that our results should carry over to integral population models (IPMs), given the strong parallels between the stochastic ergodicity properties of IPMs and matrix models (Ellner and Rees, 2007).

Our results apply not only to stochastic structured populations but to any stochastic system in which a Lyapunov exponent of a product of random matrices determines stability or other dynamic properties. Examples include the net reproductive rate in epidemic models and some models of network dynamics. An obvious application of our results is to the analysis of optimal life histories, i.e., environment-to-projection matrix maps that maximize the stochastic growth rate. As discussed by McNamara (1997), this optimization problem translates into what is called an average reward problem in stochastic control theory, and so our results may be more generally useful in such control problems.

Highlights

>
We analyse the growth rate a in matrix population models driven by Markov environments.
>
We derive formulas for derivatives of a with respect to changes in the Markov parameters.
>
We prove known formulas for the derivatives with respect to demographic parameters.

9 Acknowledgements

We thank NIA (BSR) for support under 1P01 AG22500. David Steinsaltz was supported by a New Dynamics of Ageing grant, a joint interdisciplinary program of the UK research councils.

Appendix

Proofs of the theorems.

A.1 Proof of Lemma 3.2

We have

\begin{matrix} ∣ \log \frac{v^{T} u}{{\overset{˘}{v}}^{T} u} - \log \frac{v^{T} \overset{˘}{u}}{{\overset{˘}{v}}^{T} \overset{˘}{u}} ∣ & = ∣ \log \frac{\sum_{i, j} v_{i} u_{i} {\overset{˘}{v}}_{j} {\overset{˘}{u}}_{j}}{\sum_{i, j} v_{i} {\overset{˘}{u}}_{i} {\overset{˘}{v}}_{j} u_{j}} ∣ \\ = ∣ \log \frac{\sum_{i} u_{i} v_{i} {\overset{˘}{u}}_{i} {\overset{˘}{u}}_{i} + \sum_{i < j} (v_{i} u_{i} {\overset{˘}{v}}_{j} {\overset{˘}{u}}_{j} + v_{j} u_{j} {\overset{˘}{v}}_{i} {\overset{˘}{u}}_{i})}{\sum_{i} u_{i} v_{i} {\overset{˘}{u}}_{i} {\overset{˘}{u}}_{i} + \sum_{i < j} (v_{i} {\overset{˘}{u}}_{i} {\overset{˘}{v}}_{j} u_{j} + v_{j} {\overset{˘}{u}}_{j} {\overset{˘}{v}}_{i} u_{i})} ∣ \\ \leq \max_{i, j} ∣ \log \frac{v_{i} u_{i} {\overset{˘}{v}}_{j} {\overset{˘}{u}}_{j} + v_{j} u_{j} {\overset{˘}{v}}_{i} {\overset{˘}{u}}_{i}}{v_{i} {\overset{˘}{u}}_{i} {\overset{˘}{v}}_{j} u_{j} + v_{j} {\overset{˘}{u}}_{j} {\overset{˘}{v}}_{i} u_{i}} ∣ \end{matrix}

For some choice of i, j, let

α ≔ \log \frac{u_{i}}{{\overset{˘}{u}}_{i}}, β ≔ \log \frac{v_{i}}{{\overset{˘}{v}}_{i}}, γ ≔ \log \frac{u_{j}}{{\overset{˘}{u}}_{j}}, δ ≔ \log \frac{v_{j}}{{\overset{˘}{v}}_{j}} .

Note that by (5),

\begin{matrix} ∣ α ∣ \leq ρ (u, \overset{˘}{u}) + ∣ \log ∥ u ∥ ∕ ∥ \overset{˘}{u} ∥ ∣ \leq \frac{1}{4}, \\ ∣ β ∣ \leq ρ (v, \overset{˘}{v}) + ∣ \log ∥ v ∥ ∕ ∥ \overset{˘}{v} ∥ ∣ \leq \frac{1}{4}, \\ ∣ γ ∣ \leq ρ (u, \overset{˘}{u}) + ∣ \log ∥ u ∥ ∕ ∥ \overset{˘}{u} ∥ ∣ \leq \frac{1}{4}, \\ ∣ δ ∣ \leq ρ (v, \overset{˘}{v}) + ∣ \log ∥ u ∥ ∕ ∥ \overset{˘}{u} ∥ ∣ \leq \frac{1}{4} . \end{matrix}

(44)

Then we need to bound

\begin{matrix} ∣ \log \frac{e^{α + β} + e^{γ + δ}}{e^{β + δ} + e^{α + γ}} ∣ & \leq \frac{∣ e^{α + β} + e^{γ + δ} - e^{β + δ} - e^{α + γ} ∣}{\min {e^{α + β} + e^{γ + δ}, e^{β + δ} + e^{α + δ}}} \\ \leq ∣ e^{α + β} + e^{γ + δ} - e^{β + δ} - e^{α + γ} ∣, \end{matrix}

(45)

where we have used the relation |log x| ≤ max{x, 1/x} − 1.

We now use the Taylor series expansion

e^{x} = \sum_{k = 0}^{n} \frac{x^{k}}{k!} + \frac{C x^{k + 1}}{(k + 1)!},

where |C| ≤ max{e^x, 1}. This turns the right-hand side of (45) into

2 \sum_{k = 2}^{\infty} \frac{1}{k!} ∣ {(α + β)}^{k} + {(γ + δ)}^{k} - {(β + δ)}^{k} - {(α + γ)}^{k} ∣ = 2 \sum_{k = 2}^{\infty} ∣ \sum_{ℓ = 1}^{k - 1} \frac{1}{ℓ! (k - ℓ)!} (α^{ℓ} β^{k - ℓ} + γ^{ℓ} δ^{k - ℓ} - β^{ℓ} δ^{k - ℓ} - α^{ℓ} γ^{k - ℓ}) ∣ \leq 2 \sum_{k = 0}^{\infty} \frac{1}{(k - 2)!} (α β \sum_{ℓ = 0}^{k - 2} (_{ℓ}^{k - 2}) α^{ℓ} β^{k - 2 - ℓ} + ∣ γ δ ∣ \sum_{ℓ = 0}^{k - 2} (_{ℓ}^{k - 2}) γ^{ℓ} δ^{k - 2 - ℓ} + ∣ β δ ∣ \sum_{ℓ = 0}^{k - 2} (_{ℓ}^{k - 2}) β^{ℓ} δ^{k - 2 - ℓ} + ∣ α γ ∣ \sum_{ℓ = 0}^{k - 2} (_{ℓ}^{k - 2}) α^{ℓ} γ^{k - 2 - ℓ}) .

Applying the bounds (44) yields finally the upper bound in (15).

A.2 Estimating the stochastic growth rate

We prove here Theorem 5.1. The quantity we are trying to compute is

a = E [\log ∥ X_{e} U ∥], where (U, e) is selected from the distribution π .

(46)

Let e₀, e₁, e₂, … be a realization of the stationary Markov chain with transition matrix P. Let Y_m := X_{e_m}X_{e_m−1} ⋯ X_e₁. Choose $u_{0} \in U$ , and let U be a random variable with distribution π_e₀. Then $a = E [\log ‖ X_{e_{m + 1}} Y_{m} U ‖ ∕ ‖ Y_{m} U ‖]$ , which may be approximated by $E [\log ‖ X_{e_{m + 1}} Y_{m} u_{0} ‖ ∕ ‖ Y_{m} u_{0} ‖]$ .

If we identify systematic error with bias, this is

{Error}_{sys} = ∣ E [\log \frac{∥ X_{e_{m + 1}} Y_{m} u_{0} ∥}{∥ Y_{m} u_{0} ∥}] - E [\log \frac{∥ X_{e_{m + 1}} Y_{m} U ∥}{∥ Y_{m} U ∥}] ∣,

since (Y_mU/∥Y_mU∥, e_m) also has the distribution π (if Y_m and U are taken to be independent, conditioned on e₀). Thus

\begin{matrix} {Error}_{sys} & \leq E [∣ \log \frac{∥ X_{e_{m + 1}} Y_{m} u_{0} ∥}{∥ Y_{m} u_{0} ∥} - \log \frac{∥ X_{e_{m + 1}} Y_{m} U ∥}{∥ Y_{m} U ∥} ∣] \\ \leq E [\sup_{u, \overset{˘}{u} \in U} ∣ \log \frac{∥ X_{e_{m + 1}} Y_{m} u ∥}{∥ Y_{m} u ∥} - \log \frac{∥ X_{e_{m + 1}} Y_{m} \overset{˘}{u} ∥}{∥ Y_{m} \overset{˘}{u} ∥} ∣] \\ \leq k_{2} r^{m}, \end{matrix}

by (13). The corresponding bound on the sampling error may be computed from (22).

For a particular choice of of e₁, …, e_m+1 and U we can also represent the random systematic error as

∣ \log \frac{∥ X_{e_{m + 1}} Y_{m} u_{0} ∥}{∥ Y_{m} u_{0} ∥} - \log \frac{∥ X_{e_{m + 1}} Y_{m} U ∥}{∥ Y_{m} U ∥} ∣,

which may be bounded by the summand in (26).

A.3 Estimating sensitivities: Matrix entries

We prove here Theorem 5.2. As discussed at the end of section 3.1, we may assume that the compact sets $U$ and $V$ are stable and satisfy the bounds of section 3.1 simultaneously for all $X_{e}^{(∊)}$ . The stationary distributions corresponding to products of the perturbed matrices are denoted π^(ε) and ${\tilde{π}}_{e}^{(∊)}$ , and the corresponding regular conditional distributions are π_e^(ε) and ${\tilde{π}}_{e}^{(∊)}$

The derivative a′(0) may be written as

\begin{matrix} \lim_{∊ \to 0} ∊^{- 1} (\lim_{m \to \infty} E [\log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{0}}^{(∊)} u_{0} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{0}}^{(∊)} u_{0} ∥}] - \lim_{m \to \infty} E [\log \frac{∥ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u_{0} ∥}{∥ X_{e_{m - 1}} \dots X_{e_{0}} u_{0} ∥}]) & = (\lim_{m \to \infty} E [\log \frac{∥ X_{{\tilde{e}}_{0}}^{(∊)} X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{m}}^{(∊)} u_{0} ∥}{∥ X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{m}}^{(∊)} u_{0} ∥}] - \lim_{m \to \infty} E [\log \frac{∥ X_{{\tilde{e}}_{0}} X_{{\tilde{e}}_{1}} \dots X_{e_{m}} u_{0} ∥}{∥ X_{{\tilde{e}}_{1}} \dots X_{e_{m}} u_{0} ∥}]) \\ = \lim_{∊ \to 0} \lim_{m \to \infty} \sum_{s = 0}^{m} a_{s, m} (∊) = : \lim_{∊ \to 0} \lim_{m \to \infty} A (m, ∊), \end{matrix}

where

a_{s, m} (∊) : = E [∊^{- 1} (\log \frac{∥ X_{{\tilde{e}}_{0}}^{(∊)} X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{s}}^{(∊)} X_{{\tilde{e}}_{s + 1}} \dots X_{{\tilde{e}}_{m}} u_{0} ∥}{∥ X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{s}}^{(∊)} X_{{\tilde{e}}_{s + 1}} \dots X_{{\tilde{e}}_{m}} u_{0} ∥} - \log \frac{∥ X_{{\tilde{e}}_{0}}^{(∊)} X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{s - 1}}^{(∊)} X_{{\tilde{e}}_{s}} \dots X_{{\tilde{e}}_{m}} u_{0} ∥}{∥ X_{{\tilde{e}}_{1}}^{(∊)} \dots X_{{\tilde{e}}_{s - 1}}^{(∊)} X_{{\tilde{e}}_{s}} \dots X_{{\tilde{e}}_{m}} u_{0} ∥})] .

Here e₀, e₁, … is the stationary Markov chain with transition probabilities P, and ${\tilde{e}}_{0}$ , ${\tilde{e}}_{1}$ , … is the reverse stationary Markov chain, with transition probabilities $\tilde{P}$ .

By (13), for ∊ > 0 sufficiently small

∣ a_{s, m} (∊) ∣ \leq ∊^{- 1} k_{2} r^{s - 1} \sup_{u \in U} E [ρ (X_{{\tilde{e}}_{s}}^{(∊)} u, X_{{\tilde{e}}_{s}} u)] \leq 2 k_{2} C r^{s},

where

C : = r^{- 1} \max_{e \in M} \max_{1 \leq ℓ \leq K} {\max_{u \in U} \frac{{∣ X_{e}^{'} u ∣}_{ℓ}}{{(X_{e} u)}_{ℓ}}, \max_{v \in V} \frac{{∣ v^{T} X_{e}^{'} ∣}_{ℓ}}{{(v^{T} X_{e})}_{ℓ}}} .

If we define

V_{i, j} : = \frac{1^{T} X_{{\tilde{e}}_{i}}^{(∊)} X_{{\tilde{e}}_{i + 1}}^{(∊)} \dots X_{{\tilde{e}}_{j}}^{(∊)}}{∥ 1^{T} X_{{\tilde{e}}_{i}}^{(∊)} X_{{\tilde{e}}_{i + 1}}^{(∊)} \dots X_{{\tilde{e}}_{j}}^{(∊)} ∥}

U_{i, j} : = \frac{X_{{\tilde{e}}_{i}} X_{{\tilde{e}}_{i + 1}} \dots X_{{\tilde{e}}_{j}} u_{0}}{∥ X_{{\tilde{e}}_{i}} X_{{\tilde{e}}_{i + 1}} \dots X_{{\tilde{e}}_{j}} u_{0} ∥},

then for m ≤ n,

∣ a_{s, m} (∊) - a_{s, n} (∊) ∣ \leq ∊^{- 1} E [∣ \log \frac{V {(0, s - 1)}^{T} X_{{\tilde{e}}_{s}}^{(∊)} U (s + 1, m)}{V {(0, s - 1)}^{T} X_{{\tilde{e}}_{s}} U (s + 1, m)} - \log \frac{V {(0, s - 1)}^{T} X_{{\tilde{e}}_{s}}^{(∊)} U (s + 1, n)}{V {(0, s - 1)}^{T} X_{{\tilde{e}}_{s}} U (s + 1, n)} ∣ + ∣ \log \frac{V_{1, s - 1}^{T} X_{{\tilde{e}}_{s}}^{(∊)} U_{s + 1, m}}{V_{1, s - 1}^{T} X_{{\tilde{e}}_{s}} U_{s + 1, m}} - \log \frac{V_{1, s - 1}^{T} X_{{\tilde{e}}_{s}}^{(∊)} U_{s + 1, n}}{V_{1, s - 1}^{T} X_{{\tilde{e}}_{s}} U_{s + 1, n}} ∣]

We note by (10) that ρ(U_s+1,m, U_s+1,n) ≤ k₂r^m−s; and for ∊ > 0 sufficiently small and any $v \in V$ we have

max_{\underset{e \in M}{1 \leq j \leq K,}}, ∣ log \frac{{(v^{T} X_{e}^{(∊)})}_{j}}{{(v^{T} X_{e})}_{j}} ∣ \leq 2 C ∊ .

It follows by Lemma 3.2 that

∣ a_{s, m} (∊) - a_{s, n} (∊) ∣ \leq min {16 k_{2} {Cr}^{m - s}, 4 k_{2} {Cr}^{s}} .

(47)

Putting these together, we get, for all n ≥ m,

\begin{matrix} ∣ A (m, ∊) - A (n, ∊) ∣ & \leq \sum_{s = m + 1}^{n} ∣ a_{s, n} (∊) ∣ + \sum_{s = 0}^{m} ∣ a_{s, m} (∊) - a_{s, n} (∊) ∣ \\ \leq \sum_{s = m + 1}^{n} 2 k_{2} {Cr}^{s} + \sum_{1 \leq s \leq m ∕ 2} 16 k_{2} {Cr}^{m - s} + \sum_{m ∕ 2 < \leq m} 4 k_{2} {Cr}^{s} \\ \leq \frac{20 k_{2} C}{1 - r} r^{m ∕ 2} . \end{matrix}

By Lemma A.1 we may exchange the order of the limits, to see that

a' = lim_{m \to \infty} lim_{∊ \to 0} ∊^{- 1} \sum_{s = 1}^{m} (E [log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} X_{e_{s - 1}} \dots X_{e_{0}} u_{0} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} X_{e_{s - 1}} \dots X_{e_{0}} u_{0} ∥}] - E [log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} \dots X_{e_{0}} u_{0} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} \dots X_{e_{0}} u_{0} ∥}]) .

(48)

This limit is the same for any choice of u₀, hence would also be the same if we replaced u₀ by a random U, with any distribution on $U$ . We choose U to have the distribution π_e₀, independent of the rest of the Markov chain. By the invariance property of the distributions π,

\begin{matrix} a' & = lim_{m \to \infty} lim_{∊ \to 0} ∊^{- 1} (\sum_{s = 1}^{m} E [log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} X_{e_{s - 1}} \dots X_{e_{0}} U_{0} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} X_{e_{s - 1}} \dots X_{e_{0}} U_{0} ∥}] - E [log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} \dots X_{e_{0}} U_{0} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} \dots X_{e_{0}} U_{0} ∥}]) \\ = lim_{m \to \infty} \sum_{s = 0}^{m} lim_{∊ \to 0} ∊^{- 1} E [log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} U_{s - 1} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s}}^{(∊)} U_{s - 1} ∥} - log \frac{∥ X_{e_{m}}^{(∊)} X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} U_{s - 1} ∥}{∥ X_{e_{m - 1}}^{(∊)} \dots X_{e_{s + 1}}^{(∊)} X_{e_{s}} U_{s - 1} ∥}], \end{matrix}

(49)

where (U_s−1, e_s) has distribution π.

For m ≥ s ≥ 1 define functions

f_{s} (∊, δ) ≔ \frac{1^{T} X_{e_{m}}^{(δ)} X_{e_{m - 1}}^{(δ)} \dots X_{e_{s + 1}}^{(δ)} X_{e_{s}}^{(∊)} U_{s - 1}}{1^{T} X_{e_{m - 1}}^{(δ)} \dots X_{e_{s + 1}}^{(δ)} X_{e_{s}}^{(∊)} U_{s - 1}},

where the denominator is understood to be 1 for s = m. The summand on the right of (49) may be written as

lim_{∊ \to 0} ∊^{- 1} E [log f_{s} (∊, ∊) - log, f_{s} (0, ∊)] .

(50)

By (13),

∊^{- 1} E ∣ log f_{s} (∊, δ) - log, f_{s} (0, δ)] \leq ∊^{- 1} k_{2^{r^{s}}} ρ (X_{e_{s}}^{(∊)} U, X_{e_{s}} U) < 2 {Ck}_{2^{r^{s}}}

for ∊ in a neighborhood of 0, so the Bounded Convergence Theorem turns (50) into

E [lim_{∊ \to 0} ∊^{- 1} (log f_{s} (∊, ∊) - log, f_{s} (0, ∊))] = E [\frac{\partial log f_{s}}{\partial ∊} (0, 0)] .

(51)

We have, by linearity of the matrix product and ∥ · ∥,

\begin{matrix} \frac{\partial log f_{s}}{\partial ∊} (0, 0) & = \frac{1^{T} X_{e_{m}} \dots X_{e_{s + 1}} \frac{\partial}{\partial ∊} X_{e_{s}}^{(∊)} U_{s - 1}}{∥ X_{e_{m}} \dots X_{e_{s}} U_{s - 1} ∥} - \frac{1^{T} X_{e_{m - 1}} \dots X_{e_{s + 1}} \frac{\partial}{\partial ∊} X_{e_{s}}^{(∊)} U_{s - 1}}{∥ X_{e_{m - 1}} \dots X_{e_{s}} U_{s - 1} ∥} \\ = {\begin{matrix} (\frac{1^{T} X_{e_{m}} \dots X_{e_{s + 1}} X_{e_{s}}^{'} U_{s - 1}}{∥ X_{e_{m}} \dots X_{e_{s}} U_{s - 1} ∥} - \frac{1^{T} X_{e_{m - 1}} \dots X_{e_{s + 1}} X_{e_{s}}^{'} U_{s - 1}}{∥ X_{e_{m - 1}} \dots X_{e_{s}} U_{s - 1} ∥}) & for 1 \leq s \leq m - 1, \\ \frac{1 X_{e_{s}}^{'} U_{m - 1}}{∥ X_{e_{m}} U_{m - 1} ∥} for s = m \end{matrix} \end{matrix}

Combining this with (49) yields the telescoping sum

\begin{matrix} a' (0) = & lim_{m \to \infty} (\sum_{t = 1}^{m} E [\frac{1^{T} X_{e_{t}} \dots X_{e_{1}} X_{e_{0}}^{'} U}{1^{T} X_{e_{t}} \dots X_{e_{1}} X_{e_{0}} U}]) - \sum_{t = 1}^{m - 1} E [\frac{1^{T} X_{e_{t}} \dots X_{e_{1}} X_{e_{0}}^{'} U}{1^{T} X_{e_{t}} \dots X_{e_{1}} X_{e_{0}} U}] \\ = lim_{m \to \infty} E [\frac{1^{T} X_{e_{m}} \dots X_{e_{1}} X_{e_{0}}^{'} U}{1^{T} X_{e_{m}} \dots X_{e_{1}} X_{e_{0}} U}], \end{matrix}

where in the last line (U, e₀) has the distribution π. De ne $V_{m}^{T} : = 1^{T} X_{e_{m}} \dots X_{e_{1}} ∕ ∥ 1^{T} X_{e_{m}} \dots_{e_{1}} ∥$ . Then $V_{m}^{T}$ converges in distribution to V, with distribution ${\tilde{π}}_{e_{0}}$ (conditioned on e₀), and so

a' (0) = lim_{m \to \infty} E \frac{V_{m}^{T} X_{e_{0}}^{'} U}{V_{m}^{T} X_{e_{0}} U} = \sum_{e \in M} ν_{e} E \frac{V_{e}^{T} X_{e}^{'} U_{e}}{V_{e}^{T} X_{e} U_{e}},

which is identical to (27).

Now we estimate the error. We use the representation

U ≔ \frac{X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{m}} U_{0}}{∥ X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{m}} U_{0} ∥}

V^{T} ≔ \frac{V_{0}^{T} X_{e_{m}} \dots X_{e_{1}}}{∥ V_{0}^{T} X_{e_{m}} \dots X_{e_{1}} ∥},

where U₀ and $V_{0}^{T}$ are assumed to have distributions $π_{{\tilde{e}}_{m}}$ and ${\tilde{π}}_{e_{m}}$ respectively. Then

\begin{matrix} ∣ \frac{V^{(m) T} X_{e}^{'} U^{(m)}}{V^{(m) T} X_{e} U^{(m)}} - \frac{V^{T} X_{e}^{'}}{V^{T} X_{e} U} ∣ & \leq 2 (e^{2 ρ (V, V^{(m) T}) + 2 ρ (U, U^{(m)})} - 1) \frac{∣ V X_{e}^{'} U ∣}{∣ V X_{e} U ∣} \\ \leq 2 (e^{4 k_{2^{r^{m}}}} - 1) \frac{∣ V X_{e}^{'} U ∣}{V X_{e} U}, \end{matrix}

(52)

by (13). This implies the uniform bound on systematic error, and the bound on sampling error (30) follows from applying (22) to a trivial bound on the terms in the average. The simulated bound (31) also follows directly from (52).

Lemma A.1 Let A(m, ∊) be a two-dimensional array of real numbers, indexed by m ∈ $N$ and ∊ > 0, with A(m, 0) := lim_∊↓0 A(m, ∊) existing for each m, and A(∞, ∊) := lim_m→∞ A(m, ∊) existing for all ∊ sufficiently small (independent of m). Suppose A satisfies

lim_{M \to \infty} \underset{∊ ↓ 0}{lim sup} sup_{m, n > M} ∣ A (m, ∊) - A (n, ∊) ∣ = 0 .

(53)

Then the two limits

lim_{m \to \infty} lim_{∊ ↓ 0} A (m, ∊)

and

lim_{∊ ↓ 0} lim_{m \to \infty} A (m, ∊)

are equal; in particular, if one exists the other exists as well.

Proof. Suppose A_* := lim_∊↓0A(∞, ∊) exists. Then we need to show that A_* = lim_m→∞A(m, 0).

Choose any δ > 0, and choose M such that

\underset{∊ ↓ 0}{lim sup} sup_{m, n > M} ∣ A (m, ∊) - A (n, ∊) ∣ < δ .

The converse result (starting from the assumption that lim_m→∞A(m, 0) exists) follows identically.

A.4 Estimating sensitivities: Markov environments

We derive here the formula (40) by a combination of the coupling method and importance sampling. We use importance sampling for the actual computation, but coupling provides a more direct path to validating the crucial exchange of limits. As in the proof of Theorem 5.2, the error bounds are an obvious consequence of the formula (40) and the general formulas for errors described in section 4.

Given two distributions q and q* on {1, …,M}, we define a standard coupling between q and q*. Suppose we are given a uniform random variable ω on [0, 1]. Let $M_{-} ≔ {e : q_{e} < q_{e}^{*}}$ and $M_{+} ≔ {e : q_{e}^{*} < q_{e}}$ . Let $δ ≔ Σ_{e \in M_{-}} (q_{e}^{*} - q_{e}) = Σ_{e \in M_{+}} (q_{e} - q_{e}^{*})$ . We define three random variables $\overset{˘}{e}$ on $M$ , ${\overset{˘}{e}}_{+}$ on $M_{+}$ , and ${\overset{˘}{e}}_{-}$ on $M_{-}$ , according to the following distributions:

P {\overset{˘}{e} = e} = min {q_{e}, q_{e}^{*}} ∕ (1 - δ),

P {{\overset{˘}{e}}_{+} = e} = {q_{e} - q_{e}^{*}}_{+} ∕ δ,

P {{\overset{˘}{e}}_{-} = e} = {q_{e}^{*} - q_{e}}_{+} ∕ δ,

The joint distribution is irrelevant, but for definiteness we let them be independent. Then we define the coupled pair (e, e*) to have the values

\begin{matrix} (\overset{˘}{e}, \overset{˘}{e}) if & ω > δ . \\ ({\overset{˘}{e}}_{+}, {\overset{˘}{e}}_{-}) if & ω \leq δ . \end{matrix}

(54)

Then e has distribution q, e* has distribution q*, and e = e* with probability 1 − δ. This δ is called the total-variation distance between q and q*.

We write $E_{P}$ for the expectation with respect to the distribution that makes e₀, …, e_m a stationary Markov chain with transition matrix P. Define ν^(∊) to be the stationary distribution corresponding to P^(∊), and define ${\tilde{P}}^{(∊)}$ to be the time-reversed chain of P^(∊). We define

g (m; ∊; u) ≔ E_{P (∊)} [\log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] - E_{P} [\log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] .

By the time-reversal property,

g (m; ∊; u) ≔ E_{\tilde{P} (∊)} [\log \frac{‖ X_{e_{0}} X_{e_{1}} \dots X_{e_{m}} u ‖}{‖ X_{e_{1}} \dots X_{e_{m}} u ‖}] - E_{\tilde{P}} [\log \frac{‖ X_{e_{0}} X_{e_{1}} \dots X_{e_{m}} u ‖}{‖ X_{e_{1}} \dots X_{e_{m}} u ‖}] .

For ∊ > 0 we couple a sequence e₀, …, e_m selected from the distribution $\tilde{P}$ to a sequence $e_{0}^{(∊)}, \dots, e_{m}^{(∊)}$ selected from the distribution ${\tilde{P}}^{(∊)}$ as follows: We start by choosing $(e_{0}, e_{0}^{(∊)})$ according to the standard coupling of (ν, ν^(∊). Assume now that we have produced sequences of length i, ending in e_i−1 and $e_{i - 1}^{(∊)}$ . We then produce $(e_{i}, e_{i}^{(∊)})$ according to the standard coupling of row e_i−1 of $\tilde{P}$ to row $e_{i - 1}^{(∊)}$ of ${\tilde{P}}^{(∊)}$ . (To simplify the typography in some places we use e(i) and e^(∊)(i) interchangeably with e_i and $e_{i}^{(∊)}$ .)

Let δ = δ(∊) be the maximum of the total variation distance between ν and ν^(∊), and all of the pairs of rows. It is easy to see that there is a constant c such that δ ≤ c∊ for ∊ sufficiently small. Define ω₁, ω₂, … to be an i.i.d. sequence of uniform random variables on [0; 1], and two sequences of random times as follows: T₀ := S₀ := −1, and

T_{i + 1} = \min {t > S_{i} : ω_{t} \leq δ},

S_{i + 1} = \min {t > T_{i + 1} : e_{t}^{(∊)} = e_{t}} .

Thus, $e_{t}^{(∊)} = e_{t}$ for all S_i ≤ t < T_i+1. Define for any $u_{0} \in U$ the random vector

U_{t} ≔ \lim_{m \to \infty} \frac{X_{e (t)} \dots X_{e (t + m)} u_{0}}{‖ X_{e (t)} \dots X_{e (t + m)} u_{0} ‖},

and define a version of g conditioned on T₁ and T₂

g (m; ∊; u; T_{1}, T_{2}) ≔ E_{\tilde{P} (∊)} [\log \frac{‖ X_{e_{0}} X_{e_{1}} \dots X_{e_{m}} u ‖}{‖ X_{e_{1}} \dots X_{e_{m}} u ‖} ∣ T_{1}, T_{2}] - E_{\tilde{P}} [\log \frac{‖ X_{e_{0}} X_{e_{1}} \dots X_{e_{m}} u ‖}{‖ X_{e_{1}} \dots X_{e_{m}} u ‖} ∣ T_{1}, T_{2}] .

Then for any $u \in U$ ,

\nabla_{W} a (P) = a' (0) = \lim_{∊ ↓ 0} \lim_{m \to \infty} ∊^{- 1} E [g (m; ∊; u; T_{1}, T_{2})] .

(55)

We also define

γ (∊; T_{1}, T_{2}) ≔ E [\log \frac{‖ X_{e^{(∊)} (0)} \dots X_{e^{(∊)} (S_{1} - 1)} U_{S_{1}} ‖}{‖ X_{e^{(∊)} (1)} \dots X_{e^{(∊)} (S_{1} - 1)} U_{S_{1}} ‖} - \log \frac{‖ X_{e (0)} X_{e (1)} \dots X_{e (S_{1} - 1)} U_{S_{1}} ‖}{‖ X_{e (1)} \dots X_{e (S_{1} - 1)} U_{S_{1}} ‖} ∣ T_{1}, T_{2}]

We break up these expectations into their portion overlapping three different events:

(i)
{T₁ > m};
(ii)
{T₂ > m ≥ T₁};
(iii)
{m ≥ T₂}.

On the event {T₁ > m} we have g(m, ∊ u; T₁, T₂) = 0, and T₁ − m is geometrically distributed with parameter δ. By (13), γ is bounded by k₂r^T₁−1.

On the event {T₂ > m ≥ T₁}: We have e^(∊)(i) = e(i) for i < T₁ and for S₁ ≤ i ≤ m. If S₁ ≤ m,

\begin{matrix} U_{S_{1}} & = X_{e (S_{1})} \dots X_{e (m)} U_{m + 1} ∕ ‖ X_{e (S_{1})} \dots X_{e (m)} U_{m + 1} ‖ \\ = X_{e_{(S_{1})}^{(∊)}} \dots X_{e_{(m)}^{(∊)}} U_{m + 1} ∕ ‖ X_{e_{(S_{1})}^{(∊)}} \dots X_{e_{(m)}^{(∊)}} U_{m + 1} ‖ \end{matrix}

Thus we may write where

\begin{matrix} ∣ γ (∊; T_{1}, T_{2}) - g (m, ∊; u; T_{1}, T_{2}) ∣ & \leq ∣ E [\log \frac{‖ X_{e (0)} \dots X_{e (T_{1} - 1)} X_{e^{(∊)} (T_{1})} \dots X_{e^{(∊)} (m)} {\overset{˘}{U}}^{(∊)} ‖}{‖ X_{e (1)} \dots X_{e (T_{1} - 1)} X_{e^{(∊)} (T_{1})} \dots X_{e^{(∊)} (m)} U^{(∊)} ‖} ∣ T_{1}, T_{2}] - E [\log \frac{‖ X_{e (0)} \dots X_{e (T_{1} - 1)} X_{e^{(∊)} (T_{1})} \dots X_{e^{(∊)} (m)} u ‖}{‖ X_{e (1)} \dots X_{e (T_{1} - 1)} X_{e^{(∊)} (T_{1})} \dots X_{e^{(∊)} (m)} u ‖} ∣ T_{1}, T_{2}] ∣ + ∣ E [\log \frac{‖ X_{e (0)} X_{e (1)} \dots X_{e (m)} \overset{˘}{U} ‖}{‖ X_{e (1)} \dots X_{e (m)} \overset{˘}{U} ‖} - \log \frac{‖ X_{e (0)} \dots X_{e (T_{1} - 1)} X_{e (T_{1})} \dots X_{e (m)} u ‖}{‖ X_{e (1)} \dots X_{e (T_{1} - 1)} X_{e (T_{1})} \dots X_{e (m)} u ‖} ∣ T_{1}, T_{2}] ∣ \\ \leq 2 k_{2} r^{m}, \end{matrix}

where ${\overset{˘}{U}}^{(∊)} = \overset{˘}{U} = U_{m + 1}$ if S₁ ≤ m; otherwise

{\overset{˘}{U}}^{(∊)} = \frac{X_{e^{(∊)} (m + 1)} \dots X_{e^{(∊)} (S_{1} - 1)} U_{S_{1}}}{‖ X_{e^{(∊)} (m + 1)} \dots X_{e^{(∊)} (S_{1} - 1)} U_{S_{1}} ‖};

\overset{˘}{U} = \frac{X_{(m + 1)} \dots X_{(S_{1} - 1)} U_{S_{1}}}{‖ X_{(m + 1)} \dots X_{(S_{1} - 1)} U_{S_{1}} ‖} .

On the event {T₂ ≤ m}: The above approach shows that

∣ γ (∊; T_{1}, T_{2}) - g (m, ∊; u; T_{1}, T_{2}) ∣ \leq 2 k_{2} r^{T_{2} - 1} .

(57)

Combining these bounds, we obtain

∣ γ (∊; T_{1}, T_{2}) - g (m, ∊; u; T_{1}, T_{2}) ∣ \leq k_{2} r^{T_{1} - 1} 1_{{T_{1} > m}} + 2 k_{2} r^{m} 1_{{T_{1} \leq m}} + 2 k_{2} r^{T_{2} - 1} .

(58)

Taking the expectation with respect to the distribution of T₁ and T₂, using the fact that T₁ and T₂ − S₁ are independent with distribution geometric with parameter δ, we obtain

E [∣ γ (∊; T_{1}, T_{2}) - g (m, ∊; u; T_{1}, T_{2}) ∣] \leq \frac{k_{2}}{1 - r} r^{m - 1} δ + 2 k_{2} δ m r^{m} + \frac{2 k_{2}}{r^{2} {(1 - r)}^{2}} δ^{2} .

(59)

Since δ is bounded by a constant times |∊|, we may find a constant C such that (by the triangle inequality) for all ∊, positive integers m, and $u \in U$ ,

∣ E [γ (∊; T_{1}, T_{2})] - E [g (m, ∊; u; T_{1}, T_{2})] ∣ \leq E [∣ γ (∊; T_{1}, T_{2}) - g (m, ∊; u; T_{1}, T_{2}) ∣] \leq C (m r^{m} ∣ ∊ ∣ + ∊^{2}) .

(60)

This bound allows us to exchange the limits in (55):

\begin{matrix} a' (0) & = \lim_{∊ \to 0} \lim_{m \to \infty} ∊^{- 1} E [g (m, ∊; u; T_{1}, T_{2})] \\ = \lim_{∊ \to 0} ∊^{- 1} E [γ (∊; T_{1}, T_{2})] \\ = \lim_{m \to \infty} \lim_{∊ \to 0} ∊^{- 1} E [g (m, ∊; u; T_{1}, T_{2})] \\ = \lim_{m \to \infty} \frac{d}{d ∊} ∣_{∊ = 0} E_{P (∊)} [\log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] \end{matrix}

(61)

Now we apply the method of importance sampling. We may assume without loss of generality that W (e, e′) = 0 whenever P (e, e′) = 0 (using the analyticity of a, and the fact that the formula (40) is nonsingular on the nonnegative orthant). For any function $Z : M^{m + 1} \to R$ ,

E_{P (∊)} [Z (e_{0}, \dots, e_{m})] = E_{P} [Z (e_{0}, \dots, e_{m}) F (∊; e_{0}, \dots, e_{m})],

where F is the Radon-Nikodym derivative

\begin{matrix} F (∊; e_{0}, \dots, e_{m}) & = \frac{d P^{(∊)}}{d P} (e_{0}, \dots, e_{m}) \\ = \frac{ν_{e_{0}}^{(∊)}}{ν_{e_{0}}} \prod_{i = 0}^{m - 1} \frac{P^{(∊)} (e_{i}, e_{i + 1})}{P (e_{i}, e_{i + 1})} . \end{matrix}

This allows us to rewrite

a' (0) = \lim_{m \to \infty} \frac{d}{d ∊} ∣ {}_{∊ = 0} E_{P} [\frac{ν_{e_{0}}^{(∊)}}{ν_{e_{0}}} \prod_{i = 0}^{m - 1} \frac{P^{(∊)} (e_{i}, e_{i + 1})}{P (e_{i}, e_{i + 1})} \log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}]

(62)

For any fixed m, there is an upper bound on ∊⁻¹(F(∊ e₀, …, e_m) −1), so we may move the differentiation inside the expectation, to obtain

\begin{matrix} a' (0) & = \lim_{m \to \infty} E_{P} [\frac{d}{d ∊} ∣_{∊ = 0} \frac{ν_{e_{0}}^{(∊)}}{ν_{e 0}} \prod_{i = 0}^{m - 1} \frac{P^{(∊)} (e_{i}, e_{i + 1})}{P (e_{i}, e_{i + 1})} \log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] \\ = \lim_{m \to \infty} E_{P} [{(ν_{e_{0}})}^{- 1} \frac{d ν_{e_{0}}^{(∊)}}{d ∊} ∣_{∊ = 0} \log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] + \lim_{m \to \infty} \sum_{i = 0}^{m - 1} E_{P} [\frac{W (e_{i}, e_{i + 1})}{P (e_{i}, e_{i + 1})} \log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖}] \end{matrix}

(63)

The first limit is 0. To see this, rewrite it as a sum over possible values of e₀:

\lim_{m \to \infty} \sum_{e \in M} ν_{e} E_{P_{e}} [{(ν_{e})}^{- 1} \frac{d ν_{e}^{(∊)}}{d ∊} ∣_{∊ = 0} \log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e} u ‖}{‖ X_{e_{m - 1}} \dots X_{e} u ‖}] = \lim_{m \to \infty} \sum_{e \in M} \frac{d ν_{e}^{(∊)}}{d ∊} E_{P_{e}} [\log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{1}} X_{e} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{1}} X_{e} u ‖}]

Since ν^(∊) is a probability distribution, it must be that $\sum_{\tilde{e} = 1}^{M} \frac{d ν_{\tilde{e}}^{(∊)}}{d ∊} = 0$ . Thus, the expression in the limit becomes 0 if we replace the expectation by a constant, independent of e. By Lemma A.2 it follows that the limit is 0.

To compute the other limit, we sum over all possible pairs (e_i, e_i+1) = $(\tilde{e}, e)$ . The summand becomes

\sum_{\tilde{e}, e \in M} ν (\tilde{e}) W (\tilde{e}, e) E [\log \frac{‖ X_{e_{m}} X_{e_{m - 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m - 1}} \dots X_{e_{0}} u ‖} ∣ e_{i} = \tilde{e}, e_{i + 1} = e]

(64)

In order to analyze this, we need to consider the distribution of e₀, …, e_m, conditioned on $e_{i} = \tilde{e}$ and e_i+1 = e. By the Markov property, this splits into two independent Markov chains: e = e_i+1, …, e_m is a Markov chain of length m − i, with transition probabilities P and starting point e, while $\tilde{e} = e_{i}$ , e_i−1, …, e₀ is a Markov chain of length i + 1 with transition probabilities $\tilde{P}$ and starting point $\tilde{e}$ . Define two independent infinite sequences ${\tilde{e}}_{0}$ , ${\tilde{e}}_{1}$ , … and e₀e₁, …, which are Markov chains with transitions $\tilde{P}$ and P respectively, beginning in ${\tilde{e}}_{0} = \tilde{e}$ and e_i+1 = e. Define for $i \geq 1, {\overset{ˇ}{U}}_{i} (\tilde{e}) ≔ X_{\tilde{e}} X_{{\tilde{e}}_{1}} \dots X_{{\tilde{e}}_{i}} u$ with ${\overset{ˇ}{U}}_{0} (\tilde{e}) ≔ 1$ , and ${\overset{ˇ}{V}}_{i}^{T} (e) ≔ 1^{T} X_{e_{i - 1}} \dots X_{e_{1}} X_{e}$ with ${\overset{ˇ}{V}}_{0}^{T} (e) ≔ 1^{T}$ . Also define

U_{i} (e) ≔ \frac{{\overset{ˇ}{U}}_{i} (e)}{‖ {\overset{ˇ}{U}}_{i} (e) ‖}, V_{i}^{T} (e) ≔ \frac{{\overset{ˇ}{V}}_{i}^{T} (e)}{‖ {\overset{ˇ}{V}}_{i}^{T} (e) ‖} .

Since ∥u∥ = 1^Tu for any nonnegative column vector u, the expression (63) becomes

\begin{matrix} a' (0) & = \sum_{\tilde{e}, e \in M} ν (\tilde{e}) W (\tilde{e}, e) \lim_{m \to \infty} {\sum_{i = 0}^{m - 1} (E [\log {\overset{ˇ}{V}}_{i + 1}^{T} (e) {\overset{ˇ}{U}}_{m - 1} (\tilde{e})] - E [\log {\overset{ˇ}{V}}_{i}^{T} (e) {\overset{ˇ}{U}}_{m - 1} (\tilde{e})])} \\ = \sum_{\tilde{e}, e \in M} ν (\tilde{e}) W (\tilde{e}, e) \lim_{m \to \infty} {\sum_{i = 0}^{m - 1} (E [\log ‖ {\overset{ˇ}{V}}_{i + 1}^{T} (e) ‖] - E [\log ‖ {\overset{ˇ}{V}}_{i}^{T} (e) ‖] + E [\log V_{i + 1}^{T} (e) U_{i} (\tilde{e})] - E [\log V_{i}^{T} (e) U_{i} (\tilde{e})])} \\ = \sum_{\tilde{e} \in M} ν (\tilde{e}) \lim_{m \to \infty} \sum_{e \in M} W (\tilde{e}, e) E [\log ‖ {\overset{ˇ}{V}}_{m}^{T} (e) ‖] + \sum_{e, \tilde{e} \in M} ν (\tilde{e}) W (\tilde{e}, e) \lim_{m \to \infty} \sum_{i = 0}^{m - 1} E [\log \frac{V_{i + 1}^{T} (e) U_{m - 1} (\tilde{e})}{V_{i}^{T} (e) U_{m - 1} (\tilde{e})}] \end{matrix}

(65)

In the last line we have used the fact that $Σ_{e \in M} W (\tilde{e}, e) = 0$ , which means that $Σ_{e \in M} W (\tilde{e}, e) E [log ∥ {\overset{ˇ}{V}}_{0}^{T} (e) ∥] = 0$ as well, since ${\overset{ˇ}{V}}_{0}^{T} (e) = 1^{T}$ is independent of e. The same reasoning implies that if we define ${\overset{ˇ}{V}}_{i}^{T} (ν)$ to be the version of ${\overset{ˇ}{V}}_{i}^{T}$ started in the stationary distribution — for instance, starting from realizations of ${\overset{ˇ}{V}}_{i}^{T} (e)$ , define ${\overset{ˇ}{V}}_{i}^{T} (ν)$ to be equal to ${\overset{ˇ}{V}}_{i}^{T} (e)$ with probability ν_e — then $Σ_{e \in M} W (\tilde{e}, e) E [log ∥ {\overset{ˇ}{V}}_{m}^{T} (ν) ∥] = 0$ . The first term on the right-hand side of (65) may then be written as

\sum_{\tilde{e}, e \in M} ν (\tilde{e}) W (\tilde{e}, e) \lim_{m \to \infty} E [\log \frac{‖ {\overset{ˇ}{V}}_{m}^{T} (e) ‖}{‖ {\overset{ˇ}{V}}_{m}^{T} (ν) ‖}] = \sum_{\tilde{e}, e \in M} ν (\tilde{e}) W (\tilde{e}, e) ζ_{e} .

(66)

To compute the second term, we note that $U_{\infty} (\tilde{e}) ≔ {lim}_{i \to \infty} U_{i} (\tilde{e})$ exists, with distribution $π_{\tilde{e}}$ , and $ρ (U_{i} (\tilde{e}), U_{\infty} (\tilde{e})) \leq k_{2} r^{i}$ ; similarly, $V_{\infty}^{T} (e) ≔ {lim}_{i \to \infty} V_{i}^{T} (e)$ exists, with distribution ${\tilde{π}}_{e}$ , and $ρ (V_{i} (e), V_{i + 1} (e)) \leq k_{2} r^{i}$ . Thus

∣ \log V_{i + 1}^{T} (e) U_{m - 1} (\tilde{e}) - \log V_{i}^{T} (e) U_{m - 1} (\tilde{e}) ∣ \leq ρ (V_{i} (e), V_{i + 1} (e)) \leq k_{2} r^{i},

We break up the sum on the right-hand side of (65) into three pieces:

\sum_{0 \leq i \leq m - 1} E [\log V_{i + 1}^{T} (e) U_{m - 1} (\tilde{e}) - \log V_{i}^{T} (e) U_{m - 1} (\tilde{e})] = \sum_{0 \leq i \leq m ∕ 2} E [\log V_{i + 1}^{T} (e) U_{\infty} (\tilde{e}) - \log V_{i}^{T} (e) U_{\infty} (\tilde{e})] + \sum_{0 \leq i \leq m ∕ 2} E [\log \frac{V_{i + 1}^{T} (e) U_{m - 1} (\tilde{e})}{V_{i + 1}^{T} (e) U_{\infty} (\tilde{e})} - \log \frac{V_{i}^{T} (e) U_{m - 1} (\tilde{e})}{V_{i}^{T} (e) U_{\infty} (\tilde{e})}] + \sum_{m ∕ 2 \leq i \leq m - 1} E [\log V_{i + 1}^{T} (e) U_{m - i} (\tilde{e}) - \log V_{i}^{T} (e) U_{m - i} (\tilde{e})] .

The first sum telescopes to

E [\log V_{1 + m ∕ 2}^{T} (e) U (\tilde{e}) - \log V_{0}^{T} (e) U (\tilde{e})] = E [\log V_{1 + m ∕ 2}^{T} (e) U (\tilde{e})] - \log K,

applying the fact that $V_{0}^{T} = 1^{T} ∕ K$ , so that $V_{0}^{T} (e) U (\tilde{e}) = ∥ U (\tilde{e}) ∥ ∕ K - 1 ∕ K$ . Applying (9), the second and third sums are bounded by

\sum_{0 \leq i \leq m ∕ 2} 2 k_{2} r^{m - 1} + \sum_{m ∕ 2 < i \leq m - 1} k_{2} r^{i} \leq \frac{3 k_{2} r^{m ∕ 2}}{1 - r} .

Thus

\lim_{m \to \infty} \sum_{i = 0}^{m - 1} E [\log \frac{V_{i + 1}^{T} (e) U_{m - 1} (\tilde{e})}{V_{i}^{T} (e) U_{m - i} (\tilde{e})}] = E [V^{T} (e) U (\tilde{e})],

(67)

completing the proof of Theorem 7.1.

Lemma A.2. For any u, $\overset{˘}{u} \in U$ and e, $e^{*} \in M$ , if we let e₀, e₁, … and $e_{0}^{*}, e_{1}^{*}$ , … be realisations of the Markov chain P starting at e₀ = e and $e_{0}^{*} = e^{*}$ respectively. Then

∣ E [\log \frac{‖ X_{e_{m + 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m}} \dots X_{e_{0}} u ‖}] - E [\log \frac{‖ X_{e_{m + 1}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖}{‖ X_{e_{m}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖}] ∣ \leq \frac{k_{2} D}{1 - r} (m + 1) {(ξ \lor r)}^{m},

(68)

where ξ and D are the constants that satisfy (38).

Proof. Using the maximal coupling, we create coupled versions of (e_i, $e_{i}^{*}$ ), such that the coupling time τ satisfies

P {τ \geq t} \leq ‖ ν - P^{t} (e, \cdot) ‖ + ‖ ν - P^{t} (e *, \cdot) ‖ \leq 2 D ξ^{t} .

Define

u_{τ} ≔ \frac{X_{e_{τ - 1}} \dots X_{e_{0}} u}{‖ X_{e_{τ - 1}} \dots X_{e_{0}} u ‖}, {\overset{˘}{u}}_{τ} ≔ \frac{X_{e_{τ - 1}^{*}} \dots X_{e_{0}^{*}}}{‖ X_{e_{τ - 1}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖} .

Then by the bound (13),

\begin{matrix} ∣ E [\log \frac{‖ X_{e_{m + 1}} \dots X_{e_{0}} u ‖}{‖ X_{e_{m}} \dots X_{e_{0}} u ‖}] - E [\log \frac{‖ X_{e_{m + 1}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖}{‖ X_{e_{m}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖}] ∣ & \leq E [∣ \log \frac{‖ X_{e_{m + 1}} \dots X_{e_{0}} u}{X_{e_{m}} \dots X_{e_{0}} u ‖} - \log \frac{‖ X_{e_{m + 1}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖}{‖ X_{e_{m}^{*}} \dots X_{e_{0}^{*}} \overset{˘}{u} ‖} ∣] \\ = E [∣ \log \frac{‖ X_{e_{m + 1}} \dots X_{e_{τ}} u_{τ} ‖}{‖ X_{e_{m}} \dots X_{e_{0}} u ‖} - \log \frac{‖ X_{e_{m + 1}} \dots X_{e_{τ}} \overset{˘}{u} ‖}{‖ X_{e_{m}} \dots X_{e_{τ}} {\overset{˘}{u}}_{τ} ‖} ∣] \\ \leq E [k_{2} r^{m - τ}] \\ \leq \frac{k_{2} D}{1 - r} \sum_{t = 0}^{m} r^{m - t} ξ^{t} \\ \leq \frac{k_{2} D}{1 - r} (m + 1) {(ξ \lor r)}^{m} . \end{matrix}

A.5 A version of the Furstenberg-Kesten Theorem

Theorem A.3 (Furstenberg-Kesten) The limit defined in (2) exists almost surely, and is deterministic, given by (46).

Proof. Let e₁, e₂,… be a stationary Markov chain with transition matrix P. If we choose $N_{0} \in U$ , this induces a Markov chain Y_t := (N_t–1/∥N_t–1∥, e_t) with state space $U \times M$ . We extend the Hilbert metric to $U \times M$ by ρ((u, e), (u_*, e_*)) = ρ(u, u_*) + 1_{{e≠e_*}}.

If we define the function F : $U \times M \to R$ by

F (u, m) ≔ \log ‖ X_{m} u ‖,

then

\begin{matrix} \log ‖ N_{t} ‖ & = \sum_{i = 1}^{t} \log \frac{‖ N_{i} ‖}{‖ N_{i - 1} ‖} \\ = \sum_{i = 1}^{t} \log ‖ X_{e_{i}} \frac{N_{i - 1}}{‖ N_{i - 1} ‖} ‖ \\ = \sum_{i = 1}^{t} F (Y_{i}) . \end{matrix}

If (Y_i) were a chain on a finite space, we would invoke the law of large numbers for Markov chains, also known as the pathwise Individual Ergodic Theorem (cf. Theorem 1.10.2 of Norris (1998)) to see that

\lim_{t \to \infty} t^{- 1} \log ‖ N_{t} ‖ = \lim_{t \to \infty} t^{- 1} \sum_{i = 1}^{t} F (Y_{i}) exists and equals π [F] ≔ \sum π_{y} F (y),

where π is the unique stationary distribution. The same would be true for a positive recurrent Markov chain on a countable state space.

On a more general state space, the pathwise ergodic theorem still holds as long as the Markov chain is uniquely ergodic; that is, it has a unique stationary distribution π (see Theorem 6.1 of Hernández-Lerma and Lasserre (1998), or Chapter 6 of Walters (1982)), and π is ergodic.

We first show the existence of π. Choose $u_{0} \in U$ , and let …, e₋₁, e₀, e₁, e₂, … be a stationary Markov chain on $M$ with transitions P, infinite in both directions (as in section 3.2). Then

ρ (U_{- 1, - n}, U_{- 1, - n - 1}) = ρ (X_{e_{- 1}} \dots X_{e_{- n}} u_{0}, X_{e_{- 1}} \dots X_{e_{- n}} X_{e_{- n - 1}} u_{0}) \leq k_{2} r^{n},

implying that ${(U_{- 1, - n})}_{n = 1}^{\infty}$ is always a Cauchy sequence, hence converges to a random variable $U_{- 1, - \infty} \in U$ . Define π to be the distribution of the pair (U_−1,−,∞, e₀).

Now, conditioned on starting at Y₀ = (U_−1,−∞, e₀), the next step Y₁ is (X_e₀U_−1,−∞/∥X_e₀U_−1,−∞∥ e₁). Notice that

\frac{X_{e_{0}} U_{- 1, - \infty}}{‖ X_{e_{0}} U_{- 1, - \infty} ‖} = U_{0, - \infty}

has the same distribution as U_−1,−∞, so that π is a stationary distribution for (Y_t).

It remains to show that π is uniquely ergodic. Suppose that we start the chain in an alternative distribution μ. Define Y₀ = (U, e₁) to have distribution π and $Y_{0}^{*} = (U^{*}, e_{1}^{*})$ to have distribution μ. Let e₁, e₂, … and $e_{1}^{*}, e_{2}^{*}$ … be coupled versions of the environment chain, started in e₁ and $e_{1}^{*}$ respectively. For definiteness, we say that they are independent up to the first time τ when $e_{τ} = e_{τ}^{*}$ — and then $e_{τ + i} = e_{τ + i}^{*}$ for i ≥ 0. Then

Y_{m} ≔ (\frac{X_{e_{m}} \dots X_{e_{1}} U}{‖ X_{e_{m}} \dots X_{e_{1}} U ‖}, e_{m + 1})

Y_{m}^{*} ≔ (\frac{X_{e_{m}^{*}} \dots X_{e_{1}^{*}} U}{‖ X_{e_{m}^{*}} \dots X_{e_{1}^{*}} U * ‖}, e_{m + 1}^{*})

are both realisations of the Markov chain Y, with initial distribution of π and μ respectively. We write μ_m for the distribution of $Y_{m}^{*}$ . We see that

ρ (Y_{m}, Y_{m}^{*}) \leq k_{2} r^{m - τ} 1_{r > m} \to_{P} 0 as m \to \infty .

Thus, μ_m → π weakly. By Theorem 6.12 of Walters (1982), it follows that π is ergodic (in fact, strong mixing). Furthermore, if μ were an invariant distribution then μ_m = μ, so the convergence implies μ = π. Therefore, π is uniquely ergodic.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

Asmussen S, Glynn P. Stochastic simulation: Algorithms and analysis. Springer Verlag; 2007. [Google Scholar]
Barnsley MF, Elton JH. A new class of Markov processes for image encoding. Annals of Applied Probability. 1988;20:14–32. [Google Scholar]
Boyce M, Haridas C, Lee C. Demography in an increasingly variable world. Trends in Ecology & Evolution. 2006;21(3):141–148. doi: 10.1016/j.tree.2005.11.018. [DOI] [PubMed] [Google Scholar]
Brémaud P. Maximal coupling and rare perturbation sensitivity analysis. Queueing Systems. 1992;11:307–33. [Google Scholar]
Bushell PJ. Hilbert's metric and positive contraction mappings in a Banach space. Archive for Rational Mechanics and Analysis. 1973;52(4):330–338. [Google Scholar]
Caswell H. Matrix population models: construction, analysis and interpretation. 2nd edition Sinauer Associates inc. Publishers; Sunderland, Mass.: 2001. [Google Scholar]
Cohen J. Ergodicity of age structure in populations with Markovian vital rates. II. General states. Advances in Applied Probability. 1977:18–37. [Google Scholar]
Ellner S, Rees M. Stochastic stable population growth in integral projection models: theory and application. Journal of Mathematical Biology. 2007;54(2):227–256. doi: 10.1007/s00285-006-0044-8. [DOI] [PubMed] [Google Scholar]
Elton JH. A multiplicative ergodic theorem for Lipschitz maps. Stochastic Processes Appl. 1990;34(1):39–47. [Google Scholar]
Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960a;31(2):457–469. [Google Scholar]
Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960b;31(2):457–69. [Google Scholar]
Golubitsky M, Keeler EB, Rothschild M. Convergence of the age structure: applications of the projective metric. Theoretical population biology. 1975;7(1):84. doi: 10.1016/0040-5809(75)90007-6. [DOI] [PubMed] [Google Scholar]
Griffeath D. A maximal coupling for Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 1975;31:95–106. [Google Scholar]
Grimmett G, Stirzaker D. Probability and random processes. Oxford University Press; USA: 2001. [Google Scholar]
Häggström O. Finite Markov Chains and Algorithmic Applications. Cambridge University Press; Cambridge: 2002. [Google Scholar]
Haridas CV, Tuljapurkar S. Elasticities in variable environments: properties and implications. Am Nat. 2005;166(4):481–95. doi: 10.1086/444444. [DOI] [PubMed] [Google Scholar]
Hernández-Lerma O, Lasserre JB. Ergodic theorems and ergodic decomposition for Markov chains. Acta Applicandae Mathematicae. 1998;54:99–119. [Google Scholar]
Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963:13–30. [Google Scholar]
Kendall WS. Notes on perfect simulation. In: F. L, J-S. W, editors. Markov Chain Monte Carlo: Innovations and Applications. 2005. pp. 93–146. World Scientific. [Google Scholar]
Lande R, Engen S, Saether B-E. Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press; Oxford: 2003. [Google Scholar]
Lange K. On cohen's stochastic generalization of the strong ergodic theorem of demography. Journal of Applied Probability. 1979:496–504. [PubMed] [Google Scholar]
Lange K, Holmes W. Stochastic stable population growth. Journal of Applied Probability. 1981;18(2):325–334. [PubMed] [Google Scholar]
Lee RD, Tuljapurkar S. Stochastic population forecasts for the united states: Beyond high, medium, and low. Journal of the American Statistical Association. 1994;89(428):1175–1189. [PubMed] [Google Scholar]
McNamara J. Optimal life histories for structured populations in fluctuating environments. Theoretical Population Biology. 1997;51(2):94–108. [Google Scholar]
Metz J, Nisbet R, Geritz S. How should we define `fitness' for general ecological scenarios? Trends in Ecology & Evolution. 1992;7(6):198–202. doi: 10.1016/0169-5347(92)90073-K. [DOI] [PubMed] [Google Scholar]
Morris W, Doak D. Quantitative conservation biology. Sinauer Associates, inc. Publishers; Sunderland, Massachusetts, USA: 2002. [Google Scholar]
Morris W, Pfister C, Tuljapurkar S, Haridas C, Boggs C, Boyce M, Bruna E, Church D, Coulson T, Doak D. Longevity can buffer plant and animal populations against changing climatic variability. Ecology. 2008;89(1):19–25. doi: 10.1890/07-0774.1. [DOI] [PubMed] [Google Scholar]
Norris J. Markov Chains. Cambridge University Press; 1998. [Google Scholar]
Peres Y. Domains of analytic continuation for the top lyapunov exponent. Annales de l'Institut Henri Poincaré. Probabilités et Statistiques. 1992;28(1):131–48. [Google Scholar]
Pitman J. On coupling of markov chains. Probability Theory and Related Fields. 1976;35(4):315–322. [Google Scholar]
Roberts GO, Rosenthal JS. General state space Markov chains and MCMC algorithms. Probab. Surv. 2004;1:20–71. electronic. [Google Scholar]
Seneta E. Non-negative matrices and Markov chains. Springer Verlag; 1981. [Google Scholar]
Steinsaltz D. Locally contractive iterated function systems. Annals of Probability. 1999:1952–1979. [Google Scholar]
Tuljapurkar S. Population dynamics in variable environments. III. Evolutionary dynamics of r-selection. Theoretical Population Biology. 1982;21(1):141–165. doi: 10.1016/0040-5809(85)90019-x. [DOI] [PubMed] [Google Scholar]
Tuljapurkar S. Lecture notes in biomathematics. Springer-Verlag; New York: 1990. Population dynamics in variable environments; p. 85. [Google Scholar]
Tuljapurkar S, Horvitz CC, Pascarella JB. The many growth rates and elasticities of populations in random environments. The American Naturalist. 2003;162(4):489–502. doi: 10.1086/378648. [DOI] [PubMed] [Google Scholar]
Tuljapurkar SD, Orzack SH. Population dynamics in variable environments. I. Long-run growth rates and extinction. Theoretical Population Biology. 1980;18:314–342. [Google Scholar]
Walters P. An Introduction to Ergodic Theory. Springer Verlag; New York, Heidelberg, Berlin: 1982. [Google Scholar]

[R1] Asmussen S, Glynn P. Stochastic simulation: Algorithms and analysis. Springer Verlag; 2007. [Google Scholar]

[R2] Barnsley MF, Elton JH. A new class of Markov processes for image encoding. Annals of Applied Probability. 1988;20:14–32. [Google Scholar]

[R3] Boyce M, Haridas C, Lee C. Demography in an increasingly variable world. Trends in Ecology & Evolution. 2006;21(3):141–148. doi: 10.1016/j.tree.2005.11.018. [DOI] [PubMed] [Google Scholar]

[R4] Brémaud P. Maximal coupling and rare perturbation sensitivity analysis. Queueing Systems. 1992;11:307–33. [Google Scholar]

[R5] Bushell PJ. Hilbert's metric and positive contraction mappings in a Banach space. Archive for Rational Mechanics and Analysis. 1973;52(4):330–338. [Google Scholar]

[R6] Caswell H. Matrix population models: construction, analysis and interpretation. 2nd edition Sinauer Associates inc. Publishers; Sunderland, Mass.: 2001. [Google Scholar]

[R7] Cohen J. Ergodicity of age structure in populations with Markovian vital rates. II. General states. Advances in Applied Probability. 1977:18–37. [Google Scholar]

[R8] Ellner S, Rees M. Stochastic stable population growth in integral projection models: theory and application. Journal of Mathematical Biology. 2007;54(2):227–256. doi: 10.1007/s00285-006-0044-8. [DOI] [PubMed] [Google Scholar]

[R9] Elton JH. A multiplicative ergodic theorem for Lipschitz maps. Stochastic Processes Appl. 1990;34(1):39–47. [Google Scholar]

[R10] Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960a;31(2):457–469. [Google Scholar]

[R11] Furstenberg H, Kesten H. Products of random matrices. The Annals of Mathematical Statistics. 1960b;31(2):457–69. [Google Scholar]

[R12] Golubitsky M, Keeler EB, Rothschild M. Convergence of the age structure: applications of the projective metric. Theoretical population biology. 1975;7(1):84. doi: 10.1016/0040-5809(75)90007-6. [DOI] [PubMed] [Google Scholar]

[R13] Griffeath D. A maximal coupling for Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 1975;31:95–106. [Google Scholar]

[R14] Grimmett G, Stirzaker D. Probability and random processes. Oxford University Press; USA: 2001. [Google Scholar]

[R15] Häggström O. Finite Markov Chains and Algorithmic Applications. Cambridge University Press; Cambridge: 2002. [Google Scholar]

[R16] Haridas CV, Tuljapurkar S. Elasticities in variable environments: properties and implications. Am Nat. 2005;166(4):481–95. doi: 10.1086/444444. [DOI] [PubMed] [Google Scholar]

[R17] Hernández-Lerma O, Lasserre JB. Ergodic theorems and ergodic decomposition for Markov chains. Acta Applicandae Mathematicae. 1998;54:99–119. [Google Scholar]

[R18] Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963:13–30. [Google Scholar]

[R19] Kendall WS. Notes on perfect simulation. In: F. L, J-S. W, editors. Markov Chain Monte Carlo: Innovations and Applications. 2005. pp. 93–146. World Scientific. [Google Scholar]

[R20] Lande R, Engen S, Saether B-E. Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press; Oxford: 2003. [Google Scholar]

[R21] Lange K. On cohen's stochastic generalization of the strong ergodic theorem of demography. Journal of Applied Probability. 1979:496–504. [PubMed] [Google Scholar]

[R22] Lange K, Holmes W. Stochastic stable population growth. Journal of Applied Probability. 1981;18(2):325–334. [PubMed] [Google Scholar]

[R23] Lee RD, Tuljapurkar S. Stochastic population forecasts for the united states: Beyond high, medium, and low. Journal of the American Statistical Association. 1994;89(428):1175–1189. [PubMed] [Google Scholar]

[R24] McNamara J. Optimal life histories for structured populations in fluctuating environments. Theoretical Population Biology. 1997;51(2):94–108. [Google Scholar]

[R25] Metz J, Nisbet R, Geritz S. How should we define `fitness' for general ecological scenarios? Trends in Ecology & Evolution. 1992;7(6):198–202. doi: 10.1016/0169-5347(92)90073-K. [DOI] [PubMed] [Google Scholar]

[R26] Morris W, Doak D. Quantitative conservation biology. Sinauer Associates, inc. Publishers; Sunderland, Massachusetts, USA: 2002. [Google Scholar]

[R27] Morris W, Pfister C, Tuljapurkar S, Haridas C, Boggs C, Boyce M, Bruna E, Church D, Coulson T, Doak D. Longevity can buffer plant and animal populations against changing climatic variability. Ecology. 2008;89(1):19–25. doi: 10.1890/07-0774.1. [DOI] [PubMed] [Google Scholar]

[R28] Norris J. Markov Chains. Cambridge University Press; 1998. [Google Scholar]

[R29] Peres Y. Domains of analytic continuation for the top lyapunov exponent. Annales de l'Institut Henri Poincaré. Probabilités et Statistiques. 1992;28(1):131–48. [Google Scholar]

[R30] Pitman J. On coupling of markov chains. Probability Theory and Related Fields. 1976;35(4):315–322. [Google Scholar]

[R31] Roberts GO, Rosenthal JS. General state space Markov chains and MCMC algorithms. Probab. Surv. 2004;1:20–71. electronic. [Google Scholar]

[R32] Seneta E. Non-negative matrices and Markov chains. Springer Verlag; 1981. [Google Scholar]

[R33] Steinsaltz D. Locally contractive iterated function systems. Annals of Probability. 1999:1952–1979. [Google Scholar]

[R34] Tuljapurkar S. Population dynamics in variable environments. III. Evolutionary dynamics of r-selection. Theoretical Population Biology. 1982;21(1):141–165. doi: 10.1016/0040-5809(85)90019-x. [DOI] [PubMed] [Google Scholar]

[R35] Tuljapurkar S. Lecture notes in biomathematics. Springer-Verlag; New York: 1990. Population dynamics in variable environments; p. 85. [Google Scholar]

[R36] Tuljapurkar S, Horvitz CC, Pascarella JB. The many growth rates and elasticities of populations in random environments. The American Naturalist. 2003;162(4):489–502. doi: 10.1086/378648. [DOI] [PubMed] [Google Scholar]

[R37] Tuljapurkar SD, Orzack SH. Population dynamics in variable environments. I. Long-run growth rates and extinction. Theoretical Population Biology. 1980;18:314–342. [Google Scholar]

[R38] Walters P. An Introduction to Ergodic Theory. Springer Verlag; New York, Heidelberg, Berlin: 1982. [Google Scholar]

PERMALINK

Derivatives of the Stochastic Growth Rate

David Steinsaltz

Shripad Tuljapurkar

Carol Horvitz

Abstract

1 Introduction

2 The Model

3 Technical background

3.1 Contraction

3.2 Time reversal and the stationary distribution

4 Errors and How to Bound Them

4.1 Systematic error

4.2 Sampling error

5 Growth Rate and Sensitivity to Projection Matrices

5.1 Computing a

5.2 Derivatives with respect to Projection Matrices

6 Environments and Coupling

7 Derivatives with respect to Environmental Transitions

8 Discussion

Highlights

9 Acknowledgements

Appendix

A.1 Proof of Lemma 3.2

A.2 Estimating the stochastic growth rate

A.3 Estimating sensitivities: Matrix entries

A.4 Estimating sensitivities: Markov environments

A.5 A version of the Furstenberg-Kesten Theorem

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Derivatives of the Stochastic Growth Rate

David Steinsaltz

Shripad Tuljapurkar

Carol Horvitz

Abstract

1 Introduction

2 The Model

3 Technical background

3.1 Contraction

3.2 Time reversal and the stationary distribution

4 Errors and How to Bound Them

4.1 Systematic error

4.2 Sampling error

5 Growth Rate and Sensitivity to Projection Matrices

5.1 Computing a

5.2 Derivatives with respect to Projection Matrices

6 Environments and Coupling

7 Derivatives with respect to Environmental Transitions

8 Discussion

Highlights

9 Acknowledgements

Appendix

A.1 Proof of Lemma 3.2

A.2 Estimating the stochastic growth rate

A.3 Estimating sensitivities: Matrix entries

A.4 Estimating sensitivities: Markov environments

A.5 A version of the Furstenberg-Kesten Theorem

Footnotes

References

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases