Abstract
Abstract—Suppose a string X₁ⁿ = (X1, …, Xn) generated by a memoryless source (Xn)n≥1 with distribution P is to be compressed with distortion no greater than D ≥ 0, using a memoryless random codebook with distribution Q. The compression performance is determined by the “generalized asymptotic equipartition property” (AEP), which states that the probability of finding a D-close match between X₁ⁿ and any given codeword Y₁ⁿ is approximately 2^{−nR(P, Q, D)}, where the rate function R(P, Q, D) can be expressed as an infimum of relative entropies. The main purpose here is to remove various restrictive assumptions on the validity of this result that have appeared in the recent literature. Necessary and sufficient conditions for the generalized AEP are provided in the general setting of abstract alphabets and unbounded distortion measures. All possible distortion levels D ≥ 0 are considered; the source (Xn)n≥1 can be stationary and ergodic; and the codebook distribution can have memory. Moreover, the behavior of the matching probability is precisely characterized, even when the generalized AEP is not valid. Natural characterizations of the rate function R(P, Q, D) are established under equally general conditions.
Index Terms: Asymptotic equipartition property, data compression, large deviations, pattern-matching, random codebooks, rate-distortion theory
I. Introduction
Suppose a random string X₁ⁿ = (X1, …, Xn) produced by a memoryless source (Xn : n ≥ 1) with distribution P on a source alphabet S is to be compressed with distortion no more than some D ≥ 0 with respect to a single-letter distortion measure ρ(x, y).1 The basic information-theoretic model for understanding the best performance that can be achieved is the study of random codebooks. If we generate memoryless random strings according to some distribution Q on the reproduction alphabet T, we would like to know how many such strings are needed so that, with high probability, we will be able to find at least one codeword that matches the source string with distortion D or less. The crucial mathematical problem in answering this question is the evaluation of the probability that a given, typical source string x₁ⁿ will be D-close to a random codeword Y₁ⁿ = (Y1, …, Yn) drawn from the product distribution Qⁿ. This probability can be expressed as
Qⁿ(B(X₁ⁿ, D))  (1)

where B(X₁ⁿ, D) denotes the “distortion ball” consisting of all reproduction strings y₁ⁿ ∈ Tⁿ that are within distortion D (or less) from X₁ⁿ; note that the matching probability in (1) is itself a random quantity, as it depends on the source string X₁ⁿ.
The importance of evaluating (1) was already identified by Shannon in his classic study of rate-distortion theory [4], where he showed that, for the best codebook distribution Q = Q*, we have
(Q*)ⁿ(B(X₁ⁿ, D)) ≈ 2^{−nR(P, D)}  (2)
where R(P, D) is the rate-distortion function of the source.
The more general question of evaluating the matching probability (1) for distributions Q perhaps different from the optimal reproduction distribution Q*, arises naturally in a variety of contexts, including problems in pattern-matching, mismatched codebooks, Lempel-Ziv compression, combinatorial optimization on random strings, and others; see, e.g., [5]-[13], and the review and references in [14]. In this case, Shannon’s estimate (2) is replaced by the so-called “generalized asymptotic equipartition property” (or generalized AEP), which states that
lim_{n→∞} −(1/n) log Qⁿ(B(X₁ⁿ, D)) = R(P, Q, D)  a.s.  (3)
where “a.s.” stands for “almost surely” and refers to the random source string (Xn)n≥1. The rate function R(P, Q, D) is defined in a way that closely resembles the definition of the rate-distortion function,

R(P, Q, D) := inf { H(dist(U, V) ∥ P × Q) : U ~ P, E[ρ(U, V)] ≤ D }

where H(·∥·) denotes the relative entropy, dist(U, V) denotes the joint distribution of (U, V), and the infimum is over all (bivariate) probability distributions of random variables (U, V) with values in S and T, respectively, such that U has distribution P and the expected distortion E[ρ(U, V)] ≤ D. (For a broad introduction to the generalized AEP, its applications and refinements, see [14] and the references therein.)
Although much is known about the generalized AEP and about R(P, Q, D) [14], all known results are established under certain restrictive conditions. In particular, it is always assumed that Dave(P, Q) < ∞ and that D ≠ Dmin(P, Q), where

Dave(P, Q) := E[ρ(X, Y)]  and  Dmin(P, Q) := inf{D ≥ 0 : R(P, Q, D) < ∞}

for independent X ~ P and Y ~ Q. Often, the codebook distribution is required to be memoryless, and when it is not, it is further assumed that the distortion measure is bounded.
The main point of this paper is to remove these constraints, and to analyze which (if any) are essential for the validity of the generalized AEP. Our motivation is twofold. On one hand, unnecessarily stringent conditions make the theoretical picture incomplete. On the other, there are applications which naturally require more general statements, such as universal lossy compression, where the source distribution is not known a priori [3].
Thus motivated, we give necessary and sufficient conditions for the generalized AEP in (3), and we precisely characterize the behavior of the matching probability in the pathological situations when the generalized AEP fails. Our results hold for all values of D, and they cover arbitrary abstract alphabets and distortion measures. We also allow the source to be stationary and ergodic, and the codebook distribution to have a certain amount of memory. We similarly extend the characterization of the rate function R(P, Q, D) to the same level of generality. We show that it can always be written as a convex dual, and that a minimizer W in the definition of R(P, Q, D) always exists (unless, of course, the infimum is taken over the empty set).
II. Notation and Assumptions
The source and reproduction alphabets S and T, along with their associated σ-algebras, are assumed to be Borel spaces2 and the distortion measure ρ : S × T ↦ [0, ∞) is assumed to be product measurable. P and Q denote generic probability distributions on S and T, respectively, and we let X ~ P and Y ~ Q be independent random variables (r.v.). Define3

ρQ(x) := ess inf ρ(x, Y)

where the essential infimum is taken with respect to Y ~ Q.
If W is a probability distribution on S × T, then we use WS to denote the marginal distribution of W on S, and similarly for WT. Define

R(P, Q, D) := inf_{W ∈ W(P, D)} H(W ∥ P × Q)

where the infimum is over the subset of probability distributions on S × T defined by

W(P, D) := {W : WS = P and ∫ ρ dW ≤ D}

and where H(μ∥ν) denotes the relative entropy (in nats), namely,

H(μ∥ν) := ∫ log (dμ/dν) dμ if μ ≪ ν, and H(μ∥ν) := +∞ otherwise.

We use the convention that inf ∅ := +∞ and logarithms are base e. For independent r.v.s X ~ P and Y ~ Q, define

Λ(P, Q, λ) := E[log E[e^{λρ(X, Y)} ∣ X]], λ ∈ ℝ,  and  Λ*(P, Q, D) := sup_{λ ≤ 0} [λD − Λ(P, Q, λ)].
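For finite alphabets, the quantities just defined reduce to finite sums and can be computed directly. The following minimal Python sketch is our own illustration (the function names and the binary Hamming example are not from the text); it computes ρQ, Dave(P, Q) and Λ(P, Q, λ):

```python
import math

def rho_Q(x, Q, rho):
    # Essential infimum of rho(x, Y) for Y ~ Q: for a finite alphabet,
    # the minimum of rho(x, y) over y with Q(y) > 0.
    return min(rho(x, y) for y, qy in Q.items() if qy > 0)

def D_ave(P, Q, rho):
    # E[rho(X, Y)] for independent X ~ P and Y ~ Q.
    return sum(px * qy * rho(x, y)
               for x, px in P.items() for y, qy in Q.items())

def Lambda(P, Q, rho, lam):
    # Lambda(P, Q, lam) = E[ log E[ exp(lam * rho(X, Y)) | X ] ], in nats.
    return sum(px * math.log(sum(qy * math.exp(lam * rho(x, y))
                                 for y, qy in Q.items()))
               for x, px in P.items())

P = {0: 0.5, 1: 0.5}
Q = {0: 0.5, 1: 0.5}
ham = lambda x, y: float(x != y)   # Hamming distortion

print(rho_Q(0, Q, ham))          # 0.0: some y matches x exactly
print(D_ave(P, Q, ham))          # 0.5
print(Lambda(P, Q, ham, -1.0))   # log((1 + e^{-1})/2), about -0.3799
```

In this symmetric binary case Λ(P, Q, λ) = log((1 + e^λ)/2), which is the closed form used again in Section V.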
Consider now the product spaces Sⁿ and Tⁿ with the usual σ-algebras and with generic probability distributions Pₙ and Qₙ, respectively. Let X₁ⁿ ~ Pₙ and Y₁ⁿ ~ Qₙ be independent and let Wₙ be a probability distribution on Sⁿ × Tⁿ. We use Qⁿ to denote the product distribution of Q on Tⁿ. Define the sequence of single-letter distortion measures (ρₙ : n ≥ 1) by

ρₙ(x₁ⁿ, y₁ⁿ) := (1/n) Σ_{k=1}^n ρ(xk, yk),  x₁ⁿ ∈ Sⁿ, y₁ⁿ ∈ Tⁿ.
Since we are working with abstract alphabets, it makes sense to formally replace S, T, ρ, P, Q, W, X and Y with Sⁿ, Tⁿ, ρₙ, Pₙ, Qₙ, Wₙ, X₁ⁿ and Y₁ⁿ, respectively, in the above definitions of ρQ, W(P, D), R(P, Q, D), Λ(P, Q, λ) and Λ*(P, Q, D) in order to define ρQₙ, Wₙ(Pₙ, D), Rₙ(Pₙ, Qₙ, D), Λₙ(Pₙ, Qₙ, λ) and Λ*ₙ(Pₙ, Qₙ, D).
For each x₁ⁿ ∈ Sⁿ, let P̂x₁ⁿ denote the empirical probability distribution of x₁ⁿ on S and let Px₁ⁿ denote the probability distribution on Sⁿ that assigns probability 1 to the sequence x₁ⁿ. Define

Lₙ := −(1/n) log Qₙ(B(X₁ⁿ, D))  and  Rₙ := Rₙ(P_{X₁ⁿ}, Qₙ, D)

where

B(x₁ⁿ, D) := {y₁ⁿ ∈ Tⁿ : ρₙ(x₁ⁿ, y₁ⁿ) ≤ D}

denotes the distortion ball of radius D at the point x₁ⁿ.
Finally, consider the one-sided infinite sequence spaces S∞ and T∞ with the usual σ-algebras and with generic probability distributions ℙ and ℚ, respectively. Let (Xn : n ≥ 1) and (Yn : n ≥ 1) be independent random sequences with distributions ℙ and ℚ, respectively. Let Pₙ and Qₙ denote the marginal distributions of ℙ and ℚ on Sⁿ and Tⁿ, respectively, so that X₁ⁿ ~ Pₙ and Y₁ⁿ ~ Qₙ as before. Define

Λ∞(ℙ, ℚ, λ) := lim supₙ (1/n) E[log E[e^{nλρₙ(X₁ⁿ, Y₁ⁿ)} ∣ X₁ⁿ]],
Λ*∞(ℙ, ℚ, D) := sup_{λ ≤ 0} [λD − Λ∞(ℙ, ℚ, λ)],
R∞(ℙ, ℚ, D) := lim supₙ (1/n) Rₙ(Pₙ, Qₙ, D).

It is straightforward to verify that when ℙ is stationary with P1 = P and ℚ is memoryless with Qₙ = Qⁿ, then Λ∞(ℙ, ℚ, λ) = Λ(P, Q, λ) and hence Λ*∞(ℙ, ℚ, D) = Λ*(P, Q, D).
III. Memoryless Codebook Distributions
Our goal is to provide necessary and sufficient conditions for the generalized AEP in (3) and for the characterization of the limit as a convex dual. In this section we restrict attention to the case where the codebook distribution is memoryless and where the source distribution is stationary and ergodic.
Under appropriate regularity conditions, it is well known in the literature that R(P, Q, D) can be expressed as a convex dual [9]. (See [14] for a review and further references). Our first result is that the regularity conditions are not necessary. (Proofs of all results are collected in Sections V and VI.)
Proposition 1
R(P, Q, D) = Λ*(P, Q, D) for all D. If W(P, D) is not empty, then this set contains a W such that R(P, Q, D) = H(W ∥ P × Q).
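Proposition 1 can be checked numerically in a case where R(P, Q, D) has a closed form: for P = Q = Bernoulli(1/2) with Hamming distortion, R(P, Q, D) = log 2 − H(D) nats, where H is the binary entropy. The sketch below is our own illustration (a crude grid search stands in for the exact supremum); it evaluates the dual Λ*(P, Q, D) = sup_{λ ≤ 0} [λD − Λ(P, Q, λ)] and compares:

```python
import math

def Lambda(lam):
    # Lambda(P, Q, lam) for P = Q = Bernoulli(1/2) and Hamming distortion.
    return math.log((1.0 + math.exp(lam)) / 2.0)

def Lambda_star(D, lo=-60.0, steps=120001):
    # Dual formula: sup over lam <= 0 of lam*D - Lambda(lam), via grid search.
    return max(lam * D - Lambda(lam)
               for lam in (lo + i * (-lo) / (steps - 1) for i in range(steps)))

D = 0.25
closed_form = math.log(2) + D * math.log(D) + (1 - D) * math.log(1 - D)
print(Lambda_star(D), closed_form)   # both about 0.1308
```

The supremum is attained at λ_D = log(D/(1 − D)) < 0, which is the tilting parameter that reappears in Section V.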
We separate our results about the generalized AEP into two theorems. The first theorem asserts that the generalized AEP always holds along some subsequence (of n’s). Consequently, the generalized AEP can only fail if the limit does not exist.4 The second theorem gives necessary and sufficient conditions for the existence of the limit and also characterizes the pathology. Taken together, they provide necessary and sufficient conditions for the generalized AEP.
Theorem 2
Suppose ℙ is stationary and ergodic with P1 = P. Then
lim inf_{n→∞} Lₙ = R(P, Q, D)  ℙ-a.s.  (4)

for all D. The result also holds with Lₙ replaced by n⁻¹Rₙ.
Theorem 3
Suppose ℙ is stationary and ergodic with P1 = P. Then the limit lim_{n→∞} Lₙ fails to exist (in the extended sense) with positive probability if and only if 0 < D = Dmin(P, Q) < ∞, R(P, Q, D) < ∞, and ρQ(X1) is not a.s. constant. Furthermore, in this pathological situation

Prob{Lₙ = +∞ infinitely often} > 0  (5a)

Prob{Lₙ < ∞ infinitely often} = 1  (5b)

lim_{m→∞} L_{Nm} = R(P, Q, D)  ℙ-a.s.  (5c)

where (Nm : m ≥ 1) is the (a.s.) infinite random subsequence of (n ≥ 1) for which Lₙ is finite. All of the above also hold with Lₙ replaced by n⁻¹Rₙ.
The proofs below show that (Nm : m ≥ 1) can also be (a.s.) characterized as the random subsequence of (n ≥ 1) for which

(1/n) Σ_{k=1}^n ρQ(Xk) ≤ D.  (6)
Note that Dmin(P, Q) = E[ρQ(X)] whenever the former is finite.
A simple example that illustrates the pathology is the following: Let (Xn : n ≥ 1) be the sequence 1, 0, 1, 0, … with probability 1/2 and the sequence 0, 1, 0, 1, … with probability 1/2, namely, the binary, stationary, periodic Markov chain (which is ergodic). Let Q be the point mass at 0, let ρ(x, y) := |x − y| and let D = 1/2. Note that ρQ(X1) = X1 is not a.s. constant, that D = Dmin(P, Q) = 1/2 and that R(P, Q, D) = 0 is finite. In the case when X1 = 0, we have Lₙ = 0 for all n. In the case when X1 = 1, however, L_{2n} = 0 and L_{2n−1} = +∞ for all n, so the limit of Lₙ does not exist.
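The example can be verified mechanically. Since Q is a point mass at 0, Qⁿ(B(x₁ⁿ, D)) is 1 or 0 according to whether the fraction of ones in x₁ⁿ is at most D, so Lₙ takes only the values 0 and +∞. A small sketch (our own illustration):

```python
import math

def L_n(x, D):
    # Q = point mass at 0 and rho = |x - y|:  Q^n(B(x, D)) = 1{ mean(x) <= D },
    # so L_n = 0 when the ball has probability 1 and +inf when it has probability 0.
    return 0.0 if sum(x) / len(x) <= D else math.inf

D = 0.5
x_even = [0, 1] * 8   # realization starting with X1 = 0
x_odd  = [1, 0] * 8   # realization starting with X1 = 1

print([L_n(x_even[:n], D) for n in range(1, 9)])  # all 0.0
print([L_n(x_odd[:n], D) for n in range(1, 9)])   # inf, 0.0, inf, 0.0, ...
```

Along the even times (the subsequence (Nm)) the generalized AEP holds with limit R(P, Q, D) = 0, exactly as Theorem 3 predicts.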
IV. Extensions to the Case With Memory
Although the source (Xn : n ≥ 1) can have memory, the generalized AEP stated thus far is restricted to the case where the reproduction distribution is memoryless, that is, Lₙ is evaluated with a product measure Qⁿ. We relax this assumption here and replace it with a strong mixing condition, namely

C⁻¹ Qₙ(A) Qₘ(B) ≤ Prob{Y₁ⁿ ∈ A, (Yn+1, …, Yn+m) ∈ B} ≤ C Qₙ(A) Qₘ(B)  (7)

for some fixed 1 ≤ C < ∞, all measurable A ⊆ Tⁿ and B ⊆ Tᵐ, and all n, m ≥ 1. Examples include the cases where ℚ is memoryless (C = 1) and where ℚ is a hidden Markov model (HMM) whose underlying Markov chain has a finite state space with all (strictly) positive transition probabilities.
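For a two-state chain with strictly positive transition probabilities, a valid constant for the two-sided product bound in (7) can be computed and verified by enumeration, because the pointwise ratio ℚ(y₁^{n+m}) / [ℚ(y₁ⁿ) ℚ(y_{n+1}^{n+m})] collapses to p(y_{n+1} ∣ y_n)/π(y_{n+1}). The following sketch is our own illustration (the chain and its stationary distribution are hypothetical choices):

```python
from itertools import product

P_mat = [[0.7, 0.3], [0.2, 0.8]]   # transition probabilities, all > 0
pi = [0.4, 0.6]                     # stationary distribution of P_mat

def prob(seq):
    # Probability of a finite path under the stationary Markov chain.
    p = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= P_mat[a][b]
    return p

# The pointwise mixing ratio equals p(b0 | a_last) / pi(b0), so this C works:
C = max(max(P_mat[i][j] / pi[j] for i in range(2) for j in range(2)),
        max(pi[j] / P_mat[i][j] for i in range(2) for j in range(2)))

for n, m in [(1, 1), (2, 2), (3, 2)]:
    for seq in product([0, 1], repeat=n + m):
        joint, split = prob(seq), prob(seq[:n]) * prob(seq[n:])
        assert split / C <= joint <= C * split
print("C =", C)   # 2.0 for this chain
```

Pointwise control of the density ratio implies the set-level bounds in (7) by integration over A × B.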
Proposition 4
If ℙ and ℚ are stationary and ℚ satisfies (7), then

R∞(ℙ, ℚ, D) = Λ*∞(ℙ, ℚ, D)

for all D.
The proof shows that the limit supremum in the definition of Λ∞ is actually a limit, and Proposition 1 implies that Rₙ(Pₙ, Qₙ, D) = Λ*ₙ(Pₙ, Qₙ, D) for each n. For the special case when S and T are finite and ℚ is a finite state Markov chain, a formula for R∞(ℙ, ℚ, D) not involving limits was identified in [7].
Theorem 5
If ℚ is stationary and satisfies (7), then Theorems 2 and 3 remain valid when Qⁿ, Q^{Nm}, R(P, Q, D) and Λ*(P, Q, D) are replaced by Qₙ, Q_{Nm}, R∞(ℙ, ℚ, D) and Λ*∞(ℙ, ℚ, D), respectively.
The mixing conditions here are strong enough to ensure that

Dmin(Pₙ, Qₙ) = Dmin(P, Q) for all n  (8)

and that

ρQₙ(x₁ⁿ) = (1/n) Σ_{k=1}^n ρQ(xk) for all n and all x₁ⁿ ∈ Sⁿ  (9)

which is why the results for memory can still be stated in terms of Dmin(P, Q) and ρQ. Extending Theorem 3 to situations where these do not hold seems difficult. The generalized AEP for ℚ with memory can also be found in [7], [12], [14], [16], in some cases under more general mixing conditions, but always for a bounded distortion measure ρ and for D ≠ Dmin(ℙ, ℚ).
V. Proofs: Standard Cases
Owing to space constraints, the proofs presented here focus primarily on the case D = Dmin (P, Q), which has received little or no attention in the literature. We also provide some justification for removing the standard assumption that Dave (P, Q) < ∞. This is a moment condition on ρ and it is assumed in all previous treatments of the subject. More detailed proofs, including measurability issues, can be found in a longer version of this paper [2] and the technical report that preceded it [1].
When ℚ is stationary and satisfies (7), then

Qn+m(B(x₁^{n+m}, D)) ≥ C⁻¹ Qₙ(B(x₁ⁿ, D)) Qₘ(B((xn+1, …, xn+m), D))  (10)

and

| log E[e^{(n+m)λρn+m(x₁^{n+m}, Y₁^{n+m})}] − log E[e^{nλρₙ(x₁ⁿ, Y₁ⁿ)}] − log E[e^{mλρₘ((xn+1, …, xn+m), Y₁ᵐ)}] | ≤ log C  (11)

for all λ ≤ 0.
These two bounds make it relatively straightforward to extend the memoryless case to the case with memory. Replacing x with X in (11), taking expectations and using well-known facts about subadditive sequences (cf., [15, Lemma 10.21]) gives Proposition 4. Equation (10) combined with a blocking argument can be used to extend the memoryless generalized AEP to the case with memory. The details are tedious and analogous to existing arguments in the literature. We refer the interested reader to [2] and [1]. Henceforth, except for the special case D = Dmin (P, Q) and our remarks about the lower bound below, we consider only the memoryless setting.
It is convenient in the proofs to temporarily redefine

Dmin(P, Q) := E[ρQ(X)].

Proposition 1, once established, shows that this agrees with the definition in the text.
A. The Lower Bound and Replacing Ln With Rn
For each λ ≤ 0, the exponential Chebyshev’s inequality gives

Qₙ(B(x₁ⁿ, D)) = Prob{ρₙ(x₁ⁿ, Y₁ⁿ) ≤ D} ≤ e^{−nλD} E[e^{nλρₙ(x₁ⁿ, Y₁ⁿ)}]

that is, −(1/n) log Qₙ(B(x₁ⁿ, D)) ≥ λD − (1/n) log E[e^{nλρₙ(x₁ⁿ, Y₁ⁿ)}].
Taking limits, using (11) to justify the sub-additive ergodic theorem [15, Th. 10.22], and then optimizing over λ ≤ 0 gives
lim inf_{n→∞} Lₙ ≥ Λ*∞(ℙ, ℚ, D)  ℙ-a.s.  (12)

This holds for all D. Combined with the previous bound (which shows that Lₙ ≥ n⁻¹Rₙ and that n⁻¹Rₙ also satisfies (12)), this also shows that whenever Lₙ converges to the right side of (12), n⁻¹Rₙ converges to the same limit. Furthermore, using Lemma 7 below it is straightforward to show that Rₙ must be infinite when Lₙ is infinite. Using Propositions 1 and 4 to equate Rₙ and Λ*ₙ(Pₙ, Qₙ, D), these remarks justify our claims that Lₙ can be replaced by n⁻¹Rₙ in each of the theorems.
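In the binary symmetric case the matching probability is an exact binomial tail, so the convergence of Lₙ to Λ*(P, Q, D) = log 2 − H(D) can be watched directly. The sketch below is our own illustration (computations are done in log space, since the ball probability is astronomically small):

```python
import math

def L_n(n, D):
    # -(1/n) log Q^n(B(x, D)) for Q = Bernoulli(1/2) and Hamming distortion:
    # the ball probability is P{Bin(n, 1/2) <= nD}, the same for every x.
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2) for k in range(int(n * D) + 1)]
    mx = max(logs)                      # log-sum-exp to avoid underflow
    log_ball = mx + math.log(sum(math.exp(v - mx) for v in logs))
    return -log_ball / n

D = 0.25
rate = math.log(2) + D * math.log(D) + (1 - D) * math.log(1 - D)  # about 0.1308
for n in [100, 1000, 4000]:
    print(n, L_n(n, D))   # decreases toward the rate as n grows
```

The gap Lₙ − Λ* decays like (log n)/n, reflecting the polynomial prefactor in the exact large deviations asymptotics.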
There is also a lower bound for Proposition 1.
H(W ∥ P × Q) ≥ Λ*(P, Q, D) for every W ∈ W(P, D).  (13)

This only makes sense if W(P, D) ≠ ∅. A simple proof can be found in [14, Th. 2]. Note that the proof there does not make use of any regularity assumptions, other than the existence of a regular conditional distribution W(dy∣x).
B. The Upper Bound: D < Dmin(P, Q) or D > Dave(P, Q)
The upper bound is
lim sup_{n→∞} Lₙ ≤ Λ*(P, Q, D)  ℙ-a.s.  (14)
If D < Dmin(P, Q) then the upper bound is trivial since the right side is +∞. If D > Dave(P, Q) (the latter of which must then be finite), then the left side converges to 0 (Chebyshev’s inequality and the ergodic theorem) and again the upper bound must hold.
For Proposition 1, the upper bound is
H(W ∥ P × Q) ≤ Λ*(P, Q, D) for some W ∈ W(P, D).  (15)

Again, this only makes sense if W(P, D) ≠ ∅. If D < Dmin(P, Q), then any W ∈ W(P, D) trivially satisfies (15). If D ≥ Dave(P, Q), then W = P × Q ∈ W(P, D) trivially satisfies (15).
C. The Upper Bound: Dmin(P, Q) < D ≤ Dave(P, Q)
This is the case considered in the literature, at least under the assumption that Dave(P, Q) < ∞. Under appropriate regularity conditions, the upper bound (14) is an immediate consequence of the one-sided large deviations theorem in [17] applied to the random variables (−nρₙ(x₁ⁿ, Y₁ⁿ) : n ≥ 1) and the constants aₙ := −D for a typical, fixed realization (xn : n ≥ 1) of the source. See [14, Th. 1] for an overview of this technique using the Gärtner-Ellis theorem [18, Th. V.6], which is a generalization of [17]. The required regularity conditions are placed on Λ∞(ℙ, ℚ, λ).
For the memoryless codebook case, Λ∞(ℙ, ℚ, λ) = Λ(P, Q, λ) and the regularity conditions needed in [17] are given by the following Lemma.
Lemma 6
If Dmin(P, Q) < ∞, then Λ(λ) := Λ(P, Q, λ) is convex and continuously differentiable on (−∞, 0) with limλ↓−∞ Λ′(λ) = E(ρQ(X)) = Dmin(P, Q) and limλ↑0 Λ′(λ) = Dave(P, Q) (which may be infinite). If Dmin(P, Q) < Dave(P, Q), then Λ(λ) is strictly convex on (−∞, 0).
Remarks
The regularity assumption Dave(P, Q) < ∞ is used in the literature to establish these properties of Λ. It can be replaced by the much weaker assumption that Dmin(P, Q) < ∞. If Lemma 6 were known to be true for Λ∞, then the more general case with memory would also follow immediately from [17]. Establishing the strict convexity of Λ∞ seems challenging, however, and we resorted to a blocking argument as mentioned above to derive the general case from the memoryless case.
Proof
Let Z be a real-valued, nonnegative random variable. Define Γ(λ) := log E[e^{λZ}]. It is well known [19, Lem. 2.2.5, Ex. 2.2.24], [9], [14] that Γ is nondecreasing, convex and C∞ on (−∞, 0) with Γ′(λ) ↓ ess inf Z as λ ↓ −∞ and Γ′(λ) ↑ EZ (which may be infinite) as λ ↑ 0. Furthermore, if ess inf Z < EZ then Γ is strictly convex on (−∞, 0).
Now define Γ(x, λ) := log Eeλρ(x, Y). Using Z := ρ(x, Y), we see that Γ(x, λ) has all the above properties of Γ(λ) for each fixed x. Since Λ(λ) = EΓ(X, λ), which must be finite on (−∞, 0] if Dmin < ∞, we need only show that these properties are preserved by expectation. Convexity and strict convexity are immediately preserved. Preserving properties of the derivative requires some justification. Moving the expectation inside the derivative is justified by convexity and the monotone convergence theorem for the left hand derivative (including λ ↑ 0) and then finiteness and the dominated convergence theorem for the right hand derivative (including λ ↓ −∞). The fact that Dmin = limλ↓−∞ Λ′(λ) is a well-known property of Fenchel–Legendre transforms.
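The two derivative limits in Lemma 6 are easy to observe numerically on a finite example; the sketch below is our own illustration (finite differences stand in for Λ′; asymmetric Bernoulli marginals with Hamming distortion, so Dmin = 0 and Dave = 0.54):

```python
import math

P = {0: 0.3, 1: 0.7}
Q = {0: 0.6, 1: 0.4}
rho = lambda x, y: float(x != y)

def Lambda(lam):
    # Lambda(P, Q, lam) = E[ log E[ exp(lam * rho(X, Y)) | X ] ].
    return sum(px * math.log(sum(qy * math.exp(lam * rho(x, y))
                                 for y, qy in Q.items()))
               for x, px in P.items())

def dLambda(lam, h=1e-6):
    # Central finite-difference approximation to Lambda'(lam).
    return (Lambda(lam + h) - Lambda(lam - h)) / (2 * h)

D_min = 0.0                      # rho_Q(x) = 0 since Q charges every symbol
D_ave = 0.3 * 0.4 + 0.7 * 0.6    # = 0.54
print(dLambda(-30.0))   # close to D_min = 0
print(dLambda(-1e-4))   # close to D_ave = 0.54
```

Λ′ sweeps exactly the interval (Dmin, Dave), which is what makes the dual optimization over λ ≤ 0 behave differently in the three regimes D < Dmin, Dmin < D ≤ Dave, and D > Dave.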
Turning to Proposition 1, if Dmin(P, Q) < D < Dave(P, Q), a proof of the upper bound (15) can be found in [14, Th. 2]. The achieving W ∈ W(P, D) is identified as

W(dx, dy) := P(dx) Q(dy) e^{λD ρ(x, y) − Γ(x, λD)}

where λD < 0 is uniquely chosen so that Λ*(P, Q, D) = λDD − Λ(P, Q, λD). Note that the properties of Λ given in Lemma 6 are sufficient regularity conditions for ensuring that λD exists and has the desired properties. See [19, Lemma 2.2.5] and [20, Th. 23.5, Corollary 23.5.1, Th. 25.1].
VI. Proofs: Nonstandard Cases
Here we consider the special case when D = Dmin := Dmin(P, Q) < ∞. Lemma 6 is applicable, so

Λ*(P, Q, Dmin) = lim_{λ → −∞} [λDmin − Λ(P, Q, λ)].
Unlike most of the previous section, we explicitly treat the case of codebook distributions with memory. In particular, we assume that ℚ is stationary and satisfies (7). The bounds in (10) and (11) easily give (9) and

|Λ∞(ℙ, ℚ, λ) − Λ(P, Q, λ)| ≤ log C for all λ ≤ 0

which means that Λ*(Dmin) := Λ*(P, Q, Dmin) is finite (infinite) exactly when Λ*∞(ℙ, ℚ, Dmin) is finite (infinite). The lower bounds in (12) and (13) are also valid. These lower bounds address the upper bounds in (14) and (15) in the case Λ*(Dmin) = ∞. So in this section we will restrict attention to Λ*(Dmin) < ∞. The proofs make use of the sets

Aδ(x) := {y ∈ T : ρ(x, y) ≤ ρQ(x) + δ},  δ ≥ 0.

We use A(x) := A₀(x) = {y : ρ(x, y) = ρQ(x)} (the two descriptions agree up to a Q-null set, since ρ(x, Y) ≥ ρQ(x) a.s.) and we use the notation 1{B} to denote the indicator function of the event B.
Lemma 7
If Λ*(Dmin) < ∞, then

Λ*(Dmin) = H(W ∥ P × Q) = −E[log Q(A(X))]

for W ∈ W(P, Dmin) defined by

W(dx, dy) := P(dx) Q(dy) 1{y ∈ A(x)} / Q(A(x)).
Proof
Define

ρ̃(x, y) := ρ(x, y) − ρQ(x)

so that ρ̃ is a valid distortion measure (it is nonnegative for Q-a.e. y, for each fixed x) and so that

A(x) = {y : ρ̃(x, y) ≤ 0}, up to a Q-null set.

Let Λ̃ be defined analogously to Λ, except with ρ̃ instead of ρ. We have Λ(λ) = Λ̃(λ) + λDmin so that

Λ*(Dmin) = sup_{λ ≤ 0} [λDmin − Λ(λ)] = sup_{λ ≤ 0} [−Λ̃(λ)] = −lim_{λ → −∞} E[log E[e^{λρ̃(X, Y)} ∣ X]] = −E[log Q(A(X))].

We moved the limit inside the expectations using first the monotone convergence theorem and then the dominated convergence theorem. This also shows that the denominator in the definition of W is strictly positive P-a.s., so W is well-defined.

It is easy to see that W ∈ W(P, Dmin) and that

H(W ∥ P × Q) = −E[log Q(A(X))] = Λ*(Dmin)

which completes the proof.
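Lemma 7 can be sanity-checked numerically in a case with Dmin > 0: take ρ(x, y) := 1 + 1{x ≠ y} on binary alphabets, so that ρQ(x) = 1, Dmin = 1 and A(x) = {x}; then λDmin − Λ(P, Q, λ) should increase to −E[log Q(A(X))] = −E[log Q(X)] as λ → −∞. A sketch (our own illustration; the marginals are hypothetical choices):

```python
import math

P = {0: 0.3, 1: 0.7}
Q = {0: 0.6, 1: 0.4}
rho = lambda x, y: 1.0 + float(x != y)   # rho_Q(x) = 1, D_min = 1, A(x) = {x}

def Lambda(lam):
    # Lambda(P, Q, lam) = E[ log E[ exp(lam * rho(X, Y)) | X ] ].
    return sum(px * math.log(sum(qy * math.exp(lam * rho(x, y))
                                 for y, qy in Q.items()))
               for x, px in P.items())

D_min = 1.0
lemma7 = -sum(px * math.log(Q[x]) for x, px in P.items())  # -E log Q(A(X))
for lam in [-5.0, -10.0, -20.0]:
    print(lam * D_min - Lambda(lam))   # increases toward lemma7, about 0.7947
```

The monotone approach of λDmin − Λ(λ) to −E[log Q(A(X))] is exactly the monotone-convergence step in the proof above.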
Lemma 7 gives the upper bound (15) and completes the proof of Proposition 1 for all possible values of D. We now address the upper bound (14) for the generalized AEP. Because of (9), the product sets

Aₙ(x₁ⁿ) := A(x1) × ⋯ × A(xn) = {y₁ⁿ ∈ Tⁿ : ρₙ(x₁ⁿ, y₁ⁿ) = ρQₙ(x₁ⁿ)}  (up to a Qₙ-null set)

play the role of A(x) at the sequence level, and we can use (10) in conjunction with the subadditive ergodic theorem [15, Th. 10.22] to conclude that

limₙ −(1/n) log Qₙ(Aₙ(X₁ⁿ)) = limₙ −(1/n) E[log Qₙ(Aₙ(X₁ⁿ))] = limₙ (1/n) Λ*ₙ(Pₙ, Qₙ, Dmin) = R∞(ℙ, ℚ, Dmin)  ℙ-a.s.  (16)

The limit in the subadditive ergodic theorem is deterministic because it is shift invariant and the source is ergodic. The second equality comes from Lemma 7 and the third from Propositions 1 and 4. Note that if ρQ(X) is a.s. constant, then Aₙ(X₁ⁿ) ⊆ B(X₁ⁿ, Dmin) a.s. and (16) gives the upper bound (14).
Now suppose ρQ(X1) is not a.s. constant (and D = Dmin and Λ*(Dmin) < ∞). This is the only pathological situation where the upper bound does not hold. Our analysis makes use of recurrence properties for random walks with stationary and ergodic increments. What we need is summarized in the following lemma. The notation “i.o.” means “infinitely often.”
Lemma 8
Let (Un : n ≥ 1) be a real-valued stationary and ergodic process and define Wn := U1 + ⋯ + Un for n ≥ 1. If EU1 = 0 and Prob{U1 ≠ 0} > 0, then Prob{Wn > 0 i.o.} > 0 and Prob{Wn ≥ 0 i.o.} = 1.
Proof
Define W0 := 0 so that (Wn : n ≥ 0) is a random walk with stationary and ergodic increments [21]. Kesten [22] shows that {lim infₙ n⁻¹Wn > 0} and {Wn → ∞} differ by a null set. The ergodic theorem gives Prob{n⁻¹Wn → 0} = 1, so Prob{Wn → ∞} = 0. Similarly, by considering the process (−Wn), we see that Prob{Wn → −∞} = 0.
Now {|Wn| → ∞} is invariant and must have probability 0 or 1. If it has probability 1, then since we cannot have Wn → ∞ or Wn → −∞ we must have Wn oscillating between increasingly larger positive and negative values, which means Prob{Wn > 0 i.o.} = 1 and completes the proof.
Suppose Prob{|Wn| → ∞} = 0. Define

N(A) := Σ_{n ≥ 0} 1{Wn ∈ A}

to be the number of times the random walk visits the set A. [21, Corollary 2.3.4] shows that either N(J) < ∞ a.s. for all bounded intervals J or {N(J) = 0} ∪ {N(J) = ∞} has probability 1 for all intervals J (open or closed, bounded or unbounded, but not a single point). By assumption |Wn| ↛ ∞, so we can rule out the first possibility. Since Prob{W0 = 0} = 1, we see that for any interval J containing {0} we must have Prob{N(J) = ∞} = 1. In particular, taking J := [0, ∞) shows that Prob{Wn ≥ 0 i.o.} = 1. Similarly, taking J := (0, ∞) shows that either Prob{Wn > 0 i.o.} > 0 or Wn ≤ 0 for all n a.s.; the latter is impossible, because EWn = 0 would then force Wn = 0 a.s. for every n, contradicting Prob{U1 ≠ 0} > 0. This completes the proof.
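Lemma 8's conclusions can be illustrated on the example from Section III, whose increments Uk = ρQ(Xk) − D alternate between +1/2 and −1/2 with a random starting sign (a stationary, ergodic, mean-zero process). A finite run cannot certify an "infinitely often" statement, but it exhibits the two behaviors: every path returns to [0, ∞) repeatedly, while only the paths with a positive first increment ever visit (0, ∞). Sketch (our own illustration):

```python
def walk(first_positive, n):
    # Increments from the Section III example: rho_Q(X_k) - D alternates
    # between +1/2 and -1/2, with the starting sign determined by X_1.
    u = 0.5 if first_positive else -0.5
    w, path = 0.0, []
    for _ in range(n):
        w += u
        path.append(w)
        u = -u
    return path

for start in [True, False]:
    path = walk(start, 1000)
    visits_pos = sum(1 for w in path if w > 0)
    visits_nonneg = sum(1 for w in path if w >= 0)
    print(start, visits_pos, visits_nonneg)
# start = True:  500 visits to (0, inf) and 1000 to [0, inf)
# start = False: 0 visits to (0, inf) and 500 to [0, inf)
```

Here Prob{Wn > 0 i.o.} = 1/2 and Prob{Wn ≥ 0 i.o.} = 1, matching both conclusions of Lemma 8 and showing that the first one cannot be strengthened to probability 1.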
Returning to the main argument, ℙ-a.s.,

Qₙ(B(X₁ⁿ, D)) = 0 if and only if Wn > 0  (17)

where Wn := Σ_{k=1}^n (ρQ(Xk) − D). Lemma 8 shows that Prob{Wn > 0 i.o.} > 0. This and (17) prove (5a).
Lemma 8, applied to the increments (−Un), also shows that Prob{Wn ≤ 0 i.o.} = 1. Let (Nm : m ≥ 1) be the (a.s.) infinite, random subsequence of (n ≥ 1) such that Wn ≤ 0. Note that

A_{Nm}(X₁^{Nm}) ⊆ B(X₁^{Nm}, D) whenever W_{Nm} ≤ 0

so

L_{Nm} ≤ −(1/Nm) log Q_{Nm}(A_{Nm}(X₁^{Nm})) ≤ (1/Nm) Σ_{k=1}^{Nm} [−log Q(A(Xk))] + log C.  (18)
Now, the final expression in (18) is a.s. finite because −E[log Q(A(X1))] = Λ*(Dmin) < ∞. This proves (5b) and shows that (Nm : m ≥ 1) satisfies the claims of the theorem, including (6). Letting m → ∞ in (18) and using (16) gives (5c), the upper bound along the sequence (Nm : m ≥ 1). Note that it also shows that lim infₙ Lₙ is a.s. equal to R∞(ℙ, ℚ, D) even in this pathological case, which proves Theorem 2 and its generalization in Theorem 5.
Acknowledgments
The author would like to thank I. Kontoyiannis, M. Madiman, and two anonymous reviewers for many useful comments and corrections, and I. Kontoyiannis for invaluable advice and for suggesting the problems that led to this correspondence.
This work was supported in part by a National Defense Science and Engineering Graduate Fellowship.
Footnotes
1. Precise rigorous definitions are given in the following section.
2. Borel spaces include ℝd as well as a large class of infinite-dimensional spaces, including Polish spaces. This assumption is made so that we can avoid certain pathologies while working with random sequences and conditional distributions [15].
3. The essential infimum of a random variable η is ess inf η := inf{r : Prob{η < r} > 0}.
4. We are considering limits in the extended sense: if a sequence diverges to +∞ (or to −∞), then we still say that the limit exists and identify the limit as +∞ (or as −∞).

The material in this correspondence was presented in part at the 40th Annual Allerton Conference on Communications, Control, and Computers, Monticello, IL, October 2002.
References
1. Harrison M. The First Order Asymptotics of Waiting Times Between Stationary Processes Under Nonstandard Conditions. Providence, RI: Brown Univ., Div. Appl. Math., APPTS #03-3; 2003.
2. Harrison MT. The Generalized Asymptotic Equipartition Property: Necessary and Sufficient Conditions. 2007 [Online]. doi: 10.1109/TIT.2008.924668. Available: http://arxiv.org/abs/0711.2666.
3. Harrison M, Kontoyiannis I. Maximum likelihood estimation for lossy data compression. In: Proc. 40th Ann. Allerton Conf. Commun. Contr. Comput., Allerton, IL; Oct. 2002. pp. 596–604.
4. Shannon C. Coding theorems for a discrete source with a fidelity criterion. In: Slepian D, editor. IRE Nat. Conv. Rec., Vol. 4. New York: IEEE; 1959. pp. 142–163. Reprinted in Key Papers in the Development of Information Theory.
5. Zhang Z, Yang E-h, Wei VK. The redundancy of source coding with a fidelity criterion – Part I: Known statistics. IEEE Trans. Inf. Theory. 1997 Jan;43:71–91.
6. Łuczak T, Szpankowski W. A suboptimal lossy data compression based on approximate pattern matching. IEEE Trans. Inf. Theory. 1997 Sep;43:1439–1451.
7. Yang E-h, Kieffer J. On the performance of data compression algorithms based upon string matching. IEEE Trans. Inf. Theory. 1998 Jan;44:47–65.
8. Kontoyiannis I. An implementable lossy version of the Lempel-Ziv algorithm – Part I: Optimality for memoryless sources. IEEE Trans. Inf. Theory. 1999 Nov;45:2293–2305.
9. Yang E-h, Zhang Z. On the redundancy of lossy source coding with abstract alphabets. IEEE Trans. Inf. Theory. 1999 May;45:1092–1110.
10. Dembo A, Kontoyiannis I. The asymptotics of waiting times between stationary processes, allowing distortion. Ann. Appl. Prob. 1999 May;9:413–429.
11. Yang E-h, Zhang Z. The shortest common superstring problem: Average case analysis for both exact and approximate matching. IEEE Trans. Inf. Theory. 1999;45:1867–1886.
12. Chi Z. The first-order asymptotic of waiting times with distortion between stationary processes. IEEE Trans. Inf. Theory. 2001 Jan;47:338–347.
13. Szpankowski W. Average Case Analysis of Algorithms on Sequences. New York: Wiley; 2001.
14. Dembo A, Kontoyiannis I. Source coding, large deviations, and approximate pattern matching. IEEE Trans. Inf. Theory. 2002 Jun;48:1590–1615.
15. Kallenberg O. Foundations of Modern Probability. 2nd ed. New York: Springer; 2002.
16. Chi Z. Stochastic sub-additivity approach to the conditional large deviation principle. Ann. Prob. 2001;29(3):1303–1328.
17. Plachky D, Steinebach J. A theorem about probabilities of large deviations with an application to queuing theory. Period. Math. Hung. 1975;6(4):343–345.
18. den Hollander F. Large Deviations. Providence, RI: American Math. Soc.; 2000.
19. Dembo A, Zeitouni O. Large Deviations Techniques and Applications. 2nd ed. New York: Springer; 1998.
20. Rockafellar RT. Convex Analysis. Princeton, NJ: Princeton Univ. Press; 1970.
21. Berbee H. Random Walks With Stationary Increments and Renewal Theory. Mathematical Centre Tracts, Vol. 112. Amsterdam, The Netherlands: Mathematisch Centrum; 1979.
22. Kesten H. Sums of stationary sequences cannot grow slower than linearly. Proc. AMS. 1975 May;49(1):205–211.
