Author manuscript; available in PMC: 2016 Sep 16.
Published in final edited form as: J Stat Plan Inference. 2014 Sep 6;157-158:54–76. doi: 10.1016/j.jspi.2014.08.007

Rigorous Error Control Methods for Estimating Means of Bounded Random Variables

Zhengjia Chen 1, Xinjia Chen 2
PMCID: PMC5026247  NIHMSID: NIHMS626237  PMID: 27642222

Abstract

In this article, we propose rigorous sample size methods for estimating the means of random variables, which require no knowledge of the underlying distributions except that the random variables are known to be bounded in a certain interval. Our sample size methods can be applied without assuming that the samples are independent and identically distributed. Moreover, our sample size methods involve no approximation. We demonstrate that the sample complexity can be significantly reduced by using a mixed error criterion. We derive explicit sample size formulae to ensure the statistical accuracy of estimation.

1 Introduction

Many problems in engineering and the sciences boil down to estimating the mean value of a random variable [18, 19]. More formally, let X be a random variable with mean μ. It is a frequent problem to estimate μ based on samples X1, X2, ⋯, Xn of X, which are defined on a probability space (Ω, ℱ, ℙμ), where the subscript in the probability measure ℙμ indicates its association with μ. In many situations, no information on the distribution of X is available except that X is known to be bounded in some interval [a, b]. For example, in clinical trials, many quantities under investigation, such as the biomarkers EGFR, K-Ras, B-Raf, Akt, etc., are bounded random variables (see, e.g., [3, 13, 23] and the references therein). Moreover, the samples X1, X2, ⋯, Xn may not be independent and identically distributed (i.i.d.). This gives rise to the significance of estimating μ under the assumption that

$$a \le X_k \le b \quad \text{almost surely for } k \in \mathbb{N}, \tag{1}$$
$$\mathbb{E}[X_k \mid \mathscr{F}_{k-1}] = \mu \quad \text{almost surely for } k \in \mathbb{N}, \tag{2}$$

where ℕ denotes the set of positive integers, and {ℱk, k = 0, 1, ⋯} is a sequence of σ-subalgebras such that {∅, Ω} = ℱ0 ⊂ ℱ1 ⊂ ℱ2 ⊂ ⋯ ⊂ ℱ, with ℱk being generated by X1, ⋯, Xk. Our motivation for considering the estimation of μ under the dependence assumption (2) is twofold. First, from a theoretical point of view, we want the results to hold under the most general conditions. Clearly, (2) is satisfied in the special case that X1, X2, ⋯ are i.i.d. Second, from a practical standpoint, we want to weaken the independence assumption so that the results apply more broadly. For example, in the Monte Carlo estimation technique based on adaptive importance sampling, the samples X1, X2, ⋯ are not necessarily independent. However, as demonstrated on page 6 of [10], it can be shown that the samples satisfy (2). An example of adaptive importance sampling is given in Section 5.8 of [8] on the study of catastrophic failure.

An unbiased estimator for μ can be taken as

$$\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n}.$$

Let ɛ ∈ (0, 1) and δ ∈ (0, 1) be pre-specified margin of absolute error and confidence parameter, respectively. Since the probability distributions of X1, X2, ⋯ are usually unknown, one would use an absolute error criterion and seek the sample size, n, as small as possible such that for all values of μ,

$$\mathbb{P}_\mu\{|\bar{X}_n - \mu| < \varepsilon\} > 1 - \delta \tag{3}$$

holds for all distributions having common mean μ. It should be noted that it is difficult to specify a margin of absolute error ɛ, without causing undue conservatism, for controlling the accuracy of estimation if the underlying mean value μ can vary in a wide range. To achieve acceptable accuracy, it is necessary to choose small ɛ for small μ. However, this leads to unnecessarily large sample sizes for large μ.

In addition to the absolute error criterion, a relative error criterion is frequently used for the purpose of error control. Let η ∈ (0, 1) and δ ∈ (0, 1) be pre-specified margin of relative error and confidence parameter, respectively. It is desirable to determine the sample size, n, as small as possible such that for all values of μ,

$$\mathbb{P}_\mu\{|\bar{X}_n - \mu| < \eta|\mu|\} > 1 - \delta \tag{4}$$

holds for all distributions having common mean μ. Unfortunately, the determination of the sample size, n, requires a good lower bound for |μ|, which is usually not available. Without such a bound, the required sample size becomes very large, or even infinite.

To overcome the aforementioned difficulties, a mixed criterion may be useful. The reason is that, from a practical point of view, an estimate can be acceptable if either an absolute criterion or a relative criterion is satisfied. More specifically, let ɛ > 0, η ∈ (0, 1) and δ ∈ (0, 1). To control the reliability of estimation, it is crucial that the sample size n is as small as possible, such that for all values of μ,

$$\mathbb{P}_\mu\{|\bar{X}_n - \mu| < \varepsilon \;\text{ or }\; |\bar{X}_n - \mu| < \eta|\mu|\} > 1 - \delta \tag{5}$$

holds for all distributions having common mean μ.

In the estimation of parameters, a margin of absolute error is usually chosen to be much smaller than the margin of relative error. For instance, in the estimation of a binomial proportion, a margin of relative error η = 0.1 may be good enough for most situations, while a margin of absolute error may be expected to be ɛ = 0.001 or even smaller. In many applications, a practitioner accepting a relative error η normally expects a much smaller absolute error, i.e., ɛ ≪ η. On the other hand, one accepting an absolute error ɛ typically tolerates a much larger relative error, i.e., η ≫ ɛ. It will be demonstrated that the required sample size can be substantially reduced by using a mixed error criterion.
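As an illustration of how the mixed criterion combines the two margins, note that an estimate is acceptable under (5) exactly when its error is below max(ɛ, η|μ|). The helper below is a small sketch we add here (the function name is ours, not the paper's):

```python
def mixed_criterion_met(estimate, mu, eps, eta):
    """True when |estimate - mu| < eps OR |estimate - mu| < eta*|mu|.

    Equivalent to comparing the error against max(eps, eta*|mu|):
    for small mu the absolute margin eps governs, for large mu the
    relative margin eta*|mu| governs."""
    return abs(estimate - mu) < max(eps, eta * abs(mu))
```

For example, with ɛ = 0.001 and η = 0.1, an error of 0.04 is acceptable when μ = 0.5 (since 0.04 < η·μ = 0.05) even though it far exceeds ɛ.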

Given that the measure of precision is chosen, the next task is to determine an appropriate sample size. A conventional method is to determine the sample size by the normal approximation derived from the central limit theorem [5, 7]. Such an approximation method inevitably leads to unknown statistical error, due to the fact that the sample size n must be a finite number [8, 11]. This motivates us to explore rigorous methods for determining sample sizes.

In this paper, we consider the problem of estimating the means of bounded random variables based on a mixed error criterion. The remainder of the paper is organized as follows. In Section 2, we introduce some martingale inequalities. In Section 3, we derive explicit sample size formulae by virtue of concentration inequalities and martingale inequalities. In Section 4, we extend the techniques to the problem of estimating the difference of means of two bounded random variables. Illustrative examples are given in Section 5. Section 6 provides our concluding remarks. Most proofs are given in Appendices.

2 Martingale Inequalities

Under assumption (2), it can be readily shown that {Xk − μ} is actually a sequence of martingale differences (see, e.g., [6, 24] and the references therein). In the sequel, we shall introduce some martingale inequalities which are crucial for the determination of sample sizes to guarantee pre-specified statistical accuracy.

Define the function

$$\psi(\varepsilon,\mu) = (\mu+\varepsilon)\ln\!\left(\frac{\mu+\varepsilon}{\mu}\right) + (1-\mu-\varepsilon)\ln\!\left(\frac{1-\mu-\varepsilon}{1-\mu}\right)$$

for 0 < ɛ < 1 − μ < 1. Under the assumption that 0 ≤ Xk ≤ 1 almost surely and (2) holds for all k ∈ ℕ, Hoeffding [12] established that

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} < \exp(-n\psi(\varepsilon,\mu)) \quad \text{for } 0 < \varepsilon < 1-\mu. \tag{6}$$

That this result is due to Hoeffding can be seen from Theorem 1 of [12] and the remarks in the second paragraph of page 18 of that paper. For bounds tighter than Hoeffding's inequality, see the recent paper [4].

To obtain simpler probabilistic inequalities, define the bivariate function

$$\varphi(\varepsilon,\mu) = \frac{\varepsilon^2}{2\left(\mu+\frac{\varepsilon}{3}\right)\left(1-\mu-\frac{\varepsilon}{3}\right)}.$$

It is shown by Massart [17] that

$$\psi(\varepsilon,\mu) > \varphi(\varepsilon,\mu). \tag{7}$$

By virtue of Hoeffding’s inequality and Massart’s inequality, the following results can be justified.

Theorem 1

Assume that 0 ≤ Xk ≤ 1 almost surely and (2) holds for all k ∈ ℕ. Then,

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} \le \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } 0<\varepsilon<3(1-\mu), \tag{8}$$
$$\mathbb{P}_\mu\{\bar{X}_n \le \mu-\varepsilon\} \le \exp(-n\varphi(-\varepsilon,\mu)) \quad \text{for } 0<\varepsilon<3\mu. \tag{9}$$

Proof

To prove Theorem 1, note that

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} = 0 < \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } \varepsilon > 1-\mu. \tag{10}$$

From Hoeffding’s inequality (6) and Massart’s inequality (7), we have

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} < \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } 0<\varepsilon<1-\mu. \tag{11}$$

Observe that ℙμ{X̄n ≥ z} is a left-continuous function of z and that φ(ɛ, μ) is a continuous function of ɛ. Making use of this observation and (11), we have

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} \le \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } \varepsilon = 1-\mu. \tag{12}$$

Note that

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} \le \mathbb{P}_\mu\{\bar{X}_n > 1\} = 0 \le \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } 1-\mu < \varepsilon < 3(1-\mu). \tag{13}$$

Combining (10), (11), (12) and (13) yields

$$\mathbb{P}_\mu\{\bar{X}_n \ge \mu+\varepsilon\} \le \exp(-n\varphi(\varepsilon,\mu)) \quad \text{for } 0 < \varepsilon < 3(1-\mu).$$

This proves (8). To show (9), define Yi = 1 − Xi for i = 1, ⋯, n. Define Ȳn = 1 − X̄n and ν = 1 − μ. Then, 𝔼[Ȳn] = ν. Applying (8), we have

$$\mathbb{P}_\mu\{\bar{Y}_n \ge \nu+\varepsilon\} \le \exp(-n\varphi(\varepsilon,\nu))$$

for 0 < ɛ < 3(1 − ν). By the definitions of ν and Ȳn, we can rewrite the above inequality as

$$\mathbb{P}_\mu\{\bar{Y}_n \ge \nu+\varepsilon\} = \mathbb{P}_\mu\{\bar{X}_n \le \mu-\varepsilon\} \le \exp(-n\varphi(\varepsilon,1-\mu))$$

for 0 < ɛ < 3μ. Observing that X̄n = 1 − Ȳn and that φ(ɛ, 1 − μ) = φ(−ɛ, μ), we have (9). This completes the proof of Theorem 1.      □

It should be noted that Theorem 1 extends Massart's inequality in two respects. First, the random variables are not required to be i.i.d. Bernoulli random variables. Second, the inequalities hold over wider ranges of ɛ.
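Inequality (8) is easy to probe numerically. The sketch below, which we add for illustration, simulates i.i.d. Bernoulli samples (a special case satisfying (2)) and compares the empirical upper-tail frequency with the bound exp(−nφ(ɛ, μ)); all parameter choices here are ours:

```python
import math
import random

def phi(eps, mu):
    # Massart's bivariate function from Section 2
    return eps**2 / (2 * (mu + eps/3) * (1 - mu - eps/3))

def upper_tail_freq(mu, eps, n, trials, seed=0):
    # Empirical frequency of the event {Xbar_n >= mu + eps}
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() < mu for _ in range(n)) / n
        hits += xbar >= mu + eps
    return hits / trials

mu, eps, n = 0.3, 0.1, 50
bound = math.exp(-n * phi(eps, mu))            # right-hand side of (8)
freq = upper_tail_freq(mu, eps, n, trials=20000)
```

With these parameters the empirical frequency (around 0.08) sits comfortably below the theoretical bound (around 0.32), as the theorem guarantees.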

3 Explicit Sample Size Formulae

In this section, we shall investigate sample size methods for estimating the mean of a bounded random variable X.

If X1, ⋯, Xn are i.i.d. samples of X bounded in interval [0, 1], it can be shown by Chebyshev’s inequality that (3) holds provided that

$$n \ge \frac{1}{4\delta\varepsilon^2}. \tag{14}$$

Under the assumption that 0 ≤ Xk ≤ 1 and 𝔼[Xk | ℱk−1] = μ almost surely for all k ∈ ℕ, the Azuma–Hoeffding inequality [2, 12] implies that (3) holds for all μ ∈ (0, 1) if

$$n > \frac{\ln\frac{2}{\delta}}{2\varepsilon^2}. \tag{15}$$

Clearly, the ratio of the sample size determined by (14) to that of (15) is equal to

$$\frac{1/(4\delta\varepsilon^2)}{\ln\frac{2}{\delta}/(2\varepsilon^2)} = \frac{1}{2\delta\ln\frac{2}{\delta}},$$

which is substantially greater than 1 for small δ ∈ (0, 1). Despite the significant improvement upon the sample size bound (14), the sample size bound (15) usually leads to a very large sample size, since ɛ is typically a small number in practice. For example, with δ = 0.05, we have n = 1,844,440 and n = 184,443,973 for ɛ = 0.001 and ɛ = 0.0001, respectively.
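These figures can be reproduced directly; the snippet below is our own check, taking the smallest integer satisfying each of (14) and (15) for δ = 0.05:

```python
import math

def n_chebyshev(eps, delta):
    # Smallest n satisfying bound (14): n >= 1/(4*delta*eps^2)
    return math.ceil(1 / (4 * delta * eps**2))

def n_hoeffding(eps, delta):
    # Smallest integer n with n > ln(2/delta)/(2*eps^2), i.e. bound (15)
    return math.floor(math.log(2 / delta) / (2 * eps**2)) + 1

# With delta = 0.05: n_hoeffding(0.001) and n_hoeffding(0.0001) give the
# two sample sizes quoted in the text; n_chebyshev(0.001) gives 5,000,000.
```

The comparison also illustrates the ratio 1/(2δ ln(2/δ)): for δ = 0.05 the Chebyshev-based size is roughly 2.7 times the Hoeffding-based one.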

To the best of our knowledge, the sample size bound (15) is the tightest one discovered so far under the assumption that 0 ≤ Xk ≤ 1 and E[Xk|k1]=μ almost surely for all k ∈ ℕ. In order to reduce the sample complexity, we propose to use the mixed error criterion, which can be viewed as a relaxation of the absolute error criterion. In this direction, we have exploited the application of Chebyshev’s inequality to establish the following result.

Theorem 2

If X1, ⋯, Xn are i.i.d. samples of X bounded in interval [0, 1], then (5) holds for all μ ∈ (0, 1) provided that λ = ε/η ≤ 1/2 and that

$$n > \frac{1-\lambda}{\delta\varepsilon\eta}. \tag{16}$$

See Appendix A for proof.
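For a numerical sense of (16) (our own check, using our reading of the bound): with ɛ = 0.001, η = 0.1 and δ = 0.05 we have λ = 0.01, and the bound evaluates to 0.99/(0.05 · 0.001 · 0.1) = 198,000, roughly a ninth of the 1,844,440 samples demanded by the absolute-error bound (15):

```python
import math

def n_theorem2(eps, eta, delta):
    # Smallest integer n with n > (1 - lambda)/(delta*eps*eta), lambda = eps/eta
    lam = eps / eta
    assert lam <= 0.5, "Theorem 2 requires lambda = eps/eta <= 1/2"
    return math.floor((1 - lam) / (delta * eps * eta)) + 1

n16 = n_theorem2(0.001, 0.1, 0.05)                          # mixed criterion
n15 = math.floor(math.log(2 / 0.05) / (2 * 0.001**2)) + 1   # absolute criterion (15)
```

Even this crude Chebyshev-based mixed bound already undercuts (15); the exponential bounds of the next results do far better still.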

The sample size formula (16) may be too conservative. To derive tighter sample size formulae, we need to use the martingale inequalities of exponential form presented in the last section. Throughout the remainder of this section, we make the following assumption:

X1, X2, ⋯ are random variables such that a ≤ Xk ≤ b and 𝔼[Xk | ℱk−1] = μ almost surely for all k ∈ ℕ.

In the case that X1, X2, ⋯ are nonnegative random variables, we have the following general result.

Theorem 3

Let 0 ≤ a < b. Assume that 0 < ɛ < b − a, η ∈ (0, 3/2), and a < ε/η < b. Define

$$N = \begin{cases} \dfrac{2}{\varepsilon^2}\left(\dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3}-a\right)\left(b-\dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3}\right)\ln\dfrac{2}{\delta} & \text{for } \dfrac{2ab}{a+b} \le \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} \le \dfrac{a+b}{2},\\[10pt] \dfrac{(b-a)^2}{2ab}\left(\dfrac{1}{\eta}+\dfrac{1}{3}\right)^2\ln\dfrac{2}{\delta} & \text{for } \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} < \dfrac{2ab}{a+b},\\[10pt] \dfrac{(b-a)^2}{2\varepsilon^2}\ln\dfrac{2}{\delta} & \text{for } \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} > \dfrac{a+b}{2}, \end{cases}$$

and

$$M = \begin{cases} \dfrac{2}{\varepsilon^2}\left(\dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3}-a\right)\left(b-\dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3}\right)\ln\dfrac{2}{\delta} & \text{for } \dfrac{2ab}{a+b} \le \dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3} \le \dfrac{a+b}{2},\\[10pt] \dfrac{(b-a)^2}{2ab}\left(\dfrac{1}{\eta}-\dfrac{1}{3}\right)^2\ln\dfrac{2}{\delta} & \text{for } \dfrac{\eta}{3} < \dfrac{b-a}{b+a} \text{ and } \dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3} < \dfrac{2ab}{a+b},\\[10pt] \left[\dfrac{2}{3\eta}\left(1-\dfrac{a}{b}\right)-\dfrac{2}{9}\right]\ln\dfrac{2}{\delta} & \text{for } \dfrac{b-a}{b+a} < \dfrac{\eta}{3} < \dfrac{b-a}{b} \text{ and } \dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3} \le \dfrac{a+b}{2},\\[10pt] 1 & \text{for } \dfrac{\eta}{3} \ge \dfrac{b-a}{b},\\[10pt] \dfrac{(b-a)^2}{2\varepsilon^2}\ln\dfrac{2}{\delta} & \text{otherwise.} \end{cases}$$

Then, ℙμ{|X̄n − μ| < ɛ or |X̄n − μ| < ημ} > 1 − δ for any μ ∈ (a, b) provided that n > max(N, M).

See Appendix B for proof. In Theorem 3, the purpose of assuming ɛ < b − a and a < ε/η < b is to make sure that the absolute error criterion is active for some μ ∈ (a, b) and that the relative error criterion is active for some μ ∈ (a, b). In Table 1, we list sample sizes for b = 1, ɛ = 0.001, η = 0.1, δ = 0.01 and various values of a, where Nmix denotes the sample sizes calculated by virtue of Theorem 3 and the mixed error criterion, and Nabs denotes the sample sizes obtained from the Chernoff–Hoeffding bound. More precisely,

$$N_{\mathrm{mix}} = \max(N, M)$$

and

$$N_{\mathrm{abs}} = \left\lceil \frac{(b-a)^2 \ln\frac{2}{\delta}}{2\varepsilon^2} \right\rceil, \tag{17}$$

where ⌈·⌉ denotes the ceiling function. It can be seen from the table that the sample complexity can be significantly reduced by using a mixed error criterion and our sample size formula.

Table 1.

Table of Sample Sizes

a Nmix Nabs a Nmix Nabs
0.001 97880 499001 0.006 40765 494019
0.002 87393 498002 0.007 34871 493025
0.003 76906 497005 0.008 30451 492032
0.004 66419 496008 0.009 27013 491041
0.005 55932 495013 0.01 24263 490050
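Part of the Nmix column can be recomputed from Theorem 3. The sketch below encodes our reading of the piecewise definitions of N and M (the case boundaries are reconstructed from the theorem statement, so treat them as assumptions); with it, the smallest integer exceeding max(N, M) reproduces the first five rows of Table 1:

```python
import math

def N_upper(a, b, eps, eta, delta):
    # Our reading of the piecewise definition of N in Theorem 3
    s, L = eps/eta + eps/3, math.log(2/delta)
    if s > (a + b) / 2:
        return (b - a)**2 / (2 * eps**2) * L
    if s < 2*a*b / (a + b):
        return (b - a)**2 / (2*a*b) * (1/eta + 1/3)**2 * L
    return 2 / eps**2 * (s - a) * (b - s) * L

def M_lower(a, b, eps, eta, delta):
    # Our reading of the piecewise definition of M in Theorem 3
    s, L = eps/eta - eps/3, math.log(2/delta)
    if 2*a*b/(a+b) <= s <= (a+b)/2:
        return 2 / eps**2 * (s - a) * (b - s) * L
    if eta/3 < (b-a)/(b+a) and s < 2*a*b/(a+b):
        return (b - a)**2 / (2*a*b) * (1/eta - 1/3)**2 * L
    if (b-a)/(b+a) < eta/3 < (b-a)/b and s <= (a+b)/2:
        return (2/(3*eta) * (1 - a/b) - 2/9) * L
    if eta/3 >= (b-a)/b:
        return 1.0
    return (b - a)**2 / (2 * eps**2) * L

def n_mix(a, b=1.0, eps=0.001, eta=0.1, delta=0.01):
    # Smallest integer n with n > max(N, M)
    return math.floor(max(N_upper(a, b, eps, eta, delta),
                          M_lower(a, b, eps, eta, delta))) + 1
```

For example, n_mix(0.001) returns 97880, matching the first table entry.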

As an immediate application of Theorem 3, we have the following result.

Corollary 1

Let ɛ and η be respectively the margins of absolute and relative error such that

$$\frac{\varepsilon}{\eta} + \frac{\varepsilon}{3} < \frac{1}{2}. \tag{18}$$

Assume that 0 ≤ Xk ≤ 1 almost surely for all k ∈ ℕ. Then, (5) holds for all μ ∈ (0, 1) provided that

$$n > 2\left(\frac{1}{\eta}+\frac{1}{3}\right)\left(\frac{1}{\varepsilon}-\frac{1}{\eta}-\frac{1}{3}\right)\ln\frac{2}{\delta}. \tag{19}$$

It should be noted that (18) can be readily satisfied in practice, since 0 < ɛ ≪ η < 1 in most applications.

An appealing feature of formula (19) is that the resultant sample size is much smaller as compared to that of (15) and (16). Moreover, to apply (19), no approximation is involved and no information of μ is needed. Furthermore, the samples need not be i.i.d.

Under the condition that 0 < ɛ ≪ η ≪ 1, the sample size bound of (19) can be approximated as

$$\frac{2\ln\frac{2}{\delta}}{\varepsilon\eta},$$

which indicates that the required sample size is roughly inversely proportional to the product of the margins of absolute and relative error. It can be shown that the ratio of the sample size bound of (19) to the bound of (16) converges to 0 as δ decreases to 0, which implies that the bound of (19) is better for small δ.
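The quality of this approximation is easy to verify; the check below is ours, with parameter choices picked for illustration:

```python
import math

def bound_19(eps, eta, delta):
    # Exact sample size bound of (19)
    return 2 * (1/eta + 1/3) * (1/eps - 1/eta - 1/3) * math.log(2/delta)

def bound_approx(eps, eta, delta):
    # Approximation 2*ln(2/delta)/(eps*eta), valid when eps << eta << 1
    return 2 * math.log(2/delta) / (eps * eta)

eps, eta, delta = 1e-5, 0.1, 0.01
exact = bound_19(eps, eta, delta)
approx = bound_approx(eps, eta, delta)
```

For these values the approximation agrees with the exact bound to within a few percent, while both are orders of magnitude below the absolute-criterion bound (15).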

The comparison of sample size formulae (15) and (19) is shown in Figure 1, where it can be seen that the sample size formula (19) leads to a substantial reduction in sample complexity as compared to (15).

Figure 1. Comparison of Sample Sizes.

To obtain more insight into such a reduction of sample size, we shall investigate the ratio of the sample sizes, which is given as

$$\frac{\ln\frac{2}{\delta}/(2\varepsilon^2)}{2\left(\frac{1}{\eta}+\frac{1}{3}\right)\left(\frac{1}{\varepsilon}-\frac{1}{\eta}-\frac{1}{3}\right)\ln\frac{2}{\delta}} = \frac{1}{4\left(\lambda+\frac{\varepsilon}{3}\right)\left(1-\lambda-\frac{\varepsilon}{3}\right)}.$$

Let ɛ ∈ (0, 1) and η ∈ (0, 1) such that (18) holds. When no information of μ is available except that μ is known to be bounded in (0, 1), the best known sample size bound is given by (15), which asserts that (3) holds for any μ ∈ (0, 1) provided that (15) holds. According to Corollary 1, we have that (5) holds for any μ ∈ (0, 1) provided that (19) is true. In view of (15) and (19), the ratio of the sample sizes tends to

$$R(\lambda) \;\overset{\mathrm{def}}{=}\; \frac{1}{4\lambda(1-\lambda)}$$

as ɛ → 0 under the restriction that λ = ε/η is fixed.

From Figure 2, it can be seen that the limiting ratio, R(λ), of sample sizes is substantially greater than 1 for small λ > 0. For example, if ɛ = 10−5 and η = 0.1, we have λ = ε/η = 10−4 and R(λ) ≈ 2500. This demonstrates that the required sample size can be significantly reduced by virtue of a mixed error criterion. As mentioned earlier, for small η (e.g. η = 0.1), the requirement (5) can be viewed as a slight relaxation of the requirement (3). Our analysis indicates that such a slight relaxation is well worth the significant reduction in sample complexity.

Figure 2. Limit of Ratio of Sample Sizes.

In Theorem 3, the random variables X1, X2, ⋯ are assumed to be non-negative. In light of the fact that, in some situations, the random variables may assume positive or negative values, we have derived an explicit sample size formula in the following result.

Theorem 4

Let a < 0 < b. Assume that 0 < ɛ < b − a, η ∈ (0, 3/2), and ɛ < η max(|a|, b). Define

$$M = \begin{cases} \dfrac{2}{\varepsilon^2}\left[|a+b|\left(\dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3}\right)-\left(\dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3}\right)^2-ab\right]\ln\dfrac{2}{\delta} & \text{for } \dfrac{|a+b|}{2} > \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3},\\[10pt] \dfrac{(b-a)^2}{2\varepsilon^2}\ln\dfrac{2}{\delta} & \text{otherwise.} \end{cases}$$

Then, ℙμ{|X̄n − μ| < ɛ or |X̄n − μ| < η|μ|} > 1 − δ for any μ ∈ (a, b) provided that n > M.

See Appendix C for proof.

It should be noted that the advantage of using the mixed error criterion is more pronounced if the interval [a, b] contains 0 and is more asymmetrical about 0. As an illustration, consider the configuration with ɛ = 0.1, η = 0.1 and δ = 0.05. Assume that the lower bound, a, of the interval is fixed as −1 and the upper bound, b, of the interval is a parameter. From formula (15), we know that the sample size required to ensure ℙμ{|X̄n − μ| < ɛ} ≥ 1 − δ for any μ ∈ [a, b] can be obtained from (17).

According to Theorem 4, the sample size required to ensure ℙμ{|X̄n − μ| < ɛ or |X̄n − μ| < η|μ|} ≥ 1 − δ for any μ ∈ [a, b] can be calculated as

$$N_{\mathrm{mix}} = \frac{2}{\varepsilon^2}\left[|a+b|\left(\frac{\varepsilon}{\eta}+\frac{\varepsilon}{3}\right)-\left(\frac{\varepsilon}{\eta}+\frac{\varepsilon}{3}\right)^2-ab\right]\ln\frac{2}{\delta}.$$

Since a is fixed, the ratio Nabs/Nmix of sample sizes is a function of b. This function is shown in Figure 3, from which it can be seen that the larger b is, the greater the reduction of sample size that can be achieved by virtue of a mixed error criterion.

Figure 3. Ratio of Sample Sizes (ɛ = 0.1, η = 0.1, δ = 0.05 and a = −1).

4 Estimating the Difference of Two Population Means

Our method can be extended to the estimation of the difference of means of bounded random variables. Let Y and Z be two bounded random variables such that 𝔼[Y] = μY and 𝔼[Z] = μZ. Let X = Y − Z and μ = μY − μZ. Let Y1, ⋯, Yn be i.i.d. samples of Y, and let Z1, ⋯, Zn be i.i.d. samples of Z. Assume that the samples of Y and Z are independent. Let Xi = Yi − Zi for i = 1, 2, ⋯, n. Then, X1, ⋯, Xn are i.i.d. samples of X. Clearly, X is a bounded random variable, and so are X1, ⋯, Xn. Define

$$\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad \bar{Y}_n = \frac{\sum_{i=1}^{n} Y_i}{n}, \qquad \bar{Z}_n = \frac{\sum_{i=1}^{n} Z_i}{n}.$$

Then, X̄n = Ȳn − Z̄n is an estimator for μ = μY − μZ. We can apply the sample size methods proposed in Section 3 to determine n such that

$$\mathbb{P}_\mu\{|\bar{X}_n - \mu| < \varepsilon \;\text{ or }\; |\bar{X}_n - \mu| < \eta|\mu|\} > 1 - \delta.$$

To illustrate, consider an example with Y bounded in [0, 10] and Z bounded in [0, 1]. Assume that ɛ = 0.1, η = 0.1 and δ = 0.05. Since X = Y − Z is a random variable bounded in the interval [−1, 10], from the discussion in the last section, it can be seen that Theorem 4 can be employed to obtain the minimum sample size as 13,408.

5 Illustrations

In this section, we shall illustrate the applications of our sample size formulae by examples in control and telecommunication engineering.

An extremely important problem of control engineering is to determine the probability that a system will fail to satisfy pre-specified requirements in an uncertain environment. This critical issue has been extensively studied in an area referred to as probabilistic robustness analysis (see, e.g., [14, 15, 21] and the references therein). In general, there is no effective deterministic method for computing such a failure probability except the Monte Carlo estimation method. To estimate the probability of failure, the uncertain environment is modeled by a random variable Δ, which may be scalar or matrix-valued. Hence, a Bernoulli random variable X can be defined as a function X(Δ) of Δ such that X = X(Δ) assumes value 1 if the system associated with Δ fails to satisfy pre-specified requirements and assumes value 0 otherwise. Clearly, the failure probability p is equal to the mean of X. That is, p = 𝔼[X] = 𝔼[X(Δ)]. For estimating the failure probability p, randomized algorithms have been implemented in a widely used software package RACT [22], in which an absolute error criterion is used for estimating p. Specifically, for a priori ɛ, δ ∈ (0, 1), the objective is to obtain an estimator p̂ such that ℙ{|p̂ − p| < ɛ} > 1 − δ holds regardless of the value of p ∈ (0, 1). The estimator is defined as

$$\hat{p} = \frac{1}{N}\sum_{i=1}^{N} X(\Delta_i),$$

where N is the sample size and Δ1, Δ2, ⋯, ΔN are i.i.d. samples of Δ. In most situations, there is no useful information about the range of the failure probability p due to the complexity of the system. Therefore, the determination of the sample size N should not depend on the range of p. It is well-known that, to make ℙ{|p̂ − p| < ɛ} > 1 − δ for any p ∈ (0, 1), an approximate sample size based on normal approximation is

$$N = \frac{Z_{\delta/2}^2}{4\varepsilon^2}, \tag{20}$$

where Zδ/2 is the critical value such that

$$\int_{Z_{\delta/2}}^{\infty} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)dx = \frac{\delta}{2}.$$

The approximate sample size formula (20) will inevitably lead to unknown statistical error, since it is based on the central limit theorem, which is an asymptotic result. In view of this drawback, control theorists and practitioners are reluctant to use the approximate formula (20). To rigorously control the statistical accuracy of the estimation, the Chernoff–Hoeffding bound is most frequently used in control engineering for the determination of sample size. To ensure that ℙ{|p̂ − p| < ɛ} > 1 − δ holds for any p ∈ (0, 1), it suffices to take sample size

$$N = \frac{\ln\frac{2}{\delta}}{2\varepsilon^2}. \tag{21}$$

The ratio of the sample size (21) to the sample size (20) is approximately equal to $\frac{2\ln\frac{2}{\delta}}{Z_{\delta/2}^2}$, which tends to 1 as δ → 0. It can be shown that

$$\frac{2\ln\frac{2}{\delta}}{Z_{\delta/2}^2} < \frac{3}{2} \quad \text{for sufficiently small } \delta.$$

This indicates that, for small δ, the ratio of the rigorous sample size (21) to the approximate sample size (20) does not exceed 3/2. From this analysis, it can be seen that it is worthwhile to obtain rigorous control of the statistical accuracy by using the sample size (21), at the price of increasing the computational complexity by up to 50%. This explains why the sample size (21) is frequently used in control engineering. As a matter of fact, the sample size formula (21) is implemented in RACT to estimate the failure probability.
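This comparison can be checked numerically with the standard normal quantile. The snippet below is our own check; it uses Python's statistics.NormalDist to obtain Z_{δ/2}:

```python
import math
from statistics import NormalDist

def ratio_chernoff_to_normal(delta):
    # Ratio of the Chernoff-Hoeffding sample size (21) to the
    # normal-approximation sample size (20)
    z = NormalDist().inv_cdf(1 - delta / 2)   # critical value Z_{delta/2}
    return 2 * math.log(2 / delta) / z**2

# The ratio decreases toward 1 as delta -> 0.
```

For instance, at δ = 10⁻³ the ratio is already below 3/2, and it keeps shrinking toward 1 for smaller δ.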

In control engineering, the absolute error criterion is widely used. Recall that in Section 3, we have shown that a much smaller sample size is sufficient if a mixed error criterion is used. More specifically, the sample size can be significantly reduced by letting η ∈ (0, 1) and relaxing the requirement ℙ{|p̂ − p| < ɛ} > 1 − δ to

$$\mathbb{P}\{|\hat{p}-p| < \varepsilon \;\text{ or }\; |\hat{p}-p| < \eta p\} > 1-\delta.$$

In many situations, the margin of absolute error ɛ needs to be very small (e.g., ɛ ≪ 0.1), since p is usually a very small number. However, the margin of relative error η does not need to be extremely small. For example, η = 0.1 may be sufficient for most cases.

As a concrete illustrative example, consider an uncertain dynamic system described by the differential equation

$$\frac{d^3 y(t)}{dt^3} + q_1\frac{d^2 y(t)}{dt^2} + q_2 q_3\frac{dy(t)}{dt} + q_2 y(t) = u(t),$$

where u(t) is the input, y(t) is the output, and q1, q2, q3 are uncertain parameters. Assume that the tuple (q1, q2, q3) is uniformly distributed over the domain

$$|1-q_1| \le 1.1, \qquad |1-q_2| \le 1, \qquad |1-q_3| \le 0.5.$$

According to control theory, the system is said to be stable if the output is bounded for any bounded input. It can be shown that such a stability criterion is satisfied if and only if all the roots of the polynomial equation

$$s^3 + q_1 s^2 + q_2 q_3 s + q_2 = 0 \tag{22}$$

with respect to s in the field of complex numbers have negative real parts (see, e.g., Section 3.6 of [9] for an explanation of the concept of stability). Since the roots of equation (22) are functions of the random variables q1, q2 and q3, a Bernoulli random variable X can be defined in terms of q1, q2 and q3 such that X assumes value 0 if all the roots have negative real parts, and otherwise X assumes value 1. For this particular example, we are interested in estimating the probability that the system is unstable. This amounts to the estimation of the probability that the Bernoulli random variable X assumes value 1. Since X is bounded in interval [0, 1], our sample size formula can be useful for the planning of the Monte Carlo experiment. Let δ = 10−3. If the margin of absolute error is ɛ = 10−3, then the sample size is obtained by (21) as 3,800,452. If we use a mixed criterion with η = 0.1 and the same ɛ and δ, then the sample size can be computed by (19) as 155,463, which is only about 4% of the sample size for the absolute criterion. The estimate of the probability of instability is obtained as 0.5403.
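The two sample sizes quoted here follow from (21) and (19) respectively; a quick check of the arithmetic (ours):

```python
import math

delta, eps, eta = 1e-3, 1e-3, 0.1
L = math.log(2 / delta)

n_abs = math.ceil(L / (2 * eps**2))                                    # formula (21)
n_mix = math.floor(2 * (1/eta + 1/3) * (1/eps - 1/eta - 1/3) * L) + 1  # formula (19)
```

The mixed criterion cuts the Monte Carlo budget by a factor of roughly 24 in this example.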

In wireless data communications, a frequent problem is to evaluate the bit error rate of a data transmission scheme. The bit error rate is the probability that a bit is transmitted incorrectly. In many situations, due to the complexity of the transmission system, the only tool to obtain the bit error rate is the Monte Carlo simulation method. For example, there is no exact analytical method for computing the bit error rate of a wireless data transmission system employing multiple antennas and space-time block codes. The principle of this transmission system is proposed in [1] (see, e.g., [16] and the references therein for a comprehensive discussion). The wireless data transmission process can be modeled by a sequence of Bernoulli random variables X1, X2, ⋯, where Xi assumes value 0 or 1 according to whether the i-th bit is transmitted correctly or incorrectly. If X1, X2, ⋯ are independent and identically distributed Bernoulli random variables with the same mean μ ∈ (0, 1), then the bit error rate is μ and its estimator can be taken as $\sum_{i=1}^{n} X_i/n$ with n being sufficiently large. However, as a consequence of the application of the space-time block codes, the random variables X1, X2, ⋯ are not independent. This gives rise to the following question:

Is it possible to estimate the bit error rate without the independence of the random variables X1, X2, ⋯?

In a wireless data transmission system employing multiple antennas and space-time block codes, the expectation of Xk conditioned upon Xℓ, ℓ < k, is a constant μ with respect to k, since the noise process is stationary and the input data can be treated as a Bernoulli process [1, 16]. This implies that it is reasonable to treat X1, X2, ⋯ as a martingale process such that condition (2) is satisfied. Hence, despite the lack of independence, the bit error rate can be approximated by $\sum_{i=1}^{n} X_i/n$. To control the statistical error, the sample size method proposed in the previous section can be applied to determine the appropriate value of n.

6 Concluding Remarks

In this paper, we have considered the problem of estimating means of bounded random variables. We have illustrated that in many applications, it may be more appropriate to use a mixed error criterion for quantifying the reliability of estimation. We demonstrated that, as a consequence of using the mixed error criterion, the sample complexity can be substantially reduced. By virtue of probabilistic inequalities, we have developed explicit sample size formulae for the purpose of controlling the statistical error of estimation. We have attempted to make our results generally applicable by eliminating the need for an i.i.d. assumption on the samples and for knowledge of the form of the underlying distributions.

Research highlights.

  • A rigorous sample size method for estimating the mean of a bounded random variable.

  • It requires neither knowledge of the underlying distribution nor an i.i.d. condition on the samples.

  • It involves no approximation.

  • Sample complexity can be significantly reduced by using a mixed error criterion.

  • Explicit sample size formulae to ensure the statistical accuracy of estimation.

Acknowledgments

The authors would like to thank the Associate Editor and referees for their time, effort and comments in reviewing this paper.

This research is supported in part by NIH/NCI Grants No. 1 P01 CA116676, P30 CA138292-01, and 5 P50 CA128613.

A Proof of Theorem 2

Note that

$$\mathbb{P}_\mu\{|\bar{X}_n-\mu| \ge \varepsilon,\; |\bar{X}_n-\mu| \ge \eta\mu\} = \mathbb{P}_\mu\{|\bar{X}_n-\mu| \ge \max(\varepsilon,\eta\mu)\}. \tag{23}$$

Since X1, ⋯, Xn are i.i.d. samples of X, it follows from (23) and Chebyshev’s inequality that

$$\mathbb{P}_\mu\{|\bar{X}_n-\mu| \ge \varepsilon,\; |\bar{X}_n-\mu| \ge \eta\mu\} \le \frac{\mathbb{V}(X)}{n[\max(\varepsilon,\eta\mu)]^2}, \tag{24}$$

where 𝕍(X) denotes the variance of X. Since 0 ≤ X ≤ 1 almost surely and 𝔼[X] = μ, it must be true that

$$\mathbb{V}(X) \le \mu(1-\mu). \tag{25}$$

Combining (24) and (25) yields

$$\mathbb{P}_\mu\{|\bar{X}_n-\mu| \ge \varepsilon,\; |\bar{X}_n-\mu| \ge \eta\mu\} \le \frac{Q(\mu)}{n}, \tag{26}$$

where

$$Q(\mu) = \frac{\mu(1-\mu)}{[\max(\varepsilon,\eta\mu)]^2}$$

for μ ∈ (0, 1). Now we investigate the maximum of Q(μ) for μ ∈ (0, 1) by considering two cases as follows.

  • Case (i): 0 ≤ μ ≤ λ.

  • Case (ii): λ < μ ≤ 1.

In Case (i), we have 0 ≤ μ ≤ λ = ε/η ≤ 1/2 and

$$Q(\mu) = \frac{\mu(1-\mu)}{\varepsilon^2} \le \frac{\lambda(1-\lambda)}{\varepsilon^2} = \frac{1-\lambda}{\varepsilon\eta}, \tag{27}$$

where we have used the fact that μ(1 − μ) is increasing with respect to μ ∈ (0, 1/2). In Case (ii), we have λ < μ ≤ 1 and

$$Q(\mu) = \frac{\mu(1-\mu)}{(\eta\mu)^2} = \frac{\frac{1}{\mu}-1}{\eta^2} \le \frac{\frac{1}{\lambda}-1}{\eta^2} = \frac{1-\lambda}{\varepsilon\eta}. \tag{28}$$

In view of (27) and (28), we have

$$Q(\mu) \le \frac{1-\lambda}{\varepsilon\eta}, \qquad \mu \in [0,1]. \tag{29}$$

Making use of (26) and (29), we have

$$\mathbb{P}_\mu\{|\bar{X}_n-\mu| \ge \varepsilon,\; |\bar{X}_n-\mu| \ge \eta\mu\} \le \frac{1-\lambda}{n\varepsilon\eta}, \qquad \mu \in [0,1],$$

from which the theorem immediately follows. This completes the proof of Theorem 2.
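The key bound (29) is simple to verify numerically on a grid (our own check); the maximum of Q is attained at μ = λ, where Q(λ) = (1 − λ)/(ɛη) exactly:

```python
eps, eta = 0.001, 0.1
lam = eps / eta
bound = (1 - lam) / (eps * eta)        # right-hand side of (29)

def Q(mu):
    # The function Q from the proof of Theorem 2
    return mu * (1 - mu) / max(eps, eta * mu)**2

# Scan a fine grid over (0, 1); no grid point exceeds the bound.
grid_max = max(Q(i / 10**5) for i in range(1, 10**5))
```

For these margins the bound equals 9900, and the grid maximum (attained near μ = 0.01) matches it to floating-point accuracy.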

Throughout the proofs of Theorems 3 and 4, we shall use the following definitions. Let

$$\theta = \frac{\mu-a}{b-a}.$$

Let ℙθ denote the probability measure associated with θ. Define

$$Y_k = \frac{X_k-a}{b-a}, \qquad \bar{Y}_k = \frac{\sum_{i=1}^{k} Y_i}{k},$$

where X1, X2, ⋯ are random variables such that a ≤ Xk ≤ b and 𝔼[Xk | ℱk−1] = μ almost surely for all k ∈ ℕ.

B Proof of Theorem 3

To prove the theorem, we need some preliminary results.

Lemma 1

Let ζ ∈ (0, 1). Define

$$Q_1(\theta) = \begin{cases} \exp(-n\varphi(\zeta,\theta)) & \text{for } \theta \in \left(0,\; 1-\frac{\zeta}{3}\right),\\[4pt] 0 & \text{for } \theta \in \left[1-\frac{\zeta}{3},\; 1\right). \end{cases}$$

Then, ℙθ{Ȳn ≥ θ + ζ} ≤ Q1(θ) for θ ∈ (0, 1). Moreover, Q1(θ) is increasing with respect to θ ∈ (0, 1/2 − ζ/3) and non-increasing with respect to θ ∈ (1/2 − ζ/3, 1).

Proof

For θ ∈ (0, 1 − ζ/3), we have 0 < ζ < 3(1 − θ), so it follows from (8) that ℙθ{Ȳn ≥ θ + ζ} ≤ exp(−nφ(ζ, θ)). For θ ∈ [1 − ζ/3, 1), we have θ + ζ > 1 and consequently,

$$\mathbb{P}_\theta\{\bar{Y}_n \ge \theta+\zeta\} \le \mathbb{P}_\theta\{\bar{Y}_n > 1\} = 0.$$

Thus, we have shown that ℙθ{Ȳn ≥ θ + ζ} ≤ Q1(θ) for θ ∈ (0, 1). To establish the monotonicity of Q1(θ), it is sufficient to observe that

$$\frac{\partial\varphi(\zeta,\theta)}{\partial\theta} = \frac{\zeta^2}{2\left(\theta+\frac{\zeta}{3}\right)^2\left(1-\theta-\frac{\zeta}{3}\right)^2}\left[2\left(\theta+\frac{\zeta}{3}\right)-1\right],$$

which is negative for any θ ∈ (0, 1/2 − ζ/3) and positive for any θ ∈ (1/2 − ζ/3, 1).      □

Lemma 2

Let ζ ∈ (0, 1). Define

$$Q_2(\theta) = \begin{cases} \exp(-n\varphi(-\zeta,\theta)) & \text{for } \theta \in \left(\frac{\zeta}{3},\; 1\right),\\[4pt] 0 & \text{for } \theta \in \left(0,\; \frac{\zeta}{3}\right]. \end{cases}$$

Then, ℙθ{Ȳn ≤ θ − ζ} ≤ Q2(θ) for θ ∈ (0, 1). Moreover, Q2(θ) is non-decreasing with respect to θ ∈ (0, 1/2 + ζ/3) and decreasing with respect to θ ∈ (1/2 + ζ/3, 1).

Proof

For θ ∈ (ζ/3, 1), we have 0 < ζ < 3θ, so it follows from (9) that ℙθ{Ȳn ≤ θ − ζ} ≤ exp(−nφ(−ζ, θ)). For θ ∈ (0, ζ/3], we have θ − ζ < 0 and consequently,

$$\mathbb{P}_\theta\{\bar{Y}_n \le \theta-\zeta\} \le \mathbb{P}_\theta\{\bar{Y}_n < 0\} = 0.$$

Thus, we have shown that ℙθ{Ȳn ≤ θ − ζ} ≤ Q2(θ) for θ ∈ (0, 1). To establish the monotonicity of Q2(θ), it is sufficient to observe that

$$\frac{\partial\varphi(-\zeta,\theta)}{\partial\theta} = \frac{\zeta^2}{2\left(\theta-\frac{\zeta}{3}\right)^2\left(1-\theta+\frac{\zeta}{3}\right)^2}\left[2\left(\theta-\frac{\zeta}{3}\right)-1\right],$$

which is negative for any θ ∈ (ζ/3, 1/2 + ζ/3) and positive for any θ ∈ (1/2 + ζ/3, 1).      □

Lemma 3

Let −3/η < c ≤ 0. Define

$$r^* = \frac{3+\eta c}{3+\eta}, \qquad \nu^* = \frac{1}{3+\eta}\left(\eta c + \frac{3c}{2c-1}\right)$$

and

$$Q_3(\theta) = \begin{cases} \exp(-n\varphi(\eta(\theta-c),\theta)) & \text{for } \theta \in (0, r^*),\\[4pt] 0 & \text{for } \theta \in [r^*, 1). \end{cases}$$

Then, the following assertions hold.

  1. ℙθ{Ȳn ≥ θ + η(θ − c)} ≤ Q3(θ) for θ ∈ (0, 1).

  2. If ν* > 0, then Q3(θ) is increasing with respect to θ ∈ (0, ν*) and non-increasing with respect to θ ∈ (ν*, 1).

  3. If ν* ≤ 0, then Q3(θ) is non-increasing with respect to θ ∈ (0, 1).

Proof

To show assertion (I), note that θ + η(θ − c) > 1 for θ ∈ [r*, 1). Consequently,

$$\mathbb{P}_\theta\{\bar{Y}_n \ge \theta+\eta(\theta-c)\} \le \mathbb{P}_\theta\{\bar{Y}_n > 1\} = 0$$

for θ ∈ [r*, 1). On the other hand, 0 < η(θ − c) < 3(1 − θ) for θ ∈ (0, r*). Hence, it follows from inequality (8) that

$$\mathbb{P}_\theta\{\bar{Y}_n \ge \theta+\eta(\theta-c)\} \le \exp(-n\varphi(\eta(\theta-c),\theta)) = \exp\left(-\frac{n\eta^2}{2}g(\theta)\right) \quad \text{for } \theta \in (0, r^*),$$

where

$$g(\theta) = \frac{(\theta-c)^2}{\rho(\theta)[1-\rho(\theta)]}$$

with ρ(θ) = θ + (η/3)(θ − c). Clearly, 0 < r* < 1 and 0 < ρ(θ) < 1 for θ ∈ (0, r*). This proves assertion (I).

To show assertions (II) and (III), consider the derivative of g(θ) with respect to θ. Noting that ρ′(θ) = 1 + η/3 and (1 + η/3)(θ − c) = ρ(θ) − c, we have

$$g'(\theta) = \frac{2(\theta-c)\rho(\theta)[1-\rho(\theta)] - (\theta-c)^2\rho'(\theta)[1-2\rho(\theta)]}{[\rho(\theta)]^2[1-\rho(\theta)]^2} = \frac{2(\theta-c)}{[\rho(\theta)]^2[1-\rho(\theta)]^2}\left[\left(\frac{1}{2}-c\right)\rho(\theta)+\frac{c}{2}\right]. \tag{30}$$

Since θ − c > 0 for θ ∈ (0, 1), it follows from (30) that g′(θ) ≥ 0 if and only if

$$\left(\frac{1}{2}-c\right)\rho(\theta)+\frac{c}{2} \ge 0,$$

which is equivalent to θ ≥ ν*. As a consequence of c < 1, we have ν* < r*. It follows that assertions (II) and (III) hold.      □

Lemma 4

Define

$$N = \begin{cases} \dfrac{2}{\varepsilon^2}\left(\dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3}-a\right)\left(b-\dfrac{\varepsilon}{\eta}-\dfrac{\varepsilon}{3}\right)\ln\dfrac{2}{\delta} & \text{for } \dfrac{2ab}{a+b} \le \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} \le \dfrac{a+b}{2},\\[10pt] \dfrac{(b-a)^2}{2ab}\left(\dfrac{1}{\eta}+\dfrac{1}{3}\right)^2\ln\dfrac{2}{\delta} & \text{for } \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} < \dfrac{2ab}{a+b},\\[10pt] \dfrac{(b-a)^2}{2\varepsilon^2}\ln\dfrac{2}{\delta} & \text{for } \dfrac{\varepsilon}{\eta}+\dfrac{\varepsilon}{3} > \dfrac{a+b}{2}. \end{cases}$$

Then, ℙμ{X̄n ≥ μ + max(ε, ημ)} ≤ δ/2 for all μ ∈ (a, b) provided that n > N.

Proof

For simplicity of notation, define ζ = ε/(b − a), λ = ε/η, c = a/(a − b),

$$p^* = \frac{1}{2}-\frac{\zeta}{3}, \qquad \theta^* = \frac{\lambda}{b-a}+c, \qquad \nu^* = \frac{1}{3+\eta}\left(\eta c+\frac{3c}{2c-1}\right),$$

and Q+(θ) = ℙθ{Ȳn ≥ θ + max(ζ, η(θ − c))} for θ ∈ (0, 1). Then, with θ = (μ − a)/(b − a), we have ℙμ{X̄n ≥ μ + max(ε, ημ)} = Q+(θ). It suffices to show the lemma for the following three cases.

  • Case (1): 2ab/(a + b) ≤ ε/η + ε/3 ≤ (a + b)/2.

  • Case (2): ε/η + ε/3 < 2ab/(a + b).

  • Case (3): ε/η + ε/3 > (a + b)/2.

First, consider Case (1). As a consequence of ε/η + ε/3 ≤ (a + b)/2, we have p* ≥ θ*. As a consequence of 2ab/(a + b) ≤ ε/η + ε/3, we have θ* ≥ ν*. Therefore, p* ≥ θ* ≥ ν*. Since 1/2 > p* ≥ θ* > 0, it follows from Lemma 1 that Q1(θ) is increasing for θ ∈ (0, θ*]. Hence,

$$Q_+(\theta) \le Q_1(\theta) \le Q_1(\theta^*) = Q_3(\theta^*)$$

for θ ∈ (0, θ*]. Since θ* ≥ ν*, it follows from Lemma 3 that Q3(θ) is decreasing for θ ∈ [θ*, 1). Hence,

$$Q_+(\theta) \le Q_3(\theta) \le Q_3(\theta^*)$$

for θ ∈ [θ*, 1). Therefore, Q+(θ) ≤ δ/2 for θ ∈ (0, 1) provided that Q3(θ*) ≤ δ/2. Observing that

$$Q_3(\theta^*) = \exp(-n\varphi(\zeta,\theta^*)),$$

we have that Q+(θ) ≤ δ/2 for θ ∈ (0, 1) provided that

$$n > \frac{\ln\frac{2}{\delta}}{\varphi(\zeta,\theta^*)} = \frac{2}{\zeta^2}\left(\theta^*+\frac{\zeta}{3}\right)\left(1-\theta^*-\frac{\zeta}{3}\right)\ln\frac{2}{\delta} = \frac{2}{\varepsilon^2}\left(\frac{\varepsilon}{\eta}+\frac{\varepsilon}{3}-a\right)\left(b-\frac{\varepsilon}{\eta}-\frac{\varepsilon}{3}\right)\ln\frac{2}{\delta}.$$

Next, consider Case (2). As a consequence of ε/η + ε/3 < 2ab/(a + b), we have θ* < min(p*, ν*). Since 1/2 > p* > θ* > 0, it follows from Lemma 1 that Q1(θ) is increasing for θ ∈ (0, θ*]. Hence,

$$Q_+(\theta) \le Q_1(\theta) \le Q_1(\theta^*) = Q_3(\theta^*) \le Q_3(\nu^*)$$

for θ ∈ (0, θ*]. Since θ* < ν*, it follows from Lemma 3 that Q3(θ) is increasing for θ ∈ [θ*, ν*) and is decreasing for θ ∈ [ν*, 1). Hence,

$$Q_+(\theta) \le Q_3(\theta) \le Q_3(\nu^*)$$

for θ ∈ [θ*, 1). Therefore, Q+(θ) ≤ δ/2 for θ ∈ (0, 1) provided that Q3(ν*) ≤ δ/2. Since Q3(ν*) = exp(−nφ(η(ν* − c), ν*)), it follows that Q+(θ) ≤ δ/2 for θ ∈ (0, 1) provided that

$$n > \frac{\ln\frac{2}{\delta}}{\varphi(\eta(\nu^*-c),\nu^*)} = \frac{(b-a)^2}{2ab}\left(\frac{1}{\eta}+\frac{1}{3}\right)^2\ln\frac{2}{\delta},$$

where we have used the definitions of ν* and c.

Finally, consider Case (3). In this case, we have Q+(θ) ≤ ℙθ{Ȳn ≥ θ + ζ} ≤ exp(−2nζ²). Therefore, Q+(θ) ≤ δ/2 provided that

n > ln(2/δ)/(2ζ²) = ((b − a)²/(2ε²)) ln(2/δ).

This completes the proof of the lemma.      □
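The piecewise bound N of Lemma 4 is straightforward to evaluate numerically. The sketch below is our own (the function name and illustrative values are not from the paper); it assumes 0 < a < b so that the case thresholds 2ab/(a+b) and (a+b)/2 are meaningful.

```python
import math

def sample_size_N(a, b, eps, eta, delta):
    """Piecewise sample-size bound N of Lemma 4 (assumes 0 < a < b)."""
    t = eps / eta + eps / 3.0          # the recurring quantity eps/eta + eps/3
    log_term = math.log(2.0 / delta)
    if t < 2.0 * a * b / (a + b):
        return (b - a) ** 2 / (2.0 * a * b) * (1.0 / eta + 1.0 / 3.0) ** 2 * log_term
    if t > (a + b) / 2.0:
        return (b - a) ** 2 / (2.0 * eps ** 2) * log_term
    # middle case: 2ab/(a+b) <= t <= (a+b)/2
    return 2.0 / eps ** 2 * (t - a) * (b - t) * log_term
```

At the boundary ε/η + ε/3 = (a+b)/2 the middle expression coincides with ((b − a)²/(2ε²)) ln(2/δ), since (t − a)(b − t) = ((b − a)/2)² there, so the bound is continuous across the cases.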

Lemma 5

Let c ≤ 0. Define

r* = −ηc/(3 − η),   ν* = (3c/(2c − 1) − ηc)/(3 − η)

and

Q4(θ) = exp(−nφ(η(c − θ), θ))   for θ ∈ (r*, 1);
Q4(θ) = 0                       for θ ∈ (0, r*).

Then, the following assertions hold.

  1. ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ Q4(θ) for θ ∈ (0, 1).

  2. If ν* ≤ 1, then Q4(θ) is non-decreasing with respect to θ ∈ (0, ν*) and decreasing with respect to θ ∈ (ν*, 1).

  3. If ν* > 1, then Q4(θ) is non-decreasing with respect to θ ∈ (0, 1).

Proof

Clearly, r* ≥ 0. To show assertion (I), note that θ − η(θ − c) < 0 for θ ∈ (0, r*). It follows that

ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ ℙθ{Ȳn < 0} = 0

for θ ∈ (0, r*). On the other hand, since 0 < η(θ − c) < 3θ for θ ∈ (r*, 1), it follows from (9) that

ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ exp(−nφ(η(c − θ), θ)) = exp(−(nη²/2) g(θ)),

where

g(θ) = (θ − c)²/{ρ(θ)[1 − ρ(θ)]}

with ρ(θ) = θ − (η/3)(θ − c). Clearly, ρ(θ) < θ < 1. Since θ > r*, we have ρ(θ) > 0. Hence, 0 < ρ(θ) < 1 for θ ∈ (r*, 1). This establishes assertion (I).

To show assertions (II) and (III), consider the derivative of g(θ) with respect to θ. Tedious computation shows that

g′(θ) = {2(θ − c) / [ρ(θ)]²[1 − ρ(θ)]²} [(1/2 − c)ρ(θ) + c/2].   (31)

Since θ − c > 0 for θ ∈ (0, 1), it follows from (31) that g′(θ) ≥ 0 if and only if (1/2 − c)ρ(θ) + c/2 ≥ 0, which is equivalent to θ ≥ ν*. Direct computation shows that 0 < r* < ν*. It follows that assertions (II) and (III) hold.      □
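The equivalence used above, (1/2 − c)ρ(θ) + c/2 ≥ 0 ⟺ θ ≥ ν*, can be spot-checked numerically. The sketch below is our own illustration (function names are ours, not from the paper); it simply evaluates both sides of the equivalence on a grid for c ≤ 0 and η ∈ (0, 1).

```python
def nu_star(c, eta):
    """nu* of Lemma 5: (3c/(2c-1) - eta*c)/(3 - eta), for c <= 0."""
    return (3.0 * c / (2.0 * c - 1.0) - eta * c) / (3.0 - eta)

def sign_matches(c, eta, steps=200):
    """Check (1/2 - c)*rho(theta) + c/2 >= 0  <=>  theta >= nu*, with
    rho(theta) = theta - (eta/3)*(theta - c), on a grid of theta in (0, 1)."""
    nu = nu_star(c, eta)
    for k in range(1, steps):
        theta = k / steps
        if abs(theta - nu) < 1e-9:
            continue  # skip the boundary point to avoid floating-point ties
        rho = theta - (eta / 3.0) * (theta - c)
        if (((0.5 - c) * rho + c / 2.0) >= 0) != (theta >= nu):
            return False
    return True

print(all(sign_matches(-0.3 * i, 0.1 * j) for i in range(10) for j in range(1, 10)))
```

Note that (1/2 − c)ρ(θ) + c/2 is affine and increasing in θ when c ≤ 0, with its unique root at ν*, which is why the sign flip happens exactly there (including the case ν* > 1 of assertion (III), where no grid point ever crosses it).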

Lemma 6

Define

M = (2/ε²)(ε/η − ε/3 − a)(b − ε/η + ε/3) ln(2/δ)   for 2ab/(a+b) < ε/η − ε/3 < (a+b)/2;
M = ((b − a)²/(2ab))(1/η − 1/3)² ln(2/δ)           for η/3 < (b − a)/(b + a) and ε/η − ε/3 < 2ab/(a+b);
M = [2(1 − a/b)/(3η) − 2/9] ln(2/δ)                for (b − a)/(b + a) < η/3 < (b − a)/b and ε/η − ε/3 ≤ (a+b)/2;
M = 1                                              for η/3 > (b − a)/b;
M = ((b − a)²/(2ε²)) ln(2/δ)                       else.

Then, ℙμ{X̄n ≤ μ − max(ε, ημ)} ≤ δ for all μ ∈ (a, b) provided that n > M.

Proof

For simplicity of notation, define ζ = ε/(b − a), λ = ε/η, c = a/(a − b),

q* = 1/2 + ζ/3,   θ* = λ/(b − a) + c,   ν* = (3c/(2c − 1) − ηc)/(3 − η),   r* = −ηc/(3 − η).

It can be checked that 0 < θ* < 1. Define Q−(θ) = ℙθ{Ȳn ≤ θ − max(ζ, η(θ − c))} for θ ∈ (0, 1). For μ ∈ (a, b), putting θ = μ/(b − a) + c, we have ℙμ{X̄n ≤ μ − max(ε, ημ)} = Q−(θ). We need to show the lemma for the following five cases.

  • Case (i): 2ab/(a+b) ≤ ε/η − ε/3 ≤ (a+b)/2.

  • Case (ii): η/3 < (b − a)/(b + a) and ε/η − ε/3 < 2ab/(a+b).

  • Case (iii): (b − a)/(b + a) < η/3 < (b − a)/b and ε/η − ε/3 ≤ (a+b)/2.

  • Case (iv): η/3 ≥ (b − a)/b.

  • Case (v): Else.

First, consider Case (i). Clearly, as a consequence of ε/η − ε/3 ≤ (a+b)/2, we have q* ≥ θ*. As a consequence of ε/η − ε/3 ≥ 2ab/(a+b), we have θ* ≥ ν*. It follows from 2ab/(a+b) ≤ ε/η − ε/3 ≤ (a+b)/2 that q* ≥ θ* ≥ ν*. Since q* ≥ θ*, it follows from Lemma 2 that Q2(θ) is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*],

Q−(θ) ≤ Q2(θ) ≤ Q2(θ*).

Since θ* ≥ ν*, it follows from Lemma 5 that Q4(θ) is decreasing for θ ∈ [θ*, 1). Hence,

Q−(θ) ≤ Q4(θ) ≤ Q4(θ*) = Q2(θ*)

for θ ∈ [θ*, 1). Hence, Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that Q2(θ*) ≤ δ/2. Observing that Q2(θ*) = exp(−nφ(−ζ, θ*)), we have that Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that

n > ln(2/δ)/φ(−ζ, θ*) = (2/ζ²)(θ* − ζ/3)(1 − θ* + ζ/3) ln(2/δ)
  = (2/ζ²)(λ/(b − a) + c − ζ/3)(1 − λ/(b − a) − c + ζ/3) ln(2/δ)
  = (2/ε²)(ε/η − ε/3 − a)(b − ε/η + ε/3) ln(2/δ).

Second, consider Case (ii). As a consequence of η/3 < (b − a)/(b + a), we have ν* < 1. Making use of ε/η − ε/3 < 2ab/(a+b), we have θ* < min(q*, ν*). Since q* > θ*, it follows from Lemma 2 that Q2(θ) is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*],

Q−(θ) ≤ Q2(θ) ≤ Q2(θ*).   (32)

Since θ* < ν*, it follows from Lemma 5 that Q4(θ) is increasing for θ ∈ [θ*, ν*) and is decreasing for θ ∈ [ν*, 1). Hence,

Q−(θ) ≤ Q4(θ) ≤ Q4(ν*)   (33)

for θ ∈ [θ*, 1). Note that

Q4(ν*) ≥ Q4(θ*) = Q2(θ*).   (34)

In view of (32), (33) and (34), we have that Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that Q4(ν*) ≤ δ/2. Observing that

Q4(ν*) = exp(−nφ(η(c − ν*), ν*)),

we have that Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that the corresponding sample size

n > ln(2/δ)/φ(η(c − ν*), ν*) = ((b − a)²/(2ab))(1/η − 1/3)² ln(2/δ),

where we have used the definitions of ν* and c.

Third, consider Case (iii). As a consequence of

(b − a)/(b + a) < η/3 < (b − a)/b,   ε/η − ε/3 ≤ (a+b)/2,

we have r* < 1 < ν* and q* ≥ θ*. Since q* ≥ θ*, it follows from Lemma 2 that Q2(θ) is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*],

Q−(θ) ≤ Q2(θ) ≤ Q2(θ*).

Since ν* > 1, it follows from Lemma 5 that Q4(θ) is non-decreasing for θ ∈ [θ*, 1). Hence,

Q−(θ) ≤ Q4(θ) ≤ Q4(1)

for θ ∈ [θ*, 1). Note that Q4(1) ≥ Q4(θ*) = Q2(θ*). Hence, Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that Q4(1) ≤ δ/2. Since Q4(1) = exp(−nφ(η(c − 1), 1)), it follows that Q−(θ) ≤ δ/2 for θ ∈ (0, 1) provided that the corresponding sample size

n > ln(2/δ)/φ(η(c − 1), 1) = 2(1 − η(1 − c)/3)(η(1 − c)/3)/[η(1 − c)]² ln(2/δ) = [2(1 − a/b)/(3η) − 2/9] ln(2/δ).

Now, consider Case (iv). As a consequence of η/3 ≥ (b − a)/b, we have r* ≥ 1, which implies that Q−(θ) = 0 for θ ∈ (0, 1). Hence, Q−(θ) ≤ δ/2 for θ ∈ (0, 1) for any sample size n ≥ 1.

Finally, consider Case (v). In this case, we have Q−(θ) ≤ ℙθ{Ȳn ≤ θ − ζ} ≤ exp(−2nζ²). Therefore, Q−(θ) ≤ δ/2 provided that

n > ln(2/δ)/(2ζ²) = ((b − a)²/(2ε²)) ln(2/δ).

This completes the proof of the lemma.      □
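Since the one-sided guarantees of Lemmas 4 and 6 hold for every distribution supported on [a, b], a simulation can only confirm how conservative they are. The sketch below is our own illustration (all parameter values are arbitrary); it estimates the two-sided deviation frequency for i.i.d. uniform samples at the Hoeffding-type sample size (b − a)²/(2ε²) ln(2/δ) that appears in the final cases of both lemmas.

```python
import math
import random

random.seed(0)
a, b = 1.0, 2.0
eps, eta, delta = 0.1, 0.05, 0.05
mu = (a + b) / 2.0                     # mean of Uniform[a, b]
margin = max(eps, eta * mu)            # mixed absolute/relative error margin

# Hoeffding-type sample size from the last case of Lemmas 4 and 6.
n = math.ceil((b - a) ** 2 / (2.0 * eps ** 2) * math.log(2.0 / delta))

trials = 2000
exceed = sum(
    1 for _ in range(trials)
    if abs(sum(random.uniform(a, b) for _ in range(n)) / n - mu) >= margin
)
print(n, exceed / trials)  # the deviation frequency should be far below delta
```

The observed frequency is typically orders of magnitude below δ, reflecting that the bounds make no distributional assumption beyond boundedness.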

Finally, Theorem 3 can be established by making use of Lemmas 4 and 6.

C Proof of Theorem 4

To prove the theorem, we need some preliminary results.

Lemma 7

Let c ∈ (0, 1). Define r* = (3 + ηc)/(3 + η) and

ℒ1(θ) = exp(−nφ(η(θ − c), θ))   for θ ∈ (c, r*);
ℒ1(θ) = 0                       for θ ∈ (r*, 1).

Then, ℙθ{Ȳn ≥ θ + η(θ − c)} ≤ ℒ1(θ) for θ ∈ (c, 1). Moreover, ℒ1(θ) is non-increasing with respect to θ ∈ (c, 1).

Proof

By the definition of r*, it can be checked that c < r* < 1. Note that θ + η(θ − c) > 1 for θ ∈ (r*, 1). Hence, ℙθ{Ȳn ≥ θ + η(θ − c)} ≤ ℙθ{Ȳn > 1} = 0 for θ ∈ (r*, 1). On the other hand, 0 < η(θ − c) < 3(1 − θ) for θ ∈ (c, r*). Thus, it follows from inequality (8) that

ℙθ{Ȳn ≥ θ + η(θ − c)} ≤ exp(−nφ(η(θ − c), θ)) = exp(−(nη²/2) g(θ)),

where

g(θ) = (θ − c)²/{ρ(θ)[1 − ρ(θ)]}

with ρ(θ) = θ + (η/3)(θ − c). It can be verified that 0 < ρ(θ) < 1 for θ ∈ (c, r*). This shows that ℙθ{Ȳn ≥ θ + η(θ − c)} ≤ ℒ1(θ) for θ ∈ (c, 1).

To show that ℒ1(θ) is non-increasing with respect to θ ∈ (c, 1), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as

g′(θ) = {2(θ − c) / [ρ(θ)]²[1 − ρ(θ)]²} [(1/2 − c)ρ(θ) + c/2].   (35)

We claim that the derivative g′(θ) is positive for θ ∈ (c, 1). In view of (35) and the fact that θ − c > 0 for θ ∈ (c, 1), it is sufficient to show that (1/2 − c)ρ(θ) + c/2 > 0 for θ ∈ (c, 1), in the case that 1/2 − c ≥ 0 and in the case that 1/2 − c < 0. By the definition of ρ(θ), we have

c ≤ ρ(θ) ≤ 1 + (η/3)(1 − c)   (36)

for θ ∈ (c, 1). In the case of 1/2 − c ≥ 0, using the lower bound of ρ(θ) given by (36), we have (1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)c + c/2 > 0 for θ ∈ (c, 1). In the case of 1/2 − c < 0, using the upper bound of ρ(θ) given by (36), we have

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)[1 + (η/3)(1 − c)] + c/2 = 1/2 + η/6 − ((1 + η)/2)c + (η/3)c² = (η/3)(c − 1)(c − 3/(2η) − 1/2) > 0

for θ ∈ (c, 1) and η ∈ (0, 1). Thus, we have shown the claim that g′(θ) > 0 in all cases. This implies that ℒ1(θ) is non-increasing with respect to θ ∈ (c, 1).      □
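The sign argument above reduces to the scalar inequality (1/2 − c)ρ(θ) + c/2 > 0. A brute-force grid check, our own sketch and no substitute for the proof, over c ∈ (0, 1), η ∈ (0, 1) and θ ∈ (c, 1):

```python
def positivity_holds(steps=50):
    """Check (1/2 - c)*rho + c/2 > 0 for rho(theta) = theta + (eta/3)*(theta - c),
    on a grid with c in (0, 1), eta in (0, 1), theta in (c, 1), as in Lemma 7."""
    for i in range(1, steps):
        c = i / steps
        for j in range(1, steps):
            eta = j / steps
            for k in range(1, steps):
                theta = c + (1.0 - c) * k / steps
                rho = theta + (eta / 3.0) * (theta - c)
                if (0.5 - c) * rho + c / 2.0 <= 0:
                    return False
    return True

print(positivity_holds())  # True on this grid
```

The margin is smallest near c → 1 with η close to 1, matching the factor (η/3)(c − 1)(c − 3/(2η) − 1/2) in the proof, which tends to 0 as c → 1.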

Lemma 8

Let c ∈ (0, 1). Define ℒ2(θ) = exp(−nφ(η(c − θ), θ)) for θ ∈ (c, 1). Then, ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ ℒ2(θ) for θ ∈ (c, 1). Moreover, ℒ2(θ) is decreasing with respect to θ ∈ (c, 1).

Proof

Clearly, 0 < η(θ − c) < 3θ for θ ∈ (c, 1). It follows from inequality (9) that

ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ exp(−nφ(η(c − θ), θ)) = exp(−(nη²/2) g(θ)),

where

g(θ) = (θ − c)²/{ρ(θ)[1 − ρ(θ)]}

with ρ(θ) = θ − (η/3)(θ − c). Clearly, 0 < ρ(θ) < 1 for θ ∈ (c, 1). This shows that ℙθ{Ȳn ≤ θ − η(θ − c)} ≤ ℒ2(θ) for θ ∈ (c, 1).

To show that ℒ2(θ) is decreasing with respect to θ ∈ (c, 1), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as

g′(θ) = {2(θ − c) / [ρ(θ)]²[1 − ρ(θ)]²} [(1/2 − c)ρ(θ) + c/2].   (37)

We claim that the derivative g′(θ) is positive for θ ∈ (c, 1). In view of (37) and the fact that θ − c > 0 for θ ∈ (c, 1), it is sufficient to show that (1/2 − c)ρ(θ) + c/2 > 0 for θ ∈ (c, 1), in the case that 1/2 − c ≥ 0 and in the case that 1/2 − c < 0. By the definition of ρ(θ), we have

0 < ρ(θ) < 1 − (η/3)(1 − c)   (38)

for θ ∈ (c, 1). In the case of 1/2 − c ≥ 0, using the lower bound of ρ(θ) given by (38), we have (1/2 − c)ρ(θ) + c/2 ≥ c/2 > 0 for θ ∈ (c, 1). In the case of 1/2 − c < 0, using the upper bound of ρ(θ) given by (38), we have

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)[1 − (η/3)(1 − c)] + c/2 = 1/2 − η/6 − ((1 − η)/2)c − (η/3)c² > 1/2 − η/6 − ((1 − η)/2)c − (η/3)c = (1/6)(3 − η)(1 − c) > 0.

Thus, we have established the claim that g′(θ) > 0 for θ ∈ (c, 1). It follows that ℒ2(θ) is decreasing with respect to θ ∈ (c, 1). This completes the proof of the lemma.      □

Lemma 9

Let c ∈ (0, 1). Define ℒ3(θ) = exp(−nφ(η(c − θ), θ)) for θ ∈ (0, c). Then, ℙθ{Ȳn ≥ θ + η(c − θ)} ≤ ℒ3(θ) for θ ∈ (0, c). Moreover, ℒ3(θ) is increasing with respect to θ ∈ (0, c).

Proof

Note that 0 < η(c − θ) < 3(1 − θ) for θ ∈ (0, c). It follows from inequality (8) that

ℙθ{Ȳn ≥ θ + η(c − θ)} ≤ exp(−nφ(η(c − θ), θ)) = exp(−(nη²/2) g(θ)),

where

g(θ) = (c − θ)²/{ρ(θ)[1 − ρ(θ)]}

with ρ(θ) = θ + (η/3)(c − θ). Clearly, ρ(θ) > 0 for θ ∈ (0, c). Since c ∈ (0, 1) and η ∈ (0, 3), we have ρ(θ) < θ + (η/3)(1 − θ) < θ + (1 − θ) = 1 for θ ∈ (0, c). Hence, we have established that ℙθ{Ȳn ≥ θ + η(c − θ)} ≤ ℒ3(θ) for θ ∈ (0, c).

To show that ℒ3(θ) is increasing with respect to θ ∈ (0, c), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as

g′(θ) = {2(θ − c) / [ρ(θ)]²[1 − ρ(θ)]²} [(1/2 − c)ρ(θ) + c/2].   (39)

We claim that g′(θ) is negative for θ ∈ (0, c). In view of (39) and the fact that θ − c < 0 for θ ∈ (0, c), it suffices to show that (1/2 − c)ρ(θ) + c/2 > 0, in the case that 1/2 − c ≥ 0 and in the case that 1/2 − c < 0. Note that (η/3)c < ρ(θ) < c for θ ∈ (0, c). In the case of 1/2 − c ≥ 0, we have

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)(η/3)c + c/2 ≥ c/2 > 0.

In the case of 1/2 − c < 0, we have

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)c + c/2 > 0.

Therefore, we have established the claim that g′(θ) < 0 for θ ∈ (0, c). This implies that ℒ3(θ) is increasing with respect to θ ∈ (0, c). The proof of the lemma is thus completed.      □

Lemma 10

Let c ∈ (0, 1). Define r* = ηc/(3 + η) and

ℒ4(θ) = exp(−nφ(η(θ − c), θ))   for θ ∈ (r*, c);
ℒ4(θ) = 0                       for θ ∈ (0, r*).

Then, ℙθ{Ȳn ≤ θ + η(θ − c)} ≤ ℒ4(θ) for θ ∈ (0, c). Moreover, ℒ4(θ) is non-decreasing with respect to θ ∈ (0, c).

Proof

Clearly, 0 < r* < c. Note that θ + η(θ − c) < 0 for θ ∈ (0, r*). Hence, ℙθ{Ȳn ≤ θ + η(θ − c)} ≤ ℙθ{Ȳn < 0} = 0 for θ ∈ (0, r*). On the other hand, it can be checked that 0 < η(c − θ) < 3θ for θ ∈ (r*, c). It follows from inequality (9) that

ℙθ{Ȳn ≤ θ + η(θ − c)} ≤ exp(−nφ(η(θ − c), θ)) = exp(−(nη²/2) g(θ))

for θ ∈ (r*, c), where

g(θ) = (c − θ)²/{ρ(θ)[1 − ρ(θ)]}

with ρ(θ) = θ + (η/3)(θ − c). It can be verified that ρ(θ) > 0 for θ ∈ (r*, c). Since c ∈ (0, 1) and η ∈ (0, 3), we have ρ(θ) < θ + (η/3)(1 − θ) < θ + (1 − θ) = 1 for θ ∈ (r*, c). Thus, we have shown that ℙθ{Ȳn ≤ θ + η(θ − c)} ≤ ℒ4(θ) for θ ∈ (0, c).

To show that ℒ4(θ) is non-decreasing with respect to θ ∈ (0, c), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given by

g′(θ) = {2(θ − c) / [ρ(θ)]²[1 − ρ(θ)]²} [(1/2 − c)ρ(θ) + c/2].   (40)

Note that −(η/3)c < ρ(θ) < c for θ ∈ (0, c). It follows that

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)c + c/2 ≥ 0   (41)

for c ∈ (1/2, 1) and θ ∈ (0, c). Moreover,

(1/2 − c)ρ(θ) + c/2 ≥ (1/2 − c)(−η/3)c + c/2 ≥ −(η/6)c + c/2 > 0   (42)

for c ∈ (0, 1/2] and θ ∈ (0, c). Making use of (40), (41) and (42), we have g′(θ) < 0 for θ ∈ (r*, c). So, we have established that ℒ4(θ) is non-decreasing with respect to θ ∈ (0, c). This completes the proof of the lemma.      □

Lemma 11

Assume that a < 0 < b. Define

N = (2/ε²)(ε/η + ε/3 − a)(b − ε/η − ε/3) ln(2/δ)   for ε/η + ε/3 < (a+b)/2;
N = (2/ε²)(−ε/3 − a)(b + ε/3) ln(2/δ)              for ε/3 + (a+b)/2 < 0;
N = ((b − a)²/(2ε²)) ln(2/δ)                       else.   (43)

Then, ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ for any μ ∈ (0, b) provided that n > N.

Proof

For simplicity of notation, define ζ = ε/(b − a),

λ = ε/η,   c = a/(a − b),   θ* = λ/(b − a) + c,   p* = 1/2 − ε/(3(b − a)),   q* = 1/2 + ε/(3(b − a)).

Define functions

Q+(θ) = ℙθ{Ȳn ≥ θ + max(ζ, η(θ − c))},   Q−(θ) = ℙθ{Ȳn ≤ θ − max(ζ, η(θ − c))}

for θ ∈ (c, 1). For μ ∈ (0, b), putting θ = μ/(b − a) + c, we have c < θ < 1 and

ℙμ{X̄n ≥ μ + max(ε, ημ)} = Q+(θ),   ℙμ{X̄n ≤ μ − max(ε, ημ)} = Q−(θ).

To prove the lemma, it suffices to consider the following three cases.

  • Case (I): ε/η + ε/3 < (a+b)/2.

  • Case (II): ε/3 + (a+b)/2 < 0.

  • Case (III): −ε/3 ≤ (a+b)/2 ≤ ε/η + ε/3.

First, consider Case (I). As a consequence of ε/η + ε/3 < (a+b)/2, we have θ* < p*. Since p* > θ*, it follows from Lemma 1 that Q1(θ) is increasing for θ ∈ (0, θ*]. Moreover, according to Lemma 7, we have that ℒ1(θ) is non-increasing for θ ∈ [θ*, 1). It follows that

Q+(θ) ≤ Q1(θ) ≤ Q1(θ*)   for θ ∈ [c, θ*],   (44)

and that

Q+(θ) ≤ ℒ1(θ) ≤ ℒ1(θ*) = Q1(θ*)   for θ ∈ [θ*, 1).   (45)

Observing that q* > θ* and making use of Lemma 2, we have that Q2(θ) is non-decreasing for θ ∈ (c, θ*]. According to Lemma 8, we have that ℒ2(θ) is decreasing for θ ∈ [θ*, 1). It follows that

Q−(θ) ≤ Q2(θ) ≤ Q2(θ*)   for θ ∈ [c, θ*],   (46)

and that

Q−(θ) ≤ ℒ2(θ) ≤ ℒ2(θ*) = Q2(θ*)   for θ ∈ [θ*, 1).   (47)

Combining (44), (45), (46) and (47), we have

Q+(θ) + Q−(θ) ≤ Q1(θ*) + Q2(θ*)

for θ ∈ (c, 1). Observing that

Q1(θ*) = exp(−nζ²/[2(θ* + ζ/3)(1 − θ* − ζ/3)]),   Q2(θ*) = exp(−nζ²/[2(θ* − ζ/3)(1 − θ* + ζ/3)])

and that

(θ* + ζ/3)(1 − θ* − ζ/3) − (θ* − ζ/3)(1 − θ* + ζ/3) = (2ζ/3)(1 − 2θ*) > (2ζ/3)(1 − 2p*) > 0,

we have Q1(θ*) > Q2(θ*) and consequently,

Q+(θ) + Q−(θ) ≤ 2Q1(θ*)

for θ ∈ (c, 1). It follows that

ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ 2Q1(θ*) = 2exp(−nφ(ζ, θ*))

for μ ∈ (0, b). This implies that ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that the corresponding sample size

n > ln(2/δ)/φ(ζ, θ*) = (2/ζ²)(θ* + ζ/3)(1 − θ* − ζ/3) ln(2/δ)
  = (2/ζ²)(λ/(b − a) + c + ζ/3)(1 − λ/(b − a) − c − ζ/3) ln(2/δ)
  = (2/ε²)(ε/η + ε/3 − a)(b − ε/η − ε/3) ln(2/δ).

Next, consider Case (II). As a consequence of ε/3 + (a+b)/2 < 0, we have q* < c. Clearly, p* < q* < c < θ*. By Lemma 1, Q1(θ) is non-increasing for θ ∈ (c, θ*]. By Lemma 7, ℒ1(θ) is non-increasing for θ ∈ [θ*, 1). It follows that

Q+(θ) ≤ Q1(θ) ≤ Q1(c)   for θ ∈ [c, θ*].   (48)

Moreover,

Q+(θ) ≤ ℒ1(θ) ≤ ℒ1(θ*) = Q1(θ*) ≤ Q1(c)   for θ ∈ [θ*, 1).   (49)

Similarly, according to Lemma 2, Q2(θ) is decreasing for θ ∈ (c, θ*]. By Lemma 8, ℒ2(θ) is decreasing for θ ∈ [θ*, 1). It follows that

Q−(θ) ≤ Q2(θ) ≤ Q2(c)   for θ ∈ [c, θ*].   (50)

Moreover,

Q−(θ) ≤ ℒ2(θ) ≤ ℒ2(θ*) = Q2(θ*) ≤ Q2(c)   for θ ∈ [θ*, 1).   (51)

Making use of (48), (49), (50) and (51), we have

Q+(θ) + Q−(θ) ≤ Q1(c) + Q2(c)

for θ ∈ (c, 1). Observing that

Q1(c) = exp(−nζ²/[2(c + ζ/3)(1 − c − ζ/3)]),   Q2(c) = exp(−nζ²/[2(c − ζ/3)(1 − c + ζ/3)])

and that

(c + ζ/3)(1 − c − ζ/3) − (c − ζ/3)(1 − c + ζ/3) = (2ζ/3)(1 − 2c) < (2ζ/3)(1 − 2q*) < 0,

we have Q1(c) < Q2(c) and consequently,

Q+(θ) + Q−(θ) ≤ 2Q2(c)

for θ ∈ (c, 1). It follows that

ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ 2Q2(c) = 2exp(−nφ(−ζ, c))

for μ ∈ (0, b). This implies that ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that the corresponding sample size

n > ln(2/δ)/φ(−ζ, c) = (2/ζ²)(c − ζ/3)(1 − c + ζ/3) ln(2/δ) = (2/ε²)(−ε/3 − a)(b + ε/3) ln(2/δ).

Finally, consider Case (III). In this case, we have ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ ℙθ{|Ȳn − θ| ≥ ζ} ≤ 2exp(−2nζ²). Therefore, ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that

n > ln(2/δ)/(2ζ²) = ((b − a)²/(2ε²)) ln(2/δ).

This completes the proof of the lemma.      □

Lemma 12

Assume that a < 0 < b. Define

M = (2/ε²)(−ε/η − ε/3 − a)(b + ε/η + ε/3) ln(2/δ)   for ε/η + ε/3 < −(a+b)/2;
M = (2/ε²)(ε/3 − a)(b − ε/3) ln(2/δ)                for ε/3 < (a+b)/2;
M = ((b − a)²/(2ε²)) ln(2/δ)                        else.   (52)

Then, ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ for any μ ∈ (a, 0) provided that n > M.

Proof

For simplicity of notation, define ζ = ε/(b − a),

λ = ε/η,   c = a/(a − b),   ϑ* = c − λ/(b − a),   p* = 1/2 − ε/(3(b − a)),   q* = 1/2 + ε/(3(b − a)).

Define functions

Q+(θ) = ℙθ{Ȳn ≥ θ + max(ζ, η(c − θ))},   Q−(θ) = ℙθ{Ȳn ≤ θ − max(ζ, η(c − θ))}

for θ ∈ (0, c). For μ ∈ (a, 0), putting θ = μ/(b − a) + c, we have 0 < θ < c and

ℙμ{X̄n ≥ μ + max(ε, η|μ|)} = Q+(θ),   ℙμ{X̄n ≤ μ − max(ε, η|μ|)} = Q−(θ).

To prove the lemma, it suffices to consider the following three cases.

  • Case (I): ε/η + ε/3 < −(a+b)/2.

  • Case (II): ε/3 < (a+b)/2.

  • Case (III): −(ε/η + ε/3) ≤ (a+b)/2 ≤ ε/3.

First, consider Case (I). As a consequence of ε/η + ε/3 < −(a+b)/2, we have q* < ϑ*. Since p* < q* < ϑ*, it follows from Lemma 9 that ℒ3(θ) is increasing for θ ∈ (0, ϑ*]. According to Lemma 1, Q1(θ) is non-increasing for θ ∈ [ϑ*, c). Hence,

Q+(θ) ≤ ℒ3(θ) ≤ ℒ3(ϑ*) = Q1(ϑ*)   for θ ∈ (0, ϑ*].   (53)

Moreover,

Q+(θ) ≤ Q1(θ) ≤ Q1(ϑ*)   for θ ∈ (ϑ*, c].   (54)

Since q* < ϑ*, it follows from Lemma 10 that ℒ4(θ) is non-decreasing for θ ∈ (0, ϑ*]. From Lemma 2, Q2(θ) is decreasing for θ ∈ [ϑ*, c). Hence,

Q−(θ) ≤ ℒ4(θ) ≤ ℒ4(ϑ*) = Q2(ϑ*)   for θ ∈ (0, ϑ*].   (55)

Moreover,

Q−(θ) ≤ Q2(θ) ≤ Q2(ϑ*)   for θ ∈ (ϑ*, c].   (56)

Making use of (53), (54), (55) and (56), we have

Q+(θ) + Q−(θ) ≤ Q1(ϑ*) + Q2(ϑ*)

for θ ∈ (0, c). Observing that

Q1(ϑ*) = exp(−nζ²/[2(ϑ* + ζ/3)(1 − ϑ* − ζ/3)]),   Q2(ϑ*) = exp(−nζ²/[2(ϑ* − ζ/3)(1 − ϑ* + ζ/3)])

and that

(ϑ* + ζ/3)(1 − ϑ* − ζ/3) − (ϑ* − ζ/3)(1 − ϑ* + ζ/3) = (2ζ/3)(1 − 2ϑ*) < (2ζ/3)(1 − 2q*) < 0,

we have Q1(ϑ*) < Q2(ϑ*) and consequently,

Q+(θ) + Q−(θ) ≤ 2Q2(ϑ*)

for θ ∈ (0, c). It follows that

ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ 2Q2(ϑ*) = 2exp(−nφ(−ζ, ϑ*))

for μ ∈ (a, 0). This implies that ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that the corresponding sample size

n > ln(2/δ)/φ(−ζ, ϑ*) = (2/ζ²)(ϑ* − ζ/3)(1 − ϑ* + ζ/3) ln(2/δ)
  = (2/ζ²)(c − λ/(b − a) − ζ/3)(1 − c + λ/(b − a) + ζ/3) ln(2/δ)
  = (2/ε²)(−ε/η − ε/3 − a)(b + ε/η + ε/3) ln(2/δ).

Next, consider Case (II). As a consequence of ε/3 < (a+b)/2, we have p* > c. Clearly, q* > p* > c > ϑ*. By Lemma 9, ℒ3(θ) is increasing for θ ∈ (0, ϑ*]. By Lemma 1, Q1(θ) is increasing for θ ∈ [ϑ*, c). Hence,

Q+(θ) ≤ ℒ3(θ) ≤ ℒ3(ϑ*) = Q1(ϑ*) ≤ Q1(c)   for θ ∈ (0, ϑ*].   (57)

Moreover,

Q+(θ) ≤ Q1(θ) ≤ Q1(c)   for θ ∈ (ϑ*, c].   (58)

Similarly, by Lemma 10, ℒ4(θ) is non-decreasing for θ ∈ (0, ϑ*]. By Lemma 2, Q2(θ) is non-decreasing for θ ∈ [ϑ*, c). Hence,

Q−(θ) ≤ ℒ4(θ) ≤ ℒ4(ϑ*) = Q2(ϑ*) ≤ Q2(c)   for θ ∈ (0, ϑ*].   (59)

Moreover,

Q−(θ) ≤ Q2(θ) ≤ Q2(c)   for θ ∈ (ϑ*, c].   (60)

Making use of (57), (58), (59) and (60), we have

Q+(θ) + Q−(θ) ≤ Q1(c) + Q2(c)

for θ ∈ (0, c). Observing that

Q1(c) = exp(−nζ²/[2(c + ζ/3)(1 − c − ζ/3)]),   Q2(c) = exp(−nζ²/[2(c − ζ/3)(1 − c + ζ/3)])

and that

(c + ζ/3)(1 − c − ζ/3) − (c − ζ/3)(1 − c + ζ/3) = (2ζ/3)(1 − 2c) > (2ζ/3)(1 − 2p*) > 0,

we have Q1(c) > Q2(c) and consequently,

Q+(θ) + Q−(θ) ≤ 2Q1(c)

for θ ∈ (0, c). It follows that

ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ 2Q1(c) = 2exp(−nφ(ζ, c))

for μ ∈ (a, 0). This implies that ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that the corresponding sample size

n > ln(2/δ)/φ(ζ, c) = (2/ζ²)(c + ζ/3)(1 − c − ζ/3) ln(2/δ) = (2/ε²)(ε/3 − a)(b − ε/3) ln(2/δ).

Finally, consider Case (III). In this case, we have ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} ≤ Q+(θ) + Q−(θ) ≤ ℙθ{|Ȳn − θ| ≥ ζ} ≤ 2exp(−2nζ²). Therefore, ℙμ{|X̄n − μ| ≥ max(ε, η|μ|)} < δ provided that

n > ln(2/δ)/(2ζ²) = ((b − a)²/(2ε²)) ln(2/δ).

This completes the proof of the lemma.      □

Lemma 13

Let N and M be defined by (43) and (52), respectively. Then,

max(N, M) = (2/ε²)[|a + b|(ε/η + ε/3) − (ε/η + ε/3)² − ab] ln(2/δ)   for |a + b|/2 > ε/η + ε/3;
max(N, M) = ((b − a)²/(2ε²)) ln(2/δ)                                 else.   (61)

Proof

To prove the lemma, it suffices to consider two cases as follows.

  • Case (A): (a+b)/2 ≥ 0.

  • Case (B): (a+b)/2 < 0.

In Case (A), as a consequence of (a+b)/2 ≥ 0, we have

N = (2/ε²)(ε/η + ε/3 − a)(b − ε/η − ε/3) ln(2/δ)   for ε/η + ε/3 < (a+b)/2;
N = ((b − a)²/(2ε²)) ln(2/δ)                       else

and

M = (2/ε²)(ε/3 − a)(b − ε/3) ln(2/δ)   for ε/η + ε/3 < (a+b)/2;
M = (2/ε²)(ε/3 − a)(b − ε/3) ln(2/δ)   for ε/3 < (a+b)/2, ε/η + ε/3 > (a+b)/2;
M = ((b − a)²/(2ε²)) ln(2/δ)           else.

Therefore, in Case (A), we have

max(N, M) = (2/ε²)(ε/η + ε/3 − a)(b − ε/η − ε/3) ln(2/δ)   for (a+b)/2 > ε/η + ε/3;
max(N, M) = ((b − a)²/(2ε²)) ln(2/δ)                       for 0 < (a+b)/2 < ε/η + ε/3.   (62)

In Case (B), as a consequence of (a+b)/2 < 0, we have

N = (2/ε²)(−ε/3 − a)(b + ε/3) ln(2/δ)   for ε/η + ε/3 + (a+b)/2 < 0;
N = (2/ε²)(−ε/3 − a)(b + ε/3) ln(2/δ)   for ε/3 + (a+b)/2 < 0, ε/η + ε/3 + (a+b)/2 > 0;
N = ((b − a)²/(2ε²)) ln(2/δ)            else

and

M = (2/ε²)(−ε/η − ε/3 − a)(b + ε/η + ε/3) ln(2/δ)   for ε/η + ε/3 < −(a+b)/2;
M = ((b − a)²/(2ε²)) ln(2/δ)                        else.

Therefore, in Case (B), we have

max(N, M) = (2/ε²)(−ε/η − ε/3 − a)(b + ε/η + ε/3) ln(2/δ)   for (a+b)/2 < −(ε/η + ε/3);
max(N, M) = ((b − a)²/(2ε²)) ln(2/δ)                        for −(ε/η + ε/3) < (a+b)/2 < 0.   (63)

Combining (62) and (63), we have

max(N, M) = (2/ε²)(ε/η + ε/3 − a)(b − ε/η − ε/3) ln(2/δ)    for (a+b)/2 > ε/η + ε/3;
max(N, M) = (2/ε²)(−ε/η − ε/3 − a)(b + ε/η + ε/3) ln(2/δ)   for (a+b)/2 < −(ε/η + ε/3);
max(N, M) = ((b − a)²/(2ε²)) ln(2/δ)                        else,

which implies (61). This completes the proof of the lemma.      □
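Lemma 13's closed form can be cross-checked numerically against the definitions (43) and (52). The sketch below is our own re-implementation of the three displayed formulas (function names are ours); it compares them on random parameter draws with a < 0 < b.

```python
import math
import random

def N_of_43(a, b, eps, eta, delta):
    """Sample-size bound N of (43), for a < 0 < b."""
    t = eps / eta + eps / 3.0
    log_term = math.log(2.0 / delta)
    if t < (a + b) / 2.0:
        return 2.0 / eps ** 2 * (t - a) * (b - t) * log_term
    if eps / 3.0 + (a + b) / 2.0 < 0:
        return 2.0 / eps ** 2 * (-eps / 3.0 - a) * (b + eps / 3.0) * log_term
    return (b - a) ** 2 / (2.0 * eps ** 2) * log_term

def M_of_52(a, b, eps, eta, delta):
    """Sample-size bound M of (52), for a < 0 < b."""
    t = eps / eta + eps / 3.0
    log_term = math.log(2.0 / delta)
    if t < -(a + b) / 2.0:
        return 2.0 / eps ** 2 * (-t - a) * (b + t) * log_term
    if eps / 3.0 < (a + b) / 2.0:
        return 2.0 / eps ** 2 * (eps / 3.0 - a) * (b - eps / 3.0) * log_term
    return (b - a) ** 2 / (2.0 * eps ** 2) * log_term

def max_of_61(a, b, eps, eta, delta):
    """Closed form (61) for max(N, M)."""
    t = eps / eta + eps / 3.0
    log_term = math.log(2.0 / delta)
    if abs(a + b) / 2.0 > t:
        return 2.0 / eps ** 2 * (abs(a + b) * t - t ** 2 - a * b) * log_term
    return (b - a) ** 2 / (2.0 * eps ** 2) * log_term

random.seed(1)
for _ in range(10000):
    a = -random.uniform(0.1, 5.0)
    b = random.uniform(0.1, 5.0)
    eps = random.uniform(0.01, 1.0)
    eta = random.uniform(0.05, 1.0)
    direct = max(N_of_43(a, b, eps, eta, 0.05), M_of_52(a, b, eps, eta, 0.05))
    closed = max_of_61(a, b, eps, eta, 0.05)
    assert abs(closed - direct) < 1e-9 * max(1.0, direct)
print("ok")
```

The agreement reflects the algebraic identity (t − a)(b − t) = (a + b)t − t² − ab (and its mirror image for a + b < 0) used implicitly in the proof above.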

Finally, Theorem 4 can be established by making use of Lemmas 11, 12 and 13.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Dr. Zhengjia Chen, Email: zchen38@emory.edu, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322.

Dr. Xinjia Chen, Email: xinjia_chen@subr.edu, Department of Electrical Engineering, Southern University at Baton Rouge, LA 70813.

References

  • 1.Alamouti SM. A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications. 1998;16(8):1451–1458. [Google Scholar]
  • 2.Azuma K. Weighted sums of certain dependent random variables. Tôhoku Math J. 1967;19(3):357–367. [Google Scholar]
  • 3.Arellano M, Pakkala S, Langston A, Tighiouart M, Pan L, Chen Z, Heffner LT, Lonial S, Winton E, Khoury HJ. Early clearance of peripheral blood blasts predicts response to induction chemotherapy in acute myeloid leukemia. Cancer. 2012;118(21):5278–5282. doi: 10.1002/cncr.27494. [DOI] [PubMed] [Google Scholar]
  • 4.Bentkus V. On Hoeffding's inequalities. The Annals of Probability. 2004;32(2):1650–1673. [Google Scholar]
  • 5.Chow SC, Shao J, Wang H. Sample Size Calculations in Clinical Trials. 2nd. Chapman & Hall; 2008. [Google Scholar]
  • 6.Doob J. Stochastic Processes. Wiley; 1953. [Google Scholar]
  • 7.Desu MM, Raghavarao D. Sample Size Methodology. Academic Press; 1990. [Google Scholar]
  • 8.Fishman GS. Monte Carlo – Concepts, Algorithms and Applications. Springer-Verlag; 1996. [Google Scholar]
  • 9.Franklin GF, Powell JD, Emami-Naeini A. Feedback Control of Dynamic Systems. Pearson Higher Education, Inc; 2014. [Google Scholar]
  • 10.Gajek L, Niemiro W, Pokarowski P. Optimal Monte Carlo integration with fixed relative precision. Journal of Complexity. 2013;29:4–26. [Google Scholar]
  • 11.Hampel F. Is statistics too difficult? The Canadian Journal of Statistics. 1998;26:497–513. [Google Scholar]
  • 12.Hoeffding W. Probability inequalities for sums of bounded variables. Journal of American Statistical Association. 1963;58:13–29. [Google Scholar]
  • 13.Janik M, Hartlage G, Alexopoulos N, Mirzoyev Z, McLean DS, Arepalli CD, Chen Z, Stillman AE, Raggi P. Epicardial adipose tissue volume and coronary artery calcium to predict myocardial ischemia on positron emission tomography-computed tomography studies. Journal of Nuclear Cardiology. 2010;17(5):841–847. doi: 10.1007/s12350-010-9235-1. [DOI] [PubMed] [Google Scholar]
  • 14.Khargonekar P, Tikku A. Randomized algorithms for robust control analysis and synthesis have polynomial complexity. Proceedings of the IEEE Conference on Decision and Control. 1996 [Google Scholar]
  • 15.Lagoa CM, Barmish BR. Distributionally robust Monte Carlo simulation: a tutorial survey. Proceedings of the IFAC World Congress. 2002 [Google Scholar]
  • 16.Larsson E, Stoica P. Space-Time Block Coding For Wireless Communications. Cambridge University Press; UK: 2003. [Google Scholar]
  • 17.Massart P. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability. 1990;18:1269–1283. [Google Scholar]
  • 18.Mitzenmacher M, Upfal E. Probability and Computing. Cambridge University Press; 2005. [Google Scholar]
  • 19.Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press; 1995. [Google Scholar]
  • 20.Proakis JG. Digital Communications. Mcgraw-Hill; 2000. [Google Scholar]
  • 21.Tempo R, Calafiore G, Dabbene F. Randomized Algorithms for Analysis and Control of Uncertain Systems. Springer; 2005. [Google Scholar]
  • 22.Tremba, Calafiore G, Dabbene F, Gryazina E, Polyak BT, Shcherbakov PS, Tempo R. RACT: Randomized Algorithms Control Toolbox for MATLAB. Proc of the IFAC World Congress; Seoul, Korea. July 2008. [Google Scholar]
  • 23.Wang D, Müller S, Amin AR, Huang D, Su L, Hu Z, Rahman MA, Nannapaneni S, Koenig L, Chen Z, Tighiouart M, Shin DM, Chen ZG. The pivotal role of integrin β1 in metastasis of head and neck squamous cell carcinoma. Clinical Cancer Research. 2012;18(17):4589–4599. doi: 10.1158/1078-0432.CCR-11-3127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Williams D. Probability with Martingales. Cambridge University Press; 1991. [Google Scholar]
