Abstract
In this article, we propose rigorous sample size methods for estimating the means of random variables, which require no knowledge of the underlying distributions except that the random variables are known to be bounded in a given interval. Our sample size methods can be applied without assuming that the samples are independent and identically distributed. Moreover, our sample size methods involve no approximation. We demonstrate that the sample complexity can be significantly reduced by using a mixed error criterion. We derive explicit sample size formulae to ensure the statistical accuracy of estimation.
1 Introduction
Many problems in engineering and the sciences boil down to estimating the mean value of a random variable [18, 19]. More formally, let X be a random variable with mean μ. It is a frequent problem to estimate μ based on samples X1, X2, ⋯, Xn of X, which are defined on a probability space (Ω, ℱ, ℙμ), where the subscript in the probability measure ℙμ indicates its association with μ. In many situations, no information on the distribution of X is available except that X is known to be bounded in some interval [a, b]. For example, in clinical trials, many quantities under investigation are bounded random variables, such as the biomarkers EGFR, K-Ras, B-Raf, Akt, etc. (see, e.g., [3, 13, 23] and the references therein). Moreover, the samples X1, X2, ⋯, Xn may not be independent and identically distributed (i.i.d.). This gives rise to the significance of estimating μ under the assumption that
| ℙ{a ≤ Xk ≤ b} = 1 for all k ∈ ℕ, | (1) |
| 𝔼[Xk ∣ ℱk−1] = μ almost surely for all k ∈ ℕ, | (2) |
where ℕ denotes the set of positive integers, and {ℱk, k = 0, 1, ⋯} is a sequence of σ-subalgebras such that {∅, Ω} = ℱ0 ⊂ ℱ1 ⊂ ℱ2 ⊂ ⋯ ⊂ ℱ, with ℱk being generated by X1, ⋯, Xk. Our motivation for considering the estimation of μ under the dependency assumption (2) is twofold. First, from a theoretical point of view, we want the results to hold under the most general conditions. Clearly, (2) is satisfied in the special case that X1, X2, ⋯ are i.i.d. Second, from a practical standpoint, we want to weaken the independence assumption so as to cover more applications. For example, in the Monte Carlo estimation technique based on adaptive importance sampling, the samples X1, X2, ⋯ are not necessarily independent. However, as demonstrated on page 6 of [10], it may be shown that the samples satisfy (2). An example of adaptive importance sampling is given in Section 5.8 of [8] on the study of catastrophic failure.
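To make the dependency assumption (2) concrete, here is a minimal sketch in Python; the particular dependence structure is an illustrative choice of ours, not taken from [8] or [10]. It builds a bounded, non-independent sequence whose conditional means all equal μ, so that (1) and (2) hold, and checks numerically that the sample mean still concentrates around μ.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu, n = 0.3, 100_000

# Each X_k is uniform on [mu - d_k, mu + d_k], so E[X_k | F_{k-1}] = mu,
# as required by (2).  The half-width d_k is driven by the previous
# sample, so the X_k are bounded in [0, 1] but NOT independent.
x = np.empty(n)
d = min(mu, 1.0 - mu)                 # initial half-width keeps X_1 in [0, 1]
for k in range(n):
    x[k] = rng.uniform(mu - d, mu + d)
    d = min(mu, 1.0 - mu) * (0.5 + 0.5 * x[k])   # dependence on the past

print(abs(x.mean() - mu))             # small: the sample mean concentrates
```

Although each Xk is centered at μ given the past, its spread depends on Xk−1; the samples are therefore far from i.i.d., yet the estimation methods developed below still apply.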
An unbiased estimator for μ can be taken as the sample mean

X̄n = (X1 + X2 + ⋯ + Xn)/n.
Let ɛ ∈ (0, 1) and δ ∈ (0, 1) be pre-specified margin of absolute error and confidence parameter, respectively. Since the probability distributions of X1, X2, ⋯ are usually unknown, one would use an absolute error criterion and seek the sample size, n, as small as possible such that for all values of μ,
| ℙμ{|X̄n − μ| < ɛ} ≥ 1 − δ | (3) |
holds for all distributions having common mean μ. It should be noted that it is difficult to specify a margin of absolute error ɛ, without causing undue conservatism, for controlling the accuracy of estimation if the underlying mean value μ can vary in a wide range. To achieve acceptable accuracy, it is necessary to choose small ɛ for small μ. However, this leads to unnecessarily large sample sizes for large μ.
In addition to the absolute error criterion, a relative error criterion is frequently used for the purpose of error control. Let η ∈ (0, 1) and δ ∈ (0, 1) be pre-specified margin of relative error and confidence parameter, respectively. It is desirable to determine the sample size, n, as small as possible such that for all values of μ,
| ℙμ{|X̄n − μ| < η|μ|} ≥ 1 − δ | (4) |
holds for all distributions having common mean μ. Unfortunately, the determination of the sample size, n, requires a good lower bound for μ, which is usually not available. Otherwise, the required sample size n can be very large, or even infinite.
To overcome the aforementioned difficulties, a mixed criterion may be useful. The reason is that, from a practical point of view, an estimate can be acceptable if either an absolute criterion or a relative criterion is satisfied. More specifically, let ɛ > 0, η ∈ (0, 1) and δ ∈ (0, 1). To control the reliability of estimation, it is desirable to determine the sample size, n, as small as possible, such that for all values of μ,
| ℙμ{|X̄n − μ| < ɛ or |X̄n − μ| < η|μ|} ≥ 1 − δ | (5) |
holds for all distributions having common mean μ.
In the estimation of parameters, a margin of absolute error is usually chosen to be much smaller than the margin of relative error. For instance, in the estimation of a binomial proportion, a margin of relative error η = 0.1 may be good enough for most situations, while a margin of absolute error may be expected to be ɛ = 0.001 or even smaller. In many applications, a practitioner accepting a relative error normally expects a much smaller absolute error, i.e., ɛ ≪ η. On the other hand, one accepting an absolute error ɛ typically tolerates a much larger relative error, i.e., η ≫ ɛ. It will be demonstrated that the required sample size can be substantially reduced by using a mixed error criterion.
Once the measure of precision is chosen, the next task is to determine an appropriate sample size. A conventional method is to determine the sample size by the normal approximation derived from the central limit theorem [5, 7]. Such an approximation method inevitably leads to unknown statistical error, due to the fact that the sample size n must be a finite number [8, 11]. This motivates us to explore rigorous methods for determining sample sizes.
In this paper, we consider the problem of estimating the means of bounded random variables based on a mixed error criterion. The remainder of the paper is organized as follows. In Section 2, we introduce some martingale inequalities. In Section 3, we derive explicit sample size formulae by virtue of concentration inequalities and martingale inequalities. In Section 4, we extend the techniques to the problem of estimating the difference of means of two bounded random variables. Illustrative examples are given in Section 5. Section 6 provides our concluding remarks. Most proofs are given in Appendices.
2 Martingale Inequalities
Under assumption (2), it can be readily shown that {Xk − μ} is actually a sequence of martingale differences (see, e.g., [6, 24] and the references therein). In the sequel, we shall introduce some martingale inequalities which are crucial for the determination of sample sizes to guarantee pre-specified statistical accuracy.
Define the function

φ(ɛ, μ) = (μ + ɛ) ln( μ / (μ + ɛ) ) + (1 − μ − ɛ) ln( (1 − μ) / (1 − μ − ɛ) )

for 0 < ɛ < 1 − μ < 1. Under the assumption that 0 ≤ Xk ≤ 1 almost surely and (2) holds for all k ∈ ℕ, Hoeffding [12] established that

| ℙ{X̄n ≥ μ + ɛ} ≤ exp(n φ(ɛ, μ)) for 0 < ɛ < 1 − μ. | (6) |
To see that this result is due to Hoeffding, see Theorem 1 of his paper [12] and the remarks in the second paragraph on page 18 of that paper. For bounds tighter than Hoeffding's inequality, see the recent paper [4].
To obtain simpler probabilistic inequalities, define the bivariate function

ℳ(ɛ, μ) = − ɛ² / [ 2 (μ + ɛ/3) (1 − μ − ɛ/3) ].

It is shown by Massart [17] that

| φ(ɛ, μ) ≤ ℳ(ɛ, μ) for 0 < ɛ < 1 − μ < 1. | (7) |
By virtue of Hoeffding’s inequality and Massart’s inequality, the following results can be justified.
Theorem 1
Assume that 0 ≤ Xk ≤ 1 almost surely and (2) holds for all k ∈ ℕ. Then,

| ℙ{X̄n ≥ μ + ɛ} ≤ exp(n ℳ(ɛ, μ)) for 0 < ɛ < 3(1 − μ), | (8) |
| ℙ{X̄n ≤ μ − ɛ} ≤ exp(n ℳ(−ɛ, μ)) for 0 < ɛ < 3μ. | (9) |
Proof
To prove Theorem 1, note that
| (10) |
From Hoeffding’s inequality (6) and Massart’s inequality (7), we have
| ℙ{X̄n ≥ μ + ɛ} ≤ exp(n φ(ɛ, μ)) ≤ exp(n ℳ(ɛ, μ)) for 0 < ɛ < 1 − μ. | (11) |
Observe that ℙ{X̄n ≥ z} is a left-continuous function of z and that φ(ɛ, μ) is a continuous function of ɛ. Making use of this observation and (11), we have
| (12) |
Note that
| (13) |
Combining (10), (11), (12) and (13) yields
This proves (8). To show (9), define Yi = 1 − Xi for i = 1, ⋯, n. Define Ȳn = (Y1 + ⋯ + Yn)/n and ν = 1 − μ. Then, Ȳn = 1 − X̄n and 𝔼[Yk ∣ ℱk−1] = ν almost surely for all k ∈ ℕ. Applying (8) to Y1, ⋯, Yn, we have

ℙ{Ȳn ≥ ν + ɛ} ≤ exp(n ℳ(ɛ, ν))

for 0 < ɛ < 3(1 − ν). By the definitions of ν and Ȳn, we can rewrite the above inequality as

ℙ{X̄n ≤ μ − ɛ} ≤ exp(n ℳ(ɛ, 1 − μ))

for 0 < ɛ < 3μ. Observing that ℳ(ɛ, 1 − μ) = ℳ(−ɛ, μ) and that φ(ɛ, 1 − μ) = φ(−ɛ, μ), we have (9). This completes the proof of Theorem 1. □
It should be noted that Theorem 1 extends Massart’s inequality in two aspects. First, the random variables are not required to be i.i.d. Bernoulli random variables. Second, the inequalities hold for wider supports.
3 Explicit Sample Size Formulae
In this section, we shall investigate sample size methods for estimating the mean of a bounded random variable X.
If X1, ⋯, Xn are i.i.d. samples of X bounded in the interval [0, 1], it can be shown by Chebyshev's inequality that (3) holds provided that

| n ≥ 1 / (4 δ ɛ²). | (14) |

Under the assumption that 0 ≤ Xk ≤ 1 and 𝔼[Xk ∣ ℱk−1] = μ almost surely for all k ∈ ℕ, the Azuma–Hoeffding inequality [2, 12] implies that (3) holds for all μ ∈ (0, 1) if

| n ≥ ln(2/δ) / (2ɛ²). | (15) |
Clearly, the ratio of the sample size determined by (14) to that determined by (15) is equal to 1 / (2δ ln(2/δ)), which is substantially greater than 1 for small δ ∈ (0, 1). Despite the significant improvement upon the sample size bound (14), the sample size bound (15) usually leads to a very large sample size, since ɛ is typically a small number in practice. For example, with δ = 0.05, we have n = 1,844,440 and n = 184,443,973 for ɛ = 0.001 and 0.0001, respectively.
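As a quick numerical check of (14) and (15) as stated above, the following sketch evaluates both bounds and reproduces the sample sizes just quoted.

```python
from math import ceil, log

def n_chebyshev(eps, delta):
    # sample size bound (14): n >= 1 / (4 * delta * eps^2)
    return ceil(1.0 / (4.0 * delta * eps**2))

def n_hoeffding(eps, delta):
    # sample size bound (15): n >= ln(2/delta) / (2 * eps^2)
    return ceil(log(2.0 / delta) / (2.0 * eps**2))

delta = 0.05
for eps in (0.001, 0.0001):
    print(eps, n_hoeffding(eps, delta))      # 1,844,440 and 184,443,973
print(n_chebyshev(0.001, delta) / n_hoeffding(0.001, delta))  # about 2.7,
# which agrees with the ratio 1/(2*delta*ln(2/delta)) noted above
```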
To the best of our knowledge, the sample size bound (15) is the tightest one discovered so far under the assumption that 0 ≤ Xk ≤ 1 and 𝔼[Xk ∣ ℱk−1] = μ almost surely for all k ∈ ℕ. In order to reduce the sample complexity, we propose to use the mixed error criterion, which can be viewed as a relaxation of the absolute error criterion. In this direction, we first exploit Chebyshev's inequality to establish the following result.
Theorem 2
If X1, ⋯, Xn are i.i.d. samples of X bounded in the interval [0, 1], then (5) holds for all μ ∈ (0, 1) provided that ɛ ≤ η/2 and that

| n ≥ (η − ɛ) / (ɛ η² δ). | (16) |
See Appendix A for proof.
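Taking the bound (16) at face value, Theorem 2 can also be sanity-checked by simulation: at the sample size (16), the mixed criterion (5) should hold with probability at least 1 − δ for every μ. Below is a minimal sketch for i.i.d. Bernoulli samples with arbitrarily chosen ɛ, η and δ; since Chebyshev's inequality is conservative, the empirical coverage should comfortably exceed 1 − δ.

```python
import numpy as np
from math import ceil

eps, eta, delta = 0.01, 0.1, 0.05                  # note eps <= eta/2
n = ceil((eta - eps) / (eps * eta**2 * delta))     # bound (16); here n = 18,000

rng = np.random.default_rng(1)
for mu in (0.05, 0.1, 0.5, 0.9):
    xbar = rng.binomial(n, mu, size=2000) / n      # 2000 replications of the estimate
    ok = (np.abs(xbar - mu) < eps) | (np.abs(xbar - mu) < eta * mu)
    print(mu, ok.mean())                           # should be >= 1 - delta = 0.95
```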
The sample size formula (16) may be too conservative. To derive tighter sample size formulae, we need to use the martingale inequalities of exponential form presented in the last section. Throughout the remainder of this section, we make the following assumption:
X1, X2, ⋯ are random variables such that a ≤ Xk ≤ b and 𝔼[Xk ∣ ℱk−1] = μ almost surely for all k ∈ ℕ.
In the case that X1, X2, ⋯ are nonnegative random variables, we have the following general result.
Theorem 3
Let 0 ≤ a < b. Assume that 0 < ɛ < b − a and that ηa ≤ ɛ < ηb. Define
and
Then, (5) holds for any μ ∈ (a, b) provided that n > max(N, M).
See Appendix B for the proof. In Theorem 3, our purpose of assuming ɛ < b − a and ηa ≤ ɛ < ηb is to make sure that the absolute error criterion is active for some μ ∈ (a, b) and that the relative error criterion is active for some μ ∈ (a, b). In Table 1, we list sample sizes for b = 1, ɛ = 0.001, η = 0.1, δ = 0.01 and various values of a, where Nmix denotes the sample sizes calculated by virtue of Theorem 3 under the mixed error criterion, and Nabs denotes the sample sizes obtained from the Chernoff–Hoeffding bound. More precisely,
and
| (17) |
where ⌈·⌉ denotes the ceiling function. It can be seen from the table that the sample complexity can be significantly reduced by using a mixed error criterion and our sample size formula.
Table 1.
Table of Sample Sizes
| a | Nmix | Nabs | a | Nmix | Nabs |
|---|---|---|---|---|---|
| 0.001 | 97880 | 499001 | 0.006 | 40765 | 494019 |
| 0.002 | 87393 | 498002 | 0.007 | 34871 | 493025 |
| 0.003 | 76906 | 497005 | 0.008 | 30451 | 492032 |
| 0.004 | 66419 | 496008 | 0.009 | 27013 | 491041 |
| 0.005 | 55932 | 495013 | 0.01 | 24263 | 490050 |
As an immediate application of Theorem 3, we have the following result.
Corollary 1
Let ɛ and η be respectively the margins of absolute and relative error such that
| (18) |
Assume that 0 ≤ Xk ≤ 1 almost surely for all k ∈ ℕ. Then, (5) holds for all μ ∈ (0, 1) provided that
| (19) |
It should be noted that (18) can be readily satisfied in practice, since 0 < ɛ ≪ η < 1 is true in most applications.
An appealing feature of formula (19) is that the resultant sample size is much smaller as compared to that of (15) and (16). Moreover, to apply (19), no approximation is involved and no information of μ is needed. Furthermore, the samples need not be i.i.d.
Under the condition that 0 < ɛ ≪ η ≪ 1, the sample size bound of (19) can be approximated as

n ≈ 2 ln(2/δ) / (ɛη),

which indicates that the required sample size is inversely proportional to the product of the margins of absolute and relative errors. It can be shown that the ratio of the sample size bound of (19) to the bound of (16) converges to 0 as δ decreases to 0, which implies that the bound of (19) is better for small δ.
The comparison of sample size formulae (15) and (19) is shown in Figure 1, where it can be seen that the sample size formula (19) leads to a substantial reduction in sample complexity as compared to (15).
Figure 1.

Comparison of Sample Sizes
To obtain more insight into such a reduction of sample size, we shall investigate the ratio of the sample size given by (15) to that given by (19).
Let ɛ ∈ (0, 1) and η ∈ (0, 1) be such that (18) holds. When no information on μ is available except that μ is known to be bounded in (0, 1), the best known sample size bound is given by (15), which asserts that (3) holds for any μ ∈ (0, 1) provided that (15) holds. According to Corollary 1, we have that (5) holds for any μ ∈ (0, 1) provided that (19) is true. In view of (15) and (19), the ratio of the sample sizes tends to

R(λ) = 1/(4λ), with λ = ɛ/η,

as ɛ → 0 under the restriction that λ is fixed.
From Figure 2, it can be seen that the limiting ratio, R(λ), of the sample sizes is substantially greater than 1 for small λ > 0. For example, if ɛ = 10−5 and η = 0.1, we have λ = 10−4 and R(λ) = 2500. This demonstrates that the required sample size can be significantly reduced by virtue of a mixed error criterion. As mentioned earlier, for small η (e.g., η = 0.1), the requirement (5) can be viewed as a slight relaxation of the requirement (3). Our analysis indicates that such a slight relaxation is well worth the significant reduction in sample complexity.
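The limiting ratio is easy to verify numerically. The sketch below uses the approximation n ≈ 2 ln(2/δ)/(ɛη) for the bound (19) discussed above (our reading of the regime 0 < ɛ ≪ η ≪ 1); under this approximation the logarithmic factors cancel, and the ratio reduces to exactly η/(4ɛ) = 1/(4λ).

```python
from math import log

def n_absolute(eps, delta):
    return log(2.0 / delta) / (2.0 * eps**2)       # bound (15)

def n_mixed_approx(eps, eta, delta):
    return 2.0 * log(2.0 / delta) / (eps * eta)    # approximation of (19)

eps, eta, delta = 1e-5, 0.1, 0.05
lam = eps / eta
print(n_absolute(eps, delta) / n_mixed_approx(eps, eta, delta))  # 2500.0
print(1.0 / (4.0 * lam))                                         # R(lambda) = 2500
```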
Figure 2.

Limit of Ratio of Sample Sizes
In Theorem 3, the random variables X1, X2, ⋯ are assumed to be nonnegative. In light of the fact that, in some situations, the random variables may assume both positive and negative values, we derive an explicit sample size formula for this case in the following result.
Theorem 4
Let a < 0 < b. Assume that 0 < ɛ < b − a and that ɛ < η max(|a|, b). Define
Then, (5) holds for any μ ∈ (a, b) provided that n > M.
See Appendix C for proof.
It should be noted that the advantage of using the mixed error criterion is more pronounced if the interval [a, b] contains 0 and is more asymmetrical about 0. As an illustration, consider the configuration with ɛ = 0.1, η = 0.1 and δ = 0.05. Assume that the lower bound, a, of the interval is fixed at −1 and the upper bound, b, of the interval is treated as a parameter. From formula (15), we know that the sample size required to ensure (3) for any μ ∈ [a, b] can be obtained from (17).
According to Theorem 4, the sample size required to ensure (5) for any μ ∈ [a, b] can be calculated from the definition of M.
Since a is fixed, the ratio of the two sample sizes is a function of b. This function is shown in Figure 3, from which it can be seen that the larger b is, the greater the reduction of sample size that can be achieved by virtue of the mixed error criterion.
Figure 3.

Ratio of Sample Sizes (ɛ = 0.1, η = 0.1, δ = 0.05 and a = −1)
4 Estimating the Difference of Two Population Means
Our method can be extended to the estimation of the difference of the means of bounded random variables. Let Y and Z be two bounded random variables with means μY = 𝔼[Y] and μZ = 𝔼[Z], respectively. Let X = Y − Z and μ = μY − μZ. Let Y1, ⋯, Yn be i.i.d. samples of Y. Let Z1, ⋯, Zn be i.i.d. samples of Z. Assume that the samples of Y and Z are independent. Let Xi = Yi − Zi for i = 1, 2, ⋯, n. Then, X1, ⋯, Xn are i.i.d. samples of X. Clearly, X is a bounded random variable. So are X1, ⋯, Xn. Define

X̄n = (X1 + ⋯ + Xn)/n = Ȳn − Z̄n.

Then, X̄n is an estimator for μ = μY − μZ. We can apply the sample size methods proposed in Section 3 to determine n such that (5) holds.
To illustrate, consider an example with Y bounded in [0, 10] and Z bounded in [0, 1]. Assume that ɛ = 0.1, η = 0.1 and δ = 0.05. Since X = Y − Z is a random variable bounded in the interval [−1, 10], from the discussion in the last section, it can be seen that Theorem 4 can be employed to obtain the minimum sample size as 13,408.
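The construction is straightforward to carry out, as the following sketch shows. The distributions of Y and Z are hypothetical placeholders chosen only so that the code runs; the method itself requires nothing beyond the boundedness of Y and Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 13_408                       # sample size from the example above

# Hypothetical bounded variables: Y in [0, 10], Z in [0, 1].
y = 10.0 * rng.beta(2.0, 5.0, size=n)
z = rng.uniform(0.0, 1.0, size=n)

x = y - z                        # X = Y - Z is bounded in [-1, 10]
print(x.mean())                  # estimator for mu = mu_Y - mu_Z
```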
5 Illustrations
In this section, we shall illustrate the applications of our sample size formulae by examples in control and telecommunication engineering.
An extremely important problem of control engineering is to determine the probability that a system will fail to satisfy pre-specified requirements in an uncertain environment. This critical issue has been extensively studied in an area referred to as probabilistic robustness analysis (see, e.g., [14, 15, 21] and the references therein). In general, there is no effective deterministic method for computing such a failure probability except the Monte Carlo estimation method. To estimate the probability of failure, the uncertain environment is modeled by a random variable Δ, which may be scalar or matrix-valued. Hence, a Bernoulli random variable X can be defined as a function of Δ such that X assumes the value 1 if the system associated with Δ fails to satisfy the pre-specified requirements and assumes the value 0 otherwise. Clearly, the failure probability p is equal to the mean of X. That is, p = 𝔼[X]. For estimating the failure probability p, randomized algorithms have been implemented in the widely used software package RACT [22], in which an absolute error criterion is used for estimating p. Specifically, for a priori ɛ, δ ∈ (0, 1), the objective is to obtain an estimator p̂ such that ℙ{|p̂ − p| < ɛ} ≥ 1 − δ holds regardless of the value of p ∈ (0, 1). The estimator p̂ is defined as

p̂ = (X(Δ1) + ⋯ + X(ΔN))/N,
where N is the sample size and Δ1, Δ2, ⋯, ΔN are i.i.d. samples of Δ. In most situations, there is no useful information about the range of the failure probability p, due to the complexity of the system. Therefore, the determination of the sample size N should not depend on the range of p. It is well known that, to make ℙ{|p̂ − p| < ɛ} ≥ 1 − δ for any p ∈ (0, 1), an approximate sample size based on the normal approximation is

| N ≈ 𝒵δ/2² / (4ɛ²), | (20) |

where 𝒵δ/2 is the critical value such that Φ(𝒵δ/2) = 1 − δ/2, with Φ denoting the cumulative distribution function of the standard normal distribution.
The approximate sample size formula (20) will inevitably lead to unknown statistical error, since the formula (20) is based on the central limit theorem, which is an asymptotic result. In view of this drawback, control theorists and practitioners are reluctant to use the approximate formula (20). To rigorously control the statistical accuracy of the estimation, the Chernoff–Hoeffding bound is most frequently used in control engineering for the determination of sample size. To ensure that ℙ{|p̂ − p| < ɛ} ≥ 1 − δ holds for any p ∈ (0, 1), it suffices to take the sample size

| N = ⌈ ln(2/δ) / (2ɛ²) ⌉. | (21) |
The ratio of the sample size (21) to the sample size (20) is approximately equal to 2 ln(2/δ) / 𝒵δ/2², which tends to 1 as δ → 0. It can be shown that this ratio does not exceed 3/2 for sufficiently small δ. This indicates that in most situations, the ratio of the rigorous sample size (21) to the approximate sample size (20) does not exceed 3/2. From this analysis, it can be seen that it is worthwhile to obtain rigorous control of the statistical accuracy by using the sample size (21), at the price of increasing the computational complexity by up to 50%. This explains why the sample size (21) is frequently used in control engineering. As a matter of fact, the sample size formula (21) is implemented in RACT to estimate the failure probability.
In control engineering, the absolute error criterion is widely used. Recall that in Section 3, we have shown that a much smaller sample size is sufficient if a mixed error criterion is used. More specifically, the sample size can be significantly reduced by introducing η ∈ (0, 1) and relaxing the requirement to

ℙ{|p̂ − p| < ɛ or |p̂ − p| < ηp} ≥ 1 − δ.
In many situations, the margin of absolute error ɛ needs to be very small (e.g., ɛ ≪ 0.1), since p is usually a very small number. However, the margin of relative error η does not need to be extremely small. For example, η = 0.1 may be sufficient for most cases.
As a concrete illustrative example, consider an uncertain dynamic system described by the differential equation
where u(t) is the input, y(t) is the output, and q1, q2, q3 are uncertain parameters. Assume that the tuple (q1, q2, q3) is uniformly distributed over the domain
According to control theory, the system is said to be stable if the output is bounded for any bounded input. It can be shown that such a stability criterion is satisfied if and only if all the roots of the polynomial equation
| (22) |
with respect to s in the field of complex numbers have negative real parts (see, e.g., Section 3.6 of [9] for an explanation of the concept of stability). Since the roots of equation (22) are functions of the random variables q1, q2 and q3, a Bernoulli random variable X can be defined in terms of q1, q2 and q3 such that X assumes the value 0 if all the roots have negative real parts, and otherwise X assumes the value 1. For this particular example, we are interested in estimating the probability that the system is unstable. This amounts to estimating the probability that the Bernoulli random variable X assumes the value 1. Since X is bounded in the interval [0, 1], our sample size formula can be useful for the planning of the Monte Carlo experiment. Let δ = 10−3. If the margin of absolute error is ɛ = 10−3, then the sample size is obtained by (21) as 3,800,452. If we use a mixed criterion with η = 0.1 and the same ɛ and δ, then the sample size can be computed by (19) as 155,463, which is only about 4% of the sample size required under the absolute error criterion. The estimate of the probability of instability is obtained as 0.5403.
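A sketch of this Monte Carlo experiment is given below. Since the differential equation, the uncertainty domain and the polynomial (22) do not survive in this version of the manuscript, the third-order characteristic polynomial and the uncertainty box used in the code are hypothetical stand-ins, included only to illustrate the procedure; in particular, the printed estimate refers to the hypothetical system, not to the value 0.5403 reported above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 155_463                      # mixed-criterion sample size computed by (19)

count_unstable = 0
for _ in range(n):
    # Hypothetical uncertainty box for (q1, q2, q3); the paper's actual
    # domain is not shown in this version.
    q1, q2, q3 = rng.uniform([0.5, 0.5, 0.5], [2.5, 2.5, 2.5])
    roots = np.roots([1.0, q3, q2, q1])    # roots of s^3 + q3*s^2 + q2*s + q1
    if np.any(roots.real >= 0.0):          # unstable: a root outside the open LHP
        count_unstable += 1

print(count_unstable / n)        # Monte Carlo estimate of the instability probability
```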
In wireless data communications, a frequent problem is to evaluate the bit error rate of a data transmission scheme. The bit error rate is the probability that a bit is transmitted incorrectly. In many situations, due to the complexity of the transmission system, the only tool for obtaining the bit error rate is the Monte Carlo simulation method. For example, there is no exact analytical method for computing the bit error rate of a wireless data transmission system employing multiple antennas and space-time block codes. The principle of this transmission system is proposed in [1] (see, e.g., [16] and the references therein for a comprehensive discussion). The wireless data transmission process can be modeled by a sequence of Bernoulli random variables X1, X2, ⋯, where Xi assumes the value 0 or 1 according to whether the i-th bit is transmitted correctly or incorrectly. If X1, X2, ⋯ are independent and identically distributed Bernoulli random variables with the same mean μ ∈ (0, 1), then the bit error rate is μ and its estimator can be taken as X̄n with n being sufficiently large. However, as a consequence of the application of the space-time block codes, the random variables X1, X2, ⋯ are not independent. This gives rise to the following question:
Is it possible to estimate the bit error rate without the independence of the random variables X1, X2, ⋯?
In a wireless data transmission system employing multiple antennas and space-time block codes, the expectation of Xk conditioned upon Xℓ, ℓ < k, is a constant μ with respect to k, since the noise process is stationary and the input data can be treated as a Bernoulli process [1, 16]. This implies that it is reasonable to treat X1, X2, ⋯ as a sequence satisfying the martingale-type condition (2). Hence, despite the lack of independence, the bit error rate can be estimated by X̄n. To control the statistical error, the sample size method proposed in the previous section can be applied to determine the appropriate value of n.
6 Concluding Remarks
In this paper, we have considered the problem of estimating the means of bounded random variables. We have illustrated that in many applications, it may be more appropriate to use a mixed error criterion for quantifying the reliability of estimation. We have demonstrated that, as a consequence of using the mixed error criterion, the sample complexity can be substantially reduced. By virtue of probabilistic inequalities, we have developed explicit sample size formulae for the purpose of controlling the statistical error of estimation. We have attempted to make our results generally applicable by eliminating the need for the i.i.d. assumption on the samples and for knowledge of the form of the underlying distributions.
Research highlights.
A rigorous sample size method for estimating the mean of a bounded random variable.
It requires neither knowledge of the underlying distribution nor an i.i.d. condition on the samples.
It involves no approximation.
Sample complexity can be significantly reduced by using a mixed error criterion.
Explicit sample size formulae to ensure the statistical accuracy of estimation.
Acknowledgments
The author would like to thank the Associate Editor and referees for their time, effort and comments in reviewing this paper.
This research is supported in part by NIH/NCI Grants No. 1 P01 CA116676, P30 CA138292-01, and 5 P50 CA128613.
A Proof of Theorem 2
Note that

| {|X̄n − μ| ≥ ɛ and |X̄n − μ| ≥ ημ} = {|X̄n − μ| ≥ max(ɛ, ημ)}. | (23) |

Since X1, ⋯, Xn are i.i.d. samples of X, it follows from (23) and Chebyshev's inequality that

| ℙμ{|X̄n − μ| ≥ max(ɛ, ημ)} ≤ σ² / (n [max(ɛ, ημ)]²), | (24) |

where σ² denotes the variance of X. Since 0 ≤ X ≤ 1 almost surely and 𝔼[X] = μ, it must be true that

| σ² ≤ μ(1 − μ). | (25) |

Combining (24) and (25) yields

| ℙμ{|X̄n − μ| ≥ max(ɛ, ημ)} ≤ Q(μ) / n, | (26) |

where

Q(μ) = μ(1 − μ) / [max(ɛ, ημ)]²

for μ ∈ (0, 1). Now we investigate the maximum of Q(μ) for μ ∈ (0, 1), with λ = ɛ/η ≤ 1/2, by considering two cases as follows.

Case (i): 0 ≤ μ ≤ λ.

Case (ii): λ < μ ≤ 1.

In Case (i), we have max(ɛ, ημ) = ɛ and

| Q(μ) = μ(1 − μ)/ɛ² ≤ λ(1 − λ)/ɛ², | (27) |

where we have used the fact that μ(1 − μ) is increasing with respect to μ ∈ (0, 1/2]. In Case (ii), we have λ < μ ≤ 1 and

| Q(μ) = (1 − μ)/(η²μ) < (1 − λ)/(η²λ). | (28) |

In view of (27) and (28), we have

| Q(μ) ≤ λ(1 − λ)/ɛ² = (1 − λ)/(η²λ) = (η − ɛ)/(ɛη²) for μ ∈ (0, 1). | (29) |

Making use of (26) and (29), we have

ℙμ{|X̄n − μ| ≥ max(ɛ, ημ)} ≤ (η − ɛ)/(n ɛ η²) ≤ δ whenever n ≥ (η − ɛ)/(ɛη²δ),

from which the theorem immediately follows, since the event in (23) is the complement of the event in (5). This completes the proof of Theorem 2.
Throughout the proofs of Theorems 3 and 4, we shall use the following definitions. Let
Let denote the probability measure associated with θ. Define
where X1, X2, ⋯ are random variables such that a ≤ Xk ≤ b and 𝔼[Xk ∣ ℱk−1] = μ almost surely for all k ∈ ℕ.
B Proof of Theorem 3
To prove the theorem, we need some preliminary results.
Lemma 1
Let ζ ∈ (0, 1). Define
Then, for θ ∈ (0, 1). Moreover, is increasing with respect to and non-increasing with respect to
Proof
For , we have 0 < ζ < 3(1 − θ), it follows from (8) that for . For , we have θ + ζ > 1 and consequently,
Thus, we have shown that for θ ∈ (0, 1). To establish the monotonicity of , it is sufficient to observe that
which is negative for any and positive for any . □
Lemma 2
Let ζ ∈ (0, 1). Define
Then, for θ ∈ (0, 1). Moreover, is non-decreasing with respect to and decreasing with respect to .
Proof
For , we have 0 < ζ < 3θ, it follows from (9) that for . For , we have θ − ζ < 0 and consequently,
Thus, we have shown that for θ ∈ (0, 1). To establish the monotonicity of , it is sufficient to observe that
which is negative for any and positive for any . □
Lemma 3
Let . Define
and
Then, the following assertions hold.
for θ ∈ (0, 1).
If ν* > 0, then is increasing with respect to θ ∈ (0, ν*) and non-increasing with respect to θ ∈ (ν*, 1).
If ν* ≤ 0, then is non-increasing with respect to θ ∈ (0, 1)
Proof
To show assertion (I), note that θ + η(θ − c) > 1 for θ ∈ [r*, 1). Consequently,
for θ ∈ [r*, 1). On the other hand, 0 < η (θ − c) < 3(1 − θ) for θ ∈ (0, r*). Hence, it follows from inequality (8) that
where
with Clearly, 0 < r* < 1 and 0 < ρ(θ) < 1 for θ ∈ (0, r*). This proves assertion (I).
To show assertions (II) and (III), consider the derivative of g(θ) with respect to θ. Let and . Then, ρ(θ) = x − α and
| (30) |
Since θ − c > 0 for θ ∈ (0, 1), it follows from (30) that g′(θ) ≥ 0 if and only if
which is equivalent to θ ≥ ν*. As a consequence of c < 1, we have ν* < r*. It follows that assertions (II) and (III) hold. □
Lemma 4
Define
Then, for all μ ∈ (a, b) provided that n > N.
Proof
For simplicity of notations, define
and for θ ∈ (0, 1). Then, . It suffices to show the lemma for the following three cases.
Case (1):
Case (2):
Case (3):
First, consider Case (1). Clearly, as a consequence of , we have p* ≥ θ*. As a consequence of , we have θ* ≥ ν*. Therefore, it follows from that p* ≥ θ* ≥ ν*. Since , it follows from Lemma 1 that is increasing for θ ∈ (0, θ*]. Hence,
for θ ∈ (0, θ*]. Since θ* ≥ ν*, it follows from Lemma 3 that is decreasing for θ ∈ [θ*, 1). Hence,
for θ ∈ [θ*, 1). Therefore, for θ ∈ (0, 1) provided that . Observing that
we have that for θ ∈ (0, 1) provided that
Next, consider Case (2). As a consequence of , we have θ* < min(p*, ν*). Since , it follows from Lemma 1 that is increasing for for θ ∈ (0, θ*]. Hence,
for θ ∈ (0, θ*]. Since θ* < ν*, it follows from Lemma 3 that is increasing for θ ∈ [θ*, ν*) and is decreasing for θ ∈ [ν*, 1). Hence,
for θ ∈ [θ*, 1). Therefore, for θ ∈ (0, 1) provided that . Since , it follows that for θ ∈ (0, 1) provided that
where we have used the definitions of ν* and c.
Finally, consider Case (3). In this case, we have . Therefore, provided that
This completes the proof of the lemma. □
Lemma 5
Let c ≤ 0. Define
and
Then, the following assertions hold.
for θ ∈ (0, 1).
If ν* ≤ 1, then is non-decreasing with respect to θ ∈ (0, ν*) and decreasing with respect to θ ∈ (ν*, 1).
If ν* > 1, then is non-decreasing with respect to θ ∈ (0, 1).
Proof
Clearly, r* ≥ 0. To show assertion (I), note that θ − η(θ − c) < 0 for θ ∈ (0, r*). It follows that
for θ ∈ (0, r*). On the other hand, 0 < η (θ − c) < 3θ for θ ∈ (r*, 1), it follows from (9) that
where
with . Clearly, ρ(θ) < θ < 1. Since θ > r*, we have ρ(θ) > 0. Hence, 0 < ρ(θ) < 1 for θ ∈ (r*, 1). This establishes assertion (I).
To show assertions (II) and (III), consider the derivative of g(θ) with respect to θ. Tedious computation shows that
| (31) |
Since θ − c > 0 for θ ∈ (0, 1), it follows from (31) that if and only if , which is equivalent to θ ≥ ν*. Direct computation shows that 0 < r* < ν*. It follows that assertions (II) and (III) hold. □
Lemma 6
Define
Then, for all μ ∈ (a, b) provided that n > M.
Proof
For simplicity of notations, define
It can be checked that 0 < θ* < 1. Define for θ ∈ (0, 1). Then, . We need to show the lemma for the following five cases.
Case (i):
Case (ii): and .
Case (iii): and .
Case (iv):
Case (v): Else.
First, consider Case (i). Clearly, as a consequence of , we have q* ≥ θ*. As a consequence of , we have θ* ≥ ν*. It follows from that q* ≥ θ* ≥ ν*. Since q* ≥ θ*, it follows from Lemma 2 that is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*].
Since θ* ≥ ν*, it follows from Lemma 5 that is decreasing for θ ∈ [θ*, 1). Hence,
for θ ∈ [θ*, 1). Hence, for θ ∈ (0, 1) provided that . Observing that , we have that for θ ∈ (0, 1) provided that
Second, consider Case (ii). As a consequence of , we have ν* < 1. Making use of , we have θ* < min(q*, ν*). Since q* > θ*, it follows from Lemma 2 that is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*],
| (32) |
Since θ* < ν*, it follows from Lemma 5 that is increasing for θ ∈ [θ*, ν*) and is decreasing for θ ∈ [ν*, 1). Hence,
| (33) |
for θ ∈ [θ*, 1). Note that
| (34) |
In view of (32), (33) and (34), we have that for θ ∈ (0, 1) provided that Observing that
we have that for θ ∈ (0, 1) provided that the corresponding sample size
where we have used the definitions of ν* and c.
Third, consider Case (iii). As a consequence of
we have r* < 1 < ν* and q* ≥ θ*. Since q* ≥ θ*, it follows from Lemma 2 that is non-decreasing for θ ∈ (0, θ*]. Hence, for θ ∈ (0, θ*],
Since ν* > 1, it follows from Lemma 5 that is non-decreasing for θ ∈ [θ*, 1). Hence,
for θ ∈ [θ*, 1). Note that . Hence, for θ ∈ (0, 1) provided that . Since , it follows that for θ ∈ (0, 1) provided that the corresponding sample size
Now, consider Case (iv). As a consequence of we have r* ≥ 1, which implies that Q−(θ) = 0 for θ ∈ (0, 1). Hence, for θ ∈ (0, 1) for any sample size n ≥ 1.
Finally, consider Case (v). In this case, we have . Therefore, provided that
This completes the proof of the lemma. □
Finally, Theorem 3 can be established by making use of Lemmas 4 and 6.
C Proof of Theorem 4
To prove the theorem, we need some preliminary results.
Lemma 7
Let c ∈ (0, 1). Define and
Then, for θ ∈ (c, 1). Moreover, is non-increasing with respect to θ ∈ (c, 1).
Proof
By the definition of r*, it can be checked that c < r* < 1. Note that θ + η(θ − c) > 1 for θ ∈ (r*, 1). Hence, for θ ∈ (r*, 1). On the other hand, 0 < η(θ − c) < 3(1 − θ) for θ ∈ (c, r*). Thus, it follows from inequality (8) that
where
with . It can be verified that 0 < ρ(θ) < 1 for θ ∈ (c, r*). This shows that for θ ∈ (c, 1).
To show that is non-increasing with respect to θ ∈ (c, 1), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as
| (35) |
We claim that the derivative g′(θ) is positive for θ ∈ (c, 1). In view of (35) and the fact that θ −c > 0 for θ ∈ (c, 1), it is sufficient to show that for θ ∈ (c, 1) in the case that and the case that . By the definition of ρ(θ), we have
| (36) |
for θ ∈ (c, 1). In the case of , using the lower bound of ρ(θ) given by (36), we have for θ ∈ (c, 1). In the case of , using the upper bound of ρ(θ) given by (36), we have
for θ ∈ (c, 1) and η ∈ (0, 1). Thus, we have shown the claim that g′(θ) > 0 in all cases. This implies that is non-increasing with respect to θ ∈ (c, 1). □
Lemma 8
Let c ∈ (0, 1). Define for θ ∈ (c, 1). Then, for θ ∈ (c, 1). Moreover, is decreasing with respect to θ ∈ (c, 1).
Proof
Clearly, 0 < η(θ − c) < 3θ for θ ∈ (c, 1). It follows from inequality (9) that
where
with . Clearly, 0 < ρ(θ) < 1 for θ ∈ (c, 1). This shows that for θ ∈ (c, 1).
To show that is decreasing with respect to θ ∈ (c, 1), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as
| (37) |
We claim that the derivative g′(θ) is positive for θ ∈ (c, 1). In view of (37) and the fact that θ −c > 0 for θ ∈ (c, 1), it is sufficient to show that for θ ∈ (c, 1) in the case that and the case that . By the definition of ρ(θ), we have
| (38) |
for θ ∈ (c, 1). In the case of , using the upper bound of ρ(θ) given by (38), we have for θ ∈ (c, 1). In the case of , using the upper bound of ρ(θ) given by (38), we have
Thus, we have established the claim that g′(θ) > 0 for θ ∈ (c, 1). It follows that is decreasing with respect to θ ∈ (c, 1). This completes the proof of the lemma. □
Lemma 9
Let c ∈ (0, 1). Define for θ ∈ (0, c). Then, for θ ∈ (0, c). Moreover, is increasing with respect to θ ∈ (0, c).
Proof
Note that 0 < η (c − θ) < 3(1 − θ) for θ ∈ (0, c). It follows from inequality (8) that
where
with . Clearly, ρ(θ) > 0 for θ ∈ (0, c). Since c ∈ (0, 1) and η ∈ (0, 3), we have for θ ∈ (0, c). Hence, we have established that for θ ∈ (0, c).
To show that is increasing with respect to θ ∈ (0, c), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given as
| (39) |
We claim that g′(θ) is negative for θ ∈ (0, c). In view of (39) and the fact that θ −c < 0 for θ ∈ (0, c), it suffices to show for the case that and the case that . Note that for θ ∈ (0, c). In the case of , we have
In the case of , we have
Therefore, we have established the claim that g′(θ) < 0 for θ ∈ (0, c). This implies that is increasing with respect to θ ∈ (0, c). The proof of the lemma is thus completed. □
Lemma 10
Let c ∈ (0, 1). Define and
Then, for θ ∈ (0, c). Moreover, is non-decreasing with respect to θ ∈ (0, c).
Proof
Clearly, 0 < r* < c. Note that θ + η (θ − c) < 0 for θ ∈ (0, r*). Hence, for θ ∈ (0, r*). On the other hand, it can be checked that 0 < η(c − θ) < 3θ for θ ∈ (r*, c). It follows from inequality (9) that
for θ ∈ (r*, c), where
with . It can be verified that ρ(θ) > 0 for θ ∈ (r*, c). Since c ∈ (0, 1) and η ∈ (0, 3), we have for θ ∈ (r*, c). Thus, we have shown that for θ ∈ (0, c).
To show that is non-decreasing with respect to θ ∈ (0, c), consider the derivative of g(θ) with respect to θ. Tedious computation shows that the derivative of g(θ) is given by
| (40) |
Note that for θ ∈ (0, c). It follows that
| (41) |
for and θ ∈ (0, c). Moreover,
| (42) |
for and θ ∈ (0, c). Making use of (40), (41) and (42), we have g′(θ) < 0 for θ ∈ (r*, c). So, we have established that is non-decreasing with respect to θ ∈ (0, c). This completes the proof of the lemma. □
Lemma 11
Assume that a < 0 < b. Define
| (43) |
Then, for any μ ∈ (0, b) provided that n > N.
Proof
For simplicity of notations, define ,
Define functions
for θ ∈ (c, 1). For μ ∈ (0, b), putting , we have c < θ < 1 and
To prove the lemma, it suffices to consider the following three cases.
Case (I): .
Case (II): .
Case (III): .
First, consider Case (I). As a consequence of , we have θ* < p*. Since p* > θ*, it follows from Lemma 1 that is increasing for θ ∈ (0, θ*]. Moreover, according to Lemma 7, we have that is non-increasing for θ ∈ [θ*, 1). It follows that
| (44) |
and that
| (45) |
Observing that q* > θ* and making use of Lemma 2, we have that is non-decreasing for θ ∈ (c, θ*]. According to Lemma 8, we have that is decreasing for θ ∈ [θ*, 1). It follows that
| (46) |
and that
| (47) |
Combining (44), (45), (46) and (47), we have
for θ ∈ (c, 1). Observing that
and that
we have and consequently,
for θ ∈ (c, 1). It follows that
for μ ∈ (0, b). This implies that provided that the corresponding sample size
Next, consider Case (II). As a consequence of , we have q* < c. Clearly, p* < q* < c < θ*. By Lemma 1, is non-increasing for θ ∈ (c, θ*]. By Lemma 7, is non-increasing for θ ∈ [θ*, 1). It follows that
| (48) |
Moreover,
| (49) |
Similarly, according to Lemma 2, is decreasing for θ ∈ (c, θ*]. By Lemma 8, is decreasing for θ ∈ [θ*, 1). It follows that
| (50) |
Moreover,
| (51) |
Making use of (48), (49), (50) and (51), we have
for θ ∈ (c, 1). Observing that
and that
we have and consequently,
for θ ∈ (c, 1). It follows that
for μ ∈ (0, b). This implies that provided that the corresponding sample size
Finally, consider Case (III). In this case, we have . Therefore, provided that
This completes the proof of the lemma. □
Lemma 12
Assume that a < 0 < b. Define
| (52) |
Then, for any μ ∈ (a, 0) provided that n > M.
Proof
For simplicity of notations, define ,
Define functions
for θ ∈ (0, c). For μ ∈ (a, 0), putting , we have 0 < θ < c and
To prove the lemma, it suffices to consider the following three cases.
Case (I): .
Case (II): .
Case (III): .
First, consider Case (I). As a consequence of , we have q* < ϑ*. Since p* < q* < ϑ*, it follows from Lemma 9 that is increasing for θ ∈ (0, ϑ*]. According to Lemma 1, is non-increasing for θ ∈ [ϑ*, c). Hence,
| (53) |
Moreover,
| (54) |
Since q* < ϑ*, it follows from Lemma 10 that is non-decreasing for θ ∈ (0, ϑ*]. From Lemma 2, is decreasing for θ ∈ [ϑ*, c). Hence,
| (55) |
Moreover,
| (56) |
Making use of (53), (54), (55) and (56), we have
for θ ∈ (0, c). Observing that
and that
we have and consequently,
for θ ∈ (0, c). It follows that
for μ ∈ (a, 0). This implies that provided that the corresponding sample size
Next, consider Case (II). As a consequence of , we have p* > c. Clearly, q* > p* > c > ϑ*. By Lemma 9, is increasing for θ ∈ (0, ϑ*]. By Lemma 1, is increasing for θ ∈ [ϑ*, c). Hence,
| (57) |
Moreover,
| (58) |
Similarly, by Lemma 10, is non-decreasing for θ ∈ (0, ϑ*]. By Lemma 2, is non-decreasing for θ ∈ [ϑ*, c). Hence,
| (59) |
Moreover,
| (60) |
Making use of (57), (58), (59) and (60), we have
for θ ∈ (0, c). Observing that
and that
we have and consequently,
for θ ∈ (0, c). It follows that
for μ ∈ (a, 0). This implies that provided that the corresponding sample size
Finally, consider Case (III). In this case, we have . Therefore, provided that
This completes the proof of the lemma. □
Lemma 13
Let N and M be defined by (43) and (52), respectively. Then,
| (61) |
Proof
To prove the lemma, it suffices to consider two cases as follows.
Case (A):
Case (B):
In Case (A), as a consequence of , we have
and
Therefore, in Case (A), we have
| (62) |
In Case (B), as a consequence of , we have
and
Therefore, in Case (B), we have
| (63) |
Combining (62) and (63), we have
which implies (61). This completes the proof of the lemma. □
Finally, Theorem 4 can be established by making use of Lemmas 11, 12 and 13.
Contributor Information
Dr. Zhengjia Chen, Email: zchen38@emory.edu, Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322.
Dr. Xinjia Chen, Email: xinjia_chen@subr.edu, Department of Electrical Engineering, Southern University at Baton Rouge, LA 70813.
References
- 1. Alamouti SM. A simple transmit diversity technique for wireless communications. IEEE Journal on Selected Areas in Communications. 1998;16(8):1451–1458.
- 2. Azuma K. Weighted sums of certain dependent random variables. Tôhoku Mathematical Journal. 1967;19(3):357–367.
- 3. Arellano M, Pakkala S, Langston A, Tighiouart M, Pan L, Chen Z, Heffner LT, Lonial S, Winton E, Khoury HJ. Early clearance of peripheral blood blasts predicts response to induction chemotherapy in acute myeloid leukemia. Cancer. 2012;118(21):5278–5282. doi: 10.1002/cncr.27494.
- 4. Bentkus V. On Hoeffding's inequalities. The Annals of Probability. 2004;32(2):1650–1673.
- 5. Chow SC, Shao J, Wang H. Sample Size Calculations in Clinical Trials. 2nd ed. Chapman & Hall; 2008.
- 6. Doob J. Stochastic Processes. Wiley; 1953.
- 7. Desu MM, Raghavarao D. Sample Size Methodology. Academic Press; 1990.
- 8. Fishman GS. Monte Carlo – Concepts, Algorithms and Applications. Springer-Verlag; 1996.
- 9. Franklin GF, Powell JD, Emami-Naeini A. Feedback Control of Dynamic Systems. Pearson Higher Education, Inc.; 2014.
- 10. Gajek L, Niemiro W, Pokarowski P. Optimal Monte Carlo integration with fixed relative precision. Journal of Complexity. 2013;29:4–26.
- 11. Hampel F. Is statistics too difficult? The Canadian Journal of Statistics. 1998;26:497–513.
- 12. Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963;58:13–30.
- 13. Janik M, Hartlage G, Alexopoulos N, Mirzoyev Z, McLean DS, Arepalli CD, Chen Z, Stillman AE, Raggi P. Epicardial adipose tissue volume and coronary artery calcium to predict myocardial ischemia on positron emission tomography-computed tomography studies. Journal of Nuclear Cardiology. 2010;17(5):841–847. doi: 10.1007/s12350-010-9235-1.
- 14. Khargonekar P, Tikku A. Randomized algorithms for robust control analysis and synthesis have polynomial complexity. Proceedings of the IEEE Conference on Decision and Control; 1996.
- 15. Lagoa CM, Barmish BR. Distributionally robust Monte Carlo simulation: a tutorial survey. Proceedings of the IFAC World Congress; 2002.
- 16. Larsson E, Stoica P. Space-Time Block Coding for Wireless Communications. Cambridge University Press; 2003.
- 17. Massart P. The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. The Annals of Probability. 1990;18:1269–1283.
- 18. Mitzenmacher M, Upfal E. Probability and Computing. Cambridge University Press; 2005.
- 19. Motwani R, Raghavan P. Randomized Algorithms. Cambridge University Press; 1995.
- 20. Proakis JG. Digital Communications. McGraw-Hill; 2000.
- 21. Tempo R, Calafiore G, Dabbene F. Randomized Algorithms for Analysis and Control of Uncertain Systems. Springer; 2005.
- 22. Tremba A, Calafiore G, Dabbene F, Gryazina E, Polyak BT, Shcherbakov PS, Tempo R. RACT: Randomized Algorithms Control Toolbox for MATLAB. Proceedings of the IFAC World Congress; Seoul, Korea; July 2008.
- 23. Wang D, Müller S, Amin AR, Huang D, Su L, Hu Z, Rahman MA, Nannapaneni S, Koenig L, Chen Z, Tighiouart M, Shin DM, Chen ZG. The pivotal role of integrin β1 in metastasis of head and neck squamous cell carcinoma. Clinical Cancer Research. 2012;18(17):4589–4599. doi: 10.1158/1078-0432.CCR-11-3127.
- 24. Williams D. Probability with Martingales. Cambridge University Press; 1991.
