ABSTRACT
Two recently introduced model-based bias-corrected estimators for the proportion of true null hypotheses (π₀) under a multiple hypotheses testing scenario have been restructured for random observations under a suitable failure model, available for each of the common hypotheses. Based on stochastic ordering, a new motivation behind the formulation of some related estimators for π₀ is given. The reduction of bias for the model-based estimators is theoretically justified, and algorithms for computing the estimators are also presented. The estimators are also used to formulate a popular adaptive multiple testing procedure. An extensive numerical study supports the superiority of the bias-corrected estimators. The necessity of a proper distributional assumption for the failure data in the context of the model-based bias-corrected method is highlighted. A case study is carried out with a real-life dataset in connection with reliability and warranty studies to demonstrate the applicability of the procedure under a non-Gaussian setup. The results obtained are in line with the intuition and experience of the subject expert. The article concludes with a discussion that also indicates the future scope of the study.
KEYWORDS: Multiple hypotheses testing, adaptive Benjamini–Hochberg algorithm, mean mileage to failure, p-value
2010 MATHEMATICS SUBJECT CLASSIFICATIONS: 62F99, 62P30, 62N99
1. Introduction
The current work considers a segmented failure dataset, where the failure time or some similar entity of a particular component is available for a number of units, but the units are operated or tested in different conditions that may vary over space and time. Thus, the dataset is divided into several segments and observations are available for each segment. The number of observations per segment (of the order of tens or hundreds) might be much less than the number of segments (of the order of hundreds or thousands), as the segmentation is done on the basis of time and space, among other things. The situation is thus quite similar to that of microarray datasets, where thousands of genes are tested to identify differentially expressed genes based on the gene expression levels of two small groups of subjects, viz. a treatment group and a control group. For a segmented failure dataset, similar questions may arise regarding the identification of segment(s) for which the failure pattern of the particular component is strikingly different (worse or better) from a benchmark, say the average. To answer this question, appropriate hypotheses for each segment are framed and tested simultaneously. While testing a large number of hypotheses, control over the false discovery rate (FDR) [1] is desirable, and the classical Benjamini–Hochberg algorithm [1] might be employed to achieve it. However, the power of the Benjamini–Hochberg algorithm, or of any general step-up procedure, can be improved by incorporating a conservative estimate of the proportion of true null hypotheses (π₀), or equivalently of the number of true null hypotheses [2,20].
The Gaussian model assumption for failure data is inappropriate, especially when the sample size corresponding to each segment, or equivalently each test, is small. On the contrary, in such a situation the exponential distribution may be a reasonable primary model choice. Under such an exponential setup, we modify both the estimators proposed in Cheng et al. [6] and Biswas [3] and find that these model-based estimators are more efficient than the existing π₀-estimators in practice. Application of the adaptive Benjamini–Hochberg procedure has the ability to list the significantly different segments with respect to such time-to-event or equivalent entities of a certain component in our case study. For microarray datasets, the model-based approach is well established, especially under the normality assumption [3,6]. In this article, we adapt and implement the same under the exponential model for segmented failure data, where such a model assumption is appropriate and an alternative model formulation may not be satisfactory to the desired extent. In what follows we introduce the parameter π₀ through the empirical Bayesian setup given in Storey [22].
Consider m similar but independent hypotheses to be tested. For i = 1, …, m, an indicator is attached to the ith hypothesis, taking the value 1 if the null is true and 0 if it is false. Thus, these indicators are Bernoulli random variables with success probability π₀. Let m₀ be the number of true null hypotheses. Thus, m₀ is a binomial random variable with index m and parameter π₀. Clearly, the indicators, and hence m₀, remain latent and can never be realized in a given multiple testing scenario. As in the case of a single hypothesis testing problem, the test statistics corresponding to the m hypotheses may be observed. With F₀ denoting the common distribution of the null test statistics and F₁ the same for the non-null ones, a two-component mixture model for the marginal distribution F of a test statistic is
F(t) = π₀ F₀(t) + (1 − π₀) F₁(t).    (1)
Thus, π₀ may be thought of as the mixing proportion of the null test statistics with the non-null test statistics when multiple tests are performed. In the existing literature p-values are considered as test statistics, since their use ensures a similar nature of the critical region irrespective of the nature of the hypotheses framed. Usually, with a slight abuse of notation, a p-value is denoted by p irrespective of whether it is a random variable or a realization of one. The intended usage ought to be understood as the situation demands. The marginal density function of the p-value [13] is
f(p) = π₀ f₀(p) + (1 − π₀) f₁(p),    (2)
where f₀ and f₁ are the p-value densities under the null and alternative hypotheses, respectively. When the tested null is simple and the corresponding test statistic is absolutely continuous, f₀ is simply 1, the density function of a uniform random variable over (0, 1), and the p-value under the alternative hypothesis is stochastically smaller than the uniform variate. In addition, the density-estimation-based approaches for estimating π₀ impose certain restrictions on f₁ [9,13,17]. Often p-values under the alternative are modelled by parametric distributions [15,19] and π₀ is estimated using maximum likelihood methods. This requires the p-values to be independent among themselves, which is rarely satisfied. Storey's estimator [22] is constructed on the basis of a tuning parameter λ under the assumption that non-null p-values do not exceed λ. This assumption introduces a conservative bias in the estimator that can be corrected, or in practice reduced, as has been discussed in Cheng et al. [6]. The setup given therein for the applicability of the Gaussian model-based bias correction is discussed in Section 2. Biswas [3] has recently proposed an alternative model-based bias-corrected estimator for π₀ under the same setup, together with a comparative performance study of both the estimators on simulated microarray datasets. There are several other works on the estimation of π₀ not directly related to the current work; the interested reader is referred to Storey and Tibshirani [23], Wang et al. [26] and Tong et al. [25].
The remaining part of the article is structured as follows. In Section 2, we reproduce Storey's estimator and the recently introduced bias-corrected estimators from a stochastic ordering approach, which ties them together and may inspire further work in a similar line. The next section is devoted to different testing scenarios and useful properties of the respective non-null p-values. In Section 4, we briefly revisit the estimation algorithms and discuss adaptation of the estimates to the Benjamini–Hochberg algorithm. Section 5 deals with the performance comparison of the new estimators with existing ones through an extensive simulation experiment. In Section 6, a real-life synthetic segmented failure dataset is presented, validated for the applicability of the proposed methods, and analysed to demonstrate the superior performance of the adaptive algorithm with the new estimators, along with proper justification of the findings. We conclude the article with a mention of a few limitations of the present work and a glimpse of the future direction of the study.
2. Methods of estimation
Let p denote a p-value corresponding to a simple null hypothesis testing problem with a continuous test statistic. Thus, p has the support (0, 1). Consider another random variable V on the same support with distribution function G. Then,
P(V ≤ p) = E{G(p)} = ∫₀¹ G(t) f(t) dt.    (3)
In the following subsections, we take different choices for G and motivate different estimators for π₀, as mentioned in Section 1.
2.1. Storey's bootstrap estimator and related approaches
Consider V to be degenerate at some λ ∈ (0, 1). Thus,
G(t) = I(t ≥ λ), t ∈ (0, 1).    (4)
Putting (2) and (4) in (3), we obtain
1 − F(λ) = π₀(1 − λ) + (1 − π₀)Q(λ),    (5)
where F is the distribution function of p and Q is the survival function of the non-null p-value. Assume:
A1: Q(λ) = 0 for an appropriate choice of λ, i.e. the probability of a non-null p-value being greater than λ equals zero [22].
When the parameter of interest under the alternative hypothesis is substantially far from the value specified under the null hypothesis, or the sample size is moderate to large, the p-value tends to be small for consistent tests. Hence, even for a moderate choice of λ, the probability of a p-value under a false null exceeding λ vanishes. This is a reasonable but crucial assumption, in the sense that violation of the conditions on the true value of the parameter of interest and on the sample size may not yield Q(λ) = 0. Thus, applying A1 in (5) we get
π₀ = (1 − F(λ))/(1 − λ).    (6)
Let p₁, …, pₘ be the p-values corresponding to the m hypotheses tested, or equivalently m realizations of p. Denote W(λ) = Σᵢ I(pᵢ > λ) (I denoting the indicator function), the number of p-values greater than λ. Putting the plug-in estimator of 1 − F(λ), i.e. W(λ)/m, in (6), an estimator for π₀ depending upon the choice of λ may be suggested as
π̂₀(λ) = W(λ)/{m(1 − λ)}.    (7)
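As an illustration, the estimator in (7) takes a few lines of code. This is a sketch in Python (the paper's numerical work uses R); the function name is ours:

```python
def storey_pi0(pvals, lam=0.5):
    """Storey's plug-in estimator (7): the proportion of p-values
    exceeding lam, rescaled by the width (1 - lam) of the interval
    on which null p-values are uniform."""
    m = len(pvals)
    W = sum(p > lam for p in pvals)   # W(lam): count of p-values above lam
    return W / ((1.0 - lam) * m)
```

Since W(λ)/m estimates 1 − F(λ) and A1 removes the Q(λ) term, the ratio is conservative (biased upward) whenever some non-null p-values actually exceed λ.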
For a given dataset, two different choices of λ would yield two different estimates, and thus an optimum choice of λ for a given dataset is necessary. For a subjectively chosen set Λ of possible values of λ, a bootstrap routine is given in Storey [22] and Storey et al. [24] to approximate the best λ; Storey's bootstrap estimator is the single-λ estimator (7) evaluated at this choice. In Storey and Tibshirani [23], a natural cubic spline has been fitted to the curve of π̂₀(λ) against λ for smoothing, and the value of the fit evaluated at λ = 1 (as motivated in Corollary 1 of [22]) is taken as the final estimate.
For a small choice of λ in (0, 1), the bias of the estimator is large while the variance is small; the situation is exactly the opposite for large λ. This was first noted by Jiang and Doerge [12], who suggested the use of multiple λ's instead of a single best choice in some sense. For the time being, assume a fixed set Λ of k equispaced values of λ, for fixed k, such that A1 holds. The average-estimate-based approach then suggests the mean of the k single-λ estimates as an appropriate estimator for π₀. The authors have also suggested a change-point-based algorithm to select Λ.
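A minimal sketch of the multiple-λ idea, assuming an equispaced grid is supplied by the caller (the change-point selection of Λ is not reproduced here, and the function names are ours):

```python
def storey_pi0(pvals, lam):
    # single-lambda estimator (7)
    return sum(p > lam for p in pvals) / ((1.0 - lam) * len(pvals))

def average_pi0(pvals, lams):
    """Jiang-Doerge style estimator: average the single-lambda
    estimates over a grid of lambda values, balancing the
    small-lambda bias against the large-lambda variance."""
    return sum(storey_pi0(pvals, l) for l in lams) / len(lams)
```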
2.2. Bias correction of Storey's estimator
Without the assumption A1, from (5), we get
π₀ = {1 − F(λ) − Q(λ)}/{1 − λ − Q(λ)}    (8)
for fixed λ. Cheng et al. [6] obtained (8) from a somewhat different motivation. Substitution of the plug-in estimator of 1 − F(λ) has already been discussed in Section 2.1. For estimating Q(λ), the following assumptions are necessary.
A2: The availability of a common test for all the m hypotheses.
A3: The data-arrays used for each test are generated from a known parametric family.
A4: The closed-form distribution of the test statistics under the null is of a known family, enabling the calculation of the exact p-values.
A5: The distribution of the non-null test-statistics and hence the non-null p-values are labelled by unknown effect sizes, which are different for each test.
A2 is generally true for microarray experiments and is also appropriate for the present setup. Cheng et al. [6] assumed normality for each expression level, so that A3 is valid. Here, the times to event, or their equivalent entities, for each segment are assumed to be exponentially distributed, thus satisfying A3. Under normality, the test statistics for the usual single-sample or two-sample tests for the mean are normal under the null. In this work, the test statistic for the single-sample test related to the exponential rate parameter is a chi-square variate under the null, and for the two-sample problem the test statistic is distributed as an F variate. Thus, A4 also holds good. As mentioned earlier, a common test is performed for each hypothesis, and we denote by δᵢ the effect size of the ith test; the non-null distribution of the test statistic, and hence of the corresponding non-null p-value, is labelled by δᵢ, i = 1, …, m. Hung et al. [11] have discussed properties of non-null p-values, where the non-null distribution of the p-value for the Z-test has been explored. For single-sample and two-sample t-tests, a similar discussion is available in Section 3 of Cheng et al. [6]. We will discuss such properties of the non-null p-value for single- and two-sample problems under the exponential setup in Section 3.
Let M = {1, …, m}, and let M₀ ⊆ M denote the set of indices corresponding to the originally true null hypotheses; thus the cardinality of M₀ is m₀. Similarly, denote the set of indices of the originally false null hypotheses by M₁ = M − M₀, with cardinality m₁ = m − m₀. Each null p-value has the same distribution, uniform over (0, 1), while the distributions of the non-null p-values are different but belong to the same family. Let Q(λ | δ) denote the probability of a non-null p-value with effect size δ being greater than λ, and let Q̄(λ) denote the average of these probabilities over the non-null tests. To estimate Q̄(λ), the individual effect sizes are estimated by their maximum likelihood estimates, which are strongly consistent; the estimation of δ under the different testing problems is discussed in Section 3. Each Q(λ | δ) is continuous in δ, so the plug-in estimator of Q(λ | δᵢ) is strongly consistent for each i, and hence so is the average over M₁. In practice, M₁ is unknown and this average is unavailable. Assume m̂₁ to be a suitable dummy for m₁; its computation is discussed in detail in Section 4. Substituting the plug-in estimators for 1 − F(λ) and Q(λ) in (8), we get the bias-corrected estimator for π₀ with a fixed choice of λ. We now address the issues related to reduction in bias and over-correction in the following result.
Result 2.1
With the setup and notations introduced in Section 2.2, for all
Result 2.1 combines claims made in Section 2 and Section 4.2 of Cheng et al. [6]. We have been able to prove Result 2.1 in a more direct way. Thus, the approach reduces the conservative bias of Storey's primary estimator while refraining from over-correction.
These situations are quite usual in a multiple testing setup, as the first quantity is a reasonable estimate of 1 − F(λ) and the second is a consistent estimate of Q̄(λ), which is less than 1 − λ. If these conditions do not hold, the estimate lies outside the parameter space, and we then take it to be the nearest boundary point.
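Reading (8) as an equation in π₀ gives a short sketch of the fixed-λ bias-corrected estimate; here `q_bar_hat` stands for whatever estimate of the non-null tail probability Q(λ) Algorithm 4.1 supplies, and the function name is ours:

```python
def bias_corrected_pi0(pvals, q_bar_hat, lam=0.5):
    """Solve 1 - F(lam) = pi0*(1 - lam) + (1 - pi0)*Q(lam) for pi0,
    with the empirical exceedance proportion in place of 1 - F(lam).
    Estimates outside [0, 1] are projected to the nearest boundary
    point, following the convention of Section 2.2."""
    exceed = sum(p > lam for p in pvals) / len(pvals)   # estimates 1 - F(lam)
    pi0 = (exceed - q_bar_hat) / ((1.0 - lam) - q_bar_hat)
    return min(1.0, max(0.0, pi0))
```

Setting `q_bar_hat = 0` recovers Storey's estimator (7), which makes the direction of the bias correction transparent.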
Λ is taken as in Jiang and Doerge [12] for a similar purpose (see Section 2.2 in [6]), and we identify the following estimator, averaging the fixed-λ bias-corrected estimates over Λ, as the bias- and variance-reduced estimator for π₀:
where |Λ| denotes the cardinality of Λ.
2.3. Estimator based on sum of all p-values
Instead of taking V degenerate at some fixed λ, assume V to be uniformly distributed over (0, 1). Putting G(t) = t for t ∈ (0, 1) in (3), we get
E(p) = π₀/2 + (1 − π₀)e,    (9)
since the null p-value is uniformly distributed over (0, 1) with expectation 1/2. In (9), we use e to denote the expectation of the non-null p-value. From (9), we get
π₀ = {E(p) − e}/(1/2 − e).    (10)
To estimate π₀, both E(p) and e are to be estimated. E(p) can be estimated by the mean of the observed p-values, p̄ = (1/m)Σᵢ pᵢ. The average of the expected p-values under the alternative, e, can be estimated by imitating the approach used for estimating Q̄(λ) under assumptions A2–A5. The corresponding estimator for π₀ has recently been introduced in Biswas [3], where its computation has been demonstrated for single- and two-sample t-tests. Since the expected non-null p-value is bounded and continuous in the effect size, following the discussion in Section 2.2, a strongly consistent estimator for e is available in principle, but it cannot be realized in practice for the obvious reason mentioned earlier and hence cannot be implemented. For m̂₁ being a dummy of m₁ as before, a workable estimator for e, and hence for π₀, is obtained.
Result 2.2
With the setup and notations introduced in Section 2.3
These situations are very natural in a multiple testing setup, as p̄ is consistent for E(p), which is less than 0.5, and similarly the other quantity is consistent for e, which is also less than 0.5. If these do not hold, the estimate lies outside the parameter space, and we then take the estimator for π₀ to be the nearest boundary point.
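Under the same reading, (10) yields a one-line estimator once an estimate of e is available; `e_hat` below stands for that estimate (its construction is the subject of Algorithm 4.2), the function name is ours, and the projection implements the boundary convention just described:

```python
def sum_based_pi0(pvals, e_hat):
    """Solve E(p) = pi0/2 + (1 - pi0)*e for pi0, replacing E(p) by
    the sample mean of all m p-values; projected into [0, 1]."""
    p_bar = sum(pvals) / len(pvals)
    pi0 = (p_bar - e_hat) / (0.5 - e_hat)
    return min(1.0, max(0.0, pi0))
```

No tuning parameter λ appears here: the estimator uses all the p-values through their mean.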
Both the model-based bias-corrected estimators are shown to have a conservative bias for estimating π₀. In Cheng et al. [6], the first has been shown to outperform the robust estimators under reasonable model assumptions, whereas under a similar situation the second outperforms it in terms of mean squared error, as studied empirically in Biswas [3] through an extensive simulation study. Note that both estimators use an initial estimator for π₀, but the computation of the second does not require flexible threshold tuning parameters, owing to the fact that it uses all the p-values. To rule out the possibility of an estimate taking a value outside the parameter space in very unusual situations, it is taken to be equal to the nearest boundary point whenever this happens. Proofs of the results presented in this section are provided in the Appendix.
3. Properties of non-null p-values
To implement the bias-corrected estimators, appropriate estimates of the unknown quantities Q̄(λ) and e are needed. To get explicit expressions for these quantities, we need the probability density function of each non-null p-value with effect size δ; the subscript i on the effect sizes is suppressed in this section for ease of notation. Thus, for the different testing scenarios, we determine the probability density function of the non-null p-value, obtain Q(λ | δ) by integrating it from λ to 1, and finally obtain the corresponding expectation through the following results. As discussed in Section 2.2, for fixed λ, these quantities are continuous in δ under each of the testing problems considered here.
Result 3.1
Let X₁, …, Xₙ be a random sample of size n from an exponential distribution with mean θ. Consider the following testing problem:
(11) For the corresponding likelihood ratio test
Here , and denote the probability density function, the distribution function and the upper-p point of chi-square distribution with ν degrees of freedom, respectively.
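Under the exponential model of Result 3.1, 2ΣXᵢ/θ₀ is a chi-square variate with 2n degrees of freedom under the null, and for even degrees of freedom the chi-square survival function has a closed Poisson-tail form, so the null p-value can be computed without special functions. A sketch follows; the one-sided rejection direction (large values of the statistic as evidence against the null) is an illustrative assumption of ours, and the function names are not from the paper:

```python
import math

def chi2_sf_even_df(t, df):
    """P(chi-square_df > t) for even df = 2n, via the identity
    P(chi2_{2n} > t) = sum_{k=0}^{n-1} e^{-t/2} (t/2)^k / k!."""
    n = df // 2
    lam = t / 2.0
    term, total = math.exp(-lam), 0.0
    for k in range(n):
        total += term
        term *= lam / (k + 1)   # next Poisson term
    return total

def exp_one_sample_pvalue(x, theta0):
    # Under H0: theta = theta0, T = 2*sum(x)/theta0 is chi-square with
    # 2n degrees of freedom (each 2*X_i/theta0 is a chi2_2 variate).
    T = 2.0 * sum(x) / theta0
    return chi2_sf_even_df(T, 2 * len(x))
```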
Result 3.2
Let X₁, …, Xₙ be a random sample from an exponential distribution with mean θ, and consider the testing problem:
(12) For the corresponding likelihood ratio test
The notations used for stating Result 3.1 also remain relevant here. In addition, the truncated chi-square distribution with ν degrees of freedom and the indicated region of truncation is used. Here μ denotes the median of the distribution.
Result 3.3
Let X₁, …, Xₙ₁ and Y₁, …, Yₙ₂ be two random samples of sizes n₁ and n₂, respectively, from exponential distributions with means θ₁ and θ₂. Consider the testing problem
(13) For the corresponding likelihood ratio test, we have the following.
Here , and , respectively, denote the probability density function, the distribution function and the upper-p point of F distribution with and degrees of freedom.
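Analogously, for even degrees of freedom the F survival function reduces to a binomial sum through the incomplete-beta identity, giving the two-sample p-value of Result 3.3 in plain code; the one-sided direction is again our illustrative assumption:

```python
import math

def f_sf_even_df(f, d1, d2):
    """P(F_{d1,d2} > f) for even d1, d2: d1*F/(d1*F + d2) is a
    Beta(d1/2, d2/2) variate, and the integer-parameter Beta CDF
    is a binomial tail sum."""
    a, b = d1 // 2, d2 // 2
    x = d1 * f / (d1 * f + d2)
    n = a + b - 1
    cdf = sum(math.comb(n, j) * x**j * (1 - x)**(n - j)
              for j in range(a, n + 1))
    return 1.0 - cdf

def exp_two_sample_pvalue(x, y):
    # Under H0: theta1 = theta2, the ratio of the sample means is an
    # F variate with (2*n1, 2*n2) degrees of freedom.
    n1, n2 = len(x), len(y)
    F = (sum(x) / n1) / (sum(y) / n2)
    return f_sf_even_df(F, 2 * n1, 2 * n2)
```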
Result 3.4
Consider X₁, …, Xₙ₁ and Y₁, …, Yₙ₂ to be independent random samples of sizes n₁ and n₂ from exponential distributions with means θ₁ and θ₂, respectively. Consider the testing problem
(14) For the corresponding likelihood ratio test, we have
The notations used for Result 3.3 also remain relevant here. In addition, the truncated F-distribution with the indicated degrees of freedom and region of truncation is used. Here μ denotes the median of the distribution. Interested readers may find proofs of the results in the Appendix.
4. Algorithms
The algorithm for computing the first bias-corrected estimator under the normal model assumption is given in Cheng et al. [6]; for the second under the same setup, see Biswas [3]. First, we reframe the algorithms under the current setup to maintain readability and to make the proposed estimation methods readily available to practitioners. For the sake of brevity we only consider the testing problem in Result 3.2 and use the corresponding non-null p-value properties here. For all four situations discussed in Section 3, the following algorithms can be implemented with obvious modifications.
Algorithm 4.1 For computing —
For all i = 1, …, m, estimate the effect size δᵢ by its maximum likelihood estimate δ̂ᵢ.
For all i = 1, …, m and for each λ considered, estimate the upper tail probability of the ith non-null p-value by plugging δ̂ᵢ into the expression given in Result 3.2, where nᵢ denotes the available sample size for testing the ith hypothesis.
Using an available estimator of π₀ as the initial estimator, calculate m̂₁, a dummy for m₁, using the usual box function. Arrange the estimated tail probabilities in increasing order and denote the ith quantity in the ordered list accordingly. Thus a conservative estimator for Q̄(λ) is obtained by averaging m̂₁ of these ordered quantities.
Given the conservative estimator for Q̄(λ), calculate the bias-corrected estimate of π₀ via (8).
Algorithm 4.2 For computing —
For all i = 1, …, m, estimate the effect size δᵢ by its maximum likelihood estimate δ̂ᵢ.
For all i = 1, …, m, estimate the mean of the ith non-null p-value by plugging δ̂ᵢ into the expression given in Result 3.2, where nᵢ denotes the available sample size for testing the ith hypothesis and μ denotes the median of the relevant distribution.
Using an available estimator of π₀ as the initial estimator, calculate m̂₁ as before. Arrange the estimated means in increasing order and denote the ith quantity in the ordered list accordingly. Thus a conservative estimator for e is obtained by averaging m̂₁ of these ordered quantities.
Given the conservative estimator for e, calculate the estimate of π₀ via (10).
Note 1: The role of m̂₁ is important in obtaining the conservative estimators of Q̄(λ) and e described above.
Note 2: For the implementation of both algorithms, we choose Storey's bootstrap estimator as the initial estimator. This choice seems reasonable, albeit non-universal, and further research on it is warranted. The algorithms could also be implemented with other choices of the initial estimator. The performance analysis of the bias-corrected estimators under the current setup requires an extensive simulation study, starting with different choices of the initial estimator. In fact, the algorithms could in principle be iterated several times, each time with the estimate of π₀ from the previous iteration. Obviously, this technique would become computation-intensive for all practical purposes. We refrain from addressing these issues, as they are beyond the scope of the current study.
It has already been mentioned in Section 1 that the Benjamini–Hochberg procedure for controlling the FDR is conservative. To understand this, we briefly discuss the FDR and the algorithm for controlling it at a prefixed level q. While testing m hypotheses simultaneously, let R be the total number of hypotheses rejected by the application of a certain multiple testing algorithm. Among the rejected hypotheses, some may be originally true; these are categorized as false discoveries, and V denotes the total number of such false discoveries. Then the false discovery proportion (FDP) is defined as V/R when R > 0, and 0 otherwise.
Note that, prior to the application of any algorithm, both V and R are random variables, and the expected value of the FDP is termed the FDR. Let p(1) ≤ ⋯ ≤ p(m) be the ordered sequence of the available p-values. The Benjamini–Hochberg procedure identifies the largest k such that p(k) ≤ kq/m and rejects the hypotheses corresponding to p(1), …, p(k). This procedure is conservative, as its implementation ensures FDR ≤ π₀q, which is less than q. To overcome this shortcoming, Craiu and Sun [7] worked with the following adaptive Benjamini–Hochberg procedure, which uses an approximation of π₀.
Algorithm 4.3 Implementing adaptive BH procedure to control FDR at level q —
Let the p-value corresponding to the problem of testing the ith hypothesis be pᵢ for i = 1, …, m. Arrange the available p-values in increasing order: p(1) ≤ ⋯ ≤ p(m). Denote the corresponding hypotheses by H(1), …, H(m).
Given the dataset, estimate π₀; let the estimate be π̂₀.
Compute the adjusted p-value corresponding to :
For all i, reject H(i) if its adjusted p-value is at most q.
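The procedure amounts to a step-up comparison of the ordered p-values with thresholds proportional to kq/(m·π̂₀). A compact sketch of that equivalent form follows (the function name is ours; setting `pi0_hat = 1` recovers the ordinary BH procedure):

```python
def adaptive_bh(pvals, q, pi0_hat=1.0):
    """Reject the hypotheses with the k_max smallest p-values, where
    k_max is the largest k with p_(k) <= k*q / (m * pi0_hat).
    Returns the indices (into pvals) of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / (m * pi0_hat):
            k_max = rank
    return sorted(order[:k_max])
```

A smaller (less conservative) π̂₀ raises every threshold and can only enlarge the rejection set, which is the source of the power gain reported in Section 5.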
The adaptive BH procedure and Storey's q-value approach are shown to be equivalent in Craiu and Sun [7], who also emphasize that both approaches require a good approximation of π₀. Less conservative estimators for π₀ are in demand, since a closer approximation of π₀ makes the adaptive procedure superior by increasing the number of rejections while controlling the FDR at level q, as is evident from Algorithm 4.3. In the numerical study, we use the adjust.p( ) function (available in the cp4p library from Bioconductor) by Gianetto et al. [8] for obtaining the adjusted p-values.
5. Simulation study
We have conducted an extensive simulation study to investigate the performance of the bias-corrected estimators under different settings. The well-known and established estimators, apart from the two proposed ones, considered for performance comparison are listed below.
: Storey's bootstrap estimator (discussed in Section 2.1)
: Convest estimator [13]
: Jiang and Doerge's average estimator (discussed in Section 2.1)
: Natural cubic spline smoothing-based estimator (discussed in Section 2.1)
: Histogram-based estimator [16]
: A robust estimator of [18]
: Sliding linear model-based estimator (Wang et al. [26]).
5.1. Simulation setting
We imitate a segmented time-to-event dataset to generate artificial datasets. For this purpose, we choose m = 100, 500, 1000 segments. Two different settings are considered: a balanced setting with the sample size for each segment equal to 35, and an unbalanced setting with different sample sizes 15, 25, 35, 45, 55 for equal numbers of segments. For fixed π₀, we calculate m₀ and take m₁ = m − m₀. We set the mean failure time under the null as unity. For m₀ randomly chosen segments we fix the mean at the null value, and for the remaining cells in the array θ of mean lifetimes we generate values through a stochastic mechanism ensuring that they are not equal to the null mean. For segments with better average lifetime the mean exceeds the null value, and for segments with poor average lifetime it falls below it. We take the proportion of better (or poor) non-null mean lifetimes to be 0.5.
After generating the array of parameters θ, we generate a sample of the appropriate size from the exponential distribution with the corresponding mean for each segment. Thus a dataset with m rows and a varying number of columns is generated, where each row corresponds to observations from a particular segment, and m₀ of the segments (fixed by the choice of π₀) originally have mean lifetime equal to the null value. From each row of the dataset we obtain a p-value by applying the appropriate test, and construct a p-value array of length m to compute the bias-corrected estimators from Algorithms 4.1 and 4.2. The other estimators are computed using the estim.pi0 R-function (available in the cp4p library). Algorithm 4.3 also uses this array of p-values and an estimate of π₀ to identify the significantly different segments with control over the FDR.
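The generation step can be sketched as follows; the factor-of-two separation between null and non-null means and the fixed seed are illustrative choices of ours, not the paper's exact stochastic mechanism:

```python
import random

def simulate_segmented_data(m=100, n=35, pi0=0.6, theta0=1.0, seed=1):
    """m segments of n exponential lifetimes each: a proportion pi0
    of segments keeps the null mean theta0; the rest are split
    equally between 'better' (larger mean) and 'poor' (smaller mean)
    segments, mimicking the balanced setting of Section 5.1."""
    rng = random.Random(seed)
    m0 = round(pi0 * m)
    thetas = [theta0] * m0 + [theta0 * (2.0 if j % 2 == 0 else 0.5)
                              for j in range(m - m0)]
    rng.shuffle(thetas)
    # expovariate takes the rate 1/theta, giving mean theta
    data = [[rng.expovariate(1.0 / th) for _ in range(n)] for th in thetas]
    return thetas, data
```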
5.2. Simulation results
Under the different settings mentioned in Section 5.1, each experiment is repeated N = 1000 times and the estimators are compared through empirical bias and mean squared error (MSE). We also check whether the adaptive BH algorithms using the different estimators are conservative, by simulating FDR values as a function of π₀ for the adaptive algorithms. Here we define the power of a multiple testing algorithm as the proportion of rejected nulls among the originally false null hypotheses. The non-adaptive BH algorithm and its different adaptive versions are also compared with respect to power. The comparative study for m = 100 under the balanced setting is provided in Figure 1; the results for the other simulation settings are provided in Figures 1–5 of the supplementary material. From Figure 1, it can be seen that the bias-corrected estimators beat the other estimators over a significant region of the parameter space, while performing close to each other, so that their performance may be considered approximately equivalent. Thus, using the bias-corrected estimators for small to moderate values of π₀ brings significant improvement, while for larger values of π₀ they remain a viable alternative. Similar comments may be made for the MSE. Additionally, we point out that the bias correction genuinely reduces the bias of the initial estimator over a significant portion of the parameter space. As expected, the mean squared errors of the different estimators increase with increasing m/n ratio, while the relative performance of the proposed bias-corrected estimators improves as the same ratio increases. However, the gain from improved estimation of π₀ needs to be elaborated. Precise estimation of π₀ is used in the adaptive algorithm for identifying significant segments, as mentioned in Section 4. For lower to moderate values of π₀, the adaptive versions result in a substantial gain in power: the percentage relative gains in power of the adaptive BH over the non-adaptive version are 41%, 27% and 17% for π₀ = 0.2, 0.4 and 0.6, respectively. A marginal gain is observed for larger values of π₀ (8% for π₀ = 0.8).
It is evident that the bias-corrected estimators outperform the others for lower to moderate values of π₀, where it really matters, as pointed out from Figure 1. For higher values of π₀, the effect of the bias correction is present but to a lesser extent: when almost all the null hypotheses are true, the correction terms are close to 0, and the bias correction does not work as effectively as it does for lower to moderate values of π₀. The FDR of all the adaptive BH algorithms is seen to be controlled below 0.1, while the non-adaptive BH is the most conservative and the adaptive BH based on the bias-corrected estimate is the least conservative. Similar conclusions can be drawn from the results of the other simulation settings reported in the supplementary material.
Figure 1.
Bias and mean squared error of the estimators (two dashed line), (long-dashed line), (dot-dashed line), (solid line), (dotted line), (dashed line), ( -marked line), (o-marked line), (Δ-marked line) and comparison of false discovery rate and power between ordinary BH (+-marked line) and the adaptive BH with the specified estimators for m = 100 under balanced setting.
6. Data analysis
For the case study we have considered the real-life data (suitably camouflaged, after taking care of the data confidentiality issue) used by Gupta et al. [10] in connection with reliability and warranty studies. A detailed description of the data is available there, and we report only the part relevant to the present study. The date of failure of a particular component of automobiles, along with the mileage at failure as reflected through the odometer readings, is available. Although the entire dataset covers two disjoint geographical regions, as reported in Gupta et al. [10], the failures may be further subdivided into seven sub-regions, termed zones. Owing to data confidentiality, let us number them from 1 to 7. We have considered the failure data corresponding to a particular year, with the mileage figures of the failures in successive calendar months across the zones as the response variable. The twelve calendar months are recorded as JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV and DEC. Thus the entire dataset is inherently segmented by 7 zones and 12 months in a year; in other words, our synthetic dataset contains mileage at failure for 84 different month×zone segments. In total, a little less than 3000 component failures were reported in the year considered, with varying numbers of warranty claims over the month×zone segments; thus, on average, around 35 failures are reported in each segment. In line with the discussion in Section 1, we are primarily interested in identifying the segments which have significantly better or poorer performance in terms of mileage at failure in comparison to a benchmark. Thus, appropriate hypotheses need to be framed and tested separately for each of the segments, making way for the application of adaptive FDR-controlling algorithms.
To validate our model assumption we perform the Kolmogorov–Smirnov test with empirical p-values for the exponential distribution (using the R-function ks.exp.test available in the exptest package) and find that, at level 0.05, exponentiality is rejected for only 18 out of the 84 segments, whereas at level 0.01 there are only 7 rejections. QQ-plots of some randomly chosen segments are given in Figure 6 of the supplementary material. As the sample sizes for most of the segments are moderate, we also check normality by applying the Shapiro–Wilk test (using the default R-function shapiro.test). At level 0.05, 59 out of 84 hypotheses are rejected, and at level 0.01 the number is 42. The first finding justifies the applicability of the model-based estimators for π₀ discussed in this article, whereas the results of the normality test demonstrate the necessity of the modifications achieved through this work over the existing related works, which are usually carried out under a normal model.
Now we consider the framing of appropriate hypotheses. We assume the mileages at failure for the ith segment to be exponentially distributed with mean θᵢ miles. Thus, θᵢ is the mean mileage to failure (MMTF) for the ith segment, a quantity similar to the mean time to failure (MTTF) in terms of the response variable 'mileage', for i = 1, …, 84. As an indicator of the benchmark, we consider the MMTF of the entire dataset, say θ₀ miles, as our null hypothesis point. This value as a benchmark seems justified, as the warranty mileage limit for the database is 36,000 miles and it is well known that such failure data are usually positively skewed. According to the research question, we then simultaneously test the following hypotheses:
H₀ᵢ : θᵢ = θ₀  against  H₁ᵢ : θᵢ ≠ θ₀,  for i = 1, 2, …, 84.  (15)
The two-sided choice of the alternative hypotheses at all the segments needs clarification. In the absence of any prior knowledge about the functioning of the component, it is not possible to mark any segment as better or worse than the overall benchmark in terms of MMTF. As a result, to be on the safe side, we have taken the alternative hypotheses at all segments to be two-sided. This is very common in multiple testing situations. As an example, in microarray data analysis, the samples used as a reference are called control samples. The other samples exhibiting different phenotypic status are called treated samples. The gene expression levels among these groups may be different. To identify whether a gene is differentially expressed or not, we fix a two-sided alternative [5].
Likelihood ratio tests are performed for each of the hypotheses after scaling the original observations by θ₀ (which maintains equivalence of the test), and the corresponding p-values, along with effect sizes for each test, are stored for further use. Table 1 of the supplementary material provides details under the following heads:
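For intuition, one such scaled segment test can be sketched as follows (Python rather than the paper's R; the equal-tailed construction of the two-sided p-value is our assumption — the only fact taken from the text is that, after scaling by θ₀, the statistic T = Σᵢxᵢ/θ₀ follows a Gamma(n, 1) law under the null and the effect size is estimated by the scaled sample mean):

```python
import numpy as np
from scipy import stats

def segment_test(x, theta0):
    """Two-sided test of H0: theta = theta0 for exponential data x.

    After scaling by theta0, T = sum(x)/theta0 ~ Gamma(n, 1) under H0.
    An equal-tailed two-sided p-value is used here (our assumption of
    the exact construction); delta_hat = mean(x)/theta0 is the MLE of
    the effect size delta = theta/theta0."""
    n = len(x)
    t = np.sum(x) / theta0
    cdf = stats.gamma.cdf(t, a=n)          # Gamma(shape=n, scale=1) CDF
    pval = 2 * min(cdf, 1 - cdf)           # equal-tailed two-sided p-value
    delta_hat = np.mean(x) / theta0
    return pval, delta_hat

rng = np.random.default_rng(1)
x_null = rng.exponential(scale=1.0, size=35)   # theta equals theta0
x_alt = rng.exponential(scale=2.0, size=35)    # theta is twice theta0
p_null, d_null = segment_test(x_null, theta0=1.0)
p_alt, d_alt = segment_test(x_alt, theta0=1.0)
```

Running the function over all 84 segments yields exactly the arrays of p-values and effect-size estimates described next.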
segment: Provides the serial number of the segment, 1 to 84, such that segment i corresponds to the ith month of zone 1 for i = 1, …, 12, segment 12 + i to the ith month of zone 2 for i = 1, …, 12, and so on for the 7 zones in the order mentioned in the first paragraph of this section.
n: Provides the available sample size for each segment.
pval: Provides the p-value of the common likelihood ratio test performed for each segment.
del: Provides the maximum likelihood estimate of the effect size corresponding to each test.
This array of values can be readily fed into Algorithms 4.1–4.3 to obtain the estimates of π₀ and the list of rejected hypotheses when the adaptive FDR-controlling algorithm is applied with different π₀-estimates. Estimates of π₀ based on the estimators already mentioned in this article, along with the corresponding lists of rejected hypotheses, are also reported. The estimated π₀-values using the different estimators are reported in the second column of Table 1.
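The adaptive rejection step that consumes a π₀-estimate is the standard one: run the Benjamini–Hochberg procedure at the inflated level q/π̂₀. A minimal sketch (the p-values below are hypothetical, chosen only to show how a smaller π̂₀ enlarges the rejection set):

```python
import numpy as np

def adaptive_bh(pvals, q=0.05, pi0_hat=1.0):
    """Adaptive Benjamini-Hochberg: plain BH run at level q / pi0_hat.

    Returns a boolean rejection mask; pi0_hat = 1 recovers the
    non-adaptive BH procedure."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = (np.arange(1, m + 1) * q) / (m * pi0_hat)
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        reject[order[: k + 1]] = True      # step-up: reject all smaller p-values
    return reject

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
n_plain = int(adaptive_bh(p, q=0.05, pi0_hat=1.0).sum())   # non-adaptive BH
n_adapt = int(adaptive_bh(p, q=0.05, pi0_hat=0.5).sum())   # adaptive BH
```

On this toy array the non-adaptive procedure rejects 2 hypotheses while the adaptive version with π̂₀ = 0.5 rejects 7, mirroring the gain in detections reported for Table 2.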
Table 1.
Estimates of π₀ using different estimators under different model assumptions for the synthetic data.
| Estimators | Exponential model | Normal model |
|---|---|---|
| | 0.5555 | 0.4961 |
| | 0.5565 | 0.4559 |
| | 0.5761 | 0.5111 |
| | 0.6074 | 0.5707 |
| | 0.5555 | 0.4497 |
| | 0.6662 | 0.5730 |
| | 0.8065 | 0.6786 |
| | 0.4842 | 0.0000 |
| | 0.5096 | 0.0000 |
In Table 2, we indicate the segments that are found to be significantly different from the average in terms of the mean mileage at failure of the designated component when the adaptive BH-algorithm with different π₀-estimators and the non-adaptive (N/A) BH-algorithm are applied to control FDR at levels q = 0.05, 0.1. For visual display, we plot the adjusted p-values for the non-adaptive Benjamini–Hochberg algorithm and its adaptive version with cut-off q = 0.1 (see Figure 7 of the supplementary material). From Table 2 it is evident that the adaptive BH-algorithm using the proposed methods has the ability to identify a larger number of segments with significant variation from the benchmark while controlling the FDR at the same level, compared to the non-adaptive BH-algorithm as well as the adaptive BH-algorithm using existing estimators for π₀.
Table 2.
Significantly different segments identified by the adaptive BH-algorithm with different estimators of π₀ for the synthetic data.
| Segment | Zone | Month | mean (θ̂) | 95% CI(θ) | N/A | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 2 | APR | 1.44 | (1.12, 1.93) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 18 | 2 | JUN | 1.29 | (1.01, 1.70) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 20 | 2 | AUG | 1.47 | (1.11, 2.04) | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 0 |
| 21 | 2 | SEP | 1.33 | (1.01, 1.82) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 23 | 2 | NOV | 1.69 | (1.30, 2.29) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 38 | 4 | MAR | 0.55 | (0.37, 0.92) | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 39 | 4 | APR | 0.50 | (0.34, 0.79) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 40 | 4 | MAY | 0.58 | (0.40, 0.92) | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 41 | 4 | JUN | 0.63 | (0.44, 0.95) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 46 | 4 | OCT | 0.64 | (0.45, 0.98) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 52 | 5 | APR | 1.77 | (1.10, 3.33) | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 56 | 5 | AUG | 1.82 | (1.14, 3.32) | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
| 62 | 6 | FEB | 0.66 | (0.53, 0.84) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 63 | 6 | MAR | 0.60 | (0.48, 0.78) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 64 | 6 | APR | 1.27 | (1.03, 1.61) | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 69 | 6 | SEP | 0.67 | (0.54, 0.85) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 74 | 7 | FEB | 0.20 | (0.11, 0.51) | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| 76 | 7 | APR | 0.48 | (0.20, 0.48) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Notes: Values in the fourth and fifth columns are reported after scaling the original variable by θ₀. For columns 6 to 15, 2 indicates that the segment is found significant both for q = 0.05 and (trivially) for q = 0.1, 1 indicates significance only for q = 0.1, and 0 indicates neither. The segments not reported in this table are not found to be significantly different from the overall average, taken as the null hypothesis point.
From the domain knowledge (not to be disclosed explicitly, owing to the confidentiality issue), it is known that the functioning of the automobile component under consideration is likely to be influenced by the climate condition, reflected through the effect of the month, as well as by the effect of the zone of its usual operation. The effect of climate on the functioning of automobiles is well known and has also been reported in Lawless [14]. For simplicity and demonstration purposes, we assume that each automobile is used only in the designated zone where the failure is reported. Although we have used the two-sided alternative, as is done in any multiple testing problem, the corresponding confidence interval falling entirely below or above the scaled null hypothesis point of ‘unity’ indicates the actual one-sided alternative for which the respective significance appears. Thus, the MMTFs of zone 4 are consistently and significantly smaller than the benchmark value (the null hypothesis point), indicating a usage-related adverse problem of the automobiles; this problem is persistent in the first or second quarter of the year, indicating a transition from colder to warmer climate, or in the fourth quarter of the year, indicating the transition from warmer to colder climate. Interestingly, for zone 2 exactly the converse situation prevails, and this seemingly high MMTF might not be due to the climate condition; on the contrary, it may be attributed to a better usage scenario. For zone 5, a better usage scenario is evident in at least two months, although weather-related issues might not be associated with such improvement. The findings for zone 6 are heavily dependent on the climate condition, especially during the advent of spring, where a significant decrease in MMTF is identified followed by a significant increase in MMTF just after. Again, during the fall a significant decrease in MMTF is found, establishing the climate dependence of the failure data.
For zone 7, climate plays an adverse role during the end of the winter and the start of the summer. The data corresponding to the remaining two zones do not reveal any deviation from the usual usage pattern and/or are not affected by extreme climate conditions. It is to be noted that for almost all the zones, the month of April becomes significant concerning either betterment or worsening of the scenario in comparison to the benchmark. On the other hand, the two months of December and January never become markedly different from the benchmark at any of the locations. This might be attributed to the fact that in the winter the relatively colder temperature does not affect all the zones, while a transition in temperature, as observed in April, may play a decisive role in operating conditions in almost all the zones. Zones 1 and 3 never figure in the list, and no marked deviation from the benchmark in any climate condition (non-rejection of the null hypothesis in all seasons) is observed. This homogeneity might be attributed to the fact that these two zones correspond to a relatively warmer climate and hence climate dependence of the operating conditions is not present there. Although we have to suppress the zone identities for confidentiality reasons, the findings are corroborated by the domain knowledge experts.
To conclude this section, we emphasize the appropriateness of the model-based bias-correction approach. We explore a ‘what-if’ type scenario and try to assess the validity of the findings if we assume the mileages at failure in each segment to be normally distributed, instead of making the exponential assumption. Our main objective remains the same, i.e. to identify the significantly different segments with respect to the MMTF values. If we assume that the mileages at failure for the ith segment are normally distributed with mean μᵢ and variance σᵢ², the testing problem is still the same as in (15). We perform a two-sided one-sample t-test for each of the segments and obtain the array of p-values over all the segments. Computation of the robust estimates may be done as mentioned before, but for the bias-corrected estimators we follow the algorithms given in Cheng et al. [6] and Biswas [3] instead of Algorithms 4.1 and 4.2, for obvious reasons. The estimates of π₀ under the normality assumption are reported in the third column of Table 1. The robust estimators are seen to underestimate π₀. When the means of the exponential distributions are taken as the means of the normal distributions, the sample means, being the estimators under both model assumptions, overestimate the normal means. Since the overestimation of the normal means makes the null hypotheses appear false, the observed p-values are smaller compared to those under the exponential case. The robust estimators of π₀ are non-decreasing functions of the p-values, and hence the underestimation of π₀. The bias-corrected estimators get disrupted owing to the inappropriate model assumption, and hence misleading effect sizes, upper tail probabilities and expectations of the non-null p-values are obtained. The problem of overestimation of the normal means transcends to overestimation of effect sizes for the one-sample t-tests. The inflated estimates of the effect sizes result in large estimates of the upper tail probabilities and the expected non-null p-values.
As a result, the numerators in the bias-corrected estimators usually turn out to be zero or negative. Hence both the bias-corrected estimates in Table 1 are zero. Thus appropriate model-based bias-correction appears efficient, bringing out more power in the adaptive algorithms, while the findings may be misleading when the method is applied without adequate confidence in the underlying model assumption. As a result, the necessary modification of the bias-correction technique under the exponential model seems to be the only way out, particularly while dealing with multiple testing problems arising from segmented failure data, usually encountered in survival and reliability studies.
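The mechanism can be illustrated with a small seeded simulation (Python, illustrative only): when every null is true and the exponential model is correct, the model-based p-values are exactly uniform, so the simple sum-of-p-values reading 2·p̄ of π₀ sits near 1, whereas t-test p-values computed on the same skewed data need not behave this way (the size and direction of their distortion depend on n and the skewness):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m, n, theta0 = 2000, 35, 1.0
exact_p, t_p = [], []
for _ in range(m):
    x = rng.exponential(scale=theta0, size=n)  # every null hypothesis true
    t = x.sum() / theta0                       # T ~ Gamma(n, 1) under H0
    c = stats.gamma.cdf(t, a=n)
    exact_p.append(2 * min(c, 1 - c))          # model-based two-sided p-value
    t_p.append(stats.ttest_1samp(x, popmean=theta0).pvalue)

exact_p, t_p = np.array(exact_p), np.array(t_p)
# Under the correct model the null p-values are uniform, so 2*mean(p) ~ 1;
# the t-test p-values on skewed data give no such guarantee.
pi0_exact = 2 * exact_p.mean()
pi0_t = 2 * t_p.mean()
```

This is only a sketch of the misspecification effect, not the paper's bias-corrected estimators themselves, whose collapse to zero additionally involves the inflated effect-size estimates described above.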
7. Discussion
We have approached the problem of estimating π₀, and thus the construction of an adaptive FDR-controlling procedure, from suitable model assumptions and a common test for all the hypotheses to be tested. Within the framework suggested in Cheng et al. [6] and Biswas [3], we have tried to develop methods for estimation of π₀ under the exponential model and presented a simple adaptive Benjamini–Hochberg algorithm in a spirit similar to Craiu and Sun [7], which is shown to be more efficient than its counterparts for simulated as well as real-life synthetic data. The current work also motivates Storey's bootstrap estimator for π₀ and the π₀-estimator based on the sum of all p-values. The cases of V being degenerate at some λ and V being uniformly distributed over (0, 1) have also been discussed. This may motivate other choices of V for further study of model-based estimators. A study on V having a negatively skewed density function over (0, 1), which tries to give more importance to the p-values corresponding to true null hypotheses, and the construction of new estimators are presently under consideration for future work. In the current work, it has been assumed that the p-values corresponding to the true null hypotheses are uniformly distributed. However, if there are composite null hypotheses, as in one-sided hypothesis testing scenarios, the p-values corresponding to the true null hypotheses are stochastically larger than a uniform variate. Superuniform p-values make the proposed estimators conservative due to the increased values of the quantities entering them. The results and methods proposed in this work do not address the issues related to superuniformity of null p-values. Though the results presented in the current work strengthen the foundations of bias-corrected estimation of π₀ in general, the distinguishing feature of this work lies in the innovative application of a multiple testing procedure to segmented failure data.
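The superuniformity phenomenon mentioned above is easy to exhibit numerically. A seeded Python sketch (the one-sided exponential test construction here is our assumption): when the true mean lies strictly inside a composite one-sided null, p-values computed at the boundary pile up near 1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, n, theta0 = 2000, 20, 1.0
# One-sided test of H0: theta <= theta0 vs H1: theta > theta0, with the
# p-value computed at the boundary value theta0. The data are generated
# with the true mean strictly inside the null (theta = 0.5 * theta0).
pvals = []
for _ in range(m):
    x = rng.exponential(scale=0.5 * theta0, size=n)
    t = x.sum() / theta0                 # small when theta < theta0
    pvals.append(stats.gamma.sf(t, a=n))  # reject for large T
pvals = np.array(pvals)
mean_p = pvals.mean()  # well above 0.5: null p-values are superuniform
```

Any estimator that grows with the observed p-values is therefore pushed upward, i.e. made conservative, exactly as stated in the discussion.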
To the best of our knowledge, such procedures have never been applied to answer the kind of research questions framed in Section 6 related to large-scale industrial data. In this work, however, we have focused on presenting and motivating a simple yet powerful technique for identifying significantly different segments in terms of the performance of the automobile, and on exploring the effect of the zone of operation coupled with climate, under the exponential model assumption. The synthetic data explored in this work pose several other issues that may be solved by the application of modified methods, which are to be formulated in the future.
This analysis of the real-life synthetic data is based on one year of data and may be carried out on the basis of monthly or even weekly data associated with the component failures. Owing to the limited number of such failure data in each segment, one has to use standard failure models like the exponential or Weibull. Instead, if one uses the usual Gaussian model to describe the failure pattern, then one is expected to commit a gross mistake and, consequently, a false perception of the MMTF may be reached. This issue has been addressed with the same failure data. Instead of the exponential model, the normal probability model has been used, and the test for equality of the respective means in all 84 segments with the same null hypothesis point representing the benchmark, as done in the usual multiple testing procedure, has been attempted. Interestingly, the test for normality fails miserably in the majority of the 84 segments, and hence a conclusion on the basis of the test for MMTF with reference to the benchmark under normality will give a wrong signal about the true status of the MMTF in those segments.
This work only focuses on controlling FDR by adaptive Benjamini–Hochberg algorithms with two new estimators for the proportion of true null hypotheses. It would be interesting to study control of the family-wise error rate (FWER) by adaptive procedures [21] with the π₀-estimators discussed here. We have demonstrated through simulation experiments that the new π₀-estimators devise conservative procedures. The proof is also given in an asymptotic setting. Still, it is desirable to prove the same under finite sample considerations, and future research on this aspect is warranted. Sarkar [20] and Blanchard and Roquain [4] provide sufficient conditions on the π₀-estimator for proving control over FDR by the corresponding adaptive algorithms. Almost all the recently proposed estimators for π₀, including the two taken up in this work, lack such structural simplicity, and hence the desired result can only be verified through finite sample simulation experiments [3,5,24,26]. The estimators taken up in this work need an initial estimate of π₀. The current work does not focus on a simulation-based choice of the initial estimate. However, the choices are not expected to be universal, and in this regard one may follow the routine presented in Biswas [3] for identifying the working initial estimator. The choice of initial estimate in the current work is justified by Cheng et al. [6] and Biswas [3]. As the proposed method makes assumptions regarding the distribution of the mileage to failure data, we should accept the fact that the proposed estimators are not universally suitable in all situations. At the same time, the multiple testing problem in a non-Gaussian framework seems to be novel and may cover all parametric models for scenarios where non-negative valued random variables are appropriate. In such a framework, we have introduced two simple estimators for π₀ which simultaneously reduce the bias and variance of the existing estimator over a relatively important part of the parameter space.
The behaviour of these estimators is studied through extensive simulation studies, and the new estimators are shown to be more precise under some practical assumptions in comparison to those available in the existing literature. The involvement of numerical or Monte Carlo integration for each segment makes the proposed method rather computation intensive. This extra labour is expected to be compensated by the gain in precision of the analysis, thus meaningfully addressing the multiple testing problem in a non-Gaussian setup.
Supplementary Material
Acknowledgments
The authors would like to acknowledge the editor, associate editor and the anonymous reviewers for their suggestions and comments that led to the current improved version. The authors would also like to acknowledge Prof. Sanat K. Sarkar of Temple University for helpful discussion and useful suggestions. The third author acknowledges the contribution of Mr Soumen De, formerly of General Motors Tech Center India (GMTCI), Bengaluru, India, for his association with the previous work based on the synthetic data, used here.
Appendix.
Proof of Result 2.1.
Consider,
Note that, g is non-increasing in x for . Let and . Since, , , which proves . Now,
as . Thus,
Note that, as . Hence as .
Proof of Result 2.2.
We consider g as in Result 2.1 and assume , b = 0.5. Since, , , which proves . Here,
as . Thus,
Note that, as . Hence, as .
Proof of Result 3.1.
The likelihood ratio test corresponding to the hypothesis in Result 2.1 uses the test-statistic . Effect size of the test . As , an unbiased estimator of δ is , the sample mean.
As we reject for larger observed value of T, the corresponding p-value is defined as since under , . Therefore, , under . Under , and therefore the density function of T, labelled by δ is
(A1) From the relation between T and p, . The corresponding absolute Jacobian of transformation is . Thus from (A1), the density function of p labelled by δ is
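The displayed densities did not survive extraction here. A plausible reconstruction, consistent with the surrounding derivation (T = Σᵢ Xᵢ/θ₀ with effect size δ = θ/θ₀, so that T ~ Gamma(n, δ) and, under H₀, T ~ Gamma(n, 1)), is the following; the notation F̄₁ for the null survival function of T is our assumption:

```latex
% Plausible reconstruction (not the original typesetting) of (A1)-(A2).
% Under theta = delta * theta_0, T = sum_i X_i / theta_0 ~ Gamma(n, delta):
f_\delta(t) \;=\; \frac{t^{\,n-1}\, e^{-t/\delta}}{\Gamma(n)\,\delta^{\,n}},
\qquad t > 0. \tag{A1}

% With p = P_{H_0}(T \ge t) = \bar{F}_1(t), so t = \bar{F}_1^{-1}(p) and the
% absolute Jacobian is |dt/dp| = 1/f_1(t), the density of p under delta is
g_\delta(p) \;=\; \frac{f_\delta\!\big(\bar{F}_1^{-1}(p)\big)}
                       {f_1\!\big(\bar{F}_1^{-1}(p)\big)},
\qquad 0 < p < 1. \tag{A2}
```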
(A2) For upper tail probability labelled by δ, using (A2) in expression of we get
By change of variable from p to v such that we get
which proves the result in . For an explicit expression for expected p-value under the false null, we apply
By change of variable from p to v such that , we get
Proof of Result 3.2.
The corresponding likelihood ratio test uses the same test-statistic as in Result 3.1 and thus part of Result 3.2 follows directly from part of Result 3.1. For the next part, it should be noted that due to two-sided alternative hypotheses, the corresponding p-value is defined through T, where
Here is the p-value defined for the testing problem in Result 2.1. Thus from part of Result 3.1, we have
(A3) Now for any ,
which proves the result in .
Now, we consider the problem of evaluating the integral . By change of variable from p to v such that , we get
Following the same steps for evaluating , can also be evaluated and thus the result in .
Proof of Result 3.3.
The likelihood ratio test corresponding to the hypothesis in Result 3.3 uses the test-statistic . Effect size of the test . As , an unbiased estimator of δ is and thus the result in follows. Rest of the proof follows from the proof of Result 3.1 with obvious changes.
Proof of Result 3.4.
For the testing problem in Result 3.4, the likelihood ratio test uses the same test-statistic as in Result 3.3. Since the critical region is two-sided, the corresponding p-value is similarly defined as in Result 3.2. One can follow the steps elaborated through the proof of Result 3.2 and use Result 3.3 to easily prove Result 3.4.
Disclosure statement
No potential conflict of interest was reported by the authors.
Code availability statement
The necessary R codes for computing the estimators and are available at https://github.com/aniketstat/EstPi0Exp2021.
References
- 1. Benjamini Y. and Hochberg Y., Controlling the false discovery rate: a practical and powerful approach to multiple testing, J. R. Statist. Soc. Ser. B (Statist. Methodol.) 57 (1995), pp. 289–300.
- 2. Benjamini Y., Krieger A.M., and Yekutieli D., Adaptive linear step-up procedures that control the false discovery rate, Biometrika 93 (2006), pp. 491–507.
- 3. Biswas A., Estimating the proportion of true null hypotheses based on sum of p-values and application in microarray data, Commun. Stat. Simul. Comput. (2020), pp. 1–15. doi: 10.1080/03610918.2020.1800036.
- 4. Blanchard G. and Roquain E., Adaptive false discovery rate control under independence and dependence, J. Mach. Learn. Res. 10 (2009), pp. 2837–2871.
- 5. Chen J.J., Wang S.J., Tsai C.A., and Lin C.J., Selection of differentially expressed genes in microarray data analysis, Pharmacogenomics J. 7 (2007), pp. 212–220.
- 6. Cheng Y., Gao D., and Tong T., Bias and variance reduction in estimating the proportion of true-null hypotheses, Biostatistics 16 (2015), pp. 189–204.
- 7. Craiu R.V. and Sun L., Choosing the lesser evil: trade-off between false discovery rate and non-discovery rate, Stat. Sin. 18 (2008), pp. 861–879.
- 8. Gianetto Q.G., Combes F., Ramus C., and Gianetto M.Q.G., Package ‘cp4p’ (2019). Available at https://cran.r-project.org/web/packages/cp4p/cp4p.pdf.
- 9. Guan Z., Wu B., and Zhao H., Nonparametric estimator of false discovery rate based on Bernštein polynomials, Stat. Sin. 18 (2008), pp. 905–923.
- 10. Gupta S.K., De S., and Chatterjee A., Some reliability issues for incomplete two-dimensional warranty claims data, Reliab. Eng. Syst. Saf. 157 (2017), pp. 64–77.
- 11. Hung H.J., O'Neill R.T., Bauer P., and Kohne K., The behavior of the p-value when the alternative hypothesis is true, Biometrics 53 (1997), pp. 11–22.
- 12. Jiang H. and Doerge R.W., Estimating the proportion of true null hypotheses for multiple comparisons, Cancer Inform. 6 (2008), pp. 25–32.
- 13. Langaas M., Lindqvist B.H., and Ferkingstad E., Estimating the proportion of true null hypotheses, with application to DNA microarray data, J. R. Statist. Soc. Ser. B (Stat. Methodol.) 67 (2005), pp. 555–572.
- 14. Lawless J.F., Statistical analysis of product warranty data, Int. Stat. Rev. 66 (1998), pp. 41–60.
- 15. Markitsis A. and Lai Y., A censored beta mixture model for the estimation of the proportion of non-differentially expressed genes, Bioinformatics 26 (2010), pp. 640–646.
- 16. Nettleton D., Hwang J.G., Caldo R.A., and Wise R.P., Estimating the number of true null hypotheses from a histogram of p-values, J. Agric. Biol. Environ. Stat. 11 (2006), pp. 337–356.
- 17. Ostrovnaya I. and Nicolae D.L., Estimating the proportion of true null hypotheses under dependence, Stat. Sin. 22 (2012), pp. 1689–1716.
- 18. Pounds S. and Cheng C., Robust estimation of the false discovery rate, Bioinformatics 22 (2006), pp. 1979–1987.
- 19. Pounds S. and Morris S.W., Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values, Bioinformatics 19 (2003), pp. 1236–1242.
- 20. Sarkar S.K., On methods controlling the false discovery rate, Sankhyā: Indian J. Stat. Ser. A 70 (2008), pp. 135–168.
- 21. Sarkar S.K., Guo W., and Finner H., On adaptive procedures controlling the familywise error rate, J. Stat. Plan. Inference 142 (2012), pp. 65–78.
- 22. Storey J.D., A direct approach to false discovery rates, J. R. Stat. Soc. Ser. B (Stat. Methodol.) 64 (2002), pp. 479–498.
- 23. Storey J.D. and Tibshirani R., SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays, in The Analysis of Gene Expression Data, Springer, New York, NY, 2003, pp. 272–290.
- 24. Storey J.D., Taylor J.E., and Siegmund D., Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach, J. R. Statist. Soc. Ser. B (Stat. Methodol.) 66 (2004), pp. 187–205.
- 25. Tong T., Feng Z., Hilton J.S., and Zhao H., Estimating the proportion of true null hypotheses using the pattern of observed p-values, J. Appl. Stat. 40 (2013), pp. 1949–1964.
- 26. Wang H.Q., Tuominen L.K., and Tsai C.J., SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures, Bioinformatics 27 (2011), pp. 225–231.