Abstract
There are many applications in which a statistic follows, at least asymptotically, a normal distribution with a singular or nearly singular variance matrix. A classic example occurs in linear regression models under multicollinearity, but there are many others. There is well–developed theory for testing linear equality constraints when the alternative is two–sided and the variance matrix is either singular or non–singular. In recent years there has been considerable, and growing, interest in developing methods for situations in which the estimated variance matrix is nearly singular. However, there is no corresponding methodology for addressing one–sided, i.e., constrained or ordered, alternatives. In this paper we develop a unified framework for analyzing such problems. Our approach may be viewed as trimming or winsorizing the eigenvalues of the corresponding variance matrix. The proposed methodology is applicable to a wide range of scientific problems and to a variety of statistical models in which inequality constraints arise. We illustrate the methodology using data from a gene expression microarray experiment obtained from the NIEHS’ Fibroid Growth Study.
Keywords: Constrained and ordered inference, generalized inverse, Moore–Penrose inverse, (nearly) singular covariance matrix, modified likelihood ratio test (mLRT)
1. Introduction
A common problem in many applications is to compare two or more ordered experimental groups in terms of one or more outcome variables. For example, a toxicologist may be interested in comparing different dose groups in terms of several outcomes such as body weight, red blood cell count, hematocrit and so forth. Often the statistical problem reduces to drawing inferences regarding a parameter θ using a statistic Sn which, under suitable standardization and regularity conditions, is asymptotically normally distributed, i.e.,
√n (Sn − θ) ⇒ Np(0, Σ)  (1.1)
as n → ∞, where ⇒ denotes convergence in distribution. The statistic Sn may be a vector of differences among means, a collection of rank statistics or an estimator of a regression parameter. In general, Σ is unknown and must be estimated from the data; furthermore Σ may be non–singular, singular or nearly singular. By nearly singular we mean that its condition number, i.e., the ratio of its largest to its smallest eigenvalue, is extremely large. As an example, near singularity arises in regression models due to multicollinearity (Silvey 1969, Montgomery and Peck 2007). It is well known that common statistical methods, especially those based on matrix inversion, perform poorly when Σ is nearly singular and cannot be used directly when Σ is singular.
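Near singularity is easy to diagnose numerically from the spectrum of the (estimated) variance matrix. The sketch below, written in Python with NumPy (our choice of language; the original analyses used MATLAB), builds a hypothetical design with two nearly collinear columns and computes the condition number of the resulting Gram matrix. All names and the 10⁻⁶ perturbation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical regression design with two nearly collinear columns;
# the 1e-6 perturbation is an illustrative assumption.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 1e-6 * rng.normal(size=100)      # almost an exact copy of x1
X = np.column_stack([np.ones(100), x1, x2])

Sigma = X.T @ X                             # Gram matrix of the design
eigvals = np.linalg.eigvalsh(Sigma)         # ascending order
cond = eigvals[-1] / eigvals[0]             # largest / smallest eigenvalue
print(f"condition number = {cond:.3e}")     # extremely large
```

With the near-duplicate column the condition number is of the order 10¹², far beyond the 10⁹ figure reported for qHTS assays below.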
Unconstrained statistical inference when the underlying variance matrix is singular is well studied in the context of linear regression (Khatri 1968, Rao and Mitra 1971 and Rao 1972), where Σ is typically known up to a constant. However, in many applications Σ must be estimated from the data. Moreover, Σ and/or its estimator, denoted here by Σn, may be singular and, additionally, their rank may not be known in advance. Unconstrained inference in such settings has been addressed by numerous authors in the statistical and econometric literature; examples include Moore (1977), Andrews (1987), Hadi and Wells (1990, 1991), Lutkepohl and Burda (1997), Dufour and Valery (2015) and Duplinskiy (2015). We note that in nonlinear regression singularity or near singularity of the variance matrix is rather common. For example, in the context of quantitative high throughput screening (qHTS) assays, where thousands of chemicals are evaluated for toxicity using cell lines, Lim et al. (2012) noticed that the condition number of the information matrix may be very large, sometimes as large as 10⁹ or higher. Hadi and Wells (1990) provide examples of nonlinear models for which the information matrix is singular for some values of the model parameters; in fact, these are often the values specified by the null hypothesis. Another mechanism by which singular variance matrices arise is when θ = h(η), η is estimated by η̂, and the Jacobian of h is rank deficient. This leads, by the δ–method, to a singular variance matrix.
When the experimental groups are naturally ordered researchers are often interested in performing one–sided tests, in which case the parameter space under the alternative hypothesis is not a linear subspace but a convex cone, denoted by 𝒞. Tests for equality against an ordering are often formalized as
H0 : θ ∈ 𝒱  versus  H1 : θ ∈ 𝒞 \ 𝒱  (1.2)
where 𝒱 is a linear subspace and 𝒞 is a closed convex cone. In practice 𝒱 is often the singleton {0} and 𝒞 is often defined by a finite set of linear inequalities. The theory for such testing problems is well developed, cf. Silvapulle and Sen (2005), when both Σ and its estimator Σn are non–singular. Models, both parametric and nonparametric, in which the variance matrix is known to be singular and in which testing (1.2) is of interest arise in a wide range of scientific problems, such as dose response studies. A concrete example will be discussed in more detail later on. We are not aware of a general theory or methodology for ordered inference when Σ and/or Σn are singular or nearly singular. In this paper we address this open class of problems in a principled manner. Pioneered by Dixon (1960) and Tukey (1962), trimming and winsorizing are two well–known strategies for dealing with extreme observations, or outliers, and are widely used in robust statistics (Huber and Ronchetti, 2009). Inspired by their use in classical data analysis, in this paper we develop trimming and winsorizing based approaches to address near singularities in multivariate data. In this setting the role of outliers is played by the eigenvalues associated with a large condition number, i.e., the eigenvalues which are different from zero but much smaller than the largest eigenvalue. These are exactly the eigenvalues which create challenges when analyzing multivariate data. Our proposed methodology builds on and extends the existing methods developed for unconstrained singular models referenced earlier.
The paper is organized in the following way. In Section 2 we develop a modified likelihood ratio test assuming a known singular variance matrix. In Section 3 we extend the methodology to the case of unknown, possibly singular or nearly singular, variance matrices by introducing trimmed and winsorized tests. Since these two tests depend upon a user–specified threshold parameter, we also introduce corresponding data–driven supremum type tests which do not require any input from the data analyst. The performance of the proposed tests in terms of their type I errors and powers was evaluated using an extensive simulation study; the study design and results are summarized in Section 4. In Section 5 we describe the NIEHS Fibroid Growth Study (FGS) of Peddada et al. (2008) and illustrate our proposed methodology by reanalyzing gene expression data obtained in the FGS. Although there have been studies in which single genes have been correlated with tumor size (cf. Grandhi et al., 2016), to the best of our knowledge there are no studies in which a collection of genes, or a pathway, was correlated with tumor size; thus this appears to be the first paper to perform such a multivariate analysis. Section 6 provides a brief summary and further discussion. Additional results, as well as all proofs, are provided in the online supplementary text.
2. Known singular variance matrix
Since Σ is symmetric and nonnegative definite we may write its spectral decomposition as Σ = EΛET where E is an orthogonal matrix whose columns are the eigenvectors of Σ and Λ is a diagonal matrix with nonnegative elements λ1, …, λp which are the eigenvalues of Σ. Without loss of generality assume that λ1 ≥ λ2 ≥ … ≥ λp. Moreover if Σ is singular then q = rank(Σ) < p and thus λi > 0 when i ≤ q and λi = 0 when i ≥ q + 1. It is well known that the Moore–Penrose generalized inverse of Σ, denoted by Σ+, is equal to EΛ+ET where Λ+ is a diagonal matrix with elements 1/λi if i ≤ q and 0 otherwise. Let M be the q × p matrix obtained by dropping the last p − q rows of the matrix (Λ+)1/2ET, which are easily verified to be identically zero. Finally let 𝒮 denote the column space of Σ. We shall index by ‘*’ the intersection of any set with 𝒮; thus we define 𝒱* = 𝒱 ∩ 𝒮 and 𝒞* = 𝒞 ∩ 𝒮. Clearly 𝒱* is a linear space, and 𝒞* and M𝒞* are convex cones.
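The quantities E, Λ, Σ+ and M defined above can be computed directly from an eigendecomposition. The following Python/NumPy sketch is our own illustration on a toy rank–2 matrix: eigenvalues below a numerical tolerance are treated as zero, and the last p − q rows of (Λ+)1/2ET are dropped to form M.

```python
import numpy as np

def spectral_pinv(Sigma, tol=1e-10):
    """Moore-Penrose inverse Sigma+ = E Lambda+ E^T together with the
    q x p matrix M obtained by keeping the q rows of (Lambda+)^{1/2} E^T
    that correspond to nonzero eigenvalues (q = rank(Sigma))."""
    lam, E = np.linalg.eigh(Sigma)          # ascending eigenvalues
    lam, E = lam[::-1], E[:, ::-1]          # descending, matching the text
    q = int(np.sum(lam > tol))              # numerical rank
    inv = np.zeros_like(lam)
    inv[:q] = 1.0 / lam[:q]
    Sigma_plus = E @ np.diag(inv) @ E.T
    M = (np.sqrt(inv)[:, None] * E.T)[:q]   # drop the p - q zero rows
    return Sigma_plus, M, q

# Toy rank-2 variance matrix (an assumption for illustration only)
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0]])
Sp, M, q = spectral_pinv(Sigma)
print(q)                                        # rank is 2
print(np.allclose(Sp, np.linalg.pinv(Sigma)))   # matches numpy's pinv
print(np.allclose(M.T @ M, Sp))                 # Sigma+ = M^T M
```

The last line verifies the identity Σ+ = MTM, which is what makes M useful for transforming the problem to a lower dimensional space.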
The following theorem defines, and then describes, the distribution of the modified likelihood ratio test (mLRT) assuming a normal model with a known singular variance matrix. We use the qualifier “modified” for the LRT because the singular normal distribution does not have a density and therefore the LRT does not exist.
Theorem 2.1
Consider testing (1.2) when Y ∼ Np(θ, Σ) and Σ is known and singular. Assume that 0 ∈ support(Y). Then the mLRT exists and is given by
T = min_{θ∈𝒱*} (Y − θ)TΣ+(Y − θ) − min_{θ∈𝒞*} (Y − θ)TΣ+(Y − θ)  (2.1)
Furthermore, in the notation of Silvapulle and Sen (2005), under the null the statistic (2.1) is distributed as χ̄²(Iq, 𝒞M), where 𝒞M = M𝒞*, which is defined by
χ̄²(Iq, 𝒞M) = XTX − min_{η∈𝒞M} (X − η)T(X − η)  (2.2)
where X is a Nq(0, Iq) random variable (RV) and Iq is the q × q identity matrix.
Theorem 2.1 generalizes several results published in the literature. Firstly, it generalizes Theorem 3.7.1 in Silvapulle and Sen (2005) to the case where Σ is not a full rank matrix. Next, it generalizes Lemmas 1 and 2 in Moore (1977), originally discovered by Khatri (1968) and Rao and Mitra (1971), from two–sided to one–sided hypotheses. In the proof of Theorem 2.1, given in the supplementary materials, it is shown that there are two equivalent forms for the mLRT, (2.1) and (2.2); the distribution of the mLRT is an immediate consequence of this equivalence. In other words it is shown that testing (1.2) with a singular Σ is equivalent to testing H0 : η = 0 versus H1 : η ∈ 𝒞M based on a random variable (RV) X distributed as Nq(η, Iq), where 𝒞M is a cone. Thus, the original problem is solved by the usual procedure but in a lower dimensional space. Effectively the test is carried out as if q = rank(Σ) were known, but we need not specify it in advance. This feature is of practical importance when Σ is unknown as in Section 3. It is also important to note that Theorem 2.1 remains valid if we replace the Moore–Penrose inverse Σ+ with any other generalized inverse Σ− and choose, instead of M as defined above, any q × p matrix N which satisfies Σ− = NTN.
The statistic (2.1) may be computed using low–rank quadratic programming (cf. Boyd and Vandenberghe, 2004) where the cone 𝒞* is given in Proposition 2.1 below. In some applications there may be an interest in computing (2.2) directly, which requires expressing the cone 𝒞M in standard form, that is as {x : Dx ≥ 0} for some matrix D. In some cases the cone 𝒞M may be found by simple and direct analysis. In general, though, this is not the case. Fortunately, a broadly applicable scheme for explicitly finding the inequalities defining 𝒞M is available and known as the double description method (Avis and Fukuda, 1992). Explaining the underlying key ideas requires some additional notation. Assume for now that 𝒞 is a polyhedral pointed convex cone, i.e., 𝒞 = {θ : Cθ ≥ 0} for some restriction matrix C with a finite number of rows. This form is referred to as the H–representation of 𝒞 (H for half–spaces). Alternatively, by the Minkowski–Weyl Theorem (cf. Proposition 3.12.10 in Silvapulle and Sen 2005), 𝒞 can also be expressed as
𝒞 = {λ1θ1 + ⋯ + λmθm : λi ≥ 0, i = 1, …, m},

where the set {θ1, …, θm} contains the extreme rays of 𝒞. This form is referred to as the V–representation of 𝒞 (V for vertices). Recall that an extreme ray of a convex cone 𝒞 is an element θ ∈ 𝒞 such that if θ = θ1 + θ2 with θ1, θ2 ∈ 𝒞 then θ = λθ1 or θ = λθ2 for some λ ≥ 0. The set of extreme rays of 𝒞 is denoted by extr(𝒞). Clearly if θ is an extreme ray so is λθ for all λ > 0; thus for convenience we pick only one element from each such equivalence class. The extreme rays are exactly those elements which are necessary for generating the cone. Both the H–representation and the V–representation may be redundant, i.e., the matrix C may include unnecessary rows and the set {θ1, …, θm} may contain unnecessary rays. Such redundancies can always be removed. The conversion from the H–representation to the V–representation is known as the vertex enumeration problem and the reverse as the facet enumeration problem; see Lauritzen (2011) for a beautiful account of the theory, which involves polyhedral geometry and computational linear algebra. Both conversions are used for computing the standard form of the cone 𝒞M.
Proposition 2.1
Let 𝒞 = {θ : Cθ ≥ 0} and let L and S be matrices the rows of which are bases for 𝒱 and 𝒮 respectively. Then
where and .
By Proposition 2.1, 𝒞M is the set of conic combinations of the finite set extr(M𝒞*), where M𝒞* = {Mθ : θ ∈ 𝒞*}. The H–representation of 𝒞* is given in Proposition 2.1, hence finding its extreme rays requires the solution of a vertex enumeration problem. Proposition 2.1 thereby provides a V–representation of 𝒞M from which an H–representation is found via a facet enumeration problem. In some settings intermediate redundancy–removal steps may be applied. Although Proposition 2.1 addresses only convex polyhedral cones, it can, with suitable modifications, be generalized to arbitrary convex cones.
Finally, it is well known that if 𝒞M is a closed convex cone then

P(χ̄²(Iq, 𝒞M) ≥ c) = Σ_{j=0}^{q} wj P(χj² ≥ c),  c > 0,

where wj, j = 0, …, q, are nonnegative weights that sum to unity and are functions of the constraints specified in the alternative hypothesis (χ0² denotes a point mass at zero). The weights can be computed in closed form when q ≤ 4 and 𝒞M is the orthant or some other simple cone. Otherwise the weights, or the entire distribution, may be estimated by simulation as discussed in detail by Silvapulle and Sen (2005).
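When the cone is the nonnegative orthant the projection needed in (2.2) has a closed form (the componentwise positive part), so the chi–bar–squared null distribution is easy to simulate. The sketch below (Python/NumPy, our own illustration) checks one known feature of the mixture: the weight w0 of the point mass at zero equals 2⁻q.

```python
import numpy as np

# Simulate the chi-bar-squared statistic for the orthant cone in R^q:
# X^T X - min_{theta >= 0} ||X - theta||^2  =  sum_i max(X_i, 0)^2.
rng = np.random.default_rng(1)
q, B = 3, 200_000
X = rng.normal(size=(B, q))
chibar = np.sum(np.maximum(X, 0.0) ** 2, axis=1)

# The mixture places mass 2^{-q} at zero (all coordinates nonpositive).
w0 = np.mean(chibar == 0.0)
print(w0)      # close to 2^{-3} = 0.125
```

For less tractable cones the same Monte Carlo strategy applies with the projection step replaced by a quadratic program.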
3. Unknown and possibly singular variance matrix
Theorem 2.1 shows how to test (1.2) when (1.1) holds and Σ is a fixed known singular matrix. However, in most situations Σ is unknown and thus needs to be estimated along with θ. Let Σn denote an estimator of Σ which is assumed to be symmetric and nonnegative definite. We will further assume that Σn is consistent and that
bn (Σn − Σ) ⇒ Q  (3.1)
where Q is a matrix valued RV and bn → ∞ is a sequence of constants; usually bn = √n. Note that Σn need not be singular for some, or even all, finite n even if Σ is singular. Conversely Σn may be singular even if Σ is not. As noted in Andrews (1987), if rank(Σn) ≠ rank(Σ) then quadratic forms associated with Σn may not converge to those associated with Σ due to the discontinuity of generalized inverses, i.e., XnTΣn+Xn may not converge to XTΣ+X even if Xn ⇒ X and Σn → Σ in probability. Hence, quadratic form based statistics, such as the projection tests common in order restricted inference, may not converge to their assumed limits.
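The discontinuity noted by Andrews (1987) is easy to see numerically: the Moore–Penrose inverse of a sequence of matrices need not converge to the Moore–Penrose inverse of the limit when the rank drops. A minimal Python/NumPy illustration (our own toy example):

```python
import numpy as np

# As delta -> 0 the matrix diag(1, delta) converges to diag(1, 0),
# but its pseudo-inverse diag(1, 1/delta) diverges; the pseudo-inverse
# of the limit is diag(1, 0).  Rank changes break continuity.
for delta in (1e-2, 1e-4, 1e-6):
    A = np.diag([1.0, delta])
    print(np.linalg.pinv(A)[1, 1])          # equals 1/delta, blowing up

limit_pinv = np.linalg.pinv(np.diag([1.0, 0.0]))
print(limit_pinv[1, 1])                     # 0.0, not the limit of 1/delta
```

This is exactly the failure mode that the trimming of Section 3.1 is designed to prevent: small estimated eigenvalues are set to zero so that the rank stabilizes.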
3.1. Trimmed tests
To overcome the above noted problem we trim the variance matrix by dropping all eigenvalues smaller than some pre–specified ε. Formally,
Definition 3.1
Let ε > 0. We refer to the matrix ΣT(ε) = EΛT(ε)ET, where ΛT(ε) is the diagonal matrix obtained from Λ by replacing any element of Λ which is smaller than ε by 0, as the ε–trimmed variance matrix. Its Moore–Penrose inverse, i.e., (ΣT(ε))+, is referred to as the ε–trimmed inverse of Σ and denoted by Σ+(ε).
For convenience we now drop the subscript T whenever possible. Let Σn(ε) denote the trimmed estimated variance matrix; its Moore–Penrose inverse, i.e., the matrix Σn+(ε), equals EnΛn+(ε)EnT where Λn+(ε) is a diagonal matrix whose elements are 1/λi,n if λi,n ≥ ε and 0 otherwise. Dufour and Valery (2015) refer to Σn+(ε) as the spectral cut–off Moore–Penrose inverse. We now introduce a distance based test for (1.2) which is motivated by the mLRT for normal means given in (2.1) and the above definition, i.e., for each n we define the trimmed constrained test statistic as
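A sketch of the ε–trimmed inverse in Python/NumPy (our own implementation of Definition 3.1; the toy matrix and threshold are illustrative):

```python
import numpy as np

def trimmed_pinv(Sigma, eps):
    """epsilon-trimmed (spectral cut-off) Moore-Penrose inverse:
    eigenvalues below eps are replaced by 0 before inverting."""
    lam, E = np.linalg.eigh(Sigma)
    inv = np.zeros_like(lam)
    keep = lam >= eps
    inv[keep] = 1.0 / lam[keep]
    return E @ (inv[:, None] * E.T)

# Toy nearly singular matrix (illustrative): the 1e-8 eigenvalue is trimmed.
Sigma = np.diag([2.0, 1.0, 1e-8])
Sp = trimmed_pinv(Sigma, eps=1e-3)
print(np.diag(Sp))      # [0.5, 1.0, 0.0]
```

Without the trim, the naive inverse would contain an entry of order 10⁸ driven entirely by the noise eigenvalue.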
Tn(εn) = n [ min_{θ∈𝒱n*} (Sn − θ)TΣn+(εn)(Sn − θ) − min_{θ∈𝒞n*} (Sn − θ)TΣn+(εn)(Sn − θ) ]  (3.2)
where 𝒱n* = 𝒱 ∩ 𝒮n, 𝒞n* = 𝒞 ∩ 𝒮n and 𝒮n is the column space of Σn(εn). Note that by definition Tn(εn) is a function of the threshold εn. Since in general the value of the smallest eigenvalue may not be known in advance it is appropriate to allow εn to vary with n. If εn → 0 we denote Tn(εn) by Tn. The limiting distribution of Tn is given below; a discussion of the statistic Tn(ε) for a fixed ε is deferred to Section 3.2.
Theorem 3.1
Consider testing (1.2). Suppose that (1.1) holds and the limit distribution satisfies the conditions of Theorem 2.1. In addition suppose that Σn satisfies (3.1). Then under H0 and for any sequence of cut–off values {εn} satisfying εn → 0 and bnεn → ∞ we have
Tn ⇒ χ̄²(Iq, 𝒞M)  (3.3)
where q, M and 𝒞M were defined in the context of Theorem 2.1.
Theorem 3.1 shows that Theorem 2.1 can be extended to the case where Σ is unknown. Note that if λq, the value of the smallest nonzero eigenvalue of Σ, were known then Theorem 3.1 would hold with any fixed ε < λq. Similarly if q were known then one could simply set the smallest p − q estimated eigenvalues to zero. However in practice q and λq are not known. Hence we truncate the estimated eigenvalues at a small threshold εn. The stated conditions guarantee that with probability one rank(Σn(εn)) = rank(Σ) for all large enough n. This rank condition ensures the convergence of the Moore–Penrose inverse and consequently the convergence of the associated quadratic forms (Andrews, 1987). An important feature of the proposed test is that it is well defined whether or not Σ is singular, i.e., it adapts to the rank of Σ. Note that if rank(Σn) = rank(Σ) for all large n then the conclusion of Theorem 3.1 holds with εn = 0. However, there are many situations (cf. Cragg and Donald 1996) in which rank(Σn) = p for all n even though rank(Σ) < p. Obviously the proposed method is useful in such settings. We note that in most applications bn = √n, so Theorem 3.1 is valid for any sequence εn = n^δ with −1/2 < δ < 0. For example we may choose εn ∝ n^(−1/3). By choosing εn → 0 and bnεn → ∞ we ensure that in large samples no nonzero eigenvalue is set to zero, which means that with very high probability the rank of Σ will not be underestimated. A discussion of how to choose εn in practice is deferred to Section 3.3.
3.2. Winsorized tests
One may consider other forms of regularization of nearly singular variance matrices. One potential alternative to trimming is winsorizing as defined below.
Definition 3.2
Let ε > 0. We refer to the matrix ΣW(ε) = EΛW(ε)ET, where ΛW(ε) is the diagonal matrix obtained from Λ by replacing any element of Λ which is smaller than ε by ε, as the ε–winsorized variance matrix. Its inverse is referred to as the ε–winsorized inverse of Σ and denoted by Σ†(ε).
We note that winsorizing is closely related to ridging. However, in contrast with ridging only the small eigenvalues are modified. In fact the winsorized inverse shrinks 1/λi to a fixed constant 1/ε and is therefore of full rank. Both trimming and winsorizing can be obtained as special cases of ε–regularized inverses (Dufour and Valery, 2015).
Definition 3.3
Fix ε ≥ 0. The ε–regularized inverse of Σ is Συ(ε) = EΛυ(ε)ET where Λυ(ε) = diag(υ(λ1, ε), …, υ(λp, ε)) and υ(λ, ε) is a nonnegative bounded function satisfying υ(λ, ε) = 1/λ when λ ≥ ε. The value of υ(λ, ε) when λ < ε is determined by the analyst. The function υ is referred to as a variance regularizing function.
For example, if υ(λ, ε) = 0 when λ < ε and λj < ε when j ≥ q + 1, then Λυ(ε) = diag(1/λ1, …, 1/λq, 0p–q) where 0p–q is a vector of 0’s of length p – q. It is clear that with this choice of Λυ(ε) the regularized inverse coincides with the trimmed inverse. Another natural choice is υ(λ, ε) = 1/λ when λ ≥ ε and υ(λ, ε) = 1/ε when λ < ε, which leads to Λυ(ε) = diag(1/λ1, …, 1/λq, ε−11p–q), in which case the regularized inverse coincides with the winsorized inverse. Since for λ ≤ ε we have 0 ≤ υ(λ, ε) ≤ 1/ε, it follows that all variance regularizing functions interpolate between trimming and winsorizing, the two possibilities addressed in this paper.
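Trimming and winsorizing can thus be implemented as two instances of a single variance regularizing function. The sketch below (Python/NumPy, our own code; the toy matrix and ε are assumptions) shows that the two inverses agree on the well conditioned part of the spectrum and differ only on the eigenvalues below ε:

```python
import numpy as np

def regularized_inv(Sigma, eps, v_small):
    """epsilon-regularized inverse: eigenvalues lambda >= eps map to
    1/lambda; eigenvalues below eps map to v_small(lambda, eps)."""
    lam, E = np.linalg.eigh(Sigma)
    inv = np.where(lam >= eps, 1.0 / np.maximum(lam, eps), 0.0)
    small = lam < eps
    inv[small] = [v_small(l, eps) for l in lam[small]]
    return E @ (inv[:, None] * E.T)

Sigma = np.diag([2.0, 1.0, 1e-8])   # toy matrix, illustrative
eps = 1e-3
trim = regularized_inv(Sigma, eps, lambda l, e: 0.0)       # trimming
wins = regularized_inv(Sigma, eps, lambda l, e: 1.0 / e)   # winsorizing
print(np.diag(trim))    # [0.5, 1.0, 0.0]
print(np.diag(wins))    # [0.5, 1.0, 1000.0]
```

Note that the winsorized inverse is of full rank (its smallest diagonal entry is 1/ε rather than 0), which is the property exploited in Proposition 3.1.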
Before presenting our next result we require the following notation. Let Y be Np(θ, Σ) and define W(ε) to be
Note that the quadratic forms appearing in W(ε) are the unweighted versions of those in T(ε). We can now modify the statistic Tn(ε) by replacing the trimmed inverse Σn+(ε) with the winsorized inverse Σn†(ε) to obtain the winsorized constrained test statistic
Wn(εn) = n [ min_{θ∈𝒱} (Sn − θ)TΣn†(εn)(Sn − θ) − min_{θ∈𝒞} (Sn − θ)TΣn†(εn)(Sn − θ) ]  (3.4)
where, as before, Wn = Wn(εn) when εn → 0. The limiting distributions of Wn and Wn(ε) are given in the following theorem, which applies to general variance regularizing functions. The matrices appearing in the statement of the theorem below are defined in the proof of Lemma S1.1 in the supplementary materials.
Theorem 3.2
Suppose that the assumptions given in Theorem 3.1 hold. Then

Wn ⇒ χ̄²(Iq, 𝒞M)

with M as defined earlier. Furthermore, let r be the number of eigenvalues of Σ which are larger than ε. Then as n → ∞
| (3.5) |
where ⊕ denotes the sum of independent RVs.
Theorem 3.2 is a generalization of Theorems 2.1 and 3.1. First, it shows that Wn and Tn have the same limiting distribution. This result may seem a bit surprising since Σ+ ≠ Σ†(0) whenever υ(0, 0) is nonzero. We also note that if ε is “sufficiently small”, i.e., if with high probability ε is smaller than λq,n, the qth eigenvalue of Σn, then Tn(ε) and Wn(ε) coincide. If, however, ε is “too large” then instead of the limit (3.3) we obtain the limit (3.5). It is also clear that in this case the limit depends on the matrix M1 defined in the proof of Lemma S1.1 in the supplementary materials. The relative merits of these statistics are investigated below.
Proposition 3.1
Under the stated conditions the tests Tn, Wn and Wn (ε) are consistent. The test Tn (ε) is consistent if and only if .
Proposition 3.1 shows that a test based on Tn (ε) may not be consistent for some configurations in the alternative. Nevertheless, for other configurations it may be more powerful than Wn (ε), Wn or Tn. Hence no test dominates the others as demonstrated in the simple example below.
Example 3.1
Assume that Y1, Y2, …, are with where . It is further assumed that Σ is not known in advance. Consider testing H0 : θ= 0 versus . Let and . Choose ε such that . It can be verified that for large n
where Z1, Z2 are independent RVs. It is clear that the distribution of Tn(ε) is the same for all (θ1, θ2) ∈ {(0, c) : c ≥ 0}. It follows that Tn(ε) cannot detect alternatives with θ2 > 0. However if (θ1, θ2) = (c, 0) and c > 0 then Tn(ε) will be more powerful than the other tests. This will also often be the case when θ1 is large and θ2 is small. The relative merit of Wn(ε) compared with Wn depends on the value of both ε and (θ1, θ2) and is more difficult to assess. Note that Tn (and Wn) is actually the test one would use if it were known in advance that Σ is diagonal with the structure above.
3.3. Choosing ε and two supremum statistics
The statistics (3.2) and (3.4) are functions of ε. We now describe a few approaches for choosing ε when using (3.2) or (3.4). First, in some settings there may be prior knowledge regarding the rank of Σ or the size of its smallest eigenvalue. Such information, when available, should of course guide our choice. In general Σ is not known and therefore an automated data driven procedure should be developed. A natural approach, motivated by numerical stability considerations, is to trim or winsorize the small eigenvalues until the modified condition number of Σn, i.e., the ratio of the largest to the smallest remaining nonzero eigenvalue, is bounded by some constant. More concretely let λ1,n ≥ ⋯ ≥ λp,n denote the eigenvalues of Σn. Let

k = max{ j : λj,n > 0 and λ1,n/λj,n ≤ Δ },
where Δ is a pre–specified constant (typically between 50 and 200); then choose εn = λk,n. This type of approach is well known and extensively discussed in the numerical linear algebra literature, where systems of linear equations with large condition numbers are solved (El Ghaoui 2002, Montgomery and Peck 2007, and Golub and Van Loan 2012). We remark that this method is a variation on the choice provided by Fan et al. (2012). However, their choice was inspired by multiple testing problems for correlated data, where the objective was to approximate the population covariance by a matrix derived from a factor analytic model. Although numerical experiments show that this approach is flexible, easy to implement and results in good power for the test statistic under consideration, we have developed a more general approach discussed below.
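The condition–number rule is straightforward to implement. In the sketch below (Python/NumPy, our own code), εn is set to the smallest estimated eigenvalue that keeps the modified condition number below Δ; the eigenvalues used for illustration are those of the p = 8 simulation setting of Section 4.

```python
import numpy as np

def choose_eps(eigvals, Delta=100.0):
    """Return eps = lambda_k where k is the largest index with
    lambda_1 / lambda_k <= Delta (eigvals sorted in descending order)."""
    lam = np.asarray(eigvals, dtype=float)
    positive = lam > 0
    ratio = lam[0] / np.where(positive, lam, np.inf)
    ok = np.where(positive & (ratio <= Delta))[0]
    return lam[ok[-1]]

# Eigenvalues of the p = 8 simulation setting of Section 4.
lam = np.array([1.75, 0.75, 0.75, 0.75, 1e-3, 1e-5, 0.0, 0.0])
print(choose_eps(lam, Delta=100.0))   # 0.75: everything below is trimmed
```

With Δ = 100 the rule retains the four well separated eigenvalues and trims the two tiny positive ones along with the exact zeros.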
Recall that by Theorems 3.1 and 3.2, if εn → 0 and bnεn → ∞ then Tn and Wn converge to the same limit. Usually bn = √n, and therefore the conditions of the theorems are satisfied if εn = C/n^(1/k) for some constant C and k > 2. A reasonable choice is k = 3 and C = λ1,n, i.e., each eigenvalue is compared to the largest estimated eigenvalue scaled by 1/n^(1/3). Obviously the aforementioned choice of the constant C is arbitrary and one can compare to any other function of the eigenvalues, such as the trace, the product of the eigenvalues, or the sum or product of the top r eigenvalues. Although in large samples all such choices are equivalent, there may be differences in small to moderate samples. Moreover, it is important to note that any scaling 1/n^(1/k) with k > 2 will result in the same limit. This is very different from many other statistical problems in which the order of the regularizing parameter is very important; see Cule et al. (2011), Bühlmann (2013) and Van de Geer et al. (2014) for ridge regression and lasso examples and Wand and Jones (1995) for kernel smoothing. Moreover, note that in all of the abovementioned papers the constant factor is also chosen in an ad–hoc manner.
Conclusion 3.1
Choosing ε in (3.2) and (3.4) is fundamentally different from the problem of choosing the value of a regularizing parameter in settings such as ridge regression or nonparametric regression. First, regardless of the value chosen for ε the resulting test always has the prescribed level, at least asymptotically. Consequently the choice of ε may only affect power. Our numerical experiments indicate that usually the power is only marginally influenced by the choice of ε. Moreover, and as indicated by Proposition 3.1 and Example 3.1, the power of Wn(ε) is much less influenced by the choice of ε compared with Tn(ε). It is also important to realize that there does not exist an optimal choice of ε as the best ε is a function of the unknown value of θ. In contrast, the choice of the ridging parameter affects the performance of both estimators and tests, e.g., Vinod and Ullah (1981) as well as the abovementioned references. The same holds for nonparametric regression where the regularizing parameter is chosen to balance the bias and variance. Such a balancing act is not required here.
Nevertheless, some users may be uncomfortable choosing a single value for ε. To accommodate such a preference we introduce two supremum statistics, based on (3.2) and (3.4), which depend on a range of ε values. Let 0 < ε1 ≤ ε2 and set I = [ε1, ε2]. Define,
Tn(I) = sup_{ε∈I} Tn(ε)  and  Wn(I) = sup_{ε∈I} Wn(ε)  (3.6)
The statistics (3.6) are motivated by Davies (1987) and Andrews and Ploberger (1994) where tests are developed for the case where a nuisance parameter is present only under the alternative. Here ε varies over the interval I and plays the role of the nuisance parameter.
In the following we derive the limiting distributions of the statistics (3.6). First fix ε > 0. Then by equation (3.2) and from the proof of Theorem 3.1, given in the supplementary materials, it follows that
where X is a Np(0, Σ) RV, r(ε) is the number of eigenvalues larger than ε, Ir(ε) = diag(1r(ε), 0q–r(ε)), and Mε is the r(ε) × p matrix obtained by dropping the last p – r(ε) rows of the matrix (Λ+(ε))1/2ET. Conditioning on X we note that T(ε) is a function of ε only through r(ε), which is a step function. This implies that
| (3.7) |
where j = min{i : λi ≥ ε2} and k = min{i : λi ≥ ε1}, i.e., T(ε) assumes only a finite number of values. Finally we note that, conditionally, T(ε) is a decreasing function of ε. Therefore it follows by Theorem 1.A.6 in Shaked and Shanthikumar (2007) that unconditionally T(ε1) ≤st T(ε2) whenever ε1 ≥ ε2, where ≤st denotes the usual stochastic order.
Similarly, by Theorem 3.2
where
X1 and X2 are independent and Vε = diag(λr(ε)+1/ε, …, λq/ε). The matrices in the display are defined as in Theorem 3.2 with r = r(ε). To gain some insight into the behavior of W(ε) consider the setup in Example 3.1, where p = 3 and q = 2, for which a direct calculation shows that
where X = (X1, X2)T is a N2(0, I2) RV and ψ(ε, σ2) = 1 if σ2 ≥ ε and σ2/ε if σ2 < ε. Thus conditional on X, W(ε) is a decreasing, continuous function of ε. Furthermore, arguing as before, it can be shown that unconditionally W(ε1) ≤st W(ε2) whenever ε1 ≥ ε2. To summarize:
Theorem 3.3
Suppose that the assumptions given in Theorems 3.1 and 3.2 hold. Then as n → ∞ we have Tn(I) ⇒ T(ε1) and Wn(I) ⇒ W(ε1).
Theorem 3.3 shows that the theoretical critical values of the statistics Tn(I) and Wn(I) should be computed at ε1. However, in finite samples Tn(I) and Wn(I) may not be maximized at ε1, even under the null. In practice the interval I can be chosen using the estimated eigenvalues of Σn. We propose choosing I = [λp,n, λ1,n], i.e., maximizing (3.2) and (3.4) over the full spectrum of Σn. Note that choosing values of ε1 smaller than λp,n will not yield numerically different results, whereas choosing ε2 larger than λ1,n will result in a loss of power. Our numerical studies, as well as Theorem 3.3, indicate that the type I error is always maintained. Moreover, as indicated by our simulations, the power of the resulting tests is comparable to the power obtained by choosing the best value of ε when it is known in advance. Thus our approach completely eliminates the need to choose a value for ε while hardly compromising the power of the test.
Finally, we emphasize that the statistics (3.2) and (3.4) as well as (3.6) converge to the same limit if the conditions in Theorem 3.1 are satisfied. The consistency of tests based on Tn (I) and Wn (I) follows immediately from Proposition 3.1.
3.4. Implementation
Finally, a few remarks regarding implementation. Suppose first that the statistic Sn satisfies (1.1) and that Σ is known. If so, find Σ+, the spectral cut–off Moore–Penrose inverse of Σ, as described in Section 2, and compute the statistic Tn given by (2.1) using rank reduced quadratic programming. Denote its observed value by tn. To calculate the significance level:
Step 1: Simulate a sample of size n from a multivariate normal distribution with mean vector 0 and variance matrix Σ.
Step 2: Compute the sample mean of the ith simulated sample.
Step 3: Compute the test statistic Tn,i in the same manner as tn, using the ith simulated sample mean in place of the observed statistic.
Repeat Steps 1–3 B times, where B is large enough to ensure that the null distribution is estimated sufficiently well. In our examples we used B = 10⁵. We then approximate the p–value by
p̂ = #{1 ≤ i ≤ B : Tn,i ≥ tn} / B  (3.8)
Modifications of the procedure above are necessary when the variance matrix Σ is unknown. We will describe the procedure for the supremum statistic Wn(I); analogous procedures are used for (3.2), (3.4) and Tn(I). The first step is to calculate Σn, a consistent estimator of Σ. Given Σn, an interval I of values for ε is set as described in Section 3.3. The observed value of the statistic is calculated using (3.4) and (3.6) and denoted by wn(I). The significance level is calculated by repeating Steps 1–3 above, where: in Step 1 we simulate from a multivariate normal distribution with mean vector 0 and variance matrix Σn; Step 2 is exactly as described above; and in Step 3 we compute Wn,i(I), where at each iteration we reestimate Σ. Finally the p–value is calculated as in (3.8) with Tn,i replaced by Wn,i(I) and tn replaced by wn(I). Our approach is quite standard and falls within the framework of Monte–Carlo tests (cf. Dufour 2006, MacKinnon 2009), where a plug–in estimator for the nuisance parameter is used when constructing the test statistic. Of course this approach relies on large sample theory; it is clear from our simulations that it works well in this application. The procedures described above were programmed in MATLAB.
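The Monte Carlo scheme of Steps 1–3 can be sketched compactly when the cone is the nonnegative orthant and Σ is known, diagonal and singular, since the constrained minimizer is then the componentwise positive part and no quadratic–programming solver is needed. The following Python/NumPy code is our own simplified illustration, not the authors' MATLAB implementation; the sample size, B and Σ are illustrative assumptions.

```python
import numpy as np

# Monte Carlo test for H0: theta = 0 vs H1: theta in the orthant, with a
# known diagonal singular Sigma.  With a diagonal weight matrix the
# constrained minimizer is max(ybar, 0) componentwise, so the difference
# of the two minima in (2.1) reduces to sum_i w_i * max(ybar_i, 0)^2.
rng = np.random.default_rng(2)
sd = np.sqrt(np.array([1.0, 0.5, 0.0]))    # Sigma = diag(1, 0.5, 0), rank 2
w = np.array([1.0, 2.0, 0.0])              # diagonal of Sigma+

def T_stat(ybar, n):
    return n * np.sum(w * np.maximum(ybar, 0.0) ** 2)

n, B = 50, 5_000
t_obs = T_stat((rng.normal(size=(n, 3)) * sd).mean(axis=0), n)

null = np.array([T_stat((rng.normal(size=(n, 3)) * sd).mean(axis=0), n)
                 for _ in range(B)])       # Steps 1-3, repeated B times
p_value = np.mean(null >= t_obs)
print(p_value)
```

For general cones the closed form in `T_stat` would be replaced by a quadratic program, and for unknown Σ the sampling in the loop would use the trimmed or winsorized estimate Σn, as described above.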
4. Simulations
We conducted a wide range of simulation studies to compare the proposed test statistics. More specifically, we consider two different simulation settings in which the type I errors and powers of the proposed statistics are compared. We focus on the statistics Tn = Tn(0), Tn(I) and Wn(I); results for Tn(ε) and Wn(ε) for various values of ε are reported in the Supplementary Materials. The first simulation setting is inspired by Example 3.1 and is described briefly below, whereas the second is inspired by the Fibroid Growth Study (FGS). The FGS and the corresponding data analyzed in this paper are described in detail in Section 5.
Motivated by Example 3.1, we generated n observations from a multivariate normal distribution with mean θ and variance matrix Σ, with either p = 3 or p = 8. For the case p = 3 we chose
This variance matrix has eigenvalues 1.25, 0.75, and 0.00. Under the null hypothesis, the mean vector θ was taken to be the null vector. We considered three different patterns of alternative hypotheses, namely, the simple order, the orthant order, and the tree order. The values of θ under the corresponding alternatives are given in Table 1. For the case of p = 8 we chose
where Ω = (1 – ρ)I4 + ρJ4, I4 is the identity matrix, J4 is the 4 × 4 matrix of all 1’s, and Ψ = diag(10⁻³, 10⁻⁵, 0, 0). For this simulation, we set ρ = 0.25. The resulting variance matrix has eigenvalues 1.75, 0.75, 0.75, 0.75, 10⁻³, 10⁻⁵, 0, and 0. Thus in this example, two eigenvalues are identically zero and at least one is nearly zero. Once again we considered patterns of mean vectors satisfying either the simple order, the orthant order, or the tree order alternative. The values of θ under the corresponding alternatives are given in Table 2.
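The stated spectrum is easy to check numerically. The sketch below (ours, in Python) assumes, as described above, that Σ is block diagonal with blocks Ω = (1 − ρ)I4 + ρJ4 and Ψ = diag(10⁻³, 10⁻⁵, 0, 0).

```python
import numpy as np

rho = 0.25
Omega = (1 - rho) * np.eye(4) + rho * np.ones((4, 4))  # exchangeable block
Psi = np.diag([1e-3, 1e-5, 0.0, 0.0])                  # nearly/exactly singular block
Sigma = np.block([[Omega, np.zeros((4, 4))],
                  [np.zeros((4, 4)), Psi]])

# Sorted descending: expect 1.75, 0.75 (x3), 1e-3, 1e-5, 0, 0.
eig = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
```

The exchangeable block contributes one eigenvalue 1 − ρ + 4ρ = 1.75 and three eigenvalues 1 − ρ = 0.75, matching the text.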
Table 1:
Patterns of the mean vector under the alternative hypothesis for the case of p = 3.
| Design | Order | θ1 | θ2 | θ3 |
|---|---|---|---|---|
| A | Simple | 0.00 | 0.25 | 0.00 |
| A | Orthant | 0.50 | 0.50 | 0.00 |
| A | Tree | 0.00 | 0.50 | 0.00 |
| B | Simple | 0.00 | 0.25 | 0.50 |
| B | Orthant | 0.50 | 0.50 | 0.50 |
| B | Tree | 0.00 | 0.50 | 0.50 |
Table 2:
Patterns of the mean vector under the alternative hypothesis for the case of p = 8.
| Design | Order | θ1 | θ2 | θ3 | θ4 | θ5 | θ6 | θ7 | θ8 |
|---|---|---|---|---|---|---|---|---|---|
| A | Simple | 0.00 | 0.01 | 0.03 | 0.04 | 0.06 | 0.07 | 0.00 | 0.00 |
| A | Orthant | 0.20 | 0.20 | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| A | Tree | 0.00 | 0.10 | 0.10 | 0.10 | 0.00 | 0.00 | 0.00 | 0.00 |
| B | Simple | 0.00 | 0.01 | 0.03 | 0.04 | 0.06 | 0.07 | 0.09 | 0.10 |
| B | Orthant | 0.20 | 0.20 | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.20 |
| B | Tree | 0.00 | 0.10 | 0.10 | 0.10 | 0.00 | 0.00 | 0.00 | 0.10 |
For both p = 3 and p = 8, the interval I of values for ε was taken to be all non–zero eigenvalues of Σ. For the case of known Σ we used 10,000 replications to simulate the sampling distribution of all three test statistics under the null hypothesis, and the type I error rates and powers were estimated using 5,000 replications. In the case of unknown Σ, which is more realistic, the test statistic was calculated using the sample covariance matrix. Replications to simulate the sampling distribution were drawn from a multivariate normal distribution using the true Σ, but the test statistics for these replications were computed using the sample covariance matrix.
All three tests controlled the type I error at the nominal rate of 0.05 (see Tables S3 and S5 in the Supplementary Materials, Section S2), whether Σ is known or unknown. Note that the results for known Σ are of lesser importance and are provided as a benchmark; the discussion below focuses on the case where Σ is unknown. Power results for p = 3 are summarized in Figure 1, and those for p = 8 in Figure 2; complete results are provided in Tables S4 and S6 in the Supplementary Materials. For all patterns considered the power of Wn(I) generally exceeded the power of both Tn(I) and Tn(0), except for the simple order with p = 3. A bit of thought reveals why. In the case of the simple order Tn(0) actually tests for θ1 < θ2, whereas Wn(I) tests whether {θ1 < θ2} ∪ {θ2 < θ3}. However, in Design A θ3 = 0, so under the alternative θ1 < θ2 > θ3, which is an umbrella order, not a simple order. Thus the power of the test, designed for the simple order, is reduced. Moreover, if we test for the correct alternative, i.e., the umbrella order, then Wn(I) is again the most powerful test: for example, when n = 10 the powers are 0.0930 for Tn(I), 0.1448 for Tn(0) and 0.1866 for Wn(I). The gain in power associated with Wn(I) is often substantial, and when Wn(I) is not the most powerful it typically loses only by a small margin. Consequently, we recommend that investigators use the statistic Wn(I). Further, note that under Design B the gains are even more substantial for both p = 3 and p = 8; see Figures 3 and 4 as well as Tables S8, S10, and S11 in the Supplementary Materials. These figures indicate that the Wn(I) statistic is more robust with respect to singularity of Σ and therefore also more powerful.
Figure 1:
Powers for Wn(I) (triangle, dotted lines), Tn(I) (diamonds, dashed lines), and Tn(0) (circles, solid lines) for simulation 1, Design A, the case where p = 3.
Figure 2:
Powers for Wn(I) (triangle, dotted lines), Tn(I) (diamonds, dashed lines), and Tn(0) (circles, solid lines) for simulation 1, Design A, the case where p = 8.
Figure 3:
Powers for Wn(I) (triangle, dotted lines), Tn(I) (diamonds, dashed lines), and Tn(0) (circles, solid lines) for simulation 1, Design B, the case where p = 3.
Figure 4:
Powers for Wn(I) (triangle, dotted lines), Tn(I) (diamonds, dashed lines), and Tn(0) (circles, solid lines) for simulation 1, Design B, the case where p = 8.
5. Illustration: changes in gene expression by the tumor size in the NIEHS Fibroid Growth Study
Uterine leiomyoma (also called uterine fibroids) are benign hormonally mediated smooth muscle tumors commonly found in premenopausal women. According to some estimates at least 70% of white women in the US have fibroids, and the estimates are even higher for black women. Furthermore, the total annual cost associated with these tumors in the US is over 30 billion dollars. Despite such high prevalence rates and associated costs, these tumors are not very well characterized. The NIEHS Fibroid Growth Study (FGS) is one of the largest studies, involving 72 premenopausal women, in which the growth patterns of fibroids were investigated (Peddada et al., 2008). Tissue samples obtained from women in the FGS who underwent either myomectomy or hysterectomy provide a great opportunity for researchers to investigate the molecular characteristics of these benign tumors. In this section we compare the mean expression of two sets of important genes using the proposed methodology. Specifically, we consider a subset of genes belonging to the Interleukin–1 (IL–1) signaling pathway, which are known to be involved in the initial stages of tumor formation (Dunne et al., 2003), and a subset of genes called collagens, which are well–known to be associated with tumor development and growth (Davis et al., 2013, Leppert et al., 2014). Following Grandhi et al. (2016) we consider four groups: (1) normal myometrium; (2) small tumors (0.08–5.70 cm³); (3) medium tumors (9.0–132.00 cm³); and (4) large tumors (240–2016 cm³). The respective sample sizes were n1 = 8, n2 = 14, n3 = 25 and n4 = 13.
Let Yij denote the m–dimensional gene expression vector of the jth individual in the ith group. Let θi, i = 1, …, 4, denote the mean gene expression level in group i. We also assume that Var(Yij) = Ψ for all i and j, i.e., the variances are homogeneous across groups. Define,
Clearly Σ = I ⊗ Ψ, where Ψ is m × m and I is the 4 × 4 identity matrix. Hence Σ is a p × p (p = 4m) block diagonal matrix with the block Ψ repeated 4 times.
An important component of growth and development of fibroids is angiogenesis or vascularization, the formation of blood vessels. According to Grandhi et al. (2016), the Interleukin-1 (IL–1) signaling pathway is involved in a cascade of events, including inflammatory response and production of prostaglandins, that trigger the expression of growth factors involved in angiogenesis. Therefore in this paper we investigated the relationship between tumor size and gene expression patterns of m = 25 genes belonging to the IL–1 signaling pathway. The names of the 25 genes along with the log–transformed (to base 2) mean gene expression data, and pooled variance matrix, are provided in Supplementary Tables S12 and S13, respectively. A priori we do not know if the 25 genes considered in this analysis are positively or negatively associated with tumor size. Hence we tested the null hypothesis H0 : θ1 = θ2 = θ3 = θ4 against the alternative where and . Stacking θ = (θ1, θ2, θ3, θ4)T and choosing a suitable contrast matrix C we restate the above hypotheses as H0 : Cθ = 0 versus H1 : {Cθ ≥ 0} ∪ {Cθ ≤ 0}. As is often done (Peddada et al. 2003), we tested this union–intersection hypothesis by testing each alternative hypothesis separately and then applying Bonferroni corrections to the resulting p–values. Thus if p1 is the p–value associated with testing H0 against the increasing alternative and p2 is the p–value associated with testing H0 against the decreasing alternative, then the Bonferroni corrected p–values are min(1, 2pi), i = 1, 2.
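To make the union–intersection construction concrete, the sketch below (ours; the small m and the p-values are illustrative, not the paper's) builds a successive-difference contrast matrix C via a Kronecker product, so that Cθ ≥ 0 encodes a trend that is monotone for every gene, and applies the Bonferroni correction to two hypothetical one-sided p-values.

```python
import numpy as np

m = 3                                   # genes per group (25 in the paper)
D = np.array([[-1,  1,  0,  0],
              [ 0, -1,  1,  0],
              [ 0,  0, -1,  1]])        # successive differences among 4 groups
C = np.kron(D, np.eye(m))               # apply each difference to every gene

# Stacked mean vector theta = (theta_1, ..., theta_4), group blocks of size m:
theta = np.concatenate([g * np.ones(m) for g in (0.0, 0.1, 0.2, 0.3)])
incr = np.all(C @ theta >= 0)           # increasing trend => all contrasts >= 0

# Bonferroni over the two one-sided alternatives (hypothetical p-values):
p1, p2 = 0.004, 0.62
p_adj = [min(1.0, 2 * p) for p in (p1, p2)]
```

The smaller of the two adjusted p-values then determines whether either directed alternative is declared significant.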
5.1. Data–driven simulation study
A simulation study motivated by the data obtained from the FGS was carried out. Data from four m–variate normal distributions were generated. A test of H0 : θ1 = θ2 = θ3 = θ4 against the alternative H1 : θ1 ≤ θ2 ≤ θ3 ≤ θ4 was conducted as described above. By stacking the random vectors from the four groups, the data can be represented by a single 4m × 1 normal vector with a 4m × 1 population mean vector and 4m × 4m variance matrix of the form Σ = I ⊗ Ψ, where I is a 4 × 4 identity matrix, Ψ is an m × m matrix and ⊗ denotes the usual Kronecker product of matrices. Thus in terms of the previous notation p = 4m and m = 25. Our choice of parameters for the multivariate normal distribution was inspired by the real data in the IL–1 signaling pathway in the FGS. Let θ denote the pooled sample mean vector and let Ψ denote the pooled sample covariance matrix of the 25 genes in the IL–1 signaling pathway from the four groups. Out of the 25 genes, we arbitrarily chose five (i.e., components 4, 8, 12, 16, and 20 of the mean vector) to satisfy the alternative hypothesis, while the remaining genes (or components) satisfied the null hypothesis. For these five non-null genes, the elements of θ were set to increase by 1% from group to group. For example, the means of the 4th gene were θ4, 1.01θ4, 1.02θ4, and 1.03θ4 for groups 1, 2, 3, and 4, respectively, where θ4 is the pooled mean of the 4th gene. See Supplementary Tables S12 and S13 for the values of the mean vectors, and Ψ.
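A sketch (ours, in Python) of how data with the assumed structure Σ = I ⊗ Ψ can be generated when Ψ is rank deficient; sampling through the eigendecomposition handles exact singularity, which a naive Cholesky factorization would not. The rank and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4                                   # genes per group (25 in the paper)
A = rng.standard_normal((m, m - 1))
Psi = A @ A.T                           # rank-deficient m x m block (rank m - 1)
Sigma = np.kron(np.eye(4), Psi)         # 4m x 4m: independent groups, shared Psi

# Sample via the eigendecomposition so exact singularity is handled:
w, E = np.linalg.eigh(Sigma)
w = np.clip(w, 0.0, None)               # clip tiny negative round-off
Y = E @ (np.sqrt(w)[:, None] * rng.standard_normal((4 * m, 5)))  # 5 draws
```

Each column of Y is one stacked 4m-vector; the rank of Σ is 4·rank(Ψ), reflecting the singularity that the proposed tests are designed to accommodate.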
Our goal was to investigate the effect of the degree of singularity, as measured by the condition number of the matrix Ψ, on the performance of the proposed tests. Recall that we may write the spectral decomposition of Ψ as Ψ = EΛET where Λ is a diagonal matrix of the eigenvalues. Denote by Λ2 the diagonal matrix of eigenvalues observed in the FGS. In addition we constructed the matrices Λ1 and Λ3 defined by Λ1 = s × W1 and Λ3 = s × W3, where s is the sum of the eigenvalues of Λ2 and Wj = diag(ωj), j = 1, 3, are diagonal weight matrices with decreasing weights (i.e., ωji ≥ ωji′ when i ≤ i′), which preserve the sum of the eigenvalues while modifying the condition number; see Table S16 in the Supplementary text. Specifically Λ1 has a condition number of Δ = 24 and Λ3 has a condition number of Δ = 1301.
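The construction of Λ1 and Λ3 can be sketched as follows. The weight vectors below are ours and only illustrate the mechanism (rescale a decreasing weight vector so the eigenvalue sum is preserved while the condition number changes); they are not the values in Table S16.

```python
import numpy as np

lam2 = np.array([5.0, 3.0, 1.5, 0.4, 0.1])   # stand-in for the observed spectrum
s = lam2.sum()                               # total to be preserved

def weighted_spectrum(weights, total):
    """Decreasing weights, rescaled so the eigenvalue sum equals `total`."""
    w = np.asarray(weights, dtype=float)
    return total * w / w.sum()

lam1 = weighted_spectrum([4, 3, 2, 1.5, 1], s)      # mild decay -> small cond. no.
lam3 = weighted_spectrum([1e3, 1, 1, 1, 0.77], s)   # steep decay -> large cond. no.

cond = lambda lam: lam.max() / lam.min()            # condition number of diag(lam)
```

Rescaling leaves the ratio of largest to smallest eigenvalue untouched, so the decay profile of the weights alone controls the condition number.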
The performance of the proposed tests was evaluated assuming the variance was unknown. The results are displayed in Figure 5 and in the Supplementary Materials, Section S3. While Tn(I) occasionally achieves higher power than Wn(I), this appears to be limited to the extreme cases where n = 10 and the condition number is very high. We speculate that Tn(I) outperforms Wn(I) when n = 10 because, with only 40 observations, it is difficult to precisely estimate the variances and covariances; Tn(I), which trims the eigenvalues, is able to reduce the amount of noise, whereas Wn(I) is not. Nevertheless, considered overall, Wn(I) still appears to be the preferable statistic: it consistently has equal or higher power than Tn and, outside of the extreme cases, also outperforms Tn(I).
Figure 5:
Powers for Wn(I) (triangle, dotted lines), Tn(I) (diamonds, dashed lines), and Tn(0) (circles, solid lines) for simulation design 2.
5.2. Data analysis
The variance matrix has 25 eigenvalues and a condition number of 557. For illustration purposes we conducted the tests Tn(I), Wn(I), and Tn(0). The interval I, for computing Wn(I) and Tn(I), was selected to cover the full range of eigenvalues of Ψ; specifically, I included the 1st, 5th, 10th, 15th, and 25th eigenvalues. All tests were carried out assuming that Σ was unknown. According to our analysis, after performing Bonferroni corrections, Tn(I) rejected the null hypothesis in favor of an increasing trend, but neither Wn(I) nor Tn(0) did so; the respective p–values were 0.007, 1, and 1. Similarly, Tn(I) declared a significant decreasing trend, while Wn(I) and Tn(0) did not, with Bonferroni corrected p–values of 0.007, 0.36, and 0.64. However, it should be noted that the null distribution of Tn(I) was highly skewed: of the 10,000 replications used to simulate the null distribution of Tn(I) for the increasing trend, 9965 resulted in a test statistic of exactly 0. The degree of singularity in the IL–1 signaling pathway data may render Tn(I) an untrustworthy test.
We emphasize that this is the first paper to assess the association between tumor size and gene expression in this context. There are two possible explanations for not detecting a significant increasing or decreasing trend in the expression of the vector of 25 IL–1 pathway genes considered here. The first is that failing to reject a null hypothesis does not imply the null is true; it may simply indicate that the study is underpowered for discovering the multivariate trends considered in this paper. The second is that some genes are positively associated while others are negatively associated (and some may not be associated at all). To discern whether the latter is the case, we performed a univariate trend analysis for each gene separately using ORIOGEN (Peddada et al., 2003, Peddada et al., 2005), a bootstrap based univariate order restricted inference method for detecting patterns in ordered gene expression data. After controlling the false discovery rate at 5% we discovered that 4 of the 25 genes had a highly significant increasing trend whereas the remaining 21 genes had a highly significant decreasing trend. Results of these ORIOGEN based analyses are provided in the Supplementary text (Table S16). Given these conflicting trends it is not surprising that the multivariate trend test did not find all 25 genes to have a significant increasing trend or all to have a significant decreasing trend. Moreover, Grandhi et al. (2016) discovered that the genes in the IL–1 pathway were enriched only in the small tumors and hence are potentially involved in tumorigenesis and initiation. Therefore we performed an additional post–hoc analysis using ORIOGEN to investigate whether there were significant differences in gene expression between small and medium tumors and between small and large tumors. Consistent with Grandhi et al. (2016) we discovered that none of these 25 genes were differentially expressed in these pairwise comparisons (at FDR = 0.05), although all of them were differentially expressed between the normal myometrium and the small tumors.
6. Summary and discussion
In this article we introduced a general framework and concrete tools for testing hypotheses under linear inequality constraints when the variance matrix is potentially singular. The new methodology extends the existing literature in several directions and, since it adapts to the true rank of Σ, it handles both singular and nearly singular variance matrices in a unified way. Adding to the novelty, the ideas of trimming and winsorizing, originally developed to deal with extreme observations in the data, are extended to the present context of singular and nearly singular variance matrices. Specifically, we propose the trimmed and winsorized tests (3.2) and (3.4) and the two supremum tests (3.6) based on them. Although all the tests are asymptotically equivalent, there are substantial differences in power in finite samples. Interestingly, when comparing (3.2) and (3.4), we found that in most cases the test statistic based on the winsorized variance matrix has larger power than the test based on the trimmed variance matrix, and the gain in power was often substantial. Recall that, under normality, winsorizing leads to asymptotically efficient estimators and is therefore preferred to trimming. Similarly we found that the statistic Wn(I) results in more powerful tests than Tn(I). Further, we observe that the supremum tests (3.6), which do not require choosing ε, are only minimally less powerful than the tests (3.2) and (3.4) even when the best value of ε is known. For these reasons we recommend using Wn(I).
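In spirit, the trimmed and winsorized statistics replace the inverse of Σ by a spectrally regularized (pseudo-)inverse. A minimal sketch (ours, in Python; the paper's exact definitions are given in (3.2) and (3.4)), where trimming drops the small eigenvalues in Moore–Penrose fashion and winsorizing raises them to ε before inverting:

```python
import numpy as np

def eig_regularized_inverse(S, eps, mode="winsorize"):
    """Regularize the spectrum of S at eps, then (pseudo-)invert.
    trim:      drop eigenvalues <= eps (generalized inverse on the rest),
    winsorize: raise eigenvalues below eps up to eps, then invert."""
    w, E = np.linalg.eigh(S)
    if mode == "trim":
        keep = w > eps
        winv = np.where(keep, 1.0 / np.where(keep, w, 1.0), 0.0)
    else:
        winv = 1.0 / np.maximum(w, eps)
    return E @ (winv[:, None] * E.T)

Sigma = np.diag([2.0, 1.0, 1e-8])       # nearly singular variance matrix
M_tr = eig_regularized_inverse(Sigma, 1e-3, "trim")
M_wi = eig_regularized_inverse(Sigma, 1e-3, "winsorize")
```

Both regularizations leave the well-conditioned part of the spectrum untouched; they differ only in how the near-null directions are weighted, which is the source of the power differences discussed above.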
The proposed methodology can be extended in several directions. As an example we discuss the regression setting. First note that in order to estimate a regression parameter it is often necessary to first invert the variance matrix Σ; this is not possible if Σ is singular. Hence additional assumptions on the data generating mechanism, beyond those given by (1.1), are required. Of course, if the matrix XᵀX has a very large but finite condition number, then the ordinary least squares estimator exists and is unbiased, and a trimming or winsorizing test can be applied as discussed above. This holds true whenever n⁻¹XᵀX converges to a non–singular matrix. However, if n⁻¹XᵀX converges to a singular matrix, as in Knight (2008), then a different analysis is required. To fix ideas consider the linear regression Yi = βᵀxi + εi where i = 1, …, n. The least squares estimator is given by β̂n = (XᵀX)⁻¹XᵀY.
Note that if the εi are independent random variables with mean zero and common variance σ², then
β̂n − β = (XᵀX)⁻¹Xᵀε | (6.1) |
has mean 0 and variance σ²(XᵀX)⁻¹ for all n. A limiting distribution exists provided a suitably normalized version of XᵀX converges. It is typically assumed that n⁻¹XᵀX converges to a positive definite matrix V, in which case √n(β̂n − β) ⇒ N(0, σ²V⁻¹), the usual convergence rate. Consider now the case where, as n → ∞, the design points become increasingly concentrated on a subspace, so that n⁻¹XᵀX → S where S is some singular matrix. We first exclude the case in which β̂n is not consistent. It is further evident that a limit in (6.1) will exist if and only if cn²σ²(XᵀX)⁻¹ converges to some positive definite matrix, where cn is a sequence of constants increasing to infinity. By letting the design collapse onto the subspace at various rates we can obtain different normalizing constants cn. It follows that our methodology applies here if we replace √n with cn in (1.1) and proceed as before.
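The collapsing-design phenomenon is easy to reproduce numerically. In the sketch below (ours), the second regressor equals the first plus noise of size delta, so as delta → 0 the rows concentrate on the subspace {x1 = x2} and the condition number of n⁻¹XᵀX diverges, inflating the OLS variance along the near-null direction.

```python
import numpy as np

rng = np.random.default_rng(7)

def design(n, delta):
    """x2 = x1 + delta * noise: as delta -> 0 the rows concentrate on
    the subspace {x1 = x2} and X'X / n approaches a singular limit."""
    x1 = rng.standard_normal(n)
    x2 = x1 + delta * rng.standard_normal(n)
    return np.column_stack([x1, x2])

n = 2000
# Condition number of X'X / n for progressively more collinear designs:
conds = [np.linalg.cond(design(n, d).T @ design(n, d) / n)
         for d in (1.0, 0.1, 0.01)]
```

Here the condition number grows roughly like delta⁻², illustrating why a delta-dependent (i.e., cn-dependent) normalization is needed in the limit.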
In this paper the focus has been on the case where Σ is not a function of the unknown mean parameter θ. However, in many applications Σ = Σ(θ) is a function of θ and is estimated by a plug–in procedure, i.e., by Σ(θ̂n). Such situations arise in nonlinear regression models, where Yi = f(Xi; θ) + εi, and in generalized linear models. In all such cases, although the theoretical asymptotic variance of the estimator may be a non–singular matrix, its estimated variance may be nearly singular. As seen in Lim et al. (2013) and in Lim (2015), the condition number of the estimated variance matrix can be as high as 10⁵ or more. Thus, the theory and methodology discussed in this paper are relevant to non–linear as well as generalized linear models.
Finally, as noted by a referee, it is of interest to extend the current methodology within the framework of order restricted model selection (cf. Mulder et al., 2010, Kuiper et al., 2012). An additional direction is to explore the performance of the proposed tests in high dimensions.
Supplementary Material
Acknowledgments
We thank the referees, AE, and the Editors for several constructive comments which helped us improve both the presentation and the contents of the paper. The research of Ori Davidov was partially supported by the Israeli Science Foundation Grant No. 1256/13. Shyamal Peddada and Casey Jelsema were supported [in part] by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01 ES101744-04).
Contributor Information
Ori Davidov, Department of Statistics, University of Haifa, Mount Carmel, Haifa 31905 Israel.
Casey M Jelsema, Department of Statistics, West Virginia University, Morgantown, WV, 26506.
Shyamal Peddada, Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Alexander Drive, RTP, NC 27709.
References
- [1]. Andrews DWK (1987). Asymptotic results for generalized Wald tests. Econometric Theory, 3: 348–358.
- [2]. Andrews DWK, Ploberger W (1994). Optimal tests when a nuisance parameter is present only under the alternative. Econometrica, 62: 1383–1414.
- [3]. Avis D, Fukuda K (1992). A pivoting algorithm for convex hulls and vertex enumeration of arrangements and polyhedra. Discrete & Computational Geometry, 8: 295–313.
- [4]. Bhimasankaram P, Sengupta D (1991). Testing for the mean vector of a multivariate normal distribution with a possibly singular dispersion matrix and related results. Statistics and Probability Letters, 11: 473–478.
- [5]. Boyd S, Vandenberghe L (2004). Convex Optimization. Cambridge University Press.
- [6]. Bühlmann P (2013). Statistical significance in high-dimensional linear models. Bernoulli, 19: 1212–1242.
- [7]. Cule E, Vineis P, De Iorio M (2011). Significance testing in ridge regression for genetic data. BMC Bioinformatics, 12: 372.
- [8]. Davidov O, Herman A (2010). Testing for order among K populations: theory and examples. The Canadian Journal of Statistics, 37: 1–19.
- [9]. Davidov O, Herman A (2012). Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society, Series B, 74: 825–847.
- [10]. Davidov O, Peddada SD (2011). Order restricted inference for multivariate binary data with application to toxicology. Journal of the American Statistical Association, 106: 1394–1404.
- [11]. Davies RB (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74: 33–43.
- [12]. Davis BJ, Risinger JI, Chandramouli GVR, Bushel PR, Baird DD, Peddada SD (2013). Gene expression in uterine leiomyoma from tumors likely to be growing (from black women over 35) and tumors likely to be non-growing (from white women over 35). PLOS ONE, 8(6): e63909. doi: 10.1371/journal.pone.0063909.
- [13]. Dixon WJ (1960). Simplified estimation from censored normal samples. The Annals of Mathematical Statistics, 31: 385–391.
- [14]. Dufour JM (2006). Monte Carlo tests with nuisance parameters: a general approach to finite-sample inference and nonstandard asymptotics. Journal of Econometrics, 133: 443–477.
- [15]. Dufour JM, Valery P (2015). Wald-type tests when rank conditions fail: a smooth regularization approach. Unpublished manuscript, https://www.american.edu/cas/economics/info-metrics/pdf/upload/Valery-paper-Nove-2011.pdf
- [16]. Duplinskiy A (2015). Is regularization necessary? A Wald-type test under non-regular conditions. Unpublished manuscript, http://duplinskiy.com/waldtest.pdf
- [17]. Eaton ML, Tyler DE (1991). On Wielandt’s inequality and its application to the asymptotic distribution of the eigenvalues of a random symmetric matrix. The Annals of Statistics, 19: 260–271.
- [18]. Eaton ML, Tyler DE (1994). The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis. Journal of Multivariate Analysis, 50: 238–264.
- [19]. Fan J, Han X, Gu W (2012). Estimating false discovery proportion under arbitrary covariance dependence. Journal of the American Statistical Association, 107: 1019–1035.
- [20]. Golub GH, Van Loan CF (2012). Matrix Computations, 4th Edition. Johns Hopkins University Press.
- [21]. Grandhi A, Guo W, Peddada SD (2016). A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics, 17: 104. doi: 10.1186/s12859-016-0937-5.
- [22]. Hadi AS, Wells MT (1990). A note on generalized Wald’s method. Metrika, 37: 309–315.
- [23]. Hadi AS, Wells MT (1991). Minimum distance method of estimation and testing when statistics have limiting singular multivariate normal distribution. Sankhyā: The Indian Journal of Statistics, Series B, 53: 257–267.
- [24]. Huber PJ, Ronchetti EM (2009). Robust Statistics. Wiley & Sons, NY.
- [25]. Khatri CG (1968). Some results for the singular multivariate regression models. Sankhyā, Series A, 30: 267–280.
- [26]. Knight K (2008). Shrinkage estimation for nearly singular designs. Econometric Theory, 24: 323–337.
- [27]. Kuiper RM, Hoijtink H, Silvapulle MJ (2012). Generalization of the order-restricted information criterion for multivariate normal linear models. Journal of Statistical Planning and Inference, 142: 2454–2463.
- [28]. Lauritzen N (2011). Lectures on convex sets. http://home.imf.au.dk/niels/lecconset.pdf
- [29]. Lim C (2015). Robust ridge regression estimators for nonlinear models with applications to high throughput screening assay data. Statistics in Medicine, 34: 1185–1198.
- [30]. Lim C, Sen PK, Peddada SD (2013). Robust analysis of high throughput screening (HTS) assay data. Technometrics, 55: 150–160.
- [31]. Lütkepohl H, Burda MM (1997). Modified Wald tests under nonregular conditions. Journal of Econometrics, 78: 315–332.
- [32]. MacKinnon JG (2009). Bootstrap hypothesis testing. In Handbook of Computational Econometrics, 183–213.
- [33]. Mulder J, Hoijtink H, Klugkist I (2010). Equality and inequality constrained multivariate linear models: objective model selection using constrained posterior priors. Journal of Statistical Planning and Inference, 140: 887–906.
- [34]. Montgomery DC, Peck EA (2012). Introduction to Linear Regression Analysis, 5th Edition. Wiley & Sons, NY.
- [35]. Moore DS (1977). Generalized inverses, Wald’s method, and the construction of chi-squared tests of fit. Journal of the American Statistical Association, 72: 131–137.
- [36]. Peddada SD, Lobenhofer L, Li L, Afshari C, Weinberg C, Umbach D (2003). Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics, 19: 834–841.
- [37]. Peddada SD, Harris S, Zajd J, Harvey E (2005). ORIOGEN: order restricted inference for ordered gene expression data. Bioinformatics, 21: 3933–3934.
- [38]. Peddada SD, Laughlin S, Miner K, Guyon JP, Haneke K, Vahdat H, Semelka R, Kowalik A, Armao D, Davis B, Baird D (2008). Growth of uterine leiomyomata among premenopausal black and white women. Proceedings of the National Academy of Sciences, 105: 19887–19892. doi: 10.1073/pnas.0808188105.
- [39]. Rao CR, Mitra SK (1971). Further contributions to the theory of generalized inverse of matrices and its applications. Sankhyā, Series A, 33: 289–300.
- [40]. Rao CR (1972). Linear Statistical Inference and its Applications. Wiley & Sons, NY.
- [41]. Shaked M, Shanthikumar JG (2007). Stochastic Orders. Springer.
- [42]. Silvapulle MJ, Sen PK (2005). Constrained Statistical Inference: Order, Inequality, and Shape Constraints. Wiley & Sons, NY.
- [43]. Silvey SD (1969). Multicollinearity and imprecise estimation. Journal of the Royal Statistical Society, Series B, 31: 539–552.
- [44]. Tryon PV, Hettmansperger TP (1973). A class of non-parametric tests for homogeneity against ordered alternatives. The Annals of Statistics, 1: 1061–1070.
- [45]. Tukey JW (1962). The future of data analysis. The Annals of Mathematical Statistics, 33: 1–67.
- [46]. van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics, 42: 1166–1202.
- [47]. Vinod HD, Ullah A (1981). Recent Advances in Regression Models. Marcel Dekker, New York.
- [48]. Wand MP, Jones MC (1995). Kernel Smoothing. CRC Press.