Author manuscript; available in PMC: 2024 Apr 1.
Published in final edited form as: Comput Stat Data Anal. 2022 Dec 28;180:107684. doi: 10.1016/j.csda.2022.107684

Unified model-free interaction screening via CV-entropy filter

Wei Xiong a,*, Yaxian Chen b, Shuangge Ma c
PMCID: PMC9997997  NIHMSID: NIHMS1862131  PMID: 36910335

Abstract

For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most of the existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by making stringent model assumptions, lacking robustness, and/or requiring predictors to be continuous (and hence lacking flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors are allowed to be continuous and discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, correlation structure among predictors, and sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.

Keywords: Coefficient of variation, Conditional entropy, Interaction analysis, Marginal screening

1. Introduction

For many practical high-dimensional analysis problems, interactions have been increasingly confirmed as playing critical roles beyond main effects [1, 2]. The most representative example is perhaps gene-gene interaction. For a long array of diseases including cancer and cardiovascular diseases, gene-gene interactions with significant implications for disease risk, progression, survival, and other endpoints have been identified. “Genes” analyzed in published studies include SNPs, gene expressions, methylation and other epigenetic changes, microRNAs, and others. Interaction analysis has also been extensively conducted beyond biomedicine.

Most of the existing interaction analyses can be classified as marginal and joint. In marginal analysis, a small number of variables are analyzed at a time. As such, a large number of analyses are needed, leading to a multiple comparison adjustment problem. In contrast, in joint analysis, a large number of variables are collectively analyzed in a single model, leading to a regularized estimation and variable selection problem [3, 4, 5, 6]. The two analysis paradigms serve different purposes, with joint analysis possibly better reflecting, for example, the biology of complex diseases [7]. In this article, we focus on joint analysis. In the literature, many joint interaction analysis methods have been developed, and we refer to [8] and others for review. Despite successful methodological and theoretical developments, in practice, joint interaction analysis is usually seriously challenged by extremely high dimensionality, which can lead to an intolerably high computational cost, lack of stability, and inferior estimation and selection.

For high-dimensional data analysis in general (without and with interactions), marginal screening has been established as highly effective for reducing computational cost and improving stability, estimation, and selection performance. The essence of marginal screening lies in linking effects that are important in a joint model with those that are important in marginal models. Most of the existing screening methods are limited to main effects only. They can be roughly classified as model-based and model-free (robust). In model-based marginal screening, specific parametric or semiparametric models are assumed [9, 10]. In such analysis, when models are correctly specified, consistency properties can be established. However, in high-dimensional data analysis, model misspecification is not uncommon, which can lead to the failure of model-based screening. To tackle this problem, model-free screening methods have been developed, built on the quantile [11], rank correlation [12], distance correlation [13], and other techniques [14, 15, 16, 18]. It is noted that some of the existing techniques have relatively narrow applications. For example, the mean-variance-based sure independence screening [16] is limited to classification problems. The distance-correlation-based sure independence screening [13] requires predictors to have continuous distributions. The existing methods sometimes take quite different forms for different types of response variables, lacking uniformity – this is especially true for model-based methods.

Compared to the analysis of main effects, marginal screening for interaction analysis has been less developed. It may seem that the methods for main-effects screening can be directly applied. However, the validity of such methods often relies on certain no/weak correlation assumptions, which easily break down in interaction analysis. One solution has been brought by the unique variable selection hierarchy of interaction analysis. In particular, it has been argued both statistically and biologically that, if an interaction term is important, then one (under the weak hierarchy) or both (under the strong hierarchy) of the corresponding main effects should also be important [20]. With this hierarchy, progressive screening methods have been developed, which first conduct marginal screening with main effects, and then screen for important interactions corresponding to the selected main effects [19, 20, 21]. It is noted that the existing progressive methods are mostly model-based. In addition, they also demand certain correlation conditions to ensure that important main effects can be identified in the first place. To identify gene-gene and gene-environment interactions, entropy-based methods have been proposed. Examples include an index based on information gain to quantify the interaction effect between two categorical predictors and a response [22] and utilization of mutual information and mutual information gain to quantify gene-environment [23] and gene-gene [24] interactions. There are also approaches that take GINI purity gain into account to detect gene-gene interactions [25, 26]. However, these entropy-based methods generally overestimate dependence [27] and are focused on binary response, having limited practical applications.

In this article, our goal is to develop a new marginal screening approach tailored to interaction analysis. The significance of interaction analysis and marginal screening for such analysis has been well established and will not be reiterated. This study may complement and advance the existing literature in multiple ways. First, the proposed analysis accommodates interactions and can complement those limited to main effects only – it is also noted that it is directly applicable to analysis with main effects only. Second, the proposed approach provides a unified solution to feature screening: it can comprehensively cover categorical, continuous, and censored survival outcomes. This methodological uniformity is much desired and not shared by many of the existing methods. Third, the proposed nonparametric coefficient of variation (CV) filters, built on Shannon's entropy theory and the coefficient of variation statistic, are model-free and have the much-desired robustness property. Their implementation does not require full specification of the distributions of the response and covariates. The main consideration for adopting CV is that the standard deviation (SD) of entropy information usually changes as the mean changes, and dividing by the mean can remove its impact on variation. Such a formulation can be even more useful when different predictors have different numbers of categories, as predictors with more categories are likely to be associated with larger information gain regardless of whether interactions are important or not. In this case, the mean can serve as an adjustment factor to deal with this problem. The proposed CV filter is a standardization of SD that allows the comparison of variability regardless of the magnitudes of the original features. With the assistance of a two-stage procedure, the proposed approach is also computationally scalable and fast.
In addition, it is shown to have satisfactory performance when signals are weak and/or variables are dependent and heavy-tailed – such a property is not shared by most of the existing methods. Here it is noted that the merit of robustness for joint interaction analysis has been well established, for which we refer to the quantile [28], exponential loss [29, 30], rank-based [31, 32], and many other works. Fourth, the proposed approach can flexibly accommodate discrete, categorical, and continuous predictors, overcoming the stringent demand for continuous distributions made by some methods. Lastly, statistical properties are rigorously established, providing the proposed approach with a strong statistical ground and also shedding light on the coefficient of variation and entropy theory under high-dimensional settings. Overall, this study provides a statistically well-grounded and numerically well-performing approach for alleviating the computational burden and improving performance in high-dimensional interaction analysis.

2. Methods

The proposed CV filters are based on Shannon’s entropy theory [33], which has demonstrated great power in other fields but has not been well employed in interaction analysis. In Section 2.1, we first develop a new interaction filter based on conditional entropy for data with a categorical response and predictors. Based on this development, a new screening strategy is developed in Section 2.2, and its statistical properties are established in Section 2.3. In Section 2.4, we consider data with continuous and censored survival responses.

2.1. A new interaction filter based on conditional entropy

Let $Y$ be a categorical response with $R$ categories $\{1,\ldots,R\}$, and $X=(X_1,\ldots,X_p)^{\mathrm{T}}$ be a $p$-dimensional vector, where $X_k$ is categorical with $J_k$ categories for $k=1,\ldots,p$. In this study, we focus on second-order interactions; higher-order interactions have been investigated only to a very limited extent in high-dimensional settings. Two predictors $X_k$ and $X_l$ have no interaction effect if and only if they are conditionally independent given $Y$.

Entropy is a key information measure for the uncertainty of a random variable. Consider a categorical random variable $X$ with probability mass function $p(x)=P(X=x)$, and set $0\times\log 0=0$. The entropy of $X$ is defined as:

$$H(X)=-\sum_{x}p(x)\log p(x). \quad (2.1)$$

It is minimal ($=0$) if $X$ puts probability one on a single category. On the other hand, it is maximal if $X$ has the same probability for all categories. For an equiprobable variable, $H(X)$ increases with the number of categories.
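As a concrete illustration, the entropy in (2.1) can be computed directly from a probability vector. This is a minimal sketch; the helper name `entropy` is ours, not from the paper:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum_x p(x) log p(x), with the
    convention 0 * log 0 = 0 (zero-probability categories dropped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

For a degenerate variable, `entropy([1.0])` is 0; for an equiprobable variable, the value equals the log of the number of categories and hence grows with that number, matching the discussion above.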

Consider two predictors $X_k$ and $X_l$. We propose using conditional entropy to quantify the dependence between their interaction and $Y$. Denote by $H(Y\mid X_k=i,X_l=j)=-\sum_{r=1}^{R}p_{r\mid ij}\log p_{r\mid ij}$ the conditional entropy of $Y$ given $X_k=i$ and $X_l=j$, where $p_{r\mid ij}=P(Y=r\mid X_k=i,X_l=j)$. Let $w_{ij}^{kl}=P(X_k=i,X_l=j)$, $i=1,\ldots,J_k$, $j=1,\ldots,J_l$, be the weight characterizing the probability of $(X_k,X_l)$ falling into cell $(i,j)$. Consider the quantity:

$$\Theta_{ij}^{kl}:=w_{ij}^{kl}\,H(Y\mid X_k=i,X_l=j)=-w_{ij}^{kl}\sum_{r=1}^{R}p_{r\mid ij}\log p_{r\mid ij}. \quad (2.2)$$

It measures the weighted amount of information remaining in the response $Y$ given $X_k=i$ and $X_l=j$. Consequently, $H(Y\mid X_k,X_l)=\sum_{i=1}^{J_k}\sum_{j=1}^{J_l}\Theta_{ij}^{kl}$ is the sum of $\Theta_{ij}^{kl}$ over all possible values that $X_k$ and $X_l$ can take. When $(X_k,X_l)$ has a strong relationship with $Y$, the uncertainty of $Y$ is expected to decrease significantly after removing the effects of $(X_k,X_l)$. Specifically, if $Y$ is completely determined by $X_k$ and $X_l$ (such that $Y=g(X_k,X_l)$ for a deterministic function $g(\cdot)$), then $H(Y\mid X_k,X_l)=0$. By simple calculations, $H(Y\mid X_k,X_l)$ can be reformulated as:

$$2H(Y\mid X_k,X_l)=\underbrace{H(X_k)+H(X_l)-2H(X_k,X_l)}_{\text{predictor information}}+\underbrace{H(Y\mid X_k)+H(Y\mid X_l)}_{\text{main effects}}+\underbrace{H(X_k\mid Y,X_l)+H(X_l\mid Y,X_k)}_{\text{interaction effects}} \quad (2.3)$$

We note from (2.3) that $H(Y\mid X_k,X_l)$ contains multiple sources of information: the intrinsic information of the predictors, the main-effect information $H(Y\mid X_k)$ and $H(Y\mid X_l)$, and the interaction information $H(X_k\mid Y,X_l)$ and $H(X_l\mid Y,X_k)$. When there is no interaction, $H(X_k\mid Y,X_l)=H(X_k\mid Y)$ and $H(X_l\mid Y,X_k)=H(X_l\mid Y)$, and so $H(Y\mid X_k,X_l)=H(Y)+H(X_k\mid Y)+H(X_l\mid Y)-H(X_k,X_l)$. If, in addition, $X_k$ and $X_l$ are independent, $H(Y\mid X_k,X_l)=H(Y\mid X_k)+H(Y\mid X_l)-H(Y)$. These results motivate us to use conditional entropy to develop an efficient interaction screening filter.

Define $\Theta^{kl}:=\{\Theta_{ij}^{kl}: i=1,\ldots,J_k,\ j=1,\ldots,J_l\}$ as the set of weighted conditional entropies of $Y$ given all possible values of $(X_k,X_l)$. Intuitively, if there is an interaction, the effect of $X_k$ on $Y$ will not be the same for all levels of $X_l$. This is fully reflected in the variation of $\Theta^{kl}$. Specifically, if all the components of $\Theta^{kl}$ are equal, $(X_k,X_l)$ should have no effect on $Y$; if there is a strong interaction, $\Theta^{kl}$ should have notable variability. As such, the standard deviation (SD) of $\Theta^{kl}$ can potentially be used to quantify interaction. However, when different predictors have different numbers of categories, the pair $(X_k,X_l)$ with more categories is likely to have smaller conditional entropy, regardless of the absence or presence of an interaction. In addition, the SD of $\Theta^{kl}$ generally depends on the mean and is not dimensionless. With these considerations, we propose using the coefficient of variation (CV) to quantify interaction. Specifically,

$$\mathrm{CV}_{kl}=\mathrm{CV}(Y\mid X_k,X_l)=\frac{\sigma_{kl}}{\mu_{kl}}, \quad (2.4)$$

where $\sigma_{kl}$ and $\mu_{kl}$ are the SD and mean of $\Theta^{kl}$, respectively. This CV filter is a standardization of the SD; consequently, it allows for the comparison of variability without being affected by the magnitudes of the original variables. It is easy to see that $\mathrm{CV}_{kl}=\mathrm{CV}_{lk}$. Its properties are further established in the following proposition.

Proposition 1.

Let $Y$ be a categorical random variable with $R\,(R\ge 2)$ categories $\{1,\ldots,R\}$, and $p_r=P(Y=r)>0$ for all $r=1,\ldots,R$. Let $X_k$ be a categorical variable with $J_k$ categories $\{1,\ldots,J_k\}$ for $k=1,\ldots,p$. Let $0<w_{ij}^{kl}=P(X_k=i,X_l=j)<1$ and $0<p_{r\mid ij}<1$ for all $r=1,\ldots,R$, $i=1,\ldots,J_k$, and $j=1,\ldots,J_l$. Then: (1) $\Theta_{ij}^{kl}>0$ for $i=1,\ldots,J_k$ and $j=1,\ldots,J_l$, and $0<\mu_{kl}<H(Y)$. (2) $\mathrm{CV}_{kl}\ge 0$ for $k,l\in\{1,\ldots,p\}$. (3) $0\le \mathrm{CV}_{kl}\le \sqrt{2}/4$ if $(X_k,X_l)$ and $Y$ are independent. (4) $\mathrm{CV}_{kl}=0$ if and only if $(X_k,X_l)$ and $Y$ are independent and $w_{ij}^{kl}=1/(J_kJ_l)$ for all $i=1,\ldots,J_k$ and $j=1,\ldots,J_l$.

Proof is provided in the Supplementary Materials. Result (1) implies that each component of $\Theta^{kl}$ is positive, and so is the mean value $\mu_{kl}$. By Jensen's inequality, $\mu_{kl}$ is bounded by the entropy of $Y$; as $R$ increases, this upper bound also increases. In addition, if $(X_k,X_l)$ and $Y$ are independent, $\mathrm{CV}_{kl}$ cannot be too large and is bounded by $\sqrt{2}/4\approx 0.35$. If, in addition, $(X_k,X_l)$ falls into each cell with equal probability, $\mathrm{CV}_{kl}$ achieves its minimum of 0. By definition, larger CV values indicate stronger interaction effects. As such, $\mathrm{CV}_{kl}$ can be utilized as a marginal utility for interaction screening.

Although categorical distributions have been assumed, the CV interaction filter can be generalized to continuous and mixture distributions. Specifically, if predictor Xk has a continuous distribution, we can employ slicing and partition Xk into Jk slices, and then the CV interaction filter can be applied. In our numerical studies, we adopt uniform slicing and note that data-dependent, possibly more effective slicing techniques have been developed in the literature.

Let $q^{(j)}$ be the $(j/J_k)$-th quantile of $X_k$, $j=1,\ldots,J_k-1$, with $q^{(0)}=-\infty$ and $q^{(J_k)}=\infty$, $k=1,\ldots,p$. The CV interaction filter is then defined by replacing $w_{ij}^{kl}$ and $p_{r\mid ij}$ in equation (2.2) by $w_{ij}^{kl}=P(X_k\in(q^{(i-1)},q^{(i)}],\ X_l\in(q^{(j-1)},q^{(j)}])$ and $p_{r\mid ij}=P(Y=r\mid X_k\in(q^{(i-1)},q^{(i)}],\ X_l\in(q^{(j-1)},q^{(j)}])$, respectively. A similar idea applies to mixture distributions. When $X_k$ is categorical and $X_l$ is continuous, the CV filter can be constructed by replacing $w_{ij}^{kl}$ and $p_{r\mid ij}$ in equation (2.2) with $w_{ij}^{kl}=P(X_k=i,\ X_l\in(q^{(j-1)},q^{(j)}])$ and $p_{r\mid ij}=P(Y=r\mid X_k=i,\ X_l\in(q^{(j-1)},q^{(j)}])$, respectively.
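Uniform slicing of a continuous predictor can be sketched as follows (an assumed helper, `uniform_slice`, not from the paper; the half-open interval convention $(q^{(j-1)},q^{(j)}]$ is implemented with right-closed bins):

```python
import numpy as np

def uniform_slice(x, J):
    """Partition a continuous predictor into J equiprobable slices:
    slice j collects the x falling in (q^(j-1), q^(j)], where the
    interior cut points are the empirical j/J-th quantiles."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, np.linspace(0, 1, J + 1)[1:-1])
    return np.digitize(x, cuts, right=True)  # slice labels 0, ..., J-1
```

After slicing, the categorical CV filter applies unchanged to the slice labels.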

With a sample of $n$ i.i.d. observations $\{Y_t,x_{t1},\ldots,x_{tp}\}$, $t=1,\ldots,n$, $\mathrm{CV}_{kl}$ can be estimated by plugging in sample means and standard deviations. That is,

$$\widehat{\mathrm{CV}}_{kl}=\frac{\sqrt{\sum_{i=1}^{J_k}\sum_{j=1}^{J_l}\big(\widehat{\Theta}_{ij}^{kl}-\widehat{\mu}_{kl}\big)^2\big/(J_kJ_l-1)}}{\widehat{\mu}_{kl}}, \quad (2.5)$$

where $\widehat{\Theta}_{ij}^{kl}=-\widehat{w}_{ij}\sum_{r=1}^{R}\widehat{p}_{r\mid ij}\log\widehat{p}_{r\mid ij}$, $\widehat{w}_{ij}=\frac{1}{n}\sum_{t=1}^{n}I(x_{tk}=i,x_{tl}=j)$, $\widehat{p}_{r\mid ij}=\sum_{t=1}^{n}I(x_{tk}=i,x_{tl}=j,Y_t=r)\big/\sum_{t=1}^{n}I(x_{tk}=i,x_{tl}=j)$, $I(\cdot)$ is the indicator function, $i\in\{1,\ldots,J_k\}$, $j\in\{1,\ldots,J_l\}$, and $\widehat{\mu}_{kl}=\sum_{i=1}^{J_k}\sum_{j=1}^{J_l}\widehat{\Theta}_{ij}^{kl}\big/(J_kJ_l)$.
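A direct implementation of the plug-in estimate (2.5) can be sketched as follows (assuming integer-coded categorical data; `cv_interaction` is an illustrative name, and cells are formed from the observed categories):

```python
import numpy as np

def cv_interaction(y, xk, xl):
    """Plug-in estimate of CV_kl in equation (2.5) for integer-coded
    categorical y, xk, xl."""
    y, xk, xl = map(np.asarray, (y, xk, xl))
    n = len(y)
    theta = []
    for i in np.unique(xk):
        for j in np.unique(xl):
            cell = (xk == i) & (xl == j)
            m = cell.sum()
            if m == 0:
                theta.append(0.0)           # empty cell: weight 0
                continue
            w = m / n                       # \hat w_{ij}
            p = np.bincount(y[cell]) / m    # \hat p_{r|ij}
            p = p[p > 0]                    # 0 * log 0 = 0 convention
            theta.append(w * float(-(p * np.log(p)).sum()))
    theta = np.array(theta)
    # SD over the J_k * J_l cells with denominator J_k*J_l - 1, then / mean
    return theta.std(ddof=1) / theta.mean()
```

As a sanity check of the symmetry noted above, the statistic is unchanged when the roles of the two predictors are swapped.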

Illustrative examples

In Section 2 of the Supplementary Materials, we present two illustrative examples, which provide more insight into the working characteristics of the CV interaction filter and show that it performs well for both continuous and categorical predictors.

Remark 1.

Following the same principle, the CV filter approach can be applied to screen main effects. This is achieved by replacing $(X_k,X_l)$ in (2.4) with a single predictor, writing $\mathrm{CV}_k=\mathrm{CV}(Y\mid X_k)$. Properties are similar to the above. In particular, $\mathrm{CV}(Y\mid X_k)=0$ if and only if $X_k$ and $Y$ are independent and $w_i=P(X_k=i)=1/J_k$ for all $i=1,\ldots,J_k$. We refer to the approach of applying the CV filter to main effects only as CVMS (which will be further considered below).

2.2. Screening Strategy

Define the index sets of predictors and their second-order terms as:

$$\mathcal{T}_1=\{1,2,\ldots,p\},\qquad \mathcal{T}_2=\{(k,l):\ 1\le k<l\le p\}.$$

Define the active main effect and interaction index sets as:

$$\mathcal{D}_1=\{j:\ F(Y\mid X)\ \text{depends on}\ X_j\ \text{for some}\ Y=r,\ j\in\mathcal{T}_1\},$$
$$\mathcal{D}_2=\{(k,l):\ X_k\ \text{and}\ X_l\ \text{are not conditionally independent for some}\ Y=r,\ (k,l)\in\mathcal{T}_2\}.$$

The full model index set is $\mathcal{F}=\mathcal{T}_1\cup\mathcal{T}_2$, and the true model index set is $\mathcal{D}=\mathcal{D}_1\cup\mathcal{D}_2$. For a model $\mathcal{M}$, we use $|\mathcal{M}|$ to denote its size. As described above and in the literature, interaction analysis faces the additional complexity of the variable selection hierarchy. Here we consider the weak hierarchy, under which if $(k,l)\in\mathcal{D}_2$, then at least one of $X_k$ and $X_l$ should also be identified. Extension to the strong hierarchy can be easily carried out.

We propose the following two-stage approach, which screens main effects and interactions in two consecutive steps. In particular,

Stage 1.

CVMS: Apply the CV-main-effect filter to 𝓣1, and identify:

$$\widehat{\mathcal{D}}_1=\{k\in\mathcal{T}_1:\ \widehat{\mathrm{CV}}_k\ \text{is among the}\ d_{n1}\ \text{largest}\}.$$

Stage 2.

CVIS: Apply the CV-interaction filter to $\{(k,l)\in\mathcal{T}_2:\ (\{k\}\cup\{l\})\cap\widehat{\mathcal{D}}_1\neq\emptyset\}$, and identify

$$\widehat{\mathcal{D}}_2=\{(k,l)\in\mathcal{T}_2:\ \widehat{\mathrm{CV}}_{kl}\ \text{is among the}\ d_{n2}\ \text{largest, with at least one of}\ k\ \text{and}\ l\in\widehat{\mathcal{D}}_1\}.$$

The working active set for downstream analysis is then $\widehat{\mathcal{D}}=\widehat{\mathcal{D}}_1\cup\widehat{\mathcal{D}}_2$. Here, we note that some existing progressive methods demand multiple iterations to update $\widehat{\mathcal{D}}_1$ and $\widehat{\mathcal{D}}_2$ [20]. In contrast, the proposed approach is not iterative. In our theoretical investigations below, we examine the asymptotic requirements on $|\widehat{\mathcal{D}}_1|$ and $|\widehat{\mathcal{D}}_2|$. In our numerical study, we take $d_{n1}=[n/\log n]$ and $d_{n2}=2[n/\log n]$. The value of $d_{n1}$ is consistent with that in the literature [11, 13, 15, 16], and the value of $d_{n2}$ has been motivated by the squared dimensionality of interactions. Our numerical study below suggests satisfactory performance. On the other hand, we note that when $[n/\log n]$ is small in practical data analysis, a larger value can be taken to be cautious.
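The two-stage CVMS+CVIS procedure can be sketched end to end as follows. This is a simplified illustration under the assumption of integer-coded categorical data; the function names (`two_stage_screen`, `_cv`, `_theta`) are ours, and the statistics are the plug-in CV estimates of Section 2.1:

```python
import numpy as np
from itertools import combinations

def _cv(theta):
    """Coefficient of variation (SD / mean) of a set of values."""
    theta = np.asarray(theta, dtype=float)
    return theta.std(ddof=1) / theta.mean()

def _theta(y, cells, classes):
    """Weighted conditional entropies over a list of boolean cell masks."""
    out = []
    for cell in cells:
        if not cell.any():
            out.append(0.0)                 # empty cell: weight 0
            continue
        p = np.array([(y[cell] == r).mean() for r in classes])
        p = p[p > 0]
        out.append(cell.mean() * float(-(p * np.log(p)).sum()))
    return out

def two_stage_screen(y, X, d1, d2):
    """Stage 1 (CVMS): keep the d1 predictors with largest CV_k.
    Stage 2 (CVIS): among pairs with at least one retained index,
    keep the d2 pairs with largest CV_kl (weak hierarchy)."""
    n, p = X.shape
    classes = np.unique(y)
    cv_main = []
    for k in range(p):
        cells = [X[:, k] == i for i in np.unique(X[:, k])]
        cv_main.append(_cv(_theta(y, cells, classes)))
    D1 = set(np.argsort(cv_main)[-d1:])
    scored = []
    for k, l in combinations(range(p), 2):
        if k not in D1 and l not in D1:
            continue                        # enforce the weak hierarchy
        cells = [(X[:, k] == i) & (X[:, l] == j)
                 for i in np.unique(X[:, k]) for j in np.unique(X[:, l])]
        scored.append((_cv(_theta(y, cells, classes)), k, l))
    scored.sort(reverse=True)
    D2 = {(k, l) for _, k, l in scored[:d2]}
    return D1, D2
```

In practice `d1` and `d2` would be set to $d_{n1}$ and $d_{n2}$ as above; every retained pair touches the stage-1 set by construction.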

Remark 2.

Conceptually, with binary distributions, the CV filter may lose power, as the SD in the definition of CVk may not be sufficiently informative with only two data points. However, our numerical study below still suggests reasonable performance with binary distributions. When there are three or more categories, empirical study suggests highly satisfactory performance of the proposed approach. As a possible variation, one can directly apply the CV filter to the combination of main effects and interactions. Then the selected set, if needed, can be enriched to satisfy the variable selection hierarchy.

2.3. Statistical Properties

First consider the scenario with all predictors being categorical. Assume the following conditions:

  • (C1) There exist two positive constants $c_1$ and $c_2$, such that $c_1+c_2\le R$, $c_1/R\le p_r\le c_2/R$, and $c_1/R\le p_{r\mid ij}\le c_2/R$. There exist two positive constants $c_3$ and $c_4$, such that $c_3+c_4\le J_kJ_l$ and $c_3/(J_kJ_l)\le w_{ij}^{kl}\le c_4/(J_kJ_l)$ for $1\le k<l\le p$, $i\in\{1,\ldots,J_k\}$ and $j\in\{1,\ldots,J_l\}$.

  • (C2) There exist a constant $c>0$ and $0\le\tau<\tfrac{1}{4}$, such that $\min_{(k,l)\in\mathcal{D}_2}\mathrm{CV}_{kl}^2\ge c\,n^{-\tau}$.

  • (C3) $R=O(n^s)$ and $J=\max_{1\le k\le p}J_k=O(n^t)$, where $s\ge 0$, $t\ge 0$ and $4\tau+4s+12t<1$.

  • (C4) $\liminf_{p\to\infty}\big\{\min_{(k,l)\in\mathcal{D}_2}\mathrm{CV}_{kl}^2-\max_{(k,l)\in\mathcal{D}_2^c}\mathrm{CV}_{kl}^2\big\}\ge\delta$, where $\delta$ is a positive constant.

Condition (C1) guarantees that the proportion of each category of the response and of each cell of a predictor pair cannot be too large or too small. Similar assumptions have been made in the literature [14, 16]. Condition (C2) is common in the marginal screening literature and requires that the minimum true signal is at least of order $n^{-\tau}$. Condition (C3) allows the numbers of categories for the response and predictors to diverge at certain rates with the sample size $n$. Condition (C4) is assumed to separate the active interaction set from noise. It ensures that the $\mathrm{CV}_{kl}$ of an active interaction is always larger than that of an inactive one at the population level. Compared to the partial orthogonality condition [17] (that $\mathrm{CV}_{kl}>0$ for $(k,l)\in\mathcal{D}_2$ and $\mathrm{CV}_{kl}=0$ for $(k,l)\in\mathcal{D}_2^c$), Condition (C4) is weaker in that the effects are not required to be 0 for all inactive interactions to have the consistency property in ranking. In fact, $(k,l)\in\mathcal{D}_2^c$ does not necessarily imply $\mathrm{CV}_{kl}=0$: the quantity is zero only if $(X_k,X_l)$ is independent of $Y$ and $(X_k,X_l)$ falls into each cell with equal probability. In comparison, the Pearson's Chi-squared-based sure independence screening [14] requires the effects of all inactive covariates to be zero to enjoy the strong sure screening property.

Theorem 1.

(Sure screening for categorical predictors) Under Conditions (C1)-(C3),

$$P(\mathcal{D}_2\subseteq\widehat{\mathcal{D}}_2)\ge 1-O\big(p^2\exp\{-mn^{1-(4\tau+4s+12t)}+(2t+s)\log n\}\big),$$

where $m$ is a positive constant. Therefore, if $\log p^2=O(n^{\xi})$ and $\xi<1-4\tau-4s-12t$, CVIS has the sure screening property.

Proof is provided in the Supplementary Materials. This result ensures that the estimated set of interactions contains the truly important ones with probability approaching one. It is noted that, as conditional entropy has the much-desired robustness property, CVIS is robust to heavy-tailed distributions of predictors and presence of outliers – a property not shared by most of the existing methods. Further, the sure screening property holds when predictors and/or response have a diverging number of categories.

Remark 3.

Under the same conditions, the $\mathrm{CV}(Y\mid X_k)$ filter also possesses the screening consistency property for main effects. In particular, it can be shown that, as $n\to\infty$, $P(\mathcal{D}_1\subseteq\widehat{\mathcal{D}}_1)\ge 1-O\big(p\exp\{-mn^{1-(2\tau+4s+4t)}+(s+t)\log n\}\big)$, where $m$ is a positive constant. So if $\xi<1-4\tau-4s-12t$ in Theorem 1 is satisfied, then $\xi<1-2\tau-4s-4t$, and the sure screening property holds for main effects.

To accommodate continuous distributions, additional assumptions are needed, and condition (C3) needs to be revised. Specifically,

  • (C5) If both $X_k$ and $X_l$ are continuous, then there exists a constant $c_5$ such that $0<f_k(x\mid Y=r)<c_5$ for any $1\le r\le R$ and $x$ in the domain of $X_k$, where $f_k(x\mid Y=r)$ is the Lebesgue density function of $X_k$ conditional on $Y=r$. There exists a constant $c_6$ such that $0<f_k(x\mid X_l\in\mathcal{A},Y=r)<c_6$ for any $1\le r\le R$, $x$ in the domain of $X_k$, and $\mathcal{A}$ in the domain of $X_l$, where $f_k(x\mid X_l\in\mathcal{A},Y=r)$ is the Lebesgue density function of $X_k$ conditional on $Y=r$ and $X_l\in\mathcal{A}$.

  • (C5') If $X_k$ is continuous and $X_l$ is categorical, then there exists a constant $c_6$ such that $0<f_k(x\mid X_l=j,Y=r)<c_6$ for any $1\le r\le R$, $1\le j\le J_l$ and $x$ in the domain of $X_k$, where $f_k(x\mid X_l=j,Y=r)$ is the Lebesgue density function of $X_k$ conditional on $Y=r$ and $X_l=j$.

  • (C6) There exist a positive constant $c_7$ and $0\le\rho<\tfrac{1}{2}$ such that $f_k(x)\ge c_7 n^{-\rho}$ for any $1\le k\le p$ and $x$ in the domain of $X_k$, where $f_k(x)$ is the Lebesgue density function of $X_k$. Further, $f_k(x)$ is continuous on the domain of $X_k$.

  • (C3') $R=O(n^s)$ and $J=\max_{1\le k\le p}J_k=O(n^t)$, where $s\ge 0$, $t\ge 0$, and $4\tau+4s+12t+2\rho<1$.

Conditions (C5) and (C5') exclude the extreme scenario where $X_k$ places a heavy mass in a small range. Condition (C6) is mild and assumed for technical considerations; it requires a lower bound of order $n^{-\rho}$ for the density of $X_k$. For data with both categorical and continuous predictors, we can establish the following results.

Theorem 2.

(Sure screening for both categorical and continuous predictors) Under Conditions (C1), (C2), (C3’), (C5), (C5’), and (C6),

$$P(\mathcal{D}_2\subseteq\widehat{\mathcal{D}}_2)\ge 1-O\big(p^2\exp\{-mn^{1-(4\tau+4s+12t+2\rho)}+(2t+s)\log n\}\big),$$

where $m$ is a positive constant. Therefore, if $\log p^2=O(n^{\xi})$ and $\xi<1-4\tau-4s-12t-2\rho$, CVIS has the sure screening property.

Theorem 3.

(Ranking consistency) If Conditions (C1), (C4), (C5), and (C6) hold, $\log(J_kJ_lR)/\log n=O(1)$, and $\max\{\log p^2,\log n\}\,J_k^6J_l^6R^4/n^{1-2\rho}=o(1)$, then

$$\liminf_{n\to\infty}\Big\{\min_{(k,l)\in\mathcal{D}_2}\widehat{\mathrm{CV}}_{kl}^2-\max_{(k,l)\in\mathcal{D}_2^c}\widehat{\mathrm{CV}}_{kl}^2\Big\}>0,\quad \text{a.s.}$$

This result establishes that for continuous, categorical, and mixture distributions, in a unified way, the proposed approach can properly rank and hence separate important and unimportant interaction terms. We note that Condition (C4) may be slightly stronger than some of its counterparts. However, the corresponding consistency result is also stronger. It justifies a clear gap between active and inactive interactions at the sample level. That is, the $\mathrm{CV}_{kl}$ values of active interactions are always ranked above those of inactive ones with overwhelming probability. Thus, with an appropriate cutoff, active and inactive interactions can be separated.

Remark 4.

(Computational complexity) By definition, the CV interaction filter allows the numbers of categories to differ across predictors. It can be derived that the computational complexity is $O(J^2Rn)$. Further, by Condition (C3), this is $O(n^{1+s+2t})$, where $4\tau+4s+12t<1$, and is therefore less than $O(n^{5/4})$.

2.4. Accommodating continuous and censored survival responses

With the assistance of slicing, the CV interaction filter can accommodate continuous responses. Specifically, we define a partition

$$\mathcal{G}=\{[a_g,a_{g+1}):\ a_g<a_{g+1},\ g=0,\ldots,G-1\},$$

where $a_0=-\infty$ and $a_G=\infty$. Each $[a_g,a_{g+1})$ is referred to as a slice. We then define a random variable $T_Y\in\{1,\ldots,G\}$ such that $T_Y=g+1$ if and only if $Y$ is in the $g$th slice. The slicing counterpart of $\Theta_{ij}^{kl}$ is:

$$\Theta_{ij}^{kl,\mathcal{G}}:=-w_{ij}^{kl}\sum_{g=0}^{G-1}p_{g\mid ij}^{\mathcal{G}}\log p_{g\mid ij}^{\mathcal{G}},$$

where $p_{g\mid ij}^{\mathcal{G}}=P(T_Y=g+1\mid X_k=i,X_l=j)=P(a_g\le Y<a_{g+1}\mid X_k=i,X_l=j)$. The slicing version of the conditional entropy set can be formulated as $\Theta^{kl,\mathcal{G}}=\{\Theta_{ij}^{kl,\mathcal{G}}:\ i=1,\ldots,J_k,\ j=1,\ldots,J_l\}$. As such, we have the CV interaction filter for a continuous response:

$$\mathrm{CV}_{kl}^{\mathcal{G}}=\sigma_{kl}^{\mathcal{G}}\big/\mu_{kl}^{\mathcal{G}}, \quad (2.6)$$

where $\sigma_{kl}^{\mathcal{G}}$ and $\mu_{kl}^{\mathcal{G}}$ are the standard deviation and mean of $\Theta^{kl,\mathcal{G}}$, respectively. In our numerical analysis, we adopt uniform slicing. Following [15], we consider $G=2,\ldots,[\log n]$.

Now consider data with right-censored responses. Instead of $\{(X_i,Y_i),\ i=1,\ldots,n\}$, we observe $\{(X_i,Y_i^*,\Delta_i),\ i=1,\ldots,n\}$, where $Y_i^*=\min(Y_i,C_i)$ and $\Delta_i=I(Y_i\le C_i)$. Here, we assume that the censoring variable $C_i$ is independent of $Y_i$ and $X_i$. Denote $S(t)=P(C\ge t)$, and let $\widehat{S}(t)$ be the Kaplan-Meier estimator of $S(t)$. To apply the CV interaction filter, we first apply uniform slicing and partition $Y^*$ into $G$ slices. The inverse-probability-of-censoring CV filter for screening main effects (IPCW-CVMS) is based on the statistic:

$$\widehat{\mathrm{CV}}_k^*:=\widehat{\mathrm{CV}}(Y^*\mid X_k)=\frac{\widehat{\sigma}_k^*}{\widehat{\mu}_k^*}=\frac{\sqrt{\sum_{i=1}^{J_k}\big(\widehat{\Theta}_{ki}-\widehat{\mu}_k^*\big)^2\big/(J_k-1)}}{\widehat{\mu}_k^*},$$

where $\widehat{\Theta}_{ki}=-\widehat{w}_i\sum_{g=0}^{G-1}\widehat{p}_{g\mid i}^*\log\widehat{p}_{g\mid i}^*$, $\widehat{w}_i=\frac{1}{n}\sum_{t=1}^{n}I(X_{tk}=i)$, $\widehat{p}_{g\mid i}^*=\sum_{t=1}^{n}\frac{\Delta_t}{\widehat{S}(Y_t^*)}I(X_{tk}=i,\ a_g\le Y_t^*<a_{g+1})\big/\sum_{t=1}^{n}I(X_{tk}=i)$, and $\widehat{\mu}_k^*=\sum_{i=1}^{J_k}\widehat{\Theta}_{ki}/J_k$. The rationale behind this is that:

$$E\Big\{\frac{\Delta_t}{S(Y_t^*)}I(X_{tk}=i,\ a_g\le Y_t^*<a_{g+1})\Big\}=P(X_{tk}=i,\ a_g\le Y_t<a_{g+1}).$$
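The IPCW construction requires an estimate of the censoring survival function $S(t)=P(C\ge t)$. A minimal sketch is given below (the helper names are ours; ties and the left-limit convention for $\widehat{S}$ are handled in the simplest possible way, so this is an illustration rather than a production estimator):

```python
import numpy as np

def km_censoring_survival(time, delta):
    """Kaplan-Meier estimate of S(t) = P(C >= t), evaluated at each
    observation's own time, treating censored observations (delta == 0)
    as the 'events' of the censoring distribution."""
    time, delta = np.asarray(time, float), np.asarray(delta, int)
    order = np.argsort(time, kind="stable")
    n = len(time)
    surv_sorted = np.ones(n)
    s = 1.0
    for i in range(n):
        if delta[order[i]] == 0:         # a censoring "event"
            s *= 1.0 - 1.0 / (n - i)     # n - i subjects still at risk
        surv_sorted[i] = s
    surv = np.empty(n)
    surv[order] = surv_sorted            # map back to input order
    return surv

def ipcw_weights(time, delta):
    """IPCW weights Delta_t / S_hat(Y*_t); censored observations get 0."""
    s = km_censoring_survival(time, delta)
    return np.where(delta == 1, 1.0 / np.maximum(s, 1e-12), 0.0)
```

These weights replace the raw indicators when forming the cell proportions $\widehat{p}_{g\mid i}^*$ and $\widehat{p}_{g\mid ij}^*$, in line with the identity displayed above.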

With the same strategy, the inverse-probability-of-censoring CV filter for screening interactions (IPCW-CVIS) is based on the statistic:

$$\widehat{\mathrm{CV}}_{kl}^*:=\widehat{\mathrm{CV}}(Y^*\mid X_k,X_l)=\frac{\sqrt{\sum_{i=1}^{J_k}\sum_{j=1}^{J_l}\big(\widehat{\Theta}_{ij}^{kl,*}-\widehat{\mu}_{kl}^*\big)^2\big/(J_kJ_l-1)}}{\widehat{\mu}_{kl}^*}, \quad (2.7)$$

where $\widehat{\Theta}_{ij}^{kl,*}=-\widehat{w}_{ij}\sum_{g=0}^{G-1}\widehat{p}_{g\mid ij}^*\log\widehat{p}_{g\mid ij}^*$, $\widehat{w}_{ij}=\frac{1}{n}\sum_{t=1}^{n}I(X_{tk}=i,X_{tl}=j)$, $\widehat{p}_{g\mid ij}^*=\sum_{t=1}^{n}\frac{\Delta_t}{\widehat{S}(Y_t^*)}I(X_{tk}=i,X_{tl}=j,\ a_g\le Y_t^*<a_{g+1})\big/\sum_{t=1}^{n}I(X_{tk}=i,X_{tl}=j)$, $I(\cdot)$ is the indicator function, $i\in\{1,\ldots,J_k\}$, $j\in\{1,\ldots,J_l\}$, and $\widehat{\mu}_{kl}^*=\sum_{i=1}^{J_k}\sum_{j=1}^{J_l}\widehat{\Theta}_{ij}^{kl,*}\big/(J_kJ_l)$.

For continuous and censored responses, with the statistics defined above, screening can be conducted in the same manner as described in Section 2.2. In addition, as described in the previous subsections, the proposed screening can accommodate categorical, continuous, and mixture predictor distributions.

3. Simulation

To gauge the performance of the proposed CVMS+CVIS, we compare it with the following competitors: (a) PCS, which conducts the Pearson's Chi-squared-based screening of main effects [14]; (b) DCS, which conducts the distance correlation-based screening of main effects [13]; (c) IGS, which conducts the information gain-based screening of main effects [18]; (d) CVMS+PCIS, which conducts the screening of main effects using the proposed CV filter and the screening of interactions using the Pearson's Chi-squared-based technique; (e) CVMS+KIF, which is similar to approach (d), with the interaction screening based on the Kendall Interaction Filter [34]; and (f) PCS+PCIS, which conducts the screening of main effects and interactions using the Pearson's Chi-squared-based technique [14]. For Examples 2 and 3, we also include IIS [6], which conducts the screening of interactions for nonlinear classification, for comparison. We acknowledge that there are many other screening methods. The above have been chosen because of their competitive performance. In particular, comparing with alternatives (a)-(c) can reveal the merit of the proposed CV filter in the main-effect screening step, and comparing with alternatives (d)-(f) can reveal its merit in the interaction screening step. With 500 replicates, we compare performance using the following criteria: (a) MMS, the minimum model size required to include all of the true active predictors, for which the 5%, 25%, 50%, 75%, and 95% quantiles are reported; (b) $\mathcal{P}_1$, the probability that all active main effects are ranked in the top $d_{n1}=[n/\log n]$, and $\mathcal{P}_2$, the probability that all active interactions are ranked in the top $d_{n2}=2[n/\log n]$; (c) CZ, the percentage of correctly identified inactive predictors (among all identified inactive predictors); and (d) IZ, the percentage of mistakenly identified active predictors (among all identified active predictors).
With the following examples, we consider n = 200, 500 and p = 1000, 5000. Here we note that, although p may seem moderate, the dimensionality of interaction analysis is actually extremely high, and screening is warranted.

Example 1.

(Index model) Denote $\Sigma=(\sigma_{ij})_{p\times p}$ with $\sigma_{ij}=\rho^{|i-j|}$, and let Cauchy$(0,I_p)$ denote the $p$-dimensional standard Cauchy distribution. Consider the index model:

$$Y=(X_1+X_2-X_1X_2)^3+\exp(X_3-2X_3X_4)+\varepsilon.$$

For X and ε, we consider the following three cases:

Case (1a): $X\sim N(0,\Sigma)$, $\varepsilon\sim N(0,1)$, and $\rho=0.5$;

Case (1b): $u\sim\text{Cauchy}(0,I_p)$, $X=\Sigma^{1/2}u$, $\varepsilon\sim N(0,1)$, and $\rho=0.5$;

Case (1c): the same as Case (1a) except that ρ = 0.8.

The active main-effect set and interaction set are $\mathcal{D}_1=\{1,2,3\}$ and $\mathcal{D}_2=\{(1,2),(3,4)\}$, respectively. When slicing, we partition each predictor into three categories and the response into two or three categories ($R=2$ and $3$). Results are summarized in Table 1. We can see that all approaches tend to be more accurate when the number of slices for the response increases. The proposed approach performs the best, with the highest selection probability and smallest model size. In the main-effect screening, the performance of all the methods is insensitive to the number of slices when the dependence structure of the predictors is complicated. DCS performs worse with the heavy-tailed predictors. IGS and PCS cannot maintain reasonable model sizes at the 75% and 95% quantiles. In comparison, CVMS performs well in all settings. In the interaction screening, CVIS outperforms the alternatives by a large margin, and its performance is almost insensitive to the dependence structure of the covariates and to extreme values. The alternatives fail with too many false discoveries, especially when the sample size is small.

Table 1:

Simulation Example 1: means of performance measures based on 500 replicates. A cell is left empty if the corresponding method is not applied.

Main-effect selection |𝓓1| = 3.0 Interaction selection |𝓓2| = 2.0


(n,p) Method 5% 25% 50% 75% 95% 𝓟1 CZ IZ 5% 25% 50% 75% 95% 𝓟2 CZ IZ

Case (1a): uniform slicing, R = 2
(200,1000) DCS 3.0 3.0 3.0 3.0 6.0 0.990 0.997 0.000
IGS 3.0 3.0 3.0 3.0 6.0 0.980 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 4.0 8.0 11.0 24.0 813 0.780 0.996 0.150
PCS+PCIS 3.0 3.0 3.0 3.0 7.0 0.980 0.997 0.000 6.0 10.0 17.0 109 1317 0.710 0.996 0.200
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 16.0 64.0 708 4137 0.400 0.996 0.375
(500,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 4.0 4.0 8.0 13.0 18.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 9.0 10.0 14.0 15.0 18.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 3.03 5.0 17.0 483 0.920 0.996 0.050
Case (1a): uniform slicing, R = 3
(200,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 2.0 8.0 15.0 23.0 4382 0.815 0.996 0.050
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 8.0 15.0 96.0 1242 8027 0.365 0.996 0.351
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 5.0 47.0 403 3024 0.420 0.996 0.250
(500,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 2.0 3.0 4.0 8.0 15.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 5.0 8.0 15.0 17.0 33.0 0.960 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 3.0 5.0 5.0 11.0 1.000 0.996 0.000
Case (1b): uniform slicing, R = 2
(200,1000) DCS 3.0 3.0 3.0 5.0 23.0 1.000 0.996 0.000
IGS 3.0 3.0 3.0 4.0 20.0 1.000 0.996 0.000
CVMS+CVIS 3.0 3.0 3.0 5.0 16.0 1.000 0.996 0.000 5.0 9.0 13.0 16.0 4480 0.820 0.996 0.050
PCS+PCIS 3.0 3.0 3.0 7.0 85.0 0.950 0.996 0.002 5.0 8.0 9.0 14.0 9996 0.800 0.996 0.075
CVMS+KIF 3.0 3.0 3.0 5.0 16.0 1.000 0.996 0.000 49.0 183 482 1421 9996 0.170 0.996 0.400
(500,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 6.0 9.0 13.0 17.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 4.0 8.0 10.0 13.0 17.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 7.0 7.0 7.0 9.0 13.0 1.000 0.996 0.000
Case (1b): uniform slicing, R = 3
(200,1000) DCS 2.0 3.0 3.0 4.0 35.0 0.960 0.997 0.000
IGS 3.0 3.0 4.0 5.0 15.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 6.0 17.0 1.000 0.997 0.00 3.0 6.0 11.0 16.0 4519 0.860 0.996 0.075
PCS+PCIS 3.0 3.0 4.0 7.0 23.0 1.000 0.997 0.000 4.0 9.0 13.0 2512 8993 0.750 0.996 0.125
CVMS+KIF 3.0 3.0 3.0 6.0 17.0 1.000 0.997 0.000 4.0 6.0 9.0 89.0 9996 0.700 0.996 0.200
(500,1000) DCS 2.0 2.0 3.0 3.0 46.0 0.950 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 4.0 5.0 6.0 10.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 5.0 6.0 8.0 11.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 5.0 5.0 7.0 9.0 12.0 1.000 0.996 0.000
Case (1c): uniform slicing, R = 2
(200,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 10.0 15.0 19.0 25.0 36.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 13.0 18.0 21.0 27.0 34.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 79.0 459 818 2030 7580 0.080 0.996 0.500
(500,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 2.0 4.0 5.0 8.0 14.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 16.0 18.0 18.0 19.0 26.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 44.0 101 253 356 555 0.350 0.996 0.475
Case (1c): uniform slicing, R = 3
(200,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 6.0 12.0 15.0 22.0 37.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 4.0 1.000 0.997 0.000 10.0 14.0 17.0 21.0 22.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 89.0 128 1242 1711 2072 0.350 0.996 0.325
(500,1000) DCS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
IGS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000
CVMS+CVIS 3.0 3.0 3.0 3.0 3.0 1.000 0.997 0.000 3.0 6.0 13.0 20.0 34.0 1.000 0.996 0.000
PCS+PCIS 3.0 3.0 3.0 3.0 4.0 1.000 0.997 0.000 10.0 16.0 17.0 19.0 30.0 1.000 0.996 0.000
CVMS+KIF 3.0 3.0 3.0 3.0 3.0 1.000 0.996 0.000 13.0 18.0 54.0 1088 1945 1.000 0.996 0.000
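As an illustration, Case (1a) and the uniform slicing step can be simulated along the following lines. This is a sketch under our reading of the setup: the signs in the index model are reconstructed from the text, and `uniform_slice` is a generic equal-frequency partition, our own helper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho = 200, 1000, 0.5

# AR(1) covariance: sigma_ij = rho^|i-j|
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# index model; the minus signs are reconstructed from the text
Y = (X[:, 0] + X[:, 1] - X[:, 0] * X[:, 1]) ** 3 \
    + np.exp(X[:, 2] - 2 * X[:, 2] * X[:, 3]) + rng.standard_normal(n)

def uniform_slice(v, k):
    """Partition a continuous variable into k equal-frequency slices."""
    qs = np.quantile(v, np.linspace(0, 1, k + 1)[1:-1])
    return np.searchsorted(qs, v)   # labels 0..k-1

X_cat = np.column_stack([uniform_slice(X[:, j], 3) for j in range(p)])
Y_cat = uniform_slice(Y, 2)         # R = 2 slices of the response
```

Setting the last line to `uniform_slice(Y, 3)` gives the R = 3 configuration.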

Example 2.

Consider data with a binary response and categorical predictors. For the ith observation, Y_i is generated under two settings: (a) p_k = P(Y_i = k) = 1/2 for k = 1, 2, and (b) P(Y_i = 1) = 3/4 and P(Y_i = 2) = 1/4. Conditional on Y_i, the predictors are generated under the following cases.

Case (2a): (binary) P(X_ij = 1 | Y_i = k) = θ_kj for j = 1, 3 and k = 1, 2.

P(X_{i,2r} = 1 | Y_i = k, X_{i,2r−1} = 2) = 0.05 I(θ_{k,2r−1} ≥ 0.5) + 0.4 I(θ_{k,2r−1} < 0.5),
P(X_{i,2r} = 1 | Y_i = k, X_{i,2r−1} = 1) = 0.95 I(θ_{k,2r−1} ≥ 0.5) + 0.4 I(θ_{k,2r−1} < 0.5),

for r{1,2}, where (θ11,θ13,θ21,θ23)=(0.1,0.2,0.8,0.9), and θk,j=0.5 for the other k and j.

Case (2b): (3-level categorical) P(X_{i,j} = m | Y_i = k) = θ_{kj,m} for m = 1, 2, 3, k = 1, 2, and j ∈ {1, 3, 5}.

P(X_{i,j+1} = 1 | Y_i = 1, X_{i,j} = m) = 0.05 I(θ_{kj,m} ≥ 0.5) + 0.40 I(θ_{kj,m} < 0.5),
P(X_{i,j+1} = 2 | Y_i = 1, X_{i,j} = m) = 0.05 I(θ_{kj,m} ≥ 0.5) + 0.10 I(θ_{kj,m} < 0.5),
P(X_{i,j+1} = 1 | Y_i = 2, X_{i,j} = m) = 0.90 I(θ_{kj,m} ≥ 0.5) + 0.40 I(θ_{kj,m} < 0.5),
P(X_{i,j+1} = 2 | Y_i = 2, X_{i,j} = m) = 0.05 I(θ_{kj,m} ≥ 0.5) + 0.20 I(θ_{kj,m} < 0.5),

for j{1,3} and m=1,2,3. For j>5, P(Xj=m)=1/3 for m=1,2,3. Detailed values for θkj,m are presented in Table S2 of the Supplementary Materials.

Case (2c): (continuous) X_j | Y = 1 ~ N(4 − j, 1), X_j | Y = 2 ~ N(2, 1) for j ∈ {1, 3}.

X_{j+1} | Y = k, X_j ≥ 3.5 − 0.5j ~ N(5.5 − 0.5j − k, 1),
X_{j+1} | Y = k, X_j < 3.5 − 0.5j ~ N(1.5 − 0.5j + k, 1),

for j{1,3} and k{1,2}. Other Xj’s for j=5,,p follow a standard normal distribution.

For Case (2a) and Case (2c), 𝓓1={1,2,3,4}, 𝓓2={(1,2),(3,4)}, and for Case (2b), 𝓓1={1,2,3,4,5} and 𝓓2={(1,2),(3,4)}.
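A sketch of the data-generating mechanism for Case (2a) under the balanced setting (a), assuming our reading of the conditional probabilities above; labels 1/2 code the two levels, and all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 1000

# theta_{kj}: P(X_j = 1 | Y = k) for the active odd predictors j = 1, 3
theta = {(1, 1): 0.1, (1, 3): 0.2, (2, 1): 0.8, (2, 3): 0.9}

Y = rng.integers(1, 3, size=n)                    # balanced: P(Y = k) = 1/2
X = (rng.random((n, p)) < 0.5).astype(int) + 1    # default theta = 0.5, labels 1/2

for j in (1, 3):                                  # active odd predictors
    for k in (1, 2):
        idx = Y == k
        X[idx, j - 1] = np.where(rng.random(idx.sum()) < theta[(k, j)], 1, 2)

for r in (1, 2):                                  # even predictors depend on odd partners
    j_prev, j = 2 * r - 1, 2 * r
    for k in (1, 2):
        t = theta[(k, j_prev)]
        for parent, p1 in ((2, 0.05 if t >= 0.5 else 0.4),
                           (1, 0.95 if t >= 0.5 else 0.4)):
            idx = (Y == k) & (X[:, j_prev - 1] == parent)
            X[idx, j - 1] = np.where(rng.random(idx.sum()) < p1, 1, 2)
```

The conditional construction of X_2 and X_4 is what makes the pairs (1, 2) and (3, 4) active interactions.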

For Case (2b), we find that PCS has highly unsatisfactory performance with main effects. As a "remedy", we consider CVMS+PCIS, which adopts the proposed CV filter for main-effect screening, as opposed to PCS+PCIS. Results are summarized in Tables S3–S6 in the Supplementary Materials. The overall findings are similar to those of Example 1. Specifically, as the number of categories of the predictors increases, the performance of all methods gets slightly worse. Better performance is observed when the sample size increases. For main-effect screening, all methods perform well with binary predictors. With 3-level predictors, DCS, PCS, and IGS fail: they may miss important main effects even when the sample size is large. In comparison, CVMS consistently has higher coverage rates and smaller MMS values. For interaction screening, KIF and IIS break down, while PCIS and CVIS perform reasonably well. This is expected, since IIS requires the predictors to be sub-Gaussian to enjoy the sure screening property. CVIS performs slightly better in almost all settings. When the sample size is small, PCIS tends to have larger MMS and lower coverage rates as the number of predictor categories increases. As expected, the difference is more obvious with imbalanced data. For Case (2c), where the predictors are continuous, we further include two other main-effect screening methods for comparison, namely the mean-variance based sure independence screening (MVS, [16]) and the fused Kolmogorov filter (FKF, [15]). To apply CVMS, we dichotomize each continuous predictor at its median. Results are provided in Tables S5–S6 in the Supplementary Materials, where we again observe the superiority of the proposed approach. Compared to DCS, IGS, PCS, MVS, and FKF, CVMS is either the best or one of the best. Its performance can be further improved with three or more categories, as described in Remark 2.
In addition, CVIS has much smaller model sizes at high quantiles and higher probabilities of including all active interactions, especially under the imbalanced design and when the number of the predictors is large but the sample size is small.

Example 3.

(Generalized linear model) We simulate from the logistic model:

log{P(Y = 1 | X) / (1 − P(Y = 1 | X))} = 3X_1 + 2X_2 − 6X_1X_3.

We consider two different cases for X = (X_1, …, X_p)^T.

Case (3a): (continuous predictors) XN(0,Σ), where Σ has off-diagonal entries being 0. The first and third diagonal entries are 2 and 4, respectively, and the other diagonal entries are 1.

Case (3b): (a mixture of continuous and categorical predictors) For j ∈ {1, 2, p/2+2, …, p}, X_j is independently generated from N(μ_j, 1), and for j ∈ {3, 4, …, p/2+1}, X_j follows a Bernoulli distribution with P(X_j = 1) = 1/2. We set μ_1 = 1 and the other μ_j's to zero. The active interaction term, X_1X_3, is the product of a binary predictor and a continuous one.

Under this example, 𝓓1 = {1, 2} and 𝓓2 = {(1, 3)}. This model has a relatively simple structure. Here, we do not adopt the two-stage strategy; instead, we directly resort to interaction screening. With CVIS and PCIS, all the continuous predictors are converted to binary by dichotomizing at the medians. Results are presented in Table S7 in the Supplementary Materials. The patterns of the findings are comparable to those above. The proposed CVIS is able to separate the true nonzero effects from the zeros with high accuracy. It outperforms IIS in terms of 𝓟2, is comparable to KIF in terms of MMS, and slightly outperforms PCIS. In terms of CZ, it is superior to the other two approaches, which have more false discoveries.
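A sketch of Case (3a) together with the median dichotomization applied before CVIS/PCIS (variable names are ours; the logistic coefficients follow the model above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 1000

# Case (3a): independent normals; var(X1) = 2, var(X3) = 4, others 1
sd = np.ones(p)
sd[0], sd[2] = np.sqrt(2.0), 2.0
X = rng.standard_normal((n, p)) * sd

# logistic model with one active interaction, X1 * X3
eta = 3 * X[:, 0] + 2 * X[:, 1] - 6 * X[:, 0] * X[:, 2]
Y = (rng.random(n) < 1 / (1 + np.exp(-eta))).astype(int)

# dichotomize every continuous predictor at its median before CVIS/PCIS
X_bin = (X > np.median(X, axis=0)).astype(int)
```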

Example 4.

(Transformation model for a censored response) Yi is generated from the transformation model:

H(Y_i) = X_i^T β_1 − Z_i^T β_2 + ε,

where X_i is the p-dimensional vector of predictors, Z_i = (X_{i1}X_{i2}, …, X_{i1}X_{ip}, …, X_{i,p−1}X_{ip}) contains all two-way interactions, and H(c) = log{0.5(e^{2c} − 1)}. The predictors are generated from a multivariate normal distribution with marginal means 0 and covariance Σ with cov(X_ij, X_ik) = 0.5^{|j−k|}, and ε ~ N(0, 1). The censoring variable C_i is generated from a uniform distribution on [0, 7], and the censoring rate is around 15%. We set β_1 = (1.2, 1.0, 0.9, 0_5, 0.9, 0.8, 1.2, 0_{p−11}), and β_2 has entries 1.0 at the positions corresponding to interactions (1, 2) and (8, 9) and 0 elsewhere. Thus, 𝓓1 = {1, 2, 3, 9, 10, 11} and 𝓓2 = {(1, 2), (8, 9)}. For this example, we compare against IPCW-tau [35] for main-effect screening, and against PC-IPCW-tau [36], PCIS, and KIF for interaction screening. In addition, we also treat all main effects and interactions "equally" and apply IPCW-tau. When applying the IPCW-CV filters, we equally discretize the response and continuous predictors into three categories. The results are summarized in Table S8 in the Supplementary Materials. We observe similar superior performance of the proposed approach. It is also noted that performance of the proposed approach does not seem to strongly depend on censoring.
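The transformation-model mechanism can be sketched as follows. This is an illustration only: we reduce p, generate only the two active interactions (the remaining β2 entries are zero), and the minus sign between the main-effect and interaction terms is our reconstruction from the garbled formula:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 50      # p reduced for illustration

Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

beta1 = np.zeros(p)
beta1[[0, 1, 2]] = [1.2, 1.0, 0.9]
beta1[[8, 9, 10]] = [0.9, 0.8, 1.2]

# only interactions (1,2) and (8,9) are active, each with coefficient 1.0;
# the sign between the two terms is an assumption
u = X @ beta1 - (X[:, 0] * X[:, 1] + X[:, 7] * X[:, 8]) \
    + rng.standard_normal(n)

# invert H(c) = log{0.5(exp(2c) - 1)}:  c = 0.5 * log(2*exp(u) + 1)
T = 0.5 * np.log(2 * np.exp(u) + 1)

C = rng.uniform(0, 7, size=n)                 # censoring times
time = np.minimum(T, C)
delta = (T <= C).astype(int)                  # 1 = event observed
```

Applying H to the generated T recovers the linear predictor plus noise, which is a quick sanity check on the inversion.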

4. Analysis of TCGA data

We analyze data on lung adenocarcinoma (LUAD). The dataset is obtained from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/). TCGA has published high-quality omics and clinical data on multiple cancer types. The TCGA LUAD data has been analyzed in multiple studies, and both main effects and interactions have been examined [38, 28]. We refer to the TCGA website and existing literature for information on study design and data collection. Multiple types of omics data are available. Here we analyze mRNA gene expressions, which have been considered in multiple interaction analyses. To demonstrate the broad applicability of the proposed approach, we consider both censored survival and categorical response variables. In the original dataset, the expression values of 19,559 genes are available. Although in principle the proposed approach can be directly used, to generate more reliable results with a limited sample size, we first conduct a moderate unsupervised screening and retain the 5,000 genes with the largest marginal variances. Thus, in the following analysis, there are 5,000 candidate main effects and 12,497,500 possible second-order interactions. Such a dimensionality is considerably higher than in most published studies. Some demographic/clinical variables are also available. We focus on gene expressions but note that the proposed approach can be potentially coupled with conditional screening to accommodate these additional variables.
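The unsupervised pre-screening step, which keeps the genes with the largest marginal variances, can be sketched as follows (the helper name is ours):

```python
import numpy as np

def variance_prescreen(expr, keep=5000):
    """Unsupervised screening: keep the `keep` genes with the
    largest marginal variances (columns = genes)."""
    v = expr.var(axis=0, ddof=1)
    top = np.argsort(-v)[:keep]
    return np.sort(top)          # indices of retained genes

# with p = 5000 retained genes, the number of candidate two-way
# interactions is p*(p-1)/2 = 12,497,500, matching the text
assert 5000 * 4999 // 2 == 12497500
```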

4.1. Analysis of censored overall survival

We first consider overall survival, which is subject to right censoring. Among the 493 subjects, 178 have observed survival times, and the remaining 315 are censored. The observed survival times range from 0.13 to 165.37 months, with a median of 20.52 months. The censoring times range from 0.37 to 241.6 months, with a median of 22.32 months.

For the proposed approach, we first equally discretize the survival outcome into two categories and apply uniform slicing to partition each gene expression measurement into three slices. The proposed screening leads to 79 main effects and 158 interactions. Detailed results are shown in Table 2. A quick literature search suggests that many of the retained genes have sound biological implications. For instance, gene DDX59 has been identified to promote DNA replication in lung adenocarcinoma. SOD3 re-expression in tumor-associated endothelial cells increases doxorubicin delivery into and chemotherapeutic effect on tumors. CSAG2 has been found to be necessary and sufficient to drive cell and tumor growth. Gene MAGEA4 is overexpressed and can serve as an immunotherapy target in various malignant tumors, including non-small cell lung cancer. TENM1 has been identified in vertebrates, coding for membrane proteins that are mainly involved in embryonic and neuronal development. CENATAC depletion or expression of disease mutants results in excessive retention of AT–AC minor introns in about 100 genes enriched for nucleocytoplasmic transport and cell cycle regulators, and causes chromosome segregation errors. We acknowledge that those in Table 2 are not the final identification results. However, the highly sensible candidates can still provide some support for the validity of the proposed approach.

Table 2:

Analysis of censored overall survival: 79 genes identified by CVMS and 158 interactions identified by CVMS+CVIS.

CVMS-main effects CVMS+CVIS-interactions


DDX59 PERM1 CSAG2-ELOVL4 MAGEA4-CNTN1 MNDA-CPVL
CPVL NOTUM CSAG2-CARD14 MAGEA4-LGR5 PAGE2-SLC40A1
TNFRSF11B RHBDL1 CSAG2-ALDH1L2 PAGE2-MAL CD4-CPVL
MYOZ1 RAB36 CSAG2-TDO2 MAGEA4-ACKR3 GUCA2B-ACKR3
TDO2 ATP8B2 VCX3A-CENATAC GUCA2B-LGR5 GUCA2B-TENM1
GPD1L FAM83A CSAG2-ACKR3 TLR4-CPVL TAC1-ELOVL4
CENATAC HAVCR1 CSAG2-MYOZ1 PAGE2-CARD14 GUCA2B-EHF
RNF213 PLAC8 CSAG2-TNFRSF11B ATG16L2-CENATAC HOXD13-CARD14
ACKR3 CPXM2 CSAG2-CNTN1 SEC31B-CENATAC PAGE2-SOD3
EHF COL18A1 CSAG2-CPVL BEX2-BEX4 HOXD13-CNTN1
CNTN1 SLC1A3 VCX3A-ELOVL4 SLCO2B1-CPVL GUCA2B-MYOZ1
MAL GLRB CSAG2-CCDC184 MAGEA4-CARD14 GUCA2B-DAAM2
TENM1 CELSR1 VCX3A-TENM1 PAGE2-CPVL GOLGA8B-CENATAC
SOD3 NPIPA5 CSAG2-EHF MAGEA4-MAL GUCA2B-ALDH1L2
CARD14 ACSS3 CSAG2-DAAM2 GUCA2B-CARD14 NPY-LGR5
LGR5 OGT VCX3A-CCDC184 GUCA2B-BEX4 GUCA2B-CENATAC
ELOVL4 C15orf48 VCX3A-MYOZ1 GUCA2B-MAL P2RY13-CPVL
BEX4 CABCOCO1 HOXD13-ELOVL4 MAGEA4-GPD1L GUCA2B-ELOVL4
CCDC184 AHNAK CSAG2-GPD1L MAGEA4-TENM1 GUCA2B-SLC40A1
ALDH1L2 RACGAP1 CSAG2-BEX4 PAGE2-BEX4 GUCA2B-GPD1L
DAAM2 CRISPLD2 VCX3A-CNTN1 PAGE2-ALDH1L2 TLR8-CPVL
RGS20 CDK14 CSAG2-SOD3 MAGEA4-TDO2 MAGEB2-ALDH1L2
ARHGEF26 SDSL CSAG2-CENATAC MAGEA4-MYOZ1 FAM193B-CENATAC
ABCA7 LIMS2 VCX3A-GPD1L PAGE2-TDO2 CD84-CPVL
SLC22A18 ENPP5 VCX3A-CARD14 GUCA2B-TNFRSF11B GUCA2B-TDO2
AGAP9 SLC16A8 VCX3A-MAL HOXD13-CCDC184 PAGE2-DAAM2
ZMYND12 VCX3A-RNF213 GUCA2B-CCDC184 NPY-ELOVL4
TONSL MAGEA4-ELOVL4 MAGEA4-DAAM2 AIF1-CPVL
GUCY1A1 VCX3A-CPVL MAGEA4-BEX4 MAGEB2-TDO2
MYO5C VCX3A-BEX4 MAGEA4-SOD3 HOXD13-CPVL
TNFSF4 VCX3A-ALDH1L2 MAGEA4-RNF213 HOXD13-ACKR3
PRLR MAGEA4-CPVL PAGE2-TNFRSF11B HOXD13-SOD3
BOK VCX3A-ACKR3 PAGE2-GPD1L LMNTD2-CENATAC
SLC9A5 VCX3A-TNFRSF11B HOXD13-LGR5 HOXD13-TNFRSF11B
GPR143 VCX3A-TDO2 TLR7-CPVL GUCA2B-SOD3
USP27X VCX3A-EHF LENG8-CENATAC CSF1R-CPVL
RFTN1 VCX3A-SOD3 PAGE2-CCDC184 NPY-CNTN1
TRIB3 CCNL2-CENATAC MAGEA4-EHF NCKAP1L-CPVL
WDHD1 VCX3A-DAAM2 HOXD13-TENM1 HOXD13-BEX4
OSBPL6 VCX3A-SLC40A1 TAC1-BEX4 MAGEB2-CARD14
C8B PAGE2-CENATAC PAGE2-RNF213 NPY-BEX4
HOXB3 PABPC1L-CENATAC PAGE2-MYOZ1 NPY-MYOZ1
CBLC MAGEA4-TNFRSF11B PAGE2-CNTN1 CASP14-CARD14
GLB1L2 DDX39B-CENATAC GUCA2B-CNTN1 CSAG3-ELOVL4
GPX3 TTLL3-CENATAC PAGE2-LGR5 HOXD13-EHF
WSB1 MAGEA4-CCDC184 PAGE2-EHF HOXD13-SLC40A1
ARG2 MS4A6A-CPVL PAGE2-ACKR3 TAC1-ACKR3
ELMO3 MAGEA4-ALDH1L2 GUCA2B-RNF213 TAC1-TENM1
BST1 NPY-CCDC184 MAGEA4-SLC40A1 NPY-ACKR3
BOP1 HOXD13-ALDH1L2 MAGEA4-CENATAC IQGAP2-CPVL
CDC42BPG PAGE2-ELOVL4 TAC1-CCDC184 HOXD13-TDO2
SYT8 MS4A7-CPVL CSAD-CENATAC NEUROD1-LGR5
PDPN PAGE2-TENM1 GUCA2B-CPVL

Analysis is also conducted using the alternative approaches. A summary of the comparisons is provided in Table 3. The differences are quantified using the numbers of overlapping effects as well as RV coefficients [37]. The RV coefficient measures the similarity of two data matrices and ranges between 0 and 1, with a larger value indicating a higher overlap in information (contained in two sets of main effects or interactions). The proposed approach identifies considerably different sets of main effects and interactions from the alternatives. However, the amount of overlapping information, as measured by the RV coefficient, is moderate to high, which is reasonable as different genes can contain similar information.
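For reference, the RV coefficient between two selected-effect matrices (samples in rows) can be computed as below; this is the standard definition of the RV coefficient [37], not code from the paper:

```python
import numpy as np

def rv_coefficient(A, B):
    """RV coefficient between two data matrices sharing the same rows
    (samples); columns are the selected main effects/interactions."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    SA, SB = A @ A.T, B @ B.T        # n x n cross-product matrices
    num = np.trace(SA @ SB)
    den = np.sqrt(np.trace(SA @ SA) * np.trace(SB @ SB))
    return num / den
```

By construction, RV(A, A) = 1, and the coefficient is invariant to rescaling and shifting of either matrix.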

Table 3:

Numbers of main effects and interactions identified by different approaches (diagonal elements) and their overlaps (off-diagonal). RV coefficients in “()”.

Overall survival Approach CVMS+CVIS PCS+PCIS IGS+CVIS CVMS+PCIS IGS+PCIS

Main effects CVMS+CVIS 79 1(0.561) 1(0.563)
PCS+PCIS 79 78(0.999)
IGS+CVIS 79

Interaction CVMS+CVIS 158 0(0.714) 0(0.775) 49(0.939) 0(0.714)
PCS+PCIS 158 85(0.961) 93(0.888) 158(0.961)
IGS+CVIS 158 72(0.990) 109(0.961)
CVMS+PCIS 158 93(0.888)
IGS+PCIS 158

Stage

Main effects CVMS+CVIS 76 14(0.339) 14(0.350)
PCS+PCIS 76 71(0.977)
IGS+CVIS 76

Interaction CVMS+CVIS 152 0(0.125) 0(0.707) 4(0.192) 0(0.128)
PCS+PCIS 152 1(0.200) 14(0.939) 92(0.996)
IGS+CVIS 152 0(0.199) 1(0.210)
CVMS+PCIS 152 42(0.927)
IGS+PCIS 152

As in some published studies [32, 38], we conduct downstream analysis to further examine the effect of screening. More specifically: (a) we randomly split the data into a training set of size 393 and a testing set of size 100; (b) with the training set, the proposed and alternative screenings are conducted; (c) with the obtained main effects and interactions, we apply a penalization method, which can identify the important main effects and interactions in a joint interaction analysis model and respect the variable selection hierarchy. In this step, we adopt the Cox model. This may have a "conflict" with the proposed model-free spirit; extending the proposed approach to joint modeling is highly nontrivial and will not be pursued here; (d) the training set model is then used for prediction with the testing set samples. We adopt the C-statistic to evaluate prediction performance; it takes values in [0, 1], with a larger value indicating better prediction; and (e) Steps (a)-(d) are repeated 200 times. The average C-statistics are 0.7810 (CVMS), 0.7605 (PCS), 0.7733 (IGS), 0.8325 (CVMS+CVIS), 0.8019 (PCS+PCIS), 0.8129 (CVMS+PCIS), 0.8032 (IGS+CVIS), and 0.8078 (IGS+PCIS), which provides "indirect" support for the superiority of the proposed approach. To comprehend this result more intuitively, in Figure S3 (Supplementary Materials), we present the Kaplan-Meier curves for one random split. The two groups are generated by dichotomizing the predicted risk scores at the median. The difference between the good and bad survival groups is bigger under the proposed approach (and the corresponding p-value is smaller).
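The C-statistic used in step (d) can be computed with a plain pairwise count. Below is a sketch of Harrell's concordance index; an O(n²) loop is fine at this sample size, and the function name is ours:

```python
import numpy as np

def c_statistic(time, event, risk):
    """Harrell's C: among usable pairs, the fraction where the
    subject with the shorter observed event time has the higher
    predicted risk (ties in risk count 1/2)."""
    time, event, risk = map(np.asarray, (time, event, risk))
    conc = usable = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # i must be an observed event
        for j in range(n):
            if time[i] < time[j]:         # pair (i, j) is comparable
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    conc += 0.5
    return conc / usable
```

A perfectly ordered risk score yields 1.0 and a perfectly reversed one yields 0.0, with 0.5 corresponding to random prediction.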

4.2. Analysis of categorical stage

In this set of analyses, the outcome variable is pathological stage. In the original data, there are nine stages: Stage I, Stage IA, Stage IB, Stage II, Stage IIA, Stage IIB, Stage IIIA, Stage IIIB, and Stage IV. With a limited sample size, to avoid small counts, we combine them into Stages I, II, III, and IV, which have sample sizes 270, 119, 81, and 26, respectively.

The proposed screening leads to 76 main effects and 152 interactions. Details are provided in Table 4. As in the previous subsection, we observe that many of the screened genes have sound biological implications. For example, CSAG2 and MAGEA4 have been found to play critical roles in cancer development. LIN28B has been reported to be highly expressed during embryogenesis but silent in most adult tissues, and can block the maturation of the tumor suppressor microRNA let-7 family and mediate diverse biological functions. GUCA2B has been suggested as a susceptibility gene for essential hypertension. UGT1A10, which is expressed exclusively in extrahepatic tissues, is a highly active and important extrahepatic enzyme. MAGEA1 is a promising candidate marker for LUAD therapy, and MAGEA1-specific CAR-T cell immunotherapy may be an effective strategy for the treatment of MAGEA1-positive LUAD. In Table 3, we summarize the comparison between the proposed and alternative screenings. The overall pattern is similar to that for overall survival. The random split-based evaluation is conducted in the same manner as in the previous subsection, except that in Step (c) a logistic model is fit. Accordingly, we use classification error as the criterion for comparison. With 200 random splits, the average classification accuracy (1 − error) values are 0.770 (CVMS), 0.753 (PCS), 0.746 (IGS), 0.786 (CVMS+CVIS), 0.769 (PCS+PCIS), 0.760 (IGS+PCIS), 0.764 (IGS+CVIS), and 0.778 (CVMS+PCIS), which again suggests the superiority of the proposed approach.

Table 4:

Analysis of categorical stage: 76 genes identified by CVMS and 152 interactions identified by CVMS+CVIS.

CVMS-main effects CVMS+CVIS-interactions


CSAG2 ZIC1 CSAG2-CSAG3 PSG4-CASP14 PSG4-STRA8
PSG4 PDX1 CSAG2-MAGEA4 LIN28B-HOXD13 PSG4-REG1A
LIN28B NLGN4Y CSAG2-MAGEB2 DPPA2-PAGE2 GAGE2A-CSAG2
VCX3A TFAP2D CSAG2-MAGEA6 LIN28B-REG1A VCX3A-REG1A
MAGEA4 DPYSL5 CSAG2-MAGEA1 LIN28B-MAGEA1 PSG4-UGT1A10
PAGE2 PHGR1 VCX3A-VCX DPPA2-VCX3A GAGE2A-DLK1
GUCA2B TM4SF5 CSAG2-PSG4 LIN28B-PIWIL3 PAGE2-VCX3A
NPY SCGN CSAG2-PAGE2B DPPA2-MAGEA4 GAGE2A-DEFB4A
HOXD13 LIPK CSAG2-PAGE2 DPPA2-PAGE2B CSAG2-ZNF560
UGT1A10 ALX1 CSAG2-LIN28B LIN28B-IRX4 MAGEA6-MAGEA12
TAC1 APOBEC1 CSAG2-CASP14 VCX3A-TAC1 GAGE2A-PIWIL3
MAGEB2 C1orf21 CSAG2-MAGEA12 GAGE2A-MAGEA4 GAGE2A-MAGEB2
CASP14 NSG1 CSAG2-MAGEA10 DPPA2-TAC1 PAGE2-TAC1
SOX14 GPR160 CSAG2-MAGEC2 GAGE2A-UGT1A10 LIN28B-ZNF560
CSAG3 MEOX1 CSAG2-VCX3A LIN28B-PAGE2B VCX3A-IRX4
MAGEA1 GMNC CSAG2-NPY LIN28B-CSAG2 ZFY-NLGN4Y
CGB5 CDC25C MAGEA6-MAGEA3 GAGE2A-REG1A CSAG2-PAGE5
PIWIL3 PIMREG LIN28B-TAC1 DPPA2-REG1A DPPA2-LIN28B
PRR20G CDT1 VCX3A-PAGE2 PSG4-PAGE2B CSAG2-HOXC12
PAGE2B STAP1 GAGE2A-VCX3A MAGEA4-CSAG2 DPPA2-CASP14
PRAC2 ADSS1 MAGEA3-MAGEA6 GAGE2A-LIN28B VCX3A-MAGEA4
MAGEA6 RNASE1 GAGE2A-PSG4 DPPA2-NPY GAGE2A-GUCA2B
HOXC12 PTGDS GAGE2A-TAC1 MAGEA4-MAGEA10 PSG4-DEFB4A
SST CLUL1 CSAG2-PRR20G DPPA2-UGT1A10 LIN28B-UGT1A10
MAGEC2 SMURF2 CSAG2-PIWIL3 PSG4-NPY GUCA2B-UGT1A10
SLC10A2 GAGE2A-PAGE2 LIN28B-PAGE2 LIN28B-PSG4
IRX4 CSAG2-HOXD13 CSAG2-DLK1 LIN28B-DLK1
REG1A CSAG3-CSAG2 GAGE2A-STRA8 NPY-TAC1
STRA8 GAGE2A-PAGE2B LIN28B-MAGEB2 VCX3A-NPY
MAGEA10 PSG4-CGB5 PSG4-PIWIL3 MAGEA4-CASP14
LCN15 CSAG2-STRA8 LIN28B-GUCA2B CSAG2-DEFB4A
VCX CSAG2-UGT1A10 DPPA2-CGB5 LIN28B-CGB5
MAGEC1 CSAG2-MAGEA3 CSAG2-VCX LIN28B-PRR20G
DEFB4A LIN28B-MAGEA4 DPPA2-STRA8 GAGE2A-CSAG3
NR0B1 CSAG2-MAGEC1 DPPA2-GUCA2B MAGEA4-REG1A
SPRR2F VCX3A-STRA8 LIN28B-HOXC12 VCX3A-MAGEA10
MAGEA3 GAGE2A-CGB5 VCX3A-PSG4 MAGEA4-UGT1A10
WFDC5 DPPA2-PSG4 LIN28B-MAGEA10 LIN28B-STRA8
PAGE5 PSG4-TAC1 PSG4-PAGE2 PSG4-MAGEA4
DLK1 LIN28B-VCX3A LIN28B-NPY PSG4-HOXD13
ACTL8 CSAG2-TAC1 MAGEA4-MAGEA6 VCX3A-DLK1
MAGEA12 PSG4-VCX3A PSG4-DLK1 GAGE2A-MAGEA1
HOXA13 CSAG2-IRX4 MAGEA4-VCX3A VCX3A-ZNF560
GP2 MAGEA4-MAGEA1 DPPA2-PIWIL3 MAGEB2-CSAG2
TFF2 PAGE2-PAGE2B LIN28B-CASP14 DDX3Y-NLGN4Y
UGT2B11 CSAG2-GUCA2B DPPA2-IRX4 GAGE2A-PAGE5
ETNPPL GAGE2A-CASP14 GAGE2A-PRR20G VCX3A-CASP14
SPRR2A CSAG2-REG1A DPPA2-DLK1 FTHL17-UGT1A10
ZNF560 GAGE2A-IRX4 MAGEA4-PAGE2 VCX3A-PIWIL3
KRT75 VCX3A-PAGE2B CSAG2-CGB5 CSAG2-ZIC1
INSL4 GAGE2A-NPY PSG4-IRX4

5. Discussion

In this article, we have developed a new marginal screening approach. Although marginal screening is not a new topic, with the increasing resolution of profiling (and hence increasing dimensionality), it still plays an essential role in data analysis, and there is still strong demand for more effective screening methods. This study advances beyond many of the existing studies by focusing on interactions, whose significance is increasingly recognized. The proposed approach is based on Shannon's information theory, whose applications to screening remain limited. It can flexibly accommodate different types of distributions of the response and predictors under one unified framework. It has the much-desired robustness properties not shared by model-based and many other approaches. The theoretical development has provided a uniquely strong basis, and the numerical studies have convincingly established its practical superiority.

For convenience, as in the literature, we have employed a hard thresholding cutoff in each stage to retain a fixed number of predictors. It is possible to determine d_n1 and d_n2 in a more data-dependent manner. First consider 𝓓̂_1. Let {s_1, …, s_p} be a permutation of {1, …, p} such that CV̂_{s_1} ≥ CV̂_{s_2} ≥ … ≥ CV̂_{s_p}. We adopt the maximum ratio criterion [14], with which d_n1 = argmax_{1≤j≤p−1} CV̂_{s_j}/CV̂_{s_{j+1}}. Asymptotically, it can be proved that CV̂_{s_j}/CV̂_{s_{j+1}} is O_p(1) when j ≠ d_1, while CV̂_{s_{d_1}}/CV̂_{s_{d_1+1}} diverges to infinity in probability, since CV̂_{s_{d_1+1}} can be arbitrarily small. Here, d_1 = |𝓓_1|. However, this criterion can be unstable, yielding very large or very small d_n1, when there are predictors with very strong or weak effects [18]. To remedy this problem, a resampling-based method can be adopted, and d_n1 can be restricted to be smaller than a user-specified constant. This technique proceeds as follows: (i) generate B bootstrap samples; (ii) calculate the CV filters for each bootstrap sample; for the ith bootstrap sample, the CV estimates are ordered from largest to smallest, and we calculate d_n1^(i) = argmax_{1≤j≤d_max} CV̂_{s_j}/CV̂_{s_{j+1}}, i = 1, …, B; (iii) obtain d_n1 = (1/B) Σ_{i=1}^B d_n1^(i). Similar discussions hold for d_n2. Given the satisfactory performance of the hard cutoffs, we do not pursue this computationally more expensive determination in this article.
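The maximum ratio criterion and its bootstrap-stabilized variant described above can be sketched as follows (function names are ours; `cv_matrix` holds the CV filters computed on each bootstrap sample):

```python
import numpy as np

def max_ratio_cutoff(cv_hat, d_max=None):
    """Maximum ratio criterion: order the CV estimates decreasingly
    and take the position of the largest successive ratio."""
    s = np.sort(np.asarray(cv_hat))[::-1]
    if d_max is not None:
        s = s[:d_max + 1]            # restrict the search to 1..d_max
    eps = 1e-12                      # guard against division by ~0
    ratios = s[:-1] / np.maximum(s[1:], eps)
    return int(np.argmax(ratios)) + 1    # 1-based model size

def bootstrap_cutoff(cv_matrix, d_max):
    """Average the max-ratio cutoff over bootstrap replicates
    (rows of cv_matrix = CV filters on bootstrap samples)."""
    sizes = [max_ratio_cutoff(row, d_max) for row in cv_matrix]
    return int(round(np.mean(sizes)))
```

For instance, a score vector with three clearly nonzero entries and near-zero remainders produces its largest successive ratio at position three, recovering the true model size.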

This study can be potentially extended in multiple ways. The proposed approach has been designed for interactions between predictors of the same type; in omics studies, this amounts to gene-gene interactions. It will be almost straightforward to extend the proposed CV filters to gene-environment interactions, which involve two different types of predictors. We have focused on two-way interactions. Higher-order interactions are statistically meaningful; however, they still have very limited practical applications under high-dimensional settings. We have also focused on screening. It may be of interest to further develop joint interaction modeling based on Shannon's information theory, so that the overall analysis, consisting of screening and joint modeling, can be more coherent.

Supplementary Material


Acknowledgements

We thank the editor and reviewers for their careful review and insightful comments, which have led to a significant improvement of the article. This study has been partly supported by NSFC grants 12001101 and 20YQ18, and NIH CA204120, CA121974, and CA196530.


References

  • [1].Moore J, and Williams S. (2009). Epistasis and Its Implications for Personal Genetics. American Journal of Human Genetics, 85(3): 309–320. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [2].Khan A, Dinh DM, Schneider D, Lenski R, and Cooper T. (2011). Negative epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), 1193–1196. [DOI] [PubMed] [Google Scholar]
  • [3].Yuan M, Joseph VR and Zou H. (2009). Structured variable selection and estimation. Annals of Applied Statistics, 3, 1738–1757. [Google Scholar]
  • [4].Choi N, Li W, and Zhu J. (2010). Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105, 354–364. [Google Scholar]
  • [5].Bien J, Taylor J, and Tibshirnani R. (2013). A LASSO for hierarchical interactions. The Annals of Statistics, 41, 1111–1141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Fan Y, Kong Y, Li D, and Zheng Z. (2015). Innovated interaction screening for high-dimensional nonlinear classification. The Annals of Statistics, 43(3), 1243–1272. [Google Scholar]
  • [7].Yan J, Risacher S, Shen L, and Andrew S. (2018). Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Briefings in Bioinformatics, 19(6), 1370–1381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].Hao N, and Zhang H. (2017). A Note on High-Dimensional Linear Regression With Interactions. The American Statistician, 71(4), 291–297 [Google Scholar]
  • [9].Fan J, Feng Y, and Song R. (2011). Nonparametric Independence Screening in Sparse Ultra-high Dimensional Additive Models. Journal of the American Statistical Association, 106, 544–557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Liu J, Li R, and Wu R. (2014). Feature Selection for Varying Coefficient Models with Ultrahigh Dimensional Covariates. Journal of the American Statistical Association, 109, 266–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].He X, Wang L, and Hong H. (2013). Quantile-Adaptive Model-Free Variable Screening for High-Dimensional Heterogeneous Data. The Annals of Statistics, 41, 342–369. [Google Scholar]
  • [12].Li G, Peng H, Zhang J, and Zhu L. (2012). Robust Rank Correlation Based Screening. The Annals of Statistics, 40, 1846–1877. [Google Scholar]
  • [13].Li R, Zhong W, and Zhu L. (2012). Feature Screening Via Distance Correlation Learning. Journal of American Statistical Association, 107, 1129–1139. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [14].Huang D, Li R, and Wang H. (2014). Feature screening for ultrahigh dimensional categorical data with applications. Journal of Business & Economic Statistics, 32(2), 237–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Mai Q, and Zou H. (2015). The fused Kolmogorov filter: a nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. [Google Scholar]
  • [16].Cui H, Li R, and Zhong W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Huang J, Horowitz J, and Ma S. (2008). Asymptotic Properties of Bridge Estimators in Sparse High-Dimensional Regression Models. The Annals of Statistics, 36, 587–613. [Google Scholar]
  • [18].Ni L, and Fang F. (2016). Entropy-based Model-free Feature Screening for Ultrahigh-dimensional Multiclass Classification. Journal of Nonparametric Statistics, 28(3), 515–530. [Google Scholar]
  • [19].Hall P, and Xue J. (2014). On selecting interacting features from high-dimensional data. Computational Statistics & Data Analysis, 71, 694–708. [Google Scholar]
  • [20].Hao N, and Zhang H. (2014). Interaction Screening for Ultrahigh-Dimensional Data. Journal of the American Statistical Association, 109(507), 1285–1301. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [21].Li Y, and Liu J. (2019). Robust variable and interaction selection for logistic regression and general index models. Journal of the American Statistical Association, 114(525), 271–286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [22].Dong C, Chu X, Wang Y, et al. (2008). Exploration of gene–gene interaction effects using entropy-based methods. European Journal of Human Genetics, 16, 229–235. [DOI] [PubMed] [Google Scholar]
  • [23].Wu X, Jin L, and Xiong M. (2009). Mutual Information for Testing Gene-Environment Interaction. PLoS One, 4(2), e4578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [24].Fan R, Zhong M, Wang S, Zhang Y, Andrew A, Karagas M, Chen H, Amos C, Xiong M, and Moore J. (2011). Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genetic Epidemiology, 35(7), 706–721. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Jiang R, Tang W, Wu X, and Fu W. (2009). A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics, 10(Suppl 1), S65. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [26].O’Hagan S, Wright Muelas M, Day P, Lundberg E, and Kell D. (2018). GeneGini: Assessment via the Gini Coefficient of Reference “Housekeeping” Genes and Diverse Human Transporter Expression Profiles. Cell Systems, 6(2), 230–244.e1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Zhao J, Zhou Y, Zhang X, and Chen L. (2016). Part mutual information for quantifying direct associations in networks. Proceedings of the National Academy of Sciences of the United States of America, 113(18), 5130–5135. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Xu Y, Wu M, Zhang Q, and Ma S. (2019). Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics, 111, 1115–1123. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Pan W. (2009). Asymptotic Tests of Association with Multiple SNPs in Linkage Disequilibrium. Genetic Epidemiology, 33(6), 497–507. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Pan W, and Shen X. (2011). Adaptive Tests for Association Analysis of Rare Variants. Genetic Epidemiology, 35(5), 381–388. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [31].Shi X, Liu J, Huang J, Zhou Y, Xie Y, and Ma S. (2014). A penalized robust method for identifying gene-environment interactions. Genetic Epidemiology, 38(3), 220–230. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [32].Wu C, Shi X, Cui Y, and Ma S. (2015). A penalized robust semiparametric approach for gene–environment interactions. Statistics in Medicine, 34(30), 4016–4030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [33].Shannon C. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), 379–423. [Google Scholar]
  • [34].Anzarmou Y, Mkhadri A, and Oualkacha K. (2022). The Kendall Interaction Filter for Variable Interaction Screening in Ultra High Dimensional Classification Problems. Journal of Applied Statistics, published online. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [35].Song R, Lu W, Ma S, and Jeng X. (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101(4), 799–814. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [36].Wang J, and Chen Y. (2020). Interaction screening by Kendall’s partial correlation for ultrahigh-dimensional data with survival trait. Bioinformatics, 36(9), 2763–2769. [DOI] [PubMed] [Google Scholar]
  • [37].Escoufier Y. (1973). Le traitement des variables vectorielles. Biometrics, 29, 751–760. [Google Scholar]
  • [38].Wu M, Huang J, and Ma S. (2018). Identifying gene-gene interactions using penalized tensor regression. Statistics in Medicine, 37(4), 598–610. [DOI] [PMC free article] [PubMed] [Google Scholar]
