Abstract
For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most of the existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by making stringent model assumptions, lacking robustness, and/or requiring predictors to be continuous (and hence lacking flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors are allowed to be continuous and discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, correlation structure among predictors, and sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.
Keywords: Coefficient of variation, Conditional entropy, Interaction analysis, Marginal Screening
1. Introduction
For many practical high-dimensional analysis problems, interactions have been increasingly confirmed as playing critical roles beyond main effects [1, 2]. The most representative example is perhaps gene-gene interaction. For a wide array of diseases including cancer and cardiovascular diseases, gene-gene interactions with significant implications for disease risk, progression, survival, and other endpoints have been identified. “Genes” analyzed in published studies include SNPs, gene expressions, methylation and other epigenetic changes, microRNAs, and others. Interaction analysis has also been extensively conducted beyond biomedicine.
Most of the existing interaction analyses can be classified as marginal and joint. In marginal analysis, a small number of variables are analyzed at a time. As such, a large number of analyses are needed, leading to a multiple comparison adjustment problem. In contrast, in joint analysis, a large number of variables are collectively analyzed in a single model, leading to a regularized estimation and variable selection problem [3, 4, 5, 6]. The two analysis paradigms serve different purposes, with joint analysis possibly better reflecting, for example, the biology of complex diseases [7]. In this article, we focus on joint analysis. In the literature, many joint interaction analysis methods have been developed, and we refer to [8] and others for review. Despite successful methodological and theoretical developments, in practice, joint interaction analysis is usually seriously challenged by extremely high dimensionality, which can lead to an intolerably high computational cost, lack of stability, and inferior estimation and selection.
For high-dimensional data analysis in general (without and with interactions), marginal screening has been established as highly effective for reducing computational cost and improving stability, estimation, and selection performance. The essence of marginal screening lies in linking effects that are important in a joint model with those that are important in marginal models. Most of the existing screening methods are limited to main effects only. They can be roughly classified as model-based and model-free (robust). In model-based marginal screening, specific parametric or semiparametric models are assumed [9, 10]. In such analysis, when models are correctly specified, consistency properties can be established. However, in high-dimensional data analysis, model misspecification is not uncommon, which can lead to the failure of model-based screening. To tackle this problem, model-free screening methods have been developed, built on the quantile [11], rank correlation [12], distance correlation [13], and other techniques [14, 15, 16, 18]. It is noted that some of the existing techniques have relatively narrow applications. For example, the mean-variance-based sure independence screening [16] is limited to classification problems. The distance-correlation-based sure independence screening [13] requires predictors to have continuous distributions. The existing methods sometimes take quite different forms for different types of response variables, lacking uniformity – this is especially true for model-based methods.
Compared to the analysis of main effects, marginal screening for interaction analysis has been less developed. It may seem that the methods for main-effects screening can be directly applied. However, the validity of such methods often relies on certain no/weak correlation assumptions, which easily break down in interaction analysis. One solution is offered by the unique variable selection hierarchy of interaction analysis. In particular, it has been argued both statistically and biologically that, if an interaction term is important, then one (under the weak hierarchy) or both (under the strong hierarchy) of the corresponding main effects should also be important [20]. With this hierarchy, progressive screening methods have been developed, which first conduct marginal screening with main effects, and then screen for important interactions corresponding to the selected main effects [19, 20, 21]. It is noted that the existing progressive methods are mostly model-based. In addition, they also demand certain correlation conditions to ensure that important main effects can be identified in the first place. To identify gene-gene and gene-environment interactions, entropy-based methods have been proposed. Examples include an index based on information gain to quantify the interaction effect between two categorical predictors and a response [22] and the utilization of mutual information and mutual information gain to quantify gene-environment [23] and gene-gene [24] interactions. There are also approaches that take GINI purity gain into account to detect gene-gene interactions [25, 26]. However, these entropy-based methods generally overestimate dependence [27] and focus on binary responses, which limits their practical applications.
In this article, our goal is to develop a new marginal screening approach tailored to interaction analysis. The significance of interaction analysis and marginal screening for such analysis has been well established and will not be reiterated. This study may complement and advance the existing literature in multiple ways. First, the proposed analysis accommodates interactions and can complement those limited to main effects only – it is also noted that it is directly applicable to analysis that involves main effects only. Second, the proposed approach provides a unified solution to feature screening. It can comprehensively cover categorical, continuous, and censored survival outcomes. This methodological uniformity is much desired and not shared by many of the existing methods. Third, the proposed nonparametric coefficient of variation (CV) filters, built on Shannon’s entropy theory and the coefficient of variation statistic, are model-free and have the much-desired robustness property. Their implementation does not require full specification of the distribution of the response and covariates. The main consideration for adopting CV is that the standard deviation (SD) of entropy information usually changes as the mean changes, and dividing by the mean can remove this impact on variation. Such a formulation can be even more useful when different predictors have different numbers of categories, as predictors with more categories are likely to be associated with larger information gain regardless of whether interactions are important or not. In this case, the mean can serve as an adjustment factor to deal with this problem. The proposed CV filter is a standardization of the SD that allows the comparison of variability regardless of the magnitudes of the original features. With the assistance of a two-stage procedure, the proposed approach is also superior by being computationally scalable and fast.
In addition, it is shown to have satisfactory performance when signals are weak and/or variables are dependent and heavy-tailed – such a property is not shared by most of the existing methods. Here it is noted that the merit of robustness for joint interaction analysis has been well established, for which we refer to the quantile [28], exponential loss [29, 30], rank-based [31, 32], and many other works. Fourth, the proposed approach can flexibly accommodate discrete, categorical, and continuous predictors, overcoming the stringent demand for continuous distributions made by some methods. Lastly, statistical properties are rigorously established, providing the proposed approach with a strong statistical grounding and also shedding insights into the coefficient of variation and entropy theory under high-dimensional settings. Overall, this study can provide a statistically well-grounded and numerically well-performing approach for alleviating the computational burden and improving performance in high-dimensional interaction analysis.
2. Methods
The proposed CV filters are based on Shannon’s entropy theory [33], which has demonstrated great power in other fields but has not been well employed in interaction analysis. In Section 2.1, we first develop a new interaction filter based on conditional entropy for data with a categorical response and predictors. Based on this development, a new screening strategy is developed in Section 2.2, and its statistical properties are established in Section 2.3. In Section 2.4, we consider data with continuous and censored survival responses.
2.1. A new interaction filter based on conditional entropy
Let Y be a categorical response with R categories {1, …, R}, and X = (X1, …, Xp) be a p-dimensional vector of predictors, where Xk is categorical with Jk categories for k = 1, …, p. In this study, we focus on second-order interactions. Higher-order interactions have been very limitedly investigated in high-dimensional settings. For two predictors Xk and Xl, they do not have an interaction effect if and only if they are conditionally independent given Y.
Entropy is a key information measure for the uncertainty of random variables. Consider a categorical random variable X and its probability mass function p(x) = P(X = x). Set 0 × log 0 = 0. The entropy of X is defined as:
H(X) = −Σx p(x) log p(x).   (2.1)
It is minimal (=0), if X has probability one for a specific category. On the other hand, it is maximal, if X has the same probability for all categories. For an equiprobable variable, H(X) increases with the number of categories.
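These properties can be checked directly. Below is a minimal Python sketch (the helper name `entropy` is our own, not from the paper):

```python
import math

def entropy(probs):
    """Shannon entropy of a categorical pmf, using the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Minimal (= 0): all probability mass on a single category.
print(entropy([1.0, 0.0, 0.0]))   # 0.0

# Maximal for a given number of categories: the equiprobable pmf,
# and the maximum grows with the number of categories.
print(entropy([1/3, 1/3, 1/3]))   # log 3 ≈ 1.0986
print(entropy([1/4] * 4))         # log 4 ≈ 1.3863
```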
Consider two variables Xk, Xl. We propose using conditional entropy to quantify the dependence between their interaction and Y. Denote by H(Y | Xk = i, Xl = j) the conditional entropy of Y given Xk = i and Xl = j, for i = 1, …, Jk and j = 1, …, Jl. Let wij = P(Xk = i, Xl = j) be the weight characterizing the probability of (Xk, Xl) falling into class (i, j). Consider the quantity:
Θkl(i, j) = wij H(Y | Xk = i, Xl = j).   (2.2)
It measures the weighted amount of information remaining in response Y given Xk = i and Xl = j. Consequently, the conditional entropy H(Y | Xk, Xl) = Σij Θkl(i, j) is the sum of Θkl(i, j) over all possible values that Xk and Xl can take. When (Xk, Xl) has a strong relationship with Y, the uncertainty of Y is expected to significantly decrease after removing the effects of (Xk, Xl). Specifically, if Y is completely determined by Xk and Xl (such that Y = f(Xk, Xl), where f is a deterministic function), H(Y | Xk, Xl) should be 0. By simple calculations, H(Y | Xk, Xl) can be reformulated as:
H(Y | Xk, Xl) = H(Y) − I(Xk; Y) − I(Xl; Y) − I(Xk; Xl | Y) + I(Xk; Xl),   (2.3)

where I(·; ·) and I(·; · | ·) denote mutual information and conditional mutual information, respectively.
We note from (2.3) that H(Y | Xk, Xl) contains multiple sources of information, including the intrinsic information H(Y), main-effect information I(Xk; Y) and I(Xl; Y), and interaction-related information I(Xk; Xl | Y) and I(Xk; Xl). When there is no interaction, Xk and Xl are conditionally independent given Y, and I(Xk; Xl | Y) = 0. And so, H(Y | Xk, Xl) = H(Y) − I(Xk; Y) − I(Xl; Y) + I(Xk; Xl). When Xk and Xl are independent, I(Xk; Xl) = 0. These results motivate us to use conditional entropy to develop an efficient interaction screening filter.
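The decomposition of the conditional entropy just discussed is the standard information-theoretic identity H(Y | Xk, Xl) = H(Y) − I(Xk; Y) − I(Xl; Y) − I(Xk; Xl | Y) + I(Xk; Xl), and it can be verified numerically on an arbitrary joint pmf. The sketch below (our own illustration, not from the paper) does so for three binary variables:

```python
import itertools, math, random

random.seed(0)

# A random joint pmf over (Y, Xk, Xl), each binary; coordinate 0 is Y.
support = list(itertools.product(range(2), repeat=3))
raw = [random.random() for _ in support]
p = {v: r / sum(raw) for v, r in zip(support, raw)}

def marg(coords):
    """Marginal pmf over the listed coordinates."""
    out = {}
    for v, pv in p.items():
        key = tuple(v[c] for c in coords)
        out[key] = out.get(key, 0.0) + pv
    return out

def H(pmf):
    return -sum(q * math.log(q) for q in pmf.values() if q > 0)

def I(a, b):
    """Mutual information I(A; B) = H(A) + H(B) - H(A, B)."""
    return H(marg([a])) + H(marg([b])) - H(marg([a, b]))

# Left-hand side: H(Y | Xk, Xl) = H(Y, Xk, Xl) - H(Xk, Xl).
lhs = H(marg([0, 1, 2])) - H(marg([1, 2]))

# I(Xk; Xl | Y) = H(Xk, Y) + H(Xl, Y) - H(Y) - H(Y, Xk, Xl).
i_cond = H(marg([0, 1])) + H(marg([0, 2])) - H(marg([0])) - H(marg([0, 1, 2]))

rhs = H(marg([0])) - I(0, 1) - I(0, 2) - i_cond + I(1, 2)
print(abs(lhs - rhs) < 1e-12)   # True
```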
Define Θkl = {Θkl(i, j) : i = 1, …, Jk, j = 1, …, Jl} as the set of weighted conditional entropies of Y given all the possible values of (Xk, Xl). Intuitively, if there is an interaction, the effect of Xk on Y will not be the same for all levels of Xl. This can be fully reflected in the variation of Θkl. Specifically, if all the components of Θkl are equal, the interaction between Xk and Xl should have no effect on Y. On the other hand, if there is a strong interaction, Θkl should have notable variability. As such, the standard deviation (SD) of Θkl can potentially be used to quantify interaction. When different predictors have different numbers of categories, the pair with more categories is likely to have smaller conditional entropy, regardless of the absence or presence of interaction. In addition, the SD of Θkl generally depends on the mean and is not dimensionless. With these considerations, we propose using the coefficient of variation (CV) to quantify interaction. Specifically,
CVkl = σkl / μkl,   (2.4)
where σkl and μkl are the SD and mean of Θkl, respectively. This CV filter is a standardization of the SD. Consequently, it allows for the comparison of variability without being affected by the magnitudes of the original variables. It is easy to see that CVkl ≥ 0. Its properties are further established in the following proposition.
Proposition 1.
Let Y be a categorical random variable with R categories {1, …, R}, and P(Y = r) > 0 for all r = 1, …, R. Let Xk be a categorical variable with Jk categories for k = 1, …, p, and P(Xk = i, Xl = j) > 0 for all i = 1, …, Jk and j = 1, …, Jl. Then, (1) Θkl(i, j) > 0 for i = 1, …, Jk and j = 1, …, Jl, and 0 < μkl ≤ H(Y). (2) CVkl ≥ 0 for all k, l. (3) CVkl is bounded above by a universal constant (≈ 0.35) if (Xk, Xl) and Y are independent. (4) CVkl = 0 if and only if (Xk, Xl) and Y are independent and P(Xk = i, Xl = j) = 1/(JkJl) for all i and j.
Proof is provided in the Supplementary Materials. Result (1) implies that each component of Θkl is positive, and so is the mean value μkl. By Jensen’s inequality, μkl is bounded by the entropy of Y. As R increases, the upper bound also increases. In addition, if (Xk, Xl) and Y are independent, CVkl cannot be too large and is bounded by ∼ 0.35. If, in addition, (Xk, Xl) falls into each category with an equal probability, CVkl achieves its minimum of 0. By definition, larger CV values indicate stronger interaction effects. As such, CVkl can be utilized as a marginal utility for interaction screening.
Although categorical distributions have been assumed, the CV interaction filter can be generalized to continuous and mixture distributions. Specifically, if predictor Xk has a continuous distribution, we can employ slicing and partition Xk into Jk slices, and then the CV interaction filter can be applied. In our numerical studies, we adopt uniform slicing and note that data-dependent, possibly more effective slicing techniques have been developed in the literature.
Let q(j) be the j/Jk-th percentile of Xk for j = 1, …, Jk − 1, with q(0) = −∞ and q(Jk) = ∞. The CV interaction filter is then defined by replacing the events {Xk = i} and {Xl = j} in equation (2.2) with the slice-membership events {q(i − 1) < Xk ≤ q(i)} and the analogous events for Xl, respectively. A similar idea can be applied to mixture distributions. When Xk is continuous and Xl is categorical, the CV filter can be constructed by replacing only the events {Xk = i} in equation (2.2) with the corresponding slice-membership events.
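Uniform slicing of a continuous predictor can be sketched as follows (our own illustration; the helper name `uniform_slice` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def uniform_slice(x, J):
    """Partition a continuous predictor into J equal-frequency slices,
    cutting at its empirical j/J-th percentiles, j = 1, ..., J - 1."""
    cuts = np.quantile(x, [j / J for j in range(1, J)])
    return np.searchsorted(cuts, x, side="right")   # slice labels 0, ..., J-1

# Heavy tails are no obstacle: only the ranks matter for the slice labels.
x = rng.standard_cauchy(1000)
labels = uniform_slice(x, 3)
print(np.bincount(labels))   # roughly a third of the observations per slice
```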
With a sample of n iid observations (Yi, Xi), i = 1, …, n, CVkl can be estimated by plugging in the sample mean and standard deviation of the estimated conditional entropies, with all probabilities replaced by sample proportions. That is,
ĈVkl = σ̂kl / μ̂kl,   (2.5)
where σ̂kl and μ̂kl are the sample standard deviation and mean of the estimated conditional entropy set Θ̂kl, in which each probability in (2.2) is replaced by the corresponding sample proportion (computed with the indicator function I(·)).
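The plug-in recipe above can be sketched in a few lines. This is our own illustration (the function name and the toy data-generating model are not from the paper); in the toy data, the pair whose interaction drives Y receives a visibly larger value than a null pair:

```python
import numpy as np

rng = np.random.default_rng(2)

def cv_hat(y, xk, xl):
    """Plug-in estimate of CV_kl: coefficient of variation of the estimated
    weighted conditional entropies w_ij * H(Y | Xk = i, Xl = j)."""
    thetas = []
    for i in np.unique(xk):
        for j in np.unique(xl):
            cell = (xk == i) & (xl == j)
            if not cell.any():
                continue
            w = cell.mean()                           # sample proportion for w_ij
            _, counts = np.unique(y[cell], return_counts=True)
            q = counts / counts.sum()                 # estimate of P(Y = r | cell)
            thetas.append(w * -(q * np.log(q)).sum())
    thetas = np.asarray(thetas)
    return thetas.std() / thetas.mean()

n = 2000
x1, x2, x3 = (rng.integers(0, 2, n) for _ in range(3))
y = ((x1 & x2) | (rng.random(n) < 0.15)).astype(int)  # interaction between x1, x2

print(cv_hat(y, x1, x2) > cv_hat(y, x1, x3))          # True: (x1, x2) ranks higher
```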
Illustrative examples
In Section 2 of the Supplementary Materials, we present two illustrative examples, which can provide more insights into the working characteristics of the CV interaction filter and show that it performs well for both continuous and categorical predictors.
Remark 1.
Following the same principle, the CV filter approach can be applied to screen main effects. This can be achieved by replacing (Xk, Xl) with one of its marginals Xk or Xl in (2.4) and rewriting CVkl as CVk or CVl. Properties are similar to the above. In particular, CVk = 0 if and only if Xk and Y are independent and P(Xk = j) = 1/Jk for j = 1, …, Jk. We refer to the approach of applying the CV filter to main effects only as CVMS (which will be further considered below).
2.2. Screening Strategy
Define the index sets of predictors and their second-order terms as:
Define the active main effect and interaction index sets as:
The full model index set and the true model index set are defined accordingly. For a model M, we use |M| to denote its size. As described above and in the literature, interaction analysis faces the additional complexity of the variable selection hierarchy. Here we consider the weak hierarchy, under which if the interaction between Xk and Xl is identified, then at least one of Xk and Xl should also be identified. Extending to the strong hierarchy can be easily carried out.
We propose the following two-stage approach, which screens main effects and interactions in two consecutive steps. In particular,
Stage 1.
CVMS: Apply the CV main-effect filter to all p predictors, and identify the set of top-ranked main effects.
Stage 2.
CVIS: Apply the CV interaction filter to all pairs involving at least one main effect retained in Stage 1 (respecting the weak hierarchy), and identify the set of top-ranked interactions.
The working active set for downstream analysis is then the union of the main effects and interactions retained in the two stages. Here, we note that some existing progressive methods demand multiple iterations to update the retained main-effect and interaction sets [20]. In contrast, the proposed approach is not iterative. In our theoretical investigations below, we examine the asymptotic requirements on the sizes of the two retained sets. In our numerical study, the retained main-effect set size is consistent with that in the literature [11, 13, 15, 16], and the retained interaction set size has been motivated by the squared dimensionality of interactions. Our numerical study below suggests satisfactory performance. On the other hand, we note that when the sample size is small in practical data analysis, “to be cautious”, larger values can be taken.
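The two-stage procedure can be sketched end to end as below. This is our own minimal implementation for binary predictors and a categorical response; the retained-set sizes d1 and d2 here are illustrative choices, not the paper's values:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

def _cv(thetas):
    thetas = np.asarray(thetas, float)
    return thetas.std() / thetas.mean()

def _cell_entropy(y, mask):
    """Weighted conditional entropy w * H(Y | cell), with sample proportions."""
    _, counts = np.unique(y[mask], return_counts=True)
    q = counts / counts.sum()
    return mask.mean() * -(q * np.log(q)).sum()

def cv_main(y, x):
    return _cv([_cell_entropy(y, x == i) for i in np.unique(x)])

def cv_pair(y, xk, xl):
    return _cv([_cell_entropy(y, (xk == i) & (xl == j))
                for i in np.unique(xk) for j in np.unique(xl)
                if ((xk == i) & (xl == j)).any()])

def two_stage_screen(y, X, d1, d2):
    p = X.shape[1]
    # Stage 1 (CVMS): keep the d1 top-ranked main effects.
    main_scores = [cv_main(y, X[:, k]) for k in range(p)]
    kept = set(np.argsort(main_scores)[-d1:])
    # Stage 2 (CVIS): score pairs obeying the weak hierarchy, keep the top d2.
    pairs = [kl for kl in combinations(range(p), 2) if kept & set(kl)]
    pair_scores = {kl: cv_pair(y, X[:, kl[0]], X[:, kl[1]]) for kl in pairs}
    top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:d2]
    return kept, top_pairs

n, p = 2000, 20
X = rng.integers(0, 2, (n, p))
y = ((X[:, 0] & X[:, 1]) | (rng.random(n) < 0.15)).astype(int)

kept, top_pairs = two_stage_screen(y, X, d1=5, d2=3)
print((0, 1) in top_pairs)   # True: the active interaction survives screening
```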
Remark 2.
Conceptually, with binary distributions, the CV filter may lose power, as the SD in the definition of CVk may not be sufficiently informative when computed from only two conditional entropy values. However, our numerical study below still suggests reasonable performance with binary distributions. When there are three or more categories, empirical study suggests highly satisfactory performance of the proposed approach. As a possible variation, one can directly apply the CV filter to the combination of main effects and interactions. Then the selected set, if needed, can be enriched to satisfy the variable selection hierarchy.
2.3. Statistical Properties
First consider the scenario with all predictors being categorical. Assume the following conditions:
(C1) There exist two positive constants c1 and c2, such that , , . There exist two positive constants c3 and c4, such that for , and .
(C2) There exist two positive constants c > 0 and , such that .
(C3) , , where s ≥ 0, t ≥ 0 and .
(C4) , where δ is a positive constant.
Condition (C1) guarantees that the proportion of each category of the response and pair of predictors cannot be too large or too small. Similar assumptions have been made in the literature [14, 16]. Condition (C2) is common in the marginal screening literature and requires that the minimum true signal does not vanish too quickly. Condition (C3) allows the number of categories for the response and predictors to diverge with a certain order, and the maximum number of categories for predictors is allowed to vary with sample size n. Condition (C4) is assumed to separate the active interaction set from noise. It ensures that CVkl of an active interaction is always larger than that of an inactive one at the population level. Compared to the partial orthogonality condition [17] (that marginal utilities are nonzero for all active effects and exactly zero for all inactive ones), Condition (C4) is weaker in that the effects are not required to be 0 for all inactive interactions to have the consistency property in ranking. In fact, an inactive interaction does not necessarily have CVkl = 0. The quantity CVkl is zero only if (Xk, Xl) is independent of Y and falls into each category with an equal probability. In comparison, the Pearson’s Chi-squared-based sure independence screening [14] requires the effects of all inactive covariates to be zero to enjoy the strong sure screening property.
Theorem 1.
(Sure screening for categorical predictors) Under conditions (C1)–(C3),
where m is a positive constant. Therefore, if and , CVIS has the sure screening property.
Proof is provided in the Supplementary Materials. This result ensures that the estimated set of interactions contains the truly important ones with probability approaching one. It is noted that, as conditional entropy has the much-desired robustness property, CVIS is robust to heavy-tailed distributions of predictors and presence of outliers – a property not shared by most of the existing methods. Further, the sure screening property holds when predictors and/or response have a diverging number of categories.
Remark 3.
Under the same conditions, the filter also possesses the screening consistency property for main effects. In particular, it can be shown that, as , , where m′ is a positive constant. So if in Theorem 1 is satisfied, . Then the sure screening property holds for main effects.
To accommodate continuous distributions, additional assumptions are needed, and condition (C3) needs to be revised. Specifically,
(C5) If both Xk and Xl are continuous, then there exists a constant c5 such that fk,r(x) ≤ c5 for any r and x in the domain of Xk, where fk,r is the Lebesgue density function of Xk conditional on Y = r. There exists a constant c6 such that fk,r,l(x) ≤ c6 for any r, x in the domain of Xk, and xl in the domain of Xl, where fk,r,l is the Lebesgue density function of Xk conditional on Y = r and Xl = xl.
(C5’) If Xk is continuous and Xl is categorical, then there exists a constant c5′ such that fk,r,j(x) ≤ c5′ for any r, j, and x in the domain of Xk, where fk,r,j is the Lebesgue density function of Xk conditional on Y = r and Xl = j.
(C6) There exist positive constants c7 and such that for any and x in the domain of Xk, where is the Lebesgue density function of Xk. Further, is continuous in the domain of Xk.
(C3’) , , where , and .
Conditions (C5) and (C5’) exclude the extreme scenario where Xk places a heavy mass in a small range. Condition (C6) is mild and assumed for technical considerations. It requires a lower bound that is in the order of for the density of Xk. For data with both categorical and continuous predictors, we can establish the following results.
Theorem 2.
(Sure screening for both categorical and continuous predictors) Under Conditions (C1), (C2), (C3’), (C5), (C5’), and (C6),
where m is a positive constant. Therefore, if and , CVIS has the sure screening property.
Theorem 3.
(Ranking consistency) If Conditions (C1), (C4), (C5), and (C6) hold for and , then
This result establishes that for continuous, categorical, and mixture distributions, in a unified way, the proposed approach can properly rank and hence separate important and unimportant interaction terms. We note that Condition (C4) may be slightly stronger than some of its counterparts. However, the corresponding consistency result is also stronger. It justifies a clear gap between active and inactive interactions at the sample level. That is, the CVkl values of active interactions are always ranked above those of inactive ones with an overwhelming probability. Thus, with an appropriate cutoff, active and inactive interactions can be separated.
Remark 4.
(Computational complexity) By definition, the CV interaction filter allows the numbers of categories to differ across predictors. It can be derived that the computational complexity is . Further, by Condition (C3), it is , where . Therefore, it is less than .
2.4. Accommodating continuous and censored survival responses
With the assistance of slicing, the CV interaction filter can accommodate continuous responses. Specifically, we define a partition
where −∞ = a(0) < a(1) < ⋯ < a(G) = ∞. Each interval (a(g − 1), a(g)] is referred to as a slice. We then define a random variable Ỹ such that Ỹ = g if and only if Y is in the gth slice. The slicing counterpart of Θkl(i, j) is Θ̃kl(i, j) = wij H(Ỹ | Xk = i, Xl = j).
The slicing version of the conditional entropy set can be formulated as Θ̃kl = {Θ̃kl(i, j) : i = 1, …, Jk, j = 1, …, Jl}. As such, we have the CV interaction filter for a continuous response:
C̃Vkl = σ̃kl / μ̃kl,   (2.6)
where σ̃kl and μ̃kl are the standard deviation and mean of Θ̃kl, respectively. In our numerical analysis, we adopt uniform slicing, with the number of slices G chosen following [15].
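Slicing the response works in the same way as slicing a predictor. A minimal sketch follows (our own illustration; the choice G = 3 is illustrative, not the paper's proposal):

```python
import numpy as np

rng = np.random.default_rng(4)

def slice_response(y, G):
    """Discretize a continuous response into G equal-frequency slices."""
    cuts = np.quantile(y, [g / G for g in range(1, G)])
    return np.searchsorted(cuts, y, side="right")   # labels 0, ..., G-1

n = 3000
x1, x2 = rng.integers(0, 2, (2, n))
y = x1 * x2 + 0.3 * rng.standard_normal(n)   # continuous, interaction-driven response

y_tilde = slice_response(y, 3)
# y_tilde is categorical, so the filter of (2.5) applies to (y_tilde, x1, x2).
print(np.unique(y_tilde))   # [0 1 2]
```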
Now consider data with right-censored responses. Instead of (Yi, Xi), we observe (Y*i, δi, Xi), where Y*i = min(Yi, Ci) and δi = I(Yi ≤ Ci). Here, we assume that the censoring variable Ci is independent of Yi and Xi. Denote by Ŝ(t) the Kaplan-Meier estimator of the survival function S(t). To apply the CV interaction filter, we first apply uniform slicing and partition Y* into G slices. The inverse-probability-of-censoring CV filter for screening main effects (IPCW-CVMS) is based on the statistic:
where the component quantities are inverse-probability-of-censoring weighted analogues of those in (2.5). The rationale behind this is that:
With the same strategy, the inverse-probability-of-censoring CV filter for screening interactions (IPCW-CVIS) is based on the statistic:
| (2.7) |
where the sample standard deviation and mean are computed from the IPCW-weighted conditional entropy estimates, with the probabilities in (2.2) replaced by inverse-probability-of-censoring weighted sample proportions, and I(·) is the indicator function.
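The IPCW construction can be sketched as follows. This is our own reading: we take the survival function being estimated to be that of the censoring time, as is standard for inverse-probability-of-censoring weighting, and estimate it with a from-scratch Kaplan-Meier estimator; all names and implementation details are ours:

```python
import numpy as np

rng = np.random.default_rng(5)

def km_survival_at(times, events, eval_times):
    """Kaplan-Meier survival estimate evaluated at eval_times
    (right-continuous version; ties handled by sequential processing)."""
    order = np.argsort(times)
    t_sorted, d_sorted = times[order], events[order]
    at_risk = len(times) - np.arange(len(times))
    factors = np.where(d_sorted == 1, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)
    idx = np.searchsorted(t_sorted, eval_times, side="right") - 1
    return np.where(idx >= 0, surv[np.clip(idx, 0, None)], 1.0)

def ipcw_weights(y_obs, delta):
    """Weights delta_i / G_hat(Y*_i), with G_hat the KM estimator of the
    censoring survival function (the censoring indicator is 1 - delta)."""
    g = km_survival_at(y_obs, 1 - delta, y_obs)
    return delta / np.maximum(g, 1e-10)

# Simulated right-censored data: Y* = min(Y, C), delta = 1(Y <= C).
y_true = rng.exponential(1.0, 300)
c = rng.exponential(1.5, 300)
y_obs = np.minimum(y_true, c)
delta = (y_true <= c).astype(int)

w = ipcw_weights(y_obs, delta)
print(w[delta == 0].max())   # 0.0: censored observations get zero weight
```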
For continuous and censored responses, with the statistics defined above, screening can be conducted in the same manner as described in Section 2.2. In addition, as described in the previous subsections, the proposed screening can accommodate categorical, continuous, and mixture predictor distributions.
3. Simulation
To gauge performance of the proposed CVMS+CVIS, we compare with the following competitors: (a) PCS, which conducts the Pearson’s Chi-squared-based screening of main effects [14]; (b) DCS, which conducts the distance correlation-based screening of main effects [13]; (c) IGS, which conducts the information gain-based screening of main effects [18]; (d) CVMS+PCIS, which conducts the screening of main effects using the proposed CV filter and the screening of interactions using the Pearson’s Chi-squared-based technique; (e) CVMS+KIF, which is similar to approach (d), with the interaction screening based on the Kendall Interaction Filter [34]; and (f) PCS+PCIS, which conducts the screening of main effects and interactions using the Pearson’s Chi-squared-based technique [14]. For Examples 2 and 3, we also include IIS [6], which conducts the screening of interactions for nonlinear classification, for comparison. We acknowledge that there are many other screening methods. The above have been chosen because of their competitive performance. In particular, comparing with alternatives (a)-(c) can reveal the merit of the proposed CV filter in the main-effect screening step, and comparing with alternatives (d)-(f) can reveal merit in the interaction screening step. With 500 replicates, we compare performance using the following criteria: (a) MMS, the minimum model size required to include all of the true active predictors, for which the 5%, 25%, 50%, 75%, and 95% quantiles are reported; (b) the probability that all active main effects are ranked in the top positions retained in Stage 1, and the probability that all active interactions are ranked in the top positions retained in Stage 2; (c) CZ, the percentage of correctly identified inactive predictors (among all identified inactive predictors); and (d) IZ, the percentage of mistakenly identified active predictors (among all identified active predictors). With the following examples, we consider n = 200, 500 and p = 1000, 5000.
Here we note that, although p may seem moderate, the dimensionality of interaction analysis is actually extremely high, and screening is warranted.
Example 1.
(Index model) Denote with . Cauchy(0, Ip) is the p-dimensional standard Cauchy distribution. Consider the index model:
For X and ε, we consider the following three cases:
Case (1a): , and ρ = 0.5;
Case (1b): , , and ρ = 0.5;
Case (1c): the same as Case (1a) except that ρ = 0.8.
The active main effect set and interaction set are and , respectively. When slicing, we partition each predictor into three categories and the response into two and three categories (R = 2 and 3). Results are summarized in Table 1. We can see that all approaches tend to be more accurate when the number of slices for the response increases. The proposed approach performs the best with the highest selection probability and smallest model size. In the main effect screening, performance of all the methods is insensitive to the number of slices, when the dependence structure of the predictors is complicated. DCS performs worse with the heavy-tailed predictors. IGS and PCS cannot maintain reasonable model sizes at the 75% and 95% quantiles. In comparison, CVMS performs well in all settings. In the interaction screening, CVIS outperforms the alternatives by a large margin. And its performance is almost insensitive to the dependence structure of covariates and extreme values. The alternatives fail with too many false discoveries, especially when the sample size is small.
Table 1:
Simulation Example 1: means of performance measures based on 500 replicates. A cell is left empty if the corresponding method is not applied.
| | | Main-effect selection = 3.0 | | | | | | | | Interaction selection = 2.0 | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (n,p) | Method | 5% | 25% | 50% | 75% | 95% | P | CZ | IZ | 5% | 25% | 50% | 75% | 95% | P | CZ | IZ |
| Case (1a): uniform slicing, R = 2 | |||||||||||||||||
| (200,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 6.0 | 0.990 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 6.0 | 0.980 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 4.0 | 8.0 | 11.0 | 24.0 | 813 | 0.780 | 0.996 | 0.150 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 7.0 | 0.980 | 0.997 | 0.000 | 6.0 | 10.0 | 17.0 | 109 | 1317 | 0.710 | 0.996 | 0.200 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 16.0 | 64.0 | 708 | 4137 | 0.400 | 0.996 | 0.375 | |
| (500,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 4.0 | 4.0 | 8.0 | 13.0 | 18.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 9.0 | 10.0 | 14.0 | 15.0 | 18.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 3.03 | 5.0 | 17.0 | 483 | 0.920 | 0.996 | 0.050 | |
| Case (1a): uniform slicing, R = 3 | |||||||||||||||||
| (200,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 2.0 | 8.0 | 15.0 | 23.0 | 4382 | 0.815 | 0.996 | 0.050 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 8.0 | 15.0 | 96.0 | 1242 | 8027 | 0.365 | 0.996 | 0.351 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 5.0 | 47.0 | 403 | 3024 | 0.420 | 0.996 | 0.250 | |
| (500,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 2.0 | 3.0 | 4.0 | 8.0 | 15.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 5.0 | 8.0 | 15.0 | 17.0 | 33.0 | 0.960 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 3.0 | 5.0 | 5.0 | 11.0 | 1.000 | 0.996 | 0.000 | |
| Case (1b): uniform slicing, R = 2 | |||||||||||||||||
| (200,1000) | DCS | 3.0 | 3.0 | 3.0 | 5.0 | 23.0 | 1.000 | 0.996 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 4.0 | 20.0 | 1.000 | 0.996 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 5.0 | 16.0 | 1.000 | 0.996 | 0.000 | 5.0 | 9.0 | 13.0 | 16.0 | 4480 | 0.820 | 0.996 | 0.050 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 7.0 | 85.0 | 0.950 | 0.996 | 0.002 | 5.0 | 8.0 | 9.0 | 14.0 | 9996 | 0.800 | 0.996 | 0.075 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 5.0 | 16.0 | 1.000 | 0.996 | 0.000 | 49.0 | 183 | 482 | 1421 | 9996 | 0.170 | 0.996 | 0.400 | |
| (500,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 6.0 | 9.0 | 13.0 | 17.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 4.0 | 8.0 | 10.0 | 13.0 | 17.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 7.0 | 7.0 | 7.0 | 9.0 | 13.0 | 1.000 | 0.996 | 0.000 | |
| Case (1b): uniform slicing, R = 3 | |||||||||||||||||
| (200,1000) | DCS | 2.0 | 3.0 | 3.0 | 4.0 | 35.0 | 0.960 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 4.0 | 5.0 | 15.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 6.0 | 17.0 | 1.000 | 0.997 | 0.000 | 3.0 | 6.0 | 11.0 | 16.0 | 4519 | 0.860 | 0.996 | 0.075 |
| PCS+PCIS | 3.0 | 3.0 | 4.0 | 7.0 | 23.0 | 1.000 | 0.997 | 0.000 | 4.0 | 9.0 | 13.0 | 2512 | 8993 | 0.750 | 0.996 | 0.125 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 6.0 | 17.0 | 1.000 | 0.997 | 0.000 | 4.0 | 6.0 | 9.0 | 89.0 | 9996 | 0.700 | 0.996 | 0.200 | |
| (500,1000) | DCS | 2.0 | 2.0 | 3.0 | 3.0 | 46.0 | 0.950 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 4.0 | 5.0 | 6.0 | 10.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 5.0 | 6.0 | 8.0 | 11.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 5.0 | 5.0 | 7.0 | 9.0 | 12.0 | 1.000 | 0.996 | 0.000 | |
| Case (1c): uniform slicing, R = 2 | |||||||||||||||||
| (200,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 10.0 | 15.0 | 19.0 | 25.0 | 36.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 13.0 | 18.0 | 21.0 | 27.0 | 34.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 79.0 | 459 | 818 | 2030 | 7580 | 0.080 | 0.996 | 0.500 | |
| (500,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 2.0 | 4.0 | 5.0 | 8.0 | 14.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 16.0 | 18.0 | 18.0 | 19.0 | 26.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 44.0 | 101 | 253 | 356 | 555 | 0.350 | 0.996 | 0.475 |
| Case (1c): uniform slicing, R = 3 | |||||||||||||||||
| (200,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 6.0 | 12.0 | 15.0 | 22.0 | 37.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 1.000 | 0.997 | 0.000 | 10.0 | 14.0 | 17.0 | 21.0 | 22.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 89.0 | 128 | 1242 | 1711 | 2072 | 0.350 | 0.996 | 0.325 | |
| (500,1000) | DCS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | ||||||||
| IGS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | |||||||||
| CVMS+CVIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.997 | 0.000 | 3.0 | 6.0 | 13.0 | 20.0 | 34.0 | 1.000 | 0.996 | 0.000 | |
| PCS+PCIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 1.000 | 0.997 | 0.000 | 10.0 | 16.0 | 17.0 | 19.0 | 30.0 | 1.000 | 0.996 | 0.000 | |
| CVMS+KIF | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.000 | 0.996 | 0.000 | 13.0 | 18.0 | 54.0 | 1088 | 1945 | 1.000 | 0.996 | 0.000 | |
Example 2.
Consider data with a binary response and categorical predictors. For the ith observation, Yi is generated from two settings: (a) , , and (b) and . Conditional on Yi, the predictors are generated under the following cases.
Case (2a): (binary) for j = 1, 3 and k = 1, 2.
for , where , and for the other k and j.
Case (2b): (3-level categorical) for , , and .
for and . For , for . Detailed values for are presented in Table S2 of the Supplementary Materials.
Case (2c): (continuous) , for .
for and . Other Xj’s for follow a standard normal distribution.
For Case (2a) and Case (2c), , , and for Case (2b), and .
For Case (2b), we find that PCS has highly unsatisfactory performance with main effects. As a “remedy”, we consider CVMS+PCIS, which adopts the proposed CV filter for main-effect screening, as opposed to PCS+PCIS. Results are summarized in Tables S3–S6 in the Supplementary Materials. The overall findings are similar to those of Example 1. Specifically, as the number of categories of the predictors increases, the performance of all methods deteriorates slightly. Better performance is observed when the sample size increases. For main-effect screening, all methods perform well with binary predictors. With 3-level predictors, DCS, PCS, and IGS fail: they may miss important main effects even when the sample size is large. In comparison, CVMS consistently has higher coverage rates and smaller MMS values. For interaction screening, KIF and IIS break down, while PCIS and CVIS perform reasonably well. This is expected, since IIS requires the predictors to be sub-Gaussian to enjoy the sure screening property. CVIS performs slightly better in almost all settings. When the sample size is small, PCIS tends to have larger MMS and lower coverage rates as the number of predictor categories increases. As expected, the difference is more obvious with imbalanced data. For Case (2c), where the predictors are continuous, we further include two other main-effect screening methods for comparison, namely the mean-variance based sure independence screening (MVS, [16]) and the fused Kolmogorov filter (FKF, [15]). To apply CVMS, we dichotomize each continuous predictor at its median. Results are provided in Tables S5–S6 in the Supplementary Materials, where we again observe the superiority of the proposed approach. Compared to DCS, IGS, PCS, MVS, and FKF, CVMS is either the best or among the best. Its performance can be further improved by using three or more categories, as described in Remark 2.
In addition, CVIS has much smaller model sizes at high quantiles and higher probabilities of including all active interactions, especially under the imbalanced design and when the number of the predictors is large but the sample size is small.
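To make the dichotomization-plus-screening step above concrete, the following Python sketch implements median dichotomization together with a generic entropy-reduction screen. This is an illustration of entropy-based marginal screening in the spirit of CVMS, not the paper's exact CV filter (whose definition is not reproduced in this section); all function names are ours.

```python
import numpy as np

def dichotomize_at_median(X):
    """Split each continuous column at its median (used before applying CVMS)."""
    return (X > np.median(X, axis=0)).astype(int)

def conditional_entropy(y, x):
    """Empirical conditional entropy H(y | x) for discrete y and x (natural log)."""
    h = 0.0
    for v in np.unique(x):
        mask = x == v
        _, counts = np.unique(y[mask], return_counts=True)
        p = counts / counts.sum()
        h -= mask.mean() * (p * np.log(p)).sum()
    return h

def entropy_screen(X, y, d):
    """Rank discrete predictors by the entropy reduction H(y) - H(y | x_j),
    i.e., their mutual information with y, and keep the indices of the top d."""
    _, cy = np.unique(y, return_counts=True)
    py = cy / cy.sum()
    hy = -(py * np.log(py)).sum()
    scores = np.array([hy - conditional_entropy(y, X[:, j]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```

A predictor that determines the response attains the maximal score H(y), so it is ranked first; noise predictors score near zero.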
Example 3.
(Generalized linear model) We simulate from the logistic model:
We consider two different cases for .
Case (3a): (continuous predictors) , where has off-diagonal entries being 0. The first and third diagonal entries are 2 and 4, respectively, and the other diagonal entries are 1.
Case (3b): (a mixture of continuous and categorical predictors) For , Xj is independently generated from . And for , Xj follows a Bernoulli distribution with , and the others are zero. The active interaction term, , is a product of a binary predictor and a continuous one.
Under this example, and . This model has a relatively simple structure. Here, we do not adopt the two-stage strategy and instead resort directly to interaction screening. With CVIS and PCIS, all the continuous predictors are converted to binary via dichotomizing at the medians. Results are presented in Table S7 in the Supplementary Materials. The patterns of the findings are comparable to those above. The proposed CVIS is able to separate the true nonzero effects from the zeros with high accuracy. It outperforms IIS in terms of . It is comparable to KIF in terms of MMS and slightly outperforms PCIS. In terms of CZ, it is superior to the other two approaches, which have more false discoveries.
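As an illustration of direct interaction screening on dichotomized predictors, the sketch below scores every pair by the mutual information between the response and the product of the two median-dichotomized predictors, and returns the top-ranked pairs. The mutual-information score is a stand-in for the CVIS statistic, whose exact form is not reproduced here; names are illustrative.

```python
import numpy as np
from itertools import combinations

def mutual_information(y, x):
    """Empirical mutual information I(y; x) for discrete y and x (natural log)."""
    mi = 0.0
    for v in np.unique(x):
        for u in np.unique(y):
            p_xy = np.mean((x == v) & (y == u))
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (np.mean(x == v) * np.mean(y == u)))
    return mi

def screen_interactions(X, y, d):
    """Score every pair (j, k) by I(y; x_j * x_k) computed on median-dichotomized
    predictors, and return the d top-ranked pairs."""
    Xb = (X > np.median(X, axis=0)).astype(int)
    pairs = list(combinations(range(X.shape[1]), 2))
    scores = [mutual_information(y, Xb[:, j] * Xb[:, k]) for j, k in pairs]
    order = np.argsort(scores)[::-1][:d]
    return [pairs[i] for i in order]
```

For p predictors this loops over all p(p-1)/2 pairs; the two-stage strategy in the paper exists precisely to avoid this full enumeration when p is ultrahigh.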
Example 4.
(Transformation model for a censored response) Yi is generated from the transformation model:
where Xi is the p-dimensional vector of predictors, contains all two-way interactions, and . The predictors are generated from a multivariate normal distribution with marginal means 0 and covariance Σ with . The censoring variable Ci is generated from a uniform distribution on [0, 7], and the censoring rate is around 15%. We set and . Thus, and . For this example, we compare against IPCW-tau [35] for main-effect screening, and against PC-IPCW-tau [36], PCIS, and KIF for interaction screening. In addition, we also treat all main effects and interactions “equally” and apply IPCW-tau. When applying the IPCW-CV filters, we equally discretize the response and continuous predictors into three categories. The results are summarized in Table S8 in the Supplementary Materials. We observe similar superior performance of the proposed approach. It is also noted that the performance of the proposed approach does not seem to depend strongly on censoring.
4. Analysis of TCGA data
We analyze data on lung adenocarcinoma (LUAD). The dataset is obtained from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/). TCGA has published high-quality omics and clinical data on multiple cancer types. The TCGA LUAD data has been analyzed in multiple studies, and both main effects and interactions have been examined [38, 28]. We refer to the TCGA website and existing literature for information on study design and data collection. Multiple types of omics data are available. Here we analyze mRNA gene expressions, which have been considered in multiple interaction analyses. To demonstrate the broad applicability of the proposed approach, we consider both censored survival and categorical response variables. In the original dataset, the expression values of 19,559 genes are available. Although in principle the proposed approach can be directly used, to generate more reliable results with a limited sample size, we first conduct a moderate unsupervised screening and retain the 5,000 genes with the largest marginal variances. Thus, in the following analysis, there are 5,000 candidate main effects and 12,497,500 possible second-order interactions. Such a dimensionality is considerably higher than in most published studies. Some demographic/clinical variables are also available. We focus on gene expressions but note that the proposed approach can be potentially coupled with conditional screening to accommodate these additional variables.
4.1. Analysis of censored overall survival
We first consider overall survival, which is subject to right censoring. Among the 493 subjects, 178 have observable survival times, and the remaining 315 are censored. The observed survival times range from 0.13 to 165.37 months, with a median of 20.52 months. The censoring times range from 0.37 to 241.6 months, with a median of 22.32 months.
For the proposed approach, we first equally discretize the survival outcome into two categories and take a uniform slicing to partition each gene expression measurement into three slices. The proposed screening leads to 79 main effects and 158 interactions. Detailed results are shown in Table 2. A quick literature search suggests that many of the retained genes have sound biological implications. For instance, gene DDX59 has been identified to promote DNA replication in lung adenocarcinoma. SOD3 re-expression in tumor-associated endothelial cells increases doxorubicin delivery into, and the chemotherapeutic effect on, tumors. CSAG2 has been found to be necessary and sufficient to drive cell and tumor growth. Gene MAGEA4 is overexpressed and can serve as an immunotherapy target in various malignant tumors, including non-small cell lung cancer. TENM1 has been identified in vertebrates, coding for membrane proteins that are mainly involved in embryonic and neuronal development. CENATAC depletion or expression of disease mutants results in excessive retention of AT-AC minor introns in about 100 genes enriched for nucleocytoplasmic transport and cell cycle regulators, and causes chromosome segregation errors. We acknowledge that those in Table 2 are not the final identification results. However, the highly sensible candidates can still provide some support for the validity of the proposed approach.
Table 2:
Analysis of censored overall survival: 79 genes identified by CVMS and 158 interactions identified by CVMS+CVIS.
| CVMS-main effects | CVMS+CVIS-interactions | |||
|---|---|---|---|---|
| DDX59 | PERM1 | CSAG2-ELOVL4 | MAGEA4-CNTN1 | MNDA-CPVL |
| CPVL | NOTUM | CSAG2-CARD14 | MAGEA4-LGR5 | PAGE2-SLC40A1 |
| TNFRSF11B | RHBDL1 | CSAG2-ALDH1L2 | PAGE2-MAL | CD4-CPVL |
| MYOZ1 | RAB36 | CSAG2-TDO2 | MAGEA4-ACKR3 | GUCA2B-ACKR3 |
| TDO2 | ATP8B2 | VCX3A-CENATAC | GUCA2B-LGR5 | GUCA2B-TENM1 |
| GPD1L | FAM83A | CSAG2-ACKR3 | TLR4-CPVL | TAC1-ELOVL4 |
| CENATAC | HAVCR1 | CSAG2-MYOZ1 | PAGE2-CARD14 | GUCA2B-EHF |
| RNF213 | PLAC8 | CSAG2-TNFRSF11B | ATG16L2-CENATAC | HOXD13-CARD14 |
| ACKR3 | CPXM2 | CSAG2-CNTN1 | SEC31B-CENATAC | PAGE2-SOD3 |
| EHF | COL18A1 | CSAG2-CPVL | BEX2-BEX4 | HOXD13-CNTN1 |
| CNTN1 | SLC1A3 | VCX3A-ELOVL4 | SLCO2B1-CPVL | GUCA2B-MYOZ1 |
| MAL | GLRB | CSAG2-CCDC184 | MAGEA4-CARD14 | GUCA2B-DAAM2 |
| TENM1 | CELSR1 | VCX3A-TENM1 | PAGE2-CPVL | GOLGA8B-CENATAC |
| SOD3 | NPIPA5 | CSAG2-EHF | MAGEA4-MAL | GUCA2B-ALDH1L2 |
| CARD14 | ACSS3 | CSAG2-DAAM2 | GUCA2B-CARD14 | NPY-LGR5 |
| LGR5 | OGT | VCX3A-CCDC184 | GUCA2B-BEX4 | GUCA2B-CENATAC |
| ELOVL4 | C15orf48 | VCX3A-MYOZ1 | GUCA2B-MAL | P2RY13-CPVL |
| BEX4 | CABCOCO1 | HOXD13-ELOVL4 | MAGEA4-GPD1L | GUCA2B-ELOVL4 |
| CCDC184 | AHNAK | CSAG2-GPD1L | MAGEA4-TENM1 | GUCA2B-SLC40A1 |
| ALDH1L2 | RACGAP1 | CSAG2-BEX4 | PAGE2-BEX4 | GUCA2B-GPD1L |
| DAAM2 | CRISPLD2 | VCX3A-CNTN1 | PAGE2-ALDH1L2 | TLR8-CPVL |
| RGS20 | CDK14 | CSAG2-SOD3 | MAGEA4-TDO2 | MAGEB2-ALDH1L2 |
| ARHGEF26 | SDSL | CSAG2-CENATAC | MAGEA4-MYOZ1 | FAM193B-CENATAC |
| ABCA7 | LIMS2 | VCX3A-GPD1L | PAGE2-TDO2 | CD84-CPVL |
| SLC22A18 | ENPP5 | VCX3A-CARD14 | GUCA2B-TNFRSF11B | GUCA2B-TDO2 |
| AGAP9 | SLC16A8 | VCX3A-MAL | HOXD13-CCDC184 | PAGE2-DAAM2 |
| ZMYND12 | VCX3A-RNF213 | GUCA2B-CCDC184 | NPY-ELOVL4 | |
| TONSL | MAGEA4-ELOVL4 | MAGEA4-DAAM2 | AIF1-CPVL | |
| GUCY1A1 | VCX3A-CPVL | MAGEA4-BEX4 | MAGEB2-TDO2 | |
| MYO5C | VCX3A-BEX4 | MAGEA4-SOD3 | HOXD13-CPVL | |
| TNFSF4 | VCX3A-ALDH1L2 | MAGEA4-RNF213 | HOXD13-ACKR3 | |
| PRLR | MAGEA4-CPVL | PAGE2-TNFRSF11B | HOXD13-SOD3 | |
| BOK | VCX3A-ACKR3 | PAGE2-GPD1L | LMNTD2-CENATAC | |
| SLC9A5 | VCX3A-TNFRSF11B | HOXD13-LGR5 | HOXD13-TNFRSF11B | |
| GPR143 | VCX3A-TDO2 | TLR7-CPVL | GUCA2B-SOD3 | |
| USP27X | VCX3A-EHF | LENG8-CENATAC | CSF1R-CPVL | |
| RFTN1 | VCX3A-SOD3 | PAGE2-CCDC184 | NPY-CNTN1 | |
| TRIB3 | CCNL2-CENATAC | MAGEA4-EHF | NCKAP1L-CPVL | |
| WDHD1 | VCX3A-DAAM2 | HOXD13-TENM1 | HOXD13-BEX4 | |
| OSBPL6 | VCX3A-SLC40A1 | TAC1-BEX4 | MAGEB2-CARD14 | |
| C8B | PAGE2-CENATAC | PAGE2-RNF213 | NPY-BEX4 | |
| HOXB3 | PABPC1L-CENATAC | PAGE2-MYOZ1 | NPY-MYOZ1 | |
| CBLC | MAGEA4-TNFRSF11B | PAGE2-CNTN1 | CASP14-CARD14 | |
| GLB1L2 | DDX39B-CENATAC | GUCA2B-CNTN1 | CSAG3-ELOVL4 | |
| GPX3 | TTLL3-CENATAC | PAGE2-LGR5 | HOXD13-EHF | |
| WSB1 | MAGEA4-CCDC184 | PAGE2-EHF | HOXD13-SLC40A1 | |
| ARG2 | MS4A6A-CPVL | PAGE2-ACKR3 | TAC1-ACKR3 | |
| ELMO3 | MAGEA4-ALDH1L2 | GUCA2B-RNF213 | TAC1-TENM1 | |
| BST1 | NPY-CCDC184 | MAGEA4-SLC40A1 | NPY-ACKR3 | |
| BOP1 | HOXD13-ALDH1L2 | MAGEA4-CENATAC | IQGAP2-CPVL | |
| CDC42BPG | PAGE2-ELOVL4 | TAC1-CCDC184 | HOXD13-TDO2 | |
| SYT8 | MS4A7-CPVL | CSAD-CENATAC | NEUROD1-LGR5 | |
| PDPN | PAGE2-TENM1 | GUCA2B-CPVL | ||
Analysis is also conducted using the alternative approaches. A summary of the comparisons is provided in Table 3. The differences are quantified using the numbers of overlapping effects as well as RV coefficients [37]. The RV coefficient measures the similarity of two data matrices and ranges between 0 and 1, with a larger value indicating a higher overlap in information (contained in two sets of main effects or interactions). It is observed that the proposed approach identifies considerably different sets of main effects and interactions from the alternatives. However, the amount of overlapping information, as measured by the RV coefficient, is moderate to high, which is reasonable, as different genes can contain similar information.
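The RV coefficient has a standard closed form: for column-centered matrices X and Y on the same samples, RV = tr(S_xy S_yx) / sqrt(tr(S_xx^2) tr(S_yy^2)), where S denotes the cross-product matrices. A minimal sketch (the function name is ours):

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two data matrices sharing the same rows (samples).
    Ranges in [0, 1]; larger values indicate a higher overlap in information."""
    Xc = X - X.mean(axis=0)          # column-center both matrices
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc                  # cross-product matrix
    Sxx = Xc.T @ Xc
    Syy = Yc.T @ Yc
    num = np.trace(Sxy @ Sxy.T)
    den = np.sqrt(np.trace(Sxx @ Sxx) * np.trace(Syy @ Syy))
    return num / den
```

The coefficient is invariant to shifting and rescaling either matrix, so RV(X, 2X + 5) = 1, which is why it is used here to compare information content rather than raw values.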
Table 3:
Numbers of main effects and interactions identified by different approaches (diagonal elements) and their overlaps (off-diagonal). RV coefficients in “()”.
| Overall survival | Approach | CVMS+CVIS | PCS+PCIS | IGS+CVIS | CVMS+PCIS | IGS+PCIS |
|---|---|---|---|---|---|---|
| Main effects | CVMS+CVIS | 79 | 1(0.561) | 1(0.563) | ||
| PCS+PCIS | 79 | 78(0.999) | ||||
| IGS+CVIS | 79 | |||||
| Interaction | CVMS+CVIS | 158 | 0(0.714) | 0(0.775) | 49(0.939) | 0(0.714) |
| PCS+PCIS | 158 | 85(0.961) | 93(0.888) | 158(0.961) | ||
| IGS+CVIS | 158 | 72(0.990) | 109(0.961) | |||
| CVMS+PCIS | 158 | 93(0.888) | ||||
| IGS+PCIS | 158 | |||||
| Stage | ||||||
| Main effects | CVMS+CVIS | 76 | 14(0.339) | 14(0.350) | ||
| PCS+PCIS | 76 | 71(0.977) | ||||
| IGS+CVIS | 76 | |||||
| Interaction | CVMS+CVIS | 152 | 0(0.125) | 0(0.707) | 4(0.192) | 0(0.128) |
| PCS+PCIS | 152 | 1(0.200) | 14(0.939) | 92(0.996) | ||
| IGS+CVIS | 152 | 0(0.199) | 1(0.210) | |||
| CVMS+PCIS | 152 | 42(0.927) | ||||
| IGS+PCIS | 152 | |||||
As in some published studies [32, 38], we conduct downstream analysis to further examine the effect of screening. More specifically, (a) we randomly split the data into a training set of size 393 and a testing set of size 100; (b) with the training set, the proposed and alternative screenings are conducted; (c) with the obtained main effects and interactions, we apply a penalization method, which can identify the important main effects and interactions in a joint interaction analysis model and respects the variable-selection hierarchy. In this step of the analysis, we adopt the Cox model. This may have a “conflict” with the proposed model-free spirit. Extending the proposed approach to joint modeling is highly nontrivial and will not be pursued here; (d) the training set model is then used for prediction with the testing set samples. We adopt the C-statistic to evaluate prediction performance. The C-statistic has range [0, 1], with a larger value indicating better prediction; and (e) Steps (a)-(d) are repeated 200 times. The average C-statistics are 0.7810 (CVMS), 0.7605 (PCS), 0.7733 (IGS), 0.8325 (CVMS+CVIS), 0.8019 (PCS+PCIS), 0.8129 (CVMS+PCIS), 0.8032 (IGS+CVIS), and 0.8078 (IGS+PCIS), which provides “indirect” support for the superiority of the proposed approach. To comprehend this result more intuitively, in Figure S3 (Supplementary Materials), we present the Kaplan-Meier curves for one random split. The two groups are generated by dichotomizing the predicted risk scores at the median. We see that the difference between the good and bad survival groups is larger under the proposed approach (and the corresponding p-value is smaller).
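The C-statistic in Step (d) is the usual Harrell-type concordance index for right-censored data: among comparable pairs (the subject with the shorter observed time had an event), it counts the fraction in which that subject also received the higher predicted risk. A minimal O(n^2) sketch, with naming of our own choosing:

```python
def c_statistic(time, event, risk):
    """Harrell-type concordance index for right-censored survival data.
    time:  observed times; event: 1 = event observed, 0 = censored;
    risk:  predicted risk scores (higher risk should mean shorter survival)."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # pair is comparable only if subject i failed before time[j]
            if time[i] < time[j] and event[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5   # ties in risk count as half
    return concordant / comparable
```

A value of 0.5 corresponds to random prediction and 1 to perfect ranking, which is the scale on which the averages 0.76-0.83 above should be read.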
4.2. Analysis of categorical stage
In this set of analyses, the outcome variable is the pathological stage. In the original data, there are nine stages: Stage I, Stage IA, Stage IB, Stage II, Stage IIA, Stage IIB, Stage IIIA, Stage IIIB, and Stage IV. With a limited sample size, to avoid small counts, we combine them into Stages I, II, III, and IV, which have sample sizes 270, 119, 81, and 26, respectively.
The proposed screening leads to 76 main effects and 152 interactions. Details are provided in Table 4. Similar to the above subsection, we observe that many of the screened genes have sound biological implications. For example, CSAG2 and MAGEA4 have been found to play critical roles in cancer development. LIN28B has been reported to be highly expressed during embryogenesis but silent in most adult tissues; it can block the maturation of the tumor suppressor microRNA let-7 family and mediates diverse biological functions. GUCA2B has been suggested as a susceptibility gene for essential hypertension. UGT1A10 is expressed exclusively in extrahepatic tissues, where it is a highly active and important enzyme. MAGEA1 is a promising candidate marker for LUAD therapy, and MAGEA1-specific CAR-T cell immunotherapy may be an effective strategy for the treatment of MAGEA1-positive LUAD. In Table 3, we summarize the comparison between the proposed and alternative screenings. The overall pattern is similar to that for overall survival. The random split-based evaluation is conducted in a manner similar to the previous subsection. The difference is that in Step (c), a logistic model is fit. Accordingly, we use classification error as the criterion for comparison. With 200 random splits, the average classification accuracy (1 - error) values are 0.770 (CVMS), 0.753 (PCS), 0.746 (IGS), 0.786 (CVMS+CVIS), 0.769 (PCS+PCIS), 0.760 (IGS+PCIS), 0.764 (IGS+CVIS), and 0.778 (CVMS+PCIS), which again suggests the superiority of the proposed approach.
Table 4:
Analysis of categorical stage: 76 genes identified by CVMS and 152 interactions identified by CVMS+CVIS.
| CVMS-main effects | CVMS+CVIS-interactions | |||
|---|---|---|---|---|
| CSAG2 | ZIC1 | CSAG2-CSAG3 | PSG4-CASP14 | PSG4-STRA8 |
| PSG4 | PDX1 | CSAG2-MAGEA4 | LIN28B-HOXD13 | PSG4-REG1A |
| LIN28B | NLGN4Y | CSAG2-MAGEB2 | DPPA2-PAGE2 | GAGE2A-CSAG2 |
| VCX3A | TFAP2D | CSAG2-MAGEA6 | LIN28B-REG1A | VCX3A-REG1A |
| MAGEA4 | DPYSL5 | CSAG2-MAGEA1 | LIN28B-MAGEA1 | PSG4-UGT1A10 |
| PAGE2 | PHGR1 | VCX3A-VCX | DPPA2-VCX3A | GAGE2A-DLK1 |
| GUCA2B | TM4SF5 | CSAG2-PSG4 | LIN28B-PIWIL3 | PAGE2-VCX3A |
| NPY | SCGN | CSAG2-PAGE2B | DPPA2-MAGEA4 | GAGE2A-DEFB4A |
| HOXD13 | LIPK | CSAG2-PAGE2 | DPPA2-PAGE2B | CSAG2-ZNF560 |
| UGT1A10 | ALX1 | CSAG2-LIN28B | LIN28B-IRX4 | MAGEA6-MAGEA12 |
| TAC1 | APOBEC1 | CSAG2-CASP14 | VCX3A-TAC1 | GAGE2A-PIWIL3 |
| MAGEB2 | C1orf21 | CSAG2-MAGEA12 | GAGE2A-MAGEA4 | GAGE2A-MAGEB2 |
| CASP14 | NSG1 | CSAG2-MAGEA10 | DPPA2-TAC1 | PAGE2-TAC1 |
| SOX14 | GPR160 | CSAG2-MAGEC2 | GAGE2A-UGT1A10 | LIN28B-ZNF560 |
| CSAG3 | MEOX1 | CSAG2-VCX3A | LIN28B-PAGE2B | VCX3A-IRX4 |
| MAGEA1 | GMNC | CSAG2-NPY | LIN28B-CSAG2 | ZFY-NLGN4Y |
| CGB5 | CDC25C | MAGEA6-MAGEA3 | GAGE2A-REG1A | CSAG2-PAGE5 |
| PIWIL3 | PIMREG | LIN28B-TAC1 | DPPA2-REG1A | DPPA2-LIN28B |
| PRR20G | CDT1 | VCX3A-PAGE2 | PSG4-PAGE2B | CSAG2-HOXC12 |
| PAGE2B | STAP1 | GAGE2A-VCX3A | MAGEA4-CSAG2 | DPPA2-CASP14 |
| PRAC2 | ADSS1 | MAGEA3-MAGEA6 | GAGE2A-LIN28B | VCX3A-MAGEA4 |
| MAGEA6 | RNASE1 | GAGE2A-PSG4 | DPPA2-NPY | GAGE2A-GUCA2B |
| HOXC12 | PTGDS | GAGE2A-TAC1 | MAGEA4-MAGEA10 | PSG4-DEFB4A |
| SST | CLUL1 | CSAG2-PRR20G | DPPA2-UGT1A10 | LIN28B-UGT1A10 |
| MAGEC2 | SMURF2 | CSAG2-PIWIL3 | PSG4-NPY | GUCA2B-UGT1A10 |
| SLC10A2 | GAGE2A-PAGE2 | LIN28B-PAGE2 | LIN28B-PSG4 | |
| IRX4 | CSAG2-HOXD13 | CSAG2-DLK1 | LIN28B-DLK1 | |
| REG1A | CSAG3-CSAG2 | GAGE2A-STRA8 | NPY-TAC1 | |
| STRA8 | GAGE2A-PAGE2B | LIN28B-MAGEB2 | VCX3A-NPY | |
| MAGEA10 | PSG4-CGB5 | PSG4-PIWIL3 | MAGEA4-CASP14 | |
| LCN15 | CSAG2-STRA8 | LIN28B-GUCA2B | CSAG2-DEFB4A | |
| VCX | CSAG2-UGT1A10 | DPPA2-CGB5 | LIN28B-CGB5 | |
| MAGEC1 | CSAG2-MAGEA3 | CSAG2-VCX | LIN28B-PRR20G | |
| DEFB4A | LIN28B-MAGEA4 | DPPA2-STRA8 | GAGE2A-CSAG3 | |
| NR0B1 | CSAG2-MAGEC1 | DPPA2-GUCA2B | MAGEA4-REG1A | |
| SPRR2F | VCX3A-STRA8 | LIN28B-HOXC12 | VCX3A-MAGEA10 | |
| MAGEA3 | GAGE2A-CGB5 | VCX3A-PSG4 | MAGEA4-UGT1A10 | |
| WFDC5 | DPPA2-PSG4 | LIN28B-MAGEA10 | LIN28B-STRA8 | |
| PAGE5 | PSG4-TAC1 | PSG4-PAGE2 | PSG4-MAGEA4 | |
| DLK1 | LIN28B-VCX3A | LIN28B-NPY | PSG4-HOXD13 | |
| ACTL8 | CSAG2-TAC1 | MAGEA4-MAGEA6 | VCX3A-DLK1 | |
| MAGEA12 | PSG4-VCX3A | PSG4-DLK1 | GAGE2A-MAGEA1 | |
| HOXA13 | CSAG2-IRX4 | MAGEA4-VCX3A | VCX3A-ZNF560 | |
| GP2 | MAGEA4-MAGEA1 | DPPA2-PIWIL3 | MAGEB2-CSAG2 | |
| TFF2 | PAGE2-PAGE2B | LIN28B-CASP14 | DDX3Y-NLGN4Y | |
| UGT2B11 | CSAG2-GUCA2B | DPPA2-IRX4 | GAGE2A-PAGE5 | |
| ETNPPL | GAGE2A-CASP14 | GAGE2A-PRR20G | VCX3A-CASP14 | |
| SPRR2A | CSAG2-REG1A | DPPA2-DLK1 | FTHL17-UGT1A10 | |
| ZNF560 | GAGE2A-IRX4 | MAGEA4-PAGE2 | VCX3A-PIWIL3 | |
| KRT75 | VCX3A-PAGE2B | CSAG2-CGB5 | CSAG2-ZIC1 | |
| INSL4 | GAGE2A-NPY | PSG4-IRX4 | ||
5. Discussion
In this article, we have developed a new marginal screening approach. Although marginal screening is not a new topic, with the increasing resolution of profiling (and hence increasing dimensionality), it still plays an essential role in data analysis, and there is still a strong demand for more effective screening methods. This study advances beyond many existing studies by focusing on interactions, whose significance is increasingly recognized. The proposed approach is based on Shannon’s information theory, whose applications to screening remain limited. It can flexibly accommodate different types of distributions of the response and predictors under one unified framework. It has the much-desired robustness properties not shared by model-based and many other approaches. The theoretical development has provided a uniquely strong basis, and the numerical studies have convincingly established its practical superiority.
For convenience, as in the literature, we have employed a hard-thresholding cutoff in each stage to retain a fixed number of predictors. It is possible to determine dn1 and dn2 in a more data-dependent manner. First consider . Let be a permutation of such that . We adopt the maximum ratio criterion [14], with which . Asymptotically, it can be proved that is when and , since can be arbitrarily small. Here, . However, this criterion can be unstable with very large or very small , when there are predictors with very strong or weak effects [18]. To remedy this problem, a resampling-based method can be adopted, and can be restricted to be smaller than a user-specified constant. This technique proceeds as follows: (i) generate B bootstrap samples; (ii) calculate the CV filters for each bootstrap sample; for the ith bootstrap sample, the CV estimates are ordered from largest to smallest, and we calculate ; (iii) obtain . Similar discussions hold for . Given the satisfactory performance of the hard cutoffs, we do not pursue this computationally more expensive determination in this article.
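The maximum ratio criterion described above admits a direct implementation: order the screening statistics decreasingly and cut at the largest ratio of consecutive ordered statistics. A sketch under the assumption of strictly positive statistics (the resampling-based stabilization is omitted; the function name is ours):

```python
import numpy as np

def max_ratio_cutoff(stats):
    """Return the number of predictors to retain by the maximum ratio criterion:
    sort the screening statistics decreasingly and cut where the ratio of
    consecutive ordered statistics w_(k)/w_(k+1) is largest.
    Assumes all statistics are strictly positive."""
    w = np.sort(np.asarray(stats, dtype=float))[::-1]
    ratios = w[:-1] / w[1:]          # w_(1)/w_(2), ..., w_(p-1)/w_(p)
    return int(np.argmax(ratios)) + 1
```

For example, with statistics 10, 9, 8 for the active predictors and 0.1, 0.05 for the noise, the largest jump occurs between the third and fourth ordered statistics, so three predictors are retained. The instability noted above arises exactly when one ratio among the weak statistics happens to dominate.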
This study can be potentially extended in multiple ways. The proposed approach has been designed for interactions between predictors of the same type. In omics studies, this amounts to gene-gene interactions. It will be almost straightforward to extend the proposed CV filters to gene-environment interactions, which involve two different types of predictors. We have focused on two-way interactions. Higher-order interactions are statistically meaningful; however, they still have very limited practical applications under high-dimensional settings. We have focused on screening. It may be of interest to further develop joint interaction modeling also based on Shannon’s information theory, so that the overall analysis, consisting of screening and joint modeling, can be more coherent.
Supplementary Material
Acknowledgements
We thank the editor and reviewers for their careful review and insightful comments, which have led to a significant improvement of the article. This study has been partly supported by NSFC grants 12001101 and 20YQ18, and NIH CA204120, CA121974, and CA196530.
References
- [1].Moore J, and Williams S. (2009). Epistasis and Its Implications for Personal Genetics. American Journal of Human Genetics, 85(3): 309–320. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Khan A, Dinh DM, Schneider D, Lenski R, and Cooper T. (2011). Negative epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), 1193–1196. [DOI] [PubMed] [Google Scholar]
- [3].Yuan M, Joseph VR and Zou H. (2009). Structured variable selection and estimation. Annals of Applied Statistics, 3, 1738–1757. [Google Scholar]
- [4].Choi N, Li W, and Zhu J. (2010). Variable selection with the strong heredity constraint and its oracle property. Journal of the American Statistical Association, 105, 354–364. [Google Scholar]
- [5].Bien J, Taylor J, and Tibshirnani R. (2013). A LASSO for hierarchical interactions. The Annals of Statistics, 41, 1111–1141. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Fan Y, Kong Y, Li D, and Zheng Z. (2015). Innovated interaction screening for high-dimensional nonlinear classification. The Annals of Statistics, 43(3), 1243–1272. [Google Scholar]
- [7].Yan J, Risacher S, Shen L, and Andrew S. (2018). Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Briefings in Bioinformatics, 19(6), 1370–1381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Hao N, and Zhang H. (2017). A Note on High-Dimensional Linear Regression With Interactions. The American Statistician, 71(4), 291–297 [Google Scholar]
- [9].Fan J, Feng Y, and Song R. (2011). Nonparametric Independence Screening in Sparse Ultra-high Dimensional Additive Models. Journal of the American Statistical Association, 106, 544–557. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [10].Liu J, Li R, and Wu R. (2014). Feature Selection for Varying Coefficient Models with Ultrahigh Dimensional Covariates. Journal of the American Statistical Association, 109, 266–274. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].He X, Wang L, and Hong H. (2013). Quantile-Adaptive Model-Free Variable Screening for High-Dimensional Heterogeneous Data. The Annals of Statistics, 41, 342–369. [Google Scholar]
- [12].Li G, Peng H, Zhang J, and Zhu L. (2012). Robust Rank Correlation Based Screening. The Annals of Statistics, 40, 1846–1877. [Google Scholar]
- [13].Li R, Zhong W, and Zhu L. (2012). Feature Screening Via Distance Correlation Learning. Journal of American Statistical Association, 107, 1129–1139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Huang D, Li R, and Wang H. (2014). Feature screening for ultrahigh dimensional categorical data with applications. Journal of Business & Economic Statistics, 32(2), 237–244. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Mai Q, and Zou H. (2015). The fused Kolmogorov filter: a nonparametric model-free screening method. The Annals of Statistics, 43(4), 1471–1497. [Google Scholar]
- [16].Cui H, Li R, and Zhong W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630–641. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [17].Huang J, Horowitz J, and Ma S. (2008). Asymptotic Properties of Bridge Estimators in Sparse High-Dimensional Regression Models. The Annals of Statistics, 36, 587–613. [Google Scholar]
- [18].Ni L, and Fang F. (2016). Entropy-based Model-free Feature Screening for Ultrahigh-dimensional Multiclass Classification. Journal of Nonparametric Statistics, 28(3), 515–530. [Google Scholar]
- [19].Hall P, and Xue. J. (2014). On selecting interacting features from high-dimensional data. Computational Statistics & Data Analysis, 71, 694–708. [Google Scholar]
- [20]. Hao N, and Zhang H. (2014). Interaction screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 109(507), 1285–1301.
- [21]. Li Y, and Liu J. (2019). Robust variable and interaction selection for logistic regression and general index models. Journal of the American Statistical Association, 114(525), 271–286.
- [22]. Dong C, Chu X, Wang Y, et al. (2008). Exploration of gene–gene interaction effects using entropy-based methods. European Journal of Human Genetics, 16, 229–235.
- [23]. Wu X, Jin L, and Xiong M. (2009). Mutual information for testing gene-environment interaction. PLoS One, 4(2), e4578.
- [24]. Fan R, Zhong M, Wang S, Zhang Y, Andrew A, Karagas M, Chen H, Amos C, Xiong M, and Moore J. (2011). Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genetic Epidemiology, 35(7), 706–721.
- [25]. Jiang R, Tang W, Wu X, and Fu W. (2009). A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics, 10(Suppl 1), S65.
- [26]. O’Hagan S, Wright Muelas M, Day P, Lundberg E, and Kell D. (2018). GeneGini: assessment via the Gini coefficient of reference “housekeeping” genes and diverse human transporter expression profiles. Cell Systems, 6(2), 230–244.e1.
- [27]. Zhao J, Zhou Y, Zhang X, and Chen L. (2016). Part mutual information for quantifying direct associations in networks. Proceedings of the National Academy of Sciences of the United States of America, 113(18), 5130–5135.
- [28]. Xu Y, Wu M, Zhang Q, and Ma S. (2019). Robust identification of gene-environment interactions for prognosis using a quantile partial correlation approach. Genomics, 111, 1115–1123.
- [29]. Pan W. (2009). Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Genetic Epidemiology, 33(6), 497–507.
- [30]. Pan W, and Shen X. (2011). Adaptive tests for association analysis of rare variants. Genetic Epidemiology, 35(5), 381–388.
- [31]. Shi X, Liu J, Huang J, Zhou Y, Xie Y, and Ma S. (2014). A penalized robust method for identifying gene-environment interactions. Genetic Epidemiology, 38(3), 220–230.
- [32]. Wu C, Shi X, Cui Y, and Ma S. (2015). A penalized robust semiparametric approach for gene–environment interactions. Statistics in Medicine, 34(30), 4016–4030.
- [33]. Shannon C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
- [34]. Anzarmou Y, Mkhadri A, and Oualkacha K. (2022). The Kendall interaction filter for variable interaction screening in ultra high dimensional classification problems. Journal of Applied Statistics, published online.
- [35]. Song R, Lu W, Ma S, and Jeng X. (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101(4), 799–814.
- [36]. Wang J, and Chen Y. (2020). Interaction screening by Kendall’s partial correlation for ultrahigh-dimensional data with survival trait. Bioinformatics, 36(9), 2763–2769.
- [37]. Escoufier Y. (1973). Le traitement des variables vectorielles [The treatment of vector variables]. Biometrics, 29, 751–760.
- [38]. Wu M, Huang J, and Ma S. (2018). Identifying gene-gene interactions using penalized tensor regression. Statistics in Medicine, 37(4), 598–610.
