Author manuscript; available in PMC: 2021 Mar 1.
Published in final edited form as: Genet Epidemiol. 2019 Nov 14;44(2):159–196. doi: 10.1002/gepi.22270

Identification of Gene-environment Interactions with Marginal Penalization

Sanguo Zhang 1, Yuan Xue 1,2, Qingzhao Zhang 3, Chenjin Ma 4,2, Mengyun Wu 5, Shuangge Ma 2
PMCID: PMC7028443  NIHMSID: NIHMS1057491  PMID: 31724772

Abstract

Gene-environment (G-E) interaction analysis has been extensively conducted for complex diseases. In marginal analysis, the common practice is to conduct likelihood-based (and other “standard”) estimation with each marginal model, and then select significant G-E interactions and main effects based on p-values and multiple comparisons adjustment. One limitation of this approach is that the identification results often do not respect the “main effects, interactions” hierarchy, which has been stressed in recent G-E interaction analyses. Some recent efforts tackle this problem, but with very complex formulations. Another limitation of the common practice is that it may not perform well when regularization is needed, for example because of “non-normal” distributions. In this article, we propose a marginal penalization approach which adopts a novel penalty to directly tackle the aforementioned problems. The proposed approach has a framework more coherent with that of the recently developed joint analysis methods, has an intuitive formulation, and can be effectively realized. In simulation, it outperforms the popular significance-based analysis and simple penalization-based alternatives. Promising findings are made in the analysis of SNP data and gene expression data.

Keywords: Gene-environment interaction, Marginal analysis, Penalization

1. Introduction

For many complex diseases such as cancer and cardiovascular diseases, gene-environment (G-E) interactions have critical implications for etiology, prognosis, response to treatment, and biomarkers (Hunter, 2005; Thomas, 2010; Simonds et al., 2016). Marginal G-E interaction analysis, which examines one or a small number of genes at a time, has been routinely conducted (Wu et al., 2012; Hutter et al., 2013; Sun et al., 2018). Denote Y as a disease outcome/phenotype of interest, Z = (Z1, Z2,…, Zp) as p genes, and X = (X1,…, Xq) as q environmental risk factors. Here we use the generic terminology “gene”, which can be gene expression (GE), single nucleotide polymorphism (SNP), or other omics measurements. For environmental (E) factors, as in quite a few other studies, we take a looser definition to also include clinical and other risk factors. In the literature, the most popular approach proceeds as follows: (i) For j = 1,…,p, fit the regression model $Y \sim \phi\left(\sum_{k=1}^{q} \alpha_k X_k + \beta_j Z_j + \sum_{k=1}^{q} \gamma_{j,k} X_k Z_j\right)$, where the form of model ϕ is assumed to be known, and the αk’s, βj, and γj,k’s are the unknown regression coefficients. As the number of environmental risk factors q is usually small, this is a “standard” estimation problem, and likelihood-based and other simple estimation techniques are adopted. Denote pj as the p-value of $\hat\beta_j$, the estimate of βj, and pj,k as the p-value of $\hat\gamma_{j,k}$, the estimate of γj,k; (ii) With {pj : j = 1,…,p} and {pj,k : j = 1,…,p, k = 1,…,q}, conduct multiple comparisons adjustment with the Bonferroni or FDR approach, and identify significant effects. Some studies identify significant interactions and main effects together (Kraft et al., 2007; Dai et al., 2012), while others conduct separate identification. It is noted that there have been variations of this approach. For example, some studies use objective functions other than likelihood functions (Ritchie et al., 2001).
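Step (ii) of the significance-based approach can be sketched in a few lines. The following is a minimal illustration, not the code used in the cited studies; it assumes that the “FDR approach” refers to the Benjamini-Hochberg step-up procedure, and the p-values and function names are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Return indices of hypotheses rejected by the Bonferroni correction."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, fdr=0.1):
    """Return indices rejected by the Benjamini-Hochberg step-up procedure.

    Assumption: this is the FDR procedure meant in the text.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * fdr / m:
            cutoff_rank = rank  # largest rank that passes its threshold
    return sorted(order[:cutoff_rank])


# Hypothetical interaction p-values p_{j,k}, flattened over (j, k).
p_inter = [0.0004, 0.03, 0.2, 0.012, 0.6, 0.009, 0.8, 0.049]
print(bonferroni(p_inter, alpha=0.05))
print(benjamini_hochberg(p_inter, fdr=0.1))
```

Note that nothing in this procedure links the decision on an interaction to the decision on its main G effect, which is how the hierarchy discussed next can be violated.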

Despite great successes, the above approach has limitations. The most important is that the “main effects, interactions” hierarchy may be violated. That is, an interaction term may have a significant p-value and be concluded as significantly associated with the response, while the corresponding main effects are not. In G-E interaction analysis, usually E factors are manually selected based on prior knowledge (and hence likely to be important) and have a low dimensionality. As such, the main interest and hierarchy are on G-E interactions and main G effects. In joint G-E interaction analysis, which analyzes a large number of G factors in a single model, the statistical and biological importance of this hierarchy has been stressed (Bien et al., 2013; Hao and Zhang, 2017), and a long array of methods have been developed to respect this hierarchy. In marginal analysis, comparatively, there has been much less attention to this hierarchy. One foundational work is Bien et al. (2015), which establishes the essential importance of the “main effects, interactions” hierarchy in marginal analysis. It is argued that a significant G-E interaction (identified as associated with the response) would demand the significance of the corresponding main G effect (hence also identified as response-associated). It is also pointed out that it is insufficient to simply include the main G effect in modeling if it is not significant. This work is in the hypothesis testing framework, and, despite great successes, suffers from very complex formulation and computation. Another related strategy is to conduct two-step identification (Hao and Zhang, 2017), which identifies significant main G effects first and then searches for significant interactions only among those identified in the first step. This can ensure that a significant interaction always has a corresponding significant main G effect. However, Hao and Zhang (2017) point out that this approach demands assumptions. In addition, Bien et al. (2015) suggest that this may miss important interactions if the corresponding main effects are not sufficiently significant. The second limitation is that simple estimations may not behave well enough. Consider, for example, linear regression models with random errors deviating from normal. One potential solution is to adopt objective functions with certain robustness properties (Shi et al., 2014; Li et al., 2013). But often such objective functions are not easy to compute or to conduct inference with.

In this article, our goal is to develop a new marginal G-E interaction analysis method, which may differ from the existing ones in multiple aspects. Advancing from the “straightforward” significance-based analysis, it respects the “main effects, interactions” hierarchy. That is, if a G-E interaction is identified as important for the response variable, then the corresponding main G effect is automatically identified. To achieve this, we propose using penalization for G-E interaction identification and estimation, which differs fundamentally from the significance-based approaches. Although sharing the same philosophy of respecting the hierarchy, the proposed method differs from Bien et al. (2015) by adopting a penalized estimation/selection, not hypothesis testing, strategy. In addition, implementation of the proposed method is much simpler than Bien et al. (2015). It also differs from the two-step approach by searching for important interactions and main effects simultaneously, which may improve performance by “borrowing information” between the two identifications. In addition, penalized estimation can also be advantageous over “standard” estimation (Gauderman et al., 2013). It is noted that it is not “abnormal” to employ penalization in low-dimensional estimation. In particular, the method in Bien et al. (2015) is also solved via penalized estimation. With the hierarchy and regularized estimation, the proposed approach has an analysis framework more coherent with that of the joint analysis (Liu et al., 2013; Wu et al., 2018; Wu et al., 2017; Zhao et al., 2018; Wang et al., 2019). This coherence facilitates methodological developments as well as biological interpretations. We fully acknowledge the connections between the proposed analysis strategy and technique with those of the joint genetic interaction analysis. Marginal and joint analysis cannot replace each other. 
Considering the importance of marginal analysis and lack of approaches that possess the aforementioned desirable properties, this study is warranted and will provide a practically useful new way of conducting marginal G-E interaction analysis.

2. Methods

For the jth gene Zj(j = 1,…, p), consider the regression model

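$$Y \sim \phi\left(\alpha_{0,j} + \sum_{k=1}^{q} X_k \alpha_{j,k} + Z_j \beta_j + \sum_{k=1}^{q} X_k Z_j \gamma_{j,k}\right) = \phi\left(\alpha_{0,j} + X\alpha_j + W_j b_j\right), \tag{1}$$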

where the form of function ϕ is known, α0,j is the unknown intercept to accommodate models such as linear and logistic, αj = (αj,1,…, αj,q)′ is the vector of unknown regression coefficients representing the main E effects, Wj ≜ (Zj, ZjX1,…, ZjXq) accommodates the main G effect and G-E interactions, and bj = (βj, γj,1,…, γj,q)′ ≜ (bj,1, bj,2,…, bj,q+1)′ is the vector of unknown coefficients for Wj. It is noted that the proposed analysis can accommodate a wide variety of response variables (such as continuous, categorical, and censored survival) and models (such as linear, logistic, and Cox) as well as G (for example, gene expression and SNP) and E (environmental, clinical, etc.) measurements.

Denote l(α0,j, αj, bj) as the negative log-likelihood function. Other objective functions, for example estimating equation-based, can be analyzed in the same manner. We introduce a penalized objective function

$$Q(\alpha_{0,j}, \alpha_j, b_j) = l(\alpha_{0,j}, \alpha_j, b_j) + P_\lambda(b_j), \tag{2}$$

where P is the penalty function and λ is the vector of data-dependent tuning parameters. Minimizing the objective function (2) with respect to the parameters α0,j, αj and bj, we get the estimate

$$(\hat\alpha_{0,j}, \hat\alpha_j, \hat b_j) = \arg\min Q(\alpha_{0,j}, \alpha_j, b_j). \tag{3}$$

For the penalty function, we adopt the sparse group MCP (sgMCP) with

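$$P_\lambda(b_j) = \rho\left(\|b_j\|_2; \sqrt{q+1}\,\lambda_1, \xi\right) + \sum_{k=2}^{q+1} \rho\left(|b_{j,k}|; \lambda_2, \xi\right), \tag{4}$$

where $\rho(t;\lambda,\xi) = \lambda \int_0^{|t|} \left(1 - \frac{x}{\lambda\xi}\right)_+ dx$ is the MCP penalty (Zhang et al., 2010), λ1, λ2 > 0 are data-dependent tuning parameters, and ξ is the regularization parameter.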
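To make the penalty concrete, the sketch below evaluates the MCP integral in closed form and assembles the sparse group penalty in (4). It is an illustration only: it assumes that the group term acts on the Euclidean norm of bj with weight sqrt(q+1)·λ1 and that the individual terms act on the interaction coefficients only; the function names are hypothetical.

```python
import math


def mcp(t, lam, xi):
    """MCP penalty rho(t; lam, xi) = lam * integral_0^{|t|} (1 - x/(lam*xi))_+ dx."""
    a = abs(t)
    if a <= lam * xi:
        return lam * a - a * a / (2.0 * xi)
    return 0.5 * xi * lam * lam  # the penalty is constant beyond |t| = lam * xi


def sgmcp(b, lam1, lam2, xi):
    """Sparse group MCP for b = [main G effect, q interaction coefficients].

    Assumes the group term uses the Euclidean norm of b and that only the
    interaction coefficients b[1:] receive the individual penalty.
    """
    group_norm = math.sqrt(sum(v * v for v in b))
    group_term = mcp(group_norm, math.sqrt(len(b)) * lam1, xi)
    inter_term = sum(mcp(v, lam2, xi) for v in b[1:])
    return group_term + inter_term
```

Because the penalty flattens out for large coefficients, strong effects are not shrunk further, which is the usual motivation for preferring MCP over Lasso-type penalties.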

Penalized estimation is applied to all p genes. To ensure that all genes are analyzed on the same ground, the same tuning/regularization parameters are applied in all estimations. This is also coherent with the practice in joint analysis. Different from joint analysis, each gene has its own model and likelihood function. Interactions and main G effects corresponding to the nonzero components of the $\hat b_j$’s are identified as important and associated with the response. This is an estimation-based identification strategy and has notable differences from the significance-based strategies (including that in Bien et al., 2015).
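The identification rule can be illustrated as follows. This is a minimal sketch with hypothetical estimates and a hypothetical function name; it reads off the nonzero components of each estimated coefficient vector and flags any gene whose interaction is selected without its main G effect.

```python
def identify_effects(b_hat_by_gene, tol=0.0):
    """Collect effects with nonzero estimates and check the hierarchy.

    b_hat_by_gene maps a gene label to [beta_j, gamma_j1, ..., gamma_jq].
    """
    main_effects, interactions, violations = [], [], []
    for gene, b in b_hat_by_gene.items():
        main_on = abs(b[0]) > tol
        inter_on = [k for k, g in enumerate(b[1:], start=1) if abs(g) > tol]
        if main_on:
            main_effects.append(gene)
        interactions.extend((gene, k) for k in inter_on)
        if inter_on and not main_on:
            violations.append(gene)  # interaction selected without its main effect
    return main_effects, interactions, violations


# Hypothetical estimates for three genes with q = 2 E factors.
estimates = {"g1": [0.4, 0.0, 0.2], "g2": [0.0, 0.0, 0.0], "g3": [-0.1, 0.0, 0.0]}
mains, inters, bad = identify_effects(estimates)
```

Under a penalty that enforces the hierarchy, the list of violations is expected to be empty for every tuning value.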

Rationale

Similar to other marginal analyses, one gene is considered at a time, and a marginal regression model is assumed. We adopt the penalized estimation and identification strategy, which has been extensively adopted in joint analysis (Liu et al., 2013; Kim et al., 2017) but in only a few marginal analyses (Bien et al. 2015). The sparse group MCP automatically ensures that if bj,k ≠ 0 for any k ≥ 2 (that is, an interaction term), then bj,1 ≠ 0 (that is, the main G effect). The MCP-based penalty is adopted because of its satisfactory performance demonstrated in many publications. There are multiple strong reasons for adopting the sparse group MCP. It can select both interactions and main effects, which is advantageous over, for example, applying penalization to interactions only (which cannot discriminate important main G effects from noise). Even when only interactions are of interest, by reinforcing the hierarchy, the proposed penalization can “borrow information” from main effects in the estimation/identification of interactions. Consider for example a main G effect that is zero (not identified). This suggests that the corresponding interactions should not be identified. That is, there is additional information for interaction identification. This may make the proposed analysis advantageous over two-step analysis, which does not use information on interactions in the first step of main effect analysis. Adopting this penalty also makes the frameworks of marginal and joint analyses (Liu et al., 2013 and others) more coherent. A “byproduct” of penalization is its regularization property. Consider, for example, linear regression with random errors deviating from normal. It has been well recognized that in this case, even with a low dimensionality, regularization may be needed to generate reliable estimation. This can be achieved with the proposed penalization.

As in other penalization problems, the identification of important effects is achieved by examining the estimates. With a sequence of tuning parameters, a sequence of identification results can be generated, which is “equivalent” to varying the p-value cutoff in significance-based approaches. When it is desirable to generate one fixed set of identification results, we propose using the extended BIC (EBIC; Chen and Chen, 2008) criterion, which has been extensively adopted in high-dimensional penalization problems. We note that high dimensional inference under penalization is still a highly challenging problem, and the present problem has remarkable differences from the joint penalized estimation. Our preliminary exploration does not suggest an easy way of making rigorous inference. As such, as a limitation of the proposed approach, there is not a simple significance interpretation associated with the proposed identification.

Computation

It is first noted that, as in other marginal analyses, computation can be conducted in a highly parallel manner. Only the tuning parameter selection demands pooling all genes together, which is needed in other marginal analyses too.

First consider a continuous response and linear models. To simplify notation, we suppress the dependence and hence subscript on j. For subject i (= 1,…,n), let Yi be the response of interest, $X_{i\cdot} = (X_{i1},\ldots,X_{iq})$ be the q-dimensional vector of E measurements, and Zi be the measurement of the jth G factor. Consider the model

$$Y_i = \alpha_0 + X_{i\cdot}\alpha + W_{i\cdot} b + \varepsilon_i, \tag{5}$$

where εi is the random error and $W_{i\cdot} = (Z_i, Z_i X_{i1}, \ldots, Z_i X_{iq})$. With normalization, α0 = 0. Consider the objective function

$$Q(\alpha, b) = \frac{1}{2n}\left\|Y - X\alpha - Wb\right\|_2^2 + P(b;\xi,\lambda_1,\lambda_2), \tag{6}$$

where Y is the vector composed of Yi’s, and X and W are the matrices composed of $X_{i\cdot}$’s and $W_{i\cdot}$’s. With fixed tuning parameters, we consider the following iterative algorithm: (i) Initialize $\hat b = 0$; (ii) With the estimate of b fixed at $\hat b$, compute $\hat\alpha = (X'X)^{-1}X'(Y - W\hat b)$; (iii) Compute $\hat b$ as the minimizer of the objective function with α fixed at $\hat\alpha$; (iv) Iterate Steps (ii)-(iii) until convergence. In this computation, the key is Step (iii), and the details of this step are provided in Appendix. Different from joint analysis, here each penalized estimation is a low dimensional problem. Convergence properties of this iterative algorithm can be established following the literature (Liu et al., 2013). Convergence is achieved in all of our numerical studies.
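Step (ii) has a closed form because the penalty in (6) does not involve α, so with b fixed the problem reduces to ordinary least squares on the partial residual. A short derivation, assuming X′X is invertible (reasonable here, since q is small relative to n):

```latex
\begin{aligned}
\frac{\partial Q(\alpha,\hat b)}{\partial \alpha}
  &= -\frac{1}{n}\, X'\left(Y - X\alpha - W\hat b\right) = 0 \\
\Longrightarrow\quad X'X\,\alpha &= X'\left(Y - W\hat b\right) \\
\Longrightarrow\quad \hat\alpha &= (X'X)^{-1} X'\left(Y - W\hat b\right).
\end{aligned}
```

The same argument does not apply to Step (iii), where the non-smooth penalty on b enters; that update is the part deferred to the Appendix.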

The proposed approach and algorithm are computationally highly affordable. For a simulated dataset with n = 500, 4 E variables, 1,000 genes, and 400 pairs of (λ1, λ2) values, computation can be accomplished within 10 seconds using a regular laptop without parallel computing. It is expected that computer time can be much reduced with parallelization.

The proposed approach is applicable to a wide variety of data types and models. In Appendix, we provide details on accommodating censored survival data under the accelerated failure time (AFT) model, which is examined in our data analysis. Consider for example a binary response and logistic regression models, as analyzed below in simulation. In this case, we iterate between a Taylor expansion that keeps the quadratic term and the above computational algorithm. This strategy has also been adopted in the literature (Breheny and Huang, 2011).

3. Simulation

For each subject, we first simulate q = 4 normally distributed E factors with marginal mean 0 and variance 1. The correlation between the jth and kth E factors is $\rho^{|j-k|}$ with ρ = 0.2. We then dichotomize one E variable at 0, which leads to three continuous and one binary E variables for analysis. For the G variables, we consider two simulation strategies. First we sample SNP measurements from the GENEVA diabetes data (details below). To maintain the correlation structure, we sample consecutive SNPs. This way, realistic SNP distributions and correlations can be achieved. In addition, we also simulate from parametric distributions, as has been done in a large number of published studies. In simulating continuous distributions to mimic gene expression data, we generate multivariate normal distributions with marginal mean 0 and variance 1. For the correlation of G factors, we consider two structures. The first is the AR (auto-regressive) structure, under which genes j and k have correlation $\rho^{|j-k|}$ with ρ = 0.2. The second is the Band (banded) structure, under which genes j and k have correlation 0.6 if |j − k| = 1, 0.33 if |j − k| = 2, and 0 otherwise (we have also examined negative correlations and made similar findings). To mimic SNP data, we further dichotomize the continuous G variables simulated above, setting MAF (minor allele frequency) as 0.3. We consider the number of G factors p = 500 and 1,000. Among the p main G effects and p × q G-E interactions, 16 and 12 are set as associated with the response. In addition, to examine whether performance of the proposed approach scales well, we also consider the scenario with 8 important main G effects and 6 interactions. The main effects and interactions are set such that the hierarchy is respected. All E factors have important main effects. The nonzero coefficients of important effects are randomly generated from a uniform distribution Unif[0.2, 0.8] (we have also examined negative coefficients and made similar findings). 
We simulate continuous responses under linear regression models. Three scenarios for the random error distributions are considered: (a) standard normal distribution (Norm), (b) 0.8N(0,1)+0.2LN(0,1) distribution (MNL), and (c) t distribution with degree of freedom 2 (t). In addition, we also simulate binary responses under logistic regression models with intercept set as 0. Sample size is set as n = 100 or 500.
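The two correlation structures used for the G factors can be written down directly. The sketch below is illustrative only and assumes that the AR structure means a correlation of ρ raised to the power |j − k|; the function names are hypothetical, and sampling from the resulting multivariate normal distribution is left out.

```python
def ar_corr(p, rho=0.2):
    """AR structure: corr(j, k) = rho ** |j - k| (assumed reading of the text)."""
    return [[rho ** abs(j - k) for k in range(p)] for j in range(p)]


def band_corr(p, c1=0.6, c2=0.33):
    """Band structure: c1 at lag 1, c2 at lag 2, 0 beyond; 1 on the diagonal."""
    by_lag = {0: 1.0, 1: c1, 2: c2}
    return [[by_lag.get(abs(j - k), 0.0) for k in range(p)] for j in range(p)]


ar = ar_corr(4)
band = band_corr(4)
```

With ρ = 0.2 the AR correlations decay quickly (0.2, 0.04, 0.008, ...), whereas the Band structure keeps fairly strong dependence between immediate neighbors and none beyond lag 2.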

Besides the sgMCP penalization approach, we also analyze simulated data using (a) Sig, which is the benchmark analysis, ranks interactions based on their p-values, and selects the important ones. A parallel analysis is also conducted on the main effects. This approach does not automatically respect the “main effects, interactions” hierarchy. Approaches such as the one in Bien et al. (2015) are significance-based and respect the hierarchy. However, they are very complex to realize and have not been extensively adopted; (b) Lasso, which, for each marginal model, applies the popular Lasso penalization to the main G effect and interactions. To ensure that all G factors are analyzed on the same ground, all marginal models share the same tuning. Important interactions and main G effects are identified as those with nonzero estimates; (c) MCP, which is the same as (b) but with the MCP penalty; (d) Lassointer, which differs from (b) by applying Lasso to interactions only. This approach includes all main G effects in the model and cannot discriminate important G effects from noises; and (e) MCPinter, which is the same as (d) but with the MCP penalty. Here, comparing with Sig can directly establish the merit of the penalization strategy; comparing with Lasso and MCP can directly establish the merit of respecting the “main effects, interactions” hierarchy; and comparing with Lassointer and MCPinter can establish the benefit of jointly selecting important main G effects and interactions. We note that in the majority of existing studies, identification of important main G effects is also conducted.

It is realized that the tuning parameters of different approaches may have different implications (in Sig, the p-value cutoff is viewed as the tuning parameter). In addition, the penalization approaches have different numbers of tunings. To eliminate the impact of tuning parameter selection as much as possible, we consider a sequence of tunings, compute TP (true positive) and FP (false positive) values at each tuning, and use the AUC (area under curve) under the ROC (receiver operating characteristic) framework for comparing the identification accuracy across approaches. This evaluation approach has been adopted in quite a few recent studies (Shi et al., 2014; Meinshausen and Bühlmann, 2010). In addition, we also consider the partial AUC (pAUC) (Walter, 2015), denoted by AUCrs where r and s mark the range of the FP value. We specifically consider (r = 0, s = 0.5). Another measure we consider is Top40, defined as the number of TPs when 40 interactions (or main effects) are identified.
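The evaluation measures can be sketched as follows. This is a minimal illustration, assuming that the FP range for the partial AUC is expressed as a false positive rate, that the end points (0, 0) and (1, 1) are added to the curve, and that the area is computed by the trapezoidal rule; the function names are hypothetical.

```python
def roc_auc(fp_counts, tp_counts, n_neg, n_pos, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr].

    fp_counts and tp_counts hold the FP and TP values at each tuning value.
    """
    pts = {(0.0, 0.0), (1.0, 1.0)}
    pts.update((fp / n_neg, tp / n_pos) for fp, tp in zip(fp_counts, tp_counts))
    pts = sorted(pts)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def top_k_true_positives(selection_order, truth, k=40):
    """Number of truly important effects among the first k identified."""
    return sum(1 for effect in selection_order[:k] if effect in truth)
```

Calling `roc_auc(..., max_fpr=0.5)` gives the partial AUC over the range considered in the text, and `top_k_true_positives` corresponds to the Top40 measure when k = 40.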

Representative results using SNP measurements extracted from the diabetes data based on 200 replicates are provided in Figures 1 and 2. The rest of the results are in Appendix, where for example “C-Norm” stands for continuous response and normal error, and “B-Logistic” stands for binary response and logistic model. Note that for the ROC based evaluation, the AUC (pAUC) × 100 values are presented. It is observed that across all the simulated scenarios, the proposed approach has competitive performance. For example in Figure 1 (with detailed numerical results in Table A1), consider the scenario with continuous response and MNL error distribution (C-MNL). For the identification of interactions, the mean AUC values are 69.9 (Sig), 70.9 (Lasso), 69.4 (MCP), 72.6 (Lassointer), 72.1 (MCPinter), and 78.7 (sgMCP). Taking variation into account, we can see that the improvement of sgMCP is significant. Similar advantageous performance is observed with the pAUC and Top40 measures. For the identification of main effects, the mean AUC values are 68.4 (Sig), 68.6 (Lasso), 68.3 (MCP), and 72.7 (sgMCP). Note that Lassointer and MCPinter cannot discriminate important main G effects from noise. Similar advantageous performance is observed with the pAUC and Top40 measures. As shown in the “B-Logistic” results, the proposed approach is also advantageous with binary responses. Consider for example Figure 2 with detailed numerical results in Table A2. For the identification of interactions, the mean AUC values are 58.4 (Sig), 40.6 (Lasso), 39.7 (MCP), 44.5 (Lassointer), 44.3 (MCPinter), and 65.2 (sgMCP). For the identification of main effects, the mean AUC values are 52.2 (Sig), 27.9 (Lasso), 30.1 (MCP), and 61.3 (sgMCP). Similar advantageous performance is observed with pAUC and Top40.

Figure 1: Simulation results based on 200 replicates. n = 100, p = 500, and SNP measurements sampled from real data, with 16 important main effects and 12 important interactions. Upper/lower panels: main effects/interactions.

Figure 2: Simulation results based on 200 replicates. n = 500, p = 500, and SNP measurements sampled from real data, with 16 important main effects and 12 important interactions. Upper/lower panels: main effects/interactions.

We further examine performance of the proposed and alternative approaches with selected tunings. Representative results are provided in Table A4 (Appendix). For the Sig approach, as it is found that the numbers of TP are small at target FDR=0.1, we also consider the looser criterion with target FDR=0.5. The FP and TP values suggest competitive performance of the proposed approach with EBIC selected tunings. For example, with C-Norm and GE data, the (TP, FP) values are (4.29, 0.72) for Sig0.1 (Sig with FDR=0.1), (9.78, 16.67) for Sig0.5, (15.52, 15.7) for Lasso, (15.13, 31.64) for MCP, and (16.67, 13.25) for the proposed sgMCP. As Lassointer and MCPinter cannot select important main effects, results are not presented. When taking both TP and FP into consideration, the proposed approach is clearly advantageous.

We have also examined a few other simulation settings and observed similar patterns (results omitted). It is noted that overall the identification performance of all approaches may not be as “exciting” as in some publications. This is caused by the higher complexity of our simulated data, including weak signals, correlations among variables, etc. Comparing with alternatives, especially the significance-based method, on exactly the same ground can convincingly establish the merit of the proposed approach. It is also noted that the simulated data have dimensions lower than those in a whole genome study. Marginal analysis analyzes one gene at a time, and the summary performance is not as strongly affected by the number of genes. In addition, in practical data analysis, screening is often conducted prior to modeling to reduce dimensionality. As such, the simulated settings can be sufficient.

4. Data analysis

4.1. GENEVA diabetes data

The Health Professionals Follow-up Study (HPFS) was launched in 1986 and organized by the National Institutes of Health (NIH) as part of the Gene Environment Association Studies (GENEVA). We analyze the GENEVA type 2 diabetes data, where a major goal is to identify genetic factors that are associated with type 2 diabetes phenotypes, biomarkers, and others. In our study, data are downloaded from dbGaP (accession number phs000091.v2.p1). Here we focus on the analysis of BMI (body mass index), which is continuously distributed. BMI is the principal measure of adiposity which plays an important role in cardiovascular diseases and metabolic disorders, such as hypertension and diabetes. Following recently published studies, we take a “loose” definition of E factors. Specifically, E factors considered include age, family history of diabetes among first degree relatives (famdb), total physical activity (act), trans fat intake (trans), cereal fiber intake (ceraf), and heme iron intake (heme), all of which have been suggested as potentially associated with BMI. For G factors, we analyze SNPs on chromosome 4, which plays an important role in many disorders, such as Huntington’s disease, Parkinson’s disease, and others. Preprocessing similar to that in Wu et al. (2014) is conducted, which includes subject matching, standard quality control for SNPs, and missing data imputation. The working dataset contains 2,558 subjects with 40,568 SNPs. As the number of SNPs that are associated with BMI and related disorders and diseases is not expected to be large, we further conduct a marginal screening based on SNPs’ marginal associations with BMI. This screening may help generate more reliable findings, although we note that this step is not essential. We select the region of 10,000 consecutive SNPs with the smallest sum of p-values for downstream analysis. Here we select a region as opposed to individual SNPs to keep the physical adjacency (and hence correlation) structure.
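The region-based screening step can be sketched as a sliding-window search. The code below is a minimal illustration with a hypothetical function name; it assumes the marginal p-values are supplied in chromosomal order and returns the start of the window of consecutive SNPs with the smallest p-value sum.

```python
def best_window(pvalues, width):
    """Start index of the run of `width` consecutive SNPs with the smallest p-value sum."""
    if width > len(pvalues):
        raise ValueError("window wider than the SNP list")
    current = sum(pvalues[:width])
    best_sum, best_start = current, 0
    for start in range(1, len(pvalues) - width + 1):
        # Slide by one SNP: drop the leftmost p-value, add the new rightmost one.
        current += pvalues[start + width - 1] - pvalues[start - 1]
        if current < best_sum:
            best_sum, best_start = current, start
    return best_start
```

For the data described above, this would be called with the 40,568 marginal p-values and `width=10000`; updating a running sum keeps the search linear in the number of SNPs.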

With a continuous outcome variable, we adopt the linear regression models. With the EBIC selected tunings, the proposed approach identifies 63 main SNP effects and 115 G-E interactions. The detailed estimation results are provided in Table 1, where we also present genes that the SNPs belong to or are closest to. Note that the 0.0000’s represent estimates with small magnitudes but are actually not zero. Interactions with all six E factors are observed. Similar findings have also been made in the literature. It is clearly observed that the “main effects, interactions” hierarchy is respected. Previous literature suggests that some genes have direct effects on BMI. For example, FAM13A expression is abundant in human adipose tissues. It is shown that the expression of FAM13A is associated with adipose morphology, specifically adipocyte cell numbers (hyperplasia). FAM13A is also identified to promote fatty acid oxidation (FAO) possibly by interacting with and activating Sirtuin 1, which contributes to body fat distribution and is strongly associated with waist to hip ratio. The enzyme encoded by gene GK2 has a notable effect on the regulation of glycerol uptake and metabolism, which is associated with obesity. PPA2 may have a function in feeding behavior via controlling the phosphate level of the cell, and PPA2 is a negative regulator of the insulin metabolic signaling pathway and may contribute to abnormal BMI. The cellular m6A methylation status plays a role in the regulation of fat mass and obesity, and YTHDF1 is one of m6A readers/effector proteins. In addition, some genes can be related to BMI by affecting other diseases. A large number of studies support the hypothesis that many serious diseases may increase the risk of underweight. 
MIR1269A is involved in multiple cellular programs, including proliferation, differentiation, and apoptosis, which are some of the most important processes primarily altered during the development and progression of complex diseases, including breast cancer, endometrial cancer, lung squamous cell cancer, and melanoma. The loss of heterozygosity in NAA11 is shown to correlate with a poor prognosis in hepatocellular carcinoma patients. The lack of NAA11 expression also contributes to human cancerous tissues. NAA11 is found to co-express with NAA10, which is associated with different types of cancer, in several human cell lines. Subjects with these cancers are likely to have low BMI as they suffer from pain, depression, and surgery. Obesity in men, particularly when central, is associated with lower total testosterone, free testosterone and sex hormone-binding globulin. Those hormones are related to the high expression of C4orf22 in testis, trachea, lung, fetal lung and epididymis. Furthermore, an even higher expression of C4orf22 is shown in the condition of soft tissue tumor and muscle tissue tumor, which may cause underweight.

Table 1: Analysis of the GENEVA diabetes data using the proposed approach: identified main effects and interactions.

SNP Gene* main age famdb act trans ceraf heme
rs17090278 RP11-593F5.2 −33.9906 0.4713 0.0118 0.0000 0.0450 0.0477
rs17090286 RP11-593F5.2 −34.0006 0.4726 0.0107 0.0000 0.0381 0.0411
rs13122165 RP11-593F5.2 −12.2239 0.1648
rs17828144 RP11-593F5.2 −23.5662 0.3333
rs17085296 RP11-63H19.1 −28.9288 0.3775 −0.1094 −0.0027 −0.1187 0.0572 1.3963
rs1430504 RP11-707A18.1 −7.3531 0.0897 −0.0043 0.0002 −0.0400 0.0222 0.5266
rs6551878 RP11-707A18.1 −29.5479 0.3729 −0.0271 0.0004 −0.0992 0.0962 1.4930
rs6823601 RP11-707A18.1 −29.5479 0.3729 −0.0271 0.0004 −0.0992 0.0962 1.4930
rs13107026 MIR1269A 0.0391
rs1397755 MIR1269A 0.0925
rs13151560 MIR1269A 0.0459
rs1858306 MIR1269A 0.0202
rs10016795 MIR1269A 0.0072
rs12331987 MIR1269A 0.0504
rs10000219 MIR1269A 0.0492
rs4860208 RPS23P3 0.0844
rs1511286 RPS23P3 0.0871
rs2136822 RPS23P3 0.0032 0.0000
rs11936928 RPS23P3 0.0321 0.0001 −0.0002 0.0000 −0.0024 0.0007 0.0272
rs6838523 RPS23P3 0.0289 0.0000 −0.0002 0.0000 −0.0023 0.0007 0.0264
rs17088752 UBA6-AS1 30.1278 −0.4304
rs17088764 UBA6-AS1 0.0052 0.0000 −0.0001
rs353169 UBA6-AS1 0.0052 0.0000 −0.0001
rs10033058 YTHDC1 0.0079
rs2293595 YTHDC1 0.0172
rs17089267 YTHDC1 0.0182
rs11249477 CSN1S2AP −0.0962 0.0000 0.0001 0.0000 0.0011 0.0010
rs1399247 CSN1S2AP −0.0260
rs1717600 CSN1S2AP −0.0362
rs10003790 DCK 13.0274 −0.1394 −0.0888 −0.0017 0.1351 −0.0814 −1.7329
rs10012631 DCK 9.6975 −0.1012 −0.0677 −0.0013 0.1095 −0.0644 −1.3904
rs9790462 DCK 0.0170 −0.0001 0.0000 0.0000
rs12649753 RN7SL218P 0.0127 0.0000 0.0000 0.0000 −0.0001 0.0001 0.0023
rs7681755 LINC01088 −0.0727
rs11731223 NAA11 −0.0097
rs17003746 GK2 −0.0105
rs17003749 GK2 −0.0051
rs10004901 C4orf22 −0.0343 0.0000 −0.0002 0.0000 0.0010 0.0014
rs1391262 RP11-689K5.3 −0.0340
rs35036928 RP11-689K5.3 −0.0752
rs4693369 RP11-689K5.3 −0.0603
rs7672440 RP11-689K5.3 −0.0739
rs676592 RP11-689K5.3 −0.1460
rs1993798 RP11-689K5.3 10.0827 −0.1347 −0.0039 0.0001 −0.0138 −0.0119
rs2868257 RP11-689K5.3 6.1677 −0.0820
rs6535281 RP11-689K5.3 7.4534 −0.0984 −0.0035 0.0001 −0.0124 −0.0107
rs612318 RP11-689K5.3 −0.0542
rs1824657 RP11-689K5.3 0.0547 −0.0001 −0.0001 0.0000 −0.0003 −0.0003
rs11722328 RP11-689K5.3 0.0751 0.0000 0.0000 0.0000 −0.0045
rs2199487 RP11-689K5.3 0.0752 0.0000 0.0000 0.0000 −0.0047
rs392112 RP11-218C23.1 −0.0232
rs434193 RP11-218C23.1 −0.0219
rs6842681 RP11-218C23.1 −0.0107
rs416035 RP11-218C23.1 −0.0274
rs432755 RP11-218C23.1 −0.0139
rs375432 RP11-218C23.1 −0.0186
rs407430 RP11-218C23.1 −0.0168
rs400023 RP11-218C23.1 −0.0247
rs585787 RP11-218C23.1 −0.0033
rs3775373 FAM13A −0.0088 0.0000 0.0016
rs2726516 PPA2 0.0071 0.0000 −0.0020
rs2636739 PPA2 0.0085 0.0000 −0.0025
rs2686293 RP13-612N21.1 −0.0751
* Genes that SNPs belong to or are the closest to.

The stability of findings is evaluated using a resampling approach (Huang and Ma, 2010). Specifically, we randomly sample the subjects and apply the proposed approach to each resample. With 500 resamplings, we compute the OOI (observed occurrence index) values, which are the probabilities that interactions and main effects are identified. The OOI values are presented in Table A16 in the Appendix. It is observed that in general the findings have high to moderate OOI values. For comparison, we also compute the OOI values for interactions and main effects that are not identified by the proposed approach, and obtain the 95% confidence interval as [0.234, 0.269]. The clear separation of the OOI values between the effects identified and not identified provides support to the validity of our analysis.
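The resampling scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `observed_occurrence_index`, the user-supplied `select` callback, the subsampling fraction `frac`, and sampling without replacement are all assumptions, since the text only states that subjects are randomly sampled 500 times.

```python
import numpy as np

def observed_occurrence_index(X, y, select, n_resamples=500, frac=0.8, seed=0):
    """Fraction of resamples in which each candidate effect is identified.

    `select(X_sub, y_sub)` is a hypothetical callback that runs the
    identification method on a subsample and returns a boolean array with
    one entry per candidate effect (main effect or interaction).
    `frac` (subsample size as a fraction of n) is an assumption.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = int(round(frac * n))
    counts = None
    for _ in range(n_resamples):
        # Draw a random subset of subjects (without replacement, by assumption).
        idx = rng.choice(n, size=m, replace=False)
        picked = np.asarray(select(X[idx], y[idx]), dtype=bool)
        counts = picked.astype(int) if counts is None else counts + picked
    return counts / n_resamples
```

An effect selected in every resample gets an OOI of 1, and an effect never selected gets 0; values in between reflect how sensitive the identification is to the particular subjects sampled.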

Data analysis is also conducted using the alternative Sig, Lasso, and MCP approaches. As a large number of the SNPs are expected to be noise, and with inferior performance observed in simulation, Lassointer and MCPinter are not applied. Summary comparison results are provided in Table 3. Detailed estimation results using the alternatives are available from the authors. It is observed that different approaches make quite different discoveries. More specifically, the Sig approach identifies main effects and interactions with almost no overlap with the proposed approach. The MCP and Lasso approaches generate quite similar results, which have moderate overlaps with sgMCP. The observed differences are not surprising given the differences observed in simulation and the increased complexity of practical data.

Table 3:

Data analysis: numbers of main G effects and interactions identified by different approaches and their overlaps.

GENEVA
         Main effects                  Interactions
         Sig   MCP   Lasso  sgMCP      Sig   MCP   Lasso  sgMCP
Sig      51    0     0      3          57    0     0      0
MCP      -     24    19     19         -     103   77     11
Lasso    -     -     20     19         -     -     83     11
sgMCP    -     -     -      63         -     -     -      115

SKCM
         Main effects                  Interactions
         Sig   MCP   Lasso  sgMCP      Sig   MCP   Lasso  sgMCP
Sig      27    0     0      0          173   0     0      0
MCP      -     63    63     33         -     60    59     5
Lasso    -     -     66     36         -     -     66     5
sgMCP    -     -     -      46         -     -     -      110

4.2. TCGA skin cutaneous melanoma data

The Cancer Genome Atlas (TCGA), as a hallmark cancer genetic program organized by the National Cancer Institute (NCI), has published high quality genetic, epigenetic, transcriptomic, and proteomic data. In this study, we consider skin cutaneous melanoma (SKCM) and download the processed level 3 data from TCGA Provisional using the R package cgdsr. The outcome of interest is (censored) overall survival, whose importance has been established in multiple publications. For G factors, we consider mRNA gene expressions. The analyzed E factors include Age, AJCC nodes pathologic stage (PN), Gender, Breslow’s depth, and Clark level, all of which have been extensively studied in the literature. In TCGA, gene expression measurements are z-scores, which have been lowess-normalized, log-transformed and median-centered, and quantify the relative expressions of tumor samples with respect to normals. Data are available on 298 subjects and 18,934 gene expressions. Among the subjects, 152 died during follow-up. Marginal screening is also conducted, and the 10,000 genes with the strongest marginal associations with survival are selected for G-E interaction analysis.

With a censored survival outcome, we adopt the AFT model with the Kaplan-Meier weighted estimation approach (Stute, 1996), which assigns a nonzero weight to each event and a zero weight to each censored subject. Details on estimation under the AFT model are provided in the Appendix. It can be seen that the algorithm developed for linear models with uncensored data can be applied here with minor modifications. The proposed approach identifies 46 main G effects and 110 G-E interactions. The detailed estimation results are provided in Table 2. Interactions with all five E factors are identified, and the hierarchy is clearly observed.

Published literature suggests that the identification results are biologically sensible. Some genes are found to be directly related to melanoma. Melanoma cell response to a hypoxic environment is transcriptionally regulated by HIF1A, and HIF1A has a higher expression in malignant melanoma, which suggests that it may play an important role in the generation and development of malignant melanoma. The expression level of CUL2 can be used to discriminate between two classes of uveal melanomas. RASSF8 expression is low in metastatic melanoma cells and decreases with melanoma progression. RASSF8 can induce apoptosis in melanoma cells by activating the P53-P21 pathway, and in vivo studies demonstrate that inhibiting RASSF8 increases the tumorigenic properties of human melanoma xenografts. EWSR1 encodes an RNA-binding protein and is involved in the recurrent translocations associated with a number of sarcomas including melanoma. KIF5B contributes to the outward transport of melanosomes and regulates tyrosinase expression level. In addition, KIF5B regulates the intracellular distribution of melanosomes, and the distribution of KIF5B is consistent with that of melanosomes. Targeting KIF5B can reduce melanosome transport and promote melanogenesis. The APC gene promoter is methylated in 60% of primary cutaneous melanomas and 90% of metastases. Reducing APC expression to a certain level may contribute to the development of malignant melanoma. The p53-inducible RRM2B (also named p53R2) gene plays a crucial role in DNA repair and synthesis after DNA damage. Its expression and activity are associated with the resistance of human cancer cells to anticancer agents. RRM2B expression is strongly associated with the progression of melanoma, and the proliferation of melanoma cells is inhibited by RRM2B silencing.

Some of the identified genes may be indirectly related to melanoma by affecting other important factors. For example, the increased expression level of gene CASC4 is associated with HER-2/neu proto-oncogene overexpression. CCPG1 may be involved in cell cycle regulation. EEA1 is involved in phagocytic processes and regulates membrane fusion. The product of gene LRP12 is a transmembrane protein that is differentially expressed in many cancer cells. Translocation of GOLGA5 has been found in tumor tissues.

Table 2:

Analysis of the TCGA SKCM data using the proposed approach: identified main effects and interactions.

Gene Age PN Gender Breslow’s depth Clark level
ZNF25 7.0336 −0.0876 −0.1225 −0.7268 0.0240
ZNF37A 0.0445 −0.0003 −0.0122 −0.0118 0.0028 0.0090
PJA2 0.0178
KIF5B 0.1047 −0.0001 −0.0141 −0.0153 0.0021
NUDT4 0.0259
KTN1 0.0961 0.0001 −0.0080 −0.0014 0.0006
LRP12 0.0477
TRIP11 0.0428 0.0000 −0.0059 −0.0021 0.0007
EEA1 0.0544
ARHGAP12 6.5295 −0.0752 −0.0825 −1.0930 0.0243
APC 0.0050 0.0000 0.0001 −0.0005
FNDC3A 0.0005 0.0000 0.0000
EIF5 4.7374 −0.0737 −0.1937 0.2275 0.0030 0.6600
DGLUCY 0.0597 0.0000 0.0037 −0.0113
ARHGAP5 0.1040
RRM2B 0.0062 0.0000 −0.0010
UHRF1BP1L 0.0209 0.0000 −0.0005 −0.0002 0.0000
SEL1L 0.0548
VPS13C 0.0275 0.0000 −0.0005
PDP2 0.0014 0.0000 0.0000 0.0000 0.0000
GOLGA5 5.7936 −0.0713 0.0098 −0.3226
HSPA13 0.0019 0.0000 −0.0002 0.0001 0.0001
RASSF8 0.0457
EFR3A 0.0016 0.0000 −0.0001 0.0000 0.0000
PCNX1 0.0194 0.0000 −0.0022 0.0004 0.0004
PPM1A 0.0583 0.0000 −0.0002 0.0000 0.0001
DNAL1 0.0276 −0.0001 −0.0030 −0.0084 0.0008
CUL2 0.0023 0.0000 −0.0005 −0.0002 0.0001
ZMYND11 0.1752 −0.0004 0.0007 −0.0525 0.0003
MPP5 0.0176 0.0000 −0.0003 0.0000 0.0000
CCPG1 0.0332
HIF1A 0.0020
PNMA1 0.0271 −0.0001 0.0006 −0.0060
EXOC5 0.0098 0.0000 −0.0011 0.0000 0.0002
GSKIP 3.0672 −0.0386 −0.0428 −0.0055 0.0056
ARL1 0.0159
EWSR1 −0.0036 0.0000 0.0005
CASC4 0.0143
RCC2 −0.0335
ARHGAP21 0.0534
JKAMP 0.0901
ACBD5 0.0463 −0.0001 −0.0066 −0.0148 0.0016
SETD3 0.0063 0.0000 −0.0007 −0.0009 0.0001
EHMT1 −0.0031
RAB18 0.0127 0.0000 −0.0005 −0.0003 0.0001
CREG1 0.0006

Identification stability is evaluated in the same manner as for the diabetes dataset. The OOI values for the identified main effects and interactions are provided in Table A17. For those effects not identified by the proposed approach, the 95% confidence interval of the OOI values is [0.139, 0.180]. The patterns are similar to those for the previous data, with the effects identified by the proposed approach having high to moderate OOIs, and with a clear separation of OOI values between the identified and non-identified effects.

The data are also analyzed using the alternative approaches. The summary comparison results are provided in Table 3, and detailed estimation results using the alternatives are available from the authors. It is observed that Sig generates findings with no overlap with the penalization approaches, Lasso and MCP generate highly similar findings, and the proposed approach has moderate overlap with Lasso and MCP in main G effect identification but small overlap in interaction identification.

5. Conclusion

The importance of G-E interaction analysis and demand for new and more effective methods have been well established in the literature. This study has developed a new marginal G-E interaction analysis method, which adopts penalization to respect the “main effects, interactions” hierarchy and achieve regularized estimation. It is advantageous over approaches that do not respect the hierarchy and easier to implement than Bien et al. (2015) and others. It also has an analysis framework more coherent with that of penalized joint interaction analysis, which has been popular in the past few years. In simulation, it demonstrates notable superiority over the competitors. In data analysis, it makes findings that differ from those of the alternatives and that have sound biological implications and satisfactory stability.

Marginal and joint G-E interaction analyses have different objectives. However, it is still of great interest and importance to have coherent analysis frameworks and to “borrow strength” across analyses. This study may mark an important step towards that. There have been great developments in joint G-E interaction analysis in recent literature. It can be of interest to continue “migrating” some of these methods to marginal analysis. The MCP-based penalization adopted in this study can be potentially replaced by other regularization methods that also respect the hierarchy. It can also be of interest to extend our simulation to other types of response, for example count data. Although bioinformatics and statistical evaluations have been conducted with the data analysis results, it is crucial to further validate the findings.

Acknowledgements

We thank the editor and reviewers for their careful review and insightful comments, which have led to a significant improvement of this paper. This study was supported by the University of Chinese Academy of Sciences (Y95401TXX2), Fundamental Research Funds for the Central Universities (20720171064, 20720181003), Humanity and Social Science Youth Foundation of Ministry of Education of China (19YJC910010), and National Institutes of Health (R01CA204120, P50CA121974).

Appendix

Details for Step (iii) (update $\hat{b}$)

Again, to simplify notation, we suppress the dependence on $j$ and hence the subscript $j$. The update of $\hat{b}$ proceeds as follows. With a fixed $\hat{\alpha}$, consider the problem with loss function

$$L(b)=\frac{1}{2n}\|r-Wb\|^{2}+\rho\left(\|b\|;\sqrt{q+1}\,\lambda_{1},\xi\right)+\sum_{k=2}^{q+1}\rho\left(|b_{k}|;\lambda_{2},\xi\right), \tag{A.1}$$

where $W$ is an $n\times(q+1)$ matrix and $r=Y-X\hat{\alpha}$. Assume that $W$ has been orthogonalized with $\frac{1}{n}W^{\top}W=I_{q+1}$. In practice, this can be achieved via Cholesky decomposition and proper transformations.
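The orthogonalization mentioned above can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function name `orthogonalize` is hypothetical, and $W$ is assumed to have full column rank so that the Cholesky factor of $\frac{1}{n}W^{\top}W$ exists.

```python
import numpy as np

def orthogonalize(W):
    """Return (W_tilde, R) with W_tilde = W @ inv(R) and W_tilde.T @ W_tilde / n = I.

    Assumes W has full column rank (an assumption of this sketch).
    """
    n = W.shape[0]
    gram = W.T @ W / n
    L = np.linalg.cholesky(gram)   # gram = L @ L.T, with L lower triangular
    R = L.T                        # so gram = R.T @ R
    # Solve R.T @ Z = W.T for Z = inv(R.T) @ W.T, then transpose to get W @ inv(R).
    W_tilde = np.linalg.solve(R.T, W.T).T
    return W_tilde, R
```

Because $Wb=\tilde{W}(Rb)$, a coefficient vector $\tilde{b}$ estimated on the orthogonalized scale maps back to the original scale through $b=R^{-1}\tilde{b}$.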

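By setting the first order derivative of (A.1) to zero, we have

$$\frac{\partial L(b)}{\partial b}=-\frac{1}{n}W^{\top}r+b+\frac{b}{\|b\|}\left\{\begin{array}{ll}\sqrt{q+1}\,\lambda_{1}-\frac{\|b\|}{\xi}, & \text{if } \|b\|\le\xi\sqrt{q+1}\,\lambda_{1}\\ 0, & \text{if } \|b\|>\xi\sqrt{q+1}\,\lambda_{1}\end{array}\right\}+h=0, \tag{A.2}$$

where

$$h=\left(0,\ \operatorname{sgn}(b_{2})\left\{\begin{array}{ll}\lambda_{2}-\frac{|b_{2}|}{\xi}, & \text{if } |b_{2}|\le\xi\lambda_{2}\\ 0, & \text{if } |b_{2}|>\xi\lambda_{2}\end{array}\right\},\ \ldots,\ \operatorname{sgn}(b_{q+1})\left\{\begin{array}{ll}\lambda_{2}-\frac{|b_{q+1}|}{\xi}, & \text{if } |b_{q+1}|\le\xi\lambda_{2}\\ 0, & \text{if } |b_{q+1}|>\xi\lambda_{2}\end{array}\right\}\right)^{\top}, \tag{A.3}$$

and sgn(·) is the sign function. We can rewrite (A.2) as

$$-\mu+g(b)\,b+h=0, \tag{A.4}$$

where $\mu=\frac{1}{n}W^{\top}r$ and

$$g(b)=1+\frac{1}{\|b\|}\left\{\begin{array}{ll}\sqrt{q+1}\,\lambda_{1}-\frac{\|b\|}{\xi}, & \text{if } \|b\|\le\xi\sqrt{q+1}\,\lambda_{1}\\ 0, & \text{if } \|b\|>\xi\sqrt{q+1}\,\lambda_{1}\end{array}\right\}.$$

Denote $\mu_{k}$ as the $k$th element of $\mu$. To solve this equation, we first fix $g(b)$ at the current estimate $\hat{b}$, and denote $\hat{g}=g(\hat{b})$. Then we can update the solution to (A.4) as

$$\hat{g}\,b_{1}=\mu_{1},\qquad \hat{g}\,b_{k}=\left\{\begin{array}{ll}\dfrac{S(\mu_{k},\lambda_{2})}{1-\frac{1}{\xi\hat{g}}}, & \text{if } |\mu_{k}|\le\xi\lambda_{2}\hat{g}\\ \mu_{k}, & \text{if } |\mu_{k}|>\xi\lambda_{2}\hat{g}\end{array}\right\},\quad k=2,\ldots,q+1, \tag{A.5}$$

where $S(\mu,\lambda)=\left(1-\frac{\lambda}{\|\mu\|}\right)_{+}\mu$ is the soft-thresholding operator, with $\|\mu\|=|\mu|$ for a scalar argument. Further, we set

$$\hat{\nu}=\hat{g}\,b, \tag{A.6}$$

with elements given by (A.5). Taking $\hat{\nu}$ back into the definition of $g(b)$, we have

$$b+\frac{b}{\|b\|}\left\{\begin{array}{ll}\sqrt{q+1}\,\lambda_{1}-\frac{\|b\|}{\xi}, & \text{if } \|b\|\le\xi\sqrt{q+1}\,\lambda_{1}\\ 0, & \text{if } \|b\|>\xi\sqrt{q+1}\,\lambda_{1}\end{array}\right\}=\hat{\nu}. \tag{A.7}$$

Solving the above equation, we can obtain the final estimate as

$$\hat{b}=\left\{\begin{array}{ll}\dfrac{\xi}{\xi-1}\,S\left(\hat{\nu},\sqrt{q+1}\,\lambda_{1}\right), & \text{if } \|\hat{\nu}\|\le\xi\sqrt{q+1}\,\lambda_{1}\\ \hat{\nu}, & \text{if } \|\hat{\nu}\|>\xi\sqrt{q+1}\,\lambda_{1}\end{array}\right\}. \tag{A.8}$$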
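As a rough illustration, one pass of the update in (A.5) to (A.8) might be coded as below. This is a sketch under assumptions, not the authors' implementation: the function names are hypothetical, the soft-thresholding operator is taken to be $S(z,\lambda)=(1-\lambda/\|z\|)_{+}z$, $W$ is assumed orthogonalized so that $\mu=\frac{1}{n}W^{\top}r$, $\xi>1$, and the current estimate `b_cur` is assumed nonzero so that $g(\hat{b})$ is defined.

```python
import numpy as np

def soft_scalar(m, lam):
    """Scalar soft-thresholding: (1 - lam/|m|)_+ * m."""
    return np.sign(m) * max(abs(m) - lam, 0.0)

def soft_vector(z, lam):
    """Group soft-thresholding: (1 - lam/||z||)_+ * z."""
    norm = np.linalg.norm(z)
    if norm == 0.0:
        return np.zeros_like(z)
    return max(0.0, 1.0 - lam / norm) * z

def g_of(b, lam1, xi):
    """g(b) from (A.4); requires b != 0 (assumption of this sketch)."""
    norm = np.linalg.norm(b)
    if norm == 0.0:
        raise ValueError("g(b) is undefined at b = 0")
    t = np.sqrt(b.size) * lam1           # sqrt(q+1) * lambda_1
    return 1.0 + (t - norm / xi) / norm if norm <= xi * t else 1.0

def update_b(mu, b_cur, lam1, lam2, xi):
    """One pass of (A.5)-(A.8) given mu = W'r/n and the current estimate b_cur."""
    mu = np.asarray(mu, dtype=float)
    g = g_of(np.asarray(b_cur, dtype=float), lam1, xi)
    nu = np.empty_like(mu)
    nu[0] = mu[0]                        # (A.5): the main effect has no lambda_2 penalty
    for k in range(1, mu.size):          # (A.5): interaction terms
        if abs(mu[k]) <= xi * lam2 * g:
            nu[k] = soft_scalar(mu[k], lam2) / (1.0 - 1.0 / (xi * g))
        else:
            nu[k] = mu[k]
    t = np.sqrt(mu.size) * lam1
    if np.linalg.norm(nu) <= xi * t:     # (A.8), group-level MCP thresholding
        return xi / (xi - 1.0) * soft_vector(nu, t)
    return nu
```

With both tuning parameters set to zero the update returns $\mu$ unchanged, and a large $\lambda_{1}$ shrinks the whole vector (main effect and interactions together) to zero, which is how the hierarchy is respected. In a full algorithm this pass would be repeated, re-evaluating $g$ at each new estimate.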

Estimation under the AFT model

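For subject $i$, denote $T_{i}$ as the survival time of interest. For $T_{i}$, consider the accelerated failure time (AFT) model

$$\log(T_{i})=\alpha_{0}+X_{i\cdot}\alpha+W_{i\cdot}b+\epsilon_{i}. \tag{A.9}$$

Denote $C_{i}$ as the censoring time for subject $i$. Under right censoring, we observe $Y_{i}=\log(\min(T_{i},C_{i}))$ and $\delta_{i}=I(T_{i}\le C_{i})$. Assume that data $\{(Y_{i},X_{i\cdot},W_{i\cdot}),\ i=1,\ldots,n\}$ have been sorted according to $Y_{i}$ from the smallest to the largest. Consider the following weighted least squares loss function

$$\frac{1}{2n}\sum_{i=1}^{n}w_{i}\left[Y_{i}-\alpha_{0}-X_{i\cdot}\alpha-W_{i\cdot}b\right]^{2}, \tag{A.10}$$

where the $w_{i}$’s are the Kaplan-Meier weights defined as

$$w_{1}=\frac{\delta_{1}}{n},\qquad w_{i}=\frac{\delta_{i}}{n-i+1}\prod_{l=1}^{i-1}\left(\frac{n-l}{n-l+1}\right)^{\delta_{l}},\quad i=2,\ldots,n.$$

With a slight abuse of notation, consider

$$Y_{i}=\sqrt{w_{i}}\,(Y_{i}-\bar{Y}),\qquad X_{i\cdot}=\sqrt{w_{i}}\,(X_{i\cdot}-\bar{X}),\qquad W_{i\cdot}=\sqrt{w_{i}}\,(W_{i\cdot}-\bar{W}), \tag{A.11}$$

where $\bar{Y}=\frac{\sum_{i=1}^{n}w_{i}Y_{i}}{w_{s}}$, $\bar{X}=\frac{\sum_{i=1}^{n}w_{i}X_{i\cdot}}{w_{s}}$, $\bar{W}=\frac{\sum_{i=1}^{n}w_{i}W_{i\cdot}}{w_{s}}$, and $w_{s}=\sum_{i=1}^{n}w_{i}$. Then loss function (A.10) can be written as

$$\frac{1}{2n}\|Y-X\alpha-Wb\|_{2}^{2},$$

where notations have similar definitions as in the main text.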
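The Kaplan-Meier weights and the weighted centering in (A.11) can be sketched as follows. The function names `km_weights` and `weighted_center` are hypothetical, and the inputs are assumed to be already sorted by $Y_{i}$ in ascending order, as stated above.

```python
import numpy as np

def km_weights(delta):
    """Kaplan-Meier weights; `delta` (event indicators) must be sorted by Y ascending."""
    delta = np.asarray(delta, dtype=float)
    n = delta.size
    w = np.zeros(n)
    prod = 1.0  # running product of ((n-l)/(n-l+1))^delta_l over l < i
    for i in range(1, n + 1):  # 1-based index, as in the text
        w[i - 1] = delta[i - 1] / (n - i + 1) * prod
        prod *= ((n - i) / (n - i + 1)) ** delta[i - 1]
    return w

def weighted_center(Y, X, W, w):
    """Apply (A.11): subtract weighted means, then scale rows by sqrt(w_i)."""
    ws = w.sum()
    sw = np.sqrt(w)
    Yc = sw * (Y - (w @ Y) / ws)
    Xc = sw[:, None] * (X - (w @ X) / ws)
    Wc = sw[:, None] * (W - (w @ W) / ws)
    return Yc, Xc, Wc
```

With no censoring every weight equals $1/n$, and a censored subject always receives weight zero. After the transformation, the intercept drops out and the weighted loss (A.10) becomes an ordinary least squares loss, so the algorithm for uncensored linear models applies to the transformed data.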

Table A1:

Simulation results, mean (sd), based on 200 replications. n = 100, p = 500, and SNP data from real data, with 16 main effects and 12 interactions.

Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
C-Norm Sig 71.6(0.06) 53.8(0.07) 5.3(1.37) 75.6(0.07) 59.2(0.1) 2.4(1.12)
Lasso 71.8(0.05) 52.7(0.07) 5.5(1.35) 76.6(0.07) 60.3(0.09) 2.7(1.18)
MCP 71.4(0.05) 52.4(0.07) 5.4(1.32) 74.5(0.07) 56.8(0.09) 2.5(1.13)
Lassointer 0(0) 0(0) 1.3(0) 70.2(0.08) 51.3(0.1) 1.9(1.07)
MCPinter 0(0) 0(0) 1.3(0) 68.1(0.07) 48.6(0.1) 1.8(0.88)
sgMCP 76.1(0.05) 55.4(0.07) 5.7(1.53) 84.9(0.05) 66.9(0.11) 3.6(1.62)
C-MNL Sig 68.4(0.06) 49.1(0.08) 4.3(1.34) 69.9(0.08) 51.3(0.11) 1.6(0.95)
Lasso 68.6(0.07) 48.1(0.09) 4.5(1.31) 70.9(0.07) 52(0.1) 1.9(1.09)
MCP 68.3(0.06) 48(0.08) 4.5(1.33) 69.4(0.08) 49.8(0.1) 1.8(1.02)
Lassointer 0(0) 0(0) 1.3(0) 72.6(0.07) 54.2(0.1) 2(1.07)
MCPinter 0(0) 0(0) 1.3(0) 72.1(0.07) 53.6(0.09) 2(1.09)
sgMCP 72.7(0.06) 49.9(0.09) 4.7(1.48) 78.7(0.08) 56.8(0.12) 2.2(1.34)
C-t Sig 67.6(0.07) 48.5(0.09) 4.6(1.44) 61.8(0.1) 40.1(0.12) 1(0.82)
Lasso 67.4(0.07) 46.5(0.09) 4.5(1.51) 62.3(0.09) 40.1(0.11) 1.1(0.92)
MCP 67.2(0.08) 46.4(0.1) 4.5(1.53) 61.3(0.1) 38.9(0.11) 1(0.77)
Lassointer 0(0) 0(0) 1.3(0) 64.4(0.1) 43.6(0.12) 1.4(1.12)
MCPinter 0(0) 0(0) 1.3(0) 63.4(0.1) 42.2(0.12) 1.3(1.01)
sgMCP 71(0.07) 48.7(0.1) 4.7(1.49) 71.5(0.08) 47.1(0.12) 1.8(1.15)
B-Logistic Sig 49.4(0.07) 24.2(0.08) 1.2(0.94) 49(0.1) 23.5(0.1) 0.3(0.54)
Lasso 18.9(0.13) 20.3(0.1) 1.4(1.06) 25.8(0.13) 24.3(0.1) 0.3(0.45)
MCP 19(0.13) 20.4(0.1) 1.4(1.12) 16.6(0.13) 18.8(0.11) 0.3(0.49)
Lassointer 0(0) 0(0) 1.3(0) 39.4(0.1) 23.3(0.09) 0.2(0.46)
MCPinter 0(0) 0(0) 1.3(0) 37.2(0.1) 23.5(0.09) 0.2(0.45)
sgMCP 54.5(0.07) 26.1(0.08) 1.6(1.15) 59.4(0.12) 33.4(0.14) 0.5(0.88)

Table A2:

Simulation results, mean (sd), based on 200 replications. n = 500, p = 500, and SNP data from real data, with 16 main effects and 12 interactions.

Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
C-Norm Sig 95.8(0.02) 90.9(0.03) 12.4(1.16) 93.5(0.04) 87.1(0.05) 6.2(1.05)
Lasso 95(0.02) 87.7(0.04) 12.1(1.23) 92.4(0.04) 85.6(0.06) 6.5(1.24)
MCP 95(0.02) 88.3(0.04) 12.1(1.22) 91.9(0.04) 84.6(0.06) 6.4(1.21)
Lassointer 0(0) 0(0) 1.3(0) 93(0.04) 87.4(0.06) 7.7(1.24)
MCPinter 0(0) 0(0) 1.3(0) 92.1(0.04) 85.3(0.06) 7.3(1.27)
sgMCP 96.5(0.01) 90.8(0.04) 12.5(1.6) 97.3(0.01) 92.5(0.03) 6.2(1.21)
C-MNL Sig 87.5(0.03) 77.9(0.05) 10.1(1.22) 93.8(0.04) 87.7(0.06) 6.4(1.06)
Lasso 86.9(0.03) 75.7(0.05) 9.8(1.22) 93.4(0.03) 86.9(0.05) 7(1.38)
MCP 86.9(0.04) 75.7(0.05) 9.7(1.24) 92.6(0.04) 85.6(0.06) 6.6(1.31)
Lassointer 0(0) 0(0) 1.3(0) 89.6(0.04) 80.9(0.06) 6.1(0.96)
MCPinter 0(0) 0(0) 1.3(0) 89.1(0.04) 80(0.06) 5.9(1)
sgMCP 90.8(0.03) 79.2(0.06) 10.4(1.52) 96.7(0.02) 86.3(0.11) 7.6(1.3)
C-t Sig 84.1(0.09) 72.2(0.14) 8.6(2.33) 82(0.09) 69(0.14) 3.6(1.57)
Lasso 83.7(0.09) 69.9(0.13) 8.3(2.2) 82.5(0.09) 69.2(0.14) 3.9(1.73)
MCP 83.6(0.09) 69.7(0.13) 8.3(2.21) 81.4(0.09) 68.1(0.13) 3.9(1.71)
Lassointer 0(0) 0(0) 1.3(0) 84.3(0.08) 72.4(0.12) 4.7(1.66)
MCPinter 0(0) 0(0) 1.3(0) 83.5(0.08) 71.1(0.12) 4.3(1.65)
sgMCP 86.9(0.08) 71.8(0.14) 8.8(2.71) 90.6(0.08) 77.1(0.14) 4.5(1.69)
B-Logistic Sig 52.2(0.07) 27.5(0.08) 1.6(1.25) 58.4(0.08) 34.9(0.11) 0.7(0.65)
Lasso 27.9(0.1) 33.7(0.1) 2.9(1.23) 40.6(0.09) 38.5(0.1) 1.1(0.9)
MCP 30.1(0.09) 32(0.08) 2.8(1.26) 39.7(0.09) 33.6(0.1) 0.9(0.77)
Lassointer 0(0) 0(0) 1.3(0) 44.5(0.1) 38.7(0.1) 0.8(0.8)
MCPinter 0(0) 0(0) 1.3(0) 44.3(0.09) 37.9(0.11) 0.8(0.8)
sgMCP 61.3(0.07) 35.2(0.08) 2.7(1.38) 65.2(0.12) 44.5(0.15) 1.5(1.49)

Table A3:

Simulation results, mean (sd), based on 200 replications. p = 500 and GE data with AR correlation, with 16 main effects and 12 interactions.

n Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
n=100 C-Norm Sig 65.5(0.08) 44.4(0.11) 3.8(1.68) 59.2(0.07) 37(0.08) 0.8(0.7)
Lasso 75.4(0.07) 61(0.09) 6.5(1.4) 68.5(0.07) 51.8(0.09) 2.9(1.01)
MCP 69(0.08) 51.6(0.11) 5.4(1.82) 61.4(0.07) 42.1(0.08) 2.4(0.94)
Lassointer 0(0) 0(0) 1.3(0) 67.2(0.07) 47(0.09) 1.8(0.89)
MCPinter 0(0) 0(0) 1.3(0) 66.6(0.07) 46.2(0.09) 1.7(0.89)
sgMCP 80.2(0.05) 60.4(0.08) 6.4(1.21) 84.1(0.06) 63(0.11) 4.4(1.57)
C-MNL Sig 63.5(0.07) 42.5(0.1) 3.7(1.51) 57.4(0.09) 33.4(0.11) 0.5(0.52)
Lasso 72.6(0.07) 56.4(0.1) 6.1(1.71) 66.4(0.08) 50.1(0.1) 2.2(1.09)
MCP 66.4(0.07) 48.4(0.09) 5.3(1.5) 59.1(0.08) 37.9(0.11) 1.9(1.04)
Lassointer 0(0) 0(0) 1.3(0) 65.2(0.06) 44.8(0.08) 1.7(0.77)
MCPinter 0(0) 0(0) 1.3(0) 64.4(0.06) 44.1(0.08) 1.7(0.78)
sgMCP 80(0.06) 60.3(0.1) 6.4(1.76) 81.2(0.07) 58.7(0.13) 3.1(1.28)
C-t Sig 60.7(0.08) 37.8(0.1) 3.1(1.58) 53.8(0.09) 31.4(0.11) 0.8(0.77)
Lasso 67.8(0.08) 49.9(0.11) 4.7(1.8) 61(0.09) 42.6(0.11) 1.8(1.1)
MCP 64.6(0.08) 45.9(0.11) 4.5(1.59) 54.9(0.07) 34.7(0.08) 1.6(0.94)
Lassointer 0(0) 0(0) 1.3(0) 63.2(0.08) 42.2(0.11) 1.2(0.92)
MCPinter 0(0) 0(0) 1.3(0) 62.4(0.09) 40.9(0.11) 1.2(0.93)
sgMCP 75(0.07) 51.6(0.1) 4.8(1.86) 78.2(0.11) 57.6(0.15) 2.6(1.85)
B-Logistic Sig 58.6(0.07) 35.4(0.09) 2.5(1.29) 52.3(0.09) 27.7(0.11) 0.4(0.53)
Lasso 64.8(0.07) 47.7(0.09) 4.6(1.32) 54.6(0.09) 36.6(0.11) 1.1(0.92)
MCP 61(0.08) 44.3(0.1) 4.5(1.33) 50.2(0.09) 30.1(0.11) 1.1(0.96)
Lassointer 0(0) 0(0) 1.3(0) 62(0.07) 42.2(0.09) 1.1(0.82)
MCPinter 0(0) 0(0) 1.3(0) 61.3(0.07) 41.6(0.09) 1.1(0.8)
sgMCP 72.6(0.06) 48.2(0.09) 4.4(1.32) 69.2(0.09) 42.1(0.12) 1.6(1.16)
n=500 C-Norm Sig 92.9(0.04) 86.8(0.05) 11.6(1.27) 79.1(0.05) 64(0.08) 3(0.9)
Lasso 96(0.02) 93.1(0.03) 13.3(1.15) 88.9(0.06) 83.2(0.07) 7.1(1.36)
MCP 92.2(0.04) 85.5(0.05) 11.3(1.59) 76.4(0.05) 60.5(0.06) 4.7(0.98)
Lassointer 0(0) 0(0) 1.3(0) 82.8(0.05) 70.3(0.07) 5.4(0.85)
MCPinter 0(0) 0(0) 1.3(0) 82.4(0.05) 69.7(0.07) 5.4(0.86)
sgMCP 98.4(0.01) 93(0.04) 13.1(1.72) 99(0.01) 85.9(0.13) 9.6(1.68)
C-MNL Sig 87.4(0.04) 78.1(0.06) 9.9(1.73) 80.8(0.07) 66.4(0.1) 3.4(1.2)
Lasso 93.1(0.03) 88.9(0.04) 12.4(1.42) 89.6(0.06) 83.7(0.09) 6.8(1.5)
MCP 87.8(0.05) 79.4(0.07) 10.4(1.73) 81.3(0.07) 69.2(0.1) 5.6(1.5)
Lassointer 0(0) 0(0) 1.3(0) 87.3(0.05) 78.1(0.07) 6.7(0.92)
MCPinter 0(0) 0(0) 1.3(0) 86.7(0.05) 76.7(0.07) 6.5(0.89)
sgMCP 96.3(0.02) 89.8(0.06) 12.6(1.72) 98.2(0.01) 91.9(0.07) 8.9(1.52)
C-t Sig 80.3(0.07) 66.9(0.1) 7.3(2) 74.8(0.07) 58.1(0.09) 2.2(1.22)
Lasso 88.3(0.05) 79.7(0.08) 10.5(2.09) 85.7(0.07) 78.4(0.1) 6.3(1.73)
MCP 83.3(0.06) 72.8(0.09) 9.4(2.06) 74.8(0.06) 59.3(0.08) 4.2(1.13)
Lassointer 0(0) 0(0) 1.3(0) 77.6(0.07) 61.6(0.09) 3.6(0.96)
MCPinter 0(0) 0(0) 1.3(0) 77.4(0.07) 61.2(0.09) 3.6(0.94)
sgMCP 95.2(0.04) 86.2(0.09) 11.2(2.09) 97(0.04) 82.9(0.16) 8(1.98)
B-Logistic Sig 79.9(0.06) 65.6(0.08) 7.2(1.67) 64.8(0.08) 43.5(0.1) 1.1(0.86)
Lasso 81.8(0.05) 79.1(0.06) 10.4(1.45) 69.5(0.08) 62.9(0.11) 4.7(1.33)
MCP 77.1(0.06) 70.5(0.07) 9.3(1.79) 59.7(0.07) 47.3(0.09) 3.4(1.07)
Lassointer 0(0) 0(0) 1.3(0) 73.8(0.07) 62(0.09) 3.5(1.09)
MCPinter 0(0) 0(0) 1.3(0) 73.5(0.07) 61.4(0.09) 3.5(1.1)
sgMCP 93.3(0.03) 82.9(0.06) 10.2(1.41) 94.8(0.03) 79.2(0.13) 7.3(1.54)

Table A4:

Simulation: mean (sd) of TP and FP values for main effects and interactions combined. p = 500, n = 500, and AR Correlation. 16 main effects and 12 interactions.

Setting Method FP(SNP) TP(SNP) FP(GE) TP(GE)
C-Norm Sig0.1 0.56(0.96) 3.96(2.62) 0.72(1.07) 4.29(2.18)
Sig0.5 9.69(12.94) 9.65(3.09) 16.67(22.64) 9.78(3.03)
Lasso 18.69(16.79) 18.3(2.56) 15.7(6.92) 15.52(2.04)
MCP 126.95(57.53) 20.41(2.33) 31.64(20.47) 15.13(2.39)
sgMCP 14.47(4.36) 19.83(2.45) 13.25(3.39) 16.67(2.61)
C-MNL Sig0.1 0.46(0.9) 1.44(1.62) 0.63(1.35) 1.91(1.66)
Sig0.5 10.21(19.34) 4.57(3.22) 18.63(31.17) 6.23(3.55)
Lasso 19.6(19.82) 14.47(3.16) 16.22(8.89) 17.99(2.22)
MCP 120.57(71.23) 15.25(3.19) 28.51(23.34) 11.63(2.45)
sgMCP 12.26(3.78) 15.85(3.26) 13.97(3.63) 17.82(2.76)
C-t Sig0.1 4.79(18.64) 0.35(0.66) 5.75(31.64) 1.29(1.54)
Sig0.5 45.85(140.55) 1.74(2.43) 35.3(96.29) 4.62(3.28)
Lasso 11.88(16.93) 8.08(3.28) 17.64(14.04) 11.17(3.66)
MCP 19.89(38.82) 8.3(3.33) 23.47(24.5) 8.52(2.77)
sgMCP 14.66(26.34) 9.6(3.81) 12.53(14.92) 12.73(3.5)
B-Logistic Sig0.1 0.12(0.44) 0(0) 0.4(1.03) 1.38(1.52)
Sig0.5 8.16(22.92) 0.2(0.76) 11(8.39) 6.34(2.88)
Lasso 5.68(3.53) 1.44(1.34) 7.56(2.65) 9.56(2.98)
MCP 4.92(3.28) 1.38(1.01) 6(2.57) 8.94(2.15)
sgMCP 6.58(3.87) 1.22(1) 9.3(3.7) 11.62(3.02)

Table A5:

Simulation results, mean (sd), based on 200 replications. p = 500 and GE data with Band structure, with 16 main effects and 12 interactions.

n Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
n=100 C-Norm Sig 70.4(0.06) 51.6(0.09) 5(1.56) 57.2(0.09) 33.7(0.11) 0.6(0.66)
Lasso 81.6(0.06) 70.4(0.07) 8.5(1.28) 65.7(0.09) 48.9(0.11) 2.4(1.21)
MCP 74.4(0.06) 58.7(0.09) 7.1(1.58) 57.7(0.1) 36.6(0.12) 2(1.37)
Lassointer 0(0) 0(0) 1.3(0) 72.2(0.07) 54(0.09) 2.2(1.04)
MCPinter 0(0) 0(0) 1.3(0) 71.7(0.07) 52.6(0.09) 2.3(1.04)
sgMCP 84.8(0.05) 69.3(0.07) 8.2(1.38) 84.5(0.07) 64.9(0.13) 3.5(1.12)
C-MNL Sig 63.8(0.09) 42.6(0.11) 3.5(1.56) 57.7(0.11) 35.6(0.12) 0.8(0.74)
Lasso 72.6(0.07) 57.2(0.09) 6.1(1.55) 65.5(0.09) 47.5(0.1) 1.8(1.02)
MCP 66.9(0.09) 48.8(0.11) 5.3(1.67) 59.4(0.1) 39.2(0.11) 1.6(1.04)
Lassointer 0(0) 0(0) 1.3(0) 69.5(0.07) 49.7(0.08) 1.7(0.95)
MCPinter 0(0) 0(0) 1.3(0) 68.9(0.06) 48.7(0.08) 1.7(0.93)
sgMCP 80.1(0.05) 60(0.07) 6(1.42) 78.4(0.08) 56.3(0.12) 2.5(1.1)
C-t Sig 58(0.09) 35.3(0.13) 2.7(1.82) 53.1(0.09) 29.5(0.1) 0.5(0.53)
Lasso 66.5(0.09) 47.6(0.11) 4.7(1.7) 62.8(0.09) 45.1(0.12) 1.9(1.18)
MCP 62.3(0.09) 42.8(0.11) 4.3(1.68) 55.7(0.09) 35.6(0.09) 1.5(0.9)
Lassointer 0(0) 0(0) 1.3(0) 64.9(0.09) 43.9(0.11) 1.6(1.08)
MCPinter 0(0) 0(0) 1.3(0) 64(0.09) 43(0.11) 1.5(1.05)
sgMCP 73.7(0.07) 50.9(0.11) 5(1.66) 78.7(0.08) 57.8(0.13) 2.6(1.67)
B-Logistic Sig 58.1(0.06) 35.4(0.07) 2.7(1.02) 54.4(0.07) 29.8(0.09) 0.4(0.48)
Lasso 64.9(0.06) 48.9(0.07) 4.8(1.3) 54(0.06) 34.3(0.08) 0.9(0.69)
MCP 60.5(0.06) 44.5(0.08) 4.6(1.38) 52.9(0.07) 32.3(0.08) 0.8(0.66)
Lassointer 0(0) 0(0) 1.3(0) 55.9(0.07) 33.1(0.09) 1(0.76)
MCPinter 0(0) 0(0) 1.3(0) 54.9(0.07) 32.4(0.09) 0.9(0.74)
sgMCP 73.5(0.05) 50(0.07) 4.4(1.38) 69.4(0.09) 40.2(0.11) 1.2(0.97)
n=500 C-Norm Sig 86.7(0.05) 76.6(0.06) 9.5(1.51) 82.8(0.05) 69.9(0.07) 3.8(1.08)
Lasso 89.7(0.04) 82.5(0.05) 11.1(1.21) 91.8(0.04) 87(0.06) 7.8(1.36)
MCP 84.9(0.05) 74.5(0.07) 9.5(1.72) 80.3(0.05) 66.8(0.06) 5(1.01)
Lassointer 0(0) 0(0) 1.3(0) 92.4(0.04) 85.1(0.05) 6.7(1.19)
MCPinter 0(0) 0(0) 1.3(0) 91.9(0.04) 84.5(0.05) 6.7(1.18)
sgMCP 95.3(0.02) 86.4(0.06) 12.4(1.68) 97.3(0.02) 88.9(0.07) 9(1.27)
C-MNL Sig 85.7(0.05) 74.8(0.07) 9(1.45) 73.9(0.08) 57.4(0.09) 2.5(1)
Lasso 90.2(0.04) 83.3(0.05) 10.9(1.19) 84.4(0.08) 77.3(0.1) 6.4(1.54)
MCP 84.6(0.05) 72.8(0.07) 8.5(1.5) 74.7(0.08) 61.4(0.09) 5.2(1.26)
Lassointer 0(0) 0(0) 1.3(0) 92.8(0.04) 85.5(0.06) 6.7(1.21)
MCPinter 0(0) 0(0) 1.3(0) 91.9(0.04) 83.7(0.07) 6.4(1.16)
sgMCP 94.9(0.02) 85.7(0.06) 12.2(1.26) 95.5(0.03) 81.7(0.11) 7.3(1.58)
C-t Sig 76.7(0.11) 61.5(0.14) 6.7(2.37) 69.1(0.08) 51.2(0.11) 2(1.13)
Lasso 83.2(0.13) 73.2(0.16) 9.2(2.78) 76.7(0.1) 65.3(0.15) 4.8(1.76)
MCP 77.6(0.14) 65.1(0.17) 8.2(2.55) 68.6(0.08) 51.6(0.12) 3.5(0.97)
Lassointer 0(0) 0(0) 1.3(0) 84.5(0.07) 72(0.11) 4.6(1.64)
MCPinter 0(0) 0(0) 1.3(0) 83.7(0.07) 70.9(0.11) 4.4(1.62)
sgMCP 90(0.11) 77.5(0.16) 9.3(2.76) 93.3(0.11) 77.1(0.21) 6.7(2.41)
B-Logistic Sig 76.7(0.06) 61.3(0.08) 6.5(1.44) 70.6(0.07) 51.8(0.09) 2.1(1.04)
Lasso 78.9(0.04) 73.8(0.06) 9.2(1.45) 77(0.07) 74(0.08) 5.3(1.32)
MCP 74.4(0.05) 66.9(0.08) 8.4(1.66) 67.4(0.07) 58.2(0.1) 4.1(1.23)
Lassointer 0(0) 0(0) 1.3(0) 66.3(0.06) 51.5(0.08) 2.7(0.89)
MCPinter 0(0) 0(0) 1.3(0) 65.4(0.06) 50.3(0.08) 2.7(0.88)
sgMCP 92(0.03) 80.2(0.07) 9.8(1.25) 91.6(0.13) 76.8(0.13) 5.9(1.31)

Table A6:

Simulation results, mean (sd), based on 200 replications. p = 500 and SNP data with AR structure, with 16 main effects and 12 interactions.

n Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
n=100 C-Norm Sig 69.6(0.08) 50(0.11) 4.6(1.47) 57.6(0.07) 34.6(0.09) 0.9(0.75)
Lasso 73.5(0.06) 55.7(0.09) 4.8(1.74) 76.9(0.08) 68.3(0.09) 3.8(1.06)
MCP 72(0.08) 56.1(0.1) 6.4(1.82) 59.5(0.08) 39.8(0.1) 2.5(1.36)
Lassointer 0(0) 0(0) 1.3(0) 67.4(0.08) 47.4(0.1) 2.1(1.02)
MCPinter 0(0) 0(0) 1.3(0) 66.2(0.07) 45.5(0.09) 1.8(0.86)
sgMCP 85.1(0.04) 66.9(0.08) 8.3(1.44) 85.3(0.05) 66.9(0.09) 4(1.21)
C-MNL Sig 59.4(0.08) 37.4(0.1) 3(1.35) 55.2(0.09) 31.1(0.11) 0.6(0.59)
Lasso 62.3(0.08) 40.5(0.1) 3.1(1.5) 70.6(0.1) 58.4(0.11) 2.9(1.35)
MCP 60.7(0.08) 40.2(0.1) 3.2(1.36) 58.8(0.09) 39.6(0.09) 2(1.07)
Lassointer 0(0) 0(0) 1.3(0) 67.9(0.09) 47.9(0.11) 1.8(1.19)
MCPinter 0(0) 0(0) 1.3(0) 66.4(0.09) 45.8(0.11) 1.8(1.1)
sgMCP 75.3(0.08) 53.9(0.11) 5.4(1.8) 79.8(0.08) 56.9(0.14) 2.5(1.41)
C-t Sig 57.4(0.09) 33.8(0.1) 2.6(1.38) 55.1(0.1) 30.5(0.12) 0.4(0.74)
Lasso 59.2(0.12) 37.7(0.11) 3(1.38) 66.1(0.11) 49.8(0.15) 1.9(1.38)
MCP 56.1(0.11) 33.8(0.1) 3.1(1.52) 58.2(0.1) 37.1(0.12) 1.6(1.19)
Lassointer 0(0) 0(0) 1.3(0) 58(0.1) 34.2(0.12) 0.9(0.87)
MCPinter 0(0) 0(0) 1.3(0) 57.3(0.1) 32.9(0.12) 0.9(0.9)
sgMCP 71.5(0.13) 48.2(0.14) 4.7(2.21) 72.1(0.15) 46.7(0.18) 1.9(1.45)
B-Logistic Sig 49.6(0.07) 23.9(0.08) 1.2(0.85) 45.9(0.08) 22.3(0.09) 0.3(0.47)
Lasso 43(0.12) 33.5(0.1) 2.2(1.38) 29.4(0.11) 26.7(0.09) 0.4(0.59)
MCP 39(0.12) 27.3(0.09) 1.9(1.19) 23.1(0.1) 22.4(0.07) 0.5(0.59)
Lassointer 0(0) 0(0) 1.3(0) 44.7(0.09) 26.8(0.09) 0.3(0.5)
MCPinter 0(0) 0(0) 1.3(0) 41.8(0.1) 26.9(0.08) 0.3(0.51)
sgMCP 56(0.08) 27(0.1) 1.8(1.31) 56.7(0.1) 28.8(0.12) 0.3(0.61)
n=500 C-Norm Sig 90.5(0.05) 83(0.06) 11(1.48) 79.5(0.07) 65.5(0.09) 3.3(0.95)
Lasso 90.5(0.04) 84.4(0.05) 11.4(1) 86.8(0.06) 81.6(0.07) 7.5(1.06)
MCP 88.8(0.05) 81.4(0.07) 10.9(1.92) 76.5(0.07) 63.8(0.09) 4.1(1.09)
Lassointer 0(0) 0(0) 1.3(0) 88.4(0.05) 79.1(0.07) 5.8(1.16)
MCPinter 0(0) 0(0) 1.3(0) 87.8(0.05) 77.9(0.07) 5.6(1.06)
sgMCP 98.8(0.01) 95.3(0.03) 14(1.07) 97.8(0.01) 91.5(0.09) 8.4(0.95)
C-MNL Sig 81.5(0.07) 68.4(0.1) 8.1(1.94) 77.8(0.08) 63(0.11) 3.3(1.3)
Lasso 85.1(0.05) 75.5(0.08) 9.3(1.54) 88.3(0.07) 84.4(0.09) 7.6(1.58)
MCP 79.7(0.07) 66.6(0.11) 7.3(2.06) 73.1(0.07) 56.8(0.1) 3(1.05)
Lassointer 0(0) 0(0) 1.3(0) 78.5(0.07) 65.1(0.09) 4.3(0.99)
MCPinter 0(0) 0(0) 1.3(0) 78.4(0.07) 65.2(0.09) 4.3(0.99)
sgMCP 93.8(0.03) 83.8(0.07) 11.7(1.39) 98(0.02) 83.8(0.15) 8.5(1.65)
C-t Sig 69.3(0.09) 50.5(0.12) 4.8(1.87) 70.1(0.1) 50.4(0.13) 1.5(1.06)
Lasso 73.6(0.07) 57(0.1) 5.8(2.05) 84.1(0.07) 76.8(0.1) 6.3(1.64)
MCP 70.1(0.08) 52.8(0.11) 5(2.04) 68.6(0.08) 48.8(0.1) 2.7(0.9)
Lassointer 0(0) 0(0) 1.3(0) 78.6(0.08) 64(0.11) 3.7(1.4)
MCPinter 0(0) 0(0) 1.3(0) 77.7(0.08) 62.4(0.12) 3.6(1.32)
sgMCP 86.9(0.07) 72.9(0.11) 9(2.17) 89(0.07) 75.7(0.11) 4.8(2.12)
B-Logistic Sig 56.2(0.1) 32.5(0.1) 2.3(1.52) 56.9(0.08) 34.4(0.1) 0.6(0.66)
Lasso 46.5(0.07) 40.9(0.08) 3.2(1.28) 43.3(0.08) 42(0.1) 0.8(0.79)
MCP 43.4(0.08) 36.2(0.09) 3.2(1.24) 39(0.08) 34.1(0.08) 0.7(0.66)
Lassointer 0(0) 0(0) 1.3(0) 49.4(0.08) 39.5(0.11) 0.9(0.81)
MCPinter 0(0) 0(0) 1.3(0) 50.3(0.08) 40.8(0.11) 0.9(0.79)
sgMCP 60.9(0.06) 33.9(0.08) 2.6(1.35) 60.1(0.09) 31.5(0.11) 0.5(0.89)

Table A7:

Simulation results, mean (sd), based on 200 replications. p = 500 and SNP data with Band structure, with 16 main effects and 12 interactions.

n Setting Method AUC(Main) pAUC(Main) Top40(Main) AUC(Interaction) pAUC(Interaction) Top40(Interaction)
n=100 C-Norm Sig 65.6(0.07) 46.3(0.08) 4.3(1.25) 63.8(0.08) 41.9(0.1) 1.1(0.77)
Lasso 69.1(0.07) 50.3(0.09) 4.5(1.27) 71.8(0.08) 57.9(0.11) 3(1.4)
MCP 62.8(0.07) 41.6(0.1) 4.1(1.69) 59.4(0.08) 36.1(0.09) 1.5(0.87)
Lassointer 0(0) 0(0) 1.3(0) 65.8(0.08) 45.5(0.11) 1.8(0.99)
MCPinter 0(0) 0(0) 1.3(0) 64.8(0.08) 43.6(0.11) 1.7(0.93)
sgMCP 81.9(0.05) 63.3(0.09) 7.2(1.34) 82.4(0.06) 63.2(0.12) 3.1(1.26)
C-MNL Sig 61.2(0.08) 39.5(0.09) 3.1(1.43) 56.1(0.08) 33(0.1) 0.6(0.67)
Lasso 66.7(0.08) 47.5(0.11) 4.3(1.77) 67.6(0.09) 53.2(0.11) 2.2(1.24)
MCP 63.5(0.08) 43.1(0.11) 3.8(1.72) 56.8(0.08) 35.6(0.1) 1.6(0.89)
Lassointer 0(0) 0(0) 1.3(0) 69(0.09) 49.2(0.12) 1.8(1.07)
MCPinter 0(0) 0(0) 1.3(0) 68.2(0.1) 48.2(0.12) 1.8(1.08)
sgMCP 76.8(0.06) 55.7(0.09) 6(1.7) 74.3(0.07) 49.6(0.11) 2(1.01)
C-t Sig 55.6(0.07) 32.4(0.1) 2.6(1.39) 56.5(0.11) 32.9(0.12) 0.5(0.63)
Lasso 59.2(0.08) 36.6(0.1) 2.9(1.48) 67.1(0.1) 52.7(0.12) 2.5(1.04)
MCP 57.6(0.08) 35.5(0.1) 3.1(1.41) 57.5(0.09) 35.7(0.09) 1.8(1.04)
Lassointer 0(0) 0(0) 1.3(0) 64.7(0.09) 43.6(0.12) 1.2(0.91)
MCPinter 0(0) 0(0) 1.3(0) 64.4(0.09) 42.7(0.11) 1.2(0.95)
sgMCP 69.1(0.07) 44.9(0.09) 4.1(1.41) 75.2(0.1) 52(0.15) 2(1.46)
B-Logistic Sig 47.5(0.08) 22.6(0.09) 1.1(0.96) 49.4(0.08) 25(0.1) 0.3(0.42)
Lasso 43.4(0.11) 30.9(0.09) 2.3(1.15) 31.1(0.12) 27.6(0.09) 0.6(0.71)
MCP 42.1(0.11) 27.8(0.08) 2.1(1.29) 25.1(0.12) 24(0.1) 0.3(0.53)
Lassointer 0(0) 0(0) 1.3(0) 41(0.09) 25.8(0.09) 0.3(0.51)
MCPinter 0(0) 0(0) 1.3(0) 36.2(0.1) 26(0.09) 0.3(0.52)
sgMCP 57.1(0.08) 28.4(0.09) 1.8(1.23) 59.5(0.1) 30.6(0.12) 0.7(0.78)
n=500 C-Norm Sig 89.8(0.05) 82(0.07) 10.8(1.34) 80(0.08) 65.8(0.11) 3.2(1.4)
Lasso 91.6(0.04) 85.9(0.05) 11.1(1.36) 88.6(0.06) 84.1(0.07) 7(1.41)
MCP 85.2(0.05) 74.6(0.08) 7.8(1.83) 71.2(0.08) 52.2(0.13) 2.8(1.06)
Lassointer 0(0) 0(0) 1.3(0) 90.3(0.04) 83(0.06) 6.7(1.04)
MCPinter 0(0) 0(0) 1.3(0) 90.2(0.04) 81.9(0.06) 6.8(1.05)
sgMCP 97(0.01) 90.1(0.04) 13.4(1.11) 96.6(0.02) 85.2(0.08) 6.9(1.04)
C-MNL Sig 79.5(0.07) 65.8(0.1) 7.5(1.88) 76(0.07) 60.3(0.08) 2.9(1.15)
Lasso 83.6(0.05) 72.3(0.07) 8.3(1.4) 87.2(0.07) 83.6(0.07) 7.7(1.39)
MCP 77.8(0.06) 63.9(0.08) 7.5(1.8) 72.7(0.06) 58.1(0.07) 3.5(1.24)
Lassointer 0(0) 0(0) 1.3(0) 94.3(0.04) 88.2(0.06) 7.4(1.32)
MCPinter 0(0) 0(0) 1.3(0) 93.8(0.04) 87.7(0.06) 7.2(1.33)
sgMCP 92.5(0.03) 80.7(0.07) 11.2(1.59) 97.9(0.02) 85.6(0.15) 7.9(1.89)
C-t Sig 72.3(0.13) 55.5(0.15) 5.6(2.22) 66.2(0.1) 44(0.12) 1.2(0.86)
Lasso 75.6(0.13) 61.2(0.15) 6.7(2.19) 82.7(0.09) 75.5(0.12) 6.1(1.85)
MCP 71.7(0.11) 55(0.14) 5.8(2.16) 67.7(0.09) 49.9(0.11) 4(1.25)
Lassointer 0(0) 0(0) 1.3(0) 81.1(0.08) 67.8(0.12) 4.2(1.6)
MCPinter 0(0) 0(0) 1.3(0) 80.5(0.08) 66.8(0.12) 4.1(1.57)
sgMCP 88.4(0.08) 75.7(0.15) 9.5(2.86) 93.9(0.06) 82.6(0.15) 6(3)
B-Logistic Sig 53(0.08) 28.5(0.09) 1.6(1.08) 55.6(0.09) 31.2(0.11) 0.5(0.56)
Lasso 43.8(0.06) 41.8(0.07) 3.7(1.28) 39.5(0.09) 35.4(0.11) 0.7(0.76)
MCP 40.4(0.05) 37(0.05) 3.3(1.12) 35.6(0.08) 28.5(0.1) 0.8(0.67)
Lassointer 0(0) 0(0) 1.3(0) 45.6(0.1) 36(0.11) 0.7(0.77)
MCPinter 0(0) 0(0) 1.3(0) 46.9(0.09) 37.2(0.11) 0.8(0.78)
sgMCP 60.1(0.06) 32.4(0.08) 2.6(1.22) 55.2(0.1) 26(0.11) 0.5(0.72)

Table A8:

Simulation results mean(sd) based on 200 replication. p = 500 and GE data with AR structure, which has 8 main effects and 6 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 82.2(0.09) 69.8(0.12) 4.2(1.2) 62.8(0.13) 42(0.15) 0.5(0.5)
Lasso 91(0.06) 84.3(0.09) 5.6(1.13) 77(0.1) 66.3(0.13) 2.3(0.87)
MCP 83.1(0.09) 71.2(0.13) 4.5(1.4) 63.1(0.12) 45.5(0.15) 1.7(0.97)
Lassointer 0(0) 0(0) 0.6(0) 80.7(0.08) 68.6(0.1) 2.4(0.76)
MCPinter 0(0) 0(0) 0.6(0) 79.8(0.08) 67.4(0.1) 2.4(0.73)
sgMCP 93.6(0.04) 84.6(0.08) 5.4(1.09) 97.5(0.02) 89.8(0.07) 3.7(0.75)
C-MNL Sig 70.6(0.11) 52.3(0.15) 2.6(1.27) 64.5(0.11) 43.7(0.15) 0.5(0.59)
Lasso 81.2(0.09) 69.8(0.12) 4.3(1.21) 76.3(0.1) 63.3(0.13) 1.8(1.06)
MCP 73.6(0.09) 58.6(0.13) 3.7(1.16) 68.7(0.11) 52.6(0.16) 1.7(1.05)
Lassointer 0(0) 0(0) 0.6(0) 82.7(0.08) 70.2(0.1) 2.4(0.76)
MCPinter 0(0) 0(0) 0.6(0) 81.5(0.08) 68(0.11) 2.2(0.76)
sgMCP 90(0.06) 74.8(0.12) 4.7(1.23) 90.5(0.06) 73.8(0.16) 2.5(1.02)
C-t Sig 62.8(0.1) 42.5(0.14) 1.8(1.03) 63.2(0.1) 41.3(0.13) 0.4(0.42)
Lasso 69.6(0.11) 52.4(0.14) 2.5(1.29) 73.2(0.11) 58.2(0.14) 1.6(0.9)
MCP 65.2(0.1) 47.2(0.14) 2.6(1.24) 63.7(0.1) 43.9(0.1) 1.2(0.74)
Lassointer 0(0) 0(0) 0.6(0) 57.6(0.12) 34(0.14) 0.4(0.49)
MCPinter 0(0) 0(0) 0.6(0) 57.4(0.12) 34(0.14) 0.4(0.49)
sgMCP 78.2(0.1) 58(0.13) 3.1(1.19) 88.6(0.07) 72.2(0.15) 2.3(1.37)
B-Logistic Sig 64.6(0.09) 43.3(0.12) 1.7(1.02) 56.5(0.11) 32.1(0.14) 0.2(0.31)
Lasso 75.5(0.08) 64.8(0.1) 3.9(1) 63.5(0.12) 48.6(0.16) 0.9(0.76)
MCP 71.5(0.08) 60(0.1) 3.7(0.94) 55.4(0.11) 36.4(0.14) 0.7(0.66)
Lassointer 0(0) 0(0) 0.6(0) 59.1(0.11) 38.4(0.14) 0.5(0.55)
MCPinter 0(0) 0(0) 0.6(0) 58.5(0.12) 37.9(0.14) 0.5(0.54)
sgMCP 83.5(0.07) 64.4(0.11) 3.5(1.15) 82.4(0.08) 61.1(0.15) 1.4(0.99)
n=500 C-Norm Sig 96.1(0.04) 92.1(0.07) 6.6(0.85) 96.1(0.04) 91.8(0.05) 3.7(0.58)
Lasso 96.8(0.03) 95.1(0.04) 6.8(0.75) 97.1(0.03) 96.1(0.04) 4.8(0.66)
MCP 94.2(0.04) 89.7(0.07) 6.3(0.88) 93.2(0.04) 88(0.07) 3.8(0.9)
Lassointer 0(0) 0(0) 0.6(0) 97.2(0.02) 93.2(0.05) 4.2(0.74)
MCPinter 0(0) 0(0) 0.6(0) 97.1(0.02) 93.1(0.05) 4.3(0.75)
sgMCP 99.2(0.01) 94.6(0.04) 6.6(0.98) 99.5(0) 94.1(0.04) 5.1(0.64)
C-MNL Sig 93.3(0.05) 87.5(0.07) 6.2(0.8) 87.5(0.09) 76.9(0.13) 2.3(0.99)
Lasso 95.3(0.04) 92(0.06) 6.7(0.69) 95.3(0.04) 92.3(0.05) 4.1(0.68)
MCP 91.6(0.04) 85.2(0.06) 5.7(0.91) 87.6(0.09) 78.8(0.12) 3.6(0.86)
Lassointer 0(0) 0(0) 0.6(0) 92(0.06) 85.6(0.08) 3.6(0.79)
MCPinter 0(0) 0(0) 0.6(0) 91.3(0.06) 84.6(0.08) 3.5(0.72)
sgMCP 98.5(0.02) 93(0.05) 6.7(0.8) 99.6(0) 93.3(0.09) 5.4(0.66)
C-t Sig 71.6(0.11) 53.9(0.15) 2.9(1.38) 78.9(0.11) 64.1(0.16) 1.6(0.95)
Lasso 80.1(0.11) 68.3(0.14) 4.2(1.27) 87.9(0.1) 80.4(0.14) 3.3(1.15)
MCP 73.5(0.11) 59.6(0.14) 3.6(1.14) 78.7(0.1) 64.2(0.15) 2.2(1.02)
Lassointer 0(0) 0(0) 0.6(0) 81.8(0.09) 67.4(0.13) 2(0.87)
MCPinter 0(0) 0(0) 0.6(0) 81.1(0.09) 66.6(0.13) 2(0.87)
sgMCP 91.1(0.09) 78.8(0.15) 4.9(1.55) 93.7(0.07) 84.3(0.13) 3.3(1.25)
B-Logistic Sig 83.6(0.08) 72.5(0.11) 4.5(1.19) 69.8(0.11) 51.4(0.15) 1(0.58)
Lasso 84.5(0.06) 83.8(0.09) 5.8(0.91) 73.5(0.1) 70.1(0.14) 2.7(0.94)
MCP 81.4(0.07) 78.3(0.1) 5.6(0.91) 61.2(0.1) 48.6(0.13) 1.7(0.66)
Lassointer 0(0) 0(0) 0.6(0) 88.8(0.06) 85.2(0.09) 3.6(0.83)
MCPinter 0(0) 0(0) 0.6(0) 88.6(0.06) 85.1(0.09) 3.6(0.84)
sgMCP 97.7(0.03) 91.7(0.05) 6.2(0.79) 97.3(0.03) 84.8(0.12) 4.5(0.84)

Table A9:

Simulation results, mean(sd) over 200 replications. Setting: p = 500, GE data with Band structure, 8 main effects and 6 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 69.6(0.08) 50.3(0.12) 2.3(1.28) 79.1(0.12) 64.3(0.16) 1.2(0.75)
Lasso 77.1(0.09) 63.3(0.11) 3.5(1.04) 91.6(0.05) 86.1(0.07) 3.3(0.78)
MCP 67.8(0.09) 49.5(0.12) 2.6(1.2) 82.7(0.09) 73.3(0.11) 3.1(0.87)
Lassointer 0(0) 0(0) 0.6(0) 72.6(0.1) 54.5(0.14) 1.2(0.77)
MCPinter 0(0) 0(0) 0.6(0) 72.1(0.1) 53.4(0.13) 1.2(0.78)
sgMCP 89.6(0.05) 76.2(0.09) 4.8(0.94) 95.2(0.04) 85.3(0.08) 3.5(0.74)
C-MNL Sig 69.6(0.1) 50.8(0.13) 2.4(0.97) 62.5(0.1) 39.3(0.14) 0.4(0.57)
Lasso 77.2(0.1) 63.4(0.13) 3.5(1.29) 74.5(0.12) 61(0.17) 1.7(0.89)
MCP 72.7(0.11) 55.7(0.15) 3.2(1.3) 63.3(0.1) 44.1(0.13) 1.3(0.68)
Lassointer 0(0) 0(0) 0.6(0) 82.7(0.09) 68.9(0.12) 2(0.77)
MCPinter 0(0) 0(0) 0.6(0) 81.9(0.09) 67.8(0.12) 2(0.76)
sgMCP 85.8(0.07) 70.3(0.12) 4(1.12) 89.2(0.09) 74.4(0.14) 2.3(1.13)
C-t Sig 63.1(0.11) 41.6(0.15) 1.8(1.29) 57(0.13) 35(0.14) 0.4(0.55)
Lasso 69.3(0.12) 50.9(0.16) 2.7(1.36) 65.2(0.13) 49.2(0.15) 1.1(0.82)
MCP 64.2(0.11) 43.8(0.15) 2.3(1.42) 60(0.13) 41.1(0.16) 1.1(0.8)
Lassointer 0(0) 0(0) 0.6(0) 74(0.11) 56.5(0.15) 1.3(0.86)
MCPinter 0(0) 0(0) 0.6(0) 73.9(0.11) 56.2(0.15) 1.3(0.86)
sgMCP 78.4(0.09) 57.8(0.14) 3.3(1.32) 79.8(0.12) 59.1(0.19) 1.3(1.02)
B-Logistic Sig 62.2(0.08) 39.9(0.11) 1.7(0.97) 53.5(0.11) 28.2(0.12) 0.1(0.34)
Lasso 71.2(0.09) 57.9(0.13) 3.2(1.15) 54.9(0.13) 35.5(0.14) 0.4(0.46)
MCP 66(0.1) 52(0.14) 3(1.1) 52.6(0.11) 32.2(0.13) 0.3(0.42)
Lassointer 0(0) 0(0) 0.6(0) 64.1(0.1) 44.3(0.14) 0.7(0.63)
MCPinter 0(0) 0(0) 0.6(0) 63.3(0.1) 43.7(0.13) 0.7(0.62)
sgMCP 79.2(0.08) 58.7(0.13) 3.1(1.04) 72.2(0.09) 44.5(0.11) 0.8(0.61)
n=500 C-Norm Sig 95.9(0.04) 92(0.06) 6.6(0.82) 85.8(0.09) 77.2(0.11) 2.9(0.59)
Lasso 95.7(0.04) 93(0.05) 6.9(0.77) 87.2(0.1) 82.2(0.11) 3.8(0.76)
MCP 94(0.04) 89.5(0.07) 6.4(0.96) 80.7(0.1) 70(0.12) 2.2(0.52)
Lassointer 0(0) 0(0) 0.6(0) 97.7(0.02) 93.8(0.04) 4.5(0.61)
MCPinter 0(0) 0(0) 0.6(0) 97.7(0.02) 93.8(0.04) 4.6(0.61)
sgMCP 97.5(0.02) 92.8(0.05) 6.9(0.79) 99(0.01) 93.9(0.06) 5(0.56)
C-MNL Sig 89.9(0.06) 82.6(0.08) 5.5(1.03) 84.5(0.07) 71.9(0.12) 2.1(0.75)
Lasso 93.5(0.04) 89.1(0.06) 6.4(0.8) 89.1(0.08) 82.4(0.1) 3.5(0.84)
MCP 87.5(0.06) 78.5(0.09) 5.3(0.98) 83.3(0.08) 70.7(0.13) 2.5(0.77)
Lassointer 0(0) 0(0) 0.6(0) 97.3(0.02) 93.6(0.04) 4.3(0.62)
MCPinter 0(0) 0(0) 0.6(0) 97.2(0.02) 93(0.04) 4.3(0.63)
sgMCP 98.8(0.02) 94.9(0.03) 7.1(0.68) 98.9(0.01) 89.6(0.16) 4.7(0.8)
C-t Sig 79.2(0.12) 64.7(0.18) 3.6(1.6) 72.5(0.11) 55.1(0.16) 1.1(0.88)
Lasso 87.6(0.1) 78.8(0.14) 5.2(1.65) 82.3(0.13) 74.6(0.16) 2.9(1.27)
MCP 81(0.11) 68.9(0.15) 4.3(1.48) 75.2(0.11) 62.1(0.14) 2.4(0.98)
Lassointer 0(0) 0(0) 0.6(0) 80.9(0.1) 66.4(0.13) 2.1(0.71)
MCPinter 0(0) 0(0) 0.6(0) 80.4(0.1) 66.2(0.13) 2.1(0.71)
sgMCP 93.7(0.06) 84.1(0.13) 5.5(1.41) 94(0.07) 83.5(0.14) 3.3(1.63)
B-Logistic Sig 72.2(0.09) 55.1(0.13) 2.9(1.11) 79.9(0.09) 65.3(0.12) 1.6(0.76)
Lasso 73.7(0.09) 65.8(0.12) 3.9(1.04) 77.8(0.09) 73.9(0.11) 2.5(0.96)
MCP 70.2(0.09) 61.3(0.12) 3.5(1.11) 74.7(0.09) 69.3(0.12) 2.3(0.86)
Lassointer 0(0) 0(0) 0.6(0) 89.1(0.05) 85.4(0.07) 3.5(0.79)
MCPinter 0(0) 0(0) 0.6(0) 88.9(0.05) 85.5(0.07) 3.5(0.8)
sgMCP 90.5(0.05) 78.1(0.08) 5.2(0.93) 90.5(0.07) 77.9(0.12) 2.8(1.09)

Table A10:

Simulation results, mean(sd) over 200 replications. Setting: p = 500, SNP data with AR structure, 8 main effects and 6 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 75.9(0.1) 60.3(0.13) 3.3(1.15) 72.5(0.12) 54.5(0.16) 0.9(0.72)
Lasso 78.9(0.08) 64.7(0.11) 3.4(1.11) 86.1(0.1) 79.5(0.12) 3.2(0.8)
MCP 72.2(0.1) 53.8(0.13) 2.3(1.24) 67.9(0.11) 50.7(0.12) 2.1(0.68)
Lassointer 0(0) 0(0) 0.6(0) 79.4(0.08) 65.7(0.11) 2(0.89)
MCPinter 0(0) 0(0) 0.6(0) 78.7(0.09) 64.6(0.11) 2(0.86)
sgMCP 87(0.06) 72.5(0.1) 4.4(0.89) 93.1(0.05) 83.6(0.09) 3.6(0.88)
C-MNL Sig 63(0.11) 41.7(0.13) 1.9(1.08) 56.1(0.11) 32(0.14) 0.3(0.43)
Lasso 67.3(0.11) 47.8(0.15) 2.1(1.15) 68.5(0.14) 55.1(0.18) 1.6(0.94)
MCP 62.7(0.11) 42.1(0.15) 1.9(1.17) 57.2(0.12) 37.8(0.15) 1.2(0.89)
Lassointer 0(0) 0(0) 0.6(0) 67.3(0.11) 48.4(0.14) 0.9(0.72)
MCPinter 0(0) 0(0) 0.6(0) 66.1(0.11) 46.7(0.15) 0.9(0.73)
sgMCP 81.3(0.08) 62.9(0.13) 3.7(1.21) 83.5(0.11) 65.1(0.18) 1.9(1.24)
C-t Sig 57.3(0.1) 32.4(0.12) 1.2(0.98) 60.4(0.14) 37.4(0.17) 0.3(0.5)
Lasso 59.5(0.12) 35.8(0.15) 1.4(1.12) 67.1(0.17) 51.1(0.21) 1.2(0.96)
MCP 58.1(0.11) 35.8(0.16) 1.6(1.37) 59.7(0.13) 38.7(0.14) 0.7(0.61)
Lassointer 0(0) 0(0) 0.6(0) 62.5(0.12) 40.9(0.14) 0.6(0.65)
MCPinter 0(0) 0(0) 0.6(0) 61.7(0.12) 39.8(0.14) 0.6(0.64)
sgMCP 71.3(0.11) 47.4(0.16) 2.2(1.29) 72.3(0.14) 48(0.2) 0.8(0.7)
B-Logistic Sig 52.6(0.1) 26.9(0.12) 0.5(0.67) 51.8(0.13) 24.1(0.13) 0.1(0.23)
Lasso 52.1(0.1) 34.9(0.12) 1.4(0.74) 46.2(0.13) 32.8(0.15) 0.3(0.56)
MCP 51(0.1) 32.2(0.11) 1.3(0.71) 38.5(0.13) 22.6(0.13) 0.2(0.39)
Lassointer 0(0) 0(0) 0.6(0) 41.1(0.13) 23.1(0.14) 0.1(0.29)
MCPinter 0(0) 0(0) 0.6(0) 38.9(0.13) 23(0.13) 0.1(0.29)
sgMCP 60(0.09) 32.7(0.11) 1.1(0.72) 57(0.14) 30.2(0.17) 0.3(0.55)
n=500 C-Norm Sig 96.7(0.03) 93.6(0.05) 6.8(0.75) 90.7(0.08) 83(0.12) 2.8(0.9)
Lasso 96(0.02) 94.3(0.04) 6.7(0.65) 94.8(0.04) 94.6(0.05) 4.9(0.57)
MCP 95.4(0.03) 92.9(0.04) 6.9(0.64) 80.7(0.09) 68.3(0.13) 2(1.12)
Lassointer 0(0) 0(0) 0.6(0) 92(0.05) 85.7(0.07) 3.7(0.75)
MCPinter 0(0) 0(0) 0.6(0) 91.9(0.05) 85.7(0.07) 3.7(0.72)
sgMCP 99.3(0.01) 96.2(0.03) 7(0.67) 99.8(0) 96.6(0.03) 5.4(0.71)
C-MNL Sig 88.4(0.07) 79.1(0.11) 5(1.14) 73.9(0.11) 58.4(0.14) 1.3(0.74)
Lasso 91.1(0.06) 85.4(0.09) 5.8(1.22) 81.5(0.12) 74.9(0.14) 3.4(0.95)
MCP 83.4(0.08) 71.1(0.13) 4.1(1.4) 68.7(0.1) 52.8(0.12) 1.3(0.73)
Lassointer 0(0) 0(0) 0.6(0) 93.3(0.05) 86.9(0.08) 3.5(0.78)
MCPinter 0(0) 0(0) 0.6(0) 92.7(0.05) 86(0.08) 3.3(0.79)
sgMCP 97.6(0.02) 90.9(0.05) 6.5(0.9) 98.9(0.02) 91.6(0.09) 4.7(1.1)
C-t Sig 78.5(0.1) 64.4(0.13) 3.5(1.33) 71.3(0.09) 51.7(0.14) 1(0.66)
Lasso 80.4(0.1) 68.5(0.13) 4(1.17) 85(0.08) 78.7(0.11) 3.3(0.92)
MCP 73(0.1) 54(0.15) 2.7(1.2) 70.2(0.1) 51.7(0.14) 1.7(0.78)
Lassointer 0(0) 0(0) 0.6(0) 83(0.08) 70.7(0.12) 2.5(0.9)
MCPinter 0(0) 0(0) 0.6(0) 82.1(0.09) 69(0.13) 2.4(0.95)
sgMCP 91.2(0.07) 80.9(0.11) 5.3(1.37) 92.3(0.05) 78.4(0.1) 3.4(1.36)
B-Logistic Sig 66.2(0.12) 45.6(0.16) 2.1(1.35) 63.9(0.1) 41.9(0.14) 0.7(0.66)
Lasso 63.8(0.09) 57.8(0.1) 3.3(0.99) 59(0.12) 59.7(0.16) 1.6(0.69)
MCP 60.9(0.1) 53.9(0.13) 3.1(1.14) 48.5(0.09) 40.6(0.12) 1(0.63)
Lassointer 0(0) 0(0) 0.6(0) 62.9(0.1) 52.2(0.13) 1.2(0.75)
MCPinter 0(0) 0(0) 0.6(0) 62.9(0.1) 52.4(0.14) 1.2(0.74)
sgMCP 78(0.08) 58.4(0.12) 3.4(1.2) 79.1(0.1) 59.1(0.15) 1.3(0.91)

Table A11:

Simulation results, mean(sd) over 200 replications. Setting: p = 500, SNP data with Band structure, 8 main effects and 6 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 75.9(0.09) 60.3(0.12) 3.1(1.01) 74.4(0.12) 57.2(0.16) 1.2(0.82)
Lasso 79(0.07) 65.9(0.09) 3.9(0.93) 84.4(0.09) 77.1(0.12) 2.7(0.99)
MCP 75.3(0.09) 60.6(0.12) 3.7(1.12) 67.7(0.12) 47.6(0.17) 1.3(0.87)
Lassointer 0(0) 0(0) 0.6(0) 80.6(0.1) 65.3(0.13) 1.4(0.76)
MCPinter 0(0) 0(0) 0.6(0) 79.4(0.09) 63.3(0.13) 1.4(0.77)
sgMCP 90.1(0.05) 77.7(0.08) 5.2(1.09) 94.4(0.04) 83.1(0.1) 2.9(1.09)
C-MNL Sig 67.1(0.09) 47.1(0.12) 2.2(1.2) 58.6(0.11) 35.3(0.13) 0.4(0.49)
Lasso 71.5(0.09) 53.7(0.13) 2.4(1.16) 77.8(0.11) 68.5(0.14) 2.3(0.9)
MCP 65.6(0.09) 44.9(0.12) 2.1(1.08) 62.2(0.1) 44.9(0.12) 1.5(0.83)
Lassointer 0(0) 0(0) 0.6(0) 65.1(0.12) 45.1(0.15) 0.8(0.74)
MCPinter 0(0) 0(0) 0.6(0) 64.4(0.12) 44.2(0.15) 0.8(0.74)
sgMCP 83.4(0.07) 67.3(0.12) 4(1.18) 88.9(0.07) 73.8(0.14) 2.5(1.09)
C-t Sig 62.9(0.12) 41.4(0.16) 1.9(1.25) 57.9(0.14) 34.8(0.15) 0.2(0.36)
Lasso 63.7(0.11) 42.5(0.14) 2(1.07) 69.7(0.13) 53.8(0.16) 1.2(0.97)
MCP 60.2(0.12) 37.8(0.16) 1.9(1.22) 60.4(0.13) 40.6(0.13) 0.9(0.63)
Lassointer 0(0) 0(0) 0.6(0) 64.4(0.13) 43.6(0.16) 0.7(0.68)
MCPinter 0(0) 0(0) 0.6(0) 63.4(0.13) 42.6(0.16) 0.7(0.68)
sgMCP 78.2(0.1) 58.3(0.16) 2.9(1.41) 79(0.1) 58(0.15) 1.2(1)
B-Logistic Sig 50.9(0.12) 26.9(0.13) 0.7(0.75) 45.4(0.13) 22.4(0.12) 0.1(0.36)
Lasso 55.8(0.09) 38.3(0.11) 1.5(0.98) 48.2(0.11) 34.1(0.12) 0.3(0.41)
MCP 53(0.1) 35(0.11) 1.3(0.94) 40(0.12) 24.7(0.13) 0.2(0.38)
Lassointer 0(0) 0(0) 0.6(0) 49.7(0.12) 29.5(0.14) 0.2(0.4)
MCPinter 0(0) 0(0) 0.6(0) 48(0.12) 29.6(0.14) 0.2(0.41)
sgMCP 60.8(0.1) 34.3(0.11) 1.1(0.86) 60.7(0.13) 31.6(0.16) 0.2(0.33)
n=500 C-Norm Sig 95.3(0.04) 90.4(0.07) 6.5(0.8) 94.2(0.05) 88.4(0.08) 3.4(0.74)
Lasso 95.3(0.02) 93.9(0.04) 6.6(0.73) 94.2(0.04) 94.2(0.05) 4.8(0.73)
MCP 89.7(0.03) 82.4(0.06) 4.7(1.04) 86.8(0.05) 80(0.08) 2.1(0.65)
Lassointer 0(0) 0(0) 0.6(0) 92.3(0.06) 86.2(0.08) 3.5(0.9)
MCPinter 0(0) 0(0) 0.6(0) 91.5(0.06) 84.2(0.08) 3.3(0.8)
sgMCP 98.9(0.01) 93.9(0.04) 7.2(0.73) 99.7(0) 94.4(0.04) 5.5(0.45)
C-MNL Sig 91.7(0.07) 85.3(0.11) 5.9(1.2) 80(0.1) 68.2(0.12) 2.1(0.76)
Lasso 93(0.06) 87.6(0.09) 5.9(1) 83.4(0.09) 77.4(0.11) 3.3(0.9)
MCP 85.2(0.09) 72.6(0.14) 4(1.75) 73.5(0.1) 58.6(0.13) 1.4(0.86)
Lassointer 0(0) 0(0) 0.6(0) 95.4(0.04) 91.3(0.06) 4.1(0.9)
MCPinter 0(0) 0(0) 0.6(0) 95.4(0.04) 91.1(0.07) 4.1(0.89)
sgMCP 97(0.03) 90.6(0.05) 6.7(0.69) 99.5(0) 95.9(0.03) 5(1.1)
C-t Sig 82.5(0.1) 70.1(0.15) 4.1(1.47) 68.1(0.13) 50(0.16) 0.8(0.71)
Lasso 87.2(0.08) 78(0.13) 4.9(1.32) 83.6(0.11) 78.5(0.14) 3.6(0.99)
MCP 79(0.1) 65.3(0.14) 3.7(1.38) 70.4(0.12) 56.2(0.15) 3(1.04)
Lassointer 0(0) 0(0) 0.6(0) 69.9(0.12) 52.1(0.14) 1.5(0.72)
MCPinter 0(0) 0(0) 0.6(0) 69.6(0.12) 51.7(0.14) 1.5(0.7)
sgMCP 96(0.08) 89.2(0.12) 6.3(1.4) 95.2(0.08) 84(0.14) 3.3(1.28)
B-Logistic Sig 64.9(0.09) 43.6(0.11) 1.8(0.98) 56.8(0.12) 36.1(0.12) 0.3(0.47)
Lasso 66.6(0.07) 60.5(0.09) 3.1(0.86) 52(0.09) 46.7(0.11) 0.2(0.44)
MCP 61.6(0.09) 55.6(0.11) 3(1.03) 44.6(0.1) 35.1(0.13) 0.3(0.48)
Lassointer 0(0) 0(0) 0.6(0) 55.2(0.12) 41.7(0.15) 0.6(0.56)
MCPinter 0(0) 0(0) 0.6(0) 55.2(0.12) 41(0.15) 0.6(0.57)
sgMCP 73(0.07) 50.7(0.11) 2.5(0.95) 72.9(0.08) 49.6(0.13) 1.1(0.67)

Table A12:

Simulation results, mean(sd) over 200 replications. Setting: p = 1,000, GE data with AR structure, 16 main effects and 12 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 65.5(0.08) 44.9(0.1) 2.8(1.34) 60.4(0.09) 37.4(0.12) 0.6(0.61)
Lasso 77.3(0.06) 63.6(0.08) 5.4(1.46) 70(0.08) 55.2(0.1) 2.4(1.03)
MCP 69.7(0.08) 53.5(0.1) 4.8(1.37) 62.9(0.09) 43(0.1) 2.1(1)
Lassointer 0(0) 0(0) 0.6(0) 69.8(0.08) 51.1(0.1) 1.5(0.9)
MCPinter 0(0) 0(0) 0.6(0) 68.9(0.08) 49.8(0.1) 1.5(0.9)
sgMCP 82.5(0.05) 63.8(0.07) 5.2(1.56) 85.1(0.07) 67(0.13) 3.8(1.24)
C-MNL Sig 63.7(0.08) 43(0.1) 2.3(1.13) 58.7(0.08) 35.8(0.08) 0.5(0.44)
Lasso 72.7(0.06) 56.1(0.08) 4.6(1.65) 64(0.09) 45.6(0.11) 1.7(0.89)
MCP 67.4(0.06) 49.7(0.08) 4.2(1.65) 59.6(0.08) 38(0.1) 1.4(0.76)
Lassointer 0(0) 0(0) 0.6(0) 72.7(0.07) 54.4(0.09) 1.8(0.93)
MCPinter 0(0) 0(0) 0.6(0) 72.1(0.07) 53.7(0.09) 1.8(0.92)
sgMCP 79.6(0.05) 58.7(0.08) 4.7(1.9) 80.4(0.06) 59(0.1) 2.4(1.17)
C-t Sig 59.4(0.09) 36.6(0.11) 1.6(1.2) 55.9(0.08) 31.9(0.1) 0.3(0.45)
Lasso 65.1(0.08) 46.8(0.1) 3.4(1.55) 62(0.08) 42.8(0.1) 1.1(0.72)
MCP 62.2(0.07) 41.9(0.08) 2.9(1.47) 57.7(0.08) 36.1(0.1) 1(0.78)
Lassointer 0(0) 0(0) 0.6(0) 64.8(0.09) 42.9(0.11) 0.8(0.76)
MCPinter 0(0) 0(0) 0.6(0) 64.1(0.09) 41.8(0.11) 0.8(0.76)
sgMCP 74.1(0.07) 51.2(0.11) 3.5(1.71) 73.8(0.06) 51.9(0.09) 1.1(0.8)
B-Logistic Sig 59.1(0.07) 36(0.09) 1.7(1.07) 53(0.1) 29.3(0.1) 0.2(0.46)
Lasso 63.1(0.07) 46.6(0.08) 3.2(1.36) 55.2(0.09) 38.2(0.11) 0.8(0.69)
MCP 60.8(0.07) 43.8(0.08) 3(1.27) 50.8(0.1) 31.9(0.11) 0.8(0.72)
Lassointer 0(0) 0(0) 0.6(0) 54.8(0.08) 33.2(0.09) 0.8(0.69)
MCPinter 0(0) 0(0) 0.6(0) 54.6(0.08) 33(0.09) 0.8(0.69)
sgMCP 72.1(0.05) 47.5(0.08) 2.7(1.2) 71.9(0.08) 44.8(0.11) 1.2(1.19)
n=500 C-Norm Sig 89.9(0.05) 82(0.07) 8.7(1.31) 82.5(0.06) 69.7(0.08) 2.8(0.94)
Lasso 93.8(0.03) 88.6(0.05) 11.1(1.42) 91.6(0.05) 87.5(0.06) 7.2(1.25)
MCP 90(0.04) 82.6(0.06) 9.7(1.7) 82.1(0.06) 71(0.09) 5.4(1.39)
Lassointer 0(0) 0(0) 0.6(0) 85.6(0.06) 74.9(0.08) 4.5(1.1)
MCPinter 0(0) 0(0) 0.6(0) 84.8(0.06) 73.6(0.08) 4.4(1.06)
sgMCP 97.4(0.02) 90.1(0.06) 11(2.05) 98(0.02) 89.2(0.11) 8.5(1.56)
C-MNL Sig 87.2(0.05) 78.5(0.07) 8.2(1.54) 72.2(0.06) 55.5(0.08) 1.7(0.54)
Lasso 92.4(0.03) 87.8(0.05) 11.4(1.43) 83.2(0.07) 75.4(0.08) 5.8(1.17)
MCP 87.8(0.04) 79.8(0.06) 9.9(1.64) 72.9(0.05) 59.3(0.07) 4.2(1.17)
Lassointer 0(0) 0(0) 0.6(0) 91.3(0.04) 83.3(0.06) 6.2(1.13)
MCPinter 0(0) 0(0) 0.6(0) 90.6(0.04) 82.4(0.06) 6.1(1.08)
sgMCP 96.2(0.02) 89.1(0.05) 11.5(1.51) 98.5(0.02) 84.9(0.15) 8(1.77)
C-t Sig 77.9(0.09) 64.5(0.13) 5.5(2.04) 70.8(0.09) 52.2(0.13) 1.3(0.86)
Lasso 87(0.09) 78.7(0.12) 8.9(2.49) 84.2(0.08) 76.3(0.11) 5.6(1.53)
MCP 80.7(0.09) 69.8(0.14) 7.3(2.34) 73.4(0.07) 59.1(0.09) 4.2(1.53)
Lassointer 0(0) 0(0) 0.6(0) 83.9(0.07) 71.1(0.1) 3.8(1.09)
MCPinter 0(0) 0(0) 0.6(0) 83.8(0.07) 70.9(0.1) 3.8(1.08)
sgMCP 92(0.06) 79.5(0.11) 9.1(2.68) 96.4(0.06) 88.6(0.12) 7.5(2.78)
B-Logistic Sig 78.2(0.06) 63.2(0.08) 5.1(1.56) 69.9(0.08) 51.7(0.11) 1.2(0.68)
Lasso 79.6(0.05) 74.9(0.06) 8.1(1.48) 70.6(0.06) 64.5(0.08) 3.4(0.98)
MCP 75.7(0.05) 68(0.07) 7.1(1.54) 64.7(0.08) 54.1(0.09) 2.7(0.99)
Lassointer 0(0) 0(0) 0.6(0) 76.5(0.06) 66.5(0.08) 3.4(0.97)
MCPinter 0(0) 0(0) 0.6(0) 76(0.06) 65.7(0.08) 3.4(0.95)
sgMCP 90.6(0.04) 78.4(0.07) 8.2(1.6) 89.3(0.07) 71.3(0.15) 5.3(1.18)

Table A13:

Simulation results, mean(sd) over 200 replications. Setting: p = 1,000, GE data with Band structure, 16 main effects and 12 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 65.2(0.08) 45.3(0.1) 2.6(1.35) 61.5(0.09) 39.7(0.1) 0.6(0.58)
Lasso 74.9(0.06) 59.7(0.08) 4.9(1.14) 69.5(0.08) 52.6(0.09) 2(0.79)
MCP 69.2(0.08) 52.2(0.1) 4.6(1.39) 62.3(0.08) 41.6(0.09) 1.6(0.67)
Lassointer 0(0) 0(0) 0.6(0) 68.6(0.07) 48.9(0.08) 1.6(0.94)
MCPinter 0(0) 0(0) 0.6(0) 67.7(0.07) 47.8(0.08) 1.6(0.92)
sgMCP 82.1(0.05) 63.6(0.09) 4.8(1.24) 81.4(0.06) 62.2(0.1) 2.8(1.02)
C-MNL Sig 62.7(0.09) 40.6(0.11) 1.9(1.2) 57.8(0.08) 34.9(0.1) 0.4(0.38)
Lasso 70(0.08) 52.8(0.09) 3.6(1.16) 66.6(0.09) 51.3(0.1) 2.1(0.96)
MCP 64.6(0.09) 44.6(0.11) 3.2(1.3) 58.7(0.08) 38(0.1) 1.4(0.68)
Lassointer 0(0) 0(0) 0.6(0) 69(0.07) 49.7(0.09) 1.4(0.97)
MCPinter 0(0) 0(0) 0.6(0) 68(0.07) 48.1(0.09) 1.4(0.97)
sgMCP 77.6(0.06) 56(0.08) 4.1(1.22) 81.5(0.07) 61(0.12) 3(0.78)
C-t Sig 60.5(0.08) 38.7(0.09) 2(1.06) 55.3(0.09) 31(0.1) 0.2(0.31)
Lasso 69.7(0.07) 51.9(0.1) 3.8(1.31) 59.5(0.1) 39.4(0.12) 1.1(0.92)
MCP 64.4(0.07) 45(0.1) 3.3(1.36) 56(0.09) 32.8(0.11) 0.9(0.85)
Lassointer 0(0) 0(0) 0.6(0) 63.6(0.07) 41.5(0.09) 0.8(0.76)
MCPinter 0(0) 0(0) 0.6(0) 62.8(0.07) 40.3(0.08) 0.8(0.76)
sgMCP 75.9(0.07) 53.9(0.11) 3.7(1.75) 76.2(0.09) 54.4(0.14) 1.5(1.19)
B-Logistic Sig 56.4(0.08) 32.7(0.09) 1.6(1.05) 54.9(0.08) 29.7(0.1) 0.3(0.46)
Lasso 60.7(0.08) 44(0.1) 2.9(1.15) 58.3(0.08) 41.3(0.09) 0.8(0.67)
MCP 57.9(0.08) 40.3(0.11) 2.8(1.18) 55.2(0.08) 36.5(0.09) 0.8(0.65)
Lassointer 0(0) 0(0) 0.6(0) 59.6(0.07) 39.2(0.09) 0.9(0.68)
MCPinter 0(0) 0(0) 0.6(0) 58.9(0.07) 38.8(0.09) 0.9(0.68)
sgMCP 70(0.06) 45.6(0.09) 2.8(1.25) 70.4(0.08) 46.4(0.12) 1(0.97)
n=500 C-Norm Sig 87.4(0.04) 78(0.07) 8.2(1.42) 81(0.07) 68.1(0.1) 2.6(0.82)
Lasso 92.7(0.03) 87.8(0.04) 11.3(1.26) 89.8(0.04) 84.3(0.06) 6.7(1.06)
MCP 87.1(0.04) 78.4(0.06) 9.1(1.71) 78.2(0.07) 64.5(0.1) 4.8(1.19)
Lassointer 0(0) 0(0) 0.6(0) 87.1(0.05) 76.5(0.06) 5(0.91)
MCPinter 0(0) 0(0) 0.6(0) 86.5(0.05) 75.6(0.06) 5(0.88)
sgMCP 96.7(0.02) 89.7(0.05) 11.8(1.25) 98.4(0.02) 89.5(0.08) 7.8(1.03)
C-MNL Sig 83.6(0.05) 72.4(0.08) 6.7(1.56) 83.1(0.06) 70.1(0.09) 2.6(0.85)
Lasso 89.7(0.05) 82.3(0.06) 9.6(1.28) 88.9(0.06) 82.1(0.07) 5.8(1.51)
MCP 85.2(0.06) 75.7(0.08) 8.5(1.7) 81.3(0.06) 67.2(0.09) 4.1(1.08)
Lassointer 0(0) 0(0) 0.6(0) 82.1(0.06) 68.4(0.09) 3.6(0.94)
MCPinter 0(0) 0(0) 0.6(0) 81.7(0.06) 67.9(0.09) 3.6(0.95)
sgMCP 94.8(0.02) 85.7(0.07) 10.4(1.44) 96.2(0.02) 89.8(0.05) 6.7(1.1)
C-t Sig 79.3(0.09) 65.6(0.12) 5.6(1.67) 73.7(0.09) 57.3(0.11) 1.6(0.97)
Lasso 85.7(0.07) 75.4(0.08) 8.1(1.7) 81.9(0.08) 72(0.11) 5(1.49)
MCP 81.6(0.08) 69.5(0.11) 7.4(1.99) 73.7(0.08) 59.4(0.09) 3.7(1.2)
Lassointer 0(0) 0(0) 0.6(0) 85.7(0.07) 74.7(0.11) 4.5(1.54)
MCPinter 0(0) 0(0) 0.6(0) 85(0.07) 73.3(0.11) 4.5(1.55)
sgMCP 92.3(0.05) 81.1(0.09) 9.2(2.46) 94.6(0.05) 80.8(0.17) 6.4(1.96)
B-Logistic Sig 75.6(0.06) 60(0.09) 4.8(0.94) 74.1(0.08) 57.6(0.1) 1.5(0.75)
Lasso 77.5(0.06) 72.1(0.07) 7.6(1.58) 78.5(0.06) 75.5(0.08) 5(1.26)
MCP 72.8(0.06) 64.8(0.08) 6.7(1.47) 70.7(0.08) 63.3(0.1) 4.3(1.16)
Lassointer 0(0) 0(0) 0.6(0) 66.8(0.06) 53(0.08) 2.8(0.8)
MCPinter 0(0) 0(0) 0.6(0) 66.3(0.06) 52.2(0.08) 2.7(0.81)
sgMCP 91.7(0.03) 78.2(0.06) 9.5(1.4) 93.9(0.1) 72.9(0.15) 6.2(1.62)

Table A14:

Simulation results, mean(sd) over 200 replications. Setting: p = 1,000, SNP data with AR structure, 16 main effects and 12 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 66.9(0.07) 46.7(0.09) 2.5(1.19) 63.1(0.08) 41(0.1) 0.7(0.62)
Lasso 70.6(0.06) 52.3(0.08) 3.2(1.46) 75.8(0.07) 64.2(0.08) 2.8(1.01)
MCP 64.5(0.06) 43.5(0.09) 3.1(1.73) 60.9(0.08) 38.8(0.1) 1.1(0.76)
Lassointer 0(0) 0(0) 0.6(0) 70.7(0.08) 51.9(0.1) 1.6(0.93)
MCPinter 0(0) 0(0) 0.6(0) 69.2(0.08) 49.6(0.09) 1.5(0.81)
sgMCP 83.7(0.04) 65.2(0.08) 5.9(1.24) 84.7(0.06) 65.7(0.11) 2.6(0.9)
C-MNL Sig 60.8(0.07) 37.8(0.08) 1.6(0.98) 56.6(0.09) 33.2(0.1) 0.3(0.43)
Lasso 63.3(0.07) 41.3(0.09) 1.9(1.15) 68.7(0.09) 55.3(0.11) 2.3(1.19)
MCP 60.6(0.07) 38(0.09) 1.6(1.08) 56.6(0.09) 35.9(0.11) 1.3(0.64)
Lassointer 0(0) 0(0) 0.6(0) 70.5(0.08) 52.2(0.11) 1.5(0.97)
MCPinter 0(0) 0(0) 0.6(0) 69.1(0.08) 50.2(0.12) 1.5(0.95)
sgMCP 73.4(0.06) 50.7(0.09) 3.6(1.33) 79.9(0.07) 60.2(0.13) 2.7(1.65)
C-t Sig 57(0.08) 34.3(0.09) 1.6(1.17) 53.5(0.08) 28.1(0.09) 0.3(0.4)
Lasso 61.6(0.07) 39.3(0.09) 1.8(1.27) 61.9(0.1) 43.4(0.13) 1.1(0.9)
MCP 58.4(0.07) 36.1(0.1) 1.9(1.37) 55.9(0.09) 33.7(0.12) 1(0.89)
Lassointer 0(0) 0(0) 0.6(0) 60.5(0.09) 37.8(0.11) 0.6(0.66)
MCPinter 0(0) 0(0) 0.6(0) 59.8(0.09) 36.6(0.11) 0.6(0.65)
sgMCP 71.9(0.08) 48.6(0.11) 3.3(1.79) 71.9(0.09) 49.1(0.12) 1.3(1.01)
B-Logistic Sig 45.9(0.09) 20.8(0.09) 0.4(0.62) 47.3(0.09) 21.1(0.1) 0.1(0.28)
Lasso 43.7(0.13) 33.2(0.08) 1.3(0.92) 28.8(0.1) 26.1(0.09) 0.1(0.28)
MCP 38.1(0.15) 28.3(0.1) 1.1(0.89) 23.3(0.12) 23.5(0.1) 0.2(0.4)
Lassointer 0(0) 0(0) 0.6(0.06) 39.5(0.1) 23.5(0.09) 0.1(0.22)
MCPinter 0(0) 0(0) 0.6(0.06) 35.9(0.1) 24.4(0.09) 0.1(0.22)
sgMCP 54.9(0.06) 26(0.06) 0.9(0.83) 52.5(0.08) 23.2(0.09) 0.1(0.23)
n=500 C-Norm Sig 89.5(0.04) 82(0.06) 9.1(1.5) 82(0.07) 69.1(0.09) 2.7(0.78)
Lasso 91.2(0.03) 85.7(0.04) 10.5(1.62) 89.9(0.06) 87.6(0.07) 8.1(1.34)
MCP 85.7(0.05) 76.3(0.09) 7.2(2.28) 74.4(0.06) 58.7(0.08) 2.2(0.89)
Lassointer 0(0) 0(0) 0.6(0) 83(0.05) 72.2(0.07) 4.8(1.18)
MCPinter 0(0) 0(0) 0.6(0) 82.9(0.05) 72.1(0.06) 4.8(1.17)
sgMCP 97.6(0.02) 90.2(0.04) 12.7(1.34) 98.8(0.01) 86.7(0.12) 9.1(1.67)
C-MNL Sig 83.6(0.05) 71.9(0.07) 6.7(1.56) 77.6(0.07) 62.6(0.09) 2.2(0.9)
Lasso 85.9(0.05) 75.9(0.08) 7.3(1.53) 88.8(0.06) 84.3(0.08) 7(1.23)
MCP 76.8(0.06) 58.9(0.09) 4.2(1.59) 69.1(0.07) 48.4(0.09) 2(1.02)
Lassointer 0(0) 0(0) 0.6(0) 82(0.06) 68.9(0.09) 3.1(1.27)
MCPinter 0(0) 0(0) 0.6(0) 81.5(0.06) 68.3(0.09) 3.1(1.22)
sgMCP 94.7(0.03) 84.9(0.06) 10.6(1.62) 98.2(0.02) 82.7(0.13) 7.5(2.31)
C-t Sig 71.9(0.1) 54.7(0.13) 4.1(1.85) 65.9(0.1) 45.6(0.12) 0.9(0.64)
Lasso 76.2(0.08) 61.6(0.11) 5(1.77) 79.6(0.08) 71.2(0.11) 4.8(1.55)
MCP 71(0.09) 53.4(0.13) 3.8(1.85) 64(0.08) 44.5(0.09) 2.4(0.86)
Lassointer 0(0) 0(0) 0.6(0) 71.1(0.09) 53.8(0.12) 2(1.37)
MCPinter 0(0) 0(0) 0.6(0) 70.5(0.09) 52.5(0.12) 1.9(1.27)
sgMCP 89.2(0.06) 76.2(0.11) 8.2(2.41) 90(0.06) 70.6(0.17) 4.4(2.14)
B-Logistic Sig 56.3(0.08) 32.3(0.09) 1(0.99) 56.4(0.08) 32.3(0.1) 0.2(0.33)
Lasso 47.5(0.07) 44.9(0.07) 2.8(1.02) 41.4(0.06) 39.1(0.08) 0.9(0.82)
MCP 45(0.06) 39.9(0.08) 2.7(1.01) 37.5(0.07) 31.5(0.09) 0.6(0.58)
Lassointer 0(0) 0(0) 0.6(0) 46.2(0.09) 36(0.11) 0.5(0.6)
MCPinter 0(0) 0(0) 0.6(0) 47(0.09) 36.9(0.11) 0.5(0.61)
sgMCP 64.4(0.06) 38.5(0.08) 2.3(1.38) 64.3(0.08) 38.2(0.09) 0.8(0.74)

Table A15:

Simulation results, mean(sd) over 200 replications. Setting: p = 1,000, SNP data with Band structure, 16 main effects and 12 interactions.

Main: AUC pAUC Top40 | Interaction: AUC pAUC Top40
n=100 C-Norm Sig 65.1(0.07) 44.6(0.09) 2.6(1.3) 60.5(0.07) 38.5(0.08) 0.5(0.65)
Lasso 69.3(0.05) 50.1(0.07) 2.5(1.38) 72.5(0.09) 61.2(0.1) 2.6(1.03)
MCP 62.7(0.06) 41.8(0.09) 2.4(1.4) 61.4(0.08) 42.1(0.1) 1.9(0.89)
Lassointer 0(0) 0(0) 0.6(0) 67.6(0.07) 47.8(0.09) 1.3(0.88)
MCPinter 0(0) 0(0) 0.6(0) 66.7(0.07) 46.2(0.09) 1.3(0.88)
sgMCP 80.4(0.04) 62.8(0.07) 5.5(1.7) 82.9(0.05) 63.8(0.12) 3(1.15)
C-MNL Sig 61.2(0.08) 38(0.1) 1.9(1.27) 59.6(0.08) 37.1(0.09) 0.5(0.53)
Lasso 66.8(0.08) 46.6(0.11) 2.9(1.52) 71.3(0.08) 59.1(0.1) 2(1.05)
MCP 64.4(0.09) 44.5(0.12) 3.3(1.67) 59.6(0.08) 37.6(0.09) 1.1(0.76)
Lassointer 0(0) 0(0) 0.6(0) 69.5(0.09) 50.5(0.11) 1.4(1.03)
MCPinter 0(0) 0(0) 0.6(0) 67.9(0.08) 48.4(0.11) 1.3(0.98)
sgMCP 75.3(0.07) 53.9(0.1) 4.3(1.63) 76(0.08) 51.5(0.11) 1.6(1.06)
C-t Sig 55.8(0.07) 31.5(0.09) 1.4(1.03) 54.7(0.09) 30.3(0.1) 0.3(0.5)
Lasso 55.6(0.08) 31.4(0.09) 1.4(0.9) 64.8(0.11) 48.2(0.16) 1.3(1.19)
MCP 54.5(0.08) 30.4(0.1) 1.6(1.18) 55.3(0.08) 32.3(0.1) 1(0.93)
Lassointer 0(0) 0(0) 0.6(0) 59.9(0.09) 36.9(0.12) 0.5(0.64)
MCPinter 0(0) 0(0) 0.6(0) 59(0.09) 35.6(0.11) 0.5(0.61)
sgMCP 67.3(0.07) 42.3(0.1) 2.4(1.18) 71.4(0.08) 45.8(0.13) 0.9(1)
B-Logistic Sig 48.5(0.07) 23.6(0.07) 0.6(0.56) 49.5(0.07) 24.8(0.08) 0.1(0.26)
Lasso 39.2(0.12) 30.4(0.09) 1.2(0.99) 31.4(0.11) 29.5(0.09) 0.3(0.47)
MCP 38.1(0.14) 27.8(0.09) 1.1(0.97) 25.1(0.12) 25.7(0.09) 0.2(0.43)
Lassointer 0(0) 0(0) 0.6(0) 44.9(0.08) 25.7(0.08) 0.1(0.3)
MCPinter 0(0) 0(0) 0.6(0) 43.7(0.09) 25.9(0.08) 0.1(0.3)
sgMCP 55.3(0.09) 25.9(0.1) 0.9(0.78) 57.5(0.1) 29.2(0.11) 0.2(0.35)
n=500 C-Norm Sig 91(0.04) 83.9(0.05) 9.3(1.23) 77.8(0.05) 63.6(0.07) 2.4(0.85)
Lasso 92.3(0.03) 87.4(0.04) 10.1(1.29) 85.5(0.06) 80.7(0.07) 6.5(1.06)
MCP 86.4(0.04) 75.8(0.07) 6.3(1.75) 73.7(0.06) 59.3(0.07) 2.4(0.6)
Lassointer 0(0) 0(0) 0.6(0) 76.3(0.07) 61.6(0.08) 3.2(0.85)
MCPinter 0(0) 0(0) 0.6(0) 75.6(0.07) 60.6(0.08) 3.1(0.8)
sgMCP 96.2(0.02) 89.7(0.05) 12.2(1) 99.4(0) 90.7(0.1) 9.1(1.39)
C-MNL Sig 81.7(0.06) 69.7(0.08) 6.1(1.52) 79.9(0.07) 65.2(0.1) 1.9(1.1)
Lasso 83.1(0.06) 72.4(0.08) 7.2(1.74) 91.9(0.04) 88.6(0.06) 7.3(1.49)
MCP 75.7(0.06) 58(0.09) 4.8(1.88) 74.7(0.06) 58(0.09) 2.4(0.63)
Lassointer 0(0) 0(0) 0.6(0) 87.4(0.05) 78.3(0.07) 5.5(1.31)
MCPinter 0(0) 0(0) 0.6(0) 87(0.05) 77.9(0.07) 5.6(1.35)
sgMCP 94.2(0.03) 83.1(0.08) 10.7(1.73) 97.2(0.02) 89.6(0.08) 7.6(1.53)
C-t Sig 68.6(0.09) 50.6(0.11) 3.6(1.68) 61.5(0.08) 39(0.11) 0.6(0.57)
Lasso 71.1(0.09) 54.1(0.12) 4.5(1.61) 73.8(0.1) 62.8(0.14) 3.7(1.6)
MCP 66.7(0.09) 47.9(0.11) 3.9(1.95) 61.7(0.08) 41.8(0.1) 2.3(1.13)
Lassointer 0(0) 0(0) 0.6(0) 79.9(0.09) 67(0.12) 3.6(1.66)
MCPinter 0(0) 0(0) 0.6(0) 79.2(0.09) 65.9(0.13) 3.5(1.58)
sgMCP 84.4(0.07) 68.9(0.11) 7.3(1.97) 89.6(0.05) 75.9(0.11) 4.8(2.1)
B-Logistic Sig 55.3(0.08) 31.2(0.08) 1(0.92) 55.2(0.09) 33.1(0.11) 0.3(0.44)
Lasso 46.7(0.07) 42.1(0.08) 3.1(1.11) 44(0.07) 43.9(0.09) 1(0.89)
MCP 45.2(0.08) 38.6(0.1) 2.6(1.05) 36.6(0.08) 30.1(0.08) 0.2(0.51)
Lassointer 0(0) 0(0) 0.6(0) 47.4(0.08) 36(0.1) 0.5(0.57)
MCPinter 0(0) 0(0) 0.6(0) 48(0.08) 36.9(0.1) 0.5(0.58)
sgMCP 65.2(0.07) 39.2(0.09) 2.4(1.19) 64.3(0.1) 38.9(0.12) 0.6(0.75)
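
The three summaries reported in the simulation tables (AUC, pAUC, Top40) can be illustrated with a small sketch. This is not the authors' code. It assumes that effects are ranked by a score (for example, the absolute value of an estimated coefficient), that pAUC is the area under the ROC curve restricted to false-positive rates below a cutoff and rescaled by that cutoff, and that Top40 counts the true effects among the 40 highest-ranked ones. The function names and the `max_fpr` value are hypothetical; the cutoff used in the study is not stated in this section.

```python
def roc_points(scores, truth):
    """ROC points (fpr, tpr) obtained by sweeping a threshold from high to low score."""
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true effect and one null effect")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Handle tied scores together so that ties yield one diagonal segment.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if truth[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve on [0, max_fpr], divided by max_fpr.

    With max_fpr=1.0 this is the ordinary AUC. The rescaling is an assumption.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at the FPR cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr


def top_k_true_positives(scores, truth, k=40):
    """Number of true effects among the k highest-scoring candidates."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum(truth[i] for i in order[:k])


# Toy example: six candidate effects, three of which are truly nonzero.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
truth = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, truth)
auc = partial_auc(pts)                   # 8/9
pauc = partial_auc(pts, max_fpr=0.5)     # 7/9
top3 = top_k_true_positives(scores, truth, k=3)  # 2
```

Under this reading, a method that produces no ranking of main effects at all would have a Top40 equal to the count expected from a random pick of 40 candidates, which may explain the constant Top40 entries for the interaction-only alternatives in the tables.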

Table A16:

Analysis of the GENEVA diabetes data using the proposed approach: observed occurrence index.

SNP Gene* main age famdb act trans ceraf heme
rs17090278 RP11-593F5.2 0.86 0.85 0.74 0.72 0.69 0.64
rs17090286 RP11-593F5.2 0.87 0.86 0.77 0.72 0.68 0.63
rs13122165 RP11-593F5.2 0.57 0.49
rs17828144 RP11-593F5.2 0.72 0.69
rs17085296 RP11-63H19.1 0.77 0.76 0.75 0.75 0.68 0.63 0.61
rs1430504 RP11-707A18.1 0.75 0.74 0.69 0.68 0.66 0.61 0.58
rs6551878 RP11-707A18.1 0.74 0.74 0.7 0.68 0.68 0.65 0.61
rs6823601 RP11-707A18.1 0.74 0.74 0.7 0.68 0.68 0.65 0.61
rs13107026 MIR1269A 0.64
rs1397755 MIR1269A 0.73
rs13151560 MIR1269A 0.66
rs1858306 MIR1269A 0.59
rs10016795 MIR1269A 0.55
rs12331987 MIR1269A 0.66
rs10000219 MIR1269A 0.64
rs4860208 RPS23P3 0.72
rs1511286 RPS23P3 0.74
rs2136822 RPS23P3 0.55 0.5
rs11936928 RPS23P3 0.78 0.72 0.72 0.69 0.67 0.63 0.61
rs6838523 RPS23P3 0.78 0.73 0.71 0.69 0.66 0.63 0.61
rs17088752 UBA6-AS1 0.77 0.73
rs17088764 UBA6-AS1 0.54 0.49 0.48
rs353169 UBA6-AS1 0.54 0.49 0.48
rs10033058 YTHDC1 0.53
rs2293595 YTHDC1 0.57
rs17089267 YTHDC1 0.57
rs11249477 CSN1S2AP 0.72 0.69 0.67 0.63 0.61 0.52
rs1399247 CSN1S2AP 0.6
rs1717600 CSN1S2AP 0.63
rs10003790 DCK 0.7 0.7 0.69 0.68 0.68 0.66 0.63
rs10012631 DCK 0.68 0.68 0.67 0.67 0.67 0.65 0.63
rs9790462 DCK 0.57 0.53 0.49 0.47
rs12649753 RN7SL218P 0.54 0.46 0.45 0.44 0.42 0.39 0.35
rs7681755 LINC01088 0.67
rs11731223 NAA11 0.52
rs17003746 GK2 0.54
rs17003749 GK2 0.52
rs10004901 C4orf22 0.64 0.61 0.58 0.56 0.55 0.51
rs1391262 RP11-689K5.3 0.58
rs35036928 RP11-689K5.3 0.7
rs4693369 RP11-689K5.3 0.66
rs7672440 RP11-689K5.3 0.69
rs676592 RP11-689K5.3 0.87
rs1993798 RP11-689K5.3 0.68 0.63 0.57 0.51 0.48 0.45
rs2868257 RP11-689K5.3 0.71 0.68
rs6535281 RP11-689K5.3 0.68 0.63 0.57 0.51 0.48 0.44
rs612318 RP11-689K5.3 0.59
rs1824657 RP11-689K5.3 0.63 0.59 0.54 0.48 0.46 0.4
rs11722328 RP11-689K5.3 0.7 0.67 0.65 0.63 0.61
rs2199487 RP11-689K5.3 0.7 0.67 0.65 0.62 0.61
rs392112 RP11-218C23.1 0.62
rs434193 RP11-218C23.1 0.63
rs6842681 RP11-218C23.1 0.61
rs416035 RP11-218C23.1 0.63
rs432755 RP11-218C23.1 0.61
rs375432 RP11-218C23.1 0.63
rs407430 RP11-218C23.1 0.61
rs400023 RP11-218C23.1 0.63
rs585787 RP11-218C23.1 0.59
rs3775373 FAM13A 0.56 0.51 0.5
rs2726516 PPA2 0.54 0.53 0.51
rs2636739 PPA2 0.55 0.54 0.51
rs2686293 RP13-612N21.1 0.72
* Genes that the SNPs belong to or are closest to.

Table A17:

Analysis of the TCGA SKCM data using the proposed approach: observed occurrence index.

Gene main Age PN Gender Breslow’s depth Clark level
ZNF25 0.68 0.67 0.67 0.67 0.55
ZNF37A 0.42 0.42 0.42 0.42 0.38 0.2
PJA2 0.41
KIF5B 0.58 0.56 0.56 0.56 0.47
NUDT4 0.49
KTN1 0.68 0.61 0.6 0.56 0.46
LRP12 0.62
TRIP11 0.54 0.47 0.45 0.45 0.43
EEA1 0.57
ARHGAP12 0.7 0.66 0.66 0.64 0.55
APC 0.4 0.35 0.35 0.34
FNDC3A 0.39 0.34 0.34
EIF5 0.48 0.47 0.47 0.47 0.45 0.38
DGLUCY 0.59 0.53 0.52 0.5
ARHGAP5 0.65
RRM2B 0.42 0.39 0.38
UHRF1BP1L 0.45 0.37 0.37 0.31 0.27
SEL1L 0.54
VPS13C 0.45 0.29 0.28
PDP2 0.39 0.32 0.3 0.28 0.22
GOLGA5 0.68 0.65 0.6 0.58
HSPA13 0.4 0.31 0.31 0.29 0.28
RASSF8 0.53
EFR3A 0.35 0.31 0.31 0.3 0.28
PCNX1 0.42 0.36 0.35 0.33 0.28
PPM1A 0.68 0.54 0.49 0.47 0.39
DNAL1 0.4 0.4 0.4 0.4 0.31
CUL2 0.37 0.35 0.35 0.35 0.33
ZMYND11 0.52 0.49 0.49 0.49 0.25
MPP5 0.45 0.34 0.32 0.32 0.25
CCPG1 0.49
HIF1A 0.4
PNMA1 0.43 0.38 0.37 0.37
EXOC5 0.43 0.38 0.38 0.37 0.32
GSKIP 0.42 0.38 0.35 0.34 0.25
ARL1 0.43
EWSR1 0.41 0.3 0.29
CASC4 0.41
RCC2 0.45
ARHGAP21 0.53
JKAMP 0.64
ACBD5 0.46 0.45 0.45 0.45 0.4
SETD3 0.45 0.43 0.41 0.41 0.37
EHMT1 0.32
RAB18 0.41 0.37 0.35 0.35 0.3
CREG1 0.42
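
The observed occurrence index reported in Tables A16 and A17 is a stability summary. A minimal sketch follows, assuming the index is the fraction of random subsamples in which an effect is identified; the subsampling fraction, the number of resamples, and the `select_effects` callback are hypothetical placeholders and not the authors' settings.

```python
import random
from collections import Counter


def occurrence_index(n_samples, select_effects, n_resamples=100,
                     fraction=0.8, seed=0):
    """Fraction of random subsamples in which each effect is identified.

    select_effects(subset) is a user-supplied function (hypothetical here) that
    refits the model on the subjects indexed by `subset` and returns an iterable
    of identified effect labels, e.g. "gene1:main" or "gene1:E1".
    """
    rng = random.Random(seed)
    size = max(1, int(round(fraction * n_samples)))
    counts = Counter()
    for _ in range(n_resamples):
        subset = rng.sample(range(n_samples), size)  # subsample without replacement
        counts.update(set(select_effects(subset)))   # count each effect once per resample
    return {effect: c / n_resamples for effect, c in counts.items()}


def toy_selector(subset):
    """Stand-in for a real fit: the main effect is always selected, the
    interaction only when subject 0 is in the subsample."""
    selected = ["gene1:main"]
    if 0 in subset:
        selected.append("gene1:E1")
    return selected


ooi = occurrence_index(n_samples=50, select_effects=toy_selector,
                       n_resamples=200, fraction=0.8, seed=1)
```

In the toy selector the interaction can only be selected together with its main effect, so its index cannot exceed that of the main effect, which mirrors the pattern of the rows in the two tables.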

Footnotes

Data Availability Statement

The GENEVA diabetes data that support the findings of this study are available from the Nurses’ Health Study/Health Professionals Follow-up Study. Restrictions apply to the availability of these data, which were used under license for this study. Data are available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000091.v2.p1 with the permission of the National Human Genome Research Institute.

The TCGA skin cutaneous melanoma data that support the findings of this study are available from The Cancer Genome Atlas Program. The data were generated by the TCGA Research Network: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga.

References

  1. Bien J, Simon N, & Tibshirani R (2015). Convex hierarchical testing of interactions. The Annals of Applied Statistics, 9(1), 27–42.
  2. Bien J, Taylor J, & Tibshirani R (2013). A lasso for hierarchical interactions. The Annals of Statistics, 41(3), 1111–1141. doi: 10.1214/13-AOS1096
  3. Breheny P, & Huang J (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. The Annals of Applied Statistics, 5(1), 232–253. doi: 10.1214/10-AOAS388
  4. Chen J, & Chen Z (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771. doi: 10.1093/biomet/asn034
  5. Dai JY, Logsdon BA, Huang Y, Hsu L, Reiner AP, Prentice RL, & Kooperberg C (2012). Simultaneously testing for marginal genetic association and gene-environment interaction. American Journal of Epidemiology, 176(2), 164–173. doi: 10.1093/aje/kwr521
  6. Gauderman WJ, Zhang P, Morrison JL, & Lewinger JP (2013). Finding novel genes by testing G × E interactions in a genome-wide association study. Genetic Epidemiology, 37(6), 603–613. doi: 10.1002/gepi.21748
  7. Hao N, & Zhang HH (2017). A note on high-dimensional linear regression with interactions. The American Statistician, 71(4), 291–297. doi: 10.1080/00031305.2016.1264311
  8. Huang J, & Ma S (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2), 176–195. doi: 10.1007/s10985-009-9144-2
  9. Hunter DJ (2005). Gene–environment interactions in human diseases. Nature Reviews Genetics, 6(4), 287–298. doi: 10.1038/nrg1578
  10. Hutter CM, Mechanic LE, Chatterjee N, Kraft P, Gillanders EM, & NCI Gene-Environment Think Tank. (2013). Gene-environment interactions in cancer epidemiology: a National Cancer Institute Think Tank report. Genetic Epidemiology, 37(7), 643–657. doi: 10.1002/gepi.21756
  11. Kim G, Lai CQ, Arnett DK, Parnell LD, Ordovas JM, Kim Y, & Kim J (2017). Detection of gene–environment interactions in a family-based population using SCAD. Statistics in Medicine, 36(22), 3547–3559. doi: 10.1002/sim.7382
  12. Kraft P, Yen Y-C, Stram DO, Morrison J, & Gauderman WJ (2007). Exploiting gene-environment interaction to detect genetic associations. Human Heredity, 63(2), 111–119. doi: 10.1159/000099183
  13. Li J, Dan J, Li C, & Wu R (2013). A model-free approach for detecting interactions in genetic association studies. Briefings in Bioinformatics, 15(6), 1057–1068. doi: 10.1093/bib/bbt082
  14. Lim M, & Hastie T (2015). Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics, 24(3), 627–654. doi: 10.1080/10618600.2014.938812
  15. Liu J, Huang J, Zhang Y, Lan Q, Rothman N, Zheng T, & Ma S (2013). Identification of gene-environment interactions in cancer studies using penalization. Genomics, 102(4), 189–194. doi: 10.1016/j.ygeno.2013.08.006
  16. Meinshausen N, & Bühlmann P (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473. doi: 10.1111/j.1467-9868.2010.00740.x
  17. Vrieling A (2013). Evidence of gene–environment interactions between common breast cancer susceptibility loci and established environmental risk factors. PLoS Genetics, 9(3), e1003284. doi: 10.1371/journal.pgen.1003284
  18. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, & Moore JH (2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics, 69(1), 138–147. doi: 10.1086/321276
  19. She Y, Wang Z, & Jiang H (2018). Group regularized estimation under structural hierarchy. Journal of the American Statistical Association, 113(521), 445–454. doi: 10.1080/01621459.2016.1260470
  20. Shi X, Liu J, Huang J, Zhou Y, Xie Y, & Ma S (2014). A penalized robust method for identifying gene–environment interactions. Genetic Epidemiology, 38(3), 220–230. doi: 10.1002/gepi.21795
  21. Simonds NI, Ghazarian AA, Pimentel CB, Schully SD, Ellison GL, Gillanders EM, & Mechanic LE (2016). Review of the Gene-Environment Interaction Literature in Cancer: What Do We Know? Genetic Epidemiology, 40(5), 356–365. doi: 10.1002/gepi.21967
  22. Stute W (1996). Distributional convergence under random censorship when covariables are present. Scandinavian Journal of Statistics, 461–471. doi: 10.1007/s004400050075
  23. Sun R, Carroll RJ, Christiani DC, & Lin X (2018). Testing for gene–environment interaction under exposure misspecification. Biometrics, 74(2), 653–662. doi: 10.1111/biom.12813
  24. Thomas D (2010). Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics, 11(4), 259–272. doi: 10.1038/nrg2764
  25. Walter SD (2005). The partial area under the summary ROC curve. Statistics in Medicine, 24(13), 2025–2040. doi: 10.1002/sim.2103
  26. Wang X, Xu Y, & Ma S (2019). Identifying gene–environment interactions incorporating prior information. Statistics in Medicine, 38(9), 1620–1633. doi: 10.1002/sim.8064
  27. Wu C, Cui Y, & Ma S (2014). Integrative analysis of gene-environment interactions under a multi-response partially linear varying coefficient model. Statistics in Medicine, 33(28), 4988–4998. doi: 10.1002/sim.6287
  28. Wu C, Jiang Y, Ren J, Cui Y, & Ma S (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37(3), 437–456. doi: 10.1002/sim.7518
  29. Wu M, Zang Y, Zhang S, Huang J, & Ma S (2017). Accommodating missingness in environmental measurements in gene-environment interaction analysis. Genetic Epidemiology, 41(6), 523–554. doi: 10.1002/gepi.22055
  30. Wu C, Kraft P, Zhai K, Chang J, Wang Z, Li Y, … Abnet CC (2012). Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions. Nature Genetics, 44(10), 1090–1097. doi: 10.1038/ng.2411
  31. Zhang C-H (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38(2), 894–942. doi: 10.1214/09-AOS729
  32. Zhao N, Zhang H, Clark JJ, Maity A, & Wu MC (2018). Composite kernel machine regression based on likelihood ratio test for joint testing of genetic and gene–environment interaction effect. Biometrics. doi: 10.1111/biom.13003
  33. [dataset] Nurses’ Health Study/Health Professionals Follow-up Study; 2009; GENEVA Genes and Environment Initiatives in Type 2 Diabetes; dbGaP; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000091.v2.p1
  34. [dataset] Cancer Genome Atlas Program; 2019; The TCGA skin cutaneous melanoma data; National Cancer Institute; https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
