Author manuscript; available in PMC: 2015 Jun 21.
Published in final edited form as: Hum Hered. 2014 Jun 21;78(1):17–26. doi: 10.1159/000360161

Analysis of Gene-Gene Interactions Using Gene-Trait Similarity Regression

Xin Wang 1,2, Michael P Epstein 3, Jung-Ying Tzeng 1,2,4,*
PMCID: PMC4115296  NIHMSID: NIHMS571225  PMID: 24969398

Abstract

Objective

Gene-Gene interactions (GxG) are important to study because of their extensiveness in biological systems and their potential in explaining missing heritability of complex traits. In this work, we propose a new similarity-based test to assess GxG at gene level, which permits the study of epistasis at biologically functional units with amplified interaction signals.

Methods

Under the framework of gene-trait similarity regression (SimReg), we propose a gene-based test for detecting gene-gene interactions. SimReg uses a regression model to correlate trait similarity with genotypic similarity across a gene. Unlike existing gene-level methods based on leading principal components (PCs), SimReg summarizes all information on genotypic variation within a gene and can be used to assess the joint/interactive effects of two genes as well as the effect of one gene conditional on another.

Results

Using simulations and a real-data application to a warfarin study, we show that the SimReg GxG tests have satisfactory power and robustness under different genetic architectures when compared with existing gene-based interaction tests based on PC analysis or partial least squares (PLS). A genomewide association study with ~20,000 genes may be completed on a parallel computing system in 2 weeks.

Introduction

Gene-Gene interactions (GxG) are important to study since they are believed to be widespread in biological systems [1, 2] and are likely involved in gene regulation, signal transduction, biochemical networks, as well as other physiological and developmental pathways [3-5]. GxG further can provide insight into the missing heritability of complex traits [6-8] and explain replication failures of initial GWAS findings [9-10]. Regarding this latter point, GxG help explain between-study differences in marginal genetic effects, which can be due to between-study differences in frequency of modifier genetic variant(s). The study of GxG has provided insight into biological mechanisms for many complex diseases, including Alzheimer’s disease, diabetes, cardiovascular disease, autism, multiple sclerosis, and cancer [11-17].

Analysts often implement GxG tests using a regression model accounting for the main effects of two single nucleotide polymorphisms (SNPs) and also the two-way interaction between the two SNPs. The interaction effect can be assessed on the additive scale or the multiplicative scale---the former examines the effect on the phenotype on the linear scale, and the latter examines the effect on the log scale of the phenotype. The additive scale is often more relevant to public health importance [18], while the multiplicative scale more naturally corresponds to the biological mechanisms [19]. Regardless of scale, one can test whether the interaction parameter is different from zero or, alternatively, consider a test similar to that of Chapman and Clayton [20] to assess the effect of a SNP in the presence of interaction with a second SNP. While analysts primarily applied such SNP-SNP interaction tests in small-scale candidate-gene studies, there has also been keen interest in performing exhaustive interaction testing of SNPs in a GWAS [21]. One can perform such genomewide analyses using standard tools like regression [6]. However, one can also apply more innovative techniques, such as two-stage screening procedures [22], Bayes networks [23], Bayesian model averaging [24], logic regression [25] and data-mining procedures [26-29].

All of these existing interaction methods consider the analysis on the level of a SNP. However, there is increasing interest in performing such analyses on the broader level of a gene [30-32]. Several factors motivate the paradigm shift from SNP to gene. First, genes are the basic units in the biological mechanism and SNPs within a gene tend to work concordantly. Thus, gene-level results may be more biologically insightful and easier to interpret. Second, a gene-level analysis incorporates linkage disequilibrium (LD) information from all SNPs simultaneously within the gene. Consequently, such joint analysis of SNPs should have improved ability to tag untyped causal variants compared to the analysis of individual SNPs, leading to improved power. Finally, if a gene harbors multiple causal variants, then joint analysis of SNPs in aggregate should be more powerful than separate analysis of each individual SNP (owing in part to the gene-based test often having less degrees of freedom than its individual-SNP counterpart).

A few gene-based methods for interaction testing exist. Chatterjee et al. [33] proposed Tukey’s 1-df method to investigate an interaction between two candidate genes. The approach calculates the sum of the main effects of SNPs contained within each gene and then uses the product of the two sums as the GxG interaction term. Thus, the approach models one interaction parameter at the gene level rather than several interaction parameters on the SNP level. Owing to the reduced degrees of freedom of the Tukey test, the authors showed that their approach led to a great improvement in power to detect GxG compared to individual-SNP analysis. Motivated by this idea, Wang et al. [32] considered two different interaction tests that summarized SNP information within a gene. For the first test, the authors used principal component analysis (PCA) to summarize LD information of SNPs within a gene. They then created an interaction test using the top (first) principal component from each gene. For their second test, the authors applied partial least squares (PLS) to extract components that summarize both the LD information among SNPs in a gene and the correlation between such SNPs and the outcome of interest. The authors then constructed their interaction test using the top PLS component from each gene. Using simulated data, the PCA and PLS methods often had better performance than the Tukey 1-df method, particularly when causal SNPs had no or negligible marginal effect.

In this article, we propose a new gene-based test for detecting gene-gene interactions in complex traits using similarity regression (SimReg) [34]. SimReg is an analytic procedure that uses a regression model to correlate trait similarity with genotypic similarity across a gene. SimReg is inspired by Haseman-Elston regression from linkage analysis [35, 36] and haplotype similarity tests for regional association [37, 38]. In SimReg, the trait similarity is quantified by the trait covariance adjusting for the covariates. The multi-marker information of a gene is first summarized by genetic similarity, which is measured using a pre-specified metric such as the proportion of alleles shared identity by state (IBS) across the gene. The gene-gene interaction is then modeled by taking the product of the genetic similarities of the two genes, and its significance is assessed by testing the significance of the corresponding regression coefficient. Compared to the Tukey/PCA/PLS based methods, we believe SimReg has several advantages. First, SimReg utilizes all the information within a gene while the PCA/PLS procedures may lose some information due to consideration of only the top component of each gene within the interaction analysis. Second, as the top PCA/PLS component from each gene is a weighted sum of SNP information, the use of the product of the two linear combinations as the interaction term in the model only captures limited forms of non-additive effect. In contrast, SimReg provides a tool to model a variety of effects through different measures of summarized gene-similarity information. At the same time, SimReg does not involve a large number of parameters so that it has a good power performance. Finally, as we show later, we can obtain an analytic p-value for the SimReg procedure without the need for intensive resampling procedures like those required for the Tukey approach.

We arrange the remainder of the paper as follows. We first present the SimReg approach and describe how to use the framework to construct three score tests of interest (interaction test, joint test and conditional main effect test). We derive the distributions of these tests under the null hypothesis by connecting the approach to a variance component model. We use simulated data to examine the performance of the proposed method against the PCA and PLS methods, as well as a standard SNP-based test of interaction. We also apply SimReg to the Warfarin data from the study of Wysowski et al. [39]. Finally, we discuss further research directions for the proposed approach.

Method

The Gene-Trait Similarity Model

We assume a sample of N subjects. For subject i (i = 1, …, N), we denote Yi as the trait value, Xi a K × 1 vector of covariates (including the intercept term), and Gm,i a coding of the subject’s genotype at SNP m.

Define $S_{ij}^{A}$ to be the genetic similarity of gene A between subjects i and j ($i \neq j$). There are many ways to describe the genetic similarity between individuals. Here, we model similarity as the weighted sum of the proportion of alleles shared identical by state (IBS) across the $M_A$ SNPs in gene A. That is, for subjects i and j, the similarity level is $S_{ij}^{A} = \sum_{m=1}^{M_A} w_m s_{m,ij}$, where $s_{m,ij} = x/2$ if $G_{m,i}$ and $G_{m,j}$ share x alleles IBS [40, 41], and the weight $w_m$ can be used to up-weight or down-weight a variant based on allele frequencies, the degree of evolutionary conservation, or the functionality of the variations [40, 42, 43]. In this work, to up-weight similarities that are contributed by rare alleles, we set $w_m = q_m^{-1}$, where $q_m$ is the minor allele frequency of marker m [44, 45]. We define the similarity level in gene B, $S_{ij}^{B}$, in the same manner. The trait similarity between individuals i and j, denoted by $Z_{ij}$, is computed as $Z_{ij} = (Y_i - \mu_i)(Y_j - \mu_j)$, where $\mu_i = E(Y_i \mid X_i) = X_i'\gamma$ is the conditional mean of the trait with no genotype effect and $\gamma$ is the effect of the covariates. The gene-trait similarity regression for gene-gene interactions is

$$E(Z_{ij} \mid X, G) = \tau_A S_{ij}^{A} + \tau_B S_{ij}^{B} + \tau_{AB} S_{ij}^{A} S_{ij}^{B}. \qquad (1)$$

Note that the regression model has zero intercept because we incorporated the covariate effects when quantifying trait similarity [34].
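The similarity quantities entering model (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R implementation: genotypes are assumed to be coded as minor-allele counts 0/1/2, so that two subjects share 2 − |g_i − g_j| alleles IBS at a SNP, and the function names (`ibs_share`, `gene_similarity`, `trait_similarity`, `interaction_similarity`) are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ibs_share(g_i: int, g_j: int) -> float:
    """s_{m,ij} = x / 2, where x is the number of alleles shared IBS.

    Assumes genotypes are minor-allele counts in {0, 1, 2}.
    """
    return (2 - abs(g_i - g_j)) / 2.0


def gene_similarity(genotypes: Sequence[Sequence[int]],
                    maf: Sequence[float]) -> Matrix:
    """S_ij = sum_m w_m * s_{m,ij}, with weights w_m = 1 / q_m.

    `genotypes` is a subjects-by-SNPs table for one gene and `maf`
    holds the minor allele frequency q_m of each SNP.
    """
    weights = [1.0 / q for q in maf]
    n = len(genotypes)
    return [[sum(w * ibs_share(a, b)
                 for w, a, b in zip(weights, genotypes[i], genotypes[j]))
             for j in range(n)]
            for i in range(n)]


def trait_similarity(y: Sequence[float], mu: Sequence[float]) -> Matrix:
    """Z_ij = (Y_i - mu_i) * (Y_j - mu_j), with mu_i the covariate-only mean."""
    resid = [yi - mi for yi, mi in zip(y, mu)]
    return [[ri * rj for rj in resid] for ri in resid]


def interaction_similarity(s_a: Matrix, s_b: Matrix) -> Matrix:
    """Element-wise product S^A_ij * S^B_ij used as the GxG regressor."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(s_a, s_b)]


# Toy example: 3 subjects, 2 SNPs in gene A.
geno_a = [[0, 1], [2, 1], [0, 0]]
s_a = gene_similarity(geno_a, maf=[0.5, 0.25])
```

Only the off-diagonal entries (i ≠ j) enter the similarity regression; the diagonal is computed here simply because the matrix form is convenient for the variance-component tests.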

The Interaction Test

The interaction test examines the hypothesis H0,Int: τAB = 0. We derive the score test of this hypothesis from model (1) by taking advantage of the connection between the similarity model and a variance component model [34]. Below, we first show that the regression coefficients in (1) can be viewed as the variance components under a mixed model. We then construct the test of τAB using the same test from the corresponding variance component model. Specifically, consider the following working mixed model:

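$$Y_i = X_i'\gamma + g_{A,i} + g_{B,i} + g_{AB,i} + e_i, \qquad (2)$$

where $e_i \sim N(0, \sigma)$, and $g_{A,i}$, $g_{B,i}$ and $g_{AB,i}$ are subject-specific genetic effects for gene A, gene B and the gene-gene interaction, respectively. Let $g_A = (g_{A,1}, \ldots, g_{A,n})'$, $g_B = (g_{B,1}, \ldots, g_{B,n})'$, and $g_{AB} = (g_{AB,1}, \ldots, g_{AB,n})'$, and assume that

$$g_A \sim MN(0, v_A S^{A}), \quad g_B \sim MN(0, v_B S^{B}), \quad g_{AB} \sim MN(0, v_{AB} S^{AB}),$$

where $S^{A} = \{S_{ij}^{A}\}$, $S^{B} = \{S_{ij}^{B}\}$, and $S^{AB} = \{S_{ij}^{A} \times S_{ij}^{B}\}$. Under model (2), we can obtain the marginal trait covariance by

$$\begin{aligned}
\mathrm{cov}(Y_i, Y_j \mid X, G) &= \mathrm{cov}_{g}\{E(Y_i \mid X, G, g_A, g_B, g_{AB}),\; E(Y_j \mid X, G, g_A, g_B, g_{AB})\} \\
&= \mathrm{cov}_{g}\{X_i'\gamma + g_{A,i} + g_{B,i} + g_{AB,i},\; X_j'\gamma + g_{A,j} + g_{B,j} + g_{AB,j}\} \\
&= v_A S_{ij}^{A} + v_B S_{ij}^{B} + v_{AB} S_{ij}^{A} S_{ij}^{B}. \qquad (3)
\end{aligned}$$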

Comparing (3) with (1), we have τA = vA, τB = vB, and τAB = vAB.
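As a small illustration of this correspondence, the marginal covariance matrix implied by the working mixed model can be assembled directly from the similarity matrices. The sketch below is an assumption-laden toy, not part of the original software: the function name `marginal_covariance` is hypothetical, the off-diagonal entries follow equation (3), and the residual variance σ is added on the diagonal as in the matrix V = τA SA + τB SB + τAB SAB + σI of Appendix A.

```python
from typing import List

Matrix = List[List[float]]


def marginal_covariance(s_a: Matrix, s_b: Matrix,
                        v_a: float, v_b: float, v_ab: float,
                        sigma: float) -> Matrix:
    """V = v_A S^A + v_B S^B + v_AB (S^A o S^B) + sigma I.

    'o' is the element-wise product; sigma is the residual variance.
    """
    n = len(s_a)
    return [[v_a * s_a[i][j]
             + v_b * s_b[i][j]
             + v_ab * s_a[i][j] * s_b[i][j]
             + (sigma if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


# Toy example with two subjects.
v = marginal_covariance([[1.0, 0.5], [0.5, 1.0]],
                        [[1.0, 0.2], [0.2, 1.0]],
                        v_a=2.0, v_b=3.0, v_ab=4.0, sigma=1.0)
```

Setting `v_ab = 0` gives the null covariance used by the interaction test, which is why testing τAB = 0 in model (1) is equivalent to testing the interaction variance component.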

We derive the score function of the REML log-likelihood function of (2) in the Appendix A, and obtain the score statistic under H0,Int as

$$T_{Int} = \tfrac{1}{2}\, Y' P_{Int} S^{AB} P_{Int} Y \,\Big|_{\tau_A = \hat\tau_A,\, \tau_B = \hat\tau_B,\, \sigma = \hat\sigma},$$

where $Y = (Y_1, \ldots, Y_n)'$, $P_{Int} = V_{Int}^{-1} - V_{Int}^{-1} X (X' V_{Int}^{-1} X)^{-1} X' V_{Int}^{-1}$, $V_{Int} = \tau_A S^{A} + \tau_B S^{B} + \sigma I$, and $(\hat\tau_A, \hat\tau_B, \hat\sigma)$ are the REML estimates obtained under $H_{0,Int}: \tau_{AB} = 0$. We describe an EM algorithm to obtain $(\hat\tau_A, \hat\tau_B, \hat\sigma)$ in Appendix B.

Under the alternative hypothesis $\tau_{AB} \neq 0$, $T_{Int}$ is a strictly increasing function of $\tau_{AB}$. Therefore, larger values of $T_{Int}$ provide stronger evidence against $H_{0,Int}$, which suggests that the testing procedure should be one-sided. As shown in Appendix A, $T_{Int}$ follows a weighted $\chi^2$ distribution. That is, define $C_{Int} = \tfrac{1}{2} V_{Int}^{1/2} P_{Int} S^{AB} P_{Int} V_{Int}^{1/2}$; then $T_{Int} \sim \sum_{j=1}^{c} \lambda_{j,Int}\, \chi_1^2$, where the $\lambda_{j,Int}$ are the ordered nonzero eigenvalues of the matrix $C_{Int}$. We can calculate the p-values analytically using moment-matching approximations [46].
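To make the moment-matching idea concrete, one common two-moment (Satterthwaite-type) scheme is sketched below. This particular scheme is an assumption chosen for illustration; the approximation in [46] may match more moments or use a different reference distribution.

```latex
% Mean and variance of a weighted sum of independent chi-square(1) variables
E(T_{Int}) = \sum_{j=1}^{c} \lambda_{j,Int} = \mathrm{tr}(C_{Int}), \qquad
\mathrm{Var}(T_{Int}) = 2 \sum_{j=1}^{c} \lambda_{j,Int}^{2} = 2\,\mathrm{tr}(C_{Int}^{2}).

% Match these to a scaled chi-square a \chi^2_d, for which E = a d and Var = 2 a^2 d
a = \frac{\sum_{j} \lambda_{j,Int}^{2}}{\sum_{j} \lambda_{j,Int}}, \qquad
d = \frac{\left(\sum_{j} \lambda_{j,Int}\right)^{2}}{\sum_{j} \lambda_{j,Int}^{2}}.

% One-sided p-value for an observed statistic t_{obs}
p \approx \Pr\!\left(\chi^{2}_{d} > t_{obs} / a\right).
```

Because only the traces of $C_{Int}$ and $C_{Int}^2$ are needed, this version avoids an explicit eigen-decomposition.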

The Joint Test

Instead of performing a test specifically for interactions, one may be interested in assessing whether the two genes have any marginal or interactive effects on the trait [47]. This can be done using the joint test to examine the null hypothesis $H_{0,Joint}: \tau_A = \tau_B = \tau_{AB} = 0$ under the full model $E(Z_{ij} \mid X, G) = \tau_A S_{ij}^{A} + \tau_B S_{ij}^{B} + \tau_{AB} S_{ij}^{A} S_{ij}^{B}$. As shown in Appendix A, the test statistic is

$$T_{Joint} = \tfrac{1}{2}\, Y' P_{Joint} (S^{A} + S^{B} + S^{AB}) P_{Joint} Y \,\Big|_{\sigma = \breve\sigma},$$

where $P_{Joint} = \sigma^{-1}\{I - X(X'X)^{-1}X'\}$ and $\breve\sigma = Y'\{I - X(X'X)^{-1}X'\}Y/(n - K)$. $T_{Joint}$ also follows a weighted $\chi^2$ distribution, i.e., $T_{Joint} \sim \sum_{j=1}^{c} \lambda_{j,Joint}\, \chi_1^2$, with $\lambda_{j,Joint}$ the ordered nonzero eigenvalues of $C_{Joint} = \tfrac{1}{2} V_{Joint}^{1/2} P_{Joint} (S^{A} + S^{B} + S^{AB}) P_{Joint} V_{Joint}^{1/2}$, where $V_{Joint} = \sigma I$. As with the interaction test, we calculate the p-value of the joint test analytically using moment-matching procedures.
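The joint statistic needs no iterative estimation, which makes it easy to sketch. The Python snippet below is a simplified illustration, not the authors' code: it assumes the intercept is the only covariate (K = 1), so that {I − X(X′X)⁻¹X′}Y reduces to the mean-centered trait, and the function name `joint_score_statistic` is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def joint_score_statistic(y: Sequence[float], s_a: Matrix, s_b: Matrix) -> float:
    """T_Joint = 0.5 * r' (S^A + S^B + S^AB) r / sigma^2 for intercept-only X.

    r is the centered trait and sigma = r'r / (n - 1) is the residual
    variance estimate (the n - K divisor with K = 1).
    """
    n = len(y)
    mean_y = sum(y) / n
    r = [yi - mean_y for yi in y]
    sigma = sum(ri * ri for ri in r) / (n - 1)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            # S^AB is the element-wise product of the two gene similarities.
            s_sum = s_a[i][j] + s_b[i][j] + s_a[i][j] * s_b[i][j]
            quad += r[i] * s_sum * r[j]
    return 0.5 * quad / sigma ** 2


# Toy example: identity similarity matrices for three subjects.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
t_joint = joint_score_statistic([1.0, 2.0, 3.0], identity, identity)
```

With additional covariates, the centering step would be replaced by the residuals from an ordinary least-squares fit of Y on X, and the divisor by n − K.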

The Conditional Main Effect Test

We also construct the score test for the main effect of a gene conditional on the effect of the other gene, assuming no gene-gene interaction. We make this assumption because, in scenarios where interaction effects do exist, the main effects are typically not well defined and their significance depends on the scale of the interacting variables. When there is no gene-gene interaction, the full model is

$$E(Z_{ij} \mid X, G) = \tau_A S_{ij}^{A} + \tau_B S_{ij}^{B}.$$

Here, to test the main effect of Gene A accounting for the effect of Gene B, we evaluate the null hypothesis $H_{0,A}: \tau_A = 0$. We can similarly test the main effect of Gene B accounting for the effect of Gene A by evaluating $H_{0,B}: \tau_B = 0$. As with the score tests for the interaction and joint hypotheses, we derive the score test statistic for the main effect of Gene A conditional on Gene B as

$$T_A = \tfrac{1}{2}\, Y' P_A S^{A} P_A Y \,\Big|_{\tau_B = \tilde\tau_B,\, \sigma = \tilde\sigma},$$

where $P_A = V_A^{-1} - V_A^{-1} X (X' V_A^{-1} X)^{-1} X' V_A^{-1}$, $V_A = \tau_B S^{B} + \sigma I$, and $(\tilde\tau_B, \tilde\sigma)$ are the REML estimates obtained under $H_{0,A}: \tau_A = 0$. We describe the EM algorithm to obtain $(\tilde\tau_B, \tilde\sigma)$ in Appendix B. The test statistic $T_B$ can be defined similarly for examining $H_{0,B}: \tau_B = 0$ under the constraint $\tau_{AB} = 0$. As shown in Appendix A, $T_A$ has the same distribution as $\sum_{j=1}^{c} \lambda_{j,A}\, \chi_1^2$, with $\lambda_{j,A}$ the ordered nonzero eigenvalues of $C_A = \tfrac{1}{2} V_A^{1/2} P_A S^{A} P_A V_A^{1/2}$. As with the interaction and joint tests, we obtain the p-value of the test statistic analytically using moment-matching procedures.

Simulation study

Design

We study the performance of the proposed methods using simulated data, and benchmark them against 3 approaches: (1) LR: linear regression; (2) PCA: the principal component method of Wang et al. [32]; and (3) PLS: the partial least squares method of Wang et al. [32]. The LR incorporates all SNPs from each of the 2 genes as well as all pairwise interactions of SNPs across genes. When performing conditional main effect testing, we exclude the interaction terms from the LR analysis. We also only consider the proposed test and LR for the conditional main effect test, since PCA and PLS are identical to LR when there is no interaction term.

To simulate genotype data with realistic LD patterns, we use genotype data of Gene RBJ (8 SNPs) and Gene GPRC5B (15 SNPs) of the Phase III CEU samples downloaded from the International HapMap Project (http://hapmap.ncbi.nlm.nih.gov/). We show the LD structure of the two genes in Supplementary Figure 1 and the MAF and the average R2 of each SNP in Supplementary Table 1. The average R2 is calculated by averaging the R2 between the target SNP and the remaining SNPs in the same gene. We consider two causal SNPs from each gene and simulate the trait values based on the model listed below.

Y=βA×(SNP1A+SNP2A+SNP1A×SNP2A)+βB×(SNP1B+SNP2B+SNP1B×SNP2B)+βAB×(SNP1A+SNP1B+SNP2A×SNP2B)+e, (4)

where SNP1A and SNP2A are the numbers of minor alleles carried by a subject at the first and second causal loci in Gene A; SNP1B and SNP2B are defined similarly; and e is normally distributed with mean 0 and variance 1. The trait value Y consists of 3 parts: the gene effect from Gene A, the gene effect from Gene B and the interaction effect between Gene A and Gene B. We assume no LD between the genes RBJ and GPRC5B and consequently sample the genotypes of the former gene independently of the genotypes of the latter gene.
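The trait-generating step of equation (4) can be sketched as follows. This is an illustrative toy rather than the simulation code used in the study: the function names `trait_mean` and `simulate_trait` are hypothetical, the causal genotypes are passed in directly instead of being sampled from HapMap data, and the three terms are transcribed exactly as equation (4) is printed above.

```python
import random
from typing import List, Sequence, Tuple

Causal = Tuple[int, int, int, int]  # (SNP1A, SNP2A, SNP1B, SNP2B) minor-allele counts


def trait_mean(snp1a: int, snp2a: int, snp1b: int, snp2b: int,
               beta_a: float, beta_b: float, beta_ab: float) -> float:
    """Deterministic part of equation (4), term by term as printed."""
    gene_a = snp1a + snp2a + snp1a * snp2a
    gene_b = snp1b + snp2b + snp1b * snp2b
    cross = snp1a + snp1b + snp2a * snp2b  # third term copied as printed in (4)
    return beta_a * gene_a + beta_b * gene_b + beta_ab * cross


def simulate_trait(causal: Sequence[Causal],
                   betas: Tuple[float, float, float],
                   rng: random.Random) -> List[float]:
    """Add independent N(0, 1) noise to the mean for each subject."""
    return [trait_mean(*g, *betas) + rng.gauss(0.0, 1.0) for g in causal]


# Toy example: four subjects under a null setting (all effects zero).
rng = random.Random(2014)
y_null = simulate_trait([(0, 1, 2, 0), (1, 1, 0, 0), (2, 0, 1, 1), (0, 0, 0, 2)],
                        betas=(0.0, 0.0, 0.0), rng=rng)
```

Setting `betas` to the values listed for each scenario, and repeating the draw over many replicates, gives the data sets on which type-I error and power are tallied.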

To evaluate the type-I error rate of the interaction test, we considered three situations with no gene-gene interaction effect: (1) neither gene has a genetic effect (βA = βB = βAB = 0), (2) only Gene A has a main effect on the trait (βA ≠ 0, βB = βAB = 0), and (3) both Gene A and Gene B have main effects on the trait (βA ≠ 0, βB ≠ 0, βAB = 0). To evaluate the type-I error rate of the joint test, we set βA = βB = βAB = 0. To evaluate the type-I error rate of the conditional main-effect test, we examined size strictly for Gene B (H0: τB = 0). We considered two scenarios with no effect of Gene B: (1) neither Gene A nor Gene B has a main effect (βA = βB = βAB = 0) and (2) Gene A has a main effect but Gene B does not (βA ≠ 0, βB = 0, βAB = 0). For each scenario, we simulated datasets comprising 300 subjects and evaluated type-I error rates using 1000 replicates of the data.

For power analysis, we consider three different scenarios based on the causal SNPs. Specifically, we pick “representative” SNPs from SNPs with similar LD and MAF patterns and form 3 different causal SNP combinations (Table 1). When generating trait values using the model above, we choose the β values so that the power of the different methods at a significance threshold of 5% is between 20% and 80%. Power was calculated based on 200 replications.

Table 1.

The causal SNPs for Gene A and Gene B for the three scenarios considered in power analysis

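| Scenario | SNP1A avgR2 | SNP1A MAF | SNP2A avgR2 | SNP2A MAF | SNP1B avgR2 | SNP1B MAF | SNP2B avgR2 | SNP2B MAF |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.58 | 0.12 | 0.13 | 0.48 | 0.26 | 0.46 | 0.23 | 0.48 |
| 2 | 0.57 | 0.12 | 0.16 | 0.06 | 0.14 | 0.16 | 0.23 | 0.48 |
| 3 | 0.58 | 0.12 | 0.16 | 0.06 | 0.07 | 0.04 | 0.14 | 0.16 |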

Results

Table 2 shows empirical type I error rates for all the methods at a significance level of α = 0.05. The type I error rates of all approaches are around the nominal level in all settings. For the interaction test, the type I error rates of the proposed method are slightly conservative, but we observed that, as the variance due to the main effects of Gene A and/or Gene B increases, the type-I error rate generally approaches the nominal level. As discussed in the appendix, the conservative nature of the interaction test arises because of the bias in the EM-algorithm estimates of the variance components when their true values are 0. The type I error rates of the interaction test using the competing methods (PCA, PLS, LR) are generally appropriate. For joint and conditional main-effect testing, we observed the type-I error rates of all methods considered in our analyses to be quite similar to the nominal level.

Table 2.

The type I error rate at significance level 0.05 for 3 tests: interaction test, joint test and conditional main effect test

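Columns are labeled by test and effect size (βA, βB, βAB).

| Method | Interaction test (0,0,0) | Interaction test (0.1,0,0) | Interaction test (0.1,0.1,0) | Joint test (0,0,0) | Conditional test (B ∣ A) (0,0,0) | Conditional test (B ∣ A) (0.3,0,0) |
|---|---|---|---|---|---|---|
| SimReg | 0.038 | 0.04 | 0.046 | 0.049 | 0.048 | 0.052 |
| LR | 0.060 | 0.062 | 0.064 | 0.060 | 0.060 | 0.056 |
| PCA | 0.056 | 0.06 | 0.058 | 0.055 | - | - |
| PLS | 0.046 | 0.048 | 0.049 | 0.054 | - | - |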

We next performed power analysis under the scenarios in Table 1. Because each scenario has a different causal SNP combination, we adjust the β values so that the power of the different methods is roughly between 0.2 and 0.8. The results are summarized in Table 3.

Table 3.

The power analysis for different approaches at significance level 0.05 under the scenarios listed in Table 1. In each scenario, the effect sizes (βA, βB and βAB) are set so that the power ranges between 20% and 80%. The shaded values indicate the method with the largest power under a certain scenario, and the bolded values indicate methods with overlapping confidence intervals with the best methods. Entries marked [shaded] are cells whose point estimate appears only as an image in the source; their confidence intervals are given as reported.

Interaction test, by scenario and (βA, βB, βAB)

| Method | S1 (0.1,0.1,0.3) | S2 (0.1,0.1,0.5) | S3 (0.1,0.1,1.5) |
|---|---|---|---|
| SimReg | 0.690 (0.626, 0.754)* | 0.460 (0.391, 0.529) | [shaded] (0.761, 0.869) |
| LR | 0.375 (0.308, 0.442) | 0.285 (0.222, 0.348) | 0.760 (0.701, 0.819) |
| PCA | 0.715 (0.652, 0.778) | 0.350 (0.284, 0.416) | 0.120 (0.075, 0.165) |
| PLS | [shaded] (0.685, 0.805) | [shaded] (0.421, 0.559) | 0.470 (0.401, 0.539) |

Joint test, by scenario and (βA, βB, βAB)

| Method | S1 (0.05,0.05,0.05) | S2 (0.1,0.1,0.1) | S3 (0.3,0.3,0.3) |
|---|---|---|---|
| SimReg | [shaded] (0.605, 0.735)* | [shaded] (0.829, 0.921) | [shaded] (0.512, 0.648) |
| LR | 0.325 (0.260, 0.390) | 0.690 (0.626, 0.754) | 0.325 (0.260, 0.390) |
| PCA | 0.535 (0.466, 0.604) | 0.765 (0.706, 0.824) | 0.500 (0.431, 0.569) |
| PLS | 0.545 (0.476, 0.614) | 0.785 (0.728, 0.842) | 0.505 (0.436, 0.574) |

Conditional main effect test, by scenario and (βA, βB, βAB)

| Method | S1 (0.07,0.07,0) | S2 (0.1,0.1,0) | S3 (0.3,0.3,0) |
|---|---|---|---|
| SimReg | [shaded] (0.584, 0.716)* | [shaded] (0.621, 0.749) | 0.495 (0.426, 0.564) |
| LR | 0.415 (0.347, 0.483) | 0.475 (0.406, 0.544) | 0.485 (0.416, 0.554) |

* 95% confidence interval of the power, obtained by $\hat{p} \pm 1.96 \times \sqrt{\hat{p}(1 - \hat{p})/200}$, with the empirical power $\hat{p}$ estimated from 200 replications.
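The interval in the footnote is the usual normal-approximation (Wald) interval for a proportion. A minimal Python sketch, with the hypothetical function name `power_ci`, reproduces one of the tabulated intervals from its point estimate:

```python
import math
from typing import Tuple


def power_ci(p_hat: float, n_rep: int = 200, z: float = 1.96) -> Tuple[float, float]:
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n_rep)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n_rep)
    return p_hat - half_width, p_hat + half_width


# Interaction test, SimReg, Scenario 1: empirical power 0.690 over 200 replicates.
lower, upper = power_ci(0.690)
print(f"({lower:.3f}, {upper:.3f})")  # prints "(0.626, 0.754)"
```

With 200 replicates the half-width is at most about 0.07 (at a power of 0.5), which is why differences of a few percentage points between methods fall within overlapping intervals.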

Overall, we observed that the proposed method generally has optimal power relative to the other methods considered under various genetic architectures of the causal SNPs. For the interaction test, Scenario 1 has causal SNPs with relatively high LD and large MAF. PCA and PLS perform the best, closely followed by the proposed method. LR has the least power due to the large number of degrees of freedom used. In Scenario 2, where the causal SNPs have relatively smaller LD and MAF, PLS still has the best power, closely followed by the proposed method. The power of PCA drops considerably in this scenario, and LR again has the least power. In Scenario 3, the proposed method has the best power, followed by LR, PLS and PCA; PCA and PLS have markedly low power in this scenario. As the LD and MAF become smaller, the power of PCA drops dramatically because the first PC can capture only a limited amount of information on the causal SNPs. The first PC aims to maximize the SNP variation captured and tends to be dominated by SNPs with high LD or common MAF, which are non-causal SNPs in Scenario 3. PLS shares the same issue as PCA and hence also suffers from power loss when the LD and MAF of the causal SNPs become smaller. However, PLS also accounts for the trait information, which makes it perform better than PCA. The power advantage of PLS over PCA is more substantial in S2 and S3, where an increased proportion of causal SNPs have low MAFs. Consequently, the first principal component of PCA (which explains the largest variance among the genotypes) is less likely to harbor such causal SNPs and therefore has reduced power for interaction testing compared to PLS [32].

For the joint test, the proposed method performs the best under all three scenarios. PCA and PLS perform similarly, with PLS having slightly better power. This observation is consistent with Wang et al. [32] and arises because the first PLS component accounts for both the LD among SNPs and the correlation between trait and gene. LR has the lowest power, which may be due to the large number of degrees of freedom used in the joint test.

For the conditional main effect test, the proposed method always has better power than LR, but the difference between the two becomes smaller as the LD and MAF of the causal SNPs become smaller. This finding is consistent with that for the interaction test.

Real Data Analysis

Warfarin is a widely used oral anticoagulant. In 2004, more than 30 million prescriptions contained this drug in the United States [39]. The optimal dose of warfarin differs from patient to patient, and an inappropriate dosage can lead to severe consequences such as bleeding and swelling of the face and throat. Extensive research has been conducted to develop methods for predicting the appropriate dose.

We conducted a genetic analysis using the data from the Warfarin study [48]. In this data set, we studied the relationship between stable warfarin dose and 2 genes: VKORC1 (containing 7 SNPs) and CYP2C9 (a tri-allelic locus). We further adjusted for 4 covariates associated with warfarin therapy: age, sex, height, and weight. After quality control, the dataset consisted of 301 individuals. We applied the proposed method and the benchmark methods to evaluate the association between warfarin dose and the two genes, adjusting for covariates. We summarize the results in Table 4. All methods identified a significant association between warfarin dose and the two genes. The smallest p-value of the joint test was obtained from SimReg (p-value $5.6 \times 10^{-20}$), followed by LR (p-value $1.92 \times 10^{-16}$). The p-values from PCA and PLS are similar to each other and are many orders of magnitude larger (on the order of $10^{-5}$) than those derived from SimReg and LR. We next examined the interaction effect between VKORC1 and CYP2C9 and observed no significant interaction using any of the methods considered. Finally, we performed the conditional main-effect test for each gene using SimReg and LR. The results suggest that both genes had significant effects on warfarin dose, with VKORC1 demonstrating a stronger effect (p-values on the order of $10^{-16}$ to $10^{-20}$) than CYP2C9 (p-values on the order of $10^{-7}$ to $10^{-8}$).

Table 4.

The p-values of the four approaches in the warfarin analysis

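| Method | Joint test | Interaction test | Conditional main effect test: VKORC1 | Conditional main effect test: CYP2C9 |
|---|---|---|---|---|
| SimReg | $5.6 \times 10^{-20}$ | 0.450 | $1.67 \times 10^{-20}$ | $1.02 \times 10^{-7}$ |
| LR | $1.92 \times 10^{-16}$ | 0.753 | $1.94 \times 10^{-16}$ | $2.65 \times 10^{-8}$ |
| PCA | $8.07 \times 10^{-5}$ | 0.530 | - | - |
| PLS | $7.43 \times 10^{-5}$ | 0.470 | - | - |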

The main effect results were consistent with current literature [49-51]. Previous studies indicate contradictory results regarding the interaction between the VKORC1 and CYP2C9 on anticoagulant effect and hence the dose requirement [52-56]. Our results agreed with recent reports [55, 56], which suggest no evidence of a gene-gene interaction between CYP2C9 and VKORC1.

Discussion

In this article, we described a novel similarity-based test to assess gene-gene interactions in large-scale association studies. Unlike the majority of existing procedures in this area, our approach considers the analysis at the level of the gene rather than the SNP, which we show can lead to considerable improvements in power. We also compared the performance of our similarity approach to existing gene-based interaction tests (PCA/PLS) and showed that our joint/interaction/conditional main effect tests nearly always had optimal power across simulation models. Relative to PCA/PLS approaches, we believe our approach can improve power since it summarizes all information on genotypic variation within a gene. PCA and PLS, on the other hand, only use a subset of this variation within their analytic frameworks. The R code that implements the SimReg testing procedure is available from the authors’ website (http://www4.stat.ncsu.edu/~jytzeng/Software/SimReg-GxG-QT/). Using a personal computer equipped with an Intel Core i7-3770 CPU (3.40GHz) and 20GB RAM, 200 runs of the interaction test (which is the slowest test because it requires estimating two nuisance variance components) with the simulated datasets required 767 seconds. While the approach is not scalable to genomewide interaction testing using a single processor, we believe the procedure can be applied to genomewide data in reasonable time using a parallel-computing cluster that possesses a large number (1000) of CPUs.
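A rough back-of-envelope calculation shows how the timing above translates to a genomewide scan. The sketch below is only an order-of-magnitude estimate under strong assumptions that are not verified in the text: every gene pair costs the same as one simulated interaction test (767 s / 200 runs), the work divides perfectly across CPUs, and all pairs among ~20,000 genes are tested. The function name `genomewide_days` is hypothetical.

```python
from typing import Tuple


def genomewide_days(n_genes: int = 20000,
                    seconds_per_test: float = 767 / 200,
                    n_cpus: int = 1000) -> Tuple[int, float]:
    """Return (number of gene pairs, wall-clock days) under perfect scaling."""
    n_pairs = n_genes * (n_genes - 1) // 2        # all unordered gene pairs
    total_seconds = n_pairs * seconds_per_test    # single-CPU cost
    return n_pairs, total_seconds / n_cpus / 86400.0


pairs, days = genomewide_days()
print(f"{pairs} gene pairs, about {days:.1f} days on 1000 CPUs")
```

Under these assumptions the scan involves roughly 2 × 10^8 pairwise tests and finishes in about nine days, the same order of magnitude as the two-week figure quoted in the abstract; larger genes or sample sizes than in the simulations would lengthen the per-test time.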

In the SimReg model, we define the similarity function for two subjects as the weighted average of SNP alleles shared IBS by the pair across a gene. The weights are SNP-specific and user-defined. It is common to weight SNPs based on minor-allele frequency such that rarer variants are assigned more weight than common variants; common weights in this setting include the reciprocal of a SNP's minor-allele frequency or its square root [40, 41]. Additionally, we can weight SNPs using functional information, perhaps giving more weight to SNPs that show evidence of being a cis-acting expression quantitative-trait locus [57]. While we considered only common SNP variants in this work, we note that our similarity-regression approach can also integrate information from rare variants ascertained using next-generation sequencing technology. While weighting rare variants by minor-allele frequency has obvious value, additional weighting of such variants using some measure that predicts the probability that the variant is deleterious and lowers fitness likely has value as well. A variety of computational algorithms exist for such prediction based on evolutionary, biochemical, and/or structural information (see [58] for an overview).

We applied our similarity approach in the context of a candidate-gene study but we can further expand the approach to consider interactions on a genomewide scale. A gene-based interaction test has appealing features over SNP-based interaction tests when applied to a genomewide association study (GWAS). Given the number of genes is many fold smaller than the typical number of GWAS SNPs, the number of tests that need to be evaluated using similarity regression would be substantially smaller than a SNP-based interaction test like LR. This could substantially reduce the computational burden. Additionally, the issue of adjusting for multiple testing when conducting genomewide interaction testing of SNPs is challenging. Within GWAS, most studies adjust for multiple testing using permutations since a Bonferroni correction leads to conservative inference due to LD among SNPs. However, while permutations are valid for multiple-testing adjustment in tests of main effects, they are not valid when applied to tests of interactions because such random shuffling of phenotypes does not preserve the observed main effects in the sample [59]. In contrast, multiple-testing adjustment of interaction tests based on similarity regression is more straightforward. While application of gene-based similarity regression to GWAS data will lead to correlated tests (due to repeated testing of the same gene) such that a Bonferroni correction is inappropriate, we can adjust for multiple testing using a computationally-efficient perturbation procedure similar to that proposed by Wu et al. [60] that preserves the main effects of the genes under consideration. We will explore this work in a future manuscript.

Acknowledgments

The authors thank the International Warfarin Pharmacogenetics Consortium and the PharmGKB resources for supplying the Warfarin data. They also thank Dr. Daowen Zhang for his helpful discussions and suggestions on the work. This work was supported by NIH grants R01 HG007508, R01 MH074027, R01 MH084022 and P01 CA142538.

Appendix A. Derivation of the score tests and their distributions

Consider the matrix presentation of model (2):

Y=Xγ+gA+gB+gAB+e.

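The corresponding REML log-likelihood function, denoted as $L(\theta)$ with $\theta = (\tau_A, \tau_B, \tau_{AB}, \sigma)$, is:

$$L(\theta) = -\frac{1}{2}\left[\log|V| + \log|X'V^{-1}X| + Y'PY\right],$$

where $V = \mathrm{Var}(Y) = \tau_A S_A + \tau_B S_B + \tau_{AB} S_{AB} + \sigma I$ is the marginal variance of $Y$ and $P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}$ is the projection matrix for the model. The score functions of $\tau_A$, $\tau_B$, and $\tau_{AB}$ based on $L(\theta)$ are

$$\begin{aligned}
U_{\tau_A}(\tau_A,\tau_B,\tau_{AB},\sigma) &= \frac{\partial L(\tau_A,\tau_B,\tau_{AB},\sigma)}{\partial \tau_A} = \frac{1}{2}\left[Y'PS_APY - \mathrm{tr}(PS_A)\right],\\
U_{\tau_B}(\tau_A,\tau_B,\tau_{AB},\sigma) &= \frac{\partial L(\tau_A,\tau_B,\tau_{AB},\sigma)}{\partial \tau_B} = \frac{1}{2}\left[Y'PS_BPY - \mathrm{tr}(PS_B)\right],\ \text{and}\\
U_{\tau_{AB}}(\tau_A,\tau_B,\tau_{AB},\sigma) &= \frac{\partial L(\tau_A,\tau_B,\tau_{AB},\sigma)}{\partial \tau_{AB}} = \frac{1}{2}\left[Y'PS_{AB}PY - \mathrm{tr}(PS_{AB})\right].
\end{aligned}$$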
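As a minimal numerical sketch (not the authors' implementation), the projection matrix and the three score functions can be evaluated with NumPy as shown here. The function names `reml_projection` and `reml_scores`, the toy genotype data, the linear-kernel similarity matrices, and the use of an element-wise product for the interaction similarity are all assumptions made for illustration.

```python
import numpy as np

def reml_projection(V, X):
    """P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}, for symmetric positive-definite V."""
    Vinv_X = np.linalg.solve(V, X)  # V^{-1} X
    return np.linalg.inv(V) - Vinv_X @ np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)

def reml_scores(Y, X, S_list, taus, sigma):
    """Score for each variance component: 0.5 * [Y' P S P Y - tr(P S)]."""
    n = len(Y)
    V = sum(t * S for t, S in zip(taus, S_list)) + sigma * np.eye(n)
    P = reml_projection(V, X)
    PY = P @ Y
    return [0.5 * (PY @ S @ PY - np.trace(P @ S)) for S in S_list]

# Toy data (hypothetical): 30 subjects, intercept plus one covariate, two small genes.
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
GA = rng.binomial(2, 0.3, size=(n, 4)).astype(float)
GB = rng.binomial(2, 0.4, size=(n, 3)).astype(float)
SA = GA @ GA.T / 4   # assumed linear-kernel similarity for gene A
SB = GB @ GB.T / 3   # assumed linear-kernel similarity for gene B
SAB = SA * SB        # assumed interaction similarity (element-wise product)
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

scores = reml_scores(Y, X, [SA, SB, SAB], taus=[0.1, 0.1, 0.1], sigma=1.0)
print(scores)
```

Two quick checks of the construction are that `P @ X` is numerically zero and that `P @ V @ P` reproduces `P`.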

We construct the statistics based on the first terms of the score functions. Hereafter we define matrices $P_h$ and $V_h$ as $P$ and $V$ evaluated under $H_{0,h}$. Then for the interaction test $H_{0,Int}: \tau_{AB} = 0$, we set the test statistic as

$$T_{Int} = \frac{1}{2}\,Y'P_{Int}S_{AB}P_{Int}Y\,\Big|_{\tau_A=\hat\tau_A,\ \tau_B=\hat\tau_B,\ \sigma=\hat\sigma},$$

where $P_{Int} = V_{Int}^{-1} - V_{Int}^{-1}X(X'V_{Int}^{-1}X)^{-1}X'V_{Int}^{-1}$, $V_{Int} = \tau_A S_A + \tau_B S_B + \sigma I$, and $(\hat\tau_A, \hat\tau_B, \hat\sigma)$ are the REML estimates obtained under $H_{0,Int}: \tau_{AB} = 0$.

For the conditional main effect test $H_{0,A}: \tau_A = 0$ under the constraint of no interaction (i.e., $\tau_{AB} = 0$), we set the test statistic as

$$T_A = \frac{1}{2}\,Y'P_AS_AP_AY\,\Big|_{\tau_B=\tilde\tau_B,\ \sigma=\tilde\sigma},$$

where $P_A = V_A^{-1} - V_A^{-1}X(X'V_A^{-1}X)^{-1}X'V_A^{-1}$, $V_A = \tau_B S_B + \sigma I$, and $(\tilde\tau_B, \tilde\sigma)$ are the REML estimates obtained under $H_{0,A}: \tau_A = 0$. The test statistic $T_B$ can be defined similarly for examining $H_{0,B}: \tau_B = 0$ under the constraint $\tau_{AB} = 0$. We describe the EM algorithms that we use to obtain $(\hat\tau_A, \hat\tau_B, \hat\sigma)$ and $(\tilde\tau_B, \tilde\sigma)$ in Appendix B.

For the joint test H0,Joint: τA = τB = τAB = 0, because τ’s are non-negative variance components, τA = τB = τAB = 0 if and only if τA + τB + τAB = 0. This motivates us to construct the test statistic based on the sum of the three score functions. That is,

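$$T_{Joint} = \frac{1}{2}\,Y'P_{Joint}(S_A + S_B + S_{AB})P_{Joint}Y\,\Big|_{\sigma=\breve\sigma},$$

where $P_{Joint} = V_{Joint}^{-1} - V_{Joint}^{-1}X(X'V_{Joint}^{-1}X)^{-1}X'V_{Joint}^{-1}$, $V_{Joint} = \sigma I$, and $\breve\sigma = Y'\{I - X(X'X)^{-1}X'\}Y/(n - K)$.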
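As a worked simplification, the joint statistic requires no iterative fitting because the null variance is a multiple of the identity. The shorthand $H = X(X'X)^{-1}X'$ and $r = (I - H)Y$ (the ordinary least-squares residuals) is introduced here for convenience and does not appear in the original text. Substituting $V_{Joint} = \sigma I$ gives

$$P_{Joint} = \sigma^{-1}I - \sigma^{-1}X(\sigma^{-1}X'X)^{-1}X'\sigma^{-1} = \sigma^{-1}(I - H),$$

so that, using the idempotence of $I - H$,

$$\breve\sigma = \frac{r'r}{n - K}, \qquad T_{Joint} = \frac{1}{2\breve\sigma^{2}}\; r'(S_A + S_B + S_{AB})\,r.$$

The joint test can therefore be computed from the residuals of a single least-squares fit of $Y$ on $X$.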

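The test statistics can be shown to follow weighted chi-squared distributions because they are quadratic forms in $Y$. To illustrate, consider $T_{Int} = \frac{1}{2}Y'P_{Int}S_{AB}P_{Int}Y$. Because $P_{Int}$ is a projection matrix, $P_{Int}X\gamma = 0$. Therefore,

$$\begin{aligned}
T_{Int} &= \frac{1}{2}\,Y'P_{Int}S_{AB}P_{Int}Y = \frac{1}{2}\,(Y - X\gamma)'P_{Int}S_{AB}P_{Int}(Y - X\gamma)\\
&= \frac{1}{2}\,(Y - X\gamma)'V_{Int}^{-1/2}\,V_{Int}^{1/2}P_{Int}S_{AB}P_{Int}V_{Int}^{1/2}\,V_{Int}^{-1/2}(Y - X\gamma) \sim \sum_{j=1}^{c}\lambda_{j,Int}\,\chi_1^2.
\end{aligned}$$

Defining $z = V_{Int}^{-1/2}(Y - X\gamma)$ and $C_{Int} = \frac{1}{2}V_{Int}^{1/2}P_{Int}S_{AB}P_{Int}V_{Int}^{1/2}$, we have $T_{Int} = z'C_{Int}z \sim \sum_{j=1}^{c}\lambda_{j,Int}\,\chi_1^2$, where the $\lambda_{j,Int}$ are the ordered nonzero eigenvalues of the matrix $C_{Int}$ and the $\chi_1^2$ terms are independent. In the same manner, one obtains $T_A \sim \sum_{j=1}^{c}\lambda_{j,A}\,\chi_1^2$, with $\lambda_{j,A}$ the ordered nonzero eigenvalues of $C_A = \frac{1}{2}V_A^{1/2}P_AS_AP_AV_A^{1/2}$, and $T_{Joint} \sim \sum_{j=1}^{c}\lambda_{j,Joint}\,\chi_1^2$, with $\lambda_{j,Joint}$ the ordered nonzero eigenvalues of $C_{Joint} = \frac{1}{2}V_{Joint}^{1/2}P_{Joint}(S_A + S_B + S_{AB})P_{Joint}V_{Joint}^{1/2}$.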
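The joint test lends itself to a short illustration because its null estimate of σ is available in closed form. The sketch below computes the joint statistic, the nonzero eigenvalues of the corresponding weight matrix, and a p-value for the weighted chi-squared mixture. The p-value is obtained by plain Monte Carlo simulation of the mixture, which is a stand-in chosen for simplicity; analytic methods for quadratic forms, such as those compared in [46], would normally be preferred. The function name `joint_score_test`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def joint_score_test(Y, X, S_sum, n_draws=20000, seed=0):
    """Sketch of the joint test: returns (T, nonzero eigenvalues, Monte Carlo p-value).

    S_sum is the sum of the three similarity matrices, S_A + S_B + S_AB.
    """
    n, K = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # I - X (X'X)^{-1} X'
    r = M @ Y                                          # least-squares residuals
    sigma = r @ r / (n - K)                            # closed-form null estimate of sigma
    T = 0.5 * r @ S_sum @ r / sigma**2                 # 0.5 * Y' P S P Y with P = M / sigma
    C = 0.5 * M @ S_sum @ M / sigma                    # 0.5 * V^{1/2} P S P V^{1/2}, V = sigma * I
    lam = np.linalg.eigvalsh(C)                        # ascending eigenvalues
    lam = lam[lam > 1e-10 * lam.max()][::-1]           # keep nonzero ones, descending order
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, size=(n_draws, lam.size)) @ lam  # simulated null mixture
    p_value = (1 + np.sum(draws >= T)) / (n_draws + 1)
    return T, lam, p_value

# Toy usage with hypothetical null data.
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
GA = rng.binomial(2, 0.3, size=(n, 4)).astype(float)
GB = rng.binomial(2, 0.4, size=(n, 3)).astype(float)
SA, SB = GA @ GA.T / 4, GB @ GB.T / 3       # assumed linear-kernel similarities
S_sum = SA + SB + SA * SB                   # assumed element-wise interaction similarity
Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
T, lam, p = joint_score_test(Y, X, S_sum)
print(T, lam[:3], p)
```

The interaction and conditional tests follow the same pattern, with the null variance matrix built from the fitted nuisance variance components instead of a multiple of the identity.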

Appendix B. The EM algorithm to obtain the maximum REML estimates

We first describe the EM algorithm for (τ̂A, τ̂B, σ̂), i.e., the maximum REML estimates under H0,Int: τAB = 0. Under H0,Int, the LMM is

Y=Xγ+gA+gB+e,

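where $e \sim N(0, \sigma I)$, $g_A \sim N(0, \tau_A S_A)$ and $g_B \sim N(0, \tau_B S_B)$ as $\tau_A = v_A$ and $\tau_B = v_B$. Define $U = A'Y$ with the restriction that $A'A = I_{n-K}$ and $AA' = I - X(X'X)^{-1}X'$. Then $U \mid g_A, g_B \sim N(A'g_A + A'g_B,\ \sigma I_{n-K})$, which is independent of the fixed-effect estimate $\hat\gamma = (X'X)^{-1}X'Y$. Therefore the REML estimates can be obtained by maximizing the marginal distribution of $U$, i.e., $f(U) = \int f(U \mid g_A, g_B)\,f(g_A)\,f(g_B)\,dg_A\,dg_B$. This motivates an expectation-maximization (EM) algorithm based on $U$ (i.e., the observed data) and $(g_A, g_B)$ (i.e., the missing data). The complete-data log-likelihood based on $f(U, g_A, g_B)$ is

$$\begin{aligned}
\log f(U, g_A, g_B; \tau_A, \tau_B, \sigma) ={}& \log f(U \mid g_A, g_B; \tau_A, \tau_B, \sigma) + \log f(g_A; \tau_A, \tau_B, \sigma) + \log f(g_B; \tau_A, \tau_B, \sigma)\\
={}& -\frac{n-K}{2}\log\sigma - \frac{1}{2\sigma}(U - A'g_A - A'g_B)'(U - A'g_A - A'g_B)\\
& -\frac{q_A}{2}\log\tau_A - \frac{1}{2}\log(|S_A|_+) - \frac{1}{2\tau_A}\,g_A'S_A^{-}g_A\\
& -\frac{q_B}{2}\log\tau_B - \frac{1}{2}\log(|S_B|_+) - \frac{1}{2\tau_B}\,g_B'S_B^{-}g_B,
\end{aligned}$$

where $q_A$ and $q_B$ are the ranks of the matrices $S_A$ and $S_B$, respectively, $|S_A|_+$ is the pseudo-determinant, and $S_A^{-}$ is the generalized inverse (as $S_A$ and $S_B$ may be singular).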

In the expectation step, we compute

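$$\begin{aligned}
Q(\tau_A,\tau_B,\sigma;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}) \equiv{}& E\{\log f(U,g_A,g_B;\tau_A,\tau_B,\sigma) \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}\}\\
={}& -\frac{n-K}{2}\log\sigma - \frac{1}{2\sigma}\,E\{(U - A'g_A - A'g_B)'(U - A'g_A - A'g_B) \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}\}\\
& -\frac{q_A}{2}\log\tau_A - \frac{1}{2}\log(|S_A|_+) - \frac{1}{2\tau_A}\,E(g_A'S_A^{-}g_A \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)})\\
& -\frac{q_B}{2}\log\tau_B - \frac{1}{2}\log(|S_B|_+) - \frac{1}{2\tau_B}\,E(g_B'S_B^{-}g_B \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}).
\end{aligned}$$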

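In the maximization step, we solve $\partial Q/\partial\tau_A = 0$, $\partial Q/\partial\tau_B = 0$ and $\partial Q/\partial\sigma = 0$, and obtain the updated estimates

$$\hat\tau_A^{(t+1)} = \frac{1}{q_A}\,E(g_A'S_A^{-}g_A \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}) = \frac{1}{q_A}\left\{g_A^{(t)\prime}S_A^{-}g_A^{(t)} + \mathrm{tr}(S_A^{-}v_A^{(t)})\right\},$$

where $g_A^{(t)} = E(g_A \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}) = \tau_A S_A P_{Int}Y\,\big|_{\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}}$ and $v_A^{(t)} = \mathrm{Var}(g_A \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}) = (\tau_A S_A - \tau_A^2 S_A P_{Int} S_A)\,\big|_{\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}}$. Similarly, we have

$$\hat\tau_B^{(t+1)} = \frac{1}{q_B}\,E(g_B'S_B^{-}g_B \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}) = \frac{1}{q_B}\left\{g_B^{(t)\prime}S_B^{-}g_B^{(t)} + \mathrm{tr}(S_B^{-}v_B^{(t)})\right\},$$

with $g_B^{(t)}$ and $v_B^{(t)}$ defined analogously.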

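Finally, writing $\tau_A$, $\tau_B$ and $P_{Int}$ for their values at the current estimates $(\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)})$,

$$\begin{aligned}
\hat\sigma^{(t+1)} &= \frac{1}{n-K}\,E\{(U - A'g_A - A'g_B)'(U - A'g_A - A'g_B) \mid U;\hat\tau_A^{(t)},\hat\tau_B^{(t)},\hat\sigma^{(t)}\}\\
&= \frac{1}{n-K}\Big\{(Y - g_A^{(t)} - g_B^{(t)})'AA'(Y - g_A^{(t)} - g_B^{(t)})\\
&\qquad\quad + \mathrm{tr}\big[AA'\big(\tau_A S_A - \tau_A^2 S_A P_{Int} S_A + \tau_B S_B - \tau_B^2 S_B P_{Int} S_B - 2\tau_A\tau_B S_A P_{Int} S_B\big)\big]\Big\}.
\end{aligned}$$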

The EM algorithm for obtaining (τ̃B, σ̃) under H0,A: τA = 0 is similar to the above algorithm except that τA is set to be 0.
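The iteration under the no-interaction null can be sketched in NumPy as follows. This is an illustrative implementation written directly from the conditional means and variances of the random effects given $U$, not the authors' code. The σ update is coded as the squared norm of the conditional mean of $U - A'g_A - A'g_B$ plus the trace of its conditional variance, divided by $n - K$. The function names, the starting values, the eigenvalue tolerance used for the generalized inverse, and the recorded REML log-likelihood trace are all assumptions.

```python
import numpy as np

def reml_projection(V, X):
    """P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}."""
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.inv(V) - Vinv_X @ np.linalg.solve(X.T @ Vinv_X, Vinv_X.T)

def reml_loglik(Y, X, V):
    """REML log-likelihood up to an additive constant."""
    P = reml_projection(V, X)
    logdet_V = np.linalg.slogdet(V)[1]
    logdet_XVX = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))[1]
    return -0.5 * (logdet_V + logdet_XVX + Y @ P @ Y)

def pinv_and_rank(S, tol=1e-8):
    """Moore-Penrose inverse and rank of a symmetric PSD matrix via its eigen-decomposition."""
    w, Q = np.linalg.eigh(S)
    keep = w > tol * w.max()
    return (Q[:, keep] / w[keep]) @ Q[:, keep].T, int(keep.sum())

def em_reml_null_interaction(Y, X, SA, SB, n_iter=100):
    """EM sketch for (tau_A, tau_B, sigma) with tau_AB fixed at 0."""
    n, K = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # equals A A'
    SA_inv, qA = pinv_and_rank(SA)
    SB_inv, qB = pinv_and_rank(SB)
    tauA = tauB = sigma = np.var(Y) / 3.0              # arbitrary positive starting values
    loglik = []
    for _ in range(n_iter):
        V = tauA * SA + tauB * SB + sigma * np.eye(n)
        loglik.append(reml_loglik(Y, X, V))            # value at the current iterate
        P = reml_projection(V, X)
        PY = P @ Y
        # E-step: conditional moments of g_A and g_B given U.
        gA, gB = tauA * SA @ PY, tauB * SB @ PY
        vA = tauA * SA - tauA**2 * SA @ P @ SA
        vB = tauB * SB - tauB**2 * SB @ P @ SB
        cAB = -tauA * tauB * SA @ P @ SB               # conditional covariance of g_A and g_B
        # M-step: closed-form maximizers of Q.
        resid = Y - gA - gB
        new_sigma = (resid @ M @ resid + np.trace(M @ (vA + vB + 2 * cAB))) / (n - K)
        new_tauA = (gA @ SA_inv @ gA + np.trace(SA_inv @ vA)) / qA
        new_tauB = (gB @ SB_inv @ gB + np.trace(SB_inv @ vB)) / qB
        tauA, tauB, sigma = new_tauA, new_tauB, new_sigma
    return tauA, tauB, sigma, loglik
```

Because each pass is an exact EM step, the recorded REML log-likelihood values should be non-decreasing, which provides a convenient sanity check. In practice a convergence criterion would replace the fixed iteration count used here.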

When applying the above EM algorithm, we add a testing step, as detailed below. Because the EM algorithm provides non-negative estimates, when τA and τB are 0 or close to 0, the EM estimates obtained from the above algorithm can be biased, with E(τ̂A) > τA and E(τ̂B) > τB. To address this problem, we first apply the conditional main effect tests for H0: τA = 0 and H0: τB = 0 to examine whether these nuisance variance components are significantly different from 0. If we fail to reject a null hypothesis, we set the corresponding τ̂ to 0. If τ for a gene is significantly different from 0, we obtain its estimate using the EM algorithm described above. With this additional step, E(τ̂) is closer to 0 when τ is 0 or close to 0. When τ is relatively large, the conditional main effect test rejects H0: τ = 0, and the final estimate is the same as the estimate from the original EM algorithm.

References

  • 1. Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Human Heredity. 2003;56:73–82. doi: 10.1159/000073735.
  • 2. Carlborg O, Haley CS. Epistasis: Too often neglected in complex trait studies? Nature Reviews Genetics. 2004;5:618–62. doi: 10.1038/nrg1407.
  • 3. Greenspan RJ. The flexible genome. Nature Reviews Genetics. 2001;2:383–387. doi: 10.1038/35072018.
  • 4. Phenix H, Perkins T, Kærn M. Identifiability and inference of pathway motifs by epistasis analysis. Chaos. doi: 10.1063/1.4807483.
  • 5. Barkoulas M, Zon JS, Milloz J, Oudenaarden A, Félix MA. Robustness and epistasis in the C. elegans vulval signaling network revealed by pathway dosage modulation. Dev Cell. 2013;24(1):64–75. doi: 10.1016/j.devcel.2012.12.001.
  • 6. Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005;37(4):413–417. doi: 10.1038/ng1537.
  • 7. Evans DM, Marchini J, Morris AP, Cardon LR. Two-Stage Two-Locus Models in Genome-Wide Association. PLoS Genetics. 2006;2(9):e157. doi: 10.1371/journal.pgen.0020157.
  • 8. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109.
  • 9. Ioannidis JP. Non-replication and inconsistency in the genome-wide association setting. Hum Hered. 2007;64:203–213. doi: 10.1159/000103512.
  • 10. Greene CS, Penrod NM, Williams SM, Moore JH. Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS One. 2009;4:e5639. doi: 10.1371/journal.pone.0005639.
  • 11. Lin X, Hamilton-Williams EE, Rainbow DB, Hunter KM, Dai YD, Cheung J, Peterson LB, Wicker LS, Sherman LA. Genetic interactions among Idd3, Idd5.1, Idd5.2, and Idd5.3 protective loci in the nonobese diabetic mouse model of type 1 diabetes. J Immunol. 2013;7:3109–3120. doi: 10.4049/jimmunol.1203422.
  • 12. Pillai R, Waghulde H, Nie Y, Gopalakrishnan K, Kumarasamy S, Farms P, Garrett MR, Atanur SS, Maratou K, Aitman TJ, Joe B. Isolation and high-throughput sequencing of two-closely linked epistatic hypertension susceptibility loci with a panel of bicongenic strains. Physiol Genomics. 2013;45(16):729–36. doi: 10.1152/physiolgenomics.00077.2013.
  • 13. Koh-Tan HH, McBride MW, McClure JD, Beattie E, Young B, Dominiczak A, Graham D. Interaction between chromosome 2 and 3 regulates pulse pressure in the stroke-prone spontaneously hypertensive rat. Hypertension. 2013;62:33–40. doi: 10.1161/HYPERTENSIONAHA.111.00814.
  • 14. Howson JM, Cooper JD, Smyth DJ, Walker NM, Stevens H, She JX, Eisenbarth GS, Rewers M, Todd JA, Akolkar B, Concannon P, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Pociot F, Rich SS. Type 1 Diabetes Genetics Consortium. Evidence of gene-gene interaction and age-at-diagnosis effects in type 1 diabetes. Diabetes. 2012;11:3012–3017. doi: 10.2337/db11-1694.
  • 15. Ziyab AH, Davies GA, Ewart S, Hopkin JM, Schauberger EM, Wills-Karp M, Holloway JW, Arshad SH, Zhang H, Karmaus W. Interactive effect of STAT6 and IL13 gene polymorphisms on eczema status: results from a longitudinal and a cross-sectional study. BMC Med Genet. doi: 10.1186/1471-2350-14-67.
  • 16. Ma DQ, Rabionet R, Konidari I, Jaworski J, Cukier HN, Wright HH, Abramson RK, Gilbert JR, Cuccaro ML, Pericak-Vance MA, Martin ER. Association and gene-gene interaction of SLC6A4 and ITGB3 in autism. Am J Med Genet B Neuropsychiatr Genet. 2010;153B:477–483. doi: 10.1002/ajmg.b.31003.
  • 17. Bush WS, McCauley JL, Dejager PL, Dudek SM, Hafler DA, Gibson RA, Matthews PM, Kappos L, Naegelin Y, Polman CH, Hauser SL, Oksenberg J, Haines JL, Ritchie MD. A knowledge-driven interaction analysis reveals potential neurodegenerative mechanism of multiple sclerosis susceptibility. Genes Immun. 2011;12:335–340. doi: 10.1038/gene.2011.3.
  • 18. Rothman KJ, Greenland S, Walker AM. Concepts of interaction. Am J Epidemiol. 1980;112:467–470. doi: 10.1093/oxfordjournals.aje.a113015.
  • 19. Siemiatycki J, Thomas DC. Biological models and statistical interactions: an example from multistage carcinogenesis. Int J Epidemiol. 1981;10(4):383–7. doi: 10.1093/ije/10.4.383.
  • 20. Chapman J, Clayton D. Detecting association using epistatic information. Gen Epid. 2007;31(8):894–909. doi: 10.1002/gepi.20250.
  • 21. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10(6):392–404. doi: 10.1038/nrg2579.
  • 22. Murcray CE, Lewinger JP, Gauderman JW. Gene-environment interaction in genome-wide association studies. Am J Epidemiol. 2009;169:219–26. doi: 10.1093/aje/kwn353.
  • 23. Verzilli CJ, Stallard N, Whittaker JC. Bayesian graphical models for genomewide association studies. The American Journal of Human Genetics. 2006;79:100–112. doi: 10.1086/505313.
  • 24. Zhang Y, Liu JS. Bayesian inference of epistatic interactions in case–control studies. Nat Genet. 2007;39:1167–1173. doi: 10.1038/ng2110.
  • 25. Kooperberg C, Ruczinski I. Identifying interacting SNPs using Monte Carlo logic regression. Genet Epidemiol. 2005;28:157–170. doi: 10.1002/gepi.20042.
  • 26. Zhang H, Bonney G. Use of classification trees for association studies. Genet Epidemiol. 2000;19:323–332. doi: 10.1002/1098-2272(200012)19:4<323::AID-GEPI4>3.0.CO;2-5.
  • 27. Sherriff A, Ott J. Applications of neural networks for gene finding. Adv Genet. 2001;42:287–297. doi: 10.1016/s0065-2660(01)42029-3.
  • 28. Lunetta KL, Hayward LB, Segal J, Van Eerdewegh P. Screening large-scale association study data: exploiting interactions using random forests. BMC Genet. 2004;5:32. doi: 10.1186/1471-2156-5-32.
  • 29. Schwarz DF, König IR, Ziegler A. On safari to Random Jungle: a fast implementation of Random Forests for high-dimensional data. Bioinformatics. 2011;27(3):439. doi: 10.1093/bioinformatics/btq257.
  • 30. Jorgenson E, Witte JS. A gene-centric approach to genome-wide association studies. Nat Rev Genet. 2006;7(11):885–891. doi: 10.1038/nrg1962.
  • 31. Neale BM, Sham PC. The future of association studies: gene-based analysis and replication. Am J Hum Genet. 2004;75(3):353–362. doi: 10.1086/423901.
  • 32. Wang T, Ho G, Ye K, Strickler H, Elston RC. A partial least-square approach for modeling gene-gene and gene-environment interactions when multiple markers are genotyped. Genet Epidemiol. 2009;33(1):6–15. doi: 10.1002/gepi.20351.
  • 33. Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006;79:1002–1016. doi: 10.1086/509704.
  • 34. Tzeng JY, Zhang D, Chang SM, Thomas DC, Davidian M. Gene-trait similarity regression for multimarker-based association analysis. Biometrics. 2009;65:822–832. doi: 10.1111/j.1541-0420.2008.01176.x.
  • 35. Haseman JK, Elston RC. The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972 Mar;2(1):3–19. doi: 10.1007/BF01066731.
  • 36. Elston RC, Buxbaum S, Jacobs KB, Olson JM. Haseman and Elston revisited. Genet Epidemiol. 2000 Jul;19(1):1–17. doi: 10.1002/1098-2272(200007)19:1<1::AID-GEPI1>3.0.CO;2-E.
  • 37. Beckmann L, Fischer C, Obreiter M, Rabes M, Chang-Claude J. Haplotype-sharing analysis using Mantel statistics for combined genetic effects. BMC Genet. 2005 Dec 30;6(Suppl 1):S70. doi: 10.1186/1471-2156-6-S1-S70.
  • 38. Tzeng JY, Devlin B, Wasserman L, Roeder K. On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit. Am J Hum Genet. 2003 Apr;72(4):891–902. doi: 10.1086/373881.
  • 39. Wysowski D, Nourjah P, Swartz L. Bleeding complications with warfarin use: a prevalent adverse effect resulting in regulatory action. Arch Intern Med. 2007;167:1414–9. doi: 10.1001/archinte.167.13.1414.
  • 40. Wessel J, Schork NJ. Generalized Genomic Distance–Based Regression Methodology for Multilocus Association Analysis. American Journal of Human Genetics. 2006;79(5):792–806. doi: 10.1086/508346.
  • 41. Kwee LC, Liu D, Lin X, Ghosh D, Epstein MP. A powerful and flexible multilocus association test for quantitative traits. American Journal of Human Genetics. 2008;82(2):386–97. doi: 10.1016/j.ajhg.2007.10.010.
  • 42. Schaid DJ. Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations. Hum Hered. 2010;70:109–131. doi: 10.1159/000312641.
  • 43. Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequenced studies. Am J Hum Genet. 2010;86:832–838. doi: 10.1016/j.ajhg.2010.04.005.
  • 44. Tzeng JY, Zhang D, Pongpanich M, Smith C, McCarthy MI, Sale MM, Bradford BW, Hsu FC, Thomas DC, Sullivan PF. Detecting gene and gene-environment effects of common and uncommon variants on quantitative traits: A marker-set approach using gene-trait similarity regression. The American Journal of Human Genetics. 2011;89:277–88. doi: 10.1016/j.ajhg.2011.07.007.
  • 45. Pongpanich M, Neely M, Tzeng JY. On the aggregation of multimarker information for marker-set and sequencing data analysis: genotype collapsing vs. similarity collapsing. Front Genet. 2012;2:110. doi: 10.3389/fgene.2011.00110.
  • 46. Duchesne P, Micheaux PL. Computing the distribution of quadratic forms: Further comparisons between the liu-tang-zhang approximation and exact methods. Computational Statistics and Data Analysis. 2010;54:858–862.
  • 47. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman W. Exploiting gene-environment interaction to detect genetic associations. Hum Hered. 2007;63:111–119. doi: 10.1159/000099183.
  • 48. The International Warfarin Pharmacogenetics Consortium: Estimation of the warfarin dose with clinical and pharmacogenetic data. The New England journal of medicine. 2009;360:753–764. doi: 10.1056/NEJMoa0809329.
  • 49. Yin T, Miyata T. Warfarin dose and the pharmacogenomics of CYP2C9 and VKORC1 - rationale and perspectives. Thromb Res. 2007;120(1):1–10. doi: 10.1016/j.thromres.2006.10.021.
  • 50. Puehringer H, Loreth RM, Klose G, Schreyer B, Krugluger W, Schneider B, Oberkanins C. VKORC1 -1639G>A and CYP2C9*3 are the major genetic predictors of phenprocoumon dose requirement. Eur J Clin Pharmacol. 2010 Jun;66(6):591–8. doi: 10.1007/s00228-010-0809-2.
  • 51. Stehle S, Kirchheiner J, Lazar A, Fuhr U. Pharmacogenetics of oral anticoagulants: a basis for dose individualization. Clin Pharmacokinet. 2008;47(9):565–94. doi: 10.2165/00003088-200847090-00002.
  • 52. Schalekamp T, Brassé BP, Roijers JF, van Meegen E, van der Meer FJ, van Wijk EM, Egberts AC, de Boer A. VKORC1 and CYP2C9 genotypes and phenprocoumon anticoagulation status: interaction between both genotypes affects dose requirement. Clin Pharmacol Ther. 2007 Feb;81(2):185–93. doi: 10.1038/sj.clpt.6100036.
  • 53. Schalekamp T, Brassé BP, Roijers JF, Chahid Y, van Geest-Daalderop JH, de Vries-Goldschmeding H, van Wijk EM, Egberts AC, de Boer A. VKORC1 and CYP2C9 genotypes and acenocoumarol anticoagulation status: interaction between both genotypes affects overanticoagulation. Clin Pharmacol Ther. 2006 Jul;80(1):13–22. doi: 10.1016/j.clpt.2006.04.006.
  • 54. Bodin L, Verstuyft C, Tregouet DA, Robert A, Dubert L, Funck-Brentano C, Jaillon P, Beaune P, Laurent-Puig P, Becquemont L, Loriot MA. Cytochrome P450 2C9 (CYP2C9) and vitamin K epoxide reductase (VKORC1) genotypes as determinants of acenocoumarol sensitivity. Blood. 2005 Jul 1;106(1):135–40. doi: 10.1182/blood-2005-01-0341. Epub 2005 Mar 24.
  • 55. van Schie RM, Babajeff AM, Schalekamp T, Wessels JA, le Cessie S, de Boer A, van der Meer FJ, van Meegen E, Verhoef TI, Rosendaal FR, Maitland-van der Zee AH. EU-PACT study group: An evaluation of gene-gene interaction between the CYP2C9 and VKORC1 genotypes affecting the anticoagulant effect of phenprocoumon and acenocoumarol. J Thromb Haemost. 2012 May;10(5):767–72. doi: 10.1111/j.1538-7836.2012.04694.x.
  • 56. Cerezo-Manchado JJ, Rosafalco M, Antón AI, Pérez-Andreu V, Garcia-Barberá N, Martinez AB, Corral J, Vicente V, González-Conejero R, Roldán V. Creating a genotype-based dosing algorithm for acenocoumarol steady dose. Thromb Haemost. 2013 Jan;109(1):146–53. doi: 10.1160/TH12-08-0631.
  • 57. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888.
  • 58. Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet. 2011;12:628–640. doi: 10.1038/nrg3046.
  • 59. Buzkova P, Lumley T, Rice K. Permutation and Parametric Bootstrap Tests for Gene–Gene and Gene–Environment Interactions. Ann Hum Genet. 2011;75:36–45. doi: 10.1111/j.1469-1809.2010.00572.x.
  • 60. Wu MC, Maity A, Lee S, Simmons EM, Harmon QE, Lin X, Engel SM, Molldrem JJ, Armistead PM. Kernel Machine SNP-Set Testing Under Multiple Candidate Kernels. Genetic Epidemiology. 2013;37:267–275. doi: 10.1002/gepi.21715.
