Author manuscript; available in PMC: 2010 Feb 22.
Published in final edited form as: Genet Epidemiol. 2009 May;33(4):332–343. doi: 10.1002/gepi.20385

Allelic-Based Gene-Gene Interaction Associated With Quantitative Traits

Jeesun Jung 1,*, Bin Sun 1, Deukwoo Kwon 2, Daniel L Koller 1, Tatiana M Foroud 1
PMCID: PMC2825760  NIHMSID: NIHMS171375  PMID: 19058262

Abstract

Recent studies have shown that quantitative phenotypes may be influenced not only by multiple single nucleotide polymorphisms (SNPs) within a gene but also by the interaction between SNPs at unlinked genes. We propose a new statistical approach that can detect gene-gene interactions at the allelic level which contribute to the phenotypic variation in a quantitative trait. By testing for the association of allelic combinations at multiple unlinked loci with a quantitative trait, we can detect the SNP allelic interaction whether or not it can be detected as a main effect. Our proposed method assigns a score to unrelated subjects according to their allelic combination inferred from observed genotypes at two or more unlinked SNPs, and then tests for the association of the allelic score with a quantitative trait. To investigate the statistical properties of the proposed method, we performed a simulation study to estimate type I error rates and power, and demonstrated that this allelic approach achieves greater power than the more commonly used genotypic approach to test for gene-gene interaction. As an example, the proposed method was applied to data obtained as part of a candidate gene study of sodium retention by the kidney. We found that this method detects an interaction between the calcium-sensing receptor gene (CaSR), the chloride channel gene (CLCNKB) and the Na, K, 2Cl cotransporter gene (SLC12A1) that contributes to variation in diastolic blood pressure.

Keywords: quantitative trait loci, allelic test, interaction effect, blood pressure

INTRODUCTION

It has been widely recognized that gene-gene interaction is likely to have an important role in the variation of quantitative traits for complex disease. Most ongoing genetic studies focus on the identification of single gene effects and do not attempt to identify more complex genetic interactions. Unfortunately, it is quite possible that some important genetic effects may be missed because they do not have a main effect that can be identified; rather the effect of the gene may only be observed when modeled as an interaction with the other relevant genes. The failure to replicate significant association results across studies using single locus-based analysis may be due to the failure to include gene-gene interactions in the genetic analyses.

Many statistical methods have been proposed to detect gene-gene interactions contributing to disease-associated quantitative traits. Combinatorial partitioning methods [Nelson et al., 2001], restricted partitioning methods (RPM) and multivariate adaptive regression splines [Cook et al. 2004] have all been proposed to cluster genotype combinations at multiple loci into subgroups to account for the variation caused by the genetic interactions. Moreover, a standard regression method is commonly used to detect multiplicative interaction effects on a quantitative trait.

There are several shortcomings associated with the currently available methods. First, some of these methods are computationally intensive, especially when more than 10 single nucleotide polymorphisms (SNPs) are included in the interaction model [Ritchie et al., 2001]. Second, the results from these methods can be difficult to interpret biologically because there are no meaningful interpretations of the genotypes clustered together and found to affect the quantitative trait. Third, high dimensionality from multi-locus combinations and small sample size can lead to many sparse or empty cells, thereby reducing the power to identify an interaction effect unless the cell is combined with other cells.

In this article, we propose a statistical method to identify the interaction of multiple unlinked genes defined by the nonrandom association of alleles which occurs when a particular allele in one gene and a particular allele in another unlinked gene contribute to the variation in a quantitative trait through their interaction. Our proposed method assigns a score to a subject according to the allelic combinations inferred from multi-locus genotypes and then tests for the association of these allelic scores with a quantitative trait. We evaluate the analytical properties of our test statistic based on non-centrality parameter approximation and additionally perform a simulation study to compare our method with a genotype-based approach which is a standard regression method. The application of our approach to real data demonstrates its utility to detect the interaction of three genes associated with sodium retention by the kidney.

METHODS

Most of the previously described methods start with all 3 × 3 genotypic combinations obtained when analyzing bi-allelic markers such as SNPs. The basic idea is to group genotypic combinations of the two markers into subgroups having similar quantitative trait mean values. The primary difficulty in implementing the cell clustering approach using the nine genotypic combinations is to discriminate main effects and interaction effects from the clustered cell combinations. For instance, the highlighted cells on the 3 × 3 cell genotype table in Figure 1A can be interpreted as demonstrating either main or interaction effects of the two markers. When using more than two loci, the significant cell clusters become even harder to interpret biologically.

Fig. 1.


Cell combinations of two unlinked markers; M1 has A and a alleles, and M2 has B and b alleles. (A) 3 × 3 combinations of genotypes; (B) 2 × 2 cell combinations of alleles.

We propose to employ SNP alleles rather than SNP genotypes in our test of interaction. As shown in Figure 1B, the AB allelic combination (A allele at marker M1 and B allele at marker M2) may have an increased or decreased trait mean compared with the other allelic combinations. A linear trend then implies that a subject with a higher score (probability) for the AB allelic combination tends to have a higher (or lower) quantitative trait value, which reflects disease susceptibility. Our proposed method first detects, globally, any interaction created by any allelic combination at M1 and M2, and can then identify the specific allelic combinations involved.

ALLELIC SCORE FOR STATISTICAL MODEL

The underlying principle of our method is to test the association between a quantitative trait and the allelic combinations at two unlinked markers; to do so, each subject is assigned an allelic score based on the observed genotype information. The score is the conditional probability of a particular allelic combination given the subject's observed genotypes at the two loci. For example, a subject with genotype AA at marker M1 and Bb at marker M2 has a score of 1/2 for the AB combination and 1/2 for the Ab combination: $X_{AB} = P(AB \mid M_1 = AA, M_2 = Bb) = 1/2$ and $X_{Ab} = P(Ab \mid M_1 = AA, M_2 = Bb) = 1/2$. Table I presents the allelic scores for each two-locus genotype. The number of allelic combinations at two markers is $2^2$, but the probability of any one allelic combination is a function of the three remaining probabilities, so the number of independent probabilities is $2^2 - 1$. The null hypothesis is that there is no association caused by gene-gene interaction; therefore, the mean of the quantitative trait at each of the four allelic combinations is equal to the population mean of the trait. Rejecting the null hypothesis would indicate that particular allelic combinations are associated with the quantitative trait. We can further investigate which allelic combinations are interacting to influence the quantitative trait based on our model. In the extension to multiple markers, the assignment of an allelic score to each subject is straightforward, i.e. $X_c = P(c \mid M_1, \ldots, M_k)$. The number of independent probabilities associated with the allelic combinations from k loci is $2^k - 1$.

TABLE I.

Allelic scores given genotypes of individual

Allelic score
Genotype AB Ab aB ab
(AA, BB) 1 0 0 0
(AA, Bb) 1/2 1/2 0 0
(AA, bb) 0 1 0 0
(Aa, BB) 1/2 0 1/2 0
(Aa, Bb) 1/4 1/4 1/4 1/4
(Aa, bb) 0 1/2 0 1/2
(aa, BB) 0 0 1 0
(aa, Bb) 0 0 1/2 1/2
(aa, bb) 0 0 0 1
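
The scores in Table I are consistent with multiplying, across unlinked loci, the probability of drawing each allele from the subject's genotype. The following is a minimal sketch under that assumption; the function names and the coding of genotypes as copy counts of the first-listed allele are illustrative choices, not part of the original method description.

```python
from itertools import product


def allele_probs(n_copies):
    """Return (P(first allele), P(second allele)) for a genotype carrying
    0, 1 or 2 copies of the first-listed allele (e.g. A at M1, B at M2)."""
    p = n_copies / 2.0
    return (p, 1.0 - p)


def allelic_scores(copies):
    """Allelic scores for all 2^k allelic combinations at k unlinked loci.

    copies: number of copies (0, 1, 2) of the first allele at each locus.
    Combinations are ordered with the last locus varying fastest,
    e.g. AB, Ab, aB, ab for two loci.
    Assumption: the score is the product of per-locus allele probabilities.
    """
    per_locus = [allele_probs(c) for c in copies]
    scores = []
    for combo in product(*per_locus):
        score = 1.0
        for prob in combo:
            score *= prob
        scores.append(score)
    return scores


# Rows of Table I: (AA, Bb) and (Aa, Bb)
print(allelic_scores([2, 1]))
print(allelic_scores([1, 1]))
# Three loci give 2^3 = 8 scores that sum to 1
print(allelic_scores([1, 2, 1]))
```

The same function covers the k-locus extension $X_c = P(c \mid M_1, \ldots, M_k)$, since it simply takes one copy count per locus.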

STATISTICAL MODEL AND PROPERTIES

We consider two disease-associated quantitative trait loci (QTL), $Q_1$ and $Q_2$, underlying a quantitative trait. Both are in Hardy-Weinberg equilibrium and unlinked, but they interact to influence variation in the trait. Let $Q_1$ and $q_1$ be the two alleles at the first QTL, with frequencies $p_{Q_1}$ and $p_{q_1}$, respectively. Let $Q_2$ and $q_2$ be the two alleles at the second QTL, with frequencies $p_{Q_2}$ and $p_{q_2}$. We consider two genotyped bi-allelic marker loci, with marker $M_1$ in linkage disequilibrium (LD) with susceptibility locus $Q_1$ and marker $M_2$ in LD with $Q_2$. Marker $M_1$ has two alleles (A, a) with frequencies $p_A$ and $p_a$, respectively, and $M_2$ has two alleles (B, b) with frequencies $p_B$ and $p_b$, respectively. Let $D_{AQ_1} = P(AQ_1) - P_A P_{Q_1}$ be the measure of LD between $Q_1$ and $M_1$, and $D_{BQ_2} = P(BQ_2) - P_B P_{Q_2}$ be the measure of LD between $Q_2$ and $M_2$.

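Let $\mu_{mnrs}$ be the mean of the quantitative trait for the two-locus genotype combination with alleles $m, n \in \{Q_1, q_1\}$ at $Q_1$ and $r, s \in \{Q_2, q_2\}$ at $Q_2$. Let $\mu_{mn}$ and $\mu_{rs}$ be the means of the quantitative trait for genotype $mn$ at $Q_1$ and genotype $rs$ at $Q_2$, respectively. The mean of the quantitative trait for the allelic combination $Q_1Q_2$ can be calculated by

$$\mu_{Q_1Q_2} = \mu_{Q_1Q_1Q_2Q_2}\,p_{Q_1}p_{Q_2} + \mu_{Q_1Q_1Q_2q_2}\,p_{Q_1}p_{q_2} + \mu_{Q_1q_1Q_2Q_2}\,p_{q_1}p_{Q_2} + \mu_{Q_1q_1Q_2q_2}\,p_{q_1}p_{q_2},$$

and $\mu_{Q_1q_2}$, $\mu_{q_1Q_2}$, $\mu_{q_1q_2}$ are calculated in the same way. Let $\alpha_1 = \mu_{Q_1Q_1} - (\mu_{Q_1Q_1} + \mu_{q_1q_1})/2$ and $\delta_1 = \mu_{Q_1q_1} - (\mu_{Q_1Q_1} + \mu_{q_1q_1})/2$ be the additive and dominance effects of the first QTL $Q_1$, and let $\alpha_2 = \mu_{Q_2Q_2} - (\mu_{Q_2Q_2} + \mu_{q_2q_2})/2$ and $\delta_2 = \mu_{Q_2q_2} - (\mu_{Q_2Q_2} + \mu_{q_2q_2})/2$ be those of the second QTL $Q_2$. The additive and dominance variances at each locus, $\sigma_{ga_1}^2 = 2P_{Q_1}P_{q_1}\alpha_1^2$, $\sigma_{gd_1}^2 = (P_{Q_1}P_{q_1})^2\delta_1^2$, $\sigma_{ga_2}^2 = 2P_{Q_2}P_{q_2}\alpha_2^2$ and $\sigma_{gd_2}^2 = (P_{Q_2}P_{q_2})^2\delta_2^2$, can then be derived [Falconer and Mackay, 1996]. We assume that the dominance variances at each locus are negligible. The total variance is $\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_{interaction}^2 + \sigma_e^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the total variances at each locus, $\sigma_{interaction}^2 = \sum_{i \in \{Q_1Q_2, \ldots, q_1q_2\}} (\mu_i - \mu)^2 p_i$ is the interaction variance and $\sigma_e^2$ is the error variance.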

Let yi be the trait value of ith individual with observed genotypes M1 and M2 (i = 1, …, N). Under the assumption of normality of y, the interaction effect of two genes on a trait can be modeled as

$$y_i = w_i\gamma + X_{i,AB}\beta_{AB} + X_{i,Ab}\beta_{Ab} + X_{i,aB}\beta_{aB} + X_{i,ab}\beta_{ab} + e_i, \qquad (1)$$

where $X_{i,AB} = P(AB \mid M_1, M_2)$ is the AB allelic score derived from the M1 and M2 genotypes of the ith subject, and $\beta_j$, $j \in \{AB, Ab, aB, ab\}$, is the interaction effect of the jth allelic combination. $w_i$ is a covariate such as age or gender, and $\gamma$ is the coefficient of the covariate. We assume that $e_i$ is an error term following the normal distribution $N(0, \sigma_e^2)$. Table I presents the allelic scores calculated according to the conditional probability, given the genotypes at the two markers.

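For simplicity, no covariates are considered. The regression coefficients of model (1) are

$$\beta = K^{-1}\left[\mu\begin{pmatrix}1\\1\\1\\1\end{pmatrix} + (\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})\begin{pmatrix}D_{AQ_1}D_{BQ_2}/(P_AP_B)\\ D_{AQ_1}D_{bQ_2}/(P_AP_b)\\ D_{aQ_1}D_{BQ_2}/(P_aP_B)\\ D_{aQ_1}D_{bQ_2}/(P_aP_b)\end{pmatrix} + (\mu_{Q_1} - \mu_{q_1})\begin{pmatrix}D_{AQ_1}/P_A\\ D_{AQ_1}/P_A\\ D_{aQ_1}/P_a\\ D_{aQ_1}/P_a\end{pmatrix} + (\mu_{Q_2} - \mu_{q_2})\begin{pmatrix}D_{BQ_2}/P_B\\ D_{bQ_2}/P_b\\ D_{BQ_2}/P_B\\ D_{bQ_2}/P_b\end{pmatrix}\right], \qquad (2)$$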

where $(\mu_{Q_1} - \mu_{q_1})$ and $(\mu_{Q_2} - \mu_{q_2})$ are the average effects of gene substitution at $Q_1$ and $Q_2$, respectively [Falconer and Mackay, 1996], and $(\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})$ is the magnitude of the interaction effect between the two loci. The coefficients are functions of the LD between each marker and its putative QTL, of the interaction effect and of the main effects. In Appendix A, we show the detailed derivation of the regression coefficients of model (1). Assigning allelic scores with more than two markers is straightforward; for example, with K markers, the allelic score is $X_{i,C} = P(C \mid M_1, M_2, \ldots, M_K)$, and the statistical model with $2^K$ allelic combinations is $y_i = w_i\gamma + \sum_C X_{i,C}\beta_C + e_i$, which extends (1).

TEST STATISTIC AND NON-CENTRALITY PARAMETER APPROXIMATIONS

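Let us denote $X = (X_{AB}^t, \ldots, X_{ab}^t)$, where $X_{AB} = (X_{1,AB}, \ldots, X_{N,AB})$, $y = (y_1, \ldots, y_N)^t$ and $\beta = (\beta_{AB}, \beta_{Ab}, \beta_{aB}, \beta_{ab})^t$. Based on the null hypothesis (global test) $H_0$: $\beta_{AB} = \beta_{Ab} = \beta_{aB} = \beta_{ab} = \mu$, a test matrix H is defined as follows:

$$H = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 1 & 0 & -1 & 0 \\ 1 & 0 & 0 & -1 \end{pmatrix}.$$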

The global test statistic for interaction effect is

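$$F_{allele} = \frac{(H\hat\beta)^t\left[H(X^tX)^{-1}H^t\right]^{-1}(H\hat\beta)}{Y^t\left[I_N - X(X^tX)^{-1}X^t\right]Y} \cdot \frac{N-4}{3}, \qquad (3)$$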

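where $I_N$ is the N × N identity matrix and $(H\beta)^t = (\beta_{AB} - \beta_{Ab}, \beta_{AB} - \beta_{aB}, \beta_{AB} - \beta_{ab})$. The non-centrality parameter $\lambda_{allele} = \frac{1}{\sigma^2}(H\beta)^t\left[H(X^tX)^{-1}H^t\right]^{-1}(H\beta)$ of the above F-statistic (3) is approximated by

$$\lambda_{allele} \approx \frac{N}{\sigma^2}\,(\beta_{AB} - \beta_{Ab}, \beta_{AB} - \beta_{aB}, \beta_{AB} - \beta_{ab}) \times \left[H\,E(X^tX)^{-1}H^t\right]^{-1} \times (\beta_{AB} - \beta_{Ab}, \beta_{AB} - \beta_{aB}, \beta_{AB} - \beta_{ab})^t,$$

where the total variance is $\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_{interaction}^2 + \sigma_e^2$.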

Under the null hypothesis, the $F_{allele}$ test statistic follows $F(3, N - 4)$ with $\lambda_{allele} = 0$. Under the alternative hypothesis, $F_{allele}$ follows a noncentral $F(3, N - 4, \lambda_{allele})$ distribution with non-centrality parameter $\lambda_{allele}$. Once a significant interaction is detected using the global test, one may want to test which specific allelic combinations cause the interaction effect on the trait. We propose to test each allelic interaction effect $\beta_C$, $C \in \{AB, Ab, aB, ab\}$, as follows:

$$y_i = \mu + w_i\gamma + X_{i,C}\beta_C + e_i,$$

which provides a test of $H_0$: $\beta_C = 0$ with $F(1, N - 4)$. With k markers, the F-test statistic used in the global test $H_0$: $\beta_1 = \cdots = \beta_{2^k} = \mu$ over all allelic combinations follows $F(2^k - 1, N - 2^k)$, and the F-test statistic of the allele-specific interaction test $H_0$: $\beta_C = 0$ follows $F(1, N - 2^k)$.

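$$K = \begin{pmatrix}
(1 - 0.5P_a)(1 - 0.5P_b) & 0.5P_b(1 - 0.5P_a) & 0.5P_a(1 - 0.5P_b) & 0.25P_bP_a \\
0.5P_B(1 - 0.5P_a) & (1 - 0.5P_a)(1 - 0.5P_B) & 0.25P_aP_B & 0.5P_a(1 - 0.5P_B) \\
0.5P_A(1 - 0.5P_b) & 0.25P_AP_b & (1 - 0.5P_A)(1 - 0.5P_b) & 0.5P_b(1 - 0.5P_A) \\
0.25P_AP_B & 0.5P_A(1 - 0.5P_B) & 0.5P_B(1 - 0.5P_A) & (1 - 0.5P_A)(1 - 0.5P_B)
\end{pmatrix}.$$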
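
One way to read the matrix K above is as the second-moment matrix of the allelic scores under Hardy-Weinberg equilibrium, with each row divided by the population frequency of its allelic combination, i.e. $K_{jk} = E(X_jX_k)/P_j$. This reading is an assumption made here for illustration (the derivation itself is given in Appendix A). The sketch below checks it numerically by enumerating the nine two-locus genotypes; all function names are hypothetical.

```python
from math import comb

import numpy as np


def k_closed_form(pA, pB):
    """K as written above, with 1 - 0.5*P terms restored."""
    pa, pb = 1.0 - pA, 1.0 - pB

    def h(p):
        return 1.0 - 0.5 * p

    return np.array([
        [h(pa) * h(pb), 0.5 * pb * h(pa), 0.5 * pa * h(pb), 0.25 * pb * pa],
        [0.5 * pB * h(pa), h(pa) * h(pB), 0.25 * pa * pB, 0.5 * pa * h(pB)],
        [0.5 * pA * h(pb), 0.25 * pA * pb, h(pA) * h(pb), 0.5 * pb * h(pA)],
        [0.25 * pA * pB, 0.5 * pA * h(pB), 0.5 * pB * h(pA), h(pA) * h(pB)],
    ])


def hwe_prob(n_copies, p):
    """Hardy-Weinberg probability of carrying n_copies of an allele."""
    return comb(2, n_copies) * p ** n_copies * (1.0 - p) ** (2 - n_copies)


def k_by_enumeration(pA, pB):
    """E(X X^t) over the nine genotypes, rows scaled by P(AB), P(Ab), ..."""
    freqs = np.array([pA * pB, pA * (1 - pB), (1 - pA) * pB, (1 - pA) * (1 - pB)])
    second_moment = np.zeros((4, 4))
    for gA in range(3):          # copies of allele A at M1
        for gB in range(3):      # copies of allele B at M2
            a, b = gA / 2.0, gB / 2.0
            x = np.array([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])
            second_moment += hwe_prob(gA, pA) * hwe_prob(gB, pB) * np.outer(x, x)
    return second_moment / freqs[:, None]


print(np.allclose(k_closed_form(0.3, 0.4), k_by_enumeration(0.3, 0.4)))
```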

We can also derive a likelihood ratio test (LRT) for the global interaction test $H_0$: $\beta_{AB} = \cdots = \beta_{ab} = \mu$, namely $2(L_{H_A} - L_{H_0})$, which follows $\chi^2_{df=3}$. Here $L_{H_A}$ is the log-likelihood evaluated at the maximum likelihood estimates (MLE) $\hat\beta_{AB}, \hat\beta_{Ab}, \hat\beta_{aB}, \hat\beta_{ab}$ under the alternative hypothesis, and $L_{H_0}$ is the log-likelihood evaluated at the MLE $\hat\mu$ under the null hypothesis. For each allelic combination test of $H_0$: $\beta_C = 0$, the LRT follows $\chi^2_{df=1}$. Interaction among k markers can be tested in a similar fashion.
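
The global F-test (3) and the LRT can be sketched as follows for the two-marker case without covariates. This is a minimal illustration, not the authors' implementation: the function name and the simulated data are assumptions, and the LRT uses the standard Gaussian identity $2(L_{H_A} - L_{H_0}) = N\log(RSS_0/RSS_A)$. Because the four allelic scores of each subject sum to one, the null model reduces to fitting the trait mean.

```python
import numpy as np
from scipy import stats

# Test matrix H for the contrasts (b_AB - b_Ab, b_AB - b_aB, b_AB - b_ab)
H = np.array([[1.0, -1.0, 0.0, 0.0],
              [1.0, 0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0, -1.0]])


def global_allelic_tests(X, y):
    """Return (F, p_F, LRT, p_LRT) for H0: all four betas are equal.

    X: N x 4 matrix of allelic scores (columns AB, Ab, aB, ab).
    y: length-N vector of trait values.
    """
    n, p = X.shape
    q = H.shape[0]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat
    rss_full = resid @ resid
    h_beta = H @ beta_hat
    quad = h_beta @ np.linalg.solve(H @ xtx_inv @ H.T, h_beta)
    f_stat = (quad / q) / (rss_full / (n - p))
    p_f = stats.f.sf(f_stat, q, n - p)
    # Null model: a single common mean for all allelic combinations
    rss_null = np.sum((y - y.mean()) ** 2)
    lrt = n * np.log(rss_null / rss_full)
    p_lrt = stats.chi2.sf(lrt, q)
    return f_stat, p_f, lrt, p_lrt


# Illustrative data only: 300 subjects, an effect on the AB combination
rng = np.random.default_rng(0)
a = rng.binomial(2, 0.4, size=300) / 2.0   # fraction of A alleles at M1
b = rng.binomial(2, 0.4, size=300) / 2.0   # fraction of B alleles at M2
X_demo = np.column_stack([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])
y_demo = 1.5 * X_demo[:, 0] + rng.normal(size=300)
print(global_allelic_tests(X_demo, y_demo))
```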

RESULTS

TYPE I ERROR RATES

To evaluate the robustness of the proposed model, we performed simulations to examine the type I error rates at the 1% and 5% significance levels. Eight models of interaction between two unlinked QTL were considered (see Table II). Most of the models were designed based on combinations of dominant and recessive inheritance at the genotypic level at each locus. These models are (1) Dominant or Dominant (Dom ∪ Dom), (2) Dominant or Recessive (Dom ∪ Rec), (3) Modified model, (4) Dominant and Dominant (Dom ∩ Dom), (5) Recessive or Recessive (Rec ∪ Rec), (6) Threshold model, (7) Dominant and Recessive (Dom ∩ Rec), (8) Recessive and Recessive (Rec ∩ Rec). For each model, we simulated 5,000 datasets using SNaP [Nothnagel, 2002]. Each dataset has 500 unrelated subjects with two unlinked QTL and no LD between markers and QTLs ($D'_{AQ_1} = D_{AQ_1}/D_{max} = 0$; $D'_{BQ_2} = D_{BQ_2}/D_{max} = 0$; $R^2_{AQ_1} = R^2_{BQ_2} = 0$). The disease-associated allele frequencies at the two loci are 0.2 (PQ1 = PQ2 = 0.2), and the allele frequencies at the two markers are PA = PB = 0.3. Quantitative trait values for each genotypic combination of the two loci were generated from a normal distribution with the mean value (0 or 1) given in Table II and a standard deviation of 1 for all models. Table III presents the empirical type I error rates of the F-test statistic and the LRT calculated from the 5,000 datasets at the 1% and 5% significance levels. The type I error rates of both test statistics were close to the nominal values of 1% and 5% for every model, suggesting that the proposed method is statistically robust. For other parameter values, such as disease-associated allele frequencies of 0.5 (PQ1 = PQ2 = 0.5) and equal marker allele frequencies (PA = PB = 0.5), all models likewise achieved the nominal values of 1% and 5% (results not shown).
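
A rough version of this type I error check can be sketched without SNaP. The simplification is an assumption of this sketch: because the markers carry no LD with the QTLs, the trait is drawn here as standard normal noise independent of the marker genotypes, rather than from the genotype-specific means of Table II. The function name and defaults are illustrative.

```python
import numpy as np
from scipy import stats


def simulate_type1_error(n_rep=1000, n=500, pA=0.3, pB=0.3, alpha=0.05, seed=0):
    """Empirical rejection rate of the global allelic F-test under the null."""
    rng = np.random.default_rng(seed)
    f_crit = stats.f.ppf(1.0 - alpha, 3, n - 4)
    rejections = 0
    for _ in range(n_rep):
        # Marker genotypes under Hardy-Weinberg equilibrium
        a = rng.binomial(2, pA, size=n) / 2.0
        b = rng.binomial(2, pB, size=n) / 2.0
        X = np.column_stack([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)])
        # Trait unrelated to the markers (simplified null model)
        y = rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss_full = np.sum((y - X @ beta_hat) ** 2)
        rss_null = np.sum((y - y.mean()) ** 2)
        f_stat = ((rss_null - rss_full) / 3) / (rss_full / (n - 4))
        rejections += int(f_stat > f_crit)
    return rejections / n_rep


print(simulate_type1_error(alpha=0.05))
print(simulate_type1_error(alpha=0.01))
```

With more replicates (the text uses 5,000 datasets) the Monte Carlo error of the estimated rate shrinks accordingly.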

TABLE II.

The interaction models: 0, 1 stand for a quantitative trait mean given the genotypes

Trait mean by second-locus genotype
Model First locus Q2Q2 Q2q2 q2q2
Dom ∪ Dom Q1Q1 1 1 1
Dom ∪ Dom Q1q1 1 1 1
Dom ∪ Dom q1q1 1 1 0
Dom ∪ Rec Q1Q1 1 1 1
Dom ∪ Rec Q1q1 1 1 1
Dom ∪ Rec q1q1 1 0 0
Modified Q1Q1 1 0 1
Modified Q1q1 1 0 1
Modified q1q1 0 0 0
Dom ∩ Dom Q1Q1 1 1 0
Dom ∩ Dom Q1q1 1 1 0
Dom ∩ Dom q1q1 0 0 0
Rec ∪ Rec Q1Q1 1 1 1
Rec ∪ Rec Q1q1 1 0 0
Rec ∪ Rec q1q1 1 0 0
Threshold Q1Q1 1 1 0
Threshold Q1q1 1 0 0
Threshold q1q1 0 0 0
Dom ∩ Rec Q1Q1 1 0 0
Dom ∩ Rec Q1q1 1 0 0
Dom ∩ Rec q1q1 0 0 0
Rec ∩ Rec Q1Q1 1 0 0
Rec ∩ Rec Q1q1 0 0 0
Rec ∩ Rec q1q1 0 0 0
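
To connect the genotypic models of Table II to the allelic level, the allelic means $\mu_{Q_1Q_2}, \ldots, \mu_{q_1q_2}$ defined in the Methods can be computed from each 3 × 3 table, together with the interaction contrast $\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2}$. The sketch below does this; the dictionary layout and function names are illustrative, and the allele frequencies passed in the example are only one possible choice.

```python
import numpy as np

# Rows: first-locus genotype (Q1Q1, Q1q1, q1q1);
# columns: second-locus genotype (Q2Q2, Q2q2, q2q2). Values from Table II.
MODELS = {
    "Dom ∪ Dom": [[1, 1, 1], [1, 1, 1], [1, 1, 0]],
    "Dom ∪ Rec": [[1, 1, 1], [1, 1, 1], [1, 0, 0]],
    "Modified":  [[1, 0, 1], [1, 0, 1], [0, 0, 0]],
    "Dom ∩ Dom": [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    "Rec ∪ Rec": [[1, 1, 1], [1, 0, 0], [1, 0, 0]],
    "Threshold": [[1, 1, 0], [1, 0, 0], [0, 0, 0]],
    "Dom ∩ Rec": [[1, 0, 0], [1, 0, 0], [0, 0, 0]],
    "Rec ∩ Rec": [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
}


def allelic_means(genotype_means, pQ1, pQ2):
    """2 x 2 array of allelic means; index 0 = Q allele, 1 = q allele.

    Each allelic mean averages the genotype means over the second allele
    carried at each locus, weighted by the allele frequencies.
    """
    g = np.asarray(genotype_means, dtype=float)
    p1 = (pQ1, 1.0 - pQ1)
    p2 = (pQ2, 1.0 - pQ2)
    mu = np.zeros((2, 2))
    for a1 in range(2):
        for a2 in range(2):
            for b1 in range(2):
                for b2 in range(2):
                    # genotype index = number of q alleles at that locus
                    mu[a1, a2] += p1[b1] * p2[b2] * g[a1 + b1, a2 + b2]
    return mu


def interaction_contrast(genotype_means, pQ1, pQ2):
    mu = allelic_means(genotype_means, pQ1, pQ2)
    return mu[0, 0] - mu[0, 1] - mu[1, 0] + mu[1, 1]


for name, table in MODELS.items():
    print(name, interaction_contrast(table, 0.2, 0.2))
```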

TABLE III.

Result of type I error rate (%) at 1 and 5% significance level from 5,000 datasets, each has 500 subjects

Columns 2 and 3: type I error rates (%) for F; columns 4 and 5: type I error rates (%) for LRT
Model α = 0.01 α = 0.05 α = 0.01 α = 0.05
Dom ∪ Dom 1.06 4.24 1.12 4.54
Dom ∪ Rec 0.84 4.36 0.94 4.58
Modified 0.74 4.80 0.88 5.02
Dom ∩ Dom 1.06 5.10 1.26 5.48
Rec ∪ Rec 1.16 4.96 1.24 5.10
Threshold 1.00 5.02 1.10 5.22
Dom ∩ Rec 1.14 5.80 1.30 6.14
Rec ∩ Rec 1.30 4.96 1.34 5.62

ANALYTICAL POWER AND SAMPLE SIZE CALCULATION

To further evaluate the performance of the proposed model with respect to the parameters affecting power, we examined the analytical power and the required sample size of the proposed approach based on the non-centrality parameter approximation across a range of values of each parameter. The power to detect an interaction effect is influenced by parameters such as the LD between each marker and its putative QTL ($D_{AQ_1}$, $D_{BQ_2}$), the trait locus allele frequencies ($p_{Q_1}$, $p_{Q_2}$), the observed marker allele frequencies ($p_A$, $p_B$), and the heritabilities $h^2_{Q_1} = \sigma^2_{ga_1}/\sigma^2_1$ and $h^2_{Q_2} = \sigma^2_{ga_2}/\sigma^2_2$ of the two putative QTLs. Based on the eight two-locus interaction models, we calculated the additive variances at the two loci and the total variance entering the non-centrality parameter $\lambda_{allele}$ to estimate power and the required sample size.
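
Given a value of the non-centrality parameter, analytical power and the required sample size follow from the noncentral F distribution. The sketch below is an illustration under stated assumptions: `ncp_per_subject` is a hypothetical input standing for $\lambda_{allele}/N$ from the approximation in the Methods, and the numerical values used are arbitrary rather than taken from the figures.

```python
from scipy import stats


def f_test_power(ncp, dfn, dfd, alpha=0.01):
    """Power of an F-test with non-centrality parameter ncp."""
    f_crit = stats.f.ppf(1.0 - alpha, dfn, dfd)
    return stats.ncf.sf(f_crit, dfn, dfd, ncp)


def required_sample_size(ncp_per_subject, n_params=4, target=0.8,
                         alpha=0.01, n_max=100000):
    """Smallest N whose power reaches the target, assuming lambda = N * ncp_per_subject.

    n_params = 4 corresponds to the two-marker allelic model, F(3, N - 4).
    Returns None if the target is not reached below n_max.
    """
    for n in range(n_params + 2, n_max):
        power = f_test_power(n * ncp_per_subject, n_params - 1, n - n_params, alpha)
        if power >= target:
            return n
    return None


# Arbitrary illustrative values
print(f_test_power(ncp=10.0, dfn=3, dfd=96, alpha=0.01))
print(required_sample_size(ncp_per_subject=0.05))
```

For k markers the same functions apply with `n_params = 2**k`, matching the $F(2^k - 1, N - 2^k)$ reference distribution.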

Figure 2 shows the analytical power curves of the test statistic $F_{allele}$ plotted against the LD coefficient $D_{AQ_1}$, with $D_{BQ_2}$ fixed at 0.05, at the 1% significance level. Power is calculated under $h^2_{Q_1} = h^2_{Q_2} = 0.2$; PA = Pa = 0.5; PB = Pb = 0.5 and sample size N = 100. The “Dom ∪ Dom” model achieves 100% power over the entire range of LD. Power decreases in the order “Dom ∪ Rec”, “Modified”, “Dom ∩ Dom”, “Rec ∪ Rec”, “Threshold” at $D_{AQ_1} \le -0.05$ and $D_{AQ_1} \ge 0.05$, whereas at $-0.05 < D_{AQ_1} < 0.02$ the “Dom ∩ Rec” model is more powerful than “Threshold” and “Rec ∩ Rec”. Figure 3 illustrates the power curves over the frequency of the Q1 allele ($p_{Q_1}$), with $p_{Q_2}$ fixed at 0.5, at the 1% significance level. All parameters used for Figure 3 are the same as those used in Figure 2, except that $D_{AQ_1} = \{\min(P_{Q_1}, P_A) - P_{Q_1}P_A\}/2$ varies with $p_{Q_1}$. As in Figure 2, “Dom ∪ Dom” has the highest power of all models over the entire range of $p_{Q_1}$. “Dom ∪ Rec” has higher power than “Modified”, which in turn has higher power than “Dom ∩ Dom” at $p_{Q_1} > 0.2$. The power patterns are very similar to those in Figure 2, except for “Rec ∪ Rec”, whose power is lower than that of all models except “Dom ∩ Rec”. “Threshold” and “Rec ∩ Rec” achieve almost identical power.

Fig. 2.


Analytical power against $D_{AQ_1}$ when $D_{BQ_2} = 0.05$ at the 1% significance level. The Dominant or Dominant model is indicated by Dom ∪ Dom, Dominant or Recessive by Dom ∪ Rec, Dominant and Dominant by Dom ∩ Dom, Recessive or Recessive by Rec ∪ Rec, Dominant and Recessive by Dom ∩ Rec and Recessive and Recessive by Rec ∩ Rec. PQ1 = 0.5; PQ2 = 0.5; PA = 0.5; PB = 0.5; heritabilities $h^2_{Q_1} = 0.2$; $h^2_{Q_2} = 0.2$; sample size N = 100.

Fig. 3.


Analytical power against the frequency of Q1 when PQ2 = 0.5 at the 1% significance level. PA = 0.5; PB = 0.5; $D_{AQ_1} = \{\min(P_{Q_1}, P_A) - P_{Q_1}P_A\}/2$; $D_{BQ_2} = 0.05$; heritabilities $h^2_{Q_1} = 0.2$; $h^2_{Q_2} = 0.2$; sample size N = 100. Model notations are the same as in Figure 2.

Figure 4 presents power curves against the heritability ($h^2_{Q_1}$) of QTL Q1 when $h^2_{Q_2} = 0.2$. Again, “Dom ∪ Dom” achieves the highest power of all models, and “Dom ∪ Rec” has the second highest power. The power of the models follows the order listed from top to bottom in the legend, except that “Dom ∩ Rec” has the lowest power at $h^2_{Q_1} > 0.05$. The power of the interaction test as a function of the A allele frequency at M1, $p_A$, when $p_B = 0.5$ is shown in Figure 5. As in the previous results, “Dom ∪ Dom” has the highest power, and “Dom ∩ Rec” has the lowest power across the range $p_A < 0.5$. “Threshold” and “Rec ∩ Rec” have almost the same power.

Fig. 4.


Analytical power against the heritability of Q1 when $h^2_{Q_2} = 0.2$ at the 1% significance level. PQ1 = 0.5; PQ2 = 0.5; PA = 0.5; PB = 0.5; $D_{AQ_1} = 0.1$; $D_{BQ_2} = 0.05$; heritabilities $h^2_{Q_1} = 0.2$; $h^2_{Q_2} = 0.2$; sample size N = 100. Model notations are the same as in Figure 2.

Fig. 5.


Analytical power against the frequency of A when PB = 0.5 at the 1% significance level. PQ1 = 0.5; PQ2 = 0.5; $D_{AQ_1} = \{\min(P_{Q_1}, P_A) - P_{Q_1}P_A\}/2$; $D_{BQ_2} = 0.05$; heritabilities $h^2_{Q_1} = 0.1$; $h^2_{Q_2} = 0.1$; sample size N = 70. Model notations are the same as in Figure 2.

Figure 6 shows the sample size required by the test statistic $F_{allele}$ to achieve 80% power at the 1% significance level, plotted against the frequency of the Q1 allele ($p_{Q_1}$) with $p_{Q_2}$ fixed at 0.5. The sample size was calculated under $h^2_{Q_1} = h^2_{Q_2} = 0.2$; PA = Pa = 0.5; PB = Pb = 0.5. As shown in Figure 6, at $p_{Q_1} < 0.2$ “Rec ∩ Rec” needs the largest sample size, with over 400 subjects, whereas “Dom ∩ Rec” requires the largest sample size of all models at $p_{Q_1} > 0.2$. “Dom ∪ Rec”, “Dom ∪ Dom” and “Rec ∪ Rec” need fewer than 400 subjects over the entire range.

Fig. 6.


The required sample size against PQ1 when PQ2 = 0.5 to achieve 80% power at the 1% significance level. PA = 0.5; PB = 0.5; $D_{AQ_1} = \{\min(P_{Q_1}, P_A) - P_{Q_1}P_A\}/2$; $D_{BQ_2} = 0.05$; heritabilities $h^2_{Q_1} = 0.2$; $h^2_{Q_2} = 0.2$. Model notations are the same as in Figure 2.

POWER COMPARISON WITH GENOTYPIC BASED METHOD USING SIMULATION STUDY

In order to compare the allelic-based gene-gene interaction method with the genotypic-based method, we modeled the genotypic-based method as follows. There are several genotypic-based approaches such as RPM, but these are not directly comparable. Therefore, we restricted our analysis to a regression method that takes into account the genotypic combinations of the two markers as follows. For ith subject,

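$$y_i = w_i\gamma + \sum_{k=1}^{9} Z_{i,k=(jl,mn)}\,\beta_k + e_i,$$

where

$$Z_{i,k=(jl,mn)} = \begin{cases} 1 & \text{if } G_k = (jl, mn) \\ 0 & \text{otherwise} \end{cases} \quad \text{and } j, l \in \{A, a\},\ m, n \in \{B, b\};$$

for example,

$$Z_{i,k=(AA,BB)} = \begin{cases} 1 & \text{if } G_k = (AA, BB), \\ 0 & \text{otherwise.} \end{cases}$$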

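Under the null hypothesis of no interaction between two unlinked loci, $H_0$: $\beta_1 = \cdots = \beta_9 = \mu$ and $(H\beta)^t = (\beta_1 - \beta_2, \beta_1 - \beta_3, \ldots, \beta_1 - \beta_9)$, the F-test statistic follows an $F(8, N - 9)$ distribution with $\lambda = 0$. Under the alternative hypothesis, the F-test statistic follows a noncentral $F(8, N - 9, \lambda_{genotype})$ distribution, with non-centrality parameter given by

$$\lambda_{genotype} \approx \frac{N}{\sigma^2}\,(\beta_1 - \beta_2, \beta_1 - \beta_3, \ldots, \beta_1 - \beta_9) \times \left[H\,E(Z^tZ)^{-1}H^t\right]^{-1} \times (\beta_1 - \beta_2, \beta_1 - \beta_3, \ldots, \beta_1 - \beta_9)^t.$$

Likewise, we can also use the LRT, $2(L_{H_A} - L_{H_0})$, where $L_{H_A}$ is the log-likelihood under the alternative hypothesis and $L_{H_0}$ is the log-likelihood under the null hypothesis $H_0$: $\beta_1 = \cdots = \beta_9 = \mu$. The LRT has a chi-square distribution, $\chi^2_8$.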
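
The genotype-based design can be sketched as a one-hot encoding of the nine genotype cells, which also makes it easy to count empty cells and to compare the numerator degrees of freedom of the two approaches. The function names are illustrative; the count $3^k - 1$ for k markers is an extrapolation from the two-marker case $F(8, N - 9)$ stated above, not a formula given in the text.

```python
import numpy as np


def genotype_design(g1, g2):
    """N x 9 indicator matrix Z for the two-locus genotype cells.

    g1, g2: genotype codes 0, 1, 2 at markers M1 and M2 for each subject.
    Column index is 3 * g1 + g2.
    """
    g1 = np.asarray(g1)
    g2 = np.asarray(g2)
    cell = 3 * g1 + g2
    Z = np.zeros((len(cell), 9))
    Z[np.arange(len(cell)), cell] = 1.0
    return Z


def count_empty_cells(Z):
    """Number of genotype cells with no subjects."""
    return int(np.sum(Z.sum(axis=0) == 0))


def interaction_df(k):
    """Numerator degrees of freedom of the global tests with k markers."""
    return {"allelic": 2 ** k - 1, "genotypic": 3 ** k - 1}


# Illustrative data: 95 subjects, minor allele frequency 0.1 at both markers
rng = np.random.default_rng(0)
Z_demo = genotype_design(rng.binomial(2, 0.1, size=95), rng.binomial(2, 0.1, size=95))
print(count_empty_cells(Z_demo))
for k in (2, 3, 5):
    print(k, interaction_df(k))
```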

We performed simulation studies to compare the empirical power of the allelic-based and genotypic-based methods. Using SNaP, we simulated 2,000 datasets under two scenarios of LD between markers and QTLs, with a disease-associated allele frequency of 0.2 at both QTLs (PQ1 = PQ2 = 0.2) and marker allele frequencies PA = PB = 0.3, in order to understand power in a more realistic situation. The first scenario has relatively high LD between QTLs and markers, $D'_{AQ_1} = D'_{BQ_2} = 0.8$; $R^2_{AQ_1} = R^2_{BQ_2} = 0.4$, and the second has lower LD, $D'_{AQ_1} = D'_{BQ_2} = 0.65$; $R^2_{AQ_1} = R^2_{BQ_2} = 0.25$. Each dataset has 500 subjects. We explored the eight interaction models in Table II. Tables IV and V present the empirical power at the 1 and 5% significance levels for both scenarios using both the F-test and LRT statistics. We found that all models achieved higher power with the allelic-based method than with the genotypic-based method, except “Rec ∪ Rec”. For the “Dom ∪ Dom”, “Dom ∪ Rec”, “Dom ∩ Rec” and “Rec ∩ Rec” models, power is slightly higher with the allelic-based method or similar for both methods. The powers of the F-test and the LR test are almost identical. Furthermore, we explored additional scenarios such as perfect LD ($R^2_{AQ_1} = R^2_{BQ_2} = 1$) and very low LD ($R^2_{AQ_1} = R^2_{BQ_2} < 0.1$). Neither scenario is informative for comparing power across models, because some of the models all achieve 100% power under perfect LD and some of the models achieve almost no power under very low LD.

TABLE IV.

Power (%) at the 1 and 5% significance levels with $D'_{AQ_1} = D'_{BQ_2} = 0.8$; $R^2_{AQ_1} = R^2_{BQ_2} = 0.4$, from 2,000 datasets of 500 subjects each

Columns 2 to 5: allelic-based method (F-test, then LR test); columns 6 to 9: genotype-based method (F-test, then LR test)
Model α = 0.01 α = 0.05 α = 0.01 α = 0.05 α = 0.01 α = 0.05 α = 0.01 α = 0.05
Dom ∪ Dom 99.2 99.9 99.2 99.9 96.0 99.3 95.7 99.2
Dom ∪ Rec 99.2 99.9 99.3 99.9 98.1 99.4 98.0 99.3
Modified 89.4 97.0 89.8 97.2 81.6 93.2 80.8 93.0
Dom ∩ Dom 64.7 83.7 65.8 84.3 47.0 72.0 46.3 71.2
Rec ∪ Rec 18.4 38.6 18.9 39.4 26.0 48.4 25.5 48.1
Threshold 6.8 18.1 7.1 18.5 5.7 16.6 5.6 16.4
Dom ∩ Rec 2.0 7.5 2.3 7.8 1.9 8.2 1.9 8.2
Rec ∩ Rec 1.2 5.2 1.3 5.6 1.1 5.0 1.1 4.9

TABLE V.

Power (%) at the 1 and 5% significance levels with $D'_{AQ_1} = D'_{BQ_2} = 0.65$; $R^2_{AQ_1} = R^2_{BQ_2} = 0.25$, from 2,000 datasets of 500 subjects each

Columns 2 to 5: allelic-based method (F-test, then LR test); columns 6 to 9: genotype-based method (F-test, then LR test)
Model α = 0.01 α = 0.05 α = 0.01 α = 0.05 α = 0.01 α = 0.05 α = 0.01 α = 0.05
Dom ∪ Dom 88.6 96.6 89.3 96.7 73.1 89.6 72.1 89.3
Dom ∪ Rec 91.5 97.9 92.1 97.9 80.2 93.0 79.4 92.7
Modified 66.5 83.7 67.7 84.2 50.3 72.7 48.8 71.8
Dom ∩ Dom 36.9 61.2 38.8 62.1 24.2 46.8 23.2 45.7
Rec ∪ Rec 10.0 26.5 10.4 27.3 11.6 28.4 11.2 27.9
Threshold 4.2 13.3 4.4 13.7 3.4 12.2 3.4 12.0
Dom ∩ Rec 1.4 6.9 1.5 7.1 1.3 6.6 1.3 6.5
Rec ∩ Rec 1.0 5.6 1.0 5.8 0.9 4.2 0.9 4.1

APPLICATION TO GENETIC STUDY OF SODIUM RETENTION BY KIDNEY

Our proposed method is applicable to many different genotyping experiments including whole genome-wide association data or candidate gene studies. Here, we apply our method to a candidate gene study of renal sodium reabsorption and blood pressure (BP). An increase in sodium retention by the kidney has been shown to underlie an increased level of BP. Furthermore, African Americans (AA) retain more sodium in the kidney and have a higher prevalence of hypertension than European Americans [Guyton et al., 1972; Meneton et al., 2003].

Subjects consisted of 95 unrelated AAs aged 18–36 years. BP was measured under controlled conditions in an inpatient facility, the General Clinical Research Center at Indiana University–Purdue University at Indianapolis. We genotyped 67 SNPs in five genes that are known to regulate sodium reabsorption by the Na, K, 2Cl cotransporter in the kidney’s thick ascending limb [Hebert, 1998]: (1) a gene for a protein that facilitates Cl⁻ transport (BSND: 10 SNPs), (2) the calcium-sensing receptor gene (CaSR: 17 SNPs), (3) a chloride channel kb gene (CLCNKB: 14 SNPs), (4) a potassium channel gene (KCNJ1: 11 SNPs), and (5) the gene for the cotransporter itself (SLC12A1: 15 SNPs). The search for the best interaction model with the proposed method consists of multiple steps. First, after tagging SNPs using the Hclust.R program [Rinaldo et al., 2005], each SNP was individually tested for association with systolic and diastolic BP using a standard additive model [Fan et al., 2006]. SNPs were ranked according to their P-values from the association test with diastolic BP as the quantitative phenotype. Second, the 20 SNPs with the smallest P-values were selected; these consisted of 8 SNPs with P-value <0.05 and 12 SNPs with P-value between 0.05 and 0.2. Choosing one SNP from each gene, these SNPs yielded 2,400 5-way (five-SNP) combinations. We applied our method to the 2,400 5-way combinations and found that only 4 of them were statistically significant after a Bonferroni correction for multiple comparisons (α ≤ 0.0002) using LRT statistics. Third, we employed a backward elimination model selection method to search for the best model, dropping one SNP at a time from the selected 5-way interaction model. We applied the same procedure to all four combinations chosen in the second step. The F-test statistic comparing the full (5-way) and reduced (4-way) models was calculated by

$$F = \frac{RSS(\text{reduced}) - RSS(\text{full})}{RSS(\text{full})} \times \frac{n - df(\text{full})}{df(\text{full}) - df(\text{reduced})},$$

where RSS(·) denotes the residual sum of squares and df is the degrees of freedom. We also calculated the Bayesian information criterion (BIC), a model selection criterion that penalizes an increase in the number of parameters. Based on the P-value of the full-reduced F-test statistic (Bonferroni-adjusted at significance level 0.05), the smallest BIC and the P-value of the 4-way interaction analysis, we chose two significant 4-way (four-SNP) interaction models from the four 5-way combinations: (1) the BSND, CaSR, CLCNKB and SLC12A1 genes, and (2) the BSND, CLCNKB, SLC12A1 and KCNJ1 genes. Fourth, we continued this procedure until the 2-way interaction analysis had been performed, and the best interaction models were determined by the same criteria (significance of the interaction model, full-reduced F-test statistic and BIC). Among the five unlinked genes, the 3-way interaction model consisting of rs3749204 from CaSR, rs2297727 from CLCNKB and rs2279366 from SLC12A1 was determined to be the best interaction model (P-value = 0.02 in the LRT of the interaction model, P-value = 0.07 in the full-reduced F-test). rs3749204 has a P-value of 0.1, and rs2297727 and rs2279366 each have a P-value of 0.047. Additionally, we investigated which combination of alleles among the three SNPs creates the interaction effect and identified that allele A at rs3749204, allele T at rs2297727 and allele C at rs2279366 are associated interactively with diastolic BP (βATC = 22.11, LRT = 9.95, P-value = 0.002).
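
The full-versus-reduced comparison used in the backward elimination can be sketched as follows. This is an illustration under assumptions: `df` in the formula is read here as the number of fitted parameters of each model (taken as the rank of its design matrix), the BIC is the usual Gaussian form up to an additive constant, and all function names and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def rss_and_df(X, y):
    """Residual sum of squares and number of fitted parameters (rank of X)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return float(resid @ resid), int(np.linalg.matrix_rank(X))


def full_vs_reduced(X_full, X_reduced, y):
    """F statistic and P-value for nested models (reduced inside full)."""
    n = len(y)
    rss_f, df_f = rss_and_df(X_full, y)
    rss_r, df_r = rss_and_df(X_reduced, y)
    f_stat = (rss_r - rss_f) / rss_f * (n - df_f) / (df_f - df_r)
    p_value = stats.f.sf(f_stat, df_f - df_r, n - df_f)
    return f_stat, p_value


def gaussian_bic(rss, n, n_params):
    """BIC for a Gaussian linear model, up to an additive constant."""
    return n * np.log(rss / n) + n_params * np.log(n)


# Illustrative nested models on simulated data (95 subjects)
rng = np.random.default_rng(0)
n = 95
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + x1 + rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = X_full[:, :2]
print(full_vs_reduced(X_full, X_reduced, y))
for X in (X_full, X_reduced):
    rss, k = rss_and_df(X, y)
    print(k, gaussian_bic(rss, n, k))
```

In the allelic setting, the full and reduced design matrices would hold the allelic scores of the 5-way and 4-way models; the 4-way scores are sums of 5-way scores, so the models are nested as the F-test requires.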

DISCUSSION

Advanced genotyping technologies make genome-wide association studies possible for many common complex diseases. It has been known that multiple genes along with the environment play interactive roles contributing to the development of complex diseases. Single marker association tests often fail to identify causal association with disease due to biological complexity or lack of a strong association signal due to the small sample size. Therefore, the search for genetic interactions has become a reasonable next step. Most of the current approaches test for interaction at the genotype level. In this article, we proposed a method to identify gene-gene interaction at the allelic level for QTL. The allelic approach provides several advantages for detecting genetic interactions contributing to a quantitative trait: (1) This method can detect the interaction of SNPs when the contribution to disease of a particular allele inherited in one gene depends on a particular allele inherited at other unlinked genes, which is a novel definition for the allelic-based gene-gene interaction. (2) The method can identify the interaction of important SNPs with or without main effects when important SNPs may not be detected using a single marker test. (3) The biological interpretation may be more straightforward at the allelic rather than the genotypic level because the interacting alleles can be explained by combinations of disease-associated alleles, whereas the genotypic method cannot explain the interaction simply due to a lack of an interpretable relationship between the genotypes combined in a cluster.

There are several criteria by which the genotype-based and allelic-based methods can be compared. Genotype-based approaches increase the degrees of freedom of the test statistic, resulting in a concomitant loss of power when multiple unlinked markers (i.e. more than 3) are tested for interaction. In contrast, the allelic-based method reduces the number of degrees of freedom by working at the allelic level. Furthermore, the power of the genotype-based method may be compromised by sparse or empty cells caused by multiple markers combined with small sample sizes and low minor allele frequencies (<0.1). The allelic-based method, however, is relatively robust even with low minor allele frequencies and small sample sizes. The application to renal sodium reabsorption and BP demonstrates the feasibility of our proposed method. The reported gene-gene interaction associated with the Na, K, 2Cl cotransporter in the kidney’s thick ascending limb is novel. Subsequent studies must be performed based on more phenotypes related to sodium retention. In addition, regardless of the type of test used to evaluate gene-gene interaction, molecular studies will need to be performed to test whether and how the statistical interactions reflect underlying biological interactions.

Our approach to detecting interaction effects differs from more conventional models such as those of Wang and Zheng [2006] and Fan et al. [2006]. We do not include the main additive and dominance effects in the model. In addition, the interaction terms in the traditional methods are modeled as the product of two main-effect variables (i.e. additive × additive, additive × dominance, dominance × dominance). The method of Wang and Zheng [2006] allows for Hardy-Weinberg and LD between two markers, which would be more appropriate for testing an interaction among linked genes. The test of interaction is constrained by the way the additive and dominance variables are scored in the model (e.g. for the additive variables $X_1 = 0, 1, 2$ and $X_2 = 0, 1, 2$, coded by the number of copies of a given allele, the interaction variable is $X_1 \times X_2 = 0, 1, 2, 4$). We performed a simulation study to compare the power of the commonly used model, $y_i = \mu + \beta_A X_1 + \beta_B X_2 + \beta_{AB}(X_1 \times X_2) + \varepsilon_i$, with that of our proposed model (1), and found that both methods achieved the same power when testing the null hypothesis of no genetic effects, regardless of the source of the effects. However, the parameters in the two methods explain different genetic effects: all three parameters in our model are associated with interaction effects and are tested for interaction, whereas only one of the three parameters in the traditional model is associated with an interaction and tested for the interaction effect. A further simulation compared the power of the two models when testing the null hypothesis of no genetic interaction effect. The results show that the test of the interaction effect ($\beta_{AB}$) in the commonly used model has lower power to detect interaction than the allelic-based method (results not shown). Moreover, the traditional model can only test for an interaction caused by the specified alleles of the markers, because the test depends on the dosage of the alleles coded in $X_1$ and $X_2$. 
In contrast, our proposed method can test all possible allelic interactions between two markers, which may or may not have main effects. However, the issue of multiple comparisons remains in the high-dimensional setting inherent in a genome-wide association approach. Devlin et al. [2003] presented a procedure to control the false discovery rate in multiple-marker analysis under the assumption that SNPs are independent, but the challenge of dependent P-values created by LD between markers still needs to be resolved. For our proposed approach, we suggest a two-stage analysis: first, an individual marker analysis, followed by selection of the 5–10% of markers with marginal main effects. The selected markers are then used to identify gene-gene interaction at the allelic level.
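The two-stage strategy suggested above can be sketched in a few lines of Python. This is only an outline under stated assumptions: markers are ranked by their squared correlation with the trait (which, for a fixed sample size, orders markers in the same way as the single-marker linear regression P-value), the top fraction is retained, and all pairs of retained markers are listed as candidates for the allelic interaction test. The function names, the `keep_fraction` parameter and the simulated data are hypothetical, and the second-stage test itself is not implemented here.

```python
import random
from itertools import combinations


def squared_correlation(x, y):
    """Squared Pearson correlation between a genotype vector and the trait."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker or constant trait carries no signal
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def two_stage_candidates(genotypes, trait, keep_fraction=0.10):
    """Stage 1: keep the markers with the strongest marginal association.
    Returns the retained markers and every pair of them, which would be
    passed to the allelic interaction test in stage 2."""
    scores = {m: squared_correlation(g, trait) for m, g in genotypes.items()}
    n_keep = max(2, round(keep_fraction * len(scores)))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    return kept, list(combinations(sorted(kept), 2))


if __name__ == "__main__":
    rng = random.Random(1)
    n_subjects, n_markers = 200, 40
    # Simulated genotypes coded as allele counts (0, 1, 2); purely illustrative.
    genotypes = {f"snp{m}": [rng.choice((0, 1, 2)) for _ in range(n_subjects)]
                 for m in range(n_markers)}
    # Hypothetical trait driven by the product of two markers plus noise.
    trait = [genotypes["snp3"][i] * genotypes["snp7"][i] + rng.gauss(0, 1)
             for i in range(n_subjects)]
    kept, pairs = two_stage_candidates(genotypes, trait, keep_fraction=0.10)
    print("retained markers:", kept)
    print("pairs passed to the allelic interaction test:", len(pairs))
```

A screen of this kind reduces the number of pairwise tests from all marker pairs to the pairs among the retained markers, at the cost of missing interactions between markers that show no marginal signal.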

Detection of gene-gene interaction is an important step in genome-wide association research, but interaction is a complex concept with a variety of definitions. Furthermore, one may wish to test for association with quantitative traits measured in genome-wide association data ascertained through a case-control design, after the initial case-control genome scan. Although our method is designed for randomly ascertained data, the approach could be extended to incorporate case/control status in the model. Integrating biological knowledge of pathways into the analysis of gene-gene interaction may help to select high-priority candidate genes that potentially contribute to the physiological and genetic meaning of interactions. Future work is needed to develop and evaluate additional statistical methods that can handle various study designs, such as case-control data, at the allelic level.

Acknowledgments

Contract grant sponsors: Indiana Genomics Initiative (INGEN®), Lilly Endowment Inc.

We thank J. Howard Pratt for providing the genetic data from his renal sodium reabsorption study and Howard J. Edenberg and Xiaoling Xuei for coordinating the SNP genotyping. This work is supported in part by the Indiana Genomics Initiative (INGEN®), which in turn is supported in part by the Lilly Endowment, Inc.

APPENDIX A

Under the assumption of no covariates for simplification, we multiply both sides by $X = (X_{AB}^{\tau}, X_{Ab}^{\tau}, X_{aB}^{\tau}, X_{ab}^{\tau})$, where $X_{AB} = (X_{1,AB}, X_{2,AB}, \ldots, X_{N,AB})^{\tau}$, and take the expectation in order to derive the regression coefficients in Equation (1):

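$$
E\begin{pmatrix} X_{AB}\,y \\ X_{Ab}\,y \\ X_{aB}\,y \\ X_{ab}\,y \end{pmatrix}
= E\begin{pmatrix}
X_{AB}^{2} & X_{AB}X_{Ab} & X_{AB}X_{aB} & X_{AB}X_{ab} \\
X_{Ab}X_{AB} & X_{Ab}^{2} & X_{Ab}X_{aB} & X_{Ab}X_{ab} \\
X_{aB}X_{AB} & X_{aB}X_{Ab} & X_{aB}^{2} & X_{aB}X_{ab} \\
X_{ab}X_{AB} & X_{ab}X_{Ab} & X_{ab}X_{aB} & X_{ab}^{2}
\end{pmatrix}
\times
\begin{pmatrix} \beta_{AB} \\ \beta_{Ab} \\ \beta_{aB} \\ \beta_{ab} \end{pmatrix}.
\tag{A1}
$$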

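The expectation of each element of the matrix $E(X^{\tau}X)$ is calculated as follows:

$$
\begin{aligned}
E(X_{AB}) &= P_A P_B, \quad E(X_{Ab}) = P_A P_b, \quad E(X_{aB}) = P_a P_B, \quad E(X_{ab}) = P_a P_b, \\
E(X_{AB}X_{Ab}) &= 0.5\,P_A P_B P_b\,(1 - 0.5 P_a), \\
E(X_{AB}X_{aB}) &= 0.5\,P_A P_B P_a\,(1 - 0.5 P_b), \\
&\;\;\vdots \\
E(X_{aB}X_{ab}) &= 0.5\,P_a P_B P_b\,(1 - 0.5 P_A).
\end{aligned}
$$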

Therefore,

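$$
E(X^{\tau}X) =
\begin{pmatrix}
P_A P_B (1 - 0.5P_a)(1 - 0.5P_b) & 0.5 P_A P_B P_b (1 - 0.5P_a) & 0.5 P_A P_B P_a (1 - 0.5P_b) & 0.25 P_A P_B P_b P_a \\
0.5 P_A P_B P_b (1 - 0.5P_a) & P_A P_b (1 - 0.5P_a)(1 - 0.5P_B) & 0.25 P_A P_B P_b P_a & 0.5 P_A P_a P_b (1 - 0.5P_B) \\
0.5 P_A P_B P_a (1 - 0.5P_b) & 0.25 P_A P_B P_b P_a & P_a P_B (1 - 0.5P_A)(1 - 0.5P_b) & 0.5 P_a P_B P_b (1 - 0.5P_A) \\
0.25 P_A P_B P_b P_a & 0.5 P_A P_a P_b (1 - 0.5P_B) & 0.5 P_a P_B P_b (1 - 0.5P_A) & P_a P_b (1 - 0.5P_A)(1 - 0.5P_B)
\end{pmatrix}.
$$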
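The entries of this matrix can be checked numerically. The following Python sketch enumerates the nine joint genotypes of two unlinked biallelic markers under Hardy-Weinberg equilibrium and compares the resulting second moments with the closed forms in the first row. The scoring rule (the product of the two allele proportions, which gives the weights 1, 1/2, 1/2 and 1/4 that appear in the worked expectation of $X_{AB}\,y$ in this appendix), the function names and the allele frequencies are assumptions made for illustration.

```python
from itertools import product

LABELS = ("AB", "Ab", "aB", "ab")


def allelic_scores(n_A: int, n_B: int) -> dict:
    """Allelic combination scores for a subject with n_A copies of allele A at
    marker 1 and n_B copies of allele B at marker 2 (each 0, 1 or 2)."""
    u, v = n_A / 2, n_B / 2
    return {"AB": u * v, "Ab": u * (1 - v),
            "aB": (1 - u) * v, "ab": (1 - u) * (1 - v)}


def hwe_genotype_probs(p: float) -> dict:
    """Probability of carrying 2, 1 or 0 copies of an allele with frequency p
    under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return {2: p * p, 1: 2 * p * q, 0: q * q}


def second_moments(p_A: float, p_B: float) -> dict:
    """E(X_c X_d) for every pair of allelic combinations, by enumeration."""
    moments = {(c, d): 0.0 for c in LABELS for d in LABELS}
    for (n_A, pr1), (n_B, pr2) in product(hwe_genotype_probs(p_A).items(),
                                          hwe_genotype_probs(p_B).items()):
        s = allelic_scores(n_A, n_B)
        for c in LABELS:
            for d in LABELS:
                moments[(c, d)] += pr1 * pr2 * s[c] * s[d]
    return moments


def first_row_closed_form(p_A: float, p_B: float) -> dict:
    """First row of the matrix of second moments, as written in the appendix."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return {
        ("AB", "AB"): p_A * p_B * (1 - 0.5 * p_a) * (1 - 0.5 * p_b),
        ("AB", "Ab"): 0.5 * p_A * p_B * p_b * (1 - 0.5 * p_a),
        ("AB", "aB"): 0.5 * p_A * p_B * p_a * (1 - 0.5 * p_b),
        ("AB", "ab"): 0.25 * p_A * p_B * p_b * p_a,
    }


if __name__ == "__main__":
    p_A, p_B = 0.3, 0.6  # arbitrary illustrative allele frequencies
    enumerated = second_moments(p_A, p_B)
    for key, value in first_row_closed_form(p_A, p_B).items():
        print(key, round(enumerated[key], 6), round(value, 6))
```

The remaining rows can be checked in the same way by swapping the roles of the alleles.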

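The expectation is

$$
E(X_C\,y) = \sum_{i,j,k,l}\sum_{C} \mu_{ijkl}\,P(C \mid M_1, M_2) \times P(Q_1, Q_2, M_1, M_2),
$$

where $P(C \mid M_1, M_2)$ is the allelic combination score given the marker genotypes $M_1$, $M_2$ and

$$
y = \begin{cases}
\mu_{ijkl} & i, j = (Q_1, q_1),\; k, l = (Q_2, q_2), \\
0 & \text{otherwise.}
\end{cases}
$$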

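For example, the expectation

$$
\begin{aligned}
E(X_{AB}\,y) ={}& 1 \times \{\mu_{1111} \times P(Q_1 = Q_1Q_1, Q_2 = Q_2Q_2, M_1 = AA, M_2 = BB) + \cdots + \mu_{2222} \times P(q_1q_1, q_2q_2, AA, BB)\} \\
&+ \tfrac{1}{2} \times \{\mu_{1111} \times P(Q_1Q_1, Q_2Q_2, AA, Bb) + \cdots + \mu_{2222} \times P(q_1q_1, q_2q_2, AA, Bb)\} \\
&+ \tfrac{1}{2} \times \{\mu_{1111} \times P(Q_1Q_1, Q_2Q_2, Aa, BB) + \cdots + \mu_{2222} \times P(q_1q_1, q_2q_2, Aa, BB)\} \\
&+ \tfrac{1}{4} \times \{\mu_{1111} \times P(Q_1Q_1, Q_2Q_2, Aa, Bb) + \cdots + \mu_{2222} \times P(q_1q_1, q_2q_2, Aa, Bb)\} \\
={}& \mu_{1111}\{P(AQ_1)^2 P(BQ_2)^2 + \tfrac{1}{2} \times 2P(AQ_1)^2 P(BQ_2)P(bQ_2) + \tfrac{1}{2} \times 2P(AQ_1)P(aQ_1)P(BQ_2)^2 + \tfrac{1}{4} \times 4P(AQ_1)P(aQ_1)P(BQ_2)P(bQ_2)\} \\
&+ \cdots \\
&+ \mu_{2222}\{P(Aq_1)^2 P(Bq_2)^2 + \tfrac{1}{2} \times 2P(Aq_1)^2 P(Bq_2)P(bq_2) + \tfrac{1}{2} \times 2P(Aq_1)P(aq_1)P(Bq_2)^2 + \tfrac{1}{4} \times 4P(Aq_1)P(aq_1)P(Bq_2)P(bq_2)\},
\end{aligned}
$$

$$
\begin{aligned}
E(X_{AB}\,y) ={}& P(AQ_1)P(BQ_2)\{\mu_{1111}P(Q_1)P(Q_2) + \mu_{1112}P(Q_1)P(q_2) + \mu_{1211}P(q_1)P(Q_2) + \mu_{1212}P(q_1)P(q_2)\} \\
&+ P(AQ_1)P(Bq_2)\{\mu_{1112}P(Q_1)P(Q_2) + \mu_{1122}P(Q_1)P(q_2) + \mu_{1212}P(q_1)P(Q_2) + \mu_{1222}P(q_1)P(q_2)\} \\
&+ P(Aq_1)P(BQ_2)\{\mu_{1211}P(Q_1)P(Q_2) + \mu_{1212}P(Q_1)P(q_2) + \mu_{2211}P(q_1)P(Q_2) + \mu_{2212}P(q_1)P(q_2)\} \\
&+ P(Aq_1)P(Bq_2)\{\mu_{1212}P(Q_1)P(Q_2) + \mu_{1222}P(Q_1)P(q_2) + \mu_{2212}P(q_1)P(Q_2) + \mu_{2222}P(q_1)P(q_2)\}.
\end{aligned}
$$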

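Utilizing $P(AQ_1) = D_{AQ_1} + P_A P_{Q_1}$; $P(Aq_1) = D_{Aq_1} + P_A P_{q_1}$; $P(BQ_2) = D_{BQ_2} + P_B P_{Q_2}$; $P(Bq_2) = D_{Bq_2} + P_B P_{q_2}$, and $\mu_{Q_1Q_2} = \mu_{1111}P(Q_1)P(Q_2) + \mu_{1112}P(Q_1)P(q_2) + \mu_{1211}P(q_1)P(Q_2) + \mu_{1212}P(q_1)P(q_2)$, with $\mu_{Q_1q_2}$, $\mu_{q_1Q_2}$ and $\mu_{q_1q_2}$ derived similarly, gives

$$
\begin{aligned}
E(X_{AB}\,y) &= (D_{AQ_1} + P_A P_{Q_1}) \times (D_{BQ_2} + P_B P_{Q_2})\,\mu_{Q_1Q_2} + \cdots + (D_{Aq_1} + P_A P_{q_1}) \times (D_{Bq_2} + P_B P_{q_2})\,\mu_{q_1q_2} \\
&= \mu P_A P_B + (\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})\,D_{AQ_1}D_{BQ_2} + (\mu_{Q_1} - \mu_{q_1})\,D_{AQ_1}P_B + (\mu_{Q_2} - \mu_{q_2})\,D_{BQ_2}P_A.
\end{aligned}
$$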
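The decomposition of $E(X_{AB}\,y)$ into a mean term, an interaction term and two substitution-effect terms can be checked by brute force. The Python sketch below enumerates both haplotypes at each of two independent marker-QTL pairs and compares the result with the closed form. It assumes random mating, independence between the two marker-QTL pairs, and the sign convention $D_{Aq_1} = -D_{AQ_1}$ and $D_{Bq_2} = -D_{BQ_2}$; all function names, allele frequencies, LD coefficients and genotypic means are hypothetical values chosen for illustration.

```python
from itertools import product


def haplotype_freqs(p_marker, p_qtl, d):
    """Marker-QTL haplotype frequencies. Keys are (marker allele, QTL allele)
    indicators: 1 for the first allele (A or B; Q1 or Q2), 0 for the second.
    d is the LD coefficient between the two first alleles."""
    return {
        (1, 1): p_marker * p_qtl + d,
        (1, 0): p_marker * (1 - p_qtl) - d,
        (0, 1): (1 - p_marker) * p_qtl - d,
        (0, 0): (1 - p_marker) * (1 - p_qtl) + d,
    }


def brute_force_E_XAB_y(hap1, hap2, mu):
    """E(X_AB y) by enumerating the two haplotypes carried at each pair."""
    total = 0.0
    for (h1, f1), (h2, f2), (k1, g1), (k2, g2) in product(
            hap1.items(), hap1.items(), hap2.items(), hap2.items()):
        score = ((h1[0] + h2[0]) / 2) * ((k1[0] + k2[0]) / 2)  # X_AB
        n_Q1, n_Q2 = h1[1] + h2[1], k1[1] + k2[1]              # QTL genotypes
        total += f1 * f2 * g1 * g2 * score * mu[(n_Q1, n_Q2)]
    return total


def closed_form_E_XAB_y(p_A, p_Q1, d1, p_B, p_Q2, d2, mu):
    """mu*P_A*P_B + interaction*D1*D2 + alpha_Q1*D1*P_B + alpha_Q2*D2*P_A."""
    pq1 = {1: p_Q1, 0: 1 - p_Q1}
    pq2 = {1: p_Q2, 0: 1 - p_Q2}
    # Mean trait value given that one haplotype carries QTL alleles (x, z).
    mu_hap = {(x, z): sum(pq1[x2] * pq2[z2] * mu[(x + x2, z + z2)]
                          for x2 in (0, 1) for z2 in (0, 1))
              for x in (0, 1) for z in (0, 1)}
    mu_Q1 = {x: sum(pq2[z] * mu_hap[(x, z)] for z in (0, 1)) for x in (0, 1)}
    mu_Q2 = {z: sum(pq1[x] * mu_hap[(x, z)] for x in (0, 1)) for z in (0, 1)}
    overall = sum(pq1[x] * mu_Q1[x] for x in (0, 1))
    interaction = (mu_hap[(1, 1)] - mu_hap[(1, 0)]
                   - mu_hap[(0, 1)] + mu_hap[(0, 0)])
    return (overall * p_A * p_B + interaction * d1 * d2
            + (mu_Q1[1] - mu_Q1[0]) * d1 * p_B
            + (mu_Q2[1] - mu_Q2[0]) * d2 * p_A)


if __name__ == "__main__":
    # Hypothetical genotypic means indexed by (copies of Q1, copies of Q2).
    mu = {(i, j): 100 + 2 * i + j + 3 * i * j for i in range(3) for j in range(3)}
    p_A, p_Q1, d1 = 0.3, 0.4, 0.05
    p_B, p_Q2, d2 = 0.6, 0.2, 0.03
    brute = brute_force_E_XAB_y(haplotype_freqs(p_A, p_Q1, d1),
                                haplotype_freqs(p_B, p_Q2, d2), mu)
    closed = closed_form_E_XAB_y(p_A, p_Q1, d1, p_B, p_Q2, d2, mu)
    print(brute, closed)
```

Setting either LD coefficient to zero in this sketch removes the interaction term, which reflects the fact that the marker-level interaction signal depends on LD between each marker and its trait locus.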

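Applying the same derivation to $E(X_{Ab}\,y)$, $E(X_{aB}\,y)$ and $E(X_{ab}\,y)$, and using

$$
D_{AQ_1} = -D_{aQ_1} = -D_{Aq_1}, \quad D_{AQ_1} = D_{aq_1}, \quad D_{BQ_2} = -D_{bQ_2} = -D_{Bq_2}, \quad D_{BQ_2} = D_{bq_2},
$$

we obtain

$$
E(Xy) =
\begin{pmatrix} P_A P_B \\ P_A P_b \\ P_a P_B \\ P_a P_b \end{pmatrix}\mu
+ (\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2}) \times
\begin{pmatrix} D_{AQ_1}D_{BQ_2} \\ D_{AQ_1}D_{bQ_2} \\ D_{aQ_1}D_{BQ_2} \\ D_{aQ_1}D_{bQ_2} \end{pmatrix}
+ (\mu_{Q_1} - \mu_{q_1})
\begin{pmatrix} D_{AQ_1}P_B \\ D_{AQ_1}P_b \\ D_{aQ_1}P_B \\ D_{aQ_1}P_b \end{pmatrix}
+ (\mu_{Q_2} - \mu_{q_2})
\begin{pmatrix} D_{BQ_2}P_A \\ D_{bQ_2}P_A \\ D_{BQ_2}P_a \\ D_{bQ_2}P_a \end{pmatrix}.
$$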

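Equation (A1) can be expressed as follows:

$$
\begin{aligned}
\beta &= E(X^{\tau}X)^{-1}\left[
\begin{pmatrix} P_A P_B \\ P_A P_b \\ P_a P_B \\ P_a P_b \end{pmatrix}\mu
+ (\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})
\begin{pmatrix} D_{AQ_1}D_{BQ_2} \\ D_{AQ_1}D_{bQ_2} \\ D_{aQ_1}D_{BQ_2} \\ D_{aQ_1}D_{bQ_2} \end{pmatrix}
+ (\mu_{Q_1} - \mu_{q_1})
\begin{pmatrix} D_{AQ_1}P_B \\ D_{AQ_1}P_b \\ D_{aQ_1}P_B \\ D_{aQ_1}P_b \end{pmatrix}
+ (\mu_{Q_2} - \mu_{q_2})
\begin{pmatrix} D_{BQ_2}P_A \\ D_{bQ_2}P_A \\ D_{BQ_2}P_a \\ D_{bQ_2}P_a \end{pmatrix}
\right] \\
&= K^{-1}\left[
\mu\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
+ (\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})
\begin{pmatrix} D_{AQ_1}D_{BQ_2}/(P_A P_B) \\ D_{AQ_1}D_{bQ_2}/(P_A P_b) \\ D_{aQ_1}D_{BQ_2}/(P_a P_B) \\ D_{aQ_1}D_{bQ_2}/(P_a P_b) \end{pmatrix}
+ (\mu_{Q_1} - \mu_{q_1})
\begin{pmatrix} D_{AQ_1}/P_A \\ D_{AQ_1}/P_A \\ D_{aQ_1}/P_a \\ D_{aQ_1}/P_a \end{pmatrix}
+ (\mu_{Q_2} - \mu_{q_2})
\begin{pmatrix} D_{BQ_2}/P_B \\ D_{bQ_2}/P_b \\ D_{BQ_2}/P_B \\ D_{bQ_2}/P_b \end{pmatrix}
\right],
\end{aligned}
$$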

where

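$$
K =
\begin{pmatrix}
(1 - 0.5P_a)(1 - 0.5P_b) & 0.5P_b(1 - 0.5P_a) & 0.5P_a(1 - 0.5P_b) & 0.25P_bP_a \\
0.5P_B(1 - 0.5P_a) & (1 - 0.5P_a)(1 - 0.5P_B) & 0.25P_aP_B & 0.5P_a(1 - 0.5P_B) \\
0.5P_A(1 - 0.5P_b) & 0.25P_AP_b & (1 - 0.5P_A)(1 - 0.5P_b) & 0.5P_b(1 - 0.5P_A) \\
0.25P_AP_B & 0.5P_A(1 - 0.5P_B) & 0.5P_B(1 - 0.5P_A) & (1 - 0.5P_A)(1 - 0.5P_B)
\end{pmatrix}.
$$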

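The quantities $\alpha_{Q_1} = (\mu_{Q_1} - \mu_{q_1})$ and $\alpha_{Q_2} = (\mu_{Q_2} - \mu_{q_2})$ can be defined as the average effects of gene substitution, and $(\mu_{Q_1Q_2} - \mu_{Q_1q_2} - \mu_{q_1Q_2} + \mu_{q_1q_2})$ is the magnitude of the interaction effect.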
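As a small numerical companion to the matrix $K$ above, the following Python sketch builds $K$ for given allele frequencies and checks that each of its rows sums to one. That property is an inference from the printed entries rather than a statement made in the derivation: it implies that, whenever $K$ is invertible, $K^{-1}$ maps the vector of ones to itself, so the $\mu$ term contributes $\mu$ to every coefficient and the remaining terms carry the interaction and substitution effects. The function names and allele frequencies are illustrative assumptions.

```python
def build_K(p_A: float, p_B: float):
    """Matrix K of the appendix for allele frequencies P_A and P_B,
    with P_a = 1 - P_A and P_b = 1 - P_B."""
    p_a, p_b = 1 - p_A, 1 - p_B
    return [
        [(1 - 0.5 * p_a) * (1 - 0.5 * p_b), 0.5 * p_b * (1 - 0.5 * p_a),
         0.5 * p_a * (1 - 0.5 * p_b), 0.25 * p_b * p_a],
        [0.5 * p_B * (1 - 0.5 * p_a), (1 - 0.5 * p_a) * (1 - 0.5 * p_B),
         0.25 * p_a * p_B, 0.5 * p_a * (1 - 0.5 * p_B)],
        [0.5 * p_A * (1 - 0.5 * p_b), 0.25 * p_A * p_b,
         (1 - 0.5 * p_A) * (1 - 0.5 * p_b), 0.5 * p_b * (1 - 0.5 * p_A)],
        [0.25 * p_A * p_B, 0.5 * p_A * (1 - 0.5 * p_B),
         0.5 * p_B * (1 - 0.5 * p_A), (1 - 0.5 * p_A) * (1 - 0.5 * p_B)],
    ]


def row_sums(matrix):
    """Sum of each row of a matrix given as a list of lists."""
    return [sum(row) for row in matrix]


if __name__ == "__main__":
    # Arbitrary illustrative allele frequencies.
    for p_A, p_B in ((0.3, 0.6), (0.1, 0.5), (0.9, 0.05)):
        print(p_A, p_B, [round(s, 12) for s in row_sums(build_K(p_A, p_B))])
```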

References

  1. Cook NR, Zee RY, Ridker PM. Tree and spline based association analysis of gene × gene interaction models for ischemic stroke. Statist Med. 2004;23:1439–1453. doi: 10.1002/sim.1749.
  2. Devlin B, Roeder K, Wasserman L. Analysis of multilocus models of association. Genet Epidemiol. 2003;25:36–47. doi: 10.1002/gepi.10237.
  3. Fan RZ, Jung JS, Jin L. High-resolution association mapping of quantitative trait loci: a population-based approach. Genetics. 2006;172:663–686. doi: 10.1534/genetics.105.046417.
  4. Falconer DS, Mackay TFC. Introduction to quantitative genetics. 4th ed. London: Longman; 1996.
  5. Guyton AC, Coleman TG, Cowley AV Jr, Scheel KW, Manning RD Jr, Norman RA Jr. Arterial pressure regulation. Overriding dominance of the kidneys in long-term regulation and hypertension. Am J Med. 1972;52:584–592. doi: 10.1016/0002-9343(72)90050-2.
  6. Hebert SC. Roles of Na–K–2Cl and Na–Cl cotransporters and ROMK potassium channels in urinary concentrating mechanism. Am J Physiol. 1998;275:F325–F327. doi: 10.1152/ajprenal.1998.275.3.F325.
  7. Meneton P, Jeunemaitre X, Wardener HE, Macgregor GA. Links between dietary salt intake, renal salt handling, blood pressure, and cardiovascular diseases. Physiologic Rev. 2003;85:679–715. doi: 10.1152/physrev.00056.2003.
  8. Nelson M, Kardia SLR, Ferrell RE, Sing CF. A combinational partitioning method to identify multilocus genotype partitions that predict quantitative trait variation. Genome Res. 2001;11:458–470. doi: 10.1101/gr.172901.
  9. Nothnagel M. Simulation of LD block-structured SNP haplotype data and its use for the analysis of case-control data by supervised learning methods. J Hum Genet. 2002;71:A2363.
  10. Rinaldo A, Bacanu SA, Devlin B, Sonpar V, Wasserman L, Roeder K. Characterization of multilocus linkage disequilibrium. Genet Epidemiol. 2005;28:193–206. doi: 10.1002/gepi.20056.
  11. Ritchie MD, Hahn LW, Roodi N, Bailey R, Dupont WD, Parl FF, Moore J. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001;69:138–147. doi: 10.1086/321276.
  12. Wang T, Zheng ZB. Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium. BMC Genet. 2006;7:9. doi: 10.1186/1471-2156-7-9.
