PLoS One. 2010 Aug 18;5(8):e11384. doi: 10.1371/journal.pone.0011384

A General Model for Multilocus Epistatic Interactions in Case-Control Studies

Zhong Wang 1,2, Tian Liu 3, Zhenwu Lin 4, John Hegarty 4, Walter A Koltun 4, Rongling Wu 1,2,*
Editor: Zvia Agur 5
PMCID: PMC2909900  PMID: 20814428

Abstract

Background

Epistasis, i.e., the interaction of alleles at different loci, is thought to play a central role in the formation and progression of complex diseases. The complexity of disease expression should arise from a complex network of epistatic interactions involving multiple genes.

Methodology

We develop a general model for testing high-order epistatic interactions for a complex disease in a case-control study. We incorporate the quantitative genetic theory of high-order epistasis into the setting of cases and controls sampled from a natural population. The new model allows the identification and testing of epistasis and its various genetic components.

Conclusions

Simulation studies were used to examine the power and false positive rates of the model under different sampling strategies. The model was used to detect epistasis in a case-control study of inflammatory bowel disease, in which five SNPs at a candidate gene were typed, leading to the identification of a significant three-locus epistasis.

Introduction

The complexity of biological systems arises from the highly interactive relationships of their components [1], [2]. Thus, it is likely that the metabolic pathways for a phenotypic trait or disease involve multiple interacting gene products and regulatory loci that could generate a complex network of genetic actions and interactions [3], [4]. Current genome-wide linkage or association studies have been able to detect genetic actions of individual genes involved in the phenotypic diversity of a complex trait [5]–[12]. Given its ubiquity in controlling complex traits and diseases, epistasis resulting from interactions between alleles at different genes has received increasing attention in genetic studies [13], [14]. However, many of these studies focus on the identification of low-order pairwise epistasis, leaving high-order epistatic interactions, and their frequency and impact on genetic variation, unexplored.

More recently, Stich et al. [15] developed a linkage mapping approach to uncover three-way interactions among different quantitative trait loci (QTLs) using a mating design. Beerenwinkel et al. [16] proposed a mathematical approach for describing multi-way genetic interactions and employed it to study the genetic structure of fitness landscapes for Escherichia coli. Based on the analysis of pathway fragments, Imielinski and Belta [17] used a genome-scale knockout design to detect high-order epistatic relationships between components of large metabolic networks. Hansen and Wagner [18] showed that higher-order genetic interactions are potentially important if the total genomic mutation rate is large and the interaction density among loci is not too low. With the widespread availability of high-throughput genotyping technology, there is a pressing need to estimate higher-order epistasis involving any number of genes and to assess the role of epistasis in the creation and maintenance of genetic variation for complex traits.

The motivation of this study is to develop a general model for estimating epistasis of any order with multilocus single nucleotide polymorphism (SNP) data in case-control studies. In particular, the model allows the estimation and testing of high-order epistasis. Because samples are easy to collect, a population-based case-control design has been widely used in candidate gene or genome-wide association studies [19]–[21]. By comparing genotype frequencies for a gene in unrelated individuals with the disease and healthy controls, this design has power to test the significance of the association between the gene and disease. However, only a few studies have used a case-control design to characterize epistasis [19]; moreover, the epistasis they defined on the basis of logistic regression models is computationally complex. The new model described in this article has, for the first time, embedded quantitative genetic principles into a chi-square test framework, allowing the dissection of overall multilocus genetic effects into various components including epistatic interactions of high orders. The model was validated through simulation studies and a real data analysis.

Quantitative Genetic Models for Epistasis

Epistasis was originally defined as the expression of an allele at one locus masked by an allele at another locus [22]. This concept was then explained in a statistical manner by Fisher [23] as the deviation of genetic action from additivity in a linear model. Fisher's definition allows epistasis to be quantified in different forms based on its biological meaning determined by Bateson [22]. For a two-locus epistasis, all possible forms of epistasis include the interactions between additive effects at the two loci, additive effect at the first locus and dominant effect at the second locus, dominant effect at the first locus and additive effect at the second locus, and dominant effects at the two loci. Each of these epistatic forms contributes differently to the overall genetic value of a two-locus genotype. We used Mather and Jinks' formulation [24] to partition a genotypic value into its different components including epistasis.

Two-locus Epistasis

Suppose there are two loci, A with two alleles A and a, and B with two alleles B and b, which form nine two-locus genotypes. Let Inline graphic denote the genetic value of an arbitrary genotype Inline graphic (Inline graphic for genotypes AA, Aa, and aa; Inline graphic for genotypes BB, Bb, and bb, respectively). We dissect Inline graphic into different components as shown in Table 1.

Table 1. The genetic effect components of two-locus genotypes.

Component (+ enters with a positive sign, − with a negative sign, 0 does not enter)
Genotype  Mean  Add. A  Add. B  Dom. A  Dom. B  add×add  add×dom  dom×add  dom×dom
AABB      +     +       +       0       0       +        0        0        0
AABb      +     +       0       0       +       0        +        0        0
AAbb      +     +       −       0       0       −        0        0        0
AaBB      +     0       +       +       0       0        0        +        0
AaBb      +     0       0       +       +       0        0        0        +
Aabb      +     0       −       +       0       0        0        −        0
aaBB      +     −       +       0       0       −        0        0        0
aaBb      +     −       0       0       +       0        −        0        0
aabb      +     −       −       0       0       +        0        0        0

where Inline graphic is the overall mean, Inline graphic and Inline graphic are the additive effects at genes A and B, Inline graphic and Inline graphic are the dominant effects at genes A and B, respectively, and Inline graphic, Inline graphic, Inline graphic, and Inline graphic are the additive × additive, additive × dominant, dominant × additive, and dominant × dominant epistatic interactions between the two genes, respectively.

The dissection of genotypic values is expressed, in matrix form, as

graphic file with name pone.0011384.e066.jpg (1)
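As a rough illustration of the matrix form, the sign pattern of Table 1 can be generated from a per-locus coding. The sketch below assumes a Mather and Jinks style coding in which a genotype with 2, 1, or 0 capital alleles carries (mean, additive, dominant) coefficients (1, 1, 0), (1, 0, 1), and (1, −1, 0); the function names and the two-letter effect labels ("m" for mean, "a" for additive, "d" for dominant, one letter per locus) are hypothetical and are not taken from the original equations.

```python
from itertools import product

# Assumed per-locus coding, indexed by the number of capital alleles:
# coefficients of the (mean, additive, dominant) components.
LOCUS_CODE = {2: (1, 1, 0), 1: (1, 0, 1), 0: (1, -1, 0)}
COMPONENTS = ("m", "a", "d")  # hypothetical labels: mean, additive, dominant


def design_row(genotype):
    """Return the coefficient of every effect for one multilocus genotype.

    `genotype` is a tuple such as (2, 0) for AAbb. Each effect label has one
    letter per locus, e.g. "am" is the additive effect at the first locus and
    "ad" is the additive x dominant interaction.
    """
    row = {}
    for effect in product(range(3), repeat=len(genotype)):
        coef = 1
        for g, e in zip(genotype, effect):
            coef *= LOCUS_CODE[g][e]
        row["".join(COMPONENTS[e] for e in effect)] = coef
    return row


def genotypic_value(genotype, effects):
    """Sum the effects that enter a genotype, with their signs."""
    row = design_row(genotype)
    return sum(coef * effects.get(name, 0) for name, coef in row.items())


# Genotype aabb: mean and additive x additive enter with +, both additive
# main effects with -, as in the last row of Table 1.
row_aabb = {k: v for k, v in design_row((0, 0)).items() if v != 0}
print(row_aabb)
```

Stacking `design_row` over all nine genotypes gives one row of the design matrix per genotype; the same function works unchanged for three or more loci.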

The genetic effect parameters can be solved using

graphic file with name pone.0011384.e067.jpg (2)

Three-locus Epistasis

Adding a locus, C with two alleles C and c, to the two-gene model generates 27 three-locus genotypes, expressed as Inline graphic (Inline graphic for genotypes CC, Cc, and cc, respectively). A three-locus genotypic value (Inline graphic) is dissected into the following components:

  1. the overall mean Inline graphic;

  2. the main genetic effects including the three additive effects (Inline graphic, Inline graphic, and Inline graphic) at genes A, B, and C, and the three dominant effects (Inline graphic, Inline graphic, and Inline graphic) at genes A, B, and C;

  3. the two-way interaction effects including the additive × additive (Inline graphic), additive × dominant (Inline graphic), dominant × additive (Inline graphic), and dominant × dominant (Inline graphic) epistasis between genes A and B, the additive × additive (Inline graphic), additive × dominant (Inline graphic), dominant × additive (Inline graphic), and dominant × dominant (Inline graphic) epistasis between genes A and C, and the additive × additive (Inline graphic), additive × dominant (Inline graphic), dominant × additive (Inline graphic), and dominant × dominant (Inline graphic) epistasis between genes B and C;

  4. the three-way interaction effects including the additive × additive × additive (Inline graphic), additive × additive × dominant (Inline graphic), additive × dominant × additive (Inline graphic), dominant × additive × additive (Inline graphic), additive × dominant × dominant (Inline graphic), dominant × additive × dominant (Inline graphic), dominant × dominant × additive (Inline graphic), and dominant × dominant × dominant (Inline graphic) epistasis among genes A, B, and C.

Mather and Jinks' theory is used to formulate the relationships between genotypic values and genetic effects, expressed as

graphic file with name pone.0011384.e131.jpg (3)

The genetic effect parameters are then solved from the genotypic values:

graphic file with name pone.0011384.e132.jpg (4)

N-locus Epistasis

We propose a general model for describing genetic components for a genotype composed of any number of loci. Consider Inline graphic loci which form Inline graphic genotypes. The value of a Inline graphic-locus genotype is composed of the overall mean, the additive and dominant effects for each locus, and epistasis of different kinds and orders among these loci. Let the space of the genetic effects at individual loci be defined as Inline graphic for gene 1, Inline graphic for gene 2, …, Inline graphic for gene Inline graphic. Thus, we can define all possible genetic effects (Inline graphic) as

  • If Inline graphic, then Inline graphic;

  • If Inline graphic, then Inline graphic;

  • …;

  • If Inline graphic, then Inline graphic;

  • …;

  • If Inline graphic, Inline graphic, …,Inline graphic, then Inline graphic.

  • …;

  • If Inline graphic, Inline graphic, …,Inline graphic, then Inline graphic.

By letting Inline graphic, Inline graphic and Inline graphic (Inline graphic), we express the value of a general multi-locus genotype as

graphic file with name pone.0011384.e159.jpg (5)

where

graphic file with name pone.0011384.e160.jpg (6)

with

graphic file with name pone.0011384.e161.jpg (7)

Inline graphic is an indicator function that returns 1 if the condition is true and 0 otherwise.

The genetic effect parameters can be estimated by solving the linear equations using

graphic file with name pone.0011384.e163.jpg (8)

where

graphic file with name pone.0011384.e164.jpg (9)

with

graphic file with name pone.0011384.e165.jpg (10)

Equation (8) gives a general form for main and interaction genetic effects among an arbitrary number of loci. Mathematical algorithms for solving epistatic equations are given in Text S1.
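The general solution can be sketched numerically. The example below is an illustration under assumptions, not the algorithm of Text S1: it assumes the per-locus coding AA = (1, 1, 0), Aa = (1, 0, 1), aa = (1, −1, 0) for the (mean, additive, dominant) components, whose inverse gives three per-locus contrasts, and it obtains every multilocus effect as a product of those contrasts. The name `solve_effects` and the effect labels are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Per-locus contrasts over genotypes with 2, 1, 0 capital alleles. They are
# the inverse of the assumed coding AA=(1,1,0), Aa=(1,0,1), aa=(1,-1,0).
CONTRAST = {
    "m": {2: HALF, 1: 0, 0: HALF},    # mean of the two homozygotes
    "a": {2: HALF, 1: 0, 0: -HALF},   # half the homozygote difference
    "d": {2: -HALF, 1: 1, 0: -HALF},  # heterozygote minus homozygote mean
}


def solve_effects(values, n_loci):
    """Recover all 3**n_loci effects from a full table of genotypic values.

    `values` maps genotype tuples, e.g. (2, 1, 0) for AABbcc, to numbers.
    The result maps labels such as "daa" (dominant x additive x additive)
    to the corresponding effect; "mmm" is the overall mean.
    """
    effects = {}
    for effect in product("mad", repeat=n_loci):
        total = Fraction(0)
        for genotype, value in values.items():
            weight = Fraction(1)
            for e, g in zip(effect, genotype):
                weight *= CONTRAST[e][g]
            total += weight * value
        effects["".join(effect)] = total
    return effects


# Hypothetical three-locus values: overall mean 10 plus an
# additive x additive x additive effect of 2 and nothing else.
values = {
    g: 10 + 2 * (g[0] - 1) * (g[1] - 1) * (g[2] - 1)
    for g in product((2, 1, 0), repeat=3)
}
effects = solve_effects(values, 3)
print(effects["mmm"], effects["aaa"])
```

Genotypes with a positive weight for a given effect form its plus group and those with a negative weight form its minus group, which is the split used by the tests in the next section.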

Testing Epistasis

Based on the definitions, we now provide a procedure for testing epistasis of different kinds and orders with multilocus genetic data. Consider a case-control study in which Inline graphic cases (with the disease) and Inline graphic controls (without the disease) are selected randomly from a natural population. Case and control groups are matched for demographic factors such as age, race, gender, lifestyle, and body mass. All subjects from the case and control groups are genotyped genome-wide or for particular chromosomal regions of interest, depending on the purpose of the study. Let Inline graphic and Inline graphic denote the observations of a general genotype Inline graphic (Inline graphic) derived from three markers A, B, and C. Based on Mather and Jinks' partition of genotypic values [24], we calculate genetic effect parameters from genotypic values using equation (4). For both cases and controls, the genotypic values used to calculate each effect parameter are divided into two groups, plus and minus, which form a 2 (cases and controls) × 2 (plus and minus) contingency table. For example, the contingency table for testing the additive × additive × additive epistatic effect is shown in Table 2.

Table 2. The χ² test statistics for the additive × additive × additive epistatic effect.

Plus Minus
Cases Inline graphic Inline graphic
Controls Inline graphic Inline graphic

From the table, the χ² test statistic is calculated and compared with the critical threshold with one degree of freedom. We proved that, under the null hypothesis, the test statistic calculated from the above contingency table follows a χ² distribution with less than one degree of freedom [25].
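A minimal sketch of this test for the additive × additive × additive effect follows. It assumes that the plus and minus cells are plain sums of genotype counts over the triple homozygotes, split by the sign of the contrast (AA, BB, CC count as +1 and aa, bb, cc as −1), and that the statistic is the usual Pearson χ² without continuity correction; effects involving dominance weight genotypes unequally and are not covered. The genotype counts are hypothetical, and 3.84 is the threshold quoted in the text for this effect.

```python
def aaa_groups(counts):
    """Split genotype counts into plus and minus groups for the
    additive x additive x additive contrast.

    `counts` maps genotype tuples (number of capital alleles per locus)
    to the number of subjects. Any heterozygous locus gives sign 0, so
    that genotype does not enter the table.
    """
    plus = minus = 0
    for genotype, n in counts.items():
        sign = 1
        for g in genotype:
            sign *= g - 1  # 2 -> +1, 1 -> 0, 0 -> -1
        if sign > 0:
            plus += n
        elif sign < 0:
            minus += n
    return plus, minus


def chi_square_2x2(case_plus, case_minus, ctrl_plus, ctrl_minus):
    """Pearson chi-square for a 2x2 table, without continuity correction."""
    n = case_plus + case_minus + ctrl_plus + ctrl_minus
    denom = (
        (case_plus + case_minus)
        * (ctrl_plus + ctrl_minus)
        * (case_plus + ctrl_plus)
        * (case_minus + ctrl_minus)
    )
    if denom == 0:
        return 0.0  # an empty row or column carries no information
    return n * (case_plus * ctrl_minus - case_minus * ctrl_plus) ** 2 / denom


# Hypothetical counts, for illustration only.
cases = {(2, 2, 2): 30, (2, 2, 0): 10, (0, 0, 0): 12, (1, 1, 1): 40}
controls = {(2, 2, 2): 15, (2, 2, 0): 20, (0, 0, 0): 18, (1, 2, 2): 50}

stat = chi_square_2x2(*aaa_groups(cases), *aaa_groups(controls))
print(round(stat, 3), stat > 3.84)
```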

The contingency tables for testing the other parameters can be constructed similarly. For a particular group Inline graphic (Inline graphic = 1 for cases, 2 for controls), the genotypic values used to calculate the three-way epistatic effect parameters are tabulated in Table 3.

Table 3. The χ² test statistics for the three-way epistatic effect parameters.

Parameter Plus Minus
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic

The derived thresholds for testing each of these three-locus epistases are χ² = 3.84, 3.20, 3.20, 3.20, 2.60, 2.60, 2.60, and 2.14, respectively. The genotypic values used to calculate the two-way epistatic effect parameters are tabulated in Table 4.

Table 4. The χ² test statistics for the two-way epistatic effect parameters.

Parameter Plus Minus
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic

The derived thresholds for testing each of these two-locus epistases are χ² = 3.84, 3.84, 3.84, 2.50, 2.50, 2.50, 3.20, 3.20, 3.20, 3.20, 3.20 and 3.20, respectively. The genotypic values used to calculate the main genetic effect parameters are tabulated in Table 5.

Table 5. The χ² test statistics for the main genetic effect parameters.

Parameter Plus Minus
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic

The derived thresholds for testing each of these main effects are χ² = 3.84, 3.84, 3.84, 2.60, 2.60 and 2.60, respectively. For an arbitrary number of markers, the genotypic values used to calculate the main and epistatic (of different orders) genetic effect parameters can be similarly divided into plus and minus groups, from which the χ² test statistics are calculated.

Results

The model was used to analyze a case-control study aimed at detecting genetic variants for inflammatory bowel disease (IBD) with candidate gene approaches [26]. As a member of the membrane-associated guanylate kinase family, Discs large homolog 5 (DLG5) plays a central role in maintaining cell junctions and cell shape and in clustering channel proteins at the cell surface [27]. Five single nucleotide polymorphisms (SNPs), Arg30Gln, Glu514Gln, Pro979Leu, Gly1066Gly, and Pro1371Gln, were genotyped at DLG5 for both cases and controls and tested for association with IBD. The cases include 115 sporadic IBD patients, aged 22 to 66 years, from the Milton S Hershey Medical Center, whereas the controls are 172 unrelated healthy individuals, aged 15 to 81 years, from the Milton S Hershey Medical Center and the Philadelphia Gift of Life Donor Program. The use of all human tissues for pathological studies and genetic analysis was approved by the Human Subjects Protection Offices of The Pennsylvania State University College of Medicine, and was undertaken with the understanding and written consent of each subject.

Because of the modest sample size, our analysis focuses on up to three SNPs at a time, although the model can deal with any number of SNPs. None of the five SNPs displays an additive genetic effect, but Arg30Gln, Pro979Leu, and Gly1066Gly were each found to have a significant dominant effect on the disease (Inline graphic) (table 6). There are 10 possible pairs for the five SNPs, with each pair subject to a two-locus epistatic analysis. The number and distribution of two-locus epistasis are given in table 7. Significant two-locus epistasis was observed only between Arg30Gln and other SNPs, including Pro979Leu, which has a significant main dominant effect, and two SNPs without significant main effects (Glu514Gln and Pro1371Gln). The form of significant epistasis is limited to the interactions between the dominant effect at Arg30Gln and the additive/dominant effects at the other SNPs.

Table 6. The χ² test statistics calculated to test the additive and dominant effects at each SNP genotyped from DLG5.

SNP Additive Dominant
Inline graphic Inline graphic
Arg30Gln 1.196 14.316 (0.00015)
Glu514Gln 0 0.355
Pro979Leu 0 6.095 (0.0136)
Gly1066Gly 0.718 4.297 (0.0382)
Pro1371Gln 0 1.933

The P-values for those significant effects (in boldface) are given in parentheses.

Table 7. The χ² test statistics calculated to test the two-SNP epistasis between each pair of SNPs genotyped from DLG5.

SNP Pair Inline graphic Inline graphic Inline graphic Inline graphic
Arg30Gln × Glu514Gln 0.113 0.112 3.292 (0.040) 2.909 (0.020)
Arg30Gln × Pro979Leu 0.118 1.085 3.958 (0.025) 2.405 (0.040)
Arg30Gln × Gly1066Gly 0.005 1.393 1.453 1.741
Arg30Gln × Pro1371Gln 0.107 0.097 3.184 (0.050) 2.740 (0.027)
Glu514Gln × Pro979Leu 0 1.211 0.314 0.340
Glu514Gln × Gly1066Gly 0.222 1.160 0.545 1.205
Glu514Gln × Pro1371Gln 0 0.500 0.107 0.567
Pro979Leu × Gly1066Gly 0.261 0.920 0.001 0.607
Pro979Leu × Pro1371Gln 0 0.401 1.434 0.045
Gly1066Gly × Pro1371Gln 0.290 1.584 1.543 1.907

The P-values for those significant effects (in boldface) are given in parentheses.

The five SNPs produce 10 three-locus combinations, which were analyzed by a three-locus epistasis model. Each combination has eight forms of three-SNP epistasis. Table 8 lists the test statistics for all possible combinations and forms of epistasis, with significant epistasis highlighted in boldface. The interactions among the additive effects at any three of the five SNPs were not significant; the same was also observed for the three-way dominant interactions. The significant three-locus epistasis must include both additive and dominant effects at the three SNPs. In general, Arg30Gln has more significant three-locus interactions and displays higher three-locus significance levels than the other SNPs. Arg30Gln, Glu514Gln, and Pro979Leu produce the most numerous forms of epistasis (3), followed by the combinations of Arg30Gln, Gly1066Gly, and Pro1371Gln (2), Glu514Gln, Gly1066Gly, and Pro1371Gln (2), Arg30Gln, Glu514Gln, and Pro1371Gln (1), Arg30Gln, Pro979Leu, and Pro1371Gln (1), and Pro979Leu, Gly1066Gly, and Pro1371Gln (1). The three SNPs with significant main effects (Arg30Gln, Pro979Leu, and Gly1066Gly) do not produce a significant three-locus epistatic interaction. The two SNPs displaying non-significant main effects (Glu514Gln and Pro1371Gln) could generate significant three-locus interactions with SNPs Arg30Gln and Gly1066Gly but not with Pro979Leu (table 8).
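The counts quoted here (10 pairs, 10 triplets, four forms per pair, eight forms per triplet) follow from simple enumeration. The short sketch below lists every SNP subset and every additive/dominant form; the function name `epistasis_tests` is hypothetical, and only the SNP names come from the study.

```python
from itertools import combinations, product

SNPS = ("Arg30Gln", "Glu514Gln", "Pro979Leu", "Gly1066Gly", "Pro1371Gln")
KINDS = {"a": "additive", "d": "dominant"}


def epistasis_tests(snps, order):
    """List every (SNP subset, epistasis form) pair for a given order."""
    tests = []
    for subset in combinations(snps, order):
        for form in product("ad", repeat=order):
            tests.append((subset, " x ".join(KINDS[k] for k in form)))
    return tests


two_locus = epistasis_tests(SNPS, 2)    # 10 pairs, 4 forms each
three_locus = epistasis_tests(SNPS, 3)  # 10 triplets, 8 forms each
print(len(two_locus), len(three_locus))
print(three_locus[0])
```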

Table 8. The χ² test statistics calculated to test the three-SNP epistasis among each triplet of SNPs genotyped from DLG5.

SNP Triplet Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Arg30Gln × Glu514Gln × Pro979Leu 0.465 1.274 0.008 0.202 7.437 (0.0025) 2.780 (0.045) 3.291 (0.025) 1.533
Arg30Gln × Glu514Gln × Gly1066Gly 0.010 3.038 0.054 1.499 2.469 0.340 0.815 0.134
Arg30Gln × Glu514Gln × Pro1371Gln 0.426 0.390 0.027 0.183 5.818 (0.008) 2.040 2.329 1.011
Arg30Gln × Pro979Leu × Gly1066Gly 0.002 2.460 0.576 0.772 3.140 0.250 1.151 0.057
Arg30Gln × Pro979Leu × Pro1371Gln 0.448 0.315 1.601 0.014 7.061 (0.0035) 2.411 2.438 1.127
Arg30Gln × Gly1066Gly × Pro1371Gln 0.005 1.880 4.250 (0.020) 2.618 (0.050) 3.076 1.652 0.814 0.644
Glu514Gln × Pro979Leu × Gly1066Gly 1.101 2.096 0.046 0.908 1.447 1.158 0.120 0.774
Glu514Gln × Pro979Leu × Pro1371Gln 0 0.858 2.262 0.009 0.687 0.672 0.262 0.015
Glu514Gln × Gly1066Gly × Pro1371Gln 1.191 3.457 (0.040) 3.324 (0.045) 1.994 1.180 1.832 1.716 1.817
Pro979Leu × Gly1066Gly × Pro1371Gln 1.322 3.298 (0.050) 2.727 1.756 0.100 0.436 1.082 1.255

The P-values for those significant effects (in boldface) are given in parentheses.

After significant high-order epistasis is detected, the next step is to interpret it biologically. As an example, we use the dominant (Inline graphic) × additive (Inline graphic) × additive (Inline graphic) epistasis among Arg30Gln, Glu514Gln, and Pro979Leu. Table 9 gives the structure of genetic effects for each three-locus genotypic value in terms of the additive, dominant, and epistatic effects of different orders. The dominant × additive × additive epistasis only contributes to the genotypic values of AaBBCC, AaBBcc, AabbCC, and Aabbcc (table 9). For each of these four genotypes, the values are partitioned into different effect components for both cases and controls (table 10). As can be seen, the dominant × additive × additive epistasis increases, by 9 cases, the incidence of IBD for those with genotype Inline graphic or Inline graphic, but decreases the IBD incidence of those carrying genotype Inline graphic or Inline graphic to the same extent.

Table 9. Genetic effect components of different three-locus genotypic values.

Genotype Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic + + + + + + +
Inline graphic + + + + + + +
Inline graphic + + Inline graphic + Inline graphic Inline graphic Inline graphic
Inline graphic + + + + + + +
Inline graphic + + + + + + +
Inline graphic + Inline graphic + Inline graphic + Inline graphic Inline graphic
Inline graphic + Inline graphic + Inline graphic Inline graphic + Inline graphic
Inline graphic + Inline graphic + Inline graphic + Inline graphic Inline graphic
Inline graphic + Inline graphic Inline graphic Inline graphic + Inline graphic +
AaBBCC + + + + + + +
Inline graphic + + + + + + +
AaBBcc + + +
Inline graphic + + + + + + +
Inline graphic + + + + + +
Inline graphic Inline graphic + + + Inline graphic Inline graphic Inline graphic
AabbCC + + +
Inline graphic Inline graphic + + + Inline graphic Inline graphic Inline graphic
Aabbcc + + +
Inline graphic Inline graphic + + Inline graphic + Inline graphic Inline graphic
Inline graphic Inline graphic + + Inline graphic Inline graphic + Inline graphic
Inline graphic Inline graphic + Inline graphic Inline graphic Inline graphic + +
Inline graphic Inline graphic + + Inline graphic Inline graphic + Inline graphic
Inline graphic Inline graphic + + + Inline graphic Inline graphic Inline graphic
Inline graphic Inline graphic Inline graphic + + Inline graphic Inline graphic +
Inline graphic Inline graphic Inline graphic + + Inline graphic Inline graphic +
Inline graphic Inline graphic Inline graphic + + Inline graphic Inline graphic +
Inline graphic Inline graphic Inline graphic Inline graphic + + + Inline graphic

The genotypic values containing the dominant × additive × additive epistasis are in boldface.

Table 10. The genetic effect components of four particular genotypes, AaBBCC, AaBBcc, AabbCC, and Aabbcc, at three SNPs, Arg30Gln, Glu514Gln, and Pro979Leu, which contain the dominant × additive × additive three-locus epistasis.

Genotype Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Inline graphic Cases −16 16 −11 −16 11 −11 11
Controls −10 10 −2 −10 2 −2 2
Difference (−6) (6) (−9) (−6) (9) (−9) (9)
Inline graphic Cases −16 −16 −11 16 11 −11 −11
Controls −10 −10 −2 10 2 −2 −2
Difference (−6) (−6) (−9) (6) (9) (−9) (−9)
Inline graphic Cases 16 16 −11 16 −11 −11 −11
Controls 10 10 −2 10 −2 −2 −2
Difference (6) (6) (−9) (6) (−9) (−9) (−9)
Inline graphic Cases 16 −16 −11 −16 −11 −11 11
Controls 10 −10 −2 −10 −2 −2 2
Difference (6) (−6) (−9) (−6) (−9) (−9) (9)

Computer Simulation

Simulation studies were undertaken to examine the statistical behavior of the new model. We focus on the power and false positive rates (FPR) for the detection of three-locus epistasis. Three simulation schemes were used with varying numbers of cases vs. controls: 200 vs. 200, 400 vs. 400, and 1000 vs. 1000. The eight possible forms of three-locus epistasis can be sorted into four representative groups: (1) additive × additive × additive (no dominant effect), (2) additive × additive × dominant, additive × dominant × additive, and dominant × additive × additive (one dominant effect), (3) additive × dominant × dominant, dominant × additive × dominant, and dominant × dominant × additive (two dominant effects), and (4) dominant × dominant × dominant (three dominant effects).

For a real data set, different SNPs may be associated with or independent of each other. We investigate how SNP-SNP associations affect the behavior of the new model. In one data set, three SNPs with the same allele frequency were simulated with pair-wise and three-locus linkage disequilibria. Among the three SNPs, only additive × additive × additive, additive × additive × dominant, additive × dominant × dominant, and dominant × dominant × dominant epistasis were assumed to exist. This can be done by simulating a contingency table with constraints Inline graphic Inline graphic Inline graphic, Inline graphic Inline graphic Inline graphic, Inline graphic Inline graphic Inline graphic, Inline graphic Inline graphic Inline graphic and the test statistics for the other effects Inline graphic the corresponding thresholds. The same parameters, except that there is no linkage disequilibrium, were used to simulate the second data set containing three SNPs.

Table 11, table 12, and table 13 give the power and false positive rates (FPR) of three-locus interaction detection by the new epistatic model. The power to detect three-locus epistasis increases remarkably with sample size in a case-control study. With sample sizes of 200 vs. 200, the power is about 0.45–0.61, with the additive × additive × additive epistasis detected most easily, followed by the additive × additive × dominant epistasis, the additive × dominant × dominant epistasis, and the dominant × dominant × dominant epistasis. When sample sizes increase to 400 vs. 400, the power for three-locus epistasis detection surpasses three quarters. If sample sizes of 1000 vs. 1000 are used, the power reaches 0.98 or more. In general, whether the SNPs are associated or independent does not affect the power substantially, although in some cases the power is higher for associated SNPs than for independent SNPs.

Table 11. Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 200 cases and 200 controls.

Associated Independent
Epistasis Power (%) FPR (%) Power (%) FPR (%)
Additive × additive × additive 61.2 4.9 51.8 4.0
Additive × additive × dominant 56.7 4.8 45.2 5.3
Additive × dominant × dominant 49.3 5.1 48.4 4.7
Dominant × dominant × dominant 51.0 6.0 56.1 6.8

Table 12. Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 400 cases and 400 controls.

Associated Independent
Epistasis Power (%) FPR (%) Power (%) FPR (%)
Additive × additive × additive 85.0 5.8 79.6 5.4
Additive × additive × dominant 84.2 4.5 78.6 6.0
Additive × dominant × dominant 77.6 5.6 76.8 7.5
Dominant × dominant × dominant 79.8 7.5 86.6 6.0

Table 13. Power and false positive rates (FPR) for the detection of three-locus epistasis among associated and independent SNPs for 1000 cases and 1000 controls.

Epistasis                           Associated SNPs         Independent SNPs
                                    Power (%)   FPR (%)     Power (%)   FPR (%)
Additive × additive × additive      99.8        5.9         99.1        5.6
Additive × additive × dominant      99.6        4.1         99.3        6.3
Additive × dominant × dominant      99.2        5.8         98.0        5.9
Dominant × dominant × dominant      98.6        8.7         99.7        6.6

The model displays a small FPR (Tables 11, 12, and 13). Even when small sample sizes of 200 vs. 200 are used, the chance that the model produces a false positive result for three-locus epistasis detection remains small. The FPR was broadly consistent, between 4.0% and 8.7%, regardless of sample size and the degree of SNP-SNP association.
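As a rough illustration of how power and FPR figures of this kind are obtained, the sketch below estimates both as rejection fractions over simulated replicates. It does not implement the three-locus epistatic model: a single-SNP allelic chi-square test is used as a stand-in, and the allele frequencies (0.36 in cases, 0.30 in controls), the number of replicates, and the function names allelic_chi2_pvalue and rejection_rate are all assumptions made for illustration.

```python
import math
import random


def allelic_chi2_pvalue(case_alt, control_alt, n_case, n_control):
    """P-value of the 1-df chi-square test on a 2x2 table of allele counts."""
    a, b = case_alt, 2 * n_case - case_alt            # cases: alt / ref alleles
    c, d = control_alt, 2 * n_control - control_alt   # controls: alt / ref alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence against the null
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def rejection_rate(p_case, p_control, n, n_rep=300, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which the test rejects at level alpha.

    With p_case == p_control this estimates the FPR; otherwise it estimates
    power. Alleles are drawn independently (an assumed simplification).
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_rep):
        case_alt = sum(rng.random() < p_case for _ in range(2 * n))
        control_alt = sum(rng.random() < p_control for _ in range(2 * n))
        if allelic_chi2_pvalue(case_alt, control_alt, n, n) < alpha:
            rejected += 1
    return rejected / n_rep


# Same three sampling strategies as in Tables 11-13 (cases vs. controls).
for n in (200, 400, 1000):
    power = rejection_rate(0.36, 0.30, n)
    fpr = rejection_rate(0.30, 0.30, n)
    print(f"n = {n} vs. {n}: power = {power:.3f}, FPR = {fpr:.3f}")
```

Under these assumptions the estimated power should rise with sample size while the FPR stays near the nominal level, which is the qualitative pattern reported in Tables 11, 12, and 13.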

Discussion

The phenotypic variation of a trait or disease is highly complex given its polygenic inheritance and environmental influence. Most early quantitative genetic models assume that allelic effects are additive, with the size linearly proportional to the number of alleles. These models were later modified to include genetic interactions between different alleles at the same locus (dominance). It is now recognized that interactions between different loci (epistasis) within gene networks may play an important role [14], [15]. More recent evidence shows that high-order epistasis among more than two genes may form a crucial component of genetic interaction networks [9], [11], [17], [18]. In fact, quantitative genetic analyses have detected high-order epistatic effects in plants. For example, high-order epistasis could be correlated with the aggressiveness of the isolate of Phytophthora capsici through influencing double crosses among different loci at meiosis [28]. Wu [29] used a mating design with clonal replicates to identify a significant contribution of high-order epistasis to genetic variation in stem wood growth traits in poplars.

The increasing availability of high-throughput SNP data has led to the development of various statistical approaches for analyzing epistasis among multiple polymorphisms, including logistic regression, multifactor dimensionality reduction (MDR), Bayesian analysis, and machine learning [15], [21], [30]–[32]. In this article, we developed a general model for detecting epistasis of any order in case-control genetic association studies by integrating traditional quantitative genetic principles. Despite its existence, high-order epistasis may be obscured by metabolic network redundancy [17]. The integration of quantitative genetic principles makes our approach capable of identifying high-order epistatic interactions with genetic relevance. The model was tested by simulation studies. It displays adequate power for the detection of high-order epistasis with a modest sample size, for example, 400 cases vs. 400 controls. When the sample sizes of cases and controls increase to 1000 vs. 1000, which is currently not a problem for most genetic association studies, the model has almost full power to detect three-locus epistasis of different forms. Even with small samples (say 200 vs. 200), the new model has a low false positive rate for epistatic detection. The practical application of the model is validated by analyzing a real data set from a genetic study of inflammatory bowel disease. The model detected significant three-locus epistatic interactions among different SNPs genotyped from a candidate gene, DLG5 [27].

Our model allows the characterization of epistasis of any order. Its implementation in the practical setting of genome-wide association studies is challenged by an exponentially increasing number of SNP-SNP combinations. To make this tractable, one may incorporate optimization techniques into our model, allowing the selection of the most important combinations. An additional issue is to determine the critical threshold with multiple correlated SNPs in genome-wide association studies. An empirical approach for determining a genome-wide threshold is to employ non-parametric permutation testing (see refs. [21], [30], [33]–[35]). The model can also be extended in several directions. First, the model is developed to detect multilocus epistasis at the SNP level, but given recent discoveries of the importance of haplotypes in trait control [36]–[39], it should be extended to consider high-order interactions expressed by different haplotypes. Second, in the current model specification, we choose controls that are matched to cases in terms of biological, environmental, or demographic factors. When such matches are not possible, we need to embed these factors as covariates in the model, in which the interactions between genes and these factors can be tested. Third, the model can be extended to multiple diseases to consider the pleiotropic effect of a gene. The results on high-order epistasis detection obtained with this model and its extensions could be used for iterative model building and functional annotation of genes. Future applications of these results include analysis of the metabolic networks of pathogenic organisms and generation of epistatic candidate models for genome-wide association studies.
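The permutation approach to an empirical threshold mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited references: case/control labels are shuffled, the largest test statistic over all SNPs is recorded for each shuffle, and an upper quantile of these maxima serves as the study-wide critical value. A single-SNP allelic chi-square statistic stands in for the multilocus epistatic test, genotypes are coded 0/1/2, and the names allelic_chi2, max_statistic, and permutation_threshold are hypothetical.

```python
import random


def allelic_chi2(case_genos, control_genos):
    """1-df chi-square statistic comparing allele counts (genotypes coded 0/1/2)."""
    a = sum(case_genos)
    b = 2 * len(case_genos) - a
    c = sum(control_genos)
    d = 2 * len(control_genos) - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def max_statistic(genotypes, labels):
    """Largest statistic over all SNPs; genotypes is a list of per-SNP columns."""
    best = 0.0
    for snp in genotypes:
        cases = [g for g, y in zip(snp, labels) if y == 1]
        controls = [g for g, y in zip(snp, labels) if y == 0]
        best = max(best, allelic_chi2(cases, controls))
    return best


def permutation_threshold(genotypes, labels, n_perm=200, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the max statistic under label shuffling."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-disease association
        null_max.append(max_statistic(genotypes, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]


# Toy data (assumed): 20 SNPs, 100 cases and 100 controls, allele frequency 0.3.
data_rng = random.Random(42)
labels = [1] * 100 + [0] * 100
genotypes = [
    [(data_rng.random() < 0.3) + (data_rng.random() < 0.3) for _ in labels]
    for _ in range(20)
]
print("empirical threshold:", permutation_threshold(genotypes, labels))
```

Because the SNP columns are kept intact while only the labels are permuted, the correlation among SNPs is preserved in the null distribution, which is why this kind of threshold suits multiple correlated SNPs. In a genome-wide setting the inner loop would run over SNP combinations rather than single SNPs, so the cost grows with the number of combinations tested.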

Supporting Information

Text S1

Mathematical algorithm.

(0.14 MB PDF)

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This work is supported by National Science Foundation (NSF) grant DMS/NIGMS-0540745 and the Changjiang Scholars Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Weng G, Bhalla U, Iyengar R. Complexity in biological signaling systems. Science. 1999;284:92–96. doi: 10.1126/science.284.5411.92.
  • 2. Hlavacek W, Faeder J. The Complexity of Cell Signaling and the Need for a New Mechanics. Science's STKE. 2009;2. doi: 10.1126/scisignal.281pe46.
  • 3. Huang L, Sternberg P. Genetic dissection of developmental pathways. Methods in cell biology. 1995;48:97–122. doi: 10.1016/s0091-679x(08)61385-0.
  • 4. McMullen M, Byrne P, Snook M, Wiseman B, Lee E, et al. Quantitative trait loci and metabolic pathways. Proceedings of the National Academy of Sciences of the United States of America. 1998;95:1996. doi: 10.1073/pnas.95.5.1996.
  • 5. Ritchie M, Hahn L, Roodi N, Bailey L, Dupont W, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics. 2001;69:138–147. doi: 10.1086/321276.
  • 6. Martin M, Gao X, Lee J, Nelson G, Detels R, et al. Epistatic interaction between KIR3DS1 and HLA-B delays the progression to AIDS. Nature genetics. 2002;31:429–434. doi: 10.1038/ng934.
  • 7. Gabutero E, Moore C, Mallal S, Stewart G, Williamson P. Interaction between allelic variation in IL12B and CCR5 affects the development of AIDS: IL12B/CCR5 interaction and HIV/AIDS. AIDS. 2007;21:65. doi: 10.1097/QAD.0b013e3280117f49.
  • 8. Hirschhorn J, Daly M. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics. 2005;6:95–108. doi: 10.1038/nrg1521.
  • 9. Marchini J, Donnelly P, Cardon L. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature genetics. 2005;37:413–417. doi: 10.1038/ng1537.
  • 10. Wang W, Barratt B, Clayton D, Todd J. Genome-wide association studies: theoretical and practical concerns. Nature Reviews Genetics. 2005;6:109–118. doi: 10.1038/nrg1522.
  • 11. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795.
  • 12. Wan X, Yang C, Yang Q, Xue H, Tang N, et al. Predictive rule inference for epistatic interaction detection in genome-wide association studies. Bioinformatics. 2010;26:30. doi: 10.1093/bioinformatics/btp622.
  • 13. Phillips P. Epistasis-the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics. 2008;9:855–867. doi: 10.1038/nrg2452.
  • 14. Moore J, Williams S. Epistasis and its implications for personal genetics. The American Journal of Human Genetics. 2009;85:309–320. doi: 10.1016/j.ajhg.2009.08.006.
  • 15. Stich B, Yu J, Melchinger A, Piepho H, Utz H, et al. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics. 2007;176:563. doi: 10.1534/genetics.106.067033.
  • 16. Beerenwinkel N, Pachter L, Sturmfels B, Elena S, Lenski R. Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evolutionary Biology. 2007;7:60. doi: 10.1186/1471-2148-7-60.
  • 17. Imielinski M, Belta C. Exploiting the pathway structure of metabolism to reveal high-order epistasis. BMC Systems Biology. 2008;2:40. doi: 10.1186/1752-0509-2-40.
  • 18. Hansen T, Wagner G. Epistasis and the mutation load: a measurement-theoretical approach. Genetics. 2001;158:477. doi: 10.1093/genetics/158.1.477.
  • 19. Zhang Y, Liu J. Bayesian inference of epistatic interactions in case-control studies. Nature genetics. 2007;39:1167–1173. doi: 10.1038/ng2110.
  • 20. Nunkesser R, Bernholt T, Schwender H, Ickstadt K, Wegener I. Detecting high-order interactions of single nucleotide polymorphisms using genetic programming. Bioinformatics. 2007;23:3280. doi: 10.1093/bioinformatics/btm522.
  • 21. Gayán J, González-Pérez A, Bermudo F, Sáez M, Royo J, et al. A method for detecting epistasis in genome-wide studies using case-control multi-locus association analysis. BMC genomics. 2008;9:360. doi: 10.1186/1471-2164-9-360.
  • 22. Bateson W. Mendel's principles of heredity. Molecular and General Genetics MGG. 1910;3:108–109.
  • 23. Kempthorne O. The correlation between relatives on the supposition of mendelian inheritance. American Journal of Human Genetics. 1968;20:402.
  • 24. Workman P. Biometrical genetics. The study of continuous variation. American Journal of Human Genetics. 1973;25:461.
  • 25. Liu T, Thalamuthu A, Liu J, Chen C, Wu R. A model for testing epistatic interactions of complex diseases in case-control studies. Biostatistics. 2010 (in press).
  • 26. Lin Z, Poritz L, Franke A, Li T, Ruether A, et al. Genetic association of DLG5 R30Q with familial and sporadic inflammatory bowel disease in men. Disease markers. 2009;27:193–201. doi: 10.3233/DMA-2009-0662.
  • 27. Stoll M, Corneliussen B, Costello C, Waetzig G, Mellgard B, et al. Genetic variation in DLG5 is associated with inflammatory bowel disease. Nature genetics. 2004;36:476–480. doi: 10.1038/ng1345.
  • 28. Bartual R, Lacasa A, Marsal J, Tello J. Epistasis in the resistance of pepper to phytophthora stem blight (Phytophthora capsici L.) and its significance in the prediction of double cross performances. Euphytica. 1993;72:149–152.
  • 29. Wu R. Detecting epistatic genetic variance with a clonally replicated design: models for low- vs high-order nonallelic interaction. TAG Theoretical and Applied Genetics. 1996;93:102–109. doi: 10.1007/BF00225734.
  • 30. Liang Y, Kelemen A. Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases. Statistics Surveys. 2008;2:43–60.
  • 31. Kayano M, Takigawa I, Shiga M, Tsuda K, Mamitsuka H. Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data. Bioinformatics. 2009;25:2735. doi: 10.1093/bioinformatics/btp531.
  • 32. Jiang R, Tang W, Wu X, Fu W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC bioinformatics. 2009;10:S65. doi: 10.1186/1471-2105-10-S1-S65.
  • 33. Carlborg O, Andersson L. Use of randomization testing to detect multiple epistatic QTLs. Genetics Research. 2002;79:175–184. doi: 10.1017/s001667230200558x.
  • 34. Alison M. The effect of alternative permutation testing strategies on the performance of multifactor dimensionality reduction. BMC Research Notes. 2009;1. doi: 10.1186/1756-0500-1-139.
  • 35. Edwards T, Turner S, Torstenson E, Dudek S, Martin E, et al. A General Framework for Formal Tests of Interaction after Exhaustive Search Methods with Applications to MDR and MDR-PDT. 2010. doi: 10.1371/journal.pone.0009363.
  • 36. Judson R, Stephens J, Windemuth A. The predictive power of haplotypes in clinical response. Pharmacogenomics. 2000;1:15–26. doi: 10.1517/14622416.1.1.15.
  • 37. Bader J. The relative power of SNPs and haplotype as genetic markers for association tests. Pharmacogenomics. 2001;2:11–24. doi: 10.1517/14622416.2.1.11.
  • 38. Liu T, Johnson J, Casella G, Wu R. Sequencing complex diseases with HapMap. Genetics. 2004;168:503. doi: 10.1534/genetics.104.029603.
  • 39. Rha S, Jeung H, Choi Y, Yang W, Yoo J, et al. An association between RRM1 haplotype and gemcitabine-induced neutropenia in breast cancer patients. The Oncologist. 2007;12:622. doi: 10.1634/theoncologist.12-6-622.

Articles from PLoS ONE are provided here courtesy of PLOS
