Abstract
Understanding the genetic background of complex diseases requires the expansion of studies beyond univariate associations. Therefore, it is important to use interaction assessments of risk factors in order to discover whether, and how, genetic risk variants act together on disease development. The principle of interaction analysis is to explore the magnitude of the combined effect of risk factors on disease causation. In this study, we use simulations to investigate different scenarios of causation to show how the effects of two risk factors combine. We mainly focus on the two most commonly used interaction models, the additive and multiplicative risk scales, since there is often confusion regarding their use and interpretation. Our results show that the combined effect is multiplicative when two risk factors are involved in the same chain of events, an interaction called synergism. Synergism is often described as a deviation from additivity, which is a broader term. Our results also confirm that it is often relevant to estimate additive effect relationships, because they correspond to independent risk factors at low disease prevalence. Importantly, we evaluate the threshold of more than two required risk factors for disease causation, called the multifactorial threshold model. We found a simple mathematical relationship (square root) between the threshold and an additive-to-multiplicative linear effect scale (AMLES), where 0 corresponds to an additive effect and 1 to a multiplicative effect. We propose AMLES as a metric that could be used to test different effect relationships at the same time, given that it can simultaneously reveal additive, multiplicative and intermediate risk effect relationships. Finally, the utility of our simulation study was demonstrated using real data by analyzing and interpreting gene-gene interaction odds ratios from a rheumatoid arthritis case-control cohort.
Introduction
Genetic mapping studies of complex human traits have identified thousands of genetic loci implicated in the susceptibility to complex diseases [1, 2]. In other words, genome-wide association studies (GWAS) have linked thousands of single-nucleotide polymorphisms with complex human traits [1, 2]. These findings have been beneficial in areas such as drug repositioning [1, 3–5]. However, the identified individual genetic associations seldom exhibit strong disease risks and explain a small portion of the calculated heritability for each trait [1, 6–8]. This implies that the genetic associations have a very poor predictive value [6]. Comprehensive interaction analysis between, and among, risk factors is an important tool for understanding the genetic background of complex traits [7–11]. In general, interaction is said to be present when the combined effect magnitude of two or more factors is significantly different from the combined effect magnitude predicted by the model being tested [12–14]. Nevertheless, it is challenging to address whether, and how, genetic risk factors interact in shaping human traits, and to biologically interpret those interactions [9, 14]. Additionally, there is often confusion in the interpretation of the results from different interaction models. We therefore, in this study, provide a framework of simulated scenarios that represent common and simple processes where two or more genetic factors may interplay in disease causation. We investigate whether, and how, two or multiple genetic factors interact in each simulated scenario. We also address these relationships in real data from a case-control study in rheumatoid arthritis (RA), the Swedish epidemiological investigation of RA (EIRA) [15, 16].
The association between an individual genetic variant and an outcome (e.g., disease) is typically quantified as an odds ratio (OR) or relative risk (RR). The case-control design is often used for diseases with a low prevalence, where the resulting ORs approximate the RRs. Interaction studies generally examine two risk factors at a time, yielding three ORs (or RRs) denoted as [13, 14]: OR11 for carrying both risk factors; OR10 and OR01 for the exclusive combinations; and OR00 for the absence of both risk factors, which is used as reference (OR00 = 1). Two different models are commonly used to test interaction, the additive (whose null hypothesis is OR11 = OR10 + OR01 − 1) and the multiplicative (whose null hypothesis is OR11 = OR10 × OR01). The additive model builds on the sufficient cause concept of KJ Rothman [12], who showed that if two factors are part of the same sufficient cause of a disease (e.g., pathway), then their joint risk will be larger than the sum thereof (often termed “departure from additivity”). This additive model has been criticized for always giving positive results when used as a null hypothesis [17, 18]. On the other hand, the multiplicative model has been criticized as a statistical convenience without a theoretical basis, boosted by the implicit multiplicativity in logistic regression [17, 19, 20]. There is often confusion regarding when and how to use and interpret each of these models.
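The two null hypotheses stated above can be written out as a small helper. This is only an illustrative sketch; the function name `expected_or11` is hypothetical and not part of the study's code.

```python
def expected_or11(or10: float, or01: float) -> dict:
    """Expected double-exposure OR under each null hypothesis (OR00 = 1 is the reference)."""
    return {
        "additive": or10 + or01 - 1.0,    # null: OR11 = OR10 + OR01 - 1
        "multiplicative": or10 * or01,    # null: OR11 = OR10 x OR01
    }


# Illustrative single-exposure odds ratios (made-up values).
expected = expected_or11(2.0, 3.0)
print(expected)  # {'additive': 4.0, 'multiplicative': 6.0}
```

With single-exposure ORs of 2 and 3, an observed OR11 near 4 would be compatible with additivity and one near 6 with multiplicativity; values in between fit neither null exactly.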
In addition to evaluating the additive and multiplicative interaction models, we study a third model in this paper, the multifactorial threshold model [21]. This model assumes that there is a minimum number of factors required for disease causation, an assumption with a theoretical foundation in the concept of genetic liability [22]. Until now, no statistical metric for the multifactorial threshold model has been available. We therefore propose a new metric, AMLES (additive-to-multiplicative linear effect scale), which compares the number of risk factors to a threshold value. Our proposed metric could help to determine the validity of the multifactorial threshold model, which has been criticized for simplifying disease biology [21].
Simulations are able to give answers with a high level of precision if enough data points are generated. They also provide great flexibility to fit a number of scenarios. We here build different scenarios in each of which we describe the cause of the outcome (disease or otherwise), with the purpose of tracing how a particular type of causality would be represented in an interaction study. Our intention is to include commonly occurring and simple real-world scenarios, and practical examples from the literature are provided. We further build scenarios where we model situations with confounding factors commonly identified in genetic association studies, such as population stratification. We also show when the modelled scenarios can be distinguished in real data, and when they cannot. Across all simulations the risk factors are dichotomous, and neither necessary nor sufficient for disease to occur. While most of our simulated scenarios have two risk factors, we include modelling of more than two factors and illustrate how it is possible to elucidate multifactorial interactions from a set of pairwise interactions. We compare the simulated scenarios to additive and multiplicative risk scales, aiming to contribute to the understanding and interpretation of different models of interaction. Finally, we propose a metric in the framework of the multifactorial threshold model, which can estimate the fit to the additive and multiplicative models, as well as test against a particular threshold level involving more than two factors.
Methods
Simulations
To better understand interactions between risk factors, we first set up simulations of causation, which are represented in Fig 1A to 1C. Each scenario included a dichotomous outcome (present/absent), and causative processes, termed components. For each of the different scenarios, we performed 1,000 simulations (Figs 1A–1C, 2B, 3A, 3B, S1A–S1C, S2A and S2B Figs). Each simulation consisted of one million data points (i.e., simulated individuals) where presence or absence of a risk allele and status of case or control was assigned in accordance with each model. As a sensitivity analysis, we performed simulations with 100,000 data points. These simulations gave essentially the same results and were therefore not included. For the sake of simplicity, binary factors, corresponding to a dominant or recessive scenario in genetics, were used. We also used subgroups with an equal ratio of cases and controls, named components, except for the three factors model (Fig 2B), where components 2 and 3 had the same ratio and component 1 had the same ratio as the logical-OR combination for the other two.
Allele frequencies for the two exposures, named X and Z, tested in each model, were established before each simulation by drawing a random value within a given range. For instance, to set the lower frequency for the risk factor Z, a random value between 5% and 15% was chosen; the higher frequency for Z was then set by multiplying this figure by a random number between 1.1 and 4. The allele frequency for the exposure X was established in a similar fashion but using a range between 5% and 25% to select the random value, which was then multiplied by a random number between 1.1 and 2 to set the higher frequency.
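The frequency draw described above can be sketched as follows. The function name `draw_frequencies`, the seed, and the use of a uniform distribution are assumptions made for illustration; the text only states that a random value within the given range was chosen.

```python
import numpy as np


def draw_frequencies(rng, lower_range, multiplier_range):
    """Draw a (lower, higher) pair of risk-factor frequencies.

    Assumption: both the base value and the multiplier are drawn uniformly.
    """
    lower = rng.uniform(*lower_range)
    higher = lower * rng.uniform(*multiplier_range)
    return lower, higher


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Z: base value between 5% and 15%, multiplied by a factor between 1.1 and 4.
z_lower, z_higher = draw_frequencies(rng, (0.05, 0.15), (1.1, 4.0))
# X: base value between 5% and 25%, multiplied by a factor between 1.1 and 2.
x_lower, x_higher = draw_frequencies(rng, (0.05, 0.25), (1.1, 2.0))

print(f"Z: {z_lower:.3f} -> {z_higher:.3f}")
print(f"X: {x_lower:.3f} -> {x_higher:.3f}")
```

With these ranges the higher frequency can never exceed 60% for Z (0.15 × 4) or 50% for X (0.25 × 2), so both remain valid probabilities.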
For each scenario we calculated several metrics. We calculated ORs for the combinations of the risk factors: OR01 for X = 0, Z = 1; OR10 for X = 1, Z = 0; OR11 for X = 1, Z = 1 (Fig 1D), to address interactions between two risk factors using the additive (with the null hypothesis OR11 = OR10 + OR01 − 1) and the multiplicative (with the null hypothesis OR11 = OR10 × OR01) models. Due to the nature of the components, more risk factors can exist in each scenario without interfering with the calculations. For each component there was one risk factor that was explicitly made to correlate with its "present" state (Fig 1A–1C), and the frequencies for the risk factor were varied between simulation runs. We subsampled the controls to match the number of cases and resemble how case-control studies reflect the population in a biased way (Fig 1). We calculated Pearson correlation coefficients between the risk factors X and Z (Fig 1E). For the versions of the scenarios without the subsampling (S1 Fig) we calculated RRs instead, as that is the appropriate population measure.
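A minimal sketch of one such simulation run is given below; it is not the authors' code (which is available from the repository listed under Data access). Several choices here are assumptions made for illustration: the function names, fixed risk-factor frequencies of 10% and 40%, a 0.5% prevalence, equal probabilities for both components, and four million simulated individuals to reduce sampling noise in a single run (the study used one million per simulation).

```python
import numpy as np


def simulate_case_control(rule, prevalence, freqs, n, rng):
    """Simulate a two-component scenario and return a balanced case-control sample.

    rule: "and" (synergism) or "or" (heterogeneity).
    freqs: (lower_x, higher_x, lower_z, higher_z) risk-factor frequencies; the
           higher frequency applies when the matching component is present.
    """
    # Per-component probability chosen so that the outcome has the requested prevalence.
    p = np.sqrt(prevalence) if rule == "and" else 1.0 - np.sqrt(1.0 - prevalence)
    c1 = rng.random(n) < p
    c2 = rng.random(n) < p
    case = (c1 & c2) if rule == "and" else (c1 | c2)

    lower_x, higher_x, lower_z, higher_z = freqs
    x = rng.random(n) < np.where(c1, higher_x, lower_x)  # X tracks component 1
    z = rng.random(n) < np.where(c2, higher_z, lower_z)  # Z tracks component 2

    # Subsample controls so that their number matches the number of cases.
    case_idx = np.flatnonzero(case)
    control_idx = rng.choice(np.flatnonzero(~case), size=case_idx.size, replace=False)
    keep = np.concatenate([case_idx, control_idx])
    return x[keep], z[keep], case[keep]


def odds_ratios(x, z, case):
    """OR10, OR01 and OR11 relative to the unexposed group (X = 0, Z = 0)."""
    def odds(x_value, z_value):
        group = (x == x_value) & (z == z_value)
        return np.sum(group & case) / np.sum(group & ~case)

    reference = odds(False, False)
    return {
        "OR10": odds(True, False) / reference,
        "OR01": odds(False, True) / reference,
        "OR11": odds(True, True) / reference,
    }


rng = np.random.default_rng(1)  # arbitrary seed
results = {}
for rule in ("and", "or"):
    x, z, case = simulate_case_control(rule, 0.005, (0.1, 0.4, 0.1, 0.4), 4_000_000, rng)
    ors = odds_ratios(x, z, case)
    ors["additive"] = ors["OR10"] + ors["OR01"] - 1
    ors["multiplicative"] = ors["OR10"] * ors["OR01"]
    results[rule] = ors
    print(rule, {name: round(float(value), 2) for name, value in ors.items()})
```

Under these assumptions, the observed OR11 should land closer to the multiplicative expectation for the logical AND rule and closer to the additive expectation for the logical OR rule, although individual runs vary with sampling noise.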
Interactions in rheumatoid arthritis GWAS
We evaluated both additive and multiplicative interaction models on a human case-control, genome-wide association dataset for anti-citrullinated protein antibody positive (ACPA-positive) rheumatoid arthritis (RA), from EIRA [15, 16]. The two top genetic risk factors for ACPA-positive RA in European-descent populations, HLA-DRB1 shared epitope alleles and PTPN22 rs2476601 T, were tested against all non-HLA risk SNPs. Shared epitope (SE) is a group of HLA-DRB1 alleles with similar effects, and rs2476601 is a non-synonymous coding variant of the PTPN22 gene.
Genotyped and imputed GWAS data from the EIRA study were used in this part of the study (see [15] for sources included). Standard data filtering was performed as previously described [15]. Briefly, SNPs with a genotyping missing rate of 5% or higher, or with a P-value below 0.001 for Hardy-Weinberg equilibrium in controls, were excluded. The SNPs located in the extended HLA region (chr6:27339429–34586722, GRCh37/hg19) were removed, due to the high linkage disequilibrium and possible independent signals of association with ACPA-positive RA in the locus.
Departures from additivity or multiplicativity for risk factors were estimated pairwise in the imputed GWAS data (3,138,911 SNPs for the test with HLA-DRB1 shared epitope and 3,308,784 SNPs for the test with PTPN22 rs2476601 T), using GEISA [23], where a dominant model was assumed. In order to control for differences in allelic frequencies due to population stratification and sex, the first ten principal components (which summarized the genotyping data) and sex were used as covariates in this analysis. A cut-off of a minimum of five individuals for each OR combination was applied. The HLA-DRB1 shared epitope alleles included *01 (except *0103), *0404, *0405, *0408 and *1001. The P-values for interaction from these analyses are plotted in Fig 4A and 4B. We included only SNPs at risk allele frequencies between 10% and 50%; however, when we tested all the SNPs at a minor allele frequency above 1%, the result led to the same conclusion.
To address both additive and multiplicative risk scales and evaluate the behavior of the ORs for double risk exposure (OR11; Fig 4C, 4D, S3 and S4 Figs), we used genotyped EIRA GWAS data (281,195 SNPs). For this analysis, the data were transposed using Plink 1.07 [24]. Known risk SNPs were selected based on ORs higher than 1.1 and 95% confidence intervals not overlapping 1, together with the criterion of having been reported as associated with RA in published case-control RA GWAS [5, 25, 26].
Computational packages
Calculations were done using Python, including the packages NumPy [27], SciPy [28], Matplotlib [29], pandas [30], scikit-learn [31], seaborn [32], jupyter and geneview [33].
Data access
Interaction tables are available at Mendeley Data, doi:10.17632/63b47w6zgr.1 and can be viewed at https://data.mendeley.com/datasets/63b47w6zgr/1. Code is available at https://github.com/danielramskold/additive_risk_heterogeneity_multiplicative_risk_synergism2 where we also provide the code used to generate the figures. Pre-existing genetic data from the EIRA cohort was used. Due to ethical considerations, data from EIRA cannot be publicly shared. Please contact the principal investigators for data requests for applicable studies. For further information go to: https://www.eirasweden.se/Kontakt_EIRA.htm.
Results
Scenarios with risk factors contributing to the explicitly causal components
We arranged a simulation scenario for synergism, which is by far the most commonly identified type of interaction between genetic risk factors. For example, synergism has been shown between peptide-presenting HLA-B and the peptidase ERAP1 in breast cancer [34]. We also simulated a scenario for the heterogeneity model, which is less commonly identified between genetic risk factors. An example of heterogeneity is the relationship between genetic variants from the BLK and TNFSF4 genes in systemic lupus erythematosus [35]. In the synergism scenario, two components need to be in the “present” state for the outcome “case” to occur (i.e. logical AND; Fig 1A). In the heterogeneity scenario, it is enough that either one of two components is in the “present” state for a “case” outcome (i.e. logical OR; Fig 1B). We also sought to investigate the multifactorial threshold model, which has been thought of in terms of genetic liability [36] and has been exemplified in diabetes [21]. Here we chose five components, although any number from three upwards could have been used. In this model a minimum number of factors need to be present for the disease or trait to occur, which in our simulation set means that at least t components need to be in the “present” state for a “case” outcome (Fig 1C). We also present the results without subsampling, which matches population studies (S1 Fig). For the two risk factors, the synergism scenario corresponds to Rothman’s synergism concept [13, 22], and the heterogeneity scenario corresponds to genetic heterogeneity.
For phenotypes with a low prevalence, the presence of both risk factors had a multiplicative relation in the synergism scenario (Fig 1D), which is sometimes referred to as "super-additive interaction", or "positive interaction" on the additive scale. For the heterogeneity scenario, the presence of both risk factors instead had an additive relation at low prevalence (Fig 1D), which is sometimes referred to as "negative interaction" on the multiplicative scale. The multifactorial threshold scenario equated to the heterogeneity scenario at threshold t = 1 and the synergism scenario at t = 5, where 5 meant all components needed to be present for the phenotype to occur. We obtained intermediate results for intermediate thresholds, so that at low prevalence the joint behavior of the factors was above additive but below multiplicative (Fig 1D). Without the subsampling (S1E Fig), the scenarios that produced multiplicative relationships lacked correlation between risk factors within cases, within controls, and in the two combined (“all”). However, the subsampling caused a positive correlation between risk factors in the combined (“all”) group (Fig 1E). On the other hand, the additive and intermediate relations, without the subsampling, were reflected in negative correlations among risk factors in cases but not in controls (S1E Fig). This negative correlation between two risk factors can be understood theoretically in the context of the additive model of interaction, since two independent sufficient factors (meaning that there is heterogeneity of causation) should have a strongly negative correlation. Thus, a similar, but attenuated, pattern of correlation between non-sufficient risk factors is expected.
We also present results from a simulation of unbiased random sampling (without the subsampling step), which is equivalent to population studies. In this analysis, we detected a multiplicative relation between RRs for the synergism scenario at every prevalence rate (fraction of cases in the population). For the heterogeneity scenario, the relationship between risk factors was additive at low prevalence but less-than-additive at high prevalence.
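The correlation pattern described for the heterogeneity scenario without subsampling can be illustrated with a short sketch. The risk-factor frequencies (10% and 40%), the 0.5% prevalence, the seed and the helper name `pearson` are assumptions chosen for illustration, not values taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 1_000_000

# Heterogeneity: either component is enough for a "case" outcome (logical OR).
p = 1.0 - np.sqrt(1.0 - 0.005)  # per-component probability giving 0.5% prevalence
c1 = rng.random(n) < p
c2 = rng.random(n) < p
case = c1 | c2

# Each risk factor is more frequent when its own component is present.
x = rng.random(n) < np.where(c1, 0.4, 0.1)
z = rng.random(n) < np.where(c2, 0.4, 0.1)


def pearson(a, b):
    """Pearson correlation coefficient between two binary arrays."""
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]


corr = {
    "all": pearson(x, z),
    "cases": pearson(x[case], z[case]),
    "controls": pearson(x[~case], z[~case]),
}
for group, value in corr.items():
    print(f"{group}: {value:+.3f}")
```

In this setup most cases carry exactly one "present" component, so a case enriched for X tends not to be enriched for Z, which is what produces the negative correlation among cases while controls stay essentially uncorrelated.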
AMLES, an interaction metric with a simple interpretation for multifactorial thresholds
As we have seen, the multifactorial threshold model gives rise to intermediary interaction relationships between additivity and multiplicativity for the combinations of the risk factors involved (Fig 1D). We therefore set a scale to calculate the intermediary relationships among risk factors. To anchor the scale at 0 for additive relationships, the expected effect from the additive model is subtracted from the observed double exposure OR. Then, to set the scale at 1 for multiplicative relationships, this difference is divided by the difference between the expected OR for the multiplicative model and the expected OR for the additive model. This provides an additive-to-multiplicative linear effect scale (AMLES):
\[ \mathrm{AMLES} = \frac{OR_{11} - \mathrm{expected(additive)}}{\mathrm{expected(multiplicative)} - \mathrm{expected(additive)}} \]
where expected(additive) = OR10 + OR01 − 1 and expected(multiplicative) = OR10 × OR01.
Then, to be able to compare AMLES with the threshold in our simulated threshold model, we scaled the threshold as the ratio:
\[ f_{thr} = \frac{t - 1}{F - 1} \]
where F is the number of components (factors) that could cross the threshold t, in order to have a 0 to 1 scale (see Fig 2). We found that the square root of the threshold ratio fits the scaled OR. This means that the OR at a low prevalence (such as 0.5%) for doubly exposed (OR11) was
\[ OR_{11} = (1 - \sqrt{f}) \times \mathrm{expected(additive)} + \sqrt{f} \times \mathrm{expected(multiplicative)} \]
where √f is the square root of fthr. The AMLES metric has the convenient properties of having both a natural lower value (0 for additive) and higher value (1 for multiplicative), as well as an interpretable scale between these values; it is therefore also related to the threshold fraction fthr. The metric fest = sign(AMLES) × AMLES² has a more natural scale, since it estimates fthr directly, but unlike AMLES it is not symmetric around the median, making it a less convenient scale (S2 Fig). AMLES is related to another measure of interaction size, relative excess risk due to interaction (RERI = RR11 − RR10 − RR01 + 1) [12, 22], by AMLES = RERI / ((OR10 − 1) × (OR01 − 1)) in cases when ORs and RRs tend to be the same. AMLES could perhaps be used instead of measures like attributable proportion, synergy index and RERI, given that it can be indicative of multiplicative or additive risk regardless of the scale (additive or multiplicative) used for testing.
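The quantities in this section can be sketched as small functions. The AMLES and fest definitions follow the verbal description in the text. The threshold ratio (t − 1)/(F − 1) is an assumed form, inferred from the requirement that t = 1 maps to 0 (additive) and t = F maps to 1 (multiplicative). All function names are hypothetical.

```python
import math


def amles(or11: float, or10: float, or01: float) -> float:
    """Additive-to-multiplicative linear effect scale: 0 = additive, 1 = multiplicative."""
    additive = or10 + or01 - 1.0
    multiplicative = or10 * or01
    # The denominator equals (OR10 - 1) * (OR01 - 1); it is zero, and AMLES is
    # undefined, when either single-exposure OR equals 1.
    return (or11 - additive) / (multiplicative - additive)


def threshold_fraction(t: int, n_components: int) -> float:
    """Assumed scaled threshold: 0 when t = 1, 1 when t equals the number of components."""
    return (t - 1) / (n_components - 1)


def f_est(amles_value: float) -> float:
    """Signed square of AMLES, an estimate of the threshold fraction."""
    return math.copysign(amles_value ** 2, amles_value)


# Illustrative values: OR10 = 2 and OR01 = 3 give additive = 4 and multiplicative = 6.
print(amles(4.0, 2.0, 3.0))  # 0.0 (additive)
print(amles(6.0, 2.0, 3.0))  # 1.0 (multiplicative)
print(amles(5.0, 2.0, 3.0))  # 0.5 (intermediate)

# A threshold of t = 2 out of F = 5 components gives fthr = 0.25, so the
# square-root relationship would predict an AMLES of 0.5.
print(math.sqrt(threshold_fraction(2, 5)))  # 0.5
print(f_est(0.5))                           # 0.25
```

The last two lines show the round trip: the square root maps a threshold fraction to the predicted AMLES, and fest maps an observed AMLES back to an estimated threshold fraction.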
Three risk factors
Relationships involving more than two potential causes can be analyzed by pair-wise interactions, as can be seen in our simulation scenario based on the multifactorial threshold model. Although an outcome with more than two causes will not always entail a threshold model, it may be possible to break down the multifactorial relationships into pairs involving either causal heterogeneity or synergism. For example, in the case of outcome = X AND (Z OR V) = (X AND Z) OR (X AND V), the relationships among these risk factors should be distinguishable as causal heterogeneity between Z and V (from the logical OR) and synergism between X and each of the other risk factors (from the logical AND) (Fig 2B). This might be how body mass index (BMI) interacts with variants of the F5 and ABO genes (as if X = BMI, Z = F5 SNP, V = ABO SNP) in venous thromboembolism risk [37, 38]. Our simulation performs as predicted when we tested this example against the expected RR from all eight possible combinations of both logical OR and AND (Fig 2C, S1C Fig).
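The logical structure of this three-factor example can be checked by enumerating all eight combinations. This sketch treats X, Z and V as deterministic switches purely to show the Boolean decomposition; it does not reproduce the probabilistic simulation, and the function name `outcome` is hypothetical.

```python
from itertools import product


def outcome(x: bool, z: bool, v: bool) -> bool:
    """Three-factor rule from the text: X AND (Z OR V)."""
    return x and (z or v)


rows = []
for x, z, v in product([False, True], repeat=3):
    expanded = (x and z) or (x and v)  # distributed form of the same rule
    rows.append((x, z, v, outcome(x, z, v), expanded))
    print(int(x), int(z), int(v), "->", int(outcome(x, z, v)))

# With X present, the rule reduces to Z OR V (heterogeneity between Z and V).
z_v_pairs = list(product([False, True], repeat=2))
heterogeneity_holds = all(outcome(True, z, v) == (z or v) for z, v in z_v_pairs)

# With V absent, the rule reduces to X AND Z (synergism between X and Z).
x_z_pairs = list(product([False, True], repeat=2))
synergism_holds = all(outcome(x, z, False) == (x and z) for x, z in x_z_pairs)
```

Only three of the eight combinations produce the outcome, and each of them includes X, which is why X pairs synergistically with both Z and V.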
Scenarios where confounder effects cause false interaction
We investigated two confounder scenarios, neither of which includes any causal relationship between the risk factors (e.g., heterogeneity or synergism), and which are therefore applicable when either or both are false risk factors. The first scenario describes when the genetic backgrounds for cases and controls are mismatched (Fig 3A). It can also be described as an effect of data reuse, where the same data set is used to define risk factors and to calculate interactions, as was done in a previous interaction study [18]. In the other scenario we simulated multiple ancestry groups, which may translate to different allelic frequencies across groups, a well-known confounder that is generally considered in the design or corrected for in genetic association studies [39]. In our simulation we condensed it down to two groups (Fig 3B). We did not model linkage disequilibrium (LD) in our simulations, as the type of simulations we used does not accommodate LD, but we tested our observations on a real GWAS of ACPA-positive RA (anti-citrullinated protein antibody positive rheumatoid arthritis). It may be advisable to prune for LD when addressing interactions in genome-wide data sets [40].
For the first confounder scenario (Fig 3A), ORs (Fig 3C) and correlations (Fig 3D) at high fractions of cases are more relevant than RRs (Fig 3E). We found the ORs relationship to be multiplicative, regardless of the balance between cases and controls, which makes this confounder scenario indistinguishable from real synergism. This was also true for correlation coefficients, where it would only be in population data that it would be possible to distinguish synergism from the confounder scenario (Figs 1E and 3D, S1E Fig) by the former’s lack of correlation in that situation. The second confounder scenario (Fig 3B) had negative-additive ORs and RRs relationships, matching additive in a high proportion of cases (Fig 3C and 3E). The risk factors correlation among the three groups used (all, cases, controls) was always positive (Fig 3D), which should be useful in distinguishing this confounder scenario from real interactions. We also tested what happens when adding a biased subsampling step, as in Fig 1A–1C, to Confounder II, but it made no difference to the results (S3 Fig). We also devised one simulation scheme that always produced additive RRs relationships and one that always produced additive ORs relationships (S4 Fig), because such schemes could guide randomization, and to complement the always-multiplicative OR of the first confounder scenario.
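The positive correlation reported for the second confounder scenario can be illustrated with a two-group sketch. The details of Fig 3B are not given in the text, so everything specific here is an assumption: equal group sizes, risk-factor frequencies of 10% versus 40%, and prevalences of 5% versus 15%. Neither risk factor has any causal effect within a group.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n = 200_000

# Two ancestry groups of equal expected size (assumption).
group_b = rng.random(n) < 0.5

# Both risk factors are more frequent in group B but independent within each group.
frequency = np.where(group_b, 0.4, 0.1)
x = rng.random(n) < frequency
z = rng.random(n) < frequency

# Disease risk depends only on group membership, not on X or Z.
case = rng.random(n) < np.where(group_b, 0.15, 0.05)


def pearson(a, b):
    """Pearson correlation coefficient between two binary arrays."""
    return np.corrcoef(a.astype(float), b.astype(float))[0, 1]


corr = {
    "all": pearson(x, z),
    "cases": pearson(x[case], z[case]),
    "controls": pearson(x[~case], z[~case]),
}
for group, value in corr.items():
    print(f"{group}: {value:+.3f}")
```

Because cases, controls and the combined sample are each mixtures of the two groups, the shared frequency difference induces a positive correlation in all three, which is the signature the text suggests for telling stratification apart from real interactions.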
Example from rheumatoid arthritis
Both the synergism and heterogeneity scenarios represent interesting relationships between two risk factors; thus the appropriate interaction model to use depends on the hypothesis one is interested in. Alternatively, a hypothesis-free approach would be ideal, and the closest to that is to evaluate both additive and multiplicative hypotheses, as this would cover both models as well as threshold-based scenarios due to their intermediate nature (i.e. they would fail both types of tests, in the positive direction for additive and negative direction for multiplicative). We therefore evaluated both additive and multiplicative interaction for the two top genetic risk factors (HLA-DRB1 shared epitope alleles and PTPN22 rs2476601 T allele) for ACPA-positive RA in European-descent populations, using GWAS data from the EIRA study. We detected no deviation from multiplicativity, but did detect deviation from additivity (Fig 4A and 4B), as reported before for HLA-DRB1 [15]. The new simulations presented here increase our ability to interpret this result as a widespread interaction between HLA-DRB1 shared epitope and all non-HLA genetic risk factors, in the common meaning of interaction where synergism is a type of interaction. Correlation analyses backed up synergism (but not heterogeneity or population stratification) as an appropriate interpretation of the results (S5 Fig). From this, we could derive that the HLA-DRB1 shared epitope cannot be substituted or phenocopied by a non-HLA genetic risk factor for its part in the chain of ACPA-positive RA etiology (Fig 4A, 4C and 4D). The same was the case for the PTPN22 risk allele, given the similarities in P-value distributions observed (Fig 4A and 4B). For both sets of tests, there was too little data at a majority of the tested loci to distinguish additive from multiplicative ORs.
We followed up the results of multiplicativity by looking only at known risk SNPs (Fig 4D), finding results similar to a randomization based on the Confounder I scenario (and therefore bound to produce multiplicative odds ratios), with similar variability (P = 0.6–0.8, Levene’s test, n = 9–11 SNPs) implying a dearth of non-multiplicative ORs relationships (Fig 4E). This randomization is the same as Test III of a previous study by Ignac et al [41]. We also devised a randomization scheme creating additive ORs based on the Additive O scheme of S4 Fig (intended to provide always additive effect relationships) and tested it on the full SNP set (S6 Fig). This result showed a very noticeable deviation from the real data, as expected.
Discussion
We herein present a simulation approach intended to help with the interpretation of additive and multiplicative models of interaction of RRs and ORs. We show that addition of ORs, or negative deviation from the multiplicative interaction model (OR11 < 1, when testing OR11 = OR10 × OR01), occurs when the risk comprises two independent risk factors (heterogeneity scenario, Fig 1B), or a process that approximates that setup at a given fraction of cases (S4 Fig). Multiplication of ORs, or positive deviation from the additive interaction model (OR11 > 1, when testing OR11 = OR10 + OR01 − 1), follows if two different mechanisms are required for disease (synergism scenario, Fig 1A). One should therefore choose the appropriate statistical test depending on the hypothesis. To identify deviation from synergism, deviation from a multiplicative relationship could be tested. Often, however, if we are interested in testing whether disease is caused by the interaction of two factors, it is appropriate to test for deviation from additivity. A multiplicative assumption would have merit in our testing against HLA-DRB1 shared epitope and PTPN22 if ACPA-positive RA were a homogeneous set of causes, rather than the kind of heterogeneity of causation that we have shown gives rise to additivity between risk factors. Nevertheless, ACPA-positive RA may have a certain level of heterogeneity of causes, despite being defined as a subgroup of patients where ACPA positivity is a mediating risk factor [42]. Therefore, in this study, we also tested the multiplicative interaction between the strongest genetic risk factors for ACPA-positive RA and other risk SNPs in the same material as previously published by Diaz-Gallo et al [15]. These results showed that there is no deviation from the null hypothesis for the multiplicative model of interaction. However, we only tested the other main model, and not, for example, some version of the multifactorial threshold model.
In light of the new understanding that our simulations give, the presence of deviation from additivity, along with no deviation from multiplicativity, supports the existence of widespread synergism between the genetic risk factors in causing ACPA-positive RA. This result is also applicable to the estimation of heritability for RA, since the assumption of an additive model would lead to an underestimation of narrow-sense heritability [43].
Synergism can be viewed as a chain of events scenario, whereas causal heterogeneity corresponds to phenocopying. In terms of Rothman’s sufficient-cause model, the risk factors X and Z in the synergism scenario correspond to risk factors in the same cause, referred to as causal co-action, joint action or synergism, whereas X and Z in the heterogeneity scenario correspond to risk factors in different causes [22].
The fact that most loci showed no statistically significant deviation from either additive or multiplicative interaction will be the unfortunate reality for many applications of interaction testing. While statistical power for single risk factor testing scales with the inverse square of the number of samples, already forcing large GWAS sample sizes, the statistical power for interaction testing scales to the inverse power of four [43], thus requiring far larger sample sizes than standard association testing.
Our inspiration for this work came from a simulation study [18] in which case or control status was randomly assigned, one risk factor was simulated to resemble the strongest genetic risk factor for ACPA-positive RA, and interaction with other risk factors (selected by P-value for risk) was computed. The simulation led to an overrepresentation of additive interactions (i.e., deviation from additive odds ratios). Our Confounder I scenario (Fig 3A) produced an analogous result. The author [18] noted that his simulation produced a multiplicative null model that does not match additivity and concluded that the additive interactions observed were erroneous, as no interaction should be present. We propose an alternative interpretation, based on the convergences we found: the Confounder I scenario, and thereby also the previous simulation [18], perfectly mimics synergism in a case-control study with a similar number of cases and controls selected from a population with a low prevalence, as can be seen by comparing our figures (Figs 1D, 1E, 3C and 3D). By low prevalence, we mean the 0.5% mark in our simulations, which is similar to RA prevalence (0.7% for all RA in Sweden [44], of which 60% are ACPA-positive [45]). It should be no surprise that a simulation [18] of synergism, which of course is interaction [46], produces many significant results, although it is of course worrying that a confounder like data reuse can produce this effect, as the study [18] showed. We conclude that the previous simulation study [18] is in line with the confusion over additive and multiplicative interaction that can sometimes be found in the literature [47], highlighting the need to understand better the relationships among risk factors that different interaction types imply.
In this simulation study we demonstrate the causal interpretations of additive and multiplicative interaction in both the RR and OR settings. Some of this has been understood intuitively in the past, especially the connection between multiplicative effect and logical AND [48], but here we contribute with further clarification of the situation through simulation. We hope that this will help guide the interpretation of future interaction studies.
Acknowledgments
We thank Anton Larsson for helping with the literature search, Lars Klareskog, Lars Alfredsson and Leonid Padyukov for providing access to EIRA data and engaging in scientific discussions, and Janet Ahlberg for English editing.
Funding Statement
- L.M.D.G.: Ulla och Gustaf af Uggla Foundation 2018-02670 https://staff.ki.se/ulla-and-gustaf-af-uggla-foundation
- L.M.D.G.: Reumatikerförbundet R-861801, R-932138 https://reumatiker.se
- L.M.D.G.: Konung Gustaf V:s 80-årsfond FAI-2018-0518, FAI-2019-0597 https://www.kungahuset.se/monarkinhovstaterna/kungligastiftelser/forskning/konunggustafvs80arsfond/
- L.M.D.G.: Stiftelsen Professor Nanna Svartz Fond 2019-00318 https://www.stiftelsemedel.se/stiftelsen-professor-nanna-svartz-fond/

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: Biology, function and translation. Am J Hum Genet. 2017; 101: 5–22. 10.1016/j.ajhg.2017.06.005
- 2. Buniello A, Macarthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019. 10.1093/nar/gky1120
- 3. Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, et al. The support of human genetic evidence for approved drug indications. Nat Genet. 2015. 10.1038/ng.3314
- 4. Martin P, Ding J, Duffus K, Gaddi VP, Mcgovern A, Ray-Jones H, et al. Chromatin interactions reveal novel gene targets for drug repositioning in rheumatic diseases. Ann Rheum Dis. 2019;78: 1127–1134. 10.1136/annrheumdis-2018-214649
- 5. Okada Y, Wu D, Trynka G, Raj T, Terao C, Ikari K, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 2014; 506: 376–381. 10.1038/nature12873
- 6. Hunter DJ, Drazen JM. Has the Genome Granted Our Wish Yet? N Engl J Med. 2019. 10.1056/NEJMp1904511
- 7. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008;456: 18–21. 10.1038/456018a
- 8. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009. 10.1038/nature08494
- 9. Gilbert-Diamond D, Moore JH. Analysis of gene-gene interactions. Curr Protoc Hum Genet. 2011; Chapter 1, Unit 1.14, 10.1002/0471142905.hg0114s70
- 10. Bloom JS, Ehrenreich IM, Loo WT, Lite TLV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494: 234–237. 10.1038/nature11867
- 11. Forsberg SKG, Bloom JS, Sadhu MJ, Kruglyak L, Carlborg Ö. Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat Genet. 2017;49: 497–503. 10.1038/ng.3800
- 12. Rothman KJ. Causes. Am J Epidemiol. 1976; 104: 587–592. 10.1093/oxfordjournals.aje.a112335
- 13. Vanderweele TJ. Sufficient cause interactions and statistical interactions. Epidemiology. 2009;20: 6–13. 10.1097/EDE.0b013e31818f69e7
- 14. Wang X, Elston RC, Zhu X. The meaning of interaction. Hum Hered. 2011;70: 269–277. 10.1159/000321967
- 15. Diaz-Gallo LM, Ramsköld D, Shchetynsky K, Folkersen L, Chemin K, Brynedal B, et al. Systematic approach demonstrates enrichment of multiple interactions between non-HLA risk variants and HLA-DRB1 risk alleles in rheumatoid arthritis. Ann Rheum Dis. 2018; 77: 1454–1462. 10.1136/annrheumdis-2018-213412
- 16. Padyukov L, Seielstad M, Ong RTH, Ding B, Rönnelid J, Seddighzadeh M, et al. A genome-wide association study suggests contrasting associations in ACPA-positive versus ACPA-negative rheumatoid arthritis. Ann Rheum Dis. 2011;70: 259–265. 10.1136/ard.2009.126821
- 17. Kendler KS, Gardner CO. Interpretation of interactions: guide for the perplexed. Br J Psychiatry 2010; 197: 170–171. 10.1192/bjp.bp.110.081331
- 18. Kim K. Massive false-positive gene-gene interactions by Rothman’s additive model. Ann Rheum Dis. 2019; 78: 437–439. 10.1136/annrheumdis-2018-214297
- 19. Clayton D. Commentary: reporting and assessing evidence for interaction: why, when and how? Int J Epidemiol. 2012; 41: 707–710. 10.1093/ije/dys069
- 20. Weinberg CR. Less is more, except when less is less: Studying joint effects. Genomics 2009; 93: 10–12. 10.1016/j.ygeno.2008.06.002
- 21. Génin E, Clerget-Darpoux F. Revisiting the polygenic additive liability model through the example of diabetes mellitus. Hum Hered. 2015; 80: 171–177. 10.1159/000447683
- 22. Rothman KJ, Greenland S. Modern epidemiology, 2nd edition, page 12, Lippincott Williams Wilkins (Wolters Kluwer), Philadelphia USA, ISBN 0-316-75780-1. 1998.
- 23. Uvehag D, Zazzi H. “Geisa.” https://github.com/menzzana/geisa. 2014
- 24. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81: 559–575. 10.1086/519795
- 25. Karlson EW, Chibnik LB, Kraft P, Cui J, Keenan BT, Ding B, et al. Cumulative association of 22 genetic variants with seropositive rheumatoid arthritis risk. Ann Rheum Dis. 2010; 69: 1077–1085. 10.1136/ard.2009.120170
- 26. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet. 2010; 42: 508–514. 10.1038/ng.582
- 27. van der Walt S, Colbert SC, Varoquaux G. The NumPy Array: A Structure for Efficient Numerical Computation. Comput Sci Eng. 2011; 13, 22–30. 10.1109/MCSE.2011.37
- 28. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020; 17: 261–272. 10.1038/s41592-019-0686-2
- 29. Hunter JD. Matplotlib: A 2D Graphics Environment. Comput Sci Eng. 2007; 9, 90–95. 10.1109/MCSE.2007.55
- 30. McKinney W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference 2010; 51–56
- 31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12, 2825–2830.
- 32. Waskom M. "Seaborn." https://seaborn.pydata.org/. 2012
- 33. Huang S. “Geneview.” https://github.com/ShujiaHuang/geneview. 2016
- 34. Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G et al. Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011. 43: 761–767. 10.1038/ng.873
- 35. Zhou XJ, Lu XL, Nath SK, Lv JC, Zhu SN, Yang HZ, et al. Gene-gene interaction of BLK, TNFSF4, TRAF1, TNFAIP3, and REL in systemic lupus erythematosus. Arthritis Rheum. 2012. 10.1002/art.33318
- 36. Todorov AA, Suarez BK. Genetic liability model. Encyclopedia of Biostatistics 2005. 10.1002/0470011815.b2a05036
- 37. Kim J, Kraft P, Hagan KA, Harrington LB, Lindstroem S, Kabrhel C. Interaction of a genetic risk score with physical activity, physical inactivity, and body mass index in relation to venous thromboembolism risk. Genet Epidemiol. 2018. 42: 354–365. 10.1002/gepi.22118
- 38. Heit JA, Armasu SM, Asmann YW, Cunningham JM, Matsumoto ME, Petterson TM et al. A genome-wide association study of venous thromboembolism identifies risk variants in chromosomes 1q24.2 and 9q. J Thromb Haemost. 2012. 10:1521–1531. 10.1111/j.1538-7836.2012.04810.x
- 39. Rio S, Mary-Huard T, Moreau L, Bauland C, Palaffre C, Madur D et al. Disentangling group specific QTL allele effects from genetic background epistasis using admixed individuals in GWAS: An application to maize flowering. 2020. 16: e1008241. 10.1371/journal.pgen.1008241
- 40. Joiret M, Mahachie John JJ, Gusareva ES, Van Steen K. Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min. 2019. 12: 11. 10.1186/s13040-019-0199-7
- 41. Ignac TM, Skupin A, Sakhanenko NA, Galas DJ. Discovering pair-wise genetic interactions: an information theory-based approach. PLoS ONE 2014; 9: e92310. 10.1371/journal.pone.0092310
- 42. Sokolove J. Rheumatoid arthritis pathogenesis and pathophysiology. Respiratory Medicine, 19–30. 2017. 10.1007/978-3-319-68888-6_2
- 43. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012; 109: 1193–1198. 10.1073/pnas.1119675109
- 44. Sjölander A, Lee W, Källberg H, Pawitan Y. Bounds on causal interactions for binary outcomes. Biometrics 2014; 70: 500–5. 10.1111/biom.12166
- 45. Neovius M, Simard JF, Askling J, ARTIS study group. Nationwide prevalence of rheumatoid arthritis and penetration of disease-modifying drugs in Sweden. Ann Rheum Dis. 2011; 70: 624–629. 10.1136/ard.2010.133371
- 46. Jiang X, Källberg H, Chen Z, Ärlestig L, Rantapää-Dahlqvist S, Davila S, et al. An Immunochip-based interaction study of contrasting interaction effects with smoking in ACPA-positive versus ACPA-negative rheumatoid arthritis. Rheumatology 2016; 55: 149–155. 10.1093/rheumatology/kev285
- 47. Källberg H, Bengtsson C. Chapter One—Terminology and definitions for interaction studies, in between the lines of genetic code. In: Padyukov L, Between the Lines of Genetic Code. Academic Press. pages 3–23. 2014.
- 48. Li W, Reich J. A complete enumeration and classification of two-locus disease models. Hum Hered. 2000; 50: 334–349. 10.1159/000022939