PLOS ONE. 2021 Apr 26;16(4):e0250282. doi: 10.1371/journal.pone.0250282

Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations

Lina-Marcela Diaz-Gallo 1, Boel Brynedal 2, Helga Westerlind 3, Rickard Sandberg 4, Daniel Ramsköld 4,*
Editor: Brecht Devleesschauwer5
PMCID: PMC8075235  PMID: 33901204

Abstract

Understanding the genetic background of complex diseases requires the expansion of studies beyond univariate associations. Therefore, it is important to use interaction assessments of risk factors in order to discover whether, and how, genetic risk variants act together on disease development. The principle of interaction analysis is to explore the magnitude of the combined effect of risk factors on disease causation. In this study, we use simulations to investigate different scenarios of causation to show how the effect magnitudes of two risk factors interact. We mainly focus on the two most commonly used interaction models, the additive and multiplicative risk scales, since there is often confusion regarding their use and interpretation. Our results show that the combined effect is multiplicative when two risk factors are involved in the same chain of events, an interaction called synergism. Synergism is often described as a deviation from additivity, which is a broader term. Our results also confirm that it is often relevant to estimate additive effect relationships, because they correspond to independent risk factors at low disease prevalence. Importantly, we evaluate the threshold of more than two required risk factors for disease causation, called the multifactorial threshold model. We found a simple mathematical relationship (square root) between the threshold and an additive-to-multiplicative linear effect scale (AMLES), where 0 corresponds to an additive effect and 1 to a multiplicative effect. We propose AMLES as a metric that could be used to test different effect relationships at the same time, given that it can simultaneously reveal additive, multiplicative and intermediate risk effect relationships. Finally, the utility of our simulation study was demonstrated using real data by analyzing and interpreting gene-gene interaction odds ratios from a rheumatoid arthritis case-control cohort.

Introduction

Genetic mapping studies of complex human traits have identified thousands of genetic loci implicated in the susceptibility to complex diseases [1, 2]. In other words, genome-wide association studies (GWAS) have linked thousands of single-nucleotide polymorphisms with complex human traits [1, 2]. These findings have been beneficial in areas such as drug repositioning [1, 3–5]. However, the identified individual genetic associations seldom exhibit strong disease risks and explain only a small portion of the calculated heritability for each trait [1, 6–8]. This implies that the genetic associations have a very poor predictive value [6]. Comprehensive interaction analysis between, and among, risk factors is an important tool for understanding the genetic background of complex traits [7–11]. In general, interaction is said to be present when the combined effect magnitude of two or more factors is significantly different from the combined effect magnitude predicted by the model being tested [12–14]. Nevertheless, it is challenging to address whether, and how, genetic risk factors interact in shaping human traits, and to biologically interpret those interactions [9, 14]. Additionally, there is often confusion in the interpretation of the results from different interaction models. In this study, we therefore provide a framework of simulated scenarios that represent common and simple processes where two or more genetic factors may interplay in disease causation. We investigate whether, and how, two or multiple genetic factors interact in each simulated scenario. We also address these relationships in real data from a case-control study in rheumatoid arthritis (RA), the Swedish epidemiological investigation of RA (EIRA) [15, 16].

The association between an individual genetic variant and an outcome (e.g., disease) is typically quantified as an odds ratio (OR) or relative risk (RR). The case-control design is often used for diseases with a low prevalence, where the resulting ORs approximate the RRs. Interaction studies generally examine two risk factors at a time, yielding three ORs (or RRs) notated as [13, 14]: OR11 for carrying both risk factors; OR10 and OR01 for the exclusive combinations; and OR00 for the absence of both risk factors, which is used as reference (OR00 = 1). Two different models are commonly used to test interaction: the additive (whose null hypothesis is OR11 = OR10 + OR01 − 1) and the multiplicative (whose null hypothesis is OR11 = OR10 × OR01). The additive model builds on the sufficient cause concept of KJ Rothman [12], who showed that if two factors are part of the same sufficient cause of a disease (e.g., pathway), then their joint risk will be larger than the sum of their individual risks (often termed “departure from additivity”). This additive model has been criticized for always giving positive results when used as a null hypothesis [17, 18]. On the other hand, the multiplicative model has been criticized as a statistical convenience without a theoretical basis, boosted by the implicit multiplicativity in logistic regression [17, 19, 20]. There is often confusion regarding when and how to use and interpret each of these models.
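The two null hypotheses can be illustrated with a small calculation. The sketch below is not taken from the study's code: the counts are invented for illustration, and the helper names `odds_ratio` and `expected_or11` are hypothetical. It computes OR10, OR01 and OR11 against the doubly unexposed reference group and the joint OR expected under each null.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposure group relative to the doubly unexposed reference."""
    return (cases_exposed / controls_exposed) / (cases_ref / controls_ref)


def expected_or11(or10, or01):
    """Joint odds ratio expected under the additive and multiplicative nulls."""
    return {
        "additive": or10 + or01 - 1,
        "multiplicative": or10 * or01,
    }


# Hypothetical (cases, controls) counts per exposure pattern (X, Z).
counts = {
    (0, 0): (100, 400),
    (1, 0): (60, 120),
    (0, 1): (90, 120),
    (1, 1): (72, 48),
}

ref_cases, ref_controls = counts[(0, 0)]
or10 = odds_ratio(*counts[(1, 0)], ref_cases, ref_controls)  # X only
or01 = odds_ratio(*counts[(0, 1)], ref_cases, ref_controls)  # Z only
or11 = odds_ratio(*counts[(1, 1)], ref_cases, ref_controls)  # X and Z
expected = expected_or11(or10, or01)
```

With these invented counts, the observed OR11 matches the multiplicative expectation and exceeds the additive one, which is the pattern the additive model would report as a departure from additivity.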

In addition to evaluating the additive and multiplicative interaction models, we study a third model in this paper, the multifactorial threshold model [21]. This model assumes that there is a minimum number of factors required for disease causation, an assumption with a theoretical foundation in the concept of genetic liability [22]. Until now, no statistical metric for the multifactorial threshold model has been available. We therefore propose a new metric, AMLES (additive-to-multiplicative linear effect scale), which compares the number of risk factors to a threshold value. Our proposed metric could help to determine the validity of the multifactorial threshold model, which has been criticized for simplifying disease biology [21].

Simulations are able to give answers with a high level of precision if enough data points are generated. They also provide great flexibility to fit a number of scenarios. Here we build different scenarios, in each of which we describe the cause of the outcome (disease or otherwise), with the purpose of tracing how a particular type of causality would be represented in an interaction study. Our intention is to include commonly occurring and simple real-world scenarios, and practical examples from the literature are provided. We further build scenarios where we model situations with confounding factors commonly identified in genetic association studies, such as population stratification. We also show when the modelled scenarios can be distinguished in real data, and when they cannot. Across all simulations the risk factors are dichotomous, and neither necessary nor sufficient for disease to occur. While most of our simulated scenarios have two risk factors, we include modelling of more than two factors and illustrate how it is possible to elucidate multifactorial interactions from a set of pairwise interactions. We compare the simulated scenarios to additive and multiplicative risk scales, aiming to contribute to the understanding and interpretation of different models of interaction. Finally, we propose a metric in the framework of the multifactorial threshold model, which can estimate the fit to the additive and multiplicative models, as well as test against a particular threshold level involving more than two factors.

Methods

Simulations

To better understand interactions between risk factors, we first set up simulations of causation, which are represented in Fig 1A to 1C. Each scenario included a dichotomous outcome (present/absent), and causative processes, termed components. For each of the different scenarios, we performed 1,000 simulations (Figs 1A–1C, 2B, 3A, 3B, S1A–S1C, S2A and S2B Figs). Each simulation consisted of one million data points (i.e., simulated individuals) where presence or absence of a risk allele and status of case or control was assigned in accordance with each model. As a sensitivity analysis, we performed simulations with 100,000 data points. These simulations gave essentially the same results and were therefore not included. For the sake of simplicity, binary factors, corresponding to a dominant or recessive scenario in genetics, were used. We also used subgroups with an equal ratio of cases and controls, named components, except for the three-factor model (Fig 2B), where components 2 and 3 had the same ratio and component 1 had the same ratio as the logical-OR combination of the other two.

Fig 1. Three simulated causal scenarios with selection of equal numbers of cases and controls.

A-C: Simulation schemes for three generalized scenarios in a case-control study context: Synergism of causes (A), heterogeneity of causes (B), and a multifactorial or 5-factor threshold (C). The numbers are example frequencies, and numbers in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. Numbers in italic are the average frequency in the other group of simulated individuals; note that this will depend on the prevalence (which is adjusted in the scenarios in the “split” into cases and controls). “Components” (comp1 and comp2) were used as a strategy to obtain probabilistic risk factors. D: ORs for double risk (OR11) were calculated from the simulation scenarios, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed OR11 were compared to the expected combinations of the odds ratio for single risk (OR10 and OR01) in the additive and multiplicative models. Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to the null hypothesis for the multiplicative model, while blue arrows do the same for the additive model, for the two most extreme simulated prevalence rates. “M. threshold” refers to multifactorial threshold (scenario C). E: Correlation coefficients between the risk factors X and Z, for three sample sets (all, cases only, controls only). The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates.

Fig 2. Relationships among risk factors in the context of the multifactorial threshold model and a 3-factor causal scenario.

A: AMLES (additive-to-multiplicative linear effect scale; grey boxes) and √fest (red curve) calculated using the scenario in Fig 1C and plotted on the y axis. The notches in the box plots show bootstrapped 95% confidence intervals for the medians. The simulation had 0.5% cases and 99.5% controls, with a range ±0.04%. B: A 3-factor simulated scenario. The numbers are example frequencies, and frequencies in bold highlight the higher frequencies representing association with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. This scenario required one “component” (where X is associated with risk) together with either one of another two components (where either Z or V is associated with risk) to produce the outcome. C: Comparison of the eight possible combinations of additive (add) and multiplicative (mult) relationships from the simulation in (B). The second combination, “mult,mult,add” is highlighted by a box since the theoretical expectation was met. There are AND relations between X and Z, X and V but an OR relationship between Z and V. Therefore, RR100 (where only X is present) is multiplied by the addition of Z and V’s relative risks, assuming that at a low trait prevalence the results from logical AND and OR follow the multiplicative and additive models, respectively.

Fig 3. Two simulated confounder scenarios.

A: Simulated confounder I, where the cases and controls come from different genetic backgrounds, which was simulated by testing interactions on the same data that the tested genetic risk factors were selected from. B: Simulated confounder II, where a mix of groups with both different genetic backgrounds and different prevalence rates were simulated. The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. In each indicated circle the two risk factors have been added in an uncorrelated/independent manner. C: For each of the simulated confounder scenarios, the observed OR11 (odds ratios for double risk) were compared to the expected ones for both the additive and multiplicative models. The boxes summarize 1,000 simulation runs with different risk factor frequencies. The boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to the null hypothesis for the multiplicative model, while blue arrows do the same for the additive model, for the two most extreme simulated trait prevalences. D: Correlation coefficients between the risk factors X and Z. The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates. E: Like (C), but for RR.

Allele frequencies for the two exposures, named X and Z, tested in each model, were established before each simulation by drawing a random value within a given range. For instance, if the lower frequency for the risk factor Z was set, a random value between 5% and 15% was chosen, then the higher frequency for Z was set by multiplying this figure by a random number between 1.1 and 4. The allele frequency for the exposure X was established in a similar fashion but using a range between 5% and 25% to select the random value, which was then multiplied by a random number between 1.1 and 2 to set the higher frequency.
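The frequency draw described above can be sketched as follows. This is an illustration rather than the study's code: it assumes the random values are drawn uniformly (the text only says "a random value within a given range"), and the helper name `draw_frequencies` and the seed are hypothetical.

```python
import numpy as np


def draw_frequencies(rng, low_range, fold_range):
    """Draw a (lower, higher) risk-factor frequency pair for one simulation run.

    Assumption: both the lower frequency and the fold increase are uniform draws.
    """
    lower = rng.uniform(*low_range)
    higher = lower * rng.uniform(*fold_range)
    return lower, higher


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Z: lower frequency between 5% and 15%, multiplied by a factor between 1.1 and 4.
z_low, z_high = draw_frequencies(rng, (0.05, 0.15), (1.1, 4.0))
# X: lower frequency between 5% and 25%, multiplied by a factor between 1.1 and 2.
x_low, x_high = draw_frequencies(rng, (0.05, 0.25), (1.1, 2.0))
```

With these ranges the higher frequency never exceeds 0.6 for Z or 0.5 for X, so both remain valid probabilities.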

For each scenario we calculated several metrics. We calculated ORs for the combinations of the risk factors: OR01 for X = 0, Z = 1; OR10 for X = 1, Z = 0; OR11 for X = 1, Z = 1 (Fig 1D), to address interactions between two risk factors using the additive (with the null hypothesis OR11 = OR10 + OR01 − 1) and the multiplicative (with the null hypothesis OR11 = OR10 × OR01) models. Due to the nature of the components, more risk factors can exist in each scenario without interfering with the calculations. For each component there was one risk factor that was explicitly made to correlate with its "present" state (Fig 1A–1C), and the frequencies for the risk factor were varied between simulation runs. We subsampled the controls to match the number of cases and resemble how case-control studies reflect the population in a biased way (Fig 1). We calculated Pearson correlation coefficients between the risk factors X and Z (Fig 1E). For the versions of the scenarios without the subsampling (S1 Fig) we calculated RRs instead, as that is the appropriate population measure.
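A single run of the synergism scenario, with control subsampling, odds ratios and the Pearson correlation, might look like the sketch below. It is one reading of the description above and not the authors' implementation: the component probability, the frequencies, the sample size, the seed and all function names are assumptions chosen for illustration.

```python
import numpy as np


def simulate_synergism(n, p_component, x_freq, z_freq, rng):
    """Simulate one run where the outcome needs both components (logical AND).

    x_freq and z_freq are (lower, higher) frequencies; the higher one applies
    to individuals whose matching component is in the "present" state.
    """
    comp1 = rng.random(n) < p_component
    comp2 = rng.random(n) < p_component
    x = rng.random(n) < np.where(comp1, x_freq[1], x_freq[0])
    z = rng.random(n) < np.where(comp2, z_freq[1], z_freq[0])
    return x, z, comp1 & comp2


def subsample_controls(case, rng):
    """Keep every case and an equally sized random subset of controls."""
    case_idx = np.flatnonzero(case)
    control_idx = np.flatnonzero(~case)
    kept = rng.choice(control_idx, size=case_idx.size, replace=False)
    return np.concatenate([case_idx, kept])


def joint_odds_ratios(x, z, case):
    """Return OR10, OR01 and OR11 relative to the X = 0, Z = 0 group."""
    def odds(mask):
        return np.sum(case & mask) / np.sum(~case & mask)

    ref = odds(~x & ~z)
    return odds(x & ~z) / ref, odds(~x & z) / ref, odds(x & z) / ref


rng = np.random.default_rng(42)  # arbitrary seed
x, z, case = simulate_synergism(200_000, 0.2, (0.1, 0.3), (0.1, 0.3), rng)
idx = subsample_controls(case, rng)
or10, or01, or11 = joint_odds_ratios(x[idx], z[idx], case[idx])

# Pearson correlation between X and Z in the combined ("all") sample.
r_all = np.corrcoef(x[idx].astype(float), z[idx].astype(float))[0, 1]
```

Replacing `comp1 & comp2` with `comp1 | comp2` would give the corresponding heterogeneity run under the same assumptions.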

Interactions in rheumatoid arthritis GWAS

We evaluated both additive and multiplicative interaction models on a human case-control, genome-wide association dataset for anti-citrullinated protein antibody positive (ACPA-positive) rheumatoid arthritis (RA), from EIRA [15, 16]. The two top genetic risk factors for ACPA-positive RA in populations of European descent, HLA-DRB1 shared epitope alleles and PTPN22 rs2476601 T, were tested against all non-HLA risk SNPs. Shared epitope (SE) is a group of HLA-DRB1 alleles with similar effects, and rs2476601 is a non-synonymous coding variant of the PTPN22 gene.

Genotyped and imputed GWAS data from the EIRA study were used in this part of the study (see [15] for sources included). Standard data filtering was performed as previously described [15]. Briefly, SNPs with a genotyping missing rate of 5% or higher, or with P-values below 0.001 for Hardy-Weinberg equilibrium in controls, were excluded. The SNPs located in the extended HLA region (chr6:27339429–34586722, GRCh37/hg19) were removed, due to the high linkage disequilibrium and possible independent signals of association with ACPA-positive RA in the locus.

Departures from additivity or multiplicativity for risk factors were estimated pairwise in the imputed GWAS data (3,138,911 SNPs for the test with HLA-DRB1 shared epitope and 3,308,784 SNPs for the test with PTPN22 rs2476601 T), using GEISA [23], where a dominant model was assumed. In order to control for differences in allelic frequencies due to population stratification and sex, the first ten principal components (which summarized the genotyping data) and sex were used as covariates in this analysis. A cut-off of a minimum of five individuals for each OR combination was applied. The HLA-DRB1 shared epitope alleles included *01 (except *0103), *0404, *0405, *0408 and *1001. The P-values for interaction from these analyses are plotted in Fig 4A and 4B. We included only SNPs at risk allele frequencies between 10% and 50%; however, when we tested all the SNPs at a minor allele frequency above 1%, the results led to the same conclusion.

Fig 4. Application to genome-wide association data for rheumatoid arthritis.

A-B: P-value distribution for two tests, one for deviation from additivity and one from multiplicativity. Two risk factors (see text for alleles) were tested in EIRA data against the rest of the genome except nearby SNPs. Each bin is 0.01 wide. A uniform distribution means a lack of deviation from the null model. C: Manhattan-like plots for the observed p-values from the tested multiplicativity or additivity models of interaction, for HLA-DRB1 shared epitope against the rest of the genome except nearby SNPs. D: Odds ratio (OR) for HLA-DRB1 shared epitope and one other risk factor, compared to that expected from an additive or multiplicative null model. Only known RA risk SNPs from the literature are shown. Black bars show median and 95% confidence intervals (bootstrap). E: The same SNPs as for D but shuffled within cases and shuffled within controls to match confounder I, thereby being a positive control for multiplicative ORs and allowing a comparison of dispersion with D.

To address both additive and multiplicative risk scales and evaluate the behavior of the ORs for double risk exposure (OR11; Fig 4C, 4D, S3 and S4 Figs), we used genotyped EIRA GWAS data (281,195 SNPs). For this analysis, the data were transposed using Plink 1.07 [24]. Known risk SNPs were selected based on ORs higher than 1.1 and 95% confidence intervals not overlapping 1, together with the criterion of having been reported as associated with RA in published case-control RA GWAS [5, 25, 26].

Computational packages

Calculations were done using Python, including the packages NumPy [27], SciPy [28], Matplotlib [29], pandas [30], scikit-learn [31], seaborn [32], jupyter and geneview [33].

Data access

Interaction tables are available at Mendeley Data, doi:10.17632/63b47w6zgr.1 and can be viewed at https://data.mendeley.com/datasets/63b47w6zgr/1. Code is available at https://github.com/danielramskold/additive_risk_heterogeneity_multiplicative_risk_synergism2 where we also provide the code used to generate the figures. Pre-existing genetic data from the EIRA cohort was used. Due to ethical considerations, data from EIRA cannot be publicly shared. Please contact the principal investigators for data requests for applicable studies. For further information go to: https://www.eirasweden.se/Kontakt_EIRA.htm.

Results

Scenarios with risk factors contributing to the explicitly causal components

We arranged a simulation scenario for synergism, which is by far the most commonly identified type of interaction between genetic risk factors. For example, synergism has been shown between peptide-presenting HLA-B and the peptidase ERAP1 in breast cancer [34]. We also simulated a scenario for the heterogeneity model, which is less commonly identified between genetic risk factors. An example of heterogeneity is the relationship between genetic variants from the BLK and TNFSF4 genes in systemic lupus erythematosus [35]. In the synergism scenario, two components need to be in the “present” state for the outcome “case” to occur (i.e. logical AND; Fig 1A). In the heterogeneity scenario, it is enough that either one of two components are in the “present” state for a “case” outcome (i.e. logical OR; Fig 1B). We also sought to investigate the multifactorial threshold model, which has been thought of in terms of genetic liability [36] and has been exemplified in diabetes [21]. Here we chose five components, although any number from three upwards could have been used. In this model a minimum number of factors need to be present for the disease or trait to occur, which in our simulation set means that at least t number of components need to be in the “present” state for a “case” outcome (Fig 1C). We also present the results without subsampling, which matches population studies (S1 Fig). For the two risk factors, the synergism scenario corresponds to Rothman’s synergism concept [13, 22], and the heterogeneity scenario corresponds to genetic heterogeneity.
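The three causal rules can be written as one threshold rule on the number of components in the "present" state. The sketch below is illustrative only: the function name `outcome_from_components`, the component probability of 0.3 and the sample size are assumptions. It shows that a threshold of 1 reduces to logical OR and a threshold equal to the number of components reduces to logical AND.

```python
import numpy as np


def outcome_from_components(components, t):
    """Case status: at least t of the components (columns) are present."""
    return components.sum(axis=1) >= t


rng = np.random.default_rng(1)  # arbitrary seed
# 1,000 simulated individuals with five hypothetical components each.
components = rng.random((1000, 5)) < 0.3

heterogeneity_like = outcome_from_components(components, t=1)  # any component
intermediate = outcome_from_components(components, t=3)        # at least three
synergism_like = outcome_from_components(components, t=5)      # all components
```

Raising the threshold can only turn cases into controls, so each higher threshold defines a subset of the cases produced by a lower one.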

For phenotypes with a low prevalence, the presence of both risk factors had a multiplicative relation in the synergism scenario (Fig 1D), which is sometimes referred to as "super-additive interaction", or "positive interaction" on the additive scale. For the heterogeneity scenario, the presence of both risk factors instead had an additive relation at low prevalence (Fig 1D), which is sometimes referred to as "negative interaction" on the multiplicative scale. The multifactorial threshold scenario equated to the heterogeneity scenario at threshold t = 1 and the synergism scenario at t = 5, where 5 meant that all components needed to be present for the phenotype to occur. We obtained intermediate results for intermediate thresholds, so that at low prevalence the joint behavior of the factors was above additive but below multiplicative (Fig 1D). Without the subsampling (S1E Fig), the scenarios that produced multiplicative relationships lacked correlation between risk factors within cases, within controls, and in the two combined (“all”). However, the subsampling caused a positive correlation between risk factors in the combined (“all”) group (Fig 1E). On the other hand, the additive and intermediate relations, without the subsampling, were reflected in negative correlations among risk factors in cases but not in controls (S1E Fig). This negative correlation between two risk factors can be understood theoretically in the context of the additive model of interaction, since two independent sufficient factors (meaning that there is heterogeneity of causation) should have a strongly negative correlation. Thus, a similar, but attenuated, pattern of correlation between non-sufficient risk factors is expected. We also present results from a simulation of unbiased random sampling (without the subsampling step) that is equivalent to population studies. In this analysis, we detected a multiplicative relation between RRs for the synergism scenario at every prevalence rate (fraction of cases in the population). For the heterogeneity scenario, the relationship between risk factors was additive at low prevalence but less-than-additive at high prevalence.

AMLES, an interaction metric with a simple interpretation for multifactorial thresholds

As we have seen, the multifactorial threshold model gives rise to intermediary interaction relationships between additivity and multiplicativity for the combinations of the risk factors involved (Fig 1D). We therefore set a scale to calculate the intermediary relationships among risk factors. To anchor the scale at 0 for additive relationships, the expected effect from the additive model is subtracted from the observed double exposure OR. Consequently, to set the scale at 1 for multiplicative relationships, the previous subtraction is divided by the difference between the expected OR for the multiplicative model and the expected OR for the additive model. This provides an additive-to-multiplicative linear scale (AMLES):

$$\mathrm{AMLES} = \frac{\mathrm{OR}_{11}(\text{observed}) - \text{expected(additive)}}{\text{expected(multiplicative)} - \text{expected(additive)}}$$

where expected(additive) = OR10 + OR01 − 1 and expected(multiplicative) = OR10 × OR01.
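As a minimal sketch of this definition, the function below computes AMLES from the three odds ratios. The function name `amles` and the example values are hypothetical. Note that the denominator equals (OR10 − 1) × (OR01 − 1), so the scale is undefined when either single-factor OR is exactly 1.

```python
def amles(or11, or10, or01):
    """Additive-to-multiplicative linear effect scale for one pair of risk factors."""
    expected_additive = or10 + or01 - 1
    expected_multiplicative = or10 * or01
    return (or11 - expected_additive) / (expected_multiplicative - expected_additive)


# Hypothetical single-factor odds ratios.
or10, or01 = 2.0, 3.0

additive_case = amles(4.0, or10, or01)        # OR11 equals OR10 + OR01 - 1
multiplicative_case = amles(6.0, or10, or01)  # OR11 equals OR10 * OR01
intermediate_case = amles(5.0, or10, or01)    # halfway between the two
```

The three calls land on 0, 1 and 0.5, the anchors and midpoint of the scale.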

Then, to be able to compare AMLES with the threshold in our simulated threshold model, we scaled the threshold as the ratio:

$$f_{\mathrm{thr}} = \frac{t - 1}{F - 1}$$

where F is the number of components (factors) that could cross the threshold t, in order to have a 0 to 1 scale (see Fig 2). We found that the square root of the threshold ratio fits the scaled OR. This means that the OR at a low prevalence (such as 0.5%) for doubly exposed (OR11) was

$$(1 - \sqrt{f}) \times \text{expected(additive)} + \sqrt{f} \times \text{expected(multiplicative)},$$

where √f is the square root of fthr. The metric used, AMLES, has the convenient properties of having both a natural lower value (0 for additive) and higher value (1 for multiplicative), as well as an interpretable scale between these values; it is therefore also related to the threshold fraction fthr. The metric fest = sign(AMLES) × (AMLES)² has a more natural scale, but unlike AMLES it is not symmetric around the median, making it a less convenient scale (S2 Fig). AMLES is related to another measure of interaction size, relative excess risk due to interaction (RERI = RR11 − RR10 − RR01 + 1) [12, 22], by AMLES = RERI / ((OR10 − 1) × (OR01 − 1)) in cases when ORs and RRs tend to be the same. AMLES could perhaps be used instead of measures like attributable proportion, synergy index and RERI, given that it can be indicative of multiplicative or additive risk regardless of the scale (additive or multiplicative) used for testing.
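The stated link between AMLES and RERI follows directly from the definitions of the two expected values. A short derivation, assuming a low prevalence so that ORs approximate RRs:

```latex
\begin{aligned}
\text{expected(multiplicative)} - \text{expected(additive)}
  &= \mathrm{OR}_{10}\,\mathrm{OR}_{01} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1
   = (\mathrm{OR}_{10} - 1)(\mathrm{OR}_{01} - 1), \\
\mathrm{OR}_{11} - \text{expected(additive)}
  &= \mathrm{OR}_{11} - \mathrm{OR}_{10} - \mathrm{OR}_{01} + 1
   \approx \mathrm{RERI}, \\
\mathrm{AMLES}
  &\approx \frac{\mathrm{RERI}}{(\mathrm{OR}_{10} - 1)(\mathrm{OR}_{01} - 1)}.
\end{aligned}
```

AMLES can therefore be read as RERI rescaled by the product of the two single-factor excess odds ratios.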

Three risk factors

Relationships involving more than two potential causes can be analyzed by pair-wise interactions, as can be seen in our simulation scenario based on the multifactorial threshold model. Although an outcome with more than two causes will not always entail a threshold model, it may be possible to break down the multifactorial relationships into pairs involving either causal heterogeneity or synergism. For example, in the case of outcome = X AND (Z OR V) = (X AND Z) OR (X AND V), the relationships among these risk factors should be distinguishable as causal heterogeneity between Z and V (from the logical OR) and synergism between X and each of the other risk factors (from the logical AND) (Fig 2B). This might be how body mass index (BMI) interacts with variants of the F5 and ABO genes (as if X = BMI, Z = F5 SNP, V = ABO SNP) in venous thromboembolism risk [37, 38]. Our simulation performed as predicted when we tested this example against the expected RR from all eight possible combinations of both logical OR and AND (Fig 2C, S1C Fig).
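The expectation for outcome = X AND (Z OR V) can be checked with exact probabilities instead of simulated individuals. The sketch below is an illustration under stated assumptions, not the authors' calculation: the component probabilities are invented, the name `risk` is hypothetical, and the combined expectation RR100 × (RR010 + RR001 − 1) is one reading of "mult,mult,add" that applies the two-factor additive null to Z and V.

```python
def risk(a, b, c):
    """P(outcome) when it requires component 1 AND (component 2 OR component 3)."""
    return a * (1 - (1 - b) * (1 - c))


# Hypothetical probabilities that each component is "present", without (index 0)
# and with (index 1) the matching risk factor X, Z or V.
a = (0.010, 0.020)
b = (0.001, 0.003)
c = (0.001, 0.003)

baseline = risk(a[0], b[0], c[0])
rr100 = risk(a[1], b[0], c[0]) / baseline  # only X present
rr010 = risk(a[0], b[1], c[0]) / baseline  # only Z present
rr001 = risk(a[0], b[0], c[1]) / baseline  # only V present
rr111 = risk(a[1], b[1], c[1]) / baseline  # all three present

# "mult,mult,add": multiply X's relative risk by the additive combination of Z and V.
expected_mult_mult_add = rr100 * (rr010 + rr001 - 1)
# For contrast, the fully multiplicative expectation.
expected_all_mult = rr100 * rr010 * rr001
```

At these low probabilities the triple-exposure RR lies within a fraction of a percent of the "mult,mult,add" expectation and well below the fully multiplicative one.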

Scenarios where confounder effects cause false interaction

We investigated two confounder scenarios, neither of which includes any causal relationship between the risk factors (e.g., heterogeneity or synergism); they are therefore applicable when either or both are false risk factors. The first scenario describes the situation where the genetic backgrounds of cases and controls are mismatched (Fig 3A). It can also be described as an effect of data reuse, where the same data set is used to define risk factors and to calculate interactions, as was done in a previous interaction study [18]. In the other scenario we simulated multiple ancestry groups, which may translate to different allelic frequencies across groups, a well-known confounder that is generally considered in the design or corrected for in genetic association studies [39]. In our simulation we condensed it down to two groups (Fig 3B). We did not model linkage disequilibrium (LD) in our simulations, as the type of simulation we used does not accommodate LD, but we tested our observations on a real GWAS of ACPA-positive RA (anti-citrullinated protein antibody positive rheumatoid arthritis). It may be advisable to prune for LD when addressing interactions in genome-wide data sets [40].

For the first confounder scenario (Fig 3A), ORs (Fig 3C) and correlations (Fig 3D) at high fractions of cases are more relevant than RRs (Fig 3E). We found the ORs relationship to be multiplicative, regardless of the balance between cases and controls, which makes this confounder scenario indistinguishable from real synergism. This was also true for correlation coefficients, where it would only be in population data that it would be possible to distinguish synergism from the confounder scenario (Figs 1E and 3D, S1E Fig) by the former’s lack of correlation in that situation. The second confounder scenario (Fig 3B) had negative-additive ORs and RRs relationships, matching additive in a high proportion of cases (Fig 3C and 3E). The risk factors correlation among the three groups used (all, cases, controls) was always positive (Fig 3D), which should be useful in distinguishing this confounder scenario from real interactions. We also tested what happens when adding a biased subsampling step, as in Fig 1A–1C, to Confounder II, but it made no difference to the results (S3 Fig). We also devised one simulation scheme that always produced additive RRs relationships and one that always produced additive ORs relationships (S4 Fig), because such schemes could guide randomization, and to complement the always-multiplicative OR of the first confounder scenario.

Example from rheumatoid arthritis

Both the synergism and heterogeneity scenarios represent interesting relationships between two risk factors, thus the appropriate interaction model to use depends on the hypothesis one is interested in. Alternatively, a hypothesis-free approach would be ideal, and the closest to that is to evaluate both additive and multiplicative hypotheses, as this would cover both models as well as threshold-based scenarios due to their intermediate nature (i.e. they would fail both types of tests, in the positive direction for additive and negative direction for multiplicative). We therefore evaluated both additive and multiplicative interaction for the two top genetic risk factors (HLA-DRB1 shared epitope alleles and PTPN22 rs2476601 T allele) for ACPA-positive RA in European-descent population, using GWAS data from the EIRA study. We detected no deviation for multiplicativity, but did for additivity (Fig 4A and 4B), as reported before for HLA-DRB1 [15]. The new simulations presented here increases our ability to interpret this result as a widespread interaction between HLA-DRB1 shared epitope and all non-HLA genetic risk factors, in the common meaning of interaction where synergism is a type of interaction. Correlation analyses backed up synergism (but not heterogeneity or population stratification) as an appropriate interpretation of the results (S5 Fig). From this, we could derive that the HLA-DRB1 shared epitope cannot be substituted or phenocopied by a non-HLA genetic risk factor for its part in the chain of ACPA-positive RA etiology (Fig 4A, 4C and 4D). The same was the case for the PTPN22 risk allele, given the similarities in P-value distributions observed (Fig 4A and 4B). For both set of tests there were a majority of tested loci where there was too little data to distinguish additive from multiplicative ORs. 
We followed up the results of multiplicativity by looking only at known risk SNPs (Fig 4D), finding results similar to a randomization based on the Confounder I scenario (and therefore bound to produce multiplicative odds ratios), with similar variability (P = 0.6–0.8, Levene’s test, n = 9–11 SNPs) implying a dearth of non-multiplicative ORs relationships (Fig 4E). This randomization is the same as Test III of a previous study by Ignac et al [41]. We also devised a randomization scheme creating additive ORs based on the Additive O scheme of S4 Fig (intended to provide always additive effect relationships) and tested it on the full SNP set (S6 Fig). This result showed a very noticeable deviation from the real data, as expected.
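
A randomization that shuffles a risk factor separately within cases and within controls can be sketched as below. This is a hedged illustration of the general idea, not the implementation used for the analyses above: the function name, argument layout and toy data are assumptions. Shuffling within each group keeps the risk factor frequency of that group unchanged while breaking any within-group dependence between loci, and under such within-group independence the double-risk OR is expected to equal the product of the single-risk ORs.

```python
import random


def shuffle_within_groups(status, genotype, rng):
    """Return a copy of `genotype` permuted separately within each status group.

    status:   sequence of group labels (e.g. 1 = case, 0 = control)
    genotype: sequence of 0/1 risk factor indicators, one per individual
    rng:      a random.Random instance, passed in for reproducibility
    """
    shuffled = list(genotype)
    for group in set(status):
        indices = [i for i, s in enumerate(status) if s == group]
        values = [genotype[i] for i in indices]
        rng.shuffle(values)  # permute only among members of this group
        for i, value in zip(indices, values):
            shuffled[i] = value
    return shuffled


# Toy data (hypothetical): 4 cases followed by 6 controls.
status = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
genotype = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
randomized = shuffle_within_groups(status, genotype, random.Random(0))
```

Applying the function to each locus independently, with the case or control labels left untouched, preserves the single-risk-factor associations while removing any joint structure beyond them.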

Discussion

We herein present a simulation approach intended to help with the interpretation of additive and multiplicative models of interaction of RRs and ORs. We show that addition of ORs, or negative deviation from the multiplicative interaction model (OR11 < 1, when testing OR11 = OR10 × OR01), occurs when the risk comprises two independent risk factors (heterogeneity scenario, Fig 1B), or a process that approximates that setup at a given fraction of cases (S4 Fig). Multiplication of ORs, or positive deviation from the additive interaction model (OR11 > 1, when testing OR11 = OR10 + OR01 − 1), follows if two different mechanisms are required for disease (synergism scenario, Fig 1A). One should therefore choose the appropriate statistical test depending on the hypothesis. To identify deviation from synergism, deviation from a multiplicative relationship could be tested. Often, however, if we are interested in testing whether disease is caused by the interaction of two factors, it is appropriate to test for deviation from additivity. A multiplicative assumption would have merit in our testing against HLA-DRB1 shared epitope and PTPN22 if ACPA-positive RA were a homogeneous set of causes, rather than the kind of heterogeneity of causation that we have shown gives rise to additivity between risk factors. Nevertheless, ACPA-positive RA may have a certain level of heterogeneity of causes, despite being defined as a subgroup of patients where ACPA positivity is a mediating risk factor [42]. Therefore, in this study, we also tested the multiplicative interaction between the strongest genetic risk for ACPA-positive RA and other risk-SNPs in the same material as previously published by Diaz-Gallo et al [15]; these results showed no deviation from the null hypothesis for the multiplicative model of interaction. However, we only tested the main other model, not, for example, some version of the multifactorial threshold model.
In light of the new understanding that our simulations give, the presence of deviation from additivity, along with no deviation from multiplicativity, supports the existence of widespread synergism between the genetic risk factors in causing ACPA-positive RA. This result is also applicable to the estimation of heritability for RA, since the assumption of an additive model would lead to an underestimation of narrow-sense heritability [43].
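
The two null hypotheses named above, OR11 = OR10 + OR01 − 1 and OR11 = OR10 × OR01, can be written out explicitly. The following is a minimal Python sketch and not the code used in this study; the function names, the input layout and the example counts are hypothetical. It computes the three odds ratios against the group carrying neither risk factor and returns the values of OR11 expected under each model.

```python
def odds_ratio(exposed, reference):
    """Odds ratio of an exposure group against the reference group.

    Each argument is a (n_cases, n_controls) pair.
    """
    cases_e, controls_e = exposed
    cases_r, controls_r = reference
    # Cross-product form of (cases_e / controls_e) / (cases_r / controls_r).
    return (cases_e * controls_r) / (controls_e * cases_r)


def interaction_summary(counts):
    """Compare the observed OR11 with additive and multiplicative expectations.

    `counts` maps (x, z), each 0 or 1, to (n_cases, n_controls), where x and z
    flag presence of the two risk factors (assumed input layout).
    """
    reference = counts[(0, 0)]
    or10 = odds_ratio(counts[(1, 0)], reference)
    or01 = odds_ratio(counts[(0, 1)], reference)
    or11 = odds_ratio(counts[(1, 1)], reference)
    return {
        "OR10": or10,
        "OR01": or01,
        "OR11": or11,
        "additive_expected": or10 + or01 - 1,
        "multiplicative_expected": or10 * or01,
    }


# Invented counts for illustration only: (n_cases, n_controls) per group.
example = interaction_summary({
    (0, 0): (100, 1000),
    (1, 0): (200, 1000),
    (0, 1): (300, 1000),
    (1, 1): (600, 1000),
})
```

For these invented counts, OR10 = 2, OR01 = 3 and OR11 = 6, so the observed double-risk OR matches the multiplicative expectation (6) and exceeds the additive one (4). An empty cell raises a ZeroDivisionError in this sketch, so loci with zero counts would need separate handling.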

Synergism can be viewed as a chain of events scenario, whereas causal heterogeneity corresponds to phenocopying. In terms of Rothman’s sufficient-cause model, the risk factors X and Z in the synergism scenario correspond to risk factors in the same cause, referred to as causal co-action, joint action or synergism, whereas X and Z in the heterogeneity scenario correspond to risk factors in different causes [22].
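
The contrast between the two scenarios can be illustrated with a small closed-form calculation. This is a sketch under stated assumptions rather than the simulation scheme of this study: two independent, probabilistic components are assumed, where `p1[x]` is the probability of component 1 given absence (index 0) or presence (index 1) of X, and `p2[z]` likewise for Z; all names and probabilities are hypothetical. Disease requires both components in the synergism case (logical AND) and either component in the heterogeneity case (logical OR).

```python
def relative_risks(risk):
    """Return (RR10, RR01, RR11) for a function risk(x, z) -> P(disease)."""
    baseline = risk(0, 0)
    return risk(1, 0) / baseline, risk(0, 1) / baseline, risk(1, 1) / baseline


def synergism(p1, p2):
    """Both components are required for disease (same sufficient cause)."""
    return lambda x, z: p1[x] * p2[z]


def heterogeneity(p1, p2):
    """Either component alone is sufficient (different sufficient causes)."""
    return lambda x, z: 1 - (1 - p1[x]) * (1 - p2[z])


# Hypothetical component probabilities, indexed by risk factor absent/present.
syn_rr10, syn_rr01, syn_rr11 = relative_risks(
    synergism((0.01, 0.03), (0.02, 0.08))
)
het_rr10, het_rr01, het_rr11 = relative_risks(
    heterogeneity((0.001, 0.003), (0.002, 0.008))
)

# Deviation of the double-risk RR from each model's expectation.
syn_gap_multiplicative = syn_rr11 - syn_rr10 * syn_rr01
het_gap_additive = het_rr11 - (het_rr10 + het_rr01 - 1)
het_gap_multiplicative = het_rr11 - het_rr10 * het_rr01
```

In the synergism case the risk factorizes, so RR11 equals RR10 × RR01 up to floating-point error. In the heterogeneity case with low component probabilities, RR11 lies close to RR10 + RR01 − 1 and well below the product, in line with additivity for independent causes at low prevalence.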

The fact that most loci showed no statistically significant deviation from either additive or multiplicative interaction will be the unfortunate reality for many applications of interaction testing. While statistical power for single risk factor testing scales with the inverse square of the number of samples, already forcing large GWAS sample sizes, the statistical power for interaction testing scales to the inverse power of four [43], thus requiring far larger sample sizes than standard association testing.

Our inspiration for this work came from a simulation study [18] in which case or control status was randomly assigned, one risk factor was simulated to resemble the strongest genetic risk factor for ACPA-positive RA, and interaction with other risk factors (selected by P-value for risk) was computed. The simulation led to an overrepresentation of additive interactions (i.e., deviation from additive odds ratios). Our Confounder I scenario (Fig 3A) produced an analogous result. The author [18] noted that his simulation produced a multiplicative null model that does not match additivity and concluded that the additive interactions observed were erroneous, as no interaction should be present. We propose an alternative interpretation, based on the convergences we found: the Confounder I scenario, and thereby also the previous simulation [18], perfectly mimics synergism in a case-control study with a similar number of cases and controls selected from a population with a low prevalence, as can be seen by comparing our figures (Figs 1D, 1E, 3C and 3D). By low prevalence, we mean the 0.5% mark in our simulations, which is similar to RA prevalence (0.7% for all RA in Sweden [44], of which 60% are ACPA-positive [45]). It should be no surprise that a simulation [18] of synergism, which of course is interaction [46], produces many significant results, although it is of course worrying that a confounder like data reuse can produce this effect, as the study [18] showed. We conclude that the previous simulation study [18] is in line with the confusion over additive and multiplicative interaction that can sometimes be found in the literature [47], highlighting the need to better understand the relationships among risk factors that different interaction types imply.

In this simulation study, we demonstrate the causal interpretations of additive and multiplicative interaction in both the RR and OR settings. Some of this has been understood intuitively in the past, especially the connection between multiplicative effect and logical AND [48], but here we contribute further clarification of the situation through simulation. We hope that this will help guide the interpretation of future interaction studies.

Supporting information

S1 Fig. Three simulated causal scenarios.

A-C: Simulation schemes for three generalized scenarios: synergism of causes (A), heterogeneity of causes (B) and a 5-factor threshold (C). The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. The numbers in italics are the average frequency in the other group of simulated individuals; note that this will depend on the prevalence (which is adjusted in the scenarios in the split between cases and controls). “Components” (comp1 and comp2) were used as a strategy to obtain probabilistic risk factors. D: The relative risks for double risk (RR11) calculated from the simulation scenarios, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed RR11 are compared to the additive and multiplicative combinations of the relative risks for single risk (RR10 and RR01). Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to multiplicativity, while blue arrows do the same for additivity, for the two most extreme simulated prevalence rates. “M. threshold” means scenario C. E: Correlation coefficients between the risk factors X and Z. The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates. We left out the highest prevalence results due to division-by-zero difficulties in the multifactorial threshold scenario.

(PDF)

S2 Fig. Multifactorial threshold model with fest metric.

The y axis is the signed square of the y axis in Fig 2A, otherwise this is the same plot, i.e. based on the scenario in Fig 1C. The red curve is the threshold fraction; this and fest make up the y axis. The notches in the box plots show bootstrapped 95% confidence intervals for the medians. The simulation had 0.5% cases and the rest controls, with a range ±0.04%.

(PDF)

S3 Fig. No apparent difference from subsampling to equal number of controls and cases in confounder II.

As in the scheme to the left, we simulated with (purple arrows) or without (green arrow) a step that reduced the number of controls to the same as the number of cases. As can be seen in the box plots above, this does not appear to have any effect on the results.

(PDF)

S4 Fig. Schemes providing additivity.

A-B: Simulation for two schemes intended to produce additive effect. The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. The numbers in italics are the average frequency in the other group of simulated individuals; note that this will depend on how many cases there are to controls, specifically, higher_freq × case_fraction + lower_freq × (1 − case_fraction). In each indicated circle the two risk factors have been added in an uncorrelated/independent manner. C-D: The relative risks (C) and odds ratios (D) for double risk (RR11 or OR11) calculated from the simulation schemes, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed RR11 or OR11 were compared to the expected combinations of the relative risks or odds ratios for single risk (RR10 and RR01, or OR10 and OR01) in the additive and multiplicative models. Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to the null hypothesis for the multiplicative model, while blue arrows do the same for the additive model, for the two most extreme simulated trait prevalence rates. E: Correlation coefficients between the risk factors X and Z. The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates.

(PDF)

S5 Fig. Shared epitope correlation relationships.

Correlation between HLA-DRB1 Shared Epitope (calculated as codominant) and other SNPs (calculated as dominant) in EIRA data within cases, controls or all samples, for all non-HLA SNPs (A, C) or only known non-HLA risk SNPs (B). P-values come from 1-sample t-tests against zero. Regressing out sex and the first ten principal components gave essentially the same result (C) as without this step (A, B).

(PDF)

S6 Fig. Comparison of EIRA double risk odds ratios to two types of randomizations.

A-C: SE is HLA-DRB1 Shared Epitope, “other” are non-HLA risk SNPs for ACPA-positive RA. Unmod. or Unmodified is the EIRA data as it is. Rand1 shuffles within groups (e.g. within cases), for each locus independently, to match the independence part of the confounder scenario of Fig 3A (confounder I). Rand2 assigns individuals randomly into two groups and then in one group samples risk factor SE from controls and in the other group samples the other SNP risk factor from controls, to match the Additive O scheme of S4 Fig. We suspect that Rand2 unfortunately moderates odds ratios, meaning it could use a better replacement. Unmodified and Rand1 have similar variability (P = 0.04–0.8, Levene’s test, excluding (C) where calculations failed (NaN)). The middle of the scale for (C) is magnified to ease viewing. Blue arrows highlight where the median is visibly close to additivity, while yellow arrows do the same for multiplicativity. D: SNPs from Fig 4C and 4D, in varying numbers due to division-by-zero errors, with effect scaled between additive (blue line) and multiplicative (yellow line), as in Fig 2A.

(PDF)

Acknowledgments

We thank Anton Larsson for helping with the literature search, Lars Klareskog, Lars Alfredsson and Leonid Padyukov for providing access to EIRA data and engaging in scientific discussions, and Janet Ahlberg for English editing.

Data Availability

Pre-existing genetic data from the EIRA cohort was used. Due to ethical considerations, data from EIRA cannot be publicly shared. Please contact the EIRA’s principal investigators for data requests for applicable studies. For further information go to: https://www.eirasweden.se/Kontakt_EIRA.htm.

Funding Statement

L.M.D.G.: Ulla och Gustaf af Uggla Foundation 2018-02670 https://staff.ki.se/ulla-and-gustaf-af-uggla-foundation L.M.D.G.: Reumatikerförbundet R-861801, R-932138 https://reumatiker.se L.M.D.G.: Konung Gustaf V:s 80-årsfond FAI-2018-0518, FAI-2019-0597 https://www.kungahuset.se/monarkinhovstaterna/kungligastiftelser/forskning/konunggustafvs80arsfond/ L.M.D.G.: Stiftelsen Professor Nanna Svartz Fond 2019-00318 https://www.stiftelsemedel.se/stiftelsen-professor-nanna-svartz-fond/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 years of GWAS discovery: Biology, function and translation. Am J Hum Genet. 2017; 101: 5–22. doi: 10.1016/j.ajhg.2017.06.005
  • 2. Buniello A, Macarthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019. doi: 10.1093/nar/gky1120
  • 3. Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, et al. The support of human genetic evidence for approved drug indications. Nat Genet. 2015. doi: 10.1038/ng.3314
  • 4. Martin P, Ding J, Duffus K, Gaddi VP, Mcgovern A, Ray-Jones H, et al. Chromatin interactions reveal novel gene targets for drug repositioning in rheumatic diseases. Ann Rheum Dis. 2019; 78: 1127–1134. doi: 10.1136/annrheumdis-2018-214649
  • 5. Okada Y, Wu D, Trynka G, Raj T, Terao C, Ikari K, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014; 506: 376–381. doi: 10.1038/nature12873
  • 6. Hunter DJ, Drazen JM. Has the Genome Granted Our Wish Yet? N Engl J Med. 2019. doi: 10.1056/NEJMp1904511
  • 7. Maher B. Personal genomes: The case of the missing heritability. Nature. 2008; 456: 18–21. doi: 10.1038/456018a
  • 8. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009. doi: 10.1038/nature08494
  • 9. Gilbert-Diamond D, Moore JH. Analysis of gene-gene interactions. Curr Protoc Hum Genet. 2011; Chapter 1, Unit 1.14. doi: 10.1002/0471142905.hg0114s70
  • 10. Bloom JS, Ehrenreich IM, Loo WT, Lite TLV, Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013; 494: 234–237. doi: 10.1038/nature11867
  • 11. Forsberg SKG, Bloom JS, Sadhu MJ, Kruglyak L, Carlborg Ö. Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat Genet. 2017; 49: 497–503. doi: 10.1038/ng.3800
  • 12. Rothman KJ. Causes. Am J Epidemiol. 1976; 104: 587–592. doi: 10.1093/oxfordjournals.aje.a112335
  • 13. Vanderweele TJ. Sufficient cause interactions and statistical interactions. Epidemiology. 2009; 20: 6–13. doi: 10.1097/EDE.0b013e31818f69e7
  • 14. Wang X, Elston RC, Zhu X. The meaning of interaction. Hum Hered. 2011; 70: 269–277. doi: 10.1159/000321967
  • 15. Diaz-Gallo LM, Ramsköld D, Shchetynsky K, Folkersen L, Chemin K, Brynedal B, et al. Systematic approach demonstrates enrichment of multiple interactions between non-HLA risk variants and HLA-DRB1 risk alleles in rheumatoid arthritis. Ann Rheum Dis. 2018; 77: 1454–1462. doi: 10.1136/annrheumdis-2018-213412
  • 16. Padyukov L, Seielstad M, Ong RTH, Ding B, Rönnelid J, Seddighzadeh M, et al. A genome-wide association study suggests contrasting associations in ACPA-positive versus ACPA-negative rheumatoid arthritis. Ann Rheum Dis. 2011; 70: 259–265. doi: 10.1136/ard.2009.126821
  • 17. Kendler KS, Gardner CO. Interpretation of interactions: guide for the perplexed. Br J Psychiatry. 2010; 197: 170–171. doi: 10.1192/bjp.bp.110.081331
  • 18. Kim K. Massive false-positive gene-gene interactions by Rothman’s additive model. Ann Rheum Dis. 2019; 78: 437–439. doi: 10.1136/annrheumdis-2018-214297
  • 19. Clayton D. Commentary: reporting and assessing evidence for interaction: why, when and how? Int J Epidemiol. 2012; 41: 707–710. doi: 10.1093/ije/dys069
  • 20. Weinberg CR. Less is more, except when less is less: Studying joint effects. Genomics. 2009; 93: 10–12. doi: 10.1016/j.ygeno.2008.06.002
  • 21. Génin E, Clerget-Darpoux F. Revisiting the polygenic additive liability model through the example of diabetes mellitus. Hum Hered. 2015; 80: 171–177. doi: 10.1159/000447683
  • 22. Rothman KJ, Greenland S. Modern epidemiology, 2nd edition, page 12, Lippincott Williams Wilkins (Wolters Kluwer), Philadelphia USA, ISBN 0-316-75780-1. 1998.
  • 23. Uvehag D, Zazzi H. "Geisa." https://github.com/menzzana/geisa. 2014
  • 24. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81: 559–575. doi: 10.1086/519795
  • 25. Karlson EW, Chibnik LB, Kraft P, Cui J, Keenan BT, Ding B, et al. Cumulative association of 22 genetic variants with seropositive rheumatoid arthritis risk. Ann Rheum Dis. 2010; 69: 1077–1085. doi: 10.1136/ard.2009.120170
  • 26. Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet. 2010; 42: 508–514. doi: 10.1038/ng.582
  • 27. van der Walt S, Colbert SC, Varoquaux G. The NumPy Array: A Structure for Efficient Numerical Computation. Comput Sci Eng. 2011; 13: 22–30. doi: 10.1109/MCSE.2011.37
  • 28. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020; 17: 261–272. doi: 10.1038/s41592-019-0686-2
  • 29. Hunter JD. Matplotlib: A 2D Graphics Environment. Comput Sci Eng. 2007; 9: 90–95. doi: 10.1109/MCSE.2007.55
  • 30. McKinney W. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference. 2010; 51–56
  • 31. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12: 2825–2830.
  • 32. Waskom M. "Seaborn." https://seaborn.pydata.org/. 2012
  • 33. Huang S. "Geneview." https://github.com/ShujiaHuang/geneview. 2016
  • 34. Evans DM, Spencer CC, Pointon JJ, Su Z, Harvey D, Kochan G, et al. Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011; 43: 761–767. doi: 10.1038/ng.873
  • 35. Zhou XJ, Lu XL, Nath SK, Lv JC, Zhu SN, Yang HZ, et al. Gene-gene interaction of BLK, TNFSF4, TRAF1, TNFAIP3, and REL in systemic lupus erythematosus. Arthritis Rheum. 2012. doi: 10.1002/art.33318
  • 36. Todorov AA, Suarez BK. Genetic liability model. Encyclopedia of Biostatistics. 2005. doi: 10.1002/0470011815.b2a05036
  • 37. Kim J, Kraft P, Hagan KA, Harrington LB, Lindstroem S, Kabrhel C. Interaction of a genetic risk score with physical activity, physical inactivity, and body mass index in relation to venous thromboembolism risk. Genet Epidemiol. 2018; 42: 354–365. doi: 10.1002/gepi.22118
  • 38. Heit JA, Armasu SM, Asmann YW, Cunningham JM, Matsumoto ME, Petterson TM, et al. A genome-wide association study of venous thromboembolism identifies risk variants in chromosomes 1q24.2 and 9q. J Thromb Haemost. 2012; 10: 1521–1531. doi: 10.1111/j.1538-7836.2012.04810.x
  • 39. Rio S, Mary-Huard T, Moreau L, Bauland C, Palaffre C, Madur D, et al. Disentangling group specific QTL allele effects from genetic background epistasis using admixed individuals in GWAS: An application to maize flowering. PLoS Genet. 2020; 16: e1008241. doi: 10.1371/journal.pgen.1008241
  • 40. Joiret M, Mahachie John JJ, Gusareva ES, Van Steen K. Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies. BioData Min. 2019; 12: 11. doi: 10.1186/s13040-019-0199-7
  • 41. Ignac TM, Skupin A, Sakhanenko NA, Galas DJ. Discovering pair-wise genetic interactions: an information theory-based approach. PLoS ONE. 2014; 9: e92310. doi: 10.1371/journal.pone.0092310
  • 42. Sokolove J. Rheumatoid arthritis pathogenesis and pathophysiology. Respiratory Medicine, 19–30. 2017. doi: 10.1007/978-3-319-68888-6_2
  • 43. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012; 109: 1193–1198. doi: 10.1073/pnas.1119675109
  • 44. Sjölander A, Lee W, Källberg H, Pawitan Y. Bounds on causal interactions for binary outcomes. Biometrics. 2014; 70: 500–5. doi: 10.1111/biom.12166
  • 45. Neovius M, Simard JF, Askling J, ARTIS study group. Nationwide prevalence of rheumatoid arthritis and penetration of disease-modifying drugs in Sweden. Ann Rheum Dis. 2011; 70: 624–629. doi: 10.1136/ard.2010.133371
  • 46. Jiang X, Källberg H, Chen Z, Ärlestig L, Rantapää-Dahlqvist S, Davila S, et al. An Immunochip-based interaction study of contrasting interaction effects with smoking in ACPA-positive versus ACPA-negative rheumatoid arthritis. Rheumatology. 2016; 55: 149–155. doi: 10.1093/rheumatology/kev285
  • 47. Källberg H, Bengtsson C. Chapter One: Terminology and definitions for interaction studies, in between the lines of genetic code. In: Padyukov L, Between the Lines of Genetic Code. Academic Press. pages 3–23. 2014.
  • 48. Li W, Reich J. A complete enumeration and classification of two-locus disease models. Hum Hered. 2000; 50: 334–349. doi: 10.1159/000022939

Decision Letter 0

Brecht Devleesschauwer

29 Oct 2020

PONE-D-20-24738

Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations

PLOS ONE

Dear Dr. Ramsköld,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 13 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to EACH point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Brecht Devleesschauwer

Academic Editor

PLOS ONE

Additional Editor Comments:

Based on the reviewer's comment, we invite the authors to submit a revised version of the manuscript with substantial revision in terms of clearly presenting the goals and the biological reasoning behind each presented simulation scenario.

In your revision note, please include EACH of the reviewer comments, provide your reply, and when relevant, include the modified/new text (or motivate why you decided not to modify the text). Note that failure to do so may result in a rejection of the manuscript.

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS ONE publication criteria and journal policy require authors to make data underlying the findings described in their manuscript fully available (https://journals.plos.org/plosone/s/data-availability#loc-acceptable-data-access-restrictions). Please describe how other researchers could access the cohort datasets used in your study.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a nice paper investigating interaction in several context and providing interpretation and advantages of measures of interaction. The research question is interesting, however I feel several points need to be clarified.

First, it is not fully clear what the practical utility of the study is. As it stands, this looks like a nice computational study without clear implications. It would be very useful to motivate biologically the simulation scenarios. How were they chosen? What possible situations do they represent? This would be really useful to understand the practical utility of the study beyond the presented example. I recommend providing some practical examples for each of the presented scenario, like situations where it could arise, or published papers. Also, and importantly, the introduction is built on introducing the need of evaluating interactions between genetics and environmental factors, but it seems that all simulations, as well as the provided example, are based on a situation of genetic-genetic interactions? I suggest either revise the introduction accordingly, or include more simulations and examples showing the utility of the measure in the context of GxE interactions.

Second, the abstract needs a substantial revision. It should stand by itself and convey the main messages of the paper, but now it is possible to understand the abstract only after having read the manuscript. Several concepts are introduced without any explanation, like multifactorial models, threshold models, intermediaries between additive and multiplicative interaction. Simulations are not motivated. The sentence at lines 6-7, which seems to be key, is not clear (missing a verb?).

Third, the newly introduced formula (lines 173 and after) seems to me the most useful and important contribution. However, this (meaning the need for this formula and its potential advantages) is not sufficiently motivated in the introduction section. Given the importance of this result, I suggest revising the introduction and the abstract to emphasize the need for an intermediary formula between additive and multiplicative interaction, as well as the potential advantages, to motivate its introduction and use in different settings.

Minor comments.

Lines 7-8. For synergy, the two risk factors are required rather than contributing.

Lines 25-26. Since until here you have been talking about multiple risk factors, it is good to mention that you are now referring to only 2 factors.

Page 3. Y is universally used to refer to the outcome, I suggest using Z-X for referring to the 2 simultaneous exposures, and then another letter for the third exposure later.

Line 99. Principal components summarizing what?

Line 433. I couldn’t find detailed figure legends. I think these would be useful to present what the different panels are showing and to provide a brief understanding of the figures without the need to read the full manuscript.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Apr 26;16(4):e0250282. doi: 10.1371/journal.pone.0250282.r002

Author response to Decision Letter 0


12 Dec 2020

This is a copy of the included rebuttal letter file (labeled Response to Reviewers), which has better formatting:

Editorial comments:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOSONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Based on PLOS ONE’s style requirements, we have fixed the bolding of supplemental figure captions and removed the zip codes in the affiliations.

2. PLOS ONE publication criteria and journal policy require authors to make data underlying the findings described in their manuscript fully available (https://journals.plos.org/plosone/s/data-availability#loc-acceptable-data-access-restrictions). Please describe how other researchers could access the cohort datasets used in your study.

We have added text about the cohort dataset under Data Access, stating that "[d]ue to ethical permission, data from EIRA cannot be publicly shared. Please contact the principal investigators for data requests for applicable studies. For further information go to: https://www.eirasweden.se/Kontakt_EIRA.htm".

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

We will provide a DOI after acceptance, before publication.

4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

We have now added the supplemental figure S3. Therefore, we removed the "data not shown" statement from the sentence at current line number 283.

We have also removed the "data not shown" from figure legend S6 and weakened the statement around it so that it does not require the data in question. The data itself can be found in eira_randomised_distibutions.ipynb on the paper's associated GitHub page, but we chose not to link it in the legend as it is not a core part of the research we present.

This is how S6 figure legend now reads:

"(A-C) SE is HLA-DRB1 shared epitope, “other” are non-HLA risk SNPs for ACPA-positive RA. Unmod. or Unmodified is the EIRA data as it is, Rand1 shuffles, for each locus independently, within cases and shuffling within groups to match the independence part of the confounder scenario of Fig 3A (Confounder I), Rand2 assigns individuals randomly into two groups and then in one group samples risk factor SE from controls and in the other group samples the other SNP risk factor from controls, to match S2B Fig (Additive O). We suspect that Rand2 unfortunately moderates odds ratios, meaning it could use a better replacement. Unmodified and Rand1 have similar variability (P=0.04-0.8, Levene’s test, excluding (C) where calculations failed (NaN)). The middle of the scale for (C) is magnified to ease viewing. Blue arrows highlight where the median is visibly close to additivity, while yellow arrows do the same for multiplicativity. (D) Genes from Figure 4C-D, in varying numbers due to division-by-zero errors, with effect scaled between additive (blue line) and multiplicative (yellow line), as in Fig 2A."

This is how the new supplementary figure looks: [figure not reproduced in this copy of the rebuttal letter]

Reviewers' comments:

Reviewer #1: This is a nice paper investigating interaction in several contexts and providing interpretation and advantages of measures of interaction. The research question is interesting; however, I feel several points need to be clarified.

We thank the reviewer for her/his positive evaluation regarding the overall aim of our study. We have followed her/his suggestions to improve the way we are presenting the message of our investigation in the manuscript. The details of this are presented below.

First, it is not fully clear what the practical utility of the study is. As it stands, this looks like a nice computational study without clear implications. It would be very useful to motivate biologically the simulation scenarios. How were they chosen? What possible situations do they represent? This would be really useful to understand the practical utility of the study beyond the presented example. I recommend providing some practical examples for each of the presented scenarios, such as situations where they could arise, or published papers.

They were chosen from two perspectives: 1) what does it mean to have an additive or multiplicative effect? and 2) what would data from the most common scenarios in the literature look like? It is indeed helpful to anchor the scenarios in the literature, and we have added practical examples from the literature to several scenarios: synergism, the heterogeneity model, the multifactorial threshold model and the three-factor scenario of Fig 2B. We have also added a paper that discusses and deals with the genetic background confounder (confounder II). We already reference a paper discussing the data reuse confounder (confounder I). The papers for the confounder scenarios are not practical examples per se, as papers are not, as a rule, published with known uncorrected confounders, but referencing the discussion about these scenarios is indeed useful. The additive scenarios of S4 Fig are not, as far as we know, connected to any practical situations, which is why they are in the supplement, although they might be useful for randomization or for understanding additivity, as we demonstrate in a tentative example in S6 Fig.

Also, and importantly, the introduction is built on introducing the need of evaluating interactions between genetics and environmental factors, but it seems that all simulations, as well as the provided example, are based on a situation of genetic-genetic interactions? I suggest either revising the introduction accordingly, or including more simulations and examples showing the utility of the measure in the context of GxE interactions.

We have revised the introduction to avoid any mention of environment and to specify that the paper discusses genetic interactions.

Second, the abstract needs a substantial revision. It should stand by itself and convey the main messages of the paper, but now it is possible to understand the abstract only after having read the manuscript. Several concepts are introduced without any explanation, like multifactorial models, threshold models, intermediaries between additive and multiplicative interaction. Simulations are not motivated. The sentence at lines 6-7, which seems to be key, is not clear (missing a verb?).

We have rewritten the abstract.

Here is the new abstract:

"To understand the genetic background of complex diseases it is necessary to study beyond univariate associations. Therefore, interaction assessments of risk factors are important to uncover which, whether and how genetic risk variants act together on disease development. The principle of interaction analysis is to explore the combined effect’s magnitude of risk factors on disease causation. In this study, we investigate using simulations, different scenarios of causation to show whether and how the magnitude of the effect of two risk factors affect each other. We mainly but not exclusively address the two most commonly used interaction models, the additive and multiplicative, since there is often confusion regarding their use and interpretation. Our results show that when two risk factors are involved in the same chain of events, an interaction called synergism, their effect is multiplicative. Synergism is often described as deviation from additivity, which is a less precise allusion. Our results corroborate that it is often relevant to estimate additive effect relationships because they correspond to independent risk factors at low disease prevalence. Importantly, we also evaluate what is the threshold of more than two required risk factors for disease causation, called the multifactorial threshold model. We found a square root relationship between the threshold and an additive-to-multiplicative linear effect scale (AMLES), which ranges from 0 (equivalent to additive) to 1 (equivalent to multiplicative). Therefore, we propose AMLES as a metric that might be used to test different effects relationships at once, given that it can pinpoint multiplicative risk effects relationships in testing on the additive scale and vice versa. Finally, the utility of our simulation study was demonstrated in real data by analyzing and interpreting gene-gene odds ratios from a rheumatoid arthritis case-control cohort."

Third, the newly introduced formula (lines 173 and after) seems to me the most useful and important contribution. However, this (meaning the need for this formula and its potential advantages) is not sufficiently motivated in the introduction section. Given the importance of this result, I suggest revising the introduction and the abstract to emphasize the need for an intermediary formula between additive and multiplicative interaction, as well as the potential advantages, to motivate its introduction and use in different settings.

We thank the reviewer and have revised the introduction.

Here is the new introduction, where particularly the first and last paragraphs have been rewritten:

"Genetic mapping studies of complex human traits have identified thousands of genetic loci implicated in the susceptibility to complex diseases [1, 2]. In other words, genome-wide association studies (GWAS) have linked thousands of single-nucleotide polymorphisms with complex human traits [1, 2]. While these findings have been beneficial for certain aspects, such as drug repositioning [1, 3-5], the identified individual genetic associations seldom exhibit large disease risks and explain a small portion of the calculated heritability for each trait [1, 6-8]. This implies that the genetic associations have a very poor predictive value [Hunter NEJM 2019]. Comprehensive interaction analyses between and among risk factors are an important tool to understand the genetic background of complex traits [7-11]. In general, interaction is declared when the combined effect’s magnitude of two or more factors is significantly different from the combined effect’s magnitude predicted by the model being tested [12- 14]. Nevertheless, it is challenging to address whether, and how genetic risk factors interact shaping human traits and to biologically interpret those interactions [9, 14]. Additionally, there is often confusion interpreting the results from different interaction models (i.e., additive interaction model and multiplicative interaction model). Therefore, in this study, we provide a framework of simulated scenarios that represent common and simple processes where two or more genetic factors may interplay in disease causation. We answer whether and how two or multiple genetic factors would interact in each simulated scenario. We also address those relationships in real data from a case-control study in rheumatoid arthritis (the EIRA study [15, 16]).

The association between individual genetic variants and an outcome (e.g., disease) is typically quantified as odds ratios (OR) or relative risks (RR). Often the case-control design is used to query low prevalence diseases, in which odds ratios approximate the relative risk in the population even though the samples are unevenly drawn. Interactions often examine two risk factors at a time, yielding three OR (or RR) notated as [13, 14]: OR11 for carrying both risk factors; OR10 and OR01 for the exclusive combinations, and OR00 for the absence of both risk factors, which is used as reference (OR00=1). Two different models are commonly used to test interaction, the additive (whose null hypothesis is OR11 = OR10 + OR01 – 1) and the multiplicative (whose null hypothesis is OR11 = OR10 × OR01). The additive model builds on the sufficient cause concept by KJ Rothman [12], who showed that if two factors are part of the disease’s cause and are part of the same sufficient cause (e.g., pathway), then their joint risks will be larger than their sum (often termed “departure from additivity”). This additive model has been criticized for always giving positive results [17, 18]. On the other hand, the multiplicative model has been criticized as a statistical convenience without a theoretical basis, boosted by the implicit multiplicativity in logistic regression [17, 19, 20]. There is often confusion regarding when and how to use and interpret each of the mentioned models.

In addition to evaluating the additive and multiplicative interaction models, we propose and study a third model, the multifactorial threshold model [21]. This model assumes that there is a minimum number of factors required for disease causation, which has a theoretical foundation in the concept of genetic liability [36]. Until now, no statistical metric to account for the multifactorial threshold model has been available; therefore, we propose AMLES (additive-to-multiplicative linear effect scale), which compares the number of risk factors to a threshold value. Although the multifactorial threshold model has been criticized for simplifying disease biology [21], our proposed metric could help to determine how well this model conforms to the biology.

Simulations are able to give an answer with a high level of precision if enough data points are generated. We design a variety of different scenarios as a basis for the simulations. We build scenarios where we give the cause of the outcome (disease or otherwise) in its description. Our intention was to include the most likely real scenarios, and we provide practical examples from the literature. From the literature also comes the multifactorial threshold model, which lacks a way to test for it in data. We also build scenarios where we model confounder situations, where there is no cause in the description yet there are apparent, false, risk factors. Again, these confounders are known in the literature. We will later show where the modelled scenarios can be distinguished in real data, and where they cannot. Across all simulations the risk factors are dichotomous, and neither necessary nor sufficient for disease to occur. While most of our simulated scenarios have two risk factors, we include modelling of more than two factors and illustrate how it is possible to elucidate multifactorial interactions from a set of pairwise interactions. We compare our simulation scenarios to additive and multiplicative risk scales, aiming to contribute to the understanding and interpretation of different models of interaction. This includes the multifactorial threshold model, and we find a metric that can not only measure fit to the additive and multiplicative models but also be used to elucidate the threshold level within the multifactorial threshold model from data."
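
As a rough illustration of the two null hypotheses quoted in the introduction above, the following Python sketch computes the joint odds ratio expected under each model and places an observed OR11 on a line running from 0 (additive) to 1 (multiplicative). The function names are hypothetical, and the plain linear interpolation is an assumption made for illustration; the exact definition of AMLES should be taken from the manuscript itself.

```python
def expected_or11(or10, or01):
    """Return the joint odds ratio expected under each null hypothesis."""
    additive = or10 + or01 - 1.0      # additive null: OR11 = OR10 + OR01 - 1
    multiplicative = or10 * or01      # multiplicative null: OR11 = OR10 x OR01
    return additive, multiplicative


def linear_position(or11, or10, or01):
    """Place an observed OR11 on a line where 0 is additive and 1 is multiplicative.

    Assumption: plain linear interpolation between the two expectations;
    the manuscript's own AMLES formula may differ in detail.
    """
    additive, multiplicative = expected_or11(or10, or01)
    span = multiplicative - additive  # equals (OR10 - 1) * (OR01 - 1)
    if span == 0:
        raise ValueError("additive and multiplicative expectations coincide")
    return (or11 - additive) / span


additive, multiplicative = expected_or11(2.0, 3.0)
print(additive, multiplicative)          # 4.0 6.0
print(linear_position(5.0, 2.0, 3.0))    # 0.5
```

The position is undefined when OR10 or OR01 equals 1, because both expectations then coincide, so the sketch raises an error in that case.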

Minor comments.

Lines 7-8. For synergy, the two risk factors are required rather than contributing.

We thank the reviewer for pointing this out and have removed that sentence from the abstract. As a result, we believe the message is now clearer.

Lines 25-26. Since until here you have been talking about multiple risk factors, it is good to mention that you are now referring to only 2 factors.

The sentence has been rephrased accordingly, now it reads like this:

Interactions are often examined two risk factors at a time, yielding three odds (or risk) ratios notated as: OR11 for carrying both risk factors; OR10 and OR01 for the exclusive combinations.

Page 3. Y is universally used to refer to the outcome, I suggest using Z-X for referring to the 2 simultaneous exposures, and then another letter for the third exposure later.

The suggested change has been implemented in the text and the figures of the manuscript. The exposures are now named X, Z and V, respectively.

Line 99. Principal components summarizing what?

The principal component analysis in GWAS is used to estimate the population structure and sample ancestry using the genotyping data. Since population structure can confound genetic association studies and, in our study, the interaction tests, the principal components are included as covariates.

To clarify this, we have now re-structured the sentence the reviewer referred to as follows:

In order to control for population stratification and for differences in allele frequencies due to sex, the first ten principal components (which summarized the genotyping data) and sex, respectively, were used as covariates in this analysis.

Line 433. I couldn’t find detailed figure legends. I think these would be useful to present what the different panels are showing and to provide a brief understanding of the figures without the need to read the full manuscript.

We apologize for this confusion. Rather than placing the detailed figure legends for the supplemental figures under "Supporting information captions", we had placed them at the bottom of the PDF files for the supplemental figures. We have now copied the supplemental figure legends to the end of the manuscript and corrected the naming of the supplemental files as instructed by the editor.

The main figures' legends are found embedded within the Results section, as per PLOS' author instructions, as captioned blocks.

Attachment

Submitted filename: Rebuttal_Letter.docx

Decision Letter 1

Brecht Devleesschauwer

25 Jan 2021

PONE-D-20-24738R1

Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations

PLOS ONE

Dear Dr. Ramsköld,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 11 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Brecht Devleesschauwer

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

As indicated by the reviewer, we suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service.

Upon resubmission, please provide the following:

- The name of the colleague or the details of the professional service that edited your manuscript

- A copy of your manuscript showing your changes by either highlighting them or using track changes

- A clean copy of the edited manuscript

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: No

**********

6. Review Comments to the Author

Reviewer #1: The paper has greatly improved. I only strongly recommend a thorough language/grammar revision as many sentences added to the revised version can be improved

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

PLoS One. 2021 Apr 26;16(4):e0250282. doi: 10.1371/journal.pone.0250282.r004

Author response to Decision Letter 1


10 Mar 2021

The manuscript has gone through copy editing since the last submission, by the authors and by a professional editing service (Janet Ahlberg).

Attachment

Submitted filename: Rebuttal_letter.docx

Decision Letter 2

Brecht Devleesschauwer

5 Apr 2021

Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations

PONE-D-20-24738R2

Dear Dr. Ramsköld,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Brecht Devleesschauwer

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Brecht Devleesschauwer

15 Apr 2021

PONE-D-20-24738R2

Understanding interactions between risk factors, and assessing the utility of the additive and multiplicative models through simulations

Dear Dr. Ramsköld:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Dr. Brecht Devleesschauwer

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. Three simulated causal scenarios.

    A-C: Simulation schemes for three generalized scenarios: synergism of causes (A), heterogeneity of causes (B) and a 5-factor threshold (C). The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. The numbers in italics are the average frequency in the other group of simulated individuals; note that this will depend on the prevalence (which is adjusted in the scenarios in the split between cases and controls). “Components” (comp1 and comp2) were used as a strategy to obtain probabilistic risk factors. D: The relative risks for double risk (RR11) calculated from the simulation scenarios, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed RR11 are compared to the additive and multiplicative combinations of the relative risks for single risk (RR10 and RR01). Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to multiplicativity, while blue arrows do the same for additivity, for the two most extreme simulated prevalence rates. “M. threshold” means scenario C. E: Correlation coefficients between the risk factors X and Z. The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates. We left out the highest prevalence results due to division-by-zero difficulties in the multifactorial threshold scenario.

    (PDF)
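The comparison in panel D can be sketched numerically. The snippet below is a minimal illustration, assuming the usual textbook null expectations for the double-risk relative risk (additive: RR10 + RR01 − 1; multiplicative: RR10 × RR01). The function names and the linear interpolation between the two expectations (0 at additive, 1 at multiplicative) are assumptions made for illustration and may differ from the authors' exact AMLES calculation.

```python
import math


def expected_rr11(rr10, rr01):
    """Return the (additive, multiplicative) null expectations for RR11."""
    additive = rr10 + rr01 - 1.0      # assumed additive-scale expectation
    multiplicative = rr10 * rr01      # assumed multiplicative-scale expectation
    return additive, multiplicative


def linear_scale(rr11, rr10, rr01):
    """Place an observed RR11 on a line where 0 = additive and 1 = multiplicative.

    Hypothetical helper: the two expectations coincide when either
    single-risk RR equals 1, so the scale is undefined there (NaN).
    """
    additive, multiplicative = expected_rr11(rr10, rr01)
    span = multiplicative - additive  # equals (rr10 - 1) * (rr01 - 1)
    if span == 0:
        return math.nan
    return (rr11 - additive) / span


# Illustrative values only, not taken from the paper's simulations.
additive, multiplicative = expected_rr11(2.0, 3.0)
midpoint = linear_scale(5.0, 2.0, 3.0)
```

With RR10 = 2 and RR01 = 3, the two expectations are 4 and 6, so an observed RR11 of 5 sits halfway between them on this scale.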

    S2 Fig. Multifactorial threshold model with fest metric.

    The y axis is the signed square of the y axis in Fig 2A, otherwise this is the same plot, i.e. based on the scenario in Fig 1C. The red curve is the threshold fraction; this and fest make up the y axis. The notches in the box plots show bootstrapped 95% confidence intervals for the medians. The simulation had 0.5% cases and the rest controls, with a range ±0.04%.

    (PDF)
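As a small worked example of the axis transformation, the sketch below assumes that “signed square” means squaring the magnitude of a value while keeping its sign, so that negative values stay negative. The function name is hypothetical.

```python
import math


def signed_square(y):
    """Square the magnitude of y while preserving its sign (assumed meaning)."""
    return math.copysign(y * y, y)


# Illustrative values only.
examples = [signed_square(v) for v in (-0.5, 0.0, 0.5, 1.0)]
```

Under this reading, a value of 0.5 on the original axis maps to 0.25 and a value of −0.5 maps to −0.25, while 0 and 1 are unchanged.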

    S3 Fig. No apparent difference from subsampling to equal number of controls and cases in confounder II.

    As in the scheme to the left, we simulated with (purple arrows) or without (green arrow) a step that reduced the number of controls to the same as the number of cases. As can be seen in the box plots above, this does not appear to have any effect on the results.

    (PDF)

    S4 Fig. Schemes providing additivity.

    A-B: Simulation schemes for two scenarios intended to produce an additive effect. The numbers are example frequencies, and frequencies in bold highlight the higher frequencies of the simulated risk factors (X and Z) associated with disease. For example, “Z: freq. 0.3” means that each simulated individual in the group had a 30% chance of being assigned the risk factor Z. The numbers in italics are the average frequency in the other group of simulated individuals; note that this will depend on the ratio of cases to controls, specifically, higher_freq × case_fraction + lower_freq × (1 − case_fraction). In each indicated circle the two risk factors have been added in an uncorrelated/independent manner. C-D: The relative risks (C) and odds ratios (D) for double risk (RR11 or OR11) calculated from the simulation schemes, with boxes summarizing 1,000 simulation runs with different risk factor frequencies. The observed RR11 or OR11 were compared to the expected combinations of the single-risk values (RR10 and RR01, or OR10 and OR01) in the additive and multiplicative models. Boxplots show median and quartiles for the simulations, but extreme values are omitted for clarity. Yellow arrows highlight where the median is visibly close to the null hypothesis for the multiplicative model, while blue arrows do the same for the additive model, for the two most extreme simulated trait prevalence rates. E: Correlation coefficients between the risk factors X and Z. The relevant signal in each case is whether the median is negative, zero or positive, highlighted with a -, 0 or + symbol for the two most extreme simulated prevalence rates.

    (PDF)
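The average-frequency expression in the caption can be evaluated directly. The sketch below uses the caption's own variable names; the input values (0.3, 0.1 and a case fraction of 0.005) are illustrative assumptions rather than parameters confirmed for any specific simulation run.

```python
def average_frequency(higher_freq, lower_freq, case_fraction):
    """Mixture frequency: higher_freq * case_fraction + lower_freq * (1 - case_fraction)."""
    return higher_freq * case_fraction + lower_freq * (1.0 - case_fraction)


# Illustrative inputs only: 30% vs 10% risk factor frequency, 0.5% cases.
avg = average_frequency(higher_freq=0.3, lower_freq=0.1, case_fraction=0.005)
```

With so few cases, the average stays close to the lower frequency (about 0.101 here), which shows why the italic numbers in the schemes shift with the case fraction.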

    S5 Fig. Shared epitope correlation relationships.

    Correlation between HLA-DRB1 Shared Epitope (calculated as codominant) and other SNPs (calculated as dominant) in EIRA data within cases, controls or all samples, for all non-HLA SNPs (A, C) or only known non-HLA risk SNPs (B). P-values come from 1-sample t-tests against zero. Regressing out sex and the first ten principal components gave essentially the same result (C) as without this step (A, B).

    (PDF)

    S6 Fig. Comparison of EIRA double risk odds ratios to two types of randomizations.

    A-C: SE is HLA-DRB1 Shared Epitope; “other” are non-HLA risk SNPs for ACPA-positive RA. Unmod. or Unmodified is the EIRA data as it is. Rand1 shuffles each locus independently within cases, shuffling within groups to match the independence part of the confounder scenario of Fig 3A (confounder I). Rand2 assigns individuals randomly into two groups and then, in one group, samples risk factor SE from controls and, in the other group, samples the other SNP risk factor from controls, to match S2B Fig (Additive O). We suspect that Rand2 moderates odds ratios, so a better randomization may be needed in its place. Unmodified and Rand1 have similar variability (P = 0.04–0.8, Levene’s test, excluding (C) where calculations failed (NaN)). The middle of the scale for (C) is magnified to ease viewing. Blue arrows highlight where the median is visibly close to additivity, while yellow arrows do the same for multiplicativity. D: SNPs from Fig 4C and 4D, in varying numbers due to division-by-zero errors, with effect scaled between additive (blue line) and multiplicative (yellow line), as in Fig 2A.

    (PDF)
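A within-group shuffle of the kind described for Rand1 can be sketched as follows. This is a minimal illustration, assuming that each locus is permuted separately among individuals sharing the same group label (for example, cases or controls), which keeps the per-group risk factor frequencies while breaking any within-group correlation between loci. The function name, the toy data and the seed are hypothetical; the authors' implementation is not shown in the source.

```python
import random


def shuffle_within_groups(values, groups, rng):
    """Permute `values` among individuals that share the same group label."""
    indices_by_group = {}
    for index, group in enumerate(groups):
        indices_by_group.setdefault(group, []).append(index)

    shuffled = list(values)
    for indices in indices_by_group.values():
        permuted = [values[i] for i in indices]
        rng.shuffle(permuted)  # in-place permutation of this group's values
        for i, value in zip(indices, permuted):
            shuffled[i] = value
    return shuffled


# Toy data (hypothetical): 1 = carries the risk factor, 0 = does not.
rng = random.Random(0)
groups = ["case", "case", "case", "control", "control", "control"]
locus_a = [1, 1, 0, 0, 0, 1]
locus_a_shuffled = shuffle_within_groups(locus_a, groups, rng)
```

Applying the function to each locus with its own draw from the random generator gives the “for each locus independently” behavior, since each call produces an unrelated permutation.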


    Data Availability Statement

    Pre-existing genetic data from the EIRA cohort was used. Due to ethical considerations, data from EIRA cannot be publicly shared. Please contact the EIRA’s principal investigators for data requests for applicable studies. For further information go to: https://www.eirasweden.se/Kontakt_EIRA.htm.


