Abstract
Interaction analyses (also termed ‘moderation’ analyses or ‘moderated multiple regression’) are a form of linear regression analysis designed to test whether the association between two variables changes when conditioned on a third variable. It can be challenging to perform a power analysis for interactions with existing software, particularly when variables are correlated and continuous. Moreover, while power is impacted by main effects, their correlation, and variable reliability, it can be unclear how to incorporate these effects into a power analysis. The R package InteractionPoweR and associated Shiny apps allow researchers with minimal or no programming experience to perform analytic and simulation-based power analyses for interactions. At minimum, these analyses require the Pearson’s correlations between variables and the sample size; additional parameters, including reliability and the number of discrete levels that a variable takes (e.g., binary or Likert scale), can optionally be specified. In this Tutorial we demonstrate how to perform power analyses using our package and give examples of how power can be impacted by main effects, correlations between main effects, reliability, and variable distributions. We also include a brief discussion of how researchers may select an appropriate interaction effect size when performing a power analysis.
Keywords: Interactions, moderation, power analysis, R, open materials
Introduction
Interaction analyses (also termed ‘moderation’ analyses or ‘moderated multiple regression’) are a form of linear regression analysis designed to test whether the association between two variables changes when conditioned on a third variable. For example, whether the association between a trait and an outcome differs between groups. Interactions, and equivalent tests such as two-way ANOVAs, are widely tested across the psychological and social sciences, yet there is growing concern that they contribute to the low replicability of findings in these fields (Altmejd et al., 2019; Open Science Collaboration, 2015; Vize et al., 2022). In tandem, there is increasing evidence that the effect sizes of interactions in observational designs are quite small (Aguinis et al., 2005; Beck & Jackson, 2020; Sherman & Pashler, 2019; Sommet et al., 2022; Tosh et al., 2021; Vize et al., 2022). In the context of these concerns, power analyses can be a critically useful tool for ensuring the robustness and accuracy of conclusions resulting from tests of interactions.
A power analysis is a simulation or calculation of how likely it is that a statistical test will detect a significant result, assuming a true non-zero effect in the population. Power analyses at minimum require the user to specify an effect size, sample size, and alpha (i.e., the p-value threshold that determines whether a result is significant). If the power analysis is run as a simulation, power is simply the percent of simulations in which a significant result is found. Power analyses have many uses, including 1) determining the sample size needed to test an effect, 2) determining whether an existing data set can detect an effect of interest, and 3) aiding in the interpretation of findings (i.e., a sensitivity analysis (Baranger et al., 2020)). It can be quite challenging to correctly perform a power analysis for an interaction, particularly as many of the most widely-used tools were designed with experimental manipulations in mind (i.e., factorial designs), and are difficult to apply when planning an observational study, especially with continuous data. For example, they may assume that variables are uncorrelated, or require effect sizes (e.g., Cohen’s f²) that are computed by first solving a system of simultaneous linear equations (see Supplemental Methods), which can be a significant barrier. Moreover, it is well known that variable reliability affects power, though it can frequently be unclear how to incorporate it into a power analysis.
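This definition translates directly into code. As a minimal, language-agnostic sketch of the simulation logic (shown here for a simple Pearson correlation rather than the full interaction model; the function name and parameter values are our own, not the package's):

```python
import numpy as np
from scipy import stats

def sim_power(r=0.2, n=100, alpha=0.05, n_sims=2000, seed=1):
    """Simulation-based power: the percent of simulated samples in which
    the test of interest (here, a correlation test) is significant."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, r], [r, 1.0]]
    hits = 0
    for _ in range(n_sims):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        if stats.pearsonr(x, y)[1] < alpha:  # [1] is the p-value
            hits += 1
    return hits / n_sims

power = sim_power()  # ~0.52 for r=0.2, n=100
```

power_interaction() applies the same logic at scale, simulating full interaction models and testing the interaction term in each.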
In this article, we introduce the InteractionPoweR R package and accompanying Shiny apps, which contain functions for power analyses of interactions with cross-sectional data. This software is free to use, and the Shiny apps do not require any programming experience. Compared to G*Power and the Superpower and pwr2ppl R packages (Aberson, 2019; Faul et al., 2007; Lakens & Caldwell, 2021), InteractionPoweR requires only standardized cross-sectional effect sizes (i.e., Pearson’s correlation coefficients), which makes it easy to obtain effect sizes from the published literature to inform a power analysis. InteractionPoweR is also unique in several important regards. First, it incorporates the correlation between the interacting variables, resulting in more accurate, and frequently greater, power estimates than those which assume variable independence (Shieh, 2010). Second, it incorporates variable reliability, which can dramatically impact power (Brunner & Austin, 2009). Third, variables can be non-continuous (i.e., binary or ordinal). In this tutorial we will demonstrate how to use InteractionPoweR to run a power analysis for interactions, and we will highlight how power depends on main effects, variable correlations, reliability, and variable distributions.
A brief introduction to interactions
Written symbolically, interaction analyses take the form:

Y = β0 + β1X1 + β2X2 + β3X1X2 + ε
Where Y is the dependent variable, X1 and X2 are the independent variables, X1X2 is the interaction term (i.e., X1*X2), β refers to standardized effect sizes (β0 is the intercept), and ε is the error. Thus, the term of interest is β3, the effect size of the interaction. A significant interaction (e.g., p<0.05) will indicate that the association between each of the independent variables and the dependent variable changes when conditioned on the other (i.e., the association between X1 and Y varies when conditioned on X2, and vice-versa), assuming the model is valid (e.g., no omitted variables); see the discussion in the next section about the importance of this assumption. In this tutorial we will refer to X2 as the ‘moderator’ and X1 as the ‘variable of interest’, but it is important to note that these two are interchangeable (Finsaas & Goldstein, 2021), and the interaction analysis itself provides no evidence for causal effects (Rohrer & Arslan, 2021).
One challenging aspect of performing a power analysis for an interaction is selecting an appropriate value for the magnitude of the interaction effect - the correlation between X1X2 and Y. A common practice is to interpret interaction effects in light of the simple slopes (i.e., the association between X1 and Y at different values of X2 (Aiken & West, 1991); see also (Finsaas & Goldstein, 2021)) and where the simple slopes intersect. Moreover, theories may hypothesize that the simple slopes will take a particular form or intersect at particular values of the main effect (e.g., the differential susceptibility and diathesis stress hypotheses (Belsky & Pluess, 2009; Widaman et al., 2012)). However, this focus on simple slopes can make it challenging to select an appropriate effect size for a power analysis, as it can be unclear what a specific correlation means in terms of the simple slopes (e.g., is r = 0.1 a small or large effect?). A key insight is that the shape and intersection of the simple slopes are a result of the magnitude of the interaction effect (β3), relative to the magnitude of the main effects (β1 and β2; note that, as above, β1 is referred to here as the ‘main effect’ and β2 as the ‘moderator’, though they are interchangeable) (Figure 1). The shape of an interaction is governed by the ratio of the interaction effect (β3) to the main effect (β1). If this ratio is less than 1 then the effect is an ‘attenuated’ effect, where both simple slopes are in the same direction (Figure 1A). If these effects are the same size (a ratio of 1) then the interaction is a ‘knock-out’ effect, where one of the simple slopes is a flat line (a slope of 0; Figure 1B). If the ratio is larger than 1 then the effect is a ‘cross-over’ effect, where one simple slope is positive and the other is negative (Figure 1C). Where the simple slopes intersect is governed by the ratio of the moderator effect (β2) to the interaction effect (β3).
If both effects are in the same direction (both positive or both negative) then the simple slopes will intersect at negative values of X1 (Figure 1D), and if the moderator effect is zero they will intersect at X1 = 0 (Figure 1E). If β2 and β3 are in opposing directions (one negative and one positive), then the simple slopes will intersect at positive values of X1 (Figure 1F). Note as well that if one begins with known main effects and a hypothesized interaction shape, the corresponding interaction effect size can be easily derived.
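The intersection point itself follows from the regression equation: setting the predicted values equal at two levels of X2 shows that the simple slopes cross at X1 = −β2/β3, which is why the sign of the β2/β3 ratio determines where the lines meet. A quick numeric check (coefficients are illustrative, not taken from Figure 1):

```python
# Illustrative standardized coefficients (not taken from Figure 1)
b1, b2, b3 = 0.2, 0.2, 0.1

def yhat(x1, x2):
    """Predicted y for mean-centered, standardized predictors (intercept = 0)."""
    return b1 * x1 + b2 * x2 + b3 * x1 * x2

# With b2 and b3 in the same direction, the slopes cross at a negative x1
x1_cross = -b2 / b3  # = -2.0
assert abs(yhat(x1_cross, -1.0) - yhat(x1_cross, 1.0)) < 1e-12
```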
Figure 1. The shape of an interaction.

Examples of interactions with different shapes; simple slopes are plotted at -1 SD, the mean, and +1 SD of X2. All variables are continuous and normally distributed. A-C: The interaction, the correlation between X1X2 and Y, is varied (r = 0.1, 0.2, 0.4). The correlation between X1 and Y is r=0.2, the correlation between X2 and Y is r=0.15, and the correlation between X1 and X2 is r=0.0. A) An attenuated effect (also termed ‘partially-attenuated’); the interaction effect is smaller than the main effect and all simple slope effects are in the same direction. B) A knock-out or ‘no-way’ interaction (also termed ‘fully-attenuated’); the interaction effect and the main effect are the same size. One of the simple slopes is 0 - there is no effect of X1 at that level of X2. C) A cross-over or ‘butterfly’ interaction. The interaction effect is larger than the main effect and the simple slopes are in opposite directions. D-F: The correlation between X2 and Y is varied (r = 0.4, 0, -0.4). The correlation between X1 and Y is r=0.2, the correlation between X1X2 and Y is r=0.2, and the correlation between X1 and X2 is r=0.0. Depending on the ratio of the moderator and interaction effects, the simple slopes can intersect at negative values of X1 (D), zero (E), or positive values of X1 (F). Note that these figures are illustrations of common interactions, but power to detect each interaction is not a function of any general pattern (e.g., a cross-over does not inherently require more power than a knock-out).
Tools for interaction power analyses
InteractionPoweR includes two functions for computing power: power_interaction() and power_interaction_r2() (Table 1). The first function, power_interaction(), simulates multiple datasets with the correlations and sample size specified by the user, and runs a regression in each dataset. Power is computed as the percent of simulated datasets where the interaction is significant. The second function, power_interaction_r2(), determines power analytically by solving for Cohen’s f² and computing the noncentrality parameter (λ) of the F distribution, from which power follows directly (Cohen, 1988; Maxwell, 2000). Further details on the methodology underlying these functions are given in the Supplement. Both functions are also accessible via online web applications, https://mfinsaas.shinyapps.io/InteractionPoweR/ for power by simulation, and https://david-baranger.shinyapps.io/InteractionPoweR_analytic/ for analytic power. Each approach has its own advantages. Analytic power is much faster to compute than simulation-based power. Simulation-based power, on the other hand, (1) allows for variables to be binary or ordinal and (2) permits users to look beyond simply computing power. For example, users can also examine the range of effect sizes, simple slopes, and interaction shapes observed in a sample that would be consistent with a given population effect size, which can be useful both in planning and interpreting analyses.
Table 1. Methods for interaction power analyses.
Functions and web applications for power analyses for interaction analyses provided in the InteractionPoweR package. Further detail on the use of these functions can be found in the package documentation.
| Methods for interaction power analyses | ||
|---|---|---|
| R Package: https://cran.r-project.org/package=InteractionPoweR | ||
| Additional documentation: https://dbaranger.github.io/InteractionPoweR/index.html | ||
| Primary Power Analysis Functions | ||
| Data Simulation | Analytic Power | |
| Function | power_interaction() | power_interaction_r2() |
| Web Application | https://mfinsaas.shinyapps.io/InteractionPoweR/ | https://david-baranger.shinyapps.io/InteractionPoweR_analytic/ |
| Advantages | Variables can be binary or ordinal; users can also examine simulated effect sizes, simple slopes, and interaction shapes | Much faster to compute |
| Additional Functions in the R Package | ||
| Function | Use | |
| plot_power_curve() | Plots the power curve from a power analysis | |
| power_estimate() | Uses polynomial regression to estimate where the power curve achieves the desired level of power | |
| generate_interaction() | Simulates a single dataset with the specified interaction | |
| plot_interaction() | Plots the interaction in a single simulated dataset | |
| test_interaction() | Runs a regression testing the interaction in a single simulated dataset | |
We should also note an important limitation of the power analysis functions presented here: they assume that the relatively simple interaction model presented above is a valid model of the data under investigation. Specifically, it is assumed that there are no omitted variables that correlate with or depend on X1, X2, and Y. This includes confounding variables (Rohrer et al., 2022) and also extends to non-linear relationships between Y and X1 or X2 (e.g., it is assumed that X1² explains no additional variation in Y after accounting for X1X2) (Hainmueller et al., 2019). Another important assumption of these analyses is that the variables under investigation are normally distributed and symmetric (e.g., skew = 0). Thus, after mean-centering and under these assumptions, the interaction term X1X2 is largely uncorrelated with X1 and X2, as these correlations are partially a function of the expected values of X1 and X2, which are now 0 (Olvera Astivia & Kroc, 2019). The extent to which the user’s real-world data violate these assumptions will determine the accuracy and utility of the power analysis. There is no test for omitted-variable bias, though methods have been proposed to evaluate its potential impact (Diegert et al., 2022; Wilms et al., 2021). Note as well that the list of potential omitted variables expands greatly if the interaction model includes covariates (Keller, 2014). To determine whether assumptions are met, researchers can examine the distribution and correlation of variables in their sample; we also recommend including individual data points in visualizations of interactions, both when plotting simple slopes and when plotting the interaction term against X1 and X2 (Olvera Astivia & Kroc, 2019).
These limitations notwithstanding, we believe these functions will be useful tools for researchers even in the case when these assumptions do not all hold, as they can be used to gain a deeper understanding of how to interpret interaction models and the relationship between power, effect sizes, and variable correlations in interaction models.
A simple example
Take an example of a hypothetical interaction. Say that we have a known interaction - that the strength of the association between neuroticism (our variable of interest) and depressive symptoms (our outcome) varies as a function of stressful life events (SLEs - the moderator) (Bondy et al., 2021). We have a new sample and we’d like to know our power to detect this interaction. To do so we need the correlation between neuroticism and depressive symptoms (X1 and Y; r=0.51), the correlation between SLEs and depressive symptoms (X2 and Y; r=0.22), the correlation between neuroticism and SLEs (X1 and X2; r=0.09), and our sample size (let us say our N=940). Finally, we need a hypothesized interaction effect size - how much the association between neuroticism and depressive symptoms changes for every 1 standard deviation change in SLEs - the correlation between the interaction term neuroticism*SLEs and depressive symptoms. In the present example this can be drawn from the prior literature (r=0.09). See Figure 2A for a schematic of this interaction and Figure 2B for an example of a single data set drawn from these population-level parameters.
Figure 2. Example power analysis.

A) Depiction of the population-level effect. X2 is the moderator. X2 groups depict the simple slopes at mean +/− 1 SD. B) Depiction of the same effect in a sample of N=940, generated using the generate_interaction() function and plotted using the plot_interaction() function. C) An analytic power analysis for the hypothesized effect at N=940. The code for this analysis is: InteractionPoweR::power_interaction_r2(N = 940, r.x1.y = 0.51, r.x2.y = 0.22, r.x1.x2 = 0.09, r.x1x2.y = 0.09, alpha = 0.05). There is 90% power (0.9). See here and here for how to perform this same power analysis using the online web apps. D) Distribution of the ‘shape’ parameter (β3/β1) from 10,000 simulations generated by the power_interaction() function. Color indicates the range of the 95% confidence interval. The x-axis is the log-transformed ratio of the interaction term (β3) to the main effect of X1 (β1).
The standardized effect sizes (βs) described in the equation above can be calculated from these correlations via path-tracing (see Supplemental Methods). Note that all variables are standardized in these calculations, with a mean of 0 and a standard deviation of 1. As the variables are mean-centered, the intercept (β0) is 0, and the correlation between X1 and X1X2, as well as the correlation between X2 and X1X2, is r=0, as long as the variables are normally distributed and symmetric (Olvera Astivia & Kroc, 2019). Example code using the InteractionPoweR function for analytic power, power_interaction_r2(), as well as the result of this analysis, is given in Figure 2C. We see that our analysis would have 90% power to detect our effect with an alpha of 0.05 (i.e., we would detect a significant effect 90% of the time). Figure 2D shows the distribution of the interaction’s shape (β3/β1) across 10,000 simulations (generated via the power_interaction() function - see Supplemental code). While the hypothesized shape is approximately 1:5 (an attenuated effect - the interaction effect is ⅕ the size of the main effect), shapes ranging from 1:3 to 1:14 are likely to be observed in a new sample with 90% power (95% of simulated interactions fall within this range).
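For readers curious what the path-tracing step looks like computationally, it reduces to solving the linear system R·β = r_y, where R is the predictor correlation matrix and r_y holds the predictor–outcome correlations. A sketch in Python (standing in for the package's internal R code), using this example's correlations:

```python
import numpy as np

# Correlations from the worked example (Figure 2C)
r_x1_y, r_x2_y, r_x1_x2, r_x1x2_y = 0.51, 0.22, 0.09, 0.09

# With standardized, mean-centered, symmetric variables, the interaction
# term is uncorrelated with x1 and x2, so its off-diagonal entries are 0
R = np.array([[1.0,     r_x1_x2, 0.0],
              [r_x1_x2, 1.0,     0.0],
              [0.0,     0.0,     1.0]])
r_y = np.array([r_x1_y, r_x2_y, r_x1x2_y])

b1, b2, b3 = np.linalg.solve(R, r_y)  # standardized betas: ~0.494, ~0.176, 0.09
shape = b3 / b1                       # ~0.18, i.e., roughly a 1:5 ratio
```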
Planning a study
Finding the necessary sample size
Power analyses are frequently used when planning a study. Taking the example in the prior section, let us say that we are planning our replication study, but wish to account for possible effects of publication bias and measurement error (see (Anderson & Kelley, 2022) for an extended discussion of this complex issue). In our case, we decide that the effect size reported in the original study may be inflated, and we plan on powering our analysis to detect an interaction effect size that is half of what was reported - a correlation between X1X2 and Y of r=0.045, instead of r=0.09 (Figure 3A). How many participants would we need in our study in order to have 90% power to detect that effect? As opposed to entering a single value for the sample size, we can enter a range of values, say from N=500 to N=6,000, and examine power at all of them (Figure 3C). The function plot_power_curve() plots the power curve (Figure 3B), visual inspection of which indicates that we would need approximately N=3,750 to detect our hypothesized interaction with 90% power.
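The same scan over sample sizes can be sketched outside the package using Cohen's f² and the noncentral F distribution (this is our approximation of the analytic approach, not the package's exact code; the helper name is ours):

```python
import numpy as np
from scipy import stats

def interaction_power(N, r_x1_y, r_x2_y, r_x1_x2, r_x1x2_y, alpha=0.05):
    """Approximate analytic power for the interaction term (a sketch of the
    f^2 / noncentral-F approach, not the package's exact implementation)."""
    R = np.array([[1.0, r_x1_x2, 0.0], [r_x1_x2, 1.0, 0.0], [0.0, 0.0, 1.0]])
    r_y = np.array([r_x1_y, r_x2_y, r_x1x2_y])
    b = np.linalg.solve(R, r_y)
    r2_full = float(b @ r_y)
    r2_reduced = r2_full - b[2] * r_x1x2_y   # drop the (orthogonal) interaction
    f2 = (r2_full - r2_reduced) / (1.0 - r2_full)
    dfd = N - 4                              # N - 3 predictors - 1
    lam = f2 * (dfd + 2)                     # noncentrality (Cohen, 1988)
    f_crit = stats.f.ppf(1 - alpha, 1, dfd)
    return 1 - stats.ncf.cdf(f_crit, 1, dfd, lam)

Ns = np.arange(500, 6001, 250)
powers = [interaction_power(N, 0.51, 0.22, 0.09, 0.045) for N in Ns]
needed_N = int(Ns[np.argmax(np.array(powers) >= 0.90)])  # 3750 on this grid
```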
Figure 3. Finding the necessary sample size.

A) Depiction of the population-level effect. X2 is the moderator. X2 groups depict the simple slopes at mean +/− 1 SD. B) Power curve for the effect of interest, made using the plot_power_curve() function. N=3,750 achieves 90% power. C) Code for the analytic power analysis used to generate B. Sample size (N) ranges from 500 to 6,000, in increments of 250. The code for this analysis is: InteractionPoweR::power_interaction_r2(N = seq(500, 6000, 250), r.x1.y = 0.51, r.x2.y = 0.22, r.x1.x2 = 0.09, r.x1x2.y = 0.045, alpha = 0.05). See here and here for how to perform this same power analysis using the online web apps.
Finding the minimum detectable effect size - a.k.a. sensitivity power
Note that any of the parameters can be varied in the simulation. For example, if we are planning an analysis in a data set that already exists, or if we are evaluating an analysis that has already been conducted, we can instead vary the correlation between X1X2 and Y, thus allowing us to determine the smallest detectable effect size. We remind the reader that one should not draw these effects directly from an analysis that has already been performed (Gelman & Carlin, 2014; Zhang et al., 2019), unless there is high confidence that the observed sample effect is a reasonably precise estimate of the population-level effect. Power is a function of population-level effects, while estimates from individual samples are likely to be biased, particularly as publication bias and selection on significant effects lead to over-estimates of effect sizes (Kühberger et al., 2014). As such, a sensitivity analysis using observed sample effects risks being circular: it confounds sample and population-level effects, and will inevitably yield a high “power” estimate, merely reflecting that the observed sample-level effect used in the analysis was significant.
Returning to our original sample size of N=940, say that we have evaluated the costs and benefits, and decided that we only need 80% power, not 90% power. What, then, is the minimum detectable interaction effect size? We can test a range of values for the correlation between X1X2 and Y, say from r=0.01 to r=0.12 (Figure 4A). Doing so, we see that the minimum detectable effect size is slightly greater than 0.075 (Figure 4B). If we want to know where the power curve crosses the power=0.8 line, we can use the power_estimate() function, which uses polynomial regression to find the values that will result in the desired level of power. In the present example, we see we have 80% power to detect effects as small as 0.078 (Figure 4A). We can then determine whether our planned study, or the study we are evaluating, is powered to detect effects that are plausible and of interest. Note that comparing an observed effect size to the smallest detectable effect size is not a valid test of whether a study is underpowered, as the observed effect in a sample can be smaller than the population effect.
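Rather than scanning a grid, the minimum detectable effect can also be approximated by inverting the analytic calculation: find the noncentrality λ that yields the target power, then convert λ back to a correlation. A sketch under the same assumptions as above (the value 0.290658 is the variance explained by the main effects alone, derived from r.x1.y = 0.51, r.x2.y = 0.22, r.x1.x2 = 0.09):

```python
import math
from scipy import stats

# Find the noncentrality λ needed for 80% power with df1 = 1, alpha = .05
N, alpha, target = 940, 0.05, 0.80
dfd = N - 4
f_crit = stats.f.ppf(1 - alpha, 1, dfd)
lo, hi = 0.0, 50.0
for _ in range(60):                    # bisection: power is increasing in λ
    lam = (lo + hi) / 2
    power = 1 - stats.ncf.cdf(f_crit, 1, dfd, lam)
    lo, hi = (lam, hi) if power < target else (lo, lam)

# Convert λ back to a minimum detectable interaction correlation:
# f² = λ/(dfd+2), and ΔR² = f²·(1 − R²_main)/(1 + f²)
f2 = lam / (dfd + 2)
r2_main = 0.290658                     # variance explained by x1 and x2 alone
delta_r2 = f2 * (1 - r2_main) / (1 + f2)
min_r = math.sqrt(delta_r2)            # ≈ 0.077, close to the 0.078 above
```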
Figure 4. Finding the minimum detectable effect size.

A) Code for analytic power, testing a range of interaction effect sizes (0.01 to 0.12). The power_estimate() function is used to find where the power curve crosses the power=0.8 line (80% power). The code for this analysis is: InteractionPoweR::power_interaction_r2(N = 940, r.x1.y = 0.51, r.x2.y = 0.22, r.x1.x2 = 0.09, r.x1x2.y = seq(0.01, 0.12, .005), alpha = 0.05). B) A plot of the power curve, generated using the plot_power_curve() function. See here and here for how to perform this same power analysis using the online Shiny apps.
Important considerations
Main effects and their correlation
As with any design, power will depend on the sample size and the size of the effect of interest. Beyond these considerations, interaction analyses will differ from other power analyses that readers may be familiar with (e.g., for a correlation), as power for interactions additionally depends on the strength of the main effect (the correlation between X1 and Y), the moderator’s effect (the correlation between X2 and Y), and the correlation between the main effect and moderator (between X1 and X2). Indeed, readers may have noted that in the first example the sample size needed to detect the interaction effect of r=0.09 with 90% power (N=940) is smaller than the sample size needed if one were performing a power analysis simply for a correlation of r=0.09 (N=1,292; pwr::pwr.r.test(r = 0.09, power = .9, sig.level = 0.05)). This is because power in a regression is not a function of total variance explained, but of residual variance explained (i.e., how much additional variance is explained after the main effects are taken into account) (McClelland & Judd, 1993). Larger correlations between X1 and Y, and between X2 and Y, thus explain more of the variance that is independent of the interaction term, our variable of interest, thereby increasing the proportion of the residual variance explained by X1X2 (assuming that X1X2 is largely independent of X1 and X2, as X1 and X2 are mean-centered and symmetric). An example of this is shown in Figure 5A, where we set the interaction effect, the correlation between X1X2 and Y, to r=0.1, set the correlation between X1 and X2 to r=0, N=500, and vary the correlations between X1 and Y, and between X2 and Y.
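This residual-variance logic can be made concrete by computing Cohen's f² for the interaction term with and without the main effects (a sketch; the helper function is ours):

```python
import numpy as np

def interaction_f2(r_x1_y, r_x2_y, r_x1_x2, r_x1x2_y):
    """Cohen's f² for the interaction term: ΔR² over residual variance."""
    R = np.array([[1.0, r_x1_x2, 0.0], [r_x1_x2, 1.0, 0.0], [0.0, 0.0, 1.0]])
    r_y = np.array([r_x1_y, r_x2_y, r_x1x2_y])
    b = np.linalg.solve(R, r_y)
    r2_full = float(b @ r_y)
    delta_r2 = b[2] * r_x1x2_y          # the interaction term is orthogonal
    return delta_r2 / (1.0 - r2_full)

# Same interaction correlation (r = .09), with and without main effects:
f2_alone = interaction_f2(0.00, 0.00, 0.00, 0.09)  # ≈ 0.0082
f2_mains = interaction_f2(0.51, 0.22, 0.09, 0.09)  # ≈ 0.0116 — more power
```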
Figure 5. Main effects and their correlation.

Examples when the interaction effect (the correlation between X1X2 and Y) is r=0.1 and the sample size is N=500. Power analyses were run with power_interaction_r2() and plots were made with plot_power_curve() (Supplement). A) The main effects (the correlation between X1 and Y, and between X2 and Y) are varied, and the correlation between X1 and X2 is r=0. Power increases as the main-effect correlations increase (the effect is symmetrical around 0 for both X1 and X2). B) The correlation between X1 and X2 is additionally varied. Suppression and enhancement effects emerge, depending on the direction and magnitude of all three correlations.
The correlation between X1 and X2 will also impact power, as the degree of independence between X1 and X2 will influence the total variance explained by them both. Notably, a larger correlation between X1 and X2 can result in either increased or decreased power, depending on whether X2 enhances or suppresses X1 (i.e., whether the total variance explained by X1 and X2 is greater or less than the sum of their individual effects) (Friedman & Wall, 2005; Shieh, 2010). Examples of the correlation between X1 and X2 resulting in either suppression (i.e., reducing power) or enhancement (i.e., increasing power) are shown in Figure 5B. For example, in Figure 5A we see that an analysis with r.x1.y = .25, r.x2.y = .3, r.x1.x2 = 0, and r.x1x2.y = .1 will have 67.6% power when N = 500. While X1 and X2 capture 15.25% of the total variance in Y when they are uncorrelated, this amount will change with their correlation. If their correlation (r.x1.x2) becomes 0.3, the variance they explain drops to 11.8%, and the analysis will instead have 65.8% power. If their correlation becomes −0.3, the variance they explain rises to 21.7%, and the analysis will have 71% power.
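These percentages are straightforward to verify: the variance jointly explained by X1 and X2 is b·r_y, with the standardized betas obtained from the 2×2 predictor correlation matrix. A sketch:

```python
import numpy as np

def r2_mains(r_x1_y, r_x2_y, r_x1_x2):
    """Total variance in y explained jointly by x1 and x2 (standardized)."""
    R = np.array([[1.0, r_x1_x2], [r_x1_x2, 1.0]])
    r_y = np.array([r_x1_y, r_x2_y])
    b = np.linalg.solve(R, r_y)
    return float(b @ r_y)

print(round(r2_mains(0.25, 0.30, 0.0), 4))    # 0.1525
print(round(r2_mains(0.25, 0.30, 0.3), 4))    # ~0.118 (less variance explained)
print(round(r2_mains(0.25, 0.30, -0.3), 4))   # ~0.217 (more variance explained)
```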
Reliability
Reliability is an important consideration in most power analyses. Reliability reflects the proportion of a measurement’s variance that is not accounted for by measurement error. It is frequently operationalized as measurement consistency across time or instruments (e.g., the intraclass correlation coefficient or Cronbach’s alpha) (Revelle & Condon, 2019). Assuming classical measurement error (i.e., observed scores are normally and independently distributed around given true scores), reliability constrains the maximal observable correlation between two variables (Spearman, 1904).
For example, if the true correlation between X1 and X2 is r=0.20, X1 has a reliability of 0.8, and X2 has a reliability of 0.7, then the maximal observable correlation between X1 and X2 will be r=0.15 (0.20 × √(0.8 × 0.7) ≈ 0.15). In the case of an interaction, the reliability of the interaction term X1X2 is a function of the reliabilities of both X1 and X2, as well as the correlation between X1 and X2 (Busemeyer & Jones, 1983):

reliability(X1X2) = (reliability(X1) × reliability(X2) + r(X1,X2)²) / (1 + r(X1,X2)²)
The lower bound of the reliability of X1X2 is simply the product of the reliabilities of X1 and X2, though it will be greater than this (perhaps only marginally) if X1 and X2 are correlated. In the example above, the reliability of X1X2 would be 0.577 ((0.8 × 0.7 + 0.20²)/(1 + 0.20²) ≈ 0.577). Thus, the observable effect size for most interactions will be smaller (quite likely much smaller) than the true population effect size. We should also note that reliability additionally impacts power because it attenuates the other correlations in the analysis (between X1, X2, and Y), thereby reducing the total variance explained by variables which are independent of X1X2 (see Main effects and their correlation above), further reducing the power of the analysis. Of note, while the reliability of Y does not attenuate unstandardized effect sizes, it does increase standard errors, thereby reducing power. Taking these effects together, we can see that any power analysis for interactions which does not account for reliability is liable to grossly overestimate power. Figure 6 shows an example of how the reliabilities of X1, X2, and Y compound to affect power.
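Both reliability formulas used in this example are easy to verify numerically (Spearman's attenuation formula and the Busemeyer & Jones product-term reliability, using the example's values):

```python
import math

rel_x1, rel_x2 = 0.8, 0.7
r_true = 0.20                     # true correlation between x1 and x2

# Spearman (1904): attenuation of the observable correlation
r_observed = r_true * math.sqrt(rel_x1 * rel_x2)            # ≈ 0.15

# Busemeyer & Jones (1983): reliability of the product term x1*x2
rel_x1x2 = (rel_x1 * rel_x2 + r_true**2) / (1 + r_true**2)  # ≈ 0.577

# The lower bound (uncorrelated x1 and x2) is simply the product:
rel_lower = rel_x1 * rel_x2                                 # 0.56
```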
Figure 6. Reliability.

Example of the effect of reliability, using the same effects as in Figure 2 (N=940, the interaction effect is r=0.09, the correlation between X1 and Y is r=0.51, the correlation between X2 and Y is r=0.22, and the correlation between X1 and X2 is r=0.09). Reliability is controlled with the rel.x1, rel.x2, and rel.y flags, and can also be adjusted in the online web apps. While the analysis has 90% power when all variables have a reliability of 1.0, power decreases with less-than-perfect reliability. For example, with ‘good’ reliability (all reliabilities are 0.8) the analysis has 60% power, and a sample size of N=2,023 (more than double the size) would be required to achieve 90% power.
Binary and ordinal variables
It is frequently the case in the social and psychological sciences that not all variables are continuous. The power_interaction() function allows users to specify that variables are binary (e.g., sex or group) or have multiple discrete levels (e.g., Likert scales). For example, Y can be a binary outcome, and analyses will be run as a logistic regression (Figure 7A; note that the output of generate_interaction() is always z-scored, but the test_interaction() and plot_interaction() functions detect that Y has two levels and convert it to a binary factor with levels ‘0’ and ‘1’ prior to analysis). Transforming a continuous variable to be discrete is well known to impact power in many cases, as it attenuates correlations (Cohen, 1983). By setting the adjust.correlations flag to FALSE, users can evaluate the impact of a discrete transformation on power (Figure 7B).
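The attenuation produced by discretizing a continuous variable can be seen directly by simulation. For a median split of a normal variable, the correlation with another normal variable shrinks by a factor of roughly √(2/π) ≈ 0.80 (Cohen, 1983); a quick check (the true correlation of r = 0.5 is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 200_000, 0.5

# Bivariate normal sample with true correlation r
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=n).T

# Median-split x into a binary variable and re-estimate the correlation
x_bin = (x > np.median(x)).astype(float)
r_bin = float(np.corrcoef(x_bin, y)[0, 1])   # ≈ 0.5 * 0.798 ≈ 0.40
```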
Figure 7. Binary and ordinal variables.

A) An example interaction when Y is a binary variable, run as a logistic regression. All other parameters are the same as in Figure 2 (the correlation between X1 and Y is r=0.51, the correlation between X2 and Y is r=0.22, the correlation between X1 and X2 is r=0.09, and the interaction effect size is r=0.09). B) Y is continuous and all parameters are the same as in (A). The sample size is N=940. X1 and X2 are artificially discretized (thereby attenuating correlations) in the power_interaction() function. The number of discrete values is set by the k.x1, k.x2, and k.y flags. If discretization is artificial (after the fact), set the adjust.correlations flag to FALSE (it is TRUE by default).
Selecting an appropriate effect size
A power analysis cannot tell the user what interaction effect size is appropriate. However, it should be noted that prior work has found that interaction effects are commonly quite small. In a review of 261 papers in psychology journals which reported interaction effects, Aguinis et al. found a median effect of f² = 0.002 (approximately r=0.044) (Aguinis et al., 2005). Recent work in personality science suggests that most replicable interaction effects are in fact smaller than this (median r=0.022) (Vize et al., 2022). Similarly, recent work in human statistical genetics has found that while interactions are pervasive, they are largely orders of magnitude smaller than main effects (Zhu et al., 2022). These and related observations have led to prominent commentaries suggesting that researchers should plan on interaction effects that are at most a third of the size of main effects (i.e., an attenuated effect) (Gelman, 2018), as well as theoretical work suggesting that most observable interactions are necessarily quite small (Tosh et al., 2021). Thus, if researchers are unsure what interaction effect to plan for or what shape to expect, we recommend planning for an attenuated interaction effect that is at most half the size of the main effects.
We also remind the reader that while there are many methods for determining the effects used in a power analysis, one should not draw these effects directly from an analysis that has already been performed (Gelman & Carlin, 2014; Zhang et al., 2019), unless one is planning a replication study. We recommend drawing effects from prior large studies and meta-analyses. In the absence of a published effect, we recommend conducting the power analysis at multiple reasonable values of the unknown parameter(s).
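One way to implement this multiple-values recommendation is a small sensitivity grid. The sketch below is a rough Python approximation, not a call to InteractionPoweR: it treats the interaction effect as if it were a simple correlation and applies the Fisher z-based sample-size formula (Cohen, 1988), ignoring main effects, their correlation, and reliability — all factors the package handles properly — so it yields only a ballpark figure.

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r with a two-sided
    test, via the Fisher z transformation: N = ((z_a + z_b)/atanh(r))^2 + 3."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)

# Sweep a range of plausible interaction effect sizes when the true
# value is unknown (spanning the medians reported in the literature)
for r in (0.022, 0.044, 0.10):
    print(r, n_for_correlation(r))
```

Because required N grows roughly with 1/r², halving the planned effect size approximately quadruples the required sample, which is why the choice of effect size deserves this kind of sensitivity check.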
Conclusions
InteractionPoweR is a novel and useful addition to a researcher’s toolkit, as it can easily compute power for statistical tests of interactions. It is currently the only package that allows users to set many of the parameters which affect power, including cross-sectional effect sizes, the correlation between variables, variable reliability, and variable distributions. Without considering all of these in tandem, one can arrive at an incorrect power estimate, leading to studies that are either grossly underpowered or needlessly overpowered (a waste of resources). Thus, the InteractionPoweR package can be a very helpful resource for researchers planning studies of interactions. Future work will seek to extend these power analyses to interaction models with covariates and to mixed-effects models.
Supplementary Material
Table 2. Further reading.
Selected references recommended for further reading.
| Description | Citation |
|---|---|
| Power is strongly influenced by variance explained independent of the interaction term. | McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376–390. https://doi.org/10.1037/0033-2909.114.2.376 |
| An interaction is not, in and of itself, evidence for a causal effect. | Rohrer, J. M., & Arslan, R. C. (2021). Precise Answers to Vague Questions: Issues With Interactions. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211007370. https://doi.org/10.1177/25152459211007368 |
| How to tell when interaction estimates will benefit from centering. | Olvera Astivia, O. L., & Kroc, E. (2019). Centering in Multiple Regression Does Not Always Reduce Multicollinearity: How to Tell When Your Estimates Will Not Benefit From Centering. Educational and Psychological Measurement, 79(5), 813–826. https://doi.org/10.1177/0013164418817801 |
| Issues surrounding omitted variable bias specific to interactions. | Keller, M. C. (2014). Gene × environment interaction studies have not properly controlled for potential confounders: The problem and the (simple) solution. Biological Psychiatry, 75(1), 18–24. https://doi.org/10.1016/j.biopsych.2013.09.006 |
| Estimated interaction effects are generally quite small. | Vize, C. E., Sharpe, B. M., Miller, J. D., Lynam, D. R., & Soto, C. J. (2022). Do the Big Five personality traits interact to predict life outcomes? Systematically testing the prevalence, nature, and effect size of trait-by-trait moderation. European Journal of Personality. https://doi.org/10.1177/08902070221111857 |
| Moving beyond simple-slopes for the interpretation of interactions. | Finsaas, M. C., & Goldstein, B. L. (2021). Do simple slopes follow-up tests lead us astray? Advancements in the visualization and reporting of interactions. Psychological Methods, 26(1), 38–60. https://doi.org/10.1037/met0000266 |
| Study-planning should consider more than just power. | Gelman, A., & Carlin, J. (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651. https://doi.org/10.1177/1745691614551642 |
Funding
This work was funded by the National Institutes of Health: Dr. Baranger (R21 AA027827; R01 DA054869; T32 MH018951), Dr. Finsaas (T32 MH013043), Dr. Vize (T32 MH018269), and Dr. Olino (R01-MH107495).
Footnotes
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Code and material availability
Expanded hyperlinks are available in Supplemental Table 1. The InteractionPoweR R package is freely available for download at: https://cran.r-project.org/web/packages/InteractionPoweR/index.html and https://dbaranger.github.io/InteractionPoweR/. Underlying code is available at: https://github.com/dbaranger/InteractionPoweR. The interactive Shiny App for simulations is available at: https://mfinsaas.shinyapps.io/InteractionPoweR/. The interactive Shiny App for analytic power is available at: https://david-baranger.shinyapps.io/InteractionPoweR_analytic/. The R software environment is freely available at: https://www.r-project.org/. RStudio is freely available at: https://www.rstudio.com/products/rstudio/. The code used in this tutorial is available in the Supplement and at https://osf.io/ntze5/.
References
- Aberson CL (2019). Applied power analysis for the behavioral sciences (2nd ed.). Routledge, Taylor & Francis Group.
- Aguinis H, Beaty JC, Boik RJ, & Pierce CA (2005). Effect Size and Power in Assessing Moderating Effects of Categorical Variables Using Multiple Regression: A 30-Year Review. Journal of Applied Psychology, 90(1), 94–107. 10.1037/0021-9010.90.1.94
- Aiken LS, & West SG (1991). Multiple regression: Testing and interpreting interactions. Sage Publications, Inc.
- Altmejd A, Dreber A, Forsell E, Huber J, Imai T, Johannesson M, Kirchler M, Nave G, & Camerer C (2019). Predicting the replicability of social science lab experiments. PLoS ONE, 14(12), e0225826. 10.1371/journal.pone.0225826
- Anderson SF, & Kelley K (2022). Sample size planning for replication studies: The devil is in the design. Psychological Methods. 10.1037/met0000520
- Baranger DAA, Few LR, Sheinbein DH, Agrawal A, Oltmanns TF, Knodt AR, Barch DM, Hariri AR, & Bogdan R (2020). Borderline Personality Traits Are Not Correlated With Brain Structure in Two Large Samples. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(7), 669–677. 10.1016/j.bpsc.2020.02.006
- Beck ED, & Jackson JJ (2020). A Mega-Analysis of Personality Prediction: Robustness and Boundary Conditions. PsyArXiv. 10.31234/osf.io/7pg9b
- Belsky J, & Pluess M (2009). Beyond diathesis stress: Differential susceptibility to environmental influences. Psychological Bulletin, 135(6), 885–908. 10.1037/a0017376
- Bondy E, Baranger DAA, Balbona J, Sputo K, Paul SE, Oltmanns TF, & Bogdan R (2021). Neuroticism and reward-related ventral striatum activity: Probing vulnerability to stress-related depression. Journal of Abnormal Psychology, 130(3), 223–235. 10.1037/abn0000618
- Brunner J, & Austin PC (2009). Inflation of Type I error rate in multiple regression when independent variables are measured with error. Canadian Journal of Statistics, 37(1), 33–46. 10.1002/cjs.10004
- Busemeyer JR, & Jones LE (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93(3), 549–562. 10.1037/0033-2909.93.3.549
- Cohen J (1983). The cost of dichotomization. Applied Psychological Measurement, 7(3), 249–253. 10.1177/014662168300700301
- Cohen J (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. 10.4324/9780203771587
- Diegert P, Masten MA, & Poirier A (2022). Assessing Omitted Variable Bias when the Controls are Endogenous (arXiv:2206.02303). arXiv. http://arxiv.org/abs/2206.02303
- Faul F, Erdfelder E, Lang A-G, & Buchner A (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. 10.3758/BF03193146
- Finsaas MC, & Goldstein BL (2021). Do simple slopes follow-up tests lead us astray? Advancements in the visualization and reporting of interactions. Psychological Methods, 26(1), 38–60. 10.1037/met0000266
- Friedman L, & Wall M (2005). Graphical Views of Suppression and Multicollinearity in Multiple Linear Regression. The American Statistician, 59(2), 127–136.
- Gelman A (2018). You need 16 times the sample size to estimate an interaction than to estimate a main effect. Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2018/03/15/need-16-times-sample-size-estimate-interaction-estimate-main-effect/
- Gelman A, & Carlin J (2014). Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors. Perspectives on Psychological Science, 9(6), 641–651. 10.1177/1745691614551642
- Hainmueller J, Mummolo J, & Xu Y (2019). How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice. Political Analysis, 27(2), 163–192. 10.1017/pan.2018.46
- Keller MC (2014). Gene × environment interaction studies have not properly controlled for potential confounders: The problem and the (simple) solution. Biological Psychiatry, 75(1), 18–24. 10.1016/j.biopsych.2013.09.006
- Kühberger A, Fritz A, & Scherndl T (2014). Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size. PLoS ONE, 9(9), e105825. 10.1371/journal.pone.0105825
- Lakens D, & Caldwell AR (2021). Simulation-Based Power Analysis for Factorial Analysis of Variance Designs. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920951503. 10.1177/2515245920951503
- Maxwell SE (2000). Sample size and multiple regression analysis. Psychological Methods, 5, 434–458. 10.1037/1082-989X.5.4.434
- McClelland GH, & Judd CM (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376–390. 10.1037/0033-2909.114.2.376
- Olvera Astivia OL, & Kroc E (2019). Centering in Multiple Regression Does Not Always Reduce Multicollinearity: How to Tell When Your Estimates Will Not Benefit From Centering. Educational and Psychological Measurement, 79(5), 813–826. 10.1177/0013164418817801
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science. 10.1126/science.aac4716
- Revelle W, & Condon DM (2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31(12), 1395–1411. 10.1037/pas0000754
- Rohrer JM, & Arslan RC (2021). Precise Answers to Vague Questions: Issues With Interactions. Advances in Methods and Practices in Psychological Science, 4(2), 25152459211007370. 10.1177/25152459211007368
- Rohrer JM, Hünermund P, Arslan RC, & Elson M (2022). That’s a Lot to Process! Pitfalls of Popular Path Models. Advances in Methods and Practices in Psychological Science, 5(2), 25152459221095828. 10.1177/25152459221095827
- Sherman R, & Pashler H (2019). Powerful Moderator Variables in Behavioral Science? Don’t Bet on Them (Version 3). PsyArXiv. 10.31234/osf.io/c65wm
- Shieh G (2010). On the Misconception of Multicollinearity in Detection of Moderating Effects: Multicollinearity Is Not Always Detrimental. Multivariate Behavioral Research, 45(3), 483–507. 10.1080/00273171.2010.483393
- Sommet N, Weissman D, Cheutin N, & Elliot AJ (2022). How many participants do I need to test an interaction? Conducting an appropriate power analysis and achieving sufficient power to detect an interaction. OSF Preprints. 10.31219/osf.io/xhe3u
- Spearman C (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101. 10.2307/1412159
- Tosh C, Greengard P, Goodrich B, Gelman A, Vehtari A, & Hsu D (2021). The piranha problem: Large effects swimming in a small pond. arXiv:2105.13445 [math, stat]. http://arxiv.org/abs/2105.13445
- Vize CE, Sharpe BM, Miller JD, Lynam DR, & Soto CJ (2022). Do the Big Five personality traits interact to predict life outcomes? Systematically testing the prevalence, nature, and effect size of trait-by-trait moderation. European Journal of Personality. 10.1177/08902070221111857
- Widaman KF, Helm JL, Castro-Schilo L, Pluess M, Stallings MC, & Belsky J (2012). Distinguishing Ordinal and Disordinal Interactions. Psychological Methods, 17(4), 615–622. 10.1037/a0030003
- Wilms R, Mäthner E, Winnen L, & Lanwehr R (2021). Omitted variable bias: A threat to estimating causal relationships. Methods in Psychology, 5, 100075. 10.1016/j.metip.2021.100075
- Zhang Y, Hedo R, Rivera A, Rull R, Richardson S, & Tu XM (2019). Post hoc power analysis: Is it an informative and meaningful analysis? General Psychiatry, 32(4), e100069. 10.1136/gpsych-2019-100069
- Zhu C, Ming MJ, Cole JM, Kirkpatrick M, & Harpak A (2022). Amplification is the Primary Mode of Gene-by-Sex Interaction in Complex Human Traits. bioRxiv. 10.1101/2022.05.06.490973