Abstract
Tests of statistical interactions (or moderation effects) are a common way for personality disorder researchers to examine nuanced hypotheses relevant to personality pathology. However, the nature of statistical interactions makes them difficult to detect reliably in many research scenarios. The present study used a flexible, simulation-based approach to estimate statistical power to detect trait-by-trait interactions common to psychopathy research using the Triarchic Model of Psychopathy and the Psychopathic Personality Inventory. Our results show that even above-average sample sizes in these literatures (e.g., N = 428) provide inadequate power to reliably detect trait-by-trait interactions, and the sample sizes needed to detect interaction effects of realistic size are extremely large, ranging from approximately 1,300 to 5,200. We discuss the implications for trait-by-trait interactions in psychopathy, as well as how the present findings may generalize to other areas of personality disorder research. We provide recommendations for designing studies that can yield informative tests of interactions in personality disorder research, but also highlight that a more realistic option may be to abandon the traditional approach to testing interaction effects in favor of alternative approaches that may be more productive.
Keywords: Moderation, statistical interaction, psychopathy, statistical power, power analysis
Statistical interactions, whether tested using multiple regression or related statistical models (e.g., analysis of variance), occur when one variable’s effect on an outcome depends on the values of a separate variable, referred to as a moderator variable. Personality disorder research often includes tests for statistical interactions. These tests typically take the form of an interaction between maladaptive traits and a moderator variable, including experimental condition (e.g., Chapman et al., 2010), demographic variables like race (e.g., Haliczer et al., 2020) or gender (e.g., Verona et al., 2012), developmental variables (e.g., Gratz et al., 2011), or other traits (e.g., Lilienfeld et al., 2019; Vize et al., 2016). Such tests allow researchers to explore diverse and nuanced questions derived from models that emphasize factors influencing the development, expression, and maintenance of personality pathology (Wright et al., 2016).
Tests of statistical interactions, however, involve various methodological obstacles that negatively impact statistical power (the probability of finding an effect that exists in the population).1 These obstacles include unreliability of the interaction product term (e.g., Busemeyer & Jones, 1983), restricted range of values in predictor variables (McClelland & Judd, 1993), imbalanced data in categorical moderators (e.g., Frazier et al., 2004), and small effect sizes for interaction effects (Aguinis et al., 2005; Aguinis & Stone-Romero, 1997; Chaplin, 1991; Murphy & Russell, 2017). Unfortunately, these limitations are rarely given sufficient attention. Using trait-by-trait interactions as examples, we show that typical sample sizes provide insufficient power to test for interactions. Importantly, the results likely generalize to many other areas of personality disorder research, and to clinical science more broadly.
Statistical Interactions in Psychopathy Research
There is an explicit focus on trait-by-trait interaction tests in psychopathy research. Like other personality disorders, psychopathy is a multidimensional construct. There is consistent agreement that psychopathy comprises traits related to antagonism (i.e., callousness, manipulativeness) and disinhibition, but disagreement about the role of traits related to fearlessness or boldness (Lilienfeld et al., 2012; Miller & Lynam, 2012). In light of the weak or absent relations between boldness and psychopathy-relevant outcomes such as externalizing behaviors (Miller & Lynam, 2012; Crowe et al., 2021; Sleep et al., 2019), some researchers have posited interactive effects between boldness and other psychopathy components.
Specifically, Lilienfeld et al. (2019) argued that personality disorders (most notably the Cluster B personality disorders) can be understood as emergent interpersonal syndromes, which 1) are composed of distinct traits that may be unrelated or negatively related to one another and 2) cannot be captured by purely additive combinations of traits, instead arising from interactions among the traits. This latter feature of emergent interpersonal syndromes is argued to give rise to the impairment associated with specific types of personality pathology. Importantly, common instruments used to assess psychopathy, ranging from the Psychopathy Checklist-Revised (PCL-R; Hare, 2003) to various self-report instruments (e.g., Elemental Psychopathy Assessment; Lynam et al., 2011; Triarchic Psychopathy Measure [TriPM]; Patrick & Drislane, 2014), were designed to have a multidimensional structure, allowing for straightforward tests of these interactions. Numerous trait-by-trait interaction tests have been conducted, but there is little empirical evidence supporting consistent interaction effects (see Benning & Smith, 2019). These results could be due to the absence of such effects, to p-hacking and/or selective reporting, or to the fact that the odds are against finding such effects in many applied scenarios (McClelland & Judd, 1993). We explore the latter possibility here.
Methodological Difficulties Surrounding Interaction Tests
Although various methodological factors decrease the ability to detect interaction effects, we focus primarily on two: the small magnitude of interaction effects and measurement error. Meta-analytic evidence suggests that typical interaction effect sizes are substantially smaller than main effects. Surveying 30 years of published literature examining moderation effects in organizational psychology (636 interaction effect sizes), Aguinis et al. (2005) found that the median effect size was f2 = .002, five times smaller than Cohen’s (1988) benchmark for a small effect (i.e., 1% of variance explained). The Aguinis et al. (2005) meta-analysis focused on interactions between continuous and categorical variables, but similar concerns have been raised in other contexts (i.e., interactions between continuous variables; McClelland & Judd, 1993; Murphy & Russell, 2017). Such small effects are to be expected given the nature of interaction effects (McClelland & Judd, 1993; Tosh et al., 2021).
Measurement error is an underappreciated issue in the detection of interactions. Importantly, when testing for an interaction, the reliability of the product term will be lower than the reliabilities of the individual predictors (Cohen et al., 2003). The reliability of the product term is roughly equal to the product of the reliabilities of the two predictors; if two predictors each have alpha values of .80, the reliability of the product term will be close to .64. Although measurement error is known to attenuate effect sizes, which in turn reduces statistical power, popular power analysis software such as G*Power (Erdfelder et al., 2009) does not account for measurement error, resulting in upwardly biased estimates of power.
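One commonly used approximation makes this more precise. For mean-centered predictors, the reliability of the product term depends on the two predictor reliabilities (ρ11, ρ22) and the observed correlation between the predictors (r12) (Busemeyer & Jones, 1983; Cohen et al., 2003):

$$\rho_{x_1 x_2} \;\approx\; \frac{r_{12}^{2} + \rho_{11}\,\rho_{22}}{r_{12}^{2} + 1}$$

which reduces to the simple product of the reliabilities (e.g., .80 × .80 = .64) when the predictors are uncorrelated.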
Importantly, low statistical power has implications beyond the Type II error rate; low power also shapes the rate at which Type I errors enter the literature and is a primary driver of the replication crisis in psychology. For a given study, a small sample size increases the risk of misestimation, increases the probability that a statistically significant result is a false positive, and can populate the published literature with effect size estimates that are much larger than, or even opposite in sign to, the true effect (Gelman & Carlin, 2014; Segerstrom & Boggero, 2020). Underpowered statistical tests lead to a proliferation of Type I errors in the literature by creating asymmetric hypothesis tests whose results are only considered meaningful if a statistically significant effect is found. Low power, along with publication incentives, also contributes to an inflated Type I error rate by creating the conditions for p-hacking and HARKing: if a hypothesized effect is not found in an underpowered study, a significant effect must be found somewhere in the data if the study is to be published.
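As a small illustration of this point (our own sketch, not an analysis from the studies cited above), the following R snippet simulates many underpowered studies of a small true correlation and examines the average estimate among the subset of statistically significant results:

```r
# Effect size exaggeration ("Type M" error) in underpowered studies:
# among significant results, the estimated correlation is inflated.
set.seed(123)
true_r <- .10   # small true effect
N      <- 80    # small sample
est <- replicate(10000, {
  x <- rnorm(N)
  y <- true_r * x + sqrt(1 - true_r^2) * rnorm(N)
  test <- cor.test(x, y)
  c(r = unname(test$estimate), sig = test$p.value < .05)
})
mean(est["r", est["sig", ] == 1])  # average estimate among "significant" studies
```

With these settings, the average "significant" estimate should be well over twice the true effect of .10, because only samples that happen to produce large estimates cross the significance threshold.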
Current Study
Data are needed to inform the design of studies that can reliably detect interaction effects in personality disorder research. Researchers must be aware of the conditions necessary to reliably detect these interactions, and meeting those conditions is likely more difficult than many researchers believe. Using a Monte Carlo simulation design and representative effect sizes from the psychopathy literature, we examine power to detect trait-by-trait interactions in applied research scenarios.
Method
To estimate power to detect trait-by-trait interactions, we conducted simulation-based power analyses in R (v.4.0.3; R Core Team, 2021) using RStudio (v1.3.1093; RStudio Team, 2020) and the ‘InteractionPoweR’ package (Baranger et al., 2021). The InteractionPoweR package is designed for power analyses when the effect of interest is a two-way interaction.2 The package allows users to vary multiple parameters relevant to power, including the main effects of the predictors, the magnitude of the interaction effect, the reliabilities of the predictor and outcome variables, and the intercorrelation between the predictors.3
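To illustrate the package’s interface, a single power estimate can be requested as sketched below. The argument names reflect our reading of the package version cited above and should be verified against the current documentation; the predictor intercorrelation shown is a placeholder rather than one of the meta-analytic values used in our analyses.

```r
# Illustrative sketch of one InteractionPoweR run (argument names assumed from
# the package version cited in the text; r.x1.x2 below is a placeholder value).
library(InteractionPoweR)

power_interaction(
  n.iter   = 2000,  # simulated data sets per parameter combination
  N        = 428,   # sample size (75th percentile of TriPM studies)
  r.x1.y   = .14,   # main effect of predictor 1 (large TriPM-Boldness effect; Table 1)
  r.x2.y   = .47,   # main effect of predictor 2 (large TriPM-Disinhibition effect; Table 1)
  r.x1x2.y = .10,   # interaction (moderation) effect size
  r.x1.x2  = .20,   # predictor intercorrelation (placeholder)
  rel.x1   = .80,   # reliability of predictor 1
  rel.x2   = .80,   # reliability of predictor 2
  rel.y    = .80,   # reliability of the outcome
  alpha    = .05    # Type I error rate
)
```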
Estimates of main effects and predictor intercorrelations were taken from two prior meta-analyses of the TriPM and the Psychopathic Personality Inventory-Revised (PPI-R; Lilienfeld & Andrews, 1996). Specifically, we derived main effects for the three TriPM subscales of Boldness, Meanness, and Disinhibition from Sleep et al. (2019), which reported the relations between the TriPM scales and an externalizing behavior outcome category. Average intercorrelations among the TriPM scales were also drawn from Sleep et al. (2019). Main effect values and intercorrelations for the PPI-R factors of Fearless Dominance and Self-centered Impulsivity were drawn from Miller and Lynam (2012), which also reported results for an externalizing behavior outcome. Table 1 provides an overview of the main effect values used as inputs for our power analyses; small, median, and large effects correspond to the minimum, median, and maximum effect sizes observed for each domain in the relevant meta-analysis.
Table 1.
Psychopathy Scale Effect Sizes with Externalizing Outcome
| Psychopathy Scale | Small Effect Size (r) | Median Effect Size (r) | Large Effect Size (r) |
|---|---|---|---|
| TriPM-Boldness | .05 | .11 | .14 |
| TriPM-Meanness | .16 | .39 | .51 |
| TriPM-Disinhibition | .30 | .44 | .47 |
| PPI-R Fearless Dominance | −.04 | .06 | .12 |
| PPI-R Self-centered Impulsivity | .26 | .39 | .45 |
Note: TriPM = Triarchic Psychopathy Measure; PPI-R = Psychopathic Personality Inventory-Revised. Effect sizes for the TriPM are taken from Sleep et al. (2019); PPI-R effect sizes are taken from Miller & Lynam (2012). Small, median, and large effect sizes represent the minimum, median, and maximum effect sizes reported in the relevant meta-analysis for a given domain.
We examined power to detect a range of interaction effect sizes, using Pearson’s r as our effect size metric. The interaction effect sizes examined were rs of .10, .15, .20, and .25. Importantly, our analyses assumed no essential collinearity among the predictors and product term after centering. Thus, our simulation approach is akin to a scenario where the predictor variables are normally distributed and no skew is present in either variable, which would otherwise introduce essential collinearity among the two predictors and the product term. As a result, our effect sizes can also be conveyed in terms of the increment in R2 after adding the interaction term to the regression model. Corresponding ΔR2 values of our effect sizes are .01, .02, .04, and .06.4 Last, we examined two measurement error scenarios—one in which all predictor and outcome variables had reliability values of .80, and one in which all reliability values were .90. These values were chosen to be representative of most reported internal consistency values for self-report measures in the psychopathy literature.
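Because the centered product term is uncorrelated with the predictors under these assumptions, the interaction correlation converts directly into the increment in explained variance:

$$\Delta R^{2} \;=\; r_{x_1 x_2,\,y}^{2}$$

so that, for example, an interaction of r = .15 corresponds to ΔR2 ≈ .02 and an interaction of r = .25 corresponds to ΔR2 ≈ .06.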
We also varied sample size, again using data from the two meta-analyses mentioned previously to examine small (25th percentile), median (50th percentile), and large (75th percentile) sample sizes based on the published literature. For our TriPM analyses, the respective sample sizes were 159, 260, and 428. For the PPI-R, the sample sizes were 99, 217, and 399. All code needed to run the power analyses, as well as the full simulation results, is available at https://osf.io/e8afy/?view_only=df4f124d1be6420dbd6c7cf293d12a1b.
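For readers who want to see the underlying logic, the snippet below is a minimal base-R sketch of the kind of simulation-based power estimate described above. It is our own illustration rather than the InteractionPoweR implementation, and the predictor intercorrelation in the example call is a placeholder value.

```r
# Minimal base-R sketch of the simulation logic described in the text (not the
# InteractionPoweR implementation). r1y/r2y = main effects, r3y = interaction
# effect, r12 = predictor intercorrelation, rel.* = reliabilities.
sim_interaction_power <- function(N, r1y, r2y, r3y, r12,
                                  rel.x1 = .80, rel.x2 = .80, rel.y = .80,
                                  n.iter = 2000, alpha = .05) {
  # Regression weights that reproduce the target correlations when the latent
  # predictors are standard bivariate normal (the centered product term is then
  # uncorrelated with each predictor and has variance 1 + r12^2).
  b1 <- (r1y - r12 * r2y) / (1 - r12^2)
  b2 <- (r2y - r12 * r1y) / (1 - r12^2)
  b3 <- r3y / sqrt(1 + r12^2)
  resid.var <- 1 - (b1^2 + b2^2 + 2 * b1 * b2 * r12 + b3^2 * (1 + r12^2))
  stopifnot(resid.var > 0)

  # Measurement error: mix the latent score with noise so the observed score
  # correlates sqrt(rel) with the latent score.
  add_error <- function(x, rel) sqrt(rel) * x + sqrt(1 - rel) * rnorm(length(x))

  sig <- replicate(n.iter, {
    x1 <- rnorm(N)
    x2 <- r12 * x1 + sqrt(1 - r12^2) * rnorm(N)
    y  <- b1 * x1 + b2 * x2 + b3 * x1 * x2 + rnorm(N, sd = sqrt(resid.var))
    fit <- lm(add_error(y, rel.y) ~ add_error(x1, rel.x1) * add_error(x2, rel.x2))
    summary(fit)$coefficients[4, 4] < alpha  # p-value for the product term
  })
  mean(sig)  # estimated power
}

# Example: median TriPM sample size, large Boldness and Disinhibition main
# effects from Table 1, an interaction of r = .10, and a placeholder
# intercorrelation of .20.
set.seed(2021)
sim_interaction_power(N = 260, r1y = .14, r2y = .47, r3y = .10, r12 = .20)
```

Each iteration generates latent predictors with the target correlation structure, contaminates the observed scores with measurement error, and records whether the product term is significant; the proportion of significant iterations is the power estimate.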
Results5
Simulation results are presented in Figures 1 and 2, with Figure 1 showing results when all predictor and outcome variables had reliabilities of .80 and Figure 2 showing results when all reliabilities were .90. When the interaction effect size was .10 (ΔR2 = .01), power to detect psychopathy trait-by-trait interactions was far below the conventional threshold of .80, regardless of the degree of measurement error in the variables. The highest power estimate for an interaction effect size of r = .10 was .54, which was the power to detect an interaction between TriPM-Meanness and TriPM-Boldness in a sample of N = 428, assuming large main effects for both predictors and reliabilities of .90 for all variables. As the interaction effect sizes increased, power increased correspondingly. However, in all but a few cases, small sample sizes were underpowered to detect interactions even when the interaction effect was very large (i.e., r = .25). Similarly, when the interaction effect size was r = .15, power was always below .80 when the reliabilities of the predictors and outcome were equal to .80. This was generally also the case when reliabilities were .90, although with median or large sample sizes and large main effects, power to detect interactions of r = .15 reached acceptable levels (.80 or higher) in seven cases. Higher power estimates occurred when interaction effect sizes were very large (rs = .20 or .25), but high power in these cases still depended on sample size and measurement error; power estimates were consistently above .90 when variables were highly reliable and median or large sample sizes were examined. Figures 1 and 2 also underscore the impact of even a modest decrease in reliability (i.e., from .90 to .80) on power. For example, holding sample size constant (N = 428) and focusing on the r = .10 interaction between TriPM-Boldness and TriPM-Meanness, the decrease in reliability reduced power estimates by 10% to 14%.
Figure 1. Power Estimates to Detect Two-way Interactions when α = .80.
Note: Number of simulations for each unique parameter combination = 2000; all predictor and outcome reliabilities = .80; Sample sizes represent the 25th, 50th, and 75th percentiles from studies reported in the meta-analyses; small, medium, and large main effects for the psychopathy subscales are drawn from Table 1.
Figure 2. Power Estimates to Detect Two-way Interactions when α = .90.
Note: Number of simulations for each unique parameter combination = 2000; all predictor and outcome reliabilities = .90; Sample sizes represent the 25th, 50th, and 75th percentiles from studies reported in the meta-analyses; small, medium, and large main effects for the psychopathy subscales are drawn from Table 1.
To examine the sample size needed to reliably detect interaction effects of r = .10 and r = .05 (with .05 being roughly equivalent to the median meta-analytic interaction effect size found by Aguinis et al., 2005), we ran additional sensitivity analyses focused on the interaction between TriPM-Boldness and TriPM-Disinhibition, assuming that the reliabilities of the TriPM scales and the outcome variable were all .80.6 The results showed that detecting a Boldness x Disinhibition interaction of r = .10 with 80% power would require a sample of approximately 1,300, while a sample of approximately 5,200 would be needed to detect an interaction effect of r = .05. These sample sizes are five and twenty times, respectively, the median sample size of studies included in Sleep and colleagues’ (2019) meta-analysis of the TriPM.
Discussion
Using simulation-based methods that flexibly incorporate the factors that affect power when testing interactions, we found that typical research scenarios are notably underpowered to detect psychopathy trait-by-trait interactions. Only when interaction effect sizes were very large, and frankly implausible, did power approach or exceed acceptable thresholds. In fact, our smallest interaction effect size of .10 may be optimistic, as it corresponds to roughly five times as much explained variance as the median interaction effect identified in past reviews (Aguinis et al., 2005), and similarly sized interaction effects have been described as optimistic in general personality research (Chaplin, 1991). Though we focused on parameters that reflect typically observed values in psychopathy research, the current findings have implications for the broader field of personality disorder research, where interaction tests are frequently reported.
Implications for Trait-by-trait Interactions in Psychopathy
The present results suggest that the PPI-R and TriPM literatures are underpowered to detect trait-by-trait interactions when considering the factors (e.g., reliability, predictor intercorrelations, sample sizes) that affect power. The simulation results underscore what researchers have previously highlighted: statistical interactions will be difficult to detect (McClelland & Judd, 1993; Benning & Smith, 2019). Though the PPI-R and TriPM are two of the more popular measures of psychopathy, research using other popular measures has also found little evidence of interactive effects. Using the PCL-R, Kennealy and colleagues (2010) examined whether the interpersonal and affective components (i.e., PCL-R Factor 1) interacted with impulsive antisocial behavioral features (i.e., PCL-R Factor 2) to predict violent behavior independent of the additive effects of the two factors. This is the most well-powered test of a trait-by-trait interaction in psychopathy (N = 10,555), and the authors found no evidence of an interaction between PCL-R Factors 1 and 2 in predicting violence (d = .00). Although some researchers have suggested that high intercorrelations between predictors (i.e., PCL-R Factors 1 and 2) will negatively impact power (Lilienfeld et al., 2019; Benning & Smith, 2019), the opposite is true in most cases.7 Our simulation results show that TriPM scales with greater overlap (i.e., Meanness and Disinhibition) had increased power to detect trait-by-trait interactions compared to scales with less overlap (e.g., Meanness and Boldness), albeit to a relatively small degree.
The current results suggest that typical research scenarios do not allow for well-powered tests of trait-by-trait interactions, and thus many studies that have tested for such interactions are not informative. This applies to studies reporting positive results, but also to studies reporting null results (e.g., Vize et al., 2016). Other researchers have noted that sample sizes will likely need to exceed 1,000 participants to achieve adequately powered tests of trait-by-trait interactions in psychopathy (Benning & Smith, 2019). Our results show that somewhat larger samples (i.e., N = 1,300) may be needed, and highlight that researchers interested in reliably detecting psychopathy trait-by-trait interactions will require substantially larger samples than are typically observed in psychopathy research.
Implications for Interaction Tests in Personality Disorder Research
Though we have focused on psychopathy trait-by-trait interactions, many of the difficulties in detecting these interactions generalize to other personality disorder research scenarios. For example, Hyatt et al. (2021) examined power to detect interactions in laboratory aggression research, where researchers frequently examine how experimental condition (a factor with a reliability of 1.0) moderates the impact of personality pathology on aggression (e.g., Vize et al., 2021; West et al., 2021). Hyatt and colleagues found that even with interaction effect sizes double those typically found in the literature, adequate power (.80) would require sample sizes of at least N = 1,000.
In addition, though most measures of psychopathy make it simple to test for trait-by-trait interactions, similar multidimensional operationalizations have been developed for narcissism (Back et al., 2013; Miller et al., 2017) and, to a lesser extent, borderline personality disorder. For these other maladaptive personality constructs, trait-by-trait interactions will be just as difficult to detect, given that the statistical features of psychopathy measures (e.g., subscale intercorrelations, typical reliability values, magnitude of main effects) are also characteristic of measures of narcissism and borderline personality disorder. The difficulty in reliably detecting interactions will likewise be present in many other areas of psychological research and beyond. With these difficulties in mind, there are important steps researchers can take to help ensure that tests of interactions are informative and advance the field, many of which overlap with past suggestions from outside the personality disorder field (e.g., Murphy & Russell, 2017).
Improving Research on Interactive Effects
Conceptual Considerations.
We have focused thus far on low power to reliably detect interaction effects, but the conceptual difficulties of statistical interactions are also underappreciated, and past research has highlighted the lack of theoretical support for many tests of interactions (Frazier et al., 2004; Murphy & Russell, 2017; South, 2019). For many research hypotheses, focusing on the additive effects of multiple predictors may be sufficient. As Murphy and Russell (2017) note, interaction effects frequently pale in comparison to main effects; the ability to predict an additional percentage point of outcome variance, or a fraction thereof, may not be of much value. This is particularly important when weighed against the resources required (e.g., sample sizes in the thousands) to detect small interaction effects. Thus, researchers need to provide sufficient theoretical justification for why the interaction matters and for the form the interaction will take (e.g., enhancing, buffering, or antagonistic; Cohen et al., 2003), so that there is a clear link between a hypothesis and the statistical model used to test it.
A more thoughtful consideration of when and why one would test for interactions may reveal that the test is unnecessary. A hypothesis focused on the consequences of a specific combination of maladaptive traits is, in many cases, consistent with simple additive effects, even for conceptualizations of certain personality disorders as emergent interpersonal syndromes (Lilienfeld et al., 2019). If both antagonistic traits and disinhibitory traits relate to antisocial behavior on their own, then their simultaneous presence will lead to greater antisocial behavior than either trait alone; individuals who are both antagonistic and disinhibited will be the most antisocial of all. Interactive effects may become more relevant in the absence of main effects, as is the case for traits related to boldness. Nonetheless, Benning and Smith (2019) highlight that if the source of impairment in emergent interpersonal syndromes is the interaction among traits, one would expect large interaction effect sizes (for two-way as well as three-way interactions concerning psychopathy), an expectation that runs against both the empirical realities surrounding interactive effects and the empirical results within the psychopathy literature.
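To make the additive logic concrete (a simple illustration with hypothetical positive regression weights, not an analysis from the studies cited above):

$$\hat{Y}_{\text{antisocial}} \;=\; b_{1}\,X_{\text{antagonism}} \;+\; b_{2}\,X_{\text{disinhibition}}, \qquad b_{1} > 0,\; b_{2} > 0$$

Under this purely additive model, the highest predicted antisocial behavior already belongs to individuals elevated on both traits; no product term is required for the combination of traits to carry the greatest predicted risk.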
Research Design Considerations.
If a theory or model does posit an interactive effect that is independent of the additive effects of multiple predictors, specific research designs will help ensure that the interaction test is informative. An essential step in building a strong literature of interaction tests is conducting thoughtful power analyses that use realistic parameter estimates to determine sample size and ascertain statistical power. We encourage researchers to utilize software packages that allow for greater flexibility in modeling the various parameters that affect power (e.g., ‘InteractionPoweR’, Baranger et al., 2021; ‘Superpower’, Lakens & Caldwell, 2021). Most features of the ‘InteractionPoweR’ package are also available in an easy-to-use Shiny application (https://mfinsaas.shinyapps.io/InteractionPoweR/; Finsaas et al., 2021) for users less familiar with R. The current results show that typical assumptions in popular power analysis software for interactions (e.g., no measurement error) will lead to systematically optimistic power estimates if left unchecked. Alternatively, structural equation modeling approaches in which latent variables serve as predictors and outcomes can avoid the deleterious effects of measurement error on statistical power (Li et al., 1998). Other researchers have noted that one-sided tests can be used for statistical interactions when researchers have hypotheses about the expected direction of the interaction effect (Benning & Smith, 2019).
More broadly, tests of interactions are no different from other statistical tests in requiring transparent justification of the chosen statistical approach, so that researchers can evaluate the severity of the test. This includes justification of the sample size, the selected Type I error rate, and the expected effect size for the interaction. As previously noted, there is good reason to expect that effect sizes for interactions will be very small, and it is unwise to use typical benchmarks for small effects (Correll et al., 2020). Knowledge about expected effects will in turn inform the broader research design to ensure that the test is informative whether the effect is present or absent. Though there are some exceptions, most interaction tests will require very large samples to ensure adequate power. One approach to reliably detecting small interaction effects may be to pool resources and data, an approach that has been used in areas of psychology outside personality disorder research to conduct highly powered replication studies of various effects (Ebersole et al., 2016; Klein et al., 2018). In psychopathy research, this may be a particularly promising way to evaluate trait-by-trait interactions among popular measures of psychopathy like the TriPM or PPI-R, for which significant amounts of data are already available.
Interpretive Issues Surrounding Interaction Effects.
Last, though we have focused on the conditions necessary to draw conclusions about the presence or absence of an interaction, difficult interpretive hurdles remain after an interaction effect has been observed. Rohrer and Arslan (2021) describe how modeling decisions and the scaling of measures included in the regression model can lead to spurious conclusions about interaction effects. These authors highlight common disconnects between verbal hypotheses about moderation (e.g., whether the hypothesized effect concerns correlations or regression slopes) and the statistical models used to test them, disconnects that may result in incorrect conclusions regarding statistical interactions.
Relatedly, McCabe and colleagues (2018) highlighted the shortcomings of typical approaches to visualizing interactions (i.e., simple slopes and marginal effects plots), which may obscure issues in the data, such as range restriction in the predictor or moderator variables, that would otherwise provide valuable information about how to interpret a detected interaction. McCabe and colleagues (2018) also provide tools for appropriately visualizing the data underlying interaction effects via the ‘interActive’ application (McCabe, 2019), a valuable resource for helping researchers provide substantive interpretations of an observed interaction effect. Other recent work emphasizes similar points regarding the probing of interactions (Finsaas & Goldstein, 2021) and also provides open-source tools that improve traditional approaches to probing interaction effects.
Limitations
There are important features of the present study that may limit the generalizability of the results. First, like all simulation-based studies, the results hinge on the assumptions of the data-generating models. As previously noted, we did not explore the effects of skew in our analyses, nor did we examine interactions between continuous and categorical predictors. Modeling these common features of applied research would likely produce somewhat different results. For example, a categorical variable like experimental condition would not contain measurement error, removing one factor that reduces power. Though we have discussed how certain assumptions of our simulation-based analyses are likely to generalize to other areas of research, we encourage researchers to consider which aspects may limit generalizability to their own area of research.
Second, we have focused on detecting psychopathy trait-by-trait interactions in relation to broad-based externalizing behaviors. Externalizing behaviors like substance use and antisocial behavior are among the most clinically relevant and impairing behaviors tied to psychopathy, but other outcomes are also suitable candidates for trait-by-trait interactions to emerge. Such outcomes would entail different main effect values for the power analyses, which can have some impact on power to detect interactions, though these effects are small relative to other design considerations such as sample size and measurement error. Third, our results speak only to the type of between-subjects designs that are typically used in the personality disorder field. Testing for interactions using other research designs (e.g., repeated assessments) may improve statistical power. Nonetheless, these designs may also struggle to achieve sufficient power to detect interactions (Aguinis et al., 2013).
Future Directions and Conclusion
The ability to hypothesize and test for the presence of interactions in personality disorder research provides important avenues for empirically examining factors that may moderate the development, maintenance, and expression of personality pathology. Consistent with past research, the current study shows that reliably detecting interactions requires substantial resources and careful study design. Ultimately, it is worthwhile to be more circumspect regarding tests of interactions, both in terms of whether the necessary conditions have been met to ensure that the interaction effect has been estimated accurately and whether interactions provide predictive benefits relative to simple additive effects. The present study provides informative data to guide study planning for research aimed at detecting interaction effects. Recently developed, freely available software can provide the necessary tools to researchers interested in detecting and accurately interpreting interaction effects. Using these tools, alongside practices that make clear how severely an interaction hypothesis has been tested (e.g., preregistration of hypotheses and modeling decisions; Lakens, 2019; Nosek et al., 2018), can help strengthen the evidence base for interaction effects in personality disorder research.
Nearly all research papers focused on statistical interactions, including the present one, have reached consistent conclusions about interaction effects: 1) they are very difficult to detect using common approaches in psychological research, and 2) they rarely add substantive predictive value beyond main effects. Thus, we would be remiss not to reiterate past calls (e.g., Murphy & Russell, 2017) to simply end traditional practices for testing statistical interactions in personality disorder research. At minimum, underpowered tests of interactions should not be conducted or reported, given their propensity to introduce Type I errors into the literature.
If completely jettisoning tests of interactions seems too extreme, a lesson might be taken from the field of genetics, which has transitioned from underpowered candidate gene studies to genome-wide association studies (GWAS). The two literatures share some similarities. Candidate gene studies were limited by small samples, and most tests were severely underpowered, resulting in a proliferation of findings that were nearly all false positives (e.g., Border et al., 2019). Like statistical interaction effects, the contribution of any one gene to a psychiatric phenotype like schizophrenia or major depressive disorder, or to personality domains like neuroticism, is incredibly small (Montag et al., 2020). To improve insights into genetic contributions to psychiatric phenotypes, large consortia that leverage “big data,” like the Psychiatric GWAS Consortium (PGC; Sullivan, 2010), have been incredibly successful in identifying meaningful risk variants for various disorders, mostly due to the use of sample sizes in the hundreds of thousands. Using information derived from highly powered GWAS, polygenic risk scores have been developed that show much stronger predictive abilities than single candidate genes (e.g., Linnér et al., 2021). To make similar progress, the field would need to work with a finite set of traits in the same way that GWAS consortia work with a very large but finite genome. Trait models are available for general personality (e.g., Five Factor Model; Costa & McCrae, 1992) and for disordered personality in the form of the Alternative Model of Personality Disorders (Krueger et al., 2012). Using methods that allow for integrative data analysis across samples would also be essential (Hussong et al., 2013).
In sum, tests of interactions in personality disorder research should be conducted only when there is a high likelihood of detecting small effects. Ensuring high power to detect interaction effects requires significant resources, and data sharing may be necessary. Sufficient justification for interaction tests should be provided, and that justification should include why interaction effects provide important information beyond simple additive effects. Ultimately, consideration of the issues raised here can help improve clinical research focused on interaction effects.
Supplementary Material
Funding Note:
Colin E. Vize is funded through the National Institute of Mental Health (NIMH) training grant T32-MH018269; David A. A. Baranger is funded through T32-MH018951; Megan C. Finsaas is funded through NIMH training grant T32-MH013043; Thomas Olino is funded through NIMH R01-MH107495.
Footnotes
As past research has noted, detection of statistical interactions becomes much easier in purely experimental contexts where researchers are able to ensure maximal variability in the interacting variables (McClelland & Judd, 1993). However, the majority of personality disorder research would qualify as what McClelland and Judd (1993) refer to as “field studies.” The current manuscript is focused solely on this latter case as it pertains to interactions.
Our discussion and methods focus on two-way interactions, but past research has shown that the methodological issues applicable to two-way interactions also apply to higher-order interactions, such as three- or four-way interactions. Generally, the methodological difficulties will be more pronounced when testing higher-order interactions. For example, the reliability of a three-way interaction term will be lower than that of a two-way interaction term when measurement error is present in the predictor variables. If all predictor variables have reliabilities of .80, the reliability of the two-way interaction term would be approximately .64, and that of a three-way interaction term approximately .51.
InteractionPoweR has other useful capabilities, such as being able to model dichotomous predictors and outcomes and to incorporate skewness into estimates of power, though we do not use these options in the current paper.
Because we use the correlation between the product term and the outcome as the effect size, the proportion of total y-variance accounted for by the product term remains constant across the analyses. This is in contrast to keeping f2 (i.e., the proportion of remaining variance accounted for) constant across the analyses, which reduces the total y-variance accounted for by the product term as the contribution of the main effects increases.
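For reference, Cohen's f2 for the increment due to the product term is defined relative to the variance left unexplained by the full model:

$$f^{2} \;=\; \frac{R^{2}_{\text{full}} - R^{2}_{\text{main}}}{1 - R^{2}_{\text{full}}} \;=\; \frac{\Delta R^{2}}{1 - R^{2}_{\text{full}}}$$

so holding f2 constant implies a shrinking ΔR2 as the main effects grow, whereas holding the interaction correlation constant fixes ΔR2 directly.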
Simulation results are also presented in table format in the supplementary material available on OSF.
Code and results for the sensitivity analysis are available on the OSF page for the project: https://osf.io/e8afy/?view_only=df4f124d1be6420dbd6c7cf293d12a1b
See equation 2 and the appendix in McClelland & Judd (1993) for an explication of why higher intercorrelations increase power. Briefly, equation 2 shows that when the correlation between predictors increases, there will be a greater increase in the covariance between extreme values in the two predictors relative to the adjustment for “diagonal-ness” in the joint distribution of the predictors. The net effect of this difference is an increase in the power to detect interactions.
Contributor Information
Colin E. Vize, University of Pittsburgh
David A. A. Baranger, Washington University in St. Louis
Megan C. Finsaas, Columbia University
Brandon L. Goldstein, University of Connecticut
Thomas M. Olino, Temple University
Donald R. Lynam, Purdue University
Data Availability:
All code needed to reproduce our simulation results is available on the Open Science Framework at https://osf.io/e8afy/
References
- Aguinis H, Beaty JC, Boik RJ, & Pierce CA (2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology, 90, 94–107. 10.1037/0021-9010.90.1.94
- Aguinis H, & Stone-Romero EF (1997). Methodological artifacts in moderated multiple regression and their effects on statistical power. Journal of Applied Psychology, 82, 192–205. 10.1037/0021-9010.82.1.192
- Aguinis H, Gottfredson RK, & Culpepper SA (2013). Best-practice recommendations for estimating cross-level interaction effects using multilevel modeling. Journal of Management, 39, 1490–1528. 10.1177/0149206313478188
- Back MD, Küfner ACP, Dufner M, Gerlach TM, Rauthmann JF, & Denissen JJA (2013). Narcissistic admiration and rivalry: Disentangling the bright and dark sides of narcissism. Journal of Personality and Social Psychology, 105(6), 1013–1037. 10.1037/a0034431
- Baranger DAA, Finsaas MC, Goldstein BL, Vize CE, Lynam DR, & Olino TM (2021). InteractionPoweR: Power analyses for interaction effects in cross-sectional regressions. https://github.com/dbaranger/InteractionPoweR
- Benning SD, & Smith EA (2019). Forms, importance, and ineffability of factor interactions to define personality disorders. Journal of Personality Disorders, 33, 623–632. 10.1521/pedi.2019.33.5.623
- Border R, Johnson EC, Evans LM, Smolen A, Berley N, Sullivan PF, & Keller MC (2019). No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression across multiple large samples. American Journal of Psychiatry, 176, 376–387. 10.1176/appi.ajp.2018.18070881
- Busemeyer JR, & Jones LE (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93, 549–562. 10.1037/0033-2909.93.3.549
- Chaplin WF (1991). The next generation of moderator research in personality psychology. Journal of Personality, 59, 143–178. 10.1111/j.1467-6494.1991.tb00772.x
- Chaplin WF (1997). Personality, interactive relations, and applied psychology. In Handbook of Personality Psychology (pp. 873–890). 10.1016/b978-012134645-4/50034-2
- Chapman AL, Dixon-Gordon KL, Layden BK, & Walters KN (2010). Borderline personality features moderate the effect of a fear induction on impulsivity. Personality Disorders: Theory, Research, and Treatment, 1, 139–152. 10.1037/a0019226
- Cohen J (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
- Cohen J, Cohen P, West SG, & Aiken LS (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Lawrence Erlbaum Associates.
- Correll J, Mellinger C, McClelland GH, & Judd CM (2020). Avoid Cohen’s ‘small’, ‘medium’, and ‘large’ for power analysis. Trends in Cognitive Sciences, 24, 200–207. 10.1016/j.tics.2019.12.009
- Costa PT, & McCrae RR (1992). The Five-Factor Model of personality and its relevance to personality disorders. Journal of Personality Disorders, 6, 343–359.
- Crowe ML, Weiss BM, Sleep CE, Harris AM, Carter NT, Lynam DR, & Miller JD (2021). Fearless dominance/boldness is not strongly related to externalizing behaviors: An item response-based analysis. Assessment, 28, 413–428. 10.1177/1073191120907959
- Ebersole CR, Atherton OE, Belanger AL, Skulborstad HM, Allen JM, Banks JB, Baranski E, Bernstein MJ, Bonfiglio DBV, Boucher L, Brown ER, Budiman NI, Cairo AH, Capaldi CA, Chartier CR, Chung JM, Cicero DC, Coleman JA, Conway JG, … Nosek BA (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. 10.1016/j.jesp.2015.10.012
- Erdfelder E, Faul F, Buchner A, & Lang AG (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160. 10.3758/BRM.41.4.1149
- Finsaas MC, & Goldstein BL (2021). Do simple slopes follow-up tests lead us astray? Advancements in the visualization and reporting of interactions. Psychological Methods, 26, 38–60.
- Finsaas MC, Baranger DAA, Goldstein BL, Vize C, Lynam D, & Olino TM (2021). InteractionPoweR Shiny app: Power analysis for interactions in linear regression. https://intmoddev.shinyapps.io/intPower/
- Frazier PA, Tix AP, & Barron KE (2004). Testing moderator and mediator effects in counseling psychology research. Journal of Counseling Psychology, 51, 115–134. 10.1037/0022-0167.51.1.115
- Gelman A, & Carlin J (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science, 9, 641–651.
- Gratz KL, Latzman RD, Tull MT, Reynolds EK, & Lejuez CW (2011). Exploring the association between emotional abuse and childhood borderline personality features: The moderating role of personality traits. Behavior Therapy, 42, 493–508. 10.1016/j.beth.2010.11.003
- Haliczer LA, Dixon-Gordon KL, Law KC, Anestis MD, Rosenthal MZ, & Chapman AL (2020). Emotion regulation difficulties and borderline personality disorder: The moderating role of race. Personality Disorders: Theory, Research, and Treatment, 11, 280–289. 10.1037/per0000355
- Hare RD (2003). The Hare Psychopathy Checklist-Revised. Multi-Health Systems.
- Hussong AM, Curran PJ, & Bauer DJ (2013). Integrative data analysis in clinical psychology. Annual Review of Clinical Psychology, 9, 61–89.
- Hyatt CS, Crowe ML, West SJ, Vize CE, Carter NT, Chester DS, & Miller JD (2021). An empirically based power primer for laboratory aggression research. Aggressive Behavior.
- Johnson EC, Border R, Melroy-Greif WE, de Leeuw CA, Ehringer MA, & Keller MC (2017). No evidence that schizophrenia candidate genes are more associated with schizophrenia than noncandidate genes. Biological Psychiatry, 82, 702–708. 10.1016/j.biopsych.2017.06.033
- Karlsson Linnér R, Mallard TT, Barr PB, Sanchez-Roige S, Madole JW, Driver MN, Poore HE, de Vlaming R, Grotzinger AD, Tielbeek JJ, Johnson EC, Liu M, Rosenthal SB, Ideker T, Zhou H, Kember RL, Pasman JA, Verweij KJH, Liu DJ, … Dick DM (2021). Multivariate analysis of 1.5 million people identifies genetic associations with traits related to self-regulation and addiction. Nature Neuroscience, 24, 1367–1376. 10.1038/s41593-021-00908-3
- Klein RA, Vianello M, Hasselman F, Adams BG, Adams RB, Alper S, Aveyard M, Axt JR, Babalola MT, Bahník Š, Batra R, Berkics M, Bernstein MJ, Berry DR, Bialobrzeska O, Binan ED, Bocian K, Brandt MJ, Busching R, … Nosek BA (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1, 443–490. 10.1177/2515245918810225
- Krueger RF, Derringer J, Markon KE, Watson D, & Skodol AE (2012). Initial construction of a maladaptive personality trait model and inventory for DSM-5. Psychological Medicine, 42, 1879–1890. 10.1017/S0033291711002674
- Lakens D (2019). The value of preregistration for psychological science: A conceptual analysis. Japanese Psychological Review, 62, 221–230. 10.31234/osf.io/jbh4w
- Lakens D, & Caldwell AR (2021). Simulation-based power analysis for factorial analysis of variance designs. Advances in Methods and Practices in Psychological Science, 4, 1–14. 10.1177/2515245920951503
- Li F, Harmer P, Duncan TE, Duncan SC, Acock A, & Boles S (1998). Approaches to testing interaction effects using structural equation modeling methodology. Multivariate Behavioral Research, 33, 1–39. 10.1207/s15327906mbr3301_1
- Lilienfeld SO, & Andrews BP (1996). Development and preliminary validation of a self-report measure of psychopathic personality traits in noncriminal populations. Journal of Personality Assessment, 66(3), 488–524. 10.1207/s15327752jpa6603_3
- Lilienfeld SO, Patrick CJ, Benning SD, Berg J, Sellbom M, & Edens JF (2012). The role of fearless dominance in psychopathy: Confusions, controversies, and clarifications. Personality Disorders: Theory, Research, and Treatment, 3(3), 327–340. 10.1037/a0026987
- Lilienfeld SO, Watts AL, Murphy B, Costello TH, Bowes SM, Smith SF, Latzman RD, Haslam N, & Tabb K (2019). Personality disorders as emergent interpersonal syndromes: Psychopathic personality as a case example. Journal of Personality Disorders, 33, 577–622. 10.1521/pedi.2019.33.5.577
- Lynam DR, Gaughan ET, Miller JD, Miller DJ, Mullins-Sweatt S, & Widiger TA (2011). Assessing the basic traits associated with psychopathy: Development and validation of the Elemental Psychopathy Assessment. Psychological Assessment, 23(1), 108–124. 10.1037/a0021146
- McCabe CJ (2019). interActive application.
- McCabe CJ, Kim DS, & King KM (2018). Improving present practices in the visual display of interactions. Advances in Methods and Practices in Psychological Science, 1, 147–165. 10.1177/2515245917746792
- McClelland GH, & Judd CM (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376–390. 10.1037/0033-2909.114.2.376
- Miller JD, & Lynam DR (2012). An examination of the Psychopathic Personality Inventory’s nomological network: A meta-analytic review. Personality Disorders: Theory, Research, and Treatment, 3(3), 305–326. 10.1037/a0024567
- Miller JD, Lynam DR, Hyatt CS, & Campbell WK (2017). Controversies in narcissism. Annual Review of Clinical Psychology, 13, 291–315.
- Montag C, Ebstein RP, Jawinski P, & Markett S (2020). Molecular genetics in psychology and personality neuroscience: On candidate genes, genome wide scans, and new research strategies. Neuroscience and Biobehavioral Reviews, 118, 163–174. 10.1016/j.neubiorev.2020.06.020
- Murphy KR, & Russell CJ (2017). Mend it or end it: Redirecting the search for interactions in the organizational sciences. Organizational Research Methods, 20, 549–573. 10.1177/1094428115625322
- Nosek BA, Ebersole CR, DeHaven AC, & Mellor DT (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115, 2600–2606. 10.1073/pnas.1708274114
- Patrick CJ, & Drislane LE (2014). Triarchic model of psychopathy: Origins, operationalizations, and observed linkages with personality and general psychopathology. Journal of Personality. 10.1111/jopy.12119
- R Core Team (2021). R: A language and environment for statistical computing (Version 4.0.5) [Programming language]. R Foundation for Statistical Computing. https://www.R-project.org/
- Rohrer JM, & Arslan RC (2021). Precise answers to vague questions: Issues with interactions. Advances in Methods and Practices in Psychological Science, 4, 1–19. 10.1177/25152459211007368
- RStudio Team (2020). RStudio: Integrated development for R (Version 1.4.1106) [Computer software]. RStudio, PBC. http://www.rstudio.com/
- Segerstrom SC, & Boggero IA (2020). Expected estimation errors in studies of the cortisol awakening response: A simulation. Psychosomatic Medicine, 82, 751–756.
- Sleep CE, Weiss BM, Lynam DR, & Miller JD (2019). An examination of the Triarchic Model of Psychopathy’s nomological network: A meta-analytic review. Clinical Psychology Review, 71, 1–26.
- South SC (2019). Psychopathy as an emergent interpersonal syndrome: What is the function of fearlessness? Journal of Personality Disorders, 33, 633–639. 10.1521/pedi.2019.33.5.633
- Stone-Romero EF, Alliger GM, & Aguinis H (1994). Type II error problems in the use of moderated multiple regression for the detection of moderating effects of dichotomous variables. Journal of Management, 20, 167–178. 10.1177/014920639402000109
- Sullivan PF (2010). The Psychiatric GWAS Consortium: Big science comes to psychiatry. Neuron, 68(2), 182–186. 10.1016/j.neuron.2010.10.003
- Tosh C, Greengard P, Goodrich B, Gelman A, Vehtari A, & Hsu D (2021). The piranha problem: Large effects swimming in a small pond. ArXiv preprint. http://arxiv.org/abs/2105.13445
- Verona E, Sprague J, & Javdani S (2012). Gender and factor-level interactions in psychopathy: Implications for self-directed violence risk and borderline personality disorder symptoms. Personality Disorders: Theory, Research, and Treatment, 3, 247–262. 10.1037/a0025945
- Visscher PM, Brown MA, McCarthy MI, & Yang J (2012). Five years of GWAS discovery. American Journal of Human Genetics, 90, 7–24. 10.1016/j.ajhg.2011.11.029
- Vize CE, Lynam DR, Lamkin J, Miller JD, & Pardini D (2016). Identifying essential features of juvenile psychopathy in the prediction of later antisocial behavior. Clinical Psychological Science, 4(3), 572–590. 10.1177/2167702615622384
- Vize CE, Miller JD, Collison KL, & Lynam DR (2021). Untangling the relation between narcissistic traits and behavioral aggression following provocation using an FFM framework. Journal of Personality Disorders, 35, 299–318. 10.1521/pedi_2020_34_321
- West SJ, Hyatt CS, Miller JD, & Chester DS (2021). p-Curve analysis of the Taylor Aggression Paradigm: Estimating evidentiary value and statistical power across 50 years of research. Aggressive Behavior, 47, 183–193. 10.1002/ab.21937
- Wright AGC, Hopwood CJ, & Morey LC (2016). Longitudinal validation of general and specific structural features of personality pathology. Journal of Abnormal Psychology, 125(8), 1120–1134. 10.1037/abn0000165