Author manuscript; available in PMC: 2021 Jan 1.
Published in final edited form as: J Abnorm Psychol. 2019 Oct 28;129(1):14–20. doi: 10.1037/abn0000480

Extreme-groups designs in studies of dimensional phenomena: Advantages, caveats, and recommendations

Joscelyn E Fisher 1,2, Anika Guha 3, Wendy Heller 2, Gregory A Miller 2,3
PMCID: PMC6928394  NIHMSID: NIHMS1052879  PMID: 31657600

Abstract

Extreme-groups designs (EGDs) are common in psychopathology research, often using diagnostic category as an independent variable. Continuous-variable analysis strategies drawing from a General Linear Model framework can be applied to such designs. The growing emphasis on dimensional examinations of psychological constructs, promoted by the NIMH Research Domain Criteria framework, encourages continuous-variable analytic strategies. However, the interpretative implications of applying these strategies to various types of populations and sample score distributions, including those used in EGDs, are not always recognized. Appropriateness and utility of EGDs depend in part on whether the goal is to determine whether a relationship exists between two variables or to determine its strength. Whereas the literature investigating EGDs has emphasized symmetrical thresholds for defining extreme groups (e.g., bottom 10% vs. top 10%), psychopathologists often employ asymmetric thresholds (e.g., above a diagnostic threshold vs. a broader range of scores in a healthy comparison group). The present paper selectively reviews literature on EGDs and extends it with simulations of symmetric and asymmetric selection criteria. Results indicate that including a wide range of scores in EGDs substantially mitigates problems (e.g., inflation of effect size) that arise when using statistical methods classically employed for continuous variables.

Keywords: Extreme groups design, experimental design, effect size inflation

General Scientific Summary:

Psychopathology research often compares a group of individuals with high scores on some measure to a group with lower scores (extreme groups design, EGD). Simulations were undertaken to explore problems that EGDs may present. Results showed that including intermediate scores mitigates problems that may arise, such as effect-size inflation.


Psychopathology researchers often use extreme-groups designs (EGDs) for various reasons. For example, researchers may be interested in a phenomenon conceptualized to be taxonic in which high-scoring individuals are conceived to have a disorder that is discontinuous from normal function. EGDs also provide an advantage in that analyses (e.g., t-tests) are easier to implement and interpret than hierarchical multiple regression or multilevel modeling. The growing emphasis on dimensional examinations of psychological constructs leading to the NIMH Research Domain Criteria (RDoC) framework (Berenbaum, 2013; Kozak & Cuthbert, 2016; Miller, Rockstroh, Hamilton, & Yee, 2016; Yee, Javitt, & Miller, 2015), the HiTOP consortium (Kotov et al., 2017), and advances in dimensional treatments of personality disorders (e.g., Krueger, 2005; Widiger, Livesley, & Clark, 2009) and of psychopathology more generally (e.g., Caspi et al., 2014; Clark, 2005; Krueger, Watson, & Barlow, 2005; Lahey, Krueger, Rathouz, Waldman, & Zald, 2017) make understanding the advantages and disadvantages of using an EGD increasingly important. Continuous variables can represent the dimensional nature of psychopathology constructs with more fidelity, yet score distributions can vary widely across populations. In addition, the distribution of scores in a given sample may differ from the distribution of scores in the population of interest. Thus, it is important to determine whether and how effects are overestimated, or how inferential tests are biased, if parts of a distribution are missing. The present paper provides a brief, selective review of the literature on EGDs that highlights issues and recommendations for psychopathology researchers. In addition, the results of new simulations undertaken to address these issues are discussed.

Sampling Strategies in Extreme-Groups Designs

There are a variety of ways to select a sample using an EGD; the goals of the study should drive the sampling strategy (Kozak & Cuthbert, 2016). For example, an investigator may wish to ensure a balanced representation of various segments of the population and use stratified sampling of symptoms. An RDoC recommendation is to over-sample high scores while sampling the entire range (e.g., Cuthbert, 2014, 2015; Kozak & Cuthbert, 2016; Simmons & Quinn, 2014), although consequences for experimental design have not been fully identified. One might also intentionally skew the sampling of a subpopulation of convenience to approximate the distribution of scores in the full population, such as selective sampling of an undergraduate class. In another context, sampling may be designed to avoid predictors being correlated in the sample, even when they are correlated in the population being sampled. For example, if depression in the population is more common in women, but the investigator wants to evaluate the effect of depression on a dependent variable without a gender confound, sampling may be designed to produce equal numbers of men and women. This sampling strategy would systematically misrepresent the population (with regard to gender effects and interactions with gender), but it may be a good strategy to answer the investigator’s primary question by improving statistical power or interpretive clarity. As an alternative to using an EGD as a sampling strategy from the outset of a study, psychopathology researchers may collect data using instruments that yield continuous scores and then derive categorical group variables from these continuous variables (e.g., creating high- and low-depression groups).

Analysis Strategies in Extreme-Groups Designs

There is no arithmetic barrier to using analysis methods classically employed for continuous variables (e.g., regression) to evaluate categorical variables. The mechanics of using regression with categorical variables as predictors that are suitably dummy-coded are well established (e.g., Cohen, Cohen, West, & Aiken, 2003). Treating predictors as continuous (e.g., regression rather than t-test), even when portions of the distribution of the predictor or grouping variable are missing, can enhance statistical power.

Alternatively, categorical variables may be submitted to ANOVA or may be dummy-coded for use in regression. This strategy can cost statistical power by discarding data about within-category variation that may be meaningful. Furthermore, such dummy coding may become less satisfactory as more of the population distribution is represented in a given sample. For example, if groups are defined by less extreme scores, such as when the top and bottom 33% are coded by a 2-level dummy variable (or via median split), dimensional information in the sample is substantially underutilized (cf. top and bottom 10%). Although it is arithmetically sound to perform these types of analyses, there are several issues to consider, specifically how effect size and statistical power are affected.
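The arithmetic equivalence between a dummy-coded regression and a two-group comparison can be illustrated with a minimal Python sketch. The data are hypothetical (randomly generated, with arbitrary group sizes and means), and the helper names pooled_t and slope_and_t are introduced here for illustration only. Under these assumptions, the regression slope on a 0/1 dummy variable equals the difference between the group means, and its t statistic matches that of the pooled-variance t-test.

```python
import math
import random


def pooled_t(group0, group1):
    """Student's two-sample t statistic using the pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss_within = (sum((v - m0) ** 2 for v in group0)
                 + sum((v - m1) ** 2 for v in group1))
    pooled_var = ss_within / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def slope_and_t(x, y):
    """OLS slope of y on x and the t statistic testing slope = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_resid = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(ss_resid / (n - 2) / sxx)
    return slope, slope / std_err


# Hypothetical data: group sizes, means, and seed are arbitrary assumptions.
rng = random.Random(1)
low = [rng.gauss(0.0, 1.0) for _ in range(40)]   # lower-scoring group
high = [rng.gauss(0.8, 1.0) for _ in range(25)]  # higher-scoring group

dummy = [0] * len(low) + [1] * len(high)  # 2-level dummy code
outcome = low + high

slope, t_reg = slope_and_t(dummy, outcome)
mean_diff = sum(high) / len(high) - sum(low) / len(low)
t_groups = pooled_t(low, high)

print("slope on dummy:", slope, "| mean difference:", mean_diff)
print("regression t:", t_reg, "| pooled t-test t:", t_groups)
```

Because the dummy code carries only group membership, any within-group variation on the original continuous measure is invisible to this analysis, which is the loss of information discussed above.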

Effect Size in Extreme-Groups Designs

The goal of a study may be to determine (a) whether two variables reliably share variance (traditional, thresholded, binary significance testing: should H0 be rejected?) or (b) how much variance they share. Identifying the effect size is centrally important for the latter goal. Estimates of effect size (e.g., betas, r, η², or Cohen’s d) may be inflated if studies rely on extreme scores, with the middle portion of a distribution not represented. As sample descriptors, betas and correlations would not be inflated, as they simply represent the relationships in the data set. However, they may overestimate relationships in the population. For instance, Preacher, Rucker, MacCallum, and Nicewander (2005) reviewed literature showing that, in EGDs, standardized effect sizes tend to be inflated, whereas unstandardized effect sizes (expressed in the raw units of whatever was measured) are not (Cohen, Cohen, Aiken, & West, 1999; Feldt, 1961; Humphreys, 1985).
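The contrast between standardized and unstandardized effect sizes can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: uniformly distributed scores, a full sample of 10,000 pairs, an SNR of 0.5, and retention of the bottom and top 10% on x. Under these assumptions, the correlation in the extreme-groups sample would be expected to exceed the full-sample correlation, whereas the raw regression slope should stay close to its full-sample value.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation (a standardized effect size)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Raw regression slope of y on x (an unstandardized effect size)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


# Illustrative assumptions: sample size, SNR, seed, and 10% cutoffs.
rng = random.Random(2019)
N, SNR = 10_000, 0.5
x = [rng.random() for _ in range(N)]
y = [rng.random() + SNR * xi for xi in x]  # y = noise + x * SNR

# Sort pairs by x, then keep only the bottom and top 10%.
pairs = sorted(zip(x, y))
k = N // 10
x_ext, y_ext = zip(*(pairs[:k] + pairs[-k:]))

r_full, r_ext = pearson_r(x, y), pearson_r(x_ext, y_ext)
slope_full, slope_ext = ols_slope(x, y), ols_slope(x_ext, y_ext)

print(f"r:     full = {r_full:.3f}, extreme groups = {r_ext:.3f}")
print(f"slope: full = {slope_full:.3f}, extreme groups = {slope_ext:.3f}")
```

The slope is stable because removing middle scores increases the variance of x in the retained sample without changing how much y changes per raw unit of x; the correlation rescales by that inflated variance.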

The vulnerability of EGD to inflate sample effect size relative to population effect size has received little attention in the psychopathology literature. The emphasis in the RDoC initiative on continuous variables may lead to reduced reliance on EGDs. Nevertheless, psychopathology researchers are likely to maintain interest in inclusion of substantial psychopathology in many designs, consistent with RDoC recommendations to oversample the high end relative to the entire population. As a result, RDoC-inspired studies will not be immune to the issue of exaggerated estimates of effect size. Researchers should thus be explicit about how their sampling strategy speaks to whether the goal of a given study is to determine whether a relationship exists vs. to determine its strength in a subpopulation of interest vs. to determine its strength in the population.

Statistical Power in Extreme-Groups Designs

It has been argued that power and statistical significance are too often emphasized at the expense of characterizing (preferably quantitatively) relationships (e.g., Cohen et al., 1999; Meehl, 1978; Miller, 2004; Preacher et al., 2005). Nevertheless, one appeal of EGDs is enhanced statistical power. Feldt (1961) analytically and Fowler (1992) empirically, among others, each demonstrated that an EGD frequently provides more statistical power than an unselected sample. Given that analysis of extreme groups can be advantageous, should investigators routinely remove middling scores from their data set? Preacher et al. (2005, p. 182) argued that selecting extreme groups after data have been collected is a mistake: “the power improvement associated with [EGD] does not apply when data are gathered from across the full range of x and then discarded from the middle of the x distribution…. The power benefit applies only when one compares [EGD] with the analysis of full-range data using the same sample size.” Furthermore, they noted that scores in the middle of a distribution are often more reliable than those near the extremes, even while also noting that observations near the middle of a distribution are often less informative than those further out. The implication is that the power advantage of EGDs comes not from the use of extreme groups per se, but from the concentration of sample N toward the extremes of score distributions. Simply discarding scores from the middle of the range reduces the N, hurting power. The choice to rely on extreme groups should be made at the outset of a study, with resources expended on extra N at the ends.

Does using groups that are more extreme enhance statistical power? The answer depends on a host of factors that vary considerably across research contexts. Very often, investigators will be constrained by resources available for sampling. For example, screening can be used to identify extreme cases, but the more extreme the cutoff, the smaller the qualifying sample. A smaller obtained N, and thus fewer degrees of freedom, may mean a race between shrinking N and rising effect size, with unforeseen implications for statistical power: limiting one’s sampling to more extreme groups can mean a bigger mean difference (increased effect size) while reducing N (undermining significance testing). Preacher et al. (2005) advised that a comparison of the top 25% and bottom 25% is often optimal. Notably, they endorsed the demonstration by Alf and Abrahams (1975) that, with such a selected sample, a regression/correlation analysis strategy provides more statistical power than a t-test comparing the same two extreme groups.

Simulation Studies of EGD with Asymmetric Thresholds

A literature survey did not identify an evaluation of sampling that discards middle portions of a continuous distribution when low and high thresholds are asymmetric, as often used in psychopathology research. For example, psychopathology researchers may contrast a relatively small, high-scoring group (perhaps clinically diagnosed, or scoring high enough to serve as a proxy for clinical severity) with a larger group representing much of the lower portion of the distribution. Asymmetric sampling methods may include most of the distribution on a single scale or may employ portions of several scales to define a symptomatically narrow sample, in line with a focus on individual symptoms, as RDoC promotes. For example, many published studies using the Wisconsin Psychosis Proneness Scales (e.g., Fisher, Heller, & Miller, 2007; for reviews, see Edell, 1995; Chapman, Chapman, & Kwapil, 1995; Kwapil & Chun, 2015) have used selection criteria recommended by L.J. Chapman and J.P. Chapman: more than 2 SD above the same-sex mean of a large, relevant sample on one of their scales (e.g., physical anhedonia) and no more than 0.5 SD above the mean on other scales (e.g., perceptual aberration). A similar strategy was used in an extensive series of EEG and fMRI studies evaluating two types of anxiety and co-occurring depression (participants with scores in the top 20% on one or more scales were compared with those in the bottom 50% on several scales; e.g., Burdwood et al., 2016; Fisher et al., 2010, 2014), thus encompassing most of the population distribution. Kozak and Cuthbert (2016, p. 296) endorsed such a sampling strategy for RDoC-inspired studies: “Rather than ‘supernormal’ subjects with no psychiatric history designed to provide maximal discrimination from a putative disease group, controls will need to represent a broad range of functioning that shades into psychopathology, so as to facilitate optimal characterization of the dimension of interest.”

To address this gap in the literature, simulations were undertaken to test whether and to what degree results of regression analyses that assume the presence of a continuous distribution are influenced by sampling reflecting an EGD. This issue was examined as a function of the degree of separation between the groups (i.e., how much of the middle of the distribution is removed) and as a function of the true correlation between two variables. A dataset of random numbers was generated and correlations calculated between segments of data with varying portions of the middle of the distribution removed. To determine whether the simulation results are similar to what one would observe using real data, which are often multivariate in psychopathology research designs, additional simulations were undertaken using a large anxiety and depression self-report dataset.

Method

Two vectors of data (y and x) consisting of 1,000 points each were randomly generated, with a uniform distribution in each vector. Kozak and Cuthbert (2016) recommended this distribution as potentially ideal for an RDoC study of working memory. Simulations for the present paper also evaluated Gaussian and Generalized Pareto distributions. Correlation between the two arrays was then introduced by the equation y = y + (x * SNR), where SNR (signal-to-noise ratio) was a value between 0 and 1 that varied systematically. Pairs of (x,y) values within the vectors were sorted according to numerical position in the x array for use when middle portions of the x distribution were excluded at a later stage, and the correlation between the two vectors was calculated.

In order to determine whether this correlation between the vectors is altered when some middle portion of the distribution is removed, the extreme ends (e.g., bottom 10% and top 10%) of the vectors were selected, and correlations between the vectors consisting of only the extreme ends were calculated. In order to examine the effect of asymmetric thresholds, the bottom 50% and top 20% were also used as cutoffs. For a given case of removal of a portion of the distribution, this procedure was run 100 times, producing 100 samples. Thus, results were not subject to the distribution of a single set of 1,000 random (x,y) pairs. Because middle scores were deleted from the full sample without replacement, the size of the retained extreme samples was not held constant. Although this procedure did not reflect the recommendation of Preacher et al. (2005) against creating extreme groups by deleting middle scores after sample collection, it had the virtue of retaining identical scores to facilitate comparison of different amounts of deletion. Thus, for example, the comparison of 10% and 20% cutoffs differed only in the scores in the second decile; those in the first decile were identical. Use of a large full sample (1,000 x,y pairs) resulted in the smallest extreme groups retaining 50 pairs in each group in the comparison of the bottom 5% and top 5%, still a sizable number and mitigating any stability concern (confirmed by the orderly results in Figure 1A). The average full-sample correlation across the 100 samples and the average extreme-sample correlations across the 100 samples were produced.

Figure 1.


A: Simulations of various symmetrical sample cutoffs and various degrees of correlation between the variables in the full sample (i.e., SNR). B: Extreme Group Design with asymmetrical sample cutoffs while varying the degree of correlation in the full sample (i.e., SNR). Lower cutoff is 50%.

Separately, in order to explore the effect of sample size, a set of simulations utilizing the method above was conducted in which the two vectors of data (y and x) consisted of 20, 50, 100, 200, 500 and 1000 points each. The bottom and top 25% of the vectors were selected to generate all simulated samples and correlations between these extreme ends were calculated.
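The simulation procedure described in this section can be sketched in Python as follows. This is an illustration written under stated assumptions, not the MATLAB and SPSS code used to produce the reported results: the function name simulate_egd, the random seed, and the SNR values shown are hypothetical choices, and only the uniform-distribution case is covered. Each run generates n (x, y) pairs, introduces correlation via y = y + (x * SNR), sorts the pairs by x, retains the specified bottom and top fractions, and averages the full-sample and extreme-sample correlations across repetitions.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def simulate_egd(snr, low_frac, high_frac, n=1000, reps=100, seed=0):
    """Return (mean full-sample r, mean extreme-sample r) over `reps` runs."""
    rng = random.Random(seed)
    full_rs, extreme_rs = [], []
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() + snr * xi for xi in x]  # y = y + (x * SNR)
        pairs = sorted(zip(x, y))                  # order pairs by x
        n_low, n_high = round(n * low_frac), round(n * high_frac)
        kept = pairs[:n_low] + pairs[n - n_high:]  # drop the middle of x
        kept_x, kept_y = zip(*kept)
        full_rs.append(pearson_r(x, y))
        extreme_rs.append(pearson_r(kept_x, kept_y))
    return mean(full_rs), mean(extreme_rs)


# Symmetric and asymmetric cutoffs; SNR values are illustrative assumptions.
null_full, null_sym10 = simulate_egd(0.0, 0.10, 0.10)
full_r, sym10_r = simulate_egd(0.5, 0.10, 0.10)
_, sym25_r = simulate_egd(0.5, 0.25, 0.25)
_, asym_r = simulate_egd(0.5, 0.50, 0.20)

print(f"SNR 0.0: full = {null_full:.3f}, bottom/top 10% = {null_sym10:.3f}")
print(f"SNR 0.5: full = {full_r:.3f}, bottom/top 10% = {sym10_r:.3f}, "
      f"bottom/top 25% = {sym25_r:.3f}, bottom 50%/top 20% = {asym_r:.3f}")

# Sample-size variant with the bottom and top 25% retained.
for size in (20, 50, 100, 200, 500, 1000):
    f, e = simulate_egd(0.5, 0.25, 0.25, n=size)
    print(f"n = {size:4d}: full = {f:.3f}, extreme = {e:.3f}")
```

Because the same seed is reused, every call sees identical full samples, so differences among the cutoff conditions reflect only which scores were deleted, mirroring the rationale given above for deleting middle scores from a common data set.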

To determine whether results of simulations would differ using actual participant data, the effects of EGD were examined using an unselected dataset (N = 5095) of individuals who participated in research at the University of Illinois Urbana-Champaign between 2005 and 2013. Data from the Penn State Worry Questionnaire (PSWQ; Meyer et al., 1990; Molina & Borkovec, 1994) for assessing anxious apprehension and the Mood and Anxiety Symptom Questionnaire (MASQ; Watson, Clark, et al., 1995; Watson, Weber, et al., 1995) for measuring anxious arousal and anhedonic depression were used. First, the correlation between anxious apprehension and anxious arousal scores in the full dataset was calculated, yielding r = 0.27. As was done for the simulated dataset, the correlation between scores on these psychopathology measures was varied, either by adding a random number (0–200) to scores (decreasing the correlation) or adding a percentage of one score (10%–110%) to the other (increasing the correlation). To determine whether these correlations were altered by removing middle portions of the distribution, a variety of extreme-group selection criteria were applied to the sample by selecting groups according to anxious apprehension scores: bottom and top 10%, bottom and top 20%, bottom and top 25%, and bottom 50% and top 20%.

To determine how EGD affects multivariate analyses of continuous data, the same methods of sample selection and correlation manipulation were used to generate samples for conducting multiple regressions that predicted anhedonic depression from anxious apprehension and anxious arousal within each sample. The total R² values for each model were compared. Simulations were performed with MATLAB and SPSS code written by and available from J.E. Fisher and A. Guha.
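The multivariate comparison can be sketched in Python with simulated data. This is not the authors' MATLAB or SPSS code and does not use the participant dataset: the variable names (selector, covariate, outcome), the Gaussian scores, the coefficients of 0.3, and the seed are all hypothetical assumptions. The total R² for a two-predictor model is computed from the three pairwise correlations, and the model is refit after selecting extreme groups on one predictor, analogous to selecting on anxious apprehension scores.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """Total R^2 for y regressed on x1 and x2, from pairwise correlations."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def select_extremes(rows, low_frac, high_frac):
    """Keep the bottom and top fractions of rows, ranked by their first field."""
    ranked = sorted(rows)
    n = len(ranked)
    n_low, n_high = round(n * low_frac), round(n * high_frac)
    return ranked[:n_low] + ranked[n - n_high:]


def total_r2(sample):
    selector, covariate, outcome = zip(*sample)
    return r_squared_two_predictors(outcome, selector, covariate)


# Hypothetical data-generating model; all coefficients are assumptions.
rng = random.Random(7)
rows = []
for _ in range(5000):
    selector = rng.gauss(0, 1)                    # variable used for selection
    covariate = 0.3 * selector + rng.gauss(0, 1)  # correlated second predictor
    outcome = 0.3 * selector + 0.3 * covariate + rng.gauss(0, 1)
    rows.append((selector, covariate, outcome))

r2_full = total_r2(rows)
r2_sym10 = total_r2(select_extremes(rows, 0.10, 0.10))
r2_asym = total_r2(select_extremes(rows, 0.50, 0.20))

print(f"total R^2: full = {r2_full:.3f}, bottom/top 10% = {r2_sym10:.3f}, "
      f"bottom 50%/top 20% = {r2_asym:.3f}")
```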

Results

Simulations indicated that the inflation of correlations that is often assumed to exist in extreme-groups designs depends in part on two parameters: how extreme the group selection is and the true correlation in the population. Figure 1A provides the results of systematically varying these two parameters. When the null hypothesis (H0) is correct (rxy = 0; thus SNR = 0, and the y vector has no x variance in it), selection of extreme groups has no effect on the correlation. However, when H0 is false, the extreme-groups sample correlation is inflated relative to the full-sample correlation, roughly linearly as the selection cutoffs become more extreme and monotonically with increasing magnitude of the full-sample correlation (SNR). Thus, Figure 1A shows that correlations rise as more extreme group thresholds are used.

Asymmetrical cutoffs:

As discussed above, studies of psychopathology often employ a sampling strategy using asymmetric cutoffs. Figure 1B provides the results of simulations that tested a wide range of strengths of the underlying relationship between two variables. For an EGD that retained the lower 50% and upper 20% of the distribution (depicted in the left column of Figure 1B), there was virtually no inflation of sample effect size, compared to the right column which depicts the full sample.

Sample size and distribution shape:

The findings reported above did not vary with sample size. In addition, no interaction was evident between sample size and SNR when the bottom 25% and top 25% of the vectors were selected for comparison. Similar simulations evaluated Gaussian and Generalized Pareto distributions to complement those above, which used a uniform distribution. Results were very similar to those described above (Supplementary Figure 1).

Simulations with participant data:

Simulated sampling of participant data revealed that an EGD tended to increase the observed correlation between two anxiety variables, but only when a correlation in the full sample was present (Figure 2A). An inflection was observed such that the most extreme selection (bottom and top 10%) resulted in the greatest increase in estimated effect size when the full-sample correlation ranged from 0.2 to 0.5. The regression models predicting anhedonic depression from anxious arousal and anxious apprehension demonstrated (Figure 2B) that, for all correlation values, selecting the bottom and top 10% of individuals on anxious apprehension resulted in a noticeable increase in estimated total R². EGD generally increased observed total R² over the full sample across all set correlations. At and above the actual correlation from the full sample (r = 0.27), all total R² values at each EGD converged, such that values were identical for all samples when the correlation ranged from 0.27 to 0.90.

Figure 2.


A: Cutoffs on PSWQ scores and the strength of the correlation between AA and PSWQ in the full sample were varied. B: Anxious arousal (AA, from MASQ) and anxious apprehension (PSWQ) scores simultaneously predicting anhedonic depression (AD, from MASQ) in linear regression models. Each model used a sample from a simulated EGD. Cutoffs on PSWQ scores and the strength of the correlation between AA and PSWQ in the full sample were varied systematically.

Discussion

As psychopathology researchers contrast and integrate categorical and dimensional constructs and methods, familiarity with the advantages and pitfalls of EGD is becoming increasingly important. A brief literature review provided an overview of EGDs. Simulations were conducted to explore the performance of EGDs when using asymmetric thresholds, in which a high-scoring group is compared to a larger lower-scoring group that encompasses more of the distribution. Results indicate that including a wide range of scores within the middle of the distribution substantially mitigates problems that arise when using statistical methods classically employed for continuous variables to analyze EGD samples. The results in Figure 1A and Figure 2 are consistent with earlier literature, showing that EGDs can often provide superior statistical power to find a true effect.

Yet EGDs must be implemented and interpreted with appropriate caution. Present results support three important points. First, as previous authors have demonstrated, the risk of an inflated effect-size estimate increases as group selection is from more extreme ends of a dimension and, less so, as the magnitude of the population correlation increases.

Second, these characteristics seem to be robust to the distribution types and sample sizes explored here. Although population distributions may differ from the orderly shape assumed in the simulations, several studies with varied distributional assumptions have reached similar conclusions. For example, Fowler (1992) emphasized that distributions in psychological research are very frequently non-normal. Similarly, in present analyses, simulations returned quite similar patterns for Gaussian, uniform, and Generalized Pareto distributions and for various sample sizes.

Third, the simulations reported in Figures 1B, 2A, and 2B indicate that, at least under some common assumptions, retention of at least some middle portion of the population distribution to augment the extreme groups can substantially mitigate inflation of the estimated population effect size. RDoC-inspired research that involves broad sampling with oversampling of certain portions of score distributions may accelerate as RDoC evolves (e.g., https://www.nimh.nih.gov/news/science-news/2018/nimh-releases-updates-to-its-rdoc-framework.shtml; https://www.nimh.nih.gov/news/science-news/2019/sensorimotor-domain-added-to-the-rdoc-framework.shtml?utm_source=APS+Emails&utm_campaign=5d18f18796-PSU_011819_FULLMEMB&utm_medium=email&utm_term=0_d2c7283f04-5d18f18796-62625779).

Limited multivariate simulations using real data resulted in findings similar to those of simpler analyses. Extreme-group selection inflated observed effect size only when a medium-to-large effect size existed in the full sample, with the most extreme group selections generally resulting in the greatest increase in observed effect sizes. The observed differences in effect size obtained using different selection strategies (e.g., bottom and top 10% vs. 25%) decreased when the full sample exhibited a medium to large effect size, such that the observed effects began to converge as the full sample effect increased.

Two caveats are warranted. First, neither the available literature nor present simulations investigated the impact of differences in within-group variance on estimates of effect size. A second caveat is particularly relevant to psychopathology research. When the distribution of a population variable involves a diminishing tail, as is often the case with the distribution of clinical symptom severity, the estimate of a more extreme selection criterion (e.g., the 95th percentile of a Gaussian distribution) will be more subject to sample size and to sampling noise (e.g., the estimate of the score that defines the 95th percentile). This consideration is important when selecting a “clinically significant” threshold. Thus, a tradeoff may be required in EGDs between clinically optimal cut-off threshold and available sample size.
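The second caveat can be illustrated with a brief Python sketch. It assumes a standard Gaussian population, a simple order-statistic estimate of the cutoff score, and arbitrary sample sizes, repetition count, and seed; the function name cutoff_sd is hypothetical. Under these assumptions, the estimated cutoff for a more extreme percentile would be expected to vary more from sample to sample than the cutoff for a less extreme percentile, with the variability shrinking as sample size grows.

```python
import random
from statistics import stdev


def cutoff_sd(percent, n, reps=500, seed=0):
    """SD, across repeated samples of size n, of the estimated score that
    defines the given percentile (an integer such as 75 or 95)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = sorted(rng.gauss(0, 1) for _ in range(n))
        index = min(n - 1, (percent * n) // 100)  # simple order-statistic pick
        estimates.append(sample[index])
    return stdev(estimates)


# Sample sizes and percentiles below are illustrative assumptions.
sd_75_small = cutoff_sd(75, 100)
sd_95_small = cutoff_sd(95, 100)
sd_75_large = cutoff_sd(75, 1000)
sd_95_large = cutoff_sd(95, 1000)

print(f"n = 100:  SD of 75th-percentile cutoff = {sd_75_small:.3f}, "
      f"95th = {sd_95_small:.3f}")
print(f"n = 1000: SD of 75th-percentile cutoff = {sd_75_large:.3f}, "
      f"95th = {sd_95_large:.3f}")
```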

In summary, EGDs have several benefits that are particularly notable in the context of psychopathology research. The benefits (and risks) apply broadly, regardless of inpatient vs. outpatient vs. at-risk vs. control sampling, or phase of illness, although these features would have implications for what is feasible or desirable for extreme-group sampling. Since aspects of psychopathology are often dimensional and observed across multiple diagnoses, score distributions of continuous variables can facilitate insight into relationships between them. EGDs can be advantageous for such research by increasing statistical power. Furthermore, inclusion of a wide range of non-extreme scores in EGD improves estimates of effect size and provides better generalizability to the phenomenon of interest.

The following are recommendations to consider when using EGDs:

  1. Employ EGDs on a priori grounds and state those grounds in the report.

  2. Describe how the sampling strategy serves the study goals, and provide a rationale for the group cutoffs selected. For symmetrical thresholds, comparing the bottom 25% and the top 25% is recommended unless study goals justify other thresholds.

  3. Use of asymmetrical thresholds may be an efficient way to deploy a limited sample size.

  4. Inclusion of some intermediate scores can mitigate statistical problems that arise when using methods classically employed for continuous variables.

Supplementary Material


Supplemental Figure 1. Various sample sizes and strengths of correlations in the full sample represented as signal-to-noise ratio (SNR) using a Gaussian distribution (A), uniform distribution (B), and Generalized Pareto distribution (C). All simulations sampled the bottom and top 25% of participants.

Acknowledgments

This research was supported by the National Institute on Drug Abuse (R21 DA14111) and the National Institute of Mental Health (P50 MH079485, R01 MH61358, R01 MH110544S1, T32 MH19554). GAM is a member of the National Institute of Health National Advisory Mental Health Council and one of two co-chairs of its Workgroup for Revisions to the RDoC Matrix. The opinions and assertions expressed herein are those of the author(s) and do not necessarily reflect the official policy or position of NIH, the NAMHC, the Uniformed Services University, or the Department of Defense. This is the only paper in which we have presented the issues, simulations, analyses, and discussion provided here. The real data used in some of the simulations were collected for a project approved by the UIUC IRB under its Protocol Number 00097.

References

  1. Adusah AK, & Brooks GP (2011). Type I error inflation of the separate-variances Welch t test with very small sample sizes when assumptions are met. Journal of Modern Applied Statistical Methods, 10, Article 33, 362–372. [Google Scholar]
  2. Burdwood EN, Infantolino ZP, Crocker LD, Spielberg JM, Banich MT, Miller GA, & Heller W (2016). Resting-state functional connectivity differentiates anxious apprehension and anxious arousal. Psychophysiology, 53, 1451–1459. doi: 10.1111/psyp.12696. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Caspi A, Houts RM, Belsky DW, Goldman-Mellor SJ, Harrington H, Israel S, … & Moffitt TE (2014). The p factor: One general psychopathology factor in the structure of psychiatric disorders? Clinical Psychological Science, 2, 119–137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chapman JP, Chapman LJ, Kwapil TR (1995). Scales for the measurement of schizotypy In Raine A, Lencz T, Mednick S (Eds.), Schizotypal personality disorder (pp. 79–106). Cambridge University Press: Cambridge, England. [Google Scholar]
  5. Clark LA, & Watson D (1995). The mini mood and anxiety symptom questionnaire (Mini-MASQ). Unpublished manuscript, University of Iowa. [Google Scholar]
  6. Clark LA (2005). Temperament as a unifying basis for personality and psychopathology. Journal of Abnormal Psychology, 114, 505–521. doi: 10.1037/0021-843X.114.4.505 [DOI] [PubMed] [Google Scholar]
  7. Cohen J, Cohen P, West SG, & Aiken LS (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum. [Google Scholar]
  8. Cohen P, Cohen J, Aiken LS, & West SG (1999). The problem of units and the circumstance for POMP. Multivariate Behavioral Research, 34, 315–346. [Google Scholar]
  9. Cuthbert BN (2014). The RDoC framework: Facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry, 13, 28–35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Cuthbert BN (2015). Research Domain Criteria: Toward future psychiatric nosologies. Dialogues in Clinical Neuroscience, 17, 73–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Edell WS (1995). The psychometric measurement of schizotypy using the Wisconsin scales of psychosis proneness In Miller GA (Ed.), The behavioral high-risk paradigm in psychopathology (pp. 3–46). Springer, New York, NY. [Google Scholar]
  12. Feldt LS (1961). The use of extreme groups to test for the presence of a relationship. Psychometrika, 26, 307–316.
  13. Fisher JE, Heller W, & Miller GA (2007). Semantic associations, lateralized frontal function, and context maintenance in schizotypy. Neuropsychologia, 45, 663–672.
  14. Fisher JE, Sass SM, Heller W, Silton RL, Edgar JC, Stewart JL, & Miller GA (2010). Time course of processing emotional stimuli as a function of perceived emotional intelligence, anxiety, and depression. Emotion, 10, 486–497.
  15. Fisher JE, Miller GA, Sass SM, Silton RL, Edgar JC, Stewart JL, Zhou J, & Heller W (2014). Neural correlates of suspiciousness and interactions with anxiety during emotional and neutral word processing. Frontiers in Psychology: Personality and Social Psychology, 5, 596. doi: 10.3389/fpsyg.2014.00596
  16. Humphreys LG (1985). Correlations in psychological research. In Detterman DK (Ed.), Current topics in human intelligence: Vol. 1. Research methodology (pp. 3–24). Norwood, NJ: Ablex Publishing.
  17. Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, Sanislow C, & Wang P (2010). Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders. American Journal of Psychiatry, 167, 748–751.
  18. Kotov R, Krueger RF, Watson D, Achenbach TM, Althoff RR, Bagby RM, … & Eaton NR (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126, 454–477.
  19. Kozak MJ, & Cuthbert BN (2016). The NIMH research domain criteria initiative: background, issues, and pragmatics. Psychophysiology, 53, 286–297.
  20. Krueger RF (2005). Continuity of Axes I and II: Toward a unified model of personality, personality disorders, and clinical disorders. Journal of Personality Disorders, 19, 233–261.
  21. Krueger RF, Watson D, & Barlow DH (2005). Introduction to the special section: Toward a dimensionally based taxonomy of psychopathology. Journal of Abnormal Psychology, 114, 491–493.
  22. Kwapil TR, & Chun CA (2015). The psychometric assessment of schizotypy. In Mason O & Claridge G (Eds.), Schizotypy: New dimensions (pp. 7–32). London: Routledge.
  23. Lahey BB, Krueger RF, Rathouz PJ, Waldman ID, & Zald DH (2017). A hierarchical causal taxonomy of psychopathology across the life span. Psychological Bulletin, 143, 142–186.
  24. Meehl PE (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
  25. Meyer TJ, Miller ML, Metzger RL, & Borkovec TD (1990). Development and validation of the Penn State Worry Questionnaire. Behaviour Research and Therapy, 28, 487–495.
  26. Miller GA (2004). Another quasi-thirty years of slow progress. Applied and Preventive Psychology: Current Scientific Perspectives, 11, 61–64.
  27. Miller GA, Rockstroh B, Hamilton HK, & Yee CM (2016). Psychophysiology as a core strategy in RDoC. Psychophysiology, 53, 410–414. doi: 10.1111/psyp.12581.
  28. Molina S, & Borkovec TD (1994). The Penn State Worry Questionnaire: Psychometric properties and associated characteristics.
  29. Preacher KJ, Rucker DD, MacCallum RC, & Nicewander WA (2005). Use of the extreme groups approach: A critical reexamination and new recommendations. Psychological Methods, 10, 178.
  30. Widiger TA, Livesley WJ, & Clark LA (2009). An integrative dimensional classification of personality disorder. Psychological Assessment, 21, 243–244. doi: 10.1037/a0016606
  31. Simmons JM, & Quinn KJ (2014). The NIMH Research Domain Criteria (RDoC) project: Implications for genetics research. Mammalian Genome, 25, 23–31.
  32. Watson D, Weber K, Assenheimer JS, Clark LA, Strauss ME, & McCormick RA (1995). Testing a tripartite model: I. Evaluating the convergent and discriminant validity of anxiety and depression symptom scales. Journal of Abnormal Psychology, 104, 3.

Supplementary Materials

Supplemental Figure 1. Various sample sizes and strengths of correlations in the full sample represented as signal-to-noise ratio (SNR) using a Gaussian distribution (A), uniform distribution (B), and Generalized Pareto distribution (C). All simulations sampled the bottom and top 25% of participants.
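
The kind of simulation summarized in this caption can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names, the slope, the unit-variance noise, and the distribution parameters (including the Generalized Pareto shape and scale) are all assumptions chosen for the example, and the fixed slope with unit noise stands in for the signal-to-noise manipulation rather than reproducing it. The sketch draws a predictor from one of the three distributions, generates a linearly related outcome, and compares the full-sample correlation with the correlation obtained after keeping only the bottom and top 25% of predictor scores.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def draw_x(dist, rng):
    """Draw one predictor score; all distribution parameters are illustrative."""
    if dist == "gaussian":
        return rng.gauss(0.0, 1.0)
    if dist == "uniform":
        return rng.uniform(-1.0, 1.0)
    if dist == "pareto":
        # Generalized Pareto via its inverse CDF (location 0, scale 1).
        # The shape value 0.1 is an assumed, finite-variance choice.
        shape = 0.1
        return ((1.0 - rng.random()) ** (-shape) - 1.0) / shape
    raise ValueError(f"unknown distribution: {dist}")


def simulate(dist, n=20000, slope=0.3, tail=0.25, seed=0):
    """Return (full-sample r, extreme-groups r) for one simulated data set."""
    rng = random.Random(seed)
    x = [draw_x(dist, rng) for _ in range(n)]
    # Outcome = assumed linear signal plus unit-variance Gaussian noise.
    y = [slope * xv + rng.gauss(0.0, 1.0) for xv in x]
    # Sort by predictor and keep only the bottom and top `tail` fractions.
    pairs = sorted(zip(x, y))
    k = int(n * tail)
    extreme = pairs[:k] + pairs[-k:]
    ex, ey = zip(*extreme)
    return pearson_r(x, y), pearson_r(ex, ey)


if __name__ == "__main__":
    for name in ("gaussian", "uniform", "pareto"):
        r_full, r_ext = simulate(name)
        print(f"{name:9s} full r = {r_full:.3f}   extreme-groups r = {r_ext:.3f}")
```

Under these assumptions, dropping the middle of the predictor distribution increases predictor variance relative to the noise, so the extreme-groups correlation should exceed the full-sample correlation for each distribution. Varying `n`, `slope`, and `tail` gives a rough sense of how sample size, signal strength, and selection threshold interact.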
