International Journal of Methods in Psychiatric Research
. 2023 Apr 25;32(3):e1969. doi: 10.1002/mpr.1969

Establishing new cutoffs for Cohen's d: An application using known effect sizes from trials for improving sleep quality on composite mental health

Sareh Panjeh 1, Anders Nordahl‐Hansen 2, Hugo Cogo‐Moreira 2,
PMCID: PMC10485313  PMID: 37186318

Abstract

Objective

Cohen's conventional effect size cutoffs for d [small (0.2), medium (0.5), and large (0.8)] might not be representative of the distribution of effect sizes reported across different areas of health. Effect size cutoffs may vary not only with the area of research but also with the type of intervention and population; that is, they are context dependent. We therefore present a strategy for redefining small, medium, and large effect sizes based on the 25th, 50th, and 75th percentiles, respectively.

Methods

We illustrate the technique by applying it to 72 effect sizes, derived from 65 randomized controlled trials described in a recent meta‐analysis (10.1016/j.smrv.2021.101556) of the effect of improving sleep quality on composite mental health. These percentiles are equally distanced from the average effect size, as suggested by Jacob Cohen, and were checked for potential attenuation effects (via a weight selection model) and outliers (via OutRules).

Results

New cutoffs of −0.177, −0.329, and −0.557 were found for small, medium, and large effect sizes, respectively. Applying Cohen's thresholds (0.2, 0.5, and 0.8) to trials of improving sleep quality on composite mental health might therefore overestimate effect sizes relative to the real‐world context, especially around medium and large effect sizes.

Keywords: Cohen's d , effect size, mental health, randomized controlled trials, sleep quality, treatment

1. INTRODUCTION

Disrupted sleep and mental health conditions not only show a bidirectional relationship; disrupted sleep is also the strongest causal pathway in the development of other psychiatric problems (Freeman et al., 2020). Therefore, addressing sleep disturbance, which leads to better mental health (Pigeon et al., 2017; Winkelman, 2020), is a central concern for researchers in the sleep area. A recently published meta‐analysis (Scott et al., 2021), which included 72 effect sizes derived from 65 Randomized Controlled Trials (RCTs), reported the effectiveness of sleep quality interventions on composite mental health at study endpoint as a medium‐sized effect (effect size of −0.53; 95% CI, −0.69 to −0.38). This interpretation of effect sizes is based on a rule of thumb in which 0.2, 0.5, and 0.8 are considered small, medium, and large effect sizes (Cohen, 2013). A medium‐sized effect was originally suggested by Jacob Cohen to represent the average effect for a field when too few studies were available to calculate a distribution of effects within that field, whereas small and large effects were supposed to be equidistant from this average effect (Cohen, 2013).

Cohen proposed the medium effect as an effect size observable to the naked eye, for example, a clinically meaningful change in a treatment group when compared to the control group. In other words, when improving sleep quality yields a Cohen's d of 0.53 on overall composite mental health (Scott et al., 2021), there is a 64.6% chance that a person picked at random from the treatment group will have a higher score than a person picked at random from the control group (probability of superiority). Furthermore, to obtain one more favorable outcome in the treatment group compared to the control group, we would need to treat 5.63 people on average. Cohen's guidelines were originally intended for use when effect size distributions (ESD) are unknown (Cohen, 2013; Glass et al., 1981; Thompson, 2009). Hence, we introduce here an easy way to determine small, medium, and large effect sizes by calculating the ESD within the RCTs of improving sleep quality on composite mental health. The ESDs provided in this article can serve as guidance for better planning studies and interpreting the magnitude of effects. Researchers can find the data and R code to perform the ESD analysis in the supplementary materials.
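The two quantities above follow from the normal model underlying Cohen's d: the probability of superiority is Φ(d/√2), and the quoted NNT matches Furukawa's formula under an assumed control event rate of 20%. The sketch below is illustrative; the 20% control event rate is our assumption, not a figure stated in the article.

```python
from statistics import NormalDist

nd = NormalDist()
d = 0.53  # summary effect size from Scott et al. (2021), sign ignored

# Probability of superiority: P(random treated person outscores a
# random control person) = Phi(d / sqrt(2)).
ps = nd.cdf(d / 2 ** 0.5)
print(f"probability of superiority: {ps:.1%}")  # ~64.6%

# Number needed to treat via Furukawa's formula, assuming a 20%
# control event rate (CER) -- the CER is our assumption.
cer = 0.20
nnt = 1 / (nd.cdf(d + nd.inv_cdf(cer)) - cer)
print(f"NNT: {nnt:.2f}")  # ~5.63
```

Other choices of control event rate would give a different NNT, which is why the formula's inputs matter when translating d into clinical terms.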

2. METHODS

To illustrate the approach, we used a recent meta‐analysis of RCTs (Scott et al., 2021) that included 72 effect sizes with composite mental health as the outcome. The medium effect (50th percentile), representing the average effect size, was calculated, along with the small (25th percentile) and large (75th percentile) effects, which are equally distanced from the average (Cohen, 1992). The same percentiles were used by Quintana (2017) for case‐control studies, which served as inspiration for this work.
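The percentile approach itself is simple. As a sketch, the snippet below derives the three cutoffs from a hypothetical vector of standardized mean differences; the actual 72 effect sizes and the R code are in the supplementary materials, and this Python version is an illustration only.

```python
import numpy as np

# Hypothetical standardized mean differences (negative values favor
# the intervention, as in the sleep-quality trials).
effect_sizes = np.array([-0.05, -0.12, -0.21, -0.30, -0.33, -0.41,
                         -0.48, -0.55, -0.66, -0.80, -0.95, -1.10])

# "Small", "medium", and "large" are read off the 25th, 50th, and
# 75th percentiles of the effect magnitudes.
small, medium, large = np.percentile(np.abs(effect_sizes), [25, 50, 75])
print(small, medium, large)
```

With the real data the same three calls yield the −0.209, −0.388, and −0.658 reported in the Results (signed negatively because the interventions reduce symptom scores).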

We applied the GOSH plot (i.e., a graphical display of study heterogeneity) to detect effect sizes that could unduly influence the ESD. In this approach, a series of meta‐analyses is conducted over all possible combinations of studies, making it possible to detect whether a single study or a distinct subgroup of studies drives the summary effect size estimate. Because of computational restrictions, 50,000 random subset models were fitted.
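The GOSH idea can be sketched as follows. The per‐study effects and variances here are simulated rather than the trial data, and a simple fixed‐effect pooling is used; the point is only to show the subset‐resampling loop behind the plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-study effect sizes and sampling variances.
k = 30
yi = rng.normal(-0.4, 0.25, k)
vi = rng.uniform(0.01, 0.08, k)

# GOSH: re-run a meta-analysis on many random subsets of studies and
# record the pooled effect and I^2 of each subset.
results = []
for _ in range(5000):  # the article used 50,000 subsets
    idx = rng.choice(k, size=rng.integers(2, k + 1), replace=False)
    w = 1 / vi[idx]
    pooled = np.sum(w * yi[idx]) / np.sum(w)
    q = np.sum(w * (yi[idx] - pooled) ** 2)  # Cochran's Q
    df = len(idx) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    results.append((pooled, i2))

pooled_vals, i2_vals = map(np.array, zip(*results))
# Plotting pooled_vals against i2_vals gives the GOSH scatter;
# distinct clusters would flag influential studies or subgroups.
print(pooled_vals.mean(), i2_vals.mean())
```

In the absence of influential studies the scatter forms a single cloud, which is the pattern reported for these trials.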

A weight selection model was used to detect publication bias (see Vevea and Hedges (1995) for more information about weight selection models). The model assumes that studies with non‐significant p‐values are less likely to be published than those with significant p‐values, and the former are therefore given greater weight in the model. A likelihood ratio test was used to assess whether the model adjusted for publication bias differed significantly from the unadjusted model, with a threshold of 0.1 adopted following Begg and Mazumdar (1994).
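The likelihood ratio test comparing the bias‐adjusted and unadjusted models reduces to a χ²(1) statistic. A minimal sketch, with hypothetical log‐likelihoods chosen only to reproduce a statistic of 4.61 (the actual likelihoods come from fitting the Vevea–Hedges model to the trial data):

```python
import math

def lrt_pvalue(ll_unadjusted, ll_adjusted):
    """Likelihood-ratio test (df = 1) between a bias-adjusted and an
    unadjusted meta-analytic model."""
    stat = 2 * (ll_adjusted - ll_unadjusted)
    # chi-square(1) survival function via the complementary error
    # function: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical log-likelihoods; only their difference matters.
stat, p = lrt_pvalue(-100.0, -100.0 + 4.61 / 2)
print(stat, round(p, 3))  # p ~ 0.032
```

With the 0.1 threshold adopted here, a p‐value this small counts as evidence that the selection (publication bias) component improves the fit.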

Scott et al. (2021) reported the effect size of sleep improvement interventions before and after excluding 11 outlier effect sizes (−0.53 and −0.42, respectively); hence, we also calculated effect sizes after removing those 11 effect sizes. Analyses were run in RStudio version 1.4.1103 and Weka version 3.9.5.

3. RESULTS

The 25th (small effect), 50th (medium effect), and 75th (large effect) percentiles corresponded to −0.209, −0.388, and −0.658, respectively, for the 72 effect sizes of sleep‐improving interventions (Figure 1). These percentiles were recalculated after removing outliers: (1) those reported in the original study by Scott et al. (2021) and (2) those detected by OutRules. Following Scott and colleagues' study, we excluded 11 effect sizes; the OutRules approach resulted in the exclusion of four effect sizes with the highest deviance values. As a sensitivity analysis, the percentiles were recalculated using the remaining 61 and 68 effect sizes, respectively (Table 1).

FIGURE 1.

The ESD of 72 effect sizes from sleep‐improving studies. The 25th, 50th, and 75th percentiles (dashed lines) represent the calculated thresholds for small (−0.177), medium (−0.329), and large (−0.557) effects, respectively. ESD, effect size distribution.

TABLE 1.

Effect size percentiles for the 72 effect sizes and for the 61 effect sizes remaining after removing the outliers reported by the original meta‐analysis (Scott et al., 2021).

Percentile   N = 72            N = 61
25%          −0.209 (−0.177)   −0.175
50%          −0.388 (−0.329)   −0.332
75%          −0.658 (−0.557)   −0.523

Note: Values in parentheses are after adjusting for attenuation due to publication bias.

Regarding the GOSH plot approach, no individual study or group of studies appeared to have a substantial impact on the summary effect size (Figure 2).

FIGURE 2.

A GOSH plot depicting the heterogeneity (I2) and summary effect sizes of 50,000 different combinations of sleep‐improving studies from the original meta‐analysis. No distinct clusters were observed, suggesting that the summary effect size was not affected by a single study or group of studies. GOSH, graphical display of heterogeneity.

The weight selection model applied to the 72 effect sizes suggested that the model adjusted for publication bias has an intercept of −0.3967 (SE = 0.1187), which, relative to the unadjusted model (intercept = −0.5089, SE = 0.0735), represents an attenuation of 15.38%. A goodness‐of‐fit test between the two models showed that this difference is statistically significant (χ2(1) = 4.61, p = 0.031), indicating the presence of publication bias. Accounting for this attenuation, the percentiles of −0.209, −0.388, and −0.658 were reduced by 15.38%, generating new percentiles (Table 1). We also reran the weight selection model after excluding the outliers reported by Scott et al. (2021). For the remaining 61 studies, the adjusted model has an intercept of −0.328 (SE = 0.041), which, relative to the unadjusted model (intercept = −0.320, SE = 0.041), represents an augmentation of 1.09%. The goodness‐of‐fit test showed that this difference is not statistically significant (χ2(1) = 0.403, p = 0.525), indicating no evidence of publication bias.
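As a quick arithmetic check, applying the reported 15.38% attenuation to the raw percentiles reproduces the adjusted cutoffs in Table 1 to within rounding (the reported percentage is itself rounded, so the third decimal can differ by one):

```python
# Adjusted cutoffs = raw percentiles shrunk by the 15.38% attenuation
# estimated by the weight selection model.
attenuation = 0.1538
raw = {"small": -0.209, "medium": -0.388, "large": -0.658}
adjusted = {name: d * (1 - attenuation) for name, d in raw.items()}
for name, d in adjusted.items():
    print(f"{name}: {d:.3f}")
```

The results land at approximately −0.177, −0.329, and −0.557, the values carried forward to the Abstract and Discussion.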

4. DISCUSSION

This study redefines the traditional effect size cutoffs for group differences in sleep‐improving interventions on composite mental health. By calculating the 25th (small effect), 50th (medium effect), and 75th (large effect) percentiles based on 72 effect sizes derived from 65 RCTs, we found that −0.177, −0.329, and −0.557 represent small, medium, and large effect sizes, respectively, after adjusting for attenuation due to publication bias.

Hence, using Cohen's conventional rule of thumb would overestimate expected effect sizes in the context of sleep‐improving interventions. An effect size of 0.5, for instance, would traditionally be considered a medium effect, whereas a medium effect corresponds to −0.329 (after adjusting for attenuation due to publication bias) based on our empirically derived thresholds. Similar results were obtained from the sensitivity analysis after excluding outlier effect sizes, especially around medium and large effect sizes.

Notably, the selection model revealed evidence of publication bias, which would lead to inflated effect sizes; we therefore applied the attenuation to the larger set of effect sizes (n = 72). In terms of limitations, the weight selection model assumes that the effect sizes are independent. However, the systematic review we used included eight effect sizes drawn from the same RCTs (e.g., studies with more than two arms), so caution is warranted regarding our approach. There are two ways to deal with this multilevel issue (i.e., effect sizes nested within the same RCT because of designs with more than two arms): (1) exclude dependent effect sizes, leaving only one effect size per study, or (2) treat all effect sizes as independent. Under the first option, regardless of the exclusion rule (e.g., dropping the largest or the smallest effects), we would introduce selection bias; under the second, we fail to account for the multilevel design. Given that no weight selection model is currently available for multilevel designs, one might underestimate the standard errors of the regression coefficients in the selection model, because a multilevel/hierarchical model with cluster‐robust standard errors is not being used. Robust standard errors, also known as Huber–White, Eicker–Huber–White, or "sandwich" estimators (White, 1980), are more accurate under model and distributional misspecification and can be applied to any model (e.g., multilevel or multivariate). Future improvements to selection models might incorporate such features, allowing them to handle multiple effect sizes derived from the same RCT.

Based on our results, researchers planning sleep improvement interventions with composite mental health outcomes now have new effect sizes on which to base sample size calculations. The original meta‐analysis by Scott et al. (2021) also includes outcomes besides the composite mental health measure used here; given the availability of the R code, one can calculate the ESD for a specific mental health outcome such as depression. Scott et al. (2021) identified several moderators that we did not include in our analysis, because our aim was to obtain the percentiles of the effect sizes in the original meta‐analysis as a whole, and sub‐analyses per group would require a larger number of available effect sizes per subgroup of interest.

The approach used here for redefining effect size cutoffs can be applied to different research areas (Nordahl‐Hansen et al., 2022; Panjeh et al., 2023; Quintana, 2017). We encourage researchers to use the code of ESD analysis in their own field of study to plan future research and to better understand the magnitude of effects.

CONFLICT OF INTEREST STATEMENT

Hugo Cogo‐Moreira, Sareh Panjeh and Anders Nordahl‐Hansen declared no conflict of interest.

Supporting information

Supporting Information S1

Supporting Information S2

ACKNOWLEDGMENTS

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), finance code 001.

Panjeh, S. , Nordahl‐Hansen, A. , & Cogo‐Moreira, H. (2023). Establishing new cutoffs for Cohen's d: An application using known effect sizes from trials for improving sleep quality on composite mental health. International Journal of Methods in Psychiatric Research, 32(3), e1969. 10.1002/mpr.1969

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available in the supplementary material of this article.

REFERENCES

  1. Begg, C. B., & Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics, 50(4), 1088–1101. 10.2307/2533446
  2. Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. 10.1037//0033-2909.112.1.155
  3. Cohen, J. (2013). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
  4. Freeman, D., Sheaves, B., Waite, F., Harvey, A. G., & Harrison, P. J. (2020). Sleep disturbance and psychiatric disorders. The Lancet Psychiatry, 7(7), 628–637. 10.1016/S2215-0366(20)30136-X
  5. Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta‐analysis in social research. Sage Publications.
  6. Nordahl‐Hansen, A., Cogo‐Moreira, H., Panjeh, S., & Quintana, D. S. (2022). Redefining effect size interpretations for psychotherapy RCTs in depression. 10.31219/osf.io/erhmw
  7. Panjeh, S., Nordahl‐Hansen, A., & Cogo‐Moreira, H. (2023). Moving forward to a world beyond 0.2, 0.5, and 0.8 effects sizes: New cutoffs for school‐based anti‐bullying interventions. Journal of Interpersonal Violence. 10.1177/08862605221147065
  8. Pigeon, W. R., Bishop, T. M., & Krueger, K. M. (2017). Insomnia as a precipitating factor in new onset mental illness: A systematic review of recent findings. Current Psychiatry Reports, 19(8), 1–11. 10.1007/s11920-017-0802-x
  9. Quintana, D. S. (2017). Statistical considerations for reporting and planning heart rate variability case‐control studies. Psychophysiology, 54(3), 344–349. 10.1111/psyp.12798
  10. Scott, A. J., Webb, T. L., Martyn‐St James, M., Rowse, G., & Weich, S. (2021). Improving sleep quality leads to better mental health: A meta‐analysis of randomised controlled trials. Sleep Medicine Reviews, 60, 101556. 10.1016/j.smrv.2021.101556
  11. Thompson, B. (2009). A brief primer on effect sizes. Journal of Teaching in Physical Education, 28(3), 251–254. 10.1123/jtpe.28.3.251
  12. Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435. 10.1007/bf02294384
  13. White, H. (1980). A heteroskedasticity‐consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817–838. 10.2307/1912934
  14. Winkelman, J. W. (2020). How to identify and fix sleep problems: Better sleep, better mental health. JAMA Psychiatry, 77(1), 99–100. 10.1001/jamapsychiatry.2019.3832
