Front Psychol. 2025 Oct 15;16:1592658. doi: 10.3389/fpsyg.2025.1592658

Figure 4.


Test-retest reliability effects across different effect size metrics and statistical tests. (A) Observed effect sizes as a function of reliability for r_true = 0.5, comparing group differences to correlational strength. Note that because the effect sizes among the tests are not directly comparable, each is normalized by its own maximum value at ICC = 1. The inset shows the results for r_true = 0.9. The dashed line denotes the classical attenuation formula, r_observed = r_true · √(ICC_x · ICC_y). (B) The p-value as a function of reliability for r_true = 0.5 and a total sample size of N = 60. Dichotomizing data substantially increases p-values, especially when reliability is low. (C) The required sample size to achieve 80% statistical power at α = 0.05 as a function of reliability for the three effect size metrics. Dichotomizing data substantially increases the sample size required to detect the same true effect, especially when reliability is low.
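The two quantities the panels trace out, the attenuated correlation and the sample size needed to detect it, can be computed directly. Below is a minimal sketch assuming the classical attenuation formula shown by the dashed line and a standard Fisher-z power approximation for the required N; the function names are illustrative, not from the paper's code.

```python
import math
from statistics import NormalDist

def attenuated_r(r_true, icc_x, icc_y):
    """Classical attenuation: the observed correlation shrinks by the
    square root of the product of the two measures' reliabilities."""
    return r_true * math.sqrt(icc_x * icc_y)

def required_n(r, alpha=0.05, power=0.8):
    """Approximate N to detect correlation r (two-sided test) using the
    Fisher z-transform; a textbook approximation, not the paper's method."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

# Perfect reliability leaves r_true = 0.5 untouched; ICC = 0.6 on both
# measures attenuates it to 0.3, roughly doubling the required sample.
print(attenuated_r(0.5, 1.0, 1.0))                 # 0.5
print(attenuated_r(0.5, 0.6, 0.6))                 # 0.3
print(required_n(attenuated_r(0.5, 1.0, 1.0)))     # ~30
print(required_n(attenuated_r(0.5, 0.6, 0.6)))
```

This reproduces the qualitative pattern in panel (C): as ICC drops, the observed effect shrinks and the sample size needed for 80% power grows sharply.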