. 2021 Nov 8;12:6441. doi: 10.1038/s41467-021-26501-7

Fig. 2. A comparison of the four methods' prevalence estimates and confidence intervals for varying proportions of cases and for three sample sizes.

Mixture distributions of non-cases and T1D patients from WTCCC (ref. 8) were constructed with $p_C = 0.1, 0.4, 0.8$ (shown in blue, grey and red, respectively) and $n = 500, 1000, 5000$ (shown in panels (e, f), (c, d) and (a, b), respectively). a, c, e The constructed mixture distributions and the reference distributions ($R_C$, shaded red, and $R_N$, shaded blue) from which they were constructed. b, d, f Prevalence estimates $\hat{p}_C$ (bullseye) obtained by each of the four methods for varying $p_C$ (x-axis) and cohort size $n$ (rows). Each estimated $\hat{p}_C$ value is shown together with a violin plot illustrating the distribution of the 100,000 estimates of prevalence ($p_C$) in the bootstrap samples, and with confidence intervals ($\alpha = 0.05$) shown as horizontal lines with vertical bars at the ends. Dashed vertical lines indicate the reference prevalence values $p_C$. In all cases, the Excess method shows a large offset between the violin plots (including confidence intervals) and the $\hat{p}_C$ value; this offset results from the systematic bias of the Excess method. The other three methods generally show much less bias. Sample sizes: $R_C$, cases, WTCCC T1D ($n = 982$); $R_N$, non-cases, WTCCC T2D ($n = 962$); mixtures, sampled with replacement from a holdout half of the $R_C$ ($n = 981$) and $R_N$ ($n = 962$) samples.
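
The procedure described in the caption (building a mixture with known $p_C$ by sampling with replacement from two reference samples, then bootstrapping a prevalence estimate to obtain a confidence interval) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the reference scores are synthetic Gaussian draws rather than WTCCC data, the function names are hypothetical, the estimator is a simple interpolation of the mixture mean between the two reference means (not necessarily identical to any of the four methods compared in the figure), the interval is a plain percentile bootstrap over the mixture only, and 1,000 resamples are used instead of 100,000 to keep it fast.

```python
import random
import statistics


def make_mixture(ref_cases, ref_noncases, p_c, n, rng):
    """Sample n scores with replacement: round(p_c * n) from cases, the rest from non-cases."""
    n_cases = round(p_c * n)
    return (rng.choices(ref_cases, k=n_cases)
            + rng.choices(ref_noncases, k=n - n_cases))


def means_estimate(mixture, ref_cases, ref_noncases):
    """Assumed estimator: position of the mixture mean between the reference means, clipped to [0, 1]."""
    mu_c = statistics.fmean(ref_cases)
    mu_n = statistics.fmean(ref_noncases)
    p_hat = (statistics.fmean(mixture) - mu_n) / (mu_c - mu_n)
    return min(1.0, max(0.0, p_hat))


def bootstrap_ci(mixture, ref_cases, ref_noncases, n_boot, alpha, rng):
    """Percentile bootstrap interval obtained by resampling the mixture with replacement."""
    estimates = sorted(
        means_estimate(rng.choices(mixture, k=len(mixture)), ref_cases, ref_noncases)
        for _ in range(n_boot)
    )
    low = estimates[int((alpha / 2) * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


rng = random.Random(0)
# Synthetic stand-ins for the reference score distributions (NOT WTCCC data).
ref_cases = [rng.gauss(1.0, 0.5) for _ in range(1000)]
ref_noncases = [rng.gauss(0.0, 0.5) for _ in range(1000)]

mixture = make_mixture(ref_cases, ref_noncases, p_c=0.4, n=1000, rng=rng)
p_hat = means_estimate(mixture, ref_cases, ref_noncases)
ci_low, ci_high = bootstrap_ci(mixture, ref_cases, ref_noncases,
                               n_boot=1000, alpha=0.05, rng=rng)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Repeating the last block for each combination of `p_c` and `n` would give one estimate and interval per panel position. A biased estimator would show up here as an interval that sits consistently away from the value of `p_c` used to build the mixture.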