2018 Aug 8;33(1):240–248. doi: 10.1038/s41375-018-0229-3

Fig. 2.

Estimated predictive performance of the model. Results are shown for (a) the discovery dataset and (b, c) the replication datasets. The left-hand panels show the distributions of the prediction values over the LOOCV folds for the actual relapsed and non-relapsed groups, as produced by the random forest classification model. The middle panels show the prediction ROC curves and AUC values. In a, the solid black ROC curve indicates the genetic model, the dashed gray curve indicates the model with principal components together with clinical and genetic variables, and the dotted purple curve shows the result using principal components and clinical data only. In b, the dashed green curve and the dotted blue curve show the results when allowing variants with <11 and <81 missing values, respectively. In c, the black curve and the dotted green curve show the results for higher (<0.3) and lower (<0.2) imputed genotype quality filtering stringencies, respectively. The right-hand panels in a–c show the odds ratio for a correct prediction (y-axis) along the prediction model output values (x-axis). The p-values are calculated with a one-sided Mann–Whitney test. The statistical power of the AUC is calculated at an alpha level of 0.01.
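The AUC in the middle panels and the Mann–Whitney test used for the p-values are closely related: the AUC equals the Mann–Whitney U statistic divided by the number of relapsed/non-relapsed pairs. The minimal sketch below illustrates that relationship only. The function name `auc_from_scores` and the prediction values are hypothetical and are not taken from the study, and the authors' actual software is not stated here.

```python
def auc_from_scores(relapsed, non_relapsed):
    """Return the AUC as U / (n_relapsed * n_non_relapsed).

    U counts the pairs in which a relapsed sample receives a higher
    prediction value than a non-relapsed sample; ties count as half.
    """
    wins = 0.0
    for r in relapsed:
        for n in non_relapsed:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(relapsed) * len(non_relapsed))


# Hypothetical held-out prediction values, one per LOOCV fold (not study data).
relapsed_scores = [0.9, 0.7, 0.6, 0.4]
non_relapsed_scores = [0.5, 0.3, 0.2, 0.1]

auc = auc_from_scores(relapsed_scores, non_relapsed_scores)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 corresponds to prediction values that do not separate the two groups, which is the null hypothesis that the one-sided test evaluates against.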