Statistical comparison of models. Consistently low test statistics and p-values provide strong evidence that models combining the schizophrenia polygenic risk score (PRSSZ) with all clinical/demographic variables discriminate between cases and controls better than models built using PRSSZ alone or clinical/demographic variables alone. The difference in area under the receiver operating characteristic curve (AUROC) shows how much higher the AUROC of each modelling approach is when all predictors are combined than when only genetic or only non-genetic predictors are used. The Wilcoxon signed-rank test statistic, W, is given for every comparison of the AUROCs obtained on the outer test folds of nested cross-validation, split by classifier and dataset. A W of 0 means that the combined model achieved a higher AUROC than the comparison model in every corresponding outer test fold. P-values are corrected for the false discovery rate (FDR) at 0.05; starred adjusted p-values are significant at the 5% level.
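The comparison procedure described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code behind the table. It assumes paired AUROCs from 10 outer folds (the fold values are invented). It takes W as min(W+, W-) with an exact two-sided p-value from enumerating all sign assignments. It also assumes the FDR correction is Benjamini-Hochberg, which the caption does not specify. All function names are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(combined, single, decimals=10):
    """Two-sided Wilcoxon signed-rank test on paired per-fold AUROCs.

    Returns (W, p) with W = min(W+, W-) and an exact p-value from enumerating
    every sign assignment of the ranks (practical for roughly 20 folds or fewer).
    """
    # Round away floating-point noise so equal differences tie exactly,
    # then drop zero differences (Wilcoxon's convention).
    diffs = [round(c - s, decimals) for c, s in zip(combined, single)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w_obs = min(w_plus, w_minus)
    total = w_plus + w_minus
    extreme = 0
    for signs in product((1, -1), repeat=len(ranks)):
        w_neg = sum(r for r, s in zip(ranks, signs) if s < 0)
        if min(w_neg, total - w_neg) <= w_obs + 1e-9:
            extreme += 1
    return w_obs, extreme / 2 ** len(ranks)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values (step-up, kept monotone)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical AUROCs from 10 outer test folds (illustrative numbers only).
combined = [0.72, 0.70, 0.75, 0.71, 0.74, 0.69, 0.73, 0.76, 0.70, 0.72]
prs_only = [0.61, 0.60, 0.63, 0.59, 0.62, 0.58, 0.64, 0.62, 0.60, 0.61]
clinical_only = [0.70, 0.71, 0.72, 0.70, 0.70, 0.66, 0.72, 0.71, 0.64, 0.69]

w1, p1 = wilcoxon_signed_rank(combined, prs_only)       # every fold favours combined
w2, p2 = wilcoxon_signed_rank(combined, clinical_only)  # one fold favours clinical-only
adj = benjamini_hochberg([p1, p2])
for name, w, p in [("vs PRS", w1, adj[0]), ("vs clinical", w2, adj[1])]:
    print(name, w, round(p, 5), "*" if p < 0.05 else "")
```

When every fold favours the combined model, W is 0. Under the exact null distribution, the two-sided p-value is then 2/2^n for n folds. The number of outer folds therefore sets a floor on how small an unadjusted p-value can be.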