Author manuscript; available in PMC: 2018 Mar 1.
Published in final edited form as: Nat Genet. 2017 Jan 16;49(3):332–340. doi: 10.1038/ng.3756

Figure 1. Systematic model comparison.


a. Top panel: Concordance C of different model predictions for overall survival. For cross-validation analyses (grey), we generated 100 training and test sets by randomly splitting the full dataset; the distribution of concordance values across the 100 random splits is shown as a box-and-whisker plot. Also shown are point estimates with error bars for predictions evaluated on pre-specified splits of the dataset, where the training set comprised 2 of the 3 trials in the study and the test set was the third trial (red, blue, green), or where the training set was the full AMLSG dataset and the test set was the TCGA cohort (purple). Predictions for the multistage model are evaluated 3 years after diagnosis.
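The repeated random-split evaluation described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the concordance function is a basic Harrell's C that handles ties in predicted risk but not tied survival times, and the split fractions and seed are arbitrary choices.

```python
import random

def concordance(times, events, risks):
    """Harrell's C: among comparable patient pairs, the fraction in which
    the patient who died earlier was assigned the higher predicted risk.
    A pair (i, j) is comparable only if patient i had an observed event
    and it occurred before patient j's event or censoring time."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count half
    return concordant / comparable

def random_splits(n, n_splits=100, test_frac=0.3, seed=0):
    """Yield (train, test) index lists for repeated random splitting
    of a cohort of n patients, as in the grey cross-validation boxes."""
    rng = random.Random(seed)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        k = int(round(n * test_frac))
        yield idx[k:], idx[:k]
```

For each of the 100 splits, a model would be fitted on the training indices and `concordance` evaluated on the test indices, giving the distribution summarised in the box-and-whisker plot.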

Lower panel: For each of the 100 random cross-validation splits, each of the 10 classes of predictive model was built on the training set and evaluated on the test set. The 10 models were ranked by their relative performance on the test set, and the ranks were aggregated across the 100 splits, indicating how often each model scored best (1st) to worst (10th). Time-dependent models include allogeneic hematopoietic stem cell transplantation, which is treated as a time-dependent covariate to avoid bias.
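The per-split ranking and aggregation in the lower panel can be sketched as below. Model names and scores are illustrative, and ties in test-set performance are broken arbitrarily by sort order here, which is an assumption about the original procedure.

```python
from collections import Counter

def rank_models(scores):
    """Rank models by test-set concordance within one split
    (rank 1 = highest concordance). `scores` maps model name -> C."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {model: rank for rank, model in enumerate(order, start=1)}

def aggregate_ranks(per_split_scores):
    """Tally, for each model, how often it achieved each rank (1st..last)
    across all cross-validation splits."""
    tallies = {}
    for scores in per_split_scores:
        for model, rank in rank_models(scores).items():
            tallies.setdefault(model, Counter())[rank] += 1
    return tallies
```

Applied to 100 splits of 10 models, each model's Counter gives the bar heights shown from best (1st) to worst (10th).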

b. Coefficient of determination R² for leave-one-out predictions using time-dependent random effects and multistage predictions of the AMLSG cohort, evaluated at each time point (x-axis).

c. Same as b, evaluated on TCGA data.
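For panels b and c, the coefficient of determination at a fixed evaluation time can be computed in the standard 1 − SS_res/SS_tot form, sketched below. This is an assumption about the exact estimator; the paper's leave-one-out and censoring handling are not reproduced here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot, comparing
    predicted outcomes against observed outcomes at one evaluation time."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot
```

Evaluating this at a grid of times after diagnosis yields the R²-versus-time curves shown on the x-axes of panels b and c.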