Table 1:
| | train on TCGA (n=426, 15% MSI) | train on QUASAR (n=1770, 14% dMMR) | train on DACHS (n=2013, 14% MSI) | train on NLCS (n=2197, 10% dMMR) |
|---|---|---|---|---|
| test on TCGA (US) | 0.74 [0.66, 0.80] | 0.76 [0.70, 0.79] | 0.77 [0.73, 0.79] | 0.72 [0.71, 0.78] |
| test on QUASAR (UK) | 0.67 [0.64, 0.68] | 0.89 [0.86, 0.91] | 0.71 [0.68, 0.75] | 0.76 [0.73, 0.78] |
| test on DACHS (DE) | 0.81 [0.79, 0.83] | 0.68 [0.65, 0.72] | 0.92 [0.91, 0.94] | 0.80 [0.78, 0.82] |
| test on NLCS (NL) | 0.77 [0.74, 0.79] | 0.80 [0.78, 0.81] | 0.82 [0.79, 0.83] | 0.90 [0.89, 0.91] |
The main performance measure was the area under the receiver operating characteristic curve (AUROC), shown as the mean with lower and upper bounds [in brackets] in a 10-fold bootstrapped experiment. Intra-cohort performance (diagonal, where training and test cohort are the same) was estimated by three-fold cross-validation. US = United States, UK = United Kingdom, DE = Germany, NL = Netherlands, MSI = microsatellite instability, dMMR = mismatch repair deficiency.
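Each cell in the table summarizes a train/test pair by a bootstrapped AUROC. A minimal Python sketch of that kind of estimate follows. It assumes binary patient-level labels (1 = MSI/dMMR, 0 = otherwise), one continuous score per patient, and that "10-fold bootstrapped" means 10 resamples drawn with replacement. The caption does not state how the lower and upper bounds were derived, so reporting the minimum and maximum of the replicate AUROCs is an assumption here, as are the hypothetical function names `auroc` and `bootstrap_auroc`. This is an illustration, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 10,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) AUROC over n_boot bootstrap resamples.

    Assumption (hypothetical): lower/upper are the min/max of the replicates;
    the source does not specify how its bounds were computed."""
    # Fail early if the full data lack one class; otherwise the loop could not finish.
    auroc(labels, scores)
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_boot:
        # Resample patient indices with replacement.
        sample = rng.choices(range(n), k=n)
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only one class (AUROC undefined).
        if 0 < sum(ys) < len(ys):
            aucs.append(auroc(ys, [scores[i] for i in sample]))
    return sum(aucs) / len(aucs), min(aucs), max(aucs)
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. In the table's external-validation cells, `labels` and `scores` would come from the test cohort after training on another cohort; for the diagonal cells, the scores would instead come from the held-out folds of a three-fold cross-validation.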