Table 1. Characteristics of the two best CART models (median MCC over 10 LOOCV runs).
Tumor profiling data | Number of features considered | Number of features selected | Median MCC (CART trained on original data) | Median MCC (CART trained on class-permuted data) | p-value (original vs. permuted) |
---|---|---|---|---|---|
miRNA | 337 | 4 | 0.43 | 0.09 | 4.57·10⁻⁶ |
methy_CpG | 22,941 | 2 | 0.54 | 0.23 | 2.86·10⁻⁴ |
The predictive performance of the CART models is presented in Figure 1. Here we summarize the characteristics of the two best models, i.e. those exploiting miRNA expression and CpG methylation profiles. Each median MCC was calculated from the 10 MCCs obtained in LOOCV experiments, each run with a different random seed. These include five LOOCV runs in addition to those presented in Figure 3, carried out to better characterize the performance of the best models found in our study. The small difference in median MCC (0.52 in Figure 3 versus 0.54 here) suggests that this performance metric is fairly robust to the number of LOOCV runs for CART.

The class labels of the training sets were also permuted during cross-validation, as explained in the Methods section, and CART was trained on the resulting data to provide a second set of 10 MCCs per profile. The p-value of this class-permutation test (two-sided paired Student's t-test) indicates how likely it is that MCCs as high as those of the CART models arise by chance. The first model was trained on miRNA expression: 4 out of 337 mature miRNAs were retained to build this model, which reached a median MCC of 0.43 and performed significantly better than models trained on permuted data (p-value = 4.57·10⁻⁶). The second model was obtained by processing CpG site methylation (abbreviated as 'methy_CpG'): 2 out of 22,941 CpG sites were retained to build this model, which achieved a median MCC of 0.54 and performed significantly better than the permutation models (p-value = 2.86·10⁻⁴).
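The evaluation protocol described above (median MCC over repeated LOOCV runs, class-permuted training labels, two-sided paired t-test) can be sketched in plain Python. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' code: the function names are hypothetical, binary class labels are assumed, the CART training step itself is omitted, and the t-test p-value is obtained by numerically integrating the Student's t density (Simpson's rule) rather than with a statistics library.

```python
import math
import random
import statistics

# Hypothetical helpers illustrating the evaluation protocol; CART training is not shown.


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def permute_labels(labels, seed):
    """Return a class-permuted copy of the training labels (the input is left untouched)."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def _t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def paired_t_test(a, b, steps=2000):
    """Two-sided paired Student's t-test; returns (t statistic, p-value).

    p = 1 - 2 * integral of the t density from 0 to |t| (Simpson's rule;
    `steps` must be even)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("paired differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    h = abs(t) / steps
    area = _t_pdf(0.0, df) + _t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    area *= h / 3
    return t, max(0.0, 1.0 - 2.0 * area)
```

Assuming the original-label and permuted-label runs are paired by random seed, the 10 MCCs of each set would be passed as `a` and `b` to `paired_t_test`, while `statistics.median` would give the per-set medians. In practice a library routine such as SciPy's paired t-test would replace the hand-written integration; it is spelled out here only to keep the sketch dependency-free.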