2019 Oct 25;10:1041. doi: 10.3389/fgene.2019.01041

Table 1.

Best CART models.

Tumor profiling data | Number of considered features | Number of selected features | Median MCC (CART trained on original data) | Median MCC (CART trained on class-permutated data) | p-value (original vs permutated)
miRNA | 337 | 4 | 0.43 | 0.09 | 4.57·10⁻⁶
methy_CpG | 22,941 | 2 | 0.54 | 0.23 | 2.86·10⁻⁴

The predictive performance of CART models was presented in Figure 1. Here we summarize the characteristics of the two best models (i.e. those exploiting miRNA expression and CpG methylation profiles). The median MCC was calculated from the 10 MCCs obtained in LOOCV experiments (each with a different random seed). Five LOOCV runs, in addition to those presented in Figure 3, were carried out to better characterize the performance of the best models found in our study. The small difference found in median MCC (0.52 in Figure 3 versus 0.54 here) suggests that this performance metric is quite robust to the number of LOOCV runs for CART. The training sets were also class-permutated during cross-validation, as explained in the Methods section, and CART was trained on the resulting data to provide a second set of 10 MCCs per profile. The p-value (two-sided paired Student’s t-test) of this class-permutation test indicates how likely the MCCs of the CART models are to arise by chance. The first model was trained on miRNA expression: 4 out of 337 mature miRNAs were retained to build this model, which reached a median MCC of 0.43 and performed significantly better than models based on permutated data (p-value = 4.57·10⁻⁶). The second model was obtained by processing CpG site methylation (abbreviated as ‘methy_CpG’): 2 out of 22,941 CpG sites were retained to build this model, which achieved a median MCC of 0.54 and performed significantly better than models based on permutated data (p-value = 2.86·10⁻⁴).
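
The quantities named in this footnote can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`mcc`, `permute_labels`, `paired_t_statistic`) are hypothetical, the label shuffling is only one plausible reading of "class-permutated", and the convention of returning an MCC of 0 when the denominator vanishes is an assumption. The sketch uses only the standard library and stops at the paired t statistic; a two-sided p-value would then come from Student's t distribution with n - 1 degrees of freedom (for instance via `scipy.stats.ttest_rel`).

```python
import math
import random
import statistics


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when a marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def permute_labels(labels, seed):
    """Return a shuffled copy of the class labels (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def paired_t_statistic(a, b):
    """t statistic of a paired Student's t-test on two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

In use, one would collect one MCC per LOOCV run for the original data and one per run for the permutated data, summarize each list with `statistics.median`, and pass the two lists (paired by run) to `paired_t_statistic`.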