. 2019 Oct 25;10:1041. doi: 10.3389/fgene.2019.01041

Table 1.

Best CART models.

Tumor profiling data	Number of considered features	Number of selected features	Median MCC(CART trained on original data)	Median MCC(CART trained on class-permutated data)	p-value(original vs permutated)
miRNA	337	4	0.43	0.09	4.57·10^-6
methy_CpG	22,941	2	0.54	0.23	2.86·10^-4

The predictive performance of CART models was presented in Figure 1. Here we summarize the characteristics of the two best models (i.e. those exploiting miRNA expression and CpG methylation profiles). A median MCC was calculated with the 10 MCCs coming from LOOCV experiments (each with a different random seed). This five additional LOOCV runs with respect to those presented in Figure 3 were carried out to better characterize the performance of the best models found in our study. The small difference found in median MCC (0.52 in Figure 3 versus 0.54 here) suggests that this performance metric is quite robust to the number of LOOCV runs for CART. The training sets were also class-permutated during cross-validation as explained in the Methods section and CART trained on the resulting data to provide a second set of 10 MCCs per profile. The p-value (two-sided paired Student’s t-test) of this class-permutated test shows how likely are the MCCs of the CART models to arise by chance. The first model was trained on miRNA expression: 4 out of 337 mature miRNAs were retained to build this model reaching a median MCC of 0.43 and performing significantly better than models based on permutated data (p-value = 4.57·10^-6). The second model is obtained processing CpG site methylation (shorten as ‘methy_CpG’): 2 out of 22,941 CpG sites were retained to build this model achieving a median MCC of 0.54 and performing significantly better than permutation models (p-value = 2.86·10^-4).