Front Oncol. 2020 Jun 30;10:1065. doi: 10.3389/fonc.2020.01065

Figure 3.


Data splitting procedure. To avoid information leakage from using the same data for both feature selection and model training, we used different training and test sets depending on the integration scheme. Each data set is split into three non-overlapping partitions, TR, TS, and TS2, containing 50%, 30%, and 20% of the entire data set, respectively. Each partition preserves the original proportion of patient phenotypes. Predictive models for juXT and rSNF are trained on TR and validated on TS, whereas rSNFi models are trained on TS, with features restricted to the intersection of the biomarkers selected by juXT and rSNF, and tested on TS2.
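The splitting scheme can be illustrated with a minimal Python sketch. It assumes a simple per-class shuffle-and-cut to keep phenotype proportions (a stratified split) and a plain set intersection for the rSNFi feature restriction. The function names `stratified_three_way_split` and `rsnfi_features`, the `SCHEME_PARTITIONS` mapping, the seed, the rounding rule, and the example labels and gene names are all hypothetical and are not taken from the authors' pipeline.

```python
import random
from collections import Counter, defaultdict

# Hypothetical mapping of each integration scheme to its (train, test) partitions,
# following the figure caption.
SCHEME_PARTITIONS = {"juXT": ("TR", "TS"), "rSNF": ("TR", "TS"), "rSNFi": ("TS", "TS2")}


def stratified_three_way_split(labels, fractions=(0.5, 0.3, 0.2), seed=0):
    """Split sample indices into non-overlapping TR/TS/TS2 partitions.

    Each class (phenotype) is shuffled and cut separately, so every
    partition keeps approximately the original class proportions.
    """
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("expected three fractions summing to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    tr, ts, ts2 = [], [], []
    for label in sorted(by_class):  # deterministic class order
        idxs = by_class[label]
        rng.shuffle(idxs)
        n = len(idxs)
        n_tr = min(round(n * fractions[0]), n)
        n_ts = min(round(n * fractions[1]), n - n_tr)
        tr.extend(idxs[:n_tr])
        ts.extend(idxs[n_tr:n_tr + n_ts])
        ts2.extend(idxs[n_tr + n_ts:])  # remainder of each class goes to TS2
    return sorted(tr), sorted(ts), sorted(ts2)


def rsnfi_features(juxt_biomarkers, rsnf_biomarkers):
    """Features for the rSNFi model: biomarkers selected by both juXT and rSNF."""
    return sorted(set(juxt_biomarkers) & set(rsnf_biomarkers))


if __name__ == "__main__":
    labels = ["A"] * 60 + ["B"] * 40  # hypothetical phenotype labels
    tr, ts, ts2 = stratified_three_way_split(labels)
    print(len(tr), len(ts), len(ts2))  # 50 30 20
    print(Counter(labels[i] for i in tr))  # Counter({'A': 30, 'B': 20})
    print(rsnfi_features(["g1", "g2", "g3"], ["g2", "g3", "g4"]))  # ['g2', 'g3']
```

Splitting each phenotype separately is what keeps the class ratio (here 60:40) in TR, TS, and TS2 alike. Because rSNFi is trained on TS and tested on TS2, neither of which is used for the juXT and rSNF feature selection on TR, the held-out TS2 remains untouched by any earlier step.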