Figure 3. Ranking relevance of each species in the predictive models for each dataset and identification of a minimal microbial signature for CRC detection.
(A) Importance of each species for the cross-validation prediction performance in each dataset, estimated using the internal random forest (RF) importance scores. Only species ranked among the top five features in at least one dataset are reported. (B, C) Prediction performance at increasing numbers of microbial species, obtained by re-training the RF classifier on the top N ranked features identified in a first RF training, in the cross-validation (B) and leave-one-dataset-out (LODO) (C) settings. The rankings are computed with the testing dataset excluded, to avoid overfitting.
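The procedure behind panel (C) can be illustrated with a minimal sketch: for each held-out dataset, features are ranked on the remaining datasets only, and a classifier is then re-trained on the top N features and scored on the held-out dataset. Everything named here is an assumption made for illustration, not the authors' code. In particular, `mean_difference_ranking` and `nearest_centroid_accuracy` are simple stand-ins for the RF importance scores and the RF classifier, the metric is plain accuracy, and the toy data are synthetic.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]   # (species abundances, label: 1 = case, 0 = control)
Dataset = List[Sample]


def mean_difference_ranking(train: Dataset) -> List[int]:
    """Stand-in ranker (NOT RF importance): sort features by
    |mean(cases) - mean(controls)|, most discriminative first."""
    n_features = len(train[0][0])
    scores = []
    for j in range(n_features):
        pos = [x[j] for x, y in train if y == 1]
        neg = [x[j] for x, y in train if y == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def nearest_centroid_accuracy(train: Dataset, test: Dataset,
                              features: Sequence[int]) -> float:
    """Stand-in classifier (NOT an RF): nearest class centroid,
    restricted to the selected features; returns test accuracy."""
    def centroid(label: int) -> List[float]:
        rows = [[x[j] for j in features] for x, y in train if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for x, y in test:
        v = [x[j] for j in features]
        d0 = sum((a - b) ** 2 for a, b in zip(v, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(v, c1))
        correct += int((1 if d1 < d0 else 0) == y)
    return correct / len(test)


def lodo_top_n_curve(
    datasets: Dict[str, Dataset],
    ns: Sequence[int],
    rank: Callable[[Dataset], List[int]] = mean_difference_ranking,
    score: Callable[[Dataset, Dataset, Sequence[int]], float] = nearest_centroid_accuracy,
) -> Dict[str, Dict[int, float]]:
    """For each held-out dataset, rank features on the other datasets,
    then re-train on the top-n features and score on the held-out one."""
    results: Dict[str, Dict[int, float]] = {}
    for held_out, test in datasets.items():
        train = [s for name, d in datasets.items() if name != held_out for s in d]
        ranking = rank(train)  # the held-out dataset never influences the ranking
        results[held_out] = {n: score(train, test, ranking[:n]) for n in ns}
    return results


def make_toy(shift: int) -> Dataset:
    """Synthetic data: feature 0 tracks the label, features 1 and 2 do not."""
    return [([(i % 2) + 0.1 * shift + 0.01 * i,
              0.5 + 0.01 * (i % 3),
              0.2 * ((i // 2) % 2)], i % 2) for i in range(10)]


datasets = {"A": make_toy(0), "B": make_toy(1), "C": make_toy(2)}
curve = lodo_top_n_curve(datasets, ns=[1, 2, 3])
for name, by_n in curve.items():
    print(name, by_n)
```

Swapping in an actual RF would only require passing different `rank` and `score` callables. The loop that keeps the held-out dataset out of the ranking step stays the same.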