2021 Apr 20;17(4):e1009149. doi: 10.1371/journal.ppat.1009149

Fig 5. Variable importance of genomic features.

Variable importance of genome composition features in ensemble random forest models predicting coronavirus host category from whole genome sequences (x axis) and spike protein sequences (y axis), with labelling of top ten most informative features from both analyses. Points denote mean values of relative decrease in Gini impurity associated with each feature across A) m = 222 and B) m = 185 random forests during hold-one-out cross-validation. Colour key denotes genomic feature type.
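The importance measure named in the caption can be illustrated with a minimal sketch. It assumes the standard definition of Gini impurity (one minus the sum of squared class proportions) and assumes that "relative" means each forest's decreases are normalised to sum to 1 before averaging across the m forests. The function names, host labels, and feature names below are hypothetical and are not taken from the authors' code.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted decrease in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


def mean_relative_importance(per_forest_decreases):
    """Normalise each forest's decreases to sum to 1, then average per feature.

    `per_forest_decreases` is a list of dicts mapping feature name to the total
    impurity decrease attributed to that feature in one forest (assumed layout).
    A feature absent from a forest contributes zero for that forest.
    """
    totals = {}
    for decreases in per_forest_decreases:
        total = sum(decreases.values())
        for feature, value in decreases.items():
            share = value / total if total else 0.0
            totals[feature] = totals.get(feature, 0.0) + share
    m = len(per_forest_decreases)
    return {feature: value / m for feature, value in totals.items()}


# Hypothetical host categories: a split that separates the classes perfectly.
parent = ["bat"] * 4 + ["human"] * 4
split_gain = impurity_decrease(parent, ["bat"] * 4, ["human"] * 4)

# Hypothetical per-forest totals for two made-up features across m = 2 forests.
importance = mean_relative_importance([
    {"feature_a": 3.0, "feature_b": 1.0},
    {"feature_a": 1.0, "feature_b": 1.0},
])
```

In this toy case the perfect split removes all of the parent node's impurity, and the averaged shares sum to 1 across features, which is what makes values from forests trained on different cross-validation folds comparable.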