2021 Apr 20;17(4):e1009149. doi: 10.1371/journal.ppat.1009149

Fig 3. Random forest host predictions based on coronavirus genome composition.

Stacked bar plots of the predicted probability of each host category for coronavirus sequences. Predictions were obtained from ensemble random forest models trained on A) spike protein and B) whole genome composition features. Each panel depicts the sequences from one metadata-derived host category, and colour coding denotes the model-predicted host category. Each stack represents an individual coronavirus sequence, and stacks are ordered from largest to smallest predicted probability of the correct host; a greater panel area matching the correct host category therefore indicates better overall model performance. Non-zoonotic coronavirus sequences originating from humans (human coronaviruses HKU1, NL63, OC43, 229E) are labelled for clarity. Versions stratified by genera and species are provided as S4 and S5 Figs.
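
The layout described in this caption can be illustrated with a minimal sketch. It is not the authors' plotting code: the host names, probability values, and the helper functions `order_panel`, `stack_segments`, and `correct_area_fraction` are all hypothetical and chosen only for illustration. The sketch orders the sequences of one panel by the predicted probability of the correct host, computes the segment boundaries of each stacked bar, and summarises the share of panel area assigned to the correct host.

```python
from typing import Dict, List, Tuple

# One prediction maps each host category to its predicted probability.
Prediction = Dict[str, float]


def order_panel(preds: List[Prediction], true_host: str) -> List[Prediction]:
    """Sort sequences from largest to smallest probability of the correct host."""
    return sorted(preds, key=lambda p: p.get(true_host, 0.0), reverse=True)


def stack_segments(pred: Prediction, hosts: List[str]) -> List[Tuple[str, float, float]]:
    """Return (host, bottom, top) for each coloured segment of one stacked bar."""
    segments = []
    bottom = 0.0
    for host in hosts:
        height = pred.get(host, 0.0)
        segments.append((host, bottom, bottom + height))
        bottom += height
    return segments


def correct_area_fraction(preds: List[Prediction], true_host: str) -> float:
    """Share of the panel area coloured as the correct host (mean probability)."""
    return sum(p.get(true_host, 0.0) for p in preds) / len(preds)


# Hypothetical host categories and probabilities for one panel whose
# metadata-derived host is "bat"; these values are NOT from the study.
hosts = ["bat", "human", "swine"]
panel = [
    {"bat": 0.25, "human": 0.5, "swine": 0.25},
    {"bat": 0.75, "human": 0.25, "swine": 0.0},
    {"bat": 0.5, "human": 0.25, "swine": 0.25},
]

ordered = order_panel(panel, "bat")
area = correct_area_fraction(panel, "bat")

for pred in ordered:
    print(stack_segments(pred, hosts))
print("correct-host area fraction:", area)
```

Under these assumptions, the area fraction is simply the mean predicted probability of the correct host across the panel, which is why a larger matching area corresponds to better overall performance for that host category.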