2020 May 26;16(5):e1007894. doi: 10.1371/journal.pcbi.1007894

Fig 1. Workflow for extracting and testing feature sets at different levels for predicting host taxon information.

Virus genome data were represented by four information layers, with features derived from each. Binary classification with a linear SVM was applied to equal-sized positive and negative classes of virus-host association, split into training and test sets. The area under the ROC curve (AUC) was measured for each dataset-feature set combination.
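
The evaluation loop described in the caption (balance the two classes, split into training and test sets, fit a linear SVM, score by AUC) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names, the synthetic four-dimensional features, the 30% test fraction, and the Pegasos-style subgradient solver are all assumptions made for the example, and the real feature sets from the figure are not reproduced here.

```python
import random


def balance_classes(positives, negatives, rng):
    """Downsample the larger class so both classes have equal size."""
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


def train_test_split(samples, labels, test_fraction, rng):
    """Shuffle indices and split into (train, test) pairs of (X, y)."""
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_ids, train_ids = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [samples[i] for i in ids], [labels[i] for i in ids]

    return pick(train_ids), pick(test_ids)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Pegasos-style stochastic subgradient descent on the hinge loss.

    Labels must be +1 / -1. No intercept is fitted, so the features
    are assumed to be roughly centred (true for the toy data below).
    """
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * dot(w, X[i])   # computed before shrinking w
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:               # sample violates the margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=0):
    """Run the whole loop on synthetic data (hypothetical features)."""
    rng = random.Random(seed)

    def make(mu, n):
        return [[rng.gauss(mu, 1.0) for _ in range(4)] for _ in range(n)]

    # Unequal class sizes on purpose, to exercise the balancing step.
    positives, negatives = balance_classes(make(1.5, 300), make(-1.5, 200), rng)
    X = positives + negatives
    y = [1] * len(positives) + [-1] * len(negatives)
    (X_tr, y_tr), (X_te, y_te) = train_test_split(X, y, 0.3, rng)
    w = train_linear_svm(X_tr, y_tr, rng=rng)
    return roc_auc(y_te, [dot(w, x) for x in X_te])


if __name__ == "__main__":
    print(f"test AUC: {run_demo():.3f}")
```

In the workflow of the figure, a loop like `run_demo` would be repeated once per dataset-feature set combination, with the synthetic features replaced by the features derived from one information layer. Because AUC depends only on the ranking of the decision scores, it can be compared across feature sets without choosing a classification threshold.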