2017 Jan 4;7:39194. doi: 10.1038/srep39194

Figure 1. Overview of the PaPrBaG workflow.


Reads are simulated from genomes in both the training (left) and prediction (center) workflows, and features are extracted from these reads. The training features, together with the associated phenotype labels, form the training database on which the random forest algorithm trains a pathogenicity classifier. This classifier then predicts the pathogenic potential of each read in the test set. From these raw read-level results, three outputs can be generated (right): the prediction profile, the genome aggregate prediction, and a combined prediction.
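The prediction branch of this workflow can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not PaPrBaG's implementation. Reads are sampled error-free and uniformly from a genome string. Dinucleotide (k = 2) frequencies stand in for the real feature set, which this caption does not specify. The hypothetical `classify` callable stands in for the trained random forest's per-read probability output, for example column 1 of scikit-learn's `RandomForestClassifier.predict_proba`. The genome-level call assumes that per-read probabilities are averaged and then compared against a 0.5 threshold. All function names here are illustrative, and the combined prediction is not modeled.

```python
import random
from itertools import product
from statistics import mean

NUCLEOTIDES = "ACGT"

def simulate_reads(genome, read_length, n_reads, rng):
    """Sample error-free, fixed-length reads uniformly at random from one genome (assumed model)."""
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    starts = (rng.randrange(len(genome) - read_length + 1) for _ in range(n_reads))
    return [genome[s:s + read_length] for s in starts]

def kmer_frequencies(read, k=2):
    """Relative k-mer frequencies in fixed ACGT order; k-mers with ambiguous bases (e.g. N) are skipped.
    Stand-in features only, not PaPrBaG's actual feature set."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for all-ambiguous reads
    return [counts[km] / total for km in kmers]

def prediction_profile(read_probs, bins=10):
    """Histogram of per-read pathogenicity probabilities over [0, 1]."""
    hist = [0] * bins
    for p in read_probs:
        hist[min(int(p * bins), bins - 1)] += 1  # p == 1.0 falls into the last bin
    return hist

def genome_prediction(read_probs, threshold=0.5):
    """Assumed genome-level aggregate: mean per-read probability compared against a threshold."""
    score = mean(read_probs)
    return score, score >= threshold

def predict_genome(genome, classify, read_length=100, n_reads=50, seed=0):
    """Chain the steps: simulate reads, extract features, classify each read, aggregate."""
    rng = random.Random(seed)
    reads = simulate_reads(genome, read_length, n_reads, rng)
    read_probs = [classify(kmer_frequencies(r)) for r in reads]
    return read_probs, prediction_profile(read_probs), genome_prediction(read_probs)

# Usage with a hypothetical stand-in classifier that returns a constant probability.
probs, profile, (score, pathogenic) = predict_genome("ACGT" * 100, lambda features: 0.8)
print(profile, round(score, 2), pathogenic)
```

Passing the classifier in as a callable keeps the read-level and genome-level steps independent of how the model was trained. A fitted random forest could be wrapped in a one-line function without changing the rest of the sketch.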