2015 Mar 16;11(3):e1004127. doi: 10.1371/journal.pcbi.1004127

Fig 2. Gene expression compendium and classification workflow.

The workflow is divided into three steps. (A) Data preprocessing, which combines RNA-Seq and microarray datasets: EcoGEC is categorized into three differential expression bins (under-expressed, UE; wild-type, WT; over-expressed, OE) and preprocessed to correct for batch effects and bias. (B) Model training, in which parameters are trained with four different machine learning methods for each classification task. (C) Model testing, in which each new sample is assigned, for each of the eight characteristic predictors, the class label that receives the majority of votes from the four prediction methods.
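The majority-vote step in (C) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The four methods are represented only by the labels they predict, and the predictor names and class labels are hypothetical. Because four voters can split 2-2, ties are broken here by whichever tied label appears first. The caption does not say how ties are handled.

```python
from collections import Counter
from typing import Dict, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most methods.

    Assumption (not stated in the caption): ties, e.g. a 2-2 split among
    four methods, are broken by the order in which the tied labels first
    appear in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in input order so the first-seen label wins a tie.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def classify_sample(votes_per_predictor: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Apply majority voting independently for each predictor.

    Keys are hypothetical predictor names. Values are the labels returned
    by the prediction methods (four per predictor in the workflow).
    """
    return {name: majority_vote(votes) for name, votes in votes_per_predictor.items()}


if __name__ == "__main__":
    votes = {
        "predictor_1": ["A", "A", "B", "A"],  # clear 3-1 majority
        "predictor_2": ["B", "C", "C", "B"],  # 2-2 tie, first-seen label wins
    }
    print(classify_sample(votes))  # {'predictor_1': 'A', 'predictor_2': 'B'}
```

Voting is done separately for each predictor, so a sample receives eight labels, one per characteristic. A different tie-breaking rule, such as deferring to the most accurate method, would only require changing `majority_vote`.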