2008 May 30;3:22. doi: 10.1186/1745-6150-3-22

Figure 1.

SVM framework. This figure shows the data mining scheme used to build TF classifiers. For each TF, 50 classifiers are constructed, each from a different random sub-sample of the negative set. First (far left), the negative pool for the TF is under-sampled so that it is the same size as the positive set. The classifier built on each balanced set is evaluated using leave-one-out cross-validation (shown as a split into train and test sets). For every cross-validation split, the top 1500 features are selected using SVM-RFE (center solid box), the classifier is trained on those features, and it is then used to classify the test set. This process is repeated 50 times, and the accuracy reported for the procedure is the average of the 50 cross-validation accuracies. To classify a potential new target for a TF, all 50 classifiers are applied to the gene's feature vector, and an enrichment score is calculated as discussed in Methods. A score greater than 0.5 indicates a positive classification; an average score greater than 0.95 is used as the threshold for predicting new targets.
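
The scheme in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. A simple nearest-centroid scorer stands in for the SVM, and the SVM-RFE feature selection step is omitted entirely. The ensemble score is taken to be the plain average of the individual classifier scores, whereas the actual enrichment score is defined in Methods. All function names (`undersample`, `train_centroid_classifier`, `loo_accuracy`, `build_ensemble`, `ensemble_score`) are hypothetical.

```python
import random
from statistics import mean


def undersample(negative_pool, n_pos, rng):
    """Draw a random subset of the negative pool, the same size as the positive set."""
    return rng.sample(negative_pool, n_pos)


def train_centroid_classifier(positives, negatives):
    """Stand-in for the SVM (assumption): score in [0, 1] from distances to class centroids."""
    def centroid(rows):
        return [mean(col) for col in zip(*rows)]

    c_pos, c_neg = centroid(positives), centroid(negatives)

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos)) ** 0.5
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg)) ** 0.5
        total = d_pos + d_neg
        # Closer to the positive centroid gives a score nearer 1.
        return 0.5 if total == 0 else d_neg / total

    return score


def loo_accuracy(positives, negatives):
    """Leave-one-out cross-validation accuracy on one balanced training set."""
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        pos = [v for v, label in train if label == 1]
        neg = [v for v, label in train if label == 0]
        clf = train_centroid_classifier(pos, neg)
        # A score above 0.5 counts as a positive classification.
        correct += int((clf(x) > 0.5) == (y == 1))
    return correct / len(data)


def build_ensemble(positives, negative_pool, n_classifiers=50, seed=0):
    """Train one classifier per random balanced sub-sample of the negative pool."""
    rng = random.Random(seed)
    return [
        train_centroid_classifier(
            positives, undersample(negative_pool, len(positives), rng)
        )
        for _ in range(n_classifiers)
    ]


def ensemble_score(ensemble, x):
    """Average score over all classifiers (assumed stand-in for the enrichment score)."""
    return mean(clf(x) for clf in ensemble)
```

In this sketch, a candidate gene would be scored by calling `ensemble_score(ensemble, feature_vector)` and comparing the result with a threshold such as 0.5 or 0.95. The procedure accuracy described in the caption would correspond to averaging `loo_accuracy` over the 50 balanced sub-samples.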