Author manuscript; available in PMC: 2012 Jun 1.
Published in final edited form as: Proteins. 2011 Apr 12;79(6):1952–1963. doi: 10.1002/prot.23020

Figure 2. Nested CV10 experiment.


For concurrent parameter optimization and performance evaluation, a nested CV10 experiment is applied to the benchmark dataset. The five training steps (rectangles numbered 1–5) and the final test step (rectangle numbered 6) are repeated ten times, each time with a different subset as the test set. Filled rectangles indicate steps that involve inner CV9 experiments. Arrows indicate the flow of information, and arrow color indicates the type of information. A summary of the optimized parameters and techniques can be found in Table II.
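The fold bookkeeping for this nested design can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`partition`, `outer_cv10`, `inner_cv9`), the shuffle-and-deal split, and the seed are assumptions. The split operates on protein identifiers, so every sample from a given protein stays in one subset, in keeping with the caption's disjoint protein subsets.

```python
import random

def partition(proteins, k, seed=0):
    # Shuffle the protein identifiers and deal them into k disjoint subsets
    # whose sizes differ by at most one (equal when k divides the count).
    shuffled = list(proteins)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def outer_cv10(proteins, seed=0):
    # Outer loop: each of the ten subsets is the test set exactly once;
    # the other nine subsets form the training set for steps 1-5.
    subsets = partition(proteins, 10, seed)
    for i, test in enumerate(subsets):
        train = [s for j, s in enumerate(subsets) if j != i]
        yield train, test

def inner_cv9(train_subsets):
    # Inner loop (steps 2 and 3): each of the nine training subsets is the
    # validation set once; the remaining eight are pooled for training.
    for i, val in enumerate(train_subsets):
        inner_train = [p for j, s in enumerate(train_subsets) if j != i for p in s]
        yield inner_train, val
```

For instance, iterating `for train, test in outer_cv10(ids)` and, inside it, `for inner_train, val in inner_cv9(train)` visits the ten outer test sets and, for each, the nine inner validation sets that steps 2 and 3 evaluate.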

Step 1. The benchmark dataset is divided into 10 equal-sized, disjoint protein subsets, one of which is set aside as the test set (red). The remaining nine are used for training (blue).

Step 2. An exhaustive grid search for the optimal amount ratio and penalty ratio (AR and PR, respectively; see text) is conducted on the training set, running a CV9 experiment at every AR-PR grid point. In each fold of this CV9 experiment, one subset (orange) is used for validation and the remaining eight for training (blue). The optimal AR and PR are used in the subsequent steps (plum arrows).

Step 3. A CV9 search is performed for the optimal probability threshold (PT) to be used in the test step (cyan arrow).

Step 4. The optimal AR is used to balance the training set.

Step 5. The resulting balanced training set is used to train the SVM with the optimal PR.

Step 6. The learned SVM model is evaluated on the test set (red arrow) using the optimal PT (cyan arrow).
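A minimal, self-contained sketch of steps 2 to 6 for a single outer fold follows. Several parts are assumptions rather than details given in this caption: AR is taken to be the number of negatives kept per positive when undersampling, PR the ratio of the misclassification penalty for positives to that for negatives, the selection criterion is the Matthews correlation coefficient (MCC), step 2 scores grid points at a fixed PT of 0.5, and the SVM is replaced by a small linear SVM trained by stochastic subgradient descent whose decision value is passed through a logistic function instead of calibrated probability estimates. All function names are hypothetical.

```python
import math
import random
from itertools import product

# A sample is a (features, label) pair with label 1 (positive) or 0 (negative).
# A subset is the list of samples drawn from one of the ten protein subsets.

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def balance(data, ar, rng):
    # Step 4 (assumed meaning of AR): keep every positive and a random
    # sample of at most ar * (number of positives) negatives.
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    keep = min(len(neg), int(round(ar * len(pos))))
    return pos + rng.sample(neg, keep)

def train_svm(data, pr, epochs=30, lr=0.01, lam=0.01, seed=0):
    # Step 5 stand-in: linear SVM fitted by stochastic subgradient descent on
    # an L2-regularized hinge loss; errors on positives cost pr times as much
    # as errors on negatives (assumed meaning of PR).
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            s = 1.0 if y == 1 else -1.0
            c = pr if y == 1 else 1.0
            margin = s * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: weighted subgradient step
                w = [wi + lr * c * s * xi for wi, xi in zip(w, x)]
                b += lr * c * s

    def prob(x):
        # Logistic of the decision value; a crude, uncalibrated "probability".
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return prob

def mcc(model, data, pt):
    # Matthews correlation coefficient, predicting positive when prob >= pt.
    tp = fp = tn = fn = 0
    for x, y in data:
        pred = model(x) >= pt
        tp += int(pred and y == 1)
        fp += int(pred and y == 0)
        tn += int(not pred and y == 0)
        fn += int(not pred and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cv9_fits(train_subsets, ar, pr, seed=0):
    # Inner CV9: each of the nine training subsets is held out once for
    # validation; a model is trained on the balanced remaining eight.
    rng = random.Random(seed)
    fits = []
    for i, val in enumerate(train_subsets):
        inner = [d for j, s in enumerate(train_subsets) if j != i for d in s]
        fits.append((train_svm(balance(inner, ar, rng), pr), val))
    return fits

def mean_mcc(fits, pt):
    # Average validation MCC over the inner folds at threshold pt.
    return sum(mcc(model, val, pt) for model, val in fits) / len(fits)

def outer_fold(train_subsets, test, ars, prs, pts, seed=0):
    # Step 2: grid search over (AR, PR), scored at PT = 0.5 (an assumption).
    ar, pr = max(product(ars, prs),
                 key=lambda g: mean_mcc(cv9_fits(train_subsets, g[0], g[1], seed), 0.5))
    # Step 3: CV9 search for PT, reusing the inner models for the chosen AR and PR.
    fits = cv9_fits(train_subsets, ar, pr, seed)
    pt = max(pts, key=lambda p: mean_mcc(fits, p))
    # Steps 4-5: balance the whole training set with AR, then train with PR.
    train = [d for s in train_subsets for d in s]
    model = train_svm(balance(train, ar, random.Random(seed)), pr, seed=seed)
    # Step 6: evaluate on the held-out test subset using PT.
    return (ar, pr, pt), mcc(model, test, pt)
```

For example, `outer_fold(subsets[1:], subsets[0], [1.0, 2.0], [1.0, 2.0], [0.3, 0.5, 0.7])` returns the selected (AR, PR, PT) and the test MCC for the fold whose test set is `subsets[0]`; holding out each subset in turn gives the ten outer iterations. Step 3 reuses the inner CV9 models because the threshold only changes how probabilities are turned into labels, so no retraining is needed per candidate PT.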