. 2010 May 12;11:299. doi: 10.1186/1471-2164-11-299

Figure 1.

Performance comparison of RF models trained on different types of input data. To correct for the imbalance between classes A and P, 500 RF models with randomly generated P1 and P2 sets were trained on each of the following 6 types of data: the full-length amino acid sequences, the signal peptides (SP), and the mature protein amino acid sequences, each encoded either as the frequency of single amino acid residues or as the frequency of pairs of adjacent amino acids. When the 6 top-performing models of each input type are compared, the model trained on full-length protein sequences encoded as adjacent amino acid pair frequencies shows the highest overall accuracy (89%) and the highest class A protein recall (90%).
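
The two sequence encodings named in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the restriction to the 20 standard residues, the handling of non-standard characters, and the toy sequence with its cleavage position are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so feature vectors are comparable.
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]


def residue_frequencies(seq):
    """Return a 20-element vector of single-residue frequencies."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def pair_frequencies(seq):
    """Return a 400-element vector of adjacent-pair frequencies.

    Pairs are taken from overlapping windows of length 2; windows that
    contain a non-standard character are skipped.
    """
    seq = seq.upper()
    valid = set(PAIRS)
    windows = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(w for w in windows if w in valid)
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def split_at_cleavage(seq, site):
    """Split a sequence into (signal peptide, mature protein).

    `site` is a hypothetical cleavage position supplied by the caller.
    """
    return seq[:site], seq[site:]


if __name__ == "__main__":
    # Toy sequence and cleavage position, invented for illustration only.
    full = "MKKLLAVAALSAQAADTTK"
    sp, mature = split_at_cleavage(full, 14)
    for name, s in [("full", full), ("SP", sp), ("mature", mature)]:
        print(name, len(residue_frequencies(s)), len(pair_frequencies(s)))
```

Applying both encodings to the full-length sequence, the signal peptide, and the mature protein gives the 6 input types compared in the figure. Each resulting vector could then be passed to any random forest implementation as a feature row.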