Genome Res. 2002 Jan;12(1):203–214. doi: 10.1101/gr.199701

Table 3A.

Document Classification Performance of Different Supervised Machine Learning Algorithms

Maximum entropy
No. of words/code   10     50     100    250    500    750    1000   2000   4000
Iteration           83     109    186    104    169    104    199    65     69
Accuracy            68.62  72.73  72.80  72.56  72.83  71.54  71.44  69.47  67.66

Naïve Bayes
No. of words   100    500    1000   5000   All
Accuracy       63.89  66.92  66.88  65.59  63.79

Nearest neighbor
                   No. of words
Neighbors   100    500    1000   5000   All
1           58.04  54.06  52.84  53.28  52.19
5           60.52  57.53  57.84  58.38  56.82
20          59.71  59.91  60.80  61.88  61.24
50          59.23  60.39  61.85  62.90  62.26
100         58.76  60.29  61.41  62.77  61.54
200         56.65  59.16  60.08  61.31  60.05

Document classification performance for three different algorithms on the Test 2000 dataset across a range of parameters. For maximum entropy classification, we tried different numbers of word features per code and measured accuracy at each iteration of the GIS optimization algorithm; each column reports the number of words per code, the highest accuracy obtained, and the first iteration at which that accuracy was reached. For naïve Bayes classification, we calculated accuracy on different vocabularies; each column reports the vocabulary size and the corresponding accuracy. For nearest-neighbor classification, we calculated accuracy for different numbers of neighbors and different vocabularies; the accuracies are reported in a grid, with one row per number of neighbors and one column per vocabulary. The best performance achieved for each method is underlined (72.83 for maximum entropy, 66.92 for naïve Bayes, 62.90 for nearest neighbor).
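The nearest-neighbor grid above varies two knobs: the vocabulary used to represent each document as a bag of words, and the number of neighbors k voting on the label. A minimal sketch of that scheme, assuming cosine similarity over word-count vectors and majority voting (the paper does not specify its distance metric, and the toy corpus and labels here are entirely hypothetical):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, doc, k):
    # train: list of (Counter, label) pairs; doc: Counter for the test document.
    # Rank training documents by similarity to doc, then majority-vote
    # among the k nearest.
    ranked = sorted(train, key=lambda t: cosine(t[0], doc), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus with two classes of short "documents".
train = [
    (Counter("kinase phosphorylation signal".split()), "signaling"),
    (Counter("kinase substrate phosphorylation".split()), "signaling"),
    (Counter("ribosome translation mrna".split()), "translation"),
    (Counter("mrna ribosome initiation".split()), "translation"),
]
test_doc = Counter("phosphorylation kinase".split())
print(knn_predict(train, test_doc, k=3))  # → signaling
```

Sweeping the table's parameters then amounts to restricting each Counter to the top-N vocabulary words and looping over k values, recording accuracy for each (vocabulary, k) cell.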