Table 3A.
Document Classification Performance of Different Supervised Machine Learning Algorithms
**Maximum entropy**

| No. of words/code | 10 | 50 | 100 | 250 | 500 | 750 | 1000 | 2000 | 4000 |
|---|---|---|---|---|---|---|---|---|---|
| Iteration | 83 | 109 | 186 | 104 | 169 | 104 | 199 | 65 | 69 |
| Accuracy (%) | 68.62 | 72.73 | 72.80 | 72.56 | **72.83** | 71.54 | 71.44 | 69.47 | 67.66 |

**Naïve Bayes**

| No. of words | 100 | 500 | 1000 | 5000 | All |
|---|---|---|---|---|---|
| Accuracy (%) | 63.89 | **66.92** | 66.88 | 65.59 | 63.79 |

**Nearest neighbor** (accuracy, %)

| Neighbors | 100 words | 500 words | 1000 words | 5000 words | All words |
|---|---|---|---|---|---|
| 1 | 58.04 | 54.06 | 52.84 | 53.28 | 52.19 |
| 5 | 60.52 | 57.53 | 57.84 | 58.38 | 56.82 |
| 20 | 59.71 | 59.91 | 60.80 | 61.88 | 61.24 |
| 50 | 59.23 | 60.39 | 61.85 | **62.90** | 62.26 |
| 100 | 58.76 | 60.29 | 61.41 | 62.77 | 61.54 |
| 200 | 56.65 | 59.16 | 60.08 | 61.31 | 60.05 |
Document classification performance of three algorithms on the Test 2000 dataset across a range of parameters. For maximum entropy classification, we tried different numbers of word features per code and measured accuracy at each iteration of the GIS optimization algorithm; each column reports the number of words per code used, the highest accuracy obtained, and the first iteration at which that accuracy was reached. For naïve Bayes classification, we calculated accuracy on vocabularies of different sizes; each column reports the vocabulary size and the corresponding accuracy. For nearest-neighbor classification, we calculated accuracy for different numbers of neighbors and different vocabularies; the results are reported as a grid, with one row per number of neighbors and one column per vocabulary. The best accuracy achieved by each method is shown in bold.
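As an illustration of the kind of parameter sweep described in this caption, the sketch below shows how comparable experiments could be set up with scikit-learn. This is an assumption for illustration only: the toy corpus, labels, and parameter grids are stand-ins, and multinomial logistic regression is used as a generic maximum entropy classifier rather than the GIS-trained model used in the original work.

```python
# Hypothetical sketch (not the paper's implementation): scikit-learn analogues of
# the three classifiers, swept over vocabulary size as in Table 3A.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Stand-in corpus; in practice these would be the Test 2000 documents and their codes.
train_docs = ["protein binding assay results", "cell cycle regulation study",
              "dna repair pathway analysis", "membrane transport protein report"]
train_labels = ["binding", "cycle", "repair", "transport"]
test_docs = ["analysis of dna repair", "transport across the cell membrane"]
test_labels = ["repair", "transport"]

def accuracy_with_vocabulary(clf, n_words):
    """Restrict features to the n_words most frequent words, then fit and score clf."""
    vectorizer = CountVectorizer(max_features=n_words)  # n_words=None keeps all words
    X_train = vectorizer.fit_transform(train_docs)
    X_test = vectorizer.transform(test_docs)
    clf.fit(X_train, train_labels)
    return accuracy_score(test_labels, clf.predict(X_test))

# Naive Bayes over vocabularies of increasing size (middle panel of the table).
for n in (100, 500, 1000, 5000, None):
    print("naive Bayes", n, accuracy_with_vocabulary(MultinomialNB(), n))

# Nearest neighbor: sweep neighbors x vocabulary size (bottom panel). The paper
# sweeps 1-200 neighbors; the toy corpus here only supports very small k.
for k in (1, 3):
    for n in (100, 500, 1000, 5000, None):
        print("kNN", k, n, accuracy_with_vocabulary(KNeighborsClassifier(n_neighbors=k), n))

# Maximum entropy: multinomial logistic regression stands in for the GIS-trained
# model; max_iter caps the number of optimizer iterations.
for n in (10, 50, 100, 250, 500, 750, 1000, 2000, 4000):
    print("maxent", n, accuracy_with_vocabulary(LogisticRegression(max_iter=200), n))
```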