Table 5.
Rank | Setting | Precision (%) | Recall (%) | F-score (%) | Significance |
---|---|---|---|---|---|
1 | Rank 1 system | 88.48 | 85.97 | 87.21 | 6–11 |
2 | Rank 2 system | 89.30 | 84.49 | 86.83 | 8–11 |
3 | BANNER_Dict+DistSem | 88.25 | 85.12 | 86.66 | 8–11 |
4 | Rank 3 system | 84.93 | 88.28 | 86.57 | 8–11 |
5 | BANNER_noDict+DistSem | 87.95 | 85.06 | 86.48 | 10–11 |
6 | Rank 4 system | 87.27 | 85.41 | 86.33 | 10–11 |
7 | Rank 5 system | 85.77 | 86.80 | 86.28 | 10–11 |
8 | Rank 6 system | 82.71 | 89.32 | 85.89 | 10–11 |
9 | BANNER_Dict | 86.41 | 84.55 | 85.47 | – |
10 | Rank 7 system | 86.97 | 82.55 | 84.70 | – |
11 | BANNER_noDict | 85.63 | 83.10 | 84.35 | – |
Notes: The Significance column lists the ranks of the systems that are significantly less accurate than the system in that row; a dash means there are none. These values are based on the bootstrap re-sampling calculations performed as part of the evaluation in the BioCreative II shared task (the most recent gene or protein tagging task). BANNER_Dict+DistSem uses both manual and empirical lexical resources. BANNER_noDict+DistSem uses only empirical lexical resources. BANNER_Dict uses only manual lexical resources; it is the system that was available before this research and serves as the baseline for this study. BANNER_noDict uses neither manual nor empirical lexical resources. BANNER_Dict+DistSem is significantly more accurate than the baseline. Equally important, BANNER_noDict+DistSem is more accurate than BANNER_noDict. The most significant research contribution is that accuracy equivalent to the baseline (compare BANNER_noDict+DistSem with BANNER_Dict) can be achieved without any manually compiled lexical resources other than the annotated corpora.
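The F-score column and the idea behind the Significance column can be illustrated with a minimal Python sketch. It assumes the F-score is the balanced harmonic mean of precision and recall, which is consistent with the values in the table. The bootstrap part is a generic paired bootstrap over sentences, not necessarily the exact procedure used in the BioCreative II evaluation. The function names, the per-sentence `(tp, fp, fn)` input format, and the number of resamples are all hypothetical choices made for illustration.

```python
import random


def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def corpus_f(counts):
    """F-score from per-sentence (tp, fp, fn) counts pooled over all sentences."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_score(precision, recall)


def paired_bootstrap(counts_a, counts_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A has a higher F-score than B.

    Assumption: both lists are aligned, one (tp, fp, fn) tuple per test
    sentence, and sentences are the resampling unit.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(counts_a)
    wins = 0
    for _ in range(n_resamples):
        # Draw n sentence indices with replacement, shared by both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        f_a = corpus_f([counts_a[i] for i in idx])
        f_b = corpus_f([counts_b[i] for i in idx])
        if f_a > f_b:
            wins += 1
    return wins / n_resamples


# Recompute two F-scores from the precision and recall columns of the table.
print(round(f_score(88.25, 85.12), 2))  # BANNER_Dict+DistSem
print(round(f_score(86.41, 84.55), 2))  # BANNER_Dict (baseline)
```

Under this sketch, system A would be reported as significantly more accurate than system B when the returned fraction exceeds a chosen threshold such as 0.95. That threshold is also an assumption and is not taken from the shared task.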