2008 Sep 1;9(Suppl 2):S9. doi: 10.1186/gb-2008-9-s2-s9

Table 6.

GN results: performance of nine heuristics used to filter out false-positive gene mentions or to modify gene mentions so as to improve dictionary-matching performance.

Step  Presence of ...                             Example                P      R      F      Modified
0     -                                           -                      0.770  0.673  0.718  0
1     Gene chromosome location                    3p11-3p12.1            0.772  0.673  0.719  34
2     Single, short lowercase word                heme                   0.778  0.672  0.721  112
3     Strings of only numbers and/or punctuation  9+/-76                 0.779  0.672  0.722  206
4     Extra preceding words                       protein SNF → SNF      0.790  0.681  0.731  225 (a)
5     Extra trailing words                        SNF protein → SNF      0.812  0.723  0.765  419 (a)
6     Amino acids                                 Ser-119                0.815  0.723  0.766  460
7     Protein families                            Bcl-2 family proteins  0.816  0.722  0.766  701
8     Protein domains, motifs, fusion             SNH domain             0.828  0.722  0.771  883
9     Nonhuman keywords                           rat IFN gamma          0.829  0.725  0.774  1,086 (a)

Results shown here are from the development dataset. Step 0 indicates performance before any rules are applied. At each step, the rules of all preceding steps are also applied. Modified refers to the cumulative number of gene mentions removed or altered. (a) Rules 4 and 5 only modify gene mentions; rule 9 can either modify or remove them. All other rules remove gene mentions. P, precision; R, recall; F, F-measure; GN, gene normalization.
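The F column is the balanced F-measure, F = 2PR/(P + R), the harmonic mean of precision and recall. As a quick consistency check (a sketch, not part of the original analysis), recomputing F from the rounded P and R values in the table reproduces the reported F at every step:

```python
# P, R and F as reported in Table 6 (development dataset), keyed by step.
ROWS = {
    0: (0.770, 0.673, 0.718),
    1: (0.772, 0.673, 0.719),
    2: (0.778, 0.672, 0.721),
    3: (0.779, 0.672, 0.722),
    4: (0.790, 0.681, 0.731),
    5: (0.812, 0.723, 0.765),
    6: (0.815, 0.723, 0.766),
    7: (0.816, 0.722, 0.766),
    8: (0.828, 0.722, 0.771),
    9: (0.829, 0.725, 0.774),
}


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure: the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


for step, (p, r, f_reported) in ROWS.items():
    print(f"step {step}: recomputed F = {f_measure(p, r):.3f}, reported F = {f_reported:.3f}")
```

Each recomputed value agrees with the reported one to three decimals. Over the full cascade, F rises by 0.774 - 0.718 = 0.056, with precision gaining 0.059 and recall 0.052.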
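The cumulative design described in the note, where each step keeps all earlier rules and every rule either removes a mention or rewrites it, can be sketched as a rule cascade. This is a minimal Python sketch, not the authors' implementation: it covers only rules 1 to 5, and the regular expressions, the five-character cutoff for rule 2 and the word list for rules 4 and 5 are all hypothetical stand-ins chosen to match the table's examples.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule returns None to remove a mention, or a (possibly rewritten) mention string.
Rule = Callable[[str], Optional[str]]

# Hypothetical patterns approximating rules 1-3; not the authors' actual patterns.
CHROM_LOC = re.compile(
    r"(?:\d{1,2}|X|Y)[pq]\d+(?:\.\d+)?(?:-(?:\d{1,2}|X|Y)?[pq]\d+(?:\.\d+)?)?"
)  # rule 1, e.g. "3p11-3p12.1"
SHORT_LOWER = re.compile(r"[a-z]{1,5}")  # rule 2; the 5-character cutoff is assumed
NUM_PUNCT = re.compile(r"[\d\W_]+")  # rule 3, e.g. "9+/-76"
EXTRA_WORDS = {"protein", "gene"}  # rules 4 and 5; hypothetical word list


def drop_if(pattern: re.Pattern) -> Rule:
    """Build a removal rule: drop the mention if the whole string matches."""
    def rule(mention: str) -> Optional[str]:
        return None if pattern.fullmatch(mention) else mention
    return rule


def strip_preceding(mention: str) -> Optional[str]:
    """Rule 4: remove extra leading words, e.g. "protein SNF" -> "SNF"."""
    words = mention.split()
    i = 0
    while i < len(words) - 1 and words[i].lower() in EXTRA_WORDS:
        i += 1
    return " ".join(words[i:]) if i else mention


def strip_trailing(mention: str) -> Optional[str]:
    """Rule 5: remove extra trailing words, e.g. "SNF protein" -> "SNF"."""
    words = mention.split()
    j = len(words)
    while j > 1 and words[j - 1].lower() in EXTRA_WORDS:
        j -= 1
    return " ".join(words[:j]) if j < len(words) else mention


RULES: List[Rule] = [
    drop_if(CHROM_LOC), drop_if(SHORT_LOWER), drop_if(NUM_PUNCT),
    strip_preceding, strip_trailing,
]


def apply_cascade(mentions: List[str], rules: List[Rule]) -> Tuple[List[str], int]:
    """Apply rules in order, so each rule sees the output of all preceding rules.

    Returns the surviving mentions and the number of mentions removed or altered.
    """
    kept: List[str] = []
    modified = 0
    for original in mentions:
        current: Optional[str] = original
        for rule in rules:
            current = rule(current)
            if current is None:  # mention removed; later rules are skipped
                break
        if current != original:  # removed or rewritten: counted once per mention
            modified += 1
        if current is not None:
            kept.append(current)
    return kept, modified


def cumulative_counts(mentions: List[str], rules: List[Rule]) -> List[int]:
    """Modified count after steps 0..len(rules), mirroring the table's last column."""
    return [apply_cascade(mentions, rules[:k])[1] for k in range(len(rules) + 1)]


MENTIONS = ["3p11-3p12.1", "heme", "9+/-76", "protein SNF", "SNF protein", "BRCA1"]
print(cumulative_counts(MENTIONS, RULES))
print(apply_cascade(MENTIONS, RULES)[0])
```

With these six illustrative mentions the cumulative counts are [0, 1, 2, 3, 4, 5], and the surviving mentions are ['SNF', 'SNF', 'BRCA1']. Counting each mention at most once, however many rules touch it, is one reading of "cumulative number of gene mentions removed or altered"; the source does not specify how multiply modified mentions were counted.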