2011 Jun 14;27(13):i111–i119. doi: 10.1093/bioinformatics/btr214

Table 3.

Feature templates used in the CRF tagger

Word unigram    w_{i−5}, w_{i−4}, w_{i−3}, w_{i−2}, w_{i−1}, w_i, w_{i+1}, w_{i+2}, w_{i+3}, w_{i+4}, w_{i+5} & y_i
Word bigram     w_{i−1} w_i, w_i w_{i+1} & y_i
Word trigram    w_{i−1} w_i w_{i+1} & y_i
Substrings      substrings of w_i (up to length 10) & y_i
Word shape      S(w_i) & y_i
Tag bigram      True & y_{i−1} y_i

w_i is the current word and y_i is the current tag. The word shape S(w_i) is produced by converting capital letters into ‘A’, small letters into ‘a’ and numerals into ‘#’.
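
As an illustration only, the observation templates in the table could be generated along the following lines. This is a minimal sketch and not the authors' implementation. The function names, the feature-string format and the `<PAD>` token for out-of-range positions are hypothetical. It also assumes that characters other than letters and numerals are left unchanged by S, and that "substrings" means every contiguous substring of up to 10 characters; the table does not state either point. The tag bigram template involves only y_{i−1} and y_i, so it is normally handled by the CRF's transition features and is not generated here.

```python
import re


def word_shape(word):
    """S(w): capitals -> 'A', small letters -> 'a', numerals -> '#'.

    Assumption: any other character (hyphens, Greek letters, ...) is kept as-is.
    """
    shape = re.sub(r"[A-Z]", "A", word)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "#", shape)


def substrings(word, max_len=10):
    """All contiguous substrings of `word` with 1..max_len characters (assumed reading)."""
    return {
        word[start:start + n]
        for n in range(1, max_len + 1)
        for start in range(len(word) - n + 1)
    }


def observation_features(words, i, window=5, max_sub=10):
    """Observation features for position i; each is later conjoined with the tag y_i."""

    def w(offset):
        j = i + offset
        # Hypothetical padding token for positions outside the sentence.
        return words[j] if 0 <= j < len(words) else "<PAD>"

    # Word unigrams w_{i-5} .. w_{i+5}
    feats = [f"w[{k:+d}]={w(k)}" for k in range(-window, window + 1)]
    # Word bigrams w_{i-1} w_i and w_i w_{i+1}
    feats.append(f"w[-1..0]={w(-1)} {w(0)}")
    feats.append(f"w[0..+1]={w(0)} {w(1)}")
    # Word trigram w_{i-1} w_i w_{i+1}
    feats.append(f"w[-1..+1]={w(-1)} {w(0)} {w(1)}")
    # Substrings of w_i, up to max_sub characters
    feats.extend(f"sub={s}" for s in sorted(substrings(w(0), max_sub)))
    # Word shape S(w_i)
    feats.append(f"shape={word_shape(w(0))}")
    return feats


if __name__ == "__main__":
    sentence = ["the", "IL-2", "gene"]
    for feature in observation_features(sentence, 1):
        print(feature)
```

For the token "IL-2" this sketch yields the shape feature `shape=AA-#`, since the hyphen is passed through unchanged under the assumption stated above.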
