Table 3. Feature templates used in the CRF tagger
| Feature | Observation template | Conjoined with |
|---|---|---|
| Word unigram | $w_{i-5}, w_{i-4}, w_{i-3}, w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}, w_{i+3}, w_{i+4}, w_{i+5}$ | $y_i$ |
| Word bigram | $w_{i-1}w_i$, $w_i w_{i+1}$ | $y_i$ |
| Word trigram | $w_{i-1}w_i w_{i+1}$ | $y_i$ |
| Substrings | Substrings of $w_i$ (up to length 10) | $y_i$ |
| Word shape | $S(w_i)$ | $y_i$ |
| Tag bigram | True | $y_{i-1}y_i$ |
$w_i$ is the current word and $y_i$ is the current tag. The word shape $S(w_i)$ is produced by converting uppercase letters into ‘A’, lowercase letters into ‘a’, and numerals into ‘#’.
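The templates in Table 3 can be illustrated with a minimal Python sketch. This is not the original tagger's code, and it rests on several assumptions: features are encoded as strings, positions outside the sentence map to a hypothetical `<PAD>` token, characters other than letters and digits pass through $S(w)$ unchanged, and conjunction with the tag is modelled by string concatenation. The names `word_shape`, `observation_features` and `conjoined_features` are illustrative only.

```python
from typing import List

# Hypothetical boundary token for positions outside the sentence (assumption:
# the table does not specify how sentence boundaries are handled).
PAD = "<PAD>"


def word_shape(word: str) -> str:
    """S(w): uppercase -> 'A', lowercase -> 'a', digits -> '#'.
    Other characters are kept unchanged (an assumption)."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("#")
        else:
            out.append(ch)
    return "".join(out)


def token(words: List[str], j: int) -> str:
    """Return w_j, or PAD if j falls outside the sentence."""
    return words[j] if 0 <= j < len(words) else PAD


def observation_features(words: List[str], i: int, max_sub: int = 10) -> List[str]:
    """Observation parts of the templates in Table 3 for position i."""
    feats = []
    # Word unigrams w_{i-5} .. w_{i+5}
    for off in range(-5, 6):
        feats.append(f"w[{off}]={token(words, i + off)}")
    # Word bigrams w_{i-1}w_i and w_i w_{i+1}
    feats.append(f"w[-1]w[0]={token(words, i - 1)}|{token(words, i)}")
    feats.append(f"w[0]w[1]={token(words, i)}|{token(words, i + 1)}")
    # Word trigram w_{i-1}w_i w_{i+1}
    feats.append(
        f"w[-1]w[0]w[1]={token(words, i - 1)}|{token(words, i)}|{token(words, i + 1)}"
    )
    # All distinct substrings of w_i with length 1..max_sub
    w = words[i]
    subs = {
        w[a:b]
        for a in range(len(w))
        for b in range(a + 1, min(a + max_sub, len(w)) + 1)
    }
    feats.extend(f"sub={s}" for s in sorted(subs))
    # Word shape S(w_i)
    feats.append(f"shape={word_shape(w)}")
    return feats


def conjoined_features(words: List[str], i: int, y_prev: str, y: str) -> List[str]:
    """Conjoin each observation feature with y_i, and add the tag bigram
    feature (constant observation 'True' conjoined with y_{i-1} y_i)."""
    feats = [f"{f}&y={y}" for f in observation_features(words, i)]
    feats.append(f"True&y[-1]y={y_prev}|{y}")
    return feats
```

For example, `word_shape("Hello123")` yields `"Aaaaa###"`, and `conjoined_features(["The", "cat", "sat"], 1, "DT", "NN")` includes `"w[0]=cat&y=NN"` and `"True&y[-1]y=DT|NN"`. In practice, CRF toolkits usually perform the label conjunction internally, so a feature extractor would typically emit only the observation strings.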