Table 4.
Summary of the feature settings. (w denotes the window size; if no value is given, only the feature of the current token is used. n denotes the n of the n-gram. ‘len’ denotes the length of the affixes. The matching features denote the results of controlled vocabulary matching.)
Set | Token | Norm-token | n-gram | character affix | capitalization | POS/Chunk | Matching |
---|---|---|---|---|---|---|---|
#1-context | w = 3 | w = 3 | |||||
#2-morph | w = 3 | w = 3 | | len = 2~3, w = 3 | | | |
#3-i2b2 | w = 5 | w = 5 | n = 2, w = 5 | len = 2~7, w = 3 | w = 1 | | |
#3-snuh | w = 5 | w = 3 | n = 2, w = 5 | len = 2~3 | | | modifier/control |
#3-conll | w = 5 | | | len = 3~4, w = 5 | w = 5 | n = 1 | |
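
To make the notation in the table concrete, the following is a minimal sketch of how a setting such as #2-morph (token and normalized token with w = 3, character affixes with len = 2~3 and w = 3) might be turned into per-token features. It is not the implementation used in this work. The function names, the feature-key format, the centered interpretation of the window (w = 3 covering offsets -1, 0, +1), and the normalization rule are all assumptions made for illustration.

```python
import re


def normalize(token):
    # Hypothetical normalization: lowercase and collapse digit runs to "0".
    return re.sub(r"\d+", "0", token.lower())


def window_offsets(w):
    # Assumes a centered window: w = 3 gives offsets -1, 0, +1.
    half = w // 2
    return range(-half, half + 1)


def morph_features(tokens, i, w=3, affix_lens=(2, 3)):
    """Build a feature dict for tokens[i] under a #2-morph-like setting."""
    feats = {}
    for off in window_offsets(w):
        j = i + off
        if not 0 <= j < len(tokens):
            continue  # positions outside the sentence contribute nothing
        tok = tokens[j]
        feats[f"tok[{off}]"] = tok
        feats[f"norm[{off}]"] = normalize(tok)
        for n in affix_lens:
            if len(tok) >= n:
                feats[f"pre{n}[{off}]"] = tok[:n]   # prefix of length n
                feats[f"suf{n}[{off}]"] = tok[-n:]  # suffix of length n
    return feats


if __name__ == "__main__":
    sentence = ["Patient", "denies", "chest", "pain"]
    for key, value in sorted(morph_features(sentence, 1).items()):
        print(key, value)
```

Under these assumptions, the other settings differ only in their arguments (for example, `w=5` and `affix_lens=range(2, 8)` for the affix lengths of #3-i2b2), although the table assigns a separate window size to each feature type, which this single-`w` sketch does not model.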