Table 4.
Summary of experiments performed to identify risk factors. PP = post-processing rules; OAI = optimization against annotation imbalance (n = number of tokens before/after annotated tokens).
CRF model | PP | OAI | Tested hypothesis |
---|---|---|---|
Complex | No | No | A CRF with complex features identifies more risk factors than a lexicon projection |
Complex | Yes | No | Post-processing rules identify risk factors represented as numerical values higher than a defined threshold |
Simple | Yes | No | A CRF with simple features (the token and its part-of-speech tag) identifies already known risk factors |
Simple | Yes | Yes (n = 35) | The reduction of unannotated tokens occurring before and after annotated tokens counters annotation imbalance and improves results |
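
The OAI setting in the last row of Table 4 can be illustrated with a minimal Python sketch. It assumes token-level labels in which unannotated tokens carry the label "O", and it keeps only the tokens lying within n positions of an annotated token. The function name `trim_context`, the label scheme, and the example sentence are hypothetical and serve only to show the idea; they are not taken from the experiments summarized above.

```python
from typing import List, Tuple


def trim_context(
    tokens: List[str],
    labels: List[str],
    n: int,
    outside: str = "O",  # assumed label for unannotated tokens
) -> Tuple[List[str], List[str]]:
    """Keep annotated tokens plus at most n unannotated neighbours on each side."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    if n < 0:
        raise ValueError("n must be non-negative")

    keep = [False] * len(tokens)
    for i, label in enumerate(labels):
        if label != outside:
            # Mark the window [i - n, i + n], clipped to the sequence bounds.
            start = max(0, i - n)
            stop = min(len(tokens), i + n + 1)
            for j in range(start, stop):
                keep[j] = True

    kept_tokens = [t for t, k in zip(tokens, keep) if k]
    kept_labels = [l for l, k in zip(labels, keep) if k]
    return kept_tokens, kept_labels


# Hypothetical example sentence with two annotated risk factors ("RF").
example_tokens = ["the", "patient", "has", "a", "history", "of",
                  "hypertension", "and", "smokes", "daily", "since", "1990"]
example_labels = ["O", "O", "O", "O", "O", "O",
                  "RF", "O", "RF", "O", "O", "O"]

trimmed_tokens, trimmed_labels = trim_context(example_tokens, example_labels, n=1)
```

With n = 1, the twelve-token example shrinks to the five tokens surrounding the two annotated ones, which raises the share of annotated tokens seen during training. Larger values of n retain more context at the cost of a stronger imbalance.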