Table 1.
Comparison of annotation counts between tokenization approaches
Re-tokenization | Protein | Cellular | Tissue | Molecule | Cell | Organism |
---|---|---|---|---|---|---|
Without | 97.178 | 99.772 | 95.951 | 96.107 | 97.099 | 93.691 |
With | 99.187 | 99.866 | 99.842 | 99.424 | 99.559 | 98.921 |
Annotation counts after preprocessing with only the NERsuite tokenization module (Without) and with both the NERsuite tokenization module and the additional tokenization step (With). For each entity type, the value is the number of annotations obtained after preprocessing, expressed as a percentage of the number of annotations in the provided data.
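The percentages in Table 1 can be read as a simple per-type ratio. The sketch below is only an illustration of that arithmetic, not the authors' evaluation script: the function name `annotation_percent`, the list-of-labels input format, and the toy counts are all assumptions introduced here.

```python
from collections import Counter


def annotation_percent(provided_labels, preprocessed_labels):
    """Per entity type, annotations after preprocessing as a percentage
    of the annotations in the provided data (hypothetical helper)."""
    provided_counts = Counter(provided_labels)
    preprocessed_counts = Counter(preprocessed_labels)
    return {
        entity_type: round(100.0 * preprocessed_counts[entity_type] / total, 3)
        for entity_type, total in provided_counts.items()
    }


# Toy labels, made up for illustration only (not the values behind Table 1).
provided = ["Protein"] * 4 + ["Cell"] * 2
preprocessed = ["Protein"] * 3 + ["Cell"] * 2

result = annotation_percent(provided, preprocessed)
print(result)
```

With these toy inputs, one of the four Protein annotations is lost during preprocessing, so Protein comes out at 75% while Cell stays at 100%.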