2023 Oct 19;10:722. doi: 10.1038/s41597-023-02617-x

Table 3.

Overall annotation statistics comparing the existing Europe PMC dictionary-based text mining approach to the human curation for the selected 300 gold-standard articles.

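| Measure | Count | Europe PMC dictionary-based: Gene/Protein | Europe PMC dictionary-based: Disease | Europe PMC dictionary-based: Organism | Europe PMC dictionary-based: Total | Gold-standard human annotation: Gene/Protein | Gold-standard human annotation: Disease | Gold-standard human annotation: Organism | Gold-standard human annotation: Total |
|---|---|---|---|---|---|---|---|---|---|
| Annotations | Total | 28,869 | 10,515 | 18,040 | 57,425 | 36,369 | 14,518 | 21,491 | 72,378 |
| Annotations | Unique | 3,419 | 1,752 | 1,700 | 6,871 | 5,600 | 2,037 | 2,347 | 9,970 |
| Median per article | Total | 53.5 | 19.5 | 34 | 170 | 54.5 | 16 | 30 | 192 |
| Median per article | Unique | 12 | 8 | 8 | 36 | 13 | 6.5 | 8 | 44.5 |
| Max annotation per article | Total | 722 | 219 | 407 | 955 | 795 | 478 | 456 | 940 |
| Max annotation per article | Unique | 113 | 78 | 111 | 156 | 178 | 76 | 170 | 201 |

Normalised to a DB entry (Total): Gene/Protein 21,664; Disease 8,476; Organism 16,021; Total 46,161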

Overall, we gained around 11k term annotations, with the largest gain in the Gene/Protein category. Unique term counts are based on string matches; we also report how many annotations normalise to an identifier in the databases mentioned above, rather than counting unique database identifiers.
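
The per-category comparison can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original analysis: it copies the total annotation counts for the three categories from Table 3 and subtracts the dictionary-based counts from the gold-standard counts. The names `dictionary_based`, `gold_standard`, and `annotation_gain` are illustrative assumptions.

```python
# Total annotation counts per category, copied from the
# "Annotations / Total" row of Table 3.
dictionary_based = {"Gene/Protein": 28869, "Disease": 10515, "Organism": 18040}
gold_standard = {"Gene/Protein": 36369, "Disease": 14518, "Organism": 21491}


def annotation_gain(baseline, curated):
    """Return curated minus baseline counts for each category in baseline."""
    return {category: curated[category] - baseline[category] for category in baseline}


# Hypothetical helper usage: per-category difference between the two approaches.
gains = annotation_gain(dictionary_based, gold_standard)
largest = max(gains, key=gains.get)

for category, gain in gains.items():
    print(f"{category}: +{gain:,}")
print("Largest gain:", largest)
```

Under these inputs, the Gene/Protein category shows the largest difference between human curation and the dictionary-based approach, in line with the statement above.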