2023 Oct 19;10:722. doi: 10.1038/s41597-023-02617-x

Table 3.

Overall annotation statistics comparing the existing Europe PMC dictionary-based text mining approach to the human curation for the selected 300 gold-standard articles.

		Europe PMC dictionary-based				Gold-standard human annotation
		Gene/Protein	Disease	Organism	Total	Gene/Protein	Disease	Organism	Total
Annotations	Total	28,869	10,515	18,040	57,425	36,369	14,518	21,491	72,378
Annotations	Unique	3,419	1,752	1,700	6,871	5,600	2,037	2,347	9,970
Normalised to a DB entry	Total	—	—	—	—	21,664	8,476	16,021	46,161
Median per article	Total	53.5	19.5	34	170	54.5	16	30	192
Median per article	Unique	12	8	8	36	13	6.5	8	44.5
Max annotation per article	Total	722	219	407	955	795	478	456	940
Max annotation per article	Unique	113	78	111	156	178	76	170	201

Overall, we gained around 11k term annotations, with the largest gain in the Gene/Protein category. Unique term counts are based on string matching. We also report how many annotations normalise to an identifier in one of the databases mentioned above; we do not report counts of unique database identifiers.
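To make the counting conventions concrete, the following minimal Python sketch does two things. It first computes the per-category difference between the gold-standard and dictionary-based totals, using only figures from Table 3. It then illustrates, on a handful of invented annotations, how a total count, a unique count by string match, a count of normalised annotations and a count of unique identifiers can differ. The variable names, the example strings and the placeholder identifier `DB:0001` are assumptions made for illustration. They are not taken from the corpus or from the authors' pipeline.

```python
# Annotation totals per category, copied from Table 3.
dictionary_totals = {"Gene/Protein": 28_869, "Disease": 10_515, "Organism": 18_040}
gold_totals = {"Gene/Protein": 36_369, "Disease": 14_518, "Organism": 21_491}

# Per-category gain: gold-standard total minus dictionary-based total.
gains = {cat: gold_totals[cat] - dictionary_totals[cat] for cat in gold_totals}
largest_gain = max(gains, key=gains.get)

# Hypothetical annotations as (surface string, database identifier or None).
# These rows are invented for illustration only.
annotations = [
    ("p53", "DB:0001"),
    ("TP53", "DB:0001"),
    ("p53", "DB:0001"),
    ("novel protein X", None),  # no database entry, so not normalised
]

# "Total": every annotation occurrence counts.
total = len(annotations)

# "Unique": distinct surface strings (string match), as reported in the table.
unique_strings = len({text for text, _ in annotations})

# "Normalised to a DB entry": occurrences that carry an identifier.
normalised = sum(1 for _, db_id in annotations if db_id is not None)

# Distinct identifiers: the quantity the table does NOT report.
unique_ids = len({db_id for _, db_id in annotations if db_id is not None})

print(gains, largest_gain)
print(total, unique_strings, normalised, unique_ids)
```

In this toy example, two different strings map to the same placeholder identifier, so the unique string count (3) exceeds the unique identifier count (1). This is why the two measures should not be read interchangeably.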