Bioinformatics. 2021 Jul 2;37(21):3856–3864. doi: 10.1093/bioinformatics/btab474

Table 4. Statistics of the datasets used in the experiments

                 CDR Disease     CDR Chem        CT Condition      CT Intervention
Domain           Abstracts       Abstracts       Clinical trials   Clinical trials
Entity type      Disease         Chemicals       Conditions        Drugs
Terminology      MEDIC           CTD Chemicals   MeSH              DrugBank
Entity-level statistics
% numerals       0.11%           7.32%           7.69%             25.3%
% punctuation    1.21%           0.07%           14.28%            24.83%
Avg. length      14.88           11.27           17.92             21.68
Number of pre-processed entity mentions
Train set        4182            5203            -                 -
Dev set          4244            5347            100               100
Test set         4424            5385            719               975
Filtered test    1240 (28.02%)   826 (15.38%)    642 (78.4%)       846 (78.7%)

Note: ‘CT’ marks the two annotated sets of clinical trial fields (conditions and interventions).
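The entity-level rows can be computed for any list of mention strings. The table does not define these statistics, so the sketch below assumes that ‘% numerals’ and ‘% punctuation’ are the share of mentions containing at least one digit or ASCII punctuation character, and that ‘Avg. length’ is the mean mention length in characters. The function name `entity_stats` and the example mentions are hypothetical, and this is not the authors’ pre-processing code.

```python
import string


def entity_stats(mentions: list[str]) -> dict[str, float]:
    """Hypothetical entity-level statistics (definitions are assumptions).

    - pct_numerals: % of mentions with at least one digit
    - pct_punctuation: % of mentions with at least one ASCII punctuation char
    - avg_len: mean mention length in characters
    """
    if not mentions:
        raise ValueError("need at least one mention")
    n = len(mentions)
    # Count mentions (not characters) that contain a digit / punctuation mark.
    with_digit = sum(any(ch.isdigit() for ch in m) for m in mentions)
    with_punct = sum(any(ch in string.punctuation for ch in m) for m in mentions)
    avg_len = sum(len(m) for m in mentions) / n
    return {
        "pct_numerals": 100.0 * with_digit / n,
        "pct_punctuation": 100.0 * with_punct / n,
        "avg_len": avg_len,
    }


# Illustrative mentions only; not drawn from the CDR or CT datasets.
mentions = ["heart failure", "COVID-19", "type 2 diabetes", "aspirin"]
print(entity_stats(mentions))
```

Counting per mention rather than per character is a design choice; if the statistics were instead computed over characters, the percentages would be much smaller for the same data.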