2016 Oct 3;11(10):e0163480. doi: 10.1371/journal.pone.0163480

Table 2. Basic statistics of the two utilized datasets of the DDI corpus.

| | MEDLINE Test | MEDLINE Train | MEDLINE Total | DrugBank Test | DrugBank Train | DrugBank Total |
|---|---|---|---|---|---|---|
| Documents | 33 | 142 | 175 | 158 | 572 | 730 |
| Sentences | 326 | 1301 | 2308 | 973 | 5675 | 6648 |
| Drug Names | 426 | 1836 | 2308 | 2512 | 12,929 | 15,441 |
| True DDI candidates | 95 | 232 | 327 | 884 | 3788 | 4672 |
| False DDI candidates | 356 | 1555 | 1911 | 4381 | 22,217 | 26,598 |
| Candidates with clause connectors | 126 | 478 | 604 | 2067 | 9215 | 11,282 |
| Number of Tokens | 14,358 | 61,525 | 75,883 | 244,658 | 1,163,072 | 1,407,730 |
| DDI Candidates with negation | 43 | 316 | 359 | 1367 | 4558 | 5925 |
| Total number of DDI candidates | 482 | 2033 | 1787 | 5265 | 31,432 | 36,697 |