Author manuscript; available in PMC: 2015 Jun 1.
Published in final edited form as: Nat Commun. 2014 Dec 1;5:5676. doi: 10.1038/ncomms6676

Table 1. Disease and fragment statistics.

Analysis of M and T chemical-disease annotations; the ‘Total’ column refers to the union of both categories. Where applicable, median values are shown for count data and mean values for performance metrics. Point performance metrics are computed at the random forest classifier’s default cutoff of 0.5. Sliding the cutoff along the classifier’s output score yields different operating points in ROC space.

Statistic                                       M        T    Total
Diseases                                      934      835    1,176
Molecules per disease                          36       25       30
LC fragments                               23,135   28,325   37,809
HC fragments                                  910    1,107    1,550
LC fragments per disease                    204.5    196.5    200.5
HC fragments per disease                        5        6        6
Liable (M) and privileged (T) fragments       348      367      715
Diseases with ≥ 1 HC fragment                 385      409      794
AUC                                         0.613    0.641    0.627
Specificity                                 0.878    0.882    0.880
Sensitivity                                 0.265    0.292    0.278
Balanced accuracy                           0.571    0.588    0.579
Positive predictive value                   0.032    0.023    0.029
G-mean                                      0.463    0.488    0.475
F1-score                                    0.053    0.044    0.049
Diseases with AUC > 0.7                       184      216      400
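The point metrics in the caption follow standard confusion-matrix definitions, and the ROC curve is what one traces by sliding the cutoff. The sketch below is a minimal illustration, assuming binary labels (1 for a positive, annotated instance) and classifier scores in [0, 1], such as random forest class-probability outputs. The function names and toy data are hypothetical and do not reproduce the authors’ pipeline. Each metric is computed at a single cutoff, the ROC curve is built by re-evaluating at every distinct score, and the AUC is taken by the trapezoidal rule.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers (not the authors' code). Assumes binary labels (0/1)
# and that both classes are present in the data.


def point_metrics(scores: Sequence[float], labels: Sequence[int],
                  cutoff: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics at one cutoff; score >= cutoff is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (true positive rate)
    spec = tn / (tn + fp) if tn + fp else 0.0   # specificity (true negative rate)
    ppv = tp / (tp + fp) if tp + fp else 0.0    # positive predictive value
    f1 = 2 * ppv * sens / (ppv + sens) if ppv + sens else 0.0
    return {
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,
        "ppv": ppv,
        "g_mean": math.sqrt(sens * spec),
        "f1": f1,
    }


def roc_curve(scores: Sequence[float],
              labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Slide the cutoff over every distinct score, high to low; return (FPR, TPR) points."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        m = point_metrics(scores, labels, cutoff)
        points.append((1.0 - m["specificity"], m["sensitivity"]))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): 3 positives among 8 scored candidates.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 1, 0, 0, 0]
print(point_metrics(scores, labels))                       # metrics at the default 0.5 cutoff
print(round(auc_trapezoid(roc_curve(scores, labels)), 3))  # 0.8
```

If, as the caption suggests, the tabulated performance values are means over per-disease classifiers, a derived quantity such as the mean G-mean need not equal the G-mean recomputed from the mean sensitivity and specificity. For the M column, √(0.878 × 0.265) ≈ 0.482, compared with the reported 0.463.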