Table 2:
Drug repurposing prediction (DRP) performance of KGML-xDTD and the baseline models on the test set (described in the “Data split” section). The top panel shows the state-of-the-art (SOTA) baseline models, the middle panel shows variants of the KGML-xDTD model framework, and the bottom panel shows the KGML-xDTD model framework itself.
Model | Accuracy | Macro F1 score | MRR | Hit@1 | Hit@3 | Hit@5 |
---|---|---|---|---|---|---|
TransE | 0.708 | 0.708 | 0.301 (±0.005) | 0.134 (±0.007) | 0.327 (±0.009) | 0.482 (±0.007) |
TransR | 0.858 | 0.855 | 0.329 (±0.006) | 0.150 (±0.009) | 0.378 (±0.008) | 0.542 (±0.005) |
RotatE | 0.704 | 0.704 | 0.281 (±0.007) | 0.098 (±0.008) | 0.314 (±0.007) | 0.497 (±0.009) |
DistMult | 0.555 | 0.495 | 0.182 (±0.004) | 0.042 (±0.002) | 0.157 (±0.010) | 0.292 (±0.010) |
ComplEx | 0.624 | 0.460 | 0.138 (±0.004) | 0.026 (±0.004) | 0.106 (±0.007) | 0.205 (±0.008) |
ANALOGY | 0.594 | 0.465 | 0.188 (±0.004) | 0.044 (±0.004) | 0.165 (±0.009) | 0.301 (±0.008) |
SimplE | 0.599 | 0.472 | 0.167 (±0.006) | 0.036 (±0.006) | 0.140 (±0.008) | 0.259 (±0.011) |
GAT | 0.936 | 0.934 | 0.002 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
GraphSAGE-link | 0.919 | 0.915 | 0.002 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
GraphSAGE+logistic | 0.791 | 0.784 | 0.002 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
GraphSAGE+SVM | 0.807 | 0.793 | 0.002 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) | 0.000 (±0.000) |
KGML-xDTD w/o NAEs | 0.909 (0.898*) | 0.891 (0.892*) | 0.159 (±0.003) | 0.035 (±0.002) | 0.143 (±0.006) | 0.262 (±0.008) |
2-class KGML-xDTD | 0.929 | 0.925 | 0.278 (±0.003) | 0.183 (±0.006) | 0.321 (±0.003) | 0.389 (±0.006) |
KGML-xDTD (ours) | 0.935 (0.930*) | 0.923 (0.926*) | 0.382 (±0.004) | 0.238 (±0.007) | 0.425 (±0.006) | 0.543 (±0.006) |
Values marked with * in parentheses are adjusted results that exclude the “unknown” category for a fair comparison.
The ranking metrics (“MRR” and “Hit@K”) are reported as the mean with standard deviation over 10 independent sets of non-true-positive drug–disease candidates generated by the random drug–disease replacement method (i.e., for each true-positive drug–disease pair in the test set, we use 1,000 random drug–disease pairs as non-true-positive candidates to calculate its rank). See the “Drug repurposing prediction evaluation method” section for details.
The abbreviation “w/o NAEs” in the model name “KGML-xDTD w/o NAEs” means that node attribute embeddings are not used.
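
The ranking protocol described in the table notes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`rank_of_true_pair`, `ranking_metrics`, `evaluate_over_candidate_sets`) are hypothetical, and it assumes that a higher model score means a more likely treatment, that ties are resolved in favor of the true-positive pair, and that the reported spread is a population standard deviation (NumPy's default). For each true-positive pair, the rank is computed against its sampled non-true-positive candidates; MRR and Hit@K are then averaged over the independent candidate sets.

```python
import numpy as np

def rank_of_true_pair(true_score, candidate_scores):
    """1-based rank of a true-positive pair among its sampled candidates.

    Assumption: higher score = better; ties favor the true pair.
    """
    candidate_scores = np.asarray(candidate_scores, dtype=float)
    return int(1 + np.sum(candidate_scores > true_score))

def ranking_metrics(ranks, ks=(1, 3, 5)):
    """MRR and Hit@K for a collection of 1-based ranks."""
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        # Hit@K: fraction of true pairs ranked within the top K
        metrics[f"Hit@{k}"] = float(np.mean(ranks <= k))
    return metrics

def evaluate_over_candidate_sets(true_scores, candidate_score_sets, ks=(1, 3, 5)):
    """Mean and standard deviation of each metric over independent candidate sets.

    candidate_score_sets[s][i] holds the scores of the random candidates
    drawn for true pair i in candidate set s (e.g., 10 sets of 1,000 scores).
    """
    per_set = []
    for candidates in candidate_score_sets:
        ranks = [rank_of_true_pair(t, c) for t, c in zip(true_scores, candidates)]
        per_set.append(ranking_metrics(ranks, ks))
    return {
        name: (
            float(np.mean([m[name] for m in per_set])),
            float(np.std([m[name] for m in per_set])),  # population std (assumption)
        )
        for name in per_set[0]
    }
```

With 10 candidate sets of 1,000 scores per true-positive pair, `evaluate_over_candidate_sets` returns a `(mean, std)` tuple per metric, matching the “value (±std)” layout of the ranking columns.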