Table 1.
Model | Accuracy | Weighted precision | Weighted recall | Weighted F1-score | AUC macro | AUC weighted |
---|---|---|---|---|---|---|
GF [49] | 0.879 ± 0.008 | 0.852 ± 0.011 | 0.879 ± 0.008 | 0.863 ± 0.009 | 0.913 ± 0.007 | 0.944 ± 0.006 |
Deepwalk [43] | 0.894 ± 0.008 | 0.870 ± 0.010 | 0.894 ± 0.008 | 0.879 ± 0.009 | 0.926 ± 0.010 | 0.952 ± 0.007 |
LINE [44] | 0.742 ± 0.026 | 0.727 ± 0.030 | 0.742 ± 0.026 | 0.732 ± 0.029 | 0.874 ± 0.021 | 0.881 ± 0.023 |
Node2vec [45] | 0.902 ± 0.007 | 0.868 ± 0.007 | 0.902 ± 0.007 | 0.883 ± 0.007 | 0.896 ± 0.011 | 0.932 ± 0.010 |
SDNE [46] | 0.820 ± 0.015 | 0.791 ± 0.019 | 0.820 ± 0.015 | 0.799 ± 0.017 | 0.904 ± 0.012 | 0.930 ± 0.011 |
IMSP | 0.971 ± 0.005 | 0.972 ± 0.006 | 0.971 ± 0.005 | 0.971 ± 0.006 | 0.997 ± 0.001 | 0.996 ± 0.001 |
AUC, area under the receiver operating characteristic curve. This table reports six evaluation metrics for the link prediction performance of our model (IMSP) and five baseline models. We evaluated performance using 5-fold stratified cross-validation with shuffling enabled, which preserves the percentage of samples of each class (i.e., type of edge) in each fold. We designed a sampling strategy to ensure that the training subset in each cross-validation run forms a fully connected network. To balance the input data, we added negative (non-connected) edges to the positive (connected) edges already present in each fold. Some negative edges were randomly selected from known negative edges (i.e., true negatives), which consisted of spike-receptor interactions demonstrated to be nonexistent; the remainder were randomly selected from other non-connected node pairs, which we assumed did not exist. Negative edges were added to each fold until their number matched the number of positive edges. We repeated the 5-fold stratified cross-validation experiment for 30 runs, generating a new 5-fold split in each run. We then performed two-sample heteroscedastic t tests on the six overall performance metrics to test the significance of IMSP's improvement. Values are reported as mean ± SD.
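The evaluation protocol described in this note can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`stratified_folds`, `sample_negatives`, `welch_t`), the `(u, v, edge_type)` edge representation, and the `n_known` parameter are all assumptions made for illustration. The connectivity check on the training subset is omitted, and only the Welch t statistic and its degrees of freedom are computed (not the p value).

```python
import math
import random
from collections import defaultdict
from statistics import mean, variance


def stratified_folds(edges, k=5, seed=0):
    """Shuffle edges within each edge type, then deal them round-robin into k folds."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for edge in edges:
        by_type[edge[2]].append(edge)  # assumed layout: (u, v, edge_type)
    folds = [[] for _ in range(k)]
    for group in by_type.values():
        rng.shuffle(group)
        for i, edge in enumerate(group):
            folds[i % k].append(edge)
    return folds


def sample_negatives(n, known_negatives, candidate_pairs, n_known, rng):
    """Draw n negatives: up to n_known verified ones, the rest assumed negatives.

    candidate_pairs must hold at least (n - number of verified draws) pairs.
    """
    n_known = min(n_known, n, len(known_negatives))
    chosen = rng.sample(known_negatives, n_known)
    chosen += rng.sample(candidate_pairs, n - n_known)
    return chosen


def welch_t(a, b):
    """Two-sample heteroscedastic (Welch) t statistic and degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Under these assumptions, one run would call `stratified_folds` with a fresh `seed`, pad each fold with `sample_negatives(len(fold), ...)`, and score the model on each held-out fold. After 30 runs, `welch_t` would be applied to the per-run scores of IMSP and of a baseline for each metric.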