2021 Mar 29;2(5):100242. doi: 10.1016/j.patter.2021.100242

Table 1.

Link prediction: Overall performance evaluation and comparison

| Model | Accuracy | Weighted precision | Weighted recall | Weighted F1-score | AUC macro | AUC weighted |
| --- | --- | --- | --- | --- | --- | --- |
| GF [49] | 0.879 ± 0.008 | 0.852 ± 0.011 | 0.879 ± 0.008 | 0.863 ± 0.009 | 0.913 ± 0.007 | 0.944 ± 0.006 |
| Deepwalk [43] | 0.894 ± 0.008 | 0.870 ± 0.010 | 0.894 ± 0.008 | 0.879 ± 0.009 | 0.926 ± 0.010 | 0.952 ± 0.007 |
| LINE [44] | 0.742 ± 0.026 | 0.727 ± 0.030 | 0.742 ± 0.026 | 0.732 ± 0.029 | 0.874 ± 0.021 | 0.881 ± 0.023 |
| Node2vec [45] | 0.902 ± 0.007 | 0.868 ± 0.007 | 0.902 ± 0.007 | 0.883 ± 0.007 | 0.896 ± 0.011 | 0.932 ± 0.010 |
| SDNE [46] | 0.820 ± 0.015 | 0.791 ± 0.019 | 0.820 ± 0.015 | 0.799 ± 0.017 | 0.904 ± 0.012 | 0.930 ± 0.011 |
| IMSP | 0.971 ± 0.005 | 0.972 ± 0.006 | 0.971 ± 0.005 | 0.971 ± 0.006 | 0.997 ± 0.001 | 0.996 ± 0.001 |

AUC, area under the receiver-operating characteristic curve. This table reports six evaluation metrics for the link prediction performance of our model (IMSP) and five baseline models. We evaluated performance using 5-fold stratified cross-validation with shuffling enabled, which preserves the percentage of samples of each class (i.e., edge type) in each fold. We designed the sampling strategy so that the training subset in each cross-validation run forms a fully connected network. To balance the input data, we added negative (non-connected) edges to the positive (connected) edges already present in each fold. Some negative edges were randomly selected from known negative edges (i.e., true negatives), which consisted of spike-receptor interactions demonstrated to be nonexistent; the remainder were randomly selected from other non-connected node pairs, which we assumed did not exist. Negative edges were added to each fold until they matched the number of positive edges. We repeated this 5-fold stratified cross-validation experiment for 30 runs, generating a new 5-fold split in each run. We then performed two-sample heteroscedastic t tests on the six overall performance evaluation metrics to test the significance of the IMSP improvement. Finally, we report the mean ± SD for each metric.
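The evaluation protocol described in this footnote can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the function names (`stratified_kfold`, `sample_negatives`, `welch_t`), the `n_known` parameter (how many known negatives to draw, which the footnote does not specify), and the toy node and edge names are all hypothetical. The sketch also omits the constraint that each training subset forms a connected network, and it computes only the Welch t statistic and its degrees of freedom, not a p value.

```python
import math
import random
from collections import defaultdict
from itertools import combinations
from statistics import mean, variance


def stratified_kfold(labels, k=5, seed=0):
    """Split indices into k shuffled folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def sample_negatives(nodes, positive_edges, known_negatives,
                     n_needed, n_known, seed=0):
    """Draw up to n_known known negatives, then fill with random
    non-connected pairs (assumed negative) until n_needed is reached."""
    rng = random.Random(seed)
    positives = {frozenset(e) for e in positive_edges}
    n_from_known = min(n_known, len(known_negatives), n_needed)
    chosen = rng.sample(sorted(known_negatives), n_from_known)
    taken = {frozenset(e) for e in chosen}
    # Enumerating all pairs is only practical for small illustrative graphs.
    candidates = [pair for pair in combinations(nodes, 2)
                  if frozenset(pair) not in positives
                  and frozenset(pair) not in taken]
    chosen += rng.sample(candidates, n_needed - len(chosen))
    return chosen


def welch_t(a, b):
    """Heteroscedastic (Welch) two-sample t statistic and
    Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy usage: 30 runs, each with a fresh shuffled 5-fold stratified split.
edge_types = ["type_a"] * 10 + ["type_b"] * 5   # hypothetical edge classes
all_runs = [stratified_kfold(edge_types, k=5, seed=run) for run in range(30)]
```

In a real experiment, one metric value per run would be collected for each model, and `welch_t` would be applied to the two resulting samples (for example, 30 accuracy values for IMSP against 30 for a baseline). Library equivalents exist for the splitting and testing steps, such as scikit-learn's `StratifiedKFold` with `shuffle=True` and SciPy's `ttest_ind` with `equal_var=False`; whether the authors used them is not stated here.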