2021 Mar 29;2(5):100242. doi: 10.1016/j.patter.2021.100242

Table 1.

Link prediction: Overall performance evaluation and comparison

Model	Accuracy	Weighted precision	Weighted recall	Weighted F1-score	AUC macro	AUC weighted
GF⁴⁹	0.879 ± 0.008	0.852 ± 0.011	0.879 ± 0.008	0.863 ± 0.009	0.913 ± 0.007	0.944 ± 0.006
Deepwalk⁴³	0.894 ± 0.008	0.870 ± 0.010	0.894 ± 0.008	0.879 ± 0.009	0.926 ± 0.010	0.952 ± 0.007
LINE⁴⁴	0.742 ± 0.026	0.727 ± 0.030	0.742 ± 0.026	0.732 ± 0.029	0.874 ± 0.021	0.881 ± 0.023
Node2vec⁴⁵	0.902 ± 0.007	0.868 ± 0.007	0.902 ± 0.007	0.883 ± 0.007	0.896 ± 0.011	0.932 ± 0.010
SDNE⁴⁶	0.820 ± 0.015	0.791 ± 0.019	0.820 ± 0.015	0.799 ± 0.017	0.904 ± 0.012	0.930 ± 0.011
IMSP	0.971 ± 0.005	0.972 ± 0.006	0.971 ± 0.005	0.971 ± 0.006	0.997 ± 0.001	0.996 ± 0.001

AUC, area under the receiver-operating characteristic curve. This table reports six evaluation metrics for the link prediction performance of our model (IMSP) and five baseline models. We evaluated performance using 5-fold stratified cross-validation with shuffling enabled, which preserves the percentage of samples of each class (i.e., type of edge) in each fold. We designed a sampling strategy to ensure that the training subset in each cross-validation run can form a fully connected network. To balance the input data, we added negative (non-connected) edges to the positive (connected) edges already present in each fold. Some negative edges were randomly selected from known negative edges (i.e., true negatives), which consisted of spike-receptor interactions demonstrated to be nonexistent. The remaining negative edges were randomly selected from other non-connected node pairs, which we assumed to be nonexistent. Negative edges were added to each fold until their number matched the number of positive edges. We repeated this 5-fold stratified cross-validation experiment for 30 runs, generating a new 5-fold split in each run. We then performed two-sample heteroscedastic t tests on the six overall performance metrics to test the significance of the improvement achieved by IMSP. Lastly, we report the mean ± SD for each metric.
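The evaluation protocol described in this note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`stratified_folds`, `sample_negatives`, `welch_t`) and the `n_known` parameter (how many verified negatives to draw) are hypothetical, the fold assignment is a simple round-robin stand-in for a library stratified splitter, and the connectivity check on the training subset is omitted. The heteroscedastic two-sample t test is written here as Welch's t test, returning only the statistic and the Welch-Satterthwaite degrees of freedom.

```python
import random
from collections import defaultdict
from statistics import mean, variance


def stratified_folds(labels, k=5, seed=0):
    """Shuffle indices within each class, then deal them round-robin into k
    folds so every fold keeps roughly the same class (edge type) proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def sample_negatives(nodes, positives, known_negatives, n_total, n_known, seed=0):
    """Draw n_total negative edges: up to n_known from verified negatives,
    the rest from other unconnected node pairs (assumed to be negative).
    n_known is a hypothetical knob; the source does not give the split."""
    rng = random.Random(seed)
    positive_set = {frozenset(e) for e in positives}
    known = [e for e in known_negatives if frozenset(e) not in positive_set]
    chosen = rng.sample(known, min(n_known, len(known), n_total))
    # Exclude positives and all verified negatives from the "other pairs" pool.
    excluded = positive_set | {frozenset(e) for e in known}
    candidates = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if frozenset((u, v)) not in excluded
    ]  # O(n^2) enumeration: fine for a sketch, too slow for large graphs
    chosen += rng.sample(candidates, n_total - len(chosen))
    return chosen


def welch_t(a, b):
    """Welch's (unequal-variance) two-sample t statistic and degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In a repeated setup such as the 30 runs described above, one would call `stratified_folds` with a different `seed` per run, call `sample_negatives` for each fold with `n_total` equal to that fold's number of positive edges, and pass the per-run metric values of two models to `welch_t`.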