Table 9. Text embeddings on Citeseer-M10 for link prediction (micro-F1; values lie in [0, 1], and higher values indicate better results).
| % Train edges | 5% | 10% | 30% | 50% |
|---|---|---|---|---|
| BoW | 0.52 ± 0.01 | 0.52 ± 0.00 | 0.52 ± 0.01 | 0.52 ± 0.00 |
| TF-IDF | 0.52 ± 0.01 | 0.52 ± 0.01 | 0.53 ± 0.01 | 0.53 ± 0.00 |
| LDA | 0.69 ± 0.01 | 0.69 ± 0.01 | 0.70 ± 0.01 | 0.71 ± 0.01 |
| SBERT pretrained | **0.84 ± 0.00** | **0.85 ± 0.00** | **0.86 ± 0.01** | **0.86 ± 0.01** |
| Word2Vec pretrained | 0.53 ± 0.01 | 0.53 ± 0.01 | 0.54 ± 0.00 | 0.54 ± 0.01 |
| Word2Vec (d = 300) | 0.54 ± 0.00 | 0.54 ± 0.00 | 0.54 ± 0.00 | 0.54 ± 0.01 |
| Word2Vec (d = 64) | 0.54 ± 0.01 | 0.54 ± 0.01 | 0.54 ± 0.00 | 0.54 ± 0.01 |
| Doc2Vec pretrained | 0.55 ± 0.01 | 0.55 ± 0.00 | 0.55 ± 0.00 | 0.55 ± 0.00 |
| Doc2Vec (d = 300) | 0.77 ± 0.01 | 0.77 ± 0.00 | 0.78 ± 0.00 | 0.79 ± 0.00 |
| Doc2Vec (d = 64) | 0.77 ± 0.01 | 0.77 ± 0.01 | 0.77 ± 0.00 | 0.78 ± 0.00 |
| Sent2Vec pretrained | 0.54 ± 0.01 | 0.54 ± 0.01 | 0.55 ± 0.01 | 0.55 ± 0.01 |
| Sent2Vec (d = 600) | 0.54 ± 0.00 | 0.55 ± 0.01 | 0.55 ± 0.00 | 0.56 ± 0.01 |
| Sent2Vec (d = 64) | 0.53 ± 0.00 | 0.53 ± 0.01 | 0.54 ± 0.00 | 0.54 ± 0.01 |
| Ernie pretrained | **0.84 ± 0.01** | **0.84 ± 0.01** | **0.85 ± 0.01** | **0.85 ± 0.01** |
Note: the best values with respect to confidence intervals are highlighted in bold.
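The evaluation protocol behind Table 9 can be sketched as follows: each candidate edge is represented by combining the text embeddings of its two endpoint nodes, a binary classifier is trained on a fraction of the edges (the "% train edges" axis), and micro-F1 is reported on the held-out pairs. This is a minimal, hedged sketch with random placeholder embeddings and labels; the Hadamard edge operator, the logistic-regression classifier, and all sizes are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Placeholder document embeddings (e.g. SBERT vectors for paper texts).
n_nodes, dim = 200, 32
emb = rng.normal(size=(n_nodes, dim))

# Candidate node pairs: positive (edge) and negative (non-edge) examples.
# Here both pairs and labels are random placeholders.
pairs = rng.integers(0, n_nodes, size=(1000, 2))
labels = rng.integers(0, 2, size=1000)

# Represent each edge by the Hadamard product of its endpoint embeddings
# (one common choice of edge operator; an assumption here).
feats = emb[pairs[:, 0]] * emb[pairs[:, 1]]

# Train on a fraction of the edges, mirroring the "% train edges" axis.
n_train = int(0.5 * len(labels))
clf = LogisticRegression(max_iter=1000).fit(feats[:n_train], labels[:n_train])
pred = clf.predict(feats[n_train:])

# Micro-F1 on the held-out pairs, the metric reported in Table 9.
score = f1_score(labels[n_train:], pred, average="micro")
print(f"micro-F1: {score:.2f}")
```

With random labels the score hovers near chance; with real graph edges and informative embeddings it rises toward the values in the table.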