Table 3. Text methods on DBLP for node classification (micro-F1; the metric lies in [0, 1], and higher values indicate better results).
Method / % labels | 5% | 11% | 30% | 50% |
---|---|---|---|---|
BoW | 0.75 ± 0.00 | 0.77 ± 0.00 | 0.79 ± 0.00 | 0.80 ± 0.00 |
TF-IDF | 0.74 ± 0.01 | 0.76 ± 0.01 | 0.79 ± 0.01 | 0.80 ± 0.00 |
LDA | 0.54 ± 0.00 | 0.55 ± 0.00 | 0.55 ± 0.00 | 0.56 ± 0.00 |
SBERT pretrained | 0.69 ± 0.00 | 0.72 ± 0.00 | 0.75 ± 0.01 | 0.75 ± 0.01 |
Word2Vec pretrained | 0.72 ± 0.01 | 0.73 ± 0.01 | 0.74 ± 0.00 | 0.74 ± 0.01 |
Word2Vec (d = 300) | 0.76 ± 0.00 | 0.76 ± 0.00 | 0.77 ± 0.00 | 0.77 ± 0.01 |
Word2Vec (d = 64) | 0.76 ± 0.01 | 0.76 ± 0.00 | 0.76 ± 0.00 | 0.77 ± 0.00 |
Doc2Vec pretrained | 0.73 ± 0.00 | 0.75 ± 0.00 | 0.76 ± 0.00 | 0.76 ± 0.00 |
Doc2Vec (d = 300) | 0.55 ± 0.01 | 0.56 ± 0.00 | 0.57 ± 0.00 | 0.58 ± 0.00 |
Doc2Vec (d = 64) | 0.54 ± 0.01 | 0.54 ± 0.00 | 0.55 ± 0.00 | 0.55 ± 0.00 |
Sent2Vec pretrained | 0.73 ± 0.00 | 0.75 ± 0.00 | 0.77 ± 0.01 | 0.77 ± 0.01 |
Sent2Vec (d = 600) | 0.77 ± 0.00 | 0.78 ± 0.00 | 0.79 ± 0.00 | 0.79 ± 0.01 |
Sent2Vec (d = 64) | 0.77 ± 0.01 | 0.78 ± 0.00 | 0.78 ± 0.00 | 0.78 ± 0.00 |
Ernie pretrained | 0.70 ± 0.01 | 0.71 ± 0.00 | 0.71 ± 0.00 | 0.73 ± 0.00 |
Note: the best values with respect to confidence intervals are highlighted in bold.
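As a minimal sketch of how a single cell of this table could be obtained, the snippet below trains a classifier on TF-IDF text features using a given fraction of labeled nodes and reports micro-F1 on the remaining nodes. It assumes scikit-learn and that the DBLP node texts and labels are already loaded; the function name `evaluate_tfidf` and its parameters are illustrative, not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): TF-IDF node features +
# logistic regression, evaluated with micro-F1 at a given label fraction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def evaluate_tfidf(texts, labels, label_fraction=0.05, seed=0):
    """Train on `label_fraction` of the nodes, test on the rest.

    `texts`  : list of node documents (e.g. paper abstracts)
    `labels` : list of node class labels
    Returns the micro-averaged F1 score on the held-out nodes.
    """
    # Turn raw node text into sparse TF-IDF feature vectors.
    features = TfidfVectorizer(max_features=10_000).fit_transform(texts)

    # Stratified split that keeps only `label_fraction` of nodes for training,
    # mirroring the 5% / 11% / 30% / 50% label budgets in the table.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels,
        train_size=label_fraction,
        stratify=labels,
        random_state=seed,
    )

    clf = LogisticRegression(max_iter=1000).fit(x_train, y_train)
    return f1_score(y_test, clf.predict(x_test), average="micro")
```

Repeating such an evaluation over several random seeds and averaging would yield mean and deviation values comparable in form (though not necessarily in value) to the entries reported above.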