Bioinformatics. 2024 Aug 17;40(8):btae490. doi: 10.1093/bioinformatics/btae490

Table 1. Metabolite annotation evaluation on [M+H]+ precursor.ᵃ

Lower is better for average rank; higher is better for Rank@1, Rank@3, Rank@10, and cosine similarity.

A. Training on NIST-20; test candidates from PubChem

| Model | Average rank | Rank@1 | Rank@3 | Rank@10 | Cosine similarity |
|---|---|---|---|---|---|
| MLP | 279.964 | 0.197 | 0.303 | 0.453 | 0.735 |
| GNN | 340.306 | 0.056 | 0.137 | 0.291 | 0.699 |
| MLP-PD | 299.007 | 0.205 | 0.318 | 0.463 | 0.733 |
| GNN-PD | 320.726 | 0.075 | 0.170 | 0.333 | 0.694 |
| ESP-SL | 275.182 | 0.190 | 0.312 | 0.473 | 0.738 |
| ESP-RU | 214.226 | 0.167 | 0.301 | 0.482 | 0.750 |
| ESP | 213.597 | 0.169 | 0.301 | 0.488 | 0.750 |

B. Training on NIST-20; test candidates from COCONUT

| Model | Average rank | Rank@1 | Rank@3 | Rank@10 | Cosine similarity |
|---|---|---|---|---|---|
| MLP-PD | 6.653 | 0.531 | 0.740 | 0.877 | 0.731 |
| GNN-PD | 6.339 | 0.488 | 0.693 | 0.859 | 0.689 |
| ESP | 5.659 | 0.551 | 0.763 | 0.900 | 0.746 |

C. Training on NIST-20; test on most similar candidates from PubChem

| Model | Average rank | Rank@1 | Rank@3 | Rank@10 | Cosine similarity |
|---|---|---|---|---|---|
| MLP-PD | 14.333 | 0.283 | 0.435 | 0.646 | 0.733 |
| GNN-PD | 17.823 | 0.120 | 0.256 | 0.521 | 0.694 |
| ESP | 13.491 | 0.225 | 0.397 | 0.655 | 0.750 |

D. Training on NPLIB1; test candidates from PubChem

| Model | Average rank | Rank@1 | Rank@3 | Rank@10 | Cosine similarity |
|---|---|---|---|---|---|
| MLP-PD | 313.927 | 0.237 | 0.389 | 0.519 | 0.633 |
| GNN-PD | 225.928 | 0.121 | 0.270 | 0.459 | 0.618 |
| ESP | 291.104 | 0.178 | 0.331 | 0.479 | 0.624 |
ᵃ Performance is presented by the data on which the models are trained and by the source of the test candidates. The metrics are average rank, rank@1, rank@3, rank@10, and cosine similarity. Average rank summarizes overall performance on the relevant test set. Rank@k is the proportion of test spectra for which the correct candidate is ranked within the top k. Cosine similarity is the average cosine similarity of the predicted test spectra against the ground-truth spectra. Models are trained and evaluated on NIST-20 in all cases except (D), where they are trained on the NPLIB1 dataset. Test candidates comprise the full candidate set from PubChem in (A), the full candidate set from COCONUT in (B), the 100 most similar candidates from PubChem in (C), and the full candidate set from PubChem in (D). Bold entries indicate the best rank within each subtable.
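As a rough illustration of how these evaluation metrics can be computed (this is not the authors' code; the function names, scoring scheme, and toy data below are assumptions), the sketch ranks each correct candidate by a score such as the cosine similarity between a candidate's predicted spectrum and the query spectrum, then aggregates average rank and rank@k over a test set.

```python
import numpy as np

def cosine_similarity(pred, truth):
    # Cosine similarity between a predicted spectrum and a ground-truth
    # spectrum, both represented as fixed-length intensity vectors.
    denom = np.linalg.norm(pred) * np.linalg.norm(truth)
    return float(np.dot(pred, truth) / denom) if denom > 0 else 0.0

def rank_of_true_candidate(candidate_scores, true_index):
    # 1-based rank of the correct candidate within its candidate set,
    # where a higher score (e.g. spectral similarity) is better.
    scores = np.asarray(candidate_scores, dtype=float)
    return int(1 + np.sum(scores > scores[true_index]))

def annotation_metrics(ranks, k_values=(1, 3, 10)):
    # ranks: one 1-based rank of the correct candidate per test spectrum.
    ranks = np.asarray(ranks, dtype=float)
    metrics = {"average_rank": float(ranks.mean())}
    for k in k_values:
        # Fraction of test spectra whose correct candidate is in the top k.
        metrics[f"rank@{k}"] = float((ranks <= k).mean())
    return metrics

# Toy usage: three test spectra whose correct candidates rank 1, 4, and 12.
print(annotation_metrics([1, 4, 12]))
# {'average_rank': 5.67, 'rank@1': 0.33, 'rank@3': 0.33, 'rank@10': 0.67}
```

Under this reading, rank@1 in subtable (B), for example, means that ESP places the correct COCONUT candidate first for about 55% of test spectra.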