2024 Aug 17;40(8):btae490. doi: 10.1093/bioinformatics/btae490

Table 1.

Metabolite annotation evaluation on [M+H]+ precursors.^a

Model	Average rank (lower is better)	Rank@1 (higher is better)	Rank@3 (higher is better)	Rank@10 (higher is better)	Cosine similarity (higher is better)
A. Training on NIST-20; test candidates from PubChem
MLP	279.964	0.197	0.303	0.453	0.735
GNN	340.306	0.056	0.137	0.291	0.699
MLP-PD	299.007	0.205	0.318	0.463	0.733
GNN-PD	320.726	0.075	0.170	0.333	0.694
ESP-SL	275.182	0.190	0.312	0.473	0.738
ESP-RU	214.226	0.167	0.301	0.482	0.750
ESP	213.597	0.169	0.301	0.488	0.750
B. Training on NIST-20; test candidates from COCONUT
MLP-PD	6.653	0.531	0.740	0.877	0.731
GNN-PD	6.339	0.488	0.693	0.859	0.689
ESP	5.659	0.551	0.763	0.900	0.746
C. Training on NIST-20; test on the 100 most similar candidates from PubChem
MLP-PD	14.333	0.283	0.435	0.646	0.733
GNN-PD	17.823	0.120	0.256	0.521	0.694
ESP	13.491	0.225	0.397	0.655	0.750
D. Training on NPLIB1; test candidates from PubChem
MLP-PD	313.927	0.237	0.389	0.519	0.633
GNN-PD	225.928	0.121	0.270	0.459	0.618
ESP	291.104	0.178	0.331	0.479	0.624

^a Results are grouped by the dataset on which the models were trained and by the source of the test candidates. The metrics are average rank, rank@1, rank@3, rank@10, and cosine similarity. Average rank is the mean rank of the correct molecule among its candidates, averaged over the test set. Rank@k is the fraction of test spectra for which the correct molecule appears among the top k candidates. Cosine similarity is the average cosine similarity between the predicted test spectra and the ground-truth spectra. Models are trained and evaluated on NIST-20 in all cases except (D), where they are trained on the NPLIB1 dataset. Test candidates are the full candidate set from PubChem in (A) and (D), the full candidate set from COCONUT in (B), and the 100 most similar candidates from PubChem in (C). Bold entries indicate the best rank within each subtable.
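To make the metric definitions above concrete, here is a minimal sketch of how average rank and rank@k could be computed. It assumes (as an illustration, not the authors' code) that spectra are already binned into fixed-length intensity vectors, that each query's candidates are ranked by cosine similarity between the query spectrum and each candidate's predicted spectrum, and that ties are resolved in favor of the correct molecule. The function names `cosine_similarity`, `rank_of_true`, and `summarize` are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra (equal-length intensity vectors)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # An all-zero spectrum has no direction; treat its similarity as 0.
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_of_true(query, candidate_spectra, true_index):
    """1-based rank of the correct candidate when candidates are sorted by
    similarity to the query spectrum (hypothetical scoring scheme).

    Assumption: ties favor the correct molecule, so the rank is
    1 + the number of candidates scoring strictly higher.
    """
    scores = np.array([cosine_similarity(query, c) for c in candidate_spectra])
    return 1 + int(np.sum(scores > scores[true_index]))


def summarize(ranks, ks=(1, 3, 10)):
    """Average rank and rank@k (fraction of queries ranked within the top k)."""
    ranks = np.asarray(ranks)
    out = {"average_rank": float(ranks.mean())}
    for k in ks:
        out[f"rank@{k}"] = float((ranks <= k).mean())
    return out


if __name__ == "__main__":
    # Toy ranks for four hypothetical test spectra.
    print(summarize([1, 2, 5, 12]))
    # {'average_rank': 5.0, 'rank@1': 0.25, 'rank@3': 0.5, 'rank@10': 0.75}
```

Because average rank is a mean, a few queries with very large candidate sets (as with full PubChem candidate lists) can dominate it. This may explain why the average rank in (A) and (D) is in the hundreds while rank@k values remain comparable across models.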