Table 1.
Model | Average rank (lower is better) | Rank@1 (higher is better) | Rank@3 (higher is better) | Rank@10 (higher is better) | Cosine similarity (higher is better) |
---|---|---|---|---|---|
A. Training on NIST-20; test candidates from PubChem | |||||
MLP | 279.964 | 0.197 | 0.303 | 0.453 | 0.735 |
GNN | 340.306 | 0.056 | 0.137 | 0.291 | 0.699 |
MLP-PD | 299.007 | 0.205 | 0.318 | 0.463 | 0.733 |
GNN-PD | 320.726 | 0.075 | 0.170 | 0.333 | 0.694 |
ESP-SL | 275.182 | 0.190 | 0.312 | 0.473 | 0.738 |
ESP-RU | 214.226 | 0.167 | 0.301 | 0.482 | 0.750 |
ESP | 213.597 | 0.169 | 0.301 | 0.488 | 0.750 |
B. Training on NIST-20; test candidates from COCONUT | |||||
MLP-PD | 6.653 | 0.531 | 0.740 | 0.877 | 0.731 |
GNN-PD | 6.339 | 0.488 | 0.693 | 0.859 | 0.689 |
ESP | 5.659 | 0.551 | 0.763 | 0.900 | 0.746 |
C. Training on NIST-20; test on most similar candidates from PubChem | |||||
MLP-PD | 14.333 | 0.283 | 0.435 | 0.646 | 0.733 |
GNN-PD | 17.823 | 0.120 | 0.256 | 0.521 | 0.694 |
ESP | 13.491 | 0.225 | 0.397 | 0.655 | 0.750 |
D. Training on NPLIB1; test candidates from PubChem | |||||
MLP-PD | 313.927 | 0.237 | 0.389 | 0.519 | 0.633 |
GNN-PD | 225.928 | 0.121 | 0.270 | 0.459 | 0.618 |
ESP | 291.104 | 0.178 | 0.331 | 0.479 | 0.624 |
Performance is grouped by the dataset on which the models are trained and by the source of the test candidates. The metrics are average rank, rank@1, rank@3, rank@10, and cosine similarity. Average rank summarizes overall ranking performance on the relevant test set. Rank@k is the fraction of correct identifications when the top k candidates are considered. Cosine similarity is the average cosine similarity between the predicted test spectra and the ground-truth spectra. Models are trained and evaluated on NIST-20 in all cases except (D), where they are trained on the NPLIB1 dataset. Test candidates are the full candidate set from PubChem in (A), the full candidate set from COCONUT in (B), the 100 most similar candidates from PubChem in (C), and the full candidate set from PubChem in (D). Bold entries mark the best rank within each subtable.
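
The metrics described in the caption can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes that each test query yields a 1-based rank for the correct candidate, that candidates are ordered by a score where higher is better, and that spectra are represented as equal-length intensity vectors. All function names (`rank_of_correct`, `average_rank`, `rank_at_k`, `mean_cosine_similarity`) are hypothetical and chosen for illustration only.

```python
import math


def rank_of_correct(scores, correct_index):
    """1-based rank of the correct candidate (assumes higher score = better).

    Ties are resolved optimistically: only strictly higher scores count.
    """
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s > target)


def average_rank(ranks):
    """Mean rank of the correct candidate over all test queries (lower is better)."""
    return sum(ranks) / len(ranks)


def rank_at_k(ranks, k):
    """Fraction of queries whose correct candidate is within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty spectra
    return dot / (norm_a * norm_b)


def mean_cosine_similarity(predicted, reference):
    """Average cosine similarity over paired predicted/ground-truth spectra."""
    sims = [cosine_similarity(p, r) for p, r in zip(predicted, reference)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy ranks for four test queries (illustrative values, not from Table 1).
    ranks = [1, 4, 12, 2]
    print(average_rank(ranks))   # 4.75
    print(rank_at_k(ranks, 1))   # 0.25
    print(rank_at_k(ranks, 3))   # 0.5
    print(rank_at_k(ranks, 10))  # 0.75
```

Keeping the per-query ranks as the intermediate result means average rank and every rank@k are derived from the same list, so the columns of a table like this one stay mutually consistent.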