Table 2: Average improvement over randomly sampled FDA drugs, grouped by the LLM-estimated relevance of the retrieved BioAssays. Values are the increase in docking score, in kcal/mol.

| Model | High (39%) Avg. | High (39%) Med. | Medium (42%) Avg. | Medium (42%) Med. | Low (7%) Avg. | Low (7%) Med. | No (12%) Avg. | No (12%) Med. | Overall Avg. | Overall Med. |
|---|---|---|---|---|---|---|---|---|---|---|
| TargetDiff | 0.838 | 0.802 | 0.701 | 0.777 | 0.669 | 0.696 | 0.771 | 1.052 | 0.761 | 0.796 |
| Gemma-3-27B | 0.196 | 0.170 | -0.050 | -0.145 | 0.197 | 0.293 | -0.390 | -0.535 | 0.023 | 0.034 |
| GPT 4o | 0.331 | 0.228 | 0.116 | 0.118 | 0.396 | 0.302 | -0.289 | -0.122 | 0.171 | 0.130 |
| DeepSeekV3 | 0.379 | 0.311 | 0.079 | 0.070 | 0.429 | 0.300 | -0.067 | -0.098 | 0.203 | 0.159 |
| Assay2Mol (Gemma-3-27B) | 1.277 | 1.124 | 1.037 | 1.121 | 0.535 | 0.770 | 0.554 | 0.606 | 1.037 | 1.069 |
| Assay2Mol (GPT 4o) | 1.061 | 1.046 | 0.732 | 0.741 | 0.223 | 0.517 | 0.269 | 0.151 | 0.769 | 0.777 |
| Assay2Mol (DeepSeekV3) | 1.042 | 0.634 | 0.842 | 0.921 | 0.599 | 0.579 | 0.267 | 0.273 | 0.834 | 0.849 |