Table 4:
Manual review of the LLM BioAssay relevance assessments by a single expert computational chemist (author S.S.E.). Ten BioAssays were retrieved for each target except 2RHY and 2PQW, for which only 6 BioAssays each remained after filtering.
| Relevance | Target | GPT-4o errors | GPT-4o accuracy (%) | DeepSeek-V3 errors | DeepSeek-V3 accuracy (%) | Accuracy difference, GPT-4o minus DeepSeek-V3 (percentage points) |
|---|---|---|---|---|---|---|
| high | 5W2G | 1 | 90 | 1 | 90 | 0 |
| high | 3G51 | 4 | 60 | 10 | 0 | 60 |
| high | 1COY | 3 | 70 | 9 | 10 | 60 |
| high | 2JJG | 5 | 50 | 9 | 10 | 40 |
| high | 2RHY | 1 | 83 | 3 | 50 | 33 |
| high | 2PQW | 0 | 100 | 2 | 67 | 33 |
| high | 4G3D | 5 | 50 | 5 | 50 | 0 |
| medium | 4AAW | 4 | 60 | 6 | 40 | 20 |
| medium | 4YHJ | 0 | 100 | 1 | 90 | 10 |
| medium | 14GS | 1 | 90 | 9 | 10 | 80 |
| medium | 4RN0 | 1 | 90 | 4 | 60 | 30 |
| medium | 1FMC | 0 | 100 | 10 | 0 | 100 |
| medium | 3DAF | 0 | 100 | 10 | 0 | 100 |
| medium | 1A2G | 0 | 100 | 10 | 0 | 100 |
| medium | 3DZH | 0 | 100 | 8 | 20 | 80 |
| medium | 5BUR | 0 | 100 | 10 | 0 | 100 |
| low | 1R1H | 0 | 100 | 7 | 30 | 70 |
| low | 5B08 | 0 | 100 | 3 | 70 | 30 |
| low | 5I0B | 1 | 90 | 4 | 60 | 30 |
| low | 3KC1 | 0 | 100 | 4 | 60 | 40 |
| low | 1D7J | 1 | 90 | 4 | 60 | 30 |
| no | 2Z3H | 0 | 100 | 0 | 100 | 0 |
| no | 2V3R | 0 | 100 | 0 | 100 | 0 |
| no | 3B6H | 8 | 20 | 10 | 0 | 20 |
| no | 4P6P | 0 | 100 | 0 | 100 | 0 |
| total | 25 | 35 | 86 | 139 | 43 | 43 |
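
The accuracy columns appear to follow directly from the error counts and the number of BioAssays reviewed per target. The sketch below reproduces a few entries under the assumption that accuracy is the share of reviewed BioAssays without an error, rounded to the nearest percent. The function name `accuracy_pct` and this exact formula are illustrative assumptions, not taken from the original work.

```python
def accuracy_pct(errors: int, n_assays: int = 10) -> int:
    """Percent of reviewed BioAssays judged correctly (assumed formula)."""
    if not 0 <= errors <= n_assays:
        raise ValueError("errors must lie between 0 and n_assays")
    return round(100 * (n_assays - errors) / n_assays)


# 23 targets with 10 BioAssays each, plus 2RHY and 2PQW with 6 each.
total_assays = 23 * 10 + 2 * 6

# Row 2RHY: 1 and 3 errors out of 6 BioAssays.
gpt_2rhy = accuracy_pct(1, n_assays=6)
deepseek_2rhy = accuracy_pct(3, n_assays=6)

# Total row: 35 and 139 errors over all reviewed BioAssays.
gpt_total = accuracy_pct(35, n_assays=total_assays)
deepseek_total = accuracy_pct(139, n_assays=total_assays)

print(total_assays, gpt_2rhy, deepseek_2rhy, gpt_total, deepseek_total)
```

Under this assumption the 2RHY row and the total row of the table are recovered, including the 43 percentage point overall difference.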