Ternary classification accuracies of fine-tuned GPT-3 and trained GNN models for “unknown” molecules.

| Conjugated fragment | Number of molecules | HOMO accuracy (GPT-3) | HOMO accuracy (GNN) | LUMO accuracy (GPT-3) | LUMO accuracy (GNN) |
|---|---|---|---|---|---|
| Naphthalene (1) | 475 | 0.94 | 0.95 | 0.88 | 0.91 |
| Anthracene (2) | 577 | 0.99 | 1.00 | 0.93 | 0.97 |
| Tetracene (3) | 72 | 0.96 | 1.00 | 0.90 | 0.99 |
| Pyrene (4) | 237 | 0.98 | 1.00 | 0.97 | 0.99 |
| Perylene (5) | 41 | 0.98 | 1.00 | 0.98 | 0.95 |
| (1) + (2) + (3) + (4) + (5)<sup>a</sup> | 1402 | 0.97 | 0.98 | 0.93 | 0.95 |
| p-Benzoquinone (6) | 295 | 0.83 | 0.91 | 0.87 | 0.87 |
| 1,4-Naphthoquinone (7) | 282 | 0.82 | 0.91 | 0.94 | 0.96 |
| 9,10-Anthraquinone (8) | 186 | 0.86 | 0.91 | 0.99 | 1.00 |
| (1) + (2) + (3) + (4) + (5) + (6) + (7) + (8)<sup>b</sup> | 2165 | 0.88 | 0.91 | 0.95 | 0.96 |
| 1,8-Naphthalimide (9) | 85 | 0.86 | 0.93 | 1.00 | 1.00 |
| Naphthalenetetracarboxylic diimide (10) | 88 | 0.86 | 0.88 | 0.95 | 0.98 |
| Perylenetetracarboxylic diimide (11) | 76 | 0.85 | 0.89 | 0.97 | 0.97 |
| (1) + (2) + (3) + (4) + (5) + (6) + (7) + (8) + (9) + (10) + (11)<sup>c</sup> | 3177 | 0.88 | 0.88 | 0.97 | 0.97 |

<sup>a</sup> All five families of molecules were excluded from model training. The HOMO/LUMO prediction accuracies reported in this row were measured on these five families.

<sup>b</sup> All eight families of molecules were excluded from model training. The HOMO/LUMO prediction accuracies reported in this row were measured on families 6–8.

<sup>c</sup> All 11 families of molecules were excluded from model training. The HOMO/LUMO prediction accuracies reported in this row were measured on families 9–11.
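
The footnotes describe a leave-family-out protocol: a set of families is withheld from training, and accuracy is then measured on some or all of the withheld families. A minimal Python sketch of that bookkeeping follows. It is an illustration under stated assumptions, not the authors' code: the `Molecule` record, its field names, the integer class encoding, and the `predict` callable are all hypothetical.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Molecule(NamedTuple):
    # Hypothetical record layout; the field names are assumptions.
    smiles: str  # placeholder molecule representation
    family: int  # conjugated-fragment family id (1-11 in the table)
    label: int   # ternary class, encoded here as 0, 1 or 2 (assumed encoding)


def split_by_family(
    molecules: Iterable[Molecule], held_out: Iterable[int]
) -> Tuple[List[Molecule], List[Molecule]]:
    """Return (train, test), where test holds every molecule of a held-out family."""
    held = set(held_out)
    train, test = [], []
    for m in molecules:
        (test if m.family in held else train).append(m)
    return train, test


def accuracy(
    predict: Callable[[str], int],
    test: Iterable[Molecule],
    score_families: Iterable[int],
) -> float:
    """Fraction of correct class predictions, restricted to score_families."""
    wanted = set(score_families)
    scored = [m for m in test if m.family in wanted]
    if not scored:
        raise ValueError("no test molecules in the requested families")
    correct = sum(predict(m.smiles) == m.label for m in scored)
    return correct / len(scored)


# Toy data with placeholder strings instead of real SMILES; family 0 stands
# for any molecule outside the eleven listed families (an assumption).
toy = [
    Molecule("A", 1, 0),
    Molecule("B", 6, 1),
    Molecule("C", 7, 2),
    Molecule("D", 0, 0),
]

# Mirrors footnote b: withhold families 1-8, score on families 6-8 only.
train, test = split_by_family(toy, held_out=range(1, 9))
toy_accuracy = accuracy(lambda smiles: 1, test, score_families=range(6, 9))
print(toy_accuracy)
```

Keeping the withheld set and the scored set as separate arguments reflects the distinction the footnotes draw for the second and third pooled rows, where more families are excluded from training than are scored.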