LIFT for molecular atomization energies on the QM9-G4MP2 dataset. Metrics are for models tuned on 90% of the QM9-G4MP2 dataset (117 232 molecules), with the remaining 10% (13 026 molecules) used as a holdout test set. GPTChem refers to the approach reported by Jablonka et al.,32 and GPT-2-LoRA to parameter-efficient fine-tuning (PEFT) of the GPT-2 model using LoRA. The results indicate that the LIFT framework can also be used to build predictive models for atomization energies, which can reach chemical accuracy using a Δ-ML scheme. Baseline performance (mean absolute error reported by Ward et al.45): 0.0223 eV for FCHL-based prediction of G4(MP2) atomization energies, and 0.0045 eV (SchNet) and 0.0052 eV (FCHL) for the Δ-ML scheme.
Mol. repr. & framework | G4(MP2) atomization energy: R² | G4(MP2) atomization energy: median absolute deviation (MAD)/eV | (G4(MP2)-B3LYP) atomization energy: R² | (G4(MP2)-B3LYP) atomization energy: MAD/eV |
---|---|---|---|---|
SMILES: GPTChem | 0.984 | 0.99 | 0.976 | 0.03 |
SELFIES: GPTChem | 0.961 | 1.18 | 0.973 | 0.03 |
SMILES: GPT-2-LoRA | 0.931 | 2.03 | 0.910 | 0.06 |
SELFIES: GPT-2-LoRA | 0.959 | 1.93 | 0.915 | 0.06 |
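
The Δ-ML scheme referred to in the caption can be illustrated with a minimal sketch: the model is trained on the difference between the high-level G4(MP2) and the low-level B3LYP atomization energy, and the final prediction is recovered by adding the predicted difference back to the B3LYP value. The function names and the toy energies below are hypothetical and are not taken from the paper; the metric definitions (median and mean of the absolute errors, and the coefficient of determination R²) are assumptions about how the reported numbers could be computed.

```python
from statistics import mean, median


def delta_targets(e_high, e_low):
    """Delta-ML training targets: high-level minus low-level energy (eV)."""
    return [h - l for h, l in zip(e_high, e_low)]


def reconstruct(e_low, delta_pred):
    """Recover high-level estimates by adding predicted deltas to the baseline."""
    return [l + d for l, d in zip(e_low, delta_pred)]


def abs_errors(y_true, y_pred):
    """Element-wise absolute errors between reference and prediction."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values in eV, invented purely for illustration (not QM9-G4MP2 data).
e_b3lyp = [-10.0, -20.0, -30.0, -40.0]
e_g4mp2 = [-10.2, -20.1, -30.4, -40.3]

delta_true = delta_targets(e_g4mp2, e_b3lyp)
delta_pred = [-0.21, -0.12, -0.37, -0.30]  # stand-in for a model's output

e_pred = reconstruct(e_b3lyp, delta_pred)
errors = abs_errors(e_g4mp2, e_pred)

print("median abs. error / eV:", median(errors))
print("mean abs. error / eV:", mean(errors))
print("R2 on the delta target:", r2_score(delta_true, delta_pred))
```

Because the B3LYP baseline cancels, the absolute error of the reconstructed G4(MP2) energy equals the absolute error of the predicted difference. Chemical accuracy is commonly taken as 1 kcal/mol, which is about 0.043 eV.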