2023 Aug 8;2(5):1233–1250. doi: 10.1039/d3dd00113j

LIFT for molecular atomization energies on the QM9-G4MP2 dataset. Metrics for models tuned on 90% of the QM9-G4MP2 dataset (117 232 molecules), with the remaining 10% (13 026 molecules) used as a holdout test set. GPTChem refers to the approach reported by Jablonka et al.,32 and GPT-2-LoRA to parameter-efficient fine-tuning (PEFT) of the GPT-2 model using LoRA. The results indicate that the LIFT framework can also be used to build predictive models for atomization energies, which can reach chemical accuracy using a Δ-ML scheme. Baseline performance (mean absolute error reported by Ward et al.45): 0.0223 eV for FCHL-based prediction of G4(MP2) atomization energies, and 0.0045 eV (SchNet) and 0.0052 eV (FCHL) for the Δ-ML scheme.

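| Mol. repr. & framework | G4(MP2) atomization energy: R² | G4(MP2) atomization energy: median absolute deviation (MAD)/eV | (G4(MP2)-B3LYP) atomization energy: R² | (G4(MP2)-B3LYP) atomization energy: MAD/eV |
|---|---|---|---|---|
| SMILES: GPTChem | 0.984 | 0.99 | 0.976 | 0.03 |
| SELFIES: GPTChem | 0.961 | 1.18 | 0.973 | 0.03 |
| SMILES: GPT-2-LoRA | 0.931 | 2.03 | 0.910 | 0.06 |
| SELFIES: GPT-2-LoRA | 0.959 | 1.93 | 0.915 | 0.06 |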
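
The Δ-ML scheme and the two reported metrics can be illustrated with a minimal sketch. It assumes that the Δ-ML target is the per-molecule difference between the G4(MP2) and B3LYP atomization energies (as the column heading "(G4(MP2)-B3LYP)" suggests) and that MAD is the median of the absolute prediction errors. All function names are hypothetical, and the numbers are toy values chosen for easy arithmetic, not data from QM9-G4MP2.

```python
from statistics import mean, median


def delta_targets(e_high, e_low):
    """Delta-ML training targets: high-level minus low-level energy per molecule."""
    return [h - l for h, l in zip(e_high, e_low)]


def recombine(e_low, delta_pred):
    """Recover a high-level estimate: low-level energy plus predicted correction."""
    return [l + d for l, d in zip(e_low, delta_pred)]


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_abs_dev(y_true, y_pred):
    """Median absolute deviation between reference and prediction (assumed definition)."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


# Toy energies in eV (illustrative only, not taken from the dataset).
e_g4mp2 = [-10.0, -20.0, -30.0]   # high-level reference
e_b3lyp = [-9.5, -19.0, -29.0]    # cheaper low-level baseline

targets = delta_targets(e_g4mp2, e_b3lyp)   # what a Delta-ML model is trained on
delta_pred = [-0.5, -0.75, -1.25]           # stand-in for model output
e_pred = recombine(e_b3lyp, delta_pred)     # final G4(MP2) estimate

r2 = r2_score(e_g4mp2, e_pred)
mad = median_abs_dev(e_g4mp2, e_pred)
print(targets, e_pred, r2, mad)
```

Because the model only has to learn the small correction rather than the full atomization energy, absolute errors on the recombined energy are typically much smaller than when the energy is predicted directly, which is consistent with the gap between the two MAD columns in the table.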