Performance of ML models for atomization energies of small organic molecules and radicals. (a) Learning curves for the QM9 dataset [232] using a variety of representations and regression
methods. For each training set size, we show the mean absolute error (MAE) evaluated on a test set consisting of the remaining structures from the full dataset. Models based on the FCHL (2018) [233], SOAP (2018) [69], aSLATM [234], Coulomb Matrix (CM) [235], and Bag-of-bonds (BOB) [236] representations
use Gaussian process/kernel ridge regression, whereas NICE [73] and MTP [67] use linear ridge regression, and SchNet [237] and PhysNet [238] are graph neural networks. GM-sNN combines a representation similar in spirit to MTP, but built on a Gaussian radial basis set, with a feed-forward neural network for regression [239]. (b) Learning curve for the Rad-6 dataset [240]. Example species are shown, including radical species, which account for over 90% of the total dataset (reprinted from ref 240; original work published under the CC BY 4.0 license; https://creativecommons.org/licenses/by/4.0/).
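The learning-curve protocol described for panel (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figure: random three-dimensional feature vectors and a synthetic target stand in for a molecular representation and atomization energies, and the Gaussian kernel, its width `sigma`, and the regularization `lam` are illustrative choices. For each training set size, the sketch trains on a random subset, evaluates the MAE on all remaining points, and contrasts kernel ridge regression with linear ridge regression on the same features.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def kernel_ridge(X_train, y_train, X_test, sigma=0.5, lam=1e-5):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, predict K_test @ alpha."""
    K = gaussian_kernel(X_train, X_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

def linear_ridge(X_train, y_train, X_test, lam=1e-5):
    """Linear ridge regression on the raw features plus a constant column."""
    def with_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])
    Xa = with_bias(X_train)
    w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(Xa.shape[1]), Xa.T @ y_train)
    return with_bias(X_test) @ w

def learning_curve(X, y, train_sizes, fit_predict, seed=0):
    """Test MAE per training set size; the test set is every point not used for training."""
    perm = np.random.default_rng(seed).permutation(len(X))
    maes = []
    for n in train_sizes:
        train, test = perm[:n], perm[n:]
        pred = fit_predict(X[train], y[train], X[test])
        maes.append(np.mean(np.abs(pred - y[test])))
    return np.array(maes)

# Synthetic stand-in for (representation, energy) pairs (an assumption; NOT molecular data).
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(600, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]

train_sizes = [25, 50, 100, 200, 400]
mae_krr = learning_curve(X, y, train_sizes, kernel_ridge)
mae_lin = learning_curve(X, y, train_sizes, linear_ridge)
for n, a, b in zip(train_sizes, mae_krr, mae_lin):
    print(f"N_train={n:4d}  MAE kernel ridge={a:.4f}  MAE linear ridge={b:.4f}")
```

On this synthetic data the kernel model's MAE keeps falling as the training set grows, while the linear model on the raw features levels off because the target is nonlinear in them. The sketch says nothing about how the models in the figure compare, since that depends on the representation each one uses.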