Table 3. Training set, test set, and CSD test set root-mean-square errors (RMSEs) and maximum unsigned errors (max UEs) for ΔE_H–L, in kcal mol⁻¹, for the machine learning methods and descriptor sets compared: LASSO, least absolute shrinkage and selection operator regression; KRR, kernel ridge regression, using a squared-exponential kernel for descriptor set g and the L1 matrix distance (ref 52) for the sorted Coulomb matrix descriptor; SVR, support vector regression, using a squared-exponential kernel; ANN, artificial neural network. Results are also given for the KRR/Coulomb case restricted to B3LYP only, since the Coulomb matrix does not naturally account for varying HF exchange.
Model | Descriptor | Training RMSE | Training Max UE | Test RMSE | Test Max UE | CSD RMSE | CSD Max UE |
LASSO | Set g | 16.1 | 89.7 | 15.7 | 93.5 | 19.2 | 72.5 |
KRR | Set g | 1.6 | 8.5 | 3.9 | 17.0 | 38.3 | 88.4 |
SVR | Set g | 2.1 | 20.9 | 3.6 | 20.4 | 20.3 | 64.8 |
ANN | Set g | 3.0 | 12.3 | 3.1 | 15.6 | 13.1 | 30.4 |
KRR | Sorted Coulomb | 4.3 | 41.5 | 30.8 | 103.7 | 54.5 | 123.9 |
KRR, B3LYP only | Sorted Coulomb | 17.2 | 58.0 | 28.1 | 69.5 | 46.7 | 118.7 |
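
The two error measures reported in Table 3 can be illustrated with a minimal sketch. The function names `rmse` and `max_unsigned_error` and the numerical values below are hypothetical, chosen only for illustration; they are not taken from the models or data sets in the table.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def max_unsigned_error(predicted, reference):
    """Largest absolute deviation between prediction and reference."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical spin-splitting energies in kcal/mol (illustration only,
# not data from the paper).
reference = [10.0, -5.0, 22.0, 3.0]
predicted = [12.0, -4.0, 19.0, 3.0]

print(f"RMSE   = {rmse(predicted, reference):.2f} kcal/mol")
print(f"Max UE = {max_unsigned_error(predicted, reference):.2f} kcal/mol")
```

Because the RMSE averages over all points while the max UE reports only the single worst case, the two numbers can diverge considerably, as in the KRR/sorted Coulomb training entry (RMSE 4.3, max UE 41.5).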