2020 Sep 14;10:15026. doi: 10.1038/s41598-020-71693-5

Table 1.

Number of features, R² score, Pearson correlation coefficient (R), major errors (ME), very major errors (VME), area under the receiver operating characteristic curve (AUC), accuracy within a two-fold or four-fold dilution (ACC-2, ACC-4) and mean absolute fold error (MAFE) on the unseen test data. For AUC, ME and VME, the data were binarized using a threshold of 1 mg/L.

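| Model | No. of features | R²ᵇ | Rᵇ | MEᵃ,ᶜ | VMEᵃ,ᶜ | AUC | ACC-2 | ACC-4 | MAFEᶜ |
|---|---|---|---|---|---|---|---|---|---|
| Random forest | 4 | 0.932 | 0.966 | 1 | 0 | 1.000 | 0.658 | 0.944 | 0.883 |
| Random forest | 15 | 0.902 | 0.951 | 5 | 0 | 0.996 | 0.680 | 0.914 | 0.915 |
| Linear regression | 4 | 0.918 | 0.959 | 0 | 2 | 1.000 | 0.650 | 0.929 | 0.954 |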

The number of features was selected based on performance in leave-one-country-out validation on the training data (see Supplementary Fig. S1).

ᵃ Number of samples.

ᵇ Calculated on the log2 values.

ᶜ Lower is better.
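The metrics in Table 1 can be sketched in plain Python as below. This is an illustrative reconstruction under stated assumptions, not the authors' code. Assumed here: a sample counts as resistant when its MIC is strictly above the 1 mg/L breakpoint; ME counts truly susceptible samples predicted resistant and VME counts truly resistant samples predicted susceptible (the usual convention); ACC-2 and ACC-4 mean an absolute log2 error of at most 1 and at most 2 dilution steps; and MAFE is the mean absolute log2 error (the table values below 1 suggest a log2-scale definition, but this is a guess). The AUC uses the continuous log2 prediction as the score against the binarized true labels. The names `mic_metrics`, `binarize` and `auc` are hypothetical.

```python
import math


def binarize(mics, breakpoint=1.0):
    # Assumption: MIC strictly above the breakpoint is labelled resistant (1).
    return [1 if m > breakpoint else 0 for m in mics]


def auc(labels, scores):
    # ROC AUC as the probability that a random resistant sample scores higher
    # than a random susceptible one (ties count as 1/2).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mic_metrics(true_mic, pred_mic, breakpoint=1.0):
    # Regression metrics are computed on log2-transformed MICs (footnote b).
    t = [math.log2(x) for x in true_mic]
    p = [math.log2(x) for x in pred_mic]
    n = len(t)
    mean_t = sum(t) / n
    mean_p = sum(p) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(t, p))
    ss_tot = sum((a - mean_t) ** 2 for a in t)
    ss_p = sum((b - mean_p) ** 2 for b in p)
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    diff = [abs(a - b) for a, b in zip(t, p)]  # error in dilution steps

    y_true = binarize(true_mic, breakpoint)
    y_pred = binarize(pred_mic, breakpoint)
    return {
        "R2": 1 - ss_res / ss_tot,
        "R": cov / math.sqrt(ss_tot * ss_p),
        # ME: truly susceptible but predicted resistant (a count).
        "ME": sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1),
        # VME: truly resistant but predicted susceptible (a count).
        "VME": sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0),
        "AUC": auc(y_true, p),
        "ACC-2": sum(d <= 1 for d in diff) / n,
        "ACC-4": sum(d <= 2 for d in diff) / n,
        "MAFE": sum(diff) / n,
    }


if __name__ == "__main__":
    # Toy MICs in mg/L, invented purely for illustration.
    true_mic = [0.25, 0.5, 1, 2, 4, 8]
    pred_mic = [0.25, 2, 0.5, 1, 4, 8]
    for name, value in mic_metrics(true_mic, pred_mic).items():
        print(f"{name}: {value:.3f}")
```

On the toy data, one susceptible isolate is over-called (ME = 1) and one resistant isolate is under-called (VME = 1), which is why the table reports ME and VME as sample counts rather than rates.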
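The table note says the number of features was chosen with leave-one-country-out validation on the training data. A minimal sketch of that selection loop follows, assuming each training sample carries a country label and that a lower validation error is better. The callback `score_fn(n, train_idx, test_idx)`, which would fit a model on the top-n features and return an error on the held-out country, is hypothetical, as are `leave_one_country_out` and `select_n_features`; how features are ranked, and whether ranking happens inside each fold, is not stated here. scikit-learn's `LeaveOneGroupOut` produces equivalent splits if that library is available.

```python
from collections import defaultdict


def leave_one_country_out(countries):
    """Yield (held_out_country, train_idx, test_idx) once per distinct country."""
    groups = defaultdict(list)
    for i, c in enumerate(countries):
        groups[c].append(i)
    for held_out in sorted(groups):
        test_idx = groups[held_out]
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        yield held_out, train_idx, test_idx


def select_n_features(candidate_ns, countries, score_fn):
    # Hypothetical selection rule: pick the feature count with the lowest
    # mean error across all held-out countries.
    best_n, best_err = None, float("inf")
    for n in candidate_ns:
        errs = [score_fn(n, tr, te) for _, tr, te in leave_one_country_out(countries)]
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best_n, best_err = n, mean_err
    return best_n, best_err
```

Holding out whole countries, rather than random samples, tests whether a model generalizes to isolates from a region it has never seen, which is a stricter check than random k-fold splitting.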