Table 2.
Random Forest (RF), Support Vector Machines (SVM), and Naïve Bayes (NB) classification model parameters for cross-validation and external validation in authenticating extra virgin olive oil.
Pre-processing | Model | Seven-Class: Optimal Parameters | Seven-Class: ACC.cv | Seven-Class: ACC.p | Three-Class: Optimal Parameters | Three-Class: ACC.cv | Three-Class: ACC.p | Two-Class: Optimal Parameters | Two-Class: ACC.cv | Two-Class: Sens.cv | Two-Class: Prec.cv | Two-Class: Spec.cv | Two-Class: F1.cv | Two-Class: ACC.p | Two-Class: Sens.p | Two-Class: Prec.p | Two-Class: Spec.p | Two-Class: F1.p | Two-Class: MCC.p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Unprocessed | RF | mt = 18, nt = 500 | 81.6 | 64.2 | mt = 43, nt = 500 | 99.1 | 97.7 | mt = 55, nt = 500 | 99.4 | 99.7 | 99.8 | 95.6 | 99.7 | 99.2 | 99.1 | 100 | 100 | 99.6 | 0.94 |
SG smoothing | RF | mt = 43, nt = 500 | 81.2 | 64.0 | mt = 43, nt = 500 | 99.2 | 97.7 | mt = 72, nt = 500 | 99.5 | 99.6 | 99.8 | 96.0 | 99.7 | 99.2 | 99.1 | 100 | 100 | 99.6 | 0.94 |
SG+1st deriv. | RF | mt = 14, nt = 500 | 92.8 | 82.1 | mt = 14, nt = 500 | 99.9 | 99.8 | mt = 7, nt = 500 | 99.8 | 100 | 99.9 | 95.7 | 99.9 | 99.9 | 99.7 | 100 | 100 | 99.8 | 0.98 |
SG+2nd deriv. | RF | mt = 14, nt = 500 | 93.0 | 80.2 | mt = 14, nt = 500 | 99.9 | 99.8 | mt = 7, nt = 500 | 99.8 | 100 | 99.8 | 96.7 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
SNV | RF | mt = 43, nt = 500 | 90.3 | 71.5 | mt = 14, nt = 500 | 99.7 | 97.4 | mt = 7, nt = 500 | 99.7 | 99.9 | 99.8 | 94.9 | 99.8 | 98.0 | 99.4 | 98.4 | 78.6 | 99.0 | 0.84 |
SNV + SG Smoothing | RF | mt = 14, nt = 500 | 90.6 | 70.7 | mt = 14, nt = 500 | 99.7 | 97.4 | mt = 7, nt = 500 | 99.7 | 99.9 | 99.8 | 95.0 | 99.8 | 98.0 | 99.5 | 98.4 | 78.6 | 99.0 | 0.84 |
SNV + SG+1st deriv. | RF | mt = 14, nt = 500 | 94.7 | 76.4 | mt = 14, nt = 500 | 100 | 100 | mt = 7, nt = 500 | 99.9 | 100 | 99.9 | 97.3 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
SNV + SG+2nd deriv. | RF | mt = 14, nt = 500 | 96.0 | 86.0 | mt = 14, nt = 500 | 100 | 100 | mt = 7, nt = 500 | 99.9 | 100 | 99.9 | 97.1 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
MSC | RF | mt = 43, nt = 500 | 90.8 | 71.2 | mt = 43, nt = 500 | 99.8 | 98.9 | mt = 7, nt = 500 | 99.7 | 99.9 | 99.8 | 95.0 | 99.9 | 98.5 | 99.0 | 98.0 | 71.4 | 98.4 | 0.88 |
MSC + SG Smoothing | RF | mt = 43, nt = 500 | 90.8 | 71.4 | mt = 14, nt = 500 | 99.8 | 97.4 | mt = 7, nt = 500 | 99.7 | 99.9 | 99.7 | 94.7 | 99.9 | 99.2 | 99.3 | 99.8 | 97.8 | 98.7 | 0.90 |
MSC + SG+1st deriv. | RF | mt = 14, nt = 500 | 92.8 | 82.1 | mt = 14, nt = 500 | 99.9 | 99.8 | mt = 7, nt = 500 | 99.8 | 100 | 99.8 | 95.7 | 99.9 | 99.7 | 99.7 | 100 | 100 | 99.8 | 0.98 |
MSC + SG+2nd deriv. | RF | mt = 14, nt = 500 | 96.4 | 86.2 | mt = 43, nt = 100 | 99.9 | 99.8 | mt = 7, nt = 500 | 99.8 | 100 | 99.8 | 96.2 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
Unprocessed | SVM | C = 5, σ = 0.01 | 55.2 | 55.6 | C = 10, σ = 0.01 | 99.7 | 99.2 | C = 5, σ = 0.01 | 99.5 | 99.6 | 99.9 | 97.7 | 99.7 | 99.5 | 99.8 | 99.7 | 95.2 | 99.7 | 0.96 |
SG smoothing | SVM | C = 10, σ = 0.01 | 55.3 | 55.6 | C = 10, σ = 0.01 | 99.7 | 99.2 | C = 5, σ = 0.01 | 99.4 | 99.5 | 99.9 | 97.8 | 99.7 | 99.4 | 99.7 | 99.7 | 95.2 | 99.7 | 0.95 |
SG+1st deriv. | SVM | C = 5, σ = 0.01 | 60.7 | 56.6 | C = 0.05, σ = 0.01 | 99.6 | 99.4 | C = 0.5, σ = 0.01 | 99.9 | 100 | 99.8 | 97.0 | 99.9 | 98.1 | 100 | 100 | 100 | 100 | 1.00 |
SG+2nd deriv. | SVM | C = 5, σ = 0.01 | 60.8 | 56.7 | C = 0.1, σ = 0.01 | 99.7 | 99.5 | C = 0.5, σ = 0.01 | 99.9 | 100 | 99.8 | 97.1 | 99.9 | 98.0 | 100 | 100 | 100 | 100 | 1.00 |
SNV | SVM | C = 10, σ = 0.01 | 58.6 | 50.7 | C = 0.5, σ = 0.01 | 99.9 | 97.6 | C = 0.5, σ = 0.01 | 99.8 | 100 | 99.8 | 97.0 | 99.9 | 97.6 | 100 | 97.4 | 64.3 | 98.7 | 0.79 |
SNV + SG Smoothing | SVM | C = 0.1, σ = 0.01 | 57.7 | 63.1 | C = 0.5, σ = 0.01 | 99.8 | 97.6 | C = 0.5, σ = 0.01 | 99.8 | 100 | 99.8 | 97.0 | 99.9 | 97.6 | 100 | 97.4 | 64.3 | 98.7 | 0.79 |
SNV + SG+1st deriv. | SVM | C = 0.05, σ = 0.01 | 61.3 | 55.9 | C = 0.05, σ = 0.01 | 99.7 | 98.2 | C = 0.1, σ = 0.01 | 99.7 | 100 | 99.9 | 97.6 | 99.8 | 100 | 100 | 100 | 100 | 100 | 1.00 |
SNV + SG+2nd deriv. | SVM | C = 0.05, σ = 0.01 | 63.1 | 54.0 | C = 0.5, σ = 0.01 | 99.9 | 99.0 | C = 0.05, σ = 0.01 | 99.9 | 100 | 99.9 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
MSC | SVM | C = 10, σ = 0.01 | 58.6 | 50.7 | C = 0.5, σ = 0.01 | 99.9 | 97.6 | C = 0.5, σ = 0.01 | 99.8 | 99.9 | 99.8 | 97.0 | 99.9 | 97.6 | 100 | 97.4 | 64.3 | 98.7 | 0.79 |
MSC + SG Smoothing | SVM | C = 0.1, σ = 0.01 | 57.8 | 63.3 | C = 0.5, σ = 0.01 | 99.9 | 97.6 | C = 0.5, σ = 0.01 | 99.8 | 99.9 | 99.8 | 97.0 | 99.9 | 97.6 | 100 | 97.4 | 64.3 | 98.7 | 0.79 |
MSC + SG+1st deriv. | SVM | C = 5, σ = 0.01 | 60.7 | 55.9 | C = 0.05, σ = 0.01 | 99.6 | 99.2 | C = 0.5, σ = 0.01 | 99.9 | 100 | 99.8 | 97.0 | 99.9 | 100 | 100 | 100 | 100 | 100 | 1.00 |
MSC + SG+2nd deriv. | SVM | C = 1, σ = 0.01 | 61.3 | 59.5 | C = 0.5, σ = 0.01 | 99.9 | 99.0 | C = 0.5, σ = 0.01 | 99.9 | 100 | 99.9 | 98.2 | 99.9 | 99.0 | 100 | 99.0 | 85.7 | 99.5 | 0.95 |
Unprocessed | NB | lc = 0.1, ad = 0.0 | 51.0 | 48.6 | lc = 0.1, ad = 0.0 | 97.4 | 94.3 | lc = 0.1, ad = 0.0 | 96.6 | 96.5 | 99.9 | 97.2 | 98.2 | 94.1 | 93.7 | 100 | 100 | 96.8 | 0.71 |
SG smoothing | NB | lc = 0.1, ad = 0.0 | 51.1 | 48.8 | lc = 0.1, ad = 0.0 | 97.4 | 94.3 | lc = 0.1, ad = 0.0 | 96.5 | 97.7 | 99.9 | 97.7 | 98.2 | 94.1 | 93.1 | 100 | 100 | 96.8 | 0.71 |
SG+1st deriv. | NB | lc = 0.1, ad = 1.0 | 72.0 | 68.3 | lc = 0.1, ad = 0.0 | 99.4 | 98.9 | lc = 0.1, ad = 0.0 | 99.3 | 99.3 | 99.0 | 98.0 | 99.6 | 96.8 | 96.6 | 100 | 100 | 99.4 | 0.92 |
SG+2nd deriv. | NB | lc = 0.1, ad = 1.0 | 72.2 | 68.2 | lc = 0.1, ad = 0.0 | 99.4 | 98.9 | lc = 0.1, ad = 0.0 | 99.2 | 99.3 | 99.9 | 97.8 | 99.6 | 98.9 | 98.8 | 100 | 100 | 99.4 | 0.92 |
SNV | NB | lc = 0.1, ad = 0.0 | 65.6 | 60.5 | lc = 0.1, ad = 0.0 | 98.7 | 94.1 | lc = 0.1, ad = 0.0 | 98.5 | 98.4 | 100 | 99.9 | 99.2 | 95.0 | 95.3 | 99.3 | 90.5 | 97.2 | 0.70 |
SNV + SG Smoothing | NB | lc = 0.1, ad = 0.0 | 65.7 | 60.5 | lc = 0.1, ad = 0.0 | 98.7 | 94.1 | lc = 0.1, ad = 0.0 | 98.5 | 98.4 | 100 | 99.9 | 99.2 | 95.0 | 95.3 | 99.2 | 90.5 | 97.2 | 0.70 |
SNV + SG+1st deriv. | NB | lc = 0.1, ad = 0.0 | 80.2 | 70.5 | lc = 0.1, ad = 0.0 | 99.8 | 99.5 | lc = 0.1, ad = 0.0 | 99.3 | 99.4 | 99.9 | 98.0 | 99.6 | 99.2 | 99.3 | 99.3 | 97.6 | 99.6 | 0.94 |
SNV + SG+2nd deriv. | NB | lc = 0.1, ad = 0.0 | 84.3 | 82.4 | lc = 0.1, ad = 0.0 | 99.7 | 97.6 | lc = 0.1, ad = 0.0 | 99.9 | 100 | 99.9 | 98.0 | 99.9 | 97.7 | 100 | 97.6 | 66.7 | 98.8 | 0.81 |
MSC | NB | lc = 0.1, ad = 0.0 | 65.6 | 60.8 | lc = 0.1, ad = 0.0 | 98.7 | 94.1 | lc = 0.1, ad = 0.0 | 98.4 | 98.4 | 100 | 99.9 | 99.2 | 95.0 | 95.2 | 99.3 | 90.5 | 97.2 | 0.70 |
MSC + SG Smoothing | NB | lc = 0.1, ad = 0.0 | 65.7 | 60.7 | lc = 0.1, ad = 0.0 | 98.7 | 94.4 | lc = 0.1, ad = 0.0 | 98.4 | 98.4 | 100 | 100 | 99.2 | 95.0 | 95.2 | 99.2 | 90.5 | 97.2 | 0.70 |
MSC + SG+1st deriv. | NB | lc = 0.1, ad = 1.0 | 72.0 | 68.2 | lc = 0.1, ad = 0.0 | 99.3 | 98.9 | lc = 0.1, ad = 0.0 | 99.3 | 99.3 | 99.9 | 98.0 | 99.6 | 98.9 | 98.8 | 100 | 100 | 99.4 | 0.92 |
MSC + SG+2nd deriv. | NB | lc = 0.1, ad = 0.0 | 84.0 | 82.1 | lc = 0.1, ad = 0.0 | 99.5 | 99.2 | lc = 0.1, ad = 0.0 | 99.5 | 99.6 | 99.9 | 98.5 | 99.8 | 99.3 | 99.8 | 99.5 | 92.9 | 99.7 | 0.95 |
Metric values for the trained models are averages over 10-fold cross-validation repeated ten times. All metrics except MCC.p are reported as percentages. ACC.cv = accuracy, Sens.cv = sensitivity, Prec.cv = precision, Spec.cv = specificity, and F1.cv = F1 score for cross-validation. ACC.p = accuracy, Sens.p = sensitivity, Prec.p = precision, Spec.p = specificity, F1.p = F1 score, and MCC.p = Matthews correlation coefficient for the external validation set (test set). SNV = Standard Normal Variate; MSC = Multiplicative Scatter Correction; SG = Savitzky-Golay smoothing; 1st deriv. = first derivative; 2nd deriv. = second derivative. mt = mtry, the optimal number of features randomly sampled at each split of a decision tree within the Random Forest, selected using cross-validation and out-of-bag error; nt = ntree, the total number of decision trees in the Random Forest ensemble, selected by model tuning and cross-validation. C = cost parameter and σ = sigma, the parameter of the Gaussian radial basis function kernel, for the SVM model. lc = Laplace and ad = adjust, the tuning parameters of the Naïve Bayes model. The Seven-Class system comprises seven groups: extra-virgin olive oil (EVOO), hazelnut oil (HZO), olive pomace oil (POO), refined olive oil (ROO), EVOO + HZO, EVOO + POO, and EVOO + ROO. The Three-Class system comprises three groups: authentic extra-virgin olive oil, edible oil adulterant (100%), and adulterated olive oil (1–40%). The Two-Class system is a binary classification that distinguishes pure EVOO from adulterated olive oil (1–100% adulteration).
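
As an illustration of how the binary-model metrics in the table relate to one another, the following minimal Python sketch computes accuracy, sensitivity, precision, specificity, F1 score, and MCC (Matthews correlation coefficient) from the four cells of a confusion matrix, using their standard definitions. It is not the authors' code. The function name `binary_metrics`, the convention of returning 0 when a denominator is zero, and the example counts are assumptions made for illustration, and the source does not state which class was treated as positive. The function returns fractions, whereas the table appears to list every metric except MCC as a percentage.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn count positive-class samples, tn/fp count negative-class samples.
    Which oil class is "positive" is an assumption left to the caller.
    """

    def ratio(num: float, den: float) -> float:
        # Assumed convention: an undefined ratio (zero denominator) maps to 0.0.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall / true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells, it can remain well below 1 when specificity is low even though accuracy and F1 are high. This is the pattern visible in the rows where Spec.p is low while ACC.p and F1.p stay above 97.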