Journal of Food Science and Technology
2019 Nov 26;57(4):1535–1543. doi: 10.1007/s13197-019-04189-4

Comparison of different classification algorithms to identify geographic origins of olive oils

Ozgur Gumus 1, Erkan Yasar 1, Z Pinar Gumus 2, Hasan Ertas 3
PMCID: PMC7054565  PMID: 32180650

Abstract

Research on determining the geographic origin of olive oils has increased in response to consumer demand for authenticated olive oils. Classification algorithms, which are machine learning methods, can be employed for the authentication of olive oils. In this study, different classification algorithms were evaluated to identify the most accurate one for the authentication of Turkish olive oils. BayesNet, Naive Bayes, Multilayer Perceptron, IBK, Kstar, SMO, Random Forest, J48, LWL, Logistic Regression, Simple Logistic and LogitBoost algorithms were applied to 61 chemical analysis parameters of 49 olive oil samples from 6 different locations in Western Turkey. These 61 parameters were obtained from five different chemical analyses: stable carbon isotope ratio, trace elements, sterol composition, FAMEs and TAGs. In terms of these features, this is the most comprehensive study to date on the geographical origin of Turkish olive oils. The classification performance of the algorithms was compared using accuracy, specificity and sensitivity metrics. Random Forest, BayesNet and LogitBoost were found to be the best classification algorithms for the authentication of Turkish olive oils. Using the classification model developed in this study, the geographic origin of an unknown olive oil can be predicted with high accuracy. Similar models could also be developed to obtain useful information for the authentication of other food products.

Electronic supplementary material

The online version of this article (10.1007/s13197-019-04189-4) contains supplementary material, which is available to authorized users.

Keywords: Machine learning, Classification algorithms, Authentication, Geographic origin, Olive oil

Introduction

Olive oil is one of the most important foods in the human diet in Mediterranean countries because of its beneficial health effects and sensory characteristics (Bakhouche et al. 2015; Loubiri et al. 2017). Authentication is a very important issue for the production of high-quality olive oils (Karabagias et al. 2013). For that reason, geographic origin has become a strategic tool for establishing Protected Designations of Origin (PDOs) and Protected Geographical Indications (PGIs) in many countries (Camin et al. 2010; Bajoub et al. 2016; Beltrán et al. 2015). Research on determining the geographic origin of olive oils has increased in response to consumer demand for authenticated olive oils (Aparicio et al. 2013). Quality, purity and safety assurance systems for olive oils are therefore needed.

The main components of olive oil, namely triacylglycerols, fatty acids and sterols, are very useful parameters for determining authenticity (Longobardi et al. 2012). In addition, trace elements and the stable carbon isotope ratio are important factors for determining origin (Gonzalvez et al. 2009; Drivelos and Georgiou 2012; Kelly et al. 2005). Analytical instruments generate large amounts of data for many samples in a short time, and these data require computational procedures, such as machine learning methods, to reveal the most valuable information (Buscema et al. 2014). Machine learning methods are starting to play an important role in food classification, which is one of the major applications of multivariate analysis (Buscema et al. 2014). Classification algorithms, a family of machine learning methods, can therefore be employed for the authentication of foods (Ropodi et al. 2016), the detection of olive oil adulteration (Ruiz-Samblás et al. 2014) and the identification of similarities among vegetable oils (Ai et al. 2014). Classification algorithms assign each sample to a group (class or category) on the basis of its pattern of measurements, using information about the known class membership of the training samples (Karakatič and Podgorelec 2016).

The literature contains many studies that use classification algorithms for olive oil authentication. Decision tree algorithms in particular have been used for the geographical classification of olive oils in many previous studies. For example, Nasibov et al. (2016) applied several decision tree algorithms (CHAID, CART, C4.5, C5.0 and QUEST) to Turkish olive oils, compared them on the basis of accuracy, and also used the fuzzy ID3 algorithm for uncertain data. A similar study was performed on Greek olive oils by Petrakis et al. (2008). García-González et al. (2009) used an artificial neural network model for the geographical characterization of olive oils. It is therefore important to identify, among the many available classification algorithms, the most accurate ones for the geographical classification of olive oils.

The main aim of this study is to compare different classification algorithms for the authentication of Turkish olive oils and to identify the most accurate one. For this purpose, 49 olive oil samples from 6 different locations in Western Turkey were chemically analysed in terms of quality and purity, and 61 parameters from five different chemical analyses were selected. To identify the geographic regions of the olive oils, the data obtained were evaluated with twelve classification algorithms: BayesNet, Naive Bayes, Multilayer Perceptron, IBK, Kstar, SMO, Random Forest, J48, LWL, Logistic Regression, Simple Logistic and LogitBoost. Accuracy, specificity and sensitivity metrics were used to measure the effectiveness of the algorithms.

The novelty of this study is that it is the most comprehensive study so far on the geographical origin of Turkish olive oils in terms of the following characteristics: the number of locations where samples were collected (6), the number of chemical analysis types (5), the number of chemical analysis parameters (61) and the number of classification algorithms used (12). This study also achieves the highest accuracy value (93.88%). In addition to the comparison of algorithms, principal component analysis (PCA) was performed to show the effect of the measured variables on the grouping of samples by geographical origin.

Materials and methods

In this study, 49 olive oil samples were collected from six different locations in the western regions of Turkey, covering the Marmara, North Aegean, Central Aegean and South Aegean regions: 9 samples from Bursa (B), 11 from Edremit Bay (K), 8 from İzmir (I), 7 from Aydın (A), 6 from Muğla (ML) and 8 from Manisa (MS).

The quality and purity analyses performed for this study were triacylglycerols (TAGs), fatty acid methyl esters (FAMEs), sterols, trace elements and the δ13C isotope ratio. Sterol, FAME and TAG analyses were performed according to the official methods described in COI/T.20/Doc. No.10/Rev.1, COI/T.20/Doc. No 33, COI/T.20/Doc. No.20/Rev.2 and COI/T.20/Doc. No. 25/Rev. 1. TAGs are designated by letters corresponding to the abbreviated names of the fatty acyl chains attached to the glycerol backbone: palmitoyl (P), palmitoleoyl (Po), stearoyl (S), oleoyl (O), linoleoyl (L) and linolenoyl (Ln). Trace elements were determined according to EPA Method 3051, AOAC 986.15A and TS EN 14332 with a quadrupole ICP-MS (Agilent 7500CE, Agilent Technologies); a microwave oven (CEM-MARS) was used to digest the olive oil samples for trace element analysis. δ13C isotope ratios were measured with an isotope ratio mass spectrometer (Micromass, IsoPrime) coupled to a Dumas-combustion elemental analyser (EuroVector), according to AOAC 984.23 and AOAC 991.41 (Official Methods of Analysis of the Association of Official Analytical Chemists, 18th ed., 2005) and TS EN 12140 (Gumus et al. 2017, 2018). In summary, a GC-FID system was used for the sterol composition and FAME analyses, an HPLC-RID system for the TAG analysis, ICP-MS for trace elements and IR-MS for the δ13C stable isotope ratio. Details of the analytical techniques are given in the supplementary material (SM-2).
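The methods above do not spell out how a δ13C value is defined. For reference, the conventional definition expresses the 13C/12C ratio of a sample in per mil (‰) relative to the Vienna Pee Dee Belemnite (VPDB) standard, so a negative value means the sample is depleted in 13C relative to that standard:

$$\delta^{13}\mathrm{C}\ (\text{‰}) = \left( \frac{R_{\text{sample}}}{R_{\text{VPDB}}} - 1 \right) \times 1000, \qquad R = \frac{{}^{13}\mathrm{C}}{{}^{12}\mathrm{C}}$$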

Applying these chemical analyses to the olive oil samples yielded 61 parameters expressing quality and purity for each sample. The names of these parameters are given, by type of chemical analysis, in the cited literature (Gumus et al. 2017, 2018) and in the supplementary material (SM-2). The parameter values were normalized to a common range, and the classification algorithms were then applied to the normalized data to establish the best classification model for authentication. The classification algorithms, the evaluation metrics used to compare them and the application procedure are described below.
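The text does not state the target range of the normalization. As a minimal sketch, assuming a min-max rescaling of every parameter to [0, 1] (which, as far as we know, is also the default behaviour of WEKA's Normalize filter), the step could look like this; the function name and the sample values are hypothetical:

```python
def min_max_normalize(rows):
    """Rescale each column (chemical parameter) of `rows` to the range [0, 1].

    `rows` is a list of samples, each a list of numeric parameter values.
    Constant columns are mapped to 0.0 to avoid division by zero (our choice).
    """
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    normalized = []
    for r in rows:
        new_row = []
        for j, value in enumerate(r):
            span = maxs[j] - mins[j]
            new_row.append((value - mins[j]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized


# Hypothetical values for three samples and two parameters (not data from this study)
samples = [[-29.1, 12.0], [-28.4, 30.0], [-30.0, 21.0]]
print(min_max_normalize(samples))
```

Taking the minima and maxima from the full data set, as here, is the simplest option; a stricter protocol would compute them on the training folds only and apply them unchanged to the test fold.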

Classification algorithms

The data set was evaluated with the BayesNet, Naive Bayes, Multilayer Perceptron (artificial neural network), IBK (k-nearest neighbour), Kstar (K*), Sequential Minimal Optimization (SMO), Random Forest, J48 (decision tree), Locally Weighted Learning (LWL), Logistic Regression, Simple Logistic and LogitBoost classification algorithms. Brief descriptions of these algorithms are summarized in Table 1.

Table 1.

Description of classification algorithms used in this study

Classification algorithm Description
BayesNet Also known in the literature as Bayes networks or belief networks, these are probabilistic graphical models. Variables are represented by nodes, and the dependencies between them are indicated by directed arrows. By encoding only direct dependencies, these networks reduce the complexity of representing relationships in the data (Pearl 1988)
Naive Bayes This algorithm is a simplified application of Bayes' theorem that assumes the attributes are conditionally independent given the class. It is a probabilistic classifier based on statistical methods (Kavitha et al. 2016)
Multilayer Perceptron A classification algorithm inspired by biological neural networks. It consists of artificial neurons and the connections between them, arranged in three layers: the input layer, which receives the inputs; the hidden layer, where the prediction computations are carried out; and the output layer, which produces the results (Parlos et al. 1994)
IBK IBK is WEKA's implementation of the k-nearest neighbours (KNN) algorithm. It stores the training instances and assigns a new sample the majority class among its k nearest training instances, with distances calculated by the Euclidean distance method (Romero et al. 2013; Nettleton et al. 2010)
Kstar (K*) K* is an instance-based classifier that uses an entropy-based distance function. Instance-based classifiers, also called memory-based classifiers, learn quickly because training consists of storing the instances (Cleary and Trigg 1995)
SMO Sequential Minimal Optimization (SMO) is an algorithm for training support vector machines (SVMs), which are among the most frequently used classifiers in machine learning. With the help of a kernel function, SVMs can classify data that are not linearly separable (Nettleton et al. 2010; Huang et al. 2015)
Random Forest Random Forest is an ensemble learning method that combines many decision trees. The basic principle is that a group of "weak learners" can come together to form a "strong learner" (Breiman and Cutler 2005)
J48 J48, WEKA's implementation of the C4.5 decision tree algorithm, is commonly used in machine learning. It classifies samples with a tree structure in which each inner node tests an attribute and each leaf node corresponds to a class label (Kavitha et al. 2016; Romero et al. 2013)
LWL Locally weighted learning is a non-parametric method. Instead of building a global model for the entire input space, it builds a local model for each query point from the training data in that point's neighbourhood (Christopher et al. 1997)
Logistic Regression Logistic regression is one of the most common classification algorithms. It uses the logistic (sigmoid) function to estimate the probability that an event occurs and to describe the relationship between the dependent variable and the independent variables (Hall et al. 2009; Friedman et al. 2000)
Simple Logistic SimpleLogistic is a classifier for building linear logistic regression models. It fits these models with the LogitBoost algorithm, using simple regression functions as base learners, which builds attribute selection into the fitting process (Hall et al. 2009)
LogitBoost LogitBoost is a boosting classification algorithm; boosting combines several weak classifiers to improve classification performance. LogitBoost extends the AdaBoost algorithm by fitting an additive logistic regression model (Friedman et al. 2000), and it is an adaptive algorithm that can achieve high prediction accuracy. It is reported to reduce training error quickly and to generalize well (Hall et al. 2009)
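To make the instance-based idea behind IBK concrete, here is a minimal k-nearest-neighbour sketch in plain Python. It is not WEKA's IBk code: the function names, the toy feature vectors and the value of k are hypothetical, and ties in the vote are resolved by whichever class `Counter.most_common` lists first.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=1):
    """Classify `query` by majority vote among its k nearest training samples."""
    nearest = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical normalized parameter vectors (two features), for illustration only
train_x = [[0.10, 0.20], [0.15, 0.25], [0.80, 0.90], [0.85, 0.80]]
train_y = ["Aydin", "Aydin", "Bursa", "Bursa"]
print(knn_predict(train_x, train_y, [0.82, 0.86], k=3))  # prints Bursa
```

Because "training" is just storing the samples, all of the work happens at prediction time, which is why such learners are called lazy or memory-based.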

Evaluation metrics

Accuracy is generally used as the evaluation criterion in classification problems. However, when there are more than two classes, accuracy alone can be misleading and does not provide sufficient information. A confusion matrix provides more information about the performance of a classification algorithm: it shows what kinds of mistakes were made during prediction, that is, which samples were classified correctly and which were not. In the confusion matrix, rows show actual classes and columns show predicted classes.
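A confusion matrix with this layout can be built from paired lists of actual and predicted labels. The sketch below is illustrative only: the helper name `confusion_matrix` and the five predictions are hypothetical, not results from this study.

```python
def confusion_matrix(actual, predicted, labels):
    """Count predictions; rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["Aydin", "Bursa", "Izmir"]
# Hypothetical predictions for five samples (illustrative only)
actual = ["Aydin", "Bursa", "Izmir", "Izmir", "Bursa"]
predicted = ["Aydin", "Bursa", "Bursa", "Izmir", "Bursa"]
for label, row in zip(labels, confusion_matrix(actual, predicted, labels)):
    print(label, row)
```

Correct predictions land on the diagonal; the off-diagonal entry in the Izmir row records one Izmir sample predicted as Bursa.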

In this study, accuracy, sensitivity and specificity were used to compare the classification performance of the algorithms. Accuracy estimates how effective an algorithm is as the proportion of samples that are assigned their true class label. In our domain, accuracy is the number of correct predictions of geographic region relative to the total number of predictions, i.e. the number of correct predictions divided by the total number of samples in the dataset:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

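where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. In this six-class problem, sensitivity and specificity are computed for each region in a one-versus-rest manner (that region as the positive class, all other regions as the negative class), while the overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of samples.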

Sensitivity, also called recall, addresses the question “Given a positive example, will the classifier detect it?”; it measures the proportion of positives that are correctly identified:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity, also called the true negative rate, measures the proportion of negatives that are correctly identified:

$$\text{Specificity} = \frac{TN}{TN + FP}$$
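These formulas can be checked against the LogitBoost confusion matrix reported in Table 5. The sketch below (plain Python; the helper name `per_class_metrics` is our own) computes one-versus-rest sensitivity and specificity for each region together with the overall accuracy, and its output reproduces the Table 5 values, e.g. 88.89% sensitivity and 100.00% specificity for Bursa and 93.88% accuracy.

```python
# LogitBoost confusion matrix from Table 5 (rows = actual, columns = predicted)
LABELS = ["Aydin", "Bursa", "Izmir", "Edremit", "Mugla", "Manisa"]
MATRIX = [
    [7, 0, 0, 0, 0, 0],
    [0, 8, 1, 0, 0, 0],
    [0, 0, 7, 0, 0, 1],
    [0, 0, 0, 11, 0, 0],
    [0, 0, 0, 0, 6, 0],
    [1, 0, 0, 0, 0, 7],
]


def per_class_metrics(matrix):
    """One-vs-rest (sensitivity, specificity) per class, plus overall accuracy."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    results = []
    for c in range(k):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # actual c, predicted as another class
        fp = sum(matrix[r][c] for r in range(k)) - tp  # predicted c, actually another class
        tn = n - tp - fn - fp
        results.append((tp / (tp + fn), tn / (tn + fp)))
    accuracy = sum(matrix[c][c] for c in range(k)) / n  # diagonal over all samples
    return results, accuracy


metrics, accuracy = per_class_metrics(MATRIX)
for label, (sens, spec) in zip(LABELS, metrics):
    print(f"{label:8s} sensitivity={100 * sens:6.2f}%  specificity={100 * spec:6.2f}%")
print(f"accuracy={100 * accuracy:.2f}%")
```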

The Kappa value (Cohen's kappa) is another important criterion. It measures inter-rater reliability, here the agreement between the classes predicted by an algorithm and the actual classes, corrected for the agreement expected by chance. Kappa ranges from −1 (total disagreement) through 0 (agreement no better than chance) to 1 (perfect agreement) (Viera and Garrett 2005).
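As a worked example, Cohen's kappa can be computed directly from a confusion matrix: the observed agreement is the diagonal share, and the chance agreement is the sum of products of row and column totals divided by the squared sample count. The sketch below (the helper name `cohen_kappa` is our own) applies this to the BayesNet confusion matrix of Table 3 and reproduces the value 0.9014 reported for BayesNet in the Results.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[r][i] for r in range(k)) for i in range(k)]
    # Agreement expected if predictions were independent of the actual classes
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# BayesNet confusion matrix from Table 3 (Aydin, Bursa, Izmir, Edremit, Mugla, Manisa)
bayesnet = [
    [7, 0, 0, 0, 0, 0],
    [1, 8, 0, 0, 0, 0],
    [0, 1, 6, 0, 0, 1],
    [0, 0, 0, 11, 0, 0],
    [0, 0, 0, 0, 6, 0],
    [0, 0, 1, 0, 0, 7],
]
print(round(cohen_kappa(bayesnet), 4))  # prints 0.9014
```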

Application procedure

The classification algorithms were implemented with the WEKA 3.8 data analysis tool (Waikato Environment for Knowledge Analysis, University of Waikato, New Zealand). WEKA is a collection of Java programs for data pre-processing, classification, regression, clustering, association rules and visualization (WEKA: www.cs.waikato.ac.nz). First, the data were converted into CSV format, which WEKA supports. The data were then normalized so that all features were on the same scale. Next, the BayesNet, Naive Bayes, Multilayer Perceptron, IBK, Kstar, SMO, Random Forest, J48, LWL, Logistic Regression, Simple Logistic and LogitBoost algorithms were run in WEKA to determine which classification algorithm is most effective. All WEKA parameters were left at their default values, since changing them could affect model performance positively or negatively.

Because the number of samples was small, k-fold cross-validation was used with k = 5. In k-fold cross-validation, the dataset is split into k subsets of approximately equal size. One subset is held out to test the model and the remaining k − 1 subsets are used to train it. The process is repeated k times so that each of the k subsets is used exactly once as the test data, and the k results can then be averaged to produce a single estimate.
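With 49 samples and k = 5 the folds cannot be exactly equal. The sketch below shows one way to build the folds, assuming stratification (each fold keeps roughly the class proportions of the full data set, which we understand WEKA does by default for nominal classes). It deals the samples of each class round-robin over the folds without shuffling, so it is deterministic; real tools usually shuffle first. The function name is hypothetical, and the class sizes are those listed in the Materials and methods section.

```python
from collections import defaultdict


def stratified_k_fold(labels, k=5):
    """Split sample indices into k folds, dealing each class round-robin across the
    folds so every fold keeps roughly the class proportions of the full data set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0  # continues across classes so fold sizes differ by at most one
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


# Class sizes from the Materials and methods section: 49 samples from six locations
counts = {"Bursa": 9, "Edremit": 11, "Izmir": 8, "Aydin": 7, "Mugla": 6, "Manisa": 8}
labels = [region for region, n in counts.items() for _ in range(n)]

for i, test_idx in enumerate(stratified_k_fold(labels, k=5), start=1):
    held_out = set(test_idx)
    train_idx = [j for j in range(len(labels)) if j not in held_out]
    # A classifier would be trained on train_idx and evaluated on test_idx here
    print(f"fold {i}: {len(train_idx)} training / {len(test_idx)} test samples")
```

Four folds receive 10 test samples and the fifth receives 9, so every sample is tested exactly once.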

The evaluation metrics used in this study are accuracy, sensitivity and specificity. Their values were calculated for each algorithm from the confusion matrices, and the performance of the 12 classification algorithms was then compared using these metrics and the Kappa values.

Principal component analysis (PCA)

The multivariate data analyses were performed with MINITAB 15 Statistical Software. PCA results are presented as score and loading plots. The score plot shows how the observations group along the principal components, and the loading plot explains the relationship between the variables and the clusters of observations in the score plot.
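The PCA itself was run in MINITAB, whose settings are not reported. Purely as an illustration, the sketch below computes principal components in plain Python, assuming the common choice of standardizing every parameter first (i.e. PCA on the correlation matrix) and using power iteration with deflation instead of a full eigendecomposition. The function names and the toy data are hypothetical. The share of variance explained by a component is its eigenvalue divided by the total variance, which is the quantity behind figures such as the 25.68% quoted for PC1 and PC2 in the Results.

```python
import math
import random


def standardize(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = []
    for j in range(p):
        sd = math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        sds.append(sd if sd > 0 else 1.0)  # constant columns stay at zero
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Sample covariance matrix; for standardized data this is the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def top_components(matrix, n_components=2, iterations=500, seed=0):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric matrix via power iteration."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left in the deflated matrix
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((eigval, v))
        # Deflate: remove this component so the next iteration finds the following one
        a = [[a[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


# Hypothetical 4 samples x 3 parameters (illustrative values, not data from this study)
data = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.4], [3.0, 6.2, 0.9], [4.0, 7.8, 0.2]]
z = standardize(data)
corr = covariance(z)
total_variance = sum(corr[i][i] for i in range(len(corr)))
for k, (eigval, vec) in enumerate(top_components(corr), start=1):
    scores = [sum(row[j] * vec[j] for j in range(len(vec))) for row in z]  # score-plot coordinates
    print(f"PC{k}: {100 * eigval / total_variance:.1f}% of variance, scores {[round(s, 2) for s in scores]}")
```

The eigenvector entries play the role of loadings, and the scores are the sample coordinates plotted in a score plot.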

Results and discussion

The accuracy values of the twelve algorithms, averaged over the five folds of cross-validation (CV), are presented in Table 2.

Table 2.

Accuracy values of the algorithms for fivefold cross-validation

Classification algorithms Accuracy (%)
BayesNet 91.84
Naive Bayes 79.59
Multilayer Perception 85.71
IBK 85.71
Kstar 83.67
SMO 83.67
Random Forest 93.88
J48 79.59
LWL 77.55
Logistic Regression 79.59
Simple Logistic 89.80
LogitBoost 93.88

The highest accuracy value (93.88%) was obtained with Random Forest and LogitBoost.

According to Table 2, all algorithms achieve at least 77.55% accuracy on this data set. Random Forest, BayesNet and LogitBoost perform best, each with over 90% accuracy. These three algorithms achieve considerably higher accuracy than the best results of Nasibov et al. (2016) and Petrakis et al. (2008). In the study of Nasibov et al., the highest accuracy was 88.45%, obtained with the CART decision tree algorithm; in the study of Petrakis et al., the highest accuracy was 87.02%, obtained with canonical discriminant analysis. García-González et al. (2009) classified olive oils by country, region, province and county with an artificial neural network. In that study, accuracy at the country and region levels was higher than 91%, while accuracy at the province level, which is comparable to this study, was 92.1%.

As Table 2 shows, Random Forest, BayesNet and LogitBoost are more successful than the other algorithms, while Naive Bayes, J48, LWL and Logistic Regression perform worst (77.55% to 79.59% accuracy).

A confusion matrix summarizes the numbers of correct and incorrect predictions, broken down by class. The confusion matrices of the three best-performing algorithms are shown in Tables 3, 4 and 5, and the confusion matrices and statistical evaluation results of every algorithm using fivefold cross-validation are given in SM-2. The Kappa statistics of BayesNet, Random Forest and LogitBoost using fivefold CV are 0.9014, 0.9259 and 0.9261, respectively. Kappa values this close to +1 indicate near-perfect agreement between predicted and actual classes, well beyond what would be expected by chance.

Table 3.

Confusion Matrix of BayesNet using fivefold cross-validation

Actual \ Predicted Aydin Bursa Izmir Edremit Muğla Manisa Sensitivity (%)
Aydin 7 0 0 0 0 0 100.00
Bursa 1 8 0 0 0 0 88.89
Izmir 0 1 6 0 0 1 75.00
Edremit 0 0 0 11 0 0 100.00
Muğla 0 0 0 0 6 0 100.00
Manisa 0 0 1 0 0 7 87.50
Specificity (%) 97.62 97.50 97.56 100.00 100.00 97.56 91.84% (Accuracy)

Table 4.

Confusion Matrix of RandomForest using fivefold cross-validation

Actual \ Predicted Aydin Bursa Izmir Edremit Muğla Manisa Sensitivity (%)
Aydin 7 0 0 0 0 0 100.00
Bursa 0 9 0 0 0 0 100.00
Izmir 0 1 6 0 0 1 75.00
Edremit 0 0 0 11 0 0 100.00
Muğla 0 0 0 0 6 0 100.00
Manisa 0 0 1 0 0 7 87.50
Specificity (%) 100.00 97.50 97.56 100.00 100.00 97.56 93.88% (Accuracy)

Table 5.

Confusion Matrix of LogitBoost using fivefold cross-validation

Actual \ Predicted Aydin Bursa Izmir Edremit Muğla Manisa Sensitivity (%)
Aydin 7 0 0 0 0 0 100.00
Bursa 0 8 1 0 0 0 88.89
Izmir 0 0 7 0 0 1 87.50
Edremit 0 0 0 11 0 0 100.00
Muğla 0 0 0 0 6 0 100.00
Manisa 1 0 0 0 0 7 87.50
Specificity (%) 97.62 100.00 97.56 100.00 100.00 97.56 93.88% (Accuracy)

As seen in Table 3, all olive oil samples collected from Aydın, Mugla and Edremit Bay are classified correctly by BayesNet using fivefold CV. Only one sample each from Bursa and Manisa and two samples from İzmir are misclassified as other regions.

According to confusion matrix in Table 4, samples from Aydın, Bursa, Mugla and Edremit Bay locations are correctly classified with Random Forest using fivefold CV. Two samples from İzmir and one sample from Manisa are misclassified.

The confusion matrix of the LogitBoost algorithm using fivefold CV is given in Table 5. Samples from Aydın, Edremit Bay and Mugla are all correctly classified by LogitBoost, while one sample from each of the other three locations (Bursa, İzmir and Manisa) is misclassified.

All olive oil samples from Aydın, Mugla and Edremit Bay, which are geographically distant from each other, are correctly classified by all three algorithms. Some olive oil samples from Izmir are misclassified as Manisa or Bursa by all three algorithms. Izmir and Manisa are geographically close to each other, so this kind of deviation is expected. Izmir and Bursa, in contrast, are geographically distant, but their climatic characteristics are similar. These results therefore indicate that geographic proximity and climatic characteristics influence the determination of the geographical origin of olive oils.

The olive oil samples were collected directly from producers in cities of the Aegean and Marmara regions, where most of Turkey's olive oil is produced. Aydın and Muğla are in the Southern Aegean, İzmir and Manisa in the Middle Aegean and Edremit Bay in the Northern Aegean, while Bursa is in the Marmara Region. Oils identified as Edremit Bay were collected from different places along the coast of Edremit Bay.

In the Aegean region, a Mediterranean climate generally dominates on the coasts. Manisa has a Mediterranean climate with some of the continental characteristics of Central Anatolia; although it has no coastline, it is one of the inland provinces closest to the coast. Spil Mountain, a volcanic mountain, influences the climate of Manisa. Bursa has a Marmara climate influenced by Uludağ Mountain. Edremit Bay is generally affected by the Mediterranean climate.

The locations differ in climatic characteristics such as annual average temperature, rainfall, humidity and the number of snow-covered days.

The mountains in the Aegean region lie perpendicular to the sea; Manisa and Bursa are under the influence of Spil Mountain and Uludağ Mountain, respectively, and Mount Ida influences Edremit Bay.

Soil properties also differ between the locations. Muğla has reddish Mediterranean soils, and İzmir has reddish Mediterranean soils as well as alluvial soils. Alluvial soils dominate Manisa, with a small proportion of reddish Mediterranean soils. In Edremit Bay, rendzina calcareous soils and calcareous forest soils are dominant. Aydın has mostly calcareous forest soils, and Bursa acid-reacted forest soils.

These differences and similarities change the properties of the oils collected from these locations and differentiate their quality and prices.

As the PCA score plot shows, the İzmir and Manisa samples are separated from the other locations along PC1. Bursa is separated from the other samples by both PC1 and PC2, and Edremit Bay forms its own cluster along PC2. Aydın and Muğla lie very close to each other, although the Muğla samples cluster among themselves. Although PC1 and PC2 together account for only 25.68% of the total variance, the grouping of locations is clearly visible (Fig. 1).

Fig. 1. Score plot of PCA

Edremit Bay forms its own group, separate from the other locations. This is expected, because Edremit Bay differs from the other locations in its δ13C values. Bursa is separated from the other locations except for one oil sample; climatic features appear to dominate the Bursa location in the Marmara region. Although İzmir and Manisa are close to each other, these two locations do not form a single cluster. Samples collected from non-coastal locations of İzmir showed similarities with Manisa and Aydın. Muğla and Aydın clustered very close to each other.

Although the İzmir and Manisa locations in the Middle Aegean region are separated from the other locations, some Manisa samples fall in the same region of the score plot as the İzmir samples. In general, the İzmir samples were separated by the effect of similar parameters. The PCA loading plot shows which parameters are responsible for these similarities and differences (Fig. 2).

Fig. 2. Loading plot of PCA

As the loading plot shows, trace elements such as Zn, Fe, Cu and Ca were dominant in separating Manisa, while TAGs were more dominant in separating İzmir. Although Edremit Bay and İzmir are both coastal, Edremit Bay is separated from İzmir by its δ13C results. Fatty acid composition is dominant for the Muğla locations. The parameters effective for Bursa were oleic acid, arachidonic acid, stearic acid, delta 7 stigmastenol, clerosterol and the TAGs POO, SOO, SOS + SLS and OOO + PoPP.

Although Aydın, İzmir, Manisa, Muğla and Edremit Bay are all in the Aegean Region, agro-climatic properties, soil properties and mountains have affected the classification of these locations. Bursa, located in the Marmara region, is clearly differentiated from the other locations, demonstrating the effect of geographical region. Agro-climatic differences, even between geographically close locations, were effective in separating the olive oils.

Conclusion

In this study, different classification methods were evaluated for predicting the geographical location of olive oils from their chemical properties. In terms of accuracy, Random Forest and LogitBoost gave the best performance (93.88%), followed by BayesNet (91.84%). LogitBoost also gave the highest Kappa value (0.9261). Considering both the accuracy values and the Kappa statistics, these algorithms are very successful at classifying multi-parametric datasets. Thus, BayesNet, Random Forest and LogitBoost were identified as the most successful classification algorithms for Turkish olive oils. Using the classification model in this study, the geographic origin of an unknown olive oil can be predicted with high accuracy. Similar models could also be developed to obtain useful information for the PDO and PGI of other food products.


Acknowledgements

This study was supported by the Ege University Council of Scientific Research Projects (Project No. 14-MUH-063 BAP project). The chemical analyses in this work were supported by the Ege University Drug Research and Pharmacokinetic Development and Applied Center (ARGEFAR).

Compliance with ethical standards

Conflict of interest

The authors have declared that they have no conflict of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Ai FF, Bin J, Zhang ZM, Huang JH, Wang JB, Liang YZ, Yu L, Yang ZY. Application of random forests to select premium quality vegetable oils by their fatty acid composition. Food Chem. 2014;143:472–478. doi: 10.1016/j.foodchem.2013.08.013. [DOI] [PubMed] [Google Scholar]
  2. Aparicio R, Morales MT, Aparicio-Ruiz R, Tena N, García-González DL. Authenticity of olive oil: mapping and comparing official methods and promising alternatives. Food Res Int. 2013;54:2025–2038. doi: 10.1016/j.foodres.2013.07.039. [DOI] [Google Scholar]
  3. Bajoub A, Ajal EA, Fernández-Gutiérrez A, Carrasco-Pancorbo A. Evaluating the potential of phenolic profiles as discriminant features among extra virgin olive oils from Moroccan controlled designations of origin. Food Res Int. 2016;84:41–51. doi: 10.1016/j.foodres.2016.03.010. [DOI] [Google Scholar]
  4. Bakhouche A, Lozáno-Sanchez J, Fernández-Gutiérrez A, Carretero AS (2015) Trends in chemical characterization of virgin olive oil phenolic profile: an overview and new challenges. Olivea 3–15. www.internationaloliveoil.org/store/download/92
  5. Beltrán M, Sánchez-Astudillo M, Aparicio R, García-González DL. Geographical traceability of virgin olive oils from south-western Spain by their multi-elemental composition. Food Chem. 2015;169:350–357. doi: 10.1016/j.foodchem.2014.07.104. [DOI] [PubMed] [Google Scholar]
  6. Breiman L, Cutler A (2005). Random forests. Berkeley
  7. Buscema M, Consonni V, Ballabio D, Mauri A, Massini G, Breda M, Todeschini R. K-CM: a new artificial neural network. Application to supervised pattern recognition. Chemom Intell Lab Syst. 2014;138:110–119. doi: 10.1016/j.chemolab.2014.06.013. [DOI] [Google Scholar]
  8. Camin F, Larcher R, Perini M, Bontempo L, Bertoldi D, Gagliano G, Nicolini G, Versini G. Characterisation of authentic Italian extra-virgin olive oils by stable isotope ratios of C, O and H and mineral composition. Food Chem. 2010;118:901–909. doi: 10.1016/j.foodchem.2008.04.059. [DOI] [Google Scholar]
  9. Christopher A, Andrew M, Stefan S. Locally weighted learning. Artif Intell Rev. 1997;11:11–73. doi: 10.1023/A:1006559212014. [DOI] [Google Scholar]
  10. Cleary JG, Trigg LE. K*: an instance-based learner using an entropic distance measure. Proc Twelveth Int Conf Mach Learn. 1995;5:108–114. [Google Scholar]
  11. Drivelos S, Georgiou C. Multi-element and multi-isotope-ratio analysis to determine the geographical origin of foods in the European Union. TrAC Trends Anal Chem. 2012;40:38–51. doi: 10.1016/j.trac.2012.08.003. [DOI] [Google Scholar]
  12. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors) Ann Stat. 2000;28(2):337–407. doi: 10.1214/aos/1016218223. [DOI] [Google Scholar]
  13. García-González DL, Luna G, Morales MT, Aparicio R. Stepwise geographical traceability of virgin olive oils by chemical profiles using artificial neural network models. Eur J Lipid Sci Technol. 2009;111:1003–1013. doi: 10.1002/ejlt.200900015. [DOI] [Google Scholar]
  14. Gonzalvez A, Armenta S, de la Guardia M. Trace-element composition and stable-isotope ratio for discrimination of foods with protected designation of origin. TrAC Trends Anal Chem. 2009;28:1295–1311. doi: 10.1016/j.trac.2009.08.001. [DOI] [Google Scholar]
  15. Gumus ZP, Celenk VU, Tekin S, Yurdakul O, Ertas H. Determination of trace elements and stable carbon isotope ratios in virgin olive oils from Western Turkey to authenticate geographical origin with a chemometric approach. Eur Food Res Technol. 2017;243:1719–1727. doi: 10.1007/s00217-017-2876-4. [DOI] [Google Scholar]
  16. Gumus ZP, Ertas H, Yasar E, Gumus O. Classification of olive oils using chromatography, principal component analysis and artificial neural network modelling. J Food Meas Charact. 2018;12:1325–1333. doi: 10.1007/s11694-018-9746-z.
  17. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I. The WEKA data mining software: an update. ACM SIGKDD Explor Newsl. 2009;11:10–18. doi: 10.1145/1656274.1656278.
  18. Huang X, Shi L, Suykens JAK. Sequential minimal optimization for SVM with pinball loss. Neurocomputing. 2015;149:1596–1603. doi: 10.1016/j.neucom.2014.08.033.
  19. Karabagias I, Michos C, Badeka A, Kontakos S, Stratis I, Kontominas MG. Classification of Western Greek virgin olive oils according to geographical origin based on chromatographic, spectroscopic, conventional and chemometric analyses. Food Res Int. 2013;54:1950–1958. doi: 10.1016/j.foodres.2013.09.023.
  20. Karakatič S, Podgorelec V. Improved classification with allocation method and multiple classifiers. Inf Fusion. 2016;31:26–42. doi: 10.1016/j.inffus.2015.12.006.
  21. Kavitha AP, Jaleel UCA, Mujeeb VMA, Muraleedharan K. Performance of knowledge-based biological models in higher dimensional chemical space. Chemom Intell Lab Syst. 2016;153:58–66. doi: 10.1016/j.chemolab.2016.02.009.
  22. Kelly S, Heaton K, Hoogewerff J. Tracing the geographical origin of food: the application of multi-element and multi-isotope analysis. Trends Food Sci Technol. 2005;16:555–567. doi: 10.1016/j.tifs.2005.08.008.
  23. Longobardi F, Ventrella A, Casiello G, Sacco D, Tasioula-Margari M, Kiritsakis K, Kontominas MG. Characterisation of the geographical origin of Western Greek virgin olive oils based on instrumental and multivariate statistical analysis. Food Chem. 2012;133:169–175. doi: 10.1016/j.foodchem.2011.09.130.
  24. Loubiri A, Taamalli A, Talhaoui N, Mohamed SN, Carretero AS, Zarrouk M. Usefulness of phenolic profile in the classification of extra virgin olive oils from autochthonous and introduced cultivars in Tunisia. Eur Food Res Technol. 2017;243(3):467–479. doi: 10.1007/s00217-016-2760-7.
  25. Nasibov E, Kantarcı S, Vahaplar A, Kınay AÖ. A survey on geographic classification of virgin olive oil with using T-operators in fuzzy decision tree approach. Chemom Intell Lab Syst. 2016;155:86–96. doi: 10.1016/j.chemolab.2016.04.004.
  26. Nettleton DF, Orriols-Puig A, Fornells A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev. 2010;33:275–306. doi: 10.1007/s10462-010-9156-z.
  27. Parlos AG, Fernandez B, Atiya AF, Muthusami J, Tsai WK. An accelerated learning algorithm for multilayer perceptron networks. IEEE Trans Neural Netw. 1994;5:493–497. doi: 10.1109/72.286921.
  28. Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Los Altos: Morgan Kaufmann; 1988.
  29. Petrakis PV, Agiomyrgianaki A, Christophoridou S, Spyros A, Dais P. Geographical characterization of Greek virgin olive oils (cv. Koroneiki) using 1H and 31P NMR fingerprinting with canonical discriminant analysis and classification binary trees. J Agric Food Chem. 2008;56:3200–3207. doi: 10.1021/jf072957s.
  30. RandomForest. http://www.stat.berkeley.edu/~breiman/RandomForests/. Accessed 09 June 2019.
  31. Romero JR, Roncallo PF, Akkiraju PC, Ponzoni I, Echenique VC, Carballido JA. Using classification algorithms for predicting durum wheat yield in the province of Buenos Aires. Comput Electron Agric. 2013;96:173–179. doi: 10.1016/j.compag.2013.05.006.
  32. Ropodi AI, Panagou EZ, Nychas GJE. Data mining derived from food analyses using non-invasive/non-destructive analytical techniques; determination of food authenticity, quality & safety in tandem with computer science disciplines. Trends Food Sci Technol. 2016;50:11–25. doi: 10.1016/j.tifs.2016.01.011.
  33. Ruiz-Samblás C, Cadenas JM, Pelta DA, Cuadros-Rodríguez L. Application of data mining methods for classification and prediction of olive oil blends with other vegetable oils. Anal Bioanal Chem. 2014;406:2591–2601. doi: 10.1007/s00216-014-7677-z.
  34. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360.
  35. WEKA. http://www.cs.waikato.ac.nz/ml/weka/. Accessed 09 June 2019.

Articles from Journal of Food Science and Technology are provided here courtesy of Springer