Table 2. Advantages and limitations of common supervised machine-learning algorithms.
Supervised algorithm | Advantages | Limitations |
---|---|---|
Artificial neural network (ANN) | - Can detect complex nonlinear relationships between dependent and independent variables.<br>- Requires less formal statistical training.<br>- Multiple training algorithms are available.<br>- Can be applied to both classification and regression problems. | - Behaves as a ‘black box’: the user cannot inspect the exact decision-making process.<br>- Computationally expensive to train for a complex classification problem.<br>- Predictor (independent) variables require pre-processing. |
Decision tree (DT) | - The resultant classification tree is easy to understand and interpret.<br>- Data preparation is easier.<br>- Multiple data types (numeric, nominal, categorical) are supported.<br>- Can generate robust classifiers that can be validated using statistical tests. | - Requires classes to be mutually exclusive.<br>- The algorithm cannot branch if an attribute or variable value for a non-leaf node is missing.<br>- The algorithm depends on the order of the attributes or variables.<br>- Does not perform as well as some other classifiers (e.g., artificial neural networks) [80]. |
K-nearest neighbour (KNN) | - Simple, lazy-learning algorithm that requires no explicit training phase.<br>- Can handle noisy instances or instances with missing attribute values.<br>- Can be used for both classification and regression. | - Classification becomes computationally expensive as the number of attributes increases.<br>- All attributes are given equal importance, which can lead to poor classification performance.<br>- Provides no information about which attributes are most effective for a good classification. |
Logistic regression (LR) | - Straightforward to implement.<br>- LR-based models can be updated easily.<br>- Makes no assumptions about the distribution of the independent variable(s).<br>- Offers a clean probabilistic interpretation of model parameters. | - Accuracy suffers when input variables have complex (nonlinear) relationships.<br>- Assumes a linear relationship between the independent variables and the log-odds.<br>- Its key components, logit models, are vulnerable to overconfidence.<br>- May overstate prediction accuracy due to sampling bias.<br>- Unless extended to the multinomial case, generic LR can only classify outcomes with two states (i.e., dichotomous). |
Naïve Bayes (NB) | - Simple and very useful for large datasets.<br>- Can be used for both binary and multi-class classification problems.<br>- Requires relatively little training data.<br>- Makes probabilistic predictions and can handle both continuous and discrete data. | - Classes must be mutually exclusive.<br>- Dependencies between attributes negatively affect classification performance.<br>- Assumes numeric attributes are normally distributed. |
Random forest (RF) | - Lower variance and less overfitting of training data than a single DT, since RF averages the outcomes of its constituent decision trees.<br>- Empirically, this ensemble-based classifier performs better than its individual base classifiers, i.e., DTs.<br>- Scales well to large datasets.<br>- Can estimate which variables or attributes are important in the classification. | - More complex and computationally expensive.<br>- The number of base classifiers must be specified.<br>- When estimating variable importance, it favours attributes that can take a large number of distinct values.<br>- Overfitting can still occur. |
Support vector machine (SVM) | - More robust than LR.<br>- Can handle high-dimensional feature spaces.<br>- Lower risk of overfitting.<br>- Performs well when classifying semi-structured or unstructured data such as text and images. | - Computationally expensive for large, complex datasets.<br>- Does not perform well on noisy data.<br>- The resultant model and the weight and impact of variables are often difficult to interpret.<br>- Generic SVM cannot classify more than two classes unless extended. |
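To make the simplicity noted for KNN in Table 2 concrete, the following is a minimal from-scratch sketch; the toy dataset, the function name `knn_predict`, and the choice of k = 3 are illustrative assumptions, not part of the surveyed work:

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Compute the distance from the query to every training instance.
    # This full scan is why KNN grows expensive as the data grow.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Illustrative toy dataset: two well-separated 2-D clusters.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]

print(knn_predict(train, (1.1, 1.0)))  # query near the first cluster
print(knn_predict(train, (5.1, 5.0)))  # query near the second cluster
```

Note that there is no training step at all, which is why the limitations in Table 2 concentrate on prediction-time cost.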
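The NB limitation that numeric attributes are assumed normally distributed can also be seen directly in a minimal Gaussian Naive Bayes sketch; the helper names `gnb_fit`/`gnb_predict` and the toy data are assumptions for illustration:

```python
import math
from collections import defaultdict

def gnb_fit(train):
    """Fit per-class priors and per-feature Gaussian parameters.

    Each numeric feature is modelled as normally distributed within a
    class -- this is exactly the NB normality assumption.
    """
    by_class = defaultdict(list)
    for x, label in train:
        by_class[label].append(x)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / n + 1e-9  # avoid zero variance
                 for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(train), means, varis)  # prior, means, variances
    return model

def gnb_predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_post(prior, means, varis):
        lp = math.log(prior)
        # Sum of per-feature Gaussian log-likelihoods (features assumed independent).
        for v, m, s2 in zip(x, means, varis):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return lp
    return max(model, key=lambda label: log_post(*model[label]))

# Illustrative toy data: two 2-D clusters.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]
model = gnb_fit(train)
print(gnb_predict(model, (1.1, 1.0)))
```

The per-feature independence in `log_post` is also why dependencies between attributes, as noted in Table 2, degrade NB's performance.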