Table 1. Comparison of machine learning algorithms: potential applications, advantages, limitations, and accuracy.
Algorithm | Potential applications | Advantages | Limitations | Test accuracy | Cross-validation accuracy
---|---|---|---|---|---
Bayesian Network | Document classification, medical prognosis systems | Takes less time to train and can interpret the relationships among predictors. | Cannot handle high-dimensional data, and model efficiency decreases as the data set grows. | 0.73 | 0.77
Logistic Regression | Crash types and injury severity | Handles nonlinearity in the data and interprets the output as a probability. | Suffers from multicollinearity and requires a large data set to provide stable results. | 0.75 | 0.79
Random Forests | Patient clustering, object detection, and classification of microarray data | Scalable, fast, robust to noise, does not overfit, and provides explanation and visualization of the output. | Slows down as the number of trees increases. | 0.77 | 0.82
SVM | Text classification | High accuracy, does not overfit, performance is largely independent of the feature set, and generalizes well. | Slow training, a highly complex model, and performance depends strongly on the selected parameters. | 0.76 | 0.82
k-NN | Vision and computational geometry | Suitable for multi-modal classes and independent of the joint distribution of the sample points. | Low efficiency; the output depends on the chosen value of k; adversely affected by noise and irrelevant features; performance varies with the size of the data set. | 0.63 | 0.81
Neural Networks | Image classification | Handles relationships that may be nonlinear or dynamic, is independent of the distribution of the variables, and is robust to irrelevant inputs and noise. | Slow to train; performance is sensitive to the chosen parameters and the size of the hidden layers. | 0.74 | 0.81
The difference between cross-validation and test accuracy indicates the degree of overfitting in each model (Singh et al., 2016).
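For illustration, the sketch below shows how test and cross-validation accuracies like those in Table 1 could be computed with scikit-learn. The data set, train/test split, and hyperparameters are assumptions made for the example, not the setup behind the figures above, and Gaussian naive Bayes is used as a simple stand-in for a Bayesian network, which scikit-learn does not implement directly.

```python
# A minimal sketch (not the authors' code) of computing test and
# cross-validation accuracies for the algorithms listed in Table 1.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data set; the data behind Table 1 is not specified here.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# GaussianNB is an assumed proxy for the Bayesian network row.
models = {
    "Bayesian Network (naive Bayes proxy)": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Random Forests": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Neural Network": MLPClassifier(max_iter=2000, random_state=0),
}

for name, clf in models.items():
    # Feature scaling helps the distance- and gradient-based models
    # (SVM, k-NN, neural network) and is harmless for the others.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    test_acc = model.score(X_test, y_test)  # held-out test accuracy
    cv_acc = cross_val_score(model, X_train, y_train, cv=5).mean()  # 5-fold CV
    # As noted above, a large gap between the two figures signals overfitting.
    print(f"{name}: test={test_acc:.2f}, cv={cv_acc:.2f}, "
          f"gap={cv_acc - test_acc:+.2f}")
```

Read against Table 1, the gap printed on each line plays the role of the comparison in the text: a cross-validation accuracy well above the test accuracy, as for k-NN (0.81 vs. 0.63), points to a model that generalizes poorly beyond the validation folds.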