
Table 3.

Benefits and limitations of machine and deep learning models (illustrative code sketches for several of these models are given after the table).

Naïve Bayes
Benefits: Needs relatively little training data; its probabilistic approach handles both continuous and discrete data; it is not sensitive to irrelevant features and is easy to update.
Limitations: Data scarcity can lead to a loss of accuracy, because the model is based on the assumption that any two features are independent given the output class.

SVM
Benefits: Can be applied to unstructured data such as text and images; the kernel gives the algorithm its strength and allows it to work with high-dimensional data.
Limitations: Training takes a long time on large data sets; choosing a good kernel function is difficult, and suitable key parameters vary from problem to problem.

KNN
Benefits: Can be used for both classification and regression problems; produces good results when large (even noisy) training data is available; well suited to multi-class problems.
Limitations: Computing the distance to every instance is costly; finding suitable attributes for distance-based learning is difficult; imbalanced data causes problems; and missing values require separate treatment.

Decision tree
Benefits: Reduces ambiguity in decision-making; implicitly performs feature selection; is easy to represent and interpret; and requires little effort for data preparation.
Limitations: Unstable, because small changes in the data can require rebuilding the whole tree; not well suited to continuous values; prone to overfitting.

Boosted decision tree
Benefits: Highly interpretable, with improved prediction accuracy; can model feature interactions and perform feature selection on its own; gradient-boosted trees are less prone to overfitting because each tree is trained on a randomly selected subset of the training data.
Limitations: Computationally expensive; frequently needs a large number of trees (>1,000), which can take a long time to train and consume a lot of memory.

Random forest
Benefits: In contrast to other methods, the ensemble of decision trees is very easy to train, and little preparation or preprocessing of the input data is required.
Limitations: More trees increase the time complexity of the prediction stage, and there is a high chance of overfitting.

CNN
Benefits: Provides fast predictions, is best suited to large volumes of data, and requires no human effort for feature design.
Limitations: Computationally expensive and requires a large data set for training.

RNN
Benefits: Implements a feedback model, so it is well suited to time-series problems and makes more accurate predictions than other ANN models.
Limitations: Training is difficult and it takes a long time to capture nonlinearity in the data, and the vanishing-gradient problem occurs.

LSTM, Bi-LSTM
Benefits: Adds short- and long-term memory components to the RNN, so it is well suited to sequential applications and is used to solve NLP problems such as text classification and text generation; computing speed is high. Bi-LSTM addresses the problem of fixed sequence-to-sequence prediction.
Limitations: Expensive and complex because of backpropagation through time; it increases the dimensionality of the problem and makes the optimal solution harder to find. Because Bi-LSTM uses two LSTM cells, it is expensive to implement.

Gated RNN (GRU)
Benefits: In natural language processing, GRUs learn faster and perform better than LSTMs when less training data is available, since they require fewer training parameters. GRUs are simpler and do not need separate memory units, so they are easier to modify, for example by adding extra gates if the network requires more input.
Limitations: Slow convergence and limited learning efficiency remain issues with GRUs.

Transformer with an attention mechanism
Benefits: RNNs and CNNs cannot keep track of context and content when sentences are too long. The attention mechanism resolves this limitation by selectively concentrating on a few important items, relative to the word currently being processed, while ignoring the rest. It also enables much more parallelization than RNNs and thus reduces training time.
Limitations: Strongly compute-intensive at inference time.
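
To make the classical-model rows above concrete, the following minimal sketch (our own illustration, not code from the paper) fits Naïve Bayes, SVM, KNN, a decision tree, a random forest, and gradient-boosted trees on a small benchmark data set through scikit-learn's common estimator interface. The data set, hyperparameters, and scaling choices are illustrative assumptions only.

```python
# Minimal, illustrative sketch (assumed settings, not the paper's experiment):
# fit the classical models from Table 3 on a toy data set and compare accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Boosted trees": GradientBoostingClassifier(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # training cost differs widely across these models
    print(f"{name:20s} test accuracy: {model.score(X_test, y_test):.3f}")
```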
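
The recurrent-model rows (LSTM, Bi-LSTM, and GRU) can be illustrated in the same spirit. The Keras sketch below is an assumption for illustration rather than the paper's implementation: it builds the three recurrent text classifiers side by side, so the printed parameter counts reflect the cost differences noted in the table (Bi-LSTM roughly doubles the recurrent parameters; GRU uses fewer than LSTM). Vocabulary size, sequence length, and layer widths are placeholder values.

```python
# Minimal sketch (assumed toy settings): LSTM, Bi-LSTM, and GRU classifiers in Keras.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, EMBED_DIM = 10_000, 100, 64  # illustrative placeholders

def build_classifier(recurrent_layer):
    """Embed token ids, run one recurrent layer, and predict a binary label."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        recurrent_layer,
        layers.Dense(1, activation="sigmoid"),
    ])

lstm_model = build_classifier(layers.LSTM(64))
bilstm_model = build_classifier(layers.Bidirectional(layers.LSTM(64)))  # two LSTM cells, roughly double the cost
gru_model = build_classifier(layers.GRU(64))                            # fewer parameters than the LSTM

for name, m in [("LSTM", lstm_model), ("Bi-LSTM", bilstm_model), ("GRU", gru_model)]:
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    print(name, "parameters:", m.count_params())
```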
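
Finally, the Transformer row refers to attention; a small NumPy sketch of scaled dot-product self-attention (again an illustrative assumption, not the paper's configuration) shows how each token's query is weighted against every key so that the model selectively concentrates on the important items in the sequence.

```python
# Minimal NumPy sketch of scaled dot-product self-attention (assumed toy shapes).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to every key; softmax weights pick out the important items."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V, weights                           # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                                   # five tokens, eight-dimensional embeddings
Q = K = V = rng.normal(size=(seq_len, d_model))           # self-attention: Q, K, V from the same tokens
context, attn = scaled_dot_product_attention(Q, K, V)
print("attention weights per token:\n", attn.round(2))
```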