
Table 1. Summary of benefits, assumptions and limitations of different machine learning algorithms

Decision Tree (Quinlan 1986)

  Benefits:
  • easy to understand and efficient training algorithm
  • order of training instances has no effect on training
  • pruning can deal with the problem of overfitting

  Assumptions and/or Limitations:
  • classes must be mutually exclusive
  • final decision tree is dependent upon the order of attribute selection
  • errors in the training set can result in overly complex decision trees
  • missing values for an attribute make it unclear which branch to take when that attribute is tested
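
To make the pruning point concrete, here is a minimal sketch of training a depth-limited decision tree; scikit-learn, the iris dataset, and the depth limit are illustrative choices for the example, not part of the original table.

```python
# A minimal pruned-decision-tree sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Limiting max_depth is a simple pre-pruning control: it caps tree
# complexity so errors in the training set are less likely to produce
# an overly complex tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```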

Naïve Bayes (Langley et al 1992)

  Benefits:
  • foundation based on statistical modelling
  • easy to understand and efficient training algorithm
  • order of training instances has no effect on training
  • useful across multiple domains

  Assumptions and/or Limitations:
  • assumes attributes are statistically independent*
  • assumes normal distribution on numeric attributes
  • classes must be mutually exclusive
  • redundant attributes mislead classification
  • attribute and class frequencies affect accuracy
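
As an illustration of the independence and normality assumptions listed above, the following sketch uses Gaussian naïve Bayes; scikit-learn and the iris dataset are assumptions for the example, not part of the table.

```python
# A minimal naive Bayes sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB fits one normal distribution per attribute and class, then
# multiplies the per-attribute likelihoods as if attributes were
# statistically independent, exactly the assumptions listed above.
nb = GaussianNB()
nb.fit(X_train, y_train)
print("test accuracy:", nb.score(X_test, y_test))
```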

k-Nearest Neighbour (Patrick & Fischer 1970; Aha 1992)

  Benefits:
  • fast classification of instances
  • useful for non-linear classification problems
  • robust with respect to irrelevant or novel attributes
  • tolerant of noisy instances or instances with missing attribute values
  • can be used for both regression and classification

  Assumptions and/or Limitations:
  • slower to update concept description
  • assumes that instances with similar attributes will have similar classifications
  • assumes that attributes will be equally relevant
  • becomes computationally expensive as the number of attributes increases
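
Because distance-based classification weights every attribute equally, scaling the attributes usually matters in practice; the sketch below pairs a standard scaler with a k-NN classifier. scikit-learn, k = 5, and the iris dataset are illustrative assumptions.

```python
# A minimal k-nearest-neighbour sketch (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardizing first matters because Euclidean distance treats every
# attribute as equally relevant, as the table notes.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```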

Neural Network (Rumelhart et al 1986)

  Benefits:
  • can be used for classification or regression
  • able to represent Boolean functions (AND, OR, NOT)
  • tolerant of noisy inputs
  • instances can be classified by more than one output

  Assumptions and/or Limitations:
  • difficult to understand the structure of the algorithm
  • too many attributes can result in overfitting
  • optimal network structure can only be determined by experimentation
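
The sketch below trains a small feed-forward network; the single hidden layer of 10 units stands in for the structural choice the table says can only be found by experimentation. scikit-learn's MLPClassifier and the iris dataset are assumptions for the example.

```python
# A minimal feed-forward neural network sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# hidden_layer_sizes is the structure that, per the table, can only be
# tuned by experimentation; (10,) is one illustrative guess.
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```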

Support Vector Machine (Vapnik 1982; Russell and Norvig, pp 749–52)

  Benefits:
  • models nonlinear class boundaries
  • overfitting is unlikely to occur
  • computational complexity reduced to a quadratic optimization problem
  • easy to control the complexity of the decision rule and the frequency of error

  Assumptions and/or Limitations:
  • training is slow compared to Bayes and Decision Trees
  • difficult to determine optimal parameters when training data are not linearly separable
  • difficult to understand the structure of the algorithm
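
A kernelized SVM illustrates how nonlinear class boundaries are modelled while a single parameter, C, trades decision-rule complexity against training error; scikit-learn, the RBF kernel, and C = 1.0 are illustrative assumptions for this sketch.

```python
# A minimal support vector machine sketch (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel yields nonlinear class boundaries; C controls the
# trade-off between decision-rule complexity and training error.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```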

Genetic Algorithm (Holland 1975)

  Benefits:
  • simple algorithm, easy to implement
  • can be used in feature classification and feature selection
  • primarily used in optimization
  • always finds a “good” solution (not always the best solution)

  Assumptions and/or Limitations:
  • computation or development of the scoring function is non-trivial
  • not the most efficient method for finding optima; tends to find local optima rather than the global optimum
  • complications involved in the representation of training/output data
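
To show the selection/crossover/mutation loop, and why the scoring (fitness) function is the hard part, here is a small genetic algorithm on a toy “count the ones” bit-string problem; the fitness function, population size, and mutation rate are all illustrative choices, not values from the paper.

```python
# A minimal genetic algorithm sketch on a toy bit-string problem.
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS, MUT_RATE = 20, 30, 50, 0.05

def fitness(genome):
    # Toy scoring function (count the ones); designing one for a real
    # problem is the non-trivial step the table warns about.
    return sum(genome)

def mutate(genome):
    return [1 - g if random.random() < MUT_RATE else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def select(population):
    # Tournament selection: the fittest of a small random sample survives.
    return max(random.sample(population, 3), key=fitness)

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
# A "good" solution is found reliably, but it may be a local optimum.
print("best fitness:", fitness(best), "of", GENOME_LEN)
```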