Decision Tree (Quinlan 1986)

Advantages:
- easy to understand, with an efficient training algorithm
- order of training instances has no effect on training
- pruning can deal with the problem of overfitting (sketched below)

Disadvantages:
- classes must be mutually exclusive
- the final decision tree depends on the order of attribute selection
- errors in the training set can result in overly complex decision trees
- missing values for an attribute make it unclear which branch to take when that attribute is tested
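A minimal sketch of the pruning point, assuming scikit-learn and its bundled iris data; the ccp_alpha value is an illustrative choice, not a tuned one:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree can grow until it memorises noise in the training set.
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# ccp_alpha > 0 applies cost-complexity pruning, trading tree size for
# generality (the 0.02 here is illustrative, not tuned).
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

print("unpruned leaves:", unpruned.get_n_leaves(), "test acc:", unpruned.score(X_test, y_test))
print("pruned leaves:  ", pruned.get_n_leaves(), "test acc:", pruned.score(X_test, y_test))
```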
Naïve Bayes (Langley et al. 1992)

Advantages:
- foundation based on statistical modelling
- easy to understand, with an efficient training algorithm
- order of training instances has no effect on training
- useful across multiple domains

Disadvantages:
- assumes attributes are statistically independent given the class (see the sketch below)
- assumes a normal distribution on numeric attributes
- classes must be mutually exclusive
- redundant attributes mislead classification
- attribute and class frequencies affect accuracy
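A minimal sketch of Gaussian Naive Bayes, which encodes both assumptions listed above: attributes are treated as independent given the class, and each numeric attribute as normally distributed (assuming scikit-learn and its bundled iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# GaussianNB fits one mean and one variance per attribute per class,
# treating attributes as conditionally independent given the class.
model = GaussianNB()
scores = cross_val_score(model, X, y, cv=5)
print("mean 5-fold CV accuracy:", scores.mean())
```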
k-Nearest Neighbour (Patrick & Fischer 1970; Aha 1992)

Advantages:
- fast classification of instances
- useful for non-linear classification problems
- robust with respect to irrelevant or novel attributes
- tolerant of noisy instances or instances with missing attribute values
- can be used for both regression and classification (classification sketched below)

Disadvantages:
- slower to update the concept description
- assumes that instances with similar attributes will have similar classifications
- assumes that all attributes are equally relevant
- computational cost grows as the number of attributes increases
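A minimal sketch of nearest-neighbour classification, assuming scikit-learn; k and the distance weighting are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" only stores the instances; the work of comparing a query
# against stored instances is paid at classification time.
# weights="distance" softens the assumption that all neighbours count equally.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```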
Neural Network (Rumelhart et al. 1986)

Advantages:
- can be used for classification or regression
- able to represent Boolean functions such as AND, OR, and NOT (see the XOR sketch below)
- tolerant of noisy inputs
- instances can be classified by more than one output

Disadvantages:
- the structure of the trained network is difficult to interpret
- too many attributes can result in overfitting
- the optimal network structure can only be determined by experimentation
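A minimal sketch of a small multilayer perceptron learning XOR, a Boolean function that no single-layer network can represent (assuming scikit-learn; the architecture and seed are illustrative, and on so few training points convergence can vary with the seed):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR: not linearly separable

# lbfgs suits tiny datasets; hidden layer size and seed were found by
# trial and error, mirroring the "determined by experimentation" caveat.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.predict(X))  # ideally [0 1 1 0]; may vary with the seed
```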
Support Vector Machine (Vapnik 1982; Russell and Norvig, pp. 749–752)

Advantages:
- models nonlinear class boundaries (sketched below)
- overfitting is unlikely to occur
- computational complexity reduced to a quadratic optimization problem
- easy to control the complexity of the decision rule and the frequency of error

Disadvantages:
- training is slow compared to Naïve Bayes and decision trees
- difficult to determine optimal parameters when the training data is not linearly separable
- the learned model is difficult to interpret
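A minimal sketch of an RBF-kernel SVM fitting a nonlinear class boundary on a synthetic two-moons dataset (assuming scikit-learn; C and gamma are illustrative rather than tuned):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not linearly separable in the input space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C trades margin width against training errors; gamma controls how
# flexible the RBF boundary is allowed to be.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```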
Genetic Algorithm (Holland 1975)

Advantages:
- simple algorithm, easy to implement
- can be used for feature classification and feature selection
- primarily used in optimization
- always finds a “good” solution, though not always the best one (sketched below)

Disadvantages:
- computation or development of the scoring function is non-trivial
- not the most efficient way to find some optima; tends to find local optima rather than global ones
- complications involved in the representation of training/output data
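A minimal, self-contained sketch of a genetic algorithm maximising a deliberately trivial “one-max” fitness function (count of 1-bits); all parameters are illustrative, and the trivial scoring function stands in for the non-trivial, domain-specific ones real problems require:

```python
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 50
MUTATION_RATE = 1.0 / GENOME_LEN

def fitness(genome):
    # Trivial scoring function; real problems need domain-specific scoring.
    return sum(genome)

def tournament(pop, k=3):
    # Select the fittest of k randomly drawn individuals.
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    # Single-point crossover between two parents.
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome):
    # Flip each bit with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP_SIZE)]

# Typically a good solution, not necessarily the global optimum.
best = max(pop, key=fitness)
print("best fitness:", fitness(best), "of", GENOME_LEN)
```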