2023 Dec 21;15(2):e02050-23. doi: 10.1128/mbio.02050-23

TABLE 3.

Examples of machine learning models used in microbial ecology

| Model name | Purpose/description | Use case/limitations |
|---|---|---|
| Support vector machines (SVMs) | Segregate the data into two groups on either side of a linear boundary, maximizing the distance between them. Despite the linear nature of SVMs, they can be used on data that are not linearly separable (for example, through kernel functions). | Input features need to be scaled to prevent outliers from unduly impacting the distance metric. SVMs tend to perform poorly on noisy data and big data. |
| Naïve Bayes classifiers | Classifiers that use posterior probabilities based on Bayes' theorem. | Fast even on high-dimensional data and less sensitive to scaling. Assume independence of features. |
| Linear/logistic regression | Supervised probabilistic model for binary classification. | Easy to interpret and simple to implement, but often performs poorly on non-linear classification problems and non-normal data. |
| Random forest | An ensemble method that relies on the majority consensus of decision trees fitted to resampled data. | Resilient to overfitting and less dependent on scaling or normalization procedures. Cannot predict values outside the range of the training data in regression problems. |
| XGBoost | An ensemble of decision trees optimized to minimize a loss function using gradient descent. | More reliable than random forests for unbalanced classes, and preferable to random forests when the aim is to decrease bias. |
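
The scaling caveat listed for support vector machines in Table 3 can be illustrated without fitting an SVM at all, since it concerns the distance metric itself. The sketch below is a minimal illustration using only the Python standard library; the helper names `standardize` and `nearest` and the four samples (pH, read count) are hypothetical and were chosen only to make the effect visible.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(rows, index):
    """Return the index of the row closest (Euclidean) to rows[index]."""
    others = [i for i in range(len(rows)) if i != index]
    return min(others, key=lambda i: math.dist(rows[index], rows[i]))


# Hypothetical samples: [pH, read count]. The second feature has a far
# larger numeric range, so it dominates unscaled distances.
samples = [
    [5.0, 1000.0],
    [9.0, 1010.0],  # very different pH, almost identical read count
    [5.1, 1200.0],  # almost identical pH, moderately different read count
    [7.0, 3000.0],
]

print(nearest(samples, 0))               # → 1
print(nearest(standardize(samples), 0))  # → 2
```

Before scaling, the sample with a pH four units away is treated as the closest neighbor because read counts dominate the distance. After standardization, the sample with a nearly identical pH becomes the closest one.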
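
The naïve Bayes entry in Table 3 rests on two ideas: class posteriors from Bayes' theorem, and the assumption that features are independent within a class. A minimal Gaussian variant can be sketched as follows, assuming continuous features. The function names, the variance floor, and the toy relative-abundance data are hypothetical and serve illustration only.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Estimate a class prior plus a per-feature mean and variance per class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        columns = list(zip(*members))
        model[label] = {
            "log_prior": math.log(len(members) / len(rows)),
            "means": [mean(col) for col in columns],
            # A small floor keeps the variance strictly positive.
            "vars": [pvariance(col) + var_floor for col in columns],
        }
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest (unnormalized) log posterior."""
    def log_posterior(params):
        total = params["log_prior"]
        # Independence assumption: per-feature log likelihoods simply add up.
        for x, m, v in zip(row, params["means"], params["vars"]):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical relative abundances of two taxa in two groups of samples.
rows = [[0.80, 0.10], [0.70, 0.20], [0.75, 0.15],
        [0.20, 0.70], [0.30, 0.60], [0.25, 0.65]]
labels = ["healthy"] * 3 + ["diseased"] * 3

model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, [0.72, 0.18]))
print(predict_gaussian_nb(model, [0.22, 0.68]))
```

Training only requires one pass to compute per-class means and variances, which is why this family of classifiers stays fast on high-dimensional data. The summation inside `log_posterior` is where correlated features would be counted twice.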
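
The random forest limitation noted in Table 3, that regression predictions cannot leave the range of the training data, follows from how tree leaves work: each leaf predicts an average of training targets. The sketch below is a deliberately simplified stand-in for a random forest (one feature, bootstrap resampling only, no random feature selection), and all function names and parameters are hypothetical.

```python
import random
from statistics import mean


def sum_sq(points):
    """Sum of squared deviations of the targets from their mean."""
    ys = [y for _, y in points]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_tree(points, min_leaf=2):
    """Fit a 1-D regression tree; a leaf stores the mean of its targets."""
    points = sorted(points)
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold can separate identical x values
        sse = sum_sq(points[:i]) + sum_sq(points[i:])
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return mean(y for _, y in points)  # leaf node
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold, fit_tree(points[:i], min_leaf), fit_tree(points[i:], min_leaf))


def predict_tree(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(points, n_trees=25, seed=0):
    """Fit each tree on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_tree([rng.choice(points) for _ in points]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in forest)


# Training data follow y = 2x for x in 0..10, so targets span 0 to 20.
train = [(float(x), 2.0 * x) for x in range(11)]
forest = fit_forest(train)

# The true value at x = 100 would be 200, far outside the training range.
prediction = predict_forest(forest, 100.0)
print(prediction)
```

Because every leaf is an average of training targets and the forest averages its trees, the prediction at `x = 100` cannot exceed 20, the largest target seen in training, however strong the underlying trend is.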