2023 Dec 21;15(2):e02050-23. doi: 10.1128/mbio.02050-23

TABLE 3.

Examples of machine learning models used in microbial ecology

Model name	Purpose/description	Use case/limitations
Support vector machines	Find the linear boundary (hyperplane) that separates the data into two groups while maximizing the margin, the distance between the boundary and the nearest points of each group. Although the boundary is linear, kernel functions allow SVMs to be applied to data that are not linearly separable	Input features need to be scaled so that features with large numeric ranges do not unduly influence the distance metric. SVMs tend to perform poorly on noisy data and on very large data sets.
Naïve Bayes classifiers	Classifiers that assign classes using posterior probabilities computed with Bayes' theorem	Fast even on high-dimensional data and less sensitive to scaling. Assume that features are independent of one another given the class.
Linear/logistic regression	Supervised models: linear regression predicts a continuous outcome, and logistic regression estimates class probabilities for binary classification	Easy to interpret and simple to implement, but often perform poorly on non-linear classification problems and non-normal data.
Random forest	An ensemble method that relies on the majority consensus of decision trees trained on resampled data	Resilient to overfitting and less dependent on scaling or normalization procedures. Cannot predict values outside the range of the training data in regression problems.
XGBoost	An ensemble of decision trees built sequentially by gradient boosting, in which each new tree is fitted to reduce the loss of the current ensemble	Often more reliable than random forests for unbalanced classes and preferable to random forests when the aim is to decrease bias.
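
The regression limitation listed for random forests can be illustrated with a small sketch. The code below is not the implementation of any particular library. It is a simplified, standard-library-only stand-in that bags one-dimensional regression trees and averages their outputs (for regression, averaging takes the place of the majority vote). The function names, the toy data (y = 2x for x from 0 to 10), and the settings `max_depth`, `min_leaf`, and `n_trees` are all assumptions made for illustration. Because every leaf stores the mean of the training targets that fall into it, the ensemble's prediction stays within the range of targets seen during training, even for an input far outside that range.

```python
import random
from statistics import mean


def fit_tree(points, depth=0, max_depth=3, min_leaf=2):
    """Fit a 1-D regression tree to a list of (x, y) pairs.

    Returns either a float (leaf value) or a tuple
    (threshold, left_subtree, right_subtree).
    """
    ys = [y for _, y in points]
    if depth >= max_depth or len(points) < 2 * min_leaf:
        return mean(ys)  # leaf: mean of the training targets in this node

    points = sorted(points)
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold can separate identical x values
        left, right = points[:i], points[i:]
        mean_left = mean(y for _, y in left)
        mean_right = mean(y for _, y in right)
        # Sum of squared errors if each side predicts its own mean.
        sse = sum((y - mean_left) ** 2 for _, y in left)
        sse += sum((y - mean_right) ** 2 for _, y in right)
        if best is None or sse < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (sse, threshold, left, right)

    if best is None:
        return mean(ys)
    _, threshold, left, right = best
    return (
        threshold,
        fit_tree(left, depth + 1, max_depth, min_leaf),
        fit_tree(right, depth + 1, max_depth, min_leaf),
    )


def predict_tree(tree, x):
    """Walk down the tree until a leaf value (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def fit_forest(points, n_trees=25, seed=0):
    """Train each tree on a bootstrap resample of the training points."""
    rng = random.Random(seed)
    return [
        fit_tree([rng.choice(points) for _ in points])
        for _ in range(n_trees)
    ]


def predict_forest(forest, x):
    """Average the predictions of all trees (regression consensus)."""
    return mean(predict_tree(tree, x) for tree in forest)


# Toy training data (assumed for illustration): y = 2x on x in [0, 10].
train = [(float(x), 2.0 * x) for x in range(11)]
forest = fit_forest(train)

inside = predict_forest(forest, 5.0)     # within the training range
outside = predict_forest(forest, 100.0)  # far outside the training range

print(f"prediction at x=5:   {inside:.2f}")
print(f"prediction at x=100: {outside:.2f} (true value would be 200)")
```

In this sketch the prediction at x = 100 cannot exceed 20, the largest target in the training data, whereas a fitted linear regression would extrapolate along the trend. This is one reason to check whether new samples fall inside the range covered by the training set before applying a tree-based regressor.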