2021 Mar 16;45(5):fuab015. doi: 10.1093/femsre/fuab015

Table 3.

Publicly available R packages for machine learning.

| Algorithm name | R package/function |
| --- | --- |
| Decision tree^a | rpart |
| Random forest^b | randomForest |
| k-Nearest Neighbour classifier | knn function in the class package |
| Naive Bayes classifier | naiveBayes function in the e1071 package |
| Neural network | nnet function in the nnet package |
| Support vector machine^c | ksvm function in the kernlab package |
| t-Distributed Stochastic Neighbour Embedding (t-SNE) | Rtsne package, an R implementation of the t-SNE dimensionality-reduction procedure |
| GLM (generalized linear model)-based Ordination Method for Microbiome Samples (GOMMS) | gomms package, an R implementation of the GOMMS ordination method |
| Agglomerative nested (AGNES) clustering and other clustering methods | agnes function in the cluster package, which implements various popular clustering methods; hclust, part of the core R stats package, also implements popular clustering procedures |
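As a runnable sketch of the clustering entry above, hclust needs no extra packages since it ships with base R (the cluster package's agnes function offers comparable agglomerative methods); the built-in USArrests data is used purely for illustration:

```r
# Agglomerative hierarchical clustering with base R's hclust.
d  <- dist(USArrests, method = "euclidean")  # pairwise distance matrix
hc <- hclust(d, method = "average")          # agglomerative (average linkage)

clusters <- cutree(hc, k = 4)                # cut dendrogram into 4 groups
table(clusters)                              # cluster sizes
```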
^a Notable arguments for the rpart function are method = "class", which builds a classification model, and parms = list(split = "information"), which uses the information-gain criterion for deciding between alternative splits (the alternative criterion is based on the Gini index of diversity).
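A minimal sketch of these arguments in use, assuming the rpart package is installed and using the built-in iris data purely for illustration:

```r
# Classification tree with rpart: method = "class" requests a
# classification model; parms = list(split = "information") selects
# the information-gain splitting criterion (the default is Gini).
library(rpart)

fit <- rpart(Species ~ ., data = iris,
             method = "class",
             parms = list(split = "information"))

print(fit)                                 # inspect the fitted splits
pred <- predict(fit, iris, type = "class") # predicted class labels
mean(pred == iris$Species)                 # training accuracy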

^b The randomForest function allows the user to vary the number of decision trees and/or the number of variables tried at each split. The extractor function importance, contained in the randomForest package, measures variable importance with respect to a fitted RF model. The out-of-bag (OOB) error estimate can be averaged over multiple runs; it indicates roughly how often the model's classification predictions are expected to be wrong when applied to new data.
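A hedged sketch of these options, assuming the randomForest package is installed; ntree and mtry are the arguments that vary the number of trees and the number of variables tried per split:

```r
# Random forest on the built-in iris data (illustration only).
library(randomForest)

set.seed(1)                                # reproducible tree-building
rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,            # number of decision trees
                   mtry  = 2,              # variables tried at each split
                   importance = TRUE)      # record importance measures

rf$err.rate[500, "OOB"]                    # out-of-bag error estimate
importance(rf)                             # per-variable importance
```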

^c The package provides several kernels (e.g. radial basis 'Gaussian', polynomial, linear, hyperbolic tangent, Laplacian, Bessel, ANOVA RBF and spline) that transform the data into a high-dimensional feature space. There are also several model types (e.g. C, nu and bound-constraint classification), which determine the separating hyperplane.
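A minimal sketch assuming the kernlab package is installed; the kernel argument selects the transformation (rbfdot is the radial basis 'Gaussian' kernel) and the type argument selects the model type (C-svc is C-classification):

```r
# Support vector machine with kernlab's ksvm (iris data, illustration only).
library(kernlab)

svm_fit <- ksvm(Species ~ ., data = iris,
                kernel = "rbfdot",   # Gaussian radial basis kernel
                type   = "C-svc",    # C-classification model type
                C      = 1)          # regularization constant

error(svm_fit)                       # training error of the fitted model
```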