Author manuscript; available in PMC: 2020 Oct 1.
Published in final edited form as: Mol Phylogenet Evol. 2019 Jul 16;139:106562. doi: 10.1016/j.ympev.2019.106562

Table 1.

Comparison of unsupervised machine learning methods used in this study.

| Method | Purpose | Approach used | General algorithm | Relevant output |
| --- | --- | --- | --- | --- |
| Random Forest (RF) | Classification and regression | Supervised / Unsupervised | Ensemble method that grows many classification trees based on training data, runs input data down trees, and the classification with the most votes is chosen. | Proximity matrix |
| Variational Autoencoder (VAE) | Generative model | Unsupervised | Compresses data through multiple encoding layers into latent variables, then un-compresses latent variables through multiple decoder layers into reconstructed data. Learns the marginal likelihood distribution of the data using latent variables. | Latent variables (two-dimensional encoding) |
| t-Distributed Stochastic Neighbor Embedding (t-SNE) | Data embedding and visualization | Unsupervised | Constructs probability distribution of sample pairs, then minimizes divergence between high dimensional space and low dimension embedding, such that similar pairs are embedded nearby while dissimilar pairs are repelled. | Low dimensional embedding |
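
The Random Forest row of Table 1 lists a proximity matrix as the relevant output without saying how it is obtained. A common definition, assumed here, is that the proximity of two samples is the fraction of trees in which both samples end up in the same terminal node. The sketch below illustrates only that counting step. The function name `proximity_matrix` and the `toy_leaf_ids` values are hypothetical, made up for illustration, and are not data or code from the study.

```python
from typing import List, Sequence


def proximity_matrix(leaf_ids: Sequence[Sequence[int]]) -> List[List[float]]:
    """Compute pairwise sample proximities from per-tree leaf assignments.

    leaf_ids[i][t] is the terminal node reached by sample i in tree t.
    The proximity of samples i and j is the fraction of trees in which
    both samples reach the same terminal node (assumed definition).
    """
    n_samples = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count the trees where samples i and j share a terminal node.
            shared = sum(
                1 for t in range(n_trees) if leaf_ids[i][t] == leaf_ids[j][t]
            )
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples (rows) run down 5 trees (columns).
toy_leaf_ids = [
    [3, 1, 4, 2, 7],
    [3, 1, 4, 5, 7],
    [6, 2, 4, 5, 8],
    [6, 2, 9, 5, 8],
]

toy_prox = proximity_matrix(toy_leaf_ids)
for row in toy_prox:
    print(" ".join(f"{value:.1f}" for value in row))
```

In this toy input, samples 0 and 1 share a leaf in four of five trees, so their proximity is 0.8, while samples 0 and 3 never share a leaf and get 0.0. In practice the leaf assignments would come from a fitted forest rather than being written by hand, and one minus the proximity can serve as a dissimilarity for downstream clustering or ordination.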