Author manuscript; available in PMC: 2020 Oct 1.

Published in final edited form as: Mol Phylogenet Evol. 2019 Jul 16;139:106562. doi: 10.1016/j.ympev.2019.106562

Table 1.

Comparison of unsupervised machine learning methods used in this study.

Method	Purpose	Approach used	General algorithm	Relevant output

Random Forest (RF)	Classification and regression	Supervised or unsupervised	Ensemble method that grows many classification trees from training data and runs the input data down each tree; the classification with the most votes is chosen.	Proximity matrix
Variational Autoencoder (VAE)	Generative model	Unsupervised	Compresses data through multiple encoder layers into latent variables, then decompresses the latent variables through multiple decoder layers into reconstructed data. Learns the marginal likelihood distribution of the data using latent variables.	Latent variables (two-dimensional encoding)
t-Distributed Stochastic Neighbor Embedding (t-SNE)	Data embedding and visualization	Unsupervised	Constructs a probability distribution over sample pairs, then minimizes the divergence between the pairwise distributions in the high-dimensional space and in the low-dimensional embedding, such that similar pairs are embedded nearby while dissimilar pairs are repelled.	Low-dimensional embedding
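
The t-SNE row above describes an objective that can be illustrated with a small, dependency-free sketch. This is not the implementation used in the study. It assumes a single global Gaussian bandwidth `sigma` (real t-SNE calibrates a per-point bandwidth from a perplexity setting and symmetrizes conditional probabilities), and the function names and toy data are hypothetical. It only evaluates the divergence for a given embedding; it does not perform the minimization.

```python
import math

def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]

def joint_probabilities(points, kernel):
    """Apply kernel(d^2) to all pairs i != j and normalize to sum to 1."""
    d2 = pairwise_sq_dists(points)
    n = len(points)
    w = [[kernel(d2[i][j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def tsne_kl(high, low, sigma=1.0):
    """KL(P || Q) between pair distributions of the data and an embedding.

    P uses a Gaussian kernel on the high-dimensional data (assumption: one
    global sigma); Q uses a Student-t kernel with one degree of freedom on
    the low-dimensional embedding.
    """
    P = joint_probabilities(high, lambda d2: math.exp(-d2 / (2 * sigma ** 2)))
    Q = joint_probabilities(low, lambda d2: 1.0 / (1.0 + d2))
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)

# Toy data (hypothetical): two tight clusters in three dimensions.
high = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
# An embedding that keeps cluster members together ...
good = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
# ... and one that places members of different clusters side by side.
bad = [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0), (3.1, 3.0)]

print(tsne_kl(high, good) < tsne_kl(high, bad))
```

The embedding that keeps similar pairs close yields the lower divergence, which is the quantity a t-SNE optimizer would reduce by moving the low-dimensional points.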