Author manuscript; available in PMC: 2020 Feb 20.

Published in final edited form as: Am J Transplant. 2019 Mar 20;19(5):1278–1287. doi: 10.1111/ajt.15316

TABLE 2.

Single-cell immune analysis computational methodology

Method	Output	Key hyperparameters	Advantages	Disadvantages	Ref.
Dimensionality reduction	Lower-dimensional representation of original data		Visualization of high dimensional data, discovery of subsets of data	Potential information loss
Principal Component Analysis (PCA)	Original data on new axes where axes are linear combinations of original dimensions		Well-established, easy to interpret, fast, consistent results across applications on the same data	Misses nonlinear patterns in data	⁴⁴
T-distributed Stochastic Neighbor Embedding (t-SNE)	Original data on new axes where axes have no inherent interpretation	Effective number of nearest neighbors (Perplexity); cycles before algorithm is considered done (Iterations)	Discovery of nonlinear patterns	Difficult to interpret axes, slow, repeat applications produce different results, requires downsampling	²⁹
Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP)	Original data on new axes where axes have no inherent interpretation	Minimum distance between neighbors in new space; number of neighbors	Discovery of nonlinear patterns, fast, does not require downsampling	Difficult to interpret, repeat applications produce different results	³⁰,³¹
Clustering	Algorithmically-determined groupings of data-points	Distance Metric (how to assign distance between two points)	Unbiased discovery of potentially biologically meaningful groups of data points
Hierarchical clustering	Data points organized into a tree structure	How distance between clusters is determined (Linkage)	Easily observe multilevel clustering	Determining where to cut tree to produce clusters can be difficult
k-means	k clusters of original data	Number of clusters (k)	Fast, well-established	Need to specify number of clusters beforehand, cannot find clusters that are not simple spheres or ellipses
Density-based spatial clustering of applications with noise (DBSCAN)	Clusters of original data	Minimum number of points to call a region dense; radius of point’s neighborhood (Epsilon)	No need to specify number of clusters, can find clusters of arbitrary shape	Many data points may be classified as “noise” or one large cluster depending on hyperparameters	⁴⁵
Repertoire analysis					³⁸
Diversity	Measure of clonal diversity	Choice of diversity metric (Gini, Entropy, Chao1, Hill, etc.)	Provides a single diversity metric for a sample or population of cells, can be compared across samples and conditions	Can be difficult to interpret intuitively, sensitive to number of samples
Sequence distance	Distance between two TCR or BCR sequences	Choice of distance metric (Levenshtein, etc.)	Distances can be used in downstream applications like clustering or dimensionality reduction for visualization	Distances might not be biologically meaningful
Motif enrichment	Significant sequence motifs	Choice of algorithm (GLIPH, etc.)	Discovery of motifs that may confer specificity	May miss larger motifs depending on hyperparameter choices
Phylogenetics	BCR clonal family trees	Evolutionary model for amino acid mutation	Can infer lineages and branching points during affinity maturation	Can be sensitive to hyperparameters, methods typically optimized for traditional evolutionary models

TCR, T cell receptor; BCR, B cell receptor; GLIPH, Grouping of Lymphocyte Interactions by Paratope Hotspots.
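
As an illustration of the "Diversity" and "Sequence distance" rows above, the following is a minimal sketch, not the method of any cited tool, of two diversity metrics (Shannon entropy and the Gini coefficient) and the Levenshtein distance, written in plain Python. The function names and the example CDR3 sequences are hypothetical and chosen only for illustration; published pipelines may normalize or correct these metrics differently (for example, for sampling depth).

```python
import math
from collections import Counter


def shannon_entropy(clone_counts):
    """Shannon entropy (in nats) of a clonotype count distribution."""
    total = sum(clone_counts)
    return -sum((c / total) * math.log(c / total) for c in clone_counts if c > 0)


def gini_coefficient(clone_counts):
    """Gini coefficient of clone sizes: 0 is perfectly even, values near 1 are dominated by few clones."""
    counts = sorted(clone_counts)
    n = len(counts)
    total = sum(counts)
    # Standard formula on ascending-sorted values x_1..x_n:
    # G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(counts, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


# Hypothetical CDR3 amino acid sequences, one entry per sequenced cell.
cdr3s = ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTEAFF"]
counts = list(Counter(cdr3s).values())  # clone sizes
print("Shannon entropy:", shannon_entropy(counts))
print("Gini coefficient:", gini_coefficient(counts))
print("Levenshtein distance:", levenshtein(cdr3s[0], cdr3s[2]))
```

Both diversity metrics depend on how many cells were sampled, which is the sensitivity noted in the table, so comparisons across samples are usually made at matched depth. The pairwise Levenshtein distances could in turn feed the clustering or dimensionality reduction methods listed above.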