Author manuscript; available in PMC: 2020 Feb 20.
Published in final edited form as: Am J Transplant. 2019 Mar 20;19(5):1278–1287. doi: 10.1111/ajt.15316

TABLE 2.

Single-cell immune analysis computational methodology

| Method | Output | Key hyperparameters | Advantages | Disadvantages | Ref. |
|---|---|---|---|---|---|
| Dimensionality reduction | Lower-dimensional representation of original data | | Visualization of high-dimensional data, discovery of subsets of data | Potential information loss | |
| Principal Component Analysis (PCA) | Original data on new axes where axes are linear combinations of original dimensions | | Well-established, easy to interpret, fast, consistent results across applications on the same data | Misses nonlinear patterns in data | 44 |
| T-distributed Stochastic Neighbor Embedding (t-SNE) | Original data on new axes where axes have no inherent interpretation | Effective number of nearest neighbors (Perplexity); cycles before algorithm is considered done (Iterations) | Discovery of nonlinear patterns | Difficult to interpret axes, slow, repeat applications produce different results, requires downsampling | 29 |
| Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) | Original data on new axes where axes have no inherent interpretation | Minimum distance between neighbors in new space; number of neighbors | Discovery of nonlinear patterns, fast, does not require downsampling | Difficult to interpret, repeat applications produce different results | 30,31 |
| Clustering | Algorithmically determined groupings of data points | Distance metric (how to assign distance between two points) | Unbiased discovery of potentially biologically meaningful groups of data points | | |
| Hierarchical clustering | Data points organized into a tree structure | How distance between clusters is determined (Linkage) | Easily observe multilevel clustering | Determining where to cut tree to produce clusters can be difficult | |
| k-means | k clusters of original data | Number of clusters (k) | Fast, well-established | Need to specify number of clusters beforehand, cannot find clusters that are not simple spheres or ellipses | |
| Density-based spatial clustering of applications with noise (DBSCAN) | Clusters of original data | Minimum number of points to call a region dense; radius of point’s neighborhood (Epsilon) | No need to specify number of clusters, can find clusters of arbitrary shape | Many data points may be classified as “noise” or one large cluster depending on hyperparameters | 45 |
| Repertoire analysis | | | | | 38 |
| Diversity | Measure of clonal diversity | Choice of diversity metric (Gini, Entropy, Chao1, Hill, etc.) | Provides a single diversity metric for a sample or population of cells, can be compared across samples and conditions | Can be difficult to interpret intuitively, sensitive to number of samples | |
| Sequence distance | Distance between two TCR or BCR sequences | Choice of distance metric (Levenshtein, etc.) | Distances can be used in downstream applications like clustering or dimensionality reduction for visualization | Distances might not be biologically meaningful | |
| Motif enrichment | Significant sequence motifs | Choice of algorithm (GLIPH, etc.) | Discovery of motifs that may confer specificity | May miss larger motifs depending on hyperparameter choices | |
| Phylogenetics | BCR clonal family trees | Evolutionary model for amino acid mutation | Can infer lineages and branching points during affinity maturation | Can be sensitive to hyperparameters, methods typically optimized for traditional evolutionary models | |

TCR, T cell receptor; BCR, B cell receptor; GLIPH, Grouping of Lymphocyte Interactions by Paratope Hotspots.
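
As a minimal sketch of the "Diversity" and "Sequence distance" rows, the Python below computes two of the listed diversity metrics (Shannon entropy and Hill numbers) and the Levenshtein distance using only the standard library. The function names and the CDR3-like strings are hypothetical and chosen for illustration; they are not taken from the article or from any repertoire-analysis package.

```python
import math
from collections import Counter


def clone_frequencies(clonotypes):
    """Relative frequency of each distinct clonotype label in a sample."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def shannon_entropy(clonotypes):
    """Shannon entropy (in bits) of the clonotype frequency distribution."""
    return -sum(p * math.log2(p) for p in clone_frequencies(clonotypes))


def hill_number(clonotypes, q):
    """Hill number of order q (the 'effective number' of clones).

    q = 0 gives richness; q = 1 is defined as the limit exp(entropy in nats);
    q = 2 gives the inverse Simpson index.
    """
    freqs = clone_frequencies(clonotypes)
    if q == 1:
        return math.exp(-sum(p * math.log(p) for p in freqs))
    return sum(p ** q for p in freqs) ** (1.0 / (1.0 - q))


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (no cost if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical sample: one clonotype label per cell, two clones of equal size.
sample = ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGQGYEQYF"]

entropy_bits = shannon_entropy(sample)
effective_clones = hill_number(sample, q=2)
distance = levenshtein("CASSLGQAYEQYF", "CASSLGQGYEQYF")
```

Because every metric here depends on observed clone frequencies, values from samples with very different cell counts are not directly comparable, which is the sensitivity to sample number noted in the table.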