Schematic demonstration of DA-seq. (A) Illustration of the DA-seq algorithm. DA-seq detects DA subpopulations by analyzing cells from two biological states. The input to the algorithm is the union of the data from both states after initial dimensionality reduction. Step 1: Computing a multiscale score vector for each cell, based on its k-nearest neighbors (kNN), for several values of k. Step 2: Training a logistic classifier to predict the biological state of each cell from its multiscale score, which yields a single DA measure per cell. The algorithm retains only cells whose DA measure is above an upper threshold or below a lower threshold, because these cells may reside in DA subpopulations. Step 3: Clustering the cells retained in step 2 to obtain contiguous DA subpopulations above a predefined size. Three such subpopulations are shown in the illustration. The degree of their differential abundance is quantified by a DA score (SI Appendix, Note 1). Step 4: Detecting subsets of genes that characterize each DA subpopulation. For example, the genes G7 and G8 characterize one of these subpopulations. (B) Standard clustering analysis vs. DA-seq. (Left) Cluster information obtained through standard clustering analysis. (Center) DA subpopulations identified through DA-seq. (Right) Normalized differential abundance of the DA subpopulations and of the clusters, represented by the DA score.
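Steps 1 to 3 can be illustrated with a minimal NumPy sketch. It rests on several simplifying assumptions that are not taken from the caption: the multiscale score is taken to be the fraction of each cell's k nearest neighbors that come from state 1; the logistic classifier is fit by plain gradient descent and its predicted probability serves as the DA measure; the thresholds `tau_high`/`tau_low` are hypothetical values; and a radius-graph connected-components search stands in for whatever clustering method DA-seq actually uses. The DA score (SI Appendix, Note 1) and the gene detection of step 4 are not reproduced. The toy data place one region shared equally by both states next to a region populated only by state 1.

```python
import numpy as np


def multiscale_scores(X, labels, ks):
    """Step 1 (simplified assumption): for each cell and each k in ks, the
    fraction of the cell's k nearest neighbours that come from state 1."""
    # Brute-force pairwise squared Euclidean distances (fine for toy data).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)        # a cell is not its own neighbour
    order = np.argsort(d2, axis=1)      # neighbours sorted by distance
    scores = np.empty((X.shape[0], len(ks)))
    for j, k in enumerate(ks):
        scores[:, j] = labels[order[:, :k]].mean(axis=1)
    return scores


def logistic_da_measure(S, labels, lr=1.0, epochs=5000):
    """Step 2: fit a logistic classifier (gradient descent on the mean
    log-loss) predicting the state from the score vector; return each
    cell's predicted probability of state 1 as its DA measure."""
    Z = np.hstack([S, np.ones((S.shape[0], 1))])  # append an intercept column
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        w -= lr * Z.T @ (p - labels) / len(labels)
    return 1.0 / (1.0 + np.exp(-Z @ w))


def cluster_retained(X, keep, radius=1.5, min_size=10):
    """Step 3 (stand-in method): connected components of a radius graph on
    the retained cells; components smaller than min_size are discarded."""
    idx = np.flatnonzero(keep)
    visited = np.zeros(len(X), dtype=bool)
    clusters = []
    for start in idx:
        if visited[start]:
            continue
        visited[start] = True
        stack, members = [start], []
        while stack:  # depth-first search restricted to retained cells
            i = stack.pop()
            members.append(int(i))
            near = idx[np.linalg.norm(X[idx] - X[i], axis=1) < radius]
            for j in near:
                if not visited[j]:
                    visited[j] = True
                    stack.append(j)
        if len(members) >= min_size:
            clusters.append(sorted(members))
    return clusters


# Toy data in an already-reduced 2-D space: a region shared equally by both
# states (labels independent of position) plus a region with only state 1.
rng = np.random.default_rng(0)
shared = rng.normal(0.0, 1.0, size=(200, 2))
enriched = rng.normal(8.0, 0.5, size=(80, 2))
X = np.vstack([shared, enriched])
labels = np.concatenate([np.tile([0.0, 1.0], 100), np.ones(80)])

S = multiscale_scores(X, labels, ks=(20, 40, 60))
da_measure = logistic_da_measure(S, labels)

tau_high, tau_low = 0.8, 0.2  # hypothetical thresholds
keep = (da_measure > tau_high) | (da_measure < tau_low)
clusters = cluster_retained(X, keep)
print(len(clusters), [len(c) for c in clusters])
```

In this toy setting the cells of the state-1-only region receive a score of 1.0 at every scale, so the classifier assigns them a high DA measure, and they form the one cluster that survives the size filter. Cells in the shared region sit near a DA measure of 0.5 and are mostly discarded. Using several values of k, rather than one, is what lets the classifier weigh evidence from neighborhoods of different sizes.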