Author manuscript; available in PMC: 2024 Dec 7.
Published in final edited form as: Nat Biotechnol. 2021 Sep 30;40(2):245–253. doi: 10.1038/s41587-021-01033-z

Differential abundance testing on single-cell data using K-nearest neighbour graphs

Emma Dann 1, Neil C Henderson 2,3, Sarah A Teichmann 1,4, Michael D Morgan 5,6, John C Marioni 1,5,6
PMCID: PMC7617075  EMSID: EMS132511  PMID: 34594043

Abstract

Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance between experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighbourhoods on a k-nearest neighbour graph. Using simulations and scRNA-seq data, we show that Milo can identify perturbations that are obscured by discretising cells into clusters, that it maintains FDR control across batch effects, and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the ageing mouse thymus, and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell-cell similarity structure, it may also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR.

Introduction

The advent and expansion of high-throughput and high-dimensional single-cell measurements has empowered the discovery of specific cell-state changes associated with disease, development and experimental perturbations. Perturbed cell states can be detected by quantifying shifts in abundance of cell types in response to a biological insult. A common analytical approach for quantitatively identifying such shifts is to ask whether the composition of cells in predefined and discrete clusters differs between experimental conditions [1–5]. However, assigning single cells to discrete clusters can be problematic, especially in the context of continuous differentiation, developmental or stimulation trajectories, thus limiting the power and resolution of such differential abundance (DA) testing strategies.

Alternative approaches to perform differential abundance analysis without requiring clusters have been proposed for high-throughput mass cytometry data [6] or for scRNA-seq data [7,8]. However, these have different limitations: they either do not model variability in cell numbers between replicate samples or are primarily designed for pairwise comparisons. This limits their application to datasets with more complex experimental designs, including continuous covariates (such as age or disease severity) and confounding sources of variation.

To solve these challenges, we have developed a computational framework to perform differential abundance testing without relying on clustering cells into discrete groups. We make use of a common data-structure that is embedded in many single-cell analyses: K-nearest neighbour (KNN) graphs. We model cellular states as overlapping neighbourhoods on such a graph, which are then used as the basis for differential abundance testing. We account for the non-independence of overlapping neighbourhoods by applying a weighted version of the Benjamini-Hochberg (BH) method, where p-values are weighted by the reciprocal of the neighbourhood connectivity; an adaptation to graphs of a previously described strategy to control the spatial False Discovery Rate (FDR) [6].

Our method, which we call Milo, leverages the flexibility of generalized linear models (GLMs): because the experimental design is encoded in a design matrix, multiple conditions, continuous covariates and nuisance factors such as batch can be modelled jointly, thus allowing complex experimental designs. Moreover, by modelling cell states as overlapping neighbourhoods, we are able to accurately pinpoint the perturbed cellular states, enabling the identification of the underlying molecular programs. We demonstrate the power of our approach by identifying perturbed cellular states from publicly available datasets in the context of human liver cirrhosis and by uncovering a fate-biased progenitor in the ageing murine thymus. Furthermore, we show the speed and scalability of our open-source implementation of Milo, and its superiority to alternative approaches.

Results

Modelling cell states as neighbourhoods on a KNN graph

We propose to model the differences in the abundance of cell states between experimental conditions using graph neighbourhoods (Fig 1). Our computational approach allows overlapping neighbouring regions, which alleviates the principal pitfalls of using discrete clusters for differential abundance testing. We make use of a refined sampling implementation [9], which leads to high coverage of the graph while simultaneously controlling the number of neighbourhoods that need to be tested. For each neighbourhood we then perform hypothesis testing between biological conditions to identify differentially abundant cell states whilst controlling the FDR across the graph neighbourhoods.

[Figure 1]

Our method works on a KNN graph that represents the high-dimensional relationships between single cells, a common scaffold for many single-cell analyses [14] (Fig 1A; practical guidance for selecting parameters for KNN graph construction is provided in the Supplementary Notes). The first step in our method is to define a set of representative neighbourhoods on the KNN graph, where a neighbourhood is defined as the group of cells that are connected to an index cell by an edge in the graph. Rather than defining a neighbourhood around every cell, we sample a subset of single cells to use as neighbourhood indices. Adopting a purely random sampling approach means that the number of neighbourhoods required to sample a fixed proportion of cells scales linearly with the total number of cells (Supp Fig 1A). This leads to an increased multiple testing burden, with the potential to reduce statistical power. To solve this problem we have implemented a refined sampling scheme (Fig 1A) [9]. Concretely, we perform an initial sparse sampling, without replacement, of single cells and compute the K nearest neighbours for each sampled cell. We then calculate the median position of each set of nearest neighbours and find the nearest cell to this median position. These cells become the set of indices from which we compute the final set of neighbourhoods. This procedure has three main advantages: (1) fewer, yet more representative, neighbourhoods are selected, as initial random samplings from dense regions of the KNN graph will often converge to the same index cell (Supp Fig 1A), (2) the representative neighbourhoods include more cells on average (Supp Fig 1B) and (3) neighbourhood selection is more robust across initializations (Supp Fig 1C).
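The refined sampling scheme can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the miloR implementation: the input `X` is assumed to be a reduced-dimensional embedding such as PCs, nearest neighbours are found by brute force, and the function names and the `prop` default are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Brute-force k nearest cells for every cell in Euclidean space.

    The cell itself is included (distance ~0), so column 0 is normally the cell.
    """
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, shape (n_cells, n_cells)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.argsort(d2, axis=1)[:, :k]


def refined_index_sampling(X, k=10, prop=0.1, seed=0):
    """Hypothetical sketch of refined sampling of neighbourhood index cells.

    X    : (n_cells, d) reduced-dimensional embedding, e.g. PCs (assumed input)
    k    : number of nearest neighbours (K)
    prop : fraction of cells drawn in the initial sparse sample (hypothetical default)
    """
    rng = np.random.default_rng(seed)
    n_cells = X.shape[0]
    n_init = max(1, int(round(prop * n_cells)))
    # 1) Initial sparse random sample of cells, without replacement
    init = rng.choice(n_cells, size=n_init, replace=False)
    knn = knn_indices(X, k)
    refined = []
    for c in init:
        # 2) Median position of the sampled cell's K nearest neighbours
        median_pos = np.median(X[knn[c]], axis=0)
        # 3) The cell closest to that median position becomes an index cell
        refined.append(int(np.argmin(np.sum((X - median_pos) ** 2, axis=1))))
    # Samples from dense regions often converge on the same cell, so deduplicate
    return np.unique(refined)
```

The brute-force distance matrix needs memory quadratic in the number of cells, which is fine for a toy example; graph-scale implementations would rely on (approximate) nearest-neighbour search instead.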

Next, we count the numbers of cells present in each neighbourhood (per experimental sample) and use these for differential abundance testing between conditions. To incorporate complex experimental designs (e.g., the presence of multiple conditions) we test for differences in abundance using a Negative Binomial GLM framework [10,11], normalising for differences in cell numbers across samples [12] (see Methods and Supplementary Note). By doing this, we can borrow information across neighbourhoods, allowing robust estimation of dispersion parameters. We employ a quasi-likelihood F-statistic [13] for comparing different hypotheses, which has been shown to be powerful in single-cell differential expression testing [14]. To account for multiple hypothesis testing we use a weighted FDR procedure [13] that accounts for the spatial overlap of neighbourhoods, building upon an approach initially introduced in Cydar [6]. We adapt this procedure for a KNN graph, and weight each hypothesis test P-value by the reciprocal of the kth nearest neighbour distance.
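The weighted FDR step can be illustrated with a short NumPy sketch. It assumes the weighted Benjamini-Hochberg form used for spatial FDR in Cydar, as we understand it, with each neighbourhood weighted by the reciprocal of its index cell's kth nearest neighbour distance as described above; the function name and arguments are hypothetical. With equal weights it reduces to the standard BH adjustment.

```python
import numpy as np


def spatial_fdr(pvalues, kth_distances):
    """Weighted BH adjustment with weights w_i = 1 / (kth nearest neighbour distance).

    pvalues       : per-neighbourhood P-values from the DA test
    kth_distances : distance from each index cell to its kth nearest neighbour
    Returns weighted BH-adjusted values (spatial FDR) in the input order.
    """
    p = np.asarray(pvalues, dtype=float)
    with np.errstate(divide="ignore"):
        w = 1.0 / np.asarray(kth_distances, dtype=float)
    w[~np.isfinite(w)] = 1.0  # guard against zero distances
    order = np.argsort(p)
    p_sorted, w_sorted = p[order], w[order]
    # q_(i) = min over j >= i of sum(w) * p_(j) / (cumulative weight up to j)
    raw = w.sum() * p_sorted / np.cumsum(w_sorted)
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q = np.empty_like(q_sorted)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

Neighbourhoods would then be called DA where the adjusted value falls below the chosen FDR, e.g. 10% as in the analyses reported here.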

To illustrate the Milo workflow we generated a simulated trajectory [15] composed of cells sampled from two experimental conditions (‘A’ and ‘B’; Fig 1B). Cells in a defined subpopulation of this trajectory were simulated to be more abundant in the ‘B’ condition (Fig 1B); this region of differential abundance is not defined as a distinct cluster by widely-used clustering algorithms (Supp Fig 2). However, applying Milo to these simulated data specifically detects that this region contains different abundances of cells from the two conditions (Fig 1C-D).

Milo outperforms existing methods for differential abundance testing and controls for false discoveries in the presence of batch effects

To illustrate the power and accuracy of Milo, we tested its performance against alternative methods for differential abundance analysis, simulating regions of differential abundance between 2 conditions (C1 or C2) on real and simulated single-cell datasets. The methods that we compare define regions for DA testing in the high-dimensional single-cell space in different ways (see Methods). Therefore, to provide a fair comparison between them, we reasoned that generating single-cell probabilities and using these to assign cells to a DA region based on a threshold defines a ground truth that does not favour a specific method. To do this we generated, for each cell, a smooth probability that it was sampled from C1 over a defined region of a KNN graph (hereafter referred to as P(C1)). We then used this probability to assign a simulated condition label (C1 or C2) to each cell (see Methods). Cells from each condition were then assigned to one of 3 simulated replicates, thus mimicking a balanced experimental design with the minimal number of replicates required to estimate a variance parameter. These datasets with simulated condition labels provide a ground truth against which the performance of differential abundance testing approaches can be compared (Fig 2A). We define differential abundance using an empirical threshold, based on the distribution of P(C1), for each simulated dataset. Specifically, cells with P(C1) > 75th percentile across all cells and simulations for a given topology are assigned to a differentially abundant region (Supp Fig 3A). This choice of threshold does not penalise detection of small shifts in differential abundance, while filtering out false positives due to noise around P(C1)=0.5 (Supp Fig 3B).
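A hypothetical sketch of this labelling scheme is given below. The Gaussian kernel used to make P(C1) rise smoothly from 0.5 towards a maximum value around a chosen centre is our assumption, as are the function names and defaults; the Methods describe the actual simulation. For simplicity, the 75th percentile is computed over a single simulation rather than across all simulations for a topology.

```python
import numpy as np


def simulate_condition_labels(embedding, centre, max_p=0.8, sigma=1.0, n_reps=3, seed=0):
    """Hypothetical sketch of benchmark condition labelling.

    P(C1) is 0.5 (no DA) far from `centre` and rises to `max_p` at `centre`,
    following a Gaussian kernel on squared distance in the embedding (assumption).
    """
    rng = np.random.default_rng(seed)
    d2 = np.sum((embedding - centre) ** 2, axis=1)
    p_c1 = 0.5 + (max_p - 0.5) * np.exp(-d2 / (2.0 * sigma ** 2))
    # Draw each cell's condition label from its own P(C1)
    condition = np.where(rng.random(len(p_c1)) < p_c1, "C1", "C2")
    # Assign each cell uniformly at random to one of n_reps replicates;
    # a sample is then the (condition, replicate) pair
    replicate = rng.integers(0, n_reps, size=len(p_c1))
    return p_c1, condition, replicate


def ground_truth_da(p_c1, q=75):
    """Cells with P(C1) above the q-th percentile are labelled as truly DA."""
    return p_c1 > np.percentile(p_c1, q)
```

Because the threshold is a percentile of P(C1) itself, at most a quarter of the cells can be labelled as truly DA here, and all of them have P(C1) above 0.5.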

[Figure 2]

We created 3 simulated datasets, to which we applied the above condition labelling: 3 discrete clusters (2700 cells, Supp Fig 4A), a linear trajectory (7500 cells, Supp Fig 4B), and a branching trajectory (7500 cells, Supp Fig 4C). In addition, to provide a comparison using a more realistic dataset we simulated DA labels on a real dataset based on a single-cell atlas of mouse gastrulation (64018 cells, Fig 2A) [4]. For each dataset we simulated labels varying the location of the DA population in the graph, as well as the maximum P(C1), thus mimicking different DA fold changes. Parameters used to simulate multi-condition datasets for benchmarking are summarized in Supplementary Table 1 and described in detail in the Methods. This framework allows us to evaluate Milo’s performance on a range of KNN graph geometries, including real-world data representing complex developmental trajectories, whilst varying both the size and location of DA regions and effect sizes, representing a range of challenging scenarios for differential abundance analysis.

To benchmark the performance of Milo, we compared the results to two alternative methods for clustering-free differential abundance analysis (Fig 2B, Supp Table 2): Cydar, originally designed to model differential abundance in mass cytometry data [6] and DAseq [7], which uses the structure of single-cell KNN graphs to identify DA regions. In addition, we applied the current standard-of-practice differential abundance analysis for single-cell experiments: graph-clustering followed by differential abundance testing between conditions within clusters. To do this we applied the Louvain clustering algorithm to the same KNN graph as used for Milo, and tested for differential abundance using the same NB-GLM framework employed by Milo. To ensure comparability between methods we used the same reduced dimensional space as the input for all methods and the same parameter values, where these were shared, e.g. the value of K for KNN graph building. Where parameters were specific to a method, we followed the practice recommended by the method developers to select an appropriate value (Supp Table 3).

Milo detected the simulated DA regions with high sensitivity and maintained FDR control across benchmarking scenarios (Fig 2C, Supp Fig 4). While other methods achieved comparable performance on specific simulated datasets, Milo outperformed all methods on simulated data generated using the real KNN graph. In contrast, DAseq showed consistently lower sensitivity, even for large simulated fold changes (Fig 2C, Supp Fig 4). In the discrete cluster simulation, we found that multiple methods had an inflated FDR due to compositional biases; this was most evident where the differentially abundant cluster comprised ≥50% of the dataset cells (Supp Fig 4A). Notably, Milo mitigates these compositional biases using a combination of TMM normalisation and a graph spatial FDR, as demonstrated by a median FDR of ~10% across these specific simulations (Supp Fig 4A).

Although Milo generally outperforms alternative methods, we observed some variability in TPR when attempting to identify DA populations in different regions of the KNN graph (Fig 2C). We found that while Milo is consistently sensitive at higher fold changes (Supp Fig 5A), the variability in power between simulations from different population centroids is primarily accounted for by the fraction of true positive cells with P(C1) close to the threshold used to define true DA (i.e. P(C1) = 0.6) (Supp Fig 5), rather than by other factors such as the coverage or the size of the DA region (Supp Fig 6).

Burkhardt et al. recently published MELD, a method that quantifies shifts in abundance between conditions over a KNN graph [8]. MELD estimates the probability of observing each cell in each of the experimental conditions, whilst averaging the density over replicates from the same condition, but does not provide any statistical measure of confidence. This hinders the computation of TPR and FDR since DA would be based on arbitrary probability thresholds. Instead, we separately compared the accuracy of the effect size estimates from Milo and MELD relative to the true effect sizes from our mouse gastrulation simulations. We found that MELD consistently underestimates the fold-change of true DA regions (Supp Fig 7A-B). By contrast, the effect size estimates from Milo were unbiased and were more accurate for true DA regions, especially for higher fold changes. Notably, because Milo models sample-to-sample variability, increasing the number of replicates increases the accuracy of the effect estimates (Supp Fig 8).

Technical batch effects have the potential to pose a particular challenge to differential abundance testing methods, including all of those benchmarked above. Moreover, biological sources of variation may confound analyses, such as the co-variation of age, gender and other biological factors with the experimental variable of interest.

To assess the ability of Milo to detect DA between conditions of interest in the presence of confounding factors, we extended our benchmarking using synthetic condition labels by introducing synthetic batch effects of increasing magnitude (see Methods) (Fig 2D). Current best-practice recommendations suggest applying batch correction methods during dataset pre-processing, to obtain a KNN graph embedding where differences due to batch are minimized (see [16–18] for a systematic review and benchmarking of different batch correction strategies). We confirm that using in silico batch correction maintains the sensitivity of all DA methods, comparable to the case of no batch effect, across different batch effect magnitudes (Supp Fig 9; batch effects corrected using fastMNN [19]; further practical considerations on minimizing batch effects are discussed in Supplementary Note 1.1.2). However, no batch correction algorithm is perfect, and uncorrected or incompletely corrected batch effects may lead to false discoveries in differential abundance analysis. The GLM framework implemented in Milo can model nuisance covariates, enabling a direct adjustment for such batch effects.

To assess the impact of such an adjustment, we tested for differential abundance in the datasets with uncorrected synthetic batch effects, incorporating the batch covariate in the NB-GLM model used both in Milo and in testing on Louvain clusters. We found that Milo was the only method to maintain FDR control across batch effects of increasing magnitude (Fig 2E). Even in the presence of partial confounding between the experimental variable and batch, Milo is still able to identify differential abundance and control FDR, particularly for large effect sizes despite relatively strong batch effects (Supp Fig 10). The inability of methods such as MELD to incorporate the experimental design in the likelihood estimation severely hinders performance in the presence of uncorrected batch effects (Supp Fig 7C-F). By contrast, explicit modelling of the batch effect as a regression term in the Milo GLM maintains the sensitivity of differential abundance testing (Supp Fig 11).
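To illustrate how a batch term enters such a model, the sketch below fits a negative binomial GLM with a log link and per-sample offsets for a single neighbourhood by iteratively reweighted least squares (IRLS). It is a deliberate simplification: the dispersion is treated as known, the offsets are plain log library sizes rather than normalised ones, and there is no quasi-likelihood F-test, so it should not be read as the testing procedure Milo uses [10,11,13]. The function name and example values are hypothetical.

```python
import numpy as np


def fit_nb_glm(y, X, offset, dispersion, n_iter=50, tol=1e-10):
    """Fit a negative binomial GLM (log link, fixed dispersion) by IRLS.

    y          : counts for one neighbourhood across samples, shape (n_samples,)
    X          : design matrix, shape (n_samples, n_coef); column 0 is the intercept
    offset     : log library size of each sample
    dispersion : NB dispersion phi, with Var(y) = mu + phi * mu**2 (assumed known)
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.mean(y) + 0.5) - np.mean(offset)  # crude starting value
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = np.exp(eta)
        w = mu / (1.0 + dispersion * mu)      # IRLS weights for NB with log link
        z = (eta - offset) + (y - mu) / mu    # working response (offset removed)
        XtW = X.T * w                         # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Example design: intercept, condition and batch for 6 samples (hypothetical values)
condition = np.array([0, 0, 0, 1, 1, 1])
batch = np.array([0, 1, 0, 1, 0, 1])
X = np.column_stack([np.ones(6), condition, batch])
```

Because `batch` is a column of the design matrix, the `condition` coefficient is estimated after adjusting for batch; dropping that column would let batch differences leak into the condition estimate whenever the two are partially confounded, as in this example design.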

In summary, we show that Milo outperforms alternative methods for differential abundance testing across a range of experimental scenarios, including in the presence of residual batch effects.

Milo is fast and scalable

The benchmarking datasets discussed above are fairly typical in size for current single-cell experiments. However, moving forward, the number of cells assayed is likely to increase with advances in experimental sample multiplexing [20,21]. As such, we tested the scalability of the Milo workflow, and profiled the memory usage across multiple steps. For this we ran Milo on 3 published datasets of differing sizes, from ~2000 to ~130,000 cells, representing differences in both biological and experimental complexity [2–4], as well as a dataset of 200,000 simulated single cells from a linear trajectory (see Methods). Using these 4 datasets we measured the amount of time required to execute the Milo workflow from graph-building through to differential abundance testing (Fig 3A). In parallel, we profiled the amount of memory used across the entire workflow (Fig 3B) and at each defined step (Supp Fig 12). Notably, the amount of time taken increased linearly with the total size of the dataset (Fig 3A), and was less than 60 minutes for the largest dataset of 200k cells. Moreover, the total memory usage across all steps of the Milo workflow scaled primarily with the size of the input dataset (Fig 3B), indicating that the complexity and composition of the single cells largely determines the memory requirements (Supp Fig 12). These memory requirements are within the resources of common desktop computers (i.e. <16GB). This benchmarking analysis demonstrates that Milo is able to perform differential abundance analysis in large and complex datasets at a scale and speed that is feasible on a desktop computer.

[Figure 3]

Milo identifies the decline of a fate-biased epithelial precursor in the ageing mouse thymus

To demonstrate the utility of Milo in a real-world setting we applied it to a single-cell RNA-seq dataset of mouse thymic epithelial cells (TEC) sampled across the first year of mouse life, which were previously clustered into 9 distinct TEC subtypes (Fig 4A) [3]. These data, generated using plate-based SMART-seq2, consist of 2327 single-cells equally sampled from mice at 5 different ages: 1, 4, 16, 32 and 52 weeks old (Fig 4B). Moreover, the experimental design included 5 replicate experimental samples of cells for each age. The goal of the study was to identify TEC subtypes that change in frequency during natural ageing.

[Figure 4]

To this end, we first constructed a KNN graph (K=21), before assigning cells to 363 neighbourhoods, which were then used to test for differential abundance of TEC states across time. At a 10% FDR, we identified 208 DA neighbourhoods (101 showed a decreased abundance with age, 107 an increased abundance with age) spanning multiple TEC states (Fig 4C). We compared our results to those generated in the original publication, which demonstrated that we were able to identify all previously identified DA states (Fig 4D), including changes in the abundance of the ‘sTEC’ population, which consisted of just 24 cells. Moreover, whilst we recovered the previously reported accumulation of Intertypical TEC with age, we also identified an additional subset of these cells that were depleted with age (Fig 4C-D).

To understand the function of the sub-state of Intertypical TEC identified using Milo, we first grouped the DA neighbourhoods with overlapping cells and concordant DA fold-change (Fig 4E, Methods), and then identified marker genes between the neighbourhood groups corresponding to Intertypical TEC enriched or depleted in younger mice (FDR 10%; Fig 4F). This analysis indicated that the cells from younger mice up-regulated multiple cytokine response genes (e.g. Stat1, Stat4, Aff3), illustrated by the enriched Gene Ontology term GO:0034097 ‘response to cytokine’ (enrichment adjusted p-value=0.047). Cytokine signaling is key to mTEC differentiation [22,23], suggesting that these TEC from younger mice might be differentiating more efficiently to the mTEC lineage.
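The grouping step can be sketched as follows, assuming a binary cell-by-neighbourhood membership matrix. Here two DA neighbourhoods are linked if they share at least `min_overlap` cells and their log fold-changes have the same sign, and groups are the connected components of the resulting graph. This connected-components rule is a simplification chosen for illustration and may differ from the procedure described in the Methods; names and thresholds are hypothetical.

```python
import numpy as np


def group_da_neighbourhoods(membership, logfc, is_da, min_overlap=1):
    """Group DA neighbourhoods that share cells and have concordant logFC sign.

    membership  : (n_cells, n_nhoods) 0/1 matrix, 1 if a cell is in a neighbourhood
    logfc       : per-neighbourhood log fold-change
    is_da       : boolean mask of neighbourhoods passing the spatial FDR threshold
    min_overlap : hypothetical minimum number of shared cells to link two neighbourhoods
    Returns an integer group label per neighbourhood (-1 for non-DA neighbourhoods).
    """
    M = np.asarray(membership, dtype=int)
    logfc = np.asarray(logfc, dtype=float)
    is_da = np.asarray(is_da, dtype=bool)
    overlap = M.T @ M  # number of cells shared by each pair of neighbourhoods
    same_sign = np.sign(logfc)[:, None] == np.sign(logfc)[None, :]
    adj = (overlap >= min_overlap) & same_sign & is_da[:, None] & is_da[None, :]
    groups = np.full(len(logfc), -1)
    next_label = 0
    for start in np.flatnonzero(is_da):
        if groups[start] != -1:
            continue
        # Depth-first traversal labels the whole connected component
        stack = [start]
        groups[start] = next_label
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i]):
                if groups[j] == -1:
                    groups[j] = next_label
                    stack.append(j)
        next_label += 1
    return groups
```

Requiring a concordant sign keeps neighbourhoods enriched in one condition from being merged with overlapping neighbourhoods depleted in it, which is what allows opposing sub-states within one cluster to be separated.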

Independent evidence that a subpopulation of Intertypical TEC is depleted with age was previously described by Baran-Gale et al. [3], using a dataset comprising ~90,000 single-cell transcriptomes (profiled with the 10X Genomics platform) coupled with lineage tracing. To further validate the robustness of our observation in the small SMART-seq based dataset, we first integrated the SMART-seq and droplet scRNA-seq datasets using the mutual nearest neighbours (MNN) algorithm [19]. To identify the subpopulations in the droplet scRNA-seq data that are transcriptionally similar to the cells in the neighbourhood groups identified by Milo, we transferred neighbourhood group labels from the SMART-seq data onto the droplet scRNA-seq dataset (Fig 4G-I, Supp Fig 13A, Methods). Using these groups, we examined how these subpopulations vary across ages, by calculating the proportions of cells assigned to each neighbourhood group across ages (Fig 4H, Supp Fig 13B). This analysis revealed that the cells transcriptionally similar to the mTEC-biased Intertypical TEC (‘Neighbourhood Group 5’) were indeed depleted in older mice, providing independent validation of our findings in a dataset comprising ~90,000 single cells (Fig 4I). In summary, these analyses demonstrate the sensitivity of Milo by identifying that an mTEC progenitor state is depleted with age, a finding that was not resolved using clustering approaches.

Milo identifies compositional disorder in cirrhotic human liver

To demonstrate the applicability of our method in multiple biological contexts, we next applied Milo to a dataset of 58358 hepatic cells isolated from 5 healthy and 5 cirrhotic human livers [2]. The original study assigned cells to multiple lineages, including immune, endothelial and mesenchymal cells (Fig 5A-B). A key goal of the study was to ask whether different cell types were differentially abundant between experimental samples taken from healthy and cirrhotic tissue. In the original study, cells from each lineage were sub-clustered and these sub-clusters were interrogated using a Poisson GLM to ask whether there were differential contributions from cirrhotic and healthy donors.

[Figure 5]

To explore whether more subtle differences could be detected, we applied Milo, identifying 2677 neighbourhoods spanning the KNN graph, of which 1351 showed evidence of differential abundance (10% FDR; Fig 5C). To assess performance, we compared DA results with those from the compositional analysis performed by Ramachandran et al. [2]. Milo recovered DA neighbourhoods in all clusters identified as differentially abundant between cirrhotic and uninjured tissue in the original study (Fig 5D).

Moreover, Milo identified multiple groups of neighbourhoods within the same pre-defined subclusters that showed opposing directions of differential abundance between the control and cirrhotic liver experimental samples (Fig 5D). In other words, within a sub-cluster, some neighbourhoods were enriched for control experimental samples whilst others were enriched for disease experimental samples. These patterns, exemplified by the T cell (2) and the endothelial (5) compartments, were obscured in the previous study due to the reliance on pre-clustering (Fig 5D).

To further explore the biological meaning of these neighbourhoods, we first focused on the hepatic endothelial cells, where we resolved disease-specific subpopulations at higher resolution than was possible by clustering-based analysis (Fig 5D). Milo identified a gradient of changes in neighbourhood abundance across this compartment, suggestive of a continuous transition between healthy and diseased cell states (Fig 5E). To identify gene expression signatures associated with this change, we performed differential expression analysis between cells in DA neighbourhoods with positive and negative log fold changes, identifying 788 differentially expressed genes (FDR 5%; Methods) (Fig 5F). In the cirrhosis-enriched neighbourhoods, we recovered over-expression of known markers of scar-associated endothelium, including ACKR1 and PLVAP (Fig 5F) [2]. We also recovered over-expression of genes associated with regulation of leukocyte recruitment, confirming the validated immunomodulatory phenotype displayed by scar-associated tissue (Supp Fig 14A) [24]. In addition, cirrhotic endothelium displayed a down-regulation of genes involved in response to infection, endocytosis and immune complex clearance, including FCN2 and FCN3 (Supp Fig 14B), which has been suggested as an additional component of cirrhosis-associated immune dysfunction [25,26].

Milo also identified strong DA between healthy and cirrhotic cells in lineages that were unexplored in the original study, such as the cholangiocyte compartment (Fig 5D). Cholangiocytes are epithelial cells that line a three-dimensional network of bile ducts known as the biliary tree, and cholangiocyte proliferation can be induced by a broad range of liver injuries, in a process termed the ductular reaction [27]. However, the gene signatures associated with this process in human cirrhosis are largely unexplored. Milo recovered an enrichment of disease-specific cholangiocytes (Supp Fig 15A-B). Performing differential gene expression analysis restricted to this subset, we detected over-expression of genes associated with fibrosis, wound healing and angiogenesis (Supp Fig 15C-D), which has been shown to accompany the ductular reaction [28,29].

These analyses demonstrate the potential of using DA subpopulations detected by Milo to recover known and novel signatures of disease-specific cell states.

Discussion

Given the increasing number of complex single-cell datasets where multiple conditions are assayed [20,21], Milo tackles a key problem: determining sets of cells that are differentially abundant between conditions without relying on pre-existing sets of clusters. Moreover, Milo is fully interoperable with established single-cell analysis workflows and is implemented as an open-source R software package [30] with documentation and tutorials at https://github.com/MarioniLab/miloR.

The definition of neighbourhoods, as implemented in Milo, overcomes the main limitations of standard-of-practice clustering-based DA analysis, whilst using a common data-structure in single-cell analysis - graphs. A strength of our approach is that it is applicable to a wide range of datasets with different topologies, including gradual state transitions, thus removing the need for time-consuming iterative sub-clustering, and identifying subtle differences in differential abundance that would otherwise be obscured (Fig 5D).

Recently, other clustering-free methods have been proposed to detect compositional differences between experimental conditions [7,8]. However, these methods do not exploit the replication structure in the experimental design to account for technical variability between samples. In addition, MELD and DAseq are designed for pairwise comparisons between two biological conditions, and cannot be easily extended to detect differential abundance with complex experimental designs, including continuous variables (age, time points), multifactorial conditions or nuisance covariates. By modelling cell counts with a NB GLM, Milo can incorporate arbitrarily complex experimental designs, as demonstrated by our application of Milo to detect compositional changes in the ageing mouse thymus (Fig 4). This flexibility is illustrated by the ability of Milo to account appropriately for batch effects of varying magnitude, controlling false discoveries in a way that is not possible with other DA testing algorithms (Fig 2E).

Although we have addressed several challenges, a number of qualifiers should be considered when performing DA analysis with Milo. First, reliable results from DA testing depend on well-designed single-cell experiments. Biological replicates are required to estimate the negative binomial dispersion parameter for each neighbourhood. Moreover, when considering experimental design, it is vital to avoid complete confounding between technical sources of variation and experimental variables of interest, including the number of cells acquired for each condition. While we have shown that nuisance effects can be minimized by applying batch integration prior to graph building (Supp Fig 9) and by incorporating known confounders in the testing framework (Fig 2E, Supp Fig 11), these strategies can lead to loss of biological signal if the condition of interest and the confounders are strongly correlated. Secondly, cells in a single neighbourhood do not necessarily represent a unique biological subpopulation; a cellular state might span multiple neighbourhoods. Accordingly, we search for marker genes of DA states by aggregating cells in adjacent and concordantly DA neighbourhoods (Fig 4E, 5F). One challenge of this approach is that rare cell states may be represented by a small subset of neighbourhoods, thus making identification of marker genes challenging. To overcome this problem one can either choose a smaller value of K or alternatively construct a graph on cells from a particular lineage of interest. Thirdly, there will be cases where DA analysis on clusters or predefined cell populations will be preferable, for example where differences are apparent in large clusters by visualization, or to compare abundances of pre-annotated cell populations across datasets without requiring integration of single-cells on a common manifold. Alternative methods that model cell type proportions might be more suitable for these applications [31].

Following the generation of reference single-cell atlases for multiple organisms and tissues, an increasing number of studies now focus on quantifying how cell populations are perturbed in disease, ageing, and development, using, for example, large-scale pooled CRISPR screens [32–34]. We envision that Milo will see use in all of these contexts. Milo may also be applicable to single-cell assays other than scRNA-seq, including multi-omic assays [35–39]. Thus, Milo has the potential to facilitate the discovery of fundamental biological and clinically relevant processes across multiple layers of molecular regulation when they are assayed at single-cell resolution.

Methods

Milo

Milo detects sets of cells that are differentially abundant between conditions by modelling counts of cells in neighbourhoods of a KNN graph. The workflow includes the following steps:

(A). Construction of the KNN graph

Milo uses a KNN graph computed based on similarities in gene expression space as a representation of the phenotypic manifold on which cells lie. To construct a KNN graph, we follow best practices in single-cell analysis [40] by re-scaling UMI counts by per-cell sequencing depth, applying log-transformation and projecting the gene expression matrix of M cells onto the d leading principal components (PCs). Then we construct the KNN graph by calculating the Euclidean distance between each cell and its K nearest neighbours in PC space. We provide practical guidance for selection of d and K in the Supplementary Notes.
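The graph construction described above can be sketched in Python with NumPy. This is an illustrative sketch under stated assumptions, not the miloR implementation: it assumes a dense cells × genes UMI matrix with no zero-depth cells, a hypothetical scale factor of 10,000 for depth normalisation, an exact SVD in place of a randomized PCA, and a brute-force distance matrix that is only practical for small datasets. `build_knn_graph` is a hypothetical name.

```python
import numpy as np

def build_knn_graph(umi, d=30, k=20, scale=1e4):
    """Return (pcs, knn_idx, knn_dist) for a cells x genes UMI count matrix.

    Hypothetical helper: depth-normalise, log-transform, project onto the
    d leading PCs and find each cell's k nearest neighbours (Euclidean).
    """
    umi = np.asarray(umi, dtype=float)
    depth = umi.sum(axis=1, keepdims=True)        # per-cell sequencing depth
    logx = np.log1p(umi / depth * scale)          # depth-rescaled, log-transformed
    centred = logx - logx.mean(axis=0)            # centre genes before PCA
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    pcs = u[:, :d] * s[:d]                        # cell coordinates on the d leading PCs
    # Pairwise squared Euclidean distances in PC space (O(M^2) memory).
    sq = (pcs ** 2).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T, 0.0)
    np.fill_diagonal(dist2, np.inf)               # a cell is not its own neighbour
    knn_idx = np.argsort(dist2, axis=1)[:, :k]    # indices of the k nearest cells
    knn_dist = np.sqrt(np.take_along_axis(dist2, knn_idx, axis=1))
    return pcs, knn_idx, knn_dist
```

For atlas-scale data, an approximate nearest-neighbour search would replace the dense distance matrix; the steps (normalise, log, PCA, KNN) stay the same.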

We assume that the KNN graph is a faithful representation of the single-cell phenotypes, where cell-cell similarities are driven by true biological effects rather than technical/batch effects. While technical covariates can be accounted for in the experimental design of the DA test, we recommend mitigating batch effects between cells from different experimental batches prior to graph building to maximize the power of DA testing. A large number of methods to integrate single cells from different experimental samples have been reviewed and benchmarked in [16–18]. We provide further practical considerations on how to account for batch effects in the Supplementary Notes.

(B). Definition of cell neighbourhoods

We define the neighbourhood $n_i$ of cell $c_i$ as the group of cells that are connected to $c_i$ by an edge in the graph. We refer to $c_i$ as the index of the neighbourhood. To define a representative subset of neighbourhoods that span the whole KNN graph, we implement a previously developed algorithm to sample the index cells in a graph [9,41] (see Supplementary Note 1.1.2 for a detailed description).
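Given a KNN graph stored as an array of neighbour indices, a neighbourhood can be sketched as its index cell together with that cell's K nearest neighbours (whether the index cell itself is counted is an assumption here). The sampling function is our reading of refined sampling, assuming it draws a random proportion of cells, computes the median PC profile of each drawn cell's neighbourhood and picks the neighbourhood member closest to that median as the index, keeping unique indices only. Supplementary Note 1.1.2 remains the authoritative description; the function names and the `prop` default are hypothetical.

```python
import numpy as np

def neighbourhood(knn_idx, index_cell):
    """n_i: the index cell plus its K nearest neighbours (index inclusion assumed)."""
    return np.append(knn_idx[index_cell], index_cell)

def refined_sampling(pcs, knn_idx, prop=0.1, seed=0):
    """Hypothetical sketch of refined index-cell sampling."""
    rng = np.random.default_rng(seed)
    n_cells = pcs.shape[0]
    drawn = rng.choice(n_cells, size=max(1, int(prop * n_cells)), replace=False)
    index_cells = set()
    for c in drawn:
        members = neighbourhood(knn_idx, c)
        median_pos = np.median(pcs[members], axis=0)   # median PC profile
        d2 = ((pcs[members] - median_pos) ** 2).sum(axis=1)
        index_cells.add(int(members[np.argmin(d2)]))  # member closest to the median
    return sorted(index_cells)
```

Moving each drawn cell towards the local median tends to merge draws that land in the same region, which is why the number of index cells can be smaller than the number of draws.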

(C). Counting cells in neighbourhoods

For each neighbourhood we count the number of cells from each experimental sample, constructing an N × S count matrix, where N is the number of neighbourhoods and S is the number of experimental samples.
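The counting step can be sketched as follows, assuming neighbourhoods are given as lists of cell indices and each cell carries exactly one sample label (`count_cells` is a hypothetical helper). Because neighbourhoods overlap, a cell contributes to every neighbourhood it belongs to.

```python
import numpy as np

def count_cells(nhoods, sample_of_cell, samples):
    """Build the N x S matrix of cell counts per neighbourhood and sample."""
    col = {s: j for j, s in enumerate(samples)}        # sample label -> column
    counts = np.zeros((len(nhoods), len(samples)), dtype=int)
    for row, members in enumerate(nhoods):
        for cell in members:
            counts[row, col[sample_of_cell[cell]]] += 1
    return counts
```

For example, with neighbourhoods `[[0, 1, 2], [2, 3]]` and cells labelled `["s1", "s1", "s2", "s2"]`, the matrix is `[[2, 1], [0, 2]]`.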

(D). Testing for differential abundance in neighbourhoods

To test for differential abundance, we analyze neighbourhood counts using the quasi-likelihood (QL) method in edgeR, similarly to the implementation in Cydar [6]. We fit a NB GLM to the counts for each neighbourhood, accounting for different numbers of cells across samples using TMM normalisation [12], and use the QL F-test with a specified contrast to compute a P value for each neighbourhood. Further details of the statistical framework are provided in Supplementary Note 1.1.5.
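TMM normalisation is what lets the model compare samples that contribute different numbers of cells. The sketch below is a simplified re-implementation of the trimmed mean of M-values idea from [12], applied to the N × S neighbourhood count matrix. It is not edgeR's `calcNormFactors`: it breaks rank ties arbitrarily, omits edgeR's edge-case handling, and its reference-sample rule and trimming defaults (30% of M values, 5% of A values) are assumptions modelled on edgeR's defaults. The NB GLM fit and QL F-test are not reproduced.

```python
import numpy as np

def _ranks(x):
    """1-based ranks; ties broken by position (edgeR averages tied ranks)."""
    r = np.empty(len(x), dtype=int)
    r[np.argsort(x, kind="stable")] = np.arange(1, len(x) + 1)
    return r

def tmm_factors(counts, log_ratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM factors for an N x S count matrix (one per sample)."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)                       # total counts per sample
    # Reference: sample whose upper quartile is closest to the mean upper quartile.
    uq = np.percentile(counts / lib, 75, axis=0)
    ref = int(np.argmin(np.abs(uq - uq.mean())))
    y_r, n_r = counts[:, ref], lib[ref]
    factors = np.ones(counts.shape[1])
    for j in range(counts.shape[1]):
        y_j, n_j = counts[:, j], lib[j]
        ok = (y_j > 0) & (y_r > 0)                 # log ratios need non-zero counts
        yj, yr = y_j[ok], y_r[ok]
        if yj.size == 0:
            continue
        m = np.log2((yj / n_j) / (yr / n_r))       # M values: log fold changes
        a = 0.5 * np.log2((yj / n_j) * (yr / n_r)) # A values: mean log abundance
        v = (n_j - yj) / (n_j * yj) + (n_r - yr) / (n_r * yr)  # approx. var of M
        n = yj.size
        lo_m = np.floor(n * log_ratio_trim) + 1
        hi_m = n + 1 - lo_m
        lo_a = np.floor(n * sum_trim) + 1
        hi_a = n + 1 - lo_a
        rm, ra = _ranks(m), _ranks(a)
        keep = (rm >= lo_m) & (rm <= hi_m) & (ra >= lo_a) & (ra <= hi_a)
        if keep.any():
            # Precision-weighted mean of the trimmed M values.
            factors[j] = 2 ** (np.sum(m[keep] / v[keep]) / np.sum(1 / v[keep]))
    return factors / np.exp(np.mean(np.log(factors)))  # geometric mean of 1
```

Following the edgeR convention, effective library sizes would then be `counts.sum(axis=0) * tmm_factors(counts)`.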

(E). Controlling the Spatial FDR in neighbourhoods

To control for multiple testing, we adapt the Spatial FDR method introduced by Cydar [6]. The Spatial FDR can be interpreted as the proportion of the union of neighbourhoods that is occupied by false-positive neighbourhoods. To control the spatial FDR in the KNN graph, we apply a weighted version of the Benjamini-Hochberg (BH) method, where P values are weighted by the reciprocal of the neighbourhood connectivity. As a measure of neighbourhood connectivity, we use the Euclidean distance to the k-th nearest neighbour of the index cell for each neighbourhood.
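A weighted BH adjustment of this kind can be sketched as follows. We assume the standard weighted form, in which the i-th smallest P value is multiplied by the total weight and divided by the cumulative weight up to rank i, followed by a running minimum from the largest P value downwards. With equal weights this reduces to the ordinary BH procedure. Following the text, weights are the reciprocals of the distances to the k-th nearest neighbour; `spatial_fdr` is a hypothetical name, not the miloR function.

```python
import numpy as np

def spatial_fdr(pvalues, kth_distance):
    """Weighted BH adjustment with weights = 1 / neighbourhood connectivity."""
    p = np.asarray(pvalues, dtype=float)
    w = 1.0 / np.asarray(kth_distance, dtype=float)   # reciprocal of connectivity
    order = np.argsort(p)
    p_sorted, w_sorted = p[order], w[order]
    adj_sorted = w.sum() * p_sorted / np.cumsum(w_sorted)
    # Enforce monotonicity: running minimum from the largest P value down.
    adj_sorted = np.minimum.accumulate(adj_sorted[::-1])[::-1]
    adj = np.empty_like(adj_sorted)
    adj[order] = adj_sorted                            # back to input order
    return np.minimum(adj, 1.0)
```

Because only relative weights matter, rescaling all k-th neighbour distances by a constant leaves the adjusted values unchanged.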

Visualization of DA neighbourhoods

To visualize results from differential analysis on neighbourhoods, we construct an abstracted graph, where nodes represent neighbourhoods and edges represent the number of cells in common between neighbourhoods. The size of nodes represents the number of cells in the neighbourhood. The position of nodes is determined by the position of the sampled index cell in the single-cell UMAP, to allow qualitative comparison with the single cell embedding.
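The abstracted graph can be derived from a binary cells × neighbourhoods membership matrix: its cross-product gives the number of shared cells for every pair of neighbourhoods, the diagonal gives node sizes, and node positions are the UMAP coordinates of the index cells. The sketch assumes neighbourhoods are given as lists of cell indices and a precomputed UMAP array; `nhood_graph` is hypothetical, and drawing the graph is left to a plotting library.

```python
import numpy as np

def nhood_graph(nhoods, n_cells, index_cells, umap):
    """Edge weights (shared cells), node sizes and node positions."""
    membership = np.zeros((n_cells, len(nhoods)), dtype=int)  # cells x nhoods, 0/1
    for j, members in enumerate(nhoods):
        membership[np.asarray(members), j] = 1
    overlap = membership.T @ membership        # [a, b] = cells shared by a and b
    sizes = np.diag(overlap).copy()            # node size = cells in neighbourhood
    np.fill_diagonal(overlap, 0)               # no self-edges
    positions = np.asarray(umap)[index_cells]  # index cells' UMAP coordinates
    return overlap, sizes, positions
```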

Benchmarking of DA methods

To evaluate methods for DA analysis using a ground truth, we applied a semi-synthetic approach where we simulated condition labels on KNN graphs from real and simulated single-cell datasets.

Benchmarking datasets

For benchmarking using in silico generated datasets (Supp Fig 4), we simulated single-cell data representing different trajectory geometries (1D trajectory, branching trajectory) using the R package dyntoy [15], as well as discrete clusters. For benchmarking on the KNN graph from real data, we downloaded the raw count matrix and the batch-corrected PCA matrix for the mouse gastrulation atlas [4] via the R package MouseGastrulationData [42]. We subset the dataset to embryos at developmental stages E7.75 to E8.5 (64,018 cells), and used the batch-corrected PCA representation of the data provided in the package for KNN graph construction.

Generation of Condition probability

To assign cells in a KNN graph to two simulated experimental conditions (C1 and C2) and to define a ground truth for differential abundance, we generate for each cell $x_i$ a probability $P(C1)_i$ of belonging to condition C1. For datasets representing continuous trajectory geometries (1D trajectory, branching trajectory, mouse gastrulation data), we generate $P(C1)_i$ using the following procedure:

  1. We select one cell population q in which to generate the maximum differential abundance between conditions. For the simulated datasets, we use the cell clusters defined by the simulation. For the mouse gastrulation data we use the cell type annotations provided by the publication [4].

  2. We identify the centroid $\bar{x}_q$ of the cell population as the average position of the cells in q in PC space.

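  3. For each cell $x_i$, following the approach taken in fuzzy clustering, we calculate a weighted distance from $\bar{x}_q$ as:
    $$w_i = \frac{1}{\sum_{j=1}^{Q} \left( \frac{\lVert x_i - \bar{x}_q \rVert}{\lVert x_i - \bar{x}_j \rVert} \right)^{\frac{2}{m-1}}}$$

    where the sum runs over the Q cell populations, $\bar{x}_j$ is the centroid of population j, and m is a hyper-parameter controlling the strength of membership to the cell population (the higher m, the weaker the membership). Unless otherwise stated, we used m = 2.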

  4. We use a logistic (inverse-logit) transformation to normalise $w_i$:
    $$w'_i = \frac{1}{1 + e^{-a w_i}}$$

    where a = 0.5 unless otherwise stated.

  5. We then re-scale $w'_i$ between 0.5 and f, with 0.5 < f < 1, to obtain:
    $$P(C1)_i = \frac{w'_i - \min_k w'_k}{\max_k w'_k - \min_k w'_k} \, (f - 0.5) + 0.5$$

    Here $P(C1)_i = 0.5$ indicates the absence of DA (equal probability of being in condition C1 or C2), while f indicates the maximum enrichment of condition C1 in population q (the DA effect size).
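Steps 1 to 5 can be sketched in NumPy as below. This is an illustrative sketch, assuming Euclidean distances in PC space, a negative exponent in the step-4 logistic transform (so that stronger membership of q gives a higher P(C1)), and a hypothetical default effect size f = 0.8; `condition_probability` is not from the original analysis code.

```python
import numpy as np

def condition_probability(pcs, labels, q, m=2.0, a=0.5, f=0.8):
    """Simulated P(C1)_i for every cell, maximal in population q (steps 1-5)."""
    pcs = np.asarray(pcs, dtype=float)
    labels = np.asarray(labels)
    pops = np.unique(labels)
    # Step 2, for every population: centroid = mean PC position of its cells.
    centroids = np.array([pcs[labels == p].mean(axis=0) for p in pops])
    # Euclidean distance of each cell to each centroid (cells x Q).
    dist = np.linalg.norm(pcs[:, None, :] - centroids[None, :, :], axis=2)
    dist = np.maximum(dist, 1e-12)              # guard against division by zero
    d_q = dist[:, np.flatnonzero(pops == q)[0]]
    # Step 3: fuzzy-clustering membership of each cell to population q.
    w = 1.0 / ((d_q[:, None] / dist) ** (2.0 / (m - 1.0))).sum(axis=1)
    # Step 4: logistic transform (negative exponent assumed).
    w = 1.0 / (1.0 + np.exp(-a * w))
    # Step 5: rescale to the interval [0.5, f].
    return (w - w.min()) / (w.max() - w.min()) * (f - 0.5) + 0.5
```

Because step 5 is a min-max rescaling, the most q-like cell always receives exactly f and the least q-like cell exactly 0.5, whatever the values of m and a.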

In the dataset with discrete clusters (Supp Fig 4A), we select one cluster q and assign the same probability of being in condition C1, $P(C1)_i > 0.5$, to all cells in cluster q. To all other cells we assign $P(C1)_i = 0.5$.

Assignment of simulated experimental condition labels

Cells were assigned to one of two condition labels (C1 or C2) by randomly sampling a label for each cell based on the probability $P(C1)_i$. Within each condition, cells were then randomly assigned to one of 3 simulated replicates, resulting in a total of 6 simulated experimental samples.
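Label assignment can be sketched as one Bernoulli draw per cell followed by a replicate draw. We assume replicates are drawn uniformly at random within each condition; the sample naming scheme (e.g. `C1_R2`) and `assign_labels` are hypothetical.

```python
import numpy as np

def assign_labels(p_c1, n_replicates=3, seed=0):
    """Condition per cell from P(C1)_i, then a random replicate per cell."""
    rng = np.random.default_rng(seed)
    p_c1 = np.asarray(p_c1, dtype=float)
    is_c1 = rng.random(p_c1.size) < p_c1                  # Bernoulli draw per cell
    condition = np.where(is_c1, "C1", "C2")
    replicate = rng.integers(1, n_replicates + 1, size=p_c1.size)  # 1..n_replicates
    sample = np.array([f"{c}_R{r}" for c, r in zip(condition, replicate)])
    return condition, sample
```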

Definition of ground-truth for DA testing

To define a region displaying true differential abundance between conditions, we define a probability threshold 0.5 < t < f and assign to each cell a true DA outcome label $o_i$ based on the simulated probability:

$$o_i = \begin{cases} 0 \quad (\text{not DA}) & \text{if } P(C1)_i \le t \\ 1 \quad (\text{enriched in C1}) & \text{if } P(C1)_i > t \end{cases}$$

For datasets representing a continuous trajectory geometry, we set t equal to the 75th percentile of the P(C1) distribution for all simulations for each dataset topology (Supp Fig 3). For the cluster dataset we set t = 0.5.
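The ground-truth rule is a simple threshold. In this sketch, `t` defaults to the 75th percentile of the probabilities passed in; to follow the text for continuous topologies, compute that percentile once over the pooled P(C1) values of all simulations for a topology and pass it explicitly (for the cluster dataset, pass `t=0.5`). `true_da_labels` is a hypothetical helper.

```python
import numpy as np

def true_da_labels(p_c1, t=None):
    """o_i = 1 (enriched in C1) if P(C1)_i > t, else 0 (not DA)."""
    p_c1 = np.asarray(p_c1, dtype=float)
    if t is None:
        t = np.quantile(p_c1, 0.75)   # 75th percentile of the given P(C1) values
    return (p_c1 > t).astype(int)
```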

Simulation of batch effects

To recreate a batch effect with an unbalanced experimental design, we randomly assign experimental samples to 2 simulated batches. We simulate batch effects by generating a random 0-centred Gaussian vector of length d and adding the same vector to the PC profile of all cells in the same batch. We simulate batch effects of increasing magnitude by increasing the standard deviation of the Gaussian vector (from 0 to 1 in steps of 0.25). To demonstrate the effect of in silico batch correction prior to DA analysis (Supp Fig 9), we performed batch correction using the MNN method, as implemented in the R package batchelor by the function fastMNN, using default parameters [19].
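The batch-effect simulation can be sketched as one Gaussian shift vector per batch, added to the PC coordinates of every cell from that batch. The mapping of samples to the 2 batches is assumed to be drawn beforehand by the caller (for example with `rng.choice(["B1", "B2"], size=n_samples)`), and `add_batch_effect` is hypothetical.

```python
import numpy as np

def add_batch_effect(pcs, sample, batch_of_sample, sd, seed=0):
    """Shift the PC profiles of each batch by one 0-centred Gaussian vector."""
    rng = np.random.default_rng(seed)
    pcs = np.asarray(pcs, dtype=float).copy()
    cell_batch = np.array([batch_of_sample[s] for s in sample])
    for b in np.unique(cell_batch):
        shift = rng.normal(0.0, sd, size=pcs.shape[1])  # length d, mean 0, SD sd
        pcs[cell_batch == b] += shift                   # same vector for the whole batch
    return pcs
```

Sweeping `sd` over 0, 0.25, ..., 1 reproduces the increasing batch-effect magnitudes described above.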

Benchmarked methods

We benchmark the performance of Milo against 4 other methods designed to quantify differential abundance in single-cell datasets. We provide details on how each method was run and how we assigned each single cell, i, an outcome label to compare with the true DA label oi:

  1. Louvain: Louvain clustering was performed using the function cluster_louvain from the R package igraph [43]. We tested for differential abundance within clusters with a negative binomial generalized linear model, using the quasi-likelihood method implemented in edgeR and TMM normalisation, thereby replicating the testing framework that Milo applies to neighbourhoods. FDR correction was performed using the Benjamini-Hochberg procedure. We used 10% FDR as the significance threshold to assign an outcome label to each cluster, and then assigned the same outcome label to all cells in that cluster.

  2. Cydar [6]: Cydar constructs hyperspheres in PC space and tests whether the abundance of cells from different conditions varies in each hypersphere; we used the implementation in the Bioconductor package cydar. To select an appropriate value for the radius parameter for each dataset, we examined the distribution of distances from each cell to its nearest neighbours, as recommended by the authors.

  3. DAseq [7]: DAseq computes for each cell a DA score based on the relative prevalence of cells from both biological states in the cell’s neighborhood, using a range of K values. The scores are used as input for a logistic classifier to predict the biological condition of each cell. The method is implemented in https://github.com/KlugerLab/DAseq. To choose a range of K values (k.vector parameter of the getDAcells function), we used the same value of K used for DA analysis with Milo and MELD as the lower limit (representing the smallest number of cells that a user would consider a meaningful region), K = 500 as the upper limit and a step of 50, as used by default. Of note, the authors demonstrate that the upper limit has limited impact on the method’s performance. As recommended by the authors, we classified as significantly enriched or depleted those cells whose absolute DA measure exceeded the maximum DA measure obtained with randomly permuted labels.

  4. Milo: Milo was run as previously described. To convert neighbourhood level outcomes to single-cell level outcomes, we consider the average outcome in neighbourhoods to which a cell belongs.

  5. MELD [8]: MELD estimates the probability density of each sample over a KNN graph, which is then used to quantify the relative likelihood that each cell would be observed in one condition relative to the others. The average probability between all samples in a given condition is taken as the condition probability. We used the functions and tutorials implemented in https://github.com/KrishnaswamyLab/MELD. MELD does not perform any statistical inference; instead, the user selects a threshold on the per-cell likelihoods to define whether a cell is in a DA region.
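For Milo (item 4 above), the conversion from neighbourhood-level to single-cell outcomes can be sketched as averaging binary neighbourhood outcomes over all neighbourhoods that contain a cell. The text does not state how the average is binarised, so the 0.5 cut-off is an assumption, as is the restriction to a single direction of effect (1 = DA, 0 = not DA). Cells that belong to no neighbourhood are labelled 0. `cell_outcomes` is hypothetical.

```python
import numpy as np

def cell_outcomes(nhoods, nhood_outcome, n_cells, threshold=0.5):
    """Average 0/1 neighbourhood outcomes over each cell's neighbourhoods."""
    total = np.zeros(n_cells)
    n_member = np.zeros(n_cells)
    for members, outcome in zip(nhoods, nhood_outcome):
        members = np.asarray(members)
        total[members] += outcome       # sum of outcomes per cell
        n_member[members] += 1          # number of neighbourhoods per cell
    avg = np.divide(total, n_member, out=np.zeros(n_cells), where=n_member > 0)
    return (avg > threshold).astype(int)
```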

Details on parameters used for all benchmarking datasets are provided in Supplementary Table 3.

Evaluation metrics

We evaluate method performance by quantifying the True Positive Rate and False Discovery Rate when comparing the predicted single-cell outcomes to the true DA labels $o_i$. To compare Milo and MELD (Supp Fig 7), we converted the simulated probability to a ground-truth log-fold change (LFC), such that LFC = log(P(C1)/(1 − P(C1))). We used the same formula to convert the MELD-estimated single-cell probabilities to an estimated LFC. We then computed the mean squared error (MSE) between the ground-truth log fold-changes and the estimates generated by Milo and MELD at the neighbourhood index cells.
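The evaluation quantities can be sketched as below. TPR and FDR are computed from binary per-cell labels; the probability-to-LFC conversion follows the formula in the text, assuming a natural logarithm (the base should match that of the estimates being compared). All function names are hypothetical.

```python
import numpy as np

def tpr_fdr(pred, truth):
    """True positive rate and false discovery rate for binary cell labels."""
    pred, truth = np.asarray(pred, dtype=bool), np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return tpr, fdr

def prob_to_lfc(p):
    """LFC = log(P / (1 - P)), natural log assumed."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def mse(estimate, truth):
    """Mean squared error between estimated and ground-truth LFCs."""
    return float(np.mean((np.asarray(estimate) - np.asarray(truth)) ** 2))
```

Note that $P(C1)_i = 0.5$ (no DA) maps to LFC = 0, so true negative neighbourhoods have a ground-truth LFC of zero.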

Scalability analysis

We assessed the scalability of Milo by profiling the time taken to execute the workflow, starting with the KNN graph building step and concluding with the differential abundance testing. We simulated a dataset of 200,000 single cells using the dyntoy package implemented in R [15]. From this large simulation we down-sampled to specific proportions, ranging from 1 to 100%, and recorded the elapsed system time to complete the Milo workflow using the system.time function in R [30]. In addition, we performed an equivalent analysis using the published datasets included in this manuscript: mouse thymus [3], human liver [2], and mouse gastrulation [4]. All timings are reported in minutes.

To assess the memory usage of the Milo workflow we made use of the Rprof function in R to record the total amount of memory used at each step. We followed the same approach as above, down-sampling simulated and published datasets from 1 to 100% of the total cell numbers. All memory usage is reported in megabytes (MB).

For both the system timing and memory usage, we ran the down-sampling analyses of the simulated and published datasets on a single node of the high performance computing (HPC) cluster at the Cancer Research UK - Cambridge Institute. Each node has 2x Intel Xeon E5-2698 2.20 GHz processors with 40 cores per node and 384 GB DDR4 memory; cluster jobs were run using a single core.

Mouse thymus analysis

Single-cell data are available from ArrayExpress (accession E-MTAB-8560); additional metadata, including cluster identity and highly variable genes (HVGs), were acquired from Baran-Gale et al. [3]. The dataset consists of 2327 single thymic epithelial cells that passed QC (see [3] for details). Following the preprocessing steps from the original publication, we used log-normalized gene expression values as input, along with 4906 HVGs, to estimate the first 50 principal components using a randomized PCA implemented in the R package irlba; the first 40 components were used for KNN graph building and UMAP embedding. Refined sampling, using an initial random sample of 30% of cells, identified 363 neighbourhoods. Differential abundance testing used mouse age as a linear predictor variable, so log fold changes are interpreted as the per-week linear change in neighbourhood abundance. Neighbourhood cluster identity was assigned as the most abundant cluster identity amongst neighbourhood cells.

Differential expression (DE) testing was performed on cells within neighbourhoods containing a majority of cells from the Intertypical TEC cluster. Neighbourhoods were first aggregated into groups by constructing a neighbourhood adjacency matrix, where the rows and columns represent the graph neighbourhoods and the elements of the matrix are the number of cells shared between each pair of neighbourhoods. Elements of the adjacency matrix were then censored at 0 where the number of overlapping cells was < 5, and where the difference in log fold change between neighbourhoods was > 0.1. This adjacency matrix, representing a neighbourhood graph, was then used as input to group neighbourhoods using Louvain clustering. DE testing compared the log-normalized gene expression of neighbourhood cells between the enriched and depleted DA neighbourhood groups from the larger Intertypical TEC population (i.e. neighbourhood groups 3 & 4 vs. 5; Fig 4F) using a linear model implemented in the Bioconductor [44,45] package limma [46], at 5% FDR. Gene Ontology Biological Process term analysis was performed on the 407 DE genes (FDR 10%) using the R package enrichR [47].
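The censoring step can be sketched on a precomputed neighbourhood overlap matrix (shared-cell counts) and a vector of neighbourhood log fold changes. We read the two criteria as independent rules, each of which sets an edge to 0; the resulting weighted adjacency matrix would then be passed to a Louvain implementation, which is not shown. `censor_adjacency` is hypothetical.

```python
import numpy as np

def censor_adjacency(overlap, lfc, min_shared=5, max_lfc_diff=0.1):
    """Zero out weak or discordant edges between neighbourhoods."""
    adj = np.asarray(overlap, dtype=float).copy()
    lfc = np.asarray(lfc, dtype=float)
    lfc_diff = np.abs(lfc[:, None] - lfc[None, :])   # pairwise |LFC_a - LFC_b|
    adj[adj < min_shared] = 0                        # fewer than 5 shared cells
    adj[lfc_diff > max_lfc_diff] = 0                 # LFCs differ by more than 0.1
    return adj
```

In the example below, neighbourhoods 0 and 1 stay connected (10 shared cells, LFC difference 0.05), while both edges to neighbourhood 2 are removed.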

Droplet and SMART-seq scRNA-seq datasets were integrated using the MNN approach [19] implemented in the batchelor package function fastMNN (k=60). Neighbourhood group labels were transferred from the SMART-seq cells onto the droplet cells using the following procedure: (i) MNNs were identified between the 2 datasets in the integrated space (30 dimensions), (ii) for each cell in the SMART-seq dataset, the neighbourhood group label was transferred onto the corresponding set of 150 MNNs in the droplet scRNA-seq dataset, (iii) the frequency of each transferred neighbourhood group label (Fig 4I, Supp Fig 13B) was then computed in each experimental replicate and age in the droplet scRNA-seq data (n=3 mice per age).

Liver cirrhosis analysis

The dataset including cell type annotations was downloaded from https://datashare.is.ed.ac.uk/handle/10283/3433 (GEO accession: GSE136103) [2]. The dataset comprises 58,358 cells obtained from 5 healthy and 5 cirrhotic liver samples. We followed the pre-processing steps from the original publication: dimensionality reduction with PCA was performed on the top 3000 highly variable genes (HVGs), calculated using modelGeneVar and getTopHVGs from the R package scran [48], and the top 11 PCs were retained for KNN graph building and UMAP embedding. Refined sampling identified 2676 neighbourhoods (K=30, initial proportion of sampled cells = 0.05). We ran Milo to test for DA between cirrhotic and healthy experimental samples. To assign cell type annotations to neighbourhoods, we took the most frequent annotation among cells in each neighbourhood. Neighbourhoods are generally homogeneous, with on average 80% of cells belonging to the most abundant cell type label.
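Majority-vote annotation of neighbourhoods, together with the fraction of cells carrying the winning label (the homogeneity summary quoted above), can be sketched as follows. Ties are broken by first occurrence, which is an assumption; `annotate_nhoods` is hypothetical.

```python
from collections import Counter

import numpy as np

def annotate_nhoods(nhoods, cell_labels):
    """Most frequent cell label per neighbourhood and its fraction of cells."""
    labels, fractions = [], []
    for members in nhoods:
        counts = Counter(cell_labels[c] for c in members)
        label, n = counts.most_common(1)[0]     # ties: first label encountered
        labels.append(label)
        fractions.append(n / len(members))
    return labels, np.array(fractions)
```

Averaging the returned fractions across neighbourhoods gives the homogeneity figure.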

For the focused analysis of the endothelial and cholangiocyte lineages, DE testing was performed on the subset of neighbourhoods from the selected lineage. Neighbourhoods displaying significant DA were aggregated into 2 groups based on the direction of their log fold change. DE testing was performed by summing the gene expression counts for each experimental sample and neighbourhood group, and comparing the more and less abundant groups using the quasi-likelihood test implemented in edgeR [11] at 5% FDR. GO term analysis was performed on the significant DE genes using the R package clusterProfiler [49].
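The sample-level aggregation before the edgeR test can be sketched as summing raw counts over cells that share an experimental sample and a neighbourhood group. This assumes each cell has already been assigned to a single neighbourhood group; cells in overlapping neighbourhoods would need a rule for this, which the text does not specify. `pseudobulk` is hypothetical.

```python
import numpy as np

def pseudobulk(expr, sample_of_cell, group_of_cell):
    """Sum gene counts over cells sharing (sample, neighbourhood group)."""
    expr = np.asarray(expr)                       # cells x genes raw counts
    samples = np.asarray(sample_of_cell)
    groups = np.asarray(group_of_cell)
    keys = sorted(set(zip(sample_of_cell, group_of_cell)))
    profiles = np.vstack([expr[(samples == s) & (groups == g)].sum(axis=0)
                          for s, g in keys])      # one row per (sample, group)
    return keys, profiles
```

Each row of `profiles` then plays the role of one library in the edgeR quasi-likelihood test.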

Extended Data

Extended Data Fig. 1. Benchmarking DA methods on simulated data.

DA analysis performance on KNN graphs from simulated datasets of different topologies: (A) discrete clusters (2700 cells, 3 populations); (B) 1-D linear trajectory (7500 cells, 7 populations); (C) branching trajectory (7500 cells, 10 populations). Boxplots show the median with interquartile ranges (25–75%); whiskers extend to the most extreme value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.

Extended Data Fig. 2. Sensitivity of DA methods to low fold change in abundance.

(A) True positive rate (TPR, top) and false positive rate (FPR, bottom) of DA methods calculated on cells in different bins of P(C1) used to generate condition labels (bin size = 0.05, the number on the x-axis indicates the lower value in the bin). The results for 36 simulations on 2 representative populations (colors) are shown. The filled points indicate the mean of each P(C1) bin. (B) Variability in Milo power is explained by the fraction of true positive cells close to the DA threshold for definition of ground truth. Example distributions of P(C1) for cells detected as true positives (TP) or false negatives (FN) by Milo. Examples for simulations on 2 populations (rows) and 3 simulated fold changes (columns) are shown. (C-D) True Positive Rate (TPR) of DA detection for simulated DA regions of increasing size centred at the same centroid (Erythroid2 (C) and Caudal neuroectoderm (D)). Results for 3 condition simulations per population and fold change are shown.

Extended Data Fig. 3. Comparison of Milo and MELD for abundance fold change estimation.

(A-D) Scatter-plots of the true fold change at the neighbourhood index against the fold change estimated by Milo (A,C) and MELD (B,D), without batch effect (A-B) and with batch effect (magnitude = 0.5) (C-D), where LFC = log(pc’/(1 - pc’)). The neighbourhoods overlapping true DA cells (pc’ greater than the 75% quantile of P(C1) in the mouse gastrulation dataset) are highlighted in red. (E-F) Mean Squared Error (MSE) comparison for MELD and Milo for true negative neighbourhoods (E) and true positive neighbourhoods (F), with increasing simulated log-Fold Change and magnitude of batch effect. Each boxplot summarises the results for n=27 simulations. Box plots show the median with interquartile ranges (25–75%); whiskers extend to the most extreme value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.

Extended Data Fig. 4. Controlling for batch effects in differential abundance analysis.

(A) In silico batch correction enhances the performance of DA methods in the presence of batch effects: comparison of performance of DA methods with no batch effect, with batch effects of increasing magnitude corrected with MNN, and with uncorrected batch effects. Each boxplot summarises results from simulations on n=9 populations. (B) True Positive Rate (TPR, left) and False Discovery Rate (FDR, right) for recovery of cells in simulated DA regions for DA populations with increasing batch effect magnitude on the mouse gastrulation dataset. For each boxplot, results from 8 populations and 3 condition simulations per population are shown (n=24 simulations). Each panel represents a different DA method and a different simulated log-Fold Change. (C) Comparison of Milo performance with (~ batch + condition) or without (~ condition) accounting for the simulated batch in the NB-GLM. For each boxplot, results from 8 populations, simulated fold change > 1.5 and 3 condition simulations per population and fold change are shown (72 simulations per boxplot). In all panels, boxplots show the median with interquartile ranges (25–75%); whiskers extend to the most extreme value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.

Supplementary Material

Supplementary Figures and Supplementary Information

Acknowledgments

We thank Shila Ghazanfar for feedback on the method; Natsuhiko Kumasaka for comments on the manuscript; Chenqu Suo, Veronika Kedlian, Rasa Elmentaite, Jan Patrick Pett, Kelvin Tuong and Benjamin Stewart for feedback on the software package; Daniel Burkhardt, Malte Luecken and Wesley Lewis for discussions on benchmarking. JCM acknowledges core funding from EMBL and core funding from Cancer Research UK (C9545/A29580), which supports MDM. ED and SAT acknowledge Wellcome Sanger core funding (WT206194). NCH is supported by a Wellcome Trust Senior Research Fellowship in Clinical Science (ref. 219542/Z/19/Z), Medical Research Council, and a Chan Zuckerberg Initiative Seed Network Grant.

Footnotes

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Author contributions:

E.D., M.D.M. & J.C.M. conceived the method idea. E.D. & M.D.M. developed the method, wrote the code and performed analyses. E.D., M.D.M., S.A.T. & N.C.H. interpreted the results. E.D., M.D.M., S.A.T, N.C.H. & J.C.M. wrote and approved the manuscript. M.D.M. & J.C.M. oversaw the project.

Competing Interests:

Code availability

Milo is implemented as an open-source package in R: https://github.com/MarioniLab/miloR, and is installable from Bioconductor (≥3.13; http://www.bioconductor.org/packages/release/bioc/html/miloR.html). Code used to generate figures and perform analyses can be found at https://github.com/MarioniLab/milo_analysis_2020.

References

  • 1.Kiselev VY, Andrews TS, Hemberg M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nature Reviews Genetics. 2019;20:273–282. doi: 10.1038/s41576-018-0088-9. [DOI] [PubMed] [Google Scholar]
  • 2.Ramachandran P, Dobie R, Wilson-Kanamori JR, Dora EF, Henderson BEP, Luu NT, et al. Resolving the fibrotic niche of human liver cirrhosis at single-cell level. Nature. 2019;575:512–518. doi: 10.1038/s41586-019-1631-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Baran-Gale J, Morgan MD, Maio S, Dhalla F, Calvo-Asensio I, Deadman ME, et al. Ageing compromises mouse thymus function and remodels epithelial cell differentiation. eLife. 2020;9 doi: 10.7554/eLife.56221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Pijuan-Sala B, Griffiths JA, Guibentif C, Hiscock TW, Jawaid W, Calero-Nieto FJ, et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature. 2019;566:490–495. doi: 10.1038/s41586-019-0933-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Haber AL, Biton M, Rogel N, Herbst RH, Shekhar K, Smillie C, et al. A single-cell survey of the small intestinal epithelium. Nature. 2017;551:333–339. doi: 10.1038/nature24489. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Lun ATL, Richard AC, Marioni JC. Testing for differential abundance in mass cytometry data. Nature Methods. 2017;14:707–709. doi: 10.1038/nmeth.4295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Zhao J, Jaffe A, Li H, Lindenbaum O, Sefik E, Jackson R, et al. Detection of differentially abundant cell subpopulations discriminates biological states in scRNA-seq data. Bioinformatics. 2019 Jul; doi: 10.1073/pnas.2100293118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Burkhardt DB, Stanley JS, Tong A, Perdigoto AL, Gigante SA, Herold KC, et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat Biotechnol. 2021 doi: 10.1038/s41587-020-00803-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Gut G, Tadmor MD, Pe’er D, Pelkmans L, Liberali P. Trajectories of cell-cycle progression from fixed cell populations. Nature Methods. 2015;12:951–954. doi: 10.1038/nmeth.3545. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research. 2012;40:4288–4297. doi: 10.1093/nar/gks042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. doi: 10.1186/gb-2010-11-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Benjamini Y, Hochberg Y. Multiple Hypotheses Testing with Weights. Scandinavian Journal of Statistics. 1997;24:407–418. doi: 10.1111/1467-9469.00072. [DOI] [Google Scholar]
  • 14.Soneson C, Robinson MD. Bias, robustness and scalability in single-cell differential expression analysis. Nature Methods. 2018;15:255–261. doi: 10.1038/nmeth.4612. [DOI] [PubMed] [Google Scholar]
  • 15.Cannoodt R, Saelens W, Deconinck L, Saeys Y. dyngen: a multi-modal simulator for spearheading new single-cell omics analyses. Bioinformatics. 2020 Feb; doi: 10.1038/s41467-021-24152-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Luecken M, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller M, et al. Benchmarking atlas-level data integration in single-cell genomics. Bioinformatics. 2020 May; doi: 10.1038/s41592-021-01336-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Tran HTN, Ang KS, Chevrier M, Zhang X, Lee NYS, Goh M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 2020;21:12. doi: 10.1186/s13059-019-1850-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Chazarra-Gil R, van Dongen S, Kiselev VY, Hemberg M. Flexible comparison of batch correction methods for single-cell RNA-seq using BatchBench. Bioinformatics. 2020 May; doi: 10.1093/nar/gkab004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology. 2018;36:421–427. doi: 10.1038/nbt.4091. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Stoeckius M, Zheng S, Houck-Loomis B, Hao S, Yeung BZ, Mauck WM, et al. Cell Hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biology. 2018;19 doi: 10.1186/s13059-018-1603-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.McGinnis CS, Patterson DM, Winkler J, Conrad DN, Hein MY, Srivastava V, et al. MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nature Methods. 2019;16:619–626. doi: 10.1038/s41592-019-0433-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Akiyama T, Shimo Y, Yanai H, Qin J, Ohshima D, Maruyama Y, et al. The Tumor Necrosis Factor Family Receptors RANK and CD40 Cooperatively Establish the Thymic Medullary Microenvironment and Self-Tolerance. Immunity. 2008;29:423–437. doi: 10.1016/j.immuni.2008.06.015. [DOI] [PubMed] [Google Scholar]
  • 23.Hikosaka Y, Nitta T, Ohigashi I, Yano K, Ishimaru N, Hayashi Y, et al. The Cytokine RANKL Produced by Positively Selected Thymocytes Fosters Medullary Thymic Epithelial Cells that Express Autoimmune Regulator. Immunity. 2008;29:438–450. doi: 10.1016/j.immuni.2008.06.018. [DOI] [PubMed] [Google Scholar]
  • 24.Wilkinson AL, Qurashi M, Shetty S. The Role of Sinusoidal Endothelial Cells in the Axis of Inflammation and Cancer Within the Liver. Frontiers in Physiology. 2020;11 doi: 10.3389/fphys.2020.00990. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Foldi I, Tornai T, Tornai D, Sipeki N, Vitalis Z, Tornai I, et al. Lectin-complement pathway molecules are decreased in patients with cirrhosis and constitute the risk of bacterial infections. Liver International. 2017;37:1023–1031. doi: 10.1111/liv.13368. [DOI] [PubMed] [Google Scholar]
  • 26.Ganesan LP, Kim J, Wu Y, Mohanty S, Phillips GS, Birmingham DJ, et al. FcγRIIb on Liver Sinusoidal Endothelium Clears Small Immune Complexes. The Journal of Immunology. 2012;189:4981–4988. doi: 10.4049/jimmunol.1202017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Sato K, Marzioni M, Meng F, Francis H, Glaser S, Alpini G. Ductular Reaction in Liver Diseases: Pathological Mechanisms and Translational Significances: Liver Injury and Regeneration. Hepatology. 2019;69:420–430. doi: 10.1002/hep.30150. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Morell CM, Fabris L, Strazzabosco M. Vascular biology of the biliary epithelium: Biliary epithelium vascular biology. J Gastroenterol Hepatol. 2013;28:26–32. doi: 10.1111/jgh.12022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Mariotti V, Fiorotto R, Cadamuro M, Fabris L, Strazzabosco M. New insights on the role of vascular endothelial growth factor in biliary pathophysiology. JHEP Reports. 2021;3:100251. doi: 10.1016/j.jhepr.2021.100251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2017. Available: https://www.R-projectorg. [Google Scholar]
  • 31.Büttner M, Ostner J, Müller Cl, Theis Fj, Schubert B. scCODA: A Bayesian model for compositional single-cell data analysis. Bioinformatics. 2020 Dec; doi: 10.1038/s41467-021-27150-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell. 2016;167:1853–1866.:e17. doi: 10.1016/j.cell.2016.11.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, et al. Pooled CRISPR screening with single-cell transcriptome readout. Nature Methods. 2017;14:297–301. doi: 10.1038/nmeth.4177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Jaitin DA, Weiner A, Yofe I, Lara-Astiaso D, Keren-Shaul H, David E, et al. Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq. Cell. 2016;167:1883–1896.e15. doi: 10.1016/j.cell.2016.11.039.
  • 35. Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, et al. Simultaneous epitope and transcriptome measurement in single cells. Nature Methods. 2017;14:865–868. doi: 10.1038/nmeth.4380.
  • 36. Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361:1380–1385. doi: 10.1126/science.aau0730.
  • 37. Chen S, Lake BB, Zhang K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nature Biotechnology. 2019;37:1452–1457. doi: 10.1038/s41587-019-0290-0.
  • 38. Zhu C, Yu M, Huang H, Juric I, Abnousi A, Hu R, et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nature Structural & Molecular Biology. 2019;26:1063–1070. doi: 10.1038/s41594-019-0323-x.
  • 39. Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, et al. Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin. Cell. 2020;183:1103–1116.e20. doi: 10.1016/j.cell.2020.09.056.
  • 40. Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019;15 doi: 10.15252/msb.20188746.
  • 41. Setty M, Tadmor MD, Reich-Zeliger S, Angel O, Salame TM, Kathail P, et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nature Biotechnology. 2016;34:637–645. doi: 10.1038/nbt.3569.
  • 42. Griffiths J, Lun A. MouseGastrulationData: Single-Cell Transcriptomics Data across Mouse Gastrulation and Early Organogenesis. 2020. Available: https://github.com/MarioniLab/MouseGastrulationData.
  • 43. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal, Complex Systems. 2006;1695:1–9.
  • 44. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods. 2015;12:115–121. doi: 10.1038/nmeth.3252.
  • 45. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
  • 46. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015;43:e47. doi: 10.1093/nar/gkv007.
  • 47. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research. 2016;44:W90–W97. doi: 10.1093/nar/gkw377.
  • 48. Lun ATL, McCarthy DJ, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Research. 2016;5:2122. doi: 10.12688/f1000research.9501.2.
  • 49. Yu G, Wang L-G, Han Y, He Q-Y. clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS: A Journal of Integrative Biology. 2012;16:284–287. doi: 10.1089/omi.2011.0118.

Supplementary Materials

Supplementary Figures and Supplementary Information

Data Availability Statement

Milo is implemented as an open-source package in R: https://github.com/MarioniLab/miloR, and is installable from Bioconductor (≥3.13; http://www.bioconductor.org/packages/release/bioc/html/miloR.html). Code used to generate figures and perform analyses can be found at https://github.com/MarioniLab/milo_analysis_2020.
