BFF and cellhashR: analysis tools for accurate demultiplexing of cell hashing data

Gregory J Boggy; G W McElfresh; Eisa Mahyari; Abigail B Ventura; Scott G Hansen; Louis J Picker; Benjamin N Bimber

Bioinformatics. 2022 Apr 8;38(10):2791–2801. doi: 10.1093/bioinformatics/btac213

Editor: Anthony Mathelier

PMCID: PMC9113275 PMID: 35561167

Abstract

Motivation

Single-cell sequencing methods provide previously impossible resolution into the transcriptome of individual cells. Cell hashing reduces single-cell sequencing costs by increasing capacity on droplet-based platforms. Cell hashing methods rely on demultiplexing algorithms to accurately classify droplets; however, assumptions underlying these algorithms limit accuracy of demultiplexing, ultimately impacting the quality of single-cell sequencing analyses.

Results

We present Bimodal Flexible Fitting (BFF) demultiplexing algorithms BFF_cluster and BFF_raw, a novel class of algorithms that rely on the single inviolable assumption that barcode count distributions are bimodal. We integrated these and other algorithms into cellhashR, a new R package that provides integrated QC and a single command to execute and compare multiple demultiplexing algorithms. We demonstrate that BFF_cluster demultiplexing is both tunable and insensitive to issues with poorly behaved data that can confound other algorithms. Using two well-characterized reference datasets, we demonstrate that demultiplexing with BFF algorithms is accurate and consistent for both well-behaved and poorly behaved input data.

Availability and implementation

cellhashR is available as an R package at https://github.com/BimberLab/cellhashR. cellhashR version 1.0.3 was used for the analyses in this manuscript and is archived on Zenodo at https://www.doi.org/10.5281/zenodo.6402477.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Single-cell sequencing has revolutionized the study of biology by enabling the analysis of gene expression within individual cells at scale (Svensson et al., 2018). In droplet-based single-cell RNA-seq (scRNA-seq) assays (Klein et al., 2015; Macosko et al., 2015; Zheng et al., 2017), individual cells are mixed with beads that contain barcoded capture oligos and encapsulated in droplets (Salomon et al., 2019). Within the droplets, cDNA synthesis is performed in a massively parallel manner, resulting in barcoded cDNA representing the transcriptome of each cell (Macosko et al., 2015). These barcoded molecules can be pooled and processed in bulk, providing an efficient way to generate RNA-seq libraries from thousands of cells in a single experiment. This approach has been extended to allow simultaneous acquisition of surface protein data (Luo et al., 2020; Peterson et al., 2017; Stoeckius et al., 2017), TCR/BCR clonotype (Canzar et al., 2017; Carter et al., 2019; De Simone et al., 2018; Goldstein et al., 2019; Redmond et al., 2016; Singh et al., 2019) and even chromatin accessibility data (Buenrostro et al., 2015; Cao et al., 2018; Muto et al., 2021; Satpathy et al., 2019; Swanson et al., 2021). While extremely powerful, these technologies have limitations. A primary barrier is expense: each lane of droplet-based methods can cost thousands of dollars (Ziegenhain et al., 2017). Further, with droplet-based methods it is possible for a single droplet to contain more than one cell (usually two cells, hence the term doublets), which occurs with a frequency proportional to the cell density loaded into the droplet-generating instrument (Bloom, 2018). Doublets confound downstream analysis of single-cell sequencing assays (Ilicic et al., 2016; Macaulay et al., 2017), so for accurate analysis, doublets need to be either avoided by limiting sample density (Bloom, 2018) or identified and removed from data (Bais and Kostka, 2020; DePasquale et al., 2019).

Cell hashing is the term given to methods that allow labeling and identification of samples or cell subsets within single-cell RNA-seq experiments. Prior to loading, cells are labeled with a reagent that tags cells with a barcoded oligo, termed the hashtag oligo (HTO; Stoeckius et al., 2018). Cell hashing methods have been published that use barcoded antibodies that bind surface proteins (Stoeckius et al., 2018), barcoded lipids that insert into the cell membrane (McGinnis et al., 2019) or barcoded protein complexes that bind glycoproteins embedded in the cell or nuclear membrane (Fang et al., 2021). After labeling, samples can be pooled and processed for droplet-based sequencing. The barcoded oligos are tethered to the cell, typically contain PCR handles compatible with the standard RNA-seq chemistry, and are therefore processed alongside the cellular mRNA (Picelli, 2017). Cell hashing can serve two functions. If a given sample does not require the full capacity of the scRNA-seq lane, multiple samples can be pooled and run in parallel, reducing the cost per sample (Gaublomme et al., 2019). Cell hashing can also aid in detection of doublets, whether or not multiple samples are pooled. If the input sample or samples are partitioned and labeled with multiple unique HTOs, then doublets are likely to contain cells with distinct HTOs, which can be detected during HTO demultiplexing (Stoeckius et al., 2018).

While cell hashing can be extremely useful, it has pitfalls. Sequencing of the cell hashing library produces a count matrix with the number of reads detected for each HTO from each cell. While in theory each cell should only have counts from the HTO barcode used to label it, due to cross-contamination after pooling, artifacts in library construction, or sequencing errors, there is the potential for cells to be labeled to varying degrees with other barcodes present in the assay (Xin et al., 2020). In order to assign cells to samples, a demultiplexing algorithm analyzes cell hashing data (i.e. sample barcode counts in every droplet). The droplets can be classified as negative (unable to score), singlet (assigned to one sample) or multiplet (having cells from two or more samples).

Multiple demultiplexing algorithms have been developed in the past three years (Lun et al., 2019; McGinnis et al., 2019; Stoeckius et al., 2018; Xin et al., 2020), each of which relies on its own set of assumptions about how cell hashing data behave. When these assumptions are violated, which can occur when analyzing real-world data, a demultiplexing algorithm’s analysis can be corrupted, yielding inaccurate droplet classifications. For example, Seurat’s HTODemux assumes that positive and negative count distributions can be defined based on non-deterministic k-medoids clustering, while deMULTIplex assumes that, of all local maxima in a fitted barcode count distribution, the maximum at the largest count value defines the positive peak and the maximum with the highest peak value defines the negative peak of the barcode’s bimodal count distribution. Both of these heuristics for defining count distributions have been shown to be unreliable: HTODemux classification produces inconsistent results between runs on the same data, and deMULTIplex classification fails in cases where there are more positive counts than negative counts for some HTOs (Xin et al., 2020). DropletUtils assumes that positive counts are a specified minimum log-fold change above the ambient count values obtained from cell hashing data, making performance of the algorithm heavily dependent on the correct choice of this parameter (i.e. the algorithm assumes that this parameter is chosen correctly). GMM-Demux assumes that the bimodal normalized count distribution can be modeled by a Gaussian Mixture Model. What all demultiplexing algorithms have in common is that they classify cell hashing data based on a bimodal model, where every barcode is expected to have a barcode count distribution with a peak that represents ‘positive’ droplets (indicating that the droplet can be assigned to the sample represented by the barcode) and a peak that represents ‘negative’ droplets, which is composed primarily of droplets that are positive for a different barcode. When an algorithm has difficulty applying its model to poorly behaved data (e.g. if there is poor separation between positive and negative peaks), the resulting analysis can be corrupted, yielding inaccurate results.

Here, we report the development of a novel class of demultiplexing algorithms, called Bimodal Flexible Fitting (BFF), which rely on a single inviolable assumption: that a sample barcode count distribution is bimodal. We present the algorithms BFF_cluster and BFF_raw, which are implemented in our automated cell hashing analysis R-package cellhashR, and benchmark their accuracy and consistency against multiple other demultiplexing algorithms. Using previously characterized datasets, we show that, in contrast to other demultiplexing algorithms tested, BFF algorithms are accurate and consistent for demultiplexing both well-behaved and poorly behaved input data.

2 System and methods

2.1 Barnyard dataset

To develop and evaluate BFF, we used a previously published dataset, referred to here as the Barnyard Dataset [obtained from McGinnis et al. (2019)]. These data come from a proof-of-concept experiment intended to demonstrate feasibility of using lipid-modified-oligos (LMOs) and cholesterol-modified-oligos (CMOs) for cell hashing of single-nucleus RNA sequencing (snRNA) experiments. The experiment was a version of the ‘barnyard experiment’ where samples to be multiplexed come from different species (in this case, 4848 human cells and 1046 mouse cells). Three separate cell lines were used in the experiment: human cells consisted of Human Embryonic Kidney (HEK) cells and Jurkat T cells; mouse cells consisted of Mouse Embryonic Fibroblast (MEF) cells. The Jurkat cells were involved in a separate time-course experiment, where each barcode (LMO barcodes Bar5-Bar12, sequentially) represented a separate time point. LMO-labeled nuclei from Jurkats were mixed with LMO- and CMO-labeled nuclei from MEF and HEK cells (Bar1 = LMO-HEK, Bar2 = CMO-HEK, Bar3 = LMO-MEF, Bar4 = CMO-MEF) prior to loading the nuclei onto a droplet generating instrument for creation of the multiplexed sequencing library. After sequencing, ‘ground truth’ labels were generated by analyzing expression of marker genes, specific to each cell-type, in order to classify cells by their cell-type.

2.2 Human PBMC dataset

To further evaluate BFF performance, we benchmarked the performance of demultiplexing algorithms on a previously published dataset (referred to as the PBMC Dataset) consisting of PBMC samples obtained from eight human donors [data obtained from Stoeckius et al. (2018)]. Cells in the PBMC Dataset were labeled with monoclonal antibodies conjugated to HTOs unique for a particular sample. The monoclonal antibodies used for labeling targeted immune surface markers: CD45, CD98, CD44 and CD11a (Stoeckius et al., 2018).

2.3 Threshold determination

Demultiplexing algorithms commonly rely on identification of sample barcode count thresholds that distinguish ‘positive’ cells (cells with counts above the threshold) from ‘negative’ cells (cells with counts below the threshold). In many cases, this threshold identification occurs on normalized count data rather than raw count data, so that the demultiplexing procedure implicitly relies on the assumptions built into the normalization method. By contrast, BFF algorithms identify sample barcode thresholds from log-scale raw counts, thereby avoiding reliance on assumptions that may be at odds with the unpredictable nature of real-world data. BFF algorithms identify thresholds by fitting each barcode’s count distribution with a smoothed model of the distribution (obtained with Kernel Density Estimation) and setting the threshold at the minimum fitted value between the positive and negative peaks of the fitted bimodal distribution. The minimum between peaks of the HTO count distribution was chosen as the optimal threshold for classifying HTO count values as either positive or negative because, without the assumptions about peak shapes that would be needed to fit a parametric model to the peaks, the interpeak minimum represents the best estimate of the HTO count value that is equally likely to be positive or negative. Figure 1 demonstrates where BFF algorithms place sample barcode count thresholds relative to the peaks of distributions in the Barnyard Dataset. The curves colored by barcode represent the fit of sample barcode count data and the solid black vertical lines represent the thresholds found by BFF algorithms. The fitted data shown are histograms where the y-axis displays the square-root of log-scale barcode count density (the square-root transformation magnifies the positive peak, which makes the data easier to inspect visually).

Fig. 1. Threshold determination by BFF. BFF places count thresholds at the minimum density between fitted peaks of log-scale raw count distributions. To find thresholds, BFF first fits log-scale raw count distributions with a smoothed distribution model obtained using Kernel Density Estimation (KDE). Thresholds are then placed at the minimum density values between the peaks of the model. The data shown are histograms where the y-axis displays the square-root of density for each barcode in the Barnyard Dataset. The colored curves represent the model fits of the data. These same colors are used throughout the article when displaying log-scale barcode count data
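The threshold search described above can be sketched in a few lines of Python. This is a minimal illustration and not cellhashR's implementation: the function names (`gaussian_kde`, `interpeak_minimum`, `synthetic_peak`), the fixed bandwidth of 0.3, the grid size and the synthetic log-scale counts are all assumptions made for the example.

```python
import math
from statistics import NormalDist

def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]

def interpeak_minimum(log_counts, bandwidth=0.3, n_grid=200):
    """Return the count value at the density minimum between the two tallest
    peaks of the smoothed distribution, or None if fewer than two peaks exist."""
    lo, hi = min(log_counts), max(log_counts)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    density = gaussian_kde(log_counts, grid, bandwidth)
    # Interior local maxima of the smoothed density.
    peaks = [i for i in range(1, n_grid - 1)
             if density[i - 1] < density[i] >= density[i + 1]]
    if len(peaks) < 2:
        return None
    # Keep the two tallest peaks and search for the valley between them.
    left, right = sorted(sorted(peaks, key=lambda i: density[i])[-2:])
    valley = min(range(left, right + 1), key=lambda i: density[i])
    return grid[valley]

def synthetic_peak(mean, sd, n):
    """Deterministic, evenly spaced normal quantiles (hypothetical data)."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Hypothetical barcode: a large negative peak and a smaller positive peak.
log_counts = synthetic_peak(2.0, 0.4, 400) + synthetic_peak(6.0, 0.5, 100)
threshold = interpeak_minimum(log_counts)
print(threshold)
```

On this synthetic barcode the returned threshold falls in the nearly empty region between the two peaks. Real data would need a data-driven bandwidth rather than the fixed value assumed here.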

Correct placement of thresholds based on experimental data can be challenging, particularly when the valley between positive and negative peaks is not well-defined. The Barnyard Dataset barcode count distributions shown in Figure 1 largely suffer from a lack of peak separation (with the exception of Bar4), making threshold placement for this dataset somewhat unreliable, whereas the PBMC Dataset contains barcode distributions with well-separated peaks. The challenge with threshold placement arises because fitting a bimodal distribution model to count data, using a technique such as Kernel Density Estimation (KDE), requires smoothing of the data (by controlling bandwidth of the kernel) to achieve bimodality of the model; under-smoothing may result in identification of false-peaks, which would corrupt data analysis; over-smoothing may result in features of the data (such as shallow valleys) being smoothed away, which can introduce error into the analysis. The location of the minimum of the distribution model, which defines threshold location, is therefore somewhat dependent on the bandwidth used for KDE, especially for data with a lack of peak separation.

For well-behaved data (i.e. bimodal data with well-separated peaks), such as observed in the PBMC dataset, a wide range of KDE bandwidths would produce a bimodal model so that the model is fairly robust and thresholds are reliable, whereas the acceptable bandwidth range for data lacking peak separation is much narrower (performing KDE with a bandwidth that is too small produces too many peaks for a bimodal model and using a bandwidth that is too large smooths away bimodality and produces a unimodal distribution model). As a result of these limitations on Kernel Density Estimation, BFF threshold determination is somewhat limited in accuracy for data without peak separation. While we have described here the mechanism for how threshold determination can be error-prone with Kernel Density Estimation, lack of peak separation is a general problem for modeling distributions in data and is expected to impact all demultiplexing algorithms to some extent.
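The dependence of modality on bandwidth can be illustrated by counting the local maxima of a Gaussian KDE at several bandwidths. The sketch below is an assumption-laden toy: `count_modes`, `synthetic_peak`, the three bandwidth values and the synthetic counts are hypothetical and are not taken from cellhashR.

```python
import math
from statistics import NormalDist

def count_modes(values, bandwidth, n_grid=200):
    """Count interior local maxima of a Gaussian KDE evaluated on a grid."""
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    density = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]  # the normalizing constant is omitted; it does not affect peak locations
    return sum(
        1 for i in range(1, n_grid - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    )

def synthetic_peak(mean, sd, n):
    """Deterministic, evenly spaced normal quantiles (hypothetical data)."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Hypothetical well-separated barcode distribution on the log scale.
log_counts = synthetic_peak(2.0, 0.4, 400) + synthetic_peak(6.0, 0.5, 100)

# Under-smoothed, moderate and over-smoothed bandwidths (illustrative values).
modes = {bw: count_modes(log_counts, bw) for bw in (0.02, 0.3, 5.0)}
print(modes)
```

With these inputs the smallest bandwidth yields spurious peaks in the sparse tails, the moderate bandwidth yields a bimodal model, and the largest bandwidth merges everything into one peak. For data with poor peak separation, the range of bandwidths that gives exactly two modes would be correspondingly narrower.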

2.4 Overview of BFF_raw and BFF_cluster classification

BFF classification can be performed in either of two modes: BFF_raw and BFF_cluster. In BFF_raw, cell classification is performed on raw counts, directly after threshold determination, illustrated by the blue-shaded box in Figure 2. BFF_raw classifies droplets as singlets when they are positive (i.e. having counts above the barcode threshold) for just one barcode, as doublets when positive for multiple barcodes, and as negatives when they are negative for all barcodes. In the BFF_cluster process illustrated by the green-shaded box in Figure 2, raw count thresholds are used to perform a normalization method we have developed, called Bimodal Quantile Normalization (BQN, described below), followed by comparison of a droplet’s highest (hi) and 2nd-highest (2nd) normalized barcode count values with several thresholds [lim(hi), lim(2nd), min(diff)] defined by densities of the normalized data distributions (further described in Section 2.5).

Fig. 2. Overview of BFF_raw and BFF_cluster. BFF raw count thresholds are used for classification in BFF_raw and for BQN in BFF_cluster. A flowchart outlining the classification processes used by BFF is shown. BFF classifies cells in one of two modes: threshold mode (top) or cluster mode (bottom). Both modes rely on identification of thresholds that separate positive HTO counts from negative HTO counts for each barcode. In BFF_raw, barcode counts for a given cell are compared directly to the barcode thresholds and cells are classified as singlets, negatives or doublets based on how many barcodes have counts that exceed the barcode thresholds. In BFF_cluster, thresholds are used to split positive and negative count data for normalization by Bimodal Quantile Normalization. The distributions of highest counts and second-highest counts are used to set thresholds that separate singlets from doublets and from negatives. For a cell to be classified as a singlet by BFF_cluster, the normalized highest count must exceed the negative threshold, the normalized second-highest count must be lower than the doublet threshold, and the difference between normalized highest and second-highest counts must exceed a minimum threshold value
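The BFF_raw rule amounts to counting how many barcodes exceed their own threshold in a droplet. A minimal sketch follows; the function name `classify_raw`, the label strings and the example counts and thresholds are hypothetical.

```python
def classify_raw(counts, thresholds):
    """Classify one droplet from its per-barcode counts.

    counts and thresholds are dicts keyed by barcode name and must be on the
    same scale (for example, both log-scale raw counts).
    """
    positive = [bc for bc, value in counts.items() if value > thresholds[bc]]
    if not positive:
        return ("Negative", None)      # below threshold for every barcode
    if len(positive) == 1:
        return ("Singlet", positive[0])  # exactly one barcode is positive
    return ("Doublet", None)           # positive for multiple barcodes

# Hypothetical thresholds and droplets.
thresholds = {"Bar1": 4.0, "Bar2": 5.5, "Bar3": 4.2}
singlet = classify_raw({"Bar1": 6.1, "Bar2": 3.0, "Bar3": 2.2}, thresholds)
doublet = classify_raw({"Bar1": 6.1, "Bar2": 7.0, "Bar3": 2.2}, thresholds)
negative = classify_raw({"Bar1": 1.0, "Bar2": 3.0, "Bar3": 2.2}, thresholds)
print(singlet, doublet, negative)
```

Because each barcode is compared only with its own threshold, no normalization across barcodes is needed in this mode.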

2.5 Bimodal quantile normalization

BQN is a procedure designed to remove bias from barcode count data, while preserving the bimodal nature of sample barcode count distributions. It is an implementation of the ‘class-specific’ quantile normalization strategy, where data are split into their component classes, each of which is quantile normalized independently and then recombined for downstream analysis. In a recent benchmarking analysis, class-specific quantile normalization was found to outperform other quantile normalization strategies, especially when variables are correlated with class (Zhao et al., 2020). The two classes for each barcode (positive and negative) are differentiated by the number of barcode counts, with the positive class comprising cells with high barcode counts and the negative class comprising cells with low barcode counts. Barcode counts are therefore highly correlated with class, making class-specific quantile normalization the appropriate quantile normalization strategy for cell hashing data.

In quantile normalization, all sample distributions being normalized are reshaped to a common normalized distribution, preserving the rank order of observed values in sample distributions but scaling the values according to the normalized distribution. After performing quantile normalization, it is valid to compare values across samples. Similarly, in the cell hashing context, after performing BQN, normalized count values for all barcodes in all droplets follow the same distribution (termed the BQN distribution), and it is valid to compare across sample barcode counts within an individual droplet. The comparison of within-droplet counts enables the correct assignment of a droplet to the appropriate sample.
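Standard quantile normalization, as described above, can be sketched as follows for samples of equal length. The function name `quantile_normalize` and the toy values are hypothetical. Ties are broken by input order here, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile normalize equal-length samples given as {name: [values]}.

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so rank order within every sample is preserved.
    """
    lengths = {len(values) for values in samples.values()}
    if len(lengths) != 1:
        raise ValueError("all samples must have the same length")
    n = lengths.pop()
    sorted_samples = [sorted(values) for values in samples.values()]
    # Reference distribution: mean of the r-th smallest value over samples.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(sorted_samples)
        for rank in range(n)
    ]
    normalized = {}
    for name, values in samples.items():
        order = sorted(range(n), key=lambda i: values[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized[name] = out
    return normalized

result = quantile_normalize({"s1": [5.0, 2.0, 3.0], "s2": [4.0, 1.0, 6.0]})
print(result)
```

After normalization both toy samples contain exactly the same set of values, so values can be compared across samples, while the original ordering within each sample is unchanged.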

BQN, like quantile normalization in general, is a non-parametric normalization procedure, thus in contrast to other normalization procedures, it avoids reliance on assumptions about the behavior of barcode count data that are inherent in parametric normalization procedures. The violin plot of Figure 3A is another representation of the same count distributions shown in Figure 1. The distributions of barcodes Bar2 and Bar4 are shifted significantly upward relative to the other barcodes in the dataset (partially due to these barcodes having a different chemistry than the remaining barcodes). In BQN, the data for each barcode are split at the threshold into positive and negative counts (Fig. 3B and C, respectively). Each fraction is quantile normalized independently (Fig. 3D and E, respectively). Finally, the normalized positive and negative counts are recombined to yield distributions shown in Figure 3F. This allows the data to be normalized while preserving bimodality.

Fig. 3. Bimodal Quantile Normalization (BQN) removes bias from raw HTO count data while preserving bimodality. The process for BQN is shown. (A) First, a threshold is placed between the ‘positive’ and ‘negative’ peaks for each HTO (crossbars). (B–C) Second, counts above the thresholds are grouped together in ‘positive’ data (B), while data below the thresholds are grouped together in ‘negative’ data (C). (D–E) Third, positive and negative counts are independently normalized by quantile normalization. The distribution widths in each plot have been scaled to the maximum of the distribution widths represented in that plot (i.e. distribution widths are not comparable between B and C or between D and E). (F) After the independent normalization of positive and negative counts shown in D and E, the normalized positive and negative counts are recombined
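The split, normalize and recombine steps of BQN can be sketched as below. This is an illustration under stated assumptions rather than the cellhashR code: positive and negative classes generally differ in size between barcodes, and the text does not say how that is handled, so this sketch builds each reference distribution by averaging linearly interpolated quantiles. All function names, thresholds and counts are hypothetical.

```python
def interpolated_quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an ascending list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight

def quantile_normalize_class(groups, n_ref=101):
    """Quantile normalize one class across barcodes: {barcode: [values]}."""
    sorted_groups = [sorted(v) for v in groups.values() if v]
    if not sorted_groups:
        return {name: [] for name in groups}
    # Reference distribution: mean quantile function over barcodes.
    reference = [
        sum(interpolated_quantile(s, i / (n_ref - 1)) for s in sorted_groups)
        / len(sorted_groups)
        for i in range(n_ref)
    ]
    normalized = {}
    for name, values in groups.items():
        order = sorted(range(len(values)), key=lambda i: values[i])
        out = [0.0] * len(values)
        for rank, index in enumerate(order):
            q = rank / (len(values) - 1) if len(values) > 1 else 0.5
            out[index] = interpolated_quantile(reference, q)
        normalized[name] = out
    return normalized

def bimodal_quantile_normalize(log_counts, thresholds):
    """Split each barcode at its threshold, normalize classes, recombine."""
    pos_idx = {bc: [i for i, v in enumerate(vals) if v > thresholds[bc]]
               for bc, vals in log_counts.items()}
    neg_idx = {bc: [i for i, v in enumerate(vals) if v <= thresholds[bc]]
               for bc, vals in log_counts.items()}
    pos = quantile_normalize_class(
        {bc: [log_counts[bc][i] for i in idx] for bc, idx in pos_idx.items()})
    neg = quantile_normalize_class(
        {bc: [log_counts[bc][i] for i in idx] for bc, idx in neg_idx.items()})
    result = {}
    for bc, vals in log_counts.items():
        merged = [0.0] * len(vals)
        for i, v in zip(pos_idx[bc], pos[bc]):
            merged[i] = v
        for i, v in zip(neg_idx[bc], neg[bc]):
            merged[i] = v
        result[bc] = merged
    return result

# Hypothetical barcodes; "B" is shifted upward relative to "A".
log_counts = {"A": [1.0, 1.2, 0.8, 5.0, 5.2],
              "B": [2.0, 2.2, 1.8, 7.0, 7.4, 7.2]}
bqn = bimodal_quantile_normalize(log_counts, {"A": 3.0, "B": 4.5})
print(bqn)
```

In the toy output both barcodes span the same normalized range, and every positive value remains above every negative value, so the upward shift of barcode "B" is removed without merging the two classes.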

3 Results

3.1 Comparison of barcode normalizations

To determine the effectiveness of BQN, we used cellhashR’s QC plotting functionality to visually compare raw data (Fig. 4A), BQN data (Fig. 4B) and Log2Center normalized data (Fig. 4C), using data from the Barnyard Dataset. BQN and Log2Center normalization both appear to remove bias present in the negative distributions of the raw data. In the violin plots in the top row of Figure 4, the negative distributions (the widest part of the barcode distributions) for BQN data and Log2Center normalized data are much better aligned than in the raw data. However, bias in the positive distributions remains for the Log2Center normalized data, as can be seen in the distributions of droplet highest counts shown in the top margin of the scatterplot in Figure 4C, as well as in the non-alignment of the positive distributions (the long portion of barcode distributions above the widest part) in the violin plot of Figure 4C. The top margin distributions for the Log2Center normalized data of Figure 4C have two right-shifted sample barcode distributions (Bar2 and Bar4) for droplet highest counts, indicating bias, whereas the top margin distributions for the BQN data in Figure 4B all follow the same distribution. Based on this observation, one might suspect that demultiplexing methods that rely on Log2Center normalization (e.g. deMULTIplex) might produce suboptimal classification results when analyzing biased data. In addition to Log2Center normalization, CLR normalization (used by Seurat’s HTODemux and GMM-Demux) and quantile normalization also fail to eliminate bias (Supplementary Fig. S1).

3.2 BFF_cluster and BFF_raw classification

BFF_cluster classifies droplets based on the theory that the top two BQN-normalized barcode counts of a droplet contain all of the information necessary to classify it correctly. The top two normalized barcode counts are necessary and sufficient for distinguishing singlets from doublets because, while singlets are expected to have 2nd-highest barcode counts at the level of noise, doublets would be expected to have a 2nd-highest barcode count at a value more similar to the highest barcode count than to the level of noise; lower ranked barcode counts do not add information relevant to determining whether a droplet is a negative, singlet or doublet and can therefore be excluded from further analysis. Setting a doublet threshold on the 2nd-highest count is thus an effective way to identify doublets. In a similar fashion, DropletUtils demultiplexing identifies doublets by thresholding on the log-fold change between the 2nd-highest barcode count and the ambient level of barcode counts.

BFF_cluster effectively clusters droplets into singlet, negative and doublet clusters in the BQN space (i.e. the 2D space of the top two BQN barcode counts) as shown in Figure 5D, which demonstrates clustering of BQN counts for the Barnyard Dataset. The main cluster, in the lower-right quadrant, contains singlets and is bounded by two thresholds: the negative threshold (T_n) and the doublet threshold (T₂); a third optional threshold, the difference threshold (T_d), allows researchers to impose a minimum allowable difference between the highest and 2nd-highest BQN counts for singlet droplets, if they are concerned with the possibility of assigning droplet membership to the wrong sample. For the data analyzed in the present study, T_d provides little additional benefit beyond setting the negative and doublet thresholds, and is thus excluded (set to the default value of 0). For further information on parameter setting, including how T_d can be used to increase confidence in sample assignment, see the BFF_cluster Parameter Setting Guide provided in the Supplementary Text available online.

Fig. 5. BFF_cluster and BFF_raw classification and thresholds. In the 2D BQN space, BFF_cluster classifies droplets with two optionally tunable thresholds (shown in A and B) and BFF_raw classifies droplets using a single threshold (shown in C). (A–C) Simulated density curves that demonstrate the relationship between the three BFF thresholds and the log-scale distributions to which they apply. The negative threshold T_n is shown as a solid vertical line in both A and D. The doublet threshold T₂ is shown as a solid vertical line in B and a solid horizontal line in panel D. Thresholds T₂ and T_n are each set at the location where the local count density is a user-specified fraction of the maximum count density within the singlet peak. That is, the density of the highest count distribution at T_n, D(T_n), is the maximum density, D_max, scaled by factor $α$; the density of the second-highest count distribution at T₂, D(T₂), is the maximum density, D_max, scaled by factor $β$. (C) This demonstrates that the BQN Threshold is located at the interpeak minimum of the BQN distribution. (D) This demonstrates how the three BFF thresholds classify droplets in BQN space for the Barnyard Dataset. The data-derived distributions shown in the top and right margins of the scatterplot correspond to the simulated distributions shown in panels A and B, respectively. The threshold locations shown for T_n and T₂ correspond to the 0.05 default values for parameters $α$ and $β$

The negative threshold, T_n, is the minimum allowable highest count value for a droplet to be classified as a singlet rather than a negative (solid vertical line in Fig. 5D and dashed line in Fig. 5A). The location for the negative threshold is determined by finding the minimum count value where density is equivalent to the maximum density scaled by the user-defined fractional coefficient, $α$, as shown in Figure 5A. The doublet threshold, T₂, is the maximum allowable 2nd-highest count value for a droplet to be classified as a singlet rather than a doublet (solid horizontal line in Fig. 5D and dashed line in Fig. 5B). The location for the doublet threshold is determined by finding the count value where density is equivalent to the maximum density scaled by the user-defined fractional coefficient, $β$, as shown in Figure 5B.
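One plausible reading of this rule is to start at the tallest peak of a density curve and walk outward until the density would drop below the chosen fraction of the peak height: leftward on the highest-count distribution for T_n, rightward on the 2nd-highest-count distribution for T₂. The sketch below follows that reading. The function name, the walking strategy and the analytic Gaussian curves standing in for fitted densities are assumptions, not details taken from cellhashR.

```python
import math

def density_fraction_threshold(grid, density, fraction, side):
    """Walk from the tallest peak toward `side` ("left" or "right") and
    return the last grid value whose density is still >= fraction * D_max."""
    peak = max(range(len(density)), key=lambda i: density[i])
    cutoff = fraction * density[peak]
    step = -1 if side == "left" else 1
    i = peak
    while 0 <= i + step < len(grid) and density[i + step] >= cutoff:
        i += step
    return grid[i]

# Simulated (hypothetical) density curves on a BQN-like scale.
grid = [i * 0.01 for i in range(1001)]
highest = [math.exp(-0.5 * ((x - 6.0) / 0.5) ** 2) for x in grid]
second = [math.exp(-0.5 * ((x - 1.0) / 0.4) ** 2) for x in grid]

alpha = beta = 0.05  # default fractions quoted in the text
t_n = density_fraction_threshold(grid, highest, alpha, "left")
t_2 = density_fraction_threshold(grid, second, beta, "right")
print(t_n, t_2)
```

Raising `alpha` or `beta` moves the corresponding threshold toward the peak, which excludes more droplets from the singlet class.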

A major challenge with threshold placement is that it is not possible to determine the barcode count value where singlets end and non-singlets begin; there is likely some amount of overlap between the two classes. In BFF_cluster, the negative and doublet thresholds are set at locations within the singlet distribution of the highest and 2nd-highest count distributions (thus sacrificing a small amount of data) in order to ensure that outliers to the singlet distribution (i.e. negatives and doublets) are reliably excluded from cell hashing data. The motivation for defining negative and doublet thresholds in terms of the maximum density of the singlet distribution is that this allows the user to determine how much data they are willing to sacrifice, in order to exclude non-singlets, based on a property of the distribution. We have chosen 0.05 as the default value for $α$ and $β$ (i.e. data at density less than 5% of the maximum density are excluded) because for most datasets, this is a value that is likely effective at excluding most non-singlets without sacrificing an excessive amount of data. While we suggest a value of 0.05 for $α$ and $β$, BFF_cluster users are able to choose parameter values according to their analysis goals. For example, if higher purity of cells is required, higher values can be used for these parameters.
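Given the three thresholds, the singlet test in BQN space can be sketched as below. The function name `classify_cluster`, the order of the checks, the inclusive comparisons and the "Ambiguous" label for droplets that fail only the difference threshold are assumptions made for this illustration, since the text does not specify them.

```python
def classify_cluster(bqn_counts, t_n, t_2, t_d=0.0):
    """Classify one droplet from its BQN-normalized counts ({barcode: value}).

    Only the highest and 2nd-highest normalized counts are used.
    Requires at least two barcodes.
    """
    ranked = sorted(bqn_counts.items(), key=lambda item: item[1], reverse=True)
    (top_barcode, highest), (_, second) = ranked[0], ranked[1]
    if highest < t_n:
        return ("Negative", None)    # highest count below negative threshold
    if second > t_2:
        return ("Doublet", None)     # 2nd-highest count above doublet threshold
    if highest - second < t_d:
        return ("Ambiguous", None)   # hypothetical label, see lead-in
    return ("Singlet", top_barcode)

# Hypothetical thresholds and droplets in BQN space.
t_n, t_2 = 4.8, 2.0
print(classify_cluster({"Bar1": 6.0, "Bar2": 1.1, "Bar3": 0.9}, t_n, t_2))
print(classify_cluster({"Bar1": 6.0, "Bar2": 5.5, "Bar3": 0.9}, t_n, t_2))
print(classify_cluster({"Bar1": 3.0, "Bar2": 1.1, "Bar3": 0.9}, t_n, t_2))
```

With the default `t_d` of 0 the difference check never excludes a droplet, which mirrors the setting used for the data analyzed here.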

In contrast to BFF_cluster, BFF_raw classifies droplets based on the theory that raw barcode counts can be classified by comparison with barcode-specific thresholds. BFF_raw results can be easily visualized and compared to BFF_cluster results on the same 2D scatterplot, since the BQN threshold is equivalent to a barcode’s raw count threshold mapped to the BQN space. In fact, BFF_cluster and BFF_raw produce equivalent results for the case where the negative and doublet thresholds are both equivalent to the BQN threshold. The two algorithms largely agree with each other as can be seen from the regions of BQN space where classifications agree in Figure 5D. The only disagreements occur in the space between the BFF_cluster thresholds and the BQN thresholds. Droplets in these regions are represented as red squares, red triangles and blue triangles.

3.3 BFF comparison to other demultiplexing algorithms

To compare BFF classification performance against other commonly used demultiplexing algorithms, we performed benchmarking studies on the PBMC and Barnyard Datasets. These benchmarking studies were conducted using our R-package, cellhashR. cellhashR integrates multiple cell hashing algorithms and provides a single interface to execute and compare the results of each.

Figure 6A and B show the results of 10-fold cross-validation studies performed on the PBMC Dataset (A) and on the Barnyard Dataset (B). The ‘ground truth’ labels used for these benchmarking studies are the classification labels obtained by the authors who published the datasets, as described in Section 6. On the PBMC data shown in Figure 6A, all of the demultiplexing algorithms tested achieved a similar level of accuracy. BFF_cluster and BFF_raw have the highest accuracies on this dataset (0.966 and 0.962, respectively); however, the difference between these values and the next highest value, HTODemux’s 0.960, is within the margin of error. The lowest accuracy, obtained by deMULTIplex, was 0.903. On the Barnyard Dataset shown in Figure 6B, demultiplexing accuracy across algorithms was much more varied. BFF_cluster and BFF_raw have the highest accuracies on this dataset (0.922 and 0.860, respectively), the next highest accuracy is GMM-Demux’s 0.854, and the lowest accuracy is DropletUtils’ 0.440. It should be noted that all algorithms were run with default parameters, and it is possible that particular algorithms could be tuned to perform better on this particular dataset.

Fig. 6. Accuracy and consistency of BFF algorithms relative to other demultiplexing algorithms. Accuracy of demultiplexing algorithms obtained by 10-fold cross-validation is shown for the PBMC Dataset (A) and for the Barnyard Dataset (B). Error-bars represent standard error of the mean accuracy (SEM) across the 10 analysis runs. Bar colors represent the algorithms shown in the figure legend of B. The asterisk above demuxEM indicates that only the full dataset was analyzed and SEM was not calculated. (C) A scatterplot of a subset of the Barnyard Dataset that contains droplets with a cell hashing classification (from at least one of the two algorithms, BFF_cluster and GMM-Demux) that differs from ground-truth based on gene expression (GEX). Green points represent droplets where both algorithms produced the same classification, but disagree with ground-truth; red points represent droplets where the GMM-Demux classification agrees with ground-truth but the BFF_cluster classification does not; blue points represent droplets where the BFF_cluster classification agrees with ground-truth but the GMM-Demux classification does not. Droplets concordant between all three have been excluded. There were an additional 12 droplets where the demultiplexing algorithm classifications differ from each other and from ground-truth, which are not shown. The dashed lines represent the BQN threshold and the solid lines represent the BFF_cluster thresholds. (D–F) summarize classifications for droplets shown in the scatterplot of (C). The tile plot in (D) summarizes the blue points, (E) summarizes the red points and (F) summarizes the green points
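The color coding used in panel C is a three-way comparison between the ground-truth label and the two algorithm calls. A minimal sketch of that bookkeeping is shown below; the function name `fig6c_category`, the returned strings and the example labels are hypothetical.

```python
def fig6c_category(truth, bff_call, gmm_call):
    """Assign a droplet to a panel C category from three labels."""
    bff_correct = bff_call == truth
    gmm_correct = gmm_call == truth
    if bff_correct and gmm_correct:
        return "concordant"   # excluded from the scatterplot
    if bff_correct:
        return "blue"         # only BFF_cluster agrees with ground-truth
    if gmm_correct:
        return "red"          # only GMM-Demux agrees with ground-truth
    if bff_call == gmm_call:
        return "green"        # same call from both, both differ from truth
    return "not shown"        # calls differ from each other and from truth

# Hypothetical labels for a few droplets: (truth, BFF_cluster, GMM-Demux).
examples = [
    ("Bar1", "Bar1", "Bar1"),
    ("Bar1", "Bar1", "Doublet"),
    ("Bar1", "Negative", "Bar1"),
    ("Bar1", "Doublet", "Doublet"),
    ("Bar1", "Negative", "Doublet"),
]
categories = [fig6c_category(*labels) for labels in examples]
print(categories)
```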

In addition to accuracy of classification results, another important consideration is consistency. The error bars shown in Figure 6A and B represent the standard error of the mean accuracy (SEM), which is an indication of the variance in the cross-validation results shown. In addition to being the most accurate algorithms, BFF_cluster and BFF_raw are among the most consistent on both datasets (0.184% and 0.148%, respectively for PBMC; 0.091% and 0.290%, respectively for Barnyard), along with DropletUtils (0.092% and 0.272% for PBMC and Barnyard Datasets, respectively). GMM-Demux is highly consistent for the PBMC dataset (0.092%) but not on the Barnyard dataset (1.51%). Seurat’s HTODemux is the least consistent of all algorithms measured on both datasets (0.791% and 3.88% for PBMC and Barnyard Datasets, respectively). The increased variance observed for the Barnyard Dataset 10-fold cross-validation studies indicates that changes to this dataset (e.g. holding out one of the ten folds of data) have a larger impact on performance of most algorithms than comparable changes to the PBMC Dataset, which is likely a consequence of the dataset’s poor behavior. BFF_cluster is the only algorithm tested whose SEM value did not increase for the Barnyard Dataset (it decreased by a factor of 2), which demonstrates that BFF_cluster is less sensitive to changes in data that adversely affect the performance of other algorithms. We did not perform the cross-validation analysis to measure variance in accuracy for DemuxEM because, when benchmarking, we noticed that this algorithm does not work well on subsets of data, so the algorithm was applied to the full dataset.

To better understand how BFF algorithms achieve higher demultiplexing accuracy than the other callers on the Barnyard Dataset, we compared BFF calls to classifications made by other callers on droplets that were misclassified (with gene expression, or GEX, as ground-truth) by BFF and/or other algorithms. Figure 6C is a scatterplot of BQN counts for droplets that were incorrectly classified by BFF_cluster and/or GMM-Demux, the most accurate non-BFF classifier for the Barnyard Dataset. The majority of cases where neither algorithm makes the correct call (green points) are located in the BFF_cluster doublet zone and the BFF_cluster negative zone (above the solid horizontal line and to the left of the solid vertical line, respectively).

The green negative droplets in Figure 6C are not labeled as negative by gene expression because there are no true negatives (i.e. droplets containing no cells) present in the data (i.e. all negatives in this data are false negatives). Empty droplets were removed by filtering droplets, based on low gene expression counts, prior to demultiplexing; however, the droplets in the negative zone are considered negatives by BFF_cluster because they do not have enough barcode counts present for accurate classification and are thus considered outliers to the singlet distribution.

The majority of droplets in the Barnyard Dataset have BFF_cluster and GMM-Demux classifications that agree. Of the 5779 Barnyard Dataset droplets, only 411 (7.1%) have BFF_cluster and GMM-Demux classifications that disagree. The blue points in the scatterplot of Figure 6C represent the portion of these 411 droplets that are classified correctly by BFF_cluster and incorrectly by GMM-Demux, according to gene expression ground-truth (390 droplets, or 94.9% of all classification disagreements). The 21 remaining droplets consist of nine droplets that are classified correctly by GMM-Demux and incorrectly by BFF_cluster (represented by the red points in the scatterplot shown in Fig. 6C, and summarized in the tile plot of Fig. 6E) as well as 12 droplets with conflicting BFF_cluster and GMM-Demux classifications that are also different from gene expression ground-truth (excluded from Fig. 6C–F for the sake of clarity). The 12 droplets with conflicting classifications consist of 10 Jurkat cells and 2 HEK cells (by GEX). These droplets were called as either the wrong cell type (8 droplets: 1 MEF, 5 HEK and 2 Jurkat cells) or as negatives (4 droplets) by BFF_cluster and as negatives (2 droplets), wrong cell types (2 droplets classified as HEK cells) or doublets (8 droplets) by GMM-Demux.
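The grouping used in Figure 6C to F can be written down as a small decision rule, which also makes it easy to check that the reported counts add up. The sketch below is illustrative only: the function name and category labels are hypothetical, and the toy droplets are invented. The counts 390, 9, 12 and 5779 are taken from the text above.

```python
from collections import Counter

def concordance_category(truth, bff_call, gmm_call):
    """Assign one droplet to a comparison group (labels are hypothetical)."""
    if bff_call == gmm_call:
        # Both algorithms agree with each other.
        return "all_concordant" if bff_call == truth else "both_wrong_same_call"
    if bff_call == truth:
        return "bff_only_correct"           # blue points in Fig. 6C
    if gmm_call == truth:
        return "gmm_only_correct"           # red points in Fig. 6C
    return "both_wrong_different_calls"     # the droplets omitted from Fig. 6C

# Invented toy droplets: (GEX ground-truth, BFF_cluster call, GMM-Demux call).
toy_droplets = [
    ("HEK", "HEK", "HEK"),
    ("HEK", "HEK", "Doublet"),
    ("Jurkat", "Negative", "Jurkat"),
    ("Jurkat", "HEK", "HEK"),
    ("Jurkat", "HEK", "Doublet"),
]
toy_counts = Counter(concordance_category(*d) for d in toy_droplets)

# Counts reported in the text for the Barnyard Dataset.
reported = {
    "bff_only_correct": 390,
    "gmm_only_correct": 9,
    "both_wrong_different_calls": 12,
}
total_disagreements = sum(reported.values())
pct_of_droplets = 100 * total_disagreements / 5779
pct_bff_correct = 100 * reported["bff_only_correct"] / total_disagreements
print(total_disagreements, round(pct_of_droplets, 1), round(pct_bff_correct, 1))
```

The three disagreement groups sum to the 411 discordant droplets, which is 7.1% of the 5779 droplets, and the 390 droplets in the first group make up 94.9% of those disagreements.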

The calls made by each algorithm on the 399 discordant droplets represented as red and blue dots in Figure 6C are compared in the tile plots of Figure 6D and E. The vast majority of disagreements are droplets where BFF_cluster matches ground-truth, with BFF_cluster singlets/GMM-Demux doublets (280 droplets) or BFF_cluster singlets/GMM-Demux negatives (102 droplets). The 280 singlet/doublet droplets are primarily located in a cluster of droplets slightly below the doublet threshold (solid horizontal line in Fig. 6C), while the majority of the 102 singlet/negative droplets are located in a cluster of droplets slightly to the right of the negative threshold (solid vertical line in Fig. 6C). This pattern suggests that the majority of disagreements arise because the algorithms have slightly different thresholds, with GMM-Demux effectively thresholding negatives near the BQN threshold (dashed vertical line) and effectively thresholding doublets near a value of 2.0 on the BQN 2nd-highest counts axis (i.e. the y-axis of the scatterplot). The fact that most disagreements between BFF_cluster and GMM-Demux lie in such a small slice of BQN space helps to explain how GMM-Demux performs nearly as well as BFF_raw on the Barnyard Dataset while still falling short of BFF_cluster. Scatterplots similar to Figure 6C, showing disagreements between BFF_cluster and the best-performing algorithms on the Barnyard Dataset, can be found in the Supplementary Text. These plots further support the observation that disagreements occur primarily near thresholds due to differences in threshold placement between algorithms.

The tile plot in Figure 6F summarizes the classifications of the 428 droplets where BFF_cluster and GMM-Demux produced the same call, but disagree with ground-truth (Fig. 6C, green points). We were interested by the fact that multiple demultiplexing algorithms arrived at the same classification, but disagreed with ground-truth. In the case of the Barnyard Dataset, gene expression profiles were used to assign droplets to a cell type. As a result, doublets comprised of two highly distinct cells (i.e. cross-species) should be predicted with the highest accuracy. Two cells of the same species but different cell types can be differentiated based on gene expression, but accuracy might be reduced. Finally, droplets containing two cells of the same type may be difficult to identify as doublets based on gene expression alone. Of these 428 droplets where HTO-based methods consistently disagree with GEX-based ground-truth assignment, we do in fact see enrichment for same-species and especially same-cell type doublets, which suggests there may be a level of inaccuracy in the ground-truth predictions (see Supplementary Section S3 for more detail). Additionally, the doublet rate predicted for the Barnyard Dataset by gene expression, 0.017, is lower than the expected observable doublet rate of 0.042 (calculated according to the formula at https://satijalab.org/costpercell/ and detailed in the Methods section). For this data, BFF_cluster and GMM-Demux predict doublet rates of 0.066 and 0.12, respectively, both of which are greater than the expected doublet rate. The fact that the gene expression doublet rate prediction is less than expected, while two of the most accurate demultiplexing algorithms tested both predict a doublet rate that is greater than expected, provides further support to the idea that gene expression may undercount doublets for the Barnyard Dataset.

Finally, there were 69 droplets scored as singlets (by both cell hashing and GEX) where cell hashing algorithms assigned the droplets to the wrong cell type (1.1% of total droplets). This includes 68 Jurkat cells (based on GEX) that are called as HEK cells by cell hashing classifiers and one HEK cell (based on GEX) called as a Jurkat cell by both BFF_cluster and GMM-Demux. It is noteworthy that the discordant calls occur entirely among human cell types. It is interesting that the cell hashing-based methods consistently assign these droplets to one sample while the gene-expression-based classification assigns them to another cell type. This suggests that for some droplets cell hashing profiles may be problematic for all demultiplexing algorithms, although it is also conceivable that at least some portion of these cells represent incorrect gene-expression-based assignment as well. These droplets are nonetheless a small percentage of total cells.

4 Discussion

Here, we present BFF demultiplexing algorithms, a novel class of cell hashing demultiplexing algorithms, and demonstrate that in contrast to popular demultiplexing algorithms, BFF algorithms demultiplex cell hashing data accurately and consistently for both well-behaved and poorly behaved data. The patterns of accuracy and variance observed for the other demultiplexing algorithms tested on the PBMC and Barnyard Datasets are quite revealing. All algorithms tested performed fairly accurately on the well-behaved PBMC Dataset, but aside from BFF_cluster, most had substantially lower accuracy for demultiplexing on the Barnyard Dataset. The non-BFF algorithm that performed most accurately on this data, GMM-Demux, still exhibited a significant increase in variance relative to the variance observed on the well-behaved PBMC Dataset. These observations likely indicate that the demultiplexing algorithms tested are fundamentally similar and perform comparably on well-behaved data, but are confounded by poorly behaved data to the extent that they find it difficult to model the data correctly. Characteristics observed in the Barnyard Dataset that are missing from the PBMC Dataset, such as poor peak separation and biased count distributions, contribute to the Barnyard Dataset being considered poorly behaved. Two capabilities of BFF_cluster enable it to accurately demultiplex this dataset in spite of its poor behavior: first, BQN removes the effects of bias; second, setting global thresholds to exclude outliers from the singlet distribution enables BFF_cluster to overcome imperfect threshold setting, which can negatively affect BFF_raw demultiplexing accuracy.

We have demonstrated that BFF_cluster can optionally be tuned according to the needs of an experiment by varying parameter values for $α$ and $β$. BFF_cluster determines the location of the negative threshold as the lowest count where the local density of highest counts is equal to the maximum density of the singlet distribution, scaled by $α$. Similarly, it determines the location of the doublet threshold as the highest count where the local density of second-highest counts is equal to the maximum density of the singlet distribution, scaled by $β$. Because of the manner in which BFF_cluster determines threshold locations, the best BFF_cluster performance would be expected with $α$ and $β$ values chosen to be low enough not to discard an unreasonable number of droplets, yet high enough that the local density along the singlet distribution is higher than the density of any count values in the noise. When these criteria are met, the threshold is set somewhere in the tail of the singlet distribution, where it reliably discards droplets that are outliers to the singlet distribution (while also assigning as many droplets to their samples as is reasonable). The default value for both $α$ and $β$ is 0.05, a fairly low value at which a threshold of 5% of the maximum singlet density would be expected to exceed the density values encountered in regions of noise. The default $α$ and $β$ parameters are expected to work for most datasets; however, users are able to choose parameter values according to their analysis goals. For example, if higher purity of cells is required, higher parameter values can be used.
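To make the role of the scaling parameters concrete, the following minimal sketch locates the point at which a density curve first (or last) reaches a chosen fraction of its maximum. It is a simplified reading of the description above, not the cellhashR implementation: the function name is hypothetical, and a single Gaussian curve on a regular grid stands in for the smoothed singlet distribution.

```python
import math

def scaled_density_threshold(grid, density, scale, side):
    """Return the lowest (side='low') or highest (side='high') grid value
    whose density is at least scale * max(density)."""
    cutoff = scale * max(density)
    hits = [x for x, d in zip(grid, density) if d >= cutoff]
    return hits[0] if side == "low" else hits[-1]

# Toy stand-in for a smoothed singlet distribution: a Gaussian bump centred
# at 3.0 with standard deviation 0.5, evaluated on a grid from 0 to 6.
grid = [i / 100 for i in range(601)]
density = [math.exp(-0.5 * ((x - 3.0) / 0.5) ** 2) for x in grid]

# A negative-style threshold at the default scale of 0.05 ...
negative_default = scaled_density_threshold(grid, density, 0.05, "low")
# ... and at a stricter scale of 0.25, which sits closer to the peak.
negative_strict = scaled_density_threshold(grid, density, 0.25, "low")
# A doublet-style threshold uses the upper tail instead.
doublet_default = scaled_density_threshold(grid, density, 0.05, "high")
print(negative_default, negative_strict, doublet_default)
```

On this toy curve, raising the scale from 0.05 to 0.25 moves the lower threshold toward the peak, so more of the tail is discarded. This is the trade-off between droplet recovery and purity described in the text.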

Finally, in order to better understand how BFF algorithms outperform other algorithms tested, we have explored classification differences between BFF and non-BFF algorithms on the Barnyard Dataset. We have demonstrated that when visualizing classification differences in BQN space, these differences tend to cluster near threshold boundaries when algorithms have slightly different negative and doublet thresholds. This is observed for the comparison between BFF_cluster and GMM-Demux classifications in Figure 6C, and this observation holds true for comparisons of BFF algorithms to the highest performing of the demultiplexing algorithms tested (see Supplementary Text for more detail). One final consideration for BFF_cluster is that there are situations where BFF_cluster can behave unexpectedly. Notably, BQN has been observed to produce BQN count distributions with extremely sharp peaks on some hashing datasets where there is extreme bias and there are few true-positive droplets for a given sample barcode. In these cases, BFF_raw is likely to be the optimal demultiplexing algorithm because it accurately demultiplexes cell hashing data while avoiding normalization altogether.

The results shown in Figure 6 demonstrate that poorly behaved cell hashing data can have a profound effect on demultiplexing algorithm performance. Because data quality is difficult to control and it is difficult to identify when features in the data cause problems for some demultiplexing algorithms, we have developed cellhashR to provide researchers with extensive QC capabilities that help identify issues in cell hashing data, and to enable the simultaneous execution of any combination of algorithms on a dataset (BFF and others). cellhashR outputs a unified table of classifications, along with QC plots contrasting the results of each algorithm.

5 Conclusion

Here, we introduced the novel HTO demultiplexing algorithms BFF_cluster and BFF_raw, and the normalization procedure Bimodal Quantile Normalization, all of which have been implemented in our novel R package for automated cell hashing analysis, cellhashR. We utilized cellhashR’s QC features to demonstrate bias removal in cell hashing data with BQN, in contrast with other commonly used procedures for normalizing cell hashing data. We have demonstrated how the BFF algorithms function and how their classifications can be visualized in BQN space. We have shown how accurately and reproducibly BFF_cluster classifies the well-behaved PBMC Dataset, as well as the poorly behaved Barnyard Dataset. Finally, in order to better understand how BFF algorithms outperform other algorithms tested on the Barnyard Dataset, we have explored classification differences between BFF and non-BFF algorithms by visualizing them in the 2D BQN space. With this exploration, we have demonstrated that the majority of droplets are classified consistently by most algorithms, but inconsistencies occur in small regions of BQN space near thresholds.

6 Implementation

The software and versions used in this study were: R (4.1.0), cellhashR (1.0.3), preprocessCore (1.52.1), DropletUtils (1.12.1; Lun et al., 2019), Seurat (4.0.1; Butler et al., 2018), GMM-Demux (0.2.1.3; Xin et al., 2020), reticulate (1.20), stats (4.0.5), ggplot2 (3.3.3), ggExtra (0.9), patchwork (1.1.1).

6.1 Software availability

cellhashR is available at https://github.com/BimberLab/cellhashR.

6.2 Data availability

The datasets analyzed in this article are available from sources in the public domain:

Barnyard Dataset (McGinnis et al., 2019, Source Data, Fig. 1): https://www.nature.com/articles/s41592-019-0433-8#Sec31.

PBMC Dataset (Stoeckius et al., 2018): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108313.

PBMC Ground-truth labels (Xu et al., 2019): https://github.com/jon-xu/scSplit_paper_data/tree/master/Table%203/Hashtag

6.3 Data sources

The Barnyard Dataset was obtained from the Source Data file for Figure 1 of McGinnis et al. (2019). These data come from a single-nucleus RNA sequencing study in which HEK cells (Bar1 and Bar2) were multiplexed with MEF cells (Bar3 and Bar4) and Jurkat cells (remaining barcodes). Barcodes Bar2 and Bar4 were cholesterol-modified oligos (CMOs), whereas all other barcodes in the study were lipid-modified oligos (LMOs). The Excel file containing the Barnyard Dataset has 4 sheets. The sheets ‘POC_Nuc-hs_BarMatrix_MetaData’ and ‘POC_Nuc-mm_BarMatrix_MetaData’ were concatenated to form a single sheet, with cells in rows and features in columns, from which all rows with a ‘Classification’ entry of ‘Unlabeled’ were removed. The resulting ‘labeled’ dataset was split into (1) a barcode count matrix containing the unique rows of a data frame consisting of the ‘CellID’ column and the 12 barcode columns ‘Bar1’ … ‘Bar12’, and (2) metadata containing the ‘CellID’ column and the columns that do not contain barcode counts. Rows of the barcode count matrix with counts of 0 for every barcode were removed to yield the final barcode count matrix with 5779 droplets, which was used for analysis in this study.
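The preparation steps above can be sketched on in-memory rows. This is a hedged illustration rather than the code used in the study: the function name is hypothetical, each sheet is assumed to have already been read into a list of dictionaries, and the toy rows use only two barcode columns instead of twelve.

```python
def build_barcode_matrix(sheets, barcode_cols):
    """Concatenate sheets, drop 'Unlabeled' rows, and split into a barcode
    count matrix (unique rows, all-zero rows removed) and a metadata table."""
    rows = [row for sheet in sheets for row in sheet
            if row["Classification"] != "Unlabeled"]
    seen, matrix = set(), []
    for row in rows:
        key = (row["CellID"],) + tuple(row[b] for b in barcode_cols)
        # Skip exact duplicates and rows with a count of 0 for every barcode.
        if key in seen or not any(row[b] for b in barcode_cols):
            continue
        seen.add(key)
        matrix.append({"CellID": row["CellID"],
                       **{b: row[b] for b in barcode_cols}})
    metadata = [{k: v for k, v in row.items() if k not in barcode_cols}
                for row in rows]
    return matrix, metadata

# Invented toy sheets standing in for the two Excel sheets.
human_sheet = [
    {"CellID": "c1", "Classification": "HEK", "Bar1": 40, "Bar2": 1},
    {"CellID": "c2", "Classification": "Unlabeled", "Bar1": 3, "Bar2": 2},
    {"CellID": "c3", "Classification": "Jurkat", "Bar1": 0, "Bar2": 0},
]
mouse_sheet = [
    {"CellID": "c4", "Classification": "MEF", "Bar1": 2, "Bar2": 55},
]
matrix, metadata = build_barcode_matrix([human_sheet, mouse_sheet],
                                        ["Bar1", "Bar2"])
print([row["CellID"] for row in matrix])
```

In the toy input, the unlabeled droplet and the droplet with no barcode counts are both excluded from the count matrix, while the metadata table keeps every labeled droplet.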

While all other demultiplexing algorithms required only cell hashing data for demultiplexing and were used on data filtered by gene expression, demuxEM requires gene expression data and unfiltered cell hashing data. Gene expression data for the Barnyard Dataset was obtained from GEO. The datasets with SRA IDs SRR8890648 and SRR8890625 were processed together using Cell Ranger software (10x Genomics, version 4.0). The dataset SRR8890636 was also processed with Cell Ranger to yield a .csv file with raw cell hashing count data. The raw_feature_bc_matrix.h5 file resulting from processing of the gene expression data was input to demuxEM along with the cell hashing .csv file (to make this CSV compatible with demuxEM, the string ‘Antibody’ was inserted into the first position of the header line, in place of an empty entry). The resulting zarr file contained cell hashing calls that were analyzed along with the results from other algorithms.

The PBMC Dataset was obtained from the GEO data linked to Stoeckius et al. (2018). Gene expression data was contained in the file with SRA ID SRR8281306, which was processed using Cell Ranger software (10x Genomics, version 4.0) for input to demuxEM. Cell hashing data was contained in the file GSM2895283_Hashtag-HTO-count.csv. Both files were input to demuxEM (to make this CSV compatible with demuxEM, the string ‘Antibody’ was inserted into the first position of the header line, in place of an empty entry) and the resulting zarr file containing cell hashing calls was analyzed along with the results from other algorithms.
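The header adjustment described for both datasets amounts to a one-line change to the CSV text. The sketch below shows one way to do it; the function name is hypothetical, the example counts are invented, and the header is split on commas without handling quoted cells that themselves contain commas.

```python
def label_first_header_cell(csv_text, label="Antibody"):
    """Fill an empty first header cell with `label`, leaving data rows as-is."""
    header, newline, rest = csv_text.partition("\n")
    cells = header.split(",")
    # Only overwrite the first cell when it is empty (optionally quoted).
    if cells[0].strip().strip('"') == "":
        cells[0] = label
    return ",".join(cells) + newline + rest

# Invented example: barcodes in rows, droplet barcodes in columns.
raw = ",AAACCTG,AAACGGG\nHTO_A,5,0\nHTO_B,1,12\n"
fixed = label_first_header_cell(raw)
print(fixed.splitlines()[0])
```

A header whose first cell is already filled is returned unchanged, so the function can be applied repeatedly without side effects.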

‘Ground truth’ labels for the PBMC data were obtained from files ‘A’-‘H’, and the ‘doublets’ file at https://github.com/jon-xu/scSplit_paper_data/tree/master/Table%203/Hashtag. These labels were originally used in an analysis of the PBMC Dataset by Xu et al. (2019). The list of 7931 sample labels (obtained by combining the droplet barcodes from the aforementioned files) was generated by randomly sampling 10 000 cells from the Stoeckius et al. data, removing control cells after sampling and applying labels obtained from the authors of Stoeckius et al. (from personal correspondence with the authors of Xu et al.).

6.4 BFF threshold determination

Thresholds are determined for each sample barcode as described below. The first step involved in threshold determination for a sample barcode is log-transforming the raw data:

$$\text{Logscale counts} = \log_{10}(\text{Counts} + 1 + ε) \tag{1}$$

where ε represents a small amount of randomly generated noise (random value between 0 and 1) added to the counts in order to avoid artifacts introduced by fitting a continuous function to discrete data. Prior to log-transformation, a value of 1 is also added to raw counts to ensure that log-scale counts generated are finite numbers.
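As a small illustration of this transformation, the sketch below applies it to a few invented counts. The function name is hypothetical, and a seeded generator is used only so that the example is repeatable.

```python
import math
import random

def logscale_counts(counts, rng):
    """Apply log10(count + 1 + noise), with noise drawn uniformly from [0, 1)."""
    return [math.log10(c + 1 + rng.random()) for c in counts]

rng = random.Random(0)          # seeded only to make the example repeatable
raw_counts = [0, 0, 3, 250]     # invented barcode counts for four droplets
log_counts = logscale_counts(raw_counts, rng)
print(log_counts)
```

Because of the added 1, a raw count of 0 maps to a finite value between 0 and $\log_{10} 2$, and the random term spreads identical raw counts over slightly different log-scale values.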

Following log-transformation, the log-scale count distribution is fitted with a smoothed model of the distribution. To adequately smooth the data (without over-smoothing), kernel-density estimation (KDE) is iteratively performed with increasing bandwidths until there are fewer than a pre-determined number of peaks in the first-derivative of the smoothed distribution. KDE is performed with the ‘density’ function (with Gaussian kernel, ‘SJ’ bandwidth and incrementing adjust parameter values) of the R package, stats. The default maximum number of derivative peaks allowed is 4. Only two derivative peaks are expected in the smoothed distribution, but stopping smoothing before all minor derivative peaks are smoothed away with larger bandwidths ensures that data is not over-smoothed, which can lead to inaccurate determination of the barcode threshold. Smoothing away most derivative peaks ensures that the smoothed model does not capture an excessive amount of noise.

Finally, the barcode count threshold is placed at the count value with the minimum density value between positive and negative peaks of the smoothed distribution. BFF thresholds are used by BFF_cluster and BFF_raw as described in the text and in Figure 2.
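The smoothing loop and valley search can be sketched as follows. This is a simplified illustration under stated assumptions, not the cellhashR code: the function names are hypothetical, a rule-of-thumb bandwidth stands in for the ‘SJ’ selector of R's density function, the derivative is approximated by finite differences on a regular grid, and the input is a deterministic toy sample with modes at 1.0 and 3.0.

```python
import math
import statistics

def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian kernel-density estimate of `data` on `grid`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
            for x in grid]

def local_maxima(values):
    """Indices of interior points higher than the left neighbour and at
    least as high as the right neighbour."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]

def valley_threshold(log_counts, max_derivative_peaks=4, grid_size=200):
    lo, hi = min(log_counts), max(log_counts)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Rule-of-thumb bandwidth: an assumption standing in for the 'SJ' method.
    base_bw = 0.9 * statistics.stdev(log_counts) * len(log_counts) ** -0.2
    adjust = 0.25
    while True:
        density = gaussian_kde(log_counts, grid, base_bw * adjust)
        slope = [b - a for a, b in zip(density, density[1:])]
        if len(local_maxima(slope)) <= max_derivative_peaks:
            break
        adjust += 0.25  # too many derivative peaks: smooth more and retry
    # Take the two tallest density peaks and find the lowest point between them.
    peaks = sorted(local_maxima(density), key=lambda i: density[i], reverse=True)[:2]
    if len(peaks) < 2:
        raise ValueError("smoothed distribution is not bimodal")
    left, right = sorted(peaks)
    valley = min(range(left, right + 1), key=lambda i: density[i])
    return grid[valley]

# Deterministic toy data: evenly spaced normal quantiles around 1.0 and 3.0.
normal = statistics.NormalDist()
z_scores = [normal.inv_cdf((i + 0.5) / 200) for i in range(200)]
toy_log_counts = [1.0 + 0.25 * z for z in z_scores] + [3.0 + 0.25 * z for z in z_scores]
threshold = valley_threshold(toy_log_counts)
print(round(threshold, 2))
```

For this symmetric toy sample the threshold lands close to the midpoint between the two modes. With real data the negative and positive peaks usually differ in height and width, so the valley need not be central.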

6.5 Normalization methods

Quantile Normalization: Quantile Normalization is performed using the normalize.quantiles function from the preprocessCore BioConductor package in R.
Bimodal Quantile Normalization (BQN): BQN involves:
1. Finding raw count thresholds for every barcode as described above, in Section 6.4.
2. Splitting the matrix of raw counts (with the two dimensions representing cells and barcodes) at the barcode threshold values into a ‘positive’ matrix and a ‘negative’ matrix where 0 values in each matrix replace values belonging to the other group.
3. Converting these placeholder 0 values in each matrix into ‘NA’ values, then performing quantile normalization on each of the matrices.
4. Converting ‘NA’ values back to 0 values, and adding the positive and negative quantile-normalized matrices to yield the matrix of BQN counts.
Log2Center Normalization: Log2Center normalization was implemented in cellhashR according to the implementation in deMULTIplex (https://github.com/chris-mcginnis-ucsf/MULTI-seq).
CLR Normalization: CLR normalization is performed using Seurat’s NormalizeData function with the normalization.method parameter set to ‘CLR’.
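The BQN steps listed above can be sketched in a few lines. The code below is an approximation for illustration only: the function names are hypothetical, a simple interpolating quantile normalization stands in for preprocessCore's normalize.quantiles, None plays the role of ‘NA’, and counts equal to a threshold are assumed to fall in the negative group. The thresholds and counts are invented.

```python
def quantile_normalize(columns):
    """Quantile-normalize columns in which None marks a missing entry.

    Each observed value is replaced by the mean, across columns, of the
    interpolated quantile at its rank. Ties are broken by position here.
    """
    observed = [sorted(v for v in col if v is not None) for col in columns]
    observed = [vals for vals in observed if vals]

    def quantile(sorted_vals, p):
        position = p * (len(sorted_vals) - 1)
        low = int(position)
        high = min(low + 1, len(sorted_vals) - 1)
        return sorted_vals[low] + (position - low) * (sorted_vals[high] - sorted_vals[low])

    normalized = []
    for col in columns:
        order = sorted((i for i, v in enumerate(col) if v is not None),
                       key=lambda i: col[i])
        result = [None] * len(col)
        for rank, i in enumerate(order):
            p = rank / (len(order) - 1) if len(order) > 1 else 0.5
            result[i] = sum(quantile(vals, p) for vals in observed) / len(observed)
        normalized.append(result)
    return normalized

def bimodal_quantile_normalize(columns, thresholds):
    """Normalize above-threshold and below-threshold counts separately,
    then recombine them into one matrix (one inner list per barcode)."""
    positive = [[v if v > t else None for v in col]
                for col, t in zip(columns, thresholds)]
    negative = [[v if v <= t else None for v in col]
                for col, t in zip(columns, thresholds)]
    pos_norm = quantile_normalize(positive)
    neg_norm = quantile_normalize(negative)
    return [[(0.0 if p is None else p) + (0.0 if n is None else n)
             for p, n in zip(pcol, ncol)]
            for pcol, ncol in zip(pos_norm, neg_norm)]

# Invented counts for two barcodes across six droplets; the second barcode
# is biased toward higher counts in both its negative and positive groups.
counts = [[1, 2, 3, 100, 120, 2], [10, 20, 15, 900, 12, 1000]]
bqn_counts = bimodal_quantile_normalize(counts, thresholds=[10, 50])
print(bqn_counts)
```

After normalization the two barcodes share the same set of negative values and the same set of positive values, which is the sense in which the bias between barcodes is removed while the bimodal shape is kept.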

6.6 Performing parameter scans of BFF_cluster parameters

The ParameterScan function of cellhashR was developed to help researchers choose appropriate parameter values for BFF_cluster. It scans values 0.05, 0.1, 0.15, 0.2, 0.25 and 0.5 for all parameters, calculates the associated threshold position and determines the number of cells discarded as a result. The results are output in graphical form to aid researchers in choosing appropriate values for parameters $α$ , $β$ and $Δ$ (the coefficient used in determining the difference threshold, T_d).

6.7 Demultiplexing algorithms

The BFF algorithms have been implemented in cellhashR. Versions of the HTODemux and deMULTIplex algorithms were also implemented in cellhashR (which include minor changes for improved error handling) and used in the benchmarking study. The other non-BFF algorithms used in the benchmarking study, hashedDrops from the DropletUtils R package, GMM-Demux from the gmm-demux python package and DemuxEM from the demuxEM python package, are executed directly by cellhashR.

6.8 Accuracy determination

Accuracy was calculated by comparing algorithm classifications to gene expression classifications (the ‘ground truth’) by the formula:

$$\text{Accuracy} = \frac{\text{No. of concordant classifications}}{\text{Total no. of classifications}} \tag{2}$$
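As a direct transcription of this formula, the short sketch below compares a list of calls against ground-truth labels. The function name and the example labels are invented for illustration.

```python
def accuracy(calls, truth):
    """Fraction of droplets whose call matches the ground-truth label."""
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    concordant = sum(call == label for call, label in zip(calls, truth))
    return concordant / len(truth)

calls = ["HEK", "MEF", "Doublet", "Negative"]
truth = ["HEK", "MEF", "Jurkat", "Negative"]
print(accuracy(calls, truth))
```

Here three of the four invented calls match, so the accuracy is 0.75. Doublet and negative calls are treated as ordinary classes, so a doublet call on a GEX singlet counts as discordant.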

6.9 Tenfold cross validation

To produce data for 10-fold cross-validation, the barcode count matrix was randomly split into 10 folds. Ten subsets of the data were then constructed, each by combining all of the folds except one, with a different fold left out each time. The 10 subsets were then analyzed by the demultiplexing algorithms. The accuracy shown in Figure 6A and B is the mean accuracy across the 10 subsets. Error bars represent the standard error of the mean accuracy, calculated as follows:

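$$\text{SEM} = \frac{1}{\sqrt{N}} \sqrt{\frac{\sum_{i = 1}^{N} (\text{Accuracy}_{i} - \overline{\text{Accuracy}})^{2}}{N - 1}} \tag{3}$$

where $N$ is the number of subsets analyzed (here, 10) and $\overline{\text{Accuracy}}$ is the mean accuracy across the subsets.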
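The fold construction and the summary statistics can be sketched as below. The names are hypothetical, droplets are represented by integer identifiers, and the accuracy values are invented. The helper returns both the sample standard deviation (the square-root term with $N - 1$ in the denominator) and the conventional standard error, which divides that standard deviation by $\sqrt{N}$.

```python
import math
import random

def leave_one_fold_out(items, n_folds, rng):
    """Randomly split items into n_folds folds and return n_folds subsets,
    each containing every fold except one."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    return [[x for j, fold in enumerate(folds) if j != held_out for x in fold]
            for held_out in range(n_folds)]

def mean_sd_sem(values):
    """Return the mean, sample standard deviation and standard error."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd, sd / math.sqrt(n)

droplet_ids = list(range(100))               # stand-ins for droplet barcodes
subsets = leave_one_fold_out(droplet_ids, 10, random.Random(1))
mean, sd, sem = mean_sd_sem([0.90, 0.92, 0.94])   # invented accuracies
print(len(subsets), len(subsets[0]), round(mean, 2), round(sd, 3), round(sem, 4))
```

Each droplet is held out exactly once, so it appears in nine of the ten subsets. In the study, each subset would be passed to every demultiplexing algorithm and the resulting accuracies summarized in this way.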

6.10 Doublet rate predictions

Doublet rate calculations for each demultiplexing algorithm involved dividing the number of doublets predicted by the total number of droplets observed. Expected rates for observable doublets were calculated using the following formula (obtained from https://satijalab.org/costpercell/):

$$\text{obs. doublet rate} = k \cdot n_{cells} \cdot \frac{n_{barcodes} - 1}{n_{barcodes}} \tag{4}$$

where $k$ is the conversion factor $m$ ( $4.597701 \times 10^{- 6}$ ) multiplied by the inverse recovery rate (1.74 for 10× Genomics version 2 chemistry), $n_{cells}$ is the total number of cells (droplets) in the data and $n_{barcodes}$ is the number of sample barcodes.
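As a worked check of this formula, the sketch below plugs in the Barnyard Dataset values given in this article: 5779 droplets and 12 sample barcodes. The function and argument names are hypothetical, and the constants are the ones quoted above.

```python
def expected_observable_doublet_rate(n_cells, n_barcodes,
                                     conversion_factor=4.597701e-6,
                                     inverse_recovery_rate=1.74):
    """Expected rate of doublets that combine two different sample barcodes."""
    k = conversion_factor * inverse_recovery_rate
    return k * n_cells * (n_barcodes - 1) / n_barcodes

barnyard_rate = expected_observable_doublet_rate(n_cells=5779, n_barcodes=12)
print(round(barnyard_rate, 3))
```

The result rounds to 0.042, matching the expected observable doublet rate quoted in the Results. The factor $(n_{barcodes} - 1)/n_{barcodes}$ accounts for doublets formed by two cells carrying the same barcode, which cell hashing cannot detect.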

Funding

This work was supported by the National Institutes of Health [P51 OD011092, 5UM1 AI124377-05 to L.J.P., 5U19 AI128741-05 to L.J.P.], and the Bill and Melinda Gates Foundation [OPP1108533/INV-008046 to L.J.P.].

Conflict of Interest: none declared.

Supplementary Material

btac213_Supplementary_Data


Contributor Information

Gregory J Boggy, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA.

G W McElfresh, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA.

Eisa Mahyari, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA.

Abigail B Ventura, Vaccine and Gene Therapy Institute, Oregon Health and Science University, Beaverton, OR 97006, USA.

Scott G Hansen, Vaccine and Gene Therapy Institute, Oregon Health and Science University, Beaverton, OR 97006, USA.

Louis J Picker, Vaccine and Gene Therapy Institute, Oregon Health and Science University, Beaverton, OR 97006, USA.

Benjamin N Bimber, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA.

References

Bais A.S., Kostka D. (2020) SCDS: computational annotation of doublets in single-cell RNA sequencing data. Bioinformatics, 36, 1150–1158.
Bloom J.D. (2018) Estimating the frequency of multiplets in single-cell RNA sequencing from cell-mixing experiments. PeerJ, 6, e5578.
Buenrostro J.D. et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature, 523, 486–490.
Butler A. et al. (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol., 36, 411–420.
Canzar S. et al. (2017) BASIC: BCR assembly from single cells. Bioinformatics, 33, 425–427.
Cao J. et al. (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science, 361, 1380–1385.
Carter J.A. et al. (2019) Single T cell sequencing demonstrates the functional role of alphabeta TCR pairing in cell lineage and antigen specificity. Front. Immunol., 10, 1516.
De Simone M. et al. (2018) Single cell T cell receptor sequencing: techniques and future challenges. Front. Immunol., 9, 1638.
DePasquale E.A.K. et al. (2019) DoubletDecon: deconvoluting doublets from single-cell RNA-sequencing data. Cell Rep., 29, 1718–1727.e8.
Fang L. et al. (2021) CASB: a concanavalin A-based sample barcoding strategy for single-cell sequencing. Mol. Syst. Biol., 17, e10060.
Gaublomme J.T. et al. (2019) Nuclei multiplexing with barcoded antibodies for single-nucleus genomics. Nat. Commun., 10, 2907.
Goldstein L.D. et al. (2019) Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Commun. Biol., 2, 304.
Ilicic T. et al. (2016) Classification of low quality cells from single-cell RNA-seq data. Genome Biol., 17, 29.
Klein A.M. et al. (2015) Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell, 161, 1187–1201.
Lun A.T.L. et al. ; Participants in the 1st Human Cell Atlas Jamboree. (2019) EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol., 20, 63.
Luo J. et al. (2020) Simultaneous measurement of surface proteins and gene expression from single cells. Methods Mol. Biol., 2111, 35–46.
Macaulay I.C. et al. (2017) Single-cell multiomics: multiple measurements from single cells. Trends Genet., 33, 155–168.
Macosko E.Z. et al. (2015) Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell, 161, 1202–1214.
McGinnis C.S. et al. (2019) MULTI-seq: sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. Nat. Methods., 16, 619–626.
Muto Y. et al. (2021) Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat. Commun., 12, 2190.
Peterson V.M. et al. (2017) Multiplexed quantification of proteins and transcripts in single cells. Nat. Biotechnol., 35, 936–939.
Picelli S. (2017) Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol., 14, 637–650.
Redmond D. et al. (2016) Single-cell TCRseq: paired recovery of entire T-cell alpha and beta chain transcripts in T-cell receptors from single-cell RNAseq. Genome Med., 8, 80.
Salomon R. et al. (2019) Droplet-based single cell RNAseq tools: a practical guide. Lab Chip., 19, 1706–1727.
Satpathy A.T. et al. (2019) Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol., 37, 925–936.
Singh M. et al. (2019) High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nat. Commun., 10, 3120.
Stoeckius M. et al. (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods, 14, 865–868.
Stoeckius M. et al. (2018) Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol., 19, 224.
Svensson V. et al. (2018) Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc., 13, 599–604.
Swanson E. et al. (2021) Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. Elife, 10, e63632.
Xin H. et al. (2020) GMM-Demux: sample demultiplexing, multiplet detection, experiment planning, and novel cell-type verification in single cell sequencing. Genome Biol., 21, 188.
Xu J. et al. (2019) Genotype-free demultiplexing of pooled single-cell RNA-seq. Genome Biol., 20, 290.
Zhao Y. et al. (2020) How to do quantile normalization correctly for gene expression data analyses. Sci. Rep., 10, 15534.
Zheng G.X. et al. (2017) Massively parallel digital transcriptional profiling of single cells. Nat. Commun., 8, 14049.
Ziegenhain C. et al. (2017) Comparative analysis of single-Cell RNA sequencing methods. Mol. Cell., 65, 631–643.e634.

Associated Data

Supplementary Materials

btac213_Supplementary_Data (zip, 8.5 MB)