Summary
We introduce BacDrop, a highly scalable technology for bacterial single-cell RNA sequencing that has overcome many challenges hindering the development of scRNA-seq in bacteria. BacDrop can be applied to thousands to millions of cells from both gram-negative and gram-positive species. It features universal ribosomal RNA depletion and combinatorial barcodes that enable multiplexing and massively parallel sequencing. We applied BacDrop to study Klebsiella pneumoniae clinical isolates and to elucidate their heterogeneous responses to antibiotic stress. In an unperturbed population presumed to be homogenous, we find within-population heterogeneity largely driven by the expression of mobile genetic elements that promote the evolution of antibiotic resistance. Under antibiotic perturbation, BacDrop revealed transcriptionally distinct subpopulations associated with different phenotypic outcomes including antibiotic persistence. BacDrop thus can capture cellular states that cannot be detected by bulk RNA-seq, which will unlock new microbiological insights into bacterial responses to perturbations and larger bacterial communities such as the microbiome.
Keywords: Bacterial single-cell RNA-seq, droplet, massively parallel sequencing, antibiotic perturbation, bacterial heterogeneity
Graphical Abstract
In Brief:
BacDrop is a droplet-based technology for single-cell RNA-seq in bacteria that can be scaled to millions of bacterial cells or hundreds of samples. It was used to elucidate transcriptionally distinct bacterial subpopulations associated with varying phenotypic outcomes linked to antibiotic resistance and persistence.
Introduction
Single-cell RNA-seq (scRNA-seq) has led to important discoveries in mammalian systems, resulting in great appreciation of the transcriptional heterogeneity of cell types and cell states1–8. While heterogeneity is essential for bacterial communities, bacterial scRNA-seq is underdeveloped due to longstanding technical challenges, including bacterial lysis, the absence of polyadenylated tails on messenger RNA (mRNA), and the paucity of mRNA molecules in a single bacterial cell9, which collectively lead to degraded transcriptome coverage and quality relative to mammalian systems.
Recently, several bacterial scRNA-seq approaches have been described, including plate-based methods such as microSPLiT, PETRI-seq, and MATQ-seq10–13, and probe-based methods such as par-seqFISH14,15. The plate-based technologies enable genome-wide scRNA-seq, but to date they have been limited in the number of cells (scale) that can be studied. Additionally, the sequencing reads from these plate-based methods are dominated (>90%) by the overwhelmingly abundant rRNA, wasting a large portion of the sequencing investment. In contrast, the probe-based methods avoid the rRNA problem and have improved scale. However, they require prior knowledge of the genome(s) of interest to enable probe design and generation, limiting the number of genes that can be queried. Due to these limitations, studies using these previously reported methods have mainly focused on between-population heterogeneity as a proof of principle, whereas within-population heterogeneity, which requires genome-wide characterization of a large number of cells from a single population, has not been extensively described.
One of the fundamental lessons from eukaryotic scRNA-seq is that, because of the co-variation structure of gene expression within and between cells, profiling a larger number of cells shallowly is a more favorable experimental design than profiling a small number of cells deeply, and can better recover the statistical properties of cell populations and gene programs16. This is especially the case when molecular techniques limit the profile of each cell to a relatively low fraction of transcripts, randomly sampled from all transcripts in the cell. Combined with powerful algorithms, numerous scRNA-seq studies in eukaryotic systems have demonstrated that novel cell types, states, dynamic trajectories, gene programs, and even features like spatial locations and cell interactions can be comprehensively recovered by such massively parallel methods, whereas per-cell coverage plays a less important role1–4,6,16–26. Experimental and computational analyses have also demonstrated that low coverage and/or low sequencing depth is sufficient for effective cell clustering, detection of rare populations, and identification of biomarkers when analyzing a large number of cells, as well as for conducting genetic “Perturb-Seq” screens25–29. The underlying principle is that scRNA-seq remains a sampling strategy, with only a portion of cells sampled from a population (scale) and a portion of RNA molecules sampled from a cell (coverage)16. Although higher coverage enables deep profiling of some cells and genes, it lacks the statistical power to recapitulate the phenotypic landscape of the population when the scale is limited. In contrast, because gene expression is structured by shared regulatory mechanisms, when a large number of cells are analyzed, even lower coverage of each cell allows the recovery of shared patterns, such as clustering of similar cells to identify distinct subpopulations or ordering cells by pseudotime to recover temporal trajectories16,26,30.
Thus, to capture the phenotypic landscape of the population and its statistical properties and distributions, rather than what is happening to one particular gene in any one particular cell, scale is more critical.
This same biological and statistical principle is more important in single cell transcriptional analysis of bacteria due to the inherent paucity of mRNA molecules in a single bacterial cell9. Additionally, if applied to samples with high complexity or diversity, large numbers of cells are critical to ensure the capture of enough cells from any individual rare subpopulation or species, such as persister or heteroresistant subpopulations9,31–33 or the microbiome, mirroring the requirements in large-scale mammalian projects such as the Human Cell Atlas34. Therefore, methods for increasing bacterial scRNA-seq scale will be essential for studies in microbial systems.
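To make the sampling argument concrete, the sketch below estimates the chance of capturing a minimum number of cells from a rare subpopulation as the number of profiled cells grows. It assumes that each profiled cell is an independent draw from the population (a binomial model). The 0.25% frequency, the 50-cell minimum, and the function names are illustrative choices, not values or code from this study.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at k (log space avoids overflow)."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def prob_at_least(k_min: int, n_cells: int, freq: float) -> float:
    """P(at least k_min cells of a subpopulation with frequency freq among n_cells)."""
    if k_min <= 0:
        return 1.0
    below = sum(math.exp(log_binom_pmf(k, n_cells, freq)) for k in range(k_min))
    return max(0.0, 1.0 - below)


# Illustrative comparison: a small, deeply sequenced library versus a large one.
RARE_FREQ = 0.0025   # hypothetical rare-subpopulation frequency
MIN_CELLS = 50       # hypothetical number of cells needed to form a cluster
for n in (2_000, 50_000):
    print(n, prob_at_least(MIN_CELLS, n, RARE_FREQ))
```

Under these assumptions, the small library almost never contains enough rare cells, however deeply each cell is sequenced, whereas the large library almost always does.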
Here we report the development and application of BacDrop, a droplet-based genome-wide massively parallel bacterial scRNA-seq technology. BacDrop has the flexibility to investigate a wide range of numbers of cells in one experiment, from thousands to millions of single bacterial cells, without requiring prior knowledge of the genome or probe-design. BacDrop also includes a universal and efficient rRNA-depletion step, reducing sequencing costs by at least ten-fold while simultaneously increasing information content. Furthermore, we demonstrated that BacDrop works on a variety of bacterial species, including gram-negative Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, and gram-positive bacterium Enterococcus faecium.
In this study, we applied BacDrop to assess within-population heterogeneity of K. pneumoniae and to characterize its responses to antibiotics. K. pneumoniae is one of the leading threats in the antibiotic resistance crisis, especially because of its resistance to carbapenems, the last-resort antibiotics used to treat the most resistant infections35. Under stable and dynamic conditions (i.e., without and with antibiotic perturbations), we identified important features of heterogeneity that have been masked by bulk analysis. In the absence of antibiotic perturbation, we found within-population heterogeneity driven predominantly by mobile genetic elements (MGEs), which promote the evolution of antibiotic resistance, thus demonstrating novel subpopulation structure and function in a population previously presumed to be homogeneous. After antibiotic perturbation, BacDrop revealed transcriptionally distinct subpopulations that are associated with different phenotypic outcomes, including decreased antibiotic efficacy and persister formation. With this demonstration of the power of BacDrop to illuminate heterogeneity both in static bacterial populations and in their dynamic responses to perturbations, we propose that BacDrop has the potential to transform our understanding of bacterial survival, adaptation, and evolution, with numerous potential applications including the characterization of complex communities.
Results
BacDrop: a bacterial droplet-based massively parallel scRNA-seq technology
We developed BacDrop on droplet-based microfluidics, which achieves much higher scale than plate-based technologies, leveraging the 10x Genomics™ platform, a reliable and commercially available platform for scRNA-seq24. Compared to mammalian cells, bacteria pose two distinctive challenges: lysing the bacterial cell wall requires harsher conditions that must still preserve RNA integrity, and the absence of polyadenylated tails on mRNA requires an alternative approach to isolate mRNA (~5%) from the vastly more abundant rRNA (~95%). To overcome these problems, we adapted a previously reported cell fixation and permeabilization protocol to avoid cell lysis prior to droplet encapsulation10,12, and implemented universal rRNA and genomic DNA (gDNA) depletion steps within the permeabilized, fixed cells using RNase H and DNase I, respectively (Fig. 1A, Fig. S1A).
Figure 1. BacDrop: a bacterial droplet-based massively parallel scRNA-seq technology.
(A) BacDrop workflow. Following cell fixation and permeabilization, rRNA and gDNA are depleted from cells in bulk. Then CB1 and UMIs are added to the 5’ end of cDNA via RT reactions (round 1 plate barcoding) in 96- or 384-well plates. After round 1 plate barcoding, all cells are pooled and cDNA is polyadenylated at the 3’ end using terminal transferase, followed by droplet generation and round 2 droplet barcoding. The 3’ poly-A tail of cDNA enables second strand synthesis using oligo-dT primers. Round 2 droplet barcoding is achieved via second strand cDNA synthesis and 4 capturing cycles by barcoded primers (on 10x gel beads) in droplets. The successfully captured cDNA contains UMIs, CB1 and CB2, as well as adaptor sequences at both the 5’ and 3’ ends. Each cell is identified by a combination of CB1 and CB2.
(B) Scheme of two rounds of cell barcoding and library construction. The RT primer is composed of a partial primer binding sequence (PBS) for Illumina sequencing, a UMI (8 bp), CB1 (13 bp), and a 6-bp random sequence for RT priming. The 3’ end of the cDNA was polyadenylated in cells after RT. A specific number of cells (thousands to millions) was then encapsulated, followed by 2nd strand cDNA synthesis and round 2 barcoding in droplets to attach CB2 to the double-stranded cDNA. An adapter (SMRT) sequence was added to the polyA end of the cDNA to enable cDNA amplification. After cDNA purification and amplification, tagmentation and PCR enrichment were performed to generate Illumina sequencing libraries.
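As an illustration of the RT primer layout described in (B), the following sketch splits a read into its UMI, CB1, and downstream sequence. It assumes, hypothetically, that the read begins immediately after the primer binding sequence and that the elements appear in the order listed in the legend (8-bp UMI, then 13-bp CB1, then the 6-bp random primer and cDNA). The function and type names are invented for this example and do not come from the BacDrop analysis pipeline.

```python
from typing import NamedTuple, Optional

UMI_LEN = 8      # UMI length stated in the legend
CB1_LEN = 13     # CB1 length stated in the legend
RANDOM_LEN = 6   # random RT-priming sequence length stated in the legend


class Round1Tag(NamedTuple):
    umi: str
    cb1: str
    insert: str  # random primer plus cDNA-derived bases


def split_round1_read(read: str) -> Optional[Round1Tag]:
    """Split a read into UMI, CB1, and insert; return None if it cannot be parsed.

    Assumed layout (hypothetical): [UMI][CB1][random hexamer + cDNA].
    """
    prefix = UMI_LEN + CB1_LEN
    # Reject reads that are too short or have ambiguous bases in the tags.
    if len(read) < prefix + RANDOM_LEN or "N" in read[:prefix]:
        return None
    return Round1Tag(read[:UMI_LEN], read[UMI_LEN:prefix], read[prefix:])


example = "ACGTACGT" + "TTGGCCAATTGGC" + "GATTACAGG"
print(split_round1_read(example))
```

A real pipeline would additionally match CB1 against the list of expected barcodes and tolerate sequencing errors; that step is omitted here.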
Droplet-based single-cell approaches require Poisson loading at mean droplet occupancy values (λ) far less than 1 to ensure that multiple cells are not encapsulated within a single droplet1 and thus do not share the same barcode, which limits the number of cells that can be loaded in each microfluidic channel. We overcame this problem with a multi-step barcoding strategy in which cells are uniquely identified by the combination of two barcodes (Fig. 1): a “plate barcode” (CB1), added by reverse transcription (RT) as a pre-indexing step and drawn from up to 384 different barcodes, and a “droplet barcode” (CB2) that uniquely identifies each droplet, added using the commercially available single-cell kit from 10x Genomics™ (see Method details) with 3–6 cells loaded per droplet. Together, CB1 and CB2 constitute the cell barcode. This pre-indexing strategy has recently been successfully applied to mammalian droplet-based scRNA-seq36. By pre-indexing, we were able to load each droplet with multiple cells, increasing the scale significantly from a λ of ~0.3 for conventional droplet scRNA-seq experiments to a λ of >1 for our approach with multiple cells in each drop. Additionally, template switching is less efficient for second strand cDNA synthesis from bacterial than from mammalian mRNA due to the lack of proper modifications at the 5’ end of bacterial mRNA37,38. We thus used Terminal Transferase (TdT) to append poly-A tails to the 3’ end of cDNA to facilitate second strand cDNA synthesis inside droplets.
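The effect of pre-indexing on droplet loading can be sketched with a simple Poisson model. The code below is an idealized illustration, not the authors' calculation: it assumes cells enter droplets independently, that every CB1 is used equally often, and that a collision means two cells share both a droplet and a CB1. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet contains exactly k cells at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def frac_droplets_with_multiple_cells(lam: float) -> float:
    """Probability that a droplet contains two or more cells."""
    return 1.0 - poisson_pmf(0, lam) - poisson_pmf(1, lam)


def expected_collision_fraction(lam: float, n_cb1: int) -> float:
    """Fraction of cells that share a droplet AND a CB1 with another cell.

    From one cell's point of view, the number of co-encapsulated cells is
    Poisson(lam); each of them carries the same CB1 with probability 1/n_cb1,
    so the number of same-barcode neighbours is Poisson(lam / n_cb1).
    """
    return 1.0 - math.exp(-lam / n_cb1)


# Conventional loading without pre-indexing (a single effective CB1).
print(expected_collision_fraction(lam=0.3, n_cb1=1))
# Overloaded droplets with 384 plate barcodes.
print(expected_collision_fraction(lam=6.0, n_cb1=384))
```

In this idealized model, loading about 6 cells per droplet with 384 plate barcodes gives a lower expected collision fraction than conventional loading at λ of ~0.3 without pre-indexing. Measured rates can be higher than this estimate because barcode usage is uneven, cells can clump, and the number of cells per droplet varies.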
Determining the technical performance of BacDrop
To characterize the performance of BacDrop, we tested the efficiency of rRNA depletion, the effect of fixation on gene expression profiles, the congruence of transcriptional profiles with bulk RNA-seq libraries, barcode collision rates, and generalizability to different bacterial species. After rRNA depletion, the fraction of mRNA reads increased from ~5% to 50–90% of the total aligned reads, while preserving mRNA profiles relative to non-depleted samples (R2 = 0.81; Fig. 2A–2B; Fig. S1B). In addition, we confirmed that mRNA profiles were not affected by cell fixation in bulk experiments (R2 = 0.97; Fig. 2C), and that BacDrop produces transcriptional profiles that are well-correlated with those generated by the traditional bulk RNA-seq method (R2 = 0.91; Fig. 2D).
Figure 2. Validation and technical performance of BacDrop.
(A) rRNA depletion inside fixed and permeabilized cells significantly increases the percentage of total reads that align to mRNA in BacDrop libraries of K. pneumoniae. The average percentage of reads aligned to mRNA genes was calculated from 10 independent BacDrop libraries of K. pneumoniae (5 without rRNA depletion and 5 with rRNA depletion), and error bars were plotted as the standard deviation.
(B) rRNA depletion does not affect transcriptional profiles. BacDrop libraries were constructed using K. pneumoniae samples with or without rRNA depletion, and a linear regression model was fitted to the mRNA counts from each library (R2 = 0.81).
(C) Cell fixation and permeabilization does not affect transcriptional profiles. Bulk RNA-seq results derived from Trizol-extracted RNA samples of K. pneumoniae versus RNA derived from fixed and permeabilized K. pneumoniae cells were highly correlated (R2 = 0.97).
(D) Single-cell BacDrop results are highly correlated with bulk RNA-seq results (R2 = 0.91) when analyzed in bulk mode (without cell barcode extraction).
(E) BacDrop has low barcode collision rates in an experiment where 2 million bacterial cells, a mixture of K. pneumoniae and P. aeruginosa, were loaded into one 10x channel (~6 cells per droplet). About 2.8% of the cells were assigned to two species, resulting in a 6.6% barcode collision rate.
(F) BacDrop was performed on 4 different bacterial species including E. coli, K. pneumoniae, P. aeruginosa, and E. faecium. At the sequencing depth of 500 reads per cell, approximately 5,000 cells of E. faecium, 2,500 cells of E. coli, 1,000 cells of K. pneumoniae, and 300 cells of P. aeruginosa passed the analysis threshold (see methods). Uniform Manifold Approximation and Projection (UMAP) of this mixed population shows the separation of different species, colored by species identity.
(G-I) Testing the sensitivity of BacDrop using three GFP strains of E. coli. The expression levels of gfp in these three E. coli strains were confirmed via flow cytometry (G). The mean numbers of gfp transcripts per cell were estimated via RT-qPCR (H). Three biological replicates were performed, and error bars were plotted as standard deviation. The two-tailed Student’s t-test was used for statistical analysis.
(I) Roughly 3,300 cells from each gfp strain were mixed to create a heterogeneous population and a BacDrop library was generated. The gfp expression levels were calculated using log2-transformed value of transcript per 10,000 reads (log2 (TP10K+1)) per cell from the BacDrop results. Compared to the RT-qPCR results (H), BacDrop showed a good sensitivity for the gfp.high strain. The difference between gfp.mid and gfp.low is less distinct but statistically significant (p < 0.005). The Wilcoxon signed-ranks test was used for the statistical analysis.
See also Figure S1.
We confirmed that the two-step barcoding strategy, which was used to enable loading of multiple cells per droplet, resulted in minimal cell barcode collision rates. We conducted round 1 plate barcoding using 384 RT primers and round 2 droplet barcoding containing on average 6 cells per droplet, though with a range of cell numbers per droplet, on a mixed population of two distinguishable species, K. pneumoniae and P. aeruginosa. Of 44,140 cells that passed quality control at a sequencing depth of ~1,000 reads per cell, 42,893 cells (97.2%) had > 99% of UMIs aligning to a single species, whereas the other 1,247 cells (2.8%) had > 1% of UMIs aligning to both species (Fig. 2E). The resulting barcode collision rate of 6.6% is comparable to those of published methods for eukaryotic droplet scRNA-seq, which range from 0.36% to 11.3% depending on the number of cells loaded in droplets1,36. We performed all subsequent experiments with < 3 cells per droplet, which would yield a library containing at most a million cells in each 10x channel with an even lower collision rate.
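The species-mixing classification described above can be sketched as follows. This is a minimal illustration that assumes per-cell UMI counts for the two species are already available. The 99% purity threshold is taken from the text, while the function names, labels, and toy counts are hypothetical.

```python
def classify_cell(umis_a: int, umis_b: int, purity: float = 0.99) -> str:
    """Label a cell barcode by the share of its UMIs assigned to one species."""
    total = umis_a + umis_b
    if total == 0:
        return "empty"
    if max(umis_a, umis_b) / total > purity:
        return "species_a" if umis_a > umis_b else "species_b"
    return "mixed"  # more than 1% of UMIs come from the second species


def mixed_fraction(cells: list[tuple[int, int]]) -> float:
    """Fraction of non-empty cell barcodes classified as mixed-species."""
    labels = [classify_cell(a, b) for a, b in cells]
    kept = [label for label in labels if label != "empty"]
    return kept.count("mixed") / len(kept)


# Toy data: (UMIs for species A, UMIs for species B) per cell barcode.
toy_cells = [(200, 0), (0, 150), (120, 30), (500, 2)]
print(mixed_fraction(toy_cells))

# Consistency check on the counts reported in the text.
print(round(100 * 1_247 / 44_140, 1), round(100 * 42_893 / 44_140, 1))
```

Only collisions between cells of different species are visible in such an experiment; collisions between two cells of the same species go undetected, which is why the overall collision rate is larger than the observed mixed fraction.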
To demonstrate the generalizability of BacDrop to different bacterial species and the ability of BacDrop to differentiate among multiple species, we applied BacDrop to the gram-negative E. coli, K. pneumoniae, P. aeruginosa39, and the gram-positive E. faecium40 (Fig. 2F, Table S2). Each strain was separately labeled with a unique set of CB1 (Table S1), allowing us to track and validate the accuracy of species identification in downstream analyses. CB1-barcoded cells were then pooled for round 2 droplet barcoding and library construction. All four species were distinguished in the analysis using Seurat, and the RNase H-based rRNA depletion worked efficiently across all four species (Fig. 2F, Fig. S1C–S1D). Cells from E. faecium fell into several subpopulations distinguished by differential expression of two highly expressed housekeeping genes (ef-tu and ef-g; log2 fold change ~ 0.6). We noted that P. aeruginosa cells were underrepresented among the four species in the sequenced library and determined that their loss before round 2 droplet barcoding could be accounted for by inefficient recovery following centrifugation due to their small size (Fig. S1E–S1F). This technical consideration was corrected in subsequent experiments by optimizing centrifugation speeds for each species/sample and by pooling samples only just prior to round 2 droplet barcoding to ensure that equal numbers of cells from each sample are loaded into the 10x channel.
We next examined the transcriptome coverage of BacDrop in E. coli and K. pneumoniae by generating two small libraries containing either ~10,000 cells of E. coli or ~12,000 cells of K. pneumoniae (Fig. S1G–S1H). Both libraries were sequenced with ~80,000 reads per cell, and we recovered roughly 4,000 and 6,000 cells, respectively (40% to 50% cell recovery rates), each containing at least 15 mRNA genes. We detected the expression of ~70% of the genes in the E. coli genome and ~80% of the genes in the K. pneumoniae genome when we analyzed all cells together (Table S3, Table S4). At the single-cell level, we detected an average of 90 and 88 mRNA genes per cell, respectively (only including cells in which at least 15 mRNA genes were detected), which is comparable to other reported bacterial scRNA-seq methods10,12,13. We also generated a large library from ~1 million K. pneumoniae cells and sequenced it with ~5,000 reads per cell. We recovered ~60,000 cells with at least 15 mRNA genes per cell and detected the expression of 96% of the genes in the genome when we analyzed all cells together (Table S5). At the single-cell level, we detected an average of 30 mRNA genes per cell across all 60,000 cells with at least 15 mRNA genes detected. However, from this large library, the top 3,000 high-quality cells had an average of 127 mRNA genes detected per cell (Fig. S1I). Since this large library was only sequenced with ~5,000 reads per cell, we expect to detect a higher number of mRNA genes per cell with increased sequencing depth, as illustrated with the smaller libraries.
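The per-cell quality threshold and the summary statistics reported above can be expressed compactly. The sketch below assumes a list holding the number of distinct mRNA genes detected for each cell barcode. The 15-gene threshold is the one used in the text, whereas the function name and the toy values are made up for illustration.

```python
from statistics import mean


def summarize_library(genes_per_cell: list[int], cells_loaded: int,
                      min_genes: int = 15) -> dict:
    """Apply the minimum-genes filter and report recovery and mean genes per cell."""
    passing = [g for g in genes_per_cell if g >= min_genes]
    return {
        "cells_passing": len(passing),
        "recovery_rate": len(passing) / cells_loaded,
        "mean_genes_per_cell": mean(passing) if passing else 0.0,
    }


# Toy example: six observed barcodes from ten loaded cells (hypothetical numbers).
toy = summarize_library([3, 15, 40, 90, 7, 120], cells_loaded=10)
print(toy)
```

With the figures in the text, recovering roughly 4,000 cells from ~10,000 loaded cells corresponds to a recovery rate of about 40%.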
Finally, we assessed BacDrop’s sensitivity to distinguish different expression levels of a gene. We created a heterogeneous population containing three E. coli strains constitutively expressing gfp at different levels (Table S2). We used flow cytometry to confirm the differing expression levels of gfp in these strains and estimated the mRNA copy numbers of gfp in each of these strains using RT-qPCR9 (Fig. 2G–2H). The estimated mRNA copy numbers per cell were 1–5 for the gfp.low strain, 9–30 for the gfp.mid strain, and 30–70 for the gfp.high strain. We then quantified gfp expression in single cells from the BacDrop results (Fig. 2I) and found that gfp expression levels estimated from BacDrop are statistically different among the three strains (p < 0.005), which is consistent with the RT-qPCR results. Notably, BacDrop did not detect the expression of gfp in a fraction of cells from each of the three GFP strains, which is likely a true biological phenomenon given that the bimodality in GFP protein levels previously described in E. coli41 was also observed here with flow cytometry (Fig. 2G).
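The per-cell expression measure used for gfp, log2(TP10K + 1), can be written out explicitly. The sketch below assumes that the denominator is the total count assigned to the cell; whether reads or UMIs are used is not specified here, so treat that choice, along with the function name, as an assumption.

```python
import math


def log2_tp10k(gene_count: int, total_count: int) -> float:
    """Return log2(TP10K + 1) for one gene in one cell.

    TP10K scales the gene's count to a library size of 10,000 for that cell.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    tp10k = gene_count * 10_000 / total_count
    return math.log2(tp10k + 1)


print(log2_tp10k(0, 500))      # a cell with no detected gfp
print(log2_tp10k(3, 10_000))   # 3 counts out of 10,000 total
```

The added pseudocount keeps cells with zero counts at an expression value of 0 instead of an undefined logarithm, which matters when a fraction of cells have no detected transcripts for the gene.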
Validating BacDrop’s ability to identify subpopulations of a single bacterial isolate
Since our initial results showed that BacDrop is a robust and reliable technology with sufficient coverage and sensitivity that can be applied across different numbers of cells and species, we next investigated whether BacDrop could also reproducibly identify subpopulations of cells within the same isolate. We chose to explore biological heterogeneity in the antibiotic-susceptible clinical isolate K. pneumoniae MGH66 in the absence and presence of antibiotic perturbations (Fig. 3A, Table S2). We split one MGH66 culture (OD600 ~0.2) into four identical cultures, one of which was left untreated while the other three were each treated with an antibiotic with a different mechanism of action: inhibition of cell-wall synthesis (meropenem), DNA synthesis (ciprofloxacin), or protein synthesis (gentamicin). This experiment was performed twice to obtain two biological replicates (Replicate 1 and Replicate 2). For each replicate, bulk RNA-seq on samples collected using the same treatment schemes revealed distinct cellular responses to each antibiotic relative to the untreated control. As expected (Fig. 3B), ciprofloxacin induced genes involved in the SOS response, e.g., recA, while gentamicin treatment induced a group of heat shock chaperone proteins, e.g., ibpB. In contrast, at 30 minutes, minimal transcriptional responses to meropenem treatment were observed in bulk under the conditions used (OD600 ~ 0.2, meropenem concentration 2 μg/mL), which is consistent with previous observations42.
Figure 3. Validating BacDrop’s ability to distinguish subpopulations based on distinct responses to different antibiotic treatments.
(A) Creation of a BacDrop library containing cells of the same bacterial strain under 4 different conditions, including treatment of meropenem, ciprofloxacin, gentamicin, and untreated control. Cells were collected and processed separately until after round 1 plate barcoding. The four samples were then pooled for round 2 droplet barcoding and library construction. Two biological replicates were performed.
(B) Bulk RNA-seq results of cells exposed to the same antibiotic conditions as in (A). The abundance for each treated condition and comparison between the treated and untreated cultures are shown as well as significantly up- and down-regulated genes from each treatment (performed in triplicate).
(C) UMAP plot based on the original identity of the 6 samples treated with meropenem (M and its replicate M.2), ciprofloxacin (C and its replicate C.2), and gentamicin (G and its replicate G.2).
(D) Unsupervised UMAP showed three clusters with significantly (p < 0.05) higher expression of genes in the SOS-response pathway, heat-shock response, and genes encoding an IS903B transposase (MGE).
(E) No strong batch effect was observed between the two biological replicates with the same treatment conditions.
(F) Expression of a representative gene from each cluster was highlighted on the UMAP. The purple color bars represent the normalized expression of a gene across all cells analyzed.
See also Figure S2.
To validate BacDrop, we applied round 1 plate barcoding to each of these 4 samples separately using one of 4 distinct sets of CB1 (96 different CB1s corresponded to each sample, Table S1) that allowed us to confirm the original identity of these 4 samples and associate them with the corresponding antibiotic exposure. We then mixed all cells from the different treatments for round 2 droplet barcoding and library construction (Fig. 3A). Each of the replicate libraries contained roughly one million cells. Replicate 1 was sequenced with ~5 billion paired-end reads (~5,000 reads per cell) and Replicate 2 with ~3 billion paired-end reads (~3,000 reads per cell).
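Because each sample received its own set of 96 CB1 barcodes, a cell's treatment condition can be recovered from its CB1 alone. The sketch below illustrates the idea with a hypothetical contiguous layout (CB1 indices 0–95 for the first sample, 96–191 for the second, and so on); the actual assignment is given in Table S1 and may differ. It also reproduces the reads-per-cell arithmetic stated above.

```python
# Hypothetical sample order; the real CB1-to-sample assignment is in Table S1.
SAMPLES = ("untreated", "meropenem", "ciprofloxacin", "gentamicin")
CB1_PER_SAMPLE = 96


def sample_for_cb1(cb1_index: int) -> str:
    """Map a CB1 index (0-383) to its sample under the assumed contiguous layout."""
    if not 0 <= cb1_index < CB1_PER_SAMPLE * len(SAMPLES):
        raise ValueError(f"CB1 index out of range: {cb1_index}")
    return SAMPLES[cb1_index // CB1_PER_SAMPLE]


def reads_per_cell(total_reads: float, n_cells: float) -> float:
    """Average sequencing depth per cell."""
    return total_reads / n_cells


print(sample_for_cb1(0), sample_for_cb1(100), sample_for_cb1(383))
print(reads_per_cell(5e9, 1e6), reads_per_cell(3e9, 1e6))
```

Keeping the sample identity in CB1 means the four conditions can share one droplet run and one library, so differences between conditions are less likely to reflect batch effects introduced during round 2 barcoding.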
To confirm BacDrop’s robustness and reproducibility, we compared the two replicate experiments. For the depth of sequencing performed for each experiment, we obtained ~80,000 cells with ≥15 mRNA genes per cell, with an average of ~30 unique mRNA genes detected per cell. No strong batch effect was observed between the replicates. Treatment with the different antibiotics resulted in cells clustering based on their treatment conditions (Fig. 3C–3F). The overlap between the meropenem-treated and untreated samples (Fig. S2) was consistent with their bulk RNA-seq results which suggest relatively minimal transcriptional response to meropenem, at least on the population level (Fig. 3B). Across both replicates, 72% and 56% of the ciprofloxacin-treated cells belonged to the SOS-response cluster, while 72% and 66% of the gentamicin-treated cells belonged to the heat-shock response cluster, with good reproducibility across the two replicates (Fig. 3E). These experiments confirmed that BacDrop could reproducibly identify population heterogeneity.
BacDrop reveals within-population heterogeneity with subpopulations driven largely by the expression of MGEs
We next analyzed in depth the untreated culture of MGH66 at the single-cell level to understand whether it contained any previously unrecognized within-population heterogeneity in transcriptional states. From this untreated condition we recovered ~50,000 cells with at least 15 unique mRNA genes detected in each cell, and we identified two major subpopulations in both replicates using an unsupervised clustering approach (see Method details; Fig. 4A, Fig. S3A). While most cells fell into one major homogeneous subpopulation, 2,191 cells (~4.5%) fell into the MGE subpopulation driven by IS903B transposase genes (Fig. 4B), of which there are 83 copies in the MGH66 genome. In fact, this MGE subpopulation is present in all 8 samples (untreated and antibiotic-treated, in both replicates), suggesting both that the presence of this MGE population is a robust phenomenon and that BacDrop is reproducible (Fig. 3D–3F, 4A).
Figure 4. BacDrop reveals within-population heterogeneity driven largely by MGEs.
(A and B) In untreated culture of MGH66, a population showing high-level expression of IS903B transposase (MGE, 4.5%; green) was detected.
(C and D) Flow cytometry of reporter MGH66 strains expressing GFP driven by the promoter of IS903B (MGH66:PIS903B:gfp) shows a heterogenous expression pattern. The MGE.high population (~10% of the whole population) and MGE.low population (~10% of the whole population) were sorted into MHB medium without antibiotics, and mutation frequencies (D) were measured under meropenem treatment. Experiments in panel (C-D) were repeated with nine biological replicates. Error bars were plotted as the standard deviation. The Student’s t-test was used for statistical analysis.
(E and F) MGE-driven subpopulations were detected in another K. pneumoniae clinical isolate BIDMC35. UMAP (E) and heatmap (F) shows 4 subpopulations differing from the majority population (Cluster 0; red) of BIDMC35. Clusters 2, 3, and 4 are each driven by the high expression of a different transposase gene. In cluster 1, nearly all highly expressed genes belong to a prophage in BIDMC35 genome. Expression levels of all genes are normalized to expression in Cluster 0 (F).
Previously we had functionally shown that MGH66 and several other K. pneumoniae isolates have high-level transposon insertional mutagenesis activity, which contributes to their high frequencies of carbapenem resistance acquisition43; however, bulk RNA-sequencing had failed to show elevated expression of any transposon genes in these strains. Here, the existence of this subpopulation with high-level transposon expression provides a possible explanation for the strain’s elevated carbapenem-resistance frequencies, with resistance likely emerging from this small subpopulation. To test this hypothesis, we engineered an MGH66 reporter strain expressing the green fluorescent protein44 driven by the promoter of one copy of the IS903B transposase genes (MGH66:PIS903B:gfp). Consistent with the BacDrop results, we observed a heterogeneous expression pattern of gfp in this reporter strain by flow cytometry (Fig. 4C). We then performed fluorescence-activated cell sorting (FACS) to sort the MGE.high (high GFP expression) and MGE.low (low GFP expression) populations (Fig. 4C–4D; Fig. S3B) and measured their mutation frequencies under meropenem treatment using a modified fluctuation analysis43. The MGE.high population had at least 7 times higher mutation frequencies than the MGE.low population under meropenem treatment (p = 0.002) (Fig. 4D), confirming our hypothesis that resistance is more likely to emerge from this subpopulation highly expressing MGE genes and revealing phenotypic consequences of the different subpopulations.
To examine the robustness of the MGE subpopulation in our datasets and to compare the relative values of deeper sequencing of fewer cells versus more shallow sequencing of more cells, we analyzed the untreated samples in Replicate 1 and Replicate 2 libraries separately, along with a smaller BacDrop library containing ~3,000 cells of MGH66 similarly collected but sequenced more deeply. We sequenced this smaller library to obtain 80,000 reads per cell with recovery of ~2,000 cells with at least 15 mRNA genes per cell, and collectively an average of 85 mRNA genes detected per cell of these ~2,000 cells. Analysis of this smaller library detected the same MGE population identified in the larger cell population, albeit less distinctly (Fig. S4). In contrast, both the replicates of the larger libraries, even when sequenced only to a depth to obtain ~30 mRNA genes per cell, identified an additional small subpopulation (0.25% – 0.36%) featuring high-level expression of maltose transport genes that was not identified in the smaller library, despite its deeper sequencing and higher coverage (Fig. S4). Together, these results show that analyzing larger numbers of cells can help to reveal heterogeneity within a bacterial population, which is consistent with analyses in eukaryotic systems16,17,19,25,26,28,29. Although the coverage can be higher when fewer cells were analyzed, increasing the scale, rather than the coverage, resulted in identification of a rare population.
To determine whether MGE-driven subpopulations are unique to MGH66, we applied BacDrop to another K. pneumoniae clinical isolate (BIDMC35) (Table S2). BIDMC35 is a carbapenem-resistant isolate in which carbapenem resistance results from a transposon disruption of the major porin gene ompK36 and the transposon-mediated high-level expression of blaOXA-663, a β-lactamase gene45. We again observed MGE-driven subpopulations, as in MGH66. Analyzing 9,748 BIDMC35 single cells that passed the analysis threshold at a sequencing depth of ~2,000 reads per cell (Fig. 4E–4F), we identified three clusters each driven by a unique transposon gene, including Cluster 2 driven by an IS4321 family transposase (195 cells, 2%), Cluster 3 driven by the insH transposase (146 cells, 1.5%), and Cluster 4 driven by an IS110 family transposase (133 cells, 1.4%). Together with the observation in MGH66, this reinforces the finding that variable expression of MGEs may be one of the major drivers of population heterogeneity.
In BIDMC35, besides these MGE-driven subpopulations, we observed another unique subpopulation (190 cells, 2%) driven by the 30- to 320-fold higher expression of a group of prophage genes (Fig. 4E–4F, Table S6), compared to the rest of the populations, indicating that this cluster of cells was likely undergoing spontaneous phage induction. This observation is similar to the phage-induction subpopulation reported in the microSPLiT study12. Additionally, we found the expression of blaOXA-663 is significantly lower in this phage-induction subpopulation (Fig. 4F), possibly caused by spontaneous phage induction.
BacDrop reveals heterogeneous stress responses to antibiotic exposure
Finally, to determine if a perturbed population might have heterogeneous dynamic transcriptional responses, we analyzed single cell responses after different antibiotic exposures (Fig. 3A; Fig. S5). While ciprofloxacin or gentamicin resulted in clear transcriptional responses following treatment, there was no obvious heterogeneity of response at the single cell level under the conditions examined. In contrast, meropenem treatment, which induced only minimal transcriptional responses in bulk (Fig. 3B) despite causing cell killing at 30 minutes (Fig. S5A), elicited heterogeneous responses at the single cell level. Meropenem treatment induced four subpopulations with distinct molecular responses, in addition to the MGE subpopulations (Fig. 5A–5B; Fig. S5B; Table S7). These subpopulation clusters were characterized by co-upregulation of genes involved in: i) the stress response (e.g., rseB, yidC, and yhcN); ii) cell wall/membrane synthesis (e.g., nlpI and lpxH); iii) cell wall synthesis and DNA replication (e.g., dnaG and ftsI); or iv) cold shock response (e.g., pnp and cspD). Notably, while CspD was initially identified as a cold shock protein, it has also been reported to be a DNA-binding toxin that inhibits DNA synthesis and induces the formation of persisters in E. coli46.
Figure 5. BacDrop reveals heterogeneous responses to meropenem exposure.
(A) Besides the subpopulation highly expressing the IS903B transposase genes (MGE), meropenem treatment induced heterogeneous responses, including a stress-response subpopulation, a cell wall synthesis subpopulation, a DNA replication and cell wall synthesis subpopulation, and a cspD-expressing subpopulation.
(B) Dot plot showing the expression of genes that are significantly different among clusters and the percentage of cells expressing these genes in each cluster.
(C) Validation of subpopulations identified in the meropenem-treated MGH66 using RNA FISH with double marker genes from the “stress response” subpopulation (rseB (green) + yidC (red)) and “cspD expressing” subpopulation (cspD (green) + rihC (red)). A probe targeting the housekeeping gene ef-tu was used as the positive control to show that more than 99% of cells were successfully permeabilized and hybridized. Subpopulations co-expressing double marker genes were identified. The scale bar size is 15 μm.
(D) Across 20 fields of view, the RNA FISH results showed that ~1% of cells co-expressed cspD and rihC, and ~10% of cells co-expressed rseB + yidC. These results were statistically consistent with the BacDrop result in which ~0.6% of cells co-expressed cspD and rihC, and ~8% of cells co-expressed rseB + yidC. Two-way ANOVA was performed for the statistical analysis (p = 0.348). This experiment was repeated twice, and data from the two replicates were plotted separately.
(E) Flow cytometry of reporter MGH66 strains expressing GFP driven by the promoter of yhcN (MGH66:PyhcN:gfp; left) or cspD (MGH66:PcspD:gfp; right) shows a heterogeneous response to meropenem (green) but not to ciprofloxacin (blue) treatment, relative to untreated control (red).
Having identified genes highly expressed in specific subpopulations after meropenem treatment (Fig. 5B, Table 1), we used RNA fluorescence in situ hybridization (FISH) to validate the existence of the “stress response” and “cspD-expressing” subpopulations by probing for two genes highly co-expressed in each cluster (rseB and yidC, “stress response” cluster; cspD and rihC, “cspD-expressing” cluster) (Fig. 5C). RNA-FISH confirmed both the existence and proportion of cells assigned to the clusters identified by BacDrop (Fig. 5C–5D).
Table 1.
Marker genes identified in clusters of the meropenem-treated sample
Gene name | Gene product | Function | Biological pathways | Cluster | Log2 fold change | P value |
---|---|---|---|---|---|---|
ftsI | PBP3 | Target of β-lactams | Cell wall synthesis | DNA replication and cell wall synthesis | 5.2957 | <1.00E-300 |
dacC | PBP6 | Target of β-lactams | Cell wall synthesis | Stress response | 3.7036599 | 1.81E-40 |
dtpB | DtpB | Transportation of beta-lactams | Dipeptide/Tripeptide Transport | Cell wall synthesis | 6.281594 | <1.00E-300 |
suhB | SuhB | Membrane protein transport and insertion | Cell wall/membrane synthesis | Cell wall synthesis | 4.406598 | 1.39E-125 |
lpxH | LpxH | Lipid A biosynthesis | Cell wall/membrane synthesis | Cell wall synthesis | 7.605374 | <1.00E-300 |
nlpI | NlpI | Cell wall synthesis, binds to PBP4 and PBP1A | Cell wall/membrane synthesis | Cell wall synthesis | 3.487722 | 2.54E-87 |
rffG | RffG | LPS synthesis | Cell wall/membrane synthesis | Cell wall synthesis | 4.280285 | 5.97E-107 |
dnaG | DnaG | DNA replication | DNA replication | DNA replication and cell wall synthesis | 4.490715 | <1.00E-300 |
yidC | YidC | Membrane damage response | Stress response | Stress response | 4.0687447 | 8.25E-100 |
spy | Spy | Membrane damage response | Stress response | Stress response | 4.0142941 | 2.33E-98 |
rseB | RseB | Membrane damage response | Stress response | Stress response | 5.1874707 | 2.26E-218 |
yhcN | YhcN | Cytoplasmic acid stress | Stress response | Stress response | 6.1654092 | <1.00E-300 |
hscA | HscA | Cold shock | Stress response | Stress response | 3.2262902 | 5.75E-26 |
fumC | FumC | Oxidative stress | Stress response | DNA replication and cell wall synthesis | 5.304257 | <1.00E-300 |
ychF | YchF | Oxidative stress | Stress response | DNA replication and cell wall synthesis | 5.286679 | <1.00E-300 |
groS | GroES | Stress response | Stress response | DNA replication and cell wall synthesis | 4.510794 | 2.41E-261 |
rihC | RihC | DNA damage response | Stress response | cspD expressing | 6.16122 | <1.00E-300 |
cspD | CspD | Persister formation | DNA replication inhibition | cspD expressing | 6.342385 | <1.00E-300 |
IS903B | IS903B | Insertion sequence | Mobile genetic elements | MGE | 4.463117 | <1.00E-300 |
We also validated BacDrop using fluorescence cytometry to independently quantify cell numbers in the “stress response” and “cspD-expressing” clusters. We engineered MGH66 reporter strains expressing GFP44 driven by the promoter of yhcN, a “stress response” gene (MGH66:PyhcN:gfp), or cspD (MGH66:PcspD:gfp). Consistent with BacDrop results, meropenem but not ciprofloxacin induced heterogeneous expression of both yhcN and cspD (Fig. 5E). We confirmed that the cells with reduced fluorescence of MGH66:PcspD:gfp (GFP-low) were live and not simply dying using a live-dead stain and plating for colony forming units (Fig. S6). Interestingly, the GFP-low population had lower fluorescence than the untreated population, suggesting that there may be suppression of cspD expression in this population. Of note, the fractions of cells highly expressing these two genes were greater by flow cytometry than by BacDrop; this would be consistent with the fact that RNA and protein levels are not necessarily well correlated in single cells9, as one copy of mRNA can produce one to hundreds of copies of proteins and mRNA is less stable than protein. Nevertheless, the heterogeneous responses were observed by both methods of measurement.
Taken together, under meropenem perturbation, we observed strong heterogeneous responses driven by various stress response pathways that had been previously masked in bulk RNA-seq results. Rather than uniformly turning on a specific stress response pathway in all cells, a diverse range of stress responses appear to be induced in different subpopulations, which could potentially contribute to heterogeneous cell fates such as cell lysis or antibiotic tolerance.
BacDrop identifies a subpopulation with reduced meropenem efficacy and increased persisters
Given CspD’s reported role in inducing the formation of antibiotic-tolerant persisters in E. coli46, we wondered whether antibiotic-tolerant cells might make up a greater percentage of the GFP-high than the GFP-low subpopulation, with cells in which cspD had been induced surviving preferentially under meropenem treatment. Because CspD plays a role in persister formation rather than maintenance, we anticipated that some cells might induce cspD and thus make high levels of GFP, become antibiotic tolerant, and then turn off CspD expression, resulting in reduced GFP signal as the GFP is degraded. Despite this possibility, we nevertheless hypothesized that the GFP-high subpopulation might still be enriched for tolerant cells compared to the GFP-low subpopulation. We performed FACS with MGH66:PcspD:gfp coupled with dead-cell stain, after a 30-minute exposure to meropenem, and sorted live cells into GFP-high and GFP-low (Fig. 6A) subpopulations directly into liquid media with and without meropenem. (We confirmed that all sorted cells have similar survival rates on LB agar plates without antibiotics; Fig. S6.) Indeed, the GFP-high subpopulation was enriched for meropenem-tolerant cells compared to the GFP-low subpopulation, with evidence of a persister population in the GFP-high but not the GFP-low subpopulation (Fig. 6B). We verified via whole genome sequencing that no genetic mutations were acquired by these persister cells. Consistent with this observation, we also observed ~100 times more persister cells in an MGH66 strain over-expressing cspD, though the over-expression of cspD did not reduce antibiotic susceptibility as reflected in the minimal inhibitory concentration of meropenem (Fig. 6C–6D). Together, these results confirmed the role that cspD plays in persister formation in a subpopulation of K. pneumoniae MGH66.
Figure 6. cspD-expressing cluster was enriched with persister cells.
(A) MGH66:PcspD:gfp cells were analyzed by FACS at time 0 and 30 minutes after exposure to meropenem (2 μg/mL). After 30 minutes, live cells from GFP-low and GFP-high subpopulations were identified and sorted.
(B) GFP-low and GFP-high subpopulations sorted from meropenem-treated MGH66:PcspD:gfp cells differed in their response to meropenem. GFP-low and GFP-high subpopulations were sorted directly into media with no antibiotic or with meropenem (2 μg/mL). Samples were then taken over time and plated on solid agar to enumerate CFU. Persisters only emerged from the GFP-high subpopulation. The limit of detection is indicated by the black dashed line. Three asterisks from the treated GFP-low subpopulation indicate that no persisters were observed.
(C-D) Overexpression of cspD in MGH66 increased numbers of persisters but did not affect susceptibility to meropenem. cspD driven by the arabinose inducible promoter pBAD was transformed into MGH66. (C) Induction of cspD with 1% arabinose did not affect the minimal inhibitory concentration (MIC) of meropenem. (D) When cspD was induced with 1% arabinose and cells were treated with meropenem at 2 μg/mL (purple), the numbers of persister cells were significantly greater at the 6- and 24-hour time points (asterisks show significance at these time points) compared to the culture without arabinose induction under 2 μg/mL meropenem treatment (blue).
All experiments were repeated three times. Error bars were plotted as the standard deviation. The Student’s t-test was used for statistical analysis.
See also Figure S6.
Discussion
We report a novel bacterial scRNA-seq method, BacDrop, that is robust and reproducible, and leverages droplet-based technology to enable the massively parallel profiling of the transcriptional program of thousands to millions of single bacterial cells. In line with analyses in mammalian systems16,25–29, we demonstrated that studying large numbers of cells can robustly identify population heterogeneity including rare subpopulations even with the relatively sparse per cell coverage afforded by single cell technologies (Fig. S4). We applied BacDrop to study the naturally occurring heterogeneity in a heretofore presumed uniform culture of bacteria and its heterogeneous responses to perturbation. Due to limited scale, previous genome-wide single cell transcriptional studies on smaller numbers of cells (hundreds to thousands) have demonstrated between-population heterogeneity by artificially mixing different populations10–12. Here, we characterized both a stable and a dynamic population of cells derived from the same bacterial isolate and demonstrated a diversity of states within the stable population and heterogeneous responses in the dynamic population after perturbation with antibiotic.
Using BacDrop, we report the observation of within-population heterogeneity driven predominantly by the expression of MGEs in the presumed uniform cultures of K. pneumoniae. While MGEs drive genetic diversity by their relatively random movement in the genome, we find that the expression levels of individual MGEs are also highly variable within populations of MGH66 and BIDMC35 (Fig. 4). Whether this heterogeneity in expression is due to genotypic variation resulting from the movement of genetic elements such as transposon insertions or phase variation, to transcriptional changes in response to stochastic or local microenvironmental cues, or to epigenetic mechanisms, remains to be understood. However, the consequences of such variation can be significant. We demonstrated that the high-level expression of MGE genes found only in a subpopulation of MGH66 contributes to the strain’s propensity to become carbapenem resistant, consistent with our previous finding that high-level transposon mutagenesis plays an important role in high-frequency evolution of carbapenem resistance43.
Meanwhile, in response to a perturbation such as exposure to the important antibiotic meropenem, BacDrop revealed a wide range of transcriptional responses that are masked in bulk RNA-seq. Interestingly, a diverse range of stress responses appear to be induced in different subpopulations rather than a single or uniform response occurring in all cells; this subpopulation diversity could potentially contribute to heterogeneous cell fates or phenotypic outcomes, including cell lysis or antibiotic tolerance. It remains to be understood whether this heterogeneous stress response occurring in subpopulations is a general phenomenon in response to stressors or is specific to certain types of stresses such as meropenem, which may be working in a more pleiotropic manner than is assumed, thereby eliciting a wide range of stress responses. Meanwhile, we examined one such subpopulation defined by high-level expression of cspD, a gene encoding a toxin that inhibits DNA replication and induces the formation of persisters, but that is not significantly upregulated in bulk RNA-seq results. Using both FISH and flow cytometry, we confirmed that cspD is induced in a subpopulation after meropenem treatment. Moreover, we found a higher survival rate under meropenem treatment in the cspD-induced subpopulation, pointing to its role in antibiotic tolerance within this subpopulation and more generally, to the importance of certain responses in some subpopulations in surviving the lethal effects of antibiotics.
We have developed BacDrop to be a versatile method to characterize the transcriptional programs of thousands to millions of bacterial cells in multiple species. To demonstrate its utility, we validated BacDrop in four different pathogenic bacterial species. We expect that BacDrop should be easily adapted to more species, including commensal bacterial strains of the microbiome and other environmental species. Therefore, BacDrop will enable the identification of both heterogeneity in cell types (species) in a mixed community or cell states in a single strain under stable or dynamic conditions. We thus propose that BacDrop will become a powerful tool for a broad range of studies, including studies focused on elucidating phenotypic heterogeneity, understanding bacterial interactions in microbial communities, dissecting host-pathogen interactions, expanding our knowledge of the microbiome beyond genomes and bulk metabolomics, and investigating the emergence of antibiotic resistance, persistence and tolerance.
Limitations of the Study
Although BacDrop has the capacity to study millions of cells simultaneously, we are currently limited by sequencing costs, which impact sequencing depth and the information (i.e., genes per cell) that can be extracted per cell. Nevertheless, we expect that sequencing costs will continue decreasing and BacDrop will be a powerful platform for large-scale single-cell experiments in bacteria. While we have demonstrated the ability of BacDrop to work on four major bacterial pathogenic species, including the gram-negative E. coli, K. pneumoniae, and P. aeruginosa, and the gram-positive E. faecium (Fig. 2F), we nevertheless recognize that there will be phylogenetic biases in performance, including, as we show, variability in centrifugation of permeabilized cells, but also other factors, e.g., permeabilization reagents and concentrations. We thus anticipate that limitations will be present when applying BacDrop to complex bacterial communities such as microbiota samples; however, we expect the ability to characterize the single cell transcriptional programs of millions of cells without requiring any prior knowledge of the genomes of interest to outweigh the more modest technical limitations associated with individual species.
STAR METHODS
RESOURCE AVAILABILITY
Lead contact:
Please direct requests for resources and reagents to lead contact: Deborah T. Hung (hung@molbio.mgh.harvard.edu)
Materials availability
Plasmids generated in this study are available from the lead contact upon request.
Data and Code availability
Sequencing data and the processed count matrix have been deposited in the GEO repository (GSE180237).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Bacterial strains
Bacterial strains used in this study are listed in Table S2. K. pneumoniae, E. coli, and P. aeruginosa were cultured in Luria-Bertani (LB) medium or Mueller-Hinton Broth (MHB) with shaking at 37 °C. E. faecium was cultured in Todd-Hewitt broth (THB) with shaking at 37 °C. For the GFP experiment, E. coli strains expressing gfp driven by different promoters were cultured separately in LB at 37 °C. Early exponential growth phase cells (OD600 ~0.2) were collected and fixed immediately. For the antibiotic treatment experiment, K. pneumoniae clinical isolate MGH66 was cultured in MHB at 37 °C until early exponential phase. Then the culture was diluted to OD600 0.05 in MHB. After growing for two doublings (~40 min, OD600 ~0.2), the cultures were split into four equal-volume cultures. One culture was left untreated, while the other three were treated with the relevant antibiotic at breakpoint concentrations set by the Clinical and Laboratory Standards Institute (CLSI): 2 μg/mL for meropenem, 2.5 μg/mL for ciprofloxacin and 4 μg/mL for gentamicin. After 30 min, 7 mL of cells were collected from each of these cultures and immediately processed using the cell fixation protocol. For the bulk RNA-seq experiments, samples were collected using the same treatment schemes, and three biological replicates were included in each condition. For the BIDMC35 experiment, BIDMC35 was cultured in LB medium at 37 °C until early exponential phase (OD600 ~0.2). 7 mL of cells were then collected and immediately processed using the cell fixation protocol.
METHODS DETAILS
Cell fixation and permeabilization
All reagents (Key Resources Table) for cell fixation and permeabilization were kept ice-cold. Up to 10 billion bacterial cells grown at specified conditions were collected by centrifuging at 5525 × g for 10 min at 4 °C. The supernatant was removed and cell pellets were resuspended in 7 mL fresh, ice-cold 4% formaldehyde (Sigma, CN 47608) in 1x PBS and incubated with shaking overnight at 4 °C (Digital Platform Rocker Shaker, VWR). Following overnight fixation, cells were centrifuged at 5525 × g for 10 min at 4 °C. The supernatant was removed and cells were resuspended in 7 mL PBS-RI (1x PBS supplemented with 0.1 U/μL NxGen RNase inhibitor (Lucigen, CN 30281)). Cells were centrifuged again at 5525 × g for 10 min at 4 °C and pellets were resuspended in 700 μL PBS-RI. Subsequent centrifugations were carried out at 7000 × g for 5 minutes at 4 °C. We noticed that some species, e.g., P. aeruginosa, do not pellet very well at this centrifugation speed (Fig. S1E–S1F). Thus, we recommend optimizing the centrifugation speed for the specific species of interest. Cells were centrifuged and resuspended in 700 μL 50% ethanol in PBS-RI. Cells were then washed twice with PBS-RI. After the second wash, cells were resuspended in 1 mL 100 mM Tris-HCl (pH 7.5) supplemented with 0.1 U/μL NxGen RNase inhibitor. Cells were diluted 100x and quantified using a hemocytometer (VWR, CN 102966).
KEY RESOURCES TABLE.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Bacterial strains | ||
MGH66 | Ma et al., eLife, 202144 | GCA_000694555.1 |
BIDMC35 | Ma et al., AAC, 201846 | GCA_000567225.2 |
UCI38 | Ma et al., eLife, 202144 | GCA_000566805.1 |
E. coli 10β | NEB | GCF_000008865.2 |
PAO1 | Stover et al., Nature 200039 | GCF_000006765.1 |
EnGen0052-E3346 | Lebreton et al., Cell, 201740 | GCA_000322285.1 |
gfp.low | Zaslaver, A. et al, Nature Method, 200645 | N/A |
gfp.mid | Zaslaver, A. et al, Nature Method, 200645 | N/A |
gfp.high | Zaslaver, A. et al, Nature Method, 200645 | N/A |
MGH66: PyhcN:gfp | This study | N/A |
MGH66: PcspD:gfp | This study | N/A |
MGH66: PIS903B:gfp | This study | N/A |
MGH66:pBAD: cspD | This study | N/A |
Recombinant DNA | ||
pUA139 | Zaslaver, A. et al, Nature Method, 2006 | N/A |
pUA139-PyhcN:gfp | This study | N/A |
pUA139-PcspD:gfp | This study | N/A |
pUA139-PIS903B:gfp | This study | N/A |
Oligos | ||
SMRT_dT, 2nd strand synthesis after TdT treatment with dATP | IDT | AAGCAGTGGTATC AACGCAGAGTTTT TTTTTTTTTTTVN |
SMRT_PCR, used for cDNA amplification | IDT | AAGCAGTGGTATC AACGCAGAGT |
P5, P5 partial primer, used for cDNA amplification (paired with SMRT_PCR) and for library construction (paired with Nextera index primers N7**) | IDT | AATGATACGGCGA CCACCGAGA |
cspD_F, Forward primer used to amplify the promoter of cspD from MGH66 and ligate into pUA139. | IDT | ACTGGGATCCATG CAGATGGCGGTTT AATCGCAT |
cspD_R, Reverse primer used to amplify the promoter of cspD from MGH66 and ligate into pUA139. | IDT | ACTGCTCGAGATG CCAT ACT TCG ACA TCC TTC GT |
YhcN_F, Forward primer used to amplify the promoter of yhcN from MGH66 and ligate into pUA139. | IDT | ACTGGGATCCATG CAAGCGTTGTGAT GAGAACAT |
YhcN_R, Reverse primer used to amplify the promoter of yhcN from MGH66 and ligate into pUA139. | IDT | ACTGCTCGAGATG CGAT GTT CAC CTC GTC GAA TC |
RNA FISH oligos | IDT | Table S2 |
Deposited data | ||
BacDrop data | This paper | GSE180237 |
Bulk RNA-seq data | This paper | GSE180237 |
MGH66 persister WGS data | This paper | GSE180237 |
Key reagent | ||
NxGen RNase Inhibitor | Lucigen | 30281-2 |
Lysozyme | ThermoFisher | 90082 |
NEBNext rRNA Depletion Kit | NEB | E7850X |
DNase I | Sigma | AMPD1 |
SUPERase-In RNase inhibitor | ThermoFisher | AM2696 |
Maxima H Minus reverse transcriptase | ThermoFisher | EP0753 |
DTT | Promega | P1171 |
dNTPs | NEB | N0447L |
Terminal transferase (TdT) | NEB | M0315L |
dATP | NEB | N0446S |
Chromium Next GEM Chip H | 10X Genomics | 1000161 |
Chromium Next GEM Single Cell ATAC Library & Gel Bead Kit | 10X Genomics | 1000176 |
KAPA HiFi HotStart ReadyMix | Roche | KK2602 |
5% FC-40 oil | RAN Biotechnologies | 008-FluoroSurfactant-5wtF |
Buffer EB | Qiagen | 19086 |
5X SYBR Green | VWR | 12001-796 |
Illumina Nextera XT DNA library preparation kit | Illumina | FC-131-1096 |
Nextera index 1 primer | Illumina | FC-131-2001 |
AMPure XP beads | Beckman Coulter | A63881 |
We found that the number of cells in a permeabilization reaction is critical for achieving sufficient permeabilization. Insufficient permeabilization may result in inefficient rRNA depletion and gDNA removal. For one permeabilization reaction, up to 40 million cells were centrifuged and resuspended in 250 μL 0.04% Tween-20 in 1x PBS. If more cells are desired, multiple parallel reactions can be set up. Immediately following a 3-minute incubation on ice, 1 mL cold PBS-RI was added, and cells were spun down and resuspended in 200 μL lysozyme mix (100 mM Tris (pH 8.0), 50 mM EDTA (pH 8.0), 0.25 U/μL NxGen RNase Inhibitor, 2.5 mg/mL lysozyme). Cells were then incubated at 37 °C for 15 minutes. After the incubation, 1 mL PBS-RI was added and cells were washed twice with 175 μL PBS-RI. After the second wash, cells were resuspended in 150 μL PBS (without RNase inhibitor added) and cell concentrations were measured by diluting cells 100-fold and counting using a hemocytometer.
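The cell-number bookkeeping for this step can be sketched as follows. This is a hedged illustration rather than part of the protocol: the 100-fold dilution and the 40-million-cell ceiling come from the text, whereas the chamber geometry (0.1 μL per counted square) is an assumption that must be replaced with the specification of the counting chamber actually used, and the function names are hypothetical.

```python
import math

MAX_CELLS_PER_REACTION = 40_000_000  # upper limit per permeabilization reaction (from the text)


def cells_per_ml(mean_count_per_square: float,
                 dilution_factor: float = 100.0,
                 square_volume_ul: float = 0.1) -> float:
    """Convert a mean hemocytometer count into cells/mL of the undiluted sample.

    square_volume_ul is an ASSUMED chamber volume per counted square.
    """
    return mean_count_per_square * dilution_factor * 1000.0 / square_volume_ul


def plan_reactions(total_cells: float) -> tuple[int, float]:
    """Return (number of parallel reactions, cells per reaction), keeping each at or below the limit."""
    n_reactions = max(1, math.ceil(total_cells / MAX_CELLS_PER_REACTION))
    return n_reactions, total_cells / n_reactions


concentration = cells_per_ml(mean_count_per_square=50)  # example count, hypothetical
n_reactions, per_reaction = plan_reactions(100_000_000)  # example input, hypothetical
print(f"{concentration:.2e} cells/mL; {n_reactions} reactions of {per_reaction:.2e} cells")
```

Splitting evenly across reactions, rather than filling each one to the limit, keeps the cell number per reaction uniform, which matters because the text identifies cell number as critical for permeabilization efficiency.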
In-cell rRNA depletion and gDNA removal
Immediately after cell permeabilization, up to 40 million cells were centrifuged and resuspended in 11 μL nuclease-free H2O. The cell number is critical for achieving efficient rRNA depletion (Fig. S1D). 2 μL NEBNext Bacterial rRNA depletion solution and 2 μL Probe Hybridization Buffer (NEB, CN E7850) were mixed with cells on ice. The hybridization was conducted per the following (lid temperature set to 55 °C): 50 °C for 2 minutes, ramp down to 22 °C at 0.1 °C/second, and hold at 22 °C for 5 minutes. Probe hybridization was immediately followed by RNase H digestion by mixing the probe-hybridized cells with 2 μL RNase H reaction buffer, 2 μL Thermostable RNase H (NEB, CN E7850), and 1 μL nuclease-free H2O, followed by a 30-minute incubation at 50 °C (lid temperature set to 55 °C). The 20 μL reaction was centrifuged and resuspended in 10 μL DNase-RI buffer (1 μL DNase I reaction buffer (Sigma, CN AMPD1), 1 μL DNase I (Sigma, CN AMPD1), 0.025 μL NxGen RNase inhibitor (Lucigen, CN 30281), 8 μL nuclease-free H2O). The reaction was incubated at room temperature for 30 minutes, and the DNase treatment was stopped by adding 1 μL Stop Solution (50 mM EDTA) and incubating at 50 °C for 10 minutes. After the incubation, cells were centrifuged and washed twice with 100 μL PBS-RI. After the second wash, cells were resuspended in 20 μL 0.5x PBS supplemented with 1 U/μL SUPERase-In RNase inhibitor and used immediately for in-cell reverse transcription.
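The volumes and timings in this step can be tallied with a few lines of arithmetic. The sketch below uses only values stated in the text; the variable and function names are chosen for this example.

```python
# Volumes in microliters, as listed for the probe hybridization step.
HYBRIDIZATION_UL = {
    "cells in nuclease-free water": 11.0,
    "NEBNext Bacterial rRNA depletion solution": 2.0,
    "Probe Hybridization Buffer": 2.0,
}

# Volumes in microliters added for the RNase H digestion.
RNASE_H_ADDITIONS_UL = {
    "RNase H reaction buffer": 2.0,
    "Thermostable RNase H": 2.0,
    "nuclease-free water": 1.0,
}


def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(start_c - end_c) / rate_c_per_s


hyb_volume = sum(HYBRIDIZATION_UL.values())
final_volume = hyb_volume + sum(RNASE_H_ADDITIONS_UL.values())

# 50 C for 2 min, ramp to 22 C at 0.1 C/s, hold at 22 C for 5 min.
ramp = ramp_seconds(50, 22, 0.1)
program_seconds = 2 * 60 + ramp + 5 * 60

print(f"hybridization volume: {hyb_volume} uL, after RNase H additions: {final_volume} uL")
print(f"ramp: {ramp:.0f} s, total hybridization program: {program_seconds / 60:.1f} min")
```

The tally reproduces the 20 μL reaction volume cited in the text and shows that the slow ramp accounts for a substantial part of the hybridization program.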
In-cell reverse transcription, round 1 cell barcoding and sample multiplexing
The round 1 plate barcoding and sample multiplexing were achieved via RT reactions in 384- or 96-well plates. 384 RT primers (Table S1) containing UMI sequences and round 1 plate barcodes (CB1) were synthesized at Integrated DNA Technologies at 100 μM concentration. The primers were diluted with ddH2O to a working concentration of 25 μM, and 2.5 μL of each primer was aliquoted into individual wells of the 384- or 96-well plates. The rRNA- and gDNA-depleted cells were diluted with nuclease-free H2O supplemented with 1 U/μL SUPERase-In RNase inhibitor and added to the plate containing RT primers (1 μL cells per well). Then the cell-primer mix was incubated at 55 °C for 5 minutes and immediately put on ice. For each well, the RT master mix, containing 0.25 μL DTT (100 mM), 0.25 μL dNTP (10 mM each), 0.25 μL SUPERase-In, 1 μL RT buffer, and 0.25 μL Maxima H Minus reverse transcriptase (Thermo Scientific, CN EP0753), was added and the RT reaction was incubated as follows (set lid temperature to 60 °C): 22 °C for 30 min, 50 °C for 10 min, 3 cycles of [8 °C for 12 s, 15 °C for 45 s, 20 °C for 45 s, 30 °C for 30 s, 42 °C for 2 min, 50 °C for 3 min], 50 °C for 5 min, hold at 4 °C.
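Because the RT master mix is specified per well, it has to be scaled to the plate format before dispensing. The following sketch shows one way to do this; the per-well volumes are taken from the text, while the 10% pipetting overage and the function name are assumptions made for illustration.

```python
# Per-well RT master mix volumes in microliters (from the text).
RT_MASTER_MIX_UL_PER_WELL = {
    "DTT (100 mM)": 0.25,
    "dNTP (10 mM each)": 0.25,
    "SUPERase-In": 0.25,
    "RT buffer": 1.0,
    "Maxima H Minus reverse transcriptase": 0.25,
}
PRIMER_UL_PER_WELL = 2.5  # 25 uM RT primer
CELLS_UL_PER_WELL = 1.0   # diluted, rRNA- and gDNA-depleted cells


def scale_master_mix(n_wells: int, overage: float = 0.10) -> dict[str, float]:
    """Scale the per-well recipe to n_wells; overage is an ASSUMED pipetting excess."""
    factor = n_wells * (1.0 + overage)
    return {name: vol * factor for name, vol in RT_MASTER_MIX_UL_PER_WELL.items()}


mix_per_well = sum(RT_MASTER_MIX_UL_PER_WELL.values())
well_total = PRIMER_UL_PER_WELL + CELLS_UL_PER_WELL + mix_per_well

print(f"master mix per well: {mix_per_well} uL, final RT volume per well: {well_total} uL")
for name, vol in scale_master_mix(384).items():
    print(f"{name}: {vol:.1f} uL for a 384-well plate")
```

With the stated volumes, each well receives 2 μL of master mix for a final RT volume of 5.5 μL.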
In-cell cDNA 3’ poly-A tailing
After RT, cells were recovered and pooled from the 384- or 96-well plate. For multiplexed samples, we suggest pooling only cells from the same sample together, generating individual pools for each sample. This allows more flexibility to adjust cell numbers of different samples during droplet generation. Then the pooled cells were centrifuged and each pool was resuspended in 40 μL nuclease-free H2O supplemented with 1 U/μL SUPERase-In. The poly-A tailing reaction was set up as follows: 38 μL cells, 5 μL 10x terminal transferase buffer, 1 μL dATP (100 mM) (NEB, N0446S), 1 μL SUPERase-In, 5 μL terminal transferase (TdT) (NEB, CN M0315L). The reaction was incubated at 37 °C for 1 hour. Then 10 μL 0.2 M EDTA was added to each 50 μL reaction and incubated at room temperature for 10 minutes. Cells were then centrifuged and resuspended in 10 μL nuclease-free H2O supplemented with 1 U/μL SUPERase-In. Cells were diluted and concentrations were quantified.
Droplet generation
The Chromium Next GEM Chip H (10x Genomics, PN 1000161) and Chromium Next GEM Single Cell ATAC Library & Gel Bead Kit (10x Genomics, PN 1000176) were used for droplet generation. The unused wells were filled with 70 μL (row 1), 50 μL (row 2), and 40 μL (row 3) 50% glycerol solution. The desired number of cells was diluted to 33.75 μL with nuclease-free H2O supplemented with 1 U/μL SUPERase-In. Right before loading the chip, 33.75 μL cells were mixed with a PCR master mix containing 37.5 μL KAPA HiFi HotStart ReadyMix (Roche, CN KK2602), 2.25 μL second strand synthesis primer SMRT_dT (10 μM) (Key Resources Table), and 1.5 μL Reducing Agent B (10x Genomics, PN 2000087). The chip was loaded with 70 μL PCR-cell mix (row 1), 50 μL Gel Beads (row 2), and 40 μL partitioning oil (row 3), and run on the Chromium system. Approximately 100 μL droplet emulsion was obtained in row 3.
Second strand cDNA synthesis and round 2 droplet barcoding in droplets
To increase thermostability of the droplets, we split each 100-μL emulsion into four 25-μL reactions in PCR tubes (USA Scientific, CN 1402-4700). In each reaction, 25 μL 5% FC-40 oil (RAN Biotechnologies, CN 008-FluoroSurfactant-5wtF) was added to the bottom and 50 μL mineral oil (Sigma, CN M5904) was added on the top. The 2nd strand cDNA synthesis and round 2 cell barcoding were performed as follows: 95 °C for 30 s, 39 °C for 5 min, 65 °C for 10 min; then 4 cycles of [98 °C for 20 s, 62 °C for 15 s, 72 °C for min], 72 °C for 5 min, hold at 4 °C.
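The value of combining a plate barcode (CB1) with a droplet barcode can be illustrated with a birthday-problem calculation: cells that share a droplet receive the same round 2 barcode and remain distinguishable only if their round 1 barcodes differ. The sketch below is a simplified model, not a description of the BacDrop analysis pipeline; it assumes cells are spread uniformly across round 1 barcodes, and the numbers of cells per droplet are hypothetical.

```python
def collision_probability(cells_in_droplet: int, n_round1_barcodes: int = 384) -> float:
    """Probability that at least two cells in one droplet carry the same round 1 barcode.

    Assumes each cell carries a round 1 barcode drawn uniformly at random.
    """
    p_all_distinct = 1.0
    for i in range(cells_in_droplet):
        p_all_distinct *= (n_round1_barcodes - i) / n_round1_barcodes
    return 1.0 - p_all_distinct


# Hypothetical droplet occupancies; 384 barcodes per plate is from the text,
# 96 represents a sample that occupies only a quarter of the plate when multiplexing.
for n_barcodes in (384, 96):
    for k in (2, 5, 10):
        p = collision_probability(k, n_barcodes)
        print(f"{n_barcodes} round 1 barcodes, {k} cells per droplet: "
              f"P(collision) = {p:.3%}")
```

Under this model, assigning fewer round 1 barcodes to a sample raises the chance that co-encapsulated cells are indistinguishable, which is one factor to weigh when deciding how many wells to allocate per sample.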
Breaking emulsions and cDNA purification
After the round 2 cell barcoding was finished, the mineral oil and FC-40 oil were removed from the PCR tube, taking care not to remove the middle layer, which contains the emulsion. The emulsions were then combined. In cases where only a small number of cells are desired for library construction, the emulsions can be kept separate, but all downstream reaction volumes should be reduced accordingly. Each 100 μL emulsion can then be broken by adding 125 μL Recovery Agent (10x Genomics, PN 2000087). The tubes were inverted 10 times and centrifuged briefly to ensure that all droplets were broken. 125 μL Recovery Agent/Partitioning Oil (pink) was then removed from the bottom of the tube. cDNA was first purified using Dynabeads. In brief, for each reaction, a mix containing 182 μL cleanup buffer (10x Genomics, PN 2000088), 8 μL Dynabeads MyOne SILANE (10x Genomics, PN 2000048), 5 μL Reducing Agent B, and 5 μL nuclease-free water was added. After mixing and incubating at room temperature for 10 minutes, samples were placed on a magnetic separator and washed twice with freshly prepared 80% ethanol. After removing ethanol from the second wash, each sample was eluted in 40.5 μL elution buffer that was prepared by mixing 98 μL Buffer EB (Qiagen, CN 19086), 1 μL 10% Tween 20 (Teknova, CN T0710), and 1 μL Reducing Agent B. Then 40 μL of the eluate was transferred to a fresh PCR tube and subjected to a 0.6x cleanup with AMPure XP beads (Beckman Coulter, CN A63881). The cDNA was then eluted in 30 μL nuclease-free water.
cDNA enrichment
Before the cDNA enrichment, a qPCR reaction containing 1 μL cDNA, 5 μL KAPA HiFi HotStart Ready Mix, 0.3 μL primer P5 (10 μM), 0.3 μL primer SMRT_PCR (10 μM) (Key Resources Table), 2 μL 5x SYBR green (VWR, CN 12001-796), and 1.4 μL nuclease-free water was set up to determine the number of cycles for the cDNA enrichment. The qPCR was run in a real-time thermocycler using the following program: 98 °C for 3 min, 30 cycles of [98 °C for 20 s, 67 °C for 20 s, 72 °C for 3 min], 72 °C for 5 min, hold at 4 °C. The cycle number at which the qPCR reaction reached the early exponential amplification phase was used as the cycle number for cDNA enrichment. For cDNA enrichment, 25 μL of cDNA was mixed with 125 μL KAPA HiFi HotStart Ready Mix, 7.5 μL primer P5 (10 μM), 7.5 μL primer SMRT_PCR (10 μM), and 85 μL nuclease-free water. The cDNA was enriched using the same program as the qPCR reaction, with the cycle number determined from the qPCR reaction. The enriched cDNA was purified using 0.6x AMPure XP beads and eluted in 50 μL nuclease-free water.
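As a quick arithmetic check on the two reaction mixes above, the following minimal sketch sums the listed volumes and computes the final primer concentration with C1·V1 = C2·V2. The dictionary keys and function names are illustrative only; the volumes are the ones stated in the text.

```python
# Volumes in microlitres, copied from the paragraph above.
QPCR_MIX = {
    "cDNA": 1.0,
    "KAPA HiFi HotStart Ready Mix": 5.0,
    "primer P5 (10 uM)": 0.3,
    "primer SMRT_PCR (10 uM)": 0.3,
    "5x SYBR green": 2.0,
    "nuclease-free water": 1.4,
}
ENRICHMENT_MIX = {
    "cDNA": 25.0,
    "KAPA HiFi HotStart Ready Mix": 125.0,
    "primer P5 (10 uM)": 7.5,
    "primer SMRT_PCR (10 uM)": 7.5,
    "nuclease-free water": 85.0,
}


def total_volume(mix):
    """Sum of all component volumes in a reaction mix (uL)."""
    return sum(mix.values())


def final_concentration(stock, volume, total):
    """C1 * V1 = C2 * V2 solved for C2; result is in the units of `stock`."""
    return stock * volume / total


if __name__ == "__main__":
    for name, mix in (("qPCR", QPCR_MIX), ("enrichment", ENRICHMENT_MIX)):
        total = total_volume(mix)
        p5 = final_concentration(10.0, mix["primer P5 (10 uM)"], total)
        print(f"{name}: total {total:.1f} uL, primer P5 at {p5:.2f} uM")
```

Under these numbers the qPCR mix totals 10 μL and the enrichment mix totals 250 μL, and each primer is at 0.3 μM in both, so the test qPCR and the preparative reaction use the same primer and polymerase-mix proportions.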
Illumina sequencing library construction
The Illumina Nextera XT DNA library preparation kit (Illumina, CN FC-131-1096) was used to prepare sequencing libraries with the following modifications: ~2 ng enriched cDNA and 4 μL ATM were used for each 50 μL tagmentation reaction. Following tagmentation, 2.5 μL primer P5 (5 μM) and 2.5 μL Index 1 primer (N7**, Illumina, CN FC-131-2001) were used for the PCR enrichment. The PCR reaction was removed from the thermocycler and immediately put on ice after 5 cycles of PCR amplification. Then a qPCR reaction containing 5 μL Nextera XT PCR reaction, 3 μL NPM, 1 μL primer P5 (5 μM), 1 μL Index 1 primer, 3 μL SYBR green, and 2 μL nuclease-free water was set up to determine the remaining number of cycles for the library enrichment using the following program: 72 °C for 3 min, 95 °C for 30 s, 25 × [95 °C for 10 s, 55 °C for 30 s, 72 °C for 30 s], 72 °C for 5 min, hold at 10 °C. The cycle number at which the qPCR reaction reached one third of the fluorescence saturation was used as the remaining cycle number for library enrichment. The remaining 45 μL of the library enrichment PCR reaction was then returned to the thermocycler and amplified using the same program, with the cycle number determined from the qPCR. The libraries were then subjected to a 0.6x cleanup with AMPure XP beads and eluted in 25 μL nuclease-free water.
Illumina sequencing of BacDrop libraries
Libraries were diluted to the desired concentrations and sequenced on the Illumina NovaSeq 6000 platform with standard sequencing primers, using the following specifications: Read 1: 60 bp; Read 2: 39 bp; Index 1: 8 bp; Index 2: 16 bp. Depending on the scale of the experiment and the sequencing depth desired, NovaSeq 6000 SP (Illumina #20027464), S1 (Illumina #20012865), or S2 (Illumina #20012862) reagent kits were used.
Bulk library construction
To construct sequencing libraries from bacterial cultures, cell pellets collected from 1 mL early exponential phase cultures were resuspended in 500 μL TRIzol Reagent (ThermoFisher, CN 15596026) and frozen at −80 °C for at least 20 min. Cells were then thawed, mixed with 250 μL of 0.1 mm diameter Zirconia/Silica beads (BioSpec Products), and lysed mechanically via bead-beating for 90 s at 10 m/s on a FastPrep (MP Bio). After addition of 0.1 mL chloroform, each sample tube was mixed thoroughly by inversion, incubated for 3 minutes at room temperature, and centrifuged at 12,000 × g for 15 minutes at 4 °C. The aqueous phase was mixed with an equal volume of 100% ethanol and transferred to a Direct-zol spin column (Zymo Research, CN R2051), and RNA was extracted according to the Direct-zol protocol. The sequencing libraries were then generated using the RNAtag-Seq protocol47.
To construct sequencing libraries from fixed cells, or fixed and permeabilized cells, 20 μL cells were pelleted and resuspended in 20 μL lysis buffer (50 mM Tris pH 8.0, 200 mM NaCl, 25 mM EDTA pH 8.0) supplemented with 1.6 μL proteinase K (50 mg/mL). Cells were lysed at 55 °C for 1 hour, then RNA was purified using 1.5x AMPure RNAClean XP beads (Beckman Coulter, CN A63987) and eluted in 20 μL nuclease-free water. The sequencing libraries were then generated using the RNAtag-Seq protocol.
Quality control for rRNA and gDNA depletion
The efficiency of rRNA and gDNA depletion can be assessed by lysing permeabilized cells and extracting their RNA. As shown in Fig. S1A, without rRNA and gDNA depletion, three peaks were clearly seen when the extracted RNA was run on Agilent TapeStation RNA 1000 high sensitivity tape. The three peaks are 16S rRNA (~1000 bp), 23S rRNA (~2000 bp), and gDNA (> 4000 bp). When rRNA and gDNA depletion is efficient, these three peaks are absent. Because of the capacity of the rRNA depletion kit (maximum of 10 μg total RNA), the rRNA depletion efficiency is a function of cell number (Fig. S1D). In the BacDrop protocol, we used 4 × 10⁷ cells for each rRNA depletion reaction, resulting in ~80% depletion efficiency.
Assessing cell recovery rates during cell permeabilization
In the experiment in Fig. S1E, cells were counted using a hemacytometer before and after each centrifugation step to calculate the cell recovery rates. Equal numbers of cells from each species were mixed before round 1 plate barcoding (RT). Because of their different cell sizes, the four species had different recovery rates at each centrifugation step at 7,000 × g. After RT and polyA tailing (right before round 2 droplet barcoding), the numbers of cells recovered from each species differed significantly.
Saturation curve for coverage analysis
Three libraries were used to generate the saturation curve to assess the coverage of BacDrop (Fig. S1G–S1I). The first library, containing 10,000 cells of E. coli, was sequenced at 80,000 reads per cell. Approximately 4,000 cells were recovered, with an average of 90 mRNA genes detected per cell. The second library, containing 12,000 cells of K. pneumoniae, was sequenced at 80,000 reads per cell. Approximately 6,000 cells were recovered, with an average of 88 mRNA genes detected per cell. The third library, containing ~1 million cells of K. pneumoniae, was sequenced at 5,000 reads per cell. The top 3,000 cells were analyzed, with an average of 127 mRNA genes detected per cell. For the first two libraries, sequencing reads were randomly subsampled to ~40,000, ~20,000, and ~4,000 reads per cell, and the analysis was repeated to calculate the number of mRNA genes detected per cell. For the third library, sequencing reads were randomly subsampled to ~4,000, ~3,000, and ~2,000 reads per cell, and the analysis was repeated in the same way.
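The subsampling procedure described above can be illustrated with a minimal sketch. It assumes each aligned read has already been reduced to a (cell barcode, gene) pair; the function names, the in-memory list of reads, and the fixed random seed are all assumptions made for illustration and are not taken from the BacDrop pipeline.

```python
import random
from collections import defaultdict


def mean_genes_per_cell(reads, n_cells):
    """Average number of distinct genes detected per cell.

    reads:   list of (cell_barcode, gene) tuples, one per assigned read.
    n_cells: number of cells in the library (cells with no reads count as 0).
    """
    genes_by_cell = defaultdict(set)
    for cell, gene in reads:
        genes_by_cell[cell].add(gene)
    return sum(len(genes) for genes in genes_by_cell.values()) / n_cells


def saturation_curve(reads, n_cells, depths, seed=0):
    """Randomly subsample to each target depth (reads per cell, on average)
    and recompute the mean number of genes detected per cell."""
    rng = random.Random(seed)
    curve = {}
    for depth in depths:
        n_keep = min(len(reads), int(depth * n_cells))
        subsample = rng.sample(reads, n_keep)  # sampling without replacement
        curve[depth] = mean_genes_per_cell(subsample, n_cells)
    return curve
```

Plotting `curve` (depth on the x-axis, genes per cell on the y-axis) gives a saturation curve of the kind referred to in Fig. S1G–S1I; a real analysis would stream reads from BAM or count files rather than hold them in a list.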
BacDrop experiments to assess the numbers of cells required to detect extremely rare populations
Two libraries (Replicate 1 and Replicate 2) were constructed with ~250k untreated MGH66 cells. We sequenced Replicate 1 at ~5,000 reads/cell and Replicate 2 at ~3,000 reads/cell, and recovered ~40k cells and ~10k cells from Replicate 1 and Replicate 2, respectively. In these two libraries, a subpopulation highly expressing maltose transport genes (i.e., lamB) was detected. However, this subpopulation was not detected in a third library containing only ~3,000 untreated MGH66 cells that were collected under the same conditions. The MGE subpopulation was detected in all libraries.
Estimation of mRNA copy numbers in E. coli GFP strains using RT-qPCR
The estimation of mRNA copy numbers was performed using a protocol modified from a previous study9. In brief, a gfp gBlock fragment was synthesized at Integrated DNA Technologies. A serial dilution was performed to create gfp dsDNA standards ranging from 1 to 10¹⁰ molecules/μL. One μL of each standard was used in a 10 μL qPCR reaction to generate the standard curve. 10 ng RNA from each GFP strain (~66,666 cells, assuming 0.15 pg RNA per bacterial cell) was converted into cDNA and subjected to qPCR together with the gfp dsDNA standards. GFP mRNA copy numbers were estimated by mapping to the standard curve.
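The standard-curve calculation described above can be sketched as follows. This is a generic illustration of absolute quantification by qPCR, not the authors' script: it assumes that Ct is linear in log10(copies) over the range of the standards, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in a reaction."""
    return 10 ** ((ct - intercept) / slope)


def cells_from_rna(rna_ng, rna_per_cell_pg=0.15):
    """Cell equivalents in an RNA input, assuming a fixed RNA mass per cell."""
    return rna_ng * 1000.0 / rna_per_cell_pg
```

With the assumption stated in the text, `cells_from_rna(10)` gives about 66,667 cell equivalents, matching the ~66,666 quoted above. Converting estimated copies per qPCR reaction into copies per cell also requires the fraction of the cDNA that went into the reaction, which is not specified here.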
Killing kinetics
At each time point, antibiotic-treated cells and untreated samples were diluted and plated on LB agar plates without antibiotics. CFUs from each condition were enumerated and normalized to the CFUs of the untreated sample at the same time point to calculate the killing rate at each time point.
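The normalization described above amounts to a ratio of CFU counts at matched time points. A minimal sketch, with hypothetical function names and with CFU values supplied as dictionaries keyed by time point:

```python
def survival_fraction(cfu_treated, cfu_untreated):
    """Treated CFU normalized to the untreated CFU at the same time point."""
    if cfu_untreated <= 0:
        raise ValueError("untreated CFU must be positive")
    return cfu_treated / cfu_untreated


def killing_curve(treated_by_time, untreated_by_time):
    """Map each time point to its normalized survival fraction.

    Both arguments are dicts of {time_point: CFU per mL}; only time points
    present in both are used, so every ratio compares matched samples.
    """
    shared = sorted(set(treated_by_time) & set(untreated_by_time))
    return {
        t: survival_fraction(treated_by_time[t], untreated_by_time[t])
        for t in shared
    }
```

Killing curves are usually plotted on a log scale, so in practice the returned fractions would be log10-transformed before plotting.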
Construction of GFP reporter strains
We amplified the promoter regions of cspD and yhcN from MGH66 and ligated them into pUA139 (ref. 44) using the BamHI and XhoI sites, resulting in pUA139-PcspD and pUA139-PyhcN. For the promoter of IS903B, the following sequence, flanked by BamHI and XhoI restriction sites, was synthesized at IDT and ligated into pUA139: GGATCCAGAAATTCTCTGTTCCATGGTAGATTAATAAGTCCCCAACATTTAAATATACAGGATAATCTAAATATTACTTCGTTCTTATCCTTAATAAATGGCAAAATTTCATTTAATTTATTTTTCAAATTATTCTGATGCATGAGTTACCCTATAATTTACACATAAAGAAGGCTTTGTTGAATAAATCGAACTTTTGCTGAGTTGCTCGAG.
The constructs were then transformed into E. coli DH5α, and the plasmids were extracted and verified using Sanger sequencing. Electrocompetent MGH66 cells were prepared as previously reported 43. The extracted plasmids were then transformed into MGH66 via electroporation, generating MGH66: PIS903B:gfp, MGH66: PcspD:gfp and MGH66: PyhcN:gfp.
Cell sorting of untreated MGH66: PIS903B:gfp and measurement of mutation frequencies
For cell sorting of MGH66: PIS903B:gfp, the top 10% of cells with high-level GFP expression and the bottom 10% of cells with low-level GFP expression were sorted into MHB medium without any antibiotics. This 10% sorting window was chosen as a compromise between GFP level and sorting time. Ideally, we would sort the 1% of the population with the highest GFP expression, but collecting enough cells to measure mutation frequencies would then take hours, which might alter transcript levels or cell physiology. We found that a 10% window allowed us to collect enough cells with sufficient GFP expression within a feasible time frame. The sorted cells were then used as the starting culture for measuring mutation frequencies. We used a modified fluctuation analysis to measure mutation frequencies as previously described 43. In brief, ~100 sorted cells were seeded into each well of 384-well plates, followed by incubation at 37 °C for 3 hours. After the incubation, cells from three randomly chosen wells were plated for cell counting. Meropenem was added to the remaining wells at 0.06 μg/mL (2x MIC), and the plates were incubated overnight at 37 °C in a humidified chamber. The next morning, wells in which resistant mutants had grown were counted to calculate the mutation frequencies. An unsorted culture was included as a control. This experiment was repeated 3 times with 2 biological replicates each time.
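The exact calculation used for the modified fluctuation analysis is described in the cited study and is not restated here. As an illustration only, the sketch below uses the classical p0 (null-class) estimator, which is one common way to turn a count of wells with growth into a per-cell frequency; treating it as the method used in this work is an assumption, and the function name and arguments are hypothetical.

```python
import math


def p0_mutation_frequency(wells_total, wells_with_growth, cells_per_well):
    """Classical p0 estimator (an assumed stand-in, not the authors' method).

    p0 is the fraction of wells with no resistant growth. Under a Poisson
    model the expected number of mutational events per well is m = -ln(p0),
    and dividing by the cells per well at the time of antibiotic addition
    gives a per-cell frequency.
    """
    if not 0 <= wells_with_growth <= wells_total or wells_total <= 0:
        raise ValueError("well counts are inconsistent")
    if cells_per_well <= 0:
        raise ValueError("cells_per_well must be positive")
    if wells_with_growth == wells_total:
        raise ValueError("p0 is zero; the estimator is undefined")
    if wells_with_growth == 0:
        return 0.0
    p0 = (wells_total - wells_with_growth) / wells_total
    return -math.log(p0) / cells_per_well
```

Here `cells_per_well` would come from the plate counts of the three wells sampled before antibiotic addition, and `wells_total` would exclude those three wells.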
Flow cytometry of the meropenem-treated samples
The cells for flow cytometry were prepared using the same scheme as the meropenem-treated sample for BacDrop. After treatment with meropenem at 2 μg/mL for 30 minutes, cells were diluted into PBS to a final concentration of 10⁶ cells/mL and immediately run through a flow cytometer. For flow cytometry including dead-cell staining, 10 μL propidium iodide (1 mg/mL) (Invitrogen, cat# P1304MP) was mixed with 1 mL PBS. Treated cells were then diluted in the staining buffer to a final concentration of 10⁶ cells/mL and incubated in the dark at room temperature for 15 minutes before being run through the flow cytometer. To sort GFP-low and GFP-high populations, cells were subjected to fluorescence-activated cell sorting (FACS). We sorted ~10⁶ cells from each population into 2 mL LB with or without meropenem (2 μg/mL). A 50 μL aliquot of the cells sorted into LB medium without antibiotics was immediately diluted and plated on LB agar plates without antibiotics. The colony-forming units (CFU) on the LB agar plates were enumerated to calculate the survival rates. The remaining cells were used to measure MICs using the standard microdilution protocol48 or were carried forward to the persister assay. For the persister assay, cells were cultured at 37 °C with shaking. At each time point, a 50 μL aliquot was taken, diluted, and plated on LB agar plates without antibiotics. Three replicates were performed in each experiment.
RNA FISH
RNA FISH probes were designed using the Design Probes tool of DECIPHER49. Three to five probes were designed for each target gene (Key Resources Table). Probes were synthesized at Integrated DNA Technologies and labeled with either Cy3 or Alexa 488 dye at the 5′ end. Probes for the same gene were pooled and diluted into 10 μM stocks. Meropenem-treated MGH66 cells were fixed and permeabilized using the same protocol as the cell fixation and permeabilization steps in BacDrop. Hybridization was carried out in 40% hybridization buffer at 50 °C overnight, and the subsequent washing steps were performed as described previously50. Cells were imaged using the DeltaVision widefield deconvolution imaging system with a 60x objective. ImageJ was used for image data analysis and cell quantification.
Whole genome sequencing of the persister cells of MGH66
Genomic DNA was isolated using DNeasy Blood and Tissue Kits (Qiagen, cat. # 69504) and quantified using the Qubit dsDNA HS Assay Kit (Invitrogen, cat. # Q32851). WGS libraries were made using the Nextera XT DNA library preparation kit (Illumina, cat. # FC-131-1096). The samples were then sequenced on the MiSeq or NextSeq system with 300 cycles, paired-end. For each strain, sequencing depth was set at approximately 100× coverage. BWA mem version 0.7.12 51 and Pilon v1.23, using default settings 52, were used to align reads against a reference genome assembly and to identify variants, respectively. SNP positions with a mapping quality of less than 10 (MQ < 10) were not considered.
Processing of sequencing data
To build input matrices with gene-barcode information, we created a pipeline that extracts the UMI and barcodes 1 and 2 from each read. In experiments where multiple samples were pooled for droplet barcoding, a demultiplexing step based on CB1 was performed to separate the samples. Thresholding of valid cell barcodes, removal of invalid cell barcodes, and generation of the final count table were performed using UMI-tools53 (10.1101/gr.209601.116). Alignments were performed using BWA51, and annotation of BAM files was done using featureCounts54.
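The extraction and CB1-based demultiplexing steps can be pictured with the minimal sketch below. The read layout (which bases hold CB1, CB2, and the UMI), the barcode lengths, and the exact-match lookup are all hypothetical placeholders chosen for illustration; this section does not specify them, and the actual pipeline may differ, for example by tolerating barcode mismatches.

```python
from collections import defaultdict

# Hypothetical layout of the barcode-bearing read. These offsets are
# placeholders and are NOT taken from the BacDrop primer design.
CB2_SLICE = slice(0, 8)
UMI_SLICE = slice(8, 16)
CB1_SLICE = slice(16, 24)


def parse_barcode_read(seq):
    """Return (cb1, cb2, umi) from one read, or None if the read is too short."""
    if len(seq) < CB1_SLICE.stop:
        return None
    return seq[CB1_SLICE], seq[CB2_SLICE], seq[UMI_SLICE]


def demultiplex(barcode_reads, cb1_to_sample):
    """Group reads by sample using the plate barcode (CB1).

    cb1_to_sample maps each CB1 sequence to a sample name. Reads whose CB1
    is not in the map are dropped (exact matching only in this sketch).
    Returns {sample: [(cell_barcode, umi), ...]} where the cell barcode is
    the CB1 + CB2 combination.
    """
    by_sample = defaultdict(list)
    for seq in barcode_reads:
        parsed = parse_barcode_read(seq)
        if parsed is None:
            continue
        cb1, cb2, umi = parsed
        sample = cb1_to_sample.get(cb1)
        if sample is None:
            continue
        by_sample[sample].append((cb1 + cb2, umi))
    return dict(by_sample)
```

Using the concatenated CB1 + CB2 as the cell identifier reflects the combinatorial barcoding idea: two cells that share a droplet barcode are still distinguishable if their plate barcodes differ.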
Analysis of BacDrop data
Once the count tables were made, we used the standard workflow of the R package Seurat 3 (ref. 55; https://doi.org/10.1038/nbt.4096, https://doi.org/10.1016/j.cell.2019.05.031) (v.3.2.2). We excluded genes that were not expressed in any cell in the dataset, cells that had fewer than 10 or 15 genes detected, and cells that had abnormally high numbers of mRNA detected. For experiments done using MGH66, we identified three genes that showed consistently high-level expression in the majority of cells and in various conditions: WP-004174069.1, WP-004174069.1-2, and WP-002920103.1. We used these three genes as internal controls and removed cells in which the normalized expression of any of these three genes was less than 50. The standard Seurat workflow prior to clustering was used, including global normalization, feature selection, and scaling of gene expression. We used the top 2000 highly variable genes as input features for clustering analysis and downstream annotation. The Seurat functions FindNeighbors and FindClusters were used for clustering at a resolution of 0.5. Uniform manifold approximation and projection (UMAP) was used to visualize the clustering. For marker identification and annotation of clusters, Seurat’s FindMarkers function was used with the requirement that markers be expressed in at least 25% of the cells present in the dataset. For detection of some rare populations, the 25% criterion was removed. From the FindMarkers results, we considered genes with a log2 fold change greater than 2 and an adjusted p-value less than 0.05 to be significantly differentially expressed. If a cluster did not contain any genes that passed this threshold, we did not consider it a significant cluster.
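The analysis itself was run in Seurat (R). Purely to make the two filtering rules above concrete, the sketch below re-expresses them in plain Python: the cell filter on the number of genes detected, and the marker filter on log2 fold change and adjusted p-value. The data structures, function names, and the optional `max_umis` cutoff for "abnormally high" cells are illustrative assumptions, not part of the authors' code.

```python
def filter_cells(counts, min_genes=10, max_umis=None):
    """Keep cells with at least `min_genes` genes detected.

    counts:   {cell: {gene: UMI count}}
    max_umis: optional upper bound on total UMIs per cell; the text does not
              give a numeric cutoff, so this parameter is a placeholder.
    """
    kept = {}
    for cell, genes in counts.items():
        n_genes = sum(1 for c in genes.values() if c > 0)
        n_umis = sum(genes.values())
        if n_genes < min_genes:
            continue
        if max_umis is not None and n_umis > max_umis:
            continue
        kept[cell] = genes
    return kept


def significant_markers(markers, min_log2fc=2.0, max_padj=0.05):
    """Markers with log2 fold change > 2 and adjusted p-value < 0.05.

    markers: list of dicts with keys "gene", "log2fc", and "padj".
    """
    return [
        m for m in markers
        if m["log2fc"] > min_log2fc and m["padj"] < max_padj
    ]


def significant_clusters(markers_by_cluster):
    """Clusters that retain at least one marker after thresholding."""
    return [
        cluster
        for cluster, markers in markers_by_cluster.items()
        if significant_markers(markers)
    ]
```

A cluster with no marker passing both thresholds is dropped by `significant_clusters`, mirroring the rule stated at the end of the paragraph above.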
QUANTIFICATION AND STATISTICAL ANALYSIS
Statistical analyses for BacDrop experiments and bulk RNA-seq experiments were performed in R (v.3.2.2), and detailed analyses are described in the relevant results and method sections. For all other experiments, including RT-qPCR, RNA-FISH, FACS, the quantification of mutation frequencies, and persister assays, statistical analyses were performed in Prism 9 (version 9.3.1). Statistical details, including numbers of samples, types of statistical analyses, and precision measures, can be found in the figure legends and results sections. No methods were used to determine whether the data met the assumptions of the statistical approach.
Supplementary Material
(A) rRNA and gDNA depletion are implemented in BacDrop. The three peaks are 16S rRNA (~1000 bp), 23S rRNA (~2000 bp), and gDNA (> 4000 bp).
(B) RNA species and their percentage in a BacDrop library of K. pneumoniae.
(C-F) Validation of BacDrop in K. pneumoniae, P. aeruginosa, E. coli, and E. faecium. (C) Unsupervised cell clustering separates these four species into distinct clusters. (D) rRNA depletion efficiency in these four species. (E) Cell losses in the mixed-species experiment are due to different cell recovery rates from centrifugation steps. Experiments in (D and E) were repeated three times. Error bars are plotted as standard deviation.
(G-I) Assessment of the transcriptome coverage of BacDrop libraries, see Method details.
(A) UMAP plotted based on the original identity of these 8 samples. Cells were separated well based on their treatment.
(B) Unsupervised UMAP showed three clusters with significantly (p < 0.05) higher expression of genes in the SOS-response pathway, heat-shock response, and genes encoding an IS903B transposase (MGE).
(C) No strong batch effect was observed between the two biological replicates with the same treatment conditions.
(A) UMAP plotted based on the original identity of these two samples (replicate 1 and replicate 2).
(B) Cells sorted from MGE.high and MGE.low had similar survival rates (p = 0.95).
This experiment was run in triplicate and error bars are plotted as standard deviation. Student’s t-test was used for statistical analysis (p = 0.95).
(A) Killing kinetics of MGH66 treated with meropenem (green), ciprofloxacin (blue) and gentamicin (purple). The cells were collected at 30 minutes under the treatment (red arrow) for BacDrop experiments. The experiment was repeated three times and the error bars are plotted as standard deviation.
(B-D) Analysis of the antibiotic-treated samples based on the replicates. UMAP plotted based on the original identity of two replicates (replicate 1 and replicate 2). (B) Meropenem, (C) Ciprofloxacin, (D) Gentamicin. For the meropenem-treated samples, fractions of cells in identified clusters were calculated separately from two replicates and indicated in the parentheses.
(A) Gating for cells to exclude cell debris.
(B-C) Dead cells were detected using mCherry red fluorescence, and GFP fluorescence of all cells was detected using green fluorescence. Approximately 15.2% of cells in the gated population are dead. The gating of the GFP-low and GFP-high populations is shown.
(D) 10⁶ live cells from the GFP-high and GFP-low subpopulations were sorted into 1 mL LB medium, and 100 μL was immediately plated on LB agar plates without any antibiotics.
This experiment was run in triplicate and error bars are plotted as standard deviation. Student’s t-test was used for statistical analysis (p = 0.43).
Table S1. 384 RT primers for round 1 plate barcoding, Related to Figure 1 and STAR methods.
Table S2. Bacterial strains and oligos used in this study, Related to Figure 2–6, STAR methods, and Key Resources Table.
Table S3. Counts table of the E. coli library with 10k cells when analyzed in bulk, Related to Figure 2.
Table S4. Counts table of the small (12k) K. pneumoniae library when analyzed in bulk, Related to Figure 2.
Table S5. Counts table of the large (1 million) K. pneumoniae library when analyzed in bulk, Related to Figure 2.
Table S6. Prophage predicted in BIDMC35, Related to Figure 4.
Table S7. Cell numbers of each identified cluster in replicate 1 and replicate 2 in the meropenem-treated samples, Related to Figure 5.
Highlights.
We developed BacDrop, a droplet-based technology for bacterial single-cell RNA-seq.
BacDrop revealed a rare bacterial cellular state driven by expression of mobile genetic elements.
BacDrop revealed distinct cellular states resulting from antibiotic perturbation.
Cell states associated with emergence of antibiotic resistance or persistence were identified.
Acknowledgments
We thank Jonathan Livny, Noam Shoresh, and Eugenio Mattei for discussions on the data analysis, Zohar Bloom-Ackermann for the suggestions on the flow cytometry, Anne Clatworthy and Thulasi Warrier for the comments on the manuscript, Kevin Grosselin and Ming Pan for their discussion on the droplet generation steps. We thank Aviv Regev for her expert advice and comments on the manuscript. We thank the Microbial Omics Core at the Broad Institute for generating bulk RNA-seq libraries, and the Flow Cytometry Core Facility at the Broad Institute for cell sorting. This publication was supported in part by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award 5R01AI117043-05 to DTH, and by a generous gift from Anita and Josh Bekenstein.
Footnotes
Declaration of interests
D.T.H., P.M., and H.M.A. have filed a U.S. Patent Application (Application No. 17/819,034) based on this work.
Inclusion and diversity
We support inclusive, diverse, and equitable conduct of research.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- 1. Macosko EZ et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214, doi: 10.1016/j.cell.2015.05.002 (2015).
- 2. Shalek AK et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240, doi: 10.1038/nature12172 (2013).
- 3. Shalek AK et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510, 363–369, doi: 10.1038/nature13437 (2014).
- 4. Shapiro E, Biezuner T & Linnarsson S Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet 14, 618–630, doi: 10.1038/nrg3542 (2013).
- 5. Zeisel A et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142, doi: 10.1126/science.aaa1934 (2015).
- 6. Klein AM et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201, doi: 10.1016/j.cell.2015.04.044 (2015).
- 7. Tang F et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6, 377–382, doi: 10.1038/nmeth.1315 (2009).
- 8. Grun D et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255, doi: 10.1038/nature14966 (2015).
- 9. Taniguchi Y et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538, doi: 10.1126/science.1188308 (2010).
- 10. Blattman SB, Jiang W, Oikonomou P & Tavazoie S Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat Microbiol 5, 1192–1201, doi: 10.1038/s41564-020-0729-6 (2020).
- 11. Imdahl F, Vafadarnejad E, Homberger C, Saliba AE & Vogel J Single-cell RNA-sequencing reports growth-condition-specific global transcriptomes of individual bacteria. Nat Microbiol 5, 1202–1206, doi: 10.1038/s41564-020-0774-1 (2020).
- 12. Kuchina A et al. Microbial single-cell RNA sequencing by split-pool barcoding. Science 371, doi: 10.1126/science.aba5257 (2021).
- 13. Homberger CH, Regan; Barquist Lars; Vogel Jörg. Improved bacterial single-cell RNA-seq through automated MATQ-seq and Cas9-based removal of rRNA reads. bioRxiv preprint, doi: 10.1101/2022.11.28.518171 (2022).
- 14. McNulty RS, Duluxan; Liu Shichen; Hormoz Sahand; Rosenthal, Adam Z. Droplet-based single cell RNA sequencing of bacteria identifies known and previously unseen cellular states. bioRxiv preprint, doi: 10.1101/2021.03.10.434868 (2021).
- 15. Dar D, Dar N, Cai L & Newman DK Spatial transcriptomics of planktonic and sessile bacterial populations at single-cell resolution. Science 373, doi: 10.1126/science.abi4882 (2021).
- 16. Tanay A & Regev A Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338, doi: 10.1038/nature21350 (2017).
- 17. Heimberg G, Bhatnagar R, El-Samad H & Thomson M Low Dimensionality in Gene Expression Data Enables the Accurate Extraction of Transcriptional Programs from Shallow Sequencing. Cell Syst 2, 239–250, doi: 10.1016/j.cels.2016.04.001 (2016).
- 18. Shekhar K et al. Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics. Cell 166, 1308–1323 e1330, doi: 10.1016/j.cell.2016.07.054 (2016).
- 19. Jaitin DA et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779, doi: 10.1126/science.1247651 (2014).
- 20. Olah M et al. Single cell RNA sequencing of human microglia uncovers a subset associated with Alzheimer’s disease. Nat Commun 11, 6129, doi: 10.1038/s41467-020-19737-2 (2020).
- 21. Mathys H et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337, doi: 10.1038/s41586-019-1195-2 (2019).
- 22. Del-Aguila JL et al. A single-nuclei RNA sequencing study of Mendelian and sporadic AD in the human brain. Alzheimers Res Ther 11, 71, doi: 10.1186/s13195-019-0524-x (2019).
- 23. Habib N et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat Methods 14, 955–958, doi: 10.1038/nmeth.4407 (2017).
- 24. Zheng GX et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun 8, 14049, doi: 10.1038/ncomms14049 (2017).
- 25. Pollen AA et al. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex. Nat Biotechnol 32, 1053–1058, doi: 10.1038/nbt.2967 (2014).
- 26. Dixit A et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 167, 1853–1866 e1817, doi: 10.1016/j.cell.2016.11.038 (2016).
- 27. Wang X, He Y, Zhang Q, Ren X & Zhang Z Direct Comparative Analyses of 10X Genomics Chromium and Smart-seq2. Genomics Proteomics Bioinformatics 19, 253–266, doi: 10.1016/j.gpb.2020.02.005 (2021).
- 28. Zhang X, Xu C & Yosef N Simulating multiple faceted variability in single cell RNA sequencing. Nat Commun 10, 2611, doi: 10.1038/s41467-019-10500-w (2019).
- 29. Wu AR et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat Methods 11, 41–46, doi: 10.1038/nmeth.2694 (2014).
- 30. Trapnell C et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 32, 381–386, doi: 10.1038/nbt.2859 (2014).
- 31. Ackermann M A functional perspective on phenotypic heterogeneity in microorganisms. Nat Rev Microbiol 13, 497–508, doi: 10.1038/nrmicro3491 (2015).
- 32. Dewachter L, Fauvart M & Michiels J Bacterial Heterogeneity and Antibiotic Survival: Understanding and Combatting Persistence and Heteroresistance. Mol Cell 76, 255–267, doi: 10.1016/j.molcel.2019.09.028 (2019).
- 33. Andersson DI, Nicoloff H & Hjort K Mechanisms and clinical relevance of bacterial heteroresistance. Nat Rev Microbiol 17, 479–496, doi: 10.1038/s41579-019-0218-1 (2019).
- 34. Regev A et al. The Human Cell Atlas. Elife 6, doi: 10.7554/eLife.27041 (2017).
- 35. Centers for Disease Control and Prevention: Antibiotic Resistance Threats in the United States. doi:https://www.cdc.gov/drugresistance/pdf/threatsreport/2019-ar-threats-report-508.pdf (2019).
- 36. Datlinger P et al. Ultra-high-throughput single-cell RNA sequencing and perturbation screening with combinatorial fluidic indexing. Nat Methods 18, 635–642, doi: 10.1038/s41592-021-01153-z (2021).
- 37. Wulf MG et al. Non-templated addition and template switching by Moloney murine leukemia virus (MMLV)-based reverse transcriptases co-occur and compete with each other. J Biol Chem 294, 18220–18231, doi: 10.1074/jbc.RA119.010676 (2019).
- 38. Vasilyev N, Gao A & Serganov A Noncanonical features and modifications on the 5’-end of bacterial sRNAs and mRNAs. Wiley Interdiscip Rev RNA 10, e1509, doi: 10.1002/wrna.1509 (2019).
- 39. Stover CK et al. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406, 959–964, doi: 10.1038/35023079 (2000).
- 40. Lebreton F et al. Tracing the Enterococci from Paleozoic Origins to the Hospital. Cell 169, 849–861 e813, doi: 10.1016/j.cell.2017.04.027 (2017).
- 41. Choi PJ, Cai L, Frieda K & Xie XS A stochastic single-molecule event triggers phenotype switching of a bacterial cell. Science 322, 442–446, doi: 10.1126/science.1161427 (2008).
- 42. Bhattacharyya RP et al. Simultaneous detection of genotype and phenotype enables rapid and accurate antibiotic susceptibility determination. Nat Med 25, 1858–1864, doi: 10.1038/s41591-019-0650-9 (2019).
- 43. Ma P et al. Genetic determinants facilitating the evolution of resistance to carbapenem antibiotics. Elife 10, doi: 10.7554/eLife.67310 (2021).
- 44. Zaslaver A et al. A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nat Methods 3, 623–628, doi: 10.1038/nmeth895 (2006).
- 45. Ma P, Laibinis HH, Ernst CM & Hung DT Carbapenem Resistance Caused by High-Level Expression of OXA-663 beta-Lactamase in an OmpK36-Deficient Klebsiella pneumoniae Clinical Isolate. Antimicrob Agents Chemother 62, doi: 10.1128/AAC.01281-18 (2018).
- 46. Kim Y & Wood TK Toxins Hha and CspD and small RNA regulator Hfq are involved in persister cell formation through MqsR in Escherichia coli. Biochem Biophys Res Commun 391, 209–213, doi: 10.1016/j.bbrc.2009.11.033 (2010).
- 47. Shishkin AA et al. Simultaneous generation of many RNA-seq libraries in a single reaction. Nat Methods 12, 323–325, doi: 10.1038/nmeth.3313 (2015).
- 48. Wiegand I, Hilpert K & Hancock RE Agar and broth dilution methods to determine the minimal inhibitory concentration (MIC) of antimicrobial substances. Nat Protoc 3, 163–175, doi: 10.1038/nprot.2007.521 (2008).
- 49. Wright ES, Yilmaz LS, Corcoran AM, Okten HE & Noguera DR Automated design of probes for rRNA-targeted fluorescence in situ hybridization reveals the advantages of using dual probes for accurate identification. Appl Environ Microbiol 80, 5124–5133, doi: 10.1128/AEM.01685-14 (2014).
- 50. Skinner SO, Sepulveda LA, Xu H & Golding I Measuring mRNA copy number in individual Escherichia coli cells using single-molecule fluorescent in situ hybridization. Nat Protoc 8, 1100–1113, doi: 10.1038/nprot.2013.066 (2013).
- 51. Li H & Durbin R Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, doi: 10.1093/bioinformatics/btp324 (2009).
- 52. Walker BJ et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963, doi: 10.1371/journal.pone.0112963 (2014).
- 53.Smith T, Heger A & Sudbery I UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res 27, 491–499, doi: 10.1101/gr.209601.116 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Liao Y, Smyth GK & Shi W featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930, doi: 10.1093/bioinformatics/btt656 (2014). [DOI] [PubMed] [Google Scholar]
- 55.Stuart T et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902 e1821, doi: 10.1016/j.cell.2019.05.031 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
Supplementary Materials
(A) rRNA and gDNA depletion are implemented in BacDrop. The three peaks are 16S rRNA (~1000 bp), 23S rRNA (~2000 bp), and gDNA (> 4000 bp).
(B) RNA species and their percentage in a BacDrop library of K. pneumoniae.
(C-F) Validation of BacDrop in K. pneumoniae, P. aeruginosa, E. coli, and E. faecium. (C) Unsupervised cell clustering separates these four species into distinct clusters. (D) rRNA depletion efficiency in these four species. (E) Cell losses in the mixed-species experiment are due to different cell recovery rates from centrifugation steps. Experiments in (D and E) were repeated three times. Error bars are plotted as standard deviation.
(G-I) Assessment of the transcriptome coverage of BacDrop libraries, see Method details.
(A) UMAP plotted based on the original identity of these 8 samples. Cells were separated well based on their treatment.
(B) Unsupervised UMAP showed three clusters with significantly (p < 0.05) higher expression of genes in the SOS-response pathway, heat-shock response, and genes encoding an IS903B transposase (MGE).
(C) No strong batch effect was observed between the two biological replicates with the same treatment conditions.
(A) UMAP plotted based on the original identity of these two samples (replicate 1 and replicate 2).
(B) Cells sorted from MGE.high and MGE.low had similar survival rates (p = 0.95).
This experiment was run in triplicate, and error bars are plotted as standard deviation. Student’s t-test was used for statistical analysis (p = 0.95).
(A) Killing kinetics of MGH66 treated with meropenem (green), ciprofloxacin (blue), and gentamicin (purple). Cells were collected after 30 minutes of treatment (red arrow) for BacDrop experiments. The experiment was repeated three times, and the error bars are plotted as standard deviation.
(B-D) Analysis of the antibiotic-treated samples based on the replicates. UMAP plotted based on the original identity of two replicates (replicate 1 and replicate 2). (B) Meropenem, (C) Ciprofloxacin, (D) Gentamicin. For the meropenem-treated samples, fractions of cells in identified clusters were calculated separately from two replicates and indicated in the parentheses.
(A) Gating for cells to exclude cell debris.
(B-C) Dead cells were detected using mCherry red fluorescence, and GFP fluorescence of all cells was detected using green fluorescence. Approximately 15.2% of cells in the gated population are dead. The gating of the GFP-low and GFP-high populations is shown.
(D) 106 live cells from the GFP-high and GFP-low subpopulations were sorted into 1 mL LB medium, and 100 μL was immediately plated on LB agar plates without any antibiotics.
This experiment was run in triplicate, and error bars are plotted as standard deviation. Student’s t-test was used for statistical analysis (p = 0.43).
Table S1. 384 RT primers for round 1 plate barcoding, Related to Figure 1 and STAR methods.
Table S2. Bacterial strains and oligos used in this study, Related to Figures 2–6, STAR methods, and Key Resource Table.
Table S3. Counts table of the E. coli library with 10k cells when analyzed in bulk, Related to Figure 2.
Table S4. Counts table of the small (12k) K. pneumoniae library when analyzed in bulk, Related to Figure 2.
Table S5. Counts table of the large (1 million) K. pneumoniae library when analyzed in bulk, Related to Figure 2.
Table S6. Prophage predicted in BIDMC35, Related to Figure 4.
Table S7. Cell numbers of each identified cluster in replicate 1 and replicate 2 in the meropenem-treated samples, Related to Figure 5.
Data Availability Statement
Sequencing data and the processed count matrix have been deposited in the GEO repository (GSE180237).