Author manuscript; available in PMC: 2022 Jan 1.
Published in final edited form as: Nat Biotechnol. 2021 Apr 12;39(7):819–824. doi: 10.1038/s41587-021-00865-z

Single-cell CUT&Tag analysis of chromatin modifications in differentiation and tumor progression

Steven J Wu 1,2,#, Scott N Furlan 3,4,5,#, Anca B Mihalas 6,7, Hatice S Kaya-Okur 1,8,9, Abdullah H Feroze 7, Samuel N Emerson 7, Ye Zheng 10, Kalee Carson 3, Patrick J Cimino 6,11, C Dirk Keene 11, Jay F Sarthy 3,4, Raphael Gottardo 10, Kami Ahmad 1, Steven Henikoff 1,8,✉,#, Anoop P Patel 5,6,7,✉,#
PMCID: PMC8277750  NIHMSID: NIHMS1718994  PMID: 33846646

Abstract

Methods for quantifying gene expression1 and chromatin accessibility2 in single cells are well established, but single-cell analysis of chromatin regions with specific histone modifications has been technically challenging. In this study, we adapted the CUT&Tag method3 to scalable nanowell and droplet-based single-cell platforms to profile chromatin landscapes in single cells (scCUT&Tag) from complex tissues and during the differentiation of human embryonic stem cells. We focused on profiling polycomb group (PcG) silenced regions marked by histone H3 Lys27 trimethylation (H3K27me3) in single cells as an orthogonal approach to chromatin accessibility for identifying cell states. We show that scCUT&Tag profiling of H3K27me3 distinguishes cell types in human blood and allows the generation of cell-type-specific PcG landscapes from heterogeneous tissues. Furthermore, we used scCUT&Tag to profile H3K27me3 in a patient with a brain tumor before and after treatment, identifying cell types in the tumor microenvironment and heterogeneity in PcG activity in the primary sample and after treatment.


Substantial portions of the genome are actively repressed to create barriers between cell type lineages during development4. In particular, trimethylation on lysine 27 of histone H3 (H3K27me3) in nucleosomes by PcG proteins is crucial for gene silencing during normal differentiation and, thus, for maintaining cell identity5. Conversely, derangements in PcG silencing permit aberrant gene expression and disease6. Therefore, methods for assaying silenced chromatin can provide insights into a variety of processes, ranging from normal development to tumorigenesis.

Scalable methods for assessing silenced chromatin at the single-cell level have not been widely available. We set out to use chromatin profiling of single cells to assess gene silencing and to develop a framework for analysis. Our approach builds on Cleavage Under Targets and Tagmentation (CUT&Tag), which uses specific antibodies to tether a Tn5 transposome at the sites of chromatin proteins in isolated cells or nuclei. Activation of the transposome then tagments genomic loci with adapter sequences that are used for library construction and deep sequencing, thereby identifying binding sites for any protein where a specific antibody is available3. Our earlier work demonstrated that CUT&Tag profiling of the H3K4me2 histone modification efficiently detected gene activity, much like assay for transposase-accessible chromatin sequencing (ATAC-seq), whereas H3K27me3 profiling detected silenced chromatin that might be epigenetically inherited3.

To determine whether single-cell chromatin landscapes were sufficient to distinguish different cell types, we performed CUT&Tag on H1 human embryonic stem cells (hESCs) using an anti-H3K27me3-specific antibody in bulk and then distributed single cells for polymerase chain reaction (PCR) and library enrichment on the nanowell-based ICELL8 system (Fig. 1a). We compared this to previously published H3K27me3 scCUT&Tag profiles of K562 cells and hESCs3 to determine whether standard approaches to single-cell clustering could distinguish cell types based on H3K27me3 signal. As PcG domains typically span more than 10 kilobases (kb), we grouped read counts in 5-kb bins across the genome and used these for latent semantic indexing (LSI)-based dimensionality reduction and uniform manifold approximation and projection (UMAP) embedding, followed by standard Louvain clustering using the ArchR package7 (Methods). After quality control filtering (Methods and Supplementary Fig. 1a–g), UMAP embedding clearly separated 100% of 804 hESCs with a median of 375 unique fragments from 908 K562 cells with a median of 6,064 unique fragments, independent of batch effects (Fig. 1b). Interestingly, hESCs had only about 6% as many unique fragments as K562 cells (Supplementary Fig. 1f), consistent with stem cells having lower global H3K27me3 levels than more differentiated cell types8. Even after downsampling the number of unique fragments per cell to the same median value for both datasets, H3K27me3 signal still readily distinguished the two cell types (Supplementary Fig. 1g), confirming that clustering was driven by differences in H3K27me3 signal and not by the number of unique fragments.
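The downsampling control described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a cells × bins matrix of unique-fragment counts and uses NumPy's multivariate hypergeometric sampler to draw fragments without replacement, shrinking every cell in the deeper dataset by one shared fraction so that its median per-cell total matches that of the shallower dataset. The function name `match_median_depth` is hypothetical.

```python
import numpy as np

def match_median_depth(deep_counts, target_median, rng=None):
    """Downsample each cell (row) of `deep_counts` by a common fraction so that
    the median per-cell total approaches `target_median`.

    Hypothetical helper for illustration. Fragments are drawn without
    replacement, so no bin in any cell ever gains counts.
    """
    rng = np.random.default_rng(rng)
    deep_counts = np.asarray(deep_counts, dtype=np.int64)
    totals = deep_counts.sum(axis=1)
    # A single shared fraction preserves relative depth differences between cells.
    frac = min(1.0, target_median / np.median(totals))
    out = np.zeros_like(deep_counts)
    for i, row in enumerate(deep_counts):
        n_keep = int(round(row.sum() * frac))
        # Sample n_keep fragments from this cell's bins without replacement.
        out[i] = rng.multivariate_hypergeometric(row, n_keep)
    return out
```

Using one fraction for all cells, rather than forcing every cell to the same total, keeps the per-cell depth distribution intact while equalizing the medians, which matches the comparison described in the text.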

Fig. 1 |. scCUT&Tag resolves distinct cell types and maps repressive chromatin domains in early hESC development.


a, Schematic of scCUT&Tag applied to nuclei isolated from cell culture, a model endoderm differentiation system, blood cells and a human brain tumor. Single cells are then partitioned using either the 10x Genomics or ICELL8 microfluidic systems. b, UMAP embedding of scCUT&Tag for a repressive histone modification, H3K27me3, in K562 (n = 908) and hESC (n = 804) single cells. c, UMAP embedding of scCUT&Tag for a repressive histone modification, H3K27me3, in a 5-d differentiation time course from hESC to definitive endoderm (total n = 1,830). Cells are colored according to the day of the time course on which they were harvested. d, Top, bar plot representing the percent of single cells (n = 350, 171, 474, 274 and 561 from days 1–5, respectively) that are repressed at each specific gene, where the upper axis corresponds to scCUT&Tag (percent of single cells repressed). Bottom, jitter plot depicting scRNA-seq for similar time points (n = 92, 66, 172, 138 and 188), where the lower axis corresponds to scRNA-seq (normalized messenger RNA counts from GSE75748). From left to right, well-known TF markers for pluripotent, mesendoderm and definitive endoderm cells. mRNA, messenger RNA.

Cellular determination and differentiation proceed by a controlled sequence of gene activation and gene repression. To study gene silencing during development, we differentiated hESCs toward definitive endoderm9. We confirmed differentiation by immunofluorescence staining of stage-specific transcription factors (TFs) (Supplementary Fig. 2a). UMAP embedding of 1,830 scCUT&Tag H3K27me3 profiles with a median of 279 fragments per cell revealed a developmental trajectory, independent of batch effect (Supplementary Fig. 2b), from hESC to definitive endoderm (Fig. 1c) that was punctuated by stem-like states on days 1–2 followed by a rapid progression toward differentiation on days 3–5. To determine if changes in chromatin silencing correspond to changes in gene expression, we examined known markers of stem cells and endoderm differentiation in single-cell aggregate profiles from each day. Overall, H3K27me3 signal at a marker gene was inversely correlated with expression based on a published single-cell RNA sequencing (scRNA-seq) dataset9. Stem cell markers, such as SOX2, KLF4 and FOXD3, were expressed in hESCs and lacked H3K27me3 but were silenced as differentiation proceeded (Fig. 1d). Between days 2 and 3, hESCs transition into a mesendoderm state (characterized by expression of TBXT, MSX2 and PDGFRA) in which they have the developmental potential to become either mesoderm or endoderm9. This is illustrated in our data between days 2 and 3, where chromatin silencing at mesoderm markers was lower (Fig. 1d). As differentiation proceeded, endoderm markers, such as FOXA2, SOX17 and PRDM1, became active and lost H3K27me3 signal (Fig. 1d). Finally, markers of ectoderm (PAX6 and LHX2) were not expressed and accumulated H3K27me3, consistent with silencing of these loci (Supplementary Fig. 2c). Pseudo-temporal ordering of single cells recapitulated our real-time results (Supplementary Fig. 2d).
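A per-day summary like the one in Fig. 1d (percentage of single cells repressed at each gene) could be computed roughly as below. The rule that a cell counts as repressed when it has at least `min_fragments` H3K27me3 fragments over the gene is an assumption made for illustration, since the text does not define the exact threshold; `percent_repressed_by_day` is a hypothetical name.

```python
import numpy as np

def percent_repressed_by_day(gene_counts, days, min_fragments=1):
    """Percent of cells per harvest day that carry H3K27me3 signal at each gene.

    gene_counts: cells x genes array of H3K27me3 fragment counts over each gene.
    days: harvest day for each cell (same order as the rows of gene_counts).
    Returns {day: array of percentages, one per gene}.
    """
    gene_counts = np.asarray(gene_counts)
    days = np.asarray(days)
    # Hypothetical call: a cell is "repressed" at a gene if it has enough fragments there.
    repressed = gene_counts >= min_fragments
    result = {}
    for day in np.unique(days):
        mask = days == day
        result[int(day)] = 100.0 * repressed[mask].mean(axis=0)
    return result
```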

Having established that scCUT&Tag readily identifies dynamic changes in chromatin silencing, we next sought to determine whether chromatin profiles could distinguish cell types in a more complex tissue. To do so, we adapted scCUT&Tag to the 10x Genomics microfluidics platform and profiled H3K27me3 in mixed peripheral blood mononuclear cells (PBMCs) collected from two healthy donors. Briefly, we performed scCUT&Tag in bulk on 1 million cells and then loaded two lanes of a 10x Genomics microfluidic chip with 10,000 nuclei each to obtain technical replicates (Supplementary Fig. 3). We implemented a chromatin silencing score (CSS), which uses the gene activity score model in ArchR7 to create a proxy for the overall signal associated with a given locus. Quality control filtering resulted in 9,917 cells with a median of 1,110 unique fragments per cell for which we performed dimensionality reduction and embedding as described above (Fig. 2a). This median falls within the range of per-cell fragment counts that we observed across cell types on the ICELL8 platform (medians of 375 for hESCs and 6,064 for K562 cells), despite the platform differences in our study.
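To make the CSS concrete, here is a simplified sketch of a distance-weighted gene score in the spirit of ArchR's default gene model, which (to our understanding) weights signal by `exp(-abs(x)/5000) + exp(-1)`, where `x` is the distance to the gene. The 100-kb window, the use of tile centers for distances and the function name are assumptions; ArchR additionally uses gene boundaries, gene-size scaling and per-cell normalization, none of which are reproduced here. Because H3K27me3 marks silenced genes, a high score suggests silencing and a low score suggests activity.

```python
import numpy as np

def gene_score(tile_starts, tile_counts, gene_start, gene_end,
               tile_size=5000, window=100_000):
    """Distance-weighted sum of one cell's tile counts around a single gene.

    Hypothetical simplification of an ArchR-style gene activity model,
    applied here to H3K27me3 counts to act as a chromatin silencing score.
    """
    starts = np.asarray(tile_starts)
    counts = np.asarray(tile_counts, dtype=float)
    centers = starts + tile_size / 2
    # Distance from each tile center to the gene body (0 if the center lies inside the gene).
    dist = np.maximum(0, np.maximum(gene_start - centers, centers - gene_end))
    weights = np.exp(-dist / 5000) + np.exp(-1)
    # Ignore tiles beyond the (assumed) search window.
    weights[dist > window] = 0.0
    return float(np.sum(weights * counts))
```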

Fig. 2 |. scCUT&Tag for H3K27me3 readily identifies major subtypes in PBMCs.


a, Left, UMAP embedding of single-cell data from PBMCs. Unsupervised clustering revealed five clusters. Right, UMAP projection of downsampled ChIP-seq bulk data from primary sorted bulk datasets for major PBMC cell types (see Supplementary Methods for GSE citations) on single-cell CUT&Tag data on left. b, Heat map of genes with significantly low (top) or high (bottom) H3K27me3 signal (CSS) in each cluster (row). Fold change < −2 (top) or > 2 (bottom); q < 0.05 (both). Cell-type-specific genes are highlighted. c, Sparse mixture model clustering (using souporcell11) of genotype variant calls from the PBMC data colored by genotype assignment (before multiplet removal). NK, natural killer.

We then set out to identify the major cell types in the data using two methods. We first downsampled publicly available bulk H3K27me3 chromatin immunoprecipitation followed by sequencing (ChIP-seq) data (ENCODE) and used the UMAP transform function to ‘project’ the ChIP-seq data onto our UMAP embedding as previously described10 (Fig. 2a). Second, we used the CSS to identify cell-type-specific marker genes that showed a lack of H3K27me3 enrichment, because active genes will have a low CSS. Therefore, we would expect a low CSS for a cell-type-specific marker gene in the cluster that corresponded to that cell type (Fig. 2b). Overall, cluster identification by CSS annotation matched our assignments by ChIP-seq projection (Fig. 2a) and distinguished major cell types in unsorted PBMCs, including those of lymphoid (T cell, natural killer cell and B cell) and myeloid (monocyte) lineages. We recovered the proportions of major cell types within the range of normal adult blood (Supplementary Table 1). Using this method, we can, therefore, generate cell-type-specific PcG landscapes across heterogeneous cell types within a sample, obviating the need for physical cell sorting and minimizing confounders such as batch effect, read depth or sample heterogeneity (Supplementary Fig. 3). This allowed us to identify the top differentially PcG-silenced loci across the major cell types in PBMCs (Fig. 2b). We also profiled PBMCs with the active mark H3K27ac and recovered the major cell types in a similar proportion as H3K27me3 scCUT&Tag (Supplementary Fig. 4 and Supplementary Table 1).
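The projection step can be approximated in a few lines. The sketch below is a stand-in, not ArchR's projection or the UMAP transform used in the study: it downsamples a bulk tile profile into pseudo-cells at single-cell depth by multinomial sampling, then assigns each pseudo-cell to the reference cluster whose centroid has the highest cosine similarity after `log1p` scaling. All function names are hypothetical.

```python
import numpy as np

def bulk_to_pseudocells(bulk_counts, n_cells, depth, rng=None):
    """Draw `n_cells` pseudo-cells of `depth` fragments each from a bulk tile profile."""
    rng = np.random.default_rng(rng)
    p = np.asarray(bulk_counts, dtype=float)
    p = p / p.sum()  # bulk profile as a probability distribution over tiles
    return rng.multinomial(depth, p, size=n_cells)

def assign_to_clusters(pseudocells, centroids, labels):
    """Label each pseudo-cell with the most similar reference cluster centroid."""
    def unit_rows(m):
        m = np.log1p(np.asarray(m, dtype=float))
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    # Cosine similarity between every pseudo-cell and every centroid.
    sims = unit_rows(pseudocells) @ unit_rows(centroids).T
    return [labels[i] for i in sims.argmax(axis=1)]
```

Downsampling to single-cell depth before comparison matters because a full-depth bulk profile is far denser than any single cell and would not land among the sparse single-cell profiles.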

We next demultiplexed each biological donor using souporcell. In brief, the algorithm identifies genotypic differences between single cells by calling variants from aligned reads11. The variant calls can also be used to identify multiplets. Using this method, we were able to differentiate cells from each donor (Fig. 2c). Clustering was not driven by donor-specific effects but, rather, by cell type differences (Supplementary Fig. 3b).
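The idea behind genotype-based demultiplexing can be illustrated with a small expectation-maximization (EM) fit of a binomial mixture over per-cell reference and alternate allele counts. This is a deliberately simplified sketch, not souporcell itself (which additionally detects doublets, estimates ambient contamination and uses multiple restarts); the function name and pseudocounts are assumptions.

```python
import numpy as np

def demultiplex(ref, alt, k=2, n_iter=100, seed=0):
    """Cluster cells into k genotypes from allele counts (cells x SNPs).

    Hypothetical simplified EM for a binomial mixture; zero-count SNPs
    contribute nothing, so very sparse per-cell data are handled naturally.
    """
    ref = np.asarray(ref, dtype=float)
    alt = np.asarray(alt, dtype=float)
    n_cells = ref.shape[0]
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(k), size=n_cells)  # soft assignments, cells x k
    for _ in range(n_iter):
        # M-step: per-cluster alternate-allele frequency at each SNP (pseudocounts 1 and 2).
        af = (resp.T @ alt + 1) / (resp.T @ (alt + ref) + 2)  # k x SNPs
        weights = resp.mean(axis=0)
        # E-step: binomial log-likelihood of each cell under each cluster.
        loglik = alt @ np.log(af).T + ref @ np.log1p(-af).T + np.log(weights)
        loglik -= loglik.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(loglik)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)
```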

Having established that scCUT&Tag can profile developmental systems and heterogeneous tissues, we used scCUT&Tag to interrogate PcG-based clustering in glioblastoma, a human central nervous system tumor that is known to have a heterogeneous microenvironment12, exhibit intratumoral heterogeneity13 and have pseudo-hierarchical organization that mimics development12,14,15. In this tumor type, changes in PcG chromatin silencing can mediate emergence of resistant cell populations16.

We profiled H3K27me3 in 1,311 single cells (3,643 median fragments per cell) using the 10x scCUT&Tag workflow from a primary glioblastoma that had been snap-frozen shortly after surgical removal. We distinguished four major cell populations within the sample (Fig. 3a). To annotate clusters, we constructed CSS of previously defined marker loci12 and annotated clusters that correspond to microglia (Cluster 1, low CSS at the PTPRC gene), neurons (Cluster 3, low CSS at RBFOX3), oligodendrocytes (Cluster 4, low CSS at MOBP) and other neural lineage cells, including tumor cells (Cluster 2, low CSS at SOX2) (Fig. 3b). To confirm cluster annotations, we projected CUT&RUN bulk data from a glioma stem cell line (UW7gsc) derived from the same patient, two established neural stem cell lines (U5 and CB660)17, and ENCODE18 ChIP-seq bulk data for monocytes (proxy for microglia) and astrocytes. Projection onto the scCUT&Tag tumor sample embedding confirmed CSS annotations (Fig. 3c). UW7gsc projected to the center of the largest cluster, presumably made up of tumor cells. The astrocyte data projected to a smaller satellite cluster within the neural lineage cells. The neural stem cell line data localized to both the tumor cell cluster and the astrocyte cluster (Supplementary Fig. 5). This might reflect spontaneous differentiation of neural stem cells toward the astrocyte lineage in vitro or subtle changes in cell state, such as lineage priming19.

Fig. 3 |. scCUT&Tag data for H3K27me3 for a human glioblastoma primary and relapse sample demonstrate heterogeneity in PcG distribution within tumor cell clusters and cluster enrichment after treatment.


a, UMAP embedding of single cells from a primary human glioblastoma based on H3K27me3 signal. b, Cluster annotation using CSS for key marker genes identifies microglia (PTPRC), neurons (RBFOX3), oligodendrocytes (MOBP) and tumor cells (SOX2). c, UMAP transform and projection of bulk ChIP-seq (monocytes and astrocytes) or bulk CUT&RUN (UW7gsc) onto patient sample. d, Left, UMAP co-embedding of tumor cells from primary and relapse sample. Inset highlights locations of cells from relapse sample. Right, bar plot demonstrating fraction of cells in each sample (Primary and Relapse) that belong to each cluster. e, Left, two pseudotime trajectories starting with cluster T1 (presumed stem-like cluster) and ending in either Cluster T4 (Trajectory 1) or Cluster T2 (Trajectory 2). Right, heat map of 132 significant motif deviations based on H3K27me3 activity within peaks from aggregated tumor cell ATAC-seq data. Motif deviations are ordered by pseudotime. f, UMAP plots for tumor cells colored by deviation scores for selected motifs. Left column shows early motifs in pseudotime that are commonly silenced, including NEUROD1, SNAI2 and TCF12. Middle column shows silenced programs that diverge according to trajectory (NR1D2 in Trajectory 1 and ETV5 in Trajectory 2) or are common across trajectories (RFX4). Right column shows silenced programs specific to terminal pseudotime for Trajectory 1 (HES5), Trajectory 2 (GATA6) or both (DNMT1).

To understand how the tumor changed with treatment, we performed scCUT&Tag profiling for H3K27me3 for a relapse sample obtained via rapid autopsy from the patient 5 months after surgery and radiation therapy. Application of quality control metrics followed by low-dimensional embedding identified four distinct cell types in the relapse sample (Supplementary Fig. 6a). Projection of the 1,168 autopsy single-cell profiles (16,232 median fragments per cell) onto the primary tumor UMAP embedding allowed cell type identification, including 71 cells that co-localized to the tumor cell cluster (Supplementary Fig. 6b).

The low number of cells in the autopsy tumor specimen that met quality control standards limited the biological conclusions that we could draw from these data independently. As such, we chose to consider the 71 autopsy cells in the context of the 640 primary tumor cells by co-embedding them together and analyzing their relationship to each other. After batch correction, we identified four clusters within the tumor cell data with distinct H3K27me3 profiles (Fig. 3d, left). Examining the distribution of cell states across the two time points, we noted an enrichment for Cluster T1 in the relapse specimen (Fig. 3d, right). The relapse tumor cells had higher background signal than the primary tumor cells, as determined by fraction of reads in peaks (FRiP) analysis (Supplementary Fig. 6c). To further confirm that the relapse cells were most similar to Cluster T1, we characterized reads in relapse cells that were present in genomic regions that most significantly distinguished the primary tumor clusters (Supplementary Fig. 6d). This analysis confirmed similarity of the relapse tumor cells to Cluster T1. Gene set enrichment analysis using the CSS matrix identified potential programs silenced (positive enrichment scores) and derepressed (negative enrichment scores) in this cluster. Interestingly, the Verhaak_glioblastoma_proneural gene set appears to be silenced in the resistant cell cluster (Supplementary Fig. 7), consistent with the idea that tumor evolution might induce a proneural-to-mesenchymal shift20. In contrast, low CSS was observed at gene sets with high CpG content that are marked by H3K27me3 in whole brain21. The lack of H3K27me3 signal in this tumor cluster suggests that the PcG landscape of glioblastoma cells resembles a stem-like state rather than a terminally differentiated state22.
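FRiP, the fraction of a cell's fragments that overlap called peaks, is a standard signal-to-background metric: a lower value means that more reads fall outside enriched regions. A minimal per-cell sketch follows, assuming fragments as `(barcode, chrom, start, end)` tuples with half-open coordinates and peaks given per chromosome as sorted, non-overlapping `(start, end)` intervals; the function names are hypothetical.

```python
import bisect
from collections import defaultdict

def overlaps_any(starts, intervals, start, end):
    """True if [start, end) overlaps any interval in a sorted, non-overlapping list."""
    # Index of the last peak whose start lies before the fragment end; because
    # peaks do not overlap, only this peak can reach past the fragment start.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and intervals[i][1] > start

def frip_per_cell(fragments, peaks):
    """Return {barcode: fraction of that cell's fragments overlapping a peak}."""
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in peaks.items()}
    total = defaultdict(int)
    in_peaks = defaultdict(int)
    for barcode, chrom, start, end in fragments:
        total[barcode] += 1
        if chrom in peaks and overlaps_any(starts[chrom], peaks[chrom], start, end):
            in_peaks[barcode] += 1
    return {bc: in_peaks[bc] / n for bc, n in total.items()}
```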

We next wanted to understand the relationship between the cell clusters. Clusters T1, T2 and T4 exist along a continuum, whereas Cluster T3 is separated from the main tumor cell group. We focused on whether TF programs are differentially silenced across Clusters T1, T2 and T4. H3K27me3 domains are broad, spanning 10–100 kb and covering many genes, enhancers, promoters and intervening regions. Therefore, to limit motif searching to potential regulatory elements within H3K27me3 domains, we used single-cell ATAC-seq data (Supplementary Fig. 8) to annotate enhancers and promoters in tumor cell subclusters based on accessible chromatin. We then calculated TF motif enrichments and depletions in this set of curated genomic regions based on H3K27me3 signal. We examined motif deviations ordered over two pseudotime trajectories that started with Cluster T1 (presumed stem-like cluster) and ended in either Cluster T4 (Trajectory 1) or Cluster T2 (Trajectory 2) (Fig. 3e, left). Motif deviations (n = 132) were ordered according to pseudotime, identifying silenced motifs that spanned Cluster T1 to Cluster T2 and Cluster T4 (Fig. 3e, right). At the apex of the trajectories, motif silencing was shared and included motifs for TFs such as NEUROD1, SNAI2 and TCF12 (Fig. 3f, left column). At intermediate pseudotime points, there were silenced motifs specific to Trajectory 1 (NR1D2) or Trajectory 2 (ETV5) or shared by both (RFX4) (Fig. 3f, middle column). As pseudotime proceeded, Trajectory 1 showed evidence of HES5 motif silencing, whereas Trajectory 2 showed GATA6 motif silencing. Interestingly, the DNMT1 motif was strongly silenced at both pseudotime endpoints, concordant with the idea that PcG silencing of DNMT1-enriched promoters and enhancers is a common feature of differentiation23 (Fig. 3f, right column).
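The motif deviation idea (as implemented in chromVAR and exposed in ArchR through `addDeviationsMatrix`) compares, for each cell, the reads falling in peaks that contain a motif with the number expected from that cell's depth and the motif's share of all reads. The sketch below computes only these raw deviations; chromVAR additionally corrects for GC content and signal bias using matched background peaks and reports z-scores, which this simplified version omits. Here the reads would be H3K27me3 fragments within ATAC-defined peaks, so a positive deviation would suggest silencing of regions carrying that motif. Names are hypothetical.

```python
import numpy as np

def raw_motif_deviations(counts, motif_match):
    """Raw (observed - expected) / expected deviations per cell and motif.

    counts: cells x peaks fragment counts.
    motif_match: peaks x motifs boolean matrix (True if the peak contains the motif).
    Motifs matching no peaks with reads give a zero expectation (division by zero).
    """
    counts = np.asarray(counts, dtype=float)
    M = np.asarray(motif_match, dtype=float)
    # Share of all reads (pooled across cells) that falls in each peak.
    peak_frac = counts.sum(axis=0) / counts.sum()
    observed = counts @ M                                          # cells x motifs
    expected = counts.sum(axis=1, keepdims=True) * (peak_frac @ M)  # cells x motifs
    return (observed - expected) / expected
```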

Fundamentally, we showed here that repressive chromatin can be used to identify cell states a priori from heterogeneous normal and diseased tissues. This approach has far-reaching applications, including generation of cell-type-specific chromatin atlases from archival tissue in a manner that does not require sorting of pure populations. We focused primarily on a single chromatin mark, but this method can, in theory, be applied to any histone modification or DNA-binding protein for which an antibody is available. As such, developing complete chromatin landscapes of complex tissues and disease states using scCUT&Tag will help decode the complex epigenetic machinery underlying gene expression. Broadly, our method for performing histone mark-specific single-cell analysis adds to the growing list of single-cell ‘omic’ methods that can be used to understand heterogeneous cell populations.

Methods

Biological material.

H1 hESCs were purchased from WiCell (cat. no. WA01, lot no. WB35186). We used the following antibodies: guinea pig anti-rabbit IgG (heavy and light chain) antibody (antibodies-online, ABIN101961), H3K27me3 (Cell Signaling Technology, cat. no. 9733), H3K27ac (Millipore Sigma, cat. no. MABE647, lot no. VP1901251), SOX17 (R&D Systems, AF1924, lot KGA0916121), OCT4 (Abcam, ab109183, lot gr120970-6) and H3K4me2 (Upstate Biotechnology, 07-030, lot 26335). The fusion enzyme pA-Tn5 was generated as previously described3.

hESC culture conditions.

H1 hESCs were maintained on Corning Matrigel hESC-Qualified Matrix (Corning, no. 354277) at 37 °C in mTeSR1 from STEMCELL Technologies (cat. no. 85850) with daily medium replacement. When cell aggregates were 80% confluent, they were released using ReLeSR (STEMCELL Technologies, no. 05872) per manufacturer instructions and incubated at 37 °C for 3–5 min. Cells were released into a small volume of complete medium by tapping the growth plate, and aggregates were reduced in size by gentle pipetting and passaged at the desired ratio.

hESC differentiation protocol.

hESCs were differentiated to definitive endoderm using the STEMdiff Definitive Endoderm Kit (cat. no. 05110). The full protocol is available from STEMCELL Technologies (https://cdn.stemcell.com/media/files/pis/29550-PIS_2_1_0.pdf?_ga=2.73376023.564267965.1597964514-138601152.1597964514). Briefly, hESCs at 80% confluence were harvested using Gentle Cell Dissociation Reagent (STEMCELL Technologies, cat. no. 07174) and reseeded as single cells on Matrigel plates. This was done daily for 5 d: every 24 h, a new differentiation culture was started, and cells were incubated with definitive endoderm (DE) differentiation medium according to the manufacturer’s guidelines. On the fifth day, all five time points were harvested simultaneously using Accutase (STEMCELL Technologies, cat. no. AT104-500). Immunofluorescence was used to confirm differentiation as previously described24 with primary antibodies SOX17 1:250 dilution (R&D Systems, AF1924, lot KGA0916121) and OCT4 1:100 dilution (Abcam, ab109183, lot gr120970-6) and secondary antibodies donkey anti-goat Rhodamine Red 1:1,000 dilution (Jackson ImmunoResearch, cat. no. 705-295-147) and goat anti-rabbit-Cy5 1:1,000 dilution (Jackson ImmunoResearch, cat. no. 111-175-144).

PBMC acquisition and processing.

Healthy adult donors at the University of Washington underwent venipuncture, and blood was collected using heparin-containing vacutainer tubes after consenting to participate in our study (Institutional Review Board (IRB) protocol no. STUDY00008678). Additional PBMC specimens were obtained from consented donors at the Fred Hutchinson Cancer Research Center (IRB no. 0999.209). Mononuclear cells were harvested from peripheral blood using gradient centrifugation. Cells were then washed twice with PBS and captured as outlined below.

Brain tumor specimen acquisition, processing and culture.

Adult patients at the University of Washington provided pre-operative informed consent to take part in the study in all cases under an approved IRB protocol (no. STUDY00002162). Fresh tumors were collected directly from the operating room at the time of surgery and either taken fresh or snap-frozen in liquid nitrogen immediately after removal. Histopathologic diagnosis was confirmed by a board-certified neuropathologist. Fresh tissue was enzymatically dissociated using a papain-based brain tumor dissociation kit (Miltenyi Biotec) as per the manufacturer’s protocol. Cells were then cultured on laminin-coated plates in DMEM/F12 supplemented with 1× N2/B27 and 1% penicillin–streptomycin. Cultures were passaged as needed when confluent and considered stable after three serial passages. Cell line UW7gsc was used for this study at passage 3. Autopsy tissue was collected with a post-mortem interval of approximately 8.75 h after informed consent with a waiver from the University of Washington IRB. Tissue was snap-frozen in liquid nitrogen-cooled isopentane. Tumor regions were sampled based on gross examination of brain sections and processed as outlined below.

Nuclei preparation from brain tumor specimens.

Frozen tissue was processed to nuclei using the ‘Frankenstein’ protocol from protocols.io. Briefly, snap-frozen glioblastoma tissue was thawed on ice and minced sharply into <1-mm pieces. Next, 500 μl of chilled Nuclei EZ Lysis Buffer (Millipore Sigma, NUC-101, no. N3408) was added, and tissue was homogenized 10–20 times in a Dounce homogenizer. The homogenate was transferred to a 1.5-ml Eppendorf tube, and 1 ml of chilled Nuclei EZ Lysis Buffer was added. The homogenate was mixed gently with a wide-bore pipette and incubated for 5 min on ice. The homogenate was then filtered through a 70-μm mesh strainer and centrifuged at 500g for 5 min at 4 °C. Supernatant was removed, and nuclei were resuspended in 1.5 ml of Nuclei EZ Lysis Buffer and incubated for 5 min on ice. Nuclei were centrifuged at 500g for 5 min at 4 °C. After carefully removing the supernatant (pellet might be loose), nuclei were washed in wash buffer (1× PBS, 1.0% BSA and 0.2 U μl−1 of RNase Inhibitor). Nuclei were then centrifuged and resuspended in 1.4 ml of wash buffer for two additional washes. Nuclei were then filtered through a 40-μm mesh strainer. Intact nuclei were counted after counterstaining with Trypan blue in a standard cell counter.

Chromatin profiling: scCUT&Tag using the ICELL8 system/protocol.

scCUT&Tag for the ICELL8 was carried out as previously described3. In brief, approximately 250,000 hESCs (for each time point) were processed by centrifugation between buffer exchanges at 600g for 3 min and in low-retention tubes. Cells were collected and washed with 1 ml of wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM spermidine and 1× protease inhibitor cocktail) at room temperature. Cells were incubated in antibody diluted 1:50 in NP40-Digitonin Wash Buffer (0.01% NP40 and 0.01% digitonin in wash buffer) overnight. This wash buffer permeabilized the cells and released nuclei. Permeabilized nuclei were then rinsed once with NP40-Digitonin Wash buffer and incubated with anti-rabbit IgG antibody (1:50 dilution) in 1 ml of NP40-Digitonin Wash buffer on a rotator at room temperature for 30 min. Nuclei were washed twice with NP40-Digitonin Wash buffer and incubated with 1:100 dilution of pA-Tn5 in NP40-Dig-med-buffer (0.01% NP40, 0.01% digitonin, 20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM spermidine and 1× protease inhibitor cocktail) for 1 h at room temperature on a rotator. Cells were washed two times with NP40-Dig-med-buffer and resuspended in 150 μl of tagmentation buffer (10 mM MgCl2 in NP40-Dig-med-buffer) and incubated at 37 °C for 1 h. Tagmentation was stopped by adding 50 μl of 4× Stop Buffer (40.4 mM EDTA and 2 mg ml−1 of DAPI), and the sample was held on ice for 30 min. Samples were then strained through a 10-μm cell strainer to remove clumps of cells.

The SMARTer ICELL8 single-cell system (Takara Bio, cat. no. 640000) was used to array single cells as previously described3. Briefly, cells were loaded onto a source plate and dispensed into a SMARTer ICELL8 350v Chip (Takara Bio, cat. no. 640019) at 35 nl per well. The chip was then spun down at 300g for 5 min. Imaging on a DAPI channel confirmed the presence of single cells in specific wells. Non-single cell wells were excluded from downstream reagent dispenses. To index the whole chip, 72 × 72 i5/i7 unique indices (5,184 microwells total) were dispensed at 35 nl in wells that contained single cells, followed by two dispenses of 50 nl (100 nl total) of 2× NEBNext High-Fidelity 2× PCR Master Mix (NEB, M0541L). The chip was sealed and spun down at 2,250g for 3 min after each dispense. The PCR on the chip was performed with the following protocol: 5 min at 72 °C and 2 min at 98 °C, followed by 15 cycles of 10 s at 98 °C, 30 s at 60 °C and 5 s at 72 °C, with a final extension at 72 °C for 1 min.
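The dual-indexing scheme amounts to a simple lookup: with 72 i5 and 72 i7 indices, each of the 72 × 72 = 5,184 wells receives a unique pair, so a read's index pair identifies its well. The sketch assumes, for illustration only, that i5 indices vary by chip row and i7 indices by column, and uses made-up index names; the actual ICELL8 index layout may differ.

```python
from collections import Counter

def well_index_table(n=72):
    """Map each well (row, col) to a unique (i5, i7) pair; 72 x 72 gives 5,184 wells.

    Hypothetical layout: i5 chosen by row, i7 by column, with made-up index names.
    """
    return {(r, c): (f"i5_{r + 1:02d}", f"i7_{c + 1:02d}")
            for r in range(n) for c in range(n)}

def reads_per_cell_well(read_index_pairs, table, single_cell_wells):
    """Count reads per well, keeping only wells imaged as containing one cell."""
    lookup = {pair: well for well, pair in table.items()}
    counts = Counter()
    for pair in read_index_pairs:
        well = lookup.get(pair)  # None for index pairs not present on the chip
        if well in single_cell_wells:
            counts[well] += 1
    return counts
```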

Quality control (ICELL8).

The ICELL8 has a built-in imaging system that identifies wells that do not contain exactly one cell. Thus, empty wells and wells containing more than one cell (doublets or larger multiplets) are removed. Subsequently, we filtered out single cells with fewer than 100 unique fragments to remove spurious barcodes that can be attributed to overflow of dispensed PCR material.
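The unique-fragment filter can be sketched as follows, assuming fragments as `(barcode, chrom, start, end)` tuples and treating fragments with identical coordinates within the same barcode as PCR duplicates. This deduplication rule is an assumption for illustration, and the function names are hypothetical; only the threshold of 100 comes from the text.

```python
from collections import defaultdict

def unique_fragment_counts(fragments):
    """Number of distinct (chrom, start, end) fragments per barcode."""
    seen = defaultdict(set)
    for barcode, chrom, start, end in fragments:
        seen[barcode].add((chrom, start, end))  # duplicates collapse in the set
    return {bc: len(frags) for bc, frags in seen.items()}

def passing_barcodes(fragments, min_unique=100):
    """Barcodes with at least `min_unique` unique fragments (fewer are discarded)."""
    counts = unique_fragment_counts(fragments)
    return sorted(bc for bc, n in counts.items() if n >= min_unique)
```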

A drawback of leveraging a hyperactive transposon in a fusion enzyme to target specific chromatin compartments is that Tn5 has a high affinity for accessible chromatin, the basis of ATAC-seq. Previously, it was shown that this artifact is highly dependent on the concentration of salt in the washes after fusion enzyme binding3. To identify whether our single-cell samples exhibited this artifact, we mapped the percent of reads in each single cell that fell into H3K27me3, H3K4me2 or ATAC-specific peaks (Supplementary Fig. 1c). As expected, the degree to which repressive H3K27me3-marked chromatin and active accessible chromatin (ATAC-seq signal) overlapped was minimal, whereas the active mark H3K4me2 had a higher degree of overlap with ATAC-seq data. Correlations of aggregate versus bulk profiles across the 5-kb genome tiles showed similar results (Supplementary Fig. 1b).
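The aggregate-versus-bulk comparison mentioned above reduces to summing single-cell tile counts into a pseudo-bulk profile and correlating it with the bulk profile across the same 5-kb tiles. A minimal sketch, assuming both inputs already share a tile grid; the counts-per-million and `log1p` scaling and the use of Pearson correlation are illustrative choices, not necessarily those used in the study.

```python
import numpy as np

def cpm_log(x):
    """log1p of counts per million, so profiles of different depth are comparable."""
    x = np.asarray(x, dtype=float)
    return np.log1p(x / x.sum() * 1e6)

def aggregate_vs_bulk_correlation(cell_tile_counts, bulk_tile_counts):
    """Pearson correlation between a pseudo-bulk (summed cells) and a bulk profile."""
    aggregate = np.asarray(cell_tile_counts).sum(axis=0)
    return float(np.corrcoef(cpm_log(aggregate), cpm_log(bulk_tile_counts))[0, 1])
```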

As an initial test, we wanted to evaluate the robustness of scCUT&Tag by comparing it to single-cell ATAC-seq. Therefore, we chose the histone modification H3K4me2, which was previously shown to provide output similar to ATAC-seq. A representative genomic track comparing bulk, aggregate and single-cell profiles for H3K4me2 in H1 and K562 cells illustrates the quality of the resulting data (Supplementary Fig. 1a). A low-dimensional UMAP embedding clearly separates K562 cells (n = 807) from hESCs (n = 317) (Supplementary Fig. 1d). Projections of published single-cell ATAC-seq data (GSE99172) onto our scCUT&Tag embedding align with cell-type-specific clusters (Supplementary Fig. 1e).

Chromatin profiling: scCUT&Tag using the 10x Genomics system.

CUT&Tag was performed with an anti-H3K27me3 antibody (Cell Signaling Technology, no. 9733, dilution 1:100) or anti-H3K27ac (MABE647, dilution 1:100) with 1 million cells, as previously published3. Guinea pig anti-rabbit IgG secondary antibody (ABIN101961) was used at 1:100 dilution. Adaptation to the 10x workflow was performed as follows. For all samples except the PBMC mixing experiment, the nuclei were spun down at 600g for 3 min after the pA-Tn5 binding step. After counting, they were resuspended in 1× Diluted Nuclei Buffer at 2,500 nuclei per microliter. The nuclei were then prepared for transposition per the 10x Genomics single-cell ATAC-seq protocol (SingleCell_ATAC_ReagentKits_v1.1_UserGuide_RevD). All steps beginning with Step 1.1, ‘Prepare Transposition Mix’, were performed according to the 10x Genomics standard protocol. Libraries were sequenced using an Illumina NovaSeq 6000.

For the PBMC mixing experiment, the nuclei were tagmented in high salt (300 mM) as per published protocol3. After tagmentation, BSA was added to a final concentration of 1% and nuclei were centrifuged at 600g for 3 min and then resuspended in 1× Diluted Nuclei Buffer (10x Genomics, PN-2000207) at 2,500 nuclei per microliter. The 10x Genomics single-cell ATAC-seq protocol (SingleCell_ATAC_ReagentKits_v1.1_UserGuide_RevD) was used with the following modifications. For Step 1.1, ‘Prepare Transposition Mix’, 7 μl of ATAC buffer, 3 μl of low TE buffer (10 mM Tris pH 8.0, 0.1 mM EDTA) and 5 μl of stock nuclei solution were mixed together, omitting the ATAC enzyme, as tagmentation had already been performed. All remaining steps, beginning with Step 2.0, ‘GEM Generation and Barcoding’, were performed according to 10x Genomics standard protocol. Libraries were sequenced using an Illumina NovaSeq 6000.

Data processing.

Illumina .bcl files were demultiplexed and converted to FASTQ format using cellranger-mkfastq. The resulting FASTQ files were aligned to the hg38 genome, filtered for duplicates and counted using cellranger-atac. The output BED file of filtered fragments, containing the cell barcodes, was then read into ArchR7 as fragment counts in 5-kb genome windows; these window counts were used for all dimensionality reduction steps across all experiments. We used the ArchR7 gene activity score to calculate our CSS as described above. We performed LSI dimensionality reduction7 with a TF-IDF normalization function25, followed by UMAP26 low-dimensional embedding and clustering on a nearest-neighbor graph25 built in LSI space.
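The paragraph above combines three ideas: binning fragments into 5-kb windows, TF-IDF normalization followed by LSI (truncated SVD), and, for the projections used elsewhere in this study, folding new cells into an existing LSI space. The NumPy sketch below illustrates these ideas on toy data. It is not ArchR's implementation: the TF-IDF variant, counting each fragment once by its start coordinate, and all function names are assumptions made for illustration, and ArchR's addIterativeLSI adds further steps (such as iterative feature selection) that are not shown.

```python
import numpy as np

TILE_SIZE = 5_000  # 5-kb genome windows, as in the text


def tile_matrix(fragments, barcodes, tile_size=TILE_SIZE):
    """Cells x tiles count matrix from (chrom, start, end, barcode) fragments.

    Assumption: each fragment is counted once, in the tile holding its start
    coordinate; other tools may instead count both Tn5 insertion ends.
    """
    cell_index = {bc: i for i, bc in enumerate(barcodes)}
    tile_index = {}  # (chrom, tile number) -> column, in order of first appearance
    hits = []
    for chrom, start, _end, bc in fragments:
        if bc not in cell_index:
            continue  # skip fragments from barcodes that failed filtering
        col = tile_index.setdefault((chrom, start // tile_size), len(tile_index))
        hits.append((cell_index[bc], col))
    counts = np.zeros((len(barcodes), len(tile_index)))
    for row, col in hits:
        counts[row, col] += 1
    return counts, list(tile_index)


def tfidf(counts, idf=None, scale=1e4):
    """One common TF-IDF variant: log1p(TF * scale) * IDF.

    Pass the reference `idf` when transforming query cells so that both
    share the same feature weighting.
    """
    depth = np.maximum(counts.sum(axis=1, keepdims=True), 1)
    tf = counts / depth
    if idf is None:
        doc_freq = np.maximum((counts > 0).sum(axis=0), 1)
        idf = np.log1p(counts.shape[0] / doc_freq)
    return np.log1p(tf * scale) * idf, idf


def fit_lsi(counts, n_components=2):
    """Truncated SVD of the TF-IDF matrix (LSI)."""
    x, idf = tfidf(counts)
    _u, _s, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[:n_components]
    # x @ V_k equals U_k * S_k, i.e. the usual LSI cell coordinates
    return x @ loadings.T, loadings, idf


def project_lsi(query_counts, loadings, idf):
    """Fold query cells (same tile columns as the reference) into LSI space."""
    x, _ = tfidf(query_counts, idf=idf)
    return x @ loadings.T


fragments = [
    ("chr1", 100, 300, "AAAC"),
    ("chr1", 4_900, 5_200, "AAAC"),
    ("chr1", 5_100, 5_400, "TTTG"),
    ("chr2", 10, 200, "TTTG"),
    ("chr1", 20, 80, "GGGA"),  # barcode not in the filtered list, so ignored
]
counts, tiles = tile_matrix(fragments, ["AAAC", "TTTG"])
print(tiles)  # [('chr1', 0), ('chr1', 1), ('chr2', 0)]
embedding, loadings, idf = fit_lsi(counts, n_components=2)
```

Folding the reference cells back in through project_lsi reproduces their fitted coordinates, which is a convenient sanity check before projecting external cells or bulk profiles whose tiles have been matched to the reference columns.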

Because the cell line/differentiation experiments used the ICELL8 platform, which uses microscopic imaging to ensure single-cell capture, we did not remove multiplets from these data. For droplet partitioning data, we used the following steps to ensure data quality. 1) We first visualized the fragment length distribution across clusters and identified three clusters with a nucleosomal banding distribution consistent with untethered transposition events (Supplementary Fig. 3b). 2) We then removed two clusters with high mean fragment counts. 3) We iteratively removed clusters that exhibited non-specific CSS. To do so, we calculated CSS significance across clusters using ArchR7 and removed any cluster that had no genes significantly over- or under-represented at a false discovery rate < 0.01 and absolute fold change > 3.

Bulk projection of downsampled ChIP-seq data was performed as follows. Raw sequence data aligned to hg38 (BAM files) were downloaded from ENCODE18. Reads were counted in 5-kb genome tiles using chromVAR27, and the resulting counts were used as input to the bulk projection function in ArchR. Shared nearest neighbor clustering was performed using Seurat from within ArchR. Single-cell projection was performed using a modified ArchR projection function that did not manipulate the input data before projection. Marker regions/genes for each group were calculated using the getMarkerFeatures function in ArchR. Pre-ranked GSEA (fgsea28) was performed using the entire list of marker genes ranked by −log10(P value)/sign(fold change) with the complete MSigDB29 set of gene lists. The peak set from single-cell ATAC-seq data (see below) was used as a custom annotation set, and motif deviations were calculated using the addDeviationsMatrix function in ArchR. The pseudotime trajectory was assigned with cluster T1 as the root and clusters T2 and T4 as endpoints.
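The cluster-retention rule in step 3 and the GSEA ranking metric can be sketched as follows. This is an illustration under stated assumptions, not the ArchR code path: ArchR's getMarkerFeatures already reports FDR values, so the Benjamini-Hochberg function here only shows how such FDRs are obtained, and we read "absolute fold change > 3" as at least a threefold change in either direction (|log2 FC| > log2 3). If a log2 threshold of 3 was intended, min_fold should be set to 8. The per-gene inputs are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # step-up: each adjusted value is the minimum over all larger p-values
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def keep_cluster(pvals, log2fc, fdr_max=0.01, min_fold=3.0):
    """True if at least one gene passes FDR < fdr_max and |fold change| > min_fold.

    pvals/log2fc: per-gene CSS test results for one cluster versus the rest.
    Assumption: 'absolute fold change > 3' means a > 3-fold change either way.
    """
    fdr = benjamini_hochberg(pvals)
    lfc = np.abs(np.asarray(log2fc, dtype=float))
    return bool(np.any((fdr < fdr_max) & (lfc > np.log2(min_fold))))


def gsea_rank_metric(pvals, log2fc):
    """-log10(P) carrying the sign of the fold change.

    Dividing by sign(FC), as written in the text, equals multiplying by it
    whenever the fold change is non-zero.
    """
    p = np.asarray(pvals, dtype=float)
    return -np.log10(p) * np.sign(np.asarray(log2fc, dtype=float))


# Hypothetical per-gene results for two clusters
print(keep_cluster([1e-6, 0.5], [2.0, 0.1]))  # True: gene 1 is 4-fold up, FDR 2e-6
print(keep_cluster([1e-6, 0.5], [1.0, 5.0]))  # False: only 2-fold, or not significant
print(gsea_rank_metric([0.01, 0.1], [1.5, -2.0]))
```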

To perform variant calling, we first merged BAM output from cellranger-atac using a custom script (https://github.com/scfurl/mergeBams). We then used souporcell11 on the merged BAM, invoking the ‘no_umi’ and ‘skip_remap’ options. Sparse mixture model output from souporcell was log-normalized and colored by the genotype assignment.
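The souporcell call described above might look like the following. This sketch only assembles the argument list and does not run anything; the file paths, thread count and number of genotypes are hypothetical, and the flag spellings reflect our reading of the souporcell README, so they should be checked against `souporcell_pipeline.py --help` for the installed version.

```python
import shlex


def souporcell_command(merged_bam, barcodes, fasta, n_genotypes, out_dir, threads=8):
    """Assemble (not run) a souporcell_pipeline.py call for the merged BAM.

    Assumption: flag names follow the souporcell README; confirm them with
    `souporcell_pipeline.py --help` before use.
    """
    return [
        "souporcell_pipeline.py",
        "-i", merged_bam,        # BAM merged across runs
        "-b", barcodes,          # one cell barcode per line
        "-f", fasta,             # hg38 reference used by cellranger-atac
        "-t", str(threads),
        "-o", out_dir,
        "-k", str(n_genotypes),  # number of individuals expected
        "--no_umi", "True",      # CUT&Tag libraries carry no UMI tag
        "--skip_remap", "True",  # keep the existing cellranger-atac alignments
    ]


# Hypothetical paths and genotype count, for illustration only
cmd = souporcell_command("merged.bam", "barcodes.tsv", "hg38.fa", n_genotypes=2, out_dir="souporcell_out")
print(shlex.join(cmd))
```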

Quality control and data processing for brain tumor ATAC-seq.

Nuclei were prepared from snap-frozen brain tumor tissue as described above, and the standard single-cell ATAC-seq workflow was performed according to the manufacturer's guidelines (10x Genomics). Sequencing data were processed using the cellranger-atac package. The output BED file of filtered fragments, containing the cell barcodes, was then read into ArchR7 using 500-bp genome windows. We performed LSI dimensionality reduction7 with a TF-IDF normalization function25, followed by UMAP26 low-dimensional embedding and clustering on a nearest-neighbor graph25 built in LSI space. Tumor cells were identified as the largest cluster with high gene activity scores for the marker genes SOX2 and PTPRZ1. This cluster was used for peak calling with the MACS2 wrapper in ArchR using standard parameters.
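The tumor-cluster selection rule can be expressed as a small function. The text does not give a numeric cutoff for "high", so min_mean_score below is a hypothetical parameter, and averaging the SOX2 and PTPRZ1 gene activity scores into a single per-cell marker score is our assumption; the toy data are invented for illustration.

```python
import numpy as np


def pick_tumor_cluster(clusters, marker_score, min_mean_score):
    """Largest cluster whose mean marker score exceeds `min_mean_score`.

    clusters: per-cell cluster labels.
    marker_score: per-cell score (here, mean SOX2/PTPRZ1 gene activity; an assumption).
    min_mean_score: hypothetical cutoff standing in for "high".
    Returns None when no cluster passes the cutoff.
    """
    clusters = np.asarray(clusters)
    score = np.asarray(marker_score, dtype=float)
    passing = []
    for label in np.unique(clusters):
        members = clusters == label
        if score[members].mean() > min_mean_score:
            passing.append((int(members.sum()), label.item()))
    return max(passing)[1] if passing else None


# Toy data: C1 is large but marker-negative; C2 and C3 are marker-high
labels = ["C1"] * 5 + ["C2"] * 3 + ["C3"] * 2
sox2 = np.array([0.1] * 5 + [2.0] * 3 + [3.0] * 2)
ptprz1 = np.array([0.2] * 5 + [1.5] * 3 + [2.5] * 2)
print(pick_tumor_cluster(labels, (sox2 + ptprz1) / 2, min_mean_score=1.0))  # C2
```

Ranking by size only among marker-high clusters keeps a large non-tumor cluster (C1 here) from being selected.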

External data.

Data from the following identifiers were downloaded from the ENCODE portal (https://www.encodeproject.org) and Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/):
  • Fig. 1b and Supplementary Fig. 1a,f,g: GSE124557.
  • Fig. 1d and Supplementary Fig. 2c: GSE75748. In addition, for the purposes of this study, hESC-differentiated time point 1.5 (scRNA-seq) was approximated to be day 2 in the GSE75748 dataset.
  • Supplementary Fig. 2b,c,e: GSE99172, GSE99173, GSE124557 and GSE85330.
  • Fig. 2b: ENCSR000ASK, ENCSR043SBG, ENCSR103GGR, ENCSR404MOX and ENCSR939JZW.
  • Fig. 3c: ENCFF363TCY and ENCFF911MNN.

Reporting Summary.

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Sequencing data are deposited in the Gene Expression Omnibus with accession code GSE157910. There are no restrictions on data use.

Code availability

Code used in this study can be found on GitHub at https://github.com/Henikoff/scCUT-Tag.

Supplementary Material

Supplemental

Acknowledgements

We thank E. Holland and members of the Holland lab for providing shared space for experimental work, the Fred Hutchinson Genomics Shared Resource for DNA sequencing and microfluidics services and BioRender for helping create figures. We thank T. Bryson, C. Codomo, J. Henikoff, M. Meers, D. Janssens, M. Setty, J. Thakur and other members of the Henikoff lab for helpful suggestions and discussions. We also thank the Koeplin Family Foundation and the Nancy and Buster Alvord Endowment, as well as A. Schantz for administrative support and L. Keene, A. Keen and K. Kern for technical support with autopsy specimen collection. This work was supported by the Howard Hughes Medical Institute (to S.H.); grants R01 HG010492 (to S.H.), R01 GM108699 (to K.A.) and K08 CA245037 (to P.J.C.) from the National Institutes of Health; an HCA Seed Network grant from the Chan-Zuckerberg Initiative (to S.H., A.P.P., S.N.F., R.G., K.A. and Y.Z.); a Burroughs Wellcome Career Award for Medical Scientists (to A.P.P.); and an American Cancer Society Mentored Scholar Award (to S.N.F.). The Scientific Computing Infrastructure at the Fred Hutchinson Cancer Research Center is funded by ORIP grant S10OD028685.

Footnotes

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41587-021-00865-z.

Competing interests

S.N.F. has received research support from Lyell Immunopharma. R.G. has received consulting income from Juno Therapeutics, Takeda, INFOTECHSoft, Celgene and Merck; has received research support from Janssen Pharmaceuticals and Juno Therapeutics; and declares ownership in CellSpace Biosciences. H.S.K. and S.H. have filed patent applications related to this work. A.P.P. declares ownership in Sygnomics.

Supplementary information: The online version contains supplementary material available at https://doi.org/10.1038/s41587-021-00865-z.

References

  • 1. Tanay A & Regev A. Scaling single-cell genomics from phenomenology to mechanism. Nature 541, 331–338 (2017).
  • 2. Klemm SL, Shipony Z & Greenleaf WJ. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
  • 3. Kaya-Okur HS et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).
  • 4. Lee TI et al. Control of developmental regulators by polycomb in human embryonic stem cells. Cell 125, 301–313 (2006).
  • 5. Laugesen A & Helin K. Chromatin repressive complexes in stem cells, development, and cancer. Cell Stem Cell 14, 735–751 (2014).
  • 6. Sparmann A & Van Lohuizen M. Polycomb silencers control cell fate, development and cancer. Nat. Rev. Cancer 6, 846–856 (2006).
  • 7. Granja JM et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat. Genet. 53, 403–411 (2021).
  • 8. Hawkins RD et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6, 479–491 (2010).
  • 9. Chu L-F et al. Single-cell RNA-seq reveals novel regulators of human embryonic stem cell differentiation to definitive endoderm. Genome Biol. 17, 173 (2016).
  • 10. Granja JM et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
  • 11. Heaton H et al. Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes. Nat. Methods 17, 615–620 (2020).
  • 12. Bhaduri A et al. Outer radial glia-like cancer stem cells contribute to heterogeneity of glioblastoma. Cell Stem Cell 26, 48–63.e46 (2020).
  • 13. Patel AP et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
  • 14. Couturier CP et al. Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy. Nat. Commun. 11, 3406 (2020).
  • 15. Wang Q et al. Tumor evolution of glioma-intrinsic gene expression subtypes associates with immunological changes in the microenvironment. Cancer Cell 32, 42–56 (2017).
  • 16. Liau BB et al. Adaptive chromatin remodeling drives glioblastoma stem cell plasticity and drug tolerance. Cell Stem Cell 20, 233–246 (2016).
  • 17. Janssens DH et al. Automated in situ chromatin profiling efficiently resolves cell types and gene regulatory programs. Epigenetics Chromatin 11, 74 (2018).
  • 18. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
  • 19. Llorens-Bobadilla E et al. Single-cell transcriptomics reveals a population of dormant neural stem cells that become activated upon brain injury. Cell Stem Cell 17, 329–340 (2015).
  • 20. Segerman A et al. Clonal variation in drug and radiation response among glioma-initiating cells is linked to proneural–mesenchymal transition. Cell Rep. 17, 2994–3009 (2016).
  • 21. Meissner A et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008).
  • 22. Rheinbay E et al. An aberrant transcription factor network essential for Wnt signaling and stem cell maintenance in glioblastoma. Cell Rep. 3, 1567–1579 (2013).
  • 23. O’Neill KM et al. Depletion of DNMT1 in differentiated human cells highlights key classes of sensitive genes and an interplay with polycomb repression. Epigenetics Chromatin 11, 12 (2018).
  • 24. Meers MP, Janssens DH & Henikoff S. Pioneer factor–nucleosome binding events during differentiation are motif encoded. Mol. Cell 75, 562–575 (2019).
  • 25. Stuart T et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
  • 26. Becht E et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 10.1038/nbt.4314 (2018).
  • 27. Schep AN, Wu B, Buenrostro JD & Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
  • 28. Sergushichev AA. An algorithm for fast preranked gene set enrichment analysis using cumulative statistic calculation. Preprint at bioRxiv 10.1101/060012 (2016).
  • 29. Liberzon A et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
