Abstract
Cell-to-cell variation is a universal feature of life that impacts a wide range of biological phenomena, from developmental plasticity1,2 to tumor heterogeneity3. While recent advances have improved our ability to document cellular phenotypic variation4–8, the fundamental mechanisms that generate variability from identical DNA sequences remain elusive. Here we reveal the landscape and principles of cellular DNA regulatory variation by developing a robust method for mapping the accessible genome of individual cells via assay for transposase-accessible chromatin using sequencing (ATAC-seq). Single-cell ATAC-seq (scATAC-seq) maps from hundreds of single cells in aggregate closely resemble accessibility profiles from tens of millions of cells and provide insights into cell-to-cell variation. Accessibility variance is systematically associated with specific trans-factors and cis-elements, and we discover combinations of trans-factors associated with either induction or suppression of cell-to-cell variability. We further identify sets of trans-factors associated with cell-type specific accessibility variance across 8 cell types. Targeted perturbations of cell cycle or transcription factor signaling evoke stimulus-specific changes in this observed variability. The pattern of accessibility variation in cis across the genome recapitulates chromosome topological domains9 de novo, linking single-cell accessibility variation to three-dimensional genome organization. Altogether, single-cell analysis of DNA accessibility provides new insight into cellular variation of the “regulome.”
Main
Heterogeneity within cellular populations has been evident since the first microscopic observations of individual cells. Recent proliferation of powerful methods for interrogating single cells4–8 has allowed detailed characterization of this molecular variation, and provided deep insight into characteristics underlying developmental plasticity1,2, cancer heterogeneity3, and drug resistance10. In parallel, genome-wide mapping of regulatory elements in large ensembles of cells has unveiled tremendous variation in chromatin structure across cell types, particularly at distal regulatory regions11. Methods for probing genome-wide DNA accessibility, in particular, have proven extremely effective in identifying regulatory elements across a variety of cell types12, quantifying changes that lead to both activation and repression of gene expression. Given this broad diversity of activity within regulatory elements when comparing phenotypically distinct cell populations, it is reasonable to hypothesize that heterogeneity at the single-cell level extends to accessibility variability within cell types at regulatory elements. However, the lack of methods to probe DNA accessibility within individual cells has prevented quantitative dissection of this hypothesized regulatory variation.
We have developed a single-cell Assay for Transposase-Accessible Chromatin (scATAC-seq), improving on the state-of-the-art13 sensitivity by >500-fold. ATAC-seq uses the prokaryotic Tn5 transposase14,15 to tag regulatory regions by inserting sequencing adapters into accessible regions of the genome. In scATAC-seq, individual cells are captured and assayed using a programmable microfluidics platform (C1 single-cell Auto Prep System, Fluidigm) with methods optimized for this task (Fig. 1a and Extended Data Fig. 1 and Supplementary Discussion). After transposition and PCR on the Integrated Fluidics Circuit (IFC), libraries are collected and PCR amplified with cell-identifying barcoded primers. Single-cell libraries are then pooled and sequenced on a high-throughput sequencing instrument. Using single-cell ATAC-seq we generated DNA accessibility maps from 254 individual GM12878 lymphoblastoid cells. Aggregate profiles of scATAC-seq data closely reproduce ensemble measures of accessibility profiled by DNase-seq and ATAC-seq generated from 10^7 or 10^4 cells, respectively (Fig. 1b,c and Extended Data Fig. 2a). Data from single cells recapitulate several characteristics of bulk ATAC-seq data, including fragment size periodicity corresponding to integer multiples of nucleosomes, and a strong enrichment of fragments within regions of accessible chromatin (Extended Data Fig. 2b,c). Microfluidic chambers generating low library diversity or poor measures of accessibility, which correlate with empty chambers or dead cells, were excluded from further analysis (Fig. 1d and Extended Data Fig. 2d–l). Chambers passing filter yielded an average of 7.3×10^4 fragments mapping to the nuclear genome.
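The chamber filter described above can be illustrated with a minimal sketch. It assumes that "library diversity" is summarized as an estimated library size and that "measures of accessibility" are summarized as the fraction of fragments falling within aggregate peaks. The class name, field names and both thresholds are hypothetical; the thresholds are left as required arguments because the text does not state the cutoffs used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChamberStats:
    """Per-chamber summary statistics (hypothetical field names)."""
    chamber_id: str
    library_size: int        # estimated number of unique nuclear fragments
    fragments_in_peaks: int  # fragments overlapping aggregate accessibility peaks
    total_fragments: int     # all nuclear fragments observed for the chamber


def passes_filter(stats: ChamberStats,
                  min_library_size: int,
                  min_fraction_in_peaks: float) -> bool:
    """Return True if a chamber has enough diversity and enough signal in peaks."""
    if stats.total_fragments == 0:
        return False  # an empty chamber can never pass
    fraction = stats.fragments_in_peaks / stats.total_fragments
    return (stats.library_size >= min_library_size
            and fraction >= min_fraction_in_peaks)


def filter_chambers(chambers: List[ChamberStats],
                    min_library_size: int,
                    min_fraction_in_peaks: float) -> List[ChamberStats]:
    """Keep only chambers that pass both quality criteria."""
    return [c for c in chambers
            if passes_filter(c, min_library_size, min_fraction_in_peaks)]
```

Both cutoffs would in practice be chosen by inspecting the joint distribution of the two statistics across all chambers, for example against chambers known to be empty.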
We further validated the approach by measuring chromatin accessibility from a total of 1,632 IFC chambers representing 3 tier 1 ENCODE cell lines16 (H1 human embryonic stem cells [ESCs], K562 chronic myelogenous leukemia and GM12878 lymphoblastoid cells) as well as from V6.5 mouse ESCs, EML1 (mouse hematopoietic progenitor), TF-1 (human erythroblast), HL-60 (human promyeloblast) and BJ fibroblasts (human foreskin fibroblast).
Because regulatory elements are generally present at two copies in a diploid genome, we observe a near digital (0 or 1) measurement of accessibility at individual elements within individual cells (Extended Data Fig. 3a). For example, we estimate that 9.4% of promoters are represented in a typical scATAC-seq library (Extended Data Fig. 3). The sparse nature of scATAC-seq data makes analysis of cellular variation at individual regulatory elements impractical. We therefore developed an analysis infrastructure to measure regulatory variation using changes of accessibility across sets of genomic features (Fig. 2a,b). To quantify this variation, we first choose a set of open chromatin peaks, identified using the aggregate accessibility track, that share a common characteristic (such as transcription factor binding motifs, ChIP-seq peaks, or cell cycle replication timing domains). We then calculate, within individual cells, the observed fragments in these regions minus the expected fragments, downsampled from the aggregate profile. To correct for bias, we divide this by the root mean square of fragments expected from a background signal (BS) constructed to estimate technical and sampling error within single-cell data sets (Methods and Extended Data Fig. 4). Herein, we refer to this metric as “deviation”. Finally, for any set of features, we aggregate the deviation measurements across cells (Fig. 2b) to obtain an overall “variability” score, a metric of excess variance over the background signal.
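The deviation and variability calculation can be sketched as follows. This is a simplified illustration rather than the implementation described in Methods: it assumes a cells-by-peaks count matrix, takes the expected count as each cell's total fragments multiplied by the peak set's share of the aggregate profile, builds the background from randomly drawn peak sets of equal size (the background signal in Methods may be constructed differently), and summarizes variability as the root mean square of deviations across cells. The function names are hypothetical.

```python
import numpy as np


def deviations(counts, peak_set, n_background=50, seed=0):
    """Per-cell deviation for one peak set.

    counts   : cells x peaks array of fragment counts
    peak_set : indices of peaks sharing an annotation (e.g. a TF motif)
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    peak_set = np.asarray(peak_set)
    n_peaks = counts.shape[1]
    per_cell_total = counts.sum(axis=1)                 # fragments in peaks, per cell
    peak_fraction = counts.sum(axis=0) / counts.sum()   # aggregate accessibility profile

    def observed_minus_expected(idx):
        observed = counts[:, idx].sum(axis=1)
        expected = per_cell_total * peak_fraction[idx].sum()
        return observed - expected

    signal = observed_minus_expected(peak_set)
    # Assumption: background peak sets are drawn uniformly at random.
    background = np.stack([
        observed_minus_expected(
            rng.choice(n_peaks, size=len(peak_set), replace=False))
        for _ in range(n_background)
    ])                                                  # n_background x cells
    rms_background = np.sqrt(np.mean(background ** 2, axis=0))
    rms_background = np.maximum(rms_background, np.finfo(float).eps)
    return signal / rms_background


def variability(dev):
    """Assumed summary: root mean square of deviations across cells."""
    return float(np.sqrt(np.mean(np.square(dev))))
```

Under this sketch a peak set that behaves like the background yields a variability near 1, and larger values indicate variance in excess of the background.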
We first focused our analysis on K562 myeloid leukemia cells, a cell type with extensive epigenomic data sets17,18. To comprehensively characterize variability associated with trans-factors within individual K562 cells, we computed variability across all available ENCODE ChIP-seq, transcription factor motifs and regions that differed in replication timing (as determined from Repli-Seq data sets19) (Fig. 2c,d). We found measures of cell-to-cell variability were highly reproducible across biological replicates (Extended Data Fig. 5). As expected for proliferating cells, we find increased variability within different replication timing domains, representing variable ATAC-seq signal associated with changes in DNA content across the cell cycle. In addition, we discover a set of trans-factors associated with high variability. These factors include sequence-specific transcription factors (TFs), such as GATA1/2, JUN, and STAT2, and chromatin effectors, such as BRG1 and P300. Immunostaining followed by microscopy or flow cytometry (Fig. 2e and Extended Data Fig. 6a–d) confirmed heterogeneous expression of GATA1 and GATA2. Principal component (PC) analysis of single-cell deviations across all trans-factors shows seven significant PCs, with PC 5 describing changes in DNA abundance throughout the cell cycle. This analysis suggests that high-variance trans-factors are variable independent of the cell cycle (Fig. 2f, Extended Data Fig. 6e–g). The remaining PCs show contributions from several TFs, suggesting that variance across sets of trans-factors represents distinct regulatory states in individual cells.
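A principal component analysis of this kind can be sketched with a singular value decomposition of the cells-by-factors deviation matrix. The text does not state how PC significance was assessed, so the permutation criterion below is an assumption made for illustration: a PC is counted as significant if it explains more variance than the first PC of any column-shuffled copy of the matrix. Function names are hypothetical.

```python
import numpy as np


def pca(dev_matrix, n_components=None):
    """PCA by SVD of a cells x factors deviation matrix.

    Returns (scores, components, explained_variance_fraction).
    """
    X = np.asarray(dev_matrix, dtype=float)
    X = X - X.mean(axis=0)                       # center each factor
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    scores = U * S                               # cell coordinates on each PC
    n = n_components
    return scores[:, :n], Vt[:n], explained[:n]


def n_significant_pcs(dev_matrix, n_permutations=20, seed=0):
    """Count PCs exceeding a permutation null (assumed criterion)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(dev_matrix, dtype=float)
    observed = pca(X)[2]
    null_first_pc = []
    for _ in range(n_permutations):
        # Shuffle each factor independently to break co-variation between factors.
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        null_first_pc.append(pca(shuffled)[2][0])
    return int(np.sum(observed > max(null_first_pc)))
```

Inspecting the rows of `components` then shows which trans-factors contribute to each PC, which is how a cell-cycle-associated component could be told apart from TF-driven ones.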
We hypothesized that variation associated with different trans-factors can synergize, either through cooperative or competitive binding, to induce or suppress site-to-site variability in chromatin accessibility. For example, the most variant factors in K562 cells – GATA1 and GATA2 – display expression heterogeneity and also bind an identical consensus sequence “GATA,” suggesting these factors may compete for access to DNA sequences. In support of this hypothesis, we find regulatory elements with both GATA1 and GATA2 ChIP-seq signals show increased variability in accessibility, whereas sites with only GATA1 or GATA2 show substantially less variability (Fig. 2g, Extended Data Fig. 6h). In contrast, we find no substantial change in variability of GATA1 binding sites that co-occur with JUN or CEBPB (Extended Data Fig. 6i). We also find peaks unique to GATA1 binding are significantly more accessible than peaks unique to GATA2 (Extended Data Fig. 6k–l) supporting the hypothesis that GATA1, an activator of accessibility, competes with GATA2 to induce single-cell variability. Extending this analysis to all TF ChIP-seq data sets revealed a trans-factor synergy landscape for accessibility variation (Fig. 2g and Extended Data Fig. 6j). For example, chromatin accessibility variance associated with GATA2 binding is significantly enhanced when the same region could also be bound by GATA1, TAL1 or P300. In contrast, CTCF, SUZ12, and ZNF143 appear to act as general suppressors of accessibility variance, unless associated with proximal binding of ZNF143 or SMC3, the latter a cohesin subunit involved in chromosome looping18,20. Thus, single cell accessibility profiles nominate distinct trans-factors that, in combination, induce or suppress cell-to-cell regulatory variation.
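The comparison of co-bound and uniquely bound sites can be sketched as a simple partition of peaks followed by scoring each partition. This is an illustration only: peak identifiers are assumed to be hashable labels such as integer indices, and `score_fn` stands in for whatever per-set variability measure is used. Both function names are hypothetical.

```python
def partition_by_cobinding(peaks_a, peaks_b):
    """Split peaks into those bound only by factor A, only by B, or by both."""
    a, b = set(peaks_a), set(peaks_b)
    return {
        "a_only": sorted(a - b),
        "b_only": sorted(b - a),
        "both": sorted(a & b),
    }


def score_partitions(partitions, score_fn):
    """Apply a caller-supplied scoring function to each partition.

    score_fn takes a list of peak identifiers and returns a number,
    for example a variability score computed over those peaks.
    """
    return {name: score_fn(peaks) for name, peaks in partitions.items()}
```

A higher score for the "both" partition than for either single-factor partition would correspond to the synergy pattern described for GATA1 and GATA2.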
To validate our ability to detect changes in accessibility variance, we used chemical inhibitors to modulate potential sources of cell-to-cell variability. Inhibition of cyclin-dependent kinases 4 and 6 (CDK4/6), essential components of the cell cycle, caused a marked reduction of variability within peaks associated with DNA replication timing domains (Repli-seq) (Fig. 3a). The addition of inhibitors of JUN or BCR-ABL kinases (JNKi and Imatinib, respectively) increased G1/S-associated variability, suggesting an increase in the subpopulation of G1/S cells, which was validated with flow cytometry (Extended Data Fig. 7). JUN variability was one of the top changes caused by JNKi but not Imatinib, suggesting that high-variance trans-factors can also be specifically and pharmacologically modulated. Tumor necrosis factor (TNF) treatment of GM12878 cells specifically modulated accessibility variability at NF-κB sites (Fig. 3b), consistent with the known stochastic and oscillatory property of nuclear shuttling in this system21. Together, these results show that variability can be experimentally modulated and further demonstrate that variability is not solely dependent on the cell cycle.
We observe that trans-factors associated with high variability are generally cell-type specific. Hierarchical bi-clustering of single-cell deviations generated from three cell lines reveals cell-type specific sets of transcription factor motifs associated with high variability (Fig. 3c). This analysis also shows cells from different biological replicates cluster with their cell type of origin (with a single exception), suggesting scATAC-seq can also be used to deconvolve heterogeneous cellular mixtures. Systematic analysis of all assayed cell types identified high-variance trans-factor motifs that are generally unique to specific cell types (Fig. 3d and Extended Data Fig. 8a). For example, regions associated with GATA TFs are most variant in K562s while regions associated with master pluripotency TFs Nanog and Sox2 are most variant in mouse embryonic stem cells (ESCs), consistent with previous observations of expression variation of these factors22,23. Importantly, we also find high variability of GATA1 and PU.1 (SPI1) binding accessibility in EML cells, a cell type previously shown to have >200x GATA1 and >15x PU.1 expression differences within clonal cellular subpopulations1. Interestingly, the complete set of identified high-variance trans-factors contains a number of TFs previously reported to dynamically localize into the nucleus, including NF-κB, JUN, and ETS/ERG21,24,25, suggesting that temporal fluctuations in TF concentration may be driving observed chromatin accessibility heterogeneity. Finally, we find BJ fibroblasts and HL-60s exhibit less variance among this set of annotated trans-factor motifs, suggesting differences in the global levels of trans-factor variability across cell lines. Specific chromatin states and histone modifications26 are also sometimes associated with accessibility variation in single cells (Extended Data Fig. 8b,c). Overall, these findings suggest that trans-factors promote cell-type specific chromatin accessibility variation genome-wide.
Patterns of variation in accessibility along the linear genome in individual cells reveal an unexpected connection to higher-order chromosome folding. We calculated single-cell deviations within sliding windows across the genome, each encompassing a fixed number of peaks (N=25) (Fig. 4a). We then determined which windows co-varied within individual cells by calculating the co-correlation of each window with all others on the same chromosome (Extended Data Fig. 9a,b). We further enhanced this co-correlation matrix with a secondary correlation analysis, using methods similar to those employed in chromosome conformation studies9 (Methods). The resulting matrix, which identifies pairs of positions in the genome where accessibility co-varies within individual cells, yields Mb-scale correlation domains highly concordant with previously observed chromatin domains29 (Fig. 4b–d and Extended Data Fig. 9c–i) (R=0.61 for chromosome 1). These data provide independent biological validation of large-scale compartmentalization of higher-order chromatin structure9,29. Moreover, these results suggest that higher-order chromatin interactions may drive regulatory variability in cis (elements that are close together tend to be open together), and that ensemble chromosome conformation data may arise in part from the statistical properties of single-cell variation in co-regulated accessibility, a hypothesis also supported by single-cell FISH measurements of interactions between DNA loci30.
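The two-step correlation analysis can be sketched as follows for a single chromosome. The sketch assumes a cells-by-peaks count matrix with peaks ordered by genomic position, uses observed minus expected window counts in place of the full deviation metric, and interprets the secondary analysis as a correlation of the rows of the first correlation matrix, by analogy with compartment analysis of contact maps. The window step is not given in the text, so non-overlapping windows are an assumption, and the function names are hypothetical.

```python
import numpy as np


def window_sums(counts, window=25, step=25):
    """Sum fragment counts over windows of a fixed number of consecutive peaks."""
    counts = np.asarray(counts, dtype=float)
    n_cells, n_peaks = counts.shape
    cumulative = np.concatenate(
        [np.zeros((n_cells, 1)), np.cumsum(counts, axis=1)], axis=1)
    starts = np.arange(0, n_peaks - window + 1, step)
    return cumulative[:, starts + window] - cumulative[:, starts]


def cis_correlation(counts, window=25, step=25):
    """Return (first, second) window-by-window correlation matrices.

    first  : correlation of window residuals across cells
    second : correlation between the rows of `first`
    """
    counts = np.asarray(counts, dtype=float)
    sums = window_sums(counts, window, step)
    per_cell_total = counts.sum(axis=1, keepdims=True)
    window_fraction = sums.sum(axis=0, keepdims=True) / counts.sum()
    residual = sums - per_cell_total * window_fraction   # observed - expected
    first = np.corrcoef(residual, rowvar=False)
    second = np.corrcoef(first)
    return first, second
```

The second matrix sharpens block structure because two windows score highly when they share the same pattern of correlation with every other window, not only when they correlate directly. Windows with zero variance across cells would produce undefined entries and would need to be removed first.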
Using scATAC-seq we dissected single-cell epigenomic heterogeneity and linked cis- and trans-effectors to variability in accessibility profiles within individual epigenomes. We identify trans-factors associated with increased accessibility variance, which we call high-variance trans-factors. Additionally, other trans-factors such as CTCF appear to buffer variability, perhaps by providing a stable anchor of chromatin accessibility or insulator function that dampens potential fluctuations. Conversely, co-occurrence with other factors such as P300 appears to amplify variability, perhaps due to synergistic interactions. Lineage-specific master regulators are associated with cell-type specific single-cell epigenomic variability across several cell types, suggesting that control of single-cell variance is a fundamental characteristic of different biological states. Finally, variation of chromatin accessibility in cis is highly correlated with previously reported chromosome compartments, opening the intriguing possibility that this component of epigenomic noise has its roots in higher-order chromatin organization. Altogether, these data provide new hypotheses about the regulatory mechanisms that give rise to single-cell heterogeneity.
We envision that future studies will enhance the utility of scATAC-seq by further improving the recovery of DNA fragments, increasing throughput, and refining methods of data analysis (Supplementary Discussion). Improvements to throughput and new statistical tools will enable single cells to be partitioned by cell state and analyzed in aggregate to find the individual peaks that drive variability (Extended Data Fig. 10). In addition, we anticipate scATAC-seq may be paired with existing approaches in microscopy and single-cell RNA-seq to provide opportunities for systems analysis of individual cells. Such an approach will link regulatory variation to details of phenotypic variation, promising new insight into the molecular underpinnings of cellular heterogeneity. We believe scATAC-seq will likewise enable the interrogation of the epigenomic landscape of small or rare biological samples, allowing for detailed, and potentially de novo, reconstruction of cellular differentiation or disease at the fundamental unit of investigation: the single cell.
Acknowledgments
This work was supported by National Institutes of Health (NIH) P50HG007735 (to H.Y.C. and W.J.G.), UH2 AR067676 and Lifespan Extension Foundation (H.Y.C.), U19AI057266 (to W.J.G.), the Rita Allen Foundation (to W.J.G.) and the Baxter Foundation Faculty Scholar Grant (to W.J.G.); H.Y.C. is an Early Career Scientist of the Howard Hughes Medical Institute. J.D.B. acknowledges support from the National Science Foundation Graduate Research Fellowships and NIH training grant T32HG000044. M.P.S. acknowledges the NIH and the National Human Genome Research Institute (NHGRI) for funding through 5U54HG00455805. We thank members of the Greenleaf and Chang labs, as well as the Fluidigm team, including Larry Xi, for useful discussions. We acknowledge the S. Kim lab for assistance with FACS sorting and the C. Bustamante lab for help with sequencing. We also thank Robert Nichols, Claire Mazumdar, Vittorio Sebastiano and Viviana Risca for cells.
Footnotes
Author Contributions
J.D.B., H.Y.C., and W.J.G. conceived of the method. J.D.B., B.W., M.G., and D.R. developed the Fluidigm C1 microfluidic protocols. B.W. performed all scATAC-seq experiments with supervision from J.D.B. U.M.L. conducted the flow analysis, immunostains and drug treatments. J.D.B. developed and implemented the analysis infrastructure with input from W.J.G. All authors interpreted the data and wrote the manuscript. W.J.G. and H.Y.C. supervised all aspects of this work.
Competing Interests
Stanford University has filed a provisional patent application on the methods described, and J.D.B., H.Y.C., and W.J.G. are named as inventors. D.R. and M.L.G. declare competing financial interests as employees of Fluidigm Corp.
Materials and Methods
All data are deposited in GEO under accession number GSE65360.
References
- 1. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature. 2008;453:544–547. doi: 10.1038/nature06965.
- 2. Imayoshi I, et al. Oscillatory control of factors determining multipotency and fate in mouse neural progenitors. Science. 2013;342:1203–1208. doi: 10.1126/science.1242366.
- 3. Patel AP, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–1401. doi: 10.1126/science.1254257.
- 4. Bendall SC, et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science. 2011;332:687–696. doi: 10.1126/science.1198704.
- 5. Raj A, Rifkin SA, Andersen E, van Oudenaarden A. Variability in gene expression underlies incomplete penetrance. Nature. 2010;463:913–918. doi: 10.1038/nature08781.
- 6. Jaitin DA, et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science. 2014;343:776–779. doi: 10.1126/science.1247651.
- 7. Smallwood SA, et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Meth. 2014;11:817–820. doi: 10.1038/nmeth.3035.
- 8. Zong C, Lu S, Chapman AR, Xie XS. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell. Science. 2012;338:1622–1626. doi: 10.1126/science.1229164.
- 9. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. doi: 10.1126/science.1181369.
- 10. Michor F, et al. Dynamics of chronic myeloid leukaemia. Nature. 2005;435:1267–1270. doi: 10.1038/nature03669.
- 11. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- 12. Thurman RE, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489:75–82. doi: 10.1038/nature11232.
- 13. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Meth. 2013;10:1213–1218. doi: 10.1038/nmeth.2688.
- 14. Goryshin IY, Reznikoff WS. Tn5 in vitro transposition. J Biol Chem. 1998;273:7367–7374. doi: 10.1074/jbc.273.13.7367.
- 15. Adey A, et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010;11:R119. doi: 10.1186/gb-2010-11-12-r119.
- 16. ENCODE Project Consortium. A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 2011;9:e1001046. doi: 10.1371/journal.pbio.1001046.
- 17. Gerstein MB, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100. doi: 10.1038/nature11245.
- 18. Xie D, et al. Dynamic trans-Acting Factor Colocalization in Human Cells. Cell. 2013;155:713–724. doi: 10.1016/j.cell.2013.09.043.
- 19. Hansen RS, et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proceedings of the National Academy of Sciences. 2010;107:139–144. doi: 10.1073/pnas.0912402107.
- 20. Parelho V, et al. Cohesins Functionally Associate with CTCF on Mammalian Chromosome Arms. Cell. 2008;132:422–433. doi: 10.1016/j.cell.2008.01.011.
- 21. Tay S, et al. Single-cell NF-κB dynamics reveal digital activation and analogue information processing. Nature. 2010;466:267–271. doi: 10.1038/nature09145.
- 22. Grün D, Kester L, van Oudenaarden A. Validation of noise models for single-cell transcriptomics. Nat Meth. 2014;11:637–640. doi: 10.1038/nmeth.2930.
- 23. Singer ZS, et al. Dynamic Heterogeneity and DNA Methylation in Embryonic Stem Cells. Molecular Cell. 2014;55:319–331. doi: 10.1016/j.molcel.2014.06.029.
- 24. Cai L, Dalal CK, Elowitz MB. Frequency-modulated nuclear localization bursts coordinate gene regulation. Nature. 2008;455:485–490. doi: 10.1038/nature07292.
- 25. Levine JH, Lin Y, Elowitz MB. Functional roles of pulsing in genetic circuits. Science. 2013;342:1193–1200. doi: 10.1126/science.1239999.
- 26. Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906.
- 27. Calo E, Wysocka J. Modification of Enhancer Chromatin: What, How, and Why? Molecular Cell. 2013;49:825–837. doi: 10.1016/j.molcel.2013.01.038.
- 28. Mikkelsen TS, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–560. doi: 10.1038/nature06008.
- 29. Kalhor R, Tjong H, Jayathilaka N, Alber F, Chen L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat Biotechnol. 2012;30:90–98. doi: 10.1038/nbt.2057.
- 30. Giorgetti L, et al. Predictive Polymer Modeling Reveals Coupled Fluctuations in Chromosome Conformation and Transcription. Cell. 2014;157:950–963. doi: 10.1016/j.cell.2014.03.025.