Abstract
Transcriptional control is exerted primarily through the binding of transcription factor proteins to regulatory elements in DNA. By virtue of eukaryotic DNA being complexed with histones, transcription factor binding to DNA alters or eliminates histone-DNA contacts, leading to increased accessibility of the DNA region to nuclease enzymes. This hypersensitivity to nuclease digestion has been used to define DNA binding events and regulatory elements across genomes, and to compare these attributes between cell types or conditions. These approaches make it possible to define the regulatory elements in a genome as well as to predict the regulatory networks of transcription factors and their target genes in a given cell state. As these chromatin accessibility assays are increasingly used, it is important to consider how to analyze the resulting data to avoid artifactual results or misinterpretation. In this review, we focus on some of the key technical and computational caveats associated with plant chromatin accessibility data, including strategies for sample preparation, sequencing, read mapping, and downstream analyses.
Keywords: Transcription factor, chromatin accessibility, ATAC-seq, DNaseI-seq, sequencing, normalization, peak calling
Introduction
The action of transcription factors (TFs) at DNA regulatory elements is a primary mechanism of gene regulation that underlies the transcriptional output of the genome in every cell. Thus, defining the networks of TFs and their regulatory elements that are in play in a given cell type is important for understanding transcriptional regulation in that cell state, as well as learning how cells respond to perturbations.
In eukaryotes, DNA is complexed with histone proteins to form nucleosome arrays. When a TF binds to DNA, it disrupts or displaces a local nucleosome, making that DNA region more accessible to nuclease enzymes. As such, this nuclease hypersensitivity can be used to define sites of factor binding to DNA and thereby to identify DNA regions with potential regulatory activity [1]. Historically, these studies were carried out using DNase I as the nuclease and examining one locus at a time [2]. In more recent years, the DNase I sensitivity (and sometimes micrococcal nuclease [MNase] sensitivity [3]) assay was coupled with high-throughput sequencing to define hypersensitive sites genome-wide [4]. While this DNase I-seq approach is powerful, it has recently been mostly supplanted by the Assay for Transposase Accessible Chromatin (ATAC-seq), which requires far lower numbers of input nuclei and greatly simplifies the sequencing library preparation step [5,6].
To date, chromatin accessibility profiling in plants using DNase I-seq, MNase-seq, and ATAC-seq has been employed in numerous species and has revealed a wealth of new information regarding the regulatory structure and dynamics in plant genomes. For example, accessibility studies have revealed that large genomes have a higher fraction of regulatory elements distal to genes than smaller genomes [7–12]. Chromatin accessibility studies have allowed the construction of gene regulatory networks during cell differentiation and organ development [8,13–15], as well as in response to environmental changes [16–18]. Additionally, examination of accessibility differences between mutant and wild-type organisms [19,20], and between accessions or species [8,21,22], has been highly informative. As these assays continue to be deployed in different contexts and scales, it is important to keep in mind their caveats and to follow best practices to avoid artifactual and misleading results.
In this review, we highlight some of the technical and computational challenges associated with the analysis of chromatin accessibility data, and we propose a set of best practices that will help ensure reliable results that are not biased by artifacts.
Considerations for sample preparation
The goal of a chromatin accessibility assay is to sequence the ends of the small DNA fragments arising from nuclease digestion at accessible sites in nuclei. However, contamination of the starting nuclear preparation with DNA-containing organelles such as plastids and mitochondria results in an abundance of organelle-derived reads in the final sequencing library. In extreme cases these reads can make up the vast majority of those obtained. Thus, it is important to eliminate these organelles and their DNA as much as possible. This can be achieved for essentially any plant species or tissue type by fluorescence-activated nuclei sorting (FANS) of crude DAPI-stained nuclei [7]. With the appropriate nuclear marker, FANS can also be used to isolate nuclei from specific cell types or tissues. Alternatively, the isolation of nuclei tagged in specific cell types (INTACT) method can be used to affinity purify nuclei from specific cell types or from all cells, depending on the promoter used to drive expression of the nuclear envelope targeting protein [23]. While it is also possible to reduce organelle contamination somewhat with the selective use of detergents and centrifugal fractionation of nuclei [23], fluorescence- or affinity-based nuclei purification methods generally perform better. Alternatively, several groups have recently reported the use of pooled synthetic guide RNAs with recombinant Cas9 to eliminate unwanted sequences, such as those from mitochondria, from a library prior to sequencing [24–26].
The fraction of accessibility reads aligning to the chloroplast genome can vary widely between starting tissue types, even using the same method for nuclei isolation. For example, using INTACT to prepare nuclei from Arabidopsis thaliana tissue and DNase I to digest accessible regions, we find a wide range of chloroplast contamination, from 2% of the reads when targeting the root to over 50% of the reads in samples from green tissues (Figure 1A). Thus, it is important to consider the abundances of organelles in the starting tissue and modify the nuclei purification protocol as needed to reduce their carryover.
Figure 1. Preparing the library.

(A) Degree of chloroplast contamination for various DNase I-seq samples. (B) qPCR assessment of enrichment for peak fragments and non-peak fragments in accessibility libraries. Values along x-axis indicate the log10(fold increase over gDNA). Specifically, if A = amount of template fragments in the ATAC-seq library within the test locus; B = amount of template fragments in the ATAC-seq library within the control nonpeak; C = amount of gDNA within the test locus; D = amount of gDNA within the control nonpeak, fold increase over gDNA = (A/B) / (C/D). (C) Relationship between the number of reads sequenced from amplified libraries of varying complexity and fraction of duplicated sequences. Curves generated via simulation (x) by sampling with replacement from a set of n unique starting fragments (n= 35M, 50M, 75M). Curves generated from real Arabidopsis and Sorghum reads () by subsampling a set of sequenced reads. (D) Cartoon displaying the effect of nuclei abundance (i.e. changing the substrate-to-enzyme ratio) on the complexity of the resulting library. (E) Fragments sequenced within an example peak (yellow region) in two ATAC reactions performed on the same nuclear preparation: one undiluted (top) and one diluted 1:10 (bottom). Horizontal line segments within the peak are paired-end sequenced fragments. Red line segments indicate duplicates.
Free nuclear DNA, likely arising from partially or completely lysed nuclei, can be another problematic contaminant in the nuclei preparation. Any such free DNA present during the nuclease cleavage step increases the background noise in accessibility measurements due to its high overall accessibility. The extent of this type of contamination is often measured as the fraction of accessibility reads that align within peaks of accessibility (a measure of signal-to-noise ratio). However, this information is generally only available after millions of reads have been sequenced. An alternative method of measuring levels of contamination with free DNA is to use qPCR to target known peak and non-peak regions in the amplified library, with the expectation that accessible peak regions should contain more template fragments than non-peak regions. For example, using this method, we determined that the nuclear preparations from Arabidopsis thaliana inflorescence meristem and roots were of high enough purity to proceed with sequencing, whereas a whole seedling preparation was not (Figure 1B). This qPCR-based approach is not only useful for analyzing the nuclear signal-to-noise ratio [27], but could also be used to quantify the abundance of organelle genomic fragments in the final library prior to sequencing.
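The fold-increase calculation defined in the Figure 1B legend can be sketched in a few lines. This is a minimal illustration rather than the analysis code used for the figure: the function name and the example quantities are hypothetical, and the inputs are assumed to be template amounts that have already been derived from the qPCR measurements.

```python
import math


def fold_increase_over_gdna(lib_peak, lib_nonpeak, gdna_peak, gdna_nonpeak):
    """Return (A/B) / (C/D) as defined in the Figure 1B legend.

    A = lib_peak      library template amount at the test (peak) locus
    B = lib_nonpeak   library template amount at the control non-peak locus
    C = gdna_peak     gDNA amount at the test locus
    D = gdna_nonpeak  gDNA amount at the control non-peak locus
    """
    amounts = (lib_peak, lib_nonpeak, gdna_peak, gdna_nonpeak)
    if any(value <= 0 for value in amounts):
        raise ValueError("all template amounts must be positive")
    return (lib_peak / lib_nonpeak) / (gdna_peak / gdna_nonpeak)


# Hypothetical template amounts (arbitrary units), for illustration only.
enrichment = fold_increase_over_gdna(80.0, 2.0, 10.0, 10.0)
log10_enrichment = math.log10(enrichment)  # value plotted on the x-axis
```

Normalizing to gDNA at the same two loci is intended to cancel differences in primer efficiency between the peak and non-peak amplicons.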
Once a library is deemed acceptable, it is PCR-amplified, typically for 8-12 cycles, and millions of fragments are end-sequenced in order to identify sites at which the genomic DNA was cut, either by DNase I or by the Tn5 transposase (in which case the cut is followed by a transposon insertion event). These cut sites are used to measure accessibility. Invariably, some fraction of the sequenced fragments have identically located pairs of cut sites (sequence duplicates). We find that the majority of these duplicates arise due to limited complexity of accessibility libraries rather than PCR amplification bias. Two main factors limit the complexity (i.e., the number of unique starting fragments) of the library. First, producing accessibility libraries involves generating fragments from a limited (accessible) fraction of the genome. Second, due to sequence bias of the cleavage enzymes, those fragments also tend to have a limited set of start and end sites within those regions. After amplification, the sequencing process is analogous to sampling, with replacement, from an urn with n unique balls. The more balls sampled, the more duplicate balls in the sample. This property is reflected in a simple quadratic relationship between the number of fragments sequenced and the fraction of re-sampled fragments among those sequenced (i.e., sequence duplicates) both for simulated and real data. This pattern would be observed under completely unbiased amplification of the initial fragment library (Figure 1C). The complexity of the library can be increased by changing the ratio of substrate (number of genomes) to cleavage enzyme (e.g., Tn5, DNase I). For example, reducing the number of nuclei, and thus the number of genomes, should theoretically result in a higher complexity library (Figure 1D). This is indeed what we observe experimentally in Arabidopsis thaliana (Figure 1E).
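The urn analogy can be made concrete with a short sketch. It assumes the idealized case in which every one of the n unique starting fragments is equally likely to be sequenced (completely unbiased amplification). The function names and the scaled-down library sizes are illustrative and are not taken from the simulations behind Figure 1C.

```python
import math
import random


def expected_duplicate_fraction(n_unique, n_reads):
    """Expected fraction of duplicate reads when n_reads are drawn,
    with replacement, from n_unique equally likely fragments."""
    if n_unique <= 0 or n_reads <= 0:
        raise ValueError("n_unique and n_reads must be positive")
    if n_unique == 1:
        return 1.0 - 1.0 / n_reads
    # Expected number of distinct fragments seen: n * (1 - (1 - 1/n)^m),
    # written with expm1/log1p for numerical stability at large n.
    expected_unique = -n_unique * math.expm1(n_reads * math.log1p(-1.0 / n_unique))
    return 1.0 - expected_unique / n_reads


def simulated_duplicate_fraction(n_unique, n_reads, seed=0):
    """Monte Carlo version of the same quantity (one simulated library)."""
    rng = random.Random(seed)
    seen = {rng.randrange(n_unique) for _ in range(n_reads)}
    return 1.0 - len(seen) / n_reads


# Scaled-down illustration: 10,000 unique fragments instead of tens of millions.
for depth in (1_000, 5_000, 10_000, 20_000):
    print(depth, round(expected_duplicate_fraction(10_000, depth), 3))
```

Under these assumptions the duplicate fraction depends only on the ratio of reads sequenced to unique fragments, so a more complex library tolerates deeper sequencing before duplicates dominate.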
Starting off with the right sequencing and read mapping parameters
The aim of accessibility analysis is often to compare accessibility across biological conditions. An important aspect of this is ensuring that, in all samples compared, the same regions of the genome are examined with equal measure. This point is more obvious when comparing across distantly related strains or different species, as genomic content can vary greatly between them. However, it is also important when comparing ecotypes within the same species [21] or, in some cases, even mutants of the same ecotype [28]. Thus, it is important to define and consider only genomic regions that are shared among all samples examined. Accessibility in poorly assembled or repetitive regions of a genome is also difficult to measure, and certain aspects of the sample library preparation greatly affect the accuracy of these measurements. In this section we discuss the importance of “normalizing” the genome.
Certain types of regions, such as centromeres and rDNA arrays, are obvious candidates for exclusion from analysis (i.e., blacklisted regions), as those regions are known a priori to be poorly assembled. However, some regions of the genome are more “mappable” than others, depending on the amount and degree of repetitiveness in the genome, due to transposons, gene duplications, or polyploidy. The mappability of a region of the genome is reflected in the mapping quality score (mapq) assigned by the alignment algorithm to reads aligning within that region. For example, Arabidopsis thaliana has fewer repetitive regions than maize. This is reflected in the fraction of reads aligning to the respective genomes with the lowest mapping quality scores (Figure 2A). Note that although the accessibility of repetitive regions is difficult to accurately measure, the mappable (i.e., non-repetitive) regions of the genome are far more gene-dense than the unmappable regions, suggesting that although we are missing a large fraction of the genome, we are detecting the accessible sites (potential regulatory regions) for a large fraction of genes. Libraries with longer fragments in which more bases are sequenced are better able to distinguish repeated regions of a genome than those with shorter fragments and shorter reads, thus making more of the genome mappable. However, when comparing accessibility across samples, it is important to use libraries with the same fragment and read length, otherwise hundreds of kilobases may appear to contain more accessible sites in one sample when, in fact, those regions are simply more mappable (Figure 2B).
Figure 2. Normalizing the mappable genome.

(A) Fraction of ATAC-seq reads mapping to the nuclear genome (centromeres removed) with various assigned mapping quality scores in different species. Minimum values for BWA and BOWTIE2 were 0 and both 0 and 1, respectively. Maximum values for BWA and BOWTIE2 were 60 and 42, respectively. (B) The number of bases that are unmappable in A. thaliana for simulated whole genome shotgun reads of various fragment and read lengths. Black double-headed arrows indicate the difference in number of kilobases covered. (C) Distribution of the number of cuts per 1 Mb window in chr1A of tetraploid wheat, excluding reads aligning with mapping score below indicated minimum mapq. (D) The 1 Mb window with the second-highest number of cuts/Mb on chr1A. Region highlighted in red is homologous to the mitochondrial genome. (E) The 1 Mb window with the third-highest number of cuts/Mb on chr1A. Regions highlighted in green are homologous to the chloroplast genome. (F) The distribution of mapq scores for all reads mapping to chr1A (gray), reads mapping to red highlighted region in panel D (red), reads mapping to green highlighted region in panel E (green).
We can improve the reliability of our accessibility measurements by excluding reads that are ambiguously mapped (i.e., mapped with low confidence). Because both the repetitiveness and the quality of the genomic assemblies vary by species, the threshold distinguishing high confidence from low confidence mapping is species-specific and needs to be determined empirically. In Arabidopsis thaliana, for example, reads with a mapq score above the minimum (0 for BWA [29], 0 or 1 for BOWTIE2 [30]) can be considered as mapping with high confidence. However, in tetraploid wheat a higher minimum mapq value is needed. We can determine this threshold by plotting the distribution of cuts within 1 Mb windows. For example, within chromosome 1A there are three megabase-length regions containing unusually high amounts of cleavage when the minimum BWA-assigned mapq value is 1 (Figure 2C). The megabase window with the highest number of cleavages is telomeric, while the other two contain large regions homologous to the mitochondrial (Figure 2D) and chloroplast (Figure 2E) genomes. The reads mapping to the organellar regions tend to have lower mapq scores than reads mapping outside of those regions (Figure 2F). These outliers disappear when the minimum BWA-assigned mapq value is raised to 40 (Figure 2C), indicating that a threshold mapq value of 40 is appropriate for tetraploid wheat.
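The threshold search described above amounts to tabulating cuts per 1 Mb window at several minimum mapq values and checking whether outlier windows persist. The sketch below assumes cut sites have already been extracted from the alignments as (position, mapq) pairs for a single chromosome. The function name and the toy data are hypothetical.

```python
from collections import Counter

WINDOW_SIZE = 1_000_000  # 1 Mb windows, as in Figure 2C


def cuts_per_window(cuts, min_mapq, window_size=WINDOW_SIZE):
    """Count cut sites per window, keeping only reads with mapq >= min_mapq.

    cuts: iterable of (position, mapq) pairs for one chromosome.
    Returns a Counter mapping window index -> number of retained cuts.
    """
    counts = Counter()
    for position, mapq in cuts:
        if mapq >= min_mapq:
            counts[position // window_size] += 1
    return counts


# Toy data: window 1 is inflated by low-confidence alignments.
toy_cuts = [
    (100, 60), (200, 60),
    (1_500_000, 60), (1_600_000, 0), (1_700_000, 3), (1_800_000, 5),
    (2_100_000, 60),
]
for threshold in (1, 40):
    print(threshold, dict(cuts_per_window(toy_cuts, threshold)))
```

Comparing the per-window distributions across thresholds, as in Figure 2C, shows whether raising the minimum mapq removes the outlier windows without depleting the rest of the chromosome.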
Identification of reproducibly accessible sites within a condition and differentially accessible regions between conditions
Identification of accessible chromatin regions (ACRs) across the genome is a process of reducing the per-base accessibility information to a set of discrete hyper accessible loci, thereby itemizing the biological regions of interest. However, many of the popular peak-calling algorithms are sensitive to features of the quality and quantity of the sequenced library. In comparing across samples to identify differentially accessible regions, the aim is to ensure that each sample is represented equally. Here we highlight some potential analytical pitfalls in this process.
Peak-callers can be classified as probabilistic (i.e., identifying regions with significantly more cuts than background), heuristic (i.e., identifying regions with the highest numbers of cuts), or a combination of both (usually the significance test is performed first, followed by filtering for only high-cut peaks). Although the relative merits of peak calling algorithms (most of which were originally developed to identify ChIP-peaks) have been examined frequently over the years [31,32], in practice, a variety of peak-calling methods are still employed, most commonly a probabilistic algorithm (e.g., MACS2 or Homer) followed by some sort of heuristic filtering [8,13,16,21,33]. Some use replicates to identify sets of reproducibly-found peaks, and some filter that set of reproduced peaks further by assigning each pair of reproduced peaks an irreproducible discovery rate (IDR), retaining only the peak-pairs with sufficiently low IDR [34]. However, as we discuss below, for the IDR method to be reliable, it is necessary to start with comparably prepared replicates.
Fairly early in the era of genome-wide accessibility analysis, it became clear that the number of peaks called by significance-based peak-callers (such as HOTSPOT [35], MACS2 [36], and Homer [37]) is strongly influenced by the simple quantity of reads collected [38–40]. A likely reason that this feature became apparent during accessibility analysis is that the number of ACRs is surprisingly consistent not only across tissues and treatments, but across species [12]. Although early accessibility papers in Drosophila and human tissue [41–44] did not explicitly control for read depth prior to identification of accessible sites, their samples, by and large, happened to be sequenced to a similar depth of coverage, thus not introducing catastrophic errors to their analysis. Plant accessibility papers that emerged soon after those seminal human and Drosophila papers generally ensured that samples had comparable numbers of reads before peaks were called, usually by subsetting to a consistent depth across samples [8,9,16,21,33,45].
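Subsetting to a consistent depth can be sketched as random sampling without replacement down to the size of the smallest sample. This is a toy illustration that holds reads in memory as Python lists. The function name is hypothetical, and real datasets would normally be subsampled directly from alignment files.

```python
import random


def subsample_to_common_depth(samples, seed=0):
    """Randomly subsample each sample to the depth of the shallowest one.

    samples: dict mapping sample name -> list of reads (any objects).
    Returns a new dict with the same keys and equal-length read lists.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    target_depth = min(len(reads) for reads in samples.values())
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return {
        name: rng.sample(reads, target_depth)
        for name, reads in samples.items()
    }


# Toy example: integers stand in for reads.
toy_samples = {"root": list(range(50)), "leaf": list(range(20))}
balanced = subsample_to_common_depth(toy_samples)
```

Fixing the random seed keeps the subsampled sets reproducible, which matters when peak calls are later compared between analyses.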
Here we wish to emphasize the effect that sample quality (i.e., purity of ATAC substrate, as measured by the fraction of ATAC-reads landing in ACRs) has on peak calling, highlighting the differences between probabilistic and heuristic peak-calling. As discussed above, all nuclei preparations are contaminated to some degree by free DNA. As such, some fraction of fragments within all accessibility libraries derive from cleavage of this free DNA, which increases the background signal. The degree of contamination is determined by two factors: (i) ease of extracting intact nuclear DNA from tissue, and (ii) size of the non-accessible genome. In part due to the cell wall, pure, intact nuclei are more challenging to extract from plant tissue than from animal tissue. Furthermore, some plant tissue types may be more challenging to extract nuclei from because of secondary metabolites. Crop plants also tend to have large genomes: banana (~600 Mbp), sorghum (~800 Mbp), soybean (~1.1 Gbp), Dura palm (~1.8 Gbp), pearl millet (~1.9 Gbp), maize (~3 Gbp), and bread wheat (~15 Gbp). Therefore, many crop plants present a special challenge when it comes to generating consistent data quality across samples.
We start by demonstrating, once again, the effect of read quantity on the numbers of peaks identified by a popular probabilistic peak-caller, MACS2 [32], then comparing the results with those obtained using a simple heuristic peak-caller, the findpeaks function in the pracma package [46]. Downsampling an ATAC-seq dataset reveals that the number of peaks identified by the significance-based peak caller (MACS2) is indeed strongly dependent on the number of reads sampled. By contrast, the heuristic peak caller approaches an asymptote with far fewer reads (Figure 3A), with false positive peaks arising when low numbers of reads are supplied (below approximately 25M reads, in this example).
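To illustrate what a thresholded local-maxima caller does, the sketch below scans a per-bin cut-count signal and reports bins that are local maxima at or above a minimum height. It is a simplified Python stand-in written for this illustration and is not the findpeaks function of the pracma R package. The function name, the plateau rule, and the toy signal are all assumptions.

```python
def simple_peaks(signal, min_height):
    """Return indices of local maxima with height >= min_height.

    A bin counts as a peak when it is strictly higher than its left
    neighbor and at least as high as its right neighbor, so a flat-topped
    peak is reported once, at its left edge. The first and last bins are
    never reported because they lack a neighbor on one side.
    """
    peaks = []
    for i in range(1, len(signal) - 1):
        if (
            signal[i] >= min_height
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]
        ):
            peaks.append(i)
    return peaks


# Toy cut counts per bin: two strong peaks and one weak one.
toy_signal = [0, 2, 5, 2, 0, 1, 9, 9, 1, 0, 3, 0]
strong = simple_peaks(toy_signal, min_height=4)
lenient = simple_peaks(toy_signal, min_height=3)
```

Because the rule depends only on local shape and a height cutoff, the number of peaks it reports stabilizes once the strongest sites are well sampled, whereas a low cutoff at shallow depth admits noise.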
Figure 3. Calling accessibility peaks and differentially accessible regions.

(A) Relationship between the number of peaks identified using a significance-based peak-caller (MACS2 v.2.1.0) and a simple thresholded local maxima caller (simple peaks, findpeaks function in the pracma package, v.2.2.9) in an ATAC-seq sample with increasing amounts of reads sequenced. (B) Relationship between the number of peaks identified using a significance-based peak-caller (MACS2 v.2.1.0) and a simple thresholded local maxima caller (simple peaks, findpeaks function in the pracma package, v.2.2.9) in an ATAC-seq sample consisting of 40M reads with decreasing levels of contamination by simulated whole genome shotgun reads. The set of peaks identified in a pure (100% ATAC-seq) sample is defined as the benchmark. Venn diagrams above each dot indicate the number of peaks in the benchmark set that are found (true positives; gold), missed (false negatives; red) or not found in the benchmark set (false positives; gray). (C) Union peaks are generated by merging peaks from multiple samples (as in bedops -m).
We then examine the effects of read quality on peak-calling. Rather than downsampling, as above, we generated accessibility libraries with varying degrees of purity by mixing reads from an actual ATAC-seq library with progressively greater fractions of ‘contaminating’ simulated whole-genome-shotgun reads; these latter reads approximate reads derived from ATAC reactions on contaminating de-chromatinized DNA. As seen in Figure 3B, many peaks in the benchmark set are missed by the significance-based peak-caller, MACS2, in samples with greater contamination levels (i.e., a lower percentage of bona fide ATAC-seq reads). However, at similar contamination levels, a much greater fraction of the peaks in our benchmark set are identified by a simple peak caller, albeit with some fraction of false positives (gray, Figure 3B), again suggesting that a minimum number of reads should be collected to obtain a low number of false positive peaks. In other words, for the simple peak caller, the precision (true positives / [true positives + false positives]) and recall (true positives / [true positives + false negatives]) remain fairly consistent across the sample quality levels shown, with precision decreasing from 83% to 80% and recall increasing from 72% to 76% as sample quality decreases. For the significance-based peak caller, precision remains extremely high (>97% for all quality levels shown) but recall drops dramatically with sample quality, down to 15%. This indicates that the use of a simple peak caller should be accompanied by an IDR-type method for narrowing down the list of true peaks, but also that the use of a significance-based peak caller on a sample of low quality would result in omission of many true accessible regions.
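Precision and recall against a benchmark peak set can be computed as sketched below. The matching rule is an assumption made for this example: a called peak counts as a true positive if it overlaps any benchmark peak by at least one base, with peaks given as half-open (start, end) intervals on a single chromosome. The function names and toy peaks are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(called, benchmark):
    """Return (precision, recall) of called peaks relative to a benchmark set.

    precision = called peaks overlapping a benchmark peak / all called peaks
    recall    = benchmark peaks overlapped by a called peak / all benchmark peaks
    """
    called_hits = sum(
        1 for c in called if any(overlaps(c, b) for b in benchmark)
    )
    benchmark_hits = sum(
        1 for b in benchmark if any(overlaps(b, c) for c in called)
    )
    precision = called_hits / len(called) if called else 0.0
    recall = benchmark_hits / len(benchmark) if benchmark else 0.0
    return precision, recall


# Toy example: 2 of 3 called peaks are real; 2 of 4 benchmark peaks are found.
toy_benchmark = [(100, 200), (300, 400), (500, 600), (700, 800)]
toy_called = [(150, 250), (310, 390), (900, 950)]
toy_precision, toy_recall = precision_recall(toy_called, toy_benchmark)
```

Counting hits separately from the called and benchmark sides avoids double counting when one called peak spans two benchmark peaks or vice versa.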
The main goal of most accessibility analyses is to determine regions that are differentially accessible across experimental conditions (genotypes, environments, etc.). In this step it is important to make quantitative comparisons between samples, rather than simply identifying presence/absence of an accessible region based on the peak-caller. A presence/absence approach preferentially identifies regions of low accessibility in which accessibility happens to cross the peak-calling threshold in some samples and not others (for an example, see Supplemental Figure 2A and B in [21]). A better approach is to define a set of union ACRs, those defined across all conditions, by merging ACRs from each condition (Figure 3C), and to look for those union regions with high cut count variance between conditions. Algorithms developed to identify differentially expressed genes using transcript counts, such as DESeq2 [47], can then be applied to quantitatively test these union regions for differential accessibility between conditions.
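The union-ACR step can be sketched as an interval merge followed by counting cuts in each merged region. The merge below is similar in spirit to a bedops merge but is a plain Python illustration, not a reimplementation of that tool. The function names and toy coordinates are hypothetical, and intervals are assumed to be half-open (chrom, start, end) tuples.

```python
def merge_peaks(peak_sets):
    """Merge overlapping or adjoining peaks from several samples.

    peak_sets: list of per-sample peak lists, each peak (chrom, start, end).
    Returns a sorted list of disjoint union regions.
    """
    intervals = sorted(peak for peaks in peak_sets for peak in peaks)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)  # extend current region
        else:
            merged.append([chrom, start, end])  # start a new region
    return [tuple(region) for region in merged]


def count_cuts(union_regions, cuts):
    """Count cut sites, given as (chrom, position), within each union region."""
    return [
        sum(1 for c, pos in cuts if c == chrom and start <= pos < end)
        for chrom, start, end in union_regions
    ]


condition_a = [("chr1", 100, 200), ("chr1", 500, 600)]
condition_b = [("chr1", 150, 250), ("chr2", 100, 200)]
union = merge_peaks([condition_a, condition_b])
counts_a = count_cuts(union, [("chr1", 120), ("chr1", 240), ("chr1", 550), ("chr2", 50)])
```

Collecting one such count vector per sample yields a regions-by-samples matrix of the kind that count-based differential testing tools take as input.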
Deriving biological meaning from genome-wide accessibility data
Once differentially accessible regions have been identified between conditions, many new questions arise. For example: Which transcription factors bind in these regions? Which genes are the targets of this presumed differential regulation? What does the higher level regulatory network of TFs look like within and across conditions? In this section we briefly introduce useful approaches for addressing such questions, as well as associated caveats.
The accessibility landscape in a given condition is thought to represent the universe of protein binding sites in that condition, but assigning any binding site to a specific factor is challenging. Some studies have used TF “footprinting”, which relies on the detection of short stretches of depressed cleavage within an otherwise accessible region, as these are thought to represent the direct interaction of a TF with DNA. However, analysis of the data at this level can be misleading due to the known sequence biases of the enzymes used in accessibility assays, as well as numerous other technical issues [48–50]. Alternatively, accessible regions can be examined collectively for enrichment of DNA sequence motifs through the use of tools such as the MEME suite [51]. Individual motif instances suggest the identity of transcription factors bound within individual accessible regions [52,53]; however, this motif-to-TF mapping is imperfect. Candidate TFs can be included or excluded using expression data in order to arrive at a list of TFs that may act differentially between conditions (see, for example, [13]). Caveats of this indirect approach include the facts that many TFs bind the same or highly similar motifs, that most TFs in plants have unknown binding preferences, and that heterodimerization may alter binding specificity.
With a list of TFs of interest in a condition, it is then possible to predict all the regulatory interactions of each TF in that condition in order to build a putative regulatory network. In short, this is done by defining all of the target genes of a TF as those that are nearest to an accessible site that contains an instance of the TF’s motif sequence, and then performing this operation recursively for each TF whose motif was found to be overrepresented [54,55]. Such networks come with the caveats that they are based on inference as well as a variety of assumptions. For example, the presence of a TF motif in an accessible region does not necessarily indicate a binding event, and assigning a gene as the target of a regulatory site assumes that the target is the closest gene, which may not be true. Thus, regulatory networks produced in this way should be considered hypothetical and ultimately subjected to experimental scrutiny.
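The nearest-gene assignment described above can be sketched as follows. This toy version works on a single chromosome, represents each gene by one coordinate (assumed here to be its transcription start site), and represents each motif-containing accessible site by its midpoint. All names and coordinates are hypothetical, and the sketch inherits the caveats noted in the text.

```python
def nearest_gene(site_position, genes):
    """Return the name of the gene whose coordinate is closest to the site.

    genes: list of (gene_name, coordinate) pairs on one chromosome.
    Ties are resolved in favor of the gene listed first.
    """
    return min(genes, key=lambda gene: abs(gene[1] - site_position))[0]


def build_network(tf_motif_sites, genes):
    """Build a putative TF -> target-gene network.

    tf_motif_sites: dict mapping TF name -> positions of accessible sites
    that contain an instance of that TF's motif.
    Returns a dict mapping TF name -> sorted list of predicted targets.
    """
    network = {}
    for tf_name, sites in tf_motif_sites.items():
        targets = {nearest_gene(site, genes) for site in sites}
        network[tf_name] = sorted(targets)
    return network


toy_genes = [("geneA", 1000), ("geneB", 5000), ("TF2", 9000)]
toy_sites = {"TF1": [1200, 8700], "TF2": [4800]}
toy_network = build_network(toy_sites, toy_genes)
```

In this toy network TF1 targets the gene encoding TF2, which in turn targets geneB, showing how repeating the assignment across TFs links individual predictions into a multi-level network.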
Conclusions
Chromatin accessibility assays can be used to rapidly pinpoint the location of regulatory elements in a genome and to compare the utilization of these elements between conditions. As we discuss here, it is important to carefully consider sample preparation, sequencing, read mapping, and analysis strategies in order to maximize the power and minimize the pitfalls of these assays. The potential of these techniques, as well as associated analytical issues, will continue to grow as we enter the new era of their widespread application to single cells [56,57].
Acknowledgements
The authors would like to thank Rajiv Parvathaneni and Andrea Eveland, Zefu Lu and Bob Schmitz, and Juan Debernardi and Jorge Dubcovsky for allowing us to analyze their pre-publication accessibility data from sorghum, maize and tetraploid wheat, respectively. We also thank Cristina Alexandre for providing the data underlying Figure 1B.
References and recommended reading
* of special interest
** of outstanding interest
- 1.Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annu Rev Biochem. 1988;57: 159–197. [DOI] [PubMed] [Google Scholar]
 - 2.Georgopoulos K, van den Elsen P, Bier E, Maxam A, Terhorst C. A T cell-specific enhancer is located in a DNase I-hypersensitive area at the 3’ end of the CD3-delta gene. EMBO J. 1988;7: 2401–2407. [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 3.Rodgers-Melnick E, Vera DL, Bass HW, Buckler ES. Open chromatin reveals the functional maize genome. Proc Natl Acad Sci U S A. 2016;113: E3177–E3184. [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 4.Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010;2010: db.prot5384. [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 5.Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10: 1213–1218. [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 6.Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol. 2015;109: 21.29.1–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
 - 7.**.Lu Z, Hofmeister BT, Vollmers C, DuBois RM, Schmitz RJ. Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res. 2017;45: e41. [DOI] [PMC free article] [PubMed] [Google Scholar]; This paper demonstrates the use of fluorescence-activated nuclei sorting (FANS) coupled with ATAC-seq in Arabidopsis. This general method is likely to be useful for obtaining highly pure nuclei for ATAC-seq from any plant species.
 - 8. Maher KA, Bajic M, Kajala K, Reynoso M, Pauluzzi G, West DA, et al. Profiling of Accessible Chromatin Regions across Multiple Plant Species and Cell Types Reveals Common Gene Regulatory Principles and New Control Modules. Plant Cell. 2018;30: 15–36.
 - 9. Zhang W, Wu Y, Schnable JC, Zeng Z, Freeling M, Crawford GE, et al. High-resolution mapping of open chromatin in the rice genome. Genome Res. 2012;22: 151–162.
 - 10. Zhao H, Zhang W, Chen L, Wang L, Marand AP, Wu Y, et al. Proliferation of Regulatory DNA Elements Derived from Transposable Elements in the Maize Genome. Plant Physiol. 2018;176: 2789–2803.
 - 11. Oka R, Zicola J, Weber B, Anderson SN, Hodgman C, Gent JI, et al. Genome-wide mapping of transcriptional enhancer candidates using DNA and chromatin features in maize. Genome Biol. 2017;18: 137.
 - 12. Lu Z, Marand AP, Ricci WA, Ethridge CL, Zhang X, Schmitz RJ. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nat Plants. 2019. doi: 10.1038/s41477-019-0548-z
 - 13. Sijacic P, Bajic M, McKinney EC, Meagher RB, Deal RB. Changes in chromatin accessibility between Arabidopsis stem cells and mesophyll cells illuminate cell type-specific transcription factor networks. Plant J. 2018;94: 215–231.
 - 14. Frerichs A, Engelhorn J, Altmüller J, Gutierrez-Marcos J, Werr W. Specific chromatin changes mark lateral organ founder cells in the Arabidopsis inflorescence meristem. J Exp Bot. 2019;70: 3867–3879.
 - 15. Yan W, Chen D, Schumacher J, Durantini D, Engelhorn J, Chen M, et al. Dynamic control of enhancer activity drives stage-specific gene expression during flower morphogenesis. Nat Commun. 2019;10: 1705.
 - 16. Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep. 2014;8: 2015–2030.
 - 17. Liu Y, Zhang W, Zhang K, You Q, Yan H, Jiao Y, et al. Genome-wide mapping of DNase I hypersensitive sites reveals chromatin accessibility changes in Arabidopsis euchromatin and heterochromatin regions under extended darkness. Sci Rep. 2017;7: 4093.
 - 18. Wilkins O, Hafemeister C, Plessis A, Holloway-Phillips M-M, Pham GM, Nicotra AB, et al. EGRINs (Environmental Gene Regulatory Influence Networks) in Rice That Function in the Response to Water Deficit, High Temperature, and Agricultural Environments. Plant Cell. 2016;28: 2365–2384.
 - 19. Harris CJ, Scheibe M, Wongpalee SP, Liu W, Cornett EM, Vaughan RM, et al. A DNA methylation reader complex that enhances gene transcription. Science. 2018;362: 1182–1186.
 - 20. Jin J, Gui S, Li Q, Wang Y, Zhang H, Zhu Z, et al. The transcription factor GATA10 regulates fertility conversion of a two-line hybrid tms5 mutant rice via the modulation of UbL40 expression. J Integr Plant Biol. 2019. doi: 10.1111/jipb.12871
 - 21. Alexandre CM, Urton JR, Jean-Baptiste K, Huddleston J, Dorrity MW, Cuperus JT, et al. Complex Relationships between Chromatin Accessibility, Sequence Divergence, and Gene Expression in Arabidopsis thaliana. Mol Biol Evol. 2018;35: 837–854.
 - 22. Burgess SJ, Reyna-Llorens I, Stevenson SR, Singh P, Jaeger K, Hibberd JM. Genome-wide transcription factor binding in leaves from C3 and C4 grasses. Plant Cell. 2019. doi: 10.1105/tpc.19.00078
 - 23.**. Bajic M, Maher KA, Deal RB. Identification of Open Chromatin Regions in Plant Genomes Using ATAC-Seq. Methods Mol Biol. 2018;1675: 183–201. This protocol chapter describes the general principles of using ATAC-seq in plants and details methods for preparation of suitable crude nuclei from whole tissue as well as purification of nuclei from specific cell types using INTACT.
 - 24. Gu W, Crawford ED, O’Donovan BD, Wilson MR, Chow ED, Retallack H, et al. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016;17: 41.
 - 25.**. Montefiori L, Hernandez L, Zhang Z, Gilad Y, Ober C, Crawford G, et al. Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9. Sci Rep. 2017;7: 2451.
 - 26.**. Wu J, Huang B, Chen H, Yin Q, Liu Y, Xiang Y, et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature. 2016;534: 652–657. The papers above use custom synthetic guide RNAs and recombinant Cas9 nuclease to deplete mitochondrial reads from ATAC-seq libraries. This approach may prove generally useful for depleting organelle reads from plant ATAC-seq libraries.
 - 27.*. Tannenbaum M, Sarusi-Portuguez A, Krispil R, Schwartz M, Loza O, Benichou JIC, et al. Regulatory chromatin landscape in Arabidopsis thaliana roots uncovered by coupling INTACT and ATAC-seq. Plant Methods. 2018;14: 113. The use of qPCR to assess ATAC-seq library quality prior to sequencing is demonstrated in the manuscript.
 - 28. Torres ES, Deal RB. The histone variant H2A.Z and chromatin remodeler BRAHMA act coordinately and antagonistically to regulate transcription and nucleosome dynamics in Arabidopsis. Plant J. 2019;99: 144–162.
 - 29. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25: 1754–1760.
 - 30. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357–359.
 - 31.*. Koohy H, Down TA, Spivakov M, Hubbard T. A comparison of peak callers used for DNase-Seq data. PLoS One. 2014;9: e96303. This paper presents a head-to-head performance comparison of four peak calling algorithms used on chromatin accessibility data.
 - 32. Thomas R, Thomas S, Holloway AK, Pollard KS. Features that define the best ChIP-seq peak calling algorithms. Brief Bioinform. 2017;18: 441–450.
 - 33. Sullivan AM, Arsovski AA, Thompson A, Sandstrom R, Thurman RE, Neph S, et al. Mapping and Dynamics of Regulatory DNA in Maturing Arabidopsis thaliana Siliques. Front Plant Sci. 2019;10: 1434.
 - 34. Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann Appl Stat. 2011;5: 1752–1779.
 - 35. John S, Sabo PJ, Thurman RE, Sung M-H, Biddie SC, Johnson TA, et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat Genet. 2011;43: 264–268.
 - 36. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9: R137.
 - 37. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell. 2010;38: 576–589.
 - 38. He HH, Meyer CA, Hu SS, Chen M-W, Zang C, Liu Y, et al. Refined DNase-seq protocol and data analysis reveals intrinsic bias in transcription factor footprint identification. Nat Methods. 2014;11: 73–78.
 - 39. Jung YL, Luquette LJ, Ho JWK, Ferrari F, Tolstorukov M, Minoda A, et al. Impact of sequencing depth in ChIP-seq experiments. Nucleic Acids Res. 2014;42: e74.
 - 40. Meyer CA, Liu XS. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet. 2014;15: 709–721.
 - 41. Kharchenko PV, Alekseyenko AA, Schwartz YB, Minoda A, Riddle NC, Ernst J, et al. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature. 2011;471: 480–485.
 - 42. Thomas S, Li X-Y, Sabo PJ, Sandstrom R, Thurman RE, Canfield TK, et al. Dynamic reprogramming of chromatin accessibility during Drosophila embryo development. Genome Biol. 2011;12: R43.
 - 43. Li X-Y, Thomas S, Sabo PJ, Eisen MB, Stamatoyannopoulos JA, Biggin MD. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol. 2011;12: R34.
 - 44. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489: 75–82.
 - 45. Zhang W, Zhang T, Wu Y, Jiang J. Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell. 2012;24: 2719–2731.
 - 46. Borchers HW. pracma: Practical Numerical Math Functions [Internet]. 2019 [cited 2019]. Available: http://CRAN.R-project.org/package=pracma
 - 47. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15: 550.
 - 48. Sung M-H, Guertin MJ, Baek S, Hager GL. DNase footprint signatures are dictated by factor dynamics and DNA sequence. Mol Cell. 2014;56: 275–285.
 - 49. Sung M-H, Baek S, Hager GL. Genome-wide footprinting: ready for prime time? Nat Methods. 2016;13: 222–228.
 - 50. Vierstra J, Stamatoyannopoulos JA. Genomic footprinting. Nat Methods. 2016;13: 213–221.
 - 51. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27: 1696–1697.
 - 52. Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014;158: 1431–1443.
 - 53. O’Malley RC, Huang S-SC, Song L, Lewsey MG, Bartlett A, Nery JR, et al. Cistrome and Epicistrome Features Shape the Regulatory DNA Landscape. Cell. 2016;165: 1280–1292.
 - 54.**. Kulkarni SR, Jones DM, Vandepoele K. Enhanced maps of transcription factor binding sites improve regulatory networks learned from accessible chromatin data. Plant Physiol. 2019. doi: 10.1104/pp.19.00605 This manuscript describes many important technical aspects of generating gene regulatory networks from chromatin accessibility data.
 - 55. Reynoso MA, Kajala K, Bajic M, West DA, Pauluzzi G, Yao AI, et al. Evolutionary flexibility in flooding response circuitry in angiosperms. Science. 2019;365: 1291.
 - 56. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523: 486–490.
 - 57. Cusanovich DA, Daza R, Adey A, Pliner HA, Christiansen L, Gunderson KL, et al. Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science. 2015;348: 910–914.
 
