Abstract
The ChIP-exo assay precisely delineates protein–DNA crosslinking patterns by combining chromatin immunoprecipitation with 5′ to 3′ exonuclease digestion. Within a regulatory complex, the physical distance of a regulatory protein to DNA affects crosslinking efficiencies. Therefore, the spatial organization of a protein–DNA complex could potentially be inferred by analyzing how crosslinking signatures vary between its subunits. Here, we present a computational framework that aligns ChIP-exo crosslinking patterns from multiple proteins across a set of coordinately bound regulatory regions, and which detects and quantifies protein–DNA crosslinking events within the aligned profiles. By producing consistent measurements of protein–DNA crosslinking strengths across multiple proteins, our approach enables characterization of relative spatial organization within a regulatory complex. Applying our approach to collections of ChIP-exo data, we demonstrate that it can recover aspects of regulatory complex spatial organization at yeast ribosomal protein genes and yeast tRNA genes. We also demonstrate the ability to quantify changes in protein–DNA complex organization across conditions by applying our approach to analyze Drosophila Pol II transcriptional components. Our results suggest that principled analyses of ChIP-exo crosslinking patterns enable inference of spatial organization within protein–DNA complexes.
INTRODUCTION
Each cell type is defined by a unique gene expression program, which is in turn determined by the activities of regulatory proteins binding to promoters, enhancers, and other genomic regions. Genomic regulatory regions are bound by particular combinations of sequence-specific transcription factors (TFs), co-regulators, and chromatin modifiers in a spatiotemporal dependent manner. While large-scale efforts are underway to map and functionally characterize potential regulatory regions (1,2), we still know relatively little about the structure and organization of individual protein–DNA complexes along the genome. To fully understand how gene regulatory programs are coordinated, it will be crucial to characterize precisely how regulatory complexes are assembled and organized.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) enables genome-wide localization of regulatory proteins. However, the spatial resolution of ChIP-seq is limited, as chromatin fragmentation strategies can result in sequencing reads that map several hundred base pairs away from the site bound by the protein of interest. Therefore, while integrative analyses of ChIP-seq data collections can find groups of co-bound regulatory proteins (3–5), such analyses provide only limited insight into the spatial organization of proteins within regulatory complexes.
In contrast to ChIP-seq, ChIP-exo and related assays (e.g. ChIP-nexus (6)), precisely define protein–DNA binding locations via the use of lambda exonuclease (7). The exonuclease digests protein-bound DNA in a 5′ to 3′ direction and, on average, stops at 6 bp before a protein–DNA crosslinking point. The ChIP-exo tag distribution at a given regulatory region is thus the product of crosslinking events that formaldehyde or other crosslinking agents have induced between the targeted protein and DNA.
ChIP-exo's ability to map crosslinking signatures suggests a strategy for characterizing the spatial organization of regulatory complexes. At sites where a sequence-specific TF is bound directly to its cognate motif, the dominant crosslinking signature should result from direct interactions between the TF’s residues and proximal DNA bases. However, regulatory proteins that alternatively (or additionally) interact with DNA via protein-protein interactions should display crosslinking signatures related to the TFs that recruit them. Since the physical distance of a regulatory protein to the recruiting TF will affect crosslinking efficiencies, different members of a regulatory complex should display distinct crosslinking patterns across a regulatory region. In principle, then, analysis of ChIP-exo crosslinking patterns should enable some degree of inference regarding the spatial organization of regulatory proteins within a protein–DNA complex.
Previous work suggests the feasibility of inferring the spatial organization of protein–DNA complexes via ChIP-exo crosslinking analysis. ChIP-exo analysis of yeast general transcription factors found that crosslinking patterns at Pol II promoters were consistent with those expected from crystallographic models of the transcriptional machinery (8). ChIP-exo characterization of ribosomal protein gene (RPG)-specific factors introduced the idea that the ordering of indirect protein–DNA interactions can be inferred from analysis of crosslinking efficiencies (9). Specifically, high-resolution analysis of ChIP-exo crosslinking patterns at sites bound by the sequence-specific TF Rap1 show the same crosslinking pattern echoed in ChIP-exo experiments targeting Sfp1, Ifh1, and Fhl1, suggesting that these factors may be indirectly recruited to DNA by Rap1. We and others have made use of this concept to characterize indirect protein–DNA interactions in other systems. For example, in some mammalian cell types, both Glucocorticoid Receptor and Estrogen Receptor alpha may be indirectly recruited to certain binding sites via protein-protein interaction with FoxA TFs, as evidenced by near identical crosslinking patterns at those sites (10–12). One limitation of previous approaches is that they have relied on TF binding motifs or known genomic anchor points to align bound sites before characterizing crosslinking patterns (7,9,13). Naturally, such strategies limit the usefulness of ChIP-exo crosslinking analysis to protein–DNA complexes where the focal point of spatial organization is already known.
In this work, we formalize concepts suggested by previous studies by presenting ChExAlign, a systematic approach for characterizing ChIP-exo crosslinking patterns across multiple members of a protein–DNA regulatory complex (Figure 1). Specifically, we first develop a multiple alignment procedure for characterizing consistent ChIP-exo crosslinking signatures across multiple ChIP-exo experiments and across multiple regulatory regions. To make the alignment approach broadly applicable to different types of protein–DNA complexes, our procedure does not rely on sequence features or other genomic annotations, but rather directly aligns multi-protein ChIP-exo tag profiles. While there has been limited work on aligning broad tag distributions such as those from histone modification ChIP-seq and ChIP-chip data (14–17), our approach is optimized for high-resolution, strand-separated ChIP-exo data. Given a multiple alignment of ChIP-exo tag profiles, our approach next applies a probabilistic mixture model to deconvolve individual protein–DNA crosslinking events. This approach allows consistent quantification of crosslinking strengths across multiple proteins in a regulatory complex. Finally, we apply principal component analysis (PCA) to visualize similarities between the crosslinking preferences of the regulatory proteins.
Figure 1.
Overview of ChExAlign. (A) ChExAlign takes a set of genomic locations as input and calculates pairwise similarity scores between multi-dimensional ChIP-exo profiles using a Needleman-Wunsch alignment algorithm. ChExAlign then uses the pairwise alignments to progressively build a multiple alignment across all input regions. This step results in a composite multi-dimensional profile representing the aligned ChIP-exo signals across all input regions. (B) After obtaining aligned ChIP-exo profiles, ChExAlign uses a mixture model to deconvolve crosslinking event positions within the aligned profiles. Crosslinking strengths for each protein are quantified at each crosslinking position. Principal Component Analysis (PCA) is applied to the crosslinking matrix to infer relationships between the crosslinking signatures of each protein within the complex.
We demonstrate the utility of our approach by applying it to characterize the spatial organization of three distinct regulatory complexes. We first apply our method to yeast RPG ChIP-exo datasets in order to show that multiple profile alignment can be used to automatically align collections of ChIP-exo data across a collection of coordinately regulated regions. While previous analyses of the RPG ChIP-exo data relied on a manual alignment around a sequence motif feature (Rap1 binding sites), our ChIP-exo alignment approach yields similar alignments, and the same biological conclusions, without knowledge of DNA sequence features. Secondly, we extend our analyses to 12 novel ChIP-exo datasets that characterize the occupancy of regulatory complexes at yeast tRNA genes. Due to the variable length of tRNA intragenic promoters, we extend our multiple profile alignment procedure to account for affine gaps. We further demonstrate that crosslinking event detection and quantification yields insight into the spatial organization of individual proteins within the Pol III transcriptional machinery. Finally, we demonstrate that crosslinking analysis provides a quantitative framework for characterizing changes in regulatory complex organization across conditions. By applying our methods to a collection of ChIP-nexus data that profile Drosophila Pol II transcriptional components under two experimental conditions, we demonstrate that we can quantify the degree to which individual proteins are relocalized when transcriptional initiation is inhibited.
In summary, our approaches provide a novel platform for examining the spatial organization of protein–DNA complexes from collections of high-resolution ChIP data.
MATERIALS AND METHODS
Pairwise alignment of ChIP-exo profiles with affine gap penalties
To align multi-experiment ChIP-exo tag profiles across multiple genomic regions, we extend an affine gapped, overlap alignment version of the Needleman–Wunsch algorithm (18,19). The Needleman–Wunsch algorithm is popularly used to perform global alignment of protein or nucleotide sequences. Here, rather than aligning discrete residue sequences, our approach aligns real-valued matrices of ChIP-exo tag counts. For a given length L genomic region, the corresponding input matrix contains rows for each of the two DNA strands in each of K ChIP-exo experiments. The elements in each row contain the number of ChIP-exo tag 5′ positions (stranded) mapped to each base in the region. Thus, the dimensionality of each input matrix is 2K × L, and each ChIP-exo tag is counted in only a single bin of the matrix. The goal of the pairwise alignment procedure is to form a global alignment between ChIP-exo profile matrices corresponding to genomic regions X = (x1, …, xi, …, xL) and Y = (y1, …, yj, …, yL).
As in the original dynamic programming alignment with affine gap penalties, we construct matrices M, Ix, and Iy, indexed by i and j, where the value M(i, j) is the best score up to (i, j) given that xi is aligned to yj, Ix(i, j) is the best score given that xi is aligned to a gap, and Iy(i, j) is the best score given that yj is an insertion with respect to x. We use an affine gap cost structure to impose different penalties for opening and extending a gap of length g as
, where d is the gap-open penalty and e is the gap-extension penalty. In this study, we use e = 0.1d. The recursion relationships are unchanged from the original algorithm:
![]() |
The similarity score of xi and yj coordinates, s(xi, yj), is computed using the Pearson correlation coefficient as follows:
![]() |
where
and
. qkw and qkc are the numbers of ChIP-exo tag 5′ positions mapped from experiment k to the Watson and Crick strands (respectively) at the xi coordinate. pkw and pkc are the numbers of ChIP-exo tag 5′ positions mapped from experiment k to the Watson and Crick strands (respectively) at the yj coordinate. The above Pearson correlation metric scores are normally distributed between –1 and 1 for pairs of randomly generated profile columns, and we confirmed that the above score yields the expected values for perfectly correlated and anti-correlated vectors (Supplementary Figures S4 and S5).
Our approach generates overlap alignments, which do not penalize overhanging ends. Therefore, the scoring matrices are initialized with M(i, 0) = 0, Ix(i, 0) = 0, and Iy(i, 0) = 0 for i = 1,…,n and M(0, j) = 0, Ix(0, j) = 0, and Iy(0, j) = 0 for j = 1,…,m. Traceback starts from the cell with the maximum value on the lower right quadrant border, (i.e. (n, j), j = 2/m,…,m, or (i, m), i = 2/n,…,n), among the M, Ix, and Iy matrices. Traceback continues until the top or left edge is reached.
Alignment scores between random regions are normally distributed, and low compared with the scores observed between pairs of RPG regions (Supplementary Figure S6). We use gap opening scores of d = 50 in analyzing Pol III transcriptional complex at tRNA genes, d = 100 for Drosophila Pol II components, and d = 200 for analyzing RPG-specific factors. These values were chosen to encourage fewer gaps for the latter two settings. However, we observed that similarity scores between RPG regions (which are expected to contain gaps) are stable across a wide range of settings for gap opening (d) and gap extension (e) values (Supplementary Figure S7).
Progressive alignment of multiple regions
To build multiple alignments, we progressively align nodes by a succession of pairwise profile alignments. Nodes initially contain single region profiles. Progressive alignment proceeds by aligning the most similar pair of profiles, and merging the aligned pair into a composite profile generated according to the gapped alignment. Gaps are represented as an additional one-dimensional array alongside the ChIP-exo profiles. Similarities are then recalculated between the aligned profile and all remaining regions. We repeat these steps until all the nodes are merged. After the final multiple alignment of all regions has been constructed, we subtract background signals from the composite ChIP-exo profiles. We first use NCIS (20) to scale a control experiment to each analyzed signal ChIP-exo experiment. We then subtract scaled per-base control tag counts from each signal ChIP-exo experiment.
Deconvolution and quantification of crosslinking profiles using a mixture model
After constructing composite ChIP-exo profiles representing the multiple alignment of analyzed regions, we aim to locate and quantify protein–DNA crosslinking positions within the profiles. Our approach models ChIP-exo composite data as being generated by a mixture of crosslinking events, and an Expectation Maximization (EM) learning scheme is used to probabilistically assign sequencing tags to crosslinking positions. By estimating crosslinking positions in the composite profiles, we implicitly assume that protein–DNA crosslinking patterns are consistent at all aligned regions, albeit with spacing differences between crosslinking sites as modeled by the alignment gaps.
gives the probability of observing ChIP-exo tag
from a crosslinking event located at genomic coordinate x, and is defined by a pair of Gaussian distributions (one on the positive strand, one on the negative strand) with σ = 6 bp and offset from each other by 12 bp (positive strand to the left). We define a vector of component locations μ where
is the genomic location of the crosslinking event j. We initialize potential crosslinking events, M, such that they are spaced in 5 bp intervals along the window. The overall likelihood of the observed set of tags, r, given the crosslinking event positions, μ, the binding event mixture probabilities (i.e. crosslinking event strengths),
, is given by:
![]() |
where N is the number of ChIP-exo tags that have been mapped to genomic locations.
To limit the number of modeled crosslinking positions, we place a sparseness promoting negative Dirichlet prior,
, on the crosslinking strength
(21). The latent assignments of tags to crosslinking events are represented by the vector z. The complete-data log posterior is as follows:
![]() |
The E-step that calculates the relative contribution of each crosslinking event in generating each tag is:
![]() |
The maximum a posteriori probability (MAP) estimation (21) of
is:
![]() |
The α parameter can thus be interpreted as the minimum number of ChIP-exo tags required to support a crosslinking event active in the model. MAP values of
are determined by enumerating over several possible values of
. Specifically, the MAP estimation of
is:
![]() |
where x starts at the previous values of the position and expands outwards to 50 bp each side. If the maximization step results in two components sharing the same positions, they are combined in the next iteration of the algorithm.
Crosslinking positions are simultaneously modeled across all analyzed ChIP-exo experiments in a given aligned profile, and information about crosslinking position estimates are shared between experiments at each EM step using a positional prior strategy described in the MultiGPS algorithm (22). This positional prior encourages consistency in estimated crosslinking positions across ChIP-exo experiments. Our rationale is that a given protein complex will be crosslinked to DNA at relatively few positions, but the signatures of these crosslinks will be present across experiments characterizing multiple members of the complex.
Finally, the relative crosslinking strengths of all estimated crosslinking points are quantified across all ChIP-exo profiles using Maximum Likelihood assignment of tag counts, yielding a matrix of crosslinking strengths.
Visualization of crosslinking relationships
The matrix of crosslinking strengths is used to assess the relationships between the crosslinking preferences of each protein in a protein–DNA complex. We normalize the crosslinking strength matrix such that the sum of crosslinking strengths for each protein is 1. We then apply a standard PCA to visualize the relationships between protein–DNA crosslinking preferences.
ChIP-exo experiments and processing
Tandem affinity purification (TAP)-tagged Brf1, Bdp1, Rpo31, Rpc17, Rpc40, Tfc1, Tfc3, Tfc4, Tfc6, Tfc7, and Tfc8 Saccharomyces cerevisiae strains in a BY4741 background were obtained from Open Biosystems. TAP-tagged S. cerevisiae cultures were grown in yeast peptone dextrose (YPD) media at 25°C to an OD600 = 0.8–1.0. ChIP-exo assays were performed as previously described (13,23). In brief, TAP-tagged S. cerevisiae cultures were treated with 1% formaldehyde for 15 min. Reactions were quenched. Cells were disrupted by bead beating, and chromatin pellets were washed. Chromatin was solubilized by sonication. Extracts were incubated with rabbit IgG antibodies (Sigma, i5006) coupled to protein A sepharose. The first adaptor was ligated to the ChIP DNA while immobilized on beads. Washed immobilized DNA was digested in the 5′-3′ direction with lambda exonuclese. Next, the material was eluted, and the second adaptor ligated to the exonuclease treated end. The resulting libraries were sequenced using Illumina machine. Mock IP control ChIP-exo experiments in yeast were performed using rabbit IgG (Sigma, i5006) in the BY4741 background strain (which does not contain a TAP tag sequence).
Libraries were paired-end sequenced and read pairs were mapped to the sacCer3 genome using BWA version 0.7.12 with options “mem -T 30 -h 5". Read pairs that share identical mapping coordinates on both ends are likely to represent PCR duplicates, and so Picard (http://broadinstitute.github.io/picard) was used to de-duplicate such pairs. Reads with MAPQ score <5 are filtered out using samtools (24).
Public datasets
ChIP-exo for RPG-specific factors, Tbp1 ChIP-exo, and ChIP-nexus data targeting Pol II and basal TFs are obtained from NCBI Sequence Read Archive under accession number SRP041518 (9), GSM2601059 (25) and GSE85741 (26), respectively. RPG and Tbp1 ChIP-exo data are aligned against sacCer3 using BWA (27) version 0.5.9. ChIP-nexus data are aligned against dm3 using BWA version 0.7.12 with options “mem -T 30 -h5".
Drosophila TSSs
We use TSS annotations from dm3 refGene protein-coding genes. To annotate a consensus TSS position within the aligned profiles, we use the position that contains the highest TSS frequency in the underlying aligned regions (i.e. the mode TSS position).
Motif analysis
We ran MEME-ChIP version 4.10.0 (28) on tRNA gene sequences to characterize the Box A and Box B motifs. Then, we scan 400 bp of regions used for alignment with the discovered motifs using a log-likelihood scoring threshold of 0.1% per base FDR defined using a second-order Markov model based on yeast genome nucleotide frequencies. We obtained the Rap1 cognate DNA-binding motif positions by scanning the corresponding cis-bp database motif (M4379_1.02) (29) in 1400 bp regions centered around TSSs using a log-likelihood scoring threshold of 0.1% per base FDR defined using a second-order Markov model based on yeast genome nucleotide frequencies.
ChExMix peak calling
We run ChExMix version 0.42 (11) with default parameters on Pol II ChIP-exo data in Kc167 cells and obtained the top 500 most enriched Pol II peaks using q-value.
RESULTS
ChExAlign overview
ChExAlign is a computational framework that aligns and quantifies ChIP-exo crosslinking patterns from multiple proteins across a set of coordinately bound regulatory regions. ChExAlign does not use sequence motif features and operates directly on the multi-protein, strand-separated ChIP-exo tag patterns. ChExAlign's multiple alignment procedure starts by obtaining strand-separated, per-base tag count profiles composed of data from each of the analyzed ChIP-exo experiments within a fixed-sized window at the defined input regions (Figure 1A). Each region's multi-dimensional profiles are normalized by dividing all values by the maximum tag count of each protein in a region. ChExAlign uses an overlap Needleman–Wunsch alignment algorithm (18,19) with affine gap penalties to find the optimal pairwise alignments between all input regions. Within the pairwise alignment procedure, the similarities between each pair of positions is defined using Pearson correlation. Next, a progressive profile multiple alignment procedure is applied to produce a multiple profile alignment. Pairs of multi-dimensional profiles are progressively combined in order of their pairwise alignment similarity scores. After aligning a pair of profiles, the resulting composite profile is aligned against all remaining profiles. After the final multiple alignment of all regions has been constructed, we subtract background signals from the composite ChIP-exo profiles. We first scale a control experiment to each analyzed signal ChIP-exo experiment and then subtract scaled per-base control tag counts from each signal ChIP-exo experiment. The output of the multiple alignment procedure is a composite multi-dimensional profile that represents the aligned protein–DNA crosslinking signatures across all analyzed regulatory regions.
After producing the composite aligned ChIP-exo profile, ChExAlign quantifies protein–DNA crosslinking events within the complex (Figure 1B). A probabilistic mixture model is applied to the composite profile to detect the positions of individual crosslinking events. Crosslinking strengths are estimated by assigning fractions of each protein's ChIP-exo signal to each detected crosslinking point. Lastly, dimensionality reduction is applied to visualize the relationships between crosslinking signatures for each protein within the regulatory complex. ChExAlign takes mapped ChIP-exo experiments (e.g. BAM files) and starting coordinates (e.g. loci of interest) as inputs. Outputs are aligned ChIP-exo profiles and quantifications of protein–DNA crosslinking events across multiple proteins.
ChIP-exo profile alignment recovers the spatial organization of a regulatory complex at ribosomal protein genes
To demonstrate that ChExAlign provides an informative alignment of protein–DNA crosslinking patterns across sets of related regulatory regions, we first applied it to analyze the organization of a protein–DNA complex at yeast ribosomal protein genes (RPGs). Most yeast RPGs are coordinately regulated by a common set of regulatory proteins. The transcription factor Rap1 binds to cognate sequence motifs located 77–501 bp upstream of the transcription start site (TSS), and recruits Fhl1, Ifh1, and Sfp1. Hmo1 is also recruited at roughly half of the RPGs (9,30). Previous analyses of Rap1, Fhl1, Ifh1, Sfp1, and Hmo1 ChIP-exo data determined that the RPG regulatory complex has a well-defined spatial organization (9). Rap1 binding sites serve as an upstream boundary to the complex. Fhl1, Ifh1, and Sfp1 are almost identically positioned ∼100 bp downstream of Rap1, with some evidence of additional crosslinking through the Rap1 site (indicative of protein-protein interactions between Rap1 and the recruited factors). When present, Hmo1 occupies the region between Rap1 and the TSS, essentially overlapping where Fhl1/Ifh1/Sfp1 bind. Thus, a consistently organized regulatory complex is present in the upstream regions of most yeast RPGs, and this organization should in principle be recoverable by aligning ChIP-exo profiles that span the RPG regulatory regions.
A standard approach to analyzing collections of regulatory genomics data at a set of related gene loci might begin by producing composite profiles centered on the genes’ TSSs. Applying this approach to the five ChIP-exo datasets at 134 RPGs produces a set of smooth composite profiles without any discernible organization between the members of the regulatory complex (Figure 2A). Indeed, previous analysis has demonstrated that it is only when the five ChIP-exo datasets are aligned by Rap1-bound motif locations (consistently oriented with respect to the RPG TSSs) that a more structured organization emerges from the data (9). Thus, the smooth profiles produced by a TSS-centric alignment are artefacts of the variable spacing between TSSs and the true organizing points of the regulatory complex (i.e. the Rap1 sites). We therefore asked whether ChExAlign's alignment procedure can recapitulate insights into the organization of the RPG regulatory complex without using sequence motif information or prior knowledge of Rap1 sites as the organizing loci.
Figure 2.
ChIP-exo tag alignment and an inferred organization of RPG specific factors. (A, B) Rap1, Sfp1, Ifh1, Fhl1, and Hmo1 ChIP-exo enrichment at 134 RP genes, centered on TSS coordinates (A) and after ChExAlign alignment (B). Forward strand tags are shown in blue and reverse strand tags are shown in red. The heatmaps in (B) are ordered by the multiple alignment procedure. (C) Positions of Rap1 motif in the same windows displayed in A). (D) Positions of Rap1 motif given the aligned coordinates displayed in B). (E) ChIP-exo tag patterns of RPG-specific factors before and after alignment, and centered around Rap1 motif. (F) ChIP-exo tag patterns for Sfp1, Ifh1, Fhl1, Hmo1 (grey, green, purple and blue traces, respectively) compared with Rap1 (filled plots) at Rap1 sites after progressive alignment.
We applied ChExAlign to produce an ungapped overlap multiple profile alignment across data from 5 ChIP-exo experiments (Rap1, Flh1, Ifh1, Sfp1, and Hmo1) taken from 1,400bp windows centered on the TSSs of 134 yeast RPGs (Figure 2B). Our alignment recovers sharply distributed composite profiles across the five factors (Figure 2B, E). The aligned composite plots enable some degree of inference regarding the organization of the regulatory complex. For example, high-resolution analysis of the crosslinking patterns displayed at the Rap1 binding sites suggest that Fhl1/Ifh1/Sfp1 are indirectly bound through protein–protein interactions with Rap1 (Figure 2F). In addition, the multiple profile alignment appropriately clusters the subset of RPGs that displays Hmo1 enrichment, consistent with previous observations (9,30). Importantly, even though ChExAlign does not use sequence information during the alignment procedure, the multiple profile alignment induces an alignment of the underlying Rap1 motif sequences (Figure 2C, D). Therefore, ChExAlign can accurately align multi-dimensional ChIP-exo profiles across a set of coordinately regulated regions, enabling insights into the organization of regulatory proteins within the regions.
Gapped ChIP-exo profile alignment enables consistent analysis of protein–DNA crosslinking patterns across 12 regulatory proteins at tRNA genes
Having demonstrated that multiple profile alignment can recover informative protein–DNA crosslinking patterns in a set of regions that share a tightly organized regulatory complex, we next aimed to demonstrate that the alignment procedure is robust to cases where the protein–DNA regulatory complex contains a more variably spaced organization. We chose to focus on protein–DNA interactions in the yeast Pol III transcriptional machinery at tRNA genes, as tRNA genes with varying lengths contain regulatory elements of consistent composition but variable internal spacing. Analyses of protein–DNA crosslinking patterns over tRNA genes might therefore be expected to require gapped ChIP-exo profile alignment strategies.
Eukaryotic tRNA genes are transcribed by Pol III, which is recruited by the multi-subunit transcription factor complexes TFIIIB and TFIIIC (31,32) (Figure 3A). TFIIIB is composed of TBP, Brf1, and Bdp1, while TFIIIC contains two subcomplexes, τA (composed of Tfc1, Tfc4, and Tfc7) and τB (composed of Tfc3, Tfc6, and Tfc8). The TFIIIC subcomplexes bind to conserved intragenic promoter motifs, named Box A (bound by τA) and Box B (bound by τB), and enable assembly of the TFIIIB complex at a region ∼30 bp upstream of the tRNA transcription start site (33–35). Pol III is then recruited by TFIIIB, enabling transcription. Yeast tRNA genes vary in length between 74 and 134 bp, and this variation is reflected by variable spacing between the intragenic Box A and Box B promoter elements (Figure 3B).
Figure 3.
Gapped alignment of ChIP-exo profiles enables characterization of consistent protein–DNA crosslinking patterns across 279 yeast tRNA genes. (A) Cartoon representation of protein organization within the yeast tRNA transcriptional machinery. (B) Sequence plot of tRNA genes, sorted by increasing tRNA gene length. The TSS and Box A are separated by 11 bp, while Box B and the transcription end site are separated by 16–20 bp. (C, D) The alignment method simultaneously aligns 12 ChIP-exo experiments with gaps. ChIP-exo heatmaps before (C) and after (D) alignment are shown for TBP (TFIIIB), Rpo31 (Pol III), Tfc1 (TFIIIC τA), and Tfc6 (TFIIIC τB). Gaps are represented in grey. The heatmaps are sorted by increasing tRNA gene length. (E, F) Relative positions of Box A and Box B motifs before (E) and after (F) the alignment.
We applied ChIP-exo to generate novel high-resolution protein–DNA interaction profiles for three protein components of Pol III, the three components of TFIIIB, and the six components of TFIIIC (Figure 3C, D). In order to account for the expected variable spacing between crosslinking signatures centered on the Box A and Box B sites in tRNA genes of different lengths, we extended our ChIP-exo multiple profile alignment strategy to incorporate affine gap penalties. We then applied our framework to align ChIP-exo profiles from the 12 ChIP-exo targets across 279 yeast tRNA gene loci. Our goal is to demonstrate that ChExAlign's ChIP-exo profile alignment procedure can recover informative protein–DNA crosslinking signatures for a regulatory complex without any prior knowledge of sequence features. However, the tRNA genes contain highly conserved sequence features, so initializing the alignments centered on tRNA TSSs would run the risk of being indirectly biased by the underlying conserved sequences (Figure 3B; Supplementary Figure S1A, B). We therefore chose to remove this potential confounding effect by initializing the alignments as centered on randomly shifted locations within a 60 bp range surrounding tRNA TSSs (Figure 3C). This also has the effect of scrambling the relative locations of the Box A and Box B motifs in the initial alignment (Figure 3E).
The aligned ChIP-exo profiles produced by ChExAlign multiple profile alignment display pronounced peaks compared to the initial alignment (Figure 3C, D: representative profiles for protein components of Pol III, TFIIIB, and TFIIIC τA & τB are shown for illustration; Supplementary Figure S2A–D: when single TBP ChIP-exo was used for alignment). Despite the fact that no sequence information is included in the alignment procedure, the multiple profile alignment induces a tight alignment of both Box A and Box B motif locations (Figure 3F). The aligned ChIP-exo profiles incorporate a gap between the Box A and Box B motifs; as expected, the newly aligned short tRNA genes contain a larger gap compared with longer tRNA genes, having the effect that a consistent crosslinking profile alignment is constructed across all tRNA gene loci (Figure 3D, F). Visual inspection shows that the multiple profile alignment contains sharp protein–DNA crosslinking peaks near the major regulatory elements (i.e. surrounding the Box A motif, Box B motif, and upstream TFIIIB binding location), consistent with where formaldehyde-induced protein–DNA crosslinking events would be expected to occur for the profiled regulatory proteins. Our results therefore demonstrate that ChExAlign can produce a high-quality multiple profile alignment of ChIP-exo crosslinking signatures, even in cases where the underlying regulatory elements occur at variable spacing in the constituent genomic loci.
ChIP-exo crosslinking quantification enables inference of protein–DNA complex organization
Visual inspection of aligned ChIP-exo crosslinking profiles enables some degree of insight into the spatial organization of protein–DNA complexes. For example, the aligned crosslinking profiles of tRNA transcriptional components show that TBP and other TFIIIB components are primarily crosslinked through a site upstream of the TSS, as expected from the direct protein–DNA crosslinks resulting from TBP binding to the TATA motif (Figure 4A). Meanwhile, the TFIIIC components primarily display crosslinking through sites around the Box A and Box B motifs, again reflective of the expected protein–DNA binding events. However, TFIIIC components also display additional, weaker crosslinking signatures through the same site that is crosslinked by TFIIIB components, indicative of the indirect protein–DNA crosslinking that results from protein-protein interactions between TFIIIC and TFIIIB. Within a regulatory complex, we should expect a protein's crosslinking efficiency at a given DNA site to decay as a function of the number of protein–protein crosslinks required to link it to the protein that directly contacts the DNA site. Therefore, careful analysis of the relative strengths of all crosslinking peaks may enable inference of protein spatial positioning within a regulatory complex. To facilitate such inference, we aimed to incorporate a principled approach to quantifying and visualizing crosslinking peaks into the ChExAlign framework.
Figure 4.
Inferring the positional organization of the tRNA transcriptional multi-subunit complex using ChIP-exo crosslinking quantification. (A) Aligned ChIP-exo profiles of Pol III, TFIIIB, and TFIIIC components at tRNA genes show distinct crosslinking patterns. (B) ChExAlign uses a mixture model to deconvolve the positions and relative strengths of protein–DNA crosslinking events across the aligned multi-protein ChIP-exo profiles. The effect of crosslinking event deconvolution in the Tfc6 ChIP-exo profile is shown as an example. (C) Matrix of relative crosslinking strengths for TFIIIB, Polymerase III, and TFIIIC components at detected crosslinking event positions. (D) Principal Component Analysis applied to the matrix of relative crosslinking strengths approximates aspects of the known organization of the yeast tRNA transcriptional machinery (Figure 3A). PC1 and PC2 together explain 91% of the variance.
Our approach to quantifying ChIP-exo crosslinking signatures first applies a probabilistic mixture model to estimate the positions and relative strengths of crosslinking events within a multiple ChIP-exo profile alignment (see Materials and Methods). The mixture model probabilistically assigns observed ChIP-exo tags to crosslinking events using a predefined model of tag distributions around single crosslinking events. We use the Expectation Maximization (EM) algorithm to iteratively optimize the positions and strengths of crosslinking events using information from the assigned tags. The mixture model incorporates priors to keep the number of detected crosslinking events low (i.e. assuming that protein–DNA crosslinking occurs at relatively few bases in a regulatory region) and to keep the positions of crosslinking events consistent across the aligned multi-protein profiles (i.e. assuming that the same crosslinking events will recur across ChIP-exo profiles from multiple interacting proteins in a regulatory complex).
We applied our crosslinking deconvolution procedure to the 12-experiment multiple profile alignment of the tRNA regulatory complex (Figure 4A), first subtracting scaled ChIP-exo control signals from the individual profiles to account for elevated ChIP-exo background signals over the tRNA genes. The result of the deconvolution procedure is illustrated for Tfc6 in Figure 4B; crosslinking events are detected at eight positions within the Tfc6 signature, and each has a relative strength quantification which is derived from the proportion of Tfc6 ChIP-exo tags that is associated by the mixture model. The results of the procedure across all profiles can be summarized using a relative crosslinking matrix (Figure 4C). The matrix illustrates that TBP has its highest crosslinking signal at position –60 in the alignment, a position that is located within the transcription bubble created by the Pol III pre-initiation complex (36) (Figure 4C). Brf1 and Pol III components also show the strongest signal at the same position, suggesting that they associate with DNA via protein–protein interactions with TBP (Figure 4C). On the other hand, Tfc6 and Tfc8 from the TFIIIC τB subcomplex show highest signals at positions +47 and +77, again consistent with their role in binding the Box B motif.
To visually summarize all crosslinking quantifications, we applied Principal Component Analysis (PCA) to the rows of the crosslinking matrix (Figure 4D). This dimensionality reduction organizes the proteins in a manner that resembles the expected spatial organization of the tRNA transcriptional complex (Figure 3A). TFIIIB components TBP and Brf1 are the left-most outliers of the plot, while Pol III and TFIIIC complexes are grouped in the middle and the right side of the plot. Notably, Tfc4, which is known to crosslink to both TFIIIB and to the rest of the τA subcomplex (35,37–39) is situated between Brf1 and other τA subcomplex members on the PCA plot. Consistent with a previous study that showed Bdp1 crosslinking to the C34 subunit of Pol III (40), Bdp1 is situated proximal to Pol III subunits on the PCA plot. Moreover, in accordance with the finding that Tfc3 crosslinks with Tfc1 and Box B (39), the PCA plot shows Tfc3 in between τA and τB subunits of TFIIIC.
In summary, ChExAlign's crosslinking deconvolution and quantification procedures, and downstream dimensionality reduction, are promising approaches for generating hypotheses regarding the spatial organization of a protein–DNA complex from a collection of ChIP-exo data.
ChExAlign enables a principled quantification of regulatory complex reorganization across conditions
If the structure of a regulatory complex changes across different gene classes or between cell types, we should expect that the relative crosslinking signatures of members of the complex will also change. Provided that data from all conditions are analyzed simultaneously, ChExAlign's approach to aligning and quantifying protein–DNA crosslinking signatures is suitable for detecting spatial localization changes within a regulatory complex across conditions. To demonstrate our framework's ability to detect regulatory complex reorganization, we applied it to ChIP-nexus data profiling Pol II, the basal transcription factors TBP, TFIIA, TFIIB, TFIIF, TAF2, XPB, and negative elongation factor E (NELFE) in Drosophila melanogaster Kc167 cells (26). Shao and Zeitlinger collected data profiling these factors both under control conditions and after applying Triptolide (TRI) treatment, which blocks transcriptional initiation. We therefore sought to apply ChExAlign to this collection of ChIP-nexus datasets to quantify TRI-dependent changes in protein–DNA crosslinking profiles across factors.
We applied ChExAlign to align 16 ChIP-nexus profiles (eight factors, each in two conditions) across the top 500 most enriched Pol II peak locations (Figure 5A, C). The multiple profile alignment procedure tightly aligns ChIP-nexus profiles across these factors and conditions, as indicated by the high consistency between our alignment and annotated gene TSSs in the regions; 99% regions were aligned in the same orientation as the annotated TSS, and 53% of annotated TSSs were within 10 bp of the consensus TSS position within the multiple profile alignment (Figure 5B; Supplementary Figure S3). We next used our mixture model approach to quantify crosslinking strengths for each factor and across conditions (e.g. Figure 5D).
Figure 5.
ChExAlign enables quantification of regulatory complex reorganization across protein components and conditions. (A) ChIP-nexus multiple profile alignments for Pol II and TFIIB in control and TRI treatment conditions. Alignments are performed across the top 500 Pol II binding event locations. Heatmaps are sorted based on the order resulting from the multiple alignment. (B) Annotated Drosophila gene TSS positions plotted relative to the multiple profile alignment. (C) Aligned ChIP-nexus tag patterns of eight factors with or without TRI treatment. (D) Example of ChIP-nexus tag deconvolution at the position +2 over the TFIIB and Pol II profile (control and TRI treatment condition). (E) Fold difference in crosslinking strengths for eight factors at detected crosslinking event positions. Red represents relative increase in crosslinking strength under the TRI treatment, while blue represents relative increase in crosslinking strength under the control condition. In all displayed plots, coordinates are displayed with respect to the mode position of refGene TSS annotations across the aligned regions.
As illustrated in Figure 5C and D, Pol II’s crosslinking profile in the control condition shows one major peak at alignment position +33, while the profile in the TRI treatment condition shows additional crosslinking at alignment position -19. This apparent shift of Pol II occupancy is explained in detail in the source publication; the shift in crosslinking to the –19 position is due to the accumulation of Pol II at the pre-initiation complex (PIC) when it is prevented from moving into the downstream transcriptional pause site by TRI treatment (26). Similarly, an increase in TFIIB occupancy is observed at position –19 in the TRI treatment condition compared to the control. Our approach allows us to quantify this effect. Comparison of crosslinking quantification across conditions shows a 2-fold increase in Pol II crosslinking strength at the –19 PIC position after TRI treatment, while the +33 pause site position shows a 47% decrease in crosslinking strength. Other changes discussed in the original publication are also provided with a quantitative basis by our approach. For example, XPB, the direct target of TRI treatment, shows a 2-fold increase in crosslinking strength at the +16 position after TRI treatment, consistent with it blocking transcriptional initiation at that site. Similarly, a 60% decrease in TBP crosslinking at the +33 position after TRI treatment is consistent with TBP enrichment at that location being dependent on Pol II moving into the pause site. While these factors show distinct differences between the conditions, factors such as NELFE and TAF2 retained similar crosslinking patterns after the TRI treatment (Figure 5C, E). Our results thus demonstrate that crosslinking profile alignment followed by mixture model analysis enables us to quantify changes in regulatory complex organization between different conditions as well as across different experimental targets.
DISCUSSION
We have presented ChExAlign, a principled framework for characterizing ChIP-exo crosslinking patterns across multiple members of a protein–DNA complex. Our approach implements a multiple alignment procedure that is optimized for aligning multi-dimensional ChIP-exo profiles, and which does not rely on sequence motif or genomic annotation information. ChExAlign also encapsulates a novel mixture modeling approach for estimating the locations and relative strengths of crosslinking events within a set of aligned ChIP-exo profiles. As we have demonstrated, the resulting crosslinking matrix can be used to visualize the relative spatial relationships between members of a protein–DNA complex (via dimensionality reduction), and to quantify shifts in spatial positioning across experiment conditions. One caveat is that ChExAlign is designed to analyze crosslinking signatures between known or hypothesized members of a regulatory complex; it is not designed to be an exploratory tool where one can determine if a given protein belongs to a regulatory complex. ChExAlign's procedures will estimate relative crosslinking strengths for all constituent ChIP-exo experiments at characterized crosslinking positions, even if one of the underlying ChIP-exo profiles represents non-specific background signal. It is further challenging to assess the statistical significance of a marginal improvement in alignment scores in alignments that include or exclude a constituent experiment.
By applying ChExAlign to characterize the spatial organization of three distinct regulatory complexes, we have demonstrated that our approach is generally applicable to collections of ChIP-exo data profiling multiple members of a protein–DNA complex that is consistently organized across a set of genomic regions. Several previous studies have proposed the idea of characterizing the organization of protein–DNA complexes from ChIP-exo data (9,26,41). However, these previous approaches were typically limited to aligning data manually around known focal points of the protein–DNA complex (e.g. TF binding motifs or genomic annotations) and thus become impractical when analyzing large protein–DNA complexes that may be crosslinked at multiple unknown locations within regulatory regions.
The gold standard for characterizing the spatial organization of regulatory complexes is the creation of 3D structural models by applying structural biology techniques such as X-ray crystallography or cryo-EM to purified complexes. Our approaches by no means result in the same type of structural information. However, careful analysis of protein–DNA crosslinking patterns might add useful orthogonal information to structural studies. For example, ChIP-exo crosslinking analysis might help to confirm the in vivo relevance of 3D structures that have been defined in vitro. Similarly, analysis of how crosslinking patterns vary across regulatory regions or experimental conditions may help to clarify how consistent the structure of a regulatory complex is in biological conditions.
In summary, we have demonstrated that our framework enables new forms of insight from analysis of multiple ChIP-exo crosslinking patterns, taking another step towards the fine-grained characterization of spatial organization within protein–DNA complexes. As larger collections of high-resolution protein–DNA interaction data become available, our analysis framework will contribute to investigating transcriptional mechanisms in greater detail.
DATA AVAILABILITY
Open source code (MIT license) and documentation for ChExAlign are available from https://github.com/seqcode/chexalign. All ChIP-exo sequencing data produced in this study has been uploaded to GEO under accession GSE140923.
Supplementary Material
ACKNOWLEDGEMENTS
Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Contributor Information
Naomi Yamada, Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
Matthew J Rossi, Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
Nina Farrell, Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
B Franklin Pugh, Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
Shaun Mahony, Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
National Institutes of Health [R01GM125722 to S.M., B.F.P.]; Early development of the ChExAlign platform was also supported by the National Science Foundation ABI Innovation [DBI1564466 to S.M.]. Funding for open access charge: NIH.
Conflict of interest statement. B.F.P. has a financial interest in Peconic, LLC, which utilizes the ChIP-exo technology implemented in this study and could potentially benefit from the outcomes of this research.
REFERENCES
- 1. Dunham I., Kundaje A., Aldred S.F., Collins P.J., Davis C.A., Doyle F., Epstein C.B., Frietze S., Harrow J., Kaul R. et al.. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Roadmap Epigenomics Consortium Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J. et al.. Integrative analysis of 111 reference human epigenomes. Nature. 2015; 518:317–330. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Guo Y., Gifford D.K.. Modular combinatorial binding among human trans-acting factors reveals direct and indirect factor binding. BMC Genomics. 2017; 18:45. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Giannopoulou E.G., Elemento O.. Inferring chromatin-bound protein complexes from genome-wide binding assays. Genome Res. 2013; 23:1295–1306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Xie D., Boyle A.P., Wu L., Zhai J., Kawli T., Snyder M.. Dynamic trans-acting factor colocalization in human cells. Cell. 2013; 155:713–724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. He Q., Johnston J., Zeitlinger J.. ChIP-nexus enables improved detection of in vivo transcription factor binding footprints. Nat. Biotechnol. 2015; 33:395–401. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Rhee H.S., Pugh B.F.. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell. 2011; 147:1408–1419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Rhee H.S., Pugh B.F.. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature. 2012; 483:295–301. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Reja R., Vinayachandran V., Ghosh S., Pugh B.F.. Molecular mechanisms of ribosomal protein gene coregulation. Genes Dev. 2015; 29:1942–1954. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Starick S.R., Ibn-Salem J., Jurk M., Hernandez C., Love M.I., Chung H.-R., Vingron M., Thomas-Chollier M., Meijsing S.H.. ChIP-exo signal associated with DNA-binding motifs provide insights into the genomic binding of the glucocorticoid receptor and cooperating transcription factors. Genome Res. 2015; 25:825–835. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Yamada N., Lai W.K.M., Farrell N., Pugh B.F., Mahony S.. Characterizing protein–DNA binding event subtypes in ChIP-exo data. Bioinformatics. 2019; 35:903–913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Lim H.-W., Uhlenhaut N.H., Rauch A., Weiner J., Hübner S., Hübner N., Won K.-J., Lazar M.A., Tuckermann J., Steger D.J.. Genomic redistribution of GR monomers and dimers mediates transcriptional response to exogenous glucocorticoid in vivo. Genome Res. 2015; 25:836–844. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Rossi M.J., Lai W.K.M., Pugh B.F.. Simplified ChIP-exo assays. Nat. Commun. 2018; 9:2842. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Lai W.K.M., Buck M.J.. ArchAlign: coordinate-free chromatin alignment reveals novel architectures. Genome Biol. 2010; 11:R126. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Hon G., Ren B., Wang W.. ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome. PLoS Comput. Biol. 2008; 4:e1000201. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Nielsen F.G.G., Markus K.G., Friborg R.M., Favrholdt L.M., Stunnenberg H.G., Huynen M.. CATCHprofiles: clustering and alignment tool for chip profiles. PLoS One. 2012; 7:e28272. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Nair N.U., Kumar S., Moret B.M.E., Bucher P.. Probabilistic partitioning methods to find significant patterns in ChIP-Seq data. Bioinformatics. 2014; 30:2406–2413. [DOI] [PubMed] [Google Scholar]
- 18. Needleman S.B., Wunsch C.D.. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 1970; 48:443–453. [DOI] [PubMed] [Google Scholar]
- 19. Durbin R., Eddy S.R., Krogh A., Mitchison G., Durbin R., Eddy S.R., Krogh A., Mitchison G.. Pairwise alignment. Biological Sequence Analysis. 2010; 12–45. [Google Scholar]
- 20. Liang K., Keles S.. Normalization of ChIP-seq data with control. BMC Bioinformatics. 2012; 13:199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Figueiredo M.A.T., Jain A.K.. Unsupervised learning of finite mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2002; 24:381–396. [Google Scholar]
- 22. Mahony S., Edwards M.D., Mazzoni E.O., Sherwood R.I., Kakumanu A., Morrison C.A., Wichterle H., Gifford D.K.. An integrated model of multiple-condition ChIP-Seq data reveals predeterminants of Cdx2 binding. PLoS Comput. Biol. 2014; 10:e1003501. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Rhee H.S., Pugh B.F.. ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol. 2012; doi:10.1002/0471142727.mb2124s100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R.. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009; 25:2078–2079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Vinayachandran V., Reja R., Rossi M.J., Park B., Rieber L., Mittal C., Mahony S., Pugh B.F.. Widespread and precise reprogramming of yeast protein-genome interactions in response to heat shock. Genome Res. 2018; 28:357–366. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Shao W., Zeitlinger J.. Paused RNA polymerase II inhibits new transcriptional initiation. Nat. Genet. 2017; 49:1045–1051. [DOI] [PubMed] [Google Scholar]
- 27. Li H., Durbin R.. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25:1754–1760. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Machanick P., Bailey T.L.. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011; 27:1696–1697. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Weirauch M.T., Yang A., Albu M., Cote A.G., Montenegro-Montero A., Drewe P., Najafabadi H.S., Lambert S.A., Mann I., Cook K. et al.. Determination and inference of eukaryotic transcription factor sequence specificity. Cell. 2014; 158:1431–1443. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Knight B., Kubik S., Ghosh B., Bruzzone M.J., Geertz M., Martin V., Dénervaud N., Jacquet P., Ozkan B., Rougemont J. et al.. Two distinct promoter architectures centered on dynamic nucleosomes control ribosomal protein gene transcription. Genes Dev. 2014; 28:1695–1709. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Deprez E., Arrebola R., Conesa C., Sentenac A.. A subunit of yeast TFIIIC participates in the recruitment of TATA-binding protein. Mol. Cell. Biol. 1999; 19:8042–8051. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Kassavetis G.A., Braun B.R., Nguyen L.H., Peter Geiduschek E.. S. cerevisiae TFIIIB is the transcription initiation factor proper of RNA polymerase III, while TFIIIA and TFIIIC are assembly factors. Cell. 1990; 60:235–245. [DOI] [PubMed] [Google Scholar]
- 33. Dumay-Odelot H., Acker J., Arrebola R., Sentenac A., Marck C.. Multiple roles of the tau131 subunit of yeast transcription factor IIIC (TFIIIC) in TFIIIB assembly. Mol. Cell. Biol. 2002; 22:298–308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Rüth J., Conesa C., Dieci G., Lefebvre O., Düsterhöft A., Ottonello S., Sentenac A.. A suppressor of mutations in the class III transcription system encodes a component of yeast TFIIIB. EMBO J. 1996; 15:1941–1949. [PMC free article] [PubMed] [Google Scholar]
- 35. Chaussivert N., Conesa C., Shaaban S., Sentenac A.. Complex interactions between yeast TFIIIB and TFIIIC. J. Biol. Chem. 1995; 270:15353–15358. [DOI] [PubMed] [Google Scholar]
- 36. Han Y., Yan C., Fishbain S., Ivanov I., He Y.. Structural visualization of RNA polymerase III transcription machineries. Cell Discov. 2018; 4:40. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Liao Y., Willis I.M., Moir R.D.. The Brf1 and Bdp1 subunits of transcription Factor TFIIIB Bind to overlapping sites in the tetratricopeptide repeats of Tfc4. J. Biol. Chem. 2003; 278:44467–44474. [DOI] [PubMed] [Google Scholar]
- 38. Liao Y., Moir R.D., Willis I.M.. Interactions of Brf1 peptides with the tetratricopeptide repeat-containing subunit of TFIIIC inhibit and promote preinitiation complex assembly. Mol. Cell. Biol. 2006; 26:5946–5956. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39. Male G., von Appen A., Glatt S., Taylor N.M.I., Cristovao M., Groetsch H., Beck M., Müller C.W.. Architecture of TFIIIC and its role in RNA polymerase III pre-initiation complex assembly. Nat. Commun. 2015; 6:7387. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Khoo S.-K., Wu C.-C., Lin Y.-C., Lee J.-C., Chen H.-T.. Mapping the protein interaction network for TFIIB-related factor Brf1 in the RNA polymerase III preinitiation complex. Mol. Cell. Biol. 2014; 34:551–559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41. Rhee H.S., Closser M., Guo Y., Bashkirova E. V., Tan G.C., Gifford D.K., Wichterle H.. Expression of terminal effector genes in mammalian neurons is maintained by a dynamic relay of transient enhancers. Neuron. 2016; 92:1252–1265. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Open source code (MIT license) and documentation for ChExAlign are available from https://github.com/seqcode/chexalign. All ChIP-exo sequencing data produced in this study has been uploaded to GEO under accession GSE140923.












