Genomic organization of human transcription initiation complexes

Bryan J Venters; B Franklin Pugh

doi:10.1038/nature12535

This article has been retracted.

Retraction in: Nature. 2014 Jul 23;513(7518):444 See also: PMC Retraction Policy

. Author manuscript; available in PMC: 2014 May 13.

Published in final edited form as: Nature. 2013 Sep 18;502(7469):53–58. doi: 10.1038/nature12535

Genomic organization of human transcription initiation complexes

Bryan J Venters ^1,², B Franklin Pugh ¹

PMCID: PMC4018585 NIHMSID: NIHMS514597 PMID: 24048476

Abstract

The human genome is pervasively transcribed, yet only a small fraction is coding. Here we address whether this noncoding transcription arises at promoters, and detail the interactions of initiation factors TBP, TFIIB, and RNA polymerase (Pol) II. Using ChIP-exo, we identify ~160,000 transcription initiation complexes across the human K562 genome, and more in other cancer genomes. Only ~5% associate with mRNA genes. The remaining associate with non-polyadenylated noncoding transcription. Regardless, Pol II moves into a transcriptionally paused state, and TBP/TFIIB remain at the promoter. Remarkably, the vast majority of locations contain the four core promoter elements: BRE_u, TATA, BRE_d, and INR, in constrained positions. All but the INR also reside at Pol III promoters, where TBP makes similar contacts. This comprehensive and high resolution genome-wide detection of the initiation machinery produces a consolidated view of transcription initiation events from yeast to humans at Pol II/III TATA-containing/TATA-less coding and noncoding genes.

Keywords: ChIP-seq, ChIP-exo, GTFs, human, promoter elements, noncoding transcription

The classic paradigm for assembling the minimal core transcription machinery at mRNA promoters starts with the recruitment of the TATA binding protein (TBP) to the TATA box core promoter element¹. Next is the docking of TFIIB, which straddles TBP and locks onto flanking TFIIB-responsive elements (BRE_u and BRE_d)^2,3. Together with TFIIF, TFIIB then engages Pol II in its active site to help set the start site of transcription (TSS) at an Initiator element (INR)^4-6. The recruitment of the transcription machinery has long been thought to be an important rate-limiting step in gene expression⁷. Concepts in transcription initiation by all three RNA polymer-ases (I, II, and III) have been guided by this basic theme⁸.

Clashing with this seemingly simplified view is that the TATA box has been identified at only ~10% of human promoters^9,10, with most genes ostensibly being classified as “TATA-less” in all three RNA polymerase systems. The other core promoter elements are apparently equally rare. A second complication of the classic view, particular to multi-cellular eukaryotes, is that the general transcription factors may be largely pre-assembled at promoters. There Pol II is in a transcriptionally engaged but paused state, approximately 30-50 bp downstream from the TSS^11-13. A third complication is that transcription of genomes is not restricted to coding genes, but appears to be quite pervasive, without clear evidence of being coupled to definable promoters¹⁴. These complications, together, paint a seemingly complex picture of eukaryotic transcription initiation.

Towards reconciling simplistic models against complex data, we recently developed the ChIP-exo assay to map sites of protein-DNA interactions at near single-base resolution¹⁵. We discovered in yeast that so-called TATA-less promoters actually possess degenerate versions of the TATA-box, and that most yeast promoters assemble the transcription machinery fundamentally in accord with the classic paradigm¹⁶, although a deep dichotomy between the TATA/SAGA/stress-induced genes and TATA-less/TFIID/housekeeping genes remains. This led us to consider whether similar simplicity was true in humans, albeit with additional complications of paused polymerase and pervasive noncoding transcription.

TBP/TFIIB separation from paused Pol II

Using ChIP-exo, we detected 159,117 TFIIB locations (peak pairs) in K562 cells, of which 36% were associated with ENCODE-defined transcriptional domains (Extended Data Fig. 1a)¹⁷. Remarkably, half were associated with heterochromatic regions, which are generally thought to be devoid of stable RNA production. However, heterochromatic transcription may be more pervasive.

Extended Data Figure 1 — a, Pie chart of all 159,117 TFIIB-bound locations in K562 cells parsed into ENCODE-annotated regions. b, Venn overlap among mRNA genes having TBP or TFIIB locations (<500 bp from its TSS) and genes with measured polyadenylated mRNA levels detected by RNA-seq³⁸. Data thresholding may contribute to nonoverlapping sets. c, Moving average (100-gene) of mRNA levels versus TFIIB/TBP/Pol II occupancy levels on a median-centered log₂ scale.

We assigned a TBP/TFIIB location to >50% of all annotated protein-coding K562-expressed genes (Extended Data Fig. 1b), thereby providing independent validation. Seemingly expressed genes that lacked a TBP/TFIIB location may have arisen from multiple sources including rare but stable mRNAs, detection noise, and antisense transcription arising from a more distal promoter. TBP/TFIIB/Pol II occupancy and mRNA levels were correlated (Extended Data Fig. 1c), as expected of recruitment being at least partially rate-limiting in gene expression.

We initially focused on all 8,364 K562 TFIIB locations near the TSS of 6,511 coding RNAs as defined by RefSeq¹⁸. Fig. 1a provides one example of the raw tag distribution and the identified core promoter elements concentrated ~25 bp upstream of the RPS12 ribosomal protein gene TSS. When individual genes were examined (Fig. 1b), or averaged across all 6,511 genes (Fig. 1c), two regions of high TFIIB/TBP/Pol II occupancy were observed. The major right-ward peaks corresponded to primary promoter transcription initiated complexes (Fig. 1c, upper panel). Those in the left-ward direction matched divergent TSSs^19-22, although the resulting RNA was less abundant than expected from TFIIB/TBP/Pol II occupancy levels (Fig. 1c, lower vs upper panel; Note that 2° TSS represents only 24% of the total TSS signal). This may result from RNA instability, as seen in yeast. The clear spatial separation of complexes indicates that divergent transcripts arise from distinct initiation complexes, most (78%) of which were in CpG islands. On average two complexes were detected per CpG island²³, regardless of island length, with the center of the island being enriched ~100 bp downstream of the primary TSS (Extended Data Fig. 2a,b). Complexes tended to be separated by 70-180 bp (Extended Data Fig. 2c, red), and had un-correlated occupancies (Extended Data Fig. 2c, black), which suggests that they are regulated independently.

a, Smoothed distribution of strand-separated ChIP-exo tag 5’ ends at the RPS12 gene. Core promoter elements are shown with lower case denoting mismatches to the consensus. b, Peak-pair distribution or RNA at RefSeq genes (rows). Rows are linked, and sorted by TFIIB occupancy. c, Upper panel: Averaged ChIP-exo patterns around the closest (1°) RefSeq TSS. The “spikes” of TBP and TFIIB are indiscernible (vertically offset in inset). Lower panel: Distribution of 2° polyadenylated RNA³⁸, with traces separated by sense (blue) and antisense (red, inverted trace) orientations relative to the corresponding mRNA TSS.

Extended Data Figure 2 — a, Peak-pair distribution for TFIIB, TBP and Pol II at the 5,095 CpG islands that overlap with the mRNA TSSs from Figure 1b (78% overlap), and with the direction of transcription to the right. Rows are linked, and sorted by CpG island length. CpG island borders are indicated by blue and red bars, respectively. b, Shown is the averaged data from panel a. c, All 159,117 TFIIB locations were sorted by location, and inter-TFIIB distances calculated (red trace). Data were then sorted by distance, and the standard deviation of TFIIB occupancy was calculated on a sliding window of 30 values. Peak calling parameters preclude detection of two separate TFIIB locations <~40 bp apart. Those that were 40-70 bp apart were correlated, whereas those >~70 bp apart were uncorrelated.

For the vast majority of transcription units, Pol II crosslinked 50 bp downstream of the primary TSS (Fig. 1b, c), where it is expected to pause after initiating transcription¹³. Pol II was most depleted over the core promoter, indicating that it does not stably reside there in proliferating K562 cells. Therefore when Pol II enters the core promoter, it rapidly initiates transcription and then moves into a paused state ~50 bp downstream, thereby preventing any new polymerase from detectably engaging the core promoter.

The crosslinking pattern of human TFIIB was of particular interest since TFIIB in budding yeast crosslinks broadly across the relatively stable single-stranded DNA region within the Pol II active site at core promoters¹⁶, in accord with crystallographic models of “open” complexes²⁴. Remarkably, human TFIIB maintained its contact within this region, despite the absence of polymerase (Fig. 1c, upper panel). Mechanistically, this might occur via TFIIB contacts with BRE_d³ (see below), which are absent in budding yeast. The coincidence of TBP and TFIIB crosslinking at the BRE_d suggests that TBP may be predominantly crosslinking to TFIIB there, rather than directly to DNA.

BRE_u TATA, BRE_d and INR are common

We looked for core promoter elements (illustrated in Fig. 2a) within the narrow intervals defined by 8,364 mRNA TSS-proximal TFIIB locations. Remarkably, and consistent with yeast¹⁶, nearly 85% of them had a sequence with 0-3 mismatches to the TATA-box consensus (TATAWAWR)²⁵ (Fig. 2b-c). Less than 3% had a perfect match to the consensus. Deviations from the TATA box consensus inversely correlated with TFIIB and TBP occupancy levels (Extended Data Fig. 3a), indicating that TATA element sequence quality contributes to their occupancy level, consistent with previous observations²⁶ on their in vivo functionality.

a, Core promoter schematic. b, Nucleotide distribution for TATA elements with 0-3 mismatches (panels) to the consensus, and sorted by ascending p-value. Colors are reflected in the logo color. c, Cumulative percent of TFIIB locations having a TATAWAWR sequence with 0-3 mismatches (solid line). Controls include a randomized sequence (60% GC, dashed black line), a scrambled consensus (dashed red line), and 8,364 locations represented by a single background tag (dashed gray line). d, Distance of strand-specific TFIIB peaks (exonuclease stop sites) from TATA element midpoints. Opposite-strand peaks are in red and inverted.

Extended Data Figure 3 — a, Average TFIIB and TBP occupancy parsed by the number of mismatches to the TATA consensus. b, Distribution of each candidate core promoter element relative to each other.

Several controls put the false positive rate for TATA elements at ~20% (Fig. 2c). First, 10,000 randomly generated sequences having the same human genome sequence bias found that only 16% were called by chance. Second, a scrambled version of the motif (having 0-3 mismatches) was identified only 20% of the time, and had no positional relationship with TFIIB/TBP (not shown). Third, coordinates having a single isolated tag were used to generate an essentially random set of false-positive locations, and the analysis repeated. TATA elements (0-3 mismatches) were identified only 20% of the time. Fourth, whereas control sequences were distributed randomly across the query space, the distribution of TATA elements was not random. Instead it displayed a tight peak 20 bp upstream of TFIIB and TBP locations (Fig. 2d, and data not shown).

TFIIB in complex with TBP makes sequence-specific contacts with BRE_u and BRE_d, which flank the TATA box^2,3 and are upstream of the INR (Fig. 2a). However, these elements are essentially nonexistent in yeast, and ill-defined across mammalian genomes. Using the identified TATA elements as a reference point, we searched upstream for the BRE_u and downstream for the BRE_d and INR. Strikingly, in nearly every instance a sequence with three or less mismatches to the literature-derived consensus for BRE (SSRCGCC)², BRE_d(RTDKKKK)³, and INR (YYANWYY)²⁷ was found (Fig. 3a-c). Remarkably, sequences within each element appeared to co-vary. For example, the BRE_d consensus tended towards either GTKGGGG or ATKTTTT, rather than an equal mixture of all possible combinations (Fig. 3b), making them less degenerate than the consensus would suggest. Similarly, the INR consensus tended towards either CCANWCC or TTANWTT (Fig. 3c). Sequence bifurcation was not observed with TATA or BRE_u elements. Given the strong bias towards either strong (G/C) or weak (A/T) base-pairing, this sequence dimorphism may reflect selection for distinct thermodynamic stabilities towards helix melting, which is an essential first step in initiation at these elements. Consistent with this, A/T-rich BRE_d and INR elements had substantially higher crosslinking levels of TFIIB than their G/C-rich counterparts (not shown). However, this may not explain the strand bias of the sequences.

**a-c**, Nucleotide distribution for BRE_u, BRE_d, and INR, vertically separated by 0-3 mismatches to the consensus, and sorted by ascending p-value within panels. d, Distance of strand-specific TFIIB peaks from BRE_u, BRE_d, and INR. Opposite-strand peaks are in red and inverted. e, Cumulative percent of genes with 0-3 mismatches to each motif in panels a-c. Controls were randomized sequences (60% GC, dashed lines). f, Distribution of core promoter elements relative to TATA box borders.

Similar to our TATA analysis, the TFIIB peak density was tightly focused at a fixed distance from each core promoter element (Fig. 3d), and randomized controls were rarely found (Fig. 3e), thereby validating them. TFIIB peak-pairs were centered over BRE_d, suggesting that the primary crosslinking point is through the BRE_d. Unlike the TATA element, the BRE and INR elements deviated relatively little from their consensus (compare Figs. 2c and 3e) and such deviations did not correlate with TBP and TFIIB occupancy levels (not shown). Thus, BRE and INR sequence variability may regulate occupancy of the basal initiation complex to a lesser extent than TATA. Within their search space, the locations of each core promoter element peaked at previously-defined canonical positions (Fig. 3f and Extended Data Fig. 3b), thereby providing cross-validation and a core promoter consensus: SSRCGCCNNNTATAWAWRNNRTDKKKKNNNNYYANWYY. The tolerance for mismatches in these elements appears to be 2-3-2-1, respectively.

150,000 noncoding initiation complexes

We next examined the remaining 150,753 putative TFIIB locations that were far (>500 bp) from a protein-coding gene (Extended Data Table 1). At a 20% false discovery rate per element, we identified at least 3 of the 4 core promoter elements at 97% of all non-mRNA TFIIB locations (Extended Data Fig. 4a). Deviations from the consensus were no more than at mRNA genes (average of 5 deviations across 28 positions within the four core promoter elements). TBP, TFIIB, and Pol II peaked at the same canonical distances from each motif as found at mRNA promoters (Extended Data Fig. 4b,c). They were also embedded in the same chromatin environment as mRNA promoters (Fig. 4a,b), but displayed comparatively lower TFIIB occupancy (Extended Data Fig. 4d).

Extended Data Table 1. Illumina sequencing statistics.

Summary of uniquely mapped sequencing reads for each biological replicate.

Factor	Antibody	Cell Line	Total Reads	Uniquely Mapped Reads	Unique Mapping Rate
Input	none	K562	126,007,656	104,591,819	83%
Input	none	K562	109,745,112	91,160,835	83%
		Totals:	235,752,768	195,752,654
TBP	sc-204	K562	97,896,951	60,581,579	62%
TBP	sc-204	K562	181,420,753	132,655,896	73%
TBP	sc-204	K562	200,167,837	115,213,419	58%
		Totals:	479,485,541	308,450,894
TFIIB	sc-225	K562	64,473,390	43,727,825	68%
TFIIB	sc-225	K562	129,513,614	80,930,721	62%
		Totals:	193,987,004	124,658,546
Pol II	sc-899	K562	40,833,504	31,260,456	77%
Pol II	sc-899	K562	119,799,682	88,431,598	74%
		Totals:	160,633,186	119,692,054
TFIIB	sc-225	HeLa-S3	62,249,055	41,815,431	67%
TFIIB	sc-225	HeLa-S3	185,240,056	123,002,393	66%
		Totals:	247,489,111	164,817,824
TFIIB	sc-225	HepG2	78,313,847	50,505,201	64%
TFIIB	sc-225	HepG2	264,530,278	172,112,282	65%
		Totals:	342,844,125	222,617,483
TFIIB	sc-225	MCF7	25,615,261	14,780,271	58%
TFIIB	sc-225	MCF7	120,958,757	28,600,410	24%
		Totals:	146,574,018	43,380,681

Open in a new tab

Extended Data Figure 4 — a, Bar graph showing the percentage of all 150,754 putative “noncoding” TFIIB binding locations (>500 bp from an annotated RefSeq TSS) that have the indicated number of core promoter elements. b, Distribution of ChIP-exo peaks on each strand relative to the indicated core promoter element, for 150,754 putative “noncoding” TFIIB locations. Opposite strand traces (red) are inverted. c, Distribution of TBP (purple) and Pol II (black) peak-pair midpoints relative to the TATA motif midpoint derived from the 150,754 TFIIB putative “noncoding” locations. d, TFIIB occupancy versus percentage of locations that code for proteins. All 159,117 TFIIB locations were sorted by occupancy level, then the percentage of locations linked to an annotated RefSeq feature was plotted as a moving average.

a, Distribution of chromatin marks around TFIIB at RefSeq genes (left) and ncRNA (right). b, TFIIB locations that overlap with chromatin marks and epigenetic regulators³⁹. c, Distribution of polyadenylated³⁸ and non-polyadenylated⁴⁰ RNA-seq tags around TFIIB >500 bp from a RefSeq TSS. Percentages reflect TFIIB having an RNA tag <2 kb away. Left panels include sense (blue) and antisense (red and inverted) strands for RefSeq genes, which was not applied to ncRNA (right panels). d, 100-gene moving average of polyadenylated and nonpolyadenylated RNA levels versus TFIIB occupancy at mRNA and ncRNA genes (left and right panels, respectively) on a median-centered log₂ scale.

Remarkably, TBP/TFIIB/Pol II complexes were linked to the production of nonpolyadenylated RNA (87% had them) rather than polyadenylated transcripts (Fig. 4c and Extended Data Fig. 5), which is in agreement with the finding of enhancer RNAs²⁸. Their locations mapped precisely to the location of TFIIB. Nonpolyadenylated transcript levels also correlated more strongly with “noncoding” TFIIB occupancy than did polyadenylated levels (Fig. 4d), further validating the link. Taken together, we conclude that the vast majority of all 159,117 TFIIB locations (noncoding plus coding) detected in K562 cells represent bona fide and fundamentally identical core promoter initiation complexes of which ~5% produce mRNA and ~95% produce RNA that is non-polyadenylated and noncoding.

Extended Data Figure 5 — Frequency distribution RNA 5’ ends for Poly(A)+³⁸ (top plots) and ENCODE project RNA fractions⁴⁰ as indicated to the far left. Traces in the left panels are separated by sense (blue) and anti-sense (red, inverted) orientations relative to the corresponding mRNA TSS, which is directed to the right. Since the TSS orientation is not known for the Poly(A)- ncRNA loci, positive and negative strand tags were plotted relative to the TFIIB midpoint. The percent of putative TFIIB locations that exist within 2 kb of an RNA tag are indicated in the upper right corner of each plot.

Restricted motif spacing in promoters

We searched for an overall core promoter element (CPE) consensus (SSRCGCCNNNTATAWAWRNNRTDKKKKNNNNYYANWYY) and ~40 spacing variants within 100 bp of all TFIIB locations, and plotted their distribution relative to TFIIB (Extended Data Fig. 6). Remarkably, the consensus spacing defined in Fig. 3f displayed the strongest positional relationship with TFIIB (Fig. 5a). For example, a consensus having the spacing between BRE_d and INR reduced by 1 bp displayed almost no positional relationship with TFIIB (red vs black thick traces in Fig. 5a), as would be expected of a random/nonfunctional sequence.

Extended Data Figure 6 — Candidate CPE at varying distances from all 159,117 TFIIB locations, for the indicated spacing variants (not all possible combination were tested). Digits within spacing variant names reflect the bp spacing (N) between elements (e.g., “324” denotes BRE_u-NNN-TATA-NN-BRE_d-NNNNINR). The collection of core promoters illustrated at the top had the strongest positional enrichment, and thus were used to associate candidate core promoters with TFIIB (*Extended Data Table 1*).

a, Candidate core promoter enrichment at varying distances from all 159,117 TFIIB locations, for spacing variants “323” and “324”, for motifs with weak (thick lines) and strong (thin lines) p-values. b, Traces from panel a and *Extended Data Fig. 6*, were transformed into enrichment scores and shown as a table, sectored by element spacing, and at two motif p-value intervals. Values are heat-map colored from green to light gray. Configurations in white were not examined. c, Schematic of core promoters having the strongest positional correlation with TFIIB, rank ordered by opacity. “324” (***) stood out as the strongest.

There was very little or no tolerance for variable spacing between core promoter elements (Fig. 5b), which reflects structural constraints of the initiation complex⁵. Surprisingly, proper spacing was accompanied by greater sequence deviations within individual core promoter elements (thick vs thin black line in Fig. 5a), whereas small (~1 bp) spacing deviations were accompanied by stronger elements (thick vs thin red line in Fig. 5a, and summarized in Fig. 5b,c). In short, core promoters may be weak by design, through a compensatory balance of sequence and spacing deviations from the consensus. This allows for greater dependence on transcriptional activators, but also provides for a specified basal output.

We conducted ChIP-exo mapping of TFIIB locations across four ENCODE cancer cell lines: HeLa S3, HepG2, and MCF7 in addition to K562 (cervical, liver, breast, and blood, respectively). We detected TFIIB at 9,074 mRNA genes in at least one cell line, and at 1,691 genes in all lines (group 1 in Extended Data Fig. 9). Cluster analysis suggested that while TFIIB occupancy levels varied from gene to gene, most were relatively constant at individual genes across cell lines. About a third displayed noticeable cell-type specificity (e.g., group 3 in Extended Data Fig. 9). For noncoding initiation complexes, we focused on those present in two or more cell types, and found 100,349 such locations (376,074 locations were found in at least one cell type). Noncoding complexes appeared to have more cell-type specificity and were bimodally distributed at high and low occupancy levels. This heterogeneity may reflect more numerous and diverse roles for the resulting noncoding transcription and/or RNA in cell-type specific physiology compared to proteins.

tRNA genes have TATA and BRE

With some exception²⁹, tRNA genes have been classically defined as TATA-less, where TFIIIC recognizes specific sequences downstream of the TSS, then recruits TFIIIB to a region immediately upstream of the TSS that lacks apparent sequence specificity^30,31. Pol III then binds to form an initiation complex. TFIIIB contains TBP (and BRF, a factor related to TFIIB) and thus it has been enigmatic as to how TBP in TFIIIB engages the upstream region without a TATA box.

Remarkably, TBP crosslinked ~21 bp upstream of 386 tRNA genes (Fig. 6a, left panel), as seen at Pol II promoters. In nearly every instance we found a TATA element (Fig. 6a, middle) that was ~18 bp further upstream (Fig. 6b). Similar to TBP crosslinking through TFIIB, we suspect that TBP crosslinks through BRF. Indeed, the peaks of BRF and TBP crosslinking are coincident at Pol III genes in mice³². As with Pol II promoters, we found a BRE_d centered between each TBP peak pair (Fig. 6a, right panel) and a BRE_u immediately upstream of TATA (not shown). Enrichment of these elements, but not the Pol II-specific INR³³, were statistically significant (Fig. 6c). Thus, TBP in complex with a TFIIB family member engages a set of BRE_u-TATA-BRE_d core promoter elements similarly in Pol II and III systems.

a, Left panel: TBP peak density separated by forward and reverse strand orientation (blue and red colors, respectively) relative to each tRNA TSS. Corresponding sequences are shown in the right two panels. b, Average distribution of TBP peaks around all identified tRNA TATA elements. c, Cumulative percent of tRNA genes with the indicated promoter element having 0, 1, 2, or 3 mismatches to the consensus. Dashed lines represent calculations for an equivalent number of randomized sequences for the color-linked solid traces.

Consolidated genomic view of initiation

Genome-wide mapping of the general transcription machinery at near single-base resolution offers a consolidated model of certain transcription initiation events from yeast to humans, Pol II to Pol III, TATA-containing to TATA-less, and mRNA to ncRNA. In general, a TFIIB/BRF family member is recruited to all coding or noncoding core promoters via a TBP family member and spatially-constrained core promoter elements. Sequence-specific (BRE_d) contact with the DNA a few bp downstream of TATA, might “bookmark” the site of DNA melting for a rapidly departing Pol II or III. Yeast Pol II is relatively slow to depart, and so it produces equivalent TFIIB-“open” promoter contacts in the absence of a BRE_d. Pol II then scans downstream several bp, where it encounters an INR that allows for productive transcription, which subsequently pauses 30-50 bp further downstream. In yeast, where an INR and pausing appear absent, a nucleosome border may help set the start site of productive transcription.

Although core promoters are seemingly long (~38 bp in human) for sequence-specific binding, they are designed to be inherently low in specificity, presumably to keep basal transcription low and to maintain high dependence on transcriptional activators. Appropriate specificity is achieved via a blend of degeneracy in motif sequence and spacing. Broad clusters of TSSs at mammalian genes⁴ can therefore be explained in terms of clusters of core promoters, many of which may fall below bioinformatic detection.

The discovery that transcription of the human genome is vastly more pervasive than what produces coding mRNA raises the question as to whether Pol II initiates transcription promiscuously through random collisions with chromatin as biological noise or whether it arises specifically from canonical Pol II initiation complexes in a regulated manner. Our discovery of ~150,000 noncoding promoter initiation complexes in human K562 cells and more in other cell lines suggests that pervasive noncoding transcription is promoter-specific, regulated, and not much different from coding transcription, except that it remains nuclear and nonpolyadenylated. An important next question is the extent to which transcription factors regulated this ncRNA.

We detected promoter transcription initiation complexes at 25% of all ~24,000 human coding genes, and found that there were 18-fold more noncoding complexes than coding. We therefore estimate that the human genome potentially harbors as many as 500,000 potential promoter initiation complexes, corresponding to an average of about one every 3 kb in the non-repetitive portion of the human genome. This number may vary more or less depending on what constitutes a meaningful transcription initiation event. The finding that these initiation complexes are largely limited to locations having well-defined core promoters and measured TSSs indicates that they are functional and specific, but it remains to be determined to what end. Their massive numbers would appear to provide an origin for the so-called dark matter RNA of the genome³⁴, and could house a substantial portion of the missing heritability³⁵.

METHODS

Cell Culture

Human chronic myelogenous leukemia cells (K562, ATCC) were maintained between 1×10⁵ – 1×10⁶ cells/milliliter in DMEM media supplemented with 10% bovine calf serum at 37°C with 5% CO₂. Human adenocarcinoma cells from the cervix (HeLa S3, ATCC), liver (HepG2, ATCC), and breast (MCF7, ATCC) were grown in a similar manner as K562 cells except that they were maintained between 25-90% confluence. Cells were washed and phosphate buffered saline (1× PBS, 8 mM Na₂HPO₄, 2 mM KH₂PO₄, 150 mM NaCl, and 2.7 mM KCl) before incu bation with formaldehyde in a final concentration of 1% for 10 minutes. Cells were lysed (10 mM Tris pH 8, 10 mM NaCl, 0.5% NP40, and complete protease inhibitor cocktail (CPI, Roche), and then the nuclei lysed (50 mM Tris pH 8, 10 mM EDTA, 0.32% SDS, CPI). Purified chromatin was resuspended in IP dilution buffer (40 mM Tris pH 8.0, 7 mM EDTA, 56 mM NaCl, 0.4% Triton x-100, 0.2% SDS, and CPI) and sonicated with a Bioruptor (Diagenode) to obtain fragments with a size range between 100 and 500 bp.

ChIP-exo and Antibodies

With the following modifications, ChIP-exo was carried out as previously described³⁶ with chromatin extracted from 10 million cells, ProteinG MagSepharose resin (GE Healthcare), and 3 ug of either TFIIB (Santa Cruz Biotech, sc-225), TBP (Santa Cruz Biotech, sc-204), or Pol II (Santa Cruz Biotech, sc-899, directed against the N-terminus of the Pol II large subunit encoded by POL2RA).

Alignment to Genome, Peak Calling, and Data Access

Libraries were sequenced on an Illumina HiSeq sequencer. The entire length of the sequenced tags were aligned to the human hg18 reference genome using BWA⁴¹ using default parameters. Raw sequencing data are available at NCBI Sequence Read Archive (SRA067908). The resulting sequence read distribution was used to identify peaks on the forward (W) and reverse (C) strand separately using the peak calling algorithm in GeneTrack (sigma = 20, exclusion zone = 40 bp)⁴². For strand-specific and strand-merged plots, sequencing tags were normalization to input. All 11,458 locations that were present in the ENCODE designated blacklist were removed from the analysis. Peaks were paired if they were 0-80 bp in the 3’ direction from each other and on opposite strands. Since patterns described here were evident among individual biological replicates, and replicates were well correlated, we merged all tags from biological replicate data sets to make final peak-pair calls. Peak pairs were considered to be TFIIB if they had a tag count of >4 in the merged datasets. 159,117 locations met these criteria. Peak pair matches across cell lines required that their midpoints be within 80 bp of each other.

NCBI-curated RefSeq TSSs (n=26,987)¹⁸ comprising 23,181 nonredundant mRNA genes were considered. Assignment of TFIIB (8,364 peak-pairs) and TBP (7,642 peak-pairs) to the nearest RefSeq TSS in Extended Data Table 1 required that they be within 500 bp of the TSS, yielding 6,511 nonredundant mRNA genes. Importantly, using a more stringent interval only marginally changed these numbers and did not alter our conclusions. If a gene had >1 TSS, then the TSS nearest to the bound location (peak-pair midpoint) was used as the primary TSS, and other nearby TSSs were considered secondary (Fig. 1f, lower panel).

Motif analysis

At each of these 6,511 promoters, using the MEME suite of tools³⁷, we searched for TATA elements within 80 bp of the midpoint of TFIIB-bound locations on the sense strand, first by searching for the consensus TATAWAWR (Extended Data Table 1), then sequentially for one to three mismatches to the consensus, if an element was not found. In rare cases where multiple elements were found, we chose the one closest to the TFIIB peak. This rule had no qualitative impact on the data since such events were rare and choosing the furthest element gave the same result (not shown). Moreover, peak motif detection for BRE_u, TATA, and INR were not centered over TFIIB, indicating that this distance criteria was not driving the observed motif enrichment at TFIIB locations. Using a similar strategy, we searched for candidate BRE_u element (Supplementary Table 4) within 40 bp upstream of the 5,546 identified TATA elements, and searched for candidate BRE_d and INR elements (Extended Data Table 1) within 40 bp and 60 bp downstream of the 5,546 TATA elements, respectively. At Pol III promoters, candidate BRE_d elements were required to be within 20 bp of a TBP peak pair midpoint, and in the same orientation as the TATA element.

Our searches infrequently picked up multiple motif instances within the search window. Where this did occur, we chose the motif with the best match to the published consensus (not the closest to TFIIB). In the situation where we obtained more than one motif with the same number of mismatches, we chose the one closest to TFIIB. Third, when we discard these multiple occur-rences, the results qualitatively did not change. Fourth, the peak locations that we obtained for BRE_u, TATA, and INR were not centered over TFIIB. Instead they peaked at the canonical location that had been established in the literature. This provided independent validation.

Using a PSPM matrix derived from individual core promoter element (CPE) logos from Figs. 2 and 3 (the matrices and data processing details are presented in Extended Data Table 1), FIMO³⁷ was used to find 37-40 bp sequences within 100 bp of a TFIIB peak pair, and had either a p-value of <10^-4 (thick trace in Fig. 5 and Extended Data Fig. 6) or between 10^-4 and 10^-3 (thin trace). Any CPE <50 bp from a stronger CPE (defined by motif and spacing similarity to the consensus) was eliminated. Distances between the two (TFIIB peak-pair midpoint to consensus BRE_d midpoint, i.e. 13 bp upstream of the CPE 3’ end) were then calculated for those CPE spacing variants listed at the top of Extended Data Fig. 6. Their frequency distribution was then plotted as a 5 bp moving average. Distributions were transformed into enrichment scores by calculating the ratio of occurrences near TFIIB (0-15 bp) to those far from TFIIB (55-70 bp), then log_2-transforming the data.

Supplementary Material

Table S1

NIHMS514597-supplement-Table_S1.xlsx^{(23MB, xlsx)}

Table S2

NIHMS514597-supplement-Table_S2.xlsx^{(21.7MB, xlsx)}

Table S3

NIHMS514597-supplement-Table_S3.xls^{(26.5KB, xls)}

Table S4

NIHMS514597-supplement-Table_S4.xls^{(28.5MB, xls)}

Table S5

NIHMS514597-supplement-Table_S5.xlsx^{(533KB, xlsx)}

One sentence summary.

Widespread coding and noncoding transcription across the human genome arises from discrete transcription initiation complexes assembled at four core promoter elements.

Extended Data Figure 7 — Occupancy levels for TFIIB linked to coding genes (a) and noncoding regions (b) in the indicated cell type were normalized by column. The color scales represent the range of average-centered, log₂ transformed values within each respective column. Detection in all four cell types define Group 1. Groups 2-4 were parsed by k-means clustering. Rows were sorted within groups based on TFIIB occupancy averaged across the four cell types (yellow-black-cyan-gray, denote high, medium, low, and zero occupancy, respectively). For clarity in panel b, TFIIB locations that were detected in only one cell line were excluded from clustering. Columns were hierarchically clustered. The MCF7 dataset had 20-30% of the coverage of other cell lines (reported in *Extended Data Table 1*), which likely accounts for excessive number of zero-occupancy loci (gray).

Acknowledgements

We thank Pindi Albert and Yunfei Li for bioinformatic assistance, and Michael Cousar and Ka-Yim Chan-Salis for experimental support. This work was supported by National Institutes of Health grant GM059055.

Footnotes

Author Contributions B.J.V. performed the experiments and conducted data analyses. B.J.V. and B.F.P. conceived the experiments, analyses, and co-wrote the manuscript.

Author Information Sequencing data have been deposited at the NCBI Sequence Read Archive under accession number SRA067908. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature.

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

REFERENCES

1.Buratowski S, Hahn S, Guarente L, Sharp PA. Five intermediate complexes in transcription initiation by RNA polymerase II. Cell. 1989;56:549–561. doi: 10.1016/0092-8674(89)90578-3. [DOI] [PubMed] [Google Scholar]
2.Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH. New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev. 1998;12:34–44. doi: 10.1101/gad.12.1.34. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Deng W, Roberts SG. A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes Dev. 2005;19:2418–2423. doi: 10.1101/gad.342405. doi:10.1101/gad.342405. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Juven-Gershon T, Kadonaga JT. Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev Biol. 2010;339:225–229. doi: 10.1016/j.ydbio.2009.08.009. doi:10.1016/j.ydbio.2009.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Kostrewa D, et al. RNA polymerase II-TFIIB structure and mechanism of transcription initiation. Nature. 2009;462:323–330. doi: 10.1038/nature08548. doi:10.1038/nature08548. [DOI] [PubMed] [Google Scholar]
6.He Y, Fang J, Taatjes DJ, Nogales E. Structural visualization of key steps in human transcription initiation. Nature. 2013 doi: 10.1038/nature11991. doi:10.1038/nature11991. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Ptashne M, Gann A. Transcriptional activation by recruitment. Nature. 1997;386:569–577. doi: 10.1038/386569a0. doi:10.1038/386569a0. [DOI] [PubMed] [Google Scholar]
8.Vannini A, Cramer P. Conservation between the RNA polymerase I, II, and III transcription initiation machineries. Molecular cell. 2012;45:439–446. doi: 10.1016/j.molcel.2012.01.023. doi:10.1016/j.molcel.2012.01.023. [DOI] [PubMed] [Google Scholar]
9.Kim TH, et al. A high-resolution map of active promoters in the human genome. Nature. 2005;436:876–880. doi: 10.1038/nature03877. doi:10.1038/nature03877. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Carninci P, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006;38:626–635. doi: 10.1038/ng1789. doi:10.1038/ng1789. [DOI] [PubMed] [Google Scholar]
11.Gilmour DS, Lis JT. RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol. 1986;6:3984–3989. doi: 10.1128/mcb.6.11.3984. [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007;130:77–88. doi: 10.1016/j.cell.2007.05.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Kwak H, Fuda NJ, Core LJ, Lis JT. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013;339:950–953. doi: 10.1126/science.1229386. doi:10.1126/science.1229386. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nature reviews. Genetics. 2007;8:413–423. doi: 10.1038/nrg2083. doi:10.1038/nrg2083. [DOI] [PubMed] [Google Scholar]
15.Rhee HS, Pugh BF. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011;147:1408–1419. doi: 10.1016/j.cell.2011.11.013. doi:10.1016/j.cell.2011.11.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Rhee HS, Pugh BF. Genome-wide structure and organization of eukaryotic pre initiation complexes. Nature. 2012;483:295–301. doi: 10.1038/nature10799. doi:10.1038/nature10799. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906. doi:10.1038/nature09906. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–65. doi: 10.1093/nar/gkl842. doi:10.1093/nar/gkl842. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes of human cells. Science. 2008;322:1855–1857. doi: 10.1126/science.1163853. doi:10.1126/science.1163853. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Seila AC, et al. Divergent transcription from active promoters. Science. 2008;322:1849–1851. doi: 10.1126/science.1162253. doi:10.1126/science.1162253. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Core LJ, Lis JT. Transcription regulation through promoter-proximal pausing of RNA polymerase II. Science. 2008;319:1791–1792. doi: 10.1126/science.1150843. doi:10.1126/science.1150843. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Fenouil R, et al. CpG islands and GC content dictate nucleosome depletion in a transcription-independent manner at mammalian promoters. Genome Res. 2012;22:2399–2408. doi: 10.1101/gr.138776.112. doi:10.1101/gr.138776.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Rozenberg JM, et al. All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues. BMC Genomics. 2008;9:67. doi: 10.1186/1471-2164-9-67. doi:10.1186/1471-2164-9-67. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Sainsbury S, Niesser J, Cramer P. Structure and function of the initially transcribing RNA polymerase II-TFIIB complex. Nature. 2012 doi: 10.1038/nature11715. doi:10.1038/nature11715. [DOI] [PubMed] [Google Scholar]
25.Basehoar AD, Zanton SJ, Pugh BF. Identification and distinct regulation of yeast TATA box-containing genes. Cell. 2004;116:699–709. doi: 10.1016/s0092-8674(04)00205-3. [DOI] [PubMed] [Google Scholar]
26.Singer VL, Wobbe CR, Struhl K. A wide variety of DNA sequences can functionally replace a yeast TATA element for transcriptional activation. Genes Dev. 1990;4:636–645. doi: 10.1101/gad.4.4.636. [DOI] [PubMed] [Google Scholar]
27.Smale ST, Baltimore D. The “initiator” as a transcription control element. Cell. 1989;57:103–113. doi: 10.1016/0092-8674(89)90176-1. [DOI] [PubMed] [Google Scholar]
28.Kim TK, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465:182–187. doi: 10.1038/nature09033. doi:10.1038/nature09033. [DOI] [PMC free article] [PubMed] [Google Scholar]
29.Hamada M, Huang Y, Lowe TM, Maraia RJ. Widespread use of TATA elements in the core promoters for RNA polymerases III, II, and I in fission yeast. Mol Cell Biol. 2001;21:6870–6881. doi: 10.1128/MCB.21.20.6870-6881.2001. doi:10.1128/MCB.21.20.6870-6881.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]
30.Geiduschek EP, Tocchini-Valentini GP. Transcription by RNA polymerase III. Annual review of biochemistry. 1988;57:873–914. doi: 10.1146/annurev.bi.57.070188.004301. doi:10.1146/annurev.bi.57.070188.004301. [DOI] [PubMed] [Google Scholar]
31.White RJ, Jackson SP. Mechanism of TATA-binding protein recruitment to a TATA-less class III promoter. Cell. 1992;71:1041–1053. doi: 10.1016/0092-8674(92)90398-v. [DOI] [PubMed] [Google Scholar]
32.Carriere L, et al. Genomic binding of Pol III transcription machinery and relationship with TFIIS transcription factor distribution in mouse embryonic stem cells. Nucleic Acids Res. 2012;40:270–283. doi: 10.1093/nar/gkr737. doi:10.1093/nar/gkr737. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Verrijzer CP, Chen JL, Yokomori K, Tjian R. Binding of TAFs to core elements directs promoter selectivity by RNA polymerase II. Cell. 1995;81:1115–1125. doi: 10.1016/s0092-8674(05)80016-9. [DOI] [PubMed] [Google Scholar]
34.Kapranov P, St Laurent G. Dark Matter RNA: Existence, Function, and Controversy. Frontiers in genetics. 2012;3:60. doi: 10.3389/fgene.2012.00060. doi:10.3389/fgene.2012.00060. [DOI] [PMC free article] [PubMed] [Google Scholar]
35.Eichler EE, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature reviews. Genetics. 2010;11:446–450. doi: 10.1038/nrg2809. doi:10.1038/nrg2809. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Rhee HS, Pugh BF. ChIP-exo method for identifying genomic location of DNA- binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol. 2012 doi: 10.1002/0471142727.mb2124s100. Chapter 21, Unit 21 24, doi:10.1002/0471142727.mb2124s100. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Bailey TL, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–208. doi: 10.1093/nar/gkp335. doi:10.1093/nar/gkp335. [DOI] [PMC free article] [PubMed] [Google Scholar]
38.Berger MF, et al. Integrative analysis of the melanoma transcriptome. Genome Res. 2010;20:413–427. doi: 10.1101/gr.103697.109. doi:10.1101/gr.103697.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Dunham I, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. doi:10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489:101–108. doi: 10.1038/nature11233. doi:10.1038/nature11233. [DOI] [PMC free article] [PubMed] [Google Scholar]
41.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. doi:10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Albert I, Wachi S, Jiang C, Pugh BF. GeneTrack--a genomic data processing and visualization framework. Bioinformatics. 2008;24:1305–1306. doi: 10.1093/bioinformatics/btn119. doi:10.1093/bioinformatics/btn119. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Table S1

NIHMS514597-supplement-Table_S1.xlsx^{(23MB, xlsx)}

Table S2

NIHMS514597-supplement-Table_S2.xlsx^{(21.7MB, xlsx)}

Table S3

NIHMS514597-supplement-Table_S3.xls^{(26.5KB, xls)}

Table S4

NIHMS514597-supplement-Table_S4.xls^{(28.5MB, xls)}

Table S5

NIHMS514597-supplement-Table_S5.xlsx^{(533KB, xlsx)}

[R1] 1.Buratowski S, Hahn S, Guarente L, Sharp PA. Five intermediate complexes in transcription initiation by RNA polymerase II. Cell. 1989;56:549–561. doi: 10.1016/0092-8674(89)90578-3. [DOI] [PubMed] [Google Scholar]

[R2] 2.Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH. New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor IIB. Genes Dev. 1998;12:34–44. doi: 10.1101/gad.12.1.34. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Deng W, Roberts SG. A core promoter element downstream of the TATA box that is recognized by TFIIB. Genes Dev. 2005;19:2418–2423. doi: 10.1101/gad.342405. doi:10.1101/gad.342405. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Juven-Gershon T, Kadonaga JT. Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev Biol. 2010;339:225–229. doi: 10.1016/j.ydbio.2009.08.009. doi:10.1016/j.ydbio.2009.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Kostrewa D, et al. RNA polymerase II-TFIIB structure and mechanism of transcription initiation. Nature. 2009;462:323–330. doi: 10.1038/nature08548. doi:10.1038/nature08548. [DOI] [PubMed] [Google Scholar]

[R6] 6.He Y, Fang J, Taatjes DJ, Nogales E. Structural visualization of key steps in human transcription initiation. Nature. 2013 doi: 10.1038/nature11991. doi:10.1038/nature11991. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Ptashne M, Gann A. Transcriptional activation by recruitment. Nature. 1997;386:569–577. doi: 10.1038/386569a0. doi:10.1038/386569a0. [DOI] [PubMed] [Google Scholar]

[R8] 8.Vannini A, Cramer P. Conservation between the RNA polymerase I, II, and III transcription initiation machineries. Molecular cell. 2012;45:439–446. doi: 10.1016/j.molcel.2012.01.023. doi:10.1016/j.molcel.2012.01.023. [DOI] [PubMed] [Google Scholar]

[R9] 9.Kim TH, et al. A high-resolution map of active promoters in the human genome. Nature. 2005;436:876–880. doi: 10.1038/nature03877. doi:10.1038/nature03877. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Carninci P, et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006;38:626–635. doi: 10.1038/ng1789. doi:10.1038/ng1789. [DOI] [PubMed] [Google Scholar]

[R11] 11.Gilmour DS, Lis JT. RNA polymerase II interacts with the promoter region of the noninduced hsp70 gene in Drosophila melanogaster cells. Mol Cell Biol. 1986;6:3984–3989. doi: 10.1128/mcb.6.11.3984. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Guenther MG, Levine SS, Boyer LA, Jaenisch R, Young RA. A chromatin landmark and transcription initiation at most promoters in human cells. Cell. 2007;130:77–88. doi: 10.1016/j.cell.2007.05.042. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R13] 13.Kwak H, Fuda NJ, Core LJ, Lis JT. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science. 2013;339:950–953. doi: 10.1126/science.1229386. doi:10.1126/science.1229386. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nature reviews. Genetics. 2007;8:413–423. doi: 10.1038/nrg2083. doi:10.1038/nrg2083. [DOI] [PubMed] [Google Scholar]

[R15] 15.Rhee HS, Pugh BF. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell. 2011;147:1408–1419. doi: 10.1016/j.cell.2011.11.013. doi:10.1016/j.cell.2011.11.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Rhee HS, Pugh BF. Genome-wide structure and organization of eukaryotic pre initiation complexes. Nature. 2012;483:295–301. doi: 10.1038/nature10799. doi:10.1038/nature10799. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R17] 17.Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906. doi:10.1038/nature09906. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–65. doi: 10.1093/nar/gkl842. doi:10.1093/nar/gkl842. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes of human cells. Science. 2008;322:1855–1857. doi: 10.1126/science.1163853. doi:10.1126/science.1163853. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Seila AC, et al. Divergent transcription from active promoters. Science. 2008;322:1849–1851. doi: 10.1126/science.1162253. doi:10.1126/science.1162253. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Core LJ, Lis JT. Transcription regulation through promoter-proximal pausing of RNA polymerase II. Science. 2008;319:1791–1792. doi: 10.1126/science.1150843. doi:10.1126/science.1150843. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R22] 22.Fenouil R, et al. CpG islands and GC content dictate nucleosome depletion in a transcription-independent manner at mammalian promoters. Genome Res. 2012;22:2399–2408. doi: 10.1101/gr.138776.112. doi:10.1101/gr.138776.112. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R23] 23.Rozenberg JM, et al. All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues. BMC Genomics. 2008;9:67. doi: 10.1186/1471-2164-9-67. doi:10.1186/1471-2164-9-67. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Sainsbury S, Niesser J, Cramer P. Structure and function of the initially transcribing RNA polymerase II-TFIIB complex. Nature. 2012 doi: 10.1038/nature11715. doi:10.1038/nature11715. [DOI] [PubMed] [Google Scholar]

[R25] 25.Basehoar AD, Zanton SJ, Pugh BF. Identification and distinct regulation of yeast TATA box-containing genes. Cell. 2004;116:699–709. doi: 10.1016/s0092-8674(04)00205-3. [DOI] [PubMed] [Google Scholar]

[R26] 26.Singer VL, Wobbe CR, Struhl K. A wide variety of DNA sequences can functionally replace a yeast TATA element for transcriptional activation. Genes Dev. 1990;4:636–645. doi: 10.1101/gad.4.4.636. [DOI] [PubMed] [Google Scholar]

[R27] 27.Smale ST, Baltimore D. The “initiator” as a transcription control element. Cell. 1989;57:103–113. doi: 10.1016/0092-8674(89)90176-1. [DOI] [PubMed] [Google Scholar]

[R28] 28.Kim TK, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465:182–187. doi: 10.1038/nature09033. doi:10.1038/nature09033. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R29] 29.Hamada M, Huang Y, Lowe TM, Maraia RJ. Widespread use of TATA elements in the core promoters for RNA polymerases III, II, and I in fission yeast. Mol Cell Biol. 2001;21:6870–6881. doi: 10.1128/MCB.21.20.6870-6881.2001. doi:10.1128/MCB.21.20.6870-6881.2001. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R30] 30.Geiduschek EP, Tocchini-Valentini GP. Transcription by RNA polymerase III. Annual review of biochemistry. 1988;57:873–914. doi: 10.1146/annurev.bi.57.070188.004301. doi:10.1146/annurev.bi.57.070188.004301. [DOI] [PubMed] [Google Scholar]

[R31] 31.White RJ, Jackson SP. Mechanism of TATA-binding protein recruitment to a TATA-less class III promoter. Cell. 1992;71:1041–1053. doi: 10.1016/0092-8674(92)90398-v. [DOI] [PubMed] [Google Scholar]

[R32] 32.Carriere L, et al. Genomic binding of Pol III transcription machinery and relationship with TFIIS transcription factor distribution in mouse embryonic stem cells. Nucleic Acids Res. 2012;40:270–283. doi: 10.1093/nar/gkr737. doi:10.1093/nar/gkr737. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R33] 33.Verrijzer CP, Chen JL, Yokomori K, Tjian R. Binding of TAFs to core elements directs promoter selectivity by RNA polymerase II. Cell. 1995;81:1115–1125. doi: 10.1016/s0092-8674(05)80016-9. [DOI] [PubMed] [Google Scholar]

[R34] 34.Kapranov P, St Laurent G. Dark Matter RNA: Existence, Function, and Controversy. Frontiers in genetics. 2012;3:60. doi: 10.3389/fgene.2012.00060. doi:10.3389/fgene.2012.00060. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R35] 35.Eichler EE, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature reviews. Genetics. 2010;11:446–450. doi: 10.1038/nrg2809. doi:10.1038/nrg2809. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R36] 36.Rhee HS, Pugh BF. ChIP-exo method for identifying genomic location of DNA- binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol. 2012 doi: 10.1002/0471142727.mb2124s100. Chapter 21, Unit 21 24, doi:10.1002/0471142727.mb2124s100. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R37] 37.Bailey TL, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37:W202–208. doi: 10.1093/nar/gkp335. doi:10.1093/nar/gkp335. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R38] 38.Berger MF, et al. Integrative analysis of the melanoma transcriptome. Genome Res. 2010;20:413–427. doi: 10.1101/gr.103697.109. doi:10.1101/gr.103697.109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] 39.Dunham I, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. doi:10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] 40.Djebali S, et al. Landscape of transcription in human cells. Nature. 2012;489:101–108. doi: 10.1038/nature11233. doi:10.1038/nature11233. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R41] 41.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. doi:10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] 42.Albert I, Wachi S, Jiang C, Pugh BF. GeneTrack--a genomic data processing and visualization framework. Bioinformatics. 2008;24:1305–1306. doi: 10.1093/bioinformatics/btn119. doi:10.1093/bioinformatics/btn119. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

This article has been retracted.

Genomic organization of human transcription initiation complexes

Bryan J Venters

B Franklin Pugh

Abstract

TBP/TFIIB separation from paused Pol II

Extended Data Figure 1. Validation of ChIP-exo data and association with ENCODE annotated regions.

Figure 1. Transcription machinery organization at human mRNA promoters.

Extended Data Figure 2. Distribution of the TFIIB/TBP/Pol II in CpG islands that overlap mRNA TSSs.

BREu TATA, BREd and INR are common

Figure 2. TATA elements at most mRNA genes.

Extended Data Figure 3. Properties of core promoter elements associated with RefSeq genes.

Figure 3. BRE and INR at most mRNA genes.

150,000 noncoding initiation complexes

Extended Data Table 1. Illumina sequencing statistics.

Extended Data Figure 4. Core promoter elements at noncoding loci bound by TFIIB.

Figure 4. Noncoding TFIIB locations have chromatin marks and non-polyadenylated RNA.

Extended Data Figure 5. Enrichment of different RNA fractions at 159,117 TFIIB locations throughout the human genome.

Restricted motif spacing in promoters

Extended Data Figure 6. TFIIB-core promoter distances.

Figure 5. Restricted spacing of core promoter elements.

tRNA genes have TATA and BRE

Figure 6. TATA and BRE elements at most tRNA genes.

Consolidated genomic view of initiation

METHODS

Cell Culture

ChIP-exo and Antibodies

Alignment to Genome, Peak Calling, and Data Access

Motif analysis

Supplementary Material

One sentence summary.

Extended Data Figure 7. Promoter complexes across cancer cell lines.

Acknowledgements

Footnotes

REFERENCES

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases

BRE_u TATA, BRE_d and INR are common