Abstract
The assembly and expression of mouse antigen receptor genes are controlled by a collection of cis-acting regulatory elements, including transcriptional promoters and enhancers. Although many powerful enhancers have been identified for immunoglobulin (Ig) and T cell receptor (Tcr) loci, it remains unclear whether additional regulatory elements await discovery. Here, we use chromatin profiling of pro-B cells to define 38 epigenetic states in mouse antigen receptor loci, each of which reflects a distinct regulatory potential. One of these chromatin states corresponds to known transcriptional enhancers and identifies a new set of candidate elements in all three Ig loci. Four of the candidates were subjected to functional assays and all four exhibit enhancer activity in B but not in T lineage cells. The new regulatory elements identified by focused chromatin profiling likely have important functions in the creation, refinement, and expression of Ig repertoires.
Introduction
Many of the strategies employed by developing lymphocytes to regulate gene expression share features with mechanisms that control the stepwise assembly of antigen receptor (AgR) loci (1). Both processes require highly orchestrated interfacing between cis-regulatory elements, transcription factors, covalent modification of histones, changes in chromatin accessibility, and recruitment of machinery that drives transcription or recombination. In AgR loci, enhancer and promoter elements play crucial roles in modulating chromatin associated with variable (V), diversity (D), and joining (J) segments to control their recombination potential at each stage of lymphocyte development (2). Accordingly, most of the cis-elements associated with AgR loci are lineage- and stage-specific.
In addition to classical enhancers, recent studies identified a novel class of elements, termed “superenhancers”, which are thought to regulate the expression of genes that serve as primary determinants of cell identity (3, 4). Superenhancers are focal points for lineage-specifying transcription factors and for the ubiquitous mediator complex, which is required for activator-dependent gene expression. Moreover, superenhancers are centered within unusually large stretches of activating histone modifications, such as acetylation of histone H3 at the lysine 27 position (H3K27Ac). Three regions harboring superenhancers have been identified within the Igh locus (3). However, the collection of cis-elements, known as the cistrome, that governs AgR gene assembly and expression during the early stages of lymphocyte development remains incomplete. Here, we identify novel enhancers within all three Ig loci, which exhibit activity specific for precursor B-lymphocytes, using focused computational analyses of publicly available and new chromatin data.
Materials and Methods
Data collection and processing
We considered 19 different epigenetic features that can be classified into four groups: 1) histone modifications (H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K27me3, H3K36me3, H3K9ac/K14ac), 2) key transcription factors (p300, PU.1, Med1, c-Myc, Rad21, CTCF, EBF, E2A, Pax5), 3) nucleosome-poor, transcribed regions (DNase I hypersensitivity (DHS) and RNA Pol II occupancy), and 4) mature transcriptional signal from RNA-Seq experiments. Seventeen genome-wide experiments were available in public databases. For RNA Pol II and H3K27ac, new chromatin immunoprecipitation (ChIP) analyses were performed on a custom-made microarray covering all AgR loci (ChIP-Chip, see below). Supplemental Table S1 summarizes the sources of all experimental data.
All ChIP-Seq and DHS experiments were processed starting from SRA files. The binary SRA archives were converted into FASTQ files using the SRA toolkit, then aligned with Bowtie (5) (version 0.12.7) using “-m 1 -v3 --best --strata” options. The resulting alignment SAM file was converted into read BED files using BEDTools. RNA-Seq data were aligned with TopHat (6) (version 1.4.1.1) using “--prefilter-multihits --max-multihits 15 --segment-length 20” options, and GenBank annotated mRNAs as an alignment reference (--GTF option).
Peak calling
We applied the SICER (7) (v1.1) algorithm to the read BED files to call peaks for all ChIP-Seq and DHS experiments. We used settings for narrow peaks (200 bp window size, 200 bp gap size, and FDR of 0.01) in all cases except for H3K27me3 and H3K36me3, which have broad signal distributions (200 bp window size, 600 bp gap size, and FDR of 0.01). Peak identification for RNA-Seq was performed by transcriptome assembly with Cufflinks (8) using no reference transcriptome, and exons of assembled transcripts with FPKM >2 were considered as peaks. For ChIP-Chip experiments, peak calling was performed with MA2C (9) using a p-value of 0.01.
Genome segmentations
BED files obtained after peak calling were binarized using BEDTools (10). Genome-wide files were prepared with 200 bp windows and the overlaps of peak BED files and window files were calculated. If the overlap covered more than 50% of a window, the bin was assigned a value of 1; otherwise it was assigned 0. The following regions of the mouse genome (mm9 assembly) were used for the analysis of AgR loci: chr6:40838000-40845000, chr6:40986000-41250000, and chr6:41476000-41555000 (Tcrb); chr13:19245000-19449000 (Tcrg); chr14:52962000-54855000 (Tcra/d); chr12:114435000-117280000 (Igh); chr6:67490000-70715000 (Igk); and chr16:18971000-19285000 (Igl). Values for the genome outside of AgR boundaries were automatically set to 0, thus excluding all conventional genes from the segmentation. The resulting binarized input was then used in the ChromHMM segmentation software (v1.10) (11) to generate hidden Markov models with the number of states ranging from 20 to 40, generating emission and transition probabilities, as well as segmentation BED files and HTML output. Corresponding BED files are available online at https://artyomovlab.wustl.edu/publications/supp_materials/AgR_2013/.
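The binarization rule described above can be sketched in a few lines of Python. This is only an illustration of the 200 bp window and >50% overlap criterion, not the BEDTools-based procedure actually used; the function name `binarize_region` and the example coordinates are hypothetical, and peaks are assumed to be non-overlapping, half-open intervals as in BED files.

```python
from typing import List, Tuple

BIN_SIZE = 200  # bp, the window size stated in the text


def binarize_region(start: int, end: int, peaks: List[Tuple[int, int]],
                    bin_size: int = BIN_SIZE,
                    min_fraction: float = 0.5) -> List[int]:
    """Return one 0/1 call per bin tiling the half-open region [start, end).

    A bin is called 1 only if peaks cover MORE than min_fraction of it.
    Peaks are assumed not to overlap one another (hypothetical input format).
    """
    calls = []
    for bin_start in range(start, end, bin_size):
        bin_end = min(bin_start + bin_size, end)
        covered = 0
        for p_start, p_end in peaks:
            # Length of the intersection between the bin and this peak.
            covered += max(0, min(bin_end, p_end) - max(bin_start, p_start))
        calls.append(1 if covered > min_fraction * (bin_end - bin_start) else 0)
    return calls


# Made-up peaks on a 1 kb toy region.
example = binarize_region(0, 1000, [(150, 420), (800, 900)])
```

In this toy case only the second bin is fully covered; the last bin is covered by exactly 100 of 200 bp, which does not exceed 50%, so it stays 0.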
Conservation analysis
Individual states were overlapped with phyloP30WayPlacental track from UCSC table browser (downloadable as a WIG file; a complete description of how the conservation score was generated is provided at http://hgdownload-test.cse.ucsc.edu/goldenPath/mm9/multiz30way/multiz30way.html), and maximum conservation score for each interval was obtained using an in-house script by picking the highest value within each genomic interval. After this, the average maximum conservation score was calculated for each state by summing individual scores and dividing them by the number of intervals in the state.
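As a rough sketch of the calculation just described (the in-house script itself is not available here), the per-state score can be computed by taking the maximum phyloP value inside each interval and then averaging those maxima over all intervals of a state. The data layout (a position-to-score dictionary and `(state, start, end)` tuples) and the decision to skip intervals without any scored position are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def max_score(start, end, scores):
    """Highest score among positions in the half-open interval [start, end).

    `scores` maps a genomic position to its conservation score.
    Returns None when no position in the interval has a score.
    """
    values = [scores[pos] for pos in range(start, end) if pos in scores]
    return max(values) if values else None


def average_max_by_state(segments, scores):
    """Average of per-interval maxima for each chromatin state.

    `segments` is a list of (state, start, end) tuples.
    Intervals lacking scores are skipped (an assumption, not from the text).
    """
    per_state = defaultdict(list)
    for state, start, end in segments:
        best = max_score(start, end, scores)
        if best is not None:
            per_state[state].append(best)
    return {state: mean(values) for state, values in per_state.items()}


# Made-up scores and segments, for illustration only.
toy_scores = {0: 0.1, 1: 0.9, 2: 0.3, 10: 0.5, 11: 0.2}
toy_segments = [(4, 0, 3), (4, 10, 12), (13, 1, 3)]
toy_result = average_max_by_state(toy_segments, toy_scores)
```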
ChIP-Chip experiments
Pro-B cells from RAG-deficient mice (C57BL/6, 4-6 weeks) were purified using MACS in conjunction with CD19 microbeads (Miltenyi Biotec, CA). ChIP experiments for H3K27ac and RNA Pol II were performed as described (12) using the following antibodies: H3K27ac (Abcam, ab4729) and Pol II (Abcam, ab5131). ChIP-DNA was purified using a Qiagen kit and subjected to whole genome amplification (Sigma, MO), labeled, and hybridized to custom Nimblegen microarrays according to the manufacturer's protocol by Mogene Inc., St. Louis. Total input DNA was used as the hybridization control.
Luciferase assays
The following cell lines were used: P5424 (RAG-1−/−, p53−/− pro-T cell line), 63-12 (RAG-2−/− A-MuLV transformed pro-B cell line), and J558L (B myeloma cell line). All cell lines were cultured at 37°C with 5% CO2 in RPMI 1640 supplemented with 10% FCS, 2 mM L-glutamine, 1% penicillin/streptomycin, and 50 μM β-mercaptoethanol. For transient transfection, cells were centrifuged at 100 × g for 5 minutes at room temperature and resuspended in serum-free RPMI 1640 at 10⁷/ml. After this, 3×10⁶ cells were mixed with 3 μg of the respective Firefly plasmid and 30 ng of the Renilla control plasmid pRL-CMV (Promega), electroporated at 250 V/960 μF, transferred into 5 ml of pre-incubated media, and cultured for 24 hours. Firefly and Renilla activities were then measured using a dual assay kit, and the fold changes were calculated following the technical manual (Promega E2920).
Candidate regulatory elements were amplified using PCR. A full list of cloning primers is provided in Supplemental Table S2. The Igl enhancers Eλ24 and Eλ31 were amplified and cloned into the Bam HI site of pGL3 (Promega, WI). The regions of interest upstream of canonical enhancers, denoted λRE1, λRE2, and λRE3, were cloned individually in the Sal I site of pGL3. The Vλ1 promoter was cloned into the Xho I/Hind III sites of the Igλ enhancer-containing luciferase constructs. The hRE1, hRE2, and κRE1 regions were amplified and blunt-end cloned into the Bam HI site of pGL3-Promoter, which contains an SV40 promoter. Cells were co-transfected with a Renilla expression plasmid for normalization and analyzed as described previously (12).
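The normalization used for the reporter assays can be illustrated with a short calculation: Firefly activity is divided by the co-transfected Renilla activity to correct for transfection efficiency, and the result is expressed relative to a reference construct. The sketch below assumes the promoter-only construct serves as that reference; the function name and all readings are hypothetical.

```python
def relative_activity(firefly, renilla, ref_firefly, ref_renilla):
    """Fold change of a test construct over a reference construct.

    Each Firefly reading is first normalized to its Renilla control,
    then the test ratio is divided by the reference ratio, so the
    reference construct itself scores 1.0.
    """
    if renilla <= 0 or ref_renilla <= 0 or ref_firefly <= 0:
        raise ValueError("luminescence readings must be positive")
    normalized_test = firefly / renilla
    normalized_ref = ref_firefly / ref_renilla
    return normalized_test / normalized_ref


# Made-up luminometer readings, for illustration only.
fold = relative_activity(firefly=1200, renilla=40,
                         ref_firefly=500, ref_renilla=50)
```

With these invented readings the test construct gives a normalized ratio of 30 against 10 for the reference, that is, a 3-fold change.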
Results
Epigenetic Landscapes of AgR Loci
Genome-wide patterns of histone modifications have been characterized for numerous cell types using chromatin immunoprecipitation (ChIP), followed by high-throughput sequencing (ChIP-Seq) (13). Bioinformatic integration of these data has emerged as a powerful method for the functional assignment of genomic regions, including the identification of promoters, enhancers, and microRNA sites (14-16). However, histone modifications can play additional, specialized roles at genetic loci. For example, the H3K4me3 modification is a hallmark of active promoters but, at AgR loci, this epigenetic mark also enhances binding of RAG-2, an essential component of the V(D)J recombination machinery (17). The specialized roles of histone modifications at certain loci may produce unique epigenetic patterns that are impossible to unravel with supervised segmentation methods. To decipher such novel patterns, unsupervised algorithms have been used (11, 18). For example, the epigenome of CD4+ T lymphocytes was segmented into 51 functional states, including active and repressed promoters, enhancers, and gene bodies (19). These approaches rely on statistical enrichment of specific combinations of chromatin marks throughout the genome to identify reproducible patterns. However, because genes represent the major organizational unit of the genome, the most robust patterns identified with current approaches correspond to promoters and gene bodies.
The unique segmented organization of AgR loci, coupled with genome-scale statistical analyses, could potentially mask AgR-specific patterns that are rare or non-existent in the remaining epigenome. To circumvent these potential complications in our search for new regulatory elements, we restricted combinatorial analysis of chromatin features to data covering only the seven AgR loci (Igh, Igk, Igl, Tcra/d, Tcrb, and Tcrg). All data sets were obtained from purified pro-B cells harboring germline AgR loci (RAG-deficient), with the exceptions of Med1 and PU.1 association, which correspond to ChIP-Seq data from a transformed pro-B cell line (3).
We first calculated the coverage of individual features at the seven AgR loci (histone modifications, factor binding, transcription) compared with the entire genome. As shown in Fig. 1, the epigenetic landscape of AgR loci is distinguished from the rest of the genome in several important respects. First, Ig and Tcr loci display a much lower density of the repressive H3K27me3 modification in pro-B cells relative to the entire genome. The dearth of this epigenetic mark suggests that Polycomb-mediated repression is less pronounced at AgR loci, even when a locus is silent for transcription/recombination. Second, the density of H3K36me3, a modification associated with transcriptional elongation, as well as transcripts themselves (RNA-Seq), are substantially decreased in AgR loci relative to the whole genome. This finding likely reflects the predominance of gene segments, rather than conventionally expressed genes in AgR loci, as well as the limited amounts of transcription arising from the four Tcr loci in pro-B cells. Third, despite lower overall transcription, signals for the mediator component, Med1, and the transcription factors PU.1 and E2A are increased several-fold relative to the entire epigenome, suggesting a higher density of regulatory sites, potentially corresponding to enhancers. Finally, transcription factors c-Myc and EBF, which have important functions in pro-B cells, show substantially lower peak densities compared to the whole genome. This implies that the binding sites for these factors are mostly located outside of AgR loci. Overall, these initial analyses indicate that the distribution of important chromatin features in AgR loci differs substantially from the remainder of the genome. Therefore, identification of novel regulatory regions within AgR loci will benefit from a more focused computational analysis of chromatin states tailored to these regions.
Figure 1.
Unique epigenetic characteristics of mouse antigen receptor (AgR) loci. The y-axis represents the ratio of DNA space covered by a feature within AgR loci relative to the entire genome. A value of 1.0 corresponds to an equal distribution of that chromatin feature in AgR loci and the entire genome. Seventeen genome-wide ChIP-Seq, RNA-Seq, and DHS experiments were incorporated into the computational analyses.
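One way to read the y-axis of Fig. 1 is as a ratio of two coverage fractions: the fraction of AgR sequence covered by peaks of a feature, divided by the fraction of the whole genome covered by peaks of the same feature. The sketch below follows that reading; it is an interpretation of the caption rather than the authors' script, and the function name and numbers are hypothetical.

```python
def coverage_ratio(agr_covered_bp, agr_length_bp,
                   genome_covered_bp, genome_length_bp):
    """Fraction of AgR loci covered by a feature, relative to the genome.

    A value of 1.0 means the feature covers AgR loci and the whole
    genome in equal proportion; values below 1.0 indicate depletion
    in AgR loci and values above 1.0 indicate enrichment.
    """
    if genome_covered_bp <= 0:
        raise ValueError("feature has no genome-wide coverage")
    agr_fraction = agr_covered_bp / agr_length_bp
    genome_fraction = genome_covered_bp / genome_length_bp
    return agr_fraction / genome_fraction


# Made-up numbers: 1% of a toy locus versus 0.5% of a toy genome.
ratio = coverage_ratio(1_000, 100_000, 50_000, 10_000_000)
```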
Chromatin Profiling of AgR Loci in Pro-B Cells
The complex structure of Ig and Tcr loci requires advanced computational analysis to identify major AgR-specific chromatin patterns. For this purpose, we applied the ChromHMM algorithm (11), focusing on only the seven AgR loci. The resulting chromatin states may then be used to identify active and poised regulatory elements in an unbiased manner. Briefly, ChromHMM utilizes a hidden Markov model that captures two types of information -- the co-occurrence frequency of individual features at either the same location (emission probabilities) or at adjacent locations (transition probabilities) -- yielding patterns of chromatin features defined as characteristic “states”.
In total, we considered 19 distinct chromatin features in pro-B cells for this analysis (Fig. 2A), derived from published or new data sets, including histone modifications, key transcription factors, nucleosome density, and transcription (Supplemental Table S1). Using ChromHMM, we compared individual emission probabilities for models in which the number of states was varied from 20 to 40 and found that 38 states optimally described the epigenetic landscape of AgR loci. A higher model dimensionality produced redundancies, whereas distinct states, corresponding to active or poised regulatory elements, were merged when fewer than 38 were considered. Each state in the model corresponds to either a single feature or combinations of features, yielding an unbiased description of the AgR epigenetic landscape. A full list of states in the optimized model can be found in Supplemental Table S3. It is important to note that the relationship between chromatin features and an individual state is probabilistic. For example, state 13 is defined by the simultaneous presence of H3K36me3, H3Ac, DNase I hypersensitivity, E2A, Med1, and some other marks, all with probabilities of nearly 1 (dark blue, Fig. 2A), indicating that essentially all state 13 regions have these chromatin features. However, the probability of observing p300 in state 13 is intermediate (0.4, light blue), reflecting the fact that only some state 13 regions associate with p300 (e.g., Eμ), while others do not.
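To make the probabilistic reading of a state concrete: ChromHMM treats each binarized mark as an independent Bernoulli variable given the state, so the probability of observing a particular presence/absence pattern in one bin is a product over marks. The sketch below applies this to a reduced, hypothetical three-mark version of state 13; the values 0.99 and 0.98 stand in for "nearly 1" and are not taken from the model, whereas 0.4 for p300 is the value quoted above.

```python
def emission_likelihood(observed, emission):
    """Probability of one bin's 0/1 pattern under a single state.

    `emission` maps mark -> P(mark present | state);
    `observed` maps mark -> 0 or 1 (missing marks count as absent).
    Marks are treated as independent given the state.
    """
    likelihood = 1.0
    for mark, prob in emission.items():
        likelihood *= prob if observed.get(mark, 0) else (1.0 - prob)
    return likelihood


def enriched_marks(emission, threshold=0.5):
    """Marks whose emission probability exceeds the enrichment cutoff."""
    return sorted(mark for mark, prob in emission.items() if prob > threshold)


# Hypothetical, reduced emission vector loosely modeled on state 13.
state13 = {"H3K36me3": 0.99, "Med1": 0.98, "p300": 0.4}

with_p300 = emission_likelihood({"H3K36me3": 1, "Med1": 1, "p300": 1}, state13)
without_p300 = emission_likelihood({"H3K36me3": 1, "Med1": 1, "p300": 0}, state13)
```

Both patterns remain plausible under the state, which is why regions with and without p300 can share the same assignment; with the >0.5 cutoff, p300 would not count as enriched in this toy state.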
Figure 2.
Unbiased characterization of the AgR epigenetic landscape. (A) The 38-state model of chromatin for AgR loci in pro-B cells. The hidden Markov model was based on the distribution of 19 chromatin features over all seven AgR loci: 17 genome-wide features shown on Fig. 1 were narrowed to AgR and two features, Pol II and H3K27Ac, profiled by ChIP-Chip of AgR loci. The shades of blue represent numerically determined emission probabilities that range from 0.0 to 1.0 and describe the precise “composition” of each state, or the probability to find a certain chromatin mark or transcription factor in the region defined as a particular state. A mark is considered “enriched” in a particular state if its emission probability in the model is >0.50. (B) Distribution across AgR loci for states with the highest regulatory potentials. Pie charts are scaled to the total numbers of regions corresponding to each state. Significant enrichment of these chromatin states is observed for Ig loci, suggesting lineage-specific activities for these regulatory states. (C) Enrichment of individual states for specific AgR elements. Each chromatin state was evaluated for its composition with respect to the indicated elements (± 500 bp from their annotated borders). Shades of blue correspond to hypergeometric probability of enrichment compared to random distribution across the entire collection of elements.
For the 38-state model (Fig. 2A), AgR chromatin can be divided broadly into 3 non-redundant categories. The first category includes twelve chromatin states that are defined by the presence of a single feature, such as H3K4me1, H3K27me3 or Pax5, suggesting a limited regulatory potential for these regions in pro-B cells (states 15, 29, and 34, respectively). A second category corresponds to states associated with only two or three chromatin features, which may reflect a partially active, or poised, configuration (e.g., states 1, 2, 20, and 26). Finally, six of the 38 chromatin states show strong enrichment for multiple histone modifications or other features of active chromatin (states 3, 4, 5, 7, 8, and 13). Regions assigned to these chromatin states likely harbor active regulatory elements since they are also nucleosome poor (DHS peaks) and have other modifications that characterize promoters or enhancers (H3K4me3, H3ac etc). Notably, state 4 is characterized by its robust enrichment for 10 chromatin features (hereafter, enrichment indicates that the probability for a feature is >0.5), including association with the transcription factors p300, PU.1, and Med1. Given these characteristics, regions within AgR loci categorized as state 4 likely encompass cis elements with a high regulatory potential. Indeed, the chromatin states most highly enriched for activation features (states 3, 4, 5, and 13) are predominantly localized to Ig loci, particularly to Igh, which are more active (or poised) in pro-B cells compared with Tcr (Fig. 2B). Together, focused epigenetic analysis of AgR loci defines a unique set of chromatin states, some of which likely reflect functionality in the context of gene regulation and recombination.
Chromatin State Functions
To garner functional insights, we first assessed whether classes of known AgR elements segregate into different chromatin states. As shown in Fig. 2C, we parsed the AgR loci into seven functional categories corresponding to the following annotated regions (± 500 bp): (i) Igh V segments (including upstream promoters), (ii) Igk V segments plus promoters, (iii) all D segments (Ig and Tcr), (iv) all J segments, (v) all constant regions, (vi) all known enhancers, and (vii) Pax5-activated intergenic repeat (PAIR) elements, a set of promoters that direct anti-sense transcription within specific regions of the Igh V cluster (20, 21). Strikingly, most of the annotated AgR regions segregate from one another into a small number of individual chromatin states, likely reflecting the relationships between epigenetic features and their functionality. For example, Igh V regions, whose associated promoters exhibit varying degrees of activity in pro-B cells (22), belong to five different chromatin states. Conversely, Igk V regions belong to only two states (21 and 22), distinct from those of Igh Vs, which display a highly restricted set of chromatin features, presumably reflecting their poised status in pro-B cells (H3K4me2 and PU.1). Moreover, most of the PAIR anti-sense promoters belong to chromatin state 3, which recapitulates known features of these regulatory elements, including their simultaneous association with Pax5, CTCF, and Rad21 (Figs. 2A and 2C). Most notably, eight of the twelve known AgR enhancers belong to chromatin state 4, which is most enriched for activating features (Fig. 3 and see below). One exception, Eμ, belongs to state 13 (Fig. 3A), likely because of its dual function as a strong promoter. A second exception, Eγ4, should be excluded from consideration since its epigenetic profile is masked by a proximal gene (Stard3nl) that is highly expressed in pro-B cells (Fig. 3F).
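The enrichment shading in Fig. 2C is described as a hypergeometric probability. A minimal sketch of such a test is given below, assuming the usual upper-tail formulation: given N annotated elements in total, K of which belong to one category, what is the chance that a state overlapping n elements contains at least k from that category? The exact counts and test setup used for the figure are not stated, so the parameterization and numbers here are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when more items are requested than available,
    # so impossible configurations contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(max(k, 0), upper + 1))
    return favorable / total


# Toy example: 3 of 10 elements are enhancers and a state hits 3 elements,
# all of them enhancers.
p_value = hypergeom_tail(k=3, N=10, K=3, n=3)
```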
Figure 3.
Chromatin states for selected regions of Ig and Tcr loci. Tracks for the indicated epigenetic features (ChIP-Seq) or transcription (RNA-Seq) as visualized in the IGV browser. Annotations for known elements and their corresponding chromatin states are shown in the bottom two tracks. Genomic coordinates are shown above the tracks (build mm9). State 13 is characteristic of actively transcribed elements (highlighted in light green) and harbors two enhancers, Eμ (A) and Eγ4 (F). State 4, which is characterized by a lack of transcription, the presence of activating chromatin marks, and binding by E2A, Pax5, PU.1, p300, and Med1, coincides with most known AgR enhancers, including 3′Eκ (B), Eβ (C), Eα (D), and Eγ2 (E). Regions identified as chromatin state 4 are highlighted in gray.
Notwithstanding these exceptions, the vast majority of known AgR enhancers, whether active (3′RR, Fig. 5A) or inactive in pro-B cells (Eβ, Eα, Eγ2, 3′Eκ, Eλ31, Eλ24, Figs. 3 and 4), were assigned to state 4 using this unbiased analysis. Based on the aforementioned characteristics, chromatin state 4 provides a focused set of candidates for novel regulatory elements in the AgR cistrome. Overall, state 4 is limited to 41 segments, spanning 42 kb of the 9.2 Mb that encompasses all AgR loci (∼0.4%). The priority status of state 4 as an identifier of enhancers is further supported by its enrichment for the p300 histone acetyltransferase, which is considered to be a general feature of enhancers, as well as its enrichment for three key transcription factors in pro-B cell gene expression programs (PU.1, E2A, and Pax5). Accordingly, most of the putative cis-elements belonging to state 4 are situated in active (46% in Igh) or poised loci (36% in Igk and Igl), while only 17% of such regions are located in the four Tcr loci (Fig. 2B). Additionally, most of the state 4 regions are well-conserved at the level of DNA sequence (the average conservation score for state 4 is 0.682; see Methods and Supplemental Tables S3 and S4). We conclude that chromatin state 4, which encompasses most of the known AgR enhancers, whether active, poised, or inactive in pro-B cells, will provide a rich source of candidate elements for functional analyses.
Figure 5.
Functional definition of a novel Igh superenhancer. (A) Tracks for the indicated chromatin features as visualized in the IGV browser (see Fig. 3). Chromatin states 3 (black), 4 (blue), 5 (green), and 13 (lavender) are shown in the bottom four tracks. The locations of candidate enhancer elements hRE1 and hRE2 are highlighted as red boxes. (B) The distribution of Med1 peaks (identified using SICER) in pro-B cells by width (left panel) and read count-to-width ratio (right panel). Arrows indicate positions of Med1 peaks overlapping the three superenhancer regions within Igh, highlighting their extreme breadth (left panel) and read densities (right panel). (C) Luciferase data for hRE1 and hRE2 as described in Fig. 4C.
Figure 4.
Identification and functional validation of novel Ig light chain enhancers. (A, B) Tracks for the indicated chromatin features as visualized in the IGV browser (see Fig. 3). Candidate enhancer regions identified as chromatin state 4 are highlighted in gray. Locations of tested fragments, κRE1 in Igk (A) and λRE1, λRE2, and λRE3 in Igl (B), are shown as red blocks. The known Eλ enhancer elements are indicated as green boxes. (C) Luciferase data for κRE1 enhancer activity in lymphocytes. Reporter plasmids containing combinations of the SV40 promoter (pSV40), SV40 enhancer (ESV40), κRE1, or lacking control elements (P-E-) were tested in the following cell lines: 63-12 pro-B cells (red bars), J558L plasmacytoma (blue bars), and P5424 pro-T cells (gray bars). All data are normalized for transfection efficiency and presented relative to pSV40 activity, which is set to 1. Representative data from at least two biological replicates are shown for all luciferase data. (D) Luciferase data for the indicated combinations of regulatory elements as described in (C).
Characterization of Novel Enhancers in Ig Light Chain Loci
Leveraging the predictive power of our AgR chromatin analyses, we selected three state 4 regions from Igk or Igl for functional assays. These light chain loci exhibit only modest transcriptional activity in primary pro-B cells and most likely reside in a “poised” chromatin configuration (23). We first tested a state 4 element, situated in the large cluster of Vκ gene segments, for potential enhancer function (Fig. 4A, κRE1). Expression of luciferase reporters harboring the SV40 promoter was robustly augmented (7-fold) in a pro-B cell line (63-12) upon inclusion of κRE1 (Fig. 4C). In contrast, this region was devoid of enhancer activity when tested in pro-T or plasma cell lines (P5424 and J558L, respectively). Thus, the new Igk cis-element is a stage- and cell type-specific enhancer, suggesting a role in controlling the recombination potential of some mouse Vκ gene segments.
The mouse Igl locus has two highly conserved enhancers, termed Eλ31 and Eλ24, located distal to each of the VλJλ cassettes (24). In addition, we identified two state 4 regions lying even more distal to the VλJλ cassettes (Fig. 4B, λRE1 and λRE3), which are highly conserved (1.054 and 0.706, respectively). These regions may represent “shadow” enhancers, which are suspected to serve as booster or redundancy elements for the regulation of many genes (25). Indeed, in conjunction with the Vλ1 promoter, each of these regions augments reporter gene expression in pro-B and plasma cells, but not in a mouse pro-T cell line (Fig. 4D). In the J558L plasmacytoma, λRE1 also boosts the function of its nearby enhancer, Eλ31, in an additive manner. As a control, the λRE2 region (Fig. 4B), which associates with PU.1 but is assigned to chromatin states 20 and 21, fails to augment reporter gene expression in either pro-B or plasma cells (Fig. 4D). Taken together, assignment as chromatin state 4 accurately predicts the location of novel enhancers in Ig light chain loci.
Characterization of a Superenhancer in Igh
Transcription, V(D)J recombination, and Igh class switching are controlled by a set of enhancers and promoters, most of which have presumably been uncovered (26). Our chromatin analysis assigned the two classical Igh enhancers to state 13 (Eμ and hs4) (Fig. 5A). Eμ and the 3′RR, which comprises several independent enhancers and a CTCF-rich region, have distinct but important functions in the B cell lineage, including transcription and recombination of adjacent DHJH gene segments (Eμ) and control of class-switch recombination (Eμ and 3′RR) (27). Notably, a 3′RR-proximal region identified as state 4 is also enriched in CTCF (Fig. 5A), consistent with its role as an insulator, yet suggesting that this region may play a more complex role. A third stretch within Igh, embedded between Cγ1 and Cγ2b, was recently described as a superenhancer (3). Young and colleagues define superenhancers using several parameters, including an exaggerated intensity of Med1 and PU.1 binding relative to other ChIP-Seq peaks, implicating these regions as key regulatory elements controlling cell identity genes (3). To identify superenhancers, the authors find regions of overlap for “master” transcription factors, such as Pax5 and PU.1 in pro-B cells, which also co-localize with the most intense and broadest peaks for the general transcription factor, Med1. Accordingly, we performed an unbiased analysis of Med1 distributions using SICER (7), which allows accurate peak-calling for broad chromatin features (see Materials and Methods). We discovered that the 3′RR, Eμ, and the new superenhancer, hereafter called Igh-SE, all appear as outliers in both width and read density for Med1, when compared with the entire epigenome (Fig. 5B).
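The two quantities plotted in Fig. 5B, peak width and read count-to-width ratio, are simple to derive from a peak list. The sketch below computes both and reports peaks that rank near the top on each axis; the `Peak` record, the top-N criterion for calling an outlier, and the toy peaks are all hypothetical and are not the procedure used for the figure.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    name: str
    start: int
    end: int
    reads: int


def width(peak: Peak) -> int:
    """Peak width in bp (half-open coordinates)."""
    return peak.end - peak.start


def density(peak: Peak) -> float:
    """Read count-to-width ratio."""
    return peak.reads / width(peak)


def shared_outliers(peaks: List[Peak], top_n: int) -> List[str]:
    """Names of peaks ranking in the top_n by BOTH width and density."""
    widest = {p.name for p in sorted(peaks, key=width, reverse=True)[:top_n]}
    densest = {p.name for p in sorted(peaks, key=density, reverse=True)[:top_n]}
    return sorted(widest & densest)


# Made-up peaks, for illustration only.
toy_peaks = [
    Peak("a", 0, 10_000, 5_000),  # broad and dense
    Peak("b", 0, 400, 40),        # narrow and sparse
    Peak("c", 0, 8_000, 3_200),   # broad, moderately dense
    Peak("d", 0, 600, 270),       # narrow but dense
    Peak("e", 0, 1_000, 100),     # unremarkable
]
outliers = shared_outliers(toy_peaks, top_n=2)
```

Only peak "a" is extreme on both axes in this toy set, mirroring the idea that superenhancer-associated Med1 peaks stand out in breadth and in read density at once.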
Importantly, our focused AgR chromatin analysis splits the Igh-SE into two active regions that belong to states 4 and 5 (Fig. 5A, hRE1 and hRE2, respectively). As shown in Fig. 5C, only the hRE1 region functions as an enhancer in pro-B cells when monitored by luciferase reporters, but is devoid of enhancer activity in pro-T or plasma cell lines. The other region, hRE2, belongs to state 5 and likely corresponds to the Cγ2b germline promoter, which is active in pro-B cells based on its enrichment for H3K4me3 and the presence of sterile Iγ2b transcripts (Fig. 5A). The new hRE1 enhancer region is highly conserved (0.723) and interacts physically with other Igh regulatory elements (26), strongly suggesting an important, but unknown function during the early stages of B cell development.
Discussion
We have used tailored computational approaches to assign chromatin states throughout all seven AgR loci in pro-B cells. Although the functional significance of many chromatin states remains to be defined, state 4 was found to accurately predict sites corresponding to AgR regulatory regions, both known and novel. The set of potential regulatory regions identified by state 4 also includes AgR enhancers that are inactive or only poised in pro-B cells (e.g., Eβ and Eλ24, respectively), broadening the scope of this chromatin-guided approach for enhancer discovery. Indeed, all state 4 regions tested in this study (4/4), which were derived from each of the three Ig loci, have enhancer activity in pro-B cells.
Strikingly, the only tested region that was found to be inactive, hRE2, was assigned to a separate chromatin state that is enriched for Igh V region promoters (state 5). This region corresponds to the germline Cγ2b promoter (located near the hRE1 enhancer), which is clearly active in pro-B cells judging from RNA-Seq and ChIP-Seq data (H3K4me3). Indeed, recent studies have shown that pro-B cells can execute class-switch recombination (CSR) to Cγ2b (28). We suspect that enhancer region hRE1 plays a role in stabilizing the active conformation of Igh required for CSR (29), V(D)J recombination (26), or both in precursor B cells. The primary enhancer region directing CSR in activated, mature B cells is thought to be the 3′RR. However, deletion of 3′RR abrogates recombination to all Igh isotypes except Cγ1, the constant region lying most proximal to the hRE1 enhancer (30). As such, our chromatin state analysis of pro-B cells provides at least four new enhancer elements, including hRE1, which can now be studied in vivo for their roles in Ig gene assembly, expression, isotype switching, and somatic hypermutation.
In summary, we have developed an unbiased epigenome-based approach to define the regulomes of AgR and other complex loci, such as those encoding NK cell receptors or MHC molecules (31). While our functional validation of new enhancers focused on regions belonging to chromatin state 4, other states may also harbor important regulatory elements. These include state 5, which was enriched for promoters, and state 13, which spans Eμ and a flanking portion of the 3′RR. Future validations, including targeted disruption of these elements, will produce a more complete picture of AgR regulomes in the context of lymphocyte development and activation.
Supplementary Material
Acknowledgments
We are grateful to Dr. Stephanie Kolar for helpful discussions.
This research was supported by NIH Grants AI 079732, AI 081224 and CA 156690 (to E.M.O.) and AI 082918 (to A.J.F.).
Abbreviations used in this article
- ChIP: chromatin immunoprecipitation
- AgR: antigen receptor
- FDR: false discovery rate
- FPKM: fragments per kilobase of transcript per million mapped reads
Footnotes
The online version of this article contains supplemental material.
References
- 1. Osipovich O, Oltz EM. Regulation of antigen receptor gene assembly by genetic-epigenetic crosstalk. Seminars in immunology. 2010;22:313–322. doi: 10.1016/j.smim.2010.07.001.
- 2. Degner-Leisso SC, Feeney AJ. Epigenetic and 3-dimensional regulation of V(D)J rearrangement of immunoglobulin genes. Seminars in immunology. 2010;22:346–352. doi: 10.1016/j.smim.2010.08.002.
- 3. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013;153:307–319. doi: 10.1016/j.cell.2013.03.035.
- 4. Loven J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, Bradner JE, Lee TI, Young RA. Selective inhibition of tumor oncogenes by disruption of super-enhancers. Cell. 2013;153:320–334. doi: 10.1016/j.cell.2013.03.036.
- 5. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology. 2009;10:R25. doi: 10.1186/gb-2009-10-3-r25.
- 6. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120.
- 7. Zang C, Schones DE, Zeng C, Cui K, Zhao K, Peng W. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics. 2009;25:1952–1958. doi: 10.1093/bioinformatics/btp340.
- 8. Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature protocols. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
- 9. Song JS, Johnson WE, Zhu X, Zhang X, Li W, Manrai AK, Liu JS, Chen R, Liu XS. Model-based analysis of two-color arrays (MA2C). Genome biology. 2007;8:R178. doi: 10.1186/gb-2007-8-8-r178.
- 10. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 11. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nature methods. 2012;9:215–216. doi: 10.1038/nmeth.1906.
- 12. Gopalakrishnan S, Majumder K, Predeus A, Huang Y, Koues OI, Verma-Gaur J, Loguercio S, Su AI, Feeney AJ, Artyomov MN, Oltz EM. Unifying model for molecular determinants of the preselection Vbeta repertoire. Proceedings of the National Academy of Sciences of the United States of America. 2013;110:E3206–3215. doi: 10.1073/pnas.1304048110.
- 13. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837. doi: 10.1016/j.cell.2007.05.009.
- 14. Abeel T, Van de Peer Y, Saeys Y. Toward a gold standard for promoter prediction evaluation. Bioinformatics. 2009;25:i313–320. doi: 10.1093/bioinformatics/btp191.
- 15. Yip KY, Cheng C, Bhardwaj N, Brown JB, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M, Gerstein M. Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome biology. 2012;13:R48. doi: 10.1186/gb-2012-13-9-r48.
- 16. Zhang S, Li Q, Liu J, Zhou XJ. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics. 2011;27:i401–409. doi: 10.1093/bioinformatics/btr206.
- 17. Matthews AG, Kuo AJ, Ramon-Maiques S, Han S, Champagne KS, Ivanov D, Gallardo M, Carney D, Cheung P, Ciccone DN, Walter KL, Utz PJ, Shi Y, Kutateladze TG, Yang W, Gozani O, Oettinger MA. RAG2 PHD finger couples histone H3 lysine 4 trimethylation with V(D)J recombination. Nature. 2007;450:1106–1110. doi: 10.1038/nature06431.
- 18. Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature methods. 2012;9:473–476. doi: 10.1038/nmeth.1937.
- 19. Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature biotechnology. 2010;28:817–825. doi: 10.1038/nbt.1662.
- 20. Ebert A, McManus S, Tagoh H, Medvedovic J, Salvagiotto G, Novatchkova M, Tamir I, Sommer A, Jaritz M, Busslinger M. The distal V(H) gene cluster of the Igh locus contains distinct regulatory elements with Pax5 transcription factor-dependent activity in pro-B cells. Immunity. 2011;34:175–187. doi: 10.1016/j.immuni.2011.02.005.
- 21. Verma-Gaur J, Torkamani A, Schaffer L, Head SR, Schork NJ, Feeney AJ. Noncoding transcription within the Igh distal V(H) region at PAIR elements affects the 3D structure of the Igh locus in pro-B cells. Proceedings of the National Academy of Sciences of the United States of America. 2012;109:17004–17009. doi: 10.1073/pnas.1208398109.
- 22. Choi NM, Loguercio S, Verma-Gaur J, Degner SC, Torkamani A, Su AI, Oltz EM, Artyomov M, Feeney AJ. Deep sequencing of the murine igh repertoire reveals complex regulation of nonrandom v gene rearrangement frequencies. J Immunol. 2013;191:2393–2402. doi: 10.4049/jimmunol.1301279.
- 23. Mercer EM, Lin YC, Benner C, Jhunjhunwala S, Dutkowski J, Flores M, Sigvardsson M, Ideker T, Glass CK, Murre C. Multilineage priming of enhancer repertoires precedes commitment to the B and myeloid cell lineages in hematopoietic progenitors. Immunity. 2011;35:413–425. doi: 10.1016/j.immuni.2011.06.013.
- 24. Hagman J, Rudin CM, Haasch D, Chaplin D, Storb U. A novel enhancer in the immunoglobulin lambda locus is duplicated and functionally independent of NF kappa B. Genes & development. 1990;4:978–992. doi: 10.1101/gad.4.6.978.
- 25. Hobert O. Gene regulation: enhancers stepping out of the shadow. Current biology : CB. 2010;20:R697–699. doi: 10.1016/j.cub.2010.07.035.
- 26. Medvedovic J, Ebert A, Tagoh H, Tamir IM, Schwickert TA, Novatchkova M, Sun Q, Huis In 't Veld PJ, Guo C, Yoon HS, Denizot Y, Holwerda SJ, de Laat W, Cogne M, Shi Y, Alt FW, Busslinger M. Flexible Long-Range Loops in the VH Gene Region of the Igh Locus Facilitate the Generation of a Diverse Antibody Repertoire. Immunity. 2013;39:229–244. doi: 10.1016/j.immuni.2013.08.011.
- 27. Perlot T, Alt FW. Cis-regulatory elements and epigenetic changes control genomic rearrangements of the IgH locus. Advances in immunology. 2008;99:1–32. doi: 10.1016/S0065-2776(08)00601-9.
- 28. Kumar S, Wuerffel R, Achour I, Lajoie B, Sen R, Dekker J, Feeney AJ, Kenter AL. Flexible ordering of antibody class switch and V(D)J joining during B-cell ontogeny. Genes & development. 2013;27:2439–2444. doi: 10.1101/gad.227165.113.
- 29. Han JH, Akira S, Calame K, Beutler B, Selsing E, Imanishi-Kari T. Class switch recombination and somatic hypermutation in early mouse B cells are mediated by B cell and Toll-like receptors. Immunity. 2007;27:64–75. doi: 10.1016/j.immuni.2007.05.018.
- 30. Vincent-Fabert C, Fiancette R, Pinaud E, Truffinet V, Cogne N, Cogne M, Denizot Y. Genomic deletion of the whole IgH 3′ regulatory region (hs3a, hs1,2, hs3b, and hs4) dramatically affects class switch recombination and Ig secretion to all isotypes. Blood. 2010;116:1895–1898. doi: 10.1182/blood-2010-01-264689.
- 31. Shiina T, Inoko H, Kulski JK. An update of the HLA genomic region, locus information and disease associations: 2004. Tissue antigens. 2004;64:631–649. doi: 10.1111/j.1399-0039.2004.00327.x.