Abstract
Fetal hemoglobin (HbF, α2γ2) level is genetically controlled and modifies the severity of the adult hemoglobin (HbA, α2β2) disorders sickle cell disease and β-thalassemia. Common genetic variation affects expression of BCL11A, a regulator of HbF silencing. To uncover how BCL11A supports the developmental switch from γ- to β-globin, we use a functional assay and protein binding microarray to establish that a zinc-finger cluster in BCL11A is required for repression, and to identify a preferred DNA recognition sequence. This motif appears in embryonic and fetal-expressed globin promoters and is duplicated in γ-globin promoters. The more distal of the duplicated motifs is mutated in individuals with hereditary persistence of fetal hemoglobin. Using the CUT&RUN approach to map protein binding sites in erythroid cells, we demonstrate that BCL11A preferentially occupies the distal motif and that this occupancy can be disrupted by editing the promoter. Our findings reveal that direct repression of γ-globin gene promoters by BCL11A underlies hemoglobin switching.
Graphical abstract
The developmental transition between fetal and adult hemoglobin is controlled by a silencer that selectively regulates γ-globin expression, suggesting a simplified control mechanism that could be amenable to manipulation in the treatment of β-hemoglobin disorders.

Introduction
During human development, the major hemoglobin expressed in red blood cells changes from fetal hemoglobin (HbF, α2γ2) to adult hemoglobin (HbA, α2β2). The transcriptional shift from the γ- to the β-globin gene is commonly referred to as the “fetal-to-adult hemoglobin switch”. The level of HbF, which is ∼1% of total hemoglobin in adults, is genetically controlled and a critical modifier of the clinical severity of the major β-hemoglobin disorders, β-thalassemia and sickle cell disease (SCD). In β-thalassemia, where β-globin is deficient, increased γ-globin expression reduces the imbalance of the α- and β-globin chains that underlies the pathophysiology of anemia in this condition. In SCD, increased HbF levels interfere with the polymerization (sickling) of HbS (α2βS2), thereby reducing damage to red blood cells and extending their lifespan. Common genetic variation modestly affects HbF level, whereas rare alleles with deletions within the β-globin gene cluster or single base substitutions (or a microdeletion) in the γ-globin gene promoters lead to substantial elevations (10-30% of total hemoglobin) in the benign hereditary persistence of fetal hemoglobin (HPFH) syndrome.
Nuclear regulatory factors controlling HbF level were largely elusive until GWAS provided potential candidates (Menzel et al., 2007; Uda et al., 2008), including BCL11A, which has been validated as a major HbF silencer through surrogate genetics in cells (Sankaran et al., 2008; Xu et al., 2010), mouse knockout (Sankaran et al., 2009), and study of rare haploinsufficient individuals (Basak et al., 2015; Dias et al., 2016). Despite clear genetic evidence for its critical role, how and where BCL11A acts to repress γ-globin gene expression has remained unclear. The literature conflicts as to whether BCL11A, or its paralog BCL11B (Wakabayashi et al., 2003), binds DNA directly and, if so, which sequence(s) it may recognize (Avram et al., 2002; Ippolito et al., 2014; Liu et al., 2006; Longabaugh et al., 2017; Marban et al., 2005; Senawong et al., 2005; Tang et al., 2011; Wiles et al., 2013). Prior work suggested that BCL11A acts at a distance from the γ-globin gene promoters but could not precisely map its sites of occupancy within the β-globin locus (Jawaid et al., 2010; Sankaran et al., 2008; Sankaran et al., 2011). To understand how BCL11A acts, we began by characterizing its essential domains and DNA-binding specificity, both in vitro and in chromatin.
Results
Domains of BCL11A required for globin repression
Several isoforms of BCL11A protein are reported (Figure S1A). In adult erythroid cells, two species are expressed: L and a larger, more abundant isoform, XL, which differs by inclusion of a terminal exon via RNA splicing (Liu et al., 2006; Sankaran et al., 2008). To determine the domains of BCL11A required for globin repression, we developed a functional rescue assay in which cDNA constructs are expressed in murine erythroleukemia (MEL) cells deficient in BCL11A, either through deletion of the essential erythroid enhancer (Bauer et al., 2013) or through TALEN-edited removal of exon 1 or exons 1-5 (Figure 1A-D; Figure S1B,C). Enhancer-deleted cells express <5% of wild-type BCL11A RNA (or protein), and exon-deleted cells express none detectably. BCL11A-deficient cells exhibit ≥100-fold derepression of the β-like embryonic εy-globin gene (Xu et al., 2010) and modest derepression of the weakly expressed βh1-globin gene (Figure 1A,D; Figures S1D-H, 2). Forced expression of BCL11A-XL, but not BCL11A-L, restored repression of embryonic globins (Figure 1A,D; Figures S1G,H). As BCL11A-XL differs from isoform L only by the presence of a C-terminal zinc-finger cluster (ZnF456), these results implicate this domain as essential for globin repression. Consistent with this conclusion, deletion of ZnF6, ZnF56, or ZnF456 from BCL11A-XL impaired repression (Figure 1C,D; Figure S1).
Figure 1. Functional rescue analysis identifies BCL11A isoform XL and domains required for transcriptional repression.

(A) Schematic of functional rescue analysis in BCL11A enhancer or exon knockout mouse erythroleukemia (MEL) cells by ectopic expression of various BCL11A isoforms or domain mutants. Expression of BCL11A-XL, but no other known isoforms, restored the transcriptional repression of the embryonic εy- and βh1-globin genes, whereas the adult βmajor-globin remained unaffected. Results are mean ± SEM of three experiments and analyzed by two-sided t-test. *P < 0.05, n.s. not significant.
(B) Western blot analysis validated the ectopic expression of BCL11A isoforms in Enh-KO cells to levels that are comparable to the endogenous BCL11A expression in MEL cells.
(C) Schematic of the structure-function domains of BCL11A and various mutants. The numbers denote the amino acids retained in each BCL11A mutant relative to the full-length BCL11A-XL isoform.
(D) Repression of εy-globin expression was restored by full-length BCL11A-XL, but not by domain mutants lacking the NuRD-interacting domain, ZnF23 or ZnF456. Each circle denotes an independent single-cell-derived stable cell clone. Results are mean ± SEM of multiple independent clones and analyzed by one-way ANOVA with repeated measures. *P < 0.05, **P < 0.01, ***P < 0.001.
See also Figure S1.
Expression of additional BCL11A constructs in the rescue assay revealed a requirement for the N-terminus (BCL11A 11-835) and for an internal ZnF cluster, ZnF23 (delZnF23) (Figure 1C,D). As the N-terminus contains a canonical NuRD-binding peptide (Lejon et al., 2011), NuRD is present in protein complexes with BCL11A (Xu et al., 2013), and downregulation of NuRD subunits induces HbF expression (Gnanapragasam et al., 2011; Sankaran et al., 2008; Xu et al., 2013), we infer that physical association with NuRD is necessary for globin repression. In contrast, other domains, including ZnF0 and ZnF1, appear dispensable for BCL11A-mediated globin repression (Figure 1C,D; Figure S1).
DNA recognition by BCL11A
Various DNA sequences have been reported as potential consensus binding motifs for BCL11A (or its paralog BCL11B) (Ai et al., 2017; Avram et al., 2002; Consortium, 2012; Ippolito et al., 2014; Liu et al., 2006; Longabaugh et al., 2017; Marban et al., 2005; Senawong et al., 2005; Tang et al., 2011; Wakabayashi et al., 2003; Wiles et al., 2013). Published studies are internally inconsistent and are compromised by failure to recognize the existence of the XL isoform and/or to adequately validate ChIP-seq findings. Where no dominant motif has been recovered, some studies have reported ETS- and Runx-family motifs and inferred that the corresponding TFs are possible cofactors (Ippolito et al., 2014; Longabaugh et al., 2017). To interrogate the sequences recognized by BCL11A in an unbiased manner, we used the universal protein binding microarray (PBM) platform, which permits high-throughput determination of potential DNA binding sequences (Berger et al., 2006). All possible 8-mer sequences were represented on the PBMs with 32-fold coverage. Because ZnF23 and ZnF456 are individually required for globin repression (Figure 1D) and DNA-binding ZnF proteins typically bind DNA through ZnF clusters (Choo and Klug, 1997; Wolfe et al., 2000), we first searched for DNA sequences recognized by BCL11A-XL and by the two isolated finger domains, ZnF23 and ZnF456. We performed experiments with the corresponding ZnF clusters of both BCL11A and BCL11B. For ZnF23 and ZnF456, we deduced consensus sequences of (T/C)NNGG(C/A)C(A/G/C) and TG(A/T)CC(A/C/T), respectively (Figure 2A). Sequence selectivity was quantified with a PBM enrichment (E) score ranging from -0.5 to +0.5; E-scores of 0.45 or greater represent highly selective binding (Badis et al., 2009; Berger et al., 2008; Jiang et al., 2013). The specific sequence that yielded the highest average E-score for ZnF456 of both BCL11A and BCL11B was TGACCA (A456, E = 0.472; B456, E = 0.477) (Figure 2A).
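The bracket notation used for these consensus sequences, in which (T/C) means either base at that position and N means any base, can be translated mechanically into a regular expression. The sketch below is an illustration only: the function names are hypothetical, and it assumes that a sequence matches a consensus if the pattern occurs on either DNA strand.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def consensus_to_regex(consensus: str) -> re.Pattern:
    """Convert notation such as '(T/C)NNGG(C/A)C(A/G/C)' into a compiled regex.

    '(X/Y)' becomes the character class '[XY]' and 'N' becomes '[ACGT]'.
    """
    pattern = re.sub(r"\(([ACGT/]+)\)",
                     lambda m: "[" + m.group(1).replace("/", "") + "]",
                     consensus)
    return re.compile(pattern.replace("N", "[ACGT]"))

def matches_either_strand(seq: str, motif: re.Pattern) -> bool:
    """True if the motif occurs on the forward or the reverse strand of seq."""
    return bool(motif.search(seq) or motif.search(reverse_complement(seq)))

ZNF23 = consensus_to_regex("(T/C)NNGG(C/A)C(A/G/C)")
ZNF456 = consensus_to_regex("TG(A/T)CC(A/C/T)")

print(matches_either_strand("TACGGCCA", ZNF23))     # True: an instance of TNCGGCCA
print(matches_either_strand("AATGACCAAT", ZNF456))  # True: TGACCA on the forward strand
print(matches_either_strand("AATGGTCAAT", ZNF456))  # True: TGACCA on the reverse strand
```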
Of all possible 8-mers, TNCGGCCA was the highest-scoring sequence for ZnF23 from BCL11A or BCL11B (A23, E = 0.489; B23, E = 0.490) (Figure 2A). ZnF23 and ZnF456 each bound their respective sequences selectively (Figure 2B,C). A double mutant lacking both ZnF domains (d23d456) failed to bind either motif, or any other sequence, specifically (Figure 2B,C), thereby excluding a DNA-binding contribution from elsewhere in the protein. As ZnF456 and BCL11A-XL recognize the same sequence (Figure 2A), whereas ZnF23 recognizes a different sequence, we infer that ZnF456 largely dictates DNA binding by the intact protein. The apparent lack of contribution by ZnF23 to DNA binding of BCL11A-XL may reflect weak affinity for its cognate sequence or its inaccessibility in the context of the entire protein. Taken together, these results identify a preferred DNA motif, TGACCA, bound directly by BCL11A-XL in vitro through ZnF456.
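The E-scores quoted above are rank-based enrichment scores related to the Wilcoxon-Mann-Whitney statistic (Berger et al., 2006). The sketch below computes only a simplified, unmodified version, assuming an 8-mer's score is the probability that a probe containing it is brighter than a probe lacking it, minus 0.5. The published E-score includes further modifications, so values from this hypothetical `simplified_escore` function are not directly comparable to the E-scores reported here, and the probe intensities in the example are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def contains_kmer(seq, kmer):
    """True if kmer occurs on either strand of seq."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return kmer in seq or rc in seq

def simplified_escore(probes, kmer):
    """WMW-style enrichment of kmer-containing probes, ranging from -0.5 to +0.5.

    probes: list of (sequence, signal_intensity) pairs.
    """
    fg = [s for seq, s in probes if contains_kmer(seq, kmer)]
    bg = [s for seq, s in probes if not contains_kmer(seq, kmer)]
    if not fg or not bg:
        raise ValueError("need probes both with and without the k-mer")
    # Count foreground/background pairs in which the foreground probe is brighter;
    # ties count as half a win (the usual Mann-Whitney convention).
    wins = sum(1.0 if f > b else 0.5 if f == b else 0.0 for f in fg for b in bg)
    return wins / (len(fg) * len(bg)) - 0.5

probes = [
    ("GGTGACCAGG", 950.0), ("CATGACCATC", 870.0), ("AATGGTCAAA", 910.0),
    ("GGCCGGCCGG", 120.0), ("ATATATATAT", 80.0), ("CCCGGGAAAT", 150.0),
]
print(simplified_escore(probes, "TGACCA"))  # 0.5: every motif probe outranks every other
```

A score of +0.5 means perfect separation in favour of the k-mer, 0 means no enrichment, and -0.5 means the k-mer-containing probes are uniformly dimmer.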
Figure 2. BCL11A binds DNA in a sequence-specific manner through its C-terminal ZnF cluster.

(A) Heatmaps depict results of clustering PBM enrichment (E) scores (rows) and BCL11 proteins (either BCL11A or BCL11B) (columns) for all 8-mers bound (PBM E > 0.40) by at least one protein. Replicates are denoted by R1 and R2. Logos generated from clusters of 8-mers depict binding specificities by the various proteins and match the motifs (information content on y-axis, nucleotide position on x-axis) derived by the Seed-and-Wobble algorithm.
(B,C) Boxplots depicting PBM E-scores of all 8-mers containing TGACCA (B) or TNCGGCCA (C). Each plot illustrates the respective motif preference of the different BCL11 proteins. The middle line of each box represents the median E-score, the upper and lower whiskers extend to min(max(x), Q_3 + 1.5 × IQR) and max(min(x), Q_1 − 1.5 × IQR), respectively, and points beyond the whiskers represent outliers.
(D,E) Fluorescence polarization curves for ZnF456 (D) and ZnF23 (E) upon binding a 6-FAM-labelled 10-basepair double-stranded oligonucleotide containing the 456 motif (blue), the 23 motif (red), a scrambled control sequence (black), or with the addition of 1 mM EDTA (grey). The curves were fit to a single-site binding model (hyperbola).
(F,G) Octet curves for ZnF456 (F) and ZnF23 (G) upon binding the 456 motif (blue), the 23 motif (red), or a scrambled control sequence (black). The calculated affinity of ZnF456 binding its preferred motif was 31.9 ± 6.8 nM. The calculated affinity of ZnF23 binding to the 23 motif was 2079 ± 245.8 nM and indeterminate for binding the 456 motif.
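The whisker rule stated in panels B and C can be written out directly. This sketch assumes that Q_1 and Q_3 are computed by linear interpolation between order statistics (the default in R and NumPy); the quartile method used for the figure is not stated. It also follows the legend literally by capping each whisker at the 1.5 × IQR fence, whereas some plotting libraries, such as ggplot2, instead extend the whisker to the most extreme data point within the fence.

```python
import statistics

def whisker_bounds(x):
    """Return (lower, upper, outliers) following the legend's whisker formula."""
    # method="inclusive" interpolates linearly between order statistics (R type 7).
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(x), q3 + 1.5 * iqr)
    lower = max(min(x), q1 - 1.5 * iqr)
    outliers = [v for v in x if v < lower or v > upper]
    return lower, upper, outliers

print(whisker_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # (1, 14.5, [100])
```

Here Q_1 = 3.25 and Q_3 = 7.75, so the upper fence is 7.75 + 1.5 × 4.5 = 14.5 and the value 100 is drawn as an outlier.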
To measure the affinity, as opposed to the selectivity, of binding by ZnF23 and ZnF456 to their respective DNA sequences, we performed two biophysical assays with highly purified recombinant proteins: fluorescence polarization (Do et al., 2008) and BioLayer Interferometry (BLI)-based octet analysis (Concepcion et al., 2009). In fluorescence polarization, plane-polarized light excites a fluorescent tag, 6-FAM, on one 5′ end of a 10-bp double-stranded oligonucleotide carrying the optimal PBM-derived motif of ZnF23 or ZnF456 at its center. Binding of protein to DNA causes the fluorophore to tumble more slowly, increasing polarization. Titration of the 6-FAM-labelled ZnF456 motif with purified ZnF456 protein resulted in increased polarization (Figure 2D), consistent with a calculated KD of 23.1 ± 2.7 nM, which is within the range typical of ZnF DNA-binding proteins (Pedone et al., 1996). Purified ZnF23 protein bound its motif (Figure 2E) with much lower affinity (KD = 571.3 ± 149.9 nM), which may account for the apparent lack of contribution of ZnF23 to overall BCL11A-XL binding specificity. To confirm that the DNA motifs were specific to each ZnF domain, we performed the fluorescence polarization assay with each fluorescently labelled motif and the alternative ZnF protein. The affinity of ZnF456 for the ZnF23 motif was 2349 ± 1293 nM, approximately 100-fold weaker than that of ZnF456 for its own motif (Figure 2D,E). Surprisingly, ZnF23 bound its cognate motif and the ZnF456 motif with similar affinity (KD = 526.3 ± 57.05 nM for the ZnF456 motif), which is 22-fold weaker than binding of ZnF456 to its motif. Despite the weak affinity of ZnF23 for its preferred DNA motif, its requirement for globin repression suggests an alternative role, perhaps in mediating protein or RNA interactions. Similar analyses with the alternative method, octet, confirmed the affinity measurements obtained by fluorescence polarization (Figure 2F,G).
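A single-site (hyperbolic) binding model of the kind used for these curves can be fit without specialised software. The sketch below is a hypothetical, dependency-free illustration, not the analysis used here. It assumes that labelled DNA is present well below KD (so free protein is approximately total protein), that polarization is P = (I∥ − I⊥)/(I∥ + I⊥), and it fits P = P_free + ΔP × [protein]/(KD + [protein]) by least squares, using a coarse grid over log KD followed by golden-section refinement. The titration data are synthetic.

```python
import math

def polarization(i_par, i_perp):
    """Fluorescence polarization P = (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)

def _fit_for_kd(conc, signal, kd):
    """With K_D fixed, the hyperbola is linear in baseline and amplitude."""
    u = [c / (kd + c) for c in conc]
    n = len(u)
    mu, my = sum(u) / n, sum(signal) / n
    suu = sum((ui - mu) ** 2 for ui in u)
    suy = sum((ui - mu) * (yi - my) for ui, yi in zip(u, signal))
    amp = suy / suu
    base = my - amp * mu
    sse = sum((yi - base - amp * ui) ** 2 for ui, yi in zip(u, signal))
    return sse, base, amp

def fit_single_site(conc, signal, kd_min=1e-2, kd_max=1e5, grid=400):
    """Least-squares fit of signal = base + amp * conc / (K_D + conc)."""
    sse = lambda log_kd: _fit_for_kd(conc, signal, math.exp(log_kd))[0]
    lo, hi = math.log(kd_min), math.log(kd_max)
    logs = [lo + i * (hi - lo) / (grid - 1) for i in range(grid)]
    best = min(range(grid), key=lambda i: sse(logs[i]))
    # Bracket the coarse minimum and refine it with golden-section search.
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = sse(c), sse(d)
    for _ in range(80):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = sse(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = sse(d)
    kd = math.exp((a + b) / 2)
    _, base, amp = _fit_for_kd(conc, signal, kd)
    return kd, base, amp

# Synthetic, noise-free titration (nM protein) generated with K_D = 23.1 nM.
conc = [0.5, 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
signal = [50 + 150 * c / (23.1 + c) for c in conc]
kd, base, amp = fit_single_site(conc, signal)
print(f"K_D = {kd:.1f} nM, baseline = {base:.1f}, amplitude = {amp:.1f}")
```

Fixing KD turns the model into a straight line in the transformed concentration, so only one parameter needs a nonlinear search. The fold differences quoted in the text are simple ratios of KD values; for example, 2349 / 23.1 ≈ 102, i.e., roughly 100-fold weaker binding.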
Mutation of preferred embryonic/fetal recognition site in HPFH
Of note, the preferred binding motif for BCL11A, TGACCA, is present in the promoters of all embryonic and fetal-expressed globin genes of humans and mice, and in none of the adult-expressed genes (Table S1). Moreover, the motif is duplicated in each of the human γ-globin genes (-118 to -113 and -91 to -86). The distal motif is altered by single base substitutions at -117 and -114 or by a 13-bp deletion in rare individuals with HPFH (Collins et al., 1985; Gelinas et al., 1985; Gilman et al., 1988). In heterozygous individuals, the HbF level is 10-30%. PBM analysis revealed that the -117 and -114 HPFH substitutions abrogate binding by BCL11A (Figure 3A). Affinity determinations by fluorescence polarization (Figure 3B) and octet (data not shown) confirmed the PBM findings. Thus, these HPFH alleles represent rare sequence variants impaired for BCL11A binding.
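The PBM comparison in Figure 3A groups 8-mers by the 6-mer they contain. The sketch below, which uses hypothetical function names, enumerates every 8-mer containing a given 6-mer on either strand and derives the HPFH variants from the wild-type distal motif. It assumes that positions -118 to -113 map to indices 0 to 5 of TGACCA, so -117 and -114 are indices 1 and 4; each 8-mer is spelled on the forward strand, so an 8-mer and its reverse complement are listed separately, and the 13-bp deletion is not modelled.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def kmers_containing(core, k=8):
    """All k-mers containing `core` on either strand (written on the forward strand)."""
    found = set()
    for site in {core, revcomp(core)}:
        extra = k - len(site)
        for offset in range(extra + 1):
            for left in product("ACGT", repeat=offset):
                for right in product("ACGT", repeat=extra - offset):
                    found.add("".join(left) + site + "".join(right))
    return found

def substitute(motif, position, base, motif_start=-118):
    """Apply a single-base substitution given in promoter coordinates
    (assumes the motif occupies motif_start .. motif_start + len(motif) - 1)."""
    i = position - motif_start
    return motif[:i] + base + motif[i + 1:]

WT = "TGACCA"  # distal gamma-globin promoter motif, -118 to -113
variants = {
    "wild type": WT,
    "-117 G>A": substitute(WT, -117, "A"),
    "-114 C>G": substitute(WT, -114, "G"),
    "-114 C>T": substitute(WT, -114, "T"),
}
for name, core in variants.items():
    print(f"{name:10s} {core}  {len(kmers_containing(core))} 8-mers")
```

Each variant yields 96 8-mers (48 per orientation), so the boxplot groups being compared are the same size and differ only in the core 6-mer.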
Figure 3. HPFH mutations abrogate BCL11A-DNA-binding.

(A) Boxplot representing PBM E-scores of all 8-mers containing the depicted 6-mers, highlighting how the HPFH mutations impair BCL11A-XL binding. Boxplot formats are as in Figure 2B, 2C.
(B) Fluorescence polarization curves for ZnF456 upon binding a 6-FAM-labelled 10-basepair double-stranded oligonucleotide containing the 456 motif (blue), the 456 motif with the G-117A mutation (green), or the C-114G/T mutation (red). The curves were fit to a single-site binding model (hyperbola).
See also Table S1.
Mapping BCL11A chromatin occupancy by CUT&RUN
Independent laboratories failed to detect occupancy of BCL11A at the γ-globin promoters by ChIP-PCR or ChIP-chip and instead suggested binding within the locus control region (LCR) and the intergenic Aγ-δ region (Jawaid et al., 2010; Sankaran et al., 2008; Sankaran et al., 2011). These findings are most consistent with a model in which BCL11A silences HbF primarily through long-distance interactions, perhaps in the context of chromosomal loops (Deng et al., 2014; Xu et al., 2010). Given the presence of a preferred recognition sequence for BCL11A within the γ-globin promoters and mutation of this motif in HPFH, we revisited in vivo BCL11A chromatin occupancy with an alternative approach. We adapted CUT&RUN (Skene and Henikoff, 2017), a recently described method that maps protein binding sites and their long-range genomic contact regions. CUT&RUN is a nuclease-based procedure that avoids protein crosslinking and affords higher resolution than ChIP-seq, because micrococcal nuclease is recruited to the DNA-bound protein and releases small DNA cleavage fragments.
We performed pilot experiments in immortalized adult-stage human erythroid HUDEP-2 cells (Kurita et al., 2013) and compared the localization of CTCF and GATA1 with ChIP-seq datasets (Canver et al., 2017). CUT&RUN identified CTCF and GATA1 binding sites with high sensitivity and resolution genome-wide (Figure S2A). The average width of peaks identified by CTCF CUT&RUN was 159 bp, compared with 273 bp for CTCF ChIP-seq. More than 46% of CTCF and 51% of GATA1 ChIP-seq peaks were recovered by the respective CUT&RUN experiments. On analysis, we observed that DNA fragments <40 bp were underrepresented. To improve recovery of short fragments, we modified several steps of the procedure and library preparation and optimized the data processing pipeline. The modified protocol increased the recovery of short fragments, enhanced the peak signal over noise (Figure S2B; see Methods), and increased the percentage of peak overlap with ChIP-seq by 9%. These optimizations were important for BCL11A CUT&RUN because of its short recognition motif and the short fragments generated from its binding sites.
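The peak-recovery and peak-width comparisons above reduce to interval arithmetic. The sketch below assumes peaks are half-open (chrom, start, end) intervals and counts a reference (ChIP-seq) peak as recovered if it overlaps any query (CUT&RUN) peak by at least one base; the overlap criterion actually used is not specified in this section, and the example peaks are invented.

```python
from bisect import bisect_right
from collections import defaultdict

def mean_width(peaks):
    """Average width in bp of half-open (chrom, start, end) peaks."""
    return sum(end - start for _, start, end in peaks) / len(peaks)

def _merge(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def fraction_recovered(reference, query):
    """Fraction of reference peaks overlapping at least one query peak."""
    grouped = defaultdict(list)
    for chrom, start, end in query:
        grouped[chrom].append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        merged = _merge(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    hits = 0
    for chrom, start, end in reference:
        starts, ends = index.get(chrom, ([], []))
        i = bisect_right(ends, start)  # first merged interval ending after `start`
        if i < len(starts) and starts[i] < end:
            hits += 1
    return hits / len(reference)

chipseq = [("chr11", 100, 400), ("chr11", 1000, 1300), ("chr1", 50, 300), ("chr1", 5000, 5270)]
cutrun = [("chr11", 250, 400), ("chr1", 120, 280), ("chr1", 9000, 9150)]
print(fraction_recovered(chipseq, cutrun), mean_width(chipseq), round(mean_width(cutrun), 1))
```

Merging the query peaks first makes their end coordinates sorted, so each reference peak can be tested with a single binary search.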
Of six antibodies tested, we selected one (Figure S2C) for BCL11A CUT&RUN experiments based on enrichment of the preferred binding sequence, overall peak number, and peak overlap between antibodies. BCL11A CUT&RUN was performed in expansion-phase HUDEP-2 cells (Kurita et al., 2013) and in primary human CD34+ stem/progenitor cells (Xu et al., 2012) undergoing erythroid differentiation for 3, 5, 7, and 9 days (Figure S2D-G). Three or four nuclease cleavage time courses were performed for each cell type and stage. BCL11A knockout HUDEP-2 cells (Canver et al., 2015) and IgG were used as negative controls for HUDEP-2 and CD34+ cell experiments, respectively (Figure S2F). Between 23,000 and 74,000 peaks were identified in these experiments (Figure S2H). The distributions of peaks were highly similar, with peaks most abundant within promoters (35-39%), introns (30-34%), and distal intergenic regions (23-25%) (Figure S2I,J).
Unbiased motif discovery of BCL11A CUT&RUN yielded the motif TG(A/G)CC(A/T/G), which was centrally enriched in peaks and contained the BCL11A recognition sequences determined by PBM (Figure 4A,B and Figure S3A). Motifs for GATA1, CTCF, EKLF family proteins and other TFs were also identified. Among the several sequences conforming to TG(A/G)CC(A/T/G), TGACCA was the most favored sequence (Figure S3B,C), which is consistent with PBM E-scores for all sequences (data not shown).
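Central enrichment of a motif, as in Figure 4A,B, can be checked by measuring where motif occurrences fall relative to the peak summit. This sketch assumes fixed-length peak sequences centred on the summit and counts exact matches to TGACCA or its reverse complement; it is not the motif-discovery pipeline used for the figure, and the function names and toy sequences are hypothetical.

```python
import re
from collections import Counter

def motif_offsets(peak_seqs, motif="TGACCA"):
    """Offsets (bp) of motif centres from the centre of each summit-centred sequence.

    Exact matches on either strand are counted; the lookahead lets hits overlap.
    """
    rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    pattern = re.compile(f"(?=({motif}|{rc}))")
    offsets = []
    for seq in peak_seqs:
        centre = len(seq) / 2
        for hit in pattern.finditer(seq):
            offsets.append(hit.start() + len(motif) / 2 - centre)
    return offsets

def histogram(offsets, bin_width=20):
    """Counts of offsets in bins of bin_width bp (keys are bin left edges)."""
    return Counter(int(o // bin_width) * bin_width for o in offsets)

peaks = ["A" * 7 + "TGACCA" + "A" * 7, "C" * 7 + "TGGTCA" + "C" * 7, "G" * 20]
print(motif_offsets(peaks))  # [0.0, 0.0]
```

Aggregated over thousands of peaks, a directly bound motif produces a histogram sharply peaked at offset 0, whereas motifs present by chance are spread roughly uniformly.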
Figure 4. BCL11A CUT&RUN experiments reveal an enriched binding motif consistent with PBM.
(A,B) Motif discovery analysis in BCL11A CUT&RUN peaks in HUDEP-2 cells (A) and in CD34+ cells at day 7 of erythroid differentiation (B). The plots show the position of the motif relative to peak centers. Other motifs discovered are shown in Figure S3A. E values shown in the upper right were reported by MEME.
(C) Left, schematic showing the principle of footprint detection. Right, motifs discovered within footprints.
(D,E) Targeted motif footprint analysis of BCL11A CUT&RUN. Cut probability of each base surrounding and within TGACCA motifs was plotted. Left, footprint analysis on TGACCA in BCL11A CUT&RUN in wild-type HUDEP-2 cells (D) and in CD34+ cells at day 7 of differentiation (E). Right, control analysis performed in BCL11A KO HUDEP-2 cells (D) or IgG CUT&RUN in CD34+ cells (E). Data for other controls and CD34+ cells at days 3, 5, 9 of differentiation are shown in Figure S4.
(F,G) Footprint analysis for five tiers of BCL11A peaks ranked by binding log odds. Log odds are defined as log(P_A/(1 − P_A)), where P_A is the posterior binding probability reported by CENTIPEDE. Left, the average cut count at each base surrounding and within TGACCA is plotted for five tiers of BCL11A peaks. Right, heatmap of the number of pA-MNase cuts per base (column) for each peak (row). Peaks are ordered by decreasing binding log odds. The binding log odds of the distal motif in the γ-globin promoter is indicated by a red arrow.
See also Figure S2, S3, and S4.
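The log odds used to rank sites in panels F and G is the logit of the posterior binding probability. Assuming the natural logarithm (the base is not stated in the legend), the conversion in both directions is:

```python
import math

def log_odds(p):
    """Log odds log(p / (1 - p)) of a posterior binding probability p (natural log assumed)."""
    return math.log(p / (1 - p))

def probability(log_odds_value):
    """Inverse (logistic) transform from log odds back to probability."""
    return 1 / (1 + math.exp(-log_odds_value))

print(round(log_odds(0.99), 2))    # 4.6
print(round(probability(4.5), 3))  # 0.989
```

Under this assumption, a log-odds threshold of 4.5 corresponds to a posterior probability of about 0.989, and the log odds of 43 to 82 reported for the γ-globin promoter motifs correspond to probabilities indistinguishable from 1.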
As a nuclease-based method, CUT&RUN permits base-resolution digital footprinting, which may reveal precise protein binding sites (Neph et al., 2012; Skene and Henikoff, 2017). We enumerated the fragment ends at each base of the genome in BCL11A CUT&RUN and adapted an algorithm that detects and scores footprints de novo (Neph et al., 2012). Motif discovery within these footprints identified TG(A/T)CC(A/C/T) as the top enriched motif (Figure 4C). This motif agrees well with the peak-based de novo discovery described above and matches the PBM-derived motif more closely. We next plotted the frequency of nuclease cutting on TGACCA and surrounding sequences and observed that TGACCA was well protected in BCL11A CUT&RUN (Figure 4D,E, left; Figure S4A). In contrast, none of the random hexamers, the GATA1 motif, or the mutant HPFH sequences was protected (Figure S4B,C). No footprint was evident on TGACCA in experiments performed with BCL11A knockout cells or with IgG (Figure 4D,E, right). We trained an unsupervised Bayesian mixture model on raw cut counts from -100 to +100 bp around every occurrence of TGACCA to compute the binding probability of each occurrence (Methods). When TGACCA sites were ranked by their computed binding log odds, the rank percentile corresponded well with footprint strength (Figure 4F,G). From this analysis, we identified ∼3000 TGACCA sites with log odds > 4.5 (a posterior binding probability of approximately 0.99) as potential binding sites of the protein. Taken together, these observations support TGACCA as the preferred in vivo binding sequence for BCL11A.
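Digital footprinting of this kind starts from per-base cut counts, in which each end of a sequenced fragment marks one pA-MNase cleavage. The sketch below, with hypothetical function names and toy fragments, tallies fragment ends, averages cut counts across motif occurrences aligned on the motif, and computes a footprint score in the spirit of the footprint occupancy score of Neph et al. (2012), here taken as (C + 1)/(L + 1) + (C + 1)/(R + 1), where C, L and R are the mean cut counts in the motif and in the left and right flanks; lower scores indicate stronger protection. The exact score, the 20-bp flank width and the pseudocounts are assumptions, and motif strand orientation is ignored for brevity.

```python
from collections import Counter

def cut_counts(fragments):
    """Per-base cut counts from half-open (start, end) fragments.

    Each fragment contributes one cut at its first base and one at its last base.
    """
    counts = Counter()
    for start, end in fragments:
        counts[start] += 1
        counts[end - 1] += 1
    return counts

def aggregate_profile(counts, motif_starts, motif_len=6, flank=20):
    """Mean cut count at each position from `flank` bp upstream to `flank` bp
    downstream of the motif, averaged over all motif occurrences."""
    n = len(motif_starts)
    return [sum(counts[s + off] for s in motif_starts) / n
            for off in range(-flank, motif_len + flank)]

def footprint_score(profile, motif_len=6, flank=20):
    """(C + 1)/(L + 1) + (C + 1)/(R + 1); lower values mean stronger protection."""
    mean = lambda xs: sum(xs) / len(xs)
    left = mean(profile[:flank])
    core = mean(profile[flank:flank + motif_len])
    right = mean(profile[flank + motif_len:])
    return (core + 1) / (left + 1) + (core + 1) / (right + 1)

# Toy data: a single motif occurrence at positions 100-105.
protected = cut_counts([(80, 100), (106, 130), (85, 99), (107, 125)])
exposed = cut_counts([(90, 102), (103, 115)])
for name, counts in [("protected", protected), ("exposed", exposed)]:
    print(name, round(footprint_score(aggregate_profile(counts, [100])), 2))
```

With no cuts inside the motif the score falls below 2, the value expected for uniform cutting, whereas cuts concentrated in the motif push it above 2.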
Chromatin localization of BCL11A in globin loci
Within the β-globin locus, strong peaks were observed in HUDEP-2 cells, and in CD34+ cells from day 5 of differentiation onward, within the LCR, the HBBP1 pseudogene, the β-globin gene (HBB), 3′HS and, of special interest, the γ-globin gene promoters (Figure 5A,B). The general pattern was similar across nuclease incubation times starting at 30 minutes (data not shown). Hypersensitive sites and HBB gained accessibility as differentiation progressed in CD34+ cells, reflecting progressive opening of the region during erythroid differentiation. The TGACCA motif is present at 35 sites across the locus. Of the 12 sites encompassed by CUT&RUN peaks, the motifs within the γ-globin promoters lie at the peak centers and have the highest binding probability (Figure 5A, last track; Table S2). Peaks encompassing HS4, HS3, HS1, HBD and HBB, which do not contain a TGACCA motif, may reflect frequent chromatin contacts within the locus rather than direct sites of BCL11A occupancy on DNA (see Discussion). Strong peaks with a centered motif within the γ-globin promoters are highly likely to reflect authentic, direct binding sites of BCL11A.
Figure 5. CUT&RUN reveals BCL11A binding in γ-globin promoters.
(A) CUT&RUN profiles in β-globin cluster. Antibodies and cell types for each track are shown on the right. The promoters of duplicated γ-globin genes (HBG2 and HBG1) are highlighted in pink. The bottom track shows all 35 TGACCA sites in non-repetitive regions.
(B) Left, BCL11A binding at the HBG1/2 promoter across multiple CUT&RUN experiments. Right, zoomed-in view of 216 bp of the Gγ promoter region. The distal and proximal TGACCA motifs are highlighted in green. Arrows indicate positions of peak summits.
(C) Comparison of the distances of distal motif or proximal motif to peak summit.
(D) Comparison of the fragment coverage of distal and proximal motifs. Each fragment originates from one DNA molecule released by protein A-MNase digestion. Relative fragment coverage was defined as average fragment number per base on motifs divided by average fragment number per base within the peak.
(E) Comparison of the number of cuts occurring within the distal or proximal motif.
(F) Single-locus footprint analysis showing the cut frequency at each nucleotide of the γ-globin promoter region (from -146 to -67 relative to the TSS). Three footprints (regions protected from protein A-MNase cleavage) are indicated with brackets, and the two TGACCA motifs are highlighted in green. Fourteen CUT&RUN experiments in CD34+ cells are included in (C, D and F), and 24 CUT&RUN experiments in HUDEP-2 and CD34+ cells in (E); P values were calculated by paired t-test.
See also Figure S5, Table S2.
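The relative fragment coverage defined in panel D, and the motif-to-summit distance compared in panel C, can be computed from fragment intervals as follows. This is a sketch under stated assumptions: fragments and regions are half-open (start, end) intervals on one chromosome, "fragment number per base" is taken as fragment depth at that base, and the coordinates and fragments are invented (only the 21-bp spacing between the two 6-bp motifs follows the text).

```python
def depth(fragments, start, end):
    """Fragment depth at each base of the half-open region [start, end)."""
    cov = [0] * (end - start)
    for f_start, f_end in fragments:
        for pos in range(max(f_start, start), min(f_end, end)):
            cov[pos - start] += 1
    return cov

def mean(xs):
    return sum(xs) / len(xs)

def relative_coverage(fragments, motif, peak):
    """Mean depth over the motif divided by mean depth over the whole peak."""
    return mean(depth(fragments, *motif)) / mean(depth(fragments, *peak))

def summit_distance(motif, summit):
    """Distance (bp) between the motif centre and the peak summit."""
    return abs((motif[0] + motif[1]) / 2 - summit)

# Invented coordinates: a 100-bp peak with two 6-bp motifs separated by 21 bp.
peak, summit = (0, 100), 25
distal, proximal = (20, 26), (47, 53)
fragments = [(10, 40), (15, 35), (18, 30), (45, 70), (60, 90)]
for name, motif in [("distal", distal), ("proximal", proximal)]:
    print(name, round(relative_coverage(fragments, motif, peak), 2),
          summit_distance(motif, summit))
```

A value above 1 means the motif is covered by more fragments than an average base of the peak; in the paired comparison, this ratio and the summit distance would be computed per experiment for each motif.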
Strong peaks were also found within the α-globin locus, including at the promoter of HBZ (Figure S5A,B), which encodes the embryonic-expressed α-like ζ-globin, a finding consistent with derepression of ζ-globin upon loss of BCL11A (Sankaran et al., 2008).
High-resolution localization of BCL11A at the γ-globin promoters
The enhanced resolution of CUT&RUN permits discrimination of BCL11A binding to the duplicated TGACCA motifs in the γ-globin promoters, despite their close proximity (21 bp separate the motifs; Figure 3A). Two independent analyses reveal a binding preference of BCL11A for the distal over the proximal motif. First, the distal motif (-118 to -113) was more often covered by fragments than the proximal motif (P = 0.0004), and the distal motif lay closer to the peak summit than the proximal motif (P = 0.0001) (Figure 5B-D). Second, a discriminative TF binding model built for BCL11A gave log odds of binding at the distal motif of 82 in CD34+ cells and 61 in HUDEP-2 cells (Figure 4F,G), ranking it in the top 1-2% of all TGACCA sites and higher than the proximal motif (log odds of 65 in CD34+ cells and 43 in HUDEP-2 cells). In accord with this estimate, when we enumerated fragment ends at each base of the duplicated γ-globin promoters, we consistently found fewer cuts within the distal motif than within the proximal motif across independent experiments (P = 0.0129) (Figures 5E, 5F). Taken together, these findings indicate that BCL11A preferentially binds the distal TGACCA motif in the γ-globin promoters in adult erythroid chromatin. Similarly, the TGACCA motif within the ζ-globin promoter was also protected within a footprint (Figure S5B,C), suggesting that the HBZ gene promoter harbors an authentic binding site for BCL11A in adult erythroid cells.
Gene editing of distal BCL11A-binding motif in γ-globin promoter
To provide definitive evidence that the distal TGACCA motif is bound by BCL11A and is also required for repression, we generated a HUDEP-2 cell line (clone D3) in which the distal motifs of all four γ-globin promoters were mutated by CRISPR/Cas9 genome editing (Figure 6A). FACS and RT-qPCR analysis showed that HbF was expressed in 97% of cells (Figure 6B) and that γ-globin comprised 77% of total β-like globin RNA (Figure 6C). Chromosome conformation capture (3C) revealed a switch in enhancer-gene interaction from LCR-β-globin to LCR-γ-globin in clone D3 (Figure S6B), similar to prior observations with a human β-globin locus examined in BCL11A knockout transgenic mice (Xu et al., 2010). We performed BCL11A CUT&RUN in clone D3. To facilitate mapping of mutated sequences, we generated a mutant reference genome in which HBG1 carried a ΔC allele and HBG2 carried a Δ13bp allele (Figure S6A). Reads carrying the ΔA mutation can be mapped efficiently to the ΔC allele because they differ from it by only one mismatch.
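Percentages such as the γ-globin share of β-like globin RNA are often derived from RT-qPCR cycle thresholds with the 2^−ΔCt method. The method used for Figure 6C is not described in this section, so the sketch below is only an illustration: it assumes 100% amplification efficiency for both the γ- and β-globin assays, and the Ct values and function names are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """2^-dCt expression of a target relative to a reference gene (100% efficiency assumed)."""
    return 2 ** -(ct_target - ct_reference)

def gamma_fraction(ct_gamma, ct_beta, ct_reference):
    """gamma / (gamma + beta) from Ct values normalised to the same reference gene."""
    gamma = relative_expression(ct_gamma, ct_reference)
    beta = relative_expression(ct_beta, ct_reference)
    return gamma / (gamma + beta)

print(f"{gamma_fraction(ct_gamma=18.0, ct_beta=20.0, ct_reference=16.0):.0%}")  # 80%
```

Because both genes are normalised to the same reference, the reference Ct cancels and the fraction depends only on the γ-β Ct difference: a 2-cycle difference corresponds to a 4:1 ratio, or 80% γ-globin.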
Figure 6. BCL11A fails to bind distal motif in γ-globin promoter edited HUDEP-2 cells.
(A) Sequence alignment of 4 alleles of γ-globin promoter region of HUDEP-2 promoter-edited clone D3 cells.
(B) FACS analysis showing the percentages of fetal hemoglobin (HbF) positive cells in wild type HUDEP-2 and clone D3. Results are shown as mean ± SEM of three experiments.
(C) Left, RT-qPCR analysis of mRNA levels for γ-globin and β-globin in indicated cell lines. Right, percentages of β- or γ-globin mRNA. Results are shown as mean ± SEM of three experiments.
(D) CUT&RUN tracks showing the γ-globin promoter region. The top track shows the combination of the 4 γ-globin alleles in wild-type HUDEP-2 cells, the middle track shows the combination of 3 alleles (ΔA + ΔC + ΔC) in clone D3, and the bottom track shows the Δ13bp allele. The positions of the proximal and distal motifs are highlighted in green. Note that the distal motifs in clone D3 are mutated. Three replicates were merged; Figures S6B and S6C show the replicates separately.
(E) Single locus footprint showing the γ-globin promoter region in HUDEP-2 and clone D3. Edited cells in clone D3 show a high number of cuts interrupting the distal motif (highlighted green), whereas the wild-type has few to no cuts.
See also Figure S6.
The CUT&RUN results were initially somewhat surprising, as peaks appeared stronger at the γ-globin promoters in D3 than in wild-type cells (Figure S6D). We believe this seemingly paradoxical observation can be explained by two factors. First, CUT&RUN is principally a nuclease-based method. Without immunoprecipitation, it maps sites that are directly occupied by the protein as well as sites in proximity to the protein (Skene and Henikoff, 2017). Therefore, major changes in overall chromatin accessibility and long-distance chromosomal interactions may lead to differences in peak sizes and distributions. Second, within the globin locus, transcription of the γ- and β-globin genes correlates with DNase sensitivity of their respective promoters. Indeed, as evidenced by ATAC-seq, the γ-globin promoter is virtually inaccessible in wild-type cells but accessible in BCL11A knockout and promoter-edited D3 cells (Figure S6E).
Moreover, close inspection of BCL11A CUT&RUN in D3 cells revealed a valley at the distal motif and a much broader overall pattern (Figure 6D and Figure S6B,C). Footprint analysis showed that the mutated distal motif became hypersensitive to nuclease digestion (Figure 6E). Within positions -146 to -67 of the γ-globin promoters, 89 of 840 cuts occurred in the mutant distal motif (TGCCAA and TGACAA); in contrast, only 2 of 412 cuts occurred within the distal motif in wild-type cells (Figure 6E). Consistent with this, the log odds of binding to the distal motif decreased to an average of -38 across alleles in edited cells, compared with +61 in wild-type cells. Importantly, the gene-edited mutant motifs were no longer bound by BCL11A, despite the overall increased accessibility of the γ-globin promoter. To confirm that these findings are not limited to this specific edited clone, we performed the same analysis in a pool of cells subjected to CRISPR/Cas9 editing. Similar results, including derepression of γ-globin and loss of protection at the mutant motifs, were obtained (Figures S6F,G). Taken together, these findings indicate that repression of γ-globin gene expression requires direct binding of BCL11A to the distal promoter motif.
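The cut counts above can be put on a common scale. Within -146 to -67 (80 bp), a 6-bp motif would receive about 7.5% of cuts if cutting were uniform; in edited cells the mutant motif received 89 of 840 cuts (about 10.6%), versus 2 of 412 (about 0.5%) at the intact motif in wild-type cells. As an illustration only, and not an analysis reported here, the sketch below compares the two proportions with a two-sided Fisher's exact test implemented from the hypergeometric distribution; treating individual cuts as independent observations is an assumption.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table no more probable than the observed one; the small relative
    # tolerance keeps numerically tied tables from being dropped.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))

print(f"edited {89 / 840:.1%}, wild type {2 / 412:.1%}, uniform expectation {6 / 80:.1%}")
print(fisher_exact_two_sided(89, 840 - 89, 2, 412 - 2))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division for each table.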
Discussion
In addition to its prominent role in HbF regulation, BCL11A, and its paralog BCL11B, are critical for differentiation in diverse contexts, including B- and T-lymphoid cells (Albu et al., 2007; Liu et al., 2006; Liu et al., 2003) and the developing brain (Dias et al., 2016; Simon et al., 2012; Tang et al., 2011). As numerous difficulties have been encountered in mapping chromatin occupancy of these proteins (Avram et al., 2002; Consortium, 2012; Ippolito et al., 2014; Liu et al., 2006; Longabaugh et al., 2017; Marban et al., 2005; Senawong et al., 2005; Tang et al., 2011; Wiles et al., 2013), prior conclusions regarding their target genes and transcriptional networks merit careful reassessment.
We found that the nuclease-based CUT&RUN procedure (Skene and Henikoff, 2017) provides a powerful, high-resolution method for chromatin localization of a previously intractable protein. Unbiased motif search in BCL11A CUT&RUN experiments yielded the PBM-derived ZnF456 preferred sequence at the center of peaks, indicative of direct protein binding. The utility of CUT&RUN in the context of BCL11A is underscored by our inability to detect the binding motif in ChIP-seq experiments performed with the same antibody (data not shown).
In addition to establishing in vivo occupancy of the γ-globin promoters by BCL11A, the high resolution afforded by CUT&RUN revealed preferential binding to the distal motif of the duplicated pair in the promoters. Furthermore, because CUT&RUN is a targeted nuclease assay, it allowed in vivo footprinting of BCL11A at base resolution, which provided further evidence of BCL11A occupancy at the distal motif. Although the rare HPFH alleles discovered to date represent only a few of the many possible mutations in the promoter region, it is nonetheless striking that four HPFH mutations target the distal motif and none the proximal motif. This correlation, supported by demonstration of preferential occupancy of the distal motif and loss of binding in promoter-edited cells in vivo, provides persuasive evidence that BCL11A acts through this element to direct γ-globin gene repression and hemoglobin switching.
TGACCA motifs overlap with CCAAT motifs in most embryonic and fetal stage globin gene promoters (Table S1). CCAAT motifs of globin genes are primarily bound by transcriptional activators, such as NF-Y, and are required for their transcription (Kim and Sheffery, 1990; Martyn et al., 2017). Within γ-globin promoters, duplication of TGACCAAT sequences offers the potential of multiple modes of regulation. Prior EMSA data suggested that NF-Y prefers the proximal motif (Superti-Furga et al., 1988), whereas our data demonstrate preferential in vivo occupancy of BCL11A at the distal motif. In fetal cells, in which BCL11A abundance is low, NF-Y may direct active transcription of γ-globin through the proximal motif. With increased BCL11A abundance at the adult stage, BCL11A binding to the distal motif in the γ-globin promoter may transform the local chromatin into a condensed state through NuRD recruitment, thereby preventing NF-Y binding and silencing γ-globin expression.
Besides peaks at the γ-globin promoters, four other regions that contain a TGACCA motif within a footprint are also potential direct BCL11A binding sites (Table S2). These sites include the promoter of the pseudogene lying between the Aγ- and δ-globin genes (HBBP1), a subregion of the β-globin LCR (HS2), and both upstream and downstream insulator elements of the β-globin locus (HS5 and 3′HS). Log-odds calculations support binding to HS5, HBBP1, and 3′HS. The presence of potential binding sites outside the γ-globin promoters suggests a broader role for BCL11A in the β-globin locus. Testing the functional impact of these potential sites will require precise editing of these regions and subsequent assessment of effects, if any, on HbF silencing. Of interest, close inspection revealed that one of the peaks in the HBBP1 region, which lies within the broader Aγ-δ region implicated historically in HbF silencing, was selectively diminished in the γ-promoter-edited cells (Figure S6C). Although the larger significance of this finding remains to be elucidated, it is notable that numerous chromosomal interactions within the β-globin locus map to the HBBP1 region, as determined by capture-C (Huang et al., 2017) and CAPTURE-3C-seq (Liu et al., 2017). HBBP1, however, lies outside the 3.5 kb domain previously proposed as a cis-element for HbF silencing (Sankaran et al., 2011). These sequences, just upstream of the δ-globin gene, do not contain a detectable BCL11A binding site, although one had previously been suggested by ChIP-chip (Sankaran et al., 2011); they do, however, constitute another region with a high frequency of chromosomal interactions (Liu et al., 2017).
Peaks in CUT&RUN also reflect chromosomal interactions at a distance (Skene and Henikoff, 2017). The presence of BCL11A protein, either bound directly or recruited through interaction with other DNA-bound factors, at numerous positions within the β-globin locus may signify a role in establishing or maintaining proper architecture of the locus. CUT&RUN peaks that are not associated with direct binding of BCL11A to a TGACCA motif within the peak reflect chromatin-bound BCL11A at other sites brought into proximity through chromosomal interactions. We propose that more frequent contacts between the highly accessible γ-globin promoter in promoter-edited cells and other regions of the β-globin locus account for the seemingly paradoxical increase in CUT&RUN peaks at the promoter, despite the lack of direct binding of BCL11A to the mutated distal motif.
The findings reported herein provide compelling evidence that HbF silencing in adult cells is achieved in large part through direct promoter-mediated repression. Thus, prior models proposing that BCL11A acts primarily at a distance from the γ-globin promoters to repress HbF (Sankaran et al., 2011; Xu et al., 2010) are untenable. Hemoglobin switching appears somewhat less complex than previously suspected. As chromosomal looping between the LCR and downstream globin genes is believed necessary for high-level expression (Deng et al., 2014), and loss of BCL11A shifts LCR interactions to the γ-globin gene (Xu et al., 2010), we suspect that the presence of BCL11A at the γ-globin promoter may be incompatible with stable loop formation. However, additional roles for BCL11A elsewhere in the β-globin cluster, e.g., in the vicinity of HBBP1 or the LCR, are by no means excluded.
BCL11A brings together two complementary themes in human genetics. Common genetic variation, as reflected in GWAS, led to recognition of BCL11A as a candidate factor responsible for HbF silencing (Menzel et al., 2007; Uda et al., 2008). Rare genetic variants, reflected by classical HPFH alleles (Collins et al., 1985; Gelinas et al., 1985), prevent BCL11A occupancy at the γ-globin promoters. Whereas common variation affects BCL11A expression through an erythroid-specific enhancer (Bauer et al., 2013; Canver et al., 2015), rare variation disrupts a pivotal cis-regulatory motif in the γ-globin promoter bound by BCL11A. Both genetic elements represent attractive, discrete targets for therapeutic genome editing to reactivate HbF in patients with sickle cell disease or β-thalassemia.
Contact For Reagent and Resource Sharing
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Stuart H. Orkin (stuart_orkin@dfci.harvard.edu).
Experimental Model and Subject Details
Human blood and tissue samples
Human peripheral blood stem cells (CD34+ cells) from healthy donors (G-CSF mobilized, CD34 enriched) were purchased from Fred Hutchinson Cancer Research Center. The lot used for experiments, R002895, was collected from a healthy male donor and confirmed to express low levels of HBG1/2. Samples were de-identified prior to purchase, thus the research consent requirements were exempted from BCH IRB approval. The cells were thawed and recovered in EDM (IMDM (Corning, 15-016-CV) supplemented with 330 μg/mL Holo-Human Transferrin, 10 μg/mL Recombinant Human Insulin, 2 IU/mL Heparin, 5% Inactivated Plasma, 3 IU/mL Erythropoietin, 2 mM L-Glutamine) with three supplements (10⁻⁶ M hydrocortisone, 100 ng/mL SCF, 5 ng/mL IL-3) for 7 days to allow erythroid differentiation, then further differentiated in EDM with one supplement (100 ng/mL SCF) for another 2 days. Samples were collected on days 3, 5, 7, and 9 for CUT&RUN, RNA extraction and protein analysis.
Immortalized Cell lines
Human HEK293T cells (female) were purchased from ATCC. The cells were cultured in DMEM, high glucose (Thermo Fisher Scientific, 11965) with 10% FCS and 2 mM L-Glutamine, and passaged every three days.
MEL (Murine Erythroleukemia, male) cells were purchased from Coriell Institute for Medical Research. The cells were cultured in DMEM, low glucose (Thermo Fisher Scientific, 11885) with 10% FCS, 2 mM L-Glutamine, and passaged every three days.
The HUDEP-2 cell line (human umbilical cord blood-derived erythroid progenitor, male) was generated and kindly shared by Dr. Yukio Nakamura. HUDEP-2 cells were maintained in expansion medium: StemSpan™ SFEM (STEMCELL Technologies, 09650) with SCF (50 ng/mL), EPO (3 IU/mL), dexamethasone (10⁻⁶ M), and doxycycline (1 μg/mL), and passaged every three days. Cell density was maintained between 2 × 10⁴ and 5 × 10⁵ cells/mL. Erythroid differentiation was carried out by replacing the medium with EDM (Erythroid Differentiation Media: IMDM (Corning, 15-016-CV) supplemented with 330 μg/mL Holo-Human Transferrin, 10 μg/mL Recombinant Human Insulin, 2 IU/mL Heparin, 5% Inactivated Plasma, 3 IU/mL Erythropoietin, 2 mM L-Glutamine) with two supplements (100 ng/mL SCF, 1 μg/mL doxycycline). After 5 days of differentiation, cells were collected for chromosome conformation capture assay.
Method Details
Functional Rescue Experiments
Bcl11a enhancer knockout (Enh-KO) MEL cells were generated by TALEN-mediated deletion of the 12-kb intronic enhancer (Bauer et al., 2013). Bcl11a exon knockout MEL cells were generated by TALEN-mediated deletion of exon 1 (E1D) or exons 1 to 5 (E15D). Specifically, the TALENs were designed to generate double-strand breaks flanking the Bcl11a exon 1 or exon 1 to 5 regions. The TALENs recognize the following sequences: TCTTGACTGGGCTGAAGC (5′L1), TGAAGTGGGGGCTGGGGG (5′R1), TGACTGGGCTGAAGCGTC (5′L2), TATTGAAGTGGGGGCTGG (5′R2), TGTTTACAAGCACCGCGT (Intron1L1), TCCTCTGTCTGTTTGTTG (Intron1R1), TTTACAAGCACCGCGTGT (Intron1L2), TCCGTCCTCTGTCTGTTT (Intron1R2), TGTGCTAGCTCTTTGTTG (3′L1), TCCCGGTGGAAGAGGAAC (3′R1), TGCTAGCTCTTTGTTGAT (3′L2), and TCTCCCGGTGGAAGAGGA (3′R2). TALENs were synthesized and cloned into the pcDNA3.1 vector with the FokI nuclease domain. 2.5 μg of each of the four TALEN plasmids together with 0.5 μg of pMax-GFP (Lonza) were electroporated into 2 × 10⁶ MEL cells following the manufacturer's procedure (Lonza, VCA-1005). Single-cell-derived biallelic deletion clones were isolated by flow cytometry sorting of GFP-positive cells followed by limiting dilution and genotyping analyses. The complete open reading frames (ORFs) of human BCL11A isoforms (XL, L, S and XS) and various domain mutants were subcloned into the lentiviral vector pLVX-EF1a-IRES-zsGreen1 (Clontech #631982). Lentiviruses were packaged in HEK293T cells as described (Huang et al., 2016). Briefly, 2 μg of pΔ8.9, 1 μg of VSV-G and 3 μg of sgRNA vectors were co-transfected into HEK293T cells seeded in a 10 cm petri dish. Lentiviruses were harvested from the supernatant 48-72 hr post-transfection. BCL11A enhancer or exon knockout MEL cells were transduced with lentiviruses in 6-well plates, and GFP-positive cells were FACS sorted 48 hr post-transduction. After sorting, cells were seeded in T25 flasks as pooled populations or in 96-well plates as single-cell clones.
Between 6 and 32 single-cell-derived stable clones with ectopic expression of BCL11A isoforms or domain mutants were processed for gene expression analysis. Total RNA was isolated using the RNeasy Plus Mini Kit (Qiagen) following the manufacturer's protocol. Quantitative RT-PCR (RT-qPCR) was performed using the iQ SYBR Green Supermix (Bio-Rad). Oligonucleotide sequences are listed in Table S3.
Western Blot
Western blot was performed as described (Xu et al., 2012). Briefly, samples were boiled in 1× SDS loading buffer to denature proteins and separated on 13% or gradient SDS-PAGE gels. Proteins were then transferred to PVDF membranes with a standard wet transfer system at 2.5 mA/cm² for 2 hr. Membranes were blocked with 5% nonfat milk for 1 hr and then incubated with primary antibodies for 1 hr at room temperature or overnight in a cold room with shaking. Excess antibodies were washed off three times with TBS-T (50 mM Tris pH 8.0, 150 mM NaCl, 0.1% Tween 20), and membranes were incubated with HRP-conjugated secondary antibodies for 30 min at room temperature. After 3 washes with TBS-T, the membranes were developed with Immobilon Western Chemiluminescent HRP Substrate (Millipore, WBKLS0500). The following antibodies were used: M2-Flag (F1804, Sigma-Aldrich), BCL11A (ab19487 and ab191401, Abcam), and GAPDH (sc-25778, Santa Cruz Biotechnology). All antibodies were used at 1:1,000 dilutions in TBS-T.
Protein Binding Microarray (PBM) Experiments
For PBM experiments, ZnF23 and ZnF456 from BCL11A and BCL11B were cloned into pDEST15 (Invitrogen) with an N-terminal glutathione S-transferase (GST) tag using the Gateway system (Invitrogen). The ZnF domains were expressed by in vitro transcription-translation (IVT) using PURExpress (New England Biolabs) supplemented with RNase Inhibitor (New England Biolabs) and 50 μM zinc acetate. Protein concentrations were determined by immunoblotting with a GST dilution series. BCL11A XL and BCL11B were cloned into vector pT7CFE (Thermo Fisher) with an N-terminal HA tag. The full-length proteins were expressed by IVT using the 1-Step Human Coupled IVT kit (Thermo Fisher) supplemented with 50 μM zinc acetate. Concentrations of full-length proteins from IVT reactions were determined by comparison to pure BCL11A-XL of known concentration on Western blot with antibodies against BCL11A (Abcam, ab18688) and HA (Santa Cruz Biotechnology, sc-805 HRP).
PBM experiments were performed as described (Berger and Bulyk, 2009; Berger et al., 2006), except as noted here. Custom-designed “all 10mer” universal oligonucleotide arrays in 8 × 60K GSE array format (Agilent Technologies; AMADID, 030236) (Nakagawa et al., 2013) were double-stranded. Each BCL11 wild-type and mutant protein was assayed at a final concentration of 600 nM in a standard PBM binding reaction containing 50 μM zinc acetate on a fresh slide. Protein binding was detected using an anti-HA antibody (Life Technologies, A-21287) or an anti-GST antibody (Life Technologies, A-11131) at a final concentration of 25 μg/mL, and fluorescence measurements were obtained using a GenePix 4400A (Molecular Devices) microarray scanner. Subsequent data quantification, normalization, and DNA binding specificity analysis were performed as described previously using the Universal PBM Analysis Suite and the Seed-and-Wobble motif-derivation algorithm (Berger and Bulyk, 2009; Berger et al., 2006). Average E-scores for TGACCA and TNCGGCCA sequences were calculated as the mean of the E-scores of all 8-mers containing matches to these sequences. Average E-scores for 8-mers scoring E > 0.40 with at least one BCL11 construct were plotted with the heatmap.2 function in the gplots R package using the Pearson correlation distance metric and single-linkage clustering. Unweighted clusters of 8-mers were then aligned to generate a PWM as previously described (Jiang et al., 2013), and the associated sequence logos were made using the makePWM and seqLogo functions from the Biostrings and seqLogo R packages, respectively. Plots were made using the standard boxplot function in R.
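The averaging of E-scores over motif-containing 8-mers can be sketched as follows. This is an illustration, not the Universal PBM Analysis Suite code; it assumes E-scores (which range from -0.5 to +0.5) are held in a dictionary with one entry per 8-mer/reverse-complement pair, that an 8-mer matches if the motif occurs on either strand, and that only the wildcard `N` needs IUPAC expansion. The function names and toy scores are hypothetical.

```python
import re
from statistics import mean

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def motif_regex(motif):
    """Compile a motif, expanding only the IUPAC wildcard N to any base."""
    return re.compile(motif.replace("N", "[ACGT]"))

def average_escore(escores, motif):
    """Mean E-score of all 8-mers containing the motif on either strand."""
    pattern = motif_regex(motif)
    hits = [score for kmer, score in escores.items()
            if pattern.search(kmer) or pattern.search(revcomp(kmer))]
    if not hits:
        raise ValueError(f"no 8-mers contain {motif}")
    return mean(hits)

# Toy E-scores (one entry per 8-mer/reverse-complement pair); values are invented.
toy = {"ATGACCAG": 0.49, "CTGGTCAA": 0.45, "TACGGCCA": 0.30, "AAAAAAAA": -0.10}
print(average_escore(toy, "TGACCA"))    # mean of 0.49 and 0.45; CTGGTCAA matches via its reverse complement
print(average_escore(toy, "TNCGGCCA"))  # 0.3
```

Checking both strands matters because a TGACCA site may be listed under its reverse complement (TGGTCA) in the 8-mer table.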
Protein Purification and Biophysical Assessment of Binding
For fluorescence polarization experiments, 6×His-tagged ZnF23 and ZnF456 were cloned into vector pMCSG7 (Midwest Center for Structural Genomics) in a ligation-independent manner using T4 DNA Polymerase (Millipore, 70099). The construct was transformed into Rosetta™(DE3) pLysS competent cells (Millipore, 70956-4) for protein expression. Protein expression was induced with 1 mM IPTG (Roche) and supplemented with 100 μM ZnSO4 (Sigma-Aldrich) at 20°C for 18 hr. Cells were resuspended in lysis buffer (50 mM Tris pH 8.0, 300 mM NaCl, 10 mM imidazole, 1 mM DTT, 8 M urea, protease inhibitor cocktail), sonicated, and spun, and the lysates were incubated with Ni-NTA agarose resin (Qiagen). After washing, proteins were eluted with lysis buffer containing 300 mM imidazole and dialyzed overnight at 4°C against minimal buffer (20 mM Tris pH 8.0, 50 mM NaCl, 1 mM DTT). The following day, the proteins were purified on a UNO S1 column (Bio-Rad) and dialyzed overnight at 4°C against storage buffer (20 mM Tris pH 8.0, 50 mM NaCl, 1 mM DTT, 10% glycerol v/v, 20 μM ZnSO4). The proteins were then concentrated using Amicon Ultra-15 centrifugal filter units with a 3K MWCO (Millipore), aliquoted, flash frozen, and stored at -80°C.
Fluorescence polarization experiments were performed as described (Jameson and Seifried, 1999) in 384-well plates (Corning, 3820) using the EnVision 2102 multimode plate reader (Perkin Elmer) running version 1.13 of the EnVision Manager software. Briefly, increasing concentrations of His-ZnF23 or His-ZnF456 (0-3 μM) were titrated into a total of 20 μL assay buffer (10 mM Tris pH 8.0, 150 mM NaCl) containing 10 nM 6-FAM-labelled double-stranded oligonucleotide (Integrated DNA Technologies), whose sequences are listed in Table S3. Samples were measured at an excitation wavelength of 480 nm and an emission wavelength of 535 nm. Data were analyzed using GraphPad Prism 5 (GraphPad Software Inc), plotting mP against protein concentration and fitting the curve to a single-site binding model (hyperbola) to extract a KD value.
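For intuition about the KD extraction, the sketch below fits the single-site hyperbola with NumPy only. It is not GraphPad Prism's algorithm: because the model is linear in Bmax and the baseline once KD is fixed, it profiles KD over a log-spaced grid that is narrowed around the best value each round, solving the linear part by least squares. Including a free baseline (the mP of unbound probe) is an assumption, as are the function names, search bounds and synthetic data. The hyperbolic form also assumes the 10 nM probe does not appreciably deplete free protein.

```python
import numpy as np

def one_site(conc, bmax, kd, baseline):
    """Single-site (hyperbolic) binding: baseline + bmax * [P] / (KD + [P])."""
    return baseline + bmax * conc / (kd + conc)

def _solve_linear(conc, signal, kd):
    """With KD fixed, fit (bmax, baseline) by linear least squares; return them and the SSE."""
    design = np.column_stack([conc / (kd + conc), np.ones_like(conc)])
    coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
    sse = float(np.sum((design @ coef - signal) ** 2))
    return coef, sse

def fit_kd(conc, signal, kd_min=0.1, kd_max=1e5, n=400, rounds=3):
    """Profile KD over a log-spaced grid that is narrowed around the best value each round."""
    conc = np.asarray(conc, dtype=float)
    signal = np.asarray(signal, dtype=float)
    lo, hi = np.log10(kd_min), np.log10(kd_max)
    for _ in range(rounds):
        grid = np.logspace(lo, hi, n)
        sse = [_solve_linear(conc, signal, kd)[1] for kd in grid]
        best = float(grid[int(np.argmin(sse))])
        step = (hi - lo) / (n - 1)
        lo, hi = np.log10(best) - step, np.log10(best) + step
    (bmax, baseline), _ = _solve_linear(conc, signal, best)
    return float(bmax), best, float(baseline)

# Synthetic, noise-free titration (protein in nM vs mP), for illustration only.
conc = np.array([0, 10, 30, 100, 300, 1000, 3000], dtype=float)
mp = one_site(conc, bmax=100.0, kd=150.0, baseline=50.0)
bmax, kd, baseline = fit_kd(conc, mp)
print(f"KD ≈ {kd:.0f} nM, Bmax ≈ {bmax:.0f} mP, baseline ≈ {baseline:.0f} mP")
```

With real, noisy titrations a nonlinear least-squares routine that also reports confidence intervals would be the more usual choice; the grid profile is used here only to keep the example dependency-light.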
Proteins for Octet analysis were expressed and purified similarly to those for fluorescence polarization with the following exceptions. Rosetta™(DE3) pLysS expressing 6xHis-tagged ZnF23 or ZnF456 were resuspended in binding buffer (20 mM sodium phosphate buffer pH 8.0, 500 mM NaCl, 0.1% NP40, 10 mM imidazole) and sonicated (Qsonica sonicators). Cell lysates were spun down and supernatants were run on a Ni-NTA column (MCLAB, NiNTA-300). After washing, the proteins were eluted with 300 mM imidazole, purified on a size exclusion column (GE Healthcare HiLoad 16/60 75 pg), and dialyzed overnight at 4°C against storage buffer. The proteins were then concentrated, aliquoted, flash frozen, and stored at -80°C.
Octet Red384 (ForteBio, Pall Science) was used for binding studies (Do et al., 2008). Ni-NTA biosensors (ForteBio) were used to immobilize 6×His-tagged proteins. Ni-NTA biosensors were pre-wet for 60 sec in kinetic buffer (PBS, 0.02% Tween20, 0.05% sodium azide), then immersed in ligand solution (10 μg/mL His-ZnF456 or 10 μg/mL His-ZnF23) for 180 sec, and finally immersed in kinetic buffer for 60 sec. DNA association kinetics were monitored by moving the sensors into wells containing 0.3 nM to 10 μM dsDNA (sequences in Table S3) for 180 sec, followed by dissociation in kinetic buffer for 180 sec. Throughout the kinetic assay, the sample plate was shaken at 1000 rpm. A column of biosensors without ligand was titrated with analyte and used as a parallel reference control, and a ligand-loaded biosensor without analyte was used as a baseline. Data were analyzed using GraphPad Prism 5 (GraphPad Software Inc), plotting response against DNA concentration and fitting the curve to a single-site binding model to extract a KD value.
Generation of HBG promoter-edited cells
Cas9 expression vector (Addgene plasmid ID 52962) and an sgRNA targeting sequences common to the HBG1 and HBG2 genes 115 bp upstream of the transcription start sites (TSS) were sequentially introduced into HUDEP clone 2 (HUDEP-2) cells via lentiviral transduction (Canver et al., 2015). Lentivirus packaging was carried out as described in the section “Functional Rescue Experiments”. HUDEP-2 cells were infected with Cas9 expression virus and selected with 10 μg/mL blasticidin. The sgRNA sequence was cloned into lentiGuide-Puro (Addgene plasmid ID 52963) (Sanjana et al., 2014) and transfected into HEK293T cells to generate virus. 10⁵ Cas9-expressing cells were transduced with the virus containing the sgRNA-encoding plasmid and incubated for 7-10 days under 10 μg/mL blasticidin and 1 μg/mL puromycin selection to allow for editing. These bulk cultures were plated clonally at limiting dilution and grown for approximately 14 days.
Screening of -115 Edited HUDEP-2 clones
Genomic DNA extracted from HUDEP-2 clones was amplified using primer pairs specific to HBG1 or HBG2 (Table S3). Unique identification of HBG1 vs HBG2 was achieved by designing primer sequences overlapping nucleotide variants between HBG1 and HBG2. Additional variants were present within the amplicon to ensure single gene-specific amplification. PCR was performed using the Qiagen HotStarTaq 2× master mix and the following cycling conditions: 95°C for 15 minutes; 35 cycles of 95°C for 15 seconds, 60°C for 1 minute, 72°C for 1 minute; 72°C for 10 minutes. Amplicons were purified using the QIAquick® PCR Purification Kit (Cat. No. 28106) and subjected to Sanger sequencing or amplicon sequencing.
For Sanger sequencing, sequences were aligned to a reference sequence obtained from sequencing HUDEP-2 cells. Edited alleles were identified by comparing the Sanger sequence chromatogram to the HUDEP-2 reference sequence. One clone, D3, was found to contain edited versions of all four alleles for HBG1 and HBG2 at the -115 promoter site.
For amplicon sequencing of bulk edited cells, reads were aligned with Bowtie2 to the amplicon sequences of the HBG1 or HBG2 promoters (∼1700 bp; see Table S3 for genotyping primers). Reads that did not align to either amplicon were discarded. The remaining reads were analyzed to determine the mutation type (e.g., deletion) and the deleted positions by comparison with the wild-type sequence, and were tallied to determine the proportion of each deletion variant.
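The tallying step can be sketched from each read's alignment start (SAM POS) and CIGAR string. This is a hypothetical helper, not the authors' pipeline; it assumes reads were already aligned to the amplicon reference and treats every distinct combination of deletions as a separate variant, with reads lacking deletions keyed by an empty tuple. Aligners may place a deletion within a repeat at different but equivalent positions, which this sketch does not normalize.

```python
import re
from collections import Counter

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_CONSUMES_REF = set("MDN=X")  # CIGAR operations that advance along the reference

def deletions(ref_start, cigar):
    """(reference position, length) of each deletion; ref_start is the 1-based SAM POS."""
    pos = ref_start
    found = []
    for length, op in _CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            found.append((pos, n))
        if op in _CONSUMES_REF:
            pos += n
    return tuple(found)

def deletion_variant_proportions(alignments):
    """Fraction of reads per deletion pattern; () means no deletion detected."""
    counts = Counter(deletions(pos, cigar) for pos, cigar in alignments)
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()}

# Hypothetical (POS, CIGAR) pairs against an amplicon reference.
reads = [(1, "50M13D100M"), (1, "50M13D100M"), (1, "150M"), (1, "60M1D90M")]
print(deletion_variant_proportions(reads))
# {((51, 13),): 0.5, (): 0.25, ((61, 1),): 0.25}
```

Converting amplicon coordinates to positions relative to the TSS (for example, the -115 site) would require the amplicon's offset, which is not given here.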
Validation of HbF enrichment
5 × 10⁴ cells of the single-cell-derived HUDEP-2 clone D3 were stained for HbF expression by intracellular flow cytometry (Canver et al., 2015). Briefly, cells were fixed with 0.05% glutaraldehyde in PBS, incubated at room temperature for 10 min and centrifuged at 600 × g for 5 min. Cells were then treated with 0.1% Triton in PBS with 0.1% BSA, incubated at room temperature for 5 min, and centrifuged at 600 × g for 15 min. Cells were resuspended in PBS/BSA and incubated with 2 μL of FITC-conjugated anti-Human Fetal Hemoglobin antibody in the dark at room temperature for 15 min. Cells were analyzed using an LSRII flow cytometer, recording 10,000 events per condition. In addition, RNA was obtained from clone D3 and control HUDEP-2 cells and prepared for RT-qPCR (Canver et al., 2015) using CAT and HBA1/2 as reference genes. Expression data represent the mean of at least three replicates.
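Normalization to reference genes can be illustrated with the comparative Ct approach. The text does not state the exact calculation, so this is an assumption: the sketch uses 2^-ΔCt against the arithmetic mean of the reference Ct values (here CAT and HBA1/2), which is equivalent to dividing by the geometric mean of their relative quantities, and it assumes near-100% amplification efficiency. Function names and Ct values are hypothetical.

```python
from statistics import mean

def relative_expression(ct_target, ct_refs):
    """2^-(Ct_target - mean reference Ct), assuming ~100% amplification efficiency."""
    return 2.0 ** -(ct_target - mean(ct_refs))

def fold_change(sample, control):
    """Ratio of relative expression; each argument is (Ct_target, [reference Cts])."""
    return relative_expression(*sample) / relative_expression(*control)

# Invented Ct values: (target, [CAT, HBA1/2]) for an edited clone and control cells.
edited = (20.0, [20.0, 22.0])
control = (25.0, [20.0, 22.0])
print(fold_change(edited, control))  # 32.0
```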
Chromatin Immunoprecipitation
10⁷ expansion-phase HUDEP-2 cells were collected and fixed with 1% formaldehyde for 5 minutes at room temperature. Fixation was quenched with 125 mM glycine. Cells were washed twice with ice-cold PBS, resuspended in 0.13 mL lysis buffer (50 mM Tris-HCl 8.0, 10 mM EDTA, 0.5% SDS) and sonicated in a microtube (Covaris, 520045) with a Covaris E220 ultrasonicator (Covaris). 0.12 mL sonicated chromatin was mixed with 1 mL ChIP dilution buffer (20 mM Tris-HCl 8.0, 2 mM EDTA, 1% Triton X-100, 300 mM NaCl, protease inhibitor), 20 μL Dynabeads protein G (Thermo Fisher Scientific) and 3 μg antibody (CTCF, 07-729, Millipore). After overnight rotation, the beads were washed with the following buffers: twice with RIPA150 (20 mM Tris-HCl 8.0, 1 mM EDTA, 1% Triton X-100, 150 mM NaCl, 0.1% Sodium Deoxycholate, 0.1% SDS), twice with RIPA500 (20 mM Tris-HCl 8.0, 1 mM EDTA, 1% Triton X-100, 500 mM NaCl, 0.1% Sodium Deoxycholate, 0.1% SDS), twice with LiCl buffer (10 mM Tris-HCl 8.0, 1 mM EDTA, 0.5% NP-40, 250 mM LiCl, 0.5% Sodium Deoxycholate), and twice with TE buffer (10 mM Tris-HCl 8.0, 1 mM EDTA). To elute and decrosslink, 300 μL elution buffer (50 mM Tris-HCl 8.0, 10 mM EDTA, 1% SDS, 150 mM NaCl, 0.1 mg/mL Proteinase K) was added to the beads and incubated at 65°C overnight. The eluted material was extracted with phenol chloroform, and DNA was precipitated by adding 750 μL absolute ethanol. DNA was pelleted by centrifugation at 14000 rpm for 15 minutes, washed once with 75% ethanol, then dried and dissolved in 50 μL TE buffer. The quality of ChIP was verified by real-time PCR. To construct the ChIP-seq library, we used the NEBNext® Ultra™ II DNA Library Prep Kit (NEB) according to the manufacturer's protocol. Library quality was checked with Qubit and bioanalyzer. The library was sequenced on the NextSeq 500 platform with the NextSeq® 500/550 High Output Kit v2 (75 cycles). Paired-end sequencing was performed (2×42 bp, 6 bp index).
CUT&RUN
CUT&RUN experiments were carried out as described (Skene and Henikoff, 2017) with modifications. Briefly, nuclei from 2 × 10⁶ cells were isolated with NE buffer (20 mM HEPES-KOH, pH 7.9, 10 mM KCl, 0.5 mM Spermidine, 0.1% Triton X-100, 20% Glycerol and 1× protease inhibitor cocktails from Sigma), captured with BioMag®Plus Concanavalin A (Polysciences) and incubated with primary antibody for 2 hours. After unbound antibody was washed away with wash buffer (20 mM HEPES-NaOH pH 7.5, 150 mM NaCl, 0.5 mM Spermidine, 0.1% BSA and 1× protease inhibitor cocktails from Sigma), protein A-MNase was added at a 1:1000 ratio and incubated for 1 hour. The nuclei were washed again and placed in a 0°C metal block. To activate protein A-MNase, CaCl2 was added to a final concentration of 2 mM. The reaction was carried out for varying lengths of time and stopped by addition of an equal volume of 2×STOP buffer (200 mM NaCl, 20 mM EDTA, 4 mM EGTA, 50 μg/mL RNase A and 40 μg/mL glycogen). The released protein-DNA complexes were separated by centrifugation and then digested with proteinase K at 50°C overnight. DNA was extracted by ethanol precipitation, followed by quality control with a Qubit fluorometer and bioanalyzer. Protein A-MNase (batch 5) was kindly provided by Dr. Steve Henikoff. The antibodies used were: GATA1, ab11852, Abcam; CTCF, 07-729, Millipore; BCL11A, ab191401, Abcam; normal rabbit IgG, 12-370, Millipore.
Library preparation and sequencing for CUT&RUN
To construct the CUT&RUN DNA library for sequencing on an Illumina platform, we modified the protocol of the NEBNext® Ultra™ II DNA Library Prep Kit, NEB (Ipswich, USA), aiming to preserve short DNA fragments (30-80 bp). Briefly, the dA-tailing temperature was decreased to 50°C to avoid DNA melting, and the reaction time was increased to 1 hour to compensate for lower enzymatic activity. After adapter ligation, 2.1× volume of AMPure XP beads was added to the reaction to ensure high recovery efficiency of short fragments. After 12 cycles of PCR amplification, the reaction was cleaned up with 1.2× volume of AMPure XP beads. Sixteen to 24 barcoded libraries were quantified and mixed at an equal molar ratio. PCR dimers were removed by Pippin Prep size selection according to the manufacturer's manual, and the mixed library was denatured according to the standard protocol from Illumina. 1.3 mL of 1.8 pM diluted library was loaded onto a NextSeq® 500/550 High Output Kit v2 (75 cycles) and sequenced on the NextSeq 500 platform. To enable determination of fragment length, paired-end sequencing was performed (2×42 bp, 6 bp index).
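The equimolar pooling step can be sketched with the standard conversion from mass concentration to molarity, using about 660 g/mol per base pair of double-stranded DNA. The pooling calculation itself is not given in the text, so the helper names, the amount per library and the example inputs are assumptions; concentrations would come from Qubit (ng/μL) and mean library sizes (including adapters) from the bioanalyzer.

```python
def library_nM(conc_ng_per_ul, mean_fragment_bp):
    """dsDNA molarity (nM) = ng/µL × 10^6 / (660 g/mol per bp × mean length in bp)."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)

def equimolar_volumes(libraries, fmol_each):
    """µL of each library that delivers fmol_each femtomoles (1 nM = 1 fmol/µL).

    libraries: {name: (concentration in ng/µL, mean fragment size in bp)}."""
    return {name: fmol_each / library_nM(conc, size)
            for name, (conc, size) in libraries.items()}

# Invented Qubit/bioanalyzer readings for two barcoded libraries.
libs = {"lib01": (3.3, 250), "lib02": (6.6, 250)}
print(equimolar_volumes(libs, fmol_each=20.0))  # about 1.0 µL and 0.5 µL

# Final loading from the text: 1.3 mL at 1.8 pM.
loaded_fmol = 1.3e-3 * 1.8e-12 * 1e15  # litres × mol/L, converted to fmol
print(f"{loaded_fmol:.2f} fmol loaded")  # 2.34 fmol loaded
```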
CUT&RUN data processing
Paired-end sequencing reads were trimmed with Trimmomatic (Bolger et al., 2014) to remove adapter sequences from the 3′ end of each read using parameters ILLUMINACLIP: 2:15:4:4:true SLIDINGWINDOW:4:15 MINLEN:25. Read pairs in which both mates were 25 bp or longer after trimming were kept. Reads were then aligned with Bowtie2 to the reference human hg19 assembly (Schatz and Langmead, 2013) with settings --end-to-end, --dovetail, --phred33. The --dovetail setting treats mates that overlap or extend past each other, which usually occurs when the fragment length is less than the read length, as a concordant alignment. The resulting alignments, recorded in a BAM file, were sorted, indexed, and marked for duplicates with the Picard (http://broadinstitute.github.io/picard) MarkDuplicates function. Afterward, the BAM file was filtered with SAMtools (Badis et al., 2009) to retain properly paired reads and discard unmapped reads or mates and PCR/optical duplicates (-f 3 -F 4 -F 8 -F 1024). Fragments shorter than 120 bp were kept (Skene and Henikoff, 2017).
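The read-level filter can be sketched with the SAM FLAG bits named above plus the 120 bp fragment cutoff applied to the template length (TLEN, SAM column 9). This illustrates the logic rather than the exact command sequence used; the function name is hypothetical, and reading fragment size from TLEN is an assumption.

```python
# SAM FLAG bits (from the SAM format specification).
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
DUPLICATE = 0x400

REQUIRED = PAIRED | PROPER_PAIR                  # bits required by -f 3
EXCLUDED = UNMAPPED | MATE_UNMAPPED | DUPLICATE  # bits 4, 8 and 1024 listed for exclusion

def keep_alignment(flag, tlen, max_fragment=120):
    """True if a paired alignment passes the flag filters and its fragment is shorter than max_fragment.

    TLEN is signed (negative for the downstream mate), so its absolute value is the fragment size."""
    if (flag & REQUIRED) != REQUIRED:
        return False
    if flag & EXCLUDED:
        return False
    return 0 < abs(tlen) < max_fragment

for flag, tlen in [(99, 80), (147, -80), (99, 150), (99 | DUPLICATE, 80), (97, 80)]:
    print(flag, tlen, keep_alignment(flag, tlen))
```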
Peak calling and overlap with ChIP-seq
MACS (Zhang et al., 2008) version 2.1 was used to call peaks from the BAM file with the narrowPeak setting, a P-value cutoff of 1e-5, and --SPMR, which normalizes signal intensity to facilitate comparison between tracks. To compare CUT&RUN with ChIP-seq, we subsampled each experiment to 18 million read pairs and called peaks with the default narrowPeak setting in MACS2. We then computed the Jaccard overlap coefficient among the top 15,000 peaks called per experiment. To evaluate the signal-to-noise ratio of GATA1 CUT&RUN and GATA1 ChIP-seq, we computed the fraction of total reads located in the top genomic bins (500 bp each) with the highest read coverage, using the plotFingerprint tool with default settings (Ramirez et al., 2016).
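As an illustration, a base-pair Jaccard index between two peak sets is the length of their intersection divided by the length of their union. The text does not say whether overlap was measured in base pairs or in peak counts, nor how the top 15,000 peaks were ranked, so both are assumptions here; peaks are half-open BED intervals (chrom, start, end), and the function names are hypothetical.

```python
from collections import defaultdict

def merge_intervals(intervals):
    """Group half-open BED intervals by chromosome and merge overlapping ones."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged

def jaccard(a, b):
    """Base-pair Jaccard index |A ∩ B| / |A ∪ B| over merged intervals."""
    ma, mb = merge_intervals(a), merge_intervals(b)
    intersection = 0
    for chrom in ma.keys() & mb.keys():
        x, y, i, j = ma[chrom], mb[chrom], 0, 0
        while i < len(x) and j < len(y):  # sweep both sorted interval lists
            overlap = min(x[i][1], y[j][1]) - max(x[i][0], y[j][0])
            if overlap > 0:
                intersection += overlap
            if x[i][1] < y[j][1]:
                i += 1
            else:
                j += 1
    total = sum(e - s for m in (ma, mb) for spans in m.values() for s, e in spans)
    union = total - intersection
    return intersection / union if union else 0.0

print(jaccard([("chr1", 0, 100)], [("chr1", 50, 150)]))  # 50 bp shared out of 150 bp
```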
De novo motif discovery from peaks
To discover possible enriched sequences among the BCL11A CUT&RUN peaks, we performed de novo motif discovery with the MEME Suite (Bailey and Elkan, 1994; Machanick and Bailey, 2011). Specifically, sequences from -100 bp to +100 bp around the summit of each peak called by MACS were analyzed for enrichment. Top enriched motifs (up to 6 motifs) were collected from running DREME (Bailey, 2011) and MEME on each batch of sequences.
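Extracting the summit-centered windows can be sketched from the narrowPeak format, whose tenth column is the summit offset relative to the peak start. The function name and record are hypothetical; the resulting BED-style windows could then be converted to sequences (for example with bedtools getfasta) before running MEME and DREME.

```python
def summit_windows(narrowpeak_lines, flank=100):
    """Yield (chrom, start, end) half-open windows of ±flank bp around each peak summit.

    narrowPeak columns: chrom, start, end, name, score, strand, signalValue,
    pValue, qValue, peak (summit offset from start)."""
    for line in narrowpeak_lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, offset = fields[0], int(fields[1]), int(fields[9])
        summit = start + offset
        yield chrom, max(0, summit - flank), summit + flank  # clip at chromosome start

# Hypothetical narrowPeak record.
record = "chr11\t5000\t5400\tpeak_1\t100\t.\t5.0\t10.0\t8.0\t150"
print(list(summit_windows([record])))  # [('chr11', 5050, 5250)]
```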
Footprint detection and motif discovery within footprints
A footprint is a signature cut pattern left by the MNase enzyme as it cuts around regions bound by a TF. A footprint has a characteristically low number of cuts within a core region containing the consensus motif, which is protected from enzyme cuts, and a high number of cuts in the two regions flanking the core. To detect footprints, we first enumerated all fragment ends to determine the precise cut locations of the enzyme, using BEDtools (Quinlan, 2014) bamtobed with the -bedpe parameter. We then computed a simple Footprint Occupancy Score (FOS) (Neph et al., 2012) to detect footprints. Significant footprints were next analyzed for enriched motif sequences with the MEME discovery tool. Overall, this analysis identified CUT&RUN binding sites that met two important criteria: 1) presence of a consensus motif, and 2) location within a footprint region (i.e., protection from digestion).
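The FOS calculation can be sketched as below, following the form FOS = (C + 1)/L + (C + 1)/R from Neph et al. (2012), where C, L and R are mean cut counts in the core and in the left and right flanks; lower scores indicate stronger protection. Treat this as an outline to be checked against that paper: core and flank lengths are free parameters here, and scoring each fragment's first and last base as cut sites is an assumption about how fragment ends map to MNase cuts. Function names are hypothetical.

```python
import numpy as np

def cut_profile(fragments, region_start, region_end):
    """Per-base cut counts in [region_start, region_end).

    fragments: (start, end) half-open fragment coordinates, e.g. the outer
    coordinates of each BEDPE record. Assumption: each fragment contributes
    two cuts, at its first base (start) and its last base (end - 1)."""
    counts = np.zeros(region_end - region_start, dtype=int)
    for start, end in fragments:
        for cut in (start, end - 1):
            if region_start <= cut < region_end:
                counts[cut - region_start] += 1
    return counts

def footprint_occupancy_score(counts, core_start, core_len, flank_len):
    """FOS = (C + 1) / L + (C + 1) / R, with indices relative to the start of counts."""
    left_start = core_start - flank_len
    right_end = core_start + core_len + flank_len
    if left_start < 0 or right_end > len(counts):
        raise ValueError("core plus flanks must lie inside the profile")
    c = counts[core_start:core_start + core_len].mean()
    l = counts[left_start:core_start].mean()
    r = counts[core_start + core_len:right_end].mean()
    if l <= c or r <= c:
        return float("inf")  # no footprint shape: flanks are not above the core
    return float((c + 1) / l + (c + 1) / r)

# Toy profile: a protected 3-base core between two well-cut 3-base flanks.
toy = np.array([5, 5, 5, 0, 0, 0, 5, 5, 5])
print(footprint_occupancy_score(toy, core_start=3, core_len=3, flank_len=3))  # 0.4
```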
Single-locus footprinting, and global footprinting analysis targeted toward TGACCA motif
We produced a single-base-resolution footprinting profile for the β-globin region. To do so, we first enumerated all fragment ends as above and then combined cuts from all CD34+ BCL11A CUT&RUN experiments to produce an aggregate footprinting profile.
To fit a binding probability model for BCL11A, we relied on the motif obtained by PBM and on the PWM determined from motif analysis within footprints. The consensus motif sequence pointed to TGACCA. With this consensus, motif-targeted footprinting followed a multistep procedure. 1) We first used FIMO (Grant et al., 2011) to find all occurrences of TGACCA within peak regions of the genome. 2) For all occurrences of TGACCA, we constructed a cut frequency matrix with make_cut_matrix.py (https://github.com/Parkerlab/atactk), where the rows of this matrix were occurrences of TGACCA and the columns were the cuts at each base of the 206-base motif-centered region (100-base flanks on each side, 6-base motif). 3) We used CENTIPEDE (Pique-Regi et al., 2011) with default parameters to learn a binding model from the observed cut frequency matrix. This method assumes that sites bound by a TF differ from unbound sites in features such as their cut frequencies. It uses an unsupervised Bayesian mixture model to classify each site as bound or unbound by the TF and outputs a probability value. Using this approach, we quantified the binding probability for the HBG1/2 promoters at the distal and proximal motif locations independently.
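Steps 1 and 2 can be sketched as follows. This is not FIMO or make_cut_matrix.py; it uses an exact regular-expression search for TGACCA and its reverse complement TGGTCA and builds a 206-column matrix (100 + 6 + 100) from a per-base cut-count array, reversing minus-strand rows so all rows share the motif's orientation. That orientation choice, the skipping of sites near sequence ends and the function names are assumptions; the result is the kind of site-by-position count matrix CENTIPEDE models, but the model fitting itself (done in R) is not reproduced here.

```python
import re
import numpy as np

def motif_sites(seq, motif="TGACCA"):
    """0-based starts and strands of motif matches (overlaps allowed) on both strands."""
    rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]  # TGGTCA for TGACCA
    sites = [(m.start(), "+") for m in re.finditer(f"(?={motif})", seq)]
    if rc != motif:
        sites += [(m.start(), "-") for m in re.finditer(f"(?={rc})", seq)]
    return sorted(sites)

def cut_matrix(cut_counts, sites, motif_len=6, flank=100):
    """One row per motif occurrence; columns are per-base cut counts over
    flank + motif + flank bases. Sites too close to the ends are skipped, and
    minus-strand rows are reversed so every row runs 5'->3' along the motif."""
    width = 2 * flank + motif_len
    rows = []
    for start, strand in sites:
        lo, hi = start - flank, start + motif_len + flank
        if lo < 0 or hi > len(cut_counts):
            continue
        row = np.asarray(cut_counts[lo:hi])
        rows.append(row[::-1] if strand == "-" else row)
    return np.vstack(rows) if rows else np.zeros((0, width), dtype=int)

# Toy sequence with one TGACCA flanked by 100 bases on each side.
seq = "A" * 100 + "TGACCA" + "A" * 100
matrix = cut_matrix(np.arange(len(seq)), motif_sites(seq))
print(motif_sites(seq), matrix.shape)  # [(100, '+')] (1, 206)
```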
Because the HBG1 and HBG2 promoters are duplicates of each other, we summed the cut frequencies from both regions and computed one binding log odds value for the HBG1/2 promoters. For the combined CD34+ experiments, reads were subsampled to 40 million before building a model from the combined experiment, and we report log-odds scores from this combined model.
Chromosome conformation capture analysis
Chromosome conformation capture (3C) was performed as described previously (Xu et al., 2010). 10⁷ HUDEP-2 and derived cells at day 5 of erythroid differentiation were crosslinked with 2% formaldehyde for 10 minutes at room temperature. The cells were lysed for 15 minutes with ice-cold lysis buffer (10 mM Tris-HCl pH 8.0, 10 mM NaCl, 0.2% NP-40) and resuspended in 0.5 mL 1.2× NEB CutSmart® buffer. 10 μL 10% SDS was added to the nuclei, which were then shaken for 20 minutes at 65°C. Triton X-100 was added to 2%, and nuclei were shaken for 1 hour at 37°C. Digestion was done by adding 300 U of EcoRI-HF (NEB, R3101) to the nuclei and incubating overnight at 37°C. Another 200 U of EcoRI-HF was then added, and incubation continued for 3 hours. 88 μL 10% SDS was added to the nuclei, followed by incubation at 65°C for 20 minutes. To ligate the digested chromatin, the following components were added to the nuclei and brought up to 7 mL with distilled water: 0.7 mL 10× ligation buffer (NEB), 350 μL 20% Triton X-100, and 70 U T4 DNA ligase (Thermo Fisher, EL0011). The reaction was carried out at 16°C for 4 hours and at room temperature for 30 minutes. The DNA was purified by phenol extraction and isopropanol precipitation. Quantitative PCR was performed with iQ SYBR Green mix (Bio-Rad). Samples were normalized to an independent ERCC3 locus. Primers are listed in Table S3.
ATAC-seq
ATAC-seq was performed with the Illumina Nextera DNA Preparation kit (Illumina, FC-121-1030) as previously described (Buenrostro et al., 2015). 50,000 expansion-phase HUDEP-2 or derived cells were collected and permeabilized for 10 minutes with 50 μL ice-cold lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). The transposition reaction was carried out at 37°C for 60 minutes in a 50 μL volume containing 25 μL 2× TD buffer and 2.5 μL transposase enzyme (included in the Illumina Nextera DNA Preparation kit). DNA was purified with the Qiagen MinElute PCR purification kit. Libraries were amplified with NEBNext Ultra II Q5® Master Mix (included in the NEBNext® Ultra™ II DNA Library Prep Kit) using the following program: 98°C for 30 s, 98°C for 10 s, 63°C for 30 s, 72°C for 1 min, steps 2-4 repeated for another 7 cycles, hold at 4°C. The resulting libraries were purified using the Qiagen MinElute PCR purification kit, quantified with a Qubit fluorometer and a Bioanalyzer, and sequenced on a NextSeq 500 platform as paired-end 2 × 41 bp reads with an 8-bp index.
ATAC-seq data analysis
ATAC-seq paired-end reads were trimmed with Trimmomatic and aligned to hg19 with Bowtie2. Reads from promoter-edited samples were aligned to a custom hg19 genome carrying a 1-bp deletion in the HBG1 promoter and a 13-bp deletion in the HBG2 promoter (exact locations shown in Figure S6A). Read pileups were generated with MACS2 using the -B and --SPMR parameters and visualized with IGV.
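Building the custom reference amounts to removing the edited bases from the promoter sequences. The minimal sketch below applies (start, length) deletions to a sequence string; the toy sequence and coordinates are placeholders, since the real positions are given in Figure S6A and are not reproduced here, and the helper name is hypothetical. Every base downstream of a deletion shifts by the deleted length, so coordinates on the custom contig no longer match hg19 one-to-one.

```python
def apply_deletions(seq, deletions):
    """Return `seq` with each (start, length) deletion removed.

    `start` is a 0-based offset into the ORIGINAL sequence; deletions are
    applied in coordinate order and must not overlap.
    """
    pieces, pos = [], 0
    for start, length in sorted(deletions):
        if start < pos:
            raise ValueError(f"deletion at {start} overlaps a previous one")
        pieces.append(seq[pos:start])  # keep bases before this deletion
        pos = start + length           # skip the deleted bases
    pieces.append(seq[pos:])           # keep the remaining tail
    return "".join(pieces)
```

Applying a 1-bp and a 13-bp deletion in this way to the respective promoter sequences, then writing the result to FASTA and indexing it for Bowtie2, would yield the kind of mutant reference described above.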
Quantification and Statistical Analysis
Sample size, mean, and P values are indicated in the text, figure legends, or Method Details. Error bars represent the standard error of the mean (SEM) from independent experiments or independent samples. Statistical analyses were performed using GraphPad Prism or reported by the relevant computational tools. Detailed information about statistical methods is specified in the figure legends or Method Details.
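For reference, the SEM used for error bars is the sample standard deviation divided by the square root of the number of independent experiments or samples. A minimal sketch with an illustrative helper name:

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM needs at least two independent values")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)
```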
Data and Software Availability
All raw and processed CUT&RUN, ChIP-seq and ATAC-seq data have been deposited in the NCBI Gene Expression Omnibus under accession number GEO: GSE104676.
PBM data have been deposited in UniPROBE (http://the_brain.bwh.harvard.edu/uniprobe/) under accession number: LIU18A.
Supplementary Material
Figure S1. Functional rescue analysis in BCL11A exon knockout cells, Related to Figure 1
(A) Schematic of human BCL11A isoforms.
(B) Schematic of TALEN-mediated deletion of BCL11A exon 1 (E1D) or exons 1 to 5 (E15D) in murine MEL cells.
(C) Representative Sanger sequencing chromatograms of BCL11A exon knockout clones.
(D) Western blot analysis confirmed the complete absence of BCL11A protein expression in exon knockout cells.
(E) RT-qPCR analysis confirmed the loss of BCL11A mRNA expression relative to the unmodified MEL or Enh-KO cells.
(F) Complete knockout of BCL11A expression led to significant derepression of the εy- and βh1-globin genes. The mRNA expression of εy, βh1 and βmajor is shown as the percentage of total β-like globin expression.
(G) Ectopic expression of BCL11A-XL but not the L isoform restored the stable repression of εy- and βh1-globin genes in multiple independent BCL11A exon knockout MEL cell lines.
(H) Expression of full-length BCL11A-XL, but not domain mutants lacking the NuRD-interacting domain, ZnF23 or ZnF456, restored repression of βh1-globin. Expression of βmajor-globin remained largely unaffected. Each circle denotes an independent single-cell-derived stable cell clone. Results are mean ± SEM of multiple independent clones and analyzed by one-way ANOVA with repeated measures. *P < 0.05.
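Panel F reports each gene as a percentage of total β-like globin. Assuming the RT-qPCR values have already been converted to relative expression levels on a comparable scale (for example via 2^-ΔCt with similar primer efficiencies, which is an assumption here), the percentage is a simple ratio, sketched below with a hypothetical helper.

```python
def percent_of_total(levels):
    """Express each gene's level as a percentage of the summed β-like globin levels."""
    total = sum(levels.values())
    if total <= 0:
        raise ValueError("levels must sum to a positive value")
    return {gene: 100.0 * value / total for gene, value in levels.items()}
```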
Figure S2. Optimization and application of CUT&RUN, Related to Figure 4
(A) Heat map comparison of overlapping peaks between ChIP-seq and CUT&RUN (see Methods).
(B) Left, fragment length distributions of the original and modified CUT&RUN protocols. Fragment ends were enumerated and used to calculate fragment length. Right, signal-to-noise, measured with plotFingerprint as the cumulative number of reads in randomly sampled 500-bp bins ranked by coverage. A steep rise at the right of the plot indicates better signal enrichment (see Methods).
(C) Western blot showing the specificity of BCL11A antibody.
(D) Protein levels of BCL11A and GATA1 in expansion-phase HUDEP-2 and differentiating CD34+ cells, analyzed by Western blot of whole-cell lysates. Histone H3 was used as a loading control.
(E) mRNA level of β-like globin genes in differentiating CD34+ cells analyzed by RT-qPCR. Results are shown as mean ± SEM of three experiments.
(F) Experimental design of BCL11A CUT&RUN in HUDEP-2 cells and differentiating CD34+ cells.
(G) Heat maps showing the BCL11A CUT&RUN peaks in HUDEP-2, BCL11A KO HUDEP-2 cells, and in CD34+ cells.
(H) Peak numbers of all BCL11A CUT&RUN. The fraction of peaks that contain the motif is shown in dark blue.
(I) Pairwise overlap of peaks for all BCL11A CUT&RUN experiments. Peaks were called by MACS2 with the narrowPeak setting, and two peaks were considered overlapping if they shared at least 1 bp.
(J) Peak distribution of BCL11A CUT&RUN. Data for 60 min or 30 min protein A-MNase digestion are shown for HUDEP-2 and CD34+ cells, respectively.
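Two of the calculations in this figure can be sketched in a few lines. The fingerprint function mimics the idea behind deepTools plotFingerprint (bins sorted by coverage, cumulative fraction of reads), and the overlap test encodes the 1-bp criterion for half-open, BED-style intervals. These are illustrative re-implementations with hypothetical names, not deepTools or BEDTools code; for genome-scale peak sets, `bedtools intersect` is the practical tool.

```python
def fingerprint_curve(bin_counts):
    """Cumulative fraction of reads over bins sorted by ascending coverage.

    An enriched sample concentrates reads in few bins, so the curve stays
    low and rises steeply at the right.
    """
    ordered = sorted(bin_counts)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no reads in the sampled bins")
    curve, running = [], 0
    for count in ordered:
        running += count
        curve.append(running / total)
    return curve


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in A overlapping any peak in B (O(n*m), sketch only)."""
    hits = sum(any(overlaps(a, b) for b in peaks_b) for a in peaks_a)
    return hits / len(peaks_a)
```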
Figure S3. Motif discovery for BCL11A CUT&RUN, Related to Figure 4
(A) Motifs discovered in HUDEP-2 and each stage of CD34+ cells. Each motif and its position and likely binding factor are shown. Data for 60 min or 30 min protein A-MNase digestion are shown for HUDEP-2 and CD34+ cells, respectively. Results of other cut times were similar and not shown. E-values shown in upper right were reported by MEME.
(B,C) Comparison of all combinations of TG(A/G)CC(A/C/T) in BCL11A CUT&RUN peaks in HUDEP-2 (B) or CD34+ cells (C). Sequences are ranked by −log10(E-value), where the E-value is the significance estimate reported by MEME. The column “Ratio” shows the proportion of peaks that contain the corresponding sequence. The datasets used are the 60 min cut in HUDEP-2 and the 30 min cut in CD34+ cells.
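The six candidate sequences matched by TG(A/G)CC(A/C/T), and the “Ratio” column as described (the fraction of peaks containing a sequence), can be sketched as follows. Counting a peak when either strand contains the k-mer is an assumption of this sketch, as are the helper names; E-values come from MEME and are not recomputed here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_kmers():
    """All six sequences matching TG(A/G)CC(A/C/T)."""
    return ["TG" + a + "CC" + b for a, b in product("AG", "ACT")]


def peak_ratio(kmer, peak_seqs):
    """Fraction of peak sequences containing `kmer` on either strand."""
    rc = revcomp(kmer)
    hits = sum(1 for s in peak_seqs if kmer in s.upper() or rc in s.upper())
    return hits / len(peak_seqs)
```

Under this two-strand convention, TGACCA is also counted when a peak contains its reverse complement TGGTCA.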
Figure S4. Footprint analysis for BCL11A CUT&RUN, Related to Figure 4
(A) Targeted motif footprint analysis for BCL11A CUT&RUN in CD34+ cell experiments. Cut probability at each base of TGACCA and its flanking sequence is plotted.
(B,C) Targeted motif footprints in BCL11A CUT&RUN for control sequences in HUDEP-2 (B) or CD34+ cells (C). Cut probabilities at each base surrounding the indicated motifs are plotted.
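The aggregate cut-probability profiles in these panels can be illustrated with a small sketch: build a motif-centered matrix of cut counts (one row per motif occurrence), then normalize the column sums to probabilities. Reversing rows for minus-strand motif occurrences, the dictionary-of-cut-positions input, and the function names are assumptions; make_cut_matrix.py may organize its output differently.

```python
def build_cut_matrix(cut_counts, sites, flank, motif_len):
    """One row of cut counts per motif occurrence on a single chromosome.

    cut_counts: dict mapping 0-based position -> number of cuts there.
    sites: list of (motif_start, strand) with strand "+" or "-".
    Minus-strand rows are reversed so every row reads 5'->3' along the motif.
    """
    matrix = []
    for start, strand in sites:
        window = range(start - flank, start + motif_len + flank)
        row = [cut_counts.get(pos, 0) for pos in window]
        if strand == "-":
            row.reverse()
        matrix.append(row)
    return matrix


def aggregate_cut_probability(cut_matrix):
    """Column sums of the cut matrix normalized to sum to 1."""
    col_sums = [sum(col) for col in zip(*cut_matrix)]
    total = sum(col_sums)
    return [c / total for c in col_sums]
```

With flank=100 and motif_len=6 this produces the 206-column layout used for the TGACCA footprints.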
Figure S5. High-resolution CUT&RUN profiles in α-globin region, Related to Figure 5
(A) CUT&RUN profiles in α-globin cluster. Antibodies and cell types for each track are shown on the right. The promoter of ζ-globin gene (HBZ) is highlighted in pink.
(B) Left, BCL11A binding at the HBZ promoter across multiple CUT&RUN experiments. Right, zoomed-in view of 180 bp of the HBZ promoter region. The TGACCA motif is highlighted in green.
(C) Single-locus footprint analysis showing the cut frequency at each nucleotide of the HBZ region (from −251 to −179 relative to the TSS). Fourteen CUT&RUN experiments in CD34+ cells were combined for this analysis.
Figure S6. CUT&RUN in γ-globin promoter edited cells, Related to Figure 6
(A) Part of the mutant reference genome containing two mutant motifs that correspond to two alleles in clone D3.
(B) Chromosome conformation capture (3C) assay in wild-type and γ-globin promoter edited cells. Results are shown as mean ± SEM of three experiments.
(C) Three biological replicates of BCL11A CUT&RUN showing peaks in the γ-globin gene region in wild-type HUDEP-2 cells and clone D3. Reads from D3 were mapped to the mutant genome. Note that in the mutant genome, HBG1 carries the ΔC allele and HBG2 carries the Δ13bp allele. Thus, reads from the Δ1bp alleles (ΔA + ΔC + ΔC) map to HBG1, and reads from the Δ13bp allele map to HBG2.
(D) Zoomed-in view showing the BCL11A CUT&RUN peaks on the HBG1 promoter.
(E) ATAC-seq in wild-type, BCL11A knockout and γ-globin promoter edited cells.
(F) Left, RT-qPCR analysis of γ-globin and β-globin mRNA levels in the indicated cells. Right, percentages of β- or γ-globin mRNA. Results are shown as mean ± SEM of three experiments.
(G) Single locus footprint showing the γ-globin promoter region in HUDEP-2 and bulk edited cells. Bottom, sequence and percentage of each edited allele in bulk edited cells, determined by amplicon sequencing.
Highlights.
BCL11A recognizes the TGACCA motif in vitro and in vivo through a zinc-finger domain.
BCL11A acts at the distal TGACCA motif in human γ-globin promoters to silence expression.
The TGACCA motif is mutated in individuals with HPFH syndrome.
Editing the distal TGACCA motif prevents BCL11A binding and reactivates γ-globin expression.
Acknowledgments
We thank Peter Skene and Steven Henikoff for advice on protocols for CUT&RUN, Birgit Knoechel for assistance with DNA sequencing, and Paul Bruno for improvements in protein purification. We appreciate the generosity of Merlin Crossley for sharing findings during the evolution of these studies. This work was supported by grants HG003985 (M.L.B.), a Doris Duke Charitable Foundation Innovations in Clinical Research Award, Cooley's Anemia Foundation Fellowship and R03 DK109232 (D.E.B.), K01 DK093543 (J.X.), R01HL119099 (G.C.Y.) and R01 HL032259 (S.H.O.). S.H.O. is an Investigator of HHMI.
Footnotes
Author Contributions. S.H.O. conceived this study. J.X. designed the functional rescue assay, and V.V.H. and J.X. executed the experiments. M.L.B. conceived and supervised the PBM experiments, and V.V.H., J.V.K., and J.R. executed experiments and analyzed data. V.V.H., J.H., and W.K. expressed and purified recombinant BCL11A protein and performed fluorescence polarization and Octet experiments. Fluorescence polarization experiments were performed at the Institute for Chemistry and Cell Biology-Longwood Screening Facility at HMS. Octet experiments were performed at the Center for Macromolecular Interactions in the Department of Biological Chemistry and Molecular Pharmacology at HMS. R.K. and Y.N. provided the HUDEP-2 cell line. N.L. designed and executed CUT&RUN experiments, and conducted ChIP-seq, ATAC-seq, and chromosome conformation capture experiments. Q.Z. and N.L. designed and performed computational analyses, and interpreted the results of CUT&RUN. G.C.Y. and S.H.O. provided feedback. F.S., C.M.T., and D.E.B. generated promoter-edited HUDEP-2 cells. S.H.O., V.V.H., N.L., and Q.Z. wrote the manuscript with input from all authors.
Declaration of Interests. The authors have no competing interests related to this work.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
- Ai S, Peng Y, Li C, Gu F, Yu X, Yue Y, Ma Q, Chen J, Lin Z, Zhou P, et al. EED orchestration of heart maturation through interaction with HDACs is H3K27me3-independent. eLife. 2017;6:e24570. doi: 10.7554/eLife.24570.
- Albu DI, Feng D, Bhattacharya D, Jenkins NA, Copeland NG, Liu P, Avram D. BCL11B is required for positive selection and survival of double-positive thymocytes. J Exp Med. 2007;204:3003–3015. doi: 10.1084/jem.20070863.
- Avram D, Fields A, Senawong T, Topark-Ngarm A, Leid M. COUP-TF (chicken ovalbumin upstream promoter transcription factor)-interacting protein 1 (CTIP1) is a sequence-specific DNA binding protein. Biochem J. 2002;368:555–563. doi: 10.1042/BJ20020496.
- Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA, Chan ET, Metzler G, Vedenko A, Chen X, et al. Diversity and complexity in DNA recognition by transcription factors. Science. 2009;324:1720–1723. doi: 10.1126/science.1162327.
- Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–1659. doi: 10.1093/bioinformatics/btr261.
- Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2:28–36.
- Basak A, Hancarova M, Ulirsch JC, Balci TB, Trkova M, Pelisek M, Vlckova M, Muzikova K, Cermak J, Trka J, et al. BCL11A deletions result in fetal hemoglobin persistence and neurodevelopmental alterations. J Clin Invest. 2015;125:2363–2368. doi: 10.1172/JCI81163.
- Bauer DE, Kamran SC, Lessard S, Xu J, Fujiwara Y, Lin C, Shao Z, Canver MC, Smith EC, Pinello L, et al. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science. 2013;342:253–257. doi: 10.1126/science.1242088.
- Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L, Alleyne TM, Mnaimneh S, Botvinnik OB, Chan ET, et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell. 2008;133:1266–1276. doi: 10.1016/j.cell.2008.05.024.
- Berger MF, Bulyk ML. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat Protoc. 2009;4:393–411. doi: 10.1038/nprot.2008.195.
- Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW 3rd, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol. 2006;24:1429–1435. doi: 10.1038/nbt1246.
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol. 2015;109:21.29.1–21.29.9. doi: 10.1002/0471142727.mb2129s109.
- Canver MC, Lessard S, Pinello L, Wu Y, Ilboudo Y, Stern EN, Needleman AJ, Galacteros F, Brugnara C, Kutlar A, et al. Variant-aware saturating mutagenesis using multiple Cas9 nucleases identifies regulatory elements at trait-associated loci. Nat Genet. 2017;49:625–634. doi: 10.1038/ng.3793.
- Canver MC, Smith EC, Sher F, Pinello L, Sanjana NE, Shalem O, Chen DD, Schupp PG, Vinjamur DS, Garcia SP, et al. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature. 2015;527:192–197. doi: 10.1038/nature15521.
- Choo Y, Klug A. Physical basis of a protein-DNA recognition code. Curr Opin Struct Biol. 1997;7:117–125. doi: 10.1016/s0959-440x(97)80015-2.
- Collins FS, Metherall JE, Yamakawa M, Pan J, Weissman SM, Forget BG. A point mutation in the Aγ-globin gene promoter in Greek hereditary persistence of fetal haemoglobin. Nature. 1985;313:325–326. doi: 10.1038/313325a0.
- Concepcion J, Witte K, Wartchow C, Choo S, Yao D, Persson H, Wei J, Li P, Heidecker B, Ma W, et al. Label-free detection of biomolecular interactions using BioLayer interferometry for kinetic characterization. Comb Chem High Throughput Screen. 2009;12:791–800. doi: 10.2174/138620709789104915.
- ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247.
- Deng W, Rupon JW, Krivega I, Breda L, Motta I, Jahn KS, Reik A, Gregory PD, Rivella S, Dean A, et al. Reactivation of developmentally silenced globin genes by forced chromatin looping. Cell. 2014;158:849–860. doi: 10.1016/j.cell.2014.05.050.
- Dias C, Estruch SB, Graham SA, McRae J, Sawiak SJ, Hurst JA, Joss SK, Holder SE, Morton JE, Turner C, et al. BCL11A haploinsufficiency causes an intellectual disability syndrome and dysregulates transcription. Am J Hum Genet. 2016;99:253–274. doi: 10.1016/j.ajhg.2016.05.030.
- Do T, Ho F, Heidecker B, Witte K, Chang L, Lerner L. A rapid method for determining dynamic binding capacity of resins for the purification of proteins. Protein Expr Purif. 2008;60:147–150. doi: 10.1016/j.pep.2008.04.009.
- Gelinas R, Endlich B, Pfeiffer C, Yagi M, Stamatoyannopoulos G. G to A substitution in the distal CCAAT box of the Aγ-globin gene in Greek hereditary persistence of fetal haemoglobin. Nature. 1985;313:323–325. doi: 10.1038/313323a0.
- Gilman JG, Mishima N, Wen XJ, Stoming TA, Lobel J, Huisman TH. Distal CCAAT box deletion in the Aγ globin gene of two black adolescents with elevated fetal Aγ globin. Nucleic Acids Res. 1988;16:10635–10642. doi: 10.1093/nar/16.22.10635.
- Gnanapragasam MN, Scarsdale JN, Amaya ML, Webb HD, Desai MA, Walavalkar NM, Wang SZ, Zu Zhu S, Ginder GD, Williams DC Jr. p66α-MBD2 coiled-coil interaction and recruitment of Mi-2 are critical for globin gene silencing by the MBD2-NuRD complex. Proc Natl Acad Sci U S A. 2011;108:7487–7492. doi: 10.1073/pnas.1015341108.
- Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
- Huang J, Liu X, Li D, Shao Z, Cao H, Zhang Y, Trompouki E, Bowman TV, Zon LI, Yuan GC, et al. Dynamic control of enhancer repertoires drives lineage and stage-specific transcription during hematopoiesis. Dev Cell. 2016;36:9–23. doi: 10.1016/j.devcel.2015.12.014.
- Huang P, Keller CA, Giardine B, Grevet JD, Davies JOJ, Hughes JR, Kurita R, Nakamura Y, Hardison RC, Blobel GA. Comparative analysis of three-dimensional chromosomal architecture identifies a novel fetal hemoglobin regulatory element. Genes Dev. 2017. doi: 10.1101/gad.303461.117. ePub.
- Ippolito GC, Dekker JD, Wang YH, Lee BK, Shaffer AL 3rd, Lin J, Wall JK, Lee BS, Staudt LM, Liu YJ, et al. Dendritic cell fate is determined by BCL11A. Proc Natl Acad Sci U S A. 2014;111:E998–1006. doi: 10.1073/pnas.1319228111.
- Jameson DM, Seifried SE. Quantification of protein-protein interactions using fluorescence polarization. Methods. 1999;19:222–233. doi: 10.1006/meth.1999.0853.
- Jawaid K, Wahlberg K, Thein SL, Best S. Binding patterns of BCL11A in the globin and GATA1 loci and characterization of the BCL11A fetal hemoglobin locus. Blood Cells Mol Dis. 2010;45:140–146. doi: 10.1016/j.bcmd.2010.05.006.
- Jiang B, Liu JS, Bulyk ML. Bayesian hierarchical model of protein-binding microarray k-mer data reduces noise and identifies transcription factor subclasses and preferred k-mers. Bioinformatics. 2013;29:1390–1398. doi: 10.1093/bioinformatics/btt152.
- Kim CG, Sheffery M. Physical characterization of the purified CCAAT transcription factor, alpha-CP1. J Biol Chem. 1990;265:13362–13369.
- Kurita R, Suda N, Sudo K, Miharada K, Hiroyama T, Miyoshi H, Tani K, Nakamura Y. Establishment of immortalized human erythroid progenitor cell lines able to produce enucleated red blood cells. PLoS One. 2013;8:e59890. doi: 10.1371/journal.pone.0059890.
- Lejon S, Thong SY, Murthy A, AlQarni S, Murzina NV, Blobel GA, Laue ED, Mackay JP. Insights into association of the NuRD complex with FOG-1 from the crystal structure of an RbAp48.FOG-1 complex. J Biol Chem. 2011;286:1196–1203. doi: 10.1074/jbc.M110.195842.
- Liu H, Ippolito GC, Wall JK, Niu T, Probst L, Lee BS, Pulford K, Banham AH, Stockwin L, Shaffer AL, et al. Functional studies of BCL11A: characterization of the conserved BCL11A-XL splice variant and its interaction with BCL6 in nuclear paraspeckles of germinal center B cells. Mol Cancer. 2006;5:18. doi: 10.1186/1476-4598-5-18.
- Liu P, Keller JR, Ortiz M, Tessarollo L, Rachel RA, Nakamura T, Jenkins NA, Copeland NG. Bcl11a is essential for normal lymphoid development. Nat Immunol. 2003;4:525–532. doi: 10.1038/ni925.
- Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, Cao H, Ni M, Liu Y, Gu Z, et al. In situ capture of chromatin interactions by biotinylated dCas9. Cell. 2017;170:1028–1043. doi: 10.1016/j.cell.2017.08.003.
- Longabaugh WJR, Zeng W, Zhang JA, Hosokawa H, Jansen CS, Li L, Romero-Wolf M, Liu P, Kueh HY, Mortazavi A, et al. Bcl11b and combinatorial resolution of cell fate in the T-cell gene regulatory network. Proc Natl Acad Sci U S A. 2017;114:5800–5807. doi: 10.1073/pnas.1610617114.
- Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–1697. doi: 10.1093/bioinformatics/btr189.
- Marban C, Redel L, Suzanne S, Van Lint C, Lecestre D, Chasserot-Golaz S, Leid M, Aunis D, Schaeffer E, Rohr O. COUP-TF interacting protein 2 represses the initial phase of HIV-1 gene transcription in human microglial cells. Nucleic Acids Res. 2005;33:2318–2331. doi: 10.1093/nar/gki529. [Retracted]
- Martyn GE, Quinlan KGR, Crossley M. The regulation of human globin promoters by CCAAT box elements and the recruitment of NF-Y. Biochim Biophys Acta. 2017;1860:525–536. doi: 10.1016/j.bbagrm.2016.10.002.
- Menzel S, Garner C, Gut I, Matsuda F, Yamaguchi M, Heath S, Foglio M, Zelenika D, Boland A, Rooks H, et al. A QTL influencing F cell production maps to a gene encoding a zinc-finger protein on chromosome 2p15. Nat Genet. 2007;39:1197–1199. doi: 10.1038/ng2108.
- Nakagawa S, Gisselbrecht SS, Rogers JM, Hartl DL, Bulyk ML. DNA-binding specificity changes in the evolution of forkhead transcription factors. Proc Natl Acad Sci U S A. 2013;110:12349–12354. doi: 10.1073/pnas.1310430110.
- Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, John S, Sandstrom R, Johnson AK, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012;489:83–90. doi: 10.1038/nature11212.
- Pedone PV, Ghirlando R, Clore GM, Gronenborn AM, Felsenfeld G, Omichinski JG. The single Cys2-His2 zinc finger domain of the GAGA protein flanked by basic residues is sufficient for high-affinity specific DNA binding. Proc Natl Acad Sci U S A. 1996;93:2822–2826. doi: 10.1073/pnas.93.7.2822.
- Pique-Regi R, Degner JF, Pai AA, Gaffney DJ, Gilad Y, Pritchard JK. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21:447–455. doi: 10.1101/gr.112623.110.
- Quinlan AR. BEDTools: the Swiss-Army tool for genome feature analysis. Curr Protoc Bioinformatics. 2014;47:11.12.1–11.12.34. doi: 10.1002/0471250953.bi1112s47.
- Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, Manke T. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–165. doi: 10.1093/nar/gkw257.
- Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods. 2014;11:783–784. doi: 10.1038/nmeth.3047.
- Sankaran VG, Menne TF, Xu J, Akie TE, Lettre G, Van Handel B, Mikkola HK, Hirschhorn JN, Cantor AB, Orkin SH. Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science. 2008;322:1839–1842. doi: 10.1126/science.1165409.
- Sankaran VG, Xu J, Byron R, Greisman HA, Fisher C, Weatherall DJ, Sabath DE, Groudine M, Orkin SH, Premawardhena A, et al. A functional element necessary for fetal hemoglobin silencing. N Engl J Med. 2011;365:807–814. doi: 10.1056/NEJMoa1103070.
- Sankaran VG, Xu J, Ragoczy T, Ippolito GC, Walkley CR, Maika SD, Fujiwara Y, Ito M, Groudine M, Bender MA, et al. Developmental and species-divergent globin switching are driven by BCL11A. Nature. 2009;460:1093–1097. doi: 10.1038/nature08243.
- Schatz MC, Langmead B. The DNA data deluge: fast, efficient genome sequencing machines are spewing out more data than geneticists can analyze. IEEE Spectr. 2013;50:26–33. doi: 10.1109/MSPEC.2013.6545119.
- Senawong T, Peterson VJ, Leid M. BCL11A-dependent recruitment of SIRT1 to a promoter template in mammalian cells results in histone deacetylation and transcriptional repression. Arch Biochem Biophys. 2005;434:316–325. doi: 10.1016/j.abb.2004.10.028.
- Simon R, Brylka H, Schwegler H, Venkataramanappa S, Andratschke J, Wiegreffe C, Liu P, Fuchs E, Jenkins NA, Copeland NG, et al. A dual function of Bcl11b/Ctip2 in hippocampal neurogenesis. EMBO J. 2012;31:2922–2936. doi: 10.1038/emboj.2012.142.
- Skene PJ, Henikoff S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife. 2017;6:e21856. doi: 10.7554/eLife.21856.
- Superti-Furga G, Barberis A, Schaffner G, Busslinger M. The -117 mutation in Greek HPFH affects the binding of three nuclear factors to the CCAAT region of the gamma-globin gene. EMBO J. 1988;7:3099–3107. doi: 10.1002/j.1460-2075.1988.tb03176.x.
- Tang B, Di Lena P, Schaffer L, Head SR, Baldi P, Thomas EA. Genome-wide identification of Bcl11b gene targets reveals role in brain-derived neurotrophic factor signaling. PLoS One. 2011;6:e23691. doi: 10.1371/journal.pone.0023691.
- Uda M, Galanello R, Sanna S, Lettre G, Sankaran VG, Chen W, Usala G, Busonero F, Maschio A, Albai G, et al. Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of beta-thalassemia. Proc Natl Acad Sci U S A. 2008;105:1620–1625. doi: 10.1073/pnas.0711566105.
- Wakabayashi Y, Watanabe H, Inoue J, Takeda N, Sakata J, Mishima Y, Hitomi J, Yamamoto T, Utsuyama M, Niwa O, et al. Bcl11b is required for differentiation and survival of alphabeta T lymphocytes. Nat Immunol. 2003;4:533–539. doi: 10.1038/ni927.
- Wiles ET, Lui-Sargent B, Bell R, Lessnick SL. BCL11B is up-regulated by EWS/FLI and contributes to the transformed phenotype in Ewing sarcoma. PLoS One. 2013;8:e59369. doi: 10.1371/journal.pone.0059369.
- Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183–212. doi: 10.1146/annurev.biophys.29.1.183.
- Xu J, Bauer DE, Kerenyi MA, Vo TD, Hou S, Hsu YJ, Yao H, Trowbridge JJ, Mandel G, Orkin SH. Corepressor-dependent silencing of fetal hemoglobin expression by BCL11A. Proc Natl Acad Sci U S A. 2013;110:6518–6523. doi: 10.1073/pnas.1303976110.
- Xu J, Sankaran VG, Ni M, Menne TF, Puram RV, Kim W, Orkin SH. Transcriptional silencing of γ-globin by BCL11A involves long-range interactions and cooperation with SOX6. Genes Dev. 2010;24:783–798. doi: 10.1101/gad.1897310.
- Xu J, Shao Z, Glass K, Bauer DE, Pinello L, Van Handel B, Hou S, Stamatoyannopoulos JA, Mikkola HK, Yuan GC, et al. Combinatorial assembly of developmental stage-specific enhancers controls gene expression programs during human erythropoiesis. Dev Cell. 2012;23:796–811. doi: 10.1016/j.devcel.2012.09.003.
- Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Figure S1. Functional rescue analysis in BCL11A exon knockout cells, Related to Figure 1
(A) Schematic of human BCL11A isoforms.
(B) Schematic of TALEN-mediated deletion of BCL11A exon 1 (E1D) or exon1 to 5 (E15D) in murine MEL cells.
(C) Representative Sanger sequencing cytographs of BCL11A exon knockout clones.
(D) Western blot analysis validated the complete absence of BCL11A protein expression in exon knockout cells.
(E) RT-qPCR analysis validated the loss of BCL11A mRNA expression relative to the unmodified MEL or Enh-KO cells.
(F) Complete knockout of BCL11A expression led to significant derepression of εy- andβh1-globin genes. The mRNA expression of εy, βh1 and βmajor is shown as the % of total β-like globin genes.
(G) Ectopic expression of BCL11A-XL but not the L isoform restored the stable repression of εy- and βh1-globin genes in multiple independent BCL11A exon knockout MEL cell lines.
(H) Expression of full-length BCL11A-XL, but not domain mutants lacking the NuRD-interacting domain, ZnF23 or ZnF456, restored repression of βh1-globin. Expression of βmajor-globin remained largely unaffected. Each circle denotes an independent single-cell-derived stable cell clone. Results are mean ± SEM of multiple independent clones and analyzed by one-way ANOVA with repeated measures. *P < 0.05.
Figure S2. Optimization and application of CUT&RUN, Related to Figure 4, (A) Heat map comparison of overlapping peaks between ChIP-seq and CUT&RUN (see Methods).
(B) Left, Fragment length distributions of original and modified CUT&RUN protocols. Fragment ends were enumerated and used to calculate the fragment length. Right, Signal-to-noise measured by the total number of reads in top ranked randomly sampled bins (each 500 bp) with plotFingerprint. A steep rise to the right of the plot indicates a better signal enrichment (see Methods).
(C) Western blot showing the specificity of BCL11A antibody.
(D) Protein levels of BCL11A and GATA1 in expansion phase HUDEP-2 anddifferentiating CD34+ cells analyzed by Western blot of whole cell lysates. Histone H3 was used as loading control.
(E) mRNA level of β-like globin genes in differentiating CD34+ cells analyzed by RT-qPCR. Results are shown as mean ± SEM of three experiments.
(F) Experimental design of BCL11A CUT&RUN in HUDEP-2 cells and differentiating CD34+ cells.
(G) Heat maps showing BCL11A CUT&RUN peaks in HUDEP-2 cells, BCL11A KO HUDEP-2 cells and CD34+ cells.
(H) Number of peaks in each BCL11A CUT&RUN experiment. The fraction of peaks containing the motif is shown in dark blue.
(I) Pairwise overlap of peaks among all BCL11A CUT&RUN experiments. Peaks were called with MACS2 (narrowPeak output), and two peaks were considered overlapping if they shared at least 1 bp.
(J) Peak distribution of BCL11A CUT&RUN. Data for 60 min or 30 min protein A-MNase digestion are shown for HUDEP-2 and CD34+ cells, respectively.
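Panel I counts two peaks as overlapping when they share at least 1 bp. The sketch below shows one way to compute a pairwise overlap fraction under that rule; it assumes BED-style half-open (chrom, start, end) intervals and defines the fraction as the share of peaks in one set that overlap any peak in the other. The helper names are hypothetical; dedicated tools such as bedtools intersect implement the same overlap test.

```python
from bisect import bisect_left
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in A sharing >= 1 bp with any peak in B.

    Intervals are half-open [start, end), as in BED files.
    """
    if not peaks_a:
        return 0.0
    index = index_peaks(peaks_b)
    max_len = max((end - start for _, start, end in peaks_b), default=0)
    hits = 0
    for chrom, start, end in peaks_a:
        intervals = index.get(chrom, [])
        # A B peak starting before start - max_len must end before `start`,
        # so the scan can safely begin there.
        i = bisect_left(intervals, (start - max_len,))
        while i < len(intervals) and intervals[i][0] < end:
            if intervals[i][1] > start:  # shares at least one base
                hits += 1
                break
            i += 1
    return hits / len(peaks_a)


def pairwise_overlap(peak_sets):
    """Overlap fraction for every ordered pair of named peak sets."""
    return {(a, b): overlap_fraction(peak_sets[a], peak_sets[b])
            for a in peak_sets for b in peak_sets}


# Toy peak sets (hypothetical coordinates).
peaks = {
    "rep1": [("chr11", 100, 200), ("chr11", 500, 600)],
    "rep2": [("chr11", 199, 300)],
}
print(pairwise_overlap(peaks)[("rep1", "rep2")])  # → 0.5
```

The fraction is asymmetric: one peak in set A can overlap several peaks in set B, so A-versus-B and B-versus-A values generally differ.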
Figure S3. Motif discovery for BCL11A CUT&RUN, Related to Figure 4
(A) Motifs discovered in HUDEP-2 cells and at each stage of CD34+ cell differentiation. Each motif is shown with its position and likely binding factor. Data for 60-min and 30-min protein A-MNase digestion are shown for HUDEP-2 and CD34+ cells, respectively; results for other cut times were similar (not shown). E-values (upper right) were reported by MEME.
(B,C) Comparison of all combinations of TG(A/G)CC(A/C/T) in BCL11A CUT&RUN peaks from HUDEP-2 (B) or CD34+ cells (C). Sequences are ranked by −log10(E-value), where the E-value is MEME's significance estimate (the expected number of equally strong motifs in a similarly sized set of random sequences). The “Ratio” column shows the proportion of peaks containing the corresponding sequence. The datasets used are the 60-min cut in HUDEP-2 cells and the 30-min cut in CD34+ cells.
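The degenerate pattern TG(A/G)CC(A/C/T) expands to 2 × 3 = 6 concrete hexamers. The sketch below enumerates them and computes a “Ratio”-style value (the fraction of peak sequences containing each hexamer). It assumes peak sequences are plain strings and counts a match on either strand, which the legend does not specify; MEME E-values are not reproduced here, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def expand_degenerate(positions):
    """Expand per-position alternatives, e.g. ["T", "G", "AG", ...], into sequences."""
    return ["".join(bases) for bases in product(*positions)]


def motif_ratios(peak_seqs, motifs):
    """Fraction of peak sequences containing each motif on either strand (assumed)."""
    seqs = [s.upper() for s in peak_seqs]
    ratios = {}
    for motif in motifs:
        rc = revcomp(motif)
        n = sum(1 for s in seqs if motif in s or rc in s)
        ratios[motif] = n / len(seqs) if seqs else 0.0
    return ratios


# TG(A/G)CC(A/C/T): positions 3 and 6 are degenerate, giving 2 x 3 = 6 hexamers.
variants = expand_degenerate(["T", "G", "AG", "C", "C", "ACT"])
print(variants)
# → ['TGACCA', 'TGACCC', 'TGACCT', 'TGGCCA', 'TGGCCC', 'TGGCCT']

# Toy peak sequences (hypothetical): forward hit, reverse-strand hit, no hit.
toy_peaks = ["AATGACCAGG", "CCTGGTCATT", "GGGGGGGGGG"]
print(motif_ratios(toy_peaks, ["TGACCA"]))
```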
Figure S4. Footprint analysis for BCL11A CUT&RUN, Related to Figure 4
(A) Targeted motif footprint analysis of BCL11A CUT&RUN in CD34+ cells. The cut probability at each base within and flanking the TGACCA motif is plotted.
(B,C) Targeted motif footprints of control sequences in BCL11A CUT&RUN from HUDEP-2 (B) or CD34+ cells (C). Cut probabilities at each base surrounding the indicated motifs are plotted.
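The footprint panels plot cut probability per base around motif occurrences. A minimal aggregation sketch follows, assuming cut sites (fragment ends) have already been tabulated per genomic position and that the profile is normalized to sum to 1 across the window; the legend does not describe the exact normalization, and the input layout and function name are hypothetical.

```python
from collections import Counter


def footprint_profile(cut_sites, motif_hits, motif_len, flank):
    """Per-base cut probability across a motif and its flanks.

    cut_sites: dict chrom -> Counter mapping 0-based position to cut count
        (hypothetical input, e.g. tabulated CUT&RUN fragment ends).
    motif_hits: list of (chrom, start, strand); `start` is the 0-based
        leftmost base of the motif on the reference.
    Returns motif_len + 2 * flank values that sum to 1 when any cuts are seen.
    """
    width = motif_len + 2 * flank
    totals = [0] * width
    for chrom, start, strand in motif_hits:
        cuts = cut_sites.get(chrom, Counter())
        for i in range(width):
            pos = start - flank + i
            # Flip minus-strand hits so offset 0 is always upstream of the motif.
            j = i if strand == "+" else width - 1 - i
            totals[j] += cuts.get(pos, 0)
    total = sum(totals)
    return [t / total for t in totals] if total else [0.0] * width


# Toy example: one TGACCA hit at chr16:10 with cuts at positions 10 and 12.
cuts = {"chr16": Counter({10: 3, 12: 1})}
profile = footprint_profile(cuts, [("chr16", 10, "+")], motif_len=6, flank=2)
print(profile)  # → [0.0, 0.0, 0.75, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Orienting minus-strand hits before summing keeps the motif's 5' flank on the same side of the plot for every occurrence, so strand-asymmetric protection is not averaged away.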
Figure S5. High-resolution CUT&RUN profiles in α-globin region, Related to Figure 5
(A) CUT&RUN profiles in α-globin cluster. Antibodies and cell types for each track are shown on the right. The promoter of ζ-globin gene (HBZ) is highlighted in pink.
(B) Left, BCL11A binding at the HBZ promoter across multiple CUT&RUN experiments. Right, zoomed-in view of a 180-bp HBZ promoter region, with the TGACCA motif highlighted in green.
(C) Single-locus footprint analysis showing the cut frequency at each nucleotide of the HBZ region (−251 to −179 relative to the TSS). Data from 14 CUT&RUN experiments in CD34+ cells were combined for this analysis.
Figure S6. CUT&RUN in γ-globin promoter edited cells, Related to Figure 6
(A) Part of the mutant reference genome containing two mutant motifs that correspond to two alleles in clone D3.
(B) Chromosome conformation capture (3C) assay in wild-type and γ-globin promoter edited cells. Results are shown as mean ± SEM of three experiments.
(C) Three biological replicates of BCL11A CUT&RUN showing peaks in the γ-globin gene region in wild-type HUDEP-2 cells and clone D3. Reads from D3 were mapped to the mutant genome, in which HBG1 carries the ΔC allele and HBG2 carries the Δ13bp allele. Reads from the Δ1bp alleles (ΔA + ΔC + ΔC) therefore map to HBG1, and reads from the Δ13bp allele map to HBG2.
(D) Zoomed-in view of the BCL11A CUT&RUN peaks at the HBG1 promoter.
(E) ATAC-seq in wild-type, BCL11A knockout and γ-globin promoter edited cells.
(F) Left, RT-qPCR analysis of γ-globin and β-globin mRNA levels in the indicated cells. Right, percentages of β- or γ-globin mRNA. Results are shown as mean ± SEM of three experiments.
(G) Single-locus footprint of the γ-globin promoter region in HUDEP-2 cells and bulk edited cells. Bottom, sequence and percentage of each edited allele in the bulk edited cells, determined by amplicon sequencing.
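Panel G reports the percentage of each edited allele from amplicon sequencing. As a simplified sketch, assuming reads are already trimmed to the amplicon and that exact matches to known allele sequences are sufficient, allele frequencies can be tallied as below. Dedicated tools such as CRISPResso2 instead align reads and tolerate sequencing errors. The sequences here are toy placeholders, not the HBG promoter, and the function name is hypothetical.

```python
from collections import Counter


def allele_percentages(reads, alleles):
    """Percentage of reads exactly matching each known allele sequence.

    reads: amplicon read sequences, assumed already trimmed to the amplicon.
    alleles: dict mapping allele name -> expected amplicon sequence.
    Reads that match no listed allele are pooled under "other".
    """
    lookup = {seq: name for name, seq in alleles.items()}
    counts = Counter(lookup.get(read, "other") for read in reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Toy sequences only (hypothetical placeholders, not the HBG promoter).
alleles = {"WT": "ACGTACGTAC", "Δ1bp": "ACGTCGTAC"}
reads = ["ACGTACGTAC"] * 6 + ["ACGTCGTAC"] * 3 + ["ACGTTTTTAC"]
print(allele_percentages(reads, alleles))
# → {'WT': 60.0, 'Δ1bp': 30.0, 'other': 10.0}
```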



