Author manuscript; available in PMC: 2019 Feb 20.
Published in final edited form as: Immunity. 2018 Feb 20;48(2):243–257.e10. doi: 10.1016/j.immuni.2018.01.012

Lineage-determining transcription factor TCF-1 initiates the epigenetic identity of T cell development

John L Johnson 1,2,3,*, Georgios Georgakilas 1,2,3,*, Jelena Petrovic 2,4,*, Makoto Kurachi 2,5, Stanley Cai 1,2,3, Christelle Harly 6, Warren S Pear 2,4, Avinash Bhandoola 6, E John Wherry 2,5, Golnaz Vahedi 1,2,3,**,#
PMCID: PMC5824646  NIHMSID: NIHMS939508  PMID: 29466756

Summary

T cell development is orchestrated by transcription factors that regulate the expression of genes initially buried within inaccessible chromatin, but the transcription factors that establish the regulatory landscape of the T cell lineage remain unknown. Profiling chromatin accessibility at eight stages of T cell development revealed the selective enrichment of TCF-1 at genomic regions that became accessible at the earliest stages of development. TCF-1 was further required for the accessibility of these regulatory elements and, at the single-cell level, it dictated a coordinate opening of chromatin in T cells. TCF-1 expression in fibroblasts generated de novo chromatin accessibility even at chromatin regions with repressive marks, inducing the expression of T cell-restricted genes. These results indicate that a mechanism through which TCF-1 controls T cell fate is its widespread ability to target silent chromatin and establish the epigenetic identity of T cells.

eTOC blurb

It is known that TCF-1 is required for T cell development, but the mechanism by which it controls the T cell lineage remains unclear. Johnson et al. reveal that TCF-1 controls T cell fate through its ability to create de novo open chromatin, establishing the epigenetic identity of T cells.


Introduction

Eukaryotic organisms express genes in incredibly diverse patterns that are necessary for biological complexity (Struhl, 1999). This transcriptional diversity is largely controlled by the interactions between transcription factors and their cognate DNA binding sites within accessible chromatin regions. However, eukaryotic genomes are compacted to fit over a meter of DNA within the limited volume of the nucleus, and this compaction is inherently repressive to processes that require access to the DNA sequence (Horn and Peterson, 2002). Despite the inherently repressive state of the chromatin, a number of lineage-instructive transcription factors, alone or in cooperation with their partners, can access a subset of their binding sites even if these sites are partially occluded by nucleosomes, recruiting chromatin-remodeling enzymes and exposing the underlying DNA. The distinctive collection of such accessible sequences controls the transcriptional output of a cell type and determines its functional characteristics.

Hematopoiesis is an excellent system for studying lineage-instructive transcription factors and their roles in establishing chromatin accessibility. Numerous studies in macrophages and B cells illustrate the emergence of accessible chromatin commanded by lineage-determining transcription factors (Boller et al., 2016; Di Stefano et al., 2014; Ghisletti et al., 2010; Heinz et al., 2010). The pervasive patterns of PU.1 binding to thousands of genomic regions are closely related to the permissive chromatin state in macrophages (Ghisletti et al., 2010; Heinz et al., 2010). EBF1 can induce lineage-specific chromatin accessibility in B cell progenitors (Boller et al., 2016). In addition to instructing development, transcription factors can also play key roles in cell reprogramming. For example, C/EBPα can induce transdifferentiation of B cells into macrophages at high efficiency by activating regulatory elements of macrophages (Di Stefano et al., 2014).

Despite numerous studies of CD4+ T helper cell differentiation (Ciofani et al., 2012; Vahedi et al., 2015; Vahedi et al., 2012) and CD8+ T effector responses (Gray et al., 2017; Pauken et al., 2016; Yu et al., 2017), and reports on the dynamics of histone modifications during T cell development (Dose et al., 2014; Zhang et al., 2012), we have a limited understanding of transcription factors shaping the chromatin accessibility of mature T cells in the thymus. The inception of T-lineage cells occurs when bone marrow-derived multipotent precursors seed the thymus and give rise to early thymic progenitors (ETP or DN1). Notch activation initiates T cell lineage commitment, and cells reach the CD4−CD8− double-negative (DN) 3 stage, where the T cell receptor (TCR)β gene locus is rearranged. DN3 thymocytes that complete β-selection mature to CD4+CD8+ double-positive (DP) cells, which further rearrange their TCRα locus. The T cell receptors are tested for reactivity to self-antigens, and positively selected DP thymocytes will become either CD4+ helper T or CD8+ cytotoxic T cells.

The distinct phases of T cell development in the thymus are controlled by the upregulation of transcription factors including TCF-1, GATA3, and Bcl11b as well as the repression of alternative-lineage factors such as PU.1 and Bcl11a. The earliest T cell-specific transcription factor is TCF-1, encoded by Tcf7, which is steeply up-regulated in T cell progenitors by Notch1 signaling and sustained until maturation. TCF-1 can positively regulate Gata3 in addition to Bcl11b, which is necessary for T lineage commitment (Germar et al., 2011; Weber et al., 2011). Transcription factors required in other hematopoietic differentiation programs such as E2A and its relatives, Ikaros, Gfi1, MYB, and RUNX1 are also essential in T cell development (reviewed in (Rothenberg et al., 2008)). Despite the broad knowledge on the functions of these transcription factors at distinct developmental stages, it remains unclear which ones shape the chromatin accessibility of T cells in the thymus.

By mapping chromatin accessibility at eight stages of thymic T cell development in mice, we found the significant enrichment of TCF-1 at genomic regions that became accessible at the earliest stage of development and persisted until T cell maturation. T-like cells in Tcf7−/− mice did not establish the open chromatin landscape and transcriptional profile of normal T cells. Moreover, TCF-1 dictated a coordinate opening of chromatin in single cells that followed a T cell trajectory. Gain of function experiments in fibroblasts further revealed the ability of TCF-1 to bind to previously occupied nucleosomes, generating de novo chromatin accessibility even at condensed chromatin regions and inducing the expression of T cell-restricted genes ordinarily silenced in fibroblasts. A subset of TCF-1 binding events further erased the pre-existing repressive marks in fibroblasts, highlighting the ability of this lineage-determining transcription factor to substantially target closed chromatin. Collectively, our results identified the role of TCF-1 in the making of chromatin accessibility at T cell genes and revealed a mechanism through which this protein controls the epigenetic identity of T cells during development.

Results

Chromatin remodeling occurs in three waves during T cell development

To elucidate the developmental stages in which the open chromatin landscapes of mature CD4+ and CD8+ T cells are established in the thymus, we assessed chromatin accessibility at eight stages of development including ETP (also referred to as DN1), DN2a, DN2b, DN3, DN4, DP, CD4+, and CD8+ T cells using ATAC-seq (ImmGen Consortium, STAR Method). To identify T cell-specific regulatory elements, we compared these maps with those of progenitor cells including hematopoietic stem cells (HSC), multipotent progenitors (MPP), and common lymphoid progenitors (CLP) in addition to B and NK cells. Initial steps of the analysis led to the characterization of 35,869 open chromatin regions with differential accessibility levels across cell states. Our unsupervised clustering of these regulatory elements revealed patterns of gain and loss of chromatin accessibility as cells progressed from early to terminal stages of T cell fate determination (Figures 1A and S1A–B). We aggregated patterns of gain and loss in chromatin accessibility into broader meta-clusters capturing selective opening in early, intermediate, and late phases of development. Our data showed that the sustained accessibility of mature T cells was established in three distinct waves: “early” at ETP (1,705 regulatory elements, cluster 9), “intermediate” after commitment at DN2b (1,399 regulatory elements, cluster 19), and “late” at the single-positive stage (1,917 regulatory elements, cluster 10) (Figure 1A–B). In addition, a set of genomic regions that became open early was shared between T and NK cells (1,445 regulatory elements, cluster 7). Our analysis further revealed a pattern of gain followed by loss of chromatin accessibility as 75% (9,071) of regulatory elements that became accessible at the early ETP stage were dismantled before T cell maturation (“Open Early in T” meta-cluster, Figure 1A). 
These results demonstrated the dynamic remodeling of chromatin landscape with distinct expansions and restrictions of regulatory elements during T cell development.

Figure 1. TCF-1 binding occurs at three waves of chromatin remodeling during T cell development (see also Figure S1).


(A) Heatmap demonstrates the level of chromatin accessibility at 35,869 regulatory regions measured by bulk ATAC-seq in HSC, MPP, CLP, ETP, DN2a–b, DN3, DN4, DP, SP, B and NK cells (ImmGen Consortium and STAR method). All ATAC-seq libraries were generated in duplicates and data were merged to calculate the FDR. Rows represent genomic loci and columns are the significance of each element’s accessibility level in every sample. Accessible regions were organized in groups with k-means clustering (k=20) using FDR as a proxy for signal enrichment (see Figure S1A). The number of clusters was chosen based on Average Silhouette Width statistic. Clusters were further assembled into meta-clusters depending on their accessibility patterns in progenitor, B, NK, in addition to early, intermediate, and late opening in T cells. Clusters that were open in mature T cells and specific to T cell development are highlighted in red.

(B) Heatmap demonstrates normalized ATAC-seq tag counts around regulatory loci (+/− 2kb window and 10bp bin size) in clusters 9, 19 and 10.

(C–E) De novo motif discovery using HOMER in each cluster of regulatory elements (A) using elements in clusters that were removed from the final clustering analysis shown in A (see Figures S1A, S1C and STAR methods).

(F) Percentage of cluster members bound by TCF-1, PU.1, GATA3 and RUNX1 ChIP-seq peaks (left) and their corresponding odds ratio (right). Contingency tables were calculated using ChIP-seq data summarized in STAR methods.

(G) ATAC-seq (13 cell types) and TCF-1 ChIP-seq (DP T cells) profiles in the Bcl11b locus.
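The caption for panel A states that the number of k-means clusters was chosen using the Average Silhouette Width statistic. As a minimal sketch of that statistic (not the authors' pipeline), the Python below scores a given cluster assignment. The Euclidean distance, the function names, and the toy points are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette_width(points, labels):
    """Mean silhouette over all points; labels[i] is the cluster of points[i]."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


# Hypothetical toy data: two tight, well-separated groups of "regions".
toy = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
good = average_silhouette_width(toy, [0, 0, 1, 1])  # sensible assignment
bad = average_silhouette_width(toy, [0, 1, 0, 1])   # scrambled assignment
```

Under this sketch, one would compute the score for several candidate values of k and keep the value that maximizes it; on the toy data the sensible assignment scores close to 1 and the scrambled one scores below 0.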

TCF-1 is the top enriched transcription factor in mature T cell clusters

We reasoned that the transcription factors that can bind to nucleosomal DNA in progenitors and create the chromatin accessibility landscape of terminally differentiated cells should be enriched within regulatory elements that selectively become open in that lineage. To find transcription factors with such characteristics, we inferred their occupancy in cell- and stage-specific regulatory elements by performing motif analysis (Heinz et al., 2010). B cell-specific open chromatin regions were enriched with motifs of EBF1, a transcription factor which has been previously reported to create the accessibility of regulatory elements in B cells (Figure S1C) (Boller et al., 2016). Furthermore, T-box, ETS, and GATA motifs were highly enriched among NK cell-specific and progenitor-specific regulatory elements (Figure S1C). In T cells, recognition sites for TCF, a high-mobility group (HMG) family of proteins, were the top enriched motif in the early, intermediate, and late waves of chromatin opening that persisted until T cell maturation (clusters 9, 19, 7, and 10) (Figure 1C–E). Notably, E2A, ETS and RUNX recognition sites were among the second and third motifs in these clusters (Figure S1C). Similar analysis on chromatin accessibility maps of human T cells revealed the enrichment of TCF motifs within T cell-specific open chromatin of human naïve T cells, suggesting the conserved role of this transcription factor in humans and mice (Figure S1D).

Among TCF family transcription factors, TCF-1 is induced early at the inception of T lineage cells. To further substantiate direct binding of TCF-1 in comparison to other T cell related transcription factors including GATA3, RUNX1, and PU.1, we calculated the number of genomic regions within each cluster bound by these transcription factors using ChIP-seq (STAR method). As predicted by the enrichment of its motif, TCF-1 bound to around 70% of the genomic regions within the early and intermediate T cell specific clusters in addition to 24% of the late T cell cluster (Table S12). This contrasted with RUNX1 and GATA3 binding events at less than 17% of the genomic regions within T cell-specific clusters (Figure 1F). Moreover, the highest odds ratio was associated with TCF-1 binding events in early and intermediate T cell clusters in particular clusters 9, 7, and 19 (Figure 1F). The early regulatory elements deactivated before maturation were enriched with PU.1 binding, reminiscent of earlier findings that most active chromatin features at PU.1 binding events were ‘dismantled’ as PU.1 is down-regulated in early DN stages (Zhang et al., 2012). Together, the pervasive binding of TCF-1 corroborated the strong enrichment of its motif at accessible regulatory elements of T cells.
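The odds ratios reported in Figure 1F derive from 2×2 contingency tables relating cluster membership to ChIP-seq binding. The following Python is a minimal sketch of that calculation under stated assumptions: regions are represented as hashable identifiers in sets, the function names are hypothetical, and the 0.5 correction for zero cells (Haldane-Anscombe) is our choice rather than something specified in the text. How the background universe of regions is defined is described in STAR Methods and is not reproduced here.

```python
def contingency(cluster, bound, universe):
    """Return (in_bound, in_unbound, out_bound, out_unbound) region counts."""
    rest = universe - cluster
    return (
        len(cluster & bound),   # cluster members with a ChIP-seq peak
        len(cluster - bound),   # cluster members without a peak
        len(rest & bound),      # other regions with a peak
        len(rest - bound),      # other regions without a peak
    )


def odds_ratio(in_bound, in_unbound, out_bound, out_unbound):
    """Odds of binding inside the cluster divided by odds outside it."""
    cells = [in_bound, in_unbound, out_bound, out_unbound]
    if 0 in cells:
        # Assumed Haldane-Anscombe correction to avoid division by zero.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Hypothetical toy example: 10 regions, 4 in the cluster, 4 bound.
universe = set(range(10))
cluster = {0, 1, 2, 3}
bound = {0, 1, 2, 9}
table = contingency(cluster, bound, universe)
ratio = odds_ratio(*table)
```

In this toy case three of four cluster members are bound, against one of six regions outside the cluster, so the odds ratio is well above 1, which is the direction of enrichment the text describes for TCF-1 in the early and intermediate clusters.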

We further sought to explore the relationship between the activation of regulatory elements and their associated genes. The ontology of genes proximal to T cell-specific clusters was mostly related to T cell receptor signaling and naïve T cell development with no ontology distinguishing different waves of chromatin opening (Figure S1E). The expression levels of genes proximal to dynamic regulatory elements did not present significant differences during development, suggesting a larger transformation for the regulatory landscape than the transcriptional output (Figure S1F). While the T cell commitment factor Bcl11b has low expression levels in ETP, multiple T cell-specific regulatory elements proximal to this gene became accessible at the earliest stage and co-localized with TCF-1 binding (Figure 1G). The Bcl11b locus is representative of the three waves of chromatin remodeling during development. The rightmost elements including the Bcl11b promoter became accessible as early as the ETP and these early elements were retained until T cell maturation. The middle of the locus was mostly accessible in intermediate stages, and the leftmost elements of the locus gained accessibility late in the developmental process. Collectively, these results demonstrated the dynamics of expansion and restriction of regulatory elements during T cell development and foreshadowed the importance of TCF-1 in patterning the regulatory landscape from early thymic progenitors to mature T cells.

Tcf7-deficient T cells cannot establish the open chromatin landscape of normal T cells

Germ-line deletion of TCF-1 leads to a severe reduction in thymocyte numbers (Verbeek et al., 1995). Although some T lineage-like cells continue to develop in the thymus of Tcf7-deficient mice, they are functionally limited in terms of differentiation and persistence of memory T cells during infection (Verbeek et al., 1995; Zhou et al., 2010). It remains unclear whether the chromatin accessibility landscape and transcriptional outputs of these T-like cells differ from those of normal T cells. Therefore, we next measured chromatin accessibility at TCF-1 binding events in wildtype and Tcf7−/− DP T cells. Our data revealed the loss of chromatin accessibility at 5,000 regulatory elements and the gain at 1,165 genomic loci in Tcf7−/− T cells (Figures 2A and S2A). We sought to elucidate the relationship between regulatory elements that required TCF-1 for their accessibility and the three waves of chromatin opening during T cell development (clusters in Figure 1A). Regulatory elements that lost chromatin accessibility in the absence of TCF-1 were strongly enriched within early or intermediate waves of chromatin opening during T cell development, suggesting that this transcription factor is required for patterning the chromatin at early stages (clusters 7, 9 and 19) (Figure 2B). Examples of affected regions included the well-annotated Tcrb enhancer (Osipovich and Oltz, 2010) and the distal Bcl11b enhancers (Li et al., 2013) (Figure 2C). De novo motif analysis revealed TCF as the top enriched motif in the lost sites, supporting the notion that TCF-1 is directly responsible for chromatin accessibility (Figure 2D). TCF-1-bound regions with gains in accessibility in Tcf7−/− T cells were also enriched with the TCF motif but were associated with elements accessible in B and NK cells or T cell regulatory elements deactivated in mature T cells, supporting the previously reported repressive role of TCF-1 at some genomic locations (Figures 2B, D) (Xing et al., 2016). 
Together, these data demonstrated that TCF-1 was required for patterning the chromatin of T cells at early stages of development in the thymus.

Figure 2. Tcf7-deficient DP T cells cannot establish the open chromatin landscape and transcriptional output of normal DP T cells (see also Figure S2).


(A) Volcano plot demonstrates fold-change and p-value calculated by DESeq2 to delineate differentially accessible regions between WT and Tcf7−/− DP T cells at TCF-1 binding sites based on ChIP-seq. While 5,000 genomic regions were less accessible, 1,165 regions were more accessible in Tcf7−/− DP T cells (fold-change > 1.5 and p-value < 1e-3). Two technical replicates of ATAC-seq in wildtype and Tcf7−/− DP T cells were generated in one experiment (see Figure S2A).

(B) Heatmap demonstrates odds ratios of the enrichment of TCF-1-dependent open chromatin regions (A) within T-cell specific clusters from Figure 1A. Contingency tables were calculated as described in STAR methods.

(C) Representative examples of TCF-1 dependent chromatin accessibility at Tcrb and Bcl11b.

(D) Heatmap demonstrates DESeq2-normalized tag counts of ATAC-seq at differentially accessible regions in wildtype and Tcf7−/− DP T cells. The de novo motif analysis in differentially accessible regions was performed with remaining elements as background using HOMER.

(E) GSEA depicts the enrichment of genes proximal to differentially accessible regions within transcriptionally regulated genes. Two technical replicates of RNA-seq in WT and Tcf7−/− DP T cells in one experiment were generated to assess the effect of TCF-1 absence on gene expression levels (see Figure S2B). DESeq2 was used to identify differentially expressed genes (fold-change > 1.5 and p-value < 5e-2). Our analysis unveiled 1,167 down- and 1,293 up-regulated genes in Tcf7−/− compared to WT DP T cells (see Figure S2C). Genes were ranked based on log2 fold-change and used as the pre-ranked gene list in GSEA. The GSEA gene sets were genes within 10kb of the top 200 regions with the highest fold-change in chromatin accessibility between Tcf7−/− and WT DP T cells (A).
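The thresholds stated in panel A (fold-change > 1.5 and p-value < 1e-3) can be applied to a table of differential accessibility results as in the sketch below. This is only an illustration of the filtering step, not of DESeq2 itself: the tuple layout, the function name, the region labels, and the convention that log2 fold-change is expressed as Tcf7−/− over WT are all assumptions.

```python
import math


def classify_regions(results, min_fold=1.5, max_p=1e-3):
    """Split regions into lost and gained accessibility.

    results: iterable of (region_id, log2_fold_change, p_value), where the
    fold-change is assumed to be knockout over wildtype.
    """
    cutoff = math.log2(min_fold)  # 1.5-fold corresponds to about 0.585 in log2
    lost, gained = [], []
    for region, log2fc, p_value in results:
        if p_value >= max_p:
            continue  # not significant at the chosen threshold
        if log2fc < -cutoff:
            lost.append(region)    # less accessible in the knockout
        elif log2fc > cutoff:
            gained.append(region)  # more accessible in the knockout
    return lost, gained


# Hypothetical toy results; region names are placeholders.
toy_results = [
    ("region_1", -2.0, 1e-6),  # strong, significant loss
    ("region_2", 1.0, 1e-5),   # significant gain
    ("region_3", -0.3, 1e-8),  # significant but below the fold-change cutoff
    ("region_4", 3.0, 0.2),    # large change but not significant
]
lost, gained = classify_regions(toy_results)
```

Both criteria must hold for a region to be counted, which is why the third and fourth toy regions are excluded.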

To elucidate how changes in chromatin accessibility relate to the dynamics of gene expression, we evaluated the transcriptome of wildtype and Tcf7−/− T cells using RNA-seq (Figure S2B–C). We then interrogated changes in the expression of genes proximal to TCF-1-dependent open chromatin regions using gene-set-enrichment analysis. Genes proximal to regions that became less accessible in the absence of TCF-1, such as Tcrb and Bcl11b, displayed reduced expression in cells lacking this transcription factor (Figure 2E). Conversely, genes such as Adam19 that became more accessible also showed an increase in transcription in Tcf7−/− T cells (Figure S2D). Together, these results indicated that while some T-like cells continued to develop in the absence of TCF-1 in the thymus, they could not establish the open chromatin landscape and transcriptional profiles of normal T cells.
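The gene-set-enrichment analysis described above walks down a list of genes ranked by log2 fold-change and asks whether a gene set accumulates near one end. The sketch below computes the weighted running-sum enrichment score used by pre-ranked GSEA; it omits the permutation-based significance estimate and normalization, and the function name and toy gene labels are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for a pre-ranked gene list.

    ranked: list of (gene, score) sorted by score in descending order.
    gene_set: set of genes to test (for example, genes near lost regions).
    """
    hit_weights = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in gene_set:
            running += w / total_hit   # step up at a gene-set member
        else:
            running -= 1.0 / n_miss    # step down otherwise
        if abs(running) > abs(best):
            best = running             # keep the largest deviation from zero
    return best


# Hypothetical ranked list: positive scores mean higher expression in knockout.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
          ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_up = enrichment_score(ranked, {"g1", "g2"})    # set at the top of the list
es_down = enrichment_score(ranked, {"g5", "g6"})  # set at the bottom
```

A negative score for genes near regions that lose accessibility would correspond to the reduced expression reported for genes such as Tcrb and Bcl11b, given a ranking of knockout over wildtype.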

TCF-1 binding exerts a coordinate impact on the chromatin of single T cells

If a transcription factor is required for patterning the regulatory landscape of a lineage, it may need to exert a harmonizing impact on the chromatin of individual cells making the same fate decision. To interrogate which T cell transcription factor may have such features, we first exploited maps of chromatin accessibility at the population level and reasoned that at a given regulatory element, the strength of bulk ATAC-seq signal can reflect the fraction of cells in the population with open chromatin. We compared the normalized intensity of chromatin accessibility in bulk ATAC-seq at genomic regions uniquely bound by T lineage transcription factors TCF-1, GATA3, or RUNX1 (Figure 3A). Our analysis revealed that TCF-1 binding events rendered the highest average level of chromatin opening in comparison to GATA3 and RUNX1, advancing the notion that TCF-1 may unify chromatin accessibility across single T cells (Figure 3A).

Figure 3. TCF-1 binding exerts a coordinate impact on the chromatin of single T cells (see also Figure S3).


(A) Violin plots depict the enrichment of chromatin accessibility at transcription factor binding events using bulk ATAC-seq. Genome-scale binding of TCF-1, RUNX1, and GATA3 in DP T cells was measured by ChIP-seq. An equal number of genomic regions with unique binding of each transcription factor were subsampled from ChIP-seq data sets. The normalized tag count for ATAC-seq in DP T cells was calculated for each instance from the subsampled groups of transcription factor binding. Statistical significance of the difference in ATAC-seq enrichment between pairs of groups was assessed with Mann-Whitney U test.

(B) Scatter plot shows the correlation between bulk ATAC-seq and ensemble of single-cell ATAC-seq data. Accessible chromatin regions identified from bulk ATAC-seq in 50,000 DP T cells were merged with peaks characterized by aggregating the samples from 110 single DP T cells passing QC measures (see Figure S3D). Normalized enrichment was subsequently calculated in bulk (down sampled to 11.6 million reads) and aggregated scATAC-seq with 11.6 million reads enabling the correlation assessment between the two assays. Three independent experiments (captures) were performed.

(C) Genome-browser view depicts scATAC at 110 single T cells, ensemble of single-cell ATAC, and bulk ATAC-seq profiles with TCF-1 ChIP-seq on the Tcrb locus.

(D) Overview of our method to infer transcription factor-associated chromatin accessibility variation across single cells (STAR methods).

(E) Chromatin accessibility variation across individual DP T cells at TCF-1, RUNX1, and GATA3 ChIP-seq binding as measured by our method (D) and chromVAR.

(F) The level of chromatin accessibility at the single cell level was calculated for 110 single DP T cells across T cell specific open regions in cluster 9 (Figure 1A). The fraction of cells with binarized open chromatin was measured to rank regulatory regions (top rows are genomic regions that are open in the majority of cells). TCF-1, GATA3 and RUNX1 ChIP-seq enrichment was assessed in the same order as well as changes in chromatin accessibility based on bulk ATAC-seq signal in WT and Tcf7−/− DP T cells. De novo motif analysis using HOMER was also performed at the 100 enhancers exhibiting the highest/lowest similarity at the single cell level.
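Panel A compares ATAC-seq signal between groups of binding sites with the Mann-Whitney U test. The following is a minimal sketch of that test in plain Python, assuming two lists of normalized tag counts. The two-sided p-value uses the normal approximation without tie or continuity correction, which is a simplification; the function names and toy values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj; ties add 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def approximate_p_value(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical normalized ATAC-seq signal at two groups of binding sites.
group_a = [5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0]
u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
p = approximate_p_value(u_a, len(group_a), len(group_b))
```

The two U statistics always sum to the product of the group sizes, which is a convenient sanity check. For groups as small as the toy example an exact test would be preferable to the normal approximation.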

While chromatin accessibility maps of bulk T cells measure the average patterns of open regulatory elements at the population level, it remains unclear if Tn5 insertions linearly reflect the fraction of individual cells with open chromatin. To address this concern, we tested our hypothesis using single-cell (sc)ATAC-seq (Buenrostro et al., 2015). In this approach, individual cells stained for viability were captured and assayed using a programmable microfluidics platform (Fluidigm) (Figure S3A–B). Collapsing reads from single T cells to aggregate scATAC-seq data closely reproduced measures of accessibility profiled by ATAC-seq generated from 50,000 T cells (Figure 3B). A representative genomic region such as the Tcrb enhancer confirmed the strong correlation between bulk and single-cell measurements (Figure 3C). Furthermore, data from single T cells recapitulated several characteristics of bulk ATAC-seq data, including fragment-size periodicity corresponding to integer multiples of nucleosomes (Figure S3C). Together, we performed three independent single-cell captures and 110 T cells at the DP stage passed various quality control thresholds, suggesting high-confidence single-cell chromatin accessibility maps in T cells (Figure S3D).

Single-cell chromatin accessibility data are sparse, binary, and high dimensional, leading to computational challenges. To overcome these difficulties, we developed a method using a geometric distance metric and quantified cell-to-cell chromatin accessibility variation (Figure 3D, STAR Methods). To interrogate which T cell transcription factor can create harmonizing effects, we applied our method to binarized scATAC-seq count data in every cell and calculated the average distance between pairs of T cells at genomic regions uniquely bound by TCF-1, RUNX1 or GATA3. We reasoned that binarizing scATAC-seq count data at transcription factor binding events reflected the open or closed state (1 or 0) of a locus in a single cell. Due to biases in the number of observed fragment counts between cells based on the GC content or mean accessibility of a given peak set, we normalized the distance between individual cells at each set of transcription factor binding events to that of a background set comprising an equal number of peaks with matching GC content and mean accessibility. Our single-cell analysis revealed that TCF-1-bound regions were associated with the least variability among individual T cells in comparison with GATA3 and RUNX1 (Figure 3E). We further applied another analytical technique called “chromVAR” which was recently developed to address the same question (Schep et al., 2017). Unlike our method in which the difference in accessibility of a genomic region between every cell-pair contributes to the variability score, chromVAR relies on the aggregate of accessibility signal across a genomic set. Despite differences in the inference of variability at transcription factor binding sites, chromVAR also identified TCF-1 as the least variable transcription factor in exerting chromatin accessibility across single T cells (Figure 3E). 
Together, two analytical strategies developed by us and others corroborated the enrichment of TCF-1 binding at regulatory elements whose accessibility was conserved across single T cells.
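To make the pairwise-distance idea concrete, the sketch below computes the mean distance between every pair of cells over a set of peaks and divides it by the same quantity for a background peak set. It is an illustration under explicit assumptions rather than the method in STAR Methods: the text does not specify the geometric distance metric here, so a normalized Hamming distance is used as a stand-in, the background peaks are taken as given (no GC-content or accessibility matching is performed), and all names and data are hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(matrix, peak_idx):
    """Mean normalized Hamming distance between all pairs of cells.

    matrix[c][p] is 1 if peak p is open in cell c and 0 otherwise.
    peak_idx lists the columns (peaks) to compare.
    """
    distances = []
    for cell_a, cell_b in combinations(matrix, 2):
        mismatches = sum(1 for p in peak_idx if cell_a[p] != cell_b[p])
        distances.append(mismatches / len(peak_idx))
    return sum(distances) / len(distances)


def normalized_variability(matrix, tf_peaks, background_peaks):
    """Variability at bound peaks relative to a matched background set."""
    background = mean_pairwise_distance(matrix, background_peaks)
    if background == 0:
        raise ValueError("background set shows no cell-to-cell variation")
    return mean_pairwise_distance(matrix, tf_peaks) / background


# Hypothetical binarized scATAC-seq matrix: 3 cells by 6 peaks.
cells = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
]
tf_peaks = [0, 1, 2]          # peaks bound by the factor of interest
background_peaks = [3, 4, 5]  # assumed pre-matched background peaks
score = normalized_variability(cells, tf_peaks, background_peaks)
```

A score below 1 indicates that cells agree more at the bound peaks than at the background peaks, which is the sense in which TCF-1-bound regions are described as least variable.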

As an alternative strategy, we ranked T cell specific genomic regions in the early T cell cluster (cluster 9) based on the fraction of cells harboring open chromatin and evaluated whether they were bound by T cell transcription factors TCF-1, GATA3, and RUNX1 (Figure 3F). The top regulatory elements open across the majority of single cells were bound consistently by TCF-1, in contrast with GATA3 and RUNX1 (Figure 3F). We reasoned that if TCF-1 indeed plays a role in creating accessibility at genomic regions with the highest similarity across individual cells, its deletion should have a stronger effect on the accessibility of these regions at the bulk level. Indeed, the most similar genomic regions across individual T cells, i.e. those open in the highest fraction of cells, were more affected by loss of TCF-1 compared to the least similar genomic regions (Figure 3F). In line with consistent TCF-1 binding and a stronger effect size in chromatin accessibility in the absence of TCF-1, the TCF motif was selectively enriched within the top 100 most similar genomic regions. Furthermore, the genes proximal to these genomic regions with the highest similarity across individual T cells were associated with T cell biology and included T cell relevant genes such as Bcl11b (Figure 3F). Together, studying maps of chromatin accessibility at bulk and single cell levels with distinct analytical strategies suggested that TCF-1 could dictate a harmonizing impact on the chromatin of individual T cells.
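The ranking described above can be sketched as follows: for each region, count the fraction of cells in which it is open, then sort regions from most to least consistently open. The matrix layout, function name, and region labels are assumptions for illustration.

```python
def rank_by_open_fraction(matrix, region_names):
    """Rank regions by the fraction of cells in which they are open.

    matrix[c][r] is 1 if region r is open in cell c and 0 otherwise.
    Returns a list of (region_name, fraction) from highest to lowest.
    """
    n_cells = len(matrix)
    fractions = [
        (name, sum(cell[r] for cell in matrix) / n_cells)
        for r, name in enumerate(region_names)
    ]
    # sorted() is stable, so tied regions keep their input order
    return sorted(fractions, key=lambda item: item[1], reverse=True)


# Hypothetical binarized accessibility for 4 cells at 3 regions.
cells = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]
ranked = rank_by_open_fraction(cells, ["enhA", "enhB", "enhC"])
```

The top and bottom of such a ranking are the sets one would then compare for ChIP-seq enrichment, motif content, and sensitivity to loss of TCF-1.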

TCF-1 can create de novo chromatin accessibility in fibroblasts

It has been shown that when TCF-1 is forcibly expressed in bone marrow progenitors, it can drive the expression of T-lineage genes (Weber et al., 2011). Yet, it is not clear whether this alteration in the gene expression program of multipotent progenitors relates to the ability of TCF-1 to bind to silent chromatin and drive the epigenetic commitment to the T cell lineage. To examine if TCF-1 can create de novo open chromatin, we assessed this transcription factor in a gain-of-function model in nonhematopoietic somatic cells. We reasoned that fibroblasts could serve as an ideal model since the chromatin state in fibroblasts is distinct from cells of the hematopoietic system and T cell-specific genes are repressed in these somatic cells, allowing us to better evaluate the role of TCF-1 in targeting condensed chromatin.

To evaluate the genome-scale binding of TCF-1, we ectopically expressed this transcription factor in a fibroblast cell line using a retroviral transduction system and performed TCF-1 ChIP-seq (Figure S4A). To define genome-scale TCF-1 binding events, we used the irreproducible discovery rate (IDR) method with a threshold of 2% (Figure S4B). We further mapped the position of nucleosomes using micrococcal nuclease (MNase)-seq in pre-induced cells. The ectopic expression of TCF-1 led to more than 40,000 TCF-1 binding events across the genome of fibroblasts where 73% of these events colocalized with previously nucleosome-occupied DNA (Figures 4A and S4C). The extent to which TCF-1 bound to nucleosome-occupied regions in fibroblasts was comparable to reprogramming transcription factors such as OCT4 (85%), SOX2 (80%), and KLF4 (65%) (Soufi et al., 2015). As an independent measure, we found that 67% of TCF-1 summits, the centers of TCF-1 peaks, were within 75bp of a nucleosome dyad in contrast with CTCF binding which was favored towards nucleosome-free regions, suggesting the enrichment of TCF-1 binding at previously occupied nucleosomes (Figure 4B). Furthermore, TCF was the strongest motif within TCF-1-bound sites with different levels of nucleosome occupancy (p-value<1e-930) (Figure 4C). TCF recognition sites bound by TCF-1 in fibroblasts were significantly closer to the nucleosome dyads compared to random TCF sites not bound by this transcription factor, reminiscent of PU.1 binding events being shielded by nucleosomes in cells that do not express PU.1 (Barozzi et al., 2014) (Figure S4D). Together, the ectopic expression of TCF-1 in fibroblasts revealed the widespread binding of TCF-1 at genomic regions previously occupied by nucleosomes harboring TCF consensus binding sites.
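The summit-to-dyad measure described above can be illustrated with a short sketch that finds the nearest nucleosome dyad for each ChIP-seq summit and applies the 75bp cutoff. It assumes all positions lie on a single chromosome and are given as integer coordinates; the function names and toy coordinates are hypothetical.

```python
from bisect import bisect_left


def distance_to_nearest(position, sorted_dyads):
    """Distance from a summit to the closest dyad in a sorted list."""
    if not sorted_dyads:
        raise ValueError("no nucleosome dyads supplied")
    i = bisect_left(sorted_dyads, position)
    # Only the dyads immediately left and right of the summit can be nearest.
    candidates = sorted_dyads[max(i - 1, 0):i + 1]
    return min(abs(position - d) for d in candidates)


def classify_summits(summits, dyads, cutoff=75):
    """Split summits into nucleosome-bound (< cutoff bp from a dyad) and unbound."""
    sorted_dyads = sorted(dyads)
    bound, unbound = [], []
    for s in summits:
        if distance_to_nearest(s, sorted_dyads) < cutoff:
            bound.append(s)
        else:
            unbound.append(s)
    return bound, unbound


# Hypothetical coordinates on one chromosome.
dyads = [100, 500, 900]
summits = [120, 300, 574, 575, 1000]
bound, unbound = classify_summits(summits, dyads)
fraction_bound = len(bound) / len(summits)
```

Applied genome-wide, the same bound fraction is what the text reports as 67% for TCF-1 summits; a real analysis would repeat the lookup per chromosome.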

Figure 4. TCF-1 can bind to nucleosomes and create chromatin accessibility in fibroblasts (see also Figure S4).


(A) Heatmap demonstrates TCF-1 ChIP-seq in a TCF-1-expressing fibroblast cell line together with the pre-existing map of nucleosomes using MNase-seq. TCF-1 ChIP-seq (two biological replicates) was performed on NIH3T3 cells expressing the p33 isoform of Tcf7 via retrovirus (RV) as well as on Empty vector controls 48 hours post transduction (see Figure S4A). Peak calling was achieved with MACS2 and the reproducibility across replicates was assessed with IDR (see Figure S4B), resulting in the identification of 40,562 TCF-1 binding sites. The region surrounding TCF-1 summits was segmented into three non-overlapping 200bp windows centered around each summit. Normalized MNase-seq enrichment was calculated for each window and summits were ordered from high to low enrichment. TCF-1 ChIP-seq and MNase-seq normalized enrichment profiles were also calculated in non-overlapping 10bp bins of 6kb windows centered around TCF-1 summits. Two independent experiments were performed.

(B) The distance between TCF-1 and CTCF (serving as control) ChIP-seq summits and the closest nucleosome summits were calculated as an alternative strategy of assessing the ability of TCF-1 to directly bind nucleosomes. The vertical dashed red line is set to 75bp which is typically half the size of histone octamer bound DNA denoting the edge of nucleosomes. 27,145 TCF-1 summits (66.9%) located less than 75bp away from nucleosome summits were classified as bound to nucleosomes and 13,417 (33.1%) as unbound. 20,370 (56.6%) CTCF summits were marked as bound and 15,616 (43.4%) as unbound.

(C) De novo motif analysis at nucleosome-low, medium and high clusters using HOMER (defined in Figure S4C). We chose open regions with no overlap with TCF-1 summits as background.

(D–E) Volcano-plot (D) and heatmap (E) demonstrate differentially accessible regions after TCF-1 expression in fibroblasts. We performed ATAC-seq in duplicates in no RV (Mock), Empty RV, and 2 and 4 days after TCF-1 RV NIH3T3 cells (see Figure S4E). Tag counts for no RV (Mock) are not shown. To identify differentially accessible regions, TCF-1 ChIP-seq (A) and ATAC-seq peaks were merged to facilitate differential enrichment at both TCF-1 bound and unbound regions of the genome. We used DESeq2 and based on fold-change > 2 and p-value < 1e-3, 6,882 regions gained while 1,618 lost accessibility in TCF-1 RV cells. Two independent experiments at days 2 and 4 after transduction were performed.

(F) The de novo motif discovery with HOMER in differentially accessible regions (D) using regulatory regions with unchanged accessibility levels as background.

(G) TCF-1 bound to 5,575 (80%) gained accessible sites in contrast to only 40 (3%) lost sites.

(H) ATAC-, MNase- and TCF-1 ChIP-seq profiles in NIH3T3 cells in the Tcra locus. Arrows depict TCF-1 binding events, previously occupied by nucleosomes that gain in accessibility in TCF-1 RV NIH3T3 cells.

(I) Genome-browser depicts ATAC-seq and TCF-1 ChIP-seq profiles in T cells from Figure 1A as well as Empty and TCF-1 RV NIH3T3 cells (Day 2) at the Ccr7 locus.
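The windowing scheme described in panel (A) can be sketched as follows. This is an illustrative reconstruction that assumes half-open integer coordinates and window edges placed symmetrically around the summit; the function names are hypothetical.

```python
def summit_windows(summit, width=200, count=3):
    """Non-overlapping windows of `width` bp, jointly centered on the summit."""
    start = summit - (width * count) // 2
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]


def summit_bins(summit, span=6000, bin_size=10):
    """Non-overlapping bins tiling a `span` bp window centered on the summit."""
    start = summit - span // 2
    return [(s, s + bin_size) for s in range(start, start + span, bin_size)]


print(summit_windows(1000))      # [(700, 900), (900, 1100), (1100, 1300)]
print(len(summit_bins(10_000)))  # 600
```

With these defaults, the middle 200bp window is centered on the summit, and a 6kb window yields 600 bins of 10bp each.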

To measure the impact of widespread TCF-1 binding on silent genomic loci, we mapped the accessibility of chromatin by ATAC-seq post transduction with Empty or TCF-1 vectors. Using differential enrichment analysis, we found that 6,882 genomic regions previously occupied by nucleosomes gained accessibility while 1,618 sites became less accessible after TCF-1 expression in fibroblasts (Figures 4D–E, S4E). We further performed de novo motif analysis and observed that more than 80% of the gained sites harbored a TCF motif while the lost sites were enriched with AP-1 and RUNX family motifs (Figure 4F). In concordance with motif presence, 80% of the gained sites were also bound by TCF-1 while only 3% of lost sites colocalized with TCF-1 binding (Figure 4G), suggesting an indirect role of TCF-1 on sites losing chromatin accessibility. To infer nucleosome position and occupancy within TCF-1 binding events, we further applied NucleoATAC algorithm (Schep et al., 2015) to our chromatin accessibility data and found 7,395 genomic regions with significant loss of nucleosomes after TCF-1 expression (Figure S4F). An example of de novo regulatory elements induced by TCF-1 included the T cell receptor alpha locus where the binding of TCF-1 at previously occupied nucleosomes led to gains in chromatin accessibility at multiple genomic regions (Figure 4H). Together, our data suggested that TCF-1 can bind to thousands of previously nucleosome-occupied DNA and this binding can lead to de novo chromatin accessibility.
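As a minimal sketch of how regions could be labeled from differential enrichment output, the snippet below applies the thresholds reported for Figure 4D–E (fold-change > 2 and p-value < 1e-3). It assumes that each region carries a log2 fold change (TCF-1 RV over Empty RV) and a p-value, and that the same fold-change threshold is applied symmetrically to losses; both the function name and the symmetric treatment are assumptions.

```python
import math


def classify_region(log2_fold_change, p_value, min_fold_change=2.0, max_p=1e-3):
    """Label a region as gained, lost, or unchanged in accessibility."""
    threshold = math.log2(min_fold_change)  # fold-change of 2 equals 1 on the log2 scale
    if p_value < max_p and log2_fold_change > threshold:
        return "gained"
    if p_value < max_p and log2_fold_change < -threshold:
        return "lost"
    return "unchanged"


print(classify_region(1.5, 1e-5))   # gained
print(classify_region(-2.0, 1e-4))  # lost
print(classify_region(3.0, 0.01))   # unchanged (p-value too high)
```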

We next sought to examine whether de novo chromatin accessibility in fibroblasts had any relevance to T cell biology. Our data revealed that TCF-1 binding events in T cells and fibroblasts were highly correlated (Figure S4G) and more than 800 de novo regulatory elements in fibroblasts (~11%) overlapped with open chromatin in T cells while only 40 regions (~0.5%) corresponded to the open chromatin in B cells (Figure S4H). Furthermore, the de novo regulatory elements in fibroblasts were selectively enriched for regions belonging to the early wave of chromatin opening during T cell development (cluster 9) (Figure S4I). For example, the promoter of Ccr7, which is among the regulatory elements that gained accessibility at the early cluster 9, was bound by TCF-1 and became accessible in TCF-1-expressing fibroblasts (Figure 4I). Together, TCF-1 can invoke a subset of T cell regulatory elements to become open in distant somatic cells like fibroblasts.

TCF-1 can bind and erase H3K27me3 and H3K9me3 repressive marks

The widespread binding of TCF-1 in fibroblasts led to thousands of de novo open chromatin regions. Yet, it is not clear whether these TCF-1-dependent regulatory elements were previously repressed or instead poised for activation with permissive histone modifications in fibroblasts. To address this question, we examined the pre-existing patterns of histone modifications in fibroblasts using maps of 5 histone modifications: H3K4me3, primarily associated with promoters; H3K4me1 and H3K27ac, characteristic of poised and active promoters and enhancers; and the repressive marks H3K9me3 and H3K27me3. Correlation and principal component analysis (PCA) at TCF-1 bound sites indicated a preferential colocalization of gained sites with previously repressed domains containing H3K27me3 or H3K9me3 modifications (Figures 5A and S5A–B). To create a more quantitative picture of the chromatin state prior to TCF-1 binding, we developed an unsupervised learning workflow and partitioned TCF-1 binding events into 11 clusters corresponding to 7 distinct chromatin states (Figures 5B and S5C–D). Although less than half of TCF-1 binding events were associated with active and poised enhancers or promoters (~40%), 16,800 (~42%) occurred within repressed and heterochromatin genomic regions. The gains in chromatin accessibility by TCF-1 were strongly enriched at these repressed domains (Figures 5B and S5E).
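The partitioning step can be illustrated with a bare-bones k-means (Lloyd's algorithm) sketch. It is not the workflow used in this study: the feature vectors, the choice of initial centroids, and all function names are assumptions made for illustration, and each toy point stands for one TCF-1 summit scored for three hypothetical histone mark enrichments.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members (keep it if empty)."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Toy scores per summit: (H3K27me3, H3K9me3, H3K27ac), illustrative only.
toy_points = [(9, 1, 0), (8, 0, 1), (1, 9, 0), (0, 8, 1), (0, 1, 9), (1, 0, 8)]
toy_init = [toy_points[0], toy_points[2], toy_points[4]]
labels, centroids = kmeans(toy_points, toy_init)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Supplying the initial centroids explicitly keeps the toy result deterministic; a practical analysis would typically use repeated random initializations.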

Figure 5. TCF-1 can bind to repressed chromatin and promote accessibility (see also Figure S5).

(A) Principal component analysis reduces the dimensionality of signal intensity measured by histone modification and ATAC-seq at TCF-1 binding events in fibroblasts. Two biological replicates of H3K9me3 and H3K27me3 ChIP-seq in NIH3T3 cells were generated and combined with public H3K4me3, H3K4me1 and H3K27ac ChIP-seq data to assess pre-induced histone mark enrichment around TCF-1 binding summits from Figure 4A using normR (+/−1kb window around TCF-1 summits). The enrichment of ATAC-seq in TCF-1 RV versus Empty RV NIH3T3 cells and vice versa was also calculated around each summit for assessing different levels of chromatin accessibility.

(B) Heatmap demonstrates normalized tag counts of various epigenetic measurements at TCF-1 binding events. K-means clustering (Figures S5C and S5D) of TCF-1 summits on the adjusted significance levels of the enrichment in each histone mark identified chromatin states ranging from PRC (H3K27me3) (4,110, 10.2%), hetero/PRC (H3K27me3 and H3K9me3) (8,957, 22%), hetero (H3K9me3) (4,242, 10.4%), trivalent (H3K27ac, H3K4me1 and H3K9me3) (6,634, 16.4%), poised enhancers (H3K4me1) (7,458, 18.3%), active enhancers (H3K4me1 and H3K27ac) (7,343, 18.2%) and promoters (H3K4me3) (1,818, 4.5%). Normalized enrichment profiles of histone modification using ChIP-seq as well as ATAC-seq were also calculated for 10bp non-overlapping bins spanning the +/− 3kb region centered around TCF-1 summits.

(C–D) Representative examples (C) and heatmap (D) demonstrate the effect of TCF-1 expression at histone modifications. To assess differences in the enrichment of H3K9me3, H3K27me3 and H3K27ac ChIP-seq signal around TCF-1 binding events between pre-induced and TCF-1 RV NIH3T3 cells, we used the diffR function from normR package using an FDR threshold of 5e-2. More than 1,400 TCF-1 binding events colocalized with gains in both chromatin accessibility and H3K27ac with a corresponding loss of H3K27me3/H3K9me3 marks (D).

To further assess whether TCF-1 is also capable of erasing the repressive histone modifications, we mapped H3K27me3 and H3K9me3 repressive marks in addition to the active enhancer mark H3K27ac in TCF-1 expressing cells. We found that more than 1,400 TCF-1 binding events overlapping de novo open chromatin were associated with gain in H3K27ac and loss of H3K27me3 and/or H3K9me3 repressive marks at the center of TCF-1 binding (Figure 5C–D). Together, the integration of nucleosome mapping, chromatin accessibility, transcription factor binding, and histone modifications in fibroblasts suggested a fundamental role of TCF-1 in establishing de novo chromatin accessibility because of its ability to bind to previously repressed chromatin domains.

T cell-restricted genes are actively transcribed after TCF-1 expression

To evaluate whether the ectopic expression of TCF-1 and its widespread binding at over forty thousand genomic regions corresponded to any change in gene expression, we measured the transcriptional outputs in fibroblasts (Figure S6A). After TCF-1 transduction, we found that 1,477 genes were up-regulated but 1,295 genes were down-regulated (Figure S6B). To further assess the identity of these up- and down-regulated genes, we generated two gene sets containing top “T cell genes” and “fibroblast genes” by performing differential expression analysis in DP T cells and pre-induced fibroblasts. Using gene-set-enrichment analysis, we found that the fibroblast gene-set was enriched within the down-regulated genes, suggesting the repression of the fibroblast gene expression program by TCF-1 (Figure 6A). Conversely, the T cell gene set was enriched within genes up-regulated by TCF-1 (Figure 6B). The leading edge in this enrichment analysis included genes essential for T cell commitment and development including Bcl11b, Rorc, and Cd247 (Figure 6C). Together, our data suggest that TCF-1 can initiate the reprogramming of fibroblasts towards T cells.

Figure 6. T cell-specific genes innately repressed in fibroblasts are up-regulated by TCF-1 (see also Figure S6).

(A–C) Three replicates of RNA-seq in TCF-1 RV and Empty RV NIH3T3 cells were generated to assess the effects on gene expression. DESeq2 (fold-change > 2 and p-adj < 0.05) facilitated the differential gene expression analysis resulting in 1,295 down- and 1,477 up-regulated genes in TCF-1 RV NIH3T3 cells (see Figures S6A and S6B). Genes located in non-canonical chromosomes were removed from the lists. In addition, we applied differential gene expression analysis between Empty RV and DP T cells to establish cell specific gene expression (see STAR methods) which facilitated GSEA analysis of DEGs in TCF-1 RV NIH3T3 cells on the fibroblast gene set (A) and the T cell gene set (B). Leading edge analysis (C) in top T cell genes.

(D) Thymocyte-specific genes were defined using public ImmGen microarray data (see STAR methods) and the overlap tested between TCF-1 RV up-regulated genes in NIH3T3 (see Figure S6B) and thymocyte-specific genes. These genes were clustered using ImmGen microarray expression profiles (middle and right). Gene expression profiles of genes not overlapping thymocyte-specific genes but expressed in progenitors (597 genes) were also plotted (left).

(E–G) TCF-1 summits assigned to chromatin states (see Figure 5B) were linked to proximal genes (see STAR methods). (E) Enrichment of up-regulated genes by TCF-1 within each chromatin state. (F) Enrichment of T cell genes up-regulated by TCF-1 (B) in each chromatin state was compared to fibroblast genes (A).

(G) Genome-track depicts RNA-, ATAC- and MNase-seq as well as histone and TCF-1 ChIP-seq profiles in Ccr7 locus in NIH3T3 cells.

To examine whether TCF-1 up-regulated genes in fibroblasts had any relevance to transcriptional profiles during T cell development, we delineated ‘thymocyte-specific genes’ as a group of genes that were selectively expressed in at least one stage of T cell development but not in bone-marrow progenitors using the ImmGen expression data (Heng et al., 2008) (Figure 6D). We found that TCF-1 was capable of upregulating 81 thymocyte-specific genes with ontologies associated with tissue development, cell proliferation and immune system processes (Figures 6D and S6C). Examples included Bcl11b, Ikzf4, Il2rb, Klf4, and Rorc. An additional 597 genes up-regulated by TCF-1 were expressed at multiple cellular states (Figure 6D). It is well established that TCF-1 has recurring roles in T cell development, peripheral T cells and cells with stem properties (Im et al., 2016). We further evaluated the 1,477 genes up-regulated by TCF-1 in fibroblasts for their expression in hematopoietic progenitors together with naïve CD4+ and naïve, effector and memory CD8+ T cells using RNA-seq data (Figures S6D–E). After performing unsupervised clustering, we found that 753 genes were ordinarily expressed in one of these cell states. In particular, 475 genes (63%) including Ccr7, Il15ra, and Icosl were selectively expressed in the T cell program (Figures S6D–E). In addition, 42 genes that were up-regulated by TCF-1 in fibroblasts were selectively down-regulated in Tcf7−/− DP T cells, suggesting that TCF-1 is necessary and sufficient for transcription of these genes in multiple cell contexts (Figure S6F). Together, our data suggested that de novo open chromatin regions were invoked by TCF-1 to induce the T cell-specific gene expression program in fibroblasts.

Genes up-regulated by TCF-1 reside in previously repressed chromatin domains in fibroblasts

Our data in TCF-1 expressing fibroblasts led to two observations: (a) TCF-1 can generate chromatin accessibility at previously repressed domains and (b) TCF-1 can induce the expression of thousands of genes. To relate the chromatin state at the TCF-1 binding events to changes in transcriptional outputs in fibroblasts, we calculated the enrichment of up- and down-regulated genes among genes whose 5kb extended regions fell within TCF-1 binding events in different chromatin states. We found that the TCF-1-up-regulated genes were significantly enriched for TCF-1 binding events at chromatin domains with repressive chromatin marks (Figure 6E). Conversely, the TCF-1 down-regulated genes were mostly associated with promoters and the trivalent state with high H3K4me1 and H3K27ac surrounded by H3K9me3 (Figure S6G). A statistically significant proportion of genes were proximal to TCF-1 binding events that led to gain in H3K27ac and loss of H3K27me3/H3K9me3 modifications in contrast to those that did not alter the chromatin state (Figure S6H). In particular, genes of the T cell program were strongly enriched within genomic regions previously within repressed chromatin domains or harboring high nucleosome occupancy (Figures 6F and S6I). Examples of T cell genes ordinarily blanketed by repressive H3K27me3 and H3K9me3 in fibroblasts and actively transcribed after TCF-1 expression included Ccr7, the receptor required for cell trafficking within and out of the thymus, and Rorc, an essential transcription factor for T cell development (Figures 6G and S6J). Thus, TCF-1 can induce the expression of T cell genes in an unrelated non-hematopoietic cell type by accessing repressive chromatin domains and converting these regions to open, transcriptionally active loci.
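One way to link TCF-1 summits to proximal genes is sketched below. The sketch interprets the "5kb extended region" of a gene as its gene body extended by 5kb on both sides, which is an assumption, and the gene names, coordinates and function name are hypothetical.

```python
def genes_near_summits(genes, summits, extension=5000):
    """Return genes whose extended region contains at least one summit.

    genes:   dict mapping gene name -> (chromosome, start, end)
    summits: list of (chromosome, position) tuples
    """
    hits = set()
    for name, (chrom, start, end) in genes.items():
        lower, upper = start - extension, end + extension
        if any(c == chrom and lower <= pos <= upper for c, pos in summits):
            hits.add(name)
    return hits


# Hypothetical genes and summits, for illustration only.
toy_genes = {
    "GeneA": ("chr1", 10_000, 20_000),
    "GeneB": ("chr1", 100_000, 110_000),
    "GeneC": ("chr2", 10_000, 20_000),
}
toy_summits = [("chr1", 6_000), ("chr1", 50_000), ("chr2", 30_000)]
print(genes_near_summits(toy_genes, toy_summits))  # {'GeneA'}
```

The resulting gene sets, computed separately for the summits of each chromatin state, could then be tested for enrichment of up- or down-regulated genes.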

Discussion

It has been known for more than 2 decades that TCF-1 is a key transcription factor in T cell development (Verbeek et al., 1995). As a major mediator of NOTCH signaling in the specification of bone-marrow progenitors to a T cell fate, TCF-1 is required for the expression of transcription factors essential for T cell commitment and specification such as GATA3 and Bcl11b (Germar et al., 2011; Weber et al., 2011). Yet, it has been unclear whether the mechanism by which TCF-1 controls T cell fate is the specific transcriptional regulation of a small number of genes or whether this protein has a more fundamental role establishing the global epigenetic identity of T cells. Here, by reading between the ‘open’ lines of the genome during thymocyte development, we found that TCF-1 was the most enriched transcription factor at thousands of regulatory elements that became accessible at the earliest stage and persisted until T cell maturation. TCF-1 binding across the genome of fibroblasts led to gains in chromatin accessibility at genomic regions enriched with repressive marks. This ability of TCF-1 targeting repressed chromatin might be attributed to the ability of HMG proteins to introduce a strong bend into DNA (Love et al., 1995). A subset of TCF-1 binding events was also associated with gain of the active enhancer mark H3K27ac and loss of the repressive marks H3K27me3 and H3K9me3, corroborating the ability of TCF-1 in targeting silent chromatin. As a result of this epigenomic remodeling, hundreds of T cell-restricted genes including Ccr7, Bcl11b, and Rorc were induced in TCF-1 expressing fibroblasts. These results revealed a mechanism by which TCF-1 controls T cell fate through genome-wide programming of the epigenetic identity of T cells.

It has been shown that TCF-1 is essential for repressing CD4+ related genes in CD8+ T cells through intrinsic HDAC activity (Xing et al., 2016). Of all TCF-1 binding events that had differential accessibility in the absence of TCF-1, we found that a majority (80%) exerted an activating role (i.e., losing accessibility in Tcf7−/− cells) with a smaller number gaining accessibility, supporting this previously reported repressive role of TCF-1. Both gained and lost sites in our data were enriched with TCF-1 binding and TCF motif, suggesting the direct role of this transcription factor at recognizing its binding sites across the genome. While further analysis is required to examine the sequence features and epigenetic modifications classifying the activating versus repressive TCF-1 binding events, our work revealed the widespread role of TCF-1 at establishing de novo open chromatin during development and reprogramming.

Conrad Waddington proposed a metaphor for cellular differentiation coining the term “epigenetic landscape” and envisioning a cell rolling down a hill like a ball. Exploiting the single cell technology, we interrogated whether a lineage-determining transcription factor can exert harmonizing and coordinate impact on the chromatin of single cells following the T cell trajectory. To infer cell-to-cell variability on open chromatin associated with transcription factors, we developed an analytical method and found that TCF-1 target sites but not those of RUNX1 or GATA3 conferred the lowest cell-to-cell variability across individual T cells. Stated in a different way, open chromatin events that were highly conserved across single cells (revealed by single cell ATAC-seq) were likely to be causal to the identity of that cell type since, in this case, T cells appeared not to function effectively without TCF-1 driven epigenetic events. Despite the limitation that our knowledge of transcription factor binding is still gathered from bulk assays such as ChIP-seq, our data demonstrated a distinct pattern at genomic regions with TCF recognition sites and TCF-1 binding, suggesting the role of this transcription factor at coordinating the chromatin accessibility of individual cells.

Our data demonstrated that the TCF motif and TCF-1 binding events were strongly enriched at T-cell specific regulatory elements that became accessible early and persisted until T cell maturation. Furthermore, loss of TCF-1 selectively affected the accessibility of the early regulatory elements. These findings together with the early up-regulation of TCF-1 in T cell development and the ability of this protein to reprogram the gene expression profile of fibroblasts may describe TCF-1 as a “pioneer” transcription factor (Zaret and Carroll, 2011). Nonetheless, we propose that the epigenetic complexities and the requirement for combinatoriality among transcription factors suggest that lineage-determining transcription factors such as TCF-1 may require additional events to fully enact the program of cell lineage that they initiate (Oestreich and Weinmann, 2012). Here, we found that TCF-1 was endowed with an ability to target chromatin regions with repressive marks and in this manner, is more potent than the previously characterized pioneer factors in other developmental settings which are often impeded by heterochromatin (Soufi et al., 2012; Soufi et al., 2015). Nevertheless, not the entire collection of ~1 million TCF recognition sites were bound by TCF-1 in fibroblasts and only a fraction of the T cell-specific regulatory elements became accessible in this context. It is worth noting that no other transcription factor including the previously studied pioneer factors has been reported to bind to the entire set of possible binding sites present in the genome. We postulate that higher order chromatin conformation and epigenetic modifications such as DNA methylation may impede TCF-1 binding to the entire set of its cognate sites (Wohrle et al., 2007). Moreover, the three waves of chromatin remodeling during T cell development enriched with TCF-1 binding suggested multiple modes of action for this transcription factor. 
The regulatory elements in the intermediate wave that remained closed at an earlier stage may indicate a requirement for the cooperation between TCF-1 and its partners. Similarly, although more than a thousand TCF-1 binding events in fibroblasts erased the pre-existing repressive marks, the remaining TCF-1 binding events did not modify the fibroblasts’ endogenous chromatin state, indicating the requirement of cooperating partners at these regulatory sequences. The regulatory syntax that TCF-1 follows to read the genetic code may be ascertained by machine learning techniques delineating rules of transcription factor engagement from DNA sequence and shape, histone modifications, DNA methylation, and 3D genome organization during development and reprogramming. Collectively, our integrative data highlight a widespread means by which TCF-1 initiates the T lineage program through genome-wide epigenetic programming and induction of T cell identity genes.

STAR Methods

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Golnaz Vahedi (vahedi@pennmedicine.upenn.edu).

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Mice

Female C57BL/6J (CD45.2+) and B6.SJL-Ptprca Pepcb/BoyJ (CD45.1+) mice were purchased from the US National Cancer Institute animal facility. All mice analyzed were 6–12 weeks old and were used without randomization or ‘blinding’ of researchers to mouse or sample identity. Tcf7−/− (TCF-1−/− ΔVII) mice were kindly provided by A. Bhandoola (Verbeek et al., 1995). All animal work was approved by the Institutional Animal Care and Use Committee for the University of Pennsylvania in accordance with guidelines set forth by the NIH.

Cell Culture

NIH3T3 male cells were purchased from ATCC for this study and used at a low passage number (<12) and were maintained in high glucose DMEM 1x medium with L-glutamine and sodium pyruvate (Corning) with 100 U mL−1 penicillin and 100 μg mL−1 streptomycin (Gibco) and 10% bovine serum (Gibco). 293T (ATCC) cells were maintained in high glucose DMEM 1x medium with L-glutamine and sodium pyruvate (Corning), and 100 U mL−1 penicillin and 100 μg mL−1 streptomycin (Gibco) with 10% fetal calf serum (Gemini). All cells were grown at 37°C and 5% CO2.

METHOD DETAILS

Retroviral Transductions

Gateway compatible MSCV-IRES-VEX (MSCV-ccdB-VEX) and empty vector controls (MSCV-VEX) retroviral vectors were obtained from A. Bhandoola (Weber et al., 2011). Mouse Tcf7 cDNA (NM_009331) of the short isoform of TCF-1 (p33) was obtained from Origene and cloned into MSCV-ccdB-VEX (MSCV-TCF7-VEX) according to Gateway Clonase II instructions (Invitrogen). Sequences were verified using MacVector v15.5.0. Cells were transduced by addition of virions to culture media supplemented with polybrene at 8 μg mL−1 and 10 mM HEPES. As transduction efficiency in NIH3T3 was >99%, all assays on transduced NIH3T3 cells were performed without cell sorting.

Retroviral Packaging

293T cells were plated in 4 mL DMEM media in 10 cm dishes prior to transfection. Immediately prior to transfection, chloroquine was added to a final concentration of 25 μM. The retroviral construct and the pCL-Eco plasmid were transiently co-transfected using Lipofectamine 3000 (Invitrogen). The cells were returned to the incubator for 6 hours. Subsequently, the medium was changed to fresh media. Virions were collected 24 and 48 hr after transfection, snap-frozen, and stored at −80°C for future use.

Assay for Transposase-Accessible Chromatin (ATAC)

ATAC-seq was performed as previously described with minor modifications (Buenrostro et al., 2013). Fifty thousand cells were pelleted at 550 x g and washed with 1 mL 1x PBS, followed by treatment with 50 μL lysis buffer (10 mM Tris-HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). After pelleting nuclei, the pellets were resuspended in 50 μL transposition reaction with 2.5 μL Tn5 transposase (FC-121-1030; Illumina) to tag and fragment accessible chromatin. The reaction was incubated in a 37°C water bath for 45 minutes. Tagmented DNA was purified using a MinElute Reaction Cleanup Kit (Qiagen) and amplified with 12 cycles of PCR. Libraries were purified using a QIAQuick PCR Purification Kit (Qiagen). Libraries were paired-end sequenced (38bp+37bp) on a NextSeq 550 (Illumina). For accessibility in NIH3T3 cells, two biological replicates were performed at both 48 and 96 hr time points after transduction. Three technical replicates were performed between WT and TCF-1 KO DP T cells.

Single Cell ATAC

Single cell ATAC-seq was performed as previously described (Buenrostro et al., 2015) using the C1 Single-Cell Auto Prep System with the C1 Open App program (Fluidigm). Briefly, cells were FACS sorted to high viability and purity. Cells were then stained with mammalian LIVE/DEAD Viability/Cytotoxicity Kit (Invitrogen) for 10 minutes on ice at a final concentration of 5 μM Ethidium homodimer-1 and 5 μM Calcein AM in 1x PBS. After staining, cells were diluted in RPMI-1640 to a concentration of 400,000 cells mL−1. C1 Cell Suspension Reagent (Fluidigm) was added to a final concentration of 20%. Brightfield and fluorescent images of each capture site were taken with a Leica DMi8. The Lysis/Tagmentation step in the C1 protocol was lengthened to a duration of 60 minutes using the Open App software (Fluidigm). After single cell ATAC-seq chemistry was performed on the Fluidigm C1, tagmented DNA was harvested and amplified for 14 PCR cycles (Fluidigm). Libraries were paired-end sequenced (38bp+37bp) on a NextSeq 550. Three captures of DP T cells were performed over the course of this study.

Chromatin Immunoprecipitation (ChIP) assay

Briefly, chromatin samples prepared from fixed cells were immunoprecipitated with antibodies recognizing mouse TCF-1 (C46C7; CST), H3K9me3 (AM39161; Active Motif), and H3K27me3 (07-449; EMD Millipore). Antibody-chromatin complexes were captured with protein G–conjugated beads, washed, and eluted. After reversal of cross-linking, RNase and proteinase K treatment were performed and DNA was purified and quantified for library preparation. Input sample was prepared by the same approach without immunoprecipitation. Libraries were then prepared using the Ultra DNA Library Prep Kit (NEB). Two replicates were performed for each condition. Indexed libraries were validated for quality and size distribution using a TapeStation 2200 (Agilent). Single end sequencing (75 bp) was performed on a NextSeq 550.

RNA-seq

Cells were washed once with 1x PBS before resuspending pellet in 350 μL Buffer RLT Plus (Qiagen) with 10% 2-Mercaptoethanol (Sigma), vortexed briefly, snap-frozen on dry ice, and stored at −80°C. Subsequently, total RNA was isolated using the RNeasy Plus Micro Kit (Qiagen). RNA integrity numbers were determined using a TapeStation 2200 (Agilent), and all samples used for RNA-seq library preparation had RIN numbers greater than 9.5. Libraries were prepared using the SMARTer® High-Input Strand-Specific Total RNA-seq for Illumina kit (Clontech). Libraries were single-end sequenced (75 bp) on a NextSeq 550. Three biological replicates were performed for TCF-1 RV and Empty RV transduced NIH3T3 cells. Two technical replicates were performed in WT and TCF-1 KO DP T cells.

Cell staining and flow cytometry

Single-cell suspensions were prepared from thymi of mice by dissociation of tissue through 70 μm mesh filters (Falcon) in RPMI 1640 (Corning) +1% FBS (Gemini), and cells were surface-stained following standard protocols. The fluorochrome-conjugated, anti-mouse antibodies were as follows: PE CD4 (RM4-4), APC CD8a (53-6.7), PE c-Kit (2B8), APC CD25 (PC61), and Streptavidin BV605. For intracellular detection of TCF-1 in RV-transduced NIH3T3, cells were harvested after trypsin dissociation (Gibco), fixed with 1% PFA for 10 minutes on ice to preserve VEX signal, fixed and permeabilized with the FoxP3/Transcription Factor Staining Buffer Set (eBioscience), and incubated with PE-conjugated anti-TCF-1 (S33-966). All antibodies used for flow cytometry were purchased from BioLegend or BD Biosciences. Data were collected on an LSRII running DIVA software (BD Biosciences) and were analyzed with FlowJo software v10.2 (TreeStar).

Cell sorting

Antibodies used in the lineage cocktail (Lin) include biotinylated antibodies against B220 (RA3-6B2), CD19 (1D3), CD11b/Mac1 (M1/70), Gr1 (8C5), CD11c (HL3), NK1.1 (PK136), TER119 (TER-119), CD3ε (2C11), CD8α (53-6.7), CD8β (53-5.8), TCRβ (H57), γδ TCR (GL-3). After surface staining with the lineage cocktail, cells were incubated with Streptavidin Microbeads (Miltenyi Biotec). DN cells were then negatively isolated from total thymocytes using magnetic separation columns (Miltenyi Biotec). Negatively selected cells were then stained with c-Kit and CD25 followed by Streptavidin BV605 to reveal escaping Lin+ cells. The DN3 population was defined and cell-sorted as Lin Kit CD25+. Total thymocytes were stained with CD4 and CD8 to define and sort the CD4+ CD8+ DP population. Dead cells were excluded through 7-amino-actinomycin D (7-AAD) uptake. Doublets were excluded through forward scatter–height by forward scatter–width and side scatter–height by side scatter–width parameters. Purity was verified after sorting, and all cell populations were sorted to a purity of >98%. Sorting was performed on a FACS Aria II (BD Biosciences), and data were analyzed with FlowJo v10.2 (TreeStar).

High-throughput sequencing data pre-processing

Quality assessment of raw reads was achieved with FastQC and contaminants were removed using Trimgalore with parameters ‘-q 15 --length 20 --stringency 5’. For RNA-Seq samples, ‘--clip_R1 3’ was added to the Trimgalore parameters facilitating the removal of the 3nt bias introduced to the 5′ end of reads. Human (GRCh37, November 17 2015) and mouse (GRCm38, May 23 2014) reference genomes were downloaded from UCSC repository and mouse gene models were derived from Gencode vM11.

Bulk ATAC-seq samples were mapped to the reference genomes using Bowtie2 v2.2.9 (Langmead and Salzberg, 2012) with ‘-X 2000’. STAR v2.5 (Dobin et al., 2013) was used for aligning single-cell ATAC, RNA, ChIP and MNase-seq reads with parameters specifically optimized based on the properties of each protocol. RNA-seq samples were analyzed with parameters ‘--outFilterMultimapNmax 1 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignEndsType Local’. ChIP-seq raw reads, in contrast, were aligned with parameters ‘--alignSJDBoverhangMin 999 --alignSJoverhangMin 999 --alignIntronMax 1 --outFilterMultimapNmax 1 --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 --alignEndsType Local’ to disable the use of known splice junctions and to prevent the calling of novel ones. The same parameters were also applied for mapping scATAC-seq and MNase-seq data, combined with ‘--alignMatesGapMax 2000’, which limits the distance between aligned read mates to 2,000bp.
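The per-assay alignment settings listed above can be organized as in the following sketch, which only assembles command-line argument lists and does not run the aligners. The alignment flags are those quoted in the text; `--genomeDir`, `--readFilesIn`, `-x`, `-1`, `-2` and `-S` are standard STAR and Bowtie2 options added so that the commands are complete, and all paths, file names and function names are placeholders.

```python
# STAR options quoted in the text to avoid spliced alignments.
STAR_NO_SPLICING = [
    "--alignSJDBoverhangMin", "999",
    "--alignSJoverhangMin", "999",
    "--alignIntronMax", "1",
]
# STAR options shared by all assays aligned with STAR.
STAR_SHARED = [
    "--outFilterMultimapNmax", "1",
    "--outFilterScoreMinOverLread", "0",
    "--outFilterMatchNminOverLread", "0",
    "--alignEndsType", "Local",
]


def star_command(assay, genome_dir, reads):
    """Build a STAR argument list for 'rna', 'chip', 'scatac' or 'mnase'."""
    cmd = ["STAR", "--genomeDir", genome_dir, "--readFilesIn", *reads]
    if assay == "rna":
        return cmd + STAR_SHARED
    if assay == "chip":
        return cmd + STAR_NO_SPLICING + STAR_SHARED
    if assay in ("scatac", "mnase"):
        return cmd + STAR_NO_SPLICING + STAR_SHARED + ["--alignMatesGapMax", "2000"]
    raise ValueError(f"unknown assay: {assay}")


def bowtie2_command(index_prefix, mate1, mate2, output_sam):
    """Build a Bowtie2 argument list for paired-end bulk ATAC-seq reads."""
    return ["bowtie2", "-X", "2000", "-x", index_prefix,
            "-1", mate1, "-2", mate2, "-S", output_sam]


# Placeholder paths, for illustration only.
print(" ".join(star_command("chip", "star_index", ["chip_R1.fastq.gz"])))
print(" ".join(bowtie2_command("bt2_index", "atac_R1.fastq.gz",
                               "atac_R2.fastq.gz", "atac.sam")))
```

Keeping the shared and assay-specific options in separate lists makes it easy to see that the ChIP-seq, scATAC-seq and MNase-seq settings differ from the RNA-seq settings only in the splice-related options and the mate gap limit.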

Reads aligned to the mitochondrial genome as well as reads mapping to multiple genomic loci were discarded from downstream analyses. Additionally, Picard was used to minimize PCR amplification bias in ATAC-, ChIP- and MNase-seq samples. For paired-end MNase-seq samples, fragments smaller than 75bp were also filtered out.

ATAC-seq samples derived from single DP T cells were filtered using previously described quality standards (Buenrostro et al., 2015). In brief, libraries containing fewer than 10,000 fragments or libraries with less than 15% of their fragments falling in open chromatin (as defined in the single cell accessibility section) were removed from subsequent analyses (Figure S3C–D).
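
The two library-level criteria above can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Library` record and its field names are hypothetical, and the per-library counts are assumed to have been computed beforehand.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else is illustrative.
MIN_FRAGMENTS = 10_000
MIN_FRACTION_OPEN = 0.15


@dataclass
class Library:
    cell_id: str              # hypothetical identifier of the single cell
    n_fragments: int          # total fragments in the library
    n_in_open_chromatin: int  # fragments overlapping open chromatin


def passes_qc(lib: Library) -> bool:
    """Keep libraries with >=10,000 fragments and >=15% of them in open chromatin."""
    if lib.n_fragments < MIN_FRAGMENTS:
        return False
    return lib.n_in_open_chromatin / lib.n_fragments >= MIN_FRACTION_OPEN


def filter_libraries(libs):
    """Return only the libraries that satisfy both criteria."""
    return [lib for lib in libs if passes_qc(lib)]
```

A library is dropped if it fails either criterion, which matches the "or" in the description.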

QUANTIFICATION AND STATISTICAL ANALYSIS

All statistical analyses were performed using packages from R’s basic installation.

Differential gene expression analysis

HTSeq v0.6.1 (Anders et al., 2015) was used to count RNA-seq reads on Gencode vM11 gene models with parameters ‘-s yes -t exon -m intersection-nonempty’. DESeq2 (Love et al., 2014) was subsequently applied on gene counts to identify genes differentially expressed between DP WT and DP Tcf7−/− (Figure S2C), NIH3T3 Empty RV and NIH3T3 TCF-1 RV (Figure S6B) as well as DP TCF-1 WT and NIH3T3 Empty RV cells after removing entries that exhibited zero counts in all replicates (Table S4). The quality of replicates was assessed by calculating the pairwise Spearman correlation coefficient (Figures S2B and S6A) as well as by plotting the variability explained by the first two principal components (data not shown).

Additionally, gene expression levels were calculated in a variety of cell types ranging from hematopoietic stem cells to effector and memory T cells (Table S4) and normalized using the variance stabilizing transformation (VST) (Love et al., 2014). K-means (k=12) clustering was then applied on the VST expression values of genes up-regulated by TCF-1 in NIH3T3 cells to identify cell state specific patterns (i.e., clusters) of TCF-1 regulated gene expression. For the same set of genes, we also calculated RPKM normalized expression values that were used to filter out lowly expressed genes (RPKM < 0.5 in all samples) and to visualize the clusters (Figure S6D–E). Cluster 1 was removed from the analysis due to low expression levels in all hematopoietic lineages. Genes downregulated in the Tcf7−/− DP T cells were overlapped with the genes up-regulated by TCF-1 RV in NIH3T3 and the significance of the overlap was tested by Fisher’s exact test (Figure S6F).
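
As an illustration of how the significance of a gene-set overlap can be tested, the sketch below computes a one-sided Fisher's exact test as the upper tail of the hypergeometric distribution. The choice of a one-sided test and of the gene universe are assumptions made here; the text does not specify either, and the function name is hypothetical.

```python
from math import comb


def overlap_pvalue(n_universe: int, n_a: int, n_b: int, n_overlap: int) -> float:
    """One-sided Fisher's exact test for the overlap of two gene sets.

    n_universe: number of genes considered overall (assumed background)
    n_a, n_b:   sizes of the two gene sets
    n_overlap:  number of genes present in both sets
    Returns P(overlap >= n_overlap) under random sampling without replacement.
    """
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large gene universes; only the final ratio is converted to a float.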

Defining thymocyte-specific gene program

Normalized microarray expression data for bone marrow stem cell and thymocyte populations was downloaded from the Immunological Genome Project Consortium (Heng et al., 2008). Microarray probe IDs (affy mogene 1.0st v1) were converted to Ensembl gene IDs using the Ensembl mouse gene mart (GRCm38.p5) in biomaRt (Aken, Oxford Database). Genes were considered expressed in a population if expression values were above 120 indicating >95% probability of true expression (Ericson, ImmGen guideline). To define thymocyte-specific genes (Figure 6D), genes were filtered based on expression values lower than 120 in all considered progenitor populations (LT-HSC, ST-HSC, MPP, CLP) and with expression values higher than 120 in at least 1 thymocyte population (ETP, DN2a, DN2b, DN3a, DN3b, DN4, ISP, DP, CD4+, CD8+). Genes were further filtered based on having at least a 2-fold increase in expression between any two populations. The overlap of thymocyte-specific genes and genes up-regulated by TCF-1 RV in NIH3T3 (Table S4) was determined using the GeneOverlap package. Genes up-regulated by TCF-1 RV in NIH3T3, described in previous sections, but not overlapping with thymocyte-specific genes were filtered based on expression >120 in at least one progenitor population and plotted (Figure 6D). Thymocyte genes were grouped into patterns of expression by combining thymocyte-specific genes with both overlapping and non-overlapping with genes up-regulated by TCF-1 RV in NIH3T3 and performing k-means clustering using 5 centers. Gene ontology analysis (Figures S1E and S6C) was performed using the Gene Ontology gene set collection in MSigDB database v6.1.
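
The thymocyte-specific filter described above can be sketched as follows. This is a minimal illustration under stated assumptions: expression values are held in a plain dictionary keyed by population name, values are positive, and the 2-fold criterion is interpreted as the maximum being at least twice the minimum across all considered populations. The function name is hypothetical.

```python
PROGENITORS = ["LT-HSC", "ST-HSC", "MPP", "CLP"]
THYMOCYTES = ["ETP", "DN2a", "DN2b", "DN3a", "DN3b", "DN4", "ISP", "DP", "CD4+", "CD8+"]
THRESHOLD = 120.0  # expression cutoff stated in the text


def is_thymocyte_specific(expr: dict) -> bool:
    """expr maps population name -> normalized microarray expression value."""
    prog = [expr[p] for p in PROGENITORS]
    thym = [expr[t] for t in THYMOCYTES]
    # Below the cutoff in every progenitor population.
    if not all(v < THRESHOLD for v in prog):
        return False
    # Above the cutoff in at least one thymocyte population.
    if not any(v > THRESHOLD for v in thym):
        return False
    # At least a 2-fold difference between some pair of populations.
    values = prog + thym
    return max(values) >= 2 * min(values)
```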

Peak calling

Following ENCODE guidelines, for the characterization of reproducible TCF-1 peaks in NIH3T3 TCF-1 RV cells, macs2 v2.1.1 (Zhang et al., 2008) was initially applied separately on each of the two ChIP-seq replicates as well as after merging both replicates with parameters ‘--nomodel --extsize 300 --keep-dup all --call-summits -q 0.9’ using the TCF-1 ChIP-seq on NIH3T3 Empty RV cells as control. The identified peaks were filtered with Irreproducible Discovery Rate (IDR) v2.0.2 (Grant et al., 2011) using an IDR threshold of 2e-2 resulting in a high-quality set of 40,562 reproducible peaks. TCF-1, GATA3, RUNX1 and PU.1 binding sites in mouse thymocytes were identified by applying macs2 with parameters ‘-p 1e-3 -q 0.05’ using the corresponding Input samples as control resulting in 56,817 TCF-1 peaks, 54,475 GATA3, 67,915 RUNX1, 98,036 PU.1 in DN1 and 92,660 in DN2a. A proximity-based strategy was adopted for linking genes to regulatory elements and transcription factor binding sites. Gene models were downloaded from Gencode M11 and both ends of each gene were extended by 5kbp. Open chromatin sites identified by ATAC-seq as well as ChIP-seq derived transcription factor binding sites were assigned to genes if they were found to overlap with their extended models.
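
The proximity-based linking of peaks to genes can be sketched as an interval-overlap check against gene models extended by 5kbp on both ends. The tuple layouts, half-open coordinates and function name are assumptions for illustration; a real analysis would typically use an interval library or BEDTools rather than this quadratic loop.

```python
def assign_peaks_to_genes(genes, peaks, flank=5_000):
    """Link peaks to genes whose extended model they overlap.

    genes: list of (gene_id, chrom, start, end)
    peaks: list of (peak_id, chrom, start, end)
    Coordinates are assumed to be half-open [start, end).
    Returns a list of (peak_id, gene_id) pairs.
    """
    links = []
    for gene_id, g_chrom, g_start, g_end in genes:
        ext_start = max(0, g_start - flank)  # extend the upstream end
        ext_end = g_end + flank              # extend the downstream end
        for peak_id, p_chrom, p_start, p_end in peaks:
            if p_chrom == g_chrom and p_start < ext_end and p_end > ext_start:
                links.append((peak_id, gene_id))
    return links
```

Because a peak can fall within the extended models of several genes, the result is a many-to-many list of links rather than a one-to-one mapping.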

Differentially accessible chromatin between DP WT and Tcf7−/− as well as between NIH3T3 TCF-1 and Empty RV cells

Macs2 with ‘-p 1e-7 --nolambda --nomodel’ was applied on each DP WT and DP Tcf7−/− ATAC-seq replicate (Table S3) separately to identify accessible chromatin. Peaks were subsequently merged using BEDTools (Quinlan and Hall, 2010) and ATAC-seq read counts were calculated in the merged peaks for every replicate. The resulting count table was used to identify 6,165 (1,165 presenting more and 5,000 less enrichment in DP Tcf7−/−) loci differentially enriched in ATAC-seq signal between DP WT and DP Tcf7−/− with DESeq2 (Table S3) after applying cutoffs of 0.001 on the p-value and 0.58 on the logFC (Figure 2A).

The same approach was applied in NIH3T3 cells (Figure S4D) to identify 8,506 genomic regions with differential ATAC-seq signal enrichment between Empty and TCF-1 RV (6,882 presenting more and 1,618 less enrichment in TCF-1 RV, using cutoffs of 2 and 0.001 for fold-change and p-value respectively; Table S3).

Characterization of cell-state specific accessible chromatin

An IDR threshold of 5e-2 was used, following the pipeline described in previous section, to identify accessible chromatin for every murine ATAC-seq sample (HSC, MPP, CLP, B, NK and all stages of T cell development from DN1 to naïve CD4+ and naïve CD8+ cells). Peaks were merged and filtered based on their overlap with annotated promoters (Gencode M11 TSSs extended by +4kb/−2kb) resulting in a collection of 55,481 distal regulatory elements. The FDR value of each peak in every cell type was used as a proxy for the level of accessibility.

Each peak was assigned a 13-dimensional vector containing the ATAC-seq enrichment proxy in every cell type. The Average Silhouette Width (ASW) statistic was used to decide on the number of clusters prior to applying k-means. The initial set of regulatory regions was reduced after removing the members of clusters 7, 8, 10, 11, 13, 17 and 23 (Figure S1A, Table S1). The remaining 35,869 loci were re-clustered after re-calculating ASW (data not shown) to produce the final set of groups (Figure 1A, Table S2). The normalized (TPM) ATAC-seq profile of every regulatory element was calculated by segmenting a +/− 2,000bp window around its center into 10bp bins and calculating the normalized overlapping ATAC-seq tag counts (Figures 1B and S1B). De novo motif analysis using Homer with ‘-size given -len 6,8,10’ was applied on each cluster separately using the excluded set of clusters as background (Figure 1C–E, Figure S1C). Additionally, the odds ratio and percentage of binding of TCF-1, GATA3, RUNX1 and PU.1 (DN1 and DN2a) were calculated for each cluster based on publicly available ChIP-seq data (Figure 1F, Table S2).

An alternative approach was used for identifying T cell specific accessible chromatin in human cells (Figure S1D). The lack of replicates for certain cell types restricted the use of IDR. Therefore, macs2 with parameters ‘-p 1e-7 --nolambda --nomodel’ was used for every cell type (HSC, MPP, CLP, B, NK, Naïve CD4+ and Naïve CD8+ cells) on each replicate separately. Peaks were merged with BEDTools and normalized ATAC-seq enrichment for every cell type was calculated after merging the replicate samples within each cell type. Gencode M11 gene models were used to separate the set of ATAC-seq peaks into distal and promoter related loci after extending the annotated gene transcription start sites by −4kb/+2kb.

Each peak was assigned a 7-dimensional vector containing the normalized ATAC-seq enrichment in every cell type. Within Sum of Squares (WSS) statistic was used (data not shown) for deciding on the number of clusters prior to applying k-means (k=10 for the distal sets and k=5 for the promoter sets). De novo motif analysis using Homer with ‘-size given -len 6,8,10,12’ was applied on each cluster separately with remaining peaks in other clusters as background (Figure S1D).

Querying chromatin accessibility at the single-cell level

To assess whether TCF-1 binding events harbor the strongest chromatin accessibility as measured by ATAC-seq in DP T cells, we measured genome-wide binding of TCF-1, RUNX1 and GATA3 by ChIP-seq as previously described. An equal number of genomic regions with unique binding of each transcription factor were subsampled and the normalized tag count enrichment from ATAC-seq in DP T cells facilitated the comparison of the 3 regulatory proteins (Figure 3A).

Based on this analysis, TCF-1 bound open chromatin was found to exhibit the highest levels of accessibility compared to RUNX1 and GATA3. This observation inspired us to further investigate with a single cell analysis. ATAC-seq data from 110 single DP T cells passing previously defined (Buenrostro et al., 2015) quality standards (Figure S3D) were utilized to test the hypothesis that TCF-1 exerts a deterministic effect on the chromatin, forcing T cell fate commitment. Following preprocessing and alignment, DP single cell ATAC-seq reads were merged and using macs2 with parameters ‘-p 1e-3’, 22,774 accessible sites were identified.

To assess the correlation between aggregated single-cell and bulk ATAC-seq, enriched sites identified from both experimental procedures were merged. Normalized enrichment was subsequently calculated in bulk (downsampled to 11.6 million reads using samtools) and aggregated scATAC-seq (11.6 million reads), enabling quantification of the correlation between the two assays (Figure 3B). Our objective was to assess whether TCF-1-bound open chromatin had lower accessibility variance than background noise and chromatin bound by RUNX1 or GATA3. To this end, we generated 4 disjoint sets comprising ATAC-seq peaks uniquely bound by TCF-1, RUNX1 or GATA3, as well as peaks not bound by any of these three transcription factors. For each subset, binarized accessibility matrices were calculated based on the overlap between the identified peaks and ATAC-seq reads from each cell, such that 1 denotes an accessible and 0 an inaccessible region.
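
A binarized accessibility matrix of this kind can be sketched as below. The input layout (per-cell lists of read intervals, half-open coordinates) and the function name are assumptions for illustration only.

```python
def binarize_accessibility(peaks, reads_per_cell):
    """Build a cell-by-peak 0/1 matrix.

    peaks:          list of (chrom, start, end)
    reads_per_cell: dict mapping cell_id -> list of (chrom, start, end) reads
    Returns a dict mapping cell_id -> list of 0/1 values, one per peak,
    where 1 means at least one read from that cell overlaps the peak.
    """
    matrix = {}
    for cell, reads in reads_per_cell.items():
        row = []
        for chrom, start, end in peaks:
            hit = any(
                r_chrom == chrom and r_start < end and r_end > start
                for r_chrom, r_start, r_end in reads
            )
            row.append(1 if hit else 0)
        matrix[cell] = row
    return matrix
```

Each row of the result is the binary accessibility vector of one cell, which is the object compared between cells in the variance analysis.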

TCF-1 binding events overlapped with more ATAC-seq peaks than RUNX1 or GATA3, therefore we subsampled 30 peaks from each TF-bound peak set. We repeated the subsampling process 500 times to increase accuracy. We then calculated the accessibility variance between cells at each subsample as follows. For each subsample, the binary accessibility vector of each cell formed a 30-dimensional vector. To measure cell to cell differences in accessibility levels, we calculated the pairwise Manhattan distance between accessibility vectors, forming a distance matrix. Where p and q are n-dimensional vectors:

$$\mathrm{Manhattan}(p,q)=\sum_{i=1}^{n}\left|p_i-q_i\right|$$
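
For concreteness, the pairwise Manhattan distance between binary accessibility vectors can be computed as in the following sketch (function names are illustrative, not taken from the study's code):

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Manhattan distances between all vectors."""
    n = len(vectors)
    return [[manhattan(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
```

For 0/1 vectors this distance equals the number of peaks at which two cells disagree in accessibility.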

We subsequently centered the Manhattan distance matrix by subtracting column and row means and adding the overall mean. Then we spectrally decomposed the centered matrix to define principal coordinates, and mapped all accessibility vectors to full principal coordinate space. We identified the location that minimized the average distance to all vectors, termed the spatial median (Figure 3D). Then, we calculated each vector’s distance from the spatial median. Finally, we calculated the average distance from accessibility vectors to the spatial median using the R package vegan (Oksanen 2017).
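
The centering step (subtracting row and column means and adding the overall mean) can be sketched as below. Note one assumption: in classical principal coordinate analysis this centering is usually applied to −½ times the squared distances, and the text does not state whether that transform was applied first, so the function simply centers whatever square matrix it is given. The function name is hypothetical.

```python
def double_center(d):
    """Subtract row and column means from a square matrix and add the grand mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]
```

After centering, every row and every column sums to zero, which is the property required before the spectral decomposition.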

Correction for Technical Biases

Variation associated with technical factors such as GC content and mean accessibility differences can often introduce obstacles in interpreting NGS data. To overcome such limitations, for every original peak, we selected 30 “technical control” peaks. The set of peaks not bound by any TF was divided into 2-percentiles based on GC content. Every original peak was subsequently placed into a 2-percentile, and 30 technical control peaks within a 2-percentile of GC content were randomly subsampled with replacement. All technical control peaks were also within +/− 0.01 of the overall mean accessibility of the original 30 peaks.

We controlled for technical biases as follows:

$$\mathrm{ControlledVariation}=\frac{\mathrm{OldVariation}}{\mathrm{Mean}(30\ \mathrm{TechnicalVariations})}$$

We repeated this for every one of the 500 subsamples for every TF. Then we took the average.

Correction for Background Noise

To measure accessibility variation beyond background noise, we calculated accessibility variation (with technical controls) for 500 randomly selected subsamples of peaks bound by no TF. This can be viewed as a negative control. Then we accounted for background noise as follows:

$$\mathrm{FinalVariation}=\frac{\mathrm{Mean}(500\ \mathrm{ControlledVariations})}{\mathrm{Mean}(500\ \mathrm{BackgroundVariations})}$$

A variability equal to 1 implied that a TF was associated with no more variation than background noise. A variability below 1 implied that a TF was associated with less variation than background noise, and a variability above 1 implied greater variation than background noise.
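
The two normalization steps can be written out as a short sketch. The function names are illustrative, and the variation values are assumed to have been computed already (for example as the average distance to the spatial median for each subsample).

```python
from statistics import mean


def controlled_variation(old_variation, technical_variations):
    """Variation of one subsample divided by the mean of its technical controls."""
    return old_variation / mean(technical_variations)


def final_variation(controlled_variations, background_variations):
    """Mean controlled variation of a TF relative to that of unbound background peaks."""
    return mean(controlled_variations) / mean(background_variations)


def interpret(value, tolerance=1e-9):
    """Translate the final ratio into the interpretation given in the text."""
    if abs(value - 1.0) <= tolerance:
        return "same as background"
    return "less than background" if value < 1.0 else "greater than background"
```

In the procedure described above, `controlled_variation` would be called once per subsample with its 30 technical-control variations, and `final_variation` once per TF with the 500 resulting values.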

In addition to the methodology described above, we also applied chromVAR for assessing the deterministic effect of TCF-1 on shaping the chromatin landscape during T cell development (Figure 3E).

ChIP-seq oriented approach for assessing the deterministic effect of TCF-1 during T cell development

An alternative, unbiased strategy was also adopted which, unlike the previous approach, was not formed on the basis of TCF-1 binding (Figure 3F). The T cell specific sites in cluster 9 (Figure 1A) were ranked based on the sum of binary counts across individual T cells. Using default parameters of bedtools intersect, the overlap of regions with ChIP-seq signal from transcription factors known to be important in T cell development such as TCF-1, GATA3 and RUNX1 was assessed. De novo motif analysis was performed on the top and bottom 100 enhancers that exhibit the highest and lowest homogeneity respectively at the single cell level using Homer with parameters ‘-size given -len 6,8,10,12’. Background control in this motif analysis was any other open chromatin sites in DP T cells. The top/bottom 100 enhancers were also linked to genes based on proximity (<10kbp) in order to enable GO term enrichment analysis using the GSEA software.

Identifying the nucleosome occupancy level on TCF-1 binding sites

MNase-seq in mouse embryonic fibroblasts was used as a proxy for observing the nucleosome enrichment surrounding TCF-1 binding sites. To this end, the region around TCF-1 peak summits was divided into 3 windows of 200bp each (−300/−100, −100/+100 and +100/+300), following an upstream-central-downstream rationale. The nucleosome enrichment in every window was approximated by calculating the number of overlapping MNase-seq reads after extending their 3′ end to 147bp and normalizing based on the number of uniquely mapped reads in each sample. TCF-1 summits were subsequently ranked from high to low enrichment by summing the values of left, central and right windows. For visualization purposes, the normalized MNase-seq enrichment was also calculated for 10bp non-overlapping bins spanning the +/− 3kb region centered around TCF-1 summits (Figure 4A). In the case of mouse embryonic fibroblasts, visualizing the prior nucleosome enrichment status on the genomic loci bound by TCF-1 after Tcf7 retroviral transduction clearly suggests that TCF-1 binding occurs on: a) nucleosome dense, b) nucleosome free and c) regions of intermediate nucleosome occupancy (Figure 4A). Instead of choosing an arbitrary threshold on the ratio of central versus left and right window nucleosome enrichment, k-means (k = 3) clustering was applied resulting in the formation of 3 TCF-1 summit groups and validating the previously described observation (Figure S4C). A total of 29,661 (73.2%) TCF-1 binding events occur on sites with dense (10,593, 26.2%) or intermediate (19,068, 47%) nucleosome enrichment and 10,901 (26.8%) on nucleosome free regions. The Dpos module from Danpos2 (Chen et al., 2013) was applied on the MNase-seq data with default settings to identify nucleosome positioning as well as to calculate the nucleosome enrichment profile on a genome-wide scale. Regions called as nucleosomes exhibiting increased fuzziness (Dpos score less than 80) were removed from subsequent analyses.

The distance of 40,562 TCF-1 summits in mouse embryonic fibroblasts to the closest nucleosome summit was calculated as an alternative strategy of assessing the ability of TCF-1 to directly bind on nucleosomes (Figure 4B). The typical length of DNA fragments wrapped around nucleosomes is 147bp. This allowed us to classify 27,145 TCF-1 summits (66.9%) located less than 75bp (vertical dashed red line) away from nucleosome summits as bound to nucleosomes and 13,417 (33.1%) summits as unbound. As a control, we applied the same bound/unbound to nucleosomes classification scheme on CTCF summits derived from analyzing public ChIP-seq data, resulting in 20,370 (56.6%) bound and 15,616 (43.4%) unbound summits.
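
The bound/unbound classification by distance to the closest nucleosome summit can be sketched as follows. For simplicity the sketch assumes positions on a single chromosome and a non-empty list of nucleosome summits; function names are illustrative.

```python
from bisect import bisect_left


def nearest_distance(pos, sorted_positions):
    """Distance from pos to the closest value in a non-empty sorted list."""
    i = bisect_left(sorted_positions, pos)
    candidates = sorted_positions[max(0, i - 1): i + 1]
    return min(abs(pos - c) for c in candidates)


def classify_summits(tf_summits, nucleosome_summits, max_dist=75):
    """Split TF summits into nucleosome-bound (< max_dist bp away) and unbound."""
    nuc = sorted(nucleosome_summits)
    bound, unbound = [], []
    for summit in tf_summits:
        if nearest_distance(summit, nuc) < max_dist:
            bound.append(summit)
        else:
            unbound.append(summit)
    return bound, unbound
```

The 75bp cutoff is roughly half of the 147bp of DNA wrapped around a nucleosome, so a summit closer than that to a nucleosome summit lies within the nucleosomal footprint.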

To assess the difference of nucleosome occupancy level around TCF-1 ChIP-seq peak summits between Empty RV and TCF-1 RV NIH3T3 cells, TCF-1 summits (IDR less than or equal to 0.02) were intersected with ATAC-seq enriched regions in both conditions. Summits overlapping ATAC-seq peaks in either set (n=15,763) were extended by +/− 500 bases and nucleosome occupancy in Empty and TCF-1 RV NIH3T3 cells was measured using NucleoATAC algorithm (Schep et al., 2015). NucleoATAC infers nucleosome enrichment by integrating large and small ATAC-seq fragment positioning in accessible chromatin. Therefore, to quantitate nucleosome enrichment around TCF-1 summits with NucleoATAC algorithm, ATAC-seq signal from both Empty and TCF-1 RV NIH3T3 samples is required. Out of 15,763 queried summits 7,395 were found to exhibit at least 1.5 fold-change difference in nucleosome occupancy signal between Empty and TCF-1 RV NIH3T3 cells (Figure S4F).

T cell gene enrichment in nucleosome enriched based clusters of TCF-1 summits

Based on the previously described analysis of the pre-induced nucleosome enrichment levels around TCF-1 ChIP-seq derived binding events in TCF-1 RV NIH3T3 cells, we identified 3 clusters of TCF-1 summits (Figure S4C): summits with high (n=10,593), intermediate (n=19,068) and low (n=10,901) nucleosome enrichment. In parallel, the previously described differential gene expression analysis between Empty RV NIH3T3 and DP T cells identified 3,349 genes as DP T and 4,040 as NIH3T3 cell-specific. To calculate the enrichment of the 2 gene sets in the 3 nucleosome enrichment clusters, TCF-1 peak summits were associated with genes, as described in the previous section, resulting in 27,794 interactions between 24,330 TCF-1 summits and 10,212 genes. To remove redundancy in the association between genes and nucleosome clusters, we filtered out genes associated with zero or more than one cluster. The remaining genes were used to calculate the enrichment of DP T cell-specific genes in high, intermediate and low MNase clusters with Fisher’s exact test (Figure S6I).

Motif distances from nucleosome summit

MEME-FIMO (Grant et al., 2011) and the TCF-1 position probability matrix (MA0769.1) from JASPAR were used to identify 1,102,896 putative TCF-1 binding sites (motifs) in the mouse genome using a p-value threshold of 1e-4. Of these, 17,816 motifs were found to overlap with TCF-1 ChIP-seq peaks specific to TCF-1 RV NIH3T3 and 7,782 with peaks specific to DP T cells. To avoid biases associated with an imbalanced number of motif occurrences in peaks, a one-to-one association between motifs and summits was created by selecting, for each peak, the motif closest to the summit with a maximum distance threshold of 100bp. This resulted in the finalized sets of motifs bound in TCF-1 RV NIH3T3 (n=10,665) and DP T cells (n=6,115). The remaining unbound putative TCF-1 sites were grouped into motif hotspots using a distance threshold of 500bp. For every hotspot the motif with the highest FIMO score was selected as its representative (random selection for ties) resulting in the formation of the final ‘Random’ set of unbound motifs (n=862,733) that were used as control.

Nucleosome positions were called using Danpos2 (Chen et al., 2013) as previously described. The distance between motifs from NIH3T3, DP T and Random sets to their closest nucleosome dyad was calculated using BEDTools (Quinlan and Hall, 2010). The visual comparison of the distribution of distances between each cell type specific set and the Random set was achieved by randomly selecting 1,000 samples from each set with replacement, plotting the density of distances and repeating this process 1,000 times (Figure S4D). To assess whether there is a statistically significant difference in the median motif distance from the nucleosome dyad between each cell-specific (target) set and the Random set, we carried out two separate bootstrapping procedures, one for each target set. Distances from the target and Random set were combined into a pooled vector. Both target and Random sets were transformed by subtracting each set’s mean from every member of the relevant set and adding the mean of the pooled vector. This way, both sets are first centered around their mean and then shifted by the pooled mean, resulting in the proper transformation for testing the null hypothesis (no difference between median motif distances from the nucleosome center of the two sets) without making any assumptions about their variance. Subsequently, we randomly selected 1,000 samples (with replacement) from each transformed set and compared the difference between median distances. After repeating this process 100,000 times, we divided the number of times we observed a difference between the median distances larger than (or equal to) the raw difference (no subsampling) by the total number of iterations to calculate the p-value (Figure S4D).
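
The bootstrap procedure can be sketched as below. This is a minimal illustration under stated assumptions: differences are compared in absolute value (a two-sided reading that the text does not make explicit), the p-value is the fraction of iterations meeting the criterion, and the function name, default arguments and fixed seed are choices made here for reproducibility.

```python
import random
from statistics import mean, median


def bootstrap_median_test(target, control, n_draw=1000, n_iter=100_000, seed=0):
    """Bootstrap p-value for a difference in medians between two samples."""
    rng = random.Random(seed)
    pooled_mean = mean(list(target) + list(control))
    target_mean, control_mean = mean(target), mean(control)
    # Center each set on its own mean, then shift both to the pooled mean
    # so that the null hypothesis (no difference) holds by construction.
    t_null = [x - target_mean + pooled_mean for x in target]
    c_null = [x - control_mean + pooled_mean for x in control]
    observed = abs(median(target) - median(control))  # raw difference
    hits = 0
    for _ in range(n_iter):
        t = rng.choices(t_null, k=n_draw)  # sampling with replacement
        c = rng.choices(c_null, k=n_draw)
        if abs(median(t) - median(c)) >= observed:
            hits += 1
    return hits / n_iter
```

With the defaults this mirrors the 1,000 draws and 100,000 iterations described above; smaller values of `n_iter` are useful for a quick check.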

Characterization of the chromatin state in NIH3T3 cells prior to Tcf7 retroviral transduction

In addition to having established the pre-Tcf7 overexpression nucleosome occupancy environment in NIH3T3 cells, querying the chromatin state landscape is a critical step towards unveiling the properties of TCF-1 binding in a genome-wide, quantitative way. To this end, the enrichment of H3K4me3, H3K4me1, H3K9me3, H3K27me3 and H3K27ac versus Input in the +/− 1kb (and +/− 250bp) region surrounding TCF-1 summits has been calculated by modeling read counts with a binomial mixture model of two components with normR (Helmut and Chang, bioRxiv). The first component models the background and the second one the signal, independently for each histone mark, resulting in a five-dimensional vector of p-values adjusted for multiple comparisons for every summit. H3K27me3, H3K9me3, and H3K27ac enrichment in TCF-1 RV NIH3T3 cells was calculated as well (Figure 5C–D). Furthermore, the enrichment of ATAC-seq in NIH3T3 TCF-1 RV versus Empty RV cells and vice versa has also been calculated around summits.

These enrichment results facilitated the assessment of correlations between the chromatin status and chromatin accessibility before and after Tcf7 overexpression (Figure 5, Figure S5). Additionally, k-means clustering has been applied on TCF-1 summits based on the enrichment level of the 5 chromatin marks in pre-induced cells resulting in the formation of 11 clusters (Figure 5B, Figure S5D, Tables S5 and S6), following silhouette coefficient analysis (Figure S5C). For visualization purposes, the normalized histone mark ChIP-seq as well as ATAC-seq enrichment was also calculated for 10bp non-overlapping bins spanning the +/− 3kb region centered around TCF-1 summits separately for each cluster.

Deregulated genes in NIH3T3 TCF-1 RV cells were linked to TCF-1 binding sites based on the proximity strategy described in previous sections. Consequently, genes were also connected to chromatin states (Table S6). This enabled the calculation of the significance of up- and down-regulated genes enrichment in each chromatin state using Fisher’s exact test (Figures 6E and S6G). To assess differences in the enrichment of H3K9me3, H3K27me3 and H3K27ac ChIP-seq signal around TCF-1 binding events between pre-induced and TCF-1 RV NIH3T3 cells, we used diffR function from normR package using an FDR threshold of 5e-2.

Gene set enrichment analysis

Genes were pre-ranked by the log2 fold-change estimated with DESeq2, and GSEA v2.2.4 was run on the pre-ranked lists with default parameters to perform gene set enrichment analysis.

DATA AND SOFTWARE AVAILABILITY

The accession number for the ChIP-seq, RNA-seq and ATAC-seq reported in this study is NCBI GEO: GSE99159

Highlights

  • TCF-1 drives the early wave of chromatin accessibility in T cell development

  • Tcf7−/− mice cannot establish the open chromatin landscape of normal T cells

  • At the single-cell level, TCF-1 dictates a coordinate opening of the chromatin

  • TCF-1 can erase repressive marks and activate T cell-restricted genes in fibroblasts

Acknowledgments

The authors thank the NGS sequencing facilities at Penn’s School of Medicine and Vet School. We are grateful to Drs. Shelley Berger, Kenneth Zaret, Douglas Epstein, Klaus Kaestner, R. Babak Faryabi, Jorge Henao-Mejia, Maria Fasolino, and Naomi Goldman for critically reading this manuscript. We thank members of the Vahedi, Wherry, and Faryabi labs particularly Aaron Chen, Gregory Schwartz, and Yeqiao Zhou. We thank Stephanie Sansbury for initial RNA-seq library and Junko Kurachi for designing and validating the constructs. This study is funded by NIH K22AI112570, Sloan Foundation, and National Psoriasis Foundation to G.V., LLS Fellow Award to J.P., and NIH R01AI047833 and P01CA119070 to W.S.P, and NIH AI105343, AI082630, AI112521, AI115712, AI117718, AI108545, AI117950, and the Parker Institute fund for Cancer Immunotherapy to E.J.W.

Footnotes

Declaration of Interests

The authors declare no competing interests.

Author contributions

All authors contributed extensively to the work presented in this paper. J.L.J. designed and conducted the experiments. G.G., G.V., J.L.J. performed computational analysis, wrote the code and analyzed the data. S.C. developed a method for single-cell ATAC-seq, J.P. performed ChIP-seq experiments. M.K. designed and validated the retroviral constructs. C.H. provided mice. W.P., A.B., and E.J.W provided technical support, reagents, and conceptual advice. G.V. conceived the project, administered the experiments and analyses, and wrote the manuscript.

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

References

  1. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Barozzi I, Simonatto M, Bonifacio S, Yang L, Rohs R, Ghisletti S, Natoli G. Coregulation of transcription factor binding and nucleosome occupancy through DNA features of mammalian enhancers. Mol Cell. 2014;54:844–857. doi: 10.1016/j.molcel.2014.04.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Boller S, Ramamoorthy S, Akbas D, Nechanitzky R, Burger L, Murr R, Schubeler D, Grosschedl R. Pioneering Activity of the C-Terminal Domain of EBF1 Shapes the Chromatin Landscape for B Cell Programming. Immunity. 2016;44:527–541. doi: 10.1016/j.immuni.2016.02.021. [DOI] [PubMed] [Google Scholar]
  4. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10:1213–1218. doi: 10.1038/nmeth.2688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523:486–490. doi: 10.1038/nature14590. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Chen J, Liu H, Liu J, Qi J, Wei B, Yang J, Liang H, Chen Y, Chen J, Wu Y, et al. H3K9 methylation is a barrier during somatic cell reprogramming into iPSCs. Nat Genet. 2013;45:34–42. doi: 10.1038/ng.2491. [DOI] [PubMed] [Google Scholar]
  7. Ciofani M, Madar A, Galan C, Sellars M, Mace K, Pauli F, Agarwal A, Huang W, Parkurst CN, Muratet M, et al. A validated regulatory network for th17 cell specification. Cell. 2012;151:289–303. doi: 10.1016/j.cell.2012.09.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Di Stefano B, Sardina JL, van Oevelen C, Collombet S, Kallin EM, Vicent GP, Lu J, Thieffry D, Beato M, Graf T. C/EBPalpha poises B cells for rapid reprogramming into induced pluripotent stem cells. Nature. 2014;506:235–239. doi: 10.1038/nature12885.
  9. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  10. Dose M, Emmanuel AO, Chaumeil J, Zhang J, Sun T, Germar K, Aghajani K, Davis EM, Keerthivasan S, Bredemeyer AL, et al. beta-Catenin induces T-cell transformation by promoting genomic instability. Proc Natl Acad Sci U S A. 2014;111:391–396. doi: 10.1073/pnas.1315752111.
  11. Germar K, Dose M, Konstantinou T, Zhang J, Wang H, Lobry C, Arnett KL, Blacklow SC, Aifantis I, Aster JC, Gounari F. T-cell factor 1 is a gatekeeper for T-cell specification in response to Notch signaling. Proc Natl Acad Sci U S A. 2011;108:20060–20065. doi: 10.1073/pnas.1110230108.
  12. Ghisletti S, Barozzi I, Mietton F, Polletti S, De Santa F, Venturini E, Gregory L, Lonie L, Chew A, Wei CL, et al. Identification and characterization of enhancers controlling the inflammatory gene expression program in macrophages. Immunity. 2010;32:317–328. doi: 10.1016/j.immuni.2010.02.008.
  13. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
  14. Gray SM, Amezquita RA, Guan T, Kleinstein SH, Kaech SM. Polycomb Repressive Complex 2-Mediated Chromatin Repression Guides Effector CD8+ T Cell Terminal Differentiation and Loss of Multipotency. Immunity. 2017;46:596–608. doi: 10.1016/j.immuni.2017.03.012.
  15. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38:576–589. doi: 10.1016/j.molcel.2010.05.004.
  16. Heng TS, Painter MW, Immunological Genome Project Consortium. The Immunological Genome Project: networks of gene expression in immune cells. Nat Immunol. 2008;9:1091–1094. doi: 10.1038/ni1008-1091.
  17. Horn PJ, Peterson CL. Molecular biology. Chromatin higher order folding--wrapping up transcription. Science. 2002;297:1824–1827. doi: 10.1126/science.1074200.
  18. Im SJ, Hashimoto M, Gerner MY, Lee J, Kissick HT, Burger MC, Shan Q, Hale JS, Lee J, Nasti TH, et al. Defining CD8+ T cells that provide the proliferative burst after PD-1 therapy. Nature. 2016;537:417–421. doi: 10.1038/nature19330.
  19. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  20. Li L, Zhang JA, Dose M, Kueh HY, Mosadeghi R, Gounari F, Rothenberg EV. A far downstream enhancer for murine Bcl11b controls its T-cell specific expression. Blood. 2013;122:902–911. doi: 10.1182/blood-2012-08-447839.
  21. Love JJ, Li X, Case DA, Giese K, Grosschedl R, Wright PE. Structural basis for DNA bending by the architectural transcription factor LEF-1. Nature. 1995;376:791–795. doi: 10.1038/376791a0.
  22. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
  23. Oestreich KJ, Weinmann AS. Encoding stability versus flexibility: lessons learned from examining epigenetics in T helper cell differentiation. Curr Top Microbiol Immunol. 2012;356:145–164. doi: 10.1007/82_2011_141.
  24. Osipovich O, Oltz EM. Regulation of antigen receptor gene assembly by genetic-epigenetic crosstalk. Semin Immunol. 2010;22:313–322. doi: 10.1016/j.smim.2010.07.001.
  25. Pauken KE, Sammons MA, Odorizzi PM, Manne S, Godec J, Khan O, Drake AM, Chen Z, Sen DR, Kurachi M, et al. Epigenetic stability of exhausted T cells limits durability of reinvigoration by PD-1 blockade. Science. 2016;354:1160–1165. doi: 10.1126/science.aaf2807.
  26. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
  27. Rothenberg EV, Moore JE, Yui MA. Launching the T-cell-lineage developmental programme. Nat Rev Immunol. 2008;8:9–21. doi: 10.1038/nri2232.
  28. Schep AN, Buenrostro JD, Denny SK, Schwartz K, Sherlock G, Greenleaf WJ. Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res. 2015;25:1757–1770. doi: 10.1101/gr.192294.115.
  29. Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods. 2017. doi: 10.1038/nmeth.4401.
  30. Soufi A, Donahue G, Zaret KS. Facilitators and impediments of the pluripotency reprogramming factors’ initial engagement with the genome. Cell. 2012;151:994–1004. doi: 10.1016/j.cell.2012.09.045.
  31. Soufi A, Garcia MF, Jaroszewicz A, Osman N, Pellegrini M, Zaret KS. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell. 2015;161:555–568. doi: 10.1016/j.cell.2015.03.017.
  32. Struhl K. Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell. 1999;98:1–4. doi: 10.1016/S0092-8674(00)80599-1.
  33. Vahedi G, Kanno Y, Furumoto Y, Jiang K, Parker SC, Erdos MR, Davis SR, Roychoudhuri R, Restifo NP, Gadina M, et al. Super-enhancers delineate disease-associated regulatory nodes in T cells. Nature. 2015;520:558–562. doi: 10.1038/nature14154.
  34. Vahedi G, Takahashi H, Nakayamada S, Sun HW, Sartorelli V, Kanno Y, O’Shea JJ. STATs shape the active enhancer landscape of T cell populations. Cell. 2012;151:981–993. doi: 10.1016/j.cell.2012.09.044.
  35. Verbeek S, Izon D, Hofhuis F, Robanus-Maandag E, te Riele H, van de Wetering M, Oosterwegel M, Wilson A, MacDonald HR, Clevers H. An HMG-box-containing T-cell factor required for thymocyte differentiation. Nature. 1995;374:70–74. doi: 10.1038/374070a0.
  36. Weber BN, Chi AW, Chavez A, Yashiro-Ohtani Y, Yang Q, Shestova O, Bhandoola A. A critical role for TCF-1 in T-lineage specification and differentiation. Nature. 2011;476:63–68. doi: 10.1038/nature10279.
  37. Wohrle S, Wallmen B, Hecht A. Differential control of Wnt target genes involves epigenetic mechanisms and selective promoter occupancy by T-cell factors. Mol Cell Biol. 2007;27:8164–8177. doi: 10.1128/MCB.00555-07.
  38. Xing S, Li F, Zeng Z, Zhao Y, Yu S, Shan Q, Li Y, Phillips FC, Maina PK, Qi HH, et al. Tcf1 and Lef1 transcription factors establish CD8(+) T cell identity through intrinsic HDAC activity. Nat Immunol. 2016;17:695–703. doi: 10.1038/ni.3456.
  39. Yu B, Zhang K, Milner JJ, Toma C, Chen R, Scott-Browne JP, Pereira RM, Crotty S, Chang JT, Pipkin ME, et al. Epigenetic landscapes reveal transcription factors that regulate CD8+ T cell differentiation. Nat Immunol. 2017;18:573–582. doi: 10.1038/ni.3706.
  40. Zaret KS, Carroll JS. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 2011;25:2227–2241. doi: 10.1101/gad.176826.111.
  41. Zhang JA, Mortazavi A, Williams BA, Wold BJ, Rothenberg EV. Dynamic transformations of genome-wide epigenetic marking and transcriptional control establish T cell identity. Cell. 2012;149:467–482. doi: 10.1016/j.cell.2012.01.056.
  42. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
  43. Zhou X, Yu S, Zhao DM, Harty JT, Badovinac VP, Xue HH. Differentiation and persistence of memory CD8(+) T cells depend on T cell factor 1. Immunity. 2010;33:229–240. doi: 10.1016/j.immuni.2010.08.002.
