Author manuscript; available in PMC: 2024 Mar 7.
Published in final edited form as: Nat Immunol. 2023 Aug 17;24(10):1698–1710. doi: 10.1038/s41590-023-01599-7

Intrinsically disordered domain of transcription factor TCF-1 is required for T cell developmental fidelity

Naomi Goldman 1,2,3, Aditi Chandra 1,2,3,*, Isabelle Johnson 1,2,3,*, Matthew A Sullivan 2,4,5, Abhijeet R Patil 1,2,3, Ashley Vanderbeck 2,6, Atishay Jay 1,2,3,7, Yeqiao Zhou 2,3,8, Emily K Ferrari 1,2,3, Leland Mayne 9, Jennifer Aguilan 10, Hai-Hui Xue 11,12, Robert B Faryabi 2,3,8, E John Wherry 2,4,13,14, Simone Sidoli 10, Ivan Maillard 2,6,14, Golnaz Vahedi 1,2,3,13,14,#
PMCID: PMC10919931  NIHMSID: NIHMS1947029  PMID: 37592014

Abstract

In development, pioneer transcription factors access silent chromatin to reveal lineage-specific gene programs. The structured DNA-binding domains of pioneer factors have been well characterized, but whether and how intrinsically disordered regions (IDRs) affect chromatin and control cell fate is unclear. Here, we report that deletion of an IDR of the pioneer factor TCF-1, termed “L1”, leads to an early developmental block in T cells. The few T cells that develop from progenitors expressing TCF-1 lacking L1 exhibit lineage infidelity distinct from the lineage diversion of TCF-1 deficient cells. Mechanistically, L1 is required for activation of T cell genes and repression of GATA2-driven genes, normally reserved to the mast cell and dendritic cell lineages. Underlying this lineage diversion, L1 mediates binding of TCF-1 to its earliest target genes which are subject to repression as T cells develop. These data suggest that TCF-1’s intrinsically disordered N-terminus maintains T cell lineage fidelity.


The induction of tissue-specific gene expression programs depends on the reconfiguration of silent chromatin and formation of accessible regulatory elements. Some proteins in the class of transcription factors (TFs) are endowed with the capacity to reprogram silent chromatin and are hence critical for cell fate determination. These TFs, also referred to as pioneer factors1,2, are thought to target DNA sequences frequently summarized as binding motifs3 through their DNA-binding domains and recruit proteins with enzymatic activities to remodel silent chromatin. Despite detailed knowledge of TF domains required for DNA binding specificities, the extent to which non-DNA binding domains of TFs are critical for lineage determination and chromatin reorganization remains largely understudied.

Here we interrogated whether and how non-DNA binding domains of a pioneer factor are required to reprogram the chromatin and determine cell fate using T cells as a model. In T cell development, lymphoid progenitors enter the thymus and receive signals from Delta-like-4 (DLL4) Notch ligands to initiate the process of commitment to the T cell lineage. The TCF-1 protein, encoded by the transcription factor 7 (Tcf7) gene, has been characterized as a lineage-determining TF for T cells since Tcf7 deletion disrupts T cell development4. TCF-1, whose expression rises precipitously as soon as bone marrow-derived progenitors enter the thymus, has been characterized as a pioneer factor5,6. Moreover, TCF-1 can promote long-range interactions across topologically associating domains7. Despite these mechanistic insights into the role of TCF-1 in T cell development, whether non-DNA binding domains of TCF-1 are engaged in shaping the chromatin landscape of T cells is unknown.

Multiple major isoforms of TCF-1 in mouse and human T cells4 have been characterized, including the long isoforms, which contain an N-terminal β-catenin binding domain and respond to Wnt signaling. Both short and long isoforms of TCF-1 are sufficient to initiate and sustain T cell development8,9. The high-mobility group (HMG) box DNA-binding domain of TCF-1 and the closely related factor LEF-1 has largely been studied due to the solved crystal structure of the HMG-box10. Moreover, an intrinsic histone deacetylase domain within TCF-1’s N-terminus has been linked to the protein’s ability to suppress CD4+ lineage genes in CD8+ T cells11. Like many eukaryotic TFs whose non-DNA binding domains are highly disordered12 and exhibit conformational heterogeneity13, TCF-1 is predicted to be highly disordered outside of the HMG-box DNA-binding domain. The low complexity of TCF-1’s disordered domain limits the feasibility of crystallography studies and the predictive power of algorithms like AlphaFold14. Disordered regions often harbor TF effector domains whose canonical role is to interact with co-activators or co-repressors to remodel the chromatin15. The structure and function of the low complexity disordered regions of many TFs, including TCF-1, remain largely understudied.

Here, we studied the distinct roles of regions within the intrinsically disordered N-terminus of TCF-1 in primary developing mouse T cells and a pro-T cell line. An N-terminal region termed “L1” was necessary for efficient transition between early T lineage progenitors in the double-negative 1 (DN1) and DN2 subsets. The L1 region was required for development and also lineage fidelity. Cells that developed without the L1 region of TCF-1 (ΔL1) expressed mast cell genes and exhibited epigenetic reprogramming downstream of Gata2 de-repression. The L1 domain was principally required for the binding of TCF-1 to its earliest target genes which were subject to repression as T cells develop. Additionally, the L1 domain could be functionally replaced with a heterologous disordered domain of B cell pioneer factor, the early B cell factor 1 (EBF-1), to rescue both early binding and T cell development. However, the L1 region was no longer required once T cells reached a post commitment stage. These studies suggest the functional relevance of TF effector domains and the importance of careful dissection of protein function through mutational approaches at multiple stages of development.

Results

The N terminus of TCF-1 is intrinsically disordered.

We constructed an alignment of the long isoform of murine TCF-1 (P45), referred to as “wild type TCF-1”, with 150 vertebrate homologues and plotted the evolutionary conservation score of each amino acid position16. The most conserved positions across species fell in the HMG-box DNA-binding domain; however, a large non-DNA-binding domain within the N-terminus demonstrated moderate conservation (Fig. 1a upper). Relying on a quantitative method to predict TCF-1 structure, we utilized the predictor of natural disordered regions (VSL2 in PONDR17) and plotted the disorder score at each residue (Fig. 1a lower). The DNA binding domain had a low PONDR score indicative of ordered residues, while the surrounding sequence consisted of spans of mostly disordered amino acids.
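As an illustration of how a per-residue disorder profile can be summarized into spans, the following minimal sketch groups consecutive residues whose score meets a cutoff. The cutoff of 0.5 is the value conventionally used with PONDR-style predictors and is an assumption here, as are the function name `disordered_spans` and the toy scores; none of them are taken from the analysis in this study.

```python
from typing import List, Tuple


def disordered_spans(scores: List[float], threshold: float = 0.5,
                     min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) spans of residues with score >= threshold.

    `threshold` and `min_len` are illustrative assumptions, not values from the paper.
    """
    spans = []
    start = None  # first residue of the span currently being extended
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The span ended at residue i - 1; keep it if it is long enough.
            if i - start >= min_len:
                spans.append((start, i - 1))
            start = None
    # Close a span that runs to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        spans.append((start, len(scores)))
    return spans


# Made-up scores for an 8-residue toy sequence (not TCF-1 data).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.95]
print(disordered_spans(toy_scores))  # [(1, 3), (7, 8)]
```

A profile like the one in Fig. 1a lower would be expected to yield long spans flanking the HMG-box and no span within it, although the exact boundaries depend on the cutoff chosen.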

Fig. 1. The N terminus of TCF-1 is intrinsically disordered.

a) Profile of amino acid conservation score across residues of mouse TCF-1 protein utilizing ConSurf-DB and MAFFT alignment of 150 vertebrate homologous sequences (top). Profile of VSL2 score across residues in mouse TCF-1 utilizing the predictor of natural disorder regions (PONDR)46 (bottom).

b) Percentage of deuterium uptake at 4 seconds and measured sample pH of 7.0 for exchange (normalized to measured deuterium content after 23 hours of H-to-D exchange) for each TCF-1 peptide observation (different peptide charge states treated as separate observations). Line represents mean value of n=2 technical replicates.

c) Number of incorporated deuterium (D) atoms (corrected for back exchange) vs. H-to-D exchange (HX) time for each indicated peptide observation as representative examples of the time-dependent HX behavior of L1 and HMG-box domains. HX time for pHmeas 5.0 and 7.0 was scaled by a factor of 10 relative to a pHmeas 6.0 timescale in order to directly compare all data. Solid line corresponds to fit of data to stretched exponential function used for estimating approximate experimental peptide-level HX rate constant kex (see Methods). Red dashed line corresponds to the predicted behavior for each indicated peptide sequence as random coil (calculated as described19,20,47). Time = 0 sec value is assumed as 0 D.

d) Schematic representation of wild type isoforms of TCF-1 (P33 and P45) and internal deletions.

e) Immunoblot (IB) analysis of NIH 3T3 cells transduced with FLAG-tagged wild type (WT) TCF-1 and mutant TCF-1 constructs with internal deletions (ΔL1-ΔL7). Vinculin used as a loading control.

f) Representative histogram of flow cytometry depicts TCF-1 expression with intracellular anti-mouse TCF-1 staining in NIH 3T3 cells transduced with empty vector (EV), wild type TCF-1 (WT), and mutant TCF-1 constructs with internal deletions (ΔL1-ΔL7).

g) Representative immunofluorescence depicts nuclear localization of FLAG tagged wild type (WT) and mutant TCF-1 with internal deletions. A nuclear mask is indicated with a dotted line in DAPI images and superimposed to the FLAG AF568 channel to indicate nuclear localization of FLAG tagged wild type (WT) and mutant TCF-1. Scale bar: 10 μm.

To interrogate the protein’s secondary structure in vitro, we expressed and purified recombinant TCF-1 protein from E. coli (Extended Data Fig. 1a) and used hydrogen-deuterium exchange coupled with mass spectrometry (HX-MS). The peptide bond amide 1H (“H”) of each amino acid, except proline, undergoes exchange in aqueous environments with solvent-derived hydrogen at variable rates that depend on the pH, temperature, and flanking amino acid side chains. In proteins, the chemical exchange rate is slowed by hydrogen bonded structure18. Low structural stability or highly dynamic regions of proteins exhibit less protection from exchange, and thus faster exchange rates, than regions with stable secondary structure. HX-MS measures this exchange over time in deuterium- (2H, “D”) containing buffer. We observed very rapid exchange for all measured N-terminal peptides, with nearly complete exchange within 4 seconds of D2O addition at a measured sample pH (pHmeas) of 7.0 and temperature of ~4 °C, whereas peptides within the HMG-box domain underwent much slower exchange (Fig. 1b and Extended Data Fig. 1b-d, Table S1). We repeated HX measurements at a lower pHmeas of 6.0 and 5.0, where the H-to-D exchange rate is 10- and 100-fold lower, respectively (Extended Data Fig. 1b). The exchange vs. time relationship for N-terminal peptides very closely approximated that predicted for each respective peptide sequence if residues were dynamically disordered random coil and not subject to any protection19,20 (Extended Data Fig. 1e), whereas HMG-box peptides were protected relative to this prediction across all pHmeas (Fig. 1c). Collectively, the N-terminal region of TCF-1 lacks stable secondary structure, consistent with an IDR.
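The 10-fold change in exchange rate per pH unit is what allows measurements at pHmeas 5.0, 6.0, and 7.0 to be placed on a common time axis. The sketch below illustrates that rescaling, together with one common form of a stretched exponential uptake curve. The function names, the reference pH argument, and the specific functional form D(t) = N(1 − exp(−(k·t)^β)) are assumptions for illustration; the fitting procedure actually used is described in the Methods and is not reproduced here.

```python
import math


def scale_hx_time(t_seconds: float, ph_meas: float, ph_ref: float = 6.0) -> float:
    """Convert an exchange time measured at ph_meas to the equivalent time at ph_ref.

    Assumes the exchange rate changes 10-fold per pH unit, as stated in the text.
    """
    return t_seconds * 10.0 ** (ph_meas - ph_ref)


def stretched_exponential(t: float, n_max: float, k_ex: float, beta: float) -> float:
    """Deuterium uptake D(t) = n_max * (1 - exp(-(k_ex * t) ** beta)).

    This functional form is an assumed, commonly used parameterization;
    D(0) = 0, matching the convention that time zero carries 0 D.
    """
    return n_max * (1.0 - math.exp(-((k_ex * t) ** beta)))


# A 4 s exposure at pH 7.0 corresponds to 40 s on the pH 6.0 timescale,
# while 4 s at pH 5.0 corresponds to 0.4 s.
for ph in (5.0, 6.0, 7.0):
    print(ph, scale_hx_time(4.0, ph))
```

Under this scaling, a peptide that exchanges like random coil should trace the same curve at all three pH values once times are rescaled, which is the comparison made in Fig. 1c.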

To examine whether the N terminus of TCF-1 plays any role in T cell development, we utilized a collection of mutant TCF-1 constructs11 in which internal deletions were made tiling the protein from the N-terminus to the DNA-binding domain and labeled them sequentially as ΔL1 to ΔL7 (Fig. 1d). We also deleted the DNA-binding domain of TCF-1 to generate a ΔHMG mutant construct. To confirm that these internal deletions did not disrupt protein localization, stability, or expression, we visualized the individual mutants with immunofluorescence, western blotting, and intracellular flow cytometry to detect FLAG-tagged nuclear constructs in transduced NIH 3T3 cells (Fig. 1e-g). Thus, this series of deletion mutants of TCF-1 enabled us to test the role of TCF-1 domains in T cell development.

Efficient DN1 to DN2 transition requires the L1 domain of TCF-1

Our study focused on the murine pro-T cell program that is induced in developing progenitors divided by cell surface markers into early T cell precursors from the DN1, DN2, and DN3 subsets. We next tested the ability of each TCF-1 mutant to rescue T cell development in primary TCF-1 deficient T cell progenitors by modeling T cell development in vitro21–27. Differentiation of primary mouse progenitor cells into T-lymphocytes can be achieved in vitro using a bone-marrow-derived stromal cell line that ectopically expresses the Notch ligand (OP9-DLL1)28,29 and closely mimics T cell development in vivo24,26. We first generated TCF-1 conditional knock out mice (“Tcf7 cKO”) by breeding Vav-iCre mice30 with Tcf7 eGFP reporter mice31 in which exon 2 of Tcf7 is floxed and an eGFP reporter cassette is inserted into the first intron. Cre+ Tcf7 cKO mice displayed altered T cell development in the thymus and lacked thymic expression of all TCF-1 protein isoforms8,32. To accommodate the eGFP reporter in Tcf7 cKO mice, we cloned all mutant TCF-1 constructs onto a backbone with violet-excited fluorescent protein (MSCV-IRES-Vex). As expected32, neither Lin- Sca1+ ckit+ (LSK) bone marrow (BM) cells nor ckit+ BM progenitors from Tcf7 cKO mice generated any Thy1+ CD25+ T cells after co-culture on OP9-DLL1 cells for 5 or 13 days (Fig. 2a-c and Extended Data Fig. 2a-c).

Fig. 2. Loss of TCF-1’s L1 domain limits DN1 to DN2 transition.

a) Identification of Thy1+ CD25+ cells in OP9-DLL1 co-cultures of Tcf7 cKO cells transduced with empty vector (EV), wild type (WT) TCF-1, or mutant TCF-1 (ΔL1-ΔL7 and ΔHMG) on day 5 after in vitro differentiation. Data are representatives of at least 3 independent experiments, all cells were pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+, transduced (Vex+).

b) Detection of DN1, DN2, and DN3 cells by CD44 and CD25 surface expression in co-cultures described in A. Data are representatives of at least 3 independent experiments. All cells were pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+, transduced (Vex+).

c) Quantification of frequency and number of Thy1+ CD25+ cells (left), CD44+ CD25+ DN2s, and CD44- CD25+ DN3 cells (right) from Tcf7 cKO cells on day 5 after in vitro differentiation on OP9-DLL1 cells. Bars represent mean from n=2 independent animals, individual replicates are represented by data points. P values are determined by one-way ANOVA followed by Dunnett’s multiple comparison test with WT TCF-1 (P45) as a control. *P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001, and **** P ≤ 0.0001.

d) Representative flow cytometric analysis identifying transduced (Vex+) GFP+ cells (Tcf7 eGFP reporter) of Tcf7 cKO cells on day 5 after in vitro differentiation on OP9-DLL1 co-cultures.

e) Quantification of frequency of Vex+ GFP+ (Tcf7 eGFP reporter) cells in OP9-DLL1 co-cultures on day 5 as described in D. All cells were pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), and CD45+. Bars represent mean from n=2 independent animals, individual replicates are represented by data points.

We characterized early T cells between the DN2 and DN3 stages as Thy1+ CD25+ and resolved DN2s and DN3s as CD25+ CD44+ and CD25+ CD44-, respectively. Rescue of T cell development in Tcf7 cKO progenitors with retroviral transduction of the P45 or P33 isoforms of TCF-1 was evident after co-culture on OP9-DLL1 for 5 days (Fig. 2a-c). Since T cell development is accelerated in wild type progenitors transduced with TCF-132, we also transduced wild type ckit+ BM progenitors with control empty vector (EV) or wild type TCF-1 and characterized the extent of T cell development in these cultures. Co-cultures led to robust T cell development amongst wild type TCF-1-transduced cells at both days 5 and 13, while development of un-transduced (GFP-) progenitors resulted in fewer Thy1+CD25+ cells (Extended Data Fig. 2d-e). OP9-DLL4 co-cultures recapitulated OP9-DLL1 results (Extended Data Fig. 2f), while in the absence of Notch ligand the OP9-control co-cultures failed to give rise to developing T cells despite overexpression of wild type TCF-1 at either day 5 or 13 (Extended Data Fig. 2g). Together, we established a system to evaluate the functional relevance of TCF-1 domains.

We aimed to test how retroviral transduction of mutant TCF-1 constructs in parallel with wild type TCF-1, EV, and ΔHMG controls in wild type and Tcf7 cKO progenitors affected T cell development. We ensured that the levels of transduction of wild type and mutant TCF-1 were comparable to that detected in wild type pro-T cells (Extended Data Fig. 2h). Although most TCF-1 mutants restricted T cell development to various degrees, Tcf7 cKO progenitors transduced with ΔL1 demonstrated a major defect in progression towards the DN2 and DN3 stages at both days 5 and 13 (Fig. 2a-c and Extended Data Fig. 2a-c). Corroborating this finding, wild type progenitors transduced with ΔL1 showed no substantial increase in DN2 proportions over that seen in un-transduced cells, further demonstrating that the defect was cell-intrinsic and not dominant negative (Extended Data Fig. 2d-e). Furthermore, ΔL1 co-cultures did not generate an increased proportion of alternative lineage B220+ cells but showed an increased percentage of CD11b+ cells (Extended Data Fig. 2i-j). We next exploited the eGFP fluorescent reporter in Tcf7 cKO mice to assess the ability of ΔL1 to transactivate the endogenous Tcf7 locus. At day 5 in wild type TCF-1-transduced co-cultures, the presence of Vex+ GFP reporter+ cells suggested the activation of endogenous Tcf7 transcriptional activity with transduction of either full length isoform of TCF-1 (P45 or P33) (Fig. 2d,e). Intriguingly, ΔL1 co-cultures showed very few GFP reporter+ Vex+ cells, akin to levels seen in EV and ΔHMG control co-cultures lacking TCF-1, suggesting limited transcriptional activity at the endogenous Tcf7 locus (Fig. 2d,e). Together, the L1 region of TCF-1 is necessary for efficient transition from DN1 to DN2 stages.

L1 domain is required for T cell identity genes

We profiled the transcriptomes of DN1s and DN2s using bulk RNA-sequencing and compared these populations to wild type, EV, ΔL7 and ΔHMG transduced cells (Fig. 3a, Table S1). Dimensionality reduction of RNA-sequencing data using principal component analysis (PCA) separated DN1s and DN2s of all conditions along the first principal component (PC1) (Fig. 3b). EV and ΔHMG expressing DN1s clustered closely together and were separated from other conditions (Fig. 3b). Rescue of development with wild type TCF-1, ΔL1, or ΔL7 compared to EV-transduced cells led to modest differences between DN1s across conditions (Fig. 3c). Intriguingly, we observed a significant de-repression of over 600 genes and a reduction in expression of around 130 genes in ΔL1 expressing DN2s compared to wild type TCF-1 transduced counterparts (Fig. 3d and Table S1). ΔL7 transduced DN2s showed far fewer differential genes compared to wild type TCF-1 expressing DN2s (121 genes up and 38 genes down) (Fig. 3d). Amongst the significantly downregulated genes in ΔL1 transduced DN2s compared to wild type TCF-1 expressing DN2s, we identified numerous T cell identity genes including Gata3, Bcl11b, Lck, Lef1, Thy1, Il2rb, Rag2, Cd3g, and Cd3d (Fig. 3e,f). Hence, transcriptional divergence between ΔL1 and wild type TCF-1 expressing T cell progenitors occurs after the DN1 stage as cells enter the DN2 stage, and ΔL1 expressing DN2s have significantly reduced expression of T cell identity genes.
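The counts of up- and downregulated genes reported above follow from the thresholds given in the Fig. 3 legend (adjusted P < 0.05 and |Log2FoldChange| > 1). A minimal sketch of that filtering step is shown below, assuming the DESeq2 results have been exported to records with `log2FoldChange` and `padj` fields; the record layout, the function name `classify_genes`, and the placeholder gene names and values are hypothetical.

```python
def classify_genes(results, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Split differential-expression records into up- and downregulated gene lists.

    A gene is kept only if padj < padj_cutoff and |log2FoldChange| > lfc_cutoff.
    Records with a missing padj (None) are skipped.
    """
    up, down = [], []
    for record in results:
        padj = record["padj"]
        if padj is None or padj >= padj_cutoff:
            continue
        if record["log2FoldChange"] > lfc_cutoff:
            up.append(record["gene"])
        elif record["log2FoldChange"] < -lfc_cutoff:
            down.append(record["gene"])
    return up, down


# Placeholder records with made-up values (not data from this study).
toy_results = [
    {"gene": "geneA", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "geneB", "log2FoldChange": -1.8, "padj": 0.01},
    {"gene": "geneC", "log2FoldChange": 0.4, "padj": 0.0001},  # effect too small
    {"gene": "geneD", "log2FoldChange": 3.0, "padj": 0.2},     # not significant
    {"gene": "geneE", "log2FoldChange": -2.5, "padj": None},   # padj missing
]

up_genes, down_genes = classify_genes(toy_results)
print(len(up_genes), "up;", len(down_genes), "down")
```

Requiring both a significance and an effect-size cutoff is what keeps small but statistically significant shifts, such as the hypothetical `geneC`, out of the reported gene sets.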

Fig. 3. GATA2 driven mast cell gene signature is unmasked in developing T cells lacking L1.

a) Identification and sorting strategy for DN1 and DN2 cells in Tcf7 cKO co-cultures after in vitro differentiation on OP9-DLL1 cells for 7 days.

b) PCA of RNA-seq on cell populations depicted in A. RNA-seq for each population was performed in 2–3 technical replicates for n=2 independent animals.

c) Volcano plots demonstrating significantly differential genes as calculated by DESeq2 between empty vector (EV) vs. wild type (WT) TCF-1, ΔL7 vs. WT TCF-1, and ΔL1 vs. WT TCF-1 transduced Tcf7 cKO DN1s. (Adjusted P<0.05 and |Log2FoldChange|>1) P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

d) Volcano plot demonstrating significantly differential genes as calculated by DESeq2 between ΔL1 vs. WT TCF-1 and ΔL7 vs. WT TCF-1 transduced Tcf7 cKO DN2s. (Adjusted P<0.05 and |Log2FoldChange|>1) Significance calculated as in C.

e) Bar plot of expression values (in RPKM) of select genes in DN1 and DN2s. Bars represent mean expression values +/− SD, and individual data points are overlaid.

f) Heatmap depicting genes (n=137) significantly upregulated in WT vs. ΔL1 transduced DN2s (Adjusted P<0.05 and |Log2FoldChange|>1). Significance calculated as in C.

g) Heatmap of two sets of genes (“ΔL1 specific” and “DN1 legacy”) significantly upregulated in ΔL1 vs. WT TCF-1 transduced Tcf7 cKO DN2s. (Adjusted P<0.05 and |Log2FoldChange|>1) Significance calculated as in C.

h) Boxplots of normalized expression of gene sets (“ΔL1 specific” and “DN1 legacy”) depicted in (G.) in 62 immune cell populations from ImmGen33. Center line of box plots represent median, bounds of box represent 1st and 3rd quartiles, whiskers represent maximum and minimum values, data points represent outlier values.

i) Cumulative distribution plot of corresponding fold change in GATA2 KO dendritic cell (DC) progenitors (GATA2KO/control)36 of genes differentially up and down regulated between ΔL1 and WT transduced DN2s. P value was determined by two-sample two-sided Kolmogorov-Smirnov test on Log2FoldChange values derived from RNA-seq on n=2 independent animals, with 2–3 technical replicates each.

j) Heatmap depicting genes significantly upregulated in ΔL1 vs. WT transduced DN2s that also were downregulated between GATA2 knock out and control DC progenitors.

Loss of L1 unmasks a hidden gene signature

Amongst the genes with significantly upregulated expression in ΔL1 expressing DN2s compared to wild type TCF-1 expressing counterparts, we found genes enriched for inflammation, chemotaxis, and cytokine production ontologies (Extended Data Fig. 3a). One group of genes, which we called “ΔL1 specific genes”, showed upregulation in ΔL1 expressing DN2s uniquely compared to all other DN1s and DN2s (Fig. 3g left). The second group constituted a set of genes, which we called “DN1 legacy genes”, that were expressed in DN1s across conditions and were downregulated in ΔL7 and wild type TCF-1 expressing DN2s but only modestly reduced in ΔL1 transduced DN2s (Fig. 3g right). To gain insight into the identity of the two groups of genes with upregulated expression in the ΔL1 expressing cells, we plotted the normalized expression of each gene set across 62 immune cell populations33 (Fig. 3h). While the DN1 legacy gene set was specifically expressed in macrophages, monocytes, and granulocytes across multiple tissues, surprisingly the ΔL1 specific gene set was distinctly upregulated in splenic dendritic cell populations and peritoneal cavity mast cells (Fig. 3h). However, canonical T cell genes such as Bcl11b, Gata3, Il2ra, and Lck that were downregulated in ΔL1 expressing DN2s compared to wild type TCF-1 expressing DN2s were still more highly expressed in these cells than in all DN1 populations (Fig. 3f). The majority of the nearly 7,500 genes differentially expressed between wild type TCF-1 expressing DN1s and DN2s showed a similar expression pattern in ΔL1 expressing progenitors (Extended Data Fig. 3b). The expected downregulation of PU.1 (Spi1) from DN1 to DN2 was intact in ΔL1 expressing progenitors (Fig. 3e). Of note, the L1 domain is conserved in the human TCF-1 protein and the L1-dependent modulation of early T cell associated genes was also recapitulated in human cell lines using RNA-seq (Extended Data Fig. 3c).

We next characterized the factors that could orchestrate expression of de-repressed dendritic cell and mast cell-specific genes. Notably, Gata2 was co-expressed with other ΔL1 specific genes and had higher expression than other TFs that were differentially expressed between ΔL1 and wild type TCF-1 expressing DN2s (Extended Data Fig. 3d). Since GATA2 is expressed in mast cells and has been reported to regulate dendritic cell differentiation34–37, we tested whether the de-repression of genes in ΔL1 transduced DN2s corresponded with increased activation of GATA2 target genes. We re-analyzed publicly available transcriptome profiling data in GATA2 deficient dendritic cell progenitors36 and found a significantly higher proportion of the genes upregulated in ΔL1 expressing DN2s to be downregulated in GATA2 deficient dendritic cell progenitors, suggesting that the de-repressed genes in ΔL1 DN2s are positive targets of GATA2 (Fig. 3i). Notable genes that were GATA2-responsive in dendritic cell progenitors and found to have upregulated expression in ΔL1 expressing DN2s included Mcpt8, Maf, Ccl6, Cebp3, and Fcer1a (Fig. 3j, Extended Data Figs. 3e, 4). These data support the partial functionality of mutant TCF-1 lacking the L1 region and reflect a precise defect in TCF-1-dependent transcriptional repression.
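The comparison in Fig. 3i rests on a two-sample Kolmogorov-Smirnov test between the fold-change distributions of two gene sets. The sketch below computes only the test statistic D, the largest vertical gap between the two empirical cumulative distributions, using the standard library; the function name and the toy fold-change values are hypothetical. In practice a routine such as `scipy.stats.ks_2samp` would typically be used to obtain the P value as well.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|.

    The maximum gap between two step functions is reached at an observed
    value, so it is enough to evaluate both ECDFs at every data point.
    """
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    for x in a_sorted + b_sorted:
        ecdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d_max = max(d_max, abs(ecdf_a - ecdf_b))
    return d_max


# Made-up log2 fold changes (knockout / control) for two hypothetical gene sets.
up_in_mutant = [-2.1, -1.4, -0.9, -0.3, 0.2]
down_in_mutant = [-0.5, 0.1, 0.6, 1.2, 1.9]
print(ks_statistic(up_in_mutant, down_in_mutant))
```

A distribution shifted toward negative fold changes, as described for genes upregulated in ΔL1 expressing DN2s, appears as a cumulative curve lying to the left of the comparison set.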

L1 is dispensable for chromatin accessibility of early T cells

Previous reports characterized TCF-1 as a pioneer TF that is able to establish de novo chromatin accessibility5,6. We therefore hypothesized that the ability of TCF-1 to effect changes in local chromatin accessibility could be endowed by the L1 domain, and the developmental block that we observed in ΔL1 expressing progenitors may represent a downstream consequence of this failure. We profiled chromatin accessibility in DN1s and DN2s using ATAC-seq. Notably, at the DN1 stage TCF-1-dependent chromatin opening in ΔL1 expressing progenitors was intact (Fig. 4a,b, Table S1). TCF-1’s cognate motif appeared as the most significantly enriched motif in genomic regions demonstrating increased chromatin accessibility in both wild type TCF-1 and ΔL1 expressing DN1s compared to TCF-1-deficient EV-transduced DN1s (Extended Data Fig. 5a). Together, early chromatin opening by TCF-1 is not dependent on the L1 domain.

Fig. 4. L1 modulates binding and transcriptional outcomes in early T cell development.

a) PCA of ATAC-seq in WT and mutant TCF-1 DN1/DN2. ATAC-seq was performed in 1–3 technical replicates for n=2 independent animals.

b) Volcano plot demonstrating differentially accessible peaks EV vs WT, EV vs ΔL1, ΔL1 vs WT DN1 (left) and ΔL1 vs WT DN2 (right) (adjusted P<0.05 and |Log2FoldChange|>1). P-values are calculated using Wald test and adjusted using the Benjamini and Hochberg method.

c) SeqLogo depicting enriched motifs from de novo HOMER analysis on differentially accessible peaks open in WT TCF-1 vs. ΔL1 transduced DN2 with non-differential peaks as background. P values are calculated using a hypergeometric test.

d) Heatmap depicting chromatin accessibility in DN1 and DN2 with binding of GATA2 in mast cells and GATA3 and RUNX1 in DN121,35,43 at differentially accessible peaks between WT vs. ΔL1 DN2.

e) As in c, motif analysis on differential peaks closed in WT TCF-1 vs. ΔL1 transduced DN2. P values calculated as in c.

f) As in d, depicting differentially accessible peaks closed in WT compared ΔL1 DN2.

g) Number of WT and ΔL1 binding sites profiled by TCF-1 CUT&RUN in DN1, DN2 and DN3 cells n=2 independent animals. Bars represent mean number of binding sites, individual replicate data points are overlaid.

h) L1 dependent and independent TCF-1 binding sites in DN1 and DN2 cells.

i) Boxplot representing distance to TSS in bp (top) and read normalized ATAC coverage for groups of binding sites described in (h.) (bottom). Center line of box plots represent median, bounds of box represent 1st and 3rd quartiles, whiskers represent maximum and minimum, data points represent outliers.

j) Cumulative distribution of genes within 1,000bp of a WT TCF-1 binding site shared or unique to DN1/DN2 and change in expression between DN1/DN2 (Log2 fold change). P values calculated by two-sample two-sided Kolmogorov-Smirnov test on n=2 independent animals, with 2–3 technical replicates. WT TCF-1 only bound in DN1 vs. shared DN1 & DN2 P=3.7e-6, WT TCF-1 only bound in DN2 vs. shared DN1 & DN2 P=2.2e-16, WT TCF-1 only bound in DN1 vs. only bound DN2 P=2.53e-13.

k) Genome browser view of Gata2, Mcpt and Gata3.

In DN2s, the chromatin landscape of wild type TCF-1 and ΔL1 expressing cells diverged extensively. We measured loss of chromatin accessibility in ΔL1 expressing DN2s compared with wild type TCF-1 expressing counterparts in ~3,000 genomic regions, while an extensive gain in chromatin accessibility was measured in ~2,800 genomic regions (Fig. 4b). Motif enrichment at genomic loci which lost accessibility in ΔL1 expressing DN2s compared with wild type DN2s showed an enrichment for RUNX1, STAT2, ETV4, and TCF motifs (Fig. 4c). We mapped chromatin accessibility levels along with binding intensity of relevant TFs including RUNX1, GATA2, and GATA3 at the lost regions. We observed that the majority of these sites were accessible in DN1s and required wild type TCF-1 to maintain accessibility in DN2s (Fig. 4d). A smaller number of sites showed L1 dependent de novo opening in DN2s (cluster 2, Fig. 4d). In particular, these sites were correspondingly bound by GATA3 and RUNX1 in DN1s (ChIP columns, Fig. 4d). These data suggested a requirement for the L1 domain to maintain accessibility at regions co-bound by RUNX1.

Sites that gained accessibility in ΔL1 expressing DN2s compared with wild type counterparts showed an enrichment for GATA, AP1, and NFAT motifs but the TCF motif did not appear to be enriched at these sites (Fig. 4e). We mapped chromatin accessibility along with TF binding at the ~2,800 regions which gained accessibility in ΔL1 expressing DN2s compared with wild type counterparts (Fig. 4f). These de novo accessible sites in ΔL1 DN2s were inaccessible across DN1s and in wild type TCF-1 expressing DN2s. Correspondingly, we observed robust binding of these same loci by GATA2 in mast cells and no substantial binding of GATA3 or RUNX1 in DN1s (Fig. 4f). Altogether, the L1 domain of TCF-1 is dispensable for early changes to chromatin accessibility in DN1s. Divergence in the accessibility landscape occurred as development progressed to the DN2 stage, a stage at which the L1 domain is required to repress GATA2-induced chromatin accessibility.

L1 is required for TCF-1 binding in early T cell development

We next reasoned that the early defect in progenitors expressing ΔL1 may instead be attributed to a requirement for the L1 domain in the initial step of targeting chromatin at genomic regions. We mapped genome-wide binding profiles of wild type and ΔL1 TCF-1 in DN1s and DN2s using CUT&RUN (Table S1). Remarkably, we observed a roughly 90% reduction in global binding of ΔL1 TCF-1 in DN1s compared with the wild type counterpart. Only 4,265 binding events were detected for ΔL1 TCF-1 compared to 39,867 binding events for wild type TCF-1 (Fig. 4g). All sites bound by ΔL1 TCF-1 overlapped with sites bound by wild type TCF-1 in DN1s (Extended Data Fig. 5b). In DN2s the divergence in binding profiles narrowed, where wild type TCF-1 bound at 65,576 sites compared to 24,082 sites bound by ΔL1 TCF-1 (Fig. 4g). The majority of binding events in DN2s were shared between wild type and ΔL1 mutant; however, 4,006 sites were uniquely bound by ΔL1 TCF-1 (Extended Data Fig. 5b). Together, ΔL1 TCF-1 has a major defect in binding DNA in DN1s.
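The size of the binding defect can be checked directly from the site counts reported above. The short calculation below uses only those four numbers; the function name is an illustrative choice.

```python
def percent_reduction(wt_sites: int, mutant_sites: int) -> float:
    """Percentage of wild type binding sites lost in the mutant."""
    return 100.0 * (wt_sites - mutant_sites) / wt_sites


# Site counts as reported in the text (Fig. 4g).
dn1_loss = percent_reduction(39_867, 4_265)
dn2_loss = percent_reduction(65_576, 24_082)
print(f"DN1: {dn1_loss:.1f}%  DN2: {dn2_loss:.1f}%")  # DN1: 89.3%  DN2: 63.3%
```

The DN1 value of about 89% matches the reduction described in the text, and the smaller loss of about 63% in DN2s reflects the narrowing of the binding divergence at that stage.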

We utilized a dimensionality reduction strategy for genomic regions demonstrating TCF-1 binding and chromatin accessibility across conditions using PCA (Extended Data Fig. 5c). Wild type TCF-1 binding did not colocalize with accessibility measurements in cells at the corresponding stage. Additionally, ΔL1 TCF-1 binding did not overlap with wildtype TCF-1 binding in DN2s and instead clustered more closely with accessibility measurements in DN2s (Extended Data Fig. 5c). Hence, the binding of wild type TCF-1 at distinct stages is not dictated by chromatin accessibility, consistent with previous reports of TCF-1’s ability to bind to nucleosome occupied DNA5. Furthermore, this intrinsic property of TCF-1 is endowed by the L1 domain.

To characterize the mechanism through which the L1 domain might affect TCF-1 binding, we delineated binding events in both DN1s and DN2s for which binding was dependent or independent of the L1 domain (Fig. 4h). In each of these cell types, we observed that binding events that depended on the presence of the L1 domain were more distant from promoters and showed lower chromatin accessibility than sites that were bound independently of the L1 domain (Fig. 4i). We performed de novo motif analysis and observed an enrichment of ETS and RUNX motifs, but not TCF-1’s cognate motif, at L1 dependent sites bound by TCF-1 in DN1s compared to DN2s (Extended Data Fig. 5d). Despite a requirement for the L1 domain in binding of TCF-1 to early DN1 targets, a corresponding L1 dependency in creating de novo chromatin accessibility in DN1s or DN2s was not detected (Extended Data Fig. 5e). Together, the L1 domain was required for binding of TCF-1 at distal regions with a low level of chromatin accessibility and low enrichment for TCF-1’s cognate motif at early stages of T cell development.
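
One way to picture the split into L1 dependent and L1 independent binding events is as an interval-overlap problem. The sketch below is an illustration under stated assumptions, not the pipeline used in this study: it assumes a wild type peak counts as L1 independent when any ΔL1 peak overlaps it by at least 1 bp, and the names `Peak`, `overlaps` and `split_by_l1_dependence` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def split_by_l1_dependence(
    wt_peaks: List[Peak], dl1_peaks: List[Peak]
) -> Tuple[List[Peak], List[Peak]]:
    """Split WT peaks into (L1 dependent, L1 independent).

    Assumption: a WT peak with no overlapping dL1 peak is L1 dependent.
    The quadratic scan is fine for a toy example; genome-scale data would
    normally go through an interval-tree or sorted-sweep tool instead.
    """
    dependent, independent = [], []
    for peak in wt_peaks:
        if any(overlaps(peak, other) for other in dl1_peaks):
            independent.append(peak)
        else:
            dependent.append(peak)
    return dependent, independent


# Toy coordinates, made up for illustration only.
wt = [Peak("chr6", 100, 300), Peak("chr6", 1_000, 1_200), Peak("chr2", 50, 150)]
dl1 = [Peak("chr6", 250, 400)]
dependent, independent = split_by_l1_dependence(wt, dl1)
print(len(dependent), "L1 dependent;", len(independent), "L1 independent")
```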

L1 is required for stage dependent transcriptional outcomes

To determine the consequences of L1 dependent binding at distinct stages, we linked TCF-1 binding with the transcriptional regulation of its target genes. We selected TCF-1 target genes based on detection of wildtype TCF-1 binding events within 1,000bp of genes’ transcriptional start sites (TSS) and evaluated gene expression difference in DN1s and DN2s in three classes defined by shared and unique binding of TCF-1 in DN1s and DN2s (Extended Data Fig. 5b). Genes bound by TCF-1 in both DN1s and DN2s were moderately expressed in DN1s and showed no increase in expression in DN2 (yellow in Fig. 4j). In contrast, genes bound by TCF-1 specifically in DN2s were biased to DN2 specific gene expression (blue in Fig. 4j). Notably, genes bound by TCF-1 in DN1s did not coincide with increased expression subsequently in DN2s (red in Fig. 4j) – suggesting an early role of transient TCF-1 occupancy in pre-emptive gene repression. With these data we reasoned that the effects of reduced binding by ΔL1 in DN1s preferentially impacted suppression of alternative lineage genes including Gata2 and other mast cell genes (Fig. 4k left and middle) at which we observed a corresponding decrease in DN1 ΔL1 occupancy and an increase in subsequent lineage inappropriate chromatin accessibility. The binding disparity in DN2s may underlie inefficient T cell gene activation including at Gata3 (Fig. 4k right). Together these data provided a model through which TCF-1 orchestrates transcriptional control to allow T cell developmental competence. This model postulates a transient early wave of L1 dependent TCF-1 binding, as we observed at the Gata2 locus, to regions with low TCF motif enrichment, low chromatin accessibility, and frequently enriched for RUNX1 binding at target genes whose expression is inhibited in T cells. A second wave of binding of TCF-1 in DN2s occurs at regions enriched for TCF-1 motifs and promotes T cell specific gene activation, as illustrated at the Gata3 locus.
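
The target-gene assignment described above (a binding event within 1,000 bp of a TSS, then three classes by stage) can be sketched as follows. This is a minimal illustration, assuming each binding event is reduced to a single summit coordinate; the text does not state how distance to the TSS was measured, and the gene names, coordinates and function name here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

TSS_WINDOW_BP = 1_000  # distance cutoff stated in the text


def genes_near_peaks(
    peak_summits: Iterable[Tuple[str, int]],
    tss_by_gene: Dict[str, Tuple[str, int]],
    window: int = TSS_WINDOW_BP,
) -> Set[str]:
    """Genes whose TSS lies within `window` bp of any summit on the same chromosome."""
    summits = defaultdict(list)
    for chrom, pos in peak_summits:
        summits[chrom].append(pos)
    return {
        gene
        for gene, (chrom, tss) in tss_by_gene.items()
        if any(abs(pos - tss) <= window for pos in summits.get(chrom, []))
    }


# Toy annotation and summits, made up for illustration only.
tss = {"GeneA": ("chr1", 5_000), "GeneB": ("chr1", 20_000), "GeneC": ("chr2", 800)}
dn1_bound = genes_near_peaks([("chr1", 5_400), ("chr2", 900)], tss)
dn2_bound = genes_near_peaks([("chr1", 19_500), ("chr2", 1_500)], tss)

dn1_only = dn1_bound - dn2_bound  # bound in DN1s only
shared = dn1_bound & dn2_bound    # bound in both stages
dn2_only = dn2_bound - dn1_bound  # bound in DN2s only
print(sorted(dn1_only), sorted(shared), sorted(dn2_only))
```

Expression differences between DN1s and DN2s would then be compared across the three resulting gene sets.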

L1 can be functionally substituted with another unstructured domain

The interchangeable nature of IDRs of proteins has been described previously38. To determine if the L1 domain of TCF-1 could be functionally replaced with another previously characterized IDR, we designed a construct in which the L1 region was replaced with the C terminal domain (CTD) of EBF-139,40 (Fig. 5a,b). We referred to this construct as ΔL1 + EBF1 CTD. Surprisingly, we observed a significant rescue in both the absolute number and percentage of Thy1+ CD25+ cells when progenitors were transduced with ΔL1 + EBF1 CTD, unlike the progenitors transduced with ΔL1 (Fig. 5c,d). The expression of ΔL1 + EBF1 CTD also rescued the defect in the induction of the GFP reporter in transduced cells (Fig. 5e). We next determined if this unrelated IDR could also rescue the defect in ΔL1’s ability to target chromatin in DN1s. We mapped the global binding events of ΔL1 + EBF1 CTD TCF-1 in DN1s using CUT&RUN (Fig. 5f). Remarkably, ΔL1 + EBF1 CTD TCF-1 showed binding to a substantially increased number of genomic sites compared to ΔL1 TCF-1, including the Rag, Il2ra, and Lef1 loci, although it did not completely recapitulate the binding profile of wild type TCF-1 (Fig. 5f-i). Together, the T cell developmental defect associated with the loss of the L1 domain was linked to TCF-1’s ability to access its full range of binding sites in DN1s, and this defect could be rescued with another TF’s IDR.

Fig. 5. L1 can be functionally substituted with another unstructured domain.

a) Schematic representation of wild type (WT) isoform of TCF-1 (P45), mutant lacking the L1 domain (ΔL1) and mutant in which the L1 domain is replaced with the C terminus of EBF1 (EBF-1 CTD).

b) Immunoblot analysis of 293T cells transfected with wild type (WT) TCF-1 and mutant TCF-1 constructs; ΔL1 and ΔL1 + EBF1 CTD. Immunoblot is probed with TCF-1 antibody and H3 as a loading control.

c) Identification of Thy1+ CD25+ cells in OP9-DLL1 co-cultures of Tcf7 cKO cells transduced with empty vector (EV), wild type (WT) TCF-1, or mutant TCF-1 (ΔL1 and ΔL1 + EBF1 CTD) on day 5 after in vitro differentiation. All cells were pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+, transduced (Vex+).

d) Quantification of frequency (left) and numbers (right) of Thy1+ CD25+ cells from Tcf7 cKO cells on day 5 after in vitro differentiation on OP9-DLL1 cells (in c). Bars represent mean values from n=2 independent animals, individual replicate data points are shown.

e) Representative flow cytometric analysis identifying transduced (Vex+) GFP+ cells (Tcf7 eGFP reporter). All cells were pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+.

f) Quantification of binding sites identified by TCF-1 CUT&RUN in DN1s transduced with WT TCF-1, ΔL1, ΔL1 + EBF1 CTD, and EV. CUT&RUN experiments for each population were performed in at least two biological replicates. Bars represent mean number of binding sites from n=2 independent animals, individual data points are shown.

g) Principal component analysis of TCF-1 CUT&RUN in Tcf7 cKO DN1 and DN2s transduced with wild type (WT) TCF-1, ΔL1, ΔL1 + EBF1 CTD, and EV. CUT&RUN experiments for each population were performed in at least two biological replicates.

h-i) Genome browser view of Il2ra, Rag1/2, and Lef1 loci visualizing CUT&RUN profiles of DN1s and DN2s from OP9-DLL1 co-culture of Tcf7 cKO cells at day 7.

Limited effect of L1 on chromatin accessibility in committed T cells

We next sought to analyze how deletion of L1 or other domains in the N-terminus of TCF-1 affects chromatin accessibility and gene regulation at a post-commitment stage of T cell development in which cells cannot adopt alternative fates to T cells. Hence, we utilized a gene-replacement strategy in a post T cell commitment Tcf7−/− pro-T cell line, DN3-like Scid.adh cells, abbreviated as DN37,21 (Fig. 6a). We first ablated endogenous TCF-1 with CRISPR/Cas9 in DN3s7 and then “replaced” expression with wild type TCF-1 or mutant TCF-1. We measured transcriptional outputs and found ΔL1 expressing DN3s clustered more closely with cells expressing wild type TCF-1 while ΔL7 expressing cells were closer to TCF-1 deficient EV-transduced cells (Extended Data Fig. 6a). Surprisingly, deletion of the L7 region of TCF-1 was more detrimental to TCF-1 dependent gene regulation at the DN3s than deletion of the L1 region (Extended Data Fig. 6b-d).

Fig. 6. Loss of L1 has limited effect on chromatin accessibility in committed T cells.

a) Experimental design of gene replacement strategy using retroviral transduction of wild type (P45 isoform) or mutant TCF-1 in Scid.adh (DN3) cells after CRISPR/Cas9 disruption of endogenous TCF-1.

b) Quantification of L1 dependent and independent binding sites detected with TCF-1 CUT&RUN in DN1, DN2, and Scid.adh DN3 cells. Bars represent total number of binding sites.

c) Principal component analysis depicting ATAC-seq in Tcf7−/− DN3 cells transduced with wild type (WT), empty vector (EV), or mutant TCF-1 (ΔL1, ΔL2, ΔL6, or ΔL7).

d) Volcano plot demonstrating differential accessibility of ATAC-seq peaks between EV transduced and WT TCF-1 (left), ΔL1 (middle) and ΔL7 (right) transduced Tcf7−/− DN3 cells. (adjusted P<0.05 and |Log2FoldChange|>1). P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

e) Heatmap demonstrating TCF-1 and ΔL1 binding measured by CUT&RUN and chromatin accessibility in WT and mutant TCF-1 (ΔL1, ΔL2, ΔL6, and ΔL7) transduced cells at peaks significantly open in WT TCF-1 compared to EV transduced Tcf7−/− DN3 cells. (adjusted P<0.05 and |Log2FoldChange|>1). P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

f-g) Genome browser view of Il2ra and Rag1/2 loci depicting TCF-1 CUT&RUN and ATAC-seq profiles in wild type (WT) TCF-1, empty vector (EV), and mutant TCF-1 (ΔL1, ΔL2, ΔL6, and ΔL7) transduced Tcf7−/− DN3.

h) Depiction of L1 dependent TCF-1 protein-protein interaction (PPI) network identified by mass spectrometry (MS) of TCF-1 immunoprecipitation in DN3 cells. All Interactions were filtered and ranked (see methods) to identify proteins with enrichment in WT-TCF-1 vs. EV and ΔL1 immunoprecipitations. Network was filtered based on first neighbor nodes of Tcf7. Node size and color indicate fold change in normalized abundance between DN3 cells expressing WT TCF-1 and EV.

i) RUNX1 co-immunoprecipitation with separate parallel immunoblotting for RUNX1 and TCF-1. Bar plot depicts mean FLAG protein level quantification normalized to 5% input. Data points indicate three quantifications per condition and error bars represent standard deviation. Results are representative of n=2 biologically independent samples.

In DN3s, wild type TCF-1 bound 62,046 sites, while ΔL1 TCF-1 bound to 36,448 sites as measured by CUT&RUN (Extended Data Fig. 6e). The comparison of TCF-1 binding data across stages suggested that the progression of cells between DN1 and DN3 coincided with a reduction in the percentage of sites that depended on the L1 domain for binding (Fig. 6b and Extended Data Fig. 6e,f). We next analyzed chromatin accessibility measured by ATAC-seq in TCF-1 mutant expressing DN3s. PCA of the chromatin accessibility in mutant TCF-1 expressing DN3s displayed a distinct epigenetic state compared to either EV or wild type TCF-1 expressing controls. Moreover, the ΔL1 replaced cells exhibited a closer relationship to wild type TCF-1 transduced cells, while the ΔL7 replaced cells were closer to TCF-1 deficient EV-transduced cells (Fig. 6c). Wild type TCF-1 transduction led to a significant gain in chromatin accessibility, while the ΔL1 and ΔL7 TCF-1 established only 230 and 88 accessible regions, respectively, at which accessibility was gained compared to EV, with the greatest defect observed in ΔL7 replaced cells (Fig. 6d). We performed k-means clustering on chromatin accessibility and visualized data using heatmaps across mutant and wild type TCF-1 replaced DN3s at the 2,141 genomic sites significantly more open with wild type TCF-1 compared to EV-transduced DN3s (Fig. 6e). Both ΔL6 and ΔL7 showed a greater reduction in creating open chromatin regions compared to the ΔL1 and ΔL2 relative to wild type TCF-1 as illustrated at the Il2ra and Rag1-Rag2 loci (Fig. 6e-g). These findings suggest the importance of the L7 region for TCF-1’s functionality after T cell commitment.
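
The counts of regions that gained accessibility rest on the thresholds given in the Fig. 6 legend (Benjamini and Hochberg adjusted P<0.05 and |Log2FoldChange|>1). The sketch below shows how such a filter can be applied once per-peak Wald test P-values are available. It is a plain-Python illustration, not the software used in this study; the function names are ours, and the sign convention (positive log2 fold change means more accessible than EV) is an assumption.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def gained_peaks(
    log2_fold_changes: Sequence[float],
    pvalues: Sequence[float],
    padj_cutoff: float = 0.05,
    lfc_cutoff: float = 1.0,
) -> List[int]:
    """Indices of peaks with adjusted P < cutoff and log2 fold change > cutoff."""
    padj = benjamini_hochberg(pvalues)
    return [
        i
        for i, (lfc, q) in enumerate(zip(log2_fold_changes, padj))
        if q < padj_cutoff and lfc > lfc_cutoff
    ]


# Toy values, made up for illustration only.
lfc = [2.3, 0.4, 1.5, -1.8]
pvals = [0.001, 0.0005, 0.04, 0.002]
print(gained_peaks(lfc, pvals))
```

In this toy input the second peak fails the fold-change cutoff and the fourth has lost accessibility, so only the first and third peaks are reported as gained.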

L1-dependent interaction between RUNX1 and TCF-1

Recent studies on protein interactions mediated by IDRs of TFs highlighted the formation of biomolecular condensates or foci representing high local concentrations of TFs and transcriptional machinery15,40. We generated constructs in which wild type TCF-1 and ΔL1 were fused with GFP, transduced DN3s with GFP fusion constructs and visualized cells with confocal microscopy. The GFP signal in both wild type and ΔL1 TCF-1 localized to the nucleus with distinct granular morphology compared to an EV control in which GFP alone is expressed homogeneously in both the cytoplasm and nucleus (Extended Data Fig. 7a). This morphology was not consistent with discrete foci; however, we determined the GFP signal associated with both wild type and ΔL1 TCF-1 to be more granular within the nucleus than GFP alone (Extended Data Fig. 7a). Hence, a local partitioning of TCF-1 in the nucleus does not depend on the L1 domain.

To identify proteins that could interact with TCF-1 in an L1 dependent manner in DN3s, we performed affinity purification of FLAG tagged wild type and ΔL1 TCF-1 followed by liquid chromatography with tandem mass spectrometry (LC-MS/MS). Identified interactors were scored by enrichment in the immunoprecipitation of wild type TCF-1 compared to ΔL1 and EV control (Fig. 6h). We constructed a network of top L1 dependent putative protein-protein-interactions (PPIs) based on the extent of enrichment between wild type TCF-1 and EV immunoprecipitations (Fig. 6h, Extended Data Fig. 7b,c, Table S1). Uniprot keywords “acetylation”, “phosphoprotein” and “nucleus” were significantly enriched in the network (Extended Data Fig. 7d). Notably, we identified RUNX1 (with cofactor CBFB41) and Tle3 as L1 dependent interactors (Fig. 6h). The TCF-1 and Tle3 interactions have been described previously to partition Tle3 between TCF-1 and RUNX1/3 in CD8+ T cell lineage specification42. We validated the L1-mediated association of RUNX1 with TCF-1 by co-immunoprecipitation (Fig. 6i). Together, the L1 dependent interaction of RUNX1 and chromatin associated proteins with TCF-1 enables TCF-1-dependent gene regulation. Furthermore, the interaction between TCF-1 and RUNX1 mediated by the L1 domain can occur at early stages of T cell development as well as in post commitment DN3s and likely has functional significance.

Discussion

Pioneering work over 30 years ago identified TCF-1 as an exquisitely tissue specific factor that binds DNA in the minor groove to distort and bend the double helix10. In the intervening years, the molecular mechanisms of TCF-1’s function in development and disease have come to light. Despite these advances, key questions remain about the role of non-DNA binding domains of TCF-1. In this study, we showed that distinct regions within the N-terminus of TCF-1 have integral roles in orchestrating T cell development. We uncovered L1, an IDR within the N-terminus of TCF-1 that was required for efficient early T cell development. Bone marrow progenitors that lacked L1 were unable to upregulate T cell identity genes and showed a marked de-repression of GATA2 target genes normally restricted to mast cell and dendritic cell lineages. The L1 region of TCF-1 facilitated early binding to inaccessible loci lacking the TCF motifs, which corresponded to genes repressed later in T cell development. This impact on early binding was linked to the inability of ΔL1 expressing cells to progress developmentally. Rescue of early binding and development was achieved by substituting a heterologous disordered domain for the L1 domain. We additionally identified L7, a region flanking the DNA binding domain of TCF-1 that contributed to TCF-1 dependent chromatin opening and gene regulation in a T cell committed DN3 cell line, but whose loss did not contribute to a developmental block in primary early T cells.

TCF-1 is the earliest mediator of T cell specific gene control and as such is positioned to reshape cell fate. Pioneer TFs can interface with repressed chromatin and shape cell identity, while other TFs are limited to sites within already accessible chromatin2. Pioneer factors can engage with compacted chromatin but may still require recruitment of other factors to affect sustained changes2. The L1 domain was not only required for binding of TCF-1 in DN1s, but also for an association with RUNX1 and its obligate cofactor CBFB. Whether this interaction is direct or whether the L1 domain enables TCF-1 to bind at RUNX1 co-occupied regions remains unclear. Recent reports have described dynamic genomic occupancy and transcriptional control by RUNX1/3 during T cell development, enabling distinct associations with co-factors24. In one example, early expression of PU.1 can lead to a redistribution of RUNX1 binding43. In such cases, TCF-1-mediated repression of PU.1 may facilitate the L1-dependent co-binding of RUNX1 and TCF-1, thereby promoting T cell development.

A reductionist view of TFs separates DNA binding and effector functions into modular distinct domains. However, a large body of work demonstrates that non-DNA binding domains often enable TFs to bind compacted chromatin and initiate chromatin opening25,40,44,45. The function of non-DNA binding domains intersects the sequential process through which TFs function and interact with chromatin. Here, the deletion of the L1 domain had a distinct impact on TCF-1 binding and chromatin opening. The lack of binding stability, separate from chromatin opening, suggests a regulatory mechanism where continuous occupancy is not essential. Instead, a transient “hit and run” binding event may initiate accessibility in this early context, allowing other partner factors to bind and sustain accessibility. In later stages of T cell development, TCF-1 binding was less dependent on the L1 domain. This highlights the specific requirement for non-DNA binding domains early in developmental trajectories before cell specification when the chromatin landscape has not been extensively acted upon by other factors.

METHODS

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Cell culture

The Scid.adh cell line, a pro-T cell line derived from spontaneous thymic lymphomas49, was a kind gift from Warren Pear’s lab at the University of Pennsylvania. These cells were grown in RPMI 1640 medium (Invitrogen), supplemented with 10% fetal bovine serum (FisherScientific), 1mM sodium pyruvate (Gibco), 1% non-essential amino acids (Gibco), 2 mM L-Glutamine (Lonza), 1% penicillin-streptomycin and 0.1% 2-Mercaptoethanol (Gibco). OP9-ctrl, OP9-DLL1, and OP9-DLL4 cells were a kind gift from Ivan Maillard’s lab at the University of Pennsylvania. These cells were maintained in αMEM (Invitrogen), supplemented with 20% fetal bovine serum and 1% penicillin-streptomycin. HEK293T cells were purchased from ATCC (Cat# CRL-3216; RRID:CVCL_0063). HEK293T cells were maintained in high glucose DMEM medium 1X with L-Glutamine (Invitrogen), supplemented with 100 U/mL penicillin and 100 mg/mL streptomycin (Gibco) with 10% FBS. NIH 3T3 cells were purchased from ATCC (Cat# CRL-1658; RRID:CVCL_0594). NIH 3T3 cells were maintained in high glucose DMEM medium 1X with L-Glutamine (Invitrogen), supplemented with 100 U/mL penicillin and 100 mg/mL streptomycin (Gibco) with 10% Bovine Serum, heat inactivated (Thermo). Cells were maintained at low passage number (< 12), at 70–80% confluency. All cells were grown at 37°C and 5% CO2. Cell lines were not authenticated. All cell lines were tested periodically for mycoplasma contamination, and none was detected. Commonly misidentified cell lines were not used.

Mice

Female and male breeder Vav-iCre transgenic mice (Strain #008610) 30,50,51 and Tcf7eGFP mice (Strain #030909) 31 were purchased from Jackson Laboratory. “Tcf7−/− cKO” mice were generated by breeding Tcf7eGFP mice, in which 2 loxP sites are inserted on either side of exon 2 of the Tcf7 gene, with Vav-iCre mice. The F1 generation was backcrossed to Tcf7eGFP mice to reach homozygous floxed Cre+ experimental mice (Tcf7 cKO). All mice were bred and housed in an American Association for the Accreditation of Laboratory Animal Care (AAALAC) accredited vivarium at the University of Pennsylvania. All husbandry and experimental procedures were performed according to the protocol reviewed and approved by the Institutional Animal Care and Use Committee (IACUC). Mice were fed with 5010 - Laboratory Autoclavable Rodent Diet (LabDiet), and were maintained on a 12 h light/12 h dark cycle, at 18–23°C and 40–60% humidity. Experimental and control mice were 6–10 weeks old and of either sex. At least 2 biological replicate mice of matching age and sex were used for each experiment.

Cell preparation

Single-cell suspensions were prepared from the bone marrow (BM) removed from the femur and tibiae of 6–8 week old C57BL6/J or Tcf7 cKO mice. Ckit+ BM cells were enriched with the EasySep Mouse CD117 (cKIT) Positive Selection kit according to manufacturer instructions. Enriched cells were co-cultured on OP9 monolayers or stained for LSK sorting. For LSK sorting, cells were stained with LD Aqua, a combination of lineage antibodies (Ter119, CD3, NK1.1, GR1, TCRgd, TCRb, Cd11c, Cd19, B220, CD11b; all diluted 1:200), Sca1 (dilution- 1:200), and Ckit (dilution- 1:300) and were sorted for Viability-, Lin-, Ckit+, Sca1+. Ckit+ or sorted LSK cells were activated overnight in IMDM media supplemented with 20% FBS, 1% penicillin streptomycin, SCF (100ng/ml), IL-6 (5ng/ml) and IL-3 (10ng/ml). Transduced cells were plated the following day on OP9 monolayers in OP9 media supplemented with 5 ng/mL Flt-3L and 1 ng/mL IL-7 for 5, 7, or 13 days. Co-cultures were passaged every 4–5 days by gently disrupting cells, passing them through a 40um cell strainer (Falcon) and transferring them onto new OP9 monolayers. Cells from co-cultures were stained with L/D APCef780 (dilution 1:4000), and fluorescent antibodies to B220 (dilution- 1:300), CD44 (dilution- 1:400), CD45 (dilution- 1:400), Thy1.2 (dilution- 1:300), Ckit (dilution- 1:300), CD25 (dilution- 1:350), and CD11b (dilution- 1:200). Sorting was performed on a BD FACSAria after 7 days to isolate DN1 (CD45+c-KithiCD44hi CD25low), DN2s (CD45+c-KitlowCD44low CD25high) and DN3 cells (CD45+ Ckitlow CD44low CD25low).

Cloning/Generation of TCF-1 mutants

FLAG-tagged MSCV GFP-TCF-1 constructs for the long (P45) and short (P33) isoforms of TCF-1 as well as mutants ΔL1 - ΔL5 were a kind gift from Hai-Hui Xue. To create ΔL6 and ΔL7 mutants, deletion flanking primers were used with the Q5 site-directed mutagenesis kit according to manufacturer’s instructions. TCF-1 P45 Vex MSCV constructs5 were utilized with deletion flanking primers and the Q5 site-directed mutagenesis kit to create all mutants on a Vex MSCV backbone. Mutant TCF-1 ΔL1 + EBF1 CTD was constructed with the NEBuilder HiFi Assembly Kit and PCR based cloning with primers designed to amplify the 489bp region encoding EBF-1’s CTD region with overlaps flanking the L1 domain of TCF-1 on the Vex MSCV backbone. Constructs in which WT TCF-1 and ΔL1 were fused to GFP, along with an EV GFP construct, were created with the NEBuilder HiFi Assembly Kit and PCR based cloning into a custom pMSCV-derived plasmid containing an EGFP variant (with monomerizing A206K mutation) (“mEGFP”) downstream of the mouse PGK1 promoter. Human ΔL1 TCF-1 was created using the Q5 site-directed mutagenesis kit according to manufacturer’s instructions; both human WT TCF-1 and the corresponding mutant ΔL1 were cloned into lentiviral LRG2.1 downstream of the U6 promoter using the NEBuilder HiFi Assembly Kit and PCR based cloning. All constructs were confirmed by Sanger sequencing.

Transduction for Tcf7 KO in Scid.adh cell line and primary Ckit+ BM or LSK cells.

The CRISPR/Cas9 system was used to delete TCF-1 in Scid.adh cells as described previously7. Tcf7 KO Scid.adh cells were transduced by adding retroviral supernatants to culture medium supplemented with polybrene (8mg/ml), followed by spinfection at 700 × g for 25 minutes. 72hrs after transduction, live transduced cells were sorted for downstream experiments. Retroviral transduction of Ckit+ BM and LSK cells was performed by spinfection of cells with equal volumes of viral supernatants for 90 min at 1300 × g (RT). After 4hrs, virions were diluted with fresh IMDM media and cells were returned to the incubator overnight. Cells were plated on OP9 monolayers the following day.

Retroviral Packaging

For retroviral packaging of mutant TCF-1 plasmids (GFP MSCV or Vex MSCV backbone), 4 × 10^6 293T cells were plated in 4 mL DMEM media in 10 cm dishes on the day prior to transfection. Immediately before transfection, chloroquine was added to the media to a final concentration of 25 mM. The retroviral construct/empty vector and the pCL-Eco plasmid were transiently co-transfected using Lipofectamine 3000 (Invitrogen). The cells were returned to the incubator for 6 hours. Subsequently, the medium was changed to fresh media. Virions were collected 24 and 48 hrs after transfection, snap-frozen, and stored at −80°C for future use.

Western Blot

Western blotting was performed on whole cell lysates from transduced 3T3 and DN3 cells, and transfected 293T’s. Cells were lysed with 1X RIPA buffer supplemented with proteinase inhibitor cocktail. Equal numbers of cells per condition were utilized and equal volumes of lysate were loaded on a NuPAGE 4–12% Bis-Tris gel and transferred using the iBlot 2 Gel Transfer Device. Membranes were blocked with 5% non fat dry milk in 1X TBST buffer followed by incubation with primary anti mouse M2 FLAG Ab (Millipore Sigma Cat# F1804; dilution- 1:1000), Mouse anti-RUNX1 (Santa Cruz, sc-365644; dilution- 1:200), and rabbit anti-mouse vinculin (Santa Cruz, sc-25336; dilution- 1:200) and finally probed with HRP conjugated anti-rabbit IgG (CST Cat# 7074, dilution- 1:2000) or anti-mouse IgG (CST Cat# 7076, dilution- 1:2000) secondary antibodies. Blots were visualized with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Scientific) on the ChemiDoc Imaging system (Bio-Rad).

Co-Immunoprecipitations for Western Blot Analysis

Co-immunoprecipitations (Co-IP) were performed as described52. Antibodies were conjugated to protein G beads, including FLAG antibody (6μg, Sigma, F1804), anti-TCF7 antibody (6μg, Cell Signaling Technology, C63D9) or anti-RUNX1 antibody (6μg, Abcam Ab23980 RUNX1). Beads were washed in blocking buffer three times and clarified lysate was incubated with the antibody conjugated beads rotating overnight at 4 °C. The mixture was washed with immunoprecipitation buffer without supplements three times and eluted by boiling in NuPAGE loading dye (Invitrogen) at 95 °C for 5 minutes. Samples were analyzed by western blotting. Western blots were quantified using FIJI53 (ImageJ2 Version 2.9.0) to assess protein density. FLAG IP protein density was normalized to input protein density. Quantification of band density was performed three times for each condition, and error bars represent the standard deviation of these three quantifications.
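
The densitometry summary described above amounts to a ratio followed by a mean and standard deviation over the three repeated quantifications. A minimal sketch, assuming the sample (n−1) standard deviation (the text does not say which form was used) and using made-up band densities:

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def normalized_ip_signal(
    ip_densities: Sequence[float], input_densities: Sequence[float]
) -> Tuple[float, float]:
    """Mean and sample SD of IP band density divided by input band density."""
    ratios = [ip / inp for ip, inp in zip(ip_densities, input_densities)]
    return mean(ratios), stdev(ratios)


# Three repeated quantifications of one condition (arbitrary units, illustrative).
flag_ip = [1200.0, 1100.0, 1300.0]
input_5pct = [1000.0, 1000.0, 1000.0]
avg, sd = normalized_ip_signal(flag_ip, input_5pct)
print(f"normalized signal: {avg:.2f} +/- {sd:.2f}")
```

Because the three values are repeated measurements of the same blot, the resulting standard deviation reflects quantification variability rather than biological variability.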

Co-Immunoprecipitations for Mass Spectrometry Analysis

For samples analyzed by mass spectrometry, the following modifications were made to the Co-IP protocol. Cells for 5% input were lysed separately with a non-detergent lysis buffer (6M urea, 2M thiourea in 50mM ammonium bicarbonate (pH 8)). After the overnight incubation, the beads were washed once with immunoprecipitation buffer, and then twice with non-detergent immunoprecipitation buffer (20mM Tris, pH 7.5, 137 mM NaCl, 1 mM MgCl2, 1mM CaCl2). On-bead digestion of protein was performed by incubating the beads in 50mM TEAB and 5mM DTT at room temperature for 60 minutes, shaking at 1200 rpm. Iodoacetamide was added to the mixture at a concentration of 20mM, and shaking was continued at 1200 rpm in the dark for 60 minutes. Trypsin was added to the mixture, which was incubated overnight with shaking at 900 rpm. The samples were frozen at −80 °C and then analyzed with mass spectrometry.

Sample desalting

Prior to mass spectrometry analysis, samples were desalted using a 96-well plate filter (Orochem) packed with 1 mg of Oasis HLB C-18 resin (Waters). Briefly, the samples were resuspended in 100 μl of 0.1% TFA and loaded onto the HLB resin, which was previously equilibrated using 100 μl of the same buffer. After washing with 100 μl of 0.1% TFA, the samples were eluted with a buffer containing 70 μl of 60% acetonitrile and 0.1% TFA and then dried in a vacuum centrifuge.

LC-MS/MS Acquisition and Analysis

Samples were resuspended in 10 μl of 0.1% TFA and loaded onto a Dionex RSLC Ultimate 300 (Thermo Scientific), coupled online with an Orbitrap Fusion Lumos (Thermo Scientific). Chromatographic separation was performed with a two-column system, consisting of a C-18 trap cartridge (300 μm ID, 5 mm length) and a picofrit analytical column (75 μm ID, 25 cm length) packed in-house with reversed-phase Repro-Sil Pur C18-AQ 3 μm resin. To analyze the proteome, peptides were separated using a 60 min gradient from 4–30% buffer B (buffer A: 0.1% formic acid, buffer B: 80% acetonitrile + 0.1% formic acid) at a flow rate of 300 nl/min. The mass spectrometer was set to acquire spectra in a data-dependent acquisition (DDA) mode. Briefly, the full MS scan was set to 300–1200 m/z in the orbitrap with a resolution of 120,000 (at 200 m/z) and an AGC target of 5×10e5. MS/MS was performed in the ion trap using the top speed mode (2 secs), an AGC target of 1×10e4 and an HCD collision energy of 35.

Proteome raw files were searched using Proteome Discoverer software (v2.5, Thermo Scientific) using the SEQUEST search engine and the SwissProt mouse database (updated Jan 2023). The search for total proteome included variable modification of N-terminal acetylation, and fixed modification of carbamidomethyl cysteine. Trypsin was specified as the digestive enzyme with up to 2 missed cleavages allowed. Mass tolerance was set to 10 ppm for precursor ions and 0.2 Da for product ions. Peptide and protein false discovery rate was set to 1%. Following the search, data was processed as described54. Briefly, protein abundances were log2 transformed, normalized by the average value of each sample, and missing values were imputed using a normal distribution 2 standard deviations lower than the mean. Statistical regulation was assessed using a heteroscedastic t-test (significant if P < 0.05). Data distribution was assumed to be normal but this was not formally tested. To prioritize proteins of interest that were enriched in WT TCF-1 immunoprecipitation compared to both ΔL1 and EV, proteins were ranked using an enrichment score calculated for each comparison (WT TCF-1 IP vs. EV IP and WT TCF-1 IP vs. ΔL1 IP) using the product of the fold change and -log of the P value. Proteins were then filtered for non-differential enrichment in input samples. Proteins with the top 100 enrichment scores were plotted using Cytoscape to create a network of L1 dependent protein-protein interactions. The stringApp was utilized with the tool STRING: protein query for visualization of the entire network or the network of first neighbor proteins to Tcf7.
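
The enrichment score used for ranking is the product of the fold change and the negative log of the P value, computed once per comparison. The sketch below illustrates that calculation under assumptions the text leaves open: the logarithm is taken as base 10, the fold change is used as given, and the two per-comparison scores are combined by taking the smaller one so that a protein must be enriched over both EV and ΔL1. Protein names, values and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Comparison = Tuple[float, float]  # (fold change, P value)


def enrichment_score(fold_change: float, p_value: float) -> float:
    """Fold change multiplied by -log10(P). Base 10 is an assumption."""
    return fold_change * -math.log10(p_value)


def rank_l1_dependent(
    proteins: Dict[str, Dict[str, Comparison]], top_n: int = 100
) -> List[Tuple[str, float]]:
    """Rank proteins by the lower of their two enrichment scores.

    Taking the minimum across comparisons is our assumption; the source
    states only that a score was calculated for each comparison.
    """
    scored = []
    for name, comps in proteins.items():
        vs_ev = enrichment_score(*comps["wt_vs_ev"])
        vs_dl1 = enrichment_score(*comps["wt_vs_dl1"])
        scored.append((name, min(vs_ev, vs_dl1)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy input, made up for illustration only.
toy = {
    "ProteinA": {"wt_vs_ev": (4.0, 0.001), "wt_vs_dl1": (3.0, 0.01)},
    "ProteinB": {"wt_vs_ev": (5.0, 0.0001), "wt_vs_dl1": (0.5, 0.5)},
    "ProteinC": {"wt_vs_ev": (2.0, 0.01), "wt_vs_dl1": (2.0, 0.001)},
}
ranking = rank_l1_dependent(toy)
print([name for name, _ in ranking])
```

ProteinB scores highest against EV but is barely enriched over ΔL1, so it drops to the bottom of the ranking, which is the behavior one would want from a screen for L1 dependent interactors.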

Immunofluorescence

TCF-1 wild type and mutant transduced NIH 3T3 cells were plated on poly-L-lysine treated glass slides and allowed to adhere for 2 hours in a humidified chamber and then flooded with media and returned to the incubator overnight. WT, ΔL1 and EV-GFP fusion constructs were transduced into Scid.adh DN3 cells. Cells were collected after 48 hours and sorted according to the same level of GFP expression. Cells were fixed on slides for 10 minutes with 4% formaldehyde at room temperature (RT), followed by permeabilization with 0.5% Triton X-100 in PBS for 15 minutes at RT. Slides were blocked for 1hr with 10% BSA in 1X PBS, and stained overnight with primary antibody (Monoclonal Anti-Flag M2 antibody, Sigma, F1804) at a 1:1000 dilution. Slides were washed and stained with an AF568 conjugated goat anti-mouse secondary antibody (Invitrogen, Cat# A-11004) at a 1:200 dilution for 2 hours. Slides were stained with DAPI at a 1:10000 dilution, and mounted with SlowFade Gold antifade mounting media. Imaging was carried out on a Leica Multiphoton Confocal using a 63X oil immersion objective with a 2.0 zoom factor, a pixel size of 58.77 nm x 58.77 nm, and Z-stack sizes of 15 μm with a Z-step size of 300 nm.

Flow Cytometry

Single-cell suspensions were stained following standard protocols. The fluorochrome-conjugated, anti-mouse antibodies were as follows: CD44-BV785 (dilution- 1:400), CD45-BV650 (dilution- 1:400), Thy1.2 PerCPCy5.5 (dilution- 1:300), Ckit PE (dilution- 1:300), CD25 PECy7 (dilution- 1:350), B220-APC (dilution- 1:300), CD11b-BV421 (dilution- 1:200), Sca1-PECy7 (dilution- 1:200), Ter119-APC (dilution- 1:200), CD3-APC (dilution- 1:200), NK1.1-APC (dilution- 1:200), GR1-APC (dilution- 1:200), TCRgd-APC (dilution- 1:200), TCRb-APC (dilution- 1:200), Cd11c-APC (dilution- 1:200), Cd19-APC (dilution- 1:200), and CD11b-APC (dilution- 1:200). Cells were stained with LIVE/DEAD Fixable Aqua Dead Cell Stain Kit (ThermoFisher Scientific) (dilution- 1:500) or Invitrogen eBioscience Fixable Viability Dye eFluor 780 (dilution- 1:4000) for discrimination of live cells. Resuspended cells were supplemented with 123count eBeads (ThermoFisher Scientific, ref:01–1234-42) following manufacturer’s recommendations for cell counting. For intracellular flow cytometry of TCF-1 Data were collected on an LSRII running DIVA software (BD Biosciences) and were analyzed with FlowJo software v10.6.1.

RNA-seq

Cells were washed once with 1x PBS before resuspending pellet in 350 μL Buffer RLT Plus (QIAGEN) with 1% 2-Mercaptoethanol (Sigma), vortexed briefly, and stored at −80°C. Subsequently, total RNA was isolated using the RNeasy Plus Micro Kit (QIAGEN). RNA integrity numbers were determined using a TapeStation 2200 (Agilent), and all samples used for RNA-seq library preparation had RIN numbers greater than 9. Libraries were prepared using the SMARTer® Stranded Total RNA-seq Kit v2- Pico Input Mammalian kit (Takara). 2–3 biological replicates were generated for each experiment. Two separate aliquots of cells per condition were used as technical replicates for each biological replicate. Libraries were validated for quality and size distribution using a TapeStation 2200 (Agilent). Libraries were paired-end sequenced (38bp+38bp) on a NextSeq 550 (Illumina).

ATAC-seq

ATAC-seq was performed as previously described with minor modifications 55,56. Fifty thousand cells were pelleted at 550 x g and washed with 50 μL ice-cold 1x PBS, followed by treatment with 50 μL lysis buffer (10 mM Tris-HCl [pH 7.4], 10 mM NaCl, 3 mM MgCl2, 0.1% IGEPAL CA-630). Nuclei pellets were resuspended in 50 μL transposition reaction containing 2.5 μL Tn5 transposase (FC-121–1030; Illumina). The reaction was incubated in a 37°C heat block for 45 min. Tagmented DNA was purified using a MinElute Reaction Cleanup Kit (QIAGEN) and amplified with varying cycles, depending on the side reaction results. Libraries were purified using a QIAQuick PCR Purification Kit (QIAGEN). Libraries were validated for quality and size distribution using a TapeStation 2200 (Agilent). Libraries were paired-end sequenced (38bp+38bp) on a NextSeq 550 (Illumina).

CUT&RUN

CUT&RUN was performed on sorted DN1, DN2, and DN3 cells using the CUTANA ChIC/CUT&RUN Kit (EpiCypher, Cat#14–1048), following the manufacturer’s recommendations. Briefly, between 20,000–200,000 live cells were sorted and nuclei were extracted, washed, and allowed to adsorb onto activated Concanavalin A beads. Cells were then resuspended in recommended buffer, 0.5 mg of antibody was added, mixed well, and allowed to incubate at 4°C overnight on a nutator. Anti-TCF-1 (Cell Signaling Technology, C63D9) antibody was used along with positive and negative controls. Subsequently, the reactions were washed with cell permeabilization buffer and incubated with pAG-MNase, and DNA was isolated for the antibody-bound regions. At least two biological replicates were generated for each experiment. Library preparation was carried out using NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB) and libraries were paired-end sequenced (38bp+38bp) on a NextSeq 550 (Illumina) or 61bp+61bp on a NovaSeq 6000 (Illumina).

RNA-seq data analysis:

The FASTQ files of RNA-seq experiments were aligned and further counted using STAR 2.7.7a with parameters ‘--outSAMtype BAM SortedByCoordinate --outWigType wiggle read1_5p --outWigStrand Stranded --outWigNorm RPM --quantMode GeneCounts’. Next, DESeq2 was used to identify differentially expressed genes (|log2 fold change|>1 or 0.5 and adjusted p-value < 0.05). Heatmaps of differential genes were created using pheatmap in R with parameters: scale = “row”.

ATAC-seq data analysis:

The FASTQ files of ATAC-seq experiments were aligned to the bam file using BWA (version 0.7.17-r1188). In this process, minor chromosomes such as mitochondrial chromosome or chrY were removed using samtools (version 1.11). Next, duplicated reads were removed using Picard (version 2.26.7) and then the bam files were indexed using samtools. BigWig files were generated using bamCoverage (version 3.3.2) with parameters ‘normalizedUsing=CPM, binsize=30, smoothLength=300, p=5, extendReads=200’. For peak calling, macs2 (version 2.1.4) was used with following commands: ‘macs2 callpeak -t input_file -c control -g mm -n output_path --nomodel -f BAMPE -B --keep-dup all --broad --broad-cutoff 0.25 -q 0.25’. The count data of each peak was then fed to DESeq2 for differential analysis.

CUT&RUN analysis

The FASTQ files of CUT&RUN experiments were aligned to the bam file using BWA (version 0.7.17-r1188). In this process, minor chromosomes such as mitochondrial chromosome or chrY were removed using samtools (version 1.11). Next, duplicated reads were removed using Picard (version 2.26.7) and then the bam files were indexed using samtools. BigWig files were generated using bamCoverage (version 3.3.2) with parameters ‘normalizedUsing=CPM, binsize=30, smoothLength=300, p=5, extendReads=200’. For peak calling, macs2 (version 2.1.4) was used with following commands: ‘macs2 callpeak -t input_file -c control -g mm -n output_path --nomodel -f BAMPE -B --keep-dup all --broad --broad-cutoff 0.1 -q 0.1’. For the background (control), the bam file of IgG CUT&RUN data was used. CUT&RUN peaks from two conditions and both replicates were merged and the number of fragments in each peak was counted with bedtools. The count data of each peak was then fed to DESeq2 for differential analysis.
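
The peak merging and fragment counting step can be illustrated with a small pure-Python sketch. It is only a conceptual stand-in for the bedtools operations named above, not a reproduction of them: it assumes BED-style half-open intervals, merges overlapping or directly adjacent peaks, and counts a fragment when it overlaps a merged peak by at least 1 bp. The function names are hypothetical.

```python
def merge_peaks(peaks):
    """Merge overlapping or adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def count_fragments(merged, fragments):
    """Count fragments overlapping each merged peak by at least 1 bp."""
    counts = []
    for chrom, start, end in merged:
        n = sum(1 for c, s, e in fragments if c == chrom and s < end and e > start)
        counts.append(n)
    return counts
```

The resulting peak-by-sample count table is the kind of input that would then be passed to DESeq2.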

Deeptools analysis of ATAC-seq data:

The differentially gained or lost sites were obtained using DESeq2 (|log2 fold change| > 1 and adjusted p-value<0.05). Next, deeptools plot was generated with computeMatrix function using following parameters: reference-point --referencePoint center -a 2000 -b 2000. The heatmap was generated with the ‘plotHeatmap’ function with --kmeans 3.

Motif Analysis

Homer de novo motif analysis was performed using findMotifsGenome.pl on differential peak sets identified by DESeq2 with options -size given -len 6,8,10 and either non-differential peaks or random sequences as background.

Re-Analysis of GSE82044

Microarray data from GSE82044 were re-analyzed with GEO2R to find differentially expressed genes between Gata2 knockout and control DCs. Probes for the Agilent-028005 SurePrint G3 Mouse GE 8×60K Microarray were collapsed to corresponding genes; for genes with multiple probes, the mean fold change and adjusted P value were used. Gata2 activated and repressed genes were defined as having a log fold change (logFC) greater than 0.5 or less than −0.5 and adjusted P<0.05. Overlaps between genes differentially up or down in ΔL1 compared to wild type TCF-1 expressing DN2s and the Gata2 activated and repressed gene lists were calculated, and the eCDF of the Gata2KO vs. control log fold change was plotted in R.
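
Collapsing probes to genes by averaging, as described above, could look like the sketch below. The input layout (tuples of gene symbol, logFC and adjusted P value) and the function name are assumptions made for illustration; GEO2R output would need to be parsed into this shape first.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_rows):
    """Average logFC and adjusted P across all probes of the same gene.

    `probe_rows` is an iterable of (gene, logfc, adj_p) tuples.
    """
    by_gene = defaultdict(list)
    for gene, logfc, adj_p in probe_rows:
        by_gene[gene].append((logfc, adj_p))
    return {
        gene: (mean(v[0] for v in vals), mean(v[1] for v in vals))
        for gene, vals in by_gene.items()
    }
```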

Immgen analysis of gene sets

Expression values of gene sets were plotted across a curated list of 62 immune cell types. Normalized gene counts were downloaded from immgen (GSE109125_Normalized_Gene_count_table.csv). For gene sets of interest, scaled expression values were calculated by subtracting the mean and dividing by the standard deviation of each gene across all cell types.
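
The per-gene scaling described above is a z-score across cell types. A minimal sketch follows, assuming the sample standard deviation is used (the text does not say whether the sample or population form was applied) and that genes with no variation are returned as zeros. The function name is hypothetical.

```python
import statistics


def scale_gene(values):
    """Z-score one gene's expression values across all cell types."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    if sd == 0:
        # A gene with constant expression carries no relative information.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```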

GSEA analysis

Pre-ranked lists of genes were used by ranking genes using estimated log2 fold-change in DESeq2 for 293T cells expressing human WT TCF-1 vs. EV. GSEA v2.2.4 with default parameters was used to perform gene set enrichment analysis.

Gene ontology using Metascape

Metascape (https://metascape.org/gp/index.html#/main/step1) was utilized for gene ontology analysis of differential gene sets.

Imaging analysis

Granularity measurements were performed with CellProfiler Version 4.2.5 (https://cellprofiler.org)57. Image pre-processing steps were completed using FIJI53 (ImageJ2 Version 2.9.0). The ‘IdentifyPrimaryObjects’ tool was used to perform segmentation on maximal intensity projections with a minimum and maximum object diameter of 50 and 200 pixels. Objects outside this range were discarded along with objects that were in contact with the image border. The ‘MeasureGranularity’ tool was used to report the percentage of the highest intensity pixels that were subtracted from the image within the iterative range of the granular spectrum specified. Images were subsampled by a factor of 0.25 for granularity measurements and a subsampling factor of 0.25 was introduced for background reduction, which reduced low-frequency background variations in the image. The radius of the structuring element of interest, referring to the approximate radius of punctate objects, was set at 2 pixels to account for the effect of subsampling on the original maximum intensity projection images. The 2-pixel structuring element radius therefore corresponds to an 8-pixel radius in the unsampled image. The granular spectrum range was specified as 40 iterations and the first iteration percentages were used to compare granularity across conditions.

Expression and Purification of Recombinant TCF-1

cDNA encoding the full-length mouse TCF-1 protein (NCBI sequence ID EDL33620.1) with an N-terminal 6xHis tag and TEV cleavage site separated by DYDIPTT and GSEF linkers, respectively, was cloned into a pET-derived bacterial expression plasmid (gift from Sierra McDonald and Shelley Berger, PhD, University of Pennsylvania) via NEB HiFi DNA Assembly. A single sequence-verified clone was transformed into NEB T7 Express lysY chemically competent E. coli (NEB C3010I) and plated on LB agar + carbenicillin. For this and all subsequent antibiotic selection, 100ug/mL carbenicillin (GoldBio) was used. An overnight LB + carbenicillin starter culture was inoculated with isolated colonies of transformed T7 Express lysY E. coli and grown at 37C with vigorous shaking. Preparative-scale growth cultures were prepared using Terrific Broth (RPI) media supplemented with 4mL glycerol / 1L (RPI) and 10mM magnesium sulfate (Sigma Aldrich), inoculated with starter culture (1:2000 dilution) and carbenicillin, and grown at 37C with vigorous shaking until an OD600 of approximately 0.4–0.6 was achieved. Cultures were subsequently induced with 0.4mM IPTG (GoldBio) and grown for 12–14 hours at 18C with vigorous shaking. Bacterial pellets were recovered via centrifugation (>6,000 rcf, 20min, 4C), resuspended in an adequate volume of Ni Wash/Lysis Buffer (60mM NaPO4H2/Na2PO4H pH 8.0, 500mM NaCl, 20mM imidazole pH 8.0, 10% glycerol, +4mM DTT supplemented with 1X Roche cOmplete Protease Inhibitors EDTA-free), frozen in liquid N2, and stored at −80C.

Nickel Affinity Pulldown for Purification of Recombinant TCF-1

Frozen bacterial pellets were thawed on ice and supplemented with lysozyme (CAS 9001–63-2; MP Biomedicals). Cells were lysed via sonication with ice bath submersion cooling until turbidity and color changes indicative of complete lysis were achieved (approximately 1min sonication time per 1L culture-equivalent of cell resuspension via cycles of 10sec on, 20sec off at 60% amplitude in increments of 2–3min total sonication time; Fisher FB505 sonicator, 500W power, 20kHz frequency, 0.5in solid probe). All subsequent liquid handling, chromatography, and other purification procedures were similarly performed at 4C or on ice, as appropriate. Lysate was clarified via two sequential rounds of centrifugation (>10,000 rcf, 20 min, 4C) then mixed for 30 min with Ni2+-NTA agarose resin (GoldBio, 1mL 50% slurry per 2L culture equivalent) equilibrated in Ni Wash/Lysis Buffer. Flow-through was collected via gravity column and resin was sequentially washed with >15 column volumes each (CVs) of Ni Wash/Lysis Buffer and Ni Wash Buffer 2 (60mM NaPO4H2/Na2PO4H pH 8.0, 300mM NaCl, 20mM imidazole pH 8.0, 10% glycerol, +4mM DTT). Bound proteins were eluted in 3×5 CVs of Ni Elution Buffer (Wash Buffer 2 with 200mM imidazole pH 8.0).

Ion Exchange Chromatography for Purification of Recombinant TCF-1

Nickel eluate was diluted with 10mM HEPES/NaOH pH 7.8 / 10% glycerol (+5mM DTT) to approximately equivalent conductivity as IEX Buffer A (20mM HEPES/NaOH pH 7.8, 130mM NaCl, 10% glycerol, +5mM DTT), then loaded on a Buffer A-equilibrated 5mL HiTrap Heparin Sepharose High Performance (“HP”) column (Cytiva) via an AKTA Pure 25 sample pump at 2–3 mL/min. After washing with 5CV Buffer A, protein was eluted (1.5 mL/min) over an 8CV gradient of 0–100% IEX Buffer B (20mM HEPES/NaOH pH 7.8, 1M NaCl, 10% glycerol, +5mM DTT), which resolved two partially overlapping major populations of protein by 280nm absorbance that differed from each other primarily in the relative abundance and size distribution of lower and higher molecular weight species by SDS-PAGE analysis, but were similarly enriched for the major species of the expression construct. Fractions corresponding to each of the earlier- and later-eluting halves of this major peak (“pool 1” and “pool 2” respectively) were separately pooled for further purification and chromatographic analysis, though only the later-eluting material was ultimately characterized by HX-MS given its apparently greater capacity for more robust ionic interactions with a DNA-like polymer.

Size Exclusion Chromatography for Purification of Recombinant TCF-1

Each Heparin pool was separately concentrated via repeated centrifugation (4,000–7,000 rcf, 20–30 min increments with mixing in between, 4C) in an Amicon Ultra-4 30 kDa MWCO centrifugal filter. Concentrate was transferred to a new tube and centrifuged at >20,000 rcf (10min, 4C) to ensure absence of any precipitate. This supernatant was loaded via 500uL injections onto a Superose 6 Increase 10/300 GL column (Cytiva; approx. 24mL bed volume) equilibrated in 0.2um-filtered HGN600 (20mM HEPES pH 7.8, 600mM NaCl, 5% glycerol, +5mM DTT) and eluted over 1.5CV at 0.5–1.0 mL/min AKTA Pure 25. Multiple injections and column runs were performed as needed for the total quantity of protein in each Heparin pool concentrate. For both Heparin pools, a minor void population was similarly separated from two major populations of larger and progressively smaller effective sizes at retention volumes of approx. 10–14mL and 16–20mL, respectively. The primary peak of this later-eluting population (hereafter, “target peak”) was enriched for the apparently near-full-length expression construct with only minimal appreciable proteolysis or degradation by SDS-PAGE. A minimal number of equivalent fractions from separate pool 2 Superose 6 runs corresponding to the approximate center of the target peak were combined and dialyzed against 1L (>1500-fold excess by volume) of HGN280 (20mM HEPES pH 7.8, 280mM NaCl, 5% glycerol, +5mM DTT) for 16 hours (Thermo Scientific Slide-A-Lyzer 2k MWCO MINI Dialysis Device, approx. 100uL per device). Prior to HX-MS analysis, combined dialyzed material was filtered using 0.22um Ultrafree-MC GV Durapore centrifugal filters (Millipore Sigma) pre-equilibrated in HGN280.

To determine whether target peak species were potentially subject to time-dependent aggregation after the initial Superose 6 purification, remaining portions of additional unpooled, undialyzed fractions corresponding to the 10–14mL peak and a region spanning the target peak (but not used for HX-MS) were separately pooled several days after the conclusion of HX-MS data acquisition. These pools were supplemented with fresh DTT in excess of existing DTT by approx. 5mM, separately concentrated, 0.1um centrifugal-filtered (Ultrafree-MC PVDF, Millipore Sigma), injected in 100uL onto a Superose 6 Increase 10/300 GL column (GE Healthcare) equilibrated at room temperature in fresh, 0.1um-filtered 20mM HEPES/NaOH pH 7.8 / 600mM NaCl (+10mM DTT), and analyzed by 280nm absorbance throughout continuous elution at 0.5mL/min. Acquisition of these data was performed at the Johnson Foundation Structural Biology and Biophysics Core at the Perelman School of Medicine (Philadelphia, PA) with assistance from Core staff.

SDS-PAGE Gel Electrophoresis for Purification of Recombinant TCF-1

SDS-PAGE analysis was performed using 4–20% Mini-PROTEAN TGX precast gels (BioRad) with 25mM Tris / 192mM glycine pH 8.3 / 0.1% SDS electrophoresis buffer. Gels were stained using either Coomassie G-250 or SYPRO Orange (Thermo Fisher) per the manufacturer’s instructions and imaged on either an Epson document scanner (Coomassie stain) or GE Typhoon fluorescent imager.

Hydrogen-Deuterium Exchange Mass Spectrometry (HX-MS) Overview

H-to-D exchange (HX) of recombinant full-length, N-terminally 6xHis-tagged mouse TCF-1 protein was queried via electrospray ionization (ESI) mass spectrometry (MS) essentially as described58,59 using a Thermo Scientific Q Exactive Mass Spectrometer (calibrated every 24 hours per manufacturer’s instructions) at the Perelman School of Medicine Johnson Foundation Structural Biology and Biophysics Core. For liquid chromatography- (LC) based protein digestion and peptide separation, a custom LC system contained within a Peltier cooling chamber set at 0C was used that consisted of an injection valve-controlled 50uL sample loop with downstream pepsin protease column (Thermo Scientific POROS AL 20um, 2.1×30mm loaded with Sigma pepsin) in-line with a C8 trap column (TARGA C8 5um, 5×1.0mm Piccolo column, Higgins Analytical TP-M501-C085), with isocratic flow of 50uL/min 0.1% formic acid + 0.05% trifluoroacetic acid (TFA); 3 minutes after initiating flow through the sample loop, flow through the trap column was diverted from waste to a separate path driven by an Eksigent gradient pump to elute peptides from the trap onto an analytic C8 column (TARGA C8 5um, 50×0.3um, Higgins Analytical TS-05M3-C085) via 6uL/min of 10% acetonitrile (ACN) (Buffer A, 0.1% formic acid + 0.05% TFA; Buffer B, 100% ACN), which was further developed over sequential 15min and 5min linear gradients of 10–40% and 40–60% ACN, respectively, with continuous elution onto the ESI path followed by MS peptide separation. For initial identification of the digested peptides obtained under our conditions and their respective retention times, 2 sequential replicates of high resolution all-1H, tandem MS/MS spectra were acquired in positive-ion mode (Thermo Scientific Q Exactive), with search exclusion of peptides identified from the first MS/MS replicate during the second replicate of MS/MS acquisition. For all subsequent D-containing samples, only single MS positive-ion mode spectra were acquired as previously described58,59.

Preparation of HX-MS Samples

Dialysis of pooled Superose 6 fractions described above was set up such that the final estimated concentration of TCF-1 protein in each HX sample was approx. 2–3 uM (calculated from A280 using ε280 = 41830). Each HX sample was generated by mixing 10uL of filtered, dialyzed TCF-1 protein with 2uL of 60mM DTT (prepared in DTT-free HGN280), followed by rapid manual addition on ice of 48uL D2O dilution buffer (92.5mM NaCl and 5% glycerol prepared in D2O with one of the following buffer components: for HX at measured pH = 7.0, 20mM HEPES/KOD with measured pH = 7.03; for HX at measured pH = 6.0, 20mM MES/KOD with measured pH = 5.91; for HX at measured pH = 5.0, 20mM MES/DCl with measured pH = 2.44). Before use, each D2O dilution buffer was freshly supplemented with 2.5mM TCEP/KOD, prepared in D2O as a 1M stock with measured pH = 4.62. This setup achieves a final sample D2O composition of 80% in a background of 20mM HEPES or MES (with final measured pH of 7.0, 6.0, or 5.0 as above), 130mM NaCl, 5% glycerol, +2mM TCEP. Stocks of DCl and KOD used for adjusting the measured pH of each solution were prepared using D2O. After the specified HX time, this 60uL mixture was rapidly transferred with mixing to a new tube on ice containing 8.4uL (pH 7.0 HX) or 5.4uL (pH 6.0 HX) of 300mM H3PO4 (prepared in H2O), or 3.6uL of 250mM H3PO4 (pH 5.0 HX), to lower the measured pH of each respective sample to 2.44–2.45 and quench H-to-D exchange. 50uL of the quenched sample was immediately loaded into a pre-cooled glass Hamilton syringe and rapidly injected onto the LC sample loop described above. Sufficiently homogenous mixing of protein with D2O dilution buffer was achieved via the described pipetting steps, which were chosen to allow for reproducible pipetting with very short HX times <10sec. Quench conditions were determined empirically for each sample series. The measured pH of the sample mixture during both its HX and quenched states was repeatedly verified in advance using scaled larger volume, simulated mixtures of all components (exact lot) and an accupHast pH electrode (Fisher Scientific) freshly calibrated at 4 points over pH 1.64 to 10.00 with commercial Fisher pH standards. For the all-1H sample used for MS/MS, 10uL from an undialyzed portion of the size exclusion-purified material in HGN600 was mixed with 2uL of 30mM DTT (prepared in DTT-free HGN280), diluted with 48uL HGN25 (20mM HEPES pH 7.8, 25mM NaCl, 5% glycerol, +5mM DTT) to achieve a final [NaCl] ~ 130mM, mixed with 5.4uL of 300mM H3PO4 (prepared in H2O) to achieve a final pH of approximately 2.1–2.3, then 50uL of this material was injected immediately as described above.
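
The stated final composition of each HX sample (80% D2O, 130mM NaCl, 2mM TCEP) follows from volume-weighted mixing of the three components. The short check below assumes additive volumes, treats the D2O dilution buffer as pure D2O, and takes the protein and DTT stocks as both being in HGN280 (280mM NaCl); the helper name is hypothetical.

```python
def final_concentration(components):
    """Volume-weighted mean over (volume_uL, concentration) pairs."""
    total_volume = sum(volume for volume, _ in components)
    return sum(volume * conc for volume, conc in components) / total_volume


# Components: 10 uL protein in HGN280, 2 uL DTT stock in HGN280,
# 48 uL D2O dilution buffer (92.5 mM NaCl, 2.5 mM TCEP).
d2o_fraction = final_concentration([(10, 0.0), (2, 0.0), (48, 1.0)])
nacl_mM = final_concentration([(10, 280.0), (2, 280.0), (48, 92.5)])
tcep_mM = final_concentration([(10, 0.0), (2, 0.0), (48, 2.5)])
```

Under these assumptions the three values evaluate to 0.8, 130 and 2, matching the composition given in the text.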

Analysis of HX-MS Data

HX-MS data were analyzed essentially as described using ExMS248 with the sample pD (given 80% D2O / 20% H2O) for each condition estimated as measured sample pH (pHmeas) + 0.460. The “preload” datafile used by ExMS2 for generating the reference peptide list against which all experimental HX-MS spectra were compared was generated using Proteome Discoverer software (SEQUEST search with default parameters, modified as appropriate, for recombinant TCF-1 protein sequence against a custom database of off-target/decoy protein sequences). To empirically account for the D-to-H back-exchange that occurs continuously, even after “quenching” and transfer onto the LC system, the ExMS2-derived number of incorporated deuterium (D) atoms (vs. time) for each condition and peptide observation (sample observation centroid m/z – corresponding all-1H centroid m/z) was either (i) normalized to the corresponding ExMS2 calculation for a pHmeas 7.0 sample with HX time of approximately 23 hours to give a % deuterium uptake (Figs. 1B, Extended Data Fig. 1B), or (ii) scaled by the quantity maxD/(pHmeas 7.0 23hr centroid m/z – corresponding all-1H centroid m/z) to give the back-exchange-corrected number of incorporated D (Fig. 1C, Extended Data Fig. 1D-E), where maxD is the number of amino acids in an observed peptide – the number of prolines – 2. ExMS2-generated representative examples of the uncorrected number of incorporated D measured from m/z differences in centroid distributions are shown in Extended Data Fig. 1C. Given the effectively saturated exchange observed after 20 minutes under pHmeas 7.0 conditions, this 23hr sample is a reasonable estimate of an “all-D” sample.
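
The two back-exchange normalizations described above reduce to simple ratios of centroid differences. The sketch below is illustrative only: it works directly on centroid values rather than ExMS2 output, the function names are hypothetical, and maxD is computed literally as stated (peptide length minus prolines minus 2).

```python
def max_d(peptide):
    """Exchangeable positions: residues minus prolines minus 2."""
    return len(peptide) - peptide.count("P") - 2


def percent_uptake(centroid, centroid_all_h, centroid_23hr):
    """Deuterium uptake as a percentage of the 23 hr reference sample."""
    return 100.0 * (centroid - centroid_all_h) / (centroid_23hr - centroid_all_h)


def corrected_d(centroid, centroid_all_h, centroid_23hr, peptide):
    """Back-exchange-corrected number of incorporated deuterium atoms."""
    fraction = (centroid - centroid_all_h) / (centroid_23hr - centroid_all_h)
    return max_d(peptide) * fraction
```

For example, a hypothetical 8-residue peptide with one proline has maxD = 5, so an observation halfway between the all-1H and 23 hr centroids corresponds to 50% uptake, or 2.5 corrected deuterium atoms.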

Visualizations of % deuterium uptake across the TCF-1 protein sequence (Figs. 1B, Extended Data Fig. 1B) or exchange vs. time (Fig. 1C, Extended Data Fig. 1D-E) were generated using R. For Fig. 1C, the actual time values for pHmeas 5.0 and 7.0 samples were multiplied by 0.1 and 10, respectively, to scale the time for all samples relative to a pHmeas 6.0 timescale (chemical exchange rates increase 10-fold with each pH increase of 1.0, so similar scaling can be applied to protein samples under the assumption that there are no pH-induced structural changes over the pH range of interest; if such changes occurred, there would be a clear absence of equivalence between time t at pHmeas 5.0 and time 0.1*t at pHmeas 6.0, for example). Time-scaled experimental data were then fit using nonlinear least squares regression in R to the stretched exponential function47 D(kex, b, t) = maxD*(1-exp(−(kex*t)^b)), where values of the stretching factor b were not fixed, maxD was defined per peptide as above, and D(t = 0sec) was forced as 0. This analysis provides an approximate estimate of the effective observed peptide-level HX rate constant kex for each analyzed peptide, where larger kex generally correspond to less protection from exchange (Fig. 1C, Extended Data Fig. 1D).
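
The time scaling and the stretched exponential model can be written out explicitly. The sketch below is in Python rather than the R used for the published analysis. It shows the model and the least-squares objective, not the fitting routine itself, and its function names are hypothetical.

```python
import math


def scaled_time(t_sec, ph_meas, reference_ph=6.0):
    """Rescale an exchange time to the pHmeas 6.0 timescale.

    Uses the 10-fold change in chemical exchange rate per pH unit, so
    pHmeas 5.0 times are multiplied by 0.1 and pHmeas 7.0 times by 10.
    """
    return t_sec * 10.0 ** (ph_meas - reference_ph)


def stretched_exp(t, k_ex, b, max_d):
    """D(t) = maxD * (1 - exp(-(k_ex * t)^b)); equals 0 at t = 0."""
    return max_d * (1.0 - math.exp(-((k_ex * t) ** b)))


def sum_squared_residuals(times, observed, k_ex, b, max_d):
    """Objective that a nonlinear least squares fit would minimize."""
    return sum((d - stretched_exp(t, k_ex, b, max_d)) ** 2
               for t, d in zip(times, observed))
```

A fitting routine would search over k_ex and b, with maxD held at its per-peptide value, to minimize the residual sum.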

For each indicated peptide sequence (in the context of the full-length, unfragmented protein), the random coil-predicted exchange vs. time relationships (Fig. 1C, Extended Data Fig. 1E) were calculated as the sum of (1-exp(−ki,pred*t)) over all residues within that sequence (except for the first 2 N-terminal residues of the peptide and proline residues). Here, ki,pred is the predicted single-residue rate constant for -NH exchange (calculated for pD = 6.4 and T = 277.15 K from previously described reference parameters19,20,48 using publicly available resources at hx2.med.upenn.edu/download.html) if that residue were dynamically disordered random coil subject to chemical and steric effects from neighboring residues, but not subject to protection from H-to-D exchange. To extract approximate estimates of the predicted peptide-level HX rate constants kpred for L1-L7 regions (Extended Data Fig. 1E), the predicted number of incorporated D vs. time for a given peptide from the sum of (1-exp(−ki,pred*t)) above were fit to D(kpred, b, t) = maxD*(1-exp(−(kpred*t)^b)), defined as before for D(kex, b, t) where the stretching factor b was again not fixed. Approximate peptide-level protection factors can be estimated as kpred/kex. However, we display only the calculated values of each rate constant in Extended Data Fig. 1E because of minor differences in the experimental vs. predicted stretching factors and because the experimental time dimension has been scaled based on the expected pH dependence of HX rates. Comparing calculated kex between L1-L7 vs. HMG regions (Extended Data Fig. 1D) quantitatively confirms the differences in exchange behaviors and qualitative extent of protection between these regions, despite experimental uncertainty and possible sample heterogeneity. We emphasize, though, that this rate constant comparison is approximate because of qualitative differences in the shape of many experimental exchange vs. time curves between L1-L7 vs. HMG peptides, which leads to differences in the stretching factors b from above. Because of similar shape differences across HMG peptides between experimental vs. predicted curves, we did not extend the quantitative analysis in Extended Data Fig. 1E to HMG peptides.
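
The random-coil prediction and the protection factor estimate can be sketched as below. The per-residue rate constants are taken as inputs (in the study they come from published reference parameters), so any values passed in are placeholders, and the function names are hypothetical.

```python
import math


def exchangeable_positions(peptide):
    """Indices of residues that contribute: skip the first two and prolines."""
    return [i for i, aa in enumerate(peptide) if i >= 2 and aa != "P"]


def predicted_uptake(t, residue_rates):
    """Sum of (1 - exp(-k_i * t)) over exchange-competent residues."""
    return sum(1.0 - math.exp(-k * t) for k in residue_rates)


def protection_factor(k_pred, k_ex):
    """Approximate peptide-level protection factor, k_pred / k_ex."""
    return k_pred / k_ex
```

At long times the predicted uptake approaches the number of contributing residues, which is the maxD used when fitting the stretched exponential.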

Statistics and Reproducibility

For all experiments, at least 2 biological replicate mice of matching age and sex were used. All experiments were independently reproduced 2–4 times, except for HX-MS measurements, where 1–3 technical replicates of each condition (see Supplementary Table) were measured from a single preparation of purified recombinant TCF-1 protein. No statistical methods were used to pre-determine sample sizes but our sample sizes are similar to those reported in previous publications7. Data distribution was not formally tested. Experimental and control groups were tested for significance in Prism 9 GraphPad software (Version 9.2.0 (283), July 15, 2021) using one-way ANOVA followed by Dunnett’s multiple comparison test (ns = not significant, * = p ≤ 0.05, ** = p ≤ 0.01, *** = p ≤ 0.001, **** = p ≤ 0.0001). Data collection and analysis in this study did not require randomization and blinding. No data were excluded from the analyses.

Materials availability

This study did not generate new unique reagents.

Extended Data

Extended Data Fig. 1. The N terminus of TCF-1 is intrinsically disordered (related to Figure 1).

Extended Data Fig. 1

a) Summary of size exclusion chromatography purification of affinity- and ion exchange-purified recombinant TCF-1 protein expressed in E. coli. Chromatogram (left) displays measured A280 vs. elution volume for a representative injection of pooled ion exchange fractions (see Methods). Vertical dashed lines indicate approximate position of fractions pooled for analysis by HX-MS. Gel image displays SDS-PAGE analysis (stained with SYPRO Orange) of final protein input to HX-MS. Chromatogram (right) displays A280 vs. elution time for repeated analysis of indicated fractions from a prior Superose 6 Increase 10/300GL run during the initial purification. This repeated analysis suggests that the purified material is subject over time to some extent of aggregation that resembles the larger molecular weight (MW) population seen during the initial purification.

b) Plots of normalized deuterium uptake (relative to measured deuterium content after 23hrs of H-to-D exchange) at each indicated measured sample pH (pHmeas) for each TCF-1 peptide observation at the indicated exchange times (different peptide charge states treated as separate observations). For observations with technical replicates (n=3 independent samples for pHmeas 6.0 4sec, 10sec), center line represents mean value with error bars corresponding to standard deviation. Shaded columns indicate pHmeas and time conditions where approximately equivalent exchange is expected given the pH dependency of HX rates.

c) Representative mass spectra of indicated peptide observations (generated using ExMS248). Relative to the all-1H sample (treated as HX time = 0sec), the change in m/z of each centroid distribution reflects the indicated change in mass due to deuterium incorporation.

d) Time-scaled, back exchange-corrected deuterium content vs. time experimental data as in Fig. 1c was fit to a stretched-exponential function for each peptide across the indicated regions of TCF-1 (each datapoint is a unique peptide observation; different charge states treated as separate observations). Boxplots of estimated kex values are approximate estimates of observed peptide-level HX rate constants. Center line of box plots is median, limits are 1st and 3rd quartiles, and whiskers are maximum and minimum values. Peptide observations for which nonlinear least squares regression could not be achieved are not displayed.

e) Comparison of approximate predicted (random coil) kpred vs. observed kex peptide-level HX rate constants across peptides from the L1-L7 regions of TCF-1. Values for kex are as in (d). Values for kpred were estimated from the stretched-exponential fitting approach using predicted deuterium content vs. time data calculated from the predicted residue-specific HX rate constants across each respective peptide sequence under the assumption of no protection from exchange (see methods). Pearson correlation coefficient and corresponding correlation P value are displayed (calculated in R). Peptide observations for which nonlinear least squares regression of either predicted or observed data could not be achieved are not displayed.

Extended Data Fig. 2. Loss of TCF-1’s L1 domain limits DN1 to DN2 transition (related to Figure 2).

Extended Data Fig. 2

a) Identification of Thy1+ CD25+ cells in OP9-DLL1 co-cultures of Tcf7 cKO cells transduced with empty vector (EV), WT TCF-1, or mutant TCF-1 (ΔL1–7) on day 13. Cells are pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+, Vex+ (transduced).

b) Identification of DN1 (CD44+ CD25-), DN2 (CD44+CD25+), and DN3 (CD44- CD25+) cells in OP9-DLL1 co-cultures of Tcf7 cKO cells transduced with EV, WT TCF-1, or mutant TCF-1 (ΔL1–7) on day 13. Cells are pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), CD45+, Vex+ (transduced).

c) Number and frequency of Thy1+ CD25+ cells (top) and of DN2 and DN3 cells by CD44 and CD25 surface expression (bottom) in OP9-DLL1 co-cultures of Tcf7 cKO cells transduced with EV, WT TCF-1, or mutant TCF-1 (ΔL1–7) on day 13. Data are representative of at least 3 independent experiments; bars show mean from n=2 biologically independent animals, dots represent individual data points. P values are determined by one-way ANOVA followed by Dunnett’s multiple comparison test with WT TCF-1 (P45) as a control. *P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001, and **** P ≤ 0.0001.

d) Identification of DN1 (CD44+ CD25-), DN2 (CD44+ CD25+), and DN3 (CD44- CD25+) cells in co-cultures of wild type (WT) ckit+ bone marrow (BM) progenitors transduced with WT TCF-1, EV, or mutant TCF-1 (ΔL1–7) (GFP+) on OP9-DLL1 cells at day 5. Cells are pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), and CD45+.

e) Frequency of GFP+ (transduced) and GFP- (un-transduced) DN2 and DN3 cells in (d) (top). Ratio of GFP+ to GFP- Thy1+ CD25+ cells in (d) (middle). Frequency of GFP+ and GFP- Thy1+ CD25+ cells in (d) (bottom). Data are representative of at least 3 independent experiments; bars show the mean from n=2 biologically independent animals, dots represent individual data points.

f) Identification of DN1 (CD44+ CD25-), DN2 (CD44+ CD25+), and DN3 (CD44- CD25+) cells in WT ckit+ BM progenitors transduced with WT TCF-1, ΔL1, or EV (GFP+) and cultured on OP9-DLL4 cells for 5 days. Cells are pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), and CD45+.

g) Identification of DN1 (CD44+ CD25-), DN2 (CD44+ CD25+), and DN3 (CD44- CD25+) T cell progenitors in co-cultures of WT ckit+ BM progenitors transduced with WT TCF-1 (GFP+) and cultured on OP9-DLL1 cells (left) and OP9 controls cells (right) for 5 days. Cells are pre-gated on SSC-A/FSC-A, Singlets, Live cell (Viability-), and CD45+.

h) Histogram depicting TCF-1 intracellular flow cytometry in Tcf7−/− progenitors un-transduced (Vex-) or transduced with WT TCF-1, ΔL1, or EV (Vex+) as well as WT TCF-1 sufficient CD25- or CD25+ progenitors.

i) Frequency of B220+ Vex+ (transduced) cells in Tcf7 cKO OP9-DLL1 co-cultures at day 5 (left) and 13 (right). Bars represent mean frequency from n=2 biologically independent animals, dots represent individual data points.

j) Number (left) and frequency (right) of CD11b+ CD25- cells at day 7 in Tcf7 cKO OP9-DLL1 co-cultures. Bars represent the mean values from n=2 biologically independent animals, dots represent individual data points.

Extended Data Fig. 3. GATA2 driven mast cell gene signature is unmasked in developing T cells lacking L1 (related to Figure 3).

a) Heatmaps depicting gene ontology enrichment in significantly differential gene sets (adjusted P<0.05, |Log2FoldChange|>1). P values are calculated using a hypergeometric test.

b) Heatmap demonstrating differentially expressed genes (adjusted P <0.05) between wild type (WT) TCF-1 transduced DN1 and DN2s from Tcf7 cKO cells on OP9-DLL1 co-cultures at day 7. P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

c) Principal component plot of RNA-sequencing on 293T human cell line transduced with empty vector (EV), wild type (WT) human TCF-1, and an internal deletion mutant lacking the analogous L1 region of human TCF-1; human ΔL1 (upper panel). GSEA depicts the enrichment of genes in the GSE22601_IMMATURE_CD4_SINGLE_POSITIVE_VS_DOUBLE_POSITIVE_THYMOCYTE_UP gene set within genes upregulated in 293T cells with human TCF-1 vs. EV.

d) Heatmap depicting transcription factors differentially upregulated in ΔL1 and WT TCF-1 transduced DN2s from Tcf7 cKO cells on OP9-DLL1 co-cultures at day 7 (adjusted P<0.05 and |Log2FoldChange|>1). P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

e) Bar plots depicting select gene expression (in RPKM) values in DN1 and DN2s from Tcf7 cKO cells on OP9-DLL1 co-cultures at day 7. Bars represent mean RPKM values, error bars represent Standard deviation (SD), and individual data points are represented with dots.
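
For orientation, the RPKM values plotted in (e) normalize read counts by gene length and sequencing depth. The sketch below shows the standard formula; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads, 2,000 bp long, 20 million mapped reads in the library.
value = rpkm(500, 2_000, 20_000_000)
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings, so values are comparable across genes of different length and libraries of different depth.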

Extended Data Fig. 4. GATA2 driven mast cell transcriptional signature is unmasked in developing T cells lacking the L1 region of TCF-1 (related to Figure 3).

a–e) Representative genome browser views of counts per million normalized strand specific RNA-seq tracks at the Gata2 (a), Gata3 (b), Thy1 (c), Mcpt1/2/4 (d) and Bcl11b (e) loci.

Extended Data Fig. 5. The L1 domain of TCF-1 modulates binding and transcriptional outcomes in early T cell development independent of chromatin accessibility (related to Figure 4).

a) SeqLogo depicting top enriched motifs from de novo HOMER motif analysis of differentially accessible peaks in WT vs. EV, ΔL1 vs. EV, and WT vs ΔL1 transduced DN1 and DN2s with non-differential peaks as background. P values are calculated using a hypergeometric test.

b) Venn-diagram representing TCF-1 CUT&RUN experiments and associated unique and overlapping WT TCF-1 and ΔL1 binding events in DN1 and DN2s.

c) Principal component analysis of TCF-1 CUT&RUN and chromatin accessibility measurements in DN1 and DN2s. Counts of ATAC-seq and TCF-1 binding in CUT&RUN measurements were generated across the union of all peaks across all ATAC-seq and CUT&RUN conditions.

d) SeqLogo depicting top enriched motifs from de novo HOMER motif analysis of L1 dependent and independent binding events in DN1 and DN2s compared to randomly generated background. P values are calculated using a hypergeometric test.

e) Heatmap depicting TCF-1 binding as measured by TCF-1 CUT&RUN and chromatin accessibility in DN1 and DN2s at differentially accessible peaks open in WT DN2 vs. DN1s.
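
The hypergeometric P values cited for motif enrichment in (a) and (d) can be illustrated with a small calculation. This is a generic sketch of an upper-tail hypergeometric test, not HOMER's implementation; the function name and the counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

# Hypothetical: 1,000 peaks in total, 100 of which contain the motif;
# 50 target peaks, 15 of which contain the motif.
p = hypergeom_upper_tail(k=15, K=100, n=50, N=1000)
```

Here about 5 motif-containing peaks would be expected by chance among the 50 targets, so observing 15 yields a small P value.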

Extended Data Fig. 6. Loss of the L1 domain of TCF-1 has limited effect on chromatin accessibility in committed T cells (related to Figure 6).

a) Principal component plot of RNA-sequencing on Tcf7−/− DN3-like Scid.adh cells transduced with empty vector (EV), wild type (WT) TCF-1, and internal deletion mutants: ΔL1, ΔL2, ΔL6, and ΔL7.

b) Volcano plot demonstrating significantly differential genes (adjusted P<0.05 and |Log2FoldChange|>1) comparing WT TCF-1 and EV (left), ΔL1 and WT TCF-1 (middle), and ΔL7 and WT TCF-1 (right) transduced Tcf7−/− DN3 cells. P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

c) Heatmap depicting significantly up- and down-regulated genes comparing WT TCF-1 and EV transduced Tcf7−/− DN3 cells (adjusted P<0.05 and |Log2FC|>1). P-values are calculated by the Wald test and adjusted using the Benjamini and Hochberg method.

d) Pathway enrichment analysis of differential gene sets depicted in (b). P values are calculated using a hypergeometric test.

e) Quantification of the number of WT TCF-1 and ΔL1 binding events profiled by TCF-1 and FLAG CUT&RUN in Tcf7−/− DN3 cells. Bars represent mean number of binding sites from n=2 biologically independent samples.

f) Principal component plot of WT TCF-1 and ΔL1 binding events as measured in (e).
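
The Benjamini and Hochberg adjustment named in (b) and (c) can be sketched in a few lines. This is a generic implementation for illustration, not the pipeline used in the study; the input P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

raw = [0.001, 0.008, 0.039, 0.041, 0.20]   # hypothetical Wald-test P values
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

With these inputs the two smallest P values remain below the adjusted 0.05 cutoff, while 0.039 and 0.041, nominally significant before correction, do not.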

Extended Data Fig. 7. Proteomics measurements suggest the interaction between RUNX1 and TCF-1 is dependent on the L1 domain (related to Figure 6).

a) Representative immunofluorescence images depicting GFP tagged wild type (WT) TCF-1, ΔL1 mutant TCF-1 and empty vector (EV). DAPI staining of nuclei and overlay images are included (right). Boxplot of granularity of GFP signal in DN3 cells transduced with either EV, WT TCF-1 or ΔL1 fused with GFP. Granularity indicates the percentage of highest intensity elements of 8 pixels subtracted relative to the background (see methods). Cells with a more granular pattern or punctate localization are indicated by a lower percentage. Center line of box plots represents median granularity, limits represent 1st and 3rd quartiles, whiskers represent maximum and minimum values, data points represent outliers. Cells analyzed per condition: EV, n=189; WT TCF-1, n=237; ΔL1, n=190. P values were determined by a two-tailed Mann-Whitney test: *P ≤ 0.05, ** P ≤ 0.01, *** P ≤ 0.001, and **** P ≤ 0.0001. Scale bar: 4 μm.

b) Heatmap indicating the Z score of the log2 normalized abundance of top 100 proteins detected with a higher enrichment between DN3 cells expressing WT TCF-1 and both EV and ΔL1 in mass spectrometry of TCF-1 immunoprecipitation in DN3 cells.

c) Depiction of L1 dependent TCF-1 protein-protein interaction network identified by mass spectrometry of TCF-1 immunoprecipitation in DN3 cells. Node size and color indicate fold change in log normalized abundance between DN3 cells expressing WT TCF-1 and EV.

d) Network terms corresponding to UniProt keywords are highlighted in the network depicted in (c).
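
The row-wise Z score of log2 normalized abundance shown in the heatmap in (b) can be sketched as follows. Whether the study used the sample or the population standard deviation is not stated here; the sample version is assumed, and the abundance values are invented.

```python
import math
import statistics

def row_z_scores(abundances):
    """Z-score the log2-transformed abundances of one protein across samples."""
    logs = [math.log2(a) for a in abundances]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)          # sample SD (assumption)
    return [(x - mean) / sd for x in logs]

# Hypothetical abundances for one protein in EV, WT TCF-1 and ΔL1 samples.
z = row_z_scores([1024.0, 4096.0, 1024.0])
```

Scaling each protein separately puts all heatmap rows on a common color scale regardless of absolute abundance.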

Supplementary Material

Supplementary Table S1

Acknowledgments

We thank members of the Vahedi lab, as well as Maria Fasolino, Kenneth Zaret and Nancy Speck, for helpful discussions. We thank Kushol Gupta and the Johnson Foundation Structural Biology and Biophysics Core at the Perelman School of Medicine (RRID:SCR_022414) for HX-MS resources. Flow cytometry data were generated in the Penn Cytomics and Cell Sorting Shared Resource Laboratory at the University of Pennsylvania (RRID:SCR_022376), which is partially supported by P30CA016520 (Abramson Cancer Center). We thank the Marmorstein lab for protein purification equipment, the Berger lab for the 6xHis-TEV bacterial expression vector, and the Lynch lab for Typhoon imager access. The work in this manuscript was supported by F30AI161873 (A.V.), R01AI091627 (I.M.), F30AI174776 (M.S.), R01-CA230800 and R01-CA248041 (R.B.F.), the Burroughs Wellcome Fund, the Chan Zuckerberg Initiative Award, W. W. Smith Charitable Trust, Sloan Foundation, and the NIH grants R01AI168240, UC4DK112217, U01DK112217, R01HL145754, U01DK127768, U01DA052715 (G.V.).

EJW is a member of the Parker Institute for Cancer Immunotherapy that supports research in the Wherry lab. EJW is an advisor for Danger Bio, Marengo, Janssen, NewLimit, Pluto Immunotherapeutics, Related Sciences, Santa Ana Bio, Synthekine, and Surface Oncology. EJW is a founder of and holds stock in Surface Oncology, Danger Bio, and Arsenal Biosciences.

Footnotes

Code availability

Code available upon reasonable request.

Competing Interests Statement

The other authors declare no competing interests.

Data availability

The accession number for the ATAC-seq, RNA-seq and CUT&RUN data reported in this study is NCBI GEO: GSE213238. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD043586. Proteome raw files were searched with the SwissProt mouse database (updated Jan 2023) (https://www.uniprot.org/help/downloads). Other publicly available datasets used in the study: GSE82044.

