Abstract
The molecular regulation of human hematopoietic stem cell (HSC) maintenance is therapeutically important, but limitations in experimental systems and interspecies variation have constrained our knowledge of this process. Here, we have studied a rare genetic disorder due to MECOM haploinsufficiency, characterized by an early-onset absence of HSCs in vivo. By generating a faithful model of this disorder in primary human HSCs and coupling functional studies with integrative single-cell genomic analyses, we uncover a key transcriptional network involving hundreds of genes that is required for HSC maintenance. Through our analyses, we nominate cooperating transcriptional regulators and identify how MECOM prevents the CTCF-dependent genome reorganization that occurs as HSCs differentiate. We show that this transcriptional network is co-opted in high-risk leukemias, thereby enabling these cancers to acquire stem cell properties. Collectively, we illuminate a regulatory network necessary for HSC self-renewal through the study of a rare experiment of nature.
Subject terms: Haematopoietic stem cells, Gene regulation in immune cells, Haematopoiesis, Primary immunodeficiency disorders, Leukaemia
Modeling a rare bone marrow failure disorder due to haploinsufficiency for the MECOM transcription factor identifies a human hematopoietic stem cell regulatory network, which is co-opted by high-risk leukemias.
Main
HSCs lie at the apex of the hierarchical process of hematopoiesis and rely on transcriptional regulators to coordinate self-renewal and lineage commitment to enable effective and continuous blood cell production1. Perturbations of HSC maintenance or differentiation result in a spectrum of hematopoietic consequences, ranging from bone marrow failure to leukemia2. Despite the importance of HSCs in human health and the therapeutic opportunities that could arise from being able to better manipulate these cells, the precise regulatory networks that maintain these cells remain poorly understood.
Recently, loss-of-function mutations in myelodysplastic syndrome (MDS) and ecotropic virus integration site-1 (EVI1) complex locus (MECOM) have been identified that lead to a severe neonatal bone marrow failure syndrome3–5. Haploinsufficiency of MECOM leads to near complete loss of HSCs within the first months of life, suggesting an important and dosage-dependent role in early hematopoiesis. In mice, different Mecom isoforms have distinct hematopoietic functions6–8,9,10, but the ability of Mecom haploinsufficient mice to maintain sufficient hematopoietic output stands in sharp contrast to the profound and highly penetrant HSC loss observed in patients with MECOM haploinsufficiency, irrespective of which isoform is impacted. This interspecies variation suggests that the clinical observations in MECOM haploinsufficiency may provide a unique opportunity to better understand human HSC regulation.
MECOM overexpression has been reported in ~10% of adult and pediatric acute myeloid leukemias (AMLs) and is associated with a particularly poor prognosis11. Despite the potential mechanisms of MECOM activity that have been suggested from studies in AML cell lines12–15, the holistic functions of MECOM that enable effective human HSC maintenance and drive leukemia remain enigmatic. Here, inspired by in vivo observations from patients who are MECOM haploinsufficient, we have modeled this disorder by genome editing of primary human CD34+ hematopoietic stem and progenitor cells (HSPCs). Through integrative single-cell genomic analyses in this model, we define fundamental transcriptional regulatory circuits necessary for human HSC maintenance. Finally, we demonstrate that this same HSC transcriptional regulatory network is co-opted in AML, thereby conferring stem cell features and a poor prognosis.
Results
MECOM loss impairs HSC function in vitro and in vivo
Monoallelic mutations spanning the coding sequence of MECOM have been reported in at least 31 individuals with severe, early-onset neonatal bone marrow failure (Fig. 1a, Supplementary Table 1 and Extended Data Fig. 1a,b)3–5. The paucity of HSCs associated with MECOM haploinsufficiency prevents the mechanistic study of primary patient samples4, so we sought to develop a model to study MECOM haploinsufficiency in primary human cells by disrupting MECOM via CRISPR editing in CD34+ HSPCs purified from umbilical cord blood (UCB) samples of healthy newborns (Fig. 1b and Extended Data Fig. 1a,c,d). We achieved editing at >80% of alleles in the bulk CD34+ population, but the subpopulation of CD34+CD45RA−CD90+CD133+EPCR+ITGA3+ phenotypic long-term HSCs (LT-HSCs)16 displayed 48% editing of MECOM alleles (Fig. 1c), allowing for predominantly heterozygous edits in the LT-HSC compartment. Genotyping of single LT-HSCs following MECOM perturbation confirmed that 70% were heterozygous for MECOM edits (Fig. 1d), although this is likely an underestimation given that allelic dropout is common in single-cell genotyping17. These edits were transcribed to messenger RNA, but reduced transcript levels, possibly due to nonsense-mediated decay18 (Extended Data Fig. 1e–g).
MECOM-edited human HSPCs underwent 1.9-fold higher expansion over 5 d in culture conditions that promote HSC maintenance19 (Extended Data Fig. 1h,i), consistent with previous observations of differentiation and expansion of HSCs after MECOM loss8. MECOM perturbation was associated with a decrease in the proportion of bulk cells in G0/G1 on day 5, but no difference in the cell cycle states of HSCs (Extended Data Fig. 1j). Most HSCs remained in G0/G1 and the majority of LT-HSCs had G0/G1 transcriptional signatures (Extended Data Fig. 1k), as previously reported20. MECOM editing resulted in more frequent cell divisions (Extended Data Fig. 1l) and a significant reduction in the absolute number of LT-HSCs (Extended Data Fig. 1m), with a 3.7-fold reduction by day 10 after editing (Fig. 1e,f). We observed a 6.4-fold reduction in multipotent colony-forming unit (c.f.u.) granulocyte erythroid macrophage megakaryocyte (GEMM) colonies and a 3.8-fold reduction in bipotent c.f.u. granulocyte macrophage (GM) colonies, along with increases in more differentiated unipotential c.f.u. granulocyte (G) and c.f.u. macrophage (M) colonies (Fig. 1g). There was a similar loss of multipotent and bipotent progenitor colonies derived from adult HSPCs following MECOM editing (Extended Data Fig. 1n), validating the importance of this factor across developmental stages.
Next, we performed non-irradiated xenotransplantation of edited HSPCs into immunodeficient and Kit-mutant (Methods) mice to assess how MECOM loss impacts human HSCs in vivo21. MECOM-edited HSPCs engrafted in only half of the transplanted animals with significantly lower human chimerism in the peripheral blood and bone marrow compared to AAVS1-edited controls (Fig. 1h). When we compared the edited allele frequency of cells collected from the bone marrow at 16 weeks with the cells before transplant, we found a fivefold enrichment of the unmodified MECOM allele (Fig. 1i and Extended Data Fig. 1o,p), consistent with selection occurring against MECOM-edited HSCs. In the mouse bone marrow, there was a 2.7-fold reduction in human CD34+ HSPCs in the MECOM-edited samples, but no detectable differences in engrafted lymphoid, erythroid, megakaryocytic or monocytic lineages (Fig. 1j). Similarly, we found a significant reduction in human chimerism after primary xenotransplantation of MECOM-edited adult HSPCs (Extended Data Fig. 1q). When we performed secondary xenotransplantation of UCB HSPCs, we observed moderate secondary engraftment of AAVS1-edited cells (two of five mice), but no detectable secondary engraftment of MECOM-edited cells (zero of eight mice). To more sensitively assay for the presence of human cells in the secondary transplant recipients, we PCR-amplified human MECOM from all bone marrow samples. Sequencing revealed 100% wild-type MECOM in seven of eight secondary recipients and 95% in the remaining mouse (Extended Data Fig. 1r). This near complete absence of MECOM edits in serially repopulating LT-HSCs is consistent with the profound HSC loss observed in patients with MECOM haploinsufficiency. In summary, our model of MECOM haploinsufficiency reveals that MECOM is required for maintenance of LT-HSCs in vitro and in vivo and enables us to capture LT-HSCs before their complete loss to directly study MECOM function.
Single-cell profiling reveals HSC loss after MECOM disruption
Having established a primary human HSC model of MECOM haploinsufficiency, we sought to gain insights into the transcriptional circuitry required for human HSC maintenance by single-cell RNA sequencing (scRNA-seq) before complete HSC loss. Three days after AAVS1 or MECOM perturbation, we sorted CD34+CD45RA−CD90+ HSPCs and performed scRNA-seq using the 10x Genomics platform. We used Celltypist22 to delineate cellular identity based on lineage-specific signatures and identified 11 cell clusters (Fig. 2a), of which only the earliest HSC cluster was significantly depleted after MECOM editing (Fig. 2b,c and Extended Data Fig. 2a). Next we examined cells expressing an HSC molecular signature (CD34, HLF and CRHBP)23, which is found in a rare subpopulation representing only 0.6% of 263,828 UCB cells from the Immune Cell Atlas (Extended Data Fig. 2b,c). MECOM perturbation led to a significant loss of cells expressing the HSC signature (Fig. 2d,e and Extended Data Fig. 2d). To examine the gene expression changes in this population of transcriptional LT-HSCs, we again edited UCB CD34+ HSPCs and sorted for phenotypic CD34+CD45RA−CD90+CD133+EPCR+ITGA3+ LT-HSCs. We found that our sorted phenotypic LT-HSCs are highly enriched for the HSC signature (Fig. 2f and Extended Data Fig. 2e–g). Next, we compared the transcriptomes of 5,935 MECOM-edited and 4,291 AAVS1-edited phenotypic LT-HSCs. Following our stringent immunophenotypic sorting strategy, MECOM-edited LT-HSCs colocalized with AAVS1-edited cells (Fig. 2g). This confirmed that our sorting strategy would allow us to directly compare developmentally stage-matched cells before they are completely lost, to uncover transcriptional changes that underlie the profound depletion of LT-HSCs after MECOM editing.
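The HSC molecular signature described above (CD34, HLF and CRHBP) can be illustrated with a minimal sketch. It assumes a simple rule in which a cell counts as signature-positive when every signature gene has at least one read; this rule, the function names and the toy count data are illustrative assumptions, not the scoring procedure used in the study.

```python
from typing import Dict, List

# Signature genes named in the text; the positivity rule below is an assumption.
HSC_SIGNATURE = ("CD34", "HLF", "CRHBP")


def signature_positive(counts: Dict[str, int], min_count: int = 1) -> bool:
    """Return True if every signature gene has at least `min_count` reads."""
    return all(counts.get(gene, 0) >= min_count for gene in HSC_SIGNATURE)


def signature_fraction(cells: List[Dict[str, int]]) -> float:
    """Fraction of cells that are signature-positive (0.0 for an empty list)."""
    if not cells:
        return 0.0
    return sum(signature_positive(c) for c in cells) / len(cells)


# Hypothetical per-cell read counts, for illustration only.
toy_cells = [
    {"CD34": 5, "HLF": 2, "CRHBP": 1},
    {"CD34": 3, "HLF": 0, "CRHBP": 4},
    {"CD34": 0, "HLF": 1, "CRHBP": 1},
    {"CD34": 7, "HLF": 1, "CRHBP": 2},
]
print(signature_fraction(toy_cells))
```

Comparing this fraction between AAVS1-edited and MECOM-edited samples would give a crude readout of signature loss; a real analysis would need to account for sequencing depth and dropout.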
As an orthogonal approach to simultaneously profile the precise genomic editing outcome and transcriptional profile of LT-HSCs, we employed genome and transcriptome sequencing (G&T-seq)24. MECOM heterozygous cells (Fig. 1d) colocalize with AAVS1-edited cells, as well as the non-genotyped cells examined with the 10x Genomics method (Fig. 2h). These results reveal a high degree of similarity in the high-dimensional transcriptomic analysis of LT-HSCs following MECOM perturbation, as expected given the stringent phenotypic sorting strategy we employed before scRNA-seq analysis. Furthermore, these results suggest that the profound functional consequences of MECOM loss are due to coordinated expression changes in a select group of genes.
MECOM loss in LT-HSCs elucidates a dysregulated gene network
To compare individual gene expression in single LT-HSCs following AAVS1 or MECOM editing, we used model-based analysis of single-cell transcriptomes (MAST)25 (Fig. 3a and Extended Data Fig. 3a,b). Despite the high-dimensional transcriptional similarity in the LT-HSCs, we detected significant downregulation of a group of 322 genes following MECOM editing that we refer to as ‘MECOM down’ genes (Supplementary Table 2), which includes factors with previously described functions in HSC maintenance (Fig. 3a,b). We then used MAST to identify 402 genes that are significantly upregulated after MECOM editing, which we refer to as the ‘MECOM up’ gene set (Supplementary Table 2), which includes key factors expressed during hematopoietic differentiation (Fig. 3a,c). To validate these subtle differences, we performed random permutation analysis and did not detect any differentially expressed genes (Extended Data Fig. 3c,d).
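The logic of a random permutation control can be sketched as follows. This is a generic label-shuffling test on the difference in mean expression for a single gene, written under the assumption that cells are exchangeable under the null; it does not reproduce MAST or the exact permutation scheme used here, and all names and values are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical normalized expression values for one gene.
mecom_edited = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
aavs1_edited = [3.0, 3.2, 2.9, 3.1, 3.0, 2.8]
print(permutation_pvalue(mecom_edited, aavs1_edited))
```

When the labels carry no information, as with shuffled conditions, the p-values from such a test are expected to be roughly uniform, which is why a permuted dataset should yield few or no significant genes.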
To minimize the potential confounding influence of allelic dropout, we performed pseudobulk analysis of gene expression changes following MECOM perturbation26. We observed that the MECOM down and up gene sets again represented the most differentially expressed genes with larger expression differences compared to the single-cell analysis (Fig. 3d). To validate that the gene expression differences that we observed in the population of immunophenotypic LT-HSCs accurately represented gene expression changes in molecularly defined LT-HSCs, we examined expression of each differentially expressed gene in the subset of cells with robust expression of the HSC signature. There was significant correlation of gene expression changes in this subpopulation of transcriptionally defined LT-HSCs compared to the total population of immunophenotypic LT-HSCs, demonstrating that MECOM network genes were indeed differentially expressed in cells with a stringent molecular HSC signature (Extended Data Fig. 3e). As further validation of this gene signature, we examined differential gene expression in bulk phenotypic LT-HSCs at days 3, 7 and 10 after MECOM perturbation and detected significant and consistent changes of the MECOM down and MECOM up gene sets at all time points (Fig. 3e).
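As a rough illustration of pseudobulk aggregation, the sketch below sums counts across all cells of each sample and then compares counts-per-million between two samples. The pseudocount, the function names and the toy data are assumptions made for illustration; the study's pseudobulk workflow may differ.

```python
import math
from collections import defaultdict


def pseudobulk(cells):
    """Sum counts per gene across cells.

    cells: iterable of (sample_id, {gene: count}).
    Returns {sample_id: {gene: summed count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in cells:
        for gene, n in counts.items():
            totals[sample_id][gene] += n
    return {sample: dict(genes) for sample, genes in totals.items()}


def log2_fold_change(bulk_a, bulk_b, gene, pseudocount=1.0):
    """log2 ratio of counts-per-million for `gene` in sample a versus b."""
    def cpm(bulk):
        return 1e6 * bulk.get(gene, 0) / sum(bulk.values())
    return math.log2((cpm(bulk_a) + pseudocount) / (cpm(bulk_b) + pseudocount))


# Hypothetical single-cell counts for two genes in two conditions.
toy = [
    ("MECOM", {"geneX": 4, "geneY": 46}),
    ("MECOM", {"geneX": 6, "geneY": 44}),
    ("AAVS1", {"geneX": 25, "geneY": 25}),
    ("AAVS1", {"geneX": 15, "geneY": 35}),
]
bulk = pseudobulk(toy)
print(log2_fold_change(bulk["MECOM"], bulk["AAVS1"], "geneX"))
```

Summing before normalizing means that a transcript missed in individual cells still contributes to the sample-level estimate, which is the motivation for using pseudobulk when dropout is a concern.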
Next, we sought to uncover differential gene expression patterns between AAVS1- and MECOM-edited HSPCs in each of the 11 hematopoietic cell clusters identified in our initial scRNA-seq profiling of CD34+CD45RA−CD90+ cells. The MECOM down genes were significantly depleted from the HSC and cycling multipotent progenitor clusters, but not in other early progenitor populations, including megakaryocyte-erythroid progenitors, megakaryocyte-erythroid-mast cell progenitors and common myeloid progenitors. Early megakaryocytes and mast cell progenitors also had differential expression of MECOM down genes (Extended Data Fig. 3f). Combining these data with the observed cell numbers in each cell cluster after MECOM perturbation revealed that only the HSC cluster was depleted (Extended Data Fig. 2a), providing further support for the notion that the MECOM down gene set is crucial for HSC maintenance. Gene set enrichment analysis (GSEA) for the MECOM up genes in each cluster revealed that these genes were significantly enriched in 7 out of the 11 cell clusters (Extended Data Fig. 3f), suggesting that MECOM up genes are expressed in cells undergoing differentiation into multiple lineages. We then evaluated the expression of the MECOM down and up genes during normal hematopoiesis by comparing the enrichment of the gene sets in 20 distinct hematopoietic cell lineages27. Similar to MECOM itself (Fig. 3f), the MECOM down genes are collectively more highly expressed in HSCs and early progenitors (Fig. 3g). Conversely, the MECOM up genes are turned on during hematopoietic differentiation and are more highly expressed in differentiated cells of various lineages (Fig. 3h). Collectively, these analyses reveal that MECOM loss in LT-HSCs leads to functionally significant transcriptional dysregulation in genes that are fundamental to HSC maintenance and differentiation.
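For readers unfamiliar with GSEA, the core running-sum statistic can be sketched in a few lines. This is the unweighted (Kolmogorov-Smirnov-like) form of the enrichment score, given as a simplified assumption; published GSEA implementations typically weight hits by the ranking metric and estimate significance by permutation, neither of which is shown. Gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style enrichment score.

    ranked_genes: genes ordered from most upregulated to most downregulated.
    gene_set: collection of genes to test.
    Returns the signed maximum deviation of the running sum from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking: g1 is most upregulated, g6 most downregulated.
ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranking, {"g1", "g2"}))  # set at the top of the list
print(enrichment_score(ranking, {"g5", "g6"}))  # set at the bottom of the list
```

A positive score indicates that the set is concentrated among upregulated genes (enrichment), and a negative score indicates concentration among downregulated genes (depletion).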
Increased MECOM expression rescues HSC dysregulation
To confirm that the functional and transcriptional impacts on LT-HSCs are due specifically to reduced MECOM levels, we sought to rescue the phenotype by lentiviral MECOM expression in HSCs after CRISPR editing (Fig. 4a). To avoid unintended CRISPR disruption of the virally encoded MECOM complementary DNA, we introduced wobble mutations in the single guide RNA (sgRNA) binding site in the cDNA (Extended Data Fig. 4a,b). Infection of MECOM-edited HSPCs with MECOM virus led to supraphysiologic levels of MECOM expression (Fig. 4b), which was sufficient to rescue the LT-HSC loss observed after MECOM editing (Fig. 4c,d and Extended Data Fig. 4c,d). Expression of the shorter MECOM isoform EVI1 resulted in a higher percentage of LT-HSCs on day 6, but this increase was blunted by endogenous MECOM editing. Expression of the MDS isoform did not result in rescue of LT-HSCs (Extended Data Fig. 4e). Green fluorescent protein (GFP) is coexpressed with MECOM and we observed a significantly higher ratio of GFP expression in LT-HSCs compared to the bulk population (Fig. 4e), confirming that increased MECOM expression favored LT-HSC preservation. Increased MECOM expression also rescued the loss of multipotent and bipotent progenitor colonies after MECOM editing (Fig. 4f). Together, these data reveal that restoration of the full-length MECOM isoform is sufficient to overcome the functional loss of LT-HSCs caused by endogenous MECOM perturbation.
Next, we performed RNA-seq of phenotypic LT-HSCs after MECOM editing and rescue. After MECOM perturbation alone, we observed significantly lower expression of the MECOM down gene set compared to a subset of randomly selected genes (Fig. 4g). Similarly, GSEA revealed significant depletion of the MECOM down genes (Fig. 4h). Following rescue by increasing MECOM expression, the MECOM down genes were significantly upregulated (Fig. 4i,j and Supplementary Table 3). While increasing MECOM expression can rescue the impact of MECOM perturbation in short-term in vitro contexts, due to the risk of leukemic transformation driven by constitutive MECOM overexpression12, it is challenging to assess this rescue of HSC function in vivo.
We did not observe upregulation or subsequent rescue of the MECOM up genes in bulk following MECOM perturbation and overexpression (Extended Data Fig. 4g,h). The MECOM up gene set contains factors important for hematopoietic differentiation. Lentiviral infection may subtly alter this process. Alternatively, the supraphysiologic expression that we obtained may not allow effective regulation of the MECOM up genes. Regardless, these data collectively show that the loss of LT-HSCs after MECOM editing can be rescued with increased MECOM expression and is accompanied by restoration of the MECOM down gene set.
Defining the HSC cis-regulatory network mediated by MECOM
We next sought to define the cis-regulatory elements (cisREs) that control expression of the MECOM network, which underlies HSC self-renewal. To do so, we developed HemeMap, a computational framework to identify putative cisREs and cell-type-specific cisRE-gene interactions by integrating multiomic data from 18 hematopoietic cell populations (Fig. 5a and Extended Data Fig. 5a,b)28–32. We calculated HemeMap scores based on chromatin accessibility for each cisRE-gene interaction in HSCs and found that the scores were correlated with gene expression (Extended Data Fig. 5c). There was significant overlap of the predicted enhancer–gene pairings from HemeMap with chromatin looping data in hematopoietic progenitors29 and predicted regulatory elements in HSPCs33. Our cisREs had a strong H3K4me1 signal and DNase hypersensitivity without an H3K27me3 signal, consistent with their likely identities as enhancer elements (Extended Data Fig. 5d). All of the interactions with a significant HemeMap score in HSCs were selected to construct an HSC-specific regulatory network (Extended Data Fig. 5e).
To identify cooperating transcription factors (TFs) driving expression of the MECOM network genes in HSCs, we performed unbiased motif discovery within the MECOM network cisREs and found six significantly enriched motifs: ETS, RUNX, JUN, KLF, CTCF and GATA (Fig. 5b). The ETS family motif (AGGAAGT) was most highly enriched and can be bound by several hematopoietic TFs, including FLI1, ERG, ETV2 and ETV6 (ref. 34). Additionally, the experimentally determined binding motif of EVI1 in AML13 is a near perfect mimic of our nominated ETS motif, suggesting that many of these cisREs may be directly occupied by MECOM (Fig. 5c). Notably, HemeMap scores were significantly higher in cisREs with ETS motifs compared to those without (Extended Data Fig. 5f).
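To make the ETS motif concrete, the sketch below scans a DNA sequence for exact matches to AGGAAGT on either strand. Exact string matching is a deliberate simplification and an assumption of this example: motif discovery tools work with position weight matrices and tolerate mismatches, so this is not the method used to nominate the motifs. The test sequence is invented.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif="AGGAAGT"):
    """Return (0-based start, strand) for exact motif matches on both strands."""
    seq = seq.upper()
    rc = reverse_complement(motif)
    width = len(motif)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical cisRE fragment with one forward and one reverse-strand site.
print(find_motif("TTAGGAAGTCCACTTCCTGG"))
```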
Next, we performed digital genomic footprinting analyses to predict TF occupancy in HSCs (Supplementary Tables 4 and 5 and Fig. 5d). We observed a significant co-occurrence of footprints across TF pairs, with a particular enrichment of overlap between ETS with RUNX, JUN and GATA footprints, suggesting cooperativity between these TFs (Fig. 5e and Extended Data Fig. 5g,h). We evaluated specific TF binding to the MECOM network cisREs by integrating TF ChIP-seq data from human HSPCs35. Consistent with the footprinting analysis, we found highly enriched TF occupancy of the ETS family member FLI1, as well as RUNX1 and GATA2 in HSPCs (Fig. 5f). These ChIP-seq data are derived from bulk CD34+ HSPCs, so while they provide a general indication of TF binding in HSPCs, there may be important differences in TF binding in LT-HSCs. As further evidence of TF cooperativity, we found that FLI1, RUNX1 and GATA2 have significant co-occupancy at the MECOM-regulated gene cisREs in HSPCs (Fig. 5g). Additionally, we examined EVI1 binding data from overexpression studies14 and found significant overlap with cisREs that contain ETS footprints (Extended Data Fig. 5i). These analyses from heterogeneous populations of hematopoietic progenitors provide support for our model of cooperativity between MECOM and other hematopoietic TFs (these datasets are summarized in Supplementary Table 6).
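One common way to ask whether two TF footprints co-occur more often than expected is a one-sided hypergeometric test on the number of cisREs that contain both. The sketch below assumes this test for illustration; the text does not specify which statistic was used, and the counts shown are invented.

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_both):
    """P(overlap >= n_both) under a hypergeometric null.

    n_total: number of cisREs examined
    n_a, n_b: cisREs containing footprint A and footprint B, respectively
    n_both: cisREs containing both footprints
    """
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10 cisREs, 5 with each footprint, all 5 shared.
print(cooccurrence_pvalue(10, 5, 5, 5))
```

A small p-value indicates more shared cisREs than expected if the two footprints were distributed independently, which is compatible with, but does not prove, cooperative binding.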
Dynamic CTCF binding represses MECOM down genes
In addition to the enrichment of HSC TF motifs, the MECOM network cisREs showed CTCF motif enrichment. CTCF is a regulator of three-dimensional genome organization and acts by anchoring cohesin-based chromatin loops to insulate genomic regions of self-interaction36. Recently, CTCF has been implicated in regulating HSC differentiation by altering looping to silence key stemness genes37, while also cooperating with lineage-specific TFs during hematopoietic differentiation38. Therefore, we hypothesized that CTCF plays a role in mediating the differential expression of MECOM down genes following loss of MECOM.
We uncovered CTCF footprints in bulk CD34+ HSPCs (Fig. 6a) and significant co-occurrence of CTCF with ETS, RUNX, JUN and KLF footprints in the cisREs of MECOM down genes (Fig. 6b). On average, the distance between ETS and CTCF footprints in our cisREs was 36 base pairs (Extended Data Fig. 6a). We observed significant CTCF binding to the nominated cisREs (Fig. 6c). We found CTCF occupancy of nominated footprints was highly conserved across erythroid cells, T cells, B cells and monocytes (Fig. 6d and Extended Data Fig. 6b). In HSPCs, CTCF binding was measured in bulk CD34+ cells, which contain LT-HSCs and numerous other progenitors. Despite the heterogeneity of the HSPC compartment, terminally differentiated cells showed significantly stronger CTCF signals compared to the CD34+ HSPCs and chromatin accessibility at those loci decreased during hematopoietic differentiation (Extended Data Fig. 6c–e). Although these analyses do not allow for a sensitive description of CTCF binding throughout the many intermediate stages of hematopoietic differentiation, they reveal increased binding of CTCF to the cisREs of MECOM down genes in differentiated cells in comparison with the heterogeneous population of CD34+ HSPCs.
To gain mechanistic insights into the role of CTCF in the MECOM-driven regulation of HSC quiescence, we analyzed an overall set of 7,358 chromatin loops from studies of HSCs37, as well as a subset of loops whose anchors colocalized with MECOM network cisREs. These loops were elucidated in the OCI-AML2 cell line, which was previously used to extrapolate differential looping as LT-HSCs exit quiescence37. In total, 448 chromatin interactions were identified for MECOM down genes and the loop anchors showed a strong enrichment of CTCF footprints (Extended Data Fig. 6f). Next, we performed aggregate peak analysis to compare the genomic organization of the MECOM down genes upon exit from quiescence by integrating Low-C chromatin interaction data from phenotypic LT-HSCs and short-term (ST)-HSCs. Using all 7,358 common chromatin loops, there was significant enrichment of chromatin interaction apices in both LT-HSCs and ST-HSCs, as previously observed37, but there was no significant difference between the populations. Analysis of the chromatin loops of CTCF footprint-containing cisREs associated with MECOM down genes revealed significantly stronger chromatin interactions in ST-HSCs compared to LT-HSCs. There was no chromatin interaction difference in MECOM down genes that lacked association with a CTCF footprint-containing cisRE (Fig. 6e,f). These observations are consistent with the concept that CTCF activity at the cisREs of MECOM down genes induces tighter chromatin looping and restricts gene expression, promoting differentiation of HSCs, as exemplified by the increased chromatin looping at MLLT3 and MEF2C concordant with their silencing as LT-HSCs differentiate (Fig. 6g,h).
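The idea behind aggregate peak analysis can be sketched as follows: contact submatrices centered on each loop are averaged, and the central pixel is compared with a background corner. The peak-to-lower-left ratio used here is one commonly reported summary; the window size, corner size and toy matrices are assumptions for illustration and do not reflect the parameters used for the Low-C data.

```python
def aggregate_matrix(submatrices):
    """Element-wise mean of equally sized square contact submatrices."""
    n = len(submatrices[0])
    total = [[0.0] * n for _ in range(n)]
    for matrix in submatrices:
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    return [[value / len(submatrices) for value in row] for row in total]


def peak_to_lower_left(aggregate, corner=2):
    """Ratio of the central pixel to the mean of the lower-left corner."""
    n = len(aggregate)
    center = aggregate[n // 2][n // 2]
    corner_values = [
        aggregate[i][j] for i in range(n - corner, n) for j in range(corner)
    ]
    return center / (sum(corner_values) / len(corner_values))


def toy_loop(center_signal, size=5):
    """Hypothetical submatrix: uniform background of 1 with a central peak."""
    matrix = [[1.0] * size for _ in range(size)]
    matrix[size // 2][size // 2] = center_signal
    return matrix


aggregate = aggregate_matrix([toy_loop(3.0), toy_loop(5.0)])
print(peak_to_lower_left(aggregate))
```

Computing this ratio separately for LT-HSC and ST-HSC contact maps over the same loop set would give a simple measure of whether looping strengthens upon exit from quiescence.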
To validate their functional interaction, we performed simultaneous MECOM and CTCF perturbation in primary human HSPCs (Extended Data Fig. 6g) and observed that concurrent CTCF perturbation was sufficient to rescue the loss of LT-HSCs (Fig. 6i) and prevent the increased expansion of HSPCs caused by MECOM perturbation (Extended Data Fig. 6h). GSEA revealed significant depletion of MECOM down genes and significant upregulation of MECOM up genes following MECOM compared to AAVS1 editing, corroborating our observations from single cells (Extended Data Fig. 6i). When compared to the AAVS1 sample, CTCF editing alone resulted in significant enrichment of the MECOM down gene set, but no significant changes in the MECOM up genes (Extended Data Fig. 6j). Dual editing of MECOM and CTCF resulted in significant upregulation of MECOM down genes (Fig. 6j) and significant depletion of MECOM up genes (Fig. 6k). Upon dual perturbation, there was significantly greater rescue of MECOM down genes that are associated with cisREs containing CTCF binding motifs compared to those without CTCF motifs (Extended Data Fig. 6k). These data demonstrate that MECOM plays a key role in activating the expression of genes critical for HSC maintenance, which are then subject to genomic reorganization by CTCF upon differentiation.
The MECOM gene network is hijacked in high-risk AMLs
Having elucidated a fundamental transcriptional regulatory network necessary for HSC maintenance, we wondered to what extent this network may be relevant to leukemia. First, we combined 165 primary adult AML samples from The Cancer Genome Atlas (TCGA)39 with 430 adult samples from the BEAT AML dataset40 into an adult AML cohort (Fig. 7a). We found significant enrichment of the MECOM down gene set in clinical samples with high MECOM expression levels (Extended Data Fig. 7a). We analyzed this adult AML cohort in parallel with 440 pediatric AML samples from the TARGET AML dataset41 (Fig. 7b). Using optimal thresholding to stratify patients by MECOM expression, we observed a survival disadvantage in both adult and pediatric AML (Fig. 7c), consistent with previous reports42,43.
Given the importance of the MECOM down gene network in HSC maintenance, we sought to determine whether expression of this network was associated with survival in AML. Using GSEA, we determined whether individual patient AML samples had enrichment or depletion of the MECOM down gene set (Extended Data Fig. 7b–d). Enrichment of the MECOM down gene set was associated with worse survival in both the adult (hazard ratio (HR) 1.52 (95% CI 1.13–2.04), P = 0.005) and pediatric AML cohorts (HR 1.96 (95% CI 1.38–2.69), P = 7.4 × 10−5; Fig. 7d).
We then generated a rank order list based on the normalized enrichment score (NES) for each sample to allow for further stratification based on the degree of network enrichment. We used optimal thresholding to stratify patients based on NES and found significantly worse overall survival in patients with high MECOM NES compared to patients with low NES in both adult (HR 1.58 (95% CI 1.18–2.11), P = 0.0016) and pediatric (HR 2.08 (95% CI 1.49–2.89), P = 3.6 × 10−5) patients (Fig. 7e).
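Stratifying patients by an optimal threshold can be sketched as a scan over candidate cut points, keeping the cut that maximizes a two-group survival statistic. The example below uses a log-rank chi-square statistic as that criterion, which is an assumption made for illustration; the hazard ratios reported in the text would typically come from a Cox model, which is not implemented here. All data are invented.

```python
from math import erfc, sqrt


def logrank_chi2(group_a, group_b):
    """Log-rank chi-square (1 d.f.) for two lists of (time, event) pairs.

    event is 1 for an observed death and 0 for a censored observation.
    """
    event_times = sorted({t for t, e in group_a + group_b if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in group_a if time >= t)  # at risk in a
        n_b = sum(1 for time, _ in group_b if time >= t)  # at risk in b
        d_a = sum(1 for time, e in group_a if time == t and e)
        d_b = sum(1 for time, e in group_b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def logrank_pvalue(group_a, group_b):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return erfc(sqrt(logrank_chi2(group_a, group_b) / 2))


def optimal_threshold(scores, times, events):
    """Return (cut, chi2) for the cut that maximizes the log-rank statistic."""
    best_cut, best_chi2 = None, -1.0
    for cut in sorted(set(scores))[1:]:  # both groups stay non-empty
        high = [(t, e) for s, t, e in zip(scores, times, events) if s >= cut]
        low = [(t, e) for s, t, e in zip(scores, times, events) if s < cut]
        chi2 = logrank_chi2(high, low)
        if chi2 > best_chi2:
            best_cut, best_chi2 = cut, chi2
    return best_cut, best_chi2


# Hypothetical cohort in which higher scores go with shorter survival.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
times = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
events = [1] * 10
print(optimal_threshold(scores, times, events))
```

Because the cut point is chosen to maximize separation, the resulting p-value is optimistic unless it is corrected for the number of thresholds tested or confirmed in an independent cohort.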
Stratification based on clinical risk group or LSC17 score44 had significant associations with survival (Fig. 7f,g) and we sought to determine whether MECOM network enrichment identified the same subgroup of high-risk patients. We observed that 48% of adult AML and 51% of pediatric AML with adverse clinical risk features also had MECOM network enrichment. Similarly, we found that 51% of adult AML and 55% of pediatric AML with high LSC17 scores had MECOM network enrichment (Extended Data Fig. 7e,f). Thus, MECOM network enrichment identifies a largely unique subset of patients compared to currently available risk stratification tools.
Next, we investigated whether the addition of MECOM network enrichment to the clinical risk group or LSC17 score resulted in improved risk stratification. In the adult AML cohort, MECOM down gene set enrichment was independently associated with mortality particularly in patients with intermediate risk AML (P = 0.005) (Fig. 7h) and high LSC17 score (P = 0.01) (Fig. 7i). The contribution of MECOM network enrichment to clinical risk grouping was even more striking in the pediatric AML cohort in which MECOM network enrichment was significantly associated with mortality independent of clinical risk group (P = 0.008) (Fig. 7h) and, separately, independent of LSC17 score (P = 0.01) (Fig. 7i). These results reveal that stratification of primary AML patient samples by MECOM down gene enrichment can be integrated with currently available prognostic tools to improve risk stratification for overall survival in both adult and pediatric AML. Additionally, MECOM down network enrichment was significantly associated with lower event-free survival, independent of clinical risk group and LSC17 score in pediatric AML (P = 1.72 × 10−6 and P = 5.62 × 10−5, respectively) (Extended Data Fig. 7g,k).
Finally, we calculated marginal HRs to evaluate the degree of MECOM expression or MECOM network NES with overall survival. We observed a modest effect of incremental increases of MECOM expression on the marginal HR of survival (Fig. 7j) and a much more significant effect of incremental increases in MECOM NES (Fig. 7k). Together, these data reveal that the MECOM down network is highly enriched in a subset of adult and pediatric AMLs with poor prognosis and can be integrated with currently available prognostic tools to improve risk stratification for patients with AML.
Validation of MECOM addiction in a subset of high-risk AMLs
Given the prognostic significance of MECOM network enrichment in AML, we sought to further study this network in AML cell lines. We examined 44 AML cell lines from the Cancer Cell Line Encyclopedia (CCLE) and stratified them based on MECOM expression (Extended Data Fig. 8a). We compared gene expression between MECOM-high and MECOM-low AML cell lines and found significant enrichment of MECOM down genes and depletion of MECOM up genes (Fig. 8a). Comparison of gene expression in individual MECOM-high AML cell lines to the average expression in MECOM-low AML lines revealed highly significant MECOM network enrichment in MUTZ-3, F36P, HNT34 and OCI-AML4 cells (Extended Data Fig. 8b). We compared CRISPR dependencies of MECOM-high and MECOM-low AML cell lines and observed differential essentiality of RUNX1, consistent with our findings of potential cooperativity between RUNX1 and MECOM in regulating the HSC network genes (Extended Data Fig. 8c).
To validate the role of the MECOM network in an otherwise isogenic AML background, we performed CRISPR editing of MECOM in the MUTZ-3 AML cell line45,46. MUTZ-3 cells maintain a population of primitive CD34+ blasts in culture that can self-renew or differentiate into CD14+ monocytes (Fig. 8b and Extended Data Fig. 8d). MECOM editing in MUTZ-3 cells (Fig. 8c) resulted in significant reduction in MECOM expression level (Fig. 8d) and a loss of primitive CD34+ cells (Fig. 8e). Loss of progenitors after MECOM perturbation was accompanied by enrichment of edited MECOM alleles, as MECOM-perturbed cells underwent greater expansion (Extended Data Fig. 8e). Maintenance of CD34+ cells was restored by lentiviral MECOM expression, but not lentiviral expression of the EVI1 isoform (Fig. 8f), consistent with our rescue data from primary HSPCs (Extended Data Fig. 4e). RNA-seq of CD34+ progenitor MUTZ-3 cells after MECOM editing revealed significant depletion of MECOM down genes and significant enrichment of MECOM up genes (Fig. 8g, Extended Data Fig. 8f and Supplementary Table 7). Additionally, MECOM perturbation in HNT34 AML cells led to significant depletion of MECOM down genes and significant enrichment of MECOM up genes (Fig. 8h), revealing the conservation of this gene regulatory network in multiple AML contexts.
Because of the functional interaction between MECOM and CTCF in the transcriptional control of LT-HSC quiescence, we reasoned that the loss of MUTZ-3 progenitors following MECOM perturbation may also be dependent on CTCF. We performed dual CRISPR editing of MECOM and CTCF and observed partial rescue of the loss of CD34+ progenitors induced by MECOM perturbation alone (Fig. 8i). The more modest rescue of progenitors in the MUTZ-3 system compared to the LT-HSC model (Fig. 6i) may be a function of less efficient CTCF editing in MUTZ-3 cells (Extended Data Fig. 8g).
To evaluate binding of CTCF to the cisREs of MECOM network genes, we generated a Cas9- and GFP-expressing MUTZ-3 cell line, which we infected with a lentivirus encoding an sgRNA targeting AAVS1 or MECOM along with red fluorescent protein (RFP). We observed a gradual loss of CD34+ cells following MECOM sgRNA delivery and, on day 4 after editing, we examined CTCF binding in CD34+ MUTZ-3 progenitors by ChIP-seq, before complete loss of CD34+ progenitors. In the AAVS1-treated samples, we observed strong CTCF binding in the cisREs of MECOM network genes that contain CTCF footprints (Extended Data Fig. 8h). There was no difference in CTCF binding after MECOM editing, suggesting that the co-regulation of MECOM network genes by CTCF is not due to differential CTCF chromatin occupancy in CD34+ MUTZ-3 cells, but may instead be due to differential cofactor interactions or chromatin looping. Collectively, these data reveal that the MECOM regulatory gene network co-regulated by CTCF is indispensable for AML progenitor maintenance.
Discussion
A greater fundamental understanding of the transcriptional circuitry that enables human HSC self-renewal holds considerable promise for future mechanistic studies of HSC function and therapeutic applications. For instance, with emerging advances in gene therapy and genome editing of HSCs, the ability to better maintain and manipulate these cells both ex vivo and in vivo would be clinically beneficial47; however, the limitations in our molecular understanding of this regulatory process have hampered such efforts.
Here, we have taken advantage of a rare experiment of nature to illuminate fundamental transcriptional circuitry that is required for human HSC maintenance in vivo. We have followed up on the human genetic observation that MECOM haploinsufficiency results in early-onset bone marrow failure. By modeling this disorder in primary HSPCs, we show that the functional loss of HSCs is accompanied by alterations in a network of genes critical for HSC maintenance. The identification of this gene network highlights the need to couple rigorous functional assays that nominate cellular vulnerabilities with integrative genomic profiling and analyses. Our results demonstrate how subtle gene expression changes can translate into major defects in HSC maintenance and uncover additional regulators of HSCs that can be subject to systematic perturbation studies in the future.
Through integrative genomic analysis of this network, we have gained insights into critical gene targets and have elucidated cooperative interactions among hematopoietic TFs involved in HSC function. We identify an antagonistic role for CTCF in altering chromatin looping of MECOM network genes as the cells differentiate and validate this interaction by functional and molecular rescue, illuminating fundamental transcriptional circuitry required for human HSC maintenance. We also find that this very same network is co-opted in AMLs with poor prognosis. A notable finding is that the MECOM regulatory network serves as a better predictor of poor outcome than does MECOM expression itself, suggesting that some AMLs may augment MECOM function in a manner beyond expression changes. This will be an important area for future exploration. It is also notable that leukemias arising due to insertional mutagenesis following human gene therapy trials have resulted in activation of MECOM48. Clones with increased MECOM expression often have a long latency, but can result in a more aggressive disease course. Our finding that an HSC regulatory program is co-opted by increased MECOM expression may help explain these perplexing clinical observations. A deeper understanding of how such stem cell networks are utilized in malignant states may enable improved therapeutic approaches and provide opportunities to expand and manipulate non-malignant HSCs for therapeutic benefit.
Methods
Data reporting
No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications16,23. Data distribution was assumed to be normal but this was not formally tested. Data collection and analysis were not performed blind to the conditions of the experiments. No animals or data points were excluded from analysis.
Cell line and primary cell culture
HSPCs were purified from discarded UCB samples of healthy male or female newborns using the EasySep Human CD34 Positive Selection Kit II following pre-enrichment using the RosetteSep Pre-enrichment cocktail (Stem Cell Technologies) and mononuclear cell isolation on Ficoll-Paque (GE Healthcare) density gradient. Cells were cryopreserved for later use. Granulocyte colony-stimulating factor-mobilized adult CD34+ HSPCs were purchased (Fred Hutchinson Cancer Research Center). Thawed cells were cultured at 37 °C and 5% O2 in serum-free HSC medium consisting of StemSpan II medium (Stem Cell Technologies) supplemented with CC100 cytokine cocktail (Stem Cell Technologies), 100 ng ml⁻¹ TPO (Peprotech) and 35 nM UM171 (Stem Cell Technologies). Cell density was maintained between 2 × 10⁵ and 1 × 10⁶ cells per ml.
MUTZ-3 cells (DSMZ) were cultured at 37 °C in α-MEM (Life Technologies) supplemented with 20% FBS, 20% conditioned medium from 5637 cells (ATCC)49 and 1% penicillin/streptomycin. Cell density was maintained between 7 × 10⁵ and 1.5 × 10⁶ cells per ml.
HNT34 cells (Creative Bioarray) were cultured at 37 °C in α-MEM (Life Technologies) supplemented with 20% FBS, 20% conditioned medium from 5637 cells (ATCC)49 and 1% penicillin/streptomycin. Cell density was maintained between 5 × 10⁵ and 1.5 × 10⁶ cells per ml.
The 293T cells were cultured at 37 °C in DMEM (Life Technologies) supplemented with 10% FBS and 1% penicillin/streptomycin.
Mouse model
NOD.Cg-KitW-41JTyr+PrkdcscidIl2rgtm1Wjl (NBSGW) mice were obtained from the Jackson Laboratory (stock 026622)21. Littermates of the same sex were randomly assigned to experimental groups. NBSGW mice were interbred to maintain a colony of animals homozygous or hemizygous for all mutations of interest. The Institutional Animal Care and Use Committee at Boston Children’s Hospital approved the study protocol and provided guidance and ethical oversight.
CRISPR editing and analysis
Electroporation was performed on day 1 after thawing HSPCs using the Lonza 4D Nucleofector with 20 µl Nucleocuvette strips as described23,50. Briefly, the RNP complex was made by combining 100 pmol Cas9 (IDT) and 100 pmol modified sgRNA (Synthego) targeting MECOM (5′-CAAGGTCTGCAAACCTAACA-3′), AAVS1 (5′-GGGGCCACTAGGGACAGGAT-3′) or CTCF (5′-CAATTCTCCACTGGTCACAA-3′) and incubating at 21 °C for 15 min. Between 2 × 10⁵ and 4 × 10⁵ HSPCs resuspended in 20 µl P3 solution were mixed with RNP and underwent nucleofection with program DZ-100. For samples that underwent dual perturbation, total amounts of 100 pmol Cas9 and 100 pmol sgRNA (50 pmol each guide) were used. Cells were returned to HSC medium and editing efficiency was measured by PCR at 48 h after electroporation, unless otherwise indicated. First, genomic DNA was extracted using the DNeasy kit (QIAGEN) or both DNA and RNA were extracted using the AllPrep DNA/RNA Mini kit (QIAGEN) according to the manufacturer’s instructions. Genomic PCR was performed using Platinum II Hotstart Mastermix (Thermo Fisher Scientific) and edited allele frequency was detected either by Sanger sequencing and analyzed by ICE (ice.synthego.com) or NGS and analyzed with CRISPResso2 (ref. 51). The following primer pairs were used: MECOM-ICE (forward: 5′-ACATCAACCCAGAATCAGAAAC-3′; reverse: 5′-GGAAAAGGAAGGCTGCAAAG-3′); MECOM-NGS (forward: 5′-AGAAATGTGAGTTCCATGCAAGA-3′; reverse: 5′-AGCAAATATCATTGTCAGACCTGT-3′); and CTCF (forward: 5′-CAGCGGATTCAGATGGGTAA-3′; reverse: 5′-TCACCGTTTTAGCCAGGATG-3′). The effect on MECOM mRNA after editing was detected by quantitative PCR with reverse transcription (qRT–PCR) using SYBR green (Bio-Rad) after cDNA synthesis with iScript (Bio-Rad).
MUTZ-3 cells were edited as above with the following modification: cells were resuspended in 20 µl SF solution and program EO-100 was used for electroporation.
Viral constructs and transduction
MDS and EVI1 cDNA were synthesized from mRNA of human HSPCs using the following primers: MDS (forward: 5′-CGTACTCGAGGCCGCCACCATGAGATCCAAAGGCAGGGCAA-3′; reverse: 5′-TACGGAATTCTCACTCCCATCCATAACTGGGGTCT-3′); and EVI1 (forward: 5′-CGTACTCGAGGCCGCCACCATGATCTTAGACGAATTTTACAATG-3′; reverse: 5′-TACGGAATTCTCATACGTGGCTTATGGACTGG-3′). MECOM cDNA was synthesized using MDS-F and EVI1-R primers. Wobble mutations were introduced to disrupt the sgRNA binding site using the following primers: EVI1-F with wobble reverse (5′-GTGCCGAGTGAGATTCGCGGATCTAGGAAAAAT-3′) and wobble forward (5′-ATTTTTCCTAGATCCGCGAATCTCACTCGGCAC-3′) with EVI1-R, followed by overlap PCR of the two fragments. Primers included restriction enzyme sites to allow for cloning using EcoRI and XhoI into the HMD IRES–GFP backbone52.
The lentiviral pXPR_049 plasmid was obtained from the Genomics Perturbation Platform at the Broad Institute and RFP was cloned in place of the puromycin resistance gene. sgRNA sequences targeting AAVS1 or MECOM as described above were cloned into pXPR_049-RFP using BsmBI. The lentiviral pXPR_104 plasmid encoding Cas9v3-2A-GFP was also obtained from the Broad Institute Genomics Perturbation Platform.
To produce lentivirus, approximately 24 h before transfection, 293T cells were seeded in 10-cm plates. Cells were co-transfected with 10 µg pΔ8.9, 1 µg VSVG and 10 µg HMD vector variant, Cas9–GFP or sgRNA–RFP using calcium phosphate. The medium was changed the following day and viral supernatant was collected 48 h after transfection, filtered with a 0.45-µm filter and concentrated by ultracentrifugation at 100,000g for 2 h at 4 °C.
For lentiviral rescue experiments, 24 h after CRISPR nucleofection, 1 × 10⁵ HSPCs were transduced at a multiplicity of infection (MOI) of 10, with HMD empty, MDS, EVI1 or MECOM virus in 12-well plates with 8 µg ml⁻¹ of polybrene (Millipore), spun at 931g for 1.5 h at 21 °C and incubated in the viral supernatant overnight at 37 °C. Virus was washed off 16 h after infection.
MUTZ-3 cells were transduced at an MOI of 1 by spinfection at 1,455g for 1.5 h at 21 °C and were incubated in the viral supernatant overnight. Virus was washed off 16 h after infection. MUTZ-3 cells underwent viral transduction first, followed by CRISPR editing at 48 h after infection. MUTZ-3 or HNT34 cell lines expressing Cas9–GFP were generated by spinfection followed by GFP purification and subsequent spinfection with sgRNA–RFP virus and a second sorting for GFP+RFP+ cells.
Transplantation assays
Non-irradiated NBSGW mice (4–8 weeks of age) were tail vein injected with UCB or adult CD34+ HSPCs (1–2 × 10⁵ cells) on day 3 after CRISPR editing. Peripheral blood was sampled monthly by retro-orbital sampling and animals were killed at 16 weeks for BM evaluation. Secondary transplantations were performed by directly transplanting 60% of total BM cells from primary recipients into secondary non-irradiated NBSGW recipients. Human chimerism was assessed by evaluation of the BMs of secondary recipients at 16 weeks by flow cytometry and MECOM sequencing.
Flow cytometry and cell sorting
Cells were washed with PBS and stained with the following panel of antibodies to quantify and enrich for LT-HSCs: anti-CD34-PerCP-Cy5.5 (BioLegend, 343612), anti-CD45RA-APC-H7 (BD, 560674), anti-CD90-PECy7 (BD, 561558), anti-CD133-super bright 436 (eBioscience, 62-1338-42), anti-EPCR-PE (BioLegend, 351904) and anti-ITGA3-APC (BioLegend, 343808). LT-HSCs were defined by the following immunophenotype: CD34+CD45RA−CD90+CD133+ITGA3+EPCR+ (ref. 16). Three microliters of each antibody were used per 1 × 10⁵ cells in 100 µl. Total LT-HSC numbers were calculated as the product of the frequency of LT-HSCs by flow cytometry and the total cell number in culture.
Human cell chimerism after xenotransplantation was determined by staining with anti-mouse CD45-FITC (BioLegend, 103108) and anti-human CD45-APC (BioLegend, 368512). Human cell subpopulations were detected in the BM of transplanted mice using the following antibodies: anti-human CD45-APC (BioLegend, 368512), anti-human CD3-Pacific Blue (BioLegend, 344823), anti-human CD19-PECy7 (BioLegend, 302215), anti-human CD11b-FITC (BioLegend, 301330), anti-human CD41a-FITC (eBioscience, 11-0419-42), anti-human CD34-Alexa 488 (BioLegend, 343518) and anti-human CD235a-APC (eBioscience, 17-9987-42). Aliquots were stained individually for CD34 and CD235 or with CD45 in conjunction with the other lineage-defining markers. Mice with human cell chimerism <2% in the BM were excluded from subpopulation analysis.
MUTZ-3 cells were stained with anti-CD34-APC (BioLegend, 343607) and anti-CD14-PECy7 (BioLegend, 367112).
Flow cytometric analyses were conducted on BD LSRII, LSR Fortessa or Accuri C6 instruments and all data were analyzed using FlowJo software (v.10.8). FACS was performed on a BD Aria and samples were collected in PBS containing 2% BSA and 0.01% Tween for immediate processing for sequencing on the 10x Genomics platform. Alternatively, single cells were sorted into PCR plates containing 5 µl Buffer RLT Plus (QIAGEN) with 1% BME and immediately frozen at −80 °C for G&T sequencing.
Cell cycle analysis
For cell cycle analyses, on day 5 after CRISPR editing, cells were incubated with 5-ethynyl-2′-deoxyuridine (EdU) (Thermo Fisher Scientific, C10634) for 2 h, then fixed and permeabilized before cell surface staining as per the manufacturer’s recommendations. Multipotent progenitors were defined by the immunophenotype CD34+CD45RA−CD90+CD133+. Pegasus v.1.0 (https://github.com/klarman-cell-observatory/pegasus) in the Terra environment (https://app.terra.bio/#) was used to determine the expression of transcriptional signatures of cell cycle status of single LT-HSCs53.
Analysis of cell division was performed by carboxyfluorescein succinimidyl ester (CFSE) labeling (Thermo Fisher Scientific, C34554). At 24 h after CRISPR editing, cells were incubated with CFSE, washed and subjected to flow cytometric analysis to establish a baseline and again on day 5. Proliferation modeling was performed in FlowJo v.10.8.0. Replication index was calculated in FlowJo v.10.8.0 as the total number of divided cells / cells that underwent at least one division.
Colony-forming unit cell assays
Three days after RNP electroporation, 500 CD34+ HSPCs were plated in 1 ml methylcellulose medium (H4034, Stem Cell Technologies) in triplicate unless otherwise noted. Primary colonies were counted after 14 d.
10x Genomics scRNA-seq
A suspension of 11,000 AAVS1-edited LT-HSCs and a suspension of 16,000 MECOM-edited LT-HSCs were loaded into two lanes of 10x RNA 3′ V3 kit (10x Genomics) according to the manufacturer’s guidelines. Libraries were constructed with distinct i7 barcodes, pooled in equal molecular concentrations and sequenced on one lane of HiSeq (Illumina) according to the manufacturer’s protocol. Briefly, 36 cycles were carried out for read1, 8 cycles for index1 and 90 cycles for read2, yielding ~15,000 reads per cell.
Bulk RNA-seq
Total RNA was extracted using the RNeasy Micro kit (QIAGEN, 74004) or using the 2.2× RNAClean XP kit (Beckman, A63987) from ~1,000 cells sorted in 25 µl Buffer RLT Plus with 1% BME. Then we proceeded with the SmartSeq2 protocol from the reverse transcription step using 10 ng of RNA54. The whole transcriptome amplification step was set at ten cycles. The 15 bulk RNA libraries were pooled at equal molecular concentration and sequenced using the NextSeq 550 High Output or NovaSeq kit (Illumina) with 35-bp paired-end reads.
Genome and transcriptome sequencing
Plates of sorted LT-HSCs were thawed from −80 °C on ice and an equal volume of prepared 2× Dynabeads was added. Samples were incubated at 72 °C for 1 min, then 56 °C for 2 min, followed by 10 min at 25 °C to allow for mRNA hybridization. Plates were placed on a magnet for 2 min and 8 µl of the supernatant containing genomic DNA (gDNA) was transferred into a new plate. Beads were washed twice in 10 µl of cold 1× Hybridization Buffer and once in PBS + RNase Inhibitor. All washes were transferred to the gDNA plate. Once PBS was removed, Dynabeads were immediately resuspended in 7.34 µl of SmartSeq2 Mix 1 and the plate was incubated at 80 °C for 3 min. The plate was immediately placed on the magnet and the supernatant containing mRNA was rapidly transferred into a new plate on ice. Then, 2.66 µl of SmartSeq2 Mix 2 was added. At this point, we proceeded with the SmartSeq2 protocol from the reverse transcription step54. The whole transcriptome amplification step was set at 23 cycles. gDNA, which was present in the pooled supernatant/wash buffer, was precipitated on DNA SPRI beads at a 0.6× ratio and eluted in 10 µl MDA Hyb buffer, denatured at 95 °C for 3 min and cooled on ice. Then 5 µl of Phi29 Mix was added and the mix was incubated at 45 °C for 8 h. The reaction was deactivated at 65 °C for 5 min. The MDA plate was stored at −20 °C. Eight plates of mRNA libraries were sequenced using the NextSeq 550 High Output kit (Illumina) with 35-bp paired-end reads according to the manufacturer’s recommendations. To genotype each cell based on MECOM editing status, MECOM from gDNA and whole transcriptome analysis was amplified by PCR and libraries were constructed, pooled and sequenced using the MiSeq 300 cycle kit (Illumina) according to manufacturer’s protocol with 150-bp paired-end reads.
ChIP-seq
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) was performed on chromatin from 2 × 10⁶ CD34+ MUTZ-3 cells after MECOM or AAVS1 editing. Sorted cells were cross-linked with 1% methanol-free formaldehyde (Pierce Life Technologies, 28906), quenched with 0.125 M glycine and frozen at −80 °C and stored until further processing. ChIP reaction was performed with iDeal ChIP-seq kit for TFs (Diagenode, C01010055) with modifications of the manual detailed below. Lysed samples were sonicated using the E220 sonicator (Covaris, 500239) in microTUBE AFA Fiber Pre-Slit Snap-Cap tubes (Covaris, 520045) with settings for 200-bp DNA shearing. Sheared chromatin was immunoprecipitated with 2.5 μg CTCF antibody (Abcam, ab128873, RRID AB_11144295) or 2.5 μg IgG antibody (Diagenode, C15410206, RRID AB_2722554). Eluted and decross-linked DNA was purified with MicroChIP DiaPure columns (Diagenode, C03040001) and eluted in 30 μl of nuclease-free water. ChIP and input libraries for sequencing were prepared with ThruPLEX DNA-Seq kit (Takara, R400674) and DNA Single Index kit, 12S Set A (Takara, R400695). Size selection steps were performed with Magbio Genomics HighPrep PCR beads (Fisher Scientific, 50-165-6582). The libraries were sequenced at Broad Institute Genomic Services by using the Illumina NextSeq 500 platform and the 150-bp paired-end configuration to obtain at least 30 million reads per sample.
Quantification and statistical analysis
Protein structure prediction
The MECOM sequence corresponding to amino acids 700–900 was submitted to the I-TASSER server for homology modeling55. The predicted structure of the zinc finger domain was rendered and visualized using PyMOL.
Bulk RNA data analysis
Fastq files demultiplexed by bcl2fastq from the bulk RNA-seq run were uploaded to Terra and processed with the Cumulus pipeline for bulk RNA-seq53 to get gene counts and gene isoform matrices. Human reference genome GRCh38 and gene annotation reference Homo_sapiens.GRCh38.93.gtf were used in all the RNA analyses.
Single-cell RNA data analysis
BCL files generated by scRNA-seq were uploaded to Terra and processed with the Cumulus pipeline for 10x single-cell RNA data and SmartSeq2 (ref. 53) to get gene matrices. Human reference genome GRCh38 and gene annotation reference Homo_sapiens.GRCh38.93.gtf were used in all the RNA analyses. For 10x data, doublets were filtered out and cells that contained reads for 500 to 8,000 genes with the percent of mitochondrial genes <20% were included in the analysis; cells were not filtered based on unique molecular identifier counts. For SmartSeq2 data, Scanpy56 was used to integrate all plates and perform batch correction and normalization. Cells that contained reads for 2,000 to 20,000 genes with the percent of mitochondrial genes <20% were included. Genes expressed in at least 0.05% of cells were included. Scanorama57 was used for batch correction. SmartSeq2 and 10x data were integrated and batch correction was performed on donor, technology and process batch with a Python version of Harmony58. Celltypist22 was used to infer cell types with the Pan_Fetal_Human.pkl model.
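The cell-level inclusion criteria described above can be illustrated with a minimal Python sketch. This is not the Cumulus, Pegasus or Scanpy implementation; the function name `passes_qc` and the toy per-cell summaries are hypothetical, and treating the gene-count bounds as inclusive is an assumption. Only the thresholds themselves (500 to 8,000 genes for 10x data, 2,000 to 20,000 genes for SmartSeq2 data, <20% mitochondrial) come from the text.

```python
def passes_qc(n_genes, pct_mito, min_genes=500, max_genes=8000, max_pct_mito=20.0):
    """Return True if a cell meets the gene-count and mitochondrial thresholds.

    Defaults follow the 10x criteria quoted in the text; the inclusive
    bounds on n_genes are an assumption of this sketch.
    """
    return min_genes <= n_genes <= max_genes and pct_mito < max_pct_mito


# Hypothetical per-cell summaries: (barcode, detected genes, % mitochondrial)
cells = [
    ("cell_a", 3200, 4.1),
    ("cell_b", 450, 3.0),    # fails: too few detected genes
    ("cell_c", 5100, 25.0),  # fails: mitochondrial fraction too high
    ("cell_d", 8000, 19.9),
]

kept = [barcode for barcode, n_genes, pct_mito in cells if passes_qc(n_genes, pct_mito)]
print(kept)

# The SmartSeq2 criteria use the same check with different gene-count bounds.
smartseq2_ok = passes_qc(15000, 5.0, min_genes=2000, max_genes=20000)
```

Keeping the thresholds as parameters makes it easy to apply the same check to both technologies before integration.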
MECOM genotyping in G&T data
MECOM editing was determined by CRISPResso2 (ref. 51). Genotyping from gDNA and from cDNA was combined for the same cell and cells that contained both an edited allele and a wild-type allele were defined as heterozygous. Genotyping annotation was integrated into gene matrix metadata.
Differential expression analysis
Differential expression analysis was performed with the FindMarkers function in Seurat v.4.0 on the 10x single-cell RNA data to compare AAVS1- and MECOM-edited LT-HSCs. The fold change threshold for significant gene expression was 0.05 on the log2 scale, ident.1 was AAVS1-edited cells, ident.2 was MECOM-edited cells and the test algorithm was MAST. Permutation analysis was performed by randomly assigning single cells to one of two groups irrespective of the initial experimental group and repeating differential expression analysis. One hundred independent permutations were performed.
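The label-permutation control can be sketched in Python as follows. This is an illustration of the idea only: a simple difference in group means stands in for the MAST test, group sizes are preserved when labels are shuffled (an assumption), and all names and expression values are hypothetical.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Stand-in statistic: mean of group 'A' minus mean of group 'B'."""
    group_a = [v for v, lab in zip(values, labels) if lab == "A"]
    group_b = [v for v, lab in zip(values, labels) if lab == "B"]
    return mean(group_a) - mean(group_b)


def permuted_statistics(values, labels, n_permutations=100, seed=0):
    """Shuffle group labels (keeping group sizes) and recompute the statistic."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        results.append(mean_difference(values, shuffled))
    return results


# Hypothetical expression of one gene in eight cells from two groups
expression = [5.1, 4.8, 5.3, 5.0, 3.9, 4.1, 3.8, 4.0]
labels = ["A"] * 4 + ["B"] * 4

observed = mean_difference(expression, labels)
null = permuted_statistics(expression, labels)

# Number of permutations at least as extreme as the observed difference
n_extreme = sum(abs(s) >= abs(observed) for s in null)
print(round(observed, 2), n_extreme)
```

Comparing the observed statistic with the permuted values indicates how often a difference of that size arises when group membership carries no information.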
Pseudobulk analysis
Raw counts from single LT-HSCs that passed the quality control from each experimental condition (AAVS1 or MECOM-edited) were aggregated to generate pseudobulk data for each group. Genes that did not reach the detection ratio cutoff used in the single-cell differential gene expression discovery were removed from the pseudobulk analysis. Log2 fold change between groups was calculated and correlation with gene expression data from single cells was calculated by Spearman’s rank correlation.
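A minimal Python sketch of the pseudobulk comparison is given below. The library-size normalization, the pseudocount, the direction of the fold change (MECOM-edited over AAVS1-edited) and all gene names and counts are assumptions made for illustration; the text specifies only that raw counts were aggregated per condition, that log2 fold change was calculated and that agreement with the single-cell results was assessed by Spearman’s rank correlation.

```python
import math


def pseudobulk(cells):
    """Sum raw counts per gene across a list of per-cell count dictionaries."""
    totals = {}
    for counts in cells:
        for gene, n in counts.items():
            totals[gene] = totals.get(gene, 0) + n
    return totals


def log2_fold_change(test, control, pseudocount=1.0):
    """log2 ratio of counts-per-million values (normalization is assumed)."""
    test_total, control_total = sum(test.values()), sum(control.values())
    out = {}
    for gene in sorted(set(test) | set(control)):
        t = test.get(gene, 0) / test_total * 1e6 + pseudocount
        c = control.get(gene, 0) / control_total * 1e6 + pseudocount
        out[gene] = math.log2(t / c)
    return out


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical raw counts for two cells per condition
aavs1_cells = [{"geneA": 4, "geneB": 10, "geneC": 86}, {"geneA": 6, "geneB": 10, "geneC": 84}]
mecom_cells = [{"geneA": 2, "geneB": 10, "geneC": 88}, {"geneA": 3, "geneB": 10, "geneC": 87}]

lfc = log2_fold_change(pseudobulk(mecom_cells), pseudobulk(aavs1_cells))
single_cell_lfc = {"geneA": -0.8, "geneB": 0.01, "geneC": 0.05}  # hypothetical
genes = sorted(lfc)
rho = spearman([lfc[g] for g in genes], [single_cell_lfc[g] for g in genes])
print(rho)
```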
HSC signatures in the Immune Cell Atlas
Pegasus was used to determine the expression of the HSC signature (CD34, HLF and CRHBP)23 in umbilical cord samples from the Immune Cell Atlas (https://data.humancellatlas.org/explore/projects/cc95ff89-2e68-4a08-a234-480eca21ce79).
Gene signature enrichment during hematopoiesis
We measured the enrichment of the MECOM down or MECOM up gene sets during hematopoiesis, using bulk RNA-seq datasets across 20 hematopoietic subpopulations27. The observed expression $O_{g,c}$ for the tested gene set $g$ in each cell type $c$ was calculated by taking the mean expression of genes in the list. We performed 1,000 permutations in which we sampled gene sets with the same number of genes as the tested gene set. The expected expression $E_{p,c}$ for permuted gene set $p \in P$ in cell type $c$ was calculated by taking the mean expression of genes in the list. The enrichment for gene set $g$ in cell type $c$ was computed as follows:
$$\mathrm{Enrichment}_{g,c} = \frac{O_{g,c} - \mathrm{mean}_{p \in P}\left(E_{p,c}\right)}{\sqrt{\mathrm{var}_{p \in P}\left(E_{p,c}\right)}}$$
where the mean and variance of $E_{p,c}$ are taken over all values of $p \in P$.
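The permutation-based enrichment can be sketched in Python, reading the description above as a z-score: the observed mean expression minus the mean of the permuted values, divided by their standard deviation. The function name, the background gene universe, sampling without replacement, the use of the population variance and the toy expression values are all assumptions of this sketch.

```python
import math
import random
from statistics import mean, pvariance


def enrichment_score(expression, gene_set, n_permutations=1000, seed=0):
    """z-score of a gene set's mean expression against random gene sets.

    expression: dict mapping gene name to expression in one cell type.
    gene_set: list of gene names to test.
    """
    rng = random.Random(seed)
    all_genes = sorted(expression)
    observed = mean([expression[g] for g in gene_set])
    expected = []
    for _ in range(n_permutations):
        # Random gene set of the same size as the tested set
        sampled = rng.sample(all_genes, len(gene_set))
        expected.append(mean([expression[g] for g in sampled]))
    return (observed - mean(expected)) / math.sqrt(pvariance(expected))


# Hypothetical expression values for 50 genes in a single cell type
expression = {f"gene_{i:02d}": float(i) for i in range(50)}
top_set = [f"gene_{i:02d}" for i in range(45, 50)]
bottom_set = [f"gene_{i:02d}" for i in range(5)]

z_top = enrichment_score(expression, top_set)
z_bottom = enrichment_score(expression, bottom_set)
print(z_top > 0, z_bottom < 0)
```

Repeating the calculation per cell type yields one score per gene set and subpopulation.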
Gene set enrichment analysis
We used GSEApy (https://github.com/zqfang/GSEApy) for all GSEA analyses to determine the enrichment of MECOM network genes following MECOM editing and rescue and in the TCGA and CCLE datasets that were stratified based on MECOM expression or overall survival. Significant enrichment of the gene set was determined using a t-test for MECOM rescue in LT-HSCs and MUTZ-3 cells and diff_of_classes for TCGA analyses. Genes from CCLE data were preranked by determining mean expression for each gene in MECOM-high and MECOM-low cohorts and calculating log2 fold change. GSEA was performed using 1,000 permutations to determine significance.
Development of HemeMap
A detailed description is provided in the Supplementary Note59–65.
ChIP-seq data analysis
The raw ChIP-seq data35 for the binding sites of hematopoietic TFs FLI1, GATA2 and RUNX1 in human CD34+ HSPCs were downloaded and processed. The paired-end reads were trimmed and aligned to the hg19 reference genome using Trimmomatic and Bowtie2, respectively. MACS2 (ref. 66) was used for peak calling with the default narrow peak setting. Genomic tracks were generated from BAM files using counts per million mapped reads normalization to facilitate comparison between tracks. The processed CTCF ChIP-seq data from HSPCs and differentiated hematopoietic lineages were obtained from a previous study38. To determine the significance of the enrichment of TF occupancy within cisREs of MECOM network genes, a permutation test was performed. For each TF, we calculated the number of cisREs overlapping with ChIP-seq peaks. The expected distribution of overlapping cisREs was generated by 1,000 permutations of an equal number of TF peaks across the genome. The presence of TF peaks in cisREs was counted and the Venn plot was generated by the web app BioVenn (https://www.biovenn.nl). The enrichment of CTCF signal on the footprints was computed using deepTools software67. We used a Wilcoxon signed-rank test to evaluate the differences of normalized CTCF signals on footprints between HSPCs and other terminal blood cells, namely erythroid cells, T cells, B cells and monocytes.
CTCF-mediated loop enrichment analysis
A set of 7,358 representative chromatin interactions in hematopoietic cells was identified from a high-resolution Hi-C map of OCI-AML2 cells as previously described37. The loops whose anchors overlap with cisREs of MECOM down genes were extracted for further analysis. The CTCF-mediated loops (at least one of the anchors containing a CTCF footprint) and non-CTCF-mediated loops (anchors without CTCF footprint) were identified separately. The Low-C data of chromatin looping in LT-HSCs and ST-HSCs were normalized by Knight–Ruiz balanced interaction frequencies at a resolution of 25 kb. We used Juicer to perform aggregate peak analysis36 to test for enrichment of loops within the Low-C data from LT-HSCs and ST-HSCs. Loops containing genes were identified by the genes within the genomic domains between loop anchors.
Analysis of primary AML patient data
Included studies
Three study cohorts were included in the survival analyses. We downloaded RNA-seq V2 expression data and corresponding clinical outcomes from the TCGA LAML39 cohort from cBioPortal (https://www.cbioportal.org/study/summary?id=laml_tcga_pub) for 173 patients with AML. The same was conducted for the BEAT AML cohort for 430 patients (https://www.cbioportal.org/study/summary?id=aml_ohsu_2018)40. In addition, the TARGET dataset was downloaded for 440 pediatric patients with AML (https://www.cbioportal.org/study/summary?id=aml_target_2018_pub)41. To gain maximal insight, adult datasets (TCGA and BEAT) were combined, with subsequent adjustments in analyses to account for study specific features. The only pediatric data used were from the TARGET dataset. The results published here are in part based upon data generated by the Therapeutically Applicable Research to Generate Effective Treatments (https://ocg.cancer.gov/programs/target) initiative, phs000218. The data used for this analysis are available at https://portal.gdc.cancer.gov/projects.
Derivation of variables of interest
A detailed description is provided in the Supplementary Note.
Survival analyses
KM curves were constructed demonstrating survival for each cohort (adult and pediatric) and variable (MECOM expression, MECOM network enrichment score, MECOM network enrichment (categorical), LSC17 and clinical risk score). To visualize survival differences for continuous variables, KM curves were stratified at the optimum threshold determined by Youden’s J statistic, which maximizes the sum of sensitivity and specificity of the metric. Follow-up time was truncated at 2,500 d for the pediatric cohort (thereby including n = 350, 79.5% of all complete cases) and at 1,500 d for the adult cohort (thereby including n = 513, 83.8% of all complete cases) for this and subsequent analyses to limit the issue of data sparsity at very late event time points. KM curves were constructed in R using the survival package and the ggsurvplot function (survminer package).
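Threshold selection by Youden’s J statistic (J = sensitivity + specificity − 1) can be sketched in Python as below. The sketch treats the outcome as a simple binary event and ignores censoring and follow-up time, which is a simplifying assumption; the function name, the convention that a sample is "high" when its score is at or above the threshold, and the toy scores are hypothetical.

```python
def youden_threshold(scores, events):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    scores: continuous values (for example, an enrichment score per patient).
    events: 1 if the event occurred, 0 otherwise; both classes must be present.
    A sample is classified as 'high' when score >= threshold.
    """
    positives = sum(events)
    negatives = len(events) - positives
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, e in zip(scores, events) if s >= t and e == 1)
        true_neg = sum(1 for s, e in zip(scores, events) if s < t and e == 0)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical scores and binary outcomes for ten samples
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
events = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

threshold, j_stat = youden_threshold(scores, events)
print(threshold, round(j_stat, 2))
```

The selected threshold would then split a cohort into two groups for plotting separate KM curves.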
HRs and 95% CI of death were determined from Cox proportional hazards models. These were created for each variable, correcting for contributing study in the adult group. This allowed assessment of continuous variables across their full spectrum. This also allowed for assessment of the association of MECOM down network enrichment with mortality, independent of existing clinical approaches such as the clinical risk score and LSC17. Models corrected for age and sex were created and the marginal hazard of mortality was derived and displayed graphically by age. The coxph function and the R packages survival, rms and ggeffects were used.
For analysis of AML cells from the CCLE database, we downloaded RNA-seq and CRISPR dependency data from the Cancer Dependency Map (https://depmap.org)68. We stratified the cohort based on MECOM expression (MECOM-low, log2(RPKM + 1) < 1; MECOM-high, log2(RPKM + 1) ≥ 1). Differential essentiality was determined by subtracting the CERES gene effect score of MECOM-low AML samples from that of MECOM-high AML samples. A negative value indicates stronger essentiality in MECOM-high AML.
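The stratification and differential essentiality calculation can be sketched in Python as follows. The cell line names, RPKM values and gene effect scores are hypothetical, and summarizing each group by its mean before subtraction is an assumption; the cutoff of log2(RPKM + 1) ≥ 1 for MECOM-high lines and the sign convention come from the text.

```python
import math
from statistics import mean


def stratify(rpkm_by_line, cutoff=1.0):
    """Split cell lines into MECOM-high and MECOM-low by log2(RPKM + 1)."""
    high, low = [], []
    for line, rpkm in sorted(rpkm_by_line.items()):
        if math.log2(rpkm + 1) >= cutoff:
            high.append(line)
        else:
            low.append(line)
    return high, low


def differential_essentiality(gene_effect, high, low):
    """Mean gene effect in MECOM-high lines minus mean in MECOM-low lines.

    More negative gene effect scores indicate stronger dependency, so a
    negative difference indicates stronger essentiality in MECOM-high lines.
    """
    high_mean = mean([gene_effect[line] for line in high])
    low_mean = mean([gene_effect[line] for line in low])
    return high_mean - low_mean


# Hypothetical MECOM expression (RPKM) and gene effect scores for one gene
mecom_rpkm = {"line_1": 0.2, "line_2": 0.5, "line_3": 3.0, "line_4": 7.0}
gene_effect = {"line_1": -0.1, "line_2": -0.3, "line_3": -0.9, "line_4": -0.7}

high, low = stratify(mecom_rpkm)
delta = differential_essentiality(gene_effect, high, low)
print(high, low, round(delta, 2))
```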
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41590-022-01370-4.
Acknowledgements
We are grateful to members of the Sankaran laboratory and numerous colleagues for valuable comments and suggestions. This work was supported by the New York Stem Cell Foundation (V.G.S.), a gift from the Lodish Family to Boston Children’s Hospital (V.G.S.), the Klarman Cell Observatory (A.R.), the Edward P. Evans Foundation (V.G.S.) and National Institutes of Health Grants R01 DK103794, R01 CA265726 and R01 HL146500 (V.G.S.). R.A.V. and L.W. received support from National Institutes of Health Grant T32 HL007574. R.A.V. is supported by the Edward P. Evans Center for Myelodysplastic Syndromes at the Dana-Farber Cancer Institute, the Julia’s Wings Foundation and the Office of Faculty Development at Boston Children’s Hospital. S.K.N. is a Scholar of the American Society of Hematology. V.G.S. is a New York Stem Cell-Robertson Investigator.
Author contributions
R.A.V., L.T., F.Y. and V.G.S. conceived and designed the experiments and wrote the manuscript with input from all authors. R.A.V., L.T., L.D.C., B.C., T.J.F., M.A., X.L., C.F., S.K.N., L.W. and K.T. performed functional studies and provided interpretation. F.Y. and L.T. performed the computational analyses. F.Y. designed and developed HemeMap. A.R. and V.G.S. provided supervision and overall project oversight.
Peer review
Peer review information
Nature Immunology thanks H. Grimes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Laurie A Dempsey, in collaboration with the Nature Immunology team.
Data availability
Summary statistics from RNA-seq studies are available in Supplementary Tables 2,3 and 7. HemeMap correlation data are available in Supplementary Tables 4 and 5. All sequencing data are deposited in National Center for Biotechnology Information Gene Expression Omnibus under Super Series GSE175521, including GSE175515 for MUTZ-3 and primary human CD34+ LT-HSPC bulk RNA-seq; GSE175516 for LT-HSPC 10x Genomics single-cell RNA-seq data; GSE175518 for primary human CD34+ LT-HSPC Amplicon-seq data; GSE175520 for primary human CD34+ LT-HSPC SmartSeq2 data; GSE214399 for CTCF in MUTZ-3 ChIP-seq data; and GSE216225 for F36P, HNT34 and primary human CD34+ HSPC bulk RNA-seq data and HSPC 10x Genomics scRNA-seq data. Publicly available AML gene expression data were downloaded from the following links and analyzed as described in the Methods: TCGA LAML (https://www.cbioportal.org/study/summary?id=laml_tcga_pub), TARGET AML (https://www.cbioportal.org/study/summary?id=aml_target_2018_pub) and BEAT AML (https://www.cbioportal.org/study/summary?id=aml_ohsu_2018). Source data are provided with this paper.
Code availability
Source data for reproducing results of this study are available on GitHub (https://github.com/sankaranlab/mecom_var).
Competing interests
A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 August 2020 was a scientific advisory board member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. Since 1 August 2020, A.R. has been an employee of Genentech, a member of the Roche Group. V.G.S. serves as an advisor to and/or has equity in Branch Biosciences, Ensoma, Novartis, Forma and Cellarity, all unrelated to the present work. The authors have no other competing interests to declare.
Footnotes
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Richard A. Voit, Liming Tao, Fulong Yu.
Contributor Information
Richard A. Voit, Email: rvoit@broadinstitute.org
Vijay G. Sankaran, Email: sankaran@broadinstitute.org
Extended data
Extended data is available for this paper at 10.1038/s41590-022-01370-4.
Supplementary information
The online version contains supplementary material available at 10.1038/s41590-022-01370-4.
References
- 1. Liggett LA, Sankaran VG. Unraveling hematopoiesis through the lens of genomics. Cell. 2020;182:1384–1400. doi: 10.1016/j.cell.2020.08.030.
- 2. Karantanos T, Jones RJ. Acute myeloid leukemia stem cell heterogeneity and its clinical relevance. Adv. Exp. Med. Biol. 2019;1139:153–169. doi: 10.1007/978-3-030-14366-4_9.
- 3. Bluteau O, et al. A landscape of germ line mutations in a cohort of inherited bone marrow failure patients. Blood. 2018;131:717–732. doi: 10.1182/blood-2017-09-806489.
- 4. Germeshausen M, et al. MECOM-associated syndrome: a heterogeneous inherited bone marrow failure syndrome with amegakaryocytic thrombocytopenia. Blood Adv. 2018;2:586–596. doi: 10.1182/bloodadvances.2018016501.
- 5. Niihori T, et al. Mutations in MECOM, encoding oncoprotein EVI1, cause radioulnar synostosis with amegakaryocytic thrombocytopenia. Am. J. Hum. Genet. 2015;97:848–854. doi: 10.1016/j.ajhg.2015.10.010.
- 6. Goyama S, et al. Evi-1 is a critical regulator for hematopoietic stem cells and transformed leukemic cells. Cell Stem Cell. 2008;3:207–220. doi: 10.1016/j.stem.2008.06.002.
- 7. Christodoulou C, et al. Live-animal imaging of native haematopoietic stem and progenitor cells. Nature. 2020;578:278–283. doi: 10.1038/s41586-020-1971-z.
- 8. Zhang Y, et al. PR-domain-containing Mds1-Evi1 is critical for long-term hematopoietic stem cell function. Blood. 2011;118:3853–3861. doi: 10.1182/blood-2011-02-334680.
- 9. Kataoka K, et al. Evi1 is essential for hematopoietic stem cell self-renewal and its expression marks hematopoietic cells with long-term multilineage repopulating activity. J. Exp. Med. 2011;208:2403–2416. doi: 10.1084/jem.20110447.
- 10. Yuasa H, et al. Oncogenic transcription factor Evi1 regulates hematopoietic stem cell proliferation through GATA-2 expression. EMBO J. 2005;24:1976–1987. doi: 10.1038/sj.emboj.7600679.
- 11. Bindels EMJ, et al. EVI1 is critical for the pathogenesis of a subset of MLL-AF9-rearranged AMLs. Blood. 2012;119:5838–5849. doi: 10.1182/blood-2011-11-393827.
- 12. Ayoub E, et al. EVI1 overexpression reprograms hematopoiesis via upregulation of Spi1 transcription. Nat. Commun. 2018;9:4239. doi: 10.1038/s41467-018-06208-y.
- 13. Glass C, et al. Global identification of EVI1 target genes in acute myeloid leukemia. PLoS ONE. 2013;8:e67134. doi: 10.1371/journal.pone.0067134.
- 14. Bard-Chapeau EA, et al. EVI1 oncoprotein interacts with a large and complex network of proteins and integrates signals through protein phosphorylation. Proc. Natl Acad. Sci. USA. 2013;110:E2885–E2894. doi: 10.1073/pnas.1309310110.
- 15. Kurokawa M, et al. The evi-1 oncoprotein inhibits c-Jun N-terminal kinase and prevents stress-induced cell death. EMBO J. 2000;19:2958–2968. doi: 10.1093/emboj/19.12.2958.
- 16. Tomellini E, et al. Integrin-α3 is a functional marker of ex vivo expanded human long-term hematopoietic stem cells. Cell Rep. 2019;28:1063–1073. doi: 10.1016/j.celrep.2019.06.084.
- 17. Pellegrino M, et al. High-throughput single-cell DNA sequencing of acute myeloid leukemia tumors with droplet microfluidics. Genome Res. 2018;28:1345–1352. doi: 10.1101/gr.232272.117.
- 18. Kurosaki T, Popp MW, Maquat LE. Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat. Rev. Mol. Cell Biol. 2019;20:406–420. doi: 10.1038/s41580-019-0126-2.
- 19. Fares I, et al. Cord blood expansion. Pyrimidoindole derivatives are agonists of human hematopoietic stem cell self-renewal. Science. 2014;345:1509–1512. doi: 10.1126/science.1256337.
- 20. Laurenti E, et al. CDK6 levels regulate quiescence exit in human hematopoietic stem cells. Cell Stem Cell. 2015;16:302–313. doi: 10.1016/j.stem.2015.01.017.
- 21. McIntosh BE, et al. Nonirradiated NOD,B6.SCID Il2rγ-/- Kit(W41/W41) (NBSGW) mice support multilineage engraftment of human hematopoietic cells. Stem Cell Rep. 2015;4:171–180. doi: 10.1016/j.stemcr.2014.12.005.
- 22. Domínguez Conde C, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science. 2022;376:eabl5197. doi: 10.1126/science.abl5197.
- 23. Bao EL, et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature. 2020;586:769–775. doi: 10.1038/s41586-020-2786-7.
- 24. Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 2015;33:285–289. doi: 10.1038/nbt.3129.
- 25. Finak G, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16:278. doi: 10.1186/s13059-015-0844-5.
- 26. Squair JW, et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 2021;12:5692. doi: 10.1038/s41467-021-25960-2.
- 27. Wahlster L, et al. Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript. J. Exp. Med. 2021;218:e20210444. doi: 10.1084/jem.20210444.
- 28. Corces MR, et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 2016;48:1193–1203. doi: 10.1038/ng.3646.
- 29. Granja JM, et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 2019;37:1458–1465. doi: 10.1038/s41587-019-0332-7.
- 30. Javierre BM, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167:1369–1384. doi: 10.1016/j.cell.2016.09.037.
- 31. Ulirsch JC, et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 2019;51:683–693. doi: 10.1038/s41588-019-0362-6.
- 32. Zhang X, et al. Large DNA methylation nadirs anchor chromatin loops maintaining hematopoietic stem cell identity. Mol. Cell. 2020;78:506–521. doi: 10.1016/j.molcel.2020.04.018.
- 33. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
- 34. Ciau-Uitz A, Wang L, Patient R, Liu F. ETS transcription factors in hematopoietic stem cell development. Blood Cells Mol. Dis. 2013;51:248–255. doi: 10.1016/j.bcmd.2013.07.010.
- 35. Beck D, et al. Genome-wide analysis of transcriptional regulators in human HSPCs reveals a densely interconnected network of coding and noncoding genes. Blood. 2013;122:e12–e22. doi: 10.1182/blood-2013-03-490425.
- 36. Rao SSP, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–1680. doi: 10.1016/j.cell.2014.11.021.
- 37. Takayama N, et al. The transition from quiescent to activated states in human hematopoietic stem cells is governed by dynamic 3D genome reorganization. Cell Stem Cell. 2021;28:488–501. doi: 10.1016/j.stem.2020.11.001.
- 38. Qi Q, et al. Dynamic CTCF binding directly mediates interactions among cis-regulatory elements essential for hematopoiesis. Blood. 2021;137:1327–1339. doi: 10.1182/blood.2020005780.
- 39. Cancer Genome Atlas Research Network et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 2013;368:2059–2074. doi: 10.1056/NEJMoa1301689.
- 40. Tyner JW, et al. Functional genomic landscape of acute myeloid leukaemia. Nature. 2018;562:526–531. doi: 10.1038/s41586-018-0623-z.
- 41. Bolouri H, et al. The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat. Med. 2018;24:103–112. doi: 10.1038/nm.4439.
- 42. Glass C, Wilson M, Gonzalez R, Zhang Y, Perkins AS. The role of EVI1 in myeloid malignancies. Blood Cells Mol. Dis. 2014;53:67–76. doi: 10.1016/j.bcmd.2014.01.002.
- 43. Gröschel S, et al. Deregulated expression of EVI1 defines a poor prognostic subset of MLL-rearranged acute myeloid leukemias: a study of the German-Austrian Acute Myeloid Leukemia Study Group and the Dutch-Belgian-Swiss HOVON/SAKK Cooperative Group. J. Clin. Oncol. 2013;31:95–103. doi: 10.1200/JCO.2011.41.5505.
- 44. Ng SWK, et al. A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature. 2016;540:433–437. doi: 10.1038/nature20598.
- 45. Gröschel S, et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell. 2014;157:369–381. doi: 10.1016/j.cell.2014.02.019.
- 46. Yamazaki H, et al. A remote GATA2 hematopoietic enhancer drives leukemogenesis in inv(3)(q21;q26) by activating EVI1 expression. Cancer Cell. 2014;25:415–427. doi: 10.1016/j.ccr.2014.02.008.
- 47. Porteus MH. A new class of medicines through DNA editing. N. Engl. J. Med. 2019;380:947–959. doi: 10.1056/NEJMra1800729.
- 48. Stein S, et al. Genomic instability and myelodysplasia with monosomy 7 consequent to EVI1 activation after gene therapy for chronic granulomatous disease. Nat. Med. 2010;16:198–204. doi: 10.1038/nm.2088.
- 49. Kappas NC, Bautch VL. Maintenance and in vitro differentiation of mouse embryonic stem cells to form blood vessels. Curr. Protoc. Cell Biol. 2007;23:Unit 23.3. doi: 10.1002/0471143030.cb2303s34.
- 50. Bak RO, Dever DP, Porteus MH. CRISPR/Cas9 genome editing in human hematopoietic stem cells. Nat. Protoc. 2018;13:358–376. doi: 10.1038/nprot.2017.143.
- 51. Clement K, et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 2019;37:224–226. doi: 10.1038/s41587-019-0032-3.
- 52. Basak A, et al. Control of human hemoglobin switching by LIN28B-mediated regulation of BCL11A translation. Nat. Genet. 2020;52:138–145. doi: 10.1038/s41588-019-0568-7.
- 53. Li B, et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods. 2020;17:793–798. doi: 10.1038/s41592-020-0905-x.
- 54. Trombetta JJ, et al. Preparation of single-cell RNA-seq libraries for next generation sequencing. Curr. Protoc. Mol. Biol. 2014;107:4.22.1–17. doi: 10.1002/0471142727.mb0422s107.
- 55. Yang J, et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods. 2015;12:7–8. doi: 10.1038/nmeth.3213.
- 56. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
- 57. Hie B, Bryson B, Berger B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 2019;37:685–691. doi: 10.1038/s41587-019-0113-3.
- 58. Korsunsky I, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods. 2019;16:1289–1296. doi: 10.1038/s41592-019-0619-0.
- 59. Machanick P, Bailey TL. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics. 2011;27:1696–1697. doi: 10.1093/bioinformatics/btr189.
- 60. Bailey TL. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics. 2011;27:1653–1659. doi: 10.1093/bioinformatics/btr261.
- 61. Kulakovskiy IV, et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 2018;46:D252–D259. doi: 10.1093/nar/gkx1106.
- 62. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8:R24. doi: 10.1186/gb-2007-8-2-r24.
- 63. Yu F, Sankaran VG, Yuan G-C. CUT&RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&RUN and CUT&Tag data analysis. Bioinformatics. 2021;38:252–254. doi: 10.1093/bioinformatics/btab507.
- 64. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27:1017–1018. doi: 10.1093/bioinformatics/btr064.
- 65. Pique-Regi R, et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21:447–455. doi: 10.1101/gr.112623.110.
- 66. Zhang Y, et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
- 67. Ramírez F, et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 2016;44:W160–W165. doi: 10.1093/nar/gkw257.
- 68. Ghandi M, et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature. 2019;569:503–508. doi: 10.1038/s41586-019-1186-3.