Significance
In BCP ALL, molecular classification is used for risk stratification and influences treatment strategies. We reanalyzed the transcriptomic landscape of 1,223 BCP ALLs and identified 14 subgroups based on their transcriptional profiles. Eight of these (G1 to G8) are previously well-known subgroups, harboring specific genetic abnormalities. The sample size allowed the identification of six previously undescribed subgroups, consisting of cases harboring PAX5 or CRLF2 fusions (G9), PAX5 (p.P80R) mutations (G10), IKZF1 (p.N159Y) mutations (G11), either ZEB2 (p.H1038R) mutations or IGH–CEBPE fusions (G12), HLF rearrangements (G13), or NUTM1 rearrangements (G14). In addition, this study allowed us to determine the prognostic impact of several recently defined subgroups. This study suggests that RNA sequencing should be a valuable tool in the routine diagnostic workup for ALL.
Keywords: BCP ALL, RNA-seq, subtypes, gene fusion, gene mutation
Abstract
Most B cell precursor acute lymphoblastic leukemia (BCP ALL) can be classified into known major genetic subtypes, while a substantial proportion of BCP ALL remains poorly characterized in relation to its underlying genomic abnormalities. We therefore initiated a large-scale international study to reanalyze and delineate the transcriptome landscape of 1,223 BCP ALL cases using RNA sequencing. Fourteen BCP ALL gene expression subgroups (G1 to G14) were identified. Apart from extending eight previously described subgroups (G1 to G8 associated with MEF2D fusions, TCF3–PBX1 fusions, ETV6–RUNX1–positive/ETV6–RUNX1–like, DUX4 fusions, ZNF384 fusions, BCR–ABL1/Ph–like, high hyperdiploidy, and KMT2A fusions), we defined six additional gene expression subgroups: G9 was associated with both PAX5 and CRLF2 fusions; G10 and G11 with mutations in PAX5 (p.P80R) and IKZF1 (p.N159Y), respectively; G12 with IGH–CEBPE fusion and mutations in ZEB2 (p.H1038R); and G13 and G14 with TCF3/4–HLF and NUTM1 fusions, respectively. In pediatric BCP ALL, subgroups G2 to G5 and G7 (51 to 65/67 chromosomes) were associated with low-risk, G7 (with ≤50 chromosomes) and G9 were intermediate-risk, whereas G1, G6, and G8 were defined as high-risk subgroups. In adult BCP ALL, G1, G2, G6, and G8 were associated with high risk, while G4, G5, and G7 had relatively favorable outcomes. This large-scale transcriptome sequence analysis of BCP ALL revealed distinct molecular subgroups that reflect discrete pathways of BCP ALL, informing disease classification and prognostic stratification. The combined results strongly advocate that RNA sequencing be introduced into the clinical diagnostic workup of BCP ALL.
B cell precursor acute lymphoblastic leukemia (BCP ALL), the most common childhood cancer, is a highly heterogeneous malignant hematological disorder (1). Previous genome- and/or transcriptome-wide analyses of BCP ALLs have greatly improved our understanding of the pathogenesis and prognostic impact of many molecular abnormalities in BCP ALL (2, 3). Structural chromosomal alterations as well as sequence mutations are common in childhood and adult BCP ALL. In the last four decades, most of the recurring chromosomal abnormalities, including aneuploidy, chromosomal rearrangements/gene fusions (e.g., ETV6–RUNX1, BCR–ABL1, and TCF3–PBX1), and rearrangements of KMT2A (previously MLL), were identified by cytogenetics and fluorescence in situ hybridization. Subsequently, gene expression profiling revealed that these cytogenetic subgroups displayed specific gene expression patterns (3–5). With the advent of genome sequencing technology, several groups discovered a large number of novel gene mutations and fusions, such as those involving ZNF384, MEF2D, and DUX4 rearrangements (6–11), among those cases with no defining chromosomal abnormalities, termed “B-other-ALL.”
However, it remained unknown whether additional novel BCP ALL subtypes could be detected by integrated analysis of pooled datasets from studies with otherwise relatively small sample sizes. We hypothesized that the versatility provided by RNA-seq (sequencing) would uncover otherwise undetected genetic abnormalities in BCP ALL, providing that sufficient numbers of cases were analyzed. Thus, through the formation of an international consortium of five major study groups, we have delineated the transcriptomic landscape of BCP ALL and at the same time identified new subgroups of biological and clinical importance.
Results
Identification of BCP ALL Subgroups with Distinctive Gene Expression Profiles and Genomic Aberrations.
To comprehensively identify BCP ALL subtypes, we first systematically classified gene expression profiles, gene fusions, and gene mutations from RNA-seq data of 1,223 BCP ALL cases from five significant patient cohorts (Table 1 and SI Appendix, Fig. S2 and Dataset S2). Using a consecutive two-step unsupervised clustering, we identified 14 distinct subgroups based on their gene expression signatures (G1 to G14) (Fig. 1 and Table 2). Most of these gene expression subgroups segregated with well-known genetic abnormalities. TCF3–PBX1 fusions were present in the G2 subgroup (n = 76, 6%); ETV6–RUNX1 fusions belonged to G3; BCR–ABL1 (Ph) and BCR–ABL1–like (Ph-like, including a cluster with CRLF2 fusions) comprised G6 (n = 167, 14%); and cases with a hyperdiploid karyotype formed the subgroup G7 (n = 408, 33%). Three subgroups recently identified among B-other-ALL cases were those with MEF2D fusions (G1; n = 39, 3%), DUX4 rearrangements (G4; n = 63, 5%), and ZNF384 fusions (G5; n = 74, 6%) (6–11). These recently described subgroups formed distinctive gene expression-based clusters, consistent with prior reports (6, 7, 10, 11). The most recently defined BCP ALL ETV6–RUNX1–like cluster, characterized by the absence of ETV6–RUNX1 fusions but with similar gene expression profiles to ETV6–RUNX1–positive BCP ALLs (6), was also found among our combined datasets. In concordance with previous findings (6), both fusions involving ETV6 and fusions involving IKZF1 were common in these ETV6–RUNX1–like cases (Dataset S2). However, all ETV6–RUNX1–negative cases exhibiting a gene expression profile similar to ETV6–RUNX1–positive cases were defined as ETV6–RUNX1–like. Together, ETV6–RUNX1–positive/ETV6–RUNX1–like BCP ALL constituted G3 (n = 161, 13%). KMT2A-rearranged cases formed a distinct subgroup (G8; n = 56, 5%). Notably, six previously undescribed gene expression subgroups (G9 to G14) with distinct genomic abnormalities were identified.
G9 (n = 111, 9%) was associated with PAX5 fusions and “Ph-like” ALL with CRLF2 fusions (12). G10 (n = 23, 2%) and G11 (n = 6, <1%) were characterized by two hotspot mutations in PAX5 (p.P80R) (21/22, 96%) and IKZF1 (p.N159Y) (6/6, 100%), respectively. The subgroup G12 (n = 8, <1%) was enriched for hotspot mutation in ZEB2 (p.H1038R) (5/8, 63%) and IGH–CEBPE fusions (3/8, 38%). G13 (n = 11, <1%) and G14 (n = 20, 2%) were associated with TCF3/4–HLF (7/11, 64%) and NUTM1 (6/20, 30%) rearrangements, respectively.
Table 1.
Characteristics | Cohort 1 (SIH, n = 166) | Cohort 2 (LUH, n = 182) | Cohort 3 (JALSG, n = 71) | Cohort 4 (MaSpore, n = 194) | Cohort 5 (TARGET/COG, n = 394) | Cohort 6 (TARGET/COG, n = 216) | Total (n = 1,223) |
Age at diagnosis | |||||||
Mean, y | 19.41 | 5.04 | 15.42 | 5.84 | 17.96 | 7.87 | 12.60 |
Median, y | 15.01 | 4.00 | 17.00 | 4.56 | 13.00 | 6.44 | 7.95 |
<18 y | 91 (55) | 182 (100) | 39 (55) | 194 (100) | 248 (63) | 152 (70) | 906 (74) |
≥18 y | 75 (45) | 0 | 32 (45) | 0 | 145 (37) | 6 (3) | 258 (21) |
Not available | 0 | 0 | 0 | 0 | 1 | 58 (27) | 59 (5) |
Gender | |||||||
Male | 95 (57) | 107 (59) | 30 (42) | 111 (57) | 209 (53) | 105 (49) | 657 (54) |
Female | 71 (43) | 75 (41) | 41 (58) | 83 (43) | 185 (47) | 111 (51) | 566 (46) |
Fusions | |||||||
BCR–ABL1 | 27 (16) | 5 (3) | NA | 9 (5) | 12 (3) | 6 (3) | 59 (5) |
ETV6–RUNX1 | 19 (11) | 45 (25) | 2 (3) | 36 (19) | 18 (5) | 14 (6) | 134 (11) |
TCF3–PBX1 | 17 (10) | 13 (7) | 6 (8) | 13 (7) | 11 (3) | 16 (7) | 76 (6) |
KMT2A | 8 (5) | 14 (8) | 2 (3) | 7 (4) | 9 (2) | 6 (3) | 46 (4) |
DUX4 | 9 (5) | 8 (4) | 10 (14) | 23 (12) | NA | 2 (1) | 52 (4) |
MEF2D | 7 (4) | 1 (1) | 7 (10) | 2 (1) | 18 (5) | 5 (2) | 40 (3) |
ZNF384 | 15 (9) | 2 (1) | 10 (14) | 11 (6) | 11 (3) | 17 (8) | 66 (5) |
Data are years or no. of patients (%). Percentages might not add up to 100% because of rounding. Note: Cohort 3 (JALSG) does not include BCR–ABL patients. NA, not available.
Table 2.
RNA-seq data-based subgroups | Frequency in the study cohort (n = 1,223), no. of patients (%) | Most frequently mutated genes (%) |
MEF2D fusions (G1) | 39 (3) | MEF2D–BCL9 (67), MEF2D–HNRNPUL1 (21), NRAS (13), KMT2A (10) |
TCF3–PBX1 (G2) | 76 (6) | TCF3–PBX1 (100), TP53 (8) |
ETV6–RUNX1/–like (G3) | 161 (13) | ETV6–RUNX1 (82), WHSC1 (9), KRAS (7), NRAS (6) |
DUX4 fusions (G4) | 63 (5) | DUX4–IGH (78), NRAS (30), MYC (11), TP53 (11), PTPN11 (11), KMT2D (11), CTCF (8), FLT3 (8), PAX5 (8) |
ZNF384 fusions (G5) | 74 (6) | EP300–ZNF384 (53), TCF3–ZNF384 (12), TAF15–ZNF384 (11), SMARCA2–ZNF362 (4), NRAS (14), KRAS (12), FLT3 (14), PTPN11 (14), SETD1B (9), ZEB2 (8), EZH2 (8), KMT2D (7) |
BCR–ABL1/Ph–like (G6) | 167 (14) | BCR–ABL1 (31), IGH–CRLF2 (10), JAK2 fusions (10), ABL1 fusions (7), IGH–EPOR (7), P2RY8–CRLF2 (5), KRAS (6), JAK2 (7), RUNX1 (5) |
Hyperdiploidy (G7) | 408 (33) | NRAS (19), KRAS (18), FLT3 (13), PTPN11 (8), KMT2D (7), CREBBP (6) |
KMT2A fusions (G8) | 56 (5) | KMT2A–AFF1 (29), KMT2A–MLLT1 (25), KMT2A–MLLT3 (13), KRAS (13), NRAS (14), FLT3 (7) |
PAX5 and CRLF2 fusions (G9) | 111 (9) | P2RY8–CRLF2 (12), PAX5–NOL4L (8), PAX5–AUTS2 (6), NRAS (23), KRAS (23), PAX5 (12), FLT3 (11), JAK1 (8) |
PAX5 (p.P80R) mutation (G10) | 23 (2) | PAX5 (96), PTPN11 (26), NRAS (22), KRAS (17), FLT3 (13), IL7R (9), SETD2 (9) |
IKZF1 (p.N159Y) mutation (G11) | 6 (<1) | IKZF1 (100), KRAS (17), KMT2D (17) |
ZEB2 (p.H1038R)/IGH–CEBPE (G12) | 8 (<1) | ZEB2 (75), NRAS (62), KMT2D (25), KRAS (12), KMT2A (12), CDKN2A (12) |
TCF3/4–HLF (G13) | 11 (<1) | TCF3/4–HLF (64), KRAS (18), NRAS (9), ZEB2 (9), ASXL2 (9) |
NUTM1 fusions (G14) | 20 (2) | NUTM1 fusions (30), TP53 (15), KRAS (10), CREBBP (15), KMT2D (10), SETD1B (10) |
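
As a quick arithmetic check on Table 2, the subgroup sizes should partition the full cohort and reproduce the rounded percentages shown. The sketch below is an illustration only and not part of the original analysis; the function names are hypothetical, and the rounding rule (whole percentages, with "<1" for values below 1%) is an assumption inferred from how the table is printed.

```python
# Subgroup sizes as listed in Table 2 (G1 to G14).
SUBGROUP_COUNTS = {
    "G1": 39, "G2": 76, "G3": 161, "G4": 63, "G5": 74, "G6": 167, "G7": 408,
    "G8": 56, "G9": 111, "G10": 23, "G11": 6, "G12": 8, "G13": 11, "G14": 20,
}
TOTAL_CASES = 1223


def subgroup_percentages(counts, total):
    """Return {subgroup: unrounded percentage of the total cohort}."""
    return {name: 100.0 * n / total for name, n in counts.items()}


def format_percentage(pct):
    """Format as printed in Table 2 (assumed rule): '<1' below 1%, else a whole number."""
    return "<1" if pct < 1 else str(round(pct))


if __name__ == "__main__":
    # The 14 subgroups should account for every case in the cohort.
    assert sum(SUBGROUP_COUNTS.values()) == TOTAL_CASES
    for name, pct in subgroup_percentages(SUBGROUP_COUNTS, TOTAL_CASES).items():
        print(f"{name}: {SUBGROUP_COUNTS[name]} ({format_percentage(pct)})")
```

Running the script prints one line per subgroup in the same "count (percentage)" form used in the table, which makes transcription slips easy to spot.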
Nonsilent Sequence Mutation Profile.
We next analyzed nonsilent sequence variants in available whole exome sequencing (WES) and RNA-seq data, based on in-house analysis criteria from previous studies (6, 8, 11, 13). We identified 44 genes that were recurrently mutated in at least 1% of the cases (12/1,223 cases). Nonsilent variants in NRAS, KRAS, FLT3, KMT2D, PAX5, PTPN11, CREBBP, and TP53 exhibited the highest mutation frequencies (3 to 14%) (SI Appendix, Figs. S5 and S6A). The mutated genes (>1%) were functionally divided into five categories: signaling molecules, transcription factors, epigenetic factors, cell cycle, and others (Dataset S3). Distinct gene mutation categories showed different levels of enrichment among the gene expression subgroups G1 to G14. Gene mutations among signaling molecules were enriched in subgroups G5, G7, G9, and G10, while G4, G10, G11, and G12 harbored a higher number of variants in transcription factor genes. HIST family (HIST1H2AG and HIST1H2AI) point mutations located in the histone H2A type 1 domain (SI Appendix, Fig. S7A) were highly correlated with G2 (TCF3–PBX1), while WHSC1 (NSD2) point mutations (p.E1099K) in the SET domain (SI Appendix, Fig. S7B) were significantly associated with G3 (ETV6–RUNX1–positive/ETV6–RUNX1–like; SI Appendix, Figs. S5–S7).
Co-occurrence or mutual exclusivity of mutations was also evaluated using two-sided Fisher’s exact test. A total of 36 gene pairs (for example, TP53 and MYC) exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6B). Along with the novel subgroups defined in this study (G9 to G14), 13 gene pairs (for example, PAX5 and PTPN11, and ZEB2 and NRAS) exhibited significant co-occurrence (SI Appendix, Fig. S6 C and D). In G9, four gene pairs, namely PAX5 and IKZF1, JAK1 and SETD2, SH2B3 and ASXL1, and CDKN2A and ARID1B, exhibited significant co-occurrence (P < 0.05; SI Appendix, Fig. S6D).
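
To make the pairwise test concrete, the following minimal sketch computes a two-sided Fisher's exact P value for one gene pair from a 2 × 2 table of mutation status. It is written in Python for illustration and is not the authors' code (the study reports using R); the function names are hypothetical and the counts in the usage lines are made up. The two-sided P value here sums the probabilities of all tables with the observed margins that are no more likely than the observed table, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cases with both genes mutated      b: only gene 1 mutated
    c: only gene 2 mutated                d: neither gene mutated
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x double-mutant cases given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def association_direction(a, b, c, d):
    """Label the direction of association from the cross-product of the table."""
    if a * d > b * c:
        return "co-occurrence"
    if a * d < b * c:
        return "mutual exclusivity"
    return "none"


if __name__ == "__main__":
    # Hypothetical counts for one gene pair; not taken from the study.
    table = (8, 2, 1, 5)
    print(fisher_exact_two_sided(*table), association_direction(*table))
```

In a screen over many gene pairs, each pair yields one such P value, and these would then be passed to a multiple-testing procedure.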
Enrichment of certain mutations differed between pediatric and adult BCP ALL patients. Transcription factor mutations, such as in RUNX1, were more frequent in adult ALL, while signaling molecule and epigenetic factor WHSC1 mutations were more prevalent in pediatric BCP ALL (Datasets S5 and S6).
ZNF362 Fusions Cluster with ZNF384 Rearrangements (G5) and Display Activation of the JAK-STAT Pathway.
Four cases harbored previously undescribed ZNF362 rearrangements, including SMARCA2–ZNF362 (n = 3) and TAF15–ZNF362 (n = 1). These cases clustered within the G5 subgroup, otherwise associated with ZNF384 fusions (Fig. 2 and SI Appendix, Figs. S8A and S9). ZNF384 and ZNF362 are homologous C2H2-type zinc-finger transcription factors containing six zinc fingers that belong to the zinc-finger protein 384/nuclear matrix transcription factor 4 (ZFAM4) gene family (14). Of note, the zinc-finger domains were retained in both fusion proteins (SI Appendix, Fig. S8B), and both clusters showed similar gene expression profiles with an activated JAK-STAT signaling pathway (SI Appendix, Fig. S8C). Moreover, the fusion partners of ZNF362, namely TAF15 and SMARCA2, were also found to fuse to ZNF384, with similar breakpoints.
Previously Undescribed Subgroups Associated with Different Gene Fusions/Sequence Mutations.
G9: PAX5 and CRLF2 fusions are representative of this subgroup.
According to the gene expression profiles, 46 cases with PAX5 fusions and 13 cases with CRLF2 fusions (accounting for 41 and 12%, respectively) clustered together in G9 (n = 111). Previous work identified CRLF2 fusions in Down syndrome ALL and Ph-like BCP ALL, each of them accounting for approximately half of the cases (12, 15). In our study, 30% of the cases with CRLF2 fusions (13/44) were found in G9 and 57% (25/44) in the BCR–ABL1/Ph-like subgroup (G6), with the remaining cases present in G7 and G10 (Fig. 1). Notably, all 13 CRLF2 fusions in G9 were P2RY8–CRLF2 fusions, in contrast to those in G6 in which the fusion partners of CRLF2 were either P2RY8 or IGH. In the 13 CRLF2 fusion cases (G9), seven coexisted with PAX5 fusions. Signaling molecule mutations were also significantly enriched in G9 (P < 0.001; SI Appendix, Fig. S5 and Dataset S5), a feature reminiscent of Down syndrome ALL (12). Compared with the CRLF2 fusion clusters in G6, the PI3K-Akt signaling (e.g., FLT4 and EGF), cytokine–cytokine receptor interaction (e.g., CCL17 and IL2RA), and hematopoietic cell lineage (e.g., CD33 and CD34) pathways were significantly down-regulated in the CRLF2 fusion-positive cases in G9 (SI Appendix, Fig. S10), whereas a B cell-specific member of the tumor necrosis factor (TNF) receptor superfamily, TNFRSF13B, was up-regulated among those cases with CRLF2 fusions in G9 (SI Appendix, Fig. S10C) (16). However, the expression patterns of cytokine receptor and tyrosine kinase signaling genes (CRLF2, PDGFRB, JAK1, JAK2, JAK3, and RAS) were similar in the CRLF2 fusion-positive cases in G9 and G6.
G10: PAX5 (p.P80R) point mutation is strongly associated with a distinct gene expression profile.
PAX5 encodes the B cell lineage-specific activator protein that is normally expressed at the early stage of B cell differentiation (17). It has previously been reported that PAX5 haploinsufficiency is central to ALL pathogenesis (17). In the present study, 64 cases harbored PAX5 sequence mutations, including p.P80R (n = 22), p.V26G (n = 10), p.L58F/L58P (n = 4), and others. PAX5 (p.P80R), located at the DNA-binding domain, was correlated with increased expression of PAX5 (P < 0.001) compared with other BCP ALLs without PAX5 mutations (Fig. 2 A–C and SI Appendix, Fig. S11A). Previous studies have described heterozygous deletions of CDKN2A/B, IKZF1, and PAX5 in PAX5 (p.P80R)-positive BCP ALL patients (18). Notably, 21 of the 22 PAX5 (p.P80R) cases clustered in subgroup G10, with no other known driver gene abnormalities detected, except for one case with a P2RY8–CRLF2 fusion (C184) (Fig. 1). PAX5 (p.P80R)-positive cases showed up-regulation of PI3K/Akt/mTOR signaling and down-regulation of cell-adhesion molecules (Fig. 2C). As in G9, TNFRSF13B gene up-regulation was seen in this subgroup (SI Appendix, Figs. S10C and S12A).
G11: IKZF1 (p.N159Y) point mutation associated with a distinct gene expression profile and increased SALL1 expression.
Inherited or somatic sequence mutations of IKZF1 have previously been described in BCP ALL (19–21). In the present series, 26 cases with IKZF1 sequence abnormalities were found, with mutations commonly located in its DNA-binding domain (Fig. 2D and SI Appendix, Fig. S11B). Notably, IKZF1 (p.N159Y) cases (n = 6) formed a gene expression subgroup (G11) without other detectable genomic rearrangements (Fig. 1). Pathway analysis showed down-regulation of B cell receptor signaling and JAK-STAT signaling such as FLT3 (P < 0.001) and STAT5A (P < 0.001) (Fig. 2E). We also found that spalt-like transcription factor 1 (SALL1) was overexpressed (P < 0.001) in G11 (SI Appendix, Fig. S12B). Previous studies have reported that SALL1 can recruit histone deacetylase (HDAC) to mediate transcriptional repression and that its promoter is often methylated in BCP ALL (22, 23).
G12: hotspot point mutations in ZEB2 (p.H1038R) and IGH–CEBPE fusion.
ZEB2 is a member of the Zfh1 family of two-handed zinc-finger/homeodomain proteins. We and others have previously reported mutations of ZEB2 in BCP ALL (9, 10, 24). Here, we showed that ZEB2 was recurrently mutated (n = 25), with the p.H1038R hotspot mutation (n = 15) being located within the DNA-binding domain (Fig. 2 F and G and SI Appendix, Fig. S11C). Based on unsupervised clustering of gene expression, cases with ZEB2 (p.H1038R) (n = 5) clustered closely with cases with IGH–CEBPE fusions (n = 3). The remaining 10 cases with ZEB2 (p.H1038R) mutations mostly coexisted with other known gene fusions, such as TCF3–PBX1 (n = 1), DUX4 fusions (n = 1), ZNF384 fusions (n = 5), and ZNF362 fusions (n = 1). A significant enrichment of NRAS mutations (5/8) was also found in the G12 cases. All three cases with IGH–CEBPE fusion exhibited a truncation of the 3′ UTR region of CEBPE (Fig. 2H). The known ALL driver gene LMO1 was up-regulated in G12 (Fig. 2I and SI Appendix, Fig. S12C).
G13: TCF3/4–HLF fusion.
TCF3–HLF is a rare (<1%) fusion associated with high-risk BCP ALL and PAX5 haploinsufficiency from allelic deletion. It has been shown that TCF3–HLF–positive cases may respond to the BCL2 inhibitor venetoclax (25). It has also been shown that the homologous TCF4 may compensate for TCF3 in a conditional knockout mouse model (26). Herein, we identified one case with a TCF4–HLF fusion, which clustered with six cases of TCF3–HLF in G13 (Fig. 1 and SI Appendix, Fig. S13). Both TCF3–HLF and TCF4–HLF retained part of the HLF bZIP_2 domain (Fig. 3A) and displayed significantly up-regulated expression of HLF (Fig. 3 A–C). Down-regulation of the JAK-STAT and up-regulation of the NOTCH signaling pathways were also noted (Fig. 3D). Four cases with low expression of HLF, but lacking TCF3/4–HLF fusions, were assigned to this cluster (Fig. 1), based on evidence of expression signatures similar to TCF3/4–HLF fusion (e.g., BCL2, PAX5, JAK2, and STAT5), suggesting that alternative genetic alterations may elicit the same transcriptional program.
G14: NUTM1 fusions with aberrantly high expression of NUTM1.
NUTM1 is a chromatin regulator that functions to recruit p300, leading to increased local histone acetylation (27). NUTM1 is normally only expressed in testis, but is frequently involved in NUT midline carcinoma (27). We found nine cases with distinct NUTM1 fusions (SI Appendix, Fig. S13), six of them clustering into the G14 subgroup (Fig. 1). The predicted protein structure showed that all NUTM1 fusions retained part of the NUT domain (Fig. 3 E and F). Furthermore, increased expression of NUTM1 resulting from the fusion was found (Fig. 3G), possibly leading to a global change in chromatin acetylation. We also noted up-regulation of ZYG11A, a cell-cycle regulator, and HOXA family genes (Fig. 3H), which were slightly down-regulated in the three NUTM1 fusion-positive cases which did not cluster in G14, especially ZYG11A and HOXA9 (Fig. 3I). In addition, gene set enrichment analysis showed a higher expression level of the NOTCH pathway and a down-regulation of genes in the Hedgehog pathway among the G14 subgroup (Fig. 3J).
Prognostic Impact of Gene Expression Subgroups in BCP ALL.
We were able to retrieve clinical follow-up data on 380 BCP ALL cases (31%), allowing us to investigate the prognostic impact of the different gene expression subgroups. As these patients were treated on different protocols, we used BCR–ABL1–positive cases (n = 35) as a reference group for “high-risk” and ETV6–RUNX1–positive cases (n = 96) as a reference group for “low-risk” BCP ALL. We then compared the outcomes in terms of 5-y overall survival and relapse-free survival rates of the other subtypes against these two reference groups and classified them into low-, intermediate-, or high-risk groups. Due to the small sample sizes with available clinical data in subgroups G10 to G14, only cases in G1 to G9 were analyzed for treatment outcome. In pediatric BCP ALL, no deaths occurred in G2 (TCF3–PBX1), ETV6–RUNX1–like (a part of G3), G5 (ZNF384 fusions), and high hyperdiploidy (G7; 51 to 65/67 chromosomes) (SI Appendix, Fig. S14). In addition to these subtypes, G4 (DUX4 fusions) was also considered low-risk, as no significant difference in overall survival was found in comparison with G3 (n = 46, P = 0.476; SI Appendix, Fig. S14). PAX5 and CRLF2 fusions (n = 33) and other cases in G7 (≤50 chromosomes, n = 14), however, were classified into the intermediate-risk group due to an inferior 5-y overall survival compared with that of G3 (ETV6–RUNX1–positive/ETV6–RUNX1–like) (P < 0.05; SI Appendix, Fig. S14). In contrast, G1 (MEF2D fusions) and G8 (KMT2A fusions) were associated with high risk. Taken as a whole, among 295 pediatric patients, the RNA-seq–based subgroups stratified 193 (65%) as low-risk, 47 (16%) as intermediate-risk, and 55 (19%) as high-risk. Based on the Cox proportional-hazards model, the hazard ratio for 5-y overall survival between low and intermediate risk was 10.7 [95% confidence interval (CI) 3.3 to 34.1, P < 0.001] and that between low and high risk was 14.52 (95% CI 4.8 to 44.1, P < 0.001) (Fig. 4).
For 5-y relapse-free survival, the hazard ratio between low and intermediate risk was 2.1 (95% CI 1.0 to 4.5, P = 0.04) and that between low and high risk was 3.6 (95% CI 1.9 to 6.8, P < 0.001). In adult BCP ALL, in the absence of G3 cases, the BCR–ABL1–positive subgroup (G6) was used as the only reference, denoting high-risk BCP ALL. In this regard, G1 (MEF2D fusions), G2 (TCF3–PBX1), and G8 (KMT2A fusions) were associated with poor prognosis, while G4 (DUX4 fusions), G5 (ZNF384 fusions), and G7 (high hyperdiploidy) were associated with an intermediate prognosis (SI Appendix, Fig. S15). Overall, in adult BCP ALL, 47 (55%) of the 85 patients were classified as intermediate-risk and 38 (45%) as high-risk.
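
The reported hazard ratios and confidence intervals can be sanity-checked against each other. The sketch below assumes Wald-type intervals that are symmetric on the log scale, which is common for Cox models but is not stated in the text, so small discrepancies from rounding or a different interval method are expected. Under that assumption, the point estimate should lie close to the geometric mean of the interval bounds, and the standard error of log(HR) follows from the interval width. The function names are hypothetical.

```python
from math import log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def implied_hazard_ratio(ci_lower, ci_upper):
    """Point estimate implied by a CI symmetric on the log scale (geometric mean)."""
    return sqrt(ci_lower * ci_upper)


def log_hr_standard_error(ci_lower, ci_upper, z=Z_95):
    """Standard error of log(HR) implied by a Wald-type confidence interval."""
    return (log(ci_upper) - log(ci_lower)) / (2 * z)


if __name__ == "__main__":
    # (reported HR, CI lower, CI upper) as given in the text above.
    reported = [
        (10.7, 3.3, 34.1),   # overall survival, low vs. intermediate risk
        (14.52, 4.8, 44.1),  # overall survival, low vs. high risk
        (2.1, 1.0, 4.5),     # relapse-free survival, low vs. intermediate risk
        (3.6, 1.9, 6.8),     # relapse-free survival, low vs. high risk
    ]
    for hr, lower, upper in reported:
        print(hr, round(implied_hazard_ratio(lower, upper), 2),
              round(log_hr_standard_error(lower, upper), 3))
```

For all four comparisons the implied estimate falls within about 0.1 of the reported hazard ratio, so the published values are mutually consistent under this assumption. The wide intervals for overall survival reflect the small number of deaths in the low-risk group.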
Discussion
In this comprehensive analysis of the transcriptomic landscape of 1,223 BCP ALL cases, we identified 14 subgroups of BCP ALL based on their gene expression profiles. Of these, eight were previously well-known subgroups, harboring specific genetic abnormalities (MEF2D fusions, TCF3–PBX1, ETV6–RUNX1–positive/ETV6–RUNX1–like, DUX4 fusions, ZNF384 fusions, BCR–ABL1 and Ph-like, high hyperdiploidy, and KMT2A fusions). Notably, the large sample size allowed us to identify six additional subgroups (G9 to G14), harboring distinct genetic alterations including gene fusions and/or sequence mutations. The number of cases for some of the candidate leukemogenic abnormalities identified, such as ZNF362 fusions, NUTM1 fusions, and PAX5/CRLF2 fusions, and hotspot mutations of PAX5 (p.P80R), IKZF1 (p.N159Y), and ZEB2 (p.H1038R), was relatively small, which may explain the lack of detection of such cases in previous studies.
We addressed survival among the various BCP ALL subtypes in relation to pediatric and adult patients. Although the outcome data originated from different study groups, we validated the prognostic impact of all previously known major subgroups of BCP ALL and were able to ascertain the prognostic impact of some of the newly defined subgroups. Among the pediatric cohort in this study, TCF3–PBX1 (G2), ETV6–RUNX1–positive/ETV6–RUNX1–like (G3), DUX4 fusions (G4), ZNF384 fusions (G5), and high hyperdiploidy (G7; 51 to 65/67 chromosomes) were defined as low-risk, PAX5 and CRLF2 fusions (G9) and other cases in G7 (≤50 chromosomes) were intermediate-risk, while MEF2D fusions (G1), BCR–ABL1/Ph-like (G6), and KMT2A fusions (G8) were defined as high-risk groups. In adults, MEF2D fusions (G1), TCF3–PBX1 (G2), BCR–ABL1/Ph-like (G6), and KMT2A fusions (G8) were high-risk, as previously described, while DUX4 fusions (G4), ZNF384 fusions (G5), and high hyperdiploidy (G7) showed relatively favorable outcomes, albeit inferior to those of their pediatric counterparts. Even though this is a large study, the numbers of patients in the novel subgroups (G10 to G14) are small and the treatments are heterogeneous; thus, more cases need to be analyzed in future independent studies to validate their prognostic impact.
Notably, fusion genes were generally mutually exclusive, suggesting their role as drivers in the leukemogenic process. In contrast, while some hotspot gene mutations, such as PAX5 (p.P80R) and IKZF1 (p.N159Y), were independent abnormalities suggestive of their function as leukemia drivers, co-occurrence of many of the other point mutations indicated their potential cooperative role in leukemogenesis. A schematic summary of the major gene expressional/structural aberrations identified in our analysis is provided in Fig. 5. These alterations are functionally located within distinct and wide-ranging cellular compartments, from cell-surface receptors to cytosolic signaling pathways, to transcription factors/cofactors for transcriptional regulation essential for B-precursor development, and molecules involved in epigenetic regulation.
A large body of evidence suggests that many BCP ALL subgroups have a unifying gene expression signature driven by similar but not identical gene fusion/mutation events. Hence these genetic abnormalities molecularly “phenocopy” each other and point to a convergent signaling pathway related to pathogenesis within the same transcriptomic subgroup. For example, the similarities in gene expression profiles and genetic aberrations between the Ph-like and the BCR–ABL1 subtypes indicate that these are phenocopies of each other; a similar relationship appears to exist between the ETV6–RUNX1 and ETV6–RUNX1–like subtypes. This large dataset has allowed us to systematically identify such previously undescribed molecular phenocopies. ZNF384 fusions were the predominant fusions in the G5 subgroup, while ZNF362 fusions displayed the same expression signature and thus plausibly the same pathogenetic process. Also, the rare TCF4–HLF fusion evidently phenocopies TCF3–HLF rearrangements. Intriguingly, the hotspot point mutation ZEB2 (p.H1038R) appeared to phenocopy the IGH–CEBPE fusion, although the molecular relationship between these genetic aberrations is less obvious than in the previous examples. All of these observations point to a common theme in BCP ALL: There are likely a limited number of pathways leading to leukemogenesis in BCP ALL, and each is identified by a distinct gene expression pattern. However, there are presumably several factors, such as complex genetic backgrounds, coexisting genetic abnormalities, alternative partner genes of fusions, and different cells of origin, that all contribute to determining the dominating pathway in a single case, which can partially explain why cases sometimes present outside of the expected cluster.
In conclusion, we additionally defined six gene expression subgroups. These six subgroups included cases characterized by PAX5 and CRLF2 fusions; point mutations in PAX5 (p.P80R); point mutations in IKZF1 (p.N159Y); IGH–CEBPE fusion or mutations in ZEB2 (p.H1038R); TCF3/4–HLF fusion; and NUTM1 fusions. We have also demonstrated that transcriptome profiling by RNA sequencing allows the identification of distinct gene expression subgroups in BCP ALL with characteristic gene fusions and/or sequence mutations that can be readily called using the integrative analysis described in this study. Apart from providing information on perturbed transcriptional programs/signaling pathways that may be amenable to therapeutic targeting, the identified gene expression subgroups are likely important for improved disease stratification and prognostication of BCP ALL. Hence, our combined results of this collaborative study strongly advocate for RNA-seq being applied in the clinical diagnostic workup of BCP ALL.
Materials and Methods
Patients.
Transcriptome (RNA-seq) and other genomic data of all patients analyzed in this study are listed in Dataset S1. All of the included datasets have been analyzed as part of previous publications (3, 6–8, 10, 11, 18, 19). Basic clinical characteristics and genetic types of collected BCP ALL cohorts are shown in Table 1 and Dataset S2. The Lund University Hospital (LUH) cohort (cohort 2) (6) and the Singapore and Malaysia MaSpore cohort (MaSpore; cohort 4) (7) included only childhood BCP ALL cases. The vast majority of cohort-2 patients were treated according to the Nordic Society of Paediatric Haematology and Oncology (NOPHO) ALL 1992, 2000, or 2008 protocols (6), and cohort-4 patients were enrolled on the MaSpore frontline ALL protocols (7). The Japan Adult Leukemia Study Group (JALSG) cohort (cohort 3) (8) comprised adolescents and young adults with Philadelphia chromosome-negative ALL who were treated with the JALSG ALL202-U (adults) and TCCSG L04-16 (pediatric) protocols (8, 28). BCP ALL patients in the Chinese cohort (cohort 1) enrolled in this study were diagnosed and/or treated in the Multicenter Hematology-Oncology Protocols Evaluation System (M-HOPES) by the Shanghai Institute of Hematology (SIH)-based hospital network. Adult patients were enrolled in an SIH trial (Chinese Clinical Trial Registry; no. ChiCTR-ONRC-14004968), which was basically a modification of the vincristine, daunorubicin, l-asparaginase, cyclophosphamide, and prednisone regimen. Pediatric patients in the Chinese cohort were enrolled in the Shanghai Children’s Medical Center ALL-2005 protocol (Chinese Clinical Trial Registry; no. ONC-14005003) (10). There were two TARGET/COG (Therapeutically Applicable Research to Generate Effective Treatments/Children’s Oncology Group) cohorts (cohort 5 and cohort 6), with the data accession nos. EGAS00001001952 and phs000463/phs000464, respectively (3, 11, 18, 29).
Informed consent was obtained from all patients, and the study was approved by the ethics committee of Rui-jin Hospital. The clinical outcome data of the TARGET/COG cohorts were not available. The comparability of the clinical data from the different cohorts was supported by very similar survival curves for the favorable genetic subtype (ETV6–RUNX1) and the unfavorable genetic subtype (BCR–ABL1/Ph-like cases) of ALL among these cohorts (SI Appendix, Fig. S1).
RNA-Seq Data Analyses.
Read pairs were aligned to human reference genomes hg38 (fusion gene analysis) and hg19 (gene expression and gene mutation calling). Principal component analysis was applied to the RNA-seq data of the 1,223 BCP ALL cases, and batch effects were adjusted using the SVA package (30) (SI Appendix, Fig. S3). To investigate potential bias from cohort, age, gender, and race on gene expression, we checked the distribution of well-known biomarkers in the gene expression clusters. No obvious bias based on cohort, age, gender, or race was found. The patients mainly clustered based on the different gene expression profiles related to underlying genetic abnormalities. Procedures for read pair alignment, mutation calling from RNA-seq data, and gene expression/pathway analysis are listed in SI Appendix, Materials and Methods.
Statistical Analyses.
We tested mutual exclusivity and co-occurrence of mutations for the 44 most frequently mutated genes (mutation frequency >1%). For each gene pair, we performed a two-sided Fisher’s exact test on mutation status (positive or negative). The R package QVALUE (v2.10.1) (31) was used to control for multiple testing. Categorical variables were compared using Pearson’s χ2 test or Fisher’s exact test. Overall survival was calculated from the time of diagnosis to death, and relapse-free survival was calculated from the time of complete remission to relapse. The Kaplan–Meier method, log-rank test, and Cox proportional-hazards model were used to estimate survival probabilities and hazard ratios. Two-sided P values are reported, and P < 0.05 was considered significant. Analyses were performed with R (v3.4.4).
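Two of the procedures named above, the pairwise two-sided Fisher’s exact test and the Kaplan–Meier estimator, can be sketched in a few lines. The analyses in this study were run in R; the Python below is only a minimal illustration, and all function names, gene labels, and toy inputs are hypothetical. The q-value correction, log-rank test, and Cox model are not shown.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def pairwise_tests(status):
    """status: {gene: [0/1 mutation call per case]}, cases in the same order."""
    results = []
    for g1, g2 in combinations(sorted(status), 2):
        x, y = status[g1], status[g2]
        a = sum(1 for i, j in zip(x, y) if i and j)       # both mutated
        b = sum(1 for i, j in zip(x, y) if i and not j)   # only g1 mutated
        c = sum(1 for i, j in zip(x, y) if not i and j)   # only g2 mutated
        d = len(x) - a - b - c                            # neither mutated
        if a * d > b * c:
            direction = "co-occurrence"
        elif a * d < b * c:
            direction = "mutual exclusivity"
        else:
            direction = "none"
        results.append((g1, g2, direction, fisher_exact_two_sided(a, b, c, d)))
    return results


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for an event (death or relapse) and 0 for censoring.
    """
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        n_events = sum(1 for tt, e in zip(times, events) if tt == t and e)
        n_leaving = sum(1 for tt in times if tt == t)
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


# Toy inputs (hypothetical gene labels and follow-up times)
pairs = pairwise_tests({"GENE_A": [1, 1, 1, 0, 0, 0], "GENE_B": [1, 1, 1, 0, 0, 0]})
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In a full analysis, the P values from `pairwise_tests` would be passed to a multiple-testing procedure before interpretation, and the survival curves would be compared between subgroups with a log-rank test.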
Acknowledgments
We thank TARGET/COG and St. Jude Children’s Research Hospital for providing the RNA-seq data in this analysis. The RNA-seq dataset and clinical information for the TARGET/COG ALL project used in this study are available in the database of Genotypes and Phenotypes (dbGaP) under accession phs000218.v20.p7 and in the European Genome-phenome Archive under accessions EGAS00001000654 and EGAS00001001952. This work was supported by Mega-Projects of Scientific Research for the 12th Five-Year Plan (2013ZX09303302); National Natural Science Foundation of China (Grants 81570122 and 81570122); Shanghai Municipal Education Commission-Gaofeng Clinical Medicine Grant Support (Grant 20161303); Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (Grant QD2015005); Fine Classification and Standardized Treatment of Children with Acute Leukemia of Multi Center Clinical Research (Grant 14411950600), Shanghai Municipal Science and Technology Commission; National Key Research and Development Program (Grant 2016YFC0902800); Practical Research for Innovative Cancer Control from the Japan Agency for Medical Research and Development; US National Institutes of Health Grants CA21765, CA36401, and U01 GM115279; American Lebanese Syrian Associated Charities (ALSAC); Swedish Cancer Society, Swedish Childhood Cancer Foundation, Swedish Research Council, Knut and Alice Wallenberg Foundation, and Governmental (ALF) Funding of Clinical Research within the Swedish National Health Service; NMRC/CSA/0053/2013; VIVA Foundation for Children with Cancer; Goh Foundation; Children’s Cancer Foundation, Singapore Totalisator Board; Samuel Waxman Cancer Research Foundation; and Center for HPC at Shanghai Jiao Tong University.
Footnotes
The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1814397115/-/DCSupplemental.
References
- 1. Pui C-H, Yang JJ, Bhakta N, Rodriguez-Galindo C. Global efforts toward the cure of childhood acute lymphoblastic leukaemia. Lancet Child Adolesc Health. 2018;2:440–454. doi: 10.1016/S2352-4642(18)30066-X.
- 2. Holmfeldt L, et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet. 2013;45:242–252. doi: 10.1038/ng.2532.
- 3. Roberts KG, et al. Genetic alterations activating kinase and cytokine receptor signaling in high-risk acute lymphoblastic leukemia. Cancer Cell. 2012;22:153–166. doi: 10.1016/j.ccr.2012.06.005.
- 4. Den Boer ML, et al. A subtype of childhood acute lymphoblastic leukaemia with poor treatment outcome: A genome-wide classification study. Lancet Oncol. 2009;10:125–134. doi: 10.1016/S1470-2045(08)70339-5.
- 5. Andersson A, et al. Microarray-based classification of a consecutive series of 121 childhood acute leukemias: Prediction of leukemic and genetic subtype as well as of minimal residual disease status. Leukemia. 2007;21:1198–1203. doi: 10.1038/sj.leu.2404688.
- 6. Lilljebjörn H, et al. Identification of ETV6-RUNX1-like and DUX4-rearranged subtypes in paediatric B-cell precursor acute lymphoblastic leukaemia. Nat Commun. 2016;7:11790. doi: 10.1038/ncomms11790.
- 7. Qian M, et al. Whole-transcriptome sequencing identifies a distinct subtype of acute lymphoblastic leukemia with predominant genomic abnormalities of EP300 and CREBBP. Genome Res. 2017;27:185–195. doi: 10.1101/gr.209163.116.
- 8. Yasuda T, et al. Recurrent DUX4 fusions in B cell acute lymphoblastic leukemia of adolescents and young adults. Nat Genet. 2016;48:569–574. doi: 10.1038/ng.3535.
- 9. Zhang J, et al.; St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat Genet. 2016;48:1481–1489. doi: 10.1038/ng.3691.
- 10. Liu YF, et al. Genomic profiling of adult and pediatric B-cell acute lymphoblastic leukemia. EBioMedicine. 2016;8:173–183. doi: 10.1016/j.ebiom.2016.04.038.
- 11. Gu Z, et al. Genomic analyses identify recurrent MEF2D fusions in acute lymphoblastic leukaemia. Nat Commun. 2016;7:13331. doi: 10.1038/ncomms13331.
- 12. Schwartzman O, et al. Suppressors and activators of JAK-STAT signaling at diagnosis and relapse of acute lymphoblastic leukemia in Down syndrome. Proc Natl Acad Sci USA. 2017;114:E4030–E4039. doi: 10.1073/pnas.1702489114.
- 13. Chen B, et al. Identification of fusion genes and characterization of transcriptome features in T-cell acute lymphoblastic leukemia. Proc Natl Acad Sci USA. 2018;115:373–378. doi: 10.1073/pnas.1717125115.
- 14. Seetharam A, Bai Y, Stuart GW. A survey of well conserved families of C2H2 zinc-finger genes in Daphnia. BMC Genomics. 2010;11:276. doi: 10.1186/1471-2164-11-276.
- 15. Mullighan CG, et al. Rearrangement of CRLF2 in B-progenitor- and Down syndrome-associated acute lymphoblastic leukemia. Nat Genet. 2009;41:1243–1246. doi: 10.1038/ng.469.
- 16. Salzer U, et al. Relevance of biallelic versus monoallelic TNFRSF13B mutations in distinguishing disease-causing from risk-increasing TNFRSF13B variants in antibody deficiency syndromes. Blood. 2009;113:1967–1976. doi: 10.1182/blood-2008-02-141937.
- 17. Dang J, et al. PAX5 is a tumor suppressor in mouse mutagenesis models of acute lymphoblastic leukemia. Blood. 2015;125:3609–3617. doi: 10.1182/blood-2015-02-626127.
- 18. Roberts KG, et al. Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N Engl J Med. 2014;371:1005–1015. doi: 10.1056/NEJMoa1403088.
- 19. Churchman ML, et al. Efficacy of retinoids in IKZF1-mutated BCR-ABL1 acute lymphoblastic leukemia. Cancer Cell. 2015;28:343–356. doi: 10.1016/j.ccell.2015.07.016.
- 20. Churchman ML, et al. Germline genetic IKZF1 variation and predisposition to childhood acute lymphoblastic leukemia. Cancer Cell. 2018;33:937–948.e8. doi: 10.1016/j.ccell.2018.03.021.
- 21. Olsson L, et al. Cooperative genetic changes in pediatric B-cell precursor acute lymphoblastic leukemia with deletions or mutations of IKZF1. Genes Chromosomes Cancer. 2015;54:315–325. doi: 10.1002/gcc.22245.
- 22. Ma C, et al. SALL1 functions as a tumor suppressor in breast cancer by regulating cancer cell senescence and metastasis through the NuRD complex. Mol Cancer. 2018;17:78. doi: 10.1186/s12943-018-0824-y.
- 23. Kuang SQ, et al. Genome-wide identification of aberrantly methylated promoter associated CpG islands in acute lymphocytic leukemia. Leukemia. 2008;22:1529–1538. doi: 10.1038/leu.2008.130.
- 24. Ma X, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature. 2018;555:371–376. doi: 10.1038/nature25795.
- 25. Fischer U, et al. Genomics and drug profiling of fatal TCF3-HLF-positive acute lymphoblastic leukemia identifies recurrent mutation patterns and therapeutic options. Nat Genet. 2015;47:1020–1029. doi: 10.1038/ng.3362.
- 26. Nguyen H, et al. Tcf3 and Tcf4 are essential for long-term homeostasis of skin epithelia. Nat Genet. 2009;41:1068–1075. doi: 10.1038/ng.431.
- 27. Alekseyenko AA, et al. The oncogenic BRD4-NUT chromatin regulator drives aberrant transcription within large topological domains. Genes Dev. 2015;29:1507–1523. doi: 10.1101/gad.267583.115.
- 28. Takahashi H, et al. Treatment outcome of children with acute lymphoblastic leukemia: The Tokyo Children’s Cancer Study Group (TCCSG) study L04-16. Int J Hematol. 2018;108:98–108. doi: 10.1007/s12185-018-2440-4.
- 29. Pui CH, et al. Childhood acute lymphoblastic leukemia: Progress through collaboration. J Clin Oncol. 2015;33:2938–2948. doi: 10.1200/JCO.2014.59.1636.
- 30. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034.
- 31. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.