Publisher's Note: There is a Blood Commentary on this article in this issue.
Key Points
GWAS in Hispanics identified ERG as a novel ALL risk locus, with effect sizes correlated with Native American ancestry.
ERG risk genotype was underrepresented in ALL with the ETV6-RUNX1 fusion or somatic ERG deletion, but enriched in the TCF3-PBX1 subtype.
Abstract
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children. Characterized by high levels of Native American ancestry, Hispanics are disproportionally affected by this cancer with high incidence and inferior survival. However, the genetic basis for this disparity remains poorly understood because of a paucity of genome-wide investigation of ALL in Hispanics. Performing a genome-wide association study (GWAS) in 940 Hispanic children with ALL and 681 ancestry-matched non-ALL controls, we identified a novel susceptibility locus in the ERG gene (rs2836365; P = 3.76 × 10⁻⁸; odds ratio [OR] = 1.56), with independent validation (P = .01; OR = 1.43). Imputation analyses pointed to a single causal variant driving the association signal at this locus overlapping with putative regulatory DNA elements. The effect size of the ERG risk variant rose with increasing Native American genetic ancestry. The ERG risk genotype was underrepresented in ALL with the ETV6-RUNX1 fusion (P < .0005) but enriched in the TCF3-PBX1 subtype (P < .05). Interestingly, ALL cases with germline ERG risk alleles were significantly less likely to have somatic ERG deletion (P < .05). Our results provide novel insights into genetic predisposition to ALL and its contribution to racial disparity in this cancer.
Introduction
Acute lymphoblastic leukemia (ALL) is the most common cancer in children, with substantial racial disparities in both disease susceptibility and treatment outcomes.1,2 In particular, Hispanics have a disproportionally higher incidence of ALL with a significantly lower survival than other racial/ethnic groups in the United States (supplemental Figure 1, available on the Blood Web site),3,4 which may be partially attributed to Native American ancestry-related genomic variations.5-7
Through genome-wide association studies (GWASs), a number of risk loci have been identified for childhood ALL.8-10 The majority of these risk genes are transcription factors involved in hematopoietic development, with variable effects by race/ethnicity. For instance, the risk alleles of single-nucleotide polymorphisms (SNPs) in ARID5B, GATA3, and PIP4K2A are more frequent in Hispanics,5,11-13 whereas the CEBPE SNP does not contribute to ALL susceptibility in African Americans (AAs).11 However, because of limited sample sizes and complex admixture, there is a paucity of genome-wide investigation of ALL risk variants in Hispanics.
In this study, we performed a GWAS in genetically defined Hispanic children with ALL and ancestry-matched controls to systematically identify novel leukemia risk loci in this population and evaluate their associations with ALL clinical features.
Study design
In the discovery GWAS, Hispanic childhood B-cell ALL (B-ALL) cases were from the Children’s Oncology Group (COG) AALL0232 and P9904/P9905 clinical trials (supplemental Figure 2; supplemental Table 1).14,15 Non-ALL controls were unrelated subjects from the Multi-Ethnic Study of Atherosclerosis (MESA).12 The replication cohort included 144 Hispanic B-ALL cases from the COG P9906 and St. Jude Total Therapy XIIIB/XV cohorts,15-17 with 441 Hispanic controls from the Genetics of Asthma in Latino Americans (GALA) study.18 For rs2836365, we also examined its allele frequency across populations in Europe and Latino groups in the Americas in the 1000 Genomes Project (supplemental Figure 3) and compared these frequencies with the allele frequency observed in MESA (supplemental Figure 4) to rule out selection bias in our control subjects. This study was approved by the respective institutional review boards with proper informed consent. Detailed methods are described in supplemental Methods.
Results and discussion
The discovery GWAS was conducted by comparing genotype frequencies of 572 556 SNPs between 940 Hispanic B-ALL cases and 681 controls, with SNP genotype-based principal components representing genetic ancestry included as covariables to control for population structure. Four loci reached genome-wide significance (P < 5 × 10⁻⁸, Figure 1A; supplemental Table 2), of which ARID5B, IKZF1, and GATA3 have been reported previously.11,12,19,20 A novel locus was identified in the intronic region of the ERG gene at 21q22.2 (Figure 1A), with the strongest association signal at rs2836365 (P = 3.8 × 10⁻⁸; odds ratio [OR] = 1.56 [1.33-1.83]; supplemental Table 3). In the replication cohort of 144 Hispanic cases and 441 controls, the association signal was confirmed for rs2836365 (P = .01; OR = 1.43 [1.07-1.89]; supplemental Table 3). To further explore ALL risk variants in ERG, we imputed genotypes at additional SNPs within a 1-Mb region flanking rs2836365 and found 12 variants achieving genome-wide significance (supplemental Table 4). An imputed SNP, rs2836371, showed a more significant association than the original GWAS top hit (P = 1.42 × 10⁻⁹; OR = 1.64 [1.40-1.93]; supplemental Table 4), and it remained significant even after adjusting for rs2836365 (P = .006; OR = 2.03 [1.22-3.37]; supplemental Figure 5). However, no SNP in this region was significant after adjusting for rs2836371, pointing to a single plausible causal variant.
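As a rough consistency check on the statistics quoted above, the following sketch back-calculates the standard error of the log odds ratio from the discovery interval for rs2836365 and derives a Wald z statistic and two-sided P value. It assumes that the range given after the OR is a 95% Wald confidence interval on the log scale, which the text does not state explicitly. The function name `wald_from_ci` is hypothetical, and this is not the authors' analysis, which was a genotype-based test adjusted for ancestry principal components.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_from_ci(odds_ratio, ci_low, ci_high):
    """Recover SE(log OR), the Wald z statistic, and the two-sided P value.

    Assumes (ci_low, ci_high) is a 95% Wald interval built on the log scale.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Discovery estimate for rs2836365 as reported in the text
se, z, p = wald_from_ci(1.56, 1.33, 1.83)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

Under these assumptions the recovered P value is of the same order of magnitude as the reported one; small differences are expected because the published OR and interval bounds are rounded.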
To explore the potential functional effects of ALL risk alleles in the ERG locus, we examined lineage-specific chromatin accessibility data of the human hematopoietic cells,21 and found that rs2836371 resided in a region of open chromatin with a moderate ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) signal in both hematopoietic stem cells and megakaryocyte-erythroid progenitor cells (Figure 1B). More interestingly, the ALL association peak at this locus was located within a ∼150-kb region encompassing genome-wide significant loci for plateletcrit, mean corpuscular volume/hemoglobin, and white blood cell types.
The ERG risk allele at rs2836365 was only modestly associated with ALL susceptibility in European Americans (EAs) (P = .02; OR = 1.12 [1.02-1.22]; N = 2317 cases and 2050 controls) and was not significant in AAs (P > .05, OR = 0.96 [0.74-1.24], N = 227 cases and 1380 controls; supplemental Table 5). In both GWAS discovery and replication series, the ERG risk allele was significantly more common in Hispanics than EAs and AAs, and the allele frequency was positively related to the proportion of Native American ancestry (Figure 2A). The effect size of this variant also increased with Native American ancestry (OR = 1.13, 1.55, and 2.35, respectively; Figure 2B). These results pointed to ERG as a plausibly ancestry-related risk locus for childhood ALL.
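To illustrate how the population-specific effect sizes can be compared, the sketch below applies a simple z test to the difference between two log odds ratios, using only the ORs and intervals reported in the text for Hispanics (discovery GWAS), EAs, and AAs. It is an illustrative calculation rather than an analysis reported by the authors. It assumes that the intervals are 95% Wald intervals and that the cohorts are independent, and the helper names are hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Return log(OR) and its standard error, assuming a 95% Wald interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return log_or, se


def compare_effects(a, b):
    """z test for the difference between two independent log odds ratios."""
    log_a, se_a = log_or_and_se(*a)
    log_b, se_b = log_or_and_se(*b)
    z = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# (OR, lower bound, upper bound) for rs2836365 as reported in the text
hispanic = (1.56, 1.33, 1.83)  # discovery GWAS
european = (1.12, 1.02, 1.22)
african = (0.96, 0.74, 1.24)

z_ea, p_ea = compare_effects(hispanic, european)
z_aa, p_aa = compare_effects(hispanic, african)
print(f"Hispanic vs EA: z = {z_ea:.2f}, P = {p_ea:.1e}")
print(f"Hispanic vs AA: z = {z_aa:.2f}, P = {p_aa:.1e}")
```

Because the inputs are rounded summary statistics, the output should be read only as an indication of whether the effect in Hispanics is larger than in the other groups by more than sampling error would suggest.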
We next examined whether the ERG SNP genotype preferentially predisposes to any ALL subtype, focusing on the COG P9904/9905/9906 series because it represented a large national cohort of ALL patients consecutively enrolled with minimal selection bias, including major subtypes: ETV6-RUNX1, TCF3-PBX1, KMT2A rearrangement, hyperdiploidy, and B-other. Because the ERG risk allele was significant in both Hispanics and EAs, we performed our analyses combining patients from these 2 racial/ethnic groups and adjusted for genetic ancestry (N = 1391). The ERG risk genotype was significantly underrepresented in ETV6-RUNX1 ALL (P = .0003), but enriched in the TCF3-PBX1 subtype (P = .03; Figure 2C). ERG expression also varied significantly across ALL subtypes, with the highest level observed in ETV6-RUNX1 ALL (supplemental Figure 6). Because somatic alterations at the ERG locus have been recently described and define a novel ALL subtype (concomitant with IGH-DUX4 rearrangements),22 we also evaluated the association of somatic ERG deletion with ERG risk variants in a subset of 905 ALL cases with both somatic and germline genomic data available. The frequency of the ERG risk allele at rs2836365 was significantly lower in cases with somatic ERG deletion than in those without (supplemental Figure 7; P = .04 and .02, with and without adjustment for genetic ancestry, respectively).
The biological basis of racial disparities in cancer is poorly understood, in part because non-European populations are disproportionally underrepresented in cancer genomic studies. Taking a race/ethnicity-specific approach, we identified a novel ALL risk locus in Hispanics, in the ERG intronic region. The ERG risk variant is related to Native American ancestry in that its variant frequency and effect size both increase with the level of Native American ancestry, pointing to a likely ancestry-related effect on ALL susceptibility. The correlation of the ERG risk allele frequency with Native American ancestry was also true in a cohort of Guatemalan children with ALL (supplemental Figure 8). The underlying mechanism for such race/ethnicity-dependent effects of a genetic risk factor is unclear, although it has been reported for other cancers23 (eg, a stronger effect of the ESR1 locus for breast cancer susceptibility in Chinese women compared with Europeans and not significant in Africans24). It can be posited that the ERG variant interacts with another yet-to-be-discovered ALL risk allele that is exclusively present in Hispanics and the combination of both is important for ALL susceptibility. Alternatively, the ERG risk variant identified herein tags a causal allele that is absent in non-Hispanics, although this is less likely given the results from the imputation analyses. Future studies are thus warranted to unravel the mechanistic details linking ERG to ALL pathogenesis. We also examined all previously reported ALL susceptibility loci in our Hispanic GWAS (supplemental Table 2).
ERG encodes an ETS domain-containing transcription factor important for normal hematopoietic development.25 Recently, we and others identified a novel ALL subtype characterized by IGH-DUX4 rearrangement in which the overexpression of DUX4 leads to ERG deregulation (primarily the expression of an alternative ERG transcript [ERGalt] with secondary deletion of the wild-type ERG allele in some cases).22 Interestingly, our novel ALL risk variant resides in close proximity to the hotspot of leukemic ERG deletions (Figure 1B), and there was a significant negative correlation between germline and somatic variation at the ERG locus, arguing for similar effects of these variants on ERG function (supplemental Figure 7A).
Our results suggested that there could be a substantial number of genetic variants/loci contributing to racial/ethnic disparities in ALL, and collaborative efforts with larger sample sizes are needed to systematically uncover these molecular determinants in the future.
Supplementary Material
The online version of this article contains a data supplement.
Acknowledgments
The authors thank the patients and parents who participated in the clinical protocols included in this study, and the clinicians and research staff at participating institutions.
This work was partly supported by a St. Baldrick’s Foundation International Scholar award (H.Z.); a St. Baldrick’s Foundation Scholar award and a Robert J. Arceci award (C.G.M.); National Institutes of Health grants P50 GM115279 (National Institute of General Medical Sciences), CA156449, CA21765, CA36401, CA98543, CA114766, CA98413, CA140729, and CA176063 (all from the National Cancer Institute), GM92666 (National Institute of General Medical Sciences), and HHSN261200800001E (National Cancer Institute); the National Key Research and Development Program of China (2016YFC0905000 [2016YFC0905001, 2016YFC0905002]); the National Natural Science Foundation of China (81522028, 81728003, and 81673452); and the American Lebanese Syrian Associated Charities.
Footnotes
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Authorship
Contribution: J.J.Y. is the principal investigator of this study, has full access to all of the data in the study, and takes responsibility for the integrity of the data and the accuracy of the data analysis; M.Q., H.X., W.Y., and S.Z. performed data analysis; M.Q., H.X., and J.J.Y. wrote the manuscript; V.P.-A., K.G.R., X.Z., C.S., M.D., J.M.G.-F., E.R., E.L., N.W., F.A.-K., W.P.B., P.L.M., M.B., B.W., E.G.B., C.-H.P., C.G.M., W.E.E., S.P.H., M.V.R., and M.L.L. contributed reagents, materials, and/or data; M.Q., H.X., H.Z., W.Y., and J.J.Y. interpreted the data and the research findings; and all of the coauthors reviewed the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Jun J. Yang, Hematologic Malignancies Program, Comprehensive Cancer Center, Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, 262 Danny Thomas Pl, MS313, Memphis, TN 38105; e-mail: jun.yang@stjude.org.
REFERENCES
1. Hunger SP, Mullighan CG. Acute lymphoblastic leukemia in children. N Engl J Med. 2015;373(16):1541-1552.
2. Pui CH, Evans WE. Treatment of acute lymphoblastic leukemia. N Engl J Med. 2006;354(2):166-178.
3. Linabery AM, Ross JA. Trends in childhood cancer incidence in the U.S. (1992-2004). Cancer. 2008;112(2):416-432.
4. Chow EJ, Puumala SE, Mueller BA, et al. Childhood cancer in relation to parental race and ethnicity: a 5-state pooled analysis. Cancer. 2010;116(12):3045-3053.
5. Xu H, Cheng C, Devidas M, et al. ARID5B genetic polymorphisms contribute to racial disparities in the incidence and treatment outcome of childhood acute lymphoblastic leukemia. J Clin Oncol. 2012;30(7):751-757.
6. Walsh KM, Chokkalingam AP, Hsu LI, et al. Associations between genome-wide Native American ancestry, known risk alleles and B-cell ALL risk in Hispanic children. Leukemia. 2013;27(12):2416-2419.
7. Wiemels JL, Walsh KM, de Smith AJ, et al. GWAS in childhood acute lymphoblastic leukemia reveals novel genetic associations at chromosomes 17q12 and 8q24.21. Nat Commun. 2018;9(1):286.
8. Moriyama T, Relling MV, Yang JJ. Inherited genetic variation in childhood acute lymphoblastic leukemia. Blood. 2015;125(26):3988-3995.
9. Vijayakrishnan J, Studd J, Broderick P, et al.; PRACTICAL Consortium. Genome-wide association study identifies susceptibility loci for B-cell childhood acute lymphoblastic leukemia. Nat Commun. 2018;9(1):1340.
10. Xu H, Zhang H, Yang W, et al. Inherited coding variants at the CDKN2A locus influence susceptibility to acute lymphoblastic leukaemia in children. Nat Commun. 2015;6(1):7553.
11. Xu H, Yang W, Perez-Andreu V, et al. Novel susceptibility variants at 10p12.31-12.2 for childhood acute lymphoblastic leukemia in ethnically diverse populations. J Natl Cancer Inst. 2013;105(10):733-742.
12. Perez-Andreu V, Roberts KG, Harvey RC, et al. Inherited GATA3 variants are associated with Ph-like childhood acute lymphoblastic leukemia and risk of relapse. Nat Genet. 2013;45(12):1494-1498.
13. Perez-Andreu V, Roberts KG, Xu H, et al. A genome-wide association study of susceptibility to acute lymphoblastic leukemia in adolescents and young adults. Blood. 2015;125(4):680-686.
14. Larsen EC, Devidas M, Chen S, et al. Dexamethasone and high-dose methotrexate improve outcome for children and young adults with high-risk B-acute lymphoblastic leukemia: a report from Children’s Oncology Group Study AALL0232. J Clin Oncol. 2016;34(20):2380-2388.
15. Borowitz MJ, Devidas M, Hunger SP, et al.; Children’s Oncology Group. Clinical significance of minimal residual disease in childhood acute lymphoblastic leukemia and its relationship to other prognostic factors: a Children’s Oncology Group study. Blood. 2008;111(12):5477-5485.
16. Pui CH, Sandlund JT, Pei D, et al.; Total Therapy Study XIIIB at St Jude Children’s Research Hospital. Improved outcome for children with acute lymphoblastic leukemia: results of Total Therapy Study XIIIB at St Jude Children’s Research Hospital. Blood. 2004;104(9):2690-2696.
17. Pui CH, Campana D, Pei D, et al. Treating childhood acute lymphoblastic leukemia without cranial irradiation. N Engl J Med. 2009;360(26):2730-2741.
18. Burchard EG, Avila PC, Nazario S, et al.; Genetics of Asthma in Latino Americans (GALA) Study. Lower bronchodilator responsiveness in Puerto Rican than in Mexican subjects with asthma. Am J Respir Crit Care Med. 2004;169(3):386-392.
19. Treviño LR, Yang W, French D, et al. Germline genomic variants associated with childhood acute lymphoblastic leukemia. Nat Genet. 2009;41(9):1001-1005.
20. Papaemmanuil E, Hosking FJ, Vijayakrishnan J, et al. Loci on 7p12.2, 10q21.2 and 14q11.2 are associated with risk of childhood acute lymphoblastic leukemia. Nat Genet. 2009;41(9):1006-1010.
21. Corces MR, Buenrostro JD, Wu B, et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet. 2016;48(10):1193-1203.
22. Zhang J, McCastlain K, Yoshihara H, et al.; St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat Genet. 2016;48(12):1481-1489.
23. Henderson BE, Lee NH, Seewaldt V, Shen H. The influence of race and ethnicity on the biology of cancer. Nat Rev Cancer. 2012;12(9):648-653.
24. Cai Q, Wen W, Qu S, et al. Replication and functional genomic analyses of the breast cancer susceptibility locus at 6q25.1 generalize its importance in women of Chinese, Japanese, and European ancestry. Cancer Res. 2011;71(4):1344-1355.
25. Loughran SJ, Kruse EA, Hacking DF, et al. The transcription factor Erg is essential for definitive hematopoiesis and the function of adult hematopoietic stem cells. Nat Immunol. 2008;9(7):810-819.