Author manuscript; available in PMC: 2016 Dec 20.
Published in final edited form as: Nat Med. 2016 Jun 20;22(7):792–799. doi: 10.1038/nm.4125

Distinct evolution and dynamics of epigenetic and genetic heterogeneity in acute myeloid leukemia

Sheng Li 1,*,2, Francine E Garrett-Bakelman 3,*, Stephen S Chung 4, Mathijs A Sanders 5, Todd Hricik 4, Franck Rapaport 4, Jay Patel 4, Richard Dillon 6, Priyanka Vijay 7, Anna L Brown 8,9,10, Alexander E Perl 11, Joy Cannon 11, Lars Bullinger 12, Selina Luger 11, Michael Becker 13, Ian D Lewis 8,10,14, Luen Bik To 10,14, Ruud Delwel 5, Bob Löwenberg 5, Hartmut Döhner 12, Konstanze Döhner 12, Monica L Guzman 3, Duane C Hassane 3, Gail J Roboz 3, David Grimwade 6, Peter JM Valk 5, Richard J D’Andrea 8,9,10, Martin Carroll 11, Christopher Y Park 15,16, Donna Neuberg 17, Ross Levine 4, Ari M Melnick 3,ˆ, Christopher E Mason 18,2,ˆ
PMCID: PMC4938719  NIHMSID: NIHMS786454  PMID: 27322744

Abstract

Genetic heterogeneity contributes to clinical outcome and progression of most tumors. Yet, little is known regarding allelic diversity for epigenetic compartments and almost no data exists for acute myeloid leukemia (AML). Here we examined epigenetic heterogeneity as assessed by cytosine methylation within defined genomic loci with four CpGs (epigenetic alleles), somatic mutations and transcriptomes of AML patient samples at serial time points. We observe that epigenetic allele burden is linked to inferior outcome and varies considerably during disease progression. Epigenetic and genetic allelic burden and patterning follow different patterns and kinetics during disease progression. We observed a subset of AMLs with high epiallele and low somatic mutation burden at diagnosis, a subset with high somatic mutation and lower epiallele burdens at diagnosis, and a subset with a mixed profile, suggesting distinct modes of tumor heterogeneity. Genes linked to promoter-associated epiallele shifts during tumor progression display increased single-cell transcriptional variance and differential expression, suggesting functional impact on gene regulation. Thus, genetic and epigenetic heterogeneity can occur with distinct kinetics, each likely able to impact biological and clinical features of tumors.

INTRODUCTION

Acute Myeloid Leukemia (AML) is a predominantly fatal hematopoietic malignancy1–3. Even when leukemia cells appear to have been eradicated from the bone marrow after chemotherapy treatment, most patients eventually relapse. Therefore, understanding how subpopulations of AML cells are resilient to chemotherapy and give rise to progressively more refractory disease is of utmost importance.

Several mechanisms are hypothesized to endow sub-populations of cells within AML tumors with the capacity to survive exposure to therapeutic agents. These mechanisms include the presence of subsets of quiescent cancer stem cells with inherently greater self-renewal properties and reduced sensitivity to chemotherapy drugs4. Somatic mutations that can facilitate the expansion of putative leukemia stem cell populations can also emerge during disease establishment and progression5–8.

Genetic heterogeneity within tumors can also increase evolutionary fitness9,10. Genetic diversity among individual cells within a tumor presumably provides the greatest chance for a subset of cells with particular combinations of mutations (single nucleotide variants (SNVs), insertions and deletions (INDELs) as well as copy number aberrations (CNAs) and translocations) to survive when challenged by cytotoxic drugs. Genetic heterogeneity has been appreciated in AML since early karyotyping studies11,12, and tumor heterogeneity is also relevant in solid tumors and lymphoid malignancies13–19, where several studies have linked the degree of genetic heterogeneity to clinical outcome16,20. However, cells from patients with AML feature a paucity of genetic lesions compared to most solid tumors21,22, suggesting that other factors, such as epigenetic changes, could contribute to the aggressive behavior of AML.

Indeed, aberrant epigenetic patterning is a hallmark of AML23,24. Gain or loss of cytosine methylation at specific loci disrupts promoter activity and gene expression regulation23,25,26. Epigenetic marks have great plasticity and may vary over time or through exposure to environmental stimuli. Thus, there is potential for epigenetic diversification to emerge in tumor cell populations and to change during disease progression. Sequence-based profiling can be used to determine quantitative cytosine methylation at base-pair resolution. DNA methylation at sequential CpGs (cytosine-phosphate-guanine dinucleotides) can create a phased epigenetic pattern (epiallele) that represents a native “cellular barcode”27,28. Such data can be used to measure epigenetic heterogeneity, which has been associated with worse clinical outcomes in lymphoid malignancies28–31. Yet in contrast to AML, lymphoid malignancies display a high burden of somatic mutations and can arise from cell types such as B- and T-cells that are naturally prone to epigenetic heterogeneity32.

It has been suggested that genetic mutations and/or cytogenetic abnormalities and epigenetic modifications would diversify along similar lines during disease progression28,33. This may be a selection process of genetic and epigenetic changes, which enhance tumor fitness through expression of oncogenes and/or repression of tumor suppressors. Alternatively, this may be a process of genetic diversification and clonal selection34 in parallel with epigenetic modifications35, leading to the same outcome. However, the interplay and/or independence of genetic versus epigenetic heterogeneity in AML have not been thoroughly explored. Here we performed the first large-scale comparative genomics and epigenomics study in serial specimens obtained from patients with AML. We assembled a cohort of 138 clinically annotated, paired AML patient samples (diagnosis and relapse; Supplementary Tables 1 and 2). Patient samples were characterized with genome-wide methylome sequencing, using a modified version of reduced representation bisulfite sequencing (RRBS)36,37, Enhanced Reduced Representation of Bisulfite Sequencing (ERRBS)38,39. In a subset of this cohort, whole exome-sequencing (WES) was performed in paired patient samples with matched germline control DNA, RNA sequencing (RNA-seq) was performed in paired patient samples, and single-cell RNA-sequencing (scRNA-seq) was performed in the patient with the most timepoints. This allowed for epigenetic heterogeneity assessment and comparison to genetic and transcriptional variance during disease progression. The epigenetic features explored in this study are conceptually quite different from the epigenetic signatures that define uniform patterns of differentially methylated genes23,24. Notably, we find that genetic and epigenetic allelic variations follow distinct, and often independent, kinetics and patterns.
Moreover, shifts in epiallele composition are linked to transcriptional deregulation at the global and single-cell levels, and higher epiallele burden is linked to clinically more aggressive disease. Collectively, our data show that epigenetic allelic diversification occurs during AML establishment and disease progression, and may be independent of the genetic landscape.

RESULTS

Epiallele burden is linked to poor clinical outcome

We defined loci with epigenetic allelic variance in our patient cohort using the Methclone compositional entropy equation40. By definition, the epigenetic state of each locus comprises cytosine methylation at four consecutive CpG dinucleotides. Each of the possible 16 CpG methylation patterns at these loci is an “epiallele” (Supplementary Fig. 1). Epigenetically shifted loci (“eloci”) occur when the epiallele proportions at these sites undergo a statistically significant entropy shift (calculated by delta Boltzmann entropy ΔS < −90) in their composition when comparing two specimens40. The global metric EPM (eloci per million loci), which normalizes for the variable depth of coverage per specimen and the number of loci measured40, was used to determine the overall magnitude of epiallele shifting across the genome and “epiallele burden.” Shifting can include both gain and/or loss of epialleles between two specimens. The epiallele and eloci measurements were determined using methylome data from ERRBS (Supplementary Table 3) and validated in a subset of specimens using orthogonal methylome sequencing methods on two different platforms (Agilent and Roche; Supplementary Fig. 2).
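As a rough illustration of these definitions, the following sketch enumerates the 16 epialleles of a four-CpG locus, summarizes a locus's epiallele composition, and computes EPM as eloci per million assessed loci. Shannon entropy is used here only as a simple stand-in for compositional diversity; it is not the Methclone entropy equation, and the function names and toy read patterns are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

# All 2**4 = 16 methylation patterns over four consecutive CpGs
# ("1" = methylated CpG, "0" = unmethylated CpG).
EPIALLELES = ["".join(p) for p in product("01", repeat=4)]

def epiallele_fractions(read_patterns):
    """Fraction of reads supporting each of the 16 epialleles at one locus."""
    counts = Counter(read_patterns)
    total = sum(counts.values())
    return {e: counts.get(e, 0) / total for e in EPIALLELES}

def shannon_entropy(fractions):
    """Shannon entropy in bits (illustrative stand-in, not the Methclone metric)."""
    return -sum(f * log2(f) for f in fractions.values() if f > 0)

def epm(n_eloci, n_loci_assessed):
    """Eloci per million loci assessed."""
    return n_eloci / n_loci_assessed * 1_000_000

# Toy loci: one carried by a single epiallele, one split across four epialleles.
uniform = epiallele_fractions(["1111"] * 8)
mixed = epiallele_fractions(["1111", "0000", "1010", "0101"] * 2)
```

Normalizing by the number of loci assessed is what makes EPM comparable between specimens with different coverage.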

We evenly divided our patients into two cohorts based on their EPM values (highest and lowest 50%). There was no difference in the burden of somatic mutations between the two groups (Supplementary Fig. 3). We then plotted clinical outcome as “time to relapse,” and we observed that patients with high EPM at diagnosis compared to normal bone marrow (NBM) had a shorter time to relapse than the low EPM cohort (P = 0.0396, Mantel-Cox log rank test, Fig. 1a). This association was most significant for EPM assessed based on promoter-associated eloci (P = 0.0077, Mantel-Cox log rank test, Fig. 1b).
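The median split and two-group log rank comparison described above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, patients at exactly the median are assigned to the low group, and an event flag of 1 denotes an observed relapse while 0 denotes censoring.

```python
from math import erfc, sqrt
from statistics import median

def median_split(epm_by_patient):
    """Label each patient 'high' or 'low' relative to the cohort median EPM."""
    cut = median(epm_by_patient.values())
    return {pid: ("high" if v > cut else "low") for pid, v in epm_by_patient.items()}

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square statistic, P value) on 1 d.f."""
    rows = [(t, e, "a") for t, e in zip(times_a, events_a)]
    rows += [(t, e, "b") for t, e in zip(times_b, events_b)]
    obs_minus_exp, variance = 0.0, 0.0
    # Walk through each distinct time at which at least one relapse occurred.
    for t in sorted({t for t, e, _ in rows if e}):
        n = sum(1 for u, _, _ in rows if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in rows if u >= t and g == "a")   # at risk, group a
        d = sum(1 for u, e, _ in rows if u == t and e)            # events, both groups
        d_a = sum(1 for u, e, g in rows if u == t and e and g == "a")
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))
```

With an odd number of patients the two groups differ in size by one; a dedicated survival analysis package would normally be preferred for real data.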

Figure 1.

EPM levels at diagnosis compared to normal bone marrow segregate patients into two groups with distinct clinical outcomes. (a) Time to relapse analysis for patients (n = 137) with high (red) or low (black) EPM values at diagnosis compared to normal bone marrow. (b) Time to relapse analysis for patients (n = 137) with high (red) or low (black) EPM values assessed from promoter-annotated eloci (loci in promoters that were shared by at least 75% of patients were included). (c) Time to relapse analysis for patients with high (blue) or low (green) somatic mutation burden in diagnosis samples (n = 48). Mantel-Cox log rank test was used for the survival analysis (a–c).

We also performed a multivariate analysis using Cox proportional hazards regression to determine if EPM retained its association with clinical outcome independent of other clinical variables. We found that EPM maintained a significant association with clinical outcome (P = 0.024), taking into account age, white blood cell count at the time of diagnosis, and gender. In contrast, none of these other criteria were independently linked to time to relapse in this cohort (Table 1). In contrast to EPM, the burden of somatic mutations based on WES (n = 48; Supplementary Table 4) was not significantly associated with time to relapse (P = 0.272, Mantel-Cox log rank test, Fig. 1c).

Table 1.

Multivariate analysis of EPM association with time to relapse. Multivariate Cox proportional hazards regression model used relapse time as response variable, and tested EPM and clinical parameters as variables in the entire cohort (n = 127).

127 patients
Variable | P value | Hazard ratio
EPM | 0.024 | 1.559
Age | 0.930 | 0.994
Gender | 0.303 | 1.223
WBC count | 0.339 | 0.999

Relapsed AMLs display variable changes in epiallele burden

To further understand the nature of epiallele shifting in AML, we compared diagnostic and relapsed AML specimen methylomes to NBMs (Fig. 2a). We observed that all AML patients displayed substantial epiallele shifting both at diagnosis and at relapse compared to NBMs (Fig. 2b,c). Although the degree of EPM increase was variable, it was increased across all patients regardless of disease stage (Fig. 2d). We next compared EPM in individual patients at relapse versus diagnosis (Fig. 2a). In this case we observed significant intra-patient variation (Fig. 2d,e). Although most patients exhibited a considerable degree of epiallele shifting during disease progression (median log10(EPM) = 2.29), the magnitude of this difference was highly variable. Indeed, a few patients (8%) exhibited no significant epigenetic changes (EPM = 0; Fig. 2e) between diagnosis and relapse. Overall, these data suggest that global increases in epigenetic allele shifting are a universal feature of AML relative to NBM controls, but, in contrast, epiallele shifting is highly variable within individual patients during disease progression. Notably, the changes in epiallele burden were independent of patients’ age, percent blast purity of specimens, disease FAB subtype and abundance of somatic mutations (Supplementary Fig. 4a–h).

Figure 2.

AML is characterized by high epiallele shift and variance. (a) Schematic diagram representing the DNA methylation patterns compared between CD34+ normal bone marrow controls (NBM), diagnostic AML and relapsed AML patient samples. (b,c) log10(EPM) values of diagnostic (b) and relapsed (c) patient samples versus NBMs. (d) Violin plot of the EPM values between the AML patient samples and NBMs and intra-patient relapse versus diagnosis (Wilcoxon rank sum tests: ***P < 0.001). (e) log10(EPM) values between AML diagnosis and relapse samples.

We also noted that relative to NBM controls, eloci were significantly overrepresented at CpG islands and promoters (P = 2.9×10−3 and P = 0.027 respectively; Wilcoxon signed rank tests) in AML patients at diagnosis (Supplementary Fig. 5a–f). Eloci were also significantly enriched at enhancers at diagnosis (based on ChIP-seq data in human CD34+ cells; active enhancers: P = 1.4×10−4; poised enhancers: P = 7.5×10−3; Wilcoxon signed rank tests; Supplementary Fig. 5j,k). At relapse there was more significant enrichment for eloci at intronic and intergenic regions (P = 5.3×10−3 and P = 7.6×10−4 respectively; Wilcoxon signed rank tests; Supplementary Fig. 5h,i). These differences point towards a significant disruption of enhancer states during AML progression.

Independent patterns of epigenetic and genetic diversity

The EPM measurement provides assessment of epigenetic allele shifting at the global level, but does not provide information on how specific eloci might vary during leukemia progression. This latter information would be similar to how genetic mutations are evaluated individually in tumors. From the genetic perspective, AML progression has been examined in limited numbers of patients using next generation sequencing and other techniques5,41–44. These studies showed that while most genetic lesions did not change during disease progression, some were gained or lost at relapse. In a subset of patients there were no new somatic mutations at relapse.

Our current dataset enabled a simultaneous comparison of epigenetic and genetic allele burdens, discerning if and how their composition might vary during leukemia progression, and whether these two phenomena correlate with each other. To address each of these points, we compared eloci occurring at diagnosis and separately at relapse to NBMs. We then defined three categories of eloci: 1) eloci unique to diagnosis (diagnosis-specific), 2) eloci unique to relapse (relapse-specific), and 3) eloci that were present both at diagnosis and relapse (shared).
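The three categories amount to set operations on the eloci called at each disease stage. The sketch below is illustrative only: it assumes each elocus is identified by a hypothetical genomic coordinate string and that proportions are taken over the union of eloci detected at either stage.

```python
def categorize_eloci(diagnosis_eloci, relapse_eloci):
    """Split eloci into diagnosis-specific, relapse-specific and shared sets."""
    diagnosis, relapse = set(diagnosis_eloci), set(relapse_eloci)
    categories = {
        "diagnosis_specific": diagnosis - relapse,
        "relapse_specific": relapse - diagnosis,
        "shared": diagnosis & relapse,
    }
    total = len(diagnosis | relapse)
    # Proportion of each category among all eloci seen at either stage.
    proportions = {
        name: (len(loci) / total if total else 0.0)
        for name, loci in categories.items()
    }
    return categories, proportions

# Toy example with made-up locus identifiers.
categories, proportions = categorize_eloci(
    {"chr1:100", "chr1:200", "chr2:50"},
    {"chr1:200", "chr3:10"},
)
```

Each patient is then summarized by a three-element proportion vector, which is the kind of per-patient feature a clustering method can operate on.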

We examined the eloci in several ways. First, we performed K-means clustering based on the proportion of diagnosis-specific, relapse-specific or shared eloci present in each patient (Supplementary Fig. 6a and Fig. 3a). This analysis segregated AML patients into cohorts as follows: 1) a significant predominance of diagnosis-specific eloci (cluster 1, n = 57, P = 5.3 × 10−11, Wilcoxon signed rank test), 2) mostly shared eloci and no predominance of diagnosis- or relapse-specific eloci (cluster 2, n = 39), and 3) a significant predominance of relapse-specific eloci (cluster 3, n = 42, P = 4.5×10−13, Wilcoxon signed rank test; Fig. 3a). There was no significant link between patients belonging to any of these three clusters with their age, white blood cell count, or FAB classification (Supplementary Fig. 6b–d).

Figure 3.

Disease stage-specific epiallele patterns define unique subsets of AML patients. (a) Proportions of eloci that are diagnosis-specific (light green), shared (green), or relapse-specific (dark green) are shown for each cluster defined using K-means clustering. (b) Proportions of somatic mutations that were diagnosis-specific (light blue), shared (blue), or relapse-specific (dark blue) are shown for the subset of patients with exome-sequencing data within each cluster defined by the abundance of eloci in (a). (c,d) Number of somatic mutations (log10) for each eloci cluster at diagnosis (c; P = 0.048) or relapse (d; P = 0.008). (e,f) Proportion of somatic mutations whose variant allele frequency is increased (e; P = 0.367) or decreased (f; P = 0.0012) by 10% or more at relapse compared to diagnosis. (c–f) Wilcoxon rank sum tests: *P < 0.05; **P < 0.01; NS = not significant. ID = patient counts.

We next determined the abundance of somatic genetic mutated alleles (SNVs and INDELs) for patients for whom WES was available (n = 48). We examined the abundance of somatic mutations at diagnosis and relapse within each of the three epigenetically defined patient clusters. We annotated somatic mutations into diagnosis-specific, relapse-specific or shared between the two time points, similar to the criteria used in Fig. 3a. The proportions of the three somatic mutation categories did not follow the same distribution as the eloci in clusters 1 and 3 (cluster 1, P = 0.89, cluster 3, P = 0.40; Wilcoxon signed rank tests; Fig. 3b). For example, samples in epigenetic cluster 1 exhibited significantly lower frequencies of somatic mutations at both diagnosis and relapse, compared with samples in cluster 3 (Fig. 3c, P = 0.048, Fig. 3d, P = 0.008, Wilcoxon rank sum tests; Supplementary Table 5). Concordantly, a reciprocal analysis using the median mutation burden per subject to divide patients revealed enrichment for higher mutation rates in cluster 3 patients (Supplementary Fig. 6e). These data indicate distinct combinations of genetic and epigenetic sub-types between the clusters of patients that are defined by the eloci.

We next examined whether specific individual somatic mutations might be linked to the tendency to develop eloci. Such considerations are of interest, given that many recurrent somatic mutations in AML occur in genes encoding epigenetic modifiers11,45. However, there was no significant association between specific mutations in genes previously reported to be recurrent in AML11,45 and the abundance of eloci (all genes P > 0.05, Chi-square test with Monte Carlo simulation; Supplementary Fig. 6f; Supplementary Table 6). As in many tumors, somatic mutations in AML occur in a heterogeneous pattern, representing the presence of genetically distinct subclones. Thus, it is possible that the genetic clonal complexity of AML might mirror the abundance of eloci. Using sciClone46 we determined that 32.5% of patients exhibited increasing clonal complexity, 20% exhibited decreasing clonal complexity, and 47.5% exhibited no change in clonal complexity between diagnosis and relapse (Supplementary Fig. 7). Yet none of these clonal evolution patterns were enriched in epigenetically defined clusters (P = 0.915 chi-square test; Supplementary Fig. 8), further indicating independent kinetics of genetic and epigenetic changes during AML evolution.

We also evaluated whether changes in variant allele frequency (VAF) of somatic mutations between diagnosis and relapse were linked to the pattern of eloci occurrence. Each patient exhibited variable numbers of alleles that decreased, increased or remained stable in frequency in diagnosis and relapse samples. We focused our assessment on the patients in epigenetic clusters 1 and 3, which exhibit the most differences in eloci distributions between diagnosis and relapse samples. Somatic mutations determined in diagnosis and relapsed cases compared to matched germline controls were assessed for changes in variant allele frequencies during disease progression. Increased VAF was defined by a minimum gain of 10% (Fig. 3e), and decreased VAF was defined by a minimum loss of 10% (Fig. 3f). Among patients in epigenetic clusters 1 and 3, there was no significant difference in the tendency to gain somatic mutation VAF at relapse (Fig. 3e and Supplementary Table 5). However, among epigenetic cluster 3 patients there was a significantly lower proportion of somatic mutations with reduced VAF upon disease relapse (P = 0.0012, Wilcoxon rank sum test; Fig. 3f and Supplementary Table 5). These data suggest that disease progression in AML can be classified based on the DNA methylation heterogeneity patterns. Furthermore, epigenetic heterogeneity and genetic diversification do not necessarily follow the same kinetics between diagnosis and relapse.
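The VAF shift definitions above can be expressed as a simple classification rule. In this sketch the 10% threshold is interpreted as an absolute change in VAF (10 percentage points, with VAF expressed as a fraction between 0 and 1); that interpretation, the function names and the toy values are assumptions made for illustration.

```python
from collections import Counter

def classify_vaf_shift(vaf_diagnosis, vaf_relapse, threshold=0.10):
    """Label a mutation as 'increased', 'decreased' or 'stable' between stages."""
    # Rounding guards against floating-point noise exactly at the threshold.
    delta = round(vaf_relapse - vaf_diagnosis, 6)
    if delta >= threshold:
        return "increased"
    if delta <= -threshold:
        return "decreased"
    return "stable"

def shift_proportions(vaf_pairs):
    """Proportion of a patient's mutations falling in each shift category."""
    labels = Counter(classify_vaf_shift(d, r) for d, r in vaf_pairs)
    total = sum(labels.values())
    return {k: labels.get(k, 0) / total for k in ("increased", "decreased", "stable")}

# Toy patient with four mutations given as (diagnosis VAF, relapse VAF).
toy = [(0.20, 0.45), (0.50, 0.30), (0.40, 0.44), (0.20, 0.30)]
toy_proportions = shift_proportions(toy)
```

The per-patient proportions of increased or decreased mutations are the quantities compared between clusters.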

We next determined if the differences in epiallele shift patterns in cluster 1 and 3 were associated with perturbations in the transcriptional landscape. We were able to perform RNA-sequencing in paired diagnostic and relapsed patient specimens representing patients from each cluster (1 and 3; n = 19; Supplementary Table 7). A supervised analysis (see methods) revealed 114 genes differentially expressed between diagnostic samples in cluster 1 and cluster 3 (Supplementary Table 8). These included the cell cycle inhibitor CDKN1A and transcription factor SPI-B, which are expressed higher in cluster 3 and FOXC1 which is expressed higher in cluster 1. Overall, we detected upregulation of protein kinase and other signaling genes in cluster 1 and significant enrichment of genes linked to inflammation and immune response in cluster 3 specimens (Supplementary Table 9). Differential expression of transcription factors, inflammatory and signal transduction gene sets between the two groups suggests a potential for alternative mechanisms of oncogenesis between the patients in epigenetic clusters 1 and 3.

Epiallele and mutation shifts throughout AML progression

To create a higher resolution view of the progression kinetics of genetic and epigenetic heterogeneity in AML, we performed whole genome sequencing (WGS) and ERRBS on leukemia cells from a patient (AML_130) who had five serial specimens available (T1 = diagnosis; T2 – T5 = serial relapse time points; Supplementary Tables 2, 3 and 4). We first evaluated the global epigenetic allele burden (EPM) for each time point compared to NBMs (n = 14 NBMs; Fig. 4a). We observed a marked increase (150%) in epiallele burden occurring at T2. In contrast there was little increase (2.7%) in the number of somatic mutations at this time-point compared to germline DNA from the same patient (Fig. 4b). At the following three timepoints, there were relatively smaller increases in EPM. However at T4 there was a larger increase (29%) in the abundance of somatic mutations compared to other time points (5.4% or less), indicating that the initial epigenetic shift in T2 was antecedent to the later genetic evolution in AML cells from this patient at T4. Furthermore, genomic analysis did not yield evidence of acquisition or loss of any somatic mutation linked to epigenetic modifier genes at T2 that might explain the jump in epiallele diversity at T2 (Supplementary Table 10).

Figure 4.

Assessment of epiallele shift and genetic changes in serial samples from a single patient. ERRBS and WGS were performed in serial samples from a single patient (AML_130: diagnosis (T1) and four relapse collections: T2–T5). (a) Epiallele shift (EPM) compared to NBMs (n = 14) at each time point (error bars are the standard error of the mean). (b) Somatic mutation burden at each time point. (c) The number of eloci that are shared and unique between all time points. (e) Density plot of the dominant epiallele frequency detected at eloci across all time points. (f) Density plot of the tumor variant allele frequencies detected at each time point.

We next examined the unique occurrences of individual eloci and somatic mutations at each timepoint. Analysis of eloci revealed a tendency of the epigenome to evolve continuously over time (Fig. 4c). Whereas a majority of eloci (65.79%) were unique to a single time point, many fewer eloci were shared among two, three or four timepoints (18.78%, 10.04%, 3.64% respectively); only 1.75% of eloci were shared across all five timepoints (Supplementary Fig. 9a). In sharp contrast, genetic alleles were far more stable; 24.6% of somatic alleles remained detectable over all five timepoints (Fig. 4d, Supplementary Fig. 9b, and Supplementary Table 11). Hence, genetically defined, mutated alleles exhibited significantly greater stability over time than eloci (P < 2.2 × 10−16, chi-square test).
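The sharing percentages reported above correspond to counting, for each elocus or somatic allele, the number of timepoints at which it was detected. A minimal sketch of that tally follows; the function name and toy identifiers are hypothetical.

```python
from collections import Counter

def sharing_profile(loci_by_timepoint):
    """Percentage of loci detected at exactly k timepoints, for k = 1..n."""
    detections = Counter()
    for loci in loci_by_timepoint.values():
        detections.update(set(loci))        # count each locus once per timepoint
    by_k = Counter(detections.values())     # k -> number of loci seen k times
    total = len(detections)
    n_timepoints = len(loci_by_timepoint)
    if total == 0:
        return {k: 0.0 for k in range(1, n_timepoints + 1)}
    return {k: 100 * by_k.get(k, 0) / total for k in range(1, n_timepoints + 1)}

# Toy example with three timepoints and made-up locus names.
profile = sharing_profile({
    "T1": {"a", "b", "c"},
    "T2": {"a", "b"},
    "T3": {"a", "d"},
})
```

Applying the same tally to eloci and to somatic alleles gives two distributions that can be compared directly.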

To better discern the relationship between epigenetic and genetic heterogeneity, we examined which of the 16 possible epialleles was dominant within each elocus at each time point. The proportion of these specific epialleles was used as a measure of epigenetic clonality. Reciprocally, to evaluate genetic clonal complexity, we examined the VAF of somatic mutations at each time point. Notably, we observed a shift in dominant epiallele frequency over time. At early timepoints, the dominant epialleles tended to occur at a frequency of > 50% (Fig. 4e; median = 53%). A marked shift in epiallele frequency occurred at T3 at which time the dominant pattern shifted to a lower frequency (25%). Later on, by T5, the leukemia cells were again enriched for eloci with higher dominant epiallele frequency. However, the pattern was different when considering genetic VAF. At early timepoints (T1–T3), the median VAF was 44%. As the disease progressed, a shift to low frequency genetic mutations (VAF) did occur (Fig. 4f), but only at T5. Genetic alleles were also distinct from dominant epialleles in that they exhibited a tighter allele frequency distribution (P = 0.008, Wilcoxon rank sum test; Fig. 4e,f), perhaps linked to the greater plasticity of the epigenome versus the genome. Collectively, these higher resolution data support the notion that epigenetic and genetic heterogeneity in AML are not necessarily linked and may follow distinct patterns and distributions during disease progression.

Promoter epiallele shifts linked to transcription variance

Given that transcription is regulated through epigenetic marks, we next determined whether the presence of eloci was linked to alterations in gene expression. We focused on genes containing eloci within promoters, derived from the comparison of diagnosis to relapsed specimens (n = 19 pairs; Supplementary Table 7). Genes containing shifts in epiallele composition at promoters (promoter eloci) exhibited significantly greater variance in their transcript abundance between relapse and diagnosis, as compared to genes without such epiallele shifts in their promoters (P < 0.001; Wilcoxon rank sum test; Fig. 5a,b). We next focused our analysis specifically on genes that were significantly differentially expressed when comparing relapse versus diagnosis AML specimens. Here again, a higher proportion of differentially expressed transcripts were significantly associated with promoters harboring eloci than with promoters not harboring eloci (P < 0.001; Wilcoxon signed rank test; Fig. 5c).

Figure 5.

Transcriptional variance is associated with high epiallele shift at promoters. (a) Density plot of log2 fold change of transcript levels of genes with eloci within their promoters (red), and genes without eloci in their promoters (blue) as measured from bulk cell populations (n = 19 paired patient samples). (b) Violin plot of the log2 fold change variance in transcript expression from genes with or without eloci in their promoters in bulk cell populations (Wilcoxon signed rank test; P = 3.82×10−6). (c) Violin plot of the percentage of genes that are differentially expressed (DEGs: absolute log fold change > 1; Wilcoxon signed rank test: P = 3.82×10−6) with or without eloci in their promoters in bulk cell populations. (d) Violin plots of transcript expression level variance as measured by single cell RNA-sequencing (AML_130 relapse sample) and association (ANOVA test, P < 2.2×10−16) with low (< 0.05), intermediate (0.05–0.2) and high (0.2–1) epiallele shift within respective gene promoters. Wilcoxon signed rank tests and ANOVA test: ***P < 0.001.

Finally, we used an additional approach to examine potential links between promoter epigenetic heterogeneity and transcription. We performed single-cell RNA-seq (scRNA-seq) in 96 cells from the first relapse sample from AML patient 130 (T2). We found that the genes with higher epiallele heterogeneity within their promoters also displayed significantly higher levels of transcriptional heterogeneity, as measured by cell-to-cell coefficients of variation (P < 2.2×10−16, ANOVA test, Fig. 5d). Hence the presence of eloci at gene promoters was associated with a greater tendency of associated genes to exhibit deregulated expression.
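The cell-to-cell coefficient of variation and the grouping of genes by promoter epiallele shift can be sketched as below. The bin boundaries follow the low (< 0.05), intermediate (0.05–0.2) and high (0.2–1) ranges given in the Figure 5 legend; how values exactly on a boundary are assigned, the use of the population standard deviation, and the function names are assumptions made for this illustration.

```python
from statistics import mean, pstdev

def coefficient_of_variation(expression_across_cells):
    """Cell-to-cell CV for one gene: standard deviation divided by the mean."""
    m = mean(expression_across_cells)
    return pstdev(expression_across_cells) / m if m else float("nan")

def shift_bin(epiallele_shift):
    """Assign a promoter to the low, intermediate or high epiallele shift group."""
    if epiallele_shift < 0.05:
        return "low"
    if epiallele_shift < 0.2:
        return "intermediate"
    return "high"

def cv_by_shift_bin(genes):
    """Group per-gene CVs by shift bin; genes maps name -> (shift, expression list)."""
    grouped = {"low": [], "intermediate": [], "high": []}
    for shift, expression in genes.values():
        grouped[shift_bin(shift)].append(coefficient_of_variation(expression))
    return grouped

# Toy genes with made-up names, shifts and per-cell expression values.
toy_genes = {
    "geneA": (0.01, [10, 10, 10, 10]),
    "geneB": (0.10, [5, 15]),
    "geneC": (0.50, [0, 20]),
}
grouped_cv = cv_by_shift_bin(toy_genes)
```

The grouped CV values are what a one-way ANOVA across the three bins would take as input.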

DISCUSSION

AML is a tumor type well suited for examining questions about epigenetic heterogeneity, since it generally manifests a relative paucity of genetic lesions21,22 but still exhibits genetic clonal complexity and clonal evolution during progression. Mazor, et al.33 recently examined DNA methylation at individual CpG sites and somatic mutations during tumor progression in gliomas, where a phylogenetic analysis showed a co-dependency of change for the two features examined. However, an examination of overall DNA methylation at individual CpG sites with array-based measurements cannot fully address epigenetic heterogeneity as measured by epiallele diversification. Hence the question has remained unresolved as to whether epigenetic alleles and genetic alleles follow similar, or independent, courses during disease evolution. Although others have documented epigenetic heterogeneity in B-cell neoplasms28,29,32, these tumors mostly arise from cells that display intrinsic epigenetic heterogeneity, and the studies did not explore the link between genetic and epigenetic allelic diversity during tumor progression. This is the first report of dynamic epiallele shifting in a human tumor. Previous reports have used DNA methylation heterogeneity metrics such as epipolymorphism approaches that calculate diversity within individual specimens, as opposed to the methclone compositional entropy, which measures shifting between samples28,29. Hence these two approaches measure different properties of the epigenome.

Herein, we establish that tumor genetic and epigenetic heterogeneity in AML may arise as independent, biologically distinct phenomena, each presumably with a unique functional significance. The degree of leukemic epigenetic allelic burden was independent of age and other clinical parameters. There was no apparent association between the degree of epigenetic heterogeneity and the presence of somatic mutations affecting epigenetic modifier genes such as DNMT3A, TET2, and IDH1/2. We also explored whether dominant epigenetic alleles behaved in a similar or distinct manner from genetic alleles during clonal evolution. When compared with epiallele shifting, or changes in dominant epiallele composition of eloci at serial timepoints, there was no association between the kinetics and pattern of genetic and epigenetic alleles during leukemic progression.

It is especially notable that AML patients can be classified based on epiallele pattern kinetics and somatic mutation burdens during disease progression. Our results suggest that in at least some cases, AML patients at diagnosis may be divided into disease with predominant epiallele diversity and low somatic mutations (cluster 1: epigenetically-driven) and others with lower epiallele diversity and higher mutation burden (cluster 3; genetically-driven; Fig. 3a, c). The latter develop increasing epigenetic diversity upon progression (Fig. 3a). In both cases, genetic clonal composition remains predominantly stable (Supplementary Table 5). Most importantly, epigenetic instability is not necessarily linked to genetic instability or specific somatic mutations. This observation may point to alternative modes of dominant heterogeneity in newly diagnosed patients: one genetic and one epigenetic. Tumor heterogeneity in AML can thus not strictly be determined by either genetic or epigenetic analysis alone, and relying only on genetic mapping may underestimate the true tumor diversity. We speculate that de novo acquisition of eloci might represent the response of hematopoietic cells to environmental stresses at various points during the development and/or progression of disease. Conversely, exposure to chemotherapy may induce epigenetic plasticity, contributing to the increased eloci seen at relapse in some patients. Indeed, longitudinal profiling in a patient treated with four different regimens resulted each time in distinct patterns of epiallele shifting. It is also possible that unrecognized mutations or alterations of coding or non-coding regions could destabilize the epigenome at specific loci and contribute to epigenetic heterogeneity. One example is the protein CTCF, which can regulate cytosine methylation patterning and boundaries47. A CTCF haploinsufficiency murine model results in the acquisition of epigenetic hyper-variability hotspots and development of tumors48. 
In the case of B-cells, epigenetic heterogeneity may be linked to the actions of activation-induced cytidine deaminase48. In the case of AML, biological differences among patients with different eloci progression patterns may be mediated through differential expression of hematopoietic transcription factors and functional gene sets.

Importantly, the acquisition of eloci may indeed have functional consequences, since we find that genes containing eloci at their promoters display more pronounced transcriptional variability and differential expression during disease progression. A recent single-cell RNA-seq study on newly diagnosed CLL patients also showed that sites of epigenetic variability had greater cell-to-cell transcriptional plasticity28. A second indication of biological relevance is the inferior outcome of patients with higher global epiallele (EPM) burdens in both univariate and multivariate analyses. The fact that the link to inferior outcome is even stronger for promoter epialleles suggests that eloci that perturb transcriptional regulation are particularly important in contributing to disease phenotype.

Using an epipolymorphism-based approach, we observed decreasing DNA methylation heterogeneity upon AML progression, which matches a previous report in DLBCL29 (Supplementary Fig. 10a). However, in contrast to DLBCL, the abundance of epipolymorphisms in AML did not segregate patients into distinct clinical groups based on progression-free survival (Supplementary Fig. 10b); only the epiallele measure (EPM) separated the cohorts. Epiallele shifting as detected by methclone is thus a candidate biomarker in AML, and may be indicative of more aggressive disease, perhaps reflecting greater epigenetic plasticity and adaptability in certain tumors. Given the heterogeneous nature of AML, our study was underpowered to detect specific eloci that could be used as outcome biomarkers, although there was a trend for a set of 21 promoter eloci to associate with shorter time to relapse (Supplementary Table 12).

Since this study is the first to examine the longitudinal interplay of genetic and epigenetic heterogeneity in AML, it is certainly possible that genetic and epigenetic heterogeneity are linked by other means in AML or in different ways in other tumor types. Studies in larger cohorts using improved sequencing technologies will likely yield additional insights. Collectively, our data point towards epigenetic heterogeneity as an important feature of AML with functional and clinical impacts, one which does not fully follow the kinetics and patterns of the genetic compartment, and which creates additional means by which a tumor can evolve.

Online Materials and Methods

Patient characteristics

We collected 138 clinically annotated, paired AML patient samples (diagnosis and relapse; 86 males and 52 females) from medical centers in Australia, Germany, the Netherlands, and the United States. Patients with acute promyelocytic leukemia were excluded. All patients were treated according to the protocols of the corresponding institutes and hospitals. The clinical and molecular characteristics of these patients are summarized in Supplementary Table 1 and detailed descriptions are provided in Supplementary Table 2. All patients were treated with combination chemotherapy (cytarabine and an anthracycline) during the induction phase, followed by consolidation chemotherapy with or without a stem cell transplantation in first remission per clinical center standards. Samples from serial time points were available for patient AML_130. Briefly, this was a male patient with previous exposure to a high radiation dose who developed AML at age 57. Samples were available from the diagnostic time point (sample T1; leukapheresis sample) and after the following treatments. The patient underwent standard induction treatment (cytarabine + idarubicin) and achieved complete remission, but relapsed before consolidation treatment was initiated. The patient was then treated with high-dose cytarabine without response (sample T2; peripheral blood sample); combination chemotherapy (sirolimus, mitoxantrone, etoposide, and cytarabine) without response (sample T3; peripheral blood sample); combination chemotherapy (cyclophosphamide followed by clofarabine) without response, hydroxyurea, and external beam radiation to a tonsillar site of extramedullary hematopoiesis (sample T4; bone marrow aspirate); and an investigational FLT3 tyrosine kinase inhibitor (KW-2449) followed by allogeneic stem cell transplant without remission (sample T5; leukapheresis sample). No treatment beyond the chemotherapy administered immediately after diagnosis induced a complete remission in AML_130.

Sample collection and processing

Donors (AML patients and individuals without known hematological malignancies) signed informed consent according to the Declaration of Helsinki for collection and use of sample materials in research protocols at the following clinical centers: Erasmus Medical Center (protocol number MEC-2015-155), Royal Adelaide Hospital and SA Pathology (Adelaide, South Australia; 1998-onwards), University of Pennsylvania (protocol number 703185), University of Rochester Medical Center (protocol number URCC ULEU07047), and the University Hospital of Ulm. Study protocols were approved by the Institutional Review Boards of the corresponding institutes and hospitals (protocols noted above), and at Weill Cornell Medicine (WCM; protocol number 0805009783). For AML patient cryopreserved specimens from the Royal Adelaide Hospital and SA Pathology (Adelaide, South Australia) collected and stored prior to 1998, the requirement for informed consent was waived by the Royal Adelaide Hospital Human Research Ethics Committee (RAH Protocol #110304b). The use of the samples obtained from RAH and SA Pathology in this specific research study was approved by the RAH HREC on September 10, 2010. Of 140 paired patient samples that were collected, 138 were successfully processed. Patient samples were collected at the time of diagnosis and within three months of clinically determined relapse. Samples were subjected to Ficoll separation on the day of collection and viably frozen or immediately subjected to nucleic acid extractions using standard techniques. De-identified samples were then provided to WCM. Viably frozen cells were available for further processing from patients AML_102 through AML_140, including serial samples from AML_130 (T1 – T5). These samples were thawed and depleted of CD3+ and CD19+ cells using magnetic beads (Miltenyi Biotec), yielding a blast percentage enrichment of 89.±9.5% as confirmed by post-separation flow cytometry analysis. 
Lymphocytes were then isolated for germline controls using flow sorting for CD3+ and CD19+ cells on a FACSAria II cell sorter (BD Biosciences, San Jose, CA) yielding > 85% purity as confirmed by post-sort flow cytometry analysis. Germline DNA was isolated from ex vivo-expanded lymphocytes for patient samples AML_074 through AML_101. Briefly, the procedure for ex vivo expansion was as follows: 1 million bone marrow cells were cultured in RPMI 1640 with L-glutamine and 10% fetal calf serum (FCS). T cell expansion was induced with T cell expander Dynabeads CD3/CD28 (Thermo Fisher). After 1 to 3 days rIL2/ml was added. Cells were subsequently purified to > 98% with CD3 microBeads (MACS Miltenyi Biotec). Flow cytometry analysis and sorting were performed using CD19 PerCP-Cy5.5 (HIB19) and CD3 PerCP-Cy5.5 (HIT3a) from BioLegend (San Diego, CA, USA) and human CD45 APC-A780 (2D1) from eBioscience (San Diego, CA, USA). Validation for each antibody is provided on the manufacturer’s website (http://www.biolegend.com/) and at Antibodypedia (http://www.antibodypedia.com/gene/4237/CD19/antibody/539284/302214; http://www.antibodypedia.com/gene/4581/CD3D/antibody/539101/300437) and DegreeBio (1DB_ID:1DB-001-0001093758). Lymphocytes were defined as CD45 (high) SSC (low) CD19+ or CD3+ cells, and leukemic blasts were defined as CD19− CD3− CD45 (low) SSC (low) cells49. DNA and RNA were extracted from the isolated cells using standard techniques. For patient samples AML_001 through AML_101, DNA and RNA were extracted from mononuclear cell layers using standard techniques. Normal bone marrow controls (NBM: CD34+ mononuclear cells; 7 males and 7 females) were purchased from AllCells (Alameda, CA, USA; n = 5) or isolated using magnetic bead positive selection for CD34+ (Miltenyi Biotec) from freshly collected bone marrow samples from individuals without known hematological malignancies (n = 9). Purity of greater than 90% was verified by post-separation flow cytometry. 
All flow cytometry data were analyzed using FlowJo (TreeStar, Ashland, Oregon).

Enhanced reduced representation bisulfite sequencing (ERRBS)

ERRBS is a slightly modified version of RRBS36,37. Libraries were prepared by MspI restriction enzyme digestion of high molecular weight genomic DNA, followed by end repair, size selection, bisulfite conversion and library amplification as previously described38,39. Libraries were sequenced on an Illumina HiSeq 2000 using 75 bp single-end reads. Reads were aligned to human genome hg19 as previously described (see data analysis description)38,39. Average reads per sample were 138,596,488 with a mean alignment rate of unique reads of 63.7%, covering on average 4,399,235 CpGs per sample at a minimum threshold coverage of 10X. Average coverage depth per CpG was 72.3 and average bisulfite conversion was 99.86%, determined as previously described38,39. See Supplementary Table 3 for detailed sequencing statistics. A subset of patient samples was processed using the SeqCap Epi 4M CpGiant Enrichment kit (NimbleGen-Roche) and SureSelectXT Human Methyl-Seq (Agilent Technologies) per manufacturer’s recommendations. These were sequenced using a paired-end 100 bp approach on a HiSeq 2000 (Illumina, Inc.).

Data analysis

R50 version 3.2.1 was utilized for data analyses. Specific R packages and other tools/software used for analyses are noted in the respective sections below.

Code availability

Analysis scripts utilized in this manuscript have been deposited at https://github.com/ShengLi/relapsed_AML

ERRBS data analysis

We performed alignment of bisulfite-treated reads to the hg19 genome and methylation calling as previously described38,39. Briefly, the adaptor sequences were removed using FAR software51. Preprocessed reads were then aligned to human genome reference hg19 using the Bismark alignment software.

Epiallele shift analysis was performed using methclone40. Briefly, compositional changes in epiallele patterns between samples were evaluated using methclone to calculate the change in combinatorial entropy (ΔS) of epialleles at each locus. This analysis outputs a list of loci ranked by epiallele change, as defined by the entropy change. We first calculated the foreground combinatorial entropy from the detected epiallele composition, which is the sum of the entropy of each locus for the two samples considered. We then calculated the background combinatorial entropy from the epiallele composition after uniformly mixing all epiallele patterns between the two samples. The summed entropy was determined, and the entropy change was then defined as the difference between the foreground and background combinatorial entropy values. Epiallele shifts per million loci (EPM) is a normalized measure of global epiallele change, with loci at ΔS < −90 counted as epiallele shift loci (eloci). Comparisons of AML specimens to NBMs report the average EPM determined from assessing each AML case against each NBM (n = 14). The diagnosis-specific eloci are eloci that are only detectable at the diagnosis stage, when compared to NBMs. The relapse-specific eloci are eloci that are only detectable at the relapse stage, when compared to NBMs. The shared eloci are eloci that are detectable at both stages. The eloci clusters were determined using K-means clustering (K = 3, chosen using the gap statistic for estimating the number of clusters) based on the proportion of eloci that are diagnosis-specific, shared, or relapse-specific. Within clusters 1 and 3, the proportions of eloci for diagnosis and relapse were compared using Wilcoxon signed rank tests to measure the significance of dominance. Finally, we assessed the distribution of patients from each clinical center (n = 5) in the epigenetically-defined clusters. 
Using multinomial logistic regression analysis to model the clusters with clinical and demographic features, we determined that center identity was not equally distributed among the three clusters. However, we confirmed that no technical or clinical etiology was related to this distribution for each center (age, WBC, gender, FAB classification, mutations, cytogenetics), and the sample sizes for the sub-cohorts were not powered to detect sub-clusters within each cohort.
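The EPM normalization and the three eloci categories described above can be sketched as follows. This is a minimal illustration only and is not taken from methclone or the deposited scripts: the function names are hypothetical, the ΔS values are assumed to have been computed already (for example by methclone), and the denominator of EPM is assumed to be the number of loci assessed in the comparison.

```python
ELOCI_CUTOFF = -90.0  # ΔS cutoff for eloci, as stated in the text


def epm(delta_s_values, cutoff=ELOCI_CUTOFF):
    """Epiallele shifts per million loci.

    Assumption: EPM = (loci with ΔS below the cutoff) / (loci assessed) * 1e6.
    """
    values = list(delta_s_values)
    if not values:
        raise ValueError("no loci assessed")
    n_eloci = sum(1 for ds in values if ds < cutoff)
    return n_eloci / len(values) * 1_000_000


def eloci_proportions(dx_delta_s, rel_delta_s, cutoff=ELOCI_CUTOFF):
    """Proportions of diagnosis-specific, shared and relapse-specific eloci.

    Both arguments map a locus identifier to its ΔS versus NBM
    (hypothetical input layout).
    """
    dx = {locus for locus, ds in dx_delta_s.items() if ds < cutoff}
    rel = {locus for locus, ds in rel_delta_s.items() if ds < cutoff}
    counts = {
        "diagnosis_specific": len(dx - rel),
        "shared": len(dx & rel),
        "relapse_specific": len(rel - dx),
    }
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}
```

In this sketch, the per-patient proportion vector returned by `eloci_proportions` would be the input to the K-means step, and the reported AML-versus-NBM EPM would be the mean of `epm` over the 14 NBM comparisons.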

Intra-tumor global methylation heterogeneity (MH) was assessed as previously described29. Briefly, epipolymorphisms were determined using proportions of DNA methylation patterns at four adjacent CpGs ($1 - \sum_{i=1}^{16} p_i^2$, where $p_i$ is the fraction of each DNA methylation pattern $i$ among the cell population)27. Epipolymorphism loci shared by at least 75% of the patients (n = 104) were considered for further analysis. Because epipolymorphism is dependent on DNA methylation levels, each locus was binned by the average DNA methylation level from 0 – 100% (21 bins in total). Therefore, for each sample, all the loci were assigned to one of the 21 different groups based on their mean DNA methylation levels. For each bin, the width was 5% (DNA methylation), except for the first and last bin (2.5% DNA methylation). The median epipolymorphism across all the loci within each bin was calculated. Then the overall epipolymorphism landscape was defined using the median epipolymorphism across the 21 bins spanning the DNA methylation levels from 0 – 100%. The intra-tumor overall MH was defined as the area under the median epipolymorphism across the spectrum of methylation percentages in 21 bins. The range of MH is 0 – 100. Higher MH represents higher overall intra-tumor heterogeneity. Each locus was covered by at least 60 sequencing reads.
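As a rough illustration of the MH calculation described above, the sketch below computes epipolymorphism from the read counts of the 16 possible four-CpG patterns, assigns loci to the 21 methylation bins, and sums bin width times median epipolymorphism. The rectangle-sum form of the area, the treatment of empty bins as contributing zero, and all function names are assumptions made for illustration; the published implementation may differ.

```python
from bisect import bisect_right
from statistics import median

# Bin edges in % methylation: 0, 2.5, 7.5, ..., 97.5, 100 (21 bins;
# the first and last bins are 2.5% wide, the others 5%).
BIN_EDGES = [0.0, 2.5] + [2.5 + 5.0 * k for k in range(1, 20)] + [100.0]
N_BINS = len(BIN_EDGES) - 1  # 21


def epipolymorphism(pattern_counts):
    """1 - sum(p_i^2) over the methylation patterns observed at one locus."""
    total = sum(pattern_counts)
    if total == 0:
        raise ValueError("locus has no reads")
    return 1.0 - sum((c / total) ** 2 for c in pattern_counts)


def bin_index(mean_methylation_pct):
    """Index (0..20) of the methylation bin containing the given percentage."""
    idx = bisect_right(BIN_EDGES, mean_methylation_pct) - 1
    return min(max(idx, 0), N_BINS - 1)


def methylation_heterogeneity(loci):
    """Area under the per-bin median epipolymorphism.

    loci: iterable of (mean_methylation_pct, pattern_counts) pairs.
    Assumption: area = sum(bin width * bin median); empty bins add zero.
    """
    per_bin = [[] for _ in range(N_BINS)]
    for methylation, counts in loci:
        per_bin[bin_index(methylation)].append(epipolymorphism(counts))
    area = 0.0
    for i, values in enumerate(per_bin):
        if values:
            area += (BIN_EDGES[i + 1] - BIN_EDGES[i]) * median(values)
    return area
```

Under these assumptions the maximum possible value is 93.75 (every bin at the maximal epipolymorphism of 1 − 1/16), which is consistent with the stated 0 – 100 range.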

Genomic Annotation

Genomic annotation reference files (CpG islands and RefSeq genes) for eloci distribution analyses were obtained from UCSC (https://genome.ucsc.edu/) using Feb. 2009 (GRCh37/hg19) assembly52,53. Promoters were defined as transcription start sites +/− 1kb. CpG shores were defined as 2kb flanking CpG islands, subtracted by any regions overlapping with nearby CpG islands. CpG shelves were defined by 2kb flanking CpG shores, subtracted by any regions overlapping with nearby CpG islands and shores. Enhancers were defined based on the NIH Roadmap Epigenomics Project54 CD34 mobilized primary cell data. Active enhancers were defined as H3K4me1 and H3K27ac peaks without H3K4me3 marks, and poised enhancers were defined as H3K4me1 without H3K27ac or H3K4me3 marks. Wilcoxon signed rank tests were used to compare the proportions of eloci falling into each of the genomic annotation regions.

RNA-sequencing

RNA sequencing (RNA-seq) libraries were prepared in two batches (Supplementary Table 7) using TruSeq RNA-Seq by polyA enrichment (Illumina, Inc.) and sequenced on a HiSeq 2000 (Illumina, Inc.) using a 50 bp paired-end approach per manufacturer’s recommendations. All paired samples were prepared within the same batch. Alignment was performed using the STAR aligner (version 2.3.0e)55 and human genome hg19 as reference. Aligned results were annotated using the RefSeq gene model and HTSeq union mode. Gene expression data were normalized using RPKM. Blast-enriched patient samples from AML_130 (lymphocyte depleted as described in Sample collection and processing) were also used for single-cell isolations. Single cells were captured and mRNA isolated using Clontech’s SMARTer chemistry (v2) on the Fluidigm C1 Single Cell Auto Prep system. Illumina’s Nextera XT kit was used for library preparation prior to 100 bp paired-end sequencing on the HiSeq 2500 (Illumina, Inc.) platform. See Supplementary Table 8 for sequencing statistics. Data analysis was performed using r-make (http://physiology.med.cornell.edu/faculty/mason/lab/r-make/) for quality control, alignment and gene expression quantification.

DESeq256 was used for differential gene expression analysis. The significance of Differentially Expressed Genes (DEGs) was determined using the Wald significance test to compare cluster 1 with cluster 3. The design matrix included batch information to control for any possible batch differences. For multiple hypothesis testing, the significance cutoff used for optimizing the independent filtering was 0.05 (Benjamini–Hochberg). A log2 fold change greater than 1.2 or less than −1.2 was used for significance.

Transcriptional heterogeneity was assessed using single-cell RNA-seq data. Specifically, the coefficient of variation and transcriptional abundance per gene across cells were determined using log2 RPKM. Genes were included if the average log2 RPKM across cells was higher than 1. Gene Ontology term enrichment analysis was performed using GEne SeT AnaLysis Toolkit57. A threshold of two overlapping genes was considered. Hypergeometric tests were used for significance determination and Benjamini–Hochberg58 was used for multiple testing correction.
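The per-gene variability measure described above can be sketched as follows. The input layout (a mapping from gene to log2 RPKM values across single cells), the function name, and the use of the sample standard deviation (n − 1 denominator) are assumptions for illustration rather than details confirmed by the text.

```python
from statistics import mean, stdev


def gene_cv(log2_rpkm_by_gene, min_mean=1.0):
    """Mean abundance and coefficient of variation per gene across cells.

    log2_rpkm_by_gene: dict mapping gene -> list of log2 RPKM values,
    one per cell (hypothetical layout). Genes whose mean log2 RPKM is
    not above min_mean are dropped, following the inclusion rule in the text.
    """
    result = {}
    for gene, values in log2_rpkm_by_gene.items():
        if len(values) < 2:
            continue  # sample SD is undefined for fewer than two cells
        avg = mean(values)
        if avg > min_mean:
            # CV = standard deviation / mean (sample SD is an assumption)
            result[gene] = (avg, stdev(values) / avg)
    return result
```

A gene with a higher coefficient of variation at similar mean abundance would be read as showing greater cell-to-cell transcriptional variability.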

Whole Exome Sequencing

Whole exome capture was performed on DNA isolated from 48 paired diagnosis and relapse patient samples and patient-matched germline samples. Germline DNA (lymphocytes) was subjected to whole genome amplification (repli-g kit; Qiagen) in 25 patients due to limited materials (see Supplementary Table 2). To obtain sufficient quantities of DNA from the remaining samples, T cells were expanded ex vivo (detailed under Sample collection and processing). DNA was extracted using standard techniques. Libraries were prepared per manufacturer’s recommendation using NimbleGen SeqCap EZ Human Exome Library v3.0, Agilent Human Exon V3 (Exon 50Mb), or Agilent SureSelect Human All Exon V4 (51 MB; see Supplementary Table 2 for specification of kit use per patient tumor and germline samples) and sequenced with single reads of at least 50 bp on a HiSeq 2000 (Illumina, Inc.) to a mean coverage per base of 73X. See Supplementary Table 4 for sequencing statistics.

Whole Genome Sequencing

The Illumina TruSeq Nano DNA Library Prep Kit (Illumina, Inc.) was used on DNA from the T1–T5 time points and on germline DNA (CD19/3 positive cells) isolated from the first relapse sample of patient AML_130. Libraries were prepared per manufacturer’s recommendation and sequenced using a 101 bp paired-end approach on the HiSeq 2000 platform to a mean coverage per base of 43X. See Supplementary Table 4 for sequencing statistics.

Next generation sequencing data analysis

DNA-sequencing data were analyzed to determine somatic mutations, copy number aberrations and clonal evolution patterning using publicly available tools (BWA59, GATK60–62, MuTect63, VarScan64, SomaticSniper65, SnpEff66, XHMM67, DNAcopy library68, and sciClone46). Tool versions; inclusion, exclusion, and significance parameters; statistical tests; and the specific commands used are included in the supplementary information file and deposited in GitHub (https://github.com/ShengLi/relapsed_AML).

Integrative analysis

The association between DEGs (derived from relapse versus diagnosis AML patient samples) and promoter-associated eloci was determined as follows. Eloci (maximum ΔS = −90) annotated to gene promoters (transcription start sites +/− 1 kb) were considered for the integrative analyses. Patients with a minimum of 30 genes with promoter-associated eloci were included (n = 19). Epiallele loci within gene promoters which exhibited a minimum ΔS greater than −2 were designated “non-eloci”. RPKM log fold change was used to measure the transcriptome level dynamics. The variance of gene expression (RPKM log fold change) was calculated for genes with or without eloci within their proximal promoters and compared across all patients using a Wilcoxon signed rank test. Genes with log fold change greater than 1 were defined as DEGs. The number of DEGs with or without eloci were compared using a Wilcoxon signed rank test.

Association between epigenetic clusters and mutations in genes recurrently affected in AML was assessed. The variant calling pipeline (in-house developed algorithm69) used BAM files generated from the sequencing data of all paired diagnostic, relapse and germline samples to identify SNVs and small insertions and deletions (indels) in driver genes associated with AML11,45. For the identification of these somatic genetic aberrations we used a multi-variant calling approach integrating the output of six different somatic mutation detection algorithms: MuTect63 (version 1.17), Indelocator, GATK UnifiedGenotyper61 (version v3.2-2-gec30cee), SAMtools70 (version 0.1.19-44428cd), VarScan 264 (version 2.3.6) and Pindel71 (version 0.2.5a7). Briefly, MuTect allowed a VAF up to 10% or 6 reads containing the variant allele in the germline control to prevent the filtering of driver mutations detectable in lymphocyte control material72. Indelocator was allowed to detect indels with a VAF of 2% or greater. SAMtools, GATK UnifiedGenotyper and VarScan2 were run with default parameters. Pindel was used to detect the FLT3-internal tandem duplication (FLT3-ITD) aberration specifically within exons 13, 14 and 15 of the FLT3 gene by focusing on short insertions or tandem duplications. The detected variants were aggregated in a unique variant call format (VCF) file and subsequently annotated with ANNOVAR73. Annotations included dbSNP74, COSMIC75 and population-based sequencing efforts, such as the 1000 Genomes Project. The detected variants were further characterized by multiple fragment and regional characteristics by an in-house developed algorithm previously described69. In brief, the algorithm determines the number of high quality (alignment score ≥ 40) and total number of fragments (irrespective of alignment score) for each detected variant. Based on the detected alternative allele, this algorithm determines the VAF for high quality and all fragments. 
In addition, the algorithm determines the strand bias of the reads. For each potential somatic variant the same set of fragment and regional statistics was determined for the germline sample. Somatic mutations were detected by comparing the characteristics from the diagnostic or relapse sample to the germline sample. The FLT3-ITD aberration was detected by Pindel (script deposited into https://github.com/ShengLi/relapsed_AML); however, the VAF could not be accurately estimated with this approach and was therefore considered not determined in the Supplementary Table. In a subset of patients, NPM1 mutations were not detectable due to a lack of coverage in the data generated, as indicated in Supplementary Table 7. Mutations were considered diagnosis-specific if the alternative allele frequency at diagnosis was greater than 3% and the mutation was not detected at relapse. Mutations were considered relapse-specific if the alternative allele frequency at relapse was greater than 3% and the mutation was not detected at diagnosis. Mutations were considered shared between diagnosis and relapse if the alternative allele frequencies at diagnosis and at relapse were both greater than 3%. Association between the frequency of each mutation and the epigenetic clusters was assessed using a Chi-square test with Monte Carlo simulation (number of replicates = 10^6).
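The three mutation categories defined above can be expressed as a small decision rule. The sketch below is illustrative only: the function name is hypothetical, "not detected" is assumed to be encoded as a VAF of zero or `None`, and combinations that the text does not assign to a category (for example a mutation above 3% at diagnosis that is detected below 3% at relapse) are returned as "unclassified".

```python
VAF_CUTOFF = 0.03  # 3% alternative allele frequency, as stated in the text


def classify_mutation(vaf_diagnosis, vaf_relapse, cutoff=VAF_CUTOFF):
    """Label a somatic mutation by its presence at diagnosis and relapse.

    Assumption: a VAF of None or 0 means the mutation was not detected.
    """
    detected_dx = bool(vaf_diagnosis)
    detected_rel = bool(vaf_relapse)
    above_dx = detected_dx and vaf_diagnosis > cutoff
    above_rel = detected_rel and vaf_relapse > cutoff

    if above_dx and above_rel:
        return "shared"
    if above_dx and not detected_rel:
        return "diagnosis_specific"
    if above_rel and not detected_dx:
        return "relapse_specific"
    return "unclassified"  # combination not covered by the stated rules
```

For example, `classify_mutation(0.40, 0.0)` falls in the diagnosis-specific category and `classify_mutation(0.30, 0.45)` in the shared category.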

Clinical correlation analysis

The correlation of overall epiallele shift (log10 EPM) with clinical parameters was evaluated. The association between EPM (log10) and FAB classes was assessed using an ANOVA test. FAB classes with a minimum of 7 patients were assessed. We used the Pearson correlation (r) between overall epiallele shift and age or blast purity post separation, and Hoeffding’s D statistic to test for dependency76.

To determine which gene-associated epiallele shift loci (eloci) were associated with clinical outcome, we performed the following analysis: 1. We determined eloci between diagnosis samples and NBMs (we required that the epiallele region considered be covered by a minimum of 5 NBMs). 2. We annotated eloci to gene promoters (transcription start site +/− 1 kb) and excluded epiallele loci that were covered in fewer than ten patients. 3. The patients were divided into groups with longer (n = 69) versus shorter (n = 68) relapse-free survival based on the median time to relapse. 4. The frequency of each elocus in the patient groups was assessed. 5. An odds ratio was used to measure the association of each locus with the early versus late relapse group. 6. Significance was determined using a Fisher’s exact test for each locus. 7. Benjamini-Hochberg correction was applied to P values.
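Steps 5 to 7 above can be sketched for a single locus and then across loci as follows. This is a plain-Python illustration under stated assumptions, not the deposited analysis code: the 2x2 table layout (rows: early versus late relapse group; columns: elocus present versus absent), the two-sided form of the Fisher's exact test, and all function names are chosen here for illustration.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(total, 1.0)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, each promoter elocus would contribute one table (patients with and without the elocus in the early and late relapse groups), and the resulting list of P values would be passed to `benjamini_hochberg`.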

Survival analysis

Log rank (Mantel-Cox) test was used for survival analysis. For relapse-free survival analysis, survival endpoints in this study were time from diagnosis until AML relapse. The patients were divided by the median EPM (low EPM versus high EPM), median number of somatic mutations (low MUT versus high MUT), or median MH (low MH versus high MH) for respective comparisons. Among 138 AML patients in the cohort, time to relapse was available for 137 patients and white blood cell count was available for 127 patients.

The multivariate Cox proportional hazards regression model used relapse time as the response variable and included log10 EPM, age, gender, and white blood cell count as the variables to be tested (n = 127). The P value of each predictor was used to assess whether that clinical parameter was significantly associated with relapse time.

Acknowledgments

We thank C. Sheridan, J. Phillips, J. Ishii, L. Wang, J. Busuttil, T. Lee, P. Zumbo, J. Gandara, and A. Zeilemaker for technical support. We thank C. Sheridan for assistance with organization and maintenance of sample database and banking, M. Perugini, D. Iarossi, and I.S. Tiong for assistance with clinical database management, and Y. Neelamraju, Z. Li and M. R. De Massy for data management. Next generation sequencing protocols and sequencing were performed by the WCMC Epigenomics Core and the New York Genome Center. We thank A. Viale from the Integrated Genomics Operation and N. Socci from the bioinformatics core at MSKCC for sequencing services. We thank the South Australian Cancer Research Biobank for access to clinical samples. We thank F. Michor for expert recommendations regarding data analyses. The authors wish to thank the following sources of financial support: Starr Cancer Consortium grant I4-A442 (AMM, RL, CEM), STARR Cancer Consortium grant I7-A765 and I9-A9-071 (CEM), and funding from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts, Bert L and N Kuggie Vallee Foundation and the WorldQuant Foundation, Pershing Square Sohn Cancer Research Alliance, and NASA (NNX14AH50G) (CEM), LLS SCOR 7006-13 (AMM), NCI K08CA169055 (FGB), funding from the American Society of Hematology (ASHAMFDP-20121) under the ASH-AMFDP partnership with The Robert Wood Johnson Foundation and ASH/EHA TRTH (FGB), Doris Duke Medical Foundation, Leukemia and Lymphoma Society Translational Research Program, and Geoffrey Beene Cancer Center financial support (CYP), Leukaemia & Lymphoma Research award (DG and RD), DFG grant SFB 1074 (project B3; KD and LB) and Heisenberg-Stipendium BU 1339/3-1 (LB), National Health and Medical Research Council (NH&MRC) and the Royal Adelaide Hospital Contributing Haematologists Fund financial support (RDA, AB and IL), R01CA102031 (GJR and MLG) and Leukemia Fighters funding (GJR, MLG and DCH).

Footnotes

ACCESSION CODES

Data from this study are available from the NCBI via the dbGaP accession number phs001027.v1.p1 (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001027.v1.p1)

AUTHOR CONTRIBUTIONS

A.M.M. and C.E.M. conceived the studies, designed analytical approaches, analyzed results and wrote the manuscript, and designed experiments with F.E.G-B. S.L.1 conceived of computational analyses, wrote code and performed computational analysis, wrote the manuscript, and generated figures. F.E.G-B. performed, coordinated and/or supervised all patient sample experimental procedures, performed computational and bench experimental data management and analyses and wrote the manuscript. S.S.C. and R.D. performed experiments (flow cytometry and sorting of subject samples) and associated data analysis. T.H., F.R. and J.P. performed computational analyses. M.A.S. and P.J.M.V. performed experiments (Exome Capture) and associated computational analysis. A.L.B., A.E.P., J.C., L.B., S.L.2, M.B., I.D.L., L.B.T., B.L., H.D., K.D., P.J.M.V., R.J.D., and M.C. coordinated patient sample collection and analyzed clinical data. D.N. assisted with statistical analyses. P.V. ran single-cell RNA-seq (scRNA-seq) analysis and library preparation and performed expression analysis. M.L.G., D.C.H., G.J.R., D.G., C.Y.P., and R.L. all helped with sample collection, writing, analysis, and patient annotation. All authors read, edited and approved the manuscript.

1Sheng Li

2Selina Luger

COMPETING FINANCIAL INTERESTS STATEMENT

The authors declare no financial conflicts of interest.

Supplementary Information is linked to the online version of the paper at www.nature.com/nature

References

  • 1. Roboz GJ. Current treatment of acute myeloid leukemia. Curr Opin Oncol. 2012;24:711–719. doi: 10.1097/CCO.0b013e328358f62d.
  • 2. Grimwade D, et al. Refinement of cytogenetic classification in acute myeloid leukemia: determination of prognostic significance of rare recurring chromosomal abnormalities among 5876 younger adult patients treated in the United Kingdom Medical Research Council trials. Blood. 2010;116:354–365. doi: 10.1182/blood-2009-11-254441.
  • 3. Dohner H, et al. Diagnosis and management of acute myeloid leukemia in adults: recommendations from an international expert panel, on behalf of the European LeukemiaNet. Blood. 2010;115:453–474. doi: 10.1182/blood-2009-07-235358.
  • 4. Ishikawa F, et al. Chemotherapy-resistant human AML stem cells home to and engraft within the bone-marrow endosteal region. Nature biotechnology. 2007;25:1315–1321. doi: 10.1038/nbt1350.
  • 5. Ding L, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510. doi: 10.1038/nature10738.
  • 6. McKerrell T, et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep. 2015;10:1239–1245. doi: 10.1016/j.celrep.2015.02.005.
  • 7. Moran-Crusio K, et al. Tet2 loss leads to increased hematopoietic stem cell self-renewal and myeloid transformation. Cancer cell. 2011;20:11–24. doi: 10.1016/j.ccr.2011.06.001.
  • 8. Xie M, et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat Med. 2014;20:1472–1478. doi: 10.1038/nm.3733.
  • 9. Landau DA, Carter SL, Getz G, Wu CJ. Clonal evolution in hematological malignancies and therapeutic implications. Leukemia. 2014;28:34–43. doi: 10.1038/leu.2013.248.
  • 10. Klco JM, et al. Functional heterogeneity of genetically defined subclones in acute myeloid leukemia. Cancer Cell. 2014;25:379–392. doi: 10.1016/j.ccr.2014.01.031.
  • 11. Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. 2013;368:2059–2074. doi: 10.1056/NEJMoa1301689.
  • 12. Testa JR, Mintz U, Rowley JD, Vardiman JW, Golomb HM. Evolution of karyotypes in acute nonlymphocytic leukemia. Cancer Res. 1979;39:3619–3627.
  • 13. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330–337. doi: 10.1038/nature11252.
  • 14. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61–70. doi: 10.1038/nature11412.
  • 15. Cancer Genome Atlas Network. Genomic Classification of Cutaneous Melanoma. Cell. 2015;161:1681–1696. doi: 10.1016/j.cell.2015.05.044.
  • 16. Landau DA, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013;152:714–726. doi: 10.1016/j.cell.2013.01.019.
  • 17. Sottoriva A, et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc Natl Acad Sci U S A. 2013;110:4009–4014. doi: 10.1073/pnas.1219747110.
  • 18. Zhang J, et al. Genetic heterogeneity of diffuse large B-cell lymphoma. Proc Natl Acad Sci U S A. 2013;110:1398–1403. doi: 10.1073/pnas.1205299110.
  • 19. Landau DA, et al. Mutations driving CLL and their evolution in progression and relapse. Nature. 2015. doi: 10.1038/nature15395.
  • 20. Mroz EA, Tward AD, Hammon RJ, Ren Y, Rocco JW. Intra-tumor genetic heterogeneity and mortality in head and neck cancer: analysis of data from the Cancer Genome Atlas. PLoS Med. 2015;12:e1001786. doi: 10.1371/journal.pmed.1001786.
  • 21.Kandoth C, et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339. doi: 10.1038/nature12634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Lawrence MS, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495–501. doi: 10.1038/nature12912. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Figueroa ME, et al. DNA methylation signatures identify biologically distinct subtypes in acute myeloid leukemia. Cancer cell. 2010;17:13–27. doi: 10.1016/j.ccr.2009.11.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Figueroa ME, et al. Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer cell. 2010;18:553–567. doi: 10.1016/j.ccr.2010.11.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Rampal R, et al. DNA Hydroxymethylation Profiling Reveals that WT1 Mutations Result in Loss of TET2 Function in Acute Myeloid Leukemia. Cell Rep. 2014;9:1841–1855. doi: 10.1016/j.celrep.2014.11.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Shih AH, et al. Mutational cooperativity linked to combinatorial epigenetic gain of function in acute myeloid leukemia. Cancer Cell. 2015;27:502–515. doi: 10.1016/j.ccell.2015.03.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Landan G, et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat Genet. 2012;44:1207–1214. doi: 10.1038/ng.2442. [DOI] [PubMed] [Google Scholar]
  • 28.Landau DA, et al. Locally disordered methylation forms the basis of intratumor methylome variation in chronic lymphocytic leukemia. Cancer Cell. 2014;26:813–825. doi: 10.1016/j.ccell.2014.10.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Pan H, et al. Epigenomic evolution in diffuse large B-cell lymphomas. Nat Commun. 2015;6:6921. doi: 10.1038/ncomms7921. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Chambwe N, et al. Variability in DNA methylation defines novel epigenetic subgroups of DLBCL associated with different clinical outcomes. Blood. 2014;123:1699–1708. doi: 10.1182/blood-2013-07-509885. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.De S, et al. Aberration in DNA methylation in B-cell lymphomas has a complex origin and increases with disease severity. PLoS genetics. 2013;9:e1003137. doi: 10.1371/journal.pgen.1003137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Shaknovich R, et al. DNA methyltransferase 1 and DNA methylation patterning contribute to germinal center B-cell differentiation. Blood. 2011;118:3559–3569. doi: 10.1182/blood-2011-06-357996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Mazor T, et al. DNA Methylation and Somatic Mutations Converge on the Cell Cycle and Define Similar Evolutionary Histories in Brain Tumors. Cancer Cell. 2015;28:307–317. doi: 10.1016/j.ccell.2015.07.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Greaves M, Maley CC. Clonal evolution in cancer. Nature. 2012;481:306–313. doi: 10.1038/nature10762. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Feinberg AP, Koldobskiy MA, Gondor A. Epigenetic modulators, modifiers and mediators in cancer aetiology and progression. Nat Rev Genet. 2016;17:284–299. doi: 10.1038/nrg.2016.13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Meissner A, et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic acids research. 2005;33:5868–5877. doi: 10.1093/nar/gki901. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Gu H, et al. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nature protocols. 2011;6:468–481. doi: 10.1038/nprot.2010.190. [DOI] [PubMed] [Google Scholar]
  • 38.Akalin A, et al. Base-pair resolution DNA methylation sequencing reveals profoundly divergent epigenetic landscapes in acute myeloid leukemia. PLoS genetics. 2012;8:e1002781. doi: 10.1371/journal.pgen.1002781. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Garrett-Bakelman FE, et al. Enhanced reduced representation bisulfite sequencing for assessment of DNA methylation at base pair resolution. J Vis Exp. 2015 doi: 10.3791/52246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Li S, et al. Dynamic evolution of clonal epialleles revealed by methclone. Genome Biol. 2014;15:472. doi: 10.1186/s13059-014-0472-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Parkin B, et al. Clonal evolution and devolution after chemotherapy in adult acute myelogenous leukemia. Blood. 2013;121:369–377. doi: 10.1182/blood-2012-04-427039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Kronke J, et al. Clonal evolution in relapsed NPM1-mutated acute myeloid leukemia. Blood. 2013;122:100–108. doi: 10.1182/blood-2013-01-479188. [DOI] [PubMed] [Google Scholar]
  • 43.Tawana K, et al. Disease evolution and outcomes in familial AML with germline CEBPA mutations. Blood. 2015 doi: 10.1182/blood-2015-05-647172. [DOI] [PubMed] [Google Scholar]
  • 44.Chou WC, et al. The prognostic impact and stability of Isocitrate dehydrogenase 2 mutation in adult patients with acute myeloid leukemia. Leukemia. 2011;25:246–253. doi: 10.1038/leu.2010.267. [DOI] [PubMed] [Google Scholar]
  • 45.Patel JP, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. The New England journal of medicine. 2012;366:1079–1089. doi: 10.1056/NEJMoa1112304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Miller CA, et al. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput Biol. 2014;10:e1003665. doi: 10.1371/journal.pcbi.1003665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Ong CT, Corces VG. CTCF: an architectural protein bridging genome topology and function. Nat Rev Genet. 2014;15:234–246. doi: 10.1038/nrg3663. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Kemp CJ, et al. CTCF haploinsufficiency destabilizes DNA methylation and predisposes to cancer. Cell Rep. 2014;7:1020–1029. doi: 10.1016/j.celrep.2014.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Nicholson JK, Hubbard M, Jones BM. Use of CD45 fluorescence and side-scatter characteristics for gating lymphocytes when using the whole blood lysis procedure and flow cytometry. Cytometry. 1996;26:16–21. doi: 10.1002/(SICI)1097-0320(19960315)26:1<16::AID-CYTO3>3.0.CO;2-E. [DOI] [PubMed] [Google Scholar]
  • 50.Team RC. R Foundation for Statistical Computing, Vienna, Austria. 2012. R: A language and environment for statistical computing. URL http://www.Rproject.org/
  • 51.Matthias Dodt JTR, Ahmed R, Dieterich C. Flexbar – flexible barcode and adapter processing for next-generation sequencing platforms. MDPI Biology. 2012;1:895–905. doi: 10.3390/biology1030895. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Kent WJ, et al. The human genome browser at UCSC. Genome Res. 2002;12:996–1006. doi: 10.1101/gr.229102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Karolchik D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–496. doi: 10.1093/nar/gkh103. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Roadmap Epigenomics C, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Dobin A, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 2013;41:W77–83. doi: 10.1093/nar/gkt439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B (Methodological) 1995;57:289–300. [Google Scholar]
  • 59.Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.DePristo MA, et al. A framework for variation discovery and genotyping using nextgeneration DNA sequencing data. Nat Genet. 2011;43:491–498. doi: 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Van der Auwera GA, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;11:11 10 11–11 10 33. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Cibulskis K, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Koboldt DC, et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome research. 2012;22:568–576. doi: 10.1101/gr.129684.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Larson DE, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28:311–317. doi: 10.1093/bioinformatics/btr665. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012;6:80–92. doi: 10.4161/fly.19695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Fromer M, et al. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth. Am J Hum Genet. 2012;91:597–607. doi: 10.1016/j.ajhg.2012.08.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Venkatraman ES, Olshen AB. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007;23:657–663. doi: 10.1093/bioinformatics/btl646. [DOI] [PubMed] [Google Scholar]
  • 69.Groschel S, et al. Mutational spectrum of myeloid malignancies with inv(3)/t(3;3) reveals a predominant involvement of RAS/RTK signaling pathways. Blood. 2015;125:133–139. doi: 10.1182/blood-2014-07-591461. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009;25:2865–2871. doi: 10.1093/bioinformatics/btp394. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Jaiswal S, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med. 2014;371:2488–2498. doi: 10.1056/NEJMoa1408617. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Sherry ST, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. doi: 10.1093/nar/29.1.308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Forbes SA, et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43:D805–811. doi: 10.1093/nar/gku1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Hoeffding W. A Non-Parametric Test of Independence. The Annals of Mathematical Statistics. 1948;19:546–557. [Google Scholar]
