Author manuscript; available in PMC: 2020 Nov 21.
Published in final edited form as: Nat Med. 2020 Sep 7;26(11):1726–1732. doi: 10.1038/s41591-020-1033-y

Genomic copy number predicts esophageal cancer years before transformation

Sarah Killcoyne 1,2,*, Eleanor Gregson 1,*, David C Wedge 3,4, Dan J Woodcock 3, Matthew Eldridge 5, Rachel de la Rue 6, Ahmad Miremadi 6, Sujath Abbas 1, Adrienn Blasko 1, Cassandra Kosmidou 1, Wladyslaw Januszewicz 1, Aikaterini Varanou Jenkins 1, Moritz Gerstung 2,+, Rebecca C Fitzgerald 1,+
PMCID: PMC7116403  EMSID: EMS86618  PMID: 32895572

Summary

Recent studies show that aneuploidies and driver gene mutations precede cancer diagnosis by many years1–4. We assess whether these genomic signals can be used for early detection and pre-emptive cancer treatment, using the neoplastic precursor lesion Barrett’s esophagus as an exemplar5. Shallow whole genome sequencing of 777 biopsies, sampled from 88 patients in Barrett’s surveillance over a period of up to 15 years, shows that genomic signals can distinguish progressive from stable disease even ten years prior to histopathological transformation. These findings are validated on two independent cohorts of 76 and 248 patients. These methods are low cost and applicable to standard clinical biopsy samples. Compared with current management guidelines based on histopathology and clinical presentation, genomic classification enables earlier treatment for high-risk patients as well as reduction of unnecessary treatment and monitoring for patients who are unlikely to develop cancer.


Early diagnosis of cancer is one of the best strategies to improve patient survival and decrease treatment-related side-effects that contribute to poorer health; however, this strategy poses a risk of overtreatment6. Therefore, accurate biomarkers of early cancer progression are needed to stratify patients. Copy number (CN) alterations, though common in cancer, are rarely found in normal tissues, raising the question of whether these signals could help diagnose patients earlier.

This strategy can be tested in esophageal adenocarcinoma (EAC), which has a 5-year survival rate of less than 20%7. Its precursor tissue is known as Barrett’s esophagus (BE); however, the risk for a patient with BE progressing to EAC is only around 0.3% per annum8. Current surveillance programs focus on the presence and grade of dysplasia in BE patients as determined by histopathological examination of biopsies. Low- and high-grade dysplasia (LGD, HGD) are used as surrogates for early cancer transformation and trigger intervention, commonly by endoscopic resection and radiofrequency ablation9,10. Additional risk factors for progression include increasing age, male gender, greater length of the BE segment, and tobacco use at the initial evaluation, although these are not yet part of clinical guidelines11.

Improvements to risk assessment have focused on identifying individual molecular biomarkers, particularly p53 expression12–16 and DNA methylation changes17,18. However, identification of mutational biomarkers for progression has been difficult, due to the low frequency of recurrent point mutations in either BE19 or EAC20,21. Instead, EAC and BE are characterized by early and frequent genomic (CN and structural) instability20–24. As ongoing genomic instability leads to a large extent of clonal diversity, multiple investigations have focused on the heterogeneity and diversity of BE tissues25 as markers of higher risk26–29.

We investigated genome-wide CN instability as a marker for risk of progression using shallow whole genome sequencing (sWGS; average depth 0.4x) in a retrospective, demographically matched, case-control cohort of patients (n=88) with all available endoscopy samples (n=777) collected during clinical surveillance for BE (Fig. 1a). sWGS was chosen as the protocol as it provides a genome-wide perspective on CN and the level of genomic instability and has been optimised for use in formalin-fixed paraffin embedded (FFPE) samples30.

Figure 1. Copy number profiles in Barrett’s Esophagus vary over space and time.

a. Shows the case-control cohort design for the discovery patient cohort (n=88). Non-progressor patients had a minimum follow-up of 3 years, progressor patients had a minimum one year follow-up and all patients start at NDBE. Archival samples were collected from every available endoscopy over time, and along the length of the BE segment. b-d Bar plots showing the adjusted CN values across the genome in 5Mb windows, with relative (within each sample) gains shown in the positive y-axis, and relative losses shown in the negative y-axis. b. Genomic CN profiles of individual samples for a progressor patient (P, top) and a non-progressor patient (NP, bottom). The colors across the chromosomes in each sample are based on the location relative to the stomach it was taken in the esophagus (sample nearest to the esophageal-gastric junction at the bottom, up the BE segment) and the ideograms to the right of the plots show the samples that belong to a single endoscopy indicated by the year. Note the variability in the CN profiles within samples from the progressor patient in chromosomes 14 and 17 in contrast to the shared pattern across the non-progressor patient in those regions. c-d, distribution of relative CN values at each genomic segment across all samples in the progressor and non-progressor patient groups. The grey in the middle is the median ± 1SD, indicating a likely diploid genome value. Purple and green show the range of relative gains and losses, respectively. In c all samples regardless of pathology are plotted and a large variation in the CN between progressor and non-progressor patients is clear (i.e. chromosomes 1, 4, 9, 11). In d only NDBE samples from both patient groups are plotted and the progressor patients still show a much larger CN signal despite being pathologically indistinguishable.

CN patterns were examined at multiple levels of the esophagus to understand how patients who progress differ from non-progressors. We observed that the genomes of individual progressive patients display a generalized disorder across the genome that varies between samples and over time (Fig. 1b). Additionally, CN changes were not confined to cytological atypia (e.g. LGD, HGD), since similar profiles were observed for the non-dysplastic BE (NDBE) samples (n=518; Fig. 1c-d).

The CN information and a measure of overall complexity (Methods; Supplementary Fig. 1) were used to generate a cross-validated elastic-net regularized logistic regression model of progression and classification with the endpoint HGD or intramucosal cancer (IMC; Methods). The model was subsequently validated using an independent cohort of 76 patients (n=213 samples), alongside an orthogonal validation of the Seattle BE Study SNP array samples (n=1272)31.

This model was designed to be independent of demographic risk factors11 as our cohort was matched for sex, BE segment length, age at diagnosis, and smoking status (Supplementary Table 1). We used the area under the receiver operating characteristic (ROC) curve (AUC) to evaluate the model training performance. As the model included the diagnostic samples with the most extreme CN (e.g. HGD and IMC), we additionally trained a model excluding these and found that the AUC concordance was high (Supplementary Fig. 2a), indicating that the model was not sensitive to extreme samples. Aggregating predictions either per-endoscopy (mean or max sample predictions) or per-patient (mean or max predictions excluding HGD/IMC samples) did not measurably increase the prognostic accuracy (Supplementary Fig. 2b), suggesting that a single sample (e.g. pooled 4-quadrant biopsy) may be sufficient for prediction, which could be ideal for clinical application.
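The per-endoscopy aggregation described above can be illustrated with a short sketch. This is not the authors' implementation (their analysis was done in R); the function name `aggregate_by_endoscopy` and the input layout of `(endoscopy_id, probability)` pairs are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_endoscopy(predictions, how="max"):
    """Combine per-sample probabilities into one value per endoscopy.

    predictions: iterable of (endoscopy_id, probability) pairs (hypothetical layout).
    how: "max" or "mean", the two aggregations mentioned in the text.
    """
    reducers = {"max": max, "mean": mean}
    if how not in reducers:
        raise ValueError("how must be 'max' or 'mean'")
    grouped = defaultdict(list)
    for endoscopy_id, probability in predictions:
        grouped[endoscopy_id].append(probability)
    return {eid: reducers[how](probs) for eid, probs in grouped.items()}


# Made-up probabilities for two endoscopies of one patient.
example = [("endo_1", 0.2), ("endo_1", 0.6), ("endo_2", 0.1)]
per_endoscopy_max = aggregate_by_endoscopy(example, how="max")
per_endoscopy_mean = aggregate_by_endoscopy(example, how="mean")
```

The same grouping keyed on a patient identifier would give the per-patient aggregation.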

Using all sample predictions generated by the model we evaluated the relative risk (RR) across the cohort. Those samples with the highest RR were more than 20 times more likely to progress than average, while those with the lowest RR were 10 times less likely (Fig. 2a). This information enabled us to calibrate risk classifications based on the enrichment of samples from progressor or non-progressor patients to maximize the sensitivity of our classes: ‘low’ (Pr≤0.3; sensitivity=0.87, specificity=0.65), ‘moderate’ (0.3<Pr<0.5), or ‘high’ (Pr≥0.5; sensitivity=0.72, specificity=0.82).
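As a minimal sketch of the three risk classes, the thresholds quoted above can be written as a simple lookup. The cut-offs (0.3 and 0.5) come from the text; the function name `classify_risk` is hypothetical and is not taken from the authors' R package.

```python
def classify_risk(pr: float) -> str:
    """Map a predicted probability of progression (Pr) to a risk class.

    Thresholds follow the text: low (Pr <= 0.3), high (Pr >= 0.5),
    moderate for anything in between.
    """
    if not 0.0 <= pr <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if pr <= 0.3:
        return "low"
    if pr >= 0.5:
        return "high"
    return "moderate"


# Made-up probabilities, one per sample.
classes = [classify_risk(p) for p in (0.05, 0.30, 0.42, 0.50, 0.93)]
```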

Figure 2. Genomic predictions of Barrett’s esophagus progression.

a. Histogram of the relative risk (RR) of cancer progression across the cohort based on the leave-one-patient-out predictions. The highest RR corresponds to a more than 30x greater risk of progression (dark reds) while the lowest RR corresponds to a 10x lower risk (dark blues). Inset: calibration of the predicted (x-axis) and observed (y-axis) probability of progression, evaluated in deciles. The ‘low’ (blue) and ‘high’ (red) risks are enriched for non-progressor and progressor patients respectively. b. Sample risk classifications in the discovery cohort of 88 patients (n=773 samples) plotted per pathology (i.e. NDBE, ID, LGD, HGD, IMC). These show that our model is able to predict progression before pathological changes are visible in NDBE samples and that c. these predictions are consistent in the independent validation cohort of 76 patients (n=213 samples). d. Illustration of risk classes across all samples in the discovery cohort (n=773). The row above the line shows progressor patients, while the row below the line shows non-progressor patients. Each group of tiles denotes samples from a single patient, indicated by the patient number above. On the x-axis endoscopies are plotted from the baseline on the left to the final available endoscopy on the right. The y-axis indicates the relative location of the sample, starting from the esophageal-gastric junction at the bottom and moving up the length of the BE segment. The pop-out for patient 69 shows example axis labels; all heatmaps, including axis labels and pathology, are provided in Supplementary Figure 11.

Samples from patients who progressed were classified as “high risk” for progression independent of histopathology (Fig. 2b). Most importantly, CN profiles in NDBE samples that belonged to progressor patients were classified as high risk in 60.5% (104/172) while in non-progressor patients 64.7% (224/346) of samples were classified as “low risk”.

The model was then used to predict and classify risks per-sample for the validation cohort (76 patients, 213 samples). 78/142 (55%) samples from non-progressor patients were classified as low risk, and 55/71 (77%) of samples from patients who progressed were classified as high risk. As in the discovery cohort, high risk classification of progressor patient samples was largely independent of histopathology (Fig. 2c). Similarly, when we used our model to classify the historical Seattle study patient dataset (n=248, samples=1273 SNP array) we again find that samples from progressors are classified as high risk regardless of pathology (Supplementary Fig. 3-4). However, in this case the algorithm unsurprisingly suffers a loss of accuracy due to the differences in the methodology (see Supplemental Methods and Results for complete analysis and endpoint differences).

When sample classifications were plotted according to their spatial distribution in the segment and time of collection in the clinical history, strikingly concordant patterns emerged. Most progressive patient samples are classed as high risk throughout the disease history, while non-progressive patient samples are consistently low risk (Fig. 2d, Supplementary Fig. 5). This concordance is evident when we plot the highest risk at each timepoint per patient (Fig. 3a). For patients that progress, 50% (8/16) of endoscopies had at least one sample classified as high risk 8 or more years prior to transformation. This classification is in accordance with current diagnostic guidelines requiring only a single dysplastic sample to recommend treatment for a patient (Fig. 3b). Cases which lack early CN patterns of progression acquired these over the following years, leading to 78% (18/23) of endoscopies with at least one high risk sample one to two years prior to HGD/IMC diagnosis.

Figure 3. Cancer risk over time.

a. Per-endoscopy mean aggregated risks plotted per-patient (y-axis) over time (x-axis) in the months since the initial endoscopy. The top plot shows patients who progressed; in most of these patients we consistently classified their samples as ‘high’ risk. Similarly, in the non-progressors we consistently predict a low-risk group. The interesting patients are the non-progressors who have consistently been ‘high’ risk. Follow-up continues on these patients and it is possible that they may ultimately progress to HGD/IMC. b. Progressive patients only, showing that CN can identify 50% of high-risk patients more than 8 years prior to HGD or cancer.

More interesting were the patients who have not yet progressed but display a consistent pattern of high-risk endoscopies. Two patients were high risk in every sequenced sample, while the remaining patients displayed a mix of risks at each timepoint (Fig. 2d), presenting what could be clonal diversity in very early progression to EAC (follow-up for these patients continues) and resulting in consistent high-risk over time (Fig. 3a).

Statistical algorithms can be improved by increasing the size of the dataset. We therefore conducted sub-sampling of the discovery cohort with increasing numbers of patients and model training as described (Methods). With each increment in the number of patients the predictive accuracy of the model increased, reaching a (cross validated) AUC of 0.89 (specificity=0.83, sensitivity=0.82) when combining all discovery and validation patients (n=164; Supplementary Fig. 6), indicating that a larger knowledge bank of CN and progression data from BE will continue to improve the precision of patient stratification and the sensitivity of the model by adding stronger statistical signals and accounting for broader biological variation.

Current guidelines for the management of BE focus on the length of the BE segment and the presence or absence of LGD/HGD in any biopsy sample taken during endoscopy32,33. Most of our patients were in surveillance prior to the current treatment recommendations for LGD, and hence we can compare a set of recommendations based on the current guidelines33 with those of our model, which applies similar criteria but overlays our risk classifications (Fig. 4a). We applied these recommendations across our entire discovery cohort (88 patients) and evaluated the first two endoscopies available excluding the endpoint (Fig. 4b, Supplementary Table 4). Using these criteria at the patient’s second surveillance endoscopy available (i.e. several years prior to transformation), 54% of progressor patients (19/35) would have received earlier treatment. Only 5 of these patients had repeat LGD diagnoses that could recommend earlier treatment or more aggressive surveillance under current pathology-based guidelines. 40% of progressor patients (14/35) would continue to receive yearly surveillance per current guidelines. The remaining 6% (2/35) would have been recommended reduced surveillance (3-5 years); however, they would not have been diagnosed any differently under current guidelines as they were consistently NDBE. One patient (13) may have had delayed treatment, but this would have occurred under current guidelines as well, as no dysplasia was identified prior to transformation. 51% of patients who have not progressed (21/40) would have less frequent endoscopies, 33% (13/40) would continue to receive yearly surveillance per current guidelines, and 17% (7/40) would have had potentially unnecessary treatment compared to current guidelines. Three patients from our discovery cohort are shown with the guidelines compared (Fig. 4c-e) as examples. Furthermore, the increasing sensitivity of the model as samples are taken closer to the endpoint is evident, as most progressive patients are recommended treatment at their penultimate endoscopy while none would be recommended longer surveillance times (Supplementary Table 4).

Figure 4. CN profiling facilitates earlier treatment and reduced monitoring.

a. Provides a schematic overview of surveillance guidelines based on the CN model risk classes. It is important to note that these guidelines would apply at each endoscopy, and that they use information from the previous endoscopy to determine the treatment or surveillance. b. Uses this schematic to characterize the discovery cohort patients after their second endoscopy (many years prior to dysplastic transformation). The blue bar at the top indicates the number of non-progressor patients who would have reduced treatment needs over time, while the red bar at the bottom shows those progressor patients who would have had earlier intervention. The bars in the middle two groups would be the same as current guidelines. c-e. Individual patients with each sample plotted at the time of endoscopy. Samples are colored based on their risk class. Relevant clinical information is included above each endoscopy plot including the length of the BE segment and patient age at diagnosis. The recommendations for each patient based on the 2015 BE management guidelines33 are shown on each patient tile. Below the patients are the overall follow-up recommendations for the current guidelines and the CN model.

Recent evidence from large-scale pan-cancer studies has suggested that genomic alterations are present many years before detectable disease1 in many cancer types. BE constitutes a known pre-malignant condition with historical follow-up to test whether genomic medicine can contribute to early cancer detection. Previous studies of BE progression have shown that genomic and epigenetic changes are present prior to cancer progression and differ in patients who do ultimately develop cancer, including: p53 expression12,14; DNA methylation changes17,18; CN losses and copy neutral loss of heterozygosity26,28,34; and high clonal diversity27.

However, our analysis has shown that even highly variable CN profiles generated from the entire biopsy sample (not dissected or separated) translate into surprisingly stable predictions of a patient’s risk of progression. Further, these single-sample predictions were as accurate as aggregated data from multiple biopsies across the entire endoscopy or patient, showing that despite high levels of divergence there are common patterns of CN alterations indicative of progressive disease. This level of predictive power using a genome-wide algorithm is more challenging to achieve with a focussed biomarker approach given the disease heterogeneity.

Perhaps most interesting for biomarker investigations is that, while our statistical model selects some genomic regions of instability as features that are known to be early drivers of EAC (e.g. TP53; Supplementary Fig. 7), few other features have any clearly associated tumour suppressor genes or other cancer-related activity (Supplementary Table 3). The heterogeneous nature of BE would partly explain the differences between the features our model selects as significantly contributing to progression and those found in previous studies28; however, there is currently no clear functional explanation for most of the features identified. It is likely that the sum of many small changes and the breakdown of gene regulatory control fuels oncogenicity.

While this study provides good evidence that genomic changes can predict future cancer risk, it is limited by the relatively small number of patients in the cohort, particularly progressive patients. Future studies that include more longitudinal genomic data will improve the sensitivity and specificity estimates of this model.

Ultimately, the combined use of low-cost genomic technologies, standard clinical samples and statistical modelling presented here is an example of how genomic medicine can be implemented for early detection for cancer. This demonstrates that genomic risk stratification has a realistic potential to enable earlier intervention for high-risk conditions, and at the same time reduce the intensity of monitoring and even reduce overtreatment in cases of stable disease.

Methods

Patient cohorts

A nested case-control cohort of 90 patients was initially recruited to this study from patients that had been under surveillance for BE in the East of England from 2001 to 2016 for a total of 632 person years. Permission to analyse existing clinical diagnostic samples was approved by the relevant institutional ethics committees (REC 14-NW-0252). Cases comprised 45 patients who progressed from NDBE to HGD or IMC with a minimum follow-up of 1 year (mean 4.6 ± 3.7 years). Controls were 45 patients who had not progressed beyond LGD starting from NDBE with a minimum follow-up of 3 years (mean 6.7 ± 3.2 years). Cases and controls were matched for age, gender, and length of BE segment (Supplementary Table 1). Patients had endoscopies at intervals determined by clinical guidelines with 4-quadrant biopsies taken every 2cm of BE length (Seattle protocol). One non-progressor patient revoked consent prior to analysis, and a second non-progressor was later removed during analysis when multiple comorbidities affecting the esophagus were identified. A total of 777 samples were sequenced with 773 passing our post-processing quality control.

An independent unmatched cohort of 76 patients was subsequently selected from patients under surveillance for BE in the East of England from 2001 to 2018 for model validation. This cohort comprised 18 patients who had progressed from NDBE to HGD or IMC with a minimum follow-up of 1 year (mean 6.1 ± 3.4 years) and 58 patients who had not progressed beyond LGD starting from NDBE with a minimum follow-up of 1.5 years (mean 5.4 ± 3.0 years). The earliest available endoscopy samples subsequent to initial BE diagnosis were obtained to assess future risk. No endpoint samples (e.g. HGD or IMC) were included. This cohort was selected from available samples with no attempt to match demographics; however, no significant differences were found between the groups (Supplementary Table 2). A total of 219 samples were sequenced from this cohort, with 213 passing our post-processing quality control.

Each sample from both cohorts was graded by multiple expert GI histopathologists using current clinical guidelines for IMC, HGD, LGD, indeterminate (ID), and NDBE. A single biopsy graded as HGD or IMC was considered the endpoint for progression as patients were immediately recommended treatment in the clinic. Since 2014, patients with LGD have also been routinely treated with radiofrequency ablation (RFA), making prospective analysis of the real rate of progression difficult.

All patients had previously given informed consent to be part of the following studies: Progressor study (REC -10/H0305/52), Barrett’s Biomarker Study (REC -01/149), OCCAMs (REC 07/H0305/52 and 10/H0305/1), BEST (REC 06/Q0108/272), BEST2 (REC 10/H0308/71), Barrett’s Gene Study (REC 02/2/57), Time & TIME 2 (REC 09/H0308/118), NOSE study (REC 08/H0308/272), Sponge study (REC 03/306).

Patient samples from the Seattle Barrett’s Esophagus Study31, which use SNP arrays as an orthogonal measure of CN with an endpoint of EAC, were also included for further validation (Supplemental Methods and Results).

Tissue Sample Processing and p53 IHC

Formalin fixed, paraffin embedded (FFPE) tissue samples from routine surveillance endoscopies were processed from scrolls, without microdissection as this protocol aims to be clinically relevant. Following the Seattle protocol for endoscopic surveillance 4-quadrant biopsies were taken every 1-2cm of the Barrett’s length at each endoscopy per patient. At each 1-2cm length the quadrant biopsies were pooled for sequencing as a single sample to ensure sufficient DNA (75ng) was present.

An additional section at each level of the Barrett’s segment (n=88 patients, n=590 sections) was stained by immunohistochemistry (IHC) using a monoclonal antibody for wild-type and mutant p53 (NCL-L-p53-D07) at the NHS Addenbrooke’s Hospital UK on the Leica BOND-MAX™ system using Bond Polymer Refine Detection reagents (Leica Microsystems UK Ltd., Milton Keynes, UK) and graded by an expert pathologist as aberrant (absent or over-expressed) or normal35,36.

Shallow whole genome sequencing pipeline

Single-end 50-base pair sequencing was performed at a depth of 0.4X on the Illumina HiSeq platform. Sequence alignment was performed with BWA37 v.0.7.15, and pre-processing of the reads for mappability, GC content, and filtering was performed with QDNAseq30 using 50kb bins. Only autosomal sequences are retained after filtering due to low-depth mappability and GC correction. Samples were segmented for CN analysis using the piecewise constant fit function (pcf) in the R Bioconductor `copynumber` v1.16 package38. Input to this function was the GC adjusted read counts from QDNAseq (Supplementary Fig. 8).

Post-processing quality control

Per-segment residuals were calculated and the overall variance across the median absolute deviation of the segment residuals was derived as a per-sample quality control measure. This measure was developed using an additional set of samples (n=233), from fresh-frozen tumor tissue, FFPE cell-line tissue, and FFPE patient samples. No relationship was found between sample age and data quality, and post-segmentation quality issues were not resolvable (Supplementary Fig. 9). Therefore, samples with a mean variance of the segment residuals greater than 0.008 were excluded from analysis. This excluded more than 73% (171/233) of the quality control samples across all sample types (FFPE patient, FFPE cell line, fresh-frozen tumor). In the discovery cohort we excluded 0.5% (4/777) of samples, and in the validation cohort 2.7% (6/219) of samples.
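The quality-control measure can be sketched as follows, under one possible reading of the description above. The text does not fully specify the computation, so everything except the 0.008 cut-off is an assumption: here each segment contributes the median absolute deviation (MAD) of its residuals, and the per-sample score is the population variance of those MADs across segments. The input layout and function names are hypothetical.

```python
from statistics import median, pvariance

QC_CUTOFF = 0.008  # exclusion threshold quoted in the text


def mad(values):
    """Median absolute deviation around the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def sample_qc_score(segments):
    """Variance across segments of the per-segment residual MAD (assumed reading).

    segments: list of (fitted_value, bin_values) pairs, where bin_values are
    the GC-adjusted bin values assigned to that segment (hypothetical layout).
    """
    segment_mads = []
    for fitted_value, bin_values in segments:
        residuals = [b - fitted_value for b in bin_values]
        segment_mads.append(mad(residuals))
    return pvariance(segment_mads)


def passes_qc(segments, cutoff=QC_CUTOFF):
    """True when the sample would be retained under this reading."""
    return sample_qc_score(segments) <= cutoff


# Made-up segments: one quiet sample and one noisy sample.
quiet = [(0.0, [0.0, 0.1, -0.1, 0.05]), (1.0, [1.0, 1.2, 0.8, 1.1])]
noisy = [(0.0, [0.0, 0.0, 0.0]), (0.0, [1.0, -1.0, 0.5, -0.5])]
```

Whether a sample or population variance was used, and how residuals were scaled, cannot be determined from the text; the published R package is the authoritative reference.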

Statistical methods

We encoded all CN data on a genome-wide scale by taking a weighted average of the segmented values per 5Mb windows, and mean standardization per genomic window. In order to evaluate chromosomal instability on a larger scale we averaged the segmented values across chromosome arms and adjusted each 5Mb window by the difference between the window and the arm. The resulting data was 589 5Mb windows and 44 chromosome arms. We additionally included a measure of genomic complexity (‘cx’) by summing, per-sample, the 5Mb windows that had CN values two standard deviations from the mean.
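A minimal sketch of the window encoding and the complexity measure, assuming segments are given as `(start, end, value)` tuples and that the weights are the overlap lengths between a segment and a window. The text does not spell out these details, so both are assumptions, as are all function names. The arm-level adjustment and the per-window standardization across samples are not shown.

```python
WINDOW = 5_000_000  # 5Mb windows, as in the text


def window_weighted_mean(segments, win_start, win_end):
    """Overlap-weighted mean of segment values inside one window.

    segments: list of (start, end, value) with half-open coordinates.
    Returns None when no segment overlaps the window.
    """
    total = 0.0
    covered = 0
    for seg_start, seg_end, value in segments:
        overlap = min(seg_end, win_end) - max(seg_start, win_start)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    return total / covered if covered else None


def encode_chromosome(segments, chrom_length, window=WINDOW):
    """One weighted-mean value per window along a chromosome."""
    return [
        window_weighted_mean(segments, start, min(start + window, chrom_length))
        for start in range(0, chrom_length, window)
    ]


def complexity(standardized_windows, n_sd=2.0):
    """Count windows at least n_sd standard deviations from the mean.

    Expects values that have already been standardized per window.
    """
    return sum(
        1 for z in standardized_windows if z is not None and abs(z) >= n_sd
    )


# Made-up segments on an 8Mb toy chromosome.
toy_segments = [(0, 2_000_000, 1.0), (2_000_000, 5_000_000, 0.0),
                (5_000_000, 8_000_000, -0.5)]
toy_windows = encode_chromosome(toy_segments, 8_000_000)
toy_cx = complexity([0.1, -2.3, 2.0, 1.9])
```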

We performed elastic-net regression with the R glmnet39 package to fit regression models with varying regularization parameters. 5-fold cross validation repeated 10 times was performed on a per-patient basis removing all samples from 20% of patients in each fold. This process was performed in three conditions: using all samples; excluding HGD/IMC samples; excluding LGD/HGD/IMC. The two exclusion conditions were performed in order to assess the contribution of dysplasia to the classification rate of the model.
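The patient-level cross-validation can be illustrated with a short sketch that assigns whole patients, rather than individual samples, to folds, so that no patient contributes samples to both the training and held-out sets. This is an illustration in Python, not the authors' R code; the function names and the `sample_to_patient` mapping are hypothetical.

```python
import random
from collections import defaultdict


def patient_folds(sample_to_patient, k=5, seed=0):
    """Split samples into k folds so each patient falls in exactly one fold.

    sample_to_patient: dict mapping sample id -> patient id.
    Returns a list of k lists of sample ids.
    """
    patients = sorted(set(sample_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Deal shuffled patients round-robin into folds (about 1/k of patients each).
    fold_of_patient = {p: i % k for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for sample, patient in sample_to_patient.items():
        folds[fold_of_patient[patient]].append(sample)
    return [sorted(folds[i]) for i in range(k)]


def repeated_patient_folds(sample_to_patient, k=5, repeats=10):
    """Yield (repeat_index, folds) for repeated k-fold CV with fresh shuffles."""
    for r in range(repeats):
        yield r, patient_folds(sample_to_patient, k=k, seed=r)


# Toy cohort: 10 patients with 3 samples each.
toy_cohort = {f"s{p}_{i}": f"P{p}" for p in range(10) for i in range(3)}
toy_folds = patient_folds(toy_cohort, k=5, seed=1)
```

With `k=5`, each held-out fold contains all samples from roughly 20% of patients, matching the description above.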

The model was additionally tuned on two parameters: 1) QDNAseq bin size; and 2) elastic-net regression penalty, between 0 (ridge) and 1 (lasso). We assessed the cross-validation classification performance of the model at multiple QDNAseq bin sizes, and at multiple regression penalties. We selected the final QDNAseq bin size by comparing the leave-one-patient-out predictions from the discovery cohort to the model predictions for the validation cohort. This was done to minimize the batch errors in the raw data (Supplementary Figs. 10-11). For the regression penalty parameter, all models had a cross-validation classification rate of 72-75%. We therefore selected the parameter that limited the number of coefficients (n=74) and was not full lasso (i.e. 0.9). Coefficients determining the logarithmic relative risk change stemming from a unit change were calculated for each genomic region selected.
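For reference, the penalized logistic regression objective minimized by glmnet can be written as below. The notation follows the glmnet documentation rather than the manuscript: \(\alpha\) is the elastic-net mixing parameter referred to above as the regression penalty (0 for ridge, 1 for lasso, 0.9 selected here), \(\lambda\) is the overall regularization strength, \(x_i\) is the vector of CN features for sample \(i\), and \(y_i \in \{0,1\}\) indicates whether the sample comes from a progressor. These symbols do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
\;+\; \lambda\Big[ \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1 \Big]
```

With \(\alpha = 0.9\) the \(\ell_1\) term dominates, so most coefficients are shrunk exactly to zero, while the small ridge component helps stabilize selection among correlated neighbouring genomic windows.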

Subsequently, a leave-one-patient-out analysis (excluding all samples of an individual) was performed to generate predictions for all samples from a single individual and to estimate the overall model accuracy as the area under the ROC curve, computed with the R pROC40 package.
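The leave-one-patient-out scheme and the AUC can be sketched in a few lines. The authors used the R pROC package; the Python below is only an illustration. It computes the AUC through its rank interpretation: the probability that a randomly chosen progressor sample scores higher than a randomly chosen non-progressor sample, with ties counted as one half. Function names and the toy data are hypothetical.

```python
def leave_one_patient_out(sample_to_patient):
    """Yield (held_out_patient, train_samples, test_samples) splits.

    All samples of the held-out patient are removed from training together.
    """
    for held_out in sorted(set(sample_to_patient.values())):
        train = [s for s, p in sample_to_patient.items() if p != held_out]
        test = [s for s, p in sample_to_patient.items() if p == held_out]
        yield held_out, train, test


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for progressor samples, 0 for non-progressor samples.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up sample-to-patient mapping and predictions.
toy_mapping = {"s1": "A", "s2": "A", "s3": "B"}
toy_splits = list(leave_one_patient_out(toy_mapping))
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based formula would scale better.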

Acknowledgements

We thank the patients who donated tissue samples to this project. The laboratory of R.C.F. is funded by a Core Programme Grant from the Medical Research Council (RG84369). This work was also funded by a United European Gastroenterology Research Prize (RG76026). We thank the Human Research Tissue Bank, which is supported by the UK National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre, from Addenbrooke’s Hospital. Additional infrastructure support was provided from the Cancer Research UK–funded Experimental Cancer Medicine Centre. We also thank Brian J. Reid, Patricia C. Galipeau, and Carissa A. Sanchez from the Fred Hutchinson Cancer Research Center for their time and help in understanding their data, as well as Alexander Wolfgang Jung from the EMBL-EBI for his time.

Footnotes

Data availability

Sequencing data and associated metadata that support this study have been deposited in the European Genome-phenome Archive under accession EGAD00001006033. The code and model that support these findings have been provided as an R package in a GitHub repository (https://github.com/gerstung-lab/BarrettsProgressionRisk).

Author contributions

S.K. developed the statistical methods, analysed data, and wrote the manuscript and supporting information with input from E.G., R.C.F., and M.G. E.G. put together the discovery cohort, developed the sWGS methods, generated the sWGS data, and curated the clinical information with support from A.V.J. S.K. and E.G. are joint first authors. The initial processing pipeline was developed by D.C.W., D.J.W. and M.E., who also provided input to the data analysis for the sWGS data. W.J., R.R., C.K. and A.M. identified, collected, and assessed pathology for patient samples. S.A., A.B., and C.K. sequenced the validation cohort and QC samples. R.C.F. initiated the study and jointly supervised it with M.G.; R.C.F. and M.G. are joint corresponding authors.

References

  • 1.Gerstung M, et al. The evolutionary history of 2,658 cancers. Nature. 2020;578:122–128. doi: 10.1038/s41586-019-1907-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Mitchell TJ, et al. Timing the Landmark Events in the Evolution of Clear Cell Renal Cell Cancer: TRACERx Renal. Cell. 2018 doi: 10.1016/j.cell.2018.02.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Lee JJ-K, et al. Tracing Oncogene Rearrangements in the Mutational History of Lung Adenocarcinoma. Cell. 2019 doi: 10.1016/j.cell.2019.05.013. [DOI] [PubMed] [Google Scholar]
  • 4.Abelson S, et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature. 2018;559:400–404. doi: 10.1038/s41586-018-0317-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Gregson EM, Bornschein J, Fitzgerald RC. Genetic progression of Barrett’s oesophagus to oesophageal adenocarcinoma. Br J Cancer. 2016;115:403–410. doi: 10.1038/bjc.2016.219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Esserman LJ et al. Addressing overdiagnosis and overtreatment in cancer: A prescription for change. Lancet Oncol. 2014;15:e234–e242. doi: 10.1016/S1470-2045(13)70598-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Siegel RL, Miller KD, Jemal A. Cancer statistics. CA Cancer J Clin. 2016;66:7–30. doi: 10.3322/caac.21332. [DOI] [PubMed] [Google Scholar]
  • 8. Masclee GMC, Coloma PM, De Wilde M, Kuipers EJ, Sturkenboom MCJM. The incidence of Barrett’s oesophagus and oesophageal adenocarcinoma in the United Kingdom and the Netherlands is levelling off. Aliment Pharmacol Ther. 2014;39:1321–1330. doi: 10.1111/apt.12759.
  • 9. Phoa KN, et al. Radiofrequency ablation vs endoscopic surveillance for patients with Barrett esophagus and low-grade dysplasia: A randomized clinical trial. JAMA. 2014;311:1209–1217. doi: 10.1001/jama.2014.2511.
  • 10. Shaheen NJ, et al. Radiofrequency Ablation in Barrett’s Esophagus with Dysplasia. N Engl J Med. 2009;360:2277–2288. doi: 10.1056/NEJMoa0808145.
  • 11. Parasa S, et al. Development and Validation of a Model to Determine Risk of Progression of Barrett’s Esophagus to Neoplasia. Gastroenterology. 2018;154:1282–1289.e2. doi: 10.1053/j.gastro.2017.12.009.
  • 12. Younes M, et al. p53 protein accumulation predicts malignant progression in Barrett’s metaplasia: a prospective study of 275 patients. Histopathology. 2017;71:27–33. doi: 10.1111/his.13193.
  • 13. Pettit K, Bellizzi A. Evaluation of p53 Immunohistochemistry Staining Patterns in Barrett Esophagus With Low-Grade Dysplasia. Am J Clin Pathol. 2015;144:A382.
  • 14. Sikkema M, et al. Aneuploidy and Overexpression of Ki67 and p53 as Markers for Neoplastic Progression in Barrett’s Esophagus: A Case–Control Study. Am J Gastroenterol. 2009;104:2673–2680. doi: 10.1038/ajg.2009.437.
  • 15. Keswani RN, Noffsinger A, Waxman I, Bissonnette M. Clinical use of p53 in Barrett’s esophagus. Cancer Epidemiol Biomarkers Prev. 2006;15:1243–9. doi: 10.1158/1055-9965.EPI-06-0010.
  • 16. Reid BJ, et al. Predictors of progression in Barrett’s esophagus II: baseline 17p (p53) loss of heterozygosity identifies a patient subset at increased risk for neoplastic progression. Am J Gastroenterol. 2001;96:2839–2848. doi: 10.1111/j.1572-0241.2001.04236.x.
  • 17. Alvi MA, et al. DNA methylation as an adjunct to histopathology to detect prevalent, inconspicuous dysplasia and early-stage neoplasia in Barrett’s esophagus. Clin Cancer Res. 2013;19:878–888. doi: 10.1158/1078-0432.CCR-12-2880.
  • 18. Jin Z, et al. A multicenter, double-blinded validation study of methylation biomarkers for progression prediction in Barrett’s esophagus. Cancer Res. 2009;69:4112–4115. doi: 10.1158/0008-5472.CAN-09-0028.
  • 19. Weaver JMJ, et al. Ordering of mutations in preinvasive disease stages of esophageal carcinogenesis. Nat Genet. 2014;46:837–843. doi: 10.1038/ng.3013.
  • 20. Secrier M, et al. Mutational signatures in esophageal adenocarcinoma define etiologically distinct subgroups with therapeutic relevance. Nat Genet. 2016;48:1131–1141. doi: 10.1038/ng.3659.
  • 21. Frankell AM, et al. The landscape of selection in 551 esophageal adenocarcinomas defines genomic biomarkers for the clinic. Nat Genet. 2019;51:506–516. doi: 10.1038/s41588-018-0331-5.
  • 22. Nones K, et al. Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis. Nat Commun. 2014;5:5224. doi: 10.1038/ncomms6224.
  • 23. Blum A, et al. RNA Sequencing Identifies Transcriptionally-Viable Gene Fusions in Esophageal Adenocarcinomas. Cancer Res. 2016 doi: 10.1158/0008-5472.CAN-16-0979. canres.0979.2016.
  • 24. The Cancer Genome Atlas Research Network. Integrated genomic characterization of oesophageal carcinoma. Nature. 2017 doi: 10.1038/nature20805.
  • 25. Ross-Innes CS, et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett’s esophagus and esophageal adenocarcinoma. Nat Genet. 2015;47:1038–1046. doi: 10.1038/ng.3357.
  • 26. Maley CC, et al. Genetic clonal diversity predicts progression to esophageal adenocarcinoma. Nat Genet. 2006;38:468–473. doi: 10.1038/ng1768.
  • 27. Martinez P, et al. Dynamic clonal equilibrium and predetermined cancer risk in Barrett’s oesophagus. Nat Commun. 2016;7:12158. doi: 10.1038/ncomms12158.
  • 28. Li X, et al. Assessment of esophageal adenocarcinoma risk using somatic chromosome alterations in longitudinal samples in Barrett’s esophagus. Cancer Prev Res. 2015;8:845–856. doi: 10.1158/1940-6207.CAPR-15-0130.
  • 29. Martinez P, et al. Evolution of Barrett’s Esophagus through space and time at single-crypt and whole-biopsy levels. Nat Commun. 2018:1–12. doi: 10.1038/s41467-017-02621-x. in press.
  • 30. Scheinin I, et al. DNA copy number analysis of fresh and formalin-fixed specimens by whole-genome sequencing: improved correction of systematic biases and exclusion of problematic regions. Genome Res. 2014:1–24. doi: 10.1101/gr.175141.114.
  • 31. Li X, et al. Temporal and spatial evolution of somatic chromosomal alterations: A case-cohort study of Barrett’s esophagus. Cancer Prev Res. 2014;7:114–127. doi: 10.1158/1940-6207.CAPR-13-0289.
  • 32. Shaheen NJ, Falk GW, Iyer PG, Gerson LB. ACG Clinical Guideline: Diagnosis and Management of Barrett’s Esophagus. Am J Gastroenterol. 2016;111:30–50. doi: 10.1038/ajg.2015.322.
  • 33. Fitzgerald RC, et al. British Society of Gastroenterology guidelines on the diagnosis and management of Barrett’s oesophagus. Gut. 2014;63:7–42. doi: 10.1136/gutjnl-2013-305372.
  • 34. Stachler MD, et al. Paired exome analysis of Barrett’s esophagus and adenocarcinoma. Nat Genet. 2015;47:1047–55. doi: 10.1038/ng.3343.
  • 35. Kaye PV, et al. Novel staining pattern of p53 in Barrett’s dysplasia - the absent pattern. Histopathology. 2010;57:933–935. doi: 10.1111/j.1365-2559.2010.03715.x.
  • 36. Kaye PV, et al. Barrett’s dysplasia and the Vienna classification: Reproducibility, prediction of progression and impact of consensus reporting and p53 immunohistochemistry. Histopathology. 2009;54:699–712. doi: 10.1111/j.1365-2559.2009.03288.x.
  • 37. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60. doi: 10.1093/bioinformatics/btp324.
  • 38. Nilsen G, et al. Copynumber: Efficient algorithms for single- and multi-track copy number segmentation. BMC Genomics. 2012;13:591. doi: 10.1186/1471-2164-13-591.
  • 39. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010;33:1–22.
  • 40. Robin X, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.

Associated Data

Supplementary Materials

Supp Appendix
