Abstract
Background:
The clinical behavior of ampullary adenocarcinoma varies widely. Targeted tumor sequencing may better define biologically distinct subtypes to improve diagnosis and management.
Methods:
The hidden genome algorithm, a multilevel meta-feature regression model, was trained on a prospectively sequenced cohort of 3,411 patients (1,001 pancreatic adenocarcinoma, 165 distal bile duct adenocarcinoma, 2,245 colorectal adenocarcinoma) and subsequently applied to targeted panel DNA sequencing data from ampullary adenocarcinomas. Genomic classification (i.e., colorectal vs. pancreatic) was correlated with standard histological classification (i.e., intestinal [INT] vs. pancreatobiliary [PB]) and clinical outcome.
Results:
Colorectal genomic subtype prediction was primarily influenced by mutations in APC and PIK3CA, tumor mutational burden, and DNA mismatch repair (MMR) deficiency signature. Pancreatic genomic subtype prediction was dictated by KRAS gene alterations, particularly KRAS G12D, KRAS G12R, and KRAS G12V. Distal bile duct adenocarcinoma genomic subtype was most influenced by copy number gains in the MDM2 gene. Despite high (73%) concordance between immunomorphologic subtype and genomic category, there was significant genomic heterogeneity within both histologic subtypes. Genomic scores with higher colorectal probability were associated with greater survival compared to those with a higher pancreatic probability.
Conclusions:
The genomic classifier provides insight into the heterogeneity of ampullary adenocarcinoma and improves stratification, which is dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing and obviates subjective histologic interpretation.
Keywords: ampulla of Vater, next generation sequencing, genetic, biomarker, somatic mutation
Introduction:
Accurate classification of tumors is essential to guide management and inform prognosis. Traditionally, site of origin and histopathologic subtype have defined a tumor and its expected clinical phenotype. Targeted tumor sequencing is an increasingly available technology that allows more precise classification utilizing patterns of genomic alterations. Moreover, such analyses may help quantify tumor heterogeneity, which is important not only for understanding tumor biology but also differences in clinical behavior and potential tailored treatment.
The hidden genome classifier1,2 is a powerful tool that offers deeper insight into tumor biology. Although there is a robust literature describing the association between specific genomic alterations and individual tumor types, frequent and highly conserved variants account for only a small fraction of observed variants. Deep sequencing has revealed millions of unique somatic mutations, and often, >90% of somatic variants are singletons3. The hidden genome classifier uses multilevel meta-feature regression to draw on both common and rare variants, and it can incorporate previously unobserved variants when determining cancer type1.
Ampullary adenocarcinoma is a heterogeneous disease that may benefit from hidden genome methodology. Ampullary adenocarcinoma is an uncommon malignancy of the periampullary region,4-7 the junction of the biliary, pancreatic, and digestive tracts. As such, adenocarcinomas arising within the duodenal ampullary complex have variable clinical phenotypes, likely due to differences in cell of origin8. Although two major histologic types – intestinal (INT) and pancreatobiliary (PB) – have been identified9,10, morphology-based subtype classification is unreliable for prognostication11. Additionally, molecular genotyping has not improved prognostic stratification relative to histology alone12. Thus, identification of distinct clinical subtypes for risk stratification and improving selection for multimodality therapies is a critical unmet need.
Herein, we report a molecular taxonomy of ampullary adenocarcinoma using hidden genome methodology based on a broad array of genomic alterations identified by targeted tumor sequencing. Additionally, we used the model to quantify the degree of genomic heterogeneity in individual samples, which included therapeutically actionable alterations.
Methods:
Patient Cohort
After institutional review board (IRB) approval, we identified consecutive patients treated for ampullary adenocarcinoma at Memorial Sloan Kettering Cancer Center (MSKCC) with targeted next-generation sequencing of their tumor. All specimens were sequenced using the MSK-IMPACT (Integrated Molecular Profiling of Actionable Cancer Targets) assay, a clinically validated targeted next-generation sequencing array that can detect mutations, copy-number alterations, and select rearrangements13. Demographic, clinical, pathologic, and outcome data were abstracted from an institutional database and medical record. All patients provided written informed consent. The study was conducted in accordance with the US Common Rule.
Pathologic Assessment of Ampullary Adenocarcinomas
Tumors were reviewed by a gastrointestinal pathologist blinded to the genotypic classifier. The histologic category was assigned using established criteria14 as intestinal (INT), pancreatobiliary (PB), or mixed intestinal/pancreatobiliary (mixed). The Ang immunophenotypic classification was assigned as "INT", "PB", or "ambiguous", as previously described15.
Hidden Genome Classification
For probabilistic classification of the ampullary tumors based on their genomic profiles, we trained three hidden genome group-lasso regularized multi-logistic models – (i) a three-class model trained on 2,245 colorectal adenocarcinomas, 165 distal bile duct adenocarcinomas, and 1,001 pancreatic adenocarcinomas (n=3411), (ii) a four-class model that added 254 gastric tumors (n=3665) to the three-class model, and (iii) a four-class model that added 69 small bowel adenocarcinomas (n=3480) to the three-class model. The training cohort included patients treated at MSKCC with available MSK-IMPACT sequencing of their primary tumor. Various descriptive statistics for the training data cohort are provided in Supplemental Table 1. Colorectal adenocarcinoma was used to represent intestinal genomics in the three-class model, given the rarity of primary small bowel adenocarcinoma and the overlap of driver mutations16.
The hidden genome model utilized the following predictors: (i) normalized binary indicators for 250 discriminative individual variants observed in the MSK-IMPACT cancer gene panels, (ii) normalized number of mutations observed in each MSK-IMPACT gene, (iii) normalized counts of mutations associated with each of the 96 possible single-base substitution categories17 (SBS-96; each considering the mutated base (6 possibilities), along with the immediate 5’ and 3’ flanking bases (4 possibilities each), resulting in 6 × 4 × 4 = 96 categories), (iv) the square root of the total number of mutations observed in the tumor, (v) binary indicators of copy number loss and gain at each of 476 genes present in the MSK-IMPACT panel, and (vi) average copy number log-ratio computed at 782 chromosome cytogenetic bands spanning the 22 autosomes18,19. Predictors (ii) – (iv) can be interpreted as scalar projections of three meta-features, namely, the gene itself, SBS-96, and an intercept meta-feature vector of 1, respectively (Supplementary Methods), along the direction of the mutation profile vector. The mutation contexts embodied in these meta-features combine information in the individual variants, including rare variants, thereby permitting a highly informative dimension reduction of the ultra-high dimensional mutation profile vector. The model also includes several discriminative hotspot variants with substantial residual effects not explained by mutation context.
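The SBS-96 bookkeeping in predictor (iii) can be illustrated with a minimal sketch. The label format (e.g. `T[C>T]T`) and the function names below are assumptions made for illustration only; the sketch enumerates the 6 × 4 × 4 categories and tallies raw counts, and it does not reproduce the normalization used in the fitted model.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types, written from the pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def sbs96_categories():
    """Return all 96 trinucleotide-context labels (hypothetical 'A[C>T]G' format)."""
    return [
        f"{five}[{sub}]{three}"
        for sub, five, three in product(SUBSTITUTIONS, BASES, BASES)
    ]


def count_sbs96(mutations):
    """Tally raw mutation counts per SBS-96 category.

    `mutations` is an iterable of labels already in 'A[C>T]G' form.
    """
    counts = {category: 0 for category in sbs96_categories()}
    for label in mutations:
        counts[label] += 1
    return counts


categories = sbs96_categories()
print(len(categories))  # 96

# Hypothetical tumor with three substitutions, two in the same context.
example = count_sbs96(["T[C>T]T", "T[C>T]T", "A[T>G]C"])
print(example["T[C>T]T"])  # 2
```

Each tumor thus contributes a 96-element count vector, regardless of how many distinct variants it carries, which is what allows rare and previously unobserved variants to inform the prediction.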
We used the fitted hidden genome model to predict parental cancer sites for the sample of ampullary adenocarcinoma specimens. For each ampullary tumor, the predicted class probabilities were used to produce a soft classification of the tumors (i.e., percentage colorectal, pancreatic, and biliary). An associated hard classification was subsequently obtained by assigning each tumor to the class with the highest predicted probability if the highest probability was ≥0.5; otherwise, the hard class was “indeterminate”.
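The hard-classification rule described above can be sketched in a few lines. The function name and the example probabilities are hypothetical; only the ≥0.5 threshold and the “indeterminate” fallback are taken from the text.

```python
def hard_class(probs, threshold=0.5):
    """Convert soft class probabilities into a hard class.

    probs: dict mapping class name -> predicted probability.
    Returns the most probable class if its probability is >= threshold,
    otherwise "indeterminate".
    """
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "indeterminate"


# Hypothetical soft classifications for two tumors.
tumor_a = {"colorectal": 0.72, "pancreatic": 0.20, "distal bile duct": 0.08}
tumor_b = {"colorectal": 0.40, "pancreatic": 0.45, "distal bile duct": 0.15}

print(hard_class(tumor_a))  # colorectal
print(hard_class(tumor_b))  # indeterminate
```

With three classes, a tumor whose probability mass is split across sites can have a leading class below 0.5, which is why an explicit “indeterminate” category is needed.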
Statistical Analysis
Continuous data are expressed as medians and interquartile range (IQR) and compared between groups using Wilcoxon rank sum test. Categorical variables are expressed as frequencies and percentages and compared using Fisher’s exact test. Overall survival (OS) was defined from time of diagnosis to death or last follow-up; for the surgically resected subset, OS was additionally evaluated using time from resection to death or last follow-up. OS was estimated with Kaplan-Meier methods and compared with the log-rank test. All tests were two-sided, and P<0.05 was considered significant. SAS (version 9.4; SAS Institute) or R (version 4.0.1; R Foundation for Statistical Computing) was used for all analyses.
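For readers less familiar with the Kaplan-Meier estimator, the following is a minimal pure-Python sketch of the product-limit calculation. The analyses in this study were run in SAS and R; this code is not the study's implementation, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct observed event time.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if death was observed, 0 if censored at last follow-up
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data for five patients.
curve = kaplan_meier(times=[5, 8, 8, 12, 20], events=[1, 0, 1, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(5, 0.8), (8, 0.6), (12, 0.3)]
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is how the estimator accommodates incomplete follow-up.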
Results:
Patient Cohort
A cohort of 76 patients with ampullary adenocarcinoma was identified (Table 1), with a median age of 60.5 years (IQR 50.0-68.0); 62% were male and 57 (75%) underwent resection (pancreaticoduodenectomy). Among resected patients, median tumor diameter was 2.3 cm (IQR 1.7-2.8), and the majority (72.2%) had AJCC 8th edition stage III disease. Adverse histopathologic features were common in the resected group: poor differentiation (n=19; 35.2%), lymphovascular invasion (n=39; 75.0%), and perineural invasion (n=37; 72.6%). By immuno-histologic assessment, 21 patients had INT, 50 patients had PB, and 5 patients had mixed subtype tumors.
Table 1.
| Variable, N (%) or median (IQR) or mean (SD) | Category | Surgery: no resection (n=19) | Surgery: resection (n=57) | Total (n=76) |
|---|---|---|---|---|
| Age, median (IQR)^a | | 61 (51-67) | 60 (48-70) | 60.5 (50-68) |
| Gender, female | | 6 (31.6%) | 23 (40.4%) | 29 (38.2%) |
| CEA (ng/mL), median (IQR)^b | | 7.2 (3.8-40.9) | 2.5 (1.4-4.1) | 3.8 (1.9-12.0) |
| Ca 19-9 (U/mL), median (IQR)^c | | 74 (18.5-1869) | 58 (24-285) | 58 (19-319) |
| AJCC Stage^d | I-II | 0 (0%) | 13 (24.1%) | 13 (17.8%) |
| | III | 2 (10.5%) | 39 (72.2%) | 41 (56.2%) |
| | IV | 17 (89.5%) | 2 (3.7%) | 19 (26.0%) |
| Grade^e | Well | 1 (5.3%) | 2 (3.7%) | 3 (4.1%) |
| | Moderate | 11 (57.9%) | 33 (61.1%) | 44 (60.3%) |
| | Poor | 7 (36.8%) | 19 (35.2%) | 26 (35.6%) |
| Immunomorphologic subtype | INT | 5 (26.3%) | 16 (28.1%) | 21 (27.6%) |
| | PB | 13 (68.4%) | 37 (64.9%) | 50 (65.8%) |
| | Mixed | 1 (5.3%) | 4 (7.0%) | 5 (6.6%) |
| T-stage^f | T1-2 | N/A | 15 (28.9%) | N/A |
| | T3 | N/A | 37 (71.2%) | N/A |
| | T4 | N/A | 2 (3.7%) | N/A |
| Nodal positivity^g | | N/A | 39 (75.0%) | N/A |
| Tumor size, median (IQR)^h | | N/A | 2.3 (1.7-2.8) | N/A |
| Margin negative (R0)^i | | N/A | 49 (89.1%) | N/A |
| Lymphovascular invasion^j | | N/A | 39 (75.0%) | N/A |
| Perineural invasion^k | | N/A | 37 (72.6%) | N/A |
| Chemotherapy | Neoadjuvant | N/A | 3 (5.3%) | N/A |
| | Adjuvant | N/A | 40 (70.2%) | N/A |
| Radiation, adjuvant | | N/A | 17 (33.3%) | N/A |

^a Age data missing for 2 patients
^b CEA data missing for 34 patients
^c Ca 19-9 data missing for 29 patients
^d AJCC Stage data missing for 3 patients
^e Grade data missing for 3 patients
^f T-stage data missing for 5 patients
^g Nodal positivity data missing for 5 patients
^h Tumor size data missing for 7 patients
^i Margin data missing for 2 patients
^j Lymphovascular invasion data missing for 5 patients
^k Perineural invasion data missing for 6 patients
Generating a Hidden Genome Model of Ampullary Adenocarcinoma
To visualize the discriminative signals of the hidden genome methodology when applied to the three-class training set, we performed a principal component analysis of all active predictors (i.e., selected in group-lasso) in the fitted model (Supplemental Figure 1). Following a Uniform Manifold Approximation and Projection (UMAP) analysis20 on the resultant first 50 principal components, an approximate two-dimensional embedding of all active predictors for each training set tumor was created (Figure 1A). There was distinct separation between colorectal and pancreatic tumors, suggesting unique genomic signatures of these two cancer types. Distal bile duct adenocarcinomas, in contrast, did not harbor strong tissue-specific signals but rather a mixture of colorectal- and pancreatic-specific genomic information. Still, many fell “closer” to pancreatic tumors, consistent with their established histologic similarities21,22. A complete heatmap displaying the values of all active predictors in the hidden genome model across all training set tumors (grouped by cancer sites) is displayed in Supplemental Figure 2.
Figure 1.
Visualizing the training data and the fitted 3-class hidden genome model. Panel-A: Scatter plots showing 2-dimensional embeddings of the lasso selected active genomic predictors in the training data. Each point in each scatter represents the genomic profile of a single training set tumor and is color-coded according to its cancer site. For each tumor, the lasso selected active genomic features from the fitted hidden genome model are first collected from the training data; then the first 50 principal components of these active genomic features are computed; finally, 2-dimensional uniform manifold approximation and projections (UMAPs) of these 50 principal components are computed to produce approximate 2-dimensional representations of the genomic profiles of all training set tumors. Panel-B: Log one-vs-rest odds ratios of the top 5 predictors (with largest absolute log odds ratios) in each predictor group in the fitted hidden genome model. Each bar represents the change in the log odds of a tumor being classified into the corresponding cancer site, relative to not being classified into that site, for a one standard deviation increase in the associated predictor from its mean, while keeping all other predictors fixed at their respective means. Predictors from the same group (variants, genes, SBSs, gene copy number loss/gain indicators, and cytoband copy number average log ratios) are grouped. Panel-C: The precision-recall area under the curve (AUC) for one-vs-rest comparisons specific to each training cancer site category, obtained from the pre-validated hidden genome multinomial logistic predictive probabilities for tumors in that site, are plotted as horizontal blue bars. The final row displays the ‘Average’ of the individual class-specific AUCs. The darkened area on each bar represents the corresponding baseline AUC associated with a null classifier that randomly assigns class labels to tumors.
To visualize the tissue-site specific signals of the most informative predictors in the training data set, we plotted the estimated average odds ratios for being classified relative to not being classified (one-vs-rest, Supplemental Methods) (Figure 1B). KRAS had a large positive odds ratio for pancreatic cancer, with the hotspots KRAS G12D, KRAS G12V and the more pancreas-specific KRAS G12R hotspot providing additional discriminative information captured by the “residual” effects at the variant level. However, hotspot KRAS G13D was more specific to colorectal cancer, where its residual effect produced a large positive log odds ratio. In contrast, the APC gene had a large odds ratio for colorectal cancer, while its hotspot APC I1307K had a small odds ratio for that site. There were two predictors with large positive log odds ratios in distal bile duct adenocarcinoma, namely, copy number gain in the MDM2 gene and the SBS-96 category C>T T.T. Complete log-odds ratios of all active predictors in the fitted model are provided as supplemental data.
Finally, to assess the predictive accuracy of the fitted hidden genome model, we computed one-vs-rest precision-recall area under the curve (AUC) for each training site, and subsequently obtained as an overall measure the average of the site-specific AUCs, through pre-validated23 predictive probabilities (see Supplementary Methods). Note that a precision-recall AUC, unlike a receiver operating characteristic (ROC) AUC, adjusts for class size imbalances (which necessarily occur in one-vs-rest comparisons) and thus produces a robust measure of predictive performance of a multi-class classifier. As depicted in Figure 1C, the fitted hidden genome model achieved near perfect predictive accuracies in the colorectal (AUC=0.99) and pancreatic (AUC=0.94) genomic groups. The bile duct group, in contrast, had a noticeably smaller AUC (0.46), likely reflecting the absence of bile-duct specific discriminative genomic signals, heterogeneity, and small sample size. The precision-recall AUCs achieved by the hidden genome classifier were well-above the null baseline values across all cancer sites. The average of the site-specific AUCs was 0.79 (corresponding to a null baseline of 0.33), demonstrating strong overall classification performance of the fitted model.
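The null baselines quoted above follow from a standard property of precision-recall curves: a classifier that assigns labels at random has an expected precision equal to the class prevalence. The short sketch below reproduces that arithmetic from the training cohort sizes and the reported AUCs; the variable names are illustrative, and the calculation is a check on the stated baseline rather than a re-analysis of the data.

```python
# Training cohort sizes for the three-class model (from the Methods).
training_counts = {
    "colorectal": 2245,
    "distal bile duct": 165,
    "pancreatic": 1001,
}
total = sum(training_counts.values())  # 3411

# Expected precision-recall AUC of a random classifier = class prevalence.
baseline = {site: n / total for site, n in training_counts.items()}
mean_baseline = sum(baseline.values()) / len(baseline)

# Precision-recall AUCs as reported in the text (rounded values).
reported_auc = {"colorectal": 0.99, "distal bile duct": 0.46, "pancreatic": 0.94}

for site in training_counts:
    print(site, round(baseline[site], 3), reported_auc[site])

print(round(mean_baseline, 3))  # 0.333
```

Because the three prevalences sum to one, their average is exactly 1/3, matching the stated average baseline of 0.33. The distal bile duct class has a baseline near 0.05, so its AUC of 0.46, although the lowest of the three, is still far above chance.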
Analogous heatmaps, UMAPs, and odds ratios for the active predictors, along with precision-recall AUCs for the fitted four-class training set models (with gastric and small bowel as the additional training sites, respectively), are displayed in Supplemental Figures 3-6. The additional sites, gastric and small bowel, lacked strong discriminative signals and, similar to distal bile duct tumors, harbored mixtures of pancreatic- and colorectal-specific genomic signals (Supplemental Figures 3A and 5A). These were reflected in smaller one-vs-rest odds ratios (in absolute log scale; Supplemental Figures 3B and 5B) and smaller precision-recall AUCs (Supplemental Figures 3C and 5C).
Predicting Parental Sites for Ampullary Tumors
Applying the trained model to the cohort of 76 ampullary adenocarcinomas, we observed a high degree of concordance between the genomic prediction and pathologic subtype (Figure 2A). For INT subtype classified by histology and immunohistochemistry, 76.2% of the samples were genomically predicted to be colorectal site of origin based on the hidden genome model, with a median predicted probability of 80%. For the PB subtype adenocarcinoma, 56.0% were genomically predicted as pancreatic site of origin, with a median predicted probability of 55%, representing a higher degree of heterogeneity. The distal bile duct adenocarcinoma genomic signature was rarely the dominant profile, with a median predicted probability of 5% for INT subtypes and 10% for PB subtypes. Of note, addition of gastric and small bowel adenocarcinoma to the three-class genomic model did not improve the diagnostic accuracy of the system (Supplemental Figure 7).
Figure 2.
Application of Hidden Genome methodology to test set of ampullary adenocarcinoma specimens. Panel-A: Boxplot for hidden genome predicted probabilities stratified by immunomorphologic subtype. Separately for ampullary tumors in the three pathologic subtype groups, viz., intestinal, mixed and pancreatobiliary (displayed across columns), the predicted probabilities (along the vertical axis) for colorectal, distal bile duct and pancreatic cancer types (along the horizontal axis) obtained from the fitted hidden genome model are plotted as boxplots. The plots show that most ampullary tumors with INT histological subtype have high predicted probabilities for colorectal, and most PB subtype tumors have high pancreatic predicted probabilities. Panel-B: Swimmer plot displaying soft and hard classification probabilities. For all ampullary tumors (plotted across rows; the tumor IDs are displayed along the left-most column) the predicted probabilities of having colorectal (blue), distal bile duct (orange), and pancreatic (pink) as their parental cancer types are plotted along the 2nd column (from left), and the predicted hard classes are plotted along the 3rd column (from left). The “indeterminate” hard classes are displayed as grey bars. The right-most column shows the immuno-histologic classifications, which are used to group the rows. Panel-C: Table of genomic hard classification and the associated immunomorphologic classification.
The cumulative probabilities of the gene classifier for each patient sample are summarized in a probability swimmer plot (multiple horizontal barplots with a common horizontal axis), stratified by immuno-histologic subtype (Figure 2B). The INT subtypes predominately expressed a colorectal genomic profile, although there was a wide range (3-100%). There was similar heterogeneity for the PB subtypes; the pancreatic genomic profile ranged from 1-98%. Interestingly, of the five patients with mixed INT-PB subtype, nearly all (n=4) had a dominant genomic profile characterized by the colorectal signature (range 72-99%).
Genomic predicted probabilities were converted to a hard classification by assigning each tumor to the single category whose predicted probability reached the 50% threshold. Overall, concordance between the hard classification and immuno-histologic subtype was 73.2% (Figure 2C). Of the 21 patients with INT subtype by histology and immunohistochemistry, 16 (76.2%) had a dominant genomic profile consistent with colorectal. For example, case P–0003602 was INT subtype by immunohistochemistry, which was concordant with a colorectal genotype (100% colorectal; 0% pancreatic; 0% distal bile duct). There were classic colorectal cancer genomic features, including KRAS G12C mutation and gain of chromosome 13. Similarly, case P–0023740 was INT subtype by immunohistochemistry, which was concordant with a colorectal genotype (97%) and included PIK3CA mutation and the presence of a microsatellite instability (MSI) signature (score=46). In contrast, PB had greater genomic heterogeneity. Of the 50 patients with PB subtype by histology and immunohistochemistry, 28 patients (56.0%) had a dominant genomic profile defined by the pancreatic genotype. The remaining patients included 7 colorectal, 8 distal bile duct, and 7 indeterminate. Hard classification into the distal bile duct subset was infrequent, and nearly always from the PB subtype (8 of 9; 89%). When small bowel adenocarcinoma was included in the four-class classifier, the predicted probability of small bowel was low (range 0-6%) and small bowel was never the hard-classified genomic origin (Supplemental Figure 8).
Determining the Most Influential Predictors of Ampullary Tumors
We used the Jensen-Shannon importance metric (Supplemental Methods) to identify the 5 most influential predictors in each individual tumor hard prediction, collectively producing a list of 64 unique predictors with the largest influences across all tumors (Figure 3). The KRAS gene had a strong influence on nearly all pancreatic hard predictions. Conversely, colorectal hard predictions were influenced by both the APC gene mutations and tumor mutational burden (TMB). Finally, distal bile duct adenocarcinoma hard predictions were strongly influenced by copy number gains in the MDM2 gene.
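The exact importance metric is defined in the Supplemental Methods and is not reproduced here. As background only, the sketch below computes the generic Jensen-Shannon divergence between two probability vectors. The assumption, made loudly and purely for illustration, is that a predictor's influence could be gauged by comparing a tumor's predicted class probabilities with and without that predictor's contribution; the probability vectors shown are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Sanity checks on the two extremes.
print(js_divergence([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # 0.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))            # 1.0

# Hypothetical class probabilities (colorectal, distal bile duct, pancreatic)
# for one tumor with and without a single predictor's contribution.
with_predictor = [0.10, 0.05, 0.85]
without_predictor = [0.45, 0.15, 0.40]
influence = js_divergence(with_predictor, without_predictor)
```

Under this assumed usage, a predictor that barely shifts the predicted probabilities would score near 0, while one that moves a tumor from one class to another would score closer to 1.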
Figure 3.
Visualizing the effects of most influential predictors in individual predictions of ampullary tumors. The Jensen-Shannon importance metrics for 64 most influential predictors (along the columns) are plotted against all 76 ampullary tumors (along the rows) as a heatmap. The rows (tumors) are grouped according to their predicted hard classes and the columns (predictors) are grouped according to their types.
Prognostication by Genomic Profile
After a median follow-up of 26.9 months (IQR 13.8-45.6), median OS was 50.6 months (IQR 34.6-85.63). There was no significant difference in OS between the INT and PB subtypes as defined by histology and immunohistochemistry, either overall or in the surgically resected cohort (log-rank p=0.129 and p=0.783, respectively; Figure 4A and Figure 4B). In contrast, among patients classified into the colorectal or pancreatic hard genomic groups (n=58), there was a trend toward improved survival in the colorectal genomic group that did not reach statistical significance (p=0.089; Figure 4C). Moreover, the predicted genomic probabilities of the colorectal and pancreatic subtypes correlated with predicted 72-month survival probability (Figure 4D). In the bivariate gradient plot, genomic scores with higher colorectal probability were associated with higher survival probability, whereas higher pancreatic probabilities were associated with lower survival probability.
Figure 4.
Prognostication according to Hidden Genome predicted probabilities. Panel-A: Kaplan-Meier survival plot of overall survival in all patients, stratified by immunomorphologic subtype. Panel-B: Kaplan-Meier survival plot of overall survival in resected patients, stratified by immunomorphologic subtype. Panel-C: Kaplan-Meier survival plot of overall survival, stratified by genomic hard classification of colorectal vs. pancreatic subtypes. Panel-D: Bivariate gradient plot of predicted probability of genomic category (pancreatic: x-axis; colorectal: left y-axis) and predicted survival at 72 months (color gradation; right y-axis). The immunomorphologic category denoted by shape of point and summarized in key.
Application to Indeterminate Clinical Scenarios
There were several patients with mixed PB/INT histology based on immunohistochemical staining. We queried survival of these patients with mixed subtypes according to the dominant genomic profile present. Patient P–0022573 had morphologic features of PB subtype, yet immunohistochemical (IHC) staining was predictive of INT subtype (MUC1+, MUC2+, CK20+, CDX2+). The genomic classifier predicted a higher likelihood of pancreatic genotype, with a canonical KRAS G12D mutation. The patient’s clinical course reflected this, with recurrence at 5.5 months and death at 19.5 months after resection. Patient P–0035477 had morphologic features of mixed subtype along with ambiguous IHC (MUC1+, CK20+, CDX2−). The genomic classifier predicted intestinal subtype with 98% probability; there were a total of 72 mutations (62.3/M) with an MSH2 germline variant and MSI (score=43). The genomic profile was consistent with the favorable clinical course, with the patient remaining disease-free for over 10 years after resection. Patient P-0012334 had a morphologically mixed tumor, but IHC was not performed because of inadequate tissue. After application of the genomic algorithm, a poor prognosis was predicted by the distal bile duct (75%) and pancreatic (22%) subtypes; the patient died 18 months after diagnosis. Patient P–0002503 had mixed morphology and also inadequate tissue for IHC. The patient survived 71 months after resection, consistent with the 70% predicted probability of colorectal genomic subtype.
Discussion:
We developed a genomic classifier trained on the mutational profiles of related gastrointestinal malignancies to stratify ampullary adenocarcinomas using routinely collected genomic sequencing data. The genomic classifier had high concordance with existing histological and immunohistochemical subtypes and improved quantification of heterogeneity present in individual patient tumors, which may underlie the range of clinical phenotypes observed. Such genomic knowledge provides therapeutic targets – both immediately actionable, as well as potential candidates – and may, following validation in larger cohorts, help guide diagnosis and prognosis.
Our genomic classifier used multilevel meta-feature regression to extract both common and rare variants in training data and incorporate previously unobserved variants when applied to ampullary adenocarcinoma samples. Such methodology has previously been applied successfully to classify “unknown” tumors1, but this study represents the first successful application for tumor subtype classification. The genomic subtype classifier relied on common differences between pancreatic, distal bile duct, and colorectal adenocarcinomas, which were recapitulated in distinct clinical subtypes of ampullary adenocarcinoma. There was a high degree of concordance between the intestinal immunomorphological subtype and the colorectal genomic profile, characterized by APC and PIK3CA gene mutations, TMB, MSI mutational signatures and copy number gains of chromosomes 13 and 20 – all well-known genomic characteristics of colorectal tumors and small bowel adenocarcinomas16. Likewise, the pancreatobiliary immunomorphological subtype was most frequently associated with the pancreatic genomic profile, with frequent KRAS mutations. These results align with recent findings that disruptions in Wnt signaling (most commonly by APC mutation) and MSI are more frequently observed in intestinal subtype tumors, whereas pancreatobiliary subtypes are more likely to harbor KRAS and TP53 mutations24,25. Perkins et al. also demonstrated that KRAS mutations were more frequent in pancreatobiliary subtype defined by immunomorphology12; however, other genomic distinctions between the subtypes were not identified, likely due to the rarity of any single variant. This is a common limitation of single-gene mutational analyses, which is overcome by the genomic classifier methodology that additionally utilizes rare genomic variants.
Certain genomic alterations identified herein hold immediate therapeutic value. For instance, TMB was a strong determinant of the colorectal hard classifier. Checkpoint blockade has demonstrated activity in tumors with high TMB26, and may hold therapeutic value for these subsets. A point of future study will be to determine whether a pancreatic vs. colorectal genomic subtype warrants a tailored systemic chemotherapeutic regimen (e.g., FOLFIRINOX for pancreatic subtype ampullary adenocarcinoma given its successes in the adjuvant treatment of resectable pancreatic adenocarcinoma).
Additionally, the genomic classifier improved diagnosis in patients with mixed morphology tumors. Even after refinement of histologic classification based on immunohistochemical criteria, there was a small cohort that could not be binarized into an intestinal/pancreatobiliary classification. True mixed-type ampullary carcinomas thus appear to be a clinical entity, and the genomic analysis demonstrated the significant heterogeneity that may underlie the inability to “fit” certain tumors into a hard two-category classification. Here, the genomic classifier was particularly useful in guiding prognostication by quantifying the relative proportions of intestinal, pancreatic, and bile duct profiles in a given sample. Importantly, there was a continuum of survival outcomes that was linearly related to the relative proportion of favorable intestinal and unfavorable pancreatobiliary genomic profiles in any given tumor sample.
It remains to be determined in larger, independent samples whether the genomic classifier can reliably be used for prognostication. We observed a trend toward improved OS in the colorectal vs. the pancreatic genomic groups, a separation that was not seen with traditional histologic subtypes. The PB subtype has been associated with inferior survival outcomes relative to the INT subtype in some7,27 but not all studies4,28,29. Genomic profiles that account for marked heterogeneity may allow for more accurate and consistent classification with sub-stratification into prognostically distinct groups. Multi-institutional collaboration will be required to adequately study this rare disease entity and to evaluate the genomic classifier in a larger data set where it can be incorporated into multivariable models along with known prognostic variables.
Several limitations warrant emphasis. First, MSK-IMPACT is increasingly used at our institution, but there is nonrandom referral for targeted sequencing, which may introduce bias into the cases included in these analyses. Second, small bowel adenocarcinoma may have been a preferable training input to colorectal adenocarcinoma, given its anatomic proximity to the ampulla. However, evaluation of small bowel adenocarcinoma in our expanded four-class model showed that it did not have a genomic signature distinct from colorectal adenocarcinoma and added little value to site-of-origin prediction. Third, tumor heterogeneity may impact the precision of the genomic classifier; we cannot rule out that geographic mapping of subclones would identify differing genomic profiles in different regions of the same tumor25. Fourth, the majority of specimens were collected at resection, but a minority of genomic sequencing was performed on a biopsy of a metastasis or on the primary tumor after systemic therapy. Genomic profiles evaluated by MSK-IMPACT are conserved between primary and metastatic lesions for several malignancies30,31, but this has not been adequately studied for ampullary adenocarcinoma. Lastly, the applicability of this hidden genome framework to other NGS platforms remains an open question. The MSK-IMPACT assay used at our institution interrogates over 400 genes, and because these genes are commonly mutated in cancer, they are covered by the majority of large NGS assays. By comparison, the Foundation Medicine platform analyzes 324 similar genes, including MDM2 copy number gains. We anticipate that the hidden genome algorithm can be re-tooled for other NGS platforms, and are seeking such data to test this hypothesis.
Conclusions:
Our analyses suggest that genomic criteria can assist in accurate diagnosis and prognostication of ampullary adenocarcinoma. The genomic heterogeneity shown in our model may be related to multiple cells of origin, and the identification of broad differences between genomic subtypes suggests potential subtype-specific therapeutic strategies that may improve survival for these patients.
Supplementary Material
Statement of translational relevance:
Ampullary adenocarcinomas are classified into intestinal or pancreatobiliary subtypes based on histological criteria, with potentially different clinical behavior. The incorporation of targeted tumor sequencing may better define biologically distinct phenotypes of ampullary adenocarcinoma to improve clinical diagnosis and management. Following training of the hidden genome algorithm, a multilevel meta-feature regression model, on related malignancies of the pancreas, distal bile duct, and intestine, the molecular taxonomy was applied to an institutional cohort of ampullary adenocarcinoma patients. The genomic classifier methodology revealed significant heterogeneity among ampullary cancers. The genomic classifier better stratified the divergent outcomes in ampullary cancer, which were dictated by the proportion of colorectal and pancreatic genomic alterations. This approach is reproducible with available molecular testing, does not depend on subjective histologic interpretation, holds promise for improving identification of distinct clinical subtypes for risk stratification, and may guide selection for multimodality therapies.
Funding:
This work was funded by National Cancer Institute awards P30 CA008748, R01 CA251339 (R. Shen) and U01 CA238444-01 A1 (WRJ). The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Footnotes
Conflicts of interest: The authors declare no potential conflicts of interest.
Software availability: The hidden genome model and the associated computational steps are implemented in the publicly available R package hidgenclassifier (https://www.github.com/c7rishi/hidgenclassifier).
Data availability:
The data underlying this article are available in cBioPortal, at https://www.cbioportal.org.
References:
1. Chakraborty S, Begg CB, Shen R. Using the "Hidden" genome to improve classification of cancer types. Biometrics. 2020.
2. Chakraborty S, Martin A, Guan Z, Begg CB, Shen R. Mining mutation contexts across the cancer genome to map tumor site of origin. Nat Commun. 2021;12(1):3051.
3. Bailey MH, Tokheim C, Porta-Pardo E, et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell. 2018;174(4):1034–1035.
4. Ecker BL, Vollmer CM Jr., Behrman SW, et al. Role of Adjuvant Multimodality Therapy After Curative-Intent Resection of Ampullary Carcinoma. JAMA Surg. 2019;154(8):706–714.
5. Howe JR, Klimstra DS, Moccia RD, Conlon KC, Brennan MF. Factors predictive of survival in ampullary carcinoma. Ann Surg. 1998;228(1):87–94.
6. Berberat PO, Kunzli BM, Gulbinas A, et al. An audit of outcomes of a series of periampullary carcinomas. Eur J Surg Oncol. 2009;35(2):187–191.
7. Neoptolemos JP, Moore MJ, Cox TF, et al. Effect of adjuvant chemotherapy with fluorouracil plus folinic acid or gemcitabine vs observation on survival in patients with resected periampullary adenocarcinoma: the ESPAC-3 periampullary cancer randomized trial. JAMA. 2012;308(2):147–156.
8. O'Connell JB, Maggard MA, Manunga J Jr., et al. Survival after resection of ampullary carcinoma: a national population-based study. Ann Surg Oncol. 2008;15(7):1820–1827.
9. Robert PE, Leux C, Ouaissi M, et al. Predictors of long-term survival following resection for ampullary carcinoma: a large retrospective French multicentric study. Pancreas. 2014;43(5):692–697.
10. Chang DK, Jamieson NB, Johns AL, et al. Histomolecular phenotypes and outcome in adenocarcinoma of the ampulla of vater. J Clin Oncol. 2013;31(10):1348–1356.
11. Reid MD, Balci S, Ohike N, et al. Ampullary carcinoma is often of mixed or hybrid histologic type: an analysis of reproducibility and clinical relevance of classification as pancreatobiliary versus intestinal in 232 cases. Mod Pathol. 2016;29(12):1575–1585.
12. Perkins G, Svrcek M, Bouchet-Doumenq C, et al. Can we classify ampullary tumours better? Clinical, pathological and molecular features. Results of an AGEO study. Br J Cancer. 2019;120(7):697–702.
13. Cheng DT, Mitchell TN, Zehir A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J Mol Diagn. 2015;17(3):251–264.
14. Nagtegaal ID, Odze RD, Klimstra D, et al. The 2019 WHO classification of tumours of the digestive system. Histopathology. 2020;76(2):182–188.
15. Ang DC, Shia J, Tang LH, Katabi N, Klimstra DS. The utility of immunohistochemistry in subtyping adenocarcinoma of the ampulla of vater. Am J Surg Pathol. 2014;38(10):1371–1379.
16. Schrock AB, Devoe CE, McWilliams R, et al. Genomic Profiling of Small-Bowel Adenocarcinoma. JAMA Oncol. 2017;3(11):1546–1553.
17. Alexandrov LB, Kim J, Haradhvala NJ, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578(7793):94–101.
18. Furey TS, Haussler D. Integration of the cytogenetic map with the draft human genome sequence. Hum Mol Genet. 2003;12(9):1037–1044.
19. Cheung VG, Nowak N, Jang W, et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature. 2001;409(6822):953–958.
20. Becht E, McInnes L, Healy J, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2018.
21. Collins AL, Wojcik S, Liu J, et al. A differential microRNA profile distinguishes cholangiocarcinoma from pancreatic adenocarcinoma. Ann Surg Oncol. 2014;21(1):133–138.
22. Takenami T, Maeda S, Karasawa H, et al. Novel biomarkers distinguishing pancreatic head Cancer from distal cholangiocarcinoma based on proteomic analysis. BMC Cancer. 2019;19(1):318.
23. Tibshirani RJ, Efron B. Pre-validation and inference in microarrays. Stat Appl Genet Mol Biol. 2002;1:Article1.
24. Gingras MC, Covington KR, Chang DK, et al. Ampullary Cancers Harbor ELF3 Tumor Suppressor Gene Mutations and Exhibit Frequent WNT Dysregulation. Cell Rep. 2016;14(4):907–919.
25. Yachida S, Wood LD, Suzuki M, et al. Genomic Sequencing Identifies ELF3 as a Driver of Ampullary Carcinoma. Cancer Cell. 2016;29(2):229–240.
26. Marabelle A, Fakih M, Lopez J, et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 2020;21(10):1353–1365.
27. Westgaard A, Tafjord S, Farstad IN, et al. Pancreatobiliary versus intestinal histologic type of differentiation is an independent prognostic factor in resected periampullary adenocarcinoma. BMC Cancer. 2008;8:170.
28. Zhou H, Schaefer N, Wolff M, Fischer HP. Carcinoma of the ampulla of Vater: comparative histologic/immunohistochemical classification and follow-up. Am J Surg Pathol. 2004;28(7):875–882.
29. Colussi O, Voron T, Pozet A, et al. Prognostic score for recurrence after Whipple's pancreaticoduodenectomy for ampullary carcinomas; results of an AGEO retrospective multicenter cohort. Eur J Surg Oncol. 2015;41(4):520–526.
30. Brannon AR, Vakiani E, Sylvester BE, et al. Comparative sequencing analysis reveals high genomic concordance between matched primary and metastatic colorectal cancer lesions. Genome Biol. 2014;15(8):454.
31. Yaeger R, Chatila WK, Lipsyc MD, et al. Clinical Sequencing Defines the Genomic Landscape of Metastatic Colorectal Cancer. Cancer Cell. 2018;33(1):125–136 e123.