Author manuscript; available in PMC: 2020 Jul 1.
Published in final edited form as: Clin Cancer Res. 2019 Nov 21;26(1):82–92. doi: 10.1158/1078-0432.CCR-19-1467

Purity Independent Subtyping of Tumors (PurIST), a clinically robust, single sample classifier for tumor subtyping in pancreatic cancer

Naim U Rashid 1,2,*, Xianlu L Peng 1, Chong Jin 1,2, Richard A Moffitt 3,4, Keith E Volmar 5, Brian A Belt 6, Roheena Z Panni 7, Timothy M Nywening 7, Silvia G Herrera 1, Kristin J Moore 1, Sarah Hennessey 1, Ashley B Morrison 1, Ryan Kawalerski 3, Apoorve Nayyar 1, Audrey E Chang 1, Benjamin Schmidt 7, Hong Jin Kim 8, David C Linehan 6, Jen Jen Yeh 1,8,9,*
PMCID: PMC6942634  NIHMSID: NIHMS1541067  PMID: 31754050

Abstract

Purpose:

Molecular subtyping for pancreatic cancer has made substantial progress in recent years, facilitating the optimization of existing therapeutic approaches to improve clinical outcomes in pancreatic cancer. With advances in treatment combinations and choices, it is becoming increasingly important to determine ways to place patients on the best therapies upfront. While various molecular subtyping systems for pancreatic cancer have been proposed, consensus regarding the proposed subtypes is lacking and their relative clinical utility remains largely unknown, which presents a natural barrier to wider clinical adoption.

Methods:

We evaluated three major subtype classification schemas in the context of results from two clinical trials and by meta-analysis of publicly available expression data, assessing statistical criteria of subtype robustness and overall clinical relevance. We then developed a Single Sample Classifier (SSC) using penalized logistic regression based on the most robust and replicable schema.

Results:

We demonstrate that a tumor-intrinsic two subtype schema is most robust, replicable, and clinically relevant. We developed Purity Independent Subtyping of Tumors (PurIST), a SSC with robust and highly replicable performance on a wide range of platforms and sample types. We show that PurIST subtypes have meaningful associations with patient prognosis and have significant implications for treatment response to FOLFIRINOX.

Conclusion:

The flexibility and utility of PurIST on low-input samples such as tumor biopsies allow it to be used at the time of diagnosis to facilitate the choice of effective therapies for PDAC patients, and PurIST should be considered in the context of future clinical trials.

Keywords: pancreatic cancer, molecular subtypes, single sample classifier, PurIST, biomarkers

Introduction

Recent treatment advances, including FOLFIRINOX (1), gemcitabine plus nab-paclitaxel (2), and olaparib for BRCA mutant patients (3), have provided patients and providers with better options. With the substantial progress in molecular subtyping for pancreatic cancer (4-9), there is now an opportunity to determine the optimal choice of therapy given a patient’s molecular subtype and other biomarker information, enabling “precision medicine” approaches in pancreatic cancer (10,11).

Transcriptomic molecular subtyping in pancreatic cancer is currently an area of active development, where multiple subtyping schemas for pancreatic cancer have been proposed. For example, three molecular subtypes with potential clinical and therapeutic relevance were first described by Collisson et al (5), leveraging a combination of cell line, bulk, and laser capture microdissected (LCM) patient samples: Collisson (i) quasi-mesenchymal (QM-PDA), (ii) classical, and (iii) exocrine-like. A subsequent study of pancreatic cancer patients, based on more diverse pancreatic cancer histologies in addition to the most common pancreatic ductal adenocarcinoma (PDAC), found four molecular subtypes (4): Bailey (i) squamous, (ii) pancreatic progenitor, (iii) immunogenic, and (iv) aberrantly differentiated endocrine exocrine (ADEX). More recently, Puleo et al. describe five subtypes which are based on features specific to tumor cells and the local microenvironment (7). Maurer et al. performed LCM of both tumor and stroma and showed the contribution of each to the three schemas above (8). Lastly, we have previously shown two tumor-intrinsic subtypes of PDAC (6) that we called Moffitt (i) basal-like, given the similarities with basal breast and basal bladder cancer, and (ii) classical, given the overlap with the Collisson classical subtype.

However, consensus regarding proposed subtypes for clinical decision-making in PDAC has been elusive. In addition, each proposed schema utilized independent cohorts of patients to demonstrate clinical relevance. As a result, the generalizability, robustness, and relative clinical utility of each proposed subtyping schema remains unclear. Comparative evaluations of these proposed subtyping systems have been limited, partially due to the difficulty in curating and applying these diverse subtyping approaches in new datasets.

Towards this end, we perform, for the first time, a systematic interrogation of the aforementioned subtyping schemas, based on a meta-analysis of their clinical utility across a large number of independent cohorts in PDAC including two clinical trials with treatment response data. We demonstrate that a tumor-intrinsic two subtype schema from Moffitt et al. (6) is robust and best explains overall survival (OS) and treatment response across multiple validation datasets. Given the clinically replicable performance of this tumor-intrinsic two subtype schema, we have developed a single sample classifier (SSC) that we call Purity Independent Subtyping of Tumors (PurIST) to perform subtype calling for clinical use. We show that PurIST performs well on multiple gene expression platforms including microarray, RNA sequencing (RNAseq), and NanoString. In addition, given the preponderance of non-surgical biopsies in the neoadjuvant and metastatic settings, we demonstrate its clinical utility for small sample volumes using a matched cohort of patients with bulk, archival, and fine needle aspiration (FNA) samples. Lastly, we show the stability of PurIST-predicted subtypes before and after treatment, and that PurIST basal-like subtype tumors are associated with treatment resistance to FOLFIRINOX, strongly supporting the need to incorporate subtyping into clinical trials of PDAC patients.

MATERIALS AND METHODS

Public datasets

Archival data were obtained from public sources (Supplementary Table 1, Figure 1). For the public datasets, expression was used ‘as-is’ with respect to the original publication, i.e. RNAseq data were not re-aligned and gene-level expression estimates were provided in terms of Fragments Per Kilobase per Million reads (FPKM) or Transcripts per Million (TPM), depending on the study.

Figure 1.

Overall study workflow outlining the analyses performed, including the training and validation steps of the PurIST model. Supp. - supplementary. CMH – Cochran-Mantel-Haenszel Test. Strat. Cox PH - stratified Cox proportional hazards model. BIC - Bayesian information criterion.

Sample collection

De-identified bulk and FNA samples (“Yeh_Seq” dataset, Supplementary Table 1) were collected from the IRB-approved University of North Carolina Lineberger Comprehensive Cancer Center Tissue Procurement Core Facility after IRB exemption in accordance with the U.S. Common Rule and were flash frozen in liquid nitrogen. FNA samples were collected ex-vivo at the time of resection. The FNA technique used mirrors standard cytopathology procedures, where three passes were performed using a 22-gauge needle. Palpation was used to localize the tumor. Samples were frozen in either PBS or RNAlater (Millipore Sigma). FFPE samples were prepared, hematoxylin and eosin stained, and reviewed by a single pathologist (KEV) who was blinded to the results. See Supplemental Methods for data processing and analysis of Yeh_Seq samples. RNAseq (GSE131050) and NanoString (GSE131051) data generated from these samples are deposited in Gene Expression Omnibus (GEO).

Sample inclusion for Consensus Clustering (CC) analysis and PurIST training

Each collected public dataset was subjected to sample filtering to retain samples for CC based subtype calling by the Collisson, Bailey and Moffitt schemas, with criteria specified in Supplementary Table 1. We implemented the methods utilized in the original publication pertaining to each schema (Supplemental Methods). When prior subtype calls were available, the original published calls were used. Specifically, in PACA_AU_seq and PACA_AU_array, the original Bailey subtype calls were used; in TCGA_PAAD, the Collisson and Bailey calls were used. Duplicated samples in PACA_AU_seq and PACA_AU_array datasets were only used once, with the subtypes called in PACA_AU_array used when mismatches of subtype calls were found between the two datasets.

For treatment response and survival analysis, samples with available clinical and RNAseq data were used. Specifically, for the pooled survival analysis, samples from the following datasets with RNAseq data and CC calls were utilized: Linehan, Moffitt_GEO_array, PACA_AU_seq, PACA_AU_array and TCGA_PAAD (Survival Group, Supplementary Table 1). To train PurIST, Moffitt schema CC calls from the datasets in the Training Group (Aguirre, Moffitt_GEO_array and TCGA_PAAD, Supplementary Table 1) were utilized. These samples were further filtered to provide final training labels for the PurIST algorithm (Supplementary Tables 1,2) by dropping poorly clustered samples on the clustered dendrogram in each dataset based on visual inspection. We considered these filtered calls as “training labels”. Model training for PurIST is described in the Supplementary Methods.

RESULTS

The Moffitt tumor-intrinsic two subtype schema has important implications for treatment response

To evaluate the potential impact of molecular subtypes on treatment response, we utilize transcriptomic and treatment response data from two independent clinical trials, and perform a systematic analysis of treatment response with respect to CC calls from each of the three different subtyping schemas (Supplementary Methods) for PDAC: Collisson, Bailey, and Moffitt (4-6). We first examined the association of the subtypes from each schema with treatment response using patient samples from a promising Phase 1b trial by Nywening et al. (“Linehan”, Linehan_seq dataset, Supplementary Figure 2, Supplementary Tables 1, 2) of FOLFIRINOX in combination with a CCR2 inhibitor (PF-04136309) in locally advanced PDAC patients, where an objective response was seen in 49% of patients (12). Enrolled patients had no prior treatment, and underwent core biopsies prior to the start of therapy. Twenty-eight patients with RNAseq and treatment data were available for analysis.

We found a significant overall association between categorical treatment response (based on RECIST 1.1 criteria) and pre-treatment subtype classifications from the Moffitt schema (P = 0.0117, Supplementary Table 3), where basal-like tumors showed no response to FOLFIRINOX alone or FOLFIRINOX plus PF-04136309 after stratifying by arm (Overall Response Rate (ORR) = 0%, Disease Control Rate (DCR) = 33%, Supplementary Table 3, Figure 2A, generalized Cochran-Mantel-Haenszel Test), while classical tumors showed a much stronger response overall (ORR = 40%, DCR = 100%). In contrast, we were unable to identify a relationship between subtype and treatment response under the Collisson (P = 0.428) and Bailey (P = 0.113) schemas (Figure 2A, Supplementary Table 3). As the sample size in this Phase 1b trial (n = 28 patients) was small, we similarly reanalyzed the COMPASS trial results (n = 40 patients) in the context of the three subtyping schemas.
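To illustrate the idea of testing association while stratifying by arm, the following minimal Python sketch computes the standard Cochran-Mantel-Haenszel statistic for a set of 2 × 2 tables (subtype by responder status), one table per treatment arm. It is a simplification and not the analysis reported above: the reported test is the generalized version applied to RECIST categories, whereas this sketch dichotomizes response, omits the continuity correction, and uses made-up counts rather than trial data. The function name `cmh_test` and the example tables are hypothetical.

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test for a list of 2x2 tables.

    Each stratum is ((a, b), (c, d)), e.g.
    row 1 = classical (responders, non-responders),
    row 2 = basal-like (responders, non-responders).
    Returns (statistic, p_value); the statistic is referred to a
    chi-square distribution with 1 degree of freedom.
    """
    diff_sum = 0.0  # sum over strata of observed minus expected count in cell a
    var_sum = 0.0   # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 subjects carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff_sum += a - row1 * col1 / n
        var_sum += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var_sum == 0.0:
        raise ValueError("all strata are degenerate; the statistic is undefined")
    statistic = diff_sum ** 2 / var_sum
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts for two arms (NOT the trial data).
example_strata = [
    ((4, 4), (0, 3)),  # arm 1: classical 4/8 responders, basal-like 0/3
    ((6, 8), (0, 3)),  # arm 2: classical 6/14 responders, basal-like 0/3
]
example_statistic, example_p = cmh_test(example_strata)
```

Pooling the observed-minus-expected differences across arms before squaring is what lets the test detect an association that points in the same direction in every arm, even when each arm alone is too small to reach significance.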

Figure 2.

Subtype performance in discriminating treatment response. A-B, Waterfall plots showing the percent change (% change) in size of tumor target lesions from baseline in the context of the Collisson, Bailey, and Moffitt schemas in the (A) Linehan and (B) COMPASS trials. +20% and −30% of size change are marked by dashed lines. Bar colors denote respective subtype calls of pre-treatment samples. RECIST treatment response classification based on these values are given in Supplementary Table 3. A, Colored tracks below each plot denote the subtype calls in pre- and post-treatment. Patients marked with * were treated with FOLFIRINOX. The remainder of patients were treated with FOLFIRINOX+PF-04136309. B, Patients marked with * were treated with gemcitabine/nab-paclitaxel (GP)-based therapy. The remainder of patients were treated with modified FOLFIRINOX (m-FOLFIRINOX). C, Sankey plots showing transitions in subtype pre- and post-treatment in the Linehan trial in the context of the Collisson, Bailey, and Moffitt schemas.

Patients enrolled in COMPASS underwent core needle biopsies and were treated with one of two standard first-line therapies, modified-FOLFIRINOX or gemcitabine plus nab-paclitaxel. Collected patient samples in COMPASS underwent laser capture microdissection (LCM) followed by whole genome sequencing and RNAseq. Subtypes for each schema were determined as mentioned previously. Similar to our findings in the Linehan Phase 1b trial, we found a significant association between the Moffitt two subtype schema with categorical treatment response stratifying by arm (P = 0.00098, generalized Cochran-Mantel-Haenszel Test), where the basal-like subtype had much lower response to either treatment (ORR = 10%, DCR = 50%) relative to the classical subtype (ORR = 36.7%, DCR = 100%). We also found significant associations between treatment response and the subtypes from the Collisson (P = 0.0024) and Bailey (P = 0.0067) schemas. However, we notably observe that the Bailey squamous subtype strongly overlaps with the Moffitt basal-like subtype, and the remaining non-squamous Bailey subtypes appear to overlap strongly with the Moffitt classical subtype (Figure 2B, Cohen’s Kappa = 1.0, P = 2.54 × 10^−10). We similarly found that the Collisson QM-PDA and the remaining non-QM-PDA subtypes correspond strongly with the Moffitt basal-like and classical subtypes, respectively (Figure 2B, Cohen’s Kappa = 0.875, P = 2.44 × 10^−8), a fact also mirrored in the Linehan Trial.
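The agreement measure used above can be sketched as follows. This minimal Python example collapses a multi-subtype schema to two classes (for instance, Bailey squamous versus all non-squamous subtypes) and computes Cohen's Kappa against two-class calls. The helper names `collapse_to_two_classes` and `cohens_kappa`, and the toy calls, are hypothetical and are not taken from the study data.

```python
from collections import Counter


def collapse_to_two_classes(calls, basal_equivalent):
    """Map a multi-subtype schema onto basal-like / classical.

    `basal_equivalent` is the single subtype treated as the counterpart of
    basal-like (e.g. "squamous"); every other subtype maps to "classical".
    """
    return ["basal-like" if c == basal_equivalent else "classical" for c in calls]


def cohens_kappa(calls_a, calls_b):
    """Cohen's Kappa: (observed - chance agreement) / (1 - chance agreement)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    observed = sum(x == y for x, y in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Chance agreement from the product of the two marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one label only")
    return (observed - expected) / (1.0 - expected)


# Toy calls for four hypothetical samples (NOT study data).
toy_bailey = ["squamous", "ADEX", "immunogenic", "pancreatic progenitor"]
toy_moffitt = ["basal-like", "basal-like", "classical", "classical"]
toy_kappa = cohens_kappa(collapse_to_two_classes(toy_bailey, "squamous"), toy_moffitt)
```

A Kappa of 1.0 requires that every sample receive the same collapsed label under both schemas; values near 0 indicate agreement no better than expected from the marginal subtype frequencies.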

Given these observations, we formally evaluated the relative clinical utility of each subtyping system using non-nested model selection criteria such as Bayesian information criterion (BIC) (13). Briefly, such criteria evaluate model fit relative to the complexity of the model, as models with more predictors (subtypes) may simply have better fit due to overfitting, and also may contain excess predictors (additional subtypes) that do not contribute meaningfully in differentiating clinical outcomes. The model with the lowest BIC in a series of competing candidate models is preferred in statistical applications, regardless of the magnitude of the difference (14). Considering response as a continuous outcome (% change in tumor volume), we find that the Moffitt schema had the best (lowest) BIC score in both datasets (Linehan BIC = 247.37, COMPASS BIC = 378.75, two-way ANOVA model, Supplementary Table 3), compared to the Collisson (Linehan BIC = 254.63, COMPASS BIC = 382.8) and Bailey (Linehan BIC = 250.75, COMPASS BIC = 385.66) schemas. This result similarly held if we considered response as a categorical variable (ordinal regression model, Supplementary Table 3). This finding is also reflected among the non-QM-PDA and non-squamous subtypes (Figure 2A, B, Supplementary Table 3), where little difference in response can be seen between these subtypes. Our results using BIC suggest that the additional subtypes found in the Collisson and Bailey schemas do not demonstrate additional benefit in differentiating treatment response over the Moffitt two subtype schema. Taken together, these results suggest that the Moffitt basal-like and classical subtypes strongly and parsimoniously explain treatment response relative to other schemas in both clinical trials.
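The trade-off that BIC encodes can be made concrete with a small sketch. The Python example below fits a Gaussian model with one mean per subtype and computes BIC = k ln(n) − 2 ln(L), where k counts the group means plus the shared variance. This is a simplified one-way version used only for illustration: the analysis above uses a two-way ANOVA that also includes treatment arm, and the values and labels here are invented. The function name `gaussian_group_bic` is hypothetical.

```python
import math
from collections import defaultdict


def gaussian_group_bic(values, labels):
    """BIC of a Gaussian model with one mean per label and a shared variance."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n = len(values)
    rss = 0.0  # residual sum of squares around each group's own mean
    for members in groups.values():
        mean = sum(members) / len(members)
        rss += sum((v - mean) ** 2 for v in members)
    if rss <= 0.0:
        raise ValueError("zero residual variance; the likelihood is degenerate")
    sigma2 = rss / n  # maximum likelihood estimate of the variance
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(sigma2) + 1.0)
    k = len(groups) + 1  # one mean per group plus the shared variance
    return k * math.log(n) - 2.0 * log_lik


# Invented % change values for nine hypothetical patients (NOT trial data).
pct_change = [-40, -35, -30, -25, -20, -15, 10, 20, 30]
two_subtypes = ["classical"] * 6 + ["basal-like"] * 3
# A three-subtype schema that splits the classical group without separating response.
three_subtypes = ["A", "B", "A", "B", "A", "B"] + ["basal-like"] * 3

bic_two = gaussian_group_bic(pct_change, two_subtypes)
bic_three = gaussian_group_bic(pct_change, three_subtypes)
```

In this toy setting the three-subtype model has a slightly smaller residual sum of squares, but the extra ln(n) penalty for the additional mean outweighs that gain, so the two-subtype model has the lower BIC.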

The Linehan Phase 1b trial captured both pre- and post-treatment biopsies, providing a unique opportunity to evaluate the stability of molecular subtypes after treatment. As pre- and post-treatment biopsies are unlikely to be obtained from the same location, these samples may also provide an opportunity to evaluate intra-patient tumor heterogeneity. Interestingly, we found strong stability in the Moffitt schema subtypes in pre- and post-treatment biopsies (Figure 2C, Cohen’s Kappa = 1.0, P = 2.54 × 10^−10), suggesting that not only may there be less tumor-intrinsic subtype heterogeneity within a tumor, but also that the Moffitt schema subtypes are not affected by treatment, either with FOLFIRINOX or with the addition of the CCR2 inhibitor. In contrast, we found higher rates of switching in Collisson subtypes pre- to post-treatment (Figure 2C, Supplementary Figure 2), where changes in the exocrine-like and classical subtypes were more common. Similarly, the non-squamous Bailey subtypes appeared to show the highest rate of subtype switching pre- and post-treatment, with the ADEX subtype demonstrating the highest rate of switching among these subtypes (Supplementary Figure 2). It is unclear whether there is any clinical significance to such subtype transitions. Prior studies have suggested that the Bailey ADEX, Bailey immunogenic, and Collisson exocrine-like subtypes are confounded by tumor purity in contrast to the Moffitt subtypes (7-9), which may explain some of the increased heterogeneity in subtypes pre- and post-treatment in these schemas. In contrast, the Collisson QM-PDA and Bailey squamous subtypes, which were shown to overlap strongly with the Moffitt basal-like subtype, were observed to be much more stable between the two time points.

The tumor-intrinsic two subtype schema strongly and replicably differentiates patient survival across multiple studies

Given the paucity of available genomic data in the context of treatment response in PDAC, we also perform a meta-analysis of five independent patient cohorts with OS data available: Linehan_seq, Moffitt GEO array (GSE71729), ICGC PACA_AU array, ICGC PACA_AU seq and TCGA PAAD (Survival Group, Supplementary Figure 3, Supplementary Tables 1, 2). To determine the potential replicability of the different subtyping schemas (Collisson, Bailey, Moffitt) in differentiating clinical outcomes, we utilized CC subtype calls from each schema.

We find that the Moffitt tumor-intrinsic two subtype schema reliably differentiates survival across individual datasets (Supplementary Figure 1, Supplementary Table 4), showing significant associations with OS in the majority of individual studies in contrast to other schemas. After pooling datasets, we found that patients with Moffitt basal-like subtype tumors have significantly worse prognosis compared to the Moffitt classical subtype (Figure 3C, stratified HR = 1.98, P < 0.0001, stratified Cox proportional hazards model). We also observed similar trends in the Bailey squamous and Collisson QM-PDA subtypes relative to other subtypes in the same schemas (Figure 3A, B), mirroring our treatment response results from the previous section (Figure 2A, B). However, overall subtype-specific survival differences were most pronounced within the two subtype schema across studies (Supplementary Table 4), compared to the Collisson (P = 0.069) and Bailey (P = 0.076) schemas. Moreover, we find that non-squamous subtypes in the Bailey schema have very similar OS to one another (Figure 3B), where a direct overall comparison of these subtypes showed no statistically significant differences in survival in our pooled dataset (immunogenic vs ADEX stratified HR = 1.07, pancreatic progenitor vs ADEX HR = 1.01, overall P = 0.82). We find a similar result when comparing survival among patients from the non-QM-PDA subtypes in the Collisson schema in the pooled data (Figure 3A, exocrine-like vs. classical stratified HR = 1.17, P = 0.344).
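For readers less familiar with the stratified model referred to above, its general form can be written as follows. This is the textbook formulation, given here as an illustration rather than the exact specification used in the analysis; the symbols s (dataset index), x (subtype indicator, 1 for basal-like and 0 for classical), and β are notation introduced for this sketch.

```latex
% Stratified Cox proportional hazards model: each dataset s has its own
% unspecified baseline hazard h_{0s}(t), while the subtype effect \beta is shared.
h_s(t \mid x) = h_{0s}(t)\,\exp(\beta x),
\qquad
\mathrm{HR}_{\text{basal-like vs classical}} = \exp(\beta).
% Example: a stratified HR of 1.98 corresponds to \beta = \ln(1.98) \approx 0.68.
```

Because the baseline hazard is allowed to differ by dataset, differences in overall survival between cohorts do not contribute to the estimated subtype effect; only within-dataset comparisons do.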

Figure 3.

Subtype performance in predicting patient prognosis in pooled datasets from the Survival Group (Supplementary Table 1). Kaplan-Meier plots of OS in the context of the subtyping schemas of A, Collisson, B, Bailey and C, Moffitt. Log-rank p-values for overall association were determined from stratified Cox proportional hazards models, where dataset was used as a stratification factor to account for variation in baseline hazard across studies. BIC was calculated to compare the three subtyping schemas.

In our pooled dataset, strong correspondence was again found between the Bailey squamous, Collisson QM-PDA, and Moffitt basal-like subtypes, and between the Moffitt classical subtype and the remaining subtypes in the Bailey (Cohen’s Kappa = 0.56, P = 0, Supplementary Figure 3A) and Collisson (Cohen’s Kappa = 0.4, P = 0, Supplementary Figure 3B) schemas. In TCGA PAAD, where estimates of tumor purity were available, Moffitt classical patients that were also classified as QM-PDA in the Collisson schema had much lower tumor purity than other samples (P = 0.0016, Supplementary Figure 3C). The Bailey ADEX and immunogenic samples also had lower tumor purity, regardless of whether they were called Moffitt classical or basal-like (Supplementary Figure 3D). These findings are similar to other studies (7-9), and suggest that the discordance in subtype assignment between schemas may be driven by tumor purity.

To determine the best fitting model for OS, we calculate BIC with respect to the stratified Cox proportional hazards model pertaining to each schema. Similar to our analysis of treatment response, we find that the Moffitt two subtype schema has the best (lowest) BIC and therefore has the best and most parsimonious fit to the pooled survival data (Figure 3, Supplementary Table 4). We also find this to be the case in the majority of individual studies, replicated across each of our validation datasets (Supplementary Table 4). These results reflect our finding that no difference in OS can be observed among the Collisson non-QM-PDA and Bailey non-squamous subtypes in our pooled analysis. In total, these findings support the conclusion that the Moffitt two subtype schema strongly and parsimoniously explains differences in OS, compared to alternate subtyping schemas. Our results further suggest that the additional subtypes found in the Collisson and Bailey schemas do not demonstrate additional clinical benefit in terms of predicting OS relative to the simpler Moffitt two subtype schema, based on BIC and direct statistical comparison of the Collisson non-QM-PDA and Bailey non-squamous subtypes. Given the robustness and highly replicable clinical utility of the Moffitt schema, we next developed a single-sample classifier (SSC) based on this tumor-intrinsic two subtype schema in order to avoid reliance on CC-based analysis.

PurIST SSC

The ability to resolve and assign subtypes via clustering is limited when applied to individual patients. Re-clustering new samples with existing training samples may also change existing subtype assignments. Thus, we developed a robust SSC, PurIST, to predict subtype in individual patients, based on our three largest bulk gene expression datasets (TCGA PAAD, Aguirre Biopsies, and Moffitt GSE71729, Training Group, Supplementary Table 1). A key element of our method includes the utilization of tumor-intrinsic genes previously identified (6) to avoid the possible confounding of tumor gene expression with those from other tissue types. For model training we designated training labels (Supplementary Methods). We use rank-derived quantities as predictors in our final SSC model instead of the raw expression values, utilizing the k Top Scoring Pair (kTSP) approach to generate these predictors (Supplementary Methods). The motivation of this approach is that while the raw values of gene expression may be on different scales in different studies, their relative magnitudes can be preserved by ranks.

We find that this type of rank transformation of the raw expression data has several advantages. First, a single predictor (TSP) only depends on the ranks of raw gene expression of a gene pair in a sample. Hence, its value is robust to overall technical shifts in raw expression values (e.g., due to variation in sequencing depth), and, as a result, is less sensitive to common between-sample normalization procedures of data pre-processing (15-17). Second, it simplifies data integration over different training studies as data are on the same scale. Lastly, prediction in new patients is also simplified, as normalizing new patient data to the training set, a step that may itself affect the accuracy of model predictions (16), is no longer necessary.
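The first advantage can be demonstrated with a short sketch. The Python example below computes TSP indicators for one sample and shows that they are unchanged when the sample's expression values are rescaled (as with a change in sequencing depth) or log-transformed, since any strictly increasing transformation applied within a sample preserves the ordering of each gene pair. The gene names, expression values, and function names are placeholders invented for this illustration.

```python
import math


def tsp_indicator(expression, gene_a, gene_b):
    """Return 1 if gene A is expressed more highly than gene B in this sample."""
    return 1 if expression[gene_a] > expression[gene_b] else 0


def tsp_vector(expression, gene_pairs):
    """TSP indicators for every (gene A, gene B) pair, in order."""
    return [tsp_indicator(expression, a, b) for a, b in gene_pairs]


# Placeholder gene pairs and expression values (NOT the PurIST genes).
gene_pairs = [("GENE_A1", "GENE_B1"), ("GENE_A2", "GENE_B2")]
raw = {"GENE_A1": 120.0, "GENE_B1": 35.0, "GENE_A2": 4.0, "GENE_B2": 60.0}

# The same sample as it might appear at a different depth or on a log scale.
rescaled = {gene: 3.5 * value for gene, value in raw.items()}
logged = {gene: math.log2(value + 1.0) for gene, value in raw.items()}

raw_tsps = tsp_vector(raw, gene_pairs)
rescaled_tsps = tsp_vector(rescaled, gene_pairs)
logged_tsps = tsp_vector(logged, gene_pairs)
```

All three vectors are identical, which is why predictors of this form do not require the new sample to be normalized against the training data. Transformations that act differently on different genes (for example, gene-wise centering across a cohort) are not covered by this argument.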

Development and External Validation of PurIST Classifier

We apply a systematic procedure (Supplementary Methods) implementing the above approach to derive our PurIST model for prediction in the tumor-intrinsic two subtype schema given the training labels (Supplementary Methods) and rank transformed predictors for each training sample. The selected eight gene pairs (TSPs), fitted model, and model coefficients are given in Supplementary Table 5. Figure 4A (Supplementary Methods) describes the validation that is performed in a hypothetical new patient by computing the values of each of the eight selected TSPs in that patient, where a value of 1 is assigned if the first gene in a TSP, gene A, has greater expression than the second gene, gene B, in that patient (and assigned 0 value otherwise). These values are then multiplied by the corresponding set of estimated TSP model coefficients, summing these values to get the patient “TSP Score” after correction for estimated baseline effects. This score is then converted to a predicted probability of belonging to the basal-like subtype, where values greater than 0.5 suggest basal-like subtype membership and the classical subtype otherwise.
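The prediction steps described above can be sketched in a few lines of Python. The example is illustrative only: it uses three placeholder gene pairs with invented coefficients and intercept, whereas the actual model uses the eight gene pairs and fitted coefficients reported in Supplementary Table 5. The function names are hypothetical.

```python
import math


def tsp_score(expression, gene_pairs, coefficients, intercept):
    """Intercept plus the sum of coefficient * indicator over all TSPs."""
    score = intercept
    for (gene_a, gene_b), coefficient in zip(gene_pairs, coefficients):
        indicator = 1 if expression[gene_a] > expression[gene_b] else 0
        score += coefficient * indicator
    return score


def basal_like_probability(score):
    """Inverse logit transform of the TSP score."""
    return 1.0 / (1.0 + math.exp(-score))


def call_subtype(probability, threshold=0.5):
    """Basal-like if the predicted probability exceeds the threshold."""
    return "basal-like" if probability > threshold else "classical"


# Placeholder model (NOT the fitted PurIST gene pairs or coefficients).
gene_pairs = [("GENE_A1", "GENE_B1"), ("GENE_A2", "GENE_B2"), ("GENE_A3", "GENE_B3")]
coefficients = [2.0, 2.0, 2.0]
intercept = -3.0

# Hypothetical sample in which gene A exceeds gene B in every pair.
sample = {"GENE_A1": 9.0, "GENE_B1": 2.0, "GENE_A2": 7.0,
          "GENE_B2": 1.0, "GENE_A3": 8.0, "GENE_B3": 3.0}

sample_probability = basal_like_probability(
    tsp_score(sample, gene_pairs, coefficients, intercept))
sample_call = call_subtype(sample_probability)
```

Because only the within-sample ordering of each gene pair enters the score, the same function can be applied to a single new sample from any platform without reference to other samples.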

Figure 4.

Development and Validation of the PurIST SSC classifier. A, Overview of PurIST prediction procedure. Gene expression for genes pertaining to each PurIST TSP is first measured in a new sample. Values are assigned for each TSP given the relative expression of each gene in the TSP (1 if Gene A > Gene B expression in the pair, 0 otherwise). Given the set of estimated PurIST TSP coefficients, a TSP Score is calculated by summing the product of each TSP and its corresponding TSP coefficient, adjusting for the model intercept. This value is finally transformed into a predicted probability of belonging to the basal-like subtype for classification (inverse logit function). B, Heatmap of expression values pertaining to PurIST genes and patients from six validation studies belonging to the Validation Group (Group Membership, Supplementary Table 1). Columns are ordered by PurIST SSC-predicted basal-like probability. Genes pertaining to each TSP are presented in order along the rows, where the gene in each pair with higher expression in the basal-like subtype is labeled with an orange bar on the left track and blue otherwise. CC subtype and training labels used for PurIST SSC training (white bars indicate not utilized for training) show strong correspondence with the SSC-predicted subtypes. Switching in relative gene expression within each gene pair can be observed with respect to subtype. Expression values across PurIST genes were rank transformed to equalize the expression scales across studies. SSC predicted basal-like probabilities were separated into subclasses to illustrate the level of model confidence in prediction (Methods). C, Barplot of SSC confidence levels categorizing the predicted basal-like class probabilities indicates that the majority of predictions are highly confident, with few in the “likely” ranges of each (predicted probabilities between 0.4-0.6). D, Misclassification rates among higher confidence predictions (Strong classical/basal) in either subtype are very low. Shading of bars indicates the relative percentage of each CC subtype in a given prediction category. Lower confidence predictions (Likely/Lean categories), as expected, have higher misclassification rates with respect to CC subtypes but are less frequent overall. E, Consensus ROC curve derived across all six validation studies (blue) in addition to ROC curves pertaining to each individual study (grey) are presented. The consensus AUC is relatively high at 0.993, indicating excellent prediction performance utilizing the CC subtypes as ground truth. F, Inter-study variability curve indicates low variability (y-axis) across studies with respect to various basal-like predicted probability thresholds (x-axis) utilized to classify subjects as basal or classical. At the standard basal-like probability threshold of 0.5, the between-study variability in ROC curves is very low, suggesting strong replicability in classification performance of the PurIST SSC across studies. G, Table showing individual metrics assessing PurIST performance in recapitulating CC subtypes in each validation dataset.

To assess the quality of our prediction model, we evaluate the cross-validation error of the final model in our Training Group. We find that the internal leave-one-out cross validation error for PurIST on the Training Group is low (3.1%). To validate this model, we apply it to the Validation Group datasets (Supplementary Table 1, Figure 4) and determine whether PurIST predictions recapitulate the CC subtypes in each study. We find that pooled validation samples strongly segregate by CC subtype when sorted by their predicted basal-like probability, despite diverse studies of origin (Figure 4B). This suggests that our methodology avoids potential study-level batch effects. The relative expression of classifier genes within each classifier TSP (paired rows, Figure 4B) strongly discriminates between subtypes in each sample, forming the basis of our robust TSP-oriented approach for subtype prediction (Supplementary Figure 4). We also find that, visually, predicted subtypes from PurIST have strong correspondence with independently-determined CC subtypes. Overall, the PurIST classifier predicted subtypes with high levels of confidence (Figure 4C), with most basal-like subtype predictions having predicted basal-like probabilities > 0.9 (Strong basal-like), and most classical subtype predictions with predicted basal probabilities of < 0.1 (Strong classical). Among these high confidence predictions, the majority of these calls corresponded with subtypes obtained independently via CC (Figure 4D). Lower confidence calls (Likely/Lean basal-like/classical categories of prediction) had higher rates of misclassification, although these less confident calls were more rare in our validation datasets (Figure 4C).

To evaluate the overall classification performance of PurIST across studies, we apply a nonparametric meta-analysis approach to obtain a consensus ROC curve based on the individual ROC curves from each validation study (18). We found that the overall consensus Area Under the Curve (AUC) is high, with a value of 0.993. ROC curves from individual studies were also consistent (Figure 4E). In addition, we find that the estimated inter-study variability of these ROC curves with respect to predicted basal-like probability threshold t is low overall, with relatively higher variance at low thresholds and almost no variability at our standard threshold of 0.5 or greater (Figure 4F). This reflects the similarity of individual ROC curves seen in Figure 4E. We find that within our validation datasets, the prediction accuracy rates were in general 90% or higher, and individual study AUCs were 0.95 or greater (Figure 4G). Furthermore, sensitivities and specificities were often high and in some cases equal to 1, reflecting near perfect classification accuracy. These results suggest that PurIST is robust across multiple datasets and platforms and recapitulates the subtypes independently obtained via CC, which we have shown to have high clinical utility.
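The per-study metrics reported above (AUC, sensitivity, and specificity with CC subtypes as the reference) can be computed along the following lines. This minimal Python sketch covers a single study only; it does not implement the nonparametric meta-analysis used to derive the consensus ROC curve (18). The function names and the toy probabilities are hypothetical.

```python
def auc_from_probabilities(probabilities, is_basal):
    """AUC as the chance that a basal-like sample outranks a classical one.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    (basal-like, classical) pairs; ties count as one half.
    """
    positives = [p for p, y in zip(probabilities, is_basal) if y]
    negatives = [p for p, y in zip(probabilities, is_basal) if not y]
    if not positives or not negatives:
        raise ValueError("both subtypes must be present to compute an AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(probabilities, is_basal, threshold=0.5):
    """Sensitivity and specificity for basal-like calls at a given threshold.

    Assumes both subtypes are present among the reference labels.
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, is_basal):
        predicted_basal = p > threshold
        if y and predicted_basal:
            tp += 1
        elif y:
            fn += 1
        elif predicted_basal:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy predicted probabilities and reference CC labels (NOT study data).
toy_probabilities = [0.95, 0.80, 0.45, 0.30, 0.10, 0.05]
toy_is_basal = [True, True, True, False, False, False]

toy_auc = auc_from_probabilities(toy_probabilities, toy_is_basal)
toy_sensitivity, toy_specificity = sensitivity_specificity(toy_probabilities, toy_is_basal)
```

The toy example also shows why AUC and threshold-based accuracy can differ: a basal-like sample with a predicted probability of 0.45 is ranked above every classical sample, so it does not reduce the AUC, yet it is called classical at the 0.5 threshold and lowers sensitivity.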

Replicability of PurIST in archival formalin-fixed and paraffin embedded (FFPE) and FNA samples

Because frozen bulk tumor samples are not commonly available in routine clinical practice, we next looked at the replicability of PurIST predictions across sample types that are more widely collected in clinical practice. Notably, nearly all pre-operative and metastatic biopsies are obtained using either fine-needle aspiration (FNA) or core biopsy techniques. Prior studies have shown the feasibility of performing RNAseq on core biopsies (11) and endoscopic ultrasound guided FNAs, both of which are commonly utilized in the diagnosis of pancreatic cancer (19). We therefore evaluated the performance of PurIST in both FFPE and FNA samples.

Among 47 pairs of matched FNA and bulk samples that passed quality control (Yeh_Seq dataset, Supplementary Methods), we found significant agreement between the PurIST subtype calls of the matched FNA and bulk samples (Cohen’s Kappa = 0.544, p = 2.8e-05) (Figure 5A, Supplementary Table 1). Only three pairs of samples (6.4%) show disagreement in subtype calling results using PurIST. CC calls of the bulk samples are also shown as a comparison. We performed a similar evaluation with tumors for which we had matched FFPE, FNA, and bulk samples available (Figure 5B, Supplementary Table 1). We found complete agreement in PurIST subtype predictions among FFPE, FNA, and bulk samples in patients who had all three sample types available (five sets total), further supporting that PurIST is robust across different sample preparations. We also found that the genes pertaining to PurIST TSPs are comparatively less variable than genes not designated as tumor-intrinsic (Supplementary Figure 5). For example, PurIST TSP genes, originally selected from our tumor-intrinsic gene list, have significantly higher Spearman correlation between sample types than Bailey immunogenic (p = 0.0149) or ADEX genes (p = 0.0083) (Supplementary Figure 5), using a permutation test (Supplementary Methods). The stability of TSP genes across sample types supports their robustness and their ability to identify tumor-intrinsic signals in samples that may be confounded by low input or degradation.
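
Cohen's kappa, used above to quantify agreement between matched sample types, corrects the observed agreement for the agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). A minimal sketch follows. The function name and the paired calls are hypothetical and do not reproduce the study data or the reported kappa.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two paired lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of pairs with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1.0 - expected)


# Made-up matched calls: 10 pairs with one disagreement.
bulk_calls = ["classical"] * 8 + ["basal-like"] * 2
fna_calls = ["classical"] * 7 + ["basal-like"] * 3
kappa = cohens_kappa(bulk_calls, fna_calls)
```

Because the chance term grows when one subtype dominates, kappa can be moderate even when the raw agreement rate is high.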

Figure 5.

Comparison of CC and PurIST performance on different sample collections and gene expression platforms. A, Comparison of subtyping results using PurIST between matched FNA and bulk RNAseq samples. PurIST scores (estimated log-odds of a sample being basal-like vs. classical, Supplementary Methods) are in the upper waterfall plot (solid dark slate gray: bulk; solid light blue: FNA) and inferred subtypes are in the corresponding boxes below (solid blue: classical; solid orange: basal-like). Bulk RNAseq CC results are indicated by blue (classical) and orange (basal-like) square borders. “*” – ampullary carcinoma. B, Subtyping results using PurIST across matched FNA, FFPE and bulk RNAseq samples. PurIST scores are in the upper waterfall plot (solid dark slate gray: bulk; solid light blue: FNA; solid brown: FFPE) and inferred subtypes are in the corresponding boxes below (solid blue: classical; solid orange: basal-like). As a comparison, bulk RNAseq CC results are indicated by blue (classical) and orange (basal-like) square borders. “*” – NanoString. C, Subtyping results using PurIST (and PurIST-n for NanoString) between matched RNAseq and NanoString samples. The PurIST(-n) scores (estimated log-odds of a sample being basal-like vs. classical, Methods) are in the upper waterfall plot (solid red: RNAseq; solid green: NanoString). Inferred subtypes are in the corresponding boxes below (solid blue: classical; solid orange: basal-like). Bulk RNAseq CC results are indicated by blue (classical) and orange (basal-like) square borders. (A-C) Border colors of waterfall plots indicate sample origin. Cohen’s kappa coefficients and p-values measure the agreement on subtype calls between bulk and FNA in (A), and RNAseq and NanoString in (C).

Replicability of PurIST predictions on a NanoString platform

RNAseq assays in Clinical Laboratory Improvement Amendments (CLIA) certified laboratories are still in their infancy. Thus, we evaluated the performance of PurIST on samples using NanoString nCounter, a gene expression quantification system that directly quantifies molecular barcodes. This platform has been widely used in cancer molecular subtyping (20), and is more widely available in CLIA certified laboratories. In samples with both RNAseq and NanoString platform expression data available, we evaluated the consistency between subtype calls based on their RNAseq and NanoString expression data using PurIST-n (Figure 5C, Supplementary Table 5, Supplementary Methods). This updated classifier is trained in a manner similar to PurIST, with the exception that genes were limited to those in common between the two platforms, as a more limited set of genes was available for our NanoString probeset. We found that there was strong agreement between PurIST-n calls in 51 patients with matched RNAseq/NanoString samples (Cohen’s Kappa = 0.879, p = 2.25e-11), where only one sample showed disagreement in its PurIST-n call. This discrepancy may be due to the relatively lower read count in the RNAseq sample for this patient. In addition, it is noteworthy that the PurIST-n call for this sample is a low confidence call (“lean classical”). These results support the replicability of PurIST on the NanoString platform and suggest that NanoString may be more robust at overcoming the hurdles of low input or degraded samples.

Applicability of PurIST to treatment decision-making

We next evaluated the potential utility of using PurIST for clinical decision making. In basal-like and classical samples that were classified by PurIST, we found significant survival differences in both the pooled public (with all Training Group samples removed) and the Yeh_Seq FNA datasets, with basal-like samples showing shorter OS (Figure 6A, B, Supplementary Figure 6, Supplementary Table 4). We then looked at the relevance of PurIST to treatment response in the COMPASS and Linehan trials (Figure 6C, D). PurIST recapitulated 48 out of 49 PDAC subtype calls compared to the previous CC based calls in the COMPASS dataset, and 66 out of 66 subtype calls in the Linehan dataset (Supplementary Tables 1, 2). Only one patient with a CC classical tumor was called basal-like by PurIST and had stable disease (SD, % change > −30% and < 20%) in the COMPASS trial. Notably, the only partial response (PR) seen in a PurIST basal-like tumor was in a patient with an unstable DNA subtype (10). In agreement with our CC analysis (Figure 2B), we find that PurIST predicted subtype tumors had similar associations with treatment response (Figure 6C, D, Supplementary Table 3). We also found no change in PurIST subtype or the confidence of the call after treatment, suggesting that PurIST tumor subtypes are unchanged after treatment with FOLFIRINOX +/− PF-04136309 (Figure 6D, E). Finally, after excluding the sample with an unstable DNA subtype, we show a positive correlation between PurIST basal-like predicted class probabilities and worse treatment response in basal-like tumors (Figure 6F). No association of PurIST classical confidence and treatment response was seen (Figure 6G).
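
The response categories referenced above follow from the percent change in target lesion size. A minimal sketch, assuming RECIST-style cut-offs: the SD window (> −30% and < 20%) is from the text, while labeling changes at or below −30% as PR and at or above +20% as progressive disease (PD) is an assumption. Complete response and new-lesion rules are ignored, and the function name is hypothetical.

```python
def response_category(pct_change):
    """Classify percent change in target lesion size from baseline.

    SD window (> -30 and < 20) follows the text; PR/PD labels at the
    boundaries are assumed from RECIST-style conventions.
    """
    if pct_change <= -30.0:
        return "PR"  # partial response
    if pct_change >= 20.0:
        return "PD"  # progressive disease (assumed label)
    return "SD"      # stable disease


# Made-up percent changes for illustration.
categories = [response_category(x) for x in (-45.0, -10.0, 5.0, 25.0)]
```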

Figure 6.

Clinical relevance of the PurIST single sample classifier in datasets belonging to the Treatment Group. A-B, Kaplan-Meier plots of OS in (A) pooled datasets belonging to the Survival Group minus datasets belonging to the Training Group and (B) Yeh_Seq FNA samples. P-value and hazard ratio for overall association were estimated by a stratified Cox proportional hazards model in (A), where dataset of origin was used as a stratification factor. C-D, Waterfall plots showing the percent change (% change) in size of tumor target lesions from baseline in the context of PurIST subtypes in the (C) COMPASS and (D) Linehan trials. +20% and −30% of size change are marked by dashed lines. C, Bar colors denote PurIST subtype calls of the patient tumors. Patients marked with * were treated with gemcitabine/nab-paclitaxel (GP)-based therapy, and the rest were treated with modified FOLFIRINOX (m-FOLFIRINOX). D, Bar colors denote PurIST subtype calls of pre-treatment samples. Colored tracks below compare subtype calls for pre- and post-treatment samples using PurIST subtyping and the Moffitt schema. Patients marked with * were treated with FOLFIRINOX, and the rest were treated with FOLFIRINOX+PF-04136309. E, Correlation between the PurIST score (basal-like probability) for patient samples pre- and post-treatment in the Linehan trial. Basal-like samples are colored orange and classical samples are colored blue. F-G, Correlation between the percent change (% change) of tumors and the PurIST score (basal-like probability) in (F) basal-like and (G) classical samples, excluding the basal-like sample with an unstable DNA subtype.

Discussion

Several subtyping systems for pancreatic cancer have now been proposed. Despite this, several limitations remain before they can be clinically usable. Here we leverage the wealth of transcriptomic studies that have been performed in pancreatic cancer to determine the molecular subtypes that may be most clinically useful and replicable across studies. Our results show that while multiple molecular subtypes may be used to characterize patient samples, the two tumor-intrinsic subtypes from the Moffitt schema, basal-like (overlaps with Bailey squamous/Collisson QM-PDA) and classical (overlaps with non-Bailey squamous/non-Collisson QM-PDA), are the most concordant and clinically robust. The compelling findings of basal-like tumors showing resistance to FOLFIRINOX and the lack of objective studies comparing the current first-line therapies, FOLFIRINOX vs. gemcitabine plus nab-paclitaxel, strongly support the need to evaluate the role of molecular subtyping in treatment decision making for PDAC patients. Therefore, we have developed a SSC based on the two tumor-intrinsic subtypes that avoids the instability associated with current strategies of clustering multiple samples and the low tumor purity issues in PDAC samples.

Prior studies have shown that merging samples from multiple studies (horizontal data integration) can improve the performance of prediction models, relative to training on individual studies (21). However, systematic differences in the scales of the expression values in each dataset are often observed, as some may have been separately normalized prior to their publication or were generated from a variety of expression platforms. Complicated cross-platform normalizations are often employed in such situations prior to model training. Furthermore, new samples must be normalized to the training dataset prior to prediction in order to obtain relevant predicted values. This often results in a “test-set bias” (16), where predictions may change due to the samples in the test set or the normalization approach used. In addition, prediction models may change with the addition of new training samples, as renormalizations may be warranted among training samples. In all, this leads to potential complications for data merging, stability of prediction, and model accuracy (22,23). We present PurIST, which is not dependent on cross-study normalization and is robust to platform type and sample collection differences. We show that the sensitivity and specificity of PurIST calls are high across multiple independent studies, demonstrating that the PurIST classifier recapitulates the tumor-intrinsic subtype calling obtained initially by CC. Given the significant clinical relevance of the two tumor-intrinsic subtypes for both prognosis and treatment response, and the high accuracy of predicted subtype calls in our validation datasets, PurIST may have tremendous clinical value. Specifically, we show that PurIST works for gene expression data assayed across multiple platforms, including microarrays, RNAseq and NanoString. Furthermore, the algorithm provides replicable classification for matched samples from snap frozen bulk tissue as well as FNA, core biopsies and archival tissues.
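
One way to see why a TSP-based classifier can avoid cross-study normalization is that its inputs are within-sample comparisons, which are unchanged by any strictly increasing transformation applied to a sample's expression values. The sketch below illustrates this property under stated assumptions: the gene names, expression values, and transformations are hypothetical and are not taken from PurIST.

```python
import math


def tsp_indicators(expr, pairs):
    """Return 1 for each pair where gene A is expressed above gene B, else 0."""
    return [1 if expr[gene_a] > expr[gene_b] else 0 for gene_a, gene_b in pairs]


# Hypothetical gene pairs and raw expression values for a single sample.
PAIRS = [("GENE_A1", "GENE_B1"), ("GENE_A2", "GENE_B2")]
raw = {"GENE_A1": 120.0, "GENE_B1": 15.0, "GENE_A2": 3.0, "GENE_B2": 40.0}

# Two strictly increasing transformations, standing in for different
# platform scales or normalization choices.
log_scaled = {gene: math.log2(value + 1.0) for gene, value in raw.items()}
rescaled = {gene: 0.01 * value + 5.0 for gene, value in raw.items()}

indicators_raw = tsp_indicators(raw, PAIRS)
indicators_log = tsp_indicators(log_scaled, PAIRS)
indicators_rescaled = tsp_indicators(rescaled, PAIRS)
```

All three indicator vectors are identical, so a model built on them returns the same prediction for the sample regardless of which scale was used. This invariance holds only for transformations applied uniformly across genes within a sample; gene-specific platform effects could still alter the ordering within a pair.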

Thus, PurIST may be flexibly used on low input and more degraded samples and may be performed with targeted gene expression platforms such as NanoString, avoiding the need for a CLIA RNAseq assay. Our enduring findings that basal-like subtype tumors are significantly less likely to respond to FOLFIRINOX-based regimens strongly support the need for the incorporation of molecular subtyping in future clinical trials in order to determine the association of molecular subtypes with this and other therapies. In addition, the stability of PurIST subtypes after treatment is a noteworthy finding and may point to fundamental biological differences in the tumor subtypes. However, larger clinical trials with pre- and post-treatment biopsies will be needed in order to determine if this is a treatment dependent observation. Our ability to subtype based on either core or FNA biopsies considerably increases the flexibility and practicality of integrating PDAC molecular subtypes into future clinical trials in the metastatic and neoadjuvant settings, where bulk specimens are rarely available.

In summary, we present a clinically usable SSC that may be used on any type of gene expression data including RNAseq, microarray, and NanoString, and on diverse sample types including FFPE, core biopsies, FNAs, and bulk frozen tumors. While the results on the association of FOLFIRINOX resistance with basal-like subtype tumors are compelling, future prospective clinical trials in PDAC patients will be needed to evaluate the utility of PurIST in treatment decision making, and in the context of different therapies.

STATEMENT OF TRANSLATIONAL RELEVANCE.

Molecular subtyping for pancreatic cancer has made substantial progress in recent years, facilitating the optimization of existing therapeutic approaches to improve clinical outcomes in pancreatic cancer. We show that a tumor-intrinsic two subtype schema is the most replicable and clinically robust across different subtype schemas, with basal-like subtype tumors showing resistance to FOLFIRINOX-based regimens in two independent clinical trials. Our results strongly support the need to evaluate molecular subtyping in treatment decision-making for PDAC patients in the context of future clinical trials. We present PurIST, a clinically usable single sample classifier that is robust and highly replicable across different gene expression platforms and sample collection types, and may be utilized in future clinical trials.

Acknowledgements

We thank the following UNC Lineberger Comprehensive Cancer Center Core Facilities for their excellent technical assistance: Translational Genomics Laboratory, Animal Pathology, Translational Pathology, Tissue Procurement. We thank the University of Rochester Genomics Research Center for their excellent technical assistance.

This study used the COMPASS data (https://www.ebi.ac.uk/ega/studies/EGAS00001002543) which was originally generated with the support of the Ontario Institute for Cancer Research through funding provided by the Government of Ontario.

Funding

This study was funded by R01-CA199064 and R01-CA199064-03S1 (NR, JJY), U24-CA211000 (XP, JJY), T32CA009621 (RZP), R01-CA168863 and PANCAN RAN-2 16-95-LINE (DCL, JJY).

Footnotes

Conflicts of Interest

RM, NR, JJY, and XP are inventors of PurIST, which has been licensed to GeneCentric Technologies.

References

1. Conroy T, Desseigne F, Ychou M, Bouché O, Guimbaud R, Bécouarn Y, et al. FOLFIRINOX versus gemcitabine for metastatic pancreatic cancer. New England Journal of Medicine 2011;364(19):1817–25.
2. Von Hoff DD, Ervin T, Arena FP, Chiorean EG, Infante J, Moore M, et al. Increased survival in pancreatic cancer with nab-paclitaxel plus gemcitabine. New England Journal of Medicine 2013;369(18):1691–703 doi 10.1056/NEJMoa1304369.
3. Kindler HL, Hammel P, Reni M, Van Cutsem E, Mercade TM, Hall MJ, et al. Olaparib as maintenance treatment following first-line platinum-based chemotherapy (PBC) in patients (pts) with a germline BRCA mutation and metastatic pancreatic cancer (mPC): Phase III POLO trial. J Clin Oncol 2019;37:(suppl; abstr LBA4).
4. Bailey P, Chang DK, Nones K, Johns AL, Patch AM, Gingras MC, et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 2016;531(7592):47–52 doi 10.1038/nature16965.
5. Collisson EA, Sadanandam A, Olson P, Gibb WJ, Truitt M, Gu S, et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nature medicine 2011;17(4):500–3 doi 10.1038/nm.2344.
6. Moffitt RA, Marayati R, Flate EL, Volmar KE, Loeza SG, Hoadley KA, et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nature genetics 2015;47(10):1168–78 doi 10.1038/ng.3398.
7. Puleo F, Nicolle R, Blum Y, Cros J, Marisa L, Demetter P, et al. Stratification of Pancreatic Ductal Adenocarcinomas Based on Tumor and Microenvironment Features. Gastroenterology 2018;155(6):1999–2013 e3 doi 10.1053/j.gastro.2018.08.033.
8. Maurer C, Holmstrom SR, He J, Laise P, Su T, Ahmed A, et al. Experimental microdissection enables functional harmonisation of pancreatic cancer subtypes. Gut 2019. doi 10.1136/gutjnl-2018-317706.
9. Network CGAR. Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer cell 2017;32(2):185–203 e13 doi 10.1016/j.ccell.2017.07.007.
10. Aung KL, Fischer SE, Denroche RE, Jang GH, Dodd A, Creighton S, et al. Genomics-Driven Precision Medicine for Advanced Pancreatic Cancer - Early Results from the COMPASS Trial. Clinical cancer research : an official journal of the American Association for Cancer Research 2017. doi 10.1158/1078-0432.CCR-17-2994.
11. Aguirre AJ, Nowak JA, Camarda ND, Moffitt RA, Ghazani AA, Hazar-Rethinam M, et al. Real-time genomic characterization of advanced pancreatic cancer to enable precision medicine. Cancer discovery 2018. doi 10.1158/2159-8290.CD-18-0275.
12. Nywening TM, Wang-Gillam A, Sanford DE, Belt BA, Panni RZ, Cusworth BM, et al. Targeting tumour-associated macrophages with CCR2 inhibition in combination with FOLFIRINOX in patients with borderline resectable and locally advanced pancreatic cancer: a single-centre, open-label, dose-finding, non-randomised, phase 1b trial. The lancet oncology 2016;17(5):651–62 doi 10.1016/S1470-2045(16)00078-4.
13. Schwarz G. Estimating Dimension of a Model. Ann Stat 1978;6(2):461–4 doi 10.1214/aos/1176344136.
14. Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association 1995;90(430):773–95 doi 10.1080/01621459.1995.10476572.
15. Afsari B, Braga-Neto UM, Geman D. Rank Discriminants for Predicting Phenotypes from RNA Expression. Annals of Applied Statistics 2014;8(3):1469–91 doi 10.1214/14-Aoas738.
16. Patil P, Bachant-Winner PO, Haibe-Kains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics 2015;31(14):2318–23 doi 10.1093/bioinformatics/btv157.
17. Leek JT. The tspair package for finding top scoring pair classifiers in R. Bioinformatics 2009;25(9):1203–4 doi 10.1093/bioinformatics/btp126.
18. Martinez-Camblor P. Fully non-parametric receiver operating characteristic curve estimation for random-effects meta-analysis. Statistical methods in medical research 2017;26(1):5–20 doi 10.1177/0962280214537047.
19. Rodriguez SA, Impey SD, Pelz C, Enestvedt B, Bakis G, Owens M, et al. RNA sequencing distinguishes benign from malignant pancreatic lesions sampled by EUS-guided FNA. Gastrointest Endosc 2016;84(2):252–8 doi 10.1016/j.gie.2016.01.042.
20. Veldman-Jones MH, Lai Z, Wappett M, Harbron CG, Barrett JC, Harrington EA, et al. Reproducible, Quantitative, and Flexible Molecular Subtyping of Clinical DLBCL Samples Using the NanoString nCounter System. Clinical cancer research : an official journal of the American Association for Cancer Research 2015;21(10):2367–78 doi 10.1158/1078-0432.CCR-14-0357.
21. Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. Annu Rev Stat Appl 2016;3:181–209 doi 10.1146/annurev-statistics-041715-033506.
22. Lusa L, McShane LM, Reid JF, De Cecco L, Ambrogi F, Biganzoli E, et al. Challenges in projecting clustering results across gene expression-profiling datasets. Journal of the National Cancer Institute 2007;99(22):1715–23 doi 10.1093/jnci/djm216.
23. Paquet ER, Hallett MT. Absolute assignment of breast cancer intrinsic molecular subtype. Journal of the National Cancer Institute 2015;107(1):357 doi 10.1093/jnci/dju357.
