The machine learning–aided serum lipidomics approach could be used to help early diagnosis of patients with PDAC.
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers, characterized by rapid progression, metastasis, and difficulty in diagnosis. However, there are no effective liquid-based testing methods available for PDAC detection. Here we introduce a minimally invasive approach that uses machine learning (ML) and lipidomics to detect PDAC. Using a greedy algorithm and mass spectrometry–based feature selection, we selected 17 characteristic metabolites as detection features and developed a liquid chromatography–mass spectrometry–based targeted assay. In this study, 1033 patients with PDAC at various stages were examined. The approach achieved 86.74% accuracy with an area under the curve (AUC) of 0.9351 in the large external validation cohort and 85.00% accuracy with an AUC of 0.9389 in the prospective clinical cohort. In addition, single-cell sequencing, proteomics, and mass spectrometry imaging revealed notable alterations of the selected lipids in PDAC tissues. We propose that the ML-aided lipidomics approach be used for early detection of PDAC.
INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) is a highly malignant tumor (1). The high mortality of PDAC results from the locally advanced stage or distant metastases that are often found at the time of diagnosis due to the aggressive nature of this neoplasm (1–3). Hence, many patients with PDAC are not eligible for surgical treatment (4). The reasons for the difficulty in making a primary diagnosis of PDAC are as follows: (i) The anatomic location of the pancreas renders the detection of tumor difficult, (ii) symptoms (weight loss, fatigue, abdominal and back pain, and malaise) are not diagnostic, and (iii) methods for noninvasive detection of pancreatic tumor are imperfect (1, 2, 5). Identification of more effective methods for the detection of PDAC is therefore warranted.
Metabolomics allows the collection, detection, and analysis of all kinds of small-molecule metabolites, which are highly sensitive to biological activities and pathological conditions (6). Metabolomics can be divided into targeted and untargeted metabolomics (7–9). Because of the great metabolite coverage of untargeted metabolomics and reliability of targeted metabolomics, the integration of both assays is a powerful strategy for disease-related biomarker studies. Thus, accurate, robust, and low-cost metabolomics detection methods hold promise for future disease diagnoses (10–12).
Machine learning (ML), an important branch of artificial intelligence (AI), refers to data analysis and the establishment of appropriate and effective detection or verification models (13–15). In ML, the greedy algorithm is an algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage, and the support vector machine (SVM) is an effective method for classification. AI has been applied in several medical fields, which demonstrates its promise for broader application (16–18). Although the combination of ML and metabolomics for diagnosis is an attractive and promising concept, previous works have mainly focused on model construction rather than on selecting the critical metabolites for disease detection (19, 20). Combining the greedy algorithm and SVM to filter out the important lipid features for the development of a disease detection approach has not been reported.
In this study, we sought to combine ML with metabolomics of serum lipid metabolites from patients with PDAC and normal individuals to classify and select lipid features of PDAC. Then, a targeted lipid multiple reaction monitoring (MRM)–mode quantification assay for PDAC detection was established and validated in large sample sets. We anticipated that this method would yield an effective, reliable, and accurate minimally invasive approach to PDAC detection.
RESULTS
Serum lipidomics profiling of PDAC
To fully characterize serum metabolites in patients with PDAC, an exploratory study was first conducted with high-performance liquid chromatography–mass spectrometry (HPLC-MS). Preoperative serum samples were collected from 333 patients with PDAC, and serum samples from 262 healthy controls served as normal control (NC) (fig. S1 and table S1). PDAC and NC samples were collected from four cohorts in three independent hospitals (cohort 1: 23 PDAC versus 33 NC, cohort 2: 50 PDAC versus 50 NC, cohort 3: 210 PDAC versus 129 NC, and cohort 4: 50 PDAC versus 50 NC) (table S1). Patients who had received preoperative neoadjuvant therapy for pancreatic cancer, had a history of other malignancies before the diagnosis of pancreatic cancer, or had other types of pancreatic tumor were excluded from the study. Untargeted lipid metabolomics profiling and data analysis were carried out with serum from both patients with PDAC and NC.
Raw data collected from the data-dependent acquisition (DDA)–MS were processed with MS-DIAL software for feature detection, spectra deconvolution, metabolite identification, and peak alignment among samples. A total of 1416 metabolites belonging to 19 classes of lipids were identified in positive-ion mode, and 669 metabolites belonging to 16 classes of lipids were found in negative-ion mode for each sample.
Construction of an SVM classification model of untargeted lipidomics data of PDAC
All 595 samples were divided into a training set of 495 samples (cohorts 1, 2, and 3) and a test set of 100 samples (cohort 4) (Fig. 1A and table S1). SVM was applied to classify patients with PDAC and NC using all detected lipids. In ML, normalization is an important step in data preprocessing; here, L2 normalization was used (21). Data gathered in positive-ion and negative-ion modes were analyzed separately.
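As an illustration of the normalization step, the following is a minimal sketch of L2 normalization in plain Python. It assumes that each sample's feature vector is scaled to unit Euclidean norm; the text does not state whether normalization was applied per sample or per feature, and the intensities shown are hypothetical.

```python
import math

def l2_normalize(sample):
    """Scale one sample's feature vector to unit Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in sample))
    if norm == 0.0:
        return list(sample)  # all-zero sample: leave unchanged
    return [v / norm for v in sample]

# Hypothetical peak intensities for one serum sample (not data from the study).
intensities = [3.0, 4.0, 0.0]
normalized = l2_normalize(intensities)
print(normalized)
```

After this step every sample has the same overall magnitude, so the classifier sees relative lipid profiles rather than differences in total signal.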
Fig. 1. Study strategy and classification performance of the ML-aided metabolic PDAC detection approach.
(A) Study strategy and schematic illustration of establishment of the ML-aided metabolic PDAC detection approach. An exploratory study (sample size of n = 595) is set for an SVM model. Feature selection was conducted by greedy algorithm and LC-MS. Setup of the targeted lipid MRM-mode quantification assay and the validation study [n = 1898; 495 for the training set, 100 for the internal test set (test set), 1003 for the external validation cohort (independent test set), and 300 for the prospective clinical cohort (clinical test set)] are outlined. (B and C) Classification performance summaries of the ML-aided metabolic PDAC detection approach on the training, cross-validation, and test sets in positive-ion mode (B) and negative-ion mode (C) (shown are means ± SD and n = 5000 iterations; each dot represents the indicated value for one iteration of SVM evaluation).
As shown in Fig. 1A, we performed 5000 rounds of fourfold cross-validation on the training set of the exploratory study and evaluated classification performance on the training, cross-validation, and test datasets. For each iteration, we randomly selected samples for training (213 PDAC versus 159 NC) and for cross-validation (70 PDAC versus 53 NC) (c = 5 in SVM).
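The split sizes quoted above are what a class-stratified one-quarter holdout of the 495 training samples (283 PDAC and 212 NC) would give. The sketch below is one way to draw such a split. The function name, the use of floor division per class, and the seed are assumptions for illustration, not the authors' code.

```python
import random

def stratified_holdout(labels, n_folds=4, seed=None):
    """Randomly hold out about 1/n_folds of each class for validation.

    Returns (train_idx, val_idx) as lists of sample indices.
    """
    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_val = len(idx) // n_folds  # floor: 283 // 4 = 70, 212 // 4 = 53
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return train_idx, val_idx

# Class sizes of the exploratory training set (cohorts 1 to 3).
labels = ["PDAC"] * 283 + ["NC"] * 212
train_idx, val_idx = stratified_holdout(labels, seed=0)
print(len(train_idx), len(val_idx))
```

Repeating the call with a different seed on each iteration gives a fresh random partition, which is the role the 5000 iterations play in the evaluation.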
Upon completing 5000 experiments, the mean training accuracy was 94.34% [95% confidence interval (CI), 94.32 to 94.36%], the mean accuracy of cross-validation was 91.30% (95% CI, 91.23 to 91.36%), while the mean test accuracy was 82.26% (95% CI, 82.07 to 82.46%) with a specificity of 98.05% (95% CI, 98.02 to 98.08%) and a sensitivity of 66.48% (95% CI, 66.08 to 66.88%) in the data from positive-ion mode (Fig. 1B and table S2). The mean test accuracy was 85.88% (95% CI, 85.63 to 86.13%) with a specificity of 71.93% (95% CI, 71.42 to 72.44%) and a sensitivity of 99.83% (95% CI, 99.81 to 99.85%) in the data from negative-ion mode (Fig. 1C and table S2). These results suggest that combining lipid metabolomics and SVM is a promising approach to detection of PDAC.
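The narrow confidence intervals reported here are consistent with intervals for a mean taken over many iterations. The text does not state how they were computed, so the following sketch assumes a normal approximation (mean ± 1.96 × SD/√n); the accuracies in the example are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Mean and normal-approximation 95% CI of the mean over repeated runs."""
    m = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(len(values))
    return m, m - half_width, m + half_width

# Hypothetical per-iteration test accuracies (not values from the study).
accuracies = [0.80, 0.82, 0.84, 0.86]
mean, low, high = mean_ci95(accuracies)
print(round(mean, 4), round(low, 4), round(high, 4))
```

Because the half-width shrinks with √n, 5000 iterations give a very tight interval around the mean even when individual iterations vary by several percentage points. Such an interval describes the stability of the averaged estimate and not the uncertainty of a single classification run.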
ML-based and MS-based feature selection
As described above, we used the whole feature space of untargeted lipidomics data, which contained 1416 and 669 metabolites for positive-ion and negative-ion modes, respectively. Because each metabolite imposes different effects on the distinction of PDAC and NC, the whole feature space may contain some “noise” metabolites that are not useful for PDAC detection. To select subsets of features that make a substantial contribution to accurate discrimination between PDAC and NC, a greedy-based feature selection was carried out.
Briefly, we treated all features as input for ML and identified the most effective features using the feature weights generated by SVM together with the greedy algorithm. First, we performed 5000 rounds of fourfold cross-validation to generate a squared mean weight for each feature (fig. S2, A and B). Then, we sorted the features by squared mean weight from largest to smallest, where a larger value indicated greater importance of the feature in the SVM model for PDAC detection. Next, the greedy algorithm was applied to the top-ranking features (Top-100) to generate the final predictive models for feature selection. The greedy algorithm evaluated the top-ranking features one by one. Whenever combining the previously selected features with the current feature achieved a higher level of performance, the current feature was marked and added to the selected feature set. For example, in the Nth iteration, the greedy algorithm first added the current (Top-N) feature to the set of previously selected features and then conducted 500 rounds of fourfold cross-validation to evaluate the average performance (c = 5 in SVM). If the average performance was better than that of the previous feature set, then the current (Top-N) feature was taken to be complementary to the existing selected feature set and thus critical for distinguishing PDAC from NC. After applying the greedy algorithm for feature selection, the mean accuracies on the training, cross-validation, and test sets increased, as shown in Fig. 2 (A and B).
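The ranking and greedy steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: "squared mean weight" is read as the square of the weight averaged over runs, and the `evaluate` callback stands in for the repeated fourfold SVM cross-validation. The weights and the toy scoring function are hypothetical.

```python
def rank_by_squared_mean_weight(weight_runs):
    """Rank feature indices by (mean weight over runs) squared, largest first."""
    n_runs = len(weight_runs)
    n_features = len(weight_runs[0])
    mean_w = [sum(run[j] for run in weight_runs) / n_runs
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: mean_w[j] ** 2, reverse=True)

def greedy_select(ranked_features, evaluate, top_k=100):
    """Walk the ranking once; keep a feature only if it improves the score."""
    selected, best = [], float("-inf")
    for feature in ranked_features[:top_k]:
        score = evaluate(selected + [feature])
        if score > best:  # the feature complements the current set
            selected.append(feature)
            best = score
    return selected, best

# Hypothetical SVM weights from two cross-validation runs, four features.
weight_runs = [[0.9, 0.1, -0.7, 0.05],
               [1.1, -0.1, -0.9, 0.15]]
ranking = rank_by_squared_mean_weight(weight_runs)

# Toy stand-in for mean cross-validation accuracy: features 0 and 2 help,
# every other feature costs a little.
INFORMATIVE = {0, 2}
def toy_score(subset):
    hits = sum(1 for f in subset if f in INFORMATIVE)
    noise = len(subset) - hits
    return 0.5 + 0.2 * hits - 0.01 * noise

selected, best = greedy_select(ranking, toy_score)
print(ranking, selected)
```

Because each feature is visited once in rank order, the procedure needs at most `top_k` evaluations. The trade-off is that a feature rejected early is never reconsidered, which is the usual cost of a greedy heuristic.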
Fig. 2. Greedy-based feature selection of the ML-aided metabolic PDAC detection approach.
(A) Greedy-based feature selection in positive-ion mode. SVM model with Top-27 features shows the best performance of mean accuracy of classification on cross-validation dataset in positive-ion mode (n = 500 iterations). (B) Greedy-based feature selection in negative-ion mode. SVM model with Top-19 features shows the best performance of mean accuracy of classification on cross-validation dataset in negative-ion mode (n = 500 iterations). (C and D) Classification performance summaries of the modified ML-aided metabolic PDAC detection approach after greedy-based feature selection on the training, cross-validation, and test sets in positive-ion mode (C) and negative-ion mode (D) (shown are means ± SD and n = 5000 iterations; each dot represents the indicated value for one iteration of SVM evaluation). The greedy algorithm is efficient for the feature selection of the ML-aided metabolic PDAC detection approach.
As shown in Table 1, the classification accuracy on the test dataset was 93.61% (95% CI, 93.57 to 93.65%) with a specificity of 89.92% (95% CI, 89.83 to 90.01%) and a sensitivity of 97.30% (95% CI, 97.27 to 97.32%) in the data from the positive-ion mode using 27 selected lipids and 90.40% (95% CI, 90.34 to 90.47%) with a specificity of 83.15% (95% CI, 83.00 to 83.31%) and a sensitivity of 97.66% (95% CI, 97.61 to 97.70%) in the data from the negative-ion mode using 19 selected lipids (Fig. 2, C and D). In receiver operating characteristic (ROC) analysis (22), the mean area under the curve (AUC) for this method on the test set was 0.9901 for the positive-ion mode and 0.9906 for the negative-ion mode (fig. S2, C and D). Concurrently, we compared the classification accuracy of traditional ML-based feature selection (see Materials and Methods) with the greedy-based feature selection of this ML-aided metabolic PDAC detection approach. We found that greedy algorithm–based feature selection achieved a higher level of classification performance with fewer features in classifying PDAC and control serum (fig. S2, E and F). These results demonstrate the high efficiency of the data processing and feature selection in the ML-aided metabolic PDAC detection approach.
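For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen PDAC sample receives a higher classifier score than a randomly chosen NC sample. The sketch below computes it by direct pairwise comparison. The scores are hypothetical, and the study may have used a different (equivalent) computation.

```python
def roc_auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical SVM decision scores; label 1 = PDAC, 0 = NC.
scores = [1.2, 0.4, -0.3, 0.1, -0.8, -1.5]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(auc)
```

Unlike accuracy, this quantity does not depend on the decision cutoff, which is why the AUC and the cutoff-specific sensitivity and specificity are reported side by side.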
Table 1. Classification performance of the ML-aided metabolic PDAC detection approach with feature selection in the exploratory and in the validation study.
| Classification performance of the ML-aided metabolic PDAC detection approach | ||||||
| The ML-aided metabolic PDAC detection approach with ML-based feature selection (exploratory study, n = 595) | ||||||
| | Training set | | Cross-validation set | | Test set | |
| Data collection | Positive-ion mode | Negative-ion mode | Positive-ion mode | Negative-ion mode | Positive-ion mode | Negative-ion mode |
| Mean accuracy | 0.9596 | 0.9535 | 0.9531 | 0.9472 | 0.9361 | 0.9040 |
| 95% CI of accuracy | 0.9595–0.9597 | 0.9534–0.9537 | 0.9526–0.9536 | 0.9467–0.9477 | 0.9357–0.9365 | 0.9034–0.9047 |
| Mean specificity | 0.9863 | 0.9736 | 0.9858 | 0.9667 | 0.8992 | 0.8315 |
| 95% CI of specificity | 0.9861–0.9864 | 0.9734–0.9738 | 0.9854–0.9862 | 0.9661–0.9674 | 0.8983–0.9001 | 0.8300–0.8331 |
| Mean sensitivity | 0.9397 | 0.9385 | 0.9284 | 0.9324 | 0.9730 | 0.9766 |
| 95% CI of sensitivity | 0.9395–0.9399 | 0.9383–0.9388 | 0.9275–0.9292 | 0.9316–0.9333 | 0.9727–0.9732 | 0.9761–0.9770 |
| The ML-aided metabolic PDAC detection approach (validation study, n = 1898) | ||||||
| | Training set (n = 495) | Test set (n = 100) | Independent test set (n = 1003) | Clinical test set (n = 300) | ||
| Accuracy | 0.8949 | 0.8600 | 0.8674 | 0.8500 | ||
| Mean squared error of accuracy | 0.1051 | 0.1400 | 0.1326 | 0.1500 | ||
| Specificity | 0.8915 | 0.8000 | 0.8610 | 0.8100 | ||
| 95% CI of specificity | 0.8398–0.9285 | 0.6586–0.8950 | 0.8225–0.8925 | 0.7473–0.8605 | ||
| Sensitivity | 0.8975 | 0.9200 | 0.8717 | 0.9300 | ||
| 95% CI of sensitivity | 0.8547–0.9292 | 0.7989–0.9741 | 0.8416–0.8968 | 0.8562–0.9690 | ||
| AUC | 0.9591 | 0.9444 | 0.9351 | 0.9389 | ||
Next, MS-based feature selection was carried out. Specifically, all top-ranking features listed in the SVM–greedy algorithm model were filtered on the basis of the chromatographic peak shape of the extracted ion chromatogram and the quality of the tandem MS (MS/MS) spectra matching results to ensure the reliability and suitability of the selected features for further targeted quantification analysis (table S3).
Establishment of an LC-MS–based targeted lipidomics assay using MRM mode
After ML-based feature selection and MS-based optimization (Fig. 2 and table S3), we selected 12 lipids in the positive-ion mode and 8 lipids in the negative-ion mode, 3 of which were shared between the two modes (table S4 and fig. S3). This set achieved the best classification performance with the smallest number of features suitable for detection by MS. To improve data stability and detection efficacy, we developed an LC-MS–based targeted quantification assay for the candidate lipids using MRM mode. From the list identified in the positive-ion and negative-ion modes, we carried all 17 unique lipids forward, including diacylglycerol (DG; 18:1-18:1), lysophosphatidylcholines (LPCs; 14:0, 16:0, 18:1, and 20:4), phosphatidylcholines (PCs; 16:0-16:0, 16:0-18:1, 18:0-18:2, 18:0-20:3, 16:0-22:5, 18:0-22:5, and O-16:0-18:2), lysophosphatidylethanolamine (LPE; 22:4), phosphatidylethanolamine (PE; 16:0-18:2), and sphingomyelin (SM; d18:1/18:0, d18:2/24:1, and d18:2/24:2) for MRM assay method development (fig. S3A, red checks). As shown in figs. S4 and S5 (A and B), the MS/MS spectra and retention times of the lipids in DDA and MRM modes matched those of chemical standards or references, which confirmed the identity of the selected lipids. After parameter optimization, including LC elution gradient adjustment, transition selection, and collision energy (CE) and declustering potential (DP) optimization, an assay method was finally constructed that could quantify these 17 lipid markers in a single 19-min LC-MS run (Fig. 3A, fig. S3, and table S5). Standard curves were analyzed using LPC 18:0, PC 16:0-16:0, PE 16:0-18:2, and DG 18:1-18:1 with nine concentration points ranging from 0.20 to 100.00 μg/ml (except for LPC 18:0, which ranged from 0.40 to 200.00 μg/ml).
The r² values of the standard curves for LPC 18:0, PC 16:0-16:0, PE 16:0-18:2, and DG 18:1-18:1 were 0.9968, 0.9958, 0.9990, and 0.9988, with accuracies ranging from 92.89 to 114.95%, 89.54 to 111.24%, 94.16 to 105.42%, and 90.15 to 108.15%, respectively (fig. S5, C to F).
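The standard-curve figures above can be illustrated with a small least-squares sketch. Two assumptions are made that the text does not confirm: the curve is fitted by unweighted linear regression, and "accuracy" means the back-calculated concentration expressed as a percentage of the nominal concentration. The nine concentration levels and the responses are hypothetical; only the range of 0.20 to 100.00 μg/ml comes from the text.

```python
def fit_line(x, y):
    """Unweighted least-squares line; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def back_calculated_accuracy(x, y, slope, intercept):
    """Back-calculated concentration as a percentage of nominal, per point."""
    return [100.0 * ((yi - intercept) / slope) / xi for xi, yi in zip(x, y)]

# Hypothetical nominal concentrations (ug/ml) and instrument responses.
conc = [0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
resp = [0.45, 1.02, 2.1, 3.9, 10.3, 19.6, 40.8, 99.0, 201.0]
slope, intercept, r2 = fit_line(conc, resp)
accuracy_pct = back_calculated_accuracy(conc, resp, slope, intercept)
print(round(r2, 4), [round(a, 1) for a in accuracy_pct])
```

With an unweighted fit, the lowest calibration points tend to show the largest percentage deviations, which is why an accuracy range is usually reported alongside r².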
Fig. 3. Method establishment and classification performance of the ML-aided metabolic PDAC detection approach on validation study.
(A) Extracted ion chromatogram of 17 selected lipids quantified by MRM-mode assay. The 17 selected lipid markers (DG 18:1-18:1; LPC 14:0, 16:0, 18:1, and 20:4; PC 16:0-16:0, 16:0-18:1, 18:0-18:2, 18:0-20:3, 16:0-22:5, 18:0-22:5, and O-16:0-18:2; LPE 22:4; PE 16:0-18:2; and SM d18:1/18:0, d18:2/24:1, and d18:2/24:2) are shown with standards in a single 19-min LC-MS run. Each lipid is represented by a different color. (B) ROC curve of the ML-aided metabolic PDAC detection approach on the training set of the validation study. (C) ROC curve of the ML-aided metabolic PDAC detection approach on the internal validation dataset of the validation study. (D) ROC curve of the ML-aided metabolic PDAC detection approach on the external validation dataset of the validation study. In (B) to (D), the asterisk sign denotes the cutoff (score = 0) for the ML-aided metabolic PDAC detection approach. The ML-aided metabolic PDAC detection approach shows good performance on detection of PDAC in an independent external validation dataset of the validation study. (E) ROC curves of the ML-aided metabolic PDAC detection approach, CT, and CA19-9 on the prospective clinical cohort of the validation study (the ML-aided metabolic PDAC detection approach in black, CT in green, and CA19-9 in blue). The red asterisk sign denotes the cutoff (score = 0) for the ML-aided metabolic PDAC detection approach, the red plus sign denotes CT diagnosis, and the red multiplication sign denotes the 37 U/ml cutoff for CA19-9. The ML-aided metabolic PDAC detection approach shows accurate and robust performance on the prospective clinical cohort, with better performance than CA19-9 and CT scanning.
To verify the classification efficacy of the newly established approach, we applied the method on serum samples of 1898 participants from five hospitals (Fig. 1A and fig. S6). Among them, 595 samples in the exploratory study were used as the training set (n = 495) and test set (n = 100) for internal validation (figs. S1 and S6 and table S1). Multivariate binary logistic regression analysis indicated that sex and age status had limited impact on data distribution of the ML-aided metabolic PDAC detection approach (fig. S6, C to E). These data indicate that selected features are specific for classifying PDAC and NC, independent of age and sex.
Validation of the ML-aided metabolic PDAC detection approach
To further demonstrate the classification performance of the method, we built an SVM model based on the data from the training set and evaluated the model on the test set (c = 10, in SVM). The classification accuracy of the ML-aided metabolic PDAC detection approach reached 89.49% (mean squared error, 0.1051) with 89.15% specificity (95% CI, 83.98 to 92.85%) and 89.75% sensitivity (95% CI, 85.47 to 92.92%) on the training set and an accuracy of 86.00% (mean squared error, 0.1400) with 80.00% specificity (95% CI, 65.86 to 89.50%) and 92.00% sensitivity (95% CI, 79.89 to 97.41%) on the test set (Table 1). The AUC reached 0.9591 for the training set and 0.9444 for the test set (Table 1 and Fig. 3, B and C). These results illustrate the accuracy and effectiveness of the ML-aided metabolic PDAC detection approach.
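The training-set figures above can be reproduced from a confusion matrix. In the sketch below, the counts (254 of 283 PDAC and 189 of 212 NC classified correctly) are inferred from the reported percentages and cohort sizes and are not taken from the source tables. For 0/1 labels, the mean squared error equals 1 minus the accuracy. The interval function implements the Wilson score interval with continuity correction. Applied to 189 of 212, it gives values close to the reported specificity interval, which suggests but does not confirm that this method was used.

```python
import math

def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and MSE for binary 0/1 labels."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mse": 1.0 - accuracy,  # squared error of a 0/1 prediction is 0 or 1
    }

def wilson_cc_interval(successes, n, z=1.959964):
    """95% Wilson score interval with continuity correction."""
    p = successes / n
    denom = 2.0 * (n + z * z)
    low_root = z * z - 2.0 - 1.0 / n + 4.0 * p * (n * (1.0 - p) + 1.0)
    high_root = z * z + 2.0 - 1.0 / n + 4.0 * p * (n * (1.0 - p) - 1.0)
    lower = (2.0 * n * p + z * z - 1.0 - z * math.sqrt(low_root)) / denom
    upper = (2.0 * n * p + z * z + 1.0 + z * math.sqrt(high_root)) / denom
    if successes == 0:
        lower = 0.0  # usual convention at the boundary
    if successes == n:
        upper = 1.0
    return max(0.0, lower), min(1.0, upper)

# Counts inferred from the reported training-set rates (an assumption).
metrics = confusion_metrics(tp=254, fn=29, tn=189, fp=23)
spec_low, spec_high = wilson_cc_interval(189, 212)
print({k: round(v, 4) for k, v in metrics.items()})
print(round(spec_low, 4), round(spec_high, 4))
```

The continuity correction widens the interval slightly relative to the plain Wilson interval, which matters most for the smaller test sets.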
Next, we validated the performance of the approach on an external validation dataset (Fig. 1A and fig. S1). The external validation dataset contained samples from 1003 eligible participants (600 PDAC versus 403 NC), which were obtained from two independent hospitals (fig. S1 and table S1) and were single-blind to data analysts. As shown in Table 1 and Fig. 3D, the classification accuracy reached 86.74% (mean squared error, 0.1326) with 86.10% specificity (95% CI, 82.25 to 89.25%) and 87.17% sensitivity (95% CI, 84.16 to 89.68%) as well as an AUC of 0.9351 (Table 1). Among 600 PDAC samples, 86.38% (406 of 470 cases) of early-stage PDAC samples (stage I to II) and 90.00% (117 of 130 cases) of late-stage PDAC samples (stage III to IV) were accurately detected by the ML-aided metabolic PDAC detection approach (table S6). These data demonstrate the robustness of the ML-aided metabolic PDAC detection approach for the detection of PDAC in various stages.
To explore the clinical potential, we examined the performance of the ML-aided metabolic PDAC detection approach on a prospective and single-blind hospital-based cohort, which was established in the Health Management Institute of the Second Medical Center of PLAGH. In this prospective cohort, we recruited 130 cancer-free participants who had undergone medical examination as NC and 170 patients who had received pancreatic surgery, including 70 patients with benign diseases of the pancreas (chronic pancreatitis, intraductal papillary mucinous neoplasm, mucinous cystic neoplasm, and pseudocysts; served as NC) and 100 patients diagnosed with PDAC (served as PDAC). As shown in Table 1, the classification accuracy reached 85.00% (mean squared error, 0.1500) with 81.00% specificity (95% CI, 74.73 to 86.05%) and 93.00% sensitivity (95% CI, 85.62 to 96.90%). The AUC was 0.9389 (Fig. 3E, black line). In this cohort, 90.91% (50 of 55 cases) of early-stage PDAC (stage I to II) and 95.56% (43 of 45 cases) of late-stage PDAC (stage III to IV) samples were accurately detected by the ML-aided metabolic PDAC detection approach (table S6). We compared the classification performance of the ML-aided metabolic PDAC detection approach with that of the canonical PDAC biomarker carbohydrate antigen 19-9 (CA19-9) and a radiology approach [computed tomography (CT) scanning]. As shown in Fig. 3E, the AUC of CA19-9 was 0.8790, with 83.00% (mean squared error, 0.1700) accuracy, 79.00% sensitivity (95% CI, 69.47 to 86.25%), and 85.00% specificity (95% CI, 79.12 to 89.59%) (37 U/ml cutoff; Fig. 3E). The AUC of CT scanning was 0.7098, with 86.67% (mean squared error, 0.1333) accuracy, 78.00% sensitivity (95% CI, 68.39 to 85.42%), and 91.00% specificity (95% CI, 85.93 to 94.43%). In DeLong’s test for ROC (23), the AUC of the ML-aided metabolic PDAC detection approach was significantly different from the AUC of CA19-9 (P = 0.0394) and CT (P = 0.0010).
The performance in distinguishing PDAC from benign pancreatic diseases was concurrently evaluated. The AUC of the ML-aided metabolic PDAC detection approach was 0.9309, with 88.24% (mean squared error, 0.1176) accuracy, 93.00% sensitivity (95% CI, 85.62 to 96.90%), and 81.43% specificity (95% CI, 69.98 to 89.36%). The AUC of CT scanning was 0.5794, with 76.47% (mean squared error, 0.2353) accuracy, 78.00% sensitivity (95% CI, 68.39 to 85.42%), and 74.29% specificity (95% CI, 62.21 to 83.66%), and the AUC of CA19-9 was 0.7910, with 72.94% (mean squared error, 0.2706) accuracy, 79.00% sensitivity (95% CI, 69.47 to 86.25%), and 64.29% specificity (95% CI, 51.87 to 75.13%). The AUC of the ML-aided metabolic PDAC detection approach was significantly different from that of CT (P = 2.4815 × 10⁻⁵) and CA19-9 (P = 0.0010) in DeLong’s test. These results demonstrate the strong performance of the ML-aided metabolic PDAC detection approach in distinguishing benign diseases of the pancreas from PDAC. Therefore, the ML-aided metabolic PDAC detection approach has clinical potential for PDAC detection, and combining the AI approach with CA19-9 or CT scanning may be helpful for the clinical diagnosis of PDAC.
MALDI-MSI, tissue proteomics, and scRNA-seq revealed globally disturbed lipid metabolism in PDAC
As shown in fig. S6B, the levels of saturated/monounsaturated PC and SM were increased, whereas those of LPC and polyunsaturated PC were decreased in the serum of patients with PDAC. Matrix-assisted laser desorption/ionization–mass spectrometry imaging (MALDI-MSI) analysis was applied to detect the feature lipids of the ML-aided metabolic PDAC detection approach in surgically resected cancerous tissues and adjacent pancreas tissues of five patients with PDAC (fig. S7 and table S1). Results revealed that the levels of one saturated PC (PC 32:0; PC 16:0-16:0) and two SMs (SM d36:1; SM d18:1/18:0 and SM d42:3; SM d18:2/24:1) were increased, while those of two LPCs (LPC 16:0 and LPC 18:1) and one DAG (DAG 36:2; DAG 18:1-18:1) were decreased in cancerous lesions compared with adjacent noncancerous pancreatic tissues (fig. S7, A and G). These MALDI-MSI results confirmed that the six feature lipids of the ML-aided metabolic PDAC detection approach were consistently altered in PDAC tissues in situ.
Dysregulation of metabolism is a well-established hallmark of cancer (24). However, lipid metabolism has not been systematically explored in PDAC. Therefore, we conducted tissue proteomics analysis and single-cell RNA sequencing (scRNA-seq) analysis of PDAC samples (see table S1 and Materials and Methods).
With respect to tissue proteomics analysis, 10 primary tumors from patients with treatment-naïve PDAC and five paired adjacent control pancreas samples were digested, quantified by a label-free LC-MS/MS strategy, and analyzed (fig. S8, A and B). As shown in fig. S8 (C and D) and table S7, several proteins and pathways associated with lipid metabolism (highlighted) were dysregulated in PDAC tissues. This result indicates that, in addition to other abnormal metabolism pathways (11, 24, 25), disturbed lipid metabolism is one of the characteristics of PDAC.
Next, we conducted scRNA-seq analysis on an open resource dataset of PDAC (26). Original data of normal pancreas were obtained from patients who underwent resection of benign pancreatic lesions (table S1). In parallel, 14 PDAC samples were selected on the basis of their location in the pancreatic head or uncinate process (table S1). Approximately 253 million unique transcripts were obtained from 29,458 cells. Of these, 24,178 cells (82.1%) originated from patients with PDAC, and 5280 cells (17.9%) originated from normal pancreas tissues. To generate an overview, 10 major cell lineages were assigned on the basis of well-established markers, including acinar cells, type 1 and 2 ductal cells, endocrine cells, endothelial cells, fibroblasts, stellate cells, macrophage cells, and B and T lymphocytes (fig. S8, E and F). As shown in fig. S8G, the percentages of T lymphocytes, B lymphocytes, and macrophage cells were increased in PDAC tissues, whereas the percentages of type 1 ductal cells and endothelial cells were decreased compared to those in normal pancreas tissues.
We then distinguished PDAC cells from normal epithelial and other types of cells by inferring large-scale copy number variations (CNVs) (see Materials and Methods) (27, 28). As shown in fig. S8H, the inferred CNV level of type 2 ductal cells was markedly higher than that of other cell types, indicating that type 2 ductal cells were highly malignant. In addition, gene expression patterns of type 1 ductal cells from NCs and type 2 ductal cells from patients with PDAC were examined and confirmed by The Cancer Genome Atlas–Genotype-Tissue Expression (TCGA-GTEx) dataset of PDAC and normal pancreas tissues and published datasets of PDAC organoids (fig. S8, I to K) (29–31). Thus, type 2 ductal cells were assigned as malignant cells in PDAC, and type 1 ductal cells in NCs were assigned as normal epithelial cells.
Each of the 10 cell lineages was directly compared between PDAC and control pancreas tissues on the 70 metabolism pathways and 15 lipid metabolism pathways as described (see Materials and Methods) (32). A direct comparison of tumor cells and normal ductal cells showed notable alterations not only in pentose phosphate pathway, nitrogen metabolism, oxidative phosphorylation, and glycine and threonine metabolism pathways as typical tumor signatures but also in several lipid metabolism pathways including glycerophospholipid metabolism, glycerolipid metabolism, fatty acid metabolism, and glycosphingolipid biosynthesis pathways (fig. S9A) (24, 33, 34). Fifteen lipid metabolic pathways are shown in fig. S9B. Notably, these pathways, including glycerolipid metabolism, glycerophospholipid metabolism, and sphingolipid metabolism, were greatly dysregulated in PDAC cells (fig. S9B). As shown in fig. S9C, we also characterized global metabolic changes in ductal cells by comparing PDAC cells or normal ductal cells to other cell types within PDAC samples and healthy pancreas samples, respectively, and then sorting metabolic pathways by rank order. These analyses revealed that glycerophospholipid metabolism is the most significantly altered lipid metabolism–related pathway in PDAC cells (fig. S9C, red lines). Functional analyses of the ML-aided metabolic PDAC detection approach–related metabolites suggest that PC, LPC, and SM are enriched in glycerophospholipid metabolism and sphingolipid metabolism (fig. S3B). Notably, these metabolic pathways were dysregulated in PDAC tissues compared with control pancreas tissues, and the glycerophospholipid metabolism pathway was altered in our proteomics data (fig. S9, B and C, and table S7). In addition, similar results were observed in the TCGA-GTEx dataset of PDAC and normal pancreas samples and an independent mRNA microarray expression dataset of PDAC and paired adjacent tissues (fig. S9, D and E) (35).
These proteomics data and scRNA-seq results demonstrate that lipid metabolism is widely dysregulated in PDAC. Together, we have demonstrated the characteristic of disturbed lipid metabolism of PDAC and established an ML-aided metabolic PDAC detection approach based on a combination of targeted lipidomics and ML, and this method can effectively detect PDAC.
DISCUSSION
In most medical applications, the ML approach is typically evaluated on a dataset that is derived from the same dataset and has a distribution consistent with the training dataset (16, 36, 37). In comparison, the ML-aided metabolic PDAC detection approach has been tested and evaluated by a large external validation cohort (n = 1003) including 78.33% (470 of 600) early-stage PDAC samples and a prospective clinical cohort independently. It indicates the stability of performance of this method and the prospect for more general applications of this approach. The characteristics of rapid processing and high accuracy demonstrate the potential for future application of this novel and effective method for PDAC detection.
Conventionally, data dimensionality reduction and biomarker screening of metabolomics or lipidomics mainly based on analysis of variance (ANOVA), AUC, or partial least squares discriminant analysis (PLS-DA) (19, 20, 38). In this study, the SVM-based greedy algorithm was innovatively applied and showed outstanding performance on feature selection of the data from the serum lipidomics. On the basis of the features that were selected by the greedy algorithm and MS, an MRM-based assay was established, with internal standards and standard curves. Considered as the gold standard of LC-MS–based lipid quantification, the results of the MRM-based target lipids assay are more accurate and reliable than those from untargeted metabolomics and are robust between different LC-MS platforms (7–9). These advantages make the workflow of the ML-aided metabolic PDAC detection approach as an attractive option for the discovery of disease-related biomarkers and the establishment of disease detection approaches in different clinical applications. The successful establishment of this method in this study from untargeted metabolomics combining SVM–greedy algorithm and LC-MS–based feature selection with the targeted metabolomics assay paves the way for the identification of metabolites as biomarkers and detection for other types of carcinomas.
Currently, there is no liquid-based testing available for PDAC detection except CA19-9. However, CA19-9 testing has several limitations such as false elevation in patients with benign pancreatobiliary conditions, mostly due to biliary obstruction. Because the ML-aided metabolic PDAC detection approach is accurate, highly sensitive, minimally invasive (serum-based), and nonradioactive, it might help to detect PDAC in clinical practice. Applied appropriately, this approach may aid the current clinical diagnosis of suspected PDAC, especially when combined with the simultaneous detection of cancer biomarkers such as CA19-9. Our approach can help clinicians diagnose PDAC more comprehensively and accurately and may help them arrange more appropriate follow-up diagnosis and treatment procedures. Therefore, its incorporation into current diagnostic methods may complement conventional diagnosis procedures in patients with high risk of PDAC.
Among the selected features, there were six types of lipids, including four LPCs, seven PCs, three SMs, one LPE, one PE, and one DG. LPC, PC, and PE are involved in glycerophospholipid metabolism, and SMs are involved in sphingolipid metabolism. Tissue proteomics and scRNA-seq analyses revealed that, in PDAC, glycerophospholipid and sphingolipid metabolism pathways were dysregulated in tumor cells. A series of changes of these metabolites may reflect alterations in lipid metabolism and related signal transduction pathways during PDAC initiation and development. MALDI-MSI analyses revealed that, in PDAC tissues, expression levels of LPC (LPC 16:0 and LPC 18:1) and DAG 36:2 were decreased and those of PC 32:0, SM d36:1, and SM d42:3 were increased in tumor areas. It has been reported that serum LPCs are significantly lower in PDAC (19, 20, 39, 40), which is consistent with our findings. The high levels of saturated/monounsaturated PC in cancer have been shown to be a metabolic feature of tumors (41–44). SM, PC, and DAG are also involved in cancer cell apoptosis or autophagy (45–47). As revealed by tissue proteomics, scRNA-seq, and MALDI-MSI data, these changes of lipid in serum may be partially caused by the adenocarcinoma cells and the carcinoma-associated stromal cells. Thus, changes of these metabolites in patients with PDAC may reflect cell proliferation and apoptotic resistance of cancer cells.
Imaging the spatial distribution of different metabolites in tissues is of great significance for mechanistic exploration and disease diagnosis. MSI technologies offer a powerful tool for chemical and spatial characterization of biological tissues with high specificity and sensitivity. The strength of MALDI-MSI is that it can provide region-specific molecular information within the primary lesion to complement the information provided by histopathology, making its application valuable to both cancer biomarker evaluation and clinical diagnosis. In this work, we combined ML-analyzed serum lipidomics, tissue proteomics, and scRNA-seq with MALDI-MSI technology to characterize lipid metabolism of PDAC by integrating peripheral circulating blood and tissue spatial lipidomics.
Some limitations of this study should be acknowledged. We aimed to develop an approach to distinguish PDAC from NC. Thus, patients with PDAC at various stages were pooled and labeled to train the SVM model, and the features selected by this model were considered common features of early and late stages of PDAC. The signature could be further improved to distinguish early and late stages of PDAC or to predict the prognosis of patients with PDAC and could be implemented in more clinical testing. The ML-aided metabolic PDAC detection approach was built and validated in several cohorts that all comprised subjects from East Asian populations. Whether this method is suitable for the detection of PDAC in other populations needs further examination. Since the relationships between obesity, diabetes, or new-onset diabetes and PDAC are indeterminate, the performance of the ML-aided metabolic PDAC detection approach may be affected by metabolic-related confounding factors (48). However, this approach was validated and prospectively examined in two independent datasets and showed good performance. This approach needs extensive analytical and clinical validation in multicenter, multiethnic, and large-scale cohorts with stricter enrollment requirements before it is officially used as a clinical screening/early detection tool for PDAC. The results from the approach in the screening and detection of PDAC are best interpreted in combination with other methods such as CA19-9, abdominal ultrasound, and CT.
In summary, this work has established a prototype method of metabolomics combined with ML and greedy algorithm to refine the disease detection test program for targeted metabolomics with ML. Advantages of the ML-aided metabolic PDAC detection approach demonstrate the potential application of this method in auxiliary diagnosis of PDAC. This accurate and sensitive approach to PDAC detection provides a new prospect of disease diagnosis through ML-aided metabolomics. We propose that the proper clinical application of this approach may benefit patients with PDAC for accurate diagnosis and possibly lead to more effective interventions.
MATERIALS AND METHODS
Experimental design
Patients with pancreatic cancer who underwent surgery at the Departments of General Surgery of the Peking University First and Third Hospital and Departments of Hepatobiliary Surgery and Oncology Surgery of Chinese PLA Hospital were enrolled with the following criteria: (i) pathologically confirmed PDAC, (ii) no history of other malignancies, and (iii) no neoadjuvant therapy (chemotherapy, radiotherapy, etc.) before surgery. We included asymptomatic adults with non-PDAC in Peking University First Hospital, Peking University Third Hospital, and Chinese PLA Hospital as controls, and the inclusion criteria were (i) age of >18 years and (ii) no history of malignancies.
For validation study, patients with pancreatic cancer who underwent surgery at the Pancreas Center, The First Affiliated Hospital of Nanjing Medical University and Health Management Institute, the Second Medical Center of Chinese PLA General Hospital were enrolled with the following criteria: (i) pathologically confirmed PDAC, (ii) no history of other malignancies, and (iii) no neoadjuvant therapy (chemotherapy, radiotherapy, etc.) before surgery. Patients with benign diseases of pancreas including chronic pancreatitis, intraductal papillary mucinous neoplasm, mucinous cystic neoplasm, and pseudocyst and asymptomatic adults who received health checkups were enrolled at the Health Management Institute, the Second Medical Center of Chinese PLA General Hospital. CA19-9 was measured in the certified clinical laboratory in the Health Management Institute, the Second Medical Center of Chinese PLA General Hospital.
For MSI and proteomics analysis, primary tumors from treatment-naïve patients with PDAC were collected at Peking University Third Hospital. None of the patients had a prior history of other types of cancer or received any anticancer treatments before surgery. This study was approved by the hospital’s Institutional Review Board and Ethics Committee Board of the Chinese PLA General Hospital (S2019-137-01), and informed consent was obtained from all subjects.
Serum collection
For enrolled patients, 4 ml of peripheral blood was collected in serum separation tubes before anesthesia and surgery. For the control population, peripheral blood was collected on the morning of the health check or outpatient examination. All participants had fasted for at least 8 hours before blood collection. Whole blood was then centrifuged at 1600g for 10 min followed by centrifugation at 16,000g for 10 min. Serum aliquots were transferred into cryovials and stored at −80°C.
Radiologic evaluation and definition of PDAC appearances
Abdominal contrast CT was performed preoperatively with a 64- or 256-detector row scanner (Siemens, GE Healthcare) in all patients. The patients were scanned using a specific pancreaticobiliary protocol with 3-mm-thick contiguous sections: First, iopromide was administered with a power injector, and then arterial phase images were acquired at 25 to 30 s, followed by venous phase images at 50 to 60 s. Coronal and sagittal multiplanar reformats were generated and sent to the picture archiving and communication systems for review. The CT images were evaluated by two surgeons and two experienced abdominal radiologists; cases with discrepancies were reevaluated simultaneously by the four reviewers until a consensus was reached. For each patient, irregular low-density foci in the pancreas tissue, pancreatic duct expansion, blurred peripheral fat space, and wrapped neighboring blood branches were taken as the most indicative features for defining malignancy.
Chemical materials
Ammonium acetate and formic acid were purchased from Sigma-Aldrich (St. Louis, MO, USA). LC-MS–grade isopropanol (IPA), acetonitrile (ACN), and methanol were purchased from Thermo Fisher Scientific (USA). Deionized water was produced by a Milli-Q system. The chemical standards were of analytical grade with typical purity of >99%. LPC (13:0, 14:0, 18:0, and 18:1), PC (16:0-16:0, 16:0-d31-18:1, 16:0-18:1, and 18:0-18:2), LPE (13:0), PE (16:0-18:2 and 16:0-d31-18:1), SM (d18:1/18:0), and DG (8:0-8:0 and 18:1-18:1) were purchased from Avanti Polar Lipids (USA).
Sample preparation for metabolomics
For untargeted lipid profiling, lipids were extracted from serum samples as previously described (49). Briefly, 100 μl of serum was mixed with 400 μl of chloroform/methanol (2:1, v/v) in a 1.5-ml centrifuge tube. After vortexing and shaking for 15 min, the mixture was centrifuged at 12,000 rpm for 20 min at 4°C. The lower, lipid-containing chloroform phase was evaporated under vacuum, and the residue was stored at −80°C for subsequent analysis. All samples were processed in the same laboratory to avoid bias.
For targeted lipid quantification, lipids were extracted from 50 μl of serum samples with 200 μl of chloroform/methanol (2:1, v/v) containing LPC 13:0, PC 16:0-d31-18:1, PE 16:0-d31-18:1, LPE 13:0, and DG 8:0-8:0 at 10 mg/ml as internal standards. After vortexing and shaking, the mixture was centrifuged at 12,000 rpm for 20 min at 4°C. The lower organic phase was evaporated under vacuum, and the residue was stored at −80°C until use.
For untargeted lipidomics analysis, dried samples were reconstituted in 100 μl of chloroform/methanol (1:1, v/v) and diluted threefold in IPA/ACN/water (2:1:1, v/v/v). After centrifugation at 12,000 rpm for 15 min, 10 μl of supernatant was injected for LC-MS/MS analysis.
For targeted lipid quantification, dried samples were reconstituted in 50 μl of chloroform/methanol (1:1, v/v) and diluted 120 times in IPA/ACN/water (2:1:1, v/v/v). After centrifugation at 12,000 rpm for 15 min, 2 μl of supernatant was injected into HPLC for LC-MS/MS analysis.
High-performance liquid chromatography
For untargeted lipidomics, an Ultimate 3000 ultrahigh-performance LC (UHPLC) system coupled to Q-Exactive MS (Thermo Fisher Scientific) was used for lipid separation and detection. Samples were reconstituted in 20 μl of chloroform/methanol (1:1, v/v) and diluted three times in IPA/ACN/water (2:1:1, v/v/v). After centrifugation at 12,000 rpm for 15 min, 5 μl of supernatant was injected for LC-MS/MS analysis.
Chromatographic separation was performed on a reversed-phase X-select CSH C18 column (4.6 mm by 100 mm, 2.5 μm) (Waters, USA). Two solvents were used for gradient elution: (A) ACN/water (3:2, v/v) and (B) IPA/ACN (9:1, v/v). Both A and B contained 10 mM ammonium acetate and 0.1% formic acid. The gradient program was as follows: 0 min, 40% B; 2 min, 43% B; 2.1 min, 50% B; 10 min, 60% B; 10.1 min, 75% B; 16 min, 99% B; 17 min, 99% B; 18 min, 40% B; and 19 min, 40% B. The column temperature was maintained at 50°C, and the flow rate was set to 0.6 ml/min.
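The gradient program above can be read as a table of (time, %B) breakpoints. The following Python sketch is illustrative only: it assumes linear ramps between the listed breakpoints (the pump's actual ramp behavior is not stated in the text), and the names `UNTARGETED_GRADIENT` and `percent_b` are hypothetical.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints of the untargeted method, as listed in the text.
UNTARGETED_GRADIENT = [
    (0.0, 40.0), (2.0, 43.0), (2.1, 50.0), (10.0, 60.0), (10.1, 75.0),
    (16.0, 99.0), (17.0, 99.0), (18.0, 40.0), (19.0, 40.0),
]


def percent_b(t_min, program=UNTARGETED_GRADIENT):
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min)  # index of the first breakpoint after t_min
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

The same function can be applied to the targeted method by passing its own breakpoint list as `program`.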
For targeted lipid quantification, a Nexera UHPLC system (Shimadzu) coupled with a QTRAP 6500 MS (AB Sciex) was used. Chromatographic separation was performed on a reversed-phase X-select CSH C18 column (2.1 mm by 100 mm, 2.5 μm) (Waters, USA) using the same mobile phase as described for the untargeted lipidomics. The gradient program was as follows: 0 min, 40% B; 0.5 min, 40% B; 0.6 min, 50% B; 6.6 min, 60% B; 6.7 min, 75% B; 9.7 min, 99% B; 14 min, 99% B; 14.5 min, 40% B; and 19 min, 40% B. The column temperature was maintained at 50°C, and the flow rate was set to 0.3 ml/min.
Mass spectrometry
Untargeted lipidomics analysis was performed using the Q-Exactive MS (Thermo Fisher Scientific) equipped with a HESI ion source in ESI mode. Mass spectrometric data were collected in a data-dependent top 10 scan mode. Survey full-scan MS spectra [mass range mass/charge ratio (m/z), 190 to 1200] were acquired with a resolution R = 35,000, a maximum injection time (IT) = 80 ms, and an automatic gain control (AGC) target of 5 × 10^6. MS/MS fragmentation was performed using higher-energy C-trap dissociation with a resolution R = 17,500, a maximum IT = 70 ms, and an AGC target of 1 × 10^5. The stepped normalized collision energy (NCE) was set to 15, 30, or 45. Dynamic exclusion was set to 8 s. External mass calibration was applied before every sequence run. The source spray voltage was maintained at 3.3 kV in positive-ion mode and 3 kV in negative-ion mode. All other interface settings were identical for both positive-ion mode and negative-ion mode. The capillary temperature and probe heater temperature were set to 320° and 300°C, respectively, and the sheath gas flow and auxiliary gas flow were set to 40 and 10, respectively. A pooled serum sample was prepared as quality control (QC) to assess the stability of the LC-MS instrument and ensure the reliability of the data. A QC sample was run before and after the sequence and after every 25 sample runs in the sequence.
Targeted lipid quantification assay was performed on a QTRAP 6500 (AB Sciex) triple quadrupole MS in MRM mode. One to three transitions were selected for each target lipid, and CE and DP energy were optimized for each transition to achieve maximum sensitivity. In total, 38 transitions (LPC, 13 transitions; PC, 10 transitions; LPE, 4 transitions; PE, 3 transitions; SM, 3 transitions; and DG, 5 transitions) representing 17 target lipids (LPC 14:0, 16:0, 18:1, and 20:4; PC 16:0-16:0, 16:0-18:1, 18:0-18:2, 18:0-20:3, 16:0-22:5, 18:0-22:5, and O-16:0-18:2; LPE 22:4; PE 16:0-18:2; SM d18:1/18:0, d18:2/24:1, and d18:2/24:2; DG 18:1-18:1) and 5 internal standards (LPC 13:0, PC 16:0-d31-18:1, PE 16:0-d31-18:1, LPE 13:0, and DG 8:0-8:0) were acquired in a 19-min LC-MS run in positive-ion mode. Other parameters were as follows: dwell time, 30 ms; curtain gas, 35; ion source gas 1, 60; ion source gas 2, 60; temperature, 350°C; and spray voltage, 5.5 kV.
Data processing
The acquired raw data of untargeted lipidomics were processed with MS-DIAL software (v3.82) according to the instructions in the software tutorial (50). The MS/MS spectra–based lipid identification was performed in MS-DIAL by searching the acquired MS/MS spectra against the software’s internal in silico MS/MS spectra database (version: LipidDBs-VS23-FiehnO), which includes MS1 and MS/MS information of common lipid species. The tolerance for MS1 and MS/MS search was set to 0.01 and 0.05 Da, respectively. Other parameters used in MS-DIAL were set as the default settings. Datasets containing m/z values, retention time, metabolite name, and peak area were exported as an Excel file.
L2 normalization was used to normalize the data according to the following equation

$$\hat{f}_i = \frac{f_i}{\sqrt{\sum_{j=1}^{M} f_j^{2}}}$$

where f_i denotes the ith dimension data for each sample, \hat{f}_i is its normalized value, and M is the dimension of the data. We used the ratio of the means of each feature in the training and test sets to scale the testing dataset.
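The two preprocessing steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the handling of all-zero samples and zero feature means is an assumption, and the direction of the scaling ratio (training mean divided by test mean) is our reading of the sentence above.

```python
import math


def l2_normalize(sample):
    """Divide each feature of one sample by the sample's Euclidean (L2) norm."""
    norm = math.sqrt(sum(f * f for f in sample))
    if norm == 0.0:
        return list(sample)  # all-zero sample left unchanged (assumption)
    return [f / norm for f in sample]


def scale_test_to_train(train, test):
    """Multiply each test feature by mean(train feature) / mean(test feature).

    The ratio direction is an assumption; the text states only that the ratio
    of the feature means of the two sets is used to scale the test set.
    """
    n_features = len(train[0])
    scaled = [list(row) for row in test]
    for j in range(n_features):
        train_mean = sum(row[j] for row in train) / len(train)
        test_mean = sum(row[j] for row in test) / len(test)
        ratio = train_mean / test_mean if test_mean != 0.0 else 1.0
        for row in scaled:
            row[j] *= ratio
    return scaled
```

After scaling, each feature of the test set has the same mean as in the training set, which is one simple way to reduce batch-level intensity shifts between cohorts.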
Support vector machine
SVM aims to infer a hyperplane in sample space. Intuitively, an ideal hyperplane should have the greatest distance to the nearest training samples of any class. In ML, SVM is a supervised model with associated algorithms that analyze data for classification and regression analysis.
The core of SVM is to infer, from the training dataset, the weight w and bias b, which together form the classifier and are fixed after training. Given a testing sample x, its prediction score y can be obtained by y = w^T x + b. For the binary classification, i.e., PDAC and NC, the calling threshold is fixed as 0 to classify the positive and negative samples. For example, the sample can be classified as PDAC if y ≤ 0 and vice versa. Further validation of the model was conducted on three different cohorts without refitting within each cohort. For SVM, the linear kernel is applied to train the model.
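The scoring and calling rule can be written out as a short Python sketch. The function names and the `pdac_is_nonpositive` flag are hypothetical; the flag is included because the side of the hyperplane assigned to PDAC depends on how the labels were encoded during training, which the text does not specify.

```python
def decision_score(w, x, b):
    """Linear SVM score y = w^T x + b for a single sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def call_pdac(score, pdac_is_nonpositive=True):
    """Apply the fixed calling threshold of 0 to a prediction score.

    With the default flag, PDAC is called when y <= 0, as stated in the text.
    Setting the flag to False gives the opposite label encoding (assumption).
    """
    return score <= 0 if pdac_is_nonpositive else score > 0
```

Because w and b are fixed after training, applying the model to a new cohort requires only this dot product and comparison, with no refitting.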
In this work, the SVM model (liblinear 2.20) was built to classify PDAC from NC samples. Therefore, the SVM would learn a binary discriminative classifier defined by a separating hyperplane. Thus, given N samples (x_1, y_1), (x_2, y_2), ..., (x_N, y_N), where x_i and y_i are the data and label for the ith sample, respectively, the SVM finds the “maximum margin” hyperplane that could divide those samples with the following equation
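The equation itself is missing from this copy of the text. For orientation only, a standard soft-margin objective for a linear SVM is given below. It is an assumption rather than the authors' exact formulation: the text does not state whether the hinge or squared-hinge loss was used or how the bias b was handled. Here C is the penalty hyperparameter and the labels are encoded as y_i ∈ {−1, +1}.

```latex
\min_{w,\,b}\; \frac{1}{2}\, w^{T} w \;+\; C \sum_{i=1}^{N} \max\!\bigl(0,\; 1 - y_i \,(w^{T} x_i + b)\bigr)
```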
As shown in the above equation, the inferred w could be regarded as the importance weight for each feature. On the basis of w, features with higher importance can be selected.
Feature selection
To select features carrying information that is substantial for classifying PDAC and NC, we computed an importance weight for each feature. All the normalized training data were denoted as X ∈ R^{495 × 1416 (669)}. SVM was used to generate the weight W ∈ R^{1 × 1416 (669)} for all features. Once the weight W was obtained, we treated the squared value of W_i as the importance weight of the ith feature, and all features were sorted by this value. The greedy feature selection procedure then analyzed the features one by one in order of decreasing weight. To track the chosen features, the procedure maintains a selected feature set, which is empty before feature selection begins. For the Nth feature, the procedure first adds the Nth feature to the previously selected feature set and then conducts 500 iterations of fourfold cross-validation, with the SVM hyperparameter fixed at c = 5, to evaluate the average performance. If the average performance obtained is higher than that obtained with the previously selected feature set, then the Nth feature is kept in the selected feature set; otherwise, it is discarded.
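The greedy procedure can be summarized in a short Python sketch. It is illustrative and not the authors' released code: `greedy_select` and `evaluate` are hypothetical names, the cross-validated SVM is abstracted into the `evaluate` callable, and starting the comparison from negative infinity (so that the top-ranked feature is always kept) is an assumption.

```python
def greedy_select(weights, evaluate):
    """Greedy forward selection over features ranked by squared SVM weight.

    weights  : per-feature weights W_i from the trained linear SVM
    evaluate : callable taking a list of feature indices and returning the
               average performance (in the text: the mean over 500 repeats
               of fourfold cross-validation with c = 5)
    """
    ranked = sorted(range(len(weights)),
                    key=lambda i: weights[i] ** 2, reverse=True)
    selected, best = [], float("-inf")
    for idx in ranked:
        score = evaluate(selected + [idx])
        if score > best:
            # Keep the feature only if it improves the average performance.
            selected.append(idx)
            best = score
        # Otherwise the feature is discarded and the next one is tried.
    return selected, best
```

Unlike a plain Top-X cutoff, this procedure can skip a highly ranked feature that is redundant with those already selected, which is why the two strategies can yield different feature sets of the same size.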
For the traditional ML-based feature selection, on the premise that a feature with a higher importance value has a greater influence, a validation operation was conducted to select the Top-X most important features with the highest classification accuracy. Here, the top 100 features by importance were analyzed to generate predictive models, adding features one by one from the top of the ranking, e.g., the Top-1 feature as the first model, the Top-2 features as the second model, and so on up to the Top-N features as the Nth model. Mean accuracies for each model (n = 100) were calculated after 500 iterations of fourfold cross-validation with the SVM hyperparameter fixed at c = 5. The classification performance of the two methods was directly compared using the average accuracy after 500 iterations of fourfold cross-validation with the same number of selected features.
Protein digestion
Protein concentrations were determined using the bicinchoninic acid assay (Thermo Fisher Scientific). Proteins were subjected to disulfide bond reduction with 10 mM dithiothreitol (room temperature, 30 min) and alkylation with 20 mM iodoacetamide (room temperature, 30 min in the dark). Trichloroacetic acid (TCA)–acetone precipitation was performed before protease digestion. In brief, 100% TCA solution was added to each sample in a ratio of 1:9 and vortexed. Samples were centrifuged at 16,000g for 15 min at 4°C and subsequently washed twice with 100% cold acetone. Samples were resuspended in 100 mM tris-HCl (pH 8.5) and digested at 37°C overnight with trypsin at a 50:1 protein-to-protease ratio.
LC-MS/MS analysis for proteomics
MS data were collected on an Orbitrap Elite mass spectrometer coupled to a Proxeon NanoLC-1000 UHPLC. The 100-μm capillary column was packed with 15 cm of ReproSil-Pur C18-AQ resin (3 μm, 120 Å; Dr. Maisch GmbH). Peptides were eluted with a linear gradient from 5 to 32% buffer B (98% acetonitrile and 0.1% formic acid) in 200 min at a constant flow rate of 300 nl/min. MS data were acquired with a Top20 data-dependent MS/MS method. The scan sequence began with an MS1 spectrum (Orbitrap analysis, resolution of 240,000, 300 to 1600 Th, AGC target of 1 × 10^6, and maximum injection time of 200 ms). MS2 analysis consisted of collision-induced dissociation, AGC of 1 × 10^4, NCE of 35, q value of 0.25, maximum injection time of 150 ms, and isolation window at 2 Th.
MS raw files were analyzed using MaxQuant software, version 1.6.14 (51), and peptide lists were searched against human UniProt FASTA databases (downloaded in February 2020). A contaminant database generated by the Andromeda search engine was configured with cysteine carbamidomethylation as a fixed modification and N-terminal acetylation and methionine oxidation as variable modifications. We set the false discovery rate (FDR) to 0.01 for protein and peptide levels, with a minimum length of six amino acids for peptides. The FDR was determined by searching a reverse database. Enzyme specificity was set as C-terminal to arginine and lysine, as expected with trypsin as the protease. A maximum of two missed cleavages was allowed. Peptide identification was performed in Andromeda with an initial precursor mass deviation of up to 7 parts per million and a fragment mass deviation of 0.5 Th.
Analysis of proteomics data
MS data were processed using R 4.0.1. Proteins with less than 50% missing values in either cancer group or control group were retained, and remaining missing values were filled with zero. Student’s t test was performed after log transformation, and P values were adjusted using the Benjamini and Hochberg method (52). Gene Ontology overrepresentation analysis was performed using the clusterProfiler (R package) in proteins with fold change of ≥2 and adjusted P ≤ 0.05 (53).
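The Benjamini and Hochberg adjustment mentioned above can be illustrated with a short Python sketch. The authors performed this step in R; the function below is a hypothetical stand-alone version of the standard step-up procedure, shown only to make the calculation explicit.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Proteins would then be retained for overrepresentation analysis when the adjusted P value is ≤ 0.05 and the fold change is ≥ 2, as stated in the text.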
scRNA-seq data processing
The integrated single-cell count matrix was obtained from GSA under accession number CRA001160 (26). The filtered gene expression matrices were analyzed with R software (version 4.0.2) and the Seurat package (version 3.2.0) (54, 55). In brief, cells with >200 genes detected were selected for further analyses. Low-quality cells were removed if they had (i) >8000 or <200 detected genes or (ii) >10% of unique molecular identifiers derived from the mitochondrial genome. After removal of low-quality cells, the gene expression matrices were normalized by the NormalizeData function, and 2000 features with high cell-to-cell variation were calculated using the FindVariableFeatures function. To reduce the dimensionality of the datasets, the RunPCA function was run with default parameters on the linearly transformed, scaled data generated by the ScaleData function. Then, the ElbowPlot, DimHeatmap, and JackStrawPlot functions, which are recommended by the Seurat developers, were used to identify the true dimensionality of each dataset. Last, we clustered cells using the FindNeighbors and FindClusters functions and performed nonlinear dimensional reduction with the RunUMAP function with default settings. All details of the Seurat analyses performed in this work can be found in the website tutorial (https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html).
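The cell-level quality thresholds can be expressed as a single predicate. The analysis itself was run with Seurat in R; the Python sketch below is only a language-neutral restatement of the thresholds, with a hypothetical function name and argument names, and with a cell having exactly 200 detected genes treated as excluded (our reading of ">200 genes detected").

```python
def passes_qc(n_genes, pct_mito, min_genes=200, max_genes=8000, max_pct_mito=10.0):
    """Return True if a cell survives the quality thresholds quoted in the text.

    n_genes  : number of genes detected in the cell
    pct_mito : percentage of UMIs derived from the mitochondrial genome
    """
    return min_genes < n_genes <= max_genes and pct_mito <= max_pct_mito
```

For example, a cell with 1500 detected genes and 4.2% mitochondrial UMIs is retained, whereas a cell with 9000 detected genes or with 12% mitochondrial UMIs is removed.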
Cell type annotation
Cells were annotated by cell type file provided by the authors from public database (26). Parenchymal cell (acinar cells, type 1 and 2 ductal cells, and endocrine cells), stromal cell (fibroblasts, stellate cells, and endothelial cells), and immune cell (macrophages and T and B lymphocytes) clusters were identified, as marked in fig. S8F.
CNV estimation and identification of malignant cells
To infer CNVs from the scRNA-seq data, we used an approach described previously with the R code provided in https://github.com/broadinstitute/inferCNV with the default parameters (27, 28). Immune cells, endothelial cells, stellate cells, endocrine cells, and fibroblast cells were considered as putative nonmalignant cells, and their CNV estimates were used to define background.
Gene set variation analysis
Pathway analyses were performed on the 70 metabolism pathways and 15 lipid metabolism pathways as described (32). To assign pathway activity–estimated value to individual cells, we applied gene set variation analysis (GSVA) (56) using standard settings, as implemented in the GSVA package (version 1.36.3). Significantly disturbed pathways were identified by Limma R packages (version 3.44.3) (57) with Benjamini-Hochberg–corrected P ≤ 0.01.
Validation of scRNA-seq analysis
The data of TCGA and GTEx were used to confirm the annotation of malignant cells and normal epithelium cells. We obtained the compiled public dataset from UCSC Xena (“TCGA TARGET GTEx” cohort) (31). DESeq2 standardized dataset with “RSEM expected_count” was downloaded, and 183 PDAC samples from TCGA and 167 normal pancreas samples from GTEx were analyzed (58). DESeq2 tool was used for differentially expressed genes analysis, and the parameters of screening were set as Padj < 0.01 and abs [log2(FC + 1)] > 2. Meanwhile, the parameter of differential genes analysis between type 2 ductal cell in the PDAC samples and type 1 ductal cell in control samples of scRNA-seq dataset was accordingly set as Padj < 0.01. The Top-50 of both up-regulated and down-regulated differentially expressed genes of type 2 ductal cell in the scRNA-seq dataset were further analyzed.
The data of TCGA-GTEx and the independent dataset of PDAC were used to validate the lipid metabolism signature in PDAC (35). Data of 178 TCGA PDAC samples (1 metastatic PDAC was excluded) and 171 normal pancreas samples (4 TCGA PDAC adjacent normal pancreas and 167 GTEx normal pancreas) were downloaded by UCSC Xena project with a standard pipeline (31). The mRNA microarray expression data of 69 cases of PDAC and 61 paired normal pancreas tissues were downloaded from the National Center for Biotechnology Information’s Gene Expression Omnibus with accession number GSE62452. GSVA was applied to calculate the enrichment scores of Kyoto Encyclopedia of Genes and Genomes–derived metabolic pathways in different samples. t test was conducted to analyze the difference between PDAC and normal pancreas samples.
Confocal microscopy
Sections were fixed with 4% paraformaldehyde for 10 min, followed by blocking with 1% bovine serum albumin and incubation with specific primary antibodies of Pan-Keratin (C11, Cell Signaling Technology) and MUC1 (ab109185, Abcam) at 4°C overnight. After incubation with secondary antibodies (Alexa Fluor 647, Invitrogen) at room temperature for 1 hour and staining with 4′,6-diamidino-2-phenylindole (0.5 μg/ml) for 10 min, cover glasses were mounted and evaluated with fluorescence microscopy. A Nikon TCS A1 microscope was used for confocal microscopy.
Matrix-assisted laser desorption/ionization–mass spectrometry imaging
For MALDI-MSI analysis, the iMScope TRIO system (Shimadzu, Japan) was used. Briefly, frozen tissue samples from patients with PDAC were cut into 5-μm sections, mounted onto indium-tin-oxide–coated conductive glass slides, and placed in a vacuum drying oven at room temperature for 1 hour to allow better adhesion. For matrix deposition, the substrate vapor deposition device iMLayer was used. In the vacuum environment, dihydroxybenzoic acid (DHB; Sigma-Aldrich) matrix was heated and sublimed, and the matrix was coated on the sample slides to achieve a higher-precision matrix crystal. The substrate thickness was 1.5 μm for DHB sublimation. The analysis parameters were as follows: the spatial resolution was 50 μm × 50 μm, and the m/z mass range was 400 to 900. Both positive-ion and negative-ion modes were used. The sample voltage was maintained at 3.5 kV in positive-ion mode and 3.0 kV in negative-ion mode. The detector voltage was 1.75 to 1.8 kV. The laser spot diameter was 25 μm, and the intensity was 50.
MALDI-MSI quantitative analysis
Data from MALDI-MSI were normalized before quantitative analysis. For a specific m/z, the quantities in the range of [m/z − 0.05, m/z + 0.05] were counted as a two-dimensional array. Then, the mean μ and SD δ of the array were calculated. Data larger than the threshold t (t = μ + 3*δ) were regarded as outliers deviating from the distribution and assigned the value t. All data in the array were normalized to [0, 1] by dividing by t. The heatmap corresponding to the m/z was displayed with the colormap “jet” from the library “matplotlib” in Python. To avoid interference from the background, the noise outside the tissues was filtered in advance. In addition to visual comparison, data were also quantified by regional statistics to support the comparison. Specifically, the normalized mean intensities of cancer and normal regions were calculated as μ_cancer and μ_normal. Then, the ratio r (r = μ_cancer/μ_normal) was adopted to measure the expression differences between them. Note that the ratio was no longer meaningful when the two values were small because of weak expression in both regions. When at least one region was strongly expressed, r reflected the expression differences quantitatively. When r was greater than 1, the expression of the cancer region was higher than that of the normal region and vice versa.
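The normalization and ratio calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function names are hypothetical, the two-dimensional array is treated as a flat list of non-negative intensities, and the population SD is used because the text specifies only "SD".

```python
import statistics


def normalize_intensities(values):
    """Clip intensities at t = mean + 3*SD, then divide by t to map to [0, 1]."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (assumption)
    t = mu + 3 * sd
    if t <= 0:
        return [0.0 for _ in values]  # no signal in this m/z window
    return [min(v, t) / t for v in values]


def region_ratio(cancer, normal):
    """r = mean(cancer) / mean(normal), computed on normalized intensities."""
    return statistics.fmean(cancer) / statistics.fmean(normal)
```

A value of r above 1 indicates higher normalized intensity in the cancer region; as noted in the text, r should not be interpreted when both regional means are close to zero.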
Statistical analysis
MATLAB R2018a, SPSS R24.0.0, and Prism GraphPad v8.0.2 (263) software were used for statistical analysis. For data from the targeted assay, t-distributed stochastic neighbor embedding (t-SNE) analysis was conducted with MATLAB using the tsne function. Pathway enrichment and hierarchical cluster analysis were performed using the MetaboAnalyst statistical analysis tool web service (www.metaboanalyst.ca/faces/upload/StatUploadView.xhtml). DeLong’s test was performed with MATLAB using the DeLongUserInterface function (https://github.com/PamixSun/DeLongUI). Multivariate linear regression analysis was performed with SPSS. Statistical significance is shown as *P < 0.05, **P < 0.01, and ***P < 0.001.
Acknowledgments
We thank J. Zhou, K. Zhang, S. Di, A. Liu, C. Yang, and F. Wang for technical assistance and critical discussions. We thank J. Luo and W. Shen for providing chemical standard reagents.
Funding: This work was supported by grants including the National Key Research and Development Program of China (2016YFA0500302 to Y. Yin), National Natural Scientific Foundation of China (82030081, 81430056, 31420103905, and 81874235 to Y. Yin; 30700349 and 30440012 to L.G.), Beijing Municipal Science and Technology Commission (Z131100004013036 to L.G.), Shu Fan Education and Research Foundation, and Lam Chung Nin Foundation for Systems Biomedicine.
Author contributions: Conceptualization: Y. Yin, L.G., Q.Z., K.J., G.W., and H.Y. Methodology: G.W., H.Y., R.P., J.L., H.N., G.Z., Z.Z., X.Z., and Z.M. Investigation: G.W., H.Y., Y.G., Z.L., Y.L., Y. Yuan, H.S., H.N., G.Z., Z.Z., Z.M., X.Z., M.Q., Y.M., and L.G. Visualization: G.W., H.Y., R.P., H.S., H.N., G.Z., Z.Z., X.Z., and M.Q. Supervision: Y. Yin, L.G., Q.Z., K.J., Y.J., Y. Yang, and Z. Zhao. Writing—original draft: Y. Yin, L.G., Q.Z., K.J., G.W., H.Y., Y.G., Z.L., M.Q., H.N., G.Z., X.Z., and Z.Z. Writing—review and editing: All authors.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Raw data have been deposited at the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the MassIVE partner repository with the dataset identifier PXD027113. Code is available from the Zenodo repository at https://doi.org/10.5281/zenodo.5148069 or GitHub at https://github.com/coldrainyht/PDAC.
Supplementary Materials
This PDF file includes:
Figs. S1 to S9
Tables S1 to S7
Legend for data S1
Other Supplementary Material for this manuscript includes the following:
Data S1