Abstract
Background
Formalin-fixed, paraffin-embedded (FFPE) tissue slides are routinely used in cancer diagnosis and clinical decision-making and are stored in biobanks, but their use in Raman spectroscopy-based studies has been limited by the background signal from the embedding media.
Methods
Spontaneous Raman spectroscopy was used for molecular fingerprinting of FFPE tissue from 46 patient samples with known methylation subtypes. Spectra were used to construct tumor/non-tumor, IDH1WT/IDH1mut, and methylation-subtype classifiers. Support vector machine and random forest were used to identify the most discriminatory Raman frequencies. Stimulated Raman spectroscopy was used to validate the frequencies identified. Mass spectrometry of glioma cell lines and TCGA were used to validate the biological findings.
Results
Here, we develop APOLLO (rAman-based PathOLogy of maLignant gliOma)—a computational workflow that predicts different subtypes of glioma from spontaneous Raman spectra of FFPE tissue slides. Our novel APOLLO platform distinguishes tumors from nontumor tissue and identifies novel Raman peaks corresponding to DNA and proteins that are more intense in the tumor. APOLLO differentiates isocitrate dehydrogenase 1 mutant (IDH1mut) from wild-type (IDH1WT) tumors and identifies cholesterol ester levels to be highly abundant in IDHmut glioma. Moreover, APOLLO achieves high discriminative power between finer, clinically relevant glioma methylation subtypes, distinguishing between the CpG island hypermethylated phenotype (G-CIMP)-high and G-CIMP-low molecular phenotypes within the IDH1mut types.
Conclusions
Our results demonstrate the potential of label-free Raman spectroscopy to classify glioma subtypes from FFPE slides and to extract meaningful biological information, thus opening the door for future applications on these archived tissues in other cancers.
Keywords: FFPE tissue, glioma, lipid metabolism, machine learning, Raman spectroscopy
Graphical Abstract
Key Points.
We developed APOLLO, a Raman-based platform that predicts the methylation types of gliomas.
We identified cholesterol metabolism as the top discriminative feature between IDH1 mutated and wild-type gliomas.
Importance of the Study.
We developed an automated pipeline (APOLLO) capable of identifying biochemical differences between different subtypes of gliomas, using unmodified FFPE tissue slides, spontaneous Raman spectroscopy, and machine learning. APOLLO identified signals from Raman spectra that distinguish tumor tissue from non-tumor, IDH1WT versus IDH1mut tumors and within IDH1mut subtypes, G-CIMP-high versus G-CIMP-low. This study demonstrates the utility of unprocessed FFPE tissues together with APOLLO in identifying lipid pathways differentially altered in IDH1mut versus IDH1WT gliomas. Importantly, the same slide can be used in a multiplex fashion for other downstream applications. APOLLO provides a new avenue for metabolic and biochemical investigation of tissue that is stored in biobanks and can be extended to any FFPE tissue or classification type.
Before 2016, morphologic features drove the classification of glioma.1 However, the discovery of IDH mutation2 in glioma changed the classification of these diseases, making it one of the most important biomarkers for diagnosis, prognosis, and treatment.2–4 In 2021, the WHO placed greater emphasis on molecular markers in the classification of gliomas, and thus simplified adult diffuse glioma into three major classes: IDH1mut astrocytoma, IDH1mut oligodendroglioma, and IDH1WT glioblastoma, with grading done within each tumor subtype.5 Classification of gliomas includes the construction of classifiers that use machine learning and DNA methylation profiling, which minimize the need for human evaluation of the tumor histopathology6–12 and hint at the need to use molecular markers. Thus, being able to predict the glioma subtypes and identify the biochemical differences between them would aid clinicians in designing more tailored treatments.
Lipids are very important building blocks, promote proliferative signaling via membrane remodeling13 and binding to signaling proteins,14 and serve as energy reserves via β-oxidation15 in gliomas. While IDHWT glioblastomas (GBM) have enhanced fatty acid levels, which are used as a source of energy,16 as membrane building blocks, and to enhance signaling pathways,17 IDH1mut gliomas upregulate sphingolipid and phospholipid pathways.18–21 Lipid alterations in gliomas are commonly discovered by lipidomic assays, which require the extraction of lipids from millions of cells or tens of milligrams of tissue.22 This approach, while producing metabolite-level detection, loses the spatial resolution needed to identify tumor metabolic heterogeneity.22
Spontaneous Raman spectroscopy is a vibrational technique that can capture the molecular fingerprints of tissue in situ without the need for sample preparation and is particularly suited for measuring lipids with subcellular resolution.23–25 Raman spectroscopy has been used to detect brain tumor margins and diffuse tumor cells,26–29 to predict different brain tumor grades,30 histologically distinct regions,31 or genetic subtypes.32 Spontaneous Raman spectroscopy is currently used for fresh or frozen tissue and not for FFPE tissue because of the high background coming from the embedding media.27,33–36 Even when Raman spectroscopy is used for FFPE tissues,31 the sample is manipulated to remove the paraffin, a process that washes out the lipids.32
Stimulated Raman spectroscopy (SRS), a resonantly enhanced form of Raman spectroscopy, has been widely used in brain tumors to recreate H&E images of glioma,33,34,37,38 and, aided by additional genomic and clinical information, SRS has been used for fast diagnosis.39 Previous SRS studies were limited to detecting and analyzing only 2 frequencies, which is insufficient for distinguishing among the finer composition changes associated with glioma subtypes.33,34,37,39 In contrast, spontaneous Raman spectroscopy can capture the complexity and variability of the chemical composition of biological tissues in the form of thousands of frequencies. However, such complexity necessitates the use of computational approaches to disambiguate the differences associated with Raman spectra. Machine-learning approaches are ideal for understanding these finer differences and identifying the most relevant frequencies but need to be applied to the spectra themselves as opposed to images constructed as previously described.33,34,37,38,40,41
To understand the biochemical differences between different types of gliomas, to take advantage of all the frequencies encoded in the spontaneous Raman spectra and the availability of FFPE tissue slides without manipulating them, we developed APOLLO (rAman-based PathOLogy of maLignant gliOma). APOLLO is a Raman-based machine-learning platform that can classify glioma subtypes from FFPE samples and extract the biochemical differences. We found that APOLLO accurately discriminates tumor versus nontumor regions of the tissue, IDH1mut versus IDHWT, and within IDHmut subtypes, G-CIMP-high versus G-CIMP-low. Overall, these results showcase the potential of APOLLO to predict the methylation subtype of gliomas and to discover novel biology in the most common form of tissue existent in biobanks.
Methods
DNA Methylation Glioma Cohort
All the samples and their methylome classification were provided by Dr. Houtan Noushmehr (Supplementary Table 1). All the samples were collected at the Hermelin Brain Tumor Center at Henry Ford Health System and were de-identified. The project was approved by the Institutional Review Board of each Institution (HFHS IRB# 10963; University Hospitals IRB # CC296 (CASE 1307)) and patients consented to have their specimens used for research purposes.
Raman Data Acquisition
Raman spectra were acquired on a ThermoFisher DXR2xi Raman microscope equipped with a 532 nm single-frequency laser diode, which was set to deliver 10 mW of power at the sample. The microscope was equipped with an Olympus air objective 10x/NA 0.25, MPlanN FN22. Data acquisition was performed using a 50-μm confocal pinhole, a 0.25 s exposure time, 5 scans, and a medium baseline correction over the entire spectral window (50-3400 cm−1). Stimulated Raman spectroscopy images were collected for each frequency, one at a time for large areas of the tissue slides, using a Leica STELLARIS 8 CRS (Coherent Raman Scattering) microscope equipped with FALCON and fluorescence lifetime imaging (FLIM) and using a 10× air objective HC PL APO 10×/0.40 CS2 NA 0.4. Autofluorescence was collected in the far-red using excitation at 641 nm and an emission window of 650-750 nm. The detector used was a HyDS1. The SRS system includes a white light laser as an excitation light source, from a CRS Laser picoEmerald S, an Acousto Optical Beam Splitter (AOBS) and a highly sensitive, prism-based spectral detection design with computer-controlled adjustable bandwidth for all fluorescence channels.
Data Preprocessing
For our analysis, we removed only the silent region between 1797.210 and 2697.804 cm−1 and the region between 0 and 248.651529 cm−1, dividing each spectrum into two regions. Baseline correction was done using the airPLS algorithm introduced by Zhang et al.42 We used a lambda value of 4 and a p-order value of 1 to approximate the baseline of each nonsilent region. Any negative intensities remaining after the removal of the baseline were replaced by 0, which resulted in a flat baseline at 0. Following this procedure, we concatenated the nonsilent regions into a single spectrum. We normalized the data by dividing the spectral intensities by the individual L2-norm of the spectra. Viewed as vectors, each spectrum had a norm of 1 after this step. Following normalization, we noted that the spectra in sample HF-1887 deviated significantly from the other samples. Due to this difference, we elected to remove it from the dataset.
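The preprocessing chain described above (crop, baseline-correct, clip, concatenate, L2-normalize) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the uniform 1738-point shift axis is assumed, the crop bounds are treated as exclusive, and `flat_baseline` is a hypothetical placeholder that stands in for airPLS (it simply subtracts the segment minimum).

```python
import numpy as np

# Region bounds taken from the text; units assumed to be cm^-1.
LOW_CUT = 248.651529
SILENT_START, SILENT_END = 1797.210, 2697.804


def flat_baseline(intensities):
    """Placeholder baseline estimate (NOT airPLS): the segment minimum."""
    return np.full_like(intensities, intensities.min())


def preprocess(shifts, spectrum, baseline_fn=flat_baseline):
    """Crop, baseline-correct, clip, concatenate, and L2-normalize one spectrum."""
    shifts = np.asarray(shifts, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    fingerprint = (shifts > LOW_CUT) & (shifts < SILENT_START)
    high_wavenumber = shifts > SILENT_END
    segments = []
    for mask in (fingerprint, high_wavenumber):
        segment = spectrum[mask]
        corrected = segment - baseline_fn(segment)
        segments.append(np.clip(corrected, 0.0, None))  # negatives become 0
    joined = np.concatenate(segments)
    norm = np.linalg.norm(joined)
    return joined / norm if norm > 0 else joined


# Synthetic spectrum on an assumed uniform axis from 50 to 3399 cm^-1.
shifts = np.linspace(50.0, 3399.0, 1738)
rng = np.random.default_rng(0)
raw = 100.0 + rng.random(1738)
processed = preprocess(shifts, raw)
```

On this assumed uniform axis the cropped vector has 1167 points, which agrees with the vector length reported in the Results. A real airPLS implementation can be passed in through `baseline_fn` without changing the rest of the function.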
DBSCAN Algorithm
DBSCAN’s implementation from scikit-learn43 performed empirically well out of the box. Given our setup, we found ε to be by far the most influential hyper-parameter. We found that lowering ε reduces the number of false positives, while increasing ε reduces the number of false negatives. Our emphasis was on reducing false positives, erring toward a more cautious labeling of spectra as tumor rather than nontumor. Using a global ε value for all clusterings is a “high wire balancing act,” as the optimal ε value seems to differ for different types of samples. Nevertheless, to avoid overfitting, we did not tune ε separately for each sample. Given our objective, we empirically found ε = 0.28 to be a good trade-off for all samples.
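As an illustration of how a single global ε behaves, the sketch below runs scikit-learn's `DBSCAN` with ε = 0.28 on synthetic, L2-normalized vectors. The data are invented stand-ins for one scanned region, and every setting other than `eps` is left at the scikit-learn default, which is an assumption based on the "out of the box" remark above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic stand-in for one scanned region (not real data):
# 200 near-identical "tissue" spectra plus 20 unrelated noisy spectra.
template = np.zeros(50)
template[10] = 1.0
tissue = template + rng.normal(0.0, 0.005, size=(200, 50))
noisy = rng.normal(0.0, 1.0, size=(20, 50))
spectra = np.vstack([tissue, noisy])
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)  # L2-normalize rows

# eps is the value reported in the text; other settings are library defaults.
labels = DBSCAN(eps=0.28).fit_predict(spectra)

# Label -1 marks points that DBSCAN treats as noise.
n_clusters = len(set(labels) - {-1})
```

Because L2-normalized spectra lie on the unit sphere, Euclidean distances between them are bounded by 2, which is one reason a single ε can be meaningful across regions.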
Mini-Batch K-Means Algorithm
The unsupervised clustering algorithm K-means44 is a popular machine-learning model used for partitioning data into K clusters. Mini-batch K-means45 is a variation of the K-means algorithm that avoids heavy computational costs by processing only a small “batch” of data at a time. This can drastically reduce the required computational time while sufficiently big batch sizes retain the quality of the clusters. The entire dataset is assigned to its clusters after the cluster centroids are computed based on the mini-batches.
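A minimal sketch of the two phases described above, using scikit-learn's `MiniBatchKMeans` on synthetic points: centroids are first estimated from small batches, then the whole dataset is assigned to them. The number of clusters, the batch size, and the data are arbitrary choices for illustration; the text does not state the values used in the study.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 8-dimensional points (illustrative only).
group_a = rng.normal(0.0, 0.1, size=(500, 8))
group_b = rng.normal(3.0, 0.1, size=(500, 8))
data = np.vstack([group_a, group_b])

# Centroids are updated from batches of 64 points at a time.
model = MiniBatchKMeans(n_clusters=2, batch_size=64, n_init=3, random_state=0)
model.fit(data)

# After fitting, every point in the full dataset is assigned to its nearest centroid.
assignments = model.predict(data)
```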
Silhouette Indices
To evaluate the clusters produced by the DBSCAN models, we used the silhouette index. The samples that behaved differently from the majority are discussed below. Sample HF-1887 is the only one in which the tumor cluster had weak silhouette indices, and we discarded it from further analysis as being potentially corrupted. Sample HF-2849 was found to be very noisy, but it was nevertheless successfully curated. In samples HF-2070, HF-1002, and HF-2548 almost all their spots were localized to the tumor cluster.
Random Forest Models
Random forest models46,47 are ensemble models consisting of a collection of independent decision trees. Each tree is trained using a random selection of data points, a method commonly referred to as “bagging.” The trees also utilize “feature bagging,” which allocates a random combination of features to each tree. These methods are key to making random forest models robust and less likely to overfit data than individual decision trees. The predictions produced by a random forest model are the average predictions of its decision trees. Random forest models also provide a feature importance estimate for the data, depending on how often the feature was interrogated in the various decision nodes of the random forest.
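The feature-importance idea can be illustrated with scikit-learn's `RandomForestClassifier` on synthetic data in which only one feature carries class signal. Two caveats apply: scikit-learn draws the random feature subset at each split rather than once per tree, and its `feature_importances_` attribute is impurity-based rather than a count of how often a feature is used, so this sketch may differ in detail from the models described here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic data: 400 "spectra" with 10 features; only feature 3 carries class signal.
features = rng.normal(0.0, 1.0, size=(400, 10))
classes = rng.integers(0, 2, size=400)
features[:, 3] += 3.0 * classes

# bootstrap=True resamples the data points for each tree (bagging);
# max_features="sqrt" draws a random feature subset at each split.
forest = RandomForestClassifier(
    n_estimators=200, bootstrap=True, max_features="sqrt", random_state=0
)
forest.fit(features, classes)

importances = forest.feature_importances_  # impurity-based, sums to 1
top_feature = int(np.argmax(importances))
```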
Support Vector Classifiers
Support vector classifiers48 are supervised machine-learning models that try to separate binary datasets using a simple linear function. The models are trained to find the hyperplane that separates the 2 classes with maximum margin, producing a wide decision boundary. The model can also be expanded using the kernel trick to make it suitable for nonlinear classification of complex datasets. The kernel trick is a way of implicitly transforming the input data using a kernel function, such as a polynomial kernel or the radial basis function.
Results
Overview of APOLLO: rAman-Based PathOLogy of maLignant gliOma
To develop the APOLLO platform, we obtained FFPE samples from 46 patients whose tumors had been profiled and classified into different methylation subtypes according to the Ceccarelli et al6 classification (Supplementary Table 1). For each sample, we had parallel sections to confirm the presence of tumor cells using H&E staining (Supplementary Figure 1). Between 2116 and 14 945 spectra were recorded for each of the 59 selected regions (~300 μm2) with 5 scans, each averaging a 2-s accumulation time, resulting in 300 506 Raman spectra (Figure 1A). These regions were then exported as pseudo-3D matrices X⋅Y⋅1738, with (X,Y) ranging from a minimum of (46,46) to a maximum of (245,61) (Figure 1B). Each spot of the X⋅Y tumor slice was characterized through a numerical vector fingerprint of size 1738, encoding the Raman shifts 50-3399 cm−1.
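The relationship between region size and spectrum count can be checked with a short NumPy sketch. The helper name `flatten_region` and the zero-filled array are hypothetical; only the dimensions come from the text.

```python
import numpy as np

N_SHIFTS = 1738  # spectral points per spot, as stated in the text


def flatten_region(cube):
    """Turn an (X, Y, 1738) region into an (X*Y, 1738) matrix, one row per spot."""
    x, y, n = cube.shape
    if n != N_SHIFTS:
        raise ValueError(f"expected {N_SHIFTS} spectral points, got {n}")
    return cube.reshape(x * y, n)


# Smallest region reported: 46 x 46 spots (placeholder intensities).
smallest = flatten_region(np.zeros((46, 46, N_SHIFTS), dtype=np.float32))

# Spot counts for the smallest and largest reported regions.
smallest_spots = 46 * 46
largest_spots = 245 * 61
```

The two products, 2116 and 14 945, match the minimum and maximum numbers of spectra per region quoted above.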
Figure 1.
Overview of APOLLO. A. Our study design involves: 1) staining adjacent FFPE slides with H&E to confirm the presence of tumor in the region of interest, 2) confirming the methylation subtype, and 3) analyzing the samples using spontaneous Raman spectroscopy. B. The machine learning training design. The dataset consists of the Raman spectra of each tumor spot, together with its methylation label (IDH1 mutant, or wild type, LGm1 or LGm2). Because the data are often imbalanced, i.e., one class is significantly more populated than the other, we split the data into 5 disjoint datasets in preparation for the 5-fold cross-validation training of the machine learning model. To avoid data leakage, we used a tumor-stratified approach: a sample contributes all its spots to a single subset. The data distribution in each subset roughly follows the distribution of the entire dataset. We run a 5-fold cross-validation, training 5 separate random forest models, using each of the 5 subsets as the validation set, one by one, with the other 4 as training sets. The predictions of the 5 separate random forest models are combined into the final random forest model. C. The model is further boosted by training a support vector classifier on its 20 most important Raman frequencies.
In the preprocessing steps, we removed the beginning of the spectrum and the biological silent region as previously described35,49–51 (Figure 2A). Then, the baseline of each Raman spectrum was corrected using the airPLS algorithm42 (Figure 2B). The 2 regions in each spectrum were then joined to form new Raman vectors of length 1167. The vectors were L2-normalized individually, as previously described52 (Figure 2C).
Figure 2.
Pre-processing steps, artifact removal, and validation of tumor versus nontumor spectra. A. Remove silent region. The top graph shows the regions of the spectra to be removed and the bottom graph shows the overlaid spectra after the regions were removed. B. Example of baseline correction. The top graph illustrates the application of a polynomial function to correct the baseline (black line), and the bottom graph shows the overlaid spectra after the baseline was corrected. C. Spectral normalization. The top and bottom graphs show the data before and after normalization. The data were normalized by dividing the spectral intensities by the individual L2-norm of the spectra. Viewed as vectors, each spectrum had a norm of 1 after this step. The spectra in sample HF-1887 deviated significantly from those of the other samples, so we removed it from the dataset. D-G. Artifact removal steps. D, Spatial representation of the clusters; the tumor cluster is orange, and the nontumor is blue. E, Principal component analysis of the tumor (orange) versus nontumor (blue) clusters to visualize the separation between the spectra. Overlaid Raman spectra separated by clustering for the tumor (F, orange) and the nontumor (G, blue) subsets. H-K, Validation steps. H, DBSCAN clustering is shown as a 2D spatial representation with the yellow part representing the tumor cluster and the purple part representing the nontumor cluster (H, left). It is compared with a 2D mapping of the same region obtained using a correlation function in the OMNICS™ software provided with the instrument (H, right). A visual inspection is done to ensure similarity between the images. I, J, The silhouette indices for (I) the tumor spectra and (J) the nontumor spectra. Each boxplot corresponds to a sample. The points belonging to each boxplot correspond to the Raman spectra for the respective samples. The boxplot in which the median of its tumor spectra is close to 0 corresponds to sample HF-1887, which was later removed. 
K, Colored lines represent the median spectrum over the entire dataset (blue line), over the tumor areas (red line), and over the nontumor areas (yellow line). The shape of the tumor median is almost identical to that of the global median, while the nontumor median diverges from the global median.
We used a clustering algorithm to automatically separate tumor from nontumor tissue in each sample. Then, we trained several classifiers for the IDH wild-type versus mutant characterization and for the G-CIMP-high versus G-CIMP-low characterization. Each classifier was trained using a stratified, 5-fold cross-validation design (Figure 1B, Supplementary Table 2). We found that a combination of random forests and support vector machines produced optimal models in terms of accuracy, precision, and recall. Finally, we extracted the most significant Raman frequencies that contributed to distinguishing the classes using a combination of feature ranking based on random forests and statistical testing using ANOVA and Chi2 (Figure 1C).
To determine how spots with similar Raman spectra are grouped into subregions, we trained machine-learning clustering models separately for each of the scanned areas. We used DBSCAN43 as the clustering model, a method that is robust to noise and automatically identifies the optimal number of clusters in each region. For each sample, we trained its own separate DBSCAN model to identify its specific tumor/nontumor separation. We found that, for each sample, exactly one cluster of collected spectra was similar in shape to the median spectrum of our dataset, and with intensity peaks on the frequencies expected for tumor spots (Figure 2K). We assigned this cluster to the tumor spots in the sample. The rest of the data were grouped by DBSCAN into one or up to 4 more clusters.
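One way to automate the choice of the tumor cluster is to compare each cluster's median spectrum with a reference spectrum such as the dataset median. The sketch below uses cosine similarity for that comparison, which is an assumption, since the similarity measure is not specified in the text; the function name `pick_tumor_cluster` and the synthetic spectra are hypothetical.

```python
import numpy as np


def pick_tumor_cluster(spectra, labels, reference):
    """Return the cluster label whose median spectrum is most similar to `reference`."""
    best_label, best_score = None, -np.inf
    for label in np.unique(labels):
        if label == -1:  # DBSCAN noise points are never the tumor cluster
            continue
        median = np.median(spectra[labels == label], axis=0)
        score = median @ reference / (
            np.linalg.norm(median) * np.linalg.norm(reference)
        )
        if score > best_score:
            best_label, best_score = int(label), score
    return best_label


rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 100)
reference = np.exp(-(((axis - 0.3) / 0.05) ** 2))   # invented "tumor-like" shape
other = np.exp(-(((axis - 0.8) / 0.05) ** 2))       # invented "nontumor" shape
spectra = np.vstack([
    reference + rng.normal(0.0, 0.01, size=(60, 100)),
    other + rng.normal(0.0, 0.01, size=(30, 100)),
    rng.normal(0.0, 1.0, size=(5, 100)),
])
labels = np.array([0] * 60 + [1] * 30 + [-1] * 5)
tumor_label = pick_tumor_cluster(spectra, labels, reference)
```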
We assessed the quality of the clusters by using several methods (Figure 2D-J), namely, principal component analysis (PCA)53- and t-SNE54-based separation between the clusters, spatial separation of the clusters in the scanned region followed by experimental validation, and silhouette scores of the clusters. The clustering results identified 257 904 spectra (86% of the data) as tumor spots and 42 602 spectra (14% of the data) as nontumor (Supplementary Table 1). With two exceptions (HF-1887 and HF-2849), the tumor cluster of each sample included most spectra, and it was consistent with the H&E staining results.
To evaluate the clusters produced by the DBSCAN models, we used the silhouette index55 (Figure 2I, J). We found that the tumor clusters had high silhouette indices, suggesting a strong confidence in the quality of these clusters. Nontumor and noise clusters had lower silhouette indices, due to the high variance between spectra. To confirm this result, we compared the cluster diameters and found that the diameters of the nontumor and noise clusters were indeed consistently larger than the diameters of tumor clusters.
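Both checks in this paragraph, per-cluster silhouette indices and cluster diameters, can be reproduced on synthetic data with scikit-learn. The 2-D points below are invented, and "diameter" is taken to mean the largest pairwise Euclidean distance within a cluster, which is an assumption about the definition used.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_samples

rng = np.random.default_rng(0)
# Synthetic 2-D stand-ins: a compact "tumor" cluster and a diffuse "nontumor" cluster.
tumor = rng.normal([0.0, 0.0], 0.05, size=(150, 2))
nontumor = rng.normal([5.0, 0.0], 1.0, size=(50, 2))
points = np.vstack([tumor, nontumor])
labels = np.array([0] * 150 + [1] * 50)

scores = silhouette_samples(points, labels)  # one value in [-1, 1] per point
median_tumor = np.median(scores[labels == 0])
median_nontumor = np.median(scores[labels == 1])


def diameter(cluster):
    """Largest pairwise Euclidean distance within a cluster."""
    return pairwise_distances(cluster).max()
```

In this toy setting the compact cluster has the higher median silhouette index and the smaller diameter, mirroring the pattern reported for tumor versus nontumor clusters.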
An example of the clustering results is shown in Supplementary Figure 2 for sample HF-2102. Based on the H&E staining results, we expected most of the Raman spectra to correspond to tumor cells (Supplementary Figure 1f). The spectra are shown in Supplementary Figure 2a overlaid on top of each other and the tumor spectra are shown separately in Supplementary Figure 2b. Within the nontumor spectra (Supplementary Figure 2c), DBSCAN further stratifies the spectra into three different nontumor clusters (Supplementary Figs. 2d-f). The spectra from Supplementary Figs. 2e and 2f are the result of local sample heating and represent another example of noise that can be produced during acquisition, and that can be successfully identified and removed with APOLLO. To further validate the results of DBSCAN, we visualized the clusters as a 2D spatial representation (Supplementary Figure 2g), where yellow pixels denote tumor spectra, and purple pixels denote nontumor spectra. We compared this spatial representation with a correlation plot obtained for the same area, as produced using the Omnics software (Thermo Fisher, Inc.; Supplementary Figure 2h), where the blue region represents tumor tissue, and the red and green regions represent nontumor tissues. Visual inspection of the two images shows that the mathematical clustering matches the Raman microscope image for this sample. As an additional validation step, we visualized the clusters through their PCA (Supplementary Figure 2i) and t-SNE (Supplementary Figure 2j) representations. Although the nontumor spectra can be further subclustered (cyan dots in the PCA and t-SNE plots), they are located within the main nontumor cluster.
APOLLO Discriminates Tumor Versus Nontumor Tissue in FFPE Slides
To determine which Raman frequencies are most important for discriminating between tumor and nontumor regions, we performed statistical feature-importance ranking by using ANOVA and Chi2. We also performed feature selection using a random forest model classifier trained on the tumor/nontumor separation of all 59 Raman samples. The receiver operating characteristic (ROC) curve analysis of model performance for the tumor versus nontumor classification showed outstanding performance (average area under the ROC curve [AUC] of 0.99; Supplementary Figure 2k, confusion matrix in Supplementary Table 6).
In each of the 3 statistical analyses, we selected the Raman frequencies whose normalized importance score was greater than a cutoff threshold of 0.2 (Figure 3A-3C). The features identified by the ANOVA (Figure 3A) and Chi2 (Figure 3B) methods are similar, whereas fewer features were singled out by the random forest method (Figure 3C). The frequency we identified at 2850 cm−1, which scored very high in importance (0.5-0.8), is very close to the previously published32,36,38,40 peak at 2845 cm−1 (CH2 bonds that are abundant in lipids). However, in this study, we did not identify the peak at 2930 cm−1 (CH3 bonds that predominate in proteins and DNA33,37,39,41). Instead, we identified novel Raman shifts that are important to discriminate tumor from nontumor, such as 2883, 1690, 1607, 1573, 1401, and 1335 cm−1 (Supplementary Table 3). The frequency with the highest importance in all 3 of the statistical methods in our study was 2883 cm−1, which can be attributed to the CH2 stretching vibrations of fatty acids, mostly cholesterol esters such as cholesteryl stearate.56 Other frequencies correspond to collagen (1401 cm−1), amino acids (1607 cm−1), proteins (1607 cm−1, 1690 cm−1), or DNA/RNA (1335 cm−1, 1573 cm−1)56 (Supplementary Table 3).
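The three-way ranking with a 0.2 cutoff can be sketched with scikit-learn's `f_classif` (ANOVA F-test), `chi2`, and random forest importances. How the scores were normalized in the study is not stated; dividing by the maximum score is an assumption made here, and the synthetic data contain a single informative feature so that the outcome is easy to read.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import chi2, f_classif

CUTOFF = 0.2  # threshold stated in the text

rng = np.random.default_rng(0)
# Synthetic non-negative "spectra": 20 features, only feature 5 differs between classes.
y = np.repeat([0, 1], 200)
X = rng.uniform(0.0, 1.0, size=(400, 20))
X[:, 5] = np.where(y == 1, 0.5, 0.1) + rng.uniform(0.0, 0.02, size=400)


def normalized(scores):
    """Scale scores to [0, 1] by the maximum (an assumed normalization)."""
    return scores / scores.max()


anova = normalized(f_classif(X, y)[0])
chi_sq = normalized(chi2(X, y)[0])  # chi2 requires non-negative features
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf = normalized(forest.feature_importances_)

# Keep features that pass the cutoff in all three rankings.
selected = np.flatnonzero((anova > CUTOFF) & (chi_sq > CUTOFF) & (rf > CUTOFF))
```

The Chi2 test is only defined for non-negative inputs, a condition that spectra satisfy once negative intensities are clipped to 0 and the vectors are L2-normalized.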
Figure 3.
Identification and validation of the most discriminative Raman frequencies for tumor versus nontumor. The analysis was done with (A) ANOVA, (B) Chi2, and (C) a random forest model, using a 5-fold cross-validation strategy with oversampling to compensate for the data imbalance as described in Figure 1B. D-R, Validation of discriminative frequencies using sample HF-1086 (LGm5-IDHwt). D, Optical image of the sample HF-1086 delineates the tumor tissue from the FFPE part. The red square indicates the areas that are shown in the following images. E, Autofluorescence of the tissue using excitation at 641 nm and an emission window of 650-750 nm. F, The same image using fluorescence lifetime imaging (FLIM). G, Representative H&E staining of the adjacent slice from the same sample (HF-1086) to show the presence of tumor cells in the tissue. The scale is 500 μm. H, SRS image for the 2930 cm−1 Raman frequency corresponding to CH3 bonds that predominate in proteins and DNA33,34 and that was found to distinguish tumors from normal cells. I, SRS image for 2845 cm−1, the Raman frequency that corresponds to CH2 bonds that are abundant in lipids and that was found to distinguish tumor from normal cells.33,34 J and K, Spectral processing of SRS 2845 and 2930 cm−1 to match the tumor-nontumor delineation, using stimulated Raman histology.33,34 In J, the SRS image of 2845 cm−1 was subtracted from the SRS image of 2930 cm−1 and colored in blue; then it was overlaid with the SRS image of 2845 cm−1, which was colored in green. In K, 3 images are overlaid: the SRS image of 2845 cm−1 subtracted from the SRS image of 2930 cm−1 (red) is overlaid with the SRS image of 2845 cm−1 (green), and with SRS 2930 cm−1 (blue) as previously described, to produce stimulated Raman histology.33,34 L, SRS image for 2883 cm−1, identified by APOLLO as the best at discriminating tumor from nontumor. M, SRS image for 1335 cm−1. N, The ratio between images corresponding to SRS 1335 cm−1 and SRS 2883 cm−1, identified by APOLLO. 
It clearly highlights the areas of tumor from FFPE, as seen in D. O, SRS image for 2850 cm−1, identified by APOLLO as having the second highest score in discriminating tumor from nontumor. P, SRS image for 1607 cm−1. R, The ratio between images corresponding to SRS 1607 cm−1 and SRS 2850 cm−1, overlaid with the tissue autofluorescence. The image highlights the areas of tumor from FFPE seen in D.
To validate the APOLLO-extracted frequencies, we selected the Raman frequencies common to all three statistical methods and acquired stimulated Raman spectroscopy (SRS) images at these frequencies (Figure 3H-R). For the same slide, as a control, we acquired the optical image that shows the location of the tissue and the presence of the FFPE region (Figure 3D). We also acquired autofluorescence using a far-red wavelength (641 nm; Figure 3E) and fluorescence lifetime imaging (FLIM) measurements (Figure 3F); the H&E staining of the adjacent slide shows the presence of the tumor (Figure 3G).
To compare with the previously established stimulated Raman histology (SRH) method,33,34 we recorded SRS images of the peaks at 2930 cm−1 (Figure 3H) (CH3 bonds from proteins and DNA), and SRS at 2845 cm−1 (Figure 3I), attributed to CH2 bonds in lipids. We then processed the images as previously published,33,34,37,38,41 first by subtracting the image corresponding to SRS at 2845 cm−1 from the SRS at 2930 cm−1 and coloring it blue, and then by superimposing that image on the SRS at 2845 cm−1, which was colored in green (Figure 3J). We also reconstructed the SRH image for this slide, by overlaying 3 images containing: 1) the difference between 2930 and 2845 cm−1 (red), 2) the 2845 cm−1 image (green), and 3) 2930 cm−1 (blue) as previously published33,34 (Figure 3K). Although the resulting image showed the tumor architecture from FFPE, the high-intensity signal was present in the FFPE section of the slide. To compare these two methods, we recorded SRS images using the most important frequencies identified by APOLLO. The frequencies we identified in this region, 2883 cm−1 (Figure 3L) and 2850 cm−1 (Figure 3O), were very similar to the previously identified frequencies in that the maximum signal was found in the FFPE region of the slide. However, the lower frequencies that APOLLO identified, such as 1335 cm−1 (DNA/RNA) (Figure 3M) and 1607 cm−1 (tyrosine and phenylalanine of proteins)56,57 (Figure 3P), selectively highlighted the tumor, with high intensity in the tumor and lower intensity in the FFPE region. As a result, we could produce images based on the ratio between the tumor and FFPE, such as 1335 cm−1 over 2850 cm−1, to enhance the intensity of the tumor over the FFPE background (Figure 3N and R). We show that these frequencies are also discriminative between tumor tissue and rips in the tissues, not just the FFPE medium (Supplementary Figure 3). 
Together these data provide a novel set of frequencies that improve the detection of brain tumors in a very fast and accurate manner.
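The ratio images described above amount to a pixel-wise division of two single-frequency SRS images. A minimal NumPy sketch follows; the 4×4 pixel values are invented for illustration, and the small `eps` term is an added safeguard against division by zero that is not mentioned in the text.

```python
import numpy as np


def ratio_image(numerator, denominator, eps=1e-6):
    """Pixel-wise ratio of two single-frequency images; eps avoids division by zero."""
    return numerator / (denominator + eps)


# Toy 4x4 images: left half "tumor", right half "paraffin" (values are invented).
srs_1335 = np.array([[8.0, 8.0, 2.0, 2.0]] * 4)  # DNA/RNA band: brighter in tumor
srs_2883 = np.array([[4.0, 4.0, 9.0, 9.0]] * 4)  # CH2 band: brighter in paraffin
ratio = ratio_image(srs_1335, srs_2883)
```

Dividing a band that is bright in tissue by one that is bright in paraffin increases the contrast between the two regions beyond what either image shows alone.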
APOLLO Distinguishes IDH1mut From IDH1WT Subtypes and Identifies High Cholesterol in IDH1mut Glioma
To determine whether APOLLO could discriminate the IDH1mut from IDH1WT subtypes, we trained several different classifier models, based on random forests, support vector machines, boosted decision trees, multilayer perceptrons (a simplified form of artificial neural network), and convolutional neural networks. The best model had an average area under the ROC (AUC) of 0.82 (ROC curve in Figure 4H, confusion matrix in Supplementary Table 6) and was obtained by combining a random forest model and a support vector classifier trained on its top 20 most important features.
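The two-stage idea of ranking features with a random forest and then training a support vector classifier on the 20 highest-ranked ones can be sketched as follows. How the two models' outputs were combined in the study is not described, so only the feature hand-off is shown; the RBF kernel, the synthetic data, and the simple index-based train/validation split are all assumptions (a real evaluation would split by sample, as in Figure 1B).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

TOP_K = 20  # number of top-ranked frequencies used in the text

rng = np.random.default_rng(0)
# Synthetic stand-in: 600 spectra, 100 features, the first 5 carry class signal.
y = rng.integers(0, 2, size=600)
X = rng.normal(size=(600, 100))
X[:, :5] += 2.0 * y[:, None]
X_train, y_train, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

# Stage 1: rank features by random forest importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
top = np.argsort(forest.feature_importances_)[::-1][:TOP_K]

# Stage 2: train the SVC on the top-ranked features only.
# RBF kernel chosen for illustration; the text does not state which kernel was used.
svc = SVC(kernel="rbf").fit(X_train[:, top], y_train)
val_accuracy = svc.score(X_val[:, top], y_val)
```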
Figure 4.
Identification and validation of the most discriminative Raman frequencies in our dataset that distinguish IDH1WT versus IDH1mut. The analysis was done with: A. ANOVA, B. Chi2, and C. a random forest model. D, Left: Optical image of the sample HF-1002-V2AT (LGm4 IDH1wt) distinguishes the tumor tissue from the FFPE part. Right: Optical image of the sample HF-2070-V1T (LGm2 IDH1mut) distinguishes the tumor tissue from the FFPE part. The red square indicates the area shown in the following images. Panels E-G are shown at the same scale and acquired under identical conditions. E, SRS of 2970 cm−1 for IDH1wt (left) and IDH1mut (right). F, SRS of 2930 cm−1 for IDH1wt (left) and IDH1mut (right). G, SRS of 2883 cm−1 for IDH1wt (left) and IDH1mut (right). H, ROC statistics to evaluate the performance of the classification. The analysis was done using a 5-fold cross-validation strategy with oversampling to compensate for the data imbalance. I, Cholesterol levels in IDHWT and IDHmut cell lines measured by mass spectrometry. Statistics were determined using GraphPad Prism 10 and a paired t-test. P values lower than 0.05 are denoted with *P < 0.05; **P < 0.005; and ***P < 0.0005. J, mRNA expression of acetyl-CoA acetyltransferase (ACAT1) for IDH1wt (cyan) and IDH1mut (orange) from TCGA. K and L, Kaplan-Meier survival plots using TCGA data for glioblastoma (IDH1wt) (K) and low-grade gliomas (IDH1mut) (L) to show the different effect of ACAT1 expression on survival of patients with those tumor types. Graphs for J, K, and L were produced using GlioVis.58
We then identified all the frequencies in common among the three statistical methods and proposed tentative assignments based on the existing literature in which the same laser (532 nm) was used (Figure 4A-C, Supplementary Table 4). We identified cholesterol esters such as cholesteryl stearate and palmitate to be amongst the peaks that scored high (peaks at 2883 cm−1, 1440 cm−1, and 532 cm−1) (Figure 4D-G).56,57 Although the peaks at 2883 cm−1 (Figure 4G) and 1440 cm−1 could also come from FFPE, the peak around 532 cm−1 has never been assigned to paraffin or formalin,59 which suggests that the abundance of cholesterol esters plays an important role in discriminating IDH1mut from IDH1WT, a finding with significant implications for the biology of gliomas. We, therefore, obtained additional validation for these findings. First, we measured cholesterol levels in patient-derived IDH1WT and IDH1mut cell lines, and indeed the levels of this metabolite were higher in the IDH1mut cells (Figure 4I). Second, we investigated the enzyme acetyl-CoA acetyltransferase (ACAT1)/sterol O-acyltransferase 1 (SOAT1), which combines cholesterol with fatty acids, such as palmitic or oleic acid, to make cholesterol esters, with the most abundant forms being cholesteryl palmitate (CPA) and cholesteryl stearate (CSA). mRNA data from TCGA showed higher ACAT1 mRNA expression in IDH1mut glioma than in IDHWT glioblastomas (Figure 4J). High expression of mRNA for ACAT1/SOAT1 in IDH1mut lower-grade gliomas was correlated with better survival in patient samples from TCGA, whereas high mRNA expression of this enzyme is not linked with survival in patients with IDHWT glioblastomas (Figure 4K, L). 
We hypothesize that this enzyme helps detoxify IDH1mut cells of free cholesterol, a role that has not been previously reported, or of free fatty acids, which are reported to be high in these tumors,19 by placing them in storage in the form of lipid droplets; this enzyme therefore merits future investigation.
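The consensus step described above, in which only frequencies ranked highly by all three scoring methods are retained, can be sketched as follows. This is a minimal illustration and not the APOLLO implementation: the helper names `top_k` and `consensus_frequencies`, the cutoff `k`, and every score value are hypothetical placeholders. Only the wavenumbers 2883, 1440, and 532 cm−1 are taken from the text.

```python
def top_k(scores, k):
    """Return the set of the k wavenumbers with the largest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def consensus_frequencies(score_tables, k):
    """Wavenumbers that appear in the top k of every scoring method."""
    tops = [top_k(scores, k) for scores in score_tables.values()]
    return sorted(set.intersection(*tops), reverse=True)


# Hypothetical importance scores (wavenumber in cm^-1 -> score).
# The values are invented for illustration and are not results from the study.
score_tables = {
    "anova": {2883: 9.1, 1440: 7.4, 532: 6.8, 2930: 2.0, 1003: 1.1},
    "chi2": {2883: 5.5, 1440: 4.9, 532: 4.1, 2930: 4.0, 1003: 0.3},
    "random_forest": {2883: 0.21, 1440: 0.15, 532: 0.12, 2930: 0.02, 1003: 0.09},
}

common = consensus_frequencies(score_tables, k=4)
print(common)  # [2883, 1440, 532]
```

Taking the intersection rather than the union keeps only peaks on which ANOVA, Chi2, and the random forest agree, which trades sensitivity for robustness against the bias of any single scoring method.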
APOLLO Discriminates G-CIMP-High From G-CIMP-Low Methylation Subtypes
Since the G-CIMP-high (LGm1) subtype is associated with a better prognosis than G-CIMP-low (LGm2), the ability to discriminate between finer subtypes of IDH1mut gliomas has important clinical implications. Therefore, we trained APOLLO to classify these 2 subtypes of IDH1mut gliomas. The Raman frequencies that scored the highest for this discrimination were 2887 cm−1 (CH2 of phospholipids), 2865 cm−1 (CH2 in lipids and proteins), 2836 cm−1 (lipid changes), 1414 cm−1 (C=C in a quinoid ring found in phenylalanine and tyrosine from proteins), 1300 cm−1 (DNA), and 1292 cm−1 (cytosine)56 (Figure 5A-C, Supplementary Table 5). These frequencies point to the predominant role of lipids in distinguishing the aggressive IDH1mut glioma from the indolent IDH1mut glioma, a novel finding that warrants further exploration. Using SRS imaging, we showed that the highest-scoring frequencies in the classification, 2887 cm−1 (Figure 5E, G) and 2836 cm−1 (Figure 5E, F), have different intensities in randomly selected samples that are either IDH1mut G-CIMP-high or IDH1mut G-CIMP-low. An optical image of each sample was taken to confirm that the SRS image was acquired from the tissue region with little FFPE present (Figure 5E).
Figure 5.
APOLLO discriminates between different IDH1mut subtypes and reveals intratumor heterogeneity. The graphs show the most discriminative Raman frequencies for classifying IDH1mut G-CIMP-high and IDH1mut G-CIMP-low. The analysis was done with: A. ANOVA, B. Chi2, and C. a random forest model using a 5-fold cross-validation strategy with oversampling to compensate for the data imbalance. D. Area under the precision-recall curves (AUPR) to evaluate the performance of the classification. E-G, Comparison of 2 samples, an IDH1mut G-CIMP-high (E, F, G, left panel) and an IDH1mut G-CIMP-low (E, F, G, right panel), for the highest-scoring frequencies: the associated optical image (E), SRS at 2836 cm−1 (F), and SRS at 2887 cm−1 (G).
We evaluated the model using precision-recall (PR) curves, a more suitable choice for imbalanced datasets than ROC curves, which tend to be over-optimistic in such cases. As expected, the model performed better at classifying G-CIMP-low (LGm2), which was 2 times more abundant in the dataset than G-CIMP-high (LGm1). The mean precision of the model over the 5 cross-validation folds was 0.89 for G-CIMP-low and 0.63 for G-CIMP-high, while the mean recall was 0.91 and 0.60, respectively. The areas under the precision-recall curves were 0.86 and 0.78, respectively, which are surprisingly high values (Figure 5D, confusion matrix in Supplementary Table 6).
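To make the per-class metrics above concrete, the sketch below computes precision and recall for each class from a 2-class confusion table. It is an illustration only: the function name `precision_recall` and the counts are hypothetical and are not the values in Supplementary Table 6.

```python
def precision_recall(confusion, label):
    """Precision and recall for one class from a {(true, predicted): count} table."""
    true_pos = confusion[(label, label)]
    predicted = sum(n for (_, pred), n in confusion.items() if pred == label)
    actual = sum(n for (true, _), n in confusion.items() if true == label)
    return true_pos / predicted, true_pos / actual


# Hypothetical validation counts with a 2:1 class imbalance (low:high).
# Keys are (true class, predicted class); the numbers are invented.
confusion = {
    ("low", "low"): 88, ("low", "high"): 12,
    ("high", "low"): 16, ("high", "high"): 34,
}

correct = sum(n for (true, pred), n in confusion.items() if true == pred)
accuracy = correct / sum(confusion.values())

for label in ("low", "high"):
    precision, recall = precision_recall(confusion, label)
    print(f"G-CIMP-{label}: precision={precision:.2f} recall={recall:.2f}")
print(f"overall accuracy={accuracy:.2f}")
```

In a table like this one, the overall accuracy is dominated by the majority class, whereas the minority-class precision and recall expose its weaker performance. This is the reason for reporting class-wise PR metrics on imbalanced data.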
APOLLO Unravels Intratumor Heterogeneity in FFPE Slides
Taking advantage of Raman spectroscopy’s single-cell resolution, we assessed the intratumor heterogeneity of the samples in the dataset. The analysis was done by clustering the Raman tumor spectra over the entire dataset (as opposed to sample by sample, as was done for the tumor/nontumor separation). We applied mini-batch K-means to cluster the data into 2 clusters and, in a separate analysis, into 6 clusters. Figure 6A and B show an example of the surface maps for sample HF-2106. The color map is given by the cluster labels of each Raman spectrum; black was used for the nontumor spots. The 2-cluster model separated the dataset into 2 relatively balanced clusters of 142 609 and 110 999 spectra, respectively. The 6-cluster model had a less even distribution of spectra, with 2 clusters containing most spectra and the rest containing smaller subsets. The cluster sizes were 100 416, 2714, 1171, 127 638, 11 000, and 10 669.
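For readers unfamiliar with the method, the following is a minimal pure-Python sketch of mini-batch K-means in the style of Sculley,45 in which each center moves toward the points assigned to it with a per-center learning rate of 1/count. It is not the code used in this study: the batch size, iteration count, seeds, and the 2-dimensional toy data standing in for spectra are all assumptions made for illustration.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )


def mini_batch_kmeans(points, k, batch_size=32, n_iter=200, seed=0):
    """Cluster points into k groups using small random batches."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch first, then update the centers.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers


# Toy data: two well-separated groups standing in for two spectral clusters.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
points = blob_a + blob_b

centers = mini_batch_kmeans(points, k=2)
labels = [nearest(p, centers) for p in points]
sizes = sorted(labels.count(j) for j in range(2))
print(sizes)
```

Because each iteration touches only a small batch, the cost per update does not grow with the dataset size, which is what makes the approach practical for hundreds of thousands of spectra.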
Figure 6.
APOLLO reveals intratumor heterogeneity. The surface of sample HF-2106 is separated by the binary clustering model (A) and the 6-cluster model (B). Both models show distinct shapes on the tumor surface. The most important features of the spectra, separated into 2 clusters, are shown in (C-E), where ANOVA, Chi2, and Random Forest feature importances are displayed in that order. The same is shown for the 6-cluster model in (F-H). The biggest Raman peaks in the spectra are present as the most important features across all methods. The UMAP dimensionality reduction is displayed in (I), showing 4 apparent regions into which the data can be organized. The binary separation using the binary cluster model is shown in (J) and the 6-cluster separation is in (K). In both clustering models, the 2 smaller areas in the UMAP plot consist of a single cluster.
We stored the cluster assignments for each spectrum and trained Random Forest models via the same 5-fold cross-validation to learn the cluster arrangement. Our models could separate the binary cluster labels with an accuracy of 0.97 on the validation folds. The 6 cluster labels were learned with an accuracy of 0.91 on the validation folds. The most relevant features of the spectra, separated into the binary- and 6-cluster setup, were identified using ANOVA, Chi2, and the Random Forest model trained using the cluster labels (Figure 6C-H).
To further evaluate the heterogeneity present in our dataset, we applied the UMAP algorithm, a method designed to perform dimensionality reduction on high-dimensional data while preserving its global structure. The UMAP reduction yielded a clear separation of the dataset into 4 separate clusters, displayed in Figure 6I. Each region observed in the UMAP plot contains a different number of spectra. The biggest region consists of 205 169 spectra. The upper area consists of 44 958 spectra, and the 2 smaller areas to the right in the plot consist of 2310 (upper) and 1171 (lower) spectra. We also applied the binary and 6-cluster mini-batch K-means models to the spectra and mapped the separation between the clusters using the UMAP components (Figure 6J, K). When tracing the methylation labels of the spectra, we noticed that all UMAP clusters are infiltration zones between the methylation types, that is, each cluster contains a mixture of methylation types. This indicates the subtlety of distinguishing between the methylation subtypes and that using a reduced data spectrum, rather than the full one offered by the spontaneous Raman spectra, obscures the differences between them.
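As a quick consistency check on the counts reported above, the three partitions of the tumor spectra (2-cluster model, 6-cluster model, and UMAP regions) should account for the same total, assuming each spectrum is counted exactly once per partition. The snippet below only sums the numbers given in the text; the variable names are ours.

```python
# Spectra counts copied from the text for each partition of the tumor spectra.
partitions = {
    "2-cluster K-means": [142_609, 110_999],
    "6-cluster K-means": [100_416, 2_714, 1_171, 127_638, 11_000, 10_669],
    "UMAP regions": [205_169, 44_958, 2_310, 1_171],
}

totals = {name: sum(sizes) for name, sizes in partitions.items()}
for name, total in totals.items():
    print(f"{name}: {total}")  # each partition sums to 253608
```

All three partitions sum to 253 608 spectra, so the reported cluster and region sizes are mutually consistent.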
Discussion
Raman spectroscopy offers an unprecedented opportunity to analyze tissue composition at a biochemical level and with subcellular resolution, but the spectra are highly convoluted and hard to interpret by eye. Herein, we introduced an automated machine-learning platform that rapidly identifies spectral differences associated with tumor subtypes. We demonstrated that our platform, APOLLO, can predict glioma subtypes at different levels of granularity. While previous studies have avoided the use of FFPE slides or performed deparaffinization before Raman measurements,31,60 we were able to reduce the noise associated with paraffinization using machine learning. Importantly, we were able to demonstrate that lipids, which are often washed out during deparaffinization, provide crucial information for different classifications.
To demonstrate the feasibility of our workflow, we first tested APOLLO’s ability to discriminate tumor versus nontumor regions, as this is the main application of current SRS methods using 2 well-established Raman frequencies: 2845 cm−1 (lipids) and 2930 cm−1 (proteins and DNA).33,34,37,38,41 We demonstrate that APOLLO is capable of discriminating tumor from nontumor regions and can identify one of the main peaks (2845 cm−1) used before for discriminating tumors from nontumor tissue in fresh tumors.33,34,37,41 Interestingly, APOLLO identified a novel Raman shift at 2883 cm−1, which had the highest score contributing to the discriminative power between tumor and nontumor. APOLLO revealed frequencies that play a crucial role in discriminating between tumor and nontumor areas, including frequencies originating from proteins (peak at 1607 cm−1 corresponding to tyrosine and phenylalanine ring vibrations) and DNA (peak at 1335 cm−1 corresponding to CH3, CH2 modes from purine bases).56 We think that these novel frequencies contribute to a highly accurate classification because their intensity is higher in the tumor than in the paraffin region. These findings suggest that APOLLO can distinguish unique Raman peaks that are specific to the tumor region and eliminate the need for chemical deparaffinization through AI-based removal of paraffin spectra. Previous work that used FFPE tissue had to chemically deparaffinize the tissue before spectroscopic acquisition.31,32 These findings imply that archived tissue can be analyzed as it is by APOLLO and further used for downstream applications such as RNAscope or 10x Genomics.
APOLLO found that lipids and fatty acids are more abundant in IDH1mut glioma, as we have shown before.19 Beyond this known result, APOLLO identified cholesterol esters as highly predominant in IDH1mut tumors, suggesting a novel biology for these tumors and a potential avenue for future targeting. Within IDH1mut, APOLLO parsed out the finer subtypes, demonstrating the ability to detect biologically and clinically relevant differences. APOLLO’s ability to distinguish between 2 subtypes of IDHmut gliomas (G-CIMP-high and G-CIMP-low) has the potential to expedite the clinical management of the latter given their relative aggressiveness and poorer prognosis.6,12
Other studies have classified IDH1mut versus IDH1WT gliomas with spontaneous Raman spectroscopy, but they were designed to diagnose gliomas quickly using fresh tissue for intraoperative application.32,36,61 As a result, the average number of spectra recorded was around 161 spectra per sample32 at random spots in the sample, whereas we acquired an average of 5100 spectra per sample over a continuous scanned region. The increase in signal-to-noise in our study, which comes from both the 532-nm laser30 and the number of scans, increases the confidence in peak selection by machine learning. In previous work,61 the authors preselected the peaks a priori by overlaying the spectra and deciding by eye which intensities differed, and then carried the intensities of those peaks into the machine learning step. In contrast, we carried all the Raman peaks into the next step, making APOLLO fully automated, without any manual labor, and in line with the standard of practice of many spectroscopic techniques. In addition, unlike other studies that utilize only spontaneous Raman spectroscopy32,62 or only SRS,32,36–38,40 we used both methods. Our findings imply that SRS instruments could be improved in the future by adding the novel frequencies that APOLLO identified. They also open the possibility of imaging tumor heterogeneity in the tissue based solely on the biochemical composition of the sample. In addition, combining spontaneous Raman with antibody-conjugated surface-enhanced substrates that target a specific protein such as EGFR63 could be used for the identification of certain subtypes of gliomas. APOLLO is a general workflow that deals with spontaneous Raman spectra and includes automated ways to remove the noise, correct the baseline, and find differences in the spectra for different types of samples.
As a result, this computational framework could be applied to any Raman spectra obtained from other sample types, such as extracellular vesicles, circulating tumor cells, CSF, and serum.
Supplementary material
Supplementary material is available online at Neuro-Oncology (https://academic.oup.com/neuro-oncology).
Acknowledgments
The authors would like to thank Alan and Ashley Dabbiere for their monetary donation which contributed to the acquisition of the Leica Stellaris 8 CRS instrument. We thank E. He, from Medical Arts of the National Institutes of Health for help with the figures.
Contributor Information
Adrian Lita, National Cancer Institute, National Institutes of Health, Neuro-Oncology Branch, Bethesda, Maryland, USA.
Joel Sjöberg, Department of Mathematics and Statistics, University of Turku, Turku, Finland.
David Păcioianu, Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania.
Nicoleta Siminea, Department of Bioinformatics, National Institute for Research and Development in Biological Sciences, Bucharest, Romania; Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania.
Orieta Celiku, National Cancer Institute, National Institutes of Health, Neuro-Oncology Branch, Bethesda, Maryland, USA.
Tyrone Dowdy, National Cancer Institute, National Institutes of Health, Neuro-Oncology Branch, Bethesda, Maryland, USA.
Andrei Păun, Department of Bioinformatics, National Institute for Research and Development in Biological Sciences, Bucharest, Romania; Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania; SCORE Lab, I3US, Universidad de Sevilla, Sevilla, Spain.
Mark R Gilbert, National Cancer Institute, National Institutes of Health, Neuro-Oncology Branch, Bethesda, Maryland, USA.
Houtan Noushmehr, Department of Neurosurgery, Henry Ford Health System, Detroit, Michigan, USA.
Ion Petre, Department of Bioinformatics, National Institute for Research and Development in Biological Sciences, Bucharest, Romania; Department of Mathematics and Statistics, University of Turku, Turku, Finland.
Mioara Larion, National Cancer Institute, National Institutes of Health, Neuro-Oncology Branch, Bethesda, Maryland, USA.
Funding
This research was supported by the National Institutes of Health Intramural Research Program through an NCI FLEX award to A.L., M.R.G., and M.L. entitled “Live cell metabolism via Raman imaging microscopy.” This work was partially supported by the Core Program within the Romanian National Research, Development, and Innovation Plan 2022-2027, carried out with the support of MRID, project no. 23020101(SIA-PRO), contract no 7N/2022, and project PNRR-I8 no. 842027778, contract no. 760096.
Conflict of interest statement
None declared.
Data availability
All raw data files analyzed to generate the presented results are available here: https://drive.google.com/drive/folders/1CNAwZa3H22FQtprSRXwHXsJ8FdazIUBd
Code availability
The code is available upon request for free to academic institutions, and under license from NIH for commercial entities. The code can be accessed here: https://github.com/ionpetre/APOLLO
Author contributions
A.L. and J.S. contributed equally. A.L. performed the data acquisition, contributed to the data analysis, wrote the paper, and edited it. J.S. constructed the mathematical models, wrote the paper, and edited it. N.S., D.P., O.C., A.P., M.R.G., H.N., T.D., I.P., and M.L. wrote the paper and edited it. M.L. and I.P. supervised the work. M.L. and I.P. contributed equally.
References
- 1. Louis DN, Ohgaki H, Wiestler OD, et al. The 2007 WHO classification of tumours of the central nervous system. Acta Neuropathol. 2007;114(2):97–109.
- 2. Yan H, Parsons DW, Jin G, et al. IDH1 and IDH2 mutations in gliomas. N Engl J Med. 2009;360(8):765–773.
- 3. Cohen AL, Holmen SL, Colman H. IDH1 and IDH2 mutations in gliomas. Curr Neurol Neurosci Rep. 2013;13(5):345.
- 4. Louis DN, Perry A, Reifenberger G, et al. The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathol. 2016;131(6):803–820.
- 5. Louis DN, Perry A, Wesseling P, et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro-Oncology. 2021;23(8):1231–1251.
- 6. Ceccarelli M, Barthel FP, Malta TM, et al.; TCGA Research Network. Molecular profiling reveals biologically discrete subsets and pathways of progression in diffuse glioma. Cell. 2016;164(3):550–563.
- 7. Capper D, Jones DTW, Sill M, et al. DNA methylation-based classification of central nervous system tumours. Nature. 2018;555(7697):469–474.
- 8. Wu Z, Abdullaev Z, Pratt D, et al. Impact of the methylation classifier and ancillary methods on CNS tumor diagnostics. Neuro Oncol. 2022;24(4):571–581.
- 9. Jaunmuktane Z, Capper D, Jones DTW, et al. Methylation array profiling of adult brain tumours: diagnostic outcomes in a large, single centre. Acta Neuropathol Commun. 2019;7(1):24.
- 10. Turcan S, Rohle D, Goenka A, et al. IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature. 2012;483(7390):479–483.
- 11. Noushmehr H, Weisenberger DJ, Diefes K, et al.; Cancer Genome Atlas Research Network. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010;17(5):510–522.
- 12. de Souza CF, Sabedot TS, Malta TM, et al. A Distinct DNA methylation shift in a subset of glioma CpG Island methylator phenotypes during tumor recurrence. Cell Rep. 2018;23(2):637–651.
- 13. Bi J, Ichu TA, Zanca C, et al. Oncogene amplification in growth factor signaling pathways renders cancers dependent on membrane lipid remodeling. Cell Metab. 2019;30(3):525–538.e8.
- 14. Bi J, Khan A, Tang J, et al. Targeting glioblastoma signaling and metabolism with a re-purposed brain-penetrant drug. Cell Rep. 2021;37(5):109957.
- 15. Molendijk J, Robinson H, Djuric Z, Hill MM. Lipid mechanisms in hallmarks of cancer. Mol Omics. 2020;16(1):6–18.
- 16. Wu X, Geng F, Cheng X, et al. Lipid droplets maintain energy homeostasis and glioblastoma growth via autophagic release of stored fatty acids. iScience. 2020;23(10):101569.
- 17. Guo D, Bell EH, Chakravarti A. Lipid metabolism emerges as a promising target for malignant glioma therapy. CNS Oncol. 2013;2(3):289–299.
- 18. Fack F, Tardito S, Hochart G, et al. Altered metabolic landscape in IDH-mutant gliomas affects phospholipid, energy, and oxidative stress pathways. EMBO Mol Med. 2017;9(12):1681–1695.
- 19. Lita A, Pliss A, Kuzmin A, et al. IDH1 mutations induce organelle defects via dysregulated phospholipids. Nat Commun. 2021;12(1):614.
- 20. Dowdy T, Zhang L, Celiku O, et al. Sphingolipid pathway as a source of vulnerability in IDH1mut glioma. Cancers (Basel). 2020;12(10):2910.
- 21. Zaibaq F, Dowdy T, Larion M. Targeting the sphingolipid rheostat in gliomas. Int J Mol Sci. 2022;23(16):9255.
- 22. Ruiz-Rodado V, Lita A, Larion M. Advances in measuring cancer cell metabolism with subcellular resolution. Nat Methods. 2022;19(9):1048–1063.
- 23. Pliss A, Kuzmin AN, Lita A, et al. A single-organelle optical omics platform for cell science and biomarker discovery. Anal Chem. 2021;93(23):8281–8290.
- 24. Lita A, Kuzmin AN, Pliss A, et al. Toward single-organelle lipidomics in live cells. Anal Chem. 2019;91(17):11380–11387.
- 25. Kuzmin AN, Pliss A, Rzhevskii A, Lita A, Larion M. BCAbox algorithm expands capabilities of Raman microscope for single organelles assessment. Biosensors (Basel). 2018;8(4):106.
- 26. Zhou Y, Liu CH, Sun Y, et al. Human brain cancer studied by resonance Raman spectroscopy. J Biomed Opt. 2012;17(11):116021.
- 27. Jermyn M, Mok K, Mercier J, et al. Intraoperative brain cancer detection with Raman spectroscopy in humans. Sci Transl Med. 2015;7(274):274ra19.
- 28. Jermyn M, Desroches J, Mercier J, et al. Raman spectroscopy detects distant invasive brain cancer cells centimeters beyond MRI capability in humans. Biomed Opt Express. 2016;7(12):5129–5137.
- 29. Amharref N, Beljebbar A, Dukic S, et al. Discriminating healthy from tumor and necrosis tissue in rat brain tissue samples by Raman spectral imaging. Biochim Biophys Acta. 2007;1768(10):2605–2615.
- 30. Zhou Y, Liu CH, Wu B, et al. Optical biopsy identification and grading of gliomas using label-free visible resonance Raman spectroscopy. J Biomed Opt. 2019;24(9):1–12.
- 31. Klamminger GG, Gérardy JJ, Jelke F, et al. Application of Raman spectroscopy for detection of histologically distinct areas in formalin-fixed paraffin-embedded glioblastoma. Neurooncol Adv. 2021;3(1):vdab077.
- 32. Livermore LJ, Isabelle M, Bell IM, et al. Rapid intraoperative molecular genetic classification of gliomas using Raman spectroscopy. Neurooncol Adv. 2019;1(1):vdz008.
- 33. Orringer DA, Pandian B, Niknafs YS, et al. Rapid intraoperative histology of unprocessed surgical specimens via fibre-laser-based stimulated Raman scattering microscopy. Nat Biomed Eng. 2017;1:0027.
- 34. Pekmezci M, Morshed RA, Chunduru P, et al. Detection of glioma infiltration at the tumor margin using quantitative stimulated Raman scattering histology. Sci Rep. 2021;11(1):12162.
- 35. Fung AA, Shi L. Mammalian cell and tissue imaging using Raman and coherent Raman microscopy. Wiley Interdiscip Rev Syst Biol Med. 2020;12(6):e1501.
- 36. Bury D, Morais CLM, Ashton KM, Dawson TP, Martin FL. Ex vivo Raman spectrochemical analysis using a handheld probe demonstrates high predictive capability of brain tumour status. Biosensors (Basel). 2019;9(2):49.
- 37. Hollon TC, Pandian B, Adapa AR, et al. Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks. Nat Med. 2020;26(1):52–58.
- 38. Ji M, Orringer DA, Freudiger CW, et al. Rapid, label-free detection of brain tumors with stimulated Raman scattering microscopy. Sci Transl Med. 2013;5(201):201ra119.
- 39. Hollon T, Jiang C, Chowdury A, et al. Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging. Nat Med. 2023;29(4):828–832.
- 40. Ji M, Lewis S, Camelo-Piragua S, et al. Detection of human brain tumor infiltration with quantitative stimulated Raman scattering microscopy. Sci Transl Med. 2015;7(309):309ra163.
- 41. Hollon TC, Lewis S, Pandian B, et al. Rapid intraoperative diagnosis of pediatric brain tumors using stimulated Raman histology. Cancer Res. 2018;78(1):278–289.
- 42. Zhang ZM, Chen S, Liang YZ. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst. 2010;135(5):1138–1146.
- 43. Ester M, Kriegel HP, Sander J, Xiaowei X. A density-based algorithm for discovering clusters in large spatial databases with noise. Published online December 31, 1996. https://www.osti.gov/biblio/421283. Accessed June 15, 2024.
- 44. Lloyd S. Least squares quantization in PCM. IEEE Trans Inf Theory. 1982;28(2):129–137.
- 45. Sculley D. Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. WWW’10. New York, NY: Association for Computing Machinery; 2010:1177–1178. doi: 10.1145/1772690.1772862
- 46. Breiman L. Random Forests. Mach Learn. 2001;45(1):5–32.
- 47. Ho TK. Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition. Vol 1; August 14–16, 1995; Montreal, Canada. IEEE; 1995:278–282. doi: 10.1109/ICDAR.1995.598994
- 48. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–297.
- 49. Yamakoshi H, Dodo K, Palonpon A, et al. Alkyne-tag Raman imaging for visualization of mobile small molecules in live cells. J Am Chem Soc. 2012;134(51):20681–20689.
- 50. Jamieson LE, Greaves J, McLellan JA, et al. Tracking intracellular uptake and localisation of alkyne tagged fatty acids using Raman spectroscopy. Spectrochim Acta A Mol Biomol Spectrosc. 2018;197:30–36.
- 51. van Manen HJ, Kraan YM, Roos D, Otto C. Single-cell Raman and fluorescence microscopy reveal the association of lipid bodies with phagosomes in leukocytes. Proc Natl Acad Sci USA. 2005;102(29):10159–10164.
- 52. Kanno N, Kato S, Ohkuma M, et al. Machine learning-assisted single-cell Raman fingerprinting for in situ and nondestructive classification of prokaryotes. iScience. 2021;24(9):102975.
- 53. Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24(7):498–520.
- 54. van der Maaten LJP, Hinton GE. Visualizing high-dimensional data using t-SNE. J Mach Learn Res. 2008;9(Nov):2579–2605.
- 55. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.
- 56. Czamara K, Majzner K, Pacia MZ, et al. Raman spectroscopy of lipids: a review. J Raman Spectrosc. 2015;46(1):4–20.
- 57. Krafft C, Neudert L, Simat T, Salzer R. Near infrared Raman spectra of human brain lipids. Spectrochim Acta A Mol Biomol Spectrosc. 2005;61(7):1529–1535.
- 58. Bowman RL, Wang Q, Carro A, Verhaak RGW, Squatrito M. GlioVis data portal for visualization and analysis of brain tumor expression datasets. Neuro Oncol. 2017;19(1):139–141.
- 59. Fullwood LM, Griffiths D, Ashton K, et al. Effect of substrate choice and tissue type on tissue preparation for spectral histopathology by Raman microspectroscopy. Analyst. 2014;139(2):446–454.
- 60. Krafft C, Codrich D, Pelizzo G, Sergo V. Raman and FTIR microscopic imaging of colon tissue: a comparative study. J Biophotonics. 2008;1(2):154–169.
- 61. Riva M, Sciortino T, Secoli R, et al. Glioma biopsies classification using Raman spectroscopy and machine learning models on fresh tissue samples. Cancers (Basel). 2021;13(5):1073.
- 62. Klein K, Klamminger GG, Mombaerts L, et al. Computational assessment of spectral heterogeneity within fresh glioblastoma tissue using Raman spectroscopy and machine learning algorithms. Molecules. 2024;29(5):979.
- 63. Rotter LK, Berisha N, Hsu HT, et al. Visualizing surface marker expression and intratumoral heterogeneity with SERRS-NPs imaging. Nanotheranostics. 2022;6(3):256–269.