Abstract
This study addresses the core issue facing a surgical team during breast cancer surgery: quantitative prediction of tumor likelihood including estimates of prediction error. We have previously reported that a molecular probe, Laser Raman spectroscopy (LRS), can distinguish healthy and tumor tissue. We now report that combining LRS with two machine learning algorithms, unsupervised k-means and stochastic nonlinear neural networks (NN), provides rapid, quantitative, probabilistic tumor assessment with real-time error analysis. NNs were first trained on Raman spectra using human expert histopathology diagnostics as gold standard (74 spectra, 5 patients). K-means predictions using spectral data when compared to histopathology produced clustering models with 93.2–94.6% accuracy, 89.8–91.8% sensitivity, and 100% specificity. NNs trained on k-means predictions generated probabilities of correctness for the autonomous classification. Finally, the autonomous system characterized an extended dataset (203 spectra, 8 patients). Our results show that an increase in DNA|RNA signal intensity in the fingerprint region (600–1800 cm−1) and global loss of high wavenumber signal (2800–3200 cm−1) are particularly sensitive LRS warning signs of tumor. The stochastic nature of NNs made it possible to rapidly generate multiple models of target tissue classification and calculate the inherent error in the probabilistic estimates for each target.
Subject terms: Biophysics, Cancer, Breast cancer
Introduction
Breast cancer is the most common cancer affecting women across the globe1. With the advancement of screening techniques, more breast cancers are caught at early stages2. Early stage breast cancer treatment includes breast conserving surgery and obtaining negative margins is paramount in preventing recurrence. Unfortunately, one in five patients will require re-excision surgery in order to achieve negative surgical margins3. As a result, the accurate determination of tumor margins in real time during surgical intervention has received significant attention4,5. Advanced technologies proposed to solve what has become known as “the margins problem” have included hyperspectral optical imaging6, magnetic resonance imaging7, and ultrasound8.
Amongst these technologies, Laser Raman Spectroscopy (LRS) is an emerging optical technique of considerable utility in surgical diagnostics. LRS probes the vibrational frequencies of molecular bonds to generate a unique biochemical signature for target tissues. The technique has been employed in diagnostic efforts for liver9,10, oral11–15, and prostate cancer16–19, as well as leukemia20,21, inflammation22,23, and apoptosis24,25. In breast cancer diagnostics, LRS can characterize microcalcifications26–30, distinguish immortalized, transformed, and invasive breast cancer cells31, and map the spatial distribution of carotenoids, mammaglobin, palmitic acid and sphingomyelin in ductal breast cancer32.
In brief, the excitement about LRS for breast cancer diagnosis is a response to the spectral specificity of the Raman scattering event, making it possible to quickly distinguish between lipid, protein, and DNA|RNA cell components33,34. LRS harnesses the vibrational frequencies of molecular bonds to provide a unique biochemical signature for target tissue. As a result, the technique can detect cellular changes characteristic of cancer tissue in vivo during the surgical procedure, facilitating real time margin evaluation. Here the morphological characteristics of breast cancer that should produce alterations in the LRS signal are quite clear: a massive increase in nuclear material and loss of cytoplasmic volume (predominantly lipids) compared to the healthy state35,36.
Currently, breast margin evaluation is most commonly performed by the pathologist following formalin fixation, paraffin embedding, thin sectioning, slide mounting, and staining of the tissue with haematoxylin and eosin (H&E) stains. This process takes at least a day, and often longer. Haematoxylin (purple) binds to acidic moieties such as DNA and RNA. Eosin (pink) binds to basic molecules. During slide preparation, cytoplasmic lipids are removed leaving behind structural proteins as spatial proxies. Most of these proteins are basic, including cytoplasmic filaments, intracellular membranes, and extracellular fibers. The classical H&E strategy not only requires binding pigments post-operatively, but cannot directly interrogate the lipid component of healthy or cancerous tissue. LRS can supply a similar cellular analysis in real time, as spectral signatures serve as proxies for the morphological alterations documented by histology.
LRS directly probes all the major cellular components without preparation: DNA, RNA, proteins, carbohydrates, and lipids. The spectra generated are so information-rich it has become quite common when employing LRS for surgical diagnostics to evaluate the entire Raman spectrum using principal component analysis (PCA) for initial feature extraction and data compression37–42. In a previous communication, we have presented data confirming that Raman spectral analysis using PCA in combination with linear discriminant analysis (LDA) can distinguish cancerous from healthy breast tissue using 16 bands gathered from the “fingerprint” region (here defined as 600–1800 cm−1) and 3 bands from the “high wavenumber” region (here defined as 2800–3000 cm−1)43.
However, LDA provided probabilistic estimates of tumor which were either quite high or quite low. For instance, running PCA and LDA on this current full dataset (n = 203) yielded high probabilities with an average of 1.0 and a standard deviation of 2.8e−04 and low probabilities with an average of 4.34e−03 and a standard deviation of 2.58e−02. Visual inspection of tumor and healthy spectra revealed that quite dissimilar spectra often received equally “certain” classification predictions from these algorithms. Hence in this paper we report on the use of stochastic neural networks (NNs) for transparent, statistically rigorous, probabilistic classification of healthy and tumor tissue. Additionally, we have noted that lipid components generate a significantly stronger Raman signal than both protein and DNA|RNA targets. Analyzing our own data and the published work of others, it appears that many of the spectral shifts reported as diagnostic for breast cancer may be due to the loss of lipid signals rather than detection of pathognomonic shifts in RNA, DNA, and protein composition. To evaluate this hypothesis, we have now investigated the ability of NNs to estimate the Bayesian probability that a Raman spectrum contains signatures characteristic of cancer using data from (1) the entire spectral bandwidth (600–3000 cm−1), (2) the fingerprint region (600–1800 cm−1), and (3) the high wavenumber region (2800–3000 cm−1).
In this communication, we first describe the information content of infrared Raman spectra characterizing healthy and cancer-containing breast tissue. We identify nine spectral regions useful in comparing DNA|RNA, protein, carbohydrate, and lipid cellular components of healthy and cancer cells. Six of these spectral regions originate in the fingerprint region (FP) (600–1800 cm−1) and three are collected in the high wavenumber (HW) region (2800–3000 cm−1). We first demonstrate the use of an unsupervised clustering algorithm, k-means, to initially identify clusters of healthy and cancerous targets. We compare the spectral data partitioning to the human expert classification using standard clinical histopathology. We then present the results of training three NNs to estimate the Bayesian probability that a target exhibits the LRS signatures expected from cancer tissue. One NN, FPHW, provides a broadband analysis of the spectral data using all nine bands. The two other networks focus on data from just the FP (6 bands) or HW (3 bands) regions. We demonstrate that the inherent stochastic nature of NNs makes it possible to rapidly generate multiple sets of target tissue classifications and then use those analyses to calculate the inherent error in the probabilistic estimates for each target. Our data indicate that loss of signal in HW bands may serve as an early warning marker of tissue destruction, while several FP bands may be particularly sensitive to subtle shifts in RNA, DNA, and protein composition. Finally, we illustrate the use of stochastic NNs to evaluate the unsupervised k-means classification of 203 spectra from 8 patients with ductal breast cancer independent of histopathological diagnostics.
Results
The pathologist on our team (DS) estimated the amount of cancer present in each Raman spectral target area using a semi-quantitative five-point scale ranging from 0 (no evidence of tumor) to 100% (all regions in the target area involved to some extent by tumor tissue). We collected a total of 203 samples out of which 154 were correlated with an H&E image and labelled with a quintile assessment of tumor involvement. See “Methods” for full description of slide preparation and imaging.
Figure 1A shows the H&E stains for examples of the five tissue categories. These are 1 mm2 regions surrounding the target site for the Raman data acquisition. Healthy tissue appears red where eosin dye has bound to cell structural proteins. Healthy regions also contain empty spaces where paraffin has replaced lipids during slide preparation. Tumor-rich tissue appears blue and purple where hematoxylin dye has bound to DNA, RNA, and peri-nuclear proteins. Figure 1B shows the mean LRS spectrum for each of the five regions along with 1-sigma error bars (n0 = 25; n25 = 12; n50 = 19; n75 = 49; n100 = 49).
Since Raman spectra are high dimensional, feature extraction is necessary to avoid overfitting during neural network training. This is achieved by selecting regions with the highest variance (most information) across both the entire dataset (n = 203, 8 patients) and the subset of spectra assigned to histopathology quintiles (n = 154, 5 patients). Figure 2A shows the variance for both the entire dataset and the histopathology subset with the nine bands chosen as features for the classification algorithms. The peak intensity for the bands and their probable biochemical origin appear in Table 1. Figure 2B provides a direct comparison of the mean ‘healthy’ (0%, n0 = 25) and ‘tumor’ (100%, n100 = 49) spectra along with 1-sigma error bars. For a comparison between this heuristic feature selection and PCA see Supplementary material.
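The feature-extraction step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names `channel_variance` and `extract_bands`, the toy wavenumber axis, and the toy intensities are all hypothetical. Only the nine band positions are taken from Table 1.

```python
from statistics import pvariance

# Band positions (cm^-1) from Table 1: six fingerprint (FP) and three high wavenumber (HW) bands.
FP_BANDS = [796, 828, 1048, 1300, 1437, 1654]
HW_BANDS = [2853, 2896, 2937]

def channel_variance(spectra):
    """Variance of intensity across spectra, one value per wavenumber channel."""
    return [pvariance(column) for column in zip(*spectra)]

def extract_bands(spectrum, wavenumbers, bands):
    """Intensity at the channel nearest to each requested band position."""
    features = []
    for band in bands:
        idx = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - band))
        features.append(spectrum[idx])
    return features

# Toy data (hypothetical): three 'spectra' on a coarse, made-up wavenumber axis.
wavenumbers = [700, 796, 828, 1048, 1300, 1437, 1654, 2853, 2896, 2937]
spectra = [
    [1.0, 0.2, 0.3, 0.1, 0.9, 1.0, 0.5, 2.0, 1.8, 1.6],
    [1.0, 0.8, 0.6, 0.1, 0.4, 0.5, 0.5, 0.6, 0.5, 0.4],
    [1.0, 0.5, 0.4, 0.1, 0.7, 0.8, 0.5, 1.2, 1.1, 1.0],
]

variance = channel_variance(spectra)                 # high values mark informative channels
fp_features = extract_bands(spectra[0], wavenumbers, FP_BANDS)   # 6 inputs for NN FP
hw_features = extract_bands(spectra[0], wavenumbers, HW_BANDS)   # 3 inputs for NN HW
fphw_features = fp_features + hw_features            # 9 inputs for NN FPHW
```

In this toy set the channel that never changes (700 cm−1) has zero variance and would be discarded, while the HW channels vary most between spectra.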
Table 1.
Peak position (cm−1) | Assignment |
---|---|
796 | DNA|RNA ring breathing modes and O–P–O backbone33,34,63–69 |
828 | Proline, hydroxyproline, tyrosine, O–P–O68,70,71; mono- and polysaccharides63,72,73 |
1048 | Symmetric stretch vibration of ν3PO43- in hydroxyapatite70 and glycogen67 |
1300 | Lipids63,69–72,74–78, collagen70,79, protein amide III68 |
1437 | Lipids14,63,71,77, fatty acids73–75, triglycerides80, collagen70,78,79,81, and phospholipids78,81 |
1654 | Amide I if collagen assignment, and/or C=C of lipids in normal tissue37,75 |
2853 | Symmetric stretch of lipids78,82 |
2896 | Asymmetric stretch of protein, lipids, glycogen71,82–84 |
2937 | C–H vibrations in lipids, proteins, glycogen, DNA|RNA71,84,85 |
Nine Raman bands known to provide information on multiple cellular components, including DNA|RNA, proteins, carbohydrates, and lipids, were identified (Table 1).
The bands are used as inputs to simple 3-layer neural networks with configuration 9:3.2; 6:3.2; or 3:3.2, i.e. 9, 6, or 3 input nodes, 3 hidden nodes, and 2 output nodes. Specifically, for NN FPHW (9:3.2), Raman fluxes from 6 bands in the fingerprint region (FP) were combined with three bands from the high wavenumber (HW) region to serve as 9 inputs for a network. NN FP (6:3.2) used the six bands from the fingerprint region, and NN HW (3:3.2) employed as inputs the three bands found in the high wavenumber region. Like the full spectrum data set used for FPHW, these inputs provided a mixture of information on all four of the primary cellular constituents: DNA|RNA, proteins, carbohydrates, and lipids.
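To make the three layer sizes concrete, the following sketch builds untrained networks with 9, 6, or 3 inputs, 3 hidden nodes, and 2 outputs, and runs one forward pass. The activation functions (tanh hidden layer, softmax output), the weight range, and the names `init_network` and `forward` are assumptions chosen for illustration; the text does not specify them. The random initial weights are the source of the run-to-run variability referred to as "stochastic".

```python
import math
import random

def init_network(n_inputs, n_hidden=3, n_outputs=2, seed=None):
    """Random weights for an n_inputs:3:2 network; each row ends with a bias term."""
    rng = random.Random(seed)
    def layer(n_in, n_out):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
    return layer(n_inputs, n_hidden), layer(n_hidden, n_outputs)

def forward(network, inputs):
    """Forward pass: tanh hidden layer, softmax output (assumed activations)."""
    hidden_w, output_w = network
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + row[-1])
              for row in hidden_w]
    logits = [sum(w * h for w, h in zip(row, hidden)) + row[-1] for row in output_w]
    peak = max(logits)                      # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]        # two class probabilities

# One untrained network per feature set; a different seed gives a different network.
networks = {"FPHW": init_network(9, seed=1),
            "FP": init_network(6, seed=1),
            "HW": init_network(3, seed=1)}
p_healthy, p_tumor = forward(networks["HW"], [2.0, 1.8, 1.6])  # hypothetical HW band fluxes
```

With a softmax output the two class probabilities sum to one by construction, whatever the weights are.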
First, k-means was run on the eigenvector dataset (0%, n = 25 and 100%, n = 49 tumor), in order to compare it to the human expert histopathology classification. Table 2 shows the number of spectra in each of the two clusters (healthy, tumor) generated by k-means.
Table 2.
Dataset | Cluster 1 (healthy) | Cluster 2 (tumor) |
---|---|---|
FPHW | 29 | 45 |
FP | 30 | 44 |
HW | 29 | 45 |
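The two-cluster partition can be illustrated with a compact version of Lloyd's k-means algorithm. This is a sketch on hypothetical three-band (HW-like) feature vectors, not the study data. To keep the example reproducible, the initial centers are passed in explicitly, which is a simplification: practical implementations usually initialize randomly. The function name `kmeans` is hypothetical.

```python
def kmeans(points, initial_centers, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:            # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                     # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical band intensities: strong HW signal versus collapsed HW signal.
points = [
    (2.0, 1.8, 1.6), (1.9, 1.7, 1.5), (2.1, 1.9, 1.7),   # strong HW signal
    (0.6, 0.5, 0.4), (0.5, 0.5, 0.3), (0.7, 0.6, 0.5),   # weak HW signal
]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
```

k-means returns anonymous cluster indices. Deciding which cluster is "healthy" and which is "tumor" requires an additional rule, for example comparing the cluster centers with reference spectra.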
Table 3 shows the prediction statistics for the k-means classification of the targets with histopathology classifications of 100% (n = 49) and 0% (n = 25) likelihood of tumor. These prediction statistics simply represent the k-means clustering of spectral data compared to the histopathology quintile classification of 100% or 0% tumor (N = 74). Accuracy was 94.6% for FPHW and HW, and 93.2% for FP. Sensitivity was 91.8% for both FPHW and HW, and 89.8% for FP. Specificity was 100% for all three.
Table 3.
K-means versus histopathology prediction statistics | |||
---|---|---|---|
Dataset | Accuracy (%) | Sensitivity (%) | Specificity (%) |
FPHW | 94.59 | 91.84 | 100 |
FP | 93.24 | 89.80 | 100 |
HW | 94.59 | 91.84 | 100 |
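The values in Table 3 follow from the cluster sizes in Table 2 once the confusion counts are fixed. The sketch below assumes, consistent with the reported 100% specificity, that no histopathology-healthy spectrum fell into the tumor cluster, so every member of the tumor cluster is a true positive. The function name `prediction_stats` is hypothetical.

```python
def prediction_stats(true_pos, false_neg, true_neg, false_pos):
    """Return (accuracy, sensitivity, specificity) in percent, rounded to 2 decimals."""
    total = true_pos + false_neg + true_neg + false_pos
    accuracy = 100 * (true_pos + true_neg) / total
    sensitivity = 100 * true_pos / (true_pos + false_neg)
    specificity = 100 * true_neg / (true_neg + false_pos)
    return round(accuracy, 2), round(sensitivity, 2), round(specificity, 2)

N_TUMOR, N_HEALTHY = 49, 25                              # histopathology 100% and 0% quintiles
TUMOR_CLUSTER_SIZE = {"FPHW": 45, "FP": 44, "HW": 45}    # Cluster 2 column of Table 2

stats = {}
for name, tumor_cluster in TUMOR_CLUSTER_SIZE.items():
    # Assumption: zero false positives, so tumor spectra outside the tumor
    # cluster are the only errors (false negatives).
    stats[name] = prediction_stats(
        true_pos=tumor_cluster,
        false_neg=N_TUMOR - tumor_cluster,
        true_neg=N_HEALTHY,
        false_pos=0,
    )
```

Under that assumption the FPHW and HW feature sets each miss 4 of 49 tumor spectra and the FP feature set misses 5, which reproduces the tabulated accuracy and sensitivity.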
Tumor probabilities produced by the NNs trained using k-means spectral classification were compared to the predictions by NNs trained using histopathology classification as the gold standard. The results appear in Fig. 3. Spectra were obtained from regions classified by histopathology as likely to be composed of either 100% tumor or 0% tumor. Networks were trained using as inputs (x) 6 bands from the FP region (FP); (△) 3 bands from the HW region (HW); or (●) the full set of 9 bands (FPHW). Targets obtained from tissue in tumor-rich regions according to H&E stain are denoted by red markers, while data obtained from areas apparently devoid of tumor are denoted by blue markers. This analysis is repeated with balanced classes (Healthy, Tumor, n = 25) in Supplementary materials (see Supplementary Fig. S1).
The k-means clustering and human expert histopathology classification of 100% and 0% tumor regions disagreed on five (5) spectra. Table 4 shows the NN (trained on k-means) generated probabilities for these 5 spectra for all three datasets (FP, HW, FPHW) as well as the average probability. Predictions of tumor likelihood > 20% appear in bold in the table.
Table 4.
Neural network prediction of tumor likelihood | |||||
---|---|---|---|---|---|
NN | s1 (%) | s2 (%) | s3 (%) | s4 (%) | s5 (%) |
FPHW | 18.5 | 4.6 | 0.1 | 23.8 | 61.2 |
FP | 4.2 | 8.6 | 10.1 | 11.1 | 13.4 |
HW | 11.3 | 13.1 | 20.4 | 8.5 | 82.0 |
Average NN | 11.4 | 8.7 | 10.2 | 14.5 | 52.2 |
Likelihoods of tumor > 20% appear in bold. Figure 4 shows these 5 spectra along with average spectra from regions classified as 0% and 100% tumor.
Figure 4 shows these five spectra (labeled s1 to s5) along with the average spectra from the healthy and tumor regions (quintile 0% and 100%).
The five spectra were acquired from H&E regions assessed to be 100% tumor by histopathology. Spectra s1–s4 were predicted by unsupervised k-means to be most similar to healthy spectra using any one of the 3 feature sets (FP, HW, and FPHW). The s5 spectrum was classified by k-means as cancerous when using HW and FPHW data inputs, but as healthy when using the FP feature set. Spectra s1 and s2 (rendered in gray in Fig. 4) are most similar to the average healthy spectrum. The similarity is evident on qualitative visual inspection, and the two spectra appear near the center of the k-means cluster for healthy spectra. Healthy features include the relatively weak RNA|DNA backbone and protein signals at 796 and 828 cm−1, respectively, together with the strong signal in HW. For spectra s1 and s2 all three NN configurations generate Bayesian estimates of tumor likelihood of < 20%. Spectra s3 and s4 (green) show characteristics of both healthy and tumor spectra. The RNA|DNA backbone and protein signals are increased, with the band at 796 cm−1 increasing relatively more than the 828 cm−1 band, but the strong signal in HW so characteristic of healthy tissue remains relatively intact. Spectra s3 and s4 exhibit an increase in the RNA|DNA backbone and protein signature and a decrease in the high wavenumber signal expected for tumor. For spectrum s3, HW, the NN trained on high wavenumber data, generates a 20.4% likelihood of tumor. The NN trained solely on fingerprint region data, FP, produces only a 10.1% likelihood of tumor. Similarly, for s4, FPHW produced a 23.8% likelihood of tumor. Spectrum s5 (gold) is qualitatively most similar to the average cancer spectrum. Here the HW signal has collapsed while the nucleotide and protein signals have increased further. In addition, the band intensity ratio for 796/828 has now shifted to favor the nucleotide moiety as it has in the average tumor spectrum (red) at the bottom of the graph.
NNs HW and FPHW generate significant tumor likelihood for spectrum s5 (HW ~ 82.0%, FPHW ~ 61.2%).
Unsupervised k-means clustering and NN probability generation predict the likelihood of tumor for a larger dataset (n = 203, 8 patients) using all three feature sets (FP, HW, FPHW). The number of spectra in each k-means cluster for each of the three datasets appears in Table 5. All Raman spectra visually agreed with cluster assignment. These cluster assignments are further assessed through NN Bayesian probability estimation of how likely it is that the k-means cluster assignment is correct. The use of stochastic NNs and three NN configurations makes it possible to generate two types of information (variance) to assess the reliability of the Bayesian probability of cancer. In this experiment, the k-means classification is employed as the gold standard for 10 train-test cycles with each of the three NN configurations. Two types of variance are measured. First, the variance in the output for each NN from the 10 train-test cycles is determined and designated the “intra-NN variance” or VRA. The lower the VRA, the more certain the NN is of its prediction of tumor likelihood (PTumor). The second variance, designated the inter-NN variance (VER), is generated using the 10-run average output probabilities (PTumor) of the 3 NN configurations. Increases in VER indicate disagreement amongst the three NN configurations in their estimate of PTumor. These two types of variances represent the reliability and reproducibility of the final algorithm (k-means and neural network) for the full dataset (n = 203). Since leave-one-out cross validation was employed in the neural network analysis, these two variances are also our best safeguard against overfitting. The hallmark of a Bayesian estimator is that the probabilities of the two events sum to one. We have summed the probabilities for the two output classes of the NNs and confirmed that they sum to one as another safeguard against overfitting.
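The two variances defined above can be computed for a single target as follows. The PTumor values are hypothetical, the helper names `intra_nn_variance` and `inter_nn_variance` are illustrative, and the use of the sample variance (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, variance

def intra_nn_variance(run_outputs):
    """VRA: variance of one NN configuration's P_tumor over repeated train-test cycles."""
    return variance(run_outputs)

def inter_nn_variance(outputs_by_config):
    """VER: variance of the run-averaged P_tumor across the NN configurations."""
    return variance([mean(runs) for runs in outputs_by_config.values()])

# Hypothetical P_tumor outputs for ONE target, 10 train-test cycles per configuration.
outputs = {
    "FP":   [0.40, 0.55, 0.35, 0.60, 0.45, 0.50, 0.38, 0.58, 0.42, 0.52],
    "HW":   [0.88, 0.90, 0.89, 0.91, 0.90, 0.89, 0.90, 0.91, 0.88, 0.90],
    "FPHW": [0.70, 0.75, 0.72, 0.78, 0.74, 0.71, 0.76, 0.73, 0.77, 0.74],
}

vra = {name: intra_nn_variance(runs) for name, runs in outputs.items()}
ver = inter_nn_variance(outputs)
p_tumor = {name: mean(runs) for name, runs in outputs.items()}
```

In this made-up case the HW network is the most self-consistent (smallest VRA), while the spread between the three averaged predictions yields a comparatively large VER. That combination, confident networks that disagree with each other, is the pattern the text singles out for closer inspection.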
Table 5.
Dataset | Cluster 1 (healthy) | Cluster 2 (tumor) |
---|---|---|
FP | 88 | 115 |
FPHW | 86 | 117 |
HW | 87 | 116 |
This remains consistent over multiple re-runs of the algorithm.
Figure 5A depicts VRA for each of the three NN configurations as a function of PTumor between the NN training boundaries (PTumor ~ 0 => healthy, PTumor ~ 1 => tumor).
PTumor estimates are divided into those with VRA < 1σ and VRA > 1σ. Vertical dotted lines indicate the point where a 5th order polynomial fit to the data intersects at the 1σ level. Significant increases in VRA (VRA > 1σ) appear as expected in the boundary zone between the two classes (i.e., ~ 0.2 < PTumor < ~ 0.8). Figure 5B depicts VRA as a function of VER. Significant inter-NN variance (VER > 1σ) occurred in 7 targets (shaded red), all from tumor or boundary regions. Table 6 compares the cancer probability predictions (PTumor) and the NN variances (VRA and VER) for the 7 high-VER samples.
Table 6.
Figure 6 | Histologya | Regionb | NN tumor prediction (FP) | NN tumor prediction (HW) | NN tumor prediction (FPHW) | Inter-NN variance (VER) | Intra-NN variance (VRA), FP | Intra-NN variance (VRA), HW | Intra-NN variance (VRA), FPHW |
---|---|---|---|---|---|---|---|---|---|
s1 | 100 | B | 0.615 | 0.871 | 0.785 | 0.017 | 0.097 | 0.006 | 0.019 |
s2 | 100 | T | 0.635 | 0.927 | 0.846 | 0.023 | 0.097 | 0.002 | 0.008 |
s3 | – | T | 0.606 | 0.932 | 0.800 | 0.027 | 0.13 | 0.001 | 0.031 |
s4 | 100 | B | 0.448 | 0.897 | 0.777 | 0.054 | 0.103 | 0.002 | 0.039 |
s5 | 100 | B | 0.366 | 0.891 | 0.674 | 0.07 | 0.062 | 0.002 | 0.06 |
s6 | 100 | B | 0.354 | 0.873 | 0.487 | 0.072 | 0.066 | 0.010 | 0.088 |
s7 | 75 | T | 0.141 | 0.763 | 0.387 | 0.098 | 0.003 | 0.047 | 0.07 |
Significant inter-NN variance (see Fig. 5B, VER > 1σ) occurred in 7 targets, all from tumor or boundary regions. The highest NN tumor probability for each target appears in bold. NNs using only FP inputs predict that 3 of these targets contain tumor, but with relatively low probabilities (PTumor = 0.62 ± 0.02, N = 3). FPHW, using the full nine inputs, predicts that 5 of the 7 targets contain tumor and produces higher probabilities (PTumor = 0.78 ± 0.06, N = 5). The NNs trained only on HW data predict that all 7 of these targets contain tumor. HW NNs generated the highest tumor likelihood probabilities (PTumor = 0.88 ± 0.06, N = 7). The variation in the output of each stochastic NN (train-test cycle repeated 10 times with random weight restarts) provides an intra-NN variance or VRA for each of the three NN configurations. For six of the seven targets, the HW NN exhibits the least variance in its predictions (VRA between 0.001 and 0.010). On a single target (s7), NN FP exhibits minimal variance across ten trials (VRA ~ 0.003) while generating a low tumor probability (P ~ 0.141). For target s7, HW predicts a tumor likelihood of P ~ 0.763 with VRA = 0.047.
aPathologist target labels in Fig. 6. Pathologist estimate of likelihood probe would strike tumor tissue in 1 mm2 region around laser.
bMacroscopic (1X) visual assessment of spectra collection sites as tumor (T), healthy (H), or boundary (B) regions.
The highest PTumor estimate and the lowest VRA for each target appear in bold. These are data points worth investigating since there is high disagreement between the nets (high VER), but each net is confident in its output (low VRA). NNs using only FP inputs predict that 3 of these targets contain tumor, but with relatively low probabilities (P = 0.62 ± 0.02, N = 3). FPHW, using the full nine inputs, predicts that 5 of the 7 targets contain tumor and produces higher probabilities (P = 0.78 ± 0.06, N = 5). The NNs trained only on HW data predict that all 7 of these targets contain tumor. HW NNs generated the highest tumor likelihood probabilities (PTumor = 0.88 ± 0.06). In six of the seven high VER samples, the HW NN exhibits the least variance in its predictions (VRA between 0.001 and 0.010). For the remaining target (s7), FP exhibits the lowest variance across ten trials (VRA ~ 0.003), but generated a tumor likelihood PTumor ~ 0.141. For that same target (s7) the HW NN predicted a tumor likelihood of P ~ 0.763 with VRA = 0.047.
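The tabulated values in Table 6 can be cross-checked with a short calculation. The sketch below assumes that the fourth numeric column of each row holds the inter-NN variance and that it is the sample variance (n − 1 denominator) of the three 10-run average predictions. Neither point is stated explicitly in the text, but under these assumptions every tabulated value is reproduced to within 0.001.

```python
from statistics import mean, stdev, variance

# Rows of Table 6: (P_tumor FP, P_tumor HW, P_tumor FPHW, tabulated inter-NN variance)
TABLE_6 = {
    "s1": (0.615, 0.871, 0.785, 0.017),
    "s2": (0.635, 0.927, 0.846, 0.023),
    "s3": (0.606, 0.932, 0.800, 0.027),
    "s4": (0.448, 0.897, 0.777, 0.054),
    "s5": (0.366, 0.891, 0.674, 0.070),
    "s6": (0.354, 0.873, 0.487, 0.072),
    "s7": (0.141, 0.763, 0.387, 0.098),
}

# Recompute the inter-NN variance from the three predictions (assumed estimator).
recomputed = {k: variance(row[:3]) for k, row in TABLE_6.items()}
max_abs_diff = max(abs(recomputed[k] - row[3]) for k, row in TABLE_6.items())

# Summary of the HW predictions over all seven targets.
hw = [row[1] for row in TABLE_6.values()]
hw_summary = (round(mean(hw), 2), round(stdev(hw), 2))
```

The same HW column gives a mean and standard deviation of 0.88 and 0.06 over the seven targets, matching the summary quoted for the HW networks.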
Spectra corresponding to these 7 targets appear in Fig. 6. The average spectra from tumor and healthy regions appear at bottom and top of figure, for comparison. All of the 7 spectra show a depletion in signal strength for the three HW bands. The HW NN cancer likelihood probabilities ranged from PTumor ~ 0.763 to 0.932. s7 produces the lowest HW tumor probability (PTumor = 0.763) and highest HW VRA (~ 0.047) due to the residual signal at 2853 cm−1. Spectra s5, s6, and s7 triggered a low-level warning in FP (PTumor ~ 0.6). These spectra show increases in signal strength at 796 and 828 cm−1, indicative of cellular increases in nucleic acids and proteins. Spectra s4, s5, and s7 contain narrow peaks of activity expected from surgical marking dyes at locations that do not compromise the nine NN diagnostic bands.
Seventeen targets (Fig. 5B, yellow shading) exhibit high VRA (> 1σ), but low VER (< 1σ). See Table S2 in Supplementary material for PTumor, VRA, and VER values. For these targets there is high agreement amongst the three nets, but none of them are very certain of their decision. All targets were from tumor or border regions. Compared to the 7 targets in Table 6 (the red zone of Fig. 5B) these NNs show more individual variation (higher VRA), less disagreement (lower VER), and lower PTumor estimates. No NN configuration predicted tumor for all 17 targets. All three NN configurations agreed in their prediction of tumor for 6 samples (highlighted in bold). All three NN configurations also agreed when they predicted no tumor likely for 7 samples. In the remaining 4 samples at least one NN configuration predicted tumor, but probabilities were quite low ranging from PTumor ~ 0.505 to 0.634.
Figure 7 shows a laser spot size of 85 μm diameter (the dotted red rings) superimposed on 100 × 100 μm regions from healthy, boundary, and tumor zones. The inset shows the location (black arrow) of the regions on the H&E slide. The scale bar in the lower left corner of each image denotes 20 μm. Paraffin-filled white spaces (L) are lipid-rich in vivo. In the first image, healthy cells are surrounded by protein-rich supporting stroma (P). Cell nuclei (N) rich in DNA, RNA, and peri-nuclear proteins occur infrequently in these healthy regions and then increasingly dominate in boundary and tumor zones. From the geometric constraints depicted in these images it appears that spectral mixtures of healthy and tumor signals can be expected to occur quite frequently when probing heterogeneous, rapidly progressing tumors. Hence, probabilistic algorithmic outputs were chosen to represent the likely detection of a cluster of mixed cells.
The final statistics and spectra identified by the autonomous, probabilistic artificial intelligence techniques employed in this study are presented in Table 7. Table 7 shows the autonomous classification of the 154 spectra for which we have H&E quintile assignments. Here the spectra from the original five (5) histopathology quintiles have been re-binned according to the maximum probability predicted by any one of the three NNs (FP, HW, FPHW) when classifying the full data set (n = 203, 8 patients). The Bayesian probability quintiles are equal range bins: 0.0 ≤ P < 0.2; 0.2 ≤ P < 0.4; 0.4 ≤ P < 0.6; 0.6 ≤ P < 0.8 and 0.8 ≤ P ≤ 1.0. 147 of 154 spectra (95.4%) were assigned to end member quintiles: 0.0 ≤ P < 0.2 (N = 53, tumor unlikely) and 0.8 ≤ P ≤ 1.0 (N = 94, tumor highly likely). No targets received a probability score in the 0.6 ≤ P < 0.8 range. 24 of 25 spectra (96%) classified by histopathology as healthy did not receive a cancer prediction > 0.2 from the autonomous classification. The one spectrum in the 0.2 ≤ P < 0.4 bin received a maximum score of P = 0.21. For the 49 spectra obtained from regions deemed 100% tumor-rich by histopathology, 45 (91.8%) were placed in the highest probability bin (0.8 ≤ P ≤ 1.0). Two spectra were considered highly likely to originate in healthy tissue (0.0 ≤ P < 0.2) and two others appeared in the 0.2 ≤ P < 0.4 bin. The average spectra for each of the other four probability quintiles appear in Fig. 8. Maximum signal separation occurs with bands for the DNA|RNA signal at 796 cm−1, the protein band at 828 cm−1, the lipid signal at 1437 cm−1 and the three HW bands.
Table 7.
Histopathology | Maximum NN probability of cancer quintiles | |||||
---|---|---|---|---|---|---|
Quintile | 0.0 ≤ P < 0.2 | 0.2 ≤ P < 0.4 | 0.4 ≤ P < 0.6 | 0.6 ≤ P < 0.8 | 0.8 ≤ P ≤ 1.0 | N |
100% | 2 | 2 | 0 | 0 | 45 | 49 |
75% | 12 | 2 | 0 | 0 | 35 | 49 |
50% | 6 | 0 | 1 | 0 | 12 | 19 |
25% | 9 | 0 | 1 | 0 | 2 | 12 |
0% | 24 | 1 | 0 | 0 | 0 | 25 |
N | 53 | 5 | 2 | 0 | 94 | 154 |
Using the maximum prediction of cancer likelihood generated by NNs trained on FPHW, FP, or HW inputs with autonomous k-means classification as the gold standard, the original histopathology quintiles are cross-matched to five (5) equally spaced cancer probability bins. 147 of 154 spectra (95.4%) were assigned to end member quintiles: 0.0 ≤ P < 0.2 (N = 53, tumor unlikely) and 0.8 ≤ P ≤ 1.0 (N = 94, tumor highly likely). No targets received a probability score in the 0.6 ≤ P < 0.8 range. The average spectra for each of the other four probability quintiles appear in Fig. 8. 24 of 25 spectra (96%) classified by histopathology as healthy did not receive a cancer prediction > 0.2 from the autonomous classification. The one spectrum in the 0.2 ≤ P < 0.4 bin received a maximum score of P = 0.21. For the 49 spectra obtained from regions deemed 100% tumor-rich by histopathology, 45 (91.8%) were placed in the highest probability bin (0.8 ≤ P ≤ 1.0). Two spectra were considered highly likely to originate in healthy tissue (0.0 ≤ P < 0.2) and two others appeared in the 0.2 ≤ P < 0.4 bin.
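The re-binning rule behind Table 7 (take the largest PTumor from the three networks, then place it in one of five equal-width bins with the last bin closed at 1.0) can be sketched as below. The function name `max_probability_bin` and the example probabilities are hypothetical; the cross-tabulated counts are copied from Table 7.

```python
from bisect import bisect_right

INNER_EDGES = [0.2, 0.4, 0.6, 0.8]      # bins: [0,0.2) [0.2,0.4) [0.4,0.6) [0.6,0.8) [0.8,1.0]

def max_probability_bin(p_tumor_by_config):
    """Index (0-4) of the bin holding the largest P_tumor across NN configurations."""
    p_max = max(p_tumor_by_config.values())
    return bisect_right(INNER_EDGES, p_max)

# Hypothetical target whose largest prediction is 0.21: it lands in the second bin.
example_bin = max_probability_bin({"FP": 0.05, "HW": 0.21, "FPHW": 0.10})

# Counts from Table 7: histopathology quintile -> spectra per probability bin.
TABLE_7 = {
    "100%": [2, 2, 0, 0, 45],
    "75%":  [12, 2, 0, 0, 35],
    "50%":  [6, 0, 1, 0, 12],
    "25%":  [9, 0, 1, 0, 2],
    "0%":   [24, 1, 0, 0, 0],
}
column_totals = [sum(col) for col in zip(*TABLE_7.values())]
row_totals = {quintile: sum(row) for quintile, row in TABLE_7.items()}
end_member_count = column_totals[0] + column_totals[-1]   # lowest plus highest bin
```

Taking the maximum over the three networks is a conservative choice: a target is placed in a high-probability bin as soon as any one network raises a warning.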
The data presented here indicate that a panel of unsupervised, autonomous, stochastic nonlinear neural networks trained on both broad and focused infrared laser Raman spectroscopy data can provide an operating team with Bayesian probability estimates that a tissue region contains cancerous cells. The networks all learn from spectral bands rich in information from four of the major cellular constituents: DNA|RNA, protein, carbohydrate, and lipid. However, for 7 out of the 203 spectra evaluated (~ 3.4%), the NNs trained on full spectrum, fingerprint region, and high wavenumber bands showed significant variability in their estimation of cancer likelihood. For conservative, real time management of surgical patients, the availability of multiple estimates of the likelihood of tumor provides the surgical team with a broader safety net. The NNs trained on high wavenumber data appear particularly sensitive to loss of signals from the destruction of C–H bonds, a potential early warning sign of the chaotic disruption of cell structure. However, a “loss of signal” signature can leave the surgeon uncertain about whether this is a global defect in signal acquisition, or a true indicator of tissue damage. The appearance of a strong increase in signal intensity for bands attributable to DNA|RNA and/or to protein in the fingerprint region of the spectra can provide critical information in determining the origin of the high wavenumber signal collapse. For several spectra the alteration in nucleotide and protein signals was the only warning signal that would have been available to the clinician.
Discussion
There are two succinct takeaways from the data reported here. (1) There is a strong correlation between NNs trained by autonomous k-means spectral classification and by histopathology “gold standards”. (2) The detection of what may be subtle early signs of tumor by the HW NNs is noteworthy, but not unexpected. LRS high wavenumber measurements have previously been shown to be reliable markers of breast cancer cell evolution in immortalized, transformed, and invasive cells31, and can accurately estimate the loss of lipid content during in vivo evaluation of breast cancer progression44. Evaluation of our spectra revealed the decrease in HW flux warning of alterations in C–H bonds in DNA|RNA, protein, carbohydrates, and lipids. C–H bonds are one of the most abundant Raman targets in human tissue and one of the bonds most easily altered during cellular growth, exposure to high energy radiation, or changes in pH, acidity, or hydration. The subtle changes in C–H signatures detected by HW can apparently be masked in the FPHW analysis by strong lipid signals in the fingerprint region.
It should be noted that all three NNs show a significant range of probabilities for spectra obtained from both healthy and tumor-rich regions. It should not be surprising that mixed signatures occur when probing aggressive tumors in vivo. The laser probes commonly used for LRS acquire their data from a tissue volume approximately 100 μm in diameter. Breast cancer cells observed during histological evaluation in this study range in diameter from 10 to 20 μm and are often interspersed with relatively healthy collagenous and lipid tissues, particularly in boundary conditions.
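A rough order-of-magnitude estimate shows why mixed signatures are plausible. The sketch below takes the ratio of circular cross-sectional areas as an upper bound on the number of cells inside the probed spot. It is an illustrative simplification that ignores packing gaps, stroma, lipid spaces, and sampling depth. The function name `max_cells_in_spot` is hypothetical, and the diameters are the approximate figures quoted in the paragraph above.

```python
def max_cells_in_spot(spot_diameter_um, cell_diameter_um):
    """Upper bound on cells in the spot cross-section: ratio of circular areas."""
    return (spot_diameter_um / cell_diameter_um) ** 2

small_cells = max_cells_in_spot(100, 10)   # 10 um cells in a 100 um spot
large_cells = max_cells_in_spot(100, 20)   # 20 um cells in a 100 um spot
```

Under these assumptions a single acquisition integrates signal from roughly 25 to 100 cell-sized regions, so a spectrum from a boundary zone will often average over both healthy and tumor tissue.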
In this communication, we report that autonomous machine algorithms can mine the information content of infrared Raman spectra to distinguish healthy from cancer-containing breast tissue. The evaluation can be accomplished on a time scale of minutes instead of the days to weeks required at present for histopathological evaluation. We identify nine spectral regions useful in comparing DNA|RNA, protein, carbohydrate, and lipid cellular components of healthy and cancer cells. Six of these bands are in the Raman spectrum fingerprint region (FP) (600–1800 cm−1) and three are in the high wavenumber (HW) region (2800–3000 cm−1). Three stochastic nonlinear NNs were trained on either FP, HW, or FPHW bands to estimate the Bayesian probability that a spectrum exhibits changes expected in cancer tissue. To demonstrate the possibility of replacing a two-week wait for histopathology with real-time tumor detection using only spectra, training was accomplished with two gold standards. In the first experiment, the three NNs used histopathology diagnostics as their gold standard for training. In the second experiment, the autonomous classification of the spectra by k-means produced the gold standard. In both experiments one of the three NNs provided a broadband analysis of the spectral data using all nine bands. The two other networks focused on data from only the FP (6 bands) or HW (3 bands) regions. Our data indicate that loss of signal in HW bands can serve as an early warning marker of tissue destruction, while the relative intensity of two strong FP bands may be particularly sensitive to the shifts in RNA, DNA, and protein composition characteristic of proliferating breast cancer. Finally, we demonstrate that even though any of the three NNs can distinguish between healthy and tumor tissue ~ 96% of the time, for 7 of 203 spectra only the availability of data from all three would have ensured detection of tumor activity.
Our data indicate that without the multi-network approach described here, critical early diagnostic signs may be hidden in analyses that rely only on full spectrum data or on data from only FP or HW regions. The ability to (1) poll the disparate diagnostic strengths of multiple algorithms and (2) assess the prediction certainty of stochastic NNs adds a needed level of transparency to the interaction between machine algorithm and practicing surgeon. It also provides a way forward for laboratory experiments designed to identify the fundamental biomolecular shifts occurring in tumor evolution.
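The polling idea can be illustrated with a minimal sketch. It assumes each of the three networks (FPHW, FP, HW) has already produced a tumor probability for one tissue site. The function name `poll_networks` and both cutoff values are hypothetical choices made for illustration; they are not taken from this study.

```python
from statistics import mean

def poll_networks(p_fphw, p_fp, p_hw, tumor_cutoff=0.5, spread_cutoff=0.3):
    """Combine tumor probabilities from the FPHW, FP, and HW networks for one site.

    Both cutoffs are illustrative assumptions, not values reported in the paper.
    """
    probs = {"FPHW": p_fphw, "FP": p_fp, "HW": p_hw}
    # Each network casts a tumor vote when its probability reaches the cutoff.
    votes = {name: p >= tumor_cutoff for name, p in probs.items()}
    # A large gap between the highest and lowest estimate signals disagreement.
    spread = max(probs.values()) - min(probs.values())
    return {
        "mean_probability": mean(probs.values()),
        "any_tumor_vote": any(votes.values()),
        "unanimous": len(set(votes.values())) == 1,
        "needs_review": spread > spread_cutoff,
        "votes": votes,
    }
```

Reporting `any_tumor_vote` alongside `unanimous` keeps the conservative reading visible: a site is flagged when even one network sees tumor, which mirrors the cases where only one band region carried the warning.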
There are limitations when testing this technique. An H&E stained tissue sample is a micrometer scale 2-D sampling (a 4 μm thick tissue slice) of a centimeter scale 3-D resected tissue. In the protocol for this study pathology laboratory personnel were not asked to deviate from standard clinical evaluation. 2-D orientation of the sample is maintained, but z-axis depth for extraction of the slide material is not constrained. The infrared LRS probe samples a site approximately 0.5 to 1.0 mm deep into the tissue, a depth that can be slightly above or below the H&E stained tissue (most likely above). Solving the disconnect between the 3-D and 2-D technologies will require modification of clinical protocols to significantly increase H&E sampling along the z-axis. This technical issue is beyond the scope of the current study and does not affect the autonomous machine learning diagnostic technique central to these experiments. This study utilizes tissue biomarkers from regions that are visible to the surgeon and used in the post-operative pathology assessment; it therefore draws on information from tissue that is used in the current accepted standard of care. Machine learning algorithms employed over the last decade in LRS breast cancer investigations have often not provided two critical pieces of information important to the practicing surgeon: a probability that a classification is correct and the expected error in that probability. Stochastic backpropagation artificial neural networks inherently provide both pieces of information, not simply for clusters of data but specifically for each tissue site examined by LRS.
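As a hedged illustration of how a probability and its expected error might be reported for a single site, the sketch below summarizes the tumor probabilities returned by several independently trained stochastic networks. The helper `summarize_runs` is hypothetical, and the choice of sample standard deviation and standard error as the error measures is an assumption; the study's exact error calculation is not specified in this section.

```python
from statistics import mean, stdev

def summarize_runs(tumor_probs):
    """Summarize tumor probabilities for one site across repeated training runs.

    Returns (mean, sample standard deviation, standard error of the mean).
    """
    if len(tumor_probs) < 2:
        raise ValueError("need at least two runs to estimate spread")
    m = mean(tumor_probs)
    s = stdev(tumor_probs)             # spread across stochastic runs
    sem = s / len(tumor_probs) ** 0.5  # uncertainty of the mean estimate
    return m, s, sem
```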
Methods
Raman instrumentation
The Raman instrumentation has been described in a previous communication43. All experiments employed B&W Tek’s 785 nm system, the i-Raman Plus. The i-Raman Plus uses a high quantum efficiency 2048-pixel CCD array detector with a spectral resolution of 4.5 cm−1 and a spectral coverage range from 147 to 3350 cm−1. The detector is cooled to −2 °C and has a typical dynamic range of 50,000:1. The effective pixel size is 14 μm × 9 μm. The integration (exposure) time is 30 s (single accumulation) for each sample and the laser power at the sample is 100 mW.
The system is highly portable. The spectrometer housing connects via fiber optic cables to the BAC102 Raman Trigger Probe. The probe has a spot size of 50–85 μm. The BAC150B probe holder can be used to stabilize the probe for benchtop data collection, or the probe can be handheld during use in the surgical field. Alternatively, the probe can be integrated with B&W Tek’s BAC151B video sampling system, or the BAC104 adapter can be used to integrate the probe with a standard laboratory Olympus microscope. All data discussed in this article were taken using the standard BAC150B probe.
Tissue preparation and histology
Tissue samples were collected following surgical resection under Institutional Review Board (IRB) protocol (#16317) at City of Hope in Duarte, California (VJ, LL, YF). All experimental protocols used in this study were approved by City of Hope IRB. All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all patients.
Following resection, samples were immediately frozen and stored at −80 °C. For Raman spectral analysis, samples were thawed ~ 5–10 min before data collection. No histopathology or spectral alterations indicative of freeze artifacts were observed. Once spectral data were obtained, the sample was formalin fixed and paraffin embedded, and standard H&E-stained slides were prepared. These slides were digitally scanned at 20X resolution using a Ventana iScan HT whole slide scanner (Roche Holding AG, Basel, Switzerland) and the resulting whole slide images were viewed using QuPath open source imaging software (Queen’s University Belfast, Belfast, Northern Ireland, UK). Areas from which spectra were obtained were correlated with the H&E findings.
There was one tissue sample per patient and spectra were systematically taken from all regions of the tissue sample. Histopathology was obtained for all tissue samples. A subset of 154 regions (1 mm2) of H&E tissue stains, each centered on a laser target site, was interrogated by an expert pathologist using QuPath image processing software to generate a quintile assessment of cancer involvement (0%, 25%, 50%, 75% and 100% tumor).
Spectral preprocessing
The raw spectral data files were extracted from the B&W Tek software. Standard preprocessing methods for Raman spectral data were accomplished in MATLAB 2017b45,46 as follows:
The region between 147 and 600 cm−1 was removed since it exhibits fluorescence from the fiber optic feedback loop and bleedthrough from the laser.
Fluorescence correction was implemented by calculating a baseline for each spectrum using an asymmetrically reweighted penalized least squares smoothing algorithm, arPLS, developed by Baek and coworkers47. This method uses an iterative process to estimate the noise and update the weights accordingly in order to calculate a baseline. The calculated baseline was subtracted from the intensity values of the raw spectrum to remove fluorescence. The arPLS parameters are lambda = 10^5 and ratio = 10^−3.
Spectra were smoothed by the Savitzky-Golay algorithm with a sliding window encompassing 7 bands48.
Each spectrum was centered around its mean by subtracting the mean of the spectrum from every data point in the spectrum.
Finally, each spectrum was divided by its Euclidean norm in order to normalize each spectrum to a vector of length 1.
For publication plotting purposes, we have removed the dead zone between 1800 and 2800 cm−1 for all figures.
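The preprocessing chain can be sketched in a few lines. This is an illustrative Python outline, not the MATLAB code used in the study. Several details are assumptions: the 7-band window is read as 7 consecutive spectral points, the Savitzky-Golay polynomial order is taken to be quadratic (the text gives only the window width), edge points are left unsmoothed, and the arPLS baseline is represented by an optional user-supplied function `baseline_fn` rather than implemented here. All function names are hypothetical.

```python
import math

# 7-point Savitzky-Golay coefficients for a quadratic/cubic fit (assumed order).
SG7 = [-2, 3, 6, 7, 6, 3, -2]
SG7_NORM = 21

def crop(wavenumbers, intensities, low=600.0):
    """Discard points below `low` cm^-1 (the 147 to 600 cm^-1 region)."""
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if w >= low]
    return [w for w, _ in kept], [y for _, y in kept]

def savgol7(y):
    """Smooth with a 7-point window; the 3 points at each edge stay unsmoothed."""
    out = list(y)
    for i in range(3, len(y) - 3):
        out[i] = sum(c * y[i + k - 3] for k, c in enumerate(SG7)) / SG7_NORM
    return out

def mean_center(y):
    """Subtract the spectrum mean from every point."""
    mu = sum(y) / len(y)
    return [v - mu for v in y]

def unit_norm(y):
    """Scale the spectrum to Euclidean length 1 (unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y] if norm > 0 else list(y)

def preprocess(wavenumbers, intensities, baseline_fn=None):
    """Crop, optionally subtract a baseline, smooth, center, and normalize."""
    w, y = crop(wavenumbers, intensities)
    if baseline_fn is not None:
        # Placeholder hook for a baseline estimator such as arPLS.
        base = baseline_fn(y)
        y = [v - b for v, b in zip(y, base)]
    return w, unit_norm(mean_center(savgol7(y)))
```

The order of operations follows the list above: baseline removal precedes smoothing, and normalization comes last so that every spectrum enters the classifiers as a unit-length, zero-mean vector.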
Neural networks to generate Bayesian estimate of cancer
Bayesian probability theory presents a formalized methodology for establishing the likelihood that any particular observation can be correctly included in a specific class of event49.
Bayesian theory is a fundamental tool used to quantify how human experts generate reproducible classification decisions. Machine learning algorithms known as neural networks (NNs) have become a widely used tool for turning Bayesian theory into practical application. NNs were originally implemented as simple optimization algorithms modeled on signal processing characteristics of the human brain50–54. NNs compress data and extract discriminatory features from a data set in a manner similar to PCA. NNs can also model non-linear interactions and distinguish classes that are not linearly separable, feats beyond the capabilities of PCA. However, the most powerful feature of stochastic nonlinear NNs is their ability not only to provide target classification, but also to generate a Bayesian probability estimate of the correctness of their decision for each individual target. NNs constructed as stochastic back propagation algorithms were predicted theoretically55,56, and then shown experimentally57, to be robust Bayesian estimators.
Neural network architecture and computational methods used in this study have been described previously58–61. All networks in the experiments reported here are constructed and run in MATLAB 2017b. It is important to realize that the NN will provide a probability of class inclusion for all proposed classes. Additionally, if the human expert classification is in error, NNs prove remarkably adept at disagreeing, i.e. they will provide a data-driven, best classification estimate even if the ‘gold standard’ diagnosis is incorrect. For a formal review of Bayesian probability theory and neural network estimation of Bayesian probabilities, see Supplementary material.
Comparing gold standards: training NNs using histopathology versus autonomous k-means spectral classification
During training of a stochastic backpropagation algorithm an output “gold standard” must be provided. We trained NNs in two experiments, using first the histopathology classification and then the autonomous k-means classification of Raman spectra as the gold standard. In the first case, NN output node values are compared to the expert opinion, which in this study is the pathologist's evaluation of standard H&E stained post-operative slides. The pathologist (DS) examined 154 images of tissue in a 1 mm2 area centered on the target site for the Raman probe. The fraction of tissue deemed to contain at least some cancer was recorded as 0, 25, 50, 75, or 100%. A total of 74 targets were determined to be either completely devoid of tumor (T = 0%, n = 25) or entirely involved by tumor (T = 100%, n = 49).
For NN training and evaluation, the 74 targets in this data subset were assigned output vectors of [1,0] for tumor sites and [0,1] for apparently healthy regions, first according to the histopathological diagnostic and second according to the autonomous classification by k-means.
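The selection and encoding of training targets can be shown with a short sketch. It assumes the pathologist's scores are available as a mapping from a site identifier to a tumor percentage; the function name `encode_targets` and the dictionary layout are hypothetical.

```python
def encode_targets(tumor_percent_by_site):
    """Keep only unambiguous sites and map them to two-node output vectors."""
    encoded = {}
    for site, pct in tumor_percent_by_site.items():
        if pct == 100:
            encoded[site] = [1, 0]  # entirely involved by tumor
        elif pct == 0:
            encoded[site] = [0, 1]  # apparently healthy
        # Sites scored 25, 50, or 75% are mixed and excluded from this subset.
    return encoded
```

Applied to the 154 scored regions described above, a filter of this kind would retain the 74 targets with scores of 0% or 100%.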
Since it is not possible to retrieve histological data for each spectral sample, unsupervised autonomous algorithms are explored in this study. For autonomous classification we replace the tentative class membership assignment derived from histopathology with direct unsupervised examination of each Raman spectrum using a machine learning clustering algorithm, k-means62. The k-means algorithm simply starts with k groups, each consisting of a single random point, and then adds each new point to the group with the mean nearest to the location of the new point. After a point is added to a group, the mean of that group is recalculated to incorporate the new point. At each step the k-means are, in fact, the means of the groups they represent, hence the algorithm is known as k-means. In this study, the k-means classification was implemented in MATLAB using the squared Euclidean distance metric and the k-means++ algorithm for cluster center initialization45. Once the algorithm has tentatively assigned each spectrum to a cluster, stochastic backpropagation NNs then generate the Bayesian probability that the k-means clustering has successfully identified the appropriate class, just as they did when using the histopathology diagnostic as the gold standard.
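A minimal pure-Python sketch of k-means with squared Euclidean distance and k-means++ seeding is given below for orientation. It is not the MATLAB implementation used in the study. It uses batch (Lloyd-style) updates, in which all points are reassigned before the means are recomputed, which is an assumption that differs from the point-by-point description above. The function names, the iteration cap, and the seed handling are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: later centers are drawn with probability
    proportional to squared distance from the nearest existing center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centers.append(list(p))
                break
    return centers

def kmeans(points, k=2, n_iter=100, seed=0):
    """Return (labels, centers) after batch k-means updates."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = [None] * len(points)
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Cluster indices returned by a routine like this are arbitrary, so a separate step is still needed to decide which cluster corresponds to tumor and which to healthy tissue.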
To test an entire data set, a leave-one-out round-robin procedure using multiple nets is employed in all experiments. In this strategy all the data are used for training except for one spectrum. That spectrum is then used as a test sample for the trained network. The training and testing are repeated cycling through all members of the data set until all spectra have been classified by networks that have not seen the test spectrum during training.
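The round-robin procedure can be written generically. In the sketch below, `train_fn` and `predict_fn` stand for whatever learner is being evaluated. The nearest-centroid learner included here is only a stand-in so the example runs on its own; it is not the neural network used in the study, and all names are hypothetical.

```python
def leave_one_out(samples, labels, train_fn, predict_fn):
    """Classify every sample with a model that never saw it during training."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        predictions.append(predict_fn(model, samples[i]))
    return predictions

def train_centroids(xs, ys):
    """Stand-in learner (not the paper's NN): one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))
```

For a data set of n spectra this trains n separate models, which is affordable for the small networks and data sets described here.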
We implemented this technique by modifying the MATLAB 2017b Neural Network Pattern Recognition toolbox with 9, 6, or 3 input nodes for NNs FPHW, FP, and HW, respectively, 3 nodes in the second (hidden) layer, and 2 output nodes. The implementation employed the scaled conjugate gradient backpropagation algorithm and a softmax transfer function for the output layer. Following training and testing of the six stochastic backpropagation NNs to compare the histopathology and autonomous gold standards, the autonomous classification using k-means and NNs was extended to the full spectral data set (n = 203, 8 patients).
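The network shape described above (9, 6, or 3 inputs, 3 hidden nodes, 2 softmax outputs) can be sketched as a forward pass. This is an illustration only: training by scaled conjugate gradient is not shown, the tanh hidden activation and the uniform weight initialization range are assumptions, and the function names are hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_network(n_inputs, n_hidden=3, n_outputs=2, seed=None):
    """Random starting weights; each row holds input weights plus a trailing bias."""
    rng = random.Random(seed)

    def layer(n_out, n_in):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_out)]

    return {"hidden": layer(n_hidden, n_inputs), "output": layer(n_outputs, n_hidden)}

def forward(net, x):
    """Return [P(tumor), P(healthy)] for one vector of band intensities."""
    # zip stops at len(x), so the trailing bias w[-1] is added separately.
    h = [math.tanh(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))
         for w in net["hidden"]]
    z = [w[-1] + sum(wi * hi for wi, hi in zip(w, h)) for w in net["output"]]
    return softmax(z)
```

Because the starting weights are random, networks built with different seeds generally give different outputs for the same spectrum, which is the property that lets repeated runs be used to estimate the spread of a probability.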
Acknowledgements
Research reported in this publication included work performed in the Pathology Core supported by the National Cancer Institute of the National Institutes of Health under Award No. P30CA033572. The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors particularly thank, in memoriam, our colleague Willie C. Zúñiga, for his insights into the fundamental physics of the Raman event, his unstinting commitment to this project, and his untiring support and encouragement. He is sorely missed each day.
Author contributions
Conceptualization and study design: R.K., V.J., D.M., V.B.R., Y.S., J.P.S., D.S., P.D.C., L.L., Y.F., M.C.S.-L. Data collection: R.K., D.M., V.B.R., Y.S., J.P.S. Data analysis and interpretation: R.K., V.J., D.M., D.S., Y.F., M.C.S.-L. Manuscript writing: R.K., V.J., D.S., P.D.C., Y.F., M.C.S.-L.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-85758-6.
References
- 1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018;68:394–424. doi: 10.3322/caac.21492.
- 2. Donepudi MS, Kondapalli K, Amos SJ, Venkanteshan P. Breast cancer statistics and markers. J. Cancer Res. Ther. 2014;10(3):506–511. doi: 10.4103/0973-1482.137927.
- 3. Fisher S, Yasui Y, Dabbs K, Winget M. Re-excision and survival following breast conserving surgery in early stage breast cancer patients: a population-based study. BMC Health Serv. Res. 2018;18:1–10. doi: 10.1186/s12913-018-2882-7.
- 4. Kahlert S, Kolben TM, Schmoeckel E, Czogalla B, Hester A, Degenhardt T, Kempf C, Mahner S, Harbeck N, Kolben T. Prognostic impact of residual disease in simultaneous additional excision specimens after one-step breast conserving therapy with negative final margin status in primary breast cancer. EJSO. 2018;44(9):1318–1323. doi: 10.1016/j.ejso.2018.06.014.
- 5. Nayyar A, Gallagher KK, McGuire KP. Definition and management of positive margins for invasive breast cancer. Surg. Clin. N. Am. 2018;98(4):761–771. doi: 10.1016/j.suc.2018.03.008.
- 6. Van de Vijver K, Kho E, de Boer L, Sterenborg H, Ruers T. Hyperspectral optical imaging for intraoperative margin assessment during breast cancer surgery. Virchows Arch. 2018;473:S200–S200.
- 7. Kang JH, Youk JH, Kim JA, Gweon HM, Eun NL, Ko KH, Son EJ. Identification of preoperative magnetic resonance imaging features associated with positive resection margins in breast cancer: a retrospective study. Korean J. Radiol. 2018;19(5):897–904. doi: 10.3348/kjr.2018.19.5.897.
- 8. Kosehan D, Dener C, Akin K, Bozkurt A, Bilgic I, Cakir B. The value of preoperative lesion dedicated ultrasound of breast cancer before conserving surgery for optimizing margins. Breast J. 2017;23(2):159–163. doi: 10.1111/tbj.12711.
- 9. Pence IJ, Patil CA, Lieber CA, Mahadevan-Jansen A. Discrimination of liver malignancies with 1064 nm dispersive Raman spectroscopy. Biomed. Opt. Express. 2015;6(8):2724–2737. doi: 10.1364/BOE.6.002724.
- 10. Tolstik T, Marquardt C, Matthaus C, Bergner N, Bielecki C, Krafft C, Stallmach A, Popp J. Discrimination and classification of liver cancer cells and proliferation states by Raman spectroscopic imaging. Analyst. 2014;139(22):6036–6043. doi: 10.1039/C4AN00211C.
- 11. Cals FLJ, Schut TCB, Hardillo JA, de Jong RJB, Koljenovic S, Puppels GJ. Investigation of the potential of Raman spectroscopy for oral cancer detection in surgical margins. Lab. Invest. 2015;95(10):1186–1196. doi: 10.1038/labinvest.2015.85.
- 12. Carvalho, L. F. C. S., Bonnier, F., O'Callaghan, K., O'Sullivan, J., Flint, S., Neto, L. P. M., Soto, C. A. T., Dos Santos, L., Martin, A. A., Byrne, H. J., Lyng, F. M. Raman spectroscopic analysis of oral squamous cell carcinoma and oral dysplasia in the high wavenumber region. In Proc. SPIE 9531, Biophotonics South America, 953125 (2015); 10.1117/12.2180996
- 13. Chen P-H, Shimada R, Yabumoto S, Okajima H, Ando M, Chang C-T, Lee L-T, Wong Y-K, Chiou A, Hamaguchi H-O. Automatic and objective oral cancer diagnosis by Raman spectroscopic detection of keratin with multivariate curve resolution analysis. Sci. Rep. 2016;6:1–9. doi: 10.1038/s41598-016-0001-8.
- 14. Malini R, Venkatakrishna K, Kurien J, Pai KM, Rao L, Kartha VB, Krishna CM. Discrimination of normal, inflammatory, premalignant, and malignant oral tissue: A Raman spectroscopy study. Biopolymers. 2006;81(3):179–193. doi: 10.1002/bip.20398.
- 15. Singh SP, Sahu A, Deshmukh A, Chaturvedi P, Krishna CM. In vivo Raman spectroscopy of oral buccal mucosa: a study on malignancy associated changes (MAC)/cancer field effects (CFE). Analyst. 2013;138(14):4175–4182. doi: 10.1039/c3an36761d.
- 16. Kast RE, Tucker SC, Killian K, Trexler M, Honn KV, Auner GW. Emerging technology: applications of Raman spectroscopy for prostate cancer. Cancer Metastasis Rev. 2014;33(2–3):673–693. doi: 10.1007/s10555-013-9489-6.
- 17. Patel II, Martin FL. Discrimination of zone-specific spectral signatures in normal human prostate using Raman spectroscopy. Analyst. 2010;135(12):3060–3069. doi: 10.1039/c0an00518e.
- 18. Silveira L, Leite KRM, Silveira FL, Srougi M, Pacheco MTT, Zangaro RA, Pasqualucci CA. Discrimination of prostate carcinoma from benign prostate tissue fragments in vitro by estimating the gross biochemical alterations through Raman spectroscopy. Lasers Med. Sci. 2014;29(4):1469–1477. doi: 10.1007/s10103-014-1550-3.
- 19. Wang L, He DL, Zeng J, Guan ZF, Dang Q, Wang XY, Wang J, Huang LQ, Cao PL, Zhang GJ, Hsieh J, Fan JH. Raman spectroscopy, a potential tool in diagnosis and prognosis of castration-resistant prostate cancer. J. Biomed. Opt. 2013;18(8):087001. doi: 10.1117/1.JBO.18.8.087001.
- 20. Happillon T, Untereiner V, Beljebbar A, Gobinet C, Daliphard S, Cornillet-Lefebvre P, Quinquenel A, Delmer A, Troussard X, Klossa J, Manfait M. Diagnosis approach of chronic lymphocytic leukemia on unstained blood smears using Raman microspectroscopy and supervised classification. Analyst. 2015;140(13):4465–4472. doi: 10.1039/C4AN02085E.
- 21. Manago S, Valente C, Mirabelli P, Circolo D, Basile F, Corda D, De Luca AC. A reliable Raman-spectroscopy-based approach for diagnosis, classification and follow-up of B-cell acute lymphoblastic leukemia. Sci. Rep. 2016;6:1–13. doi: 10.1038/srep24821.
- 22. de Carvalho L, Sato ET, Almeida JD, Martinho HD. Diagnosis of inflammatory lesions by high-wavenumber FT-Raman spectroscopy. Theoret. Chem. Acc. 2011;130(4–6):1221–1229. doi: 10.1007/s00214-011-0972-2.
- 23. Haka AS, Sue E, Zhang C, Bhardwaj P, Sterling J, Carpenter C, Leonard M, Manzoor M, Walker J, Aleman JO, Gareau D, Holt PR, Breslow JL, Zhou XK, Giri D, Morrow M, Iyengar N, Barman I, Hudis CA, Dannenberg AJ. Noninvasive detection of inflammatory changes in white adipose tissue by label-free Raman spectroscopy. Anal. Chem. 2016;88(4):2140–2148. doi: 10.1021/acs.analchem.5b03696.
- 24. Karimbabanezhadmamaghani P. Cell death dynamics monitoring using Raman micro-spectroscopy. University of British Columbia; 2015.
- 25. Ong YH, Lim M, Liu Q. Comparison of principal component analysis and biochemical component analysis in Raman spectroscopy for the discrimination of apoptosis and necrosis in K562 leukemia cells. Opt. Express. 2012;20(20):22158–22171. doi: 10.1364/OE.20.022158.
- 26. Baker R, Matousek P, Ronayne KL, Parker AW, Rogers K, Stone N. Depth profiling of calcifications in breast tissue using picosecond Kerr-gated Raman spectroscopy. Analyst. 2007;132(1):48–53. doi: 10.1039/B614388A.
- 27. Haka AS, Shafer KE, Fitzmaurice M, Dasari RR, Feld MS. Distinguishing type II microcalcifications in benign and malignant breast lesions using Raman spectroscopy. Lab. Invest. 2002;82(1):36A–36A.
- 28. Haka AS, Shafer-Peltier KE, Fitzmaurice M, Crowe J, Dasari RR, Feld MS. Identifying differences in microcalcifications in benign and malignant breast lesions by probing differences in their chemical composition using Raman spectroscopy. Cancer Res. 2002;62:5375–5380.
- 29. Kerssens MM, Matousek P, Rogers K, Stone N. Towards a safe non-invasive method for evaluating the carbonate substitution levels of hydroxyapatite (HAP) in micro-calcifications found in breast tissue. Analyst. 2010;135:3156–3161. doi: 10.1039/c0an00565g.
- 30. Sathyavathi R, Saha A, Soares JS, Spegazzini N, McGee S, Dasari RR, Fitzmaurice M, Barman I. Raman spectroscopic sensing of carbonate intercalation in breast microcalcifications at stereotactic biopsy. Sci. Rep. 2015;5:1–12. doi: 10.1038/srep09907.
- 31. Chaturvedi D, Balaji SA, Bn VK, Ariese F, Umapathy S, Rangarajan A. Different phases of breast cancer cells: Raman study of immortalized, transformed, and invasive cells. Biosens. Basel. 2016;6(4):57. doi: 10.3390/bios6040057.
- 32. Abramczyk H, Brozek-Pluska B. New look inside human breast ducts with Raman imaging. Raman candidates as diagnostic markers for breast cancer prognosis: mammaglobin, palmitic acid and sphingomyelin. Anal. Chim. Acta. 2016;909:91–100. doi: 10.1016/j.aca.2015.12.038.
- 33. De Gelder J, Gussem KD, Vandenabeele P, Moens L. Reference database of Raman spectra of biological molecules. J. Raman Spectrosc. 2007;38:1133–1147. doi: 10.1002/jrs.1734.
- 34. Movasaghi Z, Rehman S, Rehman IU. Raman spectroscopy of biological tissues. Appl. Spectrosc. Rev. 2007;42(5):493–541. doi: 10.1080/05704920701551530.
- 35. Meksiarun P, Ishigaki M, Huck-Pezzei VAC, Huck CW, Wongravee K, Sato H, Ozaki Y. Comparison of multivariate analysis methods for extracting the paraffin component from the paraffin-embedded cancer tissue spectra for Raman imaging. Sci. Rep. 2017;7:1–10. doi: 10.1038/srep44890.
- 36. Isabelle M, Dorney J, Lewis A, Lloyd GR, Old O, Shepherd N, Rodriguez-Justo M, Barr H, Lau K, Bell I, Ohrel S, Thomas G, Stone N, Kendall C. Multi-centre Raman spectral mapping of oesophageal cancer tissues: a study to assess system transferability. Faraday Discuss. 2016;187:87–103. doi: 10.1039/C5FD00183H.
- 37. Haka AS, Shafer-Peltier KE, Fitzmaurice M, Crowe J, Dasari RR, Feld MS. Diagnosing breast cancer by using Raman spectroscopy. Proc. Natl. Acad. Sci. USA. 2005;102(35):12371–12376. doi: 10.1073/pnas.0501390102.
- 38. Haka AS, Volynskaya Z, Gardecki JA, Nazemi J, Lyons J, Hicks D, Fitzmaurice M, Dasari RR, Crowe JP, Feld MS. In vivo margin assessment during partial mastectomy breast surgery using Raman spectroscopy. Can. Res. 2006;66(6):3317–3322. doi: 10.1158/0008-5472.CAN-05-2815.
- 39. Haka AS, Volynskaya Z, Gardecki JA, Nazemi J, Shenk R, Wang N, Dasari RR, Fitzmaurice M, Feld MS. Diagnosing breast cancer using Raman spectroscopy: prospective analysis. J. Biomed. Opt. 2009;14(5):054023. doi: 10.1117/1.3247154.
- 40. Shafer-Peltier KE, Haka AS, Fitzmaurice M, Crowe J, Dasari RR, Feld MS. Raman microspectroscopic model of human breast tissue: Implications for breast cancer diagnosis in vivo. J. Raman Spectrosc. 2002;33:552–563. doi: 10.1002/jrs.877.
- 41. Shafer-Peltier, K. E.; Haka, A. S.; Fitzmaurice, M.; Crowe, J.; Myles, J.; Dasari, R. R.; Feld, M. S., Chemical basis for breast cancer diagnosis using Raman spectroscopy. Lasers Surg. Med. 2002, 2–2.
- 42. Shafer-Peltier KE, Haka AS, Motz JT, Fitzmaurice M, Dasari RR, Feld MS. Model-based biological Raman spectral imaging. J. Cell. Biochem. 2002;87:125–137. doi: 10.1002/jcb.10418.
- 43. Zúñiga WC, Jones V, Anderson SM, Echevarria A, Miller NL, Stashko C, Schmolze D, Cha PD, Kothari R, Fong Y, Storrie-Lombardi MC. Raman spectroscopy for rapid evaluation of surgical margins during breast cancer lumpectomy. Sci. Rep. 2019;9:1–16. doi: 10.1038/s41598-019-51112-0.
- 44. Garcia-Flores AF, Raniero L, Canevari RA, Jalkanen KJ, Bitar RA, Martinho HS, Martin AA. High-wavenumber FT-Raman spectroscopy for in vivo and ex vivo measurements of breast cancer. Theoret. Chem. Acc. 2011;130(4–6):1231–1238. doi: 10.1007/s00214-011-0925-9.
- 45. MATLAB and Neural Network Pattern Recognition Release . The MathWorks Inc: Natick, Massachusetts; 2017.
- 46. Greene, C. errbar.m. https://www.mathworks.com/matlabcentral/fileexchange/50472-errbar.
- 47. Baek S-J, Park A, Ahn Y-J, Choo J. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst. 2015;140:250–257. doi: 10.1039/C4AN01061B.
- 48. Savitzky A, Golay MJE. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964;36(8):1627–1639. doi: 10.1021/ac60214a047.
- 49. Bayes T. Essay towards solving a problem in the doctrine of chances. Philos. Trans. R. Soc. London. 1763;53:370–418. doi: 10.1098/rstl.1763.0053.
- 50. Kohonen T. Self-Organization and Associative Memory. Springer-Verlag; 1977.
- 51. Hopfield JJ, Tank DW. Computing with neural circuits: a model. Science. 1986;233:625–633. doi: 10.1126/science.3755256.
- 52. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–536. doi: 10.1038/323533a0.
- 53. Adorf HM. Connectionism in neural networks. In: Heck A, Murtagh F, editors. Knowledge-Based Systems in Astronomy. Springer-Verlag; 1989. pp. 215–245.
- 54. Hinton GE. Connectionist Symbol Processing. MIT Press; 1991.
- 55. Gish, H. A probabilistic approach to the understanding and training of neural network classifiers. In: IEEE Conference on Acoustics Speech and Signal Processing, 1990; Institute of Electrical and Electronic Engineering 1990.
- 56. Richard MD, Lippmann RP. Neural network classifiers estimate Bayesian a-posteriori probabilities. Neural Comput. 1991;3:461–483. doi: 10.1162/neco.1991.3.4.461.
- 57. Storrie-Lombardi MC, Lahav O, Sodre L, Storrie-Lombardi LJ. Morphological classification of galaxies by artificial neural networks. Mon. Not. R. Astron. Soc. 1992;259:8–12. doi: 10.1093/mnras/259.1.8P.
- 58.Dorn ED, McDonald GD, Storrie-Lombardi MC, Nealson KH. Principal component analysis and neural networks for detection of amino acid biosignatures. Icarus. 2003;166(2):403–409. doi: 10.1016/j.icarus.2003.08.011. [DOI] [Google Scholar]
- 59.Storrie-Lombardi MC, Fisk MR. Elemental abundance distributions in suboceanic basalt glass: evidence of biogenic alteration. Geochem. Geophys. Geosyst. 2004;5(10):Q10005. doi: 10.1029/2004GC000755. [DOI] [Google Scholar]
- 60.Storrie-Lombardi, M. C.; Hoover, R. B.; Abbas, M.; Jerman, G.; Coston, J.; Fisk, M., Probabilistic classification of elemental abundance distributions in Nakhla and Apollo 17 lunar dust samples: art. no. 630906. In Instruments, Methods, and Missions for Astrobiology IX, Hoover, R. B.; Levin, G. V.; Rozanov, A. Y., Eds. SPIE: Bellingham, 2006; Vol. 6309, pp 1–10.
- 61.Storrie-Lombardi, M. C.; Lambert, L. J.; Borchert, M. S.; Kimura, A.; Roseto, J.; Bing, J. In Measuring aqueous humor glucose across physiological levels: NIR Raman spectroscopy, multivariate analysis, artificial neural networks, and Bayesian probabilities, International Conference on Environmental Systems, http://hdl.handle.net/2014/20218, Davers, MA, U.S.A., SAE: Davers, MA, U.S.A., 1998; pp. 146–151.
- 62.MacQueen J. On convergence of k-means and partitions with minimum average variance. Ann. Math. Stat. 1965;36(3):1084–2000. [Google Scholar]
- 63.Stone N, Kendall C, Smith J, Crow P, Barr H. Raman spectroscopy for identification of epithelial cancers. Faraday Discuss. 2004;126:141–157. doi: 10.1039/b304992b. [DOI] [PubMed] [Google Scholar]
- 64.Stone N, Kendell C, Shepherd N, Crow P, Barr H. Near-infrared Raman spectroscopy for the classification of epithelial pre-cancers and cancers. J. Raman Spectrosc. 2002;33:564–573. doi: 10.1002/jrs.882. [DOI] [Google Scholar]
- 65.Farguharson S, Shende C, Inscore FE, Maksymiuk P, Gift A. Analysis of 5-fluorouracil in saliva using surface-enhanced Raman spectroscopy. J. Raman Spectrosc. 2005;36:208–212. doi: 10.1002/jrs.1277. [DOI] [Google Scholar]
- 66.Ruiz-Chica AJ, Medina MA, Sanchez-Jimenez F, Ramirez FJ. Characterization by Raman spectroscopy of conformational changes on guaninecytosine and adenine-thymine oligonucleotides induced by aminooxy analogues of spermidine. J. Raman Spectrosc. 2004;35:93–100. doi: 10.1002/jrs.1107. [DOI] [Google Scholar]
- 67.Binoy J, Abraham JP, Joe IH, Jayakumar VS, Petit GR, Nielsen OF. NIR-FT Raman and FT-IR spectral studies and ab initio calculations of the anti-cancer drug combretastatin-A4. J. Raman Spectrosc. 2004;35:939–946. doi: 10.1002/jrs.1236. [DOI] [Google Scholar]
- 68.Chan JW, Taylor DS, Zwerdling T, Lane ST, Ihara K, Huser T. Micro-Raman spectroscopy detects individual neoplastic and normal hematopoietic cells. Biophys. J. 2006;90:648–656. doi: 10.1529/biophysj.105.066761.
- 69.Notingher I, Green C, Dyer C. Discrimination between ricin and sulphur mustard toxicity in vitro using Raman spectroscopy. J. R. Soc. Interface. 2004;1:79–90. doi: 10.1098/rsif.2004.0008.
- 70.Cheng WT, Liu MT, Liu HN, Lin SY. Micro-Raman spectroscopy used to identify and grade human skin pilomatrixoma. Microsc. Res. Tech. 2005;68:75–79. doi: 10.1002/jemt.20229.
- 71.Shetty G, Kendall C, Shepherd N, Stone N, Barr H. Raman spectroscopy: evaluation of biochemical changes in carcinogenesis of oesophagus. Br. J. Cancer. 2006;94:1460–1464. doi: 10.1038/sj.bjc.6603102.
- 72.Gniadecka M, Wulf HC, Mortensen NN, Nielsen OF, Christensen DH. Diagnosis of basal cell carcinoma by Raman spectroscopy. J. Raman Spectrosc. 1997;28:125–129. doi: 10.1002/(SICI)1097-4555(199702)28:2/3<125::AID-JRS65>3.0.CO;2-#.
- 73.Krafft C, Neudert L, Simat T, Salzer R. Near infrared Raman spectra of human brain lipids. Spectrochim. Acta Part A. 2005;61:1529–1535. doi: 10.1016/j.saa.2004.11.017.
- 74.Hanlon EB, Manoharan R, Koo TW, Shafer KE, Motz JT, Fitzmaurice M, Kramer JR, Itzkan I, Dasari RR, Feld MS. Prospects for in vivo Raman spectroscopy. Phys. Med. Biol. 2000;45:1–59. doi: 10.1088/0031-9155/45/2/201.
- 75.Dukor RK. Vibrational spectroscopy in the detection of cancer. Biomed. Appl. 2002;5:3335–3359.
- 76.Huang Z, McWilliams A, Lui H, McLean DI, Lam S, Zeng H. Near-infrared Raman spectroscopy for optical diagnosis of lung cancer. Int. J. Cancer. 2003;107:1047–1052. doi: 10.1002/ijc.11500.
- 77.Lakshmi RJ, Kartha VB, Krishna CM, Solomon JGR, Ullas G, Uma Devi P. Tissue Raman spectroscopy for the study of radiation damage: brain irradiation of mice. Radiat. Res. 2002;157:175–182. doi: 10.1667/0033-7587(2002)157[0175:TRSFTS]2.0.CO;2.
- 78.Katainen E, Elomaa M, Laakkonen UM, Sippola E, Niemela P, Suhonen J, Jarvinen K. Quantification of the amphetamine content in seized street samples by Raman spectroscopy. J. Forensic Sci. 2007;52(1):88–92. doi: 10.1111/j.1556-4029.2006.00306.x.
- 79.Frank CJ, McCreery RL, Redd DCB. Raman spectroscopy of normal and diseased human breast tissues. Anal. Chem. 1995;67:777–783. doi: 10.1021/ac00101a001.
- 80.Silveira L, Sathaiah S, Zângaro RA, Pacheco MT, Chavantes MC, Pasqualucci CA. Correlation between near-infrared Raman spectroscopy and the histopathological analysis of atherosclerosis in human coronary arteries. Lasers Surg. Med. 2002;30:290–297. doi: 10.1002/lsm.10053.
- 81.Huang ZW, McWilliams A, Lui H, McLean DI, Lam S, Zeng HS. Near-infrared Raman spectroscopy for optical diagnosis of lung cancer. Int. J. Cancer. 2003;107(6):1047–1052. doi: 10.1002/ijc.11500.
- 82.Koljenovic S, Schut TB, Vincent A, Kros JM, Puppels GJ. Detection of meningioma in dura mater by Raman spectroscopy. Anal. Chem. 2005;77(24):7958–7965. doi: 10.1021/ac0512599.
- 83.Kline NJ, Treado PJ. Raman chemical imaging of breast tissue. J. Raman Spectrosc. 1997;28:119–124. doi: 10.1002/(SICI)1097-4555(199702)28:2/3<119::AID-JRS73>3.0.CO;2-3.
- 84.Mourant JR, Short KW, Carpenter S, Kunapareddy N, Coburn L, Powers TM, Freyer JP. Biochemical differences in tumorigenic and nontumorigenic cells measured by Raman and infrared spectroscopy. J. Biomed. Opt. 2005;10(3):031106. doi: 10.1117/1.1928050.
- 85.Sigurdsson S, Philipsen PA, Hansen LK, Larsen J, Gniadecka M, Wulf HC. Detection of skin cancer by classification of Raman spectra. IEEE Trans. Biomed. Eng. 2004;51(10). doi: 10.1109/TBME.2004.831538.
Associated Data