Deep Learning on Multimodal Chemical and Whole Slide Imaging Data for Predicting Prostate Cancer Directly from Tissue Images

Md Inzamam Ul Haque; Debangshu Mukherjee; Sylwia A Stopka; Nathalie Y R Agar; Jacob Hinkle; Olga S Ovchinnikova

doi:10.1021/jasms.2c00254

Author manuscript; available in PMC: 2023 Sep 5.

Published in final edited form as: J Am Soc Mass Spectrom. 2023 Jan 10;34(2):227–235. doi: 10.1021/jasms.2c00254

PMCID: PMC10479534 NIHMSID: NIHMS1921678 PMID: 36625762

Prostate cancer is one of the most common cancers globally and is the second most common cancer in the male population in the US. Here we develop a study that correlates hematoxylin and eosin (H&E)-stained biopsy data with MALDI mass spectrometry imaging data of the corresponding tissue to determine the cancerous regions, their unique chemical signatures, and how the predicted regions vary from the original pathological annotations. We obtain features from high-resolution optical micrographs of whole slide H&E-stained data through deep learning and spatially register them with mass spectrometry imaging (MSI) data to correlate the chemical signature with the tissue anatomy. We then use the learned correlation to predict prostate cancer from observed H&E images using models trained on coregistered MSI data. This multimodal approach can predict cancerous regions with ~80% accuracy, which indicates a correlation between optical H&E features and chemical information found in MSI. We show that such paired multimodal data can be used for training feature extraction networks on H&E data, which bypasses the need to acquire expensive MSI data and eliminates the need for manual annotation, saving valuable time. Two chemical biomarkers were also found to predict the ground truth cancerous regions. This study shows promise in generating improved patient treatment trajectories by predicting prostate cancer directly from readily available H&E-stained biopsy images aided by coregistered MSI data.

INTRODUCTION

Prostate cancer (PC) is the second most common cancer as well as the second leading cause of cancer death among men. Approximately 1 in 8 men will be diagnosed with PC during their lifetime.¹ Therefore, a significant amount of effort has been focused on developing novel treatment options,² early detection,³ and predictive models for risk prediction.⁴ In particular, work has focused on forecasting metastatic PC⁵ (mPC), as it is most likely to lead to patient death. A good deal of work has focused on developing artificial intelligence (AI) and machine learning (ML) approaches to automatically grade hematoxylin and eosin (H&E) pathology slides to address these problems.⁶⁻⁹ Litjens et al.¹⁰ showed that deep learning can be used for histopathologic slide analysis as a tool to improve the efficacy of PC diagnosis as well as reduce the workload for pathologists.

Mass spectrometry imaging (MSI) offers a path to improve detection and labeling of pathology slides by introducing a chemical imaging modality that is able to identify PC based on chemical biomarkers with much higher confidence.¹¹,¹² There have been multiple studies regarding the identification and diagnosis of prostate cancer using mass spectrometry (MS). For example, Andersen et al.¹³ showed that specific tissue compartments within prostate cancer samples have distinct metabolic profiles and, using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) MSI data, identified several differential metabolites and lipids that have the potential to be developed further as diagnostic and prognostic biomarkers for prostate cancer. Various MALDI-based MS techniques, including imaging, profiling, and in-depth proteomics analysis in which MALDI MS follows fractionation and separation methods such as gel electrophoresis, have been used to identify prostate cancer biomarkers.¹⁴ A more recent study¹⁵ using MSI on intact human prostate tissue specimens found nine key biomarkers, metabolites that could either differentiate between benign and malignant prostate tissue or indicate prostate cancer aggressiveness. Therefore, work on the integration of multimodal bioinformation data, specifically MSI and H&E data, offers the potential to improve the identification and accurate labeling of PC. In 2015, Van de Plas et al.¹⁶ reported a data fusion framework for MSI and H&E stain microscopy enabling prediction of a molecular distribution both at high spatial resolution and with high chemical specificity. Vollnhals et al.¹⁷ compared two pansharpening methods, Intensity–Hue–Saturation and Laplacian Pyramid, and demonstrated that the latter was more robust for image fusion between MSI and electron microscopy. However, these fusion-based approaches are limited by the fundamental difference between the physical mechanisms of image generation for MSI and the techniques used for data up-sampling. Moreover, these approaches are prone to reconstruction errors and are unable to reconstruct full spectral predictions. To circumvent these problems, our group has demonstrated that a physically constrained model between two MSI imaging techniques is able to accurately reconstruct and predict both high spatial resolution images and high spectral resolution mass spectra.¹⁸

In this work, we demonstrate a machine learning approach that utilizes whole slide H&E pathology labeled data and MSI data from a 9.4 T MALDI Fourier-transform ion cyclotron resonance (FTICR) MS for predicting PC directly from H&E tissue images. We show, as reported in the previous literature,¹¹ that MSI is extremely useful for predicting cancerous regions, and we leverage this to develop a pipeline for predicting cancerous regions using both whole slide H&E-to-MSI and MSI-to-PC stages. Since MSI is expensive and not widely available, our approach uses paired MSI and H&E data, which does not require manual labels to train the H&E-to-MSI model. The resulting model provides not only a binary cancer/noncancer segmentation but also a spectral estimate at each point of an H&E image, resulting in a highly interpretable prediction. This work lays the groundwork for developing more accurate MSI prediction with larger paired MSI/H&E datasets and for minimizing the amount of manually labeled images required for PC prediction by leveraging the richness of intermediate MSI supervision.

MATERIALS AND METHODS

The multimodal study uses elements from the individual modalities as shown in Figure 1. It should be noted that prediction of PC is performed with both the individual modalities as well as the multimodal study. Prediction results and comparison of these three different scenarios are reported in the Results.

Data.

For this study, previously published human prostate RAW files¹¹ were provided by the authors. Briefly, the mass spectrometry data consisted of human prostate tissue specimens that were cryosectioned and imaged at a pixel size of 120 μm using a 9.4 T SolariX XR FT ICR MS (Bruker Daltonics, Billerica, MA). Corresponding high-resolution annotated H&E images of the same tissue were provided after MALDI matrix removal. Further details regarding data collection can be found in the original publication.¹¹

In total, we received five tissue samples of MSI data and corresponding annotated high-resolution H&E whole slide images. With the five tissue images, a 5-fold cross-validation is used during training for all the logistic regression models that are explained later in this section. For each fold, total pixels of the four preprocessed tissue images, excluding the holdout image for testing, are shuffled and divided into 80%–20% for training and validation, respectively.
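The fold construction described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `leave_one_image_out_folds`, the per-image pixel counts, and the random seed are hypothetical, and pixels are represented only by (image, pixel) index pairs.

```python
import numpy as np

def leave_one_image_out_folds(pixels_per_image, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx, test_image) for every held-out image.

    pixels_per_image: hypothetical list with the pixel count of each image.
    Index arrays have shape (N, 2) and hold (image_id, pixel_id) pairs.
    """
    rng = np.random.default_rng(seed)
    for test_image in range(len(pixels_per_image)):
        # Pool the pixels of every image except the held-out test image.
        pooled = np.concatenate([
            np.column_stack([np.full(n, i), np.arange(n)])
            for i, n in enumerate(pixels_per_image) if i != test_image
        ])
        rng.shuffle(pooled)  # shuffles rows in place
        n_val = int(round(val_fraction * len(pooled)))
        # 20% of the pooled pixels for validation, 80% for training.
        yield pooled[n_val:], pooled[:n_val], test_image

folds = list(leave_one_image_out_folds([100, 120, 80, 90, 110]))
```

Each of the five folds holds out one whole image for testing, so no pixel of the test image can leak into training or validation.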

MSI Data Processing.

The MSI data were originally acquired in the imzML format, a common data format for MS imaging. To facilitate experimentation, we convert this data to HDF5 format. Since the imzML files are in the range of a couple hundred gigabytes, HDF5 is a suitable choice for fast I/O processing and storage. Each imzML file consists of a large number of mass spectra from which we can produce an ion image of the whole slide as well as view the spectrum at any spatial coordinate. After inspecting the spectra for several coordinates, we found that there is a slight difference in the m/z values between individual spectra. This difference in the m/z values is shown in Figure S1. To have a common m/z axis for all the coordinates in an image, we first interpolate the intensity values for m/z values ranging from 100 to 1,000 with a step size of 0.001 and then convert the result to HDF5 file format. In Figure S1, we zoom into a spectrum to show the effect of interpolation and compare it with the original spectrum before interpolation.
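A minimal sketch of resampling spectra onto a shared m/z axis is given below. It assumes linear interpolation (the text does not state the interpolation scheme) and excludes the upper endpoint so that the default grid has 900,000 channels; the function names are hypothetical, and the HDF5 writing step is omitted.

```python
import numpy as np

def common_mz_axis(mz_min=100.0, mz_max=1000.0, step=0.001):
    """Shared m/z grid. The upper endpoint is excluded (an assumption), so
    100 to 1,000 in steps of 0.001 gives 900,000 channels."""
    n_channels = int(round((mz_max - mz_min) / step))
    return mz_min + step * np.arange(n_channels)

def resample_spectrum(mz, intensity, axis):
    """Linearly interpolate one spectrum onto the shared axis.
    Grid points outside the measured m/z range are set to zero."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    order = np.argsort(mz)  # np.interp requires increasing x values
    return np.interp(axis, mz[order], intensity[order], left=0.0, right=0.0)

# Two toy spectra whose m/z values are slightly offset from each other.
axis = common_mz_axis(100.0, 101.0, 0.5)  # grid points 100.0 and 100.5
spec_a = resample_spectrum([100.0, 101.0], [0.0, 2.0], axis)
spec_b = resample_spectrum([99.9, 100.9], [1.0, 1.0], axis)
```

After resampling, every pixel's spectrum has the same length and channel meaning, which is what later steps such as PCA require.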

Binary Mask Extraction.

Pathologist annotations were provided in the form of contours overlaid on low-resolution H&E images. We extract binary masks from the contours for cancer regions, noncancer regions, and nontissue background (Figure S2). These masks are used to register the high-resolution whole slide H&E images with the MSI images. An affine transformation followed by phase correlation is used to coregister the H&E and MSI images using the extracted binary masks. Phase correlation, as implemented in the scikit-image library, is an efficient cross-correlation method for subpixel translation registration. After extracting the masks, it is possible to see the differences in mass spectra between cancer and noncancer regions (Figure S3).
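To illustrate the idea behind phase correlation, the sketch below estimates a translation between two binary masks with plain NumPy FFTs. It is a simplified stand-in, not the scikit-image routine used in the study: it recovers only whole-pixel shifts, skips the affine step, and the function name `phase_correlation_shift` and the toy mask are hypothetical.

```python
import numpy as np

def phase_correlation_shift(reference, moving):
    """Return the integer (row, col) shift that aligns `moving` to
    `reference` when applied with np.roll."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # keep phase, drop magnitude
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the half-way point correspond to negative shifts.
    return tuple(int(p if p <= s // 2 else p - s)
                 for p, s in zip(peak, correlation.shape))

reference = np.zeros((32, 32))
reference[8:18, 10:22] = 1.0                       # toy binary tissue mask
moving = np.roll(reference, (-3, 5), axis=(0, 1))  # same mask, translated
shift = phase_correlation_shift(reference, moving)
aligned = np.roll(moving, shift, axis=(0, 1))
```

Because only the phase of the cross-power spectrum is kept, the correlation surface has a sharp peak at the translation, which is what makes the method efficient for mask alignment.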

PCA of MSI Data.

The interpolated MSI files were too large to process conveniently since they contained 900,000 spectral channels at thousands of spatial points. Therefore, we used principal component analysis (PCA) to reduce the dimensionality of individual spectra. Figure 2 shows the first five PCA components for one of the MSI images. We explored the choice of number of principal components with values of 25, 50, 100, 200, and 300. The values of the cumulative explained variance for 25, 50, 100, 200, and 300 principal components are respectively 86.68%, 92.71%, 97.08%, 98.51%, and 98.50% (Figure 2G). We get the best cumulative explained variance with a minimum of 200 principal components. Using more than 200 principal components does not improve our final result. Results with each of these values are shown in the Results. These 200 PCA components explained 98.51% of the variability in the data, with the first three components combined explaining 67% of the variability (Figure 2G). Although some PCA components captured the cancerous vs noncancerous regions better than others, no single PCA component could clearly segment important regions. The PCA was fit jointly on the training MSI images and tested on a holdout image. Incremental PCA from the scikit-learn library was used so that training fits in memory and converges faster.
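The batch-wise PCA fit can be sketched with scikit-learn's `IncrementalPCA` as below. The data are synthetic and much smaller than real MSI spectra, and the batch count and component number are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
# Synthetic stand-in for (pixels x m/z channels): low-rank signal plus noise.
spectra = rng.normal(size=(600, 8)) @ rng.normal(size=(8, 300))
spectra += 0.01 * rng.normal(size=spectra.shape)

ipca = IncrementalPCA(n_components=20)
for batch in np.array_split(spectra, 6):
    # Each batch must contain at least n_components rows.
    ipca.partial_fit(batch)

components = ipca.transform(spectra)  # shape (pixels, n_components)
cumulative = np.cumsum(ipca.explained_variance_ratio_)
```

Plotting `cumulative` against the component index gives the kind of curve used to choose the number of components.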

Figure 2. — Dimension reduction of MSI using PCA. (A) Original cancer annotations by pathologists for a tissue specimen. (B)–(F) First five PCA components, respectively, for the tissue specimen. A color bar is given showing intensity values in arbitrary units for the five components. Although the components show hints of cancerous regions, no single PCA component can clearly segment cancerous regions. (G) Cumulative explained variance achieved by the PCA components. The highest explained variance is achieved with a minimum of 200 components as seen from the plot.

Logistic Regression of MSI.

We trained a logistic regression model using the 200 PCA components of all five MSI images to predict cancerous regions. A 5-fold cross-validation was used during training. The extracted binary masks were used as the labels for logistic regression. scikit-learn's SGDClassifier class was used to perform this task. This estimator implements regularized linear models with stochastic gradient descent (SGD) learning. We used the “log” loss, which gives logistic regression, a probabilistic classifier. A regularization parameter of 0.01 was used with an adaptive learning rate starting from 0.01. Since we had a class imbalance in the labels, we used the “balanced” class weight preset, which uses the label values to automatically adjust weights inversely proportional to class frequencies in the input data. After training, the model was tested on a holdout image to check performance.

H&E Data Processing.

The high-resolution whole slide H&E tissue images were provided in Carl Zeiss CZI data format. Again, for faster I/O processing and storage, we converted this data to HDF5 format. Since these are high-resolution images, we upscaled the binary masks, combined noncancer and cancer masks to get the foreground and then coregistered the H&E images. We also normalized each H&E image to have zero mean and a unit standard deviation in each color channel.
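The per-channel normalization can be sketched as follows. Whether the statistics were computed over the whole image or only over foreground pixels is not stated, so the whole image is assumed here; `normalize_channels` is a hypothetical helper.

```python
import numpy as np

def normalize_channels(image, eps=1e-8):
    """Give each color channel of an (H, W, C) image zero mean and unit
    standard deviation, using statistics over all pixels (an assumption)."""
    image = np.asarray(image, dtype=np.float64)
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + eps)

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(64, 48, 3))  # toy RGB image
normalized = normalize_channels(rgb)
```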

MSI Prediction from H&E Data.

First, we extracted features using deep learning from the normalized high-resolution H&E images. Inspired by the results of Lu et al.,¹⁹ we used a pretrained ResNet-50 model for extracting the features from H&E images. We used the PyTorch deep learning framework in this study, and the ImageNet-pretrained ResNet-50 model was acquired from PyTorch. Since these are large high-resolution images, we divided each image into 512 × 512 patches before feeding them to the feature extractor. We also used halos of 256 pixels around the patches to suppress grid artifacts. This feature extraction step increases the number of channels from 3 to 1024 since we extract from the third layer of the ResNet-50 backbone. We also reduced the spatial dimension by a factor of 16 using average pooling. This reduction was necessary to get closer to the spatial dimension of the MSI images. Second, to match the spatial dimension of the MSI PCA components exactly, we regridded the extracted features for each slice. Considering the extracted features as the source dataset and the MSI PCA components as the target dataset, we used two strategies: downscaling and upscaling. Downscaling is used when the target shape is smaller than the source shape. If the source shape is an integer multiple of the target shape, we perform average pooling. Otherwise, we perform Gaussian blurring followed by subsampling. Upscaling is performed when the target shape is larger than the source shape, in which case we use linear interpolation. With both datasets having the same spatial dimension, we perform linear regression on the extracted features to reduce the number of channels from 1024 to 200 and compare with the MSI PCA components. A 5-fold cross-validation was used for training. The Adam optimizer with a learning rate of 0.1 was used, and convergence took 20,000 iterations. We saved the predicted MSI components and trained the same logistic regression model that was used with the MSI data to predict cancerous regions. In other words, we first predict MSI components from the downscaled H&E features and then train a logistic regression model, as described in the Logistic Regression of MSI subsection, on the predicted MSI data to predict cancer labels.
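The regridding rules described above can be sketched per spatial axis as below. This is an illustration under stated assumptions rather than the authors' implementation: the rules are applied to each axis independently, the Gaussian width is an arbitrary choice, and the helper names are hypothetical. The ResNet-50 feature extraction and the linear regression step are not shown.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def regrid_axis(arr, new_len, axis):
    """Resize one axis of `arr` to `new_len` samples."""
    old_len = arr.shape[axis]
    if new_len == old_len:
        return arr
    if new_len < old_len and old_len % new_len == 0:
        # Integer factor: average pooling over consecutive blocks.
        factor = old_len // new_len
        moved = np.moveaxis(arr, axis, 0)
        pooled = moved.reshape(new_len, factor, *moved.shape[1:]).mean(axis=1)
        return np.moveaxis(pooled, 0, axis)
    if new_len < old_len:
        # Non-integer factor: Gaussian blur, then subsample.
        sigma = 0.5 * old_len / new_len  # assumed blur width
        blurred = gaussian_filter1d(arr, sigma=sigma, axis=axis, mode="nearest")
        idx = np.round(np.linspace(0, old_len - 1, new_len)).astype(int)
        return np.take(blurred, idx, axis=axis)
    # Upscaling: linear interpolation between existing samples.
    old = np.linspace(0.0, 1.0, old_len)
    new = np.linspace(0.0, 1.0, new_len)
    return np.apply_along_axis(lambda v: np.interp(new, old, v), axis, arr)

def regrid(features, target_hw):
    """Regrid an (H, W, C) feature map to (target_H, target_W, C)."""
    out = np.asarray(features, dtype=float)
    for axis, new_len in enumerate(target_hw):
        out = regrid_axis(out, new_len, axis)
    return out

features = np.arange(8 * 6 * 2, dtype=float).reshape(8, 6, 2)
pooled = regrid(features, (4, 3))     # integer factors: average pooling
blurred = regrid(features, (5, 4))    # blur followed by subsampling
upscaled = regrid(features, (16, 9))  # linear interpolation
```

Once the feature map and the MSI PCA components share a grid, each pixel provides one (feature vector, component vector) pair for the regression.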

Logistic Regression of H&E data.

We trained a logistic regression model using the regridded deep H&E features to predict cancerous regions and to compare performance with the results achieved with MSI data. Again, a 5-fold cross validation was used during training. The same extracted binary masks that were used for the MSI data training were used as the labels for logistic regression. Scikit-learn’s SGD classifier that implements regularized linear models with stochastic gradient descent (SGD) learning was used for the training. Same as before, we have used the “log” loss that gives logistic regression, a probabilistic classifier. A regularization parameter of 0.001 has been used with an adaptive learning rate starting from 0.001. Similar to the MSI logistic regression training, we used a “balanced” class weight fit preset which uses the values of labels to automatically adjust weights inversely proportional to class frequencies in the input data, essentially removing class imbalance of the training data. After training, the model was tested on a holdout image to evaluate performance.

MSI Peak Prediction.

Previously, Steurer et al.¹² identified 32 distinguishable m/z signals that occurred in 5–90% of the spectra in their 729 analyzable tissue samples. They found that a total of 15 of these signals appeared to be associated with epithelial structures, based on comparison with the H&E stain of the slide: m/z 605, 616, 644, 678, 700, 899, 976, 1,014, 1,044, 1,199, 1,275, 1,502, 3,071, 3,086, and 3,577. We first increased the upper limit of our m/z range from 1,000 to 1,500 to test most of these spectral features, since our original interpolation accounted for 100–1,000 m/z values. Since our H&E to MSI predictions correspond to the MSI PCA components, we convert the predictions to the original MSI m/z spectral dimension, transforming 200 components to 1,500,000 features. We use the inverse PCA transformation from the scikit-learn library to do this. Then we choose the above-mentioned features in the range from 100 to 1,500 m/z. Also, Randall et al.¹¹ listed m/z values that were searched against the Lipid Maps database. We observe the ground truth MSI and the prediction from H&E for these m/z values. We looked for chemical biomarkers in the ground truth MSI that best captured the cancerous regions of the tissue specimens and then observed the predicted MSI.
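The step from predicted PCA components back to an ion image at a chosen m/z can be sketched as below. The data are synthetic and tiny, the fitted PCA stands in for the one trained on MSI data, and `mz_to_channel` is a hypothetical helper that assumes a uniform axis starting at m/z 100.

```python
import numpy as np
from sklearn.decomposition import PCA

def mz_to_channel(mz, mz_min=100.0, step=0.001):
    """Index of the grid channel nearest to `mz` on a uniform m/z axis."""
    return int(round((mz - mz_min) / step))

rng = np.random.default_rng(0)
spectra = rng.random((50, 40))  # 50 pixels x 40 channels (toy sizes)
pca = PCA(n_components=5).fit(spectra)

# Stand-in for components predicted from H&E features.
predicted_components = pca.transform(spectra)
# Map components back to the full spectral dimension.
reconstructed = pca.inverse_transform(predicted_components)

# Toy axis with step 0.5, so m/z 105.0 falls on channel 10.
channel = mz_to_channel(105.0, mz_min=100.0, step=0.5)
ion_image = reconstructed[:, channel].reshape(5, 10)  # 5 x 10 pixel grid
```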

RESULTS

We trained logistic regression models to predict cancerous regions from MSI PCA components, from regridded H&E deep learning features, and from MSI data predicted from H&E data, as discussed in the Methods. Quantitative results including several metrics are summarized in Table 1. The results are pixel-based, and a class imbalance is present. Across the five tissue samples, on average, 14.00% of pixels belong to cancerous regions, 54.78% belong to noncancerous regions, and 31.22% are image background. These results were achieved with a 5-fold cross-validation. Here, AUC refers to the area under the receiver operating characteristic (ROC) curve, and mIoU is the mean intersection over union, also known as the Jaccard similarity coefficient score. The mIoU measures the size of the intersection divided by the size of the union of the true labels and the predicted labels above a given confidence threshold. The F1 score can be interpreted as a harmonic mean of the precision and recall of labels above a given threshold. The Dice score computes the Dice dissimilarity between predicted and true labels; hence, a lower Dice score is better.
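The overlap metrics can be written out for a single binary labeling as in the sketch below. It is a simplified illustration: it computes the IoU of the positive class only, whereas a mean IoU averages over classes, and `binary_metrics` is a hypothetical helper. For a single binary problem the Dice dissimilarity equals one minus the F1 score.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """IoU, F1, and Dice dissimilarity for thresholded binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # true positives
    fp = np.sum(~y_true & y_pred)   # false positives
    fn = np.sum(y_true & ~y_pred)   # false negatives
    return {
        "iou": tp / (tp + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "dice_dissimilarity": (fp + fn) / (2 * tp + fp + fn),
    }

scores = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

With two true positives, one false positive, and one false negative, the example gives an IoU of 0.5, an F1 of about 0.67, and a Dice dissimilarity of about 0.33.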

Table 1.

Logistic Regression Test Metrics As Mean ± Std Using Five-Fold Cross-Validation for Label Prediction from MSI, H&E, and MSI as First Predicted from H&E

Input	Accuracy	AUC	mIoU	F1	Dice score
MSI	0.85 ± 0.01	0.72 ± 0.01	0.63 ± 0.01	0.59 ± 0.02	0.41 ± 0.02
H&E	0.80 ± 0.02	0.67 ± 0.02	0.55 ± 0.02	0.52 ± 0.03	0.53 ± 0.02
H&E to MSI - 25 PCs	0.67 ± 0.01	0.60 ± 0.01	0.43 ± 0.01	0.38 ± 0.02	0.62 ± 0.02
H&E to MSI - 50 PCs	0.73 ± 0.02	0.63 ± 0.01	0.49 ± 0.02	0.44 ± 0.02	0.56 ± 0.02
H&E to MSI - 100 PCs	0.76 ± 0.02	0.65 ± 0.02	0.53 ± 0.03	0.48 ± 0.03	0.52 ± 0.03
H&E to MSI - 200 PCs	0.80 ± 0.02	0.67 ± 0.02	0.56 ± 0.02	0.52 ± 0.03	0.48 ± 0.03
H&E to MSI - 300 PCs	0.80 ± 0.02	0.66 ± 0.02	0.56 ± 0.02	0.51 ± 0.02	0.48 ± 0.03


From Table 1 we can see that the best prediction is achieved with MSI data. Consistent with a previous report,¹¹ this result shows that PC can be better predicted using chemical information obtained from MSI data. H&E to MSI prediction is shown for 25, 50, 100, 200, and 300 principal components. We get the best result with a minimum of 200 PCs; using more than that does not improve the result. A qualitative version of this H&E to MSI prediction result with different numbers of PCs is shown in Figure S4. The logistic regression performs similarly when we predict labels directly from downscaled H&E features and from MSI components predicted from H&E features, although we get slightly better mIoU and Dice score for the latter (lower Dice score is better). Combining H&E and MSI data, we get 80% accuracy with logistic regression. A qualitative result that corresponds to the quantitative results is shown in Figure 3 for three different tissue specimens. We can group these predictions into three categories. First, for the first tissue specimen in Figure 3A, cancerous regions are revealed well for all three predictions as shown in Figure 3B–D. This is an example of a good prediction. Second, if we look closely at Figure 3B,F, some secondary regions are revealed with the MSI to label prediction. When we look at the prediction for the second tissue specimen directly from H&E in Figure 3G, the secondary region seen with MSI prediction is revealed more. Interestingly, this region is revealed even more when we predict from predicted MSI components as seen in Figure 3H. These secondary regions may be random noise or actual cancerous regions that are missing in the original annotation, and they are subject to further validation. Third, label prediction directly from MSI is best for the third tissue specimen as seen in Figure 3J, but two primary regions are missing in the other two predictions for this specimen as shown in Figure 3K,L. Since we have trained the logistic regression models with limited data, we believe the prediction of cancerous regions with the multimodal approach will improve with more variability in the training data.

Figure 3. — (A), (E), and (I) are original cancer annotations for three different tissue specimens. (B), (F), and (J) show predicted cancer regions from MSI PCA components for the three tissue specimens. (C), (G), and (K) show predicted cancer labels from regridded H&E deep learning extracted features for the three tissue specimens. (D), (H), and (L) show prediction of cancer labels from MSI predictions that are themselves predicted from H&E features. All the predictions in this figure are achieved using logistic regression. The colorbar represents the probability of cancer annotation for each row. Row one shows good agreement between MSI (B) and H&E (C), with a possible false positive region that is not labeled in the H&E (C) or H&E to MSI (D) predicted components. In row two, MSI (F) and H&E (G) perform well, but H&E to MSI (H) underperforms. In row three, MSI (J) significantly outperforms H&E (K), and H&E to MSI (L) partially recovers some of that performance.

Figure 4 shows the results of H&E to MSI prediction for three different tissue specimens. As discussed in the Methods, we used a linear regression model with regridded H&E deep learning extracted features as the input and MSI PCA components as the labels. Out of 200 PCA components, prediction for the component having the highest R² is shown. We achieved an overall R² score of 0.23. In this case, it is expected to not have very high accuracy and R² score since we are predicting a different modality of imaging data. Nevertheless, we can see similar patterns in the predicted MSI images corresponding to the actual MSI images. Prediction for tissue specimen 2 as shown in Figure 4B is slightly better compared to the other two tissue specimens. A single pixel is chosen from the cancerous region for each tissue specimen, and the mass spectra are shown in Figure 4C,F,I. It is evident that there is an intensity mismatch (1.1e8 A.U. on average) between the actual and predicted spectra, but the prediction is clearly able to capture most of the m/z peaks in the actual spectra. Since we interpolated the MSI data in the range of 100 to 1,000 m/z values, the corresponding mass spectra are shown in the same range. Also, because of using PCA components as the labels in this linear regression problem, we ended up having some negative values in the prediction which are clipped to zero in the spectra plots in this figure.
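Selecting the best-predicted component and clipping negative values for plotting can be sketched as below. The arrays are toy values, `r2_per_component` is a hypothetical helper, and how the overall R² was aggregated across components is not stated in the text, so only per-component scores are computed here.

```python
import numpy as np

def r2_per_component(y_true, y_pred):
    """Coefficient of determination for each column (component)."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Rows are pixels, columns are PCA components (toy values).
y_true = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
y_pred = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]])

r2 = r2_per_component(y_true, y_pred)
best = int(np.argmax(r2))  # component with the highest R^2

# Negative values in a reconstructed spectrum are clipped to zero for plots.
spectrum = np.array([-0.2, 0.0, 1.5, -0.1, 3.0])
clipped = np.clip(spectrum, 0.0, None)
```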

From the m/z features we obtained for MSI peak prediction, as discussed in the Methods, we found that the ground truth MSI image of the first tissue specimen at m/z 782.5655, which is identified as phosphatidylcholine PC(34:1) (Δppm = 1.93), reveals a cancerous region as shown in Figure 5B. The predicted MSI for this biomarker appears to identify cancerous regions in Figure 5C when compared with the original tissue annotation in Figure 5A. Interestingly, in Figure 5B, the ground truth MSI captures one of the cancerous regions but misses the other completely, whereas our predicted image in Figure 5C captures the region missing in the ground truth. We found another chemical biomarker, m/z 780.5483, which is identified as cardiolipin CL(80:9) (Δppm = 0.18), that provided a result similar to that seen with the first tissue specimen. We tested and validated with another tissue specimen and achieved satisfactory results, as the cancerous regions can be identified by both the ground truth and predicted MSI images, as shown in Figure 5E,F, respectively. Like the original MSI, the predicted MSI images for these two m/z values also show the same secondary predicted cancer regions that were missing in the original annotation. Figure 5 shows that, in addition to labeling PC, our method provides useful additional information in the form of predicted chemical information. Since our cancer predictions are made directly from inferred chemical signatures, the results are inherently explainable by the chemical signatures at each point that were predicted from patterns detected on the H&E images.
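The Δppm values quoted above express the mass error between an observed m/z and the theoretical m/z of the assigned lipid, in parts per million. A minimal sketch of that calculation follows; the theoretical masses used for the assignments are not given in the text, so the numbers in the example are purely hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Absolute mass error in parts per million."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Hypothetical values: a peak observed 0.001 m/z units above a
# theoretical m/z of 500 is off by about 2 ppm.
example_ppm = ppm_error(500.001, 500.0)
```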

Figure 5. — Comparison of cancer regions prediction directly from H&E with ground truth MSI. (A) and (D) are the original cancer annotation for two different tissue specimens. (B) Ground truth MSI of the first tissue specimen for m/z 782.5655 which is identified as phosphatidylcholine PC(34:1) (Δppm = 1.93). (C) Predicted MSI of the first tissue specimen directly from H&E for the same m/z as ground truth. (E) Ground truth MSI of the second tissue specimen for m/z 780.5483 which is identified as cardiolipin CL(80:9) (Δppm = 0.18). (F) Predicted MSI of the second tissue specimen directly from H&E for the same m/z as ground truth.

DISCUSSION

In this work, we have developed a machine learning approach to detect PC directly from H&E data incorporating chemical information found in MSI data. We found that H&E can predict mass spectra somewhat accurately, indicating a correlation between features visible in optical H&E imaging and the chemical information present in MSI. We also found that prostate cancer regions can be predicted reliably from MSI, outperforming H&E-based prediction, indicating that the mass spectra contain useful information for the segmentation of cancerous regions. However, since MSI data are expensive to acquire and unavailable in a typical pathology lab, direct use of MSI data is infeasible in practice.

Instead, we used paired MSI and H&E data for the five tissue samples to show a proof of principle that such paired data can be useful for training feature extraction networks on H&E data. We verified that the overlapping information between modalities matches what is needed for segmentation by predicting cancerous regions directly from H&E as well as from predicted MSI. Moreover, we found two MSI biomarkers (Figure 5) corresponding to specific masses that correctly identified the cancerous regions. Our prediction using H&E and MSI data was able to identify those regions as well.

As shown in the Results, secondary regions are also identified in some cases using MSI. These regions could be errors due to random noise or cancerous regions which were missing in the original pathology annotation. In future studies, additional validation of these secondary regions would be useful, for example, using immunohistochemistry (IHC) imaging.

Our approach shows the feasibility of using readily available H&E data to predict the rich chemical information available in MSI images. Although our training process requires paired data including both H&E and MSI data from the same samples, the resulting trained models could be relevant in clinical settings where only H&E is available. To date, large public datasets of H&E images have been collected to support automated cancer detection and diagnosis, including manually curated slide-level or pixel-level annotations acquired at great cost. Our results suggest that we can reliably train feature extraction networks for automated H&E-based pathology without the need for time-consuming and expensive manual annotation by expert pathologists, as we achieve similar results with both MSI supervision and pathologist supervision. Even with our small sample set, we find that this approach gives reasonable results to show that PC could potentially be diagnosed accurately with correlating MSI and H&E data. However, we do expect that accuracy of our predictions could be improved in future work by using deep learning with end-to-end training in a semisupervised approach along with large-scale datasets such as Panda,²⁰ which has around 11,000 whole-slide images of digitized H&E-stained biopsies, none of which is paired with MSI. These preliminary results along with a relative lack of MSI in public pathology datasets motivate the collection of larger paired H&E/MSI datasets in the future to support large-scale feature learning efforts for H&E analysis. Overall, we have demonstrated that we could potentially improve PC detection from H&E slides by incorporating MSI data, and that augmentation of labeled H&E datasets with paired MSI data can improve explainability by providing chemical information to support predictions. This lays the groundwork for developing more accurate prediction of PC in patients to improve patient care.

Supplementary Material

NIHMS1921678-supplement-SI.pdf (3.5 MB, PDF)

NIHMS1921678-supplement-2.pdf (812.1 KB, PDF)

ACKNOWLEDGMENTS

This research is supported by the Office of Research and Development, Veterans Health Administration, award MVP017. This publication does not represent the views of the Department of Veteran Affairs or the United States Government. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The authors also acknowledge the support of the larger VA partnership, NIH grants U54-CA210180 (N.Y.R.A.), P41-EB028741 (N.Y.R.A.), and T32EB025823 (S.A.S.).

Footnotes

Complete contact information is available at: https://pubs.acs.org/10.1021/jasms.2c00254

Notes

The authors declare the following competing financial interest(s): N.Y.R.A. is a key opinion leader for Bruker Daltonics and receives support from Thermo Finnegan and EMD Serono. All other authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.

Contributor Information

Md Inzamam Ul Haque, The Bredesen Center, University of Tennessee, Knoxville, Tennessee 37996, United States.

Debangshu Mukherjee, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States.

Sylwia A. Stopka, Department of Neurosurgery and Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States.

Nathalie Y. R. Agar, Department of Neurosurgery and Department of Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, United States.

Jacob Hinkle, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, United States.

Olga S. Ovchinnikova, The Bredesen Center, University of Tennessee, Knoxville, Tennessee 37996, United States.

Data Availability Statement

Code for the experiments and figures in this paper is available at https://github.com/inzamam1190/HEtoMALDI.

REFERENCES

(1) Key Statistics for Prostate Cancer | Prostate Cancer Facts 2021; https://www.cancer.org/cancer/prostate-cancer/about/key-statistics.html.
(2) Rebello RJ; Oing C; Knudsen KE; Loeb S; Johnson DC; Reiter RE; Gillessen S; Van der Kwast T; Bristow RG Prostate cancer. Nature Reviews Disease Primers 2021, 7, 1–27.
(3) Karakiewicz PI; Hutterer GC Predictive models and prostate cancer. Nature Clinical Practice Urology 2008, 5, 82–92.
(4) Aladwani M; Lophatananon A; Ollier W; Muir K Prediction models for prostate cancer to be used in the primary care setting: a systematic review. BMJ Open 2020, 10, No. e034661.
(5) Danciu I; Agasthya G; Tate JP; Chandra-Shekar M; Goethert I; Ovchinnikova OS; McMahon BH; Justice AC In with the old, in with the new: machine learning for time to event biomedical research. Journal of the American Medical Informatics Association 2022, 29, 1737–1743.
(6) Schmidt B; Bhambhvani HP; Fan RE; Kunder C; Kao CS; Higgins JP; Rusu M; Sonn GA External validation of an artificial intelligence algorithm for prostate cancer Gleason grading and tumor quantification. Journal of Urology 2021, 206, e1004–e1004.
(7) Linkon AHM; Labib MM; Hasan T; Hossain M; Jannat M-E Deep learning in prostate cancer diagnosis and Gleason grading in histopathology images: An extensive study. Informatics in Medicine Unlocked 2021, 24, 100582.
(8) Chandramouli S; Leo P; Lee G; Elliott R; Davis C; Zhu G; Fu P; Epstein JI; Veltri R; Madabhushi A Computer Extracted Features from Initial H&E Tissue Biopsies Predict Disease Progression for Prostate Cancer Patients on Active Surveillance. Cancers 2020, 12, 2708.
(9) Rana A; Lowe A; Lithgow M; Horback K; Janovitz T; Da Silva A; Tsai H; Shanmugam V; Bayat A; Shah P Use of Deep Learning to Develop and Analyze Computational Hematoxylin and Eosin Staining of Prostate Core Biopsy Images for Tumor Diagnosis. JAMA Network Open 2020, 3, No. e205111.
(10) Litjens G; Sánchez CI; Timofeeva N; Hermsen M; Nagtegaal I; Kovacs I; Hulsbergen van de Kaa C; Bult P; van Ginneken B; van der Laak J Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 2016, 6, 26286.
(11) Randall EC; Zadra G; Chetta P; Lopez BGC; Syamala S; Basu SS; Agar JN; Loda M; Tempany CM; Fennessy FM; et al. Molecular characterization of prostate cancer with associated Gleason score using mass spectrometry imaging. Molecular Cancer Research 2019, 17, 1155–1165.
(12) Steurer S; Borkowski C; Odinga S; Buchholz M; Koop C; Huland H; Becker M; Witt M; Trede D; Omidi M; et al. MALDI mass spectrometric imaging based identification of clinically relevant signals in prostate cancer using large-scale tissue microarrays. Int. J. Cancer 2013, 133, 920–928.
(13) Andersen MK; Høiem TS; Claes BSR; Balluff B; Martin-Lorenzo M; Richardsen E; Krossa S; Bertilsson H; Heeren RMA; Rye MB; et al. Spatial differentiation of metabolism in prostate cancer tissue by MALDI-TOF MSI. Cancer & Metabolism 2021, 9, 9.
(14) Flatley B; Malone P; Cramer R MALDI mass spectrometry in prostate cancer biomarker discovery. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 2014, 1844, 940–949.
(15) Kurreck A; Vandergrift LA; Fuss TL; Habbel P; Agar NYR; Cheng LL Prostate cancer diagnosis and characterization with mass spectrometry imaging. Prostate Cancer and Prostatic Diseases 2018, 21, 297–305.
(16) Van de Plas R; Yang J; Spraggins J; Caprioli RM Image fusion of mass spectrometry and microscopy: a multimodality paradigm for molecular tissue mapping. Nat. Methods 2015, 12, 366–372.
(17) Vollnhals F; Audinot J-N; Wirtz T; Mercier-Bonin M; Fourquaux I; Schroeppel B; Kraushaar U; Lev-Ram V; Ellisman MH; Eswara S Correlative Microscopy Combining Secondary Ion Mass Spectrometry and Electron Microscopy: Comparison of Intensity–Hue–Saturation and Laplacian Pyramid Methods for Image Fusion. Anal. Chem. 2017, 89, 10702–10710.
(18) Borodinov N; Lorenz M; King ST; Ievlev AV; Ovchinnikova OS Toward nanoscale molecular mass spectrometry imaging via physically constrained machine learning on co-registered multimodal data. npj Computational Materials 2020, 6, 1–8.
(19) Lu MY; Williamson DFK; Chen TY; Chen RJ; Barbieri M; Mahmood F Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 2021, 5, 555–570.
(20) Wang X; Zhang X; Zhu Y; Guo Y; Yuan X; Xiang L; Wang Z; Ding G; Brady D; Dai Q; et al. PANDA: A Gigapixel-Level Human-Centric Video Dataset. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Seattle, WA, 2020; pp 3265–3275.
