PLOS One. 2024 Oct 10;19(10):e0300526. doi: 10.1371/journal.pone.0300526

Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization

Kasun Pathirage 1,2,#, Aman Virmani 1,2,#, Alison J Scott 3, Richard J Traub 4, Robert K Ernst 3, Reza Ghodssi 1,2, Behtash Babadi 1,2,*, Pamela Ann Abshire 1,2,*
Editor: Bardia Yousefi 5
PMCID: PMC11466421  PMID: 39388402

Abstract

Mass spectrometry imaging (MSI) is a powerful scientific tool for understanding the spatial distribution of biochemical compounds in tissue structures. In this paper, we introduce three novel approaches in MSI data processing to perform the tasks of data augmentation, feature ranking, and image registration. We use these approaches in conjunction with non-negative matrix factorization (NMF) to resolve two of the biggest challenges in MSI data analysis, namely: 1) the large file sizes and associated computational resource requirements and 2) the complexity of interpreting the very high dimensional raw spectral data. There are many dimensionality reduction techniques that address the first challenge but do not necessarily result in readily interpretable features, leaving the second challenge unaddressed. We demonstrate that NMF is an effective dimensionality reduction algorithm that reduces the size of MSI datasets by three orders of magnitude with limited loss of information, yielding spatial and spectral components with meaningful correlation to tissue structure that may be used directly for subsequent data analysis without the need for additional clustering steps. This analysis is demonstrated on an MSI dataset from female Sprague-Dawley rats for an animal model of comorbid visceral pain hypersensitivity (CPH). We find that high-dimensional MSI data (∼ 100,000 ions per pixel) can be reduced to 20 spectral NMF components with < 20% loss in reconstruction accuracy. The resulting spatial NMF components are reproducible and correlate well with H&E-stained tissue images. These components may also be used to generate images with enhanced specificity for different tissue types. Small patches of NMF data (i.e., 20 spatial NMF components over 20 × 20 pixels) provide an accuracy of ∼ 87% in classifying CPH vs naïve control subjects. 
This paper presents the data processing methodologies used to produce these results, encompassing novel pipelines for data augmentation to support classifier training, ranking of features according to their contribution to classification, and image registration to enhance tissue-specific imaging.

Introduction

Mass spectrometry imaging (MSI) produces three-dimensional images in which each pixel at an (x, y) location has a corresponding mass spectrum with mass-to-charge (m/z) and intensity axes. Raw MSI datasets can be difficult to interpret due to the sparse and distributed nature of the information, with many tissue characteristics associated with a combination of ions rather than individual ions. Analysis of replicate-powered MSI data presents significant challenges due to their large size, necessitating dimensionality reduction techniques and extraction of features for further analysis.

Most approaches to address dimensionality reduction in MSI do not preserve the inherent physical properties of MSI spectra, namely that MSI spectra are nonnegative. In this work, we explored the use of non-negative matrix factorization (NMF) as a dimensionality reduction technique that preserves spectral nonnegativity and provides a strong correlation with physiological features. Spectral peaks present in the extracted NMF components represent lipid ions present in the tissues.

This research establishes a new way to interpret MSI data that significantly reduces the data size and produces interpretable features, allowing for faster data processing and histological analysis based on MSI-derived features. We describe a data pipeline for extracting interpretable information from MSI data that can be represented compactly while preserving the spectral and spatial interpretability of the compressed MSI data. We have applied our data pipeline to two important applications, biological classification and generation of tissue histology images, to study the viability of our methods.

This paper presents novel approaches for MSI data analysis that build upon existing methods in three distinct ways: 1) by introducing a data augmentation technique that allows the use of NMF components for classification into biological groups using limited training data; 2) by introducing a statistical approach that may be used to extract biologically relevant, class-distinctive latent variables and to rank their contributions to the classification accuracy; and 3) by introducing an image registration technique that enhances the tissue-type specificity and correlation with H&E-stained images. The approaches are demonstrated on an MSI dataset for a rodent model of chronic visceral pain.

Background and related research

This paper builds on existing techniques in mass spectrometry imaging, H&E staining, data compression, and data classification, which are briefly summarized below.

Mass spectrometry imaging

Mass spectrometry imaging (MSI) is an analysis technique that generates a spatial distribution of ions and abundances in a given sample and can be used for a variety of molecular targets. Several MSI processes use different ionization techniques, with the most widely used being matrix-assisted laser desorption ionization (MALDI), desorption electrospray ionization (DESI), and secondary ion mass spectrometry (SIMS), and their uses are well reviewed [1–3].

The spatial aspect of MSI makes it possible to obtain anatomical images of any ion detected in the mass spectra in a given experiment. MSI has been widely used to map diverse analytes, but it is particularly effective in analyzing lipids [4, 5] and has been used to map lipids and lipid fine structure in brain tissue [6], simultaneously map host and bacterial lipids [7], liposomal drug distribution [8, 9], and cancer [10] in tissues.

H&E staining

H&E staining is a two-dye staining technique that is commonly used to evaluate tissues [11, 12]. It is also used in tandem with mass spectrometry [13]. The differential properties of the two dyes, hematoxylin and eosin, enhance the contrast of tissue features when observed under a microscope. Hematoxylin stains genetic material a blue-purple color, highlighting structures such as ribosomes and chromatin within the nucleus. Eosin stains cytoplasmic structures, highlighting cytoplasm, cell wall, collagen, and connective tissue in varying shades of pink [14]. H&E staining helps to discriminate between different types of cells and tissues and provides an important tool to understand the patterns, shapes, and arrangement of cells in a tissue sample [15, 16]. However, the evaluation of H&E-stained tissue still relies on the expertise of a trained pathologist or histologist; this process can be tedious and time-consuming, and there are abundant examples of similar pathologies that are not well resolved using H&E alone. The development of automated image segmentation to rapidly isolate regions of interest from standard stainings is an active area of interest.

Data compression

MSI datasets can be very large (∼ GB) depending on factors including spectral range, spatial sampling, and density of spectral data collection, with the potential for millions of ions to be represented at each location in a tissue sample. Dimensionality reduction techniques simplify the analysis of such datasets by representing MSI data compactly with minimal loss in information. Verbeeck et al. describe several unsupervised machine-learning approaches for MSI data analysis [17]. They compare principal components analysis (PCA) and NMF as dimensionality reduction techniques for MSI data, assessing the interpretability of extracted features using a synthetic dataset with known composition. They further report the ability of NMF to extract anatomically relevant regions in brain tissue imaged with MALDI MSI. Nijs et al. compared several dimensionality reduction algorithms including NMF, PLSA, LDA, and KL NMF, and found that NMF provides the best fit overall for MSI data [18]. Paine et al. were able to identify different compounds in cancer tissue from NMF spectra [19], establishing that NMF yields meaningful spectral components with peaks attributable to compounds present in the sample. Another important characteristic of the spatial components produced by NMF is their strong spatial correlation with anatomical tissue structure, which makes it possible to produce segmented views of tissue features. Trindade et al. used the spatial distributions of NMF components to differentiate similar but distinct resin types [20].

MSI data processing

Several authors have reported data processing methods to extract and interpret information from MSI data. These methods involve clustering of spectral and spatial features to extract tissue characteristics as well as annotation of metabolites of interest. Many methods exist to interpret the spatial information in MSI data, such as k-means, Gaussian mixture models (GMM), and t-SNE. Clustering on MSI data is computationally expensive and is usually preceded by dimensionality reduction. Prasad et al. evaluated several clustering methods using both real and synthetic MSI data and found that clustering performance decreased with increasing complexity, and that data compression prior to clustering improved the performance [21]. In our analysis based on novel visceral pain data, we found that performing NMF across multiple tissue samples inherently produced meaningful spatial distinction of the components without the need for explicit clustering.

Recent data processing approaches have reported new ways to incorporate either additional spectral or spatial features into MSI data after dimensionality reduction to extract interpretable information. Smets et al. incorporated spectral information in addition to spatial information by adding prioritization of selected m/z values to uniform manifold approximation and projection (UMAP) spatial embeddings [22]. Smets et al. have also reported an approach to combine molecular data from multiple UMAP spatial embeddings with histology data by creating low-dimensional 3D representations of RGB images which are fused using an adjustable parameter based on H&E data [23]. We have observed that NMF-compressed data inherently produces meaningful and interpretable features reflecting histology as well as spectral information and in this work have explored spectral and spatial representations based on linear combinations of NMF features.

Zhang et al. report a method that uses patches for data augmentation for training an ML model for subsequent dimensionality reduction and clustering [24]. The method presented in our paper also uses patches for data augmentation, but in this case for training a classifier. The distinction is that the patches in our paper have already passed through a dimensionality reduction algorithm (i.e., NMF), whereas the Zhang et al. patches are taken directly from the raw MSI data and are used to train a dimensionality reduction algorithm. Unlike in this work, the Zhang et al. methodology allows spatially overlapping patches, which is suitable for training a dimensionality reduction algorithm but would introduce bias into the training of a classifier.

SVM classification

Support vector machines (SVM) are a class of supervised machine learning algorithms that are mostly used for classification and regression problems [25]. SVM is widely used in the data analysis of biological and other sciences [26]. SVM operates by finding a decision boundary with the maximum margin, i.e., one that is farthest away from all classes. The decision boundary in general can be quantified over a higher dimensional space than the ambient space of the features, giving rise to Kernel SVM, in which the kernel defines the high-dimensional feature mapping. Examples of such kernels are the Radial Basis Functions (RBF) and polynomial kernels [25]. When the kernel is the identity mapping, the resulting SVM is known as linear SVM. For linear SVM, the classifier is equivalent to a linear combination of the features that is discriminative of the classes. While Kernel SVM typically achieves higher classification accuracy [25], it results in more complex and often less interpretable models. Linear SVM, on the other hand, generates simpler models whose weights may be used to identify the latent features that contribute to the classifier’s performance [27].
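The contrast between linear and kernel SVM can be illustrated with a minimal sketch. The synthetic data, the feature count, and the variable names below are assumptions made for illustration only and are not taken from this study; the sketch simply shows that a linear SVM exposes one weight per feature, whereas an RBF-kernel SVM does not.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical toy data: two classes that differ only in feature 0.
X = rng.normal(size=(200, 5))
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 3.0

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# A linear SVM has one weight per input feature (coef_); a larger magnitude
# suggests a larger contribution to the decision boundary.
weights = np.abs(linear_svm.coef_.ravel())
top_feature = int(np.argmax(weights))

# An RBF-kernel SVM has no coef_ attribute, so its decision function cannot
# be read off as a weighted sum of the original features.
```

In this toy setting the largest weight falls on the only informative feature, which is the property that motivates using linear SVM weights for feature interpretation.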

Methodology

Ethics statement

This study was carried out in strict accordance with the recommendations in the guide for the care and use of laboratory animals of the National Institutes of Health and the guide for the use of laboratory animals by the International Association for the Study of Pain. The protocol was approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Maryland, Baltimore (Protocol Number: 0220020).

Animal model

Nociplastic pain describes chronic pain conditions that are not due to injury or disease (e.g., temporomandibular disorder (TMD), irritable bowel syndrome (IBS), fibromyalgia, migraine headache). Human patients often experience two or more conditions resulting in comorbid or chronic overlapping pain conditions (COPCs). Stress modulates colonic pain through activation of the hypothalamic-pituitary-adrenal (HPA) axis and the sympathoadrenal medullary (SAM) axis evoking the release of inflammatory mediators sensitizing colonic afferents. This leads to the hypothesis that the transition from normal sensory processing in the GI tract to chronic visceral pain involves changes in metabolic processing in the colon. In animals, orofacial inflammation followed by stress results in chronic visceral hypersensitivity modeling pain in patients with TMD and IBS [28, 29]. Using this comorbid pain hypersensitivity (CPH) model in female rats, colon tissue was collected at a period of heightened visceral hypersensitivity.

Female rats (Envigo; 10 weeks old at arrival at University of Maryland, Baltimore, animal facility) were acclimated to the animal facility for one week. Naïve rats (n = 4) were left in their home cage under normal husbandry conditions for 3 weeks. Rats were then euthanized by CO2 asphyxiation followed by decapitation and tissue harvest. Following one week of acclimation, CPH rats (n = 4) were briefly sedated with isoflurane, and Complete Freund’s Adjuvant (CFA; Sigma-Aldrich, F5881; 50 μL, 1:1 in saline) was injected into both masseter muscles. Starting the following day restraint stress was produced by placing rats in Broome-style rodent restrainers (4.8 cm diameter, 20 cm length) preventing movement for 2 hrs per day for 4 consecutive days. Rats were tilted at a 45-degree angle head up or head down in 15-minute blocks alternating with 15-minute blocks in the horizontal position. Two weeks after the last stress session, rats were subject to colorectal distention (3 trials of 20, 40, 60 mmHg distention, 20 sec each, 3 min interstimulus interval). Rats were subsequently euthanized by CO2 asphyxiation followed by decapitation and tissue harvest.

Tissue preparation

Colons were collected from naïve and CPH groups (n = 4 ea.) from cecum to anus and placed into a petri dish with room temperature porcine gelatin in endotoxin-free water (2% w/v; Sigma G1890). Gelatin solution was injected (∼200 μL) at five evenly distributed points along the colon length using a 21G needle. The colons were split along the mesenteric line and fecal material was removed. The anal junction was grasped with two flat wooden toothpicks and rolled, lumenal side inward, toward the cecal junction. The colon rolls were then placed upright on a foil boat, float-frozen on a pool of liquid nitrogen, sealed and stored at –80°C before sectioning. Colon tissues were removed, prepared, and frozen in less than five minutes. Serial cryosections were collected on a Leica CM1950 (Leica Biosystems) starting from at least 1/3 of the cross-sectional depth of the rolled tissue at 12 μm thickness and thaw-mounted on indium tin oxide (ITO) glass microscope slides (Delta Technologies). This preparation orients the proximal colon on the outer rings of the roll and the distal colon in the center. Slides were stored at –80°C prior to data collection. At the time of analysis, glass slides were placed in a vacuum desiccator to thaw (less than five minutes total). An orientation light scan was collected on a flatbed scanner.

Mass spectrometry imaging and staining

Sections were coated with norharmane (NRM) matrix solution of 7 mg/mL in 2:1 (v:v) chloroform:methanol using an HTX M5 Matrix Sprayer (HTX Technologies, NC). The following matrix application settings were used: 10 passes, 10 psi, 2 L/min nitrogen gas, 30 °C nozzle, 40 mm height, 0.1 mL/min, 1200 mm/min velocity, CC pattern, and 2.5 mm track spacing. Data were collected on a Bruker timsTOF flex (Bruker Daltonics) instrument in negative ion mode from m/z 600–2000. The instrument was calibrated to the Agilent ESI peptide standard mix resulting in a sub-ppm standard deviation calibration. The MALDI laser was operated in the M5 small setting with 16 μm × 16 μm beam scan resulting in 50 μm spatial resolution. Data used in this work were collected using MALDI negative mode MSI since it has a wide range of molecular masses, and negative ion mode lipid data provides excellent reproduction of the details of tissue structures [30].

Raw data were individually imported into SCiLS Lab software [31] as centroided data on loading and the individual files were exported to the common data format, imzML [32] for further analysis. Following MSI, the matrix was cleared with two consecutive dips (10 seconds each) in 70% ethanol and tissue stained with H&E as previously described [30]. Slides were cleared in xylene and permount was used to attach coverslips. Optical images were collected on an Aperio slide scanner (Leica Biosystems) at 20x magnification and images were exported in eps format from Leica’s ScanScope software.

Datasets

Eight MSI datasets, along with their respective H&E-stained images, were generated using tissue samples from the 4 CPH and 4 naïve animals. The 8 datasets collectively will be referred to as the ‘data cohort’, while the term ‘dataset’ will refer to the MSI data corresponding to a single tissue sample. Fig 1 shows ion images from the 8 datasets at m/z 885.5, with samples from CPH animals shown on the left, and naïve control animals shown on the right. The chosen ion at m/z 885.5 is suspected to be the lipid ion 1-stearoyl, 2-arachidonyl-phosphatidylinositol (SAPI), which is known to play a role in pain hypersensitivity in animals [7]; however, the identity of this lipid candidate has not been confirmed.

Fig 1. The biological replicates.

(a) Mass-spectrum image at m/z 885.5, and (b) the H&E-stained image for each biological replicate. The CPH and naïve datasets are shown as groups on the left and right, respectively.

Data processing pipeline

The Python programming language running on a Dell Precision 5820 (Intel Core i9 10900X CPU with 20 cores) with 256 GB system memory was used to process and analyze the data. The Python library pyimzML [33] was used to parse the data from imzML format to the computer memory. Fig 2 summarizes the data processing pipeline which is explained in the steps below.

Fig 2. Overview of the data pipeline.

(a) Raw data is binned, truncated, and normalized. (b) The 3D image-spectral datasets are flattened and stacked into 2D matrix form. (c) Computation of NMF corresponding to spatial-spectral decomposition. (d) NMF features are used for further processing including classification and histological analysis.

  • Step 1: Data binning. This experiment resulted in large individual datasets (∼ 13 GB per dataset), which are therefore saved in a sparse file format. Binning, as the first pre-processing step, offers a two-fold advantage. First, it allows us to down-sample the data to a lower resolution to make the computations much faster in the subsequent steps. Second, it enables matrix calculations that are needed later, by transforming the data into uniform and equally spaced m/z bins. This is shown in Fig 3.

    The sparse nature of MSI datasets allows for spectral binning with minimal information loss. We used a bin width of 0.05 Da and maximum peak intensity (ion abundance) within a bin to represent the bin intensity. Binning at the 0.05 Da bin width reduces the spectral dimension from 100,000 to 28,000 features.

    The binned data was stored in memory in a 3-dimensional array of size ($A_d \times B_d \times M$) per dataset. Here, $A_d$ and $B_d$ are the number of pixels in the horizontal and vertical dimensions for a given dataset d, which form an ‘ion image’ for each detected m/z value. A dataset can therefore be understood as a stack of M images, each with a size of $A_d \times B_d$ pixels. Binning leads to M being consistent for every pixel p in every dataset in the cohort. It should however be noted that the $A_d$ and $B_d$ values are different for each dataset d. This is because the tissue samples from different animals may take different physical shapes and sizes.

  • Step 2: Truncation. Although each spectrum ranged from 600 Da to 2000 Da, we found that the spectra became much sparser beyond 1100 Da. This value corresponds to the upper limit of the typical phospholipid mass range, so the spectra were truncated at 1100 Da. This truncation further reduced the spectral dimension from 28,000 to 10,000 features.

  • Step 3: Normalization. The binned data is subsequently normalized based on the total ion current (TIC) measure using the formula in Eq 1.
    $$\tilde{I}^{d}_{x,y,s} = \frac{I^{d}_{x,y,s}}{\sum_{s'=1}^{M} I^{d}_{x,y,s'}} \qquad (1)$$
    Here $I^{d}_{x,y,s}$ and $\tilde{I}^{d}_{x,y,s}$ are the raw and TIC-normalized intensities of the sth bin center (m/z value) at the (x, y)th pixel location of the dth dataset, respectively, and M is the total number of bins at each pixel location (x, y).

    TIC normalization is an essential part of the pipeline. The sum of intensity values $\sum_{s=1}^{M} \tilde{I}^{d}_{x,y,s}$ for a TIC-normalized spectrum at pixel location (x, y) adds up to 1.

  • Step 4: Dimensionality reduction. After preprocessing the datasets as described in steps 1–3, we perform dimensionality reduction separately with NMF and with PCA. The sub-steps below are described for NMF, but many of the same considerations apply to PCA.
    1. Step 4a: Flattening and stacking each dataset. We first flatten the 3-dimensional datasets of size ($A_d \times B_d \times M$) into 2D datasets of size ($A_d B_d \times M$), and stack them along the combined (xy) spatial axis for input to the NMF algorithm. This ensures that NMF finds basis vectors that are common to all datasets. Since NMF does not change the data order along the rows, we are able to separate the low-dimensional output corresponding to each dataset from the stacked output.
    2. Step 4b: Computing the NMF spatial and spectral features. Stacking of the datasets leads to a combined data array I of dimensions ($\tilde{N} \times M$), where $\tilde{N} = \sum_{d=1}^{8} A_d \times B_d$ is the total number of pixels in all datasets in the cohort. We performed NMF on this combined data array, reducing the dimensionality M from 10,000 to 20. This reduced number of dimensions is denoted by m.
      The NMF algorithm compresses raw MSI data as given in Eq 2.
      $$\tilde{I}^{d}_{x,y,s} = \sum_{j=1}^{m} Z^{d}_{x,y,j} \cdot \Psi_{s,j} + E^{d}_{x,y,s} \qquad (2)$$
      where the reduced dimension representation for $\tilde{I}^{d}_{x,y,s}$ is defined at spectral bin s (m/z) and 2D spatial location (x, y) for the dth tissue sample (with d = 1, 2, …, D), $Z^{d}_{x,y,j}$ is the jth spatial NMF component at location (x, y), $\Psi_{s,j}$ is the jth spectral NMF component at m/z bin s, and $E^{d}_{x,y,s}$ is the residual error that cannot be captured by the NMF decomposition. Note that the spectral NMF component is sample-independent to account for the population-level spectral composition, whereas the spatial component is sample-dependent to account for the sample-specific spatial variations. The NMF components are estimated by minimizing
      $$\sum_{d=1}^{D} \sum_{x,y,s} \left| \tilde{I}^{d}_{x,y,s} - \sum_{j=1}^{m} Z^{d}_{x,y,j} \cdot \Psi_{s,j} \right|^{2} \qquad (3)$$
      subject to the non-negativity constraints $Z^{d}_{x,y,j} \ge 0, \forall x, y, j$ and $\Psi_{s,j} \ge 0, \forall s, j$ [34, 35], in which the optimization problem in Eq 3 is typically solved using iterative methods.
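Steps 1–3 above (binning, truncation, and TIC normalization) can be sketched for a single pixel spectrum as follows. This is a minimal illustration rather than the code used in this study: the function names and the handling of empty spectra are assumptions, while the bin width (0.05 Da), the acquisition range (600–2000 Da), the truncation point (1100 Da), and the use of the maximum intensity within each bin follow the text.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min=600.0, mz_max=2000.0, width=0.05):
    """Bin one centroided spectrum, keeping the maximum intensity per bin."""
    n_bins = int(round((mz_max - mz_min) / width))  # 28,000 bins for these defaults
    binned = np.zeros(n_bins)
    idx = np.floor((np.asarray(mz, dtype=float) - mz_min) / width).astype(int)
    keep = (idx >= 0) & (idx < n_bins)
    # np.maximum.at applies an unbuffered maximum, so several peaks that fall
    # in the same bin are handled correctly.
    np.maximum.at(binned, idx[keep], np.asarray(intensity, dtype=float)[keep])
    return binned

def truncate(binned, mz_min=600.0, mz_cut=1100.0, width=0.05):
    """Keep only the bins below mz_cut (10,000 bins for these defaults)."""
    return binned[..., : int(round((mz_cut - mz_min) / width))]

def tic_normalize(spectra):
    """Divide each spectrum by its total ion current (Eq 1).

    All-zero spectra are left as zeros; this choice is an assumption."""
    spectra = np.asarray(spectra, dtype=float)
    tic = spectra.sum(axis=-1, keepdims=True)
    return np.divide(spectra, tic, out=np.zeros_like(spectra), where=tic > 0)
```

Note that in this sketch normalization is applied after truncation, following the order of the steps as listed; the bin counts of 28,000 and 10,000 follow directly from 1400 Da / 0.05 Da and 500 Da / 0.05 Da.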

Fig 3. MSI spectra before and after binning.

Original MSI spectra are binned to bins of width 0.05 m/z to create uniformly spaced peaks. The bin size of 0.05 m/z sufficiently preserves the spectral resolution of the MSI data as can be seen through the insets.

This reduced feature space contains 20 basis vectors (spectral NMF components; Ψ) and 20 low-dimensional features (spatial NMF components; Z). The portion of each spatial NMF component ($Z^{d}_{\cdot,\cdot,j}$) corresponding to each dataset d can be reshaped into an image describing the spatial distribution of lipid ions contained in its corresponding spectral component. The output of the NMF algorithm is the transformed MSI data with 20 component spectra and 20 spatial intensity maps.
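The flattening, stacking, factorization, and reshaping described in Steps 4a and 4b can be sketched with scikit-learn as shown below. The array sizes and random data are toy stand-ins chosen for illustration (the study uses M = 10,000 and m = 20), and the variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
M, m = 50, 3                      # toy sizes; the paper uses M = 10,000, m = 20
shapes = [(6, 5), (4, 7)]         # hypothetical (A_d, B_d) per dataset
datasets = [rng.random((a, b, M)) for a, b in shapes]

# Step 4a: flatten each dataset to (A_d * B_d, M) and stack along the pixel axis.
flat = [d.reshape(-1, M) for d in datasets]
stacked = np.vstack(flat)

# Step 4b: a single factorization shared by all datasets.
model = NMF(n_components=m, max_iter=500, random_state=0)
Z = model.fit_transform(stacked)  # spatial components, shape (N_tilde, m)
Psi = model.components_           # spectral components, shape (m, M)

# Row order is preserved, so each dataset's rows can be split off again
# and reshaped into m spatial intensity maps.
offsets = np.cumsum([f.shape[0] for f in flat])[:-1]
maps = [z.reshape(a, b, m) for z, (a, b) in zip(np.split(Z, offsets), shapes)]
```

scikit-learn stores the spectral components with shape (m, M), which is the transpose of the $\Psi_{s,j}$ indexing used in Eq 2.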

The number of NMF components was selected to be 20 based on the normalized residual reconstruction error as shown in Fig 4. The normalized reconstruction error is defined as the sum of the squared difference between the binned MSI data and its NMF reconstruction, normalized by the sum of squares of the binned MSI data. It falls quickly for the first few NMF components and then more gradually, with the reconstruction error falling under 20% for 20 NMF components. Results in the remainder of the paper are presented for 20 NMF components.
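The normalized reconstruction error defined above can be computed as in the following sketch, which sweeps the number of components on a small synthetic matrix. The toy data, the helper name `normalized_error`, and the particular component counts are assumptions for illustration; only the error definition (sum of squared residuals divided by the sum of squares of the data) follows the text.

```python
import numpy as np
from sklearn.decomposition import NMF

def normalized_error(X, Z, Psi):
    """Sum of squared residuals divided by the sum of squares of the data."""
    return np.sum((X - Z @ Psi) ** 2) / np.sum(X ** 2)

rng = np.random.default_rng(0)
# Hypothetical non-negative data of rank 5 standing in for binned MSI data.
X = rng.random((60, 5)) @ rng.random((5, 40))

errors = {}
for k in (1, 3, 5):
    model = NMF(n_components=k, max_iter=1000, random_state=0)
    Z = model.fit_transform(X)
    errors[k] = normalized_error(X, Z, model.components_)
```

Plotting such errors against the number of components gives a curve of the kind used to select the number of components.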

Fig 4. Performance for NMF and PCA data compression in classification and data reconstruction.

(a) The relationship between SVM classification accuracy and the width of patches extracted from the NMF spatial intensity maps. The gray dotted line shows the patch width of 20 × 20 pixels used to generate the results presented in this paper. (b) The relationship between reconstruction error (normalized root-mean-squared error as a percentage) and the number of PCA/NMF components. The gray dotted line marks the normalized error of 20%.

We used the NMF algorithm available with the scikit-learn package for Python [36]. 6000 iterations with the default parameters were required for convergence. The sklearn NMF library does not sort NMF components according to any measure. Therefore, we use a backward elimination technique to rank them based on their contribution to the reconstruction of the binned MSI data. Starting with the full set of 20 NMF components, we remove one component and calculate the residual reconstruction error between the binned MSI data and its reconstruction using the remaining NMF components. The component that leads to the highest residual reconstruction error when removed is ranked as the most important component for the reconstruction task and is designated as component 0. This process is repeated until only 1 NMF component is left, which is the least important component for reconstruction and is designated as component 19.
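One possible reading of this backward elimination procedure is sketched below: in each round, the remaining component whose removal yields the highest residual error is ranked next and then dropped. The function name and the use of an unnormalized squared error are assumptions, and the sketch is not the implementation used in this study.

```python
import numpy as np

def rank_by_backward_elimination(X, Z, Psi):
    """Return component indices ordered from most to least important.

    X: data (n_pixels, M); Z: spatial components (n_pixels, m);
    Psi: spectral components (m, M), as returned by scikit-learn."""
    remaining = list(range(Z.shape[1]))
    ranking = []
    while len(remaining) > 1:
        errs = []
        for j in remaining:
            keep = [k for k in remaining if k != j]
            errs.append(np.sum((X - Z[:, keep] @ Psi[keep]) ** 2))
        # The component whose removal hurts reconstruction the most
        # is the most important among those still remaining.
        most_important = remaining[int(np.argmax(errs))]
        ranking.append(most_important)
        remaining.remove(most_important)
    ranking.extend(remaining)  # the last component left is the least important
    return ranking
```

The position of a component in the returned list corresponds to its designated number (0 for the most important component).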

Classification pipeline

This subsection describes the methodology used to train a support vector machine (SVM) classifier to distinguish between CPH and naïve data. High classification accuracy is one of the necessary conditions for the presence of ‘pain-related metabolites’ in CPH animals, and the absence of such in naïve animals. However, it should be noted that this is not a sufficient condition for the hypothesis to be considered true. The steps in the pipeline are described below.

  • Step 1: Spatial reconstruction. In the dimensionality-reduction step, all the datasets were concatenated into a single array. To perform the classification, the datasets need to be separated and labeled as being from a CPH dataset or a naïve one. After separating and labeling the data, the NMF spatial intensity map for each dataset d was reconstructed by reshaping the data into a 3-dimensional array of size ($A_d \times B_d \times m$). It is important to notice the resemblance this has to the initial binned, truncated, and normalized 3-dimensional data array $\tilde{I}^{d}_{x,y,s}$ mentioned in the data processing pipeline above. The key difference is that the depth dimension has now been reduced from 10,000 to 20. This modified data cohort will henceforth be referred to as the ‘compressed data cohort’.

  • Step 2: Data augmentation The 8 datasets in the compressed data cohort in their raw form would only contribute 8 labeled data samples for the classification task. Data augmentation was therefore required to prevent overfitting of the SVM classifier. As shown in Fig 5, augmentation was achieved by redefining a data sample as a spatially cropped version of a dataset from the compressed data cohort. Each dataset was spatially divided into a grid of non-overlapping patches with each patch being of size (20 × 20) pixels. All patches that were off-tissue were discarded from the augmented dataset. This defines a data sample to be a 3-dimensional array of size (20 × 20 × m), where m is the number of components in the feature space after dimensionality reduction and takes a value of 20. This approach makes the implicit assumption that the metabolites leading to pain in CPH animals are distributed throughout the entire tissue area.

    The 3-dimensional data samples were flattened into 1-dimensional vectors of size (8000 × 1) in preparation to be fed into an SVM classifier. The data augmentation step generated a total of 2,000 labeled sample vectors.

  • Step 3: Data split. 80% of the data patches were used to train the classifier, and the remaining 20% were used as a testing set. Both the training and testing sets were balanced such that there were equal numbers of samples for the two classes, CPH and naïve.

  • Step 4: SVM classification. Binary classification was performed with the positive class corresponding to CPH data and the negative class corresponding to naïve data. A high classification accuracy would therefore establish a necessary (but not sufficient) condition to infer that the CPH animals had certain features in their MSI data that correlated with their hypersensitivity to pain. We used the Support Vector Classifier (SVC) module of the Python sklearn library [36], evaluating both linear and radial basis function (RBF) kernels. We used 5-fold cross-validation, implemented with the GridSearchCV module of the Python sklearn library, to tune the hyperparameters of the SVM algorithm.
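The patch-based augmentation in Step 2 can be sketched as follows. The tissue mask, the `min_tissue_fraction` threshold, and the function name are hypothetical: the text states only that off-tissue patches were discarded, so the exact criterion used here is an assumption.

```python
import numpy as np

def extract_patches(maps, tissue_mask, size=20, min_tissue_fraction=1.0):
    """Cut non-overlapping size x size patches and flatten each to a vector.

    maps: (A, B, m) spatial NMF intensity maps for one dataset.
    tissue_mask: (A, B) boolean array, True where the pixel is on tissue
    (hypothetical input; how the mask is obtained is not specified here)."""
    A, B, m = maps.shape
    patches = []
    for r in range(0, A - size + 1, size):
        for c in range(0, B - size + 1, size):
            on_tissue = tissue_mask[r:r + size, c:c + size].mean()
            if on_tissue >= min_tissue_fraction:
                patches.append(maps[r:r + size, c:c + size].ravel())
    # Each row is one labeled sample of length size * size * m
    # (8000 for 20 x 20 pixels and 20 components).
    return np.array(patches).reshape(-1, size * size * m)
```

Because the grid is non-overlapping, no pixel contributes to more than one sample, which is the property that distinguishes this augmentation from overlapping-patch schemes.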

Fig 5. SVM classification and data augmentation methodology.

20×20×20 patches of NMF data sampled from different regions of labeled CPH and naïve datasets were used to train and test the SVM classifier.
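The data split and hyperparameter tuning in Steps 3 and 4 can be sketched with scikit-learn as shown below. The synthetic feature matrix and the hyperparameter grid are assumptions for illustration; the grid values searched in this study are not given in the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-in for flattened patch vectors (the real ones have 8000 entries).
X = rng.random((200, 50))
y = np.repeat([0, 1], 100)        # 0 = naive, 1 = CPH
X[y == 1, :5] += 0.5              # make a few features class-dependent

# 80/20 split; stratifying on y keeps both sets balanced between classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid covering both kernel types (values are assumptions).
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
]
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
```

After fitting, `search.best_params_` reports the selected kernel and hyperparameters, and `test_accuracy` is evaluated on patches that were not used for tuning.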

Ranking NMF components according to their contribution to classification accuracy

One of the goals of this study is to test the hypothesis that there exist certain lipid ion features, or a cluster of features, in CPH animals that correlate with sensitivity to pain. This hypothesis is tested with the results of the SVM classification. We identified the latent variables that support this hypothesis by evaluating which NMF components were most important in discriminating between CPH and naïve animals. We call these NMF components “the candidate list” as their spectra may contain lipid ions that are associated with pain hypersensitivity.

The data cohort used in this study contains images from 8 different animals with natural differences in the size of the tissue segments, resulting in different spatial sizes for the MSI ion images as well as the NMF spatial intensity maps. Images of different sizes produce different numbers of patches, and including all patches will bias the classification toward animals with larger MSI ion images. To overcome this bias, we developed a statistical sampling methodology, as follows:

  1. Determine the dataset with the smallest number of patches, P.

  2. Randomly select P image patches from each dataset.

  3. Use 5-fold cross-validation to train and test N SVM classifiers, each using only the intensity map corresponding to a single NMF component. This results in N values for mean cross-validation classification accuracy.

  4. Return to step 2, selecting a different random set of patches for each of the non-minimum size images.

  5. Repeat 50 times to obtain a distribution of classification accuracies for each of the N components.
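The sampling procedure above can be sketched as follows. This is an illustrative outline, not the authors' code: the function name `per_component_accuracy`, the array layout, the linear kernel, and the synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def per_component_accuracy(datasets, labels, n_repeats=50, seed=0):
    """datasets: list of (n_patches_i, patch_pixels, N) arrays, one per animal.
    labels: one class label per dataset (1 = CPH, 0 = naive).
    Returns an (n_repeats, N) array of mean 5-fold accuracies."""
    rng = np.random.default_rng(seed)
    p = min(d.shape[0] for d in datasets)        # step 1: smallest patch count P
    n_comp = datasets[0].shape[2]
    acc = np.empty((n_repeats, n_comp))
    for rep in range(n_repeats):                 # steps 4-5: redraw and repeat
        # Step 2: draw P patches without replacement from every dataset.
        picks = [d[rng.choice(d.shape[0], size=p, replace=False)] for d in datasets]
        X = np.concatenate(picks)
        y = np.repeat(labels, p)
        for k in range(n_comp):                  # step 3: one classifier per component
            clf = SVC(kernel="linear")           # kernel choice is an assumption
            acc[rep, k] = cross_val_score(clf, X[:, :, k], y, cv=5).mean()
    return acc


# Synthetic example: four animals with unequal patch counts, three components.
rng = np.random.default_rng(1)
sizes, labels = [30, 45, 38, 52], [1, 1, 0, 0]
datasets = []
for n, lab in zip(sizes, labels):
    d = rng.random((n, 25, 3))                   # 5 x 5 patches, 3 components
    d[:, :, 0] += lab                            # make component 0 informative
    datasets.append(d)
acc = per_component_accuracy(datasets, labels, n_repeats=3)
print(acc.mean(axis=0))
```

Because every animal contributes exactly P patches in each repeat, no single large image dominates the training set.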

To rank the 20 NMF components, we use an iterative approach. During the first iteration, N takes the value 20, i.e., the total number of NMF components. We perform two-sample unpaired t-tests to evaluate whether the mean of each distribution differs significantly from that of the distribution with the highest mean [37]. We select the component with the highest mean, together with any other components whose mean accuracies are statistically similar to it, and append them to the candidate list. Let us now assume that the candidate list has K candidate components at the end of this first iteration. During the next iteration, we repeat steps 1–4 above on the candidate list, augmented by single NMF components that are not yet selected (i.e., N now takes a value of 20 − K). This process is repeated until the stopping criterion is reached, which in this case was to expand the candidate list until the components in the candidate list alone can produce a classification accuracy of at least 75% (i.e., 25% above chance level). This procedure generates a list of components that are ranked according to their impact on classification accuracy. This process is computationally intensive, so we used the University of Maryland supercomputing cluster to accelerate the computations through parallelization of the repeat calculations.
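The selection step within one iteration might look like the sketch below. The significance level of 0.05, the function name `select_candidates`, and the toy accuracy values are assumptions for illustration; the text specifies only that two-sample unpaired t-tests are used against the best-performing distribution.

```python
import numpy as np
from scipy.stats import ttest_ind


def select_candidates(acc, alpha=0.05):
    """acc: (n_repeats, N) accuracies, one column per component.
    Returns the best component plus every component whose mean accuracy
    is not significantly different from it."""
    best = int(np.argmax(acc.mean(axis=0)))
    chosen = [best]
    for k in range(acc.shape[1]):
        if k == best:
            continue
        _, p_value = ttest_ind(acc[:, best], acc[:, k])   # two-sample unpaired t-test
        if p_value >= alpha:                              # cannot reject equal means
            chosen.append(k)
    return sorted(chosen)


# Toy accuracies: columns 0 and 1 are nearly identical, column 2 is clearly worse.
base = np.linspace(0.78, 0.82, 50)
toy_acc = np.column_stack([base + 0.001, base[::-1], base - 0.2])
print(select_candidates(toy_acc))
```

In the full procedure, the selected indices would be appended to the candidate list and the sampling repeated with the remaining components until the stopping criterion is met.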

Image registration pipeline

Generating biological and mechanistic insight from untargeted spatial ‘omic information in raw MSI data is generally a difficult task due in part to its high dimensional nature. Further, tissue annotation is a time-consuming step that requires expert-level evaluation for complex pathology. In MSI, each ion will have its corresponding ion image. However, not all ions will have a distinctive spatial structure. In contrast, an H&E-stained image contains a multitude of anatomical features with fine spatial resolution, but without well-defined mapping to different tissue types due to the limited specificity available from shades of the two stains applied to the tissue.

Although H&E staining and MSI are separate modalities carrying separate types of information, their complementary nature enables augmenting the anatomical features visible in the H&E-stained images with information from the NMF spatial intensity maps. However, MS images and H&E-stained images have differences in scaling and non-linear perspective distortions as they are acquired using disparate types of instruments. Therefore, to effectively compare the NMF spatial intensity maps generated from MSI with H&E-stained images, we follow the image registration pipeline described below to align these images.

  • Step 1: Scaling. Image alignment algorithms typically require the input images to be scaled to the same size. Therefore, the H&E-stained images, which have higher pixel resolution, are down-sampled to match the size of their corresponding NMF spatial intensity maps. This was accomplished using the Python OpenCV library [38].

  • Step 2: H&E-stained image segmentation. The scaled H&E-stained images are decomposed into segments based on the color profile of visible spatial structures such as the muscular lining, mucosal layers, and regions of immune cell aggregation. As an example, to segment the muscular lining, a region of interest (ROI) 8 × 8 pixels in size is defined on top of the muscular lining in an H&E-stained image. The means μc and standard deviations σc of each color channel c (red, green, blue) within this ROI are calculated. The entire H&E-stained image is subsequently thresholded to extract pixels with intensities within two standard deviations of the mean (μc ± 2σc) of each color channel. Fig 6 shows the ROI determination and thresholding.

  • Step 3: Edge detection. Edges act as landmarks that can enhance alignment. Therefore, we run the NMF spatial intensity maps and segments of H&E-stained images through a Canny edge detector available in the Python OpenCV library [39].

  • Step 4: Homography transformation and alignment. Using the Python OpenCV library, we iteratively optimize a homography transformation between each H&E image segment and NMF spatial map. The optimal alignment between each image pair is obtained by maximizing the enhanced correlation coefficient (ECC) score [40]. The ECC score takes a high value if the NMF spatial map and the H&E image segment being aligned have similar spatial structures. For each dataset, we select the alignment that gives the highest ECC score and extract the corresponding warp matrix. We then apply the non-linear transformation defined by this warp matrix to the original H&E-stained image to obtain the desired alignment.

Fig 6. Simplified methodology for anatomical feature segmentation of H&E-stained images.

Red, green, and blue color channel intensity distributions are extracted in a region of interest (ROI) centered within the boundaries of the desired anatomical structure. Following outlier removal, 6 thresholds are determined based on the means and standard deviations of each color channel data as parameters to generate a mask that can be applied to the original H&E-stained image to segment the desired anatomical feature.
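The color-based thresholding described for Fig 6 can be illustrated with a short NumPy sketch. The function name `segment_by_roi_color` and the synthetic image are hypothetical, and the outlier-removal step mentioned in the caption is omitted here for brevity.

```python
import numpy as np


def segment_by_roi_color(he_rgb, roi_top, roi_left, roi_size=8, n_sigma=2.0):
    """Return a boolean mask of pixels whose R, G and B values all lie within
    mean +/- n_sigma * std of the color statistics measured inside the ROI."""
    roi = he_rgb[roi_top:roi_top + roi_size, roi_left:roi_left + roi_size, :]
    roi = roi.reshape(-1, 3).astype(float)
    mu = roi.mean(axis=0)                        # per-channel mean
    sigma = roi.std(axis=0)                      # per-channel standard deviation
    lower = mu - n_sigma * sigma                 # three lower thresholds
    upper = mu + n_sigma * sigma                 # three upper thresholds
    img = he_rgb.astype(float)
    return np.all((img >= lower) & (img <= upper), axis=-1)


# Synthetic image: a 20 x 20 "tissue" square on a dark background.
image = np.full((40, 40, 3), 30, dtype=np.uint8)
ii, jj = np.indices((20, 20))
texture = ((ii + jj) % 3 - 1)[..., None]         # small deterministic variation
image[10:30, 10:30] = np.array([200, 100, 150]) + texture
mask = segment_by_roi_color(image, roi_top=14, roi_left=14)
print(mask.sum())                                # number of pixels kept by the mask
```

The resulting mask can be multiplied with the original image to isolate the chosen anatomical structure before edge detection.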

Enhanced tissue-specific image generation

The NMF spatial intensity maps contain anatomically relevant features. Color-coding each spatial intensity map and overlaying them on top of each other generates a composite ‘NMF-based H&E-like image’ that provides enhanced tissue specificity by highlighting different anatomical structures in different colors.

NMF spatial intensity maps contain pixels with both low and high intensities. Regions of low intensity generally correspond to noise or background and usually represent areas of a tissue that contain low abundances of the ions defined by the respective NMF component’s spectrum. If such low-intensity pixels are color-coded, they may overlap with information-rich high-intensity pixels of a different NMF component’s spatial intensity map, thereby masking important information. Therefore, it is important that the color-coding is only applied to pixels that have intensities above a certain threshold. This threshold may be adjusted depending on the context or on the contrast between the foreground and background pixels.

Once an appropriate threshold is determined, it is applied to the NMF spatial intensity maps to extract two binary masks: i) a foreground mask that defines the foreground pixel locations, and ii) a background mask that defines the background pixel locations. We apply the background mask to the corresponding spatial intensity map to set the below-threshold pixels to zero. This way, only the foreground pixels will be color-coded during subsequent steps. To convert the grayscale NMF spatial intensity maps to their color-coded versions, we first determine an appropriate set of visually contrasting colors and obtain their {red, green, blue} vector mapping using a color vector table. A color vector is a three-element vector containing a value between 0 and 1 for each of the three color channels red, green, and blue. For example, the color vector (1, 0, 0) represents pure red while the color vector (0.1, 0.8, 0.3) represents a mixture of 10% red, 80% green, and 30% blue. The foreground pixels previously extracted are scaled by the color vector and assigned to three color channels to form a color-coded NMF spatial intensity map. This process is repeated for the remaining spatial intensity maps. As the final step, these color-coded NMF spatial intensity maps are combined into a single composite image by simply adding them together. To be displayed, the images need to be converted to 8-bit unsigned integer arrays. The previous steps could generate certain pixels that have intensity values greater than 255, which cannot be represented by an 8-bit integer. Such pixels are artificially clamped at 255. It should be noted, however, that clamping too many pixels could lead to a saturated image with poor contrast.
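The thresholding, color-coding, summation, and clamping steps can be sketched in a few lines of NumPy. The function name `composite_image`, the per-map scaling to the 0–255 range, and the relative threshold of 20% of each map's maximum are assumptions made for this illustration.

```python
import numpy as np


def composite_image(nmf_maps, colors, threshold=0.2):
    """nmf_maps: (H, W, C) non-negative maps; colors: C RGB vectors in [0, 1];
    threshold: fraction of each map's maximum below which pixels are background."""
    h, w, c = nmf_maps.shape
    out = np.zeros((h, w, 3), dtype=float)
    for k in range(c):
        m = nmf_maps[:, :, k].astype(float)
        peak = m.max()
        if peak > 0:
            m = 255.0 * m / peak                  # scale each map to 0-255 (assumption)
        foreground = m >= threshold * 255.0       # foreground mask; its complement is background
        m = np.where(foreground, m, 0.0)          # zero the below-threshold pixels
        out += m[:, :, None] * np.asarray(colors[k], dtype=float)  # color-code and add
    return np.clip(out, 0, 255).astype(np.uint8)  # clamp at 255 for 8-bit display


# Two tiny 2 x 2 maps with hypothetical intensities and colors.
maps = np.zeros((2, 2, 2))
maps[:, :, 0] = [[1.0, 0.1], [0.0, 0.0]]
maps[:, :, 1] = [[1.0, 0.0], [0.0, 0.5]]
rgb = composite_image(maps, colors=[(1.0, 0.0, 0.0), (1.0, 0.5, 0.25)])
print(rgb[0, 0], rgb[0, 1], rgb[1, 1])
```

In this toy case the top-left pixel receives full red from both maps and is clamped at 255, while the weak 0.1 pixel of the first map falls below the threshold and stays black.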

Results

Dimensionality reduction with NMF

Fig 7a shows the five NMF components that contributed the most towards reducing the reconstruction error, accounting for 70% of the reconstruction accuracy achieved with all 20 components. It should be noted that all spectral intensities are non-negative. The spatial components, i.e., the NMF spatial intensity maps, show diverse and distinct spatial structures. Data is shown from one representative dataset out of the 8 in the compressed data cohort.

Fig 7. The first five NMF and PCA components ranked according to their contribution towards MSI data reconstruction.

(a) First five NMF components. Observe how the NMF spectral peaks take only positive values. (b) First five PCA components. The PCA spatial maps gradually lose structure for higher-ordered PCA components.

Dimensionality reduction with PCA

Fig 7b shows the five PCA components that explain the most variance in the data, accounting for 90% of the variance and 13% of the reconstruction accuracy achieved with 20 PCA components. It should be noted that certain spectral intensities are negative. Fig 7b also shows that the spatial information captured by the first few PCA components is feature-dense, but the presence of distinct spatial structure gradually tails off with an increasing number of components. In comparison, the NMF representation captures sparser spectral and spatial components with approximately equal numbers of spectral peaks and spatial features in each component.
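A comparison of this kind can be reproduced in miniature with scikit-learn. The sketch below is illustrative only: it uses a synthetic non-negative matrix instead of MSI data, and the relative Frobenius-norm error is an assumed definition of reconstruction error, which may differ from the metric used in this study.

```python
import numpy as np
from sklearn.decomposition import NMF, PCA


def relative_error(X, X_hat):
    """Relative Frobenius reconstruction error, ||X - X_hat|| / ||X||."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


# Synthetic non-negative "pixels x m/z bins" matrix (a stand-in, not MSI data).
rng = np.random.default_rng(0)
X = rng.random((200, 5)) @ rng.random((5, 60)) + 0.01 * rng.random((200, 60))

nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)               # spatial intensities (pixels x components)
H = nmf.components_                    # spectra (components x m/z bins)
nmf_err = relative_error(X, W @ H)

pca = PCA(n_components=5)
scores = pca.fit_transform(X)
pca_err = relative_error(X, pca.inverse_transform(scores))

print(f"NMF error: {nmf_err:.3f}, PCA error: {pca_err:.3f}")
print("negative PCA loadings present:", bool((pca.components_ < 0).any()))
print("negative NMF loadings present:", bool((H < 0).any()))
```

The two sign checks at the end illustrate the qualitative difference noted above: PCA spectra mix positive and negative intensities, whereas NMF spectra are non-negative by construction.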

CPH vs naïve data discrimination

The F1 score is a measure of classification accuracy reflecting precision and recall. The SVM classifier achieved F1 scores of 99.9% and 87.5% on NMF training and testing data respectively. These results were achieved with a kernel SVM using the RBF kernel. SVM with a linear kernel achieved F1 scores of 95% and 83% on NMF training and testing data respectively.
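For reference, the F1 score is the harmonic mean of precision and recall. The short function below computes it from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the study's results.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # fraction of predicted CPH that is CPH
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # fraction of true CPH that is found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
print(round(example_f1, 3))
```

Because it ignores true negatives, the F1 score is most informative when reported alongside overall accuracy, as is done in this section.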

Fig 4 shows how the classification accuracy depends on the size of image patches used during the data augmentation stage. Note that the number of patches was maintained at a constant value during these analyses.

We found that the RBF kernel SVM classifier achieved a classification accuracy of 99.9% and 87.4% on PCA training and testing data respectively. Although the SVM classification accuracy is comparable for PCA and NMF data, NMF produces histologically meaningful spatial components, and directly interpretable spectral components compared to their PCA counterparts.

This study was carried out using an MSI data cohort based on single tissue samples from 8 rats split into two experimental groups, CPH and naïve. As shown in Fig 4, successful discrimination of samples from the two groups was obtained using an SVM classifier based on relatively small (20 × 20 × 20) patches of NMF features. Classification accuracy generally increased with the number of NMF components used and with the patch size (saturating around 20 components). Classification accuracy increased from 75% for 5 NMF components, to 82% for 10 NMF components, to 87.2% for 20 NMF components (for 20 × 20 × 20 patches). Classification accuracy increased from 78% for 5 × 5 × 20 patches to a maximum of 87.65% for 25 × 25 × 20 patches, decreasing for larger patches due to the limited number of samples.

Most discriminating spectral features

Fig 8 shows the spatial and spectral distributions corresponding to the four NMF components that contributed the most toward classification accuracy. As explained in the Methods, these four components alone, when used in a linear SVM classifier, yield a CPH vs naïve discriminatory F1 score of 77.5%, compared to 83% for all 20 components (this accuracy level should not be confused with the 87.5% F1 score obtained when the RBF kernel SVM was used with all 20 NMF components).

Fig 8. Four NMF components that contributed the most towards discriminating CPH vs naïve data samples.

Note that the spatial maps are distributed throughout the colon structure indicating that pain-causing biomarkers may be found throughout the colon and fall into one of three classes: complex fingerprint, simple predominant ion, and off tissue.

It can be observed in Fig 8 that the NMF components that contributed the most towards SVM accuracy are distributed throughout the swiss-roll structure and fall into three classes: NMF 10 exhibits a complex spectral “fingerprint” with moderate intensities, NMFs 0 and 2 reflect spectra predominated by high-intensity single lipid ions, and NMF 18 is an off-tissue component.

Alignment of H&E-stained images with spatial components of NMF

Fig 9 shows a selected subset of the aligned NMF spatial intensity maps alongside H&E-stained images. It is notable that the NMF spatial components reflect spatially coherent regions of the tissue such as the muscular lining, submucosa, regions of inflammation, etc., in the different components. As shown in Fig 10, by color-coding and overlaying individual NMF spatial intensity maps, an equivalent to an H&E-stained image with enhanced specificity for different tissue types can be obtained.

Fig 9. Spatial features in NMF components align strikingly well with anatomical features in H&E-stained images.

a) Example alignment for a CPH dataset showing H&E image, original NMF spatial components, and the aligned images which enhance identification of tissue structure. b) Alignment for a naïve dataset. Similar features are apparent in the raw and H&E images. However, the correlation is moderate due to alignment mismatch. The aligned images demonstrate excellent correlation and preserve detailed tissue structure.

Fig 10. Composite H&E-like images.

a) NMF spatial maps may be color-coded and combined to generate H&E-like images with enhanced contrast and spatial detail relative to actual H&E-stained images (despite the higher pixel resolution of H&E data). Only 5 components are represented for simplicity. The composite image contains many of the visible anatomical features of the original H&E image with greater specificity for different tissue structures. b) Color-coded PCA spatial maps do not overlay well to generate a high-contrast composite image. This is because only PCA 0 and PCA 1 have well-defined structures while the rest of the components are noisy, producing a smeared/saturated composite image.

This indicates that NMF features can be used to identify tissue structures, at least in rat colon tissue. This may reflect underlying differences in the phospholipids present in the cell membranes for different tissues which the NMF spatial-spectral decomposition successfully retains. It is hypothesized that this observation may also translate to interpretable feature extraction in other types of tissue with complex histological features.

Discussion

In this work, we discussed the high interpretability of both spatial and spectral components generated by NMF. The NMF spectral components are strictly non-negative, and can thus be interpreted to represent the presence of specific lipid ions corresponding to the mass-to-charge peaks in the spectra. We also found that NMF spatial intensity maps correlate strongly with spatial tissue structure and therefore could be used to obtain information typically captured with a different modality such as H&E-stained imaging.

Comparison of NMF with PCA

We compared NMF feature extraction with PCA, a dimensionality reduction technique commonly used with MSI data. A comparison of quantitative and qualitative performance characteristics is shown in Table 1.

Table 1. Comparison of performance characteristics for PCA and NMF features on MSI data cohort from an animal model of comorbid visceral pain hypersensitivity.

| Performance comparison | PCA | NMF |
| --- | --- | --- |
| Reconstruction error (20 components) | 17.99% | 18.94% |
| Classifier accuracy (20 components) | 88.09% | 87.65% |
| Spectral interpretability | Requires extra processing | Nonnegativity allows molecular ID |
| Spatial interpretability | 1–2 components correlate with histological features | Most components exhibit significant correlation with histological features |
| H&E-like composite image | Blurred and spatially overlapping (Fig 10b) | High spatial resolution, distinct spatial features (Fig 10a) |

The fact that PCA components are allowed to have negative values makes it difficult to interpret the meaning carried by the peak intensities in a PCA spectrum. A PCA component having large positive peak intensities may superimpose with another component carrying negative peak intensities to generate a reconstructed spectrum carrying zero intensity peaks. Therefore, although PCA permits reconstructing the data with high accuracy, it does not generate readily interpretable information in each component spectrum. In contrast, NMF spectra are strictly non-negative. Therefore, spectral intensities carried by each component are always constructively superimposed during reconstruction. Consequently, if a given NMF component spectrum shows a significant ion, this peak will be clearly reflected in the reconstructed data. Hence, given a sufficiently low NMF reconstruction error, we can confidently state that the presence of an ion in an NMF component spectrum indicates that the lipid annotation corresponding to that mass-to-charge value was present in the tissue sample.
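The cancellation argument can be made concrete with a toy numerical example. The spectra and weights below are hypothetical numbers over four m/z bins, chosen only to show the effect; they do not come from the dataset.

```python
import numpy as np

# Hypothetical component spectra over four m/z bins (illustrative numbers only).
pca_like = np.array([[0.0, 3.0, 1.0, 0.0],     # positive peak in bin 1
                     [0.0, -3.0, 0.0, 2.0]])   # negative peak in bin 1
nmf_like = np.abs(pca_like)                    # same peaks, constrained to be non-negative
weights = np.array([1.0, 1.0])                 # per-pixel component weights

signed_sum = weights @ pca_like                # bin 1 cancels: 3 + (-3) = 0
nonneg_sum = weights @ nmf_like                # bin 1 adds up: 3 + 3 = 6
print(signed_sum)   # [0. 0. 1. 2.]
print(nonneg_sum)   # [0. 6. 1. 2.]
```

With signed components, a prominent peak in one spectrum says nothing definite about the reconstructed data, whereas with non-negative components and non-negative weights every peak can only add to the reconstruction.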

The results shown in Fig 7b establish that the PCA spatial and PCA spectral components exhibit distinct structures for low-order components, but the presence of distinct structures markedly decreases for higher-order components. This may be attributed to the fact that the PCA algorithm extracts components such that a maximum amount of variance in the data is captured in each successive component. Therefore, the higher-order PCA components will capture finer detail of the data such as pixel noise. In contrast, the NMF algorithm extracts components by optimizing the reconstruction error with a positivity constraint, thereby capturing differences inherent to the data in different components which together represent the whole data. It can be observed (Fig 7) that spatial structure is prominent only in the first few PCA components but quickly becomes blurred, whereas the spatial structure in NMF components is preserved for all components. In some cases, NMF spatial components also represent off-tissue or matrix components.

Ranking NMF components

We have established a methodology for ranking the contribution of each NMF component to classification accuracy, with the top four components shown in Fig 8. The NMF spectra corresponding to the components fall into two general classes: those that indicate the predominant presence of single ions and those that reflect a more complex combination of ions and their relative abundances. For example, the significant peak in NMF component 2 aligns with the putative phospholipid SAPI, which has been found in other work to correlate with inflammation [7]. Likewise, the significant peak in component 0 seems to correspond to the putative phospholipid PI 36:1, while the spectrum in component 10 reflects a complex combination of ions that is more indicative of a tissue fingerprint. These identifications of compounds from the NMF peaks come from preliminary analysis and have not yet been validated experimentally. Such validation is one of the future directions of our ongoing research project.

Interpreting NMF spatial maps

As shown in Fig 10, by color-coding individual NMF spatial intensity maps, an equivalent to an H&E-stained image with better spatial specificity for different tissue types may be obtained. This is an unexpected result, as there is no a priori requirement for NMF to capture anatomical information separately in its components. This ability to use NMF spatial intensity maps as a basis to generate ‘H&E-like’ images with higher spatial detail promises a link between the two techniques of MSI and H&E-stained imaging. There is good evidence here for an NMF-based H&E feature extraction tool that can automate the reading of well-defined tissue stains given enough training, and for full automation of MSI-H&E coregistration based on automated regions of interest. We note that this study identifies useful roles for both the spectral and spatial components resulting from NMF feature extraction, with the spectral components providing interpretability into constituent ions and the spatial components enhancing the spatial interpretability of tissue structure and yielding high classification accuracy.

Attempts to align NMF spatial maps with H&E-stained images using approaches based on feature-matching were unsuccessful. This could be attributed to the graininess present in both NMF spatial maps and H&E-stained images, thereby precluding standard feature-based alignment techniques from extracting landmark features to generate a satisfactory alignment. We found that the homography transformation tuned by maximizing the enhanced correlation coefficient was successful in aligning NMF and H&E data.

Limitations of NMF

Despite the demonstrated advantages of NMF feature extraction for MSI data, the computation itself has a relatively high cost. The requirement for high resolution in binning also increases the computational burden. At 0.05 Da bin size, an MSI dataset may occupy 20–30 GB of system memory. This leads to approximately 200 GB of data when the 8 datasets are stacked. While the PCA algorithm executes on this stacked data in under 10 minutes, approximately 24 hours were required for the convergence of the NMF algorithm. These bottlenecks may preclude the average user from using this technique on large datasets. However, the initial exploratory analysis could be achieved faster by using a larger bin width (0.1 Da). Indeed, similar results were obtained in preliminary analysis with bin widths of 1 Da. This issue could be resolved in future work by efficient computation with multiprocessing, better algorithms to compute NMF using GPU servers, and batch implementations of the NMF algorithm such that a large number of datasets could be processed with limited computational resources.

Conclusion

This research was motivated by the observation that NMF provides effective feature extraction for MSI data, with individual NMF components exhibiting a strong correlation with underlying tissue structures and offering interpretability according to the m/z values of constituent molecular compounds. The novel contribution of this paper consists of three data processing methodologies that perform data augmentation to support the training and testing of classifiers, ranking of the most important features for classification, and image registration to support tissue-specific imaging. These methods are demonstrated on a novel MSI data cohort for a rodent model of chronic visceral pain.

The MSI data processing pipelines establish distinct roles for the spectral and spatial NMF components. The spectral components allow for interpretability in terms of the m/z values of constituent ions due to the nonnegativity constraint in NMF spectral decomposition. The spatial components not only enhance the contrast and spatial detail of tissue structures but also are distinctive enough to allow for high classification accuracy.

The overall advantage of these methodologies is spatial and spectral representations of MSI data that are directly interpretable, an observation that we have leveraged in conjunction with downstream analysis methods. We note that the PCA features exhibit similar or slightly higher performance than the NMF features for classification and reconstruction (Fig 4); however, the spectral and spatial interpretability of NMF features is a distinct advantage over PCA, allowing NMF components to be used directly for downstream data processing such as classification and data fusion without the need for an additional clustering step. The novel data augmentation technique allows data cohorts with a limited number of tissue samples to be used for training and testing unbiased classifiers. The novel feature ranking technique allows data analysis efforts to highlight components that are most discriminative between two experimental groups. This allows those components to be prioritized in subsequent investigations and analysis. The novel image registration technique allows NMF feature components with underlying correlation to tissue structure (as identified in H&E-stained images) to be identified and combined to enhance specificity for different tissue types and anatomical structures. Image registration is required when establishing correlations between multiple experimental techniques that do not ensure registration at the scale of individual pixels.

The main disadvantage and limitation of these methodologies arise from their high computational cost. For the visceral pain data cohort with all 8 samples, execution of the NMF algorithm on a Dell Precision 5820 (Intel Core i9 10900X CPU with 10 cores and 20 threads) with 256 GB system memory required approximately 24 hours, whereas execution of the PCA algorithm finished in about 10 minutes. The computational cost increases further because the methodology presented here requires multiple MSI samples to be processed together, which limits the number of samples that can be processed.

Detailed descriptions of data processing pipelines are presented for 1) NMF feature extraction, 2) classification based on NMF features, and 3) image registration of NMF and H&E data. Three novel methodologies were developed for data augmentation, feature ranking, and image registration. The utility of these methods is demonstrated on a novel MSI data cohort for a rodent model of chronic visceral pain and supported by results including the successful and robust classification of naïve and co-morbid pain subjects as well as a meaningful interpretation of NMF features regarding tissue histology.

Supporting information

S1 Fig. NMF spatial maps for the data cohort.

Each row shows 20 NMF spatial maps for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

(TIF)

S2 Fig. PCA spatial maps for the data cohort.

Each row shows 20 PCA spatial maps for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

(TIF)

S3 Fig. Overlay of NMF spatial maps on the H&E images.

Overlay of 20 NMF spatial maps over the corresponding H&E image for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

(TIF)


Data Availability

All MSI data files are available from the open science website Zenodo (doi: 10.5281/zenodo.7901681).

Funding Statement

This work was supported by a 2021 MPower Seed Grant from The University of Maryland Strategic Partnership: MPowering the State to authors PA, RE, RG, BB, AS, and RT. The funder website is https://mpower.maryland.edu/ The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Buchberger AR, DeLaney K, Johnson J, Li L. Mass spectrometry imaging: A review of emerging advancements and future insights. Anal Chem. 2018;90(1):240–65. doi: 10.1021/acs.analchem.7b04733
  • 2. Dilmetz BA, Lee Y-R, Condina MR, Briggs M, Young C, Desire CT, et al. Novel technical developments in mass spectrometry imaging in 2020: A mini-review. Anal Sci Adv. 2021;2(3–4):225–37. doi: 10.1002/ansa.202000176
  • 3. Porta Siegel T, Hamm G, Bunch J, Cappell J, Fletcher JS, Schwamborn K. Mass spectrometry imaging and integration with other imaging modalities for greater molecular understanding of biological tissues. Mol Imaging Biol. 2018;20(6):888–901. doi: 10.1007/s11307-018-1267-y
  • 4. Tobias F, Olson MT, Cologna SM. Mass spectrometry imaging of lipids: untargeted consensus spectra reveal spatial distributions in Niemann-Pick disease type C1. J Lipid Res. 2018;59(12):2446–55. doi: 10.1194/jlr.D086090
  • 5. Diao X, Ellin NR, Prentice BM. Selective Schiff base formation via gas-phase ion/ion reactions to enable differentiation of isobaric lipids in imaging mass spectrometry. Anal Bioanal Chem. 2023. doi: 10.1007/s00216-023-04523-y
  • 6. Claes BSR, Bowman AP, Poad BLJ, Young RSE, Heeren RMA, Blanksby SJ, et al. Mass spectrometry imaging of lipids with isomer resolution using high-pressure ozone-induced dissociation. Anal Chem. 2021;93(28):9826–34. doi: 10.1021/acs.analchem.1c01377
  • 7. Scott AJ, Post JM, Lerner R, Ellis SR, Lieberman J, Shirey KA, et al. Host-based lipid inflammation drives pathogenesis in Francisella infection. Proc Natl Acad Sci U S A. 2017;114(47):12596–601. doi: 10.1073/pnas.1712887114
  • 8. Blanc L, Lenaerts A, Dartois V, Prideaux B. Visualization of Mycobacterial biomarkers and tuberculosis drugs in infected tissue by MALDI-MS imaging. Anal Chem. 2018;90(10):6275–82. doi: 10.1021/acs.analchem.8b00985
  • 9. Cheng S-H, Groseclose MR, Mininger C, Bergstrom M, Zhang L, Lenhard SC, et al. Multimodal imaging distribution assessment of a liposomal antibiotic in an infectious disease model. J Control Release. 2022;352:199–210. doi: 10.1016/j.jconrel.2022.08.061
  • 10. Holzlechner M, Eugenin E, Prideaux B. Mass spectrometry imaging to detect lipid biomarkers and disease signatures in cancer. Cancer Rep. 2019;2(6):e1229. doi: 10.1002/cnr2.1229
  • 11. Hristu R, Stanciu SG, Dumitru A, Paun B, Floroiu I, Costache M, et al. Influence of hematoxylin and eosin staining on the quantitative analysis of second harmonic generation imaging of fixed tissue sections. Biomed Opt Express. 2021;12(9):5829–43. doi: 10.1364/BOE.428701
  • 12. Li Y, Li N, Yu X, Huang K, Zheng T, Cheng X, et al. Hematoxylin and eosin staining of intact tissues via delipidation and ultrasound. Sci Rep. 2018;8(1):12259. doi: 10.1038/s41598-018-30755-5
  • 13. Deutskens F, Yang J, Caprioli RM. High spatial resolution imaging mass spectrometry and classical histology on a single tissue section. J Mass Spectrom. 2011;46(6):568–71. doi: 10.1002/jms.1926
  • 14. Chan JKC. The wonderful colors of the hematoxylin-eosin stain in diagnostic surgical pathology. Int J Surg Pathol. 2014;22(1):12–32. doi: 10.1177/1066896913517939
  • 15. Titford M. The long history of hematoxylin. Biotech Histochem. 2005;80(2):73–8. doi: 10.1080/10520290500138372
  • 16. Feldman AT, Wolfe D. Tissue processing and hematoxylin and eosin staining. Methods Mol Biol. 2014;1180:31–43. doi: 10.1007/978-1-4939-1050-2_3
  • 17. Verbeeck N, Caprioli RM, Van de Plas R. Unsupervised machine learning for exploratory data analysis in imaging mass spectrometry. Mass Spectrom Rev. 2020;39(3):245–91. doi: 10.1002/mas.21602
  • 18. Nijs M, Smets T, Waelkens E, De Moor B. Mathematical comparison of non-negative matrix factorization related methods with practical implications for the analysis of mass spectrometry imaging data. Rapid Commun Mass Spectrom. 2021;35(21):e9181. doi: 10.1002/rcm.9181
  • 19. Paine MRL, Kim J, Bennett RV, Parry RM, Gaul DA, Wang MD, et al. Whole reproductive system non-negative matrix factorization mass spectrometry imaging of an early-stage ovarian cancer mouse model. PLoS One. 2016;11(5):e0154837. doi: 10.1371/journal.pone.0154837
  • 20. Trindade GF, Abel M-L, Lowe C, Tshulu R, Watts JF. A time-of-flight secondary ion mass spectrometry/multivariate analysis (ToF-SIMS/MVA) approach to identify phase segregation in blends of incompatible but extremely similar resins. Anal Chem. 2018;90(6):3936–41. doi: 10.1021/acs.analchem.7b04877
  • 21. Prasad M, Postma G, Franceschi P, Buydens LMC, Jansen JJ. Evaluation and comparison of unsupervised methods for extracting spatial patterns from mass spectrometry imaging data (MSI). Sci Rep. 2022;12(1). doi: 10.1038/s41598-022-19365-4
  • 22. Smets T, Waelkens E, De Moor B. Prioritization of m/z-values in mass spectrometry imaging profiles obtained using uniform manifold approximation and projection for dimensionality reduction. Anal Chem. 2020;92(7):5240–8. doi: 10.1021/acs.analchem.9b05764
  • 23. Smets T., De Keyser T., Tousseyn T., Waelkens E., & De Moor B. (2021). Correspondence-aware manifold learning for microscopic and spatial omics imaging: A novel data fusion method bringing mass spectrometry imaging to a cellular resolution. Analytical Chemistry, 93(7), 3452–3460. doi: 10.1021/acs.analchem.0c04759 [DOI] [PubMed] [Google Scholar]
  • 24. Zhang W., Claesen M., Moerman T., Groseclose M. R., Waelkens E., De Moor B., et al. (2021). Spatially aware clustering of ion images in mass spectrometry imaging data using deep learning. Analytical and Bioanalytical Chemistry, 413(10), 2803–2819. doi: 10.1007/s00216-021-03179-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst [Internet]. 1998;13(4):18–28. Available from: 10.1109/5254.708428 [DOI] [Google Scholar]
  • 26. Byvatov E, Schneider G. Support vector machine applications in bioinformatics. Appl Bioinformatics. 2003;2(2):67–77. [PubMed] [Google Scholar]
  • 27. Statnikov A, Hardin D, Aliferis C. Using SVM weight-based methods to identify causally relevant and non-causally relevant variables. In: Computational Causal Discovery Laboratory; 2006.
  • 28. Traub RJ, Cao D-Y, Karpowicz J, Pandya S, Ji Y, Dorsey SG, et al. A clinically relevant animal model of temporomandibular disorder and irritable bowel syndrome comorbidity. J Pain [Internet]. 2014;15(9):956–66. Available from: 10.1016/j.jpain.2014.06.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Ji Y., Hu B., Klontz C., Li J., Dessem D., Dorsey S. G., & Traub R. J. (2020). Peripheral mechanisms contribute to comorbid visceral hypersensitivity induced by preexisting orofacial pain and stress in female rats. Neurogastroenterology and Motility: The Official Journal of the European Gastrointestinal Motility Society, 32(7), e13833. doi: 10.1111/nmo.13833 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Scott AJ, Chandler CE, Ellis SR, Heeren RMA, Ernst RK. Maintenance of deep lung architecture and automated airway segmentation for 3D mass spectrometry imaging. Sci Rep [Internet]. 2019;9(1):20160. Available from: 10.1038/s41598-019-56364-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Trede D. SCiLS Lab: software for analysis and interpretation of large MALDI-IMS datasets. In: Proceedings of OurCon. Ourense, Spain; 2012.
  • 32. Römpp A, Schramm T, Hester A, Klinkert I, Both J-P, Heeren RMA, et al. imzML: Imaging Mass Spectrometry Markup Language: A common data format for mass spectrometry imaging. Methods Mol Biol [Internet]. 2011;696:205–24. Available from: 10.1007/978-1-60761-987-1_12 [DOI] [PubMed] [Google Scholar]
  • 33. Alexandrov Team, Fay. pyimzML: A parser to read .imzML files with Python [Internet]. 2016 [cited 2023 Apr 10]. Available from: https://github.com/alexandrovteam/pyimzML.
  • 34. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature [Internet]. 1999;401(6755):788–91. Available from: 10.1038/44565 [DOI] [PubMed] [Google Scholar]
  • 35. Wang Y-X, Zhang Y-J. Nonnegative Matrix Factorization: A Comprehensive Review. IEEE Trans Knowl Data Eng [Internet]. 2013;25(6):1336–53. Available from: 10.1109/tkde.2012.51 [DOI] [Google Scholar]
  • 36. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python [Internet]. arXiv [cs.LG]. 2012. Available from: http://arxiv.org/abs/1201.0490. [Google Scholar]
  • 37. Lehmann EL, Romano JP, Casella G (2005) Testing statistical hypotheses. Springer. Available from: 10.1007/0-387-27605-X. [DOI] [Google Scholar]
  • 38. Bradski G. The openCV library. Dr. Dobb’s Journal: Software Tools for the Professional Programmer. 2000;25:120–3. [Google Scholar]
  • 39. Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell. 1986;8(6):679–98. doi: 10.1109/TPAMI.1986.4767851 [DOI] [PubMed] [Google Scholar]
  • 40. Evangelidis GD, Psarakis EZ. Parametric image alignment using enhanced correlation coefficient maximization. IEEE Trans Pattern Anal Mach Intell [Internet]. 2008;30(10):1858–65. Available from: 10.1109/TPAMI.2008.113 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Bardia Yousefi

26 Jul 2023

PONE-D-23-15506

Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization

PLOS ONE

Dear Dr. Abshire,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Sep 09 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Bardia Yousefi, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. To comply with PLOS ONE submissions requirements, in your Methods section, please provide additional information regarding the experiments involving animals and ensure you have included details on (1) methods of sacrifice, (2) methods of anesthesia and/or analgesia, and (3) efforts to alleviate suffering.

3. Thank you for stating the following financial disclosure: 

"This work was supported by a 2021 MPower Seed Grant from The University of Maryland Strategic Partnership: MPowering the State to authors PA, RE, RG, BB, AS, and RT. The funder website is 

https://mpower.maryland.edu/

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

We note that one or more of the authors is affiliated with the funding organization, indicating the funder may have had some role in the design, data collection, analysis or preparation of your manuscript for publication; in other words, the funder played an indirect role through the participation of the co-authors. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please do the following:

(1) Review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. These amendments should be made in the online form.

(2) Confirm in your cover letter that you agree with the following statement, and we will change the online submission form on your behalf: 

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

4. We noted in your submission details that a portion of your manuscript may have been presented or published elsewhere:

"A version of this submission has been uploaded to bioRxiv: https://www.biorxiv.org/content/10.1101/2023.04.24.538180v1"

Please clarify whether this publication was peer-reviewed and formally published. If this work was previously peer-reviewed and published, in the cover letter please provide the reason that this work does not constitute dual publication and should be included in the current manuscript.

5. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

6. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

7. Please ensure that you refer to Figure 3 in your text as, if accepted, production will need this reference to link the reader to the figure.

Additional Editor Comments:

This article has good merits, but needs a revision before it goes further. Thanks

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization

The authors tried to analyze the Mass spectrometry imaging (MSI) data using non-negative matrix factorization. NMF used to reduce dimensionality and encountering spatial components.

In my opinion, this manuscript has these good points:

- The subject is interesting and might absorb many readers in the field;

- Nice written and well presenting the idea;

- Suitable analytical representations;

Also, there are some suggestions that would increase the strength of the paper, which are listed below:

- One of my major points concerns the novelty of this article; the authors should highlight their novelty more clearly. NMF is already used for dimensionality reduction and spectroscopy data analysis (please google this to find more published contributions on NMF for hyperspectral and spectroscopy data). What is new in your article? Please specify your contributions.

- Why did the size of the data shrink from 100K to 20? There should be an approach similar to the gap statistic to justify this.

- The similarity to clustering also needs to be highlighted, along with how it manifests itself in the analysis.

Thank you

Reviewer #2: In the presentation of the work, the article has a nice beginning; nonetheless, it has to be examined in order to help the reader comprehend “Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization." Despite the fact that I believe the work does not satisfy the requirements for publication in PLOS ONE and that there are some concerns that need to be addressed, I strongly propose a comprehensive review that will add value to the results that were acquired through discussion. The writers need to make some changes to the paper. On the other hand, I would like to provide the authors with the following remarks and suggestions:

1. The abstract does not communicate well; it must be revised.

2. In particular, it is not entirely evident how this publication contributes to the body of previous research when compared to other papers that have been published. Because of this, I am unable to propose that the current version be accepted.

3. In Background and related work section, the author can introduce more literature and analyse its shortcomings to highlight the advantages and innovations of this paper.

4. The results themselves need to be explained, which is why there must be a section or paragraph dedicated to the discussion along with an appropriate comparison table of the suggested work.

5. More specifically, only simulated results were presented, and there was no attempt made to verify the suggested work by practical means in the tabular form.

6. Add some recently proposed techniques (2020-2023) in the related work section of the manuscript.

7. There are too many spelling and grammar mistakes in the paper. It needs proper spelling and grammar checking.

8. Conclusion section should be extended by mentioning the advantages, disadvantages, and limitations of the study.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2024 Oct 10;19(10):e0300526. doi: 10.1371/journal.pone.0300526.r002

Author response to Decision Letter 0


30 Oct 2023

We thank the reviewers for their thoughtful feedback on our manuscript. We have now substantially revised our manuscript by including new analyses, clarifying our results, and improving the clarity and rigor of our expositions. In summary, the major changes include:

1) Adding new analyses to justify the choice of the reduced dimensions (Fig. 4b) and extending the comparison of our proposed approach with existing methods (Fig. 10b and Table 1);

2) Substantially revising the abstract to emphasize the novel contributions of the paper and to be more accessible for a general audience;

3) Clarifying our main contributions in the context of existing recent work, in both the background and discussion sections, and highlighting the advantages, disadvantages, and limitations of our approach;

4) Substantially revising and enhancing the conclusion section; and

5) Improving the rigor and clarity of our presentation by significantly revising the text throughout the manuscript.

In what follows, we respond to the comments of the reviewers in a point-by-point fashion.

We would like to thank the reviewing editor and the anonymous reviewers for their careful critique of our work and for their constructive and thorough feedback.

Reviewer #1:

Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization

The authors tried to analyze the Mass spectrometry imaging (MSI) data using non-negative matrix factorization. NMF used to reduce dimensionality and encountering spatial components.

In my opinion, this manuscript has these good points:

- The subject is interesting and might absorb many readers in the field;

- Nice written and well presenting the idea;

- Suitable analytical representations;

Response: We thank the reviewer for summarizing the strengths of our contributions.

Reviewer #1:

Also, there are some suggestions that would increase the strength of the paper, which are listed below:

- One of my major points concerns the novelty of this article; the authors should highlight their novelty more clearly. NMF is already used for dimensionality reduction and spectroscopy data analysis (please google this to find more published contributions on NMF for hyperspectral and spectroscopy data). What is new in your article? Please specify your contributions.

Response: We thank the reviewer for raising this point on clarifying the novelty of our work. We have now revised our abstract to clarify the main contributions of the work as being three novel MSI data analysis techniques that leverage the spatial and spectral interpretability of NMF compressed MSI data (pp. 1 - 2).

We also added a section highlighting the connection to existing work in MSI data analysis to highlight the novel aspects of this work (pp. 5 - 6).

Reviewer #1:

- Why did the size of the data shrink from 100K to 20? There should be an approach similar to the gap statistic to justify this.

Response:

We are not performing traditional clustering, so gap statistics are not readily available or well defined for this approach. Instead, we set a target of at most 20% loss in reconstruction accuracy and used this criterion to select the number of components to retain in subsequent analysis. We have now revised Fig 4 to clearly show the selection criterion for the reduced dimension (for both PCA and NMF) (p. 13).
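
The selection criterion described in this response can be illustrated with a minimal sketch. It assumes scikit-learn's NMF implementation (scikit-learn appears in the reference list as ref. 36), a relative Frobenius-norm error as the measure of reconstruction loss, and a small synthetic non-negative matrix standing in for real MSI data. The function names `relative_error` and `smallest_rank_under_target` are hypothetical, and the exact error metric and solver settings used in the manuscript are not stated here.

```python
import numpy as np
from sklearn.decomposition import NMF


def relative_error(X, X_hat):
    """Frobenius-norm reconstruction error relative to the norm of X (assumed metric)."""
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)


def smallest_rank_under_target(X, candidate_ranks, target=0.20, seed=0):
    """Fit NMF for increasing ranks and stop at the first rank meeting the target.

    Returns (rank, errors); rank is None if no candidate reaches the target.
    """
    errors = {}
    for k in candidate_ranks:
        model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=seed)
        W = model.fit_transform(X)  # pixels x k: spatial components
        H = model.components_       # k x m/z bins: spectral components
        errors[k] = relative_error(X, W @ H)
        if errors[k] <= target:
            return k, errors
    return None, errors


# Synthetic stand-in for an MSI matrix (rows = pixels, columns = m/z bins).
rng = np.random.default_rng(0)
true_W = rng.random((200, 5))
true_H = rng.random((5, 60))
X = true_W @ true_H + 0.01 * rng.random((200, 60))

rank, errors = smallest_rank_under_target(X, candidate_ranks=range(1, 11))
print("selected rank:", rank)
print("relative errors:", errors)
```

Stopping at the first rank that meets the target keeps the retained dimension as small as the criterion allows; on real data, the error curve over all candidate ranks would typically be plotted as well, as the revised Fig 4 is described as doing.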

Reviewer #1:

- The similarity to clustering also needs to be highlighted, along with how it manifests itself in the analysis.

Response: One of the key findings in this work is that NMF inherently produces components with spatially distinct structure, obviating the need for an additional explicit clustering step. We have now added this explanation to the section reviewing existing MSI data processing methods (pp. 5 - 6).
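
As a rough illustration of how spatial NMF components can be read directly as tissue maps without a separate clustering step, the sketch below reshapes each column of the NMF coefficient matrix into an image and labels every pixel by its strongest component. This is an assumption-laden toy example, not the authors' pipeline: the data are synthetic (two regions with distinct spectra), the per-component max scaling is one possible normalization, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import NMF

height, width, n_mz = 12, 12, 30
rng = np.random.default_rng(1)

# Two synthetic "tissue regions" (left and right halves) with distinct spectra.
spectra = rng.random((2, n_mz))
spectra[0, :15] += 3.0  # region 0 is intense in the lower m/z bins
spectra[1, 15:] += 3.0  # region 1 is intense in the upper m/z bins
region = np.zeros((height, width), dtype=int)
region[:, width // 2:] = 1

# Flatten the image to a (pixels x m/z bins) matrix and add small positive noise.
X = spectra[region.ravel()] + 0.05 * rng.random((height * width, n_mz))

model = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(X)  # pixels x components

# Scale each component to unit maximum so components are comparable (assumed choice).
W_scaled = W / (W.max(axis=0, keepdims=True) + 1e-12)

# One image per spatial component, plus a label map of the dominant component.
component_images = W_scaled.T.reshape(2, height, width)
label_map = W_scaled.argmax(axis=1).reshape(height, width)
print(label_map)
```

The label map here is only a convenience for visual inspection; the component images themselves carry the graded spatial information that a hard cluster assignment would discard.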

Reviewer #2:

In the presentation of the work, the article has a nice beginning; nonetheless, it has to be examined in order to help the reader comprehend “Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization." Despite the fact that I believe the work does not satisfy the requirements for publication in PLOS ONE and that there are some concerns that need to be addressed, I strongly propose a comprehensive review that will add value to the results that were acquired through discussion. The writers need to make some changes to the paper. On the other hand, I would like to provide the authors with the following remarks and suggestions:

Response: We thank the reviewer for their careful critique of our work and for providing a number of insightful suggestions to improve it.

Reviewer #2:

1. The abstract does not communicate well; it must be revised.

Response: We have now revised our abstract to clarify the main contributions of the work as being three novel MSI data analysis techniques that leverage the spatial and spectral interpretability of NMF compressed MSI data (pp. 1 - 2).

Reviewer #2:

2. In particular, it is not entirely evident how this publication contributes to the body of previous research when compared to other papers that have been published. Because of this, I am unable to propose that the current version be accepted.

Response: As noted above, we have now revised the abstract to better communicate the novelty of our work (pp. 1 - 2).

We have also added a more thorough literature review, specifically highlighting the novelty of our work in terms of NMF as it applies to downstream MSI data processing (pp. 5 - 6).

Reviewer #2:

3. In Background and related work section, the author can introduce more literature and analyze its shortcomings to highlight the advantages and innovations of this paper.

Response: We have now added a more thorough literature review, specifically highlighting the novelty of our work on NMF data compression as it applies to downstream MSI data processing. We have also emphasized how our work relates to existing approaches (pp. 5 - 6).

Reviewer #2:

4. The results themselves need to be explained, which is why there must be a section or paragraph dedicated to the discussion along with an appropriate comparison table of the suggested work.

Response: We have added subheadings in the discussion section to highlight the material related to results for distinct topics in our paper and to improve readability (pp. 22 - 25).

Since the novelty of our work lies in data processing methodologies (i.e., how to integrate NMF into data processing pipelines for downstream analyses), it is hard to perform a quantitative comparison with other methods.

To establish a benchmark for how NMF and PCA perform on the novel dataset used in this work, we have added a table comparing the PCA and NMF results (p. 22).

We have also modified Fig 10 to show how the methodologies described in this paper, when applied to NMF compressed MSI data, compare against MSI data compressed with PCA (p. 21).

Reviewer #2:

5. More specifically, only simulated results were presented, and there was no attempt made to verify the suggested work by practical means in the tabular form.

Response: All of the results in this paper are based on a novel MSI dataset obtained from an animal model of comorbid visceral pain hypersensitivity, which is described in the Animal Model, Tissue Preparation, Mass spectrometry imaging and staining, and Datasets sections (pp. 7 - 9). We have reworded the abstract to emphasize the novelty of this dataset, which we use for validating the methods (pp. 1 - 2). As noted above, we have also added a table comparing PCA and NMF approaches as applied to this novel dataset (p. 22).

Reviewer #2:

6. Add some recently proposed techniques (2020-2023) in the related work section of the manuscript.

Response: We have added a section on MSI data processing highlighting recent contributions and their connection to this work. We have updated the bibliography with the respective literature (references 21 - 24) (pp. 5 - 6).

Reviewer #2:

7. There are too many spelling and grammar mistakes in the paper. It needs proper spelling and grammar checking.

Response: We have thoroughly proofread the manuscript for spelling and grammatical mistakes and have resolved all issues that we found.

Reviewer #2:

8. Conclusion section should be extended by mentioning the advantages, disadvantages, and limitations of the study.

Response: We have significantly expanded the conclusion, describing in detail the advantages, disadvantages and limitations of our study (pp. 25 - 26).

Attachment

Submitted filename: Response_to_Reviewers_PONE-D-23-15506R1.docx

pone.0300526.s004.docx (30.2KB, docx)

Decision Letter 1

Bardia Yousefi

29 Feb 2024

Interpretable dimensionality reduction and classification of mass spectrometry imaging data in a visceral pain model via non-negative matrix factorization

PONE-D-23-15506R1

Dear Dr. Abshire,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Bardia Yousefi, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Authors responded well to the comments received. I recommend accepting this manuscript

Congratulations

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors responded well to my comments.

In particular, the response regarding novelty was sufficient.

I have no further comments.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Bardia Yousefi

21 Mar 2024

PONE-D-23-15506R1

PLOS ONE

Dear Dr. Abshire,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Bardia Yousefi

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Fig. NMF spatial maps for the data cohort.

    Each row shows 20 NMF spatial maps for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

    (TIF)

    pone.0300526.s001.tif (2.1MB, tif)

    S2 Fig. PCA spatial maps for the data cohort.

    Each row shows 20 PCA spatial maps for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

    (TIF)

    pone.0300526.s002.tif (2.2MB, tif)

    S3 Fig. Overlay of NMF spatial maps on the H&E images.

    Overlay of 20 NMF spatial maps over the corresponding H&E image for each of the 8 datasets. The first four rows represent the CPH data and the last four rows represent the naïve data.

    (TIF)

    pone.0300526.s003.tif (4.3MB, tif)

    Attachment

    Submitted filename: Response_to_Reviewers_PONE-D-23-15506R1.docx

    pone.0300526.s004.docx (30.2KB, docx)

    Data Availability Statement

    All MSI data files are available from the open science website Zenodo (doi: 10.5281/zenodo.7901681).


    Articles from PLOS ONE are provided here courtesy of PLOS
