Abstract
Breast cancer poses a significant global health challenge and requires improved diagnostic solutions for timely intervention and treatment. Real-time diagnostic approaches in current practice offer promising avenues for early detection, but they often lack specificity, necessitating the development of robust diagnostic tools for real-time applications. In the current study, fluorescence spectroscopy is integrated with machine learning algorithms, and a graphical user interface (GUI) is developed for rapid breast cancer prediction. A total of 206 native fluorescence spectra (103 each from 31 normal and 31 malignant breast tissues) were recorded using 325 nm excitation, followed by discrimination analysis using different machine learning algorithms, including backpropagation artificial neural network (BP-ANN), support vector machine (SVM), and Naïve Bayes (NB). Comparative analysis reveals that SVM with a polynomial kernel gave the best performance, with an accuracy of 98.78%, a sensitivity of 100%, a specificity of 97.56%, and a precision of 97.62%, among other metrics. Furthermore, the in-house developed GUI applied to the current data showed the possibility of real-time prediction of pathological breast tissues, facilitating standalone applications.
1. Introduction
Despite advancements in medical sciences and technology, cancer remains a significant global health concern. According to the Global Cancer Observatory, female breast cancer ranked first among all cancers. Traditional screening methods such as mammography, while effective for early detection and treatment, often yield false-positive results in the case of dense breasts. Therefore, there is an urgent need for efficient, accurate, and rapid screening technologies for the early detection and diagnosis of breast cancer. Optical spectroscopy, including native fluorescence (autofluorescence), has shown immense potential for studying cellular components and tissue pathology. In autofluorescence, the emission arises from intrinsic fluorophores present in the specimen when they are excited at suitable excitation wavelengths, and several intrinsic fluorophores in biological specimens show such fluorescence. Alterations in tissue structure, metabolic activity, and the tumor microenvironment related to breast cancer can change the concentration and distribution of fluorophores such as collagen, elastin, reduced nicotinamide adenine dinucleotide (NADH), and flavin adenine dinucleotide (FAD), resulting in observable differences in their absorbance and fluorescence spectra. Autofluorescence spectroscopy, which exploits the natural fluorescence properties of endogenous fluorophores in cells and tissues at appropriate excitations, has shown high sensitivity with reduced sample preparation steps, making it suitable for studying various cancer types. Several research studies have demonstrated the efficacy of fluorescence spectroscopy in cancer diagnosis using machine learning algorithms, including studies based on Stokes shift spectroscopy, synchronous luminescence spectroscopy, and native fluorescence spectroscopy.
The application of machine learning algorithms to spectral data analysis for differentiation and discrimination of various tissue pathological conditions has revolutionized the field of biomedical spectroscopy. Researchers around the globe have shown several-fold improvement in the classification of spectral data by extracting meaningful information from them using various statistical and machine learning tools, such as linear discriminant analysis (LDA), principal component analysis (PCA), support vector machine (SVM), and many more. These algorithms analyze spectral data to identify key features that differentiate between normal and malignant tissues, leading to accurate classification and diagnosis of cancer. While techniques such as PCA have been widely used for feature reduction, they may suffer from limitations such as information loss and susceptibility to noise. Researchers advocate for feature selection techniques over feature extraction methods to address these challenges. Feature selection helps eliminate noise and redundant features, improving the accuracy and reliability of cancer diagnosis.
Among various feature selection methods, Minimum Redundancy Maximum Relevance (mRMR) offers distinct advantages by simultaneously maximizing feature relevance to the target class while minimizing redundancy among selected features, making it particularly effective for spectral data where adjacent wavelengths exhibit high correlation. Unlike transformation methods such as PCA, mRMR preserves the original features rather than creating abstract components, maintaining the physical interpretability of spectral bands for more meaningful analysis. Furthermore, mRMR demonstrates superior computational efficiency compared to wrapper methods while handling feature interactions more effectively than univariate approaches, resulting in more robust models that are less prone to overfitting when analyzing high-dimensional spectral data sets.
Fluorescence spectroscopy, combined with advanced data analysis techniques, holds great promise for the early detection and diagnosis of breast cancer. By leveraging machine learning algorithms and innovative spectroscopy methods, it is possible to achieve high accuracy and reliability in cancer classification, leading to better patient outcomes and reducing the burden of this devastating disease on society. Further research and development in this field are essential to realizing the full potential of optical spectroscopy in cancer diagnosis and treatment. Therefore, in the present study, native fluorescence spectra of normal and breast tumor tissues were recorded at 325 nm excitation and subjected to machine learning-based discrimination analysis with separate training and testing sets. Further, an attempt has been made to develop a GUI-based machine learning analysis to predict unknown new data for standalone applications.
2. Materials and Methods
2.1. Experimental Setup
Figure 1 shows the experimental setup used in the current study. The setup consists of a He–Cd laser (325 nm) as the excitation source, a 7-fiber optical probe (1 fiber for excitation and 6 for collection of native fluorescence), and an Ocean Optics QE Pro spectrograph (grating: 300 lines/mm, blazed at 300 nm) for spectral dispersion and detection. It also includes a 365 nm longpass filter (Schneider Kreuznach, Germany) to block Rayleigh scattered light and a computer for recording and storing the spectra.
Figure 1. Instrumentation setup used to record native fluorescence spectra of normal and malignant breast tissues ex vivo at 325 nm excitation. Various components of the setup are marked as He–Cd laser, power supply, optical lenses, YZ translational stage with a fiber coupler and an SMA adapter, fiber probe consisting of a single fiber, seven fiber assembly, six fiber assembly (200 μm) for collection of native fluorescence, longpass filter (allowing >365 nm), spectrograph-CCD for spectral dispersion and detection, and a computer for recording/storing native fluorescence data.
2.2. Data Collection and Handling
After obtaining the institutional ethics committee approval, surgically resected malignant breast tissue samples were collected from consenting patients. The control samples were selected from the uninvolved areas of the same samples under study. The collected breast tissues were of different sizes, ranging from approximately 15 to 20 mm² in surface area, with a thickness of 2–4 mm. The samples were mounted on a quartz plate and excited by a 325 nm He–Cd laser, and the corresponding native fluorescence spectra were recorded using a spectrograph in the spectral range of 375–650 nm. The samples were kept moist with normal saline during spectral acquisition, and three spectra were recorded at each of 3–4 different sites on every sample. The three spectra from each site were averaged to obtain the mean spectrum, which was used for discrimination analysis of the malignant samples from the normal. Thus, a total of 206 spectra (103 normal, 103 malignant) from 31 normal and 31 malignant tissue samples from 31 different patients were obtained in this study. The majority of the cancer tissues were classified as grade II or grade III and at stages II to III.
2.3. Data Analysis
2.3.1. Preprocessing of the Spectra
The recorded spectra in the present study were subjected to baseline correction, region of interest selection, smoothing using filter functions to remove unwanted noise, and, finally, normalization with respect to the highest peak in each spectrum. This was carried out using in-house code written in MATLAB R2019b.
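The preprocessing chain described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code: the function names are hypothetical, the edge handling of the median filter is an assumption, and the baseline step simply subtracts the minimum intensity because the source does not specify the baseline-correction method. The 375–650 nm region and the median filter of order 10 are taken from the Results section.

```python
from statistics import median


def median_filter(signal, window=10):
    """Sliding median over roughly `window` points (window // 2 on each side).

    The window is truncated at the ends of the signal (an assumption).
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(median(signal[lo:hi]))
    return smoothed


def preprocess(wavelengths, intensities, roi=(375.0, 650.0), window=10):
    """Return (wavelengths, intensities) restricted to the ROI and normalized."""
    # 1. Region-of-interest selection.
    pairs = [(w, y) for w, y in zip(wavelengths, intensities)
             if roi[0] <= w <= roi[1]]
    wl = [w for w, _ in pairs]
    y = [v for _, v in pairs]
    # 2. Smoothing with a median filter.
    y = median_filter(y, window)
    # 3. Baseline correction: minimum subtraction (placeholder assumption).
    baseline = min(y)
    y = [v - baseline for v in y]
    # 4. Normalization with respect to the highest peak.
    peak = max(y)
    if peak == 0:
        raise ValueError("flat spectrum cannot be normalized")
    return wl, [v / peak for v in y]
```

After this step every spectrum has a maximum of 1, so intensities at fixed wavelengths can be compared across samples.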
2.3.2. Feature Selection
In the current study, a filter-based feature selection algorithm known as Minimum Redundancy Maximum Relevance (mRMR) was used. The mRMR was selected for its ability to identify features that are highly relevant to the classification task while minimizing redundancy among the selected features, which is particularly important in spectral data where adjacent wavelengths may be highly correlated. The mRMR uses mutual information to maximize the relevance of the selected features to the target class while minimizing the redundancy among the selected features themselves. After applying this method, a ranking of the features by importance was obtained, and the top 3 features at wavelengths 400.6 nm (feature 1), 453.2 nm (feature 2), and 477.6 nm (feature 3) were selected for further analysis.
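The greedy idea behind mRMR can be illustrated with a short sketch. It assumes equal-width binning of the intensities and the difference criterion (relevance minus mean redundancy); the source does not state which mRMR variant or discretization the authors' MATLAB implementation used, so the function names, the `bins` parameter, and the scoring rule here are illustrative assumptions.

```python
from collections import Counter
from math import log2


def discretize(values, bins=4):
    """Equal-width binning of one feature column (assumed discretization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr(columns, labels, k, bins=4):
    """Greedy mRMR ranking; `columns` holds one intensity list per wavelength."""
    disc = [discretize(col, bins) for col in columns]
    relevance = [mutual_information(d, labels) for d in disc]
    # Start with the single most relevant feature.
    selected = [max(range(len(disc)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(disc)):
        best, best_score = None, float("-inf")
        for j in range(len(disc)):
            if j in selected:
                continue
            # Mean mutual information with the features already chosen.
            redundancy = sum(mutual_information(disc[j], disc[s])
                             for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With a duplicated column in the input, the duplicate is ranked after a less relevant but independent column, which is the behavior that makes the method attractive for neighboring, highly correlated wavelengths.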
2.3.3. Spectral Intensity Ratio Calculation on Selected Features
The intensity ratios of selected features 2 and 3 versus feature 1 were calculated for each of the normalized spectra under study, as shown in Table 1. These ratios were subsequently used as the input feature matrix for training and testing the machine learning models.
Table 1. Details of the Intensity Ratios of Features in Normalized Spectra.
| normalized intensity ratio | features compared | corresponding wavelengths (nm) |
|---|---|---|
| R1 | feature 2/feature 1 | 453.2/400.6 |
| R2 | feature 3/feature 1 | 477.6/400.6 |
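The ratio calculation in the table can be written as a small helper. This is a sketch under stated assumptions: the nearest-wavelength lookup and the function names are illustrative and are not taken from the authors' code; only the three feature wavelengths and the ratio definitions come from the text.

```python
# Feature wavelengths (nm) reported in the text: features 1, 2, and 3.
FEATURE_WAVELENGTHS_NM = (400.6, 453.2, 477.6)


def nearest_index(wavelengths, target):
    """Index of the sampled wavelength closest to `target` (assumed lookup)."""
    return min(range(len(wavelengths)),
               key=lambda i: abs(wavelengths[i] - target))


def intensity_ratios(wavelengths, intensities,
                     features=FEATURE_WAVELENGTHS_NM):
    """Return (R1, R2) = (feature 2 / feature 1, feature 3 / feature 1)."""
    i1, i2, i3 = (intensities[nearest_index(wavelengths, w)]
                  for w in features)
    if i1 == 0:
        raise ValueError("feature 1 intensity is zero; ratio undefined")
    return i2 / i1, i3 / i1
```

Each normalized spectrum is thereby reduced to a two-element vector (R1, R2), which is the input used by the classifiers.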
2.4. Machine Learning
In the current study, the performance of artificial neural networks (ANNs) was compared across different training algorithms: resilient backpropagation (RP), scaled conjugate gradient (SCG), and gradient descent with adaptive learning rate (GDA). Similarly, support vector machine (SVM) classifiers were compared across different kernel functions: radial basis function (RBF), polynomial, and linear. Naïve Bayes-based classification was also attempted. To ensure the generalizability of the models, k-fold cross-validation (k = 5) was employed during the training process.
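A 5-fold split of the kind mentioned above can be sketched as follows. The stratification by class and the fixed random seed are assumptions added for illustration; the source states only that k = 5 was used, and the function names are hypothetical.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_indices(folds, held_out):
    """Indices for one round: fold `held_out` is tested, the rest train."""
    test = sorted(folds[held_out])
    train = sorted(i for f, fold in enumerate(folds)
                   if f != held_out for i in fold)
    return train, test
```

Each of the five rounds trains on four folds and evaluates on the remaining one, so every spectrum is used for validation exactly once.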
2.5. Graphical User Interface Using MATLAB
In the current study, a MATLAB-based GUI was designed and developed to predict new spectral data so that it can be used without prior coding knowledge. A user interface waits for an action from the end-user, such as clicking a button, and then returns the outcome as a graph, a text message, or similar. The GUI can also be installed as a standalone application by users with no knowledge of MATLAB coding, making it suitable for user-friendly and remote applications. It is an interactive platform for real-time classification of data into two or more classes, and it simplifies data analysis by integrating the trained machine learning model at the back end, making it accessible for clinical and research applications. GUIs are currently the most prevalent and recognized user interface for computers. The GUI has an input and an output component. The measured data after spectral preprocessing, or the spectral features selected from the data, are fed to the input in a particular format, and a specific operation then provides the required outputs in a particular format. The input and output are not specific to any particular type of spectral pattern and can be extended to other types of data. The GUI designed in this study accepts spectral data files in .spc format and displays the corresponding 2D fluorescence spectrum alongside presaved reference spectra for normal and malignant breast tissue. Upon loading the spectral data, the file name appears in the status bar, ensuring traceability. The system applies preprocessing steps, including spectral normalization, to enhance data consistency before classification.
When the “Predict” button is pressed, the model classifies the spectrum, triggering a green indicator for normal tissue or a red indicator for malignant tissue. Additionally, users can generate a diagnostic report in .pdf format via the “Generate and Print Report” button, summarizing the prediction results for documentation and further analysis. While this study focuses on the fluorescence spectral range specific to breast tissue analysis, the GUI framework can be extended to accommodate other spectral data sets with appropriate model training.
3. Results and Discussion
This study attempts to classify breast tissues as malignant or normal by native fluorescence spectroscopy. The in-house-designed and developed experimental setup was used to record native fluorescence from the tissues. The spectral processing was performed using in-house developed MATLAB codes. Minor fluctuations in the signal that were attributed to noise during signal recording were filtered using a median filter of order 10. Further, baseline correction and unity normalization were performed. A noticeable variation in the native fluorescence signal was observed between normal and malignant samples in the wavelength range from 375 to 650 nm, and this range was taken as the region of interest (ROI) for further analysis.
Figure 2 shows the typical preprocessed, normalized mean native fluorescence spectra of the normal and malignant breast tissues under study. The spectral peaks for both normal and malignant tissues are due to endogenous tissue fluorophores such as collagen, elastin, and NADH. In normal tissue, these peaks represent the distribution of these fluorophores in the healthy state. In malignant tissue, however, changes in composition, metabolism, and the tumor microenvironment can alter the concentration and distribution of these fluorophores, leading to differences in native fluorescence intensity and spectral characteristics compared to normal tissue. These variations hold potential as discriminative parameters for distinguishing between normal and malignant conditions and may aid diagnostic and prognostic applications in biomedical spectroscopy and imaging.
Figure 2. Typical preprocessed, normalized mean native fluorescence spectra of normal and malignant breast tissues.
The recorded spectra in the current study showed a distinct minor peak due to collagen/elastin at ∼409 nm and a major peak due to NADH at ∼435 nm for normal tissues, and peaks at ∼409 and ∼460 nm due to collagen/elastin and NADH, respectively, for malignant tissues. In contrast, a study by Chowdary et al. on breast pathological tissues reported the major peak due to collagen at 390 nm and the minor peak due to NADH at ∼460 nm for malignant breast tissues, and the minor peak at ∼390 nm due to collagen and the major peak at ∼460 nm due to NADH for normal breast tissues. However, the experimental conditions in their study were different (325 nm pulsed laser light for sample excitation and a spectrograph-ICCD combination for spectral recording), which may be the reason behind the variation.
The recorded native fluorescence spectra were subjected to machine learning-based classification analysis. To improve the predictive ability of the machine learning model, a subset of features from the entire spectra was selected, because the available features might contain redundant information that could negatively impact the performance of the model. Therefore, in the current study, a filter-based feature selection method, the mRMR algorithm, was used to rank the most relevant features, thereby reducing the feature redundancy. The mRMR algorithm ranked 344 features in the ROI under study at an increment of ∼0.78 nm. The ranking of all 344 features is shown in Figure 3, with the top 3 features highlighted. Feature 1 has the highest prediction score of 0.65, indicating that it is the best feature to distinguish between normal and malignant. The prediction scores for feature 2 and feature 3 were 0.177 and 0.175, respectively, and the scores decreased drastically for the remaining features. These top 3 features (features 1, 2, and 3) were thus considered for further analysis in the study. Feature 1 corresponds to an emission wavelength of 400.6 nm, representing collagen emission, and features 2 and 3 correspond to emission wavelengths of 453.2 and 477.6 nm, respectively, representing NADH emission. Increased metabolic activity in cancer cells often leads to elevated NADH levels, while changes in the extracellular matrix can affect collagen fluorescence. The tissue fluorophores collagen and NADH are already known biomarkers for discriminating normal from malignant tumor tissues, and they were also identified by the feature selection method in the present study. This observation suggests that the mRMR is an efficient feature selection method for removing data redundancy.
The mRMR minimizes the correlation between different features of a target class and maximizes the correlation between the features and the target classes. The correlation plot of the top 3 features for normal and malignant is shown in Figure 4.
Figure 3. Plot showing mean native fluorescence spectra of normal and malignant breast tissues (bottom X-axis: wavelength; left Y-axis: normalized intensity) along with the corresponding mRMR feature ranking (top X-axis: feature number; right Y-axis: prediction rank value). Highlighted numbers 1–3 correspond to the top 3 features having position (1)/feature 1: X-axis = 400.6 and Y-axis = 0.65, position (2)/feature 2: X-axis = 453.2 and Y-axis = 0.177, position (3)/feature 3: X-axis = 477.6 and Y-axis = 0.175, where the X-axis (bottom) corresponds to wavelength in nanometers, and the Y-axis on the right corresponds to prediction importance score (rank value).
Figure 4. Correlation plot of the top 3 features for normal and malignant spectra. 1CF, 2CF, and 3CF are the top 3 malignant features and 1NF, 2NF, and 3NF are the top 3 normal features.
The correlation plot reveals that all features exhibit predominantly low correlation with each other, consistent with the mRMR selection. Notably, features 2 and 3 of malignant samples demonstrate a positive correlation with an increasing feature index, whereas the other features remain uncorrelated. Further, scatter plots of the normalized spectral intensities of features 1, 2, and 3 versus sample number demonstrated clear discrimination between normal and malignant samples for feature 1 and partial discrimination for features 2 and 3, as shown in Figure 5.
Figure 5. Scatter plot of top 3 features (a–c) for normal and malignant spectra against sample numbers. (a) Feature 1 versus sample number, (b) feature 2 versus sample number, and (c) feature 3 versus sample number.
Further, when the intensities of the normal and malignant native fluorescence spectra at features 1, 2, and 3 in the normalized spectra were compared, they demonstrated clear variation between normal and malignant spectra, as shown in Figure 6. The Mann–Whitney U test also showed significance (P < 0.0001) for all 3 features (Figure 6). Using these intensities, the intensity ratios R1 and R2 (Table 1) were calculated for each of the normal and malignant spectra under study and plotted, as shown in Figure 7. The figure clearly differentiated normal from malignant, suggesting that these 2 ratios were sufficient for the machine learning-based classification of the spectra under study, reducing the input feature dimension from 3 to 2. The reduced features were then used as input features for the machine learning (ML) algorithms in the study. Out of 206 (103 normal, 103 malignant) spectra, 124 (62 normal, 62 malignant) spectra (approximately 60%) were used for training, and the remaining 82 (41 normal, 41 malignant) spectra (approximately 40%) were used for testing the model.
Figure 6. Plot of wavelength position for (a) feature 1, (b) feature 2, and (c) feature 3 in the normalized averaged native fluorescence spectra of normal and malignant samples under study (*** significance using Mann–Whitney’s U test having P < 0.0001).
Figure 7. Scatter plot showing the training set corresponding to R1 versus R2 for both normal and malignant groups.
In the current study, a backpropagation ANN was modeled with 3 layers: an input layer (124 spectra × 2 intensity ratios), 1 hidden layer with 8 neurons, and an output layer with 2 binary outputs corresponding to normal (0) and malignant (1). Once this model was trained, it was used to predict the test data, generating prediction scores between 0 and 1. A score <0.5 was assigned to the normal class, and a score >0.5 was assigned to the malignant class. Upon performance evaluation, ANN-RP demonstrated an accuracy of 97.56% and sensitivity, specificity, and precision of 100, 95.12, and 95.35%, respectively. ANN-SCG demonstrated an accuracy of 98.37% and sensitivity, specificity, and precision of 100, 97.56, and 97.62%, respectively. ANN-GDA demonstrated an accuracy of 98.78% and sensitivity, specificity, and precision of 100, 97.56, and 97.62%, respectively.
Further, a support vector machine (SVM) with three different kernels (RBF, linear, and polynomial of order 3) was trained and tested on the same data (124 spectra × 2 intensity ratios for training). The trained model was then used to calculate the prediction score values (0 for normal and 1 for malignant). Among the kernels, SVM-RBF demonstrated an accuracy of 97.56% and sensitivity, specificity, and precision of 100, 95.12, and 95.35%, respectively. SVM-polynomial demonstrated an accuracy of 98.78% and sensitivity, specificity, and precision of 100, 97.56, and 97.62%, respectively. SVM-linear demonstrated an accuracy of 97.56% and sensitivity, specificity, and precision of 100, 95.12, and 95.35%, respectively.
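For reference, the three kernels compared here are commonly written in the following textbook forms, with x = (R1, R2) denoting the two-ratio input vector. The scale γ and offset c are generic parameters introduced for illustration; their values are not reported in the source, and only the polynomial order of 3 is taken from the text.

```latex
K_{\mathrm{linear}}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}', \qquad
K_{\mathrm{poly}}(\mathbf{x}, \mathbf{x}') = \left(\gamma\,\mathbf{x}^{\top}\mathbf{x}' + c\right)^{3}, \qquad
K_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma\,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
```

The linear kernel yields a straight decision boundary in the R1–R2 plane, whereas the polynomial and RBF kernels allow curved boundaries.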
Similarly, a Naïve Bayes classifier was trained with the same input features, and prediction score values of less than 0.5 were considered normal, and greater than 0.5 were considered malignant. The trained model showed an accuracy of 97.56% and sensitivity, specificity, and precision of 100, 95.12, and 95.35%, respectively. The overall accuracy, sensitivity, specificity, precision, F-score, AUC, and MCC values of the models under study are listed in Table 2.
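The reported metrics follow from the confusion-matrix counts on the 82 test spectra. The sketch below uses the standard definitions; the counts in the usage example (41 true positives, 0 false negatives, 40 true negatives, 1 false positive) are inferred from the reported percentages for the SVM polynomial model and the 41/41 test split rather than quoted from the source, and the function name is hypothetical.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics, with malignant as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
    }


# Inferred counts for the SVM polynomial model on the 82 test spectra.
metrics = classification_metrics(tp=41, fn=0, tn=40, fp=1)
for name in ("accuracy", "sensitivity", "specificity", "precision"):
    print(name, round(100 * metrics[name], 2))
# accuracy 98.78, sensitivity 100.0, specificity 97.56, precision 97.62
```

The same function with 2 false positives instead of 1 gives 97.56, 100, 95.12, and 95.35%, matching the values reported for ANN-RP, SVM-RBF, SVM-linear, and Naïve Bayes.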
Table 2. Performance Metrics Showing Accuracy, Sensitivity, Specificity, Precision, F-Score, MCC, and AUC Values for All of the ML Algorithms under Study.
From Table 2, it is clear that the SVM polynomial of order 3, ANN-GDA, and ANN-SCG perform better than the other algorithms. For the SVM, this may be due to the ability of polynomial kernels to model nonlinear relationships between features, which may have been advantageous in capturing the complex spectral signatures associated with breast cancer compared to linear or RBF kernels. The score values generated by the model were written automatically to an Excel file in a specified folder using in-house developed MATLAB code. Supporting Table S1 shows the score values for the samples under study and the “Match” condition in the form of “Yes/No” for the SVM polynomial kernel model. Figure 8 shows the training, testing, and classified data plotted along with the decision boundary for the SVM polynomial model. The zoomed inset shows 1 mismatched sample, which is actually normal but was predicted as malignant.
Figure 8. Scatter plot of training, testing, and prediction data corresponding to 2 ratios R1 and R2 for both normal and malignant classes, along with the decision boundary. The purple arrow shows the misclassification.
In the current study, 5-fold cross-validation analysis using SVM polynomial showed consistent accuracy values of 97.56, 98.78, 96.57, 98.37, and 97.09%, with an average performance of 97.67% ± 0.91%, suggesting little evidence of overfitting and a generalized model. Therefore, this model was finalized as the prediction model in the back end of a graphical user interface (GUI) to classify any unknown new native fluorescence spectrum of normal or malignant breast tissue under investigation on a real-time basis. Figure 9(a) shows a screenshot of the GUI, consisting of a left panel with a display screen, a status bar, and a “Browse data” button. On the right side of the GUI, there are 2 indicators representing “Normal” and “Malignant” at the top, followed by the “Predict” and “Generate and Print Report” buttons and an “Exit” button. When a sample data file in “.spc” format is browsed, its corresponding two-dimensional (2D) spectrum is displayed on the screen along with the normalized “normal” and “malignant” spectra presaved in the GUI as a ready reference, as shown in Figure 9(b). The file name of the selected spectrum also appears on the “status bar”. After the spectrum is displayed, when the “Predict” button is pressed, a green indicator light turns ON for normal (Figure 9(c)) or a red indicator light for malignant (Figure 9(d)), based on the prediction by the trained model in the back end of the GUI. A report of the prediction analysis (green/red) can be generated by pressing the “Generate and Print Report” button. This operation provides the prediction for the loaded “.spc” file in “.pdf” form as a diagnosis report, as shown in Figure 9(e,f).
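The cross-validation summary quoted above can be checked directly from the five fold accuracies. The sketch assumes that the ± value is the sample standard deviation (n − 1 in the denominator), which is the choice that reproduces the reported figure; the function name is illustrative.

```python
from statistics import mean, stdev

# Fold accuracies (%) reported for the SVM polynomial model.
fold_accuracies = [97.56, 98.78, 96.57, 98.37, 97.09]


def summarize(values):
    """Return (mean, sample standard deviation) of the fold accuracies."""
    return mean(values), stdev(values)


avg, sd = summarize(fold_accuracies)
print(f"{avg:.2f} ± {sd:.2f}")  # prints 97.67 ± 0.91
```

A small spread across folds indicates that performance does not depend strongly on which spectra are held out.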
Figure 9. GUI for the prediction of new spectra given by the user. (a) Screenshot of the app developed. (b) Plot of the mean of normal and malignant along with the input sample. (c, d) The prediction result example for normal and malignant conditions. (e, f) The report generated with the option to print or download.
Table 3 highlights the performance of various breast cancer diagnostic techniques and compares them with the current study. Although biopsy remains the gold standard due to its high specificity, it is invasive and time-consuming. MRI, though highly accurate, is expensive and less accessible. Ultrasound, on the other hand, is more affordable and widely available but has relatively lower sensitivity and specificity for denser breasts, making it less reliable as a standalone diagnostic tool. In contrast, our technique achieves superior diagnostic performance and offers faster results and cost-effectiveness compared to MRI and biopsy. Being noninvasive, it eliminates patient discomfort while maintaining high reliability. With its ability to provide accurate and efficient detection, our technique has strong potential for clinical translation, making it a promising alternative for early diagnosis and routine screening in healthcare settings.
Table 3. Comparison of Various Breast Cancer Diagnostic Modalities with the Current Study.
| method | accuracy (%) | sensitivity (%) | specificity (%) | invasiveness |
|---|---|---|---|---|
| mammography | 77.9–89.3 | 60–97 | 64.5–80 | noninvasive |
| ultrasound | 74–85 | 61–87 | 75–76.8 | noninvasive |
| MRI | 86.9–98.4 | 72.2–94.6 | 66.7–74.2 | noninvasive |
| biopsy | 92.5 | 87–94.2 | 88.1–98 | invasive |
| current study | 98.78 | 100 | 97.56 | noninvasive |
A limitation of this study is the relatively small sample size (31 normal and 31 malignant cases), which may affect the statistical robustness and generalizability of the findings. This limitation can be addressed by including a larger sample size in future studies. Additionally, breast cancer is highly heterogeneous, with variations in molecular subtypes, metabolic profiles, and extracellular matrix composition potentially influencing the observed spectral signatures. These factors could contribute to variability in fluorescence responses, necessitating a cautious interpretation of the results.
Beyond sample size limitations, the clinical translation of fluorescence-based spectral diagnostics presents additional challenges. Larger-scale clinical trials are essential to establish the diagnostic accuracy, sensitivity, and specificity of this approach across diverse patient populations. Furthermore, regulatory approvals must be obtained to ensure compliance with clinical safety and efficacy standards. The integration of spectral-based techniques into existing diagnostic workflows also requires careful consideration, including compatibility with current imaging modalities, cost-effectiveness, and ease of use in clinical settings. Addressing these challenges will be crucial for advancing this technology toward routine clinical application and improving breast cancer detection and characterization.
4. Conclusions
The present study demonstrates the effectiveness of native fluorescence spectroscopy integrated with machine learning for the classification of normal and malignant breast tissues. By recording native fluorescence spectra and applying machine learning-based classification, we identified SVM polynomial as the most consistent and reliable model, achieving an accuracy of 98.78%, a sensitivity of 100%, and a specificity of 97.56%, with cross-validation supporting a generalized model with little evidence of overfitting. To translate this research into practical use, we developed a graphical user interface (GUI) based on this optimized model, enabling the real-time prediction of unknown samples in a user-friendly and time-efficient manner. This approach significantly reduces the time and effort required for fluorescence data acquisition and classification, offering a rapid, minimally invasive, and cost-effective alternative for breast cancer diagnosis. By providing accurate and immediate classification, this method has the potential to alleviate patient anxiety, expedite clinical decision-making, and reduce the economic burden associated with conventional diagnostic procedures. The integration of spectroscopy and machine learning in this study represents a promising step toward improving early breast cancer detection, facilitating timely interventions, and ultimately enhancing patient outcomes.
Supplementary Material
Acknowledgments
The authors thank the Manipal Academy of Higher Education (MAHE), Manipal, India, for infrastructure and facilities. K.K.M. would like to thank the Indian Council of Medical Research (ICMR), Government of India, New Delhi, for financial support under two grants (ref 17x(3)/Adhoc/33/2022-ITR and EM/Dev/SG/75/0782/2023) and infrastructure support from DBT-BUILDER (BT/INF/22/SP43065/2021) Govt. of India.
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.4c11669.
MATLAB code for implementing backpropagation artificial neural networks, support vector machines, and Naïve Bayes classifiers; extended descriptions of performance metrics, including cross-validation, confusion matrices, ROC curves, and Matthews correlation coefficient; general architecture of the backpropagation artificial neural network algorithm; and score values for test data using the SVM polynomial ML algorithm (PDF)
The authors declare no competing financial interest.