Abstract
Positive margin status after breast-conserving surgery (BCS) is a predictor of higher rates of local recurrence. Intraoperative margin assessment aims to achieve negative surgical margins at the first operation, thus reducing re-excision rates and the associated surgical complications, medical costs, and mental stress on patients. Microscopy with ultraviolet surface excitation (MUSE) can rapidly image tissue surfaces with subcellular resolution and sharp contrast by exploiting the thin optical sectioning provided by the shallow penetration of deep ultraviolet light. We have previously imaged 66 fresh human breast specimens topically stained with propidium iodide and eosin Y using a customized MUSE system. To achieve objective and automated assessment of MUSE images, a machine learning model was developed for binary (tumor vs. normal) classification of the obtained images. Features extracted by texture analysis and by pre-trained convolutional neural networks (CNNs) were investigated as image descriptors. Sensitivity, specificity, and accuracy above 90% were achieved for detecting tumor specimens. These results suggest the potential of MUSE with machine learning for intraoperative margin assessment during BCS.
Keywords: Breast-conserving surgery, tumor margin, ultraviolet, fluorescence microscopy, machine learning
1. INTRODUCTION
Breast cancer is the most commonly diagnosed cancer and the second leading cause of cancer deaths among women in the United States.1 BCS, or lumpectomy, followed by whole-breast radiation therapy is a treatment option for early-stage patients. However, positive margin status after BCS indicates a higher risk of local cancer recurrence. Therefore, women with positive margins after BCS are usually recommended to undergo one or more additional surgeries to achieve negative margins. Intraoperative margin assessment aimed at achieving negative surgical margins during the first operation not only reduces financial cost but also spares patients additional mental stress, poorer cosmesis, and the risks of further surgical complications. A positive margin is defined as tumor cells at the surface of the excised breast tissue, which suggests that the cancer has not been removed completely. Since the release of the Society of Surgical Oncology and the American Society for Radiation Oncology (SSO-ASTRO) 2014 guidelines, “no ink on tumor” has been adopted by many physicians as the standard for an adequate margin in invasive breast cancer.2 Among existing and emerging margin assessment technologies, MUSE3 exploits the strong absorption of deep ultraviolet light by proteins, deoxyribonucleic acid (DNA), and other biomolecules in tissue, which confines excitation to a thin superficial layer and enables high-resolution fluorescence imaging of fresh, unfixed specimens without complex instrumentation. Studies have investigated the feasibility of using MUSE for breast tumor detection on unfixed human breast specimens.4-6 Nevertheless, visual interpretation of MUSE images by non-pathologists is a qualitative and subjective process that may suffer from inadequate experience, fatigue, or unintended errors. To overcome these limitations and achieve a robust diagnosis, reliable and automated assessment of breast MUSE images is desired to aid positive surgical margin detection in operating room settings.
2. METHODS
2.1. Image data
A 285 nm deep ultraviolet LED (M285L4, Thorlabs) was used for fluorescence excitation. A 4× apochromatic long-working-distance objective lens (Plan Fluor 4×, numerical aperture NA = 0.13) and a USB 3.0 charge-coupled device (CCD) color camera (MTR3CCD06000KPA, ToupTek) were used for image acquisition. Propidium iodide and eosin Y were selected for topical staining of cell nuclei and cytoplasm, respectively. Details of the experimental setup and protocol can be found in a previous publication.6 Fresh human breast specimens were collected from lumpectomy, mastectomy, and breast reconstruction cases by the Medical College of Wisconsin (MCW) Tissue Bank from 2018 to 2020. All specimens were de-identified, and the study was approved by the MCW Institutional Review Board (IRB) and Institutional Biosafety Committee (IBC). In total, 66 specimens (42 tumor and 24 normal) were received and imaged using the established protocol. After imaging, specimens were returned to the MCW Tissue Bank to obtain hematoxylin and eosin (H&E) images of the same tissue surface, which serve as the gold standard for diagnosis. Tumor regions and annotations of adipose, stroma, and other structures on the MUSE images were delineated using the H&E histopathology images as reference. Stitched MUSE images of full tissue surfaces were divided into small non-overlapping patches of 400×400 pixels (0.51×0.51 mm2 actual size), and each patch was labeled as adipose, stroma, other normal, or tumor according to the annotation. A total of 36,128 patches were extracted: 6,685 adipose, 13,624 stroma, 4,724 other normal, and 11,095 tumor.
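As a rough illustration, the tiling step could be implemented as in the following minimal sketch; the label encoding, the majority-vote rule for patches spanning multiple annotations, and the array layout are assumptions, since the text does not specify these details.

```python
import numpy as np

PATCH = 400  # patch side length in pixels (~0.51 mm on the tissue surface)

def extract_patches(stitched_rgb, annotation_mask):
    """Tile a stitched MUSE image into non-overlapping 400x400 patches.

    stitched_rgb:    (H, W, 3) uint8 array of the full specimen surface.
    annotation_mask: (H, W) integer array; here 0=adipose, 1=stroma,
                     2=other normal, 3=tumor (a hypothetical encoding).
    Returns a list of (patch, label) pairs, where each patch is labeled by
    the majority class of the annotated pixels it covers.
    """
    patches = []
    h, w = annotation_mask.shape
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patch = stitched_rgb[y:y + PATCH, x:x + PATCH]
            window = annotation_mask[y:y + PATCH, x:x + PATCH]
            label = int(np.bincount(window.ravel()).argmax())
            patches.append((patch, label))
    return patches
```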
2.2. Feature extraction
Two approaches were investigated for extracting features from patch images. The first is texture analysis, which has been widely used in medical image analysis to quantitatively describe spatial patterns caused by variations of pixel or voxel intensities. First-order statistics (histogram mean, variance, skewness, kurtosis, and entropy), fractal measures (fractal dimension and lacunarity), the gray-level co-occurrence matrix (GLCM), the gray-level run length matrix (GLRLM), Gabor filtering, and local binary patterns (LBP) were computed on the red (R) channel of the patch images.7 More details about texture analysis on MUSE images can be found in a previous publication.8 The second approach applies CNNs pre-trained on a large image dataset as feature extractors; the outputs of deep layers of a well-trained CNN provide rich information about the input image. We explored AlexNet,9 Inception-V3,10 ResNet-18,11 ResNet-50,11 InceptionResNet-V2,12 and SqueezeNet13 trained on ImageNet14 to extract features from patch images. Each RGB patch image was resized to the input size of the network; for instance, a 400×400×3 pixel patch was re-scaled to 224×224×3 pixels for ResNet. Pre-processing steps including 2D Wiener filtering and intensity normalization were applied. Feature extraction methods and feature dimensionalities are summarized in Tab. 1. The t-SNE dimensionality reduction algorithm15 was used to visualize the study data, and Pearson’s correlation coefficients were calculated between features to check their linear dependencies.
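As a concrete illustration of one texture method, the four GLCM features listed in Tab. 1 below could be computed with scikit-image roughly as follows; the pixel offsets, orientations, and averaging scheme are assumptions, and the exact settings used in the study are given in ref. 8.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch_rgb):
    """Four GLCM texture features (contrast, correlation, energy, homogeneity)
    computed on the red channel of a 400x400x3 uint8 MUSE patch."""
    red = patch_rgb[:, :, 0]  # red channel, as used for texture analysis
    # One-pixel offsets at four orientations, averaged over directions.
    glcm = graycomatrix(red, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ("contrast", "correlation", "energy", "homogeneity")
    return np.array([graycoprops(glcm, p).mean() for p in props])
```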
Table 1.
Summary of feature extraction methods.
| | Method | Features | Feature dimensionality |
|---|---|---|---|
| Texture analysis | First-order | Mean, variance, skewness, kurtosis, entropy | 5 |
| | Fractal measures | Fractal dimension, lacunarity | 2 |
| | GLCM | Contrast, correlation, energy, homogeneity | 4 |
| | GLRLM | SRE, LRE, GLN, RLN, LGRE, HGRE, etc. | 11 |
| | Gabor filtering | | 4 |
| | LBP | | 14 |
| Pre-trained networks | AlexNet | FC-8 layer | 1000 |
| | Inception-V3 | Average pooling layer | 2048 |
| | ResNet-18 | Average pooling layer | 512 |
| | ResNet-50 | Average pooling layer | 2048 |
| | InceptionResNet-V2 | Average pooling layer | 1536 |
| | SqueezeNet | Global average pooling layer | 1000 |
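For example, the ResNet-50 features in Tab. 1 (output of the global average pooling layer, 2048 dimensions) could be extracted with a fixed, ImageNet-pretrained backbone as sketched below; this is one possible PyTorch implementation, and the normalization constants and weights identifier are assumptions rather than the study's actual code.

```python
import torch
from torchvision import models, transforms

# ImageNet-pretrained ResNet-50 used as a fixed feature extractor: the
# 1000-way classifier head is replaced by an identity so the forward pass
# returns the 2048-d output of the global average pooling layer.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),          # HxWx3 uint8 -> 3xHxW float in [0, 1]
    transforms.Resize((224, 224)),  # rescale the 400x400 patch to the network input size
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # standard ImageNet statistics
])

@torch.no_grad()
def resnet50_features(patch_rgb):
    """Return a 2048-d feature vector for one RGB MUSE patch."""
    x = preprocess(patch_rgb).unsqueeze(0)  # add a batch dimension
    return backbone(x).squeeze(0).numpy()
```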
2.3. Patch-level classification
For intraoperative margin assessment, surgeons only need to be informed whether there is tumor on the specimen surface; moreover, a full specimen image is usually too large for direct processing. Therefore, a linear support vector machine (SVM) or logistic regression classifier was trained for binary classification (tumor vs. normal) of patch images. Features extracted by the different methods were evaluated and compared, as was the combination of all texture analysis features. Performance metrics including sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC) were calculated by averaging performance across the five folds of 5-fold cross-validation. All patches extracted from the same specimen image were kept together, i.e., assigned entirely to either the training set or the test set. The Synthetic Minority Oversampling Technique (SMOTE) was investigated to address the class imbalance issue.16
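A minimal sketch of this evaluation loop is given below, using a logistic regression classifier; the specimen-grouped splitting, the SMOTE call, and the AUC-only summary are illustrative assumptions about implementation details not spelled out in the text.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

def patch_level_cv(X, y, specimen_ids, use_smote=True, n_splits=5):
    """Specimen-grouped 5-fold cross-validation for binary patch classification.

    X:            (n_patches, n_features) feature matrix.
    y:            patch labels (1 = tumor, 0 = normal).
    specimen_ids: specimen index per patch; GroupKFold keeps all patches
                  from one specimen in the same fold.
    Returns the mean test AUC over folds.
    """
    aucs = []
    splitter = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in splitter.split(X, y, groups=specimen_ids):
        X_tr, y_tr = X[train_idx], y[train_idx]
        if use_smote:
            # Oversample the minority class in the training fold only.
            X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        prob = clf.predict_proba(X[test_idx])[:, 1]
        aucs.append(roc_auc_score(y[test_idx], prob))
    return float(np.mean(aucs))
```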
2.4. Margin-level classification
Margin-level classification is performed on top of the patch-level classification results. A cluster-based weighted majority voting method was proposed for this task. In the prediction process, a full specimen image is divided into non-overlapping patches, each patch is predicted by the patch-level classifier (resulting in a patch-level classification grid map), and a cluster is defined as one or more adjacent patches with the same prediction label (normal or tumor). In a patch-level classification map, denote the total number of tumor clusters as $K$. Each tumor cluster is denoted as $C_k$ ($k = 1, \ldots, K$) and contains $n_k$ tumor patches. It is hypothesized that a larger cluster carries more prediction confidence. Therefore, each tumor cluster is assigned a weight $w_k$ by normalizing over the sizes of all tumor clusters:

$$w_k = \frac{n_k}{\sum_{j=1}^{K} n_j} \qquad (1)$$

The mean posterior probability of tumor prediction $\bar{p}_k$ is calculated from all patches in cluster $C_k$ as

$$\bar{p}_k = \frac{1}{n_k} \sum_{i \in C_k} p_i \qquad (2)$$

where $p_i$ is the posterior probability of tumor prediction for patch $i$. The tumor prediction score $S_{\mathrm{tumor}}$ for the full specimen image is obtained as the weighted mean posterior probability over all tumor clusters:

$$S_{\mathrm{tumor}} = \sum_{k=1}^{K} w_k \, \bar{p}_k \qquad (3)$$

Following a similar process, the normal prediction score $S_{\mathrm{normal}}$ is calculated from the normal clusters in the same patch-level classification map. The margin-level prediction rule is defined as

$$\hat{y} = \begin{cases} \text{tumor}, & S_{\mathrm{tumor}} \geq \tau \, S_{\mathrm{normal}} \\ \text{normal}, & \text{otherwise} \end{cases} \qquad (4)$$

The decision threshold is determined by the ratio $\tau$, a hyperparameter that can be set to an empirical value or obtained via a validation process.
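The cluster-based voting could be sketched as follows, based on the form of Eqs. (1)-(4) above; the use of 4-connected components and the direction of the threshold comparison are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import label

def cluster_score(pred_map, prob_map, target):
    """Weighted mean posterior probability over clusters of one class (Eqs. 1-3).

    pred_map: 2-D grid of patch predictions (1 = tumor, 0 = normal).
    prob_map: 2-D grid of the posterior probability of the *target* class.
    target:   class label (1 or 0) whose score is computed.
    """
    clusters, n_clusters = label(pred_map == target)  # 4-connected clusters
    if n_clusters == 0:
        return 0.0
    sizes = np.array([(clusters == k).sum() for k in range(1, n_clusters + 1)])
    weights = sizes / sizes.sum()                              # Eq. (1)
    means = np.array([prob_map[clusters == k].mean()           # Eq. (2)
                      for k in range(1, n_clusters + 1)])
    return float(np.sum(weights * means))                      # Eq. (3)

def margin_label_cluster_based(pred_map, tumor_prob_map, tau=1.0):
    """Eq. (4): predict 'tumor' when the tumor score is at least tau times
    the normal score; tau is the decision hyperparameter."""
    s_tumor = cluster_score(pred_map, tumor_prob_map, target=1)
    s_normal = cluster_score(pred_map, 1.0 - tumor_prob_map, target=0)
    return "tumor" if s_tumor >= tau * s_normal else "normal"
```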
Another, patch-based weighted majority voting method was also investigated for the decision fusion of margin-level specimen labels.17 Similarly, a full specimen image was divided into non-overlapping patches of 400×400 pixels and each patch was predicted by the patch-level classifier. The most discriminative patches (with an estimated posterior probability ≥ 75%) for both tumor and normal predictions were selected for decision fusion, and each selected patch was assigned a weight equal to its posterior probability. The margin-level label was obtained by comparing the weighted voting sums for the tumor and normal labels. Compared to cluster-based decision fusion, the patch-based weighted majority voting method does not consider spatial relationships between patches. Sensitivity, specificity, accuracy, and AUC were used to evaluate margin-level classification performance, and 5-fold cross-validation was implemented.
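Under the reading that a confident normal patch votes with its posterior for the normal class, the patch-based fusion rule might look like the sketch below; this is an interpretation of the description above, and the original formulation is given in ref. 17.

```python
import numpy as np

def margin_label_patch_based(tumor_probs, confidence=0.75):
    """Patch-based weighted majority voting for one specimen.

    tumor_probs: posterior tumor probabilities of all patches in the specimen.
    Only patches with a posterior >= `confidence` for either class vote, and
    each vote is weighted by that posterior probability.
    """
    p = np.asarray(tumor_probs, dtype=float)
    tumor_weight = p[p >= confidence].sum()                    # confident tumor votes
    normal_weight = (1.0 - p)[(1.0 - p) >= confidence].sum()   # confident normal votes
    return "tumor" if tumor_weight > normal_weight else "normal"
```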
3. RESULTS
3.1. Feature analysis
Fig. 1 shows t-SNE visualizations of the study data for six representative feature extraction methods. Each patch image is represented by a dot; tumors are indicated by red dots and the three normal types by green, black, and cyan dots. By visual inspection, tumors show a comparable degree of separation from normal tissue in the combined texture analysis, AlexNet, and ResNet-50 features, whereas they overlap more with a normal cluster in the GLRLM and Inception-V3 features. Six examples of linear correlations between features are shown in Fig. 2. ResNet-50 features have the least linear dependency; Inception-V3 and InceptionResNet-V2 features exhibit slightly higher linear dependency; AlexNet and SqueezeNet features have moderate linear dependency. The combination of all texture analysis features contains more correlation between individual features.
Figure 1.
Examples of t-SNE embedding of patch images. (a)-(c) are visualizations from LBP, GLRLM, and all texture analysis features. (d)-(f) are visualizations from features extracted by pre-trained AlexNet, Inception-V3, and ResNet-50 networks.
Figure 2.
Pearson’s linear correlation coefficients for (a) all texture analysis, (b) ResNet-18, (c) Inception-V3, (d) AlexNet, (e) SqueezeNet, and (f) InceptionResNet-V2 features.
3.2. Patch-level classification
ROC curves for patch-level classification are shown in Fig. 3. Pre-trained network features are indicated by solid lines and texture analysis features by dashed lines. When the classifier was trained directly on the training set, the combination of all texture analysis features achieved an ROC similar to that of AlexNet. When the SMOTE technique was used to address the class imbalance, performance improvements were observed for the texture analysis methods; however, SMOTE did not improve classification when the classifier was trained with pre-trained network features. Sensitivity, specificity, accuracy, and AUC are summarized in Tab. 2. Overall, features extracted by pre-trained networks provide better performance than those from texture analysis. Training with SMOTE benefits most texture analysis methods, yielding more balanced sensitivity and specificity, improved accuracy, and higher or similar AUC. This benefit was not observed for the high-dimensional pre-trained network features, whose AUC decreased when SMOTE was applied. The highest AUC (0.977) was achieved using ResNet-50 features.
Figure 3.
ROC for patch-level classification. (a) The patch-level classifier was trained directly by the training set. (b) The patch-level classifier was trained with SMOTE oversampling technique.
Table 2.
Summary of patch-level classification.
Trained without SMOTE

| | Method | Sensitivity | Specificity | Accuracy | AUC |
|---|---|---|---|---|---|
| Texture analysis | GLCM | 68.54% | 91.88% | 84.87% | 0.905 |
| | Gabor | 60.56% | 91.01% | 81.68% | 0.871 |
| | LBP | 81.31% | 88.74% | 87.60% | 0.927 |
| | Fractal | 76.99% | 88.90% | 85.93% | 0.918 |
| | GLRLM | 80.98% | 92.66% | 89.85% | 0.943 |
| | First-order | 58.71% | 90.43% | 80.43% | 0.869 |
| | All texture features | 84.33% | 92.56% | 90.95% | 0.955 |
| Pre-trained networks | AlexNet | 82.14% | 88.78% | 86.96% | 0.957 |
| | SqueezeNet | 79.59% | 91.21% | 87.40% | 0.968 |
| | Inception-V3 | 87.58% | 87.80% | 88.92% | 0.959 |
| | ResNet-18 | 83.50% | 92.56% | 89.88% | 0.963 |
| | ResNet-50 | 93.13% | 87.15% | 89.96% | 0.977 |
| | InceptionResNet-V2 | 89.57% | 89.22% | 90.50% | 0.975 |

Trained with SMOTE

| | Method | Sensitivity | Specificity | Accuracy | AUC |
|---|---|---|---|---|---|
| Texture analysis | GLCM | 84.96% | 84.38% | 85.51% | 0.928 |
| | Gabor | 79.83% | 82.50% | 83.07% | 0.894 |
| | LBP | 86.31% | 83.87% | 86.25% | 0.926 |
| | Fractal | 88.76% | 80.33% | 84.39% | 0.927 |
| | GLRLM | 84.95% | 88.29% | 88.43% | 0.943 |
| | First-order | 82.27% | 84.65% | 84.98% | 0.907 |
| | All texture features | 86.83% | 88.86% | 89.45% | 0.950 |
| Pre-trained networks | AlexNet | 86.69% | 77.70% | 82.51% | 0.948 |
| | SqueezeNet | 88.95% | 90.10% | 89.98% | 0.957 |
| | Inception-V3 | 91.73% | 81.48% | 86.14% | 0.948 |
| | ResNet-18 | 89.08% | 85.43% | 87.72% | 0.952 |
| | ResNet-50 | 90.83% | 89.90% | 90.87% | 0.968 |
| | InceptionResNet-V2 | 88.95% | 92.08% | 91.71% | 0.969 |
3.3. Margin-level classification
Based on the patch-level classification results, three feature sets were further investigated for margin-level classification: LBP, the combination of all texture analysis features, and ResNet-50. The ROC curves for margin-level classification are shown in Fig. 4. Patch-based decision fusion yielded slightly higher AUCs than cluster-based decision fusion. For each decision fusion method, the highest AUC was achieved with all texture analysis features and the lowest with LBP features alone. ResNet-50 features appeared to achieve the most balanced overall performance, as indicated by the upper-left region of the ROC curves. TN (true negative), TP (true positive), FN (false negative), FP (false positive), sensitivity, specificity, and accuracy are summarized in Tab. 3. Hyperparameters for decision fusion can be optimized using a validation set; in this study, performance was reported with fixed hyperparameters, and the theoretical performance read from the ROC curves was also reported. The combination of ResNet-50 features and the cluster-based weighted majority voting method can potentially achieve 100% sensitivity and 91.67% specificity, with 2 normal specimens misclassified as tumor in this case.
Figure 4.
Margin-level classification results with all texture analysis, LBP, and ResNet-50 features. (a) Decision fusion with cluster-based weighted majority voting. (b) Decision fusion with patch-based weighted majority voting.
Table 3.
Summary of margin-level classification.
| Decision fusion method | Hyperparameters | Features | TN | TP | FN | FP | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Cluster-based | Fixed hyperparameters | LBP | 21 | 40 | 2 | 3 | 95.24% | 87.50% | 92.42% |
| | | All texture | 22 | 40 | 2 | 2 | 95.24% | 91.67% | 93.94% |
| | | ResNet-50 | 23 | 38 | 4 | 1 | 90.48% | 95.83% | 92.42% |
| | From ROC | LBP | 22 | 38 | 4 | 2 | 90.48% | 91.67% | 90.91% |
| | | All texture | 22 | 41 | 1 | 2 | 97.62% | 91.67% | 95.45% |
| | | ResNet-50 | 22 | 42 | 0 | 2 | 100.00% | 91.67% | 96.97% |
| Patch-based | Fixed hyperparameters | LBP | 22 | 38 | 4 | 2 | 90.48% | 91.67% | 90.91% |
| | | All texture | 22 | 37 | 5 | 2 | 88.10% | 91.67% | 89.39% |
| | | ResNet-50 | 22 | 38 | 4 | 2 | 90.48% | 91.67% | 90.91% |
| | From ROC | LBP | 22 | 38 | 4 | 2 | 90.48% | 91.67% | 90.91% |
| | | All texture | 22 | 39 | 3 | 2 | 92.86% | 91.67% | 92.42% |
| | | ResNet-50 | 22 | 41 | 1 | 2 | 97.62% | 91.67% | 95.45% |
4. DISCUSSIONS AND CONCLUSION
Patch-level classification results showed that features extracted by pre-trained networks provide good discriminative ability for breast tumor detection on MUSE images; the combination of all texture analysis features achieved similar performance. Among the investigated texture analysis methods, LBP has low computational complexity and may be suitable for time-sensitive scenarios. Multiple instance learning is a common approach for aggregating patch-level results into margin-level classifications, but it usually requires a large amount of training data to reach good performance. In this study, given the limited number of specimens (N = 66), a cluster-based weighted majority voting method was proposed and compared with a patch-based weighted majority voting method for decision fusion. The cluster-based voting method uses both tumor and normal patch clusters to predict the margin-level label and accounts for the spatial relationships between patches, whereas the patch-based method does not. Although cluster-based decision fusion has a slightly lower AUC, it shows the potential of achieving the highest sensitivity and accuracy with ResNet-50 features. Margin-level classification results demonstrated that high sensitivity and specificity can be achieved by the proposed workflow. Future work will include more specimens and investigate classification with fine-tuning using popular CNN architectures as backbones.
In conclusion, 66 freshly excised human breast specimens were stained with propidium iodide and eosin Y and imaged by MUSE. Invasive carcinomas exhibit visual contrast in color, texture, and shape compared with normal regions. Machine learning based on texture analysis and pre-trained CNN features achieved sensitivity, specificity, and accuracy all above 90% for classifying tumor specimens on the obtained images. Therefore, MUSE with machine learning may be utilized to detect positive margins intraoperatively during lumpectomy and thereby reduce re-excision rates.
ACKNOWLEDGMENTS
This study has been supported by the Marquette University College of Engineering GHR Foundation grant (Dr. Bing Yu and Dr. Taly Gilat-Schmidt), Marquette University startup grant (Dr. Bing Yu), We Care Fund, Medical College of Wisconsin, Department of Surgery (Dr. Tina Yen and Dr. Bing Yu), and NIH Research Project Grant (Dr. Bing Yu and Dr. Dong Hye Ye).
REFERENCES
- [1] Giaquinto AN, Sung H, Miller KD, Kramer JL, Newman LA, Minihan A, Jemal A, and Siegel RL, “Breast cancer statistics, 2022,” CA: A Cancer Journal for Clinicians 72(6), 524–541 (2022).
- [2] Moran MS, Schnitt SJ, Giuliano AE, Harris JR, Khan SA, Horton J, Klimberg S, Chavez-MacGregor M, Freedman G, Houssami N, Johnson PL, Morrow M, Society of Surgical Oncology, and American Society for Radiation Oncology, “Society of Surgical Oncology-American Society for Radiation Oncology consensus guideline on margins for breast-conserving surgery with whole-breast irradiation in stages I and II invasive breast cancer,” J Clin Oncol 32(14), 1507–15 (2014).
- [3] Fereidouni F, Harmany ZT, Tian M, Todd A, Kintner JA, McPherson JD, Borowsky AD, Bishop J, Lechpammer M, Demos SG, and Levenson R, “Microscopy with ultraviolet surface excitation for rapid slide-free histology,” Nat Biomed Eng 1(12), 957–966 (2017).
- [4] Xie W, Chen Y, Wang Y, Wei L, Yin C, Glaser AK, Fauver ME, Seibel EJ, Dintzis SM, and Vaughan JC, “Microscopy with ultraviolet surface excitation for wide-area pathology of breast surgical margins,” Journal of Biomedical Optics 24(2), 026501 (2019).
- [5] Yoshitake T, Giacomelli MG, Quintana LM, Vardeh H, Cahill LC, Faulkner-Jones BE, Connolly JL, Do D, and Fujimoto JG, “Rapid histopathological imaging of skin and breast cancer surgical specimens using immersion microscopy with ultraviolet surface excitation,” Scientific Reports 8(1), 1–12 (2018).
- [6] Lu T, Jorns JM, Patton M, Fisher R, Emmrich A, Doehring T, Schmidt TG, Ye DH, Yen T, and Yu B, “Rapid assessment of breast tumor margins using deep ultraviolet fluorescence scanning microscopy,” Journal of Biomedical Optics 25(12), 126501 (2020).
- [7] Mirmehdi M, [Handbook of Texture Analysis], Imperial College Press (2008).
- [8] Lu T, Jorns JM, Ye DH, Patton M, Fisher R, Emmrich A, Schmidt TG, Yen T, and Yu B, “Automated assessment of breast margins in deep ultraviolet fluorescence images using texture analysis,” Biomedical Optics Express 13(9), 5015–5034 (2022).
- [9] Krizhevsky A, Sutskever I, and Hinton GE, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM 60(6), 84–90 (2017).
- [10] Szegedy C, Vanhoucke V, Ioffe S, Shlens J, and Wojna Z, “Rethinking the inception architecture for computer vision,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition], 2818–2826 (2016).
- [11] He K, Zhang X, Ren S, and Sun J, “Deep residual learning for image recognition,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition], 770–778 (2016).
- [12] Szegedy C, Ioffe S, Vanhoucke V, and Alemi AA, “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in [Thirty-First AAAI Conference on Artificial Intelligence] (2017).
- [13] Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, and Keutzer K, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360 (2016).
- [14] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, and Bernstein M, “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision 115(3), 211–252 (2015).
- [15] Van der Maaten L and Hinton G, “Visualizing data using t-SNE,” Journal of Machine Learning Research 9(11) (2008).
- [16] Chawla NV, Bowyer KW, Hall LO, and Kegelmeyer WP, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research 16, 321–357 (2002).
- [17] To T, Gheshlaghi SH, and Ye DH, “Deep learning for breast cancer classification of deep ultraviolet fluorescence images toward intra-operative margin assessment,” in [2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)], 1891–1894, IEEE (2022).