Breast Cancer Research. 2025 Jun 12;27:104. doi: 10.1186/s13058-025-02027-4

Wearable device for axillary lymph node screening in breast cancer based on infrared thermography and artificial intelligence

Xiaoying Zhong 1,2,#, Jinqiu Deng 3,#, Ping Lu 1,#, Zhichao Zuo 3, Yu Zhao 4, Yidong Zhou 1, Xuefei Wang 1
PMCID: PMC12164098  PMID: 40506759

Abstract

Background

Breast cancer (BC) is the most prevalent cancer among women worldwide, and patients with metastasis to axillary lymph nodes (ALN) experience significantly lower survival rates. Current imaging-based screening methods often suffer from low sensitivity and limited accessibility for detecting ALN metastasis in breast cancer patients. In this study, we present an AI-based infrared thermography system for ALN metastasis detection to improve diagnostic accessibility and reduce intervention-related morbidity.

Methods

In this study, we curated an internal and an external cohort for developing and assessing the deep learning-based infrared thermography system. The internal cohort included 460 inpatient participants from Peking Union Medical College Hospital, randomly divided into a training set (70%) for model development and a hold-out internal validation set (30%) for initial model evaluation. The external cohort, consisting of 80 patients from both outpatient and inpatient departments recruited from Longfu Hospital, served for independent validation of the developed screening tool.

Results

The developed model AI-IRT for axillary lymph node (ALN) metastasis detection exhibited high diagnostic performance, achieving an Area Under the Curve (AUC) of 0.9424 and an accuracy of 0.8478 in the internal validation set, with a sensitivity of 0.8958 and specificity of 0.8222. In a tertiary classification scenario, the model produced an AUC of 0.8936, with corresponding accuracy, sensitivity, and specificity values of 0.7246, 0.7246, and 0.7852, respectively. In the external validation set, the AI-IRT system achieved an AUC of 0.881 and an accuracy of 0.875, with a sensitivity of 0.892 and specificity of 0.861. For the tertiary classification, the model attained an AUC of 0.771 and an accuracy of 0.613, with sensitivity and specificity of 0.613 and 0.695, respectively.

Conclusion

Evaluated on both curated internal and external cohorts, the proposed AI-IRT demonstrated strong performance across multiple centers, highlighting its potential to enhance pre-operative and intra-operative decision-making in the treatment of breast cancer patients.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13058-025-02027-4.

Keywords: Breast cancer screening, Artificial intelligence, Thermography, Lymph node detection

Introduction

Breast cancer is the most commonly diagnosed cancer in women worldwide, with metastasis remaining the leading cause of breast cancer-related mortality [1, 2]. Lymph node involvement is a key prognostic factor, significantly impacting recurrence and survival rates [3]. When breast cancer remains localized to the breast, the five-year survival rate is 98.8%. However, this rate diminishes significantly to 85.8% when regional lymph node metastases are present [4]. Thus, precise evaluation of the axillary lymph nodes in patients with breast cancer is essential for accurate staging and informed therapeutic decisions.

The surgical management of breast cancer has evolved significantly over recent decades, influenced by contemporary clinical research [5–10] and the need to address associated complications such as lymphedema, seroma formation, and nerve injuries [11]. The American Society of Clinical Oncology (ASCO) now recommends sentinel lymph node (SLN) biopsy over routine axillary lymph node (ALN) dissection for patients with early-stage breast cancer [12]. While reducing ALN dissection can minimize complications, it also raises the risk of missing metastases, making ALN assessment critical both before and after surgery. Traditional screening methods, including mammography, ultrasonography, and magnetic resonance imaging (MRI), have notable limitations. For instance, routine mammography detects only 50% of level I lymph nodes, while deeper nodes require specialized imaging [13]. Previous systematic reviews have reported relatively low sensitivities for evaluating axillary lymph nodes using traditional imaging modalities [14–17]. Furthermore, access to screening remains inadequate, with over half of the global population lacking access to breast cancer and ALN screening programs [18–22]. Consequently, portable tools may be considered as complementary technologies, especially in resource-limited settings.

The rise of digitization has generated significant interest in computer-based image analysis, leading to the development of computer-assisted detection and diagnosis algorithms in the early 2000s. In the past five years, the field has been transformed by the widespread adoption of artificial intelligence (AI), particularly through deep learning [23–26]. AI has shown great potential for breast cancer diagnosis and prognosis prediction using images from imaging tools or pathological findings [27]. Our previous work [28] as well as other studies have shown that Infrared Thermography (IRT) has equivalent ability in breast cancer screening compared with traditional screening methods [29–31]. A recent study has also demonstrated that IRT has the potential to assist in screening for metastatic lymph nodes in oral cavity cancer [32]. However, there is a lack of research on the application of IRT systems for detecting breast cancer ALN metastasis.

In this study, we developed a portable, home-based AI-IRT system [28] to evaluate the axillae of patients. Internal and external cohorts were curated from two different medical centers to develop and evaluate the developed AI-IRT system. The developed system can simultaneously capture images of the breast and axillary lymph nodes, providing a comprehensive view of the condition, thereby enhancing lymph node evaluation in breast cancer patients.

Methods

Ethical approval and datasets

This retrospective study received approval from the Research Ethics Committee of Peking Union Medical College and Hospital (Ethical approval: K24C4056). The methodologies involving human subjects were sanctioned by both institutional and national research committees, adhering to the ethical principles outlined in the Declaration of Helsinki and CLAIM checklist (Supplementary materials) [33]. Informed consent was obtained in writing from all participants. The findings of the study are presented in accordance with the TRIPOD and STROCSS criteria [34]. Datasets in the training and internal validation sets were derived from a prospective clinical trial previously published [27] (NCT: 04761211), and the dataset used for the external validation set was prospectively collected in Beijing Longfu Hospital. The trial assessed the effectiveness and safety of the AI-IRT system from August 1, 2020, to Oct 30, 2023. This study utilized real-world data and adhered to clinical practices and NCCN guidelines. Dataset A comprised 460 inpatients, randomly allocated into two groups: a training set (70%, 322 of 460) for model training and hyperparameter tuning, and an internal validation set (30%, 138 of 460) for model evaluation. Participants were recruited at Peking Union Medical College Hospital from patients seeking surgical treatment of the breast in the inpatient department. Dataset B comprised 80 patients recruited from both the outpatient and inpatient departments of Longfu Hospital between Feb 1, 2024 and May 1, 2024, and was used for independent external validation; this relatively diverse population was intended to mimic real-world scenarios in which data are not always consistent with the training dataset.
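
The 70/30 random allocation of dataset A can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `split_cohort`, the placeholder integer patient IDs, and the seed value are assumptions introduced here. The source states only that the split was random and yielded 322 training and 138 internal validation patients.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and internal validation sets.

    The seed is a hypothetical choice, used only to make this sketch repeatable.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Dataset A: 460 patients, represented here by placeholder IDs 0..459.
train_ids, val_ids = split_cohort(range(460))
print(len(train_ids), len(val_ids))  # 322 138
```

Splitting at the patient level, as here, keeps all images from one patient on the same side of the split.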

All patients received routine screening or treatment for breast disease. Outpatients underwent ultrasound or mammography within three months before recruitment, while inpatients had ultrasound, mammography, or positron emission tomography-computed tomography (PET-CT) examinations within the same timeframe. Imaging data along with clinicopathological characteristics were collected both at screening and at follow-up. Clinicians considered imaging evidence and physical examinations together when assessing indications for surgery. The inclusion criteria specified patients seeking surgery who had undergone ultrasound or mammography examinations within the past three months and were aged between 18 and 80 years. Pregnant or lactating women, patients with nipple discharge, individuals allergic to latex, those with skin diseases, and patients suffering from inflammatory breast diseases were excluded. Examinations with invalid patient identifiers, duplicate images, nonstandard images, or images lacking necessary metadata were also excluded.

The AI-IRT system

The AI-IRT system comprised both hardware and software components. The hardware includes a wearable device and an ultra-sensitive infrared camera (InfiRay, detailed in Supplementary Table 1). The software component consists of an AI algorithm and a big data platform. The infrared camera is attached to mobile phones and images were captured using a standardized protocol before being uploaded to an application. These images were then automatically analyzed by the algorithm, which assessed them for ALN metastasis risk, categorizing them as either low (0) or intermediate/high (1) in the binary classification model and no LN metastasis (0), 1–3 LN metastasis (1) or ≥ 4 LN metastasis (2) in the tertiary classification model.

Human analysis of breast and ALN images was conducted based on the heat dissipation within the tumor's vascular network. Consequently, breast cancer ALN metastasis risk was categorized into three groups according to temperature gradients (TGs): TG-1 and TG-2 denoting low risk (score of 0), TG-3 indicating intermediate risk (score of 1), and TG-4 and TG-5 representing high risk (score of 2). Further details can be found in Supplementary Table 2.

Labels

ALNs evaluated were assigned an ultrasound prediction score for metastasis risk: 0 (low, LNs undetected), 1 (intermediate, detected but likely not metastatic), and 2 (high, detected and metastatic). Images were labeled based on biopsy results: 0 (no metastatic LN), 1 (1–3 metastatic LNs), and 2 (≥ 4 metastatic LNs), and further classified by location: Right (R) or Left (L). Each image received a score: 0 (R0L0), 1 (R0L1, R1L0, or R1L1), and 2 (R0L2, R2L0, R1L2, R2L1, or R2L2). In the binary classification, categories 1 and 2 were combined to indicate ALN metastasis, while tertiary classification classified the number of metastases. All LN-positive results were pathology-confirmed.
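
The image-level scoring rule listed above is equivalent to taking the larger of the two per-side labels. The sketch below makes that explicit. The function names `image_score` and `binary_label` are hypothetical, chosen for illustration only.

```python
def image_score(right: int, left: int) -> int:
    """Image-level label from the per-side biopsy labels (each 0, 1, or 2).

    Taking the maximum reproduces the mapping in the text:
    R0L0 -> 0; R0L1, R1L0, R1L1 -> 1; any combination containing a 2 -> 2.
    """
    if right not in (0, 1, 2) or left not in (0, 1, 2):
        raise ValueError("side labels must be 0, 1, or 2")
    return max(right, left)


def binary_label(score: int) -> int:
    """Collapse categories 1 and 2 into a single 'ALN metastasis' class."""
    return 1 if score >= 1 else 0


for r in range(3):
    for l in range(3):
        s = image_score(r, l)
        print(f"R{r}L{l} -> three-class {s}, binary {binary_label(s)}")
```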

Image processing and model construction

After collecting Red, Green, and Blue (RGB) images, a series of image preprocessing operations were conducted. Initially, images were filtered to exclude those with improper posture or obstructing objects in the axillary area. Then, images were resized to 256 × 256 pixels, with a 224 × 224 pixel crop taken from the center for input into the AI system. All images were anonymized to minimize human-induced errors.
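
The center-crop step can be sketched in plain Python as below. The sketch assumes the image has already been resized to 256 × 256 and is stored as a list of pixel rows. The helper names `center_crop_box` and `center_crop` are hypothetical, and the authors' actual preprocessing code is not given in the source.

```python
def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def center_crop(image, crop):
    """Crop a 2D image stored as a list of rows (each row a list of pixels)."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = center_crop_box(width, height, crop)
    return [row[left:right] for row in image[top:bottom]]


# Placeholder 256 x 256 image whose pixel value encodes its (row, column).
resized = [[(r, c) for c in range(256)] for r in range(256)]
patch = center_crop(resized, 224)
print(center_crop_box(256, 256, 224))  # (16, 16, 240, 240)
print(len(patch), len(patch[0]))       # 224 224
```

A 224 × 224 crop from a 256 × 256 image discards a 16-pixel border on every side, so the axillary region has to sit well inside the frame at acquisition.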

Postoperative axillary lymph node metastasis is categorized into three classes: 0 (no lymph node metastasis), 1 (1–3 lymph node metastases), 2 (≥ 4 lymph node metastases). The AI classification system involved three stages: image preprocessing, feature extraction, and classification. Preprocessed images were subjected to feature extraction using a deep learning network, followed by classification with a multilayer perceptron. In the binary classification, each image was assigned a probability score (P) for class 1 (metastasis). Metastasis was predicted if P exceeded a set threshold. For the tertiary classification, the system output three scores corresponding to the categories 0, 1, and 2, with the highest score determining the predicted class.
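
The two decision rules described above can be sketched as follows. The default threshold of 0.5 is an assumption, since the text says only "a set threshold". The function names are likewise hypothetical.

```python
def predict_binary(p_metastasis: float, threshold: float = 0.5) -> int:
    """Predict metastasis (1) when the class-1 probability exceeds the threshold.

    The 0.5 default is a placeholder; the operating threshold is not reported.
    """
    return 1 if p_metastasis > threshold else 0


def predict_three_class(scores) -> int:
    """Return the index (0, 1, or 2) of the highest of the three class scores.

    On exact ties the lowest class index wins, a choice made for this sketch.
    """
    if len(scores) != 3:
        raise ValueError("expected one score per class 0, 1, and 2")
    return max(range(3), key=lambda k: scores[k])


print(predict_binary(0.70))                    # 1
print(predict_three_class([0.2, 0.5, 0.3]))    # 1
```

Lowering the binary threshold trades specificity for sensitivity, which is the trade-off the ROC curve summarizes.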

Training and implementation

In the binary classification task, Resnet18 (pre-trained on the ImageNet dataset) is used as the feature extraction network, while in the tertiary classification task, Resnet152 (pre-trained on the ImageNet dataset) is employed. The choice of Resnet18 and Resnet152 was determined to be optimal after a series of comparative experiments with other networks, demonstrating their superior performance in binary and tertiary classification tasks, respectively. Resnet effectively addresses the degradation problem in deep learning, significantly alleviating the difficulties associated with training neural networks with a large number of layers. Due to the cost constraints of dataset collection, pre-trained Resnet18 and Resnet152 models on the large-scale ImageNet dataset were utilized.

Resnet is composed of residual blocks, with Resnet18 containing 18 weighted layers and Resnet152 containing 152. The network starts with a convolutional layer and a max-pooling layer, followed by residual modules, which enable efficient learning through shortcut connections that add each block's input to its nonlinear output. After the Resnet backbone, a Multilayer Perceptron (MLP) with 20 nodes is used for classification. Resnet18 has approximately 11.2 million parameters, and Resnet152 has approximately 60 million. The Adam optimizer with a learning rate of 0.0001 was used to minimize training loss over 500 epochs. Neural network models were implemented in Python with the PyTorch deep learning framework [35] on a Linux operating system (Ubuntu 22.04.1 LTS) with a 12th-gen Intel (R) Core (TM) i7-12700F CPU and a single NVIDIA GeForce RTX 4090 GPU.

Model performance evaluation

To evaluate model performance and interpretability, we applied four methods: Decision curve analysis (DCA) was used to assess clinical net benefit across a range of probability thresholds, reflecting real-world decision-making value [36]; model calibration was evaluated using calibration plots and the Brier score to quantify the agreement between predicted probabilities and actual outcomes [37, 38]; t-SNE was applied to visualize high-dimensional features (from the MLP’s second layer) in 2D space, illustrating class separation in training, validation, and external test sets [39]; Gradient-weighted class activation mapping (Grad-CAM) was used to highlight key regions influencing classification decisions, with visualizations generated for true positive (TP), true negative (TN), false positive (FP), and false negative (FN) cases using ResNet18 and ResNet152 models [40].
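
Two of these quantities have simple closed forms, sketched below under stated assumptions. The Brier score is the mean squared difference between predicted probability and outcome. The net benefit uses the standard decision-curve definition, TP/n − (FP/n) × pt/(1 − pt) at threshold probability pt, which the source does not spell out. The toy labels and probabilities are invented for illustration and are not study data.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def net_benefit(y_true, p_pred, threshold):
    """Net benefit of acting on cases whose predicted risk exceeds the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p > threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p > threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy example (invented values, not from the study).
y = [1, 0, 1, 0]
p = [0.9, 0.6, 0.4, 0.1]
print(brier_score(y, p))
print(net_benefit(y, p, 0.2), net_benefit_treat_all(y, 0.2))
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its curve lies above both zero and the "treat all" curve.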

Statistical analysis

In the binary classification model, we evaluated performance using accuracy, defined as $\frac{TP+TN}{TP+TN+FP+FN}$. Sensitivity (recall) was calculated as $\frac{TP}{TP+FN}$, measuring the model’s ability to identify positive cases, while specificity was calculated as $\frac{TN}{TN+FP}$, indicating its ability to recognize negative cases. AUC (Area Under Curve) is derived from the ROC curve, which plots the true positive rate $\frac{TP}{TP+FN}$ against the false positive rate $\frac{FP}{FP+TN}$. For the tertiary classification model, we utilized a confusion matrix to compute performance metrics. Accuracy is calculated as $\frac{TP_i}{TP_i+FP_i+FN_i}$, where $TP_i$ represents true positives for each class. Additionally, precision $\frac{TP}{TP+FP}$, recall $\frac{TP}{TP+FN}$, and F1-score $\frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$ provide further insights into classification performance. All statistical analysis, including metric computation and result visualization, was conducted in RStudio 4.2.1.
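
The metrics defined above can be computed as in the sketch below. Two assumptions should be noted. First, the confusion counts in the example (TP = 43, TN = 74, FP = 16, FN = 5) are not reported in the paper; they are one set of integers that reproduces the reported internal-validation accuracy, sensitivity, and specificity of the binary model. Second, the AUC helper uses the pairwise-ranking formulation (the fraction of positive/negative pairs ranked correctly, with ties counted as one half), which equals the area under the ROC curve but may differ from how the authors computed it.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, F1 (nonzero denominators assumed)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_metrics(cm):
    """Per-class precision, recall, F1 from a square matrix cm[true][predicted]."""
    k = len(cm)
    results = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(k)) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Inferred counts (assumption): consistent with the reported binary results.
m = binary_metrics(tp=43, tn=74, fp=16, fn=5)
print({name: round(value, 4) for name, value in m.items()})
```

With these counts the function returns an accuracy of 0.8478, a sensitivity of 0.8958, and a specificity of 0.8222, matching the values reported for the internal validation set.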

Results

Study overview and dataset characteristics

An overview of the study design is depicted in Fig. 1. Users attached the infrared camera to their mobile phones to acquire thermal images in a standing posture, following standard protocols. The datasets were sourced from a prospective clinical trial of our previous publication [28] and were used to evaluate the effectiveness of assessing ALN metastasis. A total of 540 patients were included in the study, including 460 in dataset A and 80 in dataset B. In dataset A, patients were randomized into a training and testing set (n = 322), and an internal validation set (n = 138). Dataset B was utilized to assess diagnostic accuracy through independent external validation. The median age of the patients was 47 years, with an interquartile range of 39–55 years. All patients underwent ultrasound (100%), 70.1% underwent mammography, and 13.0% underwent PET-CT; all were followed up for 6 months (Supplementary Table 3).

Fig. 1.

An overview of the design of the study. All participants underwent ultrasound, PET-CT, or mammography examinations, and AI-IRT images were taken. AI-IRT images from dataset A were used for model training and internal validation. Patients with suspected metastatic LNs underwent ALN or SLN dissection, and metastasis was confirmed by the pathological findings

The dataset distributions of the cohort are detailed in Table 1. In dataset A, 459 patients (99.8%) underwent surgery, with 330 of these cases (72%) being diagnosed as malignant. Across the total cohort, patients were scored based on LN involvement as follows: score 0 for no involvement (64%, n = 347), score 1 for involvement of 1–3 LNs (20%, n = 106), and score 2 for involvement of 4 or more LNs (16%, n = 87). In dataset B, 69 patients underwent surgery, 37 of whom had ALN metastasis. Notably, pathological assessment of ALNs was not performed for patients whose breast nodules were at low risk (benign or pre-malignant nodules) and who had no indication for further operations; these patients were scored 0.

Table 1.

Dataset distribution

Columns, left to right: Total; Dataset A training set; Dataset A internal validation set; Dataset B external validation set
Patients, n 540 322 138 80
Age, n
 < 40 141 (26.11%) 84 (26.09%) 34 (24.64%) 23 (28.75%)
 40–69 373 (69.07%) 220 (68.32%) 99 (71.74%) 54 (67.50%)
 ≥ 70 26 (4.81%) 18 (5.59%) 5 (3.62%) 3 (3.75%)
BI-RADS
 ≤ 2 8 (1.48%) 3 (0.93%) 2 (1.45%) 3 (3.75%)
 3 73 (13.52%) 61 (18.94%) 9 (6.52%) 3 (3.75%)
 4 297 (55.00%) 197 (61.18%) 68 (49.28%) 32 (40.00%)
 5 35 (6.48%) 16 (4.97%) 15 (10.87%) 4 (5.00%)
 6 127 (23.52%) 45 (13.98%) 44 (31.88%) 38 (47.5%)
ALN Ultrasound, n
 LNs not reported 285 (52.78%) 172 (53.42%) 78 (56.52%) 35 (43.75%)
 LNs reported 161 (29.81%) 95 (29.50%) 36 (26.09%) 30 (37.5%)
 Metastatic LNs suspected 94 (17.41%) 55 (17.08%) 24 (17.39%) 15 (18.75%)
LN involvement, n
 0 347 (64.26%) 212 (65.84%) 92 (66.67%) 43 (53.75%)
 1–3 106 (19.63%) 56 (17.39%) 23 (16.67%) 27 (33.75%)
 ≥ 4 87 (16.11%) 54 (16.77%) 23 (16.67%) 10 (12.50%)
No surgery, n 12 (2.22%) 1 (0.31%) 0 (0%) 11 (13.75%)
Surgery, n 528 (97.78%) 321 (99.69%) 138 (100%) 69 (86.25%)

LN Lymph node, BI-RADS Breast Imaging Reporting and Data System, ALN Axillary lymph node

Baseline clinicopathological characteristics of the patients who underwent surgery are summarized in Table 2. Ultrasound findings were included, with 95 cases reported to have suspected metastasis in dataset A and 15 in dataset B. Pathological results were categorized into benign, pre-malignant (ductal carcinoma in situ), and malignant conditions. Overall, 330 cases were malignant, of which 273 were in dataset A and 57 in dataset B. The tumor staging included T0 (DCIS), T1, T2, and T3 stages, with dataset A showing a distribution of 28, 155, 101, and 17 patients respectively, and dataset B showing 2, 29, 28, and 0 patients respectively. Receptor statuses for estrogen (ER), progesterone (PR), and HER-2 are also provided, as are Ki67 values (Table 2). Notably, ultrasound results and pathological results differed significantly between datasets A and B, indicating considerable variation between the two populations. Consistent accuracy across such different populations would further support the stability of the AI model.

Table 2.

Characteristics of operated patients

Columns, left to right: Total (n = 528); Dataset A (n = 459); Dataset B (n = 69); P value
Age, n 0.9722b
 < 40 135 (25.57%) 118 (25.71%) 17 (24.64%)
 40–69 367 (69.51%) 318 (69.28%) 49 (71.01%)
 ≥ 70 26 (4.92%) 23 (5.01%) 3 (4.35%)
Menopausal Status 0.793b
 Premenopause 316 (59.85%) 276 (60.13%) 40 (57.97%)
 Menopause 212 (40.15%) 183 (39.87%) 29 (42.03%)
Surgery, n 528 0.6992b
 SLN biopsy only 242 (45.83%) 212 (46.19%) 30 (43.48%)
 ALND 286 (54.17%) 247 (53.81%) 39 (56.52%)
ALN Ultrasound, n 0.0089
 LN Not reported 273 (51.71%) 249 (54.25%) 24 (34.78%)
 LN Reported 161 (30.49%) 131 (28.54%) 30 (43.48%)
 LN Metastasis suspected 94 (17.80%) 79 (17.21%) 15 (21.74%)
Neoadjuvant Chemotherapy 0.0839b
 Yes 37 (7.01%) 31 (6.75%) 9 (13.04%)
 No 491 (92.99%) 428 (93.25%) 60 (86.96%)
Pathological results, n 0.0011*
 Benign 168 (31.82%) 158 (34.42%) 10 (14.49%)
 Pre-malignant 30 (5.68%) 28 (6.10%) 2 (2.90%)
 Malignant 330 (62.50%) 273 (59.48%) 57 (82.61%)
Molecular Subtypes 0.5565
 Luminal A 78 (23.64%) 68 (24.91%) 10 (17.54%)
 Luminal B 160 (48.48%) 132 (48.35%) 28 (49.12%)
 HER2-enriched 44 (13.33%) 34 (12.45%) 10 (17.54%)
 Triple-negative 48 (14.55%) 39 (14.29%) 9 (15.79%)
Histological Subtypes 0.2340
 Invasive ductal carcinoma 292 (88.48%) 242 (88.64%) 50 (87.72%)
 Invasive lobular carcinoma 16 (4.85%) 15 (5.50%) 1 (1.75%)
 Other invasive carcinoma 22 (6.67%) 16 (5.86%) 6 (10.53%)
Lymph Vascular Invasion 0.4266b
 Yes 98 (29.70%) 84 (30.77%) 14 (24.56%)
 No 232 (70.30%) 189 (69.23%) 43 (75.44%)
T stage 0.0656b
 T0(DCIS) 30 (8.33%) 28 (9.30%) 2 (3.39%)
 T1 184 (51.11%) 155 (51.50%) 29 (49.15%)
 T2 129 (35.84%) 101 (33.55%) 28 (47.46%)
 T3 17 (4.72%) 17 (5.65%) 0 (0)
N stage 0.0831b
 N0 138 (41.82%) 117 (42.86%) 21 (36.84%)
 N1 103 (31.21%) 77 (28.20%) 26 (45.62%)
 N2 44 (13.33%) 39 (14.29%) 5 (8.77%)
 N3 45 (13.64%) 40 (14.65%) 5 (8.77%)
ER 0.5216b
 Positive 237 (71.82%) 198 (72.53%) 39 (68.42%)
 Negative 93 (28.18%) 75 (27.47%) 18 (31.58%)
PR 0.7662b
 Positive 203 (61.52%) 169 (61.90%) 34 (59.65%)
 Negative 127 (38.48%) 104 (38.10%) 23 (40.35%)
HER-2 0.0521b
 Positive 95 (28.79%) 81 (29.67%) 14 (24.56%)
 Negative 235 (71.21%) 192 (70.33%) 43 (75.44%)
Ki67(%)a 43.42 ± 23.92 43.25 ± 26.02 44.23 ± 25.67

Data are presented as n (%). Data were analyzed using the χ2 test unless otherwise specified. P values of less than 0.05 were considered statistically significant

SLN Sentinel lymph node, LN Lymph node, ALND Axillary lymph node dissection, DCIS Ductal carcinoma in situ, ER Estrogen receptor, PR Progesterone receptor, HER-2 Human epidermal growth factor receptor 2

aData are means ± SD

bFisher’s exact test

Performance of AI-IRT

In the binary classification task, the Resnet18 model achieved an AUC of 0.9424, and its accuracy, sensitivity, and specificity were 0.8478, 0.8958, and 0.8222, respectively (Fig. 2C). In the external validation dataset, the results obtained were as follows: AUC was 0.8812, accuracy was 0.8750, sensitivity was 0.8919, and specificity was 0.8605 (Fig. 2E). The tertiary classification task performed in the internal validation set revealed an AUC of 0.8936, accuracy of 0.7246, sensitivity of 0.7246 and specificity of 0.7852 using the Resnet152 model (Fig. 2D). In the external validation dataset using a tertiary classification, the results obtained were as follows: AUC was 0.7714, accuracy was 0.6125, sensitivity was 0.6125, and specificity was 0.6947 (Fig. 2F).

Fig. 2.

Overview of the artificial intelligence-based infrared thermography system construction and its performance. A Each breast was assigned a score based on pathology results. B Images were preprocessed to extract data on axillary lymph node laterality and then processed by an artificial intelligence algorithm and scored as 0 (low), 1 (intermediate), and 2 (high) according to the risk of metastasis. C The system evaluated in a binary classification model using the internal validating set. D The system evaluated in a tertiary classification model using the internal validating set. E The system evaluated in a binary classification model using the external validating set. F The system evaluated in a tertiary classification model using the external validating set. MLP layer, multilayer perceptron layer; AUC, area under curve; ROC, receiver operating characteristics

Figure 3 provided a comprehensive evaluation of the model. DCA (Fig. 3A) illustrated the model’s net benefit across different threshold probabilities, demonstrating its superiority over the "treat all" and "treat none" strategies, highlighting its clinical utility. The calibration plot (Fig. 3B) assessed the model’s probability estimates against the ideal calibration line, showing minor deviations but overall alignment. Collectively, these analyses provide insights into the model’s decision-making, feature representation, and reliability, showcasing its strengths. Furthermore, the calculated Brier score of 0.0991 suggested that the model exhibits good calibration, with its predicted probabilities closely matching the true outcomes.

Fig. 3.

Model evaluation and interpretability. A DCA shows the model’s net benefit across threshold probabilities. B Calibration plot compares predicted probabilities to the ideal calibration

Figure 4 presented the t-SNE visualization results of both the binary and tertiary classification models, based on the features extracted from the second layer of the MLP classifier. The results demonstrated clear distribution boundaries between the different classes in both models, indicating good stability and discriminative capability of the feature representations.

Fig. 4.

t-SNE visualizations of the binary and tertiary classification models. A t-SNE of the training set for the binary classification model. B t-SNE validation result of the test set for the binary classification model. C t-SNE validation result of the external validation set for the binary classification model. D t-SNE validation result of the training set for the tertiary classification model. E t-SNE validation result of the test set for the tertiary classification model. F t-SNE validation result of the external validation set for the tertiary classification model

Figure 5 showed the Grad-CAM visualizations for the binary and tertiary classification models, illustrating four representative cases: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). In the TP and TN examples, the models primarily focused on the axillary region of the patients, while in the FP and FN cases, part of the attention is directed outside the axilla. This suggested that the models generally attend to the correct regions for classification, and the incorrect predictions may be partly attributed to attention being diverted to irrelevant areas.

Fig. 5.

Grad-CAM highlights key regions influencing predictions. A Grad-CAM visualizations of four examples (TP, TN, FP, FN) from ResNet18. B Grad-CAM visualizations of four examples (TP, TN, FP, FN) from ResNet152

Comparison with ultrasound

We evaluated the performance of the AI system in the testing set against ultrasound reports by professional diagnostic medical sonographers from the hospital's ultrasound department. In the internal validating set, the ultrasound predictions had an accuracy, sensitivity, specificity, and precision of 0.8261, 0.5417, 0.9778, and 0.9286, respectively (Fig. 6A). The external validating set showed a similar trend, with values of 0.7250, 0.4054, 1.0, and 1.0, respectively (Fig. 6B).

Fig. 6.

Clinical evaluations of lymph node metastasis by various imaging tools. A Evaluations of the ultrasound results reported by professional sonographers using the internal validating set. B Evaluations of the ultrasound results reported by professional sonographers using the external validating set. C Ultrasound images of the OBC case. D PET-CT image of the OBC case. E AI-IRT image of OBC case. ROC, receiver operating characteristic curve

Demonstration of AI-IRT on case of occult breast cancer

Occult breast cancer (OBC) is a rare and challenging subtype of breast cancer where metastatic cells are detectable in the lymph nodes, yet the primary tumor remains imperceptible [41, 42]. Figure 6 shows the images taken by ultrasound, PET-CT and IRT of an OBC patient in our cohort. The ultrasound image revealed multiple LNs with irregular margins and structures, some fused together, at the right axilla, while the breast tissue of the right side revealed no abnormality (Fig. 6C). The PET-CT image showed multiple lymph nodes with abnormally increased radiotracer uptake in the right axilla and right pectoral muscle space, while no abnormal radiotracer uptake was observed in either breast (Fig. 6D). The patient’s AI-IRT image is shown in Fig. 6E. The patient ultimately underwent a modified radical mastectomy of the right breast. Pathological examination reported metastatic carcinoma in the lymph nodes of the right breast (axillary 4/37), and the case was diagnosed as occult breast cancer.

Discussion

This study is the first to evaluate the efficacy of an AI-IRT for lymph node assessments, particularly ALNs in BC patients. The binary classification model showed strong predictive performance with an AUC of 0.9424, demonstrating its ability to accurately identify true positives while minimizing false positives. Additional metrics, including accuracy (0.8478), sensitivity (0.8958), and specificity (0.8222), further validated the Resnet18 model’s robustness. The tertiary classification model produced an AUC of 0.8936, accuracy of 0.7246, sensitivity of 0.7246, and specificity of 0.7852, highlighting the system’s ability to distinguish between different classes. These findings reinforce AI-IRT’s potential as an effective adjunctive tool in ALN evaluation for BC management.

Refining ALN evaluation methods can significantly improve BC management, from staging to therapeutic decisions and post-operative follow-up. Reliable screening tools for ALN metastasis are crucial for preoperative assessments and postoperative surveillance to detect recurrence or spread. The design of AI-IRT allows for simultaneous visualization of the breast and ALNs, enabling synchronous evaluation. This is particularly useful in diagnosing OBC, where metastatic cells are found in the lymph nodes without a detectable primary tumor. Traditional imaging often misses OBC, presenting a diagnostic challenge. In cases of lymph node skip metastasis, where cancer bypasses proximal or sentinel LNs and instead metastasizes to distally located LNs, the IRT system may prove advantageous. Because skip metastasis may lead to underestimation of the disease’s spread if only the nearest lymph nodes are examined, it can complicate the diagnosis and staging of breast cancer. By addressing these diagnostic hurdles, the IRT system may enhance sensitivity in breast cancer screening, aiding early detection and treatment.

Our AI-IRT model has the potential to serve as an adjunctive tool in the preoperative evaluation of ALNs in BC patients. In clinical practice, physicians can leverage the model’s predictions in several key decision-making scenarios. First, the model can assist in identifying patients at higher risk of ALN metastasis, aiding in the selection of SLN biopsy vs. direct ALN dissection, as well as in the decision to start with neoadjuvant systemic treatment. Secondly, it has the potential to minimize unnecessary invasive procedures. Patients with a low AI-IRT risk score may avoid unnecessary ALND, reducing surgical morbidity such as lymphedema, seroma formation, and nerve injuries [11]. This can preserve patients’ arm function, aligning with de-escalation strategies in modern BC treatment. In addition, it can be utilized in postoperative surveillance and recurrence monitoring. AI-IRT can potentially be integrated into long-term follow-up protocols to monitor lymph node status post-surgery, detecting early signs of recurrence without radiation exposure.

Currently, the prevalent imaging modalities for evaluating ALNs in BC are mammography, axillary ultrasonography, and magnetic resonance imaging (MRI) [13, 43]. Nonetheless, SLN biopsy is recognized as the most precise technique for axillary staging [44]. Conventional imaging techniques have limitations; mammography, for example, involves ionizing radiation, high false-positive rates, discomfort, and reduced effectiveness in dense breast tissue [45]. Comparisons of ultrasound results for ALN metastasis prediction revealed high specificity but low sensitivity, consistent with prior studies. Integrating infrared thermography with machine learning offers a novel, non-invasive, radiation-free, and cost-effective alternative. Thermography can detect physiological changes such as increased vascularization and altered heat patterns, aiding early BC detection. This technology has the potential to transform ALN evaluation by offering a more accurate, less invasive screening option, with applications extending beyond pre-operative screening to peri-operative follow-up. As BC surgeries move toward more conservative approaches, the need for reliable ALN metastasis monitoring becomes even more critical.

The utility of the AI-IRT system is particularly pronounced in regions with scarce medical resources or in populous nations where widespread screening and sustained surveillance pose substantial logistical challenges. Despite its promising performance, this study has several limitations. First, the training sample from dataset A came from a single institution, limiting cohort diversity and generalizability to a global population. While the binary model performed well on the external validation set, the tertiary classification model’s performance was less robust, highlighting the need for further research across diverse demographics. Second, the study did not evaluate the AI-IRT system in post-operative surveillance, leaving its effectiveness in surgically altered physiology unclear; future research should address this gap. Third, several patients received SLN biopsy only, which may have missed metastatic LNs because skip metastasis exists and may therefore have introduced false-negative bias. Lastly, the limited number of images prevented the use of more comprehensive multinomial models, underscoring the need for larger studies that integrate AI algorithms with clinicopathological data to improve predictive precision.

Conclusion

While AI-IRT is not positioned to supplant conventional clinical screening modalities, its integration could significantly enrich the diagnostic landscape by providing ancillary data vital for guiding clinical decision-making and follow-up in the management of BC. In this study, a deep learning-based model was constructed, and the performance of AI-IRT in ALN screening was assessed using both an internal validation set and an independent external validation set. This study pioneers the integration of a portable AI-IRT system into clinical workflows for ALN assessment.

Supplementary Information

Supplementary Material 1. (197.2KB, docx)

Acknowledgements

The data storage and management for this study were supported by the Aliyun (Alibaba Cloud) Platform. We thank Dr. Lang Jie (Beijing Longfu Hospital) for providing external validation data used in this study.

Authors’ contributions

Study concept and design: X.W., Y.Z.; Acquisition, analyses, or interpretation: All authors; Drafting of the manuscript: X.Z., J.D., Y.Z.; Critical revision of the manuscript for important intellectual content: All authors; Statistical analyses: X.Z., J.D., Y.Z.; Obtained funding: X.W.; Administrative, technical, or material support: P.L., Z.Z.; Study supervision: X.W., Y.Z.

Funding

This study was funded by the National High-Level Hospital Clinical Research Funding (2022-PUMCH-A-018, 2022-PUMCH-C-043).

Data availability

All data used in this study are available upon request.

Declarations

Ethics approval and consent to participate

This retrospective study received approval from the Research Ethics Committee of Peking Union Medical College and Hospital (K24C4056), and consent to participate was obtained from all participants.

Consent for publication

Written informed consent for publication of the case details and any accompanying images was obtained from the patient (or their legal guardian).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaoying Zhong, Jinqiu Deng and Ping Lu contributed equally to this work.

Contributor Information

Yidong Zhou, Email: zhouyd@pumch.cn.

Xuefei Wang, Email: wangxuefeipumch@sina.com.

References

  • 1. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49.
  • 2. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024. Published online April 4. 10.3322/caac.21834.
  • 3. Beenken SW, Urist MM, Zhang Y, et al. Axillary lymph node status, but not tumor size, predicts locoregional recurrence and overall survival after mastectomy for breast cancer. Ann Surg. 2003;237:732–9.
  • 4. Chang JM, Leung JWT, Moy L, Ha SM, Moon WK. Axillary nodal evaluation in breast cancer: state of the art. Radiology. 2020;295:500–15.
  • 5. Galimberti V, Cole BF, Zurrida S, et al. Axillary dissection versus no axillary dissection in patients with sentinel-node micrometastases (IBCSG 23–01): a phase 3 randomised controlled trial. Lancet Oncol. 2013;14:297–305.
  • 6. Krag DN, Anderson SJ, Julian TB, et al. National surgical adjuvant breast and bowel project sentinel-lymph-node resection compared with conventional axillary-lymph-node dissection in clinically node-negative patients with breast cancer: overall survival findings from the NSABP B-32 randomised phase 3 trial. Lancet Oncol. 2010;11:927–33.
  • 7. Sanvido VM, Elias S, Facina G, Bromberg SE, Nazário ACP. Survival and recurrence with or without axillary dissection in patients with invasive breast cancer and sentinel node metastasis. Sci Rep. 2021;11:19893. 10.1038/s41598-021-99359-w.
  • 8. Huang TW, Kuo KN, Chen KH, et al. Recommendation for axillary lymph node dissection in women with early breast cancer and sentinel node metastasis: a systematic review and meta-analysis of randomized controlled trials using the GRADE system. Int J Surg. 2016;34:73–80.
  • 9. Giuliano AE, McCall L, Beitsch P, et al. Locoregional recurrence after sentinel lymph node dissection with or without axillary dissection in patients with sentinel lymph node metastases: the American College of Surgeons Oncology Group Z0011 randomized trial. Ann Surg. 2010;252:426–32.
  • 10. Veronesi U, Paganelli G, Viale G, et al. A randomized comparison of sentinel-node biopsy with routine axillary dissection in breast cancer. 2003. www.nejm.org.
  • 11. Lucci A, McCall LM, Beitsch PD, et al. Surgical complications associated with sentinel lymph node dissection (SLND) plus axillary lymph node dissection compared with SLND alone in the American College of Surgeons Oncology Group trial Z0011. J Clin Oncol. 2007;25:3657–63.
  • 12. Lyman GH, Temin S, Edge SB, et al. Sentinel lymph node biopsy for patients with early-stage breast cancer: American Society of Clinical Oncology clinical practice guideline update. J Clin Oncol. 2014;32:1365–83.
  • 13. Chung HL, Le-Petross HT, Leung JWT. Imaging updates to breast cancer lymph node management. Radiographics. 2021;41:1283–99.
  • 14. Alvarez S, Añorbe E, Alcorta P, López F, Alonso I, Cortés J. Role of sonography in the diagnosis of axillary lymph node metastases in breast cancer: a systematic review. Am J Roentgenol. 2006;186:1342–8.
  • 15. Ratanaprasatporn L, Chikarmane SA, Giess CS. Strengths and weaknesses of synthetic mammography in screening. Radiographics. 2017;37:1913–27.
  • 16. Cheung YC, Chen SC, Hsieh IC, et al. Multidetector computed tomography assessment on tumor size and nodal status in patients with locally advanced breast cancer before and after neoadjuvant chemotherapy. Eur J Surg Oncol. 2006;32:1186–90.
  • 17. Fuster D, Duch J, Paredes P, et al. Preoperative staging of large primary breast cancer with [18F]fluorodeoxyglucose positron emission tomography/computed tomography compared with conventional imaging procedures. J Clin Oncol. 2008;26:4746–51.
  • 18. Hoq MI, Jahan S, Mahmud MH, Hasan MMU, Jakaria M. Breast cancer screening awareness, practice, and perceived barriers: a community-based cross-sectional study among women in south-eastern Bangladesh. Health Sci Rep. 2024;7:e1799. 10.1002/hsr2.1799.
  • 19. Gebremariam A, Addissie A, Worku A, et al. Association of delay in breast cancer diagnosis with survival in Addis Ababa, Ethiopia: a prospective cohort study. JCO Glob Oncol. 2023. Published online Sept. 10.1200/go.23.00148.
  • 20. Magwesela FM, Msemakweli DO, Fearon D. Barriers and enablers of breast cancer screening among women in East Africa: a systematic review. BMC Public Health. 2023;23:1915. 10.1186/s12889-023-16831-0.
  • 21. Abeje S, Seme A, Tibelt A. Factors associated with breast cancer screening awareness and practices of women in Addis Ababa, Ethiopia. BMC Womens Health. 2019;19:4. 10.1186/s12905-018-0695-9.
  • 22. Sreedevi A, Quereshi MA, Kurian B, Kamalamma L. Screening for breast cancer in a low middle income country: predictors in a rural area of Kerala, India. Asian Pac J Cancer Prev. 2014;15:1919–24.
  • 23. Sechopoulos I, Teuwen J, Mann R. Artificial intelligence for breast cancer detection in mammography and digital breast tomosynthesis: state of the art. Semin Cancer Biol. 2021;72:214–25.
  • 24. Kim HE, Kim HH, Han BK, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit Health. 2020;2:e138–48.
  • 25. Wallis MG. Artificial intelligence for the real world of breast screening. Eur J Radiol. 2021;144:109661. 10.1016/j.ejrad.2021.109661.
  • 26. Ongena YP, Yakar D, Haan M, Kwee TC. Artificial intelligence in screening mammography: a population survey of women’s preferences. J Am Coll Radiol. 2021;18:79–86.
  • 27. Zhang J, Wu J, Zhou XS, Shi F, Shen D. Recent advancements in artificial intelligence for breast cancer: image augmentation, segmentation, diagnosis, and prognosis approaches. Semin Cancer Biol. 2023;96:11–25.
  • 28. Wang X, Chou K, Zhang G, et al. Breast cancer pre-clinical screening using infrared thermography and artificial intelligence: a prospective, multicentre, diagnostic accuracy cohort study. Int J Surg. 2023;109:3021–31.
  • 29. Resmini R, da Faria Silva L, Medeiros PRT, Araujo AS, Muchaluat-Saade DC, Conci A. A hybrid methodology for breast screening and cancer diagnosis using thermography. Comput Biol Med. 2021;135:104553. 10.1016/j.compbiomed.2021.104553.
  • 30. Martín-Del-Campo-Mena E, Sánchez-Méndez PA, Ruvalcaba-Limon E, et al. Development and validation of an infrared-artificial intelligence software for breast cancer detection. Explor Target Antitumor Ther. 2023;4:294–306.
  • 31. Rassiwala M, Mathur P, Mathur R, et al. Evaluation of digital infra-red thermal imaging as an adjunctive screening method for breast carcinoma: a pilot study. Int J Surg. 2014;12:1439–43.
  • 32. Dong F, Tao C, Wu J, et al. Detection of cervical lymph node metastasis from oral cavity cancer using a non-radiating, noninvasive digital infrared thermal imaging system. Sci Rep. 2018;8:7219. 10.1038/s41598-018-24195-4.
  • 33. Mongan J, Moy L, Kahn CE. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell. 2020;2:e200029.
  • 34. Agha R, Abdall-Razak A, Crossley E, Dowlut N, Iosifidis C, Mathew G. STROCSS 2019 Guideline: Strengthening the reporting of cohort studies in surgery. Int J Surg. 2019;72:156–65.
  • 35. Paszke A, Gross S, Chintala S, et al. Automatic differentiation in PyTorch. 31st Conference on Neural Information Processing Systems; 2017.
  • 36. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565–74.
  • 37. Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17:230.
  • 38. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev. 1950;78(1):1–3.
  • 39. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(11).
  • 40. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. http://gradcam.cloudcv.org.
  • 41. Ofri A, Moore K. Occult breast cancer: where are we at? Breast. 2020;54:211–5.
  • 42. Terada M, Adachi Y, Sawaki M, et al. Occult breast cancer may originate from ectopic breast tissue present in axillary lymph nodes. Breast Cancer Res Treat. 2018;172:1–7.
  • 43. Marino MA, Avendano D, Zapata P, Riedl CC, Pinker K. Lymph node imaging in patients with primary breast cancer: concurrent diagnostic tools. Oncologist. 2020;25:e231–42.
  • 44. Valente SA, Levine GM, Silverstein MJ, et al. Accuracy of predicting axillary lymph node positivity by physical examination, mammography, ultrasonography, and magnetic resonance imaging. Ann Surg Oncol. 2012;19:1825–30.
  • 45. Myers ER, Moorman P, Gierisch JM, et al. Benefits and harms of breast cancer screening: a systematic review. JAMA. 2015;314:1615–34.

Articles from Breast Cancer Research : BCR are provided here courtesy of BMC
