Abstract
Helicobacter pylori (HP) has chronically infected more than half of the world’s population and is a cause of chronic gastritis, peptic ulcers and gastric carcinoma. Manual detection of HP on a glass slide with a microscope is extremely time-consuming and may miss at least 14% of HP-positive cases owing to eye fatigue of pathologists. Here, a total of 270 gastric biopsy specimens were selected. All stained slides were scanned and analyzed by a Faster-R-CNN with a ResNet 50 or VGG16 backbone, and model performance was evaluated. Furthermore, the real-time microscopic field, a smartphone and the AI algorithm were connected through 5G networks, and the AI results were sent back to the smartphone for confirmation by pathologists. Finally, the diagnoses of different pathologists with and without AI assistance were compared. We present a deep learning framework (Faster-R-CNN with ResNet 50) that can automatically detect HP in gastric biopsies with 89.23% accuracy. We found that the real-time system can effectively improve the consistency and accuracy of diagnosis among different pathologists in detecting HP, owing to real-time sound and label alerts for lesions. We therefore conclude that our smartphone-aided deep learning detection system is the first real-time AI-assisted diagnostic tool for Helicobacter pylori screening. It can be used with a traditional microscope, does not interfere with the pathologist’s perspective during routine slide diagnosis, and does not add extra steps or observation time for pathologists.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-05527-7.
Keywords: Helicobacter pylori, Gastric biopsy, Deep learning, Artificial intelligence, Smartphone
Subject terms: Computational biology and bioinformatics, Gastroenterology
Introduction
Helicobacter pylori (HP) is a microaerophilic, Gram-negative, spiral-shaped bacterium that colonizes the human gastric mucosa and has been identified as a key pathogen of gastrointestinal diseases1. HP was first isolated from chronic gastritis mucosa by the Australian scientists Warren and Marshall in 1983. HP has chronically infected over 50% of the world’s population and can persist for decades. It is well known that HP is an etiological factor for chronic gastritis, peptic ulcers, gastric carcinoma, and mucosa-associated lymphoid tissue (MALT) lymphoma2. Early identification and management of HP infection are therefore particularly important and can effectively reduce the occurrence of chronic gastric diseases and malignant gastric tumors.
Histopathological evaluation of gastric mucosal biopsy specimens with Warthin-Starry staining is generally considered the gold standard for diagnosing HP infection3. However, manual detection of HP on a glass slide is extremely time-consuming, requiring experienced pathologists to examine each slide carefully under a light microscope at 40× magnification and taking about half a minute per slide. Moreover, traditional microscopic detection can miss at least 14% of HP-positive cases owing to eye fatigue of pathologists4. Therefore, artificial intelligence (AI) through computer-aided diagnosis (CAD) systems has been put forward, with the aim of enhancing the efficiency, precision and consistency of diagnosis and reducing the workload of pathologists.
Deep learning algorithms have recently been introduced into the field of gastric biopsy pathology. In 2020, Martin et al.5 first trained a deep learning model to classify non-neoplastic gastric biopsy specimens into normal gastric mucosa, reactive gastropathy and HP-related gastritis, which can serve as a useful tool for diagnosing H. pylori gastritis. This work provides an important basis for applying deep learning models to detect Helicobacter pylori.
However, owing to the “black box” nature of deep learning, it is extremely difficult to interpret the results and make decisions in clinical practice. Thus, the concept of “augmented intelligence” has been proposed, meaning that AI should assist clinicians by augmenting human intelligence rather than replacing it. In this regard, the application of AI in medical practice can help physicians reduce potential errors and improve diagnostic accuracy. Furthermore, applying deep learning techniques to whole slide pathology images can greatly enhance the precision and speed of diagnosis. Nevertheless, incorporating these AI algorithms into pathological diagnosis is not easy because of two main challenges: digital image processing infrastructure and the deployment of deep learning algorithms. Firstly, microscopic examination is currently conducted with conventional analog microscopes, and a digital workflow requires substantial infrastructure investment. Secondly, because of hardware, firmware and software differences, it is challenging even for medical experts to use AI algorithms developed by others. In 2019, Chen et al.6 offered an economical approach to address the challenges of AI microscopy, enabling the real-time integration of algorithms into the optical microscope. This technology, known as augmented reality (AR), led to the development of the augmented reality microscope (ARM), which overlays AI algorithm output on the original image. It marks the beginning of a new era in AI-assisted pathology diagnosis by incorporating microdisplays into traditional light microscopes within the pathologist’s workflow.
Nevertheless, augmented reality microscopes can interfere with the pathologist’s perspective during routine slide diagnosis because of the microdisplays. To solve this problem, smartphone-based microscopy with deep learning has recently been applied in medicine, including detection of viruses, automated classification of parasites, screening of sickle cells, and quantification of immunoassays7. Hence, in the present study, we connect the real-time microscopic field of view to a smartphone linked to the AI algorithm through 5G networks, and develop a smartphone-aided deep learning system for detecting Helicobacter pylori in gastric tissue samples.
Materials and methods
Ethics statement
We confirm that all research involving human gastric biopsy specimens received approval from the Ethics Committee of the Peking University Health Science Center, and informed consent was obtained from every participant. All procedures were conducted in accordance with applicable guidelines and regulations.
Data collection
A total of 270 gastric biopsy specimens were randomly selected from the archives of the Department of Pathology (Peking University Third Hospital) from 2017 to 2018. All specimens underwent hematoxylin and eosin (H&E) staining, and H. pylori was diagnosed with Warthin-Starry (WS) silver staining. Two expert gastrointestinal (GI) pathologists with more than 20 years of experience reviewed all 270 cases by traditional microscope, blinded to the initial pathology reports; their diagnoses were considered the gold standard. The positive signals of H. pylori were categorized into four levels (-, +, ++, +++) based on the revised Sydney System8, as shown in Fig. 1. The case distribution, comprising negative (n = 30), WS 1+ (n = 60), WS 2+ (n = 90) and WS 3+ (n = 90) specimens, is shown in Fig. 2A. All stained slides were scanned by a digital slide scanner (KFBIO, KF-PRO-005, China) at 40× objective magnification. Digital slides were independently evaluated by the two expert pathologists, blinded to the traditional microscope (TM) diagnoses.
Fig. 1.
Illustrations of HP-negative and HP-positive cases in digital images. A–D, hematoxylin and eosin staining. E–H, Warthin-Starry silver staining. The red curve marks the inner edge of a glandular lumen infected with Helicobacter pylori (magnification, ×200; scale bars: 20 μm).
Fig. 2.
Data information and illustration of the proposed network framework. A, The number of cases per category. B, Annotation of the glandular lumens containing HP by the LabelImg software. C, Illustration of the proposed neural network structure of Faster-RCNN.
Furthermore, another 270 gastric biopsy specimens from the Department of Pathology (Peking University Third Hospital), collected from 2024 to 2025, were enrolled in our cohort, comprising negative (n = 30), WS 1+ (n = 60), WS 2+ (n = 90) and WS 3+ (n = 90) specimens.
Artificial intelligence
The average size of a digital slide is 7500 × 6500 pixels, whereas the average size of an HP organism is 10 × 20 pixels; HP targets are therefore very small, with varying shapes and complex background information. Fortunately, according to the pathologists’ experience, HP is easier to identify inside glandular lumens, and glandular lumens have distinct texture information. Therefore, we used a rectangular box of 2552 × 1235 pixels to crop consecutive fields containing glandular lumens and the surrounding region, then used the LabelImg software to annotate the glandular lumens containing HP and saved the labeling information as XML files (Fig. 2B). The dataset was divided into 80% for training, 10% for testing, and 10% for validation. Furthermore, we built a Faster Region-based Convolutional Neural Network (Faster-RCNN) to recognize HP in the glandular lumens on Windows 10 with Google’s TensorFlow deep learning framework. The Faster-RCNN comprises four essential components: a backbone network (convolutional layers), a region proposal network (RPN), region of interest (ROI) pooling, and classification and regression layers. Two backbone networks, ResNet 50 and VGG 16, were selected for our model. The original images are input into the convolutional layers, which extract image features and generate feature maps. The region proposal network shares these feature maps and produces candidate regions. Next, the candidate regions are pooled into fixed-size features by the ROI pooling layer. Finally, the features are input into the classification and regression layers to generate predicted boxes. The proposed Faster-RCNN network structure is shown in Fig. 2C with the relevant outputs of the different layers.
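As a rough illustration of this two-stage structure (backbone → RPN → ROI pooling → classification/regression heads), the sketch below uses torchvision’s reference Faster R-CNN with a ResNet-50 backbone as a stand-in; it is not the authors’ TensorFlow implementation, and the image size and box values are placeholders.

```python
# Minimal sketch of a two-class Faster R-CNN detector with a ResNet-50 backbone.
# Stand-in only: the study used a TensorFlow pipeline; torchvision's reference
# implementation is used here to illustrate the same two-stage structure.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 2  # background + "glandular lumen containing HP"

# Backbone (ResNet-50 + FPN), region proposal network and ROI heads are built here.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the classification/regression head so it predicts our two classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Training step: the model expects a list of image tensors and a list of target
# dicts with "boxes" (x1, y1, x2, y2) and "labels"; it returns a loss dict.
images = [torch.rand(3, 617, 850)]                        # one cropped RGB field
targets = [{"boxes": torch.tensor([[100., 120., 260., 300.]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)
print({k: float(v) for k, v in losses.items()})

# Inference: returns predicted boxes, labels and confidence scores per image.
model.eval()
with torch.no_grad():
    predictions = model(images)
print(predictions[0]["boxes"].shape, predictions[0]["scores"])
```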
Moreover, ten-fold cross-validation was carried out as follows: the entire dataset was first randomly divided into ten equal parts; the models were trained on nine parts and tested on the remaining part; and this process was repeated ten times, each time using a different part for testing.
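A minimal sketch of this ten-fold scheme is shown below; the `train_model` and `evaluate_model` calls are hypothetical wrappers around the Faster-RCNN training and evaluation steps, and the sample list is a placeholder.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder list of annotated sample identifiers; in practice these would be
# the cropped fields together with their LabelImg XML annotations.
samples = np.arange(100)

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(samples), start=1):
    train_set, test_set = samples[train_idx], samples[test_idx]
    # train_model / evaluate_model are hypothetical helpers wrapping the
    # Faster-RCNN training and test-set evaluation described above.
    # model = train_model(train_set)
    # fold_scores.append(evaluate_model(model, test_set))
    print(f"fold {fold}: {len(train_set)} training samples, {len(test_set)} test samples")
# print("mean F1 across folds:", np.mean(fold_scores))
```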
Model performance evaluation
In this study, F-measure is adopted to evaluate deep neural network architectures (ResNet 50 and VGG 16). The evaluation metrics are listed below.
TP: true positive.
FP: false positive.
TN: true negative.
FN: false negative.
Precision = TP/(TP + FP).
Recall = TP/(TP + FN).
F1-measure = 2×(Precision×Recall)/(Precision + Recall) = 2TP/(2TP + FP + FN).
During HP detection, Intersection over Union (IoU) measures the overlap between a predicted box and the ground-truth bounding box; a prediction with an IoU greater than 0.5 was recorded as correct, otherwise it was considered wrong. Additionally, the confidence score of each predicted bounding box was calculated, and a confidence threshold (Score_Threshold) was set for the deep learning models: predictions scoring above the threshold were regarded as positive samples, otherwise as negative samples. Finally, the precision (Precision), recall (Recall) and F score of the models were calculated.
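The sketch below re-implements this matching rule and the formulas above for illustration; the toy boxes and scores are placeholders, and the `evaluate` helper is not the exact evaluation code used in the study.

```python
# A prediction counts as a true positive when its confidence exceeds
# Score_Threshold and its IoU with an unmatched ground-truth box exceeds 0.5;
# precision, recall and F1 then follow the formulas listed above.
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def evaluate(predictions, ground_truths, score_threshold=0.5, iou_threshold=0.5):
    """predictions: list of (box, score); ground_truths: list of boxes."""
    kept = [box for box, score in predictions if score >= score_threshold]
    matched, tp = set(), 0
    for box in kept:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(box, gt) > iou_threshold:
                matched.add(i)
                tp += 1
                break
    fp = len(kept) - tp
    fn = len(ground_truths) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1

# Toy example: one correct detection, one false positive, one missed lumen.
preds = [((10, 10, 30, 50), 0.92), ((200, 200, 220, 240), 0.81)]
gts = [(12, 11, 31, 52), (400, 120, 430, 160)]
print(evaluate(preds, gts))  # -> (0.5, 0.5, 0.5)
```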
During training, we applied data augmentation techniques (such as random adjustment of image contrast, random cropping, random scaling, rotation, mosaic, flipping and blurring) to prevent the model from overfitting. Meanwhile, we closely monitored the validation loss during training and stopped training when it began to plateau, to prevent overfitting. Furthermore, using good pre-trained weights reduces the need for large amounts of data and many training iterations, which also helps prevent overfitting indirectly.
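A possible detection-aware augmentation pipeline covering several of these operations is sketched below; albumentations is an assumed library choice (the study does not name one), and the `bbox_params` argument keeps the annotated lumen boxes aligned with each transformed image.

```python
import albumentations as A
import numpy as np

# Sketch of augmentations similar to those described above (contrast, cropping,
# rotation, flipping, blurring); parameters are illustrative, not the study's.
augment = A.Compose(
    [
        A.RandomBrightnessContrast(p=0.5),
        A.RandomSizedBBoxSafeCrop(height=617, width=850, p=0.3),
        A.Rotate(limit=15, p=0.5),
        A.HorizontalFlip(p=0.5),
        A.Blur(blur_limit=3, p=0.2),
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

image = np.random.randint(0, 255, (617, 850, 3), dtype=np.uint8)  # dummy RGB field
boxes = [[100, 120, 260, 300]]                                    # x1, y1, x2, y2
augmented = augment(image=image, bboxes=boxes, labels=[1])
print(augmented["image"].shape, augmented["bboxes"])
```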
Connection of the real-time microscopic field, smartphone and AI algorithm
The Betrue Real-time Microscopic Field Sharing System (RealVision System, Guangzhou Betrue Technology) was installed on a microscope, the microscopic field was streamed to a smartphone via WiFi, and the smartphone was then connected directly to the AI algorithm (the Faster-RCNN with a ResNet-50 backbone) on the server through 5G networks. The AI results were finally sent back to the smartphone for confirmation by the pathologists.
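A hypothetical illustration of this round trip is given below: a captured field of view is posted to an inference endpoint and the server replies with detected boxes and confidence scores. The URL, `detect_hp` helper and JSON schema are assumptions for illustration only, not the actual RealVision/5G protocol.

```python
import requests

SERVER_URL = "https://ai-server.example.org/hp/detect"  # hypothetical endpoint

def detect_hp(fov_jpeg_path: str, score_threshold: float = 0.5):
    """Post one field-of-view image and return boxes above the score threshold."""
    with open(fov_jpeg_path, "rb") as f:
        response = requests.post(SERVER_URL, files={"image": f}, timeout=5)
    response.raise_for_status()
    # Assumed response format: {"boxes": [[x1, y1, x2, y2], ...], "scores": [...]}
    result = response.json()
    return [
        (box, score)
        for box, score in zip(result["boxes"], result["scores"])
        if score >= score_threshold
    ]

# detections = detect_hp("current_fov.jpg")
# The smartphone app would then draw the returned boxes and play an alert sound
# whenever the list of detections is non-empty.
```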
Comparison of the diagnoses of different pathologists with CNN results
Two expert gastrointestinal (GI) pathologists, two surgical pathologists without GI fellowship training and two pathology residents blindly reviewed all 270 cases by traditional microscope. After a 3-month washout period, the four non-GI pathologists read the same slides under the microscope with AI assistance.
Statistical analysis
Statistical analysis was performed using SPSS version 24.0 (IBM Corporation, USA). Pearson’s phi coefficient test was used as a measure of correlation. To determine interrater reliability, Cohen’s kappa with the 95% confidence interval (CI) was evaluated. The κ value was interpreted as follows: less than 0.20 indicates poor agreement; 0.21–0.40 signifies fair agreement; 0.41–0.60 represents moderate agreement; 0.61–0.80 denotes good agreement; and greater than 0.80 indicates very good agreement. P values below 0.05 were deemed statistically significant.
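As a rough illustration of the two agreement statistics used here, the sketch below applies scikit-learn to toy binary ratings from two hypothetical readers; scikit-learn is an assumed tool choice (the analysis above was performed in SPSS), and for a 2×2 table Pearson’s phi coefficient coincides with the Matthews correlation coefficient.

```python
from sklearn.metrics import cohen_kappa_score, matthews_corrcoef

# Toy binary ratings (1 = HP positive, 0 = HP negative) from two readers.
reader_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
reader_b = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]

kappa = cohen_kappa_score(reader_a, reader_b)
# For a 2x2 contingency table, Pearson's phi equals the Matthews correlation.
phi = matthews_corrcoef(reader_a, reader_b)
print(f"Cohen's kappa = {kappa:.3f}, phi = {phi:.3f}")
```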
Results
Notable correlation between HP identified using a traditional microscope (TM) and digital pathology (WSI)
Of the 270 gastric biopsies reviewed with WSI, only 7 cases (2.6%) were not consistent with the original TM diagnosis. Among these 7 cases, 3 WSI diagnoses were confirmed as correct on review by the two expert GI pathologists. Using WSI, we recorded no false positives (0%) and no false negatives (0%); only 4 cases of HP(++) were graded as HP(+++). By contrast, 3 cases by TM were false negatives (1.1%). There was a notable correlation between TM and WSI (P < 0.01), and both modalities together diagnosed 30 HP-negative, 60 HP(+), 90 HP(++) and 90 HP(+++) cases with an excellent correlation (Pearson’s phi = 0.988), as demonstrated in Table 1. With the consensus of both modalities taken as the gold standard for HP infection diagnosis, TM demonstrated an accuracy, sensitivity and specificity of 0.9889, 0.9875 and 1.0000, respectively (Table 2), whereas all values for WSI were 1.0000. Treating the TM and WSI reading methods as if they were two different reviewers, Cohen’s kappa for inter-observer agreement was nearly 1 (Cohen’s kappa = 0.964), indicating very good agreement between TM and WSI.
Table 1.
The significant association between HP diagnosed with a traditional microscope (TM) and with digital pathology (WSI). The observations are from pure visual reading on WSI without an artificial intelligence classifier.
| | TM | WSI | Both modalities |
|---|---|---|---|
| HP negative (-) | 33 | 30 | 30 |
| HP weak positive (+) | 57 | 60 | 60 |
| HP moderate positive (++) | 90 | 86 | 90 |
| HP strong positive (+++) | 90 | 94 | 90 |
| Total | 270 | 270 | 270 |
p < 0.01, Pearson’s Phi Coefficient = 0.988.
Table 2.
Diagnostic performance measures of TM and WSI for the detection of HP in 270 gastric biopsies. Both modalities of TM and WSI were used as the reference standard.
| | TM | WSI |
|---|---|---|
| Accuracy | 0.9889 | 1.0000 |
| Sensitivity | 0.9875 | 1.0000 |
| Specificity | 1.0000 | 1.0000 |
| Positive predictive value (precision) | 1.0000 | 1.0000 |
| Negative predictive value | 0.9091 | 1.0000 |
The Faster-R-CNN with ResNet 50 is better than VGG 16 in detecting HP in gastric biopsies
For evaluation, the dataset comprised 4320 training images, 540 test images and 540 validation images. The HP images were RGB images with a resolution of 850 × 617 pixels. To save computation time and memory, we set the hyperparameters as: batch size = 1, learning rate = 0.00001, epochs = 100, decay = 0.92. After training, the value of the loss function decreased significantly (Fig. 3A, upper). It can further be observed that ResNet 50 achieved slightly faster convergence than VGG 16. Moreover, the results suggest that VGG 16 starts overfitting after 30 epochs, because its training loss continues to decrease while its validation loss (val loss) starts to increase, whereas ResNet 50 avoided the overfitting problem. Furthermore, model performance in detecting HP was evaluated on predicted images. Red boxes and blocks represent the model prediction and confidence, respectively, and only predictions with a confidence score greater than 0.5 were displayed. As shown in Fig. 3A (middle), the feature extraction capability of the two backbone networks differs. The prediction results of ResNet 50 and VGG 16 showed that both are able to identify glandular lumens infected with Helicobacter pylori; however, VGG 16 also identified some glandular lumens without HP infection, leading to false-positive diagnoses.
Fig. 3.
The Faster-R-CNN with ResNet 50 is better than VGG 16 in detecting HP in gastric biopsies. A, The value of the loss function of both models after training (upper). Model performance in detecting HP was evaluated on predicted images; red boxes and blocks represent the model prediction and confidence, respectively, and only predictions with a confidence score greater than 0.5 are displayed (middle). F1 scores were calculated at different confidence thresholds (Score_Threshold) for ResNet 50 and VGG 16 (bottom). B, The precision-recall (PR) curves of ResNet 50 and VGG 16. C, The average detection accuracy and recall rate of ResNet 50 and VGG 16, respectively.
Then, as shown in Fig. 3A (bottom), F1 scores were calculated at different confidence thresholds (Score_Threshold) for ResNet 50 and VGG 16. VGG 16 achieved an F1 score of 0.53 at a confidence threshold of 0.5, with a peak of 0.72 at a threshold of 0.9. The ResNet 50 model achieved an F1 score of 0.75 at a confidence threshold of 0.5, with a peak of 0.87 at a threshold of 0.92. Thus, as assessed by F1 scores, the ResNet 50 model outperformed the VGG 16 model in diagnosing HP infection.
Furthermore, as shown in Fig. 3B, the ResNet 50 model outperformed VGG 16 on the precision-recall (PR) curves, achieving higher precision than VGG 16 at the same recall. As shown in Fig. 3C, the average detection accuracy and recall rate of ResNet 50 are 9.88% and 14.08% higher than those of VGG 16, respectively. Therefore, the Faster-RCNN with a ResNet-50 backbone was adopted for detecting Helicobacter pylori in our study.
In addition, we employed the receiver operating characteristic (ROC) curve to assess the accuracy of the convolutional neural network (CNN) decision-making process. The ROC curve is constructed by plotting sensitivity against 1 − specificity, and we computed the area under the curve (AUC), which ranges from 0 to 1, with values approaching 1 indicating high classification accuracy. Figure S1 presents the ROC curves and their corresponding AUCs, demonstrating that the ResNet 50 model exhibits strong performance with an AUC of 0.8999.
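For illustration, the ROC/AUC computation can be sketched as below; scikit-learn is an assumed tool choice, and the labels and scores are toy values standing in for the per-case ground truth and the ResNet 50 confidence scores.

```python
from sklearn.metrics import roc_curve, auc

# Toy per-case labels (1 = HP positive) and model confidence scores.
y_true  = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]
y_score = [0.95, 0.80, 0.60, 0.40, 0.20, 0.90, 0.55, 0.70, 0.30, 0.85]

# tpr = sensitivity, fpr = 1 - specificity; AUC summarizes the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC =", auc(fpr, tpr))
```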
Furthermore, we developed our model based on the two-stage Faster R-CNN and compared it with other widely used models, including the one-stage YOLOX, SSD and RetinaNet networks and two-stage frameworks with HRNet or Swin Transformer backbones. All models were trained for 100 epochs and evaluated on the test set; the performance results of the various models are presented in Table S1.
Successful connection of the real-time microscopic field, smartphone and AI algorithm
In this study, we successfully connected the microscopic field, smartphone and AI algorithm (the Faster-RCNN with a ResNet-50 backbone) via the Betrue Real-time Microscopic Field Sharing System and 5G networks. The AI results, accompanied by real-time sound and label alerts for lesions, were sent back to the smartphone for confirmation by the pathologists (Fig. 4). The AI system’s processing time for a single field of view (FOV) was approximately 200 milliseconds, thereby providing real-time assistance to pathologists. Our results suggest that the AI algorithm is able to identify Helicobacter pylori in the glandular lumens of gastric biopsies, highlight them in the smartphone field of view and help pathologists make clinical decisions (Fig. 5).
Fig. 4.
Successful connection of the real-time microscopic field, smartphone and AI algorithm. The Betrue Real-time Microscopic Field Sharing System (RealVision System, Guangzhou Betrue Technology) was installed on a microscope, the microscopic field was streamed to a smartphone via WiFi, and the smartphone was connected directly to the AI algorithm (the Faster-RCNN with a ResNet-50 backbone) on the server through 5G networks. The AI results were then sent back to the smartphone for confirmation by the pathologists.
Fig. 5.
The AI algorithm identifies Helicobacter pylori in the glandular lumens of gastric biopsies, highlights them in the smartphone field of view and helps pathologists make clinical decisions.
The real-time AI assistance system can effectively improve the consistency and accuracy of diagnosis among different pathologists
As shown in Table 3, the CNN (ResNet 50), expert GI pathologists, general surgical pathologists and pathology residents were 89.23%, 100%, 85% and 55% concordant with the gold standard diagnosis, respectively, suggesting that the CNN can help non-GI pathologists make more precise diagnoses.
Table 3.
Pathologist versus CNN concordance with gold standard diagnoses.
| Reviewer | Average % concordant results with the gold standard diagnosis |
|---|---|
| CNN (ResNet 50) | 89.23% |
| Expert GI pathologists (n = 2) | 100% |
| Non-GI fellowship trained surgical pathologists (n = 2) | 85% |
| Pathology residents (n = 2) | 55% |
CNN, convolutional neural network; GI, gastrointestinal.
Furthermore, the two pathology residents and two non-GI surgical pathologists first read the slides without AI assistance and then, after a 3-month washout period, re-read the same slides with AI assistance on the same microscope. As indicated in Table 4, the sensitivities and the corresponding κ values varied among the 4 pathologists. With AI assistance, the overall sensitivity for detecting Helicobacter pylori significantly improved from 0.6625 to 0.94375 (P = 0.0305) and the κ score improved from 0.3037 to 0.7885 (P = 0.0018). Moreover, the class-specific performance of the general surgical pathologists is shown in Table 5. For HP-negative and HP weak positive cases, the detection sensitivity markedly increased from 0.8313 to 1.0000 with AI assistance (P = 1.9662e-10), and the κ score increased from 0.5337 (moderate) to 1.0000 (very good). However, for HP moderate positive and strong positive cases, AI assistance did not statistically change the detection sensitivity or the κ score. These results therefore suggest that this AI assistance system can effectively improve the consistency and accuracy of diagnosis among different pathologists in detecting Helicobacter pylori.
Table 4.
Sensitivities, specificities, and κ scores of the reader study with and without AI assistance.
| Reviewer | Sensitivity (with AI) | Sensitivity (without AI) | Specificity (with AI) | Specificity (without AI) | κ score (with AI) | κ score (without AI) |
|---|---|---|---|---|---|---|
| Pathology resident 1 | 0.8875 | 0.4375 | 1.0000 | 1.0000 | 0.6368 | 0.1474 |
| Pathology resident 2 | 0.8875 | 0.5500 | 1.0000 | 1.0000 | 0.6368 | 0.2136 |
| Non-GI fellowship trained surgical pathologist 1 | 1.0000 | 0.7750 | 1.0000 | 1.0000 | 1.0000 | 0.4336 |
| Non-GI fellowship trained surgical pathologist 2 | 1.0000 | 0.8875 | 1.0000 | 1.0000 | 1.0000 | 0.6368 |
| Overall | 0.94375 | 0.6625 | 1.0000 | 1.0000 | 0.7885 | 0.3037 |
Table 5.
Sensitivities, specificities and κ scores of class-specific diagnoses with and without AI assistance among non-GI fellowship trained surgical pathologists.
| Class | Sensitivity (with AI) | Sensitivity (without AI) | κ score (with AI) | κ score (without AI) |
|---|---|---|---|---|
| HP negative (-) | 1.0000 | 0.8313 | 1.0000 | 0.5337 |
| HP weak positive (+) | 1.0000 | 0.8313 | 1.0000 | 0.5337 |
| HP moderate positive (++) | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| HP strong positive (+++) | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Discussion
In the present study, we examined whether whole slide imaging (WSI) could accurately recognize HP infection in gastric biopsies, using W-S silver staining to diagnose HP infection quickly with WSI. We found very good agreement between WSI and the traditional microscope (TM) in HP recognition, consistent with previous reports. Liscia et al.9 observed only 4 false positive and 4 false negative results in the analysis of 185 gastric biopsies using WSI, corresponding to a concordance rate of 95.6%; compared with TM, regarded as the benchmark for diagnosing HP infection, WSI achieved a sensitivity of 0.95 and a specificity of 0.96. Furthermore, in our study we recorded no false positives and no false negatives with WSI but 3 false negatives with TM, indicating that WSI is comparable to, or even better than, TM in distinguishing HP-negative from HP-positive gastric biopsies. We also observed that WSI was better than TM in evaluating the extent of HP infection; we speculate that the high-definition magnification and precise computer-assisted counting of WSI are the reason. The contrast between WSI and TM was further examined by treating the computer and the microscope as distinct reviewers, and Cohen’s kappa indicated a significant concordance between the two methods. Thus, our data clearly demonstrate that diagnosis of HP infection by WSI of W-S silver stains is feasible and precise at 40× magnification, leading us to conclude that WSI images are suitable for training AI algorithms.
Moreover, since 2015 Faster-R-CNN has been acknowledged as a groundbreaking AI algorithm for its exceptional speed in small object detection10. Thus, we selected Faster-R-CNN with ResNet 50 or VGG16 as the backbone network for HP detection in gastric biopsies. To evaluate the effectiveness of the two models, we used several criteria, including F-measures, PR curves, accuracy and recall, and observed that the Faster-R-CNN with ResNet 50 is better than VGG 16 for HP detection in gastric biopsies. Our results agree with several previous reports. Ma et al.10 found that Faster R-CNN paired with ResNet 50 provided superior recognition performance compared with VGG16. ResNet (Residual Neural Network) was introduced by four Chinese researchers, including Kaiming He, at Microsoft Research10. ResNet 50 is a ResNet architecture containing 50 layers; it helps mitigate the vanishing gradient problem and enables the model to learn an identity function, ensuring that higher layers perform at least as well as lower layers. Our ResNet 50 model achieved 89.23% detection accuracy and an 85% recall rate, consistent with previous reports. Martin et al.5 successfully trained HALO-AI image analysis software with a VGG model to recognize Helicobacter pylori gastritis (HPG), and the HALO-AI correct area distribution (AD) reached 87.3%. Compared with HALO-AI, our model is highly consistent with the diagnostic habits of pathologists, first recognizing the glandular lumens of the gastric pits and then identifying Helicobacter pylori colonization within the glandular lumens. Furthermore, a new histopathology dataset for gastric mucosa, named DeepHP, was developed to diagnose Helicobacter pylori infection, alongside a convolutional neural network (CNN) model for gastritis diagnosis11. The authors explored the classification performance of three CNN architectures, VGG16, Inception V3 and ResNet 50, and found that VGG16 had the best performance with an area under the curve of 0.998. They observed an enhancement in the performance of the pre-trained VGG16 and Inception V3 models after fine-tuning, whereas the ResNet 50 model exhibited a decline in performance attributed to the new data domain or the selection of active layers in the fine-tuning phase11. Despite the larger number of parameters of VGG16 in their study, the performance of the other models (ResNet 50 and Inception V3) remained good. The final results of AI are greatly influenced by the complexity of the models, the pre-training dataset, the optimized hyperparameters and the dataset configuration; therefore, modifications to the parameters can readily affect the performance of CNN models.
At present, there are two main modes for using artificial intelligence in pathology diagnostic assistance. The first is to scan slides digitally and then perform whole slide image analysis. The second is to acquire real-time images directly from the microscope field of view (FOV) and present the AI-generated results via instant visualization technology12–16. The latter does not require large infrastructure investments or extra time for preliminary scanning, and it fits better with a pathologist’s daily work habits, helping pathologists enhance the speed, precision and consistency of diagnosis and reduce their workload.
Therefore, we selected the second mode and successfully connected the microscopic field, smartphone and AI algorithm (the Faster-RCNN with a ResNet-50 backbone) via the Betrue Real-time Microscopic Field Sharing System and 5G networks. Our results suggest that the AI algorithm is able to identify Helicobacter pylori in the glandular lumens of gastric biopsies, highlight them in the smartphone field of view and help pathologists make clinical decisions. Thus, integrating AI into clinical pathology workflows with real-time lesion alerts has the potential to reduce inadvertent errors and improve the efficiency and accuracy of HP recognition. Our results agree with several previous reports. Chen et al.6 proposed an augmented reality microscope (ARM) that seamlessly integrates AI into routine workflows by overlaying real-time AI-generated information on the current view of the slide. The ARM has proven effective for detecting prostate cancer and metastatic breast cancer, and it is anticipated to have broad applications in pathology, including the evaluation of lymph node metastasis and the counting of mitotic figures. Furthermore, de Haan et al.7 introduced a deep learning system for the automated detection of sickle cells in blood smears with a smartphone-based microscope; they blindly tested this mobile sickle cell detection on 96 patients, achieving ~98% accuracy7. These results provide the basis for developing a smartphone-aided deep learning system for detecting HP in gastric biopsy specimens.
Finally, our results suggest that the real-time AI assistance system can effectively enhance the consistency and precision of diagnoses among different pathologists in detecting Helicobacter pylori, consistent with previous findings. Tang et al.17 selected 486 slides of abnormal cervical epithelial cells to assess the efficacy of an AI microscope and found that an AI-enhanced microscope can offer real-time support for cervical cytology screening, enhancing both the precision and the efficiency of cervical cytology diagnoses.
Taken together, our smartphone-aided deep learning detection system is the first real-time AI-assisted diagnostic tool for Helicobacter pylori screening. It integrates seamlessly into routine pathology workflows, with an inference and labeling time of less than 200 milliseconds per FOV. It can be used with a traditional microscope, does not interfere with the pathologist’s perspective during routine slide diagnosis, and does not add extra steps or observation time for pathologists, thanks to real-time sound and label alerts for lesions. Thus, this smartphone-aided deep learning detection system can serve as a valuable diagnostic aid for H. pylori gastritis.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Acknowledgements
The article was supported by the National Natural Science Foundation of China (50775011, 81572533) and Beijing Natural Sciences Foundation (7182078).
Author contributions
Guanmeng Gao: Methodology and software; Zihan Wei: Data curation; Fei Pei: Resources and writing-original draft; Yajie Du: Slides scanning and labeling; Beiying Liu: Project administration, supervision and writing-review and editing. All authors reviewed the manuscript.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declarations
Conflict of interest
The authors declare that there are no conflicts of interest.
Footnotes
Guanmeng Gao, Zihan Wei and Fei Pei contributed equally to this work.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Wu, X. et al. A multi-omics study on the effect of Helicobacter pylori-related genes in the tumor immunity on stomach adenocarcinoma. Front. Cell. Infect. Microbiol. 12, 1–20. 10.3389/fcimb.2022.880636 (2022).
- 2. Zhang, X. D. et al. Ilaprazole-amoxicillin dual therapy at high dose as a first-line treatment for Helicobacter pylori infection in Hainan: a single-center, open-label, noninferiority, randomized controlled trial. BMC Gastroenterol. 23(1), 249–259. 10.1186/s12876-023-02890-5 (2023).
- 3. Kwon, Y. H. et al. The diagnostic validity of the 13C-urea breath test in the gastrectomized patients: single tertiary center retrospective cohort study. J. Cancer Prev. 19(4), 309–317. 10.15430/JCP.2014.19.4.309 (2014).
- 4. Franklin, M. M. et al. A deep learning convolutional neural network can differentiate between Helicobacter pylori gastritis and autoimmune gastritis with results comparable to gastrointestinal pathologists. Arch. Pathol. Lab. Med. 146(1). 10.5858/arpa.2020-0520-OA (2022).
- 5. Martin, D. R. et al. A deep learning convolutional neural network can recognize common patterns of injury in gastric pathology. Arch. Pathol. Lab. Med. 144, 370–378. 10.5858/arpa.2019-0004-OA (2020).
- 6. Chen, P. H. C. et al. An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat. Med. 25, 1453–1457. 10.1038/s41591-019-0539-7 (2019).
- 7. de Haan, K., Koydemir, H. C. et al. Automated screening of sickle cells using a smartphone-based microscope and deep learning. NPJ Digit. Med. 3, 1–8. 10.1038/s41746-020-0282-y (2020).
- 8. Weisong, W., Wang, L., Liu, Y. & Yuchang, H. Improving the detection of Helicobacter pylori in biopsies of chronic gastritis: a comparative analysis of H&E, methylene blue, Warthin-Starry, immunohistochemistry, and quantum dots immunohistochemistry. Front. Oncol. 13, 1–9. 10.3389/fonc.2023.1229871 (2023).
- 9. Liscia, D. S. et al. Use of digital pathology and artificial intelligence for the diagnosis of Helicobacter pylori in gastric biopsies. Pathologica 114, 32–40. 10.32074/1591-951X-751 (2022).
- 10. Ma, S., Huang, Y., Che, X. & Gu, R. Faster RCNN-based detection of cervical spinal cord injury and disc degeneration. J. Appl. Clin. Med. Phys. 21(9), 235–243. 10.1002/acm2.13001 (2020).
- 11. Gonçalves, W. G. et al. DeepHP: a new gastric mucosa histopathology dataset for Helicobacter pylori infection diagnosis. Int. J. Mol. Sci. 23, 14581. 10.3390/ijms232314581 (2022).
- 12. Bao, H. et al. Artificial intelligence-assisted cytology for detection of cervical intraepithelial neoplasia or invasive cancer: a multicenter, clinical-based, observational study. Gynecol. Oncol. 159, 171–178. 10.1016/j.ygyno.2020.07.099 (2020).
- 13. Sanyal, P. et al. Artificial intelligence in cytopathology: a neural network to identify papillary carcinoma on thyroid fine-needle aspiration cytology smears. J. Pathol. Inform. 9, 43. 10.4103/jpi.jpi_43_18 (2018).
- 14. Harmon, S. A. et al. Artificial intelligence at the intersection of pathology and radiology in prostate cancer. Diagn. Interv. Radiol. 25, 183–188. 10.5152/dir.2019.19125 (2019).
- 15. Ström, P. et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 21, 222–232. 10.1016/S1470-2045(19)30738-7 (2020).
- 16. Sanghvi, A. B. et al. Performance of an artificial intelligence algorithm for reporting urine cytopathology. Cancer Cytopathol. 127, 658–666. 10.1002/cncy.22176 (2019).
- 17. Tang, H. P. et al. Cervical cytology screening facilitated by an artificial intelligence microscope: a preliminary study. Cancer Cytopathol. 129(9), 693–700. 10.1002/cncy.22425 (2021).





