PLOS Digital Health. 2023 Feb 17;2(2):e0000058. doi: 10.1371/journal.pdig.0000058

Artificial intelligence model for analyzing colonic endoscopy images to detect changes associated with irritable bowel syndrome

Kazuhisa Tabata 1, Hiroshi Mihara 1,*, Sohachi Nanjo 1, Iori Motoo 1, Takayuki Ando 1, Akira Teramoto 1, Haruka Fujinami 1, Ichiro Yasuda 1
Editor: Benjamin P Geisler
PMCID: PMC9937744  PMID: 36812592

Abstract

Irritable bowel syndrome (IBS) is not considered an organic disease and usually shows no abnormalities on lower gastrointestinal endoscopy, although biofilm formation, dysbiosis, and histological microinflammation have recently been reported in patients with IBS. In this study, we investigated whether an artificial intelligence (AI) colorectal image model can identify minute endoscopic changes associated with IBS that cannot typically be detected by human investigators. Study subjects were identified from electronic medical records and categorized as IBS (Group I; n = 11), IBS with predominant constipation (IBS-C; Group C; n = 12), and IBS with predominant diarrhea (IBS-D; Group D; n = 12); none had other diseases. Colonoscopy images were obtained from these patients and from asymptomatic healthy subjects (Group N; n = 88). Google Cloud Platform AutoML Vision (single-label classification) was used to construct AI image models and to calculate sensitivity, specificity, predictive values, and the area under the curve (AUC). A total of 2,479, 382, 538, and 484 images were randomly selected for Groups N, I, C, and D, respectively. The AUC of the model discriminating between Groups N and I was 0.95, and the sensitivity, specificity, positive predictive value, and negative predictive value for detecting Group I were 30.8%, 97.6%, 66.7%, and 90.2%, respectively. The overall AUC of the model discriminating among Groups N, C, and D was 0.83; the sensitivity, specificity, and positive predictive value for Group N were 87.5%, 46.2%, and 79.9%, respectively. Thus, the image AI model discriminated colonoscopy images of IBS patients from those of healthy subjects with an AUC of 0.95. Prospective studies are needed to validate the model externally, that is, to determine whether it retains its diagnostic performance at other facilities and whether it can be used to assess treatment efficacy.

Author summary

This study reports an artificial intelligence (AI) model for detecting irritable bowel syndrome (IBS) from endoscopic images. Endoscopic images of IBS patients usually lack labeled training data because the associated changes cannot be detected by a human observer. We therefore investigated whether the presence or absence of symptoms could serve as the training label, and found that endoscopic images of IBS patients could be discriminated from those of healthy subjects with high accuracy, and that images of diarrhea-predominant IBS could also be discriminated from those of constipation-predominant IBS. Building image AI models on the presence or absence of symptoms may likewise enable endoscopic AI diagnosis of other functional gastrointestinal diseases such as non-erosive reflux disease (NERD) and functional dyspepsia. In addition, this study used a code-free deep learning approach, which has the potential to improve clinicians' access to deep learning. Further research is needed to determine whether real-time IBS image determination and prediction of treatment efficacy are possible.

Introduction

Irritable bowel syndrome (IBS) affects about 10% of the Western population, and its prevalence is increasing annually [1]. Patients with IBS frequently experience abdominal pain and changes in stool habits, but often exhibit no abnormalities on routine diagnostic tests or lower gastrointestinal endoscopy [2]. Recent evidence indicates that aspects of Western lifestyles that alter the microbiota, such as frequent antibiotic therapy, may be involved in the development of IBS. Biofilm formation is a unique mode of microbial growth [3], and polymicrobial biofilms have been implicated in several gastrointestinal disorders [4,5]. In a recent study, biofilm formation by E. coli and Ruminococcus gnavus from the terminal ileum to the ascending colon was seen in 60% of IBS cases, and close observation revealed mucosal changes in areas where biofilms had formed [6]. However, even when microinflammation is present histologically, human investigators currently cannot determine whether an endoscopic image is from a patient with IBS.

Artificial intelligence (AI) models for image analysis have been developed to detect lower gastrointestinal tract lesions in real time, and several models have already been applied clinically [7]. Complex AI models such as those developed for imaging applications rely on deep learning algorithms and generally must be built with Python libraries, which requires programming expertise. Although relatively few physicians have such skills, tools such as Google Cloud Platform (Google Inc., Mountain View, CA; available at http://cloud.google.com/vision/, accessed 13 Feb 2022) now allow AI models to be built without programming expertise, and the application of AI models in the medical field is therefore likely to expand [8,9]. Indeed, the use of such automated machine learning in testicular histopathology for infertility and in otolaryngology imaging has already been reported [10,11].

Typically, labeled training datasets are needed to develop AI models, but such datasets are not available for functional gastrointestinal diseases, which show no abnormalities even on endoscopy. However, incorporating additional information, such as the presence or absence of symptoms, into the training data may allow detection of minute colonic changes that human observers cannot detect. The purpose of this study was to determine whether AI image-analysis models built with Google Cloud AutoML Vision can differentiate colonoscopy images of different IBS subtypes from those of healthy subjects in real-world clinical practice.

Materials and methods

Ethics

The study protocol was approved by the Ethics Committee of Toyama University Hospital (approval No. R2021032). All methods were performed in accordance with the relevant guidelines and regulations as well as with the Declaration of Helsinki. The study design was accepted by the ethics committee on the condition that a document describing an opt-out policy, by which potential patients and/or their relatives could refuse to be included, was posted on the Toyama University Hospital website; the need for individual consent was waived by the ethics committee.

Patients and image datasets

For application to real-world patients with IBS, patients were identified not by Rome criteria but by disease names recorded for insurance purposes between January 2010 and December 2020. These names included "irritable bowel syndrome" (Group I), "constipated irritable bowel syndrome" (Group C), and "diarrhea irritable bowel syndrome" (Group D). Other diseases, such as colorectal cancer, inflammatory bowel disease, and eosinophilic gastroenteritis, were excluded on the basis of symptoms and histopathological examination. However, cases with nonspecific inflammatory cell infiltrates that did not meet those diagnostic criteria and that were being followed under the respective insurance disease names were included in the relevant group. In symptomatic patients, colonoscopy had been performed as part of a workup for changes in bowel habits (e.g., diarrhea); asymptomatic patients had undergone colonoscopy for colorectal cancer screening and comprised Group N.

Colonoscopy images were obtained from the endoscopy reporting system. Images had been taken by more than 10 trainees or specialists at a single institution with an Olympus CF-HQ290Z or PCF-H290Z colonoscope. Model accuracy was improved by rebuilding the model multiple times after excluding white-light images of the terminal ileum, images of rectal retroflexion and the anus, and narrow-band or dye-sprayed images. No biofilms were detected. Only images with Boston Bowel Preparation Scale (BBPS) [12] scores of 2 (minor residual staining, small fragments of stool and/or opaque liquid, with the mucosa of the colon segment well visualized) or 3 (entire mucosa of the colon segment well visualized, with no residual staining, small fragments of stool, or opaque liquid) were used. A total of 20 to 40 images were used per patient, with about five images from each segment (cecum, ascending colon, transverse colon, descending colon, sigmoid colon, and rectum). Groups N, I, C, and D comprised 88, 11, 12, and 12 patients, respectively, from whom 2,479, 382, 538, and 484 images, respectively, were used. Model accuracy generally increases with the number of patients, but a minimum of 100 images afforded a reasonable degree of accuracy; the number of patients and images was therefore considered sufficient to construct this model.

In this study, annotation and algorithm generation were performed with Google Cloud AutoML Vision on the Google Cloud Platform (GCP) (Google, Inc.). Four labels, corresponding to Groups N, I, C, and D, were defined in the training dataset (single-label classification). Three models were produced: one differentiating Groups I and N, one differentiating Groups N, C, and D, and one differentiating Groups C and D. This process was performed entirely by a single physician (HM).
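
For readers unfamiliar with the platform, the following is a minimal sketch of how labeled images might be organized into the CSV manifest that AutoML Vision accepts for single-label image classification. The bucket name, folder layout, and label strings are hypothetical illustrations, not the actual paths used in this study.

    # Minimal sketch (hypothetical paths and labels): build the CSV manifest that
    # AutoML Vision single-label image classification imports, one line per image
    # in the form  gs://bucket/path.jpg,label
    import csv
    from pathlib import Path

    GCS_PREFIX = "gs://example-endoscopy-bucket/images"    # hypothetical bucket
    LABELS = {"group_N": "Group_N", "group_I": "Group_I",
              "group_C": "Group_C", "group_D": "Group_D"}  # the four study labels

    with open("automl_import.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for folder, label in LABELS.items():
            for img in sorted(Path(folder).glob("*.jpg")):
                # Omitting the optional ML_USE column lets AutoML Vision assign
                # the 80/10/10 train/validation/test split automatically.
                writer.writerow([f"{GCS_PREFIX}/{folder}/{img.name}", label])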

Artificial neural network programming, training and external validation

The Google Cloud AutoML Vision platform automatically and randomly split the dataset into training (80%), validation (10%), and test (10%) images for the algorithm training process. Because the images in each split are independent, the held-out test set provides a validation of the model. A total of 16 nodes (2 hours) was used to train the algorithm. AutoML Vision reports performance metrics: positive predictive value (precision) and sensitivity (recall) at stated thresholds, and the area under the curve (AUC). For each model, we also generated a confusion matrix that cross-references the true labels against the labels predicted by the deep learning model [8]. Using the extracted binary diagnostic accuracy data, we created a contingency table (confusion matrix) and calculated specificity at a threshold of 0.5. The confusion matrix reports true positives, false positives, true negatives, and false negatives. The probability of a given label for each image is presented as a score between 0 and 1.
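
To illustrate how a trained model returns these per-image scores, the sketch below shows a typical prediction request with the google-cloud-automl Python client. The project, region, and model IDs are placeholders, and the code is only an assumption about how a deployed model could be queried; the study workflow itself was entirely code-free.

    # Sketch only: query a deployed AutoML Vision single-label model for one image.
    # PROJECT_ID and MODEL_ID are hypothetical placeholders.
    from google.cloud import automl_v1 as automl

    PROJECT_ID, REGION, MODEL_ID = "my-project", "us-central1", "ICN1234567890"

    client = automl.PredictionServiceClient()
    model_name = client.model_path(PROJECT_ID, REGION, MODEL_ID)

    with open("colon_image.jpg", "rb") as f:
        payload = automl.ExamplePayload(image=automl.Image(image_bytes=f.read()))

    # score_threshold mirrors the 0.5 confidence threshold used in this study.
    response = client.predict(name=model_name, payload=payload,
                              params={"score_threshold": "0.5"})
    for result in response.payload:
        print(result.display_name, round(result.classification.score, 3))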

Results

Group N vs. Group I

The first question we addressed was whether AI could distinguish patients with IBS from healthy subjects. IBS is classified into IBS-C, IBS-D, and mixed-type IBS, but the proportions of these subtypes within Group I were not determined. In Japan, ramosetron and linaclotide are covered by insurance for IBS-D and IBS-C, respectively; accordingly, Group D comprised patients prescribed ramosetron, Group C comprised patients prescribed linaclotide, and Group I comprised IBS patients prescribed neither. As training, validation, and test images, 1,969, 255, and 255 images, respectively, were used for Group N, and 304, 39, and 39 images, respectively, were used for Group I. Comparing Group N, in whom endoscopy showed no apparent abnormalities, with Group I, the average precision (positive predictive value), precision, and recall of the algorithm were 94.6%, 88.78%, and 88.78%, respectively, based on automated training and testing with the AI model (Fig 1). Precision-recall curves were generated for each individual label as well as for the algorithm overall. We adopted a threshold value of 0.5 to yield balanced precision and recall. The AUC of the model discriminating Groups N and I and the confusion matrix are shown in Table 1. The total AUC was 0.95 (Group I AUC 0.48, Group N AUC 0.97), and the sensitivity, specificity, positive predictive value, and negative predictive value for detecting Group I were 30.8%, 97.6%, 66.7%, and 90.2%, respectively. The misclassification rates for Groups I and N were 69% and 2%, respectively. Representative endoscopy images from patients with high IBS scores (Fig 2) and high normal scores (Fig 3) are shown.

Fig 1.

Fig 1

(A) The precision of the Groups I and N colon image AI model is plotted as a function of recall. The blue shaded area represents the area under the curve (AUC), and the blue dot indicates the value in the case with a reliability threshold of 0.5. (B) The intersection of the recall (blue line) and the precision (red line) is shown. The blue and red dots indicate the values when the reliability threshold is set to 0.5.

Table 1. Groups I and N model of the colon; AUC and confusion matrix.

The confusion matrix shows how often each label was correctly classified by the model (agreement between predicted and true labels) and which labels it was confused with (disagreement between predicted and true labels).

AUC: All labels 0.95; Group I 0.48; Group N 0.97

Confusion matrix (rows: true label; columns: predicted label):
               Predicted Group I   Predicted Group N
True Group I         31%                 69%
True Group N          2%                 98%
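
These percentages are consistent with the held-out test-set sizes (39 Group I and 255 Group N images). The counts below are inferred from those sizes and the rounded percentages in Table 1; they are an approximate reconstruction, not figures reported by the authors, shown only to make the metric definitions explicit.

    # Approximate reconstruction of Table 1 at the 0.5 threshold.
    # Counts are inferred (39 Group I and 255 Group N test images), not reported.
    tp, fn = 12, 27          # ~31% / 69% of 39 Group I images
    tn, fp = 249, 6          # ~98% / 2% of 255 Group N images

    sensitivity = tp / (tp + fn)   # 0.308 -> 30.8%
    specificity = tn / (tn + fp)   # 0.976 -> 97.6%
    ppv = tp / (tp + fp)           # 0.667 -> 66.7%
    npv = tn / (tn + fn)           # 0.902 -> 90.2%
    print(sensitivity, specificity, ppv, npv)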

Fig 2. Images that scored relatively high (on a scale of 0 to 1) in the colon image AI model for detecting Group I are shown.

Fig 2

Values shown correspond to score.

Fig 3. Images that scored relatively high (on a scale of 0 to 1) in the colon image AI model for detecting Group N are shown.

Fig 3

Values shown correspond to score.

Group N vs. Groups C and D

Next, images from Group N were compared with endoscopy images from patients in Groups C and D. As training, validation, and test images, 419, 51, and 53 images, respectively, were used for Group C, and 387, 48, and 49 images, respectively, were used for Group D. For these groups, the average precision (positive predictive value), precision, and recall of the algorithm were 83.2%, 77.71%, and 67.97%, respectively, based on automated training and testing (Fig 4). The precision-recall curves and the threshold value were set as described above. The AUC of the model discriminating among the groups and the confusion matrix are shown in Table 2. The total AUC was 0.83 (0.90 for Group N, 0.45 for Group C, and 0.60 for Group D), and the sensitivity, specificity, and positive predictive value for Group N were 87.5%, 46.2%, and 79.9%, respectively. The misclassification rates for Groups N, D, and C were 12%, 51%, and 66%, respectively.

Fig 4.

Fig 4

(A) The precision of the Groups N, C and D colon image AI model is plotted as a function of recall. The blue shaded area represents the area under the curve (AUC), and the blue dot indicates the value in the case with a reliability threshold of 0.5. (B) The intersection of the recall (blue line) and the precision (red line) is shown. The blue and red dots indicate the values when the reliability threshold is set to 0.5.

Table 2. Groups N, C and D model of the colon; AUC and confusion matrix.

AUC: All labels 0.83; Group N 0.90; Group C 0.45; Group D 0.60

Confusion matrix (rows: true label; columns: predicted label):
               Predicted Group N   Predicted Group C   Predicted Group D
True Group N         87%                  5%                  7%
True Group C         62%                 35%                  4%
True Group D         45%                  6%                 49%
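
For this three-class model, the reported Group N specificity and positive predictive value are one-vs-rest quantities. The sketch below reconstructs them from the rounded percentages in Table 2, assuming the Group N test split again contained 255 images; as before, the counts are inferred approximations, not reported figures.

    # One-vs-rest reconstruction for Group N (counts inferred from Table 2).
    n_N, n_C, n_D = 255, 53, 49                 # assumed held-out test images per group

    tp = round(0.875 * n_N)                     # Group N images predicted N (~223)
    fp = round(0.62 * n_C) + round(0.45 * n_D)  # C and D images predicted N (~33 + ~22)
    fn = n_N - tp                               # Group N images predicted C or D
    tn = (n_C + n_D) - fp                       # C and D images not predicted N

    sensitivity = tp / (tp + fn)                # ~0.875
    specificity = tn / (tn + fp)                # ~0.46
    ppv = tp / (tp + fp)                        # ~0.80
    print(sensitivity, specificity, ppv)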

Group C vs. Group D

In comparing Groups C and D, the average precision (positive predictive value), precision, and recall of the algorithm were 89.75%, 87.5%, and 87.5%, respectively, based on automated training and testing (Fig 5). The precision-recall curves and the threshold value were set as described above. The AUC of the model discriminating between the groups and the confusion matrix are shown in Table 3. The total AUC was 0.90 (0.87 for Group C and 0.94 for Group D). The misclassification rates for Groups D and C were 18% and 7%, respectively. Representative endoscopy images from patients with high IBS-D scores (Fig 6) and high IBS-C scores (Fig 7) are shown.

Fig 5.

Fig 5

(A) The precision of the Groups C and D colon image AI model is plotted as a function of recall. The blue shaded area represents the area under the curve (AUC), and the blue dot indicates the value in the case with a reliability threshold of 0.5. (B) The intersection of the recall (blue line) and the precision (red line) is shown. The blue and red dots indicate the values when the reliability threshold is set to 0.5.

Table 3. Groups C and D model of the colon; AUC and confusion matrix.

AUC: All labels 0.90; Group C 0.87; Group D 0.94

Confusion matrix (rows: true label; columns: predicted label):
               Predicted Group C   Predicted Group D
True Group C         93%                  7%
True Group D         18%                 82%

Fig 6. Images that scored high (on a scale of 0 to 1) in the colon image AI model for detecting Group D are shown.

Fig 6

Values shown correspond to score.

Fig 7. Images that scored high (on a scale of 0 to 1) in the colon image AI model for detecting Group C are shown.

Fig 7

Values shown correspond to score.

Group N vs. Group I + Group C + Group D

Finally, Groups I, C, and D were merged into a single group to determine whether they could be distinguished from Group N using the same images. The average precision (positive predictive value), precision, and recall of the algorithm were 81.2%, 72.6%, and 72.6%, respectively.

Discussion

Endoscopic images of patients with IBS typically show no abnormalities. In this study, we investigated whether AI can detect minute endoscopic changes, associated with microinflammation and other microenvironmental changes, that cannot be easily detected by human observers. We constructed a code-free AI model that detected Group I with an AUC of 0.95 and high specificity relative to Group N. When the model was re-created with Groups C and D labeled separately, the AUC was somewhat lower at 0.83, suggesting that Groups C and D may be distinguishable from each other. Indeed, when an AI model was built to distinguish only these two groups, the AUC was 0.90, higher than that achieved for distinguishing these groups from healthy subjects in the three-class model. The model for Group N vs. Groups I + C + D was slightly less accurate than that for Group N vs. Group I, suggesting that Groups I, C, and D may represent different populations in terms of their images. To the best of our knowledge, this is the first AI model that can detect IBS in endoscopic images. Further investigation is needed to determine whether AI can differentially detect histological abnormalities, the presence of biofilm, and/or deformation of the colorectal lumen. Because the model was not constructed segment by segment, whether a particular segment contributes more or less to the diagnosis remains unclear; here we assumed that images from any segment were appropriate for calculating the IBS score. The advantage of this approach is that the model returns an IBS score independently of segment. Preliminary results suggested that the greatest differences occurred in the sigmoid colon, but further detailed studies are needed before conclusions about the diagnostic value of individual segments can be drawn.

This model has several limitations. First, IBS diagnosis was defined not by Rome criteria but by disease names recorded for insurance purposes. However, Rome IV criteria are not always applied in clinical practice, and the model was deliberately designed for AI use under real-world clinical conditions. Second, patients in Groups C and D may have been treated with linaclotide and ramosetron, respectively. Third, age, gender, and treatment response were not taken into account when selecting images for the training dataset. Fourth, the minimum dataset size accepted by GCP AutoML Vision is 100 images, but ideally more than 1,000 images would be required. Furthermore, because the training, validation, and test images came from the same patient population, even though distinct images were used, the results require validation in an independent patient cohort. In addition, GCP AutoML Vision does not expose training progress in the form of loss-versus-epoch or accuracy-versus-epoch plots, making it difficult to confirm that the model converged along an optimal path. Last, the study groups included cases before and after treatment and cases with and without treatment response.

The prime advantage of Google Cloud AutoML Vision is that it requires no coding expertise and can easily be applied to datasets to build AI models. The code-free deep learning approach used in this study has the potential to improve clinicians' access to deep learning [8,9]. Other research groups have already reported medical image classification and otolaryngology diagnosis performed with an automated, code-free deep learning approach [10,11]. Many deep convolutional neural network (CNN) architectures exist [13]. In 2014, GoogLeNet, the predecessor of GCP AutoML Vision, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an international image-recognition competition, achieving high accuracy at low computational cost [14]. Later, building on neural architecture search, which optimizes the network architecture itself, GCP AutoML enabled high-quality image classification models to be generated even by users with no machine learning expertise. For radiological images, ResNet, which enables construction of deep neural networks with over 1,000 layers, yielded better results than GoogLeNet [15]. Whether Microsoft Azure, which is based on ResNet, is better than GCP AutoML Vision for building code-free endoscopic image AI models is a subject for future study.
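
For comparison with the code-free approach, the sketch below illustrates what a conventional, code-based alternative might look like: fine-tuning an ImageNet-pretrained ResNet-50 with torchvision for a two-label (Group N vs. Group I) task. The folder layout, hyperparameters, and training schedule are hypothetical, and this is an illustrative sketch rather than a pipeline used in this study.

    # Illustrative code-based alternative: fine-tune a pretrained ResNet-50 for
    # Group N vs. Group I classification (hypothetical folders and settings).
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    # Expects subfolders colon_images/train/group_N and colon_images/train/group_I
    train_ds = datasets.ImageFolder("colon_images/train", transform=tfm)
    train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model.fc = nn.Linear(model.fc.in_features, 2)   # two labels: Group N, Group I
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(5):                          # arbitrary small epoch count
        for images, labels in train_dl:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Unlike AutoML Vision, per-epoch loss can be inspected here.
        print(f"epoch {epoch}: loss {loss.item():.3f}")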

Meanwhile, the endoscopic features that the artificial neural network (ANN) classifier uses to distinguish patients with IBS from healthy controls are unclear, as the current model is essentially a black box. However, explainable AI methods are becoming available [16], and by adding a function that displays the image regions that contributed most to the score, it should be possible to determine which endoscopic features are characteristic of IBS.
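
One common way to add such a function is Grad-CAM, which highlights the regions of an image that most influenced a CNN's class score. The sketch below is a minimal Grad-CAM implementation written against the hypothetical ResNet-50 classifier sketched above; it is an assumption about how explainability could be added, not a feature of the AutoML Vision model used in this study.

    # Minimal Grad-CAM sketch: heatmap of the regions driving the "Group I" score.
    import torch
    import torch.nn.functional as F

    def grad_cam(model, image, target_class, target_layer):
        activations, gradients = [], []
        h1 = target_layer.register_forward_hook(lambda m, i, o: activations.append(o))
        h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))

        model.eval()
        score = model(image.unsqueeze(0))[0, target_class]   # image: 3x224x224 tensor
        model.zero_grad()
        score.backward()
        h1.remove(); h2.remove()

        acts, grads = activations[0].squeeze(0), gradients[0].squeeze(0)
        weights = grads.mean(dim=(1, 2))                      # pool gradients per channel
        cam = F.relu((weights[:, None, None] * acts).sum(dim=0))
        cam = cam / (cam.max() + 1e-8)                        # normalize to [0, 1]
        # Upsample to the input size so the map can be overlaid on the endoscopic image.
        return F.interpolate(cam[None, None], size=image.shape[1:], mode="bilinear")[0, 0]

    # Example (hypothetical): heatmap = grad_cam(model, image_tensor, 1, model.layer4[-1])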

We have confirmed that AI-based algorithms are also suited to symptom-based diagnosis. Such algorithms may be able to detect differences in endoscopic images in other functional gastrointestinal disorders, such as functional dyspepsia and non-erosive gastroesophageal reflux disease. The accuracy of the AI model for IBS is expected to vary with the endoscope model and light-source settings, and whether the same accuracy can be achieved at other facilities should be explored. In summary, we have described the development of an AI model that required no coding experience and that can differentiate Groups I, C, and D on the basis of colonoscopy images. Constructing AI models based on the presence or absence of symptoms could be a new way to diagnose functional gastrointestinal diseases.

Acknowledgments

We thank Ayaka Maeda, Masaya Hiraki, Shun Kuraishi, and Kenji Ogawa, medical engineering technicians at the Medical Device Management Center, University of Toyama Hospital, for their support in collecting and organizing the images.

A summary of this study was presented at the 23rd Annual Meeting of the Japanese Society of Neurogastroenterology.

Data Availability

All image datasets used in this study can be found at https://doi.org/10.5061/dryad.9s4mw6mkp. Several images were removed from the repository as the data contain potentially sensitive information such as a partial patient ID in the image.

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Enck P, Aziz Q, Barbara G, Farmer AD, Fukudo S, Mayer EA, et al. Irritable bowel syndrome. Nat Rev Dis Primers. 2016;2:16014. doi: 10.1038/nrdp.2016.14 ; PubMed Central PMCID: PMC5001845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Mearin F, Lacy BE, Chang L, Chey WD, Lembo AJ, Simren M, et al. Bowel Disorders. Gastroenterology. 2016. doi: 10.1053/j.gastro.2016.02.031 . [DOI] [PubMed] [Google Scholar]
  • 3.Banwell JG, Howard R, Cooper D, Costerton JW. Intestinal microbial flora after feeding phytohemagglutinin lectins (Phaseolus vulgaris) to rats. Appl Environ Microbiol. 1985;50(1):68–80. doi: 10.1128/aem.50.1.68-80.1985 ; PubMed Central PMCID: PMC238575. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Baumgart DC, Carding SR. Inflammatory bowel disease: cause and immunobiology. Lancet. 2007;369(9573):1627–40. doi: 10.1016/S0140-6736(07)60750-8 . [DOI] [PubMed] [Google Scholar]
  • 5.Maier L, Pruteanu M, Kuhn M, Zeller G, Telzerow A, Anderson EE, et al. Extensive impact of non-antibiotic drugs on human gut bacteria. Nature. 2018;555(7698):623–8. doi: 10.1038/nature25979 ; PubMed Central PMCID: PMC6108420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Baumgartner M, Lang M, Holley H, Crepaz D, Hausmann B, Pjevac P, et al. Mucosal Biofilms Are an Endoscopic Feature of Irritable Bowel Syndrome and Ulcerative Colitis. Gastroenterology. 2021;161(4):1245–56 e20. doi: 10.1053/j.gastro.2021.06.024 ; PubMed Central PMCID: PMC8527885. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Kudo SE, Misawa M, Mori Y, Hotta K, Ohtsuka K, Ikematsu H, et al. Artificial Intelligence-assisted System Improves Endoscopic Identification of Colorectal Neoplasms. Clin Gastroenterol Hepatol. 2020;18(8):1874–81 e2. doi: 10.1016/j.cgh.2019.09.009 . [DOI] [PubMed] [Google Scholar]
  • 8.Faes L, Wagner SK, Fu DJ, Liu X, Korot E, Ledsam JR, et al. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. Lancet Digit Health. 2019;1(5):e232–e42. doi: 10.1016/S2589-7500(19)30108-6 . [DOI] [PubMed] [Google Scholar]
  • 9.Korot E, et al. Code-free deep learning for multi-modality medical image classification. Nat Mach Intell. 2021. [Google Scholar]
  • 10.Ito Y, Unagami M, Yamabe F, Mitsui Y, Nakajima K, Nagao K, et al. A method for utilizing automated machine learning for histopathological classification of testis based on Johnsen scores. Sci Rep. 2021;11(1):9962. doi: 10.1038/s41598-021-89369-z ; PubMed Central PMCID: PMC8107178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Livingstone D, Chau J. Otoscopic diagnosis using computer vision: An automated machine learning approach. Laryngoscope. 2020;130(6):1408–13. doi: 10.1002/lary.28292 . [DOI] [PubMed] [Google Scholar]
  • 12.Lai EJ, Calderwood AH, Doros G, Fix OK, Jacobson BC. The Boston bowel preparation scale: a valid and reliable instrument for colonoscopy-oriented research. Gastrointest Endosc. 2009;69(3 Pt 2):620–5. doi: 10.1016/j.gie.2008.05.057 ; PubMed Central PMCID: PMC2763922. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Khan A, Sohail A, Zahoora U, Qureshi AS. A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review. 2020;53(8):5455–516. doi: 10.1007/s10462-020-09825-6 [DOI] [Google Scholar]
  • 14.Ioffe S, Szegedy C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: Francis B, David B, editors. Proceedings of the 32nd International Conference on Machine Learning; Proceedings of Machine Learning Research: PMLR; 2015. p. 448–56. [Google Scholar]
  • 15.Ananda A, Ngan KH, Karabag C, Ter-Sarkisov A, Alonso E, Reyes-Aldasoro CC. Classification and Visualisation of Normal and Abnormal Radiographs; A Comparison between Eleven Convolutional Neural Network Architectures. Sensors (Basel). 2021;21(16). doi: 10.3390/s21165381 ; PubMed Central PMCID: PMC8400172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Quellec G, Al Hajj H, Lamard M, Conze PH, Massin P, Cochener B. ExplAIn: Explanatory artificial intelligence for diabetic retinopathy diagnosis. Med Image Anal. 2021;72:102118. doi: 10.1016/j.media.2021.102118 . [DOI] [PubMed] [Google Scholar]
PLOS Digit Health. doi: 10.1371/journal.pdig.0000058.r001

Decision Letter 0

Benjamin P Geisler, Alexander Wong

3 Aug 2022

PDIG-D-22-00137

Artificial Intelligence Model for Analyzing Colonic Endoscopy Images to Detect Changes Associated with Irritable Bowel Syndrome

PLOS Digital Health

Dear Dr. Mihara,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days Oct 02 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Benjamin P. Geisler, M.D., M.P.H., F.A.C.P., M.R.C.P. (London), F.H.M.

Academic Editor

PLOS Digital Health

Journal Requirements:

1. In the online submission form, you indicated that “Data cannot be shared for confidentiality reasons. Queries about the data should be directed to the corresponding author.”. All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons by return email and your exemption request will be escalated to the editor for approval. Your exemption request will be handled independently and will not hold up the peer review process, but will need to be resolved should your manuscript be accepted for publication. One of the Editorial team will then be in touch if there are any issues.

2. Please provide separate figure files in .tif or .eps format and remove any figures embedded in your manuscript file. Please also ensure that all files are under our size limit of 10MB.

For more information about how to convert your figure files please see our guidelines: https://journals.plos.org/digitalhealth/s/figures

3. Please ensure that you refer to Table 3 in your text as, if accepted, production will need this reference to link the reader to the table.

4. All figures and supporting information files will be published under the Creative Commons Attribution License (creativecommons.org/licenses/by/4.0/). Authors retain ownership of the copyright for their article and are responsible for third-party content used in the article.

Figures 2, 3, and 6: Please confirm (a) that you are the photographer; or (b) provide written permission from the photographer to publish the photo(s) under our CC-BY 4.0 license.

Please upload any written confirmation as an 'Other' file type. It must clarify that the copyright holder understands and agrees to the terms of the CC BY 4.0 license; general permission forms that do not specify permission to publish under the CC BY 4.0 will not be accepted. Note that uploading an email confirmation is acceptable.

Additional Editor Comments (if provided):

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: Yes

--------------------

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

--------------------

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

--------------------

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

--------------------

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors’ purpose was to determine whether image AI models can differentiate between different types of irritable bowel syndrome (IBS) and healthy colonoscopic images using Google cloud AutoML Vision.

I have some comments to make on their research.

1. Page 6, line 90-92: I think that that the text and references do not match. The authors should check this.

2. Page 7, line 98: The content of the reference and the text appear not to match.

3. Page 7, line 98-100: The content of the references clearly does not match the text.

4. Page 8, line120-122: The authors mention some groups of patients. However, they do not give any background information for them. The authors should state such background factors as gender, age, history of illness, and treatment history. In addition, I think that the total numbers of patients are not sufficient to make an AI model.

Reviewer #2: In this retrospective study from Japan, Mihara et al used an artificial neural network (ANN) AI classifiers to analyze and model endoscopic images from colonoscopies to differentiate between patients with various irritable bowel syndrome/IBS subtypes (IBS, IBS-D, IBS-C) versus healthy controls with relatively good performance with an area under the curve (AUC) ranging from 0.85-0.93. Objective markers of diagnosing IBS are much needed. This is an interesting study and addresses an important topic. I have the following critiques and recommendations:

1. Abstract, Background/Aims: "IBS is not an organic disease, and the patients typically show no abnormalities in lower gastrointestinal endoscopy." The authors are correct that IBS patients usually have normal colonoscopies (endoscopic and histologic). Since most colonoscopies are normal in patients with IBS, the authors should clarify why they chose to review colonoscopy images by AI modeling.

2. Abstract, Background/Aims: "Recently, biofilm formation has been visualized by endoscopy, and the ability of endoscopy to detect microscopic changes due to dysbiosis and microinflammation has been reported." If this is the case, why did the authors only look at patients with IBS without biofilms? Why mention biofilms if their study doesn't even measure this?

3. Methods: "These names included "Irritable bowel syndrome (group I)." The authors should clarify what type of IBS patients were included in Group 1 and how they differed from the other groups with constipation-predominant (IBS-C) and diarrhea-predominant (IBS-D).

4. Methods: Among patients with IBS-D, were other etiologies of chronic diarrhea ruled out (e.g. inflammatory bowel disease, microscopic colitis, small intestinal bacterial overgrowth)?

5. Methods: The authors should clarify the rationale for colonoscopy in the included patients? Was this screening colonoscopy for colon cancer? Was colonoscopy done as part of workup of altered bowel habits (e.g. diarrhea)?

6. Results: The authors should include a Table 1 with baseline clinical characteristics of included patients and healthy controls include patient age, sex, medications, etc.

7. Methods: The authors should expand the details of the AI classifier they used for the study. Was was ANN used versus other AI classifiers? What are the benefits of this strategy when analyzing endoscopic images?

8. Methods/Results: For the Figures, the author state "High-scoring images in patients with IBS (score 0-1)." What do these scores refer to? How was this derived and what do they mean clinically?

9. Methods/Results: What endoscopic features (e.g. alterations in vascular patterns, mucosal folds, colonic mucosa patterns or color, presence or scarring or erythema) did the artificial neural network (ANN) AI classifier use to differentiate/model patients with IBS vs healthy controls?

10. Methods/Results: Older patients with chronic constipation often have diverticulosis across the colon. How did the AAN AI classifier account for diverticulosis in the predictive model? Did diverticulosis predict IBS-C? Similarly, patients who use laxatives sometimes have discoloration of the colon (melanosis coli). How did the AAN AI classifier model this finding?

11. Methods/Results: Were biopsies obtained from the included patient and histologic assessment performed to rule out microscopic colitis? Sometimes during colonoscopy, colonic mucosa appears endoscopically normally, but has inflammation at the histologic level.

12. Methods/Results: Different segments of the colon (right colon, transverse, rectum, etc) have different endoscopic features. For all the included patients, the authors should clarify how many endoscopic images were obtained for each segment and clarify whether images from specific segments were more diagnostic/contributed more to AI-derived model to differentiate IBS patients from healthy controls.

13. Methods/Results: The analysis could be more robust and not only differentiate/classify IBS vs non-IBS patients, but also should correlate the endoscopic images with validated IBS severity scores (e.g. Birmingham IBS Symptom Questionnaire , IBS Symptom Severity Scale, etc).

14. Methods/Discussion: The authors should state that the small sample size and lack of validation of the ANN AI model in an independent cohort of patients are additional limitations of the study.

15. Discussion: The authors state "Endoscopic images of IBS typically show no abnormalities, but we investigated whether AI could detect microinflammation that cannot be easily detected by human observers." How was microinflammation defined and assessed from colonoscopy endosocopic images in the included cohort of patients?

Reviewer #3: The authors applied the Google cloud platform AutoML Vision to build a model to differentiate between patients with IBS and asymptomatic healthy controls based on colonoscopy images. The validation cohort was a subset of the training set and the model could differentiate asymptomatic controls and IBS patients with a sensitivity of 30 % and specificity of 97 %.

Before publication some major points need to be clarified:

- The “group I” comprised of “IBS” patients while “group C” and “D” comprised of IBS-C and IBS-D patients. While IBS-D and IBS-C are clear subgroups, it is unclear, what defines group “I”. Also the diagnostic algorithm for IBS diagnosis is not well defined. Rome criteria should be applied. For the sake of clinical relevance we recommend combining IBS-M and IBS-D and excluding IBS-C from such analysis. Also a larger dataset (>100 patients and controls) would reduce overfitting.

- The authors show colonoscopy pictures of high scoring IBS and control patients (Figure 2 and 3). There is no evident visual difference between those pictures, but authors suggest that a machine learning model might pick up differences in i.e. vasculation pattern. In fact endoscopically visible biofilms as mentioned in their introduction are obviously not present in the representative images. Were any biofilms detectable in this study cohort ? Were images with incomplete bowel cleansing removed from the data set ? Then they likely have remove obvious biofilm patients from their data set.

-Machine learning is highly sensitive to bias. Authors need to improve on cohort characterization; were all patients scoped with the same model of endoscope. Where patients from each cohort scoped in the same hospital/ by the same endoscopist?

Minor points:

-Abstract, Background and Aims: authors start with “IBS is not an organic disease...” but continue to elaborate on biofilm formation, dysbiosis and microinflammation. This mismatch needs to be elaborated.

Reviewer #4: The authors made a image recognition model using Google AutoML vision. They successfully distinguished healthy vs IBD samples.

The approach has merits and significant advantage in understanding the pathophysiology of IBD.

However, the authors are using the Google AutoML tool as a black box. The training and validation statistics are limited to the final ROC or contingency tables.

I think the authors should reveal how training improved the models in different iterations. They should clearly separate training and validation datasets with the number of samples.

The tables with AUC should also reveal number of samples used.

The authors didn't reveal how the images were scored and how they were ranked.

--------------------

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: John Gubatan, MD

Reviewer #3: No

Reviewer #4: Yes: Debashis Sahoo

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000058.r003

Decision Letter 1

Benjamin P Geisler, Alexander Wong

24 Oct 2022

PDIG-D-22-00137R1

Artificial Intelligence Model for Analyzing Colonic Endoscopy Images to Detect Changes Associated with Irritable Bowel Syndrome

PLOS Digital Health

Dear Dr. Mihara,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please perform an additional analysis of a combined group IBS patients vs. controls, as suggested by one of the reviewers during the first revision cycle. Please see reviewer #3's comment regarding this below.

Please submit your revised manuscript within 60 days Dec 23 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Benjamin P. Geisler, M.D., M.P.H., F.A.C.P., M.R.C.P. (London), F.H.M.

Academic Editor

PLOS Digital Health

Journal Requirements:

Additional Editor Comments (if provided):

Please perform an additional analysis of a combined group IBS patients vs. controls, as suggested by one of the reviewers during the first revision cycle.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

Reviewer #4: (No Response)

--------------------

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

--------------------

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

--------------------

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

Reviewer #4: Yes

--------------------

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

--------------------

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This revised paper is well written and acceptable to be published in PLOS Digithal Health.

Reviewer #2: The authors have addressed my critiques to the best of their available datasets.

Reviewer #3: The authors did not perform the requested analysis of all IBS patients combined against controls. Splitting IBS patients into small ill-defined groups and comparing against a bigger cohort increases the possibility of overfitting. In terms of study design neither image location, endoscopic equipment nor disease classification was standardized. Furthermore, the authors could not specify relevant changes such as endoscopically visible biofilms in their images. An independent control cohort, which underlines clinical applicability of the model is lacking. It is thus more likely that the model is able to detec a signal for individual patients instead of underlying features of IBS.

Reviewer #4: The authors have not addressed my question of providing iterative improvements of the ML models during training phase. This data is crucial because it will show if the models are converging and achieving optimal path or no change in improvements at all. This is usually done by showing a plot of Loss vs Epochs or Accuracy vs Epochs.

--------------------

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: John Gubatan

Reviewer #3: No

Reviewer #4: Yes: Debashis Sahoo

--------------------

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLOS Digit Health. doi: 10.1371/journal.pdig.0000058.r005

Decision Letter 2

Benjamin P Geisler, Alexander Wong, Bo Wang

12 Jan 2023

Artificial Intelligence Model for Analyzing Colonic Endoscopy Images to Detect Changes Associated with Irritable Bowel Syndrome

PDIG-D-22-00137R2

Dear Hiroshi Mihara,

We are pleased to inform you that your manuscript 'Artificial Intelligence Model for Analyzing Colonic Endoscopy Images to Detect Changes Associated with Irritable Bowel Syndrome' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Benjamin P. Geisler, M.D., M.P.H., F.A.C.P., M.R.C.P. (London), F.H.M.

Academic Editor

PLOS Digital Health

***********************************************************

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #4: All comments have been addressed

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #2: Yes

Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #4: All concerns are addressed.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: John Gubatan, MD

Reviewer #4: Yes: Debashis Sahoo

**********

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx


