Abstract
Introduction
Successful application of artificial intelligence (AI) in endoscopy requires effective image processing. Yet, the plethora of sources for endoscopic images, such as different processor-endoscope combinations or capsule endoscopy devices, results in images that vastly differ in appearance. These differences hinder the generalizability of AI models in endoscopy.
Methods
We developed an AI-based method for extracting the camera signal from raw endoscopic images in a source-agnostic manner. Additionally, we created a diverse dataset of standardized endoscopic images, named the Endoscopic Processor Image Collection (EPIC), from 4 different endoscopy centers. The included data were recorded using 9 different processors from 4 manufacturers with 45 endoscopes. Furthermore, images recorded with 4 capsule endoscopy devices from 2 manufacturers are included. We evaluated the camera signal extraction method using 641 manually annotated images from 5 different publicly available endoscopic image datasets, as well as on the EPIC dataset. Results were compared with a published baseline in terms of intersection over union (IoU) and Hausdorff distance (HD).
Results
In segmenting the camera signal on images from public datasets, our method achieved a mean IoU of 0.97, which was significantly higher than that of the baseline method, and a mean HD of 21 pixels, which was significantly lower than that of the baseline. On the standardized images of the EPIC dataset, there was no significant difference in IoU, but our method achieved a significantly lower HD. Both the developed AI-based method and the generated dataset are made publicly available.
Conclusion
This work introduces an AI-based method that effectively segments the endoscope camera signal from the raw endoscopic data in a source-agnostic way. Utilizing the proposed method as a preprocessing step allows existing AI models to use any endoscopic image, independent of its source, without compromising performance. Additionally, EPIC, a dataset of diverse endoscopic images, is generated. The proposed method, trained AI model weights, and the EPIC dataset are made publicly available.
Introduction
In recent years, artificial intelligence (AI) has entered the medical field, finding direct involvement in patient care [1,2] and showing the potential to achieve human-like performance [3]. This is particularly true for endoscopy, as the image-based nature of the examination can benefit from AI-based computer vision methods [4]. In colonoscopy, multiple AI-based systems attempt to enhance the physician’s ability to detect adenomas during the examination [5–8], with several commercially available computer-aided detection (CADe) systems being developed [9–13] and introduced into clinical routine [14]. Further applications of AI in endoscopy include polyp size estimation [15], characterization of colorectal polyps [16], automatic report generation for colonoscopy examinations [17], resection planning of gastrointestinal neoplasia [18], recognition of eosinophilic esophagitis [19], management of patients suffering from Crohn’s disease [20] or hiatal hernias [21], and assessment of the gastroesophageal junction [22].
Raw data captured during the endoscopic examination is suboptimal for use with AI, as the recorded interface contains the camera signal along with borders in which patient- and examination-related information is displayed. The presence of extensive additional information, combined with the variability of the camera signal’s shape and location in the raw data, compromises AI performance and increases the required computational power. As a result, existing AI models cannot perform adequately “out of the box” with data from different sources. Furthermore, using uncropped images introduces model bias that diminishes model performance, as the model focuses on the borders and shape of the camera signal [23]. On the other hand, extracting only the camera signal can significantly improve model generalizability, as it vastly decreases the aforementioned discrepancies in model inputs. Such a method is also beneficial for cloud computing [24], as reducing the amount of transmitted data can reduce communication delays.
Several studies evaluating AI-based systems used only preset dimensions for extracting the camera signal [5,9]. This approach might be effective on a small scale, yet it becomes limiting for multi-center studies, which are required for proper evaluation of AI models [25]. The lack of standardized image pre-processing and the need for manual endoscopic image cropping are acknowledged by Sierra-Jerez and colleagues in [26] as important obstacles in developing AI models for colonoscopy.
Image processing methods that can enhance the performance of computer vision methods have been investigated [27]. Such methods include undistortion of endoscopic images [28], camera calibration [29], and treatment of light artefacts and reflections [30]. Furthermore, extraction of the camera signal location has been studied extensively for the case where the camera signal is of circular shape, which is common in endoscopic surgery [31–33]. For data from gastroscopy and colonoscopy, the circularity assumption usually does not hold, as the image signal is displayed in shapes ranging from polygonal to elliptical. Mathematical methods for extracting the camera signal in these cases have been investigated [34,35]. Yet, such methods lack flexibility and thus fail to adapt to the ever-increasing diversity of raw data. AI-based cropping of endoscopic images was discussed in [23] as a method to reduce model bias. However, the AI presented there was not evaluated on different sources of endoscopic images and was not made publicly available.
The main aim of this work is the development and evaluation of an AI-based method that extracts the camera signal from the raw data in a source-agnostic manner, and to make the method and AI model weights publicly available. Such a method can support the use of AI models developed for endoscopy by unifying input data and limiting their variability without loss of relevant information. The proposed method is evaluated using manually annotated endoscopic images from a variety of publicly available datasets. Additional evaluation is performed using a unique new dataset of standardized endoscopic images, called the Endoscopic Processor Image Collection (EPIC), which is introduced in this work. The EPIC dataset contains images recorded with a variety of sources, such as different processor-endoscope combinations and capsule endoscopy devices. Furthermore, the dataset contains manually annotated binary masks indicating the location of the camera signal. An image processing method for performing camera signal segmentation is used as a baseline for comparison. The proposed method, AI model weights, and the EPIC dataset are made publicly available to facilitate widespread adoption and further research in AI for endoscopy.
Materials and methods
Endoscopic camera signal extraction pipeline
The proposed method takes any endoscopic image as input and predicts a binary mask that segments the camera signal in the input. To reduce any prediction noise, only the largest connected component of the prediction is considered. Based on this, a minimum dimension sub-image containing the camera signal is extracted. In this work, picture-in-picture mode (PiP) is considered part of the camera signal.
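To make the cropping step concrete, the following is a minimal sketch of the post-processing described above, assuming the segmentation model outputs a NumPy binary mask; the function name and the use of scipy.ndimage are illustrative and do not reproduce the authors’ exact implementation.

```python
import numpy as np
from scipy import ndimage

def extract_camera_signal(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the largest connected component of the predicted binary mask and
    crop the image to the minimal rectangle that contains it."""
    labels, n = ndimage.label(mask > 0)          # connected components of the prediction
    if n == 0:
        return image                             # no camera signal detected
    sizes = ndimage.sum(mask > 0, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)   # suppress prediction noise
    rows, cols = np.nonzero(largest)
    r0, r1 = rows.min(), rows.max()
    c0, c1 = cols.min(), cols.max()
    return image[r0:r1 + 1, c0:c1 + 1]           # minimum sub-image containing the signal
```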
The AI utilizes the U-Net architecture [36] with an EfficientNet [37] backbone. Training data for the model comprised 1,765 manually annotated endoscopic images extracted from examination videos recorded between January 15th, 2019, and January 31st, 2022, in four different endoscopy centers. Gold standard binary semantic segmentation masks were manually created for each image to delineate the camera signal, where a value of 1 indicated that a pixel belongs to the camera signal and 0 otherwise. The training data included a range of image quality levels, from clear, high-quality images to low-quality, blurry images. This diverse set of indicative images reflects the varied conditions encountered during endoscopy. The training images were pseudonymized according to standard practice for patient data and contained no patient-identifying information. An in-depth presentation of the model selection process and training details is given in S1 File.
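For illustration, a model with this architecture can be assembled with the segmentation_models_pytorch library; the specific EfficientNet variant, pre-training, and input size used in this work are described in S1 File, so the values below are assumptions made only to keep the sketch runnable.

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with an EfficientNet encoder; variant and weights are illustrative assumptions.
model = smp.Unet(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,    # RGB endoscopic frames
    classes=1,        # single foreground class: the camera signal
)

model.eval()
with torch.no_grad():
    frame = torch.rand(1, 3, 512, 512)                    # placeholder input frame
    mask = (torch.sigmoid(model(frame)) > 0.5).squeeze()  # binary camera-signal mask
```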
Generation of the “Endoscopic Processors Image Collection” dataset
The EPIC dataset contains raw endoscopic images captured across four German hospitals, with backgrounds of either white or a combination of blue and green hues. The background colors were selected to facilitate clear identification and separation of the camera signal from any borders. An example of the recording setup from one of the hospitals is shown in Fig 1. Such images were gathered for multiple combinations of endoscopic processors and endoscopes. Given that the screen aspect ratio can significantly influence the final appearance of these images, various aspect ratios (16:9, 4:3, and 5:4) were also used to collect representative examples. For applications involving capsule endoscopy or cholangioscopy, where acquiring new images was not always feasible, only existing images that matched the above description and contained no patient-identifying data were included in the dataset. In addition to the raw images, the EPIC dataset includes manually generated binary masks that precisely delineate the camera signal’s position on each image.
Fig 1. Example of one of the examination room setups used in obtaining images for the EPIC dataset.
Displayed are three different Pentax (PENTAX Europe GmbH, Hamburg, Germany) processor generations with endoscopes attached. On the examination table are the two different textures used for obtaining the images.
No center contributed images to both the EPIC dataset and the training of the proposed AI. This guarantees that the EPIC dataset remains distinct and valid for performance evaluation. For a detailed breakdown of the EPIC dataset’s contents, readers are referred to S1 Table for endoscopic processor data and S2 Table for capsule endoscopy data.
Method evaluation
The proposed method was evaluated on two different sets of endoscopic images. The first set consists of 641 images from five publicly available datasets [38–42], and the second set is the EPIC dataset. The gold standard was manually annotated for all images using binary masks, with a value of 1 for pixels belonging to the camera signal and 0 otherwise. The method was evaluated by means of intersection over union (IoU) and Hausdorff distance (HD). IoU is defined as the area of overlap between the predicted and gold standard masks divided by the area of their union. The HD, in the context of semantic segmentation, is defined as follows. Let $A$ and $B$ be the sets of foreground pixel coordinates of two binary masks. The one-sided HD from $A$ to $B$ is defined as $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$. This can be interpreted as the maximum distance from a point $a \in A$ to the closest point $b \in B$. The HD of $A$ and $B$ is defined as the maximum of the two one-sided HDs, that is, $\mathrm{HD}(A, B) = \max\{h(A, B), h(B, A)\}$ [43,44]. Thus, the HD between the two masks can be seen as the greatest distance from a point in either mask to its nearest point in the other mask.
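Both metrics can be computed directly from the binary masks; the snippet below is a sketch of such a computation using SciPy’s directed_hausdorff on foreground pixel coordinates. How the authors computed the metrics in practice is not specified in this section, so this is only one reasonable realization.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def iou(pred: np.ndarray, gold: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    pred, gold = pred.astype(bool), gold.astype(bool)
    union = np.logical_or(pred, gold).sum()
    return float(np.logical_and(pred, gold).sum() / union) if union else 1.0

def hausdorff(pred: np.ndarray, gold: np.ndarray) -> float:
    """Symmetric Hausdorff distance in pixels between two non-empty binary masks."""
    p = np.argwhere(pred)  # foreground pixel coordinates of the prediction
    g = np.argwhere(gold)  # foreground pixel coordinates of the gold standard
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```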
To evaluate the efficiency of our proposed method, we compared it with the currently used standard, described in section 3.2 of [34], where the data pre-processing pipeline is presented. This image processing method first extracts the brightness of the image, compares each pixel to a threshold value, and takes the largest connected component of the resulting binary image, which represents the location of the camera signal.
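As a rough illustration of this baseline, a thresholding-plus-largest-component pipeline could look like the sketch below; the brightness definition and threshold value here are assumptions, and the authoritative description of the procedure remains the one in [34].

```python
import numpy as np
from scipy import ndimage

def baseline_camera_mask(rgb: np.ndarray, threshold: int = 20) -> np.ndarray:
    """Threshold the per-pixel brightness and keep the largest connected
    component as an approximation of the camera signal location."""
    brightness = rgb.max(axis=2)             # brightness proxy (assumption)
    binary = brightness > threshold          # threshold value is an assumption
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)  # largest connected component
```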
Statistical analysis
Statistical analysis was performed using the SciPy [45] library for Python. The mean value and the 95% confidence interval (CI) around it were calculated for IoU and HD. Additionally, since the measurements are paired and not normally distributed, the Wilcoxon signed-rank test with a significance level of 0.05 was employed to compare the proposed method with the baseline.
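A sketch of this comparison with SciPy is shown below; the per-image scores are placeholders, and the bootstrap confidence interval is only one possible way of obtaining CIs, since the exact CI procedure is not stated here.

```python
import numpy as np
from scipy import stats

# Placeholder paired per-image IoU scores for the two methods.
iou_proposed = np.array([0.97, 0.96, 0.98, 0.95, 0.97, 0.99])
iou_baseline = np.array([0.94, 0.93, 0.95, 0.90, 0.92, 0.96])

# Paired, non-parametric comparison (Wilcoxon signed-rank test), alpha = 0.05.
statistic, p_value = stats.wilcoxon(iou_proposed, iou_baseline)
print(f"Wilcoxon: statistic={statistic}, p={p_value:.4f}")

# Percentile bootstrap 95% CI around the mean IoU of the proposed method (assumption).
rng = np.random.default_rng(0)
boot_means = rng.choice(iou_proposed, size=(10_000, iou_proposed.size)).mean(axis=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean IoU = {iou_proposed.mean():.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```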
Statement of ethics
Prospective collection of endoscopic examinations during clinical routine was approved by the local ethical committee responsible for each study center (Ethik-Kommission Landesärztekammer Baden-Württemberg (F-2021–042. F-2020–158), Ethik-Kommission Landesärztekammer Hessen (2021–2531), Ethik-Kommission der Landesärztekammer Rheinland-Pfalz (2021–15677) and Ethik-Kommission University Hospital Würzburg (12/20, 20200114 04)). All procedures were in accordance with the Helsinki Declaration of 1964 and later versions. Signed informed consent from each patient where data collection was performed prospectively was obtained prior to participation.
Results
Composition of the EPIC dataset
The EPIC dataset comprises 267 raw endoscopic images, along with manually extracted gold standard masks that indicate the location of the camera signal. The raw data were captured using nine different endoscopic processors: Olympus CV-180, CV-190, and CV-1500 (Olympus Europa SE & Co. KG, Hamburg, Germany), Storz Image-1S (KARL STORZ SE & Co. KG, Tuttlingen, Germany), Pentax EPK-i, EPK-i7000, and EPK-i7010 (PENTAX Europe GmbH, Hamburg, Germany), and Fujifilm VP-4450HD and VP-7000 (FUJIFILM Europe GmbH, Düsseldorf, Germany). The number of processor-endoscope combinations varied between processors, reaching up to 23 for one of the included processors. Furthermore, for 14 processor-endoscope combinations, images with different aspect ratios, namely 16:9, 4:3, and 5:4, are included.
Additionally, the dataset includes images captured using four different capsule endoscopy devices from two different manufacturers: OMOM (JINSHAN Science & Technology (Group) Co., Ltd., 118 Nishang Road, Yubei, Chongqing, China) and Medtronic (MEDTRONIC TRADING NL B.V., Larixplein 4 5616 VB Eindhoven, The Netherlands). Indicative examples of endoscopic images from the EPIC dataset are presented in Fig 2.
Fig 2. Example images from the EPIC dataset with their capturing devices.
The images are displayed in their original resolution.
Evaluation of the proposed method
When tested on 641 endoscopic images from publicly available datasets, the proposed method achieved a mean IoU of 0.97 (95% CI: 0.969–0.971), which was significantly higher than the mean IoU of 0.939 (95% CI: 0.932–0.946) achieved by the baseline method (p < 0.001). Additionally, the proposed method achieved a mean HD of 21 pixels (95% CI: 20–23), which was significantly lower than the mean HD of 51 pixels (95% CI: 45–57) achieved by the baseline (p < 0.001). The distributions of IoU and HD values are illustrated in Fig 3.
Fig 3. Performance comparison of the proposed and baseline methods when extracting the endoscopic camera signal for images from public datasets.
The distributions of intersection over union (left) and Hausdorff distance (right) values on the test dataset are compared.
On the EPIC dataset, our method achieved a mean IoU of 0.962 (95% CI: 0.955–0.969), which was higher than, but comparable to, the mean IoU of 0.954 (95% CI: 0.946–0.962) of the baseline method (p = 0.68). For HD, our method achieved a mean of 40 pixels (95% CI: 34–46), which was significantly lower than the mean HD of 52 pixels (95% CI: 44–60) of the baseline (p = 0.02). The distribution of the results is shown in Fig 4, and the evaluation across different endoscopic processors in terms of IoU and HD is presented in S1 Fig.
Fig 4. Performance comparison of the proposed and baseline methods when extracting the endoscopic camera signal from images of the EPIC dataset.
The distributions of intersection over union (left) and Hausdorff distance (right) values on the test dataset are compared.
Examples of applying the proposed method to endoscopic images from publicly available datasets are shown in Fig 5, where each row corresponds to a different test image. In the first column, the mask for the camera signal is overlaid on the image. In the second column, the results of comparing the predicted mask with the gold standard are displayed: green indicates true positives, red false positives, and blue false negatives. In the third column, the camera signal is marked with a green bounding box, and the fourth column depicts the extracted endoscopic image.
Fig 5. Examples of application of the proposed method to images from the test dataset.
Each row corresponds to a different example. The first column overlays the predicted mask on the original image. The second column evaluates the predicted mask, with green color indicating true positives, red color false positives and blue color false negatives. The third column displays the original image, and the green box indicates the minimum sub-image containing the camera signal, as obtained with the proposed method. The last column displays the extracted endoscopic image.
Finally, the time required for the proposed method to extract the camera signal was investigated. For images from public datasets, the mean execution time per image was 0.011 seconds, which corresponds to roughly 90 frames per second (fps). For images from the EPIC dataset, which are generally of higher resolution, the mean execution time per image was 0.018 seconds, or 52 fps. The results were obtained using an NVIDIA GeForce RTX 3080 Ti (NVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA). Both results indicate that the model can achieve real-time performance.
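For reference, per-image inference time and the corresponding frame rate can be measured as in the sketch below; this is not the authors’ benchmarking script, and the model and frames objects are placeholders.

```python
import time
import torch

def mean_inference_time(model, frames, device="cuda"):
    """Return the mean per-image segmentation time and the corresponding
    frames-per-second value (e.g. 0.011 s/image corresponds to ~90 fps)."""
    model = model.to(device).eval()
    durations = []
    with torch.no_grad():
        for frame in frames:                   # each frame: tensor of shape (1, 3, H, W)
            x = frame.to(device)
            if device.startswith("cuda"):
                torch.cuda.synchronize()       # time the GPU work, not just the kernel launch
            start = time.perf_counter()
            model(x)
            if device.startswith("cuda"):
                torch.cuda.synchronize()
            durations.append(time.perf_counter() - start)
    mean_t = sum(durations) / len(durations)
    return mean_t, 1.0 / mean_t
```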
Discussion
Although AI methods have been presented as highly beneficial for physicians, especially in the field of endoscopy, it remains challenging to integrate these models into clinical settings due to significant variability among the endoscopic images used as input. This variability is primarily attributed to differences in hardware, such as endoscopic processors and endoscopes, and in software, such as the device used to record the examination.
The main contribution of this work is the development of an AI-based method that, given raw endoscopic image data as input, efficiently extracts the location of the endoscopic camera image in a source-agnostic manner, that is, independent of the specific endoscopic hardware or software used to capture the images. The generated model and trained weights are made publicly available at https://www.kaggle.com/m/273059 (https://doi.org/10.34740/KAGGLE/M/273059) to support broader adoption by the research community.
Furthermore, we created a dataset of raw endoscopic images called the EPIC dataset. Our goal with this dataset is threefold: to highlight the substantial diversity present in endoscopic images, to determine the causes of their variability, and to catalog these diverse images in a standardized format. The EPIC dataset includes 267 raw endoscopic images captured using nine different processors, multiple different endoscopes, and four different capsule endoscopy devices. The variety of combinations of hardware and recording options makes the proposed dataset ideal for illustrating the wide range of shapes and forms in which the camera signal is displayed in raw endoscopic images. Moreover, the EPIC dataset contains manually annotated binary masks indicating the location of the camera signal for each image. The diversity of the dataset is further highlighted by the timespan it covers: the oldest endoscopic processor included was introduced in 2006, whereas the newest one was launched in 2020. The EPIC dataset is made available at https://doi.org/10.34740/kaggle/dsv/11103826.
The proposed method was evaluated using images from publicly available endoscopic datasets and the EPIC dataset. In segmenting the camera signal from public dataset images, our AI achieved a mean IoU of 0.97 and a mean HD of just 21 pixels, both significantly better than the mean IoU of 0.939 (p < 0.001) and the mean HD of 51 pixels (p < 0.001) achieved by the baseline. On the EPIC dataset, the proposed method achieved a mean IoU of 0.962, which was comparable to the 0.954 of the baseline (p = 0.68), and an HD of 40 pixels, which was significantly lower than the 52 pixels of the baseline (p = 0.02). The images of the EPIC dataset strongly favor the baseline method, as they were captured such that the camera signal has a color easily distinguishable from the background. The baseline method was therefore expected to achieve top results when evaluated on the EPIC dataset. Furthermore, the AI model had never encountered similar-looking images during its training.
Robust and reliable automatic extraction of the endoscopic camera signal from raw data is necessary for the development of AI in colonoscopy [26]. In this direction, Yao and colleagues cropped the endoscopic image by binarizing it and detecting the largest 4-connected component [34]; this approach served as the baseline for comparison with the proposed method. In [35], endoscopic images are cropped by first converting them to grayscale and then extracting a circular region whose center matches the center of the image. By design, this method results in the loss of information from the endoscopic image. The work closest to ours is that of [23], where a U-Net model is used to determine the location of the camera signal in the endoscopic image. Yet, that work did not evaluate model performance, in particular on images from different sources, and did not make the model publicly available. In terms of providing images with masks indicating the camera signal, Sánchez-Peralta et al. proposed a dataset of raw endoscopic images in which masks for the camera signal are also provided [46]. However, all images come from a single processor, and extraction of the mask for the endoscopic image is performed manually. Finally, several works effectively extract the endoscopic camera signal from surgical videos, where the part of the image containing the signal is assumed to be circular [31–33]. Yet, the assumption of a circular camera signal does not hold for general endoscopic images, where the shape ranges from elliptical to polygonal.
We believe that our method can significantly enhance the generalizability of AI models trained for endoscopy. Integrating the proposed method into existing and future AI models yields a streamlined process. First, raw endoscopic images are processed with our method to extract only the camera signal, which contains all information relevant to the AI model while minimizing irrelevant data, thereby standardizing the input images. The standardized image is then used as input to the AI. This way, the AI input includes all relevant and only a minimal amount of irrelevant information, independent of its source. This further contributes to the standardization of endoscopic images, a process that has already been shown to significantly impact AI performance, for example in esophageal cancer detection [47] and improved mucosal visualization in capsule endoscopy [48]. Thus, introduction of our method can improve the generalizability of AI for endoscopy to data from any source without additional overhead. Furthermore, we believe that the proposed method can easily find successful application in pipelines for pre-processing endoscopic image data and AI model training. The fact that the cropped image area is selected as the minimum rectangle containing all mask pixels, together with the high performance of the AI, suggests that no loss of relevant information occurs from using the proposed method.
Inclusion of data from different sources has already proved beneficial in training AI models that find application in clinical practice. As an example, diversity in the training data allowed successful application of an AI in clinical routine in multiple different endoscopy centers in [49], achieving high performance despite the use of different hardware. Furthermore, studies have incorporated images from publicly available datasets as external validation, where standardization of images from different sources plays a central role [21]. Finally, our proposed method can address the challenge of varying endoscopic equipment across different centers, facilitating the execution of multicenter studies for AI in endoscopy.
Endoscopy is a rapidly developing field, especially in terms of hardware. Furthermore, there are certainly sources of data that are rare, making it harder to obtain relevant images. Considering this, the ability of the proposed method to remain performant when used with data from newer devices is significant. We are confident that the model can maintain high performance, as its evaluation was performed on two datasets that had no overlap with the training dataset. Furthermore, we make the model publicly available and welcome data submissions from researchers to further develop and improve model performance. Any newer versions of the model weights will also be made available, acknowledging all data contributions.
There are also some limiting factors for this work. Endoscopic data usually contain elements such as motion blur and artefacts that could disrupt the AI’s performance. To mitigate this problem, such data were also annotated and included in the training dataset. Furthermore, in the case of video data, the cropping dimensions can be obtained by averaging predictions over sequential frames, enabling the removal of outliers, as sketched below. Another limitation is that the EPIC dataset does not cover the whole spectrum of existing combinations. To address this issue, we plan to keep updating the EPIC dataset, and we welcome and will acknowledge image contributions that extend it.
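One way to realize the frame-averaging idea mentioned above is sketched here; the outlier rule (median absolute deviation) and the box format are assumptions for illustration, not part of the published method.

```python
import numpy as np

def stable_crop(boxes: np.ndarray) -> tuple:
    """Combine per-frame crop rectangles (rows of [r0, r1, c0, c1]) into one
    crop by rejecting outlier frames and averaging the remaining boxes."""
    median = np.median(boxes, axis=0)
    mad = np.median(np.abs(boxes - median), axis=0) + 1e-6
    keep = (np.abs(boxes - median) / mad < 3.0).all(axis=1)  # MAD-based outlier rule (assumption)
    if not keep.any():
        return tuple(np.round(median).astype(int))
    return tuple(np.round(boxes[keep].mean(axis=0)).astype(int))
```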
Conclusions
In this work, we propose an AI-based method that effectively extracts the camera signal from raw endoscopic data, independent of the endoscopy hardware and software used to record it. This can enhance the standardization of images used as input to AI models, thereby increasing their transferability and generalizability across diverse clinical settings. This is crucial for maintaining consistent, high-quality data and reducing the variability issues that can arise when training AI models on diverse datasets. Moreover, the proposed method’s source agnosticism supports data sustainability by unifying diverse datasets into a standardized format, simplifying their inclusion in AI training and evaluation pipelines.
Additionally, we generated a dataset of standardized endoscopic images, called EPIC, to highlight and collectively document the diversity that can be introduced by different endoscopic equipment.
By making both the proposed method and the EPIC dataset publicly available, we aspire to generate a collaborative environment where researchers can build upon these foundational resources to further advance their work in AI-driven endoscopic applications.
Supporting information
S1 Table. The EPIC dataset contains 267 images recorded using 9 different endoscopic processors, several endoscopes, and different recording aspect ratio settings. ERCP: Endoscopic Retrograde Cholangiopancreatography; EPIC: Endoscopic Processor Image Collection.
(DOCX)
S2 Table. Description of the different small bowel and colon video capsules included in the EPIC dataset. EPIC: Endoscopic Processor Image Collection.
(DOCX)
S1 Fig. Subgroup analysis in terms of intersection over union and Hausdorff distance for each processor included in the EPIC dataset. The mean value is indicated with a circle, and lines depict 95% confidence intervals.
(TIFF)
S1 File. Description of the pipeline and training for the proposed method.
(DOCX)
Data Availability
The proposed pipeline and model weights are available for download from https://www.kaggle.com/m/273059 (DOI: 10.34740/KAGGLE/M/273059). The EPIC dataset proposed in this paper and used for method validation can be downloaded from https://www.kaggle.com/dsv/11103826 (DOI: 10.34740/KAGGLE/DSV/11103826). The image data for external validation are part of five different public datasets, namely Kvasir-Instrument https://datasets.simula.no/kvasir-instrument/, PolypGen https://www.synapse.org/Synapse:syn45200214, PolypSet https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR, HyperKvasir https://datasets.simula.no/hyper-kvasir/, and the El Salvador atlas gastrointestinal video endoscopy https://www.gastrointestinalatlas.com/index.html.
Funding Statement
The authors AH and WGZ receive public funding from the state government of Baden-Württemberg, Germany (Funding cluster “Forum Gesundheitsstandort Baden-Württemberg”) to research and develop artificial intelligence applications for polyp detection in screening colonoscopy (funding number 5409.0–001.01/15). The funders had no role in study design, data collection and analysis, decision to publish, or presentation of the manuscript.
References
1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi: 10.1038/s41591-018-0300-7
2. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6. doi: 10.1038/s41591-018-0307-0
3. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271–97. doi: 10.1016/S2589-7500(19)30123-2
4. Hann A, Troya J, Fitting D. Current status and limitations of artificial intelligence in colonoscopy. United European Gastroenterol J. 2021;9(5):527–33. doi: 10.1002/ueg2.12108
5. Brand M, Troya J, Krenzer A, Saßmannshausen Z, Zoller WG, Meining A. Development and evaluation of a deep learning model to improve the usability of polyp detection systems during interventions. United European Gastroenterol J. 2022;10(5):477–84.
6. Kamba S, Tamai N, Saitoh I, Matsui H, Horiuchi H, Kobayashi M, et al. Reducing adenoma miss rate of colonoscopy assisted by artificial intelligence: a multicenter randomized controlled trial. J Gastroenterol. 2021;56(8):746–57. doi: 10.1007/s00535-021-01808-w
7. Li J, Lu J, Yan J, Tan Y, Liu D. Artificial intelligence can increase the detection rate of colorectal polyps and adenomas: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol. 2021;33(8):1041–8. doi: 10.1097/MEG.0000000000001906
8. Glissen Brown JR, Mansour NM, Wang P, Chuchuca MA, Minchenberg SB, Chandnani M, et al. Deep learning computer-aided polyp detection reduces adenoma miss rate: a United States Multi-center Randomized Tandem Colonoscopy Study (CADeT-CS Trial). Clin Gastroenterol Hepatol. 2022;20(7):1499–507.e4. doi: 10.1016/j.cgh.2021.09.009
9. Repici A, Badalamenti M, Maselli R, Correale L, Radaelli F, Rondonotti E. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020;159(2):512–20.e7.
10. Weigt J, Repici A, Antonelli G, Afifi A, Kliegis L, Correale L, et al. Performance of a new integrated computer-assisted system (CADe/CADx) for detection and characterization of colorectal neoplasia. Endoscopy. 2022;54(2):180–4. doi: 10.1055/a-1372-0419
11. Wang P, Liu P, Glissen Brown JR, Berzin TM, Zhou G, Lei S, et al. Lower adenoma miss rate of computer-aided detection-assisted colonoscopy vs routine white-light colonoscopy in a prospective tandem study. Gastroenterology. 2020;159(4):1252–61.e5. doi: 10.1053/j.gastro.2020.06.023
12. Gimeno-García AZ, Negrin DH, Hernández A, Nicolás-Pérez D, Rodríguez E, Montesdeoca C. Usefulness of a novel computer-aided detection system for colorectal neoplasia: a randomized controlled trial. Gastrointest Endosc. 2022. pii: S0016-5107(22)02037-5
13. Soons E, Rath T, Hazewinkel Y, van Dop WA, Esposito D, Testoni PA, et al. Real-time colorectal polyp detection using a novel computer-aided detection system (CADe): a feasibility study. Int J Colorectal Dis. 2022;37(10):2219–28. doi: 10.1007/s00384-022-04258-9
14. Levy I, Bruckmayer L, Klang E, Ben-Horin S, Kopylov U. Artificial intelligence-aided colonoscopy does not increase adenoma detection rate in routine clinical practice. Am J Gastroenterol. 2022;117(11):1871–3.
15. Sudarevic B, Sodmann P, Kafetzis I, Troya J, Lux TJ, Saßmannshausen Z, et al. Artificial intelligence-based polyp size measurement in gastrointestinal endoscopy using the auxiliary waterjet as a reference. Endoscopy. 2023;55(9):871–6. doi: 10.1055/a-2077-7398
16. Kader R, Cid-Mejias A, Brandao P, Islam S, Hebbar S, González-Bueno Puyal J. Polyp characterisation using deep learning and a publicly accessible polyp video database. Dig Endosc. 2022.
17. Lux TJ, Saßmannshausen Z, Kafetzis I, Sodmann P, Herold K, Sudarevic B, et al. Assisted documentation as a new focus for artificial intelligence in endoscopy: the precedent of reliable withdrawal time and image reporting. Endoscopy. 2023;55(12):1118–23. doi: 10.1055/a-2122-1671
18. Ebigbo A, Mendel R, Scheppach MW, Probst A, Shahidi N, Prinz F, et al. Vessel and tissue recognition during third-space endoscopy using a deep learning algorithm. Gut. 2022;71(12):2388–90. doi: 10.1136/gutjnl-2021-326470
19. Römmele C, Mendel R, Barrett C, Kiesl H, Rauber D, Rückert T, et al. An artificial intelligence algorithm is highly accurate for detecting endoscopic features of eosinophilic esophagitis. Sci Rep. 2022;12(1):11115. doi: 10.1038/s41598-022-14605-z
20. Konikoff T, Goren I, Yalon M, Tamir S, Avni-Biron I, Yanai H, et al. Machine learning for selecting patients with Crohn’s disease for abdominopelvic computed tomography in the emergency department. Dig Liver Dis. 2021;53(12):1559–64. doi: 10.1016/j.dld.2021.06.020
21. Kafetzis I, Fuchs K-H, Sodmann P, Troya J, Zoller W, Meining A, et al. Efficient artificial intelligence-based assessment of the gastroesophageal valve with Hill classification through active learning. Sci Rep. 2024;14(1):18825. doi: 10.1038/s41598-024-68866-x
22. Kafetzis I, Sodmann P, Herghelegiu B-E, Brand M, Zoller WG, Seyfried F, et al. Prospective evaluation of real-time artificial intelligence for the Hill classification of the gastroesophageal junction. United European Gastroenterol J. 2025;13(2):240–6. doi: 10.1002/ueg2.12721
23. Qayyum A, Bilal M, Qadir J, Caputo M, Vohra H, Akinosho T. Segmentation-based dynamic cropping of endoscopic videos to address label leakage in surgical tool detection. 2023.
24. Waljee AK, Weinheimer-Haus EM, Abubakar A, Ngugi AK, Siwo GH, Kwakye G, et al. Artificial intelligence and machine learning for early detection and diagnosis of colorectal cancer in sub-Saharan Africa. Gut. 2022;71(7):1259–65. doi: 10.1136/gutjnl-2022-327211
25. Hassan C, Spadaccini M, Mori Y, Foroutan F, Facciorusso A, Gkolfakis P, et al. Real-time computer-aided detection of colorectal neoplasia during colonoscopy: a systematic review and meta-analysis. Ann Intern Med. 2023;176(9):1209–20. doi: 10.7326/M22-3678
26. Sierra-Jerez F, Ruiz J, Martinez F. A non-aligned deep representation to enhance standard colonoscopy observations from vascular narrow band polyp patterns. Annu Int Conf IEEE Eng Med Biol Soc. 2022;2022:1671–4. doi: 10.1109/EMBC48229.2022.9871752
27. Münzer B, Schoeffmann K, Böszörmenyi L. Content-based processing and analysis of endoscopic images and videos: a survey. Multimed Tools Appl. 2018;77(1):1323–62.
28. Helferty JP, Zhang C, McLennan G, Higgins WE. Videoendoscopic distortion correction and its application to virtual guidance of endoscopy. IEEE Trans Med Imaging. 2001;20(7):605–17. doi: 10.1109/42.932745
29. Stehle T, Hennes M, Gross S, Behrens A, Wulff J, Aach T. Dynamic distortion correction for endoscopy systems with exchangeable optics. In: Meinzer HP, Deserno TM, Handels H, Tolxdorff T, editors. Bildverarbeitung für die Medizin 2009. Berlin, Heidelberg: Springer; 2009: 142–6.
30. Meslouhi O, Kardouchi M, Allali H, Gadi T, Benkaddour Y. Automatic detection and inpainting of specular reflections for colposcopic images. Open Computer Science. 2011;1(3):341–54. doi: 10.2478/s13537-011-0020-2
31. Münzer B, Schoeffmann K, Böszörmenyi L. Detection of circular content area in endoscopic videos. In: Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems. 2013: 534–6. https://ieeexplore.ieee.org/document/6627865/?arnumber=6627865
32. Budd C, Garcia-Peraza Herrera LC, Huber M, Ourselin S, Vercauteren T. Rapid and robust endoscopic content area estimation: a lean GPU-based pipeline and curated benchmark dataset. Computer Methods Biomechanics Biomed Eng. 2023;11(4):1215–24.
33. Münzer B, Schoeffmann K, Böszörmenyi L. Improving encoding efficiency of endoscopic videos by using circle detection based border overlays. In: 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW). 2013: 1–4. doi: 10.1109/icmew.2013.6618304
34. Yao H, Stidham RW, Gao Z, Gryak J, Najarian K. Motion-based camera localization system in colonoscopy videos. Medical Image Analysis. 2021;73:102180.
35. Pore A, Finocchiaro M, Dall’Alba D, Hernansanz A, Ciuti G, Arezzo A. Colonoscopy navigation using end-to-end deep visuomotor control: a user study. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2022: 9582–8.
36. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical image computing and computer-assisted intervention – MICCAI 2015. Cham: Springer International Publishing; 2015: 234–41.
37. Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv. 2020. http://arxiv.org/abs/1905.11946
38. Jha D, Ali S, Emanuelsen K, Hicks SA, Thambawita V, Garcia-Ceja E. Kvasir-Instrument: diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy. In: Lokoč J, Skopal T, Schoeffmann K, Mezaris V, Li X, Vrochidis S, editors. MultiMedia modeling. Cham: Springer International Publishing; 2021: 218–29.
39. Ali S, Jha D, Ghatwary N, Realdon S, Cannizzaro R, Salem OE, et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Sci Data. 2023;10(1):75. doi: 10.1038/s41597-023-01981-y
40. Wang G. Replication data for: colonoscopy polyp detection and classification: dataset creation and comparative evaluations. Harvard Dataverse. 2021. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR
41. Borgli H, Thambawita V, Smedsrud PH, Hicks S, Jha D, Eskeland SL, et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data. 2020;7(1):283. doi: 10.1038/s41597-020-00622-y
42. Murra-Saca JA. El Salvador atlas gastrointestinal video endoscopy. 2005. https://www.gastrointestinalatlas.com/index.html
43. Rockafellar RT, Wets RJB. Variational Analysis. Berlin, Heidelberg: Springer; 1998. doi: 10.1007/978-3-642-02431-3
44. Karimi D, Salcudean SE. Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks. IEEE Trans Med Imaging. 2020;39(2):499–513. doi: 10.1109/TMI.2019.2930068
45. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. doi: 10.1038/s41592-019-0686-2
46. Sánchez-Peralta LF, Pagador JB, Picón A, Calderón ÁJ, Polo F, Andraka N. PICCOLO white-light and narrow-band imaging colonoscopic dataset: a performance comparative of models and datasets. Appl Sci. 2020;10(23):8501.
47. Chen Y-C, Karmakar R, Mukundan A, Huang C-W, Weng W-C, Wang H-C. Evaluation of band selection for Spectrum-Aided Visual Enhancer (SAVE) for esophageal cancer detection. J Cancer. 2025;16(2):470–8. doi: 10.7150/jca.102759
48. Wang YP, Karmakar R, Mukundan A, Tsao YM, Sung TC, Lu CL. Spectrum aided vision enhancer enhances mucosal visualization by hyperspectral imaging in capsule endoscopy. Sci Rep. 2024;14(1):22243.
49. Lux TJ, Banck M, Saßmannshausen Z, Troya J, Krenzer A, Fitting D, et al. Pilot study of a new freely available computer-aided polyp detection system in clinical practice. Int J Colorectal Dis. 2022;37(6):1349–54.





