Abstract
Introduction
Successful application of artificial intelligence (AI) in endoscopy requires effective image processing. Yet, the plethora of sources for endoscopic images, such as different processor-endoscope combinations or capsule endoscopy devices, results in images that vastly differ in appearance. These differences hinder the generalizability of AI models in endoscopy.
Methods
We developed an AI-based method for extracting the camera signal from raw endoscopic images in a source-agnostic manner. Additionally, we created a diverse dataset of standardized endoscopic images, named the Endoscopic Processor Image Collection (EPIC), from 4 different endoscopy centers. The included data were recorded using 9 different processors from 4 manufacturers with 45 endoscopes. Furthermore, images recorded with 4 capsule endoscopy devices from 2 manufacturers are included. We evaluated the camera signal extraction method using 641 manually annotated images from 5 different publicly available endoscopic image datasets, as well as on the EPIC dataset. Results were compared with a published baseline in terms of intersection over union (IoU) and Hausdorff distance (HD).
Results
In segmenting the camera signal on images from public datasets, our method achieved a mean IoU of 0.97, which was significantly higher than that of the baseline method, and a mean HD of 21 pixels, which was significantly lower than that of the baseline. On the standardized images of the EPIC dataset, there was no significant difference in IoU, but our method achieved a significantly lower HD. Both the developed AI-based method and the generated dataset are made publicly available.
Conclusion
This work introduces an AI-based method that effectively segments the endoscope camera signal from the raw endoscopic data in a source-agnostic way. Utilizing the proposed method as a preprocessing step allows existing AI models to use any endoscopic image, independent of its source, without compromising performance. Additionally, EPIC, a dataset of diverse endoscopic images, is generated. The proposed method, trained AI model weights, and the EPIC dataset are made publicly available.
Introduction
In recent years, artificial intelligence (AI) has entered the medical field, finding direct involvement in patient care [1,2] and showing the potential to achieve human-like performance [3]. This is particularly true for endoscopy, as the image-based nature of the examination can benefit from AI-based computer vision methods [4]. In colonoscopy, multiple AI-based systems attempt to enhance the physician’s ability to detect adenomas during the examination [5–8], with several commercially available computer-aided detection (CADe) systems being developed [9–13] and introduced into clinical routine [14]. Further applications of AI in endoscopy include polyp size estimation [15], characterization of colorectal polyps [16], automatic report generation for colonoscopy examinations [17], resection planning of gastrointestinal neoplasia [18], recognition of eosinophilic esophagitis [19], management of patients suffering from Crohn’s disease [20] or hiatal hernias [21], and assessment of the gastroesophageal junction [22].
Raw data captured during the endoscopic examination is suboptimal for use with AI, as the recorded interface contains the camera signal along with borders in which patient- and examination-related information is displayed. The presence of extensive additional information, combined with the variability of the camera signal’s shape and location in the raw data, compromises AI performance and increases the required computational power. As a result, existing AI models cannot perform adequately “out of the box” with data from different sources. Furthermore, using uncropped images introduces model bias that diminishes model performance, as the model focuses on the borders and shape of the camera signal [23]. On the other hand, extracting only the camera signal can significantly improve model generalizability, as it vastly decreases the aforementioned discrepancies in model inputs. Such a method is also beneficial for cloud computing [24], as reducing the amount of transmitted data can reduce communication delays.
Several studies evaluating AI-based systems used only preset dimensions for extracting the camera signal [5,9]. This approach might be effective on a small scale, yet it becomes limiting for multi-center studies, which are required for proper evaluation of AI models [25]. The lack of standardized image pre-processing and the need for manual endoscopic image cropping are acknowledged by Sierra-Jerez and colleagues in [26] as important obstacles in developing AI models for colonoscopy.
Image processing methods that can enhance the performance of computer vision methods have been investigated [27]. Such methods include undistortion of endoscopic images [28], camera calibration [29], and treatment of light artefacts and reflections [30]. Furthermore, extraction of the camera signal location has been studied extensively for the case where the camera signal is of circular shape, which is common in endoscopic surgery [31–33]. For data from gastroscopy and colonoscopy, the circularity assumption usually does not hold, as the image signal is displayed in shapes ranging from polygonal to elliptical. Mathematical methods for extracting the camera signal in these cases have been investigated [34,35]. Yet, such methods lack flexibility and thus fail to adapt to the ever-increasing diversity of raw data. AI-based cropping of endoscopic images was discussed in [23] as a method to reduce model bias. However, the AI presented there was not evaluated on different sources of endoscopic images and was not made publicly available.
The main aim of this work is the development and evaluation of an AI-based method that extracts the camera signal from the raw data in a source-agnostic manner, and to make the method and AI model weights publicly available. Such a method can support the use of AI models developed for endoscopy by unifying input data and limiting their variability without loss of relevant information. The proposed method is evaluated using manually annotated endoscopic images from a variety of publicly available datasets. Additional evaluation is performed using a unique new dataset of standardized endoscopic images, called the Endoscopic Processor Image Collection (EPIC), which is introduced in this work. The EPIC dataset contains images recorded with a variety of sources, such as different processor-endoscope combinations and capsule endoscopy devices. Furthermore, the dataset contains manually annotated binary masks indicating the location of the camera signal. An image processing method for performing camera signal segmentation is used as a baseline for comparison. The proposed method, AI model weights, and the EPIC dataset are made publicly available to facilitate widespread adoption and further research in AI for endoscopy.
Materials and methods
Endoscopic camera signal extraction pipeline
The proposed method takes any endoscopic image as input and predicts a binary mask that segments the camera signal in the input. To reduce any prediction noise, only the largest connected component of the prediction is considered. Based on this, a minimum dimension sub-image containing the camera signal is extracted. In this work, picture-in-picture mode (PiP) is considered part of the camera signal.
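To make the cropping step concrete, the following is a minimal sketch of the post-processing described above, assuming the segmentation model outputs a NumPy binary mask; the function name and the use of scipy.ndimage are illustrative and do not reproduce the authors’ exact implementation.

```python
import numpy as np
from scipy import ndimage

def extract_camera_signal(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the largest connected component of the predicted binary mask and
    crop the image to the minimal rectangle that contains it."""
    labels, n = ndimage.label(mask > 0)          # connected components of the prediction
    if n == 0:
        return image                             # no camera signal detected
    sizes = ndimage.sum(mask > 0, labels, range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)   # suppress prediction noise
    rows, cols = np.nonzero(largest)
    r0, r1 = rows.min(), rows.max()
    c0, c1 = cols.min(), cols.max()
    return image[r0:r1 + 1, c0:c1 + 1]           # minimum sub-image containing the signal
```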
The AI utilizes the U-Net architecture [36] with an EfficientNet [37] backbone. Training data for the model comprised 1,765 manually annotated endoscopic images extracted from examination videos recorded between January 15th, 2019, and January 31st, 2022, in four different endoscopy centers. Gold standard binary semantic segmentation masks were manually created for each image to delineate the camera signal, where a value of 1 indicated that a pixel belongs to the camera signal and 0 otherwise. The training data included a range of image quality levels, from clear, high-quality images to low-quality, blurry images. This diverse set of indicative images reflects the varied conditions encountered during endoscopy. The training images were pseudonymized according to standard practice for patient data and contained no patient-identifying information. An in-depth presentation of the model selection process and training details is given in S1 File.
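For illustration, a model with this architecture can be assembled with the segmentation_models_pytorch library; the specific EfficientNet variant, pre-training, and input size used in this work are described in S1 File, so the values below are assumptions made only to keep the sketch runnable.

```python
import torch
import segmentation_models_pytorch as smp

# U-Net with an EfficientNet encoder; variant and weights are illustrative assumptions.
model = smp.Unet(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,    # RGB endoscopic frames
    classes=1,        # single foreground class: the camera signal
)

model.eval()
with torch.no_grad():
    frame = torch.rand(1, 3, 512, 512)                    # placeholder input frame
    mask = (torch.sigmoid(model(frame)) > 0.5).squeeze()  # binary camera-signal mask
```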
Generation of the “Endoscopic Processors Image Collection” dataset
The EPIC dataset contains raw endoscopic images captured across four German hospitals, with backgrounds of either white or a combination of blue and green hues. The background colors were selected to facilitate clear identification and separation of the camera signal from any borders. An example of the recording setup from one of the hospitals is shown in Fig 1. Such images were gathered for multiple combinations of endoscopic processors and endoscopes. Given that the screen aspect ratio can significantly influence the final appearance of these images, various aspect ratios (16:9, 4:3, and 5:4) were also used to collect representative examples. For applications involving capsule endoscopy or cholangioscopy, where acquiring new images was not always feasible, only existing images that matched the above description and contained no patient-identifying data were included in the dataset. In addition to the raw images, the EPIC dataset includes manually generated binary masks that precisely delineate the camera signal’s position on each image.
Fig 1. Example of one of the examination room setups used in obtaining images for the EPIC dataset.
Displayed are three different Pentax (PENTAX Europe GmbH, Hamburg, Germany) processor generations with endoscopes attached. On the examination table are the two different textures used for obtaining the images.
No center contributed images to both the EPIC dataset and the training of the proposed AI. This guarantees that the EPIC dataset remains distinct and valid for performance evaluation. For a detailed breakdown of the EPIC dataset’s contents, readers are referred to S1 Table for endoscopic processor data and S2 Table for capsule endoscopy data.
Method evaluation
The proposed method was evaluated on two different sets of endoscopic images. The first set consists of 641 images from five publicly available datasets [38–42], and the second set is the EPIC dataset. The gold standard was manually annotated for all images using binary masks, with a value of 1 for pixels belonging to the camera signal and 0 otherwise. The method was evaluated by means of intersection over union (IoU) and Hausdorff distance (HD). IoU is defined as the area of overlap between the predicted and gold standard masks divided by the area of their union. The HD, in the context of semantic segmentation, is defined as follows. Let $A$ and $B$ be the sets of foreground pixel coordinates of two binary masks. The one-sided HD from $A$ to $B$ is defined as $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$. This can be interpreted as the maximum distance from a point $a \in A$ to the closest point $b \in B$. The HD of $A$ and $B$ is defined as the maximum of the two one-sided HDs, that is, $\mathrm{HD}(A, B) = \max\{h(A, B), h(B, A)\}$ [43,44]. Thus, the HD between the two masks can be seen as the greatest distance from a point in either mask to its nearest point in the other mask.
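Both metrics can be computed directly from the binary masks; the snippet below is a sketch of such a computation using SciPy’s directed_hausdorff on foreground pixel coordinates. How the authors computed the metrics in practice is not specified in this section, so this is only one reasonable realization.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def iou(pred: np.ndarray, gold: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    pred, gold = pred.astype(bool), gold.astype(bool)
    union = np.logical_or(pred, gold).sum()
    return float(np.logical_and(pred, gold).sum() / union) if union else 1.0

def hausdorff(pred: np.ndarray, gold: np.ndarray) -> float:
    """Symmetric Hausdorff distance in pixels between two non-empty binary masks."""
    p = np.argwhere(pred)  # foreground pixel coordinates of the prediction
    g = np.argwhere(gold)  # foreground pixel coordinates of the gold standard
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```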
To evaluate the efficiency of our proposed method, we compared it with the currently used standard, described in section 3.2 of [34], where the data pre-processing pipeline is presented. This image processing method first extracts the brightness of the image, compares each pixel to a threshold value, and takes the largest connected component of the resulting binary image, which represents the location of the camera signal.
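As a rough illustration of this baseline, a thresholding-plus-largest-component pipeline could look like the sketch below; the brightness definition and threshold value here are assumptions, and the authoritative description of the procedure remains the one in [34].

```python
import numpy as np
from scipy import ndimage

def baseline_camera_mask(rgb: np.ndarray, threshold: int = 20) -> np.ndarray:
    """Threshold the per-pixel brightness and keep the largest connected
    component as an approximation of the camera signal location."""
    brightness = rgb.max(axis=2)             # brightness proxy (assumption)
    binary = brightness > threshold          # threshold value is an assumption
    labels, n = ndimage.label(binary)
    if n == 0:
        return binary
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)  # largest connected component
```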
Statistical analysis
Statistical analysis was performed using the SciPy [45] library for Python. The mean value and the 95% confidence interval (CI) around it were calculated for IoU and HD. Additionally, since the measurements are paired and not normally distributed, the Wilcoxon signed-rank test with a significance level of 0.05 was employed to compare the proposed method with the baseline.
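A sketch of this comparison with SciPy is shown below; the per-image scores are placeholders, and the bootstrap confidence interval is only one possible way of obtaining CIs, since the exact CI procedure is not stated here.

```python
import numpy as np
from scipy import stats

# Placeholder paired per-image IoU scores for the two methods.
iou_proposed = np.array([0.97, 0.96, 0.98, 0.95, 0.97, 0.99])
iou_baseline = np.array([0.94, 0.93, 0.95, 0.90, 0.92, 0.96])

# Paired, non-parametric comparison (Wilcoxon signed-rank test), alpha = 0.05.
statistic, p_value = stats.wilcoxon(iou_proposed, iou_baseline)
print(f"Wilcoxon: statistic={statistic}, p={p_value:.4f}")

# Percentile bootstrap 95% CI around the mean IoU of the proposed method (assumption).
rng = np.random.default_rng(0)
boot_means = rng.choice(iou_proposed, size=(10_000, iou_proposed.size)).mean(axis=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean IoU = {iou_proposed.mean():.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```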
Statement of ethics
Prospective collection of endoscopic examinations during clinical routine was approved by the local ethical committee responsible for each study center (Ethik-Kommission Landesärztekammer Baden-Württemberg (F-2021–042. F-2020–158), Ethik-Kommission Landesärztekammer Hessen (2021–2531), Ethik-Kommission der Landesärztekammer Rheinland-Pfalz (2021–15677) and Ethik-Kommission University Hospital Würzburg (12/20, 20200114 04)). All procedures were in accordance with the Helsinki Declaration of 1964 and later versions. Signed informed consent from each patient where data collection was performed prospectively was obtained prior to participation.
Results
Composition of the EPIC dataset
The EPIC dataset comprises 267 raw endoscopic images, along with manually extracted gold standard masks that indicate the location of the camera signal. The raw data were captured using nine different endoscopic processors: Olympus CV-180, CV-190, and CV-1500 (Olympus Europa SE & Co. KG, Hamburg, Germany), Storz Image-1S (KARL STORZ SE & Co. KG, Tuttlingen, Germany), Pentax EPK-i, EPK-i7000, and EPK-i7010 (PENTAX Europe GmbH, Hamburg, Germany), and Fujifilm VP-4450HD and VP-7000 (FUJIFILM Europe GmbH, Düsseldorf, Germany). The number of processor-endoscope combinations varied between processors, reaching up to 23 for one of the included processors. Furthermore, for 14 processor-endoscope combinations, images with different aspect ratios, namely 16:9, 4:3, and 5:4, are included.
Additionally, the dataset includes images captured using four different capsule endoscopy devices from two different manufacturers: OMOM (JINSHAN Science & Technology (Group) Co., Ltd., 118 Nishang Road, Yubei, Chongqing, China) and Medtronic (MEDTRONIC TRADING NL B.V., Larixplein 4 5616 VB Eindhoven, The Netherlands). Indicative examples of endoscopic images from the EPIC dataset are presented in Fig 2.
Fig 2. Example images from the EPIC dataset with their capturing devices.
The images are displayed in their original resolution.
Evaluation of the proposed method
When tested on 641 endoscopic images from publicly available datasets, the proposed method achieved a mean IoU of 0.97 (95% CI: 0.969–0.971), which was significantly higher than the mean IoU of 0.939 (95% CI: 0.932–0.946) achieved by the baseline method (p < 0.001). Additionally, the proposed method achieved a mean HD of 21 pixels (95% CI: 20–23), which was significantly lower than the mean HD of 51 pixels (95% CI: 45–57) achieved by the baseline (p < 0.001). The distributions of IoU and HD values are illustrated in Fig 3.
Fig 3. Performance comparison of the proposed and baseline methods when extracting the endoscopic camera signal for images from public datasets.
The distributions of intersection over union (left) and Hausdorff distance (right) values on the test dataset are compared.
On the EPIC dataset, our method achieved a mean IoU of 0.962 (95% CI: 0.955–0.969), which was higher than, but comparable to, the mean IoU of 0.954 (95% CI: 0.946–0.962) of the baseline method (p = 0.68). For HD, our method achieved a mean of 40 pixels (95% CI: 34–46), which was significantly lower than the mean HD of 52 pixels (95% CI: 44–60) of the baseline (p = 0.02). The distribution of the results is shown in Fig 4, and the evaluation across different endoscopic processors in terms of IoU and HD is presented in S1 Fig.
Fig 4. Performance comparison of the proposed and baseline methods when extracting the endoscopic camera signal from images of the EPIC dataset.
The distributions of intersection over union (left) and Hausdorff distance (right) values on the test dataset are compared.
Examples of applying the proposed method to endoscopic images from publicly available datasets are shown in Fig 5, where each row corresponds to a different test image. In the first column, the mask for the camera signal is overlaid on the image. In the second column, the results of comparing the predicted mask with the gold standard are displayed: green indicates true positives, red false positives, and blue false negatives. In the third column, the camera signal is marked with a green bounding box, and the fourth column depicts the extracted endoscopic image.
Fig 5. Examples of application of the proposed method to images from the test dataset.
Each row corresponds to a different example. The first column overlays the predicted mask on the original image. The second column evaluates the predicted mask, with green color indicating true positives, red color false positives and blue color false negatives. The third column displays the original image, and the green box indicates the minimum sub-image containing the camera signal, as obtained with the proposed method. The last column displays the extracted endoscopic image.
Finally, the time required for the proposed method to extract the camera signal was investigated. For images from public datasets, the mean execution time per image was 0.011 seconds, which corresponds to roughly 90 frames per second (fps). For images from the EPIC dataset, which are generally of higher resolution, the mean execution time per image was 0.018 seconds, or 52 fps. The results were obtained using an NVIDIA GeForce RTX 3080 Ti (NVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA). Both results indicate that the model can achieve real-time performance.
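For reference, per-image inference time and the corresponding frame rate can be measured as in the sketch below; this is not the authors’ benchmarking script, and the model and frames objects are placeholders.

```python
import time
import torch

def mean_inference_time(model, frames, device="cuda"):
    """Return the mean per-image segmentation time and the corresponding
    frames-per-second value (e.g. 0.011 s/image corresponds to ~90 fps)."""
    model = model.to(device).eval()
    durations = []
    with torch.no_grad():
        for frame in frames:                   # each frame: tensor of shape (1, 3, H, W)
            x = frame.to(device)
            if device.startswith("cuda"):
                torch.cuda.synchronize()       # time the GPU work, not just the kernel launch
            start = time.perf_counter()
            model(x)
            if device.startswith("cuda"):
                torch.cuda.synchronize()
            durations.append(time.perf_counter() - start)
    mean_t = sum(durations) / len(durations)
    return mean_t, 1.0 / mean_t
```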
Discussion
Although AI methods have been presented as highly beneficial for physicians, especially in the field of endoscopy, it remains challenging to integrate these models into clinical settings due to significant variability among the endoscopic images used as input. This variability is primarily attributed to differences in hardware, such as endoscopic processors and endoscopes, and in software, such as the device used to record the examination.
The main contribution of this work is the development of an AI-based method that, given raw endoscopic image data as input, efficiently extracts the location of the endoscopic camera image in a source-agnostic manner, that is, independent of the specific endoscopic hardware or software used to capture the images. The generated model and trained weights are made publicly available at https://www.kaggle.com/m/273059 (https://doi.org/10.34740/KAGGLE/M/273059) to support broader adoption by the research community.
Furthermore, we created a dataset of raw endoscopic images called the EPIC dataset. Our goal with this dataset is threefold: to highlight the substantial diversity present in endoscopic images, to determine the causes of their variability, and to catalog these diverse images in a standardized format. The EPIC dataset includes 267 raw endoscopic images captured using nine different processors, multiple different endoscopes, and four different capsule endoscopy devices. The variety of combinations of hardware and recording options makes the proposed dataset ideal for illustrating the wide range of shapes and forms in which the camera signal is displayed in raw endoscopic images. Moreover, the EPIC dataset contains manually annotated binary masks indicating the location of the camera signal for each image. The diversity of the dataset is further highlighted by the timespan it covers: the oldest endoscopic processor included was introduced in 2006, whereas the newest one was launched in 2020. The EPIC dataset is made available at https://doi.org/10.34740/kaggle/dsv/11103826.
The proposed method was evaluated using images from publicly available endoscopic datasets and the EPIC dataset. In segmenting the camera signal from public dataset images, our AI achieved a mean IoU of 0.97 and a mean HD of just 21 pixels, both significantly better than the mean IoU of 0.939 (p < 0.001) and the mean HD of 51 pixels (p < 0.001) achieved by the baseline. On the EPIC dataset, the proposed method achieved a mean IoU of 0.962, which was comparable to the 0.954 of the baseline (p = 0.68), and an HD of 40 pixels, which was significantly lower than the 52 pixels of the baseline (p = 0.02). The images of the EPIC dataset strongly favor the baseline method, as they were captured such that the camera signal has a color easily distinguishable from the background. The baseline method was therefore expected to achieve top results when evaluated on the EPIC dataset. Furthermore, the AI model had never encountered similar-looking images during its training.
Robust and reliable automatic extraction of the endoscopic camera signal from raw data is necessary for the development of AI in colonoscopy [26]. In this direction, Yao and colleagues cropped the endoscopic image by binarizing it and detecting the largest 4-connected component [34]; this approach served as the baseline for comparison with the proposed method. In [35], endoscopic images are cropped by first converting them to grayscale and then extracting a circular region whose center matches the center of the image. By design, this method results in the loss of information from the endoscopic image. The work closest to ours is that of [23], where a U-Net model is used to determine the location of the camera signal in the endoscopic image. Yet, that work did not evaluate model performance, in particular on images from different sources, and did not make the model publicly available. In terms of providing images with masks indicating the camera signal, Sánchez-Peralta et al. proposed a dataset of raw endoscopic images in which masks for the camera signal are also provided [46]. However, all images come from a single processor, and extraction of the mask for the endoscopic image is performed manually. Finally, several works effectively extract the endoscopic camera signal from surgical videos, where the part of the image containing the signal is assumed to be circular [31–33]. Yet, the assumption of a circular camera signal does not hold for general endoscopic images, where the shape ranges from elliptical to polygonal.
We believe that our method can significantly enhance the generalizability of AI models trained for endoscopy. Integrating the proposed method into existing and future AI models yields a streamlined process. First, raw endoscopic images are processed with our method to extract only the camera signal, which contains all information relevant to the AI model while minimizing irrelevant data, thereby standardizing the input images. The standardized image is then used as input to the AI. This way, the AI input includes all relevant and only a minimal amount of irrelevant information, independent of its source. This further contributes to the standardization of endoscopic images, a process that has already been shown to significantly impact AI performance, for example in esophageal cancer detection [47] and improved mucosal visualization in capsule endoscopy [48]. Thus, introduction of our method can improve the generalizability of AI for endoscopy to data from any source without additional overhead. Furthermore, we believe that the proposed method can easily find successful application in pipelines for pre-processing endoscopic image data and AI model training. The fact that the cropped image area is selected as the minimum rectangle containing all mask pixels, together with the high performance of the AI, suggests that no loss of relevant information occurs from using the proposed method.
Inclusion of data from different sources has already proved beneficial in training AI models that find application in clinical practice. As an example, diversity in the training data allowed successful application of an AI in clinical routine in multiple different endoscopy centers in [49], achieving high performance despite the use of different hardware. Furthermore, studies have incorporated images from publicly available datasets as external validation, where standardization of images from different sources plays a central role [21]. Finally, our proposed method can address the challenge of varying endoscopic equipment across different centers, facilitating the execution of multicenter studies for AI in endoscopy.
Endoscopy is a rapidly developing field, especially in terms of hardware. Furthermore, there are certainly sources of data that are rare, making it harder to obtain relevant images. Considering this, the ability of the proposed method to remain performant when used with data from newer devices is significant. We are confident that the model can maintain high performance, as its evaluation was performed on two datasets that had no overlap with the training dataset. Furthermore, we make the model publicly available and welcome data submissions from researchers to further develop and improve model performance. Any newer versions of the model weights will also be made available, acknowledging all data contributions.
There are also some limiting factors for this work. Endoscopic data usually contain elements such as motion blur and artefacts that could disrupt the AI’s performance. To mitigate this problem, such data were also annotated and included in the training dataset. Furthermore, in the case of video data, the cropping dimensions can be obtained by averaging predictions over sequential frames, enabling the removal of outliers, as sketched below. Another limitation is that the EPIC dataset does not cover the whole spectrum of existing combinations. To address this issue, we plan to keep updating the EPIC dataset, and we welcome and will acknowledge image contributions that extend it.
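One way to realize the frame-averaging idea mentioned above is sketched here; the outlier rule (median absolute deviation) and the box format are assumptions for illustration, not part of the published method.

```python
import numpy as np

def stable_crop(boxes: np.ndarray) -> tuple:
    """Combine per-frame crop rectangles (rows of [r0, r1, c0, c1]) into one
    crop by rejecting outlier frames and averaging the remaining boxes."""
    median = np.median(boxes, axis=0)
    mad = np.median(np.abs(boxes - median), axis=0) + 1e-6
    keep = (np.abs(boxes - median) / mad < 3.0).all(axis=1)  # MAD-based outlier rule (assumption)
    if not keep.any():
        return tuple(np.round(median).astype(int))
    return tuple(np.round(boxes[keep].mean(axis=0)).astype(int))
```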
Conclusions
In this work, we propose an AI-based method that effectively extracts the camera signal from raw endoscopic data, independent of the endoscopy hardware and software used to record it. This can enhance the standardization of images used as input to AI models, thereby increasing their transferability and generalizability across diverse clinical settings. This is crucial for maintaining consistent, high-quality data and reducing the variability issues that can arise when training AI models on diverse datasets. Moreover, the proposed method’s source agnosticism supports data sustainability by unifying diverse datasets into a standardized format, simplifying their inclusion in AI training and evaluation pipelines.
Additionally, we generated a dataset of standardized endoscopic images, called EPIC, to highlight and collectively document the diversity that can be introduced by different endoscopic equipment.
By making both the proposed method and the EPIC dataset publicly available, we aspire to generate a collaborative environment where researchers can build upon these foundational resources to further advance their work in AI-driven endoscopic applications.
Supporting information
S1 Table. The EPIC dataset contains 267 images recorded using 9 different endoscopic processors, several endoscopes, and different recording aspect ratio settings. ERCP: Endoscopic Retrograde Cholangiopancreatography; EPIC: Endoscopic Processor Image Collection.
(DOCX)
S2 Table. Description of the different small bowel and colon video capsules included in the EPIC dataset. EPIC: Endoscopic Processor Image Collection.
(DOCX)
S1 Fig. Subgroup analysis in terms of intersection over union and Hausdorff distance for each processor included in the EPIC dataset. The mean value is indicated with a circle, and lines depict 95% confidence intervals.
(TIFF)
S1 File. Description of the pipeline and training for the proposed method.
(DOCX)
Data Availability
The proposed pipeline and model weights are available for download from https://www.kaggle.com/m/273059 (DOI: 10.34740/KAGGLE/M/273059). The EPIC dataset proposed in this paper and used for method validation can be downloaded from https://www.kaggle.com/dsv/11103826 (DOI: 10.34740/KAGGLE/DSV/11103826). The image data for external validation are part of five different public datasets, namely Kvasir-Instrument https://datasets.simula.no/kvasir-instrument/, PolypGen https://www.synapse.org/Synapse:syn45200214, PolypSet https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR, HyperKvasir https://datasets.simula.no/hyper-kvasir/, and the El Salvador atlas gastrointestinal video endoscopy https://www.gastrointestinalatlas.com/index.html.
Funding Statement
The authors AH and WGZ receive public funding from the state government of Baden-Württemberg, Germany (Funding cluster “Forum Gesundheitsstandort Baden-Württemberg”) to research and develop artificial intelligence applications for polyp detection in screening colonoscopy (funding number 5409.0–001.01/15). The funders had no role in study design, data collection and analysis, decision to publish, or presentation of the manuscript.
References
1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi: 10.1038/s41591-018-0300-7
2. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6. doi: 10.1038/s41591-018-0307-0
3. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1(6):e271–97. doi: 10.1016/S2589-7500(19)30123-2
4. Hann A, Troya J, Fitting D. Current status and limitations of artificial intelligence in colonoscopy. United European Gastroenterol J. 2021;9(5):527–33. doi: 10.1002/ueg2.12108
5. Brand M, Troya J, Krenzer A, Saßmannshausen Z, Zoller WG, Meining A. Development and evaluation of a deep learning model to improve the usability of polyp detection systems during interventions. United European Gastroenterol J. 2022;10(5):477–84.
6. Kamba S, Tamai N, Saitoh I, Matsui H, Horiuchi H, Kobayashi M, et al. Reducing adenoma miss rate of colonoscopy assisted by artificial intelligence: a multicenter randomized controlled trial. J Gastroenterol. 2021;56(8):746–57. doi: 10.1007/s00535-021-01808-w
7. Li J, Lu J, Yan J, Tan Y, Liu D. Artificial intelligence can increase the detection rate of colorectal polyps and adenomas: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol. 2021;33(8):1041–8. doi: 10.1097/MEG.0000000000001906
8. Glissen Brown JR, Mansour NM, Wang P, Chuchuca MA, Minchenberg SB, Chandnani M, et al. Deep learning computer-aided polyp detection reduces adenoma miss rate: a United States Multi-center Randomized Tandem Colonoscopy Study (CADeT-CS Trial). Clin Gastroenterol Hepatol. 2022;20(7):1499–507.e4. doi: 10.1016/j.cgh.2021.09.009
9. Repici A, Badalamenti M, Maselli R, Correale L, Radaelli F, Rondonotti E. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020;159(2):512–20.e7.
10. Weigt J, Repici A, Antonelli G, Afifi A, Kliegis L, Correale L, et al. Performance of a new integrated computer-assisted system (CADe/CADx) for detection and characterization of colorectal neoplasia. Endoscopy. 2022;54(2):180–4. doi: 10.1055/a-1372-0419
11. Wang P, Liu P, Glissen Brown JR, Berzin TM, Zhou G, Lei S, et al. Lower adenoma miss rate of computer-aided detection-assisted colonoscopy vs routine white-light colonoscopy in a prospective tandem study. Gastroenterology. 2020;159(4):1252–61.e5. doi: 10.1053/j.gastro.2020.06.023
12. Gimeno-García AZ, Negrin DH, Hernández A, Nicolás-Pérez D, Rodríguez E, Montesdeoca C. Usefulness of a novel computer-aided detection system for colorectal neoplasia: a randomized controlled trial. Gastrointest Endosc. 2022. pii: S0016-5107(22)02037-5
13. Soons E, Rath T, Hazewinkel Y, van Dop WA, Esposito D, Testoni PA, et al. Real-time colorectal polyp detection using a novel computer-aided detection system (CADe): a feasibility study. Int J Colorectal Dis. 2022;37(10):2219–28. doi: 10.1007/s00384-022-04258-9
14. Levy I, Bruckmayer L, Klang E, Ben-Horin S, Kopylov U. Artificial intelligence-aided colonoscopy does not increase adenoma detection rate in routine clinical practice. Am J Gastroenterol. 2022;117(11):1871–3.
15. Sudarevic B, Sodmann P, Kafetzis I, Troya J, Lux TJ, Saßmannshausen Z, et al. Artificial intelligence-based polyp size measurement in gastrointestinal endoscopy using the auxiliary waterjet as a reference. Endoscopy. 2023;55(9):871–6. doi: 10.1055/a-2077-7398
16. Kader R, Cid-Mejias A, Brandao P, Islam S, Hebbar S, González-Bueno Puyal J. Polyp characterisation using deep learning and a publicly accessible polyp video database. Dig Endosc. 2022.
17. Lux TJ, Saßmannshausen Z, Kafetzis I, Sodmann P, Herold K, Sudarevic B, et al. Assisted documentation as a new focus for artificial intelligence in endoscopy: the precedent of reliable withdrawal time and image reporting. Endoscopy. 2023;55(12):1118–23. doi: 10.1055/a-2122-1671
18. Ebigbo A, Mendel R, Scheppach MW, Probst A, Shahidi N, Prinz F, et al. Vessel and tissue recognition during third-space endoscopy using a deep learning algorithm. Gut. 2022;71(12):2388–90. doi: 10.1136/gutjnl-2021-326470
19. Römmele C, Mendel R, Barrett C, Kiesl H, Rauber D, Rückert T, et al. An artificial intelligence algorithm is highly accurate for detecting endoscopic features of eosinophilic esophagitis. Sci Rep. 2022;12(1):11115. doi: 10.1038/s41598-022-14605-z
20. Konikoff T, Goren I, Yalon M, Tamir S, Avni-Biron I, Yanai H, et al. Machine learning for selecting patients with Crohn’s disease for abdominopelvic computed tomography in the emergency department. Dig Liver Dis. 2021;53(12):1559–64. doi: 10.1016/j.dld.2021.06.020
21. Kafetzis I, Fuchs K-H, Sodmann P, Troya J, Zoller W, Meining A, et al. Efficient artificial intelligence-based assessment of the gastroesophageal valve with Hill classification through active learning. Sci Rep. 2024;14(1):18825. doi: 10.1038/s41598-024-68866-x
22. Kafetzis I, Sodmann P, Herghelegiu B-E, Brand M, Zoller WG, Seyfried F, et al. Prospective evaluation of real-time artificial intelligence for the Hill classification of the gastroesophageal junction. United European Gastroenterol J. 2025;13(2):240–6. doi: 10.1002/ueg2.12721
23. Qayyum A, Bilal M, Qadir J, Caputo M, Vohra H, Akinosho T. Segmentation-based dynamic cropping of endoscopic videos to address label leakage in surgical tool detection. 2023.
24. Waljee AK, Weinheimer-Haus EM, Abubakar A, Ngugi AK, Siwo GH, Kwakye G, et al. Artificial intelligence and machine learning for early detection and diagnosis of colorectal cancer in sub-Saharan Africa. Gut. 2022;71(7):1259–65. doi: 10.1136/gutjnl-2022-327211
25. Hassan C, Spadaccini M, Mori Y, Foroutan F, Facciorusso A, Gkolfakis P, et al. Real-time computer-aided detection of colorectal neoplasia during colonoscopy: a systematic review and meta-analysis. Ann Intern Med. 2023;176(9):1209–20. doi: 10.7326/M22-3678
26. Sierra-Jerez F, Ruiz J, Martinez F. A non-aligned deep representation to enhance standard colonoscopy observations from vascular narrow band polyp patterns. Annu Int Conf IEEE Eng Med Biol Soc. 2022;2022:1671–4. doi: 10.1109/EMBC48229.2022.9871752
27. Münzer B, Schoeffmann K, Böszörmenyi L. Content-based processing and analysis of endoscopic images and videos: a survey. Multimed Tools Appl. 2018;77(1):1323–62.
28. Helferty JP, Zhang C, McLennan G, Higgins WE. Videoendoscopic distortion correction and its application to virtual guidance of endoscopy. IEEE Trans Med Imaging. 2001;20(7):605–17. doi: 10.1109/42.932745
29. Stehle T, Hennes M, Gross S, Behrens A, Wulff J, Aach T. Dynamic distortion correction for endoscopy systems with exchangeable optics. In: Meinzer HP, Deserno TM, Handels H, Tolxdorff T, editors. Bildverarbeitung für die Medizin 2009. Berlin, Heidelberg: Springer; 2009: 142–6.
30. Meslouhi O, Kardouchi M, Allali H, Gadi T, Benkaddour Y. Automatic detection and inpainting of specular reflections for colposcopic images. Open Computer Science. 2011;1(3):341–54. doi: 10.2478/s13537-011-0020-2
31. Münzer B, Schoeffmann K, Böszörmenyi L. Detection of circular content area in endoscopic videos. In: Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems. 2013: 534–6. https://ieeexplore.ieee.org/document/6627865/?arnumber=6627865
32. Budd C, Garcia-Peraza Herrera LC, Huber M, Ourselin S, Vercauteren T. Rapid and robust endoscopic content area estimation: a lean GPU-based pipeline and curated benchmark dataset. Computer Methods Biomechanics Biomed Eng. 2023;11(4):1215–24.
33. Münzer B, Schoeffmann K, Böszörmenyi L. Improving encoding efficiency of endoscopic videos by using circle detection based border overlays. In: 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW). 2013: 1–4. doi: 10.1109/icmew.2013.6618304
34. Yao H, Stidham RW, Gao Z, Gryak J, Najarian K. Motion-based camera localization system in colonoscopy videos. Medical Image Analysis. 2021;73:102180.
35. Pore A, Finocchiaro M, Dall’Alba D, Hernansanz A, Ciuti G, Arezzo A. Colonoscopy navigation using end-to-end deep visuomotor control: a user study. In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2022: 9582–8.
36. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical image computing and computer-assisted intervention – MICCAI 2015. Cham: Springer International Publishing; 2015: 234–41.
37. Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. arXiv. 2020. http://arxiv.org/abs/1905.11946
38. Jha D, Ali S, Emanuelsen K, Hicks SA, Thambawita V, Garcia-Ceja E. Kvasir-Instrument: diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy. In: Lokoč J, Skopal T, Schoeffmann K, Mezaris V, Li X, Vrochidis S, editors. MultiMedia modeling. Cham: Springer International Publishing; 2021: 218–29.
39. Ali S, Jha D, Ghatwary N, Realdon S, Cannizzaro R, Salem OE, et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Sci Data. 2023;10(1):75. doi: 10.1038/s41597-023-01981-y
40. Wang G. Replication data for: colonoscopy polyp detection and classification: dataset creation and comparative evaluations. Harvard Dataverse. 2021. https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/FCBUOR
41. Borgli H, Thambawita V, Smedsrud PH, Hicks S, Jha D, Eskeland SL, et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data. 2020;7(1):283. doi: 10.1038/s41597-020-00622-y
42. Murra-Saca JA. El Salvador atlas gastrointestinal video endoscopy. 2005. https://www.gastrointestinalatlas.com/index.html
43. Rockafellar RT, Wets RJB. Variational Analysis. Berlin, Heidelberg: Springer; 1998. doi: 10.1007/978-3-642-02431-3
44. Karimi D, Salcudean SE. Reducing the Hausdorff distance in medical image segmentation with convolutional neural networks. IEEE Trans Med Imaging. 2020;39(2):499–513. doi: 10.1109/TMI.2019.2930068
45. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. doi: 10.1038/s41592-019-0686-2
46. Sánchez-Peralta LF, Pagador JB, Picón A, Calderón ÁJ, Polo F, Andraka N. PICCOLO white-light and narrow-band imaging colonoscopic dataset: a performance comparative of models and datasets. Appl Sci. 2020;10(23):8501.
47. Chen Y-C, Karmakar R, Mukundan A, Huang C-W, Weng W-C, Wang H-C. Evaluation of band selection for Spectrum-Aided Visual Enhancer (SAVE) for esophageal cancer detection. J Cancer. 2025;16(2):470–8. doi: 10.7150/jca.102759
48. Wang YP, Karmakar R, Mukundan A, Tsao YM, Sung TC, Lu CL. Spectrum aided vision enhancer enhances mucosal visualization by hyperspectral imaging in capsule endoscopy. Sci Rep. 2024;14(1):22243.
49. Lux TJ, Banck M, Saßmannshausen Z, Troya J, Krenzer A, Fitting D, et al. Pilot study of a new freely available computer-aided polyp detection system in clinical practice. Int J Colorectal Dis. 2022;37(6):1349–54.





