Figure - PMC

2022 Dec 20;5:184. doi: 10.1038/s41746-022-00733-3

© The Author(s) 2022

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 3. a A conceptual representation of a deep learning system in which an optimiser minimises a loss function. A simplified convolutional neural network (CNN) block is shown, comprising a 3 × 3 kernel whose weights and bias are used to compute the output at each pixel, followed by a non-linear activation function that lets the network capture more complex features. The data are split for the training and test phases, and a validation set is used to verify that the learnt parameters generalise and do not overfit the training dataset. A graph of model over-fitting is also shown, which is controlled using the validation set.
b Some widely used deep learning architectures for various tasks in endoscopic image analysis. For classification, only an encoder network is used, usually followed by a classifier such as softmax³. For detection, features are extracted using an encoder network and then pooled using a region proposal network to predict both the class and the bounding box representations¹²⁸. For semantic segmentation, the encoder features are up-scaled to the image size for per-pixel classification. Similarly, for instance segmentation, both the region proposals for bounding boxes and per-pixel predictions for masks are used¹³¹. The idea of a depth estimation network is to estimate how far the camera is from an anatomical region, providing distances in the real-world coordinate system²². Finally, recurrent neural networks (RNNs) can embed temporal video information to refine the current predictions of a CNN⁶⁴. Here the sequential frame inputs v₁,..., v_n are fed to the CNN, producing visual feature vectors p₁,..., p_n, which are then fed to the RNN. The RNN's output represents the temporal relationship and provides context-aware predictions for each frame, such that the output for the nth frame, q_n, depends on both the current and previous frames, i.e., on the feature vector p_n and all previous feature vectors p_u, u < n. Both the CNN and the RNN are jointly optimised using a boosting strategy.
The sources of the endoscopy images: gastroscopy and colonoscopy images in (a and b) were acquired from Oxford University Hospitals under Ref. 16/YH/0247 and form part of publicly released endoscopy challenge datasets (EDD2020¹²⁷ under CC BY-NC 4.0 and PolypGen¹²⁸ under CC BY; Dr S. Ali is the creator of both datasets). Surgical procedure data are taken from ROBUST-MIS¹¹³.
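The two computations named in the caption, the 3 × 3 convolution block in (a) and the frame-wise recurrence in (b), can be illustrated with a minimal NumPy sketch. It is not the implementation behind the figure: the function names `conv3x3_relu` and `rnn_over_frames`, the choice of ReLU and tanh as activations, the Elman-style recurrence, and all weight values are assumptions made for illustration, and no training or joint optimisation is shown.

```python
import numpy as np


def conv3x3_relu(image, kernel, bias):
    """Slide a 3x3 kernel over a 2D image (no padding), add a bias, apply ReLU."""
    h, w = image.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]
            # Weighted sum of the 3x3 neighbourhood plus the bias term.
            out[i, j] = np.sum(patch * kernel) + bias
    # Non-linear activation (ReLU is an assumed choice).
    return np.maximum(out, 0.0)


def rnn_over_frames(features, w_in, w_rec):
    """Elman-style recurrence: q_n = tanh(w_in @ p_n + w_rec @ q_{n-1}).

    `features` holds one feature vector p_n per frame, so each output q_n
    depends on p_n and, through the hidden state, on every earlier p_u.
    """
    q = np.zeros(w_rec.shape[0])
    outputs = []
    for p in features:
        q = np.tanh(w_in @ p + w_rec @ q)
        outputs.append(q)
    return np.stack(outputs)


# Toy 4x4 "image" and a kernel that simply picks the centre pixel.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.zeros((3, 3))
kernel[1, 1] = 1.0
feature_map = conv3x3_relu(image, kernel, bias=-6.0)
print(feature_map)  # [[0. 0.] [3. 4.]]

# Three frames, each already reduced to a 3-dimensional feature vector.
frame_features = np.ones((3, 3))
w_in = np.full((2, 3), 0.1)   # hypothetical input weights
w_rec = np.full((2, 2), 0.1)  # hypothetical recurrent weights
print(rnn_over_frames(frame_features, w_in, w_rec))
```

Changing an early frame's feature vector alters every later output, while changing the last frame leaves the earlier outputs untouched, which is the dependence of q_n on the current and previous frames that the caption describes.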