Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: A systematic review and meta-analysis

Yixin Xu; Wei Ding; Yibo Wang; Yulin Tan; Cheng Xi; Nianyuan Ye; Dapeng Wu; Xuezhong Xu

doi:10.1371/journal.pone.0246892

PLoS One. 2021 Feb 16;16(2):e0246892. doi: 10.1371/journal.pone.0246892


Editor: Ping He

PMCID: PMC7886136 PMID: 33592048

Abstract

Prospective randomized trials and observational studies have shown that early detection, classification, and removal of neoplastic colorectal polyps (CPs) significantly improve the prevention of colorectal cancer (CRC). However, the diagnostic performance of colonoscopy remains unsatisfactory, and its accuracy is inconsistent. Convolutional neural network (CNN) systems based on artificial intelligence (AI) have shown potential to help endoscopists increase diagnostic accuracy. Nonetheless, CNN systems have several limitations, and it remains controversial whether they provide better diagnostic performance than human endoscopists. This study sought to address this issue. PubMed, Web of Science, Cochrane Library, and EMBASE were searched for studies published up to April 2020. The quality of the enrolled studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, and publication bias was assessed using Deeks’ funnel plot. In total, 13 studies published between 2016 and 2020 were enrolled in this meta-analysis. The CNN system showed satisfactory diagnostic performance for CP detection (sensitivity: 0.848 [95% CI: 0.692–0.932]; specificity: 0.965 [95% CI: 0.946–0.977]; and AUC: 0.98 [95% CI: 0.96–0.99]) and CP classification (sensitivity: 0.943 [95% CI: 0.927–0.955]; specificity: 0.894 [95% CI: 0.631–0.977]; and AUC: 0.95 [95% CI: 0.93–0.97]). Compared with human endoscopists, the CNN system performed comparably to experts and significantly better than non-experts for CP classification (CNN vs. expert: RDOR: 1.03, P = 0.9654; non-expert vs. expert: RDOR: 0.29, P = 0.0559; non-expert vs. CNN: RDOR: 0.18, P = 0.0342). In conclusion, the CNN system exhibited satisfactory diagnostic performance for CPs and could serve as a clinical diagnostic tool during colonoscopy.

Introduction

According to 2018 estimates, colorectal cancer (CRC) accounted for approximately 1,800,000 new cases and 881,000 deaths worldwide, about 1 in 10 cancer cases and deaths [1]. Approximately 85% of CRCs develop from precancerous polyps through genetic and epigenetic mechanisms, with a mean dwell time of at least 10 years [2, 3]. Therefore, early and precise detection of colorectal polyps (CPs) is of great significance for CRC prevention. Colonoscopy is the most effective and essential method for the early diagnosis and prevention of CRC because it allows neoplastic lesions to be detected and removed before they progress to invasive cancer [4]. Individuals with a single negative screening colonoscopy have been reported to have 72% lower CRC incidence and 81% lower CRC mortality than the general population [5]. Moreover, removal of colorectal polyps can significantly reduce the risk of CRC [6]. Thus, improving the diagnostic accuracy for CPs is critical for CRC prevention and treatment.

Pathologically, CPs can be categorized as inflammatory polyps, hyperplastic polyps, sessile serrated adenomas/polyps (SSAPs), and adenomas [7]. Each type carries a different risk of CRC. For instance, several studies have shown that adenomas, like SSAPs, have the highest risk of developing and progressing to CRC, whereas hyperplastic and inflammatory polyps rarely progress to CRC [7, 8]. Accurate classification of CPs is therefore vital for both endoscopists and patients, because precise differentiation minimizes unnecessary endoscopic resection and thereby reduces surgical complications, medical costs, and the workload of doctors [9].

Although colonoscopy is effective for the early diagnosis of CRC, it remains imperfect and has several fundamental limitations. First, it has a relatively high rate of misdiagnosis [10]. Second, some neoplastic lesions remain difficult to detect, even for expert endoscopists [11]. Third, the procedure is time-consuming and labor-intensive for endoscopists, which raises costs, particularly in countries with large populations. Finally, the diagnostic performance of colonoscopy depends heavily on the working experience of the endoscopist, which varies among individuals, so its diagnostic accuracy is inconsistent.

To address these shortcomings, several studies have applied artificial intelligence (AI) to improve medical diagnosis. For example, convolutional neural networks (CNNs), one of the most common network architectures in deep learning (DL), have recently shown considerable potential to help endoscopists improve the diagnostic accuracy for CPs during colonoscopy [12]. Other studies have shown that CNN systems can automatically classify CPs based on their morphological features, which can support therapeutic decision-making during colonoscopy [13–15]. Nevertheless, this technology has not yet reached maturity, and it remains controversial whether CNN systems perform better than human endoscopists and whether they are worth adopting widely.

Here, we compared the diagnostic performance between the CNN system and human endoscopists in the field of CP detection and classification.

Materials and methods

Literature search strategy

A systematic literature search was conducted online for studies that assessed the diagnostic value of the CNN system used in the field of colonoscopy for colorectal polyp detection and classification. PubMed, Web of Science, Cochrane Library, and EMBASE databases (up to April 30, 2020) were used during the search with the combination of the following terms: ([“artificial intelligence”] OR [“convolutional neural networks”] OR [“deep learning”] OR [“computer-aided”]) AND ([“colonoscopy”] OR [“endoscopy”]) AND ([“colon”] OR [“rectum”] OR [“colonic”] OR [“rectal”] OR [“colorectal”]) AND ([“polyp”] OR [“polyps”]).
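The Boolean structure of this search (terms joined by OR within each concept group, and groups joined by AND) can be sketched as a small string builder. This is an illustrative Python sketch only: the helper name `build_query` is hypothetical, and the database-specific field tags and syntax required by PubMed, EMBASE, and the other databases are not given in the text and are not reproduced here.

```python
# Concept groups copied from the search strategy described above.
TERM_GROUPS = [
    ["artificial intelligence", "convolutional neural networks", "deep learning", "computer-aided"],
    ["colonoscopy", "endoscopy"],
    ["colon", "rectum", "colonic", "rectal", "colorectal"],
    ["polyp", "polyps"],
]


def build_query(groups):
    """Join quoted terms with OR inside each group, then join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(TERM_GROUPS))
```

A record must match at least one term from every group, which is why the four groups are combined with AND.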

All sections of the retrieved articles were carefully reviewed, and their bibliographies were screened to identify any additional relevant studies.

Study selection

The inclusion criteria were as follows: (1) the study included patients with CP; (2) colonoscopy was performed to detect or classify colorectal polyps; (3) a CNN system was applied to improve the diagnostic performance of colonoscopy; (4) precise diagnostic data were presented in the article; and (5) if colorectal polyps were classified, the final pathology results were provided. The exclusion criteria were as follows: (1) abstracts, reviews, letters, comments, and case reports; (2) articles without precise data; and (3) animal studies and non-English publications.

Data extraction and quality assessment

Two independent researchers (Ye and Xi) extracted data from the included studies. The extracted information included the first author’s name, publication year, country, disease studied, training material, testing material, type of diagnostic performance assessed, and the diagnostic performance of the CNN system, experts, and non-experts. Diagnostic performance was extracted from each article as the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Any disagreement between the two reviewers was resolved by discussion with a third investigator.

Renner et al. [15] provided data for both “standard-confidence predictions” and “high-confidence predictions”. Because including both could duplicate data, only the standard-confidence predictions were included. Guo et al. [16] provided both per-frame and per-video data, and the per-frame data were selected for two reasons: first, nearly all of the included studies used colonoscopy images rather than videos, so per-frame data kept the analysis consistent; second, the authors did not provide sufficient per-video data for analysis. Wang et al. [12] used four datasets to validate their CNN system, but precise data were provided only for Dataset A, so only Dataset A was included. Kudo et al. [17] provided both white-light imaging (WLI) and narrow-band imaging (NBI) images for each lesion and tested the CNN system in each imaging mode. Because including both would duplicate data, the NBI data were excluded. In addition, Renner et al. [15], Kudo et al. [17], and Ozawa et al. [18] reported data on diminutive CPs. We considered it inappropriate to add these data to the general analysis because of the risk of duplication. We initially planned a subgroup analysis of these data, but the Stata software could not perform the analysis with a sample size smaller than 4.

The methodological quality and applicability of the included studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [19].

Outcomes of interest

First, pooled sensitivity, specificity, and other diagnostic indices were calculated from the TP, FP, TN, and FN counts for the CNN system, experts, and non-experts. Second, the diagnostic odds ratio (DOR) and the area under the summary receiver operating characteristic (SROC) curve (AUC), which represent overall diagnostic performance, were calculated and compared among groups. Finally, to determine whether differences in diagnostic performance were statistically significant, the relative diagnostic odds ratio (RDOR) was compared between each pair of groups (CNN system vs. experts; CNN system vs. non-experts; experts vs. non-experts).
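The study-level indices follow directly from each 2×2 table. The Python sketch below assumes a hypothetical `TwoByTwo` container and the common 0.5 continuity correction for tables with a zero cell (neither is specified in the text), and applies it to the CNN row of Byrne et al. [27] in Table 2B. It computes descriptive per-study values only; the pooled estimates reported in the Results come from the meta-analytic models in Stata and are not reproduced by this code.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts extracted from one study (or one subgroup row of Table 2)."""
    tp: int
    fp: int
    tn: int
    fn: int

    def sensitivity(self) -> float:
        # Share of diseased cases called positive: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-diseased cases called negative: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    def diagnostic_odds_ratio(self, correction: float = 0.5) -> float:
        # DOR = (TP * TN) / (FP * FN); the continuity correction is added to
        # every cell only when some cell is zero, otherwise DOR is undefined.
        c = correction if 0 in (self.tp, self.fp, self.tn, self.fn) else 0.0
        return ((self.tp + c) * (self.tn + c)) / ((self.fp + c) * (self.fn + c))


byrne = TwoByTwo(tp=104, fp=2, tn=19, fn=4)
print(f"{byrne.sensitivity():.3f} {byrne.specificity():.3f} {byrne.diagnostic_odds_ratio():.1f}")
# prints "0.963 0.905 247.0"
```

The correction matters for rows such as the CNN result of Kudo et al. [17], whose FP count is zero.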

Statistical analyses

Statistical analyses were performed to establish diagnostic efficacy. Sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), DOR, AUC of the SROC curve, and RDOR were pooled with their 95% confidence intervals (CIs). A diagnostic tool was considered to have strong diagnostic value if its PLR was above 5 and its NLR was below 0.2 [20]. Heterogeneity among studies was evaluated using Cochran’s Q and Higgins’ I² statistics [21]. If I² was greater than 50% and the P value was less than 0.05, indicating statistically significant heterogeneity, a random-effects model was used to pool the data [22]; otherwise, a fixed-effect model was used.
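The heterogeneity rule above can be written out explicitly. For k studies with Cochran’s Q, Higgins’ I² is (Q − df)/Q with df = k − 1, truncated at zero and expressed as a percentage. In this minimal Python sketch the Q-test P value (from a χ² distribution with k − 1 degrees of freedom) is passed in rather than computed, to keep it dependency-free; the function names are hypothetical.

```python
def i_squared(q: float, k: int) -> float:
    """Higgins' I^2 as a percentage, from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values (Q smaller than its degrees of freedom) are truncated to 0.
    return max(0.0, (q - df) / q) * 100.0


def choose_model(i2_percent: float, q_p_value: float) -> str:
    """Decision rule stated in the text: random effects only when both criteria hold."""
    if i2_percent > 50.0 and q_p_value < 0.05:
        return "random-effects"
    return "fixed-effect"


print(i_squared(20.0, 5), choose_model(i_squared(20.0, 5), 0.001))
```

Because the rule requires both conditions, a high I² with a non-significant Q test still leads to a fixed-effect model under this description.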

The SROC curve was estimated using the Moses-Littenberg method [23]. Based on the AUC of the SROC curve, overall diagnostic performance was categorized into four levels: reasonable (<0.75), good (0.75–0.92), very good (0.93–0.96), and excellent (≥0.97) [24].
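The Moses-Littenberg approach regresses D = logit(TPR) − logit(FPR) (the log DOR) on S = logit(TPR) + logit(FPR) and back-transforms the fitted line D = a + bS into a curve, logit(TPR) = a/(1 − b) + (1 + b)/(1 − b) · logit(FPR). The Python sketch below assumes unweighted least squares and a 0.5 continuity correction (choices the text does not specify), fits the line to the detection rows of Table 2A, integrates the curve numerically, and applies the AUC categories listed above. Its AUC is illustrative and need not match the value reported from Stata.

```python
import math

# CNN rows for CP detection from Table 2A, as (TP, FP, TN, FN).
DETECTION = [
    (3062, 414, 9260, 1251),     # Lequan [28]
    (6404, 881, 20691, 2345),    # Wang [12]
    (7127, 83, 1203, 228),       # Urban [30]
    (3087, 398, 13057, 1226),    # Zhang [32]
    (732, 41, 4094, 20),         # Yamada [31]
    (2112, 642, 21692, 1608),    # Guo [16], short videos
    (37938, 5590, 78658, 5672),  # Guo [16], full videos
]


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def moses_littenberg(studies):
    """Unweighted least-squares fit of D = a + b*S, with 0.5 added to every cell."""
    d_vals, s_vals = [], []
    for tp, fp, tn, fn in studies:
        logit_tpr = math.log((tp + 0.5) / (fn + 0.5))
        logit_fpr = math.log((fp + 0.5) / (tn + 0.5))
        d_vals.append(logit_tpr - logit_fpr)  # log diagnostic odds ratio
        s_vals.append(logit_tpr + logit_fpr)  # proxy for the positivity threshold
    n = len(d_vals)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    return mean_d - b * mean_s, b


def sroc_auc(a: float, b: float, steps: int = 20000) -> float:
    """Trapezoidal area under logit(TPR) = a/(1-b) + (1+b)/(1-b) * logit(FPR)."""
    if not -1.0 < b < 1.0:
        raise ValueError("the back-transformed curve is only increasing for -1 < b < 1")
    intercept, slope = a / (1.0 - b), (1.0 + b) / (1.0 - b)
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0  # curve starts at (0, 0)
    for i in range(1, steps):
        fpr = i / steps
        tpr = expit(intercept + slope * logit(fpr))
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area + (1.0 - prev_fpr) * (1.0 + prev_tpr) / 2.0  # close at (1, 1)


def auc_category(auc: float) -> str:
    # Cut-offs quoted in the text [24]; treating 0.92 < AUC < 0.93 as "good"
    # is an assumption, since the quoted ranges leave that gap.
    if auc >= 0.97:
        return "excellent"
    if auc >= 0.93:
        return "very good"
    if auc >= 0.75:
        return "good"
    return "reasonable"


a, b = moses_littenberg(DETECTION)
auc = sroc_auc(a, b)
print(f"a = {a:.2f}, b = {b:.2f}, AUC = {auc:.3f} ({auc_category(auc)})")
```

The unweighted fit is the simplest variant of the method; weighted versions exist and would give a somewhat different curve.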

RDORs were estimated by multivariate meta-regression analysis [25, 26] and compared between each pair of groups to identify statistically significant differences in diagnostic performance. Deeks’ funnel plot was used to assess publication bias.
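Each RDOR reported in Table 4 is the exponentiated meta-regression coefficient, so the point estimates can be recovered from the coefficients (exp(0.033) ≈ 1.03, exp(−1.696) ≈ 0.18, exp(−1.250) ≈ 0.29). The Python sketch below uses a hypothetical helper with a normal (Wald) interval, which is an assumption on our part; the published intervals are somewhat wider, which would be consistent with a t reference distribution in the meta-regression.

```python
import math
from statistics import NormalDist


def rdor_from_coefficient(coef: float, se: float, level: float = 0.95):
    """Return RDOR = exp(coef) with a normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Coefficients and standard errors as listed in Table 4.
for label, coef, se in [
    ("CNN vs. Expert", 0.033, 0.7425),
    ("CNN vs. Non-expert", -1.696, 0.7099),
    ("Expert vs. Non-expert", -1.250, 0.5784),
]:
    rdor, lo, hi = rdor_from_coefficient(coef, se)
    print(f"{label}: RDOR {rdor:.2f} (approx. 95% CI {lo:.2f}-{hi:.2f})")
```

An RDOR below 1 means the first-named group of the comparison has the lower DOR under the coding used in the regression.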

Pooled sensitivity, specificity, accuracy, PLR, NLR, DOR, and AUC of the SROC curve were calculated using Stata version 14.0. The QUADAS-2 assessment was performed using Review Manager version 5.3. A P value of less than 0.05 was considered statistically significant.

Results

Search strategy

The initial database search identified 189 articles (102 in PubMed, 31 in Web of Science, 12 in Cochrane Library, and 44 in EMBASE). After 146 duplicates were removed, the remaining 43 articles were screened, and 15 articles that did not meet the inclusion criteria (including non-English publications, reviews, abstracts, and case reports) were excluded. After full-text assessment, articles with imprecise data or irrelevant subjects were also excluded. Eventually, 13 studies were enrolled in this meta-analysis [12–18, 27–32] (Fig 1). The PRISMA flow diagram and checklist are shown in S1 and S2 Tables, respectively.

Cohort characteristics and quality of included studies

Among the enrolled studies, 7 focused on CP detection, and the others focused on CP classification. Five studies were conducted in Japan, 4 in China, and 1 each in Germany, the USA, Norway, and Canada. All articles were published between 2016 and 2020 (Table 1). All studies reported precise data on the diagnostic performance of the CNN system; 4 studies also reported precise data on the performance of experts, and 3 on the performance of non-experts. All data on the diagnostic performance of human endoscopists concerned CP classification. Histological examination was the gold standard in the studies of CP classification. The TP, FP, FN, and TN counts extracted from each study are shown in Table 2.

Table 1. Characteristics of the studies included.

Author	Year	Country	type of endoscopes	type of CNN system	real-time use of CNN system	type of lesions	type of images	Training material	Testing material	Field focused	Testing objects
Lequan [28]	2016	China	images from online database	3D fully convolutional neural networks	N/A	polyps of any size	N/A	Images	Images	Detection	CNN
Byrne [27]	2017	Canada	190 series colonoscopes (Olympus)	deep convolutional neural networks	Yes	diminutive polyps	NBI	Videos	Videos	Classification	CNN
Chen [13]	2018	China	CF-H260AZI, PCF-Q260AZI, CF-HQ290AZI (Olympus)	N/A	N/A	diminutive polyps	NBI	Images	Images	Classification	CNN/Expert/Non-expert
Wang [12]	2018	China	Olympus Evis Lucera CV260 (SL)/CV290 (SL) and Fujinon 4400/4450 HD	N/A	Yes	polyps of any size	N/A	Images	Images	Detection	CNN
Renner [15]	2018	Germany	Olympus Evis Exera III CF–HQ 190 colonoscopes	computer-assisted optical biopsy	N/A	polyps of any size	WLI/NBI	Images	Images	Classification	CNN/Expert
Mori [14]	2018	Japan	CFH290ECI colonoscopes (Olympus)	N/A	Yes	diminutive polyps	NBI and methylene blue staining modes	Images	Images	Classification	CNN/Expert/Non-expert
Shin [29]	2018	Norway	images from online database	N/A	N/A	polyps of any size	N/A	Images	Images	Classification	CNN
Urban [30]	2018	USA	PCF-H190 colonoscopes (Olympus)	VGG16, VGG19, and ResNet50	Yes	polyps of any size	N/A	Images	Images	Detection	CNN
Zhang [32]	2018	China	images from online database	ResYOLO	Yes	polyps of any size	N/A	Images/	Images	Detection	CNN
Yamada [31]	2019	Japan	images from online database	Faster R-CNN with VGG16	Yes	polyps of any size	N/A	Images	Images	Detection	CNN
Kudo [17]	2019	Japan	CF-H290ECI (Olympus)	EndoBRAIN	N/A	polyps of any size	WLI/NBI	Images	WLI/NBI images	Detection	CNN/Expert/Non-expert
Guo [16]	2020	Japan	Fujinon 4450 HD	YOLOv3	Yes	polyps of any size	WLI/NBI	Images	Short/full videos	Detection	CNN
Ozawa [18]	2020	Japan	Evis Lucera and CF Type H260AL/I, PCF Type Q260AI, Q260AZI, H290I, and H290ZI (Olympus)	Single Shot MultiBox Detector	N/A	polyps of any size	WLI/NBI	Images	Images	Classification	CNN


CNN: convolutional neural networks; NBI: Narrow band imaging; ResYOLO: residual learning modules based on YOLO; YOLO: a CNN system named you only look once; WLI: White light imaging

Table 2. A. Diagnostic performance of CNN system, expert, and non-expert in the field of polyp detection. B. Diagnostic performance of CNN system, expert, and non-expert in the field of polyp classification.

A
Author	Different grouping standard	CNN system				Expert				Non-expert
		TP	FP	TN	FN	TP	FP	TN	FN	TP	FP	TN	FN
Lequan [28]		3062	414	9260	1251
Wang [12]		6404	881	20691	2345
Urban [30]		7127	83	1203	228
Zhang [32]		3087	398	13057	1226
Yamada [31]		732	41	4094	20
Guo [16]	Short videos	2112	642	21692	1608
	Full videos	37938	5590	78658	5672
B
		TP	FP	TN	FN	TP	FP	TN	FN	TP	FP	TN	FN
Byrne [27]		104	2	19	4
Chen [13]	Group 1	181	21	75	7	183	22	74	5	183	29	67	5
	Group 2					184	33	63	4	176	33	63	12
	Group 3									154	22	74	34
	Group 4									158	11	85	30
Renner [15]		48	18	30	4	48	12	36	4
Mori [14]	Proximal-rectosigmoid	167	9	21	12	300	12	48	58	278	20	40	80
	Rectosigmoid	95	6	135	5	176	14	268	24	161	30	252	39
Kudo [17]		1260	0	700	40	603	20	330	20	920	40	460	380
Ozawa [18]		1073	175	74	99
Shin [29]		180	13	157	16


CNN: convolutional neural networks; FN: false-negative; FP: false-positive; NBI: narrow band imaging; TN: true negative; TP: true positive; WLI: white light imaging.

Based on the QUADAS-2 assessment, the overall quality of the 13 included studies was considered moderate (Fig 2). Eleven studies were considered high quality, with low risk in at least 5 of the 7 QUADAS-2 domains. In the patient selection domain, 2 studies were at risk of bias because a case-control design was not avoided [18, 27]. Moreover, 3 studies raised high concern regarding applicability [13, 17, 18]. In the index test domain, 2 studies had a high risk of bias [13, 17]. Finally, only one study raised high concern regarding the applicability of the reference standard [14].

Application in the field of colorectal polyp detection

Diagnostic performance of CNN system

The diagnostic performance of the CNN system for CP detection is shown in Fig 3. The pooled sensitivity and specificity were 0.848 (95% CI: 0.692–0.932) and 0.965 (95% CI: 0.946–0.977), respectively, with significant heterogeneity in both sensitivity (I² = 99.91%, P < 0.001) and specificity (I² = 99.78%, P < 0.001). The pooled PLR, NLR, and DOR were 24.060 (95% CI: 14.939–38.750), 0.158 (95% CI: 0.073–0.341), and 152.325 (95% CI: 51.654–449.202), respectively, and the AUC of the SROC curve was 0.98 (95% CI: 0.96–0.99). Because the PLR exceeded 5 and the NLR was below 0.2, these results indicate that the CNN system is an effective method for detecting colorectal polyps.
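As a back-of-envelope consistency check, the likelihood ratios follow from sensitivity and specificity: PLR = Sens/(1 − Spec), NLR = (1 − Sens)/Spec, and DOR = PLR/NLR. Plugging the pooled detection estimates into this Python sketch gives values close to, but not identical with, the reported ones, presumably because the software pools each index in its own model. The function names are hypothetical.

```python
def likelihood_ratios(sens: float, spec: float):
    plr = sens / (1.0 - spec)   # PLR = sensitivity / (1 - specificity)
    nlr = (1.0 - sens) / spec   # NLR = (1 - sensitivity) / specificity
    return plr, nlr, plr / nlr  # DOR = PLR / NLR


def strong_diagnostic_value(plr: float, nlr: float) -> bool:
    # Criterion quoted in Statistical analyses [20]: PLR > 5 and NLR < 0.2.
    return plr > 5.0 and nlr < 0.2


plr, nlr, dor = likelihood_ratios(0.848, 0.965)
print(f"PLR {plr:.1f}, NLR {nlr:.3f}, DOR {dor:.0f}, strong: {strong_diagnostic_value(plr, nlr)}")
# prints "PLR 24.2, NLR 0.158, DOR 154, strong: True"
```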

Table 3. Diagnostic performance of CNN system, expert, and non-expert in the field of colorectal polyp classification.

Object	Sensitivity (95% CI)	Specificity (95% CI)	PLR (95% CI)	NLR (95% CI)	DOR (95% CI)	AUC of SROC (95% CI)
CNN	0.943 [0.927–0.955]	0.894 [0.631–0.977]	8.911 [2.110–37.622]	0.064 [0.043–0.094]	139.052 [22.978–841.481]	0.95 [0.93–0.97]
Expert	0.944 [0.892–0.972]	0.848 [0.732–0.919]	6.198 [3.416–11.247]	0.066 [0.034–0.127]	94.383 [39.547–225.251]	0.96 [0.94–0.98]
Non-expert	0.859 [0.769–0.918]	0.811 [0.718–0.878]	4.544 [3.122–6.614]	0.174 [0.109–0.277]	26.191 [15.870–43.225]	0.90 [0.87–0.93]


CNN: convolutional neural networks; DOR: diagnostic odds ratio; NLR: negative likelihood ratio; PLR: positive likelihood ratio; SROC: summary receiver operating characteristic.

Subgroup analysis without the data of short or full videos

Because the study by Guo et al. [16] contributed video data with a large sample size, which might skew the overall result, we performed a subgroup analysis excluding it.

In this subgroup analysis, the pooled sensitivity and specificity were 0.878 (95% CI: 0.702–0.956) and 0.968 (95% CI: 0.945–0.981), respectively. The PLR, NLR, DOR, and AUC of the SROC curve were 27.314 (95% CI: 14.985–49.788), 0.126 (95% CI: 0.047–0.338), 216.250 (95% CI: 53.307–877.255), and 0.98 (95% CI: 0.97–0.99), respectively. The results are shown in S3 Table.

Application in the field of colorectal polyp classification

Diagnostic performance of CNN system

The pooled sensitivity and specificity were 0.943 (95% CI: 0.927–0.955) and 0.894 (95% CI: 0.631–0.977), respectively (Fig 4), with significant heterogeneity in both sensitivity (I² = 94.77%, P < 0.001) and specificity (I² = 98.91%, P < 0.001). The PLR, NLR, DOR, and AUC of the SROC curve were 8.911 (95% CI: 2.110–37.622), 0.064 (95% CI: 0.043–0.094), 139.052 (95% CI: 22.978–841.481), and 0.95 (95% CI: 0.93–0.97), respectively.

Diagnostic performance of expert and non-expert

For CP classification by experts, the pooled sensitivity, specificity, PLR, NLR, DOR, and AUC of the SROC curve were 0.944 (95% CI: 0.892–0.972), 0.848 (95% CI: 0.732–0.919), 6.198 (95% CI: 3.416–11.247), 0.066 (95% CI: 0.034–0.127), 94.383 (95% CI: 39.547–225.251), and 0.96 (95% CI: 0.94–0.98), respectively. Heterogeneity was significant for both sensitivity (I² = 93.68%, P < 0.001) and specificity (I² = 94.03%, P < 0.001).

For non-experts, the pooled sensitivity, specificity, PLR, NLR, DOR, and AUC of the SROC curve were 0.859 (95% CI: 0.769–0.918), 0.811 (95% CI: 0.718–0.878), 4.544 (95% CI: 3.122–6.614), 0.174 (95% CI: 0.109–0.277), 26.191 (95% CI: 15.870–43.225), and 0.90 (95% CI: 0.87–0.93), respectively. Heterogeneity was significant for both sensitivity (I² = 91.38%, P < 0.001) and specificity (I² = 88.75%, P < 0.001).

All data are summarized in Table 3.

The comparison of diagnostic performance among CNN system, expert, and non-expert

For CP classification, the AUC of the SROC curve was 0.95 (95% CI: 0.93–0.97) for the CNN system, 0.96 (95% CI: 0.94–0.98) for experts, and 0.90 (95% CI: 0.87–0.93) for non-experts (Fig 5). Pairwise comparisons based on the RDOR showed that the diagnostic performance of the CNN system was comparable to that of experts but significantly better than that of non-experts (Table 4).

Table 4. Comparison of diagnostic performance among CNN, expert, and non-expert in the field of colorectal polyp classification.

Object	Coefficient	Standard error	RDOR	95% CI	P
CNN vs. Expert	0.033	0.7425	1.03	0.20–5.30	0.9654
CNN vs. Non-expert	-1.696	0.7099	0.18	0.04–0.86	0.0342
Expert vs. Non-expert	-1.250	0.5784	0.29	0.08–1.04	0.0559


CNN: convolutional neural networks; RDOR: relative diagnostic odds ratio.

Publication bias and identification of sources of heterogeneity

Deeks’ funnel plot asymmetry test showed no significant publication bias in the pooled results of the CNN system for either CP detection (P > |t| = 0.430) or CP classification (P > |t| = 0.196) (Fig 6A and 6B). Because notable heterogeneity was observed in the pooled analyses of the CNN system for both CP detection and classification, meta-regression was conducted to identify its sources; however, no potential source of heterogeneity was identified.

Fig 6. (A) CNN system for CP detection. (B) CNN system for CP classification.
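Deeks’ test regresses each study’s log DOR on 1/√ESS, weighting by ESS, where the effective sample size is ESS = 4·n1·n2/(n1 + n2) for n1 diseased (TP + FN) and n2 non-diseased (FP + TN) cases; a slope that differs from zero indicates funnel-plot asymmetry. The Python sketch below computes the slope and its t statistic (k − 2 degrees of freedom) for the CNN classification rows of Table 2B, assuming a 0.5 continuity correction. It omits the P value, which needs a t distribution, and is not claimed to reproduce the Stata output reported above; the function names are hypothetical.

```python
import math

# CNN rows for CP classification from Table 2B, as (TP, FP, TN, FN).
CLASSIFICATION = [
    (104, 2, 19, 4),       # Byrne [27]
    (181, 21, 75, 7),      # Chen [13], group 1
    (48, 18, 30, 4),       # Renner [15]
    (167, 9, 21, 12),      # Mori [14], proximal-rectosigmoid
    (95, 6, 135, 5),       # Mori [14], rectosigmoid
    (1260, 0, 700, 40),    # Kudo [17]
    (1073, 175, 74, 99),   # Ozawa [18]
    (180, 13, 157, 16),    # Shin [29]
]


def effective_sample_size(n1: float, n2: float) -> float:
    return 4.0 * n1 * n2 / (n1 + n2)


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS); returns (slope, t statistic)."""
    xs, ys, ws = [], [], []
    for tp, fp, tn, fn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
        xs.append(1.0 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted means
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance of the weighted fit, k - 2 degrees of freedom.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, slope / se_slope


slope, t_stat = deeks_regression(CLASSIFICATION)
print(f"slope = {slope:.2f}, t = {t_stat:.2f} on {len(CLASSIFICATION) - 2} df")
```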

Discussion

This work systematically reviewed the current status of CNN systems applied to CP detection and classification and quantitatively compared the diagnostic value of the CNN system with that of human endoscopists. Our main finding was that, for CP classification, the diagnostic performance of the CNN system was comparable to that of experts and significantly superior to that of non-experts.

The American Society for Gastrointestinal Endoscopy published the Preservation and Incorporation of Valuable Endoscopic Innovations (PIVI) statement in 2015 to address the resect-and-discard strategy [33]. The statement set the threshold for a diagnose-and-leave strategy for small colorectal polyps at a negative predictive value (NPV) of ≥90%, and the threshold for a resect-and-discard strategy at ≥90% agreement with histopathology-based post-polypectomy surveillance intervals [34]. These thresholds are demanding and hard to achieve, even for experienced endoscopists. In addition, the task is time-consuming and labor-intensive for endoscopists. Some studies have shown that endoscopic detection and optical prediction have relatively low diagnostic accuracy, particularly among non-experts [35, 36]. This calls for technological support, because evidence suggests that computer-aided diagnosis of endoscopic images using AI has the potential to surpass the diagnostic accuracy of trained specialists. AI might also provide more accurate results without interobserver variation, especially between experts and non-experts.
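To illustrate how an NPV threshold is checked, NPV = TN/(TN + FN) can be computed from any row of Table 2B. The Python sketch below uses the CNN row for rectosigmoid lesions from Mori et al. [14], assuming "positive" means a neoplastic prediction. This is only arithmetic: the PIVI threshold is defined for a specific target population and prediction setting, and whether this row matches that definition exactly is not stated here. The function name is hypothetical.

```python
def negative_predictive_value(tn: int, fn: int) -> float:
    # NPV = TN / (TN + FN): the share of negative calls that are truly negative.
    return tn / (tn + fn)


# CNN row for rectosigmoid lesions (Mori et al. [14], Table 2B): TN = 135, FN = 5.
npv = negative_predictive_value(tn=135, fn=5)
print(f"NPV = {npv:.3f}; meets the 90% threshold: {npv >= 0.90}")
# prints "NPV = 0.964; meets the 90% threshold: True"
```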

Many studies have focused on developing CNN systems to assist human endoscopists. In colonoscopy, their functions fall into two main categories: detection and classification. For CP detection, we found that the PLR, NLR, and AUC of the CNN system were 24.060 [95% CI: 14.939–38.750], 0.158 [95% CI: 0.073–0.341], and 0.98 [95% CI: 0.96–0.99], respectively. These results suggest that the CNN system is a good diagnostic tool for CP detection. Because Guo et al. [16] provided video data with a large sample size, which might skew the overall result for the CNN system, we performed a subgroup analysis excluding these data. The results changed only slightly, indicating that the findings were stable with or without the data of Guo et al. [16].

Unfortunately, we did not find data on human endoscopists for CP detection. However, some studies have shown that non-expert endoscopists achieved better diagnostic performance during endoscopy after an AI training course [37, 38]. Hence, AI technologies have potential both as clinical ancillary diagnostic tools and as a method for training endoscopists.

Furthermore, it would be highly beneficial if endoscopic observation could distinguish neoplastic from hyperplastic CPs, because removing lesions without malignant potential is costly and carries a risk of post-procedural complications [39]. Thus, precise classification of CPs can substantially improve the cost-effectiveness of colonoscopy. Nonetheless, precisely classifying the different types of CP remains difficult. For instance, conventional adenomas with indistinct borders or flat and depressed features are hard to distinguish from the surrounding normal mucosa, particularly when bowel preparation is inadequate or the mucosa is covered by mucus or intestinal residue [11]. Kuiper et al. reported that the sensitivity/specificity for classifying diminutive CPs was only 77.0%/78.8%, which is far from satisfactory [40]. In this study, the sensitivity/specificity of non-experts for CP classification was 85.9%/81.1%. As such, the benefits of optical CP classification might remain limited to experts, yet not every endoscopist is an expert. AI technology may help overcome this limitation: we found that the diagnostic performance of the CNN system was significantly better than that of non-experts. However, possibly because classification is a more complex task, the DOR of the CNN system for CP classification (139.052 [95% CI: 22.978–841.481]) was lower than that for CP detection (152.325 [95% CI: 51.654–449.202]). Similarly, a CNN-DL system has been used to diagnose and classify proximal gastric precancerous conditions, including chronic atrophic gastritis, intestinal metaplasia, and dysplasia [41]. That system achieved a sensitivity of 93.5% and an accuracy of 98.3%, much better than both less and more experienced endoscopists.

However, a CNN system is, in essence, an algorithm that cannot make logical decisions as humans do. It can be used as a training or auxiliary tool to enhance the performance of endoscopists, but it cannot entirely replace them. Moreover, CNN technology has several limitations.

First, the images and videos used to train CNN systems are mostly of high quality, which introduces selection bias. These systems are often unable to identify lesions in low-quality material, and their diagnostic performance may be excellent on the training set but weaker in clinical practice.

Second, images and videos of rare lesions, including subtle flat colonic lesions and uncommon morphological types, are scarce in both institutional and online databases, so CNN systems are inadequately trained on them. This leads to high misdiagnosis rates for infrequent diseases.

Third, most studies included in this review trained their CNN systems on still images or on frames extracted from colonoscopy videos, which might limit real-time implementation. Moreover, owing to limited computing power and complex processing pipelines, the decision latency of most systems was unsatisfactory and could disturb the endoscopist during colonoscopy. Real-time operation during endoscopy should therefore be a design requirement.

Finally, CNN systems and other AI techniques are algorithms that make decisions based on past information. This means they cannot reason logically or readily transfer decisions across domains. AI excels when training data are abundant and exhaustive; however, its performance deteriorates when it encounters previously unseen features and objects, because it struggles to extrapolate knowledge gathered in the past to new environments [42]. In such scenarios, humans appear to perform better than AI [43].

With the rapid advancement of AI technology, an ideal CNN system may eventually be developed to overcome these limitations. Such a system might precisely distinguish different lesions, including rare ones, from the surrounding normal mucosa; assist endoscopists in real time with almost imperceptible latency; and provide the type, location, size, depth, and other relevant information about lesions.

This study has some limitations. First, studies in this field are limited because the application of CNN systems in endoscopy has not yet matured. Second, the sample size for the comparison between the CNN system and human endoscopists was small, which might cause selection bias. Third, although no publication bias was detected, selective reporting bias might still exist because letters, reviews, and articles not published in English were excluded. Fourth, although meta-regression analysis was performed to identify potential sources of heterogeneity, the exploration of heterogeneity might remain inadequate owing to the limited sample size and the limited variables collected from the included studies. Finally, most of the included studies were retrospective and used different types of training and testing materials, which may introduce bias.

Conclusion

In conclusion, our systematic review and meta-analysis suggests that the CNN system achieves satisfactory diagnostic performance for CP detection. For CP classification, the CNN system achieved diagnostic performance comparable to that of experts and better than that of non-experts. Despite its limitations, given its relatively high diagnostic accuracy, the CNN system could be adopted more widely in clinical practice to enhance the diagnostic performance of endoscopists.

Supporting information

S1 Table. PRISMA flow diagram.

(DOC)

S2 Table. PRISMA checklist.

(DOC)

S3 Table. Subgroup analysis without the data of short or full videos in the field of CP detection.

(DOCX)

Acknowledgments

The authors thank Dr. Peng Jiang and Dr. Hai-Feng Tang for their critical reading and informative advice during the study, and Freescience for language polishing.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. doi: 10.3322/caac.21492.
2. Kuntz KM, Lansdorp-Vogelaar I, Rutter CM, Knudsen AB, van Ballegooijen M, Savarino JE, et al. A systematic comparison of microsimulation models of colorectal cancer: the role of assumptions about adenoma progression. Med Decis Making. 2011;31(4):530–9. doi: 10.1177/0272989X11408730.
3. Strum WB. Colorectal Adenomas. N Engl J Med. 2016;375(4):389–90. doi: 10.1056/NEJMc1604867.
4. Montminy EM, Jang A, Conner M, Karlitz JJ. Screening for Colorectal Cancer. Med Clin North Am. 2020;104(6):1023–36. Epub 2020/10/26. doi: 10.1016/j.mcna.2020.08.004.
5. Pilonis ND, Bugajski M, Wieszczy P, Franczyk R, Didkowska J, Wojciechowska U, et al. Long-Term Colorectal Cancer Incidence and Mortality After a Single Negative Screening Colonoscopy. Ann Intern Med. 2020;173(2):81–91. Epub 2020/05/26. doi: 10.7326/M19-2477.
6. Li D, Liu L, Fevrier HB, Alexeeff SE, Doherty AR, Raju M, et al. Increased Risk of Colorectal Cancer in Individuals With a History of Serrated Polyps. Gastroenterology. 2020;159(2):502–11.e2. Epub 2020/04/12. doi: 10.1053/j.gastro.2020.04.004.
7. Ijspeert JEG, Bastiaansen BAJ, van Leerdam ME, Meijer GA, van Eeden S, Sanduleanu S, et al. Development and validation of the WASP classification system for optical diagnosis of adenomas, hyperplastic polyps and sessile serrated adenomas/polyps. Gut. 2016;65(6):963–70. doi: 10.1136/gutjnl-2014-308411.
8. Ijspeert JEG, Bevan R, Senore C, Kaminski MF, Kuipers EJ, Mroz A, et al. Detection rate of serrated polyps and serrated polyposis syndrome in colorectal cancer screening cohorts: a European overview. Gut. 2017;66(7):1225–32. doi: 10.1136/gutjnl-2015-310784.
9. Allen JE, Sharma P. Polyp characterization at colonoscopy: Clinical implications. Best Pract Res Clin Gastroenterol. 2017;31(4):435–40. Epub 2017/08/27. doi: 10.1016/j.bpg.2017.07.001.
10. van Rijn JC, Reitsma JB, Stoker J, Bossuyt PM, van Deventer SJ, Dekker E. Polyp miss rate determined by tandem colonoscopy: a systematic review. Am J Gastroenterol. 2006;101(2):343–50. doi: 10.1111/j.1572-0241.2006.00390.x.
11. Yamada M, Sakamoto T, Otake Y, Nakajima T, Kuchiba A, Taniguchi H, et al. Investigating endoscopic features of sessile serrated adenomas/polyps by using narrow-band imaging with optical magnification. Gastrointest Endosc. 2015;82(1):108–17. doi: 10.1016/j.gie.2014.12.037.
12. Wang P, Xiao X, Glissen Brown JR, Berzin TM, Tu M, Xiong F, et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng. 2018;2(10):741–8. doi: 10.1038/s41551-018-0301-3.
13. Chen PJ, Lin MC, Lai MJ, Lin JC, Lu HH, Tseng VS. Accurate Classification of Diminutive Colorectal Polyps Using Computer-Aided Analysis. Gastroenterology. 2018;154(3):568–75. doi: 10.1053/j.gastro.2017.10.010.
14. Mori Y, Kudo SE, Misawa M, Saito Y, Ikematsu H, Hotta K, et al. Real-Time Use of Artificial Intelligence in Identification of Diminutive Polyps During Colonoscopy: A Prospective Study. Ann Intern Med. 2018;169(6):357–66. doi: 10.7326/M18-0249.
15. Renner J, Phlipsen H, Haller B, Navarro-Avila F, Saint-Hill-Febles Y, Mateus D, et al. Optical classification of neoplastic colorectal polyps—a computer-assisted approach (the COACH study). Scand J Gastroenterol. 2018;53(9):1100–6. doi: 10.1080/00365521.2018.1501092.
16. Guo Z, Nemoto D, Zhu X, Li Q, Aizawa M, Utano K, et al. A polyp detection algorithm can detect small polyps: An ex vivo reading test compared with endoscopists. Dig Endosc. 2020. doi: 10.1111/den.13670.
17. Kudo SE, Misawa M, Mori Y, Hotta K, Ohtsuka K, Ikematsu H, et al. Artificial Intelligence-assisted System Improves Endoscopic Identification of Colorectal Neoplasms. Clin Gastroenterol Hepatol. 2019. doi: 10.1016/j.cgh.2019.09.009.
18. Ozawa T, Ishihara S, Fujishiro M, Kumagai Y, Shichijo S, Tada T. Automated endoscopic detection and classification of colorectal polyps using convolutional neural networks. Therap Adv Gastroenterol. 2020;13:1756284820910659. doi: 10.1177/1756284820910659.
19. Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. doi: 10.7326/0003-4819-155-8-201110180-00009.
20. Deeks JJ. Systematic reviews in health care: Systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323(7305):157–62. doi: 10.1136/bmj.323.7305.157.
21. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60. doi: 10.1136/bmj.327.7414.557.
22. Jackson D, White IR, Thompson SG. Extending DerSimonian and Laird’s methodology to perform multivariate random effects meta-analyses. Stat Med. 2010;29(12):1282–97. doi: 10.1002/sim.3602.
23. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993;12(14):1293–316. doi: 10.1002/sim.4780121403.
24. Jones CM, Athanasiou T. Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg. 2005;79(1):16–20. doi: 10.1016/j.athoracsur.2004.09.040.
25. Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30(20):2481–98. doi: 10.1002/sim.4172.
26. White IR. Multivariate random-effects meta-regression: Updates to mvmeta. Stata J. 2011;11(2):255–70. doi: 10.1177/1536867X1101100206.
27. Byrne MF, Chapados N, Soudan F, Oertel C, Linares Pérez M, Kelly R, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut. 2019;68(1). doi: 10.1136/gutjnl-2017-314547.
28. Yu L, Chen H, Dou Q, Qin J, Heng PA. Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos. IEEE J Biomed Health Inform. 2017;21(1):65–75. doi: 10.1109/JBHI.2016.2637004.
29. Shin Y, Balasingham I. Automatic polyp frame screening using patch based combined feature and dictionary learning. Comput Med Imaging Graph. 2018;69:33–42. doi: 10.1016/j.compmedimag.2018.08.001.
30. Urban G, Tripathi P, Alkayali T, Mittal M, Jalali F, Karnes W, et al. Deep Learning Localizes and Identifies Polyps in Real Time With 96% Accuracy in Screening Colonoscopy. Gastroenterology. 2018;155(4):1069–78.e8. doi: 10.1053/j.gastro.2018.06.037.
31. Yamada M, Saito Y, Imaoka H, Saiko M, Yamada S, Kondo H, et al. Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy. Sci Rep. 2019;9(1):14465. doi: 10.1038/s41598-019-50567-5.
32. Zhang R, Zheng Y, Poon CCY, Shen D, Lau JYW. Polyp detection during colonoscopy using a regression-based convolutional neural network with a tracker. Pattern Recognit. 2018;83:209–19. doi: 10.1016/j.patcog.2018.05.026.
33. Chandrasekhara V, Desilets D, Falk GW, Inoue H, Romanelli JR, Savides TJ, et al. The American Society for Gastrointestinal Endoscopy PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) on peroral endoscopic myotomy. Gastrointest Endosc. 2015;81(5). doi: 10.1016/j.gie.2014.12.007.
34. Rex DK, Kahi C, O’Brien M, Levin TR, Pohl H, Rastogi A, et al. The American Society for Gastrointestinal Endoscopy PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) on real-time endoscopic assessment of the histology of diminutive colorectal polyps. Gastrointest Endosc. 2011;73(3):419–22. doi: 10.1016/j.gie.2011.01.023.
35. Rees CJ, Rajasekhar PT, Wilson A, Close H, Rutter MD, Saunders BP, et al. Narrow band imaging optical diagnosis of small colorectal polyps in routine clinical practice: the Detect Inspect Characterise Resect and Discard 2 (DISCARD 2) study. Gut. 2017;66(5):887–95. doi: 10.1136/gutjnl-2015-310584.
36. Schachschal G, Mayr M, Treszl A, Balzer K, Wegscheider K, Aschenbeck J, et al. Endoscopic versus histological characterisation of polyps during screening colonoscopy. Gut. 2014;63(3):458–65. doi: 10.1136/gutjnl-2013-304562.
37. Sehgal V, Rosenfeld A, Graham DG, Lipman G, Bisschops R, Ragunath K, et al. Machine Learning Creates a Simple Endoscopic Classification System that Improves Dysplasia Detection in Barrett’s Oesophagus amongst Non-expert Endoscopists. Gastroenterol Res Pract. 2018;2018:1872437. doi: 10.1155/2018/1872437.
38. Cai S-L, Li B, Tan W-M, Niu X-J, Yu H-H, Yao L-Q, et al. Using a deep learning system in endoscopy for screening of early esophageal squamous cell carcinoma (with video). Gastrointest Endosc. 2019;90(5). doi: 10.1016/j.gie.2019.06.044.
39. Abu Dayyeh BK, Thosani N, Konda V, Wallace MB, Rex DK, Chauhan SS, et al. ASGE Technology Committee systematic review and meta-analysis assessing the ASGE PIVI thresholds for adopting real-time endoscopic assessment of the histology of diminutive colorectal polyps. Gastrointest Endosc. 2015;81(3). doi: 10.1016/j.gie.2014.12.022.
40. Kuiper T, Marsman WA, Jansen JM, van Soest EJ, Haan YCL, Bakker GJ, et al. Accuracy for optical diagnosis of small colorectal polyps in nonacademic settings. Clin Gastroenterol Hepatol. 2012;10(9). doi: 10.1016/j.cgh.2012.05.004.
41. Guimarães P, Keller A, Fehlmann T, Lammert F, Casper M. Deep-learning based detection of gastric precancerous conditions. Gut. 2020;69(1):4–6. doi: 10.1136/gutjnl-2019-319347.
42. Saxe A, Nelli S, Summerfield C. If deep learning is the answer, what is the question? Nat Rev Neurosci. 2020. Epub 2020/11/18. doi: 10.1038/s41583-020-00395-8.
43. Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ. Building machines that learn and think like people. Behav Brain Sci. 2017;40:e253. Epub 2016/11/25. doi: 10.1017/S0140525X16001837.

PLoS One. doi: 10.1371/journal.pone.0246892.r001

Decision Letter 0

Ping He

22 Sep 2020

PONE-D-20-22694

Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: a systematic review and meta-analysis

PLOS ONE

Dear Dr. Xu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 06 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Ping He, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include the date(s) on which you accessed the databases or records to obtain the data used in your study.

3. Thank you for quantifying study heterogeneity.

Please provide more detailed reporting on your results, for example, by reporting your Q or I² statistics.

4. We note that you state in your manuscript "The design and protocol of the present study were approved by our institutional review board."

Since this is a systematic review and does not include non-public data, this statement is not necessary and may be taken out.

If you would like to keep this statement, please provide the full name of your ethics committee and the ethics approval number.

5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information

Additional Editor Comments:

There are serious problems with the manuscript. Please carefully polish and revise your manuscript according to the comments of the reviewers.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is an interesting systematic review article written by Xu et al. This review is well-written and described using appropriate statistical methods. The authors conducted meta-analysis to compare the diagnostic performance of artificial intelligence (AI) for detecting and classifying the colorectal polyp to the human endoscopists. This study must be of interest to the reader and will provide useful knowledge to the reader. I have one comment for the authors.

1. Shouldn't the classification of polyps be divided into normal observation, magnification (80×) and super magnification (520×, microscopic observation) for analysis? Please revisit and consider this point.

Reviewer #2: The meta-analysis presented by Dr Xu and colleagues addresses an interesting topic, that is the performance of convolutional neural networks for detection and classification of colorectal polyps.

The fact that computer-aided colonoscopy and artificial intelligence are hot topics is demonstrated by the very recent publication of other dedicated systematic reviews and meta-analyses, such as Barua et al, Endoscopy 2020 or Hassan et al, GIE 2020 (detection) and Lui et al, GIE 2020 (detection and classification).

In this meta-analysis, the authors focus only on CNNs, a specific type of artificial intelligence.

Unfortunately, the paper presented by Dr. Xu has some important limitations:

- The search strategy seems well designed, and I appreciated the effort to analyze the quality of included studies and the possibility of publication bias. However, at least one recent full study about CNNs that met the inclusion criteria was not included in the analysis (Urban 2018, Gastroenterology); also some abstracts (such as Misawa 2019 GIE, Matsui 2019 GIE, Lui 2019 GIE) were not included. Whether or not to include abstracts is debatable (even though they are usually included in a meta-analysis); however, abstracts were not listed among the exclusion criteria.

- Data extraction from the selected studies was not so rigorous and clear.

For example, apparently for reference 13 (Renner et al) authors have included the performances of CNN considering either high-confidence predictions or "standard" predictions (see table 2), with potential duplication of data and overestimation of performance.

Moreover, for reference 25 (Guo et al) the authors considered per-frame sensitivity and specificity of short and long videos; however, in order to provide clinically useful information (similar to polyp or adenoma detection rates), per-video sensitivity and specificity (reported in table 3 of the cited paper for the 100 short videos) should have been used instead. Table 3 of the paper by Guo et al also reports the diagnostic performance of 2 experts and 2 non-expert physicians that were not included in the meta-analysis.

Similarly, for Wang et al (ref 31) the 1633 polyps (and not the total number of polyps images) should have been considered.

Considering these problems in data extraction, the reported pooled diagnostic performances of CNN and human endoscopists are not reliable.

- I think that the use of the Fagan Nomogram to calculate post-test probability considering positive and negative likelihood ratios is not really indicated in this field.

- The differential performance according to polyp size (diminutive vs. non-diminutive polyps) should have been analysed.

- The paper requires a thorough linguistic revision (grammatical mistakes, repetitions…), possibly by a native English-speaking reviser.
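For readers unfamiliar with the Fagan nomogram mentioned above: it is a graphical form of Bayes' theorem in odds form, post-test odds = pre-test odds × likelihood ratio. The minimal Python sketch below performs the same calculation numerically. The function names and the 30% pre-test probability are hypothetical; the output depends entirely on the pre-test probability chosen.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) = (sens / (1 - spec), (1 - sens) / spec)."""
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form, the calculation a Fagan nomogram performs graphically."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled CNN detection estimates from the abstract; the pre-test probability is hypothetical.
lr_pos, lr_neg = likelihood_ratios(0.848, 0.965)
print(f"after a positive call: {post_test_probability(0.30, lr_pos):.2f}")
print(f"after a negative call: {post_test_probability(0.30, lr_neg):.2f}")
```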

Other observations:

- Reference 10 does not refer to detection of colonic lesions, but of early esophageal squamous cell carcinoma/preneoplastic lesions in the esophagus. The sentence in the introduction related to this reference should be revised.

- Both Table 1 and Table 2 should contain the reference numbers of the included papers to make the tables easier to read. Moreover, I suggest listing the papers by year of publication.

- Table 1 should also include more details about the imaging modalities included in the different studies. For example, the use of dedicated endocytoscopes in the studies referred as 12 and 25 should be reported. Also information about the use of real-time analysis by the CNN systems, and the type of lesions included (only diminutive polyps or polyps of any size) in the different studies should be highlighted in the table.

- The paper states that 7 of the included studies focused on detection; however, in Table 1 detection is reported as the field of focus for only 6 papers. Similarly, 5 studies should include expert human endoscopists, but only four papers are reported in Tables 1 and 2.

- The paper by Yu (2016), included in table 1 and 2 is not reported in the references.

Reviewer #3: There are some spelling mistakes:

- line 186, page 15: Results instead of resules

- line 211, page 17, table 2: proximal rectosigmoid instead of poximal rectosigmoid

- line 360, page 25: ordinary physicians instead of phyisicians

The technical explanations of the different CNN systems you compared in your analysis are missing: Why are some CNN systems better for CP detection and others for CP classification? Is it justified to compare different CNN systems that provide different features? Do they have different characteristics concerning deep learning?

A description of the different CP classification systems used by different authors in different studies is also missing.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Naoki Hosoe

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Feb 16;16(2):e0246892. doi: 10.1371/journal.pone.0246892.r002

Author response to Decision Letter 0

19 Oct 2020

Dear Editor:

Thank you and the reviewers for your valuable suggestions. We have carefully read the comments and revised the manuscript accordingly. Our point-by-point responses are listed below. We greatly appreciate your time and effort in improving our manuscript for publication.

Sincerely,

Yixin Xu

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Response:

We have revised the manuscript following the style requirements.

2. Please include the date(s) on which you accessed the databases or records to obtain the data used in your study.

Response:

We completed the data collection on April 30, 2020. It was mentioned on Page 6, Line 117.

3. Thank you for quantifying study heterogeneity.

Please provide more detailed reporting on your results, for example, by reporting your Q or I² statistics.

Response:

In the field of CP detection, we added the I² and P values for the CNN system (Page 14, Lines 248–249), for experts (Pages 15–16, Lines 274–276), and for non-experts (Page 16, Lines 282–284).

In the field of CP classification, we added the I² and P values for the CNN system (Page 16, Lines 289–290), for experts (Page 17, Lines 309–310), and for non-experts (Page 17, Lines 314–315).

All added data are marked in green.

4. We note that you state in your manuscript "The design and protocol of the present study were approved by our institutional review board."

Since this is a systematic review and does not include non-public data, this statement is not necessary and may be taken out.

If you would like to keep this statement, please provide the full name of your ethics committee and the ethics approval number.

Response:

We have taken out this statement “The design and protocol of the present study were approved by our institutional review board.”, and added “As the data in this study was from previous studies, there was no need to get the ethics committee approval, follow the Declaration of Helsinki, or have patients informed consent form.” to the Ethical approval part.

Response:

We have added the ORCID to the Editorial Manager.

Response:

We have added the supporting information at the end of the manuscript and cited on Page 10, Line 206.

Reviewer #1

This is an interesting systematic review article written by Xu et al. This review is well-written and described using appropriate statistical methods. The authors conducted meta-analysis to compare the diagnostic performance of artificial intelligence (AI) for detecting and classifying the colorectal polyp to the human endoscopists. This study must be of interest to the reader and will provide useful knowledge to the reader. I have one comment for the authors.

Response:

Thanks a lot for your summary and kind remarks.

In response to your constructive suggestion, we carefully reviewed all the references in our meta-analysis and found that only a few of the included studies reported the magnification of the images used to train the CNN system; only Kudo et al. and Mori et al. used super magnification (520×). As a result, we did not divide the classification analysis into magnification subgroups. We also found that CNN classification systems applied in colonoscopy mainly focus on determining whether polyps, especially diminutive polyps, are hyperplastic or neoplastic. For neoplastic polyps, simple resection may not be sufficient, and endoscopic mucosal resection (EMR), endoscopic submucosal dissection (ESD), or even surgery is the optimal curative option. Polyp classification was performed using routine colonoscopy images (some systems used narrow-band images). Moreover, magnification colonoscopy is not currently commercially available worldwide.

Reviewer #2

The meta-analysis presented by Dr Xu and colleagues addresses an interesting topic, that is the performance of convolutional neural networks for detection and classification of colorectal polyps.

In this meta-analysis, the authors focus only on CNNs, a specific type of artificial intelligence.

Unfortunately, the paper presented by Dr. Xu has some important limitations:

Response:

Thank you for this summary of our paper.

I started writing my manuscript on January 10, 2020. Before that, I had not found any similar article when searching PubMed and other databases. I thought that the application of artificial intelligence in colonoscopy was an interesting and “hot” subject. I finished the manuscript in June and submitted it to PLOS ONE on July 22. I had not realized that three articles on this subject had already been published. The limitations you mentioned are addressed below.

1. The search strategy seems well designed, and I appreciated the effort to analyze the quality of included studies and the possibility of publication bias. However, at least one recent full study about CNNs that met the inclusion criteria was not included in the analysis (Urban 2018, Gastroenterology); also some abstracts (such as Misawa 2019 GIE, Matsui 2019 GIE, Lui 2019 GIE) were not included. Whether or not to include abstracts is debatable (even though they are usually included in a meta-analysis); however, abstracts were not listed among the exclusion criteria.

Response:

Urban 2018, Gastroenterology was already included in my analysis. I marked it in red in Table 1 and Table 2. It is also the number 30 in my reference list (in red).

We carefully read these abstracts (Misawa 2019 GIE, Matsui 2019 GIE, and Lui 2019 GIE) and found that they did not provide sufficiently precise data for analysis, so we decided not to include them. To state this explicitly, we added “abstracts” to the exclusion criteria and marked it in red (Page 6, Line 132).

2. Data extraction from the selected studies was not so rigorous and clear.

Response:

Thank you for this constructive suggestion. We carefully reviewed the article by Renner et al. and found that the high-confidence predictions might duplicate data from the standard-confidence predictions. We therefore deleted the high-confidence data and recalculated the entire classification section. There were many changes, all marked in red (Page 16, Lines 288, 289, 291, 292, 294, 295; Page 17, Lines 307-309; Page 19, Line 346). Table 4, Table 6, Fig. 5, Fig. 6, Fig. 8, and Fig. 9 were remade. In addition, we added notes to Fig. 3 (Page 15, Lines 261-264) and Fig. 5 (Page 17, Lines 298-299). All these changes are marked in red.

3. Moreover, for reference 25 (Guo et al.) the authors considered the per-frame sensitivity and specificity of short and long videos; however, to provide clinically useful information (similar to polyp or adenoma detection rates), the per-video sensitivity and specificity (reported in Table 3 of the cited paper for the 100 short videos) should have been used instead. Table 3 of the paper by Guo et al. also reports the diagnostic performance of 2 expert and 2 non-expert physicians, which was not included in the meta-analysis.

Response:

There are two reasons why we used per-frame rather than per-video sensitivity and specificity. First, almost all of the articles included in our analysis used colonoscopy images instead of videos to compare the diagnostic performance of the CNN system and human endoscopists. To keep the whole analysis consistent, we chose per-frame sensitivity and specificity. Second, Guo et al. did not provide enough per-video data for us to analyze. Table 3 of their paper reports only per-video sensitivity, not per-video specificity, so we could not derive the counts of TP, TN, FP, and FN.
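The TP, FP, FN, and TN counts are what a diagnostic test accuracy meta-analysis pools. The Python sketch below, using made-up counts and a hypothetical helper class `TwoByTwo`, shows how sensitivity, specificity, the likelihood ratios, and the diagnostic odds ratio follow from the four counts, and why a reported sensitivity alone cannot recover the table: two tables with identical sensitivity can have very different specificity.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical helper holding the counts of one 2x2 diagnostic table."""

    tp: int  # condition present, test positive
    fp: int  # condition absent, test positive
    fn: int  # condition present, test negative
    tn: int  # condition absent, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity() / (1.0 - self.specificity())

    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity()) / self.specificity()

    def dor(self) -> float:
        # Diagnostic odds ratio: (TP x TN) / (FP x FN), which also equals PLR / NLR
        return (self.tp * self.tn) / (self.fp * self.fn)


# Made-up counts for illustration; not taken from any included study.
table = TwoByTwo(tp=90, fp=15, fn=10, tn=85)
print(f"sens={table.sensitivity():.2f} spec={table.specificity():.2f} "
      f"PLR={table.plr():.2f} NLR={table.nlr():.3f} DOR={table.dor():.1f}")

# A reported sensitivity fixes TP and FN but not FP and TN:
# this second table has the same sensitivity and a much lower specificity.
other = TwoByTwo(tp=90, fp=60, fn=10, tn=40)
print(other.sensitivity() == table.sensitivity(), other.specificity())
```

Without the specificity (or the FP and TN counts), the second row of the table is simply unknown, which is why such data cannot enter a bivariate pooling model.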

When we read the paper by Guo et al., we found that they reported the diagnostic performance of experts and non-experts. However, the human endoscopist data are per-video, not per-frame. For the reasons given above, we used per-frame rather than per-video data, so we did not include the human endoscopist data.
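Per-frame and per-video figures are not interchangeable, because a video can count as a detection even when most of its polyp frames are missed. The sketch below illustrates this with made-up frame results and an assumed aggregation rule (a polyp-bearing video counts as detected when at least `min_frames` of its frames are flagged); the rule actually used by Guo et al. may differ. Per-video specificity would additionally require polyp-free videos, which is the data reported as unavailable above.

```python
from typing import Dict, List

# Made-up per-frame results for videos that all contain a polyp:
# one boolean per polyp-bearing frame (True = the CNN flagged that frame).
videos: Dict[str, List[bool]] = {
    "video_1": [True, True, False, True],
    "video_2": [False, False, False, True],
    "video_3": [False, False, False, False],
}


def per_frame_sensitivity(data: Dict[str, List[bool]]) -> float:
    # Pool every polyp frame across all videos and count the flagged ones.
    frames = [flag for flags in data.values() for flag in flags]
    return sum(frames) / len(frames)


def per_video_sensitivity(data: Dict[str, List[bool]], min_frames: int = 1) -> float:
    # Assumed (hypothetical) rule: a video is detected when at least
    # `min_frames` of its polyp frames are flagged.
    detected = [sum(flags) >= min_frames for flags in data.values()]
    return sum(detected) / len(detected)


print(f"per-frame sensitivity: {per_frame_sensitivity(videos):.3f}")
print(f"per-video sensitivity: {per_video_sensitivity(videos):.3f}")
```

With these inputs, per-frame sensitivity is 4/12 while per-video sensitivity is 2/3, and the per-video value also depends on the aggregation threshold chosen.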

4. Similarly, for Wang et al. (ref 31), the 1633 polyps (and not the total number of polyp images) should have been considered.

Response:

The 1633 polyps belong to Datasets A and C.

In the paper by Wang et al., four datasets were used to validate the diagnostic performance of the CNN system. Datasets A and B contained colonoscopy images, and Datasets C and D contained colonoscopy videos. However, precise data were provided only for Dataset A, so we included the data of Dataset A. When searching for references, we found that almost all authors used colonoscopy images to compare the diagnostic performance of the CNN system and human endoscopists, so we chose colonoscopy images as the main source for our analysis.

5. I think that the use of the Fagan Nomogram to calculate post-test probability considering positive and negative likelihood ratios is not really indicated in this field.

Response:

The Fagan nomogram is based on Bayes' theorem. It is often used to evaluate the clinical applicability of a diagnostic method or tool. Starting from a pre-test probability, Bayes' theorem combined with a likelihood ratio (LR) gives the post-test probability. The post-test probability after a positive result (using LR-positive) is the probability that a patient actually has the condition given a positive result from the diagnostic method or tool, while the post-test probability after a negative result (using LR-negative) is the probability that a patient still has the condition despite a negative result.

After careful consideration of the meaning of the Fagan nomogram, we think its application to the detection and classification of colorectal polyps may be appropriate.
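The Fagan nomogram is a graphical shortcut for Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The sketch below computes it directly. The 20% pre-test probability is the value Reviewer 2 discusses in the next round, and the likelihood ratios are the pooled CNN values from the whole-group row of the subgroup analysis in this letter; pairing them here is purely illustrative, and the function name is hypothetical.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form, which the Fagan nomogram evaluates graphically."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * likelihood_ratio           # update with the LR
    return post_odds / (1.0 + post_odds)              # odds -> probability


# Illustrative inputs: 20% pre-test probability and the pooled CNN LRs
# (whole group) from the subgroup analysis in this letter.
pre = 0.20
plr, nlr = 4.061, 0.085
p_pos = post_test_probability(pre, plr)  # probability of the condition after a positive result
p_neg = post_test_probability(pre, nlr)  # probability of the condition after a negative result
print(f"after positive result: {p_pos:.3f}")
print(f"after negative result: {p_neg:.3f}")
```

With these inputs, a positive result raises the probability from 20% to roughly 50%, and a negative result lowers it to about 2%.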

6. The differential performance according to polyp size (diminutive and non-diminutive polyps) should have been analysed.

Response:

Colorectal polyps can be hard to detect and classify, especially diminutive ones, so diagnostic performance may differ between diminutive and non-diminutive polyps. After carefully reviewing all the references, we found that seven studies provided data about diminutive polyps (Chen et al., Renner et al., Mori et al., Ozawa et al., Shin et al., Byrne et al., and Kudo et al.). Among them, Kudo et al. focused on polyp detection, while Chen et al., Mori et al., and Byrne et al. focused mainly on diminutive polyps. Because only one study provided data about diminutive polyps in the detection part, we did not consider a subgroup analysis necessary there. In the subgroup analysis of diminutive polyps in the classification part, the performance of the CNN and the experts changed only slightly, probably because only two studies (Renner et al. and Ozawa et al.) provided data about non-diminutive polyps. After careful consideration, we decided not to add the subgroup analysis to the manuscript, given the small change in the results and for the consistency of the article.

The subgroup analysis is shown below:

| Group | Object | Sensitivity (95% CI) | Specificity (95% CI) | PLR (95% CI) | NLR (95% CI) | DOR (95% CI) | SROC (95% CI) |
|---|---|---|---|---|---|---|---|
| Whole group | CNN | 0.935 [0.915-0.947] | 0.770 [0.597-0.883] | 4.061 [2.155-7.652] | 0.085 [0.060-0.120] | 47.849 [18.638-122.842] | 0.983 [0.951-0.997] |
| Whole group | Expert | 0.912 [0.827-0.957] | 0.848 [0.756-0.910] | 6.009 [3.783-9.544] | 0.104 [0.054-0.200] | 57.970 [30.407-110.518] | 0.938 [0.923-0.953] |
| Whole group | Non-expert | 0.878 [0.792-0.932] | 0.782 [0.687-0.854] | 3.915 [2.673-5.731] | 0.174 [0.112-0.267] | 25.428 [13.169-49.096] | 0.895 [0.873-0.924] |
| Diminutive polyps | CNN | 0.948 [0.925-0.965] | 0.812 [0.673-0.900] | 5.037 [2.741-9.258] | 0.064 [0.040-0.101] | 79.210 [29.347-213.791] | 0.965 [0.941-0.972] |
| Diminutive polyps | Expert | 0.903 [0.808-0.954] | 0.864 [0.777-0.921] | 4.659 [3.099-8.818] | 0.112 [0.057-0.221] | 49.389 [30.057-117.346] | 0.921 [0.901-0.965] |
| Diminutive polyps | Non-expert | 0.878 [0.792-0.932] | 0.782 [0.687-0.854] | 3.915 [2.673-5.731] | 0.174 [0.112-0.267] | 25.428 [13.169-49.096] | 0.895 [0.873-0.924] |

Note: The sample size of non-diminutive polyps is too small to be analyzed. The non-expert data in the whole group are identical to those in the diminutive polyp group, so the results are the same.

7. The paper requires a thorough linguistic revision (grammatical mistakes, repetitions…), possibly by a native English-speaking reviser.

Response:

We have already sent this paper for language polishing. (All changes are marked in blue.)

Other observations:

8. Reference 10 does not refer to detection of colonic lesions, but of early esophageal squamous cell carcinoma/preneoplastic lesions in the esophagus. The sentence in the introduction related to this reference should be revised.

Response:

We apologize for this mistake. We have corrected the sentence and cited another reference.

9. Both Tables 1 and 2 should contain the reference numbers of the included papers to make the tables easier to read. Moreover, I suggest listing the papers by year of publication.

Response:

We added the reference numbers to Tables 1 and 2 and listed the papers by year of publication. In addition, I corrected two authors' names and marked them in red (Kuo was changed to Kudo; Yu was changed to Lequan).

10. Table 1 should also include more details about the imaging modalities included in the different studies. For example, the use of dedicated endocytoscopes in the studies referred as 12 and 25 should be reported. Also information about the use of real-time analysis by the CNN systems, and the type of lesions included (only diminutive polyps or polyps of any size) in the different studies should be highlighted in the table.

Response:

We have added information on the type of endocytoscope, the type of CNN system, real-time use of the CNN system, the type of lesions, and the type of images to Table 1, and marked it in red.

11. The paper states that 7 of the included studies focused on detection; however, Table 1 lists detection as the field of focus for only 6 papers. Similarly, 5 studies should include expert human endoscopists, but only four papers are reported in Tables 1 and 2.

Response:

Actually, 7 studies focused on detection: Lequan et al., Wang et al., Urban et al., Zhang et al., Yamada et al., Kudo et al., and Guo et al. However, we did make a mistake: only 4 studies provided data about experts, and 3 studies provided data about non-experts. We have corrected this and marked it in red (Page 10, Line 216).

12. The paper by Yu (2016), included in table 1 and 2 is not reported in the references.

Response:

Yu (2016) is actually Lequan (2016); we had mistaken the author's last name. We have corrected it and marked it in red.

Reviewer 3#

1. There are some spelling mistakes:

- line 186, page 15: Results instead of resules

- line 211, page 17, table 2: proximal rectosigmoid instead of poximal rectosigmoid

- line 360, page 25: ordinary physicians instead of phyisicians

Response:

We have corrected all the spelling mistakes and have already sent the manuscript for language polishing.

2. Technical explanations of the different CNN systems you compared in your analysis are missing: Why are some CNN systems better for CP detection and others for CP classification? Is it justified to compare different CNN systems that provide different features? Do they have different characteristics concerning deep learning?

Response:

For CP detection training, the authors chose images that contained CP and images that did not. The purpose is to train the CNN system to distinguish CP from normal colorectal mucosa. This task is very difficult, because flat, small, and isochromatic polyps are associated with a high miss rate, even for experts.

For CP classification training, in contrast, the authors used images containing hyperplastic or neoplastic CP, typically NBI or magnified images, to train the CNN system to classify the different pathological types of CP. For hyperplastic CP, simple resection is sufficient; for neoplastic CP, it is not, and endoscopic mucosal resection (EMR), endoscopic submucosal dissection (ESD), or even surgery is the optimal curative option. If the classification accuracy of the CNN system is high enough, endoscopists can trust its diagnostic results and make a therapeutic decision during the colonoscopy itself.

The details of the detection and classification algorithms may differ slightly, but few authors provided information about these differences. Nevertheless, they are all fundamentally based on CNNs.

3. A description of the different CP classification systems used by the authors of the different studies is also missing.

Response:

We have added information on the type of CNN system used in each study to Table 1.

Attachment

Submitted filename: Response to reviewers.docx (33.5KB, docx)

PLoS One. doi: 10.1371/journal.pone.0246892.r003

Decision Letter 1

Ping He

24 Nov 2020

PONE-D-20-22694R1

Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: a systematic review and meta-analysis

PLOS ONE

Dear Dr. Xu,

Please submit your revised manuscript by Jan 08 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Ping He, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

Please further polish your manuscript according to the comments below.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

Reviewer #3: All comments have been addressed

Reviewer #4: All comments have been addressed

Reviewer #5: All comments have been addressed

Reviewer #6: All comments have been addressed

Reviewer #7: All comments have been addressed

Reviewer #8: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: I Don't Know

Reviewer #8: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

Reviewer #8: Yes

**********

6. Review Comments to the Author

Reviewer #1: I read the authors response. The revised manuscript is appropriately corrected. I have no comment for the authors.

Reviewer #2: (No Response)

Reviewer #3: I thank the authors for having carefully read through the comments of all three reviewers and that you made the proper revisions. Your responses to the reviewer’s questions are fully integrated in your revised manuscript that can now be accepted for publication.

Reviewer #4: The article provides valuable information. The manuscript is well organized. I would like the authors to confirm whether they have obtained informed consent. The authors should also correct the grammatical mistakes and language errors in every paragraph. The authors are also advised to incorporate more citations, including citations newer than 2010. I recommend the manuscript for publication with minor revision.

Reviewer #5: I happen to be a follow-on reviewer and have been able to review your submissions, both the original and the revision addressing the concerns of the previous reviewers. I found that the revised manuscript is much better, and in my view most of the adjustments, corrections, and modifications have already been made. I would suggest giving more attention to writing in standard English.

Reviewer #6: As technology evolves, it becomes more present in our personal and professional lives. We need to embrace it because it is the future; there is no doubt about this. I congratulate the authors on the volume of research and on the starting hypothesis. We call these advanced structures AI, or artificial intelligence, but in essence they are algorithms that take into consideration past information on only selected types of lesions; they do not make logical or X crossed decisions. These aspects, chiefly the lack of flexibility, must always be taken into account when discussing this technology.

A few mentions:

There are some English mistakes. I have marked some of them in the attached document. Please address them.

Line 102-104

102 Moreover, findings from some studies showed that the CNN system could automatically

103 classify CP, which is significantly helpful for the therapeutic decision-making

104 process during colonoscopy.

The authors mention that the CNN could classify CP. Please explain in a few words whether this classification is based on malignancy risk, size alone, or base of implantation, as this information is important to the reader.

Lines 101-102.

It is a type of the most prevalent network architectures of deep learning (DL) methods based on artificial intelligence (AI) technology.

Please reformulate; this phrase is difficult for the reader to understand. I understand the information is abstract, but it needs to make sense to everyone.

Line 214: the authors mention that all of the articles were published in the last 4 years. In the abstract, the end of the search timeline was April 2020. Please also provide a starting date for the search, if one exists.

Another aspect that, in my view, the authors should mention briefly is ethics. How much can a doctor base a decision on an algorithm, and what are the legal implications if a decision is wrong and a CP classified as benign proves to be malignant?

Another aspect that I did not see addressed is suboptimal bowel preparation for colonoscopy. In current practice, we have all encountered it. How does the CNN address this issue? In the studies included in the analysis, were all of the patients ideally prepared for colonoscopy?

Reviewer #7: A well-written systematic review and meta-analysis by Dr. Xu on the role of artificial intelligence (AI) in medical science.

Reviewer #8: Interesting systematic review and meta-analysis for diagnostic performance of convolutional neural networks for the detection and classification of colorectal polyps. The authors have done a thorough job of addressing all of the prior reviewers questions and concerns appropriately. Grammatical and spelling mistakes have also been addressed. I have not identified any other major changes that need to be made, technical or otherwise.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Andreas Adler, M.D.

Reviewer #4: No

Reviewer #5: No

Reviewer #6: No

Reviewer #7: Yes: Dr. Irshad Ahmad

Reviewer #8: Yes: Stas Amato

Attachment

Submitted filename: Re-Review PLOS one CNNs polyps.docx (25.1KB, docx)

Attachment

Submitted filename: PONE-D-20-22694_R1_reviewer (1).pdf (2.9MB, pdf)

PLoS One. 2021 Feb 16;16(2):e0246892. doi: 10.1371/journal.pone.0246892.r004

Author response to Decision Letter 1

8 Dec 2020

Dear Prof. Ping He,

Sincerely,

Xuezhong Xu

Reviewer 2#

1. It may be reasonable not to include abstracts in a meta-analysis. Thank you for having clarified this choice.

Response: Thank you.

2. Thank you for this major change. I think that this choice, like the others listed subsequently, should be discussed and justified in the Methods section of the study.

3 and 4. Thank you for these comments. Please try to discuss these choices in the Methods section.

Response: We discussed the data extraction choices in the Methods section, under "Data extraction and quality assessment" (blue font, Lines 146-162, Pages 7-8).

3. Of course I am aware of the meaning of the Fagan Nomogram; my doubts refer to the clinical, real life application of the Bayes’ theorem in this field. In particular, I criticize the arbitrary choice of the 20% pre-test probability of a patient having a polyp or of a polyp being an adenoma chosen in the paper.

I find the sentence "the pretest probability was defined as the prevalence of the target condition" (line 175, statistical analysis) unclear: the pre-test probability of a patient having a polyp will of course change according to the patient's characteristics and the indication for colonoscopy; the pre-test probability of a polyp being an adenoma depends on the characteristics of the polyp and is even more difficult to estimate as a percentage.

If you want to keep the Fagan nomogram in the paper, I suggest choosing more appropriate pre-test probabilities (for example, expected polyp or adenoma detection rates in screening population derived from previous studies). However, I am really skeptical about the use of Fagan nomogram especially in the field of polyps’ classification: what does it mean that a polyp has 20% pre-test probability of being an adenoma? I think that an endoscopist generally has a much clearer idea of the nature of a polyp.
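The reviewer's point can be made concrete by recomputing the post-test probability over a range of pre-test probabilities. The sketch below derives point-estimate likelihood ratios from the pooled CNN detection sensitivity (0.848) and specificity (0.965) reported in the abstract; this is a simplification, since bivariate models pool likelihood ratios separately. The pre-test grid is hypothetical and merely stands in for the expected detection rates the reviewer suggests; the function name is also illustrative.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Point-estimate LRs from the pooled CNN detection sensitivity and specificity
# (approximation: not the separately pooled LRs of a bivariate model).
sens, spec = 0.848, 0.965
plr = sens / (1.0 - spec)
nlr = (1.0 - sens) / spec

# Hypothetical pre-test probabilities; in practice they would come from
# expected polyp or adenoma detection rates in the target population.
pre_test_grid = [0.10, 0.20, 0.30, 0.50]
rows = [(p, post_test_probability(p, plr), post_test_probability(p, nlr))
        for p in pre_test_grid]
for pre, pos, neg in rows:
    print(f"pre-test {pre:.2f}: after positive {pos:.3f}, after negative {neg:.3f}")
```

Across this grid, the post-test probability after a positive result moves from about 0.73 to about 0.96, so the reported figure depends heavily on which pre-test probability is assumed.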

Response: After careful consideration, we concluded that the Fagan nomogram might not be suitable for this analysis, so we deleted it. The statistical method and the results for the Fagan nomogram were removed, Figs 4 and 6 were deleted, and the remaining figures were renumbered.

4. Thank you for the effort made to perform this subgroup analysis. You could consider adding this subgroup analysis to the paper as supplementary data and briefly discussing the results in the main paper.

Response: We added the subgroup analysis of diminutive polyps as S3 Table and discussed it in the Discussion section (blue font, Lines 425-431, Page 24).

5. The paper has certainly improved. However, I still find that some sections could be slightly modified to avoid repetition (for example, lines 40-41 and 43-44 in the abstract; lines 205-206 and 210 in the results) and to ease reading (line 447 in the discussion section). Some minor spelling mistakes are still present: line 187, he SROC (instead of the SROC); line 248, heterpgeneity; line 395, applid.

Response:

Line 40-41 and 43-44 were modified. (blue font, Line 39-43, Page 2)

Line 205-206 and 210 were modified. (blue font, Line 216, Page 10)

Line 447 was modified. (blue font, Line 464-465, Page 25)

Line 187 was modified. (blue font, Line 198, Page 9)

Line 248 was modified. (blue font, Line 254, Page 15)

Line 395 was modified. (blue font, Line 398, Page 22)

6. Thanks to these changes, Tables 1 and 2 are now easier to read and more informative. I only suggest changing "type of endocytoscopes" in Table 1 to "type of endoscopes", since endocytoscopes were used in only a few papers.

Response: “endocytoscopes” was modified into “endoscopes”.

7. For the study by Kudo et al (reference 26), I also think that considering both White light images and NBI images may lead to duplication of data, because they refer to the same polyps.

Response: We carefully reviewed the reference and found that it would be inappropriate to delete either set. This case differs from Renner et al.: in Renner et al., the images of the standard-confidence predictions might be the same as those of the high-confidence predictions, whereas in Kudo et al., the images in the different modes are different. We therefore included both the WLI and NBI images and added a discussion of this choice to the data extraction and quality assessment section (blue font, Lines 157-162, Page 8).

8. In the abstract it is stated that “the diagnostic performance of the CNN system was superior to that of the expert and non-expert” (lines 43-44). However, the meta-analysis did not find any statistically significant difference. Modify the sentence accordingly.

Response: The sentence was changed to "the diagnostic performance of the CNN system was superior to that of the expert and non-expert in the field of CP classification, although the differences were not statistically significant" (blue font, Lines 41-43, Page 2).

9. I suggest splitting the paragraph "The comparison of diagnostic performance among CNN system, expert and non-expert" (page 18) and placing the comparisons in the fields of detection and classification immediately after the corresponding paragraphs "diagnostic performance of expert and non-expert".

Response: The paragraph was split, and each part was placed immediately after the corresponding paragraph "diagnostic performance of expert and non-expert".

Reviewer 4#

1. The article provides valuable information. The manuscript is well organized. I would like the authors to confirm whether they have obtained informed consent. The authors should also correct the grammatical mistakes and language errors in every paragraph. The authors are also advised to incorporate more citations, including citations newer than 2010. I recommend the manuscript for publication with minor revision.

Response: All the authors have taken informed consent and have no conflicts of interest to disclose. We apologize for the grammatical mistakes and language errors. We have re-polished the whole article, and all modified passages are marked in red font.

We have checked the reference list and found that references 4, 8, 15, 16, 18, 19, 20, and 21 were published before 2010. We replaced references 4 and 8 with more recent ones. The remaining references concern statistical methods for meta-analysis, which were developed long ago, so we did not change them.

Reviewer #5:

I happen to be a follow-on reviewer and have been able to review your submissions, both the original and the revision addressing the concerns of the previous reviewers. I found that the revised manuscript is much better, and in my view most of the adjustments, corrections, and modifications have already been made. I would suggest giving more attention to writing in standard English.

Response: We have re-polished the manuscript and hope that it now meets the standard of written English. All modified passages are marked in red font.

Reviewer #6:

1. As technology evolves, it becomes more present in our personal and professional lives. We need to embrace it because it is the future; there is no doubt about this. I congratulate the authors on the volume of research and on the starting hypothesis. We call these advanced structures AI, or artificial intelligence, but in essence they are algorithms that take into consideration past information on only selected types of lesions; they do not make logical or X crossed decisions. These aspects, chiefly the lack of flexibility, must always be taken into account when discussing this technology.

Response: We have added the limitations of AI to the Discussion section (blue font, Lines 451-457, Page 25).

2. There are some English mistakes. I have marked some of them in the attached document. Please address them.

Response: We have corrected all the mistakes.

Line 194 "informed consent form": we deleted the ethics section, because the editor said it is not needed for a meta-analysis.

Line 209 “rules out” into “excluded”

Line 367-368 "despite the differences were statistically insignificant" into "although the differences were statistically insignificant"

Line 393 “qualify being an expert” into “However, it is vital to consider that not all endoscopists possess expert experience”

Line 398 “applid” into “harbor the application”

3. Line 102-104

102 Moreover, findings from some studies showed that the CNN system could automatically

103 classify CP, which is significantly helpful for the therapeutic decision-making

104 process during colonoscopy.

Response: The CNN system could classify CP based on its morphological features. We have added this information to the sentence (red font, Lines 104-105, Page 5).

4. Lines 101-102.

It is a type of the most prevalent network architectures of deep learning (DL) methods based on artificial intelligence (AI) technology.

Please reformulate; this phrase is difficult for the reader to understand. I understand the information is abstract, but it needs to make sense to everyone.

Response: We found it difficult to reformulate this sentence. To help every reader understand the concepts of AI, DL, and CNN, we added a figure (S1 Fig) to explain this abstract concept.

5. Line 214: the authors mention that all of the articles were published in the last 4 years. In the abstract, the end of the search timeline was April 2020. Please also provide a starting date for the search, if one exists.

Response: The application of CNN in the field of colonoscopy has been introduced recently. When we searched for the relevant articles, we did not set a starting date. The result showed all studies about CNN in colonoscopy were published in the last 4 years.

6. Another aspect that, in my view, the authors should mention briefly is ethics. How much can a doctor base a decision on an algorithm, and what are the legal implications if a decision is wrong and a CP classified as benign proves to be malignant?

Response: We added a few sentences to the Discussion section addressing ethical issues (red font, Lines 432-435, Page 24).

7. Another aspect that I did not see addressed is suboptimal bowel preparation for colonoscopy. In current practice, we have all encountered it. How does the CNN address this issue? In the studies included in the analysis, were all of the patients ideally prepared for colonoscopy?

Response: All the articles included in our analysis used high-quality images (high-definition, with good bowel preparation). This may lead to selection bias, and the CNN system may perform excellently on the training set but poorly in clinical practice (Lines 436-439, Page 24).

In addition, there is an AI system designed specifically to evaluate bowel preparation [1].

1. Zhou J, Wu L, Wan X, Shen L, Liu J, Zhang J, et al. A novel artificial intelligence system for the assessment of bowel preparation (with video). Gastrointest Endosc. 2020;91(2):428-35.e2. Epub 2019/11/30. doi: 10.1016/j.gie.2019.11.026. PubMed PMID: 31783029.

PLoS One. doi: 10.1371/journal.pone.0246892.r005

Decision Letter 2

Ping He

26 Jan 2021

PONE-D-20-22694R2

Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: a systematic review and meta-analysis

PLOS ONE

Dear Dr. Xu,

Please submit your revised manuscript by Mar 12 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Ping He, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (if provided):

This manuscript can be accepted subject to additional conditions: the authors should address the remaining questions raised by one of the reviewers. The authors are requested to complete the revision and send it to me for review and approval.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #5: All comments have been addressed

Reviewer #6: All comments have been addressed

Reviewer #7: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #5: Yes

Reviewer #6: Yes

Reviewer #7: Yes

**********

6. Review Comments to the Author

Reviewer #1: The manuscript has been revised appropriately. I have no further comments for the authors.

Reviewer #2: I would like to thank the authors for their efforts to answer all the comments provided. The revisions have improved the general quality of the paper; however, I still have some doubts about methodological and statistical aspects of the meta-analysis that must be addressed.

The most important aspect of a meta-analysis is the reproducibility of the analysis by external readers.

I carefully analyzed the results shown in Table 2, which reports the diagnostic categories (TP, FP, TN, FN) of all the studies included in the meta-analysis. When I calculated the sensitivity for detection myself, I obtained values different from those reported in the paper. For example, considering only the seven papers labeled as “detection” studies in Table 1, I calculated a CNN sensitivity (TP/(TP+FN)) of 64902/77412 = 0.838, which differs from the reported 0.909.
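The reviewer's check is a crude pooled sensitivity: true positives and false negatives are summed over all studies before dividing. A minimal Python sketch reproducing the two ratios quoted in this comment and the next one from the totals given there (the per-study counts in Table 2 are not repeated here):

```python
def crude_sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN), computed on counts already summed across studies."""
    if tp + fn <= 0:
        raise ValueError("TP + FN must be positive")
    return tp / (tp + fn)


# Totals quoted by the reviewer: TP = 64902 out of TP + FN = 77412 (seven "detection" studies)
all_detection = crude_sensitivity(tp=64902, fn=77412 - 64902)

# Same calculation after excluding Kudo et al. [17]: TP = 60462 out of TP + FN = 77252
without_kudo = crude_sensitivity(tp=60462, fn=77252 - 60462)

print(f"All detection studies: {all_detection:.4f}")
print(f"Excluding Kudo et al.: {without_kudo:.4f}")
```

The second ratio is 0.7827, i.e. 0.783 to three decimals (0.782 when truncated). A crude ratio of this kind ignores between-study heterogeneity, so it is not expected to equal a model-based pooled estimate.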

After careful review, I noticed that the study by Kudo et al. [17] has been erroneously classified as a detection study in Table 1, whereas it is a classification study. Even after excluding this study, however, the CNN sensitivity for detection (60462/77252 = 0.782) differs from the reported one.

I have another doubt: were all the data in Table 2 used to calculate diagnostic performance? If so, small lesions were duplicated in Kudo [17] (which reports all lesions, including small ones, and separately only lesions < 5 mm), Ozawa [28] (all lesions and only lesions < 10 mm), and Renner [15] (all lesions and diminutive rectosigmoid lesions).

1. I suggest splitting Table 2 into two tables, one including only detection studies and one including only classification studies. This will make it easier for readers (and reviewers) to check the reported diagnostic performance independently and will help clarify the reasons for the difference I found in detection sensitivity.

2. Change the reported “field” of the study by Kudo [17] in Table 1 (detection → classification). I suggest carefully reviewing all the included papers so that each is correctly classified as a detection or a classification study.

3. I suggest carefully reviewing all the entered data again (for example, Table 2 reports 99 results for Renner et al., but the original paper includes 100 polyps) and recalculating all the reported diagnostic performances.

4. Were the small lesions in Kudo, Ozawa, and Renner counted twice, as they are reported in Table 2? If so, this mistake must be corrected (duplication of small polyps).
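Items 1, 2, and 4 amount to a simple data-cleaning step: assign each row to its field and drop subgroup rows (such as lesions < 5 mm) whose lesions are already counted in the study's whole-lesion row. A minimal sketch, assuming a hypothetical row layout with a `subset` label; the study names and counts in the usage example are illustrative placeholders, not data from the included studies:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    study: str
    field: str   # "detection" or "classification"
    subset: str  # "all" for the whole-lesion row, otherwise a subgroup label
    tp: int
    fp: int
    fn: int
    tn: int


def split_and_deduplicate(rows):
    """Return {"detection": [...], "classification": [...]} with subgroup rows removed."""
    tables = {"detection": [], "classification": []}
    for row in rows:
        if row.subset != "all":
            # Subgroup lesions are already included in the study's "all" row;
            # keeping both would count the same polyps twice.
            continue
        tables[row.field].append(row)  # raises KeyError for an unknown field label
    return tables


# Illustrative placeholder rows (not taken from Table 2)
rows = [
    Row("Study A", "classification", "all", 90, 5, 10, 95),
    Row("Study A", "classification", "< 5 mm", 40, 2, 5, 50),
    Row("Study B", "detection", "all", 80, 3, 20, 97),
]
tables = split_and_deduplicate(rows)
for field, kept in tables.items():
    print(field, [r.study for r in kept])
```

Without the filter, the classification table above would sum to 130 true positives instead of 90, even though the 40 small lesions are already part of the 90.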

One more limitation of the study is that more than half of the data come from a single study (Guo et al. [16]), especially because of the inclusion of “full-video” data. For this reason, the detection results are strongly “guided” by this study.

5. In addition to the general analysis, can you provide another analysis for detection that excludes the large study by Guo et al.? It could be very informative and add strength to the meta-analysis results. You should also briefly discuss this aspect in the Discussion.
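One way to quantify how strongly a single study steers the detection estimate is to look at its share of the positive cases and at the pooled sensitivity recomputed with each study left out in turn. A minimal sketch using crude pooling for simplicity (the manuscript itself uses model-based pooling in Stata); the study labels and counts are hypothetical:

```python
def crude_sensitivity(counts):
    """counts: {study: (tp, fn)}; pooled TP / (TP + FN) over all listed studies."""
    total_tp = sum(tp for tp, _ in counts.values())
    total_fn = sum(fn for _, fn in counts.values())
    return total_tp / (total_tp + total_fn)


def positive_share(counts):
    """Fraction of all positive cases (TP + FN) contributed by each study."""
    total = sum(tp + fn for tp, fn in counts.values())
    return {study: (tp + fn) / total for study, (tp, fn) in counts.items()}


def leave_one_out(counts):
    """Crude pooled sensitivity recomputed with each study excluded in turn."""
    return {
        left_out: crude_sensitivity({s: c for s, c in counts.items() if s != left_out})
        for left_out in counts
    }


# Hypothetical (TP, FN) counts in which one study dominates the positive cases
counts = {"Large study": (45000, 5000), "Study B": (900, 100), "Study C": (700, 300)}

shares = positive_share(counts)
loo = leave_one_out(counts)
print(f"All studies: {crude_sensitivity(counts):.3f}")
for study in counts:
    print(f"{study}: {shares[study]:.1%} of positives, sensitivity without it = {loo[study]:.3f}")
```

In this toy example the large study supplies about 96% of the positives, and removing it moves the pooled sensitivity from about 0.896 to 0.800, while removing either small study barely changes it.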

Below are listed my comments to the revisions made in response to my previous comments:

2. Response: we discussed the choices of data extraction in the method section/data extraction and quality assessment. (they are in blue font, Line 146-162, Page 7-8).

Thank you for the explanations provided.

3. Response: After careful consideration, we thought the Fagan nomogram might not be suitable for this analysis. We chose to delete the results of Fagan nomogram. The statistical method of Fagan nomogram was deleted. The results about Fagan nomogram were deleted. The Fig 4 and Fig 6 were deleted. The order of other pictures was rearranged.

Thank you for having taken in consideration my observations.

4. Response: we added the subgroup analysis about diminutive polyps as S3 Table. We also discussed it in the discussion part (blue font, Line 425-431, Page 24).

Thank you. I suggest briefly describing the main results of this subgroup analysis in the Results section.

7. Response: we carefully reviewed the reference, we found it was inappropriate to delete any of them. However, it is different from Renner et al. In Renner et al. the images of standard-confidence predictions might be the same as the images of high-confidence predictions. In Kudo et al. images in different models are different. As a result, we chose to include both of WLI and NBI images. We added discussion about this choice in the data extraction and quality assessment part. (blue font, Line 157-162, Page 8)

Thank you for having clarified the differences between the two papers. Even if I am still convinced that considering both the modalities may include a risk of duplication of data (and, more in general, that the use of “per frame” analysis instead of “per lesion” analysis may be misleading and not applicable to clinical practice), now I can understand this choice.

I suggest including in the Discussion a few sentences about the possibly reduced translational applicability of the results of this meta-analysis because of the use of per-frame and per-video data: in clinical practice, what matters is identifying and classifying a specific polyp, not every image of that polyp.
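The per-frame versus per-lesion point can be made concrete: when one easily recognized polyp contributes many frames, frame-level sensitivity can look much higher than lesion-level sensitivity. A minimal sketch, assuming a hypothetical rule that a lesion counts as detected if at least one of its frames is flagged; the frames below are illustrative only:

```python
from collections import defaultdict


def per_frame_sensitivity(frames):
    """frames: list of (lesion_id, flagged) for frames that contain a polyp."""
    return sum(flagged for _, flagged in frames) / len(frames)


def per_lesion_sensitivity(frames, min_flagged_frames=1):
    """A lesion counts as detected if at least `min_flagged_frames` of its frames are flagged."""
    flagged_count = defaultdict(int)
    for lesion_id, flagged in frames:
        flagged_count[lesion_id] += int(flagged)  # unflagged lesions still get an entry (0)
    detected = [n >= min_flagged_frames for n in flagged_count.values()]
    return sum(detected) / len(detected)


# Illustrative: polyp P1 appears in 10 frames (all flagged), polyp P2 in 2 frames (none flagged)
frames = [("P1", True)] * 10 + [("P2", False)] * 2

print(f"Per-frame sensitivity:  {per_frame_sensitivity(frames):.3f}")   # 10/12
print(f"Per-lesion sensitivity: {per_lesion_sensitivity(frames):.3f}")  # 1/2
```

The same predictions give 0.833 per frame but 0.5 per lesion, because the missed polyp contributes only two frames.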

8. Response: the sentence was modified into “the diagnostic performance of the CNN system was superior to that of the expert and non-expert in the field of CP classification, although the differences were not statistically significant”. (blue font, Line 41-43, Page 2)

Thank you for the change. However, there is a wording mistake in line 43, which reads “were not statistically insignificant” instead of “were not statistically significant”.

9. Response: the paragraph was split and repositioned immediately after the relative paragraphs “diagnostic performance of expert and non-expert”.

Thank you.

Minor concerns:

- Lines 27-28: I find the phrase “significantly unsatisfactory” too severe to describe the efficacy of colonoscopy.

- Line 71: I do not find the word “approaches” adequate in this context. I suggest changing it to “mechanisms”.

- Lines 106-107: The new sentence “Nevertheless, this technology has not reached maturity hence unsatisfactory.” seems incomplete to me.

- I suggest including some statistical results (at least p-values) in the abstract.

Reviewer #3: (No Response)

Reviewer #5: I believe it is a much better and appropriate manuscript after the second revision. I congratulate the researchers/authors for doing a wonderful job.

Reviewer #6: The authors have addressed the raised issues with the article. From my personal point of view the paper is fit for publication.

Reviewer #7: The present revision of the manuscript is well polished and much better than the original submission of the same. The reviewer comments have also been responded well.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Andreas Adler, M.D.

Reviewer #5: Yes: M Amir

Reviewer #6: No

Reviewer #7: No

PLoS One. 2021 Feb 16;16(2):e0246892. doi: 10.1371/journal.pone.0246892.r006

Author response to Decision Letter 2

26 Jan 2021

Dear Prof. Ping He:

Thank you and the reviewers for all your valuable suggestions. We have carefully read the comments and revised the manuscript accordingly. Our responses to the reviewers’ questions are listed below. We greatly appreciate your time and effort in improving our manuscript for publication.

Sincerely,

Xuezhong Xu

Reviewer #2

1. I carefully analyzed the results shown in Table 2, which reports the diagnostic categories (TP, FP, TN, FN) of all the studies included in the meta-analysis. When I calculated the sensitivity for detection myself, I obtained values different from those reported in the paper. For example, considering only the seven papers labeled as “detection” studies in Table 1, I calculated a CNN sensitivity (TP/(TP+FN)) of 64902/77412 = 0.838, which differs from the reported 0.909.

Response: We recalculated the whole analysis using the midas package in Stata. The pooled sensitivity and specificity are not calculated simply as TP/(TP+FN) from the summed counts; midas fits a bivariate random-effects model that accounts for between-study heterogeneity. As a result, crude ratios calculated directly from the counts in Table 2 are not expected to match the reported pooled estimates.
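To illustrate why a model-based pooled estimate need not match the crude ratio, the sketch below pools logit-transformed sensitivities with a univariate DerSimonian-Laird random-effects model. This is a deliberately simpler stand-in: midas fits a bivariate model that pools sensitivity and specificity jointly, so its numbers will differ. The three studies are hypothetical:

```python
import math


def logit_sensitivity(tp, fn, correction=0.5):
    """Logit of sensitivity, log(TP / FN), and its approximate variance 1/TP + 1/FN."""
    if tp == 0 or fn == 0:
        # Continuity correction so the logit and variance stay finite
        tp, fn = tp + correction, fn + correction
    return math.log(tp / fn), 1 / tp + 1 / fn


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate and between-study variance tau^2."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, tau2


# Hypothetical (TP, FN) counts: one very large study and two smaller ones
studies = [(45000, 5000), (900, 100), (700, 300)]

crude = sum(tp for tp, _ in studies) / sum(tp + fn for tp, fn in studies)
logits, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in studies))
pooled_logit, tau2 = dersimonian_laird(logits, variances)
random_effects = 1 / (1 + math.exp(-pooled_logit))  # back-transform to a proportion

print(f"Crude pooled sensitivity:          {crude:.3f}")
print(f"Random-effects pooled sensitivity: {random_effects:.3f} (tau^2 = {tau2:.2f})")
```

Here the between-study variance is large, so the random-effects weights are nearly equal and the large study no longer dominates: the crude ratio is about 0.896, while the random-effects estimate is about 0.852. Under a bivariate model, the direction and size of this shift can differ.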

2. After careful review, I noticed that the study by Kudo et al. [17] has been erroneously classified as a detection study in Table 1, whereas it is a classification study. Even after excluding this study, however, the CNN sensitivity for detection (60462/77252 = 0.782) differs from the reported one.

Response: We reclassified Kudo et al. as a CP classification study and repeated all the analyses.

3. I have another doubt: were all the data in Table 2 used to calculate diagnostic performance? If so, small lesions were duplicated in Kudo [17] (which reports all lesions, including small ones, and separately only lesions < 5 mm), Ozawa [28] (all lesions and only lesions < 10 mm), and Renner [15] (all lesions and diminutive rectosigmoid lesions).

Response: We removed all data that carried a potential risk of duplication. We initially intended to perform a subgroup analysis of diminutive polyps, but Stata could not analyze fewer than four studies.

4. I suggest splitting Table 2 into two tables, one including only detection studies and one including only classification studies. This will make it easier for readers (and reviewers) to check the reported diagnostic performance independently and will help clarify the reasons for the difference I found in detection sensitivity.

Response: We split Table 2 into two different tables.

5. Change the reported “field” of the study by Kudo [17] in Table 1 (detection → classification). I suggest carefully reviewing all the included papers so that each is correctly classified as a detection or a classification study.

Response: We reclassified Kudo et al. as a CP classification study and repeated all the analyses.

6. I suggest carefully reviewing all the entered data again (for example, Table 2 reports 99 results for Renner et al., but the original paper includes 100 polyps) and recalculating all the reported diagnostic performances.

Response: We rechecked all the included data and corrected the data for Renner et al.

7. Were the small lesions in Kudo, Ozawa, and Renner counted twice, as they are reported in Table 2? If so, this mistake must be corrected (duplication of small polyps).

8. One more limitation of the study is that more than half of the data come from a single study (Guo et al. [16]), especially because of the inclusion of “full-video” data. For this reason, the detection results are strongly “guided” by this study.

Can you provide, in addition to the general analysis, another analysis for detection excluding the large study by Guo et al? It could be very informative and add strength to the meta-analysis results. You should also briefly discuss this aspect in the discussion.

Response: We performed a subgroup analysis excluding the data of Guo et al. and discussed it in Lines 392-396, Page 20.

9. Minor concerns:


- Lines 27-28: I find the phrase “significantly unsatisfactory” too severe to describe the efficacy of colonoscopy.

- Line 71: I do not find the word “approaches” adequate in this context. I suggest changing it to “mechanisms”.

- Lines 106-107: The new sentence “Nevertheless, this technology has not reached maturity hence unsatisfactory.” seems incomplete to me.

- I suggest including some statistical results (at least p-values) in the abstract.

Response: We addressed all the minor concerns.

Attachment

Submitted filename: response to reviewers.docx

Additional data file (20.8KB, docx).

PLoS One. doi: 10.1371/journal.pone.0246892.r007

Decision Letter 3

Ping He

28 Jan 2021

Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: a systematic review and meta-analysis

PONE-D-20-22694R3

Dear Dr. Xu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ping He, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

The authors have addressed all the questions issued by the reviewers. I think the quality of the paper has been greatly improved. I suggest publishing it directly.

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0246892.r008

Acceptance letter

Ping He

4 Feb 2021

PONE-D-20-22694R3

Comparison of diagnostic performance between convolutional neural networks and human endoscopists for diagnosis of colorectal polyp: a systematic review and meta-analysis

Dear Dr. Xu:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Ping He

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 Table. PRISMA flow diagram.

(DOC)

Additional data file (59KB, doc).

S2 Table. PRISMA checklist.

(DOC)

Additional data file (72.5KB, doc).

S3 Table. Subgroup analysis without the data of short or full videos in the field of CP detection.

(DOCX)

Additional data file (16.5KB, docx).

Attachment

Submitted filename: Response to reviewers.docx

Additional data file (33.5KB, docx).

Attachment

Submitted filename: Re-Review PLOS one CNNs polyps.docx

Additional data file (25.1KB, docx).

Attachment

Submitted filename: PONE-D-20-22694_R1_reviewer (1).pdf

Additional data file (2.9MB, pdf).

Attachment

Submitted filename: response to reviewers.docx

Additional data file (20.8KB, docx).

Data Availability Statement

All relevant data are within the manuscript and its Supporting Information files.

1. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424. doi: 10.3322/caac.21492.

2. Kuntz KM, Lansdorp-Vogelaar I, Rutter CM, Knudsen AB, van Ballegooijen M, Savarino JE, et al. A systematic comparison of microsimulation models of colorectal cancer: the role of assumptions about adenoma progression. Med Decis Making. 2011;31(4):530–9. doi: 10.1177/0272989X11408730.

3. Strum WB. Colorectal Adenomas. N Engl J Med. 2016;375(4):389–90. doi: 10.1056/NEJMc1604867.

4. Montminy EM, Jang A, Conner M, Karlitz JJ. Screening for Colorectal Cancer. The Medical clinics of North America. 2020;104(6):1023–36. Epub 2020/10/26. doi: 10.1016/j.mcna.2020.08.004.

5. Pilonis ND, Bugajski M, Wieszczy P, Franczyk R, Didkowska J, Wojciechowska U, et al. Long-Term Colorectal Cancer Incidence and Mortality After a Single Negative Screening Colonoscopy. Ann Intern Med. 2020;173(2):81–91. Epub 2020/05/26. doi: 10.7326/M19-2477.

6. Li D, Liu L, Fevrier HB, Alexeeff SE, Doherty AR, Raju M, et al. Increased Risk of Colorectal Cancer in Individuals With a History of Serrated Polyps. Gastroenterology. 2020;159(2):502–11.e2. Epub 2020/04/12. doi: 10.1053/j.gastro.2020.04.004.

7. Ijspeert JEG, Bastiaansen BAJ, van Leerdam ME, Meijer GA, van Eeden S, Sanduleanu S, et al. Development and validation of the WASP classification system for optical diagnosis of adenomas, hyperplastic polyps and sessile serrated adenomas/polyps. Gut. 2016;65(6):963–70. doi: 10.1136/gutjnl-2014-308411.

8. Ijspeert JEG, Bevan R, Senore C, Kaminski MF, Kuipers EJ, Mroz A, et al. Detection rate of serrated polyps and serrated polyposis syndrome in colorectal cancer screening cohorts: a European overview. Gut. 2017;66(7):1225–32. doi: 10.1136/gutjnl-2015-310784.

9. Allen JE, Sharma P. Polyp characterization at colonoscopy: Clinical implications. Best practice & research Clinical gastroenterology. 2017;31(4):435–40. Epub 2017/08/27. doi: 10.1016/j.bpg.2017.07.001.

10. van Rijn JC, Reitsma JB, Stoker J, Bossuyt PM, van Deventer SJ, Dekker E. Polyp miss rate determined by tandem colonoscopy: a systematic review. Am J Gastroenterol. 2006;101(2):343–50. doi: 10.1111/j.1572-0241.2006.00390.x.

11. Yamada M, Sakamoto T, Otake Y, Nakajima T, Kuchiba A, Taniguchi H, et al. Investigating endoscopic features of sessile serrated adenomas/polyps by using narrow-band imaging with optical magnification. Gastrointest Endosc. 2015;82(1):108–17. doi: 10.1016/j.gie.2014.12.037.

12. Wang P, Xiao X, Glissen Brown JR, Berzin TM, Tu M, Xiong F, et al. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng. 2018;2(10):741–8. doi: 10.1038/s41551-018-0301-3.

13. Chen PJ, Lin MC, Lai MJ, Lin JC, Lu HH, Tseng VS. Accurate Classification of Diminutive Colorectal Polyps Using Computer-Aided Analysis. Gastroenterology. 2018;154(3):568–75. doi: 10.1053/j.gastro.2017.10.010.

14. Mori Y, Kudo SE, Misawa M, Saito Y, Ikematsu H, Hotta K, et al. Real-Time Use of Artificial Intelligence in Identification of Diminutive Polyps During Colonoscopy: A Prospective Study. Ann Intern Med. 2018;169(6):357–66. doi: 10.7326/M18-0249.

15. Renner J, Phlipsen H, Haller B, Navarro-Avila F, Saint-Hill-Febles Y, Mateus D, et al. Optical classification of neoplastic colorectal polyps—a computer-assisted approach (the COACH study). Scand J Gastroenterol. 2018;53(9):1100–6. doi: 10.1080/00365521.2018.1501092.

16. Guo Z, Nemoto D, Zhu X, Li Q, Aizawa M, Utano K, et al. A polyp detection algorithm can detect small polyps: An ex vivo reading test compared with endoscopists. Dig Endosc. 2020. doi: 10.1111/den.13670.

17. Kudo SE, Misawa M, Mori Y, Hotta K, Ohtsuka K, Ikematsu H, et al. Artificial Intelligence-assisted System Improves Endoscopic Identification of Colorectal Neoplasms. Clin Gastroenterol Hepatol. 2019. doi: 10.1016/j.cgh.2019.09.009.

18. Ozawa T, Ishihara S, Fujishiro M, Kumagai Y, Shichijo S, Tada T. Automated endoscopic detection and classification of colorectal polyps using convolutional neural networks. Therap Adv Gastroenterol. 2020;13:1756284820910659. doi: 10.1177/1756284820910659.

19. Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. doi: 10.7326/0003-4819-155-8-201110180-00009.

20. Deeks JJ. Systematic reviews in health care: Systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323(7305):157–62. doi: 10.1136/bmj.323.7305.157.

21. Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003;327(7414):557–60. doi: 10.1136/bmj.327.7414.557.

22. Jackson D, White IR, Thompson SG. Extending DerSimonian and Laird’s methodology to perform multivariate random effects meta-analyses. Stat Med. 2010;29(12):1282–97. doi: 10.1002/sim.3602.

23. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993;12(14):1293–316. doi: 10.1002/sim.4780121403.

24. Jones CM, Athanasiou T. Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg. 2005;79(1):16–20. doi: 10.1016/j.athoracsur.2004.09.040.

25. Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30(20):2481–98. doi: 10.1002/sim.4172.

26. White IR. Multivariate random-effects meta-regression: Updates to mvmeta. Stata Journal. 2011;11(2):255–70. doi: 10.1177/1536867X1101100206.

27. Byrne MF, Chapados N, Soudan F, Oertel C, Linares Pérez M, Kelly R, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut. 2019;68(1). doi: 10.1136/gutjnl-2017-314547.

28. Yu L, Chen H, Dou Q, Qin J, Heng PA. Integrating Online and Offline Three-Dimensional Deep Learning for Automated Polyp Detection in Colonoscopy Videos. IEEE J Biomed Health Inform. 2017;21(1):65–75. doi: 10.1109/JBHI.2016.2637004.

29. Shin Y, Balasingham I. Automatic polyp frame screening using patch based combined feature and dictionary learning. Comput Med Imaging Graph. 2018;69:33–42. doi: 10.1016/j.compmedimag.2018.08.001.

30. Urban G, Tripathi P, Alkayali T, Mittal M, Jalali F, Karnes W, et al. Deep Learning Localizes and Identifies Polyps in Real Time With 96% Accuracy in Screening Colonoscopy. Gastroenterology. 2018;155(4):1069–78.e8. doi: 10.1053/j.gastro.2018.06.037.

31. Yamada M, Saito Y, Imaoka H, Saiko M, Yamada S, Kondo H, et al. Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy. Sci Rep. 2019;9(1):14465. doi: 10.1038/s41598-019-50567-5.

32. Zhang R, Zheng Y, Poon CCY, Shen D, Lau JYW. Polyp detection during colonoscopy using a regression-based convolutional neural network with a tracker. Pattern Recognit. 2018;83:209–19. doi: 10.1016/j.patcog.2018.05.026.

33. Chandrasekhara V, Desilets D, Falk GW, Inoue H, Romanelli JR, Savides TJ, et al. The American Society for Gastrointestinal Endoscopy PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) on peroral endoscopic myotomy. Gastrointest Endosc. 2015;81(5). doi: 10.1016/j.gie.2014.12.007.

34. Rex DK, Kahi C, O’Brien M, Levin TR, Pohl H, Rastogi A, et al. The American Society for Gastrointestinal Endoscopy PIVI (Preservation and Incorporation of Valuable Endoscopic Innovations) on real-time endoscopic assessment of the histology of diminutive colorectal polyps. Gastrointest Endosc. 2011;73(3):419–22. doi: 10.1016/j.gie.2011.01.023.

35. Rees CJ, Rajasekhar PT, Wilson A, Close H, Rutter MD, Saunders BP, et al. Narrow band imaging optical diagnosis of small colorectal polyps in routine clinical practice: the Detect Inspect Characterise Resect and Discard 2 (DISCARD 2) study. Gut. 2017;66(5):887–95. doi: 10.1136/gutjnl-2015-310584.

36. Schachschal G, Mayr M, Treszl A, Balzer K, Wegscheider K, Aschenbeck J, et al. Endoscopic versus histological characterisation of polyps during screening colonoscopy. Gut. 2014;63(3):458–65. doi: 10.1136/gutjnl-2013-304562.

37. Sehgal V, Rosenfeld A, Graham DG, Lipman G, Bisschops R, Ragunath K, et al. Machine Learning Creates a Simple Endoscopic Classification System that Improves Dysplasia Detection in Barrett’s Oesophagus amongst Non-expert Endoscopists. Gastroenterol Res Pract. 2018;2018:1872437. doi: 10.1155/2018/1872437.

38. Cai S-L, Li B, Tan W-M, Niu X-J, Yu H-H, Yao L-Q, et al. Using a deep learning system in endoscopy for screening of early esophageal squamous cell carcinoma (with video). Gastrointest Endosc. 2019;90(5). doi: 10.1016/j.gie.2019.06.044.

39. Abu Dayyeh BK, Thosani N, Konda V, Wallace MB, Rex DK, Chauhan SS, et al. ASGE Technology Committee systematic review and meta-analysis assessing the ASGE PIVI thresholds for adopting real-time endoscopic assessment of the histology of diminutive colorectal polyps. Gastrointest Endosc. 2015;81(3). doi: 10.1016/j.gie.2014.12.022.

40. Kuiper T, Marsman WA, Jansen JM, van Soest EJ, Haan YCL, Bakker GJ, et al. Accuracy for optical diagnosis of small colorectal polyps in nonacademic settings. Clinical gastroenterology and hepatology: the official clinical practice journal of the American Gastroenterological Association. 2012;10(9). doi: 10.1016/j.cgh.2012.05.004.

41. Guimarães P, Keller A, Fehlmann T, Lammert F, Casper M. Deep-learning based detection of gastric precancerous conditions. Gut. 2020;69(1):4–6. doi: 10.1136/gutjnl-2019-319347.

42. Saxe A, Nelli S, Summerfield C. If deep learning is the answer, what is the question? Nature reviews Neuroscience. 2020. Epub 2020/11/18. doi: 10.1038/s41583-020-00395-8.

43. Lake BM, Ullman TD, Tenenbaum JB, Gershman SJ. Building machines that learn and think like people. The Behavioral and brain sciences. 2017;40:e253. Epub 2016/11/25. doi: 10.1017/S0140525X16001837.

