Classification and diagnosis of cervical lesions based on colposcopy images using deep fully convolutional networks: A man-machine comparison cohort study

Binhua Dong; Huifeng Xue; Ye Li; Ping Li; Jiancui Chen; Tao Zhang; Lihua Chen; Diling Pan; Peizhong Liu; Pengming Sun

doi:10.1016/j.fmre.2022.09.032

2022 Nov 9;5(1):419–428. doi: 10.1016/j.fmre.2022.09.032

PMCID: PMC11955021 PMID: 40166111

Abstract

Colposcopy is an important technique in the diagnosis of cervical cancer. Computer-aided diagnosis methods can mitigate the shortage of colposcopists and improve the accuracy and efficiency of colposcopy examinations in China. This man–machine comparison cohort study presents a novel artificial intelligence (AI) model, Dense-U-Net, for diagnosing cervical lesions from colposcopy images using an image semantic segmentation algorithm. The Dense-U-Net model was created by deepening the network structure, applying dropout, and using max pooling. Image-based and population-based diagnostic performances of the AI algorithm and of physicians with different levels of specialist experience were compared. In total, 2,475 participants were recruited, and 13,084 colposcopy images were included. The diagnostic accuracy of the Dense-U-Net model increased significantly with the number of colposcopy images per patient. As the number of images in the training set increased, the diagnostic accuracy of the model for cervical intraepithelial neoplasia 3 or worse (CIN3+) increased (P = 0.035). For CIN3+ lesions, the diagnostic accuracy of the Dense-U-Net model was higher than that of expert colposcopists (0.89 vs 0.85, P < 0.001), and its missed diagnosis rate (0.06 vs 0.07, P = 0.002) and false-positive rate (0.05 vs 0.08, P < 0.001) were lower. Dense-U-Net was also more accurate in diagnosing the type III cervical transformation zone, which is difficult for experts to diagnose (P < 0.001), and it showed higher diagnostic accuracy for CIN3+ in an independent test set (P < 0.001). To diagnose the same 870 test images, the Dense-U-Net system took 1.76 ± 0.09 min, while the expert, senior, and junior colposcopists took 716.3 ± 49.76, 892.1 ± 92.30, and 3034.7 ± 259.51 min, respectively. The study successfully built a reliable, quick, and effective Dense-U-Net model to assist with colposcopy examinations.

Keywords: Cervical lesions, Dense-U-Net, Colposcopic image, Lesion segmentation, Auxiliary diagnosis

1. Introduction

Cervical cancer is a leading cause of death in women worldwide, with 604,127 new cases and 341,831 deaths annually [1]. However, cervical cancer is preventable because it has a long preinvasive phase and is easily identified by specific clinical and histopathological examinations. Thus, the early detection and diagnosis of cervical lesions are particularly important for the prevention and treatment of cervical cancer [2]. After first-line screening for cervical cancer using cytology and/or high-risk human papillomavirus (HR-HPV) tests, colposcopy is a well-established diagnostic method for detecting cancerous and precancerous lesions through visual examination and biopsy of the uterine cervix [2,3]. In a traditional colposcopy examination, the diagnosis of cervical intraepithelial neoplasia (CIN) and cervical cancer mainly depends on manual reading, which is highly reliant on the operator's experience and knowledge. As a result, the accuracy and efficiency of diagnosing cervical lesions and early invasive cancer are limited. Moreover, colposcopy results are not always repeatable or accurate: false-negative rates fluctuate between 13% and 69% because of differences in physician experience and sample region [4], and other reports indicate that the false-negative rate for biopsy samples taken only from colposcopy-positive regions is between 25% and 57% [5]. Thus, to reduce the missed diagnosis rate of high-grade lesions, gynecologists have increased sampling of colposcopy-negative regions [4]. Furthermore, there is no clear standard regarding the number of specimens taken during a colposcopy or the sampling site; therefore, expert diagnosis is relatively subjective [5]. Dalla et al. [6] compared the primary diagnoses of colposcopy-based biopsies with those obtained from all clinical histological samples, including postoperative findings, and concluded that a low positive predictive value (PPV) for colposcopy may lead to overtreatment or missed treatment, mainly due to incorrect reading of the colposcopy image. In general, an accurate interpretation of colposcopy images relies on the clinician's knowledge and experience, which take a long time to gain and involve continued education and practice [7]. Furthermore, cervical cancers mostly occur in developing countries, where there is a shortage of medical resources and a great demand for colposcopists.

Given the diversity of pathologies and the causes of human fatigue, physicians have gradually benefited from computer-assisted interventions. Since the 1990s, computer-aided diagnosis (CAD) systems based on image processing and machine learning have become crucial in the medical field [8]. The development of CAD systems has significantly improved the efficiency and quality of medical image processing. Li [9] proposed a CAD system embodying a complex multisensor, multidata, and multifeature image analysis system for colposcopy. ColpoCAD, a CAD system for colposcopy, previously referred to as a “machine colposcopist expert”, has the potential to revolutionize the diagnosis of cervical cancer. However, the shallow learning mechanism based on machine learning cannot learn the more natural characteristics of the data. The methods of machine-learning-based CAD systems include image preprocessing, region of interest (ROI) segmentation, feature extraction, and recognition of tumor segmentation [10]. Traditional CAD systems heavily rely on handcrafted features and the feature fusion of classifiers, and the traditional shallow learning structure cannot meet the practical requirements of complex function modeling [11]. The recognition of medical images and the process of feature extraction in CAD technology must be optimized to improve diagnostic accuracy.

Recently, artificial intelligence (AI) based on deep learning has accomplished outstanding achievements in the field of image processing [12]. It performs deep feature extraction through a multilayer neural network and simulates more function expressions by training neural network parameters to make the model more efficient. A deep learning algorithm for CAD can directly extract features from the data and thus significantly reduce the workload and the impact of human intervention. Through the inherent deep structure of the neural network, the three core steps of feature extraction, selection, and classification can be realized in the same deep structure optimization [13]. Therefore, deep learning algorithms are expected to address the shortcomings of traditional manual reading and shallow machine learning CAD systems. A convolutional neural network (CNN) is a multilayer perceptron-based algorithm [14]. As one of the most prominent deep learning models, CNNs have made great achievements in image recognition and classification (ImageNet 2012 Image Classification Contest [11]). In classification tasks, the typical output is the classified image label. However, in many visual tasks, especially in biomedical image processing, the desired output should include localization, i.e., a class label is supposed to be assigned to each pixel [12]. After the fully convolutional network (FCN) [15] completes semantic image segmentation, an FCN-based U-Net [16] model is favored for medical image segmentation. This model can extend the image-level classification to the pixel level to realize semantic image segmentation.

In recent years, deep-learning-based methods have been proposed to detect cervical lesions in colposcopic images directly. Earlier reported deep learning models of colposcopy images were only 50% accurate in diagnosing cervical lesions; moreover, they only identified cervical cancer and not precancerous lesions [17]. In our previous study [18], a pretrained densely connected convolutional network was established to classify colposcopic images, and the diagnostic accuracy for cervical high-grade precancerous lesions (HSIL+) was improved to 73.08%. However, cervical lesions could only be classified into two categories [19]. The CYENET method was proposed to classify cervical cancers from colposcopy images automatically without segmentation and feature engineering stages; it can also extract discriminative features using ensemble approaches. However, this method cannot distinguish between different grades of cervical intraepithelial neoplasia and can only be used to classify cervical cancer in different transformation zone sites. Chen et al. [20] presented a novel discriminative cervical lesion detection system (CervixNet), which incorporates a new global class activation module and a new local bin excitation module in an end-to-end approach. With these two modules, CervixNet facilitates learning about lesions both globally and locally. However, it can only distinguish between normal and HSIL colposcopic images and lacks the detection of cervical low-grade precancerous lesions (LSIL). Since LSIL and HSIL patients have vastly different treatment options, differential diagnoses of LSIL and HSIL are crucial. To address the multiclass classification of cervical lesions, we previously proposed [21] a RetinaNet-based cervical lesion detection method by combining ResNet with FPN. 
The model uses focal loss to down-weight simple, easy-to-separate samples so that training focuses on more important samples, thus enabling the detection of four levels of cervical lesions. However, the diagnostic accuracy of the model should be improved. Wang et al. [22] designed a dense block by extending DenseNet to FCNs and applied it to retinal vessel segmentation. Kolařík et al. [23] added interconnections between the same feature layers based on the U-Net model and applied it to brain and spine segmentation. The Dense-U-Net models reported in these studies have an accuracy of 95.1%–99.7% for disease diagnosis, which raises the question of whether Dense-U-Net is feasible for colposcopy image recognition. No relevant reports are available. The Dense-U-Net systems established in previous studies suffered from low utilization of image features, overfitting, and image background interference, while colposcopic images of cervical lesions show scattered lesion areas and inconsistent background color.

To correct the deficiencies in previous studies on intelligent recognition of colposcopic images, such as low utilization of image features, overfitting, and excessive use of global average pooling, we built an AI model based on a Dense-U-Net deep neural network. The model gains a feature-merging function by deepening the network structure and improving the convolutional block; moreover, global average pooling has been replaced with max pooling to effectively improve cervical lesion detection performance. Compared with previous studies, it has higher accuracy in identifying cervical lesions and can distinguish four different levels of cervical lesions [normal, cervical intraepithelial neoplasia 1 (CIN1), cervical intraepithelial neoplasia 2 (CIN2), and cervical intraepithelial neoplasia 3 or worse (CIN3+)].

2. Materials and methods

2.1. Study flowchart

The study flowchart, which is presented in Fig. 1, involved clinical data collection and image labeling (Fig. 1a), AI model training and validation based on deep learning (Fig. 1b), diagnostic testing performed by the AI model, and diagnosis of the same images by the AI model and colposcopists with different levels of experience (Fig. 1c).

2.2. Study population and diagnosis process

A total of 1870 participants from the Fujian Provincial Maternity & Children's Hospital (FMCH) and the Quanzhou First Hospital (QFH) were recruited for this study [24], [25], [26]. An additional 613 women were recruited in an independent colposcopy laboratory at the branch of Fujian Maternal and Child Health Hospital (Fujian Maternity Hospital), and their colposcopy images were collected as an independent test set. The study was registered in the National Clinical Trial Center (ChiCTR1800018279) and approved by the Ethics Committee of Fujian Maternity and Child Health Hospital (FMCH2018–075). All individuals in this study provided written informed consent. All participants received routine cervical cancer screening with HPV detection and cytology assays. According to the guidelines of the American Society for Colposcopy and Cervical Pathology (ASCCP) [27], all patients with abnormal cytological results and/or HR-HPV infections were required to have a colposcopy examination and cervical biopsy. If the biopsy result was a high-grade squamous intraepithelial lesion or worse (HSIL+, CIN2+), the patient underwent conization by loop electrosurgical excision procedure (LEEP) or a cold knife procedure. All specimens, including the biopsy and conization specimens, were independently examined by two pathologists, and a third physician was consulted in the case of a disagreement.

2.3. Image labeling and data preprocessing before Dense-U-Net model training

As shown in Fig. 1a, after the pathologic results were confirmed, each image from the participants taken during the colposcopy examination was reviewed again and independently described by two experienced colposcopists (with more than 15 years of experience). The suspected lesion region (acetic-acid-responsive) was given a colposcopy diagnosis and labeled to distinguish CIN1, 2, 3, and cancer. The image characteristics from the visual observation with a colposcope included white epithelium on the cervix, primitive squamous epithelium, primitive columnar epithelium, spot blood vessels, abnormal blood vessels, no mosaic, erosion, and transformation zone type, all of which were recorded and described. A third gynecologist was consulted in the case of a disagreement.

Data preprocessing consisted of two stages. Fig. 2a shows a flow diagram for image preprocessing, which was described in our previous study [18,28,29]. The first stage was to process the original image data and the pathological diagnosis report information obtained from the hospital. According to the pathological diagnosis report, the image files of different degrees of lesions were classified, and then the corresponding ROI images and mask labels were generated through the image annotation files. The second stage was a series of data augmentation methods (random rotation, lighting change, zoom change, and cropping) before neural network training.
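The second preprocessing stage can be illustrated with a minimal, pure-Python sketch. It is not the authors' pipeline: the helper names (`rotate90`, `adjust_lighting`, `crop`, `resize_nearest`, `augment`), the restriction of rotation to multiples of 90 degrees, the lighting range of 0.8 to 1.2, and the output size are all assumptions chosen for brevity. The point it shows is that geometric transforms must be applied identically to the ROI image and its mask label, while the lighting change applies to the image only.

```python
import random

def rotate90(img, k):
    """Rotate a 2D list clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def adjust_lighting(img, factor):
    """Scale pixel intensities and clip them to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]

def crop(img, top, left, size):
    """Cut a size x size window whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]

def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (acts as the zoom change)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def augment(image, mask, rng, out_size=4):
    """Return one randomly augmented (image, mask) pair.

    Assumes the image is at least out_size pixels on each side.
    """
    k = rng.randrange(4)                                    # random rotation
    image, mask = rotate90(image, k), rotate90(mask, k)
    image = adjust_lighting(image, rng.uniform(0.8, 1.2))   # image only
    h, w = len(image), len(image[0])
    size = rng.randint(out_size, min(h, w))                 # random crop size
    top, left = rng.randint(0, h - size), rng.randint(0, w - size)
    image, mask = crop(image, top, left, size), crop(mask, top, left, size)
    return resize_nearest(image, out_size), resize_nearest(mask, out_size)

# Toy 6x6 grayscale image with a 2x2 "lesion" mask in the centre.
rng = random.Random(0)
image = [[(r * 6 + c) * 7 for c in range(6)] for r in range(6)]
mask = [[1 if 2 <= r <= 3 and 2 <= c <= 3 else 0 for c in range(6)]
        for r in range(6)]
aug_image, aug_mask = augment(image, mask, rng)
```

In practice an image library would handle arbitrary rotation angles and interpolation; the sketch only fixes the order of operations and the image/mask pairing.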

2.4. Setting up the training dataset, validation dataset, and task levels

The training dataset was set up based on the cohort receiving cervical cancer screening. After histology was confirmed and the colposcopy images were reviewed and labeled, 1000 participants with 8006 acetic acid-treated cervical images (including 312 cases of normal cervixes and 1716 images, 280 cases of CIN1 and 2515 images, 224 cases of CIN2 and 1763 images, and 184 cases of CIN3 or higher and 1012 images) were included in the training dataset (Table S2). There were no specific requirements for imaging pixels or equipment. In the experiments, model training was divided into three binary classification tasks that incorporated the diagnostic tasks of CIN1+ (differential-diagnosis CIN1+ or not, two groups), CIN2+ (differential-diagnosis CIN2+ or not, two groups), and CIN3+ (differential-diagnosis CIN3+ or not, two groups). Moreover, according to the cases in the training dataset, four study levels (250, 500, 750, and 1000 cases) were separately trained and tested under the same parameter settings to verify whether the model could self-learn and self-improve in the case of an increase in training data. The AI model used five-fold cross-validation in which the training set was randomly divided into five subsets, ensuring that each subset was proportionally consistent, and each cervical lesion category in the subsets was balanced. The effect of the model was calculated by taking four subsets as the training set and the remaining one subset as the validation set. This procedure was repeated five times, with each subset being used once as the validation set. The average of the five cross-validation results was taken as the final result to evaluate the performance of the trained model.
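The task definitions and the five-fold split described above can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, and splitting at the case level (rather than the image level) is an assumption, since the text does not state which unit was used. The per-grade case counts are those reported for the training dataset.

```python
import random
from collections import defaultdict

GRADES = ["normal", "CIN1", "CIN2", "CIN3+"]

def binary_label(grade, task):
    """Return 1 if `grade` is at or above the threshold of `task`."""
    threshold = {"CIN1+": 1, "CIN2+": 2, "CIN3+": 3}[task]
    return int(GRADES.index(grade) >= threshold)

def stratified_folds(case_grades, n_folds=5, seed=0):
    """Split case ids into folds with near-equal proportions of each grade."""
    rng = random.Random(seed)
    by_grade = defaultdict(list)
    for case_id, grade in case_grades.items():
        by_grade[grade].append(case_id)
    folds = [[] for _ in range(n_folds)]
    for grade in GRADES:
        ids = sorted(by_grade[grade])
        rng.shuffle(ids)
        for i, case_id in enumerate(ids):   # deal cases out round-robin
            folds[i % n_folds].append(case_id)
    return folds

def cross_validation_splits(folds):
    """Yield (training, validation) case lists, one pair per fold."""
    for i, validation in enumerate(folds):
        training = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield training, validation

# Case counts per grade as reported for the 1000-case training dataset.
counts = {"normal": 312, "CIN1": 280, "CIN2": 224, "CIN3+": 184}
case_grades = {f"{grade}-{i}": grade
               for grade, n in counts.items() for i in range(n)}
folds = stratified_folds(case_grades)
splits = list(cross_validation_splits(folds))
```

Each of the three binary tasks would then relabel the same folds with `binary_label`, and the final score would be the mean over the five validation results.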

2.5. Deep learning Dense-U-Net model

Through a comparative analysis of several classical CNN models, and considering the inconsistent background color and scattered lesion areas in cervical colposcopy image data, we propose a deep CNN suitable for cervical image lesion detection and classification. It is based on U-Net (a fully convolutional network commonly used in medical image segmentation) and employs several techniques from DenseNet [30] (which received the Best Paper Award at CVPR 2017). Similar to DenseNet, our model strengthens feature propagation, encourages feature reuse, alleviates the vanishing-gradient problem, significantly reduces the number of parameters, and suppresses overfitting. Therefore, we call the proposed model Dense-U-Net. Previous studies have reported the application of Dense-U-Net in the medical field. Wang et al. [22] designed a dense block by extending DenseNet to FCNs and applied it to retinal vessel segmentation. Kolařík et al. [23] added interconnections between the same feature layers based on the original U-Net model and applied it to brain and spine segmentation. Although the Dense-U-Net model has been used in previous studies, those systems had low utilization of image features, overfitting, and image background interference. Furthermore, colposcopic images of cervical lesions have unfavorable diagnostic factors, such as scattered lesion areas and inconsistent background color. To improve the diagnostic accuracy and realize multiclass classification of cervical lesions, we improved the system as follows: (1) We deepened the original network and added feature delivery to each convolution block. The improvement in the convolution block comes mainly from a feature merge, which combines the input and output of the first convolution as the input of the second convolution. This technique enhances the delivery of features and effectively controls overfitting in deeper networks. Through the deeper network structure and feature merge, the model can better learn subtle differences in the images and enhance the learning of lesion characteristics, as shown in Fig. 2b. (2) Dropout is used in each convolutional block to reduce overfitting [31]. (3) Max pooling replaces the global average pooling in the transition layer. As shown in Fig. 2b (left), a transition layer is applied to the concatenation of convolution blocks in the Dense-U-Net encoding stage. The transition layer uses max pooling because it preserves more texture information, whereas global average pooling mainly preserves the background information of the image. For detecting the lesion area in a cervical image, the texture information of the lesion area is significantly more important than the background information. Fig. 2b shows the network structure. The final model contains eleven convolution blocks, five transition layers, a total of twenty-three convolutions, five global max pools, and five up-samples.
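The rationale for preferring max pooling can be seen in a toy example. The sketch below is an illustration only, assuming a non-overlapping 2x2 window and a hand-made 4x4 feature map; it does not reproduce the transition layer of the model, whose window size and stride are not given in the text. It shows that a single bright texture response survives max pooling unchanged, while average pooling dilutes it toward the background level.

```python
def pool2x2(feature_map, reduce_fn):
    """Apply a non-overlapping 2x2 pooling window to a 2D list of numbers."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Toy feature map: dark background (0.1) with one bright texture spot (0.9).
feature_map = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]

max_pooled = pool2x2(feature_map, max)    # keeps the 0.9 peak
avg_pooled = pool2x2(feature_map, mean)   # peak is averaged down to about 0.3
```

With max pooling the upper-left output cell stays at 0.9, whereas with average pooling it drops to roughly 0.3, much closer to the 0.1 background.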

2.6. Diagnostic accuracy analysis and “Number needed for a diagnosis”

To test the AI model, 870 cases with 3149 original colposcopy images, including 431 normal cases with 1532 images, 229 cases of CIN1 with 883 images, 107 cases of CIN2 with 382 images, 69 cases of CIN3 with 222 images, and 34 cases of cancer with 130 images, were included in the test dataset. The clinical characteristics between the training and test datasets did not show any significant differences (Table S1). Options for diagnosis (normal, CIN1, CIN2, CIN3, and ICC) were required for each case. Three tasks, including the diagnosis of CIN1+, CIN2+, and CIN3+, were conducted for 870 cases. The number of images needed to achieve an accurate diagnosis was also evaluated (Table S2). For each diagnostic test or comparison, one image was randomly selected from the case's image file of the 870-case test dataset. Therefore, 870 images were analyzed every time.

2.7. Comparative test between the Dense-U-Net model and human experts

A comparison test consisting of 870 images of various challenging clinical situations was designed to compare the real-world performance of the AI algorithm with that of individual gynecologists, and it was performed three times (Fig. 1c). In each test, gynecologists with three levels of expertise (expert, senior, and junior) were asked to independently complete the same test task as the AI model without any additional information. The expert gynecologists were professors with over 15 years of experience in the gynecology department. The senior gynecologists were attending physicians with over 10 years of experience in the gynecology department. The junior gynecologists were residents who had finished both clinical training and specific training in gynecology. Each physician independently performed colposcopy examinations for at least 200 cases per year.

2.8. Performance of the Dense-U-Net model on an independent test set

The performance of the Dense-U-Net model was evaluated on an independent external test set. We recruited 613 women in the independent colposcopy laboratory of the Fujian Maternity and Child Health Hospital Branch (Fujian Maternity Hospital) and collected their colposcopy images as an independent test set. The colposcopy images in the independent test set were randomly numbered, and all images were read by both the Dense-U-Net model and expert colposcopists, each blinded to the other's results. The diagnostic performance of the Dense-U-Net model on the independent test set was evaluated using accurate detection, missed diagnosis, and false-positive indicators.

2.9. Statistical analysis

In the experiments, the receiver operating characteristic (ROC) curve was used to represent the performance of the deep CNN model according to the diagnostic results. The diagnostic levels of different physicians were marked on the ROC curve to intuitively express the diagnostic performance of our model compared with that of different physicians. ROC curves plot the true-positive rate (sensitivity) versus the false-positive rate (1 − specificity) [32], and the area under the curve (AUC) indicates the diagnostic performance of the model. The ROC curve is obtained by varying the classification score threshold and calculating the corresponding true-positive and false-positive rates. Diagnostic performance was also evaluated using sensitivity (SEN), specificity (SPEC), PPV, and negative predictive value (NPV) [33]. The SEN reflects the ability of the model to identify actual patients, and the SPEC reflects its ability to identify disease-free persons. The PPV is the proportion of subjects with a positive test result who truly have the disease; the NPV is the proportion of subjects with a negative test result who are truly disease-free.
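The indices defined above follow directly from the four confusion-matrix counts, and the ROC curve from sweeping a score threshold. The sketch below illustrates these standard definitions in plain Python; the function names and the toy counts, scores, and labels are invented for illustration and are not data from this study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute SEN, SPEC, PPV, NPV and ACC from confusion-matrix counts."""
    return {
        "SEN": tp / (tp + fn),    # diseased subjects correctly detected
        "SPEC": tn / (tn + fp),   # disease-free subjects correctly excluded
        "PPV": tp / (tp + fp),    # positive results that are truly diseased
        "NPV": tn / (tn + fn),    # negative results that are truly healthy
        "ACC": (tp + tn) / (tp + fp + tn + fn),
    }

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical counts and scores, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

A physician's single diagnosis corresponds to one (FPR, TPR) point, which is how individual readers can be marked on the model's ROC curve.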

3. Results

Overall, 2475 participants with 13,084 colposcopy images were included in this study. The training set consisted of 1000 participants with 8006 lesion-labeled images, and the other 870 participants with 3149 original images were randomly selected for either the test data or the human–AI comparison test. Meanwhile, 605 eligible participants from independent laboratories were recruited as an independent test set, and 1929 colposcopy images were collected. Table S1 shows that the basic information and the clinical symptom information, such as age, HPV infection status, cytology results, menopause status, and the number of pregnancies and births of the participants included in the training and test datasets, were not significantly different (P > 0.05).

3.1. Lesion detection results based on the improved Dense-U-Net AI model

The deep Dense-U-Net process for the diagnosis of cervical lesions is shown in Fig. 2c. The three main steps, i.e., training, testing, and diagnosis, represent different processes. The main purpose of the training phase (Fig. 3a) is to optimize the weights of the Dense-U-Net so that the training loss is as low as possible. The optimization is achieved by computing the loss through forward propagation and updating the weights through back propagation (BP), which continuously reduces the loss. During the testing phase (Fig. 3a), the improved Dense-U-Net model optimized in the training phase is used for testing. Then, the diagnosis (Fig. 3a) can be performed by simply distinguishing the test results. The model is based on the U-Net framework and the main concept of DenseNet. The main frame of the network is a fully convolutional network, and the output of the network is a grayscale image of the lesion area. The binary classification of the image is achieved through the detection of the lesion area. As shown in Fig. 3a, the lesion detection results of this AI model were examined in different classification tasks. Examples of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) are shown separately.
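One way to turn a grayscale lesion mask into an image-level binary decision is sketched below. The text does not specify the decision rule, so this is an assumption throughout: the function name `classify_from_mask`, the pixel threshold of 0.5, and the minimum lesion fraction of 1% are hypothetical values chosen only to make the idea concrete.

```python
def classify_from_mask(mask, pixel_threshold=0.5, min_lesion_fraction=0.01):
    """Label an image positive if enough mask pixels exceed the threshold.

    `mask` is a 2D list of values in [0, 1]; both thresholds are
    illustrative assumptions, not values reported in the study.
    """
    total = sum(len(row) for row in mask)
    lesion = sum(1 for row in mask for p in row if p >= pixel_threshold)
    fraction = lesion / total
    return fraction >= min_lesion_fraction, fraction

# Toy 4x4 masks: one with a small predicted lesion, one without.
lesion_mask = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.8, 0.8, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
clean_mask = [[0.1] * 4 for _ in range(4)]

is_positive, fraction = classify_from_mask(lesion_mask)
is_negative_case, _ = classify_from_mask(clean_mask)
```

Comparing such image-level decisions with the pathology label yields the TP, TN, FP, and FN counts used in the result figures.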

Fig 3 — **Correct diagnosis, false positive and missed diagnosis of the AI model for cervical lesions in the different training datasets.** (a) Test results of the AI model. The blue dashed box is the test data of the negative sample, and the red dashed box is the test data of the positive sample. Test results for the model CIN1+ diagnostic task (negative sample: normal; positive sample: CIN1/CIN2/CIN3/Cancer). Test results for the model CIN2+ diagnostic task (negative sample: normal/CIN1; positive sample: CIN2/CIN3/Cancer). Test results for the model CIN3+ diagnostic task (negative sample: normal/CIN1/CIN2; positive sample: CIN3/Cancer). (b) Diagnostic performance of the AI model for CIN1+ in different training sets. (c) Diagnostic performance of the AI model for CIN2+ in different training sets. (d) Diagnostic performance of the AI model for CIN3+ in different training sets. (e) One colposcopy image of CIN1+, CIN2+, and CIN3+ was placed in round 1, round 2 and round 3, respectively; these rounds were run for finding a needle in a haystack test. The AI model successfully excluded all normal cases and identified three cases of cervical lesions. Abbreviations: AI, artificial intelligence; TN, true negatives; FP, false positives; TP, true positives; FN, false negatives; CIN1+, cervical intraepithelial neoplasia 1 or worse; CIN2+, cervical intraepithelial neoplasia 2 or worse; CIN3+, cervical intraepithelial neoplasia 3 or worse.

3.2. Dense-U-Net model and self-learning validation

To verify the self-improvement of the Dense-U-Net model performance when the number of training data samples increased, 1000 cases were set to four different training dataset sizes (250, 500, 750, and 1000 cases) in the experiment. Each of the training datasets was trained on the same diagnostic tasks for CIN1+ (a disease with CIN1 or worse), CIN2+, and CIN3+. This part of the experiment verified the impact of an increase in data on the performance of the model. Therefore, the hyperparameters in Dense-U-Net training and the ratio of positive to negative samples were consistent among different tasks. After 5-fold cross-validation for each training task of the different training datasets, each trained model was tested three times on 870 images. The performance of the model was evaluated by calculating the average diagnostic score of the 5-fold cross-validation model test results. The results of the average accuracy are shown in Table S3 and Fig. 3b-e. The increase in the amount of training data is important for the diagnostic performance of the model (the more images in the training set, the higher the diagnostic performance of the model). This importance is especially reflected in the diagnosis of CIN3+ (P < 0.05). Fig. 3b-d more intuitively shows the self-improvement ability of the Dense-U-Net model with increasing amounts of training data.

3.3. Finding a needle in a haystack test

We aimed to further validate the ability of the Dense-U-Net model to detect lesions among normal cases. The dataset consisted of 297 normal colposcopy images and 3 abnormal colposcopy images (one each of CIN1, CIN2, and CIN3) and was divided into three rounds for three independent validations. One abnormal image was placed in each round: CIN1+ in round 1, CIN2+ in round 2, and CIN3+ in round 3. In each round, the Dense-U-Net model successfully excluded all normal cases and identified the target case with a cervical lesion (Fig. 3e).

3.4. Results of image classification and diagnosis using the Dense-U-Net model

The pathological diagnostic report was used as the gold standard for the test data. We calculated the corresponding true-positive, true-negative, false-positive, and false-negative rates from the mask output of each test image and then calculated the accuracy (ACC), SEN, SPEC, PPV, and NPV. As shown in Fig. 3b–e and Fig. 4, in the best test results of the model trained on the 1000-case dataset, the Dense-U-Net model distinguished normal from CIN1/CIN2/CIN3+ with an ACC of 66.2%, SEN of 64.4%, SPEC of 68%, PPV of 67.2%, and NPV of 65.2% in the CIN1+ classification task. In the CIN2+ classification task, which distinguishes normal/CIN1 from CIN2/CIN3+, the ACC was 75.8%, SEN was 43.6%, SPEC was 87.2%, PPV was 54.4%, and NPV was 81.5%. Finally, in the CIN3+ classification task, the ACC was 87.5%, SEN was 35.1%, SPEC was 87.2%, PPV was 49.1%, and NPV was 91.2%. Using the AI model to detect CIN1+, CIN2+, or CIN3+ lesions, we found that accurate detection increased linearly from CIN1+ to CIN3+, and missed detection decreased linearly. However, the FP rate showed no clear trend. Moreover, across the four training dataset sizes (250, 500, 750, and 1000 cases), the accurate detection of the Dense-U-Net model increased linearly with the number of cases (Fig. 3b–d).

Fig 4 — **Diagnostic ROC curve and diagnostic performance of the AI model for the diagnosis of cervical lesions for the different training sets.** (a) Comparison of AUC curve areas of 4 different training sets in CIN1+. (b) Comparison of AUC curve areas of 4 different training sets in CIN2+. (c) Comparison of AUC curve areas of 4 different training sets in CIN3+. (d) Comparison of the performance metrics of 4 different training sets for the diagnosis of CIN1+/CIN2+/CIN3+, including sensitivity, specificity, negative predictive value and positive predictive value. Abbreviations: SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value.

3.5. Number needed for a diagnosis and diagnostic accuracy test

Table S2 shows how many images are needed for the Dense-U-Net model to achieve a valid and correct diagnosis. The results demonstrate that the number of cases needed to diagnose CIN2+ and CIN3+ was less than that for CIN1+ and normal cases. Moreover, the number of corrected diagnoses for normal cases, CIN2+, and CIN3+ were identical and were significantly higher than that for CIN3+ alone.

3.6. Comparison of the Dense-U-Net model with human experts

For the same 870-image test set, physicians with different levels of expertise were asked to make a diagnosis based only on their own clinical experience. The Dense-U-Net model was compared with the physicians in terms of ACC and other indicators to evaluate its diagnostic performance. With 5-fold cross-validation, the Dense-U-Net model required nearly 10 h for each training task. Once trained, however, its diagnostic efficiency was very high: compared with manual diagnosis, the time required was greatly reduced while accuracy was maintained. The Dense-U-Net system took 1.76 ± 0.09 min to diagnose the 870 test images, while the colposcopy expert, senior physicians, and junior physicians took 716.3 ± 49.76, 892.1 ± 92.30, and 3034.7 ± 259.51 min, respectively.
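To put the reported reading times on a per-image scale, the arithmetic can be sketched as follows. The mean times and the image count are taken from the text; the function and variable names are hypothetical, and the standard deviations are ignored for simplicity.

```python
N_IMAGES = 870  # size of the shared test set reported in the text

# Mean total reading times in minutes, as reported in the text.
MEAN_MINUTES = {
    "Dense-U-Net": 1.76,
    "expert": 716.3,
    "senior": 892.1,
    "junior": 3034.7,
}


def seconds_per_image(total_minutes: float, n_images: int = N_IMAGES) -> float:
    """Average time spent on one image, in seconds."""
    return total_minutes * 60.0 / n_images


def speedup_over(reader: str, baseline: str = "Dense-U-Net") -> float:
    """How many times longer `reader` took than `baseline` on the same set."""
    return MEAN_MINUTES[reader] / MEAN_MINUTES[baseline]


for reader, minutes in MEAN_MINUTES.items():
    print(f"{reader}: {seconds_per_image(minutes):.2f} s/image, "
          f"{speedup_over(reader):.0f}x the model's time")
```

On these means, the model spends roughly a tenth of a second per image, and the expert takes roughly 400 times as long as the model on the same set. The figures exclude the roughly 10 h of training per task.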

The partial test results are presented in Fig. 1c. Comparing the diagnostic performance of the Dense-U-Net model with that of the expert, senior, and junior physicians on cervical lesion screening showed that the Dense-U-Net model had the lowest false-positive rate for CIN1+ cervical lesions, followed by the expert physicians; the diagnostic efficacy of the senior physicians was relatively low, and that of the junior physicians was the lowest. The evaluation networks of the AI algorithm performed well (11% FP and 22% missed diagnosis) in identifying CIN1+ cases in the 870-image test dataset when compared with the individual physicians (expert: 13% FP and 19% missed diagnosis; senior: 18% FP and 20% missed diagnosis; junior: 20% FP and 21% missed diagnosis), although its missed-diagnosis rate was slightly higher than theirs. Similar results were seen for CIN2+ and CIN3+ (Table 1). Therefore, we consider the performance of the Dense-U-Net agent to be comparable to that of a qualified physician. In addition, we found no difference between Dense-U-Net and colposcopic experts in the differential diagnosis of the type I/II cervical transformation zone (0.87 vs 0.88, P = 0.506). However, the Dense-U-Net system was substantially more accurate than colposcopic experts in identifying colposcopic images of the type III cervical transformation zone (0.91 vs 0.80, P < 0.001), as shown in Table 2.

Table 1.

Diagnostic performance of the AI model compared to colposcopy physicians with different levels of experience.

Subject	Junior	Senior	Expert	AI model	P value ^a
CIN1+
Accurate detection	0.59(0.52–0.65)	0.62(0.54–0.70)	0.68(0.59–0.76)	0.67(0.59–0.73)	<0.001
Missed diagnosis	0.21(0.15–0.26)	0.20(0.14–0.26)	0.19(0.15–0.23)	0.22(0.16–0.28)	<0.001
False positive	0.20(0.16–0.24)	0.18(0.14–0.22)	0.13(0.09–0.16)	0.11(0.08–0.15)	<0.001
CIN2+
Accurate detection	0.65(0.37–0.52)	0.72(0.68–0.76)	0.73(0.54–0.62)	0.75(0.70–0.80)	0.001
Missed diagnosis	0.13(0.09–0.17)	0.10(0.07–0.14)	0.10(0.08–0.12)	0.15(0.10–0.19)	<0.001
False positive	0.22(0.18–0.26)	0.18(0.15–0.21)	0.17(0.13–0.23)	0.10(0.05–0.14)	<0.001
CIN3+
Accurate detection	0.79(0.74–0.80)	0.84(0.80–0.89)	0.85(0.81–0.90)	0.89(0.86–0.93)	<0.001
Missed diagnosis	0.11(0.08–0.14)	0.08(0.06–0.10)	0.07(0.04–0.10)	0.06(0.04–0.09)	0.002
False positive	0.10(0.07–0.13)	0.08(0.07–0.09)	0.08(0.06–0.11)	0.05(0.03–0.08)	<0.001

^a AI model compared with expert colposcopists.

Abbreviations: CIN, cervical intraepithelial neoplasia; AI model, improved Dense-U-Net model.
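One way to read the three rows of Table 1 is as shares of all test images: correctly classified, missed (false negative), and falsely positive. The text does not state this interpretation, so it is an assumption here. The sketch below checks it against the point estimates copied from Table 1; the function name is hypothetical.

```python
# Point estimates from Table 1, in the order
# (accurate detection, missed diagnosis, false positive).
TABLE_1 = {
    "CIN1+": {"Junior": (0.59, 0.21, 0.20), "Senior": (0.62, 0.20, 0.18),
              "Expert": (0.68, 0.19, 0.13), "AI model": (0.67, 0.22, 0.11)},
    "CIN2+": {"Junior": (0.65, 0.13, 0.22), "Senior": (0.72, 0.10, 0.18),
              "Expert": (0.73, 0.10, 0.17), "AI model": (0.75, 0.15, 0.10)},
    "CIN3+": {"Junior": (0.79, 0.11, 0.10), "Senior": (0.84, 0.08, 0.08),
              "Expert": (0.85, 0.07, 0.08), "AI model": (0.89, 0.06, 0.05)},
}


def rate_sums(table: dict) -> dict:
    """Sum the three rates for every (task, reader) pair."""
    return {
        (task, reader): sum(rates)
        for task, readers in table.items()
        for reader, rates in readers.items()
    }


sums = rate_sums(TABLE_1)
# Allow a small tolerance for floating-point rounding.
all_sum_to_one = all(abs(total - 1.0) < 0.005 for total in sums.values())
print("every column sums to 1:", all_sum_to_one)
```

Each column sums to 1.00, which is consistent with this reading. Under it, "accurate detection" corresponds to ACC, while the other two rows are fractions of all images rather than the false-negative and false-positive rates conditional on true status.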

Table 2.

Diagnostic performance of AI and colposcopy experts for CIN3+ in patients with different cervical transformation zones.

Subject	Expert	AI model	P value ^a
Ⅰ/Ⅱ types(n = 501) ^b
CIN3+
Accurate detection	0.88(0.84–0.91)	0.87(0.84–0.90)	0.506
Missed diagnosis	0.05(0.03–0.07)	0.02(0.01–0.03)	0.001
False positive	0.07(0.06–0.08)	0.12(0.10–0.14)	0.012
Sensitivity	0.55(0.42–0.68)	0.84(0.74–0.94)	0.001
Specificity	0.92(0.90–0.95)	0.87(0.84–0.90)	0.012
NPV	0.94(0.92–0.96)	0.98(0.96–0.99)	0.012
PPV	0.47(0.35–0.59)	0.45(0.35–0.54)	0.778
Ⅲ types(n = 369) ^c
CIN3+
Accurate detection	0.80(0.74–0.85)	0.91(0.86–0.95)	<0.001
Missed diagnosis	0.03(0.02–0.04)	0.02(0.01–0.03)	0.116
False positive	0.17(0.13–0.20)	0.08(0.05–0.10)	<0.001
Sensitivity	0.74(0.62–0.87)	0.88(0.78–0.97)	0.116
Specificity	0.81(0.77–0.85)	0.91(0.88–0.94)	<0.001
NPV	0.96(0.93–0.98)	0.98(0.96–0.99)	0.102
PPV	0.36(0.27–0.46)	0.59(0.47–0.70)	0.005

^a AI model compared with expert colposcopists.

^b Type 1 or type 2 cervical transformation zone.

^c Type 3 cervical transformation zone.

Abbreviations: CIN3+, cervical intraepithelial neoplasia grade 3 or worse; AI model, the improved Dense-U-Net model.

3.7. Performance of the Dense-U-Net model on an independent test set

A total of 613 women were recruited for a separate test set. Of these, 8 women were excluded because cervical histopathology was not performed, and a total of 1929 colposcopy images from 605 women were included. The diagnostic accuracy of the Dense-U-Net model for CIN1+, CIN2+, and CIN3+ colposcopy images was higher than that of expert colposcopy physicians (P = 0.041, P = 0.008, and P < 0.001, respectively). This was mainly reflected in a significant decrease in the missed diagnosis rate (P = 0.008, P < 0.001, and P = 0.013, respectively). The results on this independent test set further support the Dense-U-Net model established in this study as an effective and applicable intelligent diagnostic system for colposcopy images (Table 3).

Table 3.

Validation of the diagnostic performance of the AI model and colposcopy experts in an independent test set.

Subject	Expert	AI model	P value ^a
CIN1+
Accurate detection	0.62 (0.58–0.66)	0.68 (0.64–0.71)	0.041
Missed diagnosis	0.11 (0.06–0.16)	0.07 (0.02–0.11)	0.008
False positive	0.27 (0.22–0.32)	0.26 (0.20–0.31)	0.602
CIN2+
Accurate detection	0.75 (0.71–0.78)	0.81 (0.78–0.84)	0.008
Missed diagnosis	0.07 (0.00–0.17)	0.02 (0.00–0.08)	<0.001
False positive	0.18 (0.14–0.21)	0.17 (0.14–0.21)	0.820
CIN3+
Accurate detection	0.83 (0.80–0.86)	0.92 (0.89–0.94)	<0.001
Missed diagnosis	0.03 (0.00–0.17)	0.01 (0.00–0.11)	0.013
False positive	0.14 (0.11–0.17)	0.07 (0.05–0.10)	<0.001

Abbreviations: CIN1+/2+/3+, cervical intraepithelial neoplasia grade 1/2/3 or worse; AI model, the improved Dense-U-Net model.

4. Discussion

In this study, a deep learning Dense-U-Net model was used to establish a colposcopy image diagnosis system for cervical lesions. Under the semantic-level labeling of professional physicians, the deep learning model performs very well on limited datasets and can learn the most critical features of colposcopy images through supervised learning methods. The diagnostic performance of the model is also good. For the diagnostic tasks of CIN3+, the diagnostic accuracy of the novel Dense-U-Net model established in this study reached 89%, surpassing that of expert physicians (with 20 years of clinical experience).

We found that the accuracy of the Dense-U-Net model for the diagnosis of CIN3+ (89% vs 85%, P < 0.001) and CIN2+ (75% vs 73%, P = 0.001) was significantly higher than that of colposcopy experts. The 2019 American Society for Colposcopy and Cervical Pathology (ASCCP) guidelines [34] point out that the aim of cervical lesion diagnosis is to detect CIN3+ lesions in time so that patients can receive timely treatment. Therefore, the evaluation of a diagnostic technology should focus on its diagnostic performance for CIN3+. Colposcopy often misses the diagnosis in patients with minimal cervical lesions [35]; the Dense-U-Net model presented here uses a deepened model structure, feature transfer functionality in each convolutional block, and max pooling, which improves the identification of small cervical lesions. In addition, cervical lesions often occur in the transformation zone. There are three types of cervical transformation zones in women, of which types I and II account for approximately 60%; for these, colposcopy is easier and more accurate. The type III transformation zone accounts for approximately 40%; for it, colposcopy is difficult and the results are prone to misdiagnosis [36]. The Dense-U-Net model we established was significantly more accurate than expert colposcopy physicians in diagnosing cervical lesions in women with a type III transformation zone (P < 0.05).

Several experiments were performed to verify the performance and self-improvement of the deep neural network model with increasing amounts of training data. Although the improvement in diagnostic results from training with 250 to 1000 cases was not significant, the performance of the model still improved slightly with the support of data augmentation methods, and the results of 5-fold cross-validation were also more stable. In addition, for training sets of the same data volume, the ACC, SEN, SPEC, PPV, and NPV of the CNN model showed stable trends across the CIN1+ to CIN3+ diagnostic tasks (Fig. 1c). The progressive increase in ACC was mainly due to the ease of identifying more severe lesions [17,37]. The changes in SEN, SPEC, PPV, and NPV arose mainly because the proportion of positive and negative samples differs by diagnostic task. Across the CIN1+ to CIN3+ tasks, a higher proportion of positive samples was accompanied by higher SEN and PPV; similarly, a higher proportion of negative samples was accompanied by higher SPEC and NPV.
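The dependence of PPV and NPV on the share of positive samples follows from Bayes' rule and can be illustrated with a short sketch. The SEN and SPEC values below are hypothetical and are held fixed across prevalences, which is a simplification: in the study, SEN and SPEC also differed between tasks. The function names are illustrative only.

```python
def ppv(sen: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from SEN, SPEC, and the positive-sample share."""
    true_pos = sen * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sen: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from SEN, SPEC, and the positive-sample share."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sen) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical operating point; NOT taken from the study.
SEN, SPEC = 0.80, 0.90

# A broad task (such as CIN1+) has a larger share of positives than a narrow
# one (such as CIN3+); the shares below are invented for illustration.
for share in (0.5, 0.3, 0.1):
    print(f"positives={share:.0%}  PPV={ppv(SEN, SPEC, share):.3f}  "
          f"NPV={npv(SEN, SPEC, share):.3f}")
```

With SEN and SPEC fixed, PPV falls and NPV rises as positives become rarer, which matches the direction of the PPV and NPV changes from the CIN1+ task to the CIN3+ task reported in the results.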

The purpose of a CAD system is to effectively improve the level and efficiency of medical care through computer-related technologies and theories [11,38]. We focused on comparing the effectiveness of manual diagnosis and deep neural network diagnosis. The trained neural network model diagnosed the 870 test images in less than 2 min, far less time than manual diagnosis required, and it is not affected by human factors. Because manual diagnosis depends largely on the clinician's experience and knowledge, avoidable misdiagnosis due to fatigue and other external factors may occur [39,40]. With the help of an effective CAD system, current medical diagnosis can be greatly improved in terms of diagnostic efficiency.

This study has several limitations. First, although 1870 cases with 11,155 images and valid pathological data were acquired in the cohort, the dataset is still relatively small for deep learning. Therefore, more data are required to verify the robustness of the model. Second, the lesion area on the cervical images in the training data was labeled by a professional colposcopist using a bounding box. The advantage of bounding box labeling is that it allows the deep neural network to learn critical image features and acts in a role similar to an attention mechanism. The disadvantage is also apparent: for large amounts of raw data, labeling by professional physicians requires a great deal of time. Third, model training was mainly based on the transfer learning method. We needed only to load the pretrained model weights, freeze multiple convolutional layers in the CNN model, and train and update the weights of the convolutional layers at the end of the CNN. Although transfer learning can be performed with small training datasets, the pretrained model weights mainly encode image features extracted from a large amount of other image data, and the data used for that pretraining (ImageNet, etc.) are often very different from medical image data [41,42]. Therefore, the performance of our model depends largely on the model weights from pretraining. If the weights of these pretrained models were trained and optimized on a large number of medical images, we would expect our model's performance to improve.
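The freeze-and-fine-tune idea described in the third limitation can be sketched without any deep learning framework. The example below is a toy illustration, not the authors' training code: the `Layer` class, the function names, the weights, and the gradients are all hypothetical, and a real implementation would rely on a framework's own mechanism for excluding parameters from updates.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """A toy layer: a name, a flat list of weights, and a trainable flag."""
    name: str
    weights: list[float]
    trainable: bool = True


def freeze_all_but_last(layers: list[Layer], n_trainable: int) -> None:
    """Freeze every layer except the final `n_trainable` ones."""
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


def sgd_step(layers: list[Layer], grads: dict[str, list[float]], lr: float) -> None:
    """Apply one gradient-descent update, skipping frozen layers."""
    for layer in layers:
        if not layer.trainable:
            continue  # pretrained weights stay exactly as loaded
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# Hypothetical "pretrained" weights and gradients, for illustration only.
model = [
    Layer("conv1", [0.5, -0.2]),
    Layer("conv2", [0.1, 0.3]),
    Layer("conv_out", [0.0, 0.0]),
]
freeze_all_but_last(model, n_trainable=1)
sgd_step(model, {layer.name: [1.0, 1.0] for layer in model}, lr=0.1)
for layer in model:
    print(layer.name, layer.trainable, layer.weights)
```

Only the final layer moves after the update; the earlier layers keep their pretrained values. This is why the result depends so heavily on how well the pretraining data match the target images.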

5. Conclusion

In summary, we successfully built an AI model based on a Dense-U-Net deep neural network to diagnose cervical lesions from images acquired via colposcopy. Furthermore, the comparison of the Dense-U-Net model with manual diagnosis showed that the model is reliable, quick, and effective for the classification and diagnosis of cervical lesions by colposcopy. The overall diagnostic efficacy was good, although there is room for improvement. In some diagnostic tasks, the model performed better than an experienced clinician, which is highly clinically significant. In future work, we will apply the results of this theoretical research to practical medical diagnostic tasks and further improve the effectiveness of AI in the early screening of cervical precancerous lesions.

CRediT authorship contribution statement

Binhua Dong: Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. Huifeng Xue: Formal analysis, Investigation, Writing – original draft. Ye Li: Methodology, Writing – original draft, Writing – review & editing. Ping Li: Methodology. Jiancui Chen: Investigation. Tao Zhang: Formal analysis. Lihua Chen: Writing – original draft. Diling Pan: Investigation. Peizhong Liu: Conceptualization, Methodology, Project administration, Supervision, Writing – review & editing. Pengming Sun: Conceptualization, Methodology, Project administration, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

This work was supported by grants from the National Key R&D Program of China (2021YFC2701205), the National Natural Science Foundation of China (82271658), the Fujian Provincial Natural Science Foundation of China (2021J01408, 2021J01403), the Fujian Provincial Health and Education Joint Project (2019-WJ-05) and the Fujian Provincial Health Commission Innovation Project (2019-CX-7). We thank the Chief Physicians Liyu Dai, Yuchun Lv, and Jinwen Zheng; the senior doctors Qibin Wu, Dabin Liu, and Liangzhi Cai; Guanyu Ruan; and the residents Guifen Lui, Lili Chen, Meimei Huang, and Jian An for their assistance with this study.

Biographies

Binhua Dong received B.S. and M.S. degrees from Fujian Medical University. He currently works at Fujian Maternity and Child Health Hospital, College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University. His research interests focus on the development of cervical cancer, viral infection, and the use of artificial intelligence in cervical cancer.

Pengming Sun received his Ph.D. degree from the Peking University Health Science Center, China, in 2005 and his MD degree from the Charité Medical Center of Humboldt University and Free University of Berlin, Germany, in 2006. He currently serves as chief surgeon and professor at the Fujian Maternity and Child Health Hospital, College of Clinical Obstetrics Gynecology and Pediatrics, Fujian Medical University. His research interests focus on the development and metabolism of gynecological malignant tumors, viral infection, and the use of artificial intelligence in gynecological malignant tumors. He has published over 80 articles in SCI-indexed journals, with an h-index of 17 and a cumulative impact factor above 290.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.fmre.2022.09.032.

Contributor Information

Peizhong Liu, Email: pzliu@hqu.edu.cn.

Pengming Sun, Email: sunfemy@hotmail.com.

Appendix. Supplementary materials

mmc1.docx (26.8 KB, docx)

References

1. Sung H., Ferlay J., Siegel R.L., et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021;71(3):209–249. doi: 10.3322/caac.21660.
2. Sawaya G.F., Smith-McCune K., Kuppermann M. Cervical Cancer Screening: More Choices in 2019. JAMA. 2019;321(20):2018–2019. doi: 10.1001/jama.2019.4595.
3. Dong B.H., Zou H.C., Mao X.D., et al. Effect of introducing human papillomavirus genotyping into real-world screening on cervical cancer screening in China: A retrospective population-based cohort study. Ther. Adv. Med. Oncol. 2021;13. doi: 10.1177/17588359211010939.
4. Khan M.J., Werner C.L., Darragh T.M., et al. ASCCP colposcopy standards: Role of colposcopy, benefits, potential harms, and terminology for colposcopic practice. J. Low Genit. Tract. Dis. 2017;21(4):223–229. doi: 10.1097/LGT.0000000000000338.
5. Baasland I., Hagen B., Vogt C., et al. Colposcopy and additive diagnostic value of biopsies from colposcopy-negative areas to detect cervical dysplasia. Acta Obstet. Gynecol. Scand. 2016;95(11):1258–1263. doi: 10.1111/aogs.13009.
6. Dalla Palma P., Giorgi Rossi P., Collina G., et al. The risk of false-positive histology according to the reason for colposcopy referral in cervical cancer screening: A blind revision of all histologic lesions found in the NTCC trial. Am. J. Clin. Pathol. 2008;129(1):75–80. doi: 10.1309/EWYGWFRRM8798U5P.
7. Ferris D.G., Condorhuaman W.S.G., Waller J.L., et al. Polarized light colposcopy compared with standard colposcopy. J. Low Genit. Tract. Dis. 2015;19(3):234–238. doi: 10.1097/LGT.0000000000000111.
8. Doi K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007;31(4–5):198–211. doi: 10.1016/j.compmedimag.2007.02.002.
9. Li W., Van Raad V., Gu J., et al. Computer Vision for Biomedical Image Applications: 2005. Springer Berlin Heidelberg; Berlin, Heidelberg: 2005. Computer-Aided Diagnosis (CAD) for Cervical Cancer Screening and Diagnosis: A New System Design in Medical Image Processing; pp. 240–250.
10. Xu Y., Jia Z., Wang L.B., et al. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features. BMC Bioinf. 2017;18(1):281. doi: 10.1186/s12859-017-1685-x.
11. Huang Z., Johnson T.S., Han Z., et al. Deep learning-based cancer survival prognosis from RNA-seq data: Approaches and evaluations. BMC Med. Genomics. 2020;13(Suppl 5):41. doi: 10.1186/s12920-020-0686-1.
12. LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521(7553):436–444. doi: 10.1038/nature14539.
13. Cheng J.Z., Ni D., Chou Y.H., et al. Computer-aided diagnosis with deep learning architecture: Applications to breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep. 2016;6(1):24454. doi: 10.1038/srep24454.
14. Yang X., Yang S., Lian X., et al. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction. Bioinformatics. 2021:btab533. doi: 10.1093/bioinformatics/btab533.
15. Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 7-12 June 2015. 2015: 3431–3440.
16. Ronneberger O., Fischer P., Brox T. Medical Image Computing and Computer-Assisted Intervention – MICCAI: 2015. Springer International Publishing; Cham: 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation; pp. 234–241.
17. Sato M., Horie K., Hara A., et al. Application of deep learning to the classification of images from colposcopy. Oncol. Lett. 2018;15(3):3518–3523. doi: 10.3892/ol.2018.7762.
18. Zhang T., Luo Y.M., Li P., et al. Cervical precancerous lesions classification using pre-trained densely connected convolutional networks with colposcopy images. Biomed. Signal Process. Control. 2020;55.
19. Chandran V., Sumithra M.G., Karthick A., et al. Diagnosis of cervical cancer based on ensemble deep learning network using colposcopy images. Biomed. Res. Int. 2021:15. doi: 10.1155/2021/5584004.
20. Chen T.T., Liu X.C., Feng R.W., et al. Discriminative cervical lesion detection in colposcopic images with global class activation and local bin excitation. IEEE J. Biomed. Health Inform. 2022;26(4):1411–1421. doi: 10.1109/JBHI.2021.3100367.
21. Chen J.C., Li P., Xu T.X., et al. Detection of cervical lesions in colposcopic images based on the RetinaNet method. Biomed. Signal Process. Control. 2022;75(8).
22. Wang C., Zhao Z.Y., Ren Q.Q., et al. Dense U-net based on patch-based learning for retinal vessel segmentation. Entropy. 2019;21(2):15. doi: 10.3390/e21020168.
23. Kolarik M., Burget R., Uher V., et al. Optimized High Resolution 3D Dense-U-Net Network for Brain and Spine Segmentation. Appl. Sci. 2019;9(3):17.
24. Dong B., Sun P., Ruan G., et al. Type-specific high-risk human papillomavirus viral load as a viable triage indicator for high-grade squamous intraepithelial lesion: A nested case-control study. Cancer Manag. Res. 2018;10:4839–4851. doi: 10.2147/CMAR.S179724.
25. Kang Y., Sun P., Mao X., et al. PCR-reverse dot blot human papillomavirus genotyping as a primary screening test for cervical cancer in a hospital-based cohort. J. Gynecol. Oncol. 2019;30(3):e29. doi: 10.3802/jgo.2019.30.e29.
26. Ruan G., Song Y., Dong B., et al. Cervical cancer screening using the Cervista high-risk human papillomavirus test: Opportunistic screening of a hospital-based population in Fujian province. Cancer Manag. Res. 2018;10:3227–3235. doi: 10.2147/CMAR.S169822.
27. Huh W.K., Ault K.A., Chelmow D., et al. Use of primary high-risk human papillomavirus testing for cervical cancer screening: Interim clinical guidance. Obstet. Gynecol. 2015;125(2):330–337. doi: 10.1097/AOG.0000000000000669.
28. Bai B., Du Y., Liu P., et al. Detection of cervical lesion region from colposcopic images based on feature reselection. Biomed. Signal Process. Control. 2020;57.
29. Luo Y., Zhang T., Li P., et al. MDFI: Multi-CNN Decision Feature Integration for Diagnosis of Cervical Precancerous Lesions. IEEE Access. 2020;8:29616–29626.
30. Huang G., Liu Z., van der Maaten L., Weinberger K.Q. Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269.
31. Hinton G.E., Srivastava N., Krizhevsky A., et al. Improving neural networks by preventing co-adaptation of feature detectors. PeerJ. Comput. Sci. 2012;3(4):212–223.
32. Kermany D.S., Goldbaum M., Cai W., et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172(5):1122–1131.e9. doi: 10.1016/j.cell.2018.02.010.
33. Pepe M. Evaluating technologies for classification and prediction in medicine. Stat. Med. 2005;24(24):3687–3696. doi: 10.1002/sim.2431.
34. Perkins R.B., Guido R.S., Castle P.E., et al. 2019 ASCCP risk-based management consensus guidelines for abnormal cervical cancer screening tests and cancer precursors. J. Low Genit. Tract. Dis. 2020;24(2):102–131. doi: 10.1097/LGT.0000000000000525.
35. Ho G., Bierman R., Beardsley L., et al. Natural history of cervicovaginal papillomavirus infection in young women. N. Engl. J. Med. 1998;338(7):423–428. doi: 10.1056/NEJM199802123380703.
36. Vallikad E., Siddartha P., Kulkarni K., et al. Intra and inter-observer variability of transformation zone assessment in colposcopy: A qualitative and quantitative study. J. Clin. Diagn. Res. 2017;11(1):XC04–XC06. doi: 10.7860/JCDR/2017/21943.9168.
37. Almubarak H.A., Stanley R.J., Long R., et al. Convolutional neural network based localized classification of uterine cervical cancer digital histology images. Procedia Comput. Sci. 2017;114:281–287.
38. Xu T., Zhang H., Huang X., et al. Medical Image Computing and Computer-Assisted Intervention – MICCAI: 2016. Springer International Publishing; Cham: 2016. Multimodal Deep Learning for Cervical Dysplasia Diagnosis; pp. 115–123. https://link.springer.com/chapter/10.1007/978-3-319-46723-8_14
39. Jeronimo J., Schiffman M. Colposcopy at a crossroads. Am. J. Obstet. Gynecol. 2006;195(2):349–353. doi: 10.1016/j.ajog.2006.01.091.
40. Sherman M.E., Wang S.S., Tarone R., et al. Histopathologic extent of cervical intraepithelial neoplasia 3 lesions in the atypical squamous cells of undetermined significance low-grade squamous intraepithelial lesion triage study: Implications for subject safety and lead-time bias. Cancer Epidemiol. Biomarkers Prev. 2003;12(4):372–379.
41. Alzubaidi L., Al-Amidie M., Al-Asadi A., et al. Novel Transfer Learning Approach for Medical Imaging with Limited Labeled Data. Cancers (Basel) 2021;13(7):1590. doi: 10.3390/cancers13071590.
42. Suzuki K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 2017;10(3):257–273. doi: 10.1007/s12194-017-0406-5.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

mmc1.docx^{(26.8KB, docx)}

[bib0001] 1.Sung H., Ferlay J., Siegel R.L., et al. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021;71(3):209–249. doi: 10.3322/caac.21660. [DOI] [PubMed] [Google Scholar]

[bib0002] 2.Sawaya G.F., Smith-McCune K., Kuppermann M. Cervical Cancer Screening: More Choices in 2019. JAMA. 2019;321(20):2018–2019. doi: 10.1001/jama.2019.4595. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0003] 3.Dong B.H., Zou H.C., Mao X.D., et al. Effect of introducing human papillomavirus genotyping into real-world screening on cervical cancer screening in China: A retrospective population-based cohort study. Ther. Adv. Med. Oncol. 2021;13 doi: 10.1177/17588359211010939. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0004] 4.Khan M.J., Werner C.L., Darragh T.M., et al. ASCCP colposcopy standards. role of colposcopy, benefits, potential harms, and terminology for colposcopic practice. J. Low Genit. Tract. Dis. 2017;21(4):223–229. doi: 10.1097/LGT.0000000000000338. [DOI] [PubMed] [Google Scholar]

[bib0005] 5.Baasland I., Hagen B., Vogt C., et al. Colposcopy and additive diagnostic value of biopsies from colposcopy-negative areas to detect cervical dysplasia. Acta Obstet. Gynecol. Scand. 2016;95(11):1258–1263. doi: 10.1111/aogs.13009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0006] 6.Dalla Palma P., Giorgi Rossi P., Collina G., et al. The risk of false-positive histology according to the reason for colposcopy referral in cervical cancer screening: A blind revision of all histologic lesions found in the NTCC trial. Am. J. Clin. Pathol. 2008;129(1):75–80. doi: 10.1309/EWYGWFRRM8798U5P. [DOI] [PubMed] [Google Scholar]

[bib0007] 7.Ferris D.G., Condorhuaman W.S.G., Waller J.L., et al. Polarized light colposcopy compared with standard colposcopy. J. Low Genit. Tract. Dis. 2015;19(3):234–238. doi: 10.1097/LGT.0000000000000111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0008] 8.Doi K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007;31(4–5):198–211. doi: 10.1016/j.compmedimag.2007.02.002. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0009] 9.Li W., Van Raad V., Gu J., et al. Computer Vision for Biomedical Image Applications: 2005. Springer Berlin Heidelberg; Berlin, Heidelberg: 2005. Computer-Aided Diagnosis (CAD) for Cervical Cancer Screening and Diagnosis: A New System Design in Medical Image Processing; pp. 240–250. [Google Scholar]

[bib0010] 10.Xu Y., Jia Z., Wang L.B., et al. Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features. BMC Bioinf. 2017;18(1):281. doi: 10.1186/s12859-017-1685-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0011] 11.Huang Z., Johnson T.S., Han Z., et al. Deep learning-based cancer survival prognosis from RNA-seq data: Approaches and evaluations. BMC Med. Genomics. 2020;13(5):41. doi: 10.1186/s12920-020-0686-1. Suppl. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0012] 12.LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521(7553):436–444. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]

[bib0013] 13.Cheng J.Z., Ni D., Chou Y.H., et al. Computer-aided diagnosis with deep learning architecture: Applications to breast lesions in us images and pulmonary nodules in CT Scans. Sci. Rep. 2016;6(1):24454. doi: 10.1038/srep24454. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0014] 14.Yang X., Yang S., Lian X., et al. Transfer learning via multi-scale convolutional neural layers for human-virus protein-protein interaction prediction. Bioinformatics. 2021:btab533. doi: 10.1093/bioinformatics/btab533. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0015] 15.Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 7-12 June 2015. 2015: 3431–3440.

[bib0016] 16.Ronneberger O., Fischer P., U-Net Brox T. Medical Image Computing and Computer-Assisted Intervention – MICCAI: 2015. Springer International Publishing; Cham: 2015. Convolutional Networks for Biomedical Image Segmentation; pp. 234–241. [Google Scholar]

[bib0017] 17.Sato M., Horie K., Hara A., et al. Application of deep learning to the classification of images from colposcopy. Oncol. Lett. 2018;15(3):3518–3523. doi: 10.3892/ol.2018.7762. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0018] 18.Zhang T., Luo Y.M., Li P., et al. Cervical precancerous lesions classification using pre-trained densely connected convolutional networks with colposcopy images. Biomed. Signal Process. Control. 2020;55 [Google Scholar]

[bib0019] 19.Chandran V., Sumithra M.G., Karthick A., et al. Diagnosis of cervical cancer based on ensemble deep learning network using colposcopy images. Biomed. Res. Int. 2021:15. doi: 10.1155/2021/5584004. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0020] 20.Chen T.T., Liu X.C., Feng R.W., et al. Discriminative cervical lesion detection in colposcopic images with global class activation and local bin excitation. IEEE J. Biomed. Health Inform. 2022;26(4):1411–1421. doi: 10.1109/JBHI.2021.3100367. [DOI] [PubMed] [Google Scholar]

[bib0021] 21.Chen J.C., Li P., Xu T.X., et al. Detection of cervical lesions in colposcopic images based on the RetinaNet method. Biomed. Signal Process. Control. 2022;75(8) [Google Scholar]

[bib0022] 22.Wang C., Zhao Z.Y., Ren Q.Q., et al. Dense U-net based on patch-based learning for retinal vessel segmentation. Entropy. 2019;21(2):15. doi: 10.3390/e21020168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0023] 23.Kolarik M., Burget R., Uher V., et al. Optimized High Resolution 3D Dense-U-Net Network for Brain and Spine Segmentation. Appl. Sci. 2019;9(3):17. [Google Scholar]

[bib0024] 24.Dong B., Sun P., Ruan G., et al. Type-specific high-risk human papillomavirus viral load as a viable triage indicator for high-grade squamous intraepithelial lesion: A nested case-control study. Cancer Manag. Res. 2018;10:4839–4851. doi: 10.2147/CMAR.S179724. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0025] 25.Kang Y., Sun P., Mao X., et al. PCR-reverse dot blot human papillomavirus genotyping as a primary screening test for cervical cancer in a hospital-based cohort. J. Gynecol. Oncol. 2019;30(3):e29. doi: 10.3802/jgo.2019.30.e29. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0026] 26.Ruan G., Song Y., Dong B., et al. Cervical cancer screening using the Cervista high-risk human papillomavirus test: Opportunistic screening of a hospital-based population in Fujian province. Cancer Manag. Res. 2018;10:3227–3235. doi: 10.2147/CMAR.S169822. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0027] 27.Huh W.K., Ault K.A., Chelmow D., et al. Use of primary high-risk human papillomavirus testing for cervical cancer screening: Interim clinical guidance. Obstet. Gynecol. 2015;125(2):330–337. doi: 10.1097/AOG.0000000000000669. [DOI] [PubMed] [Google Scholar]

[bib0028] 28.Bai B., Du Y., Liu P., et al. Detection of cervical lesion region from colposcopic images based on feature reselection. Biomed. Signal Process. Control. 2020;57 [Google Scholar]

[bib0029] 29.Luo Y., Zhang T., Li P., et al. MDFI: Multi-CNN Decision Feature Integration for Diagnosis of Cervical Precancerous Lesions. IEEE Access. 2020;8:29616–29626. [Google Scholar]

[bib0030] 30.Huang G., Liu Z., van der Maaten L., Weinberger K.Q. Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269.

[bib0031] 31.Hinton G.E., Srivastava N., Krizhevsky A., et al. Improving neural networks by preventing co-adaptation of feature detectors. PeerJ. Comput. Sci. 2012;3(4):212–223. [Google Scholar]

[bib0032] 32.Kermany D.S., Goldbaum M., Cai W., et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172(5):1122–1131.e9. doi: 10.1016/j.cell.2018.02.010. [DOI] [PubMed] [Google Scholar]

[bib0033] 33.Pepe M. Evaluating technologies for classification and prediction in medicine. Stat. Med. 2005;24(24):3687–3696. doi: 10.1002/sim.2431. [DOI] [PubMed] [Google Scholar]

[bib0034] 34.Perkins R.B., Guido R.S., Castle P.E., et al. 2019 ASCCP risk-based management consensus guidelines for abnormal cervical cancer screening tests and cancer precursors. J. Low Genit. Tract. Dis. 2020;24(2):102–131. doi: 10.1097/LGT.0000000000000525. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0035] 35.Ho G., Bierman R., Beardsley L., et al. Natural history of cervicovaginal papillomavirus infection in young women. N. Engl. J. Med. 1998;338(7):423–428. doi: 10.1056/NEJM199802123380703. [DOI] [PubMed] [Google Scholar]

[bib0036] 36.Vallikad E., Siddartha P., Kulkarni K., et al. Intra and inter-observer variability of transformation zone assessment in colposcopy: A qualitative and quantitative study. J. Clin. Diagn. Res. 2017;11(1):XC04–XC06. doi: 10.7860/JCDR/2017/21943.9168. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0037] 37.Almubarak H.A., Stanley R.J., Long R., et al. Convolutional neural network based localized classification of uterine cervical cancer digital histology images. Procedia Comput. Sci. 2017;114:281–287. [Google Scholar]

[bib0038] 38.Xu T., Zhang H., Huang X., et al. Multimodal Deep Learning for Cervical Dysplasia Diagnosis. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Springer International Publishing; Cham: 2016. pp. 115–123. https://link.springer.com/chapter/10.1007/978-3-319-46723-8_14 [Google Scholar]

[bib0039] 39.Jeronimo J., Schiffman M. Colposcopy at a crossroads. Am. J. Obstet. Gynecol. 2006;195(2):349–353. doi: 10.1016/j.ajog.2006.01.091. [DOI] [PubMed] [Google Scholar]

[bib0040] 40.Sherman M.E., Wang S.S., Tarone R., et al. Histopathologic extent of cervical intraepithelial neoplasia 3 lesions in the atypical squamous cells of undetermined significance/low-grade squamous intraepithelial lesion triage study: Implications for subject safety and lead-time bias. Cancer Epidemiol. Biomarkers Prev. 2003;12(4):372–379. [PubMed] [Google Scholar]

[bib0041] 41.Alzubaidi L., Al-Amidie M., Al-Asadi A., et al. Novel Transfer Learning Approach for Medical Imaging with Limited Labeled Data. Cancers (Basel) 2021;13(7):1590. doi: 10.3390/cancers13071590. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0042] 42.Suzuki K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 2017;10(3):257–273. doi: 10.1007/s12194-017-0406-5. [DOI] [PubMed] [Google Scholar]
