Heliyon. 2023 Oct 25;9(11):e21388. doi: 10.1016/j.heliyon.2023.e21388

Mask region-based CNNs for cervical cancer progression diagnosis on pap smear examinations

Carolina Rutili de Lima a, Said G Khan b, Syed H Shah c, Luthiari Ferri d
PMCID: PMC10641213  PMID: 37964829

Abstract

This research presents a novel approach for cervical cancer detection and segmentation using tissue images containing multiple cells. The study employs a novel deep learning architecture based on the Mask Region-Based Convolutional Neural Network (RCNN) together with statistical analysis. This architecture achieves a high detection rate and pixel-to-pixel area segmentation. A mean Average Precision (mAP) higher than 60% was achieved for both the 3-class and 5-class settings. In addition, F1-scores higher than 70% were obtained for both settings. This investigation is a collaborative work in which a medical consultant collected the samples from the Papanicolaou (Pap) smear examination and labeled the cells prepared with liquid-based cytology (LBC). Furthermore, the openly available benchmark dataset SIPaKMeD was also utilized. Additionally, sample images from the Mendeley dataset were labeled by the trained medical consultant for comparison. The proposed scheme automatically generates a full report that allows a medical consultant to identify the location of the malignant cells in the given images and expedite the diagnosis and treatment process.

Keywords: Cervical cancer, Mask RCNN, Deep learning, Cells segmentation and classification, Whole tissue classification, Health and technology

1. Introduction

Cervical cancer remains one of the most prevalent malignancies in women, particularly in emerging countries with limited access to institutional health systems [1]. Even though vaccination is widespread nowadays, most women remain vulnerable because vaccines are not 100 percent effective against all variants of the virus. From this perspective, assisting all individuals affected by cervical cancer becomes incredibly challenging for a public health system, particularly given the number of women who are already affected and those who may be carrying the virus but are unaware of it.

It is also a fact that most low-income and emerging countries do not have the capacity to vaccinate all women against cervical cancer. In addition, they lack proper preventive mechanisms and the awareness needed to prevent the transmission of the virus that causes cervical cancer. Cancer treatment options are also very limited in these countries. Globally, any public health system will have difficulty coping with the many women who are already suffering from cervical cancer and the many virus carriers. As noted by the World Health Organization (WHO), emerging countries that cannot vaccinate, educate, and prevent the transmission of the virus in their populace are the most vulnerable [1].

It is also worth mentioning that, globally, there is a shortage of consultant pathologists trained to analyze cells and interpret Pap smear tests [1]. This is one of the major reasons for the slow rate of cervical cancer diagnosis in women [2]. Usually, the examination of a patient requires at least two specialists, i.e., a gynecologist and a consultant pathologist. However, available experts in the field are usually extremely occupied, as they have to deal with multiple assignments.

The Pap smear test is the most common way to detect cervical cancer. During this examination, the gynecologist collects a sample of the cells in the cervix [3], [4]. After collection, the sample is sent to a pathologist for analysis. However, this analysis can take hours because of the huge number of cells in one tissue sample. It may also be necessary to examine more than one tissue, each containing dozens of cells, which is very exhausting for the consultant. In addition, an expert may not be available on-site, and the samples may need to be transported to a far-off destination for analysis, which further delays the diagnosis process [1].

As a precaution, some specialists recommend that women take the test once a year for prevention and early treatment [5]. Nonetheless, examining the tissue remains challenging for specialists for various reasons. The Brazilian pathologist who assisted during this research mentioned the need for a software-based tool that could show the location of cells that may be affected or show some degree of malignancy, which would significantly speed up the analysis and detection process. Furthermore, highly reliable software of this kind could be used for early diagnosis and treatment of women with the disease.

Meanwhile, artificial intelligence (AI) techniques such as deep learning are becoming increasingly popular for solving problems in diverse areas, e.g., green power, face detection for security applications, social media, and image processing in medical diagnosis. One active area in image processing is cancer diagnosis, and AI has been used to explore the great potential of automatic diagnosis of cancer and many other diseases. In the near future, sophisticated AI-based software will be available to aid consultants in this field, as noted in [6]. One of the challenges in this field arises when cells overlap each other, as shown in Fig. 1. This makes it even harder for deep learning algorithms to distinguish, classify, and detect different types of carcinogenic cells. Moreover, different laboratories use different staining colors, which makes it difficult to employ the same diagnostic and detection tools everywhere.

Figure 1.

Figure 1

Tissue cell example.

To make that happen, a large database of accurately labeled images is required to train a deep neural network (NN); after training, the network should be tested and validated with unseen images. The basic idea of training and testing an NN is shown in Fig. 2. Once the performance is satisfactory, the network can be deployed in practice. However, there is still no universal and highly robust solution for cancer diagnosis, and without an expert medical consultant, the results may lead to false positives or false negatives. A significant amount of research has been conducted in this area to train deep learning schemes to classify regions of an image efficiently. New improvements are reported worldwide, which will eventually lead to automatic cancer detection and diagnosis software tools.

Figure 2.

Figure 2

Deep NN-based Image Analysis.

Whole tissue cells are cells studied within their natural tissue environment, surrounded by neighboring cells, extracellular matrix, and other cell types that make up the tissue microenvironment. This microenvironment can impact the behavior and functionality of these cells [7], [8]. In contrast, a single cell refers to a cell that has been removed from its original tissue environment and cultured in vitro, usually in a laboratory [7]. For our research, we opted to analyze whole tissue slides containing multiple cells rather than single cells, as many studies on the latter have already been conducted.

Examining the current literature, reviewed in the next section (Background and Literature Review), we observed that existing work on cervical cancer using whole tissue cells has not yet fully automated detection, classification, and segmentation. Single cropped-cell classification and segmentation, or a weak mean Average Precision (mAP), are not enough to deliver a tool that pathologists can use. To advance investigations in this area, we propose a model that automatically processes the images collected from microscopy and generates a report to assist the consultant pathologist. The main contributions (and novelties) of this paper are:

  • A new method for whole tissue slide cell classification, segmentation, and detection using a modified Mask RCNN with a ResNeXt backbone, evaluated with statistical metrics such as mAP (mean average precision), mAR (mean average recall), and F1-score;

  • Currently, most state-of-the-art cervical cancer diagnostic/detection schemes are not automatic enough to deliver a full report. In our case, a comprehensive report is automatically generated for review by a medical consultant, which expedites the diagnosis of malignant cells.

The remainder of the paper is divided into six sections. Section 2 presents the Background and Literature Review. Section 3 discusses the image datasets and the methodology. Section 4 details the experiments carried out. Section 5 provides a discussion of our results. We conclude the paper with possible next steps for our research in Section 6.

2. Background and literature review

In the past ten years, many researchers have investigated this field. In this section, we highlight some of the most important works and their main contributions to cervical cancer detection using Artificial Intelligence (AI). In the research of Ghoneim et al. [9] and Waly et al. [10], deep learning networks were proposed for feature extraction (i.e., cell detection), and the Extreme Learning Machine (ELM) was used as the classifier for each detected cell. In [9], a CNN (Shallow, VGG-16-Net, or CaffeNet) was implemented and then connected to two ELMs (replacing the softmax layer) for classification. The output of the first ELM gives normal or abnormal cells, and the other gives the classes of the normal and abnormal cases.

Furthermore, Waly et al. [10] reached the highest model accuracy of 97.96% (7-class). The authors employed the IDCNN-CDC model, which includes four major processes: 1) preprocessing, where a Gaussian Filter (GF) is applied to enhance the data by removing noise; 2) segmentation, where Tsallis entropy with dragonfly optimization (TE-DFO) makes it easier to identify the malignant portions; 3) feature extraction, where the dataset images are fed into SqueezeNet; 4) classification, where the extracted features are fed into a weighted extreme learning machine.

Win et al. [11], Rehman et al. [12], and Sabeena and Gopakumar [13] applied deep learning algorithms followed by machine learning techniques to classify cervical cancer cells. [11] employed a bagging ensemble classifier that combines the outputs of base learners during the classification stage, whereas Rehman et al. [12] and Sabeena and Gopakumar [13] tested each model separately at the end.

Similar to the work by Jia et al. [14], Yaman and Tuncer [15] also used an SVM as the last step in the architecture. However, the CNNs applied were different. The first step was to use DarkNet19 and DarkNet53 in a "Pyramid Deep Feature Extraction Model", which extracts features at three distinct sizes: 128x18, 64x64, and 32x32. Following that, the features were merged, and NCA was used to determine the 1000 most relevant features. In the last step, an SVM classified them into 4 classes. Manna et al. [16] and Pramanik et al. [17] implemented two different methodologies using a fuzzy-learning ensemble of three different CNNs. Both studies pre-trained on the ImageNet dataset, and despite using different strategies, both achieved good accuracy. Hussain et al. [18] also used CNNs, ensembling ResNet-50, ResNet-101, and GoogleNet. The ensemble classifier collects the classifiers' decisions and weighs them simultaneously to improve efficiency and performance, selecting the class with the highest number of votes.

Chen et al. [19], [20] and Tan et al. [21] focused on whole slide image (WSI) classification using different architectures. The approach of [19] has three main parts: 1) segmentation, which extracts cells from the WSI; 2) classification, using a new visual geometry group (VGG) network called CompactVGG that is faster than the original; 3) human-aided visualization, providing two visual display modes for users to review and modify results. In the work by [20], the architecture has three models: 1) an LR model fed with 512 × 512 pixel crops from the WSI; 2) an HR model whose 256 × 256 input is cropped according to the location heatmap and which outputs a new lesion probability; 3) an RNN that integrates the top 10 lesion cells and outputs their probabilities. In the work by Tan et al. [21], a Faster RCNN extracts the image information to obtain the feature map and the region proposals, so that in the end both the target category and the target location are available. The WSI was zoomed in 200X and cropped to feed the CNN, and the images were also enhanced and augmented.

In contrast to previous work, [22] applies Mask RCNN to dataset images that contain only cropped cells to obtain their segmented images. For training and testing, these images are fed to another algorithm (a Visual Geometry Group-like network). In the work by [23], the authors investigated the possibility of a mobile-based framework for detecting cervical lesions. This framework is based on the Internet of Things (IoT) and integrates the μSmartScope for acquiring the sample images with a deep learning model for detection and classification. They used the SIPaKMeD dataset for training and then tested on their framework. The best performance was obtained by Faster-RCNN using the five SIPaKMeD classes. The authors also tried these models on the data acquired by the μSmartScope; however, the outcome was limited.

Xiang et al. [24] also investigated CNN-based detection and classification of cervical cancer tissue cells on different datasets. The architecture starts with the Darknet-53 network trained on ImageNet, used as a feature extractor, followed by fine-tuning all convolution layers of YOLOv3. The authors also compared different networks (Faster R-CNN, YOLOv3 416, YOLOv3 608, Tiny YOLOv3); the best-performing one was YOLOv3 608, with an mAP of 0.574.

Many researchers in this field have focused on achieving higher classification accuracy using cropped images and a two-step pipeline: 1) feature extraction and 2) classification. Moreover, the works targeting the segmentation process do not deliver a report on the detected cells to help the medical consultant in this area.

3. The image data and methodology

This section presents the essential resources, such as the image datasets employed in this work. In addition, the statistical metrics and the architecture/topology are discussed.

3.1. The dataset

This study utilized three different image data sets, which are described in detail in the following subsections:

3.1.1. Private dataset

A doctor partnered with a Brazilian clinic in Ijuí/Rio Grande do Sul (Instituto de Oncologia de Ijuí) to collect, over the past years, a private dataset of Pap smear samples with various stainings taken with different microscopes, including some stored images from previous examinations. The doctor responsible for the clinic reviewed and examined all the data, ensuring that sensitive information such as names and ages was not included.

The coauthor (a pathologist) obtained images of tissue cells using microscopy and labeled them according to the Bethesda system. The classification of these cells can be found in Table 1, and examples of each cell classification are provided in Table 2.

Table 1.

Private Dataset.


Table 2.

Private dataset example cells.


Because of their high resolution and the number of cells in each, these images are very large; therefore, we cropped them into smaller ones. The annotations were cropped accordingly, and everything was then uploaded to roboflow.com for pre-processing. The Roboflow website is a very handy tool for pre-processing and augmenting a large number of images.

3.1.2. Open source dataset SIPaKMeD

This benchmark dataset, published in 2018, can be downloaded online [25]. It contains a total of 4049 images classified into 5 different cell categories. The number of cells in each class is given in Table 3, and an example of each cell classification is presented in Table 4.

Table 3.

Open Source Dataset SIPaKMeD.


Table 4.

Open source dataset SIPaKMeD example cells.


However, the annotations for this dataset are provided in a '.dat' format and cannot be fed directly into AI algorithms. For this reason, they were converted to '.json' annotations. After conversion, these images were also uploaded to the Roboflow website to organize the image data properly.
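As an illustration of this conversion step, the sketch below parses a hypothetical '.dat' boundary file into a JSON annotation record. The exact SIPaKMeD '.dat' layout (assumed here to be one 'x,y' vertex per line) and the field names are assumptions for illustration, not the actual conversion script used in this work.

```python
import json

def dat_to_json(dat_text, image_name, label):
    """Convert one '.dat' boundary listing (assumed: one 'x,y' pair per
    line) into a minimal JSON annotation record. The real SIPaKMeD
    layout may differ; adjust the parsing accordingly."""
    points = []
    for line in dat_text.strip().splitlines():
        x, y = line.split(",")
        points.append([float(x), float(y)])
    return {"image": image_name, "label": label, "polygon": points}

# Hypothetical boundary fragment with three vertices
sample = "10.0,12.5\n14.0,12.5\n12.0,18.0"
record = dat_to_json(sample, "cell_001.bmp", "Koilocytotic")
print(json.dumps(record))
```

In practice, one such record would be written per annotated cell, and the resulting '.json' files grouped per image before upload.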

3.1.3. Open source dataset Mendeley

The Mendeley dataset [26] was also utilized in this study; it was published online in 2018 and is still available for download. The repository comprises a total of 963 images. The Pap smear images were captured at 40x magnification using a Leica ICC50 HD microscope and were collected and prepared using the liquid-based cytology technique from 460 patients.

However, the Mendeley dataset was found to be deficient in annotations. To rectify this, one of the co-authors (a medical consultant) labeled some of its images, which were then included in our dataset. The labeling followed the annotation scheme of the first dataset. This dataset was employed because the number of malignant cells was insufficient compared to the other classes. Table 5 presents the cell counts used from this dataset.

Table 5.

Open source dataset Mendeley.


3.1.4. Merged dataset

In order to obtain optimal training results, all the datasets were combined into a bigger one with more samples, as seen in Table 6. This combination allowed the network to learn the features of each class more effectively, since it has more data to learn from.

Table 6.

Merged datasets - Cells' quantity and classes.


3.2. Statistical metrics

A set of statistical parameters, such as mAP, mAR, and F1-score, is employed to evaluate the proposed architecture's efficiency compared to prior research; these parameters are described in this section. In the task of cell segmentation and localization, the objective is to identify the class to which each cell belongs and to locate all its pixels. Hence, relying solely on metrics such as precision and recall may not be suitable. For the following variable descriptions, the work by Sharma et al. [27] is used as a reference for clarity.

3.2.1. Intersection over Union (IoU)

The Intersection over Union (IoU) metric is a widely used evaluation measure in computer vision tasks, particularly in object detection and segmentation. It quantifies the extent of overlap between a predicted bounding box and the ground-truth bounding box as demonstrated in Fig. 3 ([27]).

Figure 3.

Figure 3

IoU example.

The computation of IoU involves calculating the area of overlap between the two bounding boxes and then dividing it by the area of their union. Fig. 4 demonstrates the process of computing the IoU metric, as described by [27]. The IoU metric is an important tool for evaluating the accuracy of object detection and segmentation models, as it provides a direct measure of the degree of overlap between the predicted and ground-truth bounding boxes.

Figure 4.

Figure 4

IoU overlap ratio.
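The overlap-over-union ratio described above can be sketched in a few lines for axis-aligned bounding boxes given as (x1, y1, x2, y2) corners; the boxes below are hypothetical values, not detections from the paper:

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if disjoint)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The same idea extends from boxes to masks by counting overlapping versus combined pixels instead of rectangle areas.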

3.2.2. Precision and recall

For example, for five images, the prediction model may give us the scores [0.1, 0.5, 0.7, 0.95, 0.34]. We also need to set a threshold, a hyperparameter that we can change. If we set it to 0.6, any prediction greater than or equal to 0.6 is assigned to the positive class; otherwise, it is not.

If we consider that we are predicting cars, we would get: [not_car, not_car, car, car, not_car].
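The thresholding step above can be sketched in a few lines (continuing the hypothetical car/not_car example, which is not part of the original study):

```python
def apply_threshold(scores, threshold=0.6):
    """Map raw model scores to class labels using a decision threshold."""
    return ["car" if s >= threshold else "not_car" for s in scores]

scores = [0.1, 0.5, 0.7, 0.95, 0.34]
print(apply_threshold(scores))
# ['not_car', 'not_car', 'car', 'car', 'not_car']
```

Changing the threshold shifts predictions between the two classes, which is exactly the trade-off explored by the precision-recall analysis below.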

Before approaching precision and recall, we also need to consider these variables:

  • True Positive (TP): number of times the model correctly predicted a positive input sample (i.e., car) as positive;

  • False Positive (FP): number of times the model incorrectly predicted a negative sample (i.e., not_car) as positive;

  • True Negative (TN): number of times the model correctly predicted a negative sample (i.e., not_car) as negative;

  • False Negative (FN): number of times the model incorrectly predicted a positive input (i.e., car) as negative.

Another important point, noted in [28], is that precision measures the accuracy of the positive predictions made by a classifier, while recall assesses the classifier's ability to capture all positive instances in the dataset. We need to know how to balance the trade-off between these two metrics.

3.2.3. Precision

Precision in Deep Learning is a metric used to evaluate the accuracy of a model's positive predictions. It measures the proportion of correctly classified samples that were identified as positive by the model out of all the samples that were classified as positive (both correct and incorrect). This metric is particularly useful in Deep Learning applications where the goal is to identify a specific class among multiple classes, such as object detection in computer vision. By computing precision, we can determine how well the model is able to distinguish between positive and negative samples and make accurate predictions.

Precision = TP / (TP + FP) (1)

Equation (1) is the precision representation in terms of true positives and false positives.

Furthermore, [29] underscores the significance of precision, highlighting its critical role in assessing the quality of search outcomes. High precision signifies that a retrieval system predominantly delivers pertinent results, whereas low precision implies the inclusion of numerous irrelevant documents in the retrieved set.

3.2.4. Recall

Recall refers to the ability of a model to correctly identify all instances of a certain class. It is calculated by dividing the number of correctly classified positive samples by the total number of positive samples in the dataset; the resulting ratio represents the proportion of positive samples that the model correctly identified. This metric is important in evaluating the performance of deep learning models, particularly in applications such as image classification, speech recognition, and natural language processing.

Recall = TP / (TP + FN) (2)

Equation (2) is the recall representation in terms of true positives and false negatives.
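Equations (1) and (2) can be computed directly from the TP/FP/FN counts; the label lists below continue the hypothetical car example and are illustrative only:

```python
def counts(pred, truth, positive="car"):
    """Count TP, FP, TN, FN for one positive class."""
    tp = sum(p == positive and t == positive for p, t in zip(pred, truth))
    fp = sum(p == positive and t != positive for p, t in zip(pred, truth))
    tn = sum(p != positive and t != positive for p, t in zip(pred, truth))
    fn = sum(p != positive and t == positive for p, t in zip(pred, truth))
    return tp, fp, tn, fn

def precision(tp, fp):
    # Equation (1)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    # Equation (2)
    return tp / (tp + fn) if tp + fn else 0.0

pred  = ["not_car", "not_car", "car", "car", "not_car"]
truth = ["not_car", "car",     "car", "car", "not_car"]
tp, fp, tn, fn = counts(pred, truth)
print(precision(tp, fp), recall(tp, fn))
```

Here every "car" prediction is correct (precision 1.0), but one true car was missed (recall 2/3), illustrating how the two metrics capture different failure modes.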

3.2.5. Precision-recall curve

If the model has both high precision and high recall, it correctly predicts most positive samples as positive and rarely misclassifies them as negative.

Bishop and Nasrabadi [28] explain that the precision-recall curve serves as a visual depiction of how precision and recall change as the classification threshold varies. This curve essentially illustrates how a classifier performs across various operating points. Additionally, the authors underline that manipulating the classification threshold allows control of the trade-off between precision and recall: lowering the threshold tends to increase recall but may decrease precision, and vice versa.
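A minimal sketch of this threshold sweep, using hypothetical scores and binary labels (1 = positive), shows recall rising and precision falling as the threshold is lowered:

```python
def pr_at_threshold(scores, truth, thr):
    """Precision and recall at one decision threshold (binary labels)."""
    pred = [1 if s >= thr else 0 for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

scores = [0.95, 0.9, 0.7, 0.6, 0.4, 0.2]
truth  = [1,    1,   0,   1,   0,   1]
for thr in (0.8, 0.5, 0.3):
    print(thr, pr_at_threshold(scores, truth, thr))
```

Plotting the (recall, precision) pairs from such a sweep over all thresholds produces the precision-recall curve discussed above.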

3.2.6. F1-Score

We can use the F1-score to combine recall and precision into a single metric. If the algorithm achieves a high F1-score, then precision and recall are both high.

F1 = 2 · Precision · Recall / (Precision + Recall) = 2TP / (2TP + FP + FN) (3)

Equation (3) is the F1-score representation in terms of precision and recall.

In addition, [30] mentions that the F1-score can be used to assess how well a system, such as a machine, performs compared to a human evaluator in detecting errors. It is calculated as the harmonic mean of precision and recall based on the number of errors both the computer and the human evaluator identify, using the equation above.
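Equation (3) in code, using the TP/FP/FN form (the counts below are illustrative):

```python
def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall,
    equivalently 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# With TP=2, FP=0, FN=1: precision = 1.0, recall = 2/3, F1 = 0.8
print(f1_score(2, 0, 1))
```

Because it is a harmonic mean, F1 is dragged down by whichever of precision or recall is lower, which is why a high F1 implies both are high.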

3.3. Mean average precision (mAP)

The mAP is the mean, taken over all classes, of the interpolated Average Precision (AP) of each class, as shown in Fig. 5 [31].

mAP = (1 / |classes|) · Σc∈classes |TPc| / (|FPc| + |TPc|) (4)

Figure 5.

Figure 5

Precision x Recall Curve, and mAP computation.

Equation (4) is the mAP representation in terms of true positives and false positives.

Fig. 5 illustrates the Precision-Recall curve for a specific object class with six ground truth instances. The black curve is a representation of the precision and recall values computed from a sequence of true-positive (TP) and false-positive (FP) detections ordered by score (top). The dashed line above the solid line area is the outcome of replacing each precision with the maximum at the same or higher recall. The Average Precision is the total area enclosed by the solid and dashed lines.

The arrows labeled (a-e) in the figure highlight the effect of positive perturbations on the FP detection scores. Blue arrows (a-c) indicate perturbations that have no impact on the AP: (a) the order of detections does not change; (b) the detection swaps places with another FP; (c) the detection swaps places with a TP, but a higher-recall TP (f) has higher precision, so there is no change to the area under the filled-in curve (pink shading). On the other hand, orange arrows (d-e) illustrate perturbations that do affect AP: (d) the same FP as (c) is moved beyond a TP that does appear on (hence affect) the filled-in curve; (e) the FP moves past a single TP, altering the filled-in curve as far away as 0.5 recall [31].
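The interpolation and area computation described above can be sketched as follows; the detections are hypothetical (score, is-true-positive) pairs, not values from the paper:

```python
def average_precision(detections, n_gt):
    """All-point interpolated AP. `detections` is a list of
    (score, is_tp) pairs; n_gt is the number of ground-truth instances."""
    detections = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Replace each precision with the maximum at the same or higher recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the resulting step curve
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

dets = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.5, False)]
print(average_precision(dets, 4))  # 0.625 for this example
```

The mAP of Equation (4) is then the mean of such per-class AP values.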

3.3.1. Confusion matrix

The confusion matrix is a graphical metric for visualizing class predictions. It helps us evaluate how well our model predicts each category and where it misclassifies between types. Fig. 6 (modified from [32]) shows how the values are distributed in the matrix, and Fig. 7 (modified from [32]) shows an example of diagnosing cancer versus no cancer (negative).

Figure 6.

Figure 6

Confusion matrix layout.

Figure 7.

Figure 7

Confusion matrix cancer and negative cancer.

Moreover, [33] provides an example of how a confusion matrix can help interpret deep learning results. In that work, the authors try to find out which categories get mixed up during classification; this method can also reveal whether those categories share many similarities. For instance, the authors grouped two specific classes together only when the algorithm could tell them apart well. Conversely, when the classifier confuses classes, it means those classes are similar and hard for the classifier to distinguish properly.
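A minimal confusion matrix construction (rows = actual class, columns = predicted class), with a hypothetical cancer/negative example in the spirit of Fig. 7:

```python
def confusion_matrix(truth, pred, classes):
    """Build a confusion matrix: m[i][j] counts samples of actual
    class classes[i] predicted as classes[j]."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, pred):
        m[idx[t]][idx[p]] += 1
    return m

classes = ["cancer", "negative"]
truth = ["cancer", "cancer", "negative", "negative", "cancer"]
pred  = ["cancer", "negative", "negative", "cancer", "cancer"]
for row in confusion_matrix(truth, pred, classes):
    print(row)
```

Off-diagonal entries are the misclassifications; in a multi-class setting such as ours, large off-diagonal counts flag pairs of visually similar cell categories.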

3.4. Architecture and topology implemented

A summary of the topology of this work is shown in Fig. 8. The first step was to collect the data. The private dataset was cropped and merged with the other datasets on the Roboflow website. After that, we performed the pre-processing (augmentation). Then, we fed the data into our modified Mask RCNN. At the end of these steps, we obtained the images with the segmented cell areas and their respective classes.

Figure 8.

Figure 8

Whole Architecture/Topology.

For this research, Mask RCNN was employed due to its effectiveness in classification, detection, and segmentation. Moreover, because the annotations are polygonal geometries and the cells have different shapes, the network can more easily distinguish between the classes and thus improve the precision (mAP). In addition, Mask RCNN includes a pixel-to-pixel segmentation loss in its total output loss. The loss is a measurement that tells us many important things when training a deep learning algorithm, such as the training error (to be minimized) and overfitting/underfitting. Choosing an inadequate loss function for a specific application/network can produce poor results.

In this scenario, Mask RCNN has three different losses, L = Lclass + Lbbox + Lmask: the classification loss, the bounding-box loss, and the mask (segmentation) loss, respectively [34]. The first two losses were already present in Faster RCNN (the predecessor of Mask RCNN): Lclass, a discrete probability distribution over K + 1 categories, which also includes the RPN (Region Proposal Network) classification loss; and Lbbox, the bounding-box regression offsets [35], which also include the RPN bounding-box loss. In contrast to Faster RCNN, Mask RCNN adds Lmask, which allows the network to generate masks for every class without competition among classes [34].
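The composition of the total loss can be sketched as a simple sum; the individual loss values below are purely illustrative placeholders, not quantities from our training runs:

```python
def mask_rcnn_total_loss(l_class, l_bbox, l_mask,
                         l_rpn_class=0.0, l_rpn_bbox=0.0):
    """Total Mask RCNN loss: classification + bounding-box regression
    + per-class mask loss, with the RPN terms folded in."""
    return l_class + l_bbox + l_mask + l_rpn_class + l_rpn_bbox

# Hypothetical per-component values from one training step
total = mask_rcnn_total_loss(0.30, 0.25, 0.40, 0.05, 0.10)
print(total)
```

Because the total is a plain sum, reducing any single component (e.g., the RPN bounding-box loss, as discussed in Section 4) lowers the overall training loss.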

4. Experiments

Experimental results are presented in this section. The experiments were run on a machine with the following specifications: Intel® Core™ i7-8700K CPU @ 3.70 GHz × 12, with an NVIDIA GeForce RTX 2080 Ti GPU. The basic Mask RCNN configuration is presented in Table 7.

Table 7.

Mask RCNN ResNeXt configurations.


4.1. Pre-processing and augmentation

During the initial experiments using Mask RCNN ResNeXt on the SIPaKMeD dataset, the loss curves indicated overfitting, so modifications were needed. To overcome this issue, all the datasets were combined, and augmentation was performed to increase the dataset size. For instance, rotation (between -15° and +15°), shear (±15° horizontal, ±15° vertical), and mosaic were applied to the images.
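The geometric side of the rotation augmentation, including keeping the polygon annotations consistent with the rotated image, can be sketched as follows; the vertex coordinates and image center are hypothetical:

```python
import math
import random

def rotate_polygon(points, angle_deg, cx, cy):
    """Rotate annotation polygon vertices around the image center
    (cx, cy), mirroring the geometric part of a rotation augmentation."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * cos_a - dy * sin_a,
                    cy + dx * sin_a + dy * cos_a))
    return out

angle = random.uniform(-15, 15)  # the rotation range used in this work
print(rotate_polygon([(120, 80), (140, 80), (130, 100)], angle, 128, 128))
```

Tools such as Roboflow apply the corresponding transform to both the image pixels and the annotations; this sketch only illustrates the annotation side.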

Table 8 contains the experiments done with the 140-layer ResNeXt, i.e., the ResNeXt with a 3x4 convolution block instead of 3x8. Augmentation improved the statistical metrics, with the 1900-image training set performing best in this configuration.

Table 8.

Augmentation Experiments 5 classes ResNeXt 140 layers.

Augmentation mAP mAR F1-score
940 train images 0.34 0.79 0.47
1900 train images 0.48 0.80 0.60
2800 train images 0.35 0.58 0.44

In one set of experiments, ResNeXt with 143 layers was used. The first experiment was conducted without augmentation, while the second experiment involved training with 2800 augmented images.

In the absence of augmentation, the first experiment exhibited signs of overfitting, as can be seen in the loss graphs in Fig. 9. The primary contributors to this overfitting were the losses associated with bounding-box prediction and the RPN's bounding-box loss (marked by the red rectangle).

Figure 9.

Figure 9

ResNeXt 143 layers no augmentation and overfitting.

The RPN consists of two main components. The first component is the classification loss, which evaluates the RPN's ability to distinguish between foreground (object) and background regions. The second component is the bounding box regression loss, which measures the discrepancy between the predicted bounding box coordinates (e.g., width, height, x, and y offsets) and the ground truth bounding box coordinates for positive anchor boxes.

On the other hand, in the second experiment, where augmentation was applied (2800 training images), the issues observed in the total bounding-box losses and the RPN bounding-box loss (both highlighted in the red rectangle in Fig. 10) were mitigated. Recall that the total loss function is the sum of the three components (mask, box, and classes), including the RPN terms; improving any one of these components lowers the overall loss, which in this case addresses the overfitting and decreases the total loss. Table 9 provides the statistical parameters reflecting these improvements.

Figure 10.

Figure 10

Loss 5-class; No overfitting and underfitting in all the losses.

Table 9.

Augmentation experiments 5 classes ResNeXt 143 layers.

Augmentation mAP mAR F1-score Overfitting
No aug 0.22 0.55 0.32 Yes
2800 train images 0.599 0.86 0.701 No
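As a quick sanity check on Table 9, the F1-score is the harmonic mean of precision and recall; recomputing it from the reported mAP and mAR values yields approximately the tabulated figures (small differences come from rounding of the inputs):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision (mAP) and recall (mAR)."""
    return 2 * precision * recall / (precision + recall)

# mAP and mAR of the two 5-class runs in Table 9
f1_augmented = f1_score(0.599, 0.86)  # ~0.706, reported as 0.701
f1_no_aug = f1_score(0.22, 0.55)      # ~0.314, reported as 0.32
```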

4.2. New mask RCNN backbone

In order to improve the performance, we changed the number of layers in the Resnet backbone (more information is provided in the Data Availability section). After running several experiments, we found that increasing the number of layers improved the results; see Table 10.

Table 10.

Backbone's comparison 5-class.

Backbone mAP mAR F1-score α (used for train)
Resnet 50 0.49 0.81 0.61 0.001
Resnet 101 0.56 0.83 0.67 0.001
ResNeXt 50 0.52 0.80 0.63 0.001
ResNeXt 101 0.47 0.76 0.58 0.001
ResNeXt 140 0.52 0.78 0.63 0.001
ResNeXt 143 0.6 0.86 0.701 0.001
ResNeXt 152 loss Nan loss Nan loss Nan 0.0001

In our initial experiments, we found that the precision achieved by Resnet 101 was insufficient, so we changed the backbone to ResNeXt with 152 layers. However, ResNeXt 152 is demanding in terms of computational complexity, and we did not want to downgrade other parameters such as image size and quality. For example, it would require training with α = 0.00001, which would take around one week. To reduce the complexity without losing precision, the 152-layer ResNeXt was replaced with a 143-layer variant (more information is provided in the Data Availability section).

The basic structure of the Resnet network consists of five convolution block stages (more information is provided in the Data Availability section). To obtain ResNeXt 143 from ResNeXt 152, we reduced the third convolution block from 8 repetitions to 4 and added one more repetition to the last convolution block. Compared to the ResNeXt 101-layer backbone, the results improved significantly.
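Assuming the standard ResNet-style depth count (one stem convolution, three convolutions per bottleneck block, and one fully connected layer), the block-repeat change described above can be verified numerically; the repeat lists below are our reading of the modification:

```python
def resnet_depth(block_repeats, convs_per_block=3):
    """Count weighted layers: stem conv + bottleneck convs + final FC layer."""
    return 1 + convs_per_block * sum(block_repeats) + 1

depth_152 = resnet_depth([3, 8, 36, 3])  # standard ResNet/ResNeXt 152 repeats
depth_143 = resnet_depth([3, 4, 36, 4])  # third block 8 -> 4, last block 3 -> 4
```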

4.3. Five classes experiments

Running the experiments for five classes, we reached an mAP of 59.9% (about 60%), an mAR of 86%, and an F1-score of 70%. The loss functions are shown in Fig. 10; the over-fitting problem was overcome, and the network learns properly. Moreover, as Fig. 11 shows, the classes most often confused are Meta (Metaplastic) and Koil (Koilocytotic), both pre-cancer categories; BG is the background class, meaning the detected cells did not appear in the annotations file. Fig. 12 shows one example of the output images from our architecture.

Figure 11.


Confusion matrix 5-class.

Figure 12.


Example 5-class segmentation, classification, and detection.

4.4. Three classes experiments

Running the experiments for three classes, we achieved an mAP of 59.2%, an mAR of 89%, and an F1-score of 71.14%. Fig. 13 shows the loss functions, with no sign of over-fitting. Moreover, as Fig. 14 shows, the classes most often confused are pre-cancer and normal cells. Fig. 15 shows one example of the output images from our architecture.

Figure 13.


Loss, 3-class; no overfitting or underfitting.

Figure 14.


Confusion matrix 3-class.

Figure 15.


Example 3-class segmentation, classification, and detection.

4.5. Ablation test

4.5.1. Ablation for convolution layers

The purpose of an ablation test in Deep Learning is to evaluate the impact of individual features on the overall system. This involves excluding a feature or a layer during training and testing and assessing the results. From this perspective, we performed several experiments and analyzed the results obtained from them.

The ablation tests for the second, third, and last convolution layers of the ResNeXt for 5-class are presented in Table 11, Table 12, and Table 13, respectively, using a learning rate (α) of 0.0001 and 250 epochs. We chose these convolution layers because the first convolution layer follows the same pattern across all Resnet and ResNeXt models, consisting of just one layer. We did not perform the ablation test on the fourth layer because of its large number of convolution layers, which we kept at 36.

Table 11.

Ablation Test 2nd Conv layer.

Number of layers mAP mAR F1-score
1 layer 0.236 0.641 0.346
2 layers 0.2315 0.57 0.33
3 layers 0.31 0.69 0.42
4 layers 0.312 0.618 0.415
5 layers Nan Nan Nan
Table 12.

Ablation Test 3rd Conv layer.

Number of layers mAP mAR F1-score
1 layer 0.29 0.59 0.39
2 layers 0.266 0.64 0.37
3 layers 0.52 0.78 0.63
4 layers 0.31 0.69 0.42
5 layers 0.36 0.66 0.46
Table 13.

Ablation Test 5th Conv layer.

Number of layers mAP mAR F1-score
1 layer 0.27 0.67 0.38
2 layers 0.28 0.66 0.4
3 layers 0.27 0.69 0.39
4 layers 0.31 0.69 0.42
5 layers Nan Nan Nan

The results indicated that increasing the number of layers does not always lead to higher precision. NaN means that the machine was not computationally capable of running with those settings. The best performance was obtained with 3 layers in the 3rd convolution layer (Table 12); however, when we used α = 0.001 in the 3rd Conv layer with 5 layers, the results were also NaN.
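The rows of Table 12 can be compared directly by F1-score; the sketch below (values copied from the table) selects the best-performing configuration according to the tabulated mAP and mAR:

```python
# (mAP, mAR) per number of layers in the 3rd conv layer, from Table 12
results_3rd = {1: (0.29, 0.59), 2: (0.266, 0.64), 3: (0.52, 0.78),
               4: (0.31, 0.69), 5: (0.36, 0.66)}

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Pick the layer count whose (mAP, mAR) pair maximizes F1
best_layers = max(results_3rd, key=lambda k: f1(*results_3rd[k]))
```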

4.5.2. Ablation for activation layer

Activation layers enable neural networks to capture complex, non-linear relationships between inputs and outputs by applying a mathematical function to the network's outputs. This non-linear transformation empowers the neural network to learn and represent intricate patterns and relationships in the data. For this ablation test, we replaced the activation layer with each of the activation methods presented in Table 14. Not all methods could run with the current settings, and ReLU remained the one that achieved the highest statistical parameters.

Table 14.

Ablation Test 1st Activation Layer.

Activation Method mAP mAR F1-score
Relu 0.6 0.86 0.701
Softmax 0.4 0.68 0.5
Sigmoid 0.49 0.82 0.41
Elu Nan Nan Nan
Selu Nan Nan Nan
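For reference, the element-wise activations compared in Table 14 have the standard definitions sketched below (softmax, being a vector-valued normalization, is omitted; the SELU constants are the fixed values of the self-normalizing formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x, lam=1.0507, alpha=1.67326):
    # lam and alpha are the fixed constants of the self-normalizing SELU
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 3.0])
relu(x)  # negative inputs clipped to zero, positives passed through
```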

5. Results and discussion

To discuss our work, we evaluate the results by comparing them with the doctor's point of view and by analyzing our own findings. We also compare with research in this field that used Deep Learning for the segmentation and detection of cells and cancer stages.

5.1. Doctor's evaluation and comparison

For this comparison, we shared the test dataset and the output of the architecture with a Medical Consultant, who analyzed 30 images for the 3-class and 5-class experiments. The Medical Consultant also completed Table 15 (5-class) and Table 16 (3-class) with the number of cells classified incorrectly by the algorithm for each class.

Table 15.

Table for comparison 5-class.


Table 16.

Table for comparison 3-class.


For the 5-class experiment, the overall average accuracy was 76.51%; the algorithm correctly predicted most cases of Dyskeratotic (the cancer cell), and the most confused classes were Metaplastic and Koilocytotic, which both belong to the pre-cancer group. For the 3-class experiment, the overall average was higher, reaching 84.31% accuracy; the class with the fewest incorrect cells was normal, followed by cancer and pre-cancer. In conclusion, across the 30 images the doctor analyzed, the pre-cancer cells were the most difficult to predict.

Moreover, we also provide a report listing the pre-cancer (dangerous) and cancer (carcinogenic) cells and their respective locations. The location is determined by the bounding box, where the given X and Y coordinates represent the center of the rectangle, i.e., the center of the cell. Fig. 16 shows an example report for two different images.
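As a minimal sketch of how such a center coordinate is derived from a detection's bounding box (the corner values below are hypothetical):

```python
def bbox_center(x1, y1, x2, y2):
    """Center of an axis-aligned bounding box given its opposite corners."""
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

# Hypothetical detection box around a cell, in pixel coordinates
cx, cy = bbox_center(120, 80, 180, 140)  # (150.0, 110.0)
```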

Figure 16.


Report Example.

5.2. Research comparison

In this comparison, we considered papers that focus on complete tissue cell segmentation and classification. To evaluate performance, we used the mAP metric, as shown in Table 17. Table 18 provides the corresponding dataset details.

Table 17.

Deep Learning Classification and Segmentation Comparison.

Name Dataset Algorithm Detection/Classification/Segmentation Results Tool Cell location
[37] Herlev and Private Improved YOLOv3 Detection and Classification mAP of 78.87% No
[36] Herlev and Private Trainable Weka Segmentation Detection Acc of 98.88% Yes
[23] SIPaKMeD Faster R-CNN All mAP of 0.37798 and AR of 0.64 (5-class) No
[24] Herlev and private YOLOv3 All mAP of 0.6 (10-class) No
Our model SIPaKMeD and private Modified Mask-RCNN All mAP of 0.6 and mAR of 0.86 (5-class) Yes

Table 18.

Datasets information.

Name Dataset Number of classes Number of images and cells Name of the classes
[37] Herlev and Private 7 classes 54,000 cells LSIL, HSIL, SCC, AEC, AIS, AGC, and EA
[36] Herlev and Private 2 classes 917 single cells and 557 full slides Abnormal and Normal
[23] SIPaKMeD 5 classes 966 images and 4,490 cells Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, and Superficial-Intermediate
[24] Herlev and private 10 classes 12,909 images and 58,995 cells Normal, ACUS, ASCH, LSIL, HSIL, AGC, ADE, VAG, MON, and DYS
Our model SIPaKMeD and private 3 and 5 classes 1342 images and about 5 thousand cells Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, and Superficial-Intermediate

One of the papers we considered was [23], which reported a low mAP on the same benchmark dataset used in our study. Additionally, in [24], a better mAP was achieved for the 10-class setting on a different dataset with more categories. This suggests that increasing the number of classes can sometimes help the network learn class-specific features and distinguish effectively between cancer levels and normal cells. It is also worth noting the variation in sample sizes across the papers: the more data available, the higher the precision that can be achieved.

Another article we included in this comparison was [36], which presented an end-user application for determining whether a cell is carcinogenic. However, it did not provide further information on the cancer classification, such as its degree or location. In our study, we obtained comparable mAP results while employing different datasets, methods (a modified Mask RCNN), and classes. Furthermore, we concluded our research by offering the doctor's point of view and providing a report for quick clinical analysis.

The following notation is used in Table 18: Adenocarcinoma (ADE), Trichomonas vaginalis (VAG), Monilia (MON), dysbacteriosis (DYS), low-grade squamous intraepithelial lesion cells (LSIL), high-grade squamous intraepithelial lesion cells (HSIL), squamous carcinoma cells (SCC), cervical gland cells (AEC), cervical adenocarcinoma in situ (AIS), cervical canal adenocarcinoma (AGC), and endometrial adenocarcinoma (EA).

6. Conclusion and future work

Cervical cancer affects women all over the world, and early diagnosis can save many lives. However, women often wait a long time before being seen by consultants, so there is an urgent need for reliable and fast detection and diagnosis techniques. Artificial intelligence techniques such as deep neural networks have great potential to automate the detection and diagnosis process in this field.

This work implements a novel Deep Learning architecture based on a Mask Region-based Convolutional Neural Network for cervical cancer classification and segmentation using tissue cells. The proposed architecture can be employed for any dataset and application in this field without adding many new layers (which would make the system computationally cumbersome). With this architecture, an mAP of 60% and an F1-score of 70% (5-class), and an mAP of 59.2% and an F1-score of 71.14% (3-class), were achieved, improving on previous work.

Our proposed system can deliver a readable report, enabling the medical consultant to identify the malicious cells and cancer stages in a short time. In the future, we want to increase the precision for all the classes, especially pre-cancer (usually the most poorly predicted), and make this work more user-friendly and available to health systems worldwide.

CRediT authorship contribution statement

Carolina Rutili de Lima: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Said G. Khan: Conceptualization, Formal analysis, Methodology, Project administration, Writing – original draft, Writing – review & editing. Syed H. Shah: Conceptualization, Formal analysis, Methodology, Project administration, Writing – original draft, Writing – review & editing. Luthiari Ferri: Conceptualization, Data curation, Formal analysis.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data, the code, and support information used for this research are available at the following link: here.

Review and/or approval by an ethics committee was not needed for this study because the data supplied for this study does not contain any sensitive information, such as names or ages, so no ethical disclosure is required.

References

  • 1.WHO Cervical cancer. 2022. https://www.who.int/news-room/fact-sheets/detail/cervical-cancer/ Online.
  • 2.Sociedade Brasileira d.P. Patologista – O profissional do diagnóstico. 2016. https://www.sbp.org.br/patologista-o-profissional-do-diagnostico/ Online.
  • 3.Khan M.S., Raja F.Y., Ishfaq G., Tahir F., Subhan F., Kazi B.M., Karamat K.A. Pap smear screening for pre-cancerous conditions of the cervical cancer. Pak. J. Med. Res. 2005;44(3):111–113. [Google Scholar]
  • 4.Sirovich B.E., Welch H.G. The frequency of pap smear screening in the United States. J. Gen. Intern. Med. 2004;19(3):243–250. doi: 10.1111/j.1525-1497.2004.21107.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Saúde Brasil M.d. Papanicolau (exame preventivo de colo de útero) 2011. https://bvsms.saude.gov.br/papanicolau-exame-preventivo-de-colo-de-utero/ Online.
  • 6.Lee D., Yoon S.N. Application of artificial intelligence-based technologies in the healthcare industry: opportunities and challenges. Int. J. Environ. Res. Public Health. 2021;18(1):271. doi: 10.3390/ijerph18010271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Koutinas M., Kiparissides A., Pistikopoulos E.N., Mantalaris A. Bioprocess systems engineering: transferring traditional process engineering principles to industrial biotechnology. Comput. Struct. Biotechnol. J. 2012;3(4) doi: 10.5936/csbj.201210022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Sauer F., Fritsch A., Grosser S., Pawlizak S., Kießling T., Reiss-Zimmermann M., Shahryari M., Müller W.C., Hoffmann K.-T., Käs J.A., et al. Whole tissue and single cell mechanics are correlated in human brain tumors. Soft Matter. 2021;17(47):10744–10752. doi: 10.1039/d1sm01291f. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Ghoneim A., Muhammad G., Hossain M.S. Cervical cancer classification using convolutional neural networks and extreme learning machines. Future Gener. Comput. Syst. 2020;102:643–649. [Google Scholar]
  • 10.Waly M.I., Sikkandar M.Y., Aboamer M.A., Kadry S., Thinnukool O. Optimal deep convolution neural network for cervical cancer diagnosis model. Comput. Mater. Continua. 2022;70(2):3297–3309. [Google Scholar]
  • 11.Win K.P., Kitjaidure Y., Hamamoto K., Myo Aung T. Computer-assisted screening for cervical cancer using digital image processing of pap smear images. Appl. Sci. 2020;10(5):1800. [Google Scholar]
  • 12.Rehman A.-u., Ali N., Taj I., Sajid M., Karimov K.S., et al. An automatic mass screening system for cervical cancer detection based on convolutional neural network. Math. Probl. Eng. 2020;2020 [Google Scholar]
  • 13.Sabeena K., Gopakumar C. A hybrid model for efficient cervical cell classification. Biomed. Signal Process. Control. 2022;72 [Google Scholar]
  • 14.Jia A.D., Li B.Z., Zhang C.C. Detection of cervical cancer cells based on strong feature cnn-svm network. Neurocomputing. 2020;411:112–127. [Google Scholar]
  • 15.Yaman O., Tuncer T. Exemplar pyramid deep feature extraction based cervical cancer image classification model using pap-smear images. Biomed. Signal Process. Control. 2022;73 [Google Scholar]
  • 16.Manna A., Kundu R., Kaplun D., Sinitca A., Sarkar R. A fuzzy rank-based ensemble of cnn models for classification of cervical cytology. Sci. Rep. 2021;11(1):1–18. doi: 10.1038/s41598-021-93783-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Pramanik R., Biswas M., Sen S., de Souza Júnior L.A., Papa J.P., Sarkar R. A fuzzy distance-based ensemble of deep models for cervical cancer detection. Comput. Methods Programs Biomed. 2022;219 doi: 10.1016/j.cmpb.2022.106776. [DOI] [PubMed] [Google Scholar]
  • 18.Hussain E., Mahanta L.B., Das C.R., Talukdar R.K. A comprehensive study on the multi-class cervical cancer diagnostic prediction on pap smear images using a fusion-based decision from ensemble deep convolutional neural network. Tissue Cell. 2020;65 doi: 10.1016/j.tice.2020.101347. [DOI] [PubMed] [Google Scholar]
  • 19.Chen H., Liu J., Wen Q.-M., Zuo Z.-Q., Liu J.-S., Feng J., Pang B.-C., Xiao D. Cytobrain: cervical cancer screening system based on deep learning technology. J. Comput. Sci. Technol. 2021;36(2):347–360. [Google Scholar]
  • 20.Cheng S., Liu S., Yu J., Rao G., Xiao Y., Han W., Zhu W., Lv X., Li N., Cai J., et al. Robust whole slide image analysis for cervical cancer screening using deep learning. Nat. Commun. 2021;12(1):1–10. doi: 10.1038/s41467-021-25296-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Tan X., Li K., Zhang J., Wang W., Wu B., Wu J., Li X., Huang X. Automatic model for cervical cancer screening based on convolutional neural network: a retrospective, multicohort, multicenter study. Cancer Cell Int. 2021;21(1):1–10. doi: 10.1186/s12935-020-01742-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Allehaibi K.H.S., Nugroho L.E., Lazuardi L., Prabuwono A.S., Mantoro T., et al. Segmentation and classification of cervical cells using deep learning. IEEE Access. 2019;7:116925–116941. [Google Scholar]
  • 23.Sampaio A.F., Rosado L., Vasconcelos M.J.M. Towards the mobile detection of cervical lesions: a region-based approach for the analysis of microscopic images. IEEE Access. 2021;9:152188–152205. [Google Scholar]
  • 24.Xiang Y., Sun W., Pan C., Yan M., Yin Z., Liang Y. A novel automation-assisted cervical cancer reading method based on convolutional neural network. Biocybern. Biomed. Eng. 2020;40(2):611–623. [Google Scholar]
  • 25.Plissiti M.E., Dimitrakopoulos P., Sfikas G., Nikou C., Krikoni O., Charchanti A. 2018 25th IEEE International Conference on Image Processing (ICIP) IEEE; 2018. Sipakmed: a new dataset for feature and image based classification of normal and pathological cervical cells in pap smear images; pp. 3144–3148. [Google Scholar]
  • 26.Hussain E. Liquid based cytology pap smear images for multi-class diagnosis of cervical cancer. Data Brief. 2019 doi: 10.1016/j.dib.2020.105589. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Sharma A. Mean average precision (mAP) using the COCO evaluator. 2022. https://pyimagesearch.com/2022/05/02/mean-average-precision-map-using-the-coco-evaluator/ Online.
  • 28.Bishop C.M., Nasrabadi N.M. Springer; 2006. Pattern Recognition and Machine Learning, vol. 4. [Google Scholar]
  • 29.Schütze H., Manning C.D., Raghavan P. Cambridge University Press; Cambridge: 2008. Introduction to Information Retrieval, vol. 39. [Google Scholar]
  • 30.Huang H., Xu H., Wang X., Silamu W. Maximum f1-score discriminative training criterion for automatic mispronunciation detection. IEEE/ACM Trans. Audio Speech Lang. Process. 2015;23(4):787–797. [Google Scholar]
  • 31.Henderson P., Ferrari V. Asian Conference on Computer Vision. Springer; 2016. End-to-end training of object class detectors for mean average precision; pp. 198–213. [Google Scholar]
  • 32.Wikipedia Confusion matrix. 2022. https://en.wikipedia.org/wiki/Confusion_matrix Online.
  • 33.Susmaga R. Confusion matrix visualization. Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM '04 Conference Held in Zakopane, Poland, May 17–20, 2004; Springer; 2004. pp. 107–116. [Google Scholar]
  • 34.He K., Gkioxari G., Dollár P., Girshick R. Proceedings of the IEEE International Conference on Computer Vision. 2017. Mask r-cnn; pp. 2961–2969. [Google Scholar]
  • 35.Girshick R. Proceedings of the IEEE International Conference on Computer Vision. 2015. Fast r-cnn; pp. 1440–1448. [Google Scholar]
  • 36.William W., Ware A., Basaza-Ejiri A.H., Obungoloch J. A pap-smear analysis tool (pat) for detection of cervical cancer from pap-smear images. Biomed. Eng. Online. 2019;18:1–22. doi: 10.1186/s12938-019-0634-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Jia D., He Z., Zhang C., Yin W., Wu N., Li Z. Detection of cervical cancer cells in complex situation based on improved yolov3 network. Multimed. Tools Appl. 2022;81(6):8939–8961. [Google Scholar]
