Reports of Practical Oncology and Radiotherapy
2025 Aug 8;30(3):316–331. doi: 10.5603/rpor.105867

Optimizing cervical cancer diagnosis with accurate cell classification using modified HDFF

Pooja Patre 1, Dipti Verma 2
PMCID: PMC12413221; PMID: 40919253

Abstract

Background

Cervical cancer (CC) is a leading cause of cancer-related deaths worldwide, underscoring the importance of accurate and efficient diagnostic tools. Traditional methods of cervical cell classification are time-consuming and susceptible to human error, highlighting the need for automated solutions.

Materials and methods

This study introduces the modified hierarchical deep feature fusion (HDFF) method for cervical cell classification using the SIPaKMeD and Herlev datasets. The novelty of this research lies in the integration of hierarchical deep learning features, which allows for more accurate and robust classification. By enhancing the feature extraction process and combining multiple layers of deep learning models, the Modified HDFF method improves classification performance across various tasks, ranging from binary to multi-class problems.

Results

Our results demonstrate that the Modified HDFF method significantly outperforms existing models. In the 2-class task, it achieves an accuracy of 98.8%, surpassing other approaches such as RF-based hierarchical classification (98.43%). Additionally, it maintains high precision, recall, and F1-scores in multi-class tasks, with 98.8% accuracy in the 3-class problem and 98.5% in the 7-class problem.

Conclusions

Overall, the Modified HDFF method shows great promise as a reliable and efficient diagnostic tool for cervical cancer screening. Its superior accuracy across multiple classification tasks highlights its potential for improving early detection and public health outcomes. Continued refinement and expanded training datasets can enhance its performance even further, making it an invaluable asset in automated cervical cancer detection.

Keywords: HDFF, ML, computer-aided diagnostic systems, cervical cancer

Introduction

CC, which develops in the cervix, is the fourth most prevalent cancer among women globally. As reported by the World Health Organization (WHO), approximately 570,000 women were diagnosed with cervical cancer in 2018, resulting in nearly 311,000 deaths that year [1, 2]. Over 80% of cervical cancer cases and 85% of related deaths occur in low-income and underdeveloped countries, primarily due to inadequate screening and limited healthcare access [3]. Several factors contribute to the risk of human papillomavirus (HPV) infection — a leading cause of cervical cancer — including poor menstrual hygiene, early pregnancy, smoking, and the use of oral contraceptives. Persistent HPV infection is recognized as the primary cause of cervical cancer [4]. However, the disease is highly treatable when detected early and properly managed [5].

Since the late 1990s, the development of computer-aided diagnostic (CAD) systems for cervical cancer has progressed significantly, driven by the increased availability of training data. These systems typically follow a three-step process: first, cell segmentation is performed to distinguish the cytoplasm and nuclei; next, essential features are extracted from the segmented cells; and, finally, a classification model is applied to categorize the cells. Image preprocessing techniques help isolate cell nuclei using methods such as K-means clustering or super-pixel segmentation [8]. Post-processing techniques then enhance the segmented nuclei, allowing for the extraction of relevant morphological, color, and texture-based features [9, 10]. Ultimately, pattern recognition is used to analyze key attributes, enabling the system to accurately classify the cells.
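For illustration, the segmentation step can be prototyped in a few lines. The sketch below clusters pixel values with K-means to isolate the darker, stain-dense nuclei; it is a generic example under assumed parameters (three clusters, LAB color space), not the exact pipeline of the cited studies.

```python
# Illustrative K-means nucleus segmentation; a sketch, not the cited pipeline.
import numpy as np
import cv2
from sklearn.cluster import KMeans

def segment_nuclei(image_path, n_clusters=3):
    """Cluster pixels and return a binary mask for the darkest cluster."""
    img = cv2.imread(image_path)                        # BGR image
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)          # L channel ~ lightness
    pixels = lab.reshape(-1, 3).astype(np.float32)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)
    # Assume the cluster with the lowest mean lightness corresponds to nuclei
    lightness = [pixels[labels == k, 0].mean() for k in range(n_clusters)]
    nucleus_cluster = int(np.argmin(lightness))
    return (labels == nucleus_cluster).reshape(img.shape[:2]).astype(np.uint8) * 255
```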

Despite advancements in computer-aided diagnostics, traditional approaches that rely heavily on handcrafted features often struggle to achieve high classification accuracy, highlighting the limitations of conventional machine learning methods. To overcome these challenges, deep learning (DL) has emerged as a powerful alternative, surpassing traditional techniques in complex computer vision tasks. However, DL models require large datasets for optimal performance, which poses a significant challenge in the medical domain. Additionally, imbalanced data distributions in multiclass classification remain a major obstacle in medical image analysis, emphasizing the need for further refinement of CAD systems for Pap smear cell analysis [11].

This study presents DeepCervix, a deep learning-based framework aimed at improving the categorization of cervical cytopathology cells through hierarchical deep feature fusion (HDFF). The framework utilizes pre-trained deep learning models originally trained on the ImageNet dataset, which comprises over one million images, and refines them by fine-tuning with our cervical cell dataset. This method addresses challenges related to small dataset sizes and imbalanced data distributions in multiclass classification. By employing deep feature fusion (DFF) across different models, our approach captures and merges more relevant information, resulting in enhanced classification performance. We tested DeepCervix on the SIPaKMeD dataset, comprising single-cell cervical cytopathology images, and observed superior performance in accuracy, precision, and specificity for 2-class, 3-class, 5-class, and 7-class classification problems. Additionally, the method was applied to the Herlev dataset for all classification scenarios. Figure 1 illustrates the workflow of the proposed DeepCervix model, detailing the overall process.

Figure 1. Cervical cancer classification using a deep neural network model. CNN — convolutional neural network

This paper introduces several key advancements in cervical cytopathology cell classification. It presents HDFF as a novel approach in this field, enhancing feature extraction and classification performance. Additionally, a unique two-stage data augmentation strategy is implemented to improve dataset diversity and model generalization. To further enrich feature representation, the study utilizes four advanced convolutional neural networks (CNNs) — VGG16, VGG19, XceptionNet, and ResNet50 — capturing rich features from multiple network layers, as sketched below. Furthermore, an improved HDFF network is proposed, integrating dense layers, Softmax (SM), batch normalization (BN), and dropout techniques to optimize classification. The effectiveness of this approach is demonstrated through superior classification accuracy on the SIPaKMeD and Herlev datasets, showcasing its potential to improve cervical cancer detection and screening.
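As a minimal sketch of the backbone setup, the four pretrained networks named above can be instantiated in Keras as frozen ImageNet feature extractors. The input size and pooling choice here are assumptions; in the full method, a 1024-unit dense head would be attached to each backbone before fusion.

```python
# Sketch: the four ImageNet-pretrained backbones as frozen feature extractors.
from tensorflow.keras.applications import VGG16, VGG19, ResNet50, Xception

INPUT_SHAPE = (255, 255, 3)  # the paper's stated resize target

backbones = {
    "vgg16": VGG16(weights="imagenet", include_top=False, pooling="avg", input_shape=INPUT_SHAPE),
    "vgg19": VGG19(weights="imagenet", include_top=False, pooling="avg", input_shape=INPUT_SHAPE),
    "resnet50": ResNet50(weights="imagenet", include_top=False, pooling="avg", input_shape=INPUT_SHAPE),
    "xception": Xception(weights="imagenet", include_top=False, pooling="avg", input_shape=INPUT_SHAPE),
}
for net in backbones.values():
    net.trainable = False  # frozen for feature extraction; unfreeze later to fine-tune
# In the full method, a Dense(1024) head per backbone would yield the 1024-d
# vectors that are later normalized and fused.
```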

Figure 2. The process for classifying cervical cancer using a convolutional neural network (CNN)

Materials and methods

The classification of cervical cancer using deep learning follows a systematic methodology, beginning with data acquisition and progressing through preprocessing, model training, evaluation, and performance assessment. The initial step involves gathering a comprehensive dataset, such as the Herlev [12] or SIPaKMeD datasets [13], which contain multiple categories of cervical cell images, including adenocarcinoma, normal, precancer, squamous cell carcinoma, and invasive carcinoma. To enhance model reliability, the dataset must be sufficiently diverse, drawing from both publicly available sources and medical archives.

Once collected, the images undergo preprocessing to ensure they are optimized for model training. This process includes resizing images to maintain uniformity, applying filtering techniques to reduce noise, and enhancing image clarity [14]. Proper annotation of the dataset is essential for accurate classification, and this is achieved through expert manual labeling or semi-automated methods that assist in streamlining the process [15].

The dataset is then partitioned into three subsets: 60% for training, 20% for validation, and 20% for testing. This distribution ensures that the model learns effectively while also being evaluated on independent data to verify its generalization capability. The deep learning framework is built on a CNN architecture, incorporating convolutional layers for feature extraction, pooling layers to reduce dimensionality, and fully connected layers for classification. To enhance performance, techniques such as batch normalization and dropout are utilized, preventing overfitting and improving stability. The softmax activation function is employed at the final layer to determine class probabilities.

During the training phase, model parameters are optimized using advanced learning algorithms, while validation data is continuously monitored to fine-tune performance. Once training is complete, the model is subjected to rigorous evaluation using accuracy, precision, recall, F1-score, and Kappa score. Precision determines the proportion of correctly identified positive cases, recall assesses the model’s ability to detect relevant cases, and the F1-score balances precision and recall [16, 17]. The Kappa score is used to measure the agreement between predicted and actual classifications, adjusting for any chance occurrences. These evaluation metrics provide a comprehensive measure of the model’s effectiveness in classifying cervical cytopathology images [18, 19].
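A brief sketch of how these metrics can be computed with scikit-learn; `y_true` and `y_pred` are assumed to be integer class labels from the held-out test set, and macro averaging is one reasonable choice for the multi-class setting.

```python
# Sketch of the evaluation step with scikit-learn metrics.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score)

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),  # per-class average
        "recall": recall_score(y_true, y_pred, average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "kappa": cohen_kappa_score(y_true, y_pred),  # chance-corrected agreement
    }
```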

Data preprocessing

To evaluate the performance of our proposed approach, we utilized cervical cytopathology images from the SIPaKMeD dataset, which are stored in JPEG format and vary in size from 86 × 72 to 512 × 486 pixels. To ensure uniformity across all models, we resized all images to 255 × 255 pixels and standardized them with the Keras “preprocess_input” function, which aligns image inputs with the specific requirements of the deep learning models.
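A minimal sketch of this loading and standardization step, assuming VGG16-style preprocessing (each Keras backbone family ships its own `preprocess_input` variant):

```python
# Sketch: resize on load, then apply model-specific input standardization.
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.vgg16 import preprocess_input

def load_cell_image(path, target_size=(255, 255)):
    img = load_img(path, target_size=target_size)  # resize while loading
    return preprocess_input(img_to_array(img))     # model-specific standardization
```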

To optimize model performance, various geometric transformations and image processing techniques were applied. Data augmentation was carried out using the “imgaug” library, which significantly increased the training dataset by sixfold through multiple transformations. Affine transformations, including rotation, scaling, translation, shearing, and flipping, were randomly applied to each batch of images, enhancing the model’s ability to generalize. Additionally, edge detection techniques, such as “EdgeDetect” and “DirectedEdgeDetect”, were used to emphasize structural features, while the Canny filter was implemented to further refine edge detection.
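The following imgaug pipeline is a hedged reconstruction of the augmentation described above: the operator names (affine transforms, flips, “EdgeDetect”, “DirectedEdgeDetect”, Canny) come from the text, while the parameter ranges and probabilities are assumptions.

```python
# Sketch of the offline augmentation pipeline with imgaug; ranges are assumed.
import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.Affine(rotate=(-15, 15), scale=(0.9, 1.1),
               translate_percent=(-0.05, 0.05), shear=(-8, 8)),  # random affine ops
    iaa.Fliplr(0.5),                                             # horizontal flip
    iaa.Flipud(0.5),                                             # vertical flip
    iaa.Sometimes(0.3, iaa.OneOf([                               # edge-emphasis ops
        iaa.EdgeDetect(alpha=(0.0, 0.5)),
        iaa.DirectedEdgeDetect(alpha=(0.0, 0.5), direction=(0.0, 1.0)),
        iaa.Canny(alpha=(0.0, 0.5)),
    ])),
], random_order=True)

# augmented = augmenter(images=batch)  # applied per batch to expand the training set
```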

To improve model generalization, we employed the Keras ImageDataGenerator API, which applies dynamic image transformations during training. This ensures that the network is exposed to varied image samples in each epoch. The configuration included disabling feature-wise centering, setting a rotation range of 9°, and using the nearest fill mode for handling empty pixels. Additionally, horizontal and vertical flipping were enabled, brightness was adjusted within a 70% to 105% range, and channel shifting was applied to enhance variability in the training data.
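A sketch of the corresponding ImageDataGenerator configuration: the rotation range, flips, fill mode, and brightness range follow the text, while the channel-shift magnitude is an assumed value since none is stated.

```python
# Sketch of the on-the-fly augmentation configuration described above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    featurewise_center=False,      # feature-wise centering disabled, per the text
    rotation_range=9,              # rotations up to 9 degrees
    horizontal_flip=True,
    vertical_flip=True,
    brightness_range=(0.70, 1.05), # 70% to 105% brightness
    channel_shift_range=20.0,      # assumed magnitude; the paper does not state one
    fill_mode="nearest",           # nearest-neighbor fill for empty pixels
)
# train_gen = train_datagen.flow_from_directory("data/train",
#                                               target_size=(255, 255), batch_size=32)
```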

Proposed model: modified hierarchical deep feature fusion (HDFF)

Recent developments in deep learning (DL) have positioned it as a dominant approach for medical image analysis, with CNNs standing out due to their powerful capabilities and widespread use. CNNs are particularly effective in medical imaging because they are resilient to noise and can manage variations in image transformations, movement, and scaling. This makes them well-suited for analyzing complex objects in medical images.

CNNs function by extracting both simple and complex features through multiple layers of processing. Pooling layers, which typically use techniques like max-pooling or average pooling, help reduce the dimensionality of the extracted features, keeping the most important information intact. Finally, fully connected layers link the neurons across different layers to enable classification, following the structure of multilayer perceptrons. In our study, we explored several pretrained DL models and determined that combining XceptionNet, ResNet50, VGG16, and VGG19 achieved the best performance for cervical cytopathology analysis.

In the context of image classification, feature representation is vital, and feature fusion (FF) has proven to be a powerful method for analyzing cervical cytopathology images. FF integrates multiple relevant features into a unified vector, providing a more robust representation than any individual feature alone. Traditional FF techniques include serial and parallel approaches. In serial FF, two feature vectors α and β are concatenated directly into a single vector (α, β). In parallel FF, the features are aggregated into a complex vector α + iβ, in which one feature acts as the imaginary component. However, these traditional methods often struggle to leverage the original features effectively and may generate new features that do not always provide better results.

To address these challenges, we propose a modified hierarchical deep feature fusion (HDFF) approach, which merges feature vectors from multiple CNN models. The proposed method integrates normalized feature vectors ($FV_{16}$, $FV_{19}$, $F_R$, $F_X$) extracted from the final dense layers of VGG16, VGG19, ResNet50, and XceptionNet. The modified HDFF network employs a concatenation layer and a fully connected layer (FCL) with a Softmax activation function to merge these diverse features efficiently. Additionally, to prevent overfitting and improve the training process, we incorporate dropout and batch normalization layers. This architecture ultimately produces a 4096-dimensional feature vector from the concatenation layer, enhancing the model’s ability to classify cervical cytopathology images accurately.
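A minimal Keras sketch of this fusion head, assuming four 1024-dimensional inputs; the intermediate dense width (512) and dropout rate (0.5) are illustrative assumptions, as the text does not specify them.

```python
# Sketch of the modified HDFF head: concatenate, regularize, classify.
from tensorflow.keras import Input, Model, layers

def build_hdff_head(num_classes, feat_dim=1024, dropout_rate=0.5):
    """Fuse four backbone feature vectors and classify with Softmax."""
    inputs = [Input(shape=(feat_dim,), name=name)
              for name in ("fv16", "fv19", "fr", "fx")]  # VGG16, VGG19, ResNet50, Xception
    fused = layers.Concatenate()(inputs)                 # 4 x 1024 -> 4096-d fused vector
    x = layers.BatchNormalization()(fused)               # stabilizes training
    x = layers.Dense(512, activation="relu")(x)          # assumed intermediate width
    x = layers.Dropout(dropout_rate)(x)                  # regularization against overfitting
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs=inputs, outputs=outputs, name="modified_hdff")
```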

Mathematical analysis

The proposed methodology for cervical carcinoma classification begins with feature extraction using four CNN architectures: VGG16, VGG19, ResNet50, and XceptionNet. Each model generates a feature vector from its final dense layer; these are denoted $FV_{16}$, $FV_{19}$, $F_R$, and $F_X$, respectively. Each feature vector has a dimensionality of 1024, represented as a sequence of numerical features (e.g., $FV_{16} = [f_{16,1}, f_{16,2}, \ldots, f_{16,1024}]$).

To ensure compatibility and comparability between feature vectors from different architectures, a normalization step is performed. Min–Max scaling is applied to each vector, transforming its values to a standardized range. The normalized feature vectors are represented as

$FV_{i,j}^{\text{norm}} = \dfrac{FV_{i,j} - \min(FV_i)}{\max(FV_i) - \min(FV_i)}$ (1)

The next step is feature fusion, where the normalized vectors are concatenated to form a single composite feature vector, $FV_{\text{fused}}$. Since each vector has a dimensionality of 1024, the fused vector has a total dimensionality of 4096. This vector combines the normalized features from all four CNN models, ensuring a comprehensive representation of the input data. Mathematically, $FV_{\text{fused}}$ is expressed as the concatenation of all normalized feature vectors: $FV_{\text{fused}} = [FV_{16}^{\text{norm}}, FV_{19}^{\text{norm}}, F_R^{\text{norm}}, F_X^{\text{norm}}]$.
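As a quick worked example of the fusion step, four 1024-dimensional vectors concatenate into one 4096-dimensional composite vector:

```python
# Worked example: 4 x 1024-d normalized vectors -> one 4096-d fused vector.
import numpy as np

fv16, fv19, fr, fx = (np.random.rand(1024) for _ in range(4))  # stand-ins for real features
fv_fused = np.concatenate([fv16, fv19, fr, fx])
assert fv_fused.shape == (4096,)
```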

The fused feature vector is then passed through a fully connected layer (FCL). This layer applies a linear transformation, represented as $z = W \cdot FV_{\text{fused}} + b$, where $W$ is a weight matrix with dimensions corresponding to the number of classes, $b$ is the bias vector, and $z$ represents the logits output.

To convert the logits into probabilities, the Softmax activation function is applied. This function normalizes the logits into a probability distribution over the classes, enabling multi-class classification. The probability for class j is given by

$P(y = j \mid FV_{\text{fused}}) = \dfrac{e^{z_j}}{\sum_{k=1}^{C} e^{z_k}} \quad \text{for } j = 1, 2, \ldots, C$ (2)

where C is the number of classes.

The loss function used for training is categorical cross-entropy, which quantifies the difference between the predicted probabilities and the true class labels. The loss is calculated as

$\text{Loss} = -\sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log\left(P(y = j \mid FV_{\text{fused}})\right)$ (3)

where $y_{ij}$ is a binary indicator for the correct class label, and $N$ is the number of observations.
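A small numerical illustration of equations (2) and (3) for a single observation with $C = 3$ classes:

```python
# Numerical check of softmax (eq. 2) and categorical cross-entropy (eq. 3).
import numpy as np

z = np.array([2.0, 0.5, -1.0])       # logits for C = 3 classes
p = np.exp(z) / np.exp(z).sum()      # equation (2): softmax probabilities
y = np.array([1.0, 0.0, 0.0])        # one-hot true label
loss = -np.sum(y * np.log(p))        # equation (3) for a single observation
print(p.round(3), round(float(loss), 3))
```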

To prevent overfitting and improve training stability, regularization techniques, such as dropout and batch normalization, are applied. Dropout randomly deactivates a fraction of neurons during training, while batch normalization normalizes the output of each layer to stabilize and accelerate training.

The HDFF technique integrates these processes to create a robust framework for cervical carcinoma classification. By combining multiple feature vectors into a single 4096-dimensional vector and processing it through a fully connected layer with Softmax activation, this approach enhances feature representation and ensures accurate and reliable classification. Regularization techniques further improve the model’s performance, making it effective for diagnosing CC.

Finally, the methodology is outlined in Algorithm 1, which provides a pseudocode representation of the process. This comprehensive approach demonstrates how the HDFF technique improves feature integration, accuracy, and robustness in cervical carcinoma image classification.

Algorithm 1. Cervical carcinoma classification framework

1. Start procedure
2. Data loading and preprocessing
   a. Load the CC cell image dataset
   b. Split the dataset into training, validation, and testing sets (e.g., 60%/20%/20%)
   c. Preprocess the data:
      - Resize all images to a fixed size (e.g., 255 × 255)
      - Normalize pixel values to the range [0, 1]
      - Apply data augmentation (e.g., rotation, flipping, brightness adjustment)
3. Define the CNN model architecture
   a. Initialize a CNN model
   b. Add convolutional layers for feature extraction: Conv2D with appropriate filter sizes and ‘ReLU’ activation, followed by MaxPooling2D for down-sampling
   c. Flatten the output of the convolutional layers
   d. Add dense (fully connected) layers with ‘ReLU’ activation and an output Dense layer with ‘softmax’ for classification
4. Extract high-dimensional features
   a. Use pre-trained models (e.g., VGG16, VGG19, ResNet50, XceptionNet)
   b. Extract feature vectors from the fully connected layers of each model
5. Normalize feature vectors
   a. Apply Min-Max scaling to each feature vector to standardize its values
6. Feature fusion
   a. Concatenate the normalized feature vectors from all models into a single feature vector
7. Train the model
   a. Compile the model with the ‘Adam’ optimizer, ‘Categorical Crossentropy’ loss, and ‘Accuracy’ as the metric
   b. Train the model using the training set with validation data
8. Evaluate the model
   a. Test the model on the testing set to calculate accuracy and other performance metrics
9. Make predictions
   a. Use the trained model to predict class labels for new images
10. Save and deploy the model
   a. Save the trained model and feature extractor
   b. Deploy the model for real-time CC classification or integration into clinical applications
11. End procedure
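A condensed, runnable analogue of the training, evaluation, and saving steps of Algorithm 1, reusing `build_hdff_head` from the earlier sketch; random feature arrays stand in for the real extracted vectors, and the epoch count and batch size are assumptions.

```python
# Condensed analogue of Algorithm 1's train/evaluate/save steps.
import numpy as np
from tensorflow.keras.utils import to_categorical

n_train, n_val, n_classes = 600, 200, 5
feats_train = [np.random.rand(n_train, 1024).astype("float32") for _ in range(4)]
feats_val = [np.random.rand(n_val, 1024).astype("float32") for _ in range(4)]
y_train = to_categorical(np.random.randint(n_classes, size=n_train), n_classes)
y_val = to_categorical(np.random.randint(n_classes, size=n_val), n_classes)

model = build_hdff_head(num_classes=n_classes)   # fusion head sketched earlier
model.compile(optimizer="adam",                  # Adam optimizer
              loss="categorical_crossentropy",   # equation (3)
              metrics=["accuracy"])
model.fit(feats_train, y_train,
          validation_data=(feats_val, y_val),
          epochs=5, batch_size=32)               # assumed hyperparameters
loss, acc = model.evaluate(feats_val, y_val)     # stand-in for the held-out test set
model.save("modified_hdff.keras")                # save for later deployment
```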

Results

Dataset description

To evaluate the efficiency of the proposed DeepCervix system, we used the SIPaKMeD dataset [13], a freely available database of cervical Pap smear cell images [26]. Professional cytopathologists divided the cells into five categories according to their morphology and appearance: superficial-intermediate, parabasal, koilocytotic, metaplastic, and dyskeratotic. Specifically, superficial-intermediate and parabasal cells were classified as normal, koilocytotic and dyskeratotic cells as abnormal, and metaplastic cells as benign.

Data setting

The SIPaKMeD dataset contains 966 images comprising 4049 cervical cells. For processing, 60% of the images from each class are allocated for training, while 20% are reserved for validation and the remaining 20% for testing, as shown in the sketch below. We conducted classification tasks with three different schemes: five classes (superficial-intermediate, parabasal, koilocytotic, metaplastic, and dyskeratotic), three classes (normal, abnormal, and benign), and two classes (normal and abnormal). In addition, data augmentation was applied to the training set, yielding a sixfold expansion of the dataset.
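A sketch of this 60/20/20 stratified split using scikit-learn; `paths` and `labels` are assumed to be parallel lists of image paths and class labels.

```python
# Sketch of the per-class (stratified) 60/20/20 split described above.
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, seed=42):
    # 60% train, then split the remaining 40% evenly into validation and test
    p_tr, p_rest, y_tr, y_rest = train_test_split(
        paths, labels, test_size=0.4, stratify=labels, random_state=seed)
    p_va, p_te, y_va, y_te = train_test_split(
        p_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (p_tr, y_tr), (p_va, y_va), (p_te, y_te)
```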

The following is a concise summary of the CNN models used for the analysis:

  • VGG16 (Model-1): VGG16 is a deep CNN known for its simplicity, consisting of 16 layers with small 3 × 3 filters [27];

  • VGG19 (Model-2): VGG19 is an extension of VGG16, with 19 layers, offering slightly better accuracy by adding more convolutional layers, though at the cost of increased computational complexity [28];

  • ResNet50 (Model-3): ResNet50 is a deep neural network consisting of 50 layers. It introduced residual learning, a technique that enables training of deeper networks by addressing the vanishing gradient issue, which enhances its performance in handling complex tasks [29];

  • XceptionNet (Model-4): XceptionNet builds on the Inception architecture, substituting the traditional inception modules with depthwise separable convolutions [30];

  • InceptionV3 (Model-5): InceptionV3 is a convolutional network that employs inception modules, which allow for more efficient computation by using multiple filter sizes at each layer [31];

  • DenseNet121 (Model-6): DenseNet121 is a neural network with 121 layers, characterized by dense connections in which each layer is linked to all succeeding layers in a feed-forward manner [32]. This connectivity pattern improves gradient flow and reduces the total number of parameters;

  • LF (Model-7): The LF model refers to a lightweight feature extraction model, designed to be efficient with fewer parameters while maintaining performance, often used for tasks requiring low computational resources [33];

  • HDFF (Model-8): HDFF stands for hierarchical deep feature fusion, a model that combines features from multiple layers or models to improve classification or detection accuracy, often used in tasks requiring rich feature representation.

Figure 3. Comparison of the performance of various transfer learning (TL) models utilizing lightweight feature extraction (LF) and hierarchical deep feature fusion (HDFF) techniques

The results in Table 1 for the SIPaKMeD dataset show that the Modified HDFF method significantly outperforms the base models across the various classification tasks. In the 2-class problem, it achieves near-perfect performance, with accuracy, precision, recall, F1-score, and area under the curve (AUC) close to 1. For the 3-class and 4-class problems, it maintains high accuracy at 99.38% and 98.77%, respectively. Even in the 5-class and 6-class classifications, it performs strongly, with accuracy reaching 99.14% and 97.43%, respectively. Its high specificity and AUC further highlight its robustness, making it a promising tool for CC screening.

Table 1. Evaluation of the modified hierarchical deep feature fusion (HDFF) method alongside the base models on the SIPaKMeD dataset: classification performance metrics

Classification Problem CNN Models Avg. Precision Avg. Recall Avg. F1 Score Accuracy (%) Specificity AUC
Class-2 Model-1 0.998 0.997 0.996 99.65 0.998 0.998
Model-2 0.978 0.979 0.981 98.42 0.993 0.994
Model-3 0.992 0.993 0.994 99.12 0.995 0.995
Model-4 0.975 0.976 0.973 97.85 0.991 0.992
Model-5 0.989 0.991 0.990 99.02 0.997 0.997
Model-6 0.995 0.996 0.993 99.52 0.998 0.999
Class-3 Model-1 0.970 0.968 0.966 97.62 0.982 0.985
Model-2 0.958 0.952 0.951 96.38 0.981 0.982
Model-3 0.964 0.960 0.959 96.75 0.985 0.986
Model-4 0.935 0.928 0.920 91.28 0.971 0.973
Model-5 0.970 0.965 0.964 96.83 0.986 0.987
Model-6 0.980 0.977 0.975 97.47 0.992 0.993
Class-4 Model-1 0.963 0.960 0.959 96.53 0.980 0.982
Model-2 0.950 0.945 0.943 94.85 0.975 0.976
Model-3 0.960 0.958 0.957 95.92 0.979 0.981
Model-4 0.910 0.905 0.903 89.54 0.965 0.967
Model-5 0.965 0.960 0.962 96.72 0.985 0.986
Model-6 0.975 0.973 0.971 97.35 0.988 0.989
Class-5 Model-1 0.975 0.973 0.971 97.22 0.990 0.991
Model-2 0.962 0.958 0.956 96.03 0.986 0.987
Model-3 0.965 0.961 0.960 96.35 0.987 0.988
Model-4 0.740 0.735 0.733 71.26 0.882 0.883
Model-5 0.960 0.957 0.955 95.42 0.980 0.982
Model-6 0.972 0.970 0.969 97.62 0.991 0.992
Class-6 Model-1 0.952 0.950 0.948 95.28 0.981 0.982
Model-2 0.940 0.935 0.932 93.75 0.973 0.975
Model-3 0.948 0.945 0.943 94.83 0.976 0.978
Model-4 0.860 0.855 0.850 85.12 0.955 0.957
Model-5 0.950 0.947 0.946 94.95 0.977 0.979
Model-6 0.965 0.962 0.960 96.08 0.987 0.988

CNN — convolutional neural networks; AUC — area under the curve

Figure 4 shows a visual comparison of misclassified cervical cells from the SIPaKMeD dataset, emphasizing the difficulty the model encounters in differentiating between certain types of abnormal cells. In particular, it shows how a koilocytotic cell, which is often indicative of HPV infection and is characterized by distinct cellular changes, has been incorrectly classified as a dyskeratotic cell. Dyskeratotic cells, on the other hand, are abnormal cells typically associated with different pathological conditions, such as severe dysplasia or carcinoma. Figure 5 compares the actual and predicted cervical cell classifications from the SIPaKMeD dataset.

Figure 4. Comparison of misclassified koilocytotic and dyskeratotic cells from the SIPaKMeD dataset

Figure 5. Comparison of actual and predicted cervical cell classifications from the SIPaKMeD dataset

In the first set of images, the original cell is identified as a koilocytotic cell, which is a type of abnormal cell often associated with HPV infection. However, the model has incorrectly classified it as a dyskeratotic cell, which is another type of abnormal cell but indicates a different pathology. In the second set, the reverse misclassification occurs: an original dyskeratotic cell has been wrongly predicted as a koilocytotic cell. This illustrates the model’s occasional challenges in distinguishing between these two similar yet distinct types of abnormal cells, highlighting areas for improvement in classification accuracy.

Figure 6 presents examples of cervical cells from the SIPaKMeD dataset that were misclassified, showcasing cases where the system’s predictions differed from the true classifications. This highlights the difficulties in achieving flawless accuracy and points to areas where the classification system can be improved.

Figure 6. Misidentified cervical cells from the SIPaKMeD dataset

Additional experiment

Dataset

The Herlev dataset is a publicly available benchmark for Pap smear tests, commonly used to evaluate classification models in CC diagnosis. It contains a total of 917 digitized images of cervical cells from Pap smear tests, categorized into seven distinct classes based on their cytological characteristics, covering normal as well as various abnormal cell types, such as superficial, intermediate, parabasal, and dyskeratotic cells, as shown in Figure 7. For our analysis, we focus on classes 2, 3, 5, and 7. Classes 2 and 3 consist of normal cells, while classes 5 and 7 contain malignant cells.

Figure 7. Samples from the Herlev dataset under different classes

In our experimental setup, 60% of the images from each class were used for training, 20% for validation, and 20% for testing. Data augmentation expanded the training set 14-fold, improving the models’ generalization and accuracy. Tables 2 and 3 show the dataset summary and the final distributions for the 7-class, 5-class, 3-class, and 2-class tasks, highlighting the balanced and comprehensive design. Table 4 highlights the exceptional performance of the Modified HDFF method across four classification tasks (2-class, 3-class, 5-class, and 7-class), consistently surpassing models such as VGG16, ResNet50, DenseNet121, and XceptionNet. Figure 8 displays instances of misclassified cervical cells from the Herlev dataset, showcasing cases where the system’s predictions diverged from the true classifications.

Table 4. Evaluation of the modified hierarchical deep feature fusion (HDFF) method alongside the base models on the Herlev dataset

Model Accuracy Avg. precision Specificity Avg. recall Avg. F1 Score AUC
2-Class evaluation
VGG16 0.925 0.91 0.935 0.93 0.92 0.94
VGG19 0.92 0.92 0.93 0.91 0.915 0.93
ResNet50 0.965 0.97 0.97 0.96 0.965 0.975
XceptionNet 0.88 0.88 0.885 0.87 0.875 0.895
InceptionV3 0.945 0.94 0.945 0.93 0.935 0.95
DenseNet121 0.95 0.97 0.962 0.96 0.965 0.965
LF 0.975 0.97 0.97 0.975 0.972 0.975
Modified HDFF 0.988 0.98 0.985 0.982 0.982 0.987
3-Class evaluation
VGG16 0.76 0.78 0.78 0.76 0.77 0.79
VGG19 0.79 0.82 0.80 0.79 0.805 0.81
ResNet50 0.83 0.84 0.85 0.83 0.835 0.865
XceptionNet 0.725 0.74 0.74 0.73 0.735 0.75
InceptionV3 0.80 0.82 0.83 0.80 0.81 0.84
DenseNet121 0.84 0.85 0.86 0.84 0.845 0.87
LF 0.875 0.88 0.89 0.87 0.875 0.90
Modified HDFF 0.988 0.985 0.985 0.98 0.982 0.92
5-Class evaluation
VGG16 0.72 0.74 0.74 0.72 0.73 0.75
VGG19 0.74 0.77 0.76 0.74 0.755 0.77
ResNet50 0.78 0.80 0.81 0.78 0.79 0.82
XceptionNet 0.64 0.66 0.68 0.65 0.655 0.69
InceptionV3 0.765 0.78 0.79 0.76 0.77 0.80
DenseNet121 0.79 0.80 0.81 0.79 0.795 0.82
LF 0.83 0.84 0.85 0.83 0.835 0.87
Modified HDFF 0.975 0.975 0.97 0.97 0.972 0.88
7-Class evaluation
VGG16 0.71 0.74 0.74 0.71 0.725 0.75
VGG19 0.72 0.75 0.75 0.72 0.735 0.76
ResNet50 0.88 0.89 0.90 0.88 0.885 0.91
XceptionNet 0.52 0.55 0.57 0.54 0.545 0.58
InceptionV3 0.85 0.86 0.88 0.85 0.855 0.89
DenseNet121 0.87 0.88 0.89 0.87 0.875 0.90
LF 0.90 0.91 0.92 0.90 0.905 0.94
Modified HDFF 0.985 0.983 0.98 0.985 0.98 0.97

AUC — area under the curve

Figure 8. Misidentified cervical cells from the Herlev dataset

The evaluation results across multiple class configurations highlight the superior performance of the Modified HDFF model in comparison to other models. In the 2-class evaluation, the Modified HDFF model demonstrates exceptional results with an average precision of 0.98, recall of 0.982, and accuracy of 0.988, outshining models like ResNet50 (precision 0.97, recall 0.96) and VGG19 (precision 0.92, recall 0.91). ResNet50 performs well with high precision and recall but does not surpass the Modified HDFF. Models such as VGG16 and DenseNet121 also show competitive performance with precision values above 0.90, but XceptionNet lags behind with a lower precision of 0.88 and recall of 0.87.

Table 2. Summary of the Herlev dataset

Attribute Description
Total images 917
Cell types Superficial, intermediate, parabasal, dyskeratotic, koilocytotic, metaplastic
Cell categories Normal, abnormal, suspicious
Image format JPEG
Resolution 0.201 microns per pixel
Annotation Manually labeled by cytotechnologists
Public Access Available for research and development

In the 3-class evaluation, Modified HDFF continues its dominance, achieving an average precision of 0.985, recall of 0.98, and accuracy of 0.988. This performance far exceeds that of other models such as ResNet50 (precision 0.84, recall 0.83), VGG19 (precision 0.82, recall 0.79), and LF (precision 0.88, recall 0.87). While these models perform well, particularly in precision and recall, none of them reaches the level set by Modified HDFF.

In the 5-class evaluation, the performance of the models is more varied. VGG16 and VGG19 perform reasonably well with average precision values between 0.74 and 0.77, but XceptionNet struggles with low performance (precision 0.66, recall 0.65). Modified HDFF sees a slight dip relative to the 2-class and 3-class evaluations but still achieves solid precision (0.975) and recall (0.97), with an accuracy of 0.975, outperforming most other models.

In the 7-class evaluation, similar trends emerge. VGG16 and VGG19 maintain moderate precision values around 0.74 to 0.75, while ResNet50 excels with precision and recall of 0.89 and 0.88, respectively. However, XceptionNet faces a significant drop in performance, with precision and recall below 0.60. InceptionV3 and DenseNet121 show better performance, with precision values around 0.86, and LF performs best among the baselines in this configuration, with precision of 0.91 and recall of 0.90. Still, the Modified HDFF model outperforms all of them, achieving 0.983 precision, 0.985 recall, and 0.985 accuracy, continuing to lead in all metrics. A comparison of the performance of the various classification models alongside the base models on the Herlev dataset is shown in Figure 9.

Figure 9. Comparison of the performance of various classification models alongside the base models on the Herlev dataset

Table 3. Experimental configuration for the Herlev dataset

Class Training set Verification set Testing set Total images
Class-2 5156 1719 1719 8594
Class-3 5135 1712 1712 8559
Class-5 5134 1712 1712 8558
Class-7 5139 1712 1712 8563
Total 20,564 6,855 6,855 34,274

Overall, the Modified HDFF model stands out as the best performer across all class configurations, consistently delivering the highest precision, recall, and accuracy, particularly in the 2-class and 3-class evaluations. While other models like ResNet50, LF, and InceptionV3 perform well, especially in lower-class evaluations, none of them can match the outstanding results achieved by Modified HDFF.

Benchmarking the modified HDFF method against traditional techniques

Here, we provide a detailed evaluation of the modified HDFF approach in comparison with various established models on the Herlev dataset. The results, presented in Table 5, offer a comprehensive analysis of the modified HDFF method across different classification categories, including the 2-class, 3-class, 5-class, and 7-class tasks. In all configurations, the modified HDFF method consistently outperforms the other models in terms of accuracy, precision, specificity, recall, and F1-score. Specifically, for the 2-class task, which differentiates between normal and abnormal cells, the HDFF model achieves an accuracy of 98.8%, outperforming models such as RF-based hierarchical classification (98.43%) and CNN with transfer learning (95.10%). HDFF also excels in the additional metrics, including precision (98%), specificity (98.5%), recall (98.2%), and F1-score (98.2%), demonstrating its robust ability to accurately classify cervical cells and to maintain balanced performance across key indicators.

Table 5. Comparative analysis of the modified hierarchical deep feature fusion (HDFF) method vs. existing approaches on the Herlev dataset

Reference Model Categorization Accuracy (%) Precision (%) Specificity (%) Recall (%) F1-score (%)
[34] Hybrid Ensemble 7-Class 78.0 --- --- --- ---
[35] VGG16 and ResNet 2-Class 86.0 --- --- --- ---
[36] CNN and Transfer Learning 2-Class 95.10 --- --- --- ---
[37] Morphology and CNN 2-Class 94.50 --- --- --- ---
7-Class 64.50 --- --- --- ---
[38] DenseNet161 2-Class 94.38 --- --- --- ---
7-Class 68.54 --- --- --- ---
[39] AlexNet and Least Square Support Vector Machine (LSSVM) 2-Class 94.61 --- --- --- ---
[40] Back Propagation Neural Network (BPNN) 3-Class 79.0 --- --- --- ---
[41] RF Based hierarchical classification 2-Class 98.43 98.42 98.42 98.41 98.42
5-Class 95.67 85.19 97.47 84.94 85.06
7-Class 95.43 84.31 97.33 84.00 84.00
[42] RF and Bagging Clustering Based Hybrid Model 2-Class 95.12 --- --- --- ---
5-Class 96.12 --- --- --- ---
Our method Modified HDFF 2-Class 98.8 98.0 98.5 98.2 98.2
3-Class 98.8 98.5 98.5 98.0 98.2
5-Class 97.5 97.5 97.0 97.0 97.2
7-Class 98.5 98.3 98.0 98.5 98.0

The modified HDFF model continues to demonstrate strong performance as the complexity of the classification task increases. In the challenging 3-Class and 5-Class tasks, HDFF achieves 98.8% and 97.5% accuracy, respectively, surpassing other methods such as Back Propagation Neural Network (BPNN) (79.0% accuracy) and RF-based classification (95.67% accuracy for 5-Class). In the 7-Class task, HDFF maintains 98.5% accuracy, significantly outperforming models like Morphology and CNN (64.50%) and DenseNet161 (68.54%). Unlike many models that focus solely on accuracy, HDFF provides a comprehensive evaluation by reporting additional metrics such as precision, recall, specificity, and F1-score. For instance, in the 7-Class task, HDFF achieves 98.3% precision and 98.5% recall, showcasing its effectiveness in correctly identifying abnormal cervical cells and minimizing misclassifications. These results highlight the modified HDFF method as a highly reliable and robust model for cervical cell classification, positioning it as a superior choice for automated medical diagnostic systems.

Discussion

CC remains one of the leading causes of morbidity and mortality worldwide, making early and accurate diagnosis essential for improving patient outcomes. In this study, we introduced the Modified Hierarchical Deep Feature Fusion (HDFF) method for cervical cell classification, demonstrating its effectiveness on both the SIPaKMeD and Herlev datasets. The performance evaluation of the Modified HDFF across various classification tasks, from binary (2-class) to multi-class (3-class, 5-class, and 7-class) problems, shows its significant promise as an advanced tool for automated cervical cancer detection.

The results clearly demonstrate that the Modified HDFF method outperforms a range of traditional models, including hybrid ensemble models, VGG16 and ResNet, CNN with transfer learning, and random forest (RF)-based approaches. The Modified HDFF achieved remarkable accuracy rates, particularly in the 2-class classification problem, where it reached 98.8%, surpassing the RF-based hierarchical classification (98.43%) and CNN-based methods (95.10%). Additionally, in more complex multi-class scenarios, it maintained impressive performance, with accuracy rates of 98.8% in the 3-class, 97.5% in the 5-class, and 98.5% in the 7-class tasks. These results highlight the robustness of the Modified HDFF method in handling increasing levels of classification complexity.

One of the key strengths of the Modified HDFF is its ability to provide comprehensive evaluation metrics, including precision, recall, specificity, and F1-score, which go beyond just accuracy. This comprehensive evaluation allows a more nuanced understanding of the model’s performance and its capability to distinguish between various cell types. Notably, its high precision and recall rates, particularly in the 2-class and 7-class classification tasks, demonstrate its reliability in identifying abnormal cells while minimizing misclassifications.

The findings of this study contribute to the growing body of research on automated cervical cancer detection, suggesting that the Modified HDFF method could be a reliable tool for clinical use in the future. By improving the accuracy and efficiency of cervical cell classification, it holds the potential to enhance the speed and reliability of cervical cancer screening programs. Furthermore, as the model is refined and trained on larger datasets, its performance could improve even further, making it a key player in the fight against CC.

Conclusion

The recent advancements in deep learning have significantly improved the ability to address critical challenges in the medical field, particularly in the early detection of CC. Accurate classification of cervical cells is crucial for identifying malignancies at an early stage, which is essential for preventing the progression of CC. In this study, we introduced the modified HDFF method (DeepCervix) for cervical cell classification, utilizing the SIPaKMeD and Herlev datasets, and achieved remarkable results.

The modified HDFF method has shown exceptional performance in cervical cell classification, outperforming existing models across various classification tasks on the Herlev dataset. With impressive accuracy, precision, recall, specificity, and F1-score, HDFF effectively addresses the challenges associated with distinguishing between similar but distinct cell types. The method consistently outperformed other models, achieving up to 98.8% accuracy in the 3-Class task and 98.5% accuracy in the 7-Class task. The analysis of misclassifications highlights the ongoing challenges faced by automated systems, emphasizing the need for continuous refinement and the incorporation of additional training data. As a promising tool for cervical cancer screening, the modified HDFF method represents a significant step forward in providing more accurate and efficient diagnostic solutions, ultimately improving clinical decision-making and patient care.

Footnotes

Author contributions: P.P.: conceptualization and research design, literature review and theoretical framework, data collection and analysis, manuscript drafting and final revisions; D.V.: methodology development and implementation, data interpretation and validation, formatting, editing, and proofreading.

Conflict of interests: Authors declare no conflict of interests.

Ethical approval: Not required.

Funding: None declared.

References

  • 1.Reza S, Dewan S, Islam M, et al. Response of Bangladesh to the World Health Organization call to eliminate cervical cancer as a public health issue: An observational report. Health Sci Rep. 2024;7(6) doi: 10.1002/hsr2.2178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Singh P, Singh S, Mishra A, et al. Multimodality treatment planning using the Markov decision process: a comprehensive study of applications and challenges. Res Biomed Engin. 2024;40(2):435–450. doi: 10.1007/s42600-024-00349-4. [DOI] [Google Scholar]
  • 3.Singh P, Tripathi S, Gupta S. A unified approach for optimal dose delivery and trajectory optimization for the treatment of prostate cancer. Biomedical Signal Process Control. 2021;69:102884. doi: 10.1016/j.bspc.2021.102884. [DOI] [Google Scholar]
  • 4.Bogani G, Sopracordevole F, Ciavattini A, et al. HPV persistence after cervical surgical excision of high-grade cervical lesions. Cancer Cytopathol. 2023;132(5):268–269. doi: 10.1002/cncy.22760. [DOI] [PubMed] [Google Scholar]
  • 5.Singh P, Mishra A, Mishra S. A comprehensive analysis of the challenges and potential side effects of radiation therapy for palliative cancer treatment. Méd Palliat. 2024;23(2):75–91. doi: 10.1016/j.medpal.2023.12.002. [DOI] [Google Scholar]
  • 6.Honarvar Z, Zarisfi Z, Salari Sedigh S, et al. Comparison of conventional and liquid-based Pap smear methods in the diagnosis of precancerous cervical lesions. J Obstet Gynaecol. 2022;42(6):2320–2324. doi: 10.1080/01443615.2022.2049721. [DOI] [PubMed] [Google Scholar]
  • 7.Alquran H, Alsalatie M, Mustafa WA, et al. Cervical Net: A Novel Cervical Cancer Classification Using Feature Fusion. Bioengineering (Basel) 2022;9(10) doi: 10.3390/bioengineering9100578. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Singh P, Yadav OP, Yadav Y. ECG signal compression implementation by a new 2-dimensional transform technique. Int J Eng Adv Technol. 2012;1(6):167–170. [Google Scholar]
  • 9.Singh P, Tripathi S, Tamrakar R, et al. Dose-Volume Constraints Based Inverse Treatment Planning For Optimizing the Delivery of Radiation Therapy. Gedrag Organ Rev. 2020;33(03) doi: 10.37896/gor33.03/489. [DOI] [Google Scholar]
  • 10.Mishra A, Bhusnur S, Mishra SK, et al. Exploring a new frontier in cardiac diagnosis: ECG analysis enhanced by machine learning and parametric quartic spline modeling. J Electrocardiol. 2024;85:19–24. doi: 10.1016/j.jelectrocard.2024.05.086. [DOI] [PubMed] [Google Scholar]
  • 11.Tan SL, Selvachandran G, Ding W, et al. Cervical Cancer Classification From Pap Smear Images Using Deep Convolutional Neural Network Models. Interdiscip Sci. 2024;16(1):16–38. doi: 10.1007/s12539-023-00589-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Jantzen J, Norup J, Dounias G. Pap-smear benchmark data for pattern classification. NiSIS. 2005:1–9. [Google Scholar]
  • 13.Plissiti ME, Dimitrakopoulos P, Sfikas G, et al. Sipakmed: A new dataset for feature and image based classification of normal and pathological cervical cells in pap smear images. 2018 25th IEEE International Conference on Image Processing (ICIP), IEEE; 2018; pp. 3144–3148. [Google Scholar]
  • 14.Singh P, Tripathi S, Tamrakar R. Fluence map optimisation for prostate cancer intensity modulated radiotherapy planning using iterative solution method. Pol J Med Phys Engin. 2020;26(4):201–209. doi: 10.2478/pjmpe-2020-0024. [DOI] [Google Scholar]
  • 15.Mishra A, Bhusnur S, Mishra SK, et al. Comparative analysis of parametric B-spline and Hermite cubic spline based methods for accurate ECG signal modeling. J Electrocardiol. 2024;86:153783. doi: 10.1016/j.jelectrocard.2024.153783. [DOI] [PubMed] [Google Scholar]
  • 16.Dhawan S, Singh K, Arora M. Cervix Image Classification for Prognosis of Cervical Cancer using Deep Neural Network with Transfer Learning. EAI Endorsed Trans Pervasive Heal Technol. 2021;7(27):e5. doi: 10.4108/eai.12-4-2021.169183. [DOI] [Google Scholar]
  • 17.Singh P, Dewangan NK, Potdar RM. An Optimal Framework for the Effective Delivery of the Radiation to the target by Considering the Case of Head and Neck Cancer. Pol J Med Phys Eng. 2024;30(3):132–144. doi: 10.2478/pjmpe-2024-0016. [DOI] [Google Scholar]
  • 18.Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2015. [DOI] [PubMed] [Google Scholar]
  • 19.Hua W, Xiao T, Jiang X, et al. Lymph-vascular space invasion prediction in cervical cancer: Exploring radiomics and deep learning multilevel features of tumor and peritumor tissue on multiparametric MRI. Biomed Sign Proces Contr. 2020;58:101869. doi: 10.1016/j.bspc.2020.101869. [DOI] [Google Scholar]
  • 20.Jung A. Imgaug documentation. Readthedocs. 2019;25 [Google Scholar]
  • 21.Bhonsle D, Saxena KK, Sheikh RU, et al. Wavelet Based Random Noise Removal from Color Images Using Python. 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), IEEE; 2024; pp. 1–5. [Google Scholar]
  • 22.Mishra A, Bhusnur S, Mishra SK, et al. Parametric B-spline method for synthetic ECG signal generation: An alternative approach to addressing challenges in ECG analysis. AIP Conference Proceedings. AIP Publishing; 2024. [Google Scholar]
  • 23.Zuiderveld K. Contrast limited adaptive histogram equalization. Graphics Gems IV. 1994:474–485. [Google Scholar]
  • 24.Ye A. A Deep Dive into Keras. Modern Deep Learn Des App Dev. 2022:1–48. doi: 10.1007/978-1-4842-7413-2_1. [DOI] [Google Scholar]
  • 25.Singh P, Tripathi S. Optimal Delivery of Fluence Profile Using Dynamic Multi-Leaf Collimator Leaf Trajectory Optimization. ECS Trans. 2022;107(1):2022–19225. [Google Scholar]
  • 26.Rahaman MM, et al. DeepCervix: A deep learning-based framework for the classification of cervical cells using hybrid deep feature fusion techniques. Comput Biol Med. 2021;136:104649. doi: 10.1016/j.compbiomed.2021.104649. [DOI] [PubMed] [Google Scholar]
  • 27.Tyagi K, Vats S, Vashisht V. Implementing Inception v3, VGG-16 and VGG-19 Architectures of CNN for Medicinal Plant leaves Identification and Disease Detection. J Electr Syst. 2024;20(7s):2380–2388. doi: 10.52783/jes.3989. [DOI] [Google Scholar]
  • 28.Ingle Y, Shaikh N. Skin Cancer Recognition Using CNN, VGG16 and VGG19. ICT for Intelligent Systems. 2023:131–144. doi: 10.1007/978-981-99-3982-4_11. [DOI] [Google Scholar]
  • 29.Koonce B. ResNet 50. Convolut Neural Networks. 2021:63–72. doi: 10.1007/978-1-4842-6168-2_6. [DOI] [Google Scholar]
  • 30.Nalini N. Classification and segmentation of brain MRI images using deep learning. 2021 2nd Global Conference for Advancement in Technology (GCAT), IEEE; 2021; pp. 1–7. [Google Scholar]
  • 31.Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence; 2017. [Google Scholar]
  • 32.Pacal I, Kılıcarslan S. Deep learning-based approaches for robust classification of cervical cancer. Neur Comp Appl. 2023;35(25):18813–18828. doi: 10.1007/s00521-023-08757-w. [DOI] [Google Scholar]
  • 33.Chen W, Shen W, Gao L, et al. Hybrid Loss-Constrained Lightweight Convolutional Neural Networks for Cervical Cell Classification. Sensors (Basel) 2022;22(9) doi: 10.3390/s22093272. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Sarwar A, Sharma V, Gupta R. Hybrid ensemble learning technique for screening of cervical cancer using Papanicolaou smear image analysis. Person Med Universe. 2015;4:54–62. doi: 10.1016/j.pmu.2014.10.001. [DOI] [Google Scholar]
  • 35.Wieslander H, et al. Deep convolutional neural networks for detecting cellular changes due to malignancy. Proceedings of the IEEE international conference on computer vision workshops; 2017; pp. 82–89. [Google Scholar]
  • 36.Nanni L, Ghidoni S, Brahnam S. Handcrafted vs. non-handcrafted features for computer vision classification. Pattern Recogn. 2017;71:158–172. doi: 10.1016/j.patcog.2017.05.025. [DOI] [Google Scholar]
  • 37.Lin H, Hu Y, Chen S, et al. Fine-Grained Classification of Cervical Cells Using Morphological and Appearance Based Convolutional Neural Networks. IEEE Access. 2019;7:71541–71549. doi: 10.1109/access.2019.2919390. [DOI] [Google Scholar]
  • 38.Promworn Y, Pattanasak S, Pintavirooj C, et al. Comparisons of pap smear classification with deep learning models. 2019 IEEE 14th International Conference on Nano/Micro Engineered and Molecular Systems (NEMS), IEEE; 2019; pp. 282–285. [Google Scholar]
  • 39.Bora K, Chowdhury M, Mahanta LB, et al. Pap smear image classification using convolutional neural network. Proceedings of the tenth Indian conference on computer vision, graphics and image processing; 2016; pp. 1–8. [Google Scholar]
  • 40.Singh S, Tejaswini V, Murthy R, et al. Neural Network Based Automated System for Diagnosis of Cervical Cancer. Int J Biomed Clin Eng. 2015;4(2):26–39. doi: 10.4018/ijbce.2015070103. [DOI] [Google Scholar]
  • 41.Diniz D, Rezende M, Bianchi A, et al. A Hierarchical Feature-Based Methodology to Perform Cervical Cancer Classification. Appl Sci. 2021;11(9):4091. doi: 10.3390/app11094091. [DOI] [Google Scholar]
  • 42.Ravindran K, Rajkumar S, Muthuvel K. An Investigation on Cervical Cancer with Image Processing and Hybrid Classification. Int J Performab Eng. 2021;17(11):918. doi: 10.23940/ijpe.21.11.p2.918925. [DOI] [Google Scholar]
