Scientific Reports. 2025 Mar 11;15:8427. doi: 10.1038/s41598-025-91720-7

A two-stage blood cell detection and classification algorithm based on improved YOLOv7 and EfficientNetv2

XinZheng Wang1, GuangJian Pan1, ZhiGang Hu1, AoRu Ge1
PMCID: PMC11897148  PMID: 40069243

Abstract

Current diagnoses of leukemia are typically performed manually by physicians on the basis of blood cell morphology, leading to challenges such as excessive workload, limited efficiency, and subjective outcomes. To solve the above problems, a two-stage detection method was developed for the automatic detection and identification of blood cells. First, for the blood cell detection task, an improved YOLOv7 blood cell detection model was proposed that integrates multihead attention and the SCYLLA-IoU (SIoU) loss function to accurately locate and classify white blood cells (WBCs), red blood cells (RBCs), and platelets in a full-field image of blood cells. For the white blood cell identification task of detecting network positioning, an improved EfficientNetv2 classification model was subsequently developed, which integrates the atrous spatial pyramid pooling (ASPP) module to increase classification accuracy and employs the balanced cross-entropy (BCE) function to address sample number imbalance. The experiments utilized four publicly accessible datasets: BCCD, LDWBC, LISC, and Raabin. The proposed detection model achieved an average accuracy of 94.7% in detecting and identifying blood cells in the BCCD dataset. With an IoU equal to 0.5, the model attained a mean average precision (mAP) of 97.17%. In the white blood cell classification task, an average precision (AP) of 95.12% and an average recall (AR) of 97% were achieved on the LDWBC, LISC, and Raabin datasets. The experimental results demonstrate that the proposed two-stage detection method detects and identifies blood cells accurately, thereby facilitating automatic detection, classification, and quantification of blood cell images, which can aid doctors in preliminary leukemia diagnosis.

Keywords: YOLOv7, EfficientNetv2, SIoU loss function, Multihead attention, ASPP, BCE

Subject terms: Biomedical engineering, Image processing

Introduction

WBCs are produced in the bone marrow and lymphoid tissues of the body’s immune system and serve as crucial defense mechanisms against pathogenic microorganisms, including bacteria, viruses, and fungi1–3. WBCs are traditionally split into two main categories: granulocytes and agranulocytes4,5. Granulocytes include neutrophils (50–70%), eosinophils (1–5%), and basophils (0–1%), whereas agranulocytes include lymphocytes (20–45%) and monocytes (2–10%)4–6. A deviation from the reference value of the white blood cell count can indicate the presence of various diseases7,8. According to existing research, between 1990 and 2017, the number of newly diagnosed leukemia cases worldwide increased from 354,500 to 518,500, and the number of cases of acute lymphoblastic leukemia (ALL) increased from 49,100 to 64,200 worldwide9. Moreover, the reported incidence of chronic lymphocytic leukemia has increased more than threefold. Leukemia poses a significant threat to human health, with the number of patients increasing annually10. As a result, timely detection and treatment of leukemia are particularly important.

The bone marrow is an important organ in the human body that regulates the production of WBCs, RBCs, and platelets. Accurate detection and identification of bone marrow blood cells are crucial for leukemia diagnosis and treatment. Traditional blood cell analysis requires skilled technicians to perform manual microscopy, which is a time-consuming, labor-intensive procedure prone to human error. In recent years, owing to the rapid advancement of computer technology, digital representations of blood smears can be obtained quickly. As a result, numerous computer-aided techniques utilizing cell morphology and characteristics have been developed for detecting and recognizing blood cells. For example, Hu et al.11 presented a novel method known as pixel-level balancing, which employs the pixel count within the detection box to accurately represent the size of the detected object. The training loss for targets of different sizes is dynamically adjusted via weights, thereby enhancing the accuracy of small target detection. To better obtain the characteristics of cell images, Zhai et al.12 proposed the following method. First, a threshold segmentation technique is applied directly to the raw whole slide images to segment all regions in the image that may represent WBCs. Then, WBC classification is performed using the segmented regions as input into a deep aggregation convolutional neural network.

Unlike traditional cell morphology recognition, deep learning technology allows networks to autonomously learn features from images, enabling simultaneous detection and classification. For example, Chen et al.13 proposed an attention-based dual-branch network, TransMixNet, to establish a classification model for WBC recognition. In the model-building phase, they utilized transfer learning and data augmentation to address sample insufficiency and imbalance issues, and employed a hybrid strategy to model domain relationships among various training samples, thereby enhancing generalization performance. To better capture the distinguishing features between different cells, Liu et al.14 proposed a hybrid model that combines pretrained ResNet and DenseNet architectures via a spatial and channel attention module. This innovative approach was used to analyze the proximity among diverse training samples from different categories, and modeling these relationships improved the model’s generalizability. Finally, occlusion testing with the Grad-CAM algorithm was used to enhance model interpretability. In addition, Ahmad et al.15 utilized transfer learning on pretrained DenseNet201 and Darknet53 networks to extract optimal deep features from enhanced and segmented WBC images. The entropy-controlled marine predator algorithm is then used to filter the serially fused feature vectors, and the resulting reduced feature vectors are subsequently classified through multiple baseline classifiers with different kernel configurations.

In recent years, several studies have focused on the diagnosis of leukemia-related diseases through the detection and classification of blood cell morphology. Li et al.16 evaluated six blood cell detection models by establishing a comprehensive dataset consisting of multileukocyte images and considering various interfering factors commonly encountered in clinical applications, with the primary objective of helping medical professionals augment their diagnostic capabilities. Yan et al.17 introduced a segmentation-based enhanced residual network that effectively discriminates acute promyelocytic leukemia from other subtypes of acute myeloid leukemia via morphological analysis of peripheral blood cells. Asar et al.18 proposed a new technique, the falcon optimization algorithm, based on deep convolutional neural networks, to detect and classify leukemia accurately. Jawahar19 introduced a deep dilated residual convolutional neural network model with an attention mechanism to provide pathologists and oncologists with an accurate tool for classifying blood cells in the clinical diagnosis of ALL. Khan20 also incorporated dual attention mechanisms and proposed a deep convolutional generative adversarial network for the detection and classification of WBCs, aiming to enhance diagnostic capabilities and advance real-world health care practices and diagnostic systems.

The majority of current detection and classification studies primarily focus on categorizing blood cells into RBCs, WBCs, and platelets, or on dividing WBCs into five distinct types. For example, Zhang et al.21 used multiscale feature fusion to analyze these three cell types, whereas Cheuque et al.22 applied a multilevel convolutional neural network for the secondary classification of leukocytes. Expanding the scope of detection and classification to include more peripheral blood cell types can provide more comprehensive diagnostic information, enhancing clinical decision-making support, especially in diagnosing blood disorders such as anemia, infection, and leukemia. To address the complexity and optimization challenges associated with single-model multiclass detection, mitigate limitations due to dataset sample adequacy and category imbalance, and improve detection efficiency and classification accuracy, a multi-stage detection and classification strategy can be adopted. This approach decomposes complex tasks into simpler subtasks, thereby narrowing the target range and enhancing classification accuracy, while also providing greater model flexibility and scalability. In this context, inspired by previous research on cell images, a two-stage blood cell detection and recognition network based on YOLOv7 and EfficientNetv2 was proposed to locate, identify, classify, and count seven common types of cells in blood images. The specific contributions are outlined as follows:

  1. An improved YOLOv7 detection network that incorporates a module with a multihead attention mechanism was proposed for the cell detection task. This module facilitates learning from dynamically selected nuclear areas in WBC images and motivates the model to acquire discriminative information from WBC images, thereby addressing the sample similarity issue. Additionally, the SIoU loss function was introduced to increase the convergence speed and improve both the border positioning accuracy and the cell classification accuracy.

  2. The classification performance of EfficientNetv2 was enhanced by incorporating ASPP and employing atrous convolutions with different sampling rates. This augmentation strategy expands the network’s receptive field, enabling it to capture and integrate multiscale feature information, ultimately leading to improved identification of WBCs on the basis of the cell detection in the previous stage. Additionally, global average pooling (GAP) was implemented to integrate cell image-level features, thereby enhancing the representation capability of global information. In addition, 1 × 1 convolutional operations were employed to mitigate potential degradation issues arising from high-sampling-rate atrous convolutions, thereby ensuring comprehensive preservation of details and further enhancing the accuracy of WBC classification.

  3. Considering the unbalanced samples in the dataset, the number of data samples was first artificially increased through image processing operations such as translation, rotation, and shearing. In addition, during the training process, a BCE loss function was used to dynamically weight the samples of each category: assigning greater weights to classes with fewer samples addressed the sample imbalance issue.

Related work

Machine learning (ML) and deep learning (DL), as prominent computer image processing technologies, have been extensively applied in the medical field, particularly in medical image-assisted diagnostics. For example, Alhashmi et al.23 developed a survival analysis model for thyroid cancer patients via surveillance, epidemiology, and end results data and machine learning algorithms. By identifying seventeen key factors and applying decision trees, random forests, and an SVM, the model achieved accuracy rates between 97 and 99% and outperformed previous methods in terms of the F1 score, precision, recall, and AUC. Tasci et al.24 proposed an automated schizophrenia detection framework utilizing blood sample scattergram images, employing a feature engineering model based on the local binary pattern technique combined with shallow classifiers. The proposed methodology was evaluated on two distinct scattergram image datasets, achieving classification accuracies of 89.29% and 90.58%, respectively. These experimental results demonstrate the potential of this approach as a promising biomarker for schizophrenia detection.

Compared with traditional machine learning, deep learning models can directly learn features from raw data, avoiding the cumbersome process of manual feature extraction. In recent years, deep learning has made significant advancements in medical image detection, classification, and recognition. Its applications span biometric recognition, imaging-assisted diagnostics, and microscopy image detection. For example, Banerjee et al.25 proposed an iris segmentation method based on the V-Net architecture that accurately locates the iris boundaries through semantic segmentation and uses the YCrCb and HSV color space techniques to recover the iris boundaries in the images. Ismael et al.26 explored multiresolution techniques for COVID-19 detection in chest X-rays via shear transforms for image decomposition and extreme learning machines for classification, achieving an accuracy of 99.29%. They also proposed a deep learning method27 using pretrained CNNs for feature extraction and an SVM for classification, along with developing a new CNN model. Gomes et al.28 introduced the IKONOS app, an intelligent system for diagnosing COVID-19 and viral and bacterial pneumonia via chest X-rays, by applying the Haralick and Zernike moments for feature extraction and classic classifiers for diagnosis. Faruqui et al.29 developed HAAS, a cloud-based model for lung cancer diagnosis that combines the HAASNet CNN classifier with IoMT devices; it achieved 96.07% accuracy and an F1 score of 94.81% on CT scans, enhancing diagnostic efficiency and accessibility. For bacterial strain identification, Trivedi et al.30 introduced an automatic bacterial strain classification method using a CNN, which achieved 95.12% accuracy and 96.70% recall. This method improves efficiency and addresses the problem of insufficient training images through innovative image augmentation techniques.

Deep learning detection methods for peripheral blood cell morphology were initially investigated to train only on regions adjacent to a single pixel, generating a separate output for each trained pixel. As multiple target detection networks continue to emerge, researchers have trained network models via an end-to-end approach to learn and obtain outputs for all pixels simultaneously. Currently, mainstream target detection algorithms are broadly categorized into two types: two-stage and single-stage. Two-stage algorithms often suffer from low detection speeds. In contrast, single-stage algorithms offer faster detection, bypassing the candidate region processing step and directly localizing and classifying objects within the input image. Therefore, most of the target detection algorithms used in blood cell detection tasks are single-stage detection algorithms that do not require candidate regions. Wu et al.31 proposed the SW-YOLO algorithm based on YOLOv5s, which integrates the Swin transformer to enhance feature extraction, removes the large object detection layer to improve accuracy for smaller targets, and incorporates depthwise separable convolutions during feature fusion. Additionally, the EIOU loss function was used to better address the imbalance between positive and negative samples. Cong et al.32 used the improved ISE-YOLO algorithm based on YOLOv3 to detect three types of blood cells. ISE-YOLO first combines GAP with global maximum pooling to extract the feature maps and incorporates an improved squeeze-and-excitation (SE) structure. The designed SE module is then introduced into the YOLOv3 algorithm to strengthen the network’s ability to improve the detection performance for more accurate blood cell detection.

The attention module can help the model ignore irrelevant information and focus on representing key information during the detection process, learn the weight distribution from features in different layers, and combine the learned weights to improve feature performance. Researchers have introduced the attention mechanism into blood cell detection models, and in 2023, Shi et al.33 proposed a lightweight model named GPMB-YOLO for detecting three types of blood cells: WBCs, RBCs, and platelets. The model is based on the YOLOv8n framework, employing the PGhostC2f design to reduce model complexity and incorporating the SimAM attention mechanism to enhance feature extraction. Additionally, a multidimensional attention-enhanced bidirectional feature pyramid network was designed to optimize multiscale feature fusion, with hyperparameter optimization performed via genetic algorithms to further improve detection accuracy. Wang et al.34 improved YOLO by adding a multiscale residual module to enhance feature use and an attention mechanism to better combine features. They also addressed blood cell sample imbalance by weighting positive and negative samples differently.

Deep learning has demonstrated remarkable processing capabilities in the white blood cell classification task. It is capable of simultaneously accomplishing both feature extraction and classification tasks, thereby simplifying the recognition process of WBCs. This not only helps to optimize the classification performance but also enhances the overall efficiency, providing a valuable reference for the diagnosis of leukemia. In leukemia classification, Attallah et al.35 proposed a CAD system that combines extended wavelet pooling with CNNs and statistical texture features for detecting ALL and its subtypes. The method extends multiple CNN models by incorporating a wavelet pooling module and integrates texture features extracted via BP and the GLCM. The aim is to enhance the classification performance of ALL and its subtypes by merging deep features with handcrafted features. Talaat et al.36 proposed a novel classification model based on an optimized convolutional neural network (OCNN) for detecting and classifying leukemia microscopy images. The method includes image preprocessing, feature extraction, and classification and uses fuzzy optimization to adjust the CNN hyperparameters. Luo et al.37 proposed a white blood cell classification network, ResNeXt-CC, based on cross-layer deep feature fusion, exploiting the fact that cell classification relies mainly on local feature differentiation. It achieved an accuracy of 89.93% on the C-NMC 2019 dataset, higher than that of existing classification methods, and effectively alleviated the category imbalance problem.

In white blood cell classification, Mohamad38 used Google ViT and a transfer learning approach with an ImageNet CNN to classify four types of WBCs on the PBC and BCCD peripheral blood image datasets. This study focused on the effects of data volume and dataset noise on the classification method, emphasizing the effectiveness of Google ViT in the presence of insufficient or unclean data. Rao et al.39 used the MobileNetV3-ShuffleNetV2 model for global and local feature extraction to classify WBC images into five classes, with excellent classification performance on the BCCD and Raabin datasets. Erten et al.40 used the ConcatNeXt model based on the CNN architecture to classify eight classes of blood cells, achieving 97.77% accuracy, which was further increased to 98.73%; gradient-weighted class activation maps (Grad-CAM) were also used to validate the model’s ability to focus on key regions of the image.

The above studies demonstrate the robust processing capabilities and extensive development potential of deep learning in the detection and classification of peripheral blood cells. Despite significant advancements, the implementation of detection and classification methods, as well as result reliability, is impeded by concerns regarding patient privacy protection, high labeling costs due to cell similarity, inadequate or imbalanced sample sizes, and other issues. The ongoing research endeavors to fully harness the potential of existing public datasets by employing a multi-stage strategy to advance blood cell detection and classification methodologies. Compared to conventional detection or classification algorithms, the proposed multi-stage network demonstrates significant improvements in accuracy, efficiency, and robustness. The network is specifically designed for blood cell detection, coarse classification, and white blood cell subclassification tasks. It integrates the YOLOv7 object detection algorithm, renowned for its superior detection performance, with the EfficientNetv2 image classification network, which is widely recognized for its high classification accuracy. Through the construction and optimization of relevant datasets, along with the meticulous implementation of advanced techniques—including multihead attention mechanisms, the SIoU loss function, the ASPP module, and the BCE loss function—the proposed network achieves precise identification and localization of target regions in blood cell images. This approach not only enhances the accuracy of white blood cell categorization but also significantly reduces false positives and false negatives. This work aims to provide robust technical support for addressing practical issues such as false detection and sample imbalance in hematological analysis.

Methods

The diversity and similarity among different subtypes of WBCs pose a challenge in predicting their types during target detection. Therefore, in this paper, a two-stage blood cell detection and classification algorithm based on YOLOv7 and EfficientNetv2 is employed (Fig. 1). In the first stage, the YOLOv7 framework is improved by integrating the multihead attention mechanism and the SIoU loss function, which collectively improve the accuracy of detecting and classifying RBCs, WBCs, and platelets. In the second stage, the classification task involves subdividing the subtypes of WBCs to increase the accuracy of anchor frame prediction throughout the entire process. Leveraging the EfficientNetv2 architecture integrated with the ASPP module, this algorithm is able to extract WBC features more effectively. Furthermore, the BCE loss function is employed to boost the classification accuracy for the various WBC types. The overall detection and recognition flow is as follows. First, cells in the blood images are detected as RBCs, WBCs, and platelets. Then, on the basis of the first-stage cell detection, the WBC classification network further classifies WBCs into basophils, eosinophils, lymphocytes, monocytes, and neutrophils. Finally, cell detection and WBC classification are combined to perform the seven-class classification and recognition tasks in the blood images. The specific implementation of the algorithm is detailed in the following subsections.

Fig. 1. Overall framework of the two-stage blood cell detection and classification model.
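To make the pipeline in Fig. 1 concrete, the following is a minimal sketch of the two-stage inference flow under stated assumptions: the `detector` and `classifier` objects and their `predict` methods are hypothetical stand-ins for the improved YOLOv7 and improved EfficientNetv2 models, not the authors' released code.

```python
# Minimal sketch of the two-stage inference flow in Fig. 1. The `detector`
# and `classifier` objects are hypothetical placeholders for the improved
# YOLOv7 and improved EfficientNetv2 models.
from collections import Counter

COARSE = {0: "RBC", 1: "WBC", 2: "Platelets"}
WBC_SUBTYPES = {0: "basophil", 1: "eosinophil", 2: "lymphocyte",
                3: "monocyte", 4: "neutrophil"}

def two_stage_predict(image, detector, classifier):
    """Stage 1 locates and coarsely classifies cells; stage 2 refines WBCs."""
    results, counts = [], Counter()
    for box, cls_id, score in detector.predict(image):   # stage 1
        label = COARSE[cls_id]
        if label == "WBC":                                # stage 2: crop + classify
            x1, y1, x2, y2 = map(int, box)
            crop = image[y1:y2, x1:x2]                    # image as a numpy array
            label = WBC_SUBTYPES[classifier.predict(crop)]
        results.append((box, label, score))
        counts[label] += 1                                # per-type cell counting
    return results, counts
```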

Improved YOLOv7 detection network

To realize the one-stage cell detection task while ensuring accurate, real-time detection, YOLOv741 was utilized as the basic network and improved. As shown in Fig. 2, the improved YOLOv7 network architecture includes three parts: the backbone, the neck, and the head. The backbone extracts image features, whereas the neck processes and integrates them. The head subsequently performs classification and localization on the feature information generated by the neck, producing outputs such as classification probabilities, confidence scores, and bounding box information. The network finally outputs prediction results for three targets of different sizes (RBCs, WBCs, and platelets) and then uses the NMS strategy to process the predicted borders and output the final predictions. Considering that the cell dataset used in this study contains many objects and that the similarity between various cell types is high, the network was improved on the basis of the YOLOv7 structure. Specifically, in the design of the backbone network, the efficient layer aggregation network (ELAN) module was introduced, which includes two variants: ELAN-H and ELAN. Through operations such as expansion and merging, more blocks were stacked to extract more feature information. Additionally, a multihead attention mechanism module was added after the ELAN-H module, as shown in the ATT module in Fig. 2, to improve the perception of local cell features and thus the accuracy of blood cell classification. The SIoU loss function was also integrated to increase both the positioning accuracy and the model’s generalization ability. The SIoU loss function addresses the gradient instability that can occur in CIoU when bounding box overlap is minimal by smoothing the IoU calculation. This improvement facilitates superior model convergence during training, especially for small targets or boxes with minimal overlap. In computing the bounding box loss, the SIoU loss function not only considers the IoU value but also integrates factors such as the distance between the centers of the bounding boxes and their aspect ratios, resulting in more precise target position regression.

Fig. 2. Improved YOLOv7 network architecture integrating multihead attention.
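As noted above, the raw predictions are post-processed with NMS before output. The sketch below illustrates per-class NMS using torchvision's built-in operator; the 0.25 confidence and 0.45 IoU thresholds are illustrative assumptions, not values reported in the paper.

```python
# Per-class NMS over raw detections using torchvision's built-in op.
# Thresholds (conf_thr=0.25, iou_thr=0.45) are illustrative assumptions.
import torch
from torchvision.ops import nms

def filter_detections(boxes, scores, class_ids, conf_thr=0.25, iou_thr=0.45):
    """boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,); class_ids: (N,)."""
    mask = scores > conf_thr                      # drop low-confidence boxes
    boxes, scores, class_ids = boxes[mask], scores[mask], class_ids[mask]
    keep_all = []
    for c in class_ids.unique():                  # suppress within each class
        idx = (class_ids == c).nonzero(as_tuple=True)[0]
        kept = nms(boxes[idx], scores[idx], iou_thr)
        keep_all.append(idx[kept])                # map back to global indices
    keep = torch.cat(keep_all) if keep_all else torch.empty(0, dtype=torch.long)
    return boxes[keep], scores[keep], class_ids[keep]
```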

Multihead attention

Since YOLOv7 may miss detections when detecting and identifying blood smear images, a multihead attention mechanism42 was incorporated into the YOLOv7 network. Compared with other attention mechanisms, as shown in Table 1, multihead attention processes multiple attention heads in parallel. This allows it to learn rich hierarchical association features across long sequences and capture global information effectively. It focuses on information critical to the current task, thereby capturing more diverse feature information and laying a solid foundation for enhancing model performance. An analysis of blood cell detection tasks and cellular morphological characteristics revealed that blood cells are often unevenly distributed in blood images and that RBCs may accumulate in large numbers. Therefore, integrating a multihead attention mechanism can effectively capture multiscale dependencies: each attention head can learn different representations from the input data, which helps capture multiscale dependencies within the input. Additionally, by extracting information from multiple perspectives, the model can learn richer and more diverse feature representations, thereby enhancing its overall expressive power. Furthermore, ELAN-H uses operations such as expand, shuffle, and merge cardinality to enable the network to stack more blocks and extract more feature information. Finally, the fused attention module processes the output of ELAN-H, improving the accuracy of the network’s predictions. This modification enables the model to focus on different segments of the input simultaneously, thereby improving its ability to capture diverse cells. The dependencies and correlations between different features can thus be used to detect and identify cell images more accurately. Its structure is shown in Fig. 3, and its principle is given in Eqs. (1) and (2).

Table 1. Comparison of attention modules for the improved YOLOv7 detection network.

Module | Application position | Key features | Advantages
CBAM | Backbone feature extraction layer | Combines channel and spatial attention; lightweight design | Improves channel and spatial feature selection
SE | Backbone feature extraction layer | Channel attention mechanism; simple and efficient | Enhances channel-wise feature correlation
Multihead attention | Backbone feature extraction layer | Parallel computation across multiple channels; captures multi-scale information | Enhances feature representation; improves adaptability to complex scenarios

Fig. 3. Schematic of the multihead attention mechanism.

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V \qquad (1)$$

$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_{1},\ldots,\mathrm{head}_{h})W^{O},\quad \mathrm{head}_{i}=\mathrm{Attention}(QW_{i}^{Q},KW_{i}^{K},VW_{i}^{V}) \qquad (2)$$

The essence of the scaled dot-product attention mechanism, as described in Eq. (1), lies in abstracting the input two-dimensional feature map sequence data into a query matrix Q, key matrix K, and value matrix V. These matrices are derived from the original input X through linear transformations governed by three weight matrices $W^{Q}$, $W^{K}$, and $W^{V}$ ($Q = XW^{Q}$, $K = XW^{K}$, and $V = XW^{V}$). Multihead attention, as outlined in Eq. (2), allows the model to attend to diverse information from different representation subspaces at different positions. This equation describes the process by which multihead attention divides the input sequence into h heads. Following the computation of each head, they are concatenated along the channel dimension and transformed for the attention calculation. Equations (1) and (2) indicate that different heads can learn different attention weights and develop the ability to concentrate on different local areas in the image, thereby improving the ability to perceive the local features of the cells and thus capture the different aspects of the input data. Moreover, each attention head generates an attention weight distribution, and these distributions are then multiplied with the input data to produce a weighted representation. As shown in Fig. 3, the outputs of these heads are usually concatenated and then integrated through a linear transformation to generate the final multihead attention output. The addition of a multihead attention mechanism enables the capture of spatial relationships among cells, thereby facilitating a deeper understanding of their interactions and connections. This advancement has led to improved analysis and detection of cell images, ultimately increasing the accuracy of identification and classification.
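For reference, Eqs. (1) and (2) can be transcribed directly into code. The module below is a straightforward PyTorch rendering of scaled dot-product attention and its multihead extension (functionally equivalent to torch.nn.MultiheadAttention); it mirrors the formulas rather than the authors' exact ATT module.

```python
# Direct transcription of Eqs. (1)-(2); written out to mirror the formulas,
# not the authors' exact ATT implementation.
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, h):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        self.w_q = nn.Linear(d_model, d_model)    # W^Q
        self.w_k = nn.Linear(d_model, d_model)    # W^K
        self.w_v = nn.Linear(d_model, d_model)    # W^V
        self.w_o = nn.Linear(d_model, d_model)    # W^O

    def forward(self, x):                         # x: (batch, seq, d_model)
        b, n, _ = x.shape
        # Project input and split into h heads: (batch, h, seq, d_k)
        q, k, v = (w(x).view(b, n, self.h, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        # Eq. (1): softmax(QK^T / sqrt(d_k)) V
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        out = attn @ v
        # Eq. (2): concatenate heads along channels, then linear transform
        out = out.transpose(1, 2).reshape(b, n, self.h * self.d_k)
        return self.w_o(out)
```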

SIoU loss function

Despite the use of the nonmaximum suppression strategy, significant errors persist in the final prediction frames generated by the cell detection network. The SIoU loss function43, which incorporates the IoU of the predicted and ground truth boxes, also introduces a penalty term associated with shape, thereby enabling a more comprehensive evaluation of the disparities between the predicted and ground truth boxes. This helps the model acquire a better understanding of the shape characteristics of target boxes and mitigates the aforementioned errors. As shown in Table 2, compared with the GIoU and CIoU loss functions, which focus on improving IoU-based regression by penalizing nonoverlapping or center-misaligned boxes, the SIoU loss function further enhances bounding box regression by incorporating angular and shape alignment optimizations, while avoiding the slow training convergence problem caused by angle mismatch. The SIoU loss function simultaneously considers the proportion and angle differences between target boxes, thereby enhancing the accuracy of target box positioning. Therefore, the border loss function was optimized, and the SIoU loss function was employed as a new loss criterion in this study.

Table 2. Comparison of loss functions for the improved YOLOv7 detection network.

Loss function | Key features | Advantages | Limitations
GIoU loss | Extends IoU by considering the smallest enclosing box; penalizes boxes with poor overlap | Better than IoU for handling non-overlapping boxes | Suboptimal when aspect ratio differences are significant
CIoU loss | Adds both center distance and aspect ratio penalty terms; considers overlapping area, center distance, and aspect ratio | Balances localization, scale, and aspect ratio; better for handling complex shapes than GIoU and DIoU | Higher computational complexity than GIoU and DIoU
SIoU loss | Incorporates shape alignment for improved regression; considers angular distance and IoU; optimizes both localization and aspect ratios | Enhances bounding box regression accuracy; better alignment with object shapes | Higher computational cost than basic IoU loss

The SIoU loss function combines four cost functions, namely, the shape cost, distance cost, angle cost, and IoU cost, as illustrated in Eq. (3):

$$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2} \qquad (3)$$

The IoU measures the degree of overlap between the predicted box $B$ and the ground truth box $B^{gt}$, as shown in Fig. 4a, and its value is $IoU = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}$. Here, $\Omega$ represents the shape cost, $\Delta$ represents the distance cost, and $\Lambda$ represents the angle cost. The formulas for each cost are as follows:

$$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_{t}}\right)^{\theta}, \quad \omega_{w} = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad \omega_{h} = \frac{|h - h^{gt}|}{\max(h, h^{gt})} \qquad (4)$$

$$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma\rho_{t}}\right), \quad \rho_{x} = \left(\frac{b_{c_{x}}^{gt} - b_{c_{x}}}{c_{w}}\right)^{2}, \quad \rho_{y} = \left(\frac{b_{c_{y}}^{gt} - b_{c_{y}}}{c_{h}}\right)^{2} \qquad (5)$$

Fig. 4. Supplementary illustrative figure for the SIoU loss function, illustrating (a) the IoU cost, (b) distance cost, and (c) angle cost from Eqs. (3), (5), and (6), respectively.

with $\gamma = 2 - \Lambda$, where

$$\Lambda = 1 - 2\sin^{2}\left(\arcsin(\sin\alpha) - \frac{\pi}{4}\right) \qquad (6)$$

In the shape cost Ω formula, $w$ and $w^{gt}$ represent the widths of the predicted box and actual box, respectively, whereas $h$ and $h^{gt}$ represent the heights of the predicted box and actual box, respectively. $\theta$ is a factor that determines the degree of attention paid to the shape cost, serving as an adjustment factor for constraining the shape. In the distance cost Δ formula, $c_{w}$ and $c_{h}$ represent the width and height of the minimum enclosing rectangle of the real and predicted boxes, respectively, as shown in Fig. 4b. The coordinates $(b_{c_{x}}^{gt}, b_{c_{y}}^{gt})$ represent the center point of the ground truth box, whereas $(b_{c_{x}}, b_{c_{y}})$ denotes the center point of the predicted box. According to the angle cost formula, during the convergence process in Fig. 4c, the angle α is minimized first when α ≤ π/4; otherwise, β = π/2 − α is minimized. Introducing directionality through the angle cost enables faster convergence during the training phase and better performance in inference. The experimental findings demonstrate that the addition of the SIoU loss function accelerates model convergence, resulting in a higher bounding box prediction IoU and more accurate cell detection results.
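A sketch of the SIoU box loss assembled from Eqs. (3) to (6) is given below, assuming boxes in (cx, cy, w, h) format and using θ = 4 as a typical shape-attention factor; it follows the published SIoU formulation rather than the authors' exact training code.

```python
# Sketch of the SIoU bounding-box loss per Eqs. (3)-(6). Boxes are
# (cx, cy, w, h) tensors; theta=4.0 is a typical (assumed) shape factor.
import torch

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    px, py, pw, ph = pred.unbind(-1)
    gx, gy, gw, gh = gt.unbind(-1)
    # IoU term
    ix = (torch.min(px + pw / 2, gx + gw / 2) -
          torch.max(px - pw / 2, gx - gw / 2)).clamp(0)
    iy = (torch.min(py + ph / 2, gy + gh / 2) -
          torch.max(py - ph / 2, gy - gh / 2)).clamp(0)
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter + eps)
    # Width/height of the minimum enclosing rectangle (c_w, c_h)
    cw = torch.max(px + pw / 2, gx + gw / 2) - torch.min(px - pw / 2, gx - gw / 2)
    ch = torch.max(py + ph / 2, gy + gh / 2) - torch.min(py - ph / 2, gy - gh / 2)
    # Angle cost, Eq. (6)
    sigma = torch.sqrt((gx - px) ** 2 + (gy - py) ** 2) + eps
    sin_alpha = torch.abs(gy - py) / sigma
    lam = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - torch.pi / 4) ** 2
    # Distance cost, Eq. (5), with gamma = 2 - Lambda
    gamma = 2 - lam
    rho_x, rho_y = ((gx - px) / cw) ** 2, ((gy - py) / ch) ** 2
    delta = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))
    # Shape cost, Eq. (4)
    ww = torch.abs(pw - gw) / torch.max(pw, gw)
    wh = torch.abs(ph - gh) / torch.max(ph, gh)
    omega = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta
    # Eq. (3)
    return 1 - iou + (delta + omega) / 2
```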

Improved EfficientNetv2 classification network

By building upon the initial classification of RBCs, WBCs, and platelets achieved in the first detection stage, EfficientNetv244, characterized by efficient parameter utilization, fast training speed, and robust feature extraction ability, was utilized as the foundational classification network for the detailed classification of WBCs. The overall improved EfficientNetv2 classification network structure is shown in Fig. 5. On the basis of EfficientNetv1, Fused-MBConv was introduced into the search space, combining the strengths of MBConv and traditional convolution to enhance EfficientNetv2’s prediction accuracy. Additionally, adaptive learning was introduced for progressive learning. An ASPP module was added to Block 6, capturing comprehensive multiscale feature information through parallel dilated convolutions of varying scales, thereby enhancing feature extraction for diverse cell types. Furthermore, during the training phase, a BCE loss function was employed to dynamically adjust weights based on the sample count of each category within the dataset, thereby addressing the issue of sample imbalance. Regularization intensity adjustment mechanisms, such as the dynamic adjustment of regularization techniques (e.g., dropout, data augmentation, and mixup), can improve training speed while increasing the accuracy of classifying the various types of WBCs.

Fig. 5. Structure of the improved EfficientNetv2 in this study.

ASPP

The lack of obvious morphological differences between WBC types makes their classification challenging. The ASPP module45 extracts multiscale contextual information by employing atrous convolutions with varying dilation rates, thereby enhancing the model’s ability to handle complex structures and fine details. Specifically, ASPP (Fig. 6) classifies the center pixel (orange) by utilizing parallel dilated convolutions, which process the input feature map at different dilation rates. This design integrates both local and global contextual information without increasing computational cost or sacrificing spatial resolution. The dilation rate determines the sampling interval of the convolutional kernel, thereby controlling the receptive field size. In Fig. 6, the effective fields of view of these parallel filters are depicted in different colors, illustrating ASPP’s multiscale mechanism. By capturing diverse features at multiple scales, the ASPP module robustly adapts to blood cells of various sizes and shapes, significantly improving the accuracy and versatility of white blood cell classification across diverse datasets. Compared with traditional convolutional neural networks, the ASPP module effectively prevents spatial information loss during downsampling operations, resulting in more precise classification of complex structures and improved overall performance. As illustrated in Table 3, unlike other modules, ASPP captures context information at multiple scales while maintaining feature map resolution through parallel dilated convolutions with varying dilation rates. This design not only avoids the tendency of GAP to overlook local details but also mitigates potential “grid effects” from dilated convolutions and prevents the information loss due to reduced spatial resolution seen in the pyramid pooling module (PPM). Furthermore, when integrated into lightweight networks such as EfficientNet, ASPP does not significantly impact inference speed, striking a favorable balance between classification accuracy and efficiency. Therefore, the ASPP module was integrated into the EfficientNetv2 network (Fig. 6) via dilated convolutions with different expansion factors to capture features at multiple scales. This improvement contributes to the enhanced robustness of the WBC classifier, enabling it to handle images of cells with various sizes and shapes, and improves the model’s classification accuracy on WBC datasets.

Fig. 6. Schematic of the ASPP module.

Table 3. Comparison of modules for the improved EfficientNetv2 classification network.

Module | Key features | Advantages | Limitations
GAP | Aggregates spatial features into a single vector; lightweight and efficient | Simple and computationally inexpensive | Overlooks local details, affecting feature richness
PPM | Pools features at different scales to capture global context | Improves feature extraction for sparse data | May introduce “grid effects” in feature maps
Dilated convolutions | Expands receptive field without significantly increasing computation | Simple and computationally efficient | Poor handling of scale variation
ASPP | Combines atrous convolutions at multiple rates; includes GAP for spatial context enhancement | Captures multi-scale context effectively; prevents “grid effects” of dilated convolutions; retains local details overlooked by GAP | Slightly higher computational cost compared to simpler methods
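For concreteness, a minimal ASPP head of the kind described above might look as follows; the dilation rates (6, 12, 18) are illustrative assumptions, and the final 1 × 1 projection implements the fusion and detail-preservation role discussed earlier.

```python
# Minimal ASPP head: parallel 3x3 atrous convolutions at several dilation
# rates plus a GAP branch, fused by a 1x1 convolution. Dilation rates
# (6, 12, 18) are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)] +                      # 1x1 branch
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r)  # atrous branches
             for r in rates])
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1),        # global context
                                 nn.Conv2d(in_ch, out_ch, 1))
        # 1x1 fusion also counters detail loss from high dilation rates
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        feats.append(F.interpolate(self.gap(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```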

BCE

The distribution of sample numbers in the dataset shows that the number of basophil images is significantly smaller than that of the other cell types, and the numbers of the other WBC types are also unevenly distributed. Therefore, when tackling classification problems, selecting an appropriate loss function is essential. The loss function determines how the model adjusts its weights based on the discrepancy between predicted outcomes and true labels. Currently, widely used loss functions include standard cross-entropy loss, BCE loss46, and focal loss, as summarized in Table 4. Compared with the standard cross-entropy loss function, which treats all classes equally but struggles with imbalanced datasets, and the focal loss function, which emphasizes hard-to-classify samples but requires careful hyperparameter tuning, the BCE loss function mitigates class imbalance by assigning class-specific weights, effectively improving predictions for minority classes. It also performs well in scenarios with a relatively uniform distribution of easy and hard samples. Moreover, during optimization, it helps mitigate the vanishing gradient issue and ensures numerical stability. These characteristics enable the BCE loss to effectively tackle the challenge of white blood cell classification: it enhances feature learning for majority classes while improving recognition accuracy for minority classes, thereby boosting overall classification performance. Therefore, the BCE loss function, defined in Eq. (7), was adopted as the training objective in the improved network; it assigns different weights to different categories to solve the unbalanced sample distribution problem.

$$\mathrm{loss}(x, c) = w_{c}\left(-x_{c} + \log\sum_{j=1}^{C} e^{x_{j}}\right) \qquad (7)$$

Table 4. Comparison of loss functions for the improved EfficientNetv2 classification network.

Loss function | Key features | Advantages | Limitations
Standard cross-entropy loss | Calculates logarithmic loss for each class; treats all classes equally | Simple and efficient for balanced datasets | Struggles with imbalanced datasets
Focal loss | Adds a focusing term to penalize easy examples; emphasizes hard-to-classify samples | Effective for highly imbalanced datasets; reduces the impact of overconfident predictions | Requires careful tuning of hyperparameters
BCE loss | Adds class weight adjustment to standard cross-entropy; mitigates class imbalance | Helps in learning rare classes effectively; reduces the impact of dominant classes; improves performance on minority class predictions | Sensitive to weight hyperparameters

In Eq. (7), x is the input, C is the number of classes, and $w_{c}$ is the weight of the category c to which x belongs. The weight vector w is calculated via $w = M / N$, where $N$ is a 1 × C vector composed of the number of samples in each category of the dataset, and $M$ is the median of these sample counts. The calculation formula shows that the smaller the number of samples is, the greater the weight assigned. Therefore, the BCE loss function assigns appropriate weights according to sample quantities to solve the error problem caused by the imbalance in the number of samples during model prediction.
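A minimal sketch of this weighting scheme is shown below, using the LDWBC training counts from Table 5 as example values and PyTorch's weighted cross-entropy as the criterion; the authors' exact weight computation may differ in detail.

```python
# Class weighting w_c = M / N_c (M = median of per-class counts) fed into a
# weighted cross-entropy criterion. Counts are the LDWBC training split
# from Table 5 (basophil, monocyte, eosinophil, neutrophil, lymphocyte).
import torch
import torch.nn as nn

counts = torch.tensor([134., 580., 323., 6281., 6267.])  # per-class samples
weights = counts.median() / counts      # rarer classes get larger weights
criterion = nn.CrossEntropyLoss(weight=weights)

# Usage during training: logits of shape (batch, C), targets of shape (batch,)
# loss = criterion(logits, targets)
```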

Experiments and results

Datasets

For the first-stage target detection task, the BCCD dataset47 was used for analysis. This dataset consists of 364 images, each with a resolution of 640 × 480 pixels, and includes a total of 4,888 labels: 4,155 RBCs, 372 WBCs, and 361 platelets. For the experiments in this study, the BCCD dataset was split into 218 training images, 73 validation images, and 73 testing images.

For the second-stage WBC classification task, three public datasets were utilized as experimental datasets: LDWBC13 (the largest freely available WBC image dataset), LISC48 (WBC images for segmentation and classification), and Raabin49. Table 5 summarizes the three publicly available WBC datasets used in the experiments of this study. Among them, the LDWBC dataset was divided into 13,585 training samples, 4,530 validation samples, and 4,530 testing samples. The LISC dataset was split into 144 training samples, 49 validation samples, and 49 test samples, whereas the Raabin dataset was divided into 8,710 training samples, 2,902 validation samples, and 2,902 test samples.

Table 5. Number of images of the five leukocyte subtypes in the training, test, and validation sets across the three datasets.

Dataset | Type | Basophil | Monocyte | Eosinophil | Neutrophil | Lymphocyte | Total
LDWBC | Train | 134 | 580 | 323 | 6,281 | 6,267 | 13,585
LDWBC | Val | 45 | 194 | 108 | 2,094 | 2,089 | 4,530
LDWBC | Test | 45 | 194 | 108 | 2,094 | 2,089 | 4,530
LDWBC | Total | 224 | 968 | 539 | 10,469 | 10,445 | 22,645
LISC | Train | 31 | 28 | 23 | 30 | 32 | 144
LISC | Val | 11 | 10 | 8 | 10 | 10 | 49
LISC | Test | 11 | 10 | 8 | 10 | 10 | 49
LISC | Total | 53 | 48 | 39 | 50 | 52 | 242
Raabin | Train | 181 | 477 | 640 | 5,335 | 2,077 | 8,710
Raabin | Val | 60 | 159 | 213 | 1,778 | 692 | 2,902
Raabin | Test | 60 | 159 | 213 | 1,778 | 692 | 2,902
Raabin | Total | 301 | 795 | 1,066 | 8,891 | 3,461 | 14,514

Experimental setup

All the experiments were performed on a Windows GPU computing platform equipped with an Intel Core i9 12900KF CPU, 32 GB of RAM, an NVIDIA GTX3060Ti GPU, and an Intel® C612 chipset. The experiments were conducted via the open-source PyTorch framework and its Python interface.

In this study, a five-fold cross-validation method was used for evaluation. The dataset was first divided into five equal parts; each part was alternately used as the test set, while the remaining four parts were used as the training set. The average of the results over the five test sets was used to assess predictive performance for comparison with other commonly used methods. Appropriate hyperparameters were set before network training; the main hyperparameter settings are presented in Table 6. The batch size was chosen primarily according to the available video memory, with a maximum batch size of 32 employed in the experimental setup. The learning rate was determined from the error curve observed during training: when the curve decreased slowly, the learning rate was increased, whereas when significant oscillations were observed, the learning rate was reduced. A learning rate decay strategy was therefore adopted in the experiments: a relatively large learning rate was used during the initial training phase and then dynamically adjusted on the basis of the training results to ensure stable convergence of the model.

Table 6. Hyperparameter configuration of the improved YOLOv7 detection and improved EfficientNetv2 classification models.

Hyperparameter | Improved YOLOv7 | Improved EfficientNetv2
Activation function | LeakyReLU, Sigmoid | SELU
Cost function | FocalLoss-SIoU | BCE
Learning rate | 1 × 10−2 | 1 × 10−2
Momentum | 0.97 | 0.97
Weight decay | 5 × 10−4 | 5 × 10−4
Optimizer | Adam | Adam
Epochs | 100 | 100
Batch size | 32 | 32
Anchor threshold | 4 | –
Box loss weight | 0.05 | –
Classification loss weight | 0.3 | –
Objectness loss weight | 0.7 | –
Training callbacks | Model checkpoint, reduce LR on plateau, early stopping, cosine annealing | Model checkpoint, reduce LR on plateau, early stopping, cosine annealing
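As a concrete illustration of the five-fold protocol described above, the following skeleton uses scikit-learn's KFold; `build_model`, `train_fn`, and `eval_fn` are hypothetical callables supplied by the caller, not functions from the paper.

```python
# Skeleton of the five-fold evaluation protocol. `build_model`, `train_fn`,
# and `eval_fn` are hypothetical placeholders; samples/labels are numpy arrays.
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(samples, labels, build_model, train_fn, eval_fn,
                   n_splits=5, seed=0):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kf.split(samples):
        model = build_model()                           # fresh model per fold
        train_fn(model, samples[train_idx], labels[train_idx])
        scores.append(eval_fn(model, samples[test_idx], labels[test_idx]))
    return float(np.mean(scores))                       # average over the folds
```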

Evaluation indicators

The evaluation criteria utilized for the detection and classification models include accuracy, precision, recall, AP, AR, overall accuracy (OA), average F1-score (AF1), and mAP. The accuracy metric measures the ratio of the number of correctly classified samples to the total number of samples. The precision metric represents the ratio of the number of correctly predicted positive samples to the number of all predicted positive samples, whereas the recall metric represents the ratio of the number of correctly predicted positive samples to the total number of positive samples. The details are illustrated in the following formulas:

$$\mathrm{Accuracy} = \mathrm{OA} = \frac{TP+TN}{TP+TN+FP+FN} \qquad (8)$$

$$\mathrm{Precision} = \frac{TP}{TP+FP} \qquad (9)$$

$$\mathrm{Recall} = \frac{TP}{TP+FN} \qquad (10)$$

$$AP = \int_{0}^{1} P(R)\,dR \qquad (11)$$

$$AR = \frac{1}{C}\sum_{c=1}^{C}\mathrm{Recall}_{c} \qquad (12)$$

$$AF1 = \frac{1}{C}\sum_{c=1}^{C}\frac{2\,\mathrm{Precision}_{c}\,\mathrm{Recall}_{c}}{\mathrm{Precision}_{c}+\mathrm{Recall}_{c}} \qquad (13)$$

$$mAP = \frac{1}{C}\sum_{c=1}^{C}AP_{c} \qquad (14)$$

In the above formulas, true positive (TP) refers to instances where the classifier correctly predicts a sample as positive, matching the sample’s actual positive status. False positive (FP) refers to the classifier incorrectly predicting a positive sample when the actual result is negative, indicating the occurrence of false positives among negative samples. True negative (TN) occurs when the classifier accurately identifies a sample as negative, aligning both the predicted and actual statuses as negative. False negative (FN) occurs when a classifier incorrectly predicts a positive sample as negative, representing the number of missed positive samples. The precision-recall (PR) curve depicts the interplay between precision and recall. The area under this curve, combined with the coordinate axis, is referred to as the AP. The mAP is calculated by averaging the AP values across all categories.
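The per-class metrics and their macro averages can be computed directly from a confusion matrix, as in the sketch below, mirroring Eqs. (8) to (14); AP and mAP additionally require integrating the PR curve over thresholds and are omitted here.

```python
# Per-class precision/recall/F1 and their macro averages from a confusion
# matrix, mirroring Eqs. (8)-(14). AP/mAP (PR-curve integration) omitted.
import numpy as np

def summarize(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                 # predicted as c but actually not c
    fn = cm.sum(axis=1) - tp                 # actually c but predicted otherwise
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {"OA": tp.sum() / cm.sum(),       # overall accuracy
            "AP": precision.mean(),          # macro-averaged precision
            "AR": recall.mean(),             # macro-averaged recall
            "AF1": f1.mean()}                # macro-averaged F1
```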

Experimental results

Comparative results of the improved YOLOv7 detection network

Table 7 presents a control experiment using different target detection methods on the BCCD dataset. The improved YOLOv7 network achieved detection and recognition precisions of 90.5%, 99.5%, and 94.1% for RBCs, WBCs, and platelets, respectively, surpassing the performance of the other detection networks. At an IoU threshold of 0.5, the mAP values for RBCs, WBCs, and platelets were 94.5%, 99.7%, and 97.3%, respectively. Compared with the original YOLOv7 network, the improved YOLOv7 network achieved significant enhancements in prediction precision for platelets, RBCs, and WBCs, with increases of 7.8%, 7.3%, and 1.8%, respectively. Importantly, it maintained an inference time of 8 ms, comparable to the original YOLOv7 network’s 7.6 ms, outperforming the other detection networks and ensuring real-time follow-up diagnosis. Thus, the improved detection network developed in this study shows superior performance in the detection and identification of RBCs, WBCs, and platelets.

Table 7. Comparison of the results of the improved YOLOv7 with those of other detection methods on the BCCD test sets. Significant values are shown in bold.

Model | Class | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | Inference time (ms)
YOLOv5x | RBC | 79.5 | 93.3 | 92.6 | 68.7 | 8.8
YOLOv5x | WBC | 97.8 | 100 | 99.4 | 82.1 |
YOLOv5x | Platelets | 87.3 | 94.8 | 93.6 | 52.9 |
YOLOv741 | RBC | 83.2 | 88.4 | 93.8 | 70.2 | 7.6
YOLOv741 | WBC | 97.7 | 100 | 99.5 | 82.9 |
YOLOv741 | Platelets | 86.3 | 96.9 | 96.6 | 52.6 |
Faster-RCNN50 | RBC | 77.6 | 94 | 92.2 | 69.6 | 12
Faster-RCNN50 | WBC | 97.6 | 100 | 99.3 | 81.3 |
Faster-RCNN50 | Platelets | 93.5 | 89.2 | 96.4 | 50.9 |
RetinaNet51 | RBC | 73 | 94.7 | 92.8 | 70.2 | 10
RetinaNet51 | WBC | 96.6 | 100 | 99.4 | 79.9 |
RetinaNet51 | Platelets | 89 | 83.1 | 92.2 | 51.6 |
Improved YOLOv7 | RBC | 90.5 | 95.8 | 94.5 | 64.1 | 8
Improved YOLOv7 | WBC | 99.5 | 100 | 99.7 | 87.4 |
Improved YOLOv7 | Platelets | 94.1 | 92.6 | 97.3 | 63.5 |

In this study, t tests were employed to compare the performance of the traditional YOLOv7 model and the improved YOLOv7 model in the blood cell detection task (Table 8). The results indicate that the improved YOLOv7 outperforms the traditional model across several evaluation metrics, particularly in detecting RBCs and platelets. For leukocyte detection, the original YOLOv7 model already achieved near-maximum performance (precision: 97.70 ± 0.46, recall: 100.00 ± 0.00, mAP50: 99.50 ± 0.48). While the t test results revealed no significant differences across multiple metrics, the improved model achieved small but consistent improvements; notably, the mAP50-95 metric was significantly greater (p ≤ 0.01). For platelet detection, the improved YOLOv7 achieved a precision of 94.10 ± 1.21 and an mAP50-95 of 63.52 ± 1.21, representing substantial improvements over the traditional model (p < 0.001 for both). These results highlight the superior performance of the improved model, suggesting notable advancements in detection accuracy and applicability. Overall, the improved YOLOv7 demonstrated higher precision and recall in the detection of RBCs, platelets, and leukocytes.

Table 8. Comparison of the YOLOv7 (Group 1) and improved YOLOv7 (Group 2) metrics, **p ≤ 0.01, ***p ≤ 0.001.

Class | Metric | YOLOv741 (mean ± SD) | Improved YOLOv7 (mean ± SD) | t-statistic | p-value
RBCs | Precision | 83.20 ± 1.25 | 90.50 ± 1.50 | −8.360 | 0.000***
RBCs | Recall | 88.40 ± 3.92 | 95.80 ± 1.05 | −4.077 | 0.004**
RBCs | mAP50 | 93.80 ± 2.29 | 94.50 ± 1.07 | −0.619 | 0.553
RBCs | mAP50-95 | 70.20 ± 0.96 | 64.10 ± 2.59 | 4.938 | 0.001
WBCs | Precision | 97.70 ± 0.46 | 99.50 ± 0.19 | −8.087 | 0.000***
WBCs | Recall | 100.00 ± 0.00 | 100.00 ± 0.00 | – | –
WBCs | mAP50 | 99.50 ± 0.48 | 99.70 ± 0.20 | −0.860 | 0.415
WBCs | mAP50-95 | 82.90 ± 1.18 | 87.40 ± 2.07 | −4.223 | 0.003**
Platelets | Precision | 86.30 ± 1.64 | 94.10 ± 1.21 | −8.558 | 0.000***
Platelets | Recall | 96.90 ± 0.25 | 92.6 ± 1.35 | 7.003 | 0.000
Platelets | mAP50 | 96.60 ± 1.58 | 97.30 ± 1.28 | −0.770 | 0.464
Platelets | mAP50-95 | 52.60 ± 0.96 | 63.52 ± 1.21 | −15.780 | 0.000***

Figure 7 shows the PR curve for the improved detection network throughout the training process. The PR curve reveals how the model’s performance changes under different thresholds; a larger area under the curve generally indicates better model performance. On the BCCD validation set, the improved YOLOv7 detection network achieved average precisions of 93.8% for RBCs, 99.5% for WBCs, and 96.6% for platelets. Moreover, the areas under the PR curves for each category are all close to 1, indicating that the improved YOLOv7 network not only achieves accurate target detection but also demonstrates high classification accuracy. Figure 8 presents the confusion matrix for the improved detection network on the test set of the BCCD dataset. The confusion matrix clearly illustrates the model’s prediction performance for each type of cell, including both correct classifications and misclassifications. The recognition accuracies of RBCs, WBCs, and platelets were 92.82%, 100%, and 97.81%, respectively. The proposed method demonstrates outstanding detection performance, which has the potential to significantly enhance the accuracy of medical diagnoses.

Fig. 7. PR curve of the BCCD dataset trained on the improved YOLOv7 detection network.

Fig. 8. Confusion matrix of the classification results on the BCCD dataset.

Figure 9 shows the heatmap and comparison chart for the results predicted by the improved detection model on the BCCD test dataset. Three sets of images were used for analysis, where Fig. 9a represents the original image, Fig. 9b represents the heatmap generated for the detection targets, and Fig. 9c represents the actual detection effect map of the detection network. In the heatmap, as the probability increases, the color indicating the likely position of the cells gradually shifts toward red; deeper shades of red signify a higher likelihood of being the center position of the target cells. Comparing the heatmap with the actual prediction results shows that the improved detection model achieves comprehensive and accurate predictions as well as precise localization of the various types of blood cells within the image. Furthermore, even with partial overlap between cells, as exemplified by the lower right corner of the third sample, the improved YOLOv7 network can efficiently detect and locate them, demonstrating its significant clinical applicability.

Fig. 9. Comparison of prediction results on the BCCD dataset with those of the improved YOLOv7 detection network. (a) Original image. (b) Heatmap of the improved YOLOv7. (c) Prediction results of the improved YOLOv7.

Experimental results of the improved EfficientNetv2 classification network

Table 9 compares the improved EfficientNetv2 classification network from this study with other classification networks on the LDWBC, LISC, and Raabin test sets. Table 10 compares the prediction accuracies for each category of WBCs on the same test sets. Both tables demonstrate that the improved WBC classification network outperforms the other networks across various evaluation metrics. Taking LDWBC as an example, the improved EfficientNetv2 classification network achieved OA, AP, AR, and AF1 scores of 97.81%, 90.44%, 96.59%, and 93.42%, respectively. For individual WBC types, the identification of basophils, monocytes, eosinophils, neutrophils, and lymphocytes achieved accuracies of 95.56%, 91.24%, 100%, 98.90%, and 97.27%, respectively, all superior to those of the other models. Although the 98.90% accuracy in predicting neutrophils was 0.57% lower than the 99.47% achieved by the dual path network, the improved EfficientNetv2 still achieved an OA of 97.81% and an AR of 96.59%, outperforming the other models. Table 9 also demonstrates that the inference time of the improved EfficientNetv2 classification network is shorter than that of the other compared models. In terms of overall network performance, the improved EfficientNetv2 exhibited a better classification effect, aiding in the diagnosis of various cell-related diseases.

Table 9.

Comparison of the results of the improved EfficientNetv2 with those of other approaches on the LDWBC, LISC, and Raabin test sets.

Dataset Model OA (%) AP (%) AR (%) AF1 (%) Inference time (ms)
LDWBC ResNet52 96.6 87.32 91.18 89.21 14
DenseNet53 96.84 89.98 93.44 91.68 18
Inception-v454 88.87 67.83 64.44 66.09 36
Dual Path Networks55 97.2 89.31 92.28 90.77 23
VggNet56 95.63 87.14 79.80 83.31 12
Improved EfficientNetv2 97.81 90.44 96.59 93.42 8
LISC ResNet52 91.84 95.48 92.00 93.71 9
DenseNet53 95.92 98.33 96.00 97.15 10
Inception-v454 73.47 74.88 72.68 73.76 22
Dual Path Networks55 95.92 99.06 96.00 97.51 14
VggNet56 93.88 97.48 94.00 95.71 8
Improved EfficientNetv2 95.92 97.79 96.00 96.89 5
Raabin ResNet52 96.35 92.87 96.14 94.48 8
DenseNet53 97.14 94.02 97.08 95.53 8
Inception-v454 89.52 78.39 88.20 83.01 18
Dual Path Networks55 95.76 92.11 96.64 94.32 12
VggNet56 95.83 91.92 94.93 93.40 7
Improved EfficientNetv2 98.21 97.14 98.41 97.77 4

Significant values are shown in bold.
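For reproducibility, the OA, AP, AR, and AF1 values reported in Table 9 are standard multiclass metrics in which precision, recall, and F1 are macro-averaged over the five WBC classes. The following minimal sketch (not the authors' evaluation code; the labels are hypothetical) shows how such scores can be computed with scikit-learn:

```python
# Minimal sketch of the OA/AP/AR/AF1 metrics as macro averages over classes.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def wbc_metrics(y_true, y_pred):
    oa = accuracy_score(y_true, y_pred)                    # overall accuracy (OA)
    ap, ar, af1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)  # macro P (AP), R (AR), F1 (AF1)
    return {"OA": oa, "AP": ap, "AR": ar, "AF1": af1}

# Hypothetical labels (0 = basophil, ..., 4 = lymphocyte):
print(wbc_metrics([0, 1, 2, 3, 4, 4], [0, 1, 2, 3, 4, 3]))
```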

Table 10.

Comparison of the accuracy (%) of the improved EfficientNetv2 and other methods in predicting each class on the LDWBC, LISC, and Raabin test sets.

Dataset Model Basophil Monocyte Eosinophil Neutrophil Lymphocyte
LDWBC ResNet52 80 82.99 98.15 98.52 96.22
DenseNet53 91.11 81.96 99.07 99.19 95.88
Inception-v454 42.22 42.27 52.78 93.46 91.48
Dual Path Networks55 91.11 73.71 100 99.47 97.08
VggNet56 64.44 51.03 87.04 99.33 97.18
Improved EfficientNetv2 95.56 91.24 100 98.90 97.27
LISC ResNet52 100 60 100 100 100
DenseNet53 100 80 100 100 100
Inception-v454 90.91 50 62.50 80 80
Dual Path Networks55 100 80 100 100 100
VggNet56 100 70 100 100 100
Improved EfficientNetv2 100 80 100 100 100
Raabin ResNet52 100 88.05 98.12 95.84 98.70
DenseNet53 100 92.45 97.65 96.91 98.41
Inception-v454 91.67 81.76 86.85 89.99 90.75
Dual Path Networks55 100 91.82 98.12 94.54 98.70
VggNet56 100 84.91 95.77 95.73 98.27
Improved EfficientNetv2 100 95.60 99.53 97.92 98.99

Significant values are shown in bold.

As shown in Table 11, taking the LDWBC dataset as an example, t tests comparing the models in Group 1 (ResNet, DenseNet, Inception-v4, Dual Path Networks, and VggNet) with the improved EfficientNetv2 (Group 2) show that the improved EfficientNetv2 performs better than the other models on all metrics and exhibits significant advantages in several of them, including AP, AR, and AF1. In particular, in comparison with Inception-v4, ResNet, and VggNet, the improved EfficientNetv2 shows significant improvements in AF1, AP, and AR (all p ≤ 0.001 or p ≤ 0.01). For DenseNet and Dual Path Networks, several metrics also show significant improvement (p ≤ 0.05). Overall, the improved EfficientNetv2 significantly outperforms the traditional convolutional neural network models in terms of precision, recall, and F1 score, suggesting that it has greater potential for applications with high precision requirements.

Table 11.

Comparison of metrics between several models (Group 1) and the improved EfficientNetv2 (Group 2) on the LDWBC test set. *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001.

Metrics Group 1 model Group 1 mean ± standard deviation Group 2 (Improved EfficientNetv2) mean ± standard deviation t-statistic p-value
OA ResNet 96.60 ± 2.14 97.81 ± 1.02 − 1.141 0.287
DenseNet 96.84 ± 2.07 − 0.94 0.375
Inception-v4 88.87 ± 5.96 − 3.306 0.011*
Dual Path Networks 97.20 ± 1.68 − 0.610 0.879
VggNet 95.63 ± 6.27 − 0.767 0.465
AP ResNet 87.32 ± 1.85 90.44 ± 1.00 − 3.317 0.011*
DenseNet 89.98 ± 2.59 − 0.37 0.721
Inception-v4 67.83 ± 2.47 − 18.973 0.000***
Dual Path Networks 89.31 ± 2.20 − 1.046 0.326
VggNet 87.14 ± 1.73 − 3.693 0.006**
AR ResNet 91.18 ± 2.10 96.59 ± 2.08 − 4.093 0.003**
DenseNet 93.44 ± 1.96 − 2.465 0.039*
Inception-v4 64.44 ± 1.75 − 26.447 0.000***
Dual Path Networks 92.28 ± 2.37 − 3.056 0.016*
VggNet 79.80 ± 4.72 − 7.279 0.000***
AF1 ResNet 89.21 ± 1.66 93.42 ± 1.10 − 4.727 0.001***
DenseNet 91.68 ± 2.18 − 1.593 0.150
Inception-v4 66.09 ± 1.79 − 29.087 0.000***
Dual Path Networks 90.77 ± 2.02 − 2.576 0.033*
VggNet 83.31 ± 2.42 − 8.504 0.000***
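The comparisons in Table 11 correspond to independent two-sample t tests between per-run metric values of each baseline and the improved model. A minimal sketch with SciPy is given below; the per-run AP values are hypothetical placeholders, since only means and standard deviations are reported here:

```python
# Independent two-sample t test between two models' per-run AP scores.
from scipy import stats

resnet_ap   = [86.1, 88.9, 87.4, 85.7, 88.5]   # hypothetical per-run AP values
improved_ap = [89.7, 91.8, 90.1, 89.9, 90.7]   # hypothetical per-run AP values

t_stat, p_value = stats.ttest_ind(resnet_ap, improved_ap)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # p <= 0.05 marks significance (*)
```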

Figure 10 presents the PR curves obtained by training the improved EfficientNetv2 classification network on the Raabin dataset, whereas Fig. 11 displays the prediction confusion matrices obtained on the LDWBC and Raabin test sets. Figures 10 and 11 demonstrate that on the Raabin dataset, the improved network achieved average precisions of 99.5% for basophils and eosinophils, 96.5% for lymphocytes, 93.2% for monocytes, and 95.3% for neutrophils. The areas under the PR curves for all categories are close to 1, indicating that the classification model achieved high precision and recall. Additionally, the improved classification network demonstrated superior performance on both the LDWBC and Raabin test sets.

Fig. 10.

PR curves of the improved EfficientNetv2 network trained on the Raabin dataset.

Fig. 11.

Confusion matrix of the classification results obtained on the two validation test sets. (a) LDWBC. (b) Raabin.
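For reference, per-class PR curves such as those in Fig. 10 can be obtained by treating each WBC class one-vs-rest and computing precision-recall pairs from the softmax scores. The sketch below is illustrative only; the variable names and class ordering are assumptions, not the authors' plotting code:

```python
# One-vs-rest PR curves and average precision per WBC class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

CLASSES = ["Basophil", "Eosinophil", "Lymphocyte", "Monocyte", "Neutrophil"]

def plot_pr_curves(y_true, y_score):
    # y_true: (N,) integer labels; y_score: (N, 5) softmax probabilities.
    for k, name in enumerate(CLASSES):
        y_bin = (np.asarray(y_true) == k).astype(int)       # one-vs-rest labels
        p, r, _ = precision_recall_curve(y_bin, y_score[:, k])
        ap = average_precision_score(y_bin, y_score[:, k])  # area under PR curve
        plt.plot(r, p, label=f"{name} (AP = {ap:.3f})")
    plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()
```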

Discussion

The foundational strengths of the YOLOv7 architecture were leveraged and tailored to meet the specific requirements outlined in this research, with the objective of increasing precision and speed in target detection. The performance of the resulting model was subsequently evaluated on the test set of the BCCD dataset. The results are depicted in Fig. 12, where Fig. 12a represents the original image, Fig. 12b shows the annotation map of the original image, Fig. 12c shows the prediction of the original YOLOv7 network, and Fig. 12d shows the prediction of the improved YOLOv7 network. A comparison of Fig. 12c and d reveals that the original YOLOv7 fails to detect all of the cells in the full-field image. As shown in the yellow box in Fig. 12c, some RBCs were not detected, whereas the improved YOLOv7 model, which incorporates a multihead attention mechanism and the SIoU border loss function, detected the majority of cell objects. A further comparison of Fig. 12b and d highlights that the improved detection model successfully identified cell objects previously unlabeled in the dataset, as shown by the red box in Fig. 12b. Overall, the improved YOLOv7 detection network achieved superior results in the detection task compared with the original YOLOv7.

Fig. 12.

Comparison of the results predicted by YOLOv7 and the improved YOLOv7 network. (a) Original image. (b) Original image with annotation. (c) Prediction results of YOLOv7. (d) Prediction results of the improved YOLOv7.

To further elucidate the effectiveness of the improved YOLOv7 model, ablation experiments detailed in Table 12 were conducted to analyze the impact of the multihead attention mechanism and the SIoU loss function on model performance. As indicated in Table 12, the introduction of the SIoU loss function enables the model to perform more stably under stricter IoU thresholds, which is crucial for high-precision target localization. Although there is a slight decrease in recall, the overall performance of the model remains robust with the SIoU loss function. The incorporation of multihead attention significantly improves the model's precision and mAP50-95, highlighting its ability to focus on different types of information. Although recall decreases, the substantial improvement in precision and the reduction in false detections render this configuration advantageous for practical clinical applications. The synergistic integration of multihead attention and the SIoU loss function yields improvements in precision, recall, mAP50, and mAP50-95. Specifically, precision increased by 5.6 percentage points, indicating that the combined enhancements achieve optimal performance and validating the effectiveness and practicality of each module.

Table 12.

Detection metric results for different YOLOv7 models.

Multihead attention SIoU Loss Precision (%) Recall (%) mAP50 (%) mAP50-95 (%)
×  ×  89.1  95.1  96.6  68.6
×  ✓  89.7  92.9  96.0  70.6
✓  ×  91.7  92.4  96.2  71.5
✓  ✓  94.7  96.1  97.2  71.7

Significant values are shown in bold.
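For concreteness, the SIoU loss43 combines the IoU term with angle, distance, and shape costs. The following PyTorch sketch is an illustrative re-implementation of the published formulation (shape exponent θ = 4), not the exact training code used here:

```python
# Illustrative SIoU bounding-box loss for (x1, y1, x2, y2) box tensors.
import math
import torch

def siou_loss(pred, target, eps=1e-7):
    # Widths, heights, and centers of predicted and ground-truth boxes.
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # Plain IoU term.
    iw = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Smallest enclosing box, used to normalize the distance cost.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0]) + eps
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1]) + eps

    # Angle cost: penalizes center offsets that deviate from the x/y axes.
    s_cw, s_ch = cx2 - cx1, cy2 - cy1
    sigma = torch.sqrt(s_cw ** 2 + s_ch ** 2) + eps
    sin_a = torch.min(torch.abs(s_cw), torch.abs(s_ch)) / sigma
    angle_cost = torch.cos(2 * torch.arcsin(sin_a) - math.pi / 2)

    # Distance cost between centers, attenuated by the angle cost.
    gamma = angle_cost - 2
    dist_cost = (2 - torch.exp(gamma * (s_cw / cw) ** 2)
                   - torch.exp(gamma * (s_ch / ch) ** 2))

    # Shape cost from the width/height mismatch (theta = 4).
    omega_w = torch.abs(w1 - w2) / torch.max(w1, w2)
    omega_h = torch.abs(h1 - h2) / torch.max(h1, h2)
    shape_cost = (1 - torch.exp(-omega_w)) ** 4 + (1 - torch.exp(-omega_h)) ** 4

    return 1 - iou + 0.5 * (dist_cost + shape_cost)
```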

In addition, for the WBC reclassification network in this study, the classification performance was improved on the basis of EfficientNetv2, and ablation experiments were conducted on three publicly available datasets. The results are presented in Tables 13 and 14. A comparison of the experimental results reveals that, on the different datasets, the classification accuracy for WBCs was effectively improved by adding the ASPP module and using the BCE loss function. Specifically, Table 13 shows that the improved EfficientNetv2 achieved higher OA, AP, AR, and AF1 scores than its original version. Table 14 further confirms that the incorporation of the ASPP module and the BCE loss function consistently elevated the classification performance for the various types of WBCs. Moreover, as indicated in Table 14, the test results on the LISC dataset are generally better than those on the other datasets, with many results reaching 100% accuracy. Further analysis revealed that the image quality of the LISC dataset was higher than that of the other datasets, so the improved classification network was better able to extract the characteristics of the different WBCs. Finally, the classification results of the same classification network are affected by the quality of the dataset samples: a dataset with high sample quality but few samples may produce prediction accuracy on the same network similar to, or even better than, that of a dataset with low sample quality but many samples. Therefore, high-quality datasets are particularly important for training network models.

Table 13.

Classification metric results for different EfficientNetv2 models on the three test sets.

Dataset Loss function ASPP OA (%) AP (%) AR (%) AF1 (%)
CE BCE With ASPP No ASPP
LDWBC 94.22 88.46 92.82 90.59
97.11 89.46 95.67 92.46
97.26 88.96 96.26 92.47
97.81 90.44 96.59 93.42
LISC 79.59 93.26 79.36 85.75
91.84 95.38 92.18 93.75
89.80 96.58 89.68 93.00
97.96 97.79 98.00 97.89
Raabin 95.24 93.25 93.61 93.43
96.35 94.53 96.08 95.30
96.97 95.57 97.07 96.31
98.21 97.14 98.41 97.77

Significant values are shown in bold.

Table 14.

Accuracies (%) of different EfficientNetv2 models for each category on three test sets.

Dataset Loss function ASPP Basophil Monocyte Eosinophil Neutrophil Lymphocyte
CE BCE With ASPP No ASPP
LDWBC 91.11 88.66 95.37 95.70 93.25
93.33 90.21 100 97.47 97.32
95.56 90.72 100 98.66 96.36
95.56 91.24 100 98.90 97.27
LISC 81.82 70 75.00 80 90
90.91 80 100 90 100
90.91 80 87.50 90 100
100 90 100 100 100
Raabin 95 89.94 90.61 95.67 96.82
100 93.71 92.49 96.34 97.83
100 94.97 95.77 96.91 97.69
100 95.60 99.53 97.92 98.99

Significant values are shown in bold.
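To illustrate the ASPP module45 integrated into the classification backbone, the sketch below shows a generic ASPP block with parallel dilated convolutions and image-level pooling in PyTorch. The dilation rates and channel sizes are illustrative assumptions, not the exact configuration used in this paper:

```python
# Generic ASPP block: parallel dilated 3x3 convolutions plus global pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +            # 1x1 branch
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r,    # dilated branches
                       bias=False) for r in rates])
        self.pool = nn.Sequential(                                 # image-level features
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

# Usage: y = ASPP(1280, 256)(torch.randn(1, 1280, 7, 7))  # y: (1, 256, 7, 7)
```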

The performances of the improved YOLOv7 and original YOLOv7 networks, each incorporating the proposed WBC classification model, were compared on the target detection task. Figure 13 compares the predicted results. As shown in Fig. 13a, the original YOLOv7 model failed to detect the cells within the red box. In contrast, Fig. 13b demonstrates that the detection network proposed in this study locates and recognizes cells in blood images more accurately and comprehensively, while the two-stage WBC reclassification network also achieves high classification accuracy. Notably, for the third sample in Fig. 13, the detection network successfully identified two WBCs with similar shapes. An experienced pathologist was consulted for this study and carefully classified the two identified WBCs. The classification results of the improved WBC classification network were consistent with those determined by the expert physician, indicating that the improved network achieved a high level of accuracy. In general, the method used in this study significantly improves on YOLOv7 in the cell detection, identification, and classification tasks and can locate cells more comprehensively and accurately. Moreover, the WBC classification network also achieved better results. The integration of detection and classification provides a more efficient, accurate, and reliable solution for critical tasks such as blood cell analysis and clinical decision support.

Fig. 13.

Final detection and classification results. (a) Results predicted by YOLOv7. (b) Results predicted by the improved YOLOv7.

While the improved YOLOv7 model is remarkably effective, it has several limitations. Despite the significant improvements in detection accuracy, certain challenges remain that can affect model performance. Issues such as suboptimal smear preparation, inconsistent staining, and residual artifacts can lead to cell overlap, indistinct boundaries, and alterations in morphological characteristics, consequently reducing detection accuracy and increasing the likelihood of false negatives. For instance, as shown in the lower left corner of Fig. 14a, artifacts in the peripheral blood smear have caused cellular blurring, resulting in missed detections of red blood cells. In addition, detecting cells located near the edge of an image is often challenging, as shown at the edges of Fig. 14a and b, because truncation may blur cell boundaries and obscure key morphological details. Future research should address these scenarios with advanced deep learning techniques such as artifact removal algorithms, generative adversarial networks, data augmentation strategies, model ensembling, and transfer learning. Examples of cell overlap are shown on the left side of Fig. 14c and in the upper left corner of Fig. 14d: cells in peripheral blood smears are often clustered into clumps or even aggregated into long strips, which poses a significant challenge to the detection model. In this case, large numbers of cells obscured by other cells can lead to incomplete feature extraction, further compromising the accuracy of the model. Despite these issues, the improvements to the YOLOv7 model provide a robust foundation, paving the way for further refinements to address these limitations. Future efforts should focus on optimizing the model architecture, implementing more precise attention mechanisms for overlapping cells, and enhancing post-processing techniques.

Fig. 14.

Result of the model output, where (a), (b), (c) and (d) are the detection results of different images.

Beyond these challenges related to image artifacts and cell clustering, cell category imbalance poses additional hurdles for the model. Differences in the physiological proportions of cell types and biases in sampling can result in significant imbalances among cell categories. Taking the LDWBC dataset from this paper as an example, the sample sizes for basophils and monocytes are smaller than those for neutrophils and lymphocytes. This imbalance makes the classification model used in this study more sensitive to categories with larger sample sizes, potentially neglecting less represented categories. Such imbalances can adversely affect overall classification performance, particularly for rare leukocyte subtypes. To investigate this issue further, this study applied the proposed improved EfficientNetv2 algorithm to the highly imbalanced Aml-Cytomorphology_LMU bone marrow cell dataset57. Despite achieving an overall accuracy exceeding 95%, the significant disparity in sample sizes across categories posed challenges. Specifically, subtypes such as promyelocytes, metamyelocytes, and smudge cells have very limited samples, resulting in prediction precisions of either 0% or 100%, as shown in Table 15.

Table 15.

The results of the improved EfficientNetv2 on the Aml-Cytomorphology_LMU dataset.

Class Image number Precision (%) Recall (%) AP (%)
BAS 79 74.7 58.7 44.1
EBO 78 72.2 69.3 50.2
EOS 424 98.6 97.4 96.0
KSC 15 100 33.3 33.4
LYA 11 0 0 0.1
LYT 3937 92.3 96.6 89.8
MMZ 15 0 0 0.1
MOB 26 28.3 20 5.8
MON 1789 95.4 87.5 84.6
MYB 42 54.0 37.5 22.6
MYO 3268 92.7 94.3 88.4
NGB 109 0 0 0.6
NGS 8484 98.1 99.0 97.6
PMB 18 0 0 0.1
PMO 70 79.1 75.7 60.1

BAS Basophil, EBO Erythroblast, EOS Eosinophil, KSC Smudge cell, LYA Lymphocyte (atypical), LYT Lymphocyte (typical), MMZ Metamyelocyte, MOB Monoblast, MON Monocyte, MYB Myelocyte, MYO Myeloblast, NGB Neutrophil (band), NGS Neutrophil (segmented), PMB Promyelocyte (bilobed), PMO Promyelocyte.

Table 15 shows that extreme imbalance can easily lead to overfitting or underfitting in the classification model, resulting in poor performance for underrepresented categories. To address this issue, several strategies can be employed. First, at the data level, techniques such as oversampling, undersampling, or synthetic data generation can help create a more balanced dataset. Furthermore, grouping rare or low-abundance cell types and then subdividing them can further enhance dataset balance. Second, improvements to the model itself, including adjustments to the model structure, loss function, and attention mechanisms, can enhance classification accuracy. Finally, ensemble learning methods, which combine the predictions of multiple models, can improve OA and stability. Together, these approaches can not only mitigate the current limitations but also enhance the model's generalizability and reliability in clinical applications.
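As a concrete illustration of the data- and loss-level strategies above, the sketch below derives inverse-frequency class weights for a weighted cross-entropy loss and builds an oversampling sampler in PyTorch. The class counts are hypothetical placeholders echoing the imbalance in Table 15; this is one possible remedy, not the exact configuration used in this study:

```python
# Inverse-frequency class weighting and oversampling for imbalanced data.
import torch
from torch.utils.data import WeightedRandomSampler

labels = torch.tensor([0] * 79 + [4] * 3937 + [12] * 8484)   # hypothetical labels
counts = torch.bincount(labels).float().clamp(min=1)         # per-class counts
class_weights = counts.sum() / (len(counts) * counts)        # rare classes weigh more
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)  # weighted loss

sample_weights = class_weights[labels]                       # per-sample weights
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels),
                                replacement=True)            # oversamples rare classes
```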

The four blood cell datasets used in this study were sourced from different laboratories and hospitals, which employed different staining and imaging techniques. Specifically, the BCCD dataset employed the Gismo-right technique for staining the blood smears, imaged at 100× magnification with a regular light microscope equipped with a CCD color camera. The LDWBC dataset used Wright-Giemsa staining and was captured with an OLYMPUS BX41 microscope and an OLYMPUS Plan N 100×/1.25 objective lens. The LISC dataset also employed the Gismo-right staining method, similar to BCCD, with images taken at 100× magnification using a light microscope (Microscope-Axioskope 40) paired with a Sony SSCDC50AP digital camera. For the Raabin dataset, all samples were stained with Giemsa and imaged with two microscopes, an Olympus CX18 and a Zeiss, both at 100× magnification. Across these datasets, the model proposed in this paper demonstrated superior performance, highlighting its robust generalization capability. However, in clinical practice, staining-induced variations in the color and texture of blood cells can still cause the model to confuse cell types. The accuracy and quantity of data annotation, as well as the quality of data collection and preprocessing, may also significantly affect research outcomes, thereby influencing the precision of recognition and diagnosis for related diseases. Therefore, for clinical applications, future work should account for the various staining methods, collect diverse blood cell image data, explore efficient and robust model architectures and algorithms, and optimize model performance through stain normalization, standardization, or image-enhancement preprocessing to improve the accuracy of the detection and classification models. In addition, optimizing image preprocessing and postprocessing, applying model pruning and quantization techniques, and designing lightweight models can improve inference speed.
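As one example of the stain normalization mentioned above, a simple Reinhard-style color transfer matches each smear image's LAB-space channel statistics to those of a reference image. The sketch below, which assumes OpenCV and NumPy, illustrates the idea and is not the preprocessing pipeline used in this paper:

```python
# Reinhard-style stain normalization: match LAB mean/std to a reference image.
import cv2
import numpy as np

def reinhard_normalize(img_bgr, ref_bgr):
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    i_mean, i_std = img.reshape(-1, 3).mean(0), img.reshape(-1, 3).std(0) + 1e-6
    r_mean, r_std = ref.reshape(-1, 3).mean(0), ref.reshape(-1, 3).std(0)
    out = (img - i_mean) / i_std * r_std + r_mean            # match channel statistics
    return cv2.cvtColor(np.clip(out, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```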

In summary, the YOLO series of target detection networks offers rapid detection capabilities. Compared with other versions, YOLOv7 exhibits notable superiority in detection accuracy, speed, robustness, generalization, multiscale input support, deployment flexibility, and scalability. The improved YOLOv7 network facilitates the dynamic learning of nuclear areas in WBC images, effectively addressing issues related to sample similarity. Furthermore, it improves convergence and increases the accuracy of border positioning and cell classification, making it well suited for real-time blood cell detection. In addition, EfficientNetv2 is a lightweight network model, and the improvements to EfficientNetv2 yield higher prediction accuracy for the fine classification of WBCs. By combining YOLOv7 with EfficientNetv2, high anchor-box prediction accuracy and object categorization accuracy can be achieved in real time. On the basis of these findings, the methodology presented in this study can be incorporated into a fully automated cytological analyzer. More specifically, the proposed detection and classification networks are implemented on a dedicated computer system. In this study, the model was deployed on hardware with a Core i9 12900KF CPU, 32 GB of RAM, and an NVIDIA RTX 3060 Ti GPU. The trained model is converted with TensorRT, and inference acceleration is achieved through CUDA programming on the NVIDIA platform. Images are scanned via a professional camera, trained medical professionals annotate the regions of interest (ROIs) within these images, and the ROIs are then transmitted to the deployment computer for inference. The actual time to process a single image is approximately 8 ms, with classification inference taking around 6 ms. This speed enables real-time detection and classification and supports subsequent analysis and statistics on the size, quantity, and spatial distribution of the various cell types to aid in diagnosis.
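To make the deployed two-stage flow concrete, the sketch below routes detector outputs through the WBC classifier at inference time. The `detector` and `classifier` callables, class orderings, and tensor shapes are assumptions for illustration; in the actual system, both models would be TensorRT-converted engines:

```python
# Two-stage inference: detect blood cells, then reclassify WBC crops.
import torch
import torch.nn.functional as F

WBC_CLASSES = ["basophil", "eosinophil", "lymphocyte", "monocyte", "neutrophil"]
DET_CLASSES = {0: "WBC", 1: "RBC", 2: "Platelet"}            # assumed ordering

@torch.no_grad()
def two_stage_predict(image, detector, classifier, input_size=224):
    # image: (1, 3, H, W) tensor; detector yields (x1, y1, x2, y2, score, cls).
    results = []
    for x1, y1, x2, y2, score, cls in detector(image):       # stage 1: detect
        label = DET_CLASSES[int(cls)]
        if label == "WBC":                                   # stage 2: reclassify
            crop = image[:, :, int(y1):int(y2), int(x1):int(x2)]
            crop = F.interpolate(crop, size=(input_size, input_size),
                                 mode="bilinear", align_corners=False)
            label = WBC_CLASSES[classifier(crop).argmax(1).item()]
        results.append((label, float(score), (x1, y1, x2, y2)))
    return results
```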

Conclusion

Currently, blood images are analyzed via manual microscopy, which is not only labor-intensive and time-consuming but may also produce variable diagnostic results due to human error. Therefore, deep learning methods for the automatic detection, classification, and counting of blood cells in images are proposed to aid doctors in preliminary leukemia diagnosis.

In this study, a two-stage deep learning method was proposed to automatically and accurately detect and identify blood cells in images. First, for the target detection task of identifying blood cells, the improved network based on YOLOv7 exhibits superior performance in accurately identifying and categorizing cells compared with its original version. The improved network benefits from the guidance of the multihead attention mechanism, which increases the representational ability of the model by allowing it to focus on different aspects of information, thereby addressing the sample similarity issue. Furthermore, positioning accuracy is improved by integrating the SIoU loss function, which combines angle, distance, shape, and IoU costs to increase detection performance. Then, for the WBC reclassification task, considering that imbalanced or insufficient training data may negatively affect the deep learning model, the BCE loss function was used to dynamically assign weights to each category, ensuring that categories with few samples receive greater weight and thereby overcoming the data imbalance problem. Moreover, the ASPP module was incorporated to capture multiscale feature information via dilated convolutions with different rates, further enhancing the generalization ability of the model.

Despite the demonstrated effectiveness of the proposed method, it remains subject to several constraints, including staining techniques, imaging quality, and dataset quality. These constraints introduce specific limitations. For example, image artifacts, cell clustering, and annotation issues can all affect the method's effectiveness. Slides obtained via different staining methods exhibit varying appearances, which leads to differences in the images fed into the network; this variability challenges the network's detection and classification capabilities. Additionally, acquiring high-quality images poses another challenge: higher image quality typically corresponds to larger resolutions, which increases the demands on the model's inference speed.

In future work, for the target detection network, the loss function will continue to be improved, the target object will be located more accurately, and issues such as cell adhesion, overlap, and accurate classification will be more effectively addressed. For the reclassification of WBCs, it is essential to extract additional feature information within modules to reduce intraclass variation and enhance interclass differences, thereby improving the representational ability of the improved method. In addition, the current classification is primarily based on the five fundamental subcategories of WBCs. Future research should address the challenge of classifying additional subtypes.

Author contributions

Conceptualization, X.W., G.P. and Z.H.; methodology, G.P. and X.W.; software, G.P. and X.W.; validation, X.W., G.P., Z.H. and A.G.; formal analysis, X.W., G.P., Z.H. and A.G.; investigation, X.W.; resources, X.W. and Z.H.; data curation, G.P. and A.G.; writing—original draft preparation, G.P. and X.W.; writing—review and editing, X.W., G.P. and Z.H.; visualization, G.P. and A.G.; supervision, Z.H. and X.W.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the key specialized research and development breakthrough of Henan province (Grant No. 232102211016), and the key scientific research projects of Henan colleges and universities (Grant No. 23A416004).

Data availability

The datasets used in this study are publicly available online at: BCCD: https://github.com/Shenggan/BCCD_Dataset/tree/master/BCCD. LDWBC: http://datasets.biodwhu.cn/LDWBC/. LISC: https://users.cecs.anu.edu.au/~hrezatofighi/Data/. Raabin: https://www.kaggle.com/datasets/raabindata/raabin-wbc

Declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This study was conducted on four publicly available datasets, without involving human subjects or introducing any additional risks or discomfort associated with research participation. As a result, ethical approval was not required for the study. When collaborating with clinical experts in the subsequent process, it is crucial to adhere strictly to medical ethics norms such as the Declaration of Helsinki and international guidelines. To ensure research aligns with ethical principles, measures like respecting patient consent, conducting rigorous reviews, anonymizing and securing data, implementing access controls, and providing regular training must be conducted to uphold legality and compliance and safeguard the integrity and trustworthiness of the research.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Khan, M. A. et al. Automated design for recognition of blood cells diseases from hematopathology using classical features selection and elm. Microsc. Res. Tech.84, 202–216. 10.1002/jemt.23578 (2021). [DOI] [PubMed] [Google Scholar]
  • 2.Almezhghwi, K. & Serte, S. Improved classification of white blood cells with the generative adversarial network and deep convolutional neural network. Comput. Intell. Neurosci.10.1155/2020/6490479 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Siddique, M. A. I., Aziz, A. Z. B., Matin, A. An improved deep learning based classification of human white blood cell images. In International Conference on Electrical and Computer Engineering (ICECE), 149–52 (2020).
  • 4.Ghosh, S., Majumder, M. & Kudeshia, A. Leukox: Leukocyte classification using least entropy combiner (LEC) for ensemble learning. IEEE Trans. Circuits Syst. II-Express Briefs68, 2977–2981. 10.1109/TCSII.2021.3064389 (2021). [Google Scholar]
  • 5.Saade, P., El, J. R., El Hayek, S., Abi Zeid, J., Falou, O. & Azar, D. Computer-aided detection of white blood cells using geometric features and color. In Cairo International Biomedical Engineering Conference (CIBEC), 142–5 (2018).
  • 6.Karthikeyan, M. & Venkatesan, R. Interpolative leishman-stained transformation invariant deep pattern classification for white blood cells. Soft Comput.10.1007/s00500-019-04662-4 (2020). [Google Scholar]
  • 7.Özyurt, F. A fused CNN model for WBC detection with MRMR feature selection and extreme learning machine. Soft Comput.24, 8163–8172. 10.1007/s00500-019-04383-8 (2020). [Google Scholar]
  • 8.Baby, D., Devaraj, S. J. & Hemanth, J. Leukocyte classification based on feature selection using extra trees classifier: A transfer learning approach. Turk. J. Electr. Eng. Comput. Sci.10.3906/elk-2104-183 (2021). [Google Scholar]
  • 9.Dong, Y. et al. Leukemia incidence trends at the global, regional, and national level between 1990 and 2017. Exp. Hematol. Oncol.9, 14. 10.1186/s40164-020-00170-6 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Du, M. et al. The global burden of Leukemia and its attributable factors in 204 countries and territories: findings from the global burden of disease 2019 study and projections to 2030. J. Oncol.2022, 1612702. 10.1155/2022/1612702 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Hu, B., Liu, Y., Chu, P., Tong, M. & Kong, Q. Small object detection via pixel level balancing with applications to blood cell detection. Front. Physiol.13, 911297 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Zhai, Q., Fan, B., Zhang, B., Li, J.-H. & Liu, J.-Z. Automatic white blood cell classification based on whole-slide images with a deeply aggregated neural network. J. Med. Biol. Eng.42, 126–137 (2022). [Google Scholar]
  • 13.Chen, H., Liu, J., Hua, C., Zuo, Z., Feng, J., Pang, B. & Xiao, D. TransMixNet: An attention based double-branch model for white blood cell classification and its training with the fuzzified training data. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 842–847 (2021).
  • 14.Chen, H. et al. Accurate classification of white blood cells by coupling pre-trained ResNet and DenseNet with SCAM mechanism. BMC Bioinform.23, 1–20 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Ahmad, R., Awais, M., Kausar, N. & Akram, T. White blood cells classification using entropy-controlled deep features optimization. Diagnostics13, 352 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Li, M. et al. A deep learning model for detection of leukocytes under various interference factors. Sci. Rep.13, 2160 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Yan, G. et al. Diagnosis and typing of leukemia using a single peripheral blood cell through deep learning. Cancer Sci.10.1111/cas.16374 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Asar, T. O. & Ragab, M. Leukemia detection and classification using computer-aided diagnosis system with falcon optimization algorithm and deep learning. Sci. Rep.14, 21755. 10.1038/s41598-024-72900-3 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Jawahar, M. et al. An attention-based deep learning for acute lymphoblastic leukemia classification. Sci. Rep.14, 17447 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Khan, S. et al. Efficient leukocytes detection and classification in microscopic blood images using convolutional neural network coupled with a dual attention network. Comput. Biol. Med.174, 108146 (2024). [DOI] [PubMed] [Google Scholar]
  • 21.Zhang, D., Bu, Y., Chen, Q., Cai, S. & Zhang, Y. TW-YOLO: An innovative blood cell detection model based on multi-scale feature fusion. Sensors24(19), 6168 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Cheuque, C. et al. An efficient multi-level convolutional neural network approach for white blood cells classification. Diagnostics12(2), 248 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Alhashmi, S. M. et al. Survival analysis of thyroid cancer patients using machine learning algorithms. IEEE Access12, 61978–61990 (2024). [Google Scholar]
  • 24.Tasci, B. et al. Automated schizophrenia detection model using blood sample scattergram images and local binary pattern. Multimed. Tools Appl.83, 42735–42763 (2024). [Google Scholar]
  • 25.Banerjee, A., Ghosh, C. & Mandal, S. N. Analysis of v-net architecture for iris segmentation in unconstrained scenarios. SN Comput. Sci.3(3), 208 (2022). [Google Scholar]
  • 26.Ismael, A. M. & Şengür, A. The investigation of multiresolution approaches for chest X-ray image based COVID-19 detection. Health Inf. Sci. Syst.8(1), 29 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Ismael, A. M. & Şengür, A. Deep learning approaches for COVID-19 detection based on chest X-ray images. Expert Syst. Appl.164, 114054 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Gomes, J. C., Barbosa, V. A. F., Santana, M. A. et al. IKONOS: An intelligent tool to support diagnosis of COVID-19 by texture analysis of X-ray images. Res. Biomed. Eng. 1–14 (2020).
  • 29.Faruqui, N. et al. Healthcare As a Service (HAAS): CNN-based cloud computing model for ubiquitous access to lung cancer diagnosis. Heliyon9(11), e21520 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Trivedi, S., Patel, N. & Faruqui, N. Bacterial strain classification using convolutional neural network for automatic bacterial disease diagnosis. In 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), 325–332 (IEEE, 2023).
  • 31.Wu, Y. et al. SW-YOLO: Improved YOLOv5s algorithm for blood cell detection. In International Conference on Intelligent Robotics and Applications (Springer Nature, Singapore, 2023).
  • 32.Liu, C., Li, D. & Huang, P. ISE-YOLO: Improved squeeze-and-excitation attention module based YOLO for blood cells detection. In 2021 IEEE International Conference on Big Data (Big Data), 3911–3916 (IEEE, 2021).
  • 33.Shi, C. et al. Gpmb-yolo: A lightweight model for efficient blood cell detection in medical imaging. Health Inf. Sci. Syst.12(1), 24 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Yufeng, W. A. N. G. & Dahai, L. I. Improved YOLO framework blood cell detection algorithm. Comput. Eng. Appl.58(12), 191–198 (2022). [Google Scholar]
  • 35.Attallah, O. Acute lymphocytic leukemia detection and subtype classification via extended wavelet pooling based-CNNs and statistical-texture features. Image Vis Comput.147, 105064 (2024). [Google Scholar]
  • 36.Talaat, F. M. & Gamel, S. A. Machine learning in detection and classification of leukemia using C-NMC_Leukemia. Multimed. Tools Appl.83(3), 8063–8076 (2024). [Google Scholar]
  • 37.Luo, Y. et al. ResNeXt-CC: A novel network based on cross-layer deep-feature fusion for white blood cell classification. Sci. Rep.14(1), 18439 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Abou Ali, M., Dornaika, F. & Arganda-Carreras, I. White blood cell classification: Convolutional Neural Network (CNN) and Vision Transformer (ViT) under medical microscope. Algorithms16(11), 525 (2023). [Google Scholar]
  • 39.Rao, B. S. & Rao, B. S. An effective WBC segmentation and classification using MobilenetV3–ShufflenetV2 based deep learning framework. IEEE Access20(11), 27739–27748 (2023). [Google Scholar]
  • 40.Erten, M. et al. ConcatNeXt: An automated blood cell classification with a new deep convolutional neural network. Multimed. Tools Appl.2, 1–9 (2024). [Google Scholar]
  • 41.Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7464–7475 (2023).
  • 42.Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł. & Polosukhin, I. Attention is all you need. In Advances in neural information processing systems, Vol. 30 (2017).
  • 43.Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740 (2022).
  • 44.Tan, M. & Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the International conference on machine learning, 10096–10106 (2021).
  • 45.Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans. Pattern Anal. Mach. Intell.40, 834–848 (2017). [DOI] [PubMed] [Google Scholar]
  • 46.Aurelio, Y. S., de Almeida, G. M., de Castro, C. L. & Braga, A. P. Learning from imbalanced data sets with weighted cross-entropy function. Neural Process. Lett.50, 1937–1949. 10.1007/s11063-018-09977-1 (2019). [Google Scholar]
  • 47.Mooney, P. Blood cell images. 2018. https://www.kaggle.com/datasets/paultimothymooney/blood-cells Accessed 6 Jan 2022.
  • 48.Rezatofighi, S. H. & Soltanian-Zadeh, H. Automatic recognition of five types of white blood cells in peripheral blood. Comput. Med. Imaging Graph.35, 333–343. 10.1016/j.compmedimag.2011.01.003 (2011). [DOI] [PubMed] [Google Scholar]
  • 49.Kouzehkanan, S.-Z. M., Saghari, S., Tavakoli, E., Rostami, P., Abaszadeh, M., Satlsar, E. S., Mirzadeh, F., Gheidishahran, M., Gorgi, F., Mohammadi, S. et al. Raabin-WBC: A large free access dataset of white blood cells from normal peripheral blood. bioRxiv. 10.1101/2021.05.02.442287 (2021).
  • 50.Ren, S. Q., He, K. M., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), Montreal, CANADA (2015).
  • 51.Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollar, P. Focal loss for dense object detection. IEEE Trans. Pattern Anal. Mach. Intell.42, 318–327. 10.1109/tpami.2018.2858826 (2020). [DOI] [PubMed] [Google Scholar]
  • 52.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
  • 53.Huang, G; Liu, Z, Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708 (2017).
  • 54.Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence (2017).
  • 55.Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S. & Feng, J. Dual path networks. In Advances in Neural Information Processing Systems, Vol. 30 (2017).
  • 56.Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
  • 57.Matek, C. et al. Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks. Nat. Mach. Intell.1, 538–544 (2019). [Google Scholar]
