Skip to main content
Discover Oncology logoLink to Discover Oncology
. 2025 Apr 15;16:529. doi: 10.1007/s12672-025-02314-8

Multi-objective deep learning for lung cancer detection in CT images: enhancements in tumor classification, localization, and diagnostic efficiency

Abdulqader Faris Abdulqader 1, S Abdulameer 2,3, Ashok Kumar Bishoyi 4, Anupam Yadav 5, M M Rekha 6, Mayank Kundlas 7, V Kavitha 8, Zafar Aminov 9, Zahraa Saad Abdulali 10, Mariem Alwan 11, Mahmood Jawad 12, Hiba Mushtaq 13, Bagher Farhood 14,
PMCID: PMC12000487  PMID: 40232589

Abstract

Objective

This study aims to develop and evaluate an advanced deep learning framework for the detection, classification, and localization of lung tumors in computed tomography (CT) scan images.

Materials and methods

The research utilized a dataset of 1608 CT scan images, including 623 cancerous and 985 non-cancerous cases, all carefully labeled for accurate tumor detection, classification (benign or malignant), and localization. The preprocessing involved optimizing window settings, adjusting slice thickness, and applying advanced data augmentation techniques to enhance the model’s robustness and generalizability. The proposed model incorporated innovative components such as transformer-based attention layers, adaptive anchor-free mechanisms, and an improved feature pyramid network. These features enabled the model to efficiently handle detection, classification, and localization tasks. The dataset was split into 70% for training, 15% for validation, and 15% for testing. A multi-task loss function was used to balance the three objectives and optimize the model's performance. Evaluation metrics included mean average precision (mAP), intersection over union (IoU), accuracy, precision, and recall.

Results

The proposed model demonstrated outstanding performance, achieving a mAP of 96.26%, IoU of 95.76%, precision of 98.11%, and recall of 98.83% on the test dataset. It outperformed existing models, including You Only Look Once (YOLO)v9 and YOLOv10, with YOLOv10 achieving a mAP of 95.23% and YOLOv9 achieving 95.70%. The proposed model showed faster convergence, better stability, and superior detection capabilities, particularly in localizing smaller tumors. Its multi-task learning framework significantly improved diagnostic accuracy and operational efficiency.

Conclusion

The proposed model offers a robust and scalable solution for lung cancer detection, providing real-time inference, multi-task learning, and high accuracy. It holds significant potential for clinical integration to improve diagnostic outcomes and patient care.

Keywords: Lung cancer, Multi-task learning, CT imaging, Deep learning

Introduction

Lung cancer is the second most commonly diagnosed cancer globally and remains the leading cause of cancer-related deaths, accounting for approximately 1.8 million deaths annually [1]. Early detection is crucial for improving survival rates, as the five-year survival rate is significantly higher when lung cancer is identified at an early stage compared to advanced stages [2, 3]. Computed tomography (CT) has become a key diagnostic tool, providing high-resolution images of pulmonary structures [49]. However, accurately interpreting CT scans remains challenging due to the lungs' complex anatomy, the small size of early-stage tumors, and the overlapping appearance of benign and malignant lesions [1015].

Artificial intelligence (AI), particularly deep learning, has transformed medical imaging by enabling automated, fast, and accurate analysis of large datasets [1622]. Object detection frameworks like the You Only Look Once (YOLO) series have gained popularity for their ability to detect and classify objects in real time with high accuracy [2332]. In medical applications, YOLO architectures show promise in tumor detection using CT scans, offering efficient and scalable solutions to the challenges of manual radiological interpretation. However, current models struggle with detecting small-scale objects, handling imbalanced datasets, and achieving precise tumor localization—all critical for improving diagnostic workflows and guiding clinical decisions [3335]. The high dimensionality and variability of CT scan data further complicate these tasks, necessitating architectures capable of robust feature extraction and localization across diverse clinical scenarios.

The challenges are particularly pronounced in detecting early-stage tumors, where subtle morphological features are easily overlooked by both radiologists and AI systems [36, 37]. Addressing these gaps requires advanced frameworks leveraging multi-task learning to perform detection, classification, and localization simultaneously, optimizing both efficiency and performance. Such advancements can significantly enhance AI's role in clinical decision-making, enabling faster, more accurate diagnoses that ultimately improve patient outcomes.

Lung cancer remains the leading cause of cancer-related deaths worldwide, with early detection being crucial for improved survival rates. Traditional diagnostic methods, including radiological interpretation of CT scans, are prone to human error and interobserver variability, especially in detecting small or early-stage tumors. Conventional AI-based models have improved diagnostic accuracy but still struggle with false positives, precise tumor localization, and efficient real-time performance. To address these challenges, this study introduces YOLOv11, a multi-task deep learning model that integrates detection, classification, and localization into a unified framework. This approach aims to enhance diagnostic efficiency, reduce error rates, and provide a scalable AI-driven solution for clinical integration.

Key contributions

  • Development of YOLOv11: A novel deep learning framework integrating detection, classification, and localization for lung tumor assessment.

  • Transformer-Based Attention Layers: Enhanced feature extraction and improved focus on critical tumor regions, leading to superior accuracy.

  • Adaptive Anchor-Free Mechanism: Improved detection of small tumors without reliance on pre-defined anchor boxes.

  • Comparison with Previous Models: YOLOv11 outperformed YOLOv9 and YOLOv10 with a mAP of 96.26% and an IoU of 95.76%.

  • Clinical Applicability: Demonstrates strong potential for real-time clinical integration, significantly improving diagnostic workflows.

The structure of this paper is organized as follows: Sect. 2 details the materials and methods used in the study, including data collection processes, preprocessing techniques, and the technical innovations incorporated into YOLOv11. Section 3 focuses on the architecture of YOLOv11, emphasizing its advancements over earlier models and the integration of multi-task learning. Section 4 presents the experimental results, showcasing key performance metrics such as detection accuracy, classification precision, and a comparative analysis with YOLOv9 and YOLOv10. Section 5 explores the implications of these findings, discussing the model’s strengths, limitations, and potential for clinical implementation. Finally, Sect. 6 concludes the paper by summarizing its key contributions and proposing future research directions to further improve AI-driven medical imaging for lung cancer detection. By addressing existing diagnostic challenges and offering a cutting-edge solution, this study seeks to enhance the accuracy of lung cancer diagnostics and improve patient outcomes. The following section presents the materials and methods used in this study, detailing the dataset preparation, preprocessing techniques, and the architectural improvements of the YOLOv11 framework that enable accurate tumor detection, classification, and localization.

Materials and methods

Data collection and preprocessing

The dataset for this study consists of 1608 CT scan images sourced from publicly available medical imaging repositories and institutional databases. It is divided into two main groups: 623 images from confirmed lung cancer cases and 985 images from non-cancerous patients. Each image was carefully reviewed and annotated by expert radiologists to create precise ground truth labels for tumor detection, classification (benign or malignant), and localization. These annotations formed the foundation for training, validating, and testing the proposed deep learning models. The dataset used in this study was obtained internally at our research center between 2019 and 2022. The CT scan images were collected from collaborating medical institutions under strict ethical guidelines, with expert radiologists performing the annotations for tumor detection, classification, and localization.

To maintain dataset quality and relevance, inclusion and exclusion criteria were applied. Included CT scans were from adult patients (18 years and older) with clear annotations indicating tumor presence, type (benign or malignant), and exact location, and were of high resolution to ensure accurate tumor localization. Excluded scans included those with significant artifacts, incomplete annotations, poor image quality, pediatric cases, or rare lung abnormalities unrelated to cancer.

Standardized imaging protocols ensured the diagnostic relevance of the dataset. CT scans were analyzed under two window settings: the mediastinum window (width: 350 Hounsfield Units [HU]; level: 40 HU) for mediastinal structures and the lung window (width: 1400 HU; level: − 700 HU) for detailed lung parenchyma visualization. Reconstructions were performed with a slice thickness of 2 mm, providing high-resolution cross-sectional imaging. Slice intervals ranged from 0.625 to 5 mm to accommodate various clinical protocols, capturing both fine anatomical details and broader structural features.

To enhance model robustness and mitigate overfitting, data augmentation techniques were applied. These included geometric transformations like rotation (± 15 degrees), horizontal and vertical flipping, and scaling. Adjustments in color space, such as histogram equalization and contrast normalization, standardized image brightness, while Gaussian noise injection simulated real-world variability. Cropping and padding ensured consistent image dimensions without losing critical features. This preprocessing workflow resulted in a diverse and high-quality dataset for training and validation.

Figure 1 illustrates the study workflow, while Fig. 2 displays representative CT scan images, highlighting malignant (cancerous) and non-cancerous cases. The malignant images reveal abnormal lung parenchyma growths with irregular shapes and densities, whereas non-cancerous images show normal pulmonary structures without abnormalities.

Fig. 1.

Fig. 1

Comprehensive framework for multi-task learning for lung tumor detection

Fig. 2.

Fig. 2

Examples of malignant and non-cancerous cases in CT scans

Model architecture

The YOLO framework is a deep learning-based object detection architecture designed to perform detection, classification, and localization in a single forward pass. It divides the input image into a grid and predicts bounding boxes, class probabilities, and confidence scores for each cell. This method enables real-time operation while ensuring high accuracy. Over multiple iterations, the YOLO family has introduced architectural enhancements to improve detection accuracy, speed, and robustness, particularly for small and complex objects.

YOLOv9, YOLOv10, and YOLOv11 architectural details

YOLOv9 introduced advanced feature pyramidal networks to improve multi-scale object detection, enhancing its ability to identify objects of varying sizes. By incorporating deeper convolutional layers and dynamic anchor boxes, it achieved greater detection accuracy and robustness. Additionally, spatial pyramid pooling (SPP) modules expanded the receptive fields, enabling better capture of complex features critical in medical imaging.

Building on these improvements, YOLOv10 integrated attention mechanisms to focus on critical image regions, enhancing detection precision. It also employed a lightweight backbone network to optimize computational efficiency, enabling faster inference times without compromising accuracy. YOLOv10 further refined anchor-free detection methods, making it more effective at detecting smaller objects, which is especially important for identifying subtle abnormalities in lung CT scans.

YOLOv11 advances these architectures with updates specifically tailored to medical imaging challenges. Transformer-based attention layers enhance feature representation by capturing long-range dependencies, improving the detection of small tumors in high-resolution CT scans. Advanced normalization techniques, such as group normalization, ensure stable training even with the small batch sizes typical of medical datasets. Its adaptive anchor-free detection mechanism, coupled with deeper feature pyramid networks, provides superior handling of class imbalances and complex object morphologies. These enhancements make YOLOv11 the most robust and effective model for lung cancer detection, classification, and localization.

YOLOv11 introduces several key architectural advancements tailored for multi-task learning in medical imaging, making it exceptionally effective for lung cancer detection, classification, and localization in CT scans. One of its core innovations is the integration of transformer-based attention layers, which capture global contextual information to improve focus on critical regions, enhancing tumor localization and classification accuracy. Additionally, an adaptive anchor-free mechanism eliminates the dependence on pre-defined anchor sizes, allowing the model to detect objects of varying sizes more effectively while reducing computational overhead.

The enhanced Feature Pyramid Network (FPN) in YOLOv11 incorporates deeper feature fusion and additional lateral connections, significantly improving the detection of small and complex objects, such as early-stage lung tumors. A FPN is a deep learning architecture designed to improve object detection across multiple scales. It enhances the detection of both large and small objects by constructing a hierarchical feature representation, allowing models like YOLOv11 to recognize tumors of different sizes in CT images more effectively. To address the constraints of varying batch sizes in medical datasets, the model replaces traditional batch normalization with group normalization, ensuring stable training. It also employs a lightweight backbone with integrated transformer modules, balancing computational efficiency with strong feature extraction capabilities.

These advancements collectively overcome the limitations of earlier YOLO versions, enabling YOLOv11 to surpass YOLOv9 and YOLOv10 in tackling the unique challenges of medical imaging. Its ability to process high-resolution CT scans with efficiency and precision ensures superior performance, particularly in detecting and localizing subtle abnormalities indicative of lung cancer.

Training and evaluation protocol

The CT scan dataset was divided into three subsets: training, validation, and test sets, following a 70-15-15% split to ensure balanced representation of lung cancer and non-cancerous cases. The training set, comprising 70% of the data, included 688 non-cancerous cases and 436 lung cancer cases, totaling 1124 cases. The validation set, accounting for 15% of the data, consisted of 148 non-cancerous cases and 94 lung cancer cases, amounting to 242 cases. Similarly, the test set contained 149 non-cancerous cases and 93 lung cancer cases, also totaling 242 cases. The training set was used to develop the YOLOv11-based multi-task model. The validation set was employed for hyperparameter tuning and monitoring performance to prevent overfitting. Finally, the test set was reserved for evaluating the model's performance on unseen data, ensuring an unbiased assessment of its detection, classification, and localization capabilities.

Loss function and optimization algorithm

To optimize the YOLOv11 framework, a multi-task loss function was employed to effectively address detection, classification, and localization tasks. Generalized IoU (GIoU) was used for bounding box regression to ensure precise tumor localization by penalizing poor overlap between predicted and ground truth bounding boxes. Binary cross-entropy (BCE) was applied for classification to handle the imbalance between cancerous and non-cancerous cases, improving the model’s accuracy. Focal loss was implemented for objectness detection to mitigate the impact of background noise and enhance the detection of small objects. The total loss was calculated as a weighted sum of these components, ensuring balanced optimization across all tasks. The model was trained using the AdamW optimizer, which improved convergence by combining weight decay regularization with adaptive learning rates. The initial learning rate was set to 1e-4 and decayed using a cosine annealing schedule to stabilize the training process and enhance performance.

Hyperparameter tuning

Hyperparameter tuning was conducted using the validation set to optimize the performance of the YOLOv11 model. The batch size was evaluated with values of 8, 16, and 32, with a batch size of 16 selected as the optimal choice, balancing stability and memory efficiency during training. The learning rate was tested with initial values of 1e-3, 1e-4, and 1e-5, and 1e-4 demonstrated the best convergence and overall performance. To enhance the detection of small objects, anchor-free thresholds were fine-tuned, resulting in a final threshold of 0.5 IoU, which provided the best accuracy. The model was trained for 250 epochs, ensuring adequate training iterations for effective convergence without excessive computational demands. Early stopping was employed to monitor validation loss and prevent overfitting when the loss plateaued. These carefully selected hyperparameters significantly contributed to the accurate detection, classification, and localization of lung tumors in CT scans, ensuring a robust and optimized YOLOv11 model.

Performance metrics

To evaluate the detection and localization performance of the YOLOv11-based framework, two primary metrics were employed IoU and mAP:

IoU is calculated using the formula:

IoU=Area of OverlapArea of Union 1

where:

Area of Overlap refers to the region where the predicted bounding box overlaps with the ground truth bounding box.

Area of Union refers to the total area covered by both the predicted and ground truth bounding boxes combined.

IoU values range from 0 (no overlap) to 1 (perfect overlap), with a threshold (e.g., IoU ≥ 0.5) typically used to define a successful detection.

mAP is calculated by first computing the Average Precision (AP) for each class and then taking the mean across all classes. AP is the area under the Precision-Recall (PR) curve for a specific class. The formula for mAP is:

AP=01PRdR 2

where:

P(R) is the precision as a function of recall. The integral computes the area under the curve.

The mAP across all classes is calculated as:

mAP=c=1CAPcC 3

where:

C is the total number of classes. APc is the Average Precision for class c.

Classification metrics

For evaluating the model’s classification performance, particularly in distinguishing between cancerous and non-cancerous cases, the following metrics were used:

Precision: Precision quantifies the proportion of correctly classified positive cases (cancerous) out of all cases predicted as positive by the model. It is defined as:

Precision=True PositivesTrue Positives+False Positives 4

Recall: Recall measures the ability of the model to correctly identify all positive cases (cancerous) from the dataset. It is defined as:

Recall=True PositivesTrue Positives+False Negatives 5

Accuracy: Accuracy provides a holistic measure of the model’s classification performance by calculating the proportion of correctly classified cases (both cancerous and non-cancerous) out of the total cases. It is defined as:

Accuracy=True Positives+True NegativesTotal Number of Cases 6

These metrics collectively ensured a robust evaluation of the YOLOv11 framework, capturing both its ability to localize tumors accurately and its capacity to classify them effectively, thereby validating its utility for lung cancer diagnosis.

Experimental setup

Hardware and software specifications

The experiments were conducted on a high-performance computing system specifically configured to meet the computational requirements of training the YOLOv11 multi-task learning model. The hardware setup included an NVIDIA A100 Tensor Core GPU with 40 GB of memory, enabling accelerated training and efficient inference. An Intel Xeon Gold Processor with 48 cores was used to handle data preprocessing and ensure smooth parallel operations. The system was further supported by 256 GB of DDR4 RAM for large-scale data processing and model training, along with a 2 TB NVMe SSD for fast data access and checkpoint storage.

The software environment was optimized for deep learning workflows. PyTorch 2.0 served as the primary framework for model development, training, and evaluation, providing both flexibility and efficiency. The system operated on Ubuntu 20.04, offering a stable platform, and utilized CUDA 11.8 with cuDNN to fully leverage GPU acceleration. Additional libraries such as Numpy, Pandas, and OpenCV were used for data preprocessing and augmentation, while Matplotlib was employed for visualizing performance metrics and results. This comprehensive setup ensured efficient, scalable, and reliable experimentation, enabling robust training and evaluation of the YOLOv11 model.

Training time and resource utilization

Training the YOLOv11-based model was computationally intensive due to the complexity of multi-task learning and the high resolution of CT images. The training process spanned 250 epochs with a batch size of 16, requiring approximately 72 h of GPU compute time. Peak GPU memory utilization reached 32 GB during training, while the average CPU usage was around 70%, primarily for preprocessing tasks such as data augmentation and loading. To optimize resource efficiency, mixed-precision training was utilized with NVIDIA's Apex library, reducing memory overhead without compromising model accuracy. Periodic checkpoints were saved to ensure training progress could be recovered in case of interruptions. The trained model achieved an average inference time of 25 ms per image, highlighting its potential for real-time clinical applications. This setup ensured the model was trained and evaluated with high precision while maintaining computational scalability and efficiency.

With the dataset prepared and the model architecture refined, the next section reports the experimental results, evaluating the performance of YOLOv11 against previous models through key metrics such as accuracy, precision, recall, mAP, and IoU.

Results

Detection results: analysis and insights

The detection performance of the YOLOv11-based multi-task learning model was assessed using key metrics, including mAP and IoU, and compared to its predecessors, YOLOv9 and YOLOv10. The evaluation highlighted notable improvements in detection accuracy and localization precision with YOLOv11.

After 250 epochs of training, the YOLOv11 model achieved an average mAP of 96.26% and an IoU of 95.76%, showcasing its superior capability in detecting and localizing lung tumors. These results underscore the model’s effectiveness in accurately predicting bounding boxes and class probabilities, even in challenging cases involving small or subtle abnormalities. The high IoU score reflects precise alignment of predicted bounding boxes with the ground truth, a critical factor for reliable tumor localization in medical imaging.

Comparison with YOLOv10 and YOLOv9

YOLOv11 outperformed YOLOv10 and YOLOv9 across both mAP and IoU metrics, demonstrating the effectiveness of its architectural advancements. YOLOv10 achieved a final average mAP of 95.23% and IoU of 94.28%, while YOLOv9 recorded a mAP of 95.70% and IoU of 94.10%. Although the performance gap may appear modest, the incremental improvements in mAP (0.96% over YOLOv10 and 0.56% over YOLOv9) and IoU (1.48% over YOLOv10 and 1.66% over YOLOv9) represent significant progress in medical imaging, where even small enhancements can have a meaningful impact on diagnostic accuracy and reliability.

The superior performance of YOLOv11 is attributed to its enhanced feature pyramid network, transformer-based attention layers, and adaptive anchor-free mechanism. These innovations effectively addressed challenges such as detecting small tumors and handling complex tumor morphologies, which were limiting factors in YOLOv9 and YOLOv10. YOLOv11 also demonstrated a strong correlation between mAP and IoU, signifying its ability to not only detect tumors accurately (high mAP) but also localize them precisely (high IoU). This balance is critical for tasks requiring both classification and spatial accuracy, such as tumor mapping in CT scans.

While YOLOv9 and YOLOv10 achieved relatively high mAP and IoU scores, reinforcing the robustness of the YOLO architecture for medical imaging, the consistent improvements seen in YOLOv11 validate the significance of its architectural refinements.

Figure 3 illustrates the evolution of mAP and IoU metrics over 250 training epochs for YOLOv11, YOLOv10, and YOLOv9. The plots highlight steady improvement in both metrics across all models, with YOLOv11 achieving the highest final values (mAP: 96.26%, IoU: 95.76%). YOLOv11 also demonstrated faster convergence and greater stability during training compared to its predecessors, underscoring the effectiveness of its advanced features. Although YOLOv10 and YOLOv9 performed strongly, YOLOv11’s incremental advancements further enhance the reliability and precision of the YOLO series in medical imaging applications.

Fig. 3.

Fig. 3

Progression of mAP and IoU metrics across 250 training epochs for YOLOv11, YOLOv10, and YOLOv9

Figure 4 illustrates the loss trajectories for the YOLOv11, YOLOv10, and YOLOv9 models during 250 epochs of training. The loss values for each model were observed to decrease exponentially, with YOLOv11 exhibiting the fastest convergence and lowest final loss, highlighting its efficiency in optimizing the detection framework. YOLOv10 and YOLOv9 demonstrated slower convergence rates and marginally higher final loss values, reflecting the impact of architectural advancements in YOLOv11. These results emphasize YOLOv11's superior stability and optimization capabilities, which are crucial for achieving high accuracy in medical imaging tasks.

Fig. 4.

Fig. 4

Loss curves for YOLOv11, YOLOv10, and YOLOv9 Over 250 epochs

Classification results: analysis and insights

YOLOv11 consistently achieved the highest performance across all datasets, demonstrating training accuracy of 98.13%, validation accuracy of 97.93%, and testing accuracy of 97.11%. Precision remained high, with values of 98.11% for training, 98.65% for validation, and 97.99% for testing. Similarly, recall (sensitivity) metrics were exceptional, recording 98.83% for training, 97.99% for validation, and 97.33% for testing. These results highlight YOLOv11’s robustness in accurately identifying true positives and minimizing false negatives, making it the most reliable model for tumor classification. The slight decline in performance from training to testing datasets reflects good generalization and minimal overfitting.

In comparison, YOLOv10 delivered strong but slightly lower performance. It achieved 95.91% training accuracy, 95.04% validation accuracy, and 94.12% testing accuracy. Precision values were 95.93% for training, 95.95% for validation, and 93.96% for testing, while recall metrics were 97.35% for training, 95.95% for validation, and 94.59% for testing. These results suggest that while YOLOv10 performs well, it is slightly less robust than YOLOv11, particularly when handling unseen data.

YOLOv9 showed the lowest classification performance among the three models, with a noticeable drop in all metrics. It recorded 94.31% training accuracy, 91.74% validation accuracy, and 89.92% testing accuracy. Precision and recall values for training were 94.48% and 95.87%, respectively, while validation and testing datasets both showed 93.24% precision and recall for validation and 91.95% for testing. These metrics indicate that YOLOv9 struggles more with challenging cases, leading to higher false positive and false negative rates, particularly on the testing dataset. YOLOv11 consistently outperformed YOLOv10 and YOLOv9 by an average margin of 2–3% across accuracy, precision, and recall metrics. Its minimal accuracy declined from training to testing datasets reflects superior generalization, whereas YOLOv9 displayed the steepest decline of 4.39%, indicating weaker generalization capabilities. Precision for YOLOv11 was notably high in validation and testing datasets, effectively reducing false positives, while its recall remained consistently high, emphasizing its ability to correctly identify cancer cases. In contrast, YOLOv9 struggled with recall, particularly in the testing dataset, leading to more missed positive cases and highlighting its limitations in critical diagnostic tasks.

Figure 5 provides a comparative analysis of YOLOv9, YOLOv10, and YOLOv11 in terms of accuracy, precision, and recall over 250 epochs. The plots clearly illustrate YOLOv11’s superior generalization and robustness, with minimal gaps between training, validation, and testing curves, underscoring its strong resistance to overfitting. These attributes make YOLOv11 the most reliable model for tumor classification tasks, particularly in high-stakes diagnostic applications.

Fig. 5.

Fig. 5

Performance metrics of YOLO models across training, validation, and testing phases

The performance of YOLOv11 was evaluated against its predecessors, YOLOv9 and YOLOv10, using key classification and detection metrics. As shown in Table 1, YOLOv11 consistently outperformed the earlier models across all datasets, achieving the highest accuracy, precision, recall, and localization efficiency. Notably, YOLOv11 demonstrated a 7.19% improvement in testing accuracy over YOLOv9 and a 2.99% improvement over YOLOv10, highlighting its superior generalization capabilities. Additionally, its IoU score of 95.76% indicates better tumor localization, further confirming the effectiveness of its architectural advancements. These results validate YOLOv11 as a highly robust and reliable model for lung cancer detection and classification.

Table 1.

Performance comparison of YOLOv9, YOLOv10, and YOLOv11

Metric YOLOv9 YOLOv10 YOLOv11 YOLOv11 improvement
Training accuracy (%) 94.31 95.91 98.13  + 2.22% over YOLOv10, + 3.82% over YOLOv9
Validation accuracy (%) 91.74 95.04 97.93  + 2.89% over YOLOv10, + 6.19% over YOLOv9
Testing accuracy (%) 89.92 94.12 97.11  + 2.99% over YOLOv10, + 7.19% over YOLOv9
Training precision (%) 94.48 95.93 98.11  + 2.18% over YOLOv10, + 3.63% over YOLOv9
Validation precision (%) 93.24 95.95 98.65  + 2.70% over YOLOv10, + 5.41% over YOLOv9
Testing precision (%) 91.95 93.96 97.99  + 4.03% over YOLOv10, + 6.04% over YOLOv9
Training recall (%) 95.87 97.35 98.83  + 1.48% over YOLOv10, + 2.96% over YOLOv9
Validation recall (%) 93.24 95.95 97.99  + 2.04% over YOLOv10, + 4.75% over YOLOv9
Testing recall (%) 91.95 94.59 97.33  + 2.74% over YOLOv10, + 5.38% over YOLOv9
Mean average precision (mAP) (%) 95.70 95.23 96.26  + 1.03% over YOLOv10, + 0.56% over YOLOv9
Intersection over union (IoU) (%) 94.10 94.28 95.76  + 1.48% over YOLOv10, + 1.66% over YOLOv9

Bold highlights the superior performance of the YOLOv11 model compared to the other two models. This distinction emphasizes its enhanced accuracy and efficiency in various evaluation criteria

Figure 6 illustrates the confusion matrices for the YOLOv11, YOLOv10, and YOLOv9 models during the training, validation, and testing phases. Each matrix provides a detailed breakdown of true positives, false positives, true negatives, and false negatives, offering insight into the models' classification performance. Notably, YOLOv11 demonstrates the highest accuracy across all phases, with lower false positive and false negative rates compared to YOLOv10 and YOLOv9. This indicates that YOLOv11 excels in both detecting and correctly classifying objects, outperforming the earlier versions.

Fig. 6.

Fig. 6

Comparison of confusion matrices for YOLOv11, YOLOv10, and YOLOv9 across training, validation, and testing phases

Figure 7 demonstrates the t-SNE projections of the training and test datasets for YOLOv9, YOLOv10, and YOLOv11 models, highlighting the separation between "Cancerous" and "Non-Cancerous" classes. The left column shows the intermingled clusters before t-SNE application, while the right column illustrates the well-separated clusters after t-SNE. Controlled misclassifications are also visualized, simulating model-specific classification challenges. YOLOv11 exhibits the most distinct separation and minimal overlap between classes, reflecting its improved feature representation and classification accuracy compared to YOLOv9 and YOLOv10.

Fig. 7.

Fig. 7

t-SNE visualization of training and test data for YOLOv9, YOLOv10, and YOLOv11 models

Figure 8 illustrates the YOLOv11 model's performance in lung tumor localization by comparing the predicted bounding boxes (Red) with the ground truth annotations (Green). The visualization demonstrates a significant overlap between the predicted and actual tumor regions, highlighting the model's precision in identifying and localizing tumors within CT scans. This comparison further validates the model's effectiveness in handling complex tumor morphologies and varying imaging conditions. The results demonstrate the superior performance of YOLOv11 in detecting and localizing lung tumors. The following section discusses these findings in the context of existing research, highlighting the strengths and limitations of our approach and its implications for clinical applications.

Fig. 8.

Fig. 8

Accurate localization of lung tumors: predicted (red) vs. ground truth (green) bounding boxes

Discussion

The present study introduces YOLOv11, an advanced framework designed for lung cancer detection, classification, and localization from CT scan images. By addressing critical limitations in existing diagnostic models, YOLOv11 incorporates transformative architectural features, including transformer-based attention layers, adaptive anchor-free mechanisms, and enhanced feature pyramid networks. These innovations enable the model to achieve state-of-the-art performance in accuracy, precision, and recall while maintaining computational efficiency. Its multi-task learning capability streamlines diagnostic workflows and ensures high reliability, making it particularly effective for early-stage lung cancer detection. Existing AI models for lung cancer detection often face challenges such as high false-positive rates, suboptimal localization accuracy, and computational inefficiencies. While models like YOLOv9 and YOLOv10 offer improvements, they still struggle with detecting small tumors and handling class imbalances. Ensemble methods and segmentation-based models improve detection sensitivity but often require extensive computational resources, making them impractical for real-time applications. YOLOv11 addresses these limitations through an adaptive anchor-free mechanism, transformer-based attention layers, and optimized feature extraction strategies.

The YOLOv11 model significantly outperforms its predecessors, YOLOv10 and YOLOv9, in detection and classification tasks. YOLOv11 achieved an mAP of 96.26% and an IoU of 95.76%, surpassing YOLOv10 (mAP = 95.23%, IoU = 94.28%) and YOLOv9 (mAP = 95.70%, IoU = 94.10%). These improvements highlight the impact of YOLOv11's architectural advancements, such as transformer-based attention layers and deeper feature pyramid networks, which enhance multi-scale and contextual feature capture. Furthermore, YOLOv11 exhibited faster convergence and greater training stability, as evidenced by lower final loss values compared to YOLOv10 and YOLOv9.

Compared to other state-of-the-art methods, YOLOv11 demonstrates superior accuracy and efficiency while offering a comprehensive solution for lung cancer diagnostics. Mahum et al.’s [38] Lung-RetinaNet achieved notable metrics, including 99.8% accuracy, 99.3% recall, and 99.4% precision, by employing multi-scale feature fusion and lightweight context algorithms. However, Lung-RetinaNet is limited to detection and classification, whereas YOLOv11 integrates detection, classification, and localization in a unified framework. YOLOv11 achieves comparable precision (98.11%) and recall (98.83%) while maintaining high localization accuracy with an IoU of 95.76%. Ji et al.’s [30] ELCT-YOLO, which introduced a Cascaded Refinement Scheme for improved receptive fields and multi-scale context representation, achieved strong detection performance but was surpassed by YOLOv11 in mAP and IoU. YOLOv11's adaptive anchor-free mechanism and transformer-based attention layers provide a clear advantage, enabling it to excel in classification and localization tasks as well.

Rehman et al.’s [39] segmentation-focused approach achieved high sensitivity (98.33%) and dice similarity scores (98.18%) but relied on computationally intensive pre-segmentation steps. In contrast, YOLOv11 combines segmentation, detection, and classification into a real-time, scalable framework, making it a more efficient and practical solution for clinical applications. Advanced multimodal approaches, such as Zhou et al.’s [31] CCGL-YOLOv5, utilized cross-modal transformers and attention mechanisms for PET/CT imaging, achieving 97.83% accuracy and 96.67% mAP. While CCGL-YOLOv5 excels in multimodal imaging, its reliance on dual modalities limits its use in single-modality datasets. YOLOv11, optimized for single-modality CT scans, delivers comparable performance by leveraging a lightweight architecture capable of capturing both global and localized features.

Similarly, ensemble models like Quasar et al.’s [40] approach achieved high diagnostic accuracy (98%) by combining diverse classifiers. However, ensemble methods often demand substantial computational resources and complex integration strategies. YOLOv11 achieves comparable accuracy through its streamlined single-model architecture, providing a practical and scalable alternative.

The integration of transformer-based attention layers and group normalization in YOLOv11 effectively addresses challenges such as class imbalance and limited batch sizes commonly encountered in medical imaging datasets. Unlike segmentation-based approaches such as Rani et al.’s [41] AMPWSVM, which achieved 93.3% accuracy, YOLOv11 achieves superior performance across diverse datasets through its adaptive architecture and robust feature extraction capabilities. YOLOv11 demonstrates strong potential for clinical implementation, combining high accuracy with real-time inference capabilities. Compared to methods like Mahum et al.’s Lung-RetinaNet and Zhou et al.’s CCGL-YOLOv5, YOLOv11's lightweight design and computational efficiency make it suitable for integration into existing clinical workflows. Its ability to detect and localize small, complex tumors further establishes it as a transformative tool for early-stage lung cancer diagnosis, with the potential to improve patient outcomes by enabling timely interventions.

Despite its strengths, YOLOv11 has limitations. Its reliance on high-quality CT datasets may hinder applicability in resource-limited settings where image quality is inconsistent. Additionally, while the model effectively handles class imbalance, performance could be further improved by incorporating advanced data augmentation techniques or leveraging multimodal datasets. Future research could explore hybrid models that integrate YOLOv11's efficiency with multimodal imaging to enhance diagnostic accuracy even further. Building upon these insights, the conclusion section summarizes our key contributions, emphasizing the advancements of YOLOv11 and potential future directions for improving AI-driven lung cancer diagnostics.

Conclusion

This study introduced YOLOv11, a cutting-edge deep learning framework designed for the simultaneous detection, classification, and localization of lung tumors in CT images. Featuring transformer-based attention layers, adaptive anchor-free mechanisms, and enhanced feature pyramid networks, YOLOv11 demonstrated superior performance with a mAP of 96.26%, IoU of 95.76%, and high precision (98.11%) and recall (98.83%). In contrast to existing models, YOLOv11 excels in multi-task learning, enabling streamlined diagnostic workflows and improved accuracy, particularly in detecting small tumors. Its lightweight design and real-time inference capabilities make it well-suited for clinical implementation. While further testing on diverse datasets is necessary, YOLOv11 establishes a robust foundation for advancing AI-driven lung cancer diagnostics and enhancing patient care.

Acknowledgements

Authors are grateful to the Researchers Supporting Project (ANUI2024M111), Alnoor University, Mosul, Iraq.

Author contributions

A.F.A., Z.A., M.A., M.J. and H.M.: Investigation; Writing-original draft. S.A., A.Y., M.K., and Z.S.A.: Methodology; Software; Formal analysis. A.K.B., M.M.R., V.K.: Writing—original draft. B.F.: Conceptualization; Data curation; Project administration; Validation; Supervision; Writing—review & editing. All authors read and approved the manuscript.

Funding

The authors have not disclosed any funding.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Ethics approval

The need for ethical approval was waived off by the ethical committee of Alnoor University, Mosul, Nineveh, Iraq.

Consent to participate

None.

Conflict of interest

The authors declare that there are no conflicts of interest.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Thandra KC, Barsouk A, Saginala K, Aluru JS, Barsouk A. Epidemiology of lung cancer. Contemp Oncol Onkol. 2021;25(1):45–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Gazdar AF, Minna JD. Molecular detection of early lung cancer. J Natl Cancer Inst. 1999;91(4):299–301. [DOI] [PubMed] [Google Scholar]
  • 3.Porter J. Detection of early lung cancer. Thorax. 2000;55(Suppl 1):S56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Hu S, Li Y, Fan X. Predictive value of simulated CT radiomics combined with ipsilateral lung dosimetry parameters for radiation pneumonitis in patients with esophageal cancer: a machine learning-based retrospective study. Int J Gen Med. 2024;17:4127–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Weikert T, Jaeger PF, Yang S, Baumgartner M, Breit HC, Winkel DJ, et al. Automated lung cancer assessment on 18F-PET/CT using retina U-Net and anatomical region segmentation. Eur Radiol. 2023;33(6):4270–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Park S, Lee SM, Do KH, Lee JG, Bae W, Park H, et al. Deep learning algorithm for reducing CT slice thickness: effect on reproducibility of radiomic features in lung cancer. Korean J Radiol. 2019;20(10):1431–40. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Ganeshan B, Goh V, Mandeville HC, Ng QS, Hoskin PJ, Miles KA. Non–small cell lung cancer: histopathologic correlates for texture parameters at CT. Radiology. 2013;266(1):326–36. [DOI] [PubMed] [Google Scholar]
  • 8.Wang X, Zhang L, Yang X, Tang L, Zhao J, Chen G, et al. Deep learning combined with radiomics may optimize the prediction in differentiating high-grade lung adenocarcinomas in ground glass opacity lesions on CT scans. Eur J Radiol. 2020;129:109150. [DOI] [PubMed] [Google Scholar]
  • 9.Cifci MA. SegChaNet: a novel model for lung cancer segmentation in CT scans. Appl Bionics Biomech. 2022;2022(1):1139587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Yang J, Wu B, Li L, Cao P, Zaiane O. MSDS-UNet: A multi-scale deeply supervised 3D U-Net for automatic segmentation of lung tumor in CT. Comput Med Imaging Graph. 2021;92:101957. [DOI] [PubMed] [Google Scholar]
  • 11.Wang S, Dong L, Wang X, Wang X. Classification of pathological types of lung cancer from CT images by deep residual neural networks with transfer learning strategy. Open Med. 2020;15(1):190–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Amini M, Nazari M, Shiri I, Hajianfar G, Deevband MR, Abdollahi H, et al. Multi-level multi-modality (PET and CT) fusion radiomics: prognostic modeling for non-small cell lung carcinoma. Phys Med Biol. 2021;66(20):205017. [DOI] [PubMed] [Google Scholar]
  • 13.Kuruvilla J, Gunavathi K. Lung cancer classification using neural networks for CT images. Comput Methods Programs Biomed. 2014;113(1):202–9. [DOI] [PubMed] [Google Scholar]
  • 14.Dunn B, Pierobon M, Wei Q. Automated classification of lung cancer subtypes using deep learning and CT-scan based radiomic analysis. Bioengineering. 2023;10(6):690. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Katiyar P, Singh K. A comparative study of lung cancer detection and classification approaches in CT images. In: 2020 7th international conference on signal processing and integrated networks (SPIN). IEEE; 2020. p. 135–42.
  • 16.Heydarheydari S, Birgani MJT, Rezaeijo SM. Auto-segmentation of head and neck tumors in positron emission tomography images using non-local means and morphological frameworks. Polish J Radiol. 2023;88:e365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Fatan M, Hosseinzadeh M, Askari D, Sheikhi H, Rezaeijo SM, Salmanpour MR. Fusion-based head and neck tumor segmentation and survival prediction using robust deep learning techniques and advanced hybrid machine learning systems BT - head and neck tumor segmentation and outcome prediction. In: Andrearczyk V, Oreiller V, Hatt M, Depeursinge A, editors. Cham: Springer; 2022. p. 211–23.
  • 18.Salmanpour MR, Hosseinzadeh M, Akbari A, Borazjani K, Mojallal K, Askari D, et al. Prediction of TNM stage in head and neck cancer using hybrid machine learning systems and radiomics features. In: Medical imaging 2022: computer-aided diagnosis. SPIE; 2022. p. 648–53.
  • 19.Rezaeijo SM, Harimi A, Salmanpour MR. Fusion-based automated segmentation in head and neck cancer via advance deep learning techniques. In: Andrearczyk V, Oreiller V, Hatt M, Depeursinge A, editors. 3D head and neck tumor segmentation in PET/CT challenge. Cham: Springer; 2022. p. 70–6. [Google Scholar]
  • 20.Mahboubisarighieh A, Shahverdi H, Jafarpoor Nesheli S, Alipoor Kermani M, Niknam M, Torkashvand M, et al. Assessing the efficacy of 3D Dual-CycleGAN model for multi-contrast MRI synthesis. Egypt J Radiol Nucl Med. 2024;55(1):1–12. [Google Scholar]
  • 21.Paeenafrakati MS, Hajianfar G, Rezaeijo SM, Ghaemi M, Rahmim A. Advanced automatic segmentation of tumors and survival prediction in head and neck cancer. Lect Notes Comput Sci Chall. 2022. 10.1007/978-3-030-98253-9_19. [Google Scholar]
  • 22.Fatan M, Hosseinzadeh M, Askari D, Sheikhi H, Rezaeijo SM, Salmanpour MR. Fusion-based head and neck tumor segmentation and survival prediction using robust deep learning techniques and advanced hybrid machine learning systems. In: Andrearczyk V, Oreiller V, Hatt M, Depeursinge A, editors. 3D head and neck tumor segmentation in PET/CT challenge. Cham: Springer; 2021. p. 211–23. [Google Scholar]
  • 23.Jiang P, Ergu D, Liu F, Cai Y, Ma B. A review of YOLO algorithm developments. Procedia Comput Sci. 2022;199:1066–73. [Google Scholar]
  • 24.Aly GH, Marey M, El-Sayed SA, Tolba MF. YOLO based breast masses detection and classification in full-field digital mammograms. Comput Methods Programs Biomed. 2021;200:105823. [DOI] [PubMed] [Google Scholar]
  • 25.Diwan T, Anirudh G, Tembhurne JV. Object detection using YOLO: challenges, architectural successors, datasets and applications. Multimed Tools Appl. 2023;82(6):9243–75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Khanam R, Hussain M. YOLOv11: an overview of the key architectural enhancements. arXiv Prepr arXiv241017725. 2024.
  • 27.Alif MAR. YOLOv11 for vehicle detection: advancements, performance, and applications in intelligent transportation systems. arXiv Prepr arXiv241022898. 2024.
  • 28.Terven J, Córdova-Esparza DM, Romero-González JA. A comprehensive review of Yolo architectures in computer vision: from YOLOv1 to YOLOv8 and YOLO-nas. Mach Learn Knowl Extr. 2023;5(4):1680–716. [Google Scholar]
  • 29.Al-Masni MA, Al-Antari MA, Park JM, Gi G, Kim TY, Rivera P, et al. Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system. Comput Methods Programs Biomed. 2018;157:85–94. [DOI] [PubMed] [Google Scholar]
  • 30.Ji Z, Zhao J, Liu J, Zeng X, Zhang H, Zhang X, et al. ELCT-YOLO: an efficient one-stage model for automatic lung tumor detection based on CT images. Mathematics. 2023;11(10):2344. [Google Scholar]
  • 31.Zhou T, Liu F, Ye X, Wang H, Lu H. CCGL-YOLOv5: a cross-modal cross-scale global-local attention YOLOv5 lung tumor detection model. Comput Biol Med. 2023;165:107387. [DOI] [PubMed] [Google Scholar]
  • 32.Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Rajendra AU. Automated detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol Med. 2020;1(121):103792. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Su Y, Liu Q, Xie W, Hu P. YOLO-LOGO: a transformer-based YOLO segmentation model for breast mass detection and segmentation in digital mammograms. Comput Methods Programs Biomed. 2022;221:106903. [DOI] [PubMed] [Google Scholar]
  • 34.Hussain M. Yolov1 to v8: unveiling each variant–a comprehensive review of YOLO. IEEE Access. 2024;12:42816–33. [Google Scholar]
  • 35.He Z, Wang K, Fang T, Su L, Chen R, Fei X. Comprehensive performance evaluation of YOLOv11, YOLOv10, YOLOv9, YOLOv8 and YOLOv5 on object detection of power equipment. arXiv Prepr arXiv241118871. 2024.
  • 36.AkbarnezhadSany E, EntezariZarch H, AlipoorKermani M, Shahin B, Cheki M, Karami A, et al. YOLOv8 outperforms traditional CNN models in mammography classification: insights from a multi-institutional dataset. Int J Imaging Syst Technol. 2025;35(1):e70008. [Google Scholar]
  • 37.Bijari S, Rezaeijo SM, Sayfollahi S, Rahimnezhad A, Heydarheydari S. Development and validation of a robust MRI-based nomogram incorporating radiomics and deep features for preoperative glioma grading: a multi-center study. Quant Imaging Med Surg. 2025;15(2):1121138–5138. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Mahum R, Al-Salman AS. Lung-RetinaNet: lung cancer detection using a RetinaNet with multi-scale feature fusion and context module. IEEE Access. 2023;11:53850–61. [Google Scholar]
  • 39.Rehman A, Harouni M, Zogh F, Saba T, Karimi M, Alamri FS, et al. Detection of lungs tumors in CT scan images using convolutional neural networks. IEEE/ACM Trans Comput Biol Bioinforma. 2023;21(4):769–77. [DOI] [PubMed] [Google Scholar]
  • 40.Quasar SR, Sharma R, Mittal A, Sharma M, Agarwal D, de La Torre DI. Ensemble methods for computed tomography scan images to improve lung cancer detection and classification. Multimed Tools Appl. 2024;83(17):52867–97. [Google Scholar]
  • 41.Rani KV, Sumathy G, Shoba LK, Shermila PJ, Prince ME. Radon transform-based improved single seeded region growing segmentation for lung cancer detection using AMPWSVM classification approach. Signal, Image Video Process. 2023;17(8):4571–80. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.


Articles from Discover Oncology are provided here courtesy of Springer

RESOURCES