Scientific Reports. 2025 Sep 25;15:32827. doi: 10.1038/s41598-025-16725-8

Steel surface defect detection algorithm based on improved YOLOv10

Laomo Zhang 1, Zhike Wang 2, Ying Ma 1, Guowei Li 2
PMCID: PMC12464263  PMID: 40998876

Abstract

In recent years, steel surface defect detection based on machine vision has attracted significant attention and has emerged as a research hotspot. However, several challenges remain. In practical industrial scenarios, deep learning-based detection methods often involve high computational complexity, which limits their applicability for real-time defect monitoring. Moreover, due to the complex and noisy background of steel surfaces, conventional deep learning networks frequently suffer from the loss of critical defect features during the feature extraction process. To address these challenges, this paper proposes a novel latent-space attention multi-scale YOLOv10n model (LAM-YOLOv10n). First, a lightweight ghost module is integrated to significantly reduce the model’s parameter count and computational cost. Second, a spatial multi-scale attention (SMA) module is designed to enhance the extraction of discriminative features related to steel surface defects. Finally, a multi-branch feature fusion network (MFFN) is introduced to improve the effectiveness of multi-scale feature aggregation, thereby enhancing the model’s detection performance for various defect types. Experimental results demonstrate that the proposed LAM-YOLOv10n model achieves a 3.47% improvement in precision compared with the baseline YOLOv10n network, outperforming several state-of-the-art object detection models in both accuracy and efficiency. These findings indicate the effectiveness and practicality of the proposed method for real-time steel surface defect detection in complex industrial environments.

Keywords: Multi-branch feature fusion network, YOLOv10, Spatial multi-scale attention, Steel surface defect detection

Subject terms: Computer science, Information technology, Software

Introduction

Steel, as a critical industrial material, is widely used in infrastructure construction, industrial manufacturing, transportation and logistics, and shipbuilding, serving as the cornerstone of national economic growth and technological progress [1-3]. Due to its excellent mechanical properties, weldability, durability, and cost-effectiveness, steel plays a crucial role in ensuring the reliability and service life of critical structures and systems. The performance and quality of steel components significantly impact the structural integrity and financial outcomes of engineering projects, affecting not only operational safety but also long-term maintenance costs and lifecycle efficiency. However, during complex multi-stage manufacturing processes, such as rolling, heat treatment, and surface treatment, various defects may appear on the steel surface, as shown in Fig. 1. These defects can act as stress concentration points, reducing fatigue strength and ultimately affecting the mechanical properties of steel products. If not detected and addressed promptly, such defects may lead to serious safety hazards, equipment failures, or even structural collapse in high-risk applications. Therefore, achieving precise, rapid, and automated identification of surface defects is of critical importance to the mechanical manufacturing industry [4-6].

Fig. 1.

Fig. 1

Examples of types of steel surface defects.

Approaches for identifying steel surface defect types can generally be divided into three main categories: human visual inspection [7-9], conventional photoelectric techniques [10-12], and modern machine vision-based detection systems [13-15]. Among these, the first two, although relatively simple, face challenges such as elevated labor expenses and considerable inconsistencies due to human subjectivity. These shortcomings result in extended inspection durations, decreased operational efficiency, higher instances of misclassification, and diminished detection precision.

With the development of deep learning, an increasing number of scholars have focused on steel defect detection. However, most existing models suffer from high computational complexity and insufficient feature fusion capability, which poses significant challenges for classifying steel surface defects with computer vision technology. As a representative network in the YOLO series, YOLOv10 [16] achieves a better balance between speed and accuracy by introducing Neural Architecture Search (NAS) and an improved decoupled head structure. Therefore, this paper proposes a reliable and accurate defect detection algorithm using YOLOv10 as the baseline network. The main contributions of this paper are as follows:

  1. This paper proposes LAM-YOLOv10n, a steel defect detection method that combines the original YOLOv10n network with the Ghost module, the MFFN module, and the SMA module. The model can extract defect-type feature information against the complex background of steel surfaces and meets the detection requirements of practical industrial scenarios.

  2. To address the low defect detection precision caused by the complex background of steel surfaces, this paper proposes the SMA module, which is used in the backbone feature extraction network to extract defect feature information from the steel surface. The module also employs multi-scale connections to avoid losing defect feature information during extraction, thereby improving the overall performance and recognition precision of the network model.

  3. To address the issue of target feature loss during the training phase of the model, this paper introduces the MFFN module, which is designed to fuse fine-grained feature information related to steel surface defects. This enhancement aims to strengthen the image recognition model’s comprehension of defect scenarios on steel surfaces, thereby improving both the accuracy and robustness of the recognition process.

Related work

A substantial body of research has investigated the use of conventional machine learning methods for detecting defects on steel surfaces. Xu et al. [17] employed a multiscale geometric processing framework that segmented steel surface imagery into directional elements across various resolutions. From these, high-dimensional descriptors were extracted and subsequently transformed into compact representations using graph-based dimensionality reduction, thereby enabling effective defect categorization. Hu et al. [18] developed an integrated strategy that fused backpropagation neural networks with support vector machines. In their approach, defect images underwent binarization to simplify the extraction of relevant features and assist in defect classification. Liu et al. [19] presented a fully trainable extreme learning machine architecture for recognizing defects, with local binary patterns serving as the primary feature representation. The framework autonomously aggregated results from multiple independent submodules to determine defect types. Experimental results demonstrated that this method outperformed several traditional techniques in steel surface defect recognition tasks.

Driven by the swift evolution of computer vision, digital imaging, and artificial intelligence, steel surface defect detection is steadily moving toward greater levels of automation and intelligence. The integration of ultra-high-definition cameras, sophisticated visual processing techniques, and deep neural networks has significantly advanced the automatic identification and categorization of surface imperfections in steel. Soukup et al. [20] utilized a convolutional neural network (CNN) trained in a fully labeled environment to boost detection accuracy, further optimizing performance through regularization methods. Yi et al. [21] introduced an approach that segments faulty regions using a symmetric surrounded saliency mechanism combined with a CNN; this framework bypasses the manual feature extraction common in earlier techniques, improving both speed and reliability. Damacharla et al. [22] implemented a hybrid deep learning design by embedding ResNet and DenseNet modules into the encoder of a UNet architecture, resulting in enhanced precision in identifying surface anomalies on steel products. Uraon et al. [23] developed a specialized neural network aimed at identifying multiple defect types within intricate background environments; experimental findings indicated strong performance in steel surface defect recognition. Bouguettaya et al. [24] designed a composite architecture by merging MobileNet-V2 with Xception under a transfer learning paradigm. Using a deep ensemble methodology, their model preserved the advantage of rapid inference while mitigating the typically large size of conventional deep learning networks, achieving promising outcomes in defect identification tasks. Akhyar et al. [25] incorporated deformable convolution layers and adaptive RoI pooling into a cascaded R-CNN framework, enhancing the network's responsiveness to irregular object geometries; the application of stochastic and limit-based scaling mechanisms further refined the model's precision in handling detailed target characteristics. Xia et al. [26] advanced the YOLOv5s framework by introducing a novel large-kernel C3 module, which strengthened perceptual efficiency and detailed feature representation in visually complex textures; additionally, a tailored training approach leveraging multi-scale feature maps aligned with convolution kernels of varying dimensions improved the model's flexibility in detecting defects with diverse morphological attributes. Raj et al. [27] presented the YOLOv7-CSF model, which integrates a lightweight, cost-effective coordinated attention module into the prediction head of the YOLOv7 framework; the SCYLLA-IoU loss function further improved detection performance and computational efficiency in steel surface defect identification. Huang et al. [28] proposed an enhanced architecture named WFE-YOLOv8s, derived from the YOLOv8s model, in which the traditional C2F module was replaced with a novel CFN design. This replacement reduced both model parameters and computational complexity (measured in GFLOPs), while the incorporation of EMA attention further boosted detection precision. He et al. [29] developed an improved defect detection network also built upon the YOLOv8s backbone. Their method utilized a foundational convolutional network to extract hierarchical feature representations at different layers; these representations were aggregated via a multiscale fusion mechanism, followed by a proposal generation module that localizes potential defect regions. Experimental validation showed that the approach excelled at accurately identifying a wide range of surface anomalies on steel.
Amin et al. [30] advanced the detection of steel defects by constructing a machine learning framework capable of recognizing hierarchical defect patterns from sample steel plate imagery and categorizing them into appropriate classes, thereby enhancing overall classification performance. Tabernik et al. [31] introduced a segmentation-driven deep neural network tailored for identifying and isolating surface anomalies, which demonstrates strong learning capabilities even when trained on limited datasets. Demir et al. [32] presented an innovative deep learning technique for pinpointing and categorizing defect types arising during the steel manufacturing process; their approach employs concurrent training of residual blocks alongside attention modules to capture rich, discriminative features. Li et al. [33] proposed a neural model that embeds a multi-scale representation learning mechanism for improved defect identification. By combining hierarchical feature extraction with a streamlined fusion strategy, the model significantly boosts detection accuracy while keeping the number of parameters relatively small.

Overall structure of LAM-YOLOv10n model

The LAM-YOLOv10n model is specifically developed to achieve high-precision detection of steel surface defects from images captured in a top-down perspective. The primary objective of the proposed model is to accurately localize defect regions and classify various types of surface anomalies, thereby reducing false positives and enhancing the overall reliability of defect recognition in real industrial environments. To achieve this goal, the LAM-YOLOv10n model focuses on fully mining and utilizing the image feature information in and around the defect region to improve the ability to differentiate defect types. By jointly analyzing local defect characteristics and global surface textures, the model effectively enhances its discriminative capability, enabling precise classification among subtle and visually similar defect types. In addition, the LAM-YOLOv10n model adopts an optimized network structure and feature extraction pipeline to maintain high detection precision and real-time performance in complex industrial environments.

The overall network framework, as illustrated in Fig. 2, highlights the synergistic interaction among key modules, including the Backbone module, Neck module, and Head module. These components collectively contribute to robust and real-time steel surface defect detection. Among them, the Backbone module serves to extract the defect type feature information in the steel surface image; the Neck module realizes the effective fusion of the deep and shallow feature information of the defect type; and the Head module is responsible for analyzing the steel image data and detecting the defects.

Fig. 2.

Fig. 2

LAM-YOLOv10n network structure.

The actual production environment of steel is often characterized by blurred visual characteristics of defect targets due to lighting, contrast, and other factors, further increasing the difficulty of defect classification and detection. In addition, defect targets on steel surfaces are often small, and complex production backgrounds and harsh environmental conditions make it harder to detect and localize defect features. To address these challenges, the LAM-YOLOv10n model is optimized in several ways. First, the data preprocessing module expands the industrial steel surface defect data to provide more sufficient and complete training data for the network model; second, the model introduces the SMA module, which uses cross-channel interaction of steel surface defect feature information and is specifically designed to focus on tiny defect targets on the steel surface. Finally, adding the MFFN structure to the output of the Neck module allows feature information of different depths to be fused more effectively, improving the detection of deep and shallow defects and thus better localizing and classifying steel surface defect types. Through these improvements, the LAM-YOLOv10n model significantly improves the accuracy and stability of defect detection and effectively addresses the practical problems of defect recognition on steel surfaces.

Introducing ghost module

Inspired by the literature [34], and to keep the parameter scale of the original YOLOv10n model in balance, this paper utilizes the Ghost module to optimize the feature extraction network. The Ghost module, shown in Fig. 3, balances spatial information retention against network complexity to achieve a faster and more efficient image recognition model with better performance and usability in practical applications.

Fig. 3.

Fig. 3

Schematic showing the Ghost module.

Specifically, the Ghost module exploits the observation that many feature maps produced by ordinary convolution are similar to one another, so generating all of them with full convolutions wastes parameters without improving overall performance; the main resource cost lies in the convolution filters. Given an input of width w, height h, and c channels, the cost of an ordinary convolution generating the output feature map is proportional to h × w × c × (kernel size)² × (number of output channels). The Ghost module instead obtains a subset of the feature maps through an ordinary convolution and then applies cheap linear transformations to generate the remaining ones, as shown in Eq. (1):

$y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \ldots, m, \; j = 1, \ldots, s$  (1)

where $\Phi_{i,j}$ is the j-th linear transformation applied to $y'_i$, the i-th feature map obtained by the ordinary convolution operation. The total number of feature maps produced (intrinsic maps plus their linear transformations) matches the number produced by the ordinary convolution. That is, for the same number of output feature maps, the computational cost of the Ghost module is much smaller than that of the ordinary convolution.

The GhostBottleneck structure primarily consists of two Ghost modules and a residual connection, stacked in one of two specific ways. As shown in Fig. 3, the GhostBottleneck with a stride of 1 is composed of two Ghost modules connected in series: for the input features, the first Ghost module expands the number of feature channels, and the second compresses them so that the number of output channels matches the number of input channels. A residual connection then adds the input feature map to the output feature map. The GhostBottleneck with a stride of 2 extends the stride-1 structure by inserting an additional convolution module with a stride of 2.
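To make the cost argument above concrete, the following pure-Python sketch counts multiply-accumulate (MAC) operations for an ordinary convolution and for a Ghost module. The layer sizes and the cheap-transformation kernel size d are illustrative choices of our own, not values from the paper.

```python
def conv_cost(h, w, c_in, k, c_out):
    """MACs for an ordinary k x k convolution producing an h x w x c_out output."""
    return h * w * c_out * c_in * k * k

def ghost_cost(h, w, c_in, k, c_out, s=2, d=3):
    """MACs for a Ghost module: a primary convolution produces c_out / s
    intrinsic maps, then (s - 1) cheap d x d linear ops generate the rest."""
    m = c_out // s                        # intrinsic feature maps
    primary = h * w * m * c_in * k * k    # ordinary convolution part
    cheap = h * w * m * (s - 1) * d * d   # cheap linear transformations
    return primary + cheap

# Illustrative layer: 32x32 spatial size, 64 -> 128 channels, 3x3 kernels.
ordinary = conv_cost(32, 32, 64, 3, 128)
ghost = ghost_cost(32, 32, 64, 3, 128, s=2, d=3)
ratio = ordinary / ghost                  # approaches s = 2 for large c_in
```

For these sizes the Ghost module needs roughly half the operations of the ordinary convolution, matching the intuition that the speedup approaches the compression factor s.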

Introducing SMA module

In order to locate and detect the target feature information of steel surface defects more accurately, this paper proposes the SMA module. This module aims to establish the dependency relationship between the feature information of steel surface defect types through multi-scale parallel modules, and to fuse the feature information of these three parallel modules through cross-space learning and dot product learning methods. Each parallel module adopts a cross-channel approach for the interaction of steel surface defect feature information.

As illustrated in Fig. 4, the SMA structure is composed of three parallel processing branches. In the first branch, the input features are initially passed through a 3 × 3 convolutional layer, followed by both global maximum pooling (GMP) and global average pooling (GAP). The outputs from these pooling operations are then concatenated and fed into a 1 × 1 convolution layer to compress the feature dimensions. Subsequently, a Softmax function is applied to generate attention weights, which are then dot-multiplied with the initial convolutional output to produce the branch’s final output features. The second branch performs average pooling separately along the height and width dimensions of the feature map. These pooled outputs are concatenated and sent through a 1 × 1 convolution layer to reduce dimensionality, after which the processed data enters the cross-spatial attention unit. Similarly, the third branch applies a 3 × 3 convolution to the input feature map before feeding it into the same cross-spatial attention mechanism. Within this module, feature representations from the second and third branches are interactively refined. Finally, the refined output is element-wise multiplied with the results from the first branch to generate the overall output of the SMA module.

Fig. 4.

Fig. 4

Schematic showing the SMA module.

The SMA module and the lightweight Ghost module together form the feature extraction network for the LAM-YOLOv10n model. This is designed to improve the extraction of defect type features from the complex background of steel surfaces.
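A rough NumPy sketch of the three-branch SMA data flow described above is given below. Since the paper does not specify exact layer configurations, the 3 × 3 and 1 × 1 convolutions are replaced by random channel-mixing weights and the cross-spatial attention unit by a simple softmax-gated interaction; both are assumptions made purely to illustrate the structure, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_mix(x, c_out, rng):
    """Stand-in for a convolution: random 1x1 channel mixing, (C,H,W) -> (c_out,H,W)."""
    w = rng.standard_normal((c_out, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum('oc,chw->ohw', w, x)

def sma(x, rng):
    c, h, w = x.shape
    # Branch 1: conv -> GMP and GAP -> channel weights via Softmax -> reweight.
    f1 = channel_mix(x, c, rng)
    gmp = f1.max(axis=(1, 2))               # global maximum pooling, (C,)
    gap = f1.mean(axis=(1, 2))              # global average pooling, (C,)
    weights = softmax(gmp + gap, axis=0)    # concat + 1x1 conv approximated by a sum
    b1 = f1 * weights[:, None, None]
    # Branch 2: average pooling along height and width, recombined by broadcasting.
    pool_h = x.mean(axis=2, keepdims=True)  # (C,H,1)
    pool_w = x.mean(axis=1, keepdims=True)  # (C,1,W)
    b2 = pool_h + pool_w                    # (C,H,W)
    # Branch 3: another convolution applied to the raw input.
    b3 = channel_mix(x, c, rng)
    # Cross-spatial attention (assumed form): spatial softmax of branch 2 gates branch 3.
    cross = softmax(b2.reshape(c, -1), axis=1).reshape(c, h, w) * b3
    # Output: element-wise product of branch 1 with the refined features.
    return b1 * cross

x = rng.standard_normal((8, 16, 16))
y = sma(x, rng)
```

The sketch preserves the key property of the module: the output has the same shape as the input, so SMA can be dropped into the backbone without changing downstream layer sizes.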

MFFN-based neck module

Traditional feature fusion approaches include serial and parallel strategies. The serial fusion approach directly connects two sets of identical or different input features to generate a new feature vector whose dimension equals the sum of the dimensions of the two inputs. The parallel fusion approach fuses multiple sets of feature information into a composite vector. With the development of deep learning, FPN and PANet have become the dominant feature fusion structures; however, these approaches tend to lose fine-grained detail from the input features and to retain redundant information.

To address the above challenges, this study introduces a MFFN module, which is applied at the three output layers of the Neck component, as illustrated in Fig. 5. This unit is specifically designed to integrate fine-grained features related to steel surface defects, thereby strengthening the model’s interpretative capability in complex defect scenarios. By refining the contextual understanding of the recognition system, the proposed module contributes to enhanced detection accuracy and improved consistency in performance.

Fig. 5.

Fig. 5

Schematic showing the MFFN module.

The input features of the MFFN module pass through two branches. The first branch applies GAP followed by a 3 × 3 convolution to aggregate the spatial feature information of the feature map, obtains the corresponding weights through a Softmax operation, and multiplies these weights with the input features to produce the feature information F1. The second branch concatenates the outputs of a local average pooling (LAP) operation and a local maximum pooling (LMP) operation to obtain a complete feature representation, passes the result through a 3 × 3 convolution and Softmax, and multiplies the obtained weights with the input features to produce F2. Finally, F1 and F2 are concatenated and reduced in dimension by a 1 × 1 convolution to yield the output of the MFFN module, as shown in Eqs. (2)-(4):

$F_1 = \mathrm{Softmax}(C_{3\times 3}(\mathrm{GAP}(F_{in}))) \otimes F_{in}$  (2)
$F_2 = \mathrm{Softmax}(C_{3\times 3}(\mathrm{Concat}(\mathrm{LAP}(F_{in}), \mathrm{LMP}(F_{in})))) \otimes F_{in}$  (3)
$F_{out} = C_{1\times 1}(\mathrm{Concat}(F_1, F_2))$  (4)

where $F_{in}$ is the input feature information of the MFFN module, GAP is the global average pooling operation, LMP is the local maximum pooling operation, LAP is the local average pooling operation, $C_{3\times 3}$ is the convolution operation of size 3 × 3, $C_{1\times 1}$ is the convolution operation of size 1 × 1, and $F_{out}$ is the output feature information of the MFFN module.
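The MFFN branches of Eqs. (2)-(4) can be sketched in NumPy as follows. The convolutions are stand-in random channel mixes and LAP/LMP are implemented as 3 × 3 stride-1 local pooling; these are illustrative assumptions, since the paper does not give the exact layer parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mix(x, c_out, rng):
    """Stand-in for a convolution: random 1x1 channel mixing."""
    w = rng.standard_normal((c_out, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum('oc,chw->ohw', w, x)

def local_pool(x, op):
    """3x3 stride-1 local pooling with edge padding; op is np.mean or np.max."""
    c, h, w = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode='edge')
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[:, i, j] = op(p[:, i:i + 3, j:j + 3], axis=(1, 2))
    return out

def mffn(f_in, rng):
    c = f_in.shape[0]
    # Eq. (2): F1 = Softmax(C3x3(GAP(F))) (x) F
    gap = f_in.mean(axis=(1, 2), keepdims=True)
    f1 = softmax(mix(gap, c, rng), axis=0) * f_in
    # Eq. (3): F2 = Softmax(C3x3(Concat(LAP(F), LMP(F)))) (x) F
    lap, lmp = local_pool(f_in, np.mean), local_pool(f_in, np.max)
    f2 = softmax(mix(np.concatenate([lap, lmp], axis=0), c, rng), axis=0) * f_in
    # Eq. (4): F_out = C1x1(Concat(F1, F2))
    return mix(np.concatenate([f1, f2], axis=0), c, rng)

f = rng.standard_normal((4, 8, 8))
out = mffn(f, rng)
```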

Experimental results and analysis

Evaluation indicators and experimental parameters

In order to evaluate the precision of the LAM-YOLOv10n detection model on steel surface defect types, this paper uses metrics commonly employed in object detection: precision (P), recall (R), average precision (AP), mean average precision (mAP), and frames per second (FPS), as defined in Eqs. (5)-(9):

$P = \dfrac{TP}{TP + FP}$  (5)
$R = \dfrac{TP}{TP + FN}$  (6)

where TP denotes correctly detected positive samples, FP falsely detected positive samples, and FN missed positive samples; whether a detection counts as a TP is determined by the confidence score threshold $S_{conf}$ and the IoU threshold $T_{IoU}$.

$AP = \displaystyle\int_0^1 P(R)\,dR$  (7)
$mAP = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N} AP_i$  (8)
$FPS = \dfrac{S}{t}$  (9)

where t is the time required by the detection model to process all of the test data, S is the total number of images to be detected, and N is the total number of target categories.
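These metrics can be written out directly; the sketch below implements Eqs. (5)-(9) in plain Python, using the standard monotone precision envelope for the AP integral (a common convention, assumed here since the paper does not state its interpolation scheme).

```python
def precision(tp, fp):
    return tp / (tp + fp)                    # Eq. (5)

def recall(tp, fn):
    return tp / (tp + fn)                    # Eq. (6)

def average_precision(recalls, precisions):
    """Eq. (7): area under the precision-recall curve, using the standard
    monotonically non-increasing precision envelope. Inputs sorted by recall."""
    env, best = [], 0.0
    for p in reversed(precisions):           # build the envelope right-to-left
        best = max(best, p)
        env.append(best)
    env.reverse()
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, env):           # rectangle-rule integration over recall
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(aps):
    return sum(aps) / len(aps)               # Eq. (8): mean over the N categories

def fps(total_images, seconds):
    return total_images / seconds            # Eq. (9)
```

For example, 90 true positives with 10 false positives gives a precision of 0.9, and a model that processes 965 images in 5 s runs at 193 FPS.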

In this paper, all experiments were conducted on a computer with an Intel i7-9700K processor and an NVIDIA GeForce GTX 1060Ti GPU. All models were trained for 100 epochs with a batch size of 16, a learning rate of 0.001, the Adam optimizer, a confidence score of 0.35, an IoU threshold of 0.7, and a momentum of 0.9.
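For reference, the reported hyperparameters can be collected into a single configuration dictionary; the key names below are our own illustrative choices, not any particular framework's API.

```python
# Hyperparameters reported above, gathered into one configuration dictionary.
# Key names are illustrative; adapt them to whichever training framework is used.
TRAIN_CONFIG = {
    "epochs": 100,
    "batch_size": 16,
    "learning_rate": 1e-3,
    "optimizer": "Adam",
    "momentum": 0.9,
    "conf_threshold": 0.35,   # confidence score used at evaluation
    "iou_threshold": 0.7,     # IoU threshold used at evaluation
}
```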

Dataset and preprocessing

The data for this experiment come from the steel surface defect dataset produced by Kechen Song's team at Northeastern University [35], which contains six defect types: crazing, inclusion, patches, pitted_surface, rolled-in_scale, and scratches, as shown in Fig. 6.

Fig. 6.

Fig. 6

Sample plot of the data set.

At the same time, in order to enrich the steel surface defects dataset, this paper utilizes a variety of image processing techniques, including image flipping, image cropping, brightness adjustment, contrast adjustment, adding noise and so on. On the one hand, it increases the diversity of the defect dataset, on the other hand, it improves the adaptability of the network model to the complex noise on the steel surface, and the processed dataset is named PRO-DataSet. Image flipping involves mirroring images horizontally or vertically to expand the diversity of the dataset. Image cropping involves randomly selecting regions of interest in the original image to produce multiple image fragments of different sizes and positions, thereby increasing the diversity of the dataset. In addition, adding Gaussian noise to images simulates visual interference that may exist in actual industrial environments, thereby improving the network model’s adaptability to complex environments in actual industrial scenarios.
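The augmentations listed above can be sketched with NumPy as follows; the parameter values (crop size, noise level, brightness/contrast factors) are illustrative choices rather than the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def hflip(img):
    """Horizontal mirror flip."""
    return img[:, ::-1]

def random_crop(img, ch, cw, rng):
    """Randomly crop a ch x cw region of interest from the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

def adjust_brightness_contrast(img, alpha=1.2, beta=10.0):
    """alpha scales contrast, beta shifts brightness; clip to the valid range."""
    return np.clip(alpha * img + beta, 0, 255)

def add_gaussian_noise(img, sigma=8.0, rng=rng):
    """Additive Gaussian noise simulating industrial visual interference."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

img = rng.integers(0, 256, size=(200, 200)).astype(np.float64)  # mock grayscale patch
augmented = [hflip(img),
             random_crop(img, 128, 128, rng),
             adjust_brightness_contrast(img),
             add_gaussian_noise(img)]
```

Each transform produces a new labeled sample; in practice bounding-box annotations must be transformed consistently (e.g., mirrored for flips, shifted for crops).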

The specific composition of the PRO-DataSet steel surface defect dataset is shown in Table 1; the augmentation did not change the set of defect types, only expanded the number of defect targets. The crazing defect type has 402 images and 921 target boxes; inclusion has 510 images and 896 target boxes; patches has 323 images and 1020 target boxes; rolled-in_scale has 478 images and 1402 target boxes; pitted_surface has 501 images and 965 target boxes; and scratches has 536 images and 1089 target boxes.

Table 1.

PRO-Dataset data set information.

Data set name   Typology          Number (PCS)   Target frames (PCS)
PRO-DataSet     Crazing           402            921
                Inclusion         510            896
                Patches           323            1020
                Rolled-in_scale   478            1402
                Pitted_surface    501            965
                Scratches         536            1089

Ablation experiment

In the LAM-YOLOv10n network structure, this paper combines the GhostConv module, the SMA module, and the MFFN module with the original YOLOv10n model design. To verify the individual contribution of each module to steel surface defect detection, ablation experiments were carried out by adding the GhostConv, SMA, and MFFN modules one by one to the original YOLOv10n model; the results are shown in Table 2. In the table, the GC scheme adds the GhostConv module, the GC_SEM scheme adds the GhostConv and SMA modules, and the GC_SEM_MFFN scheme adds the GhostConv, SMA, and MFFN modules. The evaluation indexes of the ablation experiment and the convergence process of network training are shown in Fig. 7a-d.

Table 2.

Results of ablation experiments.

Methods        MFFN   SMA(SEM)   GhostConv(GC)   Precision(%)   Recall(%)
YOLOv10n       –      –          –               93.49          91.02
GC             –      –          ✓               93.63          91.89
GC_SEM         –      ✓          ✓               93.76          92.77
GC_SEM_MFFN    ✓      ✓          ✓               96.96          93.73

Fig. 7.

Fig. 7

LAM-YOLOv10n ablation experiment indicators.

The experimental results show that the design of the LAM-YOLOv10n model, which combines the SMA attention module, the GhostConv module, and the MFFN fusion method, significantly enhances localization in the steel surface defect detection task. The ablation experiments, in which the GhostConv, SMA, and MFFN modules were added one by one, show that each module contributes to the detection precision of the network. Adding the GC module improves the model's precision by 0.14 percentage points; adding the SMA module improves it by a further 0.13 points, indicating that the interaction between the SMA and GhostConv modules helps extract more accurate steel surface defect features. Finally, adding the MFFN module raises precision by another 3.20 points, for a total gain of 3.47 points over the baseline, showing that fusing the deep and shallow feature information of the defect types further improves detection and localization precision.

Comparison experiment

To verify the detection effect of LAM-YOLOv10n, the small-target detection algorithm for steel surface defect types proposed in this paper, experiments were carried out on the PRO-DataSet steel surface defect dataset; the results are shown in Fig. 8. These comparison experiments give a clearer picture of the effectiveness of the LAM-YOLOv10n algorithm for steel surface defect detection.

Fig. 8.

Fig. 8

Comparative experiment results of different versions of YOLOv10.

We selected the different versions of the baseline YOLOv10 model, namely YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l, and YOLOv10x, for detailed comparison with the proposed LAM-YOLOv10n. The experimental results show that, on the steel surface defect detection task, the precision rates of YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l, and YOLOv10x are 93.49%, 94.88%, 95.12%, 95.68%, 97.34%, and 97.79%, respectively. In comparison, the precision of the LAM-YOLOv10n network is 3.47% higher than YOLOv10n, 2.08% higher than YOLOv10s, 1.84% higher than YOLOv10m, and 1.28% higher than YOLOv10b, while being 0.38% lower than YOLOv10l and 0.83% lower than YOLOv10x. Although LAM-YOLOv10n does not surpass the largest YOLOv10 variants, it is worth noting that it achieves a 3.47% precision improvement while retaining the lightweight YOLOv10n architecture. This result underscores the effectiveness of the proposed algorithm for steel surface defect detection scenarios.

In contrast, the LAM-YOLOv10n model proposed in this paper aims to balance real-time and precision requirements, improving the detection of steel surface defect types without significantly reducing detection speed. The architecture is optimized to accurately detect and classify diverse defect types while keeping inference fast enough for deployment in real-world industrial environments, where timely and reliable defect identification is critical. In addition, to further validate the effectiveness of the LAM-YOLOv10n model, we conducted a series of comparative experiments against several representative object detection networks; the results are shown in Table 3.

Table 3.

LAM-YOLO10n comparison experiment results.

Models              Precision(%)   Recall(%)   mAP@0.5(%)   FPS(F/s)
SSD [36]            82.24          81.63       81.77        257
YOLOv5s [37]        85.67          83.02       84.27        236
Faster R-CNN [38]   87.11          85.23       86.55        274
YOLOv7-tiny [39]    91.89          89.74       89.86        207
YOLOv8n [40]        91.01          89.04       89.35        206
YOLOv11s [41]       93.83          91.36       92.35        168
YOLOv12n [42]       94.36          92.14       92.87        175
RT-DETR [43]        94.65          92.53       93.08        124
LAM-YOLOv10n        96.96          93.73       94.39        154

From Table 3, the proposed LAM-YOLOv10n model demonstrates clear advantages. It achieves the highest precision of all compared models, outperforming YOLOv12n and RT-DETR by 2.60% and 2.31%, respectively. Although its FPS is somewhat lower than that of YOLOv12n, the model sacrifices little speed for this gain in accuracy. These results indicate that LAM-YOLOv10n offers the best detection accuracy among the compared models while remaining fast enough for practical steel surface defect detection scenarios.
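The precision/throughput trade-off in Table 3 can be summarized programmatically. The sketch below, with values copied from Table 3, identifies the most precise model and quantifies its precision margin and FPS cost relative to the strongest competitors:

```python
# (precision %, FPS) pairs for the models in Table 3.
results = {
    "SSD": (82.24, 257), "YOLOv5s": (85.67, 236), "Faster R-CNN": (87.11, 274),
    "YOLOv7-tiny": (91.89, 207), "YOLOv8n": (91.01, 206),
    "YOLOv11s": (93.83, 168), "YOLOv12n": (94.36, 175),
    "RT-DETR": (94.65, 124), "LAM-YOLOv10n": (96.96, 154),
}

# Model with the highest precision overall.
best = max(results, key=lambda m: results[m][0])

# Precision margin (percentage points) over the two strongest competitors,
# and the FPS given up relative to YOLOv12n.
margin_v12 = round(results[best][0] - results["YOLOv12n"][0], 2)
margin_detr = round(results[best][0] - results["RT-DETR"][0], 2)
fps_cost = results["YOLOv12n"][1] - results[best][1]
print(best, margin_v12, margin_detr, fps_cost)
```

This confirms the trade-off stated in the text: LAM-YOLOv10n leads YOLOv12n by 2.60 points and RT-DETR by 2.31 points in precision, at a cost of 21 FPS relative to YOLOv12n.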

In addition, to visualize the detection performance of the LAM-YOLOv10n model, we selected it together with the five comparison models with the highest detection precision for a visual comparison of detection results, as shown in Fig. 9.

Fig. 9. Detection results of different algorithms.

As can be seen in Fig. 9, compared with the proposed LAM-YOLOv10n model, the other algorithms exhibit varying degrees of missed detections and lower confidence scores. We attribute this to the SMA and Ghost modules adopted in this paper, which focus the network on defect-type feature information on the steel surface, particularly against complex backgrounds. In addition, the MFFN module enables multi-scale fusion of defect feature information, allowing the LAM-YOLOv10n model to achieve the best detection performance.

Discussion

In this paper, we introduce LAM-YOLOv10n, a model developed to improve the detection of defects on industrial steel surfaces. In practical application scenarios, deep learning algorithms carry high computational loads, which makes real-time monitoring of defect types difficult. In addition, because of the complex background of steel surfaces, existing deep learning networks are prone to losing target feature information during feature extraction. Our model first introduces the lightweight Ghost module, which effectively reduces the number of parameters; second, the SMA module is designed to focus feature extraction on steel surface defect types; finally, the MFFN module further strengthens multi-scale feature fusion for defect targets. Experimental results show that LAM-YOLOv10n improves precision by 3.47% over the original network and outperforms existing object detection models.

In this paper, we study industrial steel surface defect detection in both theory and experiment, and we analyze the mainstream deep learning algorithms proposed for this task in recent years. Because detection algorithms must meet low-latency and high-precision requirements in real industrial scenarios, we select YOLOv10 as the benchmark model. On this basis, we construct a feature extraction network that introduces the GhostConv module and the SMA attention mechanism, balancing spatial information retention against network complexity. The SMA mechanism establishes dependencies among steel surface defect features through three parallel multi-scale branches, which are merged via cross-spatial learning and pointwise multiplication; each branch uses cross-channel interaction to exchange defect feature information. This yields a faster and more efficient recognition model for industrial steel surface defects, with better performance and usability in practical applications. In addition, the MFFN module enhances the fusion of defect target feature information, dynamically re-weighting each input feature during fusion to obtain higher precision. Experiments demonstrate that the designed LAM-YOLOv10n model achieves excellent detection precision with a small number of parameters.
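As a rough illustration of why the Ghost module reduces the parameter count, the sketch below compares the weights of a standard convolution with those of a Ghost module as described by Han et al. [34]: a primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the remaining "ghost" maps. The layer sizes used here (64 to 128 channels, 3x3 kernels, ratio s = 2) are illustrative assumptions, not the exact configuration of LAM-YOLOv10n:

```python
def conv_params(c_in, c_out, k):
    # Weights of a standard k x k convolution (bias ignored).
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, s=2, d=3):
    # Ghost module: a primary convolution produces c_out / s "intrinsic"
    # feature maps; cheap d x d depthwise operations generate the remaining
    # (s - 1) "ghost" maps from each intrinsic one.
    intrinsic = c_out // s
    primary = conv_params(c_in, intrinsic, k)
    cheap = intrinsic * (s - 1) * d * d  # depthwise: one d x d filter per map
    return primary + cheap

standard = conv_params(64, 128, 3)  # 73,728 weights
ghost = ghost_params(64, 128, 3)    # 37,440 weights
print(standard, ghost, round(standard / ghost, 2))
```

For this configuration the Ghost module needs roughly half the weights of the standard convolution, consistent with the theoretical compression ratio of approximately s reported for GhostNet.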

The research in this paper also leaves room for improvement. First, experiments were conducted only on benchmark data; although the results are strong, future work should evaluate detection performance on steel surface defect images collected in real industrial scenarios. Second, because data acquired in such scenarios is subject to interference, we recommend placing edge computing hardware such as an FPGA at the front end of the image acquisition module to preprocess industrial steel surface defect images in real time, preventing redundant information from propagating into subsequent transmission and processing. Finally, a feature fusion network with a stronger attention mechanism could be designed to concentrate on fusing the target features of defect types, suppressing interference from redundant features and further improving detection performance.

In addition to the above work, industrial steel surface defect detection could be further improved in the following ways:

  1. Use image segmentation to focus on the edge features of steel surface defect types, enabling more accurate delineation of defects in industrial video streams. Because image segmentation operates at the pixel level, it can improve defect detection precision.

  2. Use infrared detection to identify defect types in industrial steel video streams under varying light intensities. Since images captured in real scenes are affected by texture, noise, and lighting conditions, the data can first be processed with infrared detection and then passed to classification or detection techniques to improve performance.

Conclusions

In this paper, a model named LAM-YOLOv10n is proposed, taking YOLOv10n as the benchmark network for steel surface defect type detection. The model first introduces the lightweight Ghost module to reduce the number of parameters and thus the computational complexity. Second, the SMA module is designed to extract and enhance the key features of steel surface defect types, ensuring that fine defect targets are captured accurately against complex backgrounds. Finally, the MFFN module further strengthens the detection of defect targets at different scales. Experimental verification shows that LAM-YOLOv10n achieves high detection precision. The model not only improves the recognition rate of steel surface defect types but also meets the dual requirements of real-time performance and precision in industrial inspection, providing an efficient and reliable solution for defect detection in the steel production process.

Author contributions

Conceptualization, Laomo Zhang; methodology, Laomo Zhang; software, Zhike Wang; validation, Ying Ma and Guowei Li; investigation, Zhike Wang; data curation, Zhike Wang; writing—original draft preparation, Laomo Zhang; writing—review and editing, Guowei Li; visualization, Ying Ma.

Funding

This work is supported by the Key Scientific Research Program of Higher Education funded by Henan Province under Grants 22A520023 and 24A520008, the Doctoral Cultivation Fund of Henan University of Engineering under Grants D2022030 and D2022032, and the Henan Province Science and Technology Research Project under Grant 232102210118.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Zhou, X., Fang, H., Fei, X. & Zhang, J. Edge-aware multi-level interactive network for salient object detection of strip steel surface defects. IEEE Access 9, 149465–149476 (2021).
  • 2. Yu, J., Cheng, X. & Li, Q. Surface defect detection of steel strips based on anchor-free network with channel attention and bidirectional feature fusion. IEEE T Instrum. Meas. 71, 1–10 (2021).
  • 3. Qiao, J., Sun, C., Cheng, X., Yang, J. & Chen, N. Stainless steel cylindrical pot outer surface defect detection method based on cascade neural network. Meas. Sci. Technol. 35, 036201 (2023).
  • 4. Neogi, N., Mohanta, D. K. & Dutta, P. K. Review of vision-based steel surface inspection systems. J. Image Video Proc. 1–19 (2014).
  • 5. Tang, B., Chen, L., Sun, W. & Lin, Z. Review of surface defect detection of steel products based on machine vision. IET Image Process. 17, 303–322 (2023).
  • 6. Yeung, C. C. & Lam, K. M. Efficient fused-attention model for steel surface defect detection. IEEE T Instrum. Meas. 71, 1–11 (2022).
  • 7. Chu, M., Gong, R., Gao, S. & Zhao, J. Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine. Chemometr. Intell. Lab. 171, 140–150 (2017).
  • 8. Liu, K. et al. Steel surface defect detection using a new Haar–Weibull-variance model in unsupervised manner. IEEE T Instrum. Meas. 66, 2585–2596 (2017).
  • 9. Ghorai, S., Mukherjee, A., Gangadaran, M. & Dutta, P. K. Automatic defect detection on hot-rolled flat steel products. IEEE T Instrum. Meas. 62, 612–621 (2012).
  • 10. Wang, G. et al. Multifrequency AC magnetic flux leakage testing for the detection of surface and backside defects in thick steel plates. IEEE Magn. Lett. 13, 1–5 (2022).
  • 11. Jing, X., Yang, X. Y., Xu, C. H., Chen, G. & Ge, S. Infrared thermal images detecting surface defect of steel specimen based on morphological algorithm. J. China Univ. Pet. 36, 146–150 (2012).
  • 12. Meng, X., Lu, M., Yin, W., Bennecer, A. & Kirk, K. J. Evaluation of coating thickness using lift-off insensitivity of eddy current sensor. Sensors 21, 419 (2021).
  • 13. Luo, Q., Fang, X., Liu, L., Yang, C. & Sun, Y. Automated visual defect detection for flat steel surface: a survey. IEEE T Instrum. Meas. 69, 626–644 (2020).
  • 14. Huang, X., Zhu, J. & Huo, Y. SSA-YOLO: an improved YOLO for hot-rolled strip steel surface defect detection. IEEE T Instrum. Meas. 73, 5040017 (2024).
  • 15. Li, Z., Tai, Y., Huang, Z., Peng, T. & Zhang, Z. MPFANet: a multipath feature aggregation network for steel surface defect detection. Meas. Sci. Technol. 35, 045409 (2024).
  • 16. Wang, A. et al. YOLOv10: real-time end-to-end object detection. arXiv preprint arXiv:2405.14458 (2024).
  • 17. Xu, K., Ai, Y. & Wu, X. Application of multi-scale feature extraction to surface defect classification of hot-rolled steels. Int. J. Min. Metall. Mater. 20, 37–41 (2013).
  • 18. Hu, H. J., Li, Y. X., Liu, M. F. & Liang, W. H. Steel strip surface defects classification based on machine learning. Comput. Eng. Des. 35, 620–624 (2014).
  • 19. Liu, Y., Jin, Y. & Ma, H. Surface defect classification of steels based on ensemble of extreme learning machines. WRC Symp. Adv. Robot. Autom. (WRC SARA) 203–208 (2019).
  • 20. Soukup, D. & Huber-Mörk, R. Convolutional neural networks for steel surface defect detection from photometric stereo images. Int. Symp. Vis. Comput. 8887, 668–677 (2014).
  • 21. Yi, L., Li, G. & Jiang, M. An end-to-end steel strip surface defects recognition system based on convolutional neural networks. Steel Res. Int. 88, 1600068 (2017).
  • 22. Damacharla, P., Rao, A., Ringenberg, J. & Javaid, A. Y. TLU-Net: a deep learning approach for automatic steel surface defect detection. Int. Conf. Appl. Artif. Intell. (ICAPAI) 1–6 (2021).
  • 23. Uraon, P. K., Verma, A. & Badholia, A. Steel sheet defect detection using feature pyramid network and ResNet. Int. Conf. Edge Comput. Appl. (ICECAA) 1543–1550 (2022).
  • 24. Bouguettaya, A., Mentouri, Z. & Zarzour, H. Deep ensemble transfer learning-based approach for classifying hot-rolled steel strips surface defects. Int. J. Adv. Manuf. Technol. 125, 5313–5322 (2023).
  • 25. Akhyar, F., Liu, Y., Hsu, C. Y., Shih, T. K. & Lin, C. Y. FDD: a deep learning-based steel defect detector. Int. J. Adv. Manuf. Technol. 126, 1093–1107 (2023).
  • 26. Xia, K. et al. Mixed receptive fields augmented YOLO with multi-path spatial pyramid pooling for steel surface defect detection. Sensors 23, 5114 (2023).
  • 27. Raj, G. D. & Prabadevi, B. Steel strip quality assurance with YOLOv7-CSF: a coordinate attention and SIoU fusion approach. IEEE Access 11, 129493–129506 (2023).
  • 28. Huang, Y., Tan, W., Li, L. & Wu, L. WFRE-YOLOv8s: a new type of defect detector for steel surfaces. Coatings 13, 2011 (2023).
  • 29. He, Y., Song, K., Meng, Q. & Yan, Y. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features. IEEE T Instrum. Meas. 69, 1493–1504 (2019).
  • 30. Amin, D. & Akhter, S. Deep learning-based defect detection system in steel sheet surfaces. IEEE Reg. 10 Symp. (TENSYMP) 444–448 (2020).
  • 31. Tabernik, D., Šela, S., Skvarč, J. & Skocaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 31, 759–776 (2020).
  • 32. Demir, K., Ay, M., Cavas, M. & Demir, F. Automated steel surface defect detection and classification using a new deep learning-based approach. Neural Comput. Appl. 35, 8389–8406 (2023).
  • 33. Li, Z., Wei, X., Hassaballah, M., Li, Y. & Jiang, X. A deep learning model for steel surface defect detection. Complex Intell. Syst. 10, 885–897 (2024).
  • 34. Han, K. et al. GhostNet: more features from cheap operations. Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recogn. 1580–1589 (2020).
  • 35. Song, K. & Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 285, 858–864 (2013).
  • 36. Liu, W. et al. SSD: single shot multibox detector. Eur. Conf. Comput. Vis. (ECCV) 9905, 21–37 (2016).
  • 37. Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: unified, real-time object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recogn. 779–788 (2016).
  • 38. Girshick, R. Fast R-CNN. Proc. IEEE Int. Conf. Comput. Vis. 1440–1448 (2015).
  • 39. Cheng, P. et al. Tiny-YOLOv7: tiny object detection model for drone imagery. Int. Conf. Image Graph. 14357, 53–65 (2023).
  • 40. Reis, D., Kupec, J., Hong, J. & Daoudi, A. Real-time flying object detection with YOLOv8. arXiv preprint arXiv:2305.09972 (2023).
  • 41. Khanam, R. & Hussain, M. YOLOv11: an overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725 (2024).
  • 42. Tian, Y., Ye, Q. & Doermann, D. YOLOv12: attention-centric real-time object detectors. arXiv preprint arXiv:2502.12524 (2025).
  • 43. Zhao, Y. et al. DETRs beat YOLOs on real-time object detection. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. 16965–16974 (2024).



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
