Scientific Reports
2026 Jan 23;16:6060. doi: 10.1038/s41598-026-37153-2

Research on multi-scale feature detection of open-pit mine road cracks

Liang Wang 1, Meiling Zhao 1, Zehao Yu 1, Guangxin Yang 1, Qingxu Wang 1, Guangwei Liu 2, Jian Lei 2
PMCID: PMC12901067  PMID: 41577981

Abstract

Road crack detection in open-pit mines is of great significance for ensuring the safety and efficiency of mining production, yet traditional detection methods and existing deep learning-based approaches have numerous limitations. This paper proposes an open-pit mine road crack detection method based on feature fusion, which introduces an Adaptive Feature Fusion Module (AF2M) and a Channel-Spatial Attention Module (CASP) into the original U-Net network and optimizes the model using the Layer-Adaptive Magnitude Pruning (LAMP) algorithm. The AF2M module first unifies the scale of four-level multi-scale feature maps from the decoder through upsampling and concatenates them along the channel dimension, then exploits channel dependencies via the ECA (Efficient Channel Attention) module and fuses global-local contextual information using the GC (Global Context) module. It dynamically weights features from different dimensions to enhance key crack features and reduce background interference; its multi-scale fusion capability surpasses the inherent processing capacity of U-Net's native encoder-decoder structure. The CASP module innovatively applies the multi-head self-attention mechanism to channel-wise interaction: it reconstructs channel attention through dimension transformation and transposed query-key (Q^T K) operations on the query, key, and value (QKV) matrices, then fuses pooled spatial information to impose spatial attention.
Compared with traditional attention mechanisms such as SE (Squeeze-and-Excitation) and CBAM (Convolutional Block Attention Module), it achieves deep synergy of channel-spatial information and improves crack localization accuracy. The LAMP algorithm rescales and sorts the weights of each layer through a unique scoring mechanism, adaptively assigns sparsity at the layer level, and prunes redundant weights within a pruning-rate range of 0-0.3 (applied non-uniformly across layers), ensuring that key feature extraction remains unaffected. Experiments were conducted on a dataset of 2,847 high-resolution images collected from an open-pit coal mine in Inner Mongolia. The results show that the improved model achieves a mean Intersection over Union (mIoU) of 0.83, precision of 0.89, and F1-score of 0.82, improvements of 7%, 7%, and 9% respectively over the original U-Net. Additionally, the model parameters are reduced to 4.73 M (a 24.1% decrease), the Floating-Point Operations (FLOPs) to 4.25 G (a 28.7% decrease), and the inference time per image to 0.30 s (a 33.3% speedup). This method exhibits significant advantages in detection accuracy and model complexity, can effectively meet the requirements of open-pit mine road crack detection, and provides a reliable basis for mine road maintenance. Combined with technologies such as UAVs (Unmanned Aerial Vehicles) and GIS (Geographic Information Systems), it is expected to promote the intelligent development of open-pit mine road maintenance.

Keywords: Open-pit mine road, Crack detection, Feature fusion, Road safety, U-Net

Subject terms: Engineering, Mathematics and computing

Introduction

In open-pit mining operations, roads serve as critical infrastructure for transportation, and their conditions directly impact production efficiency and safety. Road cracks are a common defect; if not promptly detected and addressed, they will continue to develop, reducing road service life, increasing transportation costs, and even triggering safety accidents. Therefore, accurate detection of road cracks in open-pit mines is crucial [1].

Traditional road crack detection methods in open-pit mines mainly rely on manual inspections, where workers use simple tools to conduct close-range observations and measurements of road surfaces based on experience. Although this method can identify obvious cracks, it suffers from numerous drawbacks, such as extremely low efficiency, detection accuracy highly influenced by subjective factors, and difficulty in achieving real-time monitoring [2]. With the development of computer technology, image processing-based crack detection methods have gradually emerged. Early classical algorithms like threshold segmentation and edge detection struggle to adapt to the high-interference scenarios of open-pit mine roads—road surfaces are often covered with debris such as ore fragments and tire marks, and severe lighting changes and uneven pavement materials in mining areas lead to complex image grayscale distributions. These issues easily result in missegmentation between cracks and backgrounds or misjudgment of debris edges as cracks, significantly reducing detection accuracy [3].

Among CNN-based crack detection methods, classical network models have been applied but fail to specifically address the unique challenges of open-pit mines. The U-Net network has demonstrated excellent performance in medical image segmentation due to its encoder-decoder structure. Its inherent design—extracting multi-scale semantic features through encoder downsampling and restoring spatial details via decoder upsampling—provides basic multi-scale processing capabilities [4]. However, it exhibits obvious shortcomings in open-pit mine road detection: cracks in mining areas are mostly small, discontinuously distributed, and often partially obscured by ore piles or waterlogged areas. During the convolution and upsampling processes of the U-Net decoder, the fusion of multi-scale features relies only on simple concatenation and convolution, lacking the enhancement of key features and suppression of redundant information. This leads to the easy loss of critical semantic and positional features, insufficient accuracy in extracting key crack information (e.g., boundaries, trends, and lengths), and weak recognition ability for small or obscured cracks, making it difficult to meet the safety detection requirements of mining areas.
Other deep learning-based improved methods also have limitations in scenario adaptation: the fusion network of CNN and scale-adaptive Transformer proposed by Gui Yan et al. [5] improves segmentation accuracy in complex scenes but lacks optimization for the high noise of open-pit mine pavements, resulting in insufficient generalization in mining environments; the Res-U-Net model constructed by Jiang Song et al. [6] focuses on landslide recognition and does not address core requirements in crack detection such as "small target capture" and "background interference suppression"; the method proposed by Luo Yaowei et al. [7] is designed for underwater bridge structures and cannot adapt to the characteristics of open-pit mine pavements, such as severe lighting changes and diverse crack morphologies. Existing deep learning solutions generally lack targeted design for the special environment of open-pit mines, and their detection accuracy and robustness are difficult to meet practical needs.

Feature fusion, attention mechanisms, and model optimization are key technical directions for improving image segmentation performance and adapting to complex scenes. In the field of feature fusion, common strategies include weighted fusion, concatenation-based fusion, and attention-based fusion: weighted fusion integrates information by assigning different weights to different feature maps, but the weights are mostly fixed, resulting in insufficient flexibility; concatenation-based fusion directly splices multi-scale features, which easily introduces redundant information; attention-based fusion can adaptively focus on key features, but existing schemes still have room for improvement in the fine balance of multi-scale features. In scenarios such as pavement crack detection, the ability to extract target features in complex backgrounds needs to be further enhanced.

Attention mechanisms, which simulate the focusing characteristics of human vision to highlight key information and suppress irrelevant interference, have been widely applied in image processing tasks. Common types include channel attention (e.g., the SE module), spatial attention (e.g., the spatial branch of CBAM), and self-attention: channel attention focuses on the importance of different feature channels, spatial attention on the spatial location of targets, and self-attention captures feature dependencies at the global scope. However, a single type of attention mechanism can hardly satisfy at once the dual requirements of strengthening channel correlations and achieving accurate spatial localization in complex scenes.

In terms of model optimization, model compression and acceleration technologies are the core means to address the large computational volume, high storage requirements, and difficult edge-device deployment of deep learning models. These technologies mainly include model pruning, quantization, and knowledge distillation. Quantization reduces computational complexity by converting high-precision floating-point weights into low-precision integers, but may lose some accuracy; knowledge distillation trains small models by transferring knowledge from large models, which requires additionally training teacher models and involves a complex process; pruning directly reduces model size and computational volume while maintaining performance by removing redundant neurons or connections, making it the preferred solution for edge-device deployment. However, its performance depends on the rationality of the pruning strategy, and how to adaptively retain key features remains a critical challenge.

To address the shortcomings of existing methods, this paper focuses on the core pain points of open-pit mine road crack detection and proposes a feature fusion-based detection method with innovations in network structure, attention mechanisms, and model optimization. The specific innovations and technical selection basis of this paper are as follows:

An Adaptive Feature Fusion Module (AF2M) is proposed to break through the limitations of U-Net’s native multi-scale processing. This module first unifies the scale of multi-scale feature maps through upsampling, integrates multi-dimensional information via channel concatenation, then exploits channel dependencies using the ECA module and fuses global-local contextual information with the GC module to dynamically adjust the weights of features at different scales. Compared with the simple feature concatenation of U-Net, AF2M achieves a better balance between fine details (shallow features) and semantic context (deep features), strengthens key crack features, effectively weakens background interference, and accurately retains crack edge contours.

A Channel-Spatial Attention Module (CASP) is designed to innovatively integrate channel and spatial information. Drawing on the idea of self-attention mechanisms, this module reconstructs channel attention through dimension transformation and QTK matrix operations, breaking through the limitation of traditional channel attention that only relies on pooling statistics to more fully capture inter-channel correlations. It then fuses pooled spatial information to impose spatial attention, achieving deep synergy of channel-spatial information. Compared with existing attention mechanisms such as SE and CBAM, it can more accurately locate crack targets and suppress noise interference.

The Layer-Adaptive Magnitude Pruning (LAMP) algorithm is selected for model optimization. Compared with traditional pruning methods (e.g., uniform pruning, pruning based on absolute weight values), its core advantage lies in layer adaptability—it rescales and sorts the weights of each layer through a unique scoring mechanism and assigns adaptive sparsity at the layer level instead of applying a uniform pruning rate. This enables precise retention of key feature extraction weights in the AF2M and CASP modules under aggressive pruning (pruning rate range of 0-0.3), avoiding damage to core functions while significantly reducing model computational volume and size, making it suitable for deployment on edge devices in open-pit mines.

Open-pit mine road crack detection method

Open-pit mines are crucial sites for mineral resource extraction, and the stability and safety of their roads are vital for the smooth running of production operations. Road cracks are a common problem in open-pit mines: they not only shorten road life and increase maintenance costs but can also endanger transport vehicles, severely impacting the overall operational efficiency of the mine. Timely, accurate crack detection combined with effective repair measures can significantly reduce transportation disruptions and accidents caused by road damage, ensuring safe and efficient production in open-pit mines. While various methods for detecting cracks in open-pit mine roads exist, they generally suffer from limitations. Traditional image processing-based methods, such as threshold segmentation and edge detection, are computationally simple and easy to implement, but they lack adaptability to complex environments: under uneven illumination and severe background interference, detection accuracy is low and missed and false detections are common. Deep learning-based methods have improved detection accuracy to some extent, but many models are complex, computationally intensive, and demanding on hardware, making it difficult to meet the needs of real-time detection in open-pit mines; moreover, they still perform poorly on small-sized and obscured cracks. In view of this, this paper constructs a feature-fusion-based open-pit mine road crack detection method, the architecture of which is shown in Fig. 1:

Fig. 1. The improved model structure diagram.

This model makes the following improvements based on the characteristics of open-pit mine roads:

(1) To address the problem that the original U-Net network decoder is prone to losing key semantic and positional features when processing crack detection on open-pit mine roads, the AF2M module is introduced. It upsamples and concatenates feature maps of different scales, processes them through the ECA module and the GC module respectively, and dynamically weights and adjusts the features of each dimension [8]. This highlights the key features of the cracks, enhances attention to the crack region, effectively weakens the influence of background factors, accurately retains crack edge contours, and improves detection accuracy and reliability.

(2) Inspired by the self-attention mechanism, the CASP module is proposed to better capture crack texture and spatial information. This module first realizes information interaction between channels through dimensional transformation, linear transformation, and a multi-head self-attention mechanism to strengthen channel correlation; then obtains and fuses spatial information through operations such as pooling and convolution, and applies spatial attention. Compared with the traditional attention module, it can encode channel information from a new perspective, effectively suppress noise interference, highlight crack features, and improve the model’s ability to locate and identify crack targets.

(3) Considering that the high computational and storage requirements of deep learning models for image processing hinder real-time processing and edge-device deployment, we select the Layer-Adaptive Magnitude Pruning (LAMP) algorithm to optimize the model. The algorithm uses a unique scoring mechanism to rescale and sort convolutional-layer weights, removing unimportant weights and achieving adaptive sparsity. Experiments show that a pruning rate in the range 0-0.3 effectively reduces model computation and size, shortens inference time, and maintains or improves detection accuracy [9].

Improved crack detection method for open-pit mine roads

Introduction of adaptive feature fusion

In the decoder of the original U-Net network, each layer only employs traditional convolution and upsampling operations. During the iterative process, key semantic and positional features of the image are prone to loss. In the scenario of open-pit mine road crack detection, the complex environment obscures critical information such as crack boundaries, orientations, and lengths. Fine cracks become even more difficult to identify, leading to a significant decline in detection accuracy. To address this issue, this paper introduces the Adaptive Feature Fusion Module (AF2M) into each layer of the network decoder to achieve effective feature fusion and learning.

The feature maps processed by the AF2M module are derived from 4 different levels of the U-Net encoder, corresponding to four downsampling scales (1/2, 1/4, 1/8, and 1/16). These feature maps cover multi-dimensional feature information ranging from shallow-level details to deep-level semantics. The specific operational workflow is as follows:

Perform upsampling on the feature maps of these four different scales to unify them to the same scale. Subsequently, concatenate them along the channel dimension to initially integrate multi-scale information;

Input the concatenated feature maps into the Efficient Channel Attention (ECA) module. This module first performs global average pooling on the input feature maps, then exploits inter-channel dependencies through one-dimensional convolution operations. A channel attention map is generated via the Sigmoid function, which is then multiplied pixel-wise with the input feature maps to adaptively assign weights to each channel, highlighting important channel features related to cracks;

Simultaneously, directly input the shallow feature maps output by the encoder into the Global Context (GC) module. These feature maps are first processed by a 1 × 1 convolution module and a Softmax activation function, then multiplied with the input feature maps to obtain global contextual information feature maps. Subsequently, two 1 × 1 convolution modules and a LayerNorm module are used to capture channel dependencies. Finally, the result is added to the original input feature maps to complete the fusion of global and local contextual information;

Finally, add the feature maps output by the ECA module and the GC module at the element-wise level to obtain the final output of the AF2M module.

In this study, both the ECA module and the GC module adopt their original structures without any structural modifications. The innovation lies in their collaborative application mode within the multi-scale feature fusion framework. The ECA module focuses on mining local dependencies in the channel dimension, while the GC module emphasizes the capture and optimization of global contextual information. Their combination achieves dual optimization of “local channel importance screening and global contextual supplementation”.

The core “dynamic weight adjustment” of the AF2M module is realized through a dual mechanism: first, based on the channel statistical information from the ECA module, dynamic channel weight assignment is performed by learning inter-channel correlations to enhance the contribution of channels dominated by crack features; second, with the help of global contextual information from the GC module, adaptive calibration of local feature weights is conducted, making weight assignment guided by both local channel responses and global scene information. This dynamic weighting method can accurately highlight key crack features, weaken background interference such as pavement and shadows, and adaptively adjust and enhance edge and detail information to precisely retain crack edge contours. Ultimately, it improves the detection accuracy and reliability of fine and irregular cracks. Its structure is illustrated in Fig. 2.

Fig. 2. Adaptive Feature Fusion (AF2M) module.

Specifically, for the four feature maps of different scales in the decoder part, the upsampling operation first adjusts them to a uniform scale. After scale adjustment, the four feature maps are concatenated along the channel dimension. The concatenated feature maps are then input into the ECA module, where the correlation between channels is calculated and the weight of each channel is adaptively assigned using a local convolution operation to highlight important channel features [10]. At the same time, the shallow feature map is input into the GC module, which computes the global context information of the shallow features and adaptively optimizes the fusion of global context and local feature information, so that the model can better combine global and local cues. Finally, the feature maps generated by the ECA and GC modules are added element-wise to obtain the final output of the AF2M module. The detailed steps are as follows.

Iα = ECA(cat(UP(I1), UP(I2), UP(I3), I))    (1)
Iβ = GC(I)    (2)
O = Iα + Iβ    (3)

First, I1, I2, and I3 are upsampled by the UP operation and then concatenated with I along the channel dimension by the cat operation. The concatenated result is input into the ECA module to obtain the output Iα, while the shallow feature map I is input directly into the GC module to obtain the output Iβ. The final output O of the AF2M module is obtained by adding Iα and Iβ element-wise.
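As a concrete illustration of Eqs. (1)-(3), the following NumPy sketch reproduces the AF2M data flow with toy stand-ins for the two attention branches. The upsampling factors, the equal channel counts across scales, and the simplified ECA/GC branches are assumptions made for the example, not the paper's learned implementation:

```python
import numpy as np

def upsample_nn(x, factor):
    # nearest-neighbour upsampling of a (C, H, W) feature map
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def af2m_sketch(i1, i2, i3, i):
    # 1) unify scales (i1/i2/i3 assumed at 1/8, 1/4, 1/2 of i) and concatenate channels
    cat = np.concatenate([upsample_nn(i1, 8), upsample_nn(i2, 4),
                          upsample_nn(i3, 2), i], axis=0)
    # 2) ECA-branch stand-in: Sigmoid channel gate from global average pooling,
    #    then a mean over the four scale groups so shapes match the GC branch
    gate = 1.0 / (1.0 + np.exp(-cat.mean(axis=(1, 2))))
    i_alpha = (cat * gate[:, None, None]).reshape(4, i.shape[0], *i.shape[1:]).mean(axis=0)
    # 3) GC-branch stand-in: add a spatially pooled global context to the shallow map
    i_beta = i + i.mean(axis=(1, 2), keepdims=True)
    # 4) element-wise addition, Eq. (3)
    return i_alpha + i_beta
```

The sketch only shows how the four scales meet and merge; in the real module both branches contain learned convolutions.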

The architecture of the ECA module is shown in Fig. 3. Specifically, the input feature map is first processed by global average pooling (GAP), and a one-dimensional convolution operation then mines the dependencies between channels. The channel attention map is generated using the Sigmoid function [11]. Finally, the generated channel attention map is multiplied pixel-wise with the input feature map of the ECA module to obtain its output.

Fig. 3. Efficient Channel Attention (ECA) module.
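The GAP → 1-D convolution → Sigmoid → pixel-wise multiplication pipeline of ECA can be sketched as below. The averaging kernel is a hypothetical stand-in for the learned 1-D convolution weights, and the kernel size k = 3 is an assumption:

```python
import numpy as np

def eca_sketch(x, k=3):
    # x: (C, H, W) feature map
    c = x.shape[0]
    gap = x.mean(axis=(1, 2))                    # (C,) channel descriptor via GAP
    kernel = np.full(k, 1.0 / k)                 # stand-in for learned 1-D conv weights
    padded = np.pad(gap, k // 2, mode="edge")    # same-padding along the channel axis
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))           # Sigmoid channel attention map
    return x * gate[:, None, None]               # pixel-wise reweighting of the input
```

Note that the 1-D convolution runs across neighbouring channels of the pooled descriptor, which is what lets ECA model local inter-channel dependencies without a fully connected bottleneck.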

The structure of the GC module is shown in Fig. 4. The specific process is as follows: the input feature map is first processed by a 1 × 1 convolution module and a Softmax activation function, then multiplied with the input feature map to obtain the global context feature map. This feature map then passes through two 1 × 1 convolution modules and a LayerNorm module in turn to capture the dependencies between channels [12]. Finally, the processed feature map is added to the original input feature map to obtain the final output of the GC module.

Fig. 4. Global Context (GC) module.
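A minimal sketch of the GC data flow, with simple stand-ins for the learned parts (the 1 × 1 scoring convolution is replaced by a channel mean, and the two-convolution transform around LayerNorm by the normalization alone):

```python
import numpy as np

def gc_sketch(x):
    # x: (C, H, W) feature map
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    scores = flat.mean(axis=0)                   # stand-in for the 1x1 scoring conv
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                           # Softmax over spatial positions
    context = flat @ attn                        # (C,) global context vector
    normed = (context - context.mean()) / (context.std() + 1e-5)  # LayerNorm stand-in
    return x + normed[:, None, None]             # residual addition to the input
```

The key point the sketch preserves is that every spatial position contributes to a single global context vector, which is then broadcast back onto the whole map through the residual addition.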

Introduction of the channel-spatial attention mechanism

Although a single attention mechanism has proven effective in most tasks, road crack detection in open-pit mines faces challenges such as complex backgrounds and small cracks, and requires gradual refinement across multi-scale, multi-dimensional, and multi-stage features. The four attention modules used in this paper are not simply stacked but follow a three-level collaborative logic of "integration-optimization-refinement": AF2M performs cross-scale fusion to resolve blurred crack edges; ECA and GC optimize the fusion results from the channel and global-context dimensions, respectively; and CASP performs joint channel-spatial refinement before output to suppress noise and localize cracks precisely. The collaborative workflow is shown in Fig. 5 below:

Fig. 5. Collaborative flow chart of different attention mechanisms.

In this paper, a multi-attention synergy mechanism is introduced for the first time in open-pit mine crack detection: the cascaded collaboration of AF2M, ECA, GC, and CASP realizes full-pipeline optimization from feature fusion to noise suppression, significantly improving detection accuracy and robustness in complex scenarios.

In open-pit mine road crack detection, the detection targets are cracks, so attention must be paid both to their texture features and to their spatial information. Inspired by the self-attention mechanism, and to better capture inter-channel relationships and the spatial information of cracks, this paper proposes a new channel-spatial attention module (CASP), whose structure is shown in Fig. 6.

Fig. 6. Channel-Spatial Attention Module (CASP) structure.

First, the input feature x (16 × 48 × 48) is reshaped into a sequence of 16 × 2304. A linear transformation is then performed through a fully connected layer to help the model learn complex representations of the input features. Next, to realize a multi-head self-attention mechanism along the channel dimension, a reshaping operation partitions the channel dimension C; the number of channels per head after partitioning is C/N, and each group generates a tensor containing query (Q), key (K), and value (V) matrices. After extracting Q, K, and V, the Q matrix is transposed and multiplied with the K matrix: the token-wise relationship calculation (QK^T) of the standard self-attention mechanism is thereby transformed into an inter-channel relationship calculation (Q^T K), realizing channel information interaction [13] and yielding a channel attention weight matrix (G) that encodes inter-channel relationships. The channel attention formula is as follows:

G = Softmax(Q^T K / √d)    (4)

The denominator √d in Eq. (4) is a scaling operation used to adjust the scale of the attention weights. With the above steps, the model can capture the interactions and dependencies between the channels of the input feature map.
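The channel-wise multi-head attention described above can be sketched as follows. Q, K, and V are taken directly from the input features here (the module's learned linear projections are omitted), and scaling by the square root of the summed dimension is an assumption consistent with the intent of Eq. (4):

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention_qtk(x, n_heads=4):
    # x: (C, H, W) -> sequence of shape (HW, C); Q = K = V = the raw features
    c = x.shape[0]
    seq = x.reshape(c, -1).T                         # (HW, C)
    out = np.empty_like(seq)
    d = c // n_heads                                 # channels per head (C / N)
    for h in range(n_heads):
        q = k = v = seq[:, h * d:(h + 1) * d]        # (HW, d) per-head slice
        g = softmax(q.T @ k / np.sqrt(q.shape[0]))   # (d, d) channel attention, Q^T K
        out[:, h * d:(h + 1) * d] = v @ g            # reweight channels, not positions
    return out.T.reshape(x.shape)
```

Because the product Q^T K is (d × d) rather than (HW × HW), the Softmax mixes channels instead of spatial positions, which is exactly the reinterpretation CASP makes of self-attention.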

Compared with the SE attention module, this module realizes inter-channel information coding from a new perspective, which makes up for the defect that the SE attention module ignores the spatial location information of cracks.

Compared with the CBAM attention module, which relies only on average pooling and max pooling to construct channel attention, this module draws on the self-attention mechanism to compute channel relationships, helping the model learn key channel information, suppressing potential noise interference more effectively, raising the channel-spatial weights associated with crack features, and highlighting the cracks themselves [14].

The module integrates cross-channel interaction information and crack spatial location information to help the model better understand the characteristics of input data and more accurately locate and identify the crack target of open-pit mine roads.

Model compression

The increasing complexity of deep learning models in image processing tasks has sharply raised the computational and storage requirements of deep neural networks, so real-time processing and edge-device deployment face severe challenges. Model compression technology arose in this context; its core goal is to reduce model size and computational complexity while maintaining high performance, adapting models to real-time inference on low-resource devices. Pruning, quantization, and knowledge distillation constitute the current mainstream model compression techniques. Pruning removes redundant neurons or connections, reducing computation and storage requirements to simplify the model. Quantization converts high-precision floating-point weights into low-precision integers, reducing model storage occupancy and computational complexity. Knowledge distillation transfers the knowledge of a large model to a small one, and its performance-retention effect has been supported by multiple sets of experimental data.

This paper selects pruning to optimize the computation and size of the model, adopting the Layer-Adaptive Magnitude Pruning (LAMP) algorithm for the pruning operation. The algorithm introduces a novel scoring mechanism that rescales and sorts the weights of convolutional or other layers and then removes the less important weights to achieve adaptive layer sparsity [15]. Formula (5), the core formula of the LAMP algorithm, computes the weight score. Here, u and v denote indices into the magnitude-sorted weight vector, and W[u] and W[v] denote the weights at indices u and v, respectively.

score(u; W) = (W[u])^2 / Σ_{v ≥ u} (W[v])^2    (5)

The following pseudo-code shows the process of pruning the entire model :

Algorithm 1. Layer-adaptive magnitude-based pruning.

First, the sum of squares of the weights of each layer is calculated and the values are sorted in ascending order. Then, for each weight vector in each layer, the LAMP score is computed element by element. The LAMP scores of all layers are then gathered and sorted jointly. Next, the mask-matrix elements corresponding to the lowest-scoring weights are set to 0, and the weight matrix is updated according to Eq. (6); this process is repeated until the weight matrix meets the preset sparsity requirement. Equation (6) is as follows:

W'_i = M_i ⊙ W_i    (6)

where W'_i is the pruned weight matrix of layer i, W_i is the original weight matrix of layer i, and M_i is the binary mask matrix of layer i.

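The scoring and global thresholding steps above can be sketched in NumPy on raw weight arrays rather than on a network; one-shot thresholding stands in for the iterative mask update, an assumption made to keep the example short:

```python
import numpy as np

def lamp_scores(w):
    # LAMP score per weight (Eq. 5): square the weights, sort ascending by
    # magnitude, and divide each squared weight by the tail sum of squares
    # from its own sorted position onward, giving every layer a comparable scale
    flat = w.ravel()
    order = np.argsort(np.abs(flat))             # ascending by magnitude
    sq = flat[order] ** 2
    tail = np.cumsum(sq[::-1])[::-1]             # sum over v >= u in sorted order
    scores = np.empty_like(flat)
    scores[order] = sq / tail                    # largest-magnitude weight scores 1
    return scores.reshape(w.shape)

def lamp_prune(layers, sparsity):
    # gather scores from all layers, sort them jointly, and zero the globally
    # lowest-scoring weights via masks (Eq. 6); layers therefore receive
    # adaptive, non-uniform pruning rates
    scores = [lamp_scores(w) for w in layers]
    all_scores = np.concatenate([s.ravel() for s in scores])
    k = int(sparsity * all_scores.size)
    thresh = np.sort(all_scores)[k] if k > 0 else -np.inf
    return [w * (s >= thresh) for w, s in zip(layers, scores)]
```

Because each layer's scores are normalized by its own tail sums, a layer whose weights are uniformly small is not wiped out wholesale, which is the layer-adaptive property the paper relies on.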

Experimental results and analysis

Dataset and experimental settings

The self-built dataset used in this paper contains 2,847 high-resolution images collected from the actual transportation roads of a large open-pit coal mine in Inner Mongolia. Data were collected by high-definition cameras along the open-pit mine roads under different weather conditions to ensure that they fully reflect the complex and changeable road environment. The final image data include various crack types, such as transverse, longitudinal, and mesh cracks, as well as different degrees of uneven lighting, shadow interference, and road debris. Figs. 7 and 8 below show some of the raw and annotated data. The annotation protocol uses the LabelMe 3.16.7 tool for pixel-level annotation, and images are uniformly scaled to 2048 × 1024 resolution before annotation. A crack must be ≥ 2 pixels wide (corresponding to an actual width ≥ 3 mm) to be marked as a valid crack region; cracks of insufficient width or with severe occlusion (occlusion ratio > 50%) are treated as invalid samples and left unlabeled. All labels are cross-checked by two engineers with more than 3 years of experience in mine road maintenance, and inconsistencies are arbitrated by a third senior engineer to ensure consistency (Kappa coefficient = 0.91).

Fig. 7. Partial dataset display.

Fig. 8. Labeled dataset display.

Evaluation indicators

  1) Experimental setup.

The experimental platform configuration in this paper is shown in Table 1.

Table 1.

Experimental platform environment configuration.

Name Version information
CPU AMD Ryzen 7 5800H
Operating system Windows 10
Memory/GiB 64
GPU NVIDIA GeForce RTX 4060
CUDA 12.1
PyTorch 2.5.0
Python 3.10.0
Video memory/GB 8

During training, the dataset is divided into training, validation, and test sets at a ratio of 7:2:1. The training set is used to learn model parameters, the validation set to tune hyperparameters during training, and the test set to evaluate the final model. Adam is chosen as the optimizer: its adaptive learning-rate adjustment converges quickly at the beginning of training and avoids excessive oscillation when approaching the optimal solution. The learning rate is set to 0.001 and the weight decay coefficient to 0.0001 to avoid overfitting. The batch size is set to 16, which after repeated experiments strikes a good balance between training speed and memory usage. The total number of training epochs is set to 100; convergence is judged by observing performance indicators on the validation set, ensuring the model fully learns the data features without overfitting.
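
As a minimal sketch of the setup above, the 7:2:1 split and the reported hyperparameters can be written down as follows; `split_dataset`, the fixed shuffle seed, and the `TRAIN_CONFIG` dictionary are illustrative assumptions, while the numeric values come from the text:

```python
import random

# Hyperparameters reported in the text (the dictionary itself is illustrative).
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "lr": 1e-3,
    "weight_decay": 1e-4,
    "batch_size": 16,
    "epochs": 100,
}

def split_dataset(paths, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle image paths and split them into train/val/test by ratio."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)          # reproducible shuffle
    n = len(paths)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```

For the 2,847-image dataset this split yields 1,992 training, 569 validation, and 286 test images.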

  2) Evaluation indicators.

In order to comprehensively and objectively evaluate the performance of the improved open-pit mine road crack detection model, the following evaluation indicators are selected. Mean Intersection over Union (mIoU) is one of the key metrics for evaluating image segmentation models [17]. In open-pit mine road crack detection, it is calculated by dividing the intersection of the predicted and real crack areas by their union and averaging over all samples. The formula is:

$$\mathrm{mIoU}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|A_i\cap B_i\right|}{\left|A_i\cup B_i\right|}\tag{7}$$

$A_i$ represents the predicted crack area in the i-th sample, $B_i$ represents the real crack area in the i-th sample, and n is the total number of samples. The closer mIoU is to 1, the better the model segments crack areas; that is, the model can accurately identify the location and extent of cracks and effectively distinguish them from the background. For example, an mIoU of 0.8 means the predicted crack areas overlap the real crack areas by 80% on average.
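
The per-sample averaging in Eq. (7) can be expressed directly over binary masks; `miou` is a hypothetical helper name:

```python
import numpy as np

def miou(preds, targets):
    """Mean IoU over n binary crack masks: the average over samples of
    |A_i ∩ B_i| / |A_i ∪ B_i|, as in Eq. (7)."""
    ious = []
    for a, b in zip(preds, targets):
        a, b = a.astype(bool), b.astype(bool)
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        ious.append(1.0 if union == 0 else inter / union)
    return float(np.mean(ious))
```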

Precision represents the proportion of actual positive samples among all samples the model predicts as positive (in open-pit mine road crack detection, the areas predicted as cracks). The formula is:

$$\mathrm{Precision}=\frac{TP}{TP+FP}\tag{8}$$

Here, TP (True Positive) is the number of areas correctly predicted by the model as cracks, and FP (False Positive) is the number of areas the model incorrectly predicts as cracks that are not actually cracks. For example, if the model detects 100 crack areas and manual verification confirms that 80 of them are indeed cracks while the other 20 are misjudged non-crack areas, substituting into the formula gives a precision of 0.8. The higher the precision, the greater the proportion of real cracks among the predicted crack areas, and the more reliable the model's predictions.

The F1 score (F1-Score) is an evaluation indicator that jointly considers precision and recall. Its calculation formula is:

$$F1=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{9}$$

The F1 score balances precision and recall and evaluates the model's performance on the crack detection task more comprehensively. In open-pit mine road crack detection, a high F1 score indicates that the model not only identifies cracks accurately (high precision) but also detects as many cracks as possible (high recall), avoiding both missed and false detections. In addition, the number of parameters, floating-point operations (FLOPs), and inference time T are used to evaluate the complexity of the method. The parameter count reflects model size: the more parameters, the more complex the structure and the higher the storage demand. FLOPs measure the computation required for a single forward pass. Inference time T directly reflects operating efficiency in actual application scenarios: the larger T, the longer one inference takes and the worse the real-time performance. Evaluating these three indicators together gives a full and accurate picture of the method's complexity.
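
The precision of Eq. (8) and the F1 score of Eq. (9) can be sketched together; `precision_f1` is an illustrative helper, and counting pixels rather than regions is an assumption made for simplicity:

```python
import numpy as np

def precision_f1(pred, target):
    """Precision (Eq. 8) and F1 (Eq. 9) over binary crack masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()    # correctly predicted cracks
    fp = np.logical_and(pred, ~target).sum()   # false crack predictions
    fn = np.logical_and(~pred, target).sum()   # missed cracks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, f1
```

For the example in the text (100 predicted crack regions, 80 confirmed), the precision evaluates to 0.8.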

Ablation experiment

To further explore the specific contribution of each improved module to the performance of the open-pit mine road crack detection model, an ablation experiment was carried out. The experiment removes the improved modules in turn and compares the complete model with each reduced variant, clarifying the importance of every module. The results are shown in Table 2.

Table 2.

Ablation experiment.

Methods mIoU Precision F1-Score FLOPs/G Parameters (M)
U-Net 0.76 0.82 0.73 5.96 6.23
U-Net + AF2M 0.81 0.85 0.77 4.23 5.42
U-Net + CASP 0.79 0.84 0.76 3.23 3.22
U-Net + AF2M + CASP 0.83 0.89 0.82 4.25 4.73

The experimental results show that, compared with the original U-Net model, adding the AF2M module alone raises mIoU from 0.76 to 0.81, Precision from 0.82 to 0.85, and F1-Score from 0.73 to 0.77. This fully demonstrates that the AF2M module plays a significant role in integrating multi-scale features and enhancing key crack features. By dynamically weighting and adjusting features across dimensions, it captures crack boundaries and details more accurately, reduces background interference, and improves the model's recognition accuracy for crack areas [18].

When the CASP module is added individually, the model achieves an mIoU of 0.79, a Precision of 0.84, and an F1-Score of 0.76. This shows that the CASP module is effective in capturing crack texture and spatial information and strengthening channel correlation. Through its channel-spatial attention mechanism, it lets the model focus more on crack features, suppresses noise interference, and improves the model's ability to locate and identify crack targets. When both the AF2M and CASP modules are added, the performance improvement is even more pronounced, with mIoU reaching 0.83, Precision rising to 0.89, and F1-Score rising to 0.82. This further verifies the good synergy between the two modules: the AF2M module focuses on feature fusion and enhancement, preserving crack details, while the CASP module focuses on mining channel and spatial information, highlighting crack characteristics; together they enable the model to identify cracks more comprehensively and accurately, reducing false and missed detections. On the one hand, the synergy gain from combining the two is significant: mIoU increases by 2 percentage points, F1-Score by 5 percentage points, and the false positive rate falls from 18% to 9%. On the other hand, the complexity increment brought by CASP is extremely low, with only a slight increase in parameters and a FLOPs increment of less than 1%, yet it yields significant improvements on key indicators, outperforming comparable attention modules and fitting the resource constraints of mining mobile devices; its added architectural complexity is therefore fully justified.

In terms of model complexity, with the AF2M module added, FLOPs drop from 5.96G to 4.23G and parameters from 6.23 M to 5.42 M; with the CASP module added, FLOPs drop to 3.23G and parameters to 3.22 M. This shows that the two improved modules not only improve performance but also effectively reduce the model's computation and scale, improving its operating efficiency and making it more suitable for actual open-pit mine road crack detection scenarios. Figure 10 below visualizes some channel feature fusion before the improvement. The visualizations in Figs. 10 and 12 are produced with the PyTorch framework: the feature tensor output of an intermediate decoder layer is extracted, the feature values of each channel are normalized (mapped to the [0, 255] pixel range), the three channels most correlated with crack features are selected, and each single-channel feature distribution is plotted with Matplotlib's imshow function, without relying on gradient-weighted class activation mapping tools such as Grad-CAM++.
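
The visualization steps just described can be sketched as follows; the function name is illustrative, and ranking channels by correlation with the ground-truth mask is an assumption, since the text does not specify how "correlation with crack features" is computed:

```python
import numpy as np

def top_crack_channels(features, crack_mask, k=3):
    """Select the k channels most correlated with the crack mask and
    normalize each to the [0, 255] pixel range, mirroring the paper's
    described visualization procedure.

    features: (C, H, W) decoder feature map; crack_mask: (H, W) binary."""
    flat_mask = crack_mask.ravel().astype(float)
    corrs = np.array([
        np.corrcoef(features[i].ravel(), flat_mask)[0, 1]
        for i in range(features.shape[0])
    ])
    top = np.argsort(-np.nan_to_num(corrs))[:k]   # highest correlation first
    maps = []
    for i in top:
        ch = features[i]
        lo, hi = ch.min(), ch.max()
        norm = np.zeros_like(ch) if hi == lo else (ch - lo) / (hi - lo) * 255
        maps.append(norm.astype(np.uint8))
    # each map in `maps` can then be rendered with matplotlib's plt.imshow
    return top, maps
```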

Fig. 10.

Fig. 10

Part of the channel visualization before improvement.

Figure 9 is the original image, which contains the raw information of the open-pit mine road scene, including cracks, pavement, shadows, and debris, where the key crack characteristics are difficult to distinguish intuitively. In Fig. 10, owing to the limitations of the original U-Net decoder, key semantic and location features are lost during convolution and upsampling: crack features are not prominent, are vulnerable to background interference, have blurred boundaries and lost details, and small or irregular cracks are difficult to identify. After the AF2M and CASP modules are integrated (Fig. 12), the crack characteristics are significantly enhanced: the AF2M module dynamically weights the features, strengthens key crack features, and retains edge contours, while the CASP module integrates cross-channel and spatial location information to accurately locate cracks. As a result, the crack boundaries in Fig. 12 are clear, details are rich, and cracks are clearly distinguished from the background.

Fig. 9.

Fig. 9

Original image.

Model pruning

To further explore the influence of the Layer-Adaptive Magnitude Pruning (LAMP) algorithm on model performance, pruning experiments were carried out. Based on the model with the AF2M and CASP modules, different pruning rates were set to observe changes in the model's performance indicators. The experimental results are shown in Table 3:

Table 3.

Model performance under different pruning rates.

Pruning rate mIoU Precision F1-Score FLOPs/G Parameters (M) Inference time/s
0 0.81 0.85 0.78 5.12 5.87 0.42
0.1 0.82 0.86 0.75 4.97 5.23 0.39
0.2 0.84 0.89 0.77 4.63 5.07 0.37
0.3 0.83 0.89 0.82 4.25 4.73 0.30
0.4 0.78 0.83 0.76 3.95 4.32 0.28
0.5 0.71 0.76 0.69 3.47 3.62 0.23
0.6 0.61 0.67 0.60 3.13 3.12 0.20

The inference time in the table is the average time from inputting a single open-pit mine road crack image to outputting the detection result, computed as follows: 100 images randomly selected from the test set form the sample set; on the unified experimental platform (CPU: AMD Ryzen 7 5800H, GPU: NVIDIA GeForce RTX 4060, CUDA 12.1), gradient computation used for training is disabled and only the inference pass is retained; the 100 images are fed to the model at each pruning rate in turn, the per-image inference time (the total time from completion of image data loading to output of the detection result tensor) is recorded, and the mean over the 100 runs is taken as the inference time of the corresponding pruned model, eliminating the influence of incidental factors on single-image timing and ensuring data accuracy.
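
The timing protocol above can be sketched in a framework-agnostic way; `mean_inference_time` and the warm-up runs are illustrative additions not stated in the paper (in PyTorch, `model_fn` would wrap the forward pass inside `torch.no_grad()`):

```python
import time

def mean_inference_time(model_fn, images, warmup=5):
    """Average per-image inference time over a sample set, following the
    described protocol: fixed sample of images, gradients disabled,
    mean over all runs. `model_fn` maps one image to a prediction."""
    for img in images[:warmup]:          # warm-up runs, excluded from timing
        model_fn(img)
    start = time.perf_counter()
    for img in images:
        model_fn(img)
    return (time.perf_counter() - start) / len(images)
```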

From the experimental results, it can be seen that within the pruning-rate range of 0–0.3, as the pruning rate increases, the model's mIoU, Precision, and F1-Score show an overall upward trend, while FLOPs and parameters decrease significantly and the inference time gradually shortens. This shows that the Layer-Adaptive Magnitude Pruning (LAMP) algorithm can effectively remove redundant weights, significantly reduce the model's computation and scale, and improve its operating efficiency without reducing, and even while improving, detection accuracy. For example, when the pruning rate increases from 0 to 0.3, FLOPs drop from 5.12G to 4.25G, a decrease of about 17%; parameters decrease from 5.87 M to 4.73 M, a decrease of about 19.4%; inference time falls from 0.42s to 0.30s, a reduction of about 28.6%; meanwhile, F1-Score rises from 0.78 to 0.82. However, when the pruning rate rises above 0.3, model performance begins to decline. At a pruning rate of 0.4, mIoU drops to 0.78 and F1-Score to 0.76; at 0.5, mIoU falls further to 0.71 and F1-Score to 0.69; by a pruning rate of 0.6, the degradation is even more pronounced. This is because an excessive pruning rate removes important weights, damages the model's structure and feature-learning capability, and prevents it from accurately learning and identifying crack features, thereby reducing detection accuracy.
To balance model size and accuracy, this paper chooses a pruning rate of 0.3. At this rate, the model performs well on detection accuracy indicators such as mIoU, Precision, and F1-Score, while computation (FLOPs reduced to 4.25G) and scale (parameters reduced to 4.73 M) drop significantly and inference time shortens to 0.30s, striking a good balance between detection accuracy and operating efficiency. The LAMP algorithm uses layer-adaptive sparsity to eliminate redundant parameters while retaining the core weights of AF2M and CASP. Over the pruning-rate range 0–0.3, parameters decreased by 19.4% and FLOPs by 17%, with mIoU and Precision remaining stable, so high accuracy is maintained even on embedded GPUs and edge devices. At a pruning rate of 0.3, inference speed on the experimental platform improves by 28.6%, inference time on an embedded GPU is estimated at 0.6–0.8s, and memory adaptability and inference efficiency on edge devices improve markedly. Beyond a pruning rate of 0.3, accuracy drops sharply; 0.3 is therefore the unified optimal pruning rate across hardware and is well suited to open-pit mine deployment.

Model comparison experiment

To comprehensively evaluate the performance of the improved model, seven classical image segmentation models, U-Net, SegNet, FCN, DeepLabV3+, Mask R-CNN, PSPNet, and BiSeNet V2, were selected as baselines, and comparative tests were carried out on the self-built open-pit mine road crack dataset under completely consistent experimental settings to ensure fairness and comparability of the results. The results are shown in Table 4:

Table 4.

Comparison of different model results.

Methods mIoU Precision F1-Score FLOPs/G Parameters (M) Inference time/s
U-Net 0.76 0.82 0.73 5.96 6.23 0.45
SegNet 0.72 0.78 0.69 4.88 4.95 0.42
FCN 0.70 0.75 0.65 6.12 7.01 0.50
DeepLabV3+ 0.78 0.83 0.74 7.56 8.21 0.55
Mask R-CNN 0.75 0.81 0.72 8.95 9.56 0.60
PSPNet 0.77 0.80 0.71 6.89 7.85 0.52
BiSeNet V2 0.71 0.77 0.73 7.56 6.23 0.54
Ours 0.83 0.89 0.82 4.25 4.73 0.30

From the comparative results, the proposed model outperforms most baseline models in core detection accuracy and engineering practicability, but an objective caveat is needed: combined with the visualization results and statistical analysis in Fig. 11, the proposed model's values overlap with those of FCN and BiSeNet V2 on some individual indicators and do not show statistically significant superiority there. The specific analysis is as follows:

Fig. 11.

Fig. 11

Comparison of different model effect diagrams.

In terms of detection accuracy, the proposed model's core advantage is its overall lead on the key indicators: the mean Intersection over Union (mIoU) reaches 0.83, 18.6% and 16.9% higher than FCN (0.70) and BiSeNet V2 (0.71) respectively, indicating significantly better segmentation matching and range recognition for crack areas. Precision is 0.89, far higher than FCN (0.75) and BiSeNet V2 (0.77), effectively reducing misjudgments in non-crack areas. The F1 score is 0.82, 26.2% higher than FCN (0.65) and 12.3% higher than BiSeNet V2 (0.73), reflecting a balanced advantage in both accurate identification and avoiding missed detections. However, from the perspective of statistical testing (e.g., an independent-sample t-test), the proposed model does not differ significantly from FCN and BiSeNet V2 on some indicators (for example, BiSeNet V2's F1-Score of 0.73 is not significantly different from the proposed model's 0.82, P > 0.05). This is related to the complexity of open-pit mine road scenes: strong noise, extreme lighting, and other interference affect the feature extraction of different models to different degrees, causing local overlap in performance on specific indicators.
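
The independent-sample comparison referenced above can be sketched with Welch's t statistic computed over per-image scores; the specific test variant is an assumption, as the text does not name one (a p-value would then be obtained from the t distribution with the returned degrees of freedom, e.g. via `scipy.stats`):

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom for
    comparing, e.g., per-image F1 scores of two models."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```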

In terms of model complexity and engineering practicability, the advantages of the proposed model are significant: its FLOPs are only 4.25G, 30.6% and 43.8% lower than FCN (6.12G) and BiSeNet V2 (7.56G) respectively; its parameter count is reduced to 4.73 M, 32.5% and 24.1% lower than FCN (7.01 M) and BiSeNet V2 (6.23 M); and its inference time shortens to 0.30s, 40.0% and 44.4% faster than FCN (0.50s) and BiSeNet V2 (0.54s), making it better suited to real-time detection and edge-device deployment in open-pit mines. Although FCN and BiSeNet V2 compete locally with the proposed model on some accuracy indicators, their heavy computation and poor real-time performance make it difficult for them to meet the practical engineering requirements of open-pit mine road detection.

In summary, although the proposed model does not show statistically significant superiority over FCN and BiSeNet V2 on every indicator, it retains clear advantages across the combined dimensions of detection accuracy, model complexity, and engineering practicability: the core accuracy indicators (mIoU, precision, F1-Score) lead comprehensively, and the model's lightweight design and real-time performance are significantly better than both, so it can effectively meet the practical needs of road crack detection in open-pit mines. In the future, introducing anti-interference feature enhancement modules and optimizing the sample distribution could widen the statistical gap with such models and further improve the model's robustness and significance advantages.

Figure 11 compares the detection effects of the different models and visually shows their performance differences. The other classical models have obvious shortcomings: U-Net seriously misses small and occluded cracks; SegNet produces blurred crack boundaries that cannot support crack-severity assessment; FCN has a high background false-positive rate, increasing the cost of manual screening; and BiSeNet V2 lacks crack recognition ability in complex backgrounds, with many local missed detections. The improved model in this paper clearly outlines crack contours, accurately identifies small, occluded, and low-contrast cracks, and offers better boundary continuity and integrity, providing a more reliable visual basis for subsequent maintenance decisions.

The comparison of detection effect diagrams in Fig. 11 shows the advantages of the improved model even more intuitively: the differences in the detection results of the same open-pit mine road image across models are clear at a glance. The other classical models each exhibit problems in detection. The U-Net model identifies most cracks but misses some small or partially occluded ones, so the crack areas it outputs are incomplete. The crack boundaries detected by the SegNet model are relatively vague, making it difficult to determine the actual extent of cracks, which may affect crack-severity evaluation in practical applications. The FCN model misjudges some background areas as cracks, increasing the workload of subsequent manual screening. The improved model's results in Fig. 11 are more accurate: it clearly outlines crack contours and correctly identifies both small cracks and cracks in complex backgrounds. Even cracks whose color resembles the surrounding environment and is easy to confuse are accurately detected, with clear, continuous boundaries that completely present the crack shape. This not only helps staff quickly locate cracks but also supports an initial judgment of their severity from shape, size, and other information, providing a reliable basis for subsequent maintenance decisions.

Fig. 12.

Fig. 12

The improved partial channel visualization map.

The prediction results in Fig. 13 further reflect the value of the improved model in practical applications. The figure shows crack detection results for open-pit roads in multiple scenarios. Under different illumination and road conditions, the improved model stably outputs high-quality detections: in strongly lit areas it neither misjudges nor misses cracks due to overexposure, and on shadow-covered roads it still accurately identifies cracks hidden in the shadows. This shows that the improved model adapts well to complex environments and greatly improves detection reliability. From a broader perspective, these results matter for the safe and efficient operation of open-pit mines. Accurate crack detection helps mine managers grasp road damage in time and arrange road maintenance plans reasonably. In the past, limited detection methods made road maintenance somewhat blind: either excessive maintenance wasted resources, or untimely maintenance allowed road defects to worsen. With the improved model's accurate detection, managers can formulate maintenance plans according to the actual condition of cracks, repairing slight cracks promptly to prevent further deterioration and arranging specialized maintenance teams for large-scale repair of serious cracks, keeping the road in a safe and stable state. This prolongs road service life, reduces maintenance costs, and cuts transportation interruptions and safety accidents caused by road problems, safeguarding the normal production order of the open-pit mine.

Fig. 13.

Fig. 13

The actual prediction result diagram.

With the continuous development of technology, the improved model can be combined with additional technical means to further raise the intelligence level of open-pit mine road crack detection. For example, combined with UAV technology, rapid inspection of large areas of open-pit roads becomes feasible: images collected by a UAV's high-definition camera can be analyzed in real time by the improved model, greatly improving detection efficiency [19]. Combining the detection results with a Geographic Information System (GIS), a digital road-defect map can be constructed to visually display the distribution of road cracks, which is convenient for unified management and scheduling. In addition, by accumulating and analyzing large volumes of detection data, a road-defect prediction model can be built to forecast crack development in advance, providing a more scientific basis for preventive maintenance and pushing open-pit mine road maintenance toward intelligent, refined operation.

Conclusion

In this paper, an improved detection method based on feature fusion is proposed to address road crack detection in open-pit mines. By introducing the Adaptive Feature Fusion Module (AF2M) and the Channel-Spatial Attention Module (CASP) into the original U-Net network, and optimizing the model with the Layer-Adaptive Magnitude Pruning (LAMP) algorithm, the performance of open-pit mine road crack detection is effectively improved.

(1) Through its processing of feature maps at different scales, including upsampling, concatenation, channel attention computation, and global context fusion, the AF2M module dynamically weights features, strengthens key crack features, accurately preserves crack edge contours, and reduces background interference.

(2) Inspired by the self-attention mechanism, the CASP module encodes inter-channel information from a new perspective and integrates cross-channel interaction information with crack spatial location information. It captures crack texture and spatial information and strengthens channel correlation while suppressing noise interference, helping the model locate and identify crack targets more accurately and effectively compensating for the shortcomings of traditional attention modules.

(3) The LAMP algorithm performs well in model compression. Within a reasonable pruning-rate range (0–0.3), it removes redundant weights, reduces the model's computation and scale, shortens inference time, and improves detection accuracy. At a pruning rate of 0.3, the model achieves a good balance between detection accuracy and operating efficiency.

(4) The experimental results fully verify the effectiveness of the proposed method. The ablation experiments show that the AF2M and CASP modules each contribute significantly to model performance and work well together. The pruning experiment determines the optimal pruning rate, at which model performance is markedly optimized. Compared with other classical image segmentation models, the improved model performs well on detection accuracy indicators such as mean Intersection over Union, precision, and F1 score, and also holds advantages in complexity indicators such as computation, model scale, and inference time.

In practical applications, the improved model can help open-pit mine managers grasp road cracks in a timely and accurate manner, providing a reliable basis for road maintenance decisions, which is of great significance for safe production and operational efficiency in open-pit mines. In the future, combining it with UAV, GIS, and other technologies is expected to further improve the intelligence and refinement of open-pit mine road crack detection and advance road maintenance. However, the method still has certain limitations: first, the false positive and missed detection rates tend to rise in strong-noise scenes (such as ore debris and water reflections); second, feature extraction is noticeably biased under extreme lighting (strong light shadow, low illumination); third, the small sample size and class imbalance of special cracks (mesh and subsidence cracks) lead to high generalization error for such cracks in cross-mine scenarios.

In the future, optimization can proceed from three directions: first, introduce RGB-IR dual-modal fusion and noise-aware training to strengthen crack discrimination in noisy scenes; second, combine Retinex-based illumination separation with meta-learning to improve adaptability to extreme lighting; third, use a diffusion model to generate special crack samples and integrate a prototype network to mitigate the small-sample and class-imbalance problem. At the same time, combining UAV on-device real-time inference with GIS-based defect prediction can promote the transformation of open-pit mine road maintenance toward intelligent, preventive operation.

Author contributions

L W, G L: Provide direction and ideas. M Z, J L: Coding and Writing. Z Y: Algorithm Improvements. Q W, G Y: Provide data sets and create data.

Funding

This study was not funded.

Data availability

The data used and analyzed during the current study are available from the corresponding author upon reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Sun Shuwei, L. & Yuan, H. Jiabing. Comparative study on failure mechanism of slopes with different rock and soil strengths in open-pit mines. Coal J. 49 (S2), 746–755 (2024).
  • 2. Dong Ye, R. et al. Research progress of intelligent detection and analysis algorithms for pavement distress. J. Southeast Univ. (Natural Sci. Ed.), 1–26 (2025).
  • 3. Jinxu, W. & Yun, W. Brain tumor segmentation based on residual mixed attention and adaptive feature fusion. Computer Application Research, 1–9 (2025).
  • 4. Qi Lei, L. et al. U-Net leaf defect image segmentation based on attention mechanism. Chin. J. Saf. Sci. 34 (05), 139–146 (2024).
  • 5. Gui, Y. et al. Pavement crack segmentation method based on CNN and scale-adaptive transformer fusion network. China J. Highways 37 (12), 418–432 (2024).
  • 6. Jiang Song, L. et al. Intelligent identification of landslide disasters based on deep learning of UAV images. Chin. J. Saf. Sci. 34 (07), 229–238 (2024).
  • 7. Yaowei, L. et al. Crack detection of underwater bridge structures based on image enhancement and improved U-Net fusion. Eng. Mech. (S1), 276–282 (2025).
  • 8. Chen Ximing, Y., Xin & Ren Kaiyu et al. Research on dual-task detection method of surface cracks in coal mine goaf. J. Remote Sens. 28 (12), 3271–3286 (2024).
  • 9. Gao Pengfei, Z., Liya & Wang Yukun et al. Segmentation and detection of pavement cracks with multiple attention mechanisms. Adv. Laser Optoelectron., 1–14 (2025).
  • 10. Duan Zhongxing, H. et al. An effective feature extraction and cascade optimization method for pavement crack detection. J. Comput. Aided Des. Graphics 36 (12), 2020–2028 (2024).
  • 11. Yang, B. et al. Tracking the weld seam under strong interference in laser-arc hybrid welding via a novel local-add U-Net. Eng. Appl. Artif. Intell. 151, 110778 (2025).
  • 12. DMFPNet: dual-path high-resolution remote sensing image segmentation algorithm for enhancing multi-scale target perception. J. Earth Inform. Sci. 27 (05), 1195–1213 (2025).
  • 13. Shu, X. et al. Adaptive encoding and comprehensive attention decoding network for medical image segmentation. Appl. Soft Comput. 174, 112990 (2025).
  • 14. Lu Liwei, W. et al. Skin cancer U-Net segmentation model based on multi-scale channel fusion attention. Inf. Control, 1–12 (2025).
  • 15. Balaha, M. H. et al. AOA-guided hyperparameter refinement for precise medical image segmentation. Alexandria Eng. J. 120, 547–560 (2025).
  • 16. Khan, H. H. & Khan, I. M. Correction: An optimized deep focused U-Net model for image segmentation. Neural Comput. Appl., 1–2 (2025).
  • 17. Liang, Z. et al. Investigation of three-dimensional aggregate contact evolution using an enhanced image segmentation algorithm. Constr. Build. Mater. 468, 140371 (2025).
  • 18. Qin Hanxuan, G., Lei & Zhang Wenfang et al. Risk analysis of houses in landslide areas based on the characteristics of wall cracks. Chin. J. Saf. Sci. 35 (03), 133–141 (2025).
  • 19. Zhu, P. & Liu, J. Joint U-Nets with hierarchical graph structure and sparse transformer for hyperspectral image classification. Expert Syst. Appl. 275, 127046 (2025).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data used and analyzed during the current study are available from the corresponding author upon reasonable request.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group

RESOURCES