PLOS One. 2025 May 22;20(5):e0324512. doi: 10.1371/journal.pone.0324512

Enhanced YOLOv8-based pavement crack detection: A high-precision approach

ZuXuan Zhang 1, HongLi Zhang 1,*, TongJia Zhang 2
Editor: Ahmed M Yosri
PMCID: PMC12097708  PMID: 40403041

Abstract

At present, crack repair is still carried out manually, which suffers from low identification efficiency and high labor cost. Crack detection is the key to mechanized, intelligent crack repair. To address these problems, this study proposes YOLOv8-DGS, an improved automatic recognition algorithm based on the YOLOv8 model. First, depthwise separable convolution (DWConv) is introduced into the YOLOv8 backbone to capture crack information more flexibly and improve the model's recognition accuracy. Second, GSConv is used in the neck to reduce computation and enhance feature representation, especially for multi-scale crack features. With these improvements, YOLOv8-DGS not only improves accuracy on small cracks but also preserves the real-time performance and efficiency required by intelligent joint-filling equipment in practice. Experimental results show that the Precision, Recall, F1-score, mAP50 and FPS of YOLOv8-DGS in pavement crack detection are 91.6%, 90%, 90.8%, 92.4% and 85 frames per second, respectively. At the same time, the recognition rate for each crack type exceeds 86%, an increase of 20.5% over the YOLO11 model. This method provides a theoretical basis for automatic crack identification and technical support for automatic seam filling machines.

1. Introduction

Pavement cracks are among the key factors affecting road safety and service life. With accelerating urbanization, pavement damage has become increasingly serious; cracks in particular not only compromise road safety but also accelerate pavement aging [1]. Traditional crack detection usually relies on manual inspection: road inspectors observe cracks on the highway with the naked eye, which generally takes 2–3 hours per kilometer. This approach is time-consuming and laborious, extremely dangerous on heavily trafficked sections, and prone to missed detections, so the results are often unsatisfactory. In recent years, with the rapid development of computer vision and deep learning, pavement crack detection based on deep learning has gradually become a research hotspot and has been widely applied in intelligent engineering equipment [2].

At present, the YOLO (You Only Look Once) family of models has achieved remarkable results in object detection thanks to its high detection speed and accuracy. YOLOv8, a classic version of the series, is widely used for object detection. In practice, however, YOLOv8's small-target detection in complex scenes is still not accurate enough to meet the needs of intelligent devices. Specifically, the current YOLOv8 model struggles to achieve sufficiently high accuracy on small cracks, irregularly shaped cracks, and partially occluded cracks. In addition, existing pavement crack detection methods such as CrackFormer mostly focus on the detection task itself; how to further improve the robustness and practicality of the algorithm for real-time, accurate crack detection on intelligent crack-filling equipment remains an open technical problem.

To automate crack repair, reliance on manual inspection of road defects must be eliminated: manual inspection is inefficient, subject to the subjective judgment of maintenance personnel, and prone to misdiagnosis and missed diagnosis [3]. This study builds on previous research and recent advances in computer vision. Cai et al. [4,5] used the YOLACT network to identify ice in images of cold oceans, improving the detection accuracy of ice floes captured in complex polar environments, and used improved image processing algorithms to estimate the size of cracks around the identified ice. Liu et al. [6] proposed a Crack Transformer network (CrackFormer) for fine-grained crack detection, addressing the common problems of uneven intensity, complex topology, low contrast, and noisy backgrounds in crack images; although the background problems are alleviated, small targets are still misdetected or missed. With the rapid development of deep learning, image-based crack detection faces increasingly complex challenges [7]. Jia et al. [8] used an improved VarifocalNet model to detect photovoltaic module defects, designing two new bottleneck modules to increase the network depth and receptive field and thereby improve the speed and accuracy of defect detection. However, traditional convolutional neural networks still have limitations on complex recognition tasks, and deep learning techniques continue to be optimized to further improve recognition accuracy. To address the poor recognition and classification performance, slow recognition speed, and weak generalization of current methods, Jiang et al. [9] proposed an algorithm based on improved YOLOv8 that integrates the CloFormer module to better extract high- and low-frequency features from marine debris images and improve overall model performance. For locating hidden cracks in asphalt pavement, Zhen et al. [10] added ResNet50vd to YOLOv3; the improved convolution uses a hyperparameter optimization method based on Bayesian search, which enhances the model's detection capability. In addition, Liu et al. [11] proposed a YOLOv3 model with four-scale detection layers, which effectively reduces the missed detection rate for small-target crack features. Notably, image quality is closely related to the accuracy and speed of model recognition. Sun et al. [12] proposed an image enhancement algorithm for SLAM that addresses the significant drop in detection accuracy in vision-degraded scenes and markedly improves the performance of mainstream detection networks such as YOLOv3, Faster R-CNN, and DetectoRS. Addressing the shortcomings of deep learning in data acquisition and defect statistics for crack detection, Cai et al. [13] proposed a borehole rescue command and decision system for underground disaster areas based on multi-source heterogeneous data fusion; the system's human-pose fusion image recognition algorithm outperforms traditional algorithms on all metrics, and recent studies have demonstrated the effectiveness of such data fusion in real-time identification of critical situations in disaster areas. Xu et al. [14] designed a three-dimensional reconstruction and geometric topography analysis method for small lunar craters, obtaining lunar-surface point clouds through an aggregated stereo matching network; tests on the KITTI2015 dataset demonstrated the system's advantages in real-time performance and effectiveness.

Convolutional neural networks are well suited to complex, small-target detection tasks such as road cracks [15]. This study improves the YOLOv8 model and proposes a high-precision crack identification method suitable for intelligent seam filling machines, based on the characteristics of road crack detection. To reduce computational complexity and improve inference speed, the convolution modules in the backbone and neck are replaced with more lightweight depthwise separable convolution and group shuffle convolution modules. Before training, image filtering and histogram equalization are applied to improve the robustness of the model, so as to meet the real-time detection requirements of the intelligent seam filling machine. The goal of this study is to raise the precision of crack detection to 91.6% by improving the YOLOv8 model while maintaining high speed, thereby providing technical support for the automatic operation of intelligent seam filling machines.

2. Materials and methods

2.1. Datasets

This study uses the public RDD2022 road damage dataset, combined with road crack images collected by the authors, for training and testing. RDD2022 contains road defect data from six countries: China, Japan, the Czech Republic, Norway, the USA, and India. High-quality images in RDD2022 were manually screened and combined with locally collected pavement crack images to build the dataset, which was then sorted into categories by script. The dataset has four labels: D00 (longitudinal crack), D10 (transverse crack), D20 (block crack) and D30 (irregular crack). The result is a rich, balanced pavement crack dataset suited to this study.

As shown in Table 1, a total of 4878 images were selected for this dataset, containing 4104 D00 instances, 2359 D10, 934 D20, and 321 D30. First, the images were cropped to a resolution of 640 × 640 to fit the training parameters. The data were then expanded to 19512 images by rotation, translation, and similar operations. Finally, to remove noise and blur, the filter equalization operation was used to further enhance the data. Data partitioning is a crucial step in model development; common training-to-test ratios are 9:1, 8:2, and 7:3. A high proportion of training data improves the model's learning ability, but an insufficient test sample makes the results unstable and fails to reflect the model's generalization ability; conversely, a high proportion of test data leaves the model unable to learn some features fully. Therefore, an 8:2 ratio is adopted for this dataset to ensure both sufficient training data and stable, low-variance evaluation results. Through these operations, a road crack dataset covering different lighting conditions, occlusions, and pavement materials is produced, and a model trained on it generalizes better to different scenes.
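The 8:2 partition described above can be sketched as follows; the file-naming scheme and random seed are illustrative assumptions, not the paper's actual code.

```python
import random

def split_dataset(image_paths, train_ratio=0.8, seed=42):
    """Shuffle the image list reproducibly and split it at train_ratio."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# Illustrative file names for the 19512 augmented images
images = [f"crack_{i:05d}.jpg" for i in range(19512)]
train, test = split_dataset(images)  # 15609 train, 3903 test
```

Shuffling before the split keeps the class distribution roughly balanced between the two subsets.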

Table 1. Classification of pavement cracks.

Type | Feature code | Label | Number
Longitudinal crack | D00 | C | 4104
Transverse crack | D10 | L | 2359
Block crack | D20 | B | 934
Irregular crack | D30 | I | 321

The rectangular boxes of different colors in Fig 1(a)-(d) mark the feature boxes of longitudinal, transverse, block, and irregular cracks, respectively; the labelimg software [16] was used for annotation. The causes of cracks fall roughly into four categories: pavement material characteristics, climate conditions, traffic load, and design standards, which subject the pavement to transverse or longitudinal forces that produce transverse or longitudinal cracks. Transverse cracks extend horizontally, perpendicular to the road's center line, with uniform spacing. Longitudinal cracks extend parallel to the center line and have local branches. Block cracks are a criss-cross crack pattern typical of asphalt pavement; when longitudinal and transverse cracks continue to develop, they readily merge into block cracks. Temperature fatigue, reflection cracking, and asphalt aging can also cause block cracks [17].

Fig 1. Sample label.


2.2. Image Preprocessing

2.2.1. Histogram equalization.

The histogram of an image represents the frequency of its different gray levels (brightness values). The goal of histogram equalization [18] is to redistribute the image's gray levels so that the histogram becomes more uniform, thereby improving the contrast of the image. Since each image is composed of discrete pixels, solving the equalization requires the discrete form of the cumulative distribution function, which accumulates the occurrence frequency of each gray level together with all preceding gray levels. Its calculation formula is as follows:

CDF(i) = \sum_{j=0}^{i} p(j) (1)

Where p(j) is the probability density of gray level j, and i is the gray-level value.

In order to map the range of the cumulative distribution function to [0, 255], normalization is usually performed:

CDF_norm(i) = (CDF(i) - CDF_min) / (N × M - CDF_min) × 255 (2)

Where CDF_min is the minimum value of the cumulative distribution function, and N and M are the numbers of rows and columns of the image, respectively. According to the normalized CDF value, the gray values of the original image are mapped to new gray values, generating the enhanced image.

The traditional histogram equalization method may cause detail loss or noise for the image with good contrast, which will affect the image quality.
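The equalization described by Eqs. (1)-(2) can be sketched in NumPy as follows; this is a minimal illustration for 8-bit grayscale images, not the paper's implementation:

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalization of an 8-bit grayscale image via Eqs. (1)-(2)."""
    hist = np.bincount(img.ravel(), minlength=256)   # frequency of each gray level
    cdf = np.cumsum(hist)                            # Eq. (1): cumulative distribution
    cdf_min = cdf[cdf > 0].min()                     # smallest nonzero CDF value
    # Eq. (2): normalize the CDF to [0, 255] and build a lookup table
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]                                  # remap every pixel through the table
```

Because the mapping is a single lookup table, the whole image is remapped in one vectorized indexing operation.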

2.2.2. Filter equalization enhancement.

To address the noise amplification and over-enhancement caused by histogram equalization, this paper designs a filter equalization enhancement algorithm based on the application requirements of the intelligent seam filling machine. The method computes the image contrast repeatedly: when the contrast exceeds the preset threshold, the image is output directly; when it falls below the threshold, contrast-limited histogram equalization is applied, effectively avoiding the over-enhancement of the traditional equalization method.

Fig 2 shows the flow of the filter equalization enhancement algorithm. The initial contrast threshold is set to T = 1.8 in the experiment. When a road crack image is taken as input, bilateral filtering is first applied, with a spatial distance parameter of 4 and spatial and color-space parameters of 75 and 80, respectively, to obtain the filtered image. The contrast value of the filtered image is then calculated. If the contrast T < 1.8, contrast-limited histogram equalization is applied; otherwise, the image is output directly. The mathematical expression of the bilateral filter is as follows:

Fig 2. Filter equalization enhancement algorithm flow.


I'(x) = \frac{1}{W_p} \sum_{x_i \in \Omega} I(x_i) \, f(\lVert x_i - x \rVert) \, g(\lvert I(x_i) - I(x) \rvert) (3)

Where f(·) is the spatial distance weight function, g(·) is the pixel intensity weight function, and W_p is the normalization coefficient.

Contrast calculation formula is as follows:

T = σ_I / μ_I (4)

Where σ_I and μ_I are the standard deviation and mean of the image, respectively, which quantify the basis for contrast adjustment.

The distance function and kernel function are the two important functions constituting the bilateral filter. The filter computes a weighted average of the target pixel and its surrounding pixels, which effectively smooths the image and removes noise while preserving the sharpness of image edges. Histogram equalization improves contrast by adjusting the gray-level distribution, and the added constraint parameters reduce noise amplification and detail loss. The enhanced crack images generated by this algorithm have higher visual clarity and better crack feature expression, meeting the real-time crack identification and detection requirements of intelligent seam filling machines.
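A minimal sketch of this flow, assuming the bilateral-filtering step has already been applied, and substituting a simple global contrast-limited equalization for a full tiled CLAHE implementation (the clip fraction is an illustrative assumption):

```python
import numpy as np

def clipped_equalize(img, clip_fraction=0.02):
    """Global contrast-limited equalization: clip the histogram so no gray
    level dominates, redistribute the excess uniformly, then equalize."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    limit = clip_fraction * img.size
    excess = np.maximum(hist - limit, 0.0).sum()
    hist = np.minimum(hist, limit) + excess / 256.0   # redistribute clipped mass
    cdf = np.cumsum(hist)
    lut = np.round((cdf - cdf.min()) / (cdf.max() - cdf.min()) * 255).astype(np.uint8)
    return lut[img]

def filter_equalize(filtered_img, threshold=1.8):
    """Decision step of Fig 2: compute the contrast T of Eq. (4) on the
    (already bilaterally filtered) image; output it unchanged if T is at or
    above the threshold, otherwise apply contrast-limited equalization."""
    t = filtered_img.std() / filtered_img.mean()
    return filtered_img if t >= threshold else clipped_equalize(filtered_img)
```

Only low-contrast images pass through the equalization branch, which is what prevents the over-enhancement the section describes.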

2.2.3. Data enhancement processing comparison.

In image recognition research, the quality of the dataset directly affects model performance, so the crack dataset must be preprocessed effectively before recognition. Traditional histogram equalization, a common image processing method [19], can improve the contrast and visual effect of images, but its performance in pavement crack detection is mediocre, and it struggles to highlight detailed crack features. Therefore, this study proposes an image enhancement technique; its effect is compared in Fig 3.

Fig 3. Comparison of pretreatment effect.


Fig 3 compares the preprocessing effects on irregular road cracks from the self-made dataset, showing the original block crack image, the image after histogram equalization, and the image after mean filtering. Traditional histogram equalization equalizes the whole image globally without fully considering image details, so noise increases noticeably and crack information is masked by other features.

To overcome the shortcomings of traditional histogram equalization, this paper proposes the filter equalization enhancement algorithm. First, the crack image is bilaterally filtered to remove noise effectively while preserving edge details. Then histogram equalization with contrast limiting is applied to precisely control the contrast enhancement. Noise amplification and over-enhancement have always plagued traditional methods; limiting the contrast parameter avoids both, significantly improving image quality and the model's crack recognition ability.

This experiment used a comparative method: the unimproved YOLOv8 model was trained on the original dataset and on the equalized-and-filtered dataset with all other parameters identical, and the accuracies of the two runs were compared. The training results before and after filter equalization are shown in Table 2.

Table 2. The training results before and after filtering equalization.
Fracture type | Before equalizing filter (%) | After equalizing filter (%)
Longitudinal | 61.5 | 74.7
Transverse | 60.0 | 72.5
Block | 68.5 | 78.6
Irregular | 62.5 | 70.9

As Table 2 shows, the recognition achieved with the equalization-filtered dataset is better than that achieved with the original dataset.

3. Detection algorithm improvement

3.1. Algorithm overview and detection process

YOLOv8 is a classic version of the YOLO family of algorithms. It combines a lightweight backbone (CSPDarknet), effective multi-scale feature fusion in the neck via PANet (Path Aggregation Network) and FPN (Feature Pyramid Network), and an anchor-free detection head, giving it strong real-time detection ability. Through the collaboration of backbone, neck, and head [20], YOLOv8 strikes a good balance between accuracy and speed and has broad application prospects in object detection.

The YOLOv8 algorithm was used to train the self-made dataset; the model training process in this study is shown in Fig 4. First, data from the open RDD2022 dataset were selected and combined with data collected on site by the authors, and the labelimg annotation tool [21] was used to label the pavement crack images and produce a dataset usable for YOLO training. Second, the dataset was divided into training and test sets, and the training set was processed with filter equalization. Finally, the model was trained and evaluated on the test set to obtain its crack recognition results.

Fig 4. Pavement crack recognition process based on YOLOv8 modeling.


3.2. YOLOv8 model improvements

The improved model proposed in this study, YOLOv8-DGS, optimizes the computational complexity and feature extraction capability of the YOLOv8 model, focusing on the backbone and neck structures. Keeping the number of layers unchanged, we introduce Depthwise Separable Convolution (DWConv) and Group Shuffle Convolution (GSConv) to reduce computation and improve detection accuracy.

YOLOv8's backbone extracts feature information at different scales. Standard convolution is computationally expensive and highly redundant in feature extraction. To improve efficiency, we replace the standard convolution modules at layers 0, 1, 3, and 5 of the backbone with DWConv (the light-blue modules in Fig 5). The core idea of DWConv is to split standard convolution into depthwise convolution and pointwise convolution. Depthwise convolution operates within a single channel rather than across channels as standard convolution does, reducing computation; pointwise convolution then uses 1 × 1 convolution to aggregate information across channels and restore feature representation capability. With DWConv, the backbone reduces parameters and computation while retaining strong feature extraction, processing crack texture information more efficiently.

Fig 5. Structure of YOLOv8-DGS.


YOLOv8's neck performs multi-scale feature fusion, usually using upsampling and downsampling operations to enhance detection robustness. Traditional convolution is computationally heavy in this part, so we replace all standard convolutions in the downsampling path with GSConv (the orange modules in Fig 5). GSConv divides the input channels into groups via grouped convolution, with each group convolved independently, reducing computational complexity. A channel shuffle operation then lets features from different groups interact, compensating for the information loss caused by grouped convolution. GSConv enables the neck to fuse information across scales more efficiently, reducing computational cost, improving feature transmission, and enhancing crack detection accuracy.

3.2.1. DWConv depth-wise convolution.

In the traditional convolution operation, assume the input feature map has size H × W × C_in and the output feature map has size H × W × C_out, where C_in is the number of input channels and C_out is the number of output channels. A convolution is computed for every input-output channel pair. With a kernel of size K × K, the number of parameters of each convolution layer is:

Number of parameters = K × K × C_in × C_out

As the numbers of input and output channels increase, the amount of computation and the number of parameters increase dramatically, making deep neural networks very time-consuming to train and run, especially on mobile or embedded devices.

In this case, Depthwise Separable Convolution [22] is introduced; it significantly reduces computational complexity by decomposing the traditional convolution into two simpler operations: depthwise convolution and pointwise (1 × 1) convolution.

Depthwise convolution operates on each input channel separately rather than across channels as traditional convolution does. For each input channel, an independent K × K kernel performs the convolution, and the operations on different channels do not interfere with each other; each kernel is responsible only for its corresponding input channel and is no longer combined with other channels as in standard convolution. This greatly reduces both the amount of computation and the number of parameters. After depthwise convolution, pointwise (1 × 1) convolution is used to fuse the individual input channels into the output channels: it linearly combines the depthwise outputs to generate the final output channels, restoring the channel dimension of the feature map.

For a depthwise separable convolution, we first perform the depthwise convolution (one kernel per input channel):

Number of depthwise convolution parameters = K × K × C_in

Then pointwise convolution merges the features:

Number of pointwise convolution parameters = C_in × C_out

Thus, the total number of parameters of the depthwise separable convolution is:

Total parameters = K × K × C_in + C_in × C_out

In this way, depthwise separable convolution greatly reduces the number of parameters and the amount of computation in the model. The depthwise separable convolution module and its operation process are shown in Fig 6.

Fig 6. Structure diagram of the depthwise separable convolution algorithm.

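The parameter-count comparison above can be checked with a small sketch; the 3 × 3 kernel with 64 input and 128 output channels is an illustrative assumption:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

def dwconv_params(k, c_in, c_out):
    """Depthwise separable convolution: a k x k depthwise kernel per input
    channel plus a 1 x 1 pointwise convolution across channels."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3 x 3 kernel, 64 input channels, 128 output channels
standard = conv_params(3, 64, 128)     # 73728 parameters
separable = dwconv_params(3, 64, 128)  # 576 + 8192 = 8768 parameters
```

For this layer the separable form needs roughly 8.4× fewer parameters, and the saving grows with the number of output channels.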

Unlike traditional convolution, depthwise separable convolution changes the internal computational structure while keeping the dimensions of the input and output feature maps unchanged. It can therefore replace traditional convolution without adding or removing backbone layers. Specifically, depthwise separable convolution expands the range of receptive fields by decomposing the convolution operation so that the kernel of each channel processes data from a single channel only [23]. This helps the model better capture detail in crack images, greatly improving the accuracy and efficiency of detection.

3.2.2. GSConv group shuffle convolution.

GSConv (Group-Shuffle Convolution) [24] (Fig 7) is an improved convolution operation designed to address the excessive computation and parameter counts of convolutions in deep neural networks while improving the model's efficiency and expressiveness. It combines grouped convolution and channel shuffle to enhance the network's learning ability at low computational cost, especially for tasks such as small object detection.

Fig 7. GSConv (Group-shuffle Conv) structure diagram.


In traditional convolution, the input feature map is convolved with the kernel to produce the output feature map. As network depth and the dimensionality of the input feature map increase, the computation and parameter count of the convolution grow rapidly. Grouped convolution [25] was introduced to reduce them: the channels of the input feature map are divided into groups, and each group is convolved separately. However, grouped convolution has a limitation: there is no information exchange between channels of different groups, which weakens feature expression. Channel shuffle was introduced to overcome this problem: by rearranging the channels of each group in a specific order, GSConv ensures efficient information exchange between channels, so the model obtains richer feature representations while maintaining low computational complexity.

GSConv introduces channel shuffle, which disrupts and rearranges the connections between channels after a grouped convolution, facilitating the flow and sharing of information across groups. Specifically, GSConv consists of two key steps:

Grouped convolution: first, the input feature map is divided into groups, each convolved independently, which reduces the computation and parameter count of each group's convolution.

Channel shuffle: after the grouped convolution is complete, the channels are rearranged according to a fixed policy so that channels from different groups are interleaved, realizing information flow across groups and enhancing feature representation.
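The channel shuffle step can be sketched in NumPy as the usual reshape-transpose-reshape trick; this is an illustration, not the paper's implementation:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channel axis of a (C, H, W) feature map: view C as
    (groups, C // groups), swap the two group axes, and flatten back,
    so that channels from different groups become interleaved."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# Channels [0, 1, 2 | 3, 4, 5] in two groups interleave to [0, 3, 1, 4, 2, 5]
x = np.arange(6).reshape(6, 1, 1)
shuffled = channel_shuffle(x, groups=2)
```

Because the operation is a pure permutation of channels, it adds no parameters and negligible computation.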

The improvement in feature extraction of the improved model over the original is shown in Fig 8. The FLOPs, parameter counts, and layer counts of the improved model and the comparison models are shown in Table 3.

Fig 8. Comparison of feature extraction heat map before and after the improved algorithm.


Table 3. Different model parameters and test results.
Model | FLOPs (B) | Params (M) | Layers
YOLOv5s | 15.9 | 7.21 | 213
YOLOv8n | 8.7 | 3.20 | 168
YOLOv8s | 28.6 | 11.2 | 168
YOLOv8-DW | 7.3 | 10.13 | 187
YOLOv8-GS-slim | 23.0 | 8.88 | 266
YOLO11s | 21.7 | 9.45 | 319
YOLOv8-DGS | 24.3 | 9.20 | 174

4. Experiment

4.1. Experimental parameter settings

The hardware used in this study is a desktop computer provided by the authors' institution: an Intel Core i5-12400F CPU, an NVIDIA RTX 4070 GPU, and 16 GB of RAM, which is sufficient for model training. Fig 9 shows the model training samples, composed of multiple images of the same size. Before training, these images undergo the model's built-in data augmentation, such as simple rotation, scaling, and cropping, to increase the model's generalization ability and stability.

Fig 9. Training visualization.


The experiment used Anaconda to create a virtual environment running on the PyCharm platform with Python 3.10. The number of training epochs is set to 200 to ensure stable training results. Because a high-performance GPU is used, the batch size is set to 32. The input size is set to 640, consistent with the dataset, and Mosaic is set to 0 because filter equalization was already used to enhance the dataset in this study. During training, the number of model layers, parameter count, computational complexity, mAP50 and mAP50-95 are computed and output to show the model's performance. Table 4 lists the specific hyper-parameter settings.

Table 4. Hyper-parameter Settings.

Parameter | Setting
Initial learning rate (lr0) | 0.01
Final learning rate (lrf) | 0.02
Batch size | 32
Weight_decay | 0.001
Warmup_epochs | 3.0
Warmup_bias_lr | 0.1
Epochs | 100
Mosaic | 0

4.2. Evaluation index

To comprehensively evaluate the performance of the improved YOLOv8 model in pavement crack detection, this study adopts several common object detection evaluation metrics. These metrics objectively reflect the model's performance in different respects, including detection accuracy, recall, localization ability, and speed. The specific metrics include Precision, Recall, F1-score, mAP, and inference time [26].

Precision is the proportion of correctly predicted positive samples among the model's detections, and recall measures the model's ability to identify all actual cracks, i.e., the proportion of true positives among all actual cracks. The formulas are as follows:

P = TP / (TP + FP) (5)
R = TP / (TP + FN) (6)

Where TP is the number of true positives (crack areas correctly detected), FP the number of false positives (areas incorrectly detected as cracks), and FN the number of false negatives (crack areas missed). High precision means most detected crack areas are correct, i.e., a low false positive rate; high recall means the model detects as many cracks as possible, reducing missed detections.

F1-score is the harmonic mean of precision and recall and is an important metric of the model's overall performance. By considering precision and recall together, it avoids the bias of evaluating either metric in isolation. Its formula is as follows:

F1-score = 2 × P × R / (P + R)    (7)

F1-score balances precision and recall to a certain extent and serves as a comprehensive metric for evaluating detection models.
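The three metrics in Eqs. (5)–(7) can be computed directly from the confusion counts. The counts below are illustrative, not taken from the paper's experiments:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute Precision (Eq. 5), Recall (Eq. 6) and F1-score (Eq. 7)."""
    p = tp / (tp + fp)          # correct detections among all detections
    r = tp / (tp + fn)          # correct detections among all actual cracks
    f1 = 2 * p * r / (p + r)    # harmonic mean of precision and recall
    return p, r, f1

# Illustrative counts (hypothetical, not the paper's data):
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(p, r, f1)  # 0.9 0.9 0.9
```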

Mean Average Precision (mAP) is one of the most commonly used evaluation metrics in object detection, especially in multi-class detection tasks. mAP reflects the overall detection ability of the model by averaging precision over different recall levels. Specifically, mAP is obtained by computing the average precision (AP) for each class and then averaging the AP over all classes. The formula is as follows:

mAP = (1/N) ∑_{i=1}^{N} AP_i    (8)

Where N is the total number of classes and AP_i is the average precision of the i-th class. The higher the mAP value, the better the overall detection performance of the model.
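Eq. (8) is a plain average of the per-class APs. A minimal sketch, with hypothetical per-class AP values standing in for the four crack types:

```python
def mean_average_precision(ap_per_class):
    """mAP (Eq. 8): mean of the per-class average precisions."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical APs for four crack classes (not the paper's results):
aps = [0.95, 0.90, 0.93, 0.92]
print(mean_average_precision(aps))  # 0.925
```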

Besides traditional metrics such as precision and recall, computational efficiency is also an important evaluation criterion in object detection tasks. Detection speed is usually measured by inference time, the time the model takes to process one image. The shorter the inference time, the better the model performs in real-time applications and the faster crack detection can be carried out, suiting real-time scenarios such as the intelligent seam filling machine.

4.3. Experimental results

The control group in this experiment consists of Ultralytics' classic YOLO models. YOLOv5s and YOLOv8n, which have small parameter counts, are selected; the reduced number of parameters improves inference speed while accuracy remains acceptable, but not high enough for intelligent engineering equipment. YOLO11, recently released by Ultralytics, also shows good potential, achieving 65.2% accuracy on this dataset, but the cost of the improved accuracy is reduced running speed, and its large parameter count and computation make it unsuitable for practical devices. YOLOv8s was chosen as the basis for improvement because it balances recognition speed and accuracy. Integrating improvement methods currently proposed for YOLOv8, such as the depthwise separable convolution module and the space-to-depth convolution module, the traditional convolution modules in the backbone network and neck are replaced, and comparative experiments are set up. Although these replaced modules show better recognition accuracy on the self-made dataset, the inference speed is still unsatisfactory. The best result is obtained by embedding GSConv and DWConv in the network, yielding the YOLOv8-DGS network on the experimental dataset. Table 5 compares the experimental results of the improved model and the other models. The mAP50 and mAP50-95 of YOLOv8-DGS reached 91.6% and 61.3% respectively, increases of 13.5% and 15.2% over the YOLOv8s model. Meanwhile, the parameter count of the improved model was reduced by 11.6% compared with the original model, which improved the detection speed. Fig 10 shows the identification results of four different road cracks.

Table 5. Different model test results.

Model mAP50 mAP50-95
YOLOv5s 0.624 0.323
YOLOv8n 0.604 0.412
YOLOv8s 0.644 0.428
YOLOv8-DW 0.841 0.506
YOLOv8-GS-slim 0.812 0.476
YOLO11s 0.626 0.431
YOLOv8-DGS 0.916 0.613

Fig 10. Test results of YOLOv8-DGS model.


To demonstrate the superiority of the improved model, statistical analysis, in particular p-value calculation, is needed to measure the significance of the experimental results. Assuming the detection metrics follow a normal distribution, an independent-samples t-test can be used:

t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2)    (9)

Where X̄1 and X̄2 are the mean performance of YOLOv8-DGS and the standard YOLOv8 respectively, s1 and s2 are the standard deviations, and n1 and n2 are the sample sizes. Table 6 shows the t-test p-values comparing the improved model with the original model on mAP50 and mAP50-95. Both p-values are less than 0.05, indicating that the improvement in detection performance of YOLOv8-DGS is not accidental but statistically significant, so the improved model can be considered reliably superior to YOLOv8s.
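Eq. (9) can be sketched as follows; the run-to-run metric values are illustrative, not the paper's raw data. With unequal variances this is Welch's form of the statistic, and `scipy.stats.ttest_ind(..., equal_var=False)` would also return the corresponding p-value:

```python
import math

def welch_t(x1, x2):
    """Two-sample t statistic (Eq. 9) allowing unequal variances."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    # unbiased sample variances
    s1 = sum((v - m1) ** 2 for v in x1) / (n1 - 1)
    s2 = sum((v - m2) ** 2 for v in x2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(s1 / n1 + s2 / n2)

# Illustrative mAP50 values over repeated runs (hypothetical numbers):
dgs  = [0.915, 0.917, 0.916, 0.914, 0.918]   # YOLOv8-DGS
base = [0.643, 0.645, 0.644, 0.642, 0.646]   # YOLOv8s
t = welch_t(dgs, base)
```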

Table 6. p-value analysis of YOLOv8-DGS compared with YOLOv8.

Index      p-value (t-test)
mAP50      1.03 × 10⁻⁷
mAP50-95   8.46 × 10⁻⁷

Faster running speed is an important condition for application in smart devices [27]. In this experiment, FPS was used to characterize the inference speed of the model. The formula for calculating FPS is shown in formula (10).

FPS = 1000 / (pre + inference + post)    (10)

Here, pre is the pre-processing time, i.e., the time to convert the data into a format suitable for the model, which reduces the computing load and helps performance; inference is the time to pass the preprocessed data through the model and produce its output; post is the post-processing time, spent decoding the output and converting its format. All three are measured in milliseconds, and FPS is the number of frames the model can process per second given the total of these three times.
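Eq. (10) converts the three per-stage times (in milliseconds) into frames per second. The timing values below are illustrative:

```python
def fps(pre_ms: float, inference_ms: float, post_ms: float) -> float:
    """FPS (Eq. 10): frames per second from per-stage times in milliseconds."""
    return 1000.0 / (pre_ms + inference_ms + post_ms)

# Illustrative stage times (hypothetical): 0.5 ms pre, 10.0 ms inference,
# 1.5 ms post -> 12 ms per frame total.
rate = fps(0.5, 10.0, 1.5)
print(round(rate, 2))  # 83.33
```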

Table 7 compares the inference speed of several typical models. The comparison group consists of small models with few parameters, such as YOLOv5s and YOLOv8n, together with other lightweight improved models. Among models with a comparable number of parameters, YOLOv8-DGS achieves the fastest inference speed in this study, reaching 85 FPS. This directly demonstrates that the improved model can increase inference speed while maintaining recognition accuracy, making subsequent deployment to the intelligent seam filling machine feasible.

Table 7. Comparison of the running speed of various models.

Model Pre-process(ms) Inference(ms) Post-process(ms) FPS
YOLOv5s 0.5 11 1.4 74
YOLOv8n 0.5 9.7 1.7 83
YOLOv8s 0.7 10.3 1.8 78
YOLOv8-DW 0.3 10.5 1.6 80
YOLOv8-GS-slim 0.4 10.3 1.4 84
YOLO11s 0.6 10.1 1.7 80
YOLOv8-DGS 0.5 10.1 1.6 85

4.4. Ablation experiment

The ablation experiment aims to verify whether the YOLOv8-DGS model improves detection on the crack dataset compared with the original model. The experiment uses the control-variable method: the convolution in the neck is first replaced on the basis of the original model, comparing the recognition accuracy of traditional convolution and group shuffle convolution (GSConv) under the same backbone. After the recognition effect improves, the neck is kept unchanged and the convolutions of layers 0, 1, 3 and 5 in the backbone are replaced successively, comparing depthwise separable convolution (DWConv) with traditional convolution. All other parameters remain the same. The experimental results are shown in Table 8.

Table 8. Ablation test results.

Backbone (DWConv replaces Conv)   Neck (GSConv      Evaluation index
 0    1    3    5                 replaces Conv)    Precision  Recall  mAP
 –    –    –    –                 –                 0.722      0.74    0.71
 –    –    –    –                 ✓                 0.768      0.70    0.75
 ✓    –    –    –                 ✓                 0.812      0.79    0.84
 ✓    ✓    –    –                 ✓                 0.844      0.81    0.86
 ✓    ✓    ✓    –                 ✓                 0.902      0.89    0.84
 ✓    ✓    ✓    ✓                 ✓                 0.916      0.90    0.92

As can be seen from Table 8, replacing the conventional convolutions of layers 0, 1, 3 and 5 of the YOLOv8 model with depthwise separable convolution, while replacing all convolution modules in the neck structure with group shuffle convolution, yields the highest precision, reaching 91.6%. The improved YOLOv8-DGS model therefore performs better than the base YOLOv8 model.
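The parameter savings that motivate the DWConv replacement can be checked by simple counting: a standard k × k convolution needs k·k·C_in·C_out weights, while a depthwise separable convolution needs only k·k·C_in (depthwise) plus C_in·C_out (pointwise). The layer shape below is illustrative, not taken from the actual YOLOv8 configuration:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def dwconv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise separable conv: k x k depthwise + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer shape: 3x3 kernel, 64 -> 128 channels.
std = conv_params(3, 64, 128)     # 73728 weights
dws = dwconv_params(3, 64, 128)   # 576 + 8192 = 8768 weights
print(std, dws, round(std / dws, 1))  # 73728 8768 8.4
```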

5. Conclusion

This paper proposes a road crack detection algorithm based on an improved YOLOv8, used to detect complex pavement and different types of pavement defects that are difficult to distinguish due to obstacles. The method lays the groundwork for subsequent work such as defect localization, slot opening and joint filling. The algorithm improves two parts. First, depthwise separable convolution is introduced into the third, fifth and seventh layers of the backbone network; the factorized convolution operation reduces the number of parameters and enhances crack feature learning. Second, by replacing the traditional convolution in the neck part of YOLOv8, the computation of the model is greatly reduced, the detection accuracy is improved, and the detection speed is accelerated. To prove the effectiveness of the improved model, the authors conducted validation on a dataset of 19,512 images. Through model training and verification, the effectiveness of the proposed method is confirmed.

Although the improved model achieves a good detection effect, the original image quality of the training dataset is not high and it does not cover all types of pavement defects, so the generalization ability of the model still needs improvement. Identifying small cracks and pothole-type damage on shaded or wet roads also remains challenging.

In follow-up work, the research will continue to collect higher-quality pavement crack images to train the improved model and further improve its generalization ability. To better apply the model to intelligent joint filling equipment, the subsequent research focus is model lightweighting. Pruning techniques such as Max-Min and alpha-beta will be used to compress the model, combined with accelerated deployment techniques such as TensorRT to run the model on embedded devices.

Acknowledgments

The authors sincerely thank the Editor-in-Chief, and the anonymous reviewers for their detailed comments and constructive suggestions, which greatly improved this manuscript.

Data Availability

The experimental dataset that supports the findings of this study is openly available on GitHub at https://github.com/sekilab/RoadDamageDetector

Funding Statement

Natural Science Foundation project of Shandong Province (Project number: ZR2024QE374; Project name: Research on the key technology of robot six-degree-of-freedom grasping and detecting of highly reflective parts in unordered stacking scene). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Xie W, Cui M, Liu M, Wang P, Qiang B. Deep hashing multi-label image retrieval with attention mechanism. Int J Robot Autom. 2022;37(4):372–81.
  • 2. Hang D, Yang J, Jiang S, Li H, Zou X, Tang C, et al. Lightweight mesh crack detection algorithm based on efficient attention mechanism. Int J Robot Autom. 2023:170–9.
  • 3. Zhong J, Zhu J, Huyan J, Ma T, Zhang W. Multi-scale feature fusion network for pixel-level pavement distress detection. Autom Constr. 2022.
  • 4. Zhou L, Cai J, Ding S. The identification of ice floes and calculation of sea ice concentration based on a deep learning method. Remote Sensing. 2023;15(10):2663. doi: 10.3390/rs15102663
  • 5. Cai J, Ding S, Zhang Q, Liu R, Zeng D, Zhou L. Broken ice circumferential crack estimation via image techniques. Ocean Engineering. 2022;259:111735. doi: 10.1016/j.oceaneng.2022.111735
  • 6. Liu H, Miao X, Mertz C, Xu C, Kong H. CrackFormer: transformer network for fine-grained crack detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. p. 3783–92.
  • 7. Ren Y, Huang J, Hong Z, Lu W, Yin J. Image-based concrete crack detection in tunnels using deep fully convolutional networks. Constr Build Mater. 2022;234.
  • 8. Jia Y, Chen G, Zhao L. Defect detection of photovoltaic modules based on improved VarifocalNet. Sci Rep. 2024;14(1):15170. doi: 10.1038/s41598-024-66234-3
  • 9. Jiang W, Yang L, Bu Y. Research on the identification and classification of marine debris based on improved YOLOv8. JMSE. 2024;12(10):1748. doi: 10.3390/jmse12101748
  • 10. Liu Z, Gu X, Yang L, Wang Y, Wang D. Novel YOLOv3 model with structure and hyperparameter optimization for detection of pavement concealed cracks in GPR images. 2022.
  • 11. Liu Z, Gu X, Chen J, Wang D, Chen Y, Wang L. Automatic recognition of pavement cracks from combined GPR B-scan and C-scan images using multiscale feature fusion deep neural networks. Autom Constr. 2023.
  • 12. Sun Q, Wang H, Liu W, Zou J, Ye F, Li Y. An improved stereo visual-inertial SLAM algorithm based on point-and-line features for subterranean environments. IEEE Trans Veh Technol. 2025;74(3):3925–40. doi: 10.1109/tvt.2024.3492388
  • 13. Cai G, Zheng X, Guo J, Gao W. Real-time identification of borehole rescue environment situation in underground disaster areas based on multi-source heterogeneous data fusion. Safety Science. 2025;181:106690. doi: 10.1016/j.ssci.2024.106690
  • 14. Xu X, Fu X, Zhao H, Liu M, Xu A, Ma Y. Three-dimensional reconstruction and geometric morphology analysis of lunar small craters within the patrol range of the Yutu-2 rover. Remote Sensing. 2023;15(17):4251. doi: 10.3390/rs15174251
  • 15. Hu W, Wang W, Ai C, Wang J, Wang W, Meng X, et al. Machine vision-based surface crack analysis for transportation infrastructure. Automation in Construction. 2021;132:103973. doi: 10.1016/j.autcon.2021.103973
  • 16. Du Y, Pan N, Xu Z, Deng F, Shen Y, Kang H. Pavement distress detection and classification based on YOLO network. Informa UK Limited.
  • 17. Wang H, Han X, Song X, Su J, Li Y, Zheng W, et al. Research on automatic pavement crack identification based on improved YOLOv8. Int J Interact Des Manuf. 2024;18(6):3773–83. doi: 10.1007/s12008-024-01769-3
  • 18. Dutta MK, Sarkar R. Application of retinex and histogram equalisation techniques for the restoration of faded and distorted artworks: a comparative analysis. Optik. 2022.
  • 19. Zhang F, Dai Y, Peng X, Wu C, Zhu X, Zhou R, et al. Brightness segmentation-based plateau histogram equalization algorithm for displaying high dynamic range infrared images. Infrared Physics Technol. 2023;134:104894. doi: 10.1016/j.infrared.2023.104894
  • 20. Zhao H, Jin J, Liu Y, Guo Y, Shen Y. FSDF: a high-performance fire detection framework. Expert Syst Appl. 2024.
  • 21. Zhang K, Zhang Y, Cheng HD. CrackGAN: pavement crack detection using partially accurate ground truths based on generative adversarial learning. IEEE Trans Intell Transp Syst. 2021;22(2).
  • 22. Kaiser L, Gomez AN, Chollet F. Depthwise separable convolutions for neural machine translation. arXiv preprint. 2017.
  • 23. Liu G, Hu Y, Chen Z, Guo J, Ni P. Lightweight object detection algorithm for robots with improved YOLOv5. Engineering Applications of Artificial Intelligence. 2023;123:106217. doi: 10.1016/j.engappai.2023.106217
  • 24. Li H, Li J, Wei H, Liu Z, Zhan Z, Ren Q. Slim-neck by GSConv: a lightweight-design for real-time detector architectures. J Real-Time Image Proc. 2024;21(3). doi: 10.1007/s11554-024-01436-6
  • 25. Hu Y, Zeng Z, Song J, Xu L, Zhou X. Online network traffic classification based on external attention and convolution by IP packet header. Computer Networks. 2024;252:110656. doi: 10.1016/j.comnet.2024.110656
  • 26. Li J, Yuan C, Wang X. Real-time instance-level detection of asphalt pavement distress combining space-to-depth (SPD) YOLO and omni-scale network (OSNet). Automation in Construction. 2023;155:105062. doi: 10.1016/j.autcon.2023.105062
  • 27. Wang Y, Zhang L, Xiong X, Kuang J, Xiang S. A lightweight and efficient multi-type defect detection method for transmission lines based on DCP-YOLOv8. Sensors (Basel). 2024;24(14):4491. doi: 10.3390/s24144491

Decision Letter 0

Ahmed Yosri

25 Mar 2025

PONE-D-25-10806
A pavement crack identification method based on improved yolov8
PLOS ONE

Dear Dr. Zhang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 09 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Ahmed M. Yosri

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

“Natural Science Foundation project of Shandong Province

Project number: ZR2024QE374

Project name: Research on the key technology of robot six-degree-of-freedom grasping and detecting of highly reflective parts in unordered stacking scene.”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This research was supported by the Natural Science Foundation project of Shandong Province (ZR2024QE374).”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“Natural Science Foundation project of Shandong Province

Project number: ZR2024QE374.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript deals with " A pavement crack identification method based on improved yolov8." The following comments and suggestions need to be addressed before its consideration.

1. The manuscript should provide a more detailed explanation of the YOLOv8-DGS model improvements. Specifically, elaborating on how deep separable convolution (DWConv) and GSConv are integrated into the backbone and neck of the model would help readers understand the technical advancements made in crack detection

2. These references could be cited in the introduction, particularly when discussing the advancements in deep learning methods for image processing and detection tasks. It would fit well in a paragraph that highlights the growing application of deep learning in various fields, including remote sensing and crack detection. Remote Sensing, 15(10), 2663. https://doi.org/10.3390/rs15102663, Ocean Engineering, 259, 111735. https://doi.org/10.1016/j.oceaneng.2022.111735

3. This reference can be cited in the introduction, particularly when discussing advancements in the YOLOv8 model, machine learning, and its applications in various fields, including marine debris detection. It would be relevant to mention this study when highlighting the versatility and improvements of YOLOv8 in different contexts. Journal of Marine Science and Engineering, 12(10), 1748. https://doi.org/10.3390/jmse12101748, International Journal of Computational Methods, 2450066. 10.1142/S021987622450066X, Journal of Computing in Civil Engineering, 39(3), 4025017. 10.1061/JCCEE5.CPENG-6167, Scientific Reports, 14(1), 15170. 10.1038/s41598-024-66234-3, Measurement Science and Technology, 36(1), 015104. 10.1088/1361-6501/ad7f77, GPS Solutions, 28(4), 178. 10.1007/s10291-024-01715-6, IEEE Transactions on Vehicular Technology, 1-16. 10.1109/TVT.2024.3492388

4. It is essential to include a comprehensive description of the experimental setup used to evaluate the YOLOv8-DGS model. This should encompass the dataset characteristics, the sample number, and the test case selection criteria. Providing this information will enhance the reproducibility of the study

5. While the manuscript mentions impressive performance metrics such as Precision, Recall, F1-score, and mAP50, it would be beneficial to include a comparison with existing models, such as YOLOv3 or other state-of-the-art methods. This comparison can highlight the advantages of the proposed model and provide context for the reported results

6. These references could be cited when discussing the importance of real-time identification systems in various environments, including underground disaster areas. It would be relevant to mention this study while highlighting the advancements in detection technologies and their applications in safety and rescue operations. A potential sentence could be: "Recent studies have demonstrated the effectiveness of multi-source heterogeneous data fusion in real-time identification of critical situations in disaster areas, which aligns with the advancements in detection technologies discussed in this paper. Safety Science, 181, 106690. https://doi.org/10.1016/j.ssci.2024.106690. Remote Sensing, 15(17), 4251. https://doi.org/10.3390/rs15174251. Advances in Civil Engineering, 2023(1), 8897139. https://doi.org/10.1155/2023/8897139. Chinese Journal of Mechanical Engineering, 37(1), 108. 10.1186/s10033-024-01107-4. Computers and Geotechnics, 178, 106949. https://doi.org/10.1016/j.compgeo.2024.106949. Computers and Geotechnics, 177, 106827. https://doi.org/10.1016/j.compgeo.2024.106827

7. Incorporating visual aids, such as graphs or tables, to present the performance metrics and comparisons with other models can significantly improve the manuscript's clarity and impact. Visual representations can help readers quickly grasp the improvements made by the YOLOv8-DGS model

8. The manuscript should address any limitations of the proposed method. Discussing potential challenges, such as the model's performance in varying environmental conditions or with different types of pavement materials, would provide a balanced view of the research

9. Including a section on future work could enhance the manuscript. Suggestions for further research, such as exploring the application of the YOLOv8-DGS model in real-time scenarios or integrating it with other technologies, would demonstrate the ongoing relevance of the research

10. A thorough manuscript proofreading is recommended to ensure clarity and coherence. This includes checking for grammatical errors, ensuring consistent terminology, and improving the overall flow of the text

Reviewer #2: I have reviewed the manuscript entitled " A pavement crack identification method based on improved yolov8" the manuscript is insightful and is structured well, there are several areas that require further refinement. Below, I have provided section-wise review with specific comments to enhance the clarity, rigor, and overall impact of your work.

The title is clear and informative but could be slightly refined. Suggested improvement:

"Enhanced YOLOv8-Based Pavement Crack Detection: A High-Precision Approach".

Consider explicitly mentioning "Deep Learning" or "Computer Vision" to attract a broader audience.

In the abstract section add a sentence highlighting how this method outperforms conventional YOLO models. The section lacks a direct comparison with existing YOLO models. It also does not explicitly mention real-time feasibility or computational efficiency. The claim about "improving performance" is general—what aspect? Accuracy, speed, or robustness?

Specify if this model is suitable for real-time deployment.

In the introduction section, the paper mentions manual inspection is inefficient but does not provide statistics or real-world implications (e.g., cost/time of manual inspection vs. AI-based detection). The problem statement could be more focused. Instead of stating that "YOLOv8 has potential," explicitly highlight its limitations in pavement crack detection. The introduction does not clearly state YOLOv8 alone is insufficient. Instead of saying "YOLOv8 has potential," it should specify its weaknesses (e.g., handling small cracks, computational load). Other crack detection methods (e.g., transformer-based approaches) are missing. How does this work compare to recent works like CrackFormer? The paper should explicitly state the gap in previous research that this study addresses.

In the material and method section, the sample size and its justification need more emphasis. A dataset split ratio (80:20) is mentioned, but no rationale is provided. Does the dataset cover diverse road conditions (e.g., lighting, shadows, different pavement materials)? What resolution are the images? Are there any preprocessing steps to remove blur/noise? Will a model trained on RDD2022 generalize well to different environments?

The section could benefit from a statistical analysis of how preprocessing improved accuracy.

Section 2.2… How does preprocessing affect accuracy? A small table comparing performance before/after preprocessing would strengthen this section. Mean filtering is explained but could use more mathematical rigor. Why was mean filtering chosen? Does it outperform Gaussian filtering or median filtering for noise reduction?

Section 3… Detection algorithm improvement… The paper claims the improvements reduce computational load, but where is the evidence? A table comparing FLOPs and parameter count before and after modification is needed. How do the modifications affect feature extraction? A grad-CAM visualization would strengthen this section.

In the experimental section of the paper, were hyperparameters optimized via grid search? Why were specific values chosen? Would adding more layers improve accuracy?

Section 4.2…. Are differences statistically significant? A p-value analysis is needed. A statistical test comparing YOLOv8-DGS with standard YOLO models would strengthen claims.

"Inference Time" should be broken down into GPU vs CPU performance.
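Separating GPU and CPU numbers usually just means running the same timing harness twice, once with the model on each device. A device-agnostic sketch of such a harness (the workload below is a placeholder for a real inference call):

```python
import time

def benchmark(fn, warmup=3, runs=20):
    """Mean seconds per call, excluding warm-up iterations."""
    for _ in range(warmup):  # warm-up runs absorb one-time costs (caching, JIT)
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

mean_s = benchmark(lambda: sum(range(10000)))  # placeholder workload
fps = 1.0 / mean_s
```

Note that for GPU timing the call must synchronize the device before reading the clock, otherwise asynchronous kernel launches make the measurement meaningless.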

Section 5… Conclusion section lacks a discussion of failure cases: When does the model fail? Low contrast? Blurry images?

Future work is vague: What specific lightweight techniques will be explored?

No discussion of practical deployment: How can this be integrated into real-world applications?

Reviewer #3: This paper discusses crack detection, which is very important for road management. An image enhancement technique and a recognition algorithm are developed to enhance the performance of YOLOv8. The developed method, YOLOv8-DGS, can effectively identify pavement cracks.

(1) How do you measure the “accuracy” in the case study? It is the most important criterion, yet its result is given without calculation or definition. How do you obtain the actual (ground-truth) measurement of the cracks?
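For reference, the usual detection metrics have standard definitions, and the numbers reported in the abstract are internally consistent: F1 is the harmonic mean of precision and recall, so P = 91.6% and R = 90% indeed give F1 ≈ 90.8%:

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# precision and recall as reported in the paper's abstract
f1 = f1_score(0.916, 0.90)
```

Stating these formulas (and the IoU threshold behind mAP50) in the paper would answer this question directly.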

(2) Compared to YOLO11, YOLOv8-DGS has better accuracy on mAP50 but worse accuracy on mAP50-95. Their running speeds are also similar. Does this mean YOLOv8-DGS is not better than YOLO11? Please explain.

(3) This paper has many typos and errors, for example, it should be Table 3 instead of Figure 3 on Line 354. Please carefully revise this paper before publication.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool at https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user; registration is free. Then log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 May 22;20(5):e0324512. doi: 10.1371/journal.pone.0324512.r003

Author response to Decision Letter 0


6 Apr 2025

We have carefully read the comments of the reviewers and the editor, and have made changes and replied in response to each comment. The information for the response has been submitted under "Response to Reviewers".

Attachment

Submitted filename: Response to Reviewers.docx

pone.0324512.s002.docx (22KB, docx)

Decision Letter 1

Ahmed Yosri

28 Apr 2025

Enhanced YOLOv8-Based Pavement Crack Detection: A High-Precision Approach

PONE-D-25-10806R1

Dear Dr. Zhang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up to date by logging in to Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ahmed M. Yosri

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The content of the paper is well organized, all the suggested points have been incorporated, and it is easy for the reader to follow the subject discussed; I therefore support its acceptance.

Reviewer #2: I am satisfied with the responses; the authors have incorporated all the changes and addressed all the points, so the paper can be accepted for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Ahmed Yosri

PONE-D-25-10806R1

PLOS ONE

Dear Dr. Zhang,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ahmed M. Yosri

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Response to Reviewers.docx

    pone.0324512.s002.docx (22KB, docx)

    Data Availability Statement

The experimental dataset that supports the findings of this study is openly available in [GitHub] at [https://github.com/sekilab/RoadDamageDetector].

