Heliyon
. 2024 May 14;10(10):e30957. doi: 10.1016/j.heliyon.2024.e30957

Automatic detection of potholes using VGG-16 pre-trained network and Convolutional Neural Network

Satyabrata Swain 1, Asis Kumar Tripathy 1,
PMCID: PMC11128863  PMID: 38803954

Abstract

A self-driving car is necessary to implement traffic intelligence because it can vastly enhance both the safety of driving and the comfort of the driver by adjusting to the circumstances of the road ahead. Road hazards such as potholes can be a big challenge for autonomous vehicles, increasing the risk of crashes and vehicle damage. Real-time identification of road potholes is required to solve this issue. To this end, various approaches have been tried, including notifying the appropriate authorities, utilizing vibration-based sensors, and engaging in three-dimensional laser imaging. Unfortunately, these approaches have several drawbacks, such as large initial expenditures and the possibility of being discovered. Transfer learning is considered a potential answer to the pressing necessity of automating the process of pothole identification. A Convolutional Neural Network (CNN) is constructed to categorize potholes effectively using the VGG-16 pre-trained model as a transfer learning model throughout the training process. A Super-Resolution Generative Adversarial Network (SRGAN) is suggested to enhance the image's overall quality. Experiments conducted with the suggested approach of classifying road potholes revealed a high accuracy rate of 97.3%, and its effectiveness was tested using various criteria. The developed transfer learning technique obtained the best accuracy rate compared to many other deep learning algorithms.

Keywords: Potholes, CNN, VGG 16, Automatic detection

Highlights

  • To classify potholes, a Convolutional Neural Network using VGG-16 is constructed.

  • SRGAN is used to improve overall quality by reducing blurriness in images.

  • The VGG-16 pre-trained model is used for extracting features of the images.

  • Suggested approach for classifying potholes revealed a high accuracy rate of 97.3%.

1. Introduction

Infrastructure is an essential element of civilization and a significant part of many people's lives. The majority of countries across the world rely heavily on their road and street networks to facilitate the transportation of people, products, and freight by motorized vehicles. Both institutions that do scientific research and companies that produce automobiles have lately been interested in autonomous driving [1]. It is one of the most promising areas of study in science and technology, and it can improve driving safety, reduce the amount of time spent in traffic, and cut down on the amount of pollution produced [2]. Vehicles capable of driving themselves need to meet stringent safety standards. Collecting real-time road data while driving can give essential information for route planning, which in turn can assist in assuring the safety of vehicles [3].

Because of the rise in both pollution and the number of vehicles on the road, almost every city in the country has recently experienced an increase in potholes both large and small. The annual death toll caused by potholes increased by more than fifty percent over 2016 levels, reaching 3,597 in the United States [4]. Potholes indicate that the road has not been maintained properly, which may point to an underlying structural issue [5]. Manual inspections performed by skilled structural engineers or inspectors take a lot of time and are expensive, so professionals and researchers in the transportation sector have made several attempts to create an automated method for identifying potholes [6]. Finding potholes is a labor-intensive and time-consuming process that demands a lot of manual effort. Pothole detection has been accomplished through a plethora of technologies, including those based on vibration, 3D reconstruction, and vision.

In recent decades, there has been a growth in the utilization of automated pothole-detecting technologies that integrate a wide variety of methodological approaches [7]. Researchers have applied deep learning algorithms to images of road defects. As a result, significant advancements have been made, among other things, in road identification, classification, and segmentation. This has allowed the researchers to overcome the limitations of traditional artificial visual inspection [8], [9]. Deep learning requires a considerable amount of both the source data and the target data for it to be able to train its networks efficiently. Since the amount of the dataset is restricted, there is a greater possibility that the overfitting problem will affect the network [10]. Hossain et al. [11] compared the CNN-based YOLO V4 tiny AI model results with the human evaluator to detect potholes with an accuracy of 85%.

In 2020, Baek and Chung [12] developed a technique for identifying and classifying potholes based on edge detection, using pavement pictures as input data; while evaluating the state of the roads, the YOLO algorithm was utilized to find and classify potholes. In 2021, Park et al. [13] introduced a method for automatically identifying potholes in photographs that used several YOLO models with images as input data. Ye et al. [14] outlined a pre-pooling CNN that uses photos as input to detect roadway potholes automatically; adding a pre-pooling layer before the initial convolution layer is what distinguishes this architecture from a traditional CNN. Dewangan and Sahu [15] came up with an embedded vehicle prototype and a CNN-based pothole detection method. Fan et al. [16] provided an approach for identifying potholes that is both accurate and computationally efficient: a dense disparity map is first transformed to make it possible to differentiate between damaged and undamaged road regions with greater precision.

An efficient pothole recognition method was presented by Fan et al. [17], who based their work on road disparity map estimation and segmentation; a disparity transformation technique is then utilized to improve the visibility of damaged road regions. Fan et al. [18] developed a unique road damage detection system that utilized unsupervised disparity map segmentation; Otsu's thresholding approach is used to segment the transformed disparity map and isolate the regions containing damaged roadway. Fan et al. [19] covered state-of-the-art (SoTA) CNNs: the SoTA CNNs developed for semantic segmentation were tested rigorously to see how well they locate potholes in the road. Also, to improve the visual feature representations for semantic segmentation, the researchers constructed a novel CNN layer they refer to as the graph attention layer (GAL) [20].

Grayscale and texture data processing were combined by Gao et al. [21], who suggested an approach for integrating the two types of data; the technique mainly uses an industrial camera to accomplish pothole identification quickly and precisely. Yebes et al. [22] concentrated on finding potholes in images of actual road scenes. They produced a large dataset of photos annotated with pothole information and then fine-tuned available object identification models using Faster R-CNN and SSD deep neural networks. Using bounding boxes and deep neural networks, Gupta et al. [23] developed a novel approach for identifying and localizing potholes in thermal images; tests run with modified ResNet50-RetinaNet and ResNet34-SSD versions produced impressive average precisions of 91.15% and 74.93%, respectively. Lim et al. [5] proposed deep learning-based automatic pothole identification techniques that take photos as their primary input data; they acquired pothole images from public database websites such as Flickr, Google Images, and Pixabay. Their research presented two customized models based on YOLOv2 to train and evaluate the pothole dataset.

Chun and Ryu [24] developed a fully convolutional neural network (CNN) based semi-supervised road surface fault diagnostic system. Computer vision was utilized by Yousaf et al. [25] to recognize and locate potholes, one of the most serious hazards found on the road; they devised a top-down system to identify and localize potholes in pavement images. In their recent study, Wu et al. [2] presented an innovative method for locating road potholes using a mobile point cloud and photographs. The process is broken down into three stages: first, 2D candidate potholes are extracted from pictures using a deep learning method; second, 3D candidate potholes are extracted using the point cloud; and third, potholes are identified using depth analysis. Ul Haq et al. [26] suggested a hybrid-matching approach combining crucial-point screening and block-matching techniques to automate the identification of roadway potholes. Alhussan et al. [27] provided a technique based on adaptive mutation and dipper-throated optimization (AMDTO) for feature selection, combined with a random forest (RF) classifier.

Lv et al. [28] proposed a novel YOLOv3 variant, the YOLOv3-ALL algorithm, to detect small objects on a defective surface. To increase the clustering effect, the K-means++ algorithm is used along with the intersection-over-union (IoU). A convolutional block attention module (CBAM) is used in the network to improve its capability [29], [30]. The detection capability of a YOLOv3 network for small-object defects is greatly improved after the addition of a fourth-scale prediction. Wang et al. [31] proposed an object detection model using YOLOv3, data augmentation, and structure optimization. Color adjustment is carried out to enhance the color contrast, and data augmentation is then performed through geometric transformation. Pothole categories are subdivided into different groups based on the presence or absence of water. To optimize the structure of the YOLOv3 model, a Residual Network (ResNet101) and the complete IoU (CIoU) loss are used, and clustering and modification of multi-scale anchor sizes are done using the K-means++ algorithm. Salaudeen et al. [32] proposed an enhanced super-resolution generative adversarial network (ESRGAN) based technique to improve road-surface image quality for detecting small objects, with YOLOv5 and EfficientDet networks used as detection networks. In experiments on different detection datasets, the proposed method performs better than state-of-the-art methods. The abbreviations used in the paper are shown in Table 1.

Table 1.

Abbreviation Table.

Abbreviation Full Form
YOLO You only look once
CNN Convolutional Neural Network
TL Transfer Learning
VGG Visual Geometry Group
TP True Positive
FP False Positive
TN True Negative
FN False Negative
FPR False Positive Rate
TPR True Positive Rate
ROC Receiver Operator Characteristic
PSNR Peak Signal-to-Noise Ratio
ESRGAN Enhanced Super-Resolution Generative Adversarial Network
MSE Mean Square Error
SRGAN Super-Resolution GAN

The results of the deep learning-based approaches discussed earlier stand out compared to those of other machine learning techniques, yet some questions and constraints remain. Deep learning has to process enormous amounts of data to get better results than traditional methods, training may be costly because of the complexity of the data models, and training deep learning algorithms takes considerably more time. In terms of performance, transfer learning models beat conventional machine learning models because they reuse information (features, weights, etc.) from previously trained models and are therefore already familiar with the relevant properties, which makes training neural networks much quicker than starting from scratch.

The following is an outline of the primary contributions made by the technique that has been proposed:

  • Using the methods of color correction and geometric modification, an efficient picture preparation approach was presented to refine and enlarge the pothole dataset. The sensing consistency of the suggested model was maintained by utilizing this strategy.

  • SRGAN is suggested to enhance the image's overall quality. It will make the photographs less blurry, change an image of low quality into an image of high resolution, and eliminate any distortion.

  • VGG-16 pre-trained model is proposed to transfer knowledge to the other model for extracting features of the pothole image.

  • Following this, a Convolutional Neural Network is suggested as a method for properly detecting and categorizing potholes within road photographs.

  • The performance of the detection method is improved by optimizing the hyperparameters of the suggested network to get optimal results.

  • As a last step, the performance of the suggested system is assessed by contrasting it with that of several other deep learning algorithms.

Pothole-detection techniques are summarized as shown in Table 2.

Table 2.

Detection Techniques for Potholes Detection and Classification.

Authors | Algorithms | Dataset | Performance
Baek & Chung (2020) [12] | YOLO network | Global Road Damage Detection Challenge 2020 | Precision 83%
Park et al. (2021) [13] | YOLOv4, YOLOv4-tiny | Annotated Potholes Image Dataset | Precision 84%
Ye et al. (2021) [14] | CNN | Generated dataset | Precision 98.95%
Fan et al. (2019) [18] | Disparity transformation algorithm | Three datasets containing 67 pairs of stereo images | Precision 98.04%
Fan et al. (2022) [19] | Disparity map transformation | Stereo pothole dataset | Precision 88.26%
Gao et al. (2020) [21] | Support vector machine | Pothole image dataset | Precision 97.4%
Gupta et al. (2020) [23] | Deep neural networks | Thermal image dataset | ResNet50-RetinaNet: precision 91.15%; ResNet34-SSD: precision 74.93%
Chun & Ryu (2019) [24] | Fully CNN | Collected dataset | Supervised: precision 70.12%; Semi-supervised: precision 90.14%
Yousaf et al. (2018) [25] | Support vector machine (SVM) | Pavement images dataset | Precision 95.7%
Alhussan et al. (2022) [27] | Adaptive mutation and dipper-throated optimization (AMDTO) for feature selection with a random forest (RF) classifier | Road potholes dataset | Precision 99.795%

2. Proposed methodology

In this section, the major processes and information sources implemented in the recommended approach are detailed. The proposed model was trained on a dataset of pothole pictures. To improve the overall picture quality, a super-resolution generative adversarial network is used to upscale the photographs by a factor of four while producing high-resolution images from low-resolution ones. A VGG-16 CNN pre-trained on the ImageNet dataset supplies highly trained parameters (weights) that improve the prediction performance of the proposed CNN model. The parameters and weights of the pre-trained model were first trained independently and then transferred to the CNN model, which enhanced its prediction capability. Before being combined with the other photos, each shot undergoes data augmentation, during which it is zoomed in both the horizontal and vertical directions. ReLU is used as the activation function throughout, and the soft-max function is applied to the model's topmost prediction layer. At this stage, the VGG-16 model is applied within the proposed Transfer Learning (TL) model: the last three fully connected (FC) layers of the VGG-16 network are removed and replaced with a fully connected CNN head, which guarantees that the output conforms to the binary categorization. The information flow handled by the developed framework for detecting and categorizing potholes is depicted in Fig. 1.

Figure 1. Flow of the proposed work.

2.1. Dataset

A sufficiently large image dataset is required for training to guarantee the precision and robustness of the model and to facilitate prediction with the machine learning algorithm. Following Rahman and Patel (2020) [33], a total of 1441 pothole images, each with a resolution of 720 by 720 pixels, were categorized and hosted on Kaggle. An expert panel performed object categorization by first determining whether a pothole was visible in each snapshot and then locating it within the image; this procedure was applied as thoroughly as possible to every image in the database. An image may contain one or more potholes. The dataset comprises the photographs and their associated labels and was split into training and testing subsets at a ratio of 80% for training and 20% for testing. Fig. 2 displays several examples of typical potholes from the testing subset.

Figure 2. Pothole images on road surfaces of the dataset.

2.2. Data preprocessing

Color correction techniques were utilized to alter the brightness level present in the dataset. The image's contrast, as well as its clarity, have both been adjusted. The amount of the dataset and the data quality both have a role in the success of DL models. In addition to that, methods of geometric augmentation are utilized.

2.2.1. Color adjustment

The photos showing the cracks in the pavement were captured in three color channels; red, green, and blue made up each color pixel, and these three values formed a vector representing the color image at a specific point in space. The enhancement of the color photographs was accomplished by applying two techniques: contrast adjustment and sharpening [34]. The poor contrast caused by the pothole picture's limited gray-level range was addressed by using contrast augmentation to expand the gray-level range of the picture and improve its clarity. After applying the probability smoothing method to the intensity and saturation components of the hue-saturation-intensity (HSI) color model, these components were converted into a uniform distribution by standardizing their distributions. The calculation formulae for the intensity and saturation components are given in Eq. (1) and Eq. (2), respectively.

y_{1k} = F(x_{1k}) = P\{x_1 \le x_{1k}\} = \sum_{m=0}^{k} f(x_{1m}) = \sum_{m=0}^{k} P\{x_1 = x_{1m}\} \quad (1)
y_{st} = F(x_{st} \mid x_{Ik}) = \sum_{m=0}^{t} f(x_{sm} \mid x_{Ik}) = \sum_{m=0}^{t} \frac{P\{x_I = x_{Ik},\, x_s = x_{sm}\}}{P\{x_I = x_{Ik}\}} \quad (2)

where k = 0, 1,..., L-1 and t = 0, 1,..., M-1; L and M, respectively, denote discrete degrees of intensity and saturation.

X = (x_H, x_S, x_I) \quad (3)

Each image is represented by the vector X of color pixel values in Eq. (3). A probability function is denoted by F(·) and is given as follows:

F(X) = F(x_I, x_s) = P\{x_I \le x_{Ik},\, x_s \le x_{sm}\} \quad (4)

Applying sharpening removes the blurring initially present in the pothole picture and highlights the picture's core by increasing the contrast of the surrounding pixels. Gradation of the pothole picture is introduced through Laplace sharpening (the Laplace operator). The enhancement strategy based on the Laplace operator is illustrated in Eq. (5).

g(x,y) = f(x,y) - \left[\nabla^2 R(x,y),\ \nabla^2 G(x,y),\ \nabla^2 B(x,y)\right] \quad (5)

where g(x,y) represents the pothole image after sharpening, f(x,y) represents the original pothole picture, and ∇²R(x,y), ∇²G(x,y), and ∇²B(x,y) are the Laplacians of the red, green, and blue channels of the color image, respectively.
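A minimal sketch of this preprocessing step is shown below using OpenCV. It approximates the described HSI equalization with histogram equalization in OpenCV's HSV space and applies Laplacian sharpening per channel; the function name and the 3 × 3 kernel size are illustrative assumptions rather than the authors' exact implementation.

```python
import cv2
import numpy as np

def enhance_pothole_image(bgr_image):
    """Contrast stretching followed by Laplacian sharpening, in the spirit of
    Eqs. (1)-(5); the HSI model is approximated with OpenCV's HSV space."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    # Histogram equalization spreads the narrow gray-level range.
    s_eq = cv2.equalizeHist(s)
    v_eq = cv2.equalizeHist(v)
    contrast = cv2.cvtColor(cv2.merge([h, s_eq, v_eq]), cv2.COLOR_HSV2BGR)

    # Per-channel Laplacian sharpening, g = f - Laplacian(f)  (cf. Eq. (5)).
    lap = cv2.Laplacian(contrast, cv2.CV_16S, ksize=3)
    sharpened = cv2.subtract(contrast.astype(np.int16), lap)
    return np.clip(sharpened, 0, 255).astype(np.uint8)

# Usage (hypothetical file name):
# enhanced = enhance_pothole_image(cv2.imread("pothole.jpg"))
```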

2.2.2. Geometric augmentation

Pictures underwent geometric transformations using flipping, rotating, and cropping procedures. Randomization is applied in every direction over the pothole target, which is generally polygonal. Based on these characteristics, four operations were utilized: rotation by 90 degrees clockwise, 180 degrees, and 90 degrees anticlockwise, plus random cropping. A pothole image is produced by rotating an existing picture about its focal point by a predetermined angle, which serves as the reference; this method effectively increases the quantity of data. The random-crop approach, which clips the borders of the image at random, strengthens the learning of the principal properties of potholes and improves the model's stability. For instance, the pothole (P) is the salient feature F, while the captured image also contains background noise (N); the captured image I can therefore be written as (F, N). Even though it is theoretically possible to learn (F, N), the network is more likely to learn F alone, which can lead to overfitting. In most depictions the pothole lies in the central area, so, unlike N, its essential traits are very unlikely to be cropped away. The information gain associated with factor F is large, and it is more plausible that the network learns factor F while rejecting factor N, as expressed in Eq. (6) for the extreme case where the weights are restricted to 0 or 1. For example, X_I^{(1)} → (1·F, 0·N) describes a case where F receives weight 1 while N receives none.

X_I^{(1)} \rightarrow (1\cdot F,\ 0\cdot N),\quad X_I^{(2)} \rightarrow (1\cdot F,\ 0\cdot N),\quad X_I^{(3)} \rightarrow (1\cdot F,\ 0\cdot N) \quad (6)
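The following sketch illustrates these geometric operations; the helper name and the 0.8 crop fraction are assumptions for illustration rather than values reported in the paper.

```python
import random
import numpy as np

def augment_pothole_image(img, crop_frac=0.8):
    """Return the four geometric variants described above: rotations by
    90 deg CW, 180 deg, 90 deg CCW, plus one random crop."""
    rot90_cw  = np.rot90(img, k=-1)
    rot180    = np.rot90(img, k=2)
    rot90_ccw = np.rot90(img, k=1)

    # Random crop: keeps the (usually central) pothole feature F while
    # trimming away part of the background noise N.
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top, left = random.randint(0, h - ch), random.randint(0, w - cw)
    crop = img[top:top + ch, left:left + cw]

    return [rot90_cw, rot180, rot90_ccw, crop]
```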

2.2.3. Image quality enhancement using super-resolution GAN (SRGAN)

The SR network that utilizes the GAN architecture was developed to enhance the perceived quality of super-resolution pictures. In most cases, SRGAN adopts the fundamental structure of SRResNet [35]; however, to obtain better performance, a few adjustments are made to the output image and the perceptual loss. The SRGAN model can be trained to perform super-resolution and picture improvement (including color correction and de-blurring) concurrently.

2.2.4. Relativistic discriminator

The SRGAN uses an enhanced discriminator network based on the relativistic GAN [36]. Unlike the usual discriminator D, the relativistic discriminator D_R estimates the probability that a genuine image is considerably more realistic than an image created by the generator. As a consequence, the discriminator loss and the adversarial (generator) loss are defined as:

L_{D}^{R} = -\mathbb{E}_{x_r}\left[\log\left(D_R(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(1 - D_R(x_f, x_r)\right)\right] \quad (7)
L_{G}^{R} = -\mathbb{E}_{x_r}\left[\log\left(1 - D_R(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\left[\log\left(D_R(x_f, x_r)\right)\right] \quad (8)

E_{x_f} denotes averaging over all generated data in a mini-batch; x_f is the super-resolved image produced from the input LR picture, and x_r is the corresponding real image. During adversarial training, the generator benefits from gradients computed on both real and generated data.
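A minimal TensorFlow sketch of these relativistic average losses follows, assuming `real_logits` and `fake_logits` are the raw discriminator outputs for real and generated images (the names are illustrative):

```python
import tensorflow as tf

def relativistic_losses(real_logits, fake_logits, eps=1e-8):
    """Relativistic average discriminator/generator losses, cf. Eqs. (7)-(8)."""
    d_real = real_logits - tf.reduce_mean(fake_logits)   # D_R(x_r, x_f)
    d_fake = fake_logits - tf.reduce_mean(real_logits)   # D_R(x_f, x_r)

    # Discriminator: real images should look more realistic than the average fake.
    d_loss = (-tf.reduce_mean(tf.math.log(tf.sigmoid(d_real) + eps))
              - tf.reduce_mean(tf.math.log(1.0 - tf.sigmoid(d_fake) + eps)))
    # Generator: the symmetric adversarial counterpart.
    g_loss = (-tf.reduce_mean(tf.math.log(1.0 - tf.sigmoid(d_real) + eps))
              - tf.reduce_mean(tf.math.log(tf.sigmoid(d_fake) + eps)))
    return d_loss, g_loss
```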

2.2.5. Perceptual loss

SRGAN introduces a perceptual loss L_percept that constrains the feature maps, with the feature constraints applied before activation. The total loss for the generator network is calculated as follows:

L_G = L_{percept} + \lambda L_{G}^{R} + \eta L_1 \quad (9)

Here λ and η are coefficients that balance the different loss terms, and L_1 is the 1-norm distance between the recovered image and the ground-truth image, which measures how close the two are. Lastly, a network interpolation technique is used: a PSNR-oriented network and a fine-tuned GAN-based network are trained, and their parameters are interpolated to create an interpolated model G with the following parameters. This is done to suppress undesirable noise in the results:

\Theta_G^{INTERP},\ \Theta_G^{PSNR},\ \Theta_G^{GAN} \quad (10)
\Theta_G^{INTERP} = (1 - \alpha)\,\Theta_G^{PSNR} + \alpha\,\Theta_G^{GAN} \quad (11)

As a result, the network can give favorable results without using artifacts and maintain a consistently balanced perceptual quality during training. Regarding the problem of potholes, the network showed that it could drastically reduce the noise in the photographs. A proper deblurring was conducted, and the edge information of the pothole was enhanced, in addition to the images being enlarged to a size that is four times their actual size.
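A small sketch of the network interpolation in Eq. (11), assuming two Keras generators with identical architectures (the function and variable names are illustrative):

```python
def interpolate_generators(psnr_weights, gan_weights, alpha=0.8):
    """theta_interp = (1 - alpha) * theta_psnr + alpha * theta_gan, cf. Eq. (11)."""
    return [(1.0 - alpha) * wp + alpha * wg
            for wp, wg in zip(psnr_weights, gan_weights)]

# Usage with two Keras generators sharing the same architecture:
# generator.set_weights(interpolate_generators(psnr_net.get_weights(),
#                                              gan_net.get_weights(), alpha=0.8))
```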

2.3. Transfer learning model

Transfer learning refers to the process of utilizing a previously created model and "fine-tuning" it with the assistance of fresh data; that is, "reusing" previously trained models, as shown in Fig. 3. The pre-trained model is used as a feature extractor: pre-trained CNN models extract features such as edges and curves for object detection and image categorization. The layers of the pre-trained model can be utilized for feature extraction, provided that the new task does not significantly deviate from the ImageNet task in terms of the dataset or difficulty. The fully connected portion of the pre-trained model is removed and replaced with a new set of layers, which is then trained on the dataset for the new task. Because the number of parameters to be trained can be lowered by reusing the pre-trained parameters, transfer learning can also minimize the amount of data required to construct CNN models. Because there are typically only a handful of medical pictures to study, the field of radiology, for example, lends itself well to the concept of transfer learning.

Figure 3. Architecture of the proposed system.

As shown in Fig. 3, the VGG-16-based transfer learning (TL) model was trained for pothole identification. The previously trained VGG-16 model improves accuracy by applying its learned knowledge to the problem of detecting potholes. The lower layers of the pre-trained model act as fixed feature extractors responsible for extracting common features; consequently, these weights were frozen and not updated during training. The upper layers of the pre-trained model were used to learn task-specific characteristics from the pothole image dataset. With this in mind, the original fully connected layers were replaced with the CNN's fully connected layers, and the output was generated using a SoftMax layer. Because this problem involves binary classification, the output layer distinguishes the two classes: 0 (non-pothole) and 1 (pothole).
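A minimal Keras sketch of this architecture follows. The frozen VGG-16 convolutional base and the softmax output over the two classes follow the description above; the 256-unit dense layer, 0.5 dropout rate, and 224 × 224 input size are assumptions for illustration rather than values reported by the authors.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_pothole_classifier(input_shape=(224, 224, 3)):
    """VGG-16 convolutional base (ImageNet weights, FC layers removed) with a
    new head for the binary pothole / non-pothole classification."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False                      # freeze the common feature extractor

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),   # assumed size of the new FC layer
        layers.Dropout(0.5),                    # assumed dropout rate
        layers.Dense(2, activation="softmax"),  # class 0: non-pothole, class 1: pothole
    ])
    return model
```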

2.4. Hyperparameter tuning

Tuning the model's hyperparameters is another essential phase of the model-building process. Adjusting hyperparameters such as batch size, number of epochs, learning rate, activation function, dropout for regularization, and the number of hidden-layer units is required to build an efficient CNN model that can generalize to new data. The most suitable value for each hyperparameter is selected from a limited pool of viable options. The hyperparameters tuned here include the optimizer, the batch size, and the number of epochs. In training deep learning algorithms, the Stochastic Gradient Descent optimizer has largely been superseded by the more widely employed Adam optimizer, which combines the features of the RMSProp and AdaGrad algorithms. The Adam optimizer's update expressions are given in equations (12) and (13):

p_t = \alpha_1 p_{t-1} + (1 - \alpha_1)\left[\frac{\partial L}{\partial W_t}\right] \quad (12)
q_t = \alpha_2 q_{t-1} + (1 - \alpha_2)\left[\frac{\partial L}{\partial W_t}\right]^2 \quad (13)

α1 and α2 are the decay rates, ∂L/∂W_t is the derivative of the loss with respect to the weights W_t at step t, p_t is the accumulated gradient, and q_t is the accumulated sum of squared gradients. The batch size is one of the most important hyperparameters to adjust in any deep learning system. The parallelism of GPUs allows a large batch size to speed up computation while training a deep learning model, but this can result in poor generalization if the model is not properly trained; on the other hand, a small batch size can lead to faster convergence to good solutions. Hence, whether large or small batch sizes are chosen, there is always a cost-benefit trade-off. In this study, the suggested model is simulated with batch sizes of 8, 16, 32, 64, and 128 to find which one provides the most accurate results. The epoch value represents the total number of times the complete dataset is presented to the neural network; one epoch of training means the training dataset has had one chance to update the model's internal parameters. Increasing the number of epochs reduces the errors made during learning but increases the computation time, so a balance must be struck between a large and a small number of epochs. The model presented here is simulated for a total of one hundred epochs. The hyperparameters and their associated values are listed in Table 3, and a sketch of the batch-size search is given after Table 3.

Table 3.

Hyperparameters and their Values.

Hyperparameters Values
Batch size 32
Learning rate 0.001
Epoch 100
Optimizer Adam
Activation function ReLU
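A hedged sketch of the batch-size search described above is given below, assuming a `build_fn` that returns a freshly built (uncompiled) model such as the VGG-16 classifier sketched earlier and one-hot encoded labels; the helper name and the search loop are illustrative, not the authors' exact tuning procedure.

```python
from tensorflow.keras.optimizers import Adam

def tune_batch_size(build_fn, x_train, y_train, x_val, y_val,
                    batch_sizes=(8, 16, 32, 64, 128), epochs=100, lr=0.001):
    """Train the same architecture with each candidate batch size and keep
    the one giving the highest validation accuracy (settings of Table 3)."""
    best = {"batch_size": None, "val_acc": 0.0}
    for bs in batch_sizes:
        model = build_fn()
        model.compile(optimizer=Adam(learning_rate=lr),
                      loss="categorical_crossentropy", metrics=["accuracy"])
        hist = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                         batch_size=bs, epochs=epochs, verbose=0)
        # Key name differs across Keras versions ("val_accuracy" vs "val_acc").
        val_acc = max(hist.history.get("val_accuracy", hist.history.get("val_acc")))
        if val_acc > best["val_acc"]:
            best = {"batch_size": bs, "val_acc": val_acc}
    return best
```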

3. Simulation results

Simulation experiments were created in Python on the Anaconda 3 platform. Python 3.7 was used for model training and testing on Windows 10 computers. The CNN model was built with Keras 2.2.4 as the high-level application programming interface and TensorFlow 1.12 as the backend. The model reaches convergence after 100 epochs. The images used in the experiments were divided as follows: 70% training, 10% validation, and 20% testing. One-way ANOVA was used for all experimental analyses to validate the findings.
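A minimal sketch of this 70/10/20 split using scikit-learn follows; the `images` and `labels` arrays and the fixed random seed are assumptions for illustration.

```python
from sklearn.model_selection import train_test_split

def split_dataset(images, labels, seed=42):
    """70% / 10% / 20% train / validation / test split used in the experiments."""
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        images, labels, test_size=0.30, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_tmp, y_tmp, test_size=2 / 3, stratify=y_tmp, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```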

We determined whether the suggested model could be implemented via a series of tests with assessment measures to evaluate its performance. All tests were carried out on a laptop running Jupyter, equipped with a GPU, 16 GB of RAM, and Python 3.7. The CNN model was developed with Keras 2.2.4 as the high-level application programming interface (API) and TensorFlow 1.12 as the backend. Consistent results were obtained by running the tests several times and tabulating and charting the data of each run. The proposed model was validated on the Kaggle dataset, and several performance metrics, including accuracy, precision, sensitivity, and F1 score, were considered throughout the research. An exploratory examination of a wide range of hyperparameters was carried out, and several deep learning models, such as YOLO, FCNN, and DeepLabv3+, were implemented to assess their efficacy and efficiency in categorizing different types of road damage.

3.1. Evaluation metrics

Throughout the analyzed trials, we used a variety of measures to evaluate the performance of the models. The definitions of the formulas are as follows:

3.1.1. Accuracy

Accuracy is one of the most commonly used criteria when examining the performance of DL algorithms, partly because of its uncomplicated nature and the ease with which it can be put into practice. Accuracy is defined as the proportion of correctly classified instances relative to the total number of instances.

Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (14)

Here TP (true positives) are the positive cases predicted as positive, TN (true negatives) are the negative cases predicted as negative, FP (false positives) are the negative cases predicted as positive, and FN (false negatives) are the positive cases predicted as negative.

3.1.2. Specificity

Specificity measurement is computed as the percentage of true negatives, which denotes the classifier's proper identification.

Specificity = \frac{TN}{TN + FP} \quad (15)

3.1.3. Precision and recall

Precision and recall are frequently combined because they are correlated. The mathematical formula for precision and recall can be given as,

Precision = \frac{TP}{TP + FP} \quad (16)
Recall = \frac{TP}{TP + FN} \quad (17)

3.1.4. F1 score

The F1 score is a quantitative measure of how accurate a system is. It is the harmonic mean of precision and recall and is used to determine the overall quality of the classification. The F1 score is expressed mathematically in equation (18).

F1\ score = \frac{2 \times (Precision \times Recall)}{Precision + Recall} \quad (18)

It is a common statistic because it provides a combined view of precision and recall, which is beneficial when an algorithm has high precision but poor recall or vice versa, and it can help determine whether the algorithm should be improved or abandoned. Some caution is still required when employing this metric for evaluation, because it is a weighted (harmonic) average that considers both precision and recall.
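The metrics above can be computed directly from the confusion-matrix counts; a small sketch follows, where the counts in the usage comment are hypothetical:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, precision, recall and F1 score, cf. Eqs. (14)-(18)."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)
    f1          = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "specificity": specificity,
            "precision": precision, "recall": recall, "f1": f1}

# Example with hypothetical confusion-matrix counts:
# classification_metrics(tp=140, fp=7, tn=135, fn=6)
```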

3.1.5. Model accuracy and model loss analysis

Training accuracy and validation accuracy tests for the proposed VGG-16-CNN are carried out based on model accuracy and loss metrics. The interpretation of the loss depends on how well the model performs on the training and validation sets used to compute it; the loss reflects the number of errors committed for each instance in those sets. Fig. 4 presents the training and validation accuracy graphs of the suggested method. The model is analyzed over a total of one hundred epochs.

Figure 4. Training and validation accuracy of the proposed model.

As shown in Fig. 4, the proposed model obtains the best possible accuracy during training and validation. In the first two epochs, the graph shows a significant accuracy improvement, indicating that the network is quickly picking up new information. For the entirety of the training phase, there was a steady improvement in both the training and validation accuracy, which is a strong indication that the model parameters were successfully adjusted. In addition, the suggested model beats previous models with a top performance of 97%, although it has fewer parameters and a lower computing cost than the other models. According to Fig. 5, the suggested model has the best possible training and validation loss. The loss on the validation set stays relatively constant across several epochs, demonstrating that the model generalizes well to data it has not previously encountered. In contrast, the loss on the training set decreases rapidly throughout the first two epochs. The suggested Transfer Learning method generally makes training the model go faster while improving its overall performance. In addition, the data augmentation that is a part of the proposed system contributes to an increase in the richness and sufficiency of the dataset. This leads to greater accuracy models, helps reduce operating expenses, and makes it easier to clean the data.

Figure 5. Training and validation loss of the proposed model.

Fig. 6 illustrates the specificity of both the proposed and the existing algorithms. The YOLO model achieves a specificity of 88.2%, the FCNN model 84.5%, and the DeepLabv3+ model 88.7%. The suggested model achieves a specificity of 95.4%, which is about 7% higher than both YOLO and DeepLabv3+ and 11% higher than the FCNN model. A high specificity indicates that the model can accurately recognize most of the negative findings. The model's generalization capacity rises when additional data are added and existing picture data are enhanced, which increases the variability of the data and reduces the likelihood of overfitting.

Figure 6. (a) Specificity, (b) Precision, (c) Recall, and (d) F1 Score of the proposed and existing algorithms.

Fig. 6b illustrates the precision achieved by the new model and the existing ones. The analysis shows that the proposed work achieved 95%, whereas DeepLabv3+ and FCNN each achieved 86% and YOLO 89%. The results of the suggested model were therefore 9% better than DeepLabv3+ and FCNN and 6% better than the YOLO network, demonstrating that the suggested method outperforms the existing algorithms. Image enhancement is applied to the pothole image data to improve the picture's appearance, producing a new enhanced image, and this is what drives the improvement of the proposed system. In most cases the enhanced image can be interpreted with less effort than the original, so the processed image allows potholes to be located accurately.

Fig. 6c compares the recall of the proposed algorithm with that of the current methods. The recall scores achieved by YOLO, FCNN, DeepLabv3+, and the proposed work are 87%, 85%, 89%, and 92%, respectively. Image enhancement can increase the precision and recall of pothole detection: it highlights significant aspects while deleting extraneous information, decreasing noise, and adjusting levels to expose blurry details. The experimental findings indicate that combining several data and image enhancement methods improves the quality of both the data and the images used by the learning algorithm. In addition, fine-tuning the deep learning algorithm's hyperparameters is an essential step toward accurate prediction and categorization of potholes.

The F1 scores of the proposed approach and the existing methods are compared in Fig. 6d. The recommended transfer learning model performed noticeably better than DeepLabv3+, FCNN, and YOLO, achieving the highest F1 score of 94%, whereas DeepLabv3+ scored 90%, FCNN 84%, and YOLO 89%. Data augmentation brings the imbalanced data into balance, which ultimately increases performance and yields the best achievable outcomes for the dataset.

Two statistical tests, the Matthews correlation coefficient (MCC) and Cohen's Kappa statistic, were carried out to compare the results. MCC is a binary classification rate that awards a high score only if the binary predictor correctly anticipates most of the positive and most of the negative data instances. It is a well-known performance measure for evaluating binary classifications and is given by equation (19):

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (19)

The MCC value falls between -1 and +1; a value closer to +1 indicates better performance.

Cohen's Kappa statistic is utilized to determine the level of agreement between two raters that classify items into mutually exclusive categories, as expressed mathematically in equation (20).

CK = \frac{p_0 - p_e}{1 - p_e} \quad (20)

Here, p_0 indicates how much the raters' observations agree with each other, and p_e is the theoretical probability of chance agreement. p_0 and p_e are computed using Eqs. (21) and (22), with the component probabilities given by Eqs. (23) and (24).

p_0 = \frac{TP + TN}{TP + TN + FP + FN} \quad (21)
p_e = Probability\ of\ Positive + Probability\ of\ Negative \quad (22)

where,

Probability\ of\ Positive = \frac{TP + FP}{TP + TN + FP + FN} \times \frac{TP + FN}{TP + TN + FP + FN} \quad (23)
Probability\ of\ Negative = \frac{FP + TN}{TP + TN + FP + FN} \times \frac{FN + TN}{TP + TN + FP + FN} \quad (24)

The Cohen's Kappa (CK) value reported here lies between 0 and 1, where 0 indicates no agreement between the two raters and 1 indicates perfect agreement. Across all models, there is an almost perfect agreement between the actual data and the predictions.
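Both statistics follow directly from the confusion-matrix counts; a small sketch computing Eq. (19) and Eqs. (20)-(24) is shown below (the helper name is illustrative):

```python
import math

def mcc_and_kappa(tp, fp, tn, fn):
    """Matthews correlation coefficient (Eq. (19)) and Cohen's Kappa (Eqs. (20)-(24))."""
    total = tp + tn + fp + fn

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    p0 = (tp + tn) / total                              # observed agreement, Eq. (21)
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)   # Eq. (23)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)   # Eq. (24)
    pe = p_pos + p_neg                                  # Eq. (22)
    kappa = (p0 - pe) / (1 - pe)
    return mcc, kappa
```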

Table 4 shows the MCC, CK, MSE, and PSNR values of the different algorithms. It is observed from Table 4 that the MCC and CK scores of the proposed work are closest to 1, which indicates a strong positive association. Instead of initializing the model with random weights, transfer learning (TL) initializes it with the weights of an already-trained model before training on the pothole dataset; this accounts for the performance improvement attributed to TL in the proposed work. The strategy adjusts the weights based on prior learning and makes training go more quickly.

Table 4.

Performance of the algorithms.

Algorithms MCC (%) CK (%) MSE PSNR
YOLO 97.47 97.39 0.3628 50.9263
FCNN 98.23 98.44 0.3283 51.4772
DeepLabV3+ 98.96 98.65 0.2849 52.8782
Proposed work 99.29 99.49 0.2391 53.2913

Evaluating the preprocessed data involves determining how effective the preprocessing was by examining the rate of data distortion and the degree to which the data was restored. This requires computing the Mean Square Error, which compares the pixel values of the source image with those of the output image to establish the gap between the two, thereby assessing the level of distortion present before processing and how effectively it was corrected.

Transfer learning is a useful technique for achieving improved performance and reducing training costs by leveraging knowledge gained from source tasks and applying it to target tasks. The overall accuracy of the two configurations is shown in Table 5: with transfer learning, the model achieved 99.23% accuracy, and without transfer learning it achieved 98.46%.

Table 5.

Training time and accuracy with and without transfer learning.

Algorithm | Training Time | Accuracy (%)
CNN with VGG16 (without transfer learning) | 23 min 53 s | 98.46
CNN with VGG16 (with transfer learning) | 23 min 42 s | 99.23

It is observed from Table 5 that the training times of CNN with VGG16 are almost the same for the two types of methods. Temporal differences were observed, although the same network architectures were used for transfer learning and without transfer learning. This may be because the instantaneous loads on the processor can be different. The same observation is valid for the testing times.

In this work, the power of transfer learning is leveraged by using the pre-trained VGG-16 as an effective feature extractor to classify the pothole images of the dataset. The training data are used to fit the model (the more training data the model has, the better its predictions), while the testing data are used to evaluate its performance. The computational times for training and testing are tabulated in Table 5: without transfer learning, training takes 23 min 53 s and testing an image takes 0.0323 s, whereas with transfer learning, training takes 23 min 42 s and testing takes 0.0387 s. The configuration with transfer learning attains the highest accuracy of 99.23%.

MSE = \frac{1}{AB}\sum_{m=0}^{A-1}\sum_{n=0}^{B-1}\left[I(m,n) - N(m,n)\right]^2 \quad (25)

where I denotes the original image of dimensions A × B and N denotes the noisy (reconstructed) image. Compressed picture data may lose information and differ from the original; the Peak Signal-to-Noise Ratio (PSNR) is utilized to quantify this disparity. It indicates the ratio of the maximum possible signal power to the noise power and can be used to determine how much image quality has been lost. A higher PSNR suggests a higher-quality signal with less loss. When the mean squared error equals zero, the PSNR for lossless pictures is undefined.

PSNR = 10 \log_{10}\left(\frac{Max_I^2}{MSE}\right) \quad (26)

where Max_I^2 is the square of the maximum possible pixel value of the image. A larger PSNR is expected when the MSE is reduced; consequently, the PSNR value is greater for high-quality data than for low-quality data.
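A short sketch of these two measures for a pair of images, assuming 8-bit pixel values (maximum value 255):

```python
import numpy as np

def mse_psnr(reference, reconstructed, max_value=255.0):
    """MSE (Eq. (25)) and PSNR in dB (Eq. (26)) for two images of equal size."""
    reference = reference.astype(np.float64)
    reconstructed = reconstructed.astype(np.float64)
    mse = np.mean((reference - reconstructed) ** 2)
    if mse == 0:
        return 0.0, float("inf")   # PSNR is undefined (lossless) when MSE is zero
    psnr = 10.0 * np.log10((max_value ** 2) / mse)
    return mse, psnr
```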

According to the assessment findings, the suggested model achieves a high degree of accuracy in predicting pothole and non-pothole classes, with MSE errors ranging from 0.2 to 0.3, as shown in Table 4. The PSNR results further indicate that the proposed technique efficiently limits image loss and properly forecasts potholes; the PSNR score was found to be at least 50 dB. These results demonstrate the usefulness of the developed model for identifying potholes in both simple and complicated surroundings.

3.1.6. Receiver operator characteristic (ROC) curve

The ROC curve represents the sensitivity-specificity trade-off for a particular test; it may be understood as the probability that the test correctly ranks a given pair of cases, one with and one without the condition. A curve closer to the plot's upper left corner indicates a better classifier. The Receiver Operating Characteristic (ROC) curve is essential for issue identification and classification: it plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at a variety of threshold levels to differentiate signal from noise. Sensitivity (the TPR) reflects how accurately the model identifies the positive class, while the FPR reveals the rate at which the model incorrectly classifies members of the negative class as positive.

A ROC curve was drawn to determine how well the classifier could differentiate between potholes and non-pothole road defects; the classifier's performance is shown in Fig. 7. The area under the curve (AUC) is a good statistic for summarizing the overall accuracy of a test and quantifies the capacity of a model to discern between the classes; it ranges from 0, indicating an entirely erroneous test, to 1, indicating a completely accurate test. The portion of the ROC curve in Fig. 7 lying close to the upper left corner suggests that the suggested approach detects potholes in pictures properly. The AUC was determined to be 0.98, a desirable result for classification performance because values above 0.50 are favored; the higher the AUC value, the better the quality of the sample categorization. As a result, the suggested model is appropriate for categorizing potholes.
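A minimal scikit-learn sketch for producing such a curve, assuming `y_true` holds the ground-truth labels and `y_scores` the predicted pothole probabilities on the test set:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc(y_true, y_scores):
    """ROC curve and AUC for the binary pothole classifier."""
    fpr, tpr, _ = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)

    plt.plot(fpr, tpr, label="ROC curve (AUC = %.2f)" % roc_auc)
    plt.plot([0, 1], [0, 1], linestyle="--", label="Chance")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate (Sensitivity)")
    plt.legend(loc="lower right")
    plt.show()
    return roc_auc
```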

Figure 7. ROC graph of the proposed work.

The Mask R-CNN model's inference time is 0.098 s, as shown in Table 6, which is significantly longer than that of the ESRGAN+YOLOv2 model. The ESRGAN+YOLOv2 model has an acceptable inference time of 0.042 s because it concentrates on cloud-based object detection. The suggested CNN with VGG16 performs inference in as little as 0.008 s, more than four times quicker than the FAST-RANSAC model. We conclude that our model's inference time is better than expected.

Table 6.

Computational inference speed, model size, and time complexity of the compared methods.

Algorithm | Inference speed (s) | Size (MB) | Time complexity
Mask R-CNN | 0.098 | 921.3 | O(N²)
FAST-RANSAC | 0.039 | 683.2 | O(N²)
SURF-NN | 0.015 | 282.8 | O(NR)
OCSVM | 0.021 | 42.5 | O(N)
ESRGAN + YOLOv2 | 0.042 | 15.9 | O(NKD²)
CNN with VGG16 (proposed) | 0.008 | 8.33 | O(N)

Compared to other models, the trained model is fairly compact (8.33 MB in size). This demonstrates that the CNN with the VGG-16 model has a considerably lower computational cost than the other models. The ESRGAN+YOLOv2 model, the closest to the suggested model, has a size of 15.9 MB, while the OCSVM model occupies 42.5 MB. For cloud-based detection processes, the SURF-NN model has a size of 282.8 MB, and FAST-RANSAC reaches 683.2 MB. A comparison of the sizes of all algorithms is detailed in Table 6.

In Table 7, a comparison of the accuracy of the proposed model and state-of-the-art methods is given.

Table 7.

Comparison of state-of-the-art results with our results.

Papers | Algorithm | Dataset | Accuracy
Chung and Yang | Mask R-CNN | Online dataset | 73.54%
Satti et al. | FAST-RANSAC | Online dataset | 98.47%
Wang et al. | SURF-NN | CCSAD | 91.56%
Oguine et al. | YOLO V3 | MS COCO | 90.56%
Salaudeen and Çelebi | ESRGAN | CCSAD | 93.71%
Proposed | CNN with VGG16 | Kaggle dataset | 99.23%

3.2. Comparison with state-of-the-art techniques

For pothole detection, we also conducted a comparative analysis against several studies that used comparable datasets; Table 7 compares their outcomes with those of our study. The same dataset and the Mask R-CNN model were used in the study by Alhussan et al. [27], and Lv et al. [28] additionally used a Japanese dataset to detect potholes. To enhance and enlarge the pothole dataset and guarantee the detection stability of the suggested model, we present an efficient image preprocessing strategy that uses color adjustment and geometric modification algorithms. The potholes are further subdivided depending on whether water is present, so the surface condition of the potholes and the weather can be inferred from the detection data. Rich pothole information can be extracted from the YOLOv3 model using the proposed network extraction method, and the precision of identification and positioning increases because the adjusted anchor sizes fit the form and size of the potholes. The suggested model's robustness is demonstrated by using random occlusion and noise addition to create adversarial attack samples; the outcomes exhibit strong overall robustness, and in particular, the suggested model remains resilient to Gaussian noise under strong interference.

3.3. Discussion

One of the comparative study's weaknesses is that the different models utilized were not trained under identical circumstances. All pothole detection models are trained using datasets containing photographs of potholes, but several variables differ, including the size of the dataset, the images it contains, and the GPU used to train the model. These results must be analyzed carefully so that alternative object identification techniques are not missed or undervalued, because a closer look exposes certain anomalies. Factors such as inference speed and model size are relevant for comparison studies because they are fundamental characteristics of object recognition models and do not change significantly with additional training effort. Although it trails the heavy models by a small margin, the proposed pothole detection model outperforms all other lightweight detectors by a wider margin; its one weakness is a longer inference time than some other models.

4. Conclusion

While on the road, it is nearly impossible to avoid encountering damaged pavement in the form of potholes or road cracks. Motorists must operate their vehicles securely and flexibly to prevent harm to themselves or their automobiles. Deep learning algorithms provide a workable answer to the problem of locating potholes. Color adjustments and geometric transformations were applied to guarantee the suggested model's detection stability; they serve as helpful picture preprocessing approaches that can be used to refine and enlarge existing pothole datasets. The VGG16 model is used as a pre-trained resource for transfer learning, and a CNN is proposed to identify and categorize potholes. The model is trained and evaluated for accuracy on the Kaggle dataset, and the findings show a reasonably high success rate overall. They also make it evident that transfer learning is an appropriate method when the requirements for computing time and hardware are modest and results must be obtained quickly. In the future, the suggested technique can be applied to various detection and classification applications, and the system's overall functionality will be enhanced.

5. Limitations and future directions

Because potholes do not have a predetermined shape or size, pothole inspection is challenging and unusual; they can take many distinct shapes, which complicates detection. The enhanced pothole detection system was trained for 5000 epochs using the model presented in this study, and the dataset used for model training includes potholes of various shapes and images containing numerous potholes. According to the experimental findings, the model outperforms all currently used algorithms, with accuracy of up to 99.23%. The analytical results demonstrate that the suggested model is well suited for pothole detection because of its small size, ease of deployment, small storage footprint, low power consumption, and minimal computing resource requirements. The results presented here will also give further insight into solving this issue with small-object detection tasks and ultra-high-resolution photos. Furthermore, the suggested method outperforms state-of-the-art pothole identification methods on comparable datasets, especially in terms of accuracy. This study, which goes beyond pothole detection efforts, aims to fill these gaps across several scientific domains.

End-to-end training for super-resolution and object detection networks will be the main focus of future research. This effort also focuses on creating a lightweight super-resolution network to drastically reduce inference time and employing a lightweight semantic segmentation network to recognize all items in frames and pavement. We intend to compile and publish a dataset that accurately represents road potholes for autonomous vehicles even though there is no common benchmark dataset for sophisticated pothole detection datasets. The technique may be extensively utilized to maintain and enhance road safety. The potholes may be found automatically using acceleration information obtained from wireless sensors mounted on the vehicle.

Environmental contamination and the condition of the road surface can also be monitored using sensor networks installed on public transport vehicles. The following are suggestions for potential extensions of the model:

  • Pothole locations could be displayed graphically by embedding a map in the Android application.

  • A voice notification could announce the distance to a nearby pothole so that the vehicle can slow down in time.

  • More precise algorithms, such as the haversine formula, could be used to calculate the distance between two latitude/longitude points (see the sketch after this list).

  • More advanced prototypes could be created to control vehicle speed when potholes are ahead.
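
One standard choice for the distance computation mentioned above is the haversine great-circle formula. The sketch below is an illustrative implementation with hypothetical coordinates; it is not necessarily the formula the deployed application would use.

```python
# Illustrative sketch of the haversine great-circle distance between two
# latitude/longitude points (coordinates in the example are hypothetical).
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Example: distance from the vehicle's position to a reported pothole.
print(f"{haversine_m(12.9716, 79.1594, 12.9721, 79.1601):.1f} m")
```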

Declaration

Both authors have contributed equally to (1) the conception and design of the study, the acquisition of data, or the analysis and interpretation of data; (2) drafting the article or critically revising its important intellectual content; (3) final approval of the version submitted.

CRediT authorship contribution statement

Satyabrata Swain: Writing – original draft, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Asis Kumar Tripathy: Writing – review & editing, Validation, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Satyabrata Swain, Email: satya.swain@gmail.com.

Asis Kumar Tripathy, Email: asistripathy@vit.ac.in.

Data availability

The authors do not have any data to share.

References

  • 1. Alqethami S., Alghamdi S., Alsubait T., Alhakami H. RoadNet: efficient model to detect and classify road damages. Appl. Sci. 2022;12.
  • 2. Wu H., Yao L., Xu Z., Li Y., Ao X., Chen Q., Li Z., Meng B. Road pothole extraction and safety evaluation by integration of point cloud and images derived from mobile mapping sensors. Adv. Eng. Inform. 2019;42.
  • 3. Varona B., Monteserin A., Teyseyre A. A deep learning approach to automatic road surface monitoring and pothole detection. Pers. Ubiquitous Comput. 2020;24:519–534.
  • 4. Bhatia Y., Rai R., Gupta V., Aggarwal N., Akula A., et al. Convolutional neural networks based potholes detection using thermal imaging. J. King Saud Univ. Comput. Inf. Sci. 2022;34:578–588.
  • 5. Kim Y.-M., Kim Y.-G., Son S.-Y., Lim S.-Y., Choi B.-Y., Choi D.-H. Review of recent automated pothole-detection methods. Appl. Sci. 2022;12:5320.
  • 6. Chen H., Yao M., Gu Q. Pothole detection using location-aware convolutional neural networks. Int. J. Mach. Learn. Cybern. 2020;11:899–911.
  • 7. Dib J., Sirlantzis K., Howells G. A review on negative road anomaly detection methods. IEEE Access. 2020;8:57298–57316.
  • 8. Sun Z., Pei L., Li W., Hao X., Chen Y. Pavement encapsulation crack detection method based on improved Faster R-CNN. J. South China Univ. Technol. (Nat. Sci. Ed.) 2020;48:84–93.
  • 9. Liyang X., Wei L., Zhaoyun S., Pei L. Automatic crack detection method based on the JTG 5210-2018 standard. In: Proc., Maintenance and Management. Branch of China Highway Society, China; 2020. pp. 90–98.
  • 10. Ellingson S.R., Davis B., Allen J. Machine learning and ligand binding predictions: a review of data, methods, and obstacles. Biochim. Biophys. Acta G, Gen. Subj. 2020;1864. doi: 10.1016/j.bbagen.2020.129545.
  • 11. Hossain M.S., Angan R.B., Hasan M.M. Pothole detection and estimation of repair cost in Bangladeshi street: AI-based multiple case analysis. In: 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE). IEEE; 2023. pp. 1–6.
  • 12. Baek J.-W., Chung K. Pothole classification model using edge detection in road image. Appl. Sci. 2020;10:6662.
  • 13. Park S.-S., Tran V.-T., Lee D.-E. Application of various YOLO models for computer vision-based real-time pothole detection. Appl. Sci. 2021;11.
  • 14. Ye W., Jiang W., Tong Z., Yuan D., Xiao J. Convolutional neural network for pothole detection in asphalt pavement. Road Mater. Pavement Des. 2021;22:42–58.
  • 15. Dewangan D.K., Sahu S.P. PotNet: pothole detection for autonomous vehicle system using convolutional neural network. Electron. Lett. 2021;57:53–56.
  • 16. Fan R., Liu M. Road damage detection based on unsupervised disparity map segmentation. IEEE Trans. Intell. Transp. Syst. 2019;21:4906–4911.
  • 17. Fan R., Wang H., Wang Y., Liu M., Pitas I. Graph attention layer evolves semantic segmentation for road pothole detection: a benchmark and algorithms. IEEE Trans. Image Process. 2021;30:8144–8154. doi: 10.1109/TIP.2021.3112316.
  • 18. Fan R., Ozgunalp U., Hosking B., Liu M., Pitas I. Pothole detection based on disparity transformation and road surface modeling. IEEE Trans. Image Process. 2019;29:897–908. doi: 10.1109/TIP.2019.2933750.
  • 19. Fan R., Ozgunalp U., Wang Y., Liu M., Pitas I. Rethinking road surface 3-D reconstruction and pothole detection: from perspective transformation to disparity map segmentation. IEEE Trans. Cybern. 2021;52:5799–5808. doi: 10.1109/TCYB.2021.3060461.
  • 20. Sahoo A., Tripathy A.K. On routing algorithms in the internet of vehicles: a survey. Connect. Sci. 2023;35(1).
  • 21. Gao M., Wang X., Zhu S., Guan P. Detection and segmentation of cement concrete pavement pothole based on image processing technology. Math. Probl. Eng. 2020;2020:1–13.
  • 22. Yebes J.J., Montero D., Arriola I. Learning to automatically catch potholes in worldwide road scene images. IEEE Intell. Transp. Syst. Mag. 2020;13:192–205.
  • 23. Gupta S., Sharma P., Sharma D., Gupta V., Sambyal N. Detection and localization of potholes in thermal images using deep neural networks. Multimed. Tools Appl. 2020;79:26265–26284.
  • 24. Chun C., Ryu S.-K. Road surface damage detection using fully convolutional neural networks and semi-supervised learning. Sensors. 2019;19:5501. doi: 10.3390/s19245501.
  • 25. Yousaf M.H., Azhar K., Murtaza F., Hussain F. Visual analysis of asphalt pavement for detection and localization of potholes. Adv. Eng. Inform. 2018;38:527–537.
  • 26. Haq M.U.U., Ashfaque M., Mathavan S., Kamal K., Ahmed A. Stereo-based 3D reconstruction of potholes by a hybrid, dense matching scheme. IEEE Sens. J. 2019;19:3807–3817.
  • 27. Alhussan A.A., Khafaga D.S., El-Kenawy E.-S.M., Ibrahim A., Eid M.M., Abdelhamid A.A. Pothole and plain road classification using adaptive mutation dipper throated optimization and transfer learning for self-driving cars. IEEE Access. 2022;10:84188–84211.
  • 28. Lv N., Xiao J., Qiao Y. Object detection algorithm for surface defects based on a novel YOLOv3 model. Processes. 2022;10(4):701.
  • 29. Swain S., Tripathy A.K. A novel smart data collection approach in UAV-enabled smart transportation system. In: Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2022. Springer; 2022. pp. 709–716.
  • 30. Swain S., Sahoo J.P., Tripathy A.K. Power allocation-based QoS guarantees in millimeter-wave-enabled vehicular communications. In: Advances in Distributed Computing and Machine Learning: Proceedings of ICADCML 2020. Springer; 2021. pp. 35–43.
  • 31. Wang D., Liu Z., Gu X., Wu W., Chen Y., Wang L. Automatic detection of pothole distress in asphalt pavement using improved convolutional neural networks. Remote Sens. 2022;14(16):3892.
  • 32. Salaudeen H., Çelebi E. Pothole detection using image enhancement GAN and object detection network. Electronics. 2022;11(12):1882.
  • 33. Rahman A., Patel S. Annotated potholes image dataset. 2020.
  • 34. Liu Z., Gu X., Wu W., Zou X., Dong Q., Wang L. GPR-based detection of internal cracks in asphalt pavement: a combination method of DeepAugment data and object detection. Measurement. 2022;197.
  • 35. Ledig C., Theis L., Huszár F., Caballero J., Cunningham A., Acosta A., Aitken A., Tejani A., Totz J., Wang Z., et al. Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. pp. 4681–4690.
  • 36. Jolicoeur-Martineau A. The relativistic discriminator: a key element missing from standard GAN. arXiv:1807.00734; 2018. Preprint.
