Traffic violations analysis: Identifying risky areas and common violations

El Mehdi Ben Laoula; Omar Elfahim; Marouane El Midaoui; Mohamed Youssfi; Omar Bouattane

doi:10.1016/j.heliyon.2023.e19058

. 2023 Aug 9;9(9):e19058. doi: 10.1016/j.heliyon.2023.e19058

Traffic violations analysis: Identifying risky areas and common violations

El Mehdi Ben Laoula ^a,^∗, Omar Elfahim ^a, Marouane El Midaoui ^b, Mohamed Youssfi ^a, Omar Bouattane ^a,^b

PMCID: PMC10472221 PMID: 37662813

Abstract

Road traffic accidents caused by traffic violations are a major public health issue that results in loss of lives and economic costs. Therefore, it is important to prioritize road safety measures that reduce the incidence and severity of accidents. In this study, we suggest an incremental road safety strategy that identifies high-risk areas and common traffic violations in order to prioritize further enforcement. In fact, by analyzing data on traffic violations in different districts and comparing them to the overall average using the Kolmogorov-Smirnov (KS) test, risky areas are identified and the most common violations are detected. We performed a comparison between several types of clustering optimizations to spot clusters to be enforced in order to reduce violations. Our results indicate that some Districts have a higher risk of traffic violations than others do, and some violations (Speeding, Registration, License, Belt, Influence, Phone, etc.) are more common than others are. We also find that k-means clustering provides the best results for identifying clusters of violations records and optimizing enforcement strategies. Our findings can be adopted by law enforcement agencies to focus on high-risk areas and target the most common violations in order to optimize their resources and improve road safety.

Keywords: Traffic violations, Road safety, Kolmogorov-smirnov (KS) test, Clustering enforcements, K-means

1. Introduction

Road traffic accidents are a major public health concern worldwide. The World Health Organization (WHO) reports that approximately 1.35 million people die each year due to road traffic accidents, making it the leading cause of death among young people aged 5–29 years [1]. Moreover, road traffic accidents are estimated to cause economic losses of up to 3% of a country's gross domestic product (GDP) [2]. Therefore, it is essential to prioritize measures that aim to reduce the incidence and severity of road traffic accidents.

One of the key contributors to road traffic accidents is traffic rule violations such as speeding, running red lights, and reckless driving that increase the risk of accidents and fatalities on the roads [3]. To improve traffic safety, two major strategies are mainly adopted: the top-down strategy and the bottom-up approach [[4], [5], [6]]. While top-down strategies involve a comprehensive approach to risk, taking into account all possible risks and their interconnections, bottom-up ones identify a specific scope of risk (in our case, areas and types of violations) and target interventions in those areas. According to this approach, it is crucial to identify areas with high rates of traffic violations and prioritize enforcement efforts in these areas. One way to achieve this is by using statistical methods to analyze not only the statistics of violations but also their distributions over space and to compare these distributions with empirical distributions or standards [7].

The Kolmogorov-Smirnov (KS) test is a non-parametric statistical test that is widely used to compare two probability distributions. In the context of traffic safety, the KS test can be used to compare the distribution of traffic violations in a particular area with a known distribution or standard [8]. By doing so, traffic safety officials can identify areas with high rates of violations and prioritize enforcement efforts in these areas [9]. In recent years, there has been an increasing interest in the use of statistical methods, including the KS test, to improve traffic safety [10,11].

In this paper, we explore the application of the KS test to traffic safety and demonstrate its usefulness in identifying areas with high rates of traffic violations. We present a case study of using the KS test to evaluate the distribution of 13 types of violations in a geographical area and compare it with a local average to identify risky places and the most common violations in order to prioritize enforcement efforts. Finally, we compare this approach with other approaches and strategies and present its advantages in providing a comprehensive analysis of traffic safety [12].

2. Related works

Several studies have been conducted on using computer science technology and applied algorithms to improve traffic safety. For instance, a study by Kwon et al. (2019) used data mining techniques to analyze traffic violation data and identify high-risk areas for traffic accidents [13]. Similarly, a study by K. A. Ismail et al. (2010) proposes the use of computer vision for automated and objective traffic conflict analysis, presenting methodologies for tracking and classifying road users, measuring their real-world coordinates, detecting pedestrian-vehicle conflicts, evaluating pedestrian scramble treatments, and detecting spatial traffic violations [14]. Another study conducted by P.S. Reddy et al. suggested an automated traffic violation detection system that can accurately identify signal violations in real-time using computer vision techniques, providing efficient monitoring and enforcement of traffic regulations, surpassing the limitations of human capacity, and enabling simultaneous detection of multiple violations [15]. In addition, R.J. Franklin et al. (2020) proposed a computer vision-based system that detects traffic violations employing YOLOV3 object detection to track and penalize violations such as signal jumps, vehicle speeds, and vehicle counts. The optimized implementation achieved high accuracy: 89.24% for speed violation detection [16].

Among all these studies aiming to reduce the traffic violations and reduce traffic accident, some works consider the use of statistical methods. In fact, in their study, T. B. Ambro and others (2021) [17] aimed to identify and evaluate major traffic violations and related risk factors using multinomial logit model. The results, based on data collected in China, revealed six major traffic violations: traffic light violation, illegal parking, wrong-way driving, speeding, and not wearing a seat belt. The findings suggest that considering these risky contributing factors during traffic regulations and enforcement development can help reduce traffic violations, create a smooth/healthy driving condition, and improve traffic safety. The [18] study performed by G. Zhang and others in 2013 analyzed traffic accident data from Guangdong Province for the period 2006–2010, with a focus on traffic violations and accident severity. The study found that reducing traffic violations, targeting different vehicle types and driver groups, and improving road and transport facilities are crucial measures needed to promote road safety in China and other regions. Then, M. H. Hosseinlou and others aimed in their research [19] to identify factors that contribute to the high rate of traffic violations and crashes on Iran's freeways Using statistical models. The study analyzed data from 36 road segments and found that average speed has a positive correlation with the number of violations and crashes, while peripheral landscapes, the number of interchanges, the number of passing lanes, and exemption from paying tolls have an inverse relationship with violations and crashes.

Another field of research involves clustering traffic violations and identifying high-risk spots. In fact, S. Vardaki and al focus on Greek drivers and their self-reported tendency to commit traffic violations related to speeding, drunk driving, and cell phone use [20]. Through cluster analysis, the authors identified three distinct groups of drivers with different attitudes and behaviors toward traffic violations. The findings suggest that age, gender, and area of residence are factors that influence drivers' attitudes and behaviors toward traffic violations. Another work proposes and compares two approaches for detecting vehicular spatial violations, k-means clustering and pattern matching with the longest common subsequence (LCSS) similarity measure [21]. The results show that LCSS matching was generally superior to k-means clustering for detecting U-turn violations at an urban intersection in Kuwait City. Finally [22], explores the spatio-temporal clustering patterns of traffic collisions using network-constrained methods, which were tested on data from the Jianghan District of Wuhan, China. The proposed methods, such as weighted network kernel density estimation, network cross K-function, network differential Local Moran's I, and network local indicators of mobility association, could help researchers, practitioners, and policymakers better understand the hotspot changes and reduce the risks associated with traffic collisions.

3. Methodology

There has been an increasing concern about traffic safety and the need for effective measures to reduce traffic violations and accidents. As introduced before, a potential approach is the use of statistical methods to identify high-risk areas and prioritize enforcement efforts. One such method is the KS test [23], which can compare the distribution of traffic violations in different areas and identify those with a higher rate of violations [[24], [25], [26]]. The KS test can detect areas with higher traffic violations than the national average, making it possible to use safety systems in these locations for effective targeting of repeat offenders, reducing deployment costs and false positives. Integrating safety technologies into the methodology can overcome traditional data collection limitations and provide comprehensive traffic safety data, such as human observation and manual data entry. It can also provide a more comprehensive analysis of traffic safety by providing data on the types of vehicles involved in violations and the times and locations of the violations. Fig. 1 presents the methodology for using the KS test to detect the most problematic areas and most common traffic violations.

A
Gather traffic violation data observed in different areas:

Fig. 1 — Overview of the proposed approach.

The first step is to collect traffic violation data for different areas in order to conduct an analysis of traffic safety. The data collected should include the location of the violation, the type of violation, and the identification of the offending vehicle. This information can be gathered from traffic cameras, police reports, or other sources.

B
Calculate the observed cumulative distribution function (CDF):

The cumulative distribution function (CDF) is a mathematical function that describes the probability distribution of a random variable by giving the probability that the variable takes a value less than or equal to a certain point. In other words, as shown in equation (1), it tells us the probability of observing a value less than or equal to a given value. In fact, for a continuous random variable X, as is our case, the CDF is defined as:

Equation 1.

(1)

where F(x) represents the probability that X takes a value less than or equal to x, i.e., the cumulative probability of X up to x. P represents the probability function, which gives the probability of a specific outcome or set of outcomes of X. And X represents the random variable whose CDF we are interested in. In this case, the observed cumulative distribution function (CDF) is calculated for each area based on the recorded violations and for each type of violation to detect the most problematic. This involves calculating the proportion of violations that occur at or below certain coordinates and then summing these proportions to obtain the cumulative distribution function.

C
Calculate the expected CDF based on the national average distribution:

The expected CDF is calculated based on the national average distribution. This involves obtaining data on the distribution of traffic violations across the entire country and using this to estimate the expected distribution of violations in a given area. This step is essential for further comparison.

D
Calculate the maximum difference between the observed and expected CDFs (D-statistic):

The D-statistic, also known as the Kolmogorov-Smirnov (KS) statistic, is a measure used in the KS test to compare two cumulative distribution functions (CDFs). The KS test is a non-parametric statistical test that determines if two datasets come from the same distribution or if they differ significantly. The D-statistic measures the maximum vertical distance between the two CDFs being compared, which represents the degree of difference between them.

In this experiment, the D-statistic is calculated as the maximum difference between the observed and expected CDFs. As shown in Fig. 2, this parameter provides a measure of the degree to which the observed distribution of violations deviates from the expected distribution. The larger the D-statistic, the greater the difference between the two distributions. The KS test also calculates a p-value, which represents the probability of obtaining a large D-statistic or a larger one than the observed value, assuming the two datasets come from the same distribution. If the p-value is less than a predetermined level of significance (e.g., 0.05), then the null hypothesis that the two datasets come from the same distribution is rejected.

The D-statistic, presented in equation (2), is a measure of the maximum absolute difference between the cumulative distribution functions (CDFs) of two datasets. Mathematically, the D-statistic is defined as:

Equation 2.

(2)

Where F₁(x) and F₂(x) are the CDFs of the two datasets being compared, and max denotes the maximum value over all x values.

E
Determine a significance level for the D-statistic and identify the area with the highest D-statistic:

A significance level is determined for the D-statistic to determine whether the observed deviation from the expected distribution is statistically significant. This involves comparing the D-statistic to a critical value based on the sample size and the desired level of significance. If the D-statistic is statistically significant, the area with the highest D-statistic value is identified as the high-risk area. This area is considered to have a significantly higher rate of traffic violations compared to what would be expected based on national averages.

F
Identify common violation

To identify common violations, traffic violations dataset is analyzed and most common violations are identified. In fact, the spatial distribution of some stops (Speed, license, registration, traffic light, and other categories) are much more different than the average. By identifying these common violations, law enforcement agencies can focus their resources by providing enforcement: personnel designing patrols in specific areas, or equipment such as speed radars, cameras, etc.

G
Clustering dataset stops

Clustering the traffic specific violations means applying clustering algorithms to identify the center of high-risk areas and optimize enforcement strategies to reduce it. We installed a cluster to be enforced for violations such as speeding and driving under the influence using tools like radar for speed and etylotest for alcohol levels. This step, by identifying the center of the problematic violations among all others, helps law enforcement agencies to allocate their resources more effectively by focusing on high-risk areas and targeting the most common violations.

In this stage, we suggest the use of following clustering optimizations.

1.
K-means is a popular clustering algorithm that partitions a dataset into k clusters based on the Euclidean distance between the data points and the cluster centers [27]. This algorithm involves randomly initializing k centroids, assigning each data point to its nearest centroid, and then updating the centroids based on the mean of the data points in each cluster. K-means is fast, scalable, and easy to implement, but it can be sensitive to the initial centroid locations and may converge to a local minimum instead of the global minimum [28].
2.
AHL (Agglomerative Hierarchical Clustering) is a type of clustering algorithm that is commonly used in machine learning and data mining. It is a bottom-up approach to clustering, where each data point is initially considered as a separate cluster, and clusters are then successively merged together based on their similarity [29]. In AHL, the similarity between two clusters is measured using a distance metric, and the algorithm iteratively merges the two most similar clusters until a stopping criterion is met. AHL has the advantage of being relatively simple to implement and interpret, and it can handle datasets of varying sizes and dimensions. However, it can be computationally expensive for large datasets, and the choice of distance metric can greatly affect the results of the clustering [30].
3.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together data points that are closely packed together and separates outliers that are far from any cluster. It defines clusters as dense regions of data points that are separated by areas of lower density. DBSCAN define a neighborhood around each data point and then grouping together points that belong to the same cluster [31]. It is robust to noise and can find clusters of arbitrary shape. However, it requires setting two hyperparameters: the radius of the neighborhood and the minimum number of points required to form a dense region [32].
H
Prioritize enforcement efforts in the high-risk area:

Enforcement efforts are prioritized in the high-risk area to reduce the rate of traffic violations. License plate recognition technology can be used to identify repeat offenders and target enforcement efforts more effectively. To evaluate the effectiveness of the system in terms of reducing violations and improving traffic safety, equation (3) shows the propose Key Performance Indicator:

Equation 3.

(3)

Where NV is the number of violations detected before deploying the system, RV is the number of violations detected after deploying the system, C is the cost of deploying the system, TF is the improvement in traffic flow and reduction in congestion due to the system (measured as a percentage increase in average speed or reduction in travel time), and CF is a measure of the system's feasibility, taking into account factors such as public acceptance, privacy concerns, and technical challenges. The KPI formula calculates the ratio of the reduction in violations to the cost of deployment, multiplied by the improvement in traffic flow and congestion, and adjusted by the feasibility factor. A higher KPI score indicates a more effective and efficient system.

4. Results and discussion

4.1. Dataset

To understand the patterns of traffic violations in the state of Maryland, we collected a dataset that includes information on the type of violation, the location where it occurred, and the time and date of the violation for the year 2022. The data are organized by district, which allowed us to analyze the distribution of traffic violations across different areas in the state. We observed significant variations in the number of violations across the six districts of Maryland, with some districts having a much higher number of violations than others. To provide a more detailed picture of the types of traffic violations in each district, we identified different categories of violations and tagged each record accordingly.

When analyzing this data, we aim to identify high-risk areas for traffic violations and optimize enforcement strategies to improve road safety. Fig. 3 shows the distribution of traffic violations across the six districts of Maryland, USA. Each Police District violations are given with different color no matter the type of the stop.

Fig. 3 — Traffic violations dataset overview.

The fields of each record are: Date Of Stop, Time Of Stop, Agency, SubAgency, Description, Location, Latitude, Longitude, Accident, Belts, Personal Injury, Property Damage, Fatal, Commercial License, HAZMAT, Commercial Vehicle, Alcohol, Work Zone, State, Vehicle Type, Charge, Article, Contributed To Accident, Race, Gender, Driver City, Driver State, DL State, and Arrest Type. To provide a more detailed picture of the types of traffic violations in each district, we identified different categories of violations and tagged each record accordingly. Table 1 shows the definition of each tag used in the dataset. These tags include failure to obey red light, exceeding speed limit, failure to register vehicle or expired registration, driving without a valid license or with a suspended license, failure to stop at a stop sign, and so on.

Table 1.

Traffic violations type.

Traffic violation	Tag
Failure to obey red light	Light
Exceeding speed limit	Speeding
Failure to register vehicle or expired registration	Registration
Driving without a valid license or with a suspended license	License
Failure to obey device instructions, such as traffic cones or barriers	DeviceInstruction
Failure to stop at a stop sign	Stop
Using a cell phone while driving	Phone
Excessive window tinting or using not legal tinted materials	Tint
Driving under the influence of alcohol or drugs	Influence
Displaying a license plate that is not authorized or is obscured	Plate
Failure to maintain proper lane position or crossing over solid lines	Line
Not wearing a seat belt or allowing passengers to ride without wearing it	Belt
Other traffic violation	Other

Open in a new tab

By analyzing these tags, we were able to build separate sub-datasets for each type of violation, and for each dataset, we recorded the number of violations that occurred in each police district. This information will help us identify high-risk areas for each type of violation and optimize enforcement strategies accordingly. Table 2 details dataset stops according to their District and Type.

Table 2.

Dataset size Traffic violations type.

Tag\District	1st Rockville	2nd Bethesda	3rd Silver Spring	4th Wheaton	5th Germantown	6th Gaithersburg	Total
Light	504	1012	960	1272	680	583	5011
Speeding	439	794	595	1225	580	470	4103
Registration	639	893	797	1374	459	507	4669
License	353	397	693	1065	330	347	3185
DeviceInstruction	217	756	480	619	259	273	2604
Stop	106	196	219	350	122	102	1095
Phone	109	315	142	164	74	117	921
Tint	49	138	87	269	153	128	824
Influence	113	89	172	266	95	56	791
Plate	102	127	208	124	57	106	724
Line	40	76	64	152	39	45	416
Belt	32	26	60	139	52	49	358
Other	480	828	842	1199	451	478	4278
Total	3183	5647	5319	8218	3351	3261	28,979

Open in a new tab

4.2. KS-test

After calculating the CDF of each District a part, results, presented in Table 3, show that 1st District Rockville is the most problematic (Fig. 4-a) and that “Not wearing a seat belt or allowing passengers to ride without wearing a seat belt” is the most common violation observed followed by “Failure to stop at a stop sign” and then “influence" (Fig. 4-b).

Table 3.

D-Statistic of traffic violation by Tag.

TAG	D-Statistic
belt	0.28768
stop	0.27445
influence	0.22853
deviceInstruction	0.22109
license	0.21334
phone	0.21464
tint	0.18677
line	0.17089
speed	0.16019
other	0.15008
registration	0.12765
plate	0.12758
light	0.03698

Open in a new tab

Fig. 4 — Cdf by District (a) CDF by Tag (b).

In contrast, violations related to “light” have a very low D-statistic of 0.03698, which indicates that these violations are more evenly distributed across the 1st ^District Rockville and are less indicative of high-risk areas. This information may be useful for law enforcement agencies when deciding on the allocation of resources and the types of enforcement strategies that may be most effective for reducing violations related to “light”. Therefore, law enforcement agencies could focus their resources on monitoring and enforcing “belt” and “stop” violations in these high-risk areas to help reduce the number of accidents.

4.3. Clustering

In order to further optimize enforcement strategies for enhancing road safety, we conducted a comprehensive analysis by applying three different clustering algorithms: K-means, Agglomerative Hierarchical Clustering (AHL), and DBSCAN, to identify clusters of traffic violations in the most problematic district, Rockville. Our analysis was specifically focused on the three most common violations that pose significant risks to road users, namely “Failure to stop at a stop sign”, “Not wearing a seat belt or allowing passengers to ride without wearing a seat belt”, and “Driving under the influence of alcohol or drugs”. The results of our analysis, presented in Fig. 5, reveal that K-means clustering provides the most accurate and meaningful clusters. This finding is of practical significance as it enables law enforcement agencies to efficiently allocate their resources and target high-risk areas for effective enforcement actions.

Fig. 5 — Clustering methods on common traffic violations recorded in the 1st District.

4.4. Discussion

In order to determine the most effective approach for improving road safety through statistical analysis of traffic violations, we compared several different methods. These methods include the before-and-after study (BAS) [33,34], time series analysis (TSA) [35,36], and the KS test, the approach used in our study. As presented in Table 4, each approach has its own advantages and limitations that should be carefully considered when selecting a method to analyze traffic violation data.

Table 4.

Comparison between our approach, BAS and TSA.

Approach	Key Concept	Advantages	Limitations
Before-and-After Studies [33,34]	Compare number of violations before and after implementation of intervention.	Can directly measure the effectiveness of an intervention.	Can be influenced by external factors, such as changes in traffic volume or weather conditions.
Time Series Analysis [35,36]	Analyze changes in violations frequency over time using statistical models	Can account for trends and seasonality in violation data	-Can be complex to implement and interpret, -May require large amounts of data
KS Test (ours)	Compare observed distribution of violations with national average using cumulative distribution function (CDF)	-Simple to implement; -Can identify high-risk areas; -Can identify high risk Tag;	-Limited to identifying areas with significantly different distributions; -Does not provide information on changes over time

Open in a new tab

5. Conclusion

In conclusion, this study proposes an incremental road safety strategy to prioritize enforcement efforts by identifying high-risk areas and common traffic violations. The methodology involves the use of the Kolmogorov-Smirnov (KS) test to compare the distribution of traffic violations in different areas and identify those with a higher rate of violations. By analyzing data on traffic violations in different districts, we have identified the most problematic areas and common traffic violations. The results show that some districts have a higher risk of traffic violations than others do, and some violations are more common than others are. We also compared several types of clustering optimizations to spot clusters to be enforced, such as radars for speeding and etylotest for driving under influence. Our findings suggest that k-means clustering provides the best results for identifying clusters of violations records and optimizing enforcement strategies. Law enforcement agencies can use our findings to focus on high-risk areas and optimize their resources to improve road safety, thereby reducing the incidence and severity of accidents, saving lives and reducing economic costs.

Author contribution statement

El Mehdi Ben Laoula: Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Omar Elfahim: Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.

Marouane EL Idaoui: Conceived and designed the experiments; Analyzed and interpreted the data.

Mohamed Youssfi: Omar Bouattane: Analyzed and interpreted the data.

Data availability statement

Data associated with this study has been deposited at https://www.kaggle.com/datasets/rounak041993/traffic-violations-in-maryland-county.

Additional information

No additional information is available for this paper.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Biographies

graphic file with name fx1.jpg

El Mehdi BEN LAOULA: Born in 1988 in Meknes, Morocco. PhD student at the university Hassan II Casablanca, SSDIA laboratory. His researches are focused on artificial intelligence and smart road security. Master's degree at ENSET Mohammedia in Distributed Information Systems in 2017 and Ph.D. candidate since 2019 at SSDIA Laboratory, El Mehdi leads research's scope are AI, ML, and computer vision especially in road safety purposes and path planning for emergency contexts.

graphic file with name fx2.jpg

Omar ELFAHIM: Born in 1996 in Tinghir, Morocco. PhD student at the university Hassan II Casablanca, SSDIA aboratory. His researches are focused on artificial intelligence and Multi-agent systems. Master's degree at FST Marrakech in Modeling and Scientific Computing for Mathematical Engineering in 2020 and Ph.D. candidate since 2021 at SSDIA Laboratory. His research focuses on AI, ML, and NLP, particularly in medical decision-making.

graphic file with name fx3.jpg

Marouane EL MIDAOUI: Born in Casablanca, Morocco, and he attended Hassan II University and graduated in 2017 with a master degree in distributed information system at ENSET Mohammedia. He is also concerned with computer science, IA, programming and software design. He received his Ph.D at 2022 in the scope of Intelligent Transport Systems and Logistics at SSDIA laboratory in the same university, His researches are focused on AI, logistics, IoT and Blockchain.

graphic file with name fx4.jpg

Mohamed YOUSSFI: Born in 1970 in OUARZAZATE, Morocco. He is now a teacher researcher in computer science, parallel and distributed systems at the University Hassan II, ENSET Institute as well as software consultant. His researches are focused on parallel and distributed computing technologies, grid computing and middleware's, received the B.S. degree in Mechanics in 1989 and the M.S. degree in Applied Mechanics in 1993 from the ENSET Institute, Mohammedia. Mr Youssfi owns a successful YouTube channel specialized in computer engineering and software programming.

graphic file with name fx5.jpg

Omar BOUATTANE: Omar Bouattane has received his PhD from the University Hassan II of Casablanca, in Parallel Computing and Image processing. He currently serves as a full Professor in the Department of Electrical Engineering at ENSET of Mohammedia. His topic is in various domains of high performance computing, image processing and electrical engineering, particularly in renewable energy and smart grids. Since 2012, He was the head of the laboratory of Signals, Distributed Systems and AI. He is the Director of ENSET Institute Mohammedia since 2018.

References

1.Sigala M., Beer A., Hodgson L., O'Connor A. 2019. Big Data for Measuring the Impact of Tourism Economic Development Programs: A Process and Quality Criteria Framework for Using Big Data. [Google Scholar]
2.Nguyen G., et al. Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey. Artif. Intell. Rev. 2019;52(1):77–124. doi: 10.1007/s10462-018-09679-z. [DOI] [Google Scholar]
3.Shorten C., Khoshgoftaar T.M. A survey on image data augmentation for deep learning. J. Big Data. 2019;6:1. doi: 10.1186/s40537-019-0197-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Jones R.N., Preston B.L. Adaptation and risk management. Wiley Interdiscip. Rev.: Clim. Change. 2011;2(2):296–308. doi: 10.1002/wcc.97. Mar./Apr. [DOI] [Google Scholar]
5.Davies D., Hogg A.B. Risk management: holistic risk management. Comput. Law Secur. Rep. 1997;13(5):336–339. doi: 10.1016/S0267-3649(97)80174-4. Sep./Oct. [DOI] [Google Scholar]
6.Olson E.G. Strategically managing risk in the information age: a holistic approach. J. Bus. Strat. 2005;22(3):31–38. doi: 10.1108/02756660510700618. May/Jun. [DOI] [Google Scholar]
7.Vinayakumar R., Alazab M., Soman K.P., Poornachandran P., Al-Nemrat A., Venkatraman S. Deep learning approach for intelligent intrusion detection system. IEEE Access. 2019;7:41525–41550. doi: 10.1109/ACCESS.2019.2895334. [DOI] [Google Scholar]
8.Sivaraman K., Krishnan R.M.V., Sundarraj B., Sri Gowthem S. Network failure detection and diagnosis by analyzing syslog and SNS data: applying big data analysis to network operations. Int. J. Innovative Technol. Explor. Eng. 2019;8(9 Special Issue 3):883–887. doi: 10.35940/ijitee.I3187.0789S319. [DOI] [Google Scholar]
9.Dwivedi A.D., Srivastava G., Dhar S., Singh R. A decentralized privacy-preserving healthcare blockchain for IoT. Sensors. 2019;19(2):1–17. doi: 10.3390/s19020326. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Al-Turjman F., Zahmatkesh H., Mostarda L. Quantifying uncertainty in internet of medical things and big-data services using intelligence and deep learning. IEEE Access. 2019;7:115749–115759. doi: 10.1109/ACCESS.2019.2931637. [DOI] [Google Scholar]
11.Kumar S., Singh M. Big data analytics for healthcare industry: impact, applications, and tools. Big Data Min. Anal. 2019;2(1):48–57. doi: 10.26599/BDMA.2018.9020031. [DOI] [Google Scholar]
12.Ang L.M., Seng K.P., Ijemaru G.K., Zungeru A.M. Deployment of IoV for smart cities: applications, architecture, and challenges. IEEE Access. 2019;7:6473–6492. doi: 10.1109/ACCESS.2018.2887076. [DOI] [Google Scholar]
13.Kwon H., et al. Identifying high-risk areas for traffic accidents using data mining techniques. J. Saf. Res. 2019;68:103–109. [Google Scholar]
14.Ismail K.A. T, University of British Columbia; 2010. Application of Computer Vision Techniques for Automated Road Safety Analysis and Traffic Data Collection.https://open.library.ubc.ca/collections/ubctheses/24/items/1.0062871 [Online]. Available: [DOI] [Google Scholar]
15.Reddy P.S., Nishwa T., Reddy R.S.K., Sadviq C., Rithvik K. 2021 6th International Conference on Communication and Electronics Systems (ICCES) 2021. Traffic rules violation detection using machine learning techniques; pp. 1264–1268. Coimbatre, India. [DOI] [Google Scholar]
16.Franklin R.J., Mohana . 2020 5th International Conference on Communication and Electronics Systems. ICCES); Coimbatore, India: 2020. Traffic signal violation detection using artificial intelligence and deep learning; pp. 839–844. [DOI] [Google Scholar]
17.Ambo T.B., Ma J., Fu C. "Investigating influence factors of traffic violation using multinomial logit method,". Int. J. Inj. Control Saf. Promot. Mar. 2021;28(1):78–85. doi: 10.1080/17457300.2020.1843499. [DOI] [PubMed] [Google Scholar]
18.Zhang G., Yau K.K.W., Chen G. "Risk factors associated with traffic violations and accident severity in China,". Accid. Anal. Prev. 2013;59:18–25. doi: 10.1016/j.aap.2013.05.004. [DOI] [PubMed] [Google Scholar]
19.Hosseinlou M.H., Mahdavi A., Nooghabi M.J. "Validation of the influencing factors associated with traffic violations and crashes on freeways of developing countries: a case study of Iran,". Accid. Anal. Prev. 2018;121:358–366. doi: 10.1016/j.aap.2018.06.009. [DOI] [PubMed] [Google Scholar]
20.Vardaki S., Yannis G. Investigating the self-reported behavior of drivers and their attitudes to traffic violations. J. Saf. Res. 2013;46:1–11. doi: 10.1016/j.jsr.2013.03.001. ISSN: 0022-4375. [DOI] [PubMed] [Google Scholar]
21.Ismail K., Sayed T., Zaki M.H., Alrukaibi F. Automated detection of spatial traffic violations through use of video sensors. Transport. Res. Rec. 2011;2241(1):87–98. doi: 10.3141/2241-10. [DOI] [Google Scholar]
22.Fan Y., Zhu X., She B., Guo W., Guo T. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China. PLoS One. 2018;13(4) doi: 10.1371/journal.pone.0195093. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Massey F.J., Jr. "The Kolmogorov-Smirnov test for goodness of fit,". J. Am. Stat. Assoc. Mar. 1951;46(253):68–78. [Google Scholar]
24.Harjono M.S., Halim A., Ramli K. "Simulation of improved hybrid Petri nets intersection model considering traffic distribution,". Int. J. Soft Comput. 2012;7(4):217–223. ijscomp.2012.217.223. [Google Scholar]
25.Delavary M., Ghayeninezhad Z., Lavallière M. "Evaluating the impact of increased fuel cost and Iran's currency devaluation on road traffic volume and offenses in Iran, 2011–2019,". Saf. Now. 2020;6(4):49. doi: 10.3390/safety6040049. [DOI] [Google Scholar]
26.Liu Y., Alsaleh R., Sayed T. Modeling pedestrian temporal violations at signalized crosswalks: a random Intercept parametric survival model,". Transport. Res. Rec. 2022;2676(6):707–720. doi: 10.1177/03611981221076119. [DOI] [Google Scholar]
27.Hartigan J.A., Wong M.A. Algorithm AS 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 1979;28(1):100–108. doi: 10.2307/2346830. [DOI] [Google Scholar]
28.Yedla M., Pathakota S.R., Srinivasa T.M. Enhancing K-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 2010;1(2):121–125. [Google Scholar]
29.Day W.H., Edelsbrunner H. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif. 1984;1(1):7–24. doi: 10.1007/BF01890115. [DOI] [Google Scholar]
30.Zhou S., Xu Z., Liu F. Method for determining the optimal number of clusters based on agglomerative hierarchical clustering. IEEE Transact. Neural Networks Learn. Syst. Dec. 2016;28(12):3007–3017. doi: 10.1109/TNNLS.2016.2608001. [DOI] [PubMed] [Google Scholar]
31.Khan K., Madani S.A., Khan S.U., Rehman A.U. The Fifth International Conference on the Applications of Digital Information and Web Technologies. ICADIWT 2014); 2014. DBSCAN: past, present and future; pp. 164–169. [DOI] [Google Scholar]
32.Ienco D., Bordogna G. Fuzzy extensions of the DBScan clustering algorithm. Soft Comput. 2018;22(5):1719–1730. doi: 10.1007/s00500-016-2435-0. Mar. [DOI] [Google Scholar]
33.Chapman E.A., Masten S.V., Browning K.K. Crash and traffic violation rates before and after licensure for novice California drivers subject to different driver licensing requirements. J. Saf. Res. 2014;50:125–138. doi: 10.1016/j.jsr.2014.05.005. [DOI] [PubMed] [Google Scholar]
34.Macdonald S., Wells D., Wild D., Rumbold D., Giesbrecht S. Collisions and traffic violations of alcohol, cannabis and cocaine abuse clients before and after treatment. Accid. Anal. Prev. 2004;36(5):795–800. doi: 10.1016/j.aap.2003.07.004. [DOI] [PubMed] [Google Scholar]
35.Foroutaghe M.D., Moghaddam A.M., Fakoor V. Impact of law enforcement and increased traffic fines policy on road traffic fatality, injuries and offenses in Iran: interrupted time series analysis. PLoS One. Apr. 2020;15(4) doi: 10.1371/journal.pone.0231182. 10.1371/journal.pone.0231182. [DOI] [PMC free article] [PubMed] [Google Scholar]
36.Novoa A.M., et al. Effect on road traffic injuries of criminalizing road traffic offences: a time-series study. Bull. World Health Organ. 2011;89:422–431. doi: 10.2471/BLT.10.082180. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Data associated with this study has been deposited at https://www.kaggle.com/datasets/rounak041993/traffic-violations-in-maryland-county.

[bib1] 1.Sigala M., Beer A., Hodgson L., O'Connor A. 2019. Big Data for Measuring the Impact of Tourism Economic Development Programs: A Process and Quality Criteria Framework for Using Big Data. [Google Scholar]

[bib2] 2.Nguyen G., et al. Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey. Artif. Intell. Rev. 2019;52(1):77–124. doi: 10.1007/s10462-018-09679-z. [DOI] [Google Scholar]

[bib3] 3.Shorten C., Khoshgoftaar T.M. A survey on image data augmentation for deep learning. J. Big Data. 2019;6:1. doi: 10.1186/s40537-019-0197-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib4] 4.Jones R.N., Preston B.L. Adaptation and risk management. Wiley Interdiscip. Rev.: Clim. Change. 2011;2(2):296–308. doi: 10.1002/wcc.97. Mar./Apr. [DOI] [Google Scholar]

[bib5] 5.Davies D., Hogg A.B. Risk management: holistic risk management. Comput. Law Secur. Rep. 1997;13(5):336–339. doi: 10.1016/S0267-3649(97)80174-4. Sep./Oct. [DOI] [Google Scholar]

[bib6] 6.Olson E.G. Strategically managing risk in the information age: a holistic approach. J. Bus. Strat. 2005;22(3):31–38. doi: 10.1108/02756660510700618. May/Jun. [DOI] [Google Scholar]

[bib7] 7.Vinayakumar R., Alazab M., Soman K.P., Poornachandran P., Al-Nemrat A., Venkatraman S. Deep learning approach for intelligent intrusion detection system. IEEE Access. 2019;7:41525–41550. doi: 10.1109/ACCESS.2019.2895334. [DOI] [Google Scholar]

[bib8] 8.Sivaraman K., Krishnan R.M.V., Sundarraj B., Sri Gowthem S. Network failure detection and diagnosis by analyzing syslog and SNS data: applying big data analysis to network operations. Int. J. Innovative Technol. Explor. Eng. 2019;8(9 Special Issue 3):883–887. doi: 10.35940/ijitee.I3187.0789S319. [DOI] [Google Scholar]

[bib9] 9.Dwivedi A.D., Srivastava G., Dhar S., Singh R. A decentralized privacy-preserving healthcare blockchain for IoT. Sensors. 2019;19(2):1–17. doi: 10.3390/s19020326. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Al-Turjman F., Zahmatkesh H., Mostarda L. Quantifying uncertainty in internet of medical things and big-data services using intelligence and deep learning. IEEE Access. 2019;7:115749–115759. doi: 10.1109/ACCESS.2019.2931637. [DOI] [Google Scholar]

[bib11] 11.Kumar S., Singh M. Big data analytics for healthcare industry: impact, applications, and tools. Big Data Min. Anal. 2019;2(1):48–57. doi: 10.26599/BDMA.2018.9020031. [DOI] [Google Scholar]

[bib12] 12.Ang L.M., Seng K.P., Ijemaru G.K., Zungeru A.M. Deployment of IoV for smart cities: applications, architecture, and challenges. IEEE Access. 2019;7:6473–6492. doi: 10.1109/ACCESS.2018.2887076. [DOI] [Google Scholar]

[bib13] 13.Kwon H., et al. Identifying high-risk areas for traffic accidents using data mining techniques. J. Saf. Res. 2019;68:103–109. [Google Scholar]

[bib14] 14.Ismail K.A. T, University of British Columbia; 2010. Application of Computer Vision Techniques for Automated Road Safety Analysis and Traffic Data Collection.https://open.library.ubc.ca/collections/ubctheses/24/items/1.0062871 [Online]. Available: [DOI] [Google Scholar]

[bib15] 15.Reddy P.S., Nishwa T., Reddy R.S.K., Sadviq C., Rithvik K. 2021 6th International Conference on Communication and Electronics Systems (ICCES) 2021. Traffic rules violation detection using machine learning techniques; pp. 1264–1268. Coimbatre, India. [DOI] [Google Scholar]

[bib16] 16.Franklin R.J., Mohana . 2020 5th International Conference on Communication and Electronics Systems. ICCES); Coimbatore, India: 2020. Traffic signal violation detection using artificial intelligence and deep learning; pp. 839–844. [DOI] [Google Scholar]

[bib17] 17.Ambo T.B., Ma J., Fu C. "Investigating influence factors of traffic violation using multinomial logit method,". Int. J. Inj. Control Saf. Promot. Mar. 2021;28(1):78–85. doi: 10.1080/17457300.2020.1843499. [DOI] [PubMed] [Google Scholar]

[bib18] 18.Zhang G., Yau K.K.W., Chen G. "Risk factors associated with traffic violations and accident severity in China,". Accid. Anal. Prev. 2013;59:18–25. doi: 10.1016/j.aap.2013.05.004. [DOI] [PubMed] [Google Scholar]

[bib19] 19.Hosseinlou M.H., Mahdavi A., Nooghabi M.J. "Validation of the influencing factors associated with traffic violations and crashes on freeways of developing countries: a case study of Iran,". Accid. Anal. Prev. 2018;121:358–366. doi: 10.1016/j.aap.2018.06.009. [DOI] [PubMed] [Google Scholar]

[bib20] 20.Vardaki S., Yannis G. Investigating the self-reported behavior of drivers and their attitudes to traffic violations. J. Saf. Res. 2013;46:1–11. doi: 10.1016/j.jsr.2013.03.001. ISSN: 0022-4375. [DOI] [PubMed] [Google Scholar]

[bib21] 21.Ismail K., Sayed T., Zaki M.H., Alrukaibi F. Automated detection of spatial traffic violations through use of video sensors. Transport. Res. Rec. 2011;2241(1):87–98. doi: 10.3141/2241-10. [DOI] [Google Scholar]

[bib22] 22.Fan Y., Zhu X., She B., Guo W., Guo T. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China. PLoS One. 2018;13(4) doi: 10.1371/journal.pone.0195093. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] 23.Massey F.J., Jr. "The Kolmogorov-Smirnov test for goodness of fit,". J. Am. Stat. Assoc. Mar. 1951;46(253):68–78. [Google Scholar]

[bib24] 24.Harjono M.S., Halim A., Ramli K. "Simulation of improved hybrid Petri nets intersection model considering traffic distribution,". Int. J. Soft Comput. 2012;7(4):217–223. ijscomp.2012.217.223. [Google Scholar]

[bib25] 25.Delavary M., Ghayeninezhad Z., Lavallière M. "Evaluating the impact of increased fuel cost and Iran's currency devaluation on road traffic volume and offenses in Iran, 2011–2019,". Saf. Now. 2020;6(4):49. doi: 10.3390/safety6040049. [DOI] [Google Scholar]

[bib26] 26.Liu Y., Alsaleh R., Sayed T. Modeling pedestrian temporal violations at signalized crosswalks: a random Intercept parametric survival model,". Transport. Res. Rec. 2022;2676(6):707–720. doi: 10.1177/03611981221076119. [DOI] [Google Scholar]

[bib27] 27.Hartigan J.A., Wong M.A. Algorithm AS 136: a k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 1979;28(1):100–108. doi: 10.2307/2346830. [DOI] [Google Scholar]

[bib28] 28.Yedla M., Pathakota S.R., Srinivasa T.M. Enhancing K-means clustering algorithm with improved initial center. Int. J. Comput. Sci. Inf. Technol. 2010;1(2):121–125. [Google Scholar]

[bib29] 29.Day W.H., Edelsbrunner H. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif. 1984;1(1):7–24. doi: 10.1007/BF01890115. [DOI] [Google Scholar]

[bib30] 30.Zhou S., Xu Z., Liu F. Method for determining the optimal number of clusters based on agglomerative hierarchical clustering. IEEE Transact. Neural Networks Learn. Syst. Dec. 2016;28(12):3007–3017. doi: 10.1109/TNNLS.2016.2608001. [DOI] [PubMed] [Google Scholar]

[bib31] 31.Khan K., Madani S.A., Khan S.U., Rehman A.U. The Fifth International Conference on the Applications of Digital Information and Web Technologies. ICADIWT 2014); 2014. DBSCAN: past, present and future; pp. 164–169. [DOI] [Google Scholar]

[bib32] 32.Ienco D., Bordogna G. Fuzzy extensions of the DBScan clustering algorithm. Soft Comput. 2018;22(5):1719–1730. doi: 10.1007/s00500-016-2435-0. Mar. [DOI] [Google Scholar]

[bib33] 33.Chapman E.A., Masten S.V., Browning K.K. Crash and traffic violation rates before and after licensure for novice California drivers subject to different driver licensing requirements. J. Saf. Res. 2014;50:125–138. doi: 10.1016/j.jsr.2014.05.005. [DOI] [PubMed] [Google Scholar]

[bib34] 34.Macdonald S., Wells D., Wild D., Rumbold D., Giesbrecht S. Collisions and traffic violations of alcohol, cannabis and cocaine abuse clients before and after treatment. Accid. Anal. Prev. 2004;36(5):795–800. doi: 10.1016/j.aap.2003.07.004. [DOI] [PubMed] [Google Scholar]

[bib35] 35.Foroutaghe M.D., Moghaddam A.M., Fakoor V. Impact of law enforcement and increased traffic fines policy on road traffic fatality, injuries and offenses in Iran: interrupted time series analysis. PLoS One. Apr. 2020;15(4) doi: 10.1371/journal.pone.0231182. 10.1371/journal.pone.0231182. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] 36.Novoa A.M., et al. Effect on road traffic injuries of criminalizing road traffic offences: a time-series study. Bull. World Health Organ. 2011;89:422–431. doi: 10.2471/BLT.10.082180. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Traffic violations analysis: Identifying risky areas and common violations

El Mehdi Ben Laoula

Omar Elfahim

Marouane El Midaoui

Mohamed Youssfi

Omar Bouattane

Abstract

1. Introduction

2. Related works

3. Methodology

Fig. 1.

Fig. 2.