Data in Brief. 2024 Jul 31;56:110790. doi: 10.1016/j.dib.2024.110790

HelmetML: A dataset of helmet images for machine learning applications

Kailas Patil a,b, Rohini Jadhav c, Yogesh Suryawanshi a, Prawit Chumchu b, Gaurav Khare a, Tanishk Shinde a
PMCID: PMC11350450  PMID: 39206221

Abstract

Improper wearing or absence of helmets is a significant contributing factor to fatal motorcycle accidents. This dataset supports camera-based detection of whether a helmet is worn correctly or incorrectly. It comprises 28,736 images covering four helmet types (Full-Face, Half-Face, Modular, and Off-Road), each worn in both correct and incorrect configurations. The images were captured with iPhone 13 and Mi10T mobile phones under diverse conditions, from daytime to night-time scenarios. After acquisition, the images were pre-processed to standardize the dataset: they were renamed, resized to a uniform 768 × 576 resolution, and organized into their respective folders. The dataset is distinguished by its diverse environmental conditions, coverage of common helmet types, variability in helmet orientation, and its large, balanced size, which together provide a realistic representation of real-world scenarios. It can be used for machine learning tasks such as image classification, object detection, and pose estimation for helmet recognition. Its scientific value lies in its potential to advance research and development on safety measures for motorcycle helmet use.

Keywords: Helmet recognition, Machine learning, Image classification, Safe driving, Bike, Real-world scenarios


Specifications Table

Subject: Applied Machine Learning
Specific subject area: Helmet
Data format: Raw
Type of data: Images
Data collection: A diverse set of photographs was captured in various environmental settings, covering a 180-degree range of viewpoints. Four helmet types, namely Full-Face Helmet, Half-Face Helmet, Modular Helmet, and Off-Road Helmet, are featured in the dataset, and each helmet was systematically worn in both correct and incorrect orientations. The resulting dataset comprises a total of 28,736 images, evenly distributed between instances where helmets were worn correctly (13,780 images) and instances where helmets were worn incorrectly (13,780 images). During image acquisition, care was taken to safeguard privacy and sensitive information: facial features, bike number plates, and any other identifying information in the images were concealed. The images were captured with the camera app of an iPhone 13 and a Mi10T, under a diverse range of conditions including day and night settings.
Data source location: Vishwakarma University, Kondhwa Budruk, Pune, Maharashtra, India
Latitude and longitude: 18°27′37.8″N 73°53′00.9″E
Data accessibility:
Repository name: Mendeley Data
Title: Helmet Wearing Image Dataset
Data identification number: 10.17632/tm72fkfxd5.3
Direct URL to data: https://data.mendeley.com/datasets/tm72fkfxd5/3

1. Value of the Data

  • The dataset serves as a foundation for research in computer vision and machine learning applied to safety-related domains. Researchers can use it to advance the state of the art in helmet recognition, image classification, object detection, and pose estimation.

  • The dataset, with its diverse images of different helmet types in varying conditions, provides a valuable resource for training machine learning models. This, in turn, facilitates the creation of robust algorithms capable of accurately recognizing and classifying helmet usage.

  • The dataset bridges computer science, engineering, and safety disciplines, fostering interdisciplinary collaboration and knowledge exchange for the collective goal of enhancing safety in motorcycle driving through technological innovations.

  • The large, balanced dataset supports robust validation of machine learning algorithms, helping to assess their accuracy and reliability in distinguishing correct from incorrect helmet usage across diverse conditions.

  • The dataset supports the identification of proper and improper helmet usage through camera-based analysis. This contributes to technology-driven safety protocols for motorcycle riding and could help reduce injuries in road accidents.

2. Background

The motivation for this dataset comes from the motorcycle accidents reported daily in newspapers. These reports suggest that a substantial proportion of such accidents involve riders who either did not wear a helmet or wore one improperly. In response, law enforcement agencies and government authorities have taken measures to address this public safety concern, levying fines on riders who wear helmets improperly or do not wear them at all.

To address non-compliance with helmet regulations, we set out to support the automated enforcement of proper helmet usage. Specifically, our objective was to build a dataset for developing machine learning algorithms that can distinguish correct from incorrect helmet wearing through camera-based analysis. One envisioned outcome is a system that enables motorcycle ignition only when a helmet is correctly worn, thereby automating and augmenting current enforcement mechanisms.

3. Data Description

For motorcycle riders, helmets are an essential safeguard for the most critical part of the body, the head. Statistics show that they dramatically reduce the risk of fatal head injuries in an accident: a helmet absorbs the impact of a crash, distributes the force, and shields the skull from fractures, which significantly reduces the likelihood of severe brain trauma. Even in minor mishaps, helmets protect against scrapes, cuts, and debris, reducing both immediate injury severity and the potential for later complications and infections [1]. Furthermore, helmet use is mandated by law in most countries; complying with such laws protects riders and avoids fines [2]. A helmet should fit properly and meet safety standards to provide optimal protection, and a comfortable, well-maintained helmet allows better focus and reaction time while riding. Ultimately, a helmet is the rider's primary defense in a motorcycle accident and can make the difference between a minor incident and a life-altering injury.

The dataset covers four helmet types [3], briefly described below:

Full-face helmet: Full-face helmets provide the most protection for the head, face, and neck. They are typically used by motorcycle riders and race car drivers.

Half-face helmet: Half-face helmets provide protection for the head, but not the face. They are typically used by scooter riders and moped riders.

Modular helmet: Modular helmets, also known as flip-up helmets, have a chin bar that can be raised to expose the face. This is convenient for riders who would otherwise need to remove the helmet frequently, for example when talking to someone or eating.

Off-road helmet: Off-road helmets are designed for use by dirt bike riders and motocross riders. They have a visor to protect the rider's face from dirt and debris, and a peak to help keep the sun out of the rider's eyes.

We created a new dataset to improve motorcycle helmet safety through image recognition. This dataset tackles the critical issue of improper helmet use contributing to motorcycle fatalities. It includes a large and balanced collection of 28,736 images showcasing various helmet types (full-face, half-face, modular, off-road) worn correctly and incorrectly. To ensure real-world applicability, the images capture diverse lighting conditions (day and night). The dataset underwent pre-processing for efficient use, including renaming, resizing, and organized storage. This unique resource incorporates a variety of environments, helmet types, and orientations. The dataset's potential lies in its usefulness for machine learning tasks like image classification, object detection, and helmet pose estimation. Ultimately, this dataset holds scientific value by promoting advancements in motorcycle helmet safety research through better image recognition techniques.

Recent years have seen many significant contributions of machine learning datasets, particularly image datasets [4–10]. Motivated by this, we aimed to create a diverse helmet image dataset. To preserve privacy during image acquisition, one of the authors wore a cotton mask to obscure facial features. This measure aimed to ensure the anonymity of the individuals in the images, in line with ethical considerations and privacy standards.

After capture, the images were preprocessed for subsequent analysis: they were renamed following a standard scheme and resized to uniform dimensions. The images were then organized and stored in their respective folders.

The resulting dataset comprises 28,736 images of helmet wearing under diverse environmental conditions. It is divided into two main folders, "Correct way" and "Incorrect way", reflecting the binary categorization of the helmet-wearing instances. Each of these folders contains four subfolders, one per helmet type: Full-Face Helmet, Half-Face Helmet, Modular Helmet, and Off-Road Helmet.
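Because labels are encoded in the folder hierarchy, a short script can recover them directly. The sketch below, in Python, walks the hierarchy and counts images per class and per helmet type. The folder names come from the text; the accepted file extensions and the function names `index_dataset` and `summarize` are assumptions for illustration. Section 4.2 spells the class folders "correct_way" and "incorrect_way", so `CLASS_DIRS` may need adjusting to match the downloaded archive.

```python
from collections import Counter
from pathlib import Path

# Folder names as described in the article; the spelling in the released
# archive (e.g. "Correct way" vs "correct_way") may differ.
CLASS_DIRS = ("Correct way", "Incorrect way")
HELMET_DIRS = ("Full-Face Helmet", "Half-Face Helmet", "Modular Helmet", "Off-Road Helmet")
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image file types


def index_dataset(root):
    """Return a list of (path, wearing_label, helmet_type) tuples."""
    records = []
    for label in CLASS_DIRS:
        for helmet in HELMET_DIRS:
            folder = Path(root) / label / helmet
            if not folder.is_dir():
                continue  # tolerate a missing subfolder instead of failing
            for path in sorted(folder.iterdir()):
                if path.suffix.lower() in IMAGE_EXTS:
                    records.append((path, label, helmet))
    return records


def summarize(records):
    """Count images per wearing class and per (class, helmet type) pair."""
    per_class = Counter(label for _, label, _ in records)
    per_pair = Counter((label, helmet) for _, label, helmet in records)
    return per_class, per_pair
```

Comparing `per_class["Correct way"]` with `per_class["Incorrect way"]` gives a quick check that the two classes are balanced, and `per_pair` shows whether each helmet type is represented in both classes.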

Sample images from the dataset are shown in Table 1.

Table 1.

Sample images from the helmet dataset.

Type of Helmet | Correct Way (sample image) | Incorrect Way (sample image)
Full-Face Helmet | [image] | [image]
Half-Face Helmet | [image] | [image]
Modular Helmet | [image] | [image]
Off-Road Helmet | [image] | [image]

Table 2 compares our dataset with existing helmet datasets. Ours contains more images, a wider variety of helmet types, and both correct and incorrect wearing configurations captured under diverse environmental conditions, whereas the other datasets are more limited in scope and environmental variation. This provides a more realistic representation for machine learning tasks and increases the dataset's utility for research and development in helmet recognition and safety measures.

Table 2.

Comparison between our dataset and existing datasets.

Feature | Our Dataset [13] | Existing: Viklundvisuals [11] | Existing: Ahsanization [12]
Total images | 27560 | 764 | 5000
Helmet types | Full-Face, Half-Face, Modular, Off-Road | N/A | Safety helmet only
Wearing configurations | Correct and incorrect wearing positions in different environments | 2 classes: with or without helmet | With helmet
Environmental conditions | Diverse (daytime and nighttime) | N/A | N/A
Device used | iPhone 13 and Mi10T | Unknown | Unknown

4. Experimental Design, Materials and Methods

Images of individuals wearing four helmet types, namely Full-Face, Half-Face, Modular, and Off-Road helmets, were captured on the premises of Vishwakarma University (18°27′38.4″N, 73°53′01.2″E) under diverse environmental conditions.

4.1. Image creation and processing

To create this dataset, we chose four helmet types that are among the most widely used worldwide while riding: Full-Face, Half-Face, Modular, and Off-Road helmets. The images were captured with two mobile phones, an iPhone 13 and a Mi10T; device details are given in Table 3.

Table 3.

Mobile device and image details of helmet dataset.

Details | Mi10T | iPhone 13
Phone type | Smartphone | Smartphone
Operating system | Android | iOS
Company name | Xiaomi | Apple
Version (model number) | M2007J3SP | A2633
Camera | 64 MP | 12 MP
Images captured | 1176 | 6036
Total original images | 7212 (both devices)

The image capturing process, during which the photographed author wore a cotton mask to obscure facial features, is illustrated in Fig. 1.

Fig. 1.

Illustrative representation of the image capturing process.

Following capture, the images were preprocessed for subsequent analysis using IrfanView: they were renamed according to a standard scheme and resized to consistent dimensions (Fig. 2).

Fig. 2.

Image pre-processing method.
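The article reports that renaming and resizing were done with IrfanView. As a scriptable alternative, the sketch below performs the same two steps with the Pillow library (assumed to be installed). Only the 768 × 576 target size comes from the article; the naming pattern `<prefix>_<index>.jpg`, the function name `preprocess_folder`, and the accepted input extensions are hypothetical.

```python
from pathlib import Path

from PIL import Image

TARGET_SIZE = (768, 576)  # (width, height) reported in the article
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed input formats


def preprocess_folder(src_dir, dst_dir, prefix):
    """Resize every image in src_dir to TARGET_SIZE and save it under a
    sequential name in dst_dir. Returns the list of written paths.

    The "<prefix>_<00001>.jpg" naming scheme is a hypothetical example.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Sort so that the sequential numbering is deterministic.
    files = sorted(p for p in src.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    written = []
    for index, path in enumerate(files, start=1):
        with Image.open(path) as img:
            # JPEG has no alpha channel, so normalize to RGB before saving.
            resized = img.convert("RGB").resize(TARGET_SIZE)
        out_path = dst / f"{prefix}_{index:05d}.jpg"
        resized.save(out_path, "JPEG")
        written.append(out_path)
    return written
```

Since 768 × 576 has a 4:3 aspect ratio, a plain resize keeps proportions for 4:3 source photos; photos with a different aspect ratio would be stretched, so they would need cropping or padding first.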

The preprocessing also included image augmentation to balance the dataset. The ratio of original to augmented images is 1:3, that is, each original image has three augmented versions. This ratio matters for assessing the dataset's balance and the training dynamics of machine learning models: deriving several varied training examples from each original image can help models generalize across different scenarios. Representative augmented images are shown in Fig. 3.

Fig. 3.

Augmented sample images used to balance the dataset.
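The article does not list the augmentation operations behind the 1:3 ratio. The sketch below shows one possible way to derive three augmented versions from each original image with NumPy. The three operations (horizontal flip, brightness scaling, additive Gaussian noise), their parameters, and the function name `augment_three` are assumptions. All three preserve the image size, so augmented images stay consistent with the 768 × 576 originals.

```python
import numpy as np


def augment_three(image, seed=0):
    """Return three augmented copies of an H x W x 3 uint8 image.

    The operations below are illustrative assumptions; the article does not
    specify which augmentations were used.
    """
    rng = np.random.default_rng(seed)
    img = image.astype(np.float32)
    flipped = img[:, ::-1, :]                        # mirror left-right
    brighter = img * rng.uniform(0.7, 1.3)           # global brightness scaling
    noisy = img + rng.normal(0.0, 10.0, img.shape)   # additive Gaussian noise
    # Clip back to the valid pixel range and restore the uint8 dtype.
    return [np.clip(a, 0, 255).astype(np.uint8) for a in (flipped, brighter, noisy)]
```

Passing a different `seed` per original image varies the brightness factor and noise pattern across the dataset while keeping each run reproducible.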

Following this preprocessing stage, the images were organized into the "Correct way" and "Incorrect way" folders, each with one subfolder per helmet type, as described in Section 3.

4.2. Dataset utilization

We developed a helmet detection system based on pre-trained convolutional neural network (CNN) models, specifically VGG19, ResNet50, and MobileNetV2. The dataset was organized into two folders, "correct_way" and "incorrect_way", with a combined total of 28,736 images equally distributed between the two classes. We split the dataset into a training set (70% of the images) and a testing set (the remaining 30%). During training, we fine-tuned the pre-trained models on the training subset to improve their ability to classify helmet images. Through transfer learning, knowledge learned from large-scale datasets such as ImageNet is carried over to the specific task of helmet detection, improving the models' ability to distinguish proper from improper helmet usage. After training, we evaluated the models on the held-out testing set using accuracy, precision, recall, and F1 score. Such models could support automated safety monitoring across diverse scenarios and help enforce compliance with helmet regulations.
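The article does not state how the 70/30 split was drawn. A minimal sketch of one common choice, a stratified random split that keeps both classes in the same proportion in each subset, is shown below. The function name, the `(path, label)` record format, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(records, train_frac=0.7, seed=42):
    """Split (path, label) pairs into train/test lists, separately per label.

    Splitting within each label keeps the class ratio the same in both sets.
    """
    by_label = defaultdict(list)
    for path, label in records:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        cut = round(len(items) * train_frac)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test
```

Because three of every four images are augmented copies of an original, an image-level split can place an original and its augmented versions in different subsets, which tends to inflate test scores. Grouping each original with its augmented copies before splitting avoids this leakage, but it requires knowing which original each augmented image came from.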

Table 4 presents the confusion matrices of three pre-trained CNN models, MobileNetV2, ResNet50, and VGG19, on helmet detection before and after training on our dataset. The comparison shows the improvement achieved through training, indicating that the models classify helmet images effectively after training.

Table 4.

Confusion matrices of pre-trained machine learning models on the helmet dataset, before and after training with our dataset.

Model | Before training | After training
MobileNetV2 | [confusion matrix image] | [confusion matrix image]
ResNet50 | [confusion matrix image] | [confusion matrix image]
VGG19 | [confusion matrix image] | [confusion matrix image]
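The evaluation metrics named in Section 4.2 follow directly from a binary confusion matrix such as those in Table 4. The sketch below computes them; the function name is illustrative, which class counts as positive is left to the user, and the counts used in the example are hypothetical, not values from Table 4.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from a 2 x 2 confusion matrix.

    The positive class is whichever class tp/fn refer to (for example
    "incorrect_way"). Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For instance, with hypothetical counts tp = 90, fp = 10, fn = 5, and tn = 95, accuracy is 0.925, precision is 0.9, recall is about 0.947, and F1 is about 0.923.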

Future Work

As future work, we plan to integrate GAN-based augmentation techniques into our dataset enhancement process. This will involve generating synthetic images that depict various helmet types and wearing configurations under different environmental conditions.

Limitations

NA

Ethics Statement

One of the authors, Gaurav Khare, appears in the dataset images. Privacy was maintained during image capture by using cotton masks to obscure facial features. The authors declare no conflicts of interest. This research did not involve animal or human studies and did not harm any living organism.

CRediT authorship contribution statement

Kailas Patil: Conceptualization, Supervision, Writing – review & editing. Rohini Jadhav: Writing – review & editing. Yogesh Suryawanshi: Conceptualization, Writing – review & editing. Prawit Chumchu: Writing – review & editing. Gaurav Khare: Methodology. Tanishk Shinde: Methodology.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Kailas Patil, Email: kailas@eng.src.ku.ac.th, kailas.patil@vupune.ac.in.

Prawit Chumchu, Email: prawit@eng.src.ku.ac.th.

References

  • 1. Thompson D.C., Rivara F., Thompson R., Cochrane Injuries Group. Helmets for preventing head and facial injuries in bicyclists. Cochrane Database Syst. Rev. 1999;4. doi: 10.1002/14651858.CD001855.
  • 2. Houston D.J., Richardson L.E. Motorcyclist fatality rates and mandatory helmet-use laws. Accid. Anal. Prev. 2008;40(1):200–208. doi: 10.1016/j.aap.2007.05.005.
  • 3. Rice T.M., Troszak L., Erhardt T., Trent R.B., Zhu M. Novelty helmet use and motorcycle rider fatality. Accid. Anal. Prev. 2017;103:123–128. doi: 10.1016/j.aap.2017.04.002.
  • 4. Suryawanshi Y., Patil K., Chumchu P. VegNet: dataset of vegetable quality images for machine learning applications. Data Br. 2022;45:108657. doi: 10.1016/j.dib.2022.108657.
  • 5. Suryawanshi Y., Gunjal N., Kanorewala B., Patil K. Yoga dataset: a resource for computer vision-based analysis of Yoga asanas. Data Br. 2023;48:109257. doi: 10.1016/j.dib.2023.109257.
  • 6. Thite S., Suryawanshi Y., Patil K., Chumchu P. Coconut (Cocos nucifera) tree disease dataset: a dataset for disease detection and classification for machine learning applications. Data Br. 2023;51:109690. doi: 10.1016/j.dib.2023.109690.
  • 7. Thite S., Suryawanshi Y., Patil K., Chumchu P. Sugarcane leaf dataset: a dataset for disease detection and classification for machine learning applications. Data Br. 2024;53:110268. doi: 10.1016/j.dib.2024.110268.
  • 8. Meshram V., Suryawanshi Y., Meshram V., Patil K. Addressing misclassification in deep learning: a merged Net approach. Softw. Impacts. 2023;17:100525. doi: 10.1016/j.simpa.2023.100525.
  • 9. Jadhav R., Suryawanshi Y., Bedmutha Y., Patil K., Chumchu P. Mint leaves: dried, fresh, and spoiled dataset for condition analysis and machine learning applications. Data Br. 2023;51:109717. doi: 10.1016/j.dib.2023.109717.
  • 10. Meshram V., Meshram V., Patil K., Suryawanshi Y., Chumchu P. A comprehensive dataset of damaged banknotes in Indian currency (Rupees) for analysis and classification. Data Br. 2023;51. doi: 10.1016/j.dib.2023.109699.
  • 11. Mvd A. Helmet Detection [Data set]. Kaggle. 2020. https://www.kaggle.com/datasets/andrewmvd/helmet-detection/data.
  • 12. Mvd A. Hard Hat Detection [Data set]. Kaggle. 2020. https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection.
  • 13. Khare G., Suryawanshi Y., Patil K. Helmet wearing image dataset. Mendeley Data. 2023;V2. doi: 10.17632/tm72fkfxd5.2.
