2022 Sep 30;82(9):14135–14152. doi: 10.1007/s11042-022-13913-w

Face mask detection and social distance monitoring system for COVID-19 pandemic

Iram Javed 1, Muhammad Atif Butt 2, Samina Khalid 1, Tehmina Shehryar 3, Rashid Amin 4, Adeel Muzaffar Syed 5, Marium Sadiq 1
PMCID: PMC9522539  PMID: 36196269

Abstract

Coronavirus causes several respiratory illnesses, with symptoms such as sneezing, coughing, and pneumonia, and is transmitted from human to human through airborne droplets. According to the guidelines of the World Health Organization, the spread of COVID-19 can be mitigated by avoiding close public interactions and following standard operating procedures (SOPs), including wearing a face mask and maintaining social distancing in schools, shopping malls, and crowded areas. However, enforcing the adoption of these SOPs on a larger scale is still a challenging task. With the emergence of deep learning-based visual object detection networks, numerous methods have been proposed to perform face mask detection in public spots. However, these methods require a huge amount of data to ensure robustness in real-time applications. Also, to the best of our knowledge, there is no standard outdoor surveillance-based dataset available to ensure the efficacy of face mask detection and social distancing methods in public spots. To this end, we present a large-scale dataset comprising 10,000 outdoor images categorized into two classes, i.e., people wearing a face mask and people not wearing one, to accelerate the development of automated face mask detection and social distance measurement in public spots. Alongside, we also present an end-to-end pipeline to perform real-time face mask detection and social distance measurement in an outdoor environment. Initially, existing state-of-the-art single and multi-stage object detection networks are fine-tuned on the proposed dataset to evaluate their performance in terms of accuracy and inference time. Based on its better performance, the YOLO-v3 architecture is further optimized by tuning its feature extraction and region proposal generation layers to improve the performance in real-time applications. Our results indicate that the presented pipeline performed better than the baseline version, showing an improvement of 5.3% in terms of accuracy.

Keywords: Face mask detection, Social distance measurement, Single and multi-stage detectors, Coronavirus

Introduction

Coronavirus broke out at the end of 2019, and it is still wreaking havoc on the livelihoods and businesses of millions of people around the world [13]. As the world has started recovering from the pandemic, people intend to return to the regular life they had before it. However, there is growing unease among people about getting back to their normal routine because this virus spreads through droplets of saliva from an infected person, which can affect people within a range of approximately 6 feet. The main symptoms of this infection are fever, headache, cough, respiratory difficulties, and loss of taste and smell, and the infection can lead to the death of the infected person [41]. The incidence rate of COVID-19 is higher than that of other acute respiratory diseases such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS).

To prevent the spread of this deadly virus, the World Health Organization (WHO) [35] issued guidelines and SOPs such as wearing a face mask and maintaining social distance in public spots. In this regard, several research studies also reported that maintaining distance during physical interaction between people can prevent the spread of most respiratory diseases [21]. Tangana et al. [1] presented a mathematical model to demonstrate the impact of physical distance during interaction on the likelihood of virus transmission among people. In another study [15], it is demonstrated that wearing a face mask is highly effective in mitigating the reproduction of coronavirus. However, manual monitoring and enforcement of the aforementioned SOPs in public places such as schools, universities, shopping malls, and parks is quite a challenging task.

In step with the rapid advancement in Artificial Intelligence (AI), and Deep Learning in particular, the computer vision community has contributed various state-of-the-art methods for intelligent surveillance [65], object detection [6] and recognition [5, 7], and scene understanding [46]. These methods can be employed to develop an intelligent monitoring system for face mask detection and social distance measurement in public places. However, there are two main challenges in this direction. Firstly, to the best of our knowledge, there is no standard South Asian benchmark available to evaluate facial mask detection and social distance measurement methods. Secondly, there is no pipeline available for the development of an end-to-end real-time intelligent monitoring system for facial mask detection and social distance measurement. It is important to mention that several research studies have employed standard single- and multi-stage object detectors such as Faster-RCNN, SSD, and Retina-Net to perform face mask detection [17]. However, these methods do not consider social distance measurement, which makes them insufficient for deployment in actual public places.

To address the aforementioned shortcomings of existing state-of-the-art methods, we make the following contributions in this paper.

  1. A local dataset containing 10,000 images based on two classes (i.e., masked face and unmasked face) has been collected from public places. It is worth noting that these classes are unique in orientation and dress codes, which are not covered in the existing datasets.

  2. Existing state-of-the-art single and multi-stage object detectors are fine-tuned on the proposed dataset. Based on the analysis, an improved YOLO-v3 based object detection architecture is presented to enhance robustness of real-time surveillance systems.

  3. Alongside, a machine-vision based distance measurement method has been proposed to ensure social distancing in public places.

  4. Lastly, an extensive comparative study has been carried out between state-of-the-art face mask detection methods and the proposed method to demonstrate the effectiveness of our proposed method in terms of detection and recognition accuracy and inference time.

The rest of the paper is organized as follows. In Section 2, we briefly discuss existing state-of-the-art facial mask detection and social distance measurement methods, along with the available datasets. In Section 3, we present a detailed overview of our proposed end-to-end pipeline for face mask detection and social distance measurement. The experimental results are presented in Section 4. Finally, the paper is concluded in Section 5.

Related work

Real-time object detection and recognition methods can play an important role in developing intelligent monitoring methods for face mask detection and social distancing measurement to prevent coronavirus transmission. In this section, we analyze the existing state-of-the-art methods employed in developing an intelligent monitoring system for face mask detection and social distancing measurement, which include: (i) single- and multi-stage detection methods for masked and non-masked face detection, (ii) available datasets for developing generalized face detection systems, and (iii) social distance measurement methods.

Facial mask detection

In the majority of existing research works, the researchers focused on face construction and identity recognition while wearing face masks. However, the aim of this study is to identify the human face in both states, wearing the mask or not wearing the mask, in order to assist in reducing COVID-19 transmission and spread. In recent studies, researchers have demonstrated that wearing face masks minimizes the rate of COVID-19 spread as it can interrupt airborne germs effectively [38]. However, monitoring people in public places is still a challenging task. In this regard, Zhang et al. [62] proposed a single-shot refinement face detector, namely RefineFace, to detect people not wearing a face mask. In another research work, Jagadeeswari et al. [19] proposed an SSD-based face mask detection method for an outdoor environment. Khandelwal et al. [22] presented a deep learning approach for classifying human faces with and without masks. Onyema et al. [40] proposed a method for facial expression recognition based on a convolutional neural network. Hussain et al. [16] proposed a deep learning based IoT system to detect face masks using a transfer learning approach.

The aforementioned approaches achieved good accuracy on their respective test data. However, real-time face mask detection is still a critical challenge for system developers. In this regard, Snyder et al. [56] introduced a deep learning based approach for mask detection to prevent COVID-19 transmission. Kodali et al. [23] presented a custom CNN-based model to detect faces wearing a mask in public spots. Similarly, Sagayam et al. [54] proposed a deep neural network based method for binary class (i.e., masked and non-masked) face state recognition. Degadwala et al. [9] proposed a YOLO-v4 based face detection method which has been trained and tested on the WIDER-FACE and MAFA datasets. Likewise, Taneja et al. [58] presented a facial mask detection system with the MobileNetV2 lightweight CNN and achieved 99.98% accuracy. On the other hand, Sethi et al. [55] aimed to detect masks using ResNet-50. Their model gives 11.07% higher precision and 6.44% higher recall than the RetinaFaceMask detector model.

In another research work, Loey et al. [30] presented a multi-stage detection method for detecting faces with or without a mask. Alongside, an ensemble method was combined with a deep learning model to detect face masks using real-world and synthetic data in order to improve the generalizability of machine learning models. These research works are discussed along with their strengths and limitations in Table 1. To this end, we conclude that the above-discussed face mask detection systems encounter several constraints at the development and deployment level, such as diverse types of face masks, face orientation, and illumination conditions [52]. Further constraints include balancing detection accuracy against real-time requirements and placing the detector on systems with limited computing capacity. In the context of the epidemic, facial mask detection in images, videos, and closed-circuit television (CCTV) footage has still not been fully explored as a means to control the transmission chain of the virus [37].

Table 1.

An Overview of Existing Machine Learning Methods Used for Face Mask Detection and Recognition Tasks

Author | Methods | Dataset | Accuracy | Limitation
Roy et al. [52] | MOXA included YOLO-v3, Tiny YOLO-v3, SSD and Faster R-CNN | Kaggle's medical masks dataset, 3000 images | YOLO-v3: 63.99% mAP, Tiny YOLOv3: 56.27% mAP, SSD: 46.52% mAP, and F-RCNN: 60.5% mAP | Unmanned approach MOXA requires improvement including more innovative object detectors
Nagrath et al. [37] | Single shot multibox object detection model and MobileNetV2 | Kaggle's medical masks and PyImage search dataset contains 1,376 images | The SSDMNV2 model attained 92.64% accuracy | SSDMNV2 was trained on artificially produced images, still not tested in real situations as well as with real-time CCTV
Hussain et al. [16] | Transfer learning with CNN, VGG-16, MobileNetV2, ResNet-50, Inceptionv3 | MAFA dataset, Masked Face-Net, and Bing dataset | Using VGG-16 achieved 99.81% accuracy and with MobileNetV2 attained 99.6% accuracy | Online accessible dataset contain noisy and construct by artificially, which is not suitable for real time system
Snyder et al. [56] | ResNet-50 with FPN and Multi-Task CNN | MCelebFaces Attributes, Microsoft Common Objects in Context, WIDER FACE dataset and Custom Mask Community Dataset | 87.7% detection accuracy | Incorrectly identify faces with mask and without mask
Kodali et al. [23] | CNN model | Kaggle dataset with 853 images | 96% detection accuracy | Incorrectly identify faces with mask and without mask
Sagayam et al. [54] | OpenCV and MobileNet-V2 used to detect face mask | Kaggle's medical masks and PyImage search | 99% accuracy achieved by MobileNet-V2 | Trained on limited dataset which is not perform well in real time situation
Degadwala et al. [9] | YOLO-v4 | MAFA and WIDER-FACE dataset | 99.98% accuracy obtained by YOLO-v4 | Have need of more computational power and require 30FPS camera resolution rate
Taneja et al. [58] | MobileNet-V2 lightweight CNN model used to detect face mask | Medical Masks Dataset and the Face Mask Dataset | 99.98% accuracy | Performance of MobileNet-V2 is not accurate as compared to Faster R-CNN and Inception-V2
Chadav et al. [8] | Multi-stage CNN model | Kaggle with 853 images | 98% accuracy | Dual-stage CNN model do not detect side views of the face
Bhuiyan et al. [3] | YOLO-v3 | Google colab datasets having 650 images | 96% accuracy | Limited dataset used and cannot test on real time condition
Ejaz et al. [11] | Using Principal Component Analysis (PCA) recognition faces with masks and without the mask | ORL face dataset is used for masked faces containing 500 images | Attain accuracy for face mask is 72% and without the mask is 95% | PCA gave poor results in mask face, only front side face images use for the dataset
Qin et al. [44] | Classification of facial image with SRCNet and automatic identifies faces wear with mask | Medical Masks dataset having 3835 images | Acquire 98.70% accuracy with image super resolution classification network (SRCNet) | Use of limited number of images
Jiang et al. [20] | Proposed SSD to classify face with FPN | 7959 images collected from internet | Without mask: 89.6% precision, with mask: 91.9% precision | Do not differentiate between mask and unmask face properly
Rahman et al. [45] | Facial mask detection in smart city through CCTV | 1539 images are collected from different sources | 98.7% Achieve accuracy | Confuse system with a hand covered face
Punn et al. [43] | YOLO-v3 used to monitoring real time social distance | 800 images taken from OID dataset | YOLO v3 with deep sort acquire better result as compared to FPS | Privacy issue, do not record violations
Yang et al. [61] | Faster R-CNN and YOLOv4 detects real time social distance and critical density | Taken 12300 images from MS-COCO dataset | Accuracy and performance are good to monitor social distance | Do not record data, crowd analysis still a challenge
Militante et al. [36] | Single shot detector used to detect face mask and physical distance with alarm system | 20000 images collected from web | accuracy rate of 97% | Do not detect face mask and distance at the same time
Yadav et al. [60] | face mask and social distance detection and generate an alert signal with SSD | used custom dataset of 3165 images | obtain accuracy 85% and 95% | N/A

Available datasets

In the context of COVID-19, face datasets have an essential role in training deep models for masked and non-masked face detection. Recently, several datasets have been proposed to accelerate research in this direction. In this regard, Ge et al. [12] proposed the MAFA dataset, which contains 30811 images collected from the Internet. These images show distinct types of masks with several degrees of occlusion and orientations. Furthermore, Laxel [33] introduced the Face Mask Dataset (FMA), which holds 853 images with three classes collected from Kaggle. Another extended version of the Kaggle dataset, proposed by Wobot [18] and also denoted as FMA, contains 6024 images with 20 classes. Rahmani et al. [45] proposed the Medical Mask Dataset (MMD). The MMD dataset consists of 9067 images with three classes and is used to detect medical masks only. On the other hand, Wang et al. [59] proposed large-scale datasets of masked faces for detection and recognition: the Masked-Face Detection Dataset (MFDD), the Real-world Masked-Face Recognition Dataset (RMFRD), and the Simulated Masked-Face Recognition Dataset (SMFRD). MFDD contains 24771 masked face images that were collected from the internet. RMFRD has 2203 masked face images and 90,000 images without masks. SMFRD includes 50000 images. With the multi-granularity model, 95% accuracy is achieved on these datasets. We do not collect any images from the existing available datasets. Instead, we build a challenging dataset to perform experiments on existing object detectors.

Social distance measurement

Social distancing is a significant safety measure to control the spread of COVID-19. Computer vision applications have shown good applicability in detection [47] and emotion-enabled cognition tasks in real-time environments [4]. In this regard, dimensionality reduction with Matrix Factorization (MF) has provided a valuable framework for treatment against COVID-19 [34]. Additionally, feature selection and prognosis classification have been used to develop machine learning based intelligent systems for COVID-19 disease [53]. Spectral clustering [2] and gene selection techniques [51] have been presented that map data to a low-dimensional space by merging node centrality and community detection. The increasing spread of COVID-19 also caused serious disruption to global education systems. During school closures, computer science innovations have been useful and convenient for teaching as well as learning [10]. Prem et al. [42] used the susceptible-exposed-infected-removed (SEIR) model to study the effects of social distancing on the spread of the virus. Levchev et al. [26] studied a database configuration for multiple sensor technologies such as cameras, LiDAR, inertial gyroscopes, wireless sensors, and additional sensors used as data acquisition stages. Liang et al. [27] utilized various sensors to obtain image information and geographic location information at the same time, and built an indoor 3D map using geographic coordinates. Niu et al. [39] addressed the social distancing problem in a 3D view by using monocular cameras for pedestrian 3D localization. Furthermore, Magoo et al. [31] set up a bird's eye view framework with the YOLO-v3 model to monitor social distance in public areas. Although the research community has contributed several social distance measurement methods, deployment of such systems in real-world environments is still a challenging task.

The method

To address the above-mentioned issues, we propose a novel pipeline for developing an end-to-end face mask detection method to monitor public spots in order to mitigate the spread of COVID-19, as shown in Fig. 1. Firstly, we present the large-scale MUST Face Dataset (MFD), containing 10,000 images along with binary class bounding box annotations, i.e., face wearing a mask and face not wearing a mask. Alongside, we analyzed the existing state-of-the-art single-stage and multi-stage object detectors on our proposed dataset. Specifically, we fine-tuned the existing YOLO-v3 [49], SSD [63], RetinaNet-50 [28], Fast-RCNN [50], Faster R-CNN (FPN) [32], Faster-RCNN (ResNet-50) [25] and Faster-RCNN (ResNet-101) [29] on our proposed dataset through transfer learning. Based on its better performance, we further improved the YOLO-v3 architecture to make it more robust in outdoor environments. On top of our face detector, we employed our proposed social distance measurement method, which takes input from the face detector and computes the distance between two people to mitigate the spread of COVID-19 in public spots.

Fig. 1.

The Proposed Pipeline For Developing Face Mask Detection And Social Distance Measurement in Public Places

MUST face dataset

To this end, we collect and release the MUST Face Dataset (MFD), a large-scale dataset to accelerate the development of generalized methods for end-to-end face mask detection in public places. Our MFD contains 10,000 images along with binary class (i.e., masked face, non-masked face) bounding box annotations. The proposed dataset is generated from video sequences captured by surveillance cameras installed outside the departmental buildings. The installed cameras are mounted at heights in the range of 12 feet to 15 feet from the ground. After video sequence collection, the crowded frames are manually extracted while ensuring quality control parameters such as the positioning of the people and the clarity of the images. It is important to mention that we complied with the regulatory bodies and collected the data from permitted areas. To protect privacy, we do not disclose or release personal identities, geo-location, or information based on the incoming and outgoing patterns of people.

After completing frame extraction, and considering the use-case of our proposed method, we defined two classes for annotation, i.e., masked face and non-masked face. For this purpose, we employed the LabelImg annotation tool to label the human faces according to these classes. One of the reasons for using manual annotation instead of automated labeling is to maintain the accuracy of the ground truth coordinates, which plays an important role in training a robust face detection model. All the annotations are cross-validated by a team of experts to ensure the quality of the ground truth. Some samples from our dataset are shown in Fig. 2.

Fig. 2.

Sample Images From Our MUST Face Dataset

Suitable face detection method selection

Recently, deep object detection methods have demonstrated strong applicability in various real-time object detection and recognition tasks [24]. To select a suitable deep learning object detector, we first fine-tuned the existing state-of-the-art single-stage and multi-stage detection methods, including YOLO-v3 [49], SSD [63], RetinaNet-50 [28], Fast-RCNN [50], Faster R-CNN (FPN) [32], Faster-RCNN (ResNet-50) [25] and Faster-RCNN (ResNet-101) [29], on our proposed MFD through transfer learning. The results show that the existing YOLO-v3 outperformed the other employed detection methods in terms of inference time and mean accuracy. Based on its better performance, we further improved the YOLO-v3 architecture to make it more robust in outdoor environments.

Proposed facial mask detection architecture

In the proposed framework, we have employed the YOLO-v3 architecture to perform facial mask detection in real time. YOLO-v3, proposed by Joseph Redmon and Ali Farhadi in 2018 [48], is one of the most outstanding deep learning object detectors and has demonstrated consistent performance for object detection and recognition tasks. One of the main issues in earlier detection networks was the vanishing gradient problem, which commonly occurs as the number of network layers increases. Therefore, the multi-scale YOLO-v3 includes residual connections, which join the input from the previous layer to the output of the next layer, similar to the ResNet architecture. As a result, YOLO-v3 achieves good performance even on low-resolution images due to its multi-scale feature extraction. To this end, we employed the existing YOLO-v3 architecture and applied k-means clustering to obtain 9 anchor boxes, which are then divided across three detection scales to obtain more bounding boxes per image than the baseline version.
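
As a minimal sketch of the anchor selection step described above, the following Python code clusters bounding-box widths and heights into 9 anchors with k-means, using 1 - IoU as the distance, and then splits the area-sorted anchors into three groups of three. The function names, the random initialization, the seed, and the synthetic boxes are assumptions made for illustration; the text does not specify these details.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), both placed at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors; distance is 1 - IoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # assumed initialization: k random boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as lowest 1 - IoU distance.
            best = max(range(k), key=lambda j: iou_wh(box, anchors[j]))
            clusters[best].append(box)
        new_anchors = []
        for j, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_anchors.append((w, h))
            else:
                new_anchors.append(anchors[j])  # keep anchor of an empty cluster
        if new_anchors == anchors:
            break  # assignments no longer change the anchors
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


def split_by_scale(anchors):
    """Split 9 area-sorted anchors into small, medium, and large groups."""
    return anchors[0:3], anchors[3:6], anchors[6:9]


# Usage with synthetic (width, height) pairs in pixels (illustrative data only).
rng = random.Random(1)
boxes = [(rng.uniform(8, 120), rng.uniform(8, 120)) for _ in range(200)]
small, medium, large = split_by_scale(kmeans_anchors(boxes))
```

In practice the widths and heights would come from the ground-truth annotations of the training set, rescaled to the network input size.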

The input layer takes an RGB image with a size of 416x416 pixels. As a backbone network, we employed DarkNet-53 to achieve a high number of floating-point operations per second. The internal structure of the model is a fully convolutional network that does not contain max-pooling layers. As depicted in Fig. 1, the network contains convolution blocks, residual blocks, and scale output layers. In a convolution block, strided convolutions are used instead of max pooling to reduce the size of the input; each convolution is followed by batch normalization and ReLU activation. A residual block, on the other hand, consists of two convolution blocks with different kernel sizes and is referred to as a mega-block. In the existing YOLO-v3 architecture, the convolution blocks iterate 1x, 2x, 4x, and 8x. However, considering the use-case of our application, we reduced the iterations of the convolution blocks to 1x, 2x, and 4x in order to improve the learning performance and inference time. At the bottom of the architecture, an average pooling layer, followed by a fully connected layer and softmax activation, is employed to down-sample the feature map and obtain the binary class output probability, respectively. To improve the learning process, we applied transfer learning, which utilizes the stored knowledge of a neural network to perform new tasks by simply learning new weights. The ultimate aim of employing this technique is to speed up the learning process.
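
To illustrate how strided convolutions reduce the input size without max pooling, the short sketch below applies the standard convolution output-size formula to a 416-pixel input. The 3x3 kernel, stride of 2, padding of 1, and the count of five downsampling steps are assumptions taken from the standard DarkNet-53 design, not values stated for the modified network.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_chain(size, steps):
    """Sizes obtained by repeatedly applying a stride-2 convolution."""
    sizes = [size]
    for _ in range(steps):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


# Five stride-2 convolutions (assumed) shrink each side of the image by 32x.
sizes = downsample_chain(416, 5)
print(sizes)  # [416, 208, 104, 52, 26, 13]
```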

Social distance measurement methods

With the recent advancement in the field of AI, computer vision based methods have demonstrated good applicability in several applications such as scene understanding, object recognition, and speed and distance estimation [14]. Some research used the proportional-integral-derivative (PID) approach [57] because of its simplicity, even though its performance is not optimal; it is suitable for distance measurement and consumes less power and memory. Zhang et al. [64] proposed a distance estimation method to localize an object in the camera coordinate frame. Their method contains three steps. The first step is camera calibration, the second step constructs a model for distance measurement between the camera coordinate frame and its projection frame, and the third step performs absolute distance estimation.

The distance is computed with respect to the pivot point of the bounding box, known as the centroid, which is calculated using (1).

$$C(x, y) = \left(\frac{x_{\min} + x_{\max}}{2},\; \frac{y_{\min} + y_{\max}}{2}\right) \tag{1}$$

In (1), C denotes the centroid, x_min and x_max denote the minimum and maximum extent of the bounding box along its width, and y_min and y_max denote the minimum and maximum extent of the bounding box along its height. After the centroids are calculated, the Euclidean distance formula is used to measure the distance between centroids, as shown in (2), and the result is compared with the ground truth value.

$$D(C_2(x, y), C_1(x, y)) = \sqrt{(x_{\min} - x_{\max})^2 + (y_{\min} - y_{\max})^2} \tag{2}$$
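
The centroid and distance computations in (1) and (2) can be sketched as follows. This is an illustrative sketch that assumes each bounding box is given as (x_min, y_min, x_max, y_max) in pixels and that the distance in (2) is taken between the coordinates of two centroids; the function names are hypothetical. Converting the pixel distance to feet would require a camera calibration step that is not shown here.

```python
import math
from itertools import combinations


def centroid(box):
    """Centroid of a box (x_min, y_min, x_max, y_max), as in equation (1)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def pairwise_distances(boxes):
    """Euclidean distance (in pixels) between the centroids of every pair of boxes.

    Each box gets an integer ID equal to its index; the result maps
    (id_a, id_b) to the distance between the two centroids.
    """
    centroids = {i: centroid(box) for i, box in enumerate(boxes)}
    distances = {}
    for a, b in combinations(centroids, 2):
        dx = centroids[a][0] - centroids[b][0]
        dy = centroids[a][1] - centroids[b][1]
        distances[(a, b)] = math.hypot(dx, dy)
    return distances


# Two example boxes with centroids (5, 5) and (35, 45).
example = pairwise_distances([(0, 0, 10, 10), (30, 40, 40, 50)])
print(example)  # {(0, 1): 50.0}
```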

After calculating the centroid of each bounding box, a unique ID is assigned to each centroid. In the next step, the distance between every pair of detected centroids is computed using the Euclidean distance. To validate the correctness, the Root Mean Square Error (RMSE), given in (3), is used to estimate the error between the actual value and the predicted value of the model.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(\mathrm{Predicted\ value}_i - \mathrm{Actual\ value}_i\right)^2}{N}} \tag{3}$$
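
For reference, the RMSE in (3) can be computed as in the sketch below, which uses the standard definition (square root of the mean squared difference). The function name and the sample values are illustrative and are not measurements from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Illustrative values only: predicted vs. measured distances in feet.
error = rmse([6.0, 8.0], [3.0, 4.0])
```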

Proposed algorithm for real-time face mask detection

Here we present a novel algorithm, depicted in Algorithm 1, for developing and deploying an end-to-end face mask detection and social distance monitoring system in public spots.

In the first step, visual frames are obtained from the real-time camera stream and passed to our face mask detection method for inference. Our proposed method analyzes each frame; if no face is detected, our network returns null. If faces are detected, the method classifies each face and also computes the distance between faces using our proposed method. The precautionary measures for face masks and social distance follow the discussion in Section 2. The following scenarios are handled: if a person wears a mask and the distance is greater than 6 feet, no action is performed. When a person is not wearing a mask and the social distance is greater than 6 feet, a high alert is raised. On the other hand, when a person wears a mask and the social distance is less than 6 feet, an alarm is again generated, i.e., a warning is issued for a masked person who does not maintain social distance.

Algorithm 1.

Real-time surveillance Pro.
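
The decision scenarios described in the text can be sketched as a small rule function. This is a hedged illustration rather than the authors' Algorithm 1: the function names and action labels are hypothetical, a distance of exactly 6 feet is treated as too close, and the case of an unmasked person closer than 6 feet, which the text does not spell out, is assumed to raise a high alert.

```python
SAFE_DISTANCE_FEET = 6.0  # threshold taken from the text


def decide_action(wearing_mask, distance_feet):
    """Map one detected face to an action label (labels are hypothetical)."""
    far_enough = distance_feet > SAFE_DISTANCE_FEET
    if wearing_mask and far_enough:
        return "no_action"
    if not wearing_mask and far_enough:
        return "high_alert"
    if wearing_mask and not far_enough:
        return "alarm"
    # Assumption: unmasked and too close is not described in the text;
    # it is treated here as a high alert.
    return "high_alert"


def monitor_frame(faces):
    """Process one frame given as a list of (wearing_mask, distance_feet) pairs.

    Returns None when no face is detected, mirroring the null result
    described in the text.
    """
    if not faces:
        return None
    return [decide_action(mask, dist) for mask, dist in faces]


actions = monitor_frame([(True, 10.0), (True, 2.0), (False, 9.0)])
print(actions)  # ['no_action', 'alarm', 'high_alert']
```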

Experiments and results

In this section, we evaluate the effectiveness of the proposed mask/non-mask face detection method and present a comparison with current state-of-the-art techniques. The experiments are conducted on a computer running a 64-bit version of Windows 10 with an RTX 2080 Ti graphics card (11 GB DDR5 GPU memory), a Core i9-9900K CPU, and 32 GB of RAM.

Training setup

The training process of the proposed pipeline is divided into three fundamental steps: data pre-processing, model training, and model evaluation. Firstly, the whole dataset is randomly split into training, validation, and test sets with an 80:10:10 percent ratio, and the input size is normalized to a resolution of 416x416 pixels. In the next step, the PyTorch library is used for the implementation of the proposed pipeline. Moreover, the experiments are categorized into three phases, i.e., (i) evaluation of the existing state-of-the-art object detection networks on the proposed dataset, (ii) evaluation of the improved YOLO-v3 network on the proposed dataset, and (iii) evaluation of the proposed distance measurement method.
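
A random 80:10:10 split of the kind described above can be sketched as follows. The function name, the fixed seed, and the use of integer image indices in place of file paths are assumptions for illustration.

```python
import random


def split_dataset(items, percents=(80, 10, 10), seed=42):
    """Shuffle items and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = len(items) * percents[0] // 100
    n_val = len(items) * percents[1] // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# 10,000 image indices stand in for the dataset's image files.
train, val, test = split_dataset(range(10000))
print(len(train), len(val), len(test))  # 8000 1000 1000
```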

Evaluation of existing state-of-the-art object detection networks on proposed dataset

The existing state-of-the-art deep object detection models, namely YOLO-v3, SSD, RetinaNet-50, RetinaNet-101, Fast-RCNN, Faster R-CNN (FPN), Faster-RCNN (ResNet-50) and Faster-RCNN (ResNet-101), are fine-tuned on the proposed face mask detection dataset. The PyTorch 1.4.0 library and CUDA version 11.0 are used to configure the training runs. The learning rate, batch size, and number of epochs are set to 0.0001, 32, and 100, respectively, and the stochastic gradient descent optimizer is used to update the model weights. The performance metrics of the employed models are shown in Table 2.

Table 2.

Evaluation of existing state-of-the-art object detection networks on proposed dataset

Method | Mean Accuracy | mAP | mAP @ 0.95 | Inf. Time (ms)
YOLO-V3 | 64.1% | 59.6% | 53.1% | 28
SSD | 61.8% | 56.2% | 48.6% | 34
RETINA-NET 50 | 55.2% | 51.9% | 44.7% | 37
RETINA-NET 101 | 51.0% | 46.3% | 41.8% | 39
FAST-RCNN | 41.7% | 39.4% | 37.1% | 132
FASTER-RCNN (FPN) | 47.3% | 44.0% | 41.5% | 119
FASTER-RCNN (ResNet-50) | 59.0% | 57.4% | 55.6% | 108
FASTER-RCNN (ResNet-101) | 62.7% | 61.3% | 59.0% | 98

It can be seen from Table 2 that single-stage detectors demonstrated better applicability in terms of low inference time due to their less parametric architectures, whereas the multi-stage object detectors are computationally expensive and have significantly higher inference times. It is also important to mention that YOLO-v3 with 53 layers demonstrated better mean accuracy than SSD, RetinaNet-50, RetinaNet-101, Fast-RCNN, Faster R-CNN (FPN), Faster-RCNN (ResNet-50) and Faster-RCNN (ResNet-101). For instance, YOLO-v3 achieved 64.1% mean accuracy, 59.6% mAP, and 53.1% mAP @ 0.95 with 28 ms inference time. Similarly, SSD achieved 61.8% mean accuracy, 56.2% mAP, and 48.6% mAP @ 0.95 with 34 ms inference time. RetinaNet-50 demonstrated 55.2% mean accuracy, 51.9% mAP, and 44.7% mAP @ 0.95 with an inference time of 37 ms on the test set, whereas RetinaNet-101 achieved 51.0% mean accuracy, 46.3% mAP, and 41.8% mAP @ 0.95 with 39 ms inference time, which is comparatively higher than RetinaNet-50. We next analyze the multi-stage object detectors. Fast R-CNN demonstrated 41.7% mean accuracy, 39.4% mAP, and 37.1% mAP @ 0.95 with 132 ms inference time on our test set, which is significantly higher than the employed single-shot detectors. In another experiment, Faster R-CNN based on FPN achieved a mean accuracy of 47.3%, 44.0% mAP, and 41.5% mAP @ 0.95 with an inference time of 119 ms. Faster R-CNN with a ResNet-50 feature extraction network achieved a mean accuracy of 59.0%, 57.4% mAP, and 55.6% mAP @ 0.95 with an inference time of 108 ms. With ResNet-101 as the backbone feature extraction network, Faster-RCNN shows a mean accuracy of 62.7%, 61.3% mAP, and 59.0% mAP @ 0.95 with an inference time of 98 ms. Consequently, it can be assumed that YOLO-v3 with DarkNet-53 can achieve better accuracy after further architectural fine-tuning.

Evaluation of improved YOLO-V3 architecture on proposed dataset

Based on the analysis above, the architecture of YOLO-v3 is further improved by trimming the less-contributing convolutional layers and residual connections. The improved feature extractor, DarkNet, has been evaluated on the proposed dataset. To train the network faster, we employed transfer learning to learn the high-level features from the proposed dataset. In the training setup, we used the SGD optimization algorithm with momentum to train and evaluate the improved network on our proposed dataset for the mask/non-mask face detection task. Well-known performance metrics, namely mean accuracy, mAP, mAP @ 0.95, and inference time, are used to evaluate the performance of our improved mask/non-mask face detection network on our dataset. Mean accuracy is the number of correct predictions divided by the total number of data samples. mAP denotes mean average precision, and mAP @ 0.95 is the mean average precision at an intersection-over-union (IoU) threshold of 0.95. Inference time is the total time taken from receiving an input to producing an output.
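To make the metric definitions above concrete, the following is a minimal Python sketch, not the authors' evaluation code. The box format `(x1, y1, x2, y2)` and the helper names `iou`, `matches_at_threshold`, and `mean_accuracy` are illustrative assumptions. It shows how the intersection over union of two boxes is computed, how a 0.95 threshold decides whether a predicted box counts as a match, and how mean accuracy follows from its definition.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_threshold(pred_box, gt_box, threshold=0.95):
    """True if the predicted box overlaps the ground truth by at least `threshold` IoU."""
    return iou(pred_box, gt_box) >= threshold


def mean_accuracy(predicted_labels, true_labels):
    """Number of correct predictions divided by the total number of samples."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)
```

A threshold of 0.95 is strict: a predicted box must almost coincide with the annotated one, so mAP @ 0.95 is typically lower than mAP at looser thresholds.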

Table 3 shows that our improved YOLO-v3 based detection network outperformed the baseline YOLO-v3 in the mask/non-mask face detection task on our proposed dataset. One of the main reasons for the increase in accuracy is the trimming of less-contributing residual connections, which also accelerated our model relative to the baseline. Some sample results are shown in Fig. 3 to illustrate the effectiveness of our proposed masked/non-masked face detection method.

Table 3.

Evaluation of improved YOLO-v3 on proposed dataset

Method Mean Accuracy mAP mAP @ 0.95 Inf. Time
Existing YOLO-V3 64.1% 59.6% 53.1% 28 ms
Proposed YOLO-V3 69.4% 64.7% 62.0% 25 ms
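
The gains in Table 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only; the dictionary keys and helper names are assumptions introduced here. It indicates that the reported 5.3% improvement in accuracy is an absolute difference in percentage points (69.4 minus 64.1), which corresponds to a relative gain of roughly 8.3% over the baseline.

```python
# Values copied from Table 3 (percentages, and inference time in milliseconds).
baseline = {"mean_accuracy": 64.1, "map": 59.6, "map_095": 53.1, "inference_ms": 28}
proposed = {"mean_accuracy": 69.4, "map": 64.7, "map_095": 62.0, "inference_ms": 25}


def absolute_change(metric):
    """Proposed minus baseline, in the metric's own unit (points or ms)."""
    return round(proposed[metric] - baseline[metric], 1)


def relative_change(metric):
    """Change expressed as a percentage of the baseline value."""
    return round(100.0 * (proposed[metric] - baseline[metric]) / baseline[metric], 1)


for name in baseline:
    print(name, absolute_change(name), relative_change(name))
```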

Fig. 3.

Qualitative examples of our masked/non-masked face detection method on our face mask dataset

Evaluation of proposed distance measurement method

After evaluating our proposed mask/non-mask face detection method, we next evaluated our proposed machine-vision-based distance measurement method for ensuring social distancing in public places. Following standard practice, we used the root mean square error (RMSE) to measure the error of our method against the ground truth. Some quantitative results are shown in Table 4.

Table 4.

Results of proposed distance measurement methods

Sr. No. Ground Truth (ft) Predictions (ft) RMSE
Distance 1 2.44 2.37 0.035
Distance 2 2.99 2.95 0.020
Distance 3 3.16 3.10 0.030

The vision-based system detects each person's face and returns the bounding-box information. It then computes the central point (centroid) of each face bounding box and measures the distance between two centroids using the standard Euclidean distance equation. The error is computed using RMSE, which quantifies the difference between the ground-truth value and the value predicted by the model. For instance, in the Distance 1 sample, the actual distance (ground truth) between two persons is 2.44 feet, whereas our proposed vision-based distance measurement method predicts 2.37 feet, a small error of 0.035 RMSE. In the next sample, Distance 2, the actual distance is 2.99 feet, whereas our model inferred 2.95 feet with an RMSE of 0.020. Similarly, in the Distance 3 sample, the ground-truth value is 3.16 feet, whereas the proposed method predicts 3.10 feet with an error of 0.030, which indicates effective performance on our test set.
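
The centroid and distance steps described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names, and in particular the `pixels_per_foot` calibration factor are assumptions, since the text does not state how pixel distances are converted to feet. The `rmse` helper uses the textbook definition over a set of samples; applied to the three rows of Table 4 together it gives about 0.058 ft, and the per-row values reported in the table may have been computed differently.

```python
import math


def centroid(box):
    """Central point of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def centroid_distance_px(box_a, box_b):
    """Euclidean distance in pixels between the centroids of two boxes."""
    (ax, ay), (bx, by) = centroid(box_a), centroid(box_b)
    return math.hypot(ax - bx, ay - by)


def pixels_to_feet(distance_px, pixels_per_foot):
    """Convert a pixel distance to feet using an assumed, hypothetical scene calibration."""
    if pixels_per_foot <= 0:
        raise ValueError("pixels_per_foot must be positive")
    return distance_px / pixels_per_foot


def rmse(predictions, ground_truths):
    """Root mean square error between predicted and ground-truth values."""
    if len(predictions) != len(ground_truths) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - g) ** 2 for p, g in zip(predictions, ground_truths)]
    return math.sqrt(sum(squared) / len(squared))


# Two hypothetical face boxes whose centroids are 50 pixels apart.
d_px = centroid_distance_px((0, 0, 10, 10), (30, 40, 40, 50))
print(pixels_to_feet(d_px, pixels_per_foot=20.0))

# Predictions and ground truths taken from Table 4 (in feet).
print(rmse([2.37, 2.95, 3.10], [2.44, 2.99, 3.16]))
```

A single scale factor is only reasonable when both people are at a similar depth from the camera; a perspective-aware calibration would be needed otherwise.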

Conclusion

In this paper, a novel end-to-end pipeline for masked/non-masked face detection is proposed to improve the effectiveness of real-time surveillance systems in public places. In addition, a new dataset containing 10,000 images of two classes (masked face, non-masked face) is constructed to support the development of generalized masked/non-masked face detection and social distance measurement in outdoor public places. When fine-tuning existing state-of-the-art single-stage and multi-stage detection methods, we observed that YOLO-v3 outperformed the other networks in terms of accuracy and inference time. Based on this analysis, we further improved the baseline YOLO-v3 by eliminating the less-contributing residual connections in the network. The results indicate that our customized YOLO-v3 performed better than the baseline version, showing an improvement of 5.3% in terms of accuracy. In the future, we aim to extend our work to an image segmentation-based system that can provide more accurate information and greater clarity for detecting face masks.

Author Contributions

The research conceptualization and methodology were done by Iram Javed, Muhammad Atif Butt, and Samina Khalid. The technical and theoretical framework was prepared by Iram Javed and Muhammad Atif Butt. The technical review and improvement were performed by Tehmina Shehryar, Rashid Amin, Adeel Muzaffar Syed, and Marium Sadiq. The overall technical support, guidance, and project administration were done by Iram Javed, Muhammad Atif Butt, and Samina Khalid.

Data Availability

All the data used to support the findings of the study are available in the manuscript.

Code Availability

The proposed dataset along with implementation is available at https://github.com/iram1994/Face-Mask-Detection

Declarations

Conflict of Interests

The authors declare no conflicts of interest.

Footnotes

Disclosure

Iram Javed and Muhammad Atif Butt are joint first authors of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Articles from Multimedia Tools and Applications are provided here courtesy of Nature Publishing Group
