Image and Vision Computing. 2022 Dec 16;130:104610. doi: 10.1016/j.imavis.2022.104610

A survey on computer vision based human analysis in the COVID-19 era

Fevziye Irem Eyiokur a, Alperen Kantarcı b, Mustafa Ekrem Erakın b, Naser Damer c,d, Ferda Ofli e, Muhammad Imran e, Janez Križaj f, Albert Ali Salah g,h, Alexander Waibel a,i, Vitomir Štruc f, Hazım Kemal Ekenel b
PMCID: PMC9755265  PMID: 36540857

Abstract

The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given at the end of the survey. This work is intended to have a broad appeal and be useful not only for computer vision researchers but also for the general public.

Keywords: Computer vision, COVID-19, Human analysis, Masked faces, Survey

1. Introduction

The COVID-19 pandemic took the world by storm. Since the first large-scale outbreak in December 2019 in Wuhan, China, COVID-19, a highly infectious atypical (viral) pneumonia caused by the zoonotic coronavirus SARS-CoV-2, spread throughout the globe and resulted in around 600 million recorded cases and over 6.5 million deaths by mid-2022, according to information from Worldometer [1]. To contain the spread of the disease, minimize cases and limit the number of deaths, governments across the world started introducing prevention measures that had a profound impact on people's lives and changed their behavior and daily routines. Common prevention measures included (mandatory) face masks in public spaces, medical facilities and crowded areas, requests for social distancing, and restrictions on the allowed crowd size at different events, among others [2], [3].

To help combat COVID-19, the computer vision community quickly took an active stance and initiated a wide range of research activities that resulted in novel techniques for COVID-19 detection and severity analysis from medical images [4], [5], monitoring solutions for assessing compliance with the given prevention measures [1], [6], [7], screening approaches for flagging potentially sick subjects [8], [9], [10], infection-risk assessment methods [11], and efficient biometrics-based authentication schemes tailored towards the characteristics of the COVID-19 era [12], [13]. These solutions have been swiftly adopted in practice and were observed to play a critical role in the efforts towards containing the pandemic [14]. They made it possible to automate many monitoring tasks, improved situation-awareness and facilitated large-scale screening efforts. Furthermore, themed workshops, such as the International Workshop on Face and Gesture Analysis for COVID-19 (FG4COVID19), were organized in the scope of major computer vision conferences to provide a platform for discussion and presentation of the latest vision techniques related to COVID-19.

A key component of many of the solutions discussed above are, what we refer to in this survey as, computer vision based human analysis (CVHA) techniques that focus on the analysis of people and faces in visual data. While considerable progress has been made in the general area of vision based human analysis, the COVID-19 pandemic introduced several new challenges that have been underexplored in the literature before, e.g.:

  • Mask-based occlusions: One of the globally most prevalent prevention measures, introduced in response to COVID-19, are face masks. The presence of face masks represents a considerable obstacle with an adverse impact on the performance of many CVHA techniques, such as facial landmarking, face detection and recognition, but also related (auxiliary) tasks such as face image quality assessment (FIQA), presentation attack detection (PAD) and others. Dedicated mechanisms are, therefore, needed to handle this type of occlusion. It is important to note that partial occlusions of the facial area have also been studied in the pre-COVID-19 era [15], [16], [17]. However, most of the research from that period was not focused specifically on face masks and, as a result, techniques developed for more general occlusions were observed to lead to suboptimal performance for many COVID-19 related human-centered vision tasks.

  • Relevant datasets: The majority of modern vision techniques relies on machine learning and is, hence, trained using suitably annotated training data. Before the start of the pandemic, there was an obvious lack of datasets suitable for the development of CVHA techniques for combating COVID-19. Especially datasets with masked faces (and people) with high-quality annotations were not widely available. Several factors contributed to the lack of such datasets: (i) there was limited interest in vision problems involving masked faces (and masked people) only, (ii) the occlusions caused by the face masks made it difficult to generate annotations of reasonable quality (e.g., facial landmarks, accurate bounding boxes, segmentation/parsing maps, etc.), and (iii) a wide variety of facial masks with highly diverse appearance was introduced during the pandemic, but was not available in the pre-COVID-19 era.

  • Ethical considerations and social impact: While CVHA techniques for COVID-19 were developed with the goal of more efficiently combating the pandemic, the deployment of such techniques also raises ethical considerations and comes with a considerable societal impact. For example, installments of screening and monitoring applications can help to contain the spread of the coronavirus, but may also be extended into citizen surveillance and impact the privacy of individuals.

A considerable amount of work has been conducted over the course of the last three years to address the above challenges and has been covered partially in recent survey papers. Wang et al. [18], for example, reviewed techniques for masked facial detection and associated datasets. Alzubi et al. [19] as well as Utomo and Kusuma [20] discussed dedicated face recognition techniques for masked faces. Elbishlawi et al. [21] reviewed crowd-counting techniques and pointed to the importance of this technology for COVID-19. Related to these works is also the survey of Ulhaq et al. [4], which covers computer vision techniques applicable mostly to medical data, diagnostics and clinical management, and the deep-learning oriented review paper by Shorten et al. [22], where vision approaches, again mostly related to medical applications, are briefly discussed. Although the listed works provide some insight into CVHA research related to COVID-19, they focus on specific problems only, e.g., masked face detection or recognition, or provide a partial picture of the broader (and interconnected) research area. A well-structured and thorough survey on COVID-19 focused vision-based human analysis techniques, on the other hand, is, to the best of our knowledge, still missing from the literature.

In this work, we aim to address this gap and present a comprehensive overview of computer vision techniques that analyze visual data of people and faces with COVID-19 applications in mind. The goal of the survey is to: (i) provide a high-level taxonomy and background on vision techniques applied to human analysis relevant to COVID-19 (Section 2), (ii) present a consolidated summary of recent research activities in this area (Section 3), (iii) provide a review of dataset collection and generation efforts (Section 4), and (iv) elaborate on open problems and challenges with the goal of providing a basis for future research activities (Section 5). The overall structure of the survey is illustrated in Fig. 1. The work is primarily intended for researchers looking for a broad overview of computer vision research for COVID-19, but also other stakeholders interested in this topic.

Fig. 1. High-level structure of the survey. We provide a comprehensive review of recent human-centered computer vision techniques for combating COVID-19, discuss existing datasets and data-generation procedures, and present a list and discussion of the most important open issues and future research directions.

We make the following main contributions in this survey:

  • We present a comprehensive review of computer vision techniques that analyze imagery of people and faces to support the COVID-19 containment efforts and discuss over 200 relevant references that cover diverse but relevant topics from this problem domain.

  • We provide a taxonomy of existing solutions for the most relevant tasks studied as part of vision-based research for COVID-19, e.g., face mask detection, masked face detection, masked face recognition, crowd analysis, etc.

  • We discuss issues beyond the technological solutions, such as ethics and social impact, and elaborate on open problems and future research directions.

Since COVID-19 will not be the last pandemic the world faces, we believe this survey will contribute to technological preparedness for similar situations and ultimately improve the robustness and usability of the relevant technologies.

2. Taxonomy on Computer Vision based Human Analysis (CVHA) for COVID-19

The COVID-19 pandemic triggered a need for efficient computer vision techniques related to different problems in visual human analysis that can broadly be categorized into three groups based on their overall goals, as also illustrated in Fig. 2, i.e.:

  • Techniques for prevention, monitoring and control: The first group of CVHA techniques aims to help prevent the spread of COVID-19 and to monitor compliance with the given prevention measures; such techniques typically detect/analyze some characteristics (e.g., the presence of masks, the crowd size, or physiological changes/abnormalities) of the subjects in the visual data. Techniques from this group are applicable for screening purposes and as a source of critical statistical data for governments, health organizations and regulatory bodies. CVHA solutions covered in this survey from this group include face/mask detection algorithms [23], [24], [25], crowd-counting solutions designed for COVID-19 characteristics [11], [24], [26], [27], [28], breathing rate detection techniques [29] and face-hand interaction detection approaches [30], [31].

  • Facilitating algorithms: The second group of techniques represents solutions that facilitate applications that are not immediately related to COVID-19 prevention, but whose performance is affected by the external circumstances caused by the pandemic, such as the presence of face masks. A typical example of such an application is biometric identity inference from facial images, where face masks have been observed to have a considerable adverse effect on the overall recognition accuracy [32]. Many techniques and algorithms have, therefore, been proposed in the last few years to enable such critical applications also during COVID-19, but with minimal performance loss. CVHA techniques from the group of facilitating algorithms reviewed in this paper include face recognition solutions for masked faces [33], [32], [34], [35], [36], [37], as well as age estimation [38], [39] and facial expression recognition approaches [40], [41] that were all extended recently with the goal of improving robustness with masked faces.

  • Supporting solutions: The last group of techniques in our taxonomy represents supporting solutions that do not address specific problems with real-world COVID-19-related applications, but are needed to enable techniques from the two groups above. The most important solutions from this group discussed in the survey are data generation techniques, capable of synthesizing artificial training data for the various computer vision models [33], [42], [43], mask removal techniques aiming to reconstruct the original (unoccluded) facial images [44], [45], [46] and landmark localization (and/or alignment) techniques [47], [48] that are used as preprocessing steps for other COVID-19-related CVHA solutions.

Fig. 2. High-level taxonomy of Computer Vision based Human Analysis (CVHA) techniques surveyed in this paper with respect to the targeted goal.

We note that there is no strict separation between these three groups; clear interdependencies exist within the presented taxonomy, as also highlighted in the overview in Fig. 2.

3. Survey of CVHA techniques in the COVID-19 era

In this section, we summarize research on the different CVHA techniques that emerged during the COVID-19 pandemic and are covered in this survey. Specifically, we discuss research efforts focused on (i) face/mask detection, (ii) face recognition and various auxiliary tasks needed for face recognition systems, such as presentation attack detection and face quality estimation, (iii) facial expression recognition, (iv) age classification, (v) landmark localization, (vi) crowd detection/counting, (vii) breathing rate detection, (viii) face-hand interaction detection, and (ix) synthetic data generation.

3.1. Face/mask detection

Among the various prevention measures introduced around the world, face masks were likely the most widespread and, in fact, were mandatory in various countries [2], [3]. Facial masks were also supported by the World Health Organization (WHO), which published a detailed guide on this topic [49]. Computer vision based detection techniques for masked faces are typically needed to monitor whether people comply with the advice of health organizations and governments in public spaces and to facilitate situational awareness.

In general, masked face detection is a specialized object detection problem where, in addition to the standard detection of non-occluded faces, the goal is to also reliably identify the presence of faces with masks in an image (or video frame). This task typically includes a binary decision (face present/face absent) for a given sub-region of the input image, which also defines the approximate spatial location of the (masked) facial area. Extensions of this problem that emerged during the pandemic, in addition to the (masked) face detection task, also often detect the presence of masks in the image (mask present/mask absent) and/or determine whether the mask is placed/worn in accordance with regulations and general guidelines or not [1], [30], [50], [51], [52], as shown in Fig. 3.

Fig. 3. Masked face detection is a specific object detection problem where the objects to be detected (i.e., faces) can appear both with and without masks (left). The different extensions that appeared during the COVID-19 pandemic also incorporate decisions on whether masks are present in the images (middle) and whether the masks are worn correctly or not (right).
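To make the extended detection output concrete, the following minimal Python sketch (ours, not taken from any of the cited works) shows how a per-face, three-way mask-status decision can be derived from classifier logits; the detector output format and the dummy logits are illustrative assumptions.

```python
# A minimal sketch of the extended masked face detection output described
# above: each detected face is assigned one of three classes based on
# per-face classifier logits. The data structures are illustrative.
from dataclasses import dataclass
import numpy as np

CLASSES = ("no_mask", "mask_worn_correctly", "mask_worn_incorrectly")

@dataclass
class FaceDetection:
    box: tuple               # (x1, y1, x2, y2) in pixels
    score: float             # face detection confidence
    mask_logits: np.ndarray  # 3 raw scores from a hypothetical mask classifier

def classify_mask_status(det: FaceDetection) -> str:
    """Softmax over the three mask-status logits; return the argmax label."""
    z = det.mask_logits - det.mask_logits.max()  # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return CLASSES[int(probs.argmax())]

# Example with dummy logits standing in for a real detector + classifier.
det = FaceDetection(box=(40, 60, 180, 220), score=0.97,
                    mask_logits=np.array([0.2, 2.9, 0.4]))
print(classify_mask_status(det))  # -> "mask_worn_correctly"
```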

Before the pandemic, the masked face detection problem was mostly investigated as part of more general detection tasks with partial occlusions, where the occlusions may have appeared due to the placement of the hands, the presence of sunglasses, gas masks, helmets, niqabs, and other objects that commonly cover some part of the face. In such settings, face detection methods are commonly observed to perform worse, with the performance degradations increasing as the occluded part of the face gets larger [53]. One of the earliest pre-COVID-19 works on detecting masked faces was presented by Nieto-Rodríguez et al. [54], where the authors introduced a system that checks whether medical staff wear mandatory medical masks in the operating room. They used two distinct detectors: one for the face and the other for the medical mask. They employed the Viola-Jones object detector [55] for both face and mask detection. They collected a dataset that contains faces with medical masks to train the detectors. Another work from this period [56] presented the first large-scale masked face detection dataset, named MAFA, and trained locally linear embedding and convolutional neural networks (LLE-CNNs) for detecting faces with and without face masks. The proposed method first extracts (face) region proposals and describes them with a convolutional neural network (CNN). After this, a k-nearest neighbor (KNN) module refines the descriptors for recovering missing facial cues of masked faces. As the last step, a unified CNN is used to perform classification and regression to identify candidate facial regions and their overall positions in images/frames.

In [57], the authors introduced a refined version of the MAFA [56] dataset, called MAFA-FMD, that included only images with medical face masks. Using this new dataset, the authors proposed a novel context attention module to extract highly descriptive contextual features, such as face mask wearing conditions, and showed that the proposed approach outperforms the benchmark RetinaFace [58] and YOLOv3 [59] face detectors. In [52], Nagrath et al. introduced the SSDMNV2 system, which combines a single-shot multibox detector framework and a MobileNetV2 [60] based classifier for the detection of masked faces as well as face mask detection. This lightweight model is suitable for deployment on embedded devices and for real-time data processing. Another work that uses a single-stage face detector is [61]. Here, the authors used the YOLOv2 model [62] to detect masked faces and ResNet-50 [63] to detect face masks. To overcome the challenge of scarce labeled data, Cabani et al. [51] built a synthetic dataset of masked faces to train robust face detection and face-mask detection models. The authors tried to imitate different mask-wearing conditions by using realistic image synthesis methods. However, they only used a single type of medical mask to simulate different wearing conditions, thereby raising questions about the generalization capabilities of their models to real-world data, where facial masks may have different colors, shapes, and textures. Joshi et al. [50] proposed a framework to detect face masks from a video stream by using the MTCNN [64] face detection model and classifying mask presence with MobileNetV2 [60]. They tested their framework on actual footage of public spaces, captured during the COVID-19 pandemic. The dataset covers multiple geographical locations and people of different ethnicities, and the proposed method was demonstrated to outperform RetinaFaceMask [57] on the considered dataset. The authors of [65] proposed a two-stage Faster R-CNN [66] network with an InceptionV2 [67] model along with a novel wearing mask detection (WMD) dataset to address the masked face detection task. Through comprehensive experiments, they showed that the two-stage detector provides a good trade-off between accuracy and computational complexity. The work of Roy et al. [68] proposed using a YOLOv3 [59] model along with the Single Shot MultiBox Detector (SSD) [69] and Faster R-CNN [66] for masked face detection. The experimental results on the novel Moxa3K benchmark dataset [68] showed that YOLOv3 [59] achieves better performance than competing models while having a comparable runtime.

In [30], Eyiokur et al. studied an extended detection problem, where each face image was classified into one of three classes: no mask, mask, and incorrectly worn mask. The authors introduced a labeled large-scale face mask detection dataset and, using the newly collected data, trained a RetinaFace model [58] for face detection. Next, they employed several CNN models, namely, Inception-v3 [67], MobileNetV2 [60], EfficientNet [70], etc., to classify the detected and cropped faces into the three above-mentioned classes. The authors also extensively tested their models, both on the proposed dataset as well as on other available datasets from the literature. Cross-dataset evaluations demonstrated the representation power of their dataset, which is crucial for new face mask detection datasets. A similar problem was also studied in [71], where Jiang et al. proposed the Squeeze and Excitation (SE)-YOLOv3 [59] mask detector for the detection of properly worn masks. The main idea behind the approach was to integrate the SE block with the YOLOv3 [59] model to teach the network to focus on the crucial features. The authors also utilized a focal loss to address the extreme foreground-background class imbalance. Experimental results showed that the proposed network achieved better localization and detection performance than competing models on the considered dataset. Kantarci et al. [72] introduced a novel face mask detection dataset named the Bias-Aware Face Mask Detection (BAFMD) dataset. The dataset has been collected from Twitter images with a specific focus on mitigating dataset bias with respect to ethnicity, age, and gender. To reduce such biases, the dataset contains real-world face mask images with a more balanced distribution across different demographics, e.g., gender, race, and age. The authors trained a YOLOv5 [73] object detector, which showed superior performance over other detectors. In [1], Batagelj et al. compared different masked face detectors and correct face-mask placement classification networks in detail using a dataset derived from the MAFA [56] dataset. The reported experimental results provide insightful performance comparisons of various methods and show that the RetinaFace [58] model is the most stable masked face detection model among the considered techniques. Furthermore, the authors demonstrated that the performance of all face detection models deteriorates significantly if face masks are present in the image, compared to faces without masks. In [74], the authors proposed a face mask-wearing identification method combining image super-resolution and classification networks (SRCNet). They used a standard face detector for detecting and cropping faces with and without masks. After the detection step, the authors evaluated the image size to choose the next step. If an image's resolution was smaller than 150×150 pixels, i.e., the width or height was below 150 pixels, they applied a super-resolution model to enhance the high-frequency details of the image. If the image was already larger than 150×150 pixels, they skipped the super-resolution step and subjected the image directly to a face mask-wearing classification network, which classifies the mask-wearing conditions into one of three classes: no mask-wearing, incorrect mask-wearing, and correct mask-wearing. The reported experimental results show that applying super-resolution to low-resolution face crops boosts classification performance and that the presented model yields competitive performance overall.
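The resolution-based routing of SRCNet-style pipelines can be summarized in a few lines. The sketch below is our simplified illustration under the 150-pixel threshold reported in [74]; the super-resolution and classification functions are placeholders, not the authors' models.

```python
# A sketch of SRCNet-style resolution routing: small face crops are first
# super-resolved, larger ones go directly to mask-wearing classification.
import numpy as np

SR_THRESHOLD = 150  # pixels; threshold reported for SRCNet [74]

def super_resolve(crop: np.ndarray) -> np.ndarray:
    """Placeholder for a super-resolution model (naive 2x upscaling here)."""
    return np.kron(crop, np.ones((2, 2, 1), dtype=crop.dtype))

def classify_mask_wearing(crop: np.ndarray) -> str:
    """Placeholder for the mask-wearing classification network."""
    return "correct mask-wearing"  # or no / incorrect mask-wearing

def classify_with_routing(crop: np.ndarray) -> str:
    h, w = crop.shape[:2]
    if min(h, w) < SR_THRESHOLD:     # low-resolution crop:
        crop = super_resolve(crop)   # restore high-frequency detail first
    return classify_mask_wearing(crop)

crop = np.zeros((96, 96, 3), dtype=np.uint8)  # a small detected face crop
print(classify_with_routing(crop))
```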

Most of the methods proposed for (masked) face detection and related problems, such as facial mask detection, build on advances made in the generic object detection domain. However, to adapt/extend existing detectors to work reliably with partially occluded face data or, specifically, with masked faces, these solutions incorporate minor modifications to the overall detection pipelines and, more importantly, introduce new, large-scale datasets with masked faces that contain diverse data with rich appearance variations induced by facial masks for model training. Due to the importance of these datasets for the masked face detection problem, they are discussed separately in Section 4.

3.2. Face recognition

Similarly to face detection, where the appearance and widespread usage of facial masks had an adverse impact on the performance of existing models, face recognition is another area where facial masks negatively impacted the applicability of the technology. In this section, we therefore provide an in-depth discussion of the effect of face masks on different components of face recognition systems and then review the efforts made so far to mitigate these negative effects.

3.2.1. The effect of wearing masks on face recognition

Face recognition deployability is strongly affected by the conditions of biometric sample capture and presentation, most prominently by face occlusions. Face recognition in the presence of occlusions has been studied widely within the computer vision community [17], [75], [76], [77], [78], [79], [80], [81]. However, most of the pre-COVID-19 work targeted general unstructured face occlusions. The effect of the specific occlusion induced by face masks gained attention at the start of the COVID-19 pandemic. An early work by Damer et al. [32] evaluated the verification performance drop of face recognition systems when verifying unmasked-to-masked faces, in comparison to verifying unmasked faces, all with real masks and in a collaborative environment. This was followed by an extended study [82] with a larger database and an evaluation of both synthetic and real masks. As part of the ongoing Face Recognition Vendor Test (FRVT), the National Institute of Standards and Technology (NIST) released results (FRVT Part 6A) on the effect of face masks on the performance of face recognition systems provided by vendors [83]. The results revealed that the verification accuracy with masked faces declined substantially. However, the study used simulated masked images under the assumption that their effect is representative of the effect of real face masks. Following NIST's evaluation, the US Department of Homeland Security conducted a similar evaluation, however, on more realistic data [84]. They also identified a significant negative effect of facial masks on the accuracy of automatic face recognition solutions. A general conclusion of these studies was that the effect of masks was larger on genuine pairs' decisions than on imposter pairs' decisions. A study comparing the effect of face masks on human experts/verifiers with that on automatic face recognition models concluded with a set of comments on different aspects of the correlation between the verification performance of humans and machines [85]. The study showed a trend in the human experts' verification performance drop similar to that of automatic face verification models [85]. In the next section, an overview of the solutions to mitigate this negative effect on face recognition performance is presented.
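To illustrate the evaluation protocol behind these findings, the short sketch below (ours) compares genuine and impostor cosine-similarity scores for unmasked references against masked probes; the random vectors stand in for embeddings from a real face recognition model.

```python
# A minimal sketch of the masked-vs-unmasked verification protocol: compare
# embeddings with cosine similarity and summarize genuine and impostor score
# distributions separately, since genuine comparisons were found to be more
# affected by masks than impostor ones. Embeddings here are synthetic.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def summarize(pairs):  # pairs: list of (reference, probe) embedding tuples
    scores = [cosine(r, p) for r, p in pairs]
    return float(np.mean(scores)), float(np.std(scores))

rng = np.random.default_rng(0)
# Genuine: same identity, probe perturbed (a crude stand-in for mask effects).
genuine = [(e := rng.normal(size=512), e + 0.3 * rng.normal(size=512))
           for _ in range(100)]
impostor = [(rng.normal(size=512), rng.normal(size=512)) for _ in range(100)]
print("genuine  mean/std:", summarize(genuine))
print("impostor mean/std:", summarize(impostor))
```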

3.2.2. Enhancing masked face recognition

As validated by the studies discussed above, wearing a face mask does significantly affect the performance of face recognition technology. This by itself is intuitive, as the mask covers part of the facial information that face recognition models can use to discriminate between individuals. However, the insights from the discussed studies also inspired many innovative solutions aiming at enhancing the performance of masked face recognition. In this work, we present an operational categorization of these solutions based on their conceptual modeling of the masked face recognition problem. These solutions can be categorized into four groups: (a) mask in-painting, (b) template unmasking, (c) model optimization, and (d) periocular recognition. They are presented in the following sections along with the main works that made significant contributions under each category. A graphical representation of these categories is also presented in Fig. 4, where masked face probes are processed in four different ways to be compared to an unmasked face reference.

Fig. 4. Masked face recognition solutions mainly fall under one of the four categories presented in this figure. From top to bottom, these solutions are face in-painting, template unmasking, model optimization, and periocular recognition. Blue rectangles are general-purpose face recognition models, purple rectangles are template unmasking models, red rectangles are face recognition models trained specifically to tolerate masked faces, and green rectangles are periocular recognition models.

(A) Mask in-painting. Under this category, illustrated at the top of Fig. 4, the main goal is to detect and in-paint the face area covered by the mask before processing the face with conventional face recognition models. Such a process will not necessarily add identity-specific information to the face, because in-painting processes are trained to predict the occluded area details from the visible parts of the face, and thus they extract the initial identity information from the already visible parts of the face. However, such in-painting will transfer the image into a distribution (domain) that is more similar to what general-purpose face recognition models are trained for and bring it closer to the unmasked reference. Seen as a domain adaptation process, this has the potential to enhance the performance of masked face recognition. The main advantage of in-painting-based strategies is that they maintain the use of well-performing general-purpose face recognition models. The main disadvantage is that the training of the generative in-painting process is commonly expensive in terms of the required training data and the computational cost [86]. Such generative processes might also result in artifacts that are out of the normal face image distribution compared to the original masked faces themselves [87].

Although face in-painting is in general a well-studied field, with recent methods producing photorealistic images [88], [89], [90], using this technique to enhance masked face recognition is still under-explored. Jiang et al. [91] recently addressed the specific issue of in-painting face mask areas without evaluating the effect on face recognition performance. Such aesthetic-driven face in-painting of the mask area, i.e., mask removal, is discussed in more detail in Section 3.9. Similar in-painting approaches have been shown before to be beneficial, to some degree, in enhancing the recognition performance [92] of occluded faces.
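As a concrete illustration of the in-painting idea, the sketch below uses classical OpenCV in-painting purely as a lightweight stand-in for the learned generative models discussed above; in practice, the mask region would come from a segmentation model rather than a hard-coded rectangle.

```python
# A minimal sketch of mask-region in-painting before face recognition.
# Classical Telea in-painting is used here only as a cheap stand-in for the
# generative in-painting models discussed in the text.
import cv2
import numpy as np

def inpaint_mask_region(face_bgr: np.ndarray, mask_region: np.ndarray):
    """mask_region: uint8 map, nonzero where the face mask covers the face."""
    return cv2.inpaint(face_bgr, mask_region, inpaintRadius=5,
                       flags=cv2.INPAINT_TELEA)

face = np.full((224, 224, 3), 128, dtype=np.uint8)  # placeholder face crop
region = np.zeros((224, 224), dtype=np.uint8)
region[120:224, 40:184] = 255                       # assumed lower-face mask area
restored = inpaint_mask_region(face, region)        # then feed to the FR model
```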

(B) Template unmasking. Under this category, illustrated in the second row of Fig. 4, the main goal is to transfer the extracted masked face template into a form where it behaves similarly to a template extracted from an unmasked face of the same identity. Here, both the masked probe and the unmasked reference are processed by a general-purpose face recognition model; however, the masked face template typically undergoes an additional processing step. Just like with in-painting, this process will not add identity information to the masked face template, but rather removes the template artifacts introduced by the mask information. The main advantage of such solutions is that they maintain the use of well-performing general-purpose face recognition models and that the computational overhead of the template unmasking model is relatively negligible [93] when compared to the face recognition model itself or the generative in-painting model discussed under the first category.

Despite the clear operational benefits of this category of solutions, relatively few works targeted such a concept. The first to do so were Boutros et al. [93], who proposed to train a template unmasking model on top of any general-purpose face recognition model to transfer the masked face template into a form that behaves similarly to an unmasked face template of the same identity in the comparison operations. Based on the fact that genuine comparisons are significantly more affected than imposter ones when comparing masked to unmasked faces, the authors proposed the self-restrained triplet loss, which assigns higher importance to positive pairs during training when the negative pairs are deemed sufficiently distant. Following a similar operational concept, a recent study also proposed to process the masked face template in a framework that utilizes contrastive learning [94].
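The following sketch (our simplified rendering, not the exact formulation of [93]) shows the template-unmasking setup: a small network maps masked templates into the unmasked template space of a frozen recognition model, trained with a loss that, in the spirit of the self-restrained triplet loss, penalizes the genuine distance only while the impostor distance is already sufficiently large.

```python
# A simplified template-unmasking sketch; the loss mimics the *spirit* of the
# self-restrained triplet loss and is not the published formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Unmasker(nn.Module):
    """Small head that maps masked templates to the unmasked template space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
    def forward(self, masked_template):
        return F.normalize(self.net(masked_template), dim=-1)

def self_restrained_triplet(anchor, positive, negative, margin=0.5):
    d_pos = (anchor - positive).pow(2).sum(-1)   # genuine distance
    d_neg = (anchor - negative).pow(2).sum(-1)   # impostor distance
    # Pull genuine pairs together only while negatives are far enough away.
    weight = (d_neg > d_pos + margin).float()
    return (weight * d_pos).mean()

model = Unmasker()
masked, ref, other = torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512)
loss = self_restrained_triplet(model(masked), F.normalize(ref, dim=-1),
                               F.normalize(other, dim=-1))
loss.backward()  # only the unmasking head is trained; the FR model is frozen
```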

(C) Model optimization. Under this category, illustrated in the third row of Fig. 4, the main goal is to train a face recognition model that can produce comparable embeddings for both masked and unmasked faces. Understandably, training such a solution requires having masked and unmasked face samples in the training data. This also builds, in part, on general-purpose face recognition training goals such as direct embedding learning [95] or embedding learning through classification [96], [97]. The main advantage of this category of solutions is that both masked and unmasked faces are processed with the same model. However, such solutions induce the need for a tedious training process and considerable amounts of training data that also need to include masked faces. Additionally, including masked faces in the training process might render the resulting model less accurate when comparing pairs of unmasked faces [98]. This shortfall was recently targeted in the literature with a high degree of success [99], where the authors forced the face recognition model to produce optimal templates for both masked and unmasked faces by incorporating a template-level knowledge distillation loss between the trained network and a general-purpose face recognition network.

Most of the works addressing masked face recognition so far fall under this category. The authors in [12] combined the ArcFace loss [97] with a mask-usage classification loss and denoted the combination as Multi-Task ArcFace [12]. Another work combined the traditional triplet loss and the mean squared error in an effort to improve face recognition robustness to masks [13]. The authors in [98] theorized that the masked face recognition process requires a larger penalty margin when using the cosine loss. Others proposed improving the face template consistency using a pairwise loss [100]. Geng et al. [101] proposed to enhance masked face recognition performance through mask-like generative augmentation. Hsu et al. [102] experimented with different loss functions to determine their suitability for masked face recognition. A hybrid backbone of residual block and self-attention components was proposed in [103], an aspect that was also investigated in [104].
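A simplified sketch of such a multi-task objective is given below; the ArcFace margin term is replaced by plain cross-entropy over identity logits for brevity, and the heads, class counts, and weighting factor are illustrative assumptions rather than the setup of any specific cited work.

```python
# A simplified multi-task training objective: an identity loss on the
# embedding combined with a binary mask-usage classification loss.
import torch
import torch.nn as nn

id_head = nn.Linear(512, 1000)   # identity classifier (1000 subjects, dummy)
mask_head = nn.Linear(512, 2)    # mask / no-mask classifier
ce = nn.CrossEntropyLoss()

def multi_task_loss(embedding, id_label, mask_label, lam=0.1):
    # lam balances the auxiliary mask-usage term against the identity term.
    return ce(id_head(embedding), id_label) + lam * ce(mask_head(embedding), mask_label)

emb = torch.randn(16, 512, requires_grad=True)  # backbone output (dummy)
loss = multi_task_loss(emb, torch.randint(0, 1000, (16,)),
                       torch.randint(0, 2, (16,)))
loss.backward()
```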

(D) Periocular recognition. Under this category, illustrated at the bottom of Fig. 4, the main goal is to reduce the face recognition problem to a partial face recognition problem. This assumes that the mask commonly covers the lower part of the face and maintains the visibility of the upper part of the face. This area, which includes the eyes and the adjacent regions, is commonly called the (peri)ocular region [105]. The biometric literature refers to the recognition of this area, when the iris is not exclusively targeted, as periocular recognition [106]. Periocular recognition can include the periocular region of one eye for some applications [107], [108]. However, in the masked face recognition scenario, both the right and left periocular regions are typically considered.

A number of works proposed to crop the masked face and focus the recognition task on the periocular region when the mask is present [109], [110], [111]. The need to use the periocular region for recognition purposes when faces are masked was extensively studied in [112], including a detailed survey on periocular recognition technologies.
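The cropping step that underlies such periocular approaches can be sketched in a few lines; the padding factors below are illustrative, and the eye coordinates would come from a landmark detector.

```python
# A minimal sketch of periocular cropping: keep the upper-face region around
# both eyes and feed it to a periocular recognition model.
import numpy as np

def periocular_crop(face: np.ndarray, left_eye, right_eye, pad: float = 0.6):
    """face: HxWx3 crop; eyes: (x, y) pixel coordinates; pad is illustrative."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_dist = max(abs(rx - lx), 1)
    cx, cy = (lx + rx) // 2, (ly + ry) // 2
    half_w = int((0.5 + pad) * eye_dist)
    half_h = int(pad * eye_dist)
    y1, y2 = max(cy - half_h, 0), min(cy + half_h, face.shape[0])
    x1, x2 = max(cx - half_w, 0), min(cx + half_w, face.shape[1])
    return face[y1:y2, x1:x2]

face = np.zeros((224, 224, 3), dtype=np.uint8)
strip = periocular_crop(face, left_eye=(80, 95), right_eye=(144, 95))
print(strip.shape)  # periocular strip covering both eyes
```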

3.2.3. Masked face recognition competitions

Two major competitions were organized in an effort to attract novel solutions for masked face recognition. The first was the MFR2021 Masked Face Recognition Competition [113], organized as part of the International IEEE Joint Conference on Biometrics (IJCB) 2021 [114]. The competition examined the deployability of the solutions by considering the compactness of the face recognition models. A private dataset was used for evaluation. The dataset contained real masked faces and represented a collaborative capture scenario. Out of 18 submitted solutions, 10 were able to outperform the widely used ResNet-100 baseline [63] trained using the ArcFace loss [97]. Most of the competition entries used synthetic and/or real masked face images in the training of their solutions.

The second competition was the Masked Face Recognition Challenge [34], organized within the Face Bio-metrics Under COVID: Masked Face Recognition (MFR) Workshop, held as part of the IEEE/CVF International Conference on Computer Vision Workshops [115]. The competition included three test sets, used an online model testing system, and provided a detailed evaluation of the submitted face recognition models. The results of the competition pointed to the effectiveness of augmentation strategies simulating facial masks when training recognition models for the targeted task of masked face recognition.

3.2.4. Masks and face recognition subsystems

Presentation attack detection. Presentation attacks on face recognition systems involve the presentation of an artifact or of human characteristics to a biometric capture subsystem in a fashion intended to interfere with system policy, as defined in ISO/IEC 30107-3 [116]. This can include attacks like face morphing [117], [118], makeup attacks [119], or even identification circumvention attacks [120]. However, given the attack scenarios, the attack most related to wearing masks is the spoofing attack, where an attacker presents an artifact to a biometric capture subsystem with the aim of impersonating a different identity [116]. Presentation attack detection (PAD) solutions aim at differentiating between non-attack samples, i.e., bona fide, and spoofing presentation attacks [121]. Such solutions can be based on a user challenge (the user performing a specific task/move), on special sensor characteristics, e.g., a light field camera or thermal sensor, or on software solutions [121]. The most widespread software solutions depend on analyzing samples captured in the visible domain, given the high deployability of visible-spectrum cameras in personal devices. Such solutions can be texture-based [122], motion-based [123], frequency-based [124], or a combination of two or more of these technologies [125].

Wearing a face mask changes the nature of the sample processed by the face PAD algorithm. This was apparent in the widespread reports of malfunctioning face logins on personal devices at the start of the COVID-19 pandemic, not only because of failures to match the unmasked reference image, but also because the masked face is seen as a spoofing attack by the PAD algorithm. This interesting fact was revealed by an extensive study presented by Fang et al. [126], where the authors collected a set of unmasked and masked bona fide and attack samples and tested both the vulnerability of face recognition to such attacks and the performance of established PAD algorithms when processing masked attacks. The study also presented a novel kind of attack where the attacks, printed or shown on a screen, were covered with a real mask. The main findings of the study pointed out that PAD algorithms classify many of the masked bona fide samples as attacks. The study also found that face recognition algorithms are still vulnerable to masked face attacks, especially when a real face mask is placed on the attacks [126]. An effort to reduce this effect on PAD performance was successfully presented in [127], where the authors proposed to train the PAD using partial pixel-wise labels, where the real masks placed on the attacks are considered to be a bona fide area in an attack sample. This was also supported by giving the non-covered parts of the face a higher influence in the PAD decision inference from the image, bringing the PAD behavior on masked faces closer to that on unmasked faces [127]. Further efforts are required, though, to build publicly available masked face attack databases and mask-invariant masked face PAD algorithms.

Quality assessment. Face image quality (FIQ) measures the utility of an image to face recognition algorithms [128], [129]. This utility is measured with an FIQ score as defined in ISO/IEC 2382-37 [130]. Various methods have been developed for face image quality assessment (FIQA), whether by building quality pseudo-labels and learning to predict such labels [131], [132], by measuring different aspects of the face recognition model's response to the investigated image [133], [134], or by learning to predict the relative classifiability of a face through its class-relative placement in a face recognition training process [135]. As FIQA measures the utility to face recognition algorithms, it does not necessarily reflect the perceived image quality (IQ) measured by conventional general image quality assessment (IQA) solutions [136], [137]. However, IQ measures have been found to correlate with face image utility, though to a much lower degree than FIQ [136]. As mentioned earlier, wearing a mask does lower the accuracy of face recognition, and thus it is also expected to be reflected in a lower FIQ. This issue was investigated by Fu et al. [138], where it was shown that even when the perceptual quality and capture environment do not change, the FIQ drops substantially when a mask is worn. This consistently correlates with the drop in face recognition performance, whether by machine or human experts [138]. Additionally, the networks performing FIQA shifted their attention away from the mask region and towards the visible face region, more specifically the ocular region, as demonstrated in [138].
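To make the FIQA concept more tangible, the sketch below follows the spirit of methods that measure the recognition model's response to an image [133]: the same image is embedded repeatedly with dropout active, and the stability of the stochastic embeddings serves as a quality proxy. The toy backbone and score definition are our assumptions, not the exact method of the cited works.

```python
# A sketch of stability-based FIQA: repeated stochastic forward passes; lower
# embedding variation is taken as a proxy for higher face image utility.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 512),
                         nn.Dropout(p=0.5), nn.Linear(512, 512))  # toy model

def stability_fiq_score(image: torch.Tensor, n_passes: int = 10) -> float:
    backbone.train()  # keep dropout active at inference time
    with torch.no_grad():
        embs = torch.stack([backbone(image) for _ in range(n_passes)])
    embs = torch.nn.functional.normalize(embs, dim=-1)
    pairwise_dist = torch.cdist(embs.squeeze(1), embs.squeeze(1))
    return float(-pairwise_dist.mean())  # higher = more stable = better quality

img = torch.rand(1, 3, 112, 112)  # a (random) stand-in for a face crop
print(stability_fiq_score(img))
```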

3.3. Facial expression recognition

Another important application affected by the presence of face masks is Facial Expression Recognition (FER). FER is a longstanding computer vision problem, where the goal is to recognize specific facial expressions or emotional states based on changes in facial appearance. It is generally acknowledged that different parts of the face are involved when expressing different expressions, as evidenced, for example, by the Facial Action Coding System (FACS), one of the most widely used conceptual frameworks for the FER problem [139], [140], [141], [142]. Thus, occlusions of these areas, as caused, for example, by facial masks, lead to obvious performance degradations.

The problem of facial expression recognition in the presence of face masks was explored by Abate et al. in [41]. Here, the authors studied class activation maps (CAM) for different expressions and found that anger, happiness, sadness, and neutral expressions are most heavily represented around the nose and mouth areas. As a result of this observation, the authors concluded that FER models struggle to extract informative features from face images when face masks are present. To address this problem, two common mitigation strategies were proposed in the literature, i.e.: (a) collecting/generating masked datasets with facial expressions that can be used for fine-tuning of existing FER models, and (b) designing new models capable of performing facial expression recognition despite the presence of masks.

(A) Datasets and fine-tuning. Collecting and labeling facial expressions is a difficult, time- and labor-intensive task that can also be subjective. The difficulty of labeling facial expressions carries over to the problem of masked facial expression recognition as well. Since there are no datasets publicly available for this task, simulated masks are generally utilized in the literature. Yang et al. [40], for example, developed a mask simulation method that uses facial landmarks and their orientations to fit a mask. They also annotated 13,000 images from the Labeled Faces in the Wild (LFW) dataset [143] for facial expression recognition and compiled a new dataset, called LFW-FER. Finally, using the mask simulation methodology on the LFW-FER dataset, they generated a synthetic dataset for FER containing simulated masks, called M-LFW-FER, which is publicly available for research purposes and can be used to fine-tune FER models for expression recognition in the presence of facial masks.
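A minimal sketch of landmark-driven mask simulation is shown below; real pipelines such as the one behind M-LFW-FER warp textured mask templates onto the face, whereas this illustration simply fills a polygon over lower-face landmarks, and the landmark indices assume the common 68-point markup.

```python
# A minimal sketch of landmark-based mask simulation: fill a mask-colored
# polygon over the lower-face landmarks. Indices assume the 68-point markup
# (jawline points 2..14 plus a nose-bridge point) and are illustrative.
import cv2
import numpy as np

def overlay_synthetic_mask(face_bgr: np.ndarray, landmarks: np.ndarray,
                           color=(210, 210, 210)) -> np.ndarray:
    """landmarks: (68, 2) array of (x, y) pixel coordinates."""
    polygon = np.vstack([landmarks[2:15], landmarks[28:29]]).astype(np.int32)
    out = face_bgr.copy()
    cv2.fillPoly(out, [polygon], color)
    return out

face = np.zeros((224, 224, 3), dtype=np.uint8)
lms = np.random.randint(0, 224, size=(68, 2))  # stand-in for a real detector
masked = overlay_synthetic_mask(face, lms)
```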

Similar ideas were also pursued by other works. Barros et al. [144], for example, first detected the facial landmarks on images from the AffectNet dataset [145] and fit a mask to the faces, covering all the landmarks below the nose. Using the resulting MaskedAffectNet dataset, the authors then applied different training strategies, e.g., transfer learning, to their FaceChannel model [146] to account for the presence of face masks. When the authors trained the FER model from scratch using the MaskedAffectNet dataset, the model performance drastically deteriorated for unmasked applications. However, when the model was first pretrained on a standard dataset and later fine-tuned, the FER performance was only slightly affected, making it useful for both masked and unmasked facial images.

(B) Mask-agnostic FER. Techniques from the second group aim to design models that are robust (agnostic) with respect to the presence of facial masks and perform similarly for masked and unmasked faces. Yang et al. [147] developed a new approach for masked facial expression recognition along these lines. Their model consists of two parts. The first part includes a classifier for masked and unmasked recognition that generates a binary attention heatmap for the face masks. The second part of the model takes the binary attention heatmaps and convolutional face features to classify the facial expression. The authors show that their model outperforms other state-of-the-art occlusion-robust facial expression recognition models, such as the region attention network (RAN) [148] and the CNN with attention mechanism (ACNN) [149].

3.4. Masked face age classification

Similar to face and facial expression recognition systems, age estimation techniques also critically depend on the visibility of facial areas and struggle when parts of the face are occluded. As a result, studies investigating age estimation with facial masks have also appeared during the COVID-19 pandemic.

Golwalker et al. [39] conjectured that the lack of suitable large-scale datasets makes it challenging to use large prediction models for age estimation with occluded faces. When masks are worn, the most discriminative features for age estimation, such as wrinkles around the cheeks and mouth, are largely hidden beneath the mask. Their approach to this problem was, therefore, to use a shallow model that could easily be fine-tuned on a small set of images of people wearing masks. To this end, the authors used a simple 9-layer CNN architecture. For the age detection dataset, they collected faces wearing masks from various age categories and augmented it with an auxiliary dataset of 4500 synthetic images of masked people generated using a Generative Adversarial Network (GAN) [150]. Öztel et al. [38] developed a two-stage pipeline consisting of a face mask detection and an age classification stage. With this approach, the authors first determine if the person is wearing a face mask or not. Then, depending on the result, one of two separate age classification models is utilized. If the person is not wearing a mask, a standard classification model in the form of a simple CNN trained on the UTKFace Large Scale Face Dataset [151] is used. If the person is wearing a face mask, another simple CNN model is utilized, but this time trained with simulated face masks on the UTKFace Large Scale Face Dataset [151]. The proposed pipeline included three different age classes, i.e., teenager (12-20), middle-aged (21-64), and elderly (65+), and was shown to achieve competitive results.
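The two-stage routing logic of such pipelines can be summarized as below; all three model functions are stubs standing in for the trained CNNs of [38] and the class boundaries follow the ones reported in the text.

```python
# A sketch of the two-stage routing: detect the mask first, then dispatch the
# face crop to the age classifier trained for that condition. All model
# functions are placeholders for the actual trained CNNs.
AGE_CLASSES = ("teenager (12-20)", "middle-aged (21-64)", "elderly (65+)")

def detect_mask(face) -> bool:
    return True  # stub: a real detector returns whether a mask is present

def age_model_unmasked(face) -> str:
    return AGE_CLASSES[1]  # stub for the CNN trained on unmasked UTKFace

def age_model_masked(face) -> str:
    return AGE_CLASSES[1]  # stub for the CNN trained with simulated masks

def estimate_age(face) -> str:
    return age_model_masked(face) if detect_mask(face) else age_model_unmasked(face)

print(estimate_age(face=None))  # placeholder input for the stubs
```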

3.5. Landmark localization and alignment

Facial landmark localization and alignment are essential components of various face-related applications, such as face recognition, facial pose estimation, 3D face reconstruction, emotion recognition, face synthesis, and face morphing, and fall into the category of supporting techniques given our taxonomy from Section 2. The main goal of landmark localization is to locate key points in a given 2D face image, such as the nose tip, eyebrow curve, mouth corners, eye centers, or eye corners, among others. Before the pandemic, many successful facial landmark localization approaches were developed using thousands of annotated face images [58], [152], [153]. However, the widespread usage of face masks to prevent virus transmission has brought new challenges to landmark localization and alignment, similarly to many other face-based algorithms. As it was not possible to collect and label a new dataset for the task, most methods prefer to use existing datasets, especially the JD-landmark dataset [154], and place virtual facial masks on the face images. This is because labeling facial landmark points on images with facial masks is hard, especially for the 68- or 106-point markups that are most commonly used in the literature. The authors of [47] propose MaskFan, a lightweight convolutional neural network that uses depthwise separable convolutions and group operations. They also propose a novel loss function, named Enhanced Wing loss, which gives less importance to errors made near facial masks. Facial landmark localization methods generally adopt L1 or L2 loss functions, which emphasize larger errors. Since predicting facial landmarks beneath facial masks is hard due to the invisible parts of the face, applying an L1 or L2 loss forces the model to pay most attention to these large errors that most heavily impact performance. In [155], Wen et al. also propose a new architecture for the masked facial landmark localization problem. Their model consists of three different neural networks designed for alignment, estimation, and refinement. They use downscaled face images in the alignment network and then align faces according to a reference pose. The estimation network then predicts 106 facial landmarks. Finally, the refinement model takes the non-masked region of the face, i.e., the eyes and eyebrows, and tries to generate more accurate predictions. Hu et al. [48] adopt multi-knowledge distillation and a pose-aware resampling strategy. They aim to increase data diversity by sampling images with different face poses.
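To illustrate the loss-design reasoning above, the sketch below implements the standard Wing loss with an optional per-landmark weighting that down-weights landmarks falling under the mask; this weighting scheme is our illustration of the idea and not the exact Enhanced Wing loss formulation of [47].

```python
# Standard Wing loss with optional per-landmark down-weighting of
# mask-covered points. The weighting values are illustrative.
import torch

def wing_loss(pred, target, w=10.0, eps=2.0, mask_weight=None):
    """pred, target: (N, L, 2) landmark tensors;
    mask_weight: (N, L) weights, e.g. <1 for landmarks under the face mask."""
    x = (pred - target).abs().sum(-1)             # per-landmark L1 error
    C = w - w * torch.log(torch.tensor(1.0 + w / eps))
    # Logarithmic regime for small errors, linear regime for large ones.
    loss = torch.where(x < w, w * torch.log(1.0 + x / eps), x - C)
    if mask_weight is not None:
        loss = loss * mask_weight
    return loss.mean()

pred, target = torch.randn(4, 68, 2), torch.randn(4, 68, 2)
weights = torch.ones(4, 68)
weights[:, 2:15] = 0.3   # illustrative: down-weight lower-face landmarks
print(wing_loss(pred, target, mask_weight=weights))
```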

All the presented works generate virtual masks and apply them to the JD-landmark dataset [154], while studies with real masks are still largely missing from the literature due to the obvious ground-truth issues associated with such work. Suggestions for evaluation strategies that only consider visible landmarks have also been made in prior publications [154].

3.6. Crowd detection and counting

One of the prevention measures to reduce the spread of the coronavirus disease is physical/social distancing in public areas (see Fig. 5), which can be monitored automatically using vision-based crowd counting techniques. Such techniques are able to count or estimate the number of people in a given area from a single image or a video acquired through surveillance cameras, CCTV or even drones. A plethora of research has been done over the past years on the crowd counting problem [166], [167], dealing with challenges such as mutual occlusions, non-uniform people density, varying scale, perspective, illumination, weather conditions, crowd size, and density, that can severely alter human appearance. In this section, we review crowd counting solutions with a focus on approaches developed to assist policy measures against the COVID-19 pandemic (see Table 1).

Fig. 5. Generic images of a crowded street before and during the pandemic, respectively. (© JordyMeow and Benzoyl).

Table 1. Summary of the reviewed crowd counting methods.

| Method | Year | Handles facial masks | Handles social distancing | Counting approach |
| --- | --- | --- | --- | --- |
| Al-Sa’d et al. [156] | 2022 | No | Yes | Detection-based CNN |
| Valencia et al. [157] | 2021 | No | Yes | Detection-based CNN |
| Somaldo et al. [158] | 2020 | No | Yes | Detection-based CNN |
| Nguyen et al. [159] | 2021 | No | No | Regression-based CNN |
| Almalki et al. [160] | 2021 | Yes | No | Detection-based CNN |
| He et al. [161] | 2022 | No | No | Attention-based CNN |
| Dosi et al. [162] | 2021 | No | No | Attention-based CNN |
| Alvarez et al. [163] | 2021 | No | Yes | Detection-based CNN |
| Jarraya et al. [164] | 2021 | No | No | Density-based CNN |
| Amin et al. [165] | 2021 | Yes | Yes | Detection-based CNN |
| Nguyen et al. [6] | 2021 | Yes | No | Detection-based CNN |

Early crowd counting methods mainly rely on object detection followed by counting. Usually, these methods first extract image features, such as shapelets [168], Histograms of Oriented Gradients (HOGs) [169], Haar wavelets [170] or other related descriptors, from the image and then combine the computed representations with various classification methods, such as Support Vector Machines (SVMs) [169], [170], regression forests [171] and the like, in order to detect people in images. These methods work well for detecting sparse (masked) faces, but perform poorly on dense crowds where individual people are not clearly visible.

To alleviate the above-mentioned problems, some approaches rely on direct-count regression. Examples of such methods [172] use handcrafted features and a regression technique, such as linear regression, to learn a mapping function between the features and the crowd count. These methods are able to accurately estimate people counts even in the presence of occlusions and background clutter, but they ignore spatial information. A solution to this problem is given by density estimation based methods [173], which learn a mapping between features in a local region and the corresponding object density maps, while integrating over the densities to obtain crowd counts. Approaches of this type generally use dot-annotated images for training, which are transformed into density functions using kernel density estimation [174].
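The density-map construction described above can be written compactly; the sketch below (a standard recipe, with an illustrative smoothing bandwidth) converts dot annotations into a Gaussian-smoothed density map whose sum equals the person count.

```python
# A minimal sketch of density-map-based counting: dot annotations (one point
# per head) become a Gaussian-smoothed density map that integrates (sums) to
# the crowd count. A counting CNN is then trained to regress such maps.
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(dots, height, width, sigma=4.0):
    """dots: list of (x, y) head annotations in pixel coordinates."""
    dm = np.zeros((height, width), dtype=np.float32)
    for x, y in dots:
        dm[int(y), int(x)] += 1.0
    return gaussian_filter(dm, sigma=sigma)  # spreads each dot, preserves mass

dots = [(50, 60), (52, 64), (200, 120)]
dm = density_map(dots, height=240, width=320)
print(round(float(dm.sum()), 2))  # ≈ 3.0, the number of annotated people
```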

With the advent of deep learning, many crowd counting approaches based on convolutional neural networks have been proposed [175]. Amin et al. [165], for example, proposed a solution that addresses both face mask detection and crowd counting. With this approach, a YOLO-based algorithm [176] is used to detect face masks, while the MobileNet single shot object detector [177] is used simultaneously for crowd counting. Kammoun-Jarraya et al. [164] introduced a CNN based technique for crowd counting from a single image for enforcing social distancing during the COVID-19 pandemic. The proposed model follows the structure of VGG-19 [178] with small kernel sizes in the convolutional layers but without the fully connected layers. Due to the fully convolutional structure, the model is able to process input images of arbitrary resolutions. The reported results on a new large-scale crowd counting dataset from Saudi public areas point to competitive performance compared to state-of-the-art methods. Alvarez et al. [163] developed software to monitor the physical distance and the crowd density of a specified area or a region of interest. The YOLOv3 model [59] was used in this work to detect humans in each video frame captured by a mobile phone. Physical distancing is monitored by computing the interpersonal distance between pairs of centroids of the detected bounding boxes, while the crowd density is computed by counting the number of people present in the region of interest. The software is able to detect 83% of physical distancing violations and 84% of crowd density violations. Dosi et al. [162] proposed a pipeline named Attentive EfficientNet (AECNet) for density estimation in crowd counting that makes use of an encoder-decoder-based architecture. In the encoder block, they use EfficientNet [70] and empirically show its superiority over other feature extraction architectures.
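The centroid-based distancing check described for [163] boils down to a pairwise distance test over detected bounding boxes, as in the following sketch; the pixel threshold is illustrative, since a deployed system would calibrate image distances to real-world units.

```python
# A sketch of centroid-based physical distancing monitoring: flag pairs of
# person detections whose box centroids are closer than a threshold.
from itertools import combinations
import math

def distancing_violations(boxes, min_dist_px=120.0):
    """boxes: list of (x1, y1, x2, y2) person detections in pixels."""
    centroids = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(centroids), 2)
            if math.dist(a, b) < min_dist_px]

boxes = [(10, 50, 60, 200), (70, 55, 120, 205), (400, 40, 450, 190)]
print(distancing_violations(boxes))  # -> [(0, 1)]: the two leftmost people
```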

To mitigate the problem of huge scale variations, He et al. [161] proposed a novel approach for crowd counting named Jointly Attention Network (JANet). They designed a Multi-order Scale Attention module to extract meaningful high-order statistics with abundant scale details and also introduced a Multi-pooling Relational Channel Attention module to investigate global-scope relations and structural semantics. Extensive experiments illustrated the superiority of the JANet approach. Almalki et al. [160] introduced an approach that detects, counts, and classifies the crowd’s masking condition and calculates a spatiotemporal safety index that can be used to assist effective policy decisions and relief plans against COVID-19. The approach uses YOLOv3 [59] to extract image features, with a classification layer added at the end of the YOLOv3 extractor that assigns each detected face to either the mask or the no-mask group. A unified system that allows the scale variation problem to be solved both directly and indirectly was described by Nguyen et al. in [159]. Here, the dense scale information is learned directly through the main network, which is designed with dense dilated convolution blocks and dense residual connections among the blocks. The scale information is further incorporated into the features indirectly by learning depth information from an auxiliary depth dataset.

Somaldo et al. [158] proposed a drone system capable of localization, navigation, people detection, crowd identification, and social-distancing warnings. For this purpose, they utilize YOLOv3 [59] to detect people and also define an adaptive social distancing detector. Valencia et al. [157] presented a desktop application that utilizes YOLOv4-tiny and the DeepSORT tracking algorithm [179] to monitor crowd counts and social distancing from a top-view camera perspective. A privacy-preserving adaptive social distance estimation and crowd monitoring solution for surveillance cameras was proposed by Al-Sa’d et al. [156]. The authors utilize OpenPose [180] to detect and localize people. Their approach is able to compute inter-personal distances in real-world coordinates, detect social distance infractions and identify overcrowded regions in a scene. The work presented in [6] investigated the effectiveness of different approaches to estimate the ratio of people wearing a mask within an observed crowd, a problem referred to by the authors as mask-wearing ratio estimation. Specifically, the authors compared detection-based and regression-based approaches to crowd counting, while also distinguishing between people with and without masks in the given crowd. Moreover, the authors extended the state-of-the-art RetinaFace detector [58] to better estimate the mask-wearing ratio. A large-scale dataset with more than 580,000 face annotations was also introduced to facilitate the experiments.
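The following minimal sketch illustrates the centroid-based distancing check and the mask-wearing ratio computation discussed above. It is our simplification: detections are assumed to be axis-aligned boxes (x1, y1, x2, y2), and a single pixels-per-metre calibration factor replaces the homography or pose-based geometry used by the cited systems.

```python
import itertools
import math

def centroid(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def distancing_violations(boxes, pixels_per_metre, min_distance_m=1.5):
    """Return index pairs of detections closer than the allowed distance."""
    violations = []
    for (i, a), (j, b) in itertools.combinations(enumerate(boxes), 2):
        (ax, ay), (bx, by) = centroid(a), centroid(b)
        dist_m = math.hypot(ax - bx, ay - by) / pixels_per_metre
        if dist_m < min_distance_m:
            violations.append((i, j))
    return violations

def mask_wearing_ratio(mask_labels):
    """Fraction of detected faces flagged as masked, as in the problem of [6]."""
    return sum(mask_labels) / max(len(mask_labels), 1)
```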

3.7. Breathing rate measurements

Clinical studies on patients with COVID-19 showed that the most common symptoms include fever as well as respiratory and digestive symptoms [181]. In order to identify breathing abnormalities, which can be a symptom of COVID-19, multiple research works suggested measuring the respiratory rate using wearable devices [182], non-contact radar signals [183] or thermal cameras [29].

Among these breathing rate measurement techniques, detecting breathing anomalies with thermal cameras represents a cheap and effective solution that can easily be implemented in practice, as many countries have already deployed thermal cameras to detect people with high fever at airports and in public buildings [184]. Following this line of research, Queiroz et al. [29] proposed to analyze the intensity of thermal images over time using deep learning techniques. The approach exploits the fact that the region covered by a facial mask gets warmer when exhaling, which can be detected through the analysis of pixel intensities in the thermal image. Conversely, a decrease in pixel intensity within the mask region indicates that the person is inhaling. To facilitate the research, the authors collected 33 videos of 11 subjects, with subjects breathing slowly, normally and fast. Their experimental results showed that breathing rate measurements can reach an accuracy of up to 91% on their dataset.
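The intensity-based idea can be summarized with the minimal sketch below, which is our illustration rather than the deep-learning pipeline of [29]: assuming the mask region has already been localized per frame, its mean intensity rises on exhale and falls on inhale, so peak counting on this 1-D signal yields breaths per minute.

```python
import numpy as np
from scipy.signal import find_peaks

def breathing_rate(mask_intensity, fps):
    """Estimate breaths/min from per-frame mean intensity of the mask region."""
    signal = np.asarray(mask_intensity, dtype=np.float32)
    signal = signal - signal.mean()              # remove the DC offset
    # Require at least 1 s between peaks, i.e. at most ~60 breaths/min.
    peaks, _ = find_peaks(signal, distance=fps)
    duration_min = len(signal) / fps / 60.0
    return len(peaks) / duration_min             # breaths per minute
```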

3.8. Face-hand interaction detection

To minimize the transmission of COVID-19, common advice issued by health organizations included limiting face-hand interaction. CVHA techniques were also developed to help monitor face-hand interaction in public spaces. A basic representation of the face-hand interaction detection task is presented in Fig. 6.

Fig. 6. Illustration of the face-hand interaction detection task. One of the most common pieces of advice given to prevent virus transmission is not to touch the face with the hands. During the COVID-19 pandemic, automatic detection of face-hand interaction has gained importance as a research topic in CVHA.

One of the initial studies by Beyan et al. [31] investigated the face-hand touching behavior of people. The authors first manually annotated 64 video recordings, originally collected for the analysis of social interactions within a small group of people, for face-hand touching interactions. Next, they evaluated rule-based, hand-crafted feature-based, and learned CNN feature-based models for their performance in face-hand touching detection and found that the CNN model yielded the best overall results with an F1-score of 83.76%. In a more recent study, Eyiokur et al. [30] explored the applicability of several well-known CNN models, such as ResNet [63] and EfficientNet [70], for face-hand interaction detection. Here, the authors first introduced an unconstrained face-hand interaction dataset, named ISL-Unconstrained Face Hand Interaction Dataset (ISL-UFHD), to advance face-hand interaction detection within a comprehensive COVID-19 prevention system, and then evaluated the considered classification models on the newly collected data. Experimental results showed that the highest classification accuracy of 93.35% was obtained with the EfficientNet-b2 model [70].
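A minimal fine-tuning sketch in the spirit of [30] is given below; torchvision's efficientnet_b2 (available in torchvision ≥ 0.13) is used as a stand-in backbone, and the tensors stand in for a real face-hand interaction dataset, so this is not the authors' exact training setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Replace the 1000-way ImageNet head with a binary interaction classifier.
model = models.efficientnet_b2(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# images: (B, 3, H, W) person crops; labels: 1 = face-hand interaction
images = torch.randn(8, 3, 288, 288)
labels = torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```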

Both of the studies discussed above proposed CVHA techniques that show promise for the face-hand interaction problem. However, important challenges, such as detection under extreme imaging conditions and varying poses, or in the presence of ambiguity caused by the different depth levels of the face and hand, still persist. Face-hand interaction detection is, therefore, still considered an open research problem that requires further investigation.

3.9. Synthetic data generation and mask removal

One of the main challenges in CVHA at the beginning of the COVID-19 pandemic was the obvious lack of suitable datasets needed to train various CVHA techniques. In response to this challenge, generative approaches were quickly adopted to build synthetic datasets, both to alleviate the need for collecting real-life masked face images and to enable data augmentation and generation methods for various tasks such as face recognition, identification, and landmark detection.

To artificially generate face images with masks, Anwar et al. [33] developed an open-source tool, MaskTheFace, that can effectively convert non-masked faces to masked faces. The tool uses the Dlib-based face landmark detector [152] to identify the face tilt and six key features, i.e., landmarks, on the face to properly fit a face mask. Alternatively, Wang et al. [42] presented another open-source toolbox, FaceX-Zoo, which implements a Facial Mask Adding (FMA-3D) method for adding a mask to a non-masked face image. Given a real masked face image I and a non-masked face image J, this method synthesizes a photo-realistic masked face image with the mask region coming from I and the facial area coming from J.
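The landmark-driven overlay idea can be sketched as follows; this is our simplified illustration, similar in spirit to MaskTheFace [33] but not its actual code: dlib landmarks outline the lower-face region, which is then filled with a flat "mask" polygon (a real tool would warp a textured mask template instead). The 68-point predictor file must be downloaded separately, and the input file name is hypothetical.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("face.jpg")  # hypothetical input
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # Jawline points 2-14 plus a nose-bridge point (28) roughly outline
    # the lower-face region that a surgical mask would cover.
    contour = np.array(pts[2:15] + [pts[28]], dtype=np.int32)
    cv2.fillPoly(image, [contour], color=(235, 235, 235))
```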

Encouraged by these initiatives, many studies have attempted to enrich existing datasets containing faces without masks, e.g., CelebA [185], CASIA-WebFace, LFW [143] and CALFW [186], with synthetically generated masked face images to enable further research on masked face recognition [43], [154], [187], [188], [189], [190]. For instance, Wang et al. [43] and Karasugi et al. [187] generated synthetic face mask datasets using Dlib’s landmark detector [152] to properly align face mask templates on faces, whereas Mare et al. [188] relied on SparkAR Studio, a developer program made by Facebook to create Instagram face filters, to create synthetic masks and overlay them on the faces in the original images. On the other hand, Xiang et al. [154] aimed to improve the accuracy and robustness of facial landmark localization on masked faces by introducing a new dataset with generated masks that vary widely in identity, head pose, facial expression, and occlusion, based on the FMA-3D method in the FaceX-Zoo toolbox [42].

Some studies tackled the opposite problem and focused on removing face masks from images [25], [44], [45], [46]. The idea behind these studies is to bring the data closer to real-world mask-free data, for which standard off-the-shelf CVHA models for different tasks are readily available. For instance, Din et al. [44] investigated a two-stage method for unmasking masked faces, where the first stage detects and segments masks with a modified version of U-Net [191] and the second stage deploys a GAN-based network with global and local discriminators for mask-area inpainting. Similarly, Li et al. [45] proposed a method combining a GAN and a texture network to first inpaint the face after removing the mask and then smooth out the texture to make the resulting face more realistic. Taking a step further, Coelho et al. [46] presented a generative approach for face mask removal using audio and appearance together. This approach estimates landmarks representing the mouth structure from the audio and feeds these landmarks into a GAN to reconstruct the full face image with a correctly shaped mouth. Hu et al. [25] described a method to generate faces with properly worn masks, either by simply overlaying the mask if no mask is worn or by first removing and then overlaying the mask if the mask is worn incorrectly. The method employs the Mean and Covariance Feature Matching GAN (MCGAN) [192] for the mask removal task and uses MaskTheFace [33].
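To illustrate the two-stage segment-then-inpaint idea of [44] without a trained GAN, the sketch below substitutes classical stand-ins: the segmentation map is assumed precomputed (in [44] a modified U-Net produces it), and OpenCV's Telea inpainting replaces the GAN-based generator. File names are hypothetical, and the visual quality of classical inpainting is far below that of the cited generative methods.

```python
import cv2

image = cv2.imread("masked_face.jpg")  # hypothetical input
# Stage 1 (assumed done): binary map, 255 inside the face-mask region.
mask_region = cv2.imread("mask_segmentation.png", cv2.IMREAD_GRAYSCALE)

# Stage 2: inpaint the segmented region to hallucinate the hidden face area.
unmasked = cv2.inpaint(image, mask_region, inpaintRadius=5,
                       flags=cv2.INPAINT_TELEA)
cv2.imwrite("unmasked_face.jpg", unmasked)
```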

4. Datasets

An unprecedented number of datasets targeting various CVHA tasks related to COVID-19 have been introduced over the last few years. While masked face datasets were already collected back in 2017 [56], the need for suitable larger-scale collections of masked face images increased significantly during the COVID-19 pandemic. As a result, novel (real and simulated) datasets were introduced (see Fig. 7) for masked face detection and recognition, face-hand interaction detection, (masked) crowd counting, as well as other related CVHA problems. In Table 2, we summarize the main COVID-19 related datasets and compare their characteristics.

Fig. 7. Illustrative example images from different datasets introduced for the development of CVHA techniques for the COVID-19 era.

Table 2. Summary of COVID-19 related datasets reviewed in this paper. BB: face bounding box, FM: face mask, FP: frontal to profile, MF: masked face, FO: face occlusion.

| Dataset name | Dataset source & availability | Mask types | Number of images | Head pose | Variation of subjects | Number of face mask classes | Purpose of data collection | Annotation type |
|---|---|---|---|---|---|---|---|---|
| MAFA [56] | Internet; publicly available | Real | 30,811 | Various | Medium diversity, mostly Asian people | Multiple | MF detection | BB coordinates, FM & occlusion classes |
| MAFA-FMD [193] | MAFA; not publicly available | Real | 56,084 | Various | Medium diversity, mostly Asian people | 3 | FM detection | BB coordinates, FM classes |
| FMLD [1] | MAFA, Wider Face; publicly available | Real | 63,072 | Various | High diversity | 3 | MF detection | BB coordinates, FM & occlusion classes |
| WearMask [194] | MAFA, Wider Face, Internet; publicly available | Real | 9,097 | Various | High diversity | 2 | MF detection | BB coordinates, FM classes |
| PWMFD [71] | MAFA, Wider Face, Internet; publicly available | Real | 9,205 | Various | High diversity | 3 | MF detection | BB coordinates, FM classes |
| ISL-UFMD [30] | Internet, CelebA-HQ, FFHQ; publicly available | Real | 20,891 | Various | High diversity | 3 | FM detection | FM classes |
| SMFRD [43] | Face recognition datasets; publicly available | Artificial | 500,000 | Various | Various | 2 | MF recognition | FM classes, subject IDs |
| MaskedFace-Net (CMFD, IMFD) [51] | FFHQ; publicly available | Artificial | 133,783 | Mostly frontal | High diversity | 2 | FM detection | FM classes, incorrect FM sub-classes |
| MS1MV2-Masked [93] | MS1MV2; publicly available | Artificial | 57.5 m | Various | High diversity | 6 | MF recognition | FM classes, subject IDs |
| RMFRD [43] | Internet; publicly available | Real | 92,671 | FP | Low diversity, Asian people | 2 | MF recognition | FM classes, subject IDs |
| Damer’s MFRD [32] | Collected by researchers; not publicly available | Real | 2,160 | FP | Low diversity | 2 | MF recognition | Subject IDs |
| Damer’s MFRD-Extended [82] | Collected by researchers; not publicly available | Real, artificial | 8,640 | FP | Low diversity | 2 | MF recognition | Subject IDs |
| MFR2 [33] | Internet; publicly available | Real | 269 | FP | High diversity | Multiple | MF recognition | FM classes, subject IDs |
| FaceMask [195] | Internet; publicly available | Real | 4,866 | Various | Various | 2 | MF detection | BB coordinates, FM classes |
| DS-IMF [196] | Collected by researchers; not publicly available | Real | 3,600 | Frontal | Low diversity, Indian people | 2 | MF detection & recognition | BB coordinates, subject IDs |
| Thermal-Mask [29] | SpeakingFaces; not publicly available | Artificial, thermal | 75,908 | FP | Low diversity | 2 | Thermal MF detection, breathing rate measurement | BB coordinates, FM classes |
| Thermal mask dataset [197] | Collected by researchers; not publicly available | Real, thermal | 7,920 | Various | Low diversity | 2 | Thermal MF detection | BB coordinates, FM classes |
| COVID-19 TFCD [198] | Collected by researchers; publicly available | Real, thermal | 261 | FP | Low diversity | 2 | Thermal MF detection | BB coordinates, FM classes |
| Kaggle853 [199] | Internet; publicly available | Real | 853 | Various | Medium diversity, mostly Asian | 3 | MF detection | BB coordinates, FM classes |
| Kaggle12k [200] | Internet; publicly available | Real | 12,000 | FP | High diversity | 2 | FM detection | FM classes |
| Kaggle FMLD [201] | StyleGAN2-generated; publicly available | Artificial | 20,000 | Frontal | High diversity | 2 | Not clarified | FM classes |
| WWMR-DB [202] | Collected by researchers; publicly available | Real | 1,222 | FP | Low diversity | Multiple | MF detection | BB coordinates, FM classes |
| Medical Mask Dataset (MMD) [203] | Public domain; publicly available | Real | 6,000 | Various | High diversity | 3 | MF detection | BB coordinates, FM & occlusion classes |
| AIZOOTech face mask dataset [204] | MAFA, Wider Face; publicly available | Real | 7,971 | Various | High diversity | 2 | MF detection | BB coordinates, FM classes |
| BAFMD [72] | Twitter; publicly available | Real | 6,264 | Various | High diversity | 2 | MF detection | BB coordinates, FM classes |
| ROF [81] | Google Image Search; publicly available | Real | 5,559 | Various | High diversity | 3 | MF recognition | FO classes, subject IDs |

In [56], the first large-scale masked face dataset, named MAFA, was published. The dataset contains 30,811 images of multiple persons with various head poses, face occlusions, and ethnicities, collected from the Internet. The MAFA dataset includes 35,806 masked face crops (there are multiple faces per image) with six annotated attributes: face, eye, and mask coordinates, head pose, occlusion degree, and four different mask types. The dataset is primarily intended for the development of face/mask detection models. However, it needs to be noted that some of the masks present in the data are not worn correctly, e.g., they do not cover the nose, so mask detection models developed on this dataset are generally considered less suitable for monitoring applications aimed at preventing the spread of COVID-19. To address this issue, some studies considered improper mask usage as an additional label for the facial images, i.e., next to the mask and no mask labels. Such labels help build systems that are more appropriate with respect to health-protective rules and more usable in real-world conditions. In [1], Batagelj et al. pointed out that the original annotations of MAFA [56] are not suitable for training detectors that distinguish between correctly and incorrectly worn masks. The authors, therefore, reannotated the MAFA images based on health-protective rules. In addition to MAFA [56], they also annotated the Wider Face dataset and released the generated annotations under the name Face Mask Label Dataset (FMLD). Thus, FMLD [1] includes images partitioned into three groups: 29,532 images with correctly worn masks, 1,528 images with incorrectly worn masks, and 32,012 images with mask-free faces. In addition to mask annotations, the FMLD also provides bounding box coordinates of faces as well as gender, ethnicity, and pose labels for each face. Wang et al. [194] utilized the same benchmark datasets, Wider Face [53] and MAFA [56], to build a serverless edge face detection tool. Their dataset included 4,065 images from MAFA, 3,894 images from Wider Face [53], and 1,138 additional images from the Internet, for a total of 17,532 face crops with corresponding bounding boxes. In [71], a new dataset called the Properly Wearing Masked Face Detection Dataset (PWMFD) was presented, consisting of 3,615 newly collected images, 2,581 relabeled images from MAFA [56], 2,951 images from Wider Face [53], and 58 images from RMFRD [43]. Similar to the previous studies, Jiang et al. [71] considered three classes for the labels of their dataset, i.e., correctly worn, incorrectly worn and mask-free. In total, there are 7,695 properly worn masked faces, 10,471 mask-free faces, and 366 incorrectly worn masked faces in PWMFD. In [193], a face detector was first applied to the MAFA dataset [56], and the generated face crops were then reannotated with respect to virus protection rules. This way, a new dataset, named MAFA-FMD, was collected, comprising 56,024 images belonging to the correct, incorrect, and no mask-wearing classes. Unfortunately, the MAFA-FMD dataset is not publicly available.

Eyiokur et al. [30] proposed an unconstrained masked face dataset, named ISL-Unconstrained Face Mask Dataset (ISL-UFMD), to study CVHA techniques for COVID-19. ISL-UFMD [30] contains 11,075 mask-free, 9,300 proper, and 513 improper mask images, collected from the Internet, YouTube videos, and well-known face datasets, such as CelebA-HQ [185] and LFW [143]. By relying on different resources during data collection, ISL-UFMD features highly diverse images captured in unconstrained conditions with variability across ethnicity, age, gender, head pose, and environmental settings. Furthermore, Eyiokur et al. [30] also presented the first unconstrained face-hand interaction dataset, named ISL-Unconstrained Face Hand Interaction Dataset (ISL-UFHD), to advance face-hand interaction detection with respect to COVID-19 protection rules. ISL-UFHD contains 10,018 samples with face-hand interaction and 20,038 without. Another related dataset, FaceMask, was described by Vrinkas et al. in [195] and contains 4,866 images of people with variations in gender, ethnicity, occlusion, and capture conditions, e.g., indoor/outdoor. Some of the faces are blurred, partially occluded or of low resolution due to the distance to the camera. There are 15,419 and 12,262 face crops that belong to the mask and no mask classes, respectively. Moreover, Kantarci et al. [72] proposed the Bias Aware Face Mask Detection (BAFMD) dataset, designed to minimize potential bias with respect to ethnicity, age, and gender. The dataset contains 6,264 images from Twitter with more than 16,000 facial bounding boxes, with and without facial masks.

The need for large-scale masked face datasets motivated researchers to also generate simulated images with artificial masks positioned on the face, as already discussed in Section 3.9. In [51], a large-scale simulated masked face dataset named MaskedFace-Net, which includes the Correctly worn Masked Face Dataset (CMFD) and Incorrectly worn Masked Face Dataset (IMFD) subsets, was presented. MaskedFace-Net was constructed from the Flickr-Faces-HQ (FFHQ) dataset [205] using a mask-to-face deformable model and contains 137,016 images in total. In [99], the popular large-scale MS1MV2 dataset [206], with 5.8M images of 85k subjects, was augmented with simulated face masks with a probability of 0.5. Similarly, in [93], a face mask simulated version of the MS1MV2 dataset [206] was utilized for training the presented masked face recognition system. To evaluate the proposed system, other well-known benchmarks for face verification, namely the IARPA Janus Benchmark-C (IJB-C) dataset [207] and LFW [143], were used to generate face images with synthetic face masks. Moreover, in [43], three novel datasets, named Masked Face Detection Dataset (MFDD), Real-world Masked Face Recognition Dataset (RMFRD), and Simulated Masked Face Recognition Dataset (SMFRD), were published to investigate face mask detection and face recognition performance in the case of occlusion due to face masks. For SMFRD, Wang et al. [43] generated 500,000 simulated masked face images of 10,000 subjects with an artificial mask generation tool.

In addition to the works that investigate prevention, monitoring and control CVHA techniques, e.g., for the detection and tracking of proper face mask usage, some studies also examined the effect of wearing face masks on the performance of face recognition systems. To facilitate this work, novel real-world and simulated masked face recognition datasets were introduced. Wang et al. [43], for example, described the Real-world Masked Face Recognition Dataset (RMFRD), which consists of 5,000 masked and 90,000 non-masked face images that belong to 525 celebrities. Although it is stated that the RMFRD dataset contains 5,000 face images with masks, there are only 2,203 such images in the publicly available version. Damer et al. [32] presented a database that consists of 2,160 images of 24 participants collected in three different sessions. In each session, three videos were collected from the participants: two of them containing faces with and without a face mask in daylight and the third one containing faces with masks in a different lighting condition. Session one was considered as a reference, and sessions two and three were considered as sources for the probe data. In their follow-up work [82], the same authors extended the initial dataset with an additional 24 participants and a new type of face images with simulated masks. In [33], the authors published a relatively small dataset, called Masked Faces in Real World for Face Recognition (MFR2), that contains 53 identities with an average of five images, with or without a face mask, per subject.

In [196], Mishra et al. focused on analyzing masked face detection, gender prediction, mask/no mask classification, and masked face recognition on images acquired from Indian subjects. They introduced the Dual Sensor Indian Masked Face Dataset (DS-IMF), which consists of 300 subjects with 300 mask-free and 1,500 masked images per class. The images were captured with a DSLR camera and a mobile phone. Moreover, Fang et al. [126] presented the novel Collaborative Real Mask Attack Database (CRMA) to investigate the effect of face masks on presentation attack detection. The CRMA dataset consists of 30% AM0 (unmasked face PA), 60% AM1 (masked face PA), and 10% AM2 (unmasked face PA with a real mask placed on the PA) images, covering three different presentation-attack configurations for analyzing both print and replay attacks. In [81], a new real-world dataset named Real World Occluded Faces (ROF), with 3,195 neutral images, 1,686 sunglasses images (upper-face occluded) and 678 masked images (lower-face occluded), was presented. The images in the ROF dataset belong to 180 different identities and are used to explore the effect of occlusions on face recognition performance.

Another notable group of works [29], [197], [198] focused on the collection of datasets for COVID-19 related applications using thermal imaging. Queiroz et al. [29], for example, utilized a large-scale multimodal dataset known as SpeakingFaces [208] that consists of thermal images as well as visual and audio streams. Since the original SpeakingFaces dataset [208] does not include faces with masks, Queiroz et al. [29] generated thermal masked faces by placing artificial masks over the mouth and nose area. After preprocessing, 42,460 thermal masked and 33,448 thermal mask-free faces were included in the final Thermal-Mask Dataset (TMD) [29]. In [197], Glowacka et al. collected 7,920 thermal images with four different cameras at various distances, capturing subjects with and without facial masks. In its final form, the dataset [197] includes 10,555 faces, as some of the recorded images contain multiple people. In [198], a small thermal mask dataset (COVID-19 TFCD), with 250 images belonging to 20 participants, was collected.

There are many online dataset repositories, such as Kaggle, IEEE DataPort, and GitHub, that allow researchers to publish their data collections. With growing interest in CVHA problems during the COVID-19 pandemic, several masked face datasets [199], [200], [201], [202], [203], [204] were published on these repositories. The dataset in [199] contains 853 images with 4,080 faces belonging to three mask classes (worn/not worn/improperly worn). In [200], around 12k face images belonging to two main classes, mask and no mask, were published. The dataset varies in terms of resolution, mask type, and subject diversity. Another dataset on Kaggle, named Face Mask Lite Dataset (Kaggle-FMLD) [201], contains 10,000 artificial face images generated with the StyleGAN2 architecture. By adding artificial masks to the generated faces, the authors created a simulated dataset to address the masked face recognition problem. The Ways to Wear a Mask or a Respirator (WWMR-DB) dataset [202], published on IEEE DataPort, consists of 1,222 images of a small number of people with eight different mask usage annotations. The Medical Mask Dataset (MMD) [203] and the AIZOOTech dataset [204] are publicly available masked face detection datasets and annotation efforts published by private companies.

5. Open issues & future challenges

Significant progress has been made over the last couple of years on the main challenges facing CVHA techniques in the COVID-19 era, but several open issues remain and need to be addressed in the future. Below, we discuss the topics that, in the authors’ opinion, are the most important.

5.1. Self-adaptation of CVHA techniques

A desirable ability of CVHA techniques is to detect new conditions and adapt to them. For example, before the COVID-19 era it was not very common to wear masks. Therefore, most pre-COVID-19 CVHA techniques were trained on datasets that contain few or no samples of people with masks. As a result, the performance of many existing techniques deteriorated severely once people started wearing face masks. As changes in human appearance can occur over time due to several factors, e.g., fashion trends and health concerns, it is necessary and important to have CVHA techniques that adapt themselves to the current conditions. One way of achieving this is to benefit from online continual learning approaches [209], [210], [211], [212], [213], [214] that have been utilized in computer vision research. Detecting unseen cases is critical, as changes in appearance are not always sudden. The adaptation of models to new concepts and conditions by learning from few data [215], [216], [217], [218] and using self-supervised features [219] are other essential points that need to be considered in future works.

5.2. Generalization and robustness

The current generation of CVHA techniques already exhibits remarkable performance across diverse data characteristics. However, in unconstrained scenarios, large appearance variability may still adversely affect their performance [18], [19]. This, for example, includes low-resolution masked inputs for tasks such as face landmarking, face detection and face recognition. Significant pose variations and additional occlusions, e.g., due to glasses, hats and scarves, also still have an adverse effect on performance, especially with masked facial images. In CVHA solutions involving crowds, novel ideas and powerful techniques are needed that can differentiate between masked and non-masked people in various environments and across a range of viewing angles [6]. Thus, there is an imminent need to further improve the generalization capabilities and, above all, the robustness of CVHA techniques aimed at combating COVID-19.

5.3. Availability of large-scale benchmarks

As discussed in Section 4, a considerable number of datasets, especially of masked faces, appeared in response to the needs induced by the COVID-19 pandemic. However, many of these datasets are small, not well curated and come without a well-defined experimental protocol and/or performance indicators. For CVHA problems such as facial landmarking and related tasks, ground-truth information is usually also not readily available. As a result, researchers often combine datasets for their experiments and define in-house protocols for experimentation. It is, therefore, difficult to objectively evaluate progress and assess the merits and deficits of the CVHA techniques proposed in the literature. Well-designed large-scale benchmarks with clear objectives and properly designed experimental protocols are critically needed to help advance the field further and provide a solid basis for research going forward.

5.4. Bias and fairness

Data-driven techniques that learn from labeled examples are today the most widely utilized solutions for various computer vision tasks. When such techniques are applied in automated decision-making systems that impact people’s lives, fairness and bias become critically important. As automated decisions need to be fair and equally accurate for all, regardless of race, gender, age and other demographic factors, it is paramount that CVHA techniques ensure unbiased performance for subjects with diverse demographic attributes [220]. The negative consequences of biased systems have, for example, made headlines for face recognition, prompting many of the largest software corporations, such as Microsoft, Amazon, and IBM, to reconsider their face recognition programs and policies [221], [222]. While several studies explored bias with standard CVHA techniques [223], [224], [225], [226], [227], this issue has seen far less attention with masked face images and the data characteristics induced by COVID-19 [228]. Therefore, studies are needed that help to better understand the behavior of CVHA techniques in terms of bias and fairness with masked face images, as well as targeted mitigation techniques that contribute towards fairer decisions for different CVHA tasks. Furthermore, as many of the existing datasets gathered for COVID-19 related CVHA techniques are not balanced across demographic groups, additional efforts are also required on the data collection and curation side to facilitate research into these topics.

5.5. Ethics and privacy

The ability to process face images behind masks and identify people raises certain questions about the surveillance capabilities enabled by such technologies. As with all ethics issues, a sound analysis should balance the potential benefits against the potential risks, and arrive at guidelines and recommendations that mitigate the risks while maximizing the benefits. The main risks of facial surveillance involve loss of privacy, especially in cases where privacy matters. Government surveillance, in particular during events critical of the government in question, is a major case in point, and there is widespread worry that the deployment of facial surveillance can jeopardize people’s rights of expression and can lead to prosecution and harm. This is also linked to legitimate uses of facial surveillance and analysis technology, such as public transport payments or health-related screening, which can then be extended into citizen surveillance, i.e., the “slippery slope” argument. Religious freedom, freedom of opinion and expression, and freedom of assembly and association are all fundamental human rights, and need to be protected. During the Umbrella Movement in Hong Kong, protesters used masks and other props to cover their faces to prevent the police from identifying them and singling out individuals for arrest. While there may be legal barriers preventing governments from targeting protesters, these can be sidestepped quickly. A China-based company, Hanwang, announced in 2020 that its facial recognition software could identify people with masks with 95% accuracy, as opposed to 99.5% for people without masks. When asked about the possibility of this software being used to identify protesters in Hong Kong, the company spokesperson said that this use case is known, but the market is too small. The company reports having about 200 clients in Beijing using the technology, including the police.

The second potential risk of enhanced facial surveillance capabilities (even with masked faces) is associated with data use by private companies. Since identifying potential customers and personalizing content for them is key to new marketing approaches, identity can easily be monetized. Privacy breaches in this sector can have drastic consequences. While personal data, including biometric data such as facial imagery, are regulated in certain parts of the world, e.g., by the GDPR in Europe, the Japanese Act on the Protection of Personal Information [229], or the California Consumer Privacy Act (CCPA) [230] and the Biometric Information Privacy Act (BIPA) [231] in the US, technological safeguards are also critically needed to address ethics and privacy concerns. Along these lines, biometric privacy-enhancing technologies designed specifically for masked faces and capable of hiding part of the information contained in the data may become more important going forward [222].

6. Conclusion

It is generally expected that COVID-19 will not simply disappear and will remain an issue for years to come. Consequently, novel computer vision techniques adapted to the societal developments and behavioral changes induced by prevention measures and health-related governmental policies will increasingly be needed. A significant amount of work has already been done to help prevent and control the spread of the disease and to facilitate normal operation of identity management schemes and other relevant infrastructure using vision-based methods. As discussed in this survey, a large part of this work focused on computer vision techniques for human analysis (CVHA), which analyze visual data related to faces and people during the COVID-19 era, e.g., in the presence of occlusions by face masks.

In this survey paper, we presented a comprehensive review of existing CVHA solutions for the COVID-19 era. Specifically, we discussed the main challenges introduced to CVHA problems by the pandemic, presented a high-level taxonomy of existing methods, elaborated on relevant datasets and described what we feel are the most important open issues that need to be addressed in the future. The consolidated information presented in this survey is expected to help researchers working on similar problems quickly get an overview of the work already done and the main challenges that require further research.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported in part by the ARRS Research Programme P2–0250 (B) “Metrology and Biometric Systems” and the additional funding provided for COVID-19 related research, as well as the bilateral ARRS-TUBITAK funded project Low Resolution Face Recognition (FaceLQ), TUBITAK project number 120N011. This research work has also been partially funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE, and the EU COST project GoodBrother (CA19121). The project on which this report is based was also partially funded by the Federal Ministry of Education and Research (BMBF) of Germany under grant number 01IS18040A.

Data availability

No data was used for the research described in the article.

References

  • 1. Batagelj B., Peer P., Štruc V., Dobrišek S. How to correctly detect face-masks for covid-19 from visual information? Appl. Sci. 2021;11(5):2070.
  • 2. Fischer E.P., Fischer M.C., Grass D., Henrion I., Warren W.S., Westman E. Low-cost measurement of face mask efficacy for filtering expelled droplets during speech. Sci. Adv. 2020;6(36). doi: 10.1126/sciadv.abd3083.
  • 3. Feng S., Shen C., Xia N., Song W., Fan M., Cowling B.J. Rational use of face masks in the covid-19 pandemic. Lancet Respir. Med. 2020;8(5):434–436. doi: 10.1016/S2213-2600(20)30134-X.
  • 4. Ulhaq A., Born J., Khan A., Gomes D.P.S., Chakraborty S., Paul M. Covid-19 control by computer vision approaches: A survey. IEEE Access. 2020;8:179437–179456. doi: 10.1109/ACCESS.2020.3027685.
  • 5. Bhargava A., Bansal A. Novel coronavirus (covid-19) diagnosis using computer vision and artificial intelligence techniques: a review. Multimed. Tools Appl. 2021;80(13):19931–19946. doi: 10.1007/s11042-021-10714-5.
  • 6. Nguyen K.-D., Nguyen H.H., Le T.-N., Yamagishi J., Echizen I. Effectiveness of detection-based and regression-based approaches for estimating mask-wearing ratio. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE; 2021. pp. 1–8.
  • 7. Petrović N., Kocić Ð. Smart technologies for covid-19 indoor monitoring. In: Viruses, Bacteria and Fungi in the Built Environment, 2022. pp. 251–272.
  • 8. Hussain S., Yu Y., Ayoub M., Khan A., Rehman R., Wahid J.A., Hou W. Iot and deep learning based approach for rapid screening and face mask detection for infection spread control of covid-19. Appl. Sci. 2021;11(8):3495.
  • 9. Tan W., Liu J. Application of face recognition in tracing covid-19 fever patients and close contacts. In: 19th IEEE International Conference on Machine Learning and Applications (ICMLA), 2020. pp. 1112–1116.
  • 10. Tan W., Liu J., Zhuo Y., Yao Q., Chen X., Wang W., Liu R., Fu Y. Fighting covid-19 with fever screening, face recognition and tracing. J. Phys: Conf. Ser. 2020;1634(1).
  • 11. Rezaei M., Azarmi M. Deepsocial: Social distancing monitoring and infection risk assessment in covid-19 pandemic. Appl. Sci. 2020;10(21):7514.
  • 12. Montero D., Nieto M., Leskovský P., Aginako N. Boosting masked face recognition with multi-task arcface. CoRR abs/2104.09874.
  • 13. Neto P.C., Boutros F., Pinto J.R., Saffari M., Damer N., Sequeira A.F., Cardoso J.S. My eyes are up here: Promoting focus on uncovered regions in masked face recognition. In: BIOSIG, vol. P-315 (LNI). Gesellschaft für Informatik e.V.; 2021. pp. 21–30.
  • 14. Widjaja J.T. Developing Trustworthy Covid-19 Computer Vision Systems, https://towardsdatascience.com/developing-trustworthy-covid-19-computer-vision-systems-c862767d0d50, accessed: 2022-08-18 (2021).
  • 15. Martinez A.M. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans. Pattern Anal. Mach. Intell. 2002;24(6):748–763.
  • 16. Štruc V., Dobrišek S., Pavešić N. Confidence weighted subspace projection techniques for robust face recognition in the presence of partial occlusions. In: 20th International Conference on Pattern Recognition (ICPR), 2010. pp. 1334–1338.
  • 17. Ekenel H.K., Stiefelhagen R. Why is facial occlusion a challenging problem? In: International Conference on Biometrics. Springer; 2009. pp. 299–308.
  • 18. Wang B., Zheng J., Chen C. A survey on masked facial detection methods and datasets for fighting against COVID-19. IEEE Trans. Artif. Intell. 2021;1(01):1–1.
  • 19. Alzu’bi A., Albalas F., Al-Hadhrami T., Younis L.B., Bashayreh A. Masked face recognition using deep learning: A review. Electronics. 2021;10(21):2666.
  • 20. Utomo Y., Kusuma G.P. Masked face recognition: Progress, dataset, and dataset generation. In: 3rd International Conference on Cybernetics and Intelligent System (ICORIS), 2021. pp. 1–4.
  • 21. Elbishlawi S., Abdelpakey M.H., Eltantawy A., Shehata M.S., Mohamed M.M. Deep learning-based crowd scene analysis survey. J. Imaging. 2020;6(9):95. doi: 10.3390/jimaging6090095.
  • 22. Shorten C., Khoshgoftaar T.M., Furht B. Deep learning applications for covid-19. J. Big Data. 2021;8(1):1–54. doi: 10.1186/s40537-020-00392-9.
  • 23. Tomás J., Rego A., Viciano-Tudela S., Lloret J. Incorrect facemask-wearing detection using convolutional neural networks with transfer learning. Healthcare. 2021;9(8):1050. doi: 10.3390/healthcare9081050.
  • 24. Razavi M., Alikhani H., Janfaza V., Sadeghi B., Alikhani E. An automatic system to monitor the physical distance and face mask wearing of construction workers in covid-19 pandemic. SN Comput. Sci. 2022;3(1):1–8. doi: 10.1007/s42979-021-00894-0.
  • 25. Hu Y., Li X. Covertheface: face covering monitoring and demonstrating using deep learning and statistical shape analysis. arXiv preprint arXiv:2108.10430.
  • 26. Petrović N., Kocić Ð. Iot-based system for covid-19 indoor safety monitoring (preprint). IcETRAN.
  • 27. Sathyamoorthy A.J., Patel U., Paul M., Savle Y., Manocha D. Covid surveillance robot: Monitoring social distancing constraints in indoor scenarios. Plos One. 2021;16(12). doi: 10.1371/journal.pone.0259713.
  • 28. Yang D., Yurtsever E., Renganathan V., Redmill K.A., Özgüner Ü. A vision-based social distancing and critical density detection system for covid-19. Sensors. 2021;21(13):4608. doi: 10.3390/s21134608.
  • 29. Queiroz L., Oliveira H., Yanushkevich S. Thermal-mask – a dataset for facial mask detection and breathing rate measurement. In: 2021 International Conference on Information and Digital Technologies (IDT). IEEE; 2021. pp. 142–151.
  • 30. Eyiokur F.I., Ekenel H.K., Waibel A. Unconstrained face mask and face-hand interaction datasets: building a computer vision system to help prevent the transmission of covid-19. SIViP. 2022:1–8. doi: 10.1007/s11760-022-02308-x.
  • 31. Beyan C., Bustreo M., Shahid M., Bailo G.L., Carissimi N., Del Bue A. Analysis of face-touching behavior in large scale social interaction dataset. In: ICMI, 2020. pp. 1–10.
  • 32. Damer N., Grebe J.H., Chen C., Boutros F., Kirchbuchner F., Kuijper A. The effect of wearing a mask on face recognition performance: an exploratory study. In: 2020 International Conference of the Biometrics Special Interest Group (BIOSIG). IEEE; 2020. pp. 1–6.
  • 33. Anwar A., Raychowdhury A. Masked face recognition for secure authentication. arXiv preprint arXiv:2008.11104.
  • 34. Deng J., Guo J., An X., Zhu Z., Zafeiriou S. Masked face recognition challenge: the insightface track report. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. pp. 1437–1444.
  • 35. Wang K., Wang S., Yang J., Wang X., Sun B., Li H., You Y. Mask aware network for masked face recognition in the wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. pp. 1456–1461.
  • 36. Zhang L., Sun L., Yu L., Dong X., Chen J., Cai W., Wang C., Ning X. Arface: attention-aware and regularization for face recognition with reinforcement learning. IEEE Trans. Biom. Behav. Identity Sci. 2021;4(1):30–42.
  • 37. Wang W., Zhao Z., Zhang H., Wang Z., Su F. Maskout: a data augmentation method for masked face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. pp. 1450–1455.
  • 38. Yolcu G., Öztel İ. A multi-task deep learning system for face detection and age group classification for masked faces. Sakarya Univ. J. Sci. 2021;25(6):1394–1407.
  • 39. Golwalkar R., Mehendale N. Age detection with face mask using deep learning and facemasknet-9. SSRN.
  • 40. Yang B., Wu J., Hattori G. Facial expression recognition with the advent of face masks. In: 19th International Conference on Mobile and Ubiquitous Multimedia, 2020. pp. 335–337.
  • 41. Abate A.F., Cimmino L., Mocanu B.-C., Narducci F., Pop F. The limitations for expression recognition in computer vision introduced by facial masks. Multimed. Tools Appl. 2022:1–15. doi: 10.1007/s11042-022-13559-8.
  • 42. Wang J., Liu Y., Hu Y., Shi H., Mei T. Facex-zoo: A pytorch toolbox for face recognition. 2021. pp. 1–8.
  • 43. Wang Z., et al. Masked face recognition dataset and application. arXiv preprint arXiv:2003.09093.
  • 44. Din N.U., Javed K., Bae S., Yi J. A novel gan-based network for unmasking of masked face. IEEE Access. 2020;8:44276–44287.
  • 45. Li X., Shao C., Zhou Y., Huang L. Face mask removal based on generative adversarial network and texture network. In: 2021 4th International Conference on Robotics, Control and Automation Engineering (RCAE). IEEE; 2021. pp. 86–89.
  • 46. Coelho L.E., Prates R., Schwartz W.R. A generative approach for face mask removal using audio and appearance. In: 2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE; 2021. pp. 239–246.
  • 47. Sha Y., Zhang J., Liu X., Wu Z., Shan S. Efficient face alignment network for masked face. In: 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE; 2021. pp. 1–6.
  • 48. Hu H., Wang C., Jiang T., Guo Z., Han Y. Robust and efficient facial landmark localization. In: 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE; 2021. pp. 1–7.
  • 49. World Health Organization. Advice on the use of masks in the context of covid-19: interim guidance, 5 June 2020. Tech. rep.; 2020.
  • 50. Joshi A.S., Joshi S.S., Kanahasabai G., Kapil R., Gupta S. Deep learning framework to detect face masks from video footage. In: 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN). IEEE; 2020. pp. 435–440.
  • 51. Cabani A., Hammoudi K., Benhabiles H., Melkemi M. Maskedface-net – a dataset of correctly/incorrectly masked face images in the context of covid-19. Smart Health. 2021;19. doi: 10.1016/j.smhl.2020.100144.
  • 52. Nagrath P., Jain R., Madan A., Arora R., Kataria P., Hemanth J. Ssdmnv2: A real time dnn-based face mask detection system using single shot multibox detector and mobilenetv2. Sustain. Cities Soc. 2021;66. doi: 10.1016/j.scs.2020.102692.
  • 53. Yang S., Luo P., Loy C.-C., Tang X. Wider face: A face detection benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pp. 5525–5533.
  • 54. Nieto-Rodríguez A., Mucientes M., Brea V.M. System for medical mask detection in the operating room through facial attributes. In: Paredes R., Cardoso J.S., Pardo X.M. (eds.), Pattern Recognition and Image Analysis. Springer International Publishing; Cham: 2015. pp. 138–145.
  • 55. Viola P., Jones M.J. Robust real-time face detection. Int. J. Comput. Vis. 2004;57(2):137–154.
  • 56. Ge S., Li J., Ye Q., Luo Z. Detecting masked faces in the wild with lle-cnns. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. pp. 2682–2690.
  • 57. Fan X., Jiang M. Retinafacemask: A single stage face mask detector for assisting control of the covid-19 pandemic. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE; 2021. pp. 832–837.
  • 58. Deng J., Guo J., Ververas E., Kotsia I., Zafeiriou S. Retinaface: Single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. pp. 5203–5212.
  • 59. Redmon J., Farhadi A. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767.
  • 60. Sandler M., Howard A., Zhu M., Zhmoginov A., Chen L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. pp. 4510–4520.
  • 61. Loey M., Manogaran G., Taha M.H.N., Khalifa N.E.M. Fighting against covid-19: A novel deep learning model based on yolo-v2 with resnet-50 for medical face mask detection. Sustain. Cities Soc. 2021;65. doi: 10.1016/j.scs.2020.102600.
  • 62. Redmon J., Farhadi A. Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. pp. 7263–7271.
  • 63. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pp. 770–778.
  • 64. Zhang K., Zhang Z., Li Z., Qiao Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016;23(10).
  • 65. Wang B., Zhao Y., Chen C.P. Hybrid transfer learning and broad learning system for wearing mask detection in the covid-19 era. IEEE Trans. Instrum. Meas. 2021;70:1–12. doi: 10.1109/TIM.2021.3069844.
  • 66. Ren S., He K., Girshick R., Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015;28:1–9. doi: 10.1109/TPAMI.2016.2577031.
  • 67. Szegedy C., Vanhoucke V., Ioffe S., Shlens J., Wojna Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. pp. 2818–2826.
  • 68. Roy B., Nandy S., Ghosh D., Dutta D., Biswas P., Das T. Moxa: A deep learning based unmanned approach for real-time monitoring of people wearing medical masks. Trans. Indian Natl. Acad. Eng. 2020;5(3):509–518. doi: 10.1007/s41403-020-00157-z.
  • 69. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-Y., Berg A.C. Ssd: Single shot multibox detector. In: European Conference on Computer Vision (ECCV). Springer; 2016. pp. 21–37.
  • 70. Tan M., Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning (ICML). PMLR; 2019. pp. 6105–6114.
  • 71. Jiang X., Gao T., Zhu Z., Zhao Y. Real-time face mask detection method based on yolov3. Electronics. 2021;10(7):837.
  • 72. Kantarcı A., Ofli F., Imran M., Ekenel H.K. Bias aware face mask detection dataset. arXiv preprint arXiv:2211.01207.
  • 73. Jocher G., et al. ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations (Apr. 2021). doi: 10.5281/zenodo.4679653.
  • 74. Qin B., Li D. Identifying facemask-wearing condition using image super-resolution with classification network to prevent covid-19. Sensors. 2020;20(18):5236. doi: 10.3390/s20185236.
  • 75. Kim J., Choi J., Yi J., Turk M.A. Effective representation using ICA for face recognition robust to local distortion and partial occlusion. IEEE Trans. Pattern Anal. Mach. Intell. 2005;27(12):1977–1981. doi: 10.1109/TPAMI.2005.242.
  • 76. Ou W., You X., Tao D., Zhang P., Tang Y., Zhu Z. Robust face recognition via occlusion dictionary learning. Pattern Recogn. 2014;47(4):1559–1572.
  • 77. Song L., Gong D., Li Z., Liu C., Liu W. Occlusion robust face recognition based on mask learning with pairwise differential siamese network. In: IEEE/CVF International Conference on Computer Vision (ICCV), 2019. pp. 773–782.
  • 78. Qiu H., Gong D., Li Z., Liu W., Tao D. End2end occluded face recognition by masking corrupted features. IEEE Trans. Pattern Anal. Mach. Intell. 2021:1–9. doi: 10.1109/TPAMI.2021.3098962.
  • 79. Neto P.C., Pinto J.R., Boutros F., Damer N., Sequeira A.F., Cardoso J.S. Beyond masks: On the generalization of masked face recognition models to occluded face recognition. IEEE Access. 2022;10:86222–86233.
  • 80. Neto P.C., Boutros F., Pinto J.R., Damer N., Sequeira A.F., Cardoso J.S., Bengherabi M., Bousnat A., Boucheta S., Hebbadj N., Erakin M.E., Demir U., Ekenel H.K., de Queiroz Vidal P.B., Menotti D. OCFR 2022: Competition on occluded face recognition from synthetically generated structure-aware occlusions. In: IJCB. IEEE; 2022.
  • 81. Erakın M.E., Demir U., Ekenel H.K. On recognizing occluded faces in the wild. In: 2021 International Conference of the Biometrics Special Interest Group (BIOSIG). IEEE; 2021. pp. 1–5.
  • 82. Damer N., Boutros F., Süßmilch M., Kirchbuchner F., Kuijper A. Extended evaluation of the effect of real and simulated masks on face recognition performance. IET Biom. 2021;10(5):548–561. doi: 10.1049/bme2.12044.
  • 83. Ngan M., Grother P., Hanaoka K. Ongoing face recognition vendor test (frvt) part 6b: Face recognition accuracy with face masks using post-covid-19 algorithms (2020). doi: 10.6028/NIST.IR.8331.
  • 84. Vemury A., Hasselgren J., Howard J., Sirotin Y. 2020 biometric rally results - face masks and face recognition performance, https://mdtf.org/Rally2020/Results (2020).
  • 85. Damer N., Boutros F., Süßmilch M., Fang M., Kirchbuchner F., Kuijper A. Masked face recognition: Human versus machine. IET Biom. 2022;11(5):512–528.
  • 86. Li Y., Liu S., Yang J., Yang M.-H. Generative face completion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. pp. 3911–3919.
  • 87. Song L., Cao J., Song L., Hu Y., He R. Geometry-aware face completion and editing. In: AAAI. AAAI Press; 2019. pp. 2506–2513.
  • 88. Han C., Wang J. Face image inpainting with evolutionary generators. IEEE Signal Process. Lett. 2021;28:190–193.
  • 89. Niu Z., Li H., Li Y., Mei Y., Yang J. An adaptive face image inpainting algorithm based on feature symmetry. Symmetry. 2020;12(2):190.
  • 90. Zhang X., Wang X., Shi C., Yan Z., Li X., Kong B., Lyu S., Zhu B., Lv J., Yin Y., Song Q., Wu X., Mumtaz I. De-gan: Domain embedded gan for high quality face image inpainting. Pattern Recogn. 2022;124.
  • 91. Jiang Y., Yang F., Bian Z., Lu C., Xia S. Mask removal: Face inpainting via attributes. Multimed. Tools Appl. 2022:1–13. doi: 10.1007/s11042-022-12912-1.
  • 92. Li C., Ge S., Hua Y., Liu H., Jin X. Occluded face recognition by identity-preserving inpainting. In: International Symposium on Artificial Intelligence and Robotics. Springer; 2018. pp. 427–437.
  • 93. Boutros F., Damer N., Kirchbuchner F., Kuijper A. Self-restrained triplet loss for accurate masked face recognition. Pattern Recogn. 2022;124. doi: 10.1016/j.patcog.2021.108473.
  • 94. Neto P.C., Boutros F., Pinto J.R., Damer N., Sequeira A.F., Cardoso J.S. Focusface: Multi-task contrastive learning for masked face recognition. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE; 2021. pp. 1–8.
  • 95. Schroff F., Kalenichenko D., Philbin J. Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015. pp. 815–823.
  • 96. Boutros F., Damer N., Kirchbuchner F., Kuijper A. Elasticface: Elastic margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022. pp. 1578–1587.
  • 97. Deng J., Guo J., Xue N., Zafeiriou S. Arcface: Additive angular margin loss for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. pp. 4690–4699.
  • 98. Deng H., Feng Z., Qian G., Lv X., Li H., Li G. Mfcosface: a masked-face recognition algorithm based on large margin cosine loss. Appl. Sci. 2021;11(16):7310.
  • 99. Huber M., Boutros F., Kirchbuchner F., Damer N. Mask-invariant face recognition through template-level knowledge distillation. In: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021). IEEE; 2021. pp. 1–8.
  • 100. Qian H., Zhang P., Ji S., Cao S., Xu Y. Improving representation consistency with pairwise loss for masked face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. pp. 1462–1467.
  • 101. Geng M., Peng P., Huang Y., Tian Y. Masked face recognition with generative data augmentation and domain constrained ranking. In: Proceedings of the 28th ACM International Conference on Multimedia, 2020. pp. 2246–2254.
  • 102. Hsu G.-S.J., Wu H.-Y., Tsai C.-H., Yanushkevich S., Gavrilova M.L. Masked face recognition from synthesis to reality. IEEE Access. 2022;10:37938–37952.
  • 103. Chang W., Tsai M., Lo S. Ressanet: a hybrid backbone of residual block and self-attention module for masked face recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021. pp. 1468–1476.
  • 104. Li Y., Guo K., Lu Y., Liu L. Cropping and attention based approach for masked face recognition. Appl. Intell. 2021;51(5):3012–3025. doi: 10.1007/s10489-020-02100-9.
  • 105. Nguyen H.M., Reddy N., Rattani A., Derakhshani R. VISOB 2.0 - the second international competition on mobile ocular biometric recognition. In: ICPR Workshops (8), vol. 12668 (Lect. Notes Comput. Sci.). Springer; 2020. pp. 200–208.
  • 106. Alonso-Fernandez F., Bigün J. A survey on periocular biometrics research. Pattern Recognit. Lett. 2016;82:92–105.
  • 107. Boutros F., Damer N., Raja K.B., Kirchbuchner F., Kuijper A. Template-driven knowledge distillation for compact and accurate periocular biometrics deep-learning models. Sensors. 2022;22(5):1921. doi: 10.3390/s22051921.
  • 108. Boutros F., Damer N., Raja K.B., Ramachandra R., Kirchbuchner F., Kuijper A. Iris and periocular biometrics for head mounted displays: Segmentation, recognition, and synthetic data generation. Image Vis. Comput. 2020;104.
  • 109. Dharanesh S., Rattani A. Post-covid-19 mask-aware face recognition system. In: 2021 IEEE International Symposium on Technologies for Homeland Security (HST), 2021. pp. 1–7.
  • 110. Ardiansyah, Liliana D.Y. Facial biometric identification in the masked face. In: 2021 13th International Conference on Information Communication Technology and System (ICTS), 2021. pp. 129–133.
  • 111.M. Junayed, A. Sadeghzadeh, M. Islam, Deep covariance feature and cnn-based end-to-end masked face recognition, in: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021, pp. 1–8.
  • 112.R. Sharma, A. Ross, Periocular biometrics and its relevance to partially masked faces: A survey, CoRR abs/2203.15203.
  • 113.Boutros F., Damer N., Kolf J.N., Raja K.B., Kirchbuchner F., Ramachandra R., Kuijper A., Fang P., Zhang C., Wang F., Montero D., Aginako N., Sierra B., Nieto M., Erakin M.E., Demir U., Ekenel H.K., Kataoka A., Ichikawa K., Kubo S., Zhang J., He M., Han D., Shan S., Grm K., Struc V., Seneviratne S., Kasthuriarachchi N., Rasnayaka S., Neto P.C., Sequeira A.F., Pinto J.R., Saffari M., Cardoso J.S. International IEEE Joint Conference on Biometrics, IJCB. IEEE; 2021. MFR 2021: Masked face recognition competition; pp. 1–10. [Google Scholar]
  • 114.International IEEE Joint Conference on Biometrics, IJCB 2021, Shenzhen, China, August 4–7, 2021, IEEE, 2021. 10.1109/IJCB52358.2021. [DOI]
  • 115.IEEE/CVF International Conference on Computer Vision Workshops, ICCVW, IEEE, 2021. 10.1109/ICCVW54120.2021. [DOI]
  • 116.ISO/IEC JTC1 SC37 Biometrics, ISO/IEC 30107-3:2017 Information technology — Biometric presentation attack detection — Part 3: Testing and reporting, International Organization for Standardization, 2017.
  • 117.Damer N., Saladie A.M., Zienert S., Wainakh Y., Terhörst P., Kirchbuchner F., Kuijper A. 2019 International Conference on Biometrics (ICB) IEEE; 2019. To detect or not to detect: The right faces to morph; pp. 1–8. [Google Scholar]
  • 118.N. Damer, C.A.F. López, M. Fang, N. Spiller, M.V. Pham, F. Boutros, Privacy-friendly synthetic data for the development of face morphing attack detectors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022.
  • 119.Drozdowski P., Grobarek S., Schurse J., Rathgeb C., Stockhardt F., Busch C. 9th IEEE International Workshop on Biometrics and Forensics, IWBF. IEEE; 2021. Makeup presentation attack potential revisited: Skills pay the bills; pp. 1–6. [Google Scholar]
  • 120.Damer N., Wainakh Y., Boller V., von den Berken S., Terhörst P., Braun A., Kuijper A. 9th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS. IEEE; 2018. Crazyfaces: Unassisted circumvention of watchlist face identification; pp. 1–9. [Google Scholar]
  • 121.Raghavendra R., Busch C. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Comput. Surv. 2017;50(1):8:1–8:37. [Google Scholar]
  • 122.Peng F., Qin L., Long M. Face presentation attack detection using guided scale texture. Multimed. Tools Appl. 2018;77(7):8883–8909. [Google Scholar]
  • 123.Damer N., Dimitrov K. In: Proceedings of the British Machine Vision Conference 2016, BMVC. Wilson R.C., Hancock E.R., Smith W.A.P., editors. BMVA Press; 2016. Practical view on face presentation attack detection. [Google Scholar]
  • 124.Raghavendra R., Raja K.B., Marcel S., Busch C. Sixth International Conference on Image Processing Theory, Tools and Applications, IPTA. IEEE; 2016. Face presentation attack detection across spectrum using time-frequency descriptors of maximal response in laplacian scale-space; pp. 1–6. [Google Scholar]
  • 125.Fang M., Damer N., Kirchbuchner F., Kuijper A. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) IEEE; 2022. Learnable multi-level frequency decomposition and hierarchical attention mechanism for generalized face presentation attack detection; pp. 1131–1140. [Google Scholar]
  • 126.Fang M., Damer N., Kirchbuchner F., Kuijper A. Real masks and spoof faces: On the masked face presentation attack detection. Pattern Recogn. 2022;123 doi: 10.1016/j.patcog.2021.108398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127.Fang M., Boutros F., Kuijper A., Damer N. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) IEEE; 2021. Partial attack supervision and regional weighted inference for masked face presentation attack detection; pp. 1–8. [Google Scholar]
  • 128.ISO/IEC JTC1 SC37 Biometrics, ISO/IEC 29794-1:2016 Information technology - Biometric sample quality - Part 1: Framework, International Organization for Standardization, 2016.
  • 129.Best-Rowden L., Jain A.K. Learning face image quality from human assessments. IEEE Trans. Inf. Forensics Secur. 2018;13(12):3064–3077. [Google Scholar]
  • 130.ISO/IEC JTC1 SC37 Biometrics, ISO/IEC 2382-37:2017 Information technology - Vocabulary - Part 37: Biometrics, International Organization for Standardization, 2017.
  • 131.Hernandez-Ortega J., Galbally J., Fiérrez J., Haraksim R., Beslay L. 2019 International Conference on Biometrics (ICB) IEEE; 2019. Faceqnet: Quality assessment for face recognition based on deep learning; pp. 1–8. [Google Scholar]
  • 132.F. Ou, X. Chen, R. Zhang, Y. Huang, S. Li, J. Li, Y. Li, L. Cao, Y. Wang, SDD-FIQA: unsupervised face image quality assessment with similarity distribution distance, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 7670–7679.
  • 133.P. Terhörst, J.N. Kolf, N. Damer, F. Kirchbuchner, A. Kuijper, SER-FIQ: unsupervised estimation of face image quality based on stochastic embedding robustness, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 5650–5659.
  • 134.Q. Meng, S. Zhao, Z. Huang, F. Zhou, Magface: A universal representation for face recognition and quality assessment, in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 14225–14234.
  • 135.F. Boutros, M. Fang, M. Klemt, B. Fu, N. Damer, CR-FIQA: face image quality assessment by learning sample relative classifiability, CoRR abs/2112.06592.
  • 136.Fu B., Chen C., Henniger O., Damer N. IEEE/CVF Winter Conference on Applications of Computer Vision, WACV. IEEE; 2022. A deep insight into measuring face image utility with general and face-specific image quality metrics; pp. 1121–1130. [Google Scholar]
  • 137.Terhörst P., Kolf J.N., Damer N., Kirchbuchner F., Kuijper A. IEEE International Joint Conference on Biometrics, IJCB. IEEE; 2020. Face quality estimation and its correlation to demographic and non-demographic bias in face recognition; pp. 1–11. [Google Scholar]
  • 138.Fu B., Kirchbuchner F., Damer N. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) IEEE; 2021. The effect of wearing a face mask on face image quality; pp. 1–8. [Google Scholar]
  • 139.Cohn J.F., Zlochower A.J., Lien J., Kanade T. Automated face analysis by feature point tracking has high concurrent validity with manual facs coding. Psychophysiology. 1999;36(1):35–43. doi: 10.1017/s0048577299971184. [DOI] [PubMed] [Google Scholar]
  • 140.Valstar M., Pantic M. 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW’06) IEEE; 2006. Fully automatic facial action unit detection and temporal analysis; p. 149. [Google Scholar]
  • 141.Zhi R., Liu M., Zhang D. A comprehensive survey on automatic facial action unit analysis. Vis. Comput. 2020;36(5):1067–1093. [Google Scholar]
  • 142.P. Ekman, W.V. Friesen, Facial action coding system, Environmental Psychology & Nonverbal Behavior (1978).
  • 143.G.B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: A database for studying face recognition in unconstrained environments, in: Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, 2008, pp. 1–11.
  • 144.P. Barros, A. Sciutti, I only have eyes for you: The impact of masks on convolutional-based facial expression recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1226–1231.
  • 145.Mollahosseini A., Hasani B., Mahoor M.H. Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 2017;10(1):18–31. [Google Scholar]
  • 146.Barros P., Churamani N., Sciutti A. The facechannel: a fast and furious deep neural network for facial expression recognition. SN Comput. Sci. 2020;1(6):1–10. doi: 10.1007/s42979-020-00325-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147.Yang B., Wu J., Hattori G. 2021 IEEE International Conference on Image Processing (ICIP) IEEE; 2021. Face mask aware robust facial expression recognition during the covid-19 pandemic; pp. 240–244. [Google Scholar]
  • 148.Wang K., Peng X., Yang J., Meng D., Qiao Y. Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans. Image Process. 2020;29:4057–4069. doi: 10.1109/TIP.2019.2956143. [DOI] [PubMed] [Google Scholar]
  • 149.Li Y., Zeng J., Shan S., Chen X. Occlusion aware facial expression recognition using cnn with attention mechanism. IEEE Trans. Image Process. 2018;28(5):2439–2450. doi: 10.1109/TIP.2018.2886767. [DOI] [PubMed] [Google Scholar]
  • 150.I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.C. Courville, Y. Bengio, Generative adversarial nets, in: NIPS, 2014.
  • 151.Z. Zhang, Y. Song, H. Qi, Age progression/regression by conditional adversarial autoencoder, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2017, pp. 5810–5818.
  • 152.King D.E. Dlib-ml: A machine learning toolkit. J. Mach. Learn. Res. 2009;10:1755–1758. [Google Scholar]
  • 153.S. Honari, P. Molchanov, S. Tyree, P. Vincent, C. Pal, J. Kautz, Improving landmark localization with semi-supervised learning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1–9.
  • 154.Xiang M., Liu Y., Liao T., Zhu X., Yang C., Liu W., Shi H. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) IEEE; 2021. The 3rd grand challenge of lightweight 106-point facial landmark localization on masked faces; pp. 1–6. [Google Scholar]
  • 155.Wen T., Ding Z., Yao Y., Ge Y., Qian X., et al. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) IEEE; 2021. Towards efficient masked-face alignment via cascaded regression; pp. 1–5. [Google Scholar]
  • 156.M. Al-Sa’d, S. Kiranyaz, I. Ahmad, C. Sundell, M. Vakkuri, M. Gabbouj, A Social Distance Estimation and Crowd Monitoring System for Surveillance Cameras, Sensors 22 (2). [DOI] [PMC free article] [PubMed]
  • 157.I.J.C. Valencia, E.P. Dadios, A.M. Fillone, J.C.V. Puno, R.G. Baldovino, R.K.C. Billones, Vision-based Crowd Counting and Social Distancing Monitoring using Tiny-YOLOv4 and DeepSORT, in: 2021 IEEE International Smart Cities Conference (ISC2), 2021, pp. 1–7.
  • 158.P. Somaldo, F.A. Ferdiansyah, G. Jati, W. Jatmiko, Developing Smart COVID-19 Social Distancing Surveillance Drone using YOLO Implemented in Robot Operating System simulation environment, in: 2020 IEEE 8th R10 Humanitarian Technology Conference (R10-HTC), 2020, pp. 1–6.
  • 159.M.-N. Nguyen, V.-H. Tran, T.-N. Huynh, Depth Embedded and Dense Dilated Convolutional Network for Crowd Density Estimation, in: 2021 International Conference on System Science and Engineering (ICSSE), 2021, pp. 221–225.
  • 160.K.J. Almalki, M. Mohzary, B.-Y. Choi, S. Song, Y. Chen, Mosaic: Modeling Safety Index in Crowd by Detecting Face Masks against COVID-19 and Beyond, in: 2021 IEEE International Smart Cities Conference (ISC2), 2021, pp. 1–7.
  • 161.Y. He, Y. Xia, Y. Wang, B. Yin, Jointly Attention Network for Crowd Counting, Neurocomputing.
  • 162.Dosi M., Thakral K., Mittal S., Vatsa M., Singh R. 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) IEEE; 2021. Aecnet: Attentive efficientnet for crowd counting; pp. 1–8. [Google Scholar]
  • 163.Alvarez L.M., Llanos M.J., Obrero J.R., Feliscuzo L.S. 2021 1st International Conference in Information and Computing Research (iCORE) IEEE; 2021. Physical distance and crowd monitoring system using yolov3; pp. 139–144. [Google Scholar]
  • 164.Kammoun Jarraya S., Alotibi M., Ali M. A Deep-CNN Crowd Counting Model for Enforcing Social Distancing during COVID19 Pandemic: Application to Saudi Arabia’s Public Places. Comput. Mater. Contin. 2021;66:1315–1328. [Google Scholar]
  • 165.P.N. Amin, S.S. Moghe, S.N. Prabhakar, C.M. Nehete, Deep Learning Based Face Mask Detection and Crowd Counting, in: 2021 6th International Conference for Convergence in Technology (I2CT), 2021, pp. 1–5.
  • 166.B. Li, H. Huang, Z. Ang, P. Liu, C. Liu, Approaches on crowd counting and density estimation: a review, Pattern Analysis and Applications 24.
  • 167.Wang Q., Gao J., Lin W., Li X. NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization. IEEE Trans. Pattern Anal. Mach. Intell. 2021;43(06):2141–2149. doi: 10.1109/TPAMI.2020.3013269. [DOI] [PubMed] [Google Scholar]
  • 168.Sabzmeydani P., Mori G. 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) IEEE; 2007. Detecting pedestrians by learning shapelet features; pp. 1–8. [Google Scholar]
  • 169.N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 886–893.
  • 170.Lin S.-F., Chen J.-Y., Chao H.-X. Estimation of number of people in crowded scenes using perspective transformation. IEEE Trans. Syst. Man Cybern. - Part A: Syst. Humans. 2001;31(6):645–654. [Google Scholar]
  • 171.L. Fiaschi, U. Koethe, R. Nair, F.A. Hamprecht, Learning to count with regression forest and structured labels, in: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), 2012, pp. 2685–2688.
  • 172.D. Ryan, S. Denman, C. Fookes, S. Sridharan, Crowd Counting Using Multiple Local Features, in: 2009 Digital Image Computing: Techniques and Applications, 2009, pp. 81–88.
  • 173.V. Pham, T. Kozakaya, O. Yamaguchi, R. Okada, COUNT Forest: CO-Voting Uncertain Number of Targets Using Random Forest for Crowd Density Estimation, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3253–3261.
  • 174.C. Arteta, V.S. Lempitsky, J.A. Noble, A. Zisserman, Interactive Object Counting, in: European Conference on Computer Vision (ECCV), 2014, pp. 504–518.
  • 175.Fan Z., Zhang H., Zhang Z., Lu G., Zhang Y., Wang Y. A survey of crowd counting and density estimation based on convolutional neural network. Neurocomputing. 2022;472:224–251. [Google Scholar]
  • 176.C. Li, R. Wang, J. Li, L. Fei, Face Detection Based on YOLOv3, Recent Trends in Intelligent Computing, Communication and Devices.
  • 177.A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861.
  • 178.K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556.
  • 179.Wojke N., Bewley A., Paulus D. 2017 IEEE international conference on image processing (ICIP) IEEE; 2017. Simple online and realtime tracking with a deep association metric; pp. 3645–3649. [Google Scholar]
  • 180.Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei, Y.A. Sheikh, OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, IEEE Trans. Pattern Anal. Mach. Intell. [DOI] [PubMed]
  • 181.L. Pan, M. Mu, P. Yang, Y. Sun, R. Wang, J. Yan, P. Li, B. Hu, J. Wang, C. Hu, et al., Clinical characteristics of covid-19 patients with digestive symptoms in hubei, china: a descriptive, cross-sectional, multicenter study, Am. J. Gastroenterol. 115. [DOI] [PMC free article] [PubMed]
  • 182.Natarajan A., Su H.-W., Heneghan C., Blunt L., O’Connor C., Niehaus L. Measurement of respiratory rate using wearable devices and applications to covid-19 detection. NPJ Digit. Med. 2021;4(1):1–10. doi: 10.1038/s41746-021-00493-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 183.A.T. Purnomo, D.-B. Lin, T. Adiprabowo, W.F. Hendria, Non-contact monitoring and classification of breathing pattern for the supervision of people infected by covid-19, Sensors 21 (9). [DOI] [PMC free article] [PubMed]
  • 184.Perpetuini D., Filippini C., Cardone D., Merla A. An overview of thermal infrared imaging-based screenings during pandemic emergencies. Int. J. Environ. Res. Public Health. 2021;18(6):3286. doi: 10.3390/ijerph18063286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185.Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3730–3738.
  • 186.T. Zheng, W. Deng, J. Hu, Cross-age lfw: A database for studying cross-age face recognition in unconstrained environments, arXiv preprint arXiv:1708.08197.
  • 187.Karasugi I.P.A., Williem. European Conference on Computer Vision Workshops (ECCVW) Springer; 2020. Face mask invariant end-to-end face recognition; pp. 261–276. [Google Scholar]
  • 188.T. Mare, G. Duta, M.-I. Georgescu, A. Sandru, B. Alexe, M. Popescu, R.T. Ionescu, A realistic approach to generate masked faces applied on two novel masked face recognition data sets, in: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks 2021), 2021, pp. 1–11.
  • 189.Huang B., Wang Z., Wang G., Jiang K., Zeng K., Han Z., Tian X., Yang Y. ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE; 2021. When face recognition meets occlusion: A new benchmark; pp. 4240–4244. [Google Scholar]
  • 190.C. Wang, H. Fang, Y. Zhong, W. Deng, Mlfw: A database for face recognition on masked faces, arXiv preprint arXiv:2109.05804.
  • 191.Ronneberger O., Fischer P., Brox T. International Conference on Medical image computing and computer-assisted intervention. Springer; 2015. U-net: Convolutional networks for biomedical image segmentation; pp. 234–241. [Google Scholar]
  • 192.Mroueh Y., Sercu T., Goel V. International conference on machine learning (ICML) PMLR; 2017. Mcgan: Mean and covariance feature matching gan; pp. 2527–2535. [Google Scholar]
  • 193.M. Jiang, X. Fan, Retinamask: a face mask detector, arXiv preprint arXiv:2005.03950.
  • 194.Z. Wang, P. Wang, P.C. Louis, L.E. Wheless, Y. Huo, Wearmask: Fast in-browser face mask detection with serverless edge computing for covid-19, arXiv preprint arXiv:2101.00784.
  • 195.Vrigkas M., Kourfalidou E.-A., Plissiti M.E., Nikou C. Facemask: A new image dataset for the automated identification of people wearing masks in the wild. Sensors. 2022;22(3):896. doi: 10.3390/s22030896. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 196.S. Mishra, P. Majumdar, M. Dosi, M. Vatsa, R. Singh, Dual sensor indian masked face dataset, in: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), 2021, pp. 1–8.
  • 197.Głowacka N., Rumiński J. Face with mask detection in thermal images using deep neural networks. Sensors. 2021;21(19):6387. doi: 10.3390/s21196387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 198.Ward R.J., Jjunju F.P.M., Kabenge I., Wanyenze R., Griffith E.J., Banadda N., Taylor S., Marshall A. Flunet: An ai-enabled influenza-like warning system. IEEE Sens. J. 2021;21(21):24740–24748. doi: 10.1109/JSEN.2021.3113467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 199.Face mask detection, https://www.kaggle.com/andrewmvd/face-mask-detection, accessed: August 8, 2022.
  • 200.Face mask detection 12k images dataset, https://www.kaggle.com/ashishjangra27/face-mask-12k-images-dataset, accessed: August 8, 2022.
  • 201.Face mask lite dataset, https://www.kaggle.com/prasoonkottarathil/face-mask-lite-dataset, accessed: August 8, 2022.
  • 202.A. Marceddu, R. Ferrero, B. Montrucchio, Ways to wear a mask or a respirator (wwmr-db) (2021). 10.21227/8atn-gn55. [DOI]
  • 203.Medical mask dataset, https://humansintheloop.org/resources/datasets/medical-mask-dataset/, accessed: August 8, 2022.
  • 204.Aizootech face mask detection dataset, https://github.com/AIZOOTech/FaceMaskDetection, accessed: August 8, 2022.
  • 205.T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4401–4410. [DOI] [PubMed]
  • 206.Guo Y., Zhang L., Hu Y., He X., Gao J. European conference on computer vision (ECCV) Springer; 2016. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition; pp. 87–102. [Google Scholar]
  • 207.Maze B., Adams J., Duncan J.A., Kalka N., Miller T., Otto C., Jain A.K., Niggel W.T., Anderson J., Cheney J., et al. 2018 international conference on biometrics (ICB) IEEE; 2018. Iarpa janus benchmark-c: Face dataset and protocol; pp. 158–165. [Google Scholar]
  • 208.Abdrakhmanova M., Kuzdeuov A., Jarju S., Khassanov Y., Lewis M., Varol H.A. Speakingfaces: A large-scale multimodal dataset of voice commands with visual and thermal video streams. Sensors. 2021;21(10):3465. doi: 10.3390/s21103465. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 209.Li Z., Hoiem D. Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017;40(12):2935–2947. doi: 10.1109/TPAMI.2017.2773081. [DOI] [PubMed] [Google Scholar]
  • 210.D. Lopez-Paz, M. Ranzato, Gradient episodic memory for continual learning, Adv. Neural Inf. Process. Syst. 30.
  • 211.H. Shin, J.K. Lee, J. Kim, J. Kim, Continual learning with deep generative replay, Adv. Neural Inf. Process. Syst. 30.
  • 212.R. Aljundi, K. Kelchtermans, T. Tuytelaars, Task-free continual learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11254–11263.
  • 213.Parisi G.I., Kemker R., Part J.L., Kanan C., Wermter S. Continual lifelong learning with neural networks: A review. Neural Netw. 2019;113:54–71. doi: 10.1016/j.neunet.2019.01.012. [DOI] [PubMed] [Google Scholar]
  • 214.G.M. Van de Ven, A.S. Tolias, Three scenarios for continual learning, arXiv preprint arXiv:1904.07734.
  • 215.O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al., Matching networks for one shot learning, Adv. Neural Inf. Process. Syst. 29.
  • 216.S. Gidaris, N. Komodakis, Dynamic few-shot visual learning without forgetting, in: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2018, pp. 4367–4375.
  • 217.H. Yin, P. Molchanov, J.M. Alvarez, Z. Li, A. Mallya, D. Hoiem, N.K. Jha, J. Kautz, Dreaming to distill: Data-free knowledge transfer via deepinversion, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8715–8724.
  • 218.C. Zhang, N. Song, G. Lin, Y. Zheng, P. Pan, Y. Xu, Few-shot incremental learning with continually evolved classifiers, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12455–12464.
  • 219.T. Ahmad, A.R. Dhamija, S. Cruz, R. Rabinowitz, C. Li, M. Jafarzadeh, T.E. Boult, Few-shot class incremental learning leveraging self-supervised features, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 3900–3910.
  • 220.Drozdowski P., Rathgeb C., Dantcheva A., Damer N., Busch C. Demographic bias in biometrics: A survey on an emerging challenge. IEEE Trans. Technol. Soc. 2020;1(2):89–103. [Google Scholar]
  • 221.R. Heilweil, Big tech companies back away from selling facial recognition to police. That’s progress., https://www.vox.com/recode/2020/6/10/21287194/amazon-microsoft-ibm-facial-recognition-moratorium-police, accessed: August 8, 2022 (2020).
  • 222.B. Meden, P. Rot, P. Terhörst, N. Damer, A. Kuijper, W.J. Scheirer, A. Ross, P. Peer, V. Štruc, Privacy–enhancing face biometrics: A comprehensive survey, IEEE Trans. Inf. Forensics Secur.
  • 223.A. Puc, V. Štruc, K. Grm, Analysis of race and gender bias in deep age estimation models, in: 2020 28th European Signal Processing Conference (EUSIPCO), 2021, pp. 830–834.
  • 224.J.P. Robinson, G. Livitz, Y. Henon, C. Qin, Y. Fu, S. Timoner, Face recognition: too bias, or not too bias?, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.
  • 225.Cavazos J.G., Phillips P.J., Castillo C.D., O’Toole A.J. Accuracy comparison across face recognition algorithms: Where are we on measuring race bias? IEEE Trans. Biom. Behav. Identity Sci. 2020;3(1):101–111. doi: 10.1109/TBIOM.2020.3027269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 226.Albiero V., Zhang K., King M.C., Bowyer K.W. Gendered differences in face recognition accuracy explained by hairstyles, makeup, and facial morphology. IEEE Trans. Inf. Forensics Secur. 2021;17:127–137. [Google Scholar]
  • 227.Babnik Z., Štruc V. 2022 30th European Signal Processing Conference (EUSIPCO) IEEE; 2022. Assessing bias in face image quality assessment; pp. 1037–1041. [Google Scholar]
  • 228.J. Yu, X. Hao, Z. Cui, P. He, T. Liu, Boosting fairness for masked face recognition, in: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021, pp. 1531–1540.
  • 229.Privacy measures of biometrics businesses, Technical report, NEC Technical Journal (2018).
  • 230.Bukaty P. IT Governance Ltd.; 2019. The California Consumer Privacy Act (CCPA): An Implementation Guide. [Google Scholar]
  • 231.740 ILCS 14/, Biometric Information Privacy Act (BIPA), Public Act 095-994, Illinois General Assembly (2008).

Data Availability Statement

No data was used for the research described in the article.

