Table 8.
Comparative analysis of techniques to mitigate modular (multiple) face identity threats
| Type/Focus | Ref. | Concept | Methodology Used | Dataset Used | Performance | Limitation |
|---|---|---|---|---|---|---|
| Multiple FIT | [112] | A visual-cue and head-motion-based method for sign language prediction using HMM under partial occlusion, together with facial expression recognition, is proposed. | FE-BN, PPCA, HMM, CLS-NN, Euclidean distance | Self-created video sequences with motion face features; facial expression dataset | The best RR of 74% is obtained with the Gold tracks, while Bayes tracks and KLT tracks yield 63% and 59%, respectively. | The face tracking tool cannot detect faces under fast motion and heavy occlusion. |
| | [60] | Feature extraction methods such as LBP, WLD, GJD, SIFT, and SURF for FR in an unconstrained environment are investigated. | LBP, WLD, SIFT, SURF, and GJD | Equinox and UCH Thermal FF DB | The WLD method outperforms the other methods. | GJD has a low RR in outdoor setups, indicating poor generalization. |
| | [171] | Unified and robust facial landmark-based methods for occlusion and head pose estimation, referred to as FLMD, HPE, and FDF, are proposed. | SIFT, regression | BU4DFE, BU, COFW, and Multi-PIE (with self-occlusion) databases | Average mean absolute error: HPE 4.4 (BU), FLMD 6.4 (COFW), FLMD 3.5 (Multi-PIE); overall normalized error 6.4 (COFW) | Cascading multiple processes creates a major generalization issue. |
| | [67] | A recurrent deep learning-based method is proposed to predict facial expression, age, and gender by mimicking human activity using visual effects. | CNN + RNN + spatiotemporal manifold | Facial expression, gender, and age dataset | 91.36% overall average accuracy | It does not support video-based face attribute recognition. |
| | [81] | Three significant issues are identified and countermeasures proposed: one image per subject, face occlusion, and varying facial expressions. | Distance matrix, thresholding, affine warping | ORL | NA | Combining occlusion with FER degrades the results drastically. |
| | [154] | A method to detect facial landmarks in 3D shape using feature- and model-based techniques is proposed. | Active Normal Model with CNN, 3D morphable model, cascade regression | BU-3DFE, Bosphorus, BU-4DFE | Overall mean error is 3.33% with a standard deviation of 2.08% | Model-free tracking and object localization remain practical problems. |
| | [142] | A SIFT feature extraction technique that captures regional gradient information, combined with SVM- and PCA-based classification of facial expressions, is proposed. | SIFT, SVM with PCA | Video-based CK database for FER; 15 volunteers under various illumination, poses, and facial moisture levels | Average recognition accuracies are 98.52% and 94.47% (no additional light sources) and 96.97% and 95.40% (two additional light sources), respectively; dry-face recognition outperforms wet faces. | The method is not suitable for pose variations above 15% in the yaw, pitch, and roll directions. |
| | [87] | A three-task method is proposed: detecting edge and color (histogram) features, enhancement via self-adaptive feature fusion, and updating the object model against drifting. | Fusion of color and edge orientation histograms with self-adaptive Bayesian estimation | Self-created dataset | Running times for tasks 1 and 3 are 61.6 and 109.9, respectively | Results are good only when the initial template is well defined. |
| | [140] | A deep orientation-based method with triplet loss is introduced to estimate four different tasks: voxels, occlusion invariance, 3D mesh, and landmarks. | Locality Preserving Projection, GAN, autoencoder | KinectFace, Bosphorus, and UMB-DB | The approach obtains 86.1%, 75.5%, 81.3%, and 83.9% accuracy for the voxel, occlusion, landmark, and 3D mesh tasks, respectively. | Noise and micro-expressions cannot be handled. |
| | [43] | An efficient and effective boosting-GAN method is proposed for large pose variations and corrupted regions (e.g., nose and eyes), with two considerations: occlusion is partial, incomplete, and patch-based; and an encoder-decoder network (coarse face synthesis) is combined with boosting (face generation) in an aggregation structure. | BoostGAN | Multi-PIE, LFW | RRs for single-point and random multi-point occlusion with 15% face pose variation are 99.48% and 99.45%, respectively | Occlusion by objects such as sunglasses, scarves, and masks is not considered. |
| | [167] | A CNN-based method named Region Attention Network (RAN) is proposed to identify occluded face regions under variations in pose and facial expression. | CNN (Region Attention Network) | AffectNet, SFEW, FERPlus, and RAF-DB | RAN accuracy: occlusion 83.63%; pose (30° and 45°) 82.23% and 80.40%, respectively; RAF-DB 86.90% | The biased loss calculation is more prone to errors. |
FIT - Facial Identity Threats; WLD - Weber Local Descriptor; GJD - Gabor Jet Descriptors; RR - Recognition Rate
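Several of the surveyed methods combine hand-crafted texture descriptors with simple distance-based matching, e.g., LBP features in [60] and Euclidean-distance classification in [112]. A minimal sketch of such a pipeline, assuming a basic 8-neighbor LBP and a toy gallery (the function names, window size, and gallery layout here are illustrative, not taken from the cited works):

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbor Local Binary Pattern histogram (illustrative sketch)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    # Clockwise neighbor offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the corresponding bit where the neighbor is >= the center pixel.
        codes += (neighbor >= center).astype(np.int64) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized 256-bin descriptor

def nearest_identity(probe, gallery):
    """Return the enrolled identity whose descriptor is closest in Euclidean distance."""
    return min(gallery, key=lambda name: np.linalg.norm(probe - gallery[name]))
```

In use, each enrolled subject's face image is reduced to one descriptor in `gallery`, and a probe image is assigned the identity of its nearest descriptor. Robustness to occlusion in practice typically requires block-wise LBP histograms rather than the single global histogram shown here.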