Table 8.
Comparative analysis of techniques to mitigate modular (multiple) face identity threats
| Type/Focus | Ref. | Concept | Methodology Used | Dataset Used | Performance | Limitation |
|---|---|---|---|---|---|---|
| Multiple FIT | [112] | A visual-cue and head-motion-based method for sign language prediction using HMM under partial occlusion, together with facial expression recognition, is proposed. | FE-BN, PPCA, HMM, CLS-NN, Euclidean distance | Self-created video sequences with motion face features; facial expression dataset | The best RR of 74% is obtained with the Gold tracks, while Bayes tracks and KLT tracks yield 63% and 59%, respectively. | The face tracking tool cannot detect faces under fast motion and heavy occlusion. |
| | [60] | Feature extraction methods such as LBP, WLD, GJD, SIFT, and SURF for FR in an unconstrained environment are investigated. | LBP, WLD, SIFT, SURF, and GJD | Equinox and UCH Thermal FF DB | The WLD method outperforms the other methods. | GJD has a low RR in outdoor setups, indicating poor generalization. |
| | [171] | Unified and robust facial landmark-based methods for occlusion and head pose estimation, referred to as FLMD, HPE, and FDF, are proposed. | SIFT, regression | BU4DFE, BU, COFW, and Multi-PIE (with self-occlusion) databases | Average mean absolute error: HPE 4.4 (BU), FLMD 6.4 (COFW), FLMD 3.5 (Multi-PIE); overall normalized error 6.4 (COFW) | Cascading multiple processes creates a major generalization issue. |
| | [67] | A recurrent deep learning-based method is proposed to predict facial expression, age, and gender by mimicking human activity using visual effects. | CNN + RNN + spatiotemporal manifold | Facial expression, gender, and age dataset | 91.36% overall average accuracy | It does not support video-based face attribute recognition. |
| | [81] | Three significant issues are identified and countermeasures proposed: one image per subject, face occlusion, and varying facial expressions. | Distance matrix, thresholding, affine warping | ORL | NA | Combining occlusion with FER degrades the results drastically. |
| | [154] | A method to detect facial landmarks in 3D shape using feature- and model-based techniques is proposed. | Active Normal Model with CNN, 3D morphable model, cascade regression | BU-3DFE, Bosphorus, BU-4DFE | Overall mean error is 3.33% with a standard deviation of 2.08% | Model-free tracking and object localization remain practical problems. |
| | [142] | A SIFT feature extraction technique that captures regional gradient information, combined with SVM- and PCA-based classification of facial expressions, is proposed. | SIFT, SVM with PCA | Video-based CK database for FER; 15 volunteers under various illumination, poses, and facial moisture levels | Average recognition accuracies are 98.52% and 94.47% (no additional light sources) and 96.97% and 95.40% (two additional light sources), respectively; dry-face recognition outperforms wet faces. | The method is not suitable for pose variations above 15% in the yaw, pitch, and roll directions. |
| | [87] | A three-task method is proposed: detecting edge and color (histogram) features, enhancement via self-adaptive feature fusion, and updating the object model against drifting. | Fusion of color and edge orientation histograms with self-adaptive Bayesian estimation | Self-created dataset | Running times for tasks 1 and 3 are 61.6 and 109.9, respectively | Results are good only when the initial template is well defined. |
| | [140] | A deep orientation-based method with triplet loss is introduced to estimate four different tasks: voxels, occlusion invariance, 3D mesh, and landmarks. | Locality Preserving Projection, GAN, autoencoder | KinectFace, Bosphorus, and UMB-DB | The approach obtains 86.1%, 75.5%, 81.3%, and 83.9% accuracy for the voxel, occlusion, landmark, and 3D mesh tasks, respectively. | Noise and micro-expressions cannot be handled. |
| | [43] | An efficient and effective boosting-GAN method is proposed for large pose variations and corrupted regions (e.g., nose and eyes), with two considerations: occlusion is partial, incomplete, and patch-based; and an encoder-decoder network (coarse face synthesis) is combined with boosting (face generation) in an aggregation structure. | BoostGAN | Multi-PIE, LFW | RRs for single-point and random multi-point occlusion with 15% face pose variation are 99.48% and 99.45%, respectively | Occlusion by objects such as sunglasses, scarves, and masks is not considered. |
| | [167] | A CNN-based method named Region Attention Network (RAN) is proposed to identify occluded face regions under variations in pose and facial expression. | CNN (Region Attention Network) | AffectNet, SFEW, FERPlus, and RAF-DB | RAN accuracy: occlusion 83.63%; pose (30° and 45°) 82.23% and 80.40%, respectively; RAF-DB 86.90% | The biased loss calculation is more prone to errors. |
FIT - Facial Identity Threats; WLD - Weber Local Descriptor; GJD - Gabor Jet Descriptors; RR - Recognition Rate
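Several of the surveyed methods combine hand-crafted texture descriptors with simple distance-based matching, e.g., LBP features in [60] and Euclidean-distance classification in [112]. A minimal sketch of such a pipeline, assuming a basic 8-neighbor LBP and a toy gallery (the function names, window size, and gallery layout here are illustrative, not taken from the cited works):

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbor Local Binary Pattern histogram (illustrative sketch)."""
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    # Clockwise neighbor offsets starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the corresponding bit where the neighbor is >= the center pixel.
        codes += (neighbor >= center).astype(np.int64) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized 256-bin descriptor

def nearest_identity(probe, gallery):
    """Return the enrolled identity whose descriptor is closest in Euclidean distance."""
    return min(gallery, key=lambda name: np.linalg.norm(probe - gallery[name]))
```

In use, each enrolled subject's face image is reduced to one descriptor in `gallery`, and a probe image is assigned the identity of its nearest descriptor. Robustness to occlusion in practice typically requires block-wise LBP histograms rather than the single global histogram shown here.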