2022 Jun 10;82(2):1669–1748. doi: 10.1007/s11042-022-13248-6

Table 7.

Comparative analysis of techniques to reduce the impact of extrinsic factors

Type/Focus on	Ref.	Concept	Methodology Used	Dataset Used	Performance	Limitation
Pose	[23]	Thermal image-based method to recognize the face biometric using contour and morphology with the blood vessel network.	PCA, Bayesian Network	Synthesized DB for multi-pose (thermal face), UMD database	The best matching score is 83%	Fake vascular contours contribute to the matching process, giving poor results.
	[18]	A detailed review of recent methodologies and a taxonomy under varying face poses is presented.	Low level, motion, shape, 3D, CLM, CQF, AAM	AR, LPFW	CLM, CQF, and AAM show better results than other SOTA approaches	Not effective under heavy occlusion and varying illumination conditions.
	[183]	A deep approach based on contextually discriminative features and a structural loss function to detect various face poses.	CNN, Structural, contextual, Euclidean loss	LFW and Net, UMD face	For the AR database: mean error 3.26%, standard deviation 0.83%	Does not provide good results for yaw displacement.
	[133]	A functional regression solution to the least-squares problem is introduced to predict shape displacement.	iCCR algorithm, cascade regression, Monte-Carlo sampling	300-VW dataset	20 times faster, real face tracking compared with other recent approaches.	Not efficient under pose variance, illumination, and expression changes.
	[38]	A novel metric learning approach to reduce synthesized variation for a single training image. In addition, a multi-depth extended mode of the generic elastic model is developed to handle illumination variations.	3D multi-depth generic elastic model with extension (3D-EGEM), linear regression	Multi-PIE database	This method obtains an average accuracy of 99.3% on the Multi-PIE database.	It works on a single training image, which limits generalization for deep learning.
	[86]	An end-to-end pipeline-based AFFAIR method is proposed to achieve three tasks: learning a global transformation, identifying the face location, and merging local and global features to obtain robust attributes.	AFFAIR	CelebA, LFWA, MTFL	86.55% average accuracy across gender, smile, glasses, and pose	A fixed number of facial points is considered.
	[172]	A review of facial LMD approaches, covering holistic (global facial shape and appearance), CLM-based (local appearance), and regression-based (implicit capture of facial shape and appearance) methods.	Holistic, constrained local model (CLM), and regression-based methods.	BioID, AR, Extended Yale-B, FERET, CK/CK+, Multi-PIE, XM2VTSDB	Regression-based models show fast and efficient performance compared with the others.	Poor results under extreme head pose, occlusion, and strong illumination.
	[58]	A CNN-based DFN model is proposed to recognize faces under pose variation. Here, the DCL and ICL loss functions are implemented to reduce intra-class feature variation.	FE-DFN; loss functions DCL and ICL (displacement and identity consistency losses)	DFN, MF1, Face scrub dataset	Identification accuracy of DFN on MegaFace Challenge 1 is 82.11%	Shows poor results if the pose of the face is more than 60%.
	[50]	A coarse-to-fine method based on geometric projection and DL is proposed for face pose estimation (i.e., yaw, pitch, and roll).	CNN InceptionResNetV2, Geometric Projection	BIWI, Pointing’04, unconstrained DB AFLW	Classification results for the BIWI, Pointing’04, and AFLW datasets are 97.50%, 82.45%, and 93.25%, respectively	Errors in some extreme poses are large, resulting in big deviations.
Illumination	[186]	A novel method based on theoretical analysis is introduced to extract illumination-insensitive features (Gradient-faces) under uncontrolled and natural lighting conditions.	Histogram equalization, log-transform, low-curvature image simplifier, PCA, LDA, MSR, SQI, LTV, Gradient-faces	PIE DB (68 subjects), Yale-B (10 subjects), Outdoor DB (132 subjects)	RR for PIE DB, Yale B DB, and the outdoor (natural light) DB are 99.83% (68 subj), 98.96% (10 subj), and 95.61%, respectively.	Illuminance at each point is assumed to be smooth, so the method does not generalize to real practice.
	[21]	Intra-spectral and cross-spectral FR is investigated through SWIR, MWIR, and NIR standoff distances in controlled and uncontrolled scenarios.	FR using PCA, PCA + LDA, BIC, LBP and LTP, DoG	SWIR, MWIR, NIR	Identification rate: SWIR 100%, MWIR 90%, NIR 80%	Uncontrolled cross-spectral matching is the main challenge.
	[46]	An adaptive harmonic filtering-based method is proposed that uses filter stretching and Kirsch compass masks in all eight local directions to create illumination invariance.	Low-dimensional linear subspaces, HE, gamma intensity correction, self-quotient image (SQI), AH-ELDP	CMU-PIE, Yale B, Extended Yale B	RR of 99.45% (CMU-PIE), 96.67% (Yale B), and 84.42% (Extended Yale B) using a single image per subject.	Requires constructing a linear subspace and several sample images for training.
	[141]	SIFT and state-of-the-art FR methods are analysed based on their performance on hyperspectral images.	LBP, Gabor wavelets, HOG, SVM and SIFT	PolyU-HSFD, CMU-HSFD	The SIFT method outperforms other recent methods for illumination issues.	This method has generalization issues.
	[62]	A logarithmic high-frequency SVD method is proposed to generate faces using frequency interpretation. A local-region-based nearest neighbor method is deployed to combine discriminative weights (DWs) and Gaussian weights (GWs).	HF-SVD, AHFSVD, DWLNN, GWLNN, FLNN, H&LSVD, SQI, LTV, S&L-LTV, Log-DCT, LBP, TT, Gradient-face, Weber-face, MSLDE, bipolar sigmoid function	Yale B, CMU PIE, LFW, and self-built driver face databases.	Recognition rate (in %) on the Yale B face DB: best RR for DWLNN and GWLNN is 98.10 and 98.73; average RR is 99.97 for GWLNN and 99.94 for H&LSVD-GWLNN; on the driver face DB, average RR for GWLNN is 73.89	H&L-SVD is a complex illumination model; GWLNN is not good for unequal light in small regions.
	[176]	A novel, mathematically proven pixel-wise method, referred to as AWFGT, is proposed. The LBP feature is extracted from the Weber face to reduce the impact of illumination variation.	AWFGT, intensity transformation without blurring using gamma correction, LBP, k-NN, chi-square	Yale B, CMU-PIE	Recognition rate: Yale B 99.55%, CMU-PIE 96.63%	It performs pixel-wise operations, which are time-consuming.
LR	[108]	A fast, robust method based on appearance and geometric information is proposed to accurately detect low-resolution images using thermal images.	Haar features, Adaboost, Rotation invariant Gaussian distribution, LBP, BRIEF, and SURF	Thermal/visible dataset (X1-Collection) from UND, IRIS Face DB.	Automatic extraction from a 64×64-pixel thermal image with inter-pupil distance = 24. The BRIEF signature provides accurate and fast FR.	Problems such as pose variation remain unsolved by this method.
	[70]	A hallucination- and recognition-based method with SVD is proposed to handle low-resolution input faces.	PCA, SVD, ED, Simultaneous Face Hallucination for Verification/Identification (SHV/SHI).	LFW DB, AR	Average PSNR and SSIM: 22.72 and 0.6627 for the proposed SHV; 22.83 and 0.6685 for SHI	It is assumed that two similar faces can have the same local-pixel structure.
	[15]	ICA I (linear face images, original) and ICA II (noisy images, column vector) architectures are optimized to show the effectiveness of the model using five classifiers on five separate benchmark face datasets.	Log-ICA (I & II), LDA, SVM, K-NN, DT, RF	IRIS, FERET, CMU-PIE, USTC-NVIE, Yale, CK, JAFFE Dataset	Except Yale database, log-ICA-II and LDA achieve 59.3%; highest accuracy 89.33% (normal), 85.82% (thermal images).	This method is not suitable for occluded face images.
	[9]	A novel noise-robust SIFT feature descriptor is proposed. On two benchmark datasets, JAFFE and ORL, the proposed method shows remarkable performance over existing approaches for face recognition.	SIFT, Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), Euclidean distance	JAFFE and ORL face databases	The noise-robust SIFT technique obtained RR of 88.85% and 91.2% for the JAFFE and ORL DBs, respectively.	Pixel-wise operations are performed, so it is time-consuming.
	[28]	A preserved slack block-diagonal (SBD)-based method to show a dynamic target structure matrix is proposed. A noise-robust dictionary learning algorithm with two layers (i.e., Laplacian and Gaussian) is used with the SBD structure, denoted SBD²L.	SBD, SBD²L, VGG16	AR, Extended Yale B, CMU PIE, Labeled Faces	The SBD²L model achieves the highest RR (worst case is still as high as 60.9%) under different numbers of dictionary atoms.	If the number of dictionary atoms is too large, the recognition result will be low.
	[181]	A novel CNN-based technique to resolve the low-resolution problem is proposed, consisting of five-layer mappings with fourteen high-resolution face layers involving non-linear transformation.	DCNN (VGGNet), backpropagation, SGD (stochastic gradient descent) optimization	FERET, LFW, and MBGC datasets	FERET (6×6, 12×12): 81.4%, 92.1%; LFW (8×8): 76.3%; MBGC (12×12): 68.64%; overall 5% improvement in LR.	The performance of this model gradually degrades for very small images.
CB	[93]	Methods that can distinguish face images from sketches involving cluttered backgrounds, noise, and deformed images are investigated. A full CNN (i.e., pFCN) method consisting of two stages, first preprocessing and sketch synthesis and second feature extraction, is investigated.	pFCN, L1 loss function	Public face sketch DB, Cross DB, CUHK Face Sketch DB (CUFS), AR DB, XM2VTS DB	Average SSIM: L1-pFCN 61.78 (CUHK student dataset), RSLCR 56.10 (CUFS dataset), pFCN + RSLCR 48.04 (cross dataset)	A more complex background or heavy noise can affect the SSIM value.
	[127]	A large benchmark video dataset, the Extended Tripura University Video Dataset (ETUVD), covering complex atmospheric conditions for moving objects, is introduced.	Bayesian Strategy, Filtering, Histogram Equalization, Learning Strategy	Self-created video dataset ETUVD comprising 147 video clips (each 2–5 min long)	This dataset provides more efficient results over 26 other classification methods and 4 deep learning-based methods.	Weather degradation may affect the results.
	[168]	A Where-What Networks (WWNs)-based technique to simulate the information processing pathway is proposed, involving Synapse Maintenance (for background interference) and Neuron Regenesis (for improving the network) and considering size, type, and location simultaneously.	WWN-7 model with Hebbian learning rule, receptive fields, update rules, PCA	Simulated scenario with face images (LFW) of 5 types, 11 sizes, and 225 complex background locations.	With the two mechanisms: RR 0.9960, location error 0.9638, size error 1.0845.	TH angle of the faces and occlusion are not considered here. It has high computational complexity.
CO	[16]	A motion analysis-based optimized method with task-specific camera placement is discussed to enhance object images in unconstrained or dynamic environments.	PCA, Kalman filter (tracking), least-squares fitting	Self-created real-time videos from different camera angles	Maximum percentages for indoor, pedestrian and vehicle, pedestrian only, and vehicle only are 99, 92, 95, and 97, respectively	Only a simulation of the real-world environment is considered, which results in poor performance in real practice.
	[111]	Transformation-invariant adversarial light projections conducting real-time damage with feasibility assurance are analyzed. Experiments comprised a webcam and projector to conduct attacks (i.e., impersonation and obfuscation).	Multitask CNN-based FD and landmark estimation method for FaceNet, SphereFace, and commercial face; cosine distance metric, fusion function	Two open-source and one commercial FR DB (50 subjects in each case)	FaceNet and SphereFace are suitable for all white-box obfuscation attempts, while the black-box setting succeeded in 7 out of 10 attempts on commercial face.	The camera adjustment or viewpoint is highly correlated with the lighting condition.

LR - Low Resolution, CB - Cluttered Background, CO - Camera Orientation, DFN - Deformable FaceNet, AFFAIR - lAndmark Free Face AttrIbute pRediction, CQF - Convex Quadratic Fitting, AWFGT - Adaptive Weber face-based Gamma Transformation, MF1 - MegaFace Challenge 1
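
Several rows of Table 7 report performance as a recognition rate (RR) or as PSNR. As a minimal sketch of how these two figures are conventionally computed, the Python below uses the standard definitions. The function names `recognition_rate` and `psnr` and the nested-list image format are illustrative assumptions, not taken from any of the cited works.

```python
import math


def recognition_rate(predicted_ids, true_ids):
    """Percentage of probe images whose predicted identity matches the true one."""
    if len(predicted_ids) != len(true_ids) or not true_ids:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return 100.0 * correct / len(true_ids)


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an assumption made for
    this sketch); max_value is the largest possible intensity.
    """
    ref_flat = [p for row in reference for p in row]
    est_flat = [p for row in estimate for p in row]
    if len(ref_flat) != len(est_flat) or not ref_flat:
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(ref_flat, est_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under these definitions a higher RR or PSNR is better. Figures taken from different rows are comparable only when they were measured on the same dataset and protocol.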