Table 7.
Comparative analysis of techniques to reduce the impact of extrinsic factors
| Type/Focus on | Ref. | Concept | Methodology Used | Dataset Used | Performance | Limitation |
|---|---|---|---|---|---|---|
| Pose | [23] | Thermal-image-based method to recognize face biometrics using contours and morphology of the blood vessel network. | PCA, Bayesian network | Synthesized DB for multi-pose (thermal face), UMD database | The best matching score is 83% | Fake vascular contours contribute to the matching process, leading to poor results |
| | [18] | A detailed review of recent methodologies and a taxonomy under varying face poses is presented. | Low level, motion, shape, 3D, CLM, CQF, AAM | AR, LPFW | CLM, CQF, and AAM show better results than other SOTA approaches | Not effective under heavy occlusion and varying illumination conditions. |
| | [183] | A deep approach based on contextually discriminative features and a structural loss function to detect various face poses. | CNN; structural, contextual, and Euclidean loss | LFW and Net, UMD face | For the AR database, mean error 3.26%, standard deviation 0.83% | Does not provide good results for yaw displacement. |
| | [133] | A functional regression solution to the least-squares problem is introduced to predict shape displacement. | iCCR algorithm, cascade regression, Monte-Carlo sampling | 300-VW dataset | 20 times faster real face tracking compared with other recent approaches. | Not efficient under variations in pose, illumination, and expression. |
| | [38] | A novel metric learning approach to reduce synthesized variation for a single training image. In addition, a multi-depth extended generic elastic model is developed to handle illumination variations. | 3D multi-depth generic elastic model with extension (3D-EGEM), linear regression | Multi-PIE database | Average accuracy of 99.3% on the Multi-PIE database. | Works on a single training image, which limits generalization to deep learning. |
| | [86] | An end-to-end pipeline-based AFFAIR method is proposed to achieve three tasks: learning a global transformation, identifying the face location, and merging local and global features to obtain robust attributes. | AFFAIR | CelebA, LFWA, MTFL | 86.55% average accuracy across gender, smile, glasses, and pose | Only a fixed number of facial points is considered. |
| | [172] | A review of facial LMD approaches, covering holistic (global facial shape and appearance), CLM-based (local appearance), and regression-based (implicit capture of facial shape and appearance) methods. | Holistic, constrained local model (CLM), and regression-based methods. | BioID, AR, Extended Yale-B, FERET, CK/CK+, Multi-PIE, XM2VTSDB | Regression-based models deliver fast and efficient performance compared with the others. | Poor results under extreme head pose, occlusion, and strong illumination. |
| | [58] | A CNN-based DFN model is proposed to recognize faces under pose variation. The DCL and ICL loss functions are implemented to reduce intra-class feature variation. | FE-DFN; loss functions DCL and ICL for displacement and identity consistency loss | DFN, MF1, Face scrub dataset | Identification accuracy of DFN on MegaFace Challenge 1 is 82.11% | Shows poor results if the pose of the face is more than 60%. |
| | [50] | A geometric projection and DL-based coarse-to-fine method is proposed for face pose estimation (i.e., yaw, pitch, and roll). | CNN InceptionResNetV2, geometric projection | BIWI, Pointing’04, unconstrained DB AFLW | Classification results for the BIWI, Pointing’04, and AFLW datasets are 97.50%, 82.45%, and 93.25%, respectively | Errors in some extreme poses are large, resulting in big deviations |
| Illumination | [186] | A novel method based on theoretical analysis is introduced to extract illumination-insensitive features using Gradient-faces under uncontrolled and natural lighting conditions. | Histogram equalization, log-transform, low-curvature image simplifier, PCA, LDA, MSR, SQI, LTV, Gradient-faces | PIE DB (68 subjects), Yale-B (10 subjects), Outdoor DB (132 subjects) | RR under outdoor and natural lighting conditions for the PIE DB, Yale B DB, and Outdoor DB are 99.83% (68 subjects), 98.96% (10 subjects), and 95.61%, respectively. | Illuminance at each point is assumed to be smooth, so the method does not generalize to real practice. |
| | [21] | Intra-spectral and cross-spectral FR is investigated through SWIR, MWIR, and NIR standoff distances in controlled and uncontrolled scenarios. | FR using PCA, PCA + LDA, BIC, LBP and LTP, DoG | SWIR, MWIR, NIR | Identification rate: SWIR 100%, MWIR 90%, NIR 80% | Uncontrolled cross-spectral matching is the main challenge |
| | [46] | An adaptive harmonic filtering-based method is proposed that uses filter stretching and Kirsch compass in all eight local directions to create illumination invariance. | Low-dimensional linear subspaces, HE, gamma intensity correction, self-quotient image (SQI), AH-ELDP | CMU-PIE, Yale B, Extended Yale B | RR of 99.45% (CMU-PIE), 96.67% (Yale B), and 84.42% (Extended Yale B) using a single image per subject. | Requires constructing a linear subspace and several sample images for training. |
| | [141] | SIFT and state-of-the-art FR methods are analysed based on their performance on hyperspectral images. | LBP, Gabor wavelets, HOG, SVM, and SIFT | PolyU-HSFD, CMU-HSFD | SIFT outperforms other recent methods on illumination issues. | This method has generalization issues. |
| | [62] | A logarithm high-frequency-based SVD method is proposed to generate faces using frequency interpretation. A local-region-based nearest neighbor method is deployed to combine discriminative weights (DWs) and Gaussian weights (GWs). | HF-SVD, AHFSVD, DWLNN, GWLNN, FLNN, H&LSVD, SQI, LTV, S&L-LTV, Log-DCT, LBP, TT, Gradient-face, Weber-face, MSLDE, bipolar sigmoid function | Yale B, CMU PIE, LFW, and self-built driver face databases. | Recognition rate (in %) on the Yale B face DB: best RR of 98.10 for DWLNN and 98.73 for GWLNN; average RR of 99.97 for GWLNN and 99.94 for H&LSVD-GWLNN; on the driver face DB, average RR of 73.89 for GWLNN | H&L-SVD is a complex illumination model; GWLNN does not handle unequal light in small regions well |
| | [176] | A novel, mathematically proven pixel-wise method referred to as AWFGT is proposed. The LBP feature is separated from the Weber face to reduce the impact of illumination variation. | AWFGT, intensity transformation without blurring using gamma correction, LBP, k-NN, chi-square | Yale B, CMU-PIE | Recognition rate: Yale B 99.55%, CMU-PIE 96.63% | It operates pixel-wise, which is more time-consuming. |
| LR | [108] | A fast, robust method based on appearance and geometric information is proposed to accurately detect low-resolution images using thermal images. | Haar features, AdaBoost, rotation-invariant Gaussian distribution, LBP, BRIEF, and SURF | Thermal/visible dataset (X1 Collection) from UND, IRIS Face DB. | Automatic extraction from a 64×64-pixel thermal image with inter-pupil distance = 24. The BRIEF signature provides accurate and fast FR | Problems such as pose variation remain unsolved by this method. |
| | [70] | A hallucination- and recognition-based method with SVD is proposed to handle low-resolution input faces. | PCA, SVD, ED, Simultaneous Face Hallucination for Verification/Identification (SHV/SHI). | LFW DB, AR | Average PSNR and SSIM: 22.72 and 0.6627 for the proposed SHV; 22.83 and 0.6685 for SHI | It is assumed that two similar faces can have the same local-pixel structure. |
| | [15] | The ICA I (linear face images, original) and ICA II (noisy images) (column vector) architectures are optimized to show the effectiveness of the model using five classifiers on five separate benchmark face datasets. | Log-ICA (I & II), LDA, SVM, K-NN, DT, RF | IRIS, FERET, CMU-PIE, USTC-NVIE, Yale, CK, JAFFE datasets | Except on the Yale database, log-ICA-II and LDA achieve 59.3%; highest accuracy is 89.33% for normal images and 85.82% for thermal images. | This method is not suitable for occluded face images. |
| | [9] | A novel noise-robust SIFT feature descriptor is proposed. On two benchmark datasets, JAFFE and ORL, the proposed method shows remarkable performance over existing face recognition approaches. | SIFT, Laplacian of Gaussian (LoG), Difference of Gaussian (DoG), Euclidean distance | JAFFE and ORL face databases | The noise-robust SIFT technique obtained RR of 88.85% and 91.2% for the JAFFE and ORL DBs, respectively. | It operates pixel-wise, so it is time-consuming |
| | [28] | A preserved slack block-diagonal (SBD)-based method is proposed to show a dynamic target structure matrix. The SBD structure utilizes a noise-robust dictionary learning algorithm with two layers (i.e., Laplacian and Gaussian), denoted SBD2L. | SBD, SBD2L, VGG16 | AR, Extended Yale B, CMU PIE, Labeled Faces | The SBD2L model achieves the highest RR (the worst case is still as high as 60.9%) under different numbers of dictionary atoms. | If the number of dictionary atoms is too large, the recognition result will be low. |
| | [181] | A novel CNN-based technique is proposed to resolve the low-resolution problem; it consists of five-layer mappings with fourteen high-resolution face layers involving non-linear transformation. | DCNN (VGGNet), backpropagation, optimization with SGD (stochastic gradient descent) | FERET, LFW, and MBGC datasets | FERET (6×6, 12×12): 81.4%, 92.1%; LFW (8×8): 76.3%; MBGC (12×12): 68.64%; overall 5% improvement in LR. | The performance of this model gradually degrades on very small images |
| CB | [93] | Methods that can distinguish face images from sketches involving cluttered backgrounds, noise, and deformed images are investigated. A full CNN (i.e., pFCN) method with two stages is investigated: first, preprocessing and sketch synthesis; second, feature extraction. | pFCN, L1 loss function | Public face sketch DB, Cross DB, CUHK Face Sketch DB (CUFS), AR DB, XM2VTS DB | Average SSIM: 61.78 for L1-pFCN (CUHK student dataset), 56.10 for RSLCR (CUFS dataset), and 48.04 for pFCN + RSLCR (cross dataset) | A more complex background or heavy noise can affect the SSIM value. |
| | [127] | A large benchmark video dataset named Extended Tripura University Video Dataset (ETUVD), covering complex atmospheric conditions for moving objects, is introduced. | Bayesian strategy, filtering, histogram equalization, learning strategy | Self-created video dataset ETUVD comprising 147 video clips (each 2-5 min long) | This dataset provides more efficient results over 26 other classification methods and 4 deep-learning-based methods. | Weather degradation may affect the results. |
| | [168] | A Where-What Networks (WWNs)-based technique is proposed to simulate the information processing pathway, involving Synapse Maintenance (for background interference) and Neuron Regenesis (for improving the network) while considering size, type, and location simultaneously. | WWN-7 model with Hebbian learning rule, receptive fields, update rules, PCA | Simulated scenario with face images (LFW) of 5 types, 11 sizes, and 225 complex background locations. | With the two mechanisms: RR 0.9960, location error 0.9638, size error 1.0845. | The TH angle of the faces and occlusion are not considered. It has high computational complexity. |
| CO | [16] | A motion-analysis-based optimized method with task-specific camera placement is discussed to enhance object images in unconstrained or dynamic environments. | PCA, Kalman filter (tracking), least-squares fitting | Self-created real-time videos from different camera angles | Maximum percentages for indoor, pedestrian and vehicle, pedestrian only, and vehicle only are 99%, 92%, 95%, and 97%, respectively | Only a simulation of the real-world environment is considered, which results in poor performance in real practice. |
| | [111] | Transformation-invariant adversarial light projections that conduct real-time damage with feasibility assurance are analyzed. Experiments used a webcam and a projector to conduct attacks (i.e., impersonation and obfuscation). | Multitask CNN-based FD and landmark estimation method for FaceNet, SphereFace, and commercial face; cosine distance metric; fusion function | Two open-source and one commercial FR DB (50 subjects in each case) | FaceNet and SphereFace are suitable for all white-box obfuscation attempts, while the black-box setting succeeded in 7 out of 10 attempts on the commercial face. | The camera adjustment or viewpoint is highly correlated with the lighting condition. |
LR: Low resolution; CB: Cluttered Background; CO: Camera Orientation; DFN: Deformable FaceNet; AFFAIR: lAndmark Free Face AttrIbute pRediction; CQF: Convex Quadratic Fitting; AWFGT: adaptive Weber face-based gamma transformation; MF1: MegaFace Challenge 1