Scientific Reports. 2025 Nov 25;15:43793. doi: 10.1038/s41598-025-27757-5

Intelligent multimodal 3D biometric recognition using PointNet++ for robust face–ear authentication

Veerpal Kaur 1, Devershi Pallavi Bhatt 1, Sumegh Tharewal 2, Pradeep Kumar Tiwari 3

Abstract

In real-world applications, the reliability of biometric recognition systems based on 2D modalities is typically reduced by sensitivity to changes in illumination, facial expression, and occlusion, among other factors. To overcome these problems, this research offers a multimodal biometric model that combines 3D face and 3D ear data to achieve reliable identity recognition. 3D biometrics offer more comprehensive structural information than 2D characteristics and are more resistant to environmental changes. The extracted 3D features are then used both to recognize individuals and to securely store the multimodal biometric data. Initially, pre-processing steps including cropping, normalization, hole filling, and spike removal are applied to the 3D data. Feature extraction is then performed using the PointNet++ model, a network based on Convolutional Neural Networks (CNNs) that processes point clouds directly. We used the Face Recognition Grand Challenge (FRGC) database for 3D face images and the University of Notre Dame (UND) Collection G database for 3D ear images in our experiments. Our tests show that the PointNet++ model achieves 99% accuracy for 3D face recognition and 98% for 3D ear recognition. With its 3D point cloud optimization and resilient architecture, PointNet++ achieves high accuracy on 3D faces and 3D ears by learning multi-scale features that capture both local and global information.

Keywords: Multimodal biometrics, 3D face recognition, 3D ear recognition, PointNet++, Point cloud feature extraction, Robust authentication

Subject terms: Computational biology and bioinformatics, Engineering, Mathematics and computing

Introduction

Biometrics, which analyses a person’s physical and behavioural attributes, is an important part of computer security. At present, it is among the safest and most effective ways to prove identity for physical security. With biometric-based identification, each person can be correctly identified from their unique physical or behavioural characteristics. Researchers have shown that physiological traits such as the face, ears, fingerprints, iris scans, and palm prints can serve as the basis for biometric systems1. Every time someone tries to log in, the biometric security system scans, evaluates, and compares their features against information that has already been stored. When a match is found, the person is granted access to the space or machine. Biometrics such as iris, fingerprint, palm print, and retinal scans are less socially accepted than facial biometrics because they are harder to acquire2. Unimodal biometric systems suffer from issues such as intra-class variation, noise in the sensed data, non-universality, inter-class similarity, and spoofing attacks. These issues can be addressed with multimodal biometrics, which combines different biometric methods and provides a range of data sources for strong identification.

Multimodal biometrics have been used in many fields, such as robotics, security, drowsiness and fatigue detection, and identifying criminals from video footage3. In image-based recognition, every pixel in a picture is transformed mathematically to characterize its appearance and exploit that information4. 3D face recognition is a biometric method used in many surveillance systems for purposes such as public records, security, intelligence, and authentication5. Two-dimensional (2D) biometrics are less accurate than three-dimensional (3D) biometrics; 3D biometrics also carry extra features and can effectively address problems with occlusion and lighting. 3D ears likewise overcome illumination issues and offer more features than 2D ears. Active or passive acquisition methods can be used with different 3D scanners to obtain 3D data6. A 3D face and a 3D ear offer more features than their 2D counterparts, even after accounting for occlusion and lighting7. Deep learning has emerged as a leading technique in biometric recognition, gaining substantial popularity alongside recent advances in artificial intelligence8. Both deep learning approaches and conventional methods produce good recognition accuracy, but the adaptability of multimodal biometric recognition based on 2D databases to changes in expression, illumination, and pose is insufficient9. This article proposes 3D face and 3D ear feature extraction, which is further used to recognize humans and secure biometric identities10. Our goal in this study is to use the PointNet++ deep learning architecture, which independently accepts point clouds as input for 3D faces and 3D ears. Various CNN models are available for 3D face recognition, including VGG, ResNet, and MobileNet, and many studies extract features with 2D CNNs by mapping 3D surfaces to 2D maps. In this article, however, we use PointNet++, which is based on CNNs, to learn features directly from 3D point clouds11.

The contributions of this work are summarised as follows:

  1. Utilisation of 3D Face and 3D Ear Biometrics: This research combines 3D face and 3D ear information to enhance identity recognition. The 3D face offers specific anatomical details, while the 3D ear offers additional information that is less influenced by facial expressions or the effects of ageing. When combined, they produce a biometric system that is more dependable and accurate.

  2. Use of PointNet++ for Feature Extraction: PointNet++ is utilised to extract features directly from 3D point cloud data, which improves 3D face and ear recognition by capturing detailed local and global features through a hierarchical approach, effectively handling variations in lighting, expression, and posture.

  3. Evaluation on Standard Datasets: To validate the effectiveness of the proposed model, it is evaluated on standard benchmark datasets. Collections F and G from the University of Notre Dame are utilised for the 3D ear evaluations, whereas FRGC is utilised for the 3D face investigations. The analysis of the results demonstrates that the multimodal method is both effective and robust.

The subsequent sections of this article are structured as follows: The section "Related work" addresses the Relevant Literature. The section "Proposed methodology" delineates the methods for extracting features from 3D faces and 3D ears. The section "Results and discussion" presents the results and discussions. The section "Conclusion and future work" concludes the research.

Related work

With the advancement of 3D sensors, more approaches for 3D face analysis have been proposed. For low-quality data, Mu et al.12 created a lightweight yet effective CNN to produce an accurate and efficient deep learning solution; at runtime it can operate at up to 136 frames per second on the Jetson TX2. Faltemier et al.13 extracted 38 sub-regions, some of which overlapped, from the image. Using an ICP algorithm, which minimizes the difference between two sets of points, each sub-region of the probe image is matched to a corresponding region in the gallery image. Single-region matching achieved a best rank-one identification rate of 90.2%, which encouraged the application of fusion techniques.

With the enhanced Borda count fusion method, the overall rank-one identification rate is 97.2%. This approach performs effectively even when the face information is incomplete in certain parts of the FRGC v2.0 dataset. Boodoo et al.14 developed a multi-biometric identification method using images of ears and faces. Principal Component Analysis (PCA) was used to extract the features, and the majority vote rule was then utilized to merge the extracted characteristics at the decision level. With merged features, this approach attained a 96% recognition rate. Yan et al.15 designed a biometric recognition system using 3D ear shapes; the technique segments the ear biometric and matches 3D ear shapes, detecting the ear pit with a contour technique. Lei et al.16 utilized Kernel Principal Component Analysis (KPCA) in conjunction with a novel 3D facial feature identification technique known as the Angular Radial Signature (ARS), which is derived from the semi-rigid regions of the face. To strengthen their discriminative capacity, mid-level characteristics are extracted from the ARSs, and the extracted features are combined into a single feature vector and input into a Support Vector Machine (SVM) for facial recognition; this technique addresses variations in facial expression across people. Wu J. et al.17 implemented an ear recognition system using the ICP algorithm and an edge-based methodology, obtaining a recognition rate of 98.8%. Dave et al.18 developed a 3D ear acquisition setup used to build one of the largest databases of 2D and 3D ears (IIT Indore) and applied 3D descriptors to the acquired data to obtain recognition performance. Rank-one authentication rates of 97.13% and 97.12% were obtained by matching the query and repository images using the Binary SHOT (B-SHOT) and SHOT descriptors, respectively, as 3D feature descriptors.

Feng et al.19 highlighted the limitations of 2D facial recognition and the benefits of 3D image processing. Their system uses a Deep Convolutional Neural Network (DCNN) with a softmax classifier, and combining the recognition-rate-maximization approach with fusion produces an excellent recognition rate. Huang et al.20 emphasized that most face recognition systems depend heavily on the feature representations offered by descriptors such as the Local Binary Pattern, whereas deep learning methods generate several complementary representations; the authors employ a convolutional deep belief network to extract features from high-resolution images. Sharma et al.21 proposed a facial recognition system with approximately 98% accuracy using a CNN model consisting of four layers: an input layer, a convolutional layer, a pooling layer, and a fully connected layer. 3D images at 96 × 96 resolution are recognized with this method, and their results show that the system requires twenty epochs to converge, accounting for training and testing rates. In many biometric and recognition systems, techniques for extracting features from 3D face and 3D ear models are essential. Table 1 shows a comparative analysis of the methods and approaches utilized for 3D face and 3D ear feature extraction on different datasets.

Table 1.

Comparative analysis of 3D face and 3D ear feature extraction methods.

| References | Author and Year | Biometric Trait | Database | Feature Extraction Technique | Accuracy |
| --- | --- | --- | --- | --- | --- |
| 22 | Dutta et al. 2020 | 3D Face | Frav3D / Bosphorus3D / Casia3D | SpPCANet | 96.93% / 98.54% / 88.80% |
| 23 | Wu, Z. et al. 2017 | 3D Face | Oulu-CASIA NIR | 3D CNN | 78.42% |
| 24 | Trimech et al. 2020 | 3D Face | BU-3DFE | DNN | 94.13% |
| 25 | Yu et al. 2020 | 3D Face | FRGC v2.0 / Bosphorus / BU-3DFE | DCNN | 97.6% / 98.4% / 98.8% |
| 26 | Jabberi et al. 2023 | 3D Face | LFPW / BU3DFE / FRAV3D | Deep 3D CNNs | 94.25% / 97.9% / 98.31% |
| 27 | Dutta et al. 2016 | 3D Face | Frav3D | LBP + HOG | 78.5% |
| 28 | Cai, Y. et al. 2019 | 3D Face | FRGC v2.0 | CNN | 99.75% |
| 29 | Prakash et al. 2014 | 3D Ear | UND Collection J2 (UND-J2) | GPA + ICP | 98.30% |
| 30 | Omara et al. 2016 | 3D Ear | USTB subset1 / IIT Delhi | Canny edge operator + geometric features | 98.33% / 99.60% |
| 31 | Islam et al. 2013 | 3D Face + 3D Ear | FRGC v2.0 / UND-J2 | ICP | 99.0% / 99.4% |
| 32 | Mursalin et al. 2021 | 3D Ear | UND F / UND G / UND J2 | PointNet++ | 89.58% / 90.09% / 89.18% |
| 33 | Cao et al. 2022 | 3D Face | Bosphorus / CASIA-3D | PointNet++ | 98.22% |

Proposed methodology

Images acquisition

For 3D face images, we utilized the FRGC database, and the Collection F and Collection G datasets from the University of Notre Dame were employed for 3D ear biometric analysis. The FRGC v2.0 dataset offers 2.5D range images of 3D faces captured with a structured-light scanner; the X, Y, and Z coordinate matrices for the visible facial surface are included in every sample. The PointNet++ network uses these coordinates to create 3D point clouds, even though the original data was acquired in 2.5D. For both 3D face and 3D ear, we used samples from 30 persons, with multiple samples per person. Minolta Vivid 910 scanners were utilized to obtain both datasets.

3D face dataset

The FRGC collection includes 557 people and is often recognized as the biggest 3D face dataset. It is commonly used as a standard reference database to assess how effectively 3D face recognition algorithms work. Images for the dataset were acquired during three periods: the spring of 2003, the fall of 2003, and the spring of 2004. 3D images were captured in environments with illumination suitable for the Vivid 900/910 sensor. The Minolta Vivid 900/910 is a high-quality structured-light sensor that generates 3D images with 640 × 480 range sampling. The test subjects either sat or stood around 1.5 m from the sensor34. Image capture was done with a Minolta Vivid 910 scanner to meet database standards. Data were collected from 466 participants in distinct emotional states, including surprise, anger, fear, disgust, and happiness. A range of 1 to 1.5 m was used under full-light conditions at 640 × 480-pixel resolution. This database is 72 GB in size.

3D ear dataset

For the 3D ear, two datasets are used: the F and G collections of UND. In the Collection F dataset, 942 3D profile (ear) images and corresponding 2D images of 302 human individuals were taken in 2003 and 2004. The images were captured with the Minolta Vivid 910 scanner, and 466 subjects’ data were recorded in two distinct positions. A 1 to 1.5 m measurement range was used under full-light conditions at 640 × 480-pixel resolution. The size of this dataset is 2.5 GB.

In the Collection G dataset, 738 3D profile images (with matching 2D ear images) of 235 human individuals were taken between 2003 and 2005. Using a Minolta Vivid 910 scanner, 466 participants’ data were acquired in two distinct postures. A range of 1 to 1.5 m was used under full-light illumination at 640 × 480 resolution. The database’s size is 2 GB.

Preprocessing

The acquired unprocessed 3D facial and 3D ear data cannot be used directly as input for feature extraction, because redundant information may be present, for instance the background, hair, and neck35. These details affect recognition accuracy, so the 3D data is pre-processed before feature extraction, as shown in Fig. 1. The range files in the FRGC and UND databases use an ASCII representation and carry the extension .abs. The header of an .abs file begins with the range image dimensions, with 480 rows and 640 columns stated in the first two lines36. The order of the data in the range file, denoted as (flag, X, Y, Z), is indicated in the third line. The remainder of the file consists of four raster-order blocks of space-separated data, each of the stated numbers of rows and columns:

$$R(i,j) = \big(f_{ij},\; x_{ij},\; y_{ij},\; z_{ij}\big), \quad 1 \le i \le 480,\ 1 \le j \le 640 \tag{1}$$

Fig. 1. Proposed Model for 3D Face and 3D Ear Feature Extraction.

where $f_{ij}$ is a flag and $(x_{ij}, y_{ij}, z_{ij})$ are the coordinates. The raw point set is extracted by keeping only valid coordinates, removing the invalid placeholder values (e.g., $-999999.0$) used in .abs files:

$$P = \big\{\, (x_{ij}, y_{ij}, z_{ij}) : f_{ij} = 1 \,\big\} \tag{2}$$

Using depth information, the face region and the ear region are extracted from the entire image. The subject faces the z-axis, and the sensor captures the frontal face and the side of the head to obtain the ear region37.
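As a concrete illustration, the following sketch parses an ASCII .abs range file into an N × 3 point cloud following the layout described above. It is a minimal reading of the format, not the authors' code; the invalid-depth placeholder value of -999999.0 is an assumption.

```python
import numpy as np

def read_abs(path, invalid=-999999.0):
    """Parse an ASCII .abs range file into an Nx3 point cloud (sketch).

    Assumes: two header lines with the row/column counts, a third line
    naming the data order (flag, X, Y, Z), then four raster-order blocks
    of space-separated values. `invalid` is an assumed placeholder value.
    """
    with open(path) as f:
        rows = int(f.readline().split()[0])        # e.g. "480 rows"
        cols = int(f.readline().split()[0])        # e.g. "640 columns"
        f.readline()                               # data-order line
        data = np.array(f.read().split(), dtype=np.float64)
    n = rows * cols
    flag, x, y, z = (data[i * n:(i + 1) * n] for i in range(4))
    valid = (flag == 1) & (z != invalid)           # Eq. (2): keep valid samples
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```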

We perform several pre-processing operations, such as hole filling, denoising, despiking, and cropping of the face region. Let $P = \{p_i\}_{i=1}^{n}$ with $p_i \in \mathbb{R}^3$. For each point $p_i$, compute the Euclidean distances to its $k$ nearest neighbours:

$$d_{ij} = \big\lVert p_i - q_j^{(i)} \big\rVert_2, \quad j = 1, \dots, k \tag{3}$$

where $q_j^{(i)}$ is the $j$-th nearest neighbour of $p_i$. The mean neighbour distance for point $p_i$ is then:

$$\bar{d}_i = \frac{1}{k} \sum_{j=1}^{k} d_{ij} \tag{4}$$

Compute global statistics over all $n$ points:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} \bar{d}_i, \qquad \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(\bar{d}_i - \mu\big)^2} \tag{5}$$

Points are kept if they satisfy the statistical rule:

$$P_{\text{clean}} = \big\{\, p_i : \bar{d}_i \le \mu + \alpha \sigma \,\big\} \tag{6}$$

where $\alpha$ is a threshold multiplier.
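A short sketch of this statistical spike filter is given below; the neighbourhood size k and the multiplier alpha are assumed hyperparameters, since their exact values are not stated here.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_spikes(points, k=8, alpha=2.0):
    """Statistical spike removal following Eqs. (3)-(6) (sketch).

    points: (n, 3) array; k and alpha are assumed hyperparameters.
    """
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)       # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)           # Eq. (4): mean neighbour distance
    mu, sigma = mean_d.mean(), mean_d.std()      # Eq. (5): global statistics
    return points[mean_d <= mu + alpha * sigma]  # Eq. (6): keep inliers only
```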

After spike removal, Poisson surface reconstruction is applied to the filtered samples, and the mesh vertices are taken as the hole-filled point set:

$$P_{\text{filled}} = V(\mathcal{M}_d) \tag{7}$$

where $V(\mathcal{M}_d)$ is the vertex set of mesh $\mathcal{M}_d$ and $d$ is the reconstruction depth. The 3D face region is cropped using a rectangular window centred at the nose tip, extending ±60 units along the x-axis and ±80 units along the y-axis. To crop the 3D ear region, a sector of 20 cm radius extending ±30° from the horizontal is created, using the nose tip P as the centre of the circle38.
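The two cropping rules can be expressed directly on the point cloud. The sketch below assumes millimetre units (so the 20 cm radius becomes 200 mm), takes the nose tip as the point of maximum depth as described in the Results section, and is illustrative rather than the authors' exact implementation.

```python
import numpy as np

def crop_face(points, dx=60.0, dy=80.0):
    """Rectangular face crop centred at the nose tip (±60 in x, ±80 in y).
    The nose tip is taken as the point with the largest Z value."""
    nose = points[np.argmax(points[:, 2])]
    keep = (np.abs(points[:, 0] - nose[0]) <= dx) & \
           (np.abs(points[:, 1] - nose[1]) <= dy)
    return points[keep]

def crop_ear(points, centre, radius=200.0, half_angle=30.0):
    """Circular-sector ear crop: radius 20 cm (200 mm, units assumed) and
    ±30° from the horizontal, measured around the given centre point."""
    d = points[:, :2] - centre[:2]
    r = np.linalg.norm(d, axis=1)
    ang = np.degrees(np.arctan2(d[:, 1], d[:, 0]))   # angle from horizontal
    return points[(r <= radius) & (np.abs(ang) <= half_angle)]
```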

Feature extraction

Deep neural networks are presently the most widely adopted methods for biometric recognition, and deep learning-based techniques provide several benefits over conventional image-processing approaches39. Conventional approaches rely on geometric information from 3D face data to identify robust feature points and descriptors, whereas deep neural networks trained on large datasets can build strong face and ear representations and greatly speed up face and ear recognition40. To extract features, CNNs with multiple network architectures are used; a 3D model is then built using feature points and curvature data and further standardized and refined to increase recognition accuracy41. Deep learning techniques, particularly PointNet and PointNet++, have emerged in recent years as powerful tools for learning hierarchical representations directly from raw data, opening new possibilities for more effective and economical 3D biometric feature extraction42. To extract features from 3D faces and 3D ears, we utilize PointNet++, which is based on CNNs. It extracts local and global features by interpreting every point as a separate input and applying shared Multi-Layer Perceptrons (MLPs); these features are then combined to create a global feature representation of all points in the cloud43. Figure 2 shows the architecture of PointNet++, which includes four main components.

Fig. 2. Architecture of PointNet++.

PointNet++ extends PointNet with a hierarchical neural network architecture for point cloud analysis. It is an innovative deep learning architecture especially well-suited to 3D geometric problems such as object recognition and segmentation, because it was created expressly for processing unstructured point cloud data44. A collection of 3D points illustrating the face and ear geometry is the input to PointNet++’s 3D face and 3D ear feature extraction. From these points, the architecture learns to extract discriminative features that capture global face and ear structure, such as symmetry and overall face shape, as well as local features, such as eye position and nose shape.

Input point cloud

A three-dimensional point cloud depicting the face and ear geometry is the input to PointNet++. The collection of points is represented by $P = \{p_1, p_2, \dots, p_N\}$, where $N$ is the total number of points. Some preprocessing operations are performed on the raw point clouds before feature extraction:

$$P = \{\, p_i \in \mathbb{R}^3 \,\}_{i=1}^{N} \tag{8}$$

Data augmentation

Geometric augmentation is applied before normalization to improve feature resilience and decrease overfitting54.

First, we apply a random rotation around the z-axis:

$$p_i' = R_z(\theta)\, p_i, \quad \theta \sim \mathcal{U}(0, 2\pi) \tag{9}$$

Gaussian noise is then added to increase robustness:

$$p_i'' = p_i' + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I_3) \tag{10}$$

The point cloud is then centred by subtracting the mean:

$$\hat{p}_i = p_i'' - \frac{1}{N} \sum_{j=1}^{N} p_j'' \tag{11}$$

After this, the cloud is scaled to fit within the unit sphere:

$$\tilde{p}_i = \frac{\hat{p}_i}{\max_j \lVert \hat{p}_j \rVert_2} \tag{12}$$

Finally, points are uniformly sampled at random from the normalised set, with $N = 1024$:

$$P' = \{\tilde{p}_{i_1}, \tilde{p}_{i_2}, \dots, \tilde{p}_{i_N}\}, \quad N = 1024 \tag{13}$$

Data augmentation techniques such as random rotation around the z-axis (Eq. 9) and Gaussian noise addition (Eq. 10) represent real-world variations in sensor acquisition, head posture, and measurement noise. By introducing these controlled variations, the PointNet++ model is exposed to a broader range of possible point cloud configurations, which improves its generalization capability and reduces the risk of overfitting.
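The full augmentation-and-normalisation chain of Eqs. (9)–(13) can be written compactly as below; the noise standard deviation is an assumed value, as it is not reported here.

```python
import numpy as np

def augment_and_normalise(points, n_points=1024, noise_sigma=0.01):
    """Eqs. (9)-(13): z-rotation, Gaussian jitter, centring, unit-sphere
    scaling, and random sampling of 1024 points (sketch)."""
    theta = np.random.uniform(0.0, 2.0 * np.pi)      # Eq. (9): random z-rotation
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pts = points @ Rz.T
    pts = pts + np.random.normal(0.0, noise_sigma, pts.shape)  # Eq. (10)
    pts -= pts.mean(axis=0)                          # Eq. (11): centre at origin
    pts /= np.linalg.norm(pts, axis=1).max()         # Eq. (12): unit sphere
    idx = np.random.choice(len(pts), n_points,       # Eq. (13): N = 1024 points
                           replace=len(pts) < n_points)
    return pts[idx]
```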

PointNet set abstraction (downsampling layer)

PointNet++ uses a hierarchical set abstraction module to downsample the input point cloud and extract features. The set abstraction module operates on point groups to aggregate their local features. To obtain features that represent various input scales, a subset of points is chosen and processed at each level of abstraction. The module is implemented as MLPs applied to each point and its local neighbourhood. To extract features, input point representations that capture both local and global information must be learned45. The feature vector collected for point $p_i$ is denoted $f_i$. Hierarchical representations of the input are captured by combining the features collected at the various levels of abstraction46:

$$F = \big[\, f^{(1)}, f^{(2)}, \dots, f^{(L)} \,\big] \tag{14}$$

From the input point cloud (Eq. 8), a subset of representative centroids is first selected by Farthest Point Sampling (FPS):

$$C = \{c_1, c_2, \dots, c_M\} = \mathrm{FPS}(P') \tag{15}$$

For every centroid $c_j$, neighbouring points are grouped within radius $r$ using a ball query:

$$\mathcal{N}(c_j) = \big\{\, p_i \in P' : \lVert p_i - c_j \rVert_2 \le r \,\big\} \tag{16}$$

The local coordinates of all neighbours are calculated as:

$$\hat{p}_i = p_i - c_j, \quad p_i \in \mathcal{N}(c_j) \tag{17}$$

To extract features, several neural network layers are applied, with the $l$-th layer represented as:

$$f_i^{(l)} = \mathrm{MLP}_{W_l}\big( f_i^{(l-1)} \oplus \hat{p}_i \big) \tag{18}$$

where $W_l$ denotes the weight matrix of the $l$-th layer, $\oplus$ denotes the concatenation of feature vectors, $\mathrm{MLP}$ represents the multi-layer perceptron, and $\mathcal{N}(p_i)$ denotes the local neighbourhood of the point $p_i$. Finally, the features are aggregated via a symmetric max-pooling operation:

$$f_{c_j} = \max_{p_i \in \mathcal{N}(c_j)} f_i^{(L)} \tag{19}$$
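To make the set-abstraction step concrete, the sketch below implements one level of Eqs. (15)–(19) in PyTorch: greedy farthest point sampling, a ball query, local-coordinate translation, a shared point-wise MLP, and symmetric max pooling. It is a didactic, single-cloud version, not the optimized PointNet++ CUDA implementation.

```python
import torch
from torch import nn

def farthest_point_sampling(xyz, m):
    """Eq. (15): greedily select m well-spread centroid indices from (n, 3)."""
    n = xyz.shape[0]
    idx = torch.zeros(m, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    idx[0] = torch.randint(n, (1,)).item()
    for i in range(1, m):
        dist = torch.minimum(dist, (xyz - xyz[idx[i - 1]]).norm(dim=1))
        idx[i] = dist.argmax()
    return idx

def set_abstraction(xyz, feats, m, radius, mlp):
    """One set-abstraction level (Eqs. 15-19): FPS, ball query, local
    coordinates, shared MLP, and symmetric max pooling."""
    centroids = xyz[farthest_point_sampling(xyz, m)]           # Eq. (15)
    pooled = []
    for c in centroids:
        mask = (xyz - c).norm(dim=1) <= radius                 # Eq. (16)
        local = xyz[mask] - c                                  # Eq. (17)
        x = local if feats is None else torch.cat([local, feats[mask]], dim=1)
        pooled.append(mlp(x).max(dim=0).values)                # Eqs. (18)-(19)
    return centroids, torch.stack(pooled)

# Example: 2048 input points abstracted to 512 centroids with 128-d features.
mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
centroids, features = set_abstraction(torch.randn(2048, 3), None,
                                      m=512, radius=0.2, mlp=mlp)
```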

Feature propagation and aggregation

Features from various levels of abstraction are combined by the feature propagation (FP) module and propagated back to the original points. To improve the representations, features from higher-level sampled points are mixed with the original point features. This layer propagates and combines features learned at various scales throughout the point cloud47, ensuring that feature representations contain both local and global information. PointNet++ collects local features for each point and aggregates them to create a global representation of the point cloud, usually via an aggregating operation such as max pooling:

$$g = \max_{i = 1, \dots, N} f_i \tag{20}$$

$$f_l = \max_{p_i \in \mathcal{C}_j} f_i^{(l)} \tag{21}$$

$$F = \big[\, f_1, f_2, \dots, f_L,\; g \,\big] \tag{22}$$

In Eqs. (20) to (22), $g$ denotes the global feature vector of the entire point cloud, $f_l$ denotes the local vector at level $l$ of the hierarchy, $N$ is the total number of points, and $\mathcal{C}_j$ denotes the points in the $j$-th local cluster. Both local and global feature vectors are L2-normalized before feature fusion to ensure consistent and stable feature representations:

$$\hat{f} = \frac{f}{\lVert f \rVert_2} \tag{23}$$

where $\lVert \cdot \rVert_2$ denotes the L2 norm.
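A minimal sketch of this pooling-plus-normalisation step, assuming the per-point features sit in the rows of a tensor:

```python
import torch
import torch.nn.functional as F

def aggregate_and_normalise(local_feats):
    """Eqs. (20)-(23): max-pool per-point features into a global vector
    and L2-normalise both before fusion. local_feats: (n, d) tensor."""
    global_feat = local_feats.max(dim=0).values         # Eq. (20): max pooling
    global_feat = F.normalize(global_feat, p=2, dim=0)  # Eq. (23): L2 norm
    local_feats = F.normalize(local_feats, p=2, dim=1)
    return local_feats, global_feat
```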

Feature fusion

Lastly, PointNet++ combines the hierarchical local and global features to produce a complete feature representation of the 3D face and the 3D ear, respectively:

$$F_{\text{fused}} = \hat{f}_{\text{local}} \oplus \hat{f}_{\text{global}} \tag{24}$$

In Eq. (24), $F_{\text{fused}}$ represents the feature vector containing both local and global information.
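In code, this fusion is a simple concatenation of the normalised descriptors; the same pattern serves the later face–ear feature-level fusion. The dimensions here are illustrative assumptions.

```python
import torch

def fuse(feat_a, feat_b):
    """Eq. (24)-style fusion: concatenate two L2-normalised feature
    vectors (e.g. local + global, or face + ear) into one descriptor."""
    return torch.cat([feat_a, feat_b], dim=-1)

# e.g. two 1024-d embeddings yield one 2048-d fused feature vector
fused = fuse(torch.randn(1024), torch.randn(1024))
```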

Results and discussion

Visual representation

Figure 3 displays the 3D face visualization after the preprocessing operations. The subject faces the z-axis, and the sensor has captured the frontal face, as shown in Fig. 3(a). The nose tip typically has the highest Z value in frontal facial scans, so a face region centred on the nose is cut from the raw 3D data to create the 3D face image. The first step crops the required facial region from the 3D face for further processing, as shown in Fig. 3(b). Despiking, hole filling, and denoising are then performed on the cropped face region. Errors in the image capture process lead to values that are not accurate representations of the scene’s true intensities, known as noise10.

Fig. 3. Visual Representation of 3D Face: (a) 3D face image from the .abs file; (b) cropped face region after pre-processing operations.

Figure 4 shows the visual representation of the 3D ear across the pre-processing operations. Figure 4(a) shows the actual 3D side image obtained from the .abs file, while Figs. 4(b) and 4(c) show the cropped 3D ear image and the result after the pre-processing operations, respectively.

Fig. 4. Visual Representation of 3D Ear: (a) 3D ear image from the .abs file; (b) cropped ear region; (c) after pre-processing.

Accuracy and performance analysis

As part of the evaluation phase, the model was trained on a dataset of 3D face scans, and its accuracy was evaluated over 60 epochs. Our dataset consists of approximately 60 subjects (persons), each with five face point cloud samples and five ear point cloud samples. The models were trained and evaluated in the Google Colab environment, equipped with an NVIDIA T4 GPU (16 GB VRAM) and 12 GB of system RAM. We implemented the entire deep learning architecture, including the Face PointNet++, Ear PointNet++, and Fusion Classifier models, using PyTorch with CUDA acceleration to ensure efficient processing of the 3D point cloud data. The backbone models were trained for 60 epochs, with an average training time of approximately 2.4 s per epoch. Highlighting its practical potential, the system demonstrated remarkable inference speed, requiring approximately 0.018 s to process a single fused face-ear sample. To ensure that the extraction of 3D facial features is accurate, the dataset was divided into training and testing sets, and at the end of every epoch the model’s accuracy was assessed on the testing set. Figure 5 illustrates the accuracy analysis of 3D face recognition. After 60 epochs, the model achieved a final accuracy of nearly 99% on the testing set, demonstrating its proficiency in feature extraction from 3D facial data.
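For orientation, a stripped-down version of such a training loop is sketched below. The backbone here is a deliberately tiny stand-in (the actual model is PointNet++), the batch is random dummy data in place of the real loader, and the learning rate is an assumed value.

```python
import torch
from torch import nn, optim

class TinyPointNet(nn.Module):
    """Placeholder backbone: shared point-wise MLP + max pooling.
    It only makes the loop runnable; the paper's model is PointNet++."""
    def __init__(self, num_classes=60):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 256))
        self.head = nn.Linear(256, num_classes)
    def forward(self, x):                      # x: (B, 1024, 3) point clouds
        return self.head(self.mlp(x).max(dim=1).values)

model = TinyPointNet(num_classes=60)           # ~60 subjects, as reported
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # assumed learning rate

for epoch in range(60):                        # 60 epochs, as in the paper
    clouds = torch.randn(8, 1024, 3)           # dummy batch standing in for
    labels = torch.randint(0, 60, (8,))        # the real face/ear data loader
    optimizer.zero_grad()
    loss = criterion(model(clouds), labels)
    loss.backward()
    optimizer.step()
```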

Fig. 5. Training Performance of 3D Face Recognition Model (Accuracy vs. Loss).

Figure 6 shows the accuracy analysis of the 3D ear after 60 epochs; the model reaches a final accuracy of approximately 98% on the testing set. Although 60 epochs offer an initial evaluation, training the model for more epochs is advised to fully characterize its learning capability and improve its accuracy.

Fig. 6. Training Performance of 3D Ear Recognition Model (Accuracy vs. Loss).

It is important to evaluate the PointNet++ model’s False Acceptance Rate (FAR) and False Rejection Rate (FRR) alongside its overall accuracy in 3D face and 3D ear feature extraction. These metrics give information about the robustness and dependability of the model in recognition tasks. FAR measures the probability that the system incorrectly grants access to an unauthorized person, while FRR measures the probability that the system incorrectly denies access to an authorized person.

$$\mathrm{FAR} = \frac{FP}{FP + TN} \tag{25}$$

$$\mathrm{FRR} = \frac{FN}{FN + TP} \tag{26}$$

where $FP$, $TN$, $FN$, and $TP$ denote the numbers of false positives, true negatives, false negatives, and true positives, respectively. A threshold value is used to identify the optimal FAR and FRR values, as adjusting this threshold directly influences both rates48. Verification accuracy is determined by finding the optimal balance between FRR and FAR:

$$\text{Verification Accuracy} = 1 - \frac{\mathrm{FAR} + \mathrm{FRR}}{2} \tag{27}$$

The best combination of FRR and FAR occurs where the verification accuracy peaks, indicating the most effective configuration for the proposed method. The variation of FAR and FRR with threshold is shown in Fig. 7. As the threshold increases, FAR decreases because fewer unauthorised persons are accepted, signalling increased security; FRR, on the other hand, rises, meaning authorised individuals are more likely to be rejected. For high-security applications, a threshold value of 0.9 is recommended for the PointNet++ model when extracting features from 3D faces; this level enhances system security by reducing the likelihood of unauthorised access and decreasing the False Acceptance Rate.
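The threshold sweep behind Figs. 7 and 8 can be reproduced with a few lines; the beta-distributed scores below are synthetic stand-ins for genuine and impostor similarity scores.

```python
import numpy as np

def far_frr_curve(genuine, impostor, thresholds):
    """Eqs. (25)-(27): sweep a decision threshold over similarity scores
    and return FAR, FRR, and verification accuracy at each threshold."""
    far = np.array([(impostor >= t).mean() for t in thresholds])  # Eq. (25)
    frr = np.array([(genuine < t).mean() for t in thresholds])    # Eq. (26)
    acc = 1.0 - (far + frr) / 2.0                                 # Eq. (27)
    return far, frr, acc

thresholds = np.linspace(0.0, 1.0, 101)
genuine = np.random.beta(8, 2, 500)       # synthetic genuine scores
impostor = np.random.beta(2, 8, 5000)     # synthetic impostor scores
far, frr, acc = far_frr_curve(genuine, impostor, thresholds)
print("best threshold:", thresholds[acc.argmax()])
```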

Fig. 7. FAR and FRR Analysis for 3D Face.

Figure 8 illustrates the variation of FAR and FRR with threshold for 3D ear recognition. As the threshold rises, the False Acceptance Rate (FAR) decreases while the False Rejection Rate (FRR) increases, signifying improved security through the reduction of unauthorised acceptances.

Fig. 8. FAR and FRR Analysis for 3D Ear.

Overall, the model performs well in feature extraction from both 3D ear and 3D face data.

To benchmark our PointNet++ model, we conducted comparisons against several leading methods in 3D face and ear recognition. The results, detailed in Table 2, show our model achieving 99% accuracy for face recognition and 98% for ear recognition across the FRGC, UND-F, and UND-G datasets. This level of performance consistently outperforms previous approaches, highlighting the effectiveness of our feature-level fusion strategy and the resilient architecture of PointNet++, as shown in Fig. 9. The improved performance is attributable to the feature-level fusion strategy and the use of data augmentation, which improve the model’s resistance to variations in pose, sensor acquisition, and noise. This comparison clearly indicates the advantages of the proposed method in 3D biometric recognition tasks.

Table 2.

Comparison of the proposed PointNet++ with state-of-the-art 3D face and 3D ear recognition methods (accuracy).

| Model | Dataset | 3D Face Accuracy | 3D Ear Accuracy |
| --- | --- | --- | --- |
| PointNet++ (Siamese Network)49 | Public datasets (pose var.) | 83% | N/A |
| RP-Net + PointNet++50 | BU-3DFE | 96.24% | 96.24% |
| PointNet++51 | UND-J2 | N/A | 91% |
| PointSurFace PointNet++52 | Lock3DFace | 90.03% | N/A |
| PointNet++53 | 3D mobile scanner (ZScanner 700CX) | 68.7% | N/A |
| Proposed PointNet++ | FRGC, UND-F, UND-G | 99% | 98% |

Fig. 9. Accuracy Comparison.

Authentication evaluation

To analyse the practical verification performance of the proposed 3D face and ear biometric system, we augment the recognition findings with an authentication evaluation, which is essential for a thorough assessment. In contrast to recognition accuracy, which assesses the system’s capacity to classify identities correctly, authentication evaluation characterises the system’s resilience against impostors and its behaviour in real-world verification scenarios. Receiver Operating Characteristic (ROC) curves and Detection Error Trade-off (DET) curves were generated for the fused system and the individual modalities (3D face and 3D ear). ROC curves offer a comprehensive view of the system’s discriminative capability by plotting the True Positive Rate (TPR) against the False Acceptance Rate (FAR) across a range of decision thresholds, as shown in Fig. 10. The DET curves illustrate performance in the low-FAR region, which is essential for high-security applications, by plotting the False Rejection Rate (FRR) versus FAR on a logarithmic scale, as shown in Fig. 11. The Equal Error Rate (EER) is the operating point at which FAR equals FRR; it provides a single, interpretable metric of verification performance. In our experiments, the EER was approximately 1.9% for the 3D face modality and 2.3% for the 3D ear modality, while the fused 3D face-ear system achieved a considerably lower EER of 0.9%. This significant drop in EER shows that feature-level fusion improves verification reliability and reduces errors caused by individual modality deficiencies. To simulate practical security scenarios, we also analysed the verification rate (VR) at a fixed FAR of 0.1%: the 3D face attained a VR of 92.1%, the 3D ear 89.7%, and the fused system 97.5%.
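Given FAR/FRR curves like those sketched earlier, the EER can be approximated as the operating point where the two curves cross; the sketch below uses the threshold minimising their absolute difference.

```python
import numpy as np

def equal_error_rate(far, frr, thresholds):
    """Discrete EER approximation: average FAR and FRR at the threshold
    where |FAR - FRR| is smallest (inputs from a threshold sweep)."""
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]
```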

Fig. 10. ROC Curves for 3D Face, 3D Ear, and Fusion.

Fig. 11. DET Curves for 3D Face, 3D Ear, and Fusion.

These findings indicate that the fused system not only maintains a high verification success rate under security constraints but also reduces errors, a critical characteristic for real-world deployment. To further enhance the proposed approach, future research will investigate broader multimodal biometric fusion, combining 3D face and ear data with other biometric features such as fingerprints or iris scans to increase recognition accuracy and security. Improving the model’s robustness in real-world scenarios will also involve addressing complex occlusions and pose variations. Lastly, for biometric systems that prioritise user privacy, integrating this system with secure blockchain frameworks will further strengthen data privacy and scalability.

Conclusion and future work

The main goal of this study was to improve multimodal biometric security and to preprocess and extract features from 3D face and 3D ear data. For feature extraction from 3D point clouds, we proposed using PointNet++, a deep neural network based on CNNs. In contrast to models that translate 3D data into 2D maps, PointNet++ directly processes 3D point clouds, retaining all the geometric aspects of the data and providing better feature extraction. This method guarantees enhanced performance in the recognition of 3D biometric traits through improved adaptability and accuracy. Based on the experimental results, after 60 epochs the 3D face analysis is 99% accurate and the 3D ear analysis is 98% accurate. The UND Collection G dataset for 3D ears and the FRGC dataset for 3D faces were used for this study, and the experiments were implemented in Python. Future applications that will make use of combined 3D face and 3D ear characteristics include biometric security, recognition, and expression analysis, among others. This combined method should help multimodal biometrics grow and, together with blockchain technology, improve future security systems. The proposed system performs well with 3D point cloud data. For a more comprehensive and robust biometric representation, future research will concentrate on expanding this framework to 4D and 5D data fusion, incorporating spectral information and temporal dynamics. The proposed multimodal 3D face and 3D ear fusion framework will also be validated on additional benchmark datasets, such as BU-3DFE and Bosphorus, to evaluate its generalization capability across diverse conditions.

Acknowledgements

We are very grateful to the National Institute of Standards and Technology (NIST) for allowing us to use their datasets, including the Face Recognition Grand Challenge (FRGC) for the 3D face and Collection F and Collection G from the University of Notre Dame for the 3D ear. Their contributions have helped facilitate our research and advance progress in biometrics.

Author contributions

Conceptualization & Methodology: Veerpal Kaur, Devershi Pallavi Bhatt. Software & Validation: Sumegh Tharewal, Pradeep Kumar Tiwari. Formal Analysis & Investigation: Devershi Pallavi Bhatt, Sumegh Tharewal. Writing – Original Draft Preparation: Veerpal Kaur, Devershi Pallavi Bhatt. Writing – Review & Editing: Veerpal Kaur, Sumegh Tharewal. Supervision & Project Administration: Devershi Pallavi Bhatt, Sumegh Tharewal.

Funding

Open access funding provided by Manipal University Jaipur. No agency has funded this research work.

Data availability

Our source code for feature extraction, fusion, and evaluation of the proposed 3D face-ear multimodal biometric recognition framework utilizing PointNet++ is available at: https://github.com/Veerpalcode/Multimodal_3D_Biometric-System.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Zhou, S. & Xiao, S. 3D face recognition: A survey. Hum. Cent. Comput. Inf. Sci. 10.1186/s13673-018-0157-2 (2018).
2. Kumar, K. K., Kasiviswanadham, Y., Indira, D. V. S. N. V., Priyanka Palesetti, P. & Bhargavi, C. V. Criminal face identification system using deep learning algorithm multi-task cascade neural network (MTCNN). Mater. Today: Proc. 80, 2406–2410. 10.1016/j.matpr.2021.06.373 (2023).
3. Kaur, V., Bhatt, D. P., Tiwari, P. K. & Tharewal, S. Blockchain technology combined with the CNN and hashing algorithms enabled the secure storage of 3D biometric face and ear data. JDMSC 26, 729–738. 10.47974/JDMSC-1745 (2023).
4. Benradi, H., Chater, A. & Lasfar, A. A hybrid approach for face recognition using a convolutional neural network combined with feature extraction techniques. IJ-AI 12, 627. 10.11591/ijai.v12.i2.pp627-640 (2023).
5. Patil, H., Kothari, A. & Bhurchandi, K. 3-D face recognition: Features, databases, algorithms and challenges. Artif. Intell. Rev. 44, 393–441. 10.1007/s10462-015-9431-0 (2015).
6. Soltanpour, S. 3D face recognition using local feature based methods.
7. Kaur, V., Bhatt, D. P., Tharewal, S. & Tiwari, P. K. Blockchain-based secure storage model for multimodal biometrics using 3D face and ear. In Proceedings of the 2023 International Conference on Advancement in Computation & Computer Technologies (InCACCT), 860–865 (IEEE, Gharuan, India, 2023).
8. Akhter, N., Gite, H., Tharewal, S. & Kale, K. V. Computer based RR-interval detection system with ectopy correction in HRV data. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 1613–1618 (IEEE, Kochi, India, 2015).
9. Bhatt, D. P., Bhatnagar, V. & Sharma, P. Meta-analysis of predictions of COVID-19 disease based on CT-scan and X-ray images. J. Interdisciplinary Math. 24, 381–409. 10.1080/09720502.2021.1884385 (2021).
10. Luo, J., Hu, F. & Wang, R. 3D face recognition based on deep learning. In Proceedings of the 2019 IEEE International Conference on Mechatronics and Automation (ICMA), 1576–1581 (IEEE, Tianjin, China, 2019).
11. Seo, H. & Joo, S. Characteristic analysis of data preprocessing for 3D point cloud classification based on a deep neural network: PointNet. JKSNT 41, 19–24. 10.7779/JKSNT.2021.41.1.19 (2021).
12. Mu, G. et al. Led3D: A lightweight and efficient deep approach to recognizing low-quality 3D faces. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 5766–5775 (IEEE, Long Beach, CA, USA, 2019).
13. Faltemier, T. C., Bowyer, K. W. & Flynn, P. J. A region ensemble for 3-D face recognition. IEEE Trans. Inf. Forensic Secur. 3, 62–73. 10.1109/TIFS.2007.916287 (2008).
14. Boodoo, N. B. & Subramanian, R. K. Robust multi biometric recognition using face and ear images. 10.48550/ARXIV.0912.0955 (2009).
15. Yan, P. & Bowyer, K. W. Biometric recognition using 3D ear shape. IEEE Trans. Pattern Anal. Mach. Intell. 29, 1297–1308. 10.1109/TPAMI.2007.1067 (2007).
16. Lei, Y., Bennamoun, M., Hayat, M. & Guo, Y. An efficient 3D face recognition approach using local geometrical signatures. Pattern Recogn. 47, 509–524. 10.1016/j.patcog.2013.07.018 (2014).
17. Wu, J., Mu, Z. & Wang, K. 3D pure ear extraction and recognition. In Biometric Recognition Vol. 7701 (eds Zheng, W.-S. et al.) 219–226 (Springer, 2012).
18. Dave, I. R., Iyappan Ganapathi, I., Prakash, S., Ali, S. S. & Mohan Srivastava, A. 3D ear biometrics: Acquisition and recognition. In Proceedings of the 2018 15th IEEE India Council International Conference (INDICON), 1–6 (IEEE, Coimbatore, India, 2018).
19. Feng, J. et al. 3D face recognition method based on deep convolutional neural network. In Smart Innovations in Communication and Computational Sciences, Advances in Intelligent Systems and Computing Vol. 670 (eds Panigrahi, B. K. et al.) 123–130 (Springer, 2019).
20. Huang, G. B., Honglak, L. & Learned-Miller, E. Learning hierarchical representations for face verification with convolutional deep belief networks. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2518–2525 (IEEE, Providence, RI, 2012).
21. Sharma, S. & Shaik, S. Real time face authentication using convolutional neural network. In Proceedings of the International Conference on Signal Processing (ICSP 2016) (Institution of Engineering and Technology, Vidisha, India, 2016).
22. Dutta, K., Bhattacharjee, D. & Nasipuri, M. SpPCANet: A simple deep learning-based feature extraction approach for 3D face recognition. Multimed. Tools Appl. 79, 31329–31352. 10.1007/s11042-020-09554-6 (2020).
23. Wu, Z. et al. Three-stream 3D convolutional neural network for near infrared facial expression recognition. Appl. Sci. 10.3390/app7111184 (2017).
24. Trimech, I. H., Maalej, A. & Amara, N. E. B. Point-based deep neural network for 3D facial expression recognition. In Proceedings of the 2020 International Conference on Cyberworlds (CW), 164–171 (IEEE, Caen, France, 2020).
25. Yu, C., Zhang, Z. & Li, H. Reconstructing a large scale 3D face dataset for deep 3D face identification (2020).
26. Jabberi, M., Wali, A., Neji, B., Beyrouthy, T. & Alimi, A. M. Face ShapeNets for 3D face recognition. IEEE Access 11, 46240–46256. 10.1109/ACCESS.2023.3270713 (2023).
27. Dutta, K., Bhattacharjee, D. & Nasipuri, M. Expression and occlusion invariant 3D face recognition based on region classifier. In Proceedings of the 2016 1st International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 99–104 (IEEE, Yogyakarta, Indonesia, 2016).
28. Cai, Y. et al. Robust 3D face recognition approach based on deeply learned face representation. Neurocomputing 363, 375–397. 10.1016/j.neucom.2019.07.047 (2019).
29. Prakash, S. & Gupta, P. Human recognition using 3D ear images. Neurocomputing 140, 317–325. 10.1016/j.neucom.2014.03.007 (2014).
30. Omara, I., Li, F., Zhang, H. & Zuo, W. A novel geometric feature extraction method for ear recognition. Expert Syst. Appl. 65, 127–135. 10.1016/j.eswa.2016.08.035 (2016).
31. Islam, S. M. S., Davies, R., Bennamoun, M., Owens, R. A. & Mian, A. S. Multibiometric human recognition using 3D ear and face features. Pattern Recogn. 46, 613–627. 10.1016/j.patcog.2012.09.016 (2013).
32. Mursalin, M. & Islam, S. M. S. Deep learning for 3D ear detection: A complete pipeline from data generation to segmentation. IEEE Access 9, 164976–164985. 10.1109/ACCESS.2021.3129507 (2021).
33. Cao, Y., Liu, S., Zhao, P. & Zhu, H. RP-Net: A PointNet++ 3D face recognition algorithm integrating RoPS local descriptor. IEEE Access 10, 91245–91252. 10.1109/ACCESS.2022.3202216 (2022).
34. Tharewal, S. et al. Score-level fusion of 3D face and 3D ear for multimodal biometric human recognition. Comput. Intell. Neurosci. 2022, 1–9. 10.1155/2022/3019194 (2022).
35. Theoharis, T., Passalis, G., Toderici, G. & Kakadiaris, I. A. Unified 3D face and ear recognition using wavelets on geometry images. Pattern Recogn. 41, 796–804. 10.1016/j.patcog.2007.06.024 (2008).
36. Tharewal, S., Gite, H. & Kale, K. V. 3D face & 3D ear recognition: Process and techniques. In Proceedings of the 2017 International Conference on Current Trends in Computer, Electrical, Electronics and Communication (CTCEEC), 1044–1049 (IEEE, Mysore, India, 2017).
37. Cadavid, S., Mahoor, M. H. & Abdel-Mottaleb, M. Multi-modal biometric modeling and recognition of the human face and ear. In Proceedings of the 2009 IEEE International Workshop on Safety, Security & Rescue Robotics (SSRR 2009), 1–6 (IEEE, Denver, CO, USA, 2009).
38. Yan, P. & Bowyer, K. W. An automatic 3D ear recognition system. In Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT’06), 326–333 (IEEE, Chapel Hill, NC, USA, 2006).
39. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444. 10.1038/nature14539 (2015).
40. Kim, D., Hernandez, M., Choi, J. & Medioni, G. Deep 3D face identification. In Proceedings of the 2017 IEEE International Joint Conference on Biometrics (IJCB), 133–142 (IEEE, Denver, CO, 2017).
41. Zhang, J., Pan, C. & Huang, J. Research on multimodal 3D face recognition method based on deep learning, 31 (2023).
42. Qi, C. R., Su, H., Mo, K. & Guibas, L. J. PointNet: Deep learning on point sets for 3D classification and segmentation (2016).
43. Lu, H. & Shi, H. Deep learning for 3D point cloud understanding: A survey (2020).
44. Qi, C. R., Yi, L., Su, H. & Guibas, L. J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).
45. Sullivan, E. O. & Zafeiriou, S. 3D landmark localization in point clouds for the human ear. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 402–406 (IEEE, Buenos Aires, Argentina, 2020).
46. Bello, S. A. et al. Deep learning on 3D point clouds. Remote Sens. 10.3390/rs12111729 (2020).
47. Seo, H. & Joo, S. Influence of preprocessing and augmentation on 3D point cloud classification based on a deep neural network: PointNet. In Proceedings of the 2020 20th International Conference on Control, Automation and Systems (ICCAS), 895–899 (IEEE, Busan, Korea (South), 2020).
48. Mursalin, M., Ahmed, M. & Haskell-Dowland, P. Biometric security: A novel ear recognition approach using a 3D morphable ear model. Sensors 10.3390/s22228988 (2022).
49. Wang, Q., Qian, W. Z., Lei, H. & Chen, L. Siamese neural pointnet: 3D face verification under pose interference and partial occlusion. Electronics 12(3), 620 (2023).
50. Alagarsamy, S. B. & Murugan, K. Multimodal of ear and face biometric recognition using adaptive approach Runge–Kutta threshold segmentation and classifier with score level fusion. Wireless Pers. Commun. 124(2), 1061–1080 (2022).
51. Zhu, Q. & Mu, Z. PointNet++ and three layers of features fusion for occlusion three-dimensional ear recognition based on one sample per person. Symmetry 12(1), 78 (2020).
52. Yang, J., Li, Q. & Shen, L. PointSurFace: Discriminative point cloud surface feature extraction for 3D face recognition. Pattern Recogn. 156, 110858 (2024).
53. Okada, K. et al. 3D facial ethnicity identification using point cloud deep learning with local area attention. In 2024 IEEE International Conference on Consumer Electronics (ICCE), 1–4 (IEEE, 2024).
54. Jabberi, M., Wali, A. & Alimi, A. M. Generative data augmentation applied to face recognition. In 2023 International Conference on Information Networking (ICOIN), 242–247 (IEEE, 2023).
