Heliyon. 2023 Dec 8;10(1):e23252. doi: 10.1016/j.heliyon.2023.e23252

Automated sign language detection and classification using reptile search algorithm with hybrid deep learning

Hadeel Alsolai a, Leen Alsolai a, Fahd N Al-Wesabi b, Mahmoud Othman c, Mohammed Rizwanullah d, Amgad Atta Abdelmageed d
PMCID: PMC10750143  PMID: 38148822

Abstract

Sign language recognition (SLR) provides the capability to convert sign language gestures into spoken or written language. This technology is helpful for deaf or hard-of-hearing persons by giving them a way to interact with people who do not know sign language. It can also be utilized for automatic captioning in live events and videos. There are distinct approaches to SLR, comprising deep learning (DL), computer vision (CV), and machine learning (ML). One common approach uses cameras to capture the signer's hand and body movements and processes the video data to recognize the gestures. The challenges of SLR include the variability of sign language across cultures and individuals, the difficulty of certain signs, and the need for real-time processing. This study introduces an Automated Sign Language Detection and Classification technique using the Reptile Search Algorithm with Hybrid Deep Learning (SLDC-RSAHDL). The presented SLDC-RSAHDL technique detects and classifies different types of signs using DL and metaheuristic optimizers. In the SLDC-RSAHDL technique, a MobileNet feature extractor is utilized to produce feature vectors, and its hyperparameters are adjusted by the manta ray foraging optimization (MRFO) technique. For sign language classification, the SLDC-RSAHDL technique applies a hybrid deep learning (HDL) model, which incorporates the design of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Finally, the RSA is exploited for the optimal hyperparameter selection of the HDL model, which results in an improved detection rate. The experimental result analysis of the SLDC-RSAHDL technique on a sign language dataset demonstrates the improved performance of the SLDC-RSAHDL system over other existing DL techniques.

Keywords: Sign language, Deep learning, Computer vision, Reptile search algorithm, Intelligent models

1. Introduction

Sign language is a comprehensive, complex natural language that conveys meaning through signs formed by the actions of the hands in association with facial expressions [1]. It is employed for communication by individuals with little or no hearing. Sign language can be used to communicate words, letters, or sentences by employing diverse hand gestures [2]. This kind of communication makes it simple for hearing-challenged individuals to express their opinions and helps bridge the communication gap between hearing and hearing-challenged individuals. People have used sign language to communicate since ancient times [3]. Hand signs are as old as human civilization itself, and hand gestures are particularly advantageous for expressing an emotion or word. Hence, humans around the globe regularly employ hand gestures to express themselves, despite the existence of writing conventions [4]. Recently, much research has focused on developing systems that can classify gestures of diverse sign languages into given classes. Such systems have found applications in robot control, natural language communication, virtual reality environments, and games [5]. The automated identification of human gestures is a complex multi-disciplinary problem that has not yet been fully resolved. In recent years, a number of methods have been employed that involve the implementation of ML procedures for sign language identification [6]. Since the advent of DL methods, there have been attempts to identify human gestures with them.

To identify gestures, diverse features such as articulated models and hand-crafted spatio-temporal descriptors have been employed together with gesture classifiers; conditional random fields [7], hidden Markov models, and Support Vector Machines (SVM) have been used extensively. However, the classification of signs under changing illumination conditions and across diverse subjects remains a challenging issue [8]. An intuitive approach to building such interfaces is to observe the user's muscle activity, which a device can record using a camera [9]. The recorded imagery can then be analyzed by DL algorithms to determine the gesture. In recent times, classification with deep convolutional neural networks (DCNNs) has been effective in several recognition challenges [10]. Multi-column DCNNs that combine several parallel networks have been demonstrated to improve the recognition rates of single networks.

This study introduces an Automated Sign Language Detection and Classification technique using the Reptile Search Algorithm with Hybrid Deep Learning (SLDC-RSAHDL). In the SLDC-RSAHDL technique, a MobileNet feature extractor is utilized to produce feature vectors, and its hyperparameters are adjusted by the manta ray foraging optimization (MRFO) technique. For sign language classification, the SLDC-RSAHDL technique applies an HDL model, which incorporates the design of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Finally, the RSA is exploited for the optimal hyperparameter selection of the HDL model, which results in an improved detection rate. The experimental evaluation of the SLDC-RSAHDL algorithm was executed on a sign language database.

2. Literature review

Pandey et al. [11] proposed a novel Feed Forward Neural Network (FFNN) model that can automatically identify sign language to help hearing people communicate more efficiently with visually, hearing, or speech impaired individuals. The scheme recognizes hand gestures through feature point extraction fed to the FFNN. A hand gesture recognition with voice conversion scheme implementing a Hidden Markov Model (HMM) is employed to enable communication between hearing and mute individuals. In Ref. [12], a new framework is suggested for signer-independent sign language identification employing several DL constructions comprising hand semantic segmentation, a Deep Recurrent Neural Network (DRNN), and hand shape feature representation. Hand shape features are extracted by implementing a single-layer Convolutional Self-Organizing Map (CSOM) rather than depending on transfer learning (TL) of pre-trained deep CNNs (DCNNs). The sequence of extracted feature vectors is then recognized by a BiLSTM-RNN.

In [13], a two-stream CNN (2S-CNN) framework was suggested to identify American Sign Language (ASL) hand signs founded on multi-modal (RGB and depth) data fusion. Initially, the hand sign information was enhanced to eliminate the impact of noise and background. Next, hand sign RGB and depth features were extracted for hand sign detection by corresponding CNNs on the two streams. Lee et al. [14] suggested an ASL learning application prototype. The application is a whack-a-mole game with an embedded real-time gesture identification scheme. As both dynamic and static gestures (J, Z) are present in the ASL alphabet, an LSTM-RNN with a KNN technique was adopted as the classification method, founded on the handling of a series of inputs. Features like angles between fingers, distances between finger positions, and sphere radius were extracted as inputs for the classification prototype.

Rastgoo et al. [15] suggested a new DL-founded pipeline for effective automatic hand sign language identification implementing a 2DCNN, Single Shot Detector (SSD), 3DCNN, and LSTM on RGB input videos. The authors employ a CNN-founded prototype that estimates 3D hand keypoints from 2D input frames. Das et al. [16] suggested a fusion prototype comprising deep TL founded on a CNN with an RF classifier for the automatic identification of Bangla Sign Language (BSL) (numeric and alphabetical symbols). 'Ishara-Bochon' and 'Ishara-Lipi' are datasets of isolated numeric and alphabetical symbols, respectively, constituting the first comprehensive multi-purpose open-access datasets for BSL. The authors also suggested a background elimination protocol that removes needless elements from the gesture imagery. The authors of [17] suggest a Fully Convolutional Network (FCN) for online SLR to simultaneously learn temporal and spatial features from weakly annotated video sequences with only sentence-level annotations provided. A Gloss Feature Enhancement (GFE) module is presented in the suggested network to enforce better sequence alignment learning.

3. The proposed model

In this article, we have introduced a new SLDC-RSAHDL technique for the automated detection and classification of sign language using DL and metaheuristic optimization algorithms. It follows a four-stage process: MobileNet feature extraction, MRFO-based hyperparameter tuning, HDL-based sign language recognition (SLR), and RSA-based parameter tuning. Fig. 1 depicts the overall flow of the SLDC-RSAHDL approach.

Fig. 1. Overall flow of SLDC-RSAHDL approach.

3.1. Feature extraction using MobileNet

The basic principle of a lightweight model is to develop efficient network computation for convolution models that minimizes the number of parameters and the computation time while guaranteeing detection performance. The core of MobileNet is the depthwise separable convolution, first proposed by Sifre in 2014, which splits the typical convolutional layer into a depthwise convolutional layer and a pointwise convolutional layer. In effect, the summation and convolution in the classical convolutional model are decoupled, such that the computation speed is increased and the number of weight parameters evaluated by the network can be decreased considerably [18].

Consider that the length and width of the output and input are constant. Given an input feature map of length DF, width DF, and M channels, and a convolutional kernel of height DK and width DK, the typical convolution outputs a feature map of length DF, width DF, and N channels; denote this output by G. The computational cost of the typical convolution is DK x DK x M x N x DF x DF. This convolution process is written mathematically as:

G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n}\, F_{k+i-1,\, l+j-1,\, m} \quad (1)

The computation of every output element G requires summing over each input channel m. Depthwise separable convolution factors this summation over m out into a separate step.

The depthwise separable convolutional layer thus splits the classical convolution kernel into a per-channel convolution part and a channel-summation part. In this case, the pointwise convolution map has a single parameter, the number of output features N, whereas the depthwise convolution map has three parameters: the number of input features M, the length DK, and the width DK. The original four parameters are split into one and three parameters; hence the mathematical model changes to:

\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m}\, F_{k+i-1,\, l+j-1,\, m} \quad (2)
G_{k,l,n} = \sum_{m} \hat{G}_{k,l,m}\, K_{m,n} \quad (3)

where K represents the convolution kernel of the pointwise convolution and K̂ represents the convolution kernel of the depthwise convolution. Combining the two stages gives:

G_{k,l,n} = \sum_{m} \sum_{i,j} \hat{K}_{i,j,m}\, F_{k+i-1,\, l+j-1,\, m}\, K_{m,n} \quad (4)

For the depthwise separable convolution, the number of convolution operations is therefore evaluated in two stages. First, M kernels of size DK x DK are moved DF x DF times; next, N kernels of size 1 x 1 x M are moved DF x DF times. The overall number of operations is obtained by adding these two terms, giving DK x DK x M x DF x DF + M x N x DF x DF. The ratio of the computational work of the depthwise separable convolutional layer to that of the typical convolution can be given as follows:

\frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2} \quad (5)

The above formula demonstrates that the computational saving grows with DK and N. Furthermore, the kernels of the depthwise convolutional layer in MobileNet are 3 × 3, so in practice the computation of the depthwise separable convolutional layer is 1/8 to 1/9 of that of the typical convolution, thereby achieving the goal of improving the computational speed of the network structure.
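To make the factorization concrete, the following minimal PyTorch sketch (our illustration, not code from the paper) builds a MobileNet-style depthwise separable block; the channel counts and input size are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution as used in MobileNet: a depthwise 3x3
    convolution (one filter per input channel) followed by a pointwise 1x1
    convolution that mixes the M channel outputs into N output channels."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: groups=in_channels applies one D_K x D_K kernel per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1, groups=in_channels)
        # Pointwise: 1x1 convolution corresponding to the channel summation K_{m,n}.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Cost check for Eq. (5): with D_K = 3 and N = 64 output channels,
# 1/N + 1/D_K^2 = 1/64 + 1/9 ≈ 0.127, i.e. roughly 1/8 of a standard convolution.
block = DepthwiseSeparableConv(32, 64)
x = torch.randn(1, 32, 56, 56)   # (batch, M, D_F, D_F)
print(block(x).shape)            # torch.Size([1, 64, 56, 56])
```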

3.2. Hyperparameter tuning using MRFO algorithm

For the hyperparameter tuning process of the MobileNet model, the MRFO technique is employed. The MRFO algorithm simulates three foraging behaviors for updating the solution positions: chain, cyclone, and somersault foraging [19]. The mathematical model of every foraging behavior is described below.

Chain foraging: A foraging chain is formed when manta rays line up head-to-tail. In each iteration, the optimum solution found so far is utilized for updating every individual, as the following mathematical model demonstrates:

x_i^d(t+1) = \begin{cases} x_i^d(t) + r\,(x_{best}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_i^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (6)
\alpha = 2r\sqrt{|\log(r)|}

where N signifies the population size, r denotes a random vector in [0, 1], x_i^d(t) refers to the i-th individual's position in the d-th dimension at the t-th iteration, α implies the weight coefficient, and x_{best}^d(t) stands for the plankton with maximal concentration (the optimum solution gained so far).

Cyclone foraging: When the manta rays spot food, they form a long foraging chain and swim toward the food in a spiral. The following mathematical formula defines the cyclone foraging behavior:

x_i^d(t+1) = \begin{cases} x_{best}^d(t) + r\,(x_{best}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_{best}^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (7)
\beta = 2 e^{r_1 \frac{T-t+1}{T}} \sin(2\pi r_1)

in which β and T signify the weight factor and the maximal iteration count, respectively, and r_1 denotes a random value between zero and one.

The exploration capability of the algorithm is improved by updating positions with respect to a random reference position, using the following mathematical process:

x_{rand}^d = Lb^d + r\,(Ub^d - Lb^d) \quad (8)
x_i^d(t+1) = \begin{cases} x_{rand}^d(t) + r\,(x_{rand}^d(t) - x_i^d(t)) + \beta\,(x_{rand}^d(t) - x_i^d(t)), & i = 1 \\ x_{rand}^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \beta\,(x_{rand}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (9)

where x_{rand}^d denotes a random position in the search space, and Ub^d and Lb^d imply the upper and lower limits of the d-th dimension, respectively.

Somersault foraging: Here the food position is considered a pivot; every individual swims toward or around the pivot and then somersaults to a new position. The equivalent mathematical formula is given as follows:

x_i^d(t+1) = x_i^d(t) + S\,(r_2\, x_{best}^d - r_3\, x_i^d(t)), \quad i = 1, \dots, N \quad (10)

where S denotes the somersault factor, and r_2 and r_3 signify random numbers between zero and one.
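To illustrate how the three foraging behaviors combine into an optimizer loop, here is a minimal NumPy sketch of MRFO following the equations above; the population size, iteration budget, somersault factor S = 2, and the even split between chain and cyclone foraging are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mrfo(fitness, dim, n=30, T=100, lb=-1.0, ub=1.0):
    """Minimal MRFO sketch (Eqs. (6)-(10)); minimizes `fitness` over [lb, ub]^dim."""
    X = lb + np.random.rand(n, dim) * (ub - lb)

    def best_of(P):
        f = np.apply_along_axis(fitness, 1, P)
        return P[f.argmin()].copy(), f.min()

    best, best_f = best_of(X)
    for t in range(1, T + 1):
        for i in range(n):
            r = np.random.rand(dim)
            prev = best if i == 0 else X[i - 1]          # head-to-tail chain neighbor
            if np.random.rand() < 0.5:                   # cyclone foraging, Eq. (7)
                r1 = np.random.rand()
                beta = 2 * np.exp(r1 * (T - t + 1) / T) * np.sin(2 * np.pi * r1)
                # early iterations explore around a random reference, Eqs. (8)-(9)
                ref = (lb + np.random.rand(dim) * (ub - lb)) if t / T < np.random.rand() else best
                X[i] = ref + r * (prev - X[i]) + beta * (ref - X[i])
            else:                                        # chain foraging, Eq. (6)
                alpha = 2 * r * np.sqrt(np.abs(np.log(np.random.rand(dim))))
                X[i] = X[i] + r * (prev - X[i]) + alpha * (best - X[i])
        # somersault foraging, Eq. (10): pivot all individuals around the best position
        S, r2, r3 = 2.0, np.random.rand(n, dim), np.random.rand(n, dim)
        X = np.clip(X + S * (r2 * best - r3 * X), lb, ub)
        cand, cand_f = best_of(X)
        if cand_f < best_f:
            best, best_f = cand, cand_f
    return best, best_f

best, f = mrfo(lambda x: float(np.sum(x ** 2)), dim=5)   # toy quadratic test
```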

3.3. Sign language classification using optimal HDL model

In this work, the classification of signs takes place via the HDL model. CNNs provide better accuracy in pattern recognition and classification for two major reasons: first, their structure is highly suited to determining local connections among data points; second, they decrease the number of network parameters [20], thus resulting in lower computational complexity than a traditional plain neural network architecture. Fig. 2 displays the structure of the CNN. The equation of one standard convolution layer is formulated by Eq. (11):

X_{conv} = \mathrm{conv1D}(W_{conv}, X) \quad (11)

where X_conv and W_conv denote the output vector and the weight matrix of the convolutional layer, respectively, X indicates the sensor input, and conv1D indicates the 1D convolution operator. The hyperparameters of the convolutional layer are the kernel length L_k, representing the number of neighboring data points aggregated, and the number of kernels N_k, representing the number of local features extracted.

Fig. 2. Structure of CNN.

Then, X_conv is fed into the LSTM layer, which exploits data at many preceding time steps to gain insight into the current time step, a property referred to as "long-term dependency". Introduce L, a classical linear transformation of the concatenation of X_t^conv, with N_k features at time step t, and the hidden state h_{t-1}, with N_h features at the prior step:

L(h_{t-1}, X_t^{conv}) = W\,[h_{t-1}, X_t^{conv}] + b \quad (12)

In Eq. (12), W and b denote the weight matrix and bias vector; it is noteworthy that the number of features of L is equal to that of the hidden output h. Every LSTM cell includes three gates, namely the forget gate f_f, input gate f_i, and output gate f_o, each of which applies the nonlinear sigmoid function σ to a linear transformation L as follows:

f_f = \sigma(L_f(h_{t-1}, X_t^{conv}))
f_i = \sigma(L_i(h_{t-1}, X_t^{conv})) \quad (13)
f_o = \sigma(L_o(h_{t-1}, X_t^{conv}))

At the same time, a new candidate state produced at time step t is evaluated by applying the tanh activation function to a linear transformation of the concatenation [h_{t-1}; X_t^{conv}]:

C_t = \tanh(L_c(h_{t-1}, X_t^{conv})) \quad (14)

Next, the candidate is combined into the LSTM cell state:

s_t = (f_f \odot h_{t-1}) \oplus (f_i \odot C_t) \quad (15)

and the hidden output of the LSTM cell at time step t is evaluated at the output gate:

h_t = f_o \odot s_t \quad (16)

where ⊕ and ⊙ represent component-wise addition and multiplication of two vectors, respectively. As soon as input data enter the network, they are split into fixed-length segments; the 1D-CNN layer then extracts local connections among neighboring data points before feeding the memory cells of the LSTM, where long-term dependency is recognized and preserved over time. In this hybrid DL structure, the hyperparameters that need to be further defined are the size of the hidden output N_h, the number of kernels N_k, and the kernel length L_k of the convolutional layer feeding the LSTM cells.
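As an illustration of the hybrid architecture described above (not the authors' code), the following minimal PyTorch sketch chains a 1D convolution into an LSTM classifier; the feature width 1280, the sequence length, and the 29-class output are placeholder assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of the hybrid CNN-LSTM classifier. The hyperparameters N_k
    (kernels), L_k (kernel length), and N_h (hidden size) are the ones the
    RSA is said to tune; the defaults here are placeholders."""
    def __init__(self, in_features, n_classes, Nk=64, Lk=3, Nh=128):
        super().__init__()
        # 1D convolution extracts local patterns across neighboring steps (Eq. (11)).
        self.conv = nn.Conv1d(in_features, Nk, kernel_size=Lk, padding=Lk // 2)
        # LSTM captures long-term dependencies over the convolved sequence (Eqs. (12)-(16)).
        self.lstm = nn.LSTM(input_size=Nk, hidden_size=Nh, batch_first=True)
        self.head = nn.Linear(Nh, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))  # -> (batch, Nk, time)
        z, _ = self.lstm(z.transpose(1, 2))
        return self.head(z[:, -1])        # classify from the last hidden state

model = CNNLSTM(in_features=1280, n_classes=29)   # e.g. MobileNet feature vectors
logits = model(torch.randn(4, 10, 1280))          # (batch=4, 10 steps, 1280 features)
print(logits.shape)                               # torch.Size([4, 29])
```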

Finally, the RSA adjusts the hyperparameter values of the HDL model. The highly coordinated and cooperative hunting behavior demonstrated by crocodiles, which includes encircling the target and then hunting it, was the inspiration for the reptile search algorithm (RSA) [21].

X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,n-1} & x_{1,n} \\ x_{2,1} & \cdots & x_{2,j} & \cdots & \cdots & x_{2,n} \\ \vdots & & x_{i,j} & & & \vdots \\ x_{N-1,1} & \cdots & x_{N-1,j} & \cdots & \cdots & x_{N-1,n} \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,n-1} & x_{N,n} \end{bmatrix} \quad (17)

The initialization stage begins with generating the matrix X of random solutions x_{i,j} based on Eq. (17), where n denotes the dimensionality of the specific problem, i represents the index of the individual, j shows its current location, and N represents the overall number of individuals.

x_{i,j} = \mathrm{rand} \times (UB - LB) + LB, \quad j = 1, 2, \dots, n \quad (18)

Eq. (18) produces random individuals. Here, rand represents a random number within the range [0, 1], and LB and UB represent the lower and upper bounds of the search space. The search process is split into two major procedures (encircling the prey, followed by the attack), accompanied by four distinct behaviors emphasizing exploration and exploitation. Exploration exploits two walking strategies demonstrated by crocodiles: the stomach walk and the elevated walk. The key objective of the crocodile is to extend the search region, which helps in the subsequent hunting stage. The elevated walk is used if t ≤ T/4, whereas the stomach walk is triggered if t > T/4 and t ≤ 2T/4. Eq. (19) is responsible for updating the position of the crocodile:

x_{i,j}(t+1) = \begin{cases} Best_j(t) \times \eta_{i,j}(t) \times \beta - R_{i,j}(t) \times \mathrm{rand}, & t \le \frac{T}{4} \\ Best_j(t) \times x_{r_1,j} \times ES(t) \times \mathrm{rand}, & t > \frac{T}{4} \ \text{and} \ t \le \frac{2T}{4} \end{cases} \quad (19)
\eta_{i,j} = Best_j(t) \times P_{i,j} \quad (20)

In Eq. (19), T shows the maximal number of iterations, Best_j represents the present optimum individual at the j-th position, and t denotes the ongoing iteration. The hunting operator η_{i,j} is determined by Eq. (20), and β is a sensitivity parameter fixed at 0.1 that governs the exploration performance.

The search space is shrunk by a reduction function, determined using Eq. (21), where r_1 denotes a random integer ranging from 1 to N, x_{r_1,j} signifies the j-th position of a randomly chosen solution, and ϵ represents a small value.

R_{i,j} = \frac{Best_j(t) - x_{r_1,j}}{Best_j(t) + \epsilon} \quad (21)

Eq. (22) evaluates a probability ratio, named "Evolutionary Sense", that arbitrarily alternates in [-2, 2] as the rounds pass:

ES(t) = 2 \times r_2 \times \left(1 - \frac{t}{T}\right) \quad (22)

where r_2 indicates a random value inside [-1, 1].

Eq. (23) defines the percentage difference between the position of the observed individual and the best-obtained individual:

P_{i,j} = \alpha + \frac{x_{i,j} - M(x_i)}{Best_j(t) \times (UB_j - LB_j) + \epsilon} \quad (23)

In Eq. (23), α denotes a sensitivity parameter with the predetermined value 0.1, which controls the fluctuation among candidate individuals during cooperative hunting. The upper and lower boundaries of the j-th position are indicated by UB_j and LB_j, respectively.

The average position M(x_i) of the i-th individual is expressed as follows:

M(x_i) = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \quad (24)

The RSA exploitation process is divided into hunting coordination (if t ≤ 3T/4 and t > T/2) and hunting cooperation (if t ≤ T and t > 3T/4) strategies, which aim to strengthen the local investigation of the search realm and move closer to the optimum individual. The hunting behavior shown by the crocodile is expressed as:

x_{i,j}(t+1) = \begin{cases} Best_j(t) \times P_{i,j}(t) \times \mathrm{rand}, & t \le \frac{3T}{4} \ \text{and} \ t > \frac{T}{2} \\ Best_j(t) - \eta_{i,j}(t) \times \epsilon - R_{i,j}(t) \times \mathrm{rand}, & t \le T \ \text{and} \ t > \frac{3T}{4} \end{cases} \quad (25)

The basic RSA has a time complexity of O(N × (T × D + 1)), where N indicates the number of candidates, T represents the number of rounds, and D denotes the dimensionality of the solution space. The RSA method uses a fitness function (FF) to obtain a superior classification result; the FF yields a positive value, with smaller values representing better candidate outcomes. In this work, the minimization of the classifier error rate is taken as the FF, formulated in Eq. (26):

\mathrm{fitness}(x_i) = \mathrm{ClassifierErrorRate}(x_i) = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100 \quad (26)
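To make the RSA loop and the fitness in Eq. (26) concrete, a compact NumPy sketch follows. It is our simplified reading of Eqs. (17)-(25) (per-dimension updates, fixed α = β = 0.1), not the authors' implementation; in practice `fitness` would train the HDL model with the candidate hyperparameters and return the validation error rate of Eq. (26), while the toy quadratic below merely checks that the loop runs.

```python
import numpy as np

def rsa(fitness, lb, ub, N=20, T=50, alpha=0.1, beta=0.1, eps=1e-10):
    """Compact RSA sketch (Eqs. (17)-(25)); minimizes `fitness` over [lb, ub]^n."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n = lb.size
    X = lb + np.random.rand(N, n) * (ub - lb)                 # Eq. (18)
    f = np.apply_along_axis(fitness, 1, X)
    best, best_f = X[f.argmin()].copy(), f.min()
    for t in range(1, T + 1):
        ES = 2 * np.random.uniform(-1, 1) * (1 - t / T)       # Eq. (22)
        for i in range(N):
            for j in range(n):
                r1 = np.random.randint(N)
                R = (best[j] - X[r1, j]) / (best[j] + eps)    # Eq. (21)
                P = alpha + (X[i, j] - X[i].mean()) / (best[j] * (ub[j] - lb[j]) + eps)  # Eqs. (23)-(24)
                eta = best[j] * P                             # Eq. (20)
                r = np.random.rand()
                if t <= T / 4:                                # elevated walk, Eq. (19)
                    x_new = best[j] * eta * beta - R * r
                elif t <= T / 2:                              # stomach walk, Eq. (19)
                    x_new = best[j] * X[r1, j] * ES * r
                elif t <= 3 * T / 4:                          # hunting coordination, Eq. (25)
                    x_new = best[j] * P * r
                else:                                         # hunting cooperation, Eq. (25)
                    x_new = best[j] - eta * eps - R * r
                X[i, j] = np.clip(x_new, lb[j], ub[j])
        f = np.apply_along_axis(fitness, 1, X)
        if f.min() < best_f:
            best, best_f = X[f.argmin()].copy(), f.min()
    return best, best_f

# Toy usage: stand-in error surface in place of the Eq. (26) error rate.
best, err = rsa(lambda x: float(np.sum((x - 0.3) ** 2)), lb=[0, 0], ub=[1, 1])
```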

4. Experimental evaluation

In this section, the SLR performance of the SLDC-RSAHDL technique is studied using the ASL alphabet dataset from the Kaggle repository [22]. The dataset contains images of the American Sign Language alphabet, divided into 29 folders corresponding to the classes. Table 1 and Fig. 3 offer detailed recognition results of the SLDC-RSAHDL technique under the 29 classes. The results indicate that the SLDC-RSAHDL technique performs proficiently in each class. At the same time, it is noticed that the SLDC-RSAHDL technique accomplishes effectual outcomes with an average precn of 99.42 %, recal of 99.43 %, accuy of 99.51 %, and Fscore of 99.43 %.

Table 1. Classifier outcome of SLDC-RSAHDL approach under 29 classes.

Sign Precision Recall Accuracy F-Score Sign Precision Recall Accuracy F-Score
A 99.25 99.80 99.65 99.69 P 99.57 99.33 99.58 99.32
B 99.39 99.45 99.70 99.20 Q 99.53 99.41 99.43 99.25
C 99.52 99.36 99.49 99.75 R 99.43 99.20 99.55 99.21
D 99.25 99.68 99.43 99.49 S 99.53 99.57 99.62 99.54
E 99.45 99.22 99.40 99.49 T 99.50 99.22 99.21 99.75
F 99.41 99.56 99.75 99.41 U 99.27 99.34 99.72 99.55
G 99.53 99.21 99.31 99.35 V 99.26 99.66 99.54 99.38
H 99.33 99.51 99.37 99.32 W 99.30 99.34 99.45 99.49
I 99.77 99.34 99.73 99.52 X 99.43 99.45 99.46 99.70
J 99.48 99.24 99.39 99.26 Y 99.27 99.53 99.74 99.30
K 99.27 99.49 99.36 99.32 Z 99.44 99.36 99.70 99.34
L 99.45 99.22 99.49 99.57 Space 99.51 99.72 99.64 99.28
M 99.76 99.43 99.33 99.66 Nothing 99.26 99.58 99.58 99.63
N 99.20 99.48 99.73 99.28 Delete 99.30 99.28 99.20 99.31
O 99.65 99.42 99.21 99.20 Average 99.42 99.43 99.51 99.43
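Per-class measures like those reported in Table 1 can be computed from model predictions with standard tooling; a brief sketch (assuming integer-encoded true and predicted labels are available) is shown below.

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

def per_class_metrics(y_true, y_pred, class_names):
    """Per-class precision, recall, and F-score plus overall accuracy;
    y_true / y_pred are integer class indices. Purely illustrative."""
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=range(len(class_names)), zero_division=0)
    for name, pi, ri, fi in zip(class_names, p, r, f):
        print(f"{name:8s} precision={pi:.4f} recall={ri:.4f} f-score={fi:.4f}")
    print(f"overall accuracy={accuracy_score(y_true, y_pred):.4f}")

# Toy usage with three classes.
per_class_metrics([0, 1, 2, 2], [0, 1, 2, 1], ["A", "B", "C"])
```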

Fig. 3. Average outcome of SLDC-RSAHDL approach.

Table 2 and Fig. 4, Fig. 5 report detailed recognition outcomes of the SLDC-RSAHDL approach against other optimizers. The experimental values highlight that the RMSProp and Adam optimizers reached nearly equal performance, with accuy of 98.95 % and 98.93 %, respectively. Along with that, the SGD optimizer gains considerable outcomes with accuy of 99.28 %, precn of 99.19 %, recal of 99.24 %, and Fscore of 99.11 %. However, the SLDC-RSAHDL technique results in enhanced performance with accuy of 99.51 %, precn of 99.42 %, recal of 99.43 %, and Fscore of 99.43 %.

Table 2. Recognition outcome of SLDC-RSAHDL approach with distinct measures.

Methods Accuy Precn Recal FScore
SLDC-RSAHDL 99.51 99.42 99.43 99.43
SGD Optimizer 99.28 99.19 99.24 99.11
RMSProp Optimizer 98.95 99.02 99.19 99.08
Adam Optimizer 98.93 99.00 99.15 99.01

Fig. 4. Accuy and Fscore outcome of SLDC-RSAHDL approach.

Fig. 5. Precn and Recal outcome of SLDC-RSAHDL approach.

Fig. 6 inspects the accuracy of the compared existing techniques during the training and validation process on the test dataset. The figure shows that the existing techniques reach increasing accuracy values over increasing epochs. Moreover, the closeness of the validation accuracy to the training accuracy indicates that these methods learn effectively on the test dataset.

Fig. 6. Accuracy curve of other existing approaches.

The loss investigation of the existing systems during training and validation on the test dataset is exhibited in Fig. 7. The outcomes infer that the existing methods attain close values of training and validation loss, indicating that they learn effectively on the test dataset.

Fig. 7. Loss curve of other existing approaches.

Table 3 reports an overall comparison analysis of the SLDC-RSAHDL technique in terms of recognition rate (RR) and computation time (CT) [23]. In Fig. 8, a comparative RR investigation of the SLDC-RSAHDL technique with other models is performed. The results imply that the KNN model yields ineffective outcomes with the minimal RR of 97.29 %. At the same time, the SVM and ANN models accomplish considerably enhanced performance with close RRs of 98.31 % and 98.54 %, respectively. Concurrently, the CNN model accomplishes a reasonable RR of 99.12 %. But the SLDC-RSAHDL technique reaches the highest performance with an RR of 99.43 %.

Table 3. Comparative outcome of SLDC-RSAHDL system with other techniques.

Methods Recognition rate (%) Computation Time (min)
K-Nearest Neighbors 97.29 16.84
Support Vector Machine 98.31 15.10
Artificial Neural Network 98.54 14.36
Conv. Neural Network 99.12 11.26
SLDC-RSAHDL 99.43 6.14

Fig. 8. RR analysis of SLDC-RSAHDL approach with other algorithms.

In Fig. 9, a comparative CT examination of the SLDC-RSAHDL approach with other techniques is performed. The outcomes infer that the KNN system yields ineffective outcomes with the maximal CT of 16.84 min. Besides, the SVM and ANN algorithms obtain considerably superior performance with close CTs of 15.10 min and 14.36 min. The CNN method reaches a reasonable CT of 11.26 min. However, the SLDC-RSAHDL system attains effectual performance with a CT of 6.14 min.

Fig. 9. CT analysis of SLDC-RSAHDL approach with other algorithms.

From the detailed results and discussion, it can be concluded that the SLDC-RSAHDL algorithm achieves effectual performance in the SLR process.

5. Conclusion

In this study, we have introduced a novel SLDC-RSAHDL technique for the automated detection and classification of sign language using DL and metaheuristic optimization algorithms. It follows a four-stage process: MobileNet feature extraction, MRFO-based hyperparameter tuning, HDL-based SLR, and RSA-based parameter tuning. The MRFO and RSA algorithms assist in the effectual selection of the hyperparameters related to the MobileNet and HDL models, which results in an improved detection rate. The experimental result analysis of the SLDC-RSAHDL technique on a sign language dataset demonstrates its improved performance over other recent DL algorithms. In the future, the detection performance of the SLDC-RSAHDL technique can be boosted by designing fusion-based ensemble models.

Data Availability Statement

The data used in this article was not collected from any public repository. The data collected as responses for this study was collected from individuals working in the case organization.

Ethics approval

This article does not contain any studies with human participants performed by any of the authors.

Consent to Participate

Not applicable.

Funding details

None.

Informed Consent

Not applicable.

CRediT authorship contribution statement

Hadeel Alsolai: Conceptualization, Data curation, Funding acquisition, Methodology, Writing - original draft. Leen Alsolai: Conceptualization, Writing - original draft, Writing - review & editing. Fahd N. Al-Wesabi: Conceptualization, Writing - original draft, Writing - review & editing. Mahmoud Othman: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Mohammed Rizwanullah: Methodology, Software, Writing - original draft, Writing - review & editing. Amgad Atta Abdelmageed: Conceptualization, Data curation, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing interest

The authors declare that they have no conflict of interest. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Acknowledgment

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number RI-44-0522.

References

1. Kothadiya D., Bhatt C., Sapariya K., Patel K., Gil-González A.B., Corchado J.M. Deepsign: sign language detection and recognition using deep learning. Electronics. 2022;11:1780.
2. Katoch S., Singh V., Tiwary U.S. Indian Sign Language recognition system using SURF with SVM and CNN. Array. 2022;14.
3. Kamruzzaman M.M. Arabic sign language recognition and generating Arabic speech using convolutional neural network. Wireless Commun. Mobile Comput. 2020. doi: 10.1155/2020/3685614.
4. Zakariah M., Alotaibi Y.A., Koundal D., Guo Y., Mamun Elahi M. Sign language recognition for Arabic alphabets using transfer learning technique. Comput. Intell. Neurosci. 2022. doi: 10.1155/2022/4567989.
5. Bird J.J., Ekárt A., Faria D.R. British sign language recognition via late fusion of computer vision and leap motion with transfer learning to American sign language. Sensors. 2020;20:5151. doi: 10.3390/s20185151.
6. Mannan A., Abbasi A., Javed A.R., Ahsan A., Gadekallu T.R., Xin Q. Hypertuned deep convolutional neural network for sign language recognition. Comput. Intell. Neurosci. 2022. doi: 10.1155/2022/1450822. (Retracted)
7. Hameed H., Usman M., Khan M.Z., Hussain A., Abbas H., Imran M.A., Abbasi Q.H. Privacy-preserving British sign language recognition using deep learning. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2022. pp. 4316–4319.
8. Elakkiya R. Retracted article: machine learning based sign language recognition: a review and its research frontier. J. Ambient Intell. Hum. Comput. 2021;12:7205–7224.
9. Li D., Rodriguez C., Yu X., Li H. Word-level deep sign language recognition from video: a new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020. pp. 1459–1469.
10. Sharma S., Kumar K. ASL-3DCNN: American sign language recognition technique using 3-D convolutional neural networks. Multimed. Tool. Appl. 2021;80:26319–26331.
11. Pandey A., Chauhan A., Gupta A. Voice based Sign Language detection for dumb people communication using machine learning. J. Pharm. Negat. Results. 2023:22–30.
12. Aly S., Aly W. A novel signer-independent deep learning framework for isolated Arabic sign language gestures recognition. IEEE Access. 2020;8:83199–83212.
13. Gao Q., Ogenyi U.E., Liu J., Ju Z., Liu H. A two-stream CNN framework for American sign language recognition based on multimodal data fusion. In: Advances in Computational Intelligence Systems: Contributions Presented at the 19th UK Workshop on Computational Intelligence, Portsmouth, UK, September 4–6, 2019. Vol. 19. Springer International Publishing; 2020. pp. 107–118.
14. Lee C.K., Ng K.K., Chen C.H., Lau H.C., Chung S.Y., Tsoi T. American sign language recognition and training method with recurrent neural network. Expert Syst. Appl. 2021;167.
15. Rastgoo R., Kiani K., Escalera S. Hand sign language recognition using multi-view hand skeleton. Expert Syst. Appl. 2020;150.
16. Das S., Imtiaz M.S., Neom N.H., Siddique N., Wang H. A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst. Appl. 2023;213.
17. Cheng K.L., Yang Z., Chen Q., Tai Y.W. Fully convolutional networks for continuous sign language recognition. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020. Springer International Publishing; 2020. pp. 697–714.
18. Wang H., Lu F., Tong X., Gao X., Wang L., Liao Z. A model for detecting safety hazards in key electrical sites based on hybrid attention mechanisms and lightweight Mobilenet. Energy Rep. 2021;7:716–724.
19. Ganesh N., Shankar R., Čep R., Chakraborty S., Kalita K. Efficient feature selection using weighted superposition attraction optimization algorithm. Appl. Sci. 2023;13:3223.
20. Dang H.V., Tran-Ngoc H., Nguyen T.V., Bui-Tien T., De Roeck G., Nguyen H.X. Data-driven structural health monitoring using feature fusion and hybrid deep learning. IEEE Trans. Autom. Sci. Eng. 2020;18:2087–2103.
21. Stoean C., Zivkovic M., Bozovic A., Bacanin N., Strulak-Wójcikiewicz R., Antonijevic M., Stoean R. Metaheuristic-based hyperparameter tuning for recurrent deep learning: application to the prediction of solar energy generation. Axioms. 2023;12:266.
22. ASL Alphabet dataset, Kaggle. https://www.kaggle.com/datasets/grassknoted/asl-alphabet?select=asl_alphabet_test
23. Alrowais F., Alotaibi S.S., Dhahbi S., Marzouk R., Mohamed A., Hilal A.M. Sign Language recognition and classification model to enhance quality of disabled people. Comput. Mater. Contin. 2022;73:3419–3432.


