Heliyon. 2023 Dec 8;10(1):e23252. doi: 10.1016/j.heliyon.2023.e23252

Automated sign language detection and classification using reptile search algorithm with hybrid deep learning

Hadeel Alsolai a, Leen Alsolai a, Fahd N Al-Wesabi b, Mahmoud Othman c, Mohammed Rizwanullah d, Amgad Atta Abdelmageed d
PMCID: PMC10750143  PMID: 38148822

Abstract

Sign language recognition (SLR) provides the capability to convert sign language gestures into spoken or written language. This technology is helpful for deaf or hard-of-hearing persons by giving them a way to interact with people who do not know sign language. It can also be utilized for automatic captioning in live events and videos. There are distinct approaches to SLR, comprising deep learning (DL), computer vision (CV), and machine learning (ML). One common approach uses cameras to capture the signer's hand and body movements and processes the video data to recognize the gestures. The challenges of SLR include the variability of sign language across cultures and individuals, the difficulty of certain signs, and the need for real-time processing. This study introduces an Automated Sign Language Detection and Classification technique using the Reptile Search Algorithm with Hybrid Deep Learning (SLDC-RSAHDL). The presented SLDC-RSAHDL technique detects and classifies different types of signs using DL and metaheuristic optimizers. In the SLDC-RSAHDL technique, a MobileNet feature extractor is utilized to produce feature vectors, and its hyperparameters are adjusted by the manta ray foraging optimization (MRFO) technique. For sign language classification, the SLDC-RSAHDL technique applies a hybrid deep learning (HDL) model, which incorporates the design of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Finally, the RSA is exploited for the optimal hyperparameter selection of the HDL model, which results in an improved detection rate. The experimental result analysis of the SLDC-RSAHDL technique on a sign language dataset demonstrates the improved performance of the SLDC-RSAHDL system over other existing DL techniques.

Keywords: Sign language, Deep learning, Computer vision, Reptile search algorithm, Intelligent models

1. Introduction

Sign language is a comprehensive, complex natural language that conveys meaning through signs formed by the actions of the hands in association with facial expressions [1]. It is employed for communication by individuals with little or no hearing. Sign language can be used to communicate words, letters, or sentences by employing diverse hand gestures [2]. This kind of communication makes it simple for hearing-challenged individuals to express their opinions and helps bridge the communication gap between hearing and hearing-challenged individuals. People have used sign language to communicate since ancient times [3]. Hand signs are as old as human civilization itself, and hand gestures are particularly advantageous for expressing an emotion or word. Hence, humans around the globe regularly employ hand gestures to express themselves, despite the existence of writing conventions [4]. Recently, much research has focused on developing systems that can classify gestures of diverse sign languages into given classes. Such systems have found applications in robot control, natural language communication, virtual reality environments, and games [5]. The automated identification of human gestures is a complex multi-disciplinary problem that has not yet been fully resolved. In recent years, a number of methods have been employed that involve the implementation of ML procedures for sign language identification [6]. Since the advent of DL methods, there have been attempts to identify human gestures with them.

To identify gestures, diverse features such as articulated models and hand-crafted spatio-temporal descriptors have been employed together with gesture classifiers; conditional random fields [7], hidden Markov models, and Support Vector Machines (SVM) have been used extensively. However, the classification of signs under changing illumination conditions and across diverse subjects remains a challenging issue [8]. An intuitive approach to building such interfaces is to observe the user's muscle activity, which a device can record using a camera [9]. The recorded imagery can then be analyzed by DL algorithms to determine the gesture. In recent times, classification with deep convolutional neural networks (DCNNs) has been effective in several recognition challenges [10]. Multi-column DCNNs that combine several parallel networks have been demonstrated to improve the recognition rates of single networks.

This study introduces an Automated Sign Language Detection and Classification technique using the Reptile Search Algorithm with Hybrid Deep Learning (SLDC-RSAHDL). In the SLDC-RSAHDL technique, a MobileNet feature extractor is utilized to produce feature vectors, and its hyperparameters are adjusted by the manta ray foraging optimization (MRFO) technique. For sign language classification, the SLDC-RSAHDL technique applies an HDL model, which incorporates the design of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). Finally, the RSA is exploited for the optimal hyperparameter selection of the HDL model, which results in an improved detection rate. The experimental evaluation of the SLDC-RSAHDL algorithm was executed on a sign language database.

2. Literature review

Pandey et al. [11] proposed a novel Feed Forward Neural Network (FFNN) model that can automatically identify sign language to help hearing people communicate more efficiently with visually, hearing, or speech impaired individuals. The scheme recognizes hand gestures through feature point extraction fed to the FFNN. A hand gesture recognition with voice conversion scheme implementing a Hidden Markov Model (HMM) is employed to enable communication between hearing and mute individuals. In Ref. [12], a new framework is suggested for signer-independent sign language identification employing several DL constructions comprising hand semantic segmentation, a Deep Recurrent Neural Network (DRNN), and hand shape feature representation. Hand shape features are extracted by implementing a single-layer Convolutional Self-Organizing Map (CSOM) rather than depending on transfer learning (TL) of pre-trained deep CNNs (DCNNs). The sequence of extracted feature vectors is then recognized by a BiLSTM-RNN.

In [13], a two-stream CNN (2S-CNN) framework was suggested to identify American Sign Language (ASL) hand signs founded on multi-modal (RGB and depth) data fusion. Initially, the hand sign information was enhanced to eliminate the impact of noise and background. Next, hand sign RGB and depth features were extracted for hand sign detection by corresponding CNNs on the two streams. Lee et al. [14] suggested an ASL learning application prototype. The application is a whack-a-mole game with an embedded real-time gesture identification scheme. As both dynamic and static gestures (J, Z) are present in the ASL alphabet, an LSTM-RNN with a KNN technique was adopted as the classification method, founded on the handling of a series of inputs. Features like angles between fingers, distances between finger positions, and sphere radius were extracted as inputs for the classification prototype.

Rastgoo et al. [15] suggested a new DL-founded pipeline for effective automatic hand sign language identification implementing a 2DCNN, Single Shot Detector (SSD), 3DCNN, and LSTM on RGB input videos. The authors employ a CNN-founded prototype that estimates 3D hand keypoints from 2D input frames. Das et al. [16] suggested a fusion prototype comprising deep TL founded on a CNN with an RF classifier for the automatic identification of Bangla Sign Language (BSL) (numeric and alphabetical symbols). 'Ishara-Bochon' and 'Ishara-Lipi' are datasets of isolated numeric and alphabetical symbols, respectively, constituting the first comprehensive multi-purpose open-access datasets for BSL. The authors also suggested a background elimination protocol that removes needless elements from the gesture imagery. The authors of [17] suggest a Fully Convolutional Network (FCN) for online SLR to simultaneously learn temporal and spatial features from weakly annotated video sequences with only sentence-level annotations provided. A Gloss Feature Enhancement (GFE) module is presented in the suggested network to enforce better sequence alignment learning.

3. The proposed model

In this article, we have introduced a new SLDC-RSAHDL technique for the automated detection and classification of sign language using DL and metaheuristic optimization algorithms. It follows a four-stage process: MobileNet feature extraction, MRFO-based hyperparameter tuning, HDL-based sign language recognition (SLR), and RSA-based parameter tuning. Fig. 1 depicts the overall flow of the SLDC-RSAHDL approach.

Fig. 1. Overall flow of SLDC-RSAHDL approach.

3.1. Feature extraction using MobileNet

The basic principle of a lightweight model is to develop efficient network computation for convolution models that minimizes the number of parameters and the computation time while guaranteeing detection performance. The core of MobileNet is the depthwise separable convolution, first proposed by Sifre in 2014, which splits the typical convolutional layer into a depthwise convolutional layer and a pointwise convolutional layer. In effect, the summation and convolution in the classical convolutional model are decoupled, such that the computation speed is increased and the number of weight parameters evaluated by the network can be decreased considerably [18].

Consider that the length and width of the output and input are constant. Given an input feature map of length DF, width DF, and M channels, and a convolutional kernel of height DK and width DK, the typical convolution outputs a feature map of length DF, width DF, and N channels; denote this output by G. The computational cost of the typical convolution is DK x DK x M x N x DF x DF. This convolution process is written mathematically as:

G_{k,l,n} = \sum_{i,j,m} K_{i,j,m,n}\, F_{k+i-1,\, l+j-1,\, m} \quad (1)

The computation of every output element G requires summing over each input channel m. Depthwise separable convolution factors this summation over m out into a separate step.

The depthwise separable convolutional layer thus splits the classical convolution kernel into a per-channel convolution part and a channel-summation part. In this case, the pointwise convolution map has a single parameter, the number of output features N, whereas the depthwise convolution map has three parameters: the number of input features M, the length DK, and the width DK. The original four parameters are split into one and three parameters; hence the mathematical model changes to:

\hat{G}_{k,l,m} = \sum_{i,j} \hat{K}_{i,j,m}\, F_{k+i-1,\, l+j-1,\, m} \quad (2)
G_{k,l,n} = \sum_{m} \hat{G}_{k,l,m}\, K_{m,n} \quad (3)

where K represents the convolution kernel of the pointwise convolution and K̂ represents the convolution kernel of the depthwise convolution. Combining the two stages gives:

G_{k,l,n} = \sum_{m} \sum_{i,j} \hat{K}_{i,j,m}\, F_{k+i-1,\, l+j-1,\, m}\, K_{m,n} \quad (4)

For the depthwise separable convolution, the number of convolution operations is therefore evaluated in two stages. First, M kernels of size DK x DK are moved DF x DF times; next, N kernels of size 1 x 1 x M are moved DF x DF times. The overall number of operations is obtained by adding these two terms, giving DK x DK x M x DF x DF + M x N x DF x DF. The ratio of the computational work of the depthwise separable convolutional layer to that of the typical convolution can be given as follows:

\frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2} \quad (5)

The above formula demonstrates that the computational saving grows with DK and N. Furthermore, the kernels of the depthwise convolutional layer in MobileNet are 3 × 3, so in practice the computation of the depthwise separable convolutional layer is 1/8 to 1/9 of that of the typical convolution, thereby achieving the goal of improving the computational speed of the network structure.
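To make the factorization concrete, the following minimal PyTorch sketch (our illustration, not code from the paper) builds a MobileNet-style depthwise separable block; the channel counts and input size are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution as used in MobileNet: a depthwise 3x3
    convolution (one filter per input channel) followed by a pointwise 1x1
    convolution that mixes the M channel outputs into N output channels."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: groups=in_channels applies one D_K x D_K kernel per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1, groups=in_channels)
        # Pointwise: 1x1 convolution corresponding to the channel summation K_{m,n}.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Cost check for Eq. (5): with D_K = 3 and N = 64 output channels,
# 1/N + 1/D_K^2 = 1/64 + 1/9 ≈ 0.127, i.e. roughly 1/8 of a standard convolution.
block = DepthwiseSeparableConv(32, 64)
x = torch.randn(1, 32, 56, 56)   # (batch, M, D_F, D_F)
print(block(x).shape)            # torch.Size([1, 64, 56, 56])
```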

3.2. Hyperparameter tuning using MRFO algorithm

For the hyperparameter tuning process of the MobileNet model, the MRFO technique is employed. The MRFO algorithm simulates three foraging behaviors for updating the solution positions: chain, cyclone, and somersault foraging [19]. The mathematical model of every foraging behavior is described below.

Chain foraging: A foraging chain is formed when manta rays line up head-to-tail. In each iteration, the optimum solution found so far is utilized for updating every individual, as the following mathematical model demonstrates:

x_i^d(t+1) = \begin{cases} x_i^d(t) + r\,(x_{best}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_i^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \alpha\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (6)
\alpha = 2r\sqrt{|\log(r)|}

where N signifies the population size, r denotes a random vector in [0, 1], x_i^d(t) refers to the i-th individual's position in the d-th dimension at the t-th iteration, α implies the weight coefficient, and x_{best}^d(t) stands for the plankton with maximal concentration (the optimum solution gained so far).

Cyclone foraging: When the manta rays spot food, they form a long foraging chain and swim toward the food in a spiral. The following mathematical formula defines the cyclone foraging behavior:

x_i^d(t+1) = \begin{cases} x_{best}^d(t) + r\,(x_{best}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 1 \\ x_{best}^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \beta\,(x_{best}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (7)
\beta = 2 e^{r_1 \frac{T-t+1}{T}} \sin(2\pi r_1)

in which β and T signify the weight factor and the maximal iteration count, respectively, and r_1 denotes a random value between zero and one.

The exploration capability of the algorithm is improved by updating positions with respect to a random reference position, using the following mathematical process:

x_{rand}^d = Lb^d + r\,(Ub^d - Lb^d) \quad (8)
x_i^d(t+1) = \begin{cases} x_{rand}^d(t) + r\,(x_{rand}^d(t) - x_i^d(t)) + \beta\,(x_{rand}^d(t) - x_i^d(t)), & i = 1 \\ x_{rand}^d(t) + r\,(x_{i-1}^d(t) - x_i^d(t)) + \beta\,(x_{rand}^d(t) - x_i^d(t)), & i = 2, \dots, N \end{cases} \quad (9)

where x_{rand}^d denotes a random position in the search space, and Ub^d and Lb^d imply the upper and lower limits of the d-th dimension, respectively.

Somersault foraging: Here the food position is considered a pivot; every individual swims toward or around the pivot and then somersaults to a new position. The equivalent mathematical formula is given as follows:

x_i^d(t+1) = x_i^d(t) + S\,(r_2\, x_{best}^d - r_3\, x_i^d(t)), \quad i = 1, \dots, N \quad (10)

where S denotes the somersault factor, and r_2 and r_3 signify random numbers between zero and one.
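To illustrate how the three foraging behaviors combine into an optimizer loop, here is a minimal NumPy sketch of MRFO following the equations above; the population size, iteration budget, somersault factor S = 2, and the even split between chain and cyclone foraging are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mrfo(fitness, dim, n=30, T=100, lb=-1.0, ub=1.0):
    """Minimal MRFO sketch (Eqs. (6)-(10)); minimizes `fitness` over [lb, ub]^dim."""
    X = lb + np.random.rand(n, dim) * (ub - lb)

    def best_of(P):
        f = np.apply_along_axis(fitness, 1, P)
        return P[f.argmin()].copy(), f.min()

    best, best_f = best_of(X)
    for t in range(1, T + 1):
        for i in range(n):
            r = np.random.rand(dim)
            prev = best if i == 0 else X[i - 1]          # head-to-tail chain neighbor
            if np.random.rand() < 0.5:                   # cyclone foraging, Eq. (7)
                r1 = np.random.rand()
                beta = 2 * np.exp(r1 * (T - t + 1) / T) * np.sin(2 * np.pi * r1)
                # early iterations explore around a random reference, Eqs. (8)-(9)
                ref = (lb + np.random.rand(dim) * (ub - lb)) if t / T < np.random.rand() else best
                X[i] = ref + r * (prev - X[i]) + beta * (ref - X[i])
            else:                                        # chain foraging, Eq. (6)
                alpha = 2 * r * np.sqrt(np.abs(np.log(np.random.rand(dim))))
                X[i] = X[i] + r * (prev - X[i]) + alpha * (best - X[i])
        # somersault foraging, Eq. (10): pivot all individuals around the best position
        S, r2, r3 = 2.0, np.random.rand(n, dim), np.random.rand(n, dim)
        X = np.clip(X + S * (r2 * best - r3 * X), lb, ub)
        cand, cand_f = best_of(X)
        if cand_f < best_f:
            best, best_f = cand, cand_f
    return best, best_f

best, f = mrfo(lambda x: float(np.sum(x ** 2)), dim=5)   # toy quadratic test
```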

3.3. Sign language classification using optimal HDL model

In this work, the classification of signs takes place via the HDL model. CNNs provide better accuracy in pattern recognition and classification for two major reasons: first, their structure is highly suited to determining local connections among data points; second, they decrease the number of network parameters [20], thus resulting in lower computational complexity than a traditional plain neural network architecture. Fig. 2 displays the structure of the CNN. The equation of one standard convolution layer is formulated by Eq. (11):

X_{conv} = \mathrm{conv1D}(W_{conv}, X) \quad (11)

where X_conv and W_conv denote the output vector and the weight matrix of the convolutional layer, respectively, X indicates the sensor input, and conv1D indicates the 1D convolution operator. The hyperparameters of the convolutional layer are the kernel length L_k, representing the number of neighboring data points aggregated, and the number of kernels N_k, representing the number of local features extracted.

Fig. 2. Structure of CNN.

Then, X_conv is fed into the LSTM layer, which exploits data at many preceding time steps to gain insight into the current time step, a property referred to as "long-term dependency". Introduce L, a classical linear transformation of the concatenation of X_t^conv, with N_k features at time step t, and the hidden state h_{t-1}, with N_h features at the prior step:

L(h_{t-1}, X_t^{conv}) = W\,[h_{t-1}, X_t^{conv}] + b \quad (12)

In Eq. (12), W and b denote the weight matrix and bias vector; it is noteworthy that the number of features of L is equal to that of the hidden output h. Every LSTM cell includes three gates, namely the forget gate f_f, input gate f_i, and output gate f_o, each of which applies the nonlinear sigmoid function σ to a linear transformation L as follows:

f_f = \sigma(L_f(h_{t-1}, X_t^{conv}))
f_i = \sigma(L_i(h_{t-1}, X_t^{conv})) \quad (13)
f_o = \sigma(L_o(h_{t-1}, X_t^{conv}))

At the same time, a new candidate state produced at time step t is evaluated by applying the tanh activation function to a linear transformation of the concatenation [h_{t-1}; X_t^{conv}]:

C_t = \tanh(L_c(h_{t-1}, X_t^{conv})) \quad (14)

Next, the candidate is combined into the LSTM cell state:

s_t = (f_f \odot h_{t-1}) \oplus (f_i \odot C_t) \quad (15)

and the hidden output of the LSTM cell at time step t is evaluated at the output gate:

h_t = f_o \odot s_t \quad (16)

where ⊕ and ⊙ represent component-wise addition and multiplication of two vectors, respectively. As soon as input data enter the network, they are split into fixed-length segments; the 1D-CNN layer then extracts local connections among neighboring data points before feeding the memory cells of the LSTM, where long-term dependency is recognized and preserved over time. In this hybrid DL structure, the hyperparameters that need to be further defined are the size of the hidden output N_h, the number of kernels N_k, and the kernel length L_k of the convolutional layer feeding the LSTM cells.
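As an illustration of the hybrid architecture described above (not the authors' code), the following minimal PyTorch sketch chains a 1D convolution into an LSTM classifier; the feature width 1280, the sequence length, and the 29-class output are placeholder assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of the hybrid CNN-LSTM classifier. The hyperparameters N_k
    (kernels), L_k (kernel length), and N_h (hidden size) are the ones the
    RSA is said to tune; the defaults here are placeholders."""
    def __init__(self, in_features, n_classes, Nk=64, Lk=3, Nh=128):
        super().__init__()
        # 1D convolution extracts local patterns across neighboring steps (Eq. (11)).
        self.conv = nn.Conv1d(in_features, Nk, kernel_size=Lk, padding=Lk // 2)
        # LSTM captures long-term dependencies over the convolved sequence (Eqs. (12)-(16)).
        self.lstm = nn.LSTM(input_size=Nk, hidden_size=Nh, batch_first=True)
        self.head = nn.Linear(Nh, n_classes)

    def forward(self, x):                 # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))  # -> (batch, Nk, time)
        z, _ = self.lstm(z.transpose(1, 2))
        return self.head(z[:, -1])        # classify from the last hidden state

model = CNNLSTM(in_features=1280, n_classes=29)   # e.g. MobileNet feature vectors
logits = model(torch.randn(4, 10, 1280))          # (batch=4, 10 steps, 1280 features)
print(logits.shape)                               # torch.Size([4, 29])
```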

Finally, the RSA adjusts the hyperparameter values of the HDL model. The highly coordinated and cooperative hunting behavior demonstrated by crocodiles, which includes encircling the target and then hunting it, was the inspiration for the reptile search algorithm (RSA) [21].

X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,n-1} & x_{1,n} \\ x_{2,1} & \cdots & x_{2,j} & \cdots & \cdots & x_{2,n} \\ \vdots & & x_{i,j} & & & \vdots \\ x_{N-1,1} & \cdots & x_{N-1,j} & \cdots & \cdots & x_{N-1,n} \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,n-1} & x_{N,n} \end{bmatrix} \quad (17)

The initialization stage begins with generating the matrix X of random solutions x_{i,j} based on Eq. (17), where n denotes the dimensionality of the specific problem, i represents the index of the individual, j shows its current location, and N represents the overall number of individuals.

x_{i,j} = \mathrm{rand} \times (UB - LB) + LB, \quad j = 1, 2, \dots, n \quad (18)

Eq. (18) produces random individuals. Here, rand represents a random number within the range [0, 1], and LB and UB represent the lower and upper bounds of the search space. The search process is split into two major procedures (encircling the prey, followed by the attack), accompanied by four distinct behaviors emphasizing exploration and exploitation. Exploration exploits two walking strategies demonstrated by crocodiles: the stomach walk and the elevated walk. The key objective of the crocodile is to extend the search region, which helps in the subsequent hunting stage. The elevated walk is used if t ≤ T/4, whereas the stomach walk is triggered if t > T/4 and t ≤ 2T/4. Eq. (19) is responsible for updating the position of the crocodile:

x_{i,j}(t+1) = \begin{cases} Best_j(t) \times \eta_{i,j}(t) \times \beta - R_{i,j}(t) \times \mathrm{rand}, & t \le \frac{T}{4} \\ Best_j(t) \times x_{r_1,j} \times ES(t) \times \mathrm{rand}, & t > \frac{T}{4} \ \text{and} \ t \le \frac{2T}{4} \end{cases} \quad (19)
\eta_{i,j} = Best_j(t) \times P_{i,j} \quad (20)

In Eq. (19), T shows the maximal number of iterations, Best_j represents the present optimum individual at the j-th position, and t denotes the ongoing iteration. The hunting operator η_{i,j} is determined by Eq. (20), and β is a sensitivity parameter fixed at 0.1 that governs the exploration performance.

The search space is shrunk by a reduction function, determined using Eq. (21), where r_1 denotes a random integer ranging from 1 to N, x_{r_1,j} signifies the j-th position of a randomly chosen solution, and ϵ represents a small value.

R_{i,j} = \frac{Best_j(t) - x_{r_1,j}}{Best_j(t) + \epsilon} \quad (21)

Eq. (22) evaluates a probability ratio, named "Evolutionary Sense", that arbitrarily alternates in [-2, 2] as the rounds pass:

ES(t) = 2 \times r_2 \times \left(1 - \frac{t}{T}\right) \quad (22)

where r_2 indicates a random value inside [-1, 1].

Eq. (23) defines the percentage difference between the position of the observed individual and the best-obtained individual:

P_{i,j} = \alpha + \frac{x_{i,j} - M(x_i)}{Best_j(t) \times (UB_j - LB_j) + \epsilon} \quad (23)

In Eq. (23), α denotes a sensitivity parameter with the predetermined value 0.1, which controls the fluctuation among candidate individuals during cooperative hunting. The upper and lower boundaries of the j-th position are indicated by UB_j and LB_j, respectively.

The average position M(x_i) of the i-th individual is expressed as follows:

M(x_i) = \frac{1}{n} \sum_{j=1}^{n} x_{i,j} \quad (24)

The RSA exploitation process is divided into hunting coordination (if t ≤ 3T/4 and t > T/2) and hunting cooperation (if t ≤ T and t > 3T/4) strategies, which aim to strengthen the local investigation of the search realm and move closer to the optimum individual. The hunting behavior shown by the crocodile is expressed as:

x_{i,j}(t+1) = \begin{cases} Best_j(t) \times P_{i,j}(t) \times \mathrm{rand}, & t \le \frac{3T}{4} \ \text{and} \ t > \frac{T}{2} \\ Best_j(t) - \eta_{i,j}(t) \times \epsilon - R_{i,j}(t) \times \mathrm{rand}, & t \le T \ \text{and} \ t > \frac{3T}{4} \end{cases} \quad (25)

The basic RSA has a time complexity of O(N × (T × D + 1)), where N indicates the number of candidates, T represents the number of rounds, and D denotes the dimensionality of the solution space. The RSA method uses a fitness function (FF) to obtain a superior classification result; the FF yields a positive value, with smaller values representing better candidate outcomes. In this work, the minimization of the classifier error rate is taken as the FF, formulated in Eq. (26):

\mathrm{fitness}(x_i) = \mathrm{ClassifierErrorRate}(x_i) = \frac{\text{number of misclassified samples}}{\text{total number of samples}} \times 100 \quad (26)
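To make the RSA loop and the fitness in Eq. (26) concrete, a compact NumPy sketch follows. It is our simplified reading of Eqs. (17)-(25) (per-dimension updates, fixed α = β = 0.1), not the authors' implementation; in practice `fitness` would train the HDL model with the candidate hyperparameters and return the validation error rate of Eq. (26), while the toy quadratic below merely checks that the loop runs.

```python
import numpy as np

def rsa(fitness, lb, ub, N=20, T=50, alpha=0.1, beta=0.1, eps=1e-10):
    """Compact RSA sketch (Eqs. (17)-(25)); minimizes `fitness` over [lb, ub]^n."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n = lb.size
    X = lb + np.random.rand(N, n) * (ub - lb)                 # Eq. (18)
    f = np.apply_along_axis(fitness, 1, X)
    best, best_f = X[f.argmin()].copy(), f.min()
    for t in range(1, T + 1):
        ES = 2 * np.random.uniform(-1, 1) * (1 - t / T)       # Eq. (22)
        for i in range(N):
            for j in range(n):
                r1 = np.random.randint(N)
                R = (best[j] - X[r1, j]) / (best[j] + eps)    # Eq. (21)
                P = alpha + (X[i, j] - X[i].mean()) / (best[j] * (ub[j] - lb[j]) + eps)  # Eqs. (23)-(24)
                eta = best[j] * P                             # Eq. (20)
                r = np.random.rand()
                if t <= T / 4:                                # elevated walk, Eq. (19)
                    x_new = best[j] * eta * beta - R * r
                elif t <= T / 2:                              # stomach walk, Eq. (19)
                    x_new = best[j] * X[r1, j] * ES * r
                elif t <= 3 * T / 4:                          # hunting coordination, Eq. (25)
                    x_new = best[j] * P * r
                else:                                         # hunting cooperation, Eq. (25)
                    x_new = best[j] - eta * eps - R * r
                X[i, j] = np.clip(x_new, lb[j], ub[j])
        f = np.apply_along_axis(fitness, 1, X)
        if f.min() < best_f:
            best, best_f = X[f.argmin()].copy(), f.min()
    return best, best_f

# Toy usage: stand-in error surface in place of the Eq. (26) error rate.
best, err = rsa(lambda x: float(np.sum((x - 0.3) ** 2)), lb=[0, 0], ub=[1, 1])
```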

4. Experimental evaluation

In this section, the SLR performance of the SLDC-RSAHDL technique is studied using the ASL alphabet dataset from the Kaggle repository [22]. The dataset contains images of the American Sign Language alphabet, divided into 29 folders corresponding to the classes. Table 1 and Fig. 3 offer detailed recognition results of the SLDC-RSAHDL technique under the 29 classes. The results indicate that the SLDC-RSAHDL technique performs proficiently in each class. At the same time, it is noticed that the SLDC-RSAHDL technique accomplishes effectual outcomes with an average precn of 99.42 %, recal of 99.43 %, accuy of 99.51 %, and Fscore of 99.43 %.

Table 1. Classifier outcome of SLDC-RSAHDL approach under 29 classes.

Sign Precision Recall Accuracy F-Score Sign Precision Recall Accuracy F-Score
A 99.25 99.80 99.65 99.69 P 99.57 99.33 99.58 99.32
B 99.39 99.45 99.70 99.20 Q 99.53 99.41 99.43 99.25
C 99.52 99.36 99.49 99.75 R 99.43 99.20 99.55 99.21
D 99.25 99.68 99.43 99.49 S 99.53 99.57 99.62 99.54
E 99.45 99.22 99.40 99.49 T 99.50 99.22 99.21 99.75
F 99.41 99.56 99.75 99.41 U 99.27 99.34 99.72 99.55
G 99.53 99.21 99.31 99.35 V 99.26 99.66 99.54 99.38
H 99.33 99.51 99.37 99.32 W 99.30 99.34 99.45 99.49
I 99.77 99.34 99.73 99.52 X 99.43 99.45 99.46 99.70
J 99.48 99.24 99.39 99.26 Y 99.27 99.53 99.74 99.30
K 99.27 99.49 99.36 99.32 Z 99.44 99.36 99.70 99.34
L 99.45 99.22 99.49 99.57 Space 99.51 99.72 99.64 99.28
M 99.76 99.43 99.33 99.66 Nothing 99.26 99.58 99.58 99.63
N 99.20 99.48 99.73 99.28 Delete 99.30 99.28 99.20 99.31
O 99.65 99.42 99.21 99.20 Average 99.42 99.43 99.51 99.43
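Per-class measures like those reported in Table 1 can be computed from model predictions with standard tooling; a brief sketch (assuming integer-encoded true and predicted labels are available) is shown below.

```python
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

def per_class_metrics(y_true, y_pred, class_names):
    """Per-class precision, recall, and F-score plus overall accuracy;
    y_true / y_pred are integer class indices. Purely illustrative."""
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=range(len(class_names)), zero_division=0)
    for name, pi, ri, fi in zip(class_names, p, r, f):
        print(f"{name:8s} precision={pi:.4f} recall={ri:.4f} f-score={fi:.4f}")
    print(f"overall accuracy={accuracy_score(y_true, y_pred):.4f}")

# Toy usage with three classes.
per_class_metrics([0, 1, 2, 2], [0, 1, 2, 1], ["A", "B", "C"])
```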

Fig. 3. Average outcome of SLDC-RSAHDL approach.

Table 2 and Fig. 4, Fig. 5 report detailed recognition outcomes of the SLDC-RSAHDL approach against other optimizers. The experimental values highlight that the RMSProp and Adam optimizers reached nearly equal performance, with accuy of 98.95 % and 98.93 %, respectively. Along with that, the SGD optimizer gains considerable outcomes with accuy of 99.28 %, precn of 99.19 %, recal of 99.24 %, and Fscore of 99.11 %. However, the SLDC-RSAHDL technique results in enhanced performance with accuy of 99.51 %, precn of 99.42 %, recal of 99.43 %, and Fscore of 99.43 %.

Table 2. Recognition outcome of SLDC-RSAHDL approach with distinct measures.

Methods Accuy Precn Recal FScore
SLDC-RSAHDL 99.51 99.42 99.43 99.43
SGD Optimizer 99.28 99.19 99.24 99.11
RMSProp Optimizer 98.95 99.02 99.19 99.08
Adam Optimizer 98.93 99.00 99.15 99.01

Fig. 4. Accuy and Fscore outcome of SLDC-RSAHDL approach.

Fig. 5. Precn and Recal outcome of SLDC-RSAHDL approach.

Fig. 6 inspects the accuracy of the compared existing techniques during the training and validation process on the test dataset. The figure shows that the existing techniques reach increasing accuracy values over increasing epochs. Moreover, the closeness of the validation accuracy to the training accuracy indicates that these methods learn effectively on the test dataset.

Fig. 6. Accuracy curve of other existing approaches.

The loss investigation of the existing systems during training and validation on the test dataset is exhibited in Fig. 7. The outcomes infer that the existing methods attain close values of training and validation loss, indicating that they learn effectively on the test dataset.

Fig. 7. Loss curve of other existing approaches.

Table 3 reports an overall comparison analysis of the SLDC-RSAHDL technique in terms of recognition rate (RR) and computation time (CT) [23]. In Fig. 8, a comparative RR investigation of the SLDC-RSAHDL technique with other models is performed. The results imply that the KNN model yields ineffective outcomes with the minimal RR of 97.29 %. At the same time, the SVM and ANN models accomplish considerably enhanced performance with close RRs of 98.31 % and 98.54 %, respectively. Concurrently, the CNN model accomplishes a reasonable RR of 99.12 %. But the SLDC-RSAHDL technique reaches the highest performance with an RR of 99.43 %.

Table 3. Comparative outcome of SLDC-RSAHDL system with other techniques.

Methods Recognition rate (%) Computation Time (min)
K-Nearest Neighbors 97.29 16.84
Support Vector Machine 98.31 15.10
Artificial Neural Network 98.54 14.36
Conv. Neural Network 99.12 11.26
SLDC-RSAHDL 99.43 6.14

Fig. 8. RR analysis of SLDC-RSAHDL approach with other algorithms.

In Fig. 9, a comparative CT examination of the SLDC-RSAHDL approach with other techniques is performed. The outcomes infer that the KNN system yields ineffective outcomes with the maximal CT of 16.84 min. Besides, the SVM and ANN algorithms obtain considerably superior performance with close CTs of 15.10 min and 14.36 min. The CNN method reaches a reasonable CT of 11.26 min. However, the SLDC-RSAHDL system attains effectual performance with a CT of 6.14 min.

Fig. 9. CT analysis of SLDC-RSAHDL approach with other algorithms.

From the detailed results and discussion, it can be concluded that the SLDC-RSAHDL algorithm achieves effectual performance in the SLR process.

5. Conclusion

In this study, we have introduced a novel SLDC-RSAHDL technique for the automated detection and classification of sign language using DL and metaheuristic optimization algorithms. It follows a four-stage process: MobileNet feature extraction, MRFO-based hyperparameter tuning, HDL-based SLR, and RSA-based parameter tuning. The MRFO and RSA algorithms assist in the effectual selection of the hyperparameters related to the MobileNet and HDL models, which results in an improved detection rate. The experimental result analysis of the SLDC-RSAHDL technique on a sign language dataset demonstrates its improved performance over other recent DL algorithms. In the future, the detection performance of the SLDC-RSAHDL technique can be boosted by designing fusion-based ensemble models.

Data Availability Statement

The data used in this article was not collected from any public repository. The data collected as responses for this study was collected from individuals working in the case organization.

Ethics approval

This article does not contain any studies with human participants performed by any of the authors.

Consent to Participate

Not applicable.

Funding details

None.

Informed Consent

Not applicable.

CRediT authorship contribution statement

Hadeel Alsolai: Conceptualization, Data curation, Funding acquisition, Methodology, Writing - original draft. Leen Alsolai: Conceptualization, Writing - original draft, Writing - review & editing. Fahd N. Al-Wesabi: Conceptualization, Writing - original draft, Writing - review & editing. Mahmoud Othman: Conceptualization, Methodology, Writing - original draft, Writing - review & editing. Mohammed Rizwanullah: Methodology, Software, Writing - original draft, Writing - review & editing. Amgad Atta Abdelmageed: Conceptualization, Data curation, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing interest

The authors declare that they have no conflict of interest. The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Acknowledgment

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number RI-44-0522.

References

1. Kothadiya D., Bhatt C., Sapariya K., Patel K., Gil-González A.B., Corchado J.M. Deepsign: sign language detection and recognition using deep learning. Electronics. 2022;11:1780.
2. Katoch S., Singh V., Tiwary U.S. Indian Sign Language recognition system using SURF with SVM and CNN. Array. 2022;14.
3. Kamruzzaman M.M. Arabic sign language recognition and generating Arabic speech using convolutional neural network. Wireless Commun. Mobile Comput. 2020. doi: 10.1155/2020/3685614.
4. Zakariah M., Alotaibi Y.A., Koundal D., Guo Y., Mamun Elahi M. Sign language recognition for Arabic alphabets using transfer learning technique. Comput. Intell. Neurosci. 2022. doi: 10.1155/2022/4567989.
5. Bird J.J., Ekárt A., Faria D.R. British sign language recognition via late fusion of computer vision and leap motion with transfer learning to American sign language. Sensors. 2020;20:5151. doi: 10.3390/s20185151.
6. Mannan A., Abbasi A., Javed A.R., Ahsan A., Gadekallu T.R., Xin Q. Hypertuned deep convolutional neural network for sign language recognition. Comput. Intell. Neurosci. 2022. doi: 10.1155/2022/1450822. (Retracted)
7. Hameed H., Usman M., Khan M.Z., Hussain A., Abbas H., Imran M.A., Abbasi Q.H. Privacy-preserving British sign language recognition using deep learning. In: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE; 2022. pp. 4316–4319.
8. Elakkiya R. Retracted article: machine learning based sign language recognition: a review and its research frontier. J. Ambient Intell. Hum. Comput. 2021;12:7205–7224.
9. Li D., Rodriguez C., Yu X., Li H. Word-level deep sign language recognition from video: a new large-scale dataset and methods comparison. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020. pp. 1459–1469.
10. Sharma S., Kumar K. ASL-3DCNN: American sign language recognition technique using 3-D convolutional neural networks. Multimed. Tool. Appl. 2021;80:26319–26331.
11. Pandey A., Chauhan A., Gupta A. Voice based Sign Language detection for dumb people communication using machine learning. J. Pharm. Negat. Results. 2023:22–30.
12. Aly S., Aly W. A novel signer-independent deep learning framework for isolated Arabic sign language gestures recognition. IEEE Access. 2020;8:83199–83212.
13. Gao Q., Ogenyi U.E., Liu J., Ju Z., Liu H. A two-stream CNN framework for American sign language recognition based on multimodal data fusion. In: Advances in Computational Intelligence Systems: Contributions Presented at the 19th UK Workshop on Computational Intelligence, Portsmouth, UK, September 4–6, 2019. Vol. 19. Springer International Publishing; 2020. pp. 107–118.
14. Lee C.K., Ng K.K., Chen C.H., Lau H.C., Chung S.Y., Tsoi T. American sign language recognition and training method with recurrent neural network. Expert Syst. Appl. 2021;167.
15. Rastgoo R., Kiani K., Escalera S. Hand sign language recognition using multi-view hand skeleton. Expert Syst. Appl. 2020;150.
16. Das S., Imtiaz M.S., Neom N.H., Siddique N., Wang H. A hybrid approach for Bangla sign language recognition using deep transfer learning model with random forest classifier. Expert Syst. Appl. 2023;213.
17. Cheng K.L., Yang Z., Chen Q., Tai Y.W. Fully convolutional networks for continuous sign language recognition. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020. Springer International Publishing; 2020. pp. 697–714.
18. Wang H., Lu F., Tong X., Gao X., Wang L., Liao Z. A model for detecting safety hazards in key electrical sites based on hybrid attention mechanisms and lightweight Mobilenet. Energy Rep. 2021;7:716–724.
19. Ganesh N., Shankar R., Čep R., Chakraborty S., Kalita K. Efficient feature selection using weighted superposition attraction optimization algorithm. Appl. Sci. 2023;13:3223.
20. Dang H.V., Tran-Ngoc H., Nguyen T.V., Bui-Tien T., De Roeck G., Nguyen H.X. Data-driven structural health monitoring using feature fusion and hybrid deep learning. IEEE Trans. Autom. Sci. Eng. 2020;18:2087–2103.
21. Stoean C., Zivkovic M., Bozovic A., Bacanin N., Strulak-Wójcikiewicz R., Antonijevic M., Stoean R. Metaheuristic-based hyperparameter tuning for recurrent deep learning: application to the prediction of solar energy generation. Axioms. 2023;12:266.
22. ASL Alphabet dataset, Kaggle. https://www.kaggle.com/datasets/grassknoted/asl-alphabet?select=asl_alphabet_test
23. Alrowais F., Alotaibi S.S., Dhahbi S., Marzouk R., Mohamed A., Hilal A.M. Sign Language recognition and classification model to enhance quality of disabled people. Comput. Mater. Contin. 2022;73:3419–3432.


