Data in Brief. 2025 May 27;61:111703. doi: 10.1016/j.dib.2025.111703

Sign language detection dataset: A resource for AI-based recognition systems

Bindu Garg a, Manisha Kasar a,*, Priyanka Paygude a,*, Amol Dhumane b, Srinivas Ambala c, Jitendra Rajpurohit b, Abhay Sharma d, Vidula Meshram e, Amber Vats a, Achyut Kashyap a
PMCID: PMC12173717  PMID: 40534917

Abstract

Sign language is a vital mode of communication among deaf and hard-of-hearing populations. Automatic sign language detection using a deep learning model is the theme of this study. Hand gestures are classified into different signs by a Convolutional Neural Network (CNN) model. For training purposes, 26,000 images are available, with 1000 images for every alphabet letter, providing complete representation of the sign language gestures. Photos were taken under controlled lighting with a consistent black background to facilitate feature extraction. The data contains participants of various age groups, skin types, and hand shapes to enhance generalization. Data collection was standardized using an iPhone 15 Pro Max, a black background cloth, a tripod stand, and the remote-controlled DroidCam app to maintain consistency in image quality and framing. For diversity and realism, three participants contributed the images for each sign, resulting in a rich and diverse dataset. Data preprocessing methods such as resizing, grayscale conversion, normalization, and augmentation were applied to achieve the best data quality. Different data augmentation techniques, such as rotation, flipping, scaling, brightness change, and addition of Gaussian noise, were used to introduce variations in hand gestures and make models robust against various environmental conditions. The dataset was then partitioned into 70 % training, 15 % validation, and 15 % test sets to maximize model performance and ensure good generalization. Models trained on the dataset show high accuracy, reflecting its potential for real-world usage, such as accessibility tools for the deaf community, educational tools, and real-time sign language recognition systems.

Keywords: American Sign Language, Deep Learning, Convolutional Neural Network, Sign Language Recognition


Specifications Table

Subject Computer Science, Machine Learning, Artificial Intelligence
Specific subject area American Sign Language Data for Machine Learning and Deep Learning using Computer Vision
Type of data Images
Data collection This dataset contains high-resolution images of American Sign Language gestures organized into 26 folders, one for each letter of the ASL alphabet. All images are uniformly resized to 224 × 224 pixels and can optionally be converted to grayscale. Each folder contains 1000 images of its letter, maintaining an organized and consistent structure. For instance, the first folder holds 1000 images of the letter 'A', comprising 750 right-hand images and 250 left-hand images; the second folder contains 1000 images of the letter 'B' with the same distribution. This pattern continues across all 26 letters, ensuring uniform representation throughout the dataset. Images were named sequentially and saved in PNG format.
Data source location Bharati Vidyapeeth (Deemed to be University) College of Engineering, Dhankawadi, Pune 411043.
Latitude: 18.456610° (18° 27′ 24″ N)
Longitude: 73.856300° (73° 51′ 23″ E)
Data accessibility Repository name: Mendeley Data (https://data.mendeley.com/)
Data identification number: DOI: 10.17632/8fmvr9m98w.1
Direct URL to data: https://data.mendeley.com/datasets/8fmvr9m98w (SignAlphaSet, Versions 1 and 2)
Related research article None

1. Value of the Data

  • The American Sign Language (ASL) dataset consists of various ASL sign images, covering A–Z, captured from different angles and lighting conditions. Each sample is labeled with the corresponding ASL gesture, making it suitable for training sign language recognition models. The dataset includes different hand shapes, skin tones, backgrounds, and signer variations, improving model robustness for real-world applications.

  • The dataset facilitates communication for the deaf and hard-of-hearing community. It supports deep learning in gesture recognition, aids ASL learning apps, enhances Human–Computer Interaction (HCI), and improves AI-driven accessibility tools in robotics, healthcare, and virtual assistants.

  • The dataset can be used to train machine learning and deep learning models to automatically recognize and differentiate between various ASL signs, improving the accuracy and efficiency of sign language recognition systems. This enables real-time ASL-to-text or ASL-to-speech conversion.

  • ASL datasets contribute to the development of advanced computer vision and deep learning algorithms, enhancing object recognition and classification in various AI applications.

The American Sign Language (ASL) dataset, which contains 26,000 high-quality images, fills key gaps in current datasets by providing better data diversity, annotation quality, and usability for real-world applications. In contrast to datasets such as Kaggle's ASL dataset (2515 images, 2 participants, very limited skin tone diversity) and Roboflow's dataset (1728 images, very limited metadata), this dataset has three participants with diverse ages (18–50 years), skin tones, and hand shapes and records 26 static alphabet gestures using 750 right-hand and 250 left-hand images per letter.

Dynamic datasets such as N-WLASL (870 videos) emphasize movement but lack the consistent resolution and depth of annotation present in this dataset. Table 1 presents a complete comparison, demonstrating the dataset's distinct value for researchers interested in high-quality, varied ASL data.

Table 1.

Comparative table on datasets related to sign language.

Sr No. | Dataset Ref. | Repository | Total Images | Data Diversity | Unique Features | Annotation Quality | Interclass Variance
1 | [2] | www.kaggle.com | 2515 | 2 participants, limited skin tones, 24 letters | Basic labels, static gestures | Manual, basic labels (letter only) | 0.22
2 | [9] | roboflow.com | 1728 | Unknown demographics, 26 letters | Minimal metadata | Automated, minimal metadata | N/A
3 | [10] | ieee-dataport.org | 21,093 | 5 participants, moderate diversity (ages, skin tones), 26 letters | Multi-angle captures | Manual, includes angles | 0.19
4 | [11] | Kaggle | 870 (videos) | Unknown participants, diverse gestures (words/phrases) | Frame-based labels | Manual, frame-based labels | N/A
5 | [1] (our published dataset) | data.mendeley.com | 26,000 | 3 participants, diverse ages (18–50), skin tones, hand shapes, 26 letters | High resolution (1080 × 1920 px), controlled lighting, augmentation, landmark metadata | Manual, detailed metadata | 0.18

The dataset’s focus on high-resolution hand gesture images provides a strong foundation for static ASL recognition, with potential extensions to incorporate facial expressions and body postures via cross-modal fusion, enabling comprehensive sign language recognition systems.

2. Background

The existing limitations of sign language recognition datasets pose crucial challenges to building resilient AI systems. Three fundamental gaps in today's resources are (1) small scale (e.g., Kaggle's 2515 images, Roboflow's 1728 images), (2) limited participant diversity (usually ≤2 subjects), and (3) variable annotation quality. These limitations have direct consequences for model performance, with leading systems experiencing 15–20 % accuracy declines when tested across demographic groups [3].

The largest dynamic dataset (N-WLASL) contains only 870 variable-resolution videos [10], and prevailing static datasets lack standardized preprocessing workflows. This forces researchers to dedicate 30–40 % of project time to data cleaning instead of model development [5]. In addition, available resources rarely provide the multimodal landmarks (hand, face, pose) required for end-to-end sign language interpretation.

This dataset enables researchers to bypass common data quality problems and focus on building next-generation recognition systems. The combination of static images and early dynamic sequences (300 short video clips) permits direct application to real-time translation, while also providing a foundation for subsequent temporal modelling research.

Sign language is a sophisticated communication medium that depends on hand-shapes, facial expressions, and body positions. Although AI-driven sign language recognition has the potential to improve deaf and hard-of-hearing accessibility, it needs large, heterogeneous, and high-quality datasets to address issues such as intra-class variability, illumination effects, hand occlusions, and real-time processing requirements. Existing datasets tend to have small sample sizes, low participant diversity, poor image quality, and inconsistent annotation, which limit generalization and real-world usability. Integrating multiple modalities, such as hand gestures, facial expressions, and body postures, through cross-modal fusion techniques can significantly enhance recognition accuracy and robustness.

Furthermore, the dataset supports ethical AI development by reducing bias through its participant diversity. It enables more efficient machine learning models and aids the creation of real-time gesture recognition systems, promoting AI-based accessibility technology, virtual assistants, and smart environments.

3. Data Description

This dataset comprises 26,000 high-definition images of the American Sign Language (ASL) alphabet, A to Z, shown in Fig. 1. Each letter has 1000 images (750 right-hand and 250 left-hand), captured from multiple angles and lighting conditions. The images are high-resolution (1080 × 1920 pixels), ensuring detailed gesture recognition, and were taken in controlled settings against a black background using an iPhone 15 Pro Max. The images were captured in a randomized order, and the filenames and metadata therefore do not explicitly distinguish left-hand from right-hand gestures. Although all participants were told to use their dominant hand for consistency, this information was not encoded in the filenames or metadata. The dataset is specifically designed for AI-driven hand gesture recognition, and in particular for deep learning algorithms such as Convolutional Neural Networks (CNNs). The dark background was selected to maximize contrast with hand movements, minimizing noise and simplifying feature extraction for CNNs. In comparison to patterned or colored backgrounds, it provides consistent illumination and minimizes distraction. Subsequent versions will include more participants across age groups (e.g., children, elderly), hand sizes, and skin tones to counteract bias, as mentioned in the Ethics Statement. The dataset includes three subjects to add variation in hand shapes, skin colors, and orientations, increasing model generalizability, and the images are taken from different angles and illumination conditions to increase model robustness.

Fig. 1. Sample images of the sign language.

The entire dataset is organized into 26 folders, containing a total of 26,000 images of American Sign Language (ASL) gestures. The images are resized to a uniform size (224 × 224 pixels) and optionally converted to grayscale. Each folder corresponds to a specific letter of the alphabet, making organization simple and efficient. The first folder contains 1000 images of the letter A (750 right-hand images and 250 left-hand images). Similarly, the second folder holds 1000 images of the letter B with the same distribution. This pattern continues for all 26 letters, with each folder containing 1000 images, ensuring consistency across the dataset [1,2].

Sample filename format: B_21.jpg, where:

  • B represents the ASL letter,

  • 21 is the image index, and

  • .jpg is the file extension.

In addition to the static ASL (American Sign Language) hand gestures, a dynamic dataset was created to capture continuous hand movement patterns for every letter A–Z. Short video clips (approximately 2.5 s of motion at 30 FPS) were recorded for every letter using a webcam interface and saved in .avi format. Ten clips were captured per letter, each recording a short temporal sequence of gesture movements, and the frames extracted from each clip were provided as .jpg files in a per-letter folder.
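For reference, the following is a minimal sketch of how frames could be extracted from the .avi clips using OpenCV; the clip and output paths shown are placeholders, not the repository's actual layout.

```python
import os
import cv2

def extract_frames(video_path, out_dir):
    """Save every frame of an .avi clip as a sequentially numbered .jpg."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()      # ok becomes False at the end of the clip
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{count:03d}.jpg"), frame)
        count += 1
    cap.release()
    return count

# e.g. extract_frames("dynamic/A/clip_01.avi", "dynamic/A/clip_01_frames")
```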

This dataset is a starting point for time-series models such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks for dynamic gesture recognition, both of which can better represent the temporal patterns in sequential data such as dynamic gestures.

Although this initial work describes the dataset compilation and structuring effort, dynamic gesture recognition and modelling using LSTMs/RNNs will be part of future research initiatives using the dataset created during the project.

4. Experimental Design, Materials and Methods

4.1. Experimental design

The dataset comprises 26,000 images of hand signs representing the 26 alphabet letters of sign language, with 1000 images for each letter. Images were taken under different lighting conditions, hand orientations, and distances for robustness. To improve generalization, images were obtained from various viewing angles, and the dataset was diversified by including people of varied ages, skin types, and palm shapes.

Preprocessing consisted of resizing the images to a predetermined size, converting them to grayscale or retaining RGB, and normalizing pixel intensities. An indoor setting with evenly distributed light and a black background reduced noise and improved feature detection. Data cleaning was performed by eliminating duplicates, blurry images, and incorrect labels to ensure high dataset quality [3].

The data acquisition process consisted of the following steps, summarized in Fig. 2:

  i. Step 1: Images were collected under controlled conditions, with 1000 images captured for each letter across both left and right hands.

  ii. Step 2: Images were cropped and processed using a Python script.

  iii. Step 3: The image dataset was uploaded to the Mendeley Data repository.

Fig. 2. Data acquisition process.

The dataset was enhanced with a sequence of transformations to mimic realistic hand gesture variations in the wild [15]. The parameter values used were:

  • Rotation range: ±30 degrees

  • Width and height shift: up to 20 %

  • Shear range: 0.2

  • Zoom range: ±20 %

  • Horizontal flipping: Enabled

  • Fill mode: Nearest

In addition, input images were normalized using “rescale=1.0/255.”

These transformations were applied using the ImageDataGenerator utility from Keras.
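As an illustration, the augmentation pipeline above can be reproduced with Keras' ImageDataGenerator roughly as follows; the dataset directory name and batch size are assumptions, not values stated by the authors.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,         # ±30 degrees
    width_shift_range=0.2,     # up to 20 % horizontal shift
    height_shift_range=0.2,    # up to 20 % vertical shift
    shear_range=0.2,
    zoom_range=0.2,            # ±20 % zoom
    horizontal_flip=True,
    fill_mode="nearest",
    rescale=1.0 / 255,         # pixel normalization
)

# Stream augmented 224 x 224 batches from the 26 per-letter folders
# (the "dataset/" directory name is assumed).
train_gen = datagen.flow_from_directory(
    "dataset/", target_size=(224, 224),
    batch_size=32, class_mode="categorical",
)
```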

4.2. Dynamic gesture data

As part of our expanded dataset, we added dynamic gesture data to capture the time-dependent, continuous nature of ASL. The video clips range from 2 to 3 s in length, the average duration of most ASL signs, mirroring their normal speed and movement. This duration provides adequate temporal context to model both the static pose and the motion dynamics needed for effective recognition.

Data Collection Process: Participants: The three native ASL signers who took part in the static image dataset collection were also recorded performing dynamic gestures, keeping skill level and style consistent across both datasets. Each participant performed 150 unique gestures.

Video Setup: Camera: Video was recorded with an iPhone 15 Pro Max at high resolution (1080 × 1920 pixels) and 30 frames per second (fps), balancing file size against smoothness of motion.

Lighting and Background: The recording environment used controlled lighting and a black background, minimizing distractions and focusing attention on the hand movements.

Recording Conditions: Each gesture was captured in its natural motion rather than being slowed down or sped up artificially to replicate real-world signing speed. The dynamic gestures reflect normal sign language conversation flow, capturing the natural rhythm and expressiveness of each sign.

4.3. Materials

The iPhone 15 Pro Max camera with 48 MP was used for high-resolution image capture, while a black cloth background was used to minimize noise and enhance focus on hand gestures. Three participants contributed 1,000 images per sign, adding diversity in hand shapes and gestures. A camera operator ensured proper framing, focus, and consistency, while a tripod stand was used to provide stability and avoid framing inconsistencies. The DroidCam app enabled remote-controlled image capture, reducing movement artifacts. A strong WiFi connection facilitated seamless data transfer and real-time monitoring during the collection process. These materials collectively ensured the dataset's robustness, making it valuable for AI-based sign language recognition and Human-Computer Interaction (HCI) applications.

4.4. Methods

A dataset of American Sign Language (ASL) hand gesture images was collected using an iPhone 15 Pro Max, leveraging its high-resolution camera for clear and detailed images. To ensure diversity, participants of various age groups, complexions, and hand shapes contributed, capturing real-life variations in sign language. Each ASL letter gesture was photographed from multiple angles to enhance model generalization [5,6]. Preprocessing steps included resizing, grayscale or RGB conversion, and pixel normalization, with the Mediapipe library marking landmark points. The marked landmark points are shown in Fig. 5. Data augmentation techniques such as random rotations, flipping, scaling, brightness adjustments, and Gaussian noise addition further improved model robustness by simulating real-world variations in lighting, positioning, and minor distortions [8]. Fig. 2 shows the step-by-step data collection procedure.
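A minimal sketch of landmark extraction with the Mediapipe Hands solution named above is shown below; the image path is a placeholder.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    image = cv2.imread("dataset/B/B_21.jpg")   # placeholder path
    # Mediapipe expects RGB input, while OpenCV loads images as BGR.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # 21 (x, y, z) landmarks, normalized to [0, 1] image coordinates.
        points = [(lm.x, lm.y, lm.z)
                  for lm in results.multi_hand_landmarks[0].landmark]
```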

Fig. 3. Visual sample of original and augmented images.

Grayscale conversion is not required but is recommended when color information is unnecessary (e.g., static ASL [4] gestures, where hand shape and orientation are the principal features). It decreases computational complexity and memory usage by 66 % (from 3 RGB channels to 1 grayscale channel) with no loss of accuracy for contour-based models such as CNNs [16]. Nonetheless, RGB is retained for situations in which skin color or nuanced color changes (e.g., nail polish, redness of the joints) might help landmark identification or demographic-inclusive modelling (Fig. 4).
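The channel reduction is straightforward to reproduce, for instance with OpenCV; the file path below is a placeholder.

```python
import cv2

bgr = cv2.imread("dataset/A/A_1.jpg")         # OpenCV loads 3-channel BGR
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # single-channel grayscale
print(gray.nbytes / bgr.nbytes)               # ~0.33, i.e. the 66 % saving
```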

Fig. 4. ASL dataset creation pipeline.

The robustness of the dataset was measured in terms of Mediapipe landmark coordinates to determine inter-class variability and intra-class consistency. Inter-class variability, computed as the average variance of landmark positions across the 26 ASL letters, was 0.18, reflecting distinct feature distributions per letter. Intra-class consistency, computed as the average cosine similarity of landmark coordinates within each letter's images, was high at 0.90, facilitating consistent classification within categories. These metrics highlight the dataset's capacity to enable accurate and generalizable CNN-based recognition, as summarized in Table 1.
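The exact computation is not specified in the source; the sketch below shows one plausible implementation of both metrics, assuming a precomputed `landmarks_by_letter` mapping (letter → array of shape (n_images, 63), i.e. 21 landmarks × 3 coordinates per image).

```python
import numpy as np

def interclass_variance(landmarks_by_letter):
    """Average variance of the per-letter mean landmark vectors (26 classes)."""
    means = np.stack([v.mean(axis=0) for v in landmarks_by_letter.values()])
    return float(means.var(axis=0).mean())

def intraclass_cosine(landmarks_by_letter):
    """Mean cosine similarity of each image's landmarks to its class mean."""
    sims = []
    for v in landmarks_by_letter.values():
        mean = v.mean(axis=0)
        sims.append(v @ mean / (np.linalg.norm(v, axis=1)
                                * np.linalg.norm(mean) + 1e-8))
    return float(np.concatenate(sims).mean())
```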

To guarantee consistent image quality and compatibility across the dataset, all captured images were saved in JPG format and resized to a standard resolution of 1080 × 1920 pixels. This standardization ensures the dataset works seamlessly with various machine learning applications (Table 1).

4.5. Training and Optimization

To ensure optimal performance, the model is trained using the categorical cross-entropy loss function, which is well suited to multi-class classification problems. The Adam optimizer is employed to dynamically adjust learning rates for efficient convergence [14]. Learning rate scheduling is implemented to modify learning rates based on validation performance, helping to prevent overfitting [13]. Data augmentation techniques are utilized to enhance dataset diversity and improve generalization. Model performance is comprehensively evaluated using accuracy, precision, recall, and F1-score [7,12]. Training is conducted on high-performance GPU-enabled systems to accelerate computation and enhance scalability (Fig. 5).

Fig. 5. Landmark points of sign language.
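For concreteness, the training configuration described above might look roughly as follows in Keras; the classification head, learning rate, and scheduler parameters are illustrative assumptions, not the authors' exact settings.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(26, activation="softmax"),  # one unit per letter
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",   # multi-class classification loss
    metrics=["accuracy"],
)

# Halve the learning rate when validation loss plateaus, to curb overfitting.
lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=3)

# model.fit(train_gen, validation_data=val_gen, epochs=30,
#           callbacks=[lr_schedule])
```

Precision, recall, and F1-score can then be computed from the held-out predictions, for example with scikit-learn's classification_report.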

To address overfitting risks from the small participant sample (n=3), we conducted 5-fold cross-validation on MobileNetV2, obtaining a mean accuracy of 98.8 % (±0.5 % standard deviation). Although post-training accuracy was 99.6 % (Table 2), the cross-validation results indicate stability. The small difference (<1 %) suggests minimal overfitting, though participant-specific characteristics in the training set may contribute.

Table 2.

Comparison of deep learning models for pre-training and post-training over the dataset.

Deep Learning Model | Accuracy (before training) | Precision (before) | Recall (before) | F1 Score (before) | Accuracy (after training) | Precision (after) | Recall (after) | F1 Score (after) | Cross-Validation Accuracy (Mean ± SD)
LSTM | 39.76 % | 42.33 | 43.23 | 41.24 | 88.56 % | 91.12 | 90.56 | 90.78 | 96.4 % ± 0.6 %
EfficientNetV2 | 41.02 % | 40.34 | 46.34 | 43.12 | 96.88 % | 98.23 | 97.45 | 97.56 | 95.2 % ± 0.8 %
MobileNetV2 | 37.69 % | 49.67 | 48.56 | 47.23 | 99.60 % | 99.34 | 99.67 | 99.78 | 98.8 % ± 0.5 %
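The 5-fold protocol reported above could be implemented along these lines; `X`, `y` (image arrays and one-hot labels), and `build_model()` (returning a freshly compiled classifier with an accuracy metric) are assumed to exist.

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []
for train_idx, val_idx in kf.split(X):
    model = build_model()                        # fresh weights each fold
    model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=32, verbose=0)
    loss, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    fold_acc.append(acc)

print(f"accuracy: {np.mean(fold_acc):.3f} ± {np.std(fold_acc):.3f}")
```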

4.5.1. Model Interpretability analysis

To clarify the Convolutional Neural Network (CNN) decision-making process, Gradient-weighted Class Activation Mapping (Grad-CAM) was applied to the fine-tuned MobileNetV2 model, which achieved 99.6 % accuracy on the test set. Grad-CAM produces heatmaps indicating the image regions most responsible for a classification, allowing visualization of discriminative features such as hand contours, finger directions, and palm configurations essential for American Sign Language (ASL) letter recognition. This interpretability study indicates whether the model attends to the expected gesture features or to incidental regions (e.g., background noise), and can guide potential preprocessing or model-design optimizations.

Grad-CAM was tested on sample test images of ASL letters using the MobileNetV2 model. The heatmaps in Fig. 6 show that the model consistently favors hand-specific features. For instance, for the letter 'A' the heatmap highlights the closed fist and thumb position, whereas for 'C' it highlights the curved finger pattern. In a few instances, limited focus on background areas was noticed, perhaps due to very low-level noise in the black background. This suggests possible preprocessing optimizations, such as tighter cropping around the hand using Mediapipe landmarks, to improve model resilience. The study verifies that MobileNetV2 successfully captures gesture-relevant features, as anticipated given the dataset's high-contrast, controlled image capture (Fig. 6).
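A condensed Grad-CAM sketch is given below for reference; it assumes a functional Keras model built directly on the MobileNetV2 graph, so that the stock final convolutional layer name "Conv_1" is reachable via get_layer, and it is not the authors' exact implementation.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_idx, conv_layer="Conv_1"):
    """Return a [0, 1] heatmap for `image` (float array, shape (224, 224, 3))."""
    # Map the input to (last conv feature maps, class predictions).
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = grad_model(image[np.newaxis, ...])
        score = preds[:, class_idx]              # score of the target letter
    grads = tape.gradient(score, fmaps)          # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # channel importances
    cam = tf.einsum("hwc,c->hw", fmaps[0], weights)   # weighted sum of maps
    cam = tf.nn.relu(cam)                        # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

Upsampling the returned heatmap to the input size and overlaying it on the image reproduces visualizations like those in Fig. 6.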

Fig. 6. Grad-CAM heatmap of sign language.

4.6. Future work: cross-modal fusion for holistic sign language recognition

Sign language communication unites hand movements with facial expressions and body posture, which carry emotional and contextual cues required for proper interpretation. To make the dataset more practical, future work will consist of acquiring cross-modal data in the form of synchronized facial-expression and body-posture images or videos alongside hand movements. This will facilitate the creation of multi-modal fusion networks to boost the accuracy and robustness of American Sign Language (ASL) recognition.

A pilot study is designed to gather cross-modal data for a subset of ASL letters (A–E) from the current three participants. Images or brief videos (2–3 s, 30 fps) will record the participant's face and upper body while they gesture the ASL alphabet using the iPhone 15 Pro Max, keeping the controlled lighting and black background setup outlined in Section 4.2. Each of the three participants will provide 100 samples per letter, giving 1500 cross-modal samples (3 participants × 100 samples × 5 letters). Mediapipe's Holistic model will obtain hand, face (468 points), and pose (33 points) landmarks, saved as metadata to enable multi-modal analysis.
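Landmark extraction for the pilot could use Mediapipe's Holistic model roughly as follows; the frame path is a placeholder.

```python
import cv2
import mediapipe as mp

with mp.solutions.holistic.Holistic(static_image_mode=True) as holistic:
    frame = cv2.imread("crossmodal/A/p1_001.jpg")   # placeholder path
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    face = results.face_landmarks        # 468 facial landmarks (or None)
    pose = results.pose_landmarks        # 33 body-pose landmarks (or None)
    left = results.left_hand_landmarks   # 21 hand landmarks each (or None)
    right = results.right_hand_landmarks
```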

4.7. Metadata and repository information

The complete dataset is publicly available through the Mendeley Data repository [DOI: 10.17632/8fmvr9m98w.1 and 10.17632/8fmvr9m98w.2]. In addition to the raw images and video frames, the repository includes supporting metadata files that describe the folder structure, labeling conventions, and data collection process. Each image and video is labeled based on the corresponding ASL letter, and organized in clearly named directories. A README file is provided to guide users through the dataset contents, helping ensure consistent use across research studies. This metadata improves transparency and reproducibility, making it easier for other researchers to train, validate, or benchmark their own models using our dataset.

4.7.1. Script for loading and preprocessing

This script demonstrates how to load image data, resize each frame to 224 × 224 pixels, normalize pixel values, and prepare batches for model training. Sample code is available on GitHub: https://github.com/achyutkashyap/SignAlphaSet.
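For orientation, a minimal loading/preprocessing sketch in the spirit of the linked script is shown below (see the GitHub repository for the authors' version); the dataset root path is an assumption.

```python
import tensorflow as tf

# Root folder with one subfolder per letter; the path is assumed.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "SignAlphaSet/",
    image_size=(224, 224),      # resize every image to 224 x 224 pixels
    batch_size=32,
    label_mode="categorical",   # one-hot labels for the 26 letter classes
)

# Normalize pixel values from [0, 255] to [0, 1].
rescale = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (rescale(x), y))
```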

Limitations

Lack of Dynamic Gestures: This dataset centers on static signs, whereas the majority of sign language gestures involve dynamic movements that are best represented with sequential modeling.

Ethics Statement

The authors of this dataset, namely Manisha Kasar and Achyut Kashyap, are depicted in the dataset images. The authors and the parents of child participants have willingly given written informed consent for inclusion in the study and have agreed to the public sharing of the image data. The authors declare no conflict of interest. This research did not involve animal or human studies and did not inflict harm on any living organism.

CRediT Author Statement

Bindu Garg: Conceptualization, Writing – review & editing, Methodology, Data curation. Manisha Kasar: Conceptualization, Writing – review & editing, Methodology, Data curation, Supervision. Priyanka Paygude: Writing – review & editing. Amol Dhumane: Writing – review & editing, Methodology. Srinivas Ambala: Data curation. Jitendra Rajpurohit: Writing – review & editing. Abhay Sharma: Methodology. Vidula Meshram: Conceptualization. Amber Vats: Writing – review & editing. Achyut Kashyap: Writing – review & editing.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Contributor Information

Manisha Kasar, Email: mmkasar@bvucoep.edu.in.

Priyanka Paygude, Email: pspaygude@bvucoep.edu.in.

References

  • 1. Garg B., Kasar M., Kashyap A., Vats A., Sharma G., Hange A. SignAlphaSet. Mendeley Data. 2025. V1. doi: 10.17632/8fmvr9m98w.1.
  • 2. Kaggle: Your Machine Learning and Data Science Community. www.kaggle.com.
  • 3. Zhang Y., Jiang X. Recent advances on deep learning for sign language recognition. CMES - Comput. Model. Eng. Sci. 2024;139(3):2399–2450. doi: 10.32604/cmes.2023.045731.
  • 4. MS-ASL American Sign Language Dataset. Official Microsoft Download Center.
  • 5. Sharma P., Anand R.S. A comprehensive evaluation of deep models and optimizers for Indian sign language recognition. Graph. Vis. Comput. 2021;5:200032. doi: 10.1016/j.gvc.2021.200032.
  • 6. Orovwode H., Oduntan I., Abubakar J. Development of a sign language recognition system using machine learning. International Conference on Artificial Intelligence, Big Data, Computing and Data Communication Systems. 2023:1–8. doi: 10.1109/icABCD59051.2023.10220456.
  • 7. Kasar M., Kavimandan P., Suryawanshi T., Garg B. EmoSense: pioneering facial emotion recognition with precision through model optimization and face emotion constraints. Int. J. Eng. 2025;38(1):35–45. doi: 10.5829/ije.2025.38.01a.04.
  • 8. Chaudharya S., Shah P., Paygude P., Chiwhane S., Mahajan P., Chavan P., Kasar M. Data Br. 2024;56. doi: 10.1016/j.dib.2024.110772.
  • 9. American Sign Language Letters dataset. Roboflow. https://public.roboflow.com/object-detection/american-sign-language-letters/1.
  • 10. Qi Shi. N-WLASL. IEEE Dataport. 2023. doi: 10.21227/x23b-d084.
  • 11. Paygude P., Thite S., Kumar A., Bhosle A., Pawar R., Mane R., Joshi R., Kasar M., Chavana P., Gayakwad M. Data Br. 2024;56. doi: 10.1016/j.dib.2024.111024.
  • 12. Paygude P., Gayakwad M., Wategaonkar D., Pawar R., Pujeri R., Joshi R. Dried fish dataset for Indian seafood: a machine learning application. Data Br. 2024;55. doi: 10.1016/j.dib.2024.110563.
  • 13. Karande S., Garg B. Performance evaluation and optimization of convolutional neural network architectures for tomato plant disease eleven classes based on augmented leaf images dataset. Neural Comput. Appl. 2024;36:11919–11943. doi: 10.1007/s00521-024-09670-6.
  • 14. Goud A., Garg B. A novel framework for aspect based sentiment analysis using a hybrid BERT (HybBERT) model. Multimed. Tools Appl. 2023. doi: 10.1007/s11042-023-17647-1.
  • 15. Daneshfar F., Bartani A., Lotfi P. Image captioning by diffusion models: a survey. Eng. Appl. Artif. Intell. 2024;138:109288. doi: 10.1016/j.engappai.2024.109288.
