Data in Brief. 2024 May 29;55:110566. doi: 10.1016/j.dib.2024.110566

Sign language images dataset from Mexican sign language

Josué Espejel 1, Laura D Jalili 1, Jair Cervantes 1, Jared Cervantes Canales 1
PMCID: PMC11214366  PMID: 38948409

Abstract

Sign language is a complete language with its own grammatical rules, akin to any spoken language used worldwide. It comprises two main components: static words and ideograms. Ideograms involve hand movements and contact with various parts of the body to convey meaning. Sign languages vary across countries, so each country's sign language needs comprehensive documentation. In Mexico, there is a lack of formal datasets for Mexican Sign Language (MSL). To address this, we built a dataset of 249 MSL words divided into 17 subsets. A black background and black clothing were used to enhance the areas of interest (hands and face). Each word was performed by an average of 11 individuals, and an average of 15 frames was extracted from each individual's video sequence, yielding 31442 JPG images.

Keywords: Image segmentation, Hand gestures, Frame sequences, Ideogram


Specifications Table

Subject Computer Vision and Pattern Recognition
Specific subject area Sign language detection and classification using image sequences from video clips of the selected words of the MSL
Data format Raw video and images.
Type of data Video clip and RGB images (.jpg), (.xlsx) file
Data collection The images were extracted from video sequences capturing individuals demonstrating distinct hand gestures. We maintained consistency by controlling background and clothing colors, although variations in illumination were not standardized. The recording was conducted using a SONY Cyber-shot 12.1 MP camera.
Eleven individuals were selected to depict the 249 words in the dataset. These words are categorized into 17 subsets, covering various topics: greetings, time-related expressions, days of the week, months, school supplies, family-related terms, household items, adjectives, culinary terms, clothing items, body parts, vehicles, locations, pronouns, verbs, professions, and Mexican states.
Data source location Educational Center for the Deaf. México State, México.
Latitude and Longitude: 19°30′42.98″ N, 98°52′58.55″ W.
Data accessibility Repository name: Mendeley Data
Data identification number: DOI: 10.17632/6rj76z6y3n.1
Direct URL to data: https://data.mendeley.com/datasets/6rj76z6y3n/1
Instructions for accessing this dataset: The dataset comprises a .xlsx file detailing the classes, with each class having its corresponding folder and subfolders containing images and associated videos.
Related research article Josué Espejel-Cabrera, Jair Cervantes, Farid García-Lamont, José Sergio Ruiz Castilla, Laura D. Jalili, Mexican sign language segmentation using color based neuronal networks to detect the individual skin color, Expert Systems with Applications, Volume 183, 2021.

1. Value of the Data

  • The dataset comprises 17 categories covering frequently used words in elementary education. These categories include greetings, time-related terms, body parts, days of the week, months, school supplies, family-related vocabulary, locations, verbs, professions, household items, adjectives, culinary terms, vehicles, clothing items, pronouns, and Mexican states.

  • The number of words is unusually large for MSL research, nearly double the count found in other MSL studies [1,2].

  • The dataset holds potential for research purposes and could facilitate the development of a comprehensive MSL dictionary.

  • Sign language research is important, and this dataset offers opportunities to improve methodologies not only for native sign languages but also through experiments with other sign languages, which would make those methodologies more robust.

  • Similar to any other language, MSL possesses its own set of grammar rules and cultural nuances, highlighting the significant value of this dataset for future research involving convolutional neural networks. The dataset's abundance of images and classes makes it particularly valuable.

  • This database opens avenues for the development of applications involving sign language control, particularly in high-risk work environments, where it can help minimize direct human interaction within controlled settings. Additionally, it has the potential to enable the creation of real-time translators to facilitate seamless communication among individuals.

  • Moreover, leveraging this dataset can enhance sign language learning through mobile applications, and for deaf individuals, it can lead to the development of educational tools aimed at refining sign language acquisition and proficiency, thereby simplifying implementation efforts.

  • Furthermore, the dataset can serve as a training resource for artificial vision methodologies, aiding in the detection of individuals, human poses, and movements, thereby advancing research in this field.

2. Data Description

The dataset comprises 249 classes, each corresponding to a word in MSL. To compile this dataset, we curated a list of the most frequently used words taught to young deaf individuals, categorizing them into 17 distinct areas. The videos were recorded at the Educational Center for the Deaf Institute (CES).

2.1. Database acquisition

The database was acquired utilizing the following equipment: a Sony Cyber-shot 12.1 MP camera and a tripod to position the camera at the optimal distance and angle. The distance between the camera and the individual was set at 2.5 m, as illustrated in Fig. 1.

Fig. 1. Distribution of the area, distance between camera and wall, and the height of the camera.

Throughout the database acquisition process, all individuals wore black long-sleeved shirts, and the background wall was also black. This deliberate choice of color contrast effectively accentuated the regions of interest, notably the hands and faces of the individuals.

The recording location had windows, which caused fluctuations in luminance during video capture. The participants are students of the CES Institute, which has approximately 20 deaf students aged between 11 and 21. For database acquisition, we selected the 11 students who executed the signs most precisely.

The duration of the frame sequences varied depending on the complexity of each hand gesture, resulting in an average of 15 frames captured per sign.

The recorded videos depict individuals performing the hand movements corresponding to the selected MSL words. Fig. 2 shows the individual's pose during the recording of the video clip.

Fig. 2. The image samples show the conditions of the background and clothes color.

From each video, we extracted a sequence of frames, typically averaging around 15 frames per video [1], as illustrated in Fig. 3. The number of frames acquired varies based on the duration of the video clip.

Fig. 3. Example of the frame sequence during the hand gesture.
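The article does not state how the roughly 15 frames were chosen from each clip. As an illustration only, the sketch below assumes evenly spaced sampling; the function name `sample_frame_indices` and the default of 15 are hypothetical and are not taken from the authors' pipeline.

```python
def sample_frame_indices(total_frames: int, target: int = 15) -> list[int]:
    """Return up to `target` evenly spaced frame indices in [0, total_frames).

    Assumption: frames are sampled uniformly over the clip. The source only
    reports an average of about 15 frames per video, not the sampling rule.
    """
    if total_frames <= 0 or target <= 0:
        return []
    if total_frames <= target:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Split the clip into `target` equal-width bins and take each bin's centre.
    step = total_frames / target
    return [int(step * i + step / 2) for i in range(target)]
```

The returned indices could then be passed to any video decoding library to read and save the selected frames.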

The folder name corresponds to the class number, ranging from 001 to 249. Within each class, the frames are organized by individual, with subfolders labelled from 01 to 11, as seen in Fig. 4. Not all classes have 11 subfolders. Each subfolder contains a video corresponding to the word of the class, along with the frame sequence associated with that video.

Fig. 4. Directory structure.
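The class/individual layout described above can be traversed with a few lines of Python. This is a minimal sketch that assumes the dataset has been downloaded and unpacked locally, that class folders sit directly under the chosen root, and that frames carry a `.jpg` extension; `count_images` is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path


def count_images(root):
    """Map class folder name -> {individual subfolder name -> number of .jpg files}.

    Assumed layout: <root>/<class 001..249>/<individual 01..11>/<frames and video>.
    Non-image files, such as the video clip, are ignored.
    """
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        per_individual = {}
        for person_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            n_jpg = sum(
                1 for f in person_dir.iterdir() if f.suffix.lower() == ".jpg"
            )
            per_individual[person_dir.name] = n_jpg
        counts[class_dir.name] = per_individual
    return counts
```

Because not every class has all 11 subfolders, the inner dictionaries may have different sizes; checking `len(counts[cls])` per class is a quick way to see how many individuals recorded each word.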

Tables 1 and 2 list the number of files for each word (class) in the dataset, showing that every word is covered by multiple images.

Table 1. Classes 1–120.

Class Files Class Files Class Files Class Files Class Files Class Files
1 137 21 141 41 141 61 150 81 141 101 130
2 139 22 147 42 134 62 140 82 127 102 131
3 129 23 141 43 139 63 151 83 144 103 121
4 113 24 153 44 120 64 149 84 133 104 138
5 124 25 135 45 147 65 141 85 127 105 120
6 138 26 143 46 147 66 138 86 131 106 129
7 132 27 120 47 136 67 133 87 150 107 130
8 131 28 155 48 142 68 135 88 141 108 116
9 146 29 136 49 132 69 130 89 134 109 140
10 134 30 140 50 131 70 150 90 122 110 132
11 103 31 145 51 137 71 135 91 118 111 118
12 122 32 127 52 131 72 158 92 115 112 115
13 136 33 134 53 132 73 141 93 116 113 127
14 146 34 114 54 128 74 145 94 129 114 120
15 140 35 126 55 137 75 145 95 101 115 124
16 160 36 115 56 134 76 114 96 144 116 125
17 160 37 134 57 136 77 145 97 122 117 126
18 142 38 136 58 141 78 140 98 119 118 123
19 152 39 165 59 130 79 137 99 131 119 129
20 147 40 145 60 129 80 152 100 124 120 125

Table 2. Classes 121–249.

Class Files Class Files Class Files Class Files Class Files Class Files Class Files
121 141 141 132 161 125 181 113 201 117 221 108 241 101
122 147 142 125 162 100 182 102 202 119 222 120 242 88
123 91 143 118 163 114 183 117 203 99 223 99 243 102
124 107 144 128 164 131 184 84 204 106 224 94 244 120
125 129 145 125 165 133 185 105 205 128 225 111 245 117
126 127 146 120 166 138 186 109 206 128 226 117 246 100
127 126 147 115 167 136 187 104 207 159 227 124 247 122
128 114 148 131 168 147 188 108 208 108 228 143 248 112
129 119 149 136 169 140 189 99 209 115 229 113 249 112
130 114 150 134 170 95 190 118 210 133 230 126
131 120 151 131 171 90 191 105 211 114 231 115
132 129 152 135 172 105 192 113 212 109 232 98
133 127 153 138 173 124 193 95 213 126 233 108
134 123 154 132 174 115 194 118 214 144 234 115
135 136 155 127 175 134 195 130 215 123 235 129
136 131 156 106 176 95 196 105 216 145 236 116
137 134 157 127 177 120 197 114 217 116 237 132
138 124 158 131 178 95 198 88 218 119 238 122
139 135 159 117 179 114 199 117 219 114 239 121
140 128 160 123 180 120 200 104 220 128 240 124
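The rows of Tables 1 and 2 repeat a Class/Files pair several times per line. The sketch below shows one way to load such rows into a dictionary for further analysis; `parse_class_file_rows` is a hypothetical helper, and the two sample rows are copied from Table 1.

```python
def parse_class_file_rows(rows):
    """Turn 'class files class files ...' text rows into a {class: files} dict."""
    counts = {}
    for row in rows:
        values = [int(tok) for tok in row.split()]
        if len(values) % 2:
            raise ValueError(f"odd number of fields in row: {row!r}")
        # Even positions hold class numbers, odd positions hold file counts.
        for cls, files in zip(values[0::2], values[1::2]):
            counts[cls] = files
    return counts


# First two data rows of Table 1.
sample = [
    "1 137 21 141 41 141 61 150 81 141 101 130",
    "2 139 22 147 42 134 62 140 82 127 102 131",
]
counts = parse_class_file_rows(sample)
```

Summing the values over all rows of both tables gives a figure that can be compared with the reported 31442 images, keeping in mind that the source does not say whether the Files column also counts the video clips.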

3. Experimental Design, Materials and Methods

Sign language, like any other language, follows its own set of grammar rules. Sign languages also vary across countries, even among those sharing the same spoken language. For example, Mexican Sign Language (MSL) differs from Spanish Sign Language [3] and Argentinian Sign Language [4]. This motivated the design of a dataset tailored to the native sign language. Sign language can be categorized into two domains: static gestures and dynamic gestures (ideograms). Ideograms entail hand movements, contact between the hands and the body, and facial expressions to convey meaning. The dataset captures these aspects and emphasizes the regions of interest. To achieve this, the environment was carefully controlled: black cloth covered the background, and participants were given black shirts to cover their arms, creating a stark contrast between the hands and face and the backdrop. Typically, a hand gesture begins with the hands at the sides of the body, continues with the hand movements, and ends with the hands returning to their initial position. We captured the frame sequence of each individual's hand gesture movement.
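To illustrate why the black background and clothing help, the sketch below separates bright pixels (hands and face) from the dark surroundings with a simple brightness threshold. This is not the authors' method; the related article uses color-based neural networks. The threshold of 60 and the function names are assumptions chosen for the example.

```python
def luma(rgb):
    """Approximate perceived brightness of an (R, G, B) pixel (BT.601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def foreground_mask(image, threshold=60):
    """Mark pixels brighter than `threshold` as foreground.

    `image` is a list of rows, each row a list of (R, G, B) tuples in 0..255.
    The threshold is an illustrative guess and would need tuning, especially
    for clips affected by the luminance variations noted by the authors.
    """
    return [[luma(px) > threshold for px in row] for row in image]
```

A fixed threshold like this is sensitive to illumination changes, which is consistent with the segmentation difficulties the authors report for high-brightness images.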

Limitations

The video clips were recorded under controlled conditions concerning color, background, and clothing. However, slight variations in luminance occurred in some of the video clips due to environmental factors. These variations presented challenges for image segmentation, especially in images with high brightness levels.

Ethics Statement

This work does not involve animal experimentation or the collection of data from any social media platform. In the dataset, all contributors participated voluntarily in its creation, and no personal data was included.

Participants were anonymized, ensuring no impact on personal data. All images were obtained in compliance with the Federal Law on the Protection of Personal Data Held by Private Entities in Mexico and the implementing legislation of member states, under the following Legal Basis:

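  1. Article 6, Constitución Política de los Estados Unidos Mexicanos (Political Constitution of the United Mexican States).

  2. Article 4, Ley General de Transparencia y Acceso a la Información Pública (General Law of Transparency and Access to Public Information).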

The image dataset adheres to the relevant laws and regulations governing the privacy and security of personal information. All faces in the dataset have been blurred to eliminate any potential for identification, and participants were duly informed of this procedure. Furthermore, the acquisition of images was conducted following the ethical code and standards of conduct outlined by the Autonomous University of Mexico State (UAEMex). https://oag.uaemex.mx/normatividad/phpoffice/pdf/codigos/Codigo_de_etica_y_Conducta.pdf.

CRediT authorship contribution statement

Josué Espejel: Conceptualization, Methodology, Investigation, Writing – original draft. Laura D. Jalili: Writing – review & editing, Investigation. Jair Cervantes: Supervision, Writing – review & editing, Visualization, Resources. Jared Cervantes Canales: Writing – review & editing, Investigation.

Acknowledgments

We express our heartfelt appreciation to the Educational Center for the Deaf (CES), its students, teachers, and principal for their invaluable support and guidance throughout the word selection and video acquisition process. This research did not receive funding from any specific grant provided by public, commercial, or not-for-profit entities.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

References

  1. Espejel-Cabrera J., Cervantes J., García-Lamont F., Castilla J.S.R., Jalili L.D. Mexican sign language segmentation using color based neuronal networks to detect the individual skin color. Expert Syst. Appl. 2021;183. doi: 10.1016/j.eswa.2021.115295.
  2. Martínez-Sánchez V., Villalón-Turrubiates I., Cervantes-Álvarez F., Hernández-Mejía C. Exploring a novel mexican sign language lexicon video dataset. Multimodal Technol. Interact. 2023;7:83. doi: 10.20944/preprints202307.1125.v1.
  3. Docío-Fernández et al. LSE_UVIGO: a multi-source database for Spanish sign language recognition. SignLang 2020.
  4. Dal Bianco P., et al. LSA-T: the first continuous argentinian sign language dataset for sign language translation. In: Bicharra Garcia A.C., Ferro M., Rodríguez Ribón J.C., editors. Advances in Artificial Intelligence – IBERAMIA 2022. 2022.


Articles from Data in Brief are provided here courtesy of Elsevier
