Data in Brief. 2024 Aug 10;56:110809. doi: 10.1016/j.dib.2024.110809

Creating distinctive databases of ancient languages and using a computer vision model to accurately recognize and classify them

Elaf A. Saeed, Ammar D. Jasim, Munther A. Abdul Malik
PMCID: PMC11385421  PMID: 39257691

Abstract

Cuneiform writing, one of the oldest writing systems alongside Egyptian hieroglyphs, offers a window into the past. It emerged in the second half of the fourth millennium BC, and most scholars believe the Sumerians created it in southern Mesopotamia. Historians likewise place the origins of Hebrew in antiquity. Deciphering such scripts has traditionally required expert knowledge: even after learning to decipher one ancient language, a scholar must still consult an archaeologist to read any other. To speed up this procedure, we propose a deep-learning-based sign detector that identifies Hebrew letters and groups cuneiform tablet images according to their content. Gathering the training data that deep learning requires, namely bounding boxes enclosing individual Hebrew characters, is notoriously difficult and costly. We address this problem by using pre-existing transliterations, which represent a tablet's content sign by sign in Latin characters. Our approach does not require manual sign localization: we first align the transliterated signs with their locations in the tablet photographs, then retrain the sign detector on these localized signs instead of hand-drawn annotations. A more accurate sign detector, in turn, improves the alignment quality. Concretely, this work uses the pretrained YOLOv8 object detection model to identify Hebrew characters and to categorize the cuneiform tablets. Images of Hebrew passages were culled from a Hebrew-language book, the Old Testament, and organized into around 500 illustrations to aid in reading and pronouncing the characters; another ancient document recently discovered in Iraq, dating back to 500, was also included. After pre-processing and augmentation, the collection exceeds a thousand photos. Photographs of cuneiform tablets were compiled from the Cuneiform Digital Library Initiative (CDLI) website and the Iraqi Museum, with over a thousand photos available for each language.

Keywords: Yolo, Hebrew, Cuneiform, Deep learning, Object detection


Specifications Table

Subject Computer Science: Computer vision and pattern recognition.
Specific subject area A dataset of images of cuneiform tablets from ancient time periods, together with a collection of images depicting old Hebrew manuscripts.
Type of data Table, Image, Figure
Data collection The Hebrew dataset comprises pictures from the Old Testament creation narrative and a recently uncovered Hebrew text from Iraq; additional images were collected online. Hebrewʼs overlapping and tiny letters make data collection challenging. The Hebrew script images numbered more than 500 before augmentation. For the second dataset, pictures of the Assyrian writings on 14 stone tablets were collected from the Iraq Museum, and the rest were collected from the CDLI and British websites. The Assyrian images reached 1327, the Sumerian 1435, and the Babylonian 1321.
Data source location The cuneiform tablet dataset was collected from the Iraqi Museum, Baghdad, Iraq (33.3281° N, 44.3850° E) and from the CDLI website. The Hebrew scripts dataset was collected from an old book (Old Testament creations) and from a recently uncovered Hebrew text from Iraq [1,2].
Data accessibility Repository name:
dataset1: Cuneiform Tablets [3],
dataset2: Hebrew Scripts [4]
Data identification number:
Dataset1: DOI: 10.17632/wpwxgvwhtt.1;
Dataset2: DOI: 10.17632/hjbrf25mwx.1
Direct URL to data:
Dataset1: https://data.mendeley.com/datasets/wpwxgvwhtt/1
Dataset2: https://data.mendeley.com/datasets/hjbrf25mwx
Related research article None

1. Value of the Data

  • These data are significant because they were obtained with high precision and have been organized and categorized for use in further research.

  • Scholars working on the translation of ancient languages can use these data to aid the comprehension and interpretation of such texts.

  • The data are valuable for technological advancements involving multiple languages and dialects.

  • The database can be used in machine learning applications and to build classifiers of cuneiform tablets and signs for various ancient cuneiform languages.

2. Background

The earliest writing system used Sumerian symbols to express concepts, and this symbolic writing evolved through numerous stages into the Assyrian and Babylonian languages [5]. Unlike hieroglyphic pictorial writing, the cuneiform system emphasizes verbal communication and uses specific terminology to convey exact meanings. International museums and the Iraqi Museum hold approximately 10,000 cuneiform tablets, with the latter possessing over 2000 [6,7]. The Mesopotamian cuneiform script later developed into Assyrian writing; symbols were written on stone or clay tablets in a right-to-left orientation, and the script comprises some 600 signs [6,8]. Our aim is to help scholars work with Hebrew letters so that they can study these texts more effectively. The main goal is a Hebrew letter detector that accurately recognizes each sign's pronunciation and localizes it within a bounding box; a classifier that assigns cuneiform tablets to Assyrian, Sumerian, or Babylonian script is also established. The character-based method uses bounding boxes to identify Hebrew letters, pretrained models identify each letter's pronunciation, and detectors are evaluated with letter-level bounding boxes. The data collection includes almost 400 images of Hebrew characters. We trained the model for exact identification using state-of-the-art approaches, specifically YOLOv8 [9], the latest YOLO object detection model, pretrained here for Hebrew character recognition. Recognition of Hebrew characters yielded significant results.

3. Data Description

Training high-quality models requires a large number of carefully annotated input images; roughly 1500 images are suggested for accurate class detection [10]. An assemblage of Hebrew inscriptions was therefore gathered. The dataset comprises images from the Old Testament book depicting the creation narrative, images of a recently discovered Hebrew manuscript in Iraq, and further images collected from the internet. Data collection was challenging because many Hebrew texts contain overlapping and diminutive letters, making them arduous to identify. In addition, the Hebrew alphabet is limited to 22 letters, so a diverse and substantial set of visual data was needed to strengthen the training. There are two ways to acquire more labeled data: collect additional images or employ data augmentation techniques to amplify the datasetʼs size. Expanding the dataset improves the modelʼs capacity to detect objects precisely. Fig. 1 displays several images of Hebrew texts.

Fig. 1. Some pictures of Hebrew texts.

For the classification model, pictures of the Assyrian writings on 14 stone tablets were collected from the Iraq Museum, and the rest were collected from the Cuneiform Digital Library Initiative (CDLI) website. The Sumerian and Babylonian writings were partly collected from the CDLI website, and some are pictures different from those available online. The Assyrian images reached 1327, the Sumerian 1435, and the Babylonian 1321. Fig. 2 shows images from the cuneiform tablets’ dataset.


Fig. 2. Some pictures of the classification dataset.

4. Experimental Design, Materials and Methods

4.1. Dataset labelling

All 22 Hebrew letters were annotated, as shown in Table 1, using the Roboflow annotation tool. Fig. 3 shows the number of annotations obtained for each letter after labeling; the largest count is 2381 for the letter Vav and the smallest is 123 for the letter Tet [11]. Each sign is also transliterated into Latin characters to make it easier to read: for example, the tag (א) is pronounced Alef, and the tag (ב) is pronounced Bet.
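For reference, the Python sketch below shows how an annotation in a YOLO-format label file (one of the export formats Roboflow offers; the authors do not name their export format) pairs a letter class with a normalized bounding box. The file name and class-id mapping are hypothetical.

```python
# Minimal sketch of reading a YOLO-format label file. Each line is:
#   <class_id> <x_center> <y_center> <width> <height>
# with coordinates normalized to [0, 1]. The mapping below is a
# hypothetical excerpt of the 22 Hebrew letter classes.

CLASS_NAMES = {0: "Alef", 1: "Bet", 5: "Vav", 8: "Tet"}  # assumed ids

def read_labels(path):
    """Yield (letter, x_center, y_center, width, height) per annotation."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            cid, xc, yc, w, h = line.split()
            letter = CLASS_NAMES.get(int(cid), f"class_{cid}")
            yield letter, float(xc), float(yc), float(w), float(h)

for letter, xc, yc, w, h in read_labels("train/labels/page_001.txt"):
    print(f"{letter}: center=({xc:.2f}, {yc:.2f}) size=({w:.2f}, {h:.2f})")
```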

Table 1. Dataset split.

Fig. 3. Number of annotations for each letter.

The classification dataset does not require labeling on the Roboflow platform; it was simply split into training, validation, and test images after pre-processing and augmentation, and then fed into the model.

4.2. Enhance the dataset

To enhance the dataset, Roboflow was used to annotate, pre-process, and augment it. The collection includes 471 tagged photos with a total of 21,699 annotations. Auto-orient, auto-adjust contrast, and image resizing (640 × 640 pixels) were applied. Augmentation then varied rotation between −15° and +15°, brightness and hue between −15 % and +15 %, and saturation between −31 % and +31 %, and added noise to up to 1.8 % of pixels.

In pre-processing, auto-orient normalizes the orientation of the pictureʼs pixels, and with it the orientation of the objects, which aids detection when an image is rotated. Auto-adjust contrast can enhance the networkʼs capacity to perceive objects: edges become more distinct because the contrast between nearby pixels is increased. The input picture size used for YOLOv8 is 640×640 px, so all images were resized accordingly [12].

Data augmentation is effective because it enhances the semantic breadth of a dataset. A widely used technique is random rotation: the source image is rotated by a random amount clockwise or anticlockwise, and in object detection the bounding boxes must be updated to enclose the transformed objects. Brightness augmentation introduces fluctuations in image luminosity, making the model more robust to variations in lighting conditions and camera configurations. Hue augmentation randomly alters the color channels of the input images, encouraging the model to consider alternative color schemes for objects and scenes [13]; this helps verify that the model is not simply memorizing the colors of a particular object or scene. Similarly, saturation augmentation alters the imageʼs vibrancy: a fully desaturated picture is grayscale, a partially desaturated one shows muted colors, and increased saturation intensifies colors toward the primary hues. Noise is a type of flaw that machines find particularly vexing compared to humans: people can disregard noise or place it in the appropriate context effortlessly, whereas algorithms struggle to do so. Adversarial attacks exploit this by manipulating pixels in ways invisible to humans that nevertheless significantly impair a neural networkʼs predictions; noise augmentation makes the model more resilient to such perturbations. Fig. 4 shows images with their labels, as used for training and validation, after pre-processing and augmentation.
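As an illustration only, the following Python sketch reproduces the augmentations described above using the albumentations library; the authors used Roboflowʼs built-in augmentations, so the library choice, file names, and probability values here are assumptions.

```python
# Hedged sketch of the augmentation pipeline: rotation, brightness, hue,
# saturation, and noise, with bounding boxes kept in sync (YOLO format).
import cv2
import albumentations as A

transform = A.Compose(
    [
        A.Resize(640, 640),                      # YOLOv8 input size
        A.Rotate(limit=15, p=0.5),               # rotation in [-15°, +15°]
        A.RandomBrightnessContrast(brightness_limit=0.15, p=0.5),
        A.HueSaturationValue(hue_shift_limit=15,
                             sat_shift_limit=31, p=0.5),
        A.GaussNoise(p=0.3),                     # mild pixel noise
    ],
    # Boxes are transformed with the image so rotated letters stay boxed.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("hebrew_page.jpg")            # hypothetical file name
out = transform(image=image,
                bboxes=[(0.50, 0.50, 0.10, 0.10)],  # one example box
                class_labels=["Alef"])
aug_image, aug_boxes = out["image"], out["bboxes"]
```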

Fig. 4. Images accompanied by labels used for training and validation purposes.

Table 2 shows the two versions (V) of dataset splitting used to determine the training, validation, and testing ratios.

Table 2.

Hebrew dataset split.

Version No. | Train – Val – Test | # of all images | Pre-processing | Augmentation | Background images
1 | 993 – 94 – 46 | 1133 | Auto-orient; Resize (640×640 px); Auto-adjust contrast | Rotation; Brightness; Hue; Noise | Without background images
2 | (as version 1) | (as version 1) | (as version 1) | (as version 1) | With background images

Different numbers of images were chosen for training and testing. The first version used three pre-processing techniques (auto-orient, resize, auto-adjust contrast) and four augmentation techniques (rotation, brightness, hue, noise), generating 1133 images (train = 993, 88 %; val = 94, 8 %; test = 46, 4 %) with the Roboflow tool. In the second version, training included background images to enhance accuracy and reduce false positives (FP).
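The split itself was performed inside Roboflow; purely for illustration, a local equivalent of the reported 88 / 8 / 4 ratios might look like the following sketch (the folder layout is hypothetical).

```python
# Reproduce an 88 % / 8 % / 4 % train/val/test split on an image folder.
import random
from pathlib import Path

images = sorted(Path("dataset/images").glob("*.jpg"))  # assumed layout
random.seed(42)                                        # reproducible split
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.88 * n), int(0.08 * n)
splits = {
    "train": images[:n_train],
    "val":   images[n_train:n_train + n_val],
    "test":  images[n_train + n_val:],                 # remaining ~4 %
}
for name, files in splits.items():
    print(f"{name}: {len(files)} images")
```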

For the classification model, histogram equalization was employed in pre-processing to normalize the photos by equalizing the light distribution, and a bilateral filter was applied to reduce noise; this filter smooths pixels while preserving their edges. Augmentation involved rotation within a range of −90° to +90°, which enhances the modelʼs adaptability to photos captured from various perspectives, together with vertical and horizontal flips on the images. Both steps were undertaken to enhance the precision of the model, as shown in Table 3 (a short code sketch of these steps follows the table).

Table 3.

Classification dataset split.

 | Total number of images | Train – Val – Test | Pre-processing | Augmentation
Classification | Sumerian = 1440; Assyrian = 1330; Babylonian = 1330 | 70 % – 15 % – 15 % | Normalization; Noise reduction | Rotation; Flipping
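As promised above, here is a minimal Python sketch of the classification pre-processing and augmentation, assuming OpenCV; the file name and exact filter parameters are assumptions, not the authorsʼ reported settings.

```python
# Histogram equalization (lighting normalization) plus a bilateral filter
# (edge-preserving noise reduction), followed by flip/rotation augmentation.
import cv2

def preprocess(path):
    img = cv2.imread(path)                       # hypothetical tablet photo
    # Equalize only the luminance channel so colors are not distorted.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    img = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    # Bilateral filter smooths flat regions while keeping wedge edges sharp.
    return cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

img = preprocess("assyrian_tablet.jpg")
flipped_h = cv2.flip(img, 1)                        # horizontal flip
flipped_v = cv2.flip(img, 0)                        # vertical flip
rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)  # one of the ±90° cases
```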

4.3. The proposed model

Fig. 5 provides a comprehensive overview of the models that have been presented.

Fig. 5. The proposed model block diagram.

The initial step of the detection pipeline is to input the images, with all classes labeled on the Roboflow platform, into the model.

Pre-processing and augmentation are then applied to enhance and substantially enlarge the dataset.

During the training phase, the YOLOv8 model was trained on this dataset. After training, the model was tested by inputting images from the testing set. This procedure produced a dataset comprising all the Hebrew letters, categorized by the sound of each letter. The Google text-to-speech (gTTS) application programming interface (API) was used to convert each label from text to speech [14].
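A condensed sketch of these two steps follows, assuming the ultralytics YOLOv8 package and the gTTS library named above; the data.yaml path, model size (nano), and output file names are assumptions, not the authorsʼ exact configuration.

```python
# Train the Hebrew letter detector, then speak each detected letter's name.
from ultralytics import YOLO
from gtts import gTTS

# Training on the Roboflow-exported dataset (100 epochs, as reported).
model = YOLO("yolov8n.pt")                       # assumed model size
model.train(data="hebrew/data.yaml", epochs=100, imgsz=640)

# Inference on one test image; convert each predicted label to speech.
results = model("test/images/sample.jpg")
for box in results[0].boxes:
    letter = results[0].names[int(box.cls)]      # e.g. "Alef", "Bet"
    gTTS(text=letter, lang="en").save(f"{letter}.mp3")
```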

The cuneiform tablets are categorized using the same pipeline, although, unlike the detection model, labeling is not required in this approach. Fig. 6 shows the proposed models’ flowchart for detection and classification.

Fig. 6. The proposed models’ flowchart for detection and classification.

4.4. Performance evaluation

The detection model achieved strong assessment results when trained on the customized dataset: for the first version, after 100 epochs, the mean average precision at 50 % intersection over union (mAP50) was 92 %, mAP50-90 was 73.5 %, precision was 93.2 %, and recall was 89.8 %. Table 4 displays the performance evaluation.

Table 4.

Performance evaluation.

V | Epochs | mAP50 | mAP50-90 | Precision | Recall
1 | 100 | 92 % | 73.5 % | 93.2 % | 89.8 %
2 | 100 | 91.3 % | 72.5 % | 92.6 % | 88.9 %

The confusion matrix is a method for assessing the performance of a detection model. The y-axis represents the predicted class, whereas the x-axis represents the actual class [15].
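For reference, the precision, recall, and F1 values reported here follow the standard confusion-matrix definitions (TP: true positives, FP: false positives, FN: false negatives):

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}
           {\text{Precision} + \text{Recall}}
\]
% mAP50 averages, over all classes, the area under the precision-recall
% curve computed at an IoU threshold of 0.50.
```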

Figs. 7 and 8 demonstrate that classes with more labels yield more erroneous predictions than classes with fewer labels; as noted above, Tet has 123 labels and Vav has 2381.

Fig. 7. Version one (V1) of the separated dataset and its confusion matrix.

Fig. 8. Version two (V2) of the separated dataset and its confusion matrix.

The maximum true positive (TP) value in versions V1 and V2 was 0.99 for the Ayin class, while the lowest was 0.62 for the Gimal class.

Fig. 9 displays the F1-confidence, precision-recall, recall-confidence, and precision-confidence plots at various IoU levels.

Fig. 9. The confidence threshold curves.

The F1-confidence curve illustrates the balance between false positive and false negative predictions. The precision-confidence curve indicates that the average outcome for all categories is 0.99 at a confidence level of 1.00. In the precision-recall curve, the mAP50 value is 0.92. Finally, the mean result of the recall-confidence curve for all classes at a threshold of zero is 0.94.

Fig. 10 below depicts the three kinds of losses tracked in an object detection model. The box-loss, cls-loss, and dfl-loss values are 0.4235, 0.2315, and 0.8314 for version 1, and 0.4289, 0.2358, and 0.8316 for version 2.

Fig. 10. The measurement results curves.

The classification modelʼs performance is measured by the algorithmʼs top-1 and top-2 accuracy on the classification task. After 100 epochs, top1-acc and top2-acc reached 96 % and 100 %, respectively, with a validation loss of 0.58714 and a training loss of 0.02729, as shown in Fig. 11.

Fig. 11. The classification model accuracy.
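For clarity, top-k accuracy counts a prediction as correct when the true class appears among the modelʼs k highest-scoring classes. A minimal sketch, with hypothetical scores for the three tablet classes:

```python
# Top-k accuracy: a sample is correct if its true class is in the k
# highest-scoring predictions. Scores below are illustrative only.
import numpy as np

def top_k_accuracy(scores, labels, k):
    """scores: (n_samples, n_classes); labels: (n_samples,) class ids."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # k best classes per sample
    hits = [label in row for row, label in zip(top_k, labels)]
    return float(np.mean(hits))

# Classes: 0 = Assyrian, 1 = Babylonian, 2 = Sumerian (hypothetical scores).
scores = np.array([[0.70, 0.20, 0.10],
                   [0.30, 0.40, 0.30]])
labels = np.array([0, 2])
print(top_k_accuracy(scores, labels, k=1))       # 0.5
print(top_k_accuracy(scores, labels, k=2))       # 1.0
```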

The confusion matrix for the three classes yields TP ratios of 0.97 for Assyrian, 0.95 for Babylonian, and 0.98 for Sumerian, as shown in Fig. 12.

Fig. 12. Confusion matrix of the classification model.

The generated results are of very high quality for models of this kind. The classification training and testing achieved the best results; nevertheless, more photos should be added to the dataset to expand the diversity of images and strengthen the training.

Limitations

Many challenges were encountered when collecting the dataset. First, both datasets contained a limited number of photos, whereas deep learning algorithms require many images for each class. In addition, certain cuneiform tablets had degraded due to environmental causes, resulting in the partial destruction of these artifacts; the same holds true for several ancient Hebrew manuscripts.

Ethics Statement

The data was gathered from the Iraq Museum with the consent of museum authorities. The remaining photographs were sourced from the previously listed websites, which are accessible to the general public and may be freely used.

CRediT authorship contribution statement

Elaf A. Saeed: Data curation, Software, Writing – original draft. Ammar D. Jasim: Conceptualization, Methodology, Writing – original draft. Munther A. Abdul Malik: Writing – original draft, Writing – review & editing.

Acknowledgments

We extend our thanks to Prof. Dr. Bahaa Amer for assisting with the Hebrew language.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

The datasets are publicly available on Mendeley Data: Cuneiform Tablets [3] and Hebrew Scripts [4].
References

  • 1.Saeed E.A., Jasim A.D., Abdul Malik M.A. Deciphering the past: enhancing Assyrian Cuneiform recognition with YOLOv8 object detection. Int. J. Adv. Technol. Eng. Explor. 2023;10(109):1604–1621. doi: 10.19101/IJATEE.2023.10102331. [DOI] [Google Scholar]
  • 2.Liu L., et al. Deep learning for generic object detection: a survey. Int. J. Comput. Vis. Feb. 2020;128(2):261–318. doi: 10.1007/s11263-019-01247-4. [DOI] [Google Scholar]
  • 3.E. Saeed, “Cuneiform Tablets,” Mendeley Data. Accessed: May 16, 2024. [Online]. Available: https://data.mendeley.com/datasets/wpwxgvwhtt/1.
  • 4.E. Saeed, “Hebrew Scripts,” Mendeley Data. Accessed: May 16, 2024. [Online]. Available: https://data.mendeley.com/datasets/hjbrf25mwx.
  • 5.Lahoud H., Share D.L., Shechter A. A developmental study of eye movements in Hebrew word reading: the effects of word familiarity, word length, and reading proficiency. Front. Psychol. 2023;14:1052755. doi: 10.3389/fpsyg.2023.1052755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Fisseler D., Weichert F., Cammarosano M. 4th Conference Scientific Computing and Cultural Heritage. 2013. Towards an interactive and automated script feature analysis of 3D scanned cuneiform tablets; p. 16. [Google Scholar]
  • 7.Dencker T., Klinkisch P., Maul S.M., Ommer B. Deep learning of cuneiform sign detection with weak supervision using transliteration alignment. PLoS ONE. 2020;15(12) doi: 10.1371/journal.pone.0243039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Yi H., Liu B., Zhao B., Liu E. Small object detection algorithm based on improved YOLOv8 for remote sensing. IEEE J. Sel. Top Appl. Earth Obs. Remote Sens. 2023;17 doi: 10.1109/JSTARS.2023.3339235. 1734–1734. [DOI] [Google Scholar]
  • 9.Terven J., Córdova-Esparza D.-M., Romero-González J.-A. A comprehensive review of yolo architectures in computer vision: from yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023;5(4):1680–1716. [Google Scholar]
  • 10.Lakhani P., Gray D.L., Pett C.R., Nagy P., Shih G. Hello world deep learning in medical imaging. J. Digit. Imaging. 2018;31:283–289. doi: 10.1007/s10278-018-0079-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Anon. “HebrewDetection >Browse.” Accessed: Jul. 11, 2024. [Online]. Available: https://universe.roboflow.com/cuniformcustom/hebrewdetection/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true.
  • 12.Huang Z., Li L., Krizek G.C., Sun L. Research on traffic sign detection based on improved YOLOv8. J. Comput. Commun. 2023;11(07):226–232. doi: 10.4236/jcc.2023.117014. [DOI] [Google Scholar]
  • 13.Tamang S., Sen B., Pradhan A., Sharma K., Singh V.K. Enhancing covid-19 safety: exploring yolov8 object detection for accurate face mask classification. Int. J. Intell. Syst. Appl. Eng. 2023;11(2):892–897. [Google Scholar]
  • 14.Gibiansky A., et al. Deep voice 2: multi-speaker neural text-to-speech. Adv. Neural Inf. Process. Syst. 2017;30 [Google Scholar]
  • 15.Ahmad T., Ma Y., Yahya M., Ahmad B., Nazir S., Haq A.U. Object detection through modified YOLO neural network. Sci. Program. 2020;2020:1–10. doi: 10.1155/2020/8403262. 8403262. [DOI] [Google Scholar]
