Data set for Tifinagh handwriting character recognition

Omar Bencharef; Younes Chihab; Nouredine Mousaid; Mustapha Oujaoura

Data in Brief. 2015 Apr 23;4:11–13. doi: 10.1016/j.dib.2015.04.008

PMCID: PMC4510372 PMID: 26217753

Abstract

The Tifinagh alphabet-IRCAM is the official alphabet of the Amazigh language, widely used in North Africa [1]. It includes thirty-one basic letters and two letters each composed of a base letter followed by the sign of labialization. Normalized only in 2003 (Unicode) [2], IRCAM-Tifinagh is a young character repertoire that needs more work on all levels. In this context, we propose a data set of handwritten Tifinagh characters composed of 1376 images, 43 images for each character. The dataset can be used to train a Tifinagh character recognition system, or to extract the meaningful characteristics of each character.

1. Specifications table

Subject area	Computer science
More specific subject area	Image processing, character recognition
Type of data	Image
How data was acquired	Handwriting and scanner
Data format	JPG image
Experimental factors	We asked 30 students to write all Tifinagh characters, one in each cell of a table. The pages were scanned with an Epson 10000XL, and we added 13 more images per character to take horizontal and vertical inclination into consideration
Experimental features	1376 images with a size of 30×30 px (43 images/character)
Data source location	Essaouira, Morocco
Data accessibility	Within this article


2. Value of the data

• The Amazigh language was considered an official language only in 2003. Therefore, integrating the Tifinagh alphabet into new information and communication technologies (ICT) and engaging in research in this field have become a major necessity [1], [2].
• The Amazigh language is spoken by about 30 million people in North Africa (from the oasis of Siwa in Egypt to Morocco, through Libya, Tunisia, Algeria, Niger, Mali, Burkina Faso and Mauritania) [3], [4].
• Due to the diversity of handwritten characters, there are two main approaches in this field, and both need a dataset. The first is based on complex classifiers such as Artificial Neural Networks or Support Vector Machines, which need a dataset to be trained to classify characters [5]. The other approaches also need a dataset, this time to find a normalized form of each character.
• The data set is very useful for training classification systems for Tifinagh handwriting, which remains an active area of research.
• The dataset is the first free online dataset of handwritten Tifinagh characters available without formalities.

3. Experimental design, materials and methods

We asked 30 people (17 male and 13 female) to write the 32 Tifinagh characters (Fig. 1) on one page, and we added 13 more images per character to take horizontal and vertical inclination into consideration. The pages were scanned using the Epson 10000XL.
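
As a quick consistency check on the figures quoted above, and assuming that the 13 additions are 13 extra images per character on top of one image from each of the 30 writers, the dataset size works out as:

```latex
\[
32 \times (30 + 13) = 32 \times 43 = 1376
\]
```

This matches the 1376 images and the 43 images per character given in the specifications table.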

The extraction steps were:

• We use the horizontal histogram to correct the inclination of every page [6].
• Using a connected components algorithm, we detect the center of each character [7].
• We extract 31 sub-images of 30×30 px that contain the characters (Fig. 2).
• The sub-images are named using the Latin character given in Fig. 1 for each character, followed by a number from 1 to 43 (a1, a2 … a43). For characters with a sub point we use a double character (hh or zz). For epsilon we use a double a (aa1, aa2 … aa43).
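
The extraction steps above can be sketched in Matlab as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the fixed threshold, the angle search range and the speck size are all hypothetical, the skew estimate simply maximizes the variance of the horizontal histogram (which may differ from the method in [6]), and the Image Processing Toolbox is assumed to be available.

```matlab
function crops = extract_characters_sketch(page)
% page: grayscale scanned page as a double matrix in [0, 1], with dark ink
% on a light background (an assumption about the scans).
% Requires the Image Processing Toolbox (imrotate, bwareaopen, regionprops).

ink = page < 0.5;                      % hypothetical fixed threshold

% 1) Skew correction: try candidate angles and keep the one whose
%    horizontal histogram (row sums of ink pixels) has the largest variance.
angles = -5:0.5:5;                     % hypothetical search range, degrees
score = zeros(size(angles));
for k = 1:numel(angles)
    rotated = imrotate(ink, angles(k), 'nearest', 'crop');
    score(k) = var(sum(rotated, 2));
end
[~, best] = max(score);
ink = imrotate(ink, angles(best), 'nearest', 'crop');
% Rotate the inverted page so that the padded corners stay white.
page = 1 - imrotate(1 - page, angles(best), 'bilinear', 'crop');

% 2) Connected components: one centroid per blob of ink.
ink = bwareaopen(ink, 20);             % drop specks; 20 px is a guess
stats = regionprops(ink, 'Centroid');

% 3) Crop a 30x30 window around each centroid that fits inside the page.
half = 15;
crops = {};
for k = 1:numel(stats)
    c = round(stats(k).Centroid);      % [x, y] = [column, row]
    rows = c(2)-half+1 : c(2)+half;
    cols = c(1)-half+1 : c(1)+half;
    if rows(1) >= 1 && cols(1) >= 1 && ...
            rows(end) <= size(page, 1) && cols(end) <= size(page, 2)
        crops{end+1} = page(rows, cols); %#ok<AGROW>
    end
end
end
```

Note that a letter drawn with several disconnected strokes yields several components, and the ruled lines of the table are components too, so a real pipeline would need to merge or filter components and probably rescale each crop before saving it at 30×30 px.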

Fig. 2. Examples of handwritten Tifinagh characters from the proposed database.

To explore the dataset automatically, or to extract features from the whole dataset, we propose the following Matlab code:

function x = base_generation()
% Build a feature matrix with one row per image in the folder 'data_set'.
% zmoment is a user-supplied feature extraction function (Zernike moments,
% for example); it is not a built-in Matlab function.

% List all jpg images in the folder 'data_set'
fileFolder = fullfile('data_set');
dirOutput = dir(fullfile(fileFolder, '*.jpg'));
fileNames = {dirOutput.name}';
numFrames = numel(fileNames);

x = [];
for i = 1:numFrames
    % Read the image and scale it to [0, 1]
    d = imread(fullfile(fileFolder, fileNames{i}));
    d = double(d) / 255;

    % Convert to gray level if the image has three colour channels
    if size(d, 3) == 3
        y = rgb2gray(d);
    else
        y = d;
    end

    t = zmoment(y, 11);   % call the feature extraction function
    x = [x; t];           % add the feature vector 't' to the data matrix
end
end
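
To train a classifier, each row of the feature matrix also needs a class label. A minimal sketch, assuming the files follow the naming convention described above (a letter prefix followed by an image number, such as a1.jpg, hh12.jpg or aa43.jpg); the function name labels_from_names is hypothetical:

```matlab
function labels = labels_from_names(fileNames)
% fileNames: cell array of names such as 'a1.jpg', 'hh12.jpg', 'aa43.jpg'.
% Returns a cell array holding the letter prefix of each name
% ('a', 'hh', 'aa'), which identifies the character class.
labels = cell(size(fileNames));
for i = 1:numel(fileNames)
    % Capture the leading letters; the digits are the image number.
    tok = regexp(fileNames{i}, '^([A-Za-z]+)\d+\.jpg$', 'tokens', 'once');
    if isempty(tok)
        error('Unexpected file name: %s', fileNames{i});
    end
    labels{i} = tok{1};
end
end
```

Calling [~, ~, y] = unique(labels_from_names(fileNames)) then gives one integer class index per file. Because dir returns names in lexicographic order (a1, a10, a11, ...), the labels should be derived from the same fileNames list that was used to build the feature matrix, so that rows and labels stay aligned.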


Footnotes

Appendix A. Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2015.04.008.

Appendix A. Supplementary materials

Supplementary data: mmc1.zip (2.1 MB, zip)

References

1. Sadiqi F. The Teaching of Tifinagh (Berber) in Morocco. In: Handbook of Language and Ethnic Identity, vol. 2 (The Success-Failure Continuum in Language and Ethnic Identity Efforts). Oxford University Press; 2011. pp. 33–44.
2. Institut Royal de la Culture Amazighe (IRCAM). Proposition de codification des tifinaghes. Rabat, Morocco; 2003.
3. Bentayebi K., Abada F., Ihzmad H., Amzazi S. Genetic ancestry of a Moroccan population as inferred from autosomal STRs. Meta Gene. 2014;2:427–438.
4. Hoffman K.E. Berber Language Ideologies, Maintenance, and Contraction: Gendered Variation in the Indigenous Margins of Morocco. Language & Communication. 2006;26:144–167.
5. Oujaoura M., Minaoui B., Fakir M., Ayachi R., Bencharef O. Recognition of isolated printed Tifinagh characters. Int. J. Comput. Appl. 2014;85(1):1–13.
6. Vijayashree C.S., Kagawade V.C., Vasudev T. Estimation of tilt in characters and correction for better readability by OCR systems. Int. J. Comput. Appl. 2014;90(13):1–7.
7. Suzuki K., Horiba I., Sugie N. Linear-time connected-component labeling based on sequential local operations. Comput. Vision Image Understanding. 2003;89:1–23.
