Data in Brief. 2015 Jul 26;4:534–543. doi: 10.1016/j.dib.2015.07.018

Handwriting Moroccan regions recognition using Tifinagh character

B El Kessab 1, C Daoui 1, B Bouikhalene 1, R Salouan 1
PMCID: PMC4783523  PMID: 26966718

Abstract

The territorial organization of Morocco under the administrative division of 2009 is based on 16 regions. In this work we build a system for recognizing handwritten words (names of regions) written in the Amazigh language, an official language according to the Moroccan Royal Institute of Amazigh Culture (IRCAM) (2003a) [1]. This language has received little attention from researchers in the pattern recognition field, which is why we decided to study it (El Kessab et al., 2013 [3]; El Kessab et al., 2014 [4]), particularly since the state has decided to computerize the various public sectors in this language.

In this context we propose a data set of handwritten Tifinagh regions composed of 1600 images (100 images for each region). The dataset can be used, on the one hand, to test how efficiently a Tifinagh region recognition system extracts significant features and, on the other hand, to test how correctly it identifies each region in the classification phase.


Specifications table

Subject area: Computer science
More specific subject area: Image processing, handwritten Tifinagh regions, the Amazigh language
Type of data: Image
How data was acquired: Handwritten, scanner, marker
Data format: JPEG image
Experimental factors: We asked 70 students to write the names of the 16 regions in Tifinagh characters with a marker. The pages were scanned with an HP G3110 scanner (maximum resolution 4800×9600 dpi).
Experimental features: 1376 images with a size of 30×30 pixels (100 images/region)
Data source location: Béni Mellal, Morocco
Data accessibility: Within this article

Value of the data

  • The region is currently the highest administrative division of Morocco. The regions are subdivided into a total of 63 second-order administrative divisions, which are prefectures and provinces [2]. A Moroccan region is governed by a Wali, nominated by the King. The Wali is also governor of the province (or prefecture) where he resides.

  • As part of a 1997 decentralization and regionalization law passed by the legislature, 16 new regions of Morocco were created.

  • We chose a word database containing 1000 words written with a marker, representing the 16 regions of Morocco.

  • Optical Character Recognition (OCR) can be applied to both printed and handwritten text. In this work we use several efficient techniques in each of the three principal phases of the recognition system: pre-processing, feature extraction, and learning-classification. Several studies of handwritten Tifinagh region recognition were carried out using, on the one hand, the square and triangular zoning methods in the feature extraction phase and, on the other hand, support vector machines (SVM) and neural networks in the learning-classification phase.

  • Amazigh has been considered a national language since the new constitution of 2011 and is a creative field [3], [4], [5], [6], [7], so a system for recognizing handwritten Tifinagh words representing the regions is very useful.

Experimental design, materials and methods

For several years, on-line and off-line handwritten character recognition has been a very dynamic field, given its applicability in many different domains such as bank check processing, automatic data entry, postal sorting and automation, and automatic processing of administrative files. The steps of the recognition system used in this work are presented in Fig. 1.

Fig. 1. The proposed system for handwritten Tifinagh word recognition.

We chose a word database containing 1000 words written with a marker, representing the sixteen regions of Morocco (Table 1).

Table 1. The recognition rates τr and τg obtained by each hybrid method and each classifier (region names are images in the original table, fx1.gif to fx16.gif).

Region       Neural networks                      Support vector machines
             Square zoning   Triangular zoning    Square zoning   Triangular zoning
fx1.gif      70.00           67.57                82.00           74.49
fx2.gif      79.13           73.40                80.00           75.18
fx3.gif      60.00           60.74                83.00           80.34
fx4.gif      55.09           53.48                76.67           70.61
fx5.gif      65.21           64.00                74.00           70.78
fx6.gif      63.25           65.85                69.00           66.60
fx7.gif      50.18           50.97                68.67           65.00
fx8.gif      69.66           65.93                71.67           70.56
fx9.gif      64.46           61.71                67.00           63.60
fx10.gif     71.31           67.40                72.00           70.96
fx11.gif     73.00           71.43                74.00           71.73
fx12.gif     66.11           64.84                67.57           69.41
fx13.gif     68.67           62.69                70.40           69.00
fx14.gif     73.37           69.29                72.74           70.44
fx15.gif     67.45           66.04                69.48           61.33
fx16.gif     69.26           68.34                81.40           74.18
τg           66.64           64.61                73.73           70.26

Table 1 lists the recognition rate τr obtained for each region and the global recognition rate τg over all 16 regions (both given in %).
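As a quick consistency check on Table 1, the sketch below recomputes τg for each column. It assumes that τg is the unweighted mean of the 16 per-region rates τr; the article does not state the definition, so this is an assumption, although the recomputed means agree with the reported values to within 0.01. The names `RATES`, `REPORTED_TAU_G`, `global_rate` and `best_method` are illustrative only.

```python
# Per-region rates tau_r (in %) copied from Table 1, one list per column.
RATES = {
    "NN / square zoning": [70.00, 79.13, 60.00, 55.09, 65.21, 63.25, 50.18, 69.66,
                           64.46, 71.31, 73.00, 66.11, 68.67, 73.37, 67.45, 69.26],
    "NN / triangular zoning": [67.57, 73.40, 60.74, 53.48, 64.00, 65.85, 50.97, 65.93,
                               61.71, 67.40, 71.43, 64.84, 62.69, 69.29, 66.04, 68.34],
    "SVM / square zoning": [82.00, 80.00, 83.00, 76.67, 74.00, 69.00, 68.67, 71.67,
                            67.00, 72.00, 74.00, 67.57, 70.40, 72.74, 69.48, 81.40],
    "SVM / triangular zoning": [74.49, 75.18, 80.34, 70.61, 70.78, 66.60, 65.00, 70.56,
                                63.60, 70.96, 71.73, 69.41, 69.00, 70.44, 61.33, 74.18],
}

# Global rates tau_g (in %) as reported in the last row of Table 1.
REPORTED_TAU_G = {
    "NN / square zoning": 66.64,
    "NN / triangular zoning": 64.61,
    "SVM / square zoning": 73.73,
    "SVM / triangular zoning": 70.26,
}


def global_rate(rates):
    """Unweighted mean of the per-region rates (assumed definition of tau_g)."""
    return sum(rates) / len(rates)


def best_method(rates_by_method):
    """Name of the column with the highest recomputed global rate."""
    return max(rates_by_method, key=lambda name: global_rate(rates_by_method[name]))


for name, rates in RATES.items():
    print(f"{name}: recomputed {global_rate(rates):.2f}, reported {REPORTED_TAU_G[name]:.2f}")
print("best:", best_method(RATES))
```

Under this assumption, the combination of square zoning and support vector machines has the highest global rate of the four columns.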

The extraction steps were:

  • We asked 70 students (in the Laboratory of Information Processing and Decision Support) to write the names of the 16 regions in Tifinagh characters (Fig. 3).

  • Tifinagh characters are written from left to right in horizontal lines.

  • The characters are written separately in the text (see Fig. 2, Fig. 3).

  • Each original region image has a size of 30×30 pixels (Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8, Fig. 9, Fig. 10, Fig. 11, Fig. 12, Fig. 13, Fig. 14, Fig. 15, Fig. 16, Fig. 17).

  • The number of square zones used in feature extraction is 4, 6 or 9.

  • The number of triangular zones used in feature extraction is 4, 6 or 8.

  • In feature extraction, each image is transformed into a vector of 4, 6 or 9 components for square zoning and into a vector of 4, 6 or 8 components for triangular zoning.

  • In the classification phase with support vector machines, the standard deviation of the GRBF kernel function is 0.1.

  • In the classification phase with support vector machines, the degree of the polynomial (POL) kernel function is 10 and its parameters are a=b=1.

  • We varied the size of the zones in feature extraction to find the best-performing method.

  • To the same end, we tested the values {5, 10, 15} for the number of hidden-layer neurons.
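The square zoning step listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a binarized 30×30 image stored as a list of rows of 0/1 pixels (1 = ink), that the feature of each zone is its ink density, and that 4, 6 and 9 zones correspond to 2×2, 2×3 and 3×3 grids. None of these choices is stated in the article, and the names `GRIDS` and `square_zoning` are hypothetical.

```python
# Assumed grid layouts: number of zones -> (zone rows, zone columns).
GRIDS = {4: (2, 2), 6: (2, 3), 9: (3, 3)}


def square_zoning(image, n_zones):
    """Return one ink-density feature per rectangular zone, in row-major order.

    image: list of rows, each a list of 0/1 pixels (1 = ink).
    n_zones: 4, 6 or 9 (see GRIDS for the assumed layouts).
    """
    zone_rows, zone_cols = GRIDS[n_zones]
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // zone_rows, width // zone_cols
    features = []
    for zr in range(zone_rows):
        for zc in range(zone_cols):
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# Synthetic 30x30 image whose top-left 15x15 quadrant is fully inked.
demo = [[1 if r < 15 and c < 15 else 0 for c in range(30)] for r in range(30)]
print(square_zoning(demo, 4))
print(square_zoning(demo, 9))
```

A 30×30 image divides evenly into all three assumed grids, so every pixel belongs to exactly one zone and the feature vector has exactly 4, 6 or 9 components.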

Fig. 3. The sixteen regions written in the Amazigh language.

Fig. 2. Comparison between the Tifinagh, Arabic and Latin characters.

Fig. 4. Example of a handwritten Tifinagh region from the proposed database.

Fig. 5. Graphical representation of the recognition rate τr for each region.

Fig. 6. The original image.

Fig. 7. Graphical representation of the column segmentation.

Fig. 8. The segmentation into columns.
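The article shows column segmentation only in figures and does not describe the algorithm. One common way to split separately written characters into columns is a vertical projection profile, sketched below as an assumption rather than as the authors' method. The image is taken to be a binarized list of rows of 0/1 pixels (1 = ink), and the function names are hypothetical.

```python
def column_profile(image):
    """Vertical projection: number of ink pixels in each column."""
    width = len(image[0])
    return [sum(row[c] for row in image) for c in range(width)]


def segment_columns(image):
    """Return (start, end) column ranges, end exclusive, of consecutive inked columns."""
    segments = []
    start = None
    for c, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = c                      # a new character begins
        elif count == 0 and start is not None:
            segments.append((start, c))    # an empty column closes the character
            start = None
    if start is not None:                  # character touching the right border
        segments.append((start, len(image[0])))
    return segments


# Tiny synthetic image: ink in columns 1-2, 5-7 and 9.
toy = [[1 if c in (1, 2, 5, 6, 7, 9) else 0 for c in range(10)] for _ in range(4)]
print(column_profile(toy))
print(segment_columns(toy))
```

This approach relies on the characters being written separately, with at least one empty column between neighbours, as stated in the extraction steps.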

Fig. 9. The square zoning method.

Fig. 10. Process of feature extraction by square zoning.

Fig. 11. The triangular zoning method.

Fig. 12. Process of feature extraction by triangular zoning.
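The geometry of the triangular zones is given only in the figures, so the following is a hedged sketch of the 4-zone case. It assumes that the square image is cut by its two diagonals into top, right, bottom and left triangles, and that each feature is the ink density of one triangle. The 6- and 8-zone layouts are not reconstructed here, and the function names are hypothetical.

```python
def triangle_of(r, c, n):
    """Zone of pixel (r, c) in an n x n image: 0=top, 1=right, 2=bottom, 3=left.

    Pixels lying exactly on a diagonal are assigned to one neighbouring
    triangle so that every pixel belongs to exactly one zone.
    """
    above_main = c > r            # strictly right of the main diagonal
    above_anti = r + c < n - 1    # strictly above the anti-diagonal
    if above_main and above_anti:
        return 0
    if above_main:
        return 1
    if not above_anti:
        return 2
    return 3


def triangular_zoning(image):
    """Return the ink density of each of the 4 assumed triangular zones."""
    n = len(image)                # the image is assumed to be square
    ink = [0, 0, 0, 0]
    size = [0, 0, 0, 0]
    for r in range(n):
        for c in range(n):
            zone = triangle_of(r, c, n)
            size[zone] += 1
            ink[zone] += image[r][c]
    return [i / s if s else 0.0 for i, s in zip(ink, size)]


# Synthetic 30x30 image with ink only in the upper half.
demo = [[1 if r < 15 else 0 for c in range(30)] for r in range(30)]
print(triangular_zoning(demo))
```

In this sketch the upper-half image fills the top triangle completely and leaves the bottom triangle empty, while the left and right triangles are partly inked.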

Fig. 13. Determination of the optimal hyperplane, support vectors, maximum margin and valid hyperplanes.
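The two SVM kernels named in the extraction steps can be written out as below, using the stated settings (standard deviation 0.1 for the GRBF kernel; degree 10 and a=b=1 for the polynomial kernel). The article does not give the formulas, so the sketch assumes the common forms K(x, y) = exp(-||x - y||² / (2σ²)) and K(x, y) = (a·⟨x, y⟩ + b)^d. The function names are illustrative.

```python
import math


def grbf_kernel(x, y, sigma=0.1):
    """Gaussian RBF kernel, assumed form exp(-||x - y||^2 / (2 * sigma^2))."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def polynomial_kernel(x, y, degree=10, a=1.0, b=1.0):
    """Polynomial kernel, assumed form (a * <x, y> + b) ** degree."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (a * dot + b) ** degree


# Two example 4-component zoning vectors (made-up densities, for illustration only).
u = [0.20, 0.10, 0.00, 0.30]
v = [0.25, 0.10, 0.05, 0.30]
print(grbf_kernel(u, v), polynomial_kernel(u, v))
```

With σ = 0.1 the Gaussian kernel decays quickly: two feature vectors that differ by 0.1 in a single component already give a value of exp(-0.5), so only very close zoning vectors are treated as similar.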

Fig. 14. The multi-layer perceptron.
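To make the role of the hidden-layer size concrete, the sketch below builds a one-hidden-layer perceptron and runs a forward pass for each of the tested sizes {5, 10, 15}. It is an illustration under assumptions that the article does not state: sigmoid activations, randomly initialised untrained weights, a 9-component input (square zoning with 9 zones) and 16 outputs (one per region). Training is omitted, and all names are hypothetical.

```python
import math
import random


def init_mlp(n_inputs, n_hidden, n_outputs, seed=0):
    """Random weights for a one-hidden-layer perceptron (untrained)."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        # One row per neuron: n_in weights followed by a bias term.
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

    return layer(n_inputs, n_hidden), layer(n_hidden, n_outputs)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, inputs):
    """Apply one fully connected sigmoid layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
        for row in weights
    ]


def predict(mlp, features):
    """Return (index of the highest-scoring output, all output scores)."""
    hidden_weights, output_weights = mlp
    scores = layer_forward(output_weights, layer_forward(hidden_weights, features))
    return scores.index(max(scores)), scores


features = [0.5] * 9   # placeholder 9-zone feature vector
for n_hidden in (5, 10, 15):
    region, scores = predict(init_mlp(9, n_hidden, 16), features)
    print(n_hidden, "hidden neurons -> predicted region index", region)
```

Because the weights are random, the predicted index carries no meaning here. The sketch only shows how the hidden-layer size changes the shape of the network that would then be trained and compared.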

Fig. 15. Graphical representation of the recognition rate τr of each region with all methods.

Fig. 16. Graphical representation of the recognition rate τr of all feature extraction methods.

Fig. 17. Graphical representation of the recognition rate τr of the square zoning method and the SVM classifier.

The recognition rate τr of each region is shown graphically in Fig. 5.

Footnotes

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2015.07.018.

Appendix A. Supporting information

Supplementary data

mmc1.zip (17.9MB, zip)

References

  • 1. Institut Royal de la Culture Amazighe (IRCAM). Proposition de codification des tifinaghes, Rabat, Morocco, 2003a.
  • 2. Morocco in Figures 2003: a document by the Moroccan Embassy in the USA.
  • 3. El Kessab B., Daoui C., Bouikhalene B. Handwritten Tifinagh text recognition using neural networks and hidden Markov models. Int. J. Comput. Appl. 2013;75(18):975–8887.
  • 4. El Kessab B., Daoui C., Bouikhalene B., Salouan R. Some comparative studies for cursive handwritten Tifinagh characters recognition systems. Int. J. Hybrid Inform. Technol. 2014;7(6):295–306.
  • 5. Sadiqi F. The Teaching of Tifinagh (Berber) in Morocco. In: Handbook of Language and Ethnic Identity, Vol. 2: The Success-Failure Continuum in Language and Ethnic Identity Efforts. Oxford University Press; 2003. pp. 33–44.
  • 6. Bentayebi K., Abada F., Ihzmad H., Amzazi S. Genetic ancestry of a Moroccan population as inferred from autosomal STRs. Meta Gene (Elsevier), 427–438.
  • 7. Hoffman K.E. Berber language ideologies, maintenance, and contraction: gendered variation in the indigenous margins of Morocco. Language & Communication (Elsevier), 144–167.

Articles from Data in Brief are provided here courtesy of Elsevier
