Skip to main content
Data in Brief logoLink to Data in Brief
. 2016 Mar 2;7:432–436. doi: 10.1016/j.dib.2016.02.060

Handwritten mathematical symbols dataset

Yassine Chajri 1,, Belaid Bouikhalene 1
PMCID: PMC4786757  PMID: 27006975

Abstract

Due to the technological advances in recent years, paper scientific documents are used less and less. Thus, the trend in the scientific community to use digital documents has increased considerably. Among these documents, there are scientific documents and more specifically mathematics documents.

In this context, we present our own dataset of handwritten mathematical symbols composed of 10,379 images.

This dataset gathers Arabic characters, Latin characters, Arabic numerals, Latin numerals, arithmetic operators, set-symbols, comparison symbols, delimiters, etc.

Keywords: Image processing, Handwritten mathematical symbols, Documents, Recognition


Specifications Table

Subject area Computer science
More specific subject area Image processing, handwritten mathematical symbols, documents recognition
Type of data Image
How data was acquired Handwritten, Scanner, Marker
Data format Jpeg image
Experimental factors We asked 97 students of our university to write a list of mathematical symbols, we used an HP G3110 to scan data and we used a marker in symbols writing
Experimental features 10,379 Images with a size of 80×60 pixels
Data source location Beni Mellal, Morocco
Data accessibility Within this article

Value of the data

  • Given the importance of mathematics in all branches of science (physics, engineering, medicine, economics, etc.), the recognition of handwritten mathematical expressions has become a very important area of scientific research.

  • We prepared a dataset which contains 10,379 symbols written in marker and which represents the most frequently used symbols.

  • This dataset gathers Arabic and Latin symbols which make it a very important dataset compared to the others presented in the literature.

  • It contains a large number of mathematical symbols and is characterized by several styles of writing.

  • This dataset is very useful to implement a recognition system for handwritten mathematical documents and it will help facilitate the research in this important area.

1. Data, experimental design, materials and methods

1.1. Data preparation

For the preparation of our dataset we;

  • Targeted 97 students (47 male and 50 female) of our university (Bachelor, Master and Doctorate).

  • Asked them to write a list of mathematical symbols in order to have a diversity of writing styles.

  • Used an HP G3110 to scan pages.

  • Used Radon transform [1], [2], [3] for skew detection and correction.

  • Used histogram equalization [4] for images normalization.

  • Median filtering [5], [6] for image noise reduction.

  • Used connected components algorithm for symbols detection [7].

  • Extracted 10,379 sub-images with a size of 80*60 which contain the symbols (Fig.1).

Fig. 1.

Fig.1.

Examples of handwritten mathematical symbols in our dataset.

The images are named in three parts:

  • The first is the symbol name.

  • The second part makes the difference between Arabic and Latin symbols (A or L).

  • The last part is represented by numbers from 1 to 97 (Table 1, Table 2, Table 3, Table 4).

Table 1.

Mathematical symbols dataset.

Symbols Description
A, B, C, D, E, F, G, H, I,…………….,U, V, W, X, Y, Z Latin characters
م،ن،ه،و،ي،ء،……………………..ا،ب،ت،ث،ج Arabic characters
1,2,3,4,5,6,7,8,9 Latin numerals
٠,١,٢,٣,_,_,_,٧,٨,٩ Arabic numerals
,, summation or product symbols
Integral symbol
Square root
|, (,), {, }, [, ] Delimiters symbols
=, ≠, <, >,,+, *, ×, /,,←,⋂, ⋃, ⊃, ⊄, ⊂, ∈, ∉ Arithmetic operators, comparison operators, set symbols

Table 2.

Comparison between the Arabic and Latin characters.

Latin characters Arabic characters
A ا
B ب
C ت
D ث
E ج
F ح
G خ
H د
I ذ
J ر
K ز
L س
M ش
N ص
O ض
P ط
Q ظ
R ع
S غ
T ف
U ق
V ك
W ل
X م
Y ن
Z ه
_ و
_ ي
ء

Table 3.

Comparison between the Arabic and Latin numerals.

Latin numerals Arabic numerals
0 ٠
1 ١
2 ٢
3 ٣
4 ٤
5 ٥
6 ٦
7 ٧
8 ٨
9 ٩

Table 4.

Some of the composed symbols.

Composed Latin symbols Composed Arabic symbols
Cos حتا
Sin حا
Tan طا
Log لو
Lim نها
…. ….

Footnotes

Appendix A

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.dib.2016.02.060.

Appendix A. Supplementary material

Supplementary material

mmc1.zip (293B, zip)

Supplementary material

mmc2.zip (191.7KB, zip)

References

  • 1.M. Hasegawa, S. Tabbone, Histogram of radon transform with angle correlation matrix for distortion invariant shape descriptor, NeuroComputing.
  • 2.Carsten Hoilund, The Radon Transform.
  • 3.Desai A. Segmentation of characters from old typewritten documents using radon transform. Int. J. Comput. Appl. 2012;37(9):0975–8887. [Google Scholar]
  • 4.S. Parker, J. Kemi, Ladeji-Osias Implementing a Histogram Equalization Algorithm in Reconfigurable Hardware.
  • 5.Sukomal Mehta, Sanjeev Dhull, Fuzzy based median filter for gray-scale images, International Journal of Engineering Science and Advanced Technology.
  • 6.Singh Kh. Manglem. Fuzzy rule based median filter for gray-scale images. J. Inf. Hiding Multimed. Signal Process. 2011;2(2) [Google Scholar]
  • 7.Yapa R. Dharshana, Harada K. Connected component labeling algorithms for gray-scale images and evaluation of performance using digital mammograms. IJCSNS Int. J. Comput. Sci. Netw. Secur. 2008;8(6) [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary material

mmc1.zip (293B, zip)

Supplementary material

mmc2.zip (191.7KB, zip)

Articles from Data in Brief are provided here courtesy of Elsevier

RESOURCES