Handwritten Math to LaTeX with Arabic Numerals
Abstract
This study investigates a method for automatically converting handwritten mathematical expressions that contain Persian/Arabic numerals into LaTeX. A dataset of 26,141 images of handwritten mathematical expressions, each paired with its LaTeX representation, was collected; it covers a diverse range of handwriting styles and mathematical symbols. An end-to-end encoder-decoder model was developed and trained on this dataset. The encoder is a modified ResNet-18 that captures spatial hierarchies in the image, and the decoder is Transformer-based and models dependencies between symbols. The model achieved a Character Error Rate (CER) of 0.0671 on the test set. These results indicate that the proposed approach can recognize mathematical notation involving non-Latin numerals accurately. The study contributes both a dataset and a methodology toward robust handwriting recognition systems for mathematical notation.
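The reported metric can be illustrated with a minimal sketch. It assumes the common definition of CER: the total Levenshtein edit distance between predicted and reference LaTeX strings, divided by the total number of reference characters. The abstract does not state whether the score was computed over characters or LaTeX tokens, or how it was aggregated across samples, so the function names `edit_distance` and `cer` and the corpus-level aggregation here are illustrative assumptions, not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty prefix of hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits / total reference characters (assumed definition)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: Persian digits 1, 2, 3 written as Unicode escapes.
    reference = "\u06F1\u06F2+\u06F3"   # "12+3" in Persian digits
    prediction = "\u06F1\u06F2+\u06F2"  # last digit misrecognized
    print(cer([reference], [prediction]))  # one substitution over four characters
```

Under this definition, a CER of 0.0671 corresponds to roughly 6.7 character edits per 100 reference characters.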