# TrOCR-LaTeX (fine-tuned on math handwriting)
Transform handwritten math into clean LaTeX code.
## Quick Start
Take your handwritten math and turn it into clean LaTeX code. This is a fine-tuned version of `microsoft/trocr-base-handwritten`, a transformer-based optical character recognition model, adapted to work with handwritten math images and structured math syntax.
## Installation
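No dedicated installation steps are given; a reasonable assumption is that the usage example below only needs the Hugging Face `transformers` library together with `torch` and `pillow`, e.g. `pip install transformers torch pillow` (versions are not pinned here).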
## Features
### Data

Fine-tuned on Google's MathWriting dataset, which contains over 500,000 digital inks of handwritten mathematical expressions obtained through either manual labelling or programmatic generation.
### Intended use & limitations

- You can use this model for OCR on a single math expression.
- Performance degrades on very long expressions; because of the image preprocessing, a roughly 3:2 aspect ratio works best.
- To work around this, split a long expression into sub-images with an expression chunking scheme and run OCR on each chunk (see the sketch after this list).
- To process multiple expressions, chunk them into images containing a single expression each.
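A minimal chunking sketch, assuming horizontal slicing of a wide expression image into sub-images near the 3:2 aspect ratio mentioned above (the slice width and overlap are illustrative choices, not part of the original model card):

```python
from PIL import Image


def chunk_expression(image: Image.Image, target_ratio: float = 1.5, overlap: int = 20) -> list[Image.Image]:
    """Split a wide expression image into sub-images close to a 3:2 (width:height) ratio."""
    width, height = image.size
    chunk_width = int(height * target_ratio)   # width giving roughly 3:2 per chunk
    if width <= chunk_width:
        return [image]                         # already narrow enough, no chunking needed
    chunks, left = [], 0
    while left < width:
        right = min(left + chunk_width, width)
        chunks.append(image.crop((left, 0, right, height)))
        if right == width:
            break
        left = right - overlap                 # small overlap so symbols cut at a boundary appear in both chunks
    return chunks
```

Each chunk can then be run through the processor and model exactly as in the usage example below, and the decoded strings concatenated; symbols that straddle a chunk boundary may still be misread, so chunking is a mitigation rather than a full fix.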
## Usage Examples
### Basic Usage
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image


def open_PIL_image(image_path: str) -> Image.Image:
    image = Image.open(image_path)
    # Flatten transparent PNGs onto a white background so strokes stay visible.
    if image_path.split('.')[-1].lower() == 'png':
        image = Image.composite(image, Image.new('RGB', image.size, 'white'), image)
    return image


processor = TrOCRProcessor.from_pretrained('tjoab/latex_finetuned')
model = VisionEncoderDecoderModel.from_pretrained('tjoab/latex_finetuned')

# `paths` is a list of image file paths, one handwritten expression per image.
images = [open_PIL_image(path) for path in paths]
preproc_image = processor.image_processor(images=images, return_tensors="pt").pixel_values

pred_ids = model.generate(preproc_image, max_length=128)
latex_preds = processor.batch_decode(pred_ids, skip_special_tokens=True)
```
## Technical Details
- Mini-batch size: 8
- Optimizer: Adam
- LR scheduler: cosine
- fp16 mixed precision
  - Trained using automatic mixed precision (AMP) with `torch.cuda.amp` for reduced memory usage.
- Gradient accumulation
  - Used to simulate a larger effective batch size while keeping per-step memory consumption low.
  - Optimizer steps occurred every 8 mini-batches (see the sketch after this list).
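A minimal sketch of how AMP and gradient accumulation fit together in a PyTorch training loop, assuming the `model` from the usage example above plus a `train_loader` yielding preprocessed pixel values and tokenized LaTeX labels (the loader, device setup, and learning rate are illustrative, not values from the original; the cosine scheduler is omitted for brevity):

```python
import torch

# Assumes `model` (VisionEncoderDecoderModel) is already loaded and `train_loader`
# yields (pixel_values, labels) mini-batches of size 8.
device = "cuda"
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # learning rate is illustrative
scaler = torch.cuda.amp.GradScaler()                       # keeps fp16 gradients from underflowing
accum_steps = 8                                            # optimizer step every 8 mini-batches

optimizer.zero_grad()
for step, (pixel_values, labels) in enumerate(train_loader):
    with torch.cuda.amp.autocast():                        # forward pass in mixed precision
        outputs = model(pixel_values=pixel_values.to(device), labels=labels.to(device))
        loss = outputs.loss / accum_steps                  # average loss over the accumulation window
    scaler.scale(loss).backward()                          # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                             # unscale gradients and apply the Adam update
        scaler.update()
        optimizer.zero_grad()
```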
## Documentation
### Evaluation
Performance was evaluated using Character Error Rate (CER), defined as:
CER = (Substitutions + Insertions + Deletions) / Total Characters in Ground Truth
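A small reference implementation of this metric using a plain character-level Levenshtein distance (a generic sketch, not the exact evaluation script used for this model):

```python
def cer(prediction: str, ground_truth: str) -> float:
    """Character Error Rate: (substitutions + insertions + deletions) / characters in ground truth."""
    # Standard dynamic-programming Levenshtein edit distance over characters.
    prev = list(range(len(ground_truth) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, g in enumerate(ground_truth, start=1):
            cost = 0 if p == g else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1] / max(len(ground_truth), 1)


print(cer("x^2+1", "x^{2}+1"))  # 2 edits / 7 reference characters ≈ 0.286
```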
### BibTeX and Citation
The original TrOCR model was introduced in the following paper:
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al.
You can find the source code in their repository.
```bibtex
@misc{li2021trocr,
    title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
    author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
    year={2021},
    eprint={2109.10282},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```