# TrOCR-LaTeX (fine-tuned on math handwriting)
Transform handwritten math into clean LaTeX code.
## Quick Start
Take your handwritten math and turn it into clean LaTeX code. This is a fine-tuned version of `microsoft/trocr-base-handwritten`, a transformer-based optical character recognition model, adapted to work with handwritten math images and structured math syntax.
## Installation
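No dedicated installation steps are given; a reasonable assumption is that the usage example below only needs the Hugging Face `transformers` library together with `torch` and `pillow`, e.g. `pip install transformers torch pillow` (versions are not pinned here).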
## Features
### Data

Fine-tuned on Google's MathWriting dataset, which contains over 500,000 digital inks of handwritten mathematical expressions obtained through either manual labelling or programmatic generation.
### Intended use & limitations

- You can use this model for OCR on a single math expression.
- Performance degrades on very long expressions; because of the image preprocessing, a roughly 3:2 aspect ratio works best.
- To work around this, split a long expression into sub-images with an expression chunking scheme and run OCR on each chunk (see the sketch after this list).
- To process multiple expressions, chunk them into images containing a single expression each.
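A minimal chunking sketch, assuming horizontal slicing of a wide expression image into sub-images near the 3:2 aspect ratio mentioned above (the slice width and overlap are illustrative choices, not part of the original model card):

```python
from PIL import Image


def chunk_expression(image: Image.Image, target_ratio: float = 1.5, overlap: int = 20) -> list[Image.Image]:
    """Split a wide expression image into sub-images close to a 3:2 (width:height) ratio."""
    width, height = image.size
    chunk_width = int(height * target_ratio)   # width giving roughly 3:2 per chunk
    if width <= chunk_width:
        return [image]                         # already narrow enough, no chunking needed
    chunks, left = [], 0
    while left < width:
        right = min(left + chunk_width, width)
        chunks.append(image.crop((left, 0, right, height)))
        if right == width:
            break
        left = right - overlap                 # small overlap so symbols cut at a boundary appear in both chunks
    return chunks
```

Each chunk can then be run through the processor and model exactly as in the usage example below, and the decoded strings concatenated; symbols that straddle a chunk boundary may still be misread, so chunking is a mitigation rather than a full fix.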
## Usage Examples
### Basic Usage
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image


def open_PIL_image(image_path: str) -> Image.Image:
    image = Image.open(image_path)
    # Flatten transparent PNGs onto a white background so strokes stay visible.
    if image_path.split('.')[-1].lower() == 'png':
        image = Image.composite(image, Image.new('RGB', image.size, 'white'), image)
    return image


processor = TrOCRProcessor.from_pretrained('tjoab/latex_finetuned')
model = VisionEncoderDecoderModel.from_pretrained('tjoab/latex_finetuned')

# `paths` is a list of image file paths, one handwritten expression per image.
images = [open_PIL_image(path) for path in paths]
preproc_image = processor.image_processor(images=images, return_tensors="pt").pixel_values

pred_ids = model.generate(preproc_image, max_length=128)
latex_preds = processor.batch_decode(pred_ids, skip_special_tokens=True)
```
## Technical Details
- Mini-batch size: 8
- Optimizer: Adam
- LR scheduler: cosine
- fp16 mixed precision
  - Trained using automatic mixed precision (AMP) with `torch.cuda.amp` for reduced memory usage.
- Gradient accumulation
  - Used to simulate a larger effective batch size while keeping per-step memory consumption low.
  - Optimizer steps occurred every 8 mini-batches (see the sketch after this list).
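A minimal sketch of how AMP and gradient accumulation fit together in a PyTorch training loop, assuming the `model` from the usage example above plus a `train_loader` yielding preprocessed pixel values and tokenized LaTeX labels (the loader, device setup, and learning rate are illustrative, not values from the original; the cosine scheduler is omitted for brevity):

```python
import torch

# Assumes `model` (VisionEncoderDecoderModel) is already loaded and `train_loader`
# yields (pixel_values, labels) mini-batches of size 8.
device = "cuda"
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # learning rate is illustrative
scaler = torch.cuda.amp.GradScaler()                       # keeps fp16 gradients from underflowing
accum_steps = 8                                            # optimizer step every 8 mini-batches

optimizer.zero_grad()
for step, (pixel_values, labels) in enumerate(train_loader):
    with torch.cuda.amp.autocast():                        # forward pass in mixed precision
        outputs = model(pixel_values=pixel_values.to(device), labels=labels.to(device))
        loss = outputs.loss / accum_steps                  # average loss over the accumulation window
    scaler.scale(loss).backward()                          # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                             # unscale gradients and apply the Adam update
        scaler.update()
        optimizer.zero_grad()
```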
## Documentation
### Evaluation
Performance was evaluated using Character Error Rate (CER), defined as:
CER = (Substitutions + Insertions + Deletions) / Total Characters in Ground Truth
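A small reference implementation of this metric using a plain character-level Levenshtein distance (a generic sketch, not the exact evaluation script used for this model):

```python
def cer(prediction: str, ground_truth: str) -> float:
    """Character Error Rate: (substitutions + insertions + deletions) / characters in ground truth."""
    # Standard dynamic-programming Levenshtein edit distance over characters.
    prev = list(range(len(ground_truth) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, g in enumerate(ground_truth, start=1):
            cost = 0 if p == g else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1] / max(len(ground_truth), 1)


print(cer("x^2+1", "x^{2}+1"))  # 2 edits / 7 reference characters ≈ 0.286
```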
### BibTeX and Citation
The original TrOCR model was introduced in the following paper:
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al.
You can find the source code in their repository.
```bibtex
@misc{li2021trocr,
    title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
    author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
    year={2021},
    eprint={2109.10282},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```