im2latex
This model is a base VisionEncoderDecoderModel fine-tuned on the OleehyO/latex-formulas dataset to generate LaTeX formulas from images.
Quick Start
You can use the model directly with the transformers library:
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
from PIL import Image
# Load the fine-tuned model and its tokenizer; the feature extractor comes from the Swin checkpoint.
model = VisionEncoderDecoderModel.from_pretrained("DGurgurov/im2latex")
tokenizer = AutoTokenizer.from_pretrained("DGurgurov/im2latex")
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
# Open the formula image; converting to RGB guards against grayscale or RGBA files.
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values
# Generate token IDs and decode them into a LaTeX string.
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Generated LaTeX formula:", generated_texts[0])
Features
This model is a fine-tuned VisionEncoderDecoderModel that generates LaTeX formulas from images. It combines a Swin Transformer encoder with a GPT-2 decoder, is implemented in PyTorch, and was trained with DDP (Distributed Data Parallel).
Documentation
Model Details
| Property | Details |
|---|---|
| Encoder | Swin Transformer |
| Decoder | GPT-2 |
| Framework | PyTorch |
| DDP (Distributed Data Parallel) | Used for training |
Training Data
The data comes from the OleehyO/latex-formulas dataset (the cleaned_formulas configuration) and was split 80:10:10 into training, validation, and test sets. The splits were made as follows:
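from datasets import load_dataset
# Load the "cleaned_formulas" configuration of the dataset.
dataset = load_dataset("OleehyO/latex-formulas", "cleaned_formulas")
# Hold out 20% of the data, then halve the held-out part into validation and test sets.
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]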
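To see why a 0.2 split followed by a 0.5 split yields 80:10:10, the arithmetic can be sketched in plain Python. This is only an illustration: the helper names `train_test_split` and `split_80_10_10` are hypothetical, and the shuffling here uses Python's `random` module, so the resulting order will not match what the `datasets` library produces.

```python
import random


def train_test_split(items, test_size, seed):
    """Hypothetical helper: shuffle a copy of items and split off a test_size fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    # The first n_test shuffled items form the held-out part, the rest stay in train.
    return shuffled[n_test:], shuffled[:n_test]


def split_80_10_10(items, seed=42):
    """Hold out 20% of the items, then halve the held-out part into val and test."""
    train, held_out = train_test_split(items, test_size=0.2, seed=seed)
    val, test = train_test_split(held_out, test_size=0.5, seed=seed)
    return train, val, test


train, val, test = split_80_10_10(range(1000))
print(len(train), len(val), len(test))  # prints: 800 100 100
```

Because the second split only touches the held-out 20%, the three subsets are disjoint and together cover the whole dataset.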
Evaluation Metrics
The model was evaluated on a test set with the following results:
- Test Loss: 0.10
- Test BLEU Score: 0.67
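For readers unfamiliar with BLEU, the following is a minimal sketch of a corpus-level BLEU score on a 0 to 1 scale. It rests on several assumptions that the model card does not confirm: whitespace tokenization, a single reference per prediction, n-grams up to length 4, and no smoothing. The function names `ngram_counts` and `corpus_bleu` are illustrative, and the score reported above may have been computed with a different implementation.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, no smoothing (illustrative)."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens, hyp_tokens = ref.split(), hyp.split()  # assumed whitespace tokenization
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tokens, n)
            hyp_counts = ngram_counts(hyp_tokens, n)
            # Counter intersection clips each n-gram match at its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty lowers the score when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = [r"\frac { a } { b } + c"]
print(corpus_bleu(reference, reference))  # an exact match scores 1.0
```

A prediction that differs from the reference in a single token already loses matches at every n-gram length, which is why BLEU on LaTeX output tends to be a strict measure.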
Training Script
The training script for this model can be found in the following repository: GitHub
Citation
If you use this work in your research, please cite our paper:
@misc{gurgurov2024imagetolatexconvertermathematicalformulas,
title={Image-to-LaTeX Converter for Mathematical Formulas and Text},
author={Daniil Gurgurov and Aleksey Morshnev},
year={2024},
eprint={2408.04015},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2408.04015},
}
License
This project is released under the MIT license.