# im2latex_model
This model is a `VisionEncoderDecoderModel` trained to generate LaTeX formulas from images. It is part of a project that reproduces the paper at https://arxiv.org/html/2408.04015v1.
## Quick Start

You can use the model directly with the `transformers` library. Here's a code example:
```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
import torch
from PIL import Image

# Load the model, tokenizer, and feature extractor from the Hub.
model = VisionEncoderDecoderModel.from_pretrained("your-username/your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
feature_extractor = AutoFeatureExtractor.from_pretrained("your-username/your-model-name")

# Preprocess the formula image into pixel values (convert to RGB in case
# the input is grayscale or has an alpha channel).
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate token ids and decode them into a LaTeX string.
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Generated LaTeX formula:", generated_texts[0])
```
## Features
- This model is a `VisionEncoderDecoderModel` trained to generate LaTeX formulas from images.
- It is part of a project reproducing the paper https://arxiv.org/html/2408.04015v1. In the paper the model is subsequently fine-tuned on handwritten data; this checkpoint is the model before that fine-tuning step.
## Installation

No specific installation steps are provided in the original document.
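Based on the imports in the usage examples (an assumption, not a documented requirement), the following packages should be sufficient:

```bash
pip install transformers torch pillow datasets
```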
## Usage Examples

### Basic Usage
Basic usage is identical to the Quick Start example above.
## Documentation

### Model Details
| Property  | Details          |
|-----------|------------------|
| Encoder   | Swin Transformer |
| Decoder   | GPT-2            |
| Framework | PyTorch          |
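For reference, this is a minimal sketch of how a Swin-encoder/GPT-2-decoder `VisionEncoderDecoderModel` can be assembled in `transformers`. The base checkpoints below are hypothetical; the original document does not state which ones were used:

```python
from transformers import (
    VisionEncoderDecoderModel,
    AutoTokenizer,
    AutoFeatureExtractor,
)

# Hypothetical base checkpoints, chosen only for illustration.
encoder_ckpt = "microsoft/swin-base-patch4-window7-224"
decoder_ckpt = "gpt2"

# Combine a vision encoder with a language-model decoder; cross-attention
# layers are added to the decoder automatically.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_ckpt, decoder_ckpt
)
tokenizer = AutoTokenizer.from_pretrained(decoder_ckpt)
feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_ckpt)

# GPT-2 defines no pad token, so reuse EOS and wire up generation ids.
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```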
### Training Data

The data is taken from the OleehyO/latex-formulas dataset and was split 80:10:10 into train, validation, and test sets: an 80/20 train/holdout split, followed by a 50/50 split of the holdout into validation and test. The splits were made as follows:
```python
from datasets import load_dataset

dataset = load_dataset("OleehyO/latex-formulas", "cleaned_formulas")

# Hold out 20% of the data, then split it half/half into validation and
# test, giving an 80:10:10 train/val/test split overall.
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]
```
### Evaluation Metrics
The model was evaluated on a test set with the following results:
- Test Loss: 0.09
- Test BLEU Score: 0.69
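As a sketch of how such a BLEU score can be computed (an assumption; the actual evaluation code lives in the training repository linked below), the `evaluate` library can score decoded predictions against reference formulas:

```python
import evaluate

bleu = evaluate.load("bleu")

# predictions: generated LaTeX strings; references: one or more reference
# strings per prediction. Both lists here are illustrative placeholders.
predictions = ["\\frac { a } { b }"]
references = [["\\frac { a } { b }"]]

result = bleu.compute(predictions=predictions, references=references)
print(result["bleu"])
```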
### Training Script

The training script for this model can be found in the following repository: GitHub
## License

This project is licensed under the AGPL-3.0 license.