im2latex Open-Source Model - Free Deployment, Easily Convert Images to LaTeX Formulas!

Im2latex

Developed by DGurgurov

A baseline model based on VisionEncoderDecoderModel, fine-tuned on datasets for generating LaTeX formulas from images.

Image-to-Text

Transformers

Open Source License: MIT #Image to LaTeX #Mathematical Formula Recognition #Swin-GPT Architecture

Downloads 288

Release Time: 7/15/2024

Model Overview

This model can convert images containing mathematical formulas into LaTeX code, suitable for scenarios such as academic document processing and mathematical formula recognition.

Model Features

Hybrid Architecture

Combines a visual encoder (Swin Transformer) and a text decoder (GPT-2) to effectively handle image-to-text conversion tasks.

High-Precision Formula Recognition

Reaches a BLEU score of 0.67 and a loss of 0.10 on the held-out test split.

Distributed Training

Uses PyTorch's Distributed Data Parallel (DDP) for efficient training.

Model Capabilities

Image Recognition

Mathematical Formula Conversion

LaTeX Code Generation

Use Cases

Academic Research

Digitizing Paper Formulas

Convert mathematical formulas from scanned documents or images into editable LaTeX code.

Improves efficiency in academic document processing.

Educational Assistance Tool

Helps students and teachers quickly obtain LaTeX representations of formulas in images.

Facilitates sharing and teaching of mathematical content.

Document Processing

PDF Formula Extraction

Extract formula images from PDF documents and convert them into editable formats.

Simplifies document editing workflows.

🚀 im2latex

This model is a base VisionEncoderDecoderModel fine-tuned on a dataset for generating LaTeX formulas from images.

🚀 Quick Start

You can use the model directly with the transformers library:

from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
import torch
from PIL import Image

# load model, tokenizer, and feature extractor
model = VisionEncoderDecoderModel.from_pretrained("DGurgurov/im2latex")
tokenizer = AutoTokenizer.from_pretrained("DGurgurov/im2latex")
feature_extractor = AutoFeatureExtractor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k") # using the original feature extractor for now

# prepare an image
image = Image.open("path/to/your/image.png").convert("RGB")  # force 3 channels; grayscale or RGBA inputs may not match the feature extractor
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# generate LaTeX formula
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

print("Generated LaTeX formula:", generated_texts[0])
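
The decoder generates free-form text, so nothing guarantees that the output is well-formed LaTeX. As a minimal sketch (the helper `braces_balanced` is hypothetical and not part of the model or the transformers library), a cheap sanity check could confirm that curly braces are balanced before the result is pasted into a document:

```python
def braces_balanced(latex: str) -> bool:
    """Return True if unescaped curly braces in `latex` are balanced."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes (e.g. \{ or \\).
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A closing brace appeared before its opening brace.
                return False
        i += 1
    return depth == 0


print(braces_balanced(r"\frac{a}{b}"))  # True
print(braces_balanced(r"\frac{a}{b"))   # False
```

With the Quick Start code above, this could be called as `braces_balanced(generated_texts[0])`. A False result suggests the formula was truncated or mis-generated; a True result does not prove that the formula compiles.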

✨ Features

This model is a fine-tuned VisionEncoderDecoderModel for generating LaTeX formulas from images. It combines a Swin Transformer encoder with a GPT-2 decoder, is implemented in PyTorch, and was trained with DDP (Distributed Data Parallel).

📦 Installation

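The original README lists no installation steps. The Quick Start code imports transformers, torch, and PIL, so the transformers, torch, and Pillow packages must be installed before running it.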

📚 Documentation

Model Details

Property	Details
Encoder	Swin Transformer
Decoder	GPT-2
Framework	PyTorch
Training	DDP (Distributed Data Parallel)

Training Data

The data comes from the OleehyO/latex-formulas dataset (cleaned_formulas configuration). It was split 80:10:10 into training, validation, and test sets as follows:

from datasets import load_dataset

# dataset name and configuration must be passed as strings
dataset = load_dataset("OleehyO/latex-formulas", "cleaned_formulas")
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]
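
Holding out 20% and then halving that holdout yields 10% each for validation and test, since 0.2 × 0.5 = 0.1. The sketch below reproduces only these proportions, using the standard library on a hypothetical 1,000-row index list. The name `two_stage_split` is illustrative, and the function will not select the same rows or necessarily round the same way as `train_test_split` from the datasets library.

```python
import random


def two_stage_split(n_rows: int, seed: int = 42):
    """Return (train, val, test) index lists in roughly 80:10:10 proportions."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    # Stage 1: hold out 20% of all rows (mirrors test_size=0.2).
    n_holdout = round(n_rows * 0.2)
    holdout, train = indices[:n_holdout], indices[n_holdout:]

    # Stage 2: split the holdout in half (mirrors test_size=0.5).
    n_test = round(len(holdout) * 0.5)
    test, val = holdout[:n_test], holdout[n_test:]
    return train, val, test


train_idx, val_idx, test_idx = two_stage_split(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```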

Evaluation Metrics

The model was evaluated on a test set with the following results:

Test Loss: 0.10
Test BLEU Score: 0.67
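
BLEU measures n-gram overlap between generated and reference LaTeX, scaled by a brevity penalty, and is reported here on a 0 to 1 scale. The source does not say which BLEU implementation or tokenization produced the 0.67 figure, so the following is only a minimal sketch of corpus-level BLEU. It assumes whitespace-separated tokens, one reference per prediction, and no smoothing; `corpus_bleu` and `ngram_counts` are illustrative names.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Unsmoothed corpus-level BLEU with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tok, n)
            ref_counts = ngram_counts(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = [r"\frac { a } { b } + c"]
print(corpus_bleu(reference, [r"\frac { a } { b } + c"]))  # 1.0
print(corpus_bleu(reference, [r"\frac { a } { b } - c"]))  # between 0 and 1
```

Because the score depends heavily on tokenization, numbers from this sketch are not directly comparable to the reported result.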

Training Script

The training script for this model can be found in the following repository: GitHub

Citation

If you use this work in your research, please cite our paper:

@misc{gurgurov2024imagetolatexconvertermathematicalformulas,
      title={Image-to-LaTeX Converter for Mathematical Formulas and Text}, 
      author={Daniil Gurgurov and Aleksey Morshnev},
      year={2024},
      eprint={2408.04015},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.04015}, 
}

📄 License

This project is released under the MIT license.
