# im2latex_model
This model is a `VisionEncoderDecoderModel` trained to generate LaTeX formulas from images. It is part of a project that reproduces the paper at https://arxiv.org/html/2408.04015v1.
## Quick Start

You can use the model directly with the `transformers` library. Here's a code example:
```python
from transformers import VisionEncoderDecoderModel, AutoTokenizer, AutoFeatureExtractor
import torch
from PIL import Image

# Load the model, tokenizer, and feature extractor from the Hub.
model = VisionEncoderDecoderModel.from_pretrained("your-username/your-model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/your-model-name")
feature_extractor = AutoFeatureExtractor.from_pretrained("your-username/your-model-name")

# Preprocess the formula image into pixel values (convert to RGB in case
# the input is grayscale or has an alpha channel).
image = Image.open("path/to/your/image.png").convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generate token ids and decode them into a LaTeX string.
generated_ids = model.generate(pixel_values)
generated_texts = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Generated LaTeX formula:", generated_texts[0])
```
## Features
- This model is a `VisionEncoderDecoderModel` trained to generate LaTeX formulas from images.
- It is part of a project reproducing the paper https://arxiv.org/html/2408.04015v1. In the paper the model is subsequently fine-tuned on handwritten data; this checkpoint is the model before that fine-tuning step.
## Installation

No specific installation steps are provided in the original document.
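Based on the imports in the usage examples (an assumption, not a documented requirement), the following packages should be sufficient:

```bash
pip install transformers torch pillow datasets
```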
## Usage Examples

### Basic Usage
Basic usage is identical to the Quick Start example above.
## Documentation

### Model Details
| Property  | Details          |
|-----------|------------------|
| Encoder   | Swin Transformer |
| Decoder   | GPT-2            |
| Framework | PyTorch          |
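For reference, this is a minimal sketch of how a Swin-encoder/GPT-2-decoder `VisionEncoderDecoderModel` can be assembled in `transformers`. The base checkpoints below are hypothetical; the original document does not state which ones were used:

```python
from transformers import (
    VisionEncoderDecoderModel,
    AutoTokenizer,
    AutoFeatureExtractor,
)

# Hypothetical base checkpoints, chosen only for illustration.
encoder_ckpt = "microsoft/swin-base-patch4-window7-224"
decoder_ckpt = "gpt2"

# Combine a vision encoder with a language-model decoder; cross-attention
# layers are added to the decoder automatically.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_ckpt, decoder_ckpt
)
tokenizer = AutoTokenizer.from_pretrained(decoder_ckpt)
feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_ckpt)

# GPT-2 defines no pad token, so reuse EOS and wire up generation ids.
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```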
### Training Data

The data is taken from the OleehyO/latex-formulas dataset and was split 80:10:10 into train, validation, and test sets: an 80/20 train/holdout split, followed by a 50/50 split of the holdout into validation and test. The splits were made as follows:
```python
from datasets import load_dataset

dataset = load_dataset("OleehyO/latex-formulas", "cleaned_formulas")

# Hold out 20% of the data, then split it half/half into validation and
# test, giving an 80:10:10 train/val/test split overall.
train_val_split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds = train_val_split["train"]
val_test_split = train_val_split["test"].train_test_split(test_size=0.5, seed=42)
val_ds = val_test_split["train"]
test_ds = val_test_split["test"]
```
### Evaluation Metrics
The model was evaluated on a test set with the following results:
- Test Loss: 0.09
- Test BLEU Score: 0.69
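As a sketch of how such a BLEU score can be computed (an assumption; the actual evaluation code lives in the training repository linked below), the `evaluate` library can score decoded predictions against reference formulas:

```python
import evaluate

bleu = evaluate.load("bleu")

# predictions: generated LaTeX strings; references: one or more reference
# strings per prediction. Both lists here are illustrative placeholders.
predictions = ["\\frac { a } { b }"]
references = [["\\frac { a } { b }"]]

result = bleu.compute(predictions=predictions, references=references)
print(result["bleu"])
```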
### Training Script

The training script for this model can be found in the following repository: GitHub
## License

This project is licensed under the AGPL-3.0 license.