nougat-latex-base Open-source Model - Free Conversion of Images to LaTeX Code, Precise Recognition of Mathematical Formulas

Nougat Latex Base

Developed by Norm

This LaTeX OCR model is fine-tuned from Nougat-base to generate LaTeX code from images, and is optimized in particular for recognizing images of mathematical formulas.

Image-to-Text

Transformers

English

Open Source License: Apache-2.0

#LaTeX formula recognition #Mathematical formula OCR #High-precision formula conversion

Downloads 8,523

Release Time : 10/8/2023

Model Overview

This Nougat-based LaTeX model improves the quality of LaTeX code generated from images by adjusting the input resolution and applying adaptive padding, which makes it particularly well suited to recognizing images of mathematical formulas.

Model Features

Optimized input resolution

Adjusts the input resolution and applies adaptive padding to reduce rescaling artifacts and improve the quality of the generated LaTeX code.

High-performance LaTeX generation

Outperforms the similar model pix2tex in terms of token accuracy and normalized edit distance.

Special optimization for mathematical formulas

Specifically optimized for mathematical formula image segments, suitable for academic and technical document processing.

Model Capabilities

Image-to-LaTeX code conversion

Mathematical formula recognition

Academic document processing

Use Cases

Academic research

Thesis formula extraction

Extract the LaTeX code of mathematical formulas from academic thesis images.

Token accuracy 62.38%, normalized edit distance 0.0618

Education

Teaching material processing

Convert handwritten or printed mathematical formulas into editable LaTeX format.

🚀 Nougat-LaTeX-based

Nougat-LaTeX-based is a model fine-tuned from facebook/nougat-base, which enhances the ability to generate LaTeX code from images.

🚀 Quick Start

If you want to use the Nougat-LaTeX-based model, follow these steps:

Download the repo

git clone git@github.com:NormXU/nougat-latex-ocr.git
cd ./nougat-latex-ocr

Inference

import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel
from transformers.models.nougat import NougatTokenizerFast
from nougat_latex import NougatLaTexProcessor

model_name = "Norm/nougat-latex-base"
device = "cuda" if torch.cuda.is_available() else "cpu"
# init model
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)

# init processor
tokenizer = NougatTokenizerFast.from_pretrained(model_name)

latex_processor = NougatLaTexProcessor.from_pretrained(model_name)

# run test
image = Image.open("path/to/latex/image.png")
if image.mode != "RGB":
    image = image.convert('RGB')

pixel_values = latex_processor(image, return_tensors="pt").pixel_values

decoder_input_ids = tokenizer(tokenizer.bos_token, add_special_tokens=False,
                              return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=True,
        num_beams=5,
        bad_words_ids=[[tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
sequence = tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(tokenizer.eos_token, "").replace(tokenizer.pad_token, "").replace(tokenizer.bos_token, "")
print(sequence)
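
The cleanup on the last lines of the inference code can be factored into a small helper. The sketch below is illustrative only: `strip_special_tokens` is a hypothetical name, the token strings in the usage lines are placeholders rather than the values the tokenizer actually defines, and the final whitespace trim is an addition that the original code does not perform.

```python
def strip_special_tokens(sequence, special_tokens):
    """Remove every occurrence of each special token, then trim whitespace.

    `special_tokens` is any iterable of strings. Empty or None entries are
    skipped, since a tokenizer may leave some special tokens undefined.
    """
    for token in special_tokens:
        if token:
            sequence = sequence.replace(token, "")
    return sequence.strip()


# Placeholder token strings, assumed for illustration only.
raw = "<s>\\frac{a}{b}</s><pad><pad>"
cleaned = strip_special_tokens(raw, ["<s>", "</s>", "<pad>", None])
print(cleaned)
```

With the inference code above, the call would be `strip_special_tokens(sequence, [tokenizer.eos_token, tokenizer.pad_token, tokenizer.bos_token])`.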

✨ Features

Nougat-LaTeX-based is fine-tuned from facebook/nougat-base on im2latex-100k to boost its proficiency in generating LaTeX code from images. The original encoder input image size of Nougat is unsuitable for equation image segments and can introduce rescaling artifacts that degrade the quality of the generated LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach so that equation image segments in the wild are resized to closely match the resolution of the training data.
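
To make the idea of resizing with padding concrete, here is a minimal sketch of the geometry involved: scale the image to fit a fixed target size without changing its aspect ratio, then pad the remaining area. This is not the repository's implementation. The function name `plan_resize_and_pad`, the centered placement, and the target size used in the example are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class PadPlan(NamedTuple):
    new_width: int    # width after aspect-preserving resize
    new_height: int   # height after aspect-preserving resize
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def plan_resize_and_pad(src_size, target_size):
    """Compute resize dimensions and padding for a (width, height) image.

    The image is scaled by a single factor so that it fits inside
    `target_size`, and the leftover space is split between both sides
    (centered placement is an assumption of this sketch).
    """
    src_w, src_h = src_size
    target_w, target_h = target_size
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(target_w / src_w, target_h / src_h)
    new_w = min(target_w, max(1, round(src_w * scale)))
    new_h = min(target_h, max(1, round(src_h * scale)))

    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    pad_right = target_w - new_w - pad_left
    pad_bottom = target_h - new_h - pad_top
    return PadPlan(new_w, new_h, pad_left, pad_top, pad_right, pad_bottom)


# Hypothetical sizes: a wide 1120x100 equation crop and a 560x224 target.
plan = plan_resize_and_pad((1120, 100), (560, 224))
print(plan)
```

The resulting sizes and offsets could then be applied with an image library, for example Pillow's `Image.resize` followed by `Image.paste` onto a blank canvas of the target size.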

📦 Installation

pip install "transformers>=4.34.0"

🔧 Technical Details

Evaluation

Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and im2latex-100k, curated by lukas-blecher

Model	Token Acc ↑	Normed Edit Distance ↓
pix2tex	0.5346	0.10312
pix2tex*	0.60	0.10
nougat-latex-based	0.623850	0.06180

pix2tex is a ResNet + ViT + Text Decoder architecture introduced in LaTeX-OCR.

pix2tex*: scores reported by LaTeX-OCR; pix2tex: my evaluation with the released checkpoint; nougat-latex-based: evaluated on results generated with a beam-search strategy.
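
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them. The exact definitions used for the scores above are not stated here, so two choices in this code are assumptions: token accuracy is taken as position-wise matches divided by the longer sequence length, and the edit distance is normalized by the longer sequence length.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


def token_accuracy(pred_tokens, ref_tokens):
    """Fraction of positions where prediction and reference tokens agree."""
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / longest


ref = ["\\frac", "{", "a", "}", "{", "b", "}"]
pred = ["\\frac", "{", "a", "}", "{", "c", "}"]
print(token_accuracy(pred, ref))
print(normalized_edit_distance(pred, ref))
```

Higher token accuracy and lower normalized edit distance are better, which matches the arrows in the table header.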

📄 License

This project is licensed under the Apache-2.0 license.

⚠️ Important Note

The inference API widget sometimes cuts the response short. Please check this issue for more details. If your results come back truncated, run the model yourself instead.

Property	Details
Model Type	Donut
Finetuned from	facebook/nougat-base
Repository	source code
