nougat-for-formula Open-source Model - Free Deployment, Precise Extraction of LaTeX Formula Codes from Images

Nougat For Formula

Developed by CuiSiwei

A fine-tuned mathematical formula recognition model based on Nougat-small, excelling in extracting LaTeX formula code from images

Image-to-Text

Transformers

Open Source License: Apache-2.0 #Mathematical Formula Recognition #LaTeX Generation #Academic Document Processing

Downloads 40

Release Time : 1/12/2024

Model Overview

This model takes images of black formulas on white backgrounds as input and outputs precise LaTeX formula code, making it particularly suitable for scientific document processing

Model Features

Accurate Formula Recognition

Optimized specifically for mathematical formulas, capable of accurately recognizing complex formula structures

LaTeX Output

Directly generates editable LaTeX code, facilitating academic writing

Scientific Document Optimization

Specially trained for unique symbols and formats in scientific documents

Model Capabilities

Mathematical formula recognition in images

LaTeX code generation

Scientific document processing

Table recognition

Use Cases

Academic Writing

Note Formula Conversion

Quickly convert handwritten or printed mathematical formulas into LaTeX code

Improves academic writing efficiency

Paper Formula Extraction

Extract formula code from PDF papers

Facilitates formula reuse and modification

Educational Technology

Online Learning Tools

Integrated into educational platforms to enable automatic formula recognition

Enhances learning experience

🚀 Nougat for formula

A fine-tuned model based on the small version of Nougat, specialized in recognizing formulas in images and converting them into LaTeX code.

🚀 Quick Start

The following demo shows how to input an image into the model and generate LaTeX/Markdown formula code.

from transformers import NougatProcessor, VisionEncoderDecoderModel
from PIL import Image

max_length = 100  # maximum number of tokens in the generated output
processor = NougatProcessor.from_pretrained(r".", max_length=max_length)  # Replace with your path
model = VisionEncoderDecoderModel.from_pretrained(r".")  # Replace with your path

image = Image.open(r"image_path").convert("RGB")  # Replace with your path; RGB avoids issues with RGBA or grayscale files
pixel_values = processor(image, return_tensors="pt").pixel_values  # The processor resizes the image to the size the model expects

# Generate the output token ids; the unknown token is banned from the output
result_tensor = model.generate(
    pixel_values,
    max_length=max_length,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

result = processor.batch_decode(result_tensor, skip_special_tokens=True)  # Decode the token ids into text
result = processor.post_process_generation(result, fix_markdown=False)

print(*result)

✨ Features

Accurate Formula Identification: Nougat for formula takes an image of a formula written in black on a white background and returns the LaTeX code for that formula.
Potential Substitute: It can convert complicated formulas to LaTeX code and may be able to replace other tools built for the same task.
Customizable Fine-Tuning: You can continue fine-tuning the model to improve its recognition of formulas from specific subjects.

💻 Usage Examples

Basic Usage

The code in the "Quick Start" section demonstrates the basic usage of inputting an image into the model and getting the corresponding LaTeX code.

Advanced Usage

You can continue fine-tuning the model to improve its recognition of formulas from certain subjects. For example, you can fine-tune it on additional data from a specific scientific field.

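# Sketch of continued fine-tuning; new_dataset is a placeholder you must supply
import torch
from torch.utils.data import DataLoader
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained(r".")  # Replace with your path
model = VisionEncoderDecoderModel.from_pretrained(r".")  # Replace with your path

# Prepare new data. Assumption: each item is a dict holding a PIL image under
# 'image' and its LaTeX string under 'formula'
new_dataset = ...

def collate(items):
    # Preprocess the images and tokenize the formulas of one batch
    images = [item['image'].convert("RGB") for item in items]
    pixel_values = processor(images, return_tensors="pt").pixel_values
    formulas = [item['formula'] for item in items]
    labels = processor.tokenizer(formulas, padding=True, return_tensors="pt").input_ids
    labels[labels == processor.tokenizer.pad_token_id] = -100  # ignore padding in the loss
    return pixel_values, labels

dataloader = DataLoader(new_dataset, batch_size=4, shuffle=True, collate_fn=collate)

# Define optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

num_epochs = 3  # example value, not taken from the original training run

# Training loop
model.train()
for epoch in range(num_epochs):
    for pixel_values, labels in dataloader:
        outputs = model(pixel_values=pixel_values, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()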

📚 Documentation

Model Details

Model Description

Nougat for formula recognizes formulas in images. It takes an image of a formula written in black on a white background as input and returns the LaTeX code for the formula.

The Nougat model (Neural Optical Understanding for Academic Documents) was proposed by Meta AI in August 2023 as a visual Transformer model for processing scientific documents. It converts PDF documents into a markup language and recognizes mathematical expressions and tables particularly well. The goal of the model is to improve the accessibility of scientific knowledge by bridging human-readable documents and machine-readable text.
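Because the model expects dark formulas on a white background, it may help to normalize screenshots before passing them to the processor. The sketch below is one possible approach and is not part of the original model card: the helper name `prepare_formula_image` and the brightness threshold of 128 are assumptions chosen for illustration. It flattens transparency onto white and inverts images that appear to have a dark background.

```python
from PIL import Image, ImageOps, ImageStat


def prepare_formula_image(img: Image.Image) -> Image.Image:
    """Return an RGB copy with a light background and dark strokes."""
    if img.mode in ("RGBA", "LA"):
        # Flatten transparency onto white so transparent pixels do not turn black.
        rgba = img.convert("RGBA")
        background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
        img = Image.alpha_composite(background, rgba)
    rgb = img.convert("RGB")
    # Hypothetical heuristic: a mean brightness below mid-gray suggests a dark
    # background (for example a dark-mode screenshot), so invert the colors.
    if ImageStat.Stat(rgb.convert("L")).mean[0] < 128:
        rgb = ImageOps.invert(rgb)
    return rgb
```

The returned image can be passed to the processor in place of the raw file. The threshold is a rough guess and may need adjusting for images with large dark regions.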

Model type: Vision Encoder Decoder
Finetuned from model: Nougat model, small version

Uses

Nougat for formula can be used as a tool for converting complicated formulas to LaTeX code. It has the potential to be a good substitute for other tools.

For example, when you are taking notes and are tired of typing long LaTeX/Markdown formula code, just take a screenshot of the formula and pass it to Nougat for formula. You then get the code for the formula, as long as it does not exceed the maximum output length you set for the model.

You can also continue fine-tuning the model to improve its recognition of formulas from certain subjects.

Nougat for formula may also be useful when developing tools or apps that generate LaTeX code.
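When a formula is longer than the maximum output length, generation stops early and the returned code is incomplete. A minimal sketch of one way to detect this, assuming the generated sequence ends with the tokenizer's end-of-sequence token whenever it finished normally; the helper name `looks_truncated` is hypothetical:

```python
from typing import Sequence


def looks_truncated(token_ids: Sequence[int], eos_token_id: int) -> bool:
    """Return True when a generated sequence never reached the end-of-sequence token."""
    # A sequence that stopped because it hit max_length contains no EOS token.
    return eos_token_id not in token_ids
```

With the Quick Start code, this could be called as `looks_truncated(result_tensor[0].tolist(), processor.tokenizer.eos_token_id)`. If it returns True, raising `max_length` and generating again is a reasonable next step.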

🔧 Technical Details

Training Details

Training Data

IM2LATEX-100K

Preprocessing

The preprocessing of X (image) is shown in the Quick Start demo above.

The preprocessing of Y (formula) consists of two steps:

Remove the spaces from the formula string.
Tokenize the string with the processor.
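The first step above can be sketched as follows. This is an illustration rather than the author's actual preprocessing script, and the function name `strip_spaces` is hypothetical. The tokenization step appears only as a comment because it needs the loaded processor, and the exact call is an assumption.

```python
def strip_spaces(formula: str) -> str:
    """Remove every whitespace character from a LaTeX formula string."""
    return "".join(formula.split())


# Assumed second step, using the processor from the Quick Start section:
# label_ids = processor.tokenizer(strip_spaces(formula), return_tensors="pt").input_ids
```

For instance, `strip_spaces(r"\frac { a } { b }")` gives `\frac{a}{b}`. Removing spaces also joins a command to a letter that follows it, so `\alpha x` becomes `\alphax`; output in that form may need manual spacing before it compiles.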

Training Hyperparameters

Training regime: torch.optim.AdamW(model.parameters(), lr=1e-4)

Evaluation

Testing Data, Factors & Metrics

Testing Data

The testing data is also taken from IM2LATEX-100K. Note that the dataset is distributed already split into train, validation, and test sets.

Metrics

BLEU and CER.

Results

On the test data, the BLEU score is 0.8157 and the CER is 0.1601.
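For readers unfamiliar with CER (character error rate), the sketch below shows one common definition: the Levenshtein edit distance between the predicted and reference strings, divided by the reference length. The model card does not say which implementation produced the reported score, so this is only an illustration and may differ in detail from the evaluation actually used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate of a prediction against a non-empty reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(prediction, reference) / len(reference)
```

A lower CER is better, and 0 means the prediction matches the reference character for character.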

📄 License

This model is licensed under the Apache-2.0 license.
