đ Nougat-LaTeX-based
Nougat-LaTeX-based is a model fine-tuned from facebook/nougat-base, which enhances the ability to generate LaTeX code from images.
đ Quick Start
If you want to use the Nougat-LaTeX-based model, follow these steps:
- Download the repo
git clone git@github.com:NormXU/nougat-latex-ocr.git
cd ./nougat-latex-ocr
- Inference
import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel
from transformers.models.nougat import NougatTokenizerFast
from nougat_latex import NougatLaTexProcessor
model_name = "Norm/nougat-latex-base"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
tokenizer = NougatTokenizerFast.from_pretrained(model_name)
latex_processor = NougatLaTexProcessor.from_pretrained(model_name)
image = Image.open("path/to/latex/image.png")
if not image.mode == "RGB":
image = image.convert('RGB')
pixel_values = latex_processor(image, return_tensors="pt").pixel_values
decoder_input_ids = tokenizer(tokenizer.bos_token, add_special_tokens=False,
return_tensors="pt").input_ids
with torch.no_grad():
outputs = model.generate(
pixel_values.to(device),
decoder_input_ids=decoder_input_ids.to(device),
max_length=model.decoder.config.max_length,
early_stopping=True,
pad_token_id=tokenizer.pad_token_id,
eos_token_id=tokenizer.eos_token_id,
use_cache=True,
num_beams=5,
bad_words_ids=[[tokenizer.unk_token_id]],
return_dict_in_generate=True,
)
sequence = tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(tokenizer.eos_token, "").replace(tokenizer.pad_token, "").replace(tokenizer.bos_token, "")
print(sequence)
⨠Features
Nougat-LaTeX-based is fine-tuned from facebook/nougat-base with im2latex-100k to boost its proficiency in generating LaTeX code from images. Since the initial encoder input image size of nougat was unsuitable for equation image segments, leading to potential rescaling artifacts that degrades the generation quality of LaTeX code. To address this, Nougat-LaTeX-based adjusts the input resolution and uses an adaptive padding approach to ensure that equation image segments in the wild are resized to closely match the resolution of the training data.
đĻ Installation
pip install transformers >= 4.34.0
đ§ Technical Details
Evaluation
Evaluated on an image-equation pair dataset collected from Wikipedia, arXiv, and im2latex-100k, curated by lukas-blecher
Model |
Token Acc â |
Normed Edit Distance â |
pix2tex |
0.5346 |
0.10312 |
pix2tex* |
0.60 |
0.10 |
nougat-latex-based |
0.623850 |
0.06180 |
pix2tex is a ResNet + ViT + Text Decoder architecture introduced in LaTeX-OCR.
pix2tex*: reported from LaTeX-OCR; pix2tex: my evaluation with the released checkpoint ; nougat-latex-based: evaluated on results generated with beam-search strategy.
đ License
This project is licensed under the Apache-2.0 license.
â ī¸ Important Note
The inference API widget sometimes cuts the response short. Please check this issue for more details. You may want to run the model yourself in case the inference API bug cuts the results short.