nougat-latex-base open-source model - convert images to LaTeX code for free, with accurate recognition of mathematical formulas


Nougat Latex Base

Developed by Norm

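This model is a LaTeX OCR model fine-tuned from Nougat-base. It generates LaTeX code from images and is specifically optimized for recognizing images of mathematical formulas.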

Image-to-Text

Transformers

Language: English   License: Apache-2.0   Tags: #LaTeX-formula-recognition #math-formula-OCR #high-accuracy-formula-conversion

Downloads: 8,523

Released: 10/8/2023

Model Overview

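This Nougat-based LaTeX model improves the quality of the LaTeX code it generates from images by adjusting the input resolution and using an adaptive padding method. It is especially suited to recognizing images of mathematical formulas.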

Model Features

Optimized Input Resolution

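Adjusts the input resolution and uses an adaptive padding method to reduce rescaling artifacts and improve the quality of the generated LaTeX code.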
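The card does not spell out how the adaptive padding works. Below is a minimal sketch of one plausible approach, assuming the goal is to fit a formula snippet into a fixed encoder input size without distorting it: scale by the largest factor that still fits (optionally never upscaling), then center the result with padding. The function name `fit_with_padding` and the 560 x 224 target size in the example are hypothetical and are not taken from `NougatLaTexProcessor`.

```python
def fit_with_padding(src_w, src_h, target_w, target_h, allow_upscale=False):
    """Fit a src_w x src_h image into target_w x target_h keeping its aspect ratio.

    Returns ((new_w, new_h), (left, top, right, bottom)): the resized size and
    the padding needed on each side to reach exactly target_w x target_h.
    """
    # Largest uniform scale that keeps both sides inside the target box.
    scale = min(target_w / src_w, target_h / src_h)
    if not allow_upscale:
        # Keep small formula snippets at native size instead of enlarging them.
        scale = min(scale, 1.0)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    # Split the leftover space so the content is centered.
    pad_w = target_w - new_w
    pad_h = target_h - new_h
    left = pad_w // 2
    top = pad_h // 2
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)


# A wide equation is shrunk to the target width and padded vertically.
print(fit_with_padding(1200, 100, 560, 224))
# A small snippet is left at its native size and padded on all sides.
print(fit_with_padding(200, 50, 560, 224))
```

The returned size can be passed to PIL's `Image.resize`, and the padded input built by pasting the resized image onto a white `Image.new` canvas at `(left, top)`. Not upscaling small snippets is one way to avoid the rescaling artifacts described above; the actual processor may choose sizes differently.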

High-Performance LaTeX Generation

Outperforms the comparable model pix2tex on token accuracy and normalized edit distance.
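The card does not state whether its normalized edit distance is computed over characters or tokens, or how it is normalized. A minimal sketch, assuming Levenshtein distance divided by the length of the longer sequence (so 0 means identical and lower is better):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions that turn a into b.

    Works on strings (character level) or on lists of tokens.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer sequence length (0.0 means identical)."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 0.0
    return levenshtein(pred, ref) / longest


# One substitution over 11 characters.
print(normalized_edit_distance(r"\frac{a}{b}", r"\frac{a}{c}"))
```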

Specialized for Mathematical Formulas

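Optimized specifically for image snippets of mathematical formulas, making it well suited to processing academic and technical documents.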

Model Capabilities

Image-to-LaTeX conversion

Mathematical formula recognition

Academic document processing

Use Cases

Academic Research

Formula extraction from papers

Extract LaTeX code for mathematical formulas from images of academic papers.

Token accuracy: 62.38%; normalized edit distance: 0.0618
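The card does not define token accuracy either. As an illustration only, assuming the common position-wise definition over tokenized LaTeX (the token lists below are hand-written and hypothetical, not the model tokenizer's output), it could be computed like this:

```python
def token_accuracy(pred_tokens, ref_tokens):
    """Fraction of positions where the predicted token equals the reference token.

    Positions present in only one sequence count as misses, so the
    denominator is the longer length.
    """
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0
    hits = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return hits / longest


ref = ["x", "^", "{", "2", "}"]
print(token_accuracy(["x", "^", "{", "3", "}"], ref))       # one wrong token
print(token_accuracy(["{", "x", "^", "{", "2", "}"], ref))  # one extra leading token
```

Under this strict position-wise definition, a single extra leading token drops accuracy to 0 even though the edit distance over the same tokens is only 1 insertion out of 6, so the two metrics can diverge widely. The evaluation code behind the reported numbers may define them differently.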

Education

Teaching material processing

Convert handwritten or printed mathematical formulas into editable LaTeX.

🚀 Nougat-LaTeX-based Model

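Nougat-LaTeX-based is fine-tuned from facebook/nougat-base on the im2latex-100k dataset to improve its ability to generate LaTeX code from images. The original Nougat model's input image size is ill-suited to equation image snippets, which causes rescaling artifacts; this model addresses that problem and thereby improves the quality of the generated LaTeX code.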

🚀 Quick Start

Install dependencies

pip install "transformers>=4.34.0" torch pillow

Steps

Clone the repository

git clone git@github.com:NormXU/nougat-latex-ocr.git
cd ./nougat-latex-ocr

Run inference (from the repository root, so that nougat_latex can be imported)

import torch
from PIL import Image
from transformers import VisionEncoderDecoderModel
from transformers.models.nougat import NougatTokenizerFast
from nougat_latex import NougatLaTexProcessor

model_name = "Norm/nougat-latex-base"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Initialize the model
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)

# Initialize the tokenizer and the image processor
tokenizer = NougatTokenizerFast.from_pretrained(model_name)

latex_processor = NougatLaTexProcessor.from_pretrained(model_name)

# Run a test: load an image and make sure it is RGB
image = Image.open("path/to/latex/image.png")
if image.mode != "RGB":
    image = image.convert("RGB")

# Preprocess the image into the encoder's input tensor
pixel_values = latex_processor(image, return_tensors="pt").pixel_values

# Start decoding from the BOS token only
decoder_input_ids = tokenizer(tokenizer.bos_token, add_special_tokens=False,
                              return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_length,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=True,
        num_beams=5,
        bad_words_ids=[[tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
# Decode the best beam and strip the special tokens from the text
sequence = tokenizer.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(tokenizer.eos_token, "").replace(tokenizer.pad_token, "").replace(tokenizer.bos_token, "")
print(sequence)
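The inference script converts non-RGB images with PIL's `Image.convert`, which simply drops the alpha channel. Fully transparent pixels usually store black RGB values, so a formula PNG with a transparent background can turn into black text on a black background. A minimal sketch, assuming your inputs may be transparent PNGs, that composites them onto white first (the helper name `to_rgb_on_white` is hypothetical and not part of the nougat-latex-ocr repository):

```python
from PIL import Image


def to_rgb_on_white(image):
    """Convert a PIL image to RGB, filling transparent areas with white."""
    has_alpha = image.mode in ("RGBA", "LA") or (
        image.mode == "P" and "transparency" in image.info
    )
    if has_alpha:
        rgba = image.convert("RGBA")
        # Opaque white canvas of the same size; alpha_composite needs two RGBA images.
        background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
        return Image.alpha_composite(background, rgba).convert("RGB")
    # No transparency: a plain conversion is enough.
    return image.convert("RGB")
```

In the script, the load-and-convert step would then become `image = to_rgb_on_white(Image.open("path/to/latex/image.png"))`.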