Bpe-vocab-n-OCR Open-Source Text Extraction Tool - Free to Deploy, Generates Structured Tokenized Output

Bpe Vocab N OCR

Developed by prithivMLmods

Bpe-vocab-n-OCR is an advanced OCR-based text extraction tool optimized for producing structured, tokenized output.

Image-to-Text

Transformers

Supports multiple languages

License: Apache-2.0

#StructuredOCR #MultilingualTokenization #ImageToText

Downloads: 76

Released: 2/18/2025

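Model Overview

The tool is built on a vision-language architecture with enhanced OCR and multilingual support. It is designed to extract text from images accurately and return it as a comma-separated sequence.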

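Model Features

Advanced OCR Engine

Fine-tuned on large datasets for accurate text recognition and tokenization.

Optimized Tokenized Output

Produces structured, comma-separated text suited to downstream NLP tasks, automation pipelines, and database integration.

Enhanced Multilingual OCR Support

Extracts text in multiple languages, including English, Chinese, Japanese, Korean, and Arabic, among others.

Multimodal Processing

Accepts image and text inputs together and returns structured, tokenized output.

Secure and Optimized Model Weights

Uses safetensors for efficient, safe model loading.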
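
The comma-separated output described above still has to be split into tokens before a downstream task can use it. The following is a minimal sketch of that step, not part of the model's own tooling. It assumes the model may emit ASCII, full-width, ideographic, or Arabic commas depending on the language of the image, which the model card does not specify; `parse_tokens` is a hypothetical helper name.

```python
import re

# Comma variants that may appear in multilingual OCR output (an assumption,
# not something the model card specifies): ASCII ",", full-width "，",
# ideographic "、", and Arabic "،".
_COMMA_PATTERN = re.compile(r"[,\uFF0C\u3001\u060C]")


def parse_tokens(ocr_output: str) -> list[str]:
    """Split comma-separated OCR output into a list of non-empty tokens."""
    parts = _COMMA_PATTERN.split(ocr_output)
    # Strip surrounding whitespace and drop empty fields left by ",," or
    # by a trailing comma.
    return [part.strip() for part in parts if part.strip()]


tokens = parse_tokens("Invoice, No, 2025, Total,, 76 ")
print(tokens)  # ['Invoice', 'No', '2025', 'Total', '76']
```

Note that a plain split like this also breaks up values that legitimately contain a comma, such as "1,000", so numeric fields may need extra handling.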

Model Capabilities

Text extraction

Image analysis

Multilingual support

Structured output

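Use Cases

Automation Pipelines

Document Processing

Extracts text from scanned documents and generates structured data.

Improves document-processing efficiency and reduces manual intervention.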
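
For a folder of scanned pages, one possible approach is to build one conversation per image in the same message format that the Quick Start code uses. The sketch below only constructs those message dictionaries. `build_messages` and the file paths are hypothetical, and whether the resulting batch can be passed directly to `process_vision_info` and the processor should be checked against the `qwen_vl_utils` documentation.

```python
# Prompt text copied from the Quick Start example.
DEFAULT_PROMPT = (
    "Extract and return the tokenized OCR text from the image, ensuring each "
    "word is accurately recognized and separated by commas."
)


def build_messages(image_refs, prompt=DEFAULT_PROMPT):
    """Return one single-turn conversation per image reference (URL or path)."""
    return [
        [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": ref},
                    {"type": "text", "text": prompt},
                ],
            }
        ]
        for ref in image_refs
    ]


# Hypothetical scan paths, for illustration only.
batch = build_messages(["scans/page_001.png", "scans/page_002.png"])
print(len(batch))  # 2
```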

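Database Integration

Data Entry

Converts text in images into structured data and enters it into a database.

Simplifies data entry and improves accuracy.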
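
As an illustration of the data-entry use case, the sketch below stores a comma-separated OCR result in SQLite, one row per token. The table name `ocr_tokens`, its columns, the helper `store_tokens`, and the sample string are all assumptions made for this example; the model card does not prescribe a schema.

```python
import sqlite3


def store_tokens(conn, source, ocr_output):
    """Insert one row per token, keeping each token's position in the sequence."""
    tokens = [t.strip() for t in ocr_output.split(",") if t.strip()]
    conn.executemany(
        "INSERT INTO ocr_tokens (source, position, token) VALUES (?, ?, ?)",
        [(source, i, tok) for i, tok in enumerate(tokens)],
    )
    conn.commit()
    return len(tokens)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE ocr_tokens (source TEXT, position INTEGER, token TEXT)"
)
inserted = store_tokens(conn, "receipt_01.png", "Total, 76, USD")
rows = conn.execute(
    "SELECT position, token FROM ocr_tokens ORDER BY position"
).fetchall()
print(inserted, rows)  # 3 [(0, 'Total'), (1, '76'), (2, 'USD')]
```

Storing the position alongside each token keeps the original reading order recoverable, and the parameterized query avoids building SQL from OCR text directly.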

🚀 Bpe-vocab-n-OCR

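Bpe-vocab-n-OCR is an advanced OCR-based text extraction tool optimized for producing structured, tokenized output. It is built on a vision-language architecture (the code below loads it with Qwen2VLForConditionalGeneration) with enhanced OCR and multilingual support, and it returns the text extracted from an image as a comma-separated sequence.

🚀 Quick Start

Code Example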

import torch  # only needed if the bfloat16 variant below is enabled
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the Bpe-vocab-n-OCR model with optimized parameters
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Tokenized-OCR", torch_dtype="auto", device_map="auto"
)

# Recommended acceleration for performance optimization:
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "prithivMLmods/Tokenized-OCR",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# Load the default processor for Bpe-vocab-n-OCR
processor = AutoProcessor.from_pretrained("prithivMLmods/Tokenized-OCR")

# Define the input messages with both an image and a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://flux-generated.com/sample_image.jpeg",
            },
            {"type": "text", "text": "Extract and return the tokenized OCR text from the image, ensuring each word is accurately recognized and separated by commas."},
        ],
    }
]

# Prepare the input for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device chosen by device_map="auto" instead of hard-coding "cuda"
inputs = inputs.to(model.device)

# Generate the output
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
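
The trimming step in the code above relies on the generated sequence starting with the prompt tokens, so slicing each output at the prompt length leaves only the newly generated tokens. The sketch below reproduces that slicing with plain Python lists and made-up token IDs, so it runs without loading the model; `trim_prompt_tokens` is a hypothetical helper name.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(input_ids, generated_ids)
    ]


# Made-up token IDs: the prompt is 3 tokens long, the model appended 2 more.
prompt_ids = [[101, 7, 8]]
full_output_ids = [[101, 7, 8, 42, 43]]
print(trim_prompt_tokens(prompt_ids, full_output_ids))  # [[42, 43]]
```

Without this step, decoding would return the prompt text followed by the OCR result rather than the OCR result alone.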