Bpe-vocab-n-OCR: Open-Source Text Extraction Tool, Free to Deploy, for Structured Tokenized Output

Bpe Vocab N OCR

Developed by prithivMLmods

Bpe-vocab-n-OCR is an advanced text extraction tool based on OCR, optimized for generating structured and tokenized output.

Image-to-Text

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Structured OCR #Multilingual Tokenization #Image to Text

Downloads 76

Release Time: 2/18/2025

Model Overview

This tool is built on the Qwen2-VL vision-language architecture (base model: prithivMLmods/Qwen2-VL-OCR-2B-Instruct) with enhanced OCR and multilingual support. It extracts text from images and returns it as a comma-separated sequence.
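Because the result arrives as one comma-separated string, a typical first post-processing step is splitting it back into tokens. A minimal sketch, assuming the output uses plain ASCII commas with optional surrounding spaces and no escaping (under this assumption a comma that is itself part of the recognized text cannot be told apart from a separator). `parse_ocr_tokens` is a hypothetical helper, not part of the model or any library:

```python
def parse_ocr_tokens(output: str) -> list[str]:
    """Split a comma-separated OCR string into a list of tokens.

    Assumption: ASCII commas are the only separators. Whitespace around
    each token is stripped and empty entries (e.g. from ",,") are dropped.
    """
    return [token.strip() for token in output.split(",") if token.strip()]


if __name__ == "__main__":
    sample = "Invoice, No., 1024, Total, $99.50"  # hypothetical model output
    print(parse_ocr_tokens(sample))
    # → ['Invoice', 'No.', '1024', 'Total', '$99.50']
```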

Model Features

Advanced OCR Engine

Fine-tuned on extensive datasets to ensure precise text recognition and tokenization.

Optimized Tokenized Output

Generates structured, comma-separated text, ideal for downstream NLP tasks, automation workflows, and database integration.
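As one illustration of a downstream NLP step, the sketch below counts token frequencies in the comma-separated output with Python's `collections.Counter`. It assumes ASCII commas as separators; the helper name `token_frequencies` and the optional case folding are illustrative choices, not something the model provides:

```python
from collections import Counter


def token_frequencies(output: str, casefold: bool = True) -> Counter:
    """Count how often each token appears in comma-separated OCR output.

    Hypothetical helper; assumes ASCII commas separate tokens.
    """
    tokens = [t.strip() for t in output.split(",") if t.strip()]
    if casefold:
        # Treat "Total" and "TOTAL" as the same token.
        tokens = [t.casefold() for t in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    counts = token_frequencies("Total, Tax, TOTAL, Subtotal, tax")
    print(counts.most_common(2))
```

Case folding is on by default because OCR of headers and labels often mixes capitalization; pass `casefold=False` when case carries meaning.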

Enhanced Multilingual OCR Support

Supports text extraction in multiple languages, including English, Chinese, Japanese, Korean, Arabic, and more.
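The card does not say which comma character the model emits for scripts that have their own punctuation. If Chinese, Japanese, or Arabic output uses native commas, a plain `split(",")` would miss them. A hedged sketch that treats the ASCII comma, the full-width comma (U+FF0C), the ideographic comma (U+3001), and the Arabic comma (U+060C) all as separators; whether the model actually produces any of these is an assumption to check against real output, and `split_multilingual` is a hypothetical name:

```python
import re

# ASCII comma, full-width comma (U+FF0C), ideographic comma (U+3001),
# and Arabic comma (U+060C). Which of these the model emits is unverified.
SEPARATORS = re.compile("[,\uFF0C\u3001\u060C]")


def split_multilingual(output: str) -> list[str]:
    """Split OCR output on any of the comma variants above."""
    return [t.strip() for t in SEPARATORS.split(output) if t.strip()]


if __name__ == "__main__":
    print(split_multilingual("東京，大阪、Hello, 안녕하세요"))
```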

Multimodal Processing

Seamlessly handles both image and text inputs, delivering structured tokenized output.
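In the usage example further down, the image and the text prompt are combined into a single chat-style message. The sketch below is a hypothetical helper that builds the same structure for any image reference and prompt; the dictionary layout mirrors the usage example, and the default prompt is copied from it:

```python
DEFAULT_PROMPT = (
    "Extract and return the tokenized OCR text from the image, ensuring each "
    "word is accurately recognized and separated by commas."
)


def build_ocr_messages(image: str, prompt: str = DEFAULT_PROMPT) -> list[dict]:
    """Build a single-turn chat message pairing one image with a text prompt.

    Hypothetical helper. `image` is passed through unchanged (for example
    an http(s) URL, as in the usage example) and later resolved by
    qwen_vl_utils.process_vision_info.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    print(build_ocr_messages("https://example.com/scan.png"))
```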

Secure and Optimized Model Weights

Uses safetensors for efficient and secure model loading.

Model Capabilities

Text Extraction

Image Analysis

Multilingual Support

Structured Output

Use Cases

Automation Workflows

Document Processing

Extracts text from scanned documents and generates structured data.

Improves document processing efficiency and reduces manual intervention.
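For multi-page scans, one possible approach is to run the model once per page image and keep the page number next to each token, so the structured data can be traced back to its source. The sketch below assumes the per-page outputs are already available as comma-separated strings (the model call is left out) and flattens them into records; `TokenRecord` and `records_from_pages` are hypothetical names, and the record layout is an illustrative assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRecord:
    page: int      # 1-based page number in the scanned document
    position: int  # 0-based index of the token within its page
    text: str


def records_from_pages(page_outputs: list[str]) -> list[TokenRecord]:
    """Flatten per-page comma-separated OCR outputs into traceable records."""
    records = []
    for page_number, output in enumerate(page_outputs, start=1):
        tokens = [t.strip() for t in output.split(",") if t.strip()]
        for position, token in enumerate(tokens):
            records.append(TokenRecord(page_number, position, token))
    return records


if __name__ == "__main__":
    pages = ["ACME Corp, Invoice", "Total, 42.00"]  # hypothetical per-page outputs
    for record in records_from_pages(pages):
        print(record)
```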

Database Integration

Data Entry

Converts text from images into structured data for database entry.

Simplifies data entry processes and enhances accuracy.

🚀 Bpe-vocab-n-OCR

Bpe-vocab-n-OCR is an advanced OCR-based text extraction tool. It's optimized for generating structured, tokenized outputs. Built on a powerful vision-language architecture with enhanced OCR and multilingual support, it can accurately extract text from images and return it as a comma-separated sequence.

🚀 Quick Start

To use Bpe-vocab-n-OCR, follow the code example below. It requires the transformers and torch packages, plus the qwen-vl-utils package, which provides process_vision_info.

💻 Usage Examples

Basic Usage

import torch  # needed for torch.bfloat16 in the accelerated variant below
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # provided by the qwen-vl-utils package

# Load the Bpe-vocab-n-OCR model with optimized parameters
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Tokenized-OCR", torch_dtype="auto", device_map="auto"
)

# Recommended acceleration for performance optimization
# (requires the flash-attn package and a GPU that supports FlashAttention-2):
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "prithivMLmods/Tokenized-OCR",
#     torch_dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

# Load the default processor for Bpe-vocab-n-OCR
processor = AutoProcessor.from_pretrained("prithivMLmods/Tokenized-OCR")

# Define the input messages with both an image and a text prompt
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://flux-generated.com/sample_image.jpeg",
            },
            {"type": "text", "text": "Extract and return the tokenized OCR text from the image, ensuring each word is accurately recognized and separated by commas."},
        ],
    }
]

# Prepare the input for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)  # move tensors to the device the model was placed on

# Generate the output
generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])  # batch_decode returns one string per input; print the only one
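The `generated_ids_trimmed` step exists because `model.generate` returns each prompt followed by the newly generated tokens; slicing off the first `len(in_ids)` ids leaves only the model's answer. The plain-list sketch below reproduces that slicing with made-up token ids so the logic can be checked without loading the model; `trim_prompt_tokens` is a hypothetical name:

```python
def trim_prompt_tokens(
    input_ids: list[list[int]], generated_ids: list[list[int]]
) -> list[list[int]]:
    """Drop the echoed prompt from each generated sequence.

    Mirrors the list comprehension in the usage example: every generated
    sequence starts with its prompt, so slicing at len(prompt) keeps only
    the new tokens.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


if __name__ == "__main__":
    prompts = [[101, 7, 8], [101, 9, 4]]                  # made-up prompt ids
    outputs = [[101, 7, 8, 55, 56], [101, 9, 4, 57, 2]]   # prompt + new tokens
    print(trim_prompt_tokens(prompts, outputs))  # → [[55, 56], [57, 2]]
```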

✨ Features

High-Accuracy OCR Processing
- Extracts and tokenizes text from images with exceptional precision.
Multilingual Text Recognition
- Supports multiple languages, ensuring comprehensive OCR capabilities.
Comma-Separated Tokenized Output
- Generates structured text for seamless NLP and data processing tasks.
Efficient Image & Text Processing
- Handles both visual and textual inputs, ensuring accurate OCR-based extraction.
Optimized for Secure Deployment
- Uses safetensors for enhanced security and model efficiency.

📄 License

This project is licensed under the Apache-2.0 license.

Property	Details
Base Model	prithivMLmods/Qwen2-VL-OCR-2B-Instruct
Pipeline Tag	image-to-text
Library Name	transformers
Tags	text-generation-inference, bpe, ocr
