visionOCR-3B-061125-GGUF Open-source OCR Model - Supports Document Recognition, Long-text Comprehension, and Formula Conversion

Developed by prithivMLmods

A vision-language OCR model fine-tuned from Qwen2.5-VL-3B-Instruct, focused on document-level OCR, long-context visual language understanding, and conversion of mathematical content to LaTeX.

Image-to-Text

Transformers

English

Open Source License: Apache-2.0

#Document-level OCR #LaTeX Math Recognition #Long-context Visual Understanding

Downloads 131

Release Time: 6/12/2025

Model Overview

This model is optimized for document-level optical character recognition (OCR), long-context visual language understanding, and accurate image-to-text conversion, including rendering mathematical content in LaTeX. It improves understanding of documents across a range of input formats, extraction of structured data, and visual reasoning.

Model Features

Document-level OCR Optimization

Specifically optimized for document-level optical character recognition tasks to improve text extraction accuracy

Long-context Understanding

Improved long-context visual language understanding, suited to processing complex documents

Mathematical LaTeX Support

Accurately converts images containing mathematical formulas into LaTeX

Multi-quantization Versions

Available in multiple precisions, from BF16 down to 2-bit quantization, to suit different hardware constraints
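As a rough illustration of what these quantization levels mean in practice, the sketch below estimates the average number of bits stored per weight in each file. It uses the file sizes from this model's file table and treats the BF16 file as exactly 16 bits per weight. It assumes every file contains the same tensors, so file size scales with bits per weight; `effective_bits_per_weight` is a hypothetical helper, not part of any library.

```python
# File sizes in GB, as listed in the file table for this model.
FILE_SIZES_GB = {
    "BF16": 6.18,
    "Q6_K": 2.54,
    "Q5_K_M": 2.22,
    "Q4_K_M": 1.93,
    "Q3_K_M": 1.59,
    "Q3_K_S": 1.45,
    "Q2_K": 1.27,
}


def effective_bits_per_weight(sizes_gb: dict) -> dict:
    """Estimate average stored bits per weight, taking the BF16 file as 16 bits/weight.

    Assumes every file holds the same tensors, so size is proportional to bits per
    weight. The units cancel in the ratio, so GB vs GiB does not matter.
    """
    bf16_size = sizes_gb["BF16"]
    return {name: round(16 * size / bf16_size, 2) for name, size in sizes_gb.items()}


if __name__ == "__main__":
    for name, bits in effective_bits_per_weight(FILE_SIZES_GB).items():
        print(f"{name:<7} ~{bits:.2f} bits/weight")
```

With the listed sizes, the estimate comes out near 5.0 bits for Q4_K_M and about 3.3 bits for Q2_K. The nominal "2-bit" label understates the real figure because K-quant files also store per-block scale factors and typically keep some tensors at higher precision.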

Model Capabilities

Document Image-to-Text Conversion

Mathematical Formula Recognition

Structured Data Extraction

Visual Reasoning

Long Text Understanding

Use Cases

Document Processing

Digitization of Scanned Documents

Convert scanned PDFs or images into editable text

Preserve the original formatting and mathematical symbols

Academic Paper Processing

Extract mathematical formulas and special symbols from papers

Convert them to LaTeX

Education

Mathematical Problem Recognition

Recognize mathematical problems and formulas from images

Generate editable mathematical expressions
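For the formula-related use cases above, the model's text output usually needs some post-processing before the LaTeX can be reused elsewhere. The sketch below pulls math segments out of a text response. It rests on an unverified assumption about this model's output: that math is wrapped in the standard LaTeX delimiters `$...$`, `$$...$$`, `\(...\)` or `\[...\]`. `extract_latex` is a hypothetical helper, not part of any library.

```python
import re

# Assumption (unverified for this model): math in the OCR output is wrapped in
# $$...$$, \[...\], \(...\) or $...$. Display forms are tried before inline $...$.
_MATH_PATTERN = re.compile(
    r"\$\$(.+?)\$\$"   # display math: $$ ... $$
    r"|\\\[(.+?)\\\]"  # display math: \[ ... \]
    r"|\\\((.+?)\\\)"  # inline math: \( ... \)
    r"|\$(.+?)\$",     # inline math: $ ... $
    re.DOTALL,
)


def extract_latex(text: str) -> list:
    """Return the LaTeX math segments found in text, in order, without delimiters."""
    segments = []
    for match in _MATH_PATTERN.finditer(text):
        # Exactly one alternative matched; take its (non-None) capture group.
        body = next(group for group in match.groups() if group is not None)
        segments.append(body.strip())
    return segments


if __name__ == "__main__":
    sample = r"The area is $\pi r^2$ and \[ E = mc^2 \]."
    print(extract_latex(sample))
```

For the sample string, the function returns the two segments `\pi r^2` and `E = mc^2`. The regex does not handle escaped dollar signs (`\$`) or nested delimiters, so treat it as a starting point rather than a complete LaTeX parser.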

Property	Details
Model Type	visionOCR-3B-061125-GGUF
Base Model	prithivMLmods/visionOCR-3B-061125
Library Name	transformers
Pipeline Tag	image-text-to-text

File Name	Size	Format	Description
visionOCR-3B-061125-BF16.gguf	6.18 GB	BF16	Brain floating point 16-bit
visionOCR-3B-061125-Q6_K.gguf	2.54 GB	Q6_K	6-bit quantized
visionOCR-3B-061125-Q5_K_M.gguf	2.22 GB	Q5_K_M	5-bit quantized, medium quality
visionOCR-3B-061125-Q4_K_M.gguf	1.93 GB	Q4_K_M	4-bit quantized, medium quality
visionOCR-3B-061125-Q3_K_M.gguf	1.59 GB	Q3_K_M	3-bit quantized, medium quality
visionOCR-3B-061125-Q3_K_S.gguf	1.45 GB	Q3_K_S	3-bit quantized, small variant (lower quality than Q3_K_M)
visionOCR-3B-061125-Q2_K.gguf	1.27 GB	Q2_K	2-bit quantized
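When choosing among these files, a simple rule is to take the largest (and generally most accurate) quantization that still fits in the memory available. The sketch below applies that rule to the sizes in the table. The `headroom_gb` allowance for the KV cache, image inputs and runtime overhead is an assumed placeholder value, not a measured requirement, and `pick_gguf` is a hypothetical helper.

```python
from typing import Optional

# (file name, size in GB) as listed in the file table for this model.
GGUF_FILES = [
    ("visionOCR-3B-061125-BF16.gguf", 6.18),
    ("visionOCR-3B-061125-Q6_K.gguf", 2.54),
    ("visionOCR-3B-061125-Q5_K_M.gguf", 2.22),
    ("visionOCR-3B-061125-Q4_K_M.gguf", 1.93),
    ("visionOCR-3B-061125-Q3_K_M.gguf", 1.59),
    ("visionOCR-3B-061125-Q3_K_S.gguf", 1.45),
    ("visionOCR-3B-061125-Q2_K.gguf", 1.27),
]


def pick_gguf(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest listed file whose size plus headroom fits in budget_gb.

    headroom_gb is an assumed allowance (not a measured value) for the KV cache,
    image inputs and runtime overhead. Returns None if no file fits.
    """
    # Walk the files from largest to smallest and stop at the first one that fits.
    for name, size_gb in sorted(GGUF_FILES, key=lambda f: f[1], reverse=True):
        if size_gb + headroom_gb <= budget_gb:
            return name
    return None


if __name__ == "__main__":
    for budget in (8.0, 4.0, 3.0, 2.0):
        print(f"{budget:.1f} GB -> {pick_gguf(budget)}")
```

With the default 1 GB of headroom, a 4 GB budget selects the Q6_K file, a 3 GB budget selects Q4_K_M, and a 2 GB budget returns `None` because even Q2_K plus headroom does not fit.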
