Visionocr 3B 061125 GGUF
V
Visionocr 3B 061125 GGUF
Developed by prithivMLmods
A visual OCR model fine-tuned based on Qwen2.5-VL-3B-Instruct, focusing on document-level OCR, long-context visual language understanding, and mathematical LaTeX format conversion.
Downloads 131
Release Time : 6/12/2025
Model Overview
This model is optimized for document-level optical character recognition (OCR), long-context visual language understanding, and accurate image-to-text conversion with mathematical LaTeX format. It enhances the ability to understand documents in various input formats, extract structured data, and perform visual reasoning.
Model Features
Document-level OCR Optimization
Specifically optimized for document-level optical character recognition tasks to improve text extraction accuracy
Long-context Understanding
Enhanced ability to understand long-context visual language, suitable for processing complex documents
Mathematical LaTeX Support
Capable of accurately converting images containing mathematical formulas into LaTeX format text
Multi-quantization Versions
Provides multiple versions from BF16 to 2-bit quantization to meet different hardware requirements
Model Capabilities
Document Image-to-Text Conversion
Mathematical Formula Recognition
Structured Data Extraction
Visual Reasoning
Long Text Understanding
Use Cases
Document Processing
Digitization of Scanned Documents
Convert scanned PDFs or images into editable text
Preserve the original format and mathematical symbols
Academic Paper Processing
Extract mathematical formulas and special symbols from papers
Convert to LaTeX format
Education
Mathematical Problem Recognition
Recognize mathematical problems and formulas from images
Generate editable mathematical expressions
Featured Recommended AI Models