Qwen2-VL-OCR-2B-Instruct-GGUF Open-source Multimodal Model - Realize OCR, Image-to-Text and Handwriting Recognition

Qwen2 VL OCR 2B Instruct GGUF

Developed by prithivMLmods

A multimodal model fine-tuned based on Qwen/Qwen2-VL-2B-Instruct, optimized for OCR, image-to-text conversion, LaTeX math solving, and handwriting recognition

Image-to-Text Supports Multiple LanguagesOpen Source License:Apache-2.0 #Multimodal OCR #Handwriting Recognition #Mathematical Formula Parsing

Downloads 142

Release Time : 5/15/2025

Model Overview

A conversational model combining visual and textual understanding, supporting mixed tasks such as optical character recognition, handwritten text extraction, and mathematical formula parsing

Model Features

Multimodal OCR Capability

Capable of handling mixed recognition tasks for printed text, handwritten text, and mathematical formulas

Quantization Support

Provides multiple quantization versions from 1-bit to 8-bit to accommodate different hardware requirements

Conversational Interaction

Supports question-and-answer interactions based on visual input

Model Capabilities

Optical Character Recognition (OCR)

Handwritten Text Extraction

LaTeX Mathematical Formula Parsing

Image-to-Text Conversion

Visual Question Answering (VQA)

Use Cases

Document Digitization

Printed Document OCR

Convert printed text in scanned documents or photos into editable text

Supports complex layout recognition

Handwritten Note Transcription

Recognize messy handwritten content and convert it into digital text

Optimized for unconventional handwriting

Educational Assistance

Math Homework Parsing

Recognize handwritten or printed math problems and provide LaTeX-formatted parsing

Supports formula and symbol recognition

🚀 Qwen2-VL-OCR-2B-Instruct-GGUF [ VL / OCR ]

The Qwen2-VL-OCR-2B-Instruct model is a fine - tuned version of Qwen/Qwen2-VL-2B-Instruct. It's designed for tasks such as Optical Character Recognition (OCR), image - to - text conversion, solving math problems with LaTeX formatting, and Messy Handwriting OCR. This model combines a conversational approach with visual and textual understanding to handle multi - modal tasks effectively.

✨ Features

Tailored for OCR, image - to - text conversion, math problem solving, and Messy Handwriting OCR.
Integrates conversational, visual, and textual understanding for multi - modal tasks.

📚 Documentation

Model Files (Qwen2-VL-OCR-2B-Instruct, GGUF)

File Name	Size	Quantization	Format	Description
`Qwen2-VL-OCR-2B-Instruct.f16.gguf`	3.09 GB	FP16	GGUF	Full precision (float16)
`Qwen2-VL-OCR-2B-Instruct.Q2_K.gguf`	676 MB	Q2_K	GGUF	2 - bit quantized
`Qwen2-VL-OCR-2B-Instruct.Q3_K_L.gguf`	880 MB	Q3_K_L	GGUF	3 - bit quantized (K L variant)
`Qwen2-VL-OCR-2B-Instruct.Q3_K_M.gguf`	824 MB	Q3_K_M	GGUF	3 - bit quantized (K M variant)
`Qwen2-VL-OCR-2B-Instruct.Q3_K_S.gguf`	761 MB	Q3_K_S	GGUF	3 - bit quantized (K S variant)
`Qwen2-VL-OCR-2B-Instruct.Q4_K_M.gguf`	986 MB	Q4_K_M	GGUF	4 - bit quantized (K M variant)
`Qwen2-VL-OCR-2B-Instruct.Q4_K_S.gguf`	940 MB	Q4_K_S	GGUF	4 - bit quantized (K S variant)
`Qwen2-VL-OCR-2B-Instruct.Q5_K_M.gguf`	1.13 GB	Q5_K_M	GGUF	5 - bit quantized (K M variant)
`Qwen2-VL-OCR-2B-Instruct.Q5_K_S.gguf`	1.1 GB	Q5_K_S	GGUF	5 - bit quantized (K S variant)
`Qwen2-VL-OCR-2B-Instruct.Q6_K.gguf`	1.27 GB	Q6_K	GGUF	6 - bit quantized
`Qwen2-VL-OCR-2B-Instruct.Q8_0.gguf`	1.65 GB	Q8_0	GGUF	8 - bit quantized

i1 Quantized Variants

File Name	Size	Quantization	Description
`Qwen2-VL-OCR-2B-Instruct.i1-IQ1_M.gguf`	464 MB	i1 - IQ1_M	i1 1 - bit medium
`Qwen2-VL-OCR-2B-Instruct.i1-IQ1_S.gguf`	437 MB	i1 - IQ1_S	i1 1 - bit small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ2_M.gguf`	601 MB	i1 - IQ2_M	i1 2 - bit medium
`Qwen2-VL-OCR-2B-Instruct.i1-IQ2_S.gguf`	564 MB	i1 - IQ2_S	i1 2 - bit small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ2_XS.gguf`	550 MB	i1 - IQ2_XS	i1 2 - bit extra small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ2_XXS.gguf`	511 MB	i1 - IQ2_XXS	i1 2 - bit extra extra small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ3_M.gguf`	777 MB	i1 - IQ3_M	i1 3 - bit medium
`Qwen2-VL-OCR-2B-Instruct.i1-IQ3_S.gguf`	762 MB	i1 - IQ3_S	i1 3 - bit small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ3_XS.gguf`	732 MB	i1 - IQ3_XS	i1 3 - bit extra small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ3_XXS.gguf`	669 MB	i1 - IQ3_XXS	i1 3 - bit extra extra small
`Qwen2-VL-OCR-2B-Instruct.i1-IQ4_NL.gguf`	936 MB	i1 - IQ4_NL	i1 4 - bit with no - layernorm quantization
`Qwen2-VL-OCR-2B-Instruct.i1-IQ4_XS.gguf`	896 MB	i1 - IQ4_XS	i1 4 - bit extra small
`Qwen2-VL-OCR-2B-Instruct.i1-Q4_0.gguf`	938 MB	i1 - Q4_0	i1 4 - bit traditional quant
`Qwen2-VL-OCR-2B-Instruct.i1-Q4_1.gguf`	1.02 GB	i1 - Q4_1	i1 4 - bit traditional variant

Metadata

File Name	Size	Description
`.gitattributes`	3.37 kB	Git LFS tracking file
`config.json`	34 B	Config placeholder
`README.md`	672 B	Model readme

Quants Usage

(sorted by size, not necessarily quality. IQ - quants are often preferable over similar sized non - IQ quants)

Link	Type	Size/GB	Notes
GGUF	Q2_K	0.4
GGUF	Q3_K_S	0.5
GGUF	Q3_K_M	0.5	lower quality
GGUF	Q3_K_L	0.5
GGUF	IQ4_XS	0.6
GGUF	Q4_K_S	0.6	fast, recommended
GGUF	Q4_K_M	0.6	fast, recommended
GGUF	Q5_K_S	0.6
GGUF	Q5_K_M	0.7
GGUF	Q6_K	0.7	very good quality
GGUF	Q8_0	0.9	fast, best quality
GGUF	f16	1.6	16 bpw, overkill

Here is a handy graph by ikawrakow comparing some lower - quality quant types (lower is better):

📄 License

This project is licensed under the Apache - 2.0 license.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご