Q

Qari OCR V0.3 VL 2B Instruct

Developed by NAMAA-Space
QARI-OCR v0.3 is an optical character recognition vision-language model focused on Arabic structured document understanding. It is built on Qwen2-VL-2B-Instruct and excels at preserving document layout and format.
Downloads 1,016
Release Time : 4/10/2025

Model Overview

This model is specifically designed for Arabic optical character recognition, especially good at handling structured documents and can preserve HTML tags, document layout, and full diacritics (tashkeel) in Arabic.

Model Features

Layout perception and recognition
Preserve document structure through HTML/Markdown tags
Full diacritic support
Accurately recognize Arabic diacritics (tashkeel)
Multi-font processing
Trained on 12 different Arabic fonts (14px - 100px)
Structure-first design
Optimized for documents containing headings, body text, and complex layouts
Efficient training
Only takes 11 hours to train on a single GPU with 10k samples
Robust performance
Can handle low-resolution and damaged images

Model Capabilities

Arabic text recognition
Document layout understanding
HTML/Markdown structure preservation
Handwritten text recognition (preliminary ability)

Use Cases

Document processing
Digitization of Arabic documents
Convert paper Arabic documents to digital format while preserving the original layout and format
High-fidelity text conversion, preserving HTML/Markdown structure
Academic literature processing
Process Arabic academic literature containing complex layouts and full diacritics
Accurately recognize text content and structure
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase