Arabic Large Nougat

Developed by MohamedRashad
An end-to-end structured optical character recognition (OCR) system designed specifically for Arabic. It converts images of book pages into structured text in Markdown format.
Downloads 537
Release Date: 10/18/2024

Model Overview

This model uses the base Nougat architecture and is trained from scratch with a new tokenizer. It is suited to tasks such as digitizing Arabic literature and extracting text from printed material.

Model Features

Arabic-Specific OCR
Optical character recognition system optimized specifically for Arabic text
Structured Output
Capable of generating structured text output in Markdown format
End-to-End Solution
Complete processing pipeline directly from image to text, with no intermediate steps required
Book Processing Optimization
Particularly suitable for processing Arabic book pages
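The image-to-Markdown pipeline described above can be sketched as follows. This is a minimal, hedged example rather than the author's reference code: it assumes the model is published on the Hugging Face Hub under the ID MohamedRashad/arabic-large-nougat and that it loads with the NougatProcessor and VisionEncoderDecoderModel classes from the transformers library. The function name page_to_markdown and the repetition penalty value are illustrative choices, not documented settings.

```python
# Assumption: Hub ID inferred from the developer and model name on this page.
MODEL_ID = "MohamedRashad/arabic-large-nougat"


def page_to_markdown(image_path, repetition_penalty=1.5):
    """Run one book-page image through the model and return Markdown text.

    Heavy dependencies (torch, transformers, Pillow) are imported lazily so
    that importing this module does not require them.
    """
    import torch
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load the image preprocessor/tokenizer and the encoder-decoder model.
    processor = NougatProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID).to(device)
    model.eval()

    # Convert the page image to the pixel tensor the vision encoder expects.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

    # Cap generation at the decoder's maximum supported sequence length.
    max_len = model.decoder.config.max_position_embeddings
    output_ids = model.generate(
        pixel_values,
        max_new_tokens=max_len,
        repetition_penalty=repetition_penalty,  # illustrative value
    )

    # Decode token IDs back to text; the model emits Markdown directly.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling page_to_markdown("page.png") on a scanned page would return the recognized text as a single Markdown string. The first call downloads the model weights, so it needs network access and enough memory for a large model.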

Model Capabilities

Arabic Text Recognition
English Text Recognition
Book Page Processing
Markdown Format Generation

Use Cases

Literature Digitization
Digitization of Ancient Arabic Texts
Converting printed ancient Arabic texts into searchable digital text
Preserves the original text structure and formatting
Education
Textbook Content Extraction
Extracting text content from Arabic textbooks for e-learning purposes
Structured output facilitates further processing
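To illustrate how structured Markdown output can feed further processing, the sketch below splits a Markdown string into sections keyed by heading. It uses only the Python standard library. The function name split_markdown_sections and the sample text are hypothetical, and the sketch assumes the output uses ATX-style headings (lines starting with #), which this page does not confirm.

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown):
    """Split Markdown text into a list of {level, title, text} dicts."""
    sections = []
    current = {"level": 0, "title": "", "body": []}  # text before any heading

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the section collected so far.
            sections.append(current)
            current = {
                "level": len(match.group(1)),
                "title": match.group(2),
                "body": [],
            }
        else:
            current["body"].append(line)
    sections.append(current)

    result = []
    for section in sections:
        text = "\n".join(section["body"]).strip()
        # Skip an untitled preamble that contains no text.
        if section["title"] or text:
            result.append(
                {"level": section["level"], "title": section["title"], "text": text}
            )
    return result


sample = "# الفصل الأول\n\nنص الفقرة.\n\n## قسم\nمزيد من النص."
for section in split_markdown_sections(sample):
    print(section["level"], section["title"], "|", section["text"])
```

Each section can then be indexed for search or loaded into an e-learning system separately, instead of handling the page as one undifferentiated block of text.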