arabic-small-nougat Open-Source Model - An End-to-End OCR System Specially Built for Arabic

Arabic Small Nougat

Developed by MohamedRashad

An end-to-end structured optical character recognition (OCR) system designed specifically for Arabic, fine-tuned from the facebook/nougat-small model

Image-to-Text

Transformers

Supports Multiple Languages

Open Source License: GPL-3.0

#Arabic OCR #Book Digitization #End-to-End Structuring

Downloads 1,149

Release Time: 2/17/2024

Model Overview

This model is an end-to-end structured OCR system for Arabic books, capable of converting Arabic book images into structured text (especially in Markdown format).

Model Features

Arabic OCR Optimization

Specially optimized for Arabic text recognition, capable of handling complex layouts in Arabic books

Structured Output

Generates structured text in Markdown format, preserving the original document's formatting information

End-to-End Processing

Complete processing pipeline from image to text without intermediate steps

Model Capabilities

Arabic Text Recognition

English Text Recognition

Book Image Processing

Markdown Format Generation

Use Cases

Literature Digitization

Digitization of Ancient Arabic Texts

Convert images of ancient Arabic texts into editable digital text

Makes the content of ancient texts digital and searchable

Printed Material Processing

Arabic Book Scanning

Process scanned Arabic book pages to extract text content

Generates structured e-book content

🚀 Arabic Small Nougat

End-to-End Structured OCR For Arabic books.

The arabic-small-nougat OCR is an end-to-end structured Optical Character Recognition (OCR) system tailored for the Arabic language, which can convert images of Arabic book pages into structured text, especially in Markdown format. It's useful for digitizing Arabic literature and extracting text from printed materials.

🚀 Quick Start

Demo

You can try the model through the online demo: https://huggingface.co/spaces/MohamedRashad/Arabic-Nougat

Local Usage

Use the following code to start using the model locally:

from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel

# Load the model and processor
processor = NougatProcessor.from_pretrained("MohamedRashad/arabic-small-nougat")
model = VisionEncoderDecoderModel.from_pretrained("MohamedRashad/arabic-small-nougat")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 2048

def predict(img_path):
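    # Load the page image (e.g. a page rendered from a PDF) and convert it to
    # RGB so that grayscale or RGBA scans are handled consistently
    image = Image.open(img_path).convert("RGB")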
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # generate transcription
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    page_sequence = processor.post_process_generation(page_sequence, fix_markdown=False)
    return page_sequence

print(predict("path/to/page_image.jpg"))
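To digitize a whole book rather than a single page, the `predict` function above can be applied page by page. The following is a minimal sketch and not part of the original model card: `transcribe_book`, its parameters, and the `IMAGE_SUFFIXES` set are hypothetical names introduced for illustration. It assumes the page images sit in one directory and that sorting by file name gives the reading order (so zero-padded names such as `001.png` are advisable). It accepts any callable that maps an image path to text, which lets it be tried without loading the model.

```python
from pathlib import Path
from typing import Callable, Iterable

# Assumption: scanned pages are stored in one of these formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def transcribe_book(
    page_dir: str,
    transcribe: Callable[[str], str],
    output_path: str,
    suffixes: Iterable[str] = IMAGE_SUFFIXES,
) -> int:
    """Run `transcribe` on every page image in `page_dir` (sorted by file
    name) and write the results to a single Markdown file.

    Returns the number of pages processed.
    """
    wanted = {s.lower() for s in suffixes}
    pages = sorted(
        p for p in Path(page_dir).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )
    chunks = [transcribe(str(p)).strip() for p in pages]
    # Separate pages with a blank line so Markdown blocks do not run together.
    Path(output_path).write_text("\n\n".join(chunks) + "\n", encoding="utf-8")
    return len(pages)
```

With the model loaded as shown above, a call would look like `transcribe_book("scans/", predict, "book.md")`, where both paths are placeholders.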

✨ Features

End-to-End OCR: Directly converts Arabic book page images into structured text.
Multi-language Support: Supports both Arabic and English.
Markdown Output: Ideal for generating structured Markdown text.

📚 Documentation

Description

[**Github**](https://github.com/MohamedAliRashad/arabic-nougat) 🤗 [**Hugging Face**](https://huggingface.co/collections/MohamedRashad/arabic-nougat-673a3f540bd92904c9b92a8e) 📝 [**Paper**](https://arxiv.org/abs/2411.17835) 🗂️ [**Data**](https://huggingface.co/datasets/MohamedRashad/arabic-img2md) 📽️ [**Demo**](https://huggingface.co/spaces/MohamedRashad/Arabic-Nougat)

The arabic-small-nougat OCR is based on the facebook/nougat-small architecture and has been fine-tuned using the Khatt dataset along with a custom dataset.

Bias, Risks, and Limitations

Text Hallucination: The model may occasionally generate repeated or incorrect text.
Erroneous Image Paths: It may output irrelevant image paths.
Context Length Constraint: With a maximum context length of 2048 tokens, longer book pages may result in incomplete transcriptions.
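The limitations listed above can be partly mitigated with simple post-processing. The sketch below is not part of the model or its repository; the function names, the regular expression, and the `max_repeats` default are all assumptions chosen for illustration. It assumes that any Markdown image link in the output is spurious, that hallucinated repetition shows up as the same line repeated consecutively, and that an output which reaches the token limit may be cut off.

```python
import re

# Markdown image syntax: ![alt](path). Assumption: any such link in the OCR
# output is spurious, because the model transcribes text rather than figures.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_image_links(text: str) -> str:
    """Remove Markdown image links from the text."""
    return IMAGE_LINK.sub("", text)


def collapse_repeated_lines(text: str, max_repeats: int = 2) -> str:
    """Keep at most `max_repeats` consecutive copies of the same non-empty line."""
    kept, previous, run = [], None, 0
    for line in text.splitlines():
        key = line.strip()
        # Blank lines never count as repeats, so paragraph spacing is preserved.
        run = run + 1 if key and key == previous else 1
        previous = key
        if run <= max_repeats:
            kept.append(line)
    return "\n".join(kept)


def may_be_truncated(num_generated_tokens: int, limit: int = 2048) -> bool:
    """Return True when generation reached the token limit, which suggests
    the page transcription may be incomplete."""
    return num_generated_tokens >= limit
```

In the Quick Start code, `outputs.shape[1]` could serve as an approximate value for `num_generated_tokens`; it also counts the decoder start token, so treat the check as a heuristic that flags pages for manual review rather than as an exact test.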

Intended Use

Designed for converting images of Arabic book pages into structured text, especially in Markdown format. It's suitable for digitizing Arabic literature and text extraction from printed materials.

Ethical Considerations

Be aware of the model's limitations, especially when accurate OCR results are crucial. Users should verify and review the output, especially in high-precision scenarios.
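One way to verify output is to transcribe a few sample pages by hand and measure how far the model's text deviates from them. The sketch below computes the character error rate (CER), a common OCR metric defined as the Levenshtein edit distance divided by the reference length. It is an illustrative helper, not the evaluation procedure used by the model's author, and the function names are hypothetical. It compares raw strings, so any normalization (for example of Arabic diacritics or whitespace) is left to the caller.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from `a`
                    current[j - 1] + 1,      # insert a character from `b`
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A CER of 0.0 means the transcription matches the reference exactly; the value can exceed 1.0 when the hypothesis is much longer than the reference, as can happen with repeated hallucinated text.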

Model Details

Property	Details
Developed by	Mohamed Rashad
Model Type	VisionEncoderDecoderModel
Language(s) (NLP)	Arabic & English
License	GPL 3.0
Finetuned from model	nougat-small

Acknowledgment

If you use or build upon the Arabic Small Nougat OCR, please acknowledge the model developer and the open-source community. Also, include a copy of the GPL 3.0 license with any redistributed or modified versions of the model.

Citation

If you find this model useful, please consider citing the original facebook/nougat-small model and the datasets used for fine-tuning, including the Khatt dataset and the custom dataset.

@misc{rashad2024arabicnougatfinetuningvisiontransformers,
      title={Arabic-Nougat: Fine-Tuning Vision Transformers for Arabic OCR and Markdown Extraction}, 
      author={Mohamed Rashad},
      year={2024},
      eprint={2411.17835},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.17835}, 
}
@misc {mohamed_rashad_2024,
	author       = { {Mohamed Rashad} },
	title        = { arabic-small-nougat (Revision 48741d4) },
	year         = 2024,
	url          = { https://huggingface.co/MohamedRashad/arabic-small-nougat },
	doi          = { 10.57967/hf/3534 },
	publisher    = { Hugging Face }
}

Disclaimer

The arabic-small-nougat OCR is provided "as is," and the developers make no guarantees regarding its suitability for specific tasks. Users are encouraged to thoroughly evaluate the model's output for their particular use cases and requirements.

📄 License

This model is licensed under GPL 3.0.
