Thai-TrOCR Open-Source OCR Model - Free Deployment for Precise Recognition of Thai and English Handwritten Text Line Images

Thai TrOCR

Developed by openthaigpt

A Thai and English optical character recognition model fine-tuned from the TrOCR base handwriting model, excelling in processing handwritten text line images

Text Recognition

Transformers

Supports Multiple Languages
Open Source License: Apache-2.0
#Thai-English OCR #Handwriting Recognition #Low CER

Downloads 2,677

Release Time: 9/29/2024

Model Overview

ThaiTrOCR is an optical character recognition model for Thai and English that combines a Vision Transformer encoder with an Electra text decoder. It is designed for efficient deployment in resource-constrained environments while maintaining high recognition accuracy.

Model Features

Multilingual Support

Supports both Thai and English text recognition

Efficient and Lightweight

Compact design suitable for deployment in resource-constrained environments

High-Precision Recognition

Achieves a lower average Character Error Rate than EasyOCR and Tesseract on all five document types in the 104-image evaluation set

Model Capabilities

Thai Text Recognition

English Text Recognition

Handwriting Recognition

PDF Document Recognition

Scene Text Recognition

Use Cases

Document Digitization

Thai Document Scanning

Convert paper Thai documents into editable digital text

Character Error Rate of 5.76% on PDF documents

Handwritten Note Recognition

Thai Handwritten Note Conversion

Recognize and convert Thai handwritten notes into digital text

Character Error Rate of 19% on handwritten text

🚀 Thai-TrOCR Model

ThaiTrOCR is a multilingual OCR model fine-tuned for Thai and English, leveraging the TrOCR architecture for high-accuracy character recognition in resource-constrained environments.

🚀 Quick Start

ThaiTrOCR is a fine-tuned version of the TrOCR base handwritten model, built for Optical Character Recognition (OCR) in both Thai and English. It takes handwritten text-line images in either language as input and uses the TrOCR architecture, which here pairs a Vision Transformer encoder with an Electra-based text decoder. The model is compact and lightweight, so it can be deployed efficiently in resource-constrained environments while still achieving high character recognition accuracy.

✨ Features

Encoder: TrOCR Base Handwritten
Decoder: Electra Small (Trained with Thai corpus)

📦 Installation

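The original model card provides no installation steps. The usage example below imports transformers, PIL (Pillow), and requests, and asks for PyTorch tensors (return_tensors="pt"), so those packages and PyTorch need to be installed.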

💻 Usage Examples

Basic Usage

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load processor and model
processor = TrOCRProcessor.from_pretrained('openthaigpt/thai-trocr')
model = VisionEncoderDecoderModel.from_pretrained('openthaigpt/thai-trocr')

# Load an image
url = 'your_image_url_here'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Process and generate text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
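The basic example handles one image fetched from a URL. For several text-line images, the same three calls (processor, generate, batch_decode) can be wrapped in a small batching helper. The following is a minimal sketch, not part of the original model card: the function names load_thai_trocr and recognize_lines and the batch_size default of 8 are assumptions made for illustration.

```python
from typing import Iterable, List


def load_thai_trocr(model_id: str = "openthaigpt/thai-trocr"):
    """Load the processor and model (weights are downloaded on first use)."""
    # Imported lazily so the batching helper below has no heavy dependencies.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    return processor, model


def recognize_lines(images: Iterable, processor, model, batch_size: int = 8) -> List[str]:
    """Run OCR on text-line images in fixed-size batches.

    Returns one decoded string per input image, in input order.
    """
    images = list(images)
    texts: List[str] = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        # Same steps as the basic example, applied to a list of images.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

A typical call would be processor, model = load_thai_trocr(), followed by recognize_lines([Image.open(path).convert("RGB") for path in paths], processor, model), where paths is a hypothetical list of local image files. Each image should contain a single line of text, since the model is trained on text-line images.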

📚 Documentation

Model Performance Comparison

This section compares the open-source ThaiTrOCR model with two widely used OCR systems, EasyOCR and Tesseract. The table below reports the average Character Error Rate (CER) of each system by document type, expressed as a fraction (0.19 means 19%).

Document Type	ThaiTrOCR	EasyOCR	Tesseract
Handwritten	0.190034	0.410738	1.032375
PDF Document	0.057597	0.085937	0.761595
PDF Document (EN-TH)	0.053968	0.308075	1.061107
Real Document	0.147440	0.293482	0.915707
Scene Text	0.134182	0.390583	2.408704
Adjusted Mean	0.123600	0.298474	1.269101

Disclaimer: The test dataset at https://huggingface.co/datasets/openthaigpt/thai-ocr-evaluation includes only 104 images, which may limit the generalizability of these results. We are expanding the test dataset.
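The table above can also be read in relative terms. The sketch below copies the ThaiTrOCR and EasyOCR columns and computes how much lower ThaiTrOCR's CER is on each document type. The helper name relative_reduction is an assumption for illustration, and the figures inherit the small-sample caveat of the 104-image test set.

```python
# CER values copied from the comparison table (fractions, not percent).
CER = {
    "Handwritten": {"ThaiTrOCR": 0.190034, "EasyOCR": 0.410738},
    "PDF Document": {"ThaiTrOCR": 0.057597, "EasyOCR": 0.085937},
    "PDF Document (EN-TH)": {"ThaiTrOCR": 0.053968, "EasyOCR": 0.308075},
    "Real Document": {"ThaiTrOCR": 0.147440, "EasyOCR": 0.293482},
    "Scene Text": {"ThaiTrOCR": 0.134182, "EasyOCR": 0.390583},
}


def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is lower than `baseline` (0.25 means 25% lower)."""
    return 1.0 - ours / baseline


reductions = {
    doc_type: relative_reduction(row["ThaiTrOCR"], row["EasyOCR"])
    for doc_type, row in CER.items()
}

for doc_type, value in reductions.items():
    print(f"{doc_type:<22} {value:.1%} lower CER than EasyOCR")
```

On these numbers the gap is smallest for Thai-only PDF documents and largest for bilingual (EN-TH) PDF documents.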

Key Insights

Character Error Rate (CER): CER is conventionally the character-level edit distance (substitutions, deletions, and insertions) between the predicted and reference text, divided by the length of the reference. A lower CER indicates better performance, and the value can exceed 1 when a prediction contains many spurious characters, as in several Tesseract results. ThaiTrOCR has the lowest CER of the three systems on every document type in the table, and the lowest adjusted mean (0.1236, versus 0.2985 for EasyOCR and 1.2691 for Tesseract).
Model Performance: ThaiTrOCR achieves its lowest CER on PDF documents (about 0.058 for Thai-only and 0.054 for bilingual English-Thai text). On scene text and handwritten content, its CER is less than half that of EasyOCR.
Tesseract Limitation: In this comparison, Tesseract was run with a single language setting (Thai only), which may have contributed to its higher CER values.
Evaluation Dataset: The evaluation data comes from openthaigpt/thai-ocr-evaluation.
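For readers who want to reproduce a CER figure on their own samples, the metric can be sketched as a character-level Levenshtein distance divided by the reference length. This is the conventional definition and an assumption about the benchmark: the model card does not state the exact implementation or any text normalization used. Characters are counted as Unicode code points here, so Thai vowel and tone marks each count as one character.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    needed to turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


print(cer("hello", "hallo"))  # one substitution over five characters
print(cer("ab", "abcde"))     # three insertions over two characters
```

The second call returns a value above 1, which is how a system that emits many extra characters can end up with a CER greater than 100%.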

📄 License

This project is licensed under the Apache-2.0 license.

🖋️ Authors

Suchut Sapsathien (suchut@outlook.com)
Jillaphat Jaroenkantasima (autsadang41@gmail.com)
