Khmer-trocr-ocr-v1.0 Open-source Model - Free Deployment for Precise Recognition of Khmer Names and Scripts

Khmer TrOCR OCR v1.0

Developed by songhieng

A Khmer name recognition model fine-tuned from Microsoft TrOCR, designed for optical character recognition (OCR) of Khmer script.

Text Recognition

Transformers

Other

Open Source License: Apache-2.0

#Khmer OCR #Name Recognition #ID Card Recognition

Downloads 229

Release Time: 5/27/2025

Model Overview

This model is an optical character recognition system optimized for Khmer name recognition. It uses the VisionEncoderDecoder architecture, in which a vision encoder reads the image and a text decoder generates the recognized characters.

Model Features

Specific Domain Optimization

Fine-tuned specifically on Khmer names so that they are identified more accurately.

Advanced Architecture

Uses the VisionEncoderDecoderModel architecture (a ViT-style image encoder paired with a RoBERTa-style text decoder), combining visual encoding with text decoding.
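To see what the ViT-style encoder actually receives, here is some back-of-the-envelope arithmetic. It assumes, without confirmation from this card, that the fine-tuned checkpoint keeps the base model's 384×384 input resolution and 16×16 patch size, and that the processor stretches each image to that square without preserving its aspect ratio; `model.config.encoder.image_size` and `model.config.encoder.patch_size` should confirm the first two. The helper names are hypothetical.

```python
def patch_grid(image_size: int = 384, patch_size: int = 16) -> tuple[int, int]:
    """Return (patches per side, total patches) for a square ViT input.

    The 384 / 16 defaults are assumed from microsoft/trocr-base-stage1.
    """
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def resize_scale(src_w: int, src_h: int, dst: int = 384) -> tuple[float, float]:
    """Return (horizontal, vertical) scale factors when an image is stretched
    to a dst x dst square without preserving its aspect ratio."""
    return dst / src_w, dst / src_h


print(patch_grid())           # (24, 576)
print(resize_scale(512, 64))  # (0.75, 6.0)
```

Under these assumptions, a 512×64 name image (the training size given below) is squeezed horizontally and stretched six-fold vertically before being split into 576 patches, so inputs with very different proportions may be distorted differently from the training data.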

Language Support

Designed specifically for optical character recognition of Khmer script.

Model Capabilities

Khmer Text Recognition

Image-to-Text Conversion

Name Recognition

Use Cases

Identity Recognition

Khmer ID Card Recognition

Recognize names printed on Khmer ID cards.
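The model was trained on single-line images of Khmer names, so a full ID-card photo would presumably need to be cropped to the name field first. The sketch below converts a name-field region given as fractions of the card image into a pixel box suitable for Pillow's `Image.crop`. Both the helper and the `NAME_FIELD` coordinates are hypothetical and would have to be measured on the actual card layout.

```python
# Hypothetical name-field location as (left, top, right, bottom) fractions
# of the card image; measure these on a real card before use.
NAME_FIELD = (0.30, 0.20, 0.95, 0.32)


def relative_box_to_pixels(box, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a fractional (left, top, right, bottom) box into integer pixel
    coordinates, clamped to the image bounds."""
    left, top, right, bottom = box

    def clamp(value: int, upper: int) -> int:
        return max(0, min(upper, value))

    return (
        clamp(round(left * width), width),
        clamp(round(top * height), height),
        clamp(round(right * width), width),
        clamp(round(bottom * height), height),
    )


print(relative_box_to_pixels((0.25, 0.5, 0.75, 1.0), 400, 200))  # (100, 100, 300, 200)
```

With a loaded card image, `image.crop(relative_box_to_pixels(NAME_FIELD, *image.size))` yields the cropped name region, which can then be passed to the processor as in the inference example.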

🚀 Khmer TrOCR OCR 📝🇰🇭

This is a fine-tuned version of [microsoft/trocr-base-stage1](https://huggingface.co/microsoft/trocr-base-stage1) for Khmer name recognition. It was trained on synthetic image-text pairs of Khmer personal names to recognize Khmer names accurately.

🚀 Quick Start

Install required packages

pip install transformers torch pillow

Python Inference Example

import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load model and processor
# (replace "your_username/khmer-trocr-ocr" with the model's Hugging Face Hub ID)
model = VisionEncoderDecoderModel.from_pretrained("your_username/khmer-trocr-ocr")
processor = TrOCRProcessor.from_pretrained("your_username/khmer-trocr-ocr")

# Load and preprocess a single-line image of a Khmer name
image = Image.open("khmer_name_images/khmer_name_00001.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Move to GPU if available and switch the model to inference mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
pixel_values = pixel_values.to(device)

# Generate the prediction without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(pixel_values)
predicted_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("🔤 Predicted:", predicted_text)
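Because the model outputs Unicode Khmer text, a cheap sanity check on a prediction is to measure how much of it falls inside the Khmer Unicode blocks (U+1780 to U+17FF, plus Khmer Symbols U+19E0 to U+19FF). This is a hypothetical post-processing helper, not part of the model or of `transformers`; any threshold for rejecting a prediction is an assumption you would need to choose yourself.

```python
# Khmer (U+1780..U+17FF) and Khmer Symbols (U+19E0..U+19FF) Unicode blocks.
KHMER_RANGES = ((0x1780, 0x17FF), (0x19E0, 0x19FF))


def _is_ignorable(ch: str) -> bool:
    # Skip whitespace and the zero-width space (U+200B), which Khmer text
    # often uses as an invisible word separator.
    return ch.isspace() or ch == "\u200b"


def khmer_ratio(text: str) -> float:
    """Fraction of non-ignorable characters that belong to the Khmer blocks."""
    chars = [ch for ch in text if not _is_ignorable(ch)]
    if not chars:
        return 0.0
    khmer = sum(
        any(lo <= ord(ch) <= hi for lo, hi in KHMER_RANGES) for ch in chars
    )
    return khmer / len(chars)


print(khmer_ratio("\u179f\u17bb\u1781"))  # 1.0 (three Khmer characters)
print(khmer_ratio("Sok"))                 # 0.0 (Latin transliteration)
```

Calling `khmer_ratio(predicted_text)` on the output of the example above gives a score that could be used to flag suspicious predictions for manual review.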

✨ Features

Khmer Name Recognition: Specifically designed for recognizing Khmer names.
Based on TrOCR: Fine-tuned from the pre-trained microsoft/trocr-base-stage1 model.
Synthetic Data Training: Trained on a custom-generated synthetic dataset of Khmer names.

📚 Documentation

📌 Model Details

| Property | Details |
|---|---|
| Model Type | VisionEncoderDecoderModel (ViT + RoBERTa) |
| Base Model | `microsoft/trocr-base-stage1` |
| Language | Khmer (`km`) |
| Task | OCR (optical character recognition), specifically for Khmer script |

🧠 Training

The model was fine-tuned on a synthetic dataset of Khmer names rendered with a Khmer Unicode font (KhmerOS_muol.ttf). Each image is paired with its corresponding text label for supervised training.
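The card does not say how the text labels were tokenized for training. A common convention when fine-tuning TrOCR-style `VisionEncoderDecoderModel`s with Hugging Face is to pad or truncate each tokenized label to a fixed length and replace the padding ids with -100, which the model's cross-entropy loss ignores. The sketch below shows that step on plain token-id lists; `prepare_labels` is a hypothetical helper, and the example ids assume a RoBERTa-style vocabulary where `<s>`=0, `<pad>`=1 and `</s>`=2.

```python
from typing import Sequence


def prepare_labels(token_ids: Sequence[int], pad_token_id: int, max_length: int) -> list[int]:
    """Truncate/pad token ids to max_length and mask padding with -100.

    -100 is the default ignore_index of torch.nn.CrossEntropyLoss, which the
    Hugging Face VisionEncoderDecoderModel uses when computing loss from `labels`.
    Note: this naive truncation can drop the trailing </s> token.
    """
    ids = list(token_ids[:max_length])
    ids += [pad_token_id] * (max_length - len(ids))
    return [-100 if t == pad_token_id else t for t in ids]


# Hypothetical ids: <s>=0, two content tokens, </s>=2, padded to length 6 with <pad>=1.
print(prepare_labels([0, 1234, 5678, 2], pad_token_id=1, max_length=6))
# [0, 1234, 5678, 2, -100, -100]
```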

Input: RGB image (512x64) of a Khmer name
Output: Unicode Khmer text
Dataset: Custom-generated dataset of Khmer names (10,000+ samples)
Preprocessing: Images were rendered from text using PIL and paired with their ground-truth labels