🚀 TrOCR (small-sized model, fine-tuned on IAM)
TrOCR is a small-sized model fine-tuned on the IAM handwriting database. It was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository. It is designed for optical character recognition (OCR) on single text-line images.
✨ Features
- Encoder-Decoder Architecture: TrOCR is an encoder-decoder model that uses an image Transformer as the encoder and a text Transformer as the decoder. The image encoder is initialized from the weights of DeiT, and the text decoder is initialized from the weights of UniLM.
- Patch-based Image Input: Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Absolute position embeddings are added before feeding the sequence to the Transformer encoder layers. The Transformer text decoder then autoregressively generates tokens.
🚀 Quick Start
You can use the raw model for optical character recognition (OCR) on single text-line images. Check out the model hub to find fine-tuned versions for tasks that interest you.
💻 Usage Examples
Basic Usage
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# load an example handwritten text-line image from the IAM database
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

processor = TrOCRProcessor.from_pretrained('microsoft/trocr-small-handwritten')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-small-handwritten')

# preprocess the image and autoregressively generate the transcription
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
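To make the pipeline above concrete without downloading any weights, the sketch below illustrates roughly what the processor does to an input image before it reaches the encoder: resize to a fixed square resolution, rescale pixel values to [0, 1], and normalize. The 384x384 size and the 0.5 mean/std are assumptions based on common ViT-style defaults, not values read from this model's config; check the processor's configuration for the exact settings.

```python
import numpy as np
from PIL import Image

def preprocess(image: Image.Image, size: int = 384) -> np.ndarray:
    """Rough stand-in for TrOCRProcessor's image preprocessing.

    Resizes to size x size, rescales to [0, 1], normalizes with an
    assumed mean/std of 0.5, and returns a CHW float array.
    """
    resized = image.convert("RGB").resize((size, size))
    arr = np.asarray(resized, dtype=np.float32) / 255.0  # rescale to [0, 1]
    arr = (arr - 0.5) / 0.5                              # normalize to [-1, 1]
    return arr.transpose(2, 0, 1)                        # HWC -> CHW

# a dummy white image stands in for a downloaded text-line scan
dummy = Image.new("RGB", (640, 48), color="white")
pixels = preprocess(dummy)
print(pixels.shape)  # (3, 384, 384)
```

The `(channels, height, width)` layout matches the `pixel_values` tensor the real processor returns (with an extra leading batch dimension).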
📚 Documentation
Model description
The TrOCR model is an encoder-decoder system: the encoder is an image Transformer and the decoder is a text Transformer. The image encoder is initialized from the weights of DeiT, and the text decoder from the weights of UniLM.
Images are first divided into a sequence of fixed-size patches (16x16 resolution). These patches are linearly embedded, and absolute position embeddings are added before the sequence is passed through the Transformer encoder layers. The Transformer text decoder then generates tokens autoregressively.
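The patch arithmetic above is easy to work out by hand. Assuming a 384x384 input resolution (a common setting for this family of models; the exact value lives in the model config) and the 16x16 patch size stated here, the encoder sees:

```python
# Patch-count arithmetic for the image encoder.
# The 384x384 input resolution is an assumption; the 16x16 patch size
# comes from the model description above.
image_size = 384
patch_size = 16

patches_per_side = image_size // patch_size  # 384 / 16 = 24
num_patches = patches_per_side ** 2          # 24 * 24 = 576

print(num_patches)  # 576 patch tokens fed to the Transformer encoder
```

Each of those 576 patches is flattened and linearly projected to the encoder's hidden size before the absolute position embeddings are added.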
Intended uses & limitations
This model is suitable for optical character recognition on single text-line images. You can search the model hub for fine-tuned versions tailored to specific tasks.
BibTeX entry and citation info
@misc{li2021trocr,
  title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
  author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
  year={2021},
  eprint={2109.10282},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}