trocr-large-printed Open Source OCR Model - Free Deployment for Precise Recognition of Single-line Printed Text

TrOCR Large Printed

Developed by Microsoft

Transformer-based optical character recognition model for single-line printed text recognition

Text Recognition

Transformers

#Printed Text OCR #Transformer Architecture #Single-line Text Recognition

Downloads 295.59k

Release Time: 3/2/2022

Model Overview

TrOCR uses an encoder-decoder architecture that pairs an image Transformer with a text Transformer and is designed for optical character recognition (OCR). This version is fine-tuned for printed text.

Model Features

Hybrid Architecture Design

Combines a vision Transformer encoder with a text Transformer decoder for end-to-end OCR

Pre-trained Weight Initialization

The image encoder is initialized from BEiT weights and the text decoder from RoBERTa weights

Printed Text Optimization

Fine-tuned for printed text recognition on the SROIE dataset

Model Capabilities

Printed text recognition

Single-line text image processing

End-to-end character recognition

Use Cases

Document Digitization

Receipt Recognition

Automatically recognize text in scanned receipts

The model was fine-tuned on SROIE, a dataset of scanned receipts

Form Processing

Extract text content from form documents

Industrial Applications

Product Label Recognition

Automatically read printed text on product labels

🚀 TrOCR (large-sized model, fine-tuned on SROIE)

TrOCR model fine-tuned on the SROIE dataset for optical character recognition.

🚀 Quick Start

The TrOCR model, fine-tuned on the SROIE dataset, was introduced in the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al. and first released in this repository.

Disclaimer: The team releasing TrOCR did not write a model card for this model, so this model card was written by the Hugging Face team.

✨ Features

The TrOCR model is an encoder-decoder model, consisting of an image Transformer as the encoder and a text Transformer as the decoder. The image encoder was initialized from the weights of BEiT, while the text decoder was initialized from the weights of RoBERTa.

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. Absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder. The Transformer text decoder then generates tokens autoregressively.
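The patch sequence described above can be illustrated with a small sketch. It assumes a 384x384 input, which is an assumption here and is not stated in this card, and it uses hypothetical helper names (`patch_grid`, `split_into_patches`). It works on plain nested lists rather than tensors and leaves out the learned linear embedding and the position embeddings, so it only shows how an image turns into a sequence.

```python
from typing import List, Tuple

PATCH_SIZE = 16  # patch resolution stated in the model card


def patch_grid(height: int, width: int, patch: int = PATCH_SIZE) -> Tuple[int, int, int]:
    """Return (rows, cols, sequence_length) for an image split into square patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    return rows, cols, rows * cols


def split_into_patches(image: List[List[int]], patch: int = PATCH_SIZE) -> List[List[int]]:
    """Flatten a 2-D grayscale image into a row-major sequence of flattened patches."""
    rows, cols, _ = patch_grid(len(image), len(image[0]), patch)
    sequence = []
    for r in range(rows):
        for c in range(cols):
            # Collect the patch's pixels row by row into one flat vector.
            flat = [
                image[r * patch + y][c * patch + x]
                for y in range(patch)
                for x in range(patch)
            ]
            sequence.append(flat)
    return sequence


# Assumed input size of 384x384 pixels (hypothetical; not given in this card).
rows, cols, seq_len = patch_grid(384, 384)
print(rows, cols, seq_len)  # 24 24 576
```

Under that assumption the encoder would see 576 patch vectors per image, each holding 16 x 16 = 256 values per channel before the linear embedding.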

📚 Documentation

Intended uses & limitations

You can use the raw model for optical character recognition (OCR) on single text-line images. See the model hub for versions fine-tuned on a task that interests you.
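Because the model expects one text line per image, a multi-line page has to be cut into line crops before recognition. The sketch below shows one simple way to find those lines, a horizontal projection profile over a binarized image. This technique is not part of TrOCR or the Transformers library, and `find_text_lines` is a hypothetical helper; real scans usually need deskewing and noise handling that this sketch omits.

```python
from typing import List, Tuple


def find_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of consecutive rows with ink.

    `binary` is a 2-D list where 1 marks an ink pixel and 0 marks background.
    """
    lines = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y  # first inked row of a new text line
        elif not has_ink and start is not None:
            lines.append((start, y))  # blank row closes the current line
            start = None
    if start is not None:
        # The last text line runs to the bottom edge of the page.
        lines.append((start, len(binary)))
    return lines


# Tiny hand-made page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 3), (4, 5)]
```

Each returned range could then be used to crop the page, for example with PIL's `Image.crop((0, top, width, bottom))`, and each crop passed to the processor as its own image.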

How to use

Here is how to use this model in PyTorch:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import requests

# Load a sample single-line image. This one comes from the IAM handwriting database;
# the model itself is meant for printed text, so use a printed text line in practice.
url = 'https://fki.tic.heia-fr.ch/static/img/a01-122-02-00.jpg'
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Load the processor (image preprocessing plus tokenizer) and the model
processor = TrOCRProcessor.from_pretrained('microsoft/trocr-large-printed')
model = VisionEncoderDecoderModel.from_pretrained('microsoft/trocr-large-printed')

# Convert the image into the pixel tensor the encoder expects
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token ids autoregressively, then decode them into a string
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)

BibTeX entry and citation info

@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, 
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
