Open-source Donut model powers uae-license-detection - Effortlessly process document images without OCR

UAE License Detection

Developed by codedrainer

Donut is an OCR-free document understanding Transformer that combines a visual encoder and a text decoder to process document images.

Image-to-Text

Transformers

Open Source License: MIT

#Document Image Understanding #OCR-free Text Extraction #Swin-BART Architecture

Downloads 21

Release Time: 7/22/2023

Model Overview

A document understanding model built on a Swin Transformer visual encoder and a BART text decoder. It generates text directly from document images, with no OCR preprocessing step.
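The encoder-decoder flow described above can be sketched with the Hugging Face Transformers classes `DonutProcessor` and `VisionEncoderDecoderModel`. This is a minimal sketch under stated assumptions, not the author's documented usage: the Hub ID `codedrainer/uae-license-detection` is inferred from the developer and model names on this page, the task prompt `<s_rvlcdip>` is the one used by the base RVL-CDIP Donut checkpoint and may differ for this fine-tune, and `load_and_run` and the image path passed to it are hypothetical.

```python
import re


def clean_generated_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Strip special tokens and the leading task-prompt tag from decoder output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt (e.g. "<s_rvlcdip>").
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def run_donut(image, processor, model, task_prompt: str = "<s_rvlcdip>") -> dict:
    """Generate structured output from a PIL image; no OCR step is involved."""
    import torch  # imported lazily so the helper above has no heavy dependencies

    # The Swin encoder consumes pixel values; the BART decoder starts from the prompt.
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
        )

    text = processor.batch_decode(output_ids)[0]
    cleaned = clean_generated_sequence(
        text, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    return processor.token2json(cleaned)


def load_and_run(model_id: str, image_path: str) -> dict:
    """Hypothetical convenience wrapper: load a checkpoint and process one image."""
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    return run_donut(image, processor, model)
```

Calling `load_and_run("codedrainer/uae-license-detection", "document.png")` would download the checkpoint and process one image, assuming that Hub ID resolves and the file exists. The prompt is left as a parameter because the correct start token depends on how the checkpoint was fine-tuned.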

Model Features

OCR-free Processing

Directly processes document images without traditional OCR preprocessing steps

End-to-End Training

Joint training of visual encoder and text decoder enables end-to-end document understanding

Multimodal Architecture

Combines Swin Transformer's visual processing capabilities with BART's text generation capabilities

Model Capabilities

Document Image Classification

Image-to-Text Conversion

Document Content Understanding

Use Cases

Document Processing

Document Classification

Automatically classify types of scanned documents (e.g., invoices, contracts)

Document Content Extraction

Extract structured text information from document images
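Donut-style decoders emit their result as a tagged token sequence such as `<s_class><invoice/></s_class>`, which then has to be turned into structured data. The sketch below is a simplified, flat-only stand-in for that conversion, written in plain Python for illustration. It is not the library's `token2json` implementation, it does not handle nested or repeated fields, and the function name `tags_to_dict` and the field names in the usage note are hypothetical.

```python
import re


def tags_to_dict(sequence: str) -> dict:
    """Convert flat '<s_key>value</s_key>' pairs into a dict (no nesting)."""
    result = {}
    # Each field is wrapped in a matching pair of <s_key> ... </s_key> tags.
    for key, value in re.findall(r"<s_(\w+)>(.*?)</s_\1>", sequence, flags=re.DOTALL):
        value = value.strip()
        # Categorical values are emitted as self-closing tags, e.g. "<invoice/>".
        match = re.fullmatch(r"<(.+?)/>", value)
        result[key] = match.group(1) if match else value
    return result
```

For a classification output, `tags_to_dict("<s_class><invoice/></s_class>")` yields `{"class": "invoice"}`. For an extraction-style output with made-up fields, `tags_to_dict("<s_name>JOHN DOE</s_name><s_license_no>1234567</s_license_no>")` yields `{"name": "JOHN DOE", "license_no": "1234567"}`. The fields this particular model actually produces are not stated on this page.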

Property	Details
Model Type	Donut (base-sized model, fine-tuned on RVL-CDIP)
Training Data	RVL-CDIP
