donut-base-finetuned-cord-v1-2560 Open-source Model - Convert Document Images to Text without OCR

Donut Base Finetuned Cord V1 2560

Developed by naver-clova-ix

Donut is an OCR-free document understanding Transformer that pairs a visual encoder with a text decoder to convert document images directly into text.

Image-to-Text

Transformers

License: MIT

Release date: 7/19/2022

Model Overview

Donut encodes the input image with a Swin Transformer and generates the output text with a BART decoder. The model is designed for document parsing tasks, and this checkpoint is fine-tuned on the CORD dataset.
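The encode-then-generate flow described above can be sketched with the Hugging Face transformers library. This is a minimal sketch, not the authors' reference code. It assumes the checkpoint is published on the Hub as naver-clova-ix/donut-base-finetuned-cord-v1-2560 and that its task prompt token is <s_cord>; both should be checked against the model repository and the tokenizer's added tokens. The names clean_sequence and parse_image are hypothetical helpers introduced for illustration.

```python
import re


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Drop EOS/PAD tokens and the leading task-prompt tag from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, which is the task prompt.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_image(
    image_path: str,
    task_prompt: str = "<s_cord>",  # assumption: verify against the tokenizer
    model_id: str = "naver-clova-ix/donut-base-finetuned-cord-v1-2560",  # assumed Hub ID
):
    """Run the Swin encoder and BART decoder on one image and return parsed fields."""
    # Imported lazily so clean_sequence works without the heavy dependencies.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # The image goes straight to the encoder; no OCR step is involved.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The decoder is primed with the task prompt and generates the rest.
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    output_ids = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
    )
    decoded = processor.batch_decode(output_ids)[0]
    cleaned = clean_sequence(
        decoded, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json turns the tagged sequence into a nested dict.
    return processor.token2json(cleaned)
```

Calling parse_image("receipt.png") downloads the weights on first use, so it needs network access plus the transformers, torch, and Pillow packages; the file name here is a placeholder.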

Model Features

OCR-Free Document Understanding

Processes image inputs directly without traditional OCR preprocessing steps.

End-to-End Training

Joint training of visual encoder and text decoder enables direct image-to-text conversion.

Efficient Architecture

Combines Swin Transformer's efficient image encoding with BART's powerful text generation capabilities.

Model Capabilities

Document Image Understanding

Image-to-Text Conversion

Structured Information Extraction

Use Cases

Document Processing

Receipt Parsing

Extracts structured fields such as merchant name, amount, and date from receipt images.
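To make "structured information" concrete: the decoder emits a tagged token sequence that is then converted into nested fields. The sketch below is a simplified, hypothetical stand-in for that conversion, assuming fields are wrapped as <s_key>...</s_key> and repeated items are separated by <sep/>. The field names in the usage example are illustrative only. It does not handle a key nested inside a same-named key, or lists nested inside list items; for real outputs, DonutProcessor.token2json in transformers is the safer choice.

```python
import re

# Matches an opening field tag such as <s_total>; closing tags start with "</".
_OPEN_TAG = re.compile(r"<s_([^>]+)>")


def _parse_piece(piece: str):
    """Parse one fragment: a nested group if it has tags, else plain text."""
    return tags_to_dict(piece) if "<s_" in piece else piece.strip()


def tags_to_dict(seq: str) -> dict:
    """Convert '<s_key>value</s_key>' markup into a nested dict (simplified)."""
    result = {}
    pos = 0
    while True:
        match = _OPEN_TAG.search(seq, pos)
        if match is None:
            break
        key = match.group(1)
        close = f"</s_{key}>"
        end = seq.find(close, match.end())
        if end == -1:
            # Unclosed tag: skip it rather than guess its extent.
            pos = match.end()
            continue
        inner = seq[match.end():end]
        if "<sep/>" in inner:
            # Repeated items become a list.
            result[key] = [_parse_piece(p) for p in inner.split("<sep/>")]
        else:
            result[key] = _parse_piece(inner)
        pos = end + len(close)
    return result


example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu>"
    "<s_total><s_total_price>7.50</s_total_price></s_total>"
)
parsed = tags_to_dict(example)
print(parsed["total"]["total_price"])  # prints 7.50
```

Keeping values as strings avoids guessing at currency or number formats, which vary between receipts.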

Performs excellently on the CORD dataset.

Form Recognition

Automatically identifies and extracts fields and content from forms.

🚀 Donut (base-sized model, fine-tuned on CORD)

🚀 Quick Start

✨ Features

Model Description

Intended Uses & Limitations

BibTeX Entry and Citation Info

📄 License