Donut-base-finetuned-cord-v2 Open-source Model - Extract Text Information Directly from Images without OCR

Donut Base Finetuned Cord V2

Developed by naver-clova-ix

Donut is an OCR-free document understanding Transformer model composed of a visual encoder (Swin Transformer) and a text decoder (BART), capable of directly extracting text information from images.

Image-to-Text

Transformers

Open Source License: MIT #OCR-free Document Parsing #Swin-BART Architecture #Visual Text Generation

Downloads 21.63k

Release Time: 7/19/2022

Model Overview

This checkpoint is the Donut base model fine-tuned on CORD, a receipt dataset, for document parsing: it converts the content of a document image into structured text.

Model Features

OCR-Free Document Understanding

Processes image inputs directly without traditional OCR preprocessing steps.
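As a rough illustration of what "processing the image directly" looks like in practice, the sketch below runs the checkpoint through the Hugging Face Transformers classes `DonutProcessor` and `VisionEncoderDecoderModel`. Several things here are assumptions rather than facts stated on this page: the hub id `naver-clova-ix/donut-base-finetuned-cord-v2`, the task prompt `<s_cord-v2>`, and the function names `parse_receipt` and `strip_task_prompt`, which are hypothetical helpers introduced only for this example. It also assumes `torch`, `transformers`, and `Pillow` are installed.

```python
import re


def strip_task_prompt(sequence: str, eos_token: str = "</s>", pad_token: str = "<pad>") -> str:
    """Drop EOS/PAD tokens and the first tag (the task prompt) from a decoded sequence."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(image_path: str,
                  checkpoint: str = "naver-clova-ix/donut-base-finetuned-cord-v2") -> dict:
    """Hypothetical helper: image file in, nested dict out, with no OCR step in between."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    # The processor resizes and normalizes the raw image into a pixel tensor.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

    # Assumed task prompt for this checkpoint; it tells the decoder which task to run.
    task_prompt = "<s_cord-v2>"
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # Decode token ids to a tagged string, then convert the tags into a dict.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = strip_task_prompt(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    return processor.token2json(sequence)
```

Calling `parse_receipt("receipt.png")` (the file name is a placeholder) downloads the weights on first use, so expect it to be slow the first time. Loading the model once and reusing it across images would be the more sensible design for batch processing.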

End-to-End Training

The visual encoder and text decoder are trained jointly, so the whole pipeline is optimized for the final text output rather than stage by stage.

Transformer-Based Architecture

Uses a Swin Transformer to encode the image into a sequence of embeddings and a BART decoder to generate text autoregressively, conditioned on those embeddings.

Model Capabilities

Document Image Understanding

Image-to-Text Conversion

Structured Document Parsing

Use Cases

Document Processing

Receipt Parsing

Extracts structured information such as merchant name, item list, prices, etc. from receipt images.

Performs well on the CORD dataset.
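To make "structured information" concrete: the decoder emits a flat string of tags such as `<s_key>value</s_key>`, with `<sep/>` between repeated items, and that string is then turned into nested data. The sketch below is a simplified, pure-Python illustration of that conversion, not the library's own implementation. The function names and the sample field names (`menu`, `nm`, `price`, `total`, `total_price`) are assumptions chosen for the example, and the parser assumes well-formed, balanced tags.

```python
import re

# Matches an opening tag, a closing tag, or an item separator.
TOKEN = re.compile(r"<s_([^>]+)>|</s_([^>]+)>|<sep/>")


def split_top_level(text: str) -> list:
    """Split on <sep/> tokens that are not nested inside any <s_...> tag."""
    parts, depth, start = [], 0, 0
    for m in TOKEN.finditer(text):
        if m.group(1):          # opening tag
            depth += 1
        elif m.group(2):        # closing tag
            depth -= 1
        elif depth == 0:        # <sep/> at the top level
            parts.append(text[start:m.start()])
            start = m.end()
    parts.append(text[start:])
    return parts


def parse_group(text: str):
    """Parse sibling fields into a dict; text without tags is returned as a leaf string."""
    fields, depth, key, body_start = {}, 0, None, 0
    for m in TOKEN.finditer(text):
        if m.group(1):
            if depth == 0:
                key, body_start = m.group(1), m.end()
            depth += 1
        elif m.group(2):
            depth -= 1
            if depth == 0 and m.group(2) == key:
                # A repeated key at the same level overwrites the earlier value.
                fields[key] = parse_value(text[body_start:m.start()])
    return fields if fields else text.strip()


def parse_value(text: str):
    """Return a single value, or a list when <sep/> separates several items."""
    parts = [parse_group(p) for p in split_top_level(text)]
    return parts[0] if len(parts) == 1 else parts


# Illustrative sequence (made up for this example, not real model output).
example = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu>"
    "<s_total><s_total_price>7.50</s_total_price></s_total>"
)
parsed = parse_value(example)
print(parsed)
```

Here the two menu entries become a list of dicts under `menu`, while `total` stays a single nested dict. All values remain strings, so converting prices to numbers is left to the caller.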

Table Recognition

Converts tables in images into editable text formats.
