Donut-base-finetuned-docvqa Open-source Model - Extract and Understand Text Information Directly from Images Without OCR

Donut Base Finetuned Docvqa

Developed by naver-clova-ix

Donut is an OCR-free document understanding Transformer model, fine-tuned on the DocVQA dataset, capable of directly extracting and comprehending text information from images.

Image-to-Text

Transformers

Open Source License: MIT

Tags: Document Visual Question Answering, OCR-free Text Extraction, Swin-BART Architecture

Downloads 167.80k

Release Time: 7/19/2022

Model Overview

This model consists of a visual encoder (Swin Transformer) and a text decoder (BART), enabling direct text generation from document images without traditional OCR preprocessing steps.
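The encoder-decoder flow described above can be sketched with the Hugging Face Transformers library. This is a minimal sketch rather than the authors' reference code: it assumes the checkpoint is published on the Hub as `naver-clova-ix/donut-base-finetuned-docvqa`, that `torch`, `transformers`, and `Pillow` are installed, and that the question is wrapped in the `<s_docvqa>` task prompt used in the Transformers documentation example for this model. The helper names `build_docvqa_prompt` and `answer_question` are illustrative.

```python
import re

# Assumed Hub identifier for this checkpoint.
MODEL_ID = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_docvqa_prompt(question: str) -> str:
    """Wrap a question in the task prompt that the decoder is primed with."""
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def answer_question(image, question: str) -> dict:
    """Run one question against one PIL image and return the parsed output."""
    # Imported here so the prompt helper works without the heavy dependencies.
    import torch
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    # Swin encoder input: the document image as a pixel tensor.
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # BART decoder input: the task prompt, tokenized without extra special tokens.
    decoder_input_ids = processor.tokenizer(
        build_docvqa_prompt(question),
        add_special_tokens=False,
        return_tensors="pt",
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    # Strip padding/EOS and the leading task token, then parse the tagged text.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

A call might look like `answer_question(Image.open("invoice.png").convert("RGB"), "What is the total amount?")`, where `invoice.png` is a placeholder path. For repeated queries, loading the processor and model once outside the function would avoid reloading the weights on every call.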

Model Features

OCR-free Processing

Processes document images directly, so recognition mistakes from a separate OCR stage cannot propagate into later steps as they do in traditional OCR pipelines.

End-to-End Training

The visual encoder and the text decoder are trained jointly, which enables direct image-to-text conversion.

Document Comprehension

Optimized for document images, capable of understanding structured content like invoices and contracts

Model Capabilities

Document Image Understanding

Visual Question Answering

Text Information Extraction

Image-to-Text Conversion

Use Cases

Document Processing

Invoice Information Extraction

Extracts key information like numbers and amounts from invoice images

Accurately identifies specific fields in structured documents

Contract Clause Query

Answers specific questions about contract document content

Capable of understanding key clauses in contract documents
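Both use cases above come down to asking the model one question per field or clause. The sketch below shows one way to organize that. It assumes an `ask` callable that takes a question string and returns the answer string for a fixed document image (for example, a thin wrapper around a Donut inference call). The field names, question wordings, and the `extract_fields` helper are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, Optional

# Hypothetical field-to-question mappings; adjust the wording to the documents at hand.
INVOICE_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "total_amount": "What is the total amount?",
    "invoice_date": "What is the invoice date?",
}

CONTRACT_QUESTIONS: Dict[str, str] = {
    "termination_notice": "What is the notice period for termination?",
    "governing_law": "Which law governs this contract?",
}


def extract_fields(
    ask: Callable[[str], str],
    questions: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Ask one question per field and collect the answers.

    `ask` is any callable that answers a question about a single document
    image. Empty answers are stored as None so callers can tell a missing
    value apart from a real one.
    """
    answers: Dict[str, Optional[str]] = {}
    for field, question in questions.items():
        answer = ask(question)
        answers[field] = answer.strip() if answer else None
    return answers


if __name__ == "__main__":
    # Stub standing in for the model, so the flow can be tried without weights.
    def fake_ask(question: str) -> str:
        return "" if "date" in question else "example answer"

    print(extract_fields(fake_ask, INVOICE_QUESTIONS))
```

Keeping the questions in a plain mapping makes it easy to review and tune the wording per document type without touching the inference code.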
