donut-base-finetuned-rvlcdip Open-source Document Understanding Model

Donut Base Finetuned Rvlcdip

Developed by naver-clova-ix

Donut is an OCR-free document understanding Transformer model that combines a visual encoder and text decoder to process document images.

Image-to-Text

Transformers

Open Source License: MIT

Tags: OCR-free Document Understanding, Swin-BART Architecture, Document Image Classification

Downloads 125.36k

Release Time: 7/19/2022

Model Overview

Donut pairs a Swin Transformer visual encoder with a BART text decoder and generates text directly from document images, with no separate OCR step. This version is fine-tuned on the RVL-CDIP dataset for document classification.
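As a rough illustration of how the encoder-decoder pair can be used for classification, the sketch below loads the checkpoint through the Hugging Face transformers library and turns the generated token sequence into a class label. Several details are assumptions rather than facts stated on this page: the Hub id naver-clova-ix/donut-base-finetuned-rvlcdip, the task start token <s_rvlcdip>, and the shape of the decoded output (a label wrapped as <s_class><label/></s_class>). The helper names parse_class and classify are hypothetical.

```python
import re
from typing import Optional

MODEL_ID = "naver-clova-ix/donut-base-finetuned-rvlcdip"  # assumed Hub id
TASK_PROMPT = "<s_rvlcdip>"  # assumed task start token for this checkpoint


def parse_class(sequence: str) -> Optional[str]:
    """Extract the label from a decoded sequence.

    Assumes output shaped like '<s_rvlcdip><s_class><letter/></s_class></s>'.
    Returns None when no <s_class> field is present.
    """
    match = re.search(r"<s_class>(.*?)</s_class>", sequence, flags=re.DOTALL)
    if match is None:
        return None
    label = match.group(1).strip()
    # Category values are assumed to be emitted as self-closing tokens.
    if label.startswith("<") and label.endswith("/>"):
        label = label[1:-2]
    return label.strip()


def classify(image_path: str) -> Optional[str]:
    """Run the checkpoint on one image file and return the parsed label."""
    # Heavy imports stay local so parse_class can be used on its own.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The decoder is primed with the task token instead of OCR output.
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )

    sequence = processor.batch_decode(output_ids)[0]
    return parse_class(sequence)
```

Calling classify("scan.png") (a hypothetical file name) downloads the weights on first use. The parsing step is kept separate so it can be checked without the model.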

Model Features

OCR-free Document Understanding

Processes image inputs directly without traditional OCR preprocessing steps.

End-to-End Training

Joint training of visual encoder and text decoder for end-to-end document understanding.

Swin Transformer Architecture

Uses a Swin Transformer as the visual encoder, which handles high-resolution images efficiently.

Model Capabilities

Document Image Classification

Image-to-Text Conversion

Visual Document Understanding

Use Cases

Document Processing

Document Classification

Automatically classify scanned documents by type (e.g., invoices, letters, resumes).

Performs well on RVL-CDIP, the 16-class scanned-document benchmark it was fine-tuned on.
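Because RVL-CDIP defines a fixed set of 16 document categories, a downstream pipeline can check each predicted label against that set before acting on it. The sketch below is one possible way to do this. The exact spelling of the labels emitted by the checkpoint (for example, underscores versus spaces) is an assumption, and the names normalize_label, route, and the "needs_review" fallback are hypothetical.

```python
# The 16 RVL-CDIP document categories.
RVL_CDIP_CLASSES = (
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
)


def normalize_label(raw):
    """Map a raw predicted label to a canonical class name, or None.

    Lowercases the label and treats underscores and hyphens as spaces,
    since the checkpoint's exact label spelling is assumed, not known.
    """
    cleaned = raw.strip().lower().replace("_", " ").replace("-", " ")
    cleaned = " ".join(cleaned.split())  # collapse repeated whitespace
    return cleaned if cleaned in RVL_CDIP_CLASSES else None


def route(raw, default="needs_review"):
    """Return the canonical label, or a fallback bucket for unknown output."""
    label = normalize_label(raw)
    return label if label is not None else default
```

Routing unknown labels to a review bucket, rather than raising an error, keeps a batch job running when the model produces an unexpected or malformed sequence.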

Document Information Extraction

Extract key information from structured documents. Other Donut checkpoints target this task; this one is fine-tuned for classification.
