
Donut Base Finetuned Rvlcdip

Developed by naver-clova-ix
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and text decoder to process document images.
Downloads 125.36k
Release Time: 7/19/2022

Model Overview

Donut consists of a Swin Transformer visual encoder and a BART text decoder, so it can generate text directly from a document image without a separate OCR step. This checkpoint is fine-tuned for document image classification on RVL-CDIP, a dataset of scanned documents labeled with 16 classes.
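Because the decoder produces text, classification is framed as generation: decoding starts from a task prompt and the label comes out as a short token sequence. The sketch below turns that output into a plain label. It assumes the task prompt `<s_rvlcdip>` and the output shape used in the Hugging Face Donut examples (for example `<s_class><invoice/></s_class>`). The function `parse_class` is hypothetical; in the Hugging Face library, `DonutProcessor.token2json` performs this conversion.

```python
import re


def parse_class(generated: str, eos_token: str = "</s>", pad_token: str = "<pad>") -> str:
    """Extract the predicted class from a decoded Donut sequence (hypothetical helper).

    Assumes output shaped like '<s_rvlcdip><s_class><invoice/></s_class></s>'.
    """
    # Drop end-of-sequence and padding tokens.
    seq = generated.replace(eos_token, "").replace(pad_token, "")
    # Remove the leading task-start token (the first <...> tag).
    seq = re.sub(r"<.*?>", "", seq, count=1).strip()
    # The label is written as <label/> inside <s_class>...</s_class>.
    match = re.search(r"<s_class>\s*<([^<>/]+)/>\s*</s_class>", seq)
    if match is None:
        raise ValueError(f"no class found in {generated!r}")
    return match.group(1)


print(parse_class("<s_rvlcdip><s_class><invoice/></s_class></s>"))  # → invoice
```

Raising on a malformed sequence, rather than returning an empty string, makes generation failures visible instead of silently producing a bogus label.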

Model Features

OCR-free Document Understanding
Processes image inputs directly without traditional OCR preprocessing steps.
End-to-End Training
Joint training of visual encoder and text decoder for end-to-end document understanding.
Swin Transformer Architecture
Uses a Swin Transformer as the visual encoder; its window-based attention keeps the cost of processing high-resolution document images manageable.
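The efficiency claim can be made concrete with the complexity analysis from the Swin Transformer paper. Global self-attention over an h × w token grid with C channels costs about 4hwC² + 2(hw)²C. Window attention with M × M windows costs about 4hwC² + 2M²hwC, so the attention term grows linearly in the number of tokens rather than quadratically. The resolution, patch size, channel count, and window size below are illustrative assumptions, not this checkpoint's configuration.

```python
def global_msa_flops(h: int, w: int, c: int) -> int:
    """Approximate cost of global multi-head self-attention: 4hwC^2 + 2(hw)^2 C."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h: int, w: int, c: int, m: int) -> int:
    """Approximate cost of attention within m x m windows: 4hwC^2 + 2 m^2 hwC."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Illustrative numbers only (assumptions): a 2560 x 1920 page split into
# 4 x 4 patches, 128 channels, and 7 x 7 attention windows.
h, w, c, m = 2560 // 4, 1920 // 4, 128, 7
ratio = global_msa_flops(h, w, c) / window_msa_flops(h, w, c, m)
print(f"global attention costs ~{ratio:.0f}x more than window attention here")
```

When the whole grid fits in a single window (hw = M²), the two formulas coincide. The gap only opens as page resolution grows, which is the document-image case Donut targets.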

Model Capabilities

Document Image Classification
Image-to-Text Conversion
Visual Document Understanding

Use Cases

Document Processing
Document Classification
Automatically classify scanned documents by type (e.g., invoices, letters, forms).
Fine-tuned on RVL-CDIP, where it performs well as a document classifier.
Document Information Extraction
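Extract key information from structured documents. This requires a Donut checkpoint fine-tuned for extraction; the RVL-CDIP checkpoint outputs only a class label.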
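In a document-processing pipeline, the predicted class is typically used to route each scan to the right downstream step. The sketch below uses the 16 RVL-CDIP class names. The exact spelling the model emits (for example, underscores) is an assumption, and the `ROUTES` mapping and `route` function are hypothetical.

```python
# The 16 RVL-CDIP classes; the exact strings the model emits are an assumption.
RVL_CDIP_CLASSES = frozenset({
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific_report", "scientific_publication", "specification",
    "file_folder", "news_article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
})

# Hypothetical mapping from document class to a downstream queue.
ROUTES = {
    "invoice": "accounts_payable",
    "budget": "finance",
    "resume": "hr",
    "email": "correspondence",
    "letter": "correspondence",
    "memo": "correspondence",
}


def route(label: str, default: str = "general", review: str = "manual_review") -> str:
    """Pick a queue for a predicted label (hypothetical helper).

    A generative decoder can emit a string outside the label set, so
    unrecognized labels go to manual review instead of raising.
    """
    label = label.strip().lower()
    if label not in RVL_CDIP_CLASSES:
        return review
    return ROUTES.get(label, default)


for predicted in ["invoice", "scientific_report", "contract"]:
    print(predicted, "->", route(predicted))
# invoice -> accounts_payable
# scientific_report -> general
# contract -> manual_review
```

Checking against the closed label set matters because the classifier is a text generator. Unlike a softmax head, nothing in the architecture guarantees that the output is one of the 16 classes.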