
Donut Proto

Developed by naver-clova-ix
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and a text decoder for image-to-text conversion.
Downloads: 30
Release time: 7/19/2022

Model Overview

The Donut model consists of a Swin Transformer visual encoder and a BART text decoder. The encoder converts an input image into a tensor of embeddings, and the decoder then generates text autoregressively, conditioned on those embeddings. The model is designed specifically for document understanding tasks.
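As a rough illustration of the encode-then-generate flow, the following plain-Python sketch runs a greedy autoregressive decoding loop. It is not Donut's code: `encode_image` and `next_token_logits` are hypothetical stand-ins for the Swin encoder and BART decoder, and the tiny vocabulary and the `<s_task>` prompt token are invented for the example.

```python
from typing import Callable, List, Sequence

# Hypothetical toy vocabulary; a real Donut tokenizer is far larger.
VOCAB = ["<s_task>", "<s_title>", "Invoice", "</s_title>", "</s>"]
END_TOKEN = "</s>"


def encode_image(pixels: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for the visual encoder: one mean value per pixel row."""
    return [sum(row) / len(row) for row in pixels]


def next_token_logits(embeddings: List[float], prefix: List[str]) -> List[float]:
    """Hypothetical stand-in for the text decoder: one score per vocabulary entry.

    This toy version ignores the embeddings and simply favors the vocabulary
    entry right after the last generated token, so decoding walks VOCAB in order.
    """
    last = VOCAB.index(prefix[-1])
    return [1.0 if i == last + 1 else 0.0 for i in range(len(VOCAB))]


def greedy_decode(
    pixels: Sequence[Sequence[float]],
    prompt: str,
    logits_fn: Callable[[List[float], List[str]], List[float]] = next_token_logits,
    max_length: int = 16,
) -> List[str]:
    embeddings = encode_image(pixels)  # the image is encoded once and reused
    tokens = [prompt]                  # decoding starts from a task prompt token
    while len(tokens) < max_length:
        scores = logits_fn(embeddings, tokens)
        best = max(range(len(scores)), key=scores.__getitem__)  # greedy argmax
        tokens.append(VOCAB[best])
        if VOCAB[best] == END_TOKEN:   # stop at the end-of-sequence token
            break
    return tokens
```

Calling `greedy_decode([[0.0, 1.0], [1.0, 1.0]], "<s_task>")` walks the toy vocabulary and stops at `</s>`. A real decoder conditions each step on both the image embeddings and every token generated so far, and may use beam search rather than greedy argmax.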

Model Features

OCR-free Processing
Processes document images directly, avoiding the accumulation of OCR errors that affects traditional OCR-based pipelines.
End-to-End Training
The visual encoder and text decoder are trained jointly, so the model maps images directly to text.
Document Understanding Capability
Optimized specifically for document images, so that it captures both document structure and content.
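End-to-end training means one text-generation loss updates both halves of the model. A minimal sketch of such a token-level objective, assuming teacher forcing and per-step probability dictionaries (a simplification; real implementations operate on logit tensors), is the mean negative log-likelihood of the target tokens:

```python
import math
from typing import Dict, List


def sequence_cross_entropy(step_probs: List[Dict[str, float]], target: List[str]) -> float:
    """Mean negative log-likelihood of the target tokens under teacher forcing.

    step_probs[t] is the decoder's distribution over the vocabulary at step t,
    computed with the ground-truth tokens target[:t] as the prefix.
    """
    if not target:
        raise ValueError("target sequence must not be empty")
    if len(step_probs) != len(target):
        raise ValueError("need exactly one distribution per target token")
    nll = 0.0
    for probs, token in zip(step_probs, target):
        nll -= math.log(probs[token])  # penalize low probability on the true token
    return nll / len(target)
```

Because the decoder's distributions are computed from the encoder's embeddings, gradients of this single loss also reach the visual encoder, which is what allows the two components to be trained jointly.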

Model Capabilities

Document Image Processing
Image-to-Text Conversion
Document Structure Understanding
Vision-Language Joint Modeling

Use Cases

Document Processing
Document Image Classification
Automatically identifies and classifies different types of document images.
Document Parsing
Extracts structured information from document images.
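For parsing and classification, fine-tuned Donut models typically emit structured output as nested tag tokens such as `<s_key>value</s_key>`, which are then converted to JSON. The Hugging Face `DonutProcessor` provides a `token2json` method for this; the following is a simplified, hypothetical re-implementation that only shows the idea. The tag names (`menu`, `nm`, `price`, `class`) are illustrative assumptions, not this checkpoint's actual output schema.

```python
import re
from typing import Dict, Union

Parsed = Union[str, Dict[str, "Parsed"]]

# Matches an opening tag such as <s_menu>; closing tags (</s_menu>) do not match.
_OPEN = re.compile(r"<s_([^>]+)>")


def tokens_to_json(seq: str) -> Parsed:
    """Convert nested <s_key>...</s_key> markup into dicts (simplified sketch).

    Leaf values are returned as stripped strings. Unmatched opening tags are
    skipped. Lists and same-name nested tags are not supported.
    """
    seq = seq.strip()
    result: Dict[str, Parsed] = {}
    while True:
        match = _OPEN.search(seq)
        if match is None:
            break
        key = match.group(1)
        close = f"</s_{key}>"
        end = seq.find(close, match.end())
        if end == -1:  # no closing tag: drop the opening tag and keep scanning
            seq = seq[match.end():]
            continue
        result[key] = tokens_to_json(seq[match.end():end])  # recurse into the body
        seq = seq[end + len(close):]
    return result if result else seq
```

A parsing output such as `<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu>` becomes `{"menu": {"nm": "Latte", "price": "4.50"}}`, and a classification output is handled the same way: `<s_class>invoice</s_class>` becomes `{"class": "invoice"}`.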