
Donut Base Finetuned Cord V1 2560

Developed by naver-clova-ix
Donut is an OCR-free document-understanding Transformer that pairs a visual encoder with a text decoder to convert document images directly into text.
Downloads 30
Release Time: 7/19/2022

Model Overview

The Donut model encodes images with a Swin Transformer and generates text with a BART decoder. It is designed for document parsing tasks, and this checkpoint is fine-tuned on the CORD dataset.
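The encoder-decoder flow can be sketched as a greedy autoregressive loop: the encoder turns the image into features once, and the decoder repeatedly picks the highest-scoring next token given those features and the tokens generated so far. This is a toy illustration, not Donut's code. The scoring function `toy_scores`, its fixed `SCRIPT`, and the tag names are hypothetical stand-ins for the BART decoder. Donut checkpoints on the Hugging Face Hub are typically loaded with the transformers `VisionEncoderDecoderModel` class, whose `generate` method runs this kind of loop.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical "ground truth" the toy decoder reproduces token by token.
SCRIPT = ["<s_total>", "12.00", "</s_total>", "</s>"]


def toy_scores(encoder_states: Sequence[float], prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a real decoder: gives 1.0 to the next script token.

    A real decoder would attend to encoder_states (the image features);
    this toy ignores them and only counts the tokens generated so far.
    """
    step = len(prefix) - 1  # prefix always starts with the start token
    target = SCRIPT[step] if step < len(SCRIPT) else "</s>"
    return {tok: (1.0 if tok == target else 0.0) for tok in SCRIPT}


def greedy_decode(
    encoder_states: Sequence[float],
    score_fn: Callable[[Sequence[float], List[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    max_len: int = 32,
) -> List[str]:
    """Greedy autoregressive decoding conditioned on fixed encoder output."""
    tokens = [start_token]
    for _ in range(max_len):
        scores = score_fn(encoder_states, tokens)
        best = max(scores, key=scores.get)  # highest-scoring next token
        tokens.append(best)
        if best == end_token:
            break
    return tokens


# Pretend these floats are the image features computed once by the encoder.
features = [0.1, 0.2, 0.3]
print(greedy_decode(features, toy_scores, "<s>", "</s>"))
# ['<s>', '<s_total>', '12.00', '</s_total>', '</s>']
```

The encoder output stays fixed while only the token prefix grows, which is why a real implementation encodes the image once and reuses those features at every decoding step.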

Model Features

OCR-Free Document Understanding
Processes image inputs directly without traditional OCR preprocessing steps.
End-to-End Training
Joint training of the visual encoder and text decoder enables direct image-to-text conversion.
Efficient Architecture
Combines the Swin Transformer's efficient image encoding with BART's text generation capabilities.
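Much of the Swin Transformer's efficiency comes from computing self-attention inside fixed-size local windows instead of across the whole image, so cost grows linearly rather than quadratically with the number of patch tokens. The sketch below counts query-key pairs for both cases in the first encoder stage. The defaults (patch size 4, window size 7) are common Swin settings used here as assumptions; this checkpoint's own configuration may differ.

```python
from typing import Tuple


def attention_pair_counts(
    height: int, width: int, patch_size: int = 4, window_size: int = 7
) -> Tuple[int, int, int]:
    """Count query-key pairs for global vs. windowed self-attention.

    Models the first stage of a Swin-style encoder: the image is split into
    non-overlapping patch_size x patch_size patches (one token each), and
    windowed attention restricts each token to its window_size x window_size
    window. Real Swin pads inputs that do not divide evenly; this sketch
    simply rejects them.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    if rows % window_size or cols % window_size:
        raise ValueError("token grid must be divisible by window_size")
    n_tokens = rows * cols
    global_pairs = n_tokens * n_tokens  # every token attends to every token
    windowed_pairs = n_tokens * window_size * window_size  # only within its window
    return n_tokens, global_pairs, windowed_pairs


n, full, windowed = attention_pair_counts(224, 224)
print(n, full, windowed, full // windowed)  # 3136 9834496 153664 64
```

At 224 x 224 the windowed variant evaluates 64 times fewer pairs, and because the ratio equals the token count divided by the window area, the gap widens as resolution grows, which matters for high-resolution document images.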

Model Capabilities

Document Image Understanding
Image-to-Text Conversion
Structured Information Extraction
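In Donut, structured output is produced by having the decoder emit fields as paired tags rather than free text, so the generated sequence can be converted back into a nested record. Below is a simplified parser for sequences of the form `<s_field>value</s_field>`. It assumes that tag convention, and the CORD-style field names in the example (`menu`, `nm`, `price`, `total_price`) are illustrative. The transformers library ships its own converter, `DonutProcessor.token2json`, which handles details this sketch ignores; the `text` key used for untagged output is hypothetical.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Matches opening <s_key> and closing </s_key> field tags.
TAG = re.compile(r"<(/?)s_([^>]+)>")


def _parse(seq: str, pos: int, closing: Optional[str]) -> Tuple[Any, int]:
    """Parse from pos until the closing tag for `closing` (or end of input)."""
    node: Dict[str, Any] = {}
    text: List[str] = []
    while True:
        m = TAG.search(seq, pos)
        if m is None:
            if closing is not None:
                raise ValueError(f"missing </s_{closing}>")
            text.append(seq[pos:])
            pos = len(seq)
            break
        text.append(seq[pos:m.start()])
        pos = m.end()
        key = m.group(2)
        if m.group(1):  # a closing tag must match the field we are inside
            if key != closing:
                raise ValueError(f"unexpected </s_{key}>")
            break
        value, pos = _parse(seq, pos, key)
        if key in node:  # repeated sibling field -> collect into a list
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    # Fields with nested tags become dicts; leaf fields become stripped text.
    return (node if node else "".join(text).strip()), pos


def parse_tagged(seq: str) -> Dict[str, Any]:
    """Convert a Donut-style tagged sequence into nested Python dicts."""
    value, _ = _parse(seq, 0, None)
    return value if isinstance(value, dict) else {"text": value}  # hypothetical key


receipt = parse_tagged(
    "<s_menu><s_nm>Latte</s_nm><s_price>4,500</s_price></s_menu>"
    "<s_menu><s_nm>Bagel</s_nm><s_price>3,000</s_price></s_menu>"
    "<s_total><s_total_price>7,500</s_total_price></s_total>"
)
print(receipt)
```

Repeated sibling fields such as the two `menu` entries are gathered into a list, which is how a sequence of flat tags maps onto line items in the resulting record.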

Use Cases

Document Processing
Receipt Parsing
Extracts structured information, such as merchant name, amounts, and date, from receipt images.
Performs excellently on the CORD receipt dataset, on which it was fine-tuned.
Form Recognition
Automatically identifies and extracts fields and content from forms.
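For receipt parsing, the extracted fields are still strings, so a downstream step typically normalizes amounts and can cross-check them. The sketch below assumes a parsed result shaped like CORD-style output (a `menu` list of items with `price`, plus `total.total_price`); that schema, the helper names `to_amount` and `check_receipt`, and the whole-number amount format are assumptions for illustration. A mismatch does not necessarily mean a misread, since real receipts may include tax, service charges, or discounts.

```python
from typing import Any, Dict


def to_amount(text: str) -> int:
    """Parse a whole-number amount such as '4,500' by keeping only its digits.

    Assumes amounts have no decimal part; '12.50' would become 1250.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits in {text!r}")
    return int(digits)


def check_receipt(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Sum item prices and compare them with the stated total."""
    items = parsed.get("menu", [])
    if isinstance(items, dict):  # a single item may not be wrapped in a list
        items = [items]
    line_total = sum(to_amount(item["price"]) for item in items)
    stated_total = to_amount(parsed["total"]["total_price"])
    return {
        "item_count": len(items),
        "line_total": line_total,
        "stated_total": stated_total,
        "consistent": line_total == stated_total,
    }


example = {
    "menu": [{"nm": "Latte", "price": "4,500"}, {"nm": "Bagel", "price": "3,000"}],
    "total": {"total_price": "7,500"},
}
print(check_receipt(example))
# {'item_count': 2, 'line_total': 7500, 'stated_total': 7500, 'consistent': True}
```

Flagging inconsistent receipts for review, rather than rejecting them outright, keeps the check useful even when a receipt legitimately carries extra charges.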
© 2025 AIbase