
Donut Base Finetuned Cord V2

Developed by naver-clova-ix
Donut is an OCR-free document understanding Transformer model. It is composed of a visual encoder (Swin Transformer) and a text decoder (BART), and it extracts text information directly from images.
Downloads 21.63k
Release Time: 7/19/2022

Model Overview

This model is Donut fine-tuned on the CORD receipt dataset for document parsing. It converts the content of a document image directly into structured text.
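The following is a minimal inference sketch. It assumes the Hugging Face `transformers` library (with `torch` and Pillow installed) and that this checkpoint is available on the Hub as `naver-clova-ix/donut-base-finetuned-cord-v2`. The `<s_cord-v2>` task prompt and the decoding settings follow the pattern of the Hugging Face Donut examples. The helper names `strip_special_tokens` and `parse_receipt` are hypothetical.

```python
import re


def strip_special_tokens(sequence: str, eos_token: str, pad_token: str) -> str:
    """Remove EOS/PAD tokens and the leading task-prompt tag from decoded output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Drop only the first tag, which is the task start token (e.g. <s_cord-v2>).
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(
    image_path: str,
    checkpoint: str = "naver-clova-ix/donut-base-finetuned-cord-v2",  # assumed Hub ID
) -> dict:
    """Run the fine-tuned Donut model on one image and return the parsed fields."""
    # Heavy dependencies are imported lazily so the helper above works on its own.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # The image goes straight to the Swin encoder: no OCR step.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

    # The decoder is primed with a task prompt instead of OCR text.
    decoder_input_ids = processor.tokenizer(
        "<s_cord-v2>", add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = strip_special_tokens(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # Turn the tagged token sequence into nested Python dicts/lists.
    return processor.token2json(sequence)
```

A call such as `parse_receipt("receipt.png")` would return a nested dict of receipt fields. Here `receipt.png` is a placeholder path. The model is loaded on every call for simplicity. A real service would load the processor and model once and reuse them.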

Model Features

OCR-Free Document Understanding
Processes image inputs directly without traditional OCR preprocessing steps.
End-to-End Training
The visual encoder and text decoder are trained jointly, so the whole pipeline is optimized as one model rather than as separate stages.
Transformer-Based Architecture
Combines a Swin Transformer image encoder with a BART text decoder for vision-language modeling.

Model Capabilities

Document Image Understanding
Image-to-Text Conversion
Structured Document Parsing

Use Cases

Document Processing
Receipt Parsing
Extracts structured information such as the merchant name, item list, and prices from receipt images.
Performs well on the CORD dataset.
Table Recognition
Converts tables in images into editable text formats.
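Receipt fields come out of the decoder as a tagged token sequence, not as JSON. The sketch below is a simplified, hypothetical re-implementation of that conversion step. It assumes that each field is wrapped as `<s_key>value</s_key>` and that repeated items, such as menu lines, are separated by `<sep/>`. In practice, `DonutProcessor.token2json` in `transformers` performs this step and handles more edge cases. The function name `tokens_to_json` and the field names in the usage note are illustrative.

```python
import re


def tokens_to_json(seq: str) -> dict:
    """Convert Donut-style tagged output into nested dicts/lists (simplified sketch)."""
    result: dict = {}
    while seq:
        start = re.search(r"<s_(.*?)>", seq)
        if start is None:
            break
        key = start.group(1)
        end_tag = f"</s_{key}>"
        end_idx = seq.find(end_tag, start.end())
        if end_idx == -1:
            # Unclosed tag: skip past it and keep scanning.
            seq = seq[start.end():]
            continue
        content = seq[start.end():end_idx].strip()
        seq = seq[end_idx + len(end_tag):]
        if re.search(r"<s_.*?>", content):
            # Nested fields; <sep/> splits repeated items into a list.
            items = [tokens_to_json(p) for p in content.split("<sep/>") if p.strip()]
            result[key] = items[0] if len(items) == 1 else items
        else:
            # Leaf value (or several values separated by <sep/>).
            leaves = [v.strip() for v in content.split("<sep/>") if v.strip()]
            result[key] = leaves[0] if len(leaves) == 1 else leaves
    return result
```

For example, `<s_menu><s_nm>Lemon Tea</s_nm><s_cnt>1</s_cnt><sep/><s_nm>Donut</s_nm><s_cnt>2</s_cnt></s_menu>` becomes `{"menu": [{"nm": "Lemon Tea", "cnt": "1"}, {"nm": "Donut", "cnt": "2"}]}`. The sketch does not support a tag nested inside another tag with the same name, and it keeps every value as a string.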