Donut Base Finetuned Docvqa

Developed by naver-clova-ix
Donut is an OCR-free document understanding Transformer. This checkpoint is fine-tuned on the DocVQA dataset and answers questions about a document directly from its image, without a separate text-recognition step.
Downloads 167.80k
Release Time: 7/19/2022

Model Overview

This model consists of a visual encoder (Swin Transformer) and a text decoder (BART), enabling direct text generation from document images without traditional OCR preprocessing steps.
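
The encoder-decoder flow described above can be sketched with the Hugging Face transformers library. This is a minimal sketch, not an official recipe: it assumes the checkpoint is published under the repository id naver-clova-ix/donut-base-finetuned-docvqa, that transformers, torch, and Pillow are installed, and that the decoder is prompted with the DocVQA task tokens shown in build_prompt. The names MODEL_ID, build_prompt, and answer_question are illustrative helpers, not part of any library.

```python
import re

# Assumed Hugging Face repository id for this checkpoint.
MODEL_ID = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_prompt(question: str) -> str:
    """Wrap a question in the task tokens the DocVQA checkpoint is assumed to expect."""
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def answer_question(image, question: str, model_id: str = MODEL_ID):
    """Run one question against one document image (a PIL image in RGB mode)."""
    # Imported lazily so the prompt helper above works without heavy dependencies.
    import torch
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # The Swin encoder consumes pixels; the BART-style decoder consumes the prompt.
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        build_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    # Strip padding/end tokens and the leading task token, then convert to a dict.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

To try it, open a scanned page with Pillow, convert it to RGB, and pass it to answer_question together with a question string. Loading the processor and model inside the function keeps the sketch short; for repeated queries it would be more sensible to load them once and reuse them.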

Model Features

OCR-free Processing
Directly processes document images, avoiding error accumulation issues in traditional OCR pipelines
End-to-End Training
Joint training of visual encoder and text decoder enables direct image-to-text conversion
Document Comprehension
Optimized for document images, capable of understanding structured content like invoices and contracts

Model Capabilities

Document Image Understanding
Visual Question Answering
Text Information Extraction
Image-to-Text Conversion

Use Cases

Document Processing
Invoice Information Extraction
Extracts key information like numbers and amounts from invoice images
Accurately identifies specific fields in structured documents
Contract Clause Query
Answers specific questions about contract document content
Capable of understanding key clauses in contract documents
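
The invoice and contract use cases above amount to asking the model one question per field. The sketch below shows one way to organize that, under stated assumptions: the decoded model output is assumed to wrap the answer in <s_answer> tags, and ask is a hypothetical callable that sends a question about a fixed document image to the model and returns the decoded string. The names parse_answer and extract_fields are illustrative, not a library API.

```python
import re
from typing import Callable, Dict, Optional

# Matches the text after <s_answer>, up to the closing tag or the end of the
# string (generation may stop before the closing tag is emitted).
ANSWER_RE = re.compile(r"<s_answer>(.*?)(?:</s_answer>|$)", re.DOTALL)


def parse_answer(decoded: str) -> Optional[str]:
    """Return the answer text from a decoded sequence, or None if absent or empty."""
    match = ANSWER_RE.search(decoded)
    if match is None:
        return None
    answer = match.group(1).strip()
    return answer or None


def extract_fields(
    ask: Callable[[str], str], questions: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Ask one question per field name and collect the parsed answers."""
    return {field: parse_answer(ask(question)) for field, question in questions.items()}
```

A caller might map field names such as "invoice_number" and "total" to natural-language questions and pass a function that queries the model for a given page. Fields the model cannot answer come back as None, so they can be flagged for manual review instead of being silently filled.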