Donut Base Finetuned Docvqa
Donut is an OCR-free document understanding Transformer. This checkpoint is fine-tuned on the DocVQA dataset and can extract and interpret text directly from document images.
Downloads 167.80k
Release Time: 7/19/2022
Model Overview
This model consists of a visual encoder (Swin Transformer) and a text decoder (BART), enabling direct text generation from document images without traditional OCR preprocessing steps.
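A minimal sketch of how the encoder-decoder pair might be driven through the Hugging Face `transformers` library. The checkpoint ID `naver-clova-ix/donut-base-finetuned-docvqa`, the prompt template, and the helper names `build_prompt` and `answer_question` are assumptions that do not appear in the description above. The heavy imports are deferred so that the prompt helper can be used without PyTorch installed.

```python
import re

# Assumed Hub ID for this checkpoint; not stated in the description above.
MODEL_ID = "naver-clova-ix/donut-base-finetuned-docvqa"

# Assumed DocVQA task prompt: the decoder is primed with the question and
# generation continues from the opening answer tag.
PROMPT_TEMPLATE = "<s_docvqa><s_question>{question}</s_question><s_answer>"


def build_prompt(question: str) -> str:
    """Wrap a natural-language question in the DocVQA task prompt."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def answer_question(image, question: str) -> dict:
    """Run one question against one PIL image (downloads the checkpoint)."""
    # Imported lazily: these pull in PyTorch and are only needed for inference.
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    # Visual encoder input: the page image as a pixel tensor (no OCR step).
    pixel_values = processor(image, return_tensors="pt").pixel_values
    # Text decoder input: the tokenized task prompt.
    decoder_input_ids = processor.tokenizer(
        build_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    # Drop the leading task-start token before converting tags to a dict.
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

Under these assumptions, a call such as `answer_question(Image.open("invoice.png").convert("RGB"), "What is the total amount?")` would return a dictionary holding the question and the generated answer; `invoice.png` is a placeholder file name.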
Model Features
OCR-free Processing
Processes document images directly, avoiding the error propagation of traditional OCR pipelines
End-to-End Training
Joint training of visual encoder and text decoder enables direct image-to-text conversion
Document Comprehension
Optimized for document images, capable of understanding structured content like invoices and contracts
Model Capabilities
Document Image Understanding
Visual Question Answering
Text Information Extraction
Image-to-Text Conversion
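For visual question answering, the generated text usually arrives as a tagged string rather than a bare answer. The sketch below shows one way to pull the fields out of such a string using only the standard library. The tag layout (`<s_question>…</s_question><s_answer>…</s_answer>`) and the function name `parse_docvqa_output` are assumptions for illustration, not something stated in the list above.

```python
import re

# Matches a paired tag such as <s_answer> ... </s_answer> and captures
# the tag name and the enclosed text. Unpaired tags are ignored.
_TAG = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def parse_docvqa_output(sequence: str) -> dict:
    """Return a {tag_name: text} dict for every paired tag in the sequence."""
    return {name: text.strip() for name, text in _TAG.findall(sequence)}


raw = "<s_docvqa><s_question> when was it released?</s_question><s_answer> 7/19/2022</s_answer></s>"
parsed = parse_docvqa_output(raw)
print(parsed["answer"])
```

Because unpaired tags are skipped, a truncated generation that never closes its answer tag yields a dictionary without an `answer` key, which callers can treat as "no answer found".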
Use Cases
Document Processing
Invoice Information Extraction
Extracts key information like numbers and amounts from invoice images
Accurately identifies specific fields in structured documents
Contract Clause Query
Answers specific questions about contract document content
Capable of understanding key clauses in contract documents
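Both use cases above reduce to asking the model a fixed set of questions about one document and collecting the answers. The sketch below shows that pattern for invoice fields; a contract query would only swap the question set. The field names, the question wording, and the `ask` callable (any function that takes a question and returns the model's answer string) are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical field-to-question mapping; adjust to the documents at hand.
INVOICE_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "invoice_date": "What is the invoice date?",
    "total_amount": "What is the total amount?",
}


def extract_fields(
    ask: Callable[[str], str],
    questions: Dict[str, str] = INVOICE_QUESTIONS,
) -> Dict[str, Optional[str]]:
    """Ask one question per field and collect the stripped answers."""
    fields: Dict[str, Optional[str]] = {}
    for name, question in questions.items():
        answer = ask(question).strip()
        # Store empty answers as None so missing fields are easy to spot.
        fields[name] = answer or None
    return fields
```

Keeping the model call behind `ask` means the collection logic can be exercised with canned answers before any checkpoint is loaded, and the extracted values should still be validated downstream since generated answers can be wrong.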