
OCR DocVQA Donut

Developed by jinhybr
Donut is an OCR-free document understanding Transformer model that combines a visual encoder and text decoder for document visual question answering tasks.
Downloads 240
Release Time: 11/4/2022

Model Overview

This Donut model, fine-tuned on DocVQA, uses a Swin Transformer to encode the document image and a BART decoder to generate the answer text, so it understands documents without a separate OCR step.
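
A minimal sketch of how such a checkpoint might be queried through the Hugging Face transformers library. The repository id jinhybr/OCR-DocVQA-Donut and the prompt template are assumptions carried over from other DocVQA-fine-tuned Donut checkpoints, and the helper names build_docvqa_prompt and answer_question are hypothetical. Check them against the published model files before relying on them.

```python
import re

# Assumption: Hugging Face repository id for this checkpoint.
MODEL_ID = "jinhybr/OCR-DocVQA-Donut"

# Assumption: the prompt template used by DocVQA-fine-tuned Donut checkpoints.
PROMPT_TEMPLATE = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"


def build_docvqa_prompt(question: str) -> str:
    """Wrap a plain-text question in the DocVQA task prompt."""
    return PROMPT_TEMPLATE.replace("{user_input}", question.strip())


def answer_question(image_path: str, question: str, model_id: str = MODEL_ID) -> dict:
    """Run one question against one document image (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the prompt helper above
    # can be used without torch or transformers installed.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    # The Swin encoder consumes pixels; no OCR text is passed in.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The BART decoder is primed with the task prompt and the question.
    decoder_input_ids = processor.tokenizer(
        build_docvqa_prompt(question),
        add_special_tokens=False,
        return_tensors="pt",
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values.to(device),
            decoder_input_ids=decoder_input_ids.to(device),
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    # Strip padding/end tokens and the leading task tag, then convert the
    # remaining tagged text into a dictionary.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

Calling answer_question("invoice.png", "What is the invoice number?") would download the weights on first use; the file name here is only a placeholder.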

Model Features

OCR-free processing
Directly understands document content from images without traditional OCR steps
End-to-end training
Joint optimization of visual encoding and text generation
Document understanding
Can parse key information from structured documents like invoices and contracts

Model Capabilities

Document image understanding
Visual question answering
Key information extraction
Cross-modal representation learning

Use Cases

Document processing
Invoice information extraction
Automatically identifies key fields like invoice numbers and amounts from invoice images
Examples show accurate extraction of invoice numbers
Contract parsing
Analyzes terms and amount information in contract documents
Examples demonstrate recognition of purchase amounts
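
The extraction use cases above can be framed as asking one question per field. The sketch below assumes the decoded model output wraps the answer in <s_answer> tags; the field names, the question wording, and the ask callable (any function that sends a question to the model and returns the decoded string) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical mapping from output field names to the questions posed to the model.
INVOICE_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "total_amount": "What is the total amount?",
}

# Assumption: answers are wrapped in <s_answer> tags; the closing tag may be
# missing if generation was cut off, so end-of-string is accepted as well.
ANSWER_PATTERN = re.compile(r"<s_answer>(.*?)(?:</s_answer>|$)", re.DOTALL)


def parse_answer(decoded: str) -> Optional[str]:
    """Return the text inside the answer tag, or None if absent or empty."""
    match = ANSWER_PATTERN.search(decoded)
    if match is None:
        return None
    answer = match.group(1).strip()
    return answer or None


def extract_fields(
    ask: Callable[[str], str],
    questions: Dict[str, str] = INVOICE_QUESTIONS,
) -> Dict[str, Optional[str]]:
    """Ask one question per field and collect the parsed answers."""
    return {field: parse_answer(ask(question)) for field, question in questions.items()}
```

Keeping the model call behind a plain callable means the same loop works for invoices, contracts, or any other document type by swapping the question mapping.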