OCR DocVQA Donut
Donut is an OCR-free document understanding Transformer that pairs a visual encoder with a text decoder for document visual question answering (DocVQA).
Downloads 240
Release Time: 11/4/2022
Model Overview
This Donut model, fine-tuned on DocVQA, uses a Swin Transformer to encode the document image and a BART decoder to generate the answer text, so it understands documents without a separate OCR step.
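As a rough illustration of how the image encoder and text decoder fit together at inference time, the sketch below uses the Hugging Face transformers classes DonutProcessor and VisionEncoderDecoderModel. The checkpoint name, the task prompt format, and the helper names (build_docvqa_prompt, extract_answer, answer_question) are assumptions that are not stated on this page, and the regex-based answer extraction is a simplification rather than the model's official post-processing.

```python
import re

# Assumption: this checkpoint name is not given on the page.
MODEL_ID = "naver-clova-ix/donut-base-finetuned-docvqa"


def build_docvqa_prompt(question: str) -> str:
    """Wrap a question in the task prompt assumed for the DocVQA fine-tune."""
    return f"<s_docvqa><s_question>{question}</s_question><s_answer>"


def extract_answer(decoded: str) -> str:
    """Return the text after <s_answer>, up to </s_answer> or end of string."""
    match = re.search(r"<s_answer>(.*?)(?:</s_answer>|$)", decoded, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def answer_question(image, question: str) -> str:
    """Run one question against one document image (a PIL image)."""
    # Heavy imports stay local so the helpers above work without them.
    import torch
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    model.eval()

    # Visual encoder input: the document image as a pixel tensor.
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Text decoder input: the task prompt that carries the question.
    decoder_input_ids = processor.tokenizer(
        build_docvqa_prompt(question),
        add_special_tokens=False,
        return_tensors="pt",
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )

    decoded = processor.batch_decode(outputs)[0]
    decoded = decoded.replace(processor.tokenizer.eos_token, "")
    decoded = decoded.replace(processor.tokenizer.pad_token, "")
    return extract_answer(decoded)
```

A call would look like answer_question(Image.open("invoice.png").convert("RGB"), "What is the invoice number?"), where the file name is a placeholder. Loading the processor and model inside the function keeps the sketch short; a real service would load them once and reuse them.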
Model Features
OCR-free processing
Directly understands document content from images without traditional OCR steps
End-to-end training
Joint optimization of visual encoding and text generation
Document understanding
Can parse key information from structured documents like invoices and contracts
Model Capabilities
Document image understanding
Visual question answering
Key information extraction
Cross-modal representation learning
Use Cases
Document processing
Invoice information extraction
Automatically identifies key fields like invoice numbers and amounts from invoice images
Examples show accurate extraction of invoice numbers
Contract parsing
Analyzes terms and amount information in contract documents
Examples demonstrate recognition of purchase amounts
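One way to turn question answering into key information extraction is to ask one question per field. The sketch below is a minimal illustration under that assumption: the field names, the question wording, and the extract_fields helper are hypothetical, and ask stands for any function that answers a question about a single document image.

```python
from typing import Callable, Dict, Optional

# Hypothetical field-to-question mapping; adjust the wording per document type.
FIELD_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "total_amount": "What is the total amount?",
}


def extract_fields(
    ask: Callable[[str], str],
    questions: Dict[str, str] = FIELD_QUESTIONS,
) -> Dict[str, Optional[str]]:
    """Ask one question per field and collect the answers.

    `ask` is assumed to be bound to one document image already.
    Empty answers are stored as None so missing fields are easy to spot.
    """
    results: Dict[str, Optional[str]] = {}
    for field, question in questions.items():
        answer = ask(question).strip()
        results[field] = answer or None
    return results


# Usage with a stand-in for the model, to show the shape of the result.
def fake_ask(question: str) -> str:
    canned = {"What is the invoice number?": " INV-001 "}
    return canned.get(question, "")


fields = extract_fields(fake_ask)
```

With the stand-in above, fields maps invoice_number to "INV-001" and total_amount to None. In practice ask would wrap a call to the model for a specific image.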