Donut Base Encoder
Donut is an OCR-free document understanding Transformer that processes document images directly through a visual encoder.
Downloads 45
Release Time: 4/2/2025
Model Overview
The Donut model uses a Swin Transformer as its visual encoder, which encodes document images into embedding tensors for document understanding tasks. This version is a pre-trained base model and must be fine-tuned before it is used on downstream tasks.
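To make "embedding tensors" more concrete, the sketch below estimates the shape of the tensor a Swin-style encoder would produce for one document image. It is a shape calculation only and loads no model weights. The names SwinEncoderSpec and encoder_output_shape are hypothetical helpers, and the default values (patch size 4, embedding width 128, stage depths 2/2/14/2, input size 2560x1920) are assumptions based on the commonly published Donut base configuration, not something this page states.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SwinEncoderSpec:
    """Hypothetical description of a Swin-style encoder (assumed values)."""
    image_size: Tuple[int, int]            # (height, width) in pixels
    patch_size: int = 4                    # assumed patch-embedding stride
    embed_dim: int = 128                   # assumed channel width of stage 1
    depths: Sequence[int] = (2, 2, 14, 2)  # assumed number of blocks per stage


def encoder_output_shape(spec: SwinEncoderSpec, batch_size: int = 1) -> Tuple[int, int, int]:
    """Return (batch, sequence_length, hidden_size) of the final feature map.

    Assumes the standard Swin hierarchy: a patch embedding followed by a
    2x patch-merging step between consecutive stages, which halves each
    spatial side and doubles the channel width.
    """
    height, width = spec.image_size
    num_merges = len(spec.depths) - 1
    reduction = spec.patch_size * 2 ** num_merges
    # Simplification: real implementations may pad instead of rejecting.
    if height % reduction or width % reduction:
        raise ValueError(f"image sides must be multiples of {reduction}")
    seq_len = (height // reduction) * (width // reduction)
    hidden_size = spec.embed_dim * 2 ** num_merges
    return (batch_size, seq_len, hidden_size)


if __name__ == "__main__":
    spec = SwinEncoderSpec(image_size=(2560, 1920))
    print(encoder_output_shape(spec))  # (1, 4800, 1024)
```

Under these assumptions, each image becomes a sequence of 4,800 vectors of width 1,024. A fine-tuned head or decoder would consume that sequence for tasks such as classification or parsing.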
Model Features
OCR-free processing
Understands document content directly from the image, without a separate OCR step
Visual encoder
Uses Swin Transformer architecture to process image inputs
Pre-trained foundation
Provides pre-trained weights that can be fine-tuned for various document tasks
Model Capabilities
Document image feature extraction
Visual representation learning
Document understanding
Use Cases
Document processing
Document image classification
Classify different types of document images
Document parsing
Extract structured information from document images