Donut Base Encoder

Developed by eljandoubi
Donut is an OCR-free document understanding Transformer. This release contains its visual encoder, which processes document images directly.
Release date: 4/2/2025

Model Overview

Donut uses a Swin Transformer as its visual encoder, which encodes a document image into a tensor of embeddings suitable for document understanding tasks. This version is a pre-trained base model and must be fine-tuned before it is used for a downstream task.
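To make "embedding tensors" concrete, the sketch below estimates the shape of the encoder output from the Swin hyperparameters. It is only shape arithmetic, not the model itself. The default values (2560x1920 input, patch size 4, embedding dimension 128, four stages) are assumptions taken from the commonly published Donut base configuration and are not confirmed for this checkpoint; `SwinShapeConfig` and `encoder_output_shape` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwinShapeConfig:
    """Hypothetical container for the hyperparameters that fix the output shape.

    The defaults are assumed values for a Donut base encoder; check the
    checkpoint's own config before relying on them.
    """
    image_height: int = 2560
    image_width: int = 1920
    patch_size: int = 4
    embed_dim: int = 128
    num_stages: int = 4


def encoder_output_shape(cfg: SwinShapeConfig, batch_size: int = 1) -> tuple:
    """Return (batch, sequence_length, hidden_size) for a Swin-style encoder.

    Assumes the image sides divide evenly at every step; real implementations
    pad odd-sized feature maps, which this sketch does not model.
    """
    # Patch embedding: split the image into non-overlapping patches.
    height = cfg.image_height // cfg.patch_size
    width = cfg.image_width // cfg.patch_size
    dim = cfg.embed_dim

    # Every stage except the last ends with patch merging, which halves
    # each spatial side and doubles the channel dimension.
    for _ in range(cfg.num_stages - 1):
        height, width, dim = height // 2, width // 2, dim * 2

    # The feature map is flattened into a sequence of embeddings.
    return (batch_size, height * width, dim)


if __name__ == "__main__":
    print(encoder_output_shape(SwinShapeConfig()))
```

Under these assumed settings, each image becomes a sequence of 4800 embeddings of width 1024. A fine-tuned head (for example a classifier or a text decoder) would consume that sequence.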

Model Features

OCR-free processing: Understands document content directly, without a separate OCR step.
Visual encoder: Uses the Swin Transformer architecture to process image inputs.
Pre-trained foundation: Provides pre-trained weights that can be fine-tuned for various document tasks.

Model Capabilities

Document image feature extraction
Visual representation learning
Document understanding

Use Cases

Document processing
Document image classification: Classify document images by type.
Document parsing: Extract structured information from document images.