donut-base-encoder: Open-source Document Understanding Model That Processes Document Images Directly Without OCR

Donut Base Encoder

Developed by eljandoubi

Donut is an OCR-free document understanding Transformer model that processes document images directly through a visual encoder.

Text Recognition

Transformers

Open Source License: MIT

Tags: #OCR-free document understanding #Swin Transformer encoder #Document image feature extraction

Downloads 45

Release Time: 4/2/2025

Model Overview

The Donut model uses a Swin Transformer as its visual encoder, which encodes a document image into a tensor of embeddings suitable for document understanding tasks. This version is a pre-trained base model and must be fine-tuned before it is useful for a downstream task.
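To give a feel for what "embedding tensors" means in practice, the sketch below estimates the shape of a Swin-style encoder output from the input image size. The default values (patch size 4, base embedding width 128, four stages) and the 2560x1920 input are assumptions taken from the commonly published Donut base configuration, not values confirmed for this repository. The function name `swin_output_shape` is hypothetical, and the sketch requires the image sides to divide evenly rather than padding them.

```python
def swin_output_shape(image_size, patch_size=4, embed_dim=128, num_stages=4):
    """Estimate (sequence_length, hidden_size) of a Swin-style encoder output.

    Assumed defaults follow the commonly published Donut base configuration;
    check the model's own config file before relying on them.
    """
    height, width = image_size
    # Patch embedding shrinks each side by patch_size; each of the
    # (num_stages - 1) patch-merging layers halves each side again.
    reduction = patch_size * 2 ** (num_stages - 1)
    if height % reduction or width % reduction:
        raise ValueError(f"image sides must be multiples of {reduction}")
    seq_len = (height // reduction) * (width // reduction)
    # Each patch-merging layer doubles the channel width.
    hidden_size = embed_dim * 2 ** (num_stages - 1)
    return seq_len, hidden_size


print(swin_output_shape((2560, 1920)))  # (4800, 1024)
```

Under these assumptions, one 2560x1920 page becomes 4800 patch embeddings of width 1024, which is the kind of tensor a downstream head or decoder would consume.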

Model Features

OCR-free processing

Directly understands document content without traditional OCR steps

Visual encoder

Uses Swin Transformer architecture to process image inputs

Pre-trained foundation

Provides pre-trained weights that can be fine-tuned for various document tasks

Model Capabilities

Document image feature extraction

Visual representation learning

Document understanding

Use Cases

Document processing

Document image classification

Classify different types of document images

Document parsing

Extract structured information from document images
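As a rough illustration of how encoder features could feed the classification use case above, the sketch below mean-pools a sequence of patch embeddings into one vector and applies a linear layer. This is a toy example in plain Python, not the fine-tuning recipe for this model: the helper names, the two-dimensional embeddings, the weights, and the labels `invoice` and `receipt` are all hypothetical. In a real setup the embeddings would come from the encoder and the weights would be learned during fine-tuning.

```python
def mean_pool(embeddings):
    """Average a list of token vectors (seq_len x hidden) into one vector."""
    seq_len = len(embeddings)
    hidden = len(embeddings[0])
    return [sum(token[i] for token in embeddings) / seq_len for i in range(hidden)]


def linear_logits(features, weights, bias):
    """Apply a linear layer: one weight row and one bias value per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def predict(embeddings, weights, bias, labels):
    """Return the label with the highest logit for the pooled embeddings."""
    logits = linear_logits(mean_pool(embeddings), weights, bias)
    best = max(range(len(logits)), key=logits.__getitem__)
    return labels[best]


# Toy stand-in for encoder output: 3 patch tokens with hidden size 2.
embeddings = [[1.0, 0.0], [3.0, 2.0], [2.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, untrained values
bias = [0.0, 0.5]
labels = ["invoice", "receipt"]  # hypothetical document classes

print(predict(embeddings, weights, bias, labels))  # invoice
```

Mean pooling is only one possible choice here; it keeps the head small, at the cost of discarding the spatial layout that a parsing task would need.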

Property	Details
Model Type	Donut (base-sized model, pre-trained only)
Training Data	Not specified
License	MIT
Pipeline Tag	image-feature-extraction

