Donut Rus

Developed by Akajackson
An end-to-end Russian text recognition model based on the Transformer architecture, trained on 100,000 synthetic images generated with SynthDoG from the text of Russian literary works
Downloads 550
Release Time: 4/2/2023

Model Overview

This is a Donut model for Russian and English text recognition. It uses an end-to-end Transformer architecture and is particularly suited to extracting text from document images.

Model Features

Multilingual Support
Supports Russian and English text recognition, suitable for multilingual document processing scenarios
Efficient Recognition
Achieves a normalized edit distance (Normed ED) of 0.02239 on the validation set (lower is better; 0 means an exact match)
Synthetic Data Training
Trained on 100,000 synthetic images generated with SynthDoG, with text content sourced from Russian literary works
Customized Tokenizer
Uses DeepPavlov/xlm-roberta-large-en-ru as the tokenizer, which is optimized for Russian language processing
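
To make the Normed ED figure above easier to interpret, here is a minimal sketch of one common way to compute a normalized edit distance: the Levenshtein distance between the predicted and reference strings, divided by the length of the longer string. This definition is an assumption; the listing does not state exactly how the reported score was calculated. The function names `levenshtein` and `normed_edit_distance` are illustrative and do not come from the model's codebase.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def normed_edit_distance(pred: str, truth: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length.
    Assumed definition; 0.0 means the strings are identical."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(pred, truth) / longest


if __name__ == "__main__":
    # One wrong character out of six.
    print(normed_edit_distance("превет", "привет"))
```

Under this definition, a validation score of 0.02239 would correspond to roughly two character-level errors per hundred characters of the longer string, averaged over samples.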

Model Capabilities

Document image text recognition
Multilingual text extraction
End-to-end document processing

Use Cases

Document Processing
Multi-format Document Recognition
Recognize text content in various document formats
High-precision text extraction
Document QA System
Build a question-answering system based on recognized text content
Document Classification
Classify documents based on recognized content