# Document Understanding

Qwen2.5 VL 7B Instruct Quantized.w8a8
Apache-2.0
A quantized version of Qwen2.5-VL-7B-Instruct that accepts image and text input and produces text output; weights and activations are quantized to INT8 (W8A8) for more efficient inference
Image-to-Text Transformers English
RedHatAI
1,992
3
Qwen2.5 VL 3B Instruct FP8 Dynamic
Apache-2.0
The FP8-quantized version of Qwen2.5-VL-3B-Instruct, which accepts image and text input, produces text output, and is optimized for inference efficiency.
Image-to-Text Transformers English
RedHatAI
112
1
H2ovl Mississippi 800m
Apache-2.0
An 800M-parameter vision-language model from H2O.ai that specializes in OCR and document understanding
Image-to-Text Transformers English
h2oai
77.67k
33
Idefics3 8B Llama3
Apache-2.0
Idefics3 is an open-source multimodal model capable of processing arbitrary sequences of image and text inputs to generate text outputs. It shows significant improvements in OCR, document understanding, and visual reasoning.
Image-to-Text Transformers English
HuggingFaceM4
45.86k
277
Horus OCR
Donut is a Transformer-based image-to-text model capable of extracting and generating textual content from images.
Image-to-Text Transformers
TeeA
21
0
Fine Tuned Rvl Cdip
A fine-tuned version of the microsoft/layoutlmv3-base model for document image classification tasks, achieving an F1 score of 0.8177 on the evaluation set
Text Recognition Transformers
davidhajdu
21
1
Donut Base Handwriting Recognition
MIT
Handwriting recognition model fine-tuned based on naver-clova-ix/donut-base
Text Recognition Transformers
Cdywalst
140
1
Docllm Baichuan2 7b
DocLLM_reimplementation is a project that reimplements DocLLM, a large language model for document understanding tasks, and aims to improve its document comprehension capabilities.
Large Language Model Transformers
JinghuiLuAstronaut
185
5
Donut 240202
MIT
A document understanding model fine-tuned from Yazawa/donut-base-sroie, suitable for structured document information extraction tasks
Text Recognition Transformers
Yazawa
93
0
Donut Receipt V3
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not stated
Large Language Model Transformers
mychen76
28
0
Donut Receipt V2
MIT
A model fine-tuned from naver-clova-ix/donut-base, possibly intended for receipt recognition or document understanding tasks
Large Language Model Transformers
mychen76
31
0
Donut Base Sroie
MIT
A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset; no specific use case is stated
Text Recognition Transformers
iamkhadke
13
0
Donut Trained Example 3
MIT
A fine-tuned model based on the Donut architecture; its specific purpose and functionality are not documented
Large Language Model Transformers
anarenteriare
14
0
Donut Trained Example 2
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not stated
Large Language Model Transformers
anarenteriare
13
0
Donut Base Receipt V3
MIT
A receipt recognition model fine-tuned from naver-clova-ix/donut-base
Large Language Model Transformers
hyunguk1
13
0
Donut Base Receipt
MIT
A receipt recognition model fine-tuned based on naver-clova-ix/donut-base
Large Language Model Transformers
hyunguk1
19
0
Donut Base Ru
MIT
A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not stated
Large Language Model Transformers
Nyaaneet
21
1
Donut Demo
MIT
This is a Donut model fine-tuned on the CORD-v2 dataset, designed for image-to-text tasks, achieving an average accuracy of 0.901.
Image-to-Text Transformers
katanaml
24
3
Layoutlmv3 Finetuned Funsd
A document understanding model fine-tuned from microsoft/layoutlmv3-base on the nielsr/funsd-layoutlmv3 dataset
Text Recognition Transformers
Narsil
799
0
Donut Base Sroie
MIT
This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition Transformers
unstructuredio
31
1
Dof Invoice 1
MIT
An invoice processing model fine-tuned from naver-clova-ix/donut-base
Image-to-Text Transformers
Sebabrata
13
0
Donut Base Label Studio 200 Invoices
MIT
Invoice recognition model based on Donut architecture, fine-tuned on a dataset of 200 invoices
Text Recognition Transformers
Prem11100
18
0
Donut Base Sroie
MIT
A document understanding model fine-tuned based on philschmid/donut-base-sroie
Text Recognition Transformers
Prem11100
13
0
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for image text extraction tasks
Text Recognition Transformers
philschmid
185
3
Donut Demo
MIT
VisionEncoderDecoder model fine-tuned on the CORD-v2 dataset for document understanding tasks
Text Recognition Transformers
nielsr
56
1
Layoutlmv3 Finetuned Wildreceipt
A version of LayoutLMv3-base fine-tuned on the WildReceipt dataset for receipt key information extraction tasks
Text Recognition Transformers
Theivaprakasham
118
3
Layoutlmv3 Finetuned Invoice
An invoice information extraction model fine-tuned based on the LayoutLMv3 architecture, demonstrating outstanding performance on the SROIE dataset
Text Recognition Transformers
ronak1998
71
3
Layoutlmv3 Finetuned Invoice
A version of LayoutLMv3-base fine-tuned on an invoice dataset for invoice information extraction
Text Recognition Transformers
Theivaprakasham
896
20
Layoutlmv3 Finetuned Cord
A LayoutLMv3-based document understanding model fine-tuned on the CORD dataset for document token classification tasks
Text Recognition Transformers
nielsr
617
12
Layoutlmv3 Finetuned Funsd
A document understanding model fine-tuned from LayoutLMv3-base on the FUNSD dataset for token classification on forms and documents
Text Recognition Transformers
nielsr
2,420
25
Layoutlmv2 Finetuned Cord
A fine-tuned version of the microsoft/layoutlmv2-base-uncased model on an unknown dataset, suitable for document understanding tasks
Text Recognition Transformers
speydach
17
0
Layoutlmv2 Finetuned Cord
A LayoutLMv2-based document understanding model fine-tuned on the CORD dataset, suitable for structured document information extraction tasks
Text Recognition Transformers
katanaml
29
3
Layoutlmv2 Finetuned Funsd
A document understanding model based on Microsoft's LayoutLMv2, fine-tuned on the FUNSD dataset
Text Recognition Transformers
nielsr
1,319
13
Layoutlmv2 Finetuned Sroie Mod
A document understanding model fine-tuned from microsoft/layoutlmv2-base-uncased, suitable for structured document information extraction tasks
Large Language Model Transformers
Theivaprakasham
37
1