# Document Understanding

Qwen2.5 VL 7B Instruct Quantized.w8a8

A quantized version of Qwen2.5-VL-7B-Instruct that accepts vision-text input and produces text output, optimized for inference efficiency through INT8 quantization of weights and activations (w8a8).

Transformers English

Qwen2.5 VL 3B Instruct FP8 Dynamic

An FP8-quantized version of Qwen2.5-VL-3B-Instruct that accepts vision-text input and produces text output, optimized for inference efficiency.

Transformers English

H2ovl Mississippi 800m

An 800M-parameter vision-language model from H2O.ai, specializing in OCR and document understanding with excellent performance

Transformers English

Idefics3 8B Llama3

Idefics3 is an open-source multimodal model capable of processing arbitrary sequences of image and text inputs to generate text outputs. It shows significant improvements in OCR, document understanding, and visual reasoning.

Transformers English

Donut is a Transformer-based image-to-text model capable of extracting and generating textual content from images.

Fine Tuned Rvl Cdip

A fine-tuned version of the microsoft/layoutlmv3-base model for document image classification tasks, achieving an F1 score of 0.8177 on the evaluation set

Text Recognition

Donut Base Handwriting Recognition

A handwriting recognition model fine-tuned from naver-clova-ix/donut-base.

Text Recognition

Docllm Baichuan2 7b

DocLLM_reimplementation is a large language model project for document understanding that aims to reimplement and improve document comprehension capabilities.

Large Language Model

JinghuiLuAstronaut

A document understanding model fine-tuned from Yazawa/donut-base-sroie, suitable for structured document information extraction tasks

Text Recognition

Donut Receipt V3

A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not explicitly stated.

Large Language Model

Donut Receipt V2

A model fine-tuned from naver-clova-ix/donut-base, potentially intended for receipt recognition or document understanding tasks.

Large Language Model

Donut Base Sroie

A model fine-tuned from naver-clova-ix/donut-base on an image folder dataset; no specific use case is explicitly stated.

Text Recognition

Donut Trained Example 3

A fine-tuned model based on the Donut architecture; its specific purpose and functionality are not stated.

Large Language Model

Donut Trained Example 2

A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not clearly stated.

Large Language Model

Donut Base Receipt V3

A receipt recognition model fine-tuned from naver-clova-ix/donut-base.

Large Language Model

Donut Base Receipt

A receipt recognition model fine-tuned from naver-clova-ix/donut-base.

Large Language Model

A model fine-tuned from naver-clova-ix/donut-base; its specific purpose is not explicitly stated.

Large Language Model

This is a Donut model fine-tuned on the CORD-v2 dataset, designed for image-to-text tasks, achieving an average accuracy of 0.901.

Layoutlmv3 Finetuned Funsd

A document understanding model fine-tuned from microsoft/layoutlmv3-base on the nielsr/funsd-layoutlmv3 dataset.

Text Recognition

Donut Base Sroie

This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.

Text Recognition

An invoice processing model fine-tuned from naver-clova-ix/donut-base.

Donut Base Label Studio 200 Invoices

An invoice recognition model based on the Donut architecture, fine-tuned on a dataset of 200 invoices.

Text Recognition

Donut Base Sroie

A document understanding model fine-tuned from philschmid/donut-base-sroie.

Text Recognition

Donut Base Sroie

A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for image text extraction tasks

Text Recognition

A VisionEncoderDecoder model fine-tuned on the CORD-v2 dataset for document understanding tasks.

Text Recognition

Layoutlmv3 Finetuned Wildreceipt

A LayoutLMv3-base model fine-tuned on the WildReceipt dataset for receipt key information extraction.

Text Recognition

Theivaprakasham

Layoutlmv3 Finetuned Invoice

An invoice information extraction model fine-tuned based on the LayoutLMv3 architecture, demonstrating outstanding performance on the SROIE dataset

Text Recognition

Layoutlmv3 Finetuned Invoice

A version of LayoutLMv3-base fine-tuned on an invoice dataset for invoice information extraction

Text Recognition

Theivaprakasham

Layoutlmv3 Finetuned Cord

A document understanding model based on LayoutLMv3 and fine-tuned on the CORD dataset, excelling in document token classification tasks.

Text Recognition

Layoutlmv3 Finetuned Funsd

A document understanding model based on LayoutLMv3-base and fine-tuned on the FUNSD dataset, excelling in token classification tasks for forms and documents.

Text Recognition

Layoutlmv2 Finetuned Cord

A fine-tuned version of the microsoft/layoutlmv2-base-uncased model on an unknown dataset, suitable for document understanding tasks

Text Recognition

Layoutlmv2 Finetuned Cord

A document understanding model fine-tuned on the CORD dataset based on the LayoutLMv2 architecture, suitable for structured document information extraction tasks

Text Recognition

Layoutlmv2 Finetuned Funsd

A document understanding model fine-tuned on the FUNSD dataset based on Microsoft's LayoutLMv2

Text Recognition

Layoutlmv2 Finetuned Sroie Mod

A document understanding model fine-tuned from microsoft/layoutlmv2-base-uncased, suitable for structured document information extraction tasks

Large Language Model

Theivaprakasham

© 2025 AIbase