# High-Precision OCR

Qwen2.5-VL-32B-Instruct is a 32B-parameter, instruction-tuned vision-language model in the Qwen family, with strong visual understanding and agent capabilities for multimodal tasks.

Transformers Supports Multiple Languages

Qwen2.5 VL 72B Instruct AWQ

Qwen2.5-VL is a multimodal large language model from the QwenLM team with strong visual understanding and agent capabilities. It accepts images, videos, and text as input.

Transformers English

OCR TextInput Base

An image-to-text model specialized for the financial domain. It recognizes English text and is primarily used to process image content in financial documents.

Text Recognition

Transformers English

MoAI is a large language and vision model that takes both image and text inputs and generates text outputs.

Finetune Donut Cord V2.5

A vision-language model based on the Donut architecture, fine-tuned on the CORD-V2 dataset for document image-to-text tasks.

MGP-STR is a pure vision-based scene text recognition model that achieves efficient OCR through multi-granularity prediction.

Text Recognition
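
The "multi-granularity prediction" mentioned for MGP-STR can be illustrated with a small sketch. It assumes the model produces one candidate string per granularity (for example character, BPE, and WordPiece), each with per-token probabilities, and that the final answer is the candidate with the highest sequence-level confidence. The `GranularityPrediction` container, the `fuse` helper, and the sample numbers are hypothetical and are not part of any library API.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


@dataclass
class GranularityPrediction:
    """Hypothetical container for one prediction head's output."""
    granularity: str               # e.g. "char", "bpe", "wordpiece"
    text: str                      # decoded string from this head
    token_probs: Sequence[float]   # probability of each predicted token

    @property
    def confidence(self) -> float:
        # Sequence-level confidence: product of the per-token probabilities.
        return prod(self.token_probs)


def fuse(predictions: Sequence[GranularityPrediction]) -> GranularityPrediction:
    """Pick the candidate with the highest sequence-level confidence."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return max(predictions, key=lambda p: p.confidence)


# Made-up example values, for illustration only.
candidates = [
    GranularityPrediction("char", "ticket", [0.99, 0.98, 0.60, 0.97, 0.99, 0.98]),
    GranularityPrediction("bpe", "ticket", [0.95, 0.96]),
    GranularityPrediction("wordpiece", "tickets", [0.80, 0.70]),
]
best = fuse(candidates)
print(best.granularity, best.text)
```

In this sketch a single low-probability character drags down the character-level candidate, so a coarser head can win when it is more certain about the whole word.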

© 2025 AIbase