# High-Precision OCR
Space Model
Apache-2.0
Qwen2.5-VL-32B-Instruct is a vision-language model in the Qwen family with strong visual understanding and agent capabilities for multimodal tasks.
Image-to-Text
Transformers Multilingual

Alhdrawi
58
1
Qwen2.5 VL 72B Instruct AWQ
Other
Qwen2.5-VL is a multimodal large language model released by the QwenLM team. It offers strong visual understanding and agent capabilities and accepts images, videos, and text as input.
Image-to-Text
Transformers English

Benasd
173
6
OCR TextInput Base
An image-to-text model specialized for the financial domain. It recognizes English text and is primarily used to process image content in financial documents.
Text Recognition
Transformers English

rohit5895
31
0
Moai 7B
MIT
MoAI is a large language and vision model that takes image and text inputs and generates text outputs.
Image-to-Text
Transformers

BK-Lee
183
45
Finetune Donut Cord V2.5
Openrail
A vision-language model based on the Donut architecture, fine-tuned on the CORD-V2 dataset for document image-to-text tasks.
Image-to-Text
Transformers

fahmiaziz
97
3
Mgp Str Base
MGP-STR is a purely vision-based scene text recognition model that performs OCR through multi-granularity prediction.
Text Recognition
Transformers

alibaba-damo
4,981
64