# High-precision Visual Understanding
MLCD-ViT-bigG-Patch14-448
MIT
MLCD-ViT-bigG is a Vision Transformer model enhanced with 2D Rotary Position Encoding (RoPE2D) that excels at document understanding and visual question answering.
Text Recognition
DeepGlint-AI
CogVLM Chat HF
Apache-2.0
CogVLM is an open-source vision-language model that achieves leading performance on multiple cross-modal benchmarks.
Image-to-Text
Transformers, English

THUDM