# High-precision Visual Understanding

MLCD-ViT-bigG-patch14-448
MIT
MLCD-ViT-bigG is an advanced Vision Transformer model enhanced with 2D Rotary Position Encoding (RoPE2D), excelling in document understanding and visual question answering tasks.
Text Recognition
DeepGlint-AI
1,517
3
cogvlm-chat-hf
Apache-2.0
CogVLM is a powerful open-source vision-language model that achieves leading performance on multiple cross-modal benchmarks.
Text-to-Image, Transformers, English
THUDM
4,816
193