AIbase

# Image understanding

Internlm Xcomposer2d5 Ol 7b
Other
InternLM-XComposer2.5-OL is a comprehensive multimodal system supporting long-term streaming video and audio interaction.
Text-to-Image
internlm
79
49
Paligemma Longprompt V1 Safetensors
GPL-3.0
Experimental vision model that combines keyword tags with long text descriptions to generate image prompts.
Image-to-Text, Transformers
mnemic
38
1
Mixtral AI Vision 128k 7b
MIT
A multimodal model that combines visual and language abilities, achieving image-text interaction through a merging method.
Image-to-Text, Transformers, English
LeroyDyer
384
4
Vit Medium Patch16 Clip 224.tinyclip Yfcc15m
MIT
CLIP model based on the ViT architecture for zero-shot image classification tasks.
Image Classification
timm
144
0
Finetuned Git Large Chest Xrays
MIT
A vision-language model under MIT License, focused on generating text descriptions from images.
Image Generation, Transformers, Supports Multiple Languages
daniyal214
15
0
© 2025 AIbase