# Image understanding
InternLM-XComposer2.5-OL 7B
Other
InternLM-XComposer2.5-OL is a comprehensive multimodal system supporting long-term streaming video and audio interaction.
Text-to-Image
internlm
79
49
Paligemma Longprompt V1 Safetensors
GPL-3.0
Experimental vision model that combines keyword tags with long text descriptions to generate image prompts.
Image-to-Text
Transformers

mnemic
38
1
Mixtral AI Vision 128k 7b
MIT
A multimodal model that combines vision and language capabilities, using a merging method to support image-text interaction.
Image-to-Text
Transformers, English

LeroyDyer
384
4
Vit Medium Patch16 Clip 224.tinyclip Yfcc15m
MIT
CLIP model based on the ViT architecture, intended for zero-shot image classification.
Image Classification
timm
144
0
Finetuned Git Large Chest Xrays
MIT
A vision-language model that generates text descriptions from images.
Image-to-Text
Transformers, Multilingual

daniyal214
15
0