# Image understanding

Internlm Xcomposer2d5 Ol 7b

InternLM-XComposer2.5-OL is a comprehensive multimodal system supporting long-term streaming video and audio interaction.

Paligemma Longprompt V1 Safetensors

An experimental vision model that combines keyword tags with long text descriptions to generate image prompts.

Mixtral AI Vision 128k 7b

A multimodal model that combines visual and language capabilities, enabling image-text interaction through a merging method.

Transformers English

Vit Medium Patch16 Clip 224.tinyclip Yfcc15m

A CLIP model based on the ViT architecture, intended for zero-shot image classification.

Image Classification

Finetuned Git Large Chest Xrays

A vision-language model, released under the MIT License, that generates text descriptions from images.

Image Generation

Transformers Supports Multiple Languages

© 2025 AIbase