
# Multimodal learning

OpenVision ViT Base Patch8 384
Apache-2.0
OpenVision is a fully open-source and cost-effective family of advanced visual encoders, specifically designed for multimodal learning.
Multimodal Fusion Transformers
UCSC-VLAA
47
0
Eagle2.5 8B
Other
Eagle 2.5 is a cutting-edge vision-language model (VLM) designed for long-context multimodal learning, supporting the processing of video sequences up to 512 frames and high-resolution images.
Text-to-Image Transformers Other
nvidia
2,626
8
Instruct CLIP
Apache-2.0
InstructCLIP is a model that automatically optimizes data through contrastive learning to enhance instruction-guided image editing.
Text-to-Image English
SherryXTChen
74
2