# Multimodal learning
OpenVision ViT Base Patch8 384
Apache-2.0
OpenVision is a fully open-source and cost-effective family of advanced visual encoders, specifically designed for multimodal learning.
Multimodal Fusion
Transformers

UCSC-VLAA
47
0
Instruct CLIP
Apache-2.0
InstructCLIP is a model that uses contrastive learning to automatically refine training data, improving instruction-guided image editing.
Text-to-Image English
SherryXTChen
74
2