# Multimodal Embedding Learning
UniME LLaVA OneVision 7B
MIT
UniME is a general embedding-learning framework built on multimodal large language models. It strengthens multimodal embedding quality through two strategies: textual discriminative knowledge distillation and instruction tuning enhanced with hard negative samples.
Multimodal Alignment
Transformers, English

DeepGlint-AI
376
2
UniME LLaVA 1.6 7B
MIT
UniME is a general embedding-learning model built on a multimodal large language model. It was trained at 336×336 image resolution and ranked first on the MMEB leaderboard.
Image-to-Text
Transformers, English

DeepGlint-AI
188
3