
UniME Phi3.5-V 4.2B

Developed by DeepGlint-AI
UniME is a general-purpose embedding model built on a multimodal large language model, designed to break down modality barriers and enable cross-modal retrieval and embedding learning.
Downloads: 54
Release Time: 4/25/2025

Model Overview

UniME employs text discriminative knowledge distillation and hard negative sample-enhanced instruction tuning to strengthen the embedding capabilities of multimodal large models, supporting cross-modal retrieval for both images and text.
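
To make the retrieval setting concrete, below is a minimal sketch of how UniME-style embeddings can be used for cross-modal search once image and text embeddings have been extracted from the model. The embedding dimension, function names, and toy data are illustrative assumptions, not the official inference code.

```python
import torch
import torch.nn.functional as F

def retrieve(query_embeds: torch.Tensor, candidate_embeds: torch.Tensor, top_k: int = 5):
    """Rank candidates for each query by cosine similarity.

    query_embeds:     (num_queries, dim), e.g. text embeddings from UniME
    candidate_embeds: (num_candidates, dim), e.g. image embeddings from UniME
    """
    # L2-normalize so the dot product equals cosine similarity.
    q = F.normalize(query_embeds, dim=-1)
    c = F.normalize(candidate_embeds, dim=-1)
    scores = q @ c.T                          # (num_queries, num_candidates)
    top_scores, top_idx = scores.topk(top_k, dim=-1)
    return top_scores, top_idx

if __name__ == "__main__":
    # Toy usage with random vectors standing in for real UniME embeddings;
    # the dimension 3072 is a hypothetical choice for illustration.
    text_embeds = torch.randn(4, 3072)
    image_embeds = torch.randn(100, 3072)
    scores, idx = retrieve(text_embeds, image_embeds, top_k=3)
    print(idx)  # indices of the 3 best-matching images per text query
```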

Model Features

Text Discriminative Knowledge Distillation
Aligns the embeddings of the student and teacher models over in-batch similarity distributions using KL divergence, fine-tuning only the language model component while keeping other parameters frozen (a minimal loss sketch follows after this list).
Hard Negative Sample-Enhanced Instruction Tuning
Uses a similarity-threshold-based false negative filtering mechanism and an automatic hard negative sampling strategy to improve visual sensitivity, strengthen cross-modal alignment, and enhance instruction-following capabilities (a mining sketch also follows after this list).
High-Resolution Image Processing
Supports training at 336×336 image resolution and delivers outstanding performance on multimodal embedding benchmarks.
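
As referenced in the distillation item above, here is a minimal sketch of a KL-divergence loss over in-batch similarity distributions, assuming student and teacher text embeddings for the same batch of captions are already available. The temperature value and pooling details are assumptions for illustration, not the released training recipe.

```python
import torch
import torch.nn.functional as F

def batch_similarity_kd_loss(student_embeds: torch.Tensor,
                             teacher_embeds: torch.Tensor,
                             temperature: float = 0.02) -> torch.Tensor:
    """KL-divergence distillation over in-batch similarity distributions.

    Both inputs: (batch, dim) text embeddings for the same batch of captions.
    The temperature here is an illustrative assumption.
    """
    s = F.normalize(student_embeds, dim=-1)
    t = F.normalize(teacher_embeds, dim=-1)
    # Pairwise cosine similarities within the batch.
    s_sim = s @ s.T / temperature
    t_sim = t @ t.T / temperature
    # The teacher defines the target distribution; the student matches it row-wise.
    log_p_student = F.log_softmax(s_sim, dim=-1)
    p_teacher = F.softmax(t_sim, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")
```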

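For the hard-negative item, the sketch below shows threshold-based false-negative filtering followed by hard-negative selection; the exact filtering rule, threshold handling, and negative count used by UniME may differ.

```python
import torch
import torch.nn.functional as F

def mine_hard_negatives(query_embeds: torch.Tensor,
                        candidate_embeds: torch.Tensor,
                        positive_idx: torch.Tensor,
                        num_negatives: int = 8) -> torch.Tensor:
    """Select hard negatives per query while filtering likely false negatives.

    query_embeds:     (batch, dim)
    candidate_embeds: (num_candidates, dim)
    positive_idx:     (batch,) index of each query's annotated positive
    """
    q = F.normalize(query_embeds, dim=-1)
    c = F.normalize(candidate_embeds, dim=-1)
    sim = q @ c.T                                        # (batch, num_candidates)
    pos_sim = sim.gather(1, positive_idx.unsqueeze(1))   # similarity of the true pair

    # Exclude the annotated positive itself from the negative pool.
    sim = sim.scatter(1, positive_idx.unsqueeze(1), float("-inf"))
    # Filter likely false negatives: candidates scoring at or above the true
    # positive (the exact threshold rule is an assumption for illustration).
    sim = sim.masked_fill(sim >= pos_sim, float("-inf"))

    # The remaining highest-scoring candidates are kept as hard negatives.
    _, hard_idx = sim.topk(num_negatives, dim=-1)
    return hard_idx
```
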
Model Capabilities

Image Embedding
Text Embedding
Cross-Modal Retrieval
Multimodal Alignment

Use Cases

Cross-Modal Retrieval
Image-to-Text Retrieval
Retrieve relevant text descriptions based on image content.
Ranked first on the MMEB leaderboard.
Text-to-Image Retrieval
Retrieve relevant images based on text descriptions.
Performs strongly across diverse retrieval tasks.