UniME LLaVA 1.6 7B

Developed by DeepGlint-AI
UniME is a general-purpose embedding model built on a multimodal large language model. It was trained at 336×336 image resolution and ranked first on the MMEB leaderboard.
Downloads 188
Release Time: 4/25/2025

Model Overview

UniME strengthens the embedding capabilities of multimodal large language models through text-discriminative knowledge distillation and instruction tuning with hard negative mining, which makes it well suited to cross-modal retrieval tasks.

Model Features

Text-Discriminative Knowledge Distillation
Aligns the student model's embeddings with the teacher model's by minimizing the KL divergence between their in-batch similarity distributions. Only the LLM component is fine-tuned; all other parameters stay frozen.
Hard Negative Mining
Filters out false negatives with a similarity threshold to remove misleading samples, then automatically selects the top-k most similar but mismatched samples as hard negatives to increase training difficulty.
High-Resolution Training
Trained at 336×336 image resolution to capture finer visual detail.
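
The two training ideas above can be illustrated with a minimal sketch in plain Python. It is not the released training code. The KL direction (teacher to student), the temperature parameter, the rule that the false-negative threshold equals the positive's score plus a margin, and every function name here are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarity scores."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_matrix(embeddings):
    """Pairwise cosine similarities between all items of one batch."""
    unit = [normalize(e) for e in embeddings]
    return [[dot(a, b) for b in unit] for a in unit]

def distillation_loss(student, teacher, temperature=1.0):
    """Mean KL(teacher || student) over rows of the in-batch similarity
    matrices. The direction and temperature are assumptions."""
    s_sim = similarity_matrix(student)
    t_sim = similarity_matrix(teacher)
    total = 0.0
    for s_row, t_row in zip(s_sim, t_sim):
        p = softmax(t_row, temperature)  # teacher distribution
        q = softmax(s_row, temperature)  # student distribution
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return total / len(student)

def select_hard_negatives(query_sims, positive_index, margin=0.1, k=2):
    """Return indices of up to k hard negatives for one query.

    Candidates scoring above (positive score + margin) are treated as
    likely false negatives and dropped; this threshold rule is assumed.
    """
    threshold = query_sims[positive_index] + margin
    candidates = [
        (score, idx)
        for idx, score in enumerate(query_sims)
        if idx != positive_index and score <= threshold
    ]
    # Highest-scoring mismatches first: these are the hardest negatives.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in candidates[:k]]

# Toy data: candidate 0 is the true match for the query.
sims = [0.80, 0.95, 0.70, 0.20, 0.60]
hard = select_hard_negatives(sims, positive_index=0, margin=0.1, k=2)
print(hard)  # [2, 4]: candidate 1 (0.95) is filtered as a likely false negative
```

Because the loss compares batch-by-batch similarity matrices rather than raw vectors, the student and teacher embeddings may have different dimensions in this sketch.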

Model Capabilities

Cross-modal Retrieval
Image Understanding
Text Understanding
Embedding Learning

Use Cases

Cross-modal Retrieval
Image-Text Matching
Computes the similarity between images and text descriptions.
Ranked first on the MMEB leaderboard.
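
As a rough illustration of image-text matching, the sketch below ranks candidate captions by cosine similarity to an image embedding. The three-dimensional vectors are toy stand-ins for real model outputs, and the function names are hypothetical; the code does not call the UniME model itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_captions(image_embedding, caption_embeddings):
    """Return (caption index, score) pairs, best match first."""
    scored = [
        (i, cosine_similarity(image_embedding, c))
        for i, c in enumerate(caption_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings standing in for real image and text model outputs.
image = [0.9, 0.1, 0.0]
captions = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
ranking = rank_captions(image, captions)
print([index for index, _ in ranking])  # [1, 2, 0]
```

In a retrieval setting the same ranking step would run over embeddings precomputed for the whole candidate pool, so only the query has to be encoded at search time.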