UniME LLaVA OneVision 7B

Developed by DeepGlint-AI
UniME is a general embedding learning framework built on multimodal large models. It improves multimodal embedding capabilities through two strategies: text discriminative knowledge distillation and hard-negative-enhanced instruction tuning.
Downloads 376
Release Time: 5/6/2025

Model Overview

UniME aims to bridge the gap between modalities and improve the embedding capabilities of multimodal large models through new training methods. It is reported to perform strongly on the MMEB leaderboard.

Model Features

Text Discriminative Knowledge Distillation
The LLM component is decoupled from the multimodal model and processes text-only input with prompts. The student model's embedding vectors are aligned with those of a teacher model using KL divergence, and only the LLM component is fine-tuned.
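One plausible way to express the KL-based alignment is to compare in-batch similarity distributions of the student and teacher embeddings. The sketch below is an illustration under that assumption, not UniME's actual training code: the function names, the temperature value, and the choice to compare similarity matrices (rather than raw vectors) are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarity scores."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distribution(embeddings, temperature):
    """For each text in the batch, a probability distribution over the batch."""
    normed = [l2_normalize(e) for e in embeddings]
    return [
        softmax([sum(a * b for a, b in zip(u, v)) for v in normed], temperature)
        for u in normed
    ]


def kl_distillation_loss(student_emb, teacher_emb, temperature=0.1):
    """Mean KL(teacher || student) over the rows of the similarity distributions.

    The temperature default is an assumed value chosen for illustration.
    """
    p_teacher = similarity_distribution(teacher_emb, temperature)
    p_student = similarity_distribution(student_emb, temperature)
    total = 0.0
    for p_row, q_row in zip(p_teacher, p_student):
        total += sum(p * math.log(p / q) for p, q in zip(p_row, q_row))
    return total / len(p_teacher)
```

Because the loss compares batch-level similarity structure, the student and teacher embeddings do not need to share the same dimensionality in this sketch. In a real training loop the same quantity would be computed on tensors so that gradients flow only into the LLM component.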
Hard Negative Sample Enhancement
Filters out false negatives using a similarity threshold, then automatically selects the top-k samples that are most similar to the query but do not match it. These hard negatives increase training difficulty and improve model performance.
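The filtering and selection steps can be sketched as follows. This is a minimal illustration, not the framework's implementation: the function name, the `margin` parameter, and the rule that the threshold equals the positive's similarity plus a margin are assumptions made for the example.

```python
def select_hard_negatives(query_sims, positive_index, k=2, margin=0.1):
    """Pick the k most similar non-matching candidates for one query.

    query_sims:     similarity of the query to every candidate
    positive_index: position of the true match in query_sims
    margin:         assumed slack above the positive's similarity; any
                    candidate scoring higher is treated as a likely
                    false negative and dropped
    """
    threshold = query_sims[positive_index] + margin
    candidates = [
        (sim, idx)
        for idx, sim in enumerate(query_sims)
        if idx != positive_index and sim <= threshold
    ]
    # Highest similarity first: these are the hardest remaining negatives.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in candidates[:k]]


sims = [0.80, 0.95, 0.75, 0.30, 0.78]
print(select_hard_negatives(sims, positive_index=0))  # prints [4, 2]
```

In this example candidate 1 (similarity 0.95) exceeds the threshold and is discarded as a probable false negative, while candidates 4 and 2 are kept as the two hardest negatives.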
Multimodal Embedding Optimization
Optimizes the multimodal system by enhancing visual sensitivity, strengthening cross-modal alignment, and improving instruction-following capabilities.

Model Capabilities

Multimodal Embedding Learning
Image Text Understanding
Cross-modal Retrieval
Text Summarization

Use Cases

Information Retrieval
Cross-modal Retrieval
Retrieve relevant text descriptions based on images, or retrieve relevant images based on text
Reported to perform strongly in MMEB evaluations
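Once images and texts are embedded in the same vector space, retrieval in either direction reduces to ranking candidates by similarity to the query. The sketch below assumes the embeddings have already been produced by the model and uses cosine similarity; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_emb, candidate_embs, top_k=3):
    """Return (candidate_index, similarity) pairs, best match first."""
    scored = sorted(
        ((cosine(query_emb, emb), idx) for idx, emb in enumerate(candidate_embs)),
        reverse=True,
    )
    return [(idx, score) for score, idx in scored[:top_k]]


# Toy example: one image embedding as the query, three caption embeddings.
image_query = [1.0, 0.0]
captions = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
best = retrieve(image_query, captions, top_k=2)
```

The same function serves text-to-image retrieval by passing a text embedding as the query and image embeddings as the candidates.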
Content Understanding
Image Content Summarization
Summarize image content in a few concise words