
LLaVE-7B

Developed by zhibinlan
LLaVE-7B is a 7-billion-parameter multimodal embedding model based on LLaVA-OneVision-7B. It produces embedding representations for text, images, multiple images, and videos.
Downloads 1,389
Release Time: 2/9/2025

Model Overview

LLaVE-7B is a multimodal embedding model that produces embedding representations for text, images, multiple images, and videos. It performs strongly on the MMEB (Massive Multimodal Embedding Benchmark) leaderboard and shows strong transfer learning capabilities.

Model Features

Multimodal Embedding Capability
A single model produces embedding representations for text, images, multiple images, and videos
Outstanding Performance
Achieved state-of-the-art performance on MMEB with only 662,000 training samples
Strong Transfer Ability
Although trained on image-text data, it can generalize to text-video retrieval tasks in a zero-shot manner
Efficient Training
Achieved excellent performance with only a small amount of data

Model Capabilities

Text Embedding Representation
Image Embedding Representation
Multi-image Embedding Representation
Video Embedding Representation
Cross-modal Retrieval
Zero-shot Transfer Learning

Use Cases

Information Retrieval
Cross-modal Retrieval
Retrieve relevant images or videos based on text queries
Ranked first on the MMEB leaderboard
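Cross-modal retrieval of this kind comes down to ranking candidates by their similarity to a query in a shared embedding space. The sketch below is a minimal illustration under stated assumptions: it does not call LLaVE-7B itself, it assumes the text and image embeddings have already been produced by the model, it uses hypothetical toy vectors in place of real model outputs, and it assumes cosine similarity as the scoring function (a common choice for embedding models, not something this page confirms). It requires NumPy.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def retrieve_top_k(query_emb: np.ndarray, candidate_embs: np.ndarray, k: int):
    """Return indices and cosine scores of the k candidates most similar to the query.

    Assumption: query_emb has shape (d,) and candidate_embs has shape (n, d),
    both already computed by an embedding model.
    """
    q = l2_normalize(query_emb.reshape(1, -1))
    c = l2_normalize(candidate_embs)
    scores = (c @ q.T).ravel()        # one cosine similarity per candidate
    order = np.argsort(-scores)[:k]   # highest score first
    return order, scores[order]


if __name__ == "__main__":
    # Hypothetical toy vectors standing in for a text query embedding
    # and three image embeddings from the same embedding space.
    text_query = np.array([1.0, 0.0, 0.0, 0.0])
    image_embs = np.array([
        [0.9, 0.1, 0.0, 0.0],  # points almost the same way as the query
        [0.0, 1.0, 0.0, 0.0],  # orthogonal to the query
        [0.5, 0.5, 0.5, 0.5],  # partially related
    ])
    idx, sc = retrieve_top_k(text_query, image_embs, k=2)
    print(idx.tolist(), np.round(sc, 3).tolist())
```

Normalizing once and then taking a matrix product scores every candidate in a single operation, which is why retrieval over a pre-embedded image or video collection is typically done this way rather than pairwise.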
Content Understanding
Image Content Understanding
Understand image content and generate relevant text representations
Can accurately distinguish different objects in images