
LLaVE 0.5B

Developed by zhibinlan
LLaVE is a multimodal embedding model built on LLaVA-OneVision-0.5B. With 0.5B parameters, it can embed text, single images, multiple images, and videos.
Downloads 2,897
Release Time: 2/6/2025

Model Overview

LLaVE is a multimodal embedding model that can process text, image, and video data, supporting tasks such as sentence similarity calculation and zero-shot image classification.
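Sentence similarity with an embedding model usually comes down to comparing two embedding vectors. The snippet below is a minimal sketch that assumes you already have LLaVE embeddings as plain lists of floats. How to load the model and produce those embeddings is not covered on this page, so the toy vectors here are hypothetical placeholders, and cosine similarity is assumed as the comparison metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for LLaVE embeddings of two sentences.
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.65, 0.05]
print(round(cosine_similarity(emb_a, emb_b), 3))
```

Scores near 1.0 indicate semantically similar inputs. Because the metric ignores vector length, it works whether or not the embeddings are pre-normalized.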

Model Features

Multimodal Embedding
Embeds text, image, and video data with a single model
Efficient Performance
Achieves strong results on the MMEB leaderboard despite a small parameter count and a modest amount of training data
Zero-shot Transfer Capability
Trained on image-text data but can generalize to text-video retrieval tasks in a zero-shot manner

Model Capabilities

Text embedding
Image embedding
Video embedding
Sentence similarity calculation
Zero-shot image classification
Cross-modal retrieval
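Zero-shot image classification with an embedding model is commonly done by embedding one text prompt per class label and choosing the label whose embedding lies closest to the image embedding. The sketch below assumes that approach. The label set, the toy vectors, and the idea of CLIP-style prompts such as "a photo of a cat" are hypothetical illustrations, not a procedure documented for LLaVE.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(image_emb: list[float], label_embs: dict[str, list[float]]) -> str:
    """Return the label whose text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda label: cosine_similarity(image_emb, label_embs[label]))


# Hypothetical: in practice each vector would be the embedding of a prompt
# like "a photo of a cat", and image_emb the embedding of the query image.
label_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
image_emb = [1.0, 0.1]
print(zero_shot_classify(image_emb, label_embs))
```

No classifier is trained. Adding a new class only requires embedding one more prompt.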

Use Cases

Image Retrieval
Text-based Image Search
Retrieve relevant images based on text descriptions
Performs strongly in MMEB evaluations
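Text-based image search can be framed as ranking a gallery of image embeddings by their similarity to a text-query embedding. The sketch below assumes the embeddings are already computed and stored in a dictionary keyed by a hypothetical image ID. It uses a brute-force scan, which suits small galleries. Large collections would typically rely on an approximate nearest-neighbor index instead.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_emb: list[float], gallery: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (image_id, score) pairs most similar to the query, best first."""
    scored = [(image_id, cosine_similarity(query_emb, emb)) for image_id, emb in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical gallery of precomputed image embeddings and a text-query embedding.
gallery = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
query_emb = [1.0, 0.0]
for image_id, score in search(query_emb, gallery, k=2):
    print(image_id, round(score, 3))
```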
Cross-modal Retrieval
Text-to-Video Retrieval
Retrieve relevant video clips based on text descriptions
Demonstrates strong performance in zero-shot scenarios
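This page does not state which metric underlies the text-to-video results. Recall@K is a common choice for this kind of retrieval, so the sketch below assumes it. It takes a precomputed similarity matrix in which `sim[i][j]` scores text query i against video j and, as a stated assumption, treats video i as the single correct match for query i.

```python
def recall_at_k(sim: list[list[float]], k: int) -> float:
    """Fraction of queries whose correct video (assumed to be video i) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank video indices for this query by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Hypothetical 3x3 similarity matrix between text queries (rows) and videos (columns).
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]
print(recall_at_k(sim, 1), recall_at_k(sim, 3))
```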